netdisco-users Digest, Vol 189, Issue 3

netdisco-users-request Mon, 23 May 2022 07:33:36 -0700

Send netdisco-users mailing list submissions to
        [email protected]


To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."

Today's Topics:

   1. Re: Urgent Problem with Netdisco new version 2.52.6 (Andy Ruhl)
   2. Re: Netdisco auto discovery tasks suddenly stopped working (Muris)

--- Begin Message ---

This can also happen if your system is running low on resources. I see
from your graphs that this might be the case, though I don't really
understand what I'm looking at there.

Disk space might also be an issue.

Check the config file for this:

# number of SNMP workers to run in parallel (in netdisco-backend).
# the default is twice the number of CPU cores. increase this if
# your system has few cores and the schedule is taking too long.
# ```````````````````````````````````````````````````````````````
#workers:
#  tasks: 'AUTO * 2'
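
The comment says 'AUTO * 2' means twice the number of CPU cores. Here is a minimal sketch of that arithmetic only. It is a hypothetical helper that assumes 'AUTO' stands for the detected core count; it is not Netdisco's own config parser:

```python
import os
import re
from typing import Optional


def resolve_worker_count(setting: str, cpu_cores: Optional[int] = None) -> int:
    """Hypothetical resolver for a worker setting such as 'AUTO * 2' or '8'.

    Assumes 'AUTO' means the number of CPU cores, as the config comment
    describes. Illustration only, not Netdisco's implementation.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    text = setting.strip()
    if text.isdigit():
        return int(text)  # a plain number is taken as-is
    match = re.fullmatch(r"AUTO\s*(?:\*\s*(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised worker setting: {setting!r}")
    multiplier = int(match.group(1)) if match.group(1) else 1
    return cores * multiplier


if __name__ == "__main__":
    # On a 4-core box the default works out to 8 parallel SNMP workers.
    print(resolve_worker_count("AUTO * 2", cpu_cores=4))
```

On a box that is short on memory, a smaller multiplier or a fixed number means fewer worker processes for the backend to spawn at once.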

Try adjusting that. Also:

# this is the schedule for automatically keeping netdisco up-to-date;
# these are good defaults, so only uncomment if needing to change.
# (or set "schedule: null" if you wish to disable the scheduler)
# ````````````````````````````````````````````````````````````````````
#schedule:
#  discoverall:
#    when: '5 7 * * *'
#  macwalk:
#    when:
#      min: 20
#  arpwalk:
#    when:
#      min: 50
#  nbtwalk:
#    when: '0 8,13,21 * * *'
#  expire:
#    when: '30 23 * * *'
#  makerancidconf: null

I find that macwalk (macsuck) and arpwalk (arpnip) only need to run once
an hour, or even less often. I don't run nbtwalk at all, and running expire
once a day is fine for me as well.

This may reduce the load on your machine.
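
For reference, the `when:` strings above are crontab-style fields (minute, hour, day of month, month, day of week). So '5 7 * * *' runs daily at 07:05, and '0 8,13,21 * * *' runs at 08:00, 13:00 and 21:00. The `min: 20` hash form appears to mean minute 20 of every hour.

Below is a simplified matcher for counting how often an entry fires per day. It assumes only '*', single numbers and comma lists (no ranges, steps or names). It is an illustration, not Netdisco's scheduler:

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field, supporting only '*', 'N' and 'N,M,...'."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a 5-field cron expression fires at the given minute.

    Simplified: ranges (1-5), steps (*/15) and names (mon) are not handled.
    Day-of-week uses cron numbering, where 0 is Sunday.
    """
    minute, hour, mday, month, wday = expr.split()
    cron_wday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(mday, when.day)
        and field_matches(month, when.month)
        and field_matches(wday, cron_wday)
    )


def runs_per_day(expr: str) -> int:
    """Count how many minutes of one sample day the expression fires on."""
    day = datetime(2022, 5, 23)
    return sum(
        cron_matches(expr, day.replace(hour=h, minute=m))
        for h in range(24)
        for m in range(60)
    )


if __name__ == "__main__":
    print(runs_per_day("5 7 * * *"))        # discoverall
    print(runs_per_day("0 8,13,21 * * *"))  # nbtwalk
    print(runs_per_day("20 * * * *"))       # macwalk written as a cron string
```

Macwalk and arpwalk each queue a job for every device. Going from 24 runs a day to fewer is therefore where the load saving would come from.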

If you really do have a memory leak, try updating the packages on the system.

Andy

On Mon, May 23, 2022 at 4:52 AM Muris <[email protected]> wrote:
>
> I'm also checking the PostgreSQL log in /var/lib/pgsql/12/data/log and see
> a few strange errors occurring there. Any ideas, anyone?
>
> 2022-05-23 19:57:04.247 ACST [10664] STATEMENT:  INSERT INTO device_vlan ( 
> description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
> 2022-05-23 19:57:04.247 ACST [10664] ERROR:  current transaction is aborted, 
> commands ignored until end of transaction block
> 2022-05-23 19:57:04.247 ACST [10664] STATEMENT:  INSERT INTO device_vlan ( 
> description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
> 2022-05-23 19:57:04.248 ACST [10664] ERROR:  current transaction is aborted, 
> commands ignored until end of transaction block
> 2022-05-23 19:57:04.248 ACST [10664] STATEMENT:  INSERT INTO device_vlan ( 
> description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
> 2022-05-23 19:57:04.248 ACST [10664] ERROR:  current transaction is aborted, 
> commands ignored until end of transaction block
> 2022-05-23 19:57:04.248 ACST [10664] STATEMENT:  INSERT INTO device_vlan ( 
> description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
> 2022-05-23 19:57:36.171 ACST [10809] ERROR:  canceling autovacuum task
> 2022-05-23 19:57:36.171 ACST [10809] CONTEXT:  automatic vacuum of table 
> "netdisco.public.device_port"
> 2022-05-23 20:05:36.933 ACST [11727] ERROR:  canceling autovacuum task
> 2022-05-23 20:05:36.933 ACST [11727] CONTEXT:  automatic vacuum of table 
> "netdisco.public.device_port"
> 2022-05-23 20:06:36.315 ACST [11812] ERROR:  canceling autovacuum task
> 2022-05-23 20:06:36.315 ACST [11812] CONTEXT:  automatic vacuum of table 
> "netdisco.public.device_port"
> 2022-05-23 20:07:36.741 ACST [11906] ERROR:  canceling autovacuum task
> 2022-05-23 20:07:36.741 ACST [11906] CONTEXT:  automatic vacuum of table 
> "netdisco.public.device_port"
> 2022-05-23 20:09:01.974 ACST [12021] ERROR:  invalid input syntax for type 
> macaddr: "00:00:00:00"
> 2022-05-23 20:09:01.974 ACST [12021] STATEMENT:  SELECT me.ip, me.creation, 
> me.dns, me.description, me.uptime, me.contact, me.name, me.location, 
> me.layers, me.num_ports, me.mac, me.serial, me.chassis_id, me.model, 
> me.ps1_type, me.ps2_type, me.ps1_status, me.ps2_status, me.fan, me.slots, 
> me.vendor, me.os, me.os_ver, me.log, me.snmp_ver, me.snmp_comm, 
> me.snmp_class, me.snmp_engineid, me.vtp_domain, me.last_discover, 
> me.last_macsuck, me.last_arpnip, me.is_pseudo FROM device me WHERE ( mac = $1 
> )
> 2022-05-23 20:10:00.405 ACST [12179] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:10:00.405 ACST [12179] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16535) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:10:00.407 ACST [12179] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:10:00.407 ACST [12179] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16535) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:10:00.408 ACST [12179] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:10:00.408 ACST [12179] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16405) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:10:00.409 ACST [12179] ERROR:  column c.relhasoids does not 
> exist at character 245
>
>
> 2022-05-23 20:50:02.381 ACST [15274] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16405) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:50:02.382 ACST [15274] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:50:02.382 ACST [15274] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16405) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:50:02.383 ACST [15274] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:50:02.383 ACST [15274] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16405) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:50:02.384 ACST [15274] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:50:02.384 ACST [15274] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16525) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
> 2022-05-23 20:50:02.385 ACST [15274] ERROR:  column c.relhasoids does not 
> exist at character 245
> 2022-05-23 20:50:02.385 ACST [15274] STATEMENT:  select n.nspname, c.relname, 
> a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod, 
> a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, 
> d.adrelid), case t.typtype when 'd' then t.typbasetype else 0 end, 
> t.typtypmod, c.relhasoids, attidentity, c.relhassubclass from 
> (((pg_catalog.pg_class c inner join pg_catalog.pg_namespace n on n.oid = 
> c.relnamespace and c.oid = 16525) inner join pg_catalog.pg_attribute a on 
> (not a.attisdropped) and a.attnum > 0 and a.attrelid = c.oid) inner join 
> pg_catalog.pg_type t on t.oid = a.atttypid) left outer join pg_attrdef d on 
> a.atthasdef and d.adrelid = a.attrelid and d.adnum = a.attnum order by 
> n.nspname, c.relname, attnum
>
>
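
The `invalid input syntax for type macaddr: "00:00:00:00"` error above is PostgreSQL refusing a value with only four octets. The `macaddr` type holds six (48 bits). Below is a minimal pre-check for catching such values before they reach the database. The function name is hypothetical, and the check is looser than PostgreSQL's own parser:

```python
import re
from typing import Optional

HEX_DIGITS_48_BIT = re.compile(r"[0-9a-f]{12}")


def normalise_mac(value: str) -> Optional[str]:
    """Return a 6-octet MAC as 'aa:bb:cc:dd:ee:ff', or None if it is not one.

    Separators ':', '-' and '.' are stripped before checking for exactly
    12 hex digits (48 bits). PostgreSQL only accepts specific groupings,
    so treat this as a pre-filter rather than a full validator.
    """
    digits = re.sub(r"[:.\-]", "", value.strip().lower())
    if not HEX_DIGITS_48_BIT.fullmatch(digits):
        return None
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    for candidate in ("00:00:00:00", "08:00:2B:01:02:03", "0800.2b01.0203"):
        print(candidate, "->", normalise_mac(candidate))
```

The thread doesn't show which device reported the short value. This check only confirms that the value itself is malformed.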
> On 23/5/2022, 08:39, "Muris" <[email protected]> wrote:
>
>     Thanks Andy, I did try that but it still didn’t work. I actually had to
>     reboot the box; it looks like something was stuck in memory, or
>     something it didn’t like.
>
>     I track the box’s CPU/memory; these are some before-and-after
>     screenshots from around the reboot (see attached).
>
>     It seems semi-normal now, but discoveries are a bit hit-and-miss with
>     the workers; I think it might be getting stuck on something.
>
>     Those errors have cleared from the logs after the reboot (the box had
>     been up for 181 days), but I see these new ones pop up:
>
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Argument "AirOS" isn't numeric in numeric eq (==) at /home/netdisco/perl5/lib/perl5/SNMP/Info/Layer2/Ubiquiti.pm line 101.
>     Argument "AirOS" isn't numeric in numeric eq (==) at /home/netdisco/perl5/lib/perl5/SNMP/Info/Layer2/Ubiquiti.pm line 101.
>     Use of uninitialized value in sprintf at /home/netdisco/perl5/lib/perl5/App/Netdisco/Worker/Plugin/Discover/Entities.pm line 108.
>     Argument "AirOS" isn't numeric in numeric eq (==) at /home/netdisco/perl5/lib/perl5/SNMP/Info/Layer2/Ubiquiti.pm line 101.
>     Argument "AirOS" isn't numeric in numeric eq (==) at /home/netdisco/perl5/lib/perl5/SNMP/Info/Layer2/Ubiquiti.pm line 101.
>     Argument "AirOS" isn't numeric in numeric eq (==) at /home/netdisco/perl5/lib/perl5/SNMP/Info/Layer2/Ubiquiti.pm line 101.
>
>     On 22/5/2022, 21:32, "Andy Ruhl" <[email protected]> wrote:
>
>         "Cannot allocate memory" is generally not good, you need to
>         investigate that. That's the back end which doesn't affect the web
>         front end directly, but if the system is out of memory that certainly
>         would.
>
>         Maybe try shutting off the backend and see if the front end works?
>
>         netdisco-backend stop
>
>         Andy
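
This note concerns the Starman lines in the log quoted below ("Can't connect to TCP port 5000 on :: [Address already in use]"). That error usually means something is already listening on port 5000, for example a netdisco-web instance that is still running.

Here is a quick check using only Python's standard socket module. The helper name is made up, and it probes IPv4 loopback only. The log shows Starman binding to IPv6 `::`, so treat a False result as indicative rather than conclusive:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. something is listening
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 5000 busy:", port_in_use(5000))
```

On the server itself, `ss -ltnp` shows which process owns a listening port.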
>
>         On Sat, May 21, 2022 at 10:31 PM Muris <[email protected]> wrote:
>         >
>         > I had a look at the backend and frontend logs and they show this; I
>         > don’t know if it is having any impact.
>         >
>         > [16386] 2022-05-22 05:26:04  warn App::Netdisco 2.052006 backend
>         > MCE::_dispatch_child: Failed to spawn worker 17: Cannot allocate memory at /home/netdisco/perl5/bin/netdisco-backend-fg line 64.
>         >
>         > 2022/05/22-14:56:09 Starman::Server (type Net::Server::PreFork) starting! pid(16414)
>         > Resolved [*]:5000 to [::]:5000, IPv6
>         > Not including resolved host [0.0.0.0] IPv4 because it will be handled by [::] IPv6
>         > Binding to TCP port 5000 on host :: with IPv6
>         > 2022/05/22-14:56:09 Can't connect to TCP port 5000 on :: [Address already in use]
>         >   at line 64 in file /home/netdisco/perl5/lib/perl5/Net/Server/Proto/TCP.pm
>         > 2022/05/22-14:56:09 Received QUIT. Running a graceful shutdown
>         > 2022/05/22-14:56:09 Worker processes cleaned up
>         > 2022/05/22-14:56:09 Server closing!
>         >
>         > From: Muris <[email protected]>
>         > Date: Sunday, 22 May 2022 at 14:49
>         > To: "[email protected]" <[email protected]>
>         > Subject: Urgent Problem with Netdisco new version 2.52.6
>         >
>         > Hi All,
>         >
>         > I have upgraded Netdisco to the new version 2.52.6 on a test box
>         > without issues and applied the same on production, and now whenever
>         > you click on a device, the page comes up blank.
>         >
>         > I can see the sidebar to the right load, then quickly disappear, and
>         > the page turns up blank.
>         >
>         > The only things that show up are the IP address to the right and a
>         > CSV icon, that’s all; all the rest of the tabs are missing and do not
>         > load.
>         >
>         > Any advice?
>         >
>         > Thanks
>         >
>         > _______________________________________________
>         > Netdisco mailing list
>         > [email protected]
>         > https://sourceforge.net/p/netdisco/mailman/netdisco-users/

--- End Message ---

--- Begin Message ---

Hi Christian, it’s been 3 months since this issue happened, and after it was
fixed everything worked fine. Now, 3 months later, the same thing has happened
again.

PostgreSQL started running out of memory and I had to reboot the box. Yes, I
have also tuned the PostgreSQL values with the optimiser tool.

I also updated Netdisco to the latest version that was just released.

I’ve tried clearing the admin and device_skip tables and reindexing the
database; however, something is getting stuck in discovery once the process
is kicked off.

I don’t know exactly what is getting stuck. Right now it has done about an
hour of discovery and is stuck at some process.

When I check the database there are still a lot of discovery items in the
queue, but they are going nowhere. I don’t know what else I should be looking
at, or whether some device is triggering a bug, or something failing to
insert into the database, considering there are 3000+ devices it has to
discover.

It was working fine for months, and all of a sudden there is something it
doesn’t like. I’m hoping you may have some ideas about what else to check or
do.

 

netdisco=> select count(*) from admin where status='queued';
 count
-------
  6008
(1 row)
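
The single count above doesn't show which kinds of job are piling up. Below is a sketch that breaks the queue down by action and status. It assumes the `psycopg2` driver is installed and that the `admin` table has an `action` column alongside the `status` column used above. The `action` column and the DSN are both assumptions:

```python
from contextlib import closing
from typing import Iterable, List, Tuple

# Column names are assumptions; `status` is confirmed by the query above.
QUEUE_BREAKDOWN_SQL = """
    SELECT action, status, count(*)
      FROM admin
     GROUP BY action, status
     ORDER BY count(*) DESC
"""


def fetch_queue_breakdown(dsn: str) -> List[Tuple[str, str, int]]:
    """Run the breakdown query; requires psycopg2 and a reachable database."""
    import psycopg2  # imported lazily so the formatter below works without it

    with closing(psycopg2.connect(dsn)) as conn, conn.cursor() as cur:
        cur.execute(QUEUE_BREAKDOWN_SQL)
        return cur.fetchall()


def format_breakdown(rows: Iterable[Tuple[str, str, int]]) -> str:
    """Render (action, status, count) rows as an aligned text table."""
    lines = [f"{'action':<12} {'status':<10} {'count':>8}"]
    for action, status, count in rows:
        lines.append(f"{str(action):<12} {str(status):<10} {count:>8}")
    return "\n".join(lines)


# Example with a placeholder DSN (adjust to your Netdisco database settings):
#   print(format_breakdown(fetch_queue_breakdown("dbname=netdisco user=netdisco")))
```

A queue dominated by one action narrows down which part of the backend is stalling.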

 

I also checked the PostgreSQL error log and I am seeing entries like these
during discovery processes:

 

2022-05-23 19:57:04.247 ACST [10664] STATEMENT:  INSERT INTO device_vlan (
    description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
2022-05-23 19:57:04.247 ACST [10664] ERROR:  current transaction is aborted,
    commands ignored until end of transaction block
2022-05-23 19:57:04.247 ACST [10664] STATEMENT:  INSERT INTO device_vlan (
    description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
2022-05-23 19:57:04.248 ACST [10664] ERROR:  current transaction is aborted,
    commands ignored until end of transaction block
2022-05-23 19:57:04.248 ACST [10664] STATEMENT:  INSERT INTO device_vlan (
    description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
2022-05-23 19:57:04.248 ACST [10664] ERROR:  current transaction is aborted,
    commands ignored until end of transaction block
2022-05-23 19:57:04.248 ACST [10664] STATEMENT:  INSERT INTO device_vlan (
    description, ip, last_discover, vlan) VALUES ( $1, $2, now(), $3 )
2022-05-23 19:57:36.171 ACST [10809] ERROR:  canceling autovacuum task
2022-05-23 19:57:36.171 ACST [10809] CONTEXT:  automatic vacuum of table
    "netdisco.public.device_port"
2022-05-23 20:05:36.933 ACST [11727] ERROR:  canceling autovacuum task
2022-05-23 20:05:36.933 ACST [11727] CONTEXT:  automatic vacuum of table
    "netdisco.public.device_port"
2022-05-23 20:06:36.315 ACST [11812] ERROR:  canceling autovacuum task
2022-05-23 20:06:36.315 ACST [11812] CONTEXT:  automatic vacuum of table
    "netdisco.public.device_port"
2022-05-23 20:07:36.741 ACST [11906] ERROR:  canceling autovacuum task
2022-05-23 20:07:36.741 ACST [11906] CONTEXT:  automatic vacuum of table
    "netdisco.public.device_port"
2022-05-23 20:09:01.974 ACST [12021] ERROR:  invalid input syntax for type
    macaddr: "00:00:00:00"

2022-05-23 20:50:02.384 ACST [15274] STATEMENT:  select n.nspname, c.relname,
    a.attname, a.atttypid, t.typname, a.attnum, a.attlen, a.atttypmod,
    a.attnotnull, c.relhasrules, c.relkind, c.oid, pg_get_expr(d.adbin, d.adrelid),
    case t.typtype when 'd' then t.typbasetype else 0 end, t.typtypmod,
    c.relhasoids, attidentity, c.relhassubclass from (((pg_catalog.pg_class c inner
    join pg_catalog.pg_namespace n on n.oid = c.relnamespace and c.oid = 16525)
    inner join pg_catalog.pg_attribute a on (not a.attisdropped) and a.attnum > 0
    and a.attrelid = c.oid) inner join pg_catalog.pg_type t on t.oid = a.atttypid)
    left outer join pg_attrdef d on a.atthasdef and d.adrelid = a.attrelid and
    d.adnum = a.attnum order by n.nspname, c.relname, attnum
2022-05-23 20:50:02.385 ACST [15274] ERROR:  column c.relhasoids does not exist
    at character 245
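
With this much repetition it helps to tally the ERROR lines. Below is a minimal sketch that counts distinct ERROR messages. It assumes one log entry per line in the `timestamp TZ [pid] LEVEL:  message` layout shown above. That prefix depends on the server's log_line_prefix setting, and the function name is hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g.: 2022-05-23 19:57:36.171 ACST [10809] ERROR:  canceling autovacuum task
ERROR_LINE = re.compile(r"\[\d+\] ERROR:\s+(?P<message>.*\S)")


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count distinct ERROR messages; STATEMENT and CONTEXT lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[match.group("message")] += 1
    return counts


# Example, with the path to one of the files in the PostgreSQL log directory:
#   with open(path_to_logfile) as fh:
#       for message, n in tally_errors(fh).most_common(5):
#           print(n, message)
```

Keep in mind that "current transaction is aborted" entries are follow-on errors. Once any statement in a transaction fails, PostgreSQL rejects every later statement until the transaction is rolled back. The first ERROR before such a run is therefore the one worth chasing.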

 

From: Christian Ramseyer <[email protected]>
Date: Thursday, 10 February 2022 at 18:13
To: Muris <[email protected]>, "[email protected]" 
<[email protected]>, Jethro Binks <[email protected]>
Subject: Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

 

In the meantime, we've made a wiki paragraph describing it:

https://github.com/netdisco/netdisco/wiki/Database-Tips#unreasonable-database-size-index-bloat

In short, either

reindex table concurrently $tablename;

to reindex a single table, or

reindex database concurrently netdisco;

to do the whole db at once.
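
This answers the "what command exactly" question that comes up below. Both statements are typed at the psql prompt, for example after `netdisco-do psql`. REINDEX ... CONCURRENTLY needs PostgreSQL 12 or later and cannot run inside a transaction block, which matters if you script it.

Below is a sketch that builds per-table statements with quoted identifiers. The table names in the example are simply ones from this thread. The psycopg2 lines in the comment are an assumption about how you would run them:

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(tables: Iterable[str]) -> List[str]:
    """Build one REINDEX TABLE CONCURRENTLY statement per table name."""
    return [f"REINDEX TABLE CONCURRENTLY {quote_ident(t)};" for t in tables]


# Running them from a script (assumes psycopg2). Autocommit is required
# because REINDEX CONCURRENTLY refuses to run inside a transaction block:
#   conn = psycopg2.connect("dbname=netdisco")
#   conn.autocommit = True
#   with conn.cursor() as cur:
#       for stmt in reindex_statements(["admin", "device_skip"]):
#           cur.execute(stmt)

if __name__ == "__main__":
    for stmt in reindex_statements(["admin", "device_skip", "device_port"]):
        print(stmt)
```

psql runs each statement in autocommit mode by default, so pasting the printed lines at the prompt works as well.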

Cheers
Christian

On 10.02.22 03:16, Muris wrote:

I meant to say in my previous email that I did a VACUUM FULL, which fixed the
issue with the DB size, and everything is back to being responsive.

 

Christian, can you tell me how to perform the Reindex Concurrently? What 
command do you exactly issue?

 

I tried to execute it but I’m not sure if I’m doing it right.

 

Cheers

 

From: alcatron <[email protected]>
Date: Wednesday, 2 February 2022 at 19:51
To: Christian Ramseyer <[email protected]>, 
"[email protected]" <[email protected]>, 
Jethro Binks <[email protected]>
Subject: Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

 

Thanks, I did an autovacuum and reduced the DB size back down, saving 15 GB.

 

How do you correctly execute REINDEX CONCURRENTLY on the DB?

 

 

From: Christian Ramseyer <[email protected]>
Date: Monday, 31 January 2022 at 11:44 pm
To: alcatron <[email protected]>, [email protected] 
<[email protected]>, Jethro Binks <[email protected]>
Subject: Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

For us, the autovacuum works fine, so I never do vacuum full. But you 
can do it if you're totally out of space and need to reclaim some from 
Postgres.

The only thing that needs additional intervention is the indexes; we run
a REINDEX CONCURRENTLY once a week, as described here:
https://www.postgresql.org/docs/current/routine-reindex.html

Apparently this "index bloat" should be less of an issue in Postgres 14,
but I haven't gotten around to trying it yet.



On 31.01.22 13:38, alcatron wrote:
> Thanks. Would I ever have to do a “vacuum full” on the db, or should it
> be an automatic process?
> 
> Auto Vacuum is set to on
> 
> *From: *Christian Ramseyer <[email protected]>
> *Date: *Monday, 31 January 2022 at 10:34 pm
> *To: *alcatron <[email protected]>, 
> [email protected] 
> <[email protected]>, Jethro Binks 
> <[email protected]>
> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped 
> working
> 
> 
> 
> On 31.01.22 12:56, alcatron wrote:
>> Thanks Christian, are those commands you mentioned just run at the psql
>> command line?
>> 
> 
> Yes exactly. You can start the command line interface with "netdisco-do
> psql".
> 
> 
>> For some reason, ever since I cleaned out the device_skip table, the
>> Netdisco PostgreSQL folder has grown dramatically, by an extra 15 GB
>> within 2 weeks.
>> 
>> I see this directory taking up the space, /var/lib/pgsql/12/data/base/16386,
>> with a lot of other files in there.
>> 
>> I had a look at the Netdisco tables and I can’t see any table that big,
>> so I’m not really sure why PostgreSQL keeps increasing so dramatically
>> in disk size.
> 
> 
> You should see what uses the space with the first query from here:
> https://wiki.postgresql.org/wiki/Disk_Usage
> 
> This will include indexes and TOAST tables; the space is probably used
> there rather than in the actual table object.
> 
> Cheers
> Christian
> 
> 
> 
> 
>> 
>>  Schema |            Name            | Type  |  Owner   |    Size    | Description
>> --------+----------------------------+-------+----------+------------+-------------
>>  public | admin                      | table | netdisco | 173 MB     |
>>  public | community                  | table | netdisco | 224 kB     |
>>  public | dbix_class_schema_versions | table | netdisco | 40 kB      |
>>  public | device                     | table | netdisco | 3312 kB    |
>>  public | device_ip                  | table | netdisco | 34 MB      |
>>  public | device_module              | table | netdisco | 895 MB     |
>>  public | device_port                | table | netdisco | 1656 MB    |
>>  public | device_port_log            | table | netdisco | 48 kB      |
>>  public | device_port_power          | table | netdisco | 124 MB     |
>>  public | device_port_properties     | table | netdisco | 354 MB     |
>>  public | device_port_ssid           | table | netdisco | 17 MB      |
>>  public | device_port_vlan           | table | netdisco | 1084 MB    |
>>  public | device_port_wireless       | table | netdisco | 6776 kB    |
>>  public | device_power               | table | netdisco | 1760 kB    |
>>  public | device_skip                | table | netdisco | 5544 kB    |
>>  public | device_vlan                | table | netdisco | 67 MB      |
>>  public | log                        | table | netdisco | 8192 bytes |
>>  public | netmap_positions           | table | netdisco | 288 kB     |
>>  public | node                       | table | netdisco | 317 MB     |
>>  public | node_ip                    | table | netdisco | 2084 MB    |
>>  public | node_monitor               | table | netdisco | 8192 bytes |
>>  public | node_nbt                   | table | netdisco | 4328 kB    |
>>  public | node_wireless              | table | netdisco | 16 MB      |
>>  public | oui                        | table | netdisco | 2160 kB    |
>>  public | process                    | table | netdisco | 8192 bytes |
>>  public | sessions                   | table | netdisco | 48 kB      |
>>  public | statistics                 | table | netdisco | 200 kB     |
>>  public | subnets                    | table | netdisco | 1296 kB    |
>>  public | topology                   | table | netdisco | 48 kB      |
>>  public | user_log                   | table | netdisco | 600 kB     |
>>  public | users                      | table | netdisco | 48 kB      |
>> 
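
Here is a quick worked check of Christian's point: summing the Size column above gives only about 6.7 GB. The sketch assumes `pg_size_pretty`-style units (1024-based, which is what psql prints). It also assumes this listing counts table data and TOAST but not indexes:

```python
import re
from typing import Dict

UNIT_BYTES: Dict[str, int] = {
    "bytes": 1,
    "kB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}


def parse_pg_size(text: str) -> int:
    """Convert a pg_size_pretty-style string such as '1656 MB' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(bytes|kB|MB|GB|TB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNIT_BYTES[match.group(2)]


# Sizes copied from the listing above, in the order shown.
LISTED_SIZES = [
    "173 MB", "224 kB", "40 kB", "3312 kB", "34 MB", "895 MB", "1656 MB",
    "48 kB", "124 MB", "354 MB", "17 MB", "1084 MB", "6776 kB", "1760 kB",
    "5544 kB", "67 MB", "8192 bytes", "288 kB", "317 MB", "2084 MB",
    "8192 bytes", "4328 kB", "16 MB", "2160 kB", "8192 bytes", "48 kB",
    "200 kB", "1296 kB", "48 kB", "600 kB", "48 kB",
]

if __name__ == "__main__":
    total = sum(parse_pg_size(s) for s in LISTED_SIZES)
    print(f"{total / 1024 ** 3:.2f} GB across {len(LISTED_SIZES)} tables")
```

That total is less than half of the 15 GB of growth reported. This fits the suggestion that the space is going somewhere outside the listed table sizes, most likely indexes.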
>> *From: *Christian Ramseyer <[email protected]>
>> *Date: *Thursday, 20 January 2022 at 12:45 am
>> *To: *alcatron <[email protected]>, 
>> [email protected] 
>> <[email protected]>, Jethro Binks 
>> <[email protected]>
>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped 
>> working
>> 
>> 
>> 
>> On 19.01.22 14:00, alcatron wrote:
>>> As for tracking down the error, I saw this in the netdisco-backend log.
>>> I believe the device_skip table was getting so big that it was running
>>> out of memory processing it; the device_skip table was about 162 MB.
>>> 
>>> I’m sure this will happen again in the next 2-3 months when the
>>> device_skip table builds up. Perhaps it’s some kind of bug where it can
>>> only handle a device_skip table up to a certain size?
>> 
>> It's weird how it would get that big, as IIRC it keeps only one record
>> per device in your DB at most. Is this including indexes? They might
>> become quite big, since Postgres can create some "bloat" under our
>> insert/delete pattern.
>> 
>> device_skip is just used to not poll unreachable devices over and over
>> again, there is no important data in there. So if in doubt,
>> 
>> delete from device_skip;
>> vacuum analyze device_skip;
>> reindex table device_skip;
>> 
>> should allow for a fresh start.
>> 
>> There are also the max_deferrals and retry_after options to control the
>> skip behaviour. I don't think it will affect the table size much though.
>> https://github.com/netdisco/netdisco/wiki/Configuration#workers 
>> 
>> If you're getting these issues regularly I'd definitely experiment with
>> the Postgres memory settings a bit, starting at work_mem.
>> 
>> Cheers
>> Christian
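
To make the max_deferrals / retry_after idea mentioned above concrete, here is a simplified model. A device that keeps failing is deferred, and once it reaches max_deferrals it is skipped until retry_after has passed. The record mirrors the `deferrals` and `last_defer` columns visible in the device_skip query further down. The exact rules Netdisco applies may differ, so treat this as an illustration, not its implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SkipRecord:
    """Simplified stand-in for one device_skip row (one backend/device pair)."""
    deferrals: int = 0
    last_defer: Optional[datetime] = None


def should_skip(record: SkipRecord, now: datetime,
                max_deferrals: int, retry_after: timedelta) -> bool:
    """Skip a device once it has hit max_deferrals, until retry_after elapses."""
    if record.deferrals < max_deferrals or record.last_defer is None:
        return False
    return now - record.last_defer < retry_after


def record_failure(record: SkipRecord, now: datetime) -> None:
    """Note another failed poll as a deferral."""
    record.deferrals += 1
    record.last_defer = now


if __name__ == "__main__":
    rec = SkipRecord()
    start = datetime(2022, 1, 1)
    for _ in range(10):
        record_failure(rec, start)
    print(should_skip(rec, start + timedelta(days=1), 10, timedelta(days=7)))  # True
```

With at most one such record per backend and device, the table should stay roughly the size of the device list. That is why a 162 MB device_skip points at bloat rather than row count.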
>> 
>> 
>>> 
>>> Both of these entries in netdisco-backend.log refer to the
>>> “device_skip” table; I looked through lots of logged data and found when
>>> it stopped working.
>>> 
>>> DETAIL:  Failed on request of size 284 in memory context 
>>> "CacheMemoryContext". [for Statement "SELECT me.backend, me.device, 
>>> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( ( 
>>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', 
>>> 2=\'10.1.1.1\'] at 
>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261
>>> 
>>> '}, 'DBIx::Class::Exception' )
>>> 
>>> [18851] 2022-01-11 01:30:43 error bless( {'msg' => 
>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st 
>>> execute failed: ERROR:  out of memory
>>> 
>>> DETAIL:  Failed on request of size 8344 in memory context 
>>> "MessageContext". [for Statement "SELECT me.backend, me.device, 
>>> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( ( 
>>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', 
>>> 2=\'10.1.1.2\'] at 
>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261
>>> 
>>> '}, 'DBIx::Class::Exception' )
>>> 
>>> *From: *alcatron <[email protected]>
>>> *Date: *Wednesday, 19 January 2022 at 10:14 pm
>>> *To: *Christian Ramseyer <[email protected]>, 
>>> [email protected] <[email protected]>
>>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped 
>>> working
>>> 
>>> Hi Christian, thank you for the tips.
>>> 
>>> I found what the problem is: it was crashing and not getting past a
>>> certain entry in the “device_skip” table in the database.
>>> 
>>> I truncated that table in psql and let it repopulate, and that fixed
>>> the automatic discovery and arpnip/macsuck etc.
>>> 
>>> I have found that after a while, perhaps 2-3 months, something happens
>>> in the “device_skip” table that halts these processes, and then I need to
>>> clear it to make it work again. I had a similar issue a few months back,
>>> and then I remembered what I did.
>>> 
>>> Muris
>>> 
>>> *From: *Christian Ramseyer <[email protected]>
>>> *Date: *Tuesday, 18 January 2022 at 12:20 pm
>>> *To: *alcatron <[email protected]>, 
>>> [email protected] <[email protected]>
>>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped 
>>> working
>>> 
>>> Hi
>>> 
>>>   > could not connect to
>>>   > server: No such file or directory
>>> 
>>> This would be very concerning, meaning that Postgres is not running at
>>> all. But since you seem to have the web frontend running that is
>>> probably not the case currently, so I wouldn't worry too much. Might be
>>> an old log entry.
>>> 
>>> 
>>>   > Failed on request of size 16 in memory context
>>>   > "MessageContext".
>>> 
>>> That, on the other hand, might be the issue. Postgres uses all kinds of
>>> memory parameters; if one of them is too small, the total amount of RAM
>>> in the server doesn't matter much.
>>> 
>>> I've had various issues with huge, clogged-up discovery queues over the
>>> years. As a first measure, I'd try to:
>>> 
>>> 1. stop netdisco-backend
>>> 2. restart Postgres, connect to the database with "netdisco-do psql" and
>>>    in there run "delete from admin;"
>>> 3. for good measure, also run "reindex table admin;"
>>> 4. start netdisco-backend again
>>> 
>>> This sounds dangerous, but admin is in fact just the queue of pending
>>> actions, so no important data will be lost.
>>> 
>>> It might also be interesting to run "select count(*) from admin;" first,
>>> to see how many rows are in there. If it's an absurdly high number
>>> (millions), you can run e.g. "create table admin_backup as select * from
>>> admin;" to keep a copy for later analysis.
>>> 
>>> If you're still getting the memory errors afterwards and it still
>>> doesn't work, I'd try to configure the memory parameters with this
>>> assistant, using the "online transaction processing" db type.
>>> https://pgtune.leopard.in.ua/#/about
>>> 
>>> 
>>> Cheers
>>> Christian
>>> 
>>> 
>>> 
>>> On 17.01.22 22:03, alcatron wrote:
>>>> Hi all, I'd like your thoughts on what could cause Netdisco to
>>>> suddenly stop performing automatic discovery tasks.
>>>> 
>>>> It seems only arpnip is still working via scheduled tasks; scheduled
>>>> discovery and macsuck have stopped running. If I go to the device in the
>>>> web interface and trigger discovery/arpnip/macsuck manually, it works
>>>> fine on that device.
>>>> 
>>>> Nothing has changed on the system, which has been running for a few
>>>> months now, and suddenly automatic discovery is partly broken.
>>>> 
>>>> In the backend log I see errors like the ones below. The server is
>>>> running and operational, as I can still trigger discovery etc. manually.
>>>> 
>>>> The server is not out of memory, despite what the messages indicate:
>>>> it has about 16GB, with plenty still unused.
>>>> 
>>>> Thanks for any assistance 😊
>>>> 
>>>> DBIx::Class::Schema::Versioned::_on_connect(): Your DB is currently 
>>>> unversioned. Please call upgrade on your schema to sync the DB. at 
>>>> /home/netdisco/perl5/lib/perl5/DBICx/Sugar.pm line 121
>>>> 
>>>> DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI 
>>>> connect('dbname=netdisco','netdisco',...) failed: could not connect to 
>>>> server: No such file or directory
>>>>             Is the server running locally and accepting
>>>>             connections on Unix domain socket 
>>>> "/var/run/postgresql/.s.PGSQL.5432"? at 
>>>> /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1639. at 
>>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 50
>>>> 
>>>> [25756] error bless( {'msg' => 
>>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st 
>>>> execute failed: ERROR:  out of memory
>>>> DETAIL:  Failed on request of size 16 in memory context 
>>>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started, 
>>>> me.finished, me.device, me.port, me.action, me.subaction, me.status, 
>>>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me 
>>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186421742\'] at 
>>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 
>>>> 267
>>>> '}, 'DBIx::Class::Exception' )
>>>> 
>>>> [25781] 2022-01-11 01:33:53 error bless( {'msg' => 
>>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st 
>>>> execute failed: ERROR:  out of memory
>>>> DETAIL:  Failed on request of size 16 in memory context 
>>>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started, 
>>>> me.finished, me.device, me.port, me.action, me.subaction, me.status, 
>>>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me 
>>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186420514\'] at 
>>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 
>>>> 267
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Netdisco mailing list
>>>> [email protected]
>>>> https://sourceforge.net/p/netdisco/mailman/netdisco-users/
>>> 
>>> -- 
>>> Christian Ramseyer, netnea ag
>>> Network Management. Security. OpenSource.
>>> https://www.netnea.com
>>> Phone: +41 79 644 77 64
>>> 

-- 
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64

--- End Message ---

_______________________________________________
Netdisco mailing list - Digest Mode
[email protected]
https://lists.sourceforge.net/lists/listinfo/netdisco-users