Send netdisco-users mailing list submissions to
	netdisco-users@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
	netdisco-users-requ...@lists.sourceforge.net

You can reach the person managing the list at
	netdisco-users-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."

Today's Topics:

   1. Re: Out of memory errors in backend log upon upgrading to
      Netdisco 2.60.7 (Christian Ramseyer)
   2. Re: Out of memory errors in backend log upon upgrading to
      Netdisco 2.60.7 (Muris)
--- Begin Message ---
Sounds like you're getting multiple device_skip entries for the same
device. Unfortunately I'm not well versed in how this really works,
especially with multiple backends (do you have multiple?). I imagine one
of these queries will produce some output in your db:

select device, count(*) as cnt from device_skip
group by device having count(*) > 1;

 device | cnt
--------+-----
(0 rows)

or maybe:

select backend, device, count(*) as cnt from device_skip
group by backend, device having count(*) > 1;

 backend | device | cnt
---------+--------+-----
(0 rows)

Until somebody has a better idea, you can try disabling the whole
device_skip mechanism by setting workers.max_deferrals and retry_after
to zero:

https://github.com/netdisco/netdisco/wiki/Configuration#workers

I'm not sure whether this will avoid executing the query that fails,
but it might be worth a try.

On 07.03.23 23:51, Muris wrote:
> Thanks Christian, I have done that, also clearing out the admin table.
> I also increased work_mem to 64M, and the process workers are now
> working and not running out of memory.
>
> I left it running and it was working fine for an hour or two.. I just
> checked it and all the nd2 processes seem to have stopped again, so I
> checked netdisco-backend.log and it shows this:
>
> [27082] 2023-03-07 22:31:36 error bless( {'msg' =>
> 'DBIx::Class::Row::update(): Can\'t update
> App::Netdisco::DB::Result::DeviceSkip=HASH(0x5da4278): updated more
> than one row at
> /home/netdisco/perl5/lib/perl5/App/Netdisco/DB/Result/DeviceSkip.pm line 41'},
> 'DBIx::Class::Exception' )
>
> It seems something in the device_skip table causes it to crash, and
> then all the process workers stop working.. then I have to manually
> restart netdisco-backend. I have cleared device_skip and still this
> comes... any ideas about this one?
>
> On Wed, Mar 8, 2023 at 8:33 AM Christian Ramseyer
> <ramse...@netnea.com> wrote:
>
> Hi Muris
>
> We sometimes see that the admin table or its indexes can get very
> large, despite not much data being in there. This is due to index
> bloat in Postgres, i.e. fragmentation of on-disk pages.
>
> Since all your errors seem to come from admin, I'd try clearing out
> the table and recreating the indexes as shown here:
>
> https://github.com/netdisco/netdisco/wiki/Database-Tips#polling-mysteriously-stuck-or-very-large-admindevice_skip-tables-and-indexes
>
> No worries, the table just holds job queue records and clearing it out
> does not lose any data you care about.
>
> If you're still having trouble, you can also look at index bloat in
> other tables:
>
> https://github.com/netdisco/netdisco/wiki/Database-Tips#unreasonable-database-size-index-bloat
>
> and try one of the proposed solutions there, e.g.
>
> netdisco-do psql -e 'reindex database concurrently netdisco'
>
> For general Postgres memory tuning, I like using
> https://pgtune.leopard.in.ua/#/about since it just asks a few system
> questions and then outputs all the ~20 parameters in a reasonable
> relation to each other.
>
> Cheers
> Christian
>
> On 07.03.23 11:20, Muris wrote:
> > Hey guys,
> >
> > I seem to be having some out of memory errors after upgrading to
> > netdisco 2.60.7, the latest version.. not sure what's going on.
> >
> > I followed this guide:
> > https://severalnines.com/blog/architecture-and-tuning-memory-postgresql-databases/
> >
> > And I increased work_mem to 32MB, which seems to have helped a bit,
> > but I'm still noticing errors on the backend from time to time, and
> > messages like this, any ideas?
> >
> > Running on a Xeon 8 core with 16GB RAM.
> >
> > DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
> > execute failed: ERROR: out of memory
> >
> > DETAIL: Failed on request of size 32800 in memory context
> > "HashBatchContext". [for Statement "SELECT me.job, me.entered,
> > me.started, me.finished, me.device, me.port, me.action, me.subaction,
> > me.status, me.username, me.userip, me.log, me.debug, me.device_key,
> > me.job_priority FROM (
> >   WITH my_jobs AS
> >     (SELECT admin.* FROM admin
> >       LEFT OUTER JOIN device_skip ds
> >         ON (ds.backend = ? AND admin.device = ds.device
> >             AND admin.action = ANY (ds.actionset))
> >       WHERE admin.status = 'queued'
> >         AND ds.device IS NULL)
> >   SELECT my_jobs.*,
> >     CASE WHEN ( (my_jobs.username IS NOT NULL
> >                  AND ((ds.deferrals IS NULL AND ds.last_defer IS NULL)
> >                       OR my_jobs.entered > ds.last_defer))
> >             OR (my_jobs.action = ANY (string_to_array(btrim(?, '{"}'), '","'))) )
> >       THEN 100
> >       ELSE 0
> >     END AS job_priority
> >   FROM my_jobs
> >   LEFT OUTER JOIN device_skip ds
> >     ON (ds.backend = ? AND ds.device = my_jobs.device)
> >   WHERE ds.deferrals < ?
> >      OR (my_jobs.username IS NOT NULL AND (ds.last_defer IS NULL
> >          OR my_jobs.entered > ds.last_defer))
> >      OR (ds.deferrals IS NULL AND ds.last_defer IS NULL)
> >      OR ds.last_defer <= ( LOCALTIMESTAMP - ?::interval )
> >   ORDER BY job_priority DESC,
> >            ds.deferrals ASC NULLS FIRST,
> >            ds.last_defer ASC NULLS LAST,
> >            device_key DESC NULLS LAST,
> >            random()
> >   LIMIT ?
> > ) me" with ParamValues: 1='netdisco',
> > 2='{"contact","hook::exec","hook::http","location","portcontrol","portname","power","snapshot","vlan"}',
> > 3='netdisco', 4='10', 5='7 days', 6='1'] at
> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 122
> >
> > [5405] 2023-03-07 08:36:21 error bless( {'msg' =>
> > 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
> > execute failed: ERROR: out of memory
> >
> > DETAIL: Failed on request of size 72 in memory context "MessageContext".
> > [for Statement "UPDATE admin SET finished = ?, log = ?, started = ?,
> > status = ? WHERE ( job = ? )" with ParamValues: 1=\'Tue Mar 7 19:06:21
> > 2023\', 2=\'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception:
> > DBD::Pg::st execute failed: ERROR: out of memory
> >
> > DETAIL: Failed on request of size 16 in memory context "MessageContext".
> > [for Statement "SELECT me.ip, me.port, me.creation, me.descr, me.up,
> > me.up_admin, me.type, me.duplex, me.duplex_admin, me.speed,
> > me.speed_admin, me.name, me.mac, me.mtu, me.stp, me.remote_ip,
> > me.remote_port, me.remote_type, me.remote_id, me.has_subinterfaces,
> > me.is_master, me.slave_of, me.manual_topo, me.is_uplink, me.vlan,
> > me.pvid, me.lastchange, me.custom_fields, neighbor_alias.ip,
> > neighbor_alias.alias, neighbor_alias.subnet, neighbor_alias.port,
> > neighbor_alias.dns, neighbor_alias.creation, device.ip, device.creation,
> > device.dns, device.description, device.uptime, device.contact,
> > device.name, device.location, device.layers, device.num_ports,
> > device.mac, device.serial, device.chassis_id, device.model,
> > device.ps1_type, device.ps2_type, device.ps1_status, device.ps2_status,
> > device.fan, device.slots, dev...\', 3=\'Tue Mar 7 19:06:17 2023\',
> > 4=\'error\', 5=\'215666754\'] at
> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
> >
> > '}, 'DBIx::Class::Exception' )
> >
> > [5427] 2023-03-07 08:36:28 error bless( {'msg' =>
> > 'DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI
> > connect(\'dbname=netdisco\',\'netdisco\',...) failed: server closed the
> > connection unexpectedly
> >
> > This probably means the server terminated abnormally
> > before or while processing the request. at
> > /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1639. at
> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 292
> >
> > '}, 'DBIx::Class::Exception' )
> >
> > DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
> > execute failed: ERROR: out of memory
> >
> > Use of uninitialized value $op in pattern match (m//) at
> > /home/netdisco/perl5/lib/perl5/SQL/Abstract/Classic.pm line 327.
> >
> > [14006] 2023-03-07 10:05:49 error bless( {'msg' =>
> > 'DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: Operator calls
> > in update must be in the form { -op => $arg } at
> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
> > '}, 'DBIx::Class::Exception' )
> >
> > Use of uninitialized value $op in pattern match (m//) at
> > /home/netdisco/perl5/lib/perl5/SQL/Abstract/Classic.pm line 327.
> >
> > [14718] 2023-03-07 10:10:16 error bless( {'msg' =>
> > 'DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: Operator calls
> > in update must be in the form { -op => $arg } at
> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
> > '}, 'DBIx::Class::Exception' )

--
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64
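[Editor's note: to make the work_mem suggestion in this thread concrete, here is a hedged sketch of how it could be applied. This is generic Postgres administration, not a Netdisco-specific fix; the role name netdisco is an assumption taken from the connect string in the log above.]

```sql
-- Sketch: raise work_mem only for the netdisco role, which limits the
-- blast radius compared to an instance-wide change.
ALTER ROLE netdisco SET work_mem = '64MB';

-- Or instance-wide, followed by a configuration reload:
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();

-- Rough check of table vs. index size for the job queue tables, to see
-- whether index bloat is plausible (standard catalog size functions):
SELECT relname,
       pg_size_pretty(pg_table_size(oid))   AS table_size,
       pg_size_pretty(pg_indexes_size(oid)) AS index_size
  FROM pg_class
 WHERE relname IN ('admin', 'device_skip');
```

New sessions pick up the per-role setting on their next connection; pgtune (linked above) remains the better option for tuning all parameters in relation to each other.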
--- End Message ---
--- Begin Message ---
Thanks.. I only have one backend. Here is the query I executed:

netdisco=> select device, count(*) as cnt from device_skip group by device
netdisco-> having count(*) > 1;
    device    | cnt
--------------+-----
 10.183.x.55  |   2
 10.183.x.22  |   2

These devices are actually access points, probably discovered by
CDP/LLDP, that netdisco can't get to. So what does that tell you? I'm
not sure why it shows a count of 2 for the access points. Maybe it's a
bug of some sort in how it deals with access points?

On 8/3/2023, 11:21, "Christian Ramseyer" <ramse...@netnea.com> wrote:
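[Editor's note: given the duplicate rows shown above, one way to clear them is the standard Postgres ctid deduplication idiom. This is a hedged sketch, not an official Netdisco fix; run it inside a transaction and inspect the result before committing.]

```sql
-- Sketch: delete duplicate device_skip rows, keeping one row per
-- (backend, device) pair. ctid is Postgres's physical row identifier,
-- so an arbitrary row of each duplicate set survives.
BEGIN;

DELETE FROM device_skip a
  USING device_skip b
 WHERE a.ctid < b.ctid
   AND a.backend = b.backend
   AND a.device  = b.device;

-- Verify: this should now return zero rows.
SELECT backend, device, count(*) AS cnt
  FROM device_skip
 GROUP BY backend, device
HAVING count(*) > 1;

COMMIT;
```

If duplicates reappear afterwards, that would support the suspicion above that something in discovery is inserting the same (backend, device) pair twice.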
--- End Message ---
_______________________________________________
Netdisco mailing list - Digest Mode
netdisco-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/netdisco-users