Re: syspatch breaks calling stat on mfs filesystem
On Fri, 2019-12-06 at 09:26 -0500, Art Manion wrote:
> Environment:
> 	System      : OpenBSD 6.6
> 	Details     : OpenBSD 6.6 (GENERIC) #3: Thu Nov 21 00:59:03 MST 2019
> 			r...@syspatch-66-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> 	Architecture: OpenBSD.i386
>
> Description:
>
> syspatch adds up the sizes of existing files to be replaced and collects
> the device names:
>
>      91	stat -qf "_dev=\"\${_dev} %Sd\";
>      92	    local %Sd=\"\${%Sd:+\${%Sd}\+}%Uz\"" ${_files}) \
>      93	    2>/dev/null || _rc=$?
>
> then checks that the devices are mounted and are not read-only:
>
>      97	for _d in $(printf '%s\n' ${_dev} | sort -u); do
>      98	        mount | grep -v read-only | grep -q "^/dev/${_d} " ||
>      99	            sp_err "Read-only filesystem, aborting"
>
> I have a system with /var and /tmp mounted as mfs.  The stat string
> format returns '??' for mfs devices:
>
> $ stat -f %Sd /bin/cat
> wd0a
>
> $ stat -f %Sd /var/db/libc.tags
> ??
>
> The 'grep -q ^/dev/??' on line 98 fails, causing syspatch to error out
> reporting a read-only filesystem, which is not correct.
>
> I noticed this with syspatch66-010_libcauth.tgz, which looks to be the
> first patch that changes files in /var (like /var/db/libc.tags).
>
> $ syspatch
> Get/Verify syspatch66-010_libcaut... 100% |*| 17685 KB    00:03
>
> Installing patch 010_libcauth
> Read-only filesystem, aborting
>
> $ mount
> /dev/wd0a on / type ffs (local)
> /dev/wd0g on /home type ffs (local, nodev, nosuid)
> /dev/wd0f on /usr type ffs (local, nodev, wxallowed)
> /dev/wd0d on /mfs type ffs (local, nodev, nosuid)
> mfs:37162 on /tmp type mfs (asynchronous, local, nodev, nosuid, size=524288 512-blocks)
> mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid, size=524288 512-blocks)
> mfs:40334 on /dev type mfs (asynchronous, local, noexec, nosuid, size=8192 512-blocks)
> /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
> _a_host_:/some/nfs/export on /mnt/_an_nfs_mount_ type nfs (noexec, v3, udp, timeo=100, retrans=101)
>
> I can work around it by modifying the check on line 98.  Is it OK to
> allow mfs filesystems?
>
> Is the major number for mfs uniquely 255?
>
> $ stat -f %Hd /var/db/libc.tags
> 255
>
> $ stat -f %Hd /mnt/_an_nfs_mount_
> 22

Thanks for the report.
Yes, this is expected.
Actually, having /var mounted over MFS is absolutely not supported,
because /var is where we store the rollback tarballs and where syspatch
checks whether a particular patch has been installed.
I will make that check stronger so that syspatch fails right away on an
MFS-mounted /var.

--
Antoine
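For illustration, here is a minimal C sketch of a filesystem-type check
along the lines Antoine describes.  This is not syspatch's actual
implementation (syspatch is a shell script); it only shows that the
f_fstypename field filled in by statfs(2) identifies an MFS mount
directly, without relying on stat(1) printing '??' for the device or on
MFS's major number being 255:

    /*
     * Sketch: report the filesystem type backing a path via statfs(2).
     * Assumes OpenBSD's struct statfs (f_fstypename, f_flags); an MFS
     * mount reports "mfs", FFS reports "ffs", NFS reports "nfs".
     */
    #include <sys/types.h>
    #include <sys/mount.h>

    #include <err.h>
    #include <stdio.h>

    int
    main(int argc, char *argv[])
    {
            struct statfs    sf;

            if (argc != 2)
                    errx(1, "usage: fstype path");

            if (statfs(argv[1], &sf) == -1)
                    err(1, "statfs %s", argv[1]);

            /* Print the type name and whether the mount is writable. */
            printf("%s: %s (%s)\n", argv[1], sf.f_fstypename,
                (sf.f_flags & MNT_RDONLY) ? "read-only" : "writable");

            return (0);
    }

On the reporter's system this would be expected to print "mfs" for
/var/db/libc.tags and "ffs" for /bin/cat, making the unsupported layout
detectable without parsing device names out of mount output.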
Re: syspatch breaks calling stat on mfs filesystem
On Sun, Dec 08, 2019 at 12:01:37PM +0100, Antoine Jacoutot wrote:
> On Fri, 2019-12-06 at 09:26 -0500, Art Manion wrote:
> > I have a system with /var and /tmp mounted as mfs.  The stat string
> > format returns '??' for mfs devices:
> [...]
> > mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid, size=524288 512-blocks)
> > /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
> [...]
>
> Thanks for the report.
> Yes, this is expected.
> Actually, having /var mounted over MFS is absolutely not supported,
> because /var is where we store the rollback tarballs and where syspatch
> checks whether a particular patch has been installed.
> I will make that check stronger so that syspatch fails right away on an
> MFS-mounted /var.

Ah, I misread: you have /var/syspatch on UFS.
OK, then the behavior you are seeing is to be expected; we don't want to
install updated files to an ephemeral filesystem.
That said, the error message could be more useful.

--
Antoine
Re: Memory leak in rip engine (ripd)
On Thu, Dec 05, 2019 at 12:07:25PM +1100, Jason Tubnor wrote:
> I have been trying to track down a memory leak that has been observed
> for quite a number of years over multiple releases, without success.  I
> feel that I have exhausted all potential configuration items that may
> have caused this issue, so I am now filing a bug.
>
> >Synopsis:	ripd engine develops memory leak over time
> >Category:	system
> >Environment:
> 	System      : OpenBSD 6.6
> 	Details     : OpenBSD 6.6-current (GENERIC.MP) #492: Wed Nov 27 00:21:04 MST 2019
> 			dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> 	Architecture: OpenBSD.amd64
> 	Machine     : amd64
> >Description:
> 	When ripd isn't configured to redistribute anything, memory usage
> remains static; once something is set for redistribution, memory begins
> to be consumed.  The config below (minus security - but without security
> it yields the same result) is loaded, and the engine process [below]
> shows the memory behavior over a 3 hour period.  I have been able to
> expedite memory consumption by increasing the interfaces and endpoints
> that participate.  I have witnessed the engine grow to over 10GB in size
> with no change in the routing table (393 routes).  Once memory is
> exhausted and pressure is put on swap, the host falls over.
>
> >How-To-Repeat:
> 	Versions 5.8 - 6.6 exhibit this problem.  Below is a simple
> configuration and the resulting engine process being monitored over
> 3 hours:
>
> # cat /etc/ripd.conf
> no redistribute /22
> redistribute connected
> split-horizon simple
>
> interface vxlan32 {
> }
> interface vio0 {
> 	passive
> }
> interface vio1 {
> 	passive
> }
> interface vio2 {
> 	passive
> }
> interface vio3 {
> 	passive
> }
> interface vio4 {
> 	passive
> }
> interface vio5 {
> 	passive
> }

I can reproduce this issue.  But only when I combine the use of
"interface XY { passive }" and "redistribute connected".  When I remove
the passive interfaces from ripd.conf, memory consumption is stable.

Why do you combine these configs?

> ---
> Memory growth over 3 hours ( 26.8 MB ) - 214.6 MB per day
> ---
>
> Thu Dec  5 08:07:09 AEDT 2019
> _ripd    23284  0.0  1.9 146104 154956 ??  Sp    Wed04PM    0:18.02 ripd: rip engine (ripd)
> Thu Dec  5 08:17:09 AEDT 2019
> _ripd    23284  0.0  1.9 147588 156440 ??  Sp    Wed04PM    0:18.21 ripd: rip engine (ripd)
> Thu Dec  5 08:27:09 AEDT 2019
> _ripd    23284  0.0  1.9 149056 157908 ??  Sp    Wed04PM    0:18.40 ripd: rip engine (ripd)
> Thu Dec  5 08:37:10 AEDT 2019
> _ripd    23284  0.0  1.9 150524 159376 ??  Sp    Wed04PM    0:18.58 ripd: rip engine (ripd)
> Thu Dec  5 08:47:10 AEDT 2019
> _ripd    23284  0.0  1.9 152064 160916 ??  Sp    Wed04PM    0:18.77 ripd: rip engine (ripd)
> Thu Dec  5 08:57:10 AEDT 2019
> _ripd    23284  0.1  1.9 153544 162396 ??  Sp    Wed04PM    0:18.96 ripd: rip engine (ripd)
> Thu Dec  5 09:07:10 AEDT 2019
> _ripd    23284  0.0  2.0 155116 163968 ??  Sp    Wed04PM    0:19.15 ripd: rip engine (ripd)
> Thu Dec  5 09:17:10 AEDT 2019
> _ripd    23284  0.0  2.0 156656 165508 ??  Sp    Wed04PM    0:19.34 ripd: rip engine (ripd)
> Thu Dec  5 09:27:10 AEDT 2019
> _ripd    23284  0.0  2.0 158152 167004 ??  Sp    Wed04PM    0:19.53 ripd: rip engine (ripd)
> Thu Dec  5 09:37:10 AEDT 2019
> _ripd    23284  0.0  2.0 159680 168532 ??  Sp    Wed04PM    0:19.72 ripd: rip engine (ripd)
> Thu Dec  5 09:47:10 AEDT 2019
> _ripd    23284  0.0  2.0 161164 170016 ??  Sp    Wed04PM    0:19.91 ripd: rip engine (ripd)
> Thu Dec  5 09:57:10 AEDT 2019
> _ripd    23284  0.0  2.0 162624 171476 ??  Sp    Wed04PM    0:20.09 ripd: rip engine (ripd)
> Thu Dec  5 10:07:10 AEDT 2019
> _ripd    23284  0.1  2.1 164100 172952 ??  Sp    Wed04PM    0:20.28 ripd: rip engine (ripd)
> Thu Dec  5 10:17:10 AEDT 2019
> _ripd    23284  0.0  2.1 165648 174500 ??  Sp    Wed04PM    0:20.46 ripd: rip engine (ripd)
> Thu Dec  5 10:27:10 AEDT 2019
> _ripd    23284  0.1  2.1 167128 175980 ??  Sp    Wed04PM    0:20.65 ripd: rip engine (ripd)
> Thu Dec  5 10:37:10 AEDT 2019
> _ripd    23284  0.0  2.1 168608 177460 ??  Sp    Wed04PM    0:20.84 ripd: rip engine (ripd)
> Thu Dec  5 10:47:10 AEDT 2019
> _ripd    23284  0.0  2.1 170060 178912 ??  Sp    Wed04PM    0:21.03 ripd: rip engine (ripd)
> Thu Dec  5 10:57:10 AEDT 2019
> _ripd    23284  0.0  2.2 171532 180384 ??  Sp    Wed04PM    0:21.21 ripd: rip engine (ripd)
> Thu Dec  5 11:07:10 AEDT 2019
> _ripd    23284  0.0  2.2 172932 181784 ??  Sp    Wed04PM    0:21.40 ripd: rip engine (ripd)
> Thu Dec  5 11:17:10 AEDT 2019
> _ripd    23284  0.0  2.2 174480 183332 ??  Sp    Wed04PM    0:21.59 ripd: rip engine (ripd)
>
> >Fix:
> 	Current workaround, once memory gets to a specific level, is to
> restart the ripd process (not ideal):
>
> - rcctl
Re: Memory leak in rip engine (ripd)
On Sun, Dec 08, 2019 at 11:34:15PM +0100, Remi Locherer wrote:
> On Thu, Dec 05, 2019 at 12:07:25PM +1100, Jason Tubnor wrote:
> > I have been trying to track down a memory leak that has been observed
> > for quite a number of years over multiple releases, without success.
> [...]
>
> I can reproduce this issue.  But only when I combine the use of
> "interface XY { passive }" and "redistribute connected".  When I remove
> the passive interfaces from ripd.conf, memory consumption is stable.
>
> Why do you combine these configs?

Below diff fixes the memory leak for me.

In addition to clearing r_list, I also moved the check for passive
interfaces up a bit so that the function can return as soon as possible.

OK?

Remi


Index: message.c
===================================================================
RCS file: /cvs/src/usr.sbin/ripd/message.c,v
retrieving revision 1.12
diff -u -p -r1.12 message.c
--- message.c	25 Oct 2014 03:23:49 -0000	1.12
+++ message.c	8 Dec 2019 23:35:42 -0000
@@ -105,15 +105,15 @@ send_triggered_update(struct iface *ifac
 	u_int16_t		 afi, route_tag;
 	u_int32_t		 address, netmask, nexthop, metric;
 
+	if (iface->passive)
+		return (0);
+
 	inet_aton(ALL_RIP_ROUTERS, &dst.sin_addr);
 	dst.sin_port = htons(RIP_PORT);
 	dst.sin_family = AF_INET;
 	dst.sin_len = sizeof(struct sockaddr_in);
 
-	if (iface->passive)
-		return (0);
-
 	if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
 	    sizeof(struct udphdr))) == NULL)
 		fatal("send_triggered_update");
 
@@ -166,13 +166,15 @@ send_request(struct packet_head *r_list,
 		port = htons(RIP_PORT);
 	}
 
+	if (iface->passive) {
+		clear_list(r_list);
+		return (0);
+	}
+
 	dst.sin_port = port;
 	dst.sin_family = AF_INET;
 	dst.sin_len = sizeof(struct sockaddr_in);
 
-	if (iface->passive)
-		return (0);
-
 	while (!TAILQ_EMPTY(r_list)) {
 		if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
 		    sizeof(struct udphdr))) == NULL)
@@ -240,13 +242,15 @@ send_response(struct packet_head *r_list
 		port = htons(RIP_PORT);
 	}
 
+	if (iface->passive) {
+		clear_list(r_list);
+		return (0);
+	}
+
 	dst.sin_port = port;
 	dst.sin_family = AF_INET;
 	dst.sin_len = sizeof(struct sockaddr_in);
 
-	if (iface->passive)
-		return (0);
-
 	while (!TAILQ_EMPTY(r_list)) {
 		if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
 		    sizeof(struct udphdr))) == NULL)
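The leak mechanism the diff addresses: the RIP engine queues outgoing
request/response entries onto r_list, and the old code returned early
for passive interfaces without ever draining that queue, so the entries
accumulated indefinitely.  Here is a minimal sketch of the drain pattern
the fix relies on.  clear_list() does exist in message.c, but the body
below is reconstructed from the TAILQ usage visible in the diff; the
struct layout is an assumption, not the verbatim source:

    /*
     * Sketch: drain a TAILQ of queued packet entries, freeing each
     * one.  Reconstructed for illustration; the real packet_entry
     * in ripd may carry different fields.
     */
    #include <sys/queue.h>
    #include <stdlib.h>

    struct packet_entry {
            TAILQ_ENTRY(packet_entry)        entry;
            void                            *rr;    /* queued route record (assumed) */
    };
    TAILQ_HEAD(packet_head, packet_entry);

    void
    clear_list(struct packet_head *r_list)
    {
            struct packet_entry     *pe;

            /* Unlink and free every queued entry. */
            while ((pe = TAILQ_FIRST(r_list)) != NULL) {
                    TAILQ_REMOVE(r_list, pe, entry);
                    free(pe);
            }
    }

With this in place, send_request() and send_response() no longer leave
queued entries behind when they bail out on a passive interface.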
Re: Memory leak in rip engine (ripd)
On Sat, 7 Dec 2019 at 02:02, Hiltjo Posthuma wrote:
>
> Hi,
>
> Does ripd -d -v print any warnings/debug messages for you?  If so, this
> could be useful information.
>

There was no additional information over and above the standard
information about interfaces coming up and the neighbor discovery.

Cheers,

Jason.
Re: Memory leak in rip engine (ripd)
On Mon, 9 Dec 2019 at 09:34, Remi Locherer wrote:
>
> I can reproduce this issue.  But only when I combine the use of
> "interface XY { passive }" and "redistribute connected".  When I remove
> the passive interfaces from ripd.conf, memory consumption is stable.
>
> Why do you combine these configs?
>

We explicitly define which interfaces should not participate in RIPv2
activity, as per ripd.conf(5).  I can confirm that when passive isn't
used, memory use is stable between endpoints.
Re: syspatch breaks calling stat on mfs filesystem
On Sun, Dec 8, 2019 at 6:08 AM Antoine Jacoutot wrote:
> > > I have a system with /var and /tmp mounted as mfs.  The stat string
> > > format returns '??' for mfs devices:
> > >
> > > $ stat -f %Sd /bin/cat
> > > wd0a
> > >
> > > $ stat -f %Sd /var/db/libc.tags
> > > ??
> > >
> > > The 'grep -q ^/dev/??' on line 98 fails, causing syspatch to error
> > > out reporting a read-only filesystem, which is not correct.
> > >
> > > mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid, size=524288 512-blocks)
> > > /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
> > >
> > > I can work around it by modifying the check on line 98.  Is it OK to
> > > allow mfs filesystems?
> >
> > Actually, having /var mounted over MFS is absolutely not supported,
> > because /var is where we store the rollback tarballs and where
> > syspatch checks whether a particular patch has been installed.
> > I will make that check stronger so that syspatch fails right away on
> > an MFS-mounted /var.
>
> Ah, I misread: you have /var/syspatch on UFS.
> OK, then the behavior you are seeing is to be expected; we don't want
> to install updated files to an ephemeral filesystem.
> That said, the error message could be more useful.

Understood.  I'm OK dealing with my non-supported choice of mounting
/var as mfs.

The checks in question are about disk space; if the (valid!) concern is
about losing rollback files, I'd suggest an explicit check that
/var/syspatch is sane (local, UFS, whatever else).

Every previous syspatch on this system worked; only
syspatch66-010_libcauth.tgz failed, since it happened to include new
files destined for /var.

Regards,

 - Art
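A sketch of the kind of explicit sanity check suggested above, expressed
in C at the statfs(2) level for illustration.  This is a hypothetical
helper, not part of syspatch (which is a ksh script); it assumes
OpenBSD's struct statfs and the MNT_RDONLY/MNT_LOCAL mount flags:

    /*
     * Hypothetical check that a directory (e.g. /var/syspatch) is
     * backed by a local, writable FFS filesystem.  Illustration only;
     * not from the syspatch sources.
     */
    #include <sys/types.h>
    #include <sys/mount.h>

    #include <string.h>

    int
    syspatch_dir_sane(const char *path)
    {
            struct statfs    sf;

            if (statfs(path, &sf) == -1)
                    return (0);
            if (strcmp(sf.f_fstypename, "ffs") != 0)
                    return (0);             /* rules out mfs, nfs, ... */
            if (sf.f_flags & MNT_RDONLY)
                    return (0);             /* must be writable */
            if ((sf.f_flags & MNT_LOCAL) == 0)
                    return (0);             /* must be local */
            return (1);
    }

Applied to each destination filesystem (or just to /var/syspatch), such
a check would single out the mfs-mounted /var case and allow a clearer
error than "Read-only filesystem, aborting".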
Re: Memory leak in rip engine (ripd)
On Mon, 9 Dec 2019 at 10:44, Remi Locherer wrote:
>
> > I can reproduce this issue.  But only when I combine the use of
> > "interface XY { passive }" and "redistribute connected".  When I
> > remove the passive interfaces from ripd.conf, memory consumption is
> > stable.
> >
> > Why do you combine these configs?
>
> Below diff fixes the memory leak for me.
>
> In addition to clearing r_list, I also moved the check for passive
> interfaces up a bit so that the function can return as soon as
> possible.
>
> OK?
>
> Remi
>

This patch applied cleanly and has fixed the memory leak issue when
interfaces are configured 'passive'.  Tested with 1 active and 6 passive
interfaces on one host; after a little memory consumption on startup
[expected], it settled back down to 984KB of consumed RAM for 391
subnets and didn't move over 1.5 hours.

As /cvs/src/usr.sbin/ripd/message.c hasn't been touched since 2014, this
patch would apply cleanly to 6.5 and 6.6 if an errata notice needed to
be created (I can test on those if validation is required - it will just
take a little time to spin them up).

Thanks for your help, Remi.

Cheers,

Jason.