Re: syspatch breaks calling stat on mfs filesystem

2019-12-08 Thread Antoine Jacoutot
On Fri, 2019-12-06 at 09:26 -0500, Art Manion wrote:
> Environment:
> System  : OpenBSD 6.6
> Details : OpenBSD 6.6 (GENERIC) #3: Thu Nov 21 00:59:03 MST 2019
> r...@syspatch-66-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> Architecture: OpenBSD.i386
> 
> Description:
> 
> syspatch adds up the sizes of existing files to be replaced and collects
> the device names:
> 
>  91 stat -qf "_dev=\"\${_dev} %Sd\";
>  92 local %Sd=\"\${%Sd:+\${%Sd}\+}%Uz\"" ${_files}) \
>  93 2>/dev/null || _rc=$?
> 
> then checks that devices are mounted and are not read-only:
> 
>  97 for _d in $(printf '%s\n' ${_dev} | sort -u); do
>  98 mount | grep -v read-only | grep -q "^/dev/${_d} " ||
>  99 sp_err "Read-only filesystem, aborting"
> 
> I have a system with /var and /tmp mounted as mfs.  The stat string
> format returns '??' for mfs devices:
> 
> $ stat -f %Sd /bin/cat
> wd0a
> 
> $ stat -f %Sd /var/db/libc.tags
> ??
> 
> The 'grep -q ^/dev/??' on line 98 fails causing syspatch to error out
> reporting a read-only filesystem, which is not correct.
> 
> I noticed this with syspatch66-010_libcauth.tgz which looks to be the
> first patch that changes files in /var (like /var/db/libc.tags).
> 
> $ syspatch
> Get/Verify syspatch66-010_libcaut... 100% |*| 17685 KB    00:03
> 
> Installing patch 010_libcauth
> Read-only filesystem, aborting
> 
> $ mount
> /dev/wd0a on / type ffs (local)
> /dev/wd0g on /home type ffs (local, nodev, nosuid)
> /dev/wd0f on /usr type ffs (local, nodev, wxallowed)
> /dev/wd0d on /mfs type ffs (local, nodev, nosuid)
> mfs:37162 on /tmp type mfs (asynchronous, local, nodev, nosuid, size=524288
> 512-blocks)
> mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid, size=524288
> 512-blocks)
> mfs:40334 on /dev type mfs (asynchronous, local, noexec, nosuid, size=8192
> 512-blocks)
> /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
> _a_host_:/some/nfs/export on /mnt/_an_nfs_mount_ type nfs (noexec, v3, udp,
> timeo=100, retrans=101)
> 
> Can work around it by modifying the check on line 98.  Is it OK to allow
> mfs filesystems?
> 
> Is the major number for mfs uniquely 255?
> 
> $ stat -f %Hd /var/db/libc.tags
> 255
> 
> $ stat -f %Hd /mnt/_an_nfs_mount_
> 22

Thanks for the report.
It is expected, yes.
Actually, having /var mounted over MFS is absolutely not supported (because this
is where we store rollback tarballs and where syspatch checks to see whether a
particular patch has been installed).
I will make that check stronger so that syspatch fails right away on
MFS-mounted /var.
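
As a rough illustration (not the actual syspatch code), such an early check
could look something like the following, reusing the sp_err helper seen in the
quoted script excerpt; the mount(8) parsing is an assumption:

    # Abort immediately if /var itself is an MFS mount.
    if mount | grep -q " on /var type mfs "; then
            sp_err "/var is mounted on MFS, aborting"
    fi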



-- 
Antoine



Re: syspatch breaks calling stat on mfs filesystem

2019-12-08 Thread Antoine Jacoutot
On Sun, Dec 08, 2019 at 12:01:37PM +0100, Antoine Jacoutot wrote:
> On Fri, 2019-12-06 at 09:26 -0500, Art Manion wrote:
> > Environment:
> > System  : OpenBSD 6.6
> > Details : OpenBSD 6.6 (GENERIC) #3: Thu Nov 21 00:59:03 MST 2019
> > r...@syspatch-66-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> > Architecture: OpenBSD.i386
> > 
> > Description:
> > 
> > syspatch adds up the sizes of existing files to be replaced and collects
> > the device names:
> > 
> >  91 stat -qf "_dev=\"\${_dev} %Sd\";
> >  92 local %Sd=\"\${%Sd:+\${%Sd}\+}%Uz\"" ${_files}) \
> >  93 2>/dev/null || _rc=$?
> > 
> > then checks that devices are mounted and are not read-only:
> > 
> >  97 for _d in $(printf '%s\n' ${_dev} | sort -u); do
> >  98 mount | grep -v read-only | grep -q "^/dev/${_d} " ||
> >  99 sp_err "Read-only filesystem, aborting"
> > 
> > I have a system with /var and /tmp mounted as mfs.  The stat string
> > format returns '??' for mfs devices:
> > 
> > $ stat -f %Sd /bin/cat
> > wd0a
> > 
> > $ stat -f %Sd /var/db/libc.tags
> > ??
> > 
> > The 'grep -q ^/dev/??' on line 98 fails causing syspatch to error out
> > reporting a read-only filesystem, which is not correct.
> > 
> > I noticed this with syspatch66-010_libcauth.tgz which looks to be the
> > first patch that changes files in /var (like /var/db/libc.tags).
> > 
> > $ syspatch
> > Get/Verify syspatch66-010_libcaut... 100% |*| 17685 KB    00:03
> > 
> > Installing patch 010_libcauth
> > Read-only filesystem, aborting
> > 
> > $ mount
> > /dev/wd0a on / type ffs (local)
> > /dev/wd0g on /home type ffs (local, nodev, nosuid)
> > /dev/wd0f on /usr type ffs (local, nodev, wxallowed)
> > /dev/wd0d on /mfs type ffs (local, nodev, nosuid)
> > mfs:37162 on /tmp type mfs (asynchronous, local, nodev, nosuid, size=524288
> > 512-blocks)
> > mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid, size=524288
> > 512-blocks)
> > mfs:40334 on /dev type mfs (asynchronous, local, noexec, nosuid, size=8192
> > 512-blocks)
> > /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
> > _a_host_:/some/nfs/export on /mnt/_an_nfs_mount_ type nfs (noexec, v3, udp,
> > timeo=100, retrans=101)
> > 
> > Can work around it by modifying the check on line 98.  Is it OK to allow
> > mfs filesystems?
> > 
> > Is the major number for mfs uniquely 255?
> > 
> > $ stat -f %Hd /var/db/libc.tags
> > 255
> > 
> > $ stat -f %Hd /mnt/_an_nfs_mount_
> > 22
> 
> Thanks for the report.
> It is expected, yes.
> Actually, having /var mounted over MFS is absolutely not supported (because
> this is where we store rollback tarballs and where syspatch checks to see
> whether a particular patch has been installed).
> I will make that check stronger so that syspatch fails right away on
> MFS-mounted /var.

Ah, I misread: you do have /var/syspatch on UFS.
OK then, the behavior you are seeing is to be expected; we don't want to
install updated files to an ephemeral fs.
That said, the error message could be more useful.
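
For illustration, a more descriptive failure could reuse the check quoted in
the report and simply name the offending device; the wording below is only an
assumption, not a committed change:

    for _d in $(printf '%s\n' ${_dev} | sort -u); do
            mount | grep -v read-only | grep -q "^/dev/${_d} " ||
                    sp_err "No writable FFS filesystem found for ${_d}, aborting"
    done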

-- 
Antoine



Re: Memory leak in rip engine (ripd)

2019-12-08 Thread Remi Locherer
On Thu, Dec 05, 2019 at 12:07:25PM +1100, Jason Tubnor wrote:
> I have been trying, without success, to track down a memory leak that has
> been observed for quite a number of years over multiple releases.  I feel
> that I have exhausted all potential configuration items that may have
> caused this issue, so I am now filing a bug.
> 
> >Synopsis:  ripd engine develops memory leak over time
> >Category:  system
> >Environment:
> System  : OpenBSD 6.6
> Details : OpenBSD 6.6-current (GENERIC.MP) #492: Wed Nov 27
> 00:21:04 MST 2019
>  dera...@amd64.openbsd.org:
> /usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
>  When ripd isn't configured to redistribute anything, memory usage
> remains static; once something is set for redistribution, memory begins to
> be consumed.  The config below (minus the security settings, but running
> without them yields the same result) is loaded, and the engine process
> listing below shows the memory behavior over a 3-hour period.  I have been
> able to accelerate memory consumption by increasing the number of interfaces
> and endpoints that participate.  I have witnessed the engine grow to over
> 10GB in size with no change in the routing table (393 routes).  Once memory
> is exhausted and pressure is put on swap, the host falls over.
> 
> >How-To-Repeat:
> Various versions 5.8 - 6.6 exhibit this problem.  Below is a simple
> configuration and the resulting engine process being monitored over 3 hours:
> 
> # cat /etc/ripd.conf
> no redistribute /22
> redistribute connected
> split-horizon simple
> 
> interface vxlan32 {
> }
> interface vio0 {
> passive
> }
> interface vio1 {
> passive
> }
> interface vio2 {
> passive
> }
> interface vio3 {
> passive
> }
> interface vio4 {
> passive
> }
> interface vio5 {
> passive
> }
> 

I can reproduce this issue, but only when I combine the use of
"interface XY { passive }" and "redistribute connected".  When I remove
the passive interfaces from ripd.conf, memory consumption is stable.

Why do you combine these configs?

> -----------------------------------------------------------------
> Memory growth over 3 hours (26.8 MB) - 214.6 MB per day
> -----------------------------------------------------------------
> 
> Thu Dec  5 08:07:09 AEDT 2019
> _ripd    23284  0.0  1.9 146104 154956 ??  Sp  Wed04PM  0:18.02 ripd: rip engine (ripd)
> Thu Dec  5 08:17:09 AEDT 2019
> _ripd    23284  0.0  1.9 147588 156440 ??  Sp  Wed04PM  0:18.21 ripd: rip engine (ripd)
> Thu Dec  5 08:27:09 AEDT 2019
> _ripd    23284  0.0  1.9 149056 157908 ??  Sp  Wed04PM  0:18.40 ripd: rip engine (ripd)
> Thu Dec  5 08:37:10 AEDT 2019
> _ripd    23284  0.0  1.9 150524 159376 ??  Sp  Wed04PM  0:18.58 ripd: rip engine (ripd)
> Thu Dec  5 08:47:10 AEDT 2019
> _ripd    23284  0.0  1.9 152064 160916 ??  Sp  Wed04PM  0:18.77 ripd: rip engine (ripd)
> Thu Dec  5 08:57:10 AEDT 2019
> _ripd    23284  0.1  1.9 153544 162396 ??  Sp  Wed04PM  0:18.96 ripd: rip engine (ripd)
> Thu Dec  5 09:07:10 AEDT 2019
> _ripd    23284  0.0  2.0 155116 163968 ??  Sp  Wed04PM  0:19.15 ripd: rip engine (ripd)
> Thu Dec  5 09:17:10 AEDT 2019
> _ripd    23284  0.0  2.0 156656 165508 ??  Sp  Wed04PM  0:19.34 ripd: rip engine (ripd)
> Thu Dec  5 09:27:10 AEDT 2019
> _ripd    23284  0.0  2.0 158152 167004 ??  Sp  Wed04PM  0:19.53 ripd: rip engine (ripd)
> Thu Dec  5 09:37:10 AEDT 2019
> _ripd    23284  0.0  2.0 159680 168532 ??  Sp  Wed04PM  0:19.72 ripd: rip engine (ripd)
> Thu Dec  5 09:47:10 AEDT 2019
> _ripd    23284  0.0  2.0 161164 170016 ??  Sp  Wed04PM  0:19.91 ripd: rip engine (ripd)
> Thu Dec  5 09:57:10 AEDT 2019
> _ripd    23284  0.0  2.0 162624 171476 ??  Sp  Wed04PM  0:20.09 ripd: rip engine (ripd)
> Thu Dec  5 10:07:10 AEDT 2019
> _ripd    23284  0.1  2.1 164100 172952 ??  Sp  Wed04PM  0:20.28 ripd: rip engine (ripd)
> Thu Dec  5 10:17:10 AEDT 2019
> _ripd    23284  0.0  2.1 165648 174500 ??  Sp  Wed04PM  0:20.46 ripd: rip engine (ripd)
> Thu Dec  5 10:27:10 AEDT 2019
> _ripd    23284  0.1  2.1 167128 175980 ??  Sp  Wed04PM  0:20.65 ripd: rip engine (ripd)
> Thu Dec  5 10:37:10 AEDT 2019
> _ripd    23284  0.0  2.1 168608 177460 ??  Sp  Wed04PM  0:20.84 ripd: rip engine (ripd)
> Thu Dec  5 10:47:10 AEDT 2019
> _ripd    23284  0.0  2.1 170060 178912 ??  Sp  Wed04PM  0:21.03 ripd: rip engine (ripd)
> Thu Dec  5 10:57:10 AEDT 2019
> _ripd    23284  0.0  2.2 171532 180384 ??  Sp  Wed04PM  0:21.21 ripd: rip engine (ripd)
> Thu Dec  5 11:07:10 AEDT 2019
> _ripd    23284  0.0  2.2 172932 181784 ??  Sp  Wed04PM  0:21.40 ripd: rip engine (ripd)
> Thu Dec  5 11:17:10 AEDT 2019
> _ripd    23284  0.0  2.2 174480 183332 ??  Sp  Wed04PM  0:21.59 ripd: rip engine (ripd)
> 
> >Fix:
> The current workaround, once memory gets to a specific level, is to
> restart the ripd process (not ideal):
> 
>  - rcctl 

Re: Memory leak in rip engine (ripd)

2019-12-08 Thread Remi Locherer
On Sun, Dec 08, 2019 at 11:34:15PM +0100, Remi Locherer wrote:
> On Thu, Dec 05, 2019 at 12:07:25PM +1100, Jason Tubnor wrote:
> > I have been trying, without success, to track down a memory leak that has
> > been observed for quite a number of years over multiple releases.  I feel
> > that I have exhausted all potential configuration items that may have
> > caused this issue, so I am now filing a bug.
> > 
> > >Synopsis:  ripd engine develops memory leak over time
> > >Category:  system
> > >Environment:
> > System  : OpenBSD 6.6
> > Details : OpenBSD 6.6-current (GENERIC.MP) #492: Wed Nov 27
> > 00:21:04 MST 2019
> >  dera...@amd64.openbsd.org:
> > /usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> >  When ripd isn't configured to redistribute anything, memory usage
> > remains static; once something is set for redistribution, memory begins to
> > be consumed.  The config below (minus the security settings, but running
> > without them yields the same result) is loaded, and the engine process
> > listing below shows the memory behavior over a 3-hour period.  I have been
> > able to accelerate memory consumption by increasing the number of interfaces
> > and endpoints that participate.  I have witnessed the engine grow to over
> > 10GB in size with no change in the routing table (393 routes).  Once memory
> > is exhausted and pressure is put on swap, the host falls over.
> > 
> > >How-To-Repeat:
> > Various versions 5.8 - 6.6 exhibit this problem.  Below is a simple
> > configuration and the resulting engine process being monitored over 3 hours:
> > 
> > # cat /etc/ripd.conf
> > no redistribute /22
> > redistribute connected
> > split-horizon simple
> > 
> > interface vxlan32 {
> > }
> > interface vio0 {
> > passive
> > }
> > interface vio1 {
> > passive
> > }
> > interface vio2 {
> > passive
> > }
> > interface vio3 {
> > passive
> > }
> > interface vio4 {
> > passive
> > }
> > interface vio5 {
> > passive
> > }
> > 
> 
> I can reproduce this issue, but only when I combine the use of
> "interface XY { passive }" and "redistribute connected".  When I remove
> the passive interfaces from ripd.conf, memory consumption is stable.
> 
> Why do you combine these configs?

The diff below fixes the memory leak for me.

In addition to clearing r_list, I also moved the check for a passive
interface up a bit so that the function can return as early as possible.

OK?

Remi



Index: message.c
===================================================================
RCS file: /cvs/src/usr.sbin/ripd/message.c,v
retrieving revision 1.12
diff -u -p -r1.12 message.c
--- message.c   25 Oct 2014 03:23:49 -  1.12
+++ message.c   8 Dec 2019 23:35:42 -
@@ -105,15 +105,15 @@ send_triggered_update(struct iface *ifac
u_int16_t        afi, route_tag;
u_int32_t        address, netmask, nexthop, metric;
 
+   if (iface->passive)
+   return (0);
+
inet_aton(ALL_RIP_ROUTERS, &dst.sin_addr);
 
dst.sin_port = htons(RIP_PORT);
dst.sin_family = AF_INET;
dst.sin_len = sizeof(struct sockaddr_in);
 
-   if (iface->passive)
-   return (0);
-
if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
sizeof(struct udphdr))) == NULL)
fatal("send_triggered_update");
@@ -166,13 +166,15 @@ send_request(struct packet_head *r_list,
port = htons(RIP_PORT);
}
 
+   if (iface->passive) {
+   clear_list(r_list);
+   return (0);
+   }
+
dst.sin_port = port;
dst.sin_family = AF_INET;
dst.sin_len = sizeof(struct sockaddr_in);
 
-   if (iface->passive)
-   return (0);
-
while (!TAILQ_EMPTY(r_list)) {
if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
sizeof(struct udphdr))) == NULL)
@@ -240,13 +242,15 @@ send_response(struct packet_head *r_list
port = htons(RIP_PORT);
}
 
+   if (iface->passive) {
+   clear_list(r_list);
+   return (0);
+   }
+
dst.sin_port = port;
dst.sin_family = AF_INET;
dst.sin_len = sizeof(struct sockaddr_in);
 
-   if (iface->passive)
-   return (0);
-
while (!TAILQ_EMPTY(r_list)) {
if ((buf = ibuf_open(iface->mtu - sizeof(struct ip) -
sizeof(struct udphdr))) == NULL)
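
For reference, a rough sketch of how the diff above could be applied and tested
from a source checkout; the file name and exact commands are assumptions, not
instructions from the author:

    cd /usr/src/usr.sbin/ripd
    patch < ~/ripd-message.diff     # the diff quoted above, saved locally
    make obj && make depend && make
    doas make install && doas rcctl restart ripd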



Re: Memory leak in rip engine (ripd)

2019-12-08 Thread Jason Tubnor
On Sat, 7 Dec 2019 at 02:02, Hiltjo Posthuma  wrote:

>
>
> Hi,
>
> Does ripd -d -v print any warnings/debug messages for you? If so, this
> could be
> useful information.
>
>
There was no additional information over and above the standard information
about interfaces coming up and neighbor discovery.

Cheers,

Jason.


Re: Memory leak in rip engine (ripd)

2019-12-08 Thread Jason Tubnor
On Mon, 9 Dec 2019 at 09:34, Remi Locherer  wrote:

>
>
> I can reproduce this issue. But only when I combine the use of
> "interface XY { passive }" and "redistribute connected". When I remove
> the passive interfaces from ripd.conf memory consumption is stable.
>
> Why do you combine these configs?
>

We explicitly define which interfaces should not participate in RIPv2
activity, as per ripd.conf(5).

I can confirm that when passive isn't used, memory use is stable between
endpoints.


Re: syspatch breaks calling stat on mfs filesystem

2019-12-08 Thread Art Manion
On Sun, Dec 8, 2019 at 6:08 AM Antoine Jacoutot 
wrote:


> > > I have a system with /var and /tmp mounted as mfs.  The stat string
> > > format returns '??' for mfs devices:
> > >
> > > $ stat -f %Sd /bin/cat
> > > wd0a
> > >
> > > $ stat -f %Sd /var/db/libc.tags
> > > ??
> > >
> > > The 'grep -q ^/dev/??' on line 98 fails causing syspatch to error out
> > > reporting a read-only filesystem, which is not correct.
>


> > > mfs:53614 on /var type mfs (asynchronous, local, nodev, nosuid,
> size=524288
> > > 512-blocks)
> > > /dev/wd0e on /var/syspatch type ffs (local, nodev, nosuid)
>


> > > Can work around it by modifying the check on line 98.  Is it OK to
> allow
> > > mfs filesystems?
>


> > Actually having /var mounted over MFS is absolutely not supported
> (because this
> > is where we store rollback tarballs and where syspatch checks to see
> whether a
> > particular patch has been installed).
> > I will make that check stronger so that syspatch fails right away on
> MFS-mounted
> > /var.
>
> Ah, I misread: you do have /var/syspatch on UFS.
> OK then, the behavior you are seeing is to be expected; we don't want to
> install updated files to an ephemeral fs.
> That said, the error message could be more useful.


Understood. I'm OK dealing with my non-supported choice of mounting /var as
mfs.

The checks in question are about disk space; if the (valid!) concern is about
losing rollback files, I'd suggest an explicit check that /var/syspatch is
sane (local, UFS, whatever else).  Every previous syspatch on this system
worked; only syspatch66-010_libcauth.tgz failed, since it happened to include
new files destined for /var.
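
A minimal sketch of that kind of explicit check (an assumption, not existing
syspatch code): resolve the mount point backing /var/syspatch and require a
local FFS filesystem, again using the sp_err helper from the quoted script:

    _mp=$(df -P /var/syspatch | awk 'NR == 2 {print $NF}')
    mount | grep "^/dev/.* on ${_mp} type ffs (local" >/dev/null ||
            sp_err "/var/syspatch is not on a local FFS filesystem, aborting"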

Regards,

 - Art


Re: Memory leak in rip engine (ripd)

2019-12-08 Thread Jason Tubnor
On Mon, 9 Dec 2019 at 10:44, Remi Locherer  wrote:

>
> >
> > I can reproduce this issue. But only when I combine the use of
> > "interface XY { passive }" and "redistribute connected". When I remove
> > the passive interfaces from ripd.conf memory consumption is stable.
> >
> > Why do you combine these configs?
>
> Below diff fixes the memory leak for me.
>
> In addition to clear r_list I also moved the check for passive interface
> a bit up so that the function can return as soon as possible.
>
> OK?
>
> Remi
>
>
>
>
This patch applied cleanly and has fixed the memory leak issue when
interfaces are configured 'passive'.  Tested with 1 active and 6 passive
interfaces on one host: after a little memory consumption on startup
(expected), it settled back down to 984KB of consumed RAM for 391 subnets
and didn't move over 1.5 hours.
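
For completeness, the kind of periodic check that produced the log in the
original report can be approximated with a small loop like this (the interval
and ps flags are assumptions matching that log's format):

    while :; do
            date
            ps -axuww | grep '[r]ip engine'
            sleep 600
    done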

As /cvs/src/usr.sbin/ripd/message.c hasn't been touched since 2014, this
patch would apply cleanly to 6.5 and 6.6 if an errata notice needed to be
created (I can test on these if validation is required; it will just take a
little time to spin them up).

Thanks for your help Remi.

Cheers,

Jason.