Re: post ino64: lockd no runs?

2017-06-12 Thread Rodney W. Grimes
> On Mon, Jun 12, 2017 at 10:14 AM, John Baldwin  wrote:
> > On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
> >> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> >> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> >> > of my systems after a full rebuild of src and ports. No log entries
> >> > offer any insight as to why :-(
> >> >
> >> > imb
> >>
> >> I don't tend to use NFS on my systems that are running head, so I
> >> haven't had occasion to test this as stated.
> >>
> >> However, I just completed my weekly update of the "production" systems
> >> here at home, running stable/11.  And I find that lockd seems to be ...
> >> claiming that all is well, but declining to run (for long).
> >>
> >> To the best of my knowledge, that was not the case until this last
> >> update, which was from:
> >>
> >> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316 
> >>  r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017 
> >> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> >>
> >> to
> >>
> >> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322  
> >> r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 
> >> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> >>
> >> The "glaringly obvious" symptom in my case is that I am now unable
> >> to (directly) save an email message from within mutt(1) by appending
> >> it to an NFS-resident file.  (Saving it to a local file, then using
> >> cat(1) to append that to the NFS-resident file & removing the local
> >> copy works)
> >>
> >> After a few variations on a theme of:
> >>
> >> albert(11.1)[5] sudo service lockd restart
> >> lockd not running?
> >> Starting lockd.
> >> albert(11.1)[6] echo $?
> >> 0
> >> albert(11.1)[7] service lockd status
> >> lockd is not running.
> >>
> >> I finally(!) thought to ask ktrace what's going on (as tailing
> >> /var/log/messages was completely unproductive, even after enabling
> >> rc_debug).
> >>
> >> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
> >> the output of kdump(1), I see that the trace ends with:
> >>
> >>   ...
> >>   2811 rpc.lockd NAMI  "/var/run/logpriv"
> >>   2786 sh   CALL  read(0xa,0x627fc0,0x400)
> >>   2786 sh   GIO   fd 10 read 0 bytes
> >>""
> >>   2811 rpc.lockd RET   connect 0
> >>   2786 sh   RET   read 0
> >>   2811 rpc.lockd CALL  sendto(0x3,0x7fffe2c0,0x27,0,0,0)
> >>   2786 sh   CALL  exit(0)
> >>   2811 rpc.lockd GIO   fd 3 wrote 39 bytes
> >>"<30>Jun 11 15:43:10 rpc.lockd: Starting"
> >>   2811 rpc.lockd RET   sendto 39/0x27
> >>   2811 rpc.lockd CALL  sigaction(SIGALRM,0x7fffec20,0)
> >>   2811 rpc.lockd RET   sigaction 0
> >>   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
> >>   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
> >
> > This is a really good clue.  nlm_syscall is dying with EFAULT.  The last
> > argument is a pointer to an array of char * pointers, and the only way
> > I can see it dying is if it fails to copyin() one of the strings pointed
> > to by those pointers.  You could try running rpc.lockd under gdb from
> > ports and setting a breakpoint on 'nlm_syscall' and then printing out
> > 'addr_count' and 'p addrs@(addr_count * 2)'.
> 
> Yes, I found that the kernel was trying to copyin() from NULL, and
> then found that this corresponds to 'uaddr'.  After some tracing I found
> that the tightened condition in taddr2uaddr now (correctly) enforces the
> buffer length passed by the caller.  That length has been set
> incorrectly for ~9 years (since r177633, which set the size to
> sizeof(pointer)), but this was never noticed because nothing checked it.
> The solution seems to be to set the length values to the allocated
> size, and that has fixed the issue for me.
> 
> The code could use some cleanups and I plan to do it at some later time.
> 
> > Unfortunately I'm not able to reproduce the failure on a test machine
> > I have running head post-ino64.
> 
> This should have been fixed by r319852 in -HEAD (
> https://svnweb.freebsd.org/base?view=revision&revision=319852 ), and
> I'll MFC the change after a 3-day settle period, assuming there are no
> objections, as this is a regression.

(RE hat on)
The next 11.1 release builds start on the 16th.  Please try to send
your RFA to RE and complete the merge before that date; I would really
hate to have 11.1 go out without this fixed.


-- 
Rod Grimes rgri...@freebsd.org
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: post ino64: lockd no runs?

2017-06-12 Thread Xin LI
On Mon, Jun 12, 2017 at 10:14 AM, John Baldwin  wrote:
> On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
>> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
>> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
>> > of my systems after a full rebuild of src and ports. No log entries
>> > offer any insight as to why :-(
>> >
>> > imb
>>
>> I don't tend to use NFS on my systems that are running head, so I
>> haven't had occasion to test this as stated.
>>
>> However, I just completed my weekly update of the "production" systems
>> here at home, running stable/11.  And I find that lockd seems to be ...
>> claiming that all is well, but declining to run (for long).
>>
>> To the best of my knowledge, that was not the case until this last
>> update, which was from:
>>
>> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316  
>> r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017 
>> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
>>
>> to
>>
>> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322  
>> r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 
>> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
>>
>> The "glaringly obvious" symptom in my case is that I am now unable
>> to (directly) save an email message from within mutt(1) by appending
>> it to an NFS-resident file.  (Saving it to a local file, then using
>> cat(1) to append that to the NFS-resident file & removing the local
>> copy works)
>>
>> After a few variations on a theme of:
>>
>> albert(11.1)[5] sudo service lockd restart
>> lockd not running?
>> Starting lockd.
>> albert(11.1)[6] echo $?
>> 0
>> albert(11.1)[7] service lockd status
>> lockd is not running.
>>
>> I finally(!) thought to ask ktrace what's going on (as tailing
>> /var/log/messages was completely unproductive, even after enabling
>> rc_debug).
>>
>> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
>> the output of kdump(1), I see that the trace ends with:
>>
>>   ...
>>   2811 rpc.lockd NAMI  "/var/run/logpriv"
>>   2786 sh   CALL  read(0xa,0x627fc0,0x400)
>>   2786 sh   GIO   fd 10 read 0 bytes
>>""
>>   2811 rpc.lockd RET   connect 0
>>   2786 sh   RET   read 0
>>   2811 rpc.lockd CALL  sendto(0x3,0x7fffe2c0,0x27,0,0,0)
>>   2786 sh   CALL  exit(0)
>>   2811 rpc.lockd GIO   fd 3 wrote 39 bytes
>>"<30>Jun 11 15:43:10 rpc.lockd: Starting"
>>   2811 rpc.lockd RET   sendto 39/0x27
>>   2811 rpc.lockd CALL  sigaction(SIGALRM,0x7fffec20,0)
>>   2811 rpc.lockd RET   sigaction 0
>>   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
>>   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
>
> This is a really good clue.  nlm_syscall is dying with EFAULT.  The last
> argument is a pointer to an array of char * pointers, and the only way
> I can see it dying is if it fails to copyin() one of the strings pointed
> to by those pointers.  You could try running rpc.lockd under gdb from
> ports and setting a breakpoint on 'nlm_syscall' and then printing out
> 'addr_count' and 'p addrs@(addr_count * 2)'.

Yes, I found that the kernel was trying to copyin() from NULL, and
then found that this corresponds to 'uaddr'.  After some tracing I found
that the tightened condition in taddr2uaddr now (correctly) enforces the
buffer length passed by the caller.  That length has been set
incorrectly for ~9 years (since r177633, which set the size to
sizeof(pointer)), but this was never noticed because nothing checked it.
The solution seems to be to set the length values to the allocated
size, and that has fixed the issue for me.
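
For readers who want to see the size mismatch in isolation, here is a
minimal sketch.  It is not the rpc.lockd source: the helper names
buggy_addrlen() and fixed_addrlen() are hypothetical and exist only to
contrast "size of the pointer" with "size of the allocated structure".

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Hypothetical helper: the length the old code effectively stored.
 * sizeof() applied to a pointer yields the pointer's own size
 * (8 on amd64), not the size of the structure it points to.
 */
static size_t
buggy_addrlen(const struct sockaddr *ai_addr)
{
	return (sizeof(ai_addr));
}

/*
 * Hypothetical helper: the length the fix stores, i.e. the size of
 * the structure that was actually allocated for the address family.
 */
static size_t
fixed_addrlen(int family)
{
	switch (family) {
	case AF_INET:
		return (sizeof(struct sockaddr_in));
	case AF_INET6:
		return (sizeof(struct sockaddr_in6));
	default:
		return (0);	/* unknown family: no length to report */
	}
}
```

Any consumer that compares the stored length against
sizeof(struct sockaddr_in) will reject the value from buggy_addrlen(),
because a pointer is smaller than either sockaddr structure.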

The code could use some cleanups and I plan to do it at some later time.

> Unfortunately I'm not able to reproduce the failure on a test machine
> I have running head post-ino64.

This should have been fixed by r319852 in -HEAD (
https://svnweb.freebsd.org/base?view=revision&revision=319852 ), and
I'll MFC the change after a 3-day settle period, assuming there are no
objections, as this is a regression.

Cheers,


Re: post ino64: lockd no runs?

2017-06-12 Thread John Baldwin
On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> > of my systems after a full rebuild of src and ports. No log entries
> > offer any insight as to why :-(
> > 
> > imb
> 
> I don't tend to use NFS on my systems that are running head, so I
> haven't had occasion to test this as stated.
> 
> However, I just completed my weekly update of the "production" systems
> here at home, running stable/11.  And I find that lockd seems to be ...
> claiming that all is well, but declining to run (for long).
> 
> To the best of my knowledge, that was not the case until this last
> update, which was from:
> 
> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316  
> r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> 
> to
> 
> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322  
> r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 
> r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> 
> The "glaringly obvious" symptom in my case is that I am now unable
> to (directly) save an email message from within mutt(1) by appending
> it to an NFS-resident file.  (Saving it to a local file, then using
> cat(1) to append that to the NFS-resident file & removing the local
> copy works)
> 
> After a few variations on a theme of:
> 
> albert(11.1)[5] sudo service lockd restart
> lockd not running?
> Starting lockd.
> albert(11.1)[6] echo $?
> 0
> albert(11.1)[7] service lockd status
> lockd is not running.
> 
> I finally(!) thought to ask ktrace what's going on (as tailing
> /var/log/messages was completely unproductive, even after enabling
> rc_debug).
> 
> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
> the output of kdump(1), I see that the trace ends with:
> 
>   ...
>   2811 rpc.lockd NAMI  "/var/run/logpriv"
>   2786 sh   CALL  read(0xa,0x627fc0,0x400)
>   2786 sh   GIO   fd 10 read 0 bytes
>""
>   2811 rpc.lockd RET   connect 0
>   2786 sh   RET   read 0
>   2811 rpc.lockd CALL  sendto(0x3,0x7fffe2c0,0x27,0,0,0)
>   2786 sh   CALL  exit(0)
>   2811 rpc.lockd GIO   fd 3 wrote 39 bytes
>"<30>Jun 11 15:43:10 rpc.lockd: Starting"
>   2811 rpc.lockd RET   sendto 39/0x27
>   2811 rpc.lockd CALL  sigaction(SIGALRM,0x7fffec20,0)
>   2811 rpc.lockd RET   sigaction 0
>   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
>   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address

This is a really good clue.  nlm_syscall is dying with EFAULT.  The last
argument is a pointer to an array of char * pointers, and the only way
I can see it dying is if it fails to copyin() one of the strings pointed
to by those pointers.  You could try running rpc.lockd under gdb from
ports and setting a breakpoint on 'nlm_syscall' and then printing out
'addr_count' and 'p addrs@(addr_count * 2)'.
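
As a rough companion to the gdb suggestion, the same inspection can be
sketched in C.  This is only an illustration: first_null_addr() is a
hypothetical helper, and the layout (addr_count * 2 string pointers) is
taken from the gdb expression above rather than from the lockd source.

```c
#include <stddef.h>

/*
 * Hypothetical helper: scan an array of addr_count * 2 string
 * pointers and return the index of the first NULL entry, or -1 if
 * every entry is set.  A NULL entry is exactly the kind of pointer
 * that would make a kernel-side copyin() of the string fail.
 */
static int
first_null_addr(char *const *addrs, int addr_count)
{
	for (int i = 0; i < addr_count * 2; i++) {
		if (addrs[i] == NULL)
			return (i);
	}
	return (-1);
}
```

Calling this just before the syscall and logging a non-negative result
would point at the bad entry without needing a debugger.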

Unfortunately I'm not able to reproduce the failure on a test machine
I have running head post-ino64.

-- 
John Baldwin


Re: post ino64: lockd no runs?

2017-06-12 Thread David Wolfskill
On Mon, Jun 12, 2017 at 12:24:58AM -0700, Xin Li wrote:
> Thanks for Konstantin's hints; this is indeed related to my change (which
> exposed an old bug in rpc.lockd).
> 
> Please try the attached fix.
> 

Aye; that appears to do the job:

freebeast(11.1)[1] uname -a && service lockd status
FreeBSD freebeast.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #367  
r319823M/319852:1100514: Mon Jun 12 04:58:48 PDT 2017 
r...@freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/GENERIC  amd64
lockd is running as pid 602.
freebeast(11.1)[2] 

Thanks! :-)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Trump (et al.): Hiding information doesn't prove its falsity.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: post ino64: lockd no runs?

2017-06-12 Thread Xin Li
Thanks for Konstantin's hints; this is indeed related to my change (which
exposed an old bug in rpc.lockd).

Please try the attached fix.

Cheers,
Index: usr.sbin/rpc.lockd/lockd.c
===================================================================
--- usr.sbin/rpc.lockd/lockd.c  (revision 319826)
+++ usr.sbin/rpc.lockd/lockd.c  (working copy)
@@ -902,8 +902,7 @@ lookup_addresses(struct netconfig *nconf)
                     sin->sin_port = htons(0);
                     sin->sin_addr.s_addr = htonl(INADDR_ANY);
                     res->ai_addr = (struct sockaddr*) sin;
-                    res->ai_addrlen = (socklen_t)
-                        sizeof(res->ai_addr);
+                    res->ai_addrlen = sizeof(struct sockaddr_in);
                     break;
                 case AF_INET6:
                     sin6 = malloc(sizeof(struct sockaddr_in6));
@@ -913,7 +912,7 @@ lookup_addresses(struct netconfig *nconf)
                     sin6->sin6_port = htons(0);
                     sin6->sin6_addr = in6addr_any;
                     res->ai_addr = (struct sockaddr*) sin6;
-                    res->ai_addrlen = (socklen_t) sizeof(res->ai_addr);
+                    res->ai_addrlen = sizeof(struct sockaddr_in6);
                     break;
                 default:
                     break;
@@ -938,7 +937,7 @@ lookup_addresses(struct netconfig *nconf)
             }
         }
 
-        servaddr.len = servaddr.maxlen = res->ai_addr->sa_len;
+        servaddr.len = servaddr.maxlen = res->ai_addrlen;
         servaddr.buf = res->ai_addr;
         uaddr = taddr2uaddr(nconf, &servaddr);
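
To show why a length check in the address conversion matters to the
caller, here is a small stand-alone sketch.  It does not reproduce
libc's taddr2uaddr(): inet_uaddr_checked() and wildcard_uaddr() are
hypothetical stand-ins, and the assumption is only that the converter
returns NULL when the supplied length is smaller than
struct sockaddr_in.  The output follows the usual IPv4 universal
address form h1.h2.h3.h4.p1.p2.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical converter: format an IPv4 socket address as a
 * universal address string, refusing buffers that are too short to
 * hold a struct sockaddr_in.  The caller frees the result.
 */
static char *
inet_uaddr_checked(const void *buf, size_t len)
{
	struct sockaddr_in sin;
	const unsigned char *a;
	unsigned int port;
	char *ret;

	if (len < sizeof(sin))
		return (NULL);		/* length check: too short */
	memcpy(&sin, buf, sizeof(sin));
	a = (const unsigned char *)&sin.sin_addr.s_addr;
	port = ntohs(sin.sin_port);
	ret = malloc(32);		/* 6 * 3 digits + 5 dots + NUL fits */
	if (ret == NULL)
		return (NULL);
	snprintf(ret, 32, "%u.%u.%u.%u.%u.%u",
	    a[0], a[1], a[2], a[3], port >> 8, port & 0xff);
	return (ret);
}

/*
 * Hypothetical driver: build the wildcard address used in the patch
 * context (INADDR_ANY, port 0) and convert it with the given length.
 */
static char *
wildcard_uaddr(size_t len)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(0);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	return (inet_uaddr_checked(&sin, len));
}
```

With a pointer-sized length, wildcard_uaddr() returns NULL; with
sizeof(struct sockaddr_in) it returns a usable string.  A NULL result
that is stored and later handed to the kernel would be consistent with
the EFAULT seen in the trace.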
 




Re: post ino64: lockd no runs?

2017-06-11 Thread David Wolfskill
On Sun, Jun 11, 2017 at 09:58:30PM +0300, Konstantin Belousov wrote:
> On Sun, Jun 11, 2017 at 11:12:25AM -0700, David Wolfskill wrote:
> >   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
> >   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
> 
> If you revert r319614 on stable/11, does the problem go away ?
> 

As it happens, apparently so.

I was able to reproduce the symptom on my build machine:

freebeast(11.1)[1] uname -a && service lockd status 
FreeBSD freebeast.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #366  
r319823M/319823:1100514: Sun Jun 11 03:55:49 PDT 2017 
r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/GENERIC  amd64
lockd is not running.
freebeast(11.1)[2] 

I then "cloned" slice 1 to slice 3, and on slice 3's /usr/src, I
used "svn diff" and "svn patch --reverse-diff" to effectively revert
r319614, then rebooted from slice 3, did a normal src-based update;
rebooted, and:

freebeast(11.1)[1] uname -a && service lockd status
FreeBSD freebeast.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #367  
r319823M/319823:1100514: Sun Jun 11 13:31:49 PDT 2017 
r...@freebeast.catwhisker.org:/common/S3/obj/usr/src/sys/GENERIC  amd64
lockd is running as pid 600.
freebeast(11.1)[2]


If there's a patch someone would like me to try that's a bit more
involved than just reverting r319614, I'm up for it.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Looking forward to telling Mr. Trump: "You're fired!"

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: post ino64: lockd no runs?

2017-06-11 Thread Konstantin Belousov
On Sun, Jun 11, 2017 at 11:12:25AM -0700, David Wolfskill wrote:
>   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
>   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address

If you revert r319614 on stable/11, does the problem go away ?


Re: post ino64: lockd no runs?

2017-06-11 Thread Cy Schubert
In message <20170611172022.ga3...@albert.catwhisker.org>, David Wolfskill writes:
> 
> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> > of my systems after a full rebuild of src and ports. No log entries
> > offer any insight as to why :-(
> >
> > imb
> 
> I don't tend to use NFS on my systems that are running head, so I
> haven't had occasion to test this as stated.
> 
> However, I just completed my weekly update of the "production" systems
> here at home, running stable/11.  And I find that lockd seems to be ...
> claiming that all is well, but declining to run (for long).
> 
> To the best of my knowledge, that was not the case until this last
> update, which was from:
> 
> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316
> r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017
> root@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> 
> to
> 
> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322
> r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017
> root@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> 
> The "glaringly obvious" symptom in my case is that I am now unable
> to (directly) save an email message from within mutt(1) by appending
> it to an NFS-resident file.  (Saving it to a local file, then using
> cat(1) to append that to the NFS-resident file & removing the local
> copy works)
> 
> After a few variations on a theme of:
> 
> albert(11.1)[5] sudo service lockd restart
> lockd not running?
> Starting lockd.
> albert(11.1)[6] echo $?
> 0
> albert(11.1)[7] service lockd status
> lockd is not running.
> 
> I finally(!) thought to ask ktrace what's going on (as tailing
> /var/log/messages was completely unproductive, even after enabling
> rc_debug).
> 
> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
> the output of kdump(1), I see that the trace ends with:
> 
>   ...
>   2811 rpc.lockd NAMI  "/var/run/logpriv"
>   2786 sh   CALL  read(0xa,0x627fc0,0x400)
>   2786 sh   GIO   fd 10 read 0 bytes
>""
>   2811 rpc.lockd RET   connect 0
>   2786 sh   RET   read 0
>   2811 rpc.lockd CALL  sendto(0x3,0x7fffe2c0,0x27,0,0,0)
>   2786 sh   CALL  exit(0)
>   2811 rpc.lockd GIO   fd 3 wrote 39 bytes
>"<30>Jun 11 15:43:10 rpc.lockd: Starting"
>   2811 rpc.lockd RET   sendto 39/0x27
>   2811 rpc.lockd CALL  sigaction(SIGALRM,0x7fffec20,0)
>   2811 rpc.lockd RET   sigaction 0
>   2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
>   2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
>   2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffea40)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
>   2811 rpc.lockd RET   sigprocmask 0
>   2811 rpc.lockd CALL  exit(0x1)
> 
> Then, when I tried to send this message, I started getting more whines
> from mutt(1).  I finally gave up and rebooted from the previous
> environment:
> 
> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316
> r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017
> root@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64
> 
> and lockd is running:
> 
> albert(11.1-P)[2] service lockd status
> lockd is running as pid 629.
> albert(11.1-P)[3]
> 
> so mutt(1) is not pitching a hissy-fit every time I try to save or
> send a message.
> 
> 
> In light of the above, I have Bcced: this message to current@ (where
> the thread originated) and sent it (and set replies) to stable@.
> 
> 
> I have a test system, last updated to stable/11 as of mid-October
> last year; lockd was running on it, as well (which is why I tried
> going back to last week's image).  I'm happy to update it to points
> where lockd may be broken, if it might help figure out what's broken
> and how to fix it.

I'm running lockd on recent -CURRENT systems. No issues so far. Locking 
works as expected.



-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org

The need of the many outweigh

Re: post ino64: lockd no runs?

2017-06-11 Thread David Wolfskill
On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> of my systems after a full rebuild of src and ports. No log entries
> offer any insight as to why :-(
> 
>   imb

I don't tend to use NFS on my systems that are running head, so I
haven't had occasion to test this as stated.

However, I just completed my weekly update of the "production" systems
here at home, running stable/11.  And I find that lockd seems to be ...
claiming that all is well, but declining to run (for long).

To the best of my knowledge, that was not the case until this last
update, which was from:

FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316  
r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017 
r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64

to

FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322  
r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 
r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64

The "glaringly obvious" symptom in my case is that I am now unable
to (directly) save an email message from within mutt(1) by appending
it to an NFS-resident file.  (Saving it to a local file, then using
cat(1) to append that to the NFS-resident file & removing the local
copy works)

After a few variations on a theme of:

albert(11.1)[5] sudo service lockd restart
lockd not running?
Starting lockd.
albert(11.1)[6] echo $?
0
albert(11.1)[7] service lockd status
lockd is not running.

I finally(!) thought to ask ktrace what's going on (as tailing
/var/log/messages was completely unproductive, even after enabling
rc_debug).

So I tried: "sudo ktrace -di service lockd restart"; upon examination of
the output of kdump(1), I see that the trace ends with:

  ...
  2811 rpc.lockd NAMI  "/var/run/logpriv"
  2786 sh   CALL  read(0xa,0x627fc0,0x400)
  2786 sh   GIO   fd 10 read 0 bytes
   ""
  2811 rpc.lockd RET   connect 0
  2786 sh   RET   read 0
  2811 rpc.lockd CALL  sendto(0x3,0x7fffe2c0,0x27,0,0,0)
  2786 sh   CALL  exit(0)
  2811 rpc.lockd GIO   fd 3 wrote 39 bytes
   "<30>Jun 11 15:43:10 rpc.lockd: Starting"
  2811 rpc.lockd RET   sendto 39/0x27
  2811 rpc.lockd CALL  sigaction(SIGALRM,0x7fffec20,0)
  2811 rpc.lockd RET   sigaction 0
  2811 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x801015040)
  2811 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
  2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffea40)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x800830c78,0x7fffe5b0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x800830c8c,0)
  2811 rpc.lockd RET   sigprocmask 0
  2811 rpc.lockd CALL  exit(0x1)
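
As a side note on reading the trace, "errno 14 Bad address" is EFAULT.
A minimal sketch for decoding such a number follows; describe_errno()
and is_efault() are hypothetical helper names, and the message text
comes from the C library's strerror().

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: return the C library's message for an errno
 * value, as printed after the number in kdump(1) output.
 */
static const char *
describe_errno(int error)
{
	return (strerror(error));
}

/* Hypothetical helper: true when the value is the "Bad address" error. */
static int
is_efault(int error)
{
	return (error == EFAULT);
}
```

EFAULT from a system call generally means the kernel could not read or
write a user-space address it was handed, which is why the pointer
arguments to nlm_syscall are the first thing to inspect.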

Then, when I tried to send this message, I started getting more whines
from mutt(1).  I finally gave up and rebooted from the previous
environment:

FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316  
r319566M/319569:1100514: Sun Jun  4 03:54:41 PDT 2017 
r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT  amd64

and lockd is running:

albert(11.1-P)[2] service lockd status
lockd is running as pid 629.
albert(11.1-P)[3] 

so mutt(1) is not pitching a hissy-fit every time I try to save or
send a message.


In light of the above, I have Bcced: this message to current@ (where
the thread originated) and sent it (and set replies) to stable@.


I have a test system, last updated to stable/11 as of mid-October
last year; lockd was running on it, as well (which is why I tried
going back to last week's image).  I'm happy to update it to points
where lockd may be broken, if it might help figure out what's broken
and how to fix it.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Looking forward to telling Mr. Trump: "You're fired!"

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

