utilities in /usr/bin or /usr/sbin?

2018-06-14 Thread Rick Macklem
Hi,

I have three new utilities that are mainly useful for managing the pNFS server
committed as r335130.

In the projects tree, I have them in /usr/bin and man section 1. However,
since they are mostly useful to a sysadmin managing the pNFS service,
I'm thinking that maybe they should be in /usr/sbin with man pages in section 8.

Which of these sounds correct?

Thanks, rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Heads Up: pNFS server merge committed to head

2018-06-12 Thread Rick Macklem
Since I only got one response to my query w.r.t. whether projects/pnfs-planb-server
should be merged into head, and it wasn't negative,
I went with "no news is good news" and did the merge/commit.
It is now in head as r335012.
Since it has survived a recent "make universe", I hope it won't cause build
problems, but I will be watching my email for any problems and hopefully
can resolve them quickly.

It was a large commit but should not affect the NFS server for non-pNFS service.

This was the kernel commit. I will commit the changes to userland and man pages
once the dust settles.
It does change the internal interface used between the nfscommon.ko, nfsd.ko and
nfscl.ko modules, so these all need to be rebuilt.
Should I bump __FreeBSD_version to force this for head/current?

Hopefully it will not cause major problems for people.

rick


Re: how do I use the make universe machines?

2018-06-07 Thread Rick Macklem
Just replying to one of the messages at random...
Benjamin Kaduk wrote:
[stuff snipped]
>I think https://www.freebsd.org/internal/machines.html sounds like
>the page you're looking for.  (universe is just a top-level make
>target like buildworld, but will take a while on non-beefy
>hardware.)
Yea, the last time I tried it on an i386 was FreeBSD 9 and even then it took
a week to complete. ;-)

Thanks everyone, it is working fine. In particular, I'd like to thank cem@ for
the hint about reading "motd" to see what the command is.
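For anyone else who finds this thread: universe is run from a checked-out source tree like any other top-level target. A sketch (the -j value and the TARGETS subset are illustrative, not required):

```shell
# Build world and kernels for every supported architecture.
cd /usr/src
make -j 8 universe

# Restrict to a subset of architectures with the TARGETS knob (see build(7)).
make -j 8 TARGETS=amd64 universe
```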

Thanks again, rick


how do I use the make universe machines?

2018-06-05 Thread Rick Macklem
I've heard mention of "make universe" machines multiple times,
but have no idea how to use them.
Is there documentation on this?

Thanks, rick
ps: I'll admit I haven't looked at the developer's guide in a long time.


Re: how to deal with variable set but not used warnings?

2018-06-04 Thread Rick Macklem
Matthew Macy wrote:
>On Sun, Jun 3, 2018 at 2:40 PM, Theron  wrote:
>>> 4. Disable the stupid warning in the Makefile / build system. If you don't
>>> care, and there's a good reason for what you are doing (sounds like there
>>> is), better to just disable the warning as so much useless noise.
>>>
>>> Warner
>>
>> Or possibly, alongside a comment as in (3), use one of these:
>> 5 - Disable warning pragma -
>> http://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html
>> 6 - Use __attribute__((unused)) -
> https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes
>
>
>There is already an __unused alias for #6. It's what I've used to
>annotate variables that are only used by INVARIANTS builds. It
>legitimately finds a bunch of dead code. However, 90+% of the
>instances of the warning are not interesting.
Ok. I didn't realize that __unused would work for this case of "set but not used",
but I just tried it on the older gcc48 I have lying around and it worked.
(clang doesn't seem to warn or care about these cases.)

I may use this, since I avoid messing with the make files like the plague.

Thanks, rick


how to deal with variable set but not used warnings?

2018-06-03 Thread Rick Macklem
mmacy has sent me a bunch of warnings of the "variable set but not used" kind
generated by gcc8.

When I've looked at the code, these are for RPC arguments that I parse but do not
use at this time.
I'd like to leave the code in place, since these arguments may be needed in the
future and it would be hard to figure out how to get them years from now, when they
might be needed.
I can think of 3 ways to handle this:
1 - Get rid of the code. (As above, I'd rather not do this.)
2 - Wrap the code with "#if 0"/"#endif" or similar. I'll admit that I find this
  rather ugly and it tends to make the code harder to follow.
3 - Leave the code and add a comment w.r.t. why the variables are set but not used.

So, what do others think is the preferable alternative?
(Or maybe you have a #4 that seems better than any of these.)

Thanks for your comments, rick


SPDX-License-Id for new files

2018-06-03 Thread Rick Macklem
I have a few (3) new files in the projects/pnfs-planb-server subversion tree
that all have the 2-clause FreeBSD copyright.

Do I just add the "SPDX..." line for this license at the top of the copyright
comment, or is there some other exercise that needs to be done for this?
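For reference, the convention is to put the tag on the first line of the license comment, ahead of the copyright text. A sketch of the header (year and name are placeholders; the identifier for the 2-clause license is BSD-2-Clause, though note the tree has also used a BSD-2-Clause-FreeBSD variant, so check the committer's guide for the current spelling):

```c
/*-
 * SPDX-License-Identifier: BSD-2-Clause
 *
 * Copyright (c) 2018 Your Name
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * ...
 */
```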

Thanks, rick


Re: mlx5(4) jumbo receive

2018-04-25 Thread Rick Macklem
Ryan Stone wrote:
>On Tue, Apr 24, 2018 at 4:55 AM, Konstantin Belousov  wrote:
>> +#ifndef MLX5E_MAX_RX_BYTES
>> +#define MLX5E_MAX_RX_BYTES MCLBYTES
>> +#endif
>
>Why do you use a 2KB buffer rather than a PAGE_SIZE'd buffer?
>MJUMPAGESIZE should offer significantly better performance for jumbo
>frames without increasing the risk of memory fragmentation.
Actually, when I was playing with using jumbo mbuf clusters for NFS, I was able
to get it to fragment to the point where allocations failed when mixing 2K and
4K mbuf clusters.
Admittedly I was using a 256Mbyte i386 and it wasn't easily reproduced, but
it was possible.
--> Using a mix of 2K and 4K mbuf clusters can result in fragmentation, although
  I suspect that it isn't nearly as serious as what can happen when using 9K
  mbuf clusters.

rick


Re: SCHED_ULE makes 256Mbyte i386 unusable

2018-04-22 Thread Rick Macklem
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Sat, Apr 21, 2018 at 07:21:58PM +, Rick Macklem wrote:
>> >> I decided to start a new thread on current related to SCHED_ULE, since I see
>> >> more than just performance degradation and on a recent current kernel.
>> >> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> >> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>> >>
>> >> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> >> current/head kernel, I would see about a 30% performance degradation (elapsed
>> >> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> >> options SCHED_ULE
>> >> instead of
>> >> options SCHED_4BSD
So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (I.e., the 30%
performance degradation has disappeared.)

>> >>
>> >> Now, with a kernel from a couple of days ago, the
>> >> options SCHED_ULE
>> >> kernel becomes unusable shortly after starting testing.
>> >> I have seen two variants of this:
>> >> - Became essentially hung. All I could do was ping the machine from the network.
>> >> - Reported "vm_thread_new: kstack allocation failed"
>> >>   and then any attempt to do anything gets "No more processes".
>> >This is strange.  It usually means that you get KVA either exhausted or
>> >severely fragmented.
>> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
>> kernel is working ok now. I haven't done enough to compare performance yet.
>> Maybe I'll post again when I have some numbers.
>>
>> >Enter ddb, it should be operational since pings are replied.  Try to see
>> >where the threads are stuck.
>> I didn't do this, since reducing the number of kernel threads seems to have fixed
>> the problem. For the pNFS server, the nfsd threads will spawn additional kernel
>> threads to do proxies to the mirrored DS servers.
>>
>> >> with the only difference being a kernel built with
>> >> options SCHED_4BSD
>> >> everything works and performs the same as the Dec 2017 kernel.
>> >>
>> >> I can try rolling back through the revisions, but it would be nice if someone
>> >> could suggest where to start, because it takes a couple of hours to build a
>> >> kernel on this system.
>> >>
>> >> So, something has made things worse for a head/current kernel this winter, rick
>> >
>> >There are at least two potentially relevant changes.
>> >
>> >First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
>> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
W.r.t. Rodney Grimes' comments about this (which didn't end up in these messages
in the thread):
I didn't see any instability when using KSTACK_PAGES=4 for this until this cropped
up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.

Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big
item getting allocated on the stack, but many moderately sized ones.
(A part of it is multiple instances of "struct vattr", some buried in "struct nfsvattr",
 that NFS needs to use. I don't think these are large enough to justify malloc/free,
 but it has to use several of them.)

One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then, when testing, to
my surprise I had a 20% performance hit and shelved the patch.
Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit disappears when the machine has adequate
memory. If the performance hit goes away, I could commit this, but it wouldn't
have that much effect on the kstack usage. (It's interesting how this patch ended
up related to the issue this thread discussed.)

>>
>> >Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
>> Could this change have resulted in the system being able to allocate fewer
>> kernel threads/stacks for some reason?

Re: i386 hangs during halt "vnodes remaining... 0 time out"

2018-04-22 Thread Rick Macklem
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:49:34PM +0200, Tijl Coosemans wrote:
>> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem  wrote:
>> > With a recent head/current kernel (doesn't happen when running a Dec.
>> > 2017 one), when I do a halt, it gets as far as:
>> >
>> > vnodes remaining... 0 time out
>> >
>> > and that's it (the time out appears several seconds after the first "0").
>> > With a Dec. 2017 kernel there would be several "0"s printed.
>> > It appears that it is stuck in the first iteration of the sched_sync()
>> > loop after it is no longer in SYNCER_RUNNING state.
>> >
>> > Any ideas? rick
>>
>> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>> I have a patch (attached) but haven't been able to test it yet.
>
>> Index: sys/kern/vfs_bio.c
>> ===================================================================
>> --- sys/kern/vfs_bio.c(revision 332165)
>> +++ sys/kern/vfs_bio.c(working copy)
>> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
>>  {
>>   struct bufdomain *bd;
>>
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>> + SHUTDOWN_PRI_LAST);
>> +
>>   bd = arg;
>>   for (;;) {
>> - kproc_suspend_check(curproc);
>> + kthread_suspend_check();
>>
>>   /*
>>* Free buffers from the clean queue until we meet our
>> @@ -3357,7 +3360,7 @@ buf_daemon()
>>   /*
>>* This process needs to be suspended prior to shutdown sync.
>>*/
>> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>>   SHUTDOWN_PRI_LAST);
>>
>>   /*
>> @@ -3381,7 +3384,7 @@ buf_daemon()
>>   bd_request = 0;
>>   mtx_unlock(&bdlock);
>>
>> - kproc_suspend_check(bufdaemonproc);
>> + kthread_suspend_check();
>>
>>   /*
>>* Save speedupreq for this pass and reset to capture new
>This looks fine.
For some reason, this thread became two threads, so I'll reply to this one as well.

The patch seems to work fine for me.

Thanks, rick


Re: i386 hangs during halt "vnodes remaining... 0 time out"

2018-04-21 Thread Rick Macklem
Cy Schubert wrote:
>In message <201804212227.w3lmrp5w002...@slippy.cwsent.com>, Cy Schubert
writes:
>> In message <20180421234934.10d7d...@kalimero.tijl.coosemans.org>, Tijl
>> Coosemans writes:
>> > On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem  wrote:
>> > > With a recent head/current kernel (doesn't happen when running a Dec.
>> > > 2017 one), when I do a halt, it gets as far as:
>> > >
>> > > vnodes remaining... 0 time out
>> > >
>> > > and that's it (the time out appears several seconds after the first "0").
>> > > With a Dec. 2017 kernel there would be several "0"s printed.
>> > > It appears that it is stuck in the first iteration of the sched_sync()
>> > > loop after it is no longer in SYNCER_RUNNING state.
>> > >
>> > > Any ideas? rick
>> >
>> > See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>> > I have a patch (attached) but haven't been able to test it yet.
>>
>> I've noticed this as well on my old Pentium-M laptop (updated about
>> twice a year). I'll try your patch this weekend or early next week.
>
>Works perfectly.
Patch seems to work for my i386 as well.

Thanks, rick




Re: SCHED_ULE makes 256Mbyte i386 unusable

2018-04-21 Thread Rick Macklem
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> I decided to start a new thread on current related to SCHED_ULE, since I see
>> more than just performance degradation and on a recent current kernel.
>> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>>
>> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> current/head kernel, I would see about a 30% performance degradation (elapsed
>> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> options SCHED_ULE
>> instead of
>> options SCHED_4BSD
>>
>> Now, with a kernel from a couple of days ago, the
>> options SCHED_ULE
>> kernel becomes unusable shortly after starting testing.
>> I have seen two variants of this:
>> - Became essentially hung. All I could do was ping the machine from the network.
>> - Reported "vm_thread_new: kstack allocation failed"
>>   and then any attempt to do anything gets "No more processes".
>This is strange.  It usually means that you get KVA either exhausted or
>severely fragmented.
Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.

>Enter ddb, it should be operational since pings are replied.  Try to see
>where the threads are stuck.
I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.

>> with the only difference being a kernel built with
>> options SCHED_4BSD
>> everything works and performs the same as the Dec 2017 kernel.
>>
>> I can try rolling back through the revisions, but it would be nice if someone
>> could suggest where to start, because it takes a couple of hours to build a
>> kernel on this system.
>>
>> So, something has made things worse for a head/current kernel this winter, rick
>
>There are at least two potentially relevant changes.
>
>First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

>Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
Could this change have resulted in the system being able to allocate fewer
kernel threads/stacks for some reason?

>Consequences of the first one are obvious, it is much harder to find
>the place to map the stack.  Second change, on the other hand, provides
>almost full 4G for KVA and should have mostly compensate for the negative
>effects of the first.
>
>And, I cannot see how changing the scheduler would fix or even affect that
>behaviour.
My hunch is that the system was running near its limit for kernel threads/stacks.
Then, somehow, the timing caused by SCHED_ULE resulted in the nfsd trying to reach
a higher peak number of threads and it hit the limit, while SCHED_4BSD happened to
result in timing such that it stayed just below the limit and worked.
I can think of a couple of things that might affect this:
1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
  they wouldn't terminate and release their resources before more new ones
  are spawned.
2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
  could try and spawn more mirror DS worker threads at about the same time.

Anyhow, thanks for the help, rick


i386 hangs during halt "vnodes remaining... 0 time out"

2018-04-21 Thread Rick Macklem
With a recent head/current kernel (doesn't happen when running a Dec. 2017 one),
when I do a halt, it gets as far as:

vnodes remaining... 0 time out

and that's it (the time out appears several seconds after the first "0").
With a Dec. 2017 kernel there would be several "0"s printed.
It appears that it is stuck in the first iteration of the sched_sync() loop after
it is no longer in SYNCER_RUNNING state.

Any ideas? rick


SCHED_ULE makes 256Mbyte i386 unusable

2018-04-21 Thread Rick Macklem
I decided to start a new thread on current related to SCHED_ULE, since I see
more than just performance degradation and on a recent current kernel.
(I cc'd a couple of the people discussing performance problems in freebsd-stable
 recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)

When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
current/head kernel, I would see about a 30% performance degradation (elapsed
run time for a kernel build over NFSv4.1) when the server kernel was built with
options SCHED_ULE
instead of
options SCHED_4BSD

Now, with a kernel from a couple of days ago, the
options SCHED_ULE
kernel becomes unusable shortly after starting testing.
I have seen two variants of this:
- Became essentially hung. All I could do was ping the machine from the network.
- Reported "vm_thread_new: kstack allocation failed"
  and then any attempt to do anything gets "No more processes".
with the only difference being a kernel built with
options SCHED_4BSD
everything works and performs the same as the Dec 2017 kernel.

I can try rolling back through the revisions, but it would be nice if someone
could suggest where to start, because it takes a couple of hours to build a
kernel on this system.

So, something has made things worse for a head/current kernel this winter, rick


Re: anyone running with ngroups increased from 16?

2018-04-16 Thread Rick Macklem
Brooks Davis wrote:
>On Mon, Apr 16, 2018 at 06:37:53PM +0800, Julian Elischer wrote:
>> Windows users seem to have an almost unlimited number of groups and
>> some places seem to use them a LOT.
>> This gives Posix systems problems with deciding how to handle them
>> all. Especially when getting
>> user credentials from winbindd (samba).
>>
>> Does anyone know of any work done to either bypass this limit or to
>> at least expand it?
>
>I fixed this in 2009 for everything but NFS AUTH_SYS.  NGROUPS_MAX is
>1023.  IIRC the usual hack employed in storage systems is to ignore the
>groups provided by AUTH_SYS and get them from winbindd.  I don't know of
>a public implementation of that.
If winbindd gets the information from LDAP, then you can get the same effect
from "nfsuserd -manage-gids" for AUTH_SYS (or as Toomas Soome noted, the gssd
does the same thing for Kerberized mounts).

Both of these utilities use getgrouplist() on the NFS server to acquire the list
of groups for the user. As such, anything configured for the library call, such
as LDAP, will provide the list of groups.

rick


Re: anyone running with ngroups increased from 16?

2018-04-16 Thread Rick Macklem
Julian Elischer wrote:
>On 16/4/18 6:37 pm, Julian Elischer wrote:
>> Windows users seem to have an almost unlimited number of groups and
>> some places seem to use them a LOT.
>> This gives Posix systems problems with deciding how to handle them
>> all. Especially when getting
>> user credentials from winbindd (samba).
>>
>> Does anyone know of any work done to either bypass this limit or to
>> at least expand it?
>
>I mean with the other applications, such as NFS usage etc.
>I know mountd explodes with > 16.  Has anyone done a cleaning pass?
16 is the limit "on-the-wire" per the RFCs for Sun RPC. You can use
nfsuserd -manage-gids (see "man nfsuserd")
on the NFS server so that the daemon uses the group list for the uid in the RPC
instead of the list of groups (limited to 16) in the RPC header. This works fine so
long as the server knows the same group list for a uid as the client(s) do.

And, yes, this applies to NFSv3 as well as NFSv4.
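On the server side, enabling this would look roughly like the following (a sketch; check nfsuserd(8) on your release for the exact flag spelling):

```shell
# /etc/rc.conf additions on the NFS server
nfsuserd_enable="YES"
nfsuserd_flags="-manage-gids"

# then start (or restart) the daemon
service nfsuserd restart
```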

rick


Re: NFSv4.2

2017-12-11 Thread Rick Macklem
Stefan Wendler wrote:
> We would like to use the file copy and the sparse features of 4.2 in our
> Setup. Do you know if any of the two has been implemented yet? The
> sparse feature would be more important than the file copy feature though.
No idea (except that NFSv4.2 isn't in FreeBSD, which implies "not in FreeBSD").

rick


Re: NFSv4.2

2017-12-11 Thread Rick Macklem
Stefan Wendler wrote:
> I was wondering when and if FreeBSD will support NFSv4.2
> Is there anything planned yet?
Someday, but no specific plans at this point.

Is there some specific feature in NFSv4.2 that you are looking for?
I ask because there aren't a lot of new features in NFSv4.2 that aren't
in NFSv4.1. As such, I didn't see much reason to worry about it.
(I think there are some high-end server features, like server->server
 file copy, which would only be worth having in the client if you had
 high-end NFS servers that supported this stuff.)

NFSv4.1 was a big change from NFSv4.0, but most of the NFSv4.1->NFSv4.2
changes are minor. One of the biggest is a way to incrementally add features
without creating a new (NFSv4.3 or ??) version of the protocol. Probably a
good idea, but only useful when incremental features are being implemented.

rick


Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-12-01 Thread Rick Macklem
I have created D13327 on reviews.freebsd.org with a patch that keeps
track of # of issued write delegations and allows nfsrv_checkgetattr()
to return without acquiring the lock when it is 0.

Hopefully kib@, jhb@ or someone who is familiar with the use
of the atomic ops in the kernel can review it. (Where the counter
code goes should be fine, but I am not sure I got the use of the
atomic ops correct.)

Hopefully Emmanuel can test the patch to see if it fixes his
performance problem.

rick


From: owner-freebsd-curr...@freebsd.org  on behalf of Rick Macklem 
Sent: Wednesday, November 29, 2017 12:28:05 PM
To: Emmanuel Vadot
Cc: Konstantin Belousov; FreeBSD Current; freebsd...@freebsd.org
Subject: Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

Emmanuel Vadot wrote:
[stuff snipped]
> I haven't tested, but I can say that it will work; I actually wondered at
>first about doing that. The problem with this patch is what I tried to
>describe in my first and following mails: since you can turn delegations on
>and off, you can still have delegations (so nfsrv_delegatecnt > 0)
>even if the sysctl is == 0. That's why I went the tunable way, which seems
>cleaner to me, as disabling delegations doesn't really disable them for
>current clients.
Yes, if you have delegations enabled and then disable them, there will
be delegations issued for a "while".
During that time, the code in nfsrv_checkgetattr() does need to check for
them.
Since no new delegations will be issued, the outstanding ones will go
away when the client chooses to return them. (You can force this on the
client by doing a dismount/mount, at least for the FreeBSD client.)
Alternately, rebooting the server forces the clients to "recover" delegations
and, if they are disabled, that fails. (Actually the recovery succeeds, but with
a flag set that tells the client it must return them asap.)

All the tunable does is make it impossible to disable them without
rebooting, but otherwise the effect is the same.

I have a patch that keeps a separate counter for write delegations
(which are the ones you care about) using atomics to maintain the
value. (That will be similar but somewhat better than what this patch does.)

rick
[lots more snipped]


Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-29 Thread Rick Macklem
Emmanuel Vadot wrote:
[stuff snipped]
> I haven't tested, but I can say that it will work; I actually wondered at
>first about doing that. The problem with this patch is what I tried to
>describe in my first and following mails: since you can turn delegations on
>and off, you can still have delegations (so nfsrv_delegatecnt > 0)
>even if the sysctl is == 0. That's why I went the tunable way, which seems
>cleaner to me, as disabling delegations doesn't really disable them for
>current clients.
Yes, if you have delegations enabled and then disable them, there will
be delegations issued for a "while".
During that time, the code in nfsrv_checkgetattr() does need to check for
them.
Since no new delegations will be issued, the outstanding ones will go
away when the client chooses to return them. (You can force this on the
client by doing a dismount/mount, at least for the FreeBSD client.)
Alternately, rebooting the server forces the clients to "recover" delegations
and, if they are disabled, that fails. (Actually the recovery succeeds, but with
a flag set that tells the client it must return them asap.)

All the tunable does is make it impossible to disable them without
rebooting, but otherwise the effect is the same.

I have a patch that keeps a separate counter for write delegations
(which are the ones you care about) using atomics to maintain the
value. (That will be similar but somewhat better than what this patch does.)

rick
[lots more snipped]


Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Rick Macklem
Did my usual and forgot to attach it. Here's the patch, rick


From: Rick Macklem 
Sent: Tuesday, November 28, 2017 6:17:13 PM
To: Emmanuel Vadot
Cc: Konstantin Belousov; FreeBSD Current; freebsd...@freebsd.org; Rick Macklem
Subject: Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

I think the attached patch should fix your performance problem.
It simply checks for nfsrv_delegatecnt != 0 before doing all the
locking stuff in nfsrv_checkgetattr().

If this fixes your performance problem (with delegations disabled),
I'll probably add a separate counter for write delegations and
use atomics to increment/decrement it so that it is SMP safe without
acquiring any lock.

If you can test this, please let me know how it goes? rick

____
From: Rick Macklem 
Sent: Tuesday, November 28, 2017 2:09:51 PM
To: Emmanuel Vadot
Cc: Konstantin Belousov; FreeBSD Current; freebsd...@freebsd.org; Rick Macklem
Subject: Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

Emmanuel Vadot wrote:
>I wrote:
>> Since it defaults to "disabled", I don't see why a tunable would be necessary?
>> (Just do nothing and delegations don't happen. If you want the server
>>  to issue delegations, then use the sysctl to turn them on. If you want to turn
>>  them off again at the server, reboot the server without setting the sysctl.)
>
>If you need to reboot to make things work again without delegations,
>this shouldn't be a sysctl.
Turning them off without rebooting doesn't break anything.
I just wrote it that way since you seemed to want to use a tunable, which
implied rebooting was your preference.
> > > >  The reason behind it is a recent problem at work on some of our filers
> > > > related to NFS.
> > > >  We use NFSv4 without delegations as we have never been able to get good
> > > > performance with a FreeBSD server and Linux clients (we need to do tests
> > > > again but that's for later).
> Delegations are almost never useful, especially with Linux clients.
Emmanuel Vadot wrote:
>Can you elaborate ? Reading what delegation are I see that this is
>exactly what I'm looking for to have better performance with NFS for my
>workload (I only have one client per mount point).
Delegations allow the client to do Opens locally without contacting the
server. Unless the delegation is a write delegation, this only applies
to read-only Opens. Also, since most implementors couldn't agree
on how to check permissions via the delegation, most client implementations
still do an Access check at open, which is almost as much overhead as the
Open RPC. (For example, Solaris servers always specified an empty ACE in the
delegation, forcing the client to do an Access. I have no idea what the
current Linux server/client does. If you capture packets when a Linux
client is mounted with delegations enabled, you could look for Access RPCs
when files are Opened multiple times. If you see them, then delegations
aren't saving RPCs.)
Also, they are "per file", so they are only useful if the client is Opening the
same file multiple times.
Further, if another client Opens the same file and the first client got a Write
delegation, then the write delegation must be recalled, which is a lot of
overhead and one of the few cases where the FreeBSD server must exclusively
lock the state lists, forcing all other RPCs related to Open, Close to wait.

They sounded good in theory and might have worked well if the implementors
had agreed to do them, but that didn't happen. (Companies pay for servers,
but the clients don't get funded, so delegation support in the clients is
lacking. I tried to make them useful in the FreeBSD client, but I'm not sure
I succeeded.)

> [stuff snipped]
If I understood your original post, you have a performance problem caused
by lock contention, where the server grabs the state lock to check for 
delegations
for every Getattr from a client.

As below, I think the fix is to add code to check for no delegations issued that
does not require acquisition of the state lock.

Btw, large numbers of Getattrs will not perform well with delegations.
(Again, the client should be able to do Getattr operations locally in the
 client when it has a delegation for the file, but if the client is not doing 
that...)

I wrote:
>
> Having a per-mount version of this would be overkill, I think. It would have 
> to
> disable callbacks on the mount point, since there is no way for a client to 
> say
> "don't give me delegations" except disabling callbacks, which the server
> requires for delegation issue.
> [stuff snipped]
> The case where there has never been delegations issued will result in an
> empty delegation queue. Maybe this case can be handled without acquiring
> the "global 

Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Rick Macklem
I think the attached patch should fix your performance problem.
It simply checks for nfsrv_delegatecnt != 0 before doing all the
locking stuff in nfsrv_checkgetattr().

If this fixes your performance problem (with delegations disabled),
I'll probably add a separate counter for write delegations and
use atomics to increment/decrement it so that it is SMP safe without
acquiring any lock.

If you can test this, please let me know how it goes? rick


From: Rick Macklem 
Sent: Tuesday, November 28, 2017 2:09:51 PM
To: Emmanuel Vadot
Cc: Konstantin Belousov; FreeBSD Current; freebsd...@freebsd.org; Rick Macklem
Subject: Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

Emmanuel Vadot wrote:
>I wrote:
>> Since it defaults to "disabled", I don't see why a tunable would be 
>> necessary?
>> (Just do nothing and delegations don't happen. If you want the server
>>  to issue delegations, then use the sysctl to turn them on. If you want to 
>> turn
>>  them off again at the server, reboot the server without setting the sysctl.)
>
>If you need to reboot to make things working again without delegation
>this shouldn't be a sysctl.
Turning them off without rebooting doesn't break anything.
I just wrote it that way since you seemed to want to use a tunable, which
implied rebooting was your preference.
> > > >  The reason behind it is recent problem at work on some on our filer
> > > > related to NFS.
> > > >  We use NFSv4 without delegation as we never been able to have good
> > > > performance with FreeBSD server and Linux client (we need to do test
> > > > again but that's for later).
> Delegations are almost never useful, especially with Linux clients.
Emmanuel Vadot wrote:
>Can you elaborate ? Reading what delegation are I see that this is
>exactly what I'm looking for to have better performance with NFS for my
>workload (I only have one client per mount point).
Delegations allow the client to do Opens locally without contacting the
server. Unless the delegation is a write delegation, this only applies
to read-only Opens. Also, since most implementors couldn't agree
on how to check permissions via the delegation, most client implementations
still do an Access check at open, which is almost as much overhead as the
Open RPC. (For example, Solaris servers always specified an empty ACE in the
delegation, forcing the client to do an Access. I have no idea what the
current Linux server/client does. If you capture packets when a Linux
client is mounted with delegations enabled, you could look for Access RPCs
when files are Opened multiple times. If you see them, then delegations
aren't saving RPCs.)
Also, they are "per file", so they are only useful if the client is Opening the
same file multiple times.
Further, if another client Opens the same file and the first client got a Write
delegation, then the write delegation must be recalled, which is a lot of
overhead and one of the few cases where the FreeBSD server must exclusively
lock the state lists, forcing all other RPCs related to Open, Close to wait.

They sounded good in theory and might have worked well if the implementors
had agreed to do them, but that didn't happen. (Companies pay for servers,
but the clients don't get funded, so delegation support in the clients is
lacking. I tried to make them useful in the FreeBSD client, but I'm not sure
I succeeded.)

> [stuff snipped]
If I understood your original post, you have a performance problem caused
by lock contention, where the server grabs the state lock to check for 
delegations
for every Getattr from a client.

As below, I think the fix is to add code to check for no delegations issued that
does not require acquisition of the state lock.

Btw, large numbers of Getattrs will not perform well with delegations.
(Again, the client should be able to do Getattr operations locally in the
 client when it has a delegation for the file, but if the client is not doing 
that...)

I wrote:
>
> Having a per-mount version of this would be overkill, I think. It would have 
> to
> disable callbacks on the mount point, since there is no way for a client to 
> say
> "don't give me delegations" except disabling callbacks, which the server
> requires for delegation issue.
> [stuff snipped]
> The case where there has never been delegations issued will result in an
> empty delegation queue. Maybe this case can be handled without acquiring
> the "global NFSv4 state lock", which is what sounds like is causing the
> performance issue. Maybe a global counter of how many delegations are
> issued that is handled by atomic ops.
> --> If it is 0, nfsrv_checkgetattr() can just return without acquiring the 
> lock.
>
> I'll take a look at this, rick
>
rick


Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Rick Macklem
Emmanuel Vadot wrote:
>I wrote:
>> Since it defaults to "disabled", I don't see why a tunable would be 
>> necessary?
>> (Just do nothing and delegations don't happen. If you want the server
>>  to issue delegations, then use the sysctl to turn them on. If you want to 
>> turn
>>  them off again at the server, reboot the server without setting the sysctl.)
>
>If you need to reboot to make things working again without delegation
>this shouldn't be a sysctl.
Turning them off without rebooting doesn't break anything.
I just wrote it that way since you seemed to want to use a tunable, which
implied rebooting was your preference.
> > > >  The reason behind it is recent problem at work on some on our filer
> > > > related to NFS.
> > > >  We use NFSv4 without delegation as we never been able to have good
> > > > performance with FreeBSD server and Linux client (we need to do test
> > > > again but that's for later).
> Delegations are almost never useful, especially with Linux clients.
Emmanuel Vadot wrote:
>Can you elaborate ? Reading what delegation are I see that this is
>exactly what I'm looking for to have better performance with NFS for my
>workload (I only have one client per mount point).
Delegations allow the client to do Opens locally without contacting the
server. Unless the delegation is a write delegation, this only applies
to read-only Opens. Also, since most implementors couldn't agree
on how to check permissions via the delegation, most client implementations
still do an Access check at open, which is almost as much overhead as the
Open RPC. (For example, Solaris servers always specified an empty ACE in the
delegation, forcing the client to do an Access. I have no idea what the
current Linux server/client does. If you capture packets when a Linux
client is mounted with delegations enabled, you could look for Access RPCs
when files are Opened multiple times. If you see them, then delegations
aren't saving RPCs.)
Also, they are "per file", so they are only useful if the client is Opening the
same file multiple times.
Further, if another client Opens the same file and the first client got a Write
delegation, then the write delegation must be recalled, which is a lot of
overhead and one of the few cases where the FreeBSD server must exclusively
lock the state lists, forcing all other RPCs related to Open, Close to wait.

They sounded good in theory and might have worked well if the implementors
had agreed to do them, but that didn't happen. (Companies pay for servers,
but the clients don't get funded, so delegation support in the clients is
lacking. I tried to make them useful in the FreeBSD client, but I'm not sure
I succeeded.)

> [stuff snipped]
If I understood your original post, you have a performance problem caused
by lock contention, where the server grabs the state lock to check for 
delegations
for every Getattr from a client.

As below, I think the fix is to add code to check for no delegations issued that
does not require acquisition of the state lock.

Btw, large numbers of Getattrs will not perform well with delegations.
(Again, the client should be able to do Getattr operations locally in the
 client when it has a delegation for the file, but if the client is not doing 
that...)

I wrote:
>
> Having a per-mount version of this would be overkill, I think. It would have 
> to
> disable callbacks on the mount point, since there is no way for a client to 
> say
> "don't give me delegations" except disabling callbacks, which the server
> requires for delegation issue.
> [stuff snipped]
> The case where there has never been delegations issued will result in an
> empty delegation queue. Maybe this case can be handled without acquiring
> the "global NFSv4 state lock", which is what sounds like is causing the
> performance issue. Maybe a global counter of how many delegations are
> issued that is handled by atomic ops.
> --> If it is 0, nfsrv_checkgetattr() can just return without acquiring the 
> lock.
>
> I'll take a look at this, rick
> 
rick


Re: Switch vfs.nfsd.issue_delegations to TUNABLE ?

2017-11-28 Thread Rick Macklem
Konstantin Belousov wrote:
>On Tue, Nov 28, 2017 at 02:26:10PM +0100, Emmanuel Vadot wrote:
>> On Tue, 28 Nov 2017 13:04:28 +0200
>> Konstantin Belousov  wrote:
>>
>> > On Tue, Nov 28, 2017 at 11:41:36AM +0100, Emmanuel Vadot wrote:
>> > >
>> > >  Hello,
>> > >
>> > >  I would like to switch the vfs.nfsd.issue_delegations sysctl to a
>> > > tunable.
Since it defaults to "disabled", I don't see why a tunable would be necessary?
(Just do nothing and delegations don't happen. If you want the server
 to issue delegations, then use the sysctl to turn them on. If you want to turn
 them off again at the server, reboot the server without setting the sysctl.)

> > >  The reason behind it is recent problem at work on some on our filer
> > > related to NFS.
> > >  We use NFSv4 without delegation as we never been able to have good
> > > performance with FreeBSD server and Linux client (we need to do test
> > > again but that's for later).
Delegations are almost never useful, especially with Linux clients.
[stuff snipped]
> > Perhaps make nodeleg per-mount flag ?
The sysctl vfs.nfsd.issue_delegations only affects the server.
If you have a FreeBSD client that is mounting a delegations enabled server and
does not want delegations, just don't run the "nfscbd" daemon on the client.
The only time you want the "nfscbd" daemon running is for delegations and
pNFS layouts. (I suppose there is the case of a client using NFSv4.1/pNFS 
against
a server with delegations enabled, but since delegations aren't very useful,
I'd just disable delegations on the server for this case.)

Having a per-mount version of this would be overkill, I think. It would have to
disable callbacks on the mount point, since there is no way for a client to say
"don't give me delegations" except disabling callbacks, which the server
requires for delegation issue.
[stuff snipped]
The case where there has never been delegations issued will result in an
empty delegation queue. Maybe this case can be handled without acquiring
the "global NFSv4 state lock", which is what sounds like is causing the
performance issue. Maybe a global counter of how many delegations are
issued that is handled by atomic ops.
--> If it is 0, nfsrv_checkgetattr() can just return without acquiring the lock.

I'll take a look at this, rick


pNFS server code merge into head/current

2017-11-25 Thread Rick Macklem
Hi,

There is a source tree in svn at projects/pnfs-planb-server which adds support
for configuring a single Metadata Server (MDS) and multiple Data Servers (DS)
to create a simple pNFS service. (In a pNFS server the Read/Write operations
are separated from the rest of the metadata operations and go directly from the
NFSv4.1 client to the DS, since that is where the file's data resides.)
The service does support mirrored DSs, but the recovery code for handling
a failed DS is not done yet. I plan on working on that during Winter 2018.

The current implementation seems to be working ok for my testing. Any third
party testing would be appreciated. The basic information on how it works and
how to set up a pNFS service is at
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
(John Hixon is working on porting/testing it in FreeNAS, but hasn't quite had
time to get it all set up yet.)

The big question is: when should this code go into head/current?
I cannot do commits during Winter 2018, so if it is going to happen before
April 2018, I need to do it in December. I know a release schedule has not been
worked out for FreeBSD12, but is April 2018 early enough or should it be done
this December?

The merge is large, but should not affect non-pNFS NFS service, rick


Re: Mergemaster won't run on NFS mount

2017-11-23 Thread Rick Macklem
Thomas Laus wrote:
>My /etc/exports file is empty.  I have set the sharenfs property to
>"YES" on the /usr/obj and /usr/src data sets.  The ZFS filesystem
>handles NFS shares internally, according to the documentation.
It still reloads the exports, so the outcome is the same.

>In any event,
>this is how my system has successfuly been doing NFS for nearly 2
>years without any issues.  A client should be able to read and not
>write to both /usr/src and /usr/obj to be able to installworld and
>installkernel and have all mergemaster writes confined to a local
>/var/tmp/temproot.
I didn't really think it was the cause of the problem, just the only explanation
I knew of.

>My /etc/rc.conf doesn't have any flags for mountd.
>It is only a "YES" to enable.
If it is a recent install then, yes, it should be set. You should see the 
options
being used by typing "ps ax" on the server.

rick


Re: Mergemaster won't run on NFS mount

2017-11-22 Thread Rick Macklem
Thomas Laus wrote:
>I have been updating FreeBSD for years on my fastest computer and then
>NFS mounting /usr/src and /usr/obj to share with other PC's.  I just
>updated FreeBSD-CURRENT to 326070 and was able to install the kernel and
>world.  When I attempted to run mergemaster, I received the following
>error message:
>
>*** Creating the temporary root environment in /var/tmp/temproot
> *** /var/tmp/temproot ready for use
> *** Creating and populating directory structure in /var/tmp/temproot
>
> /bin/sh: cannot create freebsd.cf: Permission denied
Make sure mountd is running with the "-S" option on the nfs server. If not,
any mount operation done on the NFS server will result in EACCES failures
while /etc/exports is being reloaded.
(If it is running with "-S" I don't know why it would fail?)

rick
ps: "-S" should be the default now...


Re: NFSv3 issues with latest -current

2017-10-31 Thread Rick Macklem
Rodney W. Grimes wrote:
[stuff snipped]
> I wrote:
>> Btw, NFS often causes this because...
>> - Typically TSO is limited to a 64K packet (including TCP/IP and MAC 
>> headers).
>> - When NFS does reading/writing, it will do 64K + NFS, TCP/IP and MAC headers
>>   for an RPC (or a multiple of 64K like 128K).
>> --> This results in tcp_output() generating a 64K TSO segment followed by a
>>  small TCP segment (since another RPC message doesn;t usually end up
>>  queued quickly enough to fill in the rest of the second TCP segment).
>> - Also, at the end of file, you can get an RPC which is just under 64K 
>> including
>>   NFS and TCP/IP headers. (The drivers often broke when adding the MAC
>>   header bumped this case to > 64K.)
>>
>> Thanks go to Yuri for diagnosing this, rick
>
> Just a thought, not asking anyone to write one :-)
>
> It would be handy to have some sh(1) scripts that can exercise this bug
> case and have it readily avaliable to network driver authors for testing
> the tso (or other large segment) code.
You can't easily reproduce this from userland. It depends on the way NFS fills 
in
the mbuf chain for I/O RPCs. (iSCSI does something similar.)

However, if your shell script does an NFS mount and then writes/reads a
file just under 64K in size on the mount...
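Something along these lines might exercise the case (the server path, mount point, and interface details are placeholders; the file is sized just under 64K so the RPC lands near the TSO limit):

```shell
# Hypothetical reproduction sketch -- run as root on the client.
mount -t nfs server:/export /mnt
# Write a file just under 64K (65000 < 65536), then re-read it.
dd if=/dev/random of=/mnt/testfile bs=65000 count=1
# Remount to defeat the client's cache before reading back.
umount /mnt && mount -t nfs server:/export /mnt
dd if=/mnt/testfile of=/dev/null bs=65000
```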

rick


Re: NFSv3 issues with latest -current

2017-10-31 Thread Rick Macklem
Cy Schubert wrote:
[stuff snipped]
>The sysctl is net.inet.tcp.tso. You can also disable tso through ifconfig
>for an interface.
>
For testing this case, I'd recommend using the sysctl. Since the net device
driver is often the culprit, that device driver might not handle the "ifconfig"
correctly either.

Btw, NFS often causes this because...
- Typically TSO is limited to a 64K packet (including TCP/IP and MAC headers).
- When NFS does reading/writing, it will do 64K + NFS, TCP/IP and MAC headers
  for an RPC (or a multiple of 64K like 128K).
--> This results in tcp_output() generating a 64K TSO segment followed by a
 small TCP segment (since another RPC message doesn't usually end up
 queued quickly enough to fill in the rest of the second TCP segment).
- Also, at the end of file, you can get an RPC which is just under 64K including
  NFS and TCP/IP headers. (The drivers often broke when adding the MAC
  header bumped this case to > 64K.)

Thanks go to Yuri for diagnosing this, rick
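For reference, the two ways to disable TSO mentioned above look like this (the interface name "em0" is just an example):

```shell
# Disable TSO globally via the sysctl:
sysctl net.inet.tcp.tso=0
# Or per interface (the driver must honor the flag, which is exactly
# what is in doubt when debugging these problems):
ifconfig em0 -tso
```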




Re: NFSv3 issues with latest -current

2017-10-29 Thread Rick Macklem
Yuri Pankov wrote:
> All file operations (e.g. copying the file over NFSv3 for me) seem to be
> stuck running the latest -current (r325100).  Reverting just the kernel
> to r323779 (arbitrary chosen) seems to help.  I noticed the "Stale file
> handle when mounting nfs" message but I don't get the "stale file
> handle" messages from mount, probably as I'm not running any linux clients.
These kinds of problems are usually related to your net interface device
driver or the TCP stack.

A couple of things to try:
- Disable TSO (look for a sysctl with "tso" in it).
- Try using mount options rsize=32768,wsize=32768 to reduce the I/O
  size. Some device drivers don't handle long chains of mbufs well,
  especially when the size is near 64K.
(These issues have been fixed in current, but if a bug slips into a net driver
 update or ???)
- Look at recent changes to the net device driver you are using and try 
reverting
  those changes if you can do so.
- Capture packets and look at them in wireshark (which knows NFS) and see
  what is going on the wire.
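For the rsize/wsize suggestion above, the mount invocation would look roughly like this (server name and paths are placeholders):

```shell
# NFSv3 mount with 32K read/write sizes, so no single RPC approaches
# the 64K TSO limit that trips up some net drivers:
mount -t nfs -o nfsv3,rsize=32768,wsize=32768 server:/export /mnt
```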

There hasn't been any recent changes to NFS that should affect NFSv3 mounts
or to the kernel rpc, so I doubt the NFSv4.1 changes would be involved.

rick


Re: pfind_locked(pid) fails when in a jail?

2017-10-17 Thread Rick Macklem
Mateusz Guzik wrote:
[lots of stuff snipped]
> I proposed registration of per-process callbacks, not filtering.
> The code would just walk the list/table/whatever and call everything on
> it - they asked for it.
Yep, this would work for the NFSv4 client.
Way back when, all I did in OpenBSD was add a function pointer to "struct proc"
that was normally NULL, but set to a function in the NFS client when an NFSv4
Open was done for the process.

I suspect you'd want something like a linked list, so that multiple "users" 
could register callback functions upon exit or ...

rick


Re: pfind_locked(pid) fails when in a jail?

2017-10-16 Thread Rick Macklem
[stuff snipped]
> > >
> > pfind* does not do any filtering.
> >
Hmm, well I have no idea why the jailed mounts end up looping in here then.

> > The real question though is why are you calling it in the first place. The
> > calls
> > I grepped in nfscl_procdoesntexist are highly suspicious - there is no
> > guarantee
> > the process you found here is the same you had at the time you were saving
> > the pid.
> >
Long long ago (about 2002) this code was written for OpenBSD2.6. I added
a call from the kernel exit() code to do this. When I ported it to FreeBSD
around 2005, I didn't find any way for a process exit notification to be done,
so I created the first version of this code. (This way of doing it was first
coded for Mac OS X 10.3, if I recall correctly.)

Since it does check that the time of process creation is the same, it doesn't
seem likely that it would find a different process (ie. two processes with the
same pid that were created at the same time within the clock resolution of
that creation time seems highly unlikely in practice?).

> > There is no usable process exit notification right now, but it can be added
> > if necessary.
> >
If there was a way for the NFS client to register to get a notification that a
given process is terminating (exit'ng), that could certainly be used instead
of this code.

I wouldn't want a call for every exit(), but only the ones that have NFSv4 
opens.

>>
>> Does that mean there is something wrong with the existing eventhandler
>> notifications related to proc fork/exec/exit?
>>
>
>I already noted in the other mail that the current mechanism has
>avoidable global locking, lists traversals etc. But even with these
>issues fixed it calls everything for everyone.
>
>What's needed is a mechanism to register per-process callbacks. Details
>can be flamed over (e.g. should it require allocating a buffer per
>process or perhaps just one and then point to it from a resizable
>per-proc table when registered), but calling something which has nothing
>to do in almost all cases and from in a super inefficient way at that is
>definitely something we need to start cleaning up.
Yes, I would agree, although it doesn't explain what this CPU hog case is
caused by.

Thanks for the comments and if you create/commit the above, let me know
and I'll change the NFS client code to use it (if your patch doesn't do that).

rick



pfind_locked(pid) fails when in a jail?

2017-10-16 Thread Rick Macklem
Hi,

A problem w.r.t. the NFSv4 client's renew thread (nfscl) running up a lot of CPU
when the NFSv4 mount is in a jail has been reported to the freebsd-stable@
mailing list.

I know nothing about jails, but when looking at the code, the most obvious
cause of this would be "pfind_locked(pid)" failing to find a process.
- Will a jail affect how pfind_locked() behaves?
- If the answer is "yes", then I need to know how to either...
   1 - Make pfind_locked() work the same as when no jail exists.
   OR
   2 - A way for the Renew thread to determine that a jail will affect
 pfind_locked() behaviour, so it can avoid this problem.
#1 is preferred, since #2 may not be 100% correct, although #2 would allow the
code to behave well for most cases. (The exception is a case where a file 
remains
open for a long period of time, with different processes doing byte range locks 
on
the file.)

Thanks in advance for any help w.r.t. jail behaviour, rick


Re: RFC how to use kernel procs/threads efficiently

2017-10-10 Thread Rick Macklem
Julian Elischer wrote:
[stuff snipped]
>On 10/10/17 4:25 am, Rick Macklem wrote:
>> --> As such, having a fixed reasonable # of threads is probably the best
>>that can be done.
>>- The current patch has the # of threads as a sysctl with a default 
>> of 32.
>why not set it to ncpu or something?
Well, each of these threads will do an RPC, which means a couple of short
bursts of CPU and then sleep the rest of the time waiting for the RPC reply
to come back from the Data Server.
As such, it would seem to me that you would want a lot more threads than
CPUs on the machine?
However, setting the default to "N * ncpu" seems better than just a fixed "32"
to me. (For nfsd, the current default is 8 * ncpu, so maybe that is a good
default for this too?)
What do you think?

Thanks for the comment, rick



Re: RFC how to use kernel procs/threads efficiently

2017-10-09 Thread Rick Macklem


Ian Lepore wrote:
[stuff snipped]
>taskqueue(9) is an existing mechanism to enqueue functions to execute
>asynch using a pool of threads, but it doesn't answer the scalability
>questions.  In fact it may make them harder, inasmuch as I don't think
>there's a mechanism to dynamically adjust the number of threads after
>first calling taskqueue_start_threads().
I've coded it using taskqueue and it seems to work ok.
The patch is here, in case anyone would like to review it:
https://www.reviews.freebsd.org/D12632

I don't know what the overheads are (or even how to measure/compare
them), but I suspect it is less than a kproc_create()/kproc_exit() for every
RPC.

I also don't think having a fixed # of threads is a problem. Since NFS I/O
is so bursty, recent I/O activity doesn't give a good indication of how many
threads will be needed soon. In other words, it can go from no I/O to
heavy I/O and back to no I/O rapidly.
--> As such, having a fixed reasonable # of threads is probably the best
  that can be done.
  - The current patch has the # of threads as a sysctl with a default of 32.

Thanks for your comments and feel free to review it, if you'd like, rick


Re: RFC how to use kernel procs/threads efficiently

2017-10-07 Thread Rick Macklem
Ian Lepore wrote:
>On Fri, 2017-10-06 at 19:02 +0000, Rick Macklem wrote:
>> Hi,
>>
>> I have now dropped the client side of Flexible File Layout for pNFS into head
>> and I believe it is basically working.
>> Currently when talking to mirrored DS servers, it does the Write and Commit
>> RPCs to the mirrors serially. This works, but is inefficient w.r.t. elapsed
>> time to completion.
>>
>> To do them concurrently, I need separate kernel processes/threads to do them.
>> I can think of two ways to do this:
>> 1 - The code that I have running in projects/pnfs-planb-server for the pNFS 
>> server
>>   side does a kproc_create() to create a kernel process that does the 
>> RPC and
>>   then kproc_exit()s.
>>   - This was easy to code and works. However, I am concerned that there 
>> is
>> going to be excessive overheads from doing all the kproc_create()s 
>> and
>> kproc_exit()s?
>>Anyone know if these calls will result in large overheads?
>> 2 - I haven't coded this, but the other way I can think of to do this is to
>>   create a pool of threads (kthread_create() is sufficient in this case, 
>> I
>>   think?) and then hand each RPC to an available thread so it can do the 
>> RPC.
>>   - Other than a little more complex coding, the main issue I see with 
>> this one
>> is "How many threads and when to create more/less of them.".
>>
>> Anyhow, any comments w.r.t. the merits of either of the above approaches
>> (or a suggestion of other ways to do this) would be appreciated, rick
>
>taskqueue(9) is an existing mechanism to enqueue functions to execute
>asynch using a pool of threads, but it doesn't answer the scalability
>questions.  In fact it may make them harder, inasmuch as I don't think
>there's a mechanism to dynamically adjust the number of threads after
>first calling taskqueue_start_threads().
Hmm, yes. Thanks for the pointer. I hadn't read "man taskqueue" until now.
The kernel RPC doesn't use this and I suspect that it is because of what you
said w.r.t. dynamically adjusting the # of threads.
However, it does save "hand coding" the queues for #2 and I'm lazy (plus
don't believe reinventing the wheel is the best plan).

I think I will try using taskqueue and just have a sysctl for #of-threads.
(Actually most of the code ends up the same, because basically they all
 end up with a function with a single argument that does the RPC. The
 only difference is what call starts the RPC.)

Anyone else have comments? rick
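[For anyone curious what that would look like, here is a rough kernel-side
sketch of the taskqueue(9) pattern being discussed -- a fixed pool of threads
sized by a sysctl, one task per mirrored-DS RPC. Every name below (ncl_rpctq,
ncl_dotask, the vfs.nfs.iothreads sysctl) is hypothetical, not existing code:]

```c
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/sysctl.h>
#include <sys/taskqueue.h>

static struct taskqueue *ncl_rpctq;
static int ncl_iothreads = 4;	/* hypothetical tunable for the pool size */
SYSCTL_INT(_vfs_nfs, OID_AUTO, iothreads, CTLFLAG_RDTUN, &ncl_iothreads,
    0, "Number of threads for mirrored DS Write/Commit RPCs");

struct ncl_rpcarg {
	struct task	 ra_task;
	/* ... per-RPC state: which mirror, uio, cred, error return ... */
};

/* Runs in one of the pool threads; does the RPC against a single mirror. */
static void
ncl_dotask(void *arg, int pending __unused)
{
	struct ncl_rpcarg *ra = arg;

	/* The actual Write or Commit RPC against one DS would go here. */
	(void)ra;
}

static void
ncl_rpctq_init(void *dummy __unused)
{
	/* Standard pattern: pass &ncl_rpctq as the enqueue context. */
	ncl_rpctq = taskqueue_create("nfsclrpc", M_WAITOK,
	    taskqueue_thread_enqueue, &ncl_rpctq);
	taskqueue_start_threads(&ncl_rpctq, ncl_iothreads, PVM, "nfsclrpc");
}
SYSINIT(ncl_rpctq, SI_SUB_ROOT_CONF, SI_ORDER_ANY, ncl_rpctq_init, NULL);
```

[The caller would then do TASK_INIT(&ra->ra_task, 0, ncl_dotask, ra) and
taskqueue_enqueue(ncl_rpctq, &ra->ra_task) once per mirror, waiting for
completion via taskqueue_drain() or a condition variable. As noted above,
the one thing taskqueue(9) does not give you is growing/shrinking the pool
after taskqueue_start_threads().]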
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


RFC how to use kernel procs/threads efficiently

2017-10-06 Thread Rick Macklem
Hi,

I have now dropped the client side of Flexible File Layout for pNFS into head
and I believe it is basically working.
Currently when talking to mirrored DS servers, it does the Write and Commit
RPCs to the mirrors serially. This works, but is inefficient w.r.t. elapsed
time to completion.

To do them concurrently, I need separate kernel processes/threads to do them.
I can think of two ways to do this:
1 - The code that I have running in projects/pnfs-planb-server for the pNFS 
server
  side does a kproc_create() to create a kernel process that does the RPC 
and
  then kproc_exit()s.
  - This was easy to code and works. However, I am concerned that there is
going to be excessive overheads from doing all the kproc_create()s and
kproc_exit()s?
   Anyone know if these calls will result in large overheads?
2 - I haven't coded this, but the other way I can think of to do this is to
  create a pool of threads (kthread_create() is sufficient in this case, I
  think?) and then hand each RPC to an available thread so it can do the 
RPC.
  - Other than a little more complex coding, the main issue I see with this 
one
is "How many threads and when to create more/less of them.".

Anyhow, any comments w.r.t. the merits of either of the above approaches
(or a suggestion of other ways to do this) would be appreciated, rick


Re: panic in AcpiOsGetTimer during boot.

2017-10-02 Thread Rick Macklem
Conrad Meyer wrote:
> Are you running into the same issue reported on this svn-src thread?
> https://lists.freebsd.org/pipermail/svn-src-all/2017-September/151775.html
Yep, same panic().

> I believe jkim has reverted the change in a subsequent revision (r324136).
I'll try a post-r324136 kernel. If it still panics, I'll post again.

Thanks, rick
> Best,
> Conrad

On Sun, Oct 1, 2017 at 3:12 PM, Rick Macklem  wrote:
> Hi,
>
> I get the KASSERT panic in AcpiOsGetTimer() while booting a recent (2 day old)
> kernel. When I delete the KASSERT(), the kernel boots and seems to work ok.
> (This is the AcpiOsGetTimer() in sys/dev/acpica/Osd/OsdSchedule.c. There also
>  seems to be one of these functions under contrib.)
>
> Here is my dmesg after boot, if it helps indicate why this is called when 
> "cold"
> is still set. (I've marked where the dmesg ends when it Panics if the 
> KASSERT()
> is enabled.)
>
> dmesg after booting:
> Copyright (c) 1992-2017 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-CURRENT #1: Sun Oct  1 09:48:42 EDT 2017
> r...@nfsv4-newerlap.rick.home:/sub1/sys.headsep30/amd64/compile/GENERIC 
> amd64
> FreeBSD clang version 3.7.0 (tags/RELEASE_370/final 246257) 20150906
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(vga): resolution 640x480
> CPU: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (2294.84-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
>   
> Features=0xbfebfbff
>   
> Features2=0x7fbae3bf
>   AMD Features=0x28100800
>   AMD Features2=0x1
>   Structured Extended Features=0x281
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
> real memory  = 17179869184 (16384 MB)
> avail memory = 16518905856 (15753 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <_ASUS_ Notebook>
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
> random: unblocking device.
> arc4random: no preloaded entropy cache
> ioapic0  irqs 0-23 on motherboard
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #4 Launched!
>  dmesg ends here when the KASSERT() panics.
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1147419696 Hz quality 1000
> random: entropy device external interface
> netmap: loaded module
> [ath_hal] loaded
> module_register_init: MOD_LOAD (vesa, 0x80f779d0, 0) error 19
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> kbd1 at kbdmux0
> nexus0
> vtvga0:  on motherboard
> cryptosoft0:  on motherboard
> acpi0: <_ASUS_ Notebook> on motherboard
> acpi_ec0:  port 0x62,0x66 on acpi0
> acpi0: Power Button (fixed)
> cpu0:  on acpi0
> cpu1:  on acpi0
> cpu2:  on acpi0
> cpu3:  on acpi0
> cpu4:  on acpi0
> cpu5:  on acpi0
> cpu6:  on acpi0
> cpu7:  on acpi0
> hpet0:  iomem 0xfed0-0xfed003ff on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 950
> Event timer "HPET" frequency 14318180 Hz quality 550
> atrtc0:  port 0x70-0x77 irq 8 on acpi0
> atrtc0: Warning: Couldn't map I/O.
> atrtc0: registered as a time-of-day clock, resolution 1.00s
> Event timer "RTC" frequency 32768 Hz quality 0
> attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> pcib0:  port 0xcf8-0xcff on acpi0
> pci0:  on pcib0
> pcib1:  irq 16 at device 1.0 on pci0
> pci1:  on pcib1
> vgapci0:  port 0xe000-0xe07f mem 
> 0xf600-0xf6ff,0xf000-0xf3ff,0xf400-0xf5ff irq 16 at 
> device 0.0 on pci1
> vgapci0: Boot video device
> hdac0:  mem 0xf708-0xf7083fff irq 17 at 
> device 0.1 on pci1
> xhci0:  mem 0xf730-0xf730 irq 
> 16 at device 20.0 on pci0
> xhci0: 32 bytes context size, 64-bit DMA
> usbus0: waiting for BIOS to give up control
> xhci0: Port routing mask set to 0x
> usbus0 on xhci0
> usbus0: 5.0Gbps Super Speed USB v3.0
> pci0:  at device 22.0 (no driver attached)
> ehci0:  mem 0xf7318000-0xf73183ff irq 
>

panic in AcpiOsGetTimer during boot.

2017-10-01 Thread Rick Macklem
Hi,

I get the KASSERT panic in AcpiOsGetTimer() while booting a recent (2 day old)
kernel. When I delete the KASSERT(), the kernel boots and seems to work ok.
(This is the AcpiOsGetTimer() in sys/dev/acpica/Osd/OsdSchedule.c. There also
 seems to be one of these functions under contrib.)

Here is my dmesg after boot, if it helps indicate why this is called when "cold"
is still set. (I've marked where the dmesg ends when it Panics if the KASSERT()
is enabled.)

dmesg after booting:
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1: Sun Oct  1 09:48:42 EDT 2017
r...@nfsv4-newerlap.rick.home:/sub1/sys.headsep30/amd64/compile/GENERIC 
amd64
FreeBSD clang version 3.7.0 (tags/RELEASE_370/final 246257) 20150906
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (2294.84-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  
Features=0xbfebfbff
  
Features2=0x7fbae3bf
  AMD Features=0x28100800
  AMD Features2=0x1
  Structured Extended Features=0x281
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16518905856 (15753 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <_ASUS_ Notebook>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
arc4random: no preloaded entropy cache
ioapic0  irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #4 Launched!
 dmesg ends here when the KASSERT() panics.
SMP: AP CPU #5 Launched!
SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 1147419696 Hz quality 1000
random: entropy device external interface
netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x80f779d0, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
nexus0
vtvga0:  on motherboard
cryptosoft0:  on motherboard
acpi0: <_ASUS_ Notebook> on motherboard
acpi_ec0:  port 0x62,0x66 on acpi0
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
cpu4:  on acpi0
cpu5:  on acpi0
cpu6:  on acpi0
cpu7:  on acpi0
hpet0:  iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0:  port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.00s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  irq 16 at device 1.0 on pci0
pci1:  on pcib1
vgapci0:  port 0xe000-0xe07f mem 
0xf600-0xf6ff,0xf000-0xf3ff,0xf400-0xf5ff irq 16 at 
device 0.0 on pci1
vgapci0: Boot video device
hdac0:  mem 0xf708-0xf7083fff irq 17 at device 
0.1 on pci1
xhci0:  mem 0xf730-0xf730 irq 
16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0: waiting for BIOS to give up control
xhci0: Port routing mask set to 0x
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0:  at device 22.0 (no driver attached)
ehci0:  mem 0xf7318000-0xf73183ff irq 
16 at device 26.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
hdac1:  mem 0xf731-0xf7313fff irq 22 at 
device 27.0 on pci0
pcib2:  irq 16 at device 28.0 on pci0
pci2:  on pcib2
pcib3:  irq 17 at device 28.1 on pci0
pci3:  on pcib3
iwn0:  mem 0xf720-0xf7201fff irq 17 at 
device 0.0 on pci3
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
pcib4:  irq 19 at device 28.3 on pci0
pci4:  on pcib4
alc0:  port 0xd000-0xd07f mem 
0xf710-0xf713 irq 19 at device 0.0 on pci4
alc0: 11776 Tx FIFO, 12032 Rx FIFO
alc0: Using 1 MSI message(s).
miibus0:  on alc0
atphy0:  PHY 0 on miibus0
atphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 
1000baseT-FDX-master, auto, auto-flow
alc0: Using defaults for TSO: 65518/35/2048
alc0: Ethernet address: 10:bf:48:23:08:56
ehci1:  mem 0xf7317000-0xf73173ff irq 
23 at device 29.0 on

Re: Can't NFS mount ZFS volume

2017-10-01 Thread Rick Macklem
Danny Braniss wrote:
> Michael Butler wrote:
>> I have no idea why but using ..
>>
>> sudo /sbin/mount vm01:/usr/local/exports/ /mnt
>> .. instead of ..
>>
>> sudo /sbin/mount -t nfs vm01:/usr/local/exports/ /mnt
>
> the not working is :
> mount host:/path some-local-path
>
> which should default to -t nfs, and at least in 11.1 it works
> danny

My understanding of his post was the above worked and
 mount -t nfs host:/path some-local-path
did not work, when done via sudo. (Both seem to be fine when not
using "sudo".)

rick


Re: Can't NFS mount ZFS volume

2017-10-01 Thread Rick Macklem
Michael Butler wrote:
> I have no idea why but using ..
>
> sudo /sbin/mount vm01:/usr/local/exports/ /mnt
This is weird. I would have thought they would both result in the same
behaviour.
>  .. instead of ..
>
> sudo /sbin/mount -t nfs vm01:/usr/local/exports/ /mnt
Did this work with the older system?
I'll admit I always go "su" and then do the mount command as
# mount -t nfs vm01:/usr/local/exports /mnt
Please let us know if this doesn't work.
(If you try this and it doesn't work, then something is definitely broken.)

I don't even have sudo. (It's a port and my guess would be some issue
related to how either it or "mount" parses things.)

rick




Re: Can't NFS mount ZFS volume

2017-09-30 Thread Rick Macklem
I have only done two NFS commits within that range.
1 - A trivial one that adds two new arguments always specified as 0,
 which has no change in semantics.
2 - One that only affects NFSv4 during dismount, so it shouldn't affect
 an NFSv3 mount.

Some things to try:
- get rid of rpc.statd and rpc.lockd and do the mount with the "nolockd"
  option
- capture packets during the mount and look at them in wireshark, to
  see what is going on the wire.
- see if there are any patches applied to your net interface driver during
  that commit rev. range
- if you have any kind of firewall setup, get rid of that and see if that
   helps

Good luck with it, rick

From: Michael Butler 
Sent: Friday, September 29, 2017 9:09:35 PM
To: freebsd-current
Cc: rmack...@freebsd.org
Subject: Can't NFS mount ZFS volume

Both client and server have been upgraded from SVN r324033 to r324089

Now I can't mount a ZFS dataset over NFS :-(

imb@toshi:/home/imb> sudo /sbin/mount -t nfs vm01:/usr/local/exports/ /mnt

imb@toshi:/home/imb> mount
/dev/ada0s3a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
procfs on /proc (procfs, local)
linprocfs on /usr/compat/linux/proc (linprocfs, local)
linsysfs on /usr/compat/linux/sys (linsysfs, local)
vm01:/usr/local/exports on /mnt (nfs)

imb@toshi:/home/imb> sudo ls -l /mnt
ls: /mnt: Input/output error

The server shows the mount as being registered ..

imb@vm01:/home/imb> showmount -a
All mount points on localhost:
toshi.auburn.protected-networks.net:/usr/local/exports

It takes some time to complete an umount request but it does complete,

Any thoughts?

imb


anyone in the Boston area with time this week?

2017-09-24 Thread Rick Macklem
Hi,

I really doubt that there is anyone out there interested in doing this, but I 
figured
it can't hurt asking...
RedHat is hosting an NFSv4 testing event at their facility at
34 Littleton Rd
Westford, MA 01186
next week. There is no fee for attendance, but you need to physically be there
to help with testing. (They are actually fairly interesting events, with 
engineers from
various vendors testing their code.)

Anyhow, I have a pNFS server using Flexible Files Layout (and something called
"tightly coupled") that could be tested. You would basically need to show up 
with
a couple of FreeBSD systems (or VMs with different IP addresses) set up with 
this
server.

If anyone is interested, email and I can fill you in, rick
ps: And, yes, I  realize this is last minute. I just didn't think to email 
sooner.


adding flex file layout support to the pNFS client

2017-09-18 Thread Rick Macklem
Hi,

I now have a series of patches that adds Flex File layout support to the NFSv4 
client
for pNFS.
I am now thinking about how to get them into head.

1 - I could put them up on reviews.freebsd.org, but since they are purely NFS 
patches
 and there is no Flex file layout server to test against (except the one I 
have in
 a projects tree under subversion), I doubt anyone will want to review them.
2 - I could create another projects tree under subversion but, again, I doubt 
anyone will
  be able to test them and the result will be one large patch to merge into 
head.
3 - I can put them in head as a series of patches and then they will be usable 
for testing
  of the pNFS server in the projects area.
Some of these patches are fairly large, but they should not affect current 
operation of
the NFS client.

I am leaning towards #3, but thought I would ask others for comments w.r.t. how 
I
should do this?

Thanks for any comments, rick


Re: anyone had experience expanding uid_t and gid_t?

2017-08-22 Thread Rick Macklem
On 19/8/17 11:15 am, Julian Elischer wrote:
>> at $JOB there are clients where 32bits is starting to chafe.
>>
>> Has anyone expanded them?
>>
>Other than a few offline comments I haven't heard anyone directly 
>respond to this.
>Does anyone have any comments on feasibility or suggestions?
>NFSV3 will definitely be screwed..
Actually all NFS mounts that use AUTH_SYS (or all except Kerberized mounts, if 
you
prefer), so even most NFSv4 mounts will be broken.
Although NFSv4 uses strings for users and groups (called owner and owner_group),
the AUTH_SYS authentication header has a 32bit uid and a list of 32bit gids.

rick
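
[For reference, the AUTH_SYS credential Rick refers to is fixed on the wire
by the ONC RPC specification (RFC 5531); widening uid_t/gid_t beyond 32 bits
cannot be expressed in it without a protocol revision:]

```
struct authsys_parms {
	unsigned int stamp;
	string machinename<255>;
	unsigned int uid;	/* 32-bit uid */
	unsigned int gid;	/* 32-bit gid */
	unsigned int gids<16>;	/* up to 16 supplementary 32-bit gids */
};
```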


NFSv4 server configs may need nfsuserd_enable="YES"

2017-07-28 Thread Rick Macklem
As of r321665, an NFSv4 server configuration that supports NFSv4 Kerberos mounts
or NFSv4 clients that do not support the uid/gid in the owner/owner_group string
will need to have:
nfsuserd_enable="YES"
in the machine's /etc/rc.conf file.

The background to this is that the capability to put uid/gid #s in the 
owner/owner_group
strings is allowed for AUTH_SYS by RFC7530 (which replaced RFC3530, that didn't 
allow this).
Since Linux uses this capability by default, many NFSv4 server configurations 
no longer
need to run the nfsuserd daemon and, as such, forcing it to run did not make 
much sense.

For sites using the uid/gid in owner/owner_group string capability, the sysctls:
vfs.nfs.enable_uidtostring
vfs.nfsd.enable_stringtouid
should both be set to 1 in /etc/sysctl.conf.

Hopefully this small POLA violation will not cause you grief, rick
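
[Pulling the announcement together, a server that still needs the daemon
(Kerberized mounts, or clients that do not put uid/gid numbers in the
owner/owner_group strings) would carry something like the following; the
two sysctls are only for AUTH_SYS sites, as described above:]

```
# /etc/rc.conf
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"          # no longer implied as of r321665

# /etc/sysctl.conf -- only if clients put uid/gid numbers in
# owner/owner_group strings (Linux does this by default)
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1
```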



Re: small patch for /etc/rc.d/nfsd, bugfix or POLA violation?

2017-07-11 Thread Rick Macklem
Cy Schubert wrote:
>Rick Macklem wrote:
>> Hi,
>>
>> The attached one line patch to /etc/rc.d/nfsd modifies the script so that it
>> does not force the nfsuserd to be run when nfsv4_server_enable is set.
>> (nfsuserd can still be enabled via nfsuserd_enable="YES" in /etc/rc.conf.)
>>
>> Here's why I think this patch might be appropriate...
>> (a) - The original RFC for NFSv4 (RFC3530) essentially required Owners and
>>    Owner_groups to be specified as @ and this required
>>    the nfsuserd daemon to be running.
>> (b) - RFC7530, which replaced RFC3530, allows an Owner/Owner_group string to
>>    be the uid/gid number in a string when using AUTH_SYS. This simplifies
>>    configuration for an all AUTH_SYS/POSIX environment (most NFS uses, I
>>    suspect?).
>>
>> To make the server do (b), two things need to be done:
>> 1 - set vfs.nfsd.enable_stringtouid=1
>> 2 - set vfs.nfsd.enable_uidtostring=1 (for head, I don't know if it will
>>     be MFC'd?)
>> OR
>>   - never run nfsuserd after booting (killing it off after it has been
>>     running is not sufficient)
>>
>> Given the above, it would seem that /etc/rc.d/nfsd should not force running
>> of the nfsuserd daemon, due to changes in the protocol.
>>
>> However, this will result in a POLA violation, in that after the patch,
>> nfsuserd won't start when booting, unless nfsuserd_enable="YES" is added
>> to /etc/rc.conf.
>>
>> So, what do people think about this patch? rick
>
>How about a warning message + an UPDATING entry + no MFC? And, relnotes =
>yes to say we now support RFC7530 in 12.0?
Sounds fine to me. I'll wait to see if there are more comments.

Thanks, rick




small patch for /etc/rc.d/nfsd, bugfix or POLA violation?

2017-07-09 Thread Rick Macklem
Hi,

The attached one line patch to /etc/rc.d/nfsd modifies the script so that it
does not force the nfsuserd to be run when nfsv4_server_enable is set.
(nfsuserd can still be enabled via nfsuserd_enable="YES" in /etc/rc.conf.)

Here's why I think this patch might be appropriate...
(a) - The original RFC for NFSv4 (RFC3530) essentially required Owners and
   Owner_groups to be specified as @ and this required
   the nfsuserd daemon to be running.
(b) - RFC7530, which replaced RFC3530, allows an Owner/Owner_group string to be
  the uid/gid number in a string when using AUTH_SYS. This simplifies
  configuration for an all AUTH_SYS/POSIX environment (most NFS uses, I suspect?).

To make the server do (b), two things need to be done:
1 - set vfs.nfsd.enable_stringtouid=1
2 - set vfs.nfsd.enable_uidtostring=1 (for head, I don't know if it will be 
MFC'd?)
OR
  - never run nfsuserd after booting (killing it off after it has been running 
is not
sufficient)
  
Given the above, it would seem that /etc/rc.d/nfsd should not force running of
the nfsuserd daemon, due to changes in the protocol.

However, this will result in a POLA violation, in that after the patch, 
nfsuserd won't
start when booting, unless nfsuserd_enable="YES" is added to /etc/rc.conf.

So, what do people think about this patch? rick

nfsd-rcd.patch

adding extern maxbcachebuf to param.h

2017-06-18 Thread Rick Macklem
My recent commit (r320062) broke the arm build when it added
extern int maxbcachebuf;
to sys/param.h. Although I don't understand the actual failure, I believe
it is caused by arm/arm/elf_note.S including param.h and then using the
ELFNOTE() macro.

As a temporary fix, I have committed r320070, which removes the definition
from sys/param.h.
This brings me to the question of how best to fix this?
1 - Just leave it the way it is now, where "extern int maxbcachebuf" isn't 
defined
 in a generic include file and needs to be defined as above before use.
2 - Add "!defined(LOCORE)" to the definition of it in sys/param.h, which I 
believe
 will also fix the problem.
3 - Put it in some other sys/*.h file which never gets included in assembler 
files.
 What .h would be appropriate?

Once I have answers to the above, I can update the fix.
Thanks, rick
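
[For the record, option 2 above would amount to something like this in
sys/param.h, so that assembler files which include the header, such as
arm/arm/elf_note.S, never see the C declaration:]

```c
#if !defined(LOCORE)
extern int maxbcachebuf;
#endif
```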


Re: post ino64: lockd no runs?

2017-06-04 Thread Rick Macklem
Just fyi, rpc.lockd isn't NFS. It is a separate protocol Sun called NLM and
I didn't ever implement it for what I believed were good reasons.
Hopefully someone who works with the code can help.
Btw, if you don't need the locks visible to multiple clients concurrently, you can
just do your mounts with "nolockd" and avoid running it.
Or, you can switch to NFSv4, which does a reasonable job of implementing
locking.
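
[As a concrete illustration of those two alternatives (server name and export
path are placeholders), the /etc/fstab entries would look like:]

```
# NFSv3 without rpc.lockd -- locks stay local to this client:
server:/export  /mnt  nfs  rw,nfsv3,nolockd  0  0

# or NFSv4, where byte-range locking is part of the protocol:
server:/export  /mnt  nfs  rw,nfsv4  0  0
```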

I might take a look at the code, in case I can spot something obvious, but I
won't bet on it.

rick


From: Michael Butler 
Sent: Sunday, June 4, 2017 8:57:44 AM
To: freebsd-current; Rick Macklem
Subject: post ino64: lockd no runs?

It seems that {rpc.}lockd no longer runs after the ino64 changes on any
of my systems after a full rebuild of src and ports. No log entries
offer any insight as to why :-(

imb


Re: Time to increase MAXPHYS?

2017-06-04 Thread Rick Macklem
There is an array in aio.h sized on MAXPHYS as well.

A simpler possibility might be to leave MAXPHYS as a compile
time setting, but allow it to be set "per arch" and make it bigger
for amd64.

Good luck with it, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Konstantin Belousov 
Sent: Sunday, June 4, 2017 4:10:32 AM
To: Warner Losh
Cc: Allan Jude; FreeBSD Current
Subject: Re: Time to increase MAXPHYS?

On Sat, Jun 03, 2017 at 11:28:23PM -0600, Warner Losh wrote:
> On Sat, Jun 3, 2017 at 9:55 PM, Allan Jude  wrote:
>
> > On 2017-06-03 22:35, Julian Elischer wrote:
> > > On 4/6/17 4:59 am, Colin Percival wrote:
> > >> On January 24, 1998, in what was later renumbered to SVN r32724, dyson@
> > >> wrote:
> > >>> Add better support for larger I/O clusters, including larger physical
> > >>> I/O.  The support is not mature yet, and some of the underlying
> > >>> implementation
> > >>> needs help.  However, support does exist for IDE devices now.
> > >> and increased MAXPHYS from 64 kB to 128 kB.  Is it time to increase it
> > >> again,
> > >> or do we need to wait at least two decades between changes?
> > >>
> > >> This is hurting performance on some systems; in particular, EC2 "io1"
> > >> disks
> > >> are optimized for 256 kB I/Os, EC2 "st1" (throughput optimized
> > >> spinning rust)
> > >> disks are optimized for 1 MB I/Os, and Amazon's NFS service (EFS)
> > >> recommends
> > >> using a maximum I/O size of 1 MB (and despite NFS not being *physical*
> > >> I/O it
> > >> seems to still be limited by MAXPHYS).
> > >>
> > > We increase it in freebsd 8 and 10.3 on our systems,  Only good results.
> > >
> > > sys/sys/param.h:#define MAXPHYS (1024 * 1024)   /* max raw I/O
> > > transfer size */
> > >
> > > ___
> > > freebsd-current@freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@
> > freebsd.org"
> >
> > At some point Warner and I discussed how hard it might be to make this a
> > boot time tunable, so that big amd64 machines can have a larger value
> > without causing problems for smaller machines.
> >
> > ZFS supports a block size of 1mb, and doing I/Os in 128kb negates some
> > of the benefit.
> >
> > I am preparing some benchmarks and other data along with a patch to
> > increase the maximum size of pipe I/O's as well, because using 1MB
> > offers a relatively large performance gain there as well.
> >
>
> It doesn't look to be hard to change this, though struct buf depends on
> MAXPHYS:
> struct  vm_page *b_pages[btoc(MAXPHYS)];
> and b_pages isn't the last item in the list, so changing MAXPHYS at boot
> time would cause an ABI change. IMHO, we should move it to the last element
> so that wouldn't happen. IIRC all buf allocations are from a fixed pool.
> We'd have to audit anybody that creates one on the stack knowing it will be
> persisted. Given how things work, I don't think this is possible, so we may
> be safe. Thankfully, struct bio doesn't seem to be affected.
>
> As for making it boot-time configurable, it shouldn't be too horrible with
> the above change. We should have enough of the tunables mechanism up early
> enough to pull this in before we create the buf pool.
>
> Netflix runs MAXPHYS of 8MB. There's issues with something this big, to be
> sure, especially on memory limited systems. Lots of hardware can't do this
> big an I/O, and some drivers can't cope, even if the underlying hardware
> can. Since we don't use such drivers at work, I don't have a list handy
> (though I think the SG list for NVMe limits it to 1MB). 128k is totally
> reasonable bump by default, but I think going larger by default should be
> approached with some caution given the overhead that adds to struct buf.
> Having it be a run-time tunable would be great.
The most important side-effect of bumping MAXPHYS as high as you did,
which is somewhat counter-intuitive and also probably does not matter
for typical Netflix cache box load (as I understand it) is increase of
fragmentation for UFS volumes.

MAXPHYS limits the max cluster size, and larger the cluster we trying to
build, larger is the probability of failure.  We might end with single-block
writes more often, defeating reallocblk defragmenter.  This might be
somewhat theoretical, and probably can be mitigated in the clustering code
if real, but it is a thing to look at.

WRT making the MAXPHYS tunable, I do not like the proposal of converting
b_pages[] into the flexible array.  I think that making b_pages a pointer
to off-structure page run is better.  One of the reason is that buf cache
buffers are not only buffers in the system.  There are several cases where
the buffers are malloced, like markers for iterating queues.  In this case,
b_pages[] can be eliminated at all.  (I believe I changed all local
struct bufs to be allocated with malloc).

Another n

Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-06-03 Thread Rick Macklem
Colin Percival wrote:
>On 05/28/17 13:16, Rick Macklem wrote:
>> cperciva@ is running a highly parallelized buuildworld and he sees better
>> slightly better elapsed times and much lower system CPU for SCHED_ULE.
>>
>> As such, I suspect it is the single threaded, processes mostly sleeping 
>> waiting
>> for I/O case that is broken.
>> I suspect this is how many people use NFS, since a highly parallelized make 
>> would
>> not be a typical NFS client task, I think?
>
>Running `make buildworld -j36` on an EC2 "c4.8xlarge" instance (36 vCPUs, 60
>GB RAM, 10 GbE) with GENERIC-NODEBUG, ULE has a slight edge over 4BSD:
>
>GENERIC-NODEBUG, SCHED_4BSD:
>1h14m12.48s real    6h25m44.59s user    1h4m53.42s sys
>1h15m25.48s real    6h25m12.20s user    1h4m34.23s sys
>1h13m34.02s real    6h25m14.44s user    1h4m09.55s sys
>1h13m44.04s real    6h25m08.60s user    1h4m40.21s sys
>1h14m59.69s real    6h25m53.13s user    1h4m55.20s sys
>1h14m24.00s real    6h24m59.29s user    1h5m37.31s sys
>
>GENERIC-NODEBUG, SCHED_ULE:
>1h13m00.61s real    6h02m47.59s user    26m45.89s sys
>1h12m30.18s real    6h01m39.97s user    26m16.45s sys
>1h13m08.43s real    6h01m46.94s user    26m39.20s sys
>1h12m18.94s real    6h02m26.80s user    27m39.71s sys
>1h13m21.38s real    6h00m46.13s user    27m14.96s sys
>1h12m01.80s real    6h02m24.48s user    27m18.37s sys
>
>Running `make buildworld -j2` on an EC2 "m4.large" instance (2 vCPUs, 8 GB RAM,
>~ 500 Mbps network), 4BSD has a slight edge over ULE on real and sys
>time but is slightly worse on user time:
>
>GENERIC-NODEBUG, SCHED_4BSD:
>6h29m25.17s real    7h2m56.02s user    14m52.63s sys
>6h29m36.82s real    7h2m58.19s user    15m14.21s sys
>6h28m27.61s real    7h1m38.24s user    14m56.91s sys
>6h27m05.42s real    7h1m38.57s user    15m04.31s sys
>
>GENERIC-NODEBUG, SCHED_ULE:
>6h34m19.41s real    6h59m43.99s user    18m8.62s sys
>6h33m55.08s real    6h58m44.91s user    18m4.31s sys
>6h34m49.68s real    6h56m03.58s user    17m49.83s sys
>6h35m22.14s real    6h58m12.62s user    17m52.05s sys
Doing these test runs on the 36 vCPU system would be closer to what I was
testing. My tests do not use "-j" and run on an 8-core chunk of real hardware.

>Note that in both cases there is lots of idle time (although far more in the
>-j36 case); this is partly due to a lack of parallelism in buildworld, but
>largely due to having /usr/obj mounted on Amazon EFS.
>
>These differences all seem within the range which could result from cache
>effects due to threads staying on one CPU rather than bouncing around; so
>whatever Rick is tripping over, it doesn't seem to be affecting these tests.

Yep. Thanks for doing the testing, rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


patch that makes max buffer cache block size tunable for review

2017-05-30 Thread Rick Macklem
Hi,

I just put a patch here:
https://reviews.freebsd.org/D10991
that makes the maximum size of a buffer cache block a tunable.

This allows the NFS client to use larger I/O sized RPCs.
By default, the NFS client will use the largest I/O size possible.
What is actually in use can be checked via "nfsstat -m".

Anyone interested in reviewing and/or testing this, please do so, rick
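For anyone trying the patch, usage would look something like the following;
the tunable name (vfs.maxbcachebuf) matches what I believe was eventually
committed, but treat it as an assumption and verify with sysctl on your
system:

```
# /boot/loader.conf: raise the max buffer cache block size to 128K
vfs.maxbcachebuf="131072"
```

After rebooting and remounting, "nfsstat -m" shows the rsize/wsize the client
actually negotiated with the server.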


Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-05-28 Thread Rick Macklem
I wrote:
[stuff snipped]
> So, I'd say either reverting the patch or replacing it with the "obvious
> change" mentioned in the commit message will at least mostly fix the problem.
"mostly fix" was probably a bit optimistic. Here's my current #s.
(All cases are the same single threaded kernel build, same hardware, etc. The
 only changes are recent vs 1yr old head kernel and what is noted.)

- 1yr old kernel, SMP, SCHED_ULE        94 minutes
- 1yr old kernel, no SMP, SCHED_ULE    111 minutes

- recent kernel, SMP, SCHED_4BSD       104 minutes
- recent kernel, no SMP, SCHED_ULE     113 minutes
- recent kernel, SMP, SCHED_ULE,
    r312426 reverted                   122 minutes
- recent kernel, SMP, SCHED_ULE        148 minutes

So, reverting r312426 only gets rid of about 1/2 of the degradation.
One more thing I will note is that the system CPU is higher for the cases that
run with lower/better elapsed times:
- 1yr old kernel, SMP, SCHED_ULE       545s
- 1yr old kernel, no SMP, SCHED_ULE    293s
- recent kernel, no SMP, SCHED_ULE     292s
- recent kernel, SMP, SCHED_ULE        466s

cperciva@ is running a highly parallelized buildworld and he sees slightly
better elapsed times and much lower system CPU for SCHED_ULE.

As such, I suspect it is the single threaded, processes mostly sleeping waiting
for I/O case that is broken.
I suspect this is how many people use NFS, since a highly parallelized make
would not be a typical NFS client task, I think?

There are other changes to sched_ule.c in the last year, but I'm not sure which
would be easy to revert and might make a difference in this case?

rick
ps: I've cc'd cperciva@ and he might wish to report his results. I am hoping
    he does try a make without "-j" at some point.


Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-05-27 Thread Rick Macklem
I wrote:
>To briefly summarize the previous post related to perf. degradation when
>running a recent kernel...
>- kernel build running 1yr old kernel took    100 minutes
>- same kernel build running recent kernel     148 minutes
>(ie. Almost a 50% degradation.)
>As noted in the last post, I got rid of most of the degradation by disabling
>SMP.
>
>- same kernel build running recent kernel with SCHED_4BSD   104 minutes
>
After poking at this some more, it appears that r312426 is the main cause of
this degradation.
Doing SMP enabled test runs using SCHED_ULE running the recent kernel, I got:
- recent kernel (as above)     148 minutes
- with r312426 reverted        122 minutes
- with the "obvious change" mentioned in r312426's commit message, using
  (flags & SW_TYPE_MASK) == SWT_RELINQUISH instead of (flags & SWT_RELINQUISH)
                               121 minutes

So, I'd say either reverting the patch or replacing it with the "obvious
change" mentioned in the commit message will at least mostly fix the problem.

I actually suspect that setting "preempt" for SWT_IDLE and/or SWT_IWAIT is what
is needed to restore the pre-r312426 performance, since those are the ones that
SWT_RELINQUISH doesn't match. (There is also SWT_PREEMPT, but that was
handled by the r312426 patch.)
I also tested:
((flags & SW_PREEMPT) != 0 || (flags & SW_TYPE_MASK) == SWT_IDLE ||
  (flags & SW_TYPE_MASK) == SWT_IWAIT)
and it also resulted in 121 minutes.

I still get better perf. from SCHED_4BSD of 104minutes, but I usually see better
performance for SCHED_4BSD, so I think this is expected.

I know nothing about SCHED_ULE, so I don't think I can do more, unless someone
wants me to try a different patch?

rick


NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-05-26 Thread Rick Macklem
To briefly summarize the previous post related to perf. degradation when
running a recent kernel...
- kernel build running 1yr old kernel took    100 minutes
- same kernel build running recent kernel     148 minutes
(ie. Almost a 50% degradation.)
As noted in the last post, I got rid of most of the degradation by disabling
SMP.

- same kernel build running recent kernel with SCHED_4BSD   104 minutes

I have now found I can get rid of almost all of the degradation by building the
recent kernel with
options SCHED_4BSD
instead of
options SCHED_ULE

The 1yr old kernel was built with SCHED_ULE, so the degradation is some change
to the kernel since Apr. 12, 2016 that affects SCHED_ULE but not SCHED_4BSD.

Any ideas?

Since GENERIC uses SCHED_ULE, it would be nice to see good perf. with that 
option.
However, recommending "options SCHED_4BSD" is nicer than recommending disabling
SMP;-)

rick
ps: I tried the 1yr old net driver in the recent kernel and it had no effect
    on perf.


Re: NFS client performance degradation when SMP enabled

2017-05-25 Thread Rick Macklem
Nope, it's an alc and the driver has very few changes between the old and
new kernel (a change in the DMA channel from 3 to 4, whatever that means?).

rick

From: Ryan Stone 
Sent: Wednesday, May 24, 2017 8:12:54 PM
To: Rick Macklem
Cc: freebsd-current@freebsd.org
Subject: Re: NFS client performance degradation when SMP enabled

What type of network interface do you have?  The Intel 1G (em and igb) were
switched over to the "iflib" framework a few months ago and that could be the
cause.


NFS client performance degradation when SMP enabled

2017-05-24 Thread Rick Macklem
Without boring you with too much detail, I have been doing development/testing
of pNFS stuff (mostly server side) on a 1 year old kernel (Apr. 12, 2016).
When I recently carried the code across to a recent kernel, everything seemed
to work, but performance was much slower.
After some fiddling around, it appears to be on the NFS client side and nothing
in the NFS client code seemed to be causing it. (RPC counts were almost exactly
the same, for example. I tried reverting r316532 and disabling
vfs.nfs.use_buf_pager. Neither made a significant difference.)

I made most of the performance degradation go away by disabling SMP on the
client.
Here are some elapsed times for kernel builds with everything the same except
for which kernel and SMP enabled/disabled (amd64 client machine).
1 year old kernel, SMP enabled  - 100 minutes
recent kernel, SMP disabled     - 113 minutes
recent kernel, SMP enabled      - 148 minutes
(The builds were all of the same kernel sources. When I say "1 year old" vs
 "recent" I am referring to which kernel was booted for the test run.)

All I can think of is that some change in the last year has resulted in an
increase in something like interrupt latency or context switch latency that
has caused this?

Anyone have an idea what this might be caused by or any tunables to fool with
beyond disabling SMP (which I suspect won't be a popular answer to "how to fix
slow NFS";-).

I haven't yet tried fiddling with interrupt moderation on the net interface,
but the tests all used the same settings.

rick


Re: more default uid/gid for NFS in mountd

2017-05-13 Thread Rick Macklem
Slawa Olhovchenkov wrote:
>Rick Macklem wrote:
>> Hi,
>>
>> Five years ago (yea, it slipped through a crack;-), Slawa reported that files
>> created by root would end up owned by uid 2**32-2 (-2 as uint32_t).
>> This happens if there is no "-maproot=" in the /etc/exports line.
>>
>> The cause is obvious. The value is set to -2 by default.
>>
>> The question is... Should this be changed to 65534 (ie "nobody")?
>> - It would seem more consistent to make it the uid of nobody, but I can
>>   also see the argument that since it has been like this *forever*, that
>>   changing it would be a POLA violation.
>> What do others think?
>
>IMHO uid 2**32-2 is POLA violation.
>Nobody expect this uid. Too much number. This is like bug.
This is what I have just committed. Thanks for the comments.

>> It is also the case that mountd.c doesn't look "nobody" up in the password
>> database to set the default. It would be nice to do this, but it could
>> result in the mountd daemon getting "stuck" during a boot waiting for an
>> unresponsive LDAP service or similar.
>> Does doing this sound like a good idea?
>
>This is (stuck at boot) already do for case of using NIS and nfsuserd.
There is a difference here. nfsuserd maps between uids/names, so it can't work
without the password database.
mountd can work without the password database, so I held off on doing this
for now.

>I am regular see this for case of DNS failed at boot.
>You offer don't impair current behaviour.
As an aside, if you have the critical entries in the local files (/etc/hosts,
/etc/passwd, /etc/group) and then tell the libraries to search these first in
/etc/nsswitch.conf, then you usually avoid this problem.

Thanks for the comments, rick



more default uid/gid for NFS in mountd

2017-05-08 Thread Rick Macklem
Hi,

Five years ago (yea, it slipped through a crack;-), Slawa reported that files
created by root would end up owned by uid 2**32-2 (-2 as uint32_t).
This happens if there is no "-maproot=" in the /etc/exports line.

The cause is obvious. The value is set to -2 by default.

The question is... Should this be changed to 65534 (ie "nobody")?
- It would seem more consistent to make it the uid of nobody, but I can also
  see the argument that since it has been like this *forever*, that changing
  it would be a POLA violation.
What do others think?

It is also the case that mountd.c doesn't look "nobody" up in the password
database to set the default. It would be nice to do this, but it could result
in the mountd daemon getting "stuck" during a boot waiting for an unresponsive
LDAP service or similar.
Does doing this sound like a good idea?

Thanks for any comments, rick
ps: Here's the original email thread, in case you are interested:
  https://lists.freebsd.org/pipermail/freebsd-stable/2012-March/066868.html



Re: Recent FreeBSD, NFSv4 and /var/db/mounttab

2017-05-07 Thread Rick Macklem
Claude Buisson wrote:
>On 05/07/2017 21:09, Rick Macklem wrote:
>> Claude Buisson wrote:
>>> Hi,
>>>
>>> Last month, I started switching all my systems (stable/9, stable/10,
>>> stable/11 and current) to NFSv4, and I found that:
>>>
>>>   on current (svn 312652) an entry is added to /var/db/mounttab by
>>> mount_nfs(8), but not suppressed by umount(8). It can be suppressed by
>>> rpc.umntall(8).
>>>
>>> The same anomaly appears on stable/11 after upgrading to svn 312950.
>>>
>>> It is relatively easy to trace this anomaly to r308871 on current and
>>> its MFHs (r309517 for stable/11).
>>>
>>> Patching sbin/umount/umount.c to restore the RPC call for NFSv4 makes
>>> umount(8) suppress the mounttab entry as before.
>>>
>>> I do not know what is the proper solution, as suppressing the
>>> modification of mounttab by mount_nfs(8) for NFSv4 could be an (more
>>> complicated) alternative !
I chose this alternative, since NFSv4 has nothing to do with the Mount protocol.
A one line patch to do this is now committed to head as r317931.

Thanks for reporting this, rick
[stuff snipped]


Re: Recent FreeBSD, NFSv4 and /var/db/mounttab

2017-05-07 Thread Rick Macklem
Claude Buisson wrote:
[stuff snipped]
> This is really an long delayed answer !!
Just made it to the top of my "to do" list...
> 1) I am afraid of a confusion on your side between mounttab which is
> managed on the CLIENT, and mountdtab which is managed of the SERVER.
Ok, now that I've looked, I see what you are talking about. To be honest, I
never knew this file even existed (it doesn't on the systems I run, since it
has never been created on them;-).

> 2) Since my first mail, I patched mount_nfs(8) (client side) not to
> write an entry in mounttab in the NFSv4 case. But:
Yes, I would say all that is needed is that the call to add_mtab() in
mount_nfs.c be made conditional on a non-NFSv4 mount.

Thanks for reporting this, rick




Re: Recent FreeBSD, NFSv4 and /var/db/mounttab

2017-05-07 Thread Rick Macklem
Claude Buisson wrote:
>Hi,
>
>Last month, I started switching all my systems (stable/9, stable/10,
>stable/11 and current) to NFSv4, and I found that:
>
>   on current (svn 312652) an entry is added to /var/db/mounttab by
>mount_nfs(8), but not suppressed by umount(8). It can be suppressed by
>rpc.umntall(8).
>
>The same anomaly appears on stable/11 after upgrading to svn 312950.
>
>It is relatively easy to trace this anomaly to r308871 on current and
>its MFHs (r309517 for stable/11).
>
>Patching sbin/umount/umount.c to restore the RPC call for NFSv4 makes
>umount(8) suppress the mounttab entry as before.
>
>I do not know what is the proper solution, as suppressing the
>modification of mounttab by mount_nfs(8) for NFSv4 could be an (more
>complicated) alternative !
When I do an NFSv4 mount from a recent FreeBSD system, it does not use the
Mount protocol. I am not sure why your NFSv4 mounts are putting an entry in
mounttab, since that is done by mountd.c on the server and the client isn't even
contacting it?

rick


default uid/gid for nfsuserd.c

2017-04-21 Thread Rick Macklem
Hi,

I just added GID_NOGROUP to sys/conf.h and fixed the initial values for
nobody/nogroup in the kernel.
However, UID_NOBODY and GID_NOGROUP are in the _KERNEL section of
sys/conf.h, so they aren't visible in userland.

So, how to I set the initial uid/gid values for nfsuserd.c?
(nfsuserd.c looks for entries in the password and group databases, so these
 defaults only get used if it doesn't find an entry in the database.)
I don't mind using the hardcoded values, but is there a better way?
(Moving the #define's out of the _KERNEL section maybe?)

rick


kernel coding of nobody/nogroup

2017-04-21 Thread Rick Macklem
Hi,

I need to set the default uid/gid values for nobody/nogroup into kernel
variables. I reverted the commit that hardcoded them, since I agree that
wasn't a good thing to do.

I didn't realize that "nobody" was already defined in sys/conf.h and I can
use that.

There is no definition for "nogroup" in sys/conf.h.
Would it be ok to add
#define GID_NOGROUP  65533
to sys/conf.h?
(I know bde@ doesn't like expressing this as 65533, but that is what it is
 in /etc/group.)

rick
ps: These values are usually set by nfsuserd(8), but need to be initialized
    for the case where it is not being run. The default uid/gid in nfsuserd.c
    needs to be fixed too, although they only get used if there isn't an
    entry for nobody/nogroup in the password/group database.


Re: process killed: text file modification

2017-04-16 Thread Rick Macklem
Julian Elischer wrote:
On 13/4/17 5:45 am, Rick Macklem wrote:
> I have just committed a patch to head (r316745) which should fix this.
> (It includes code to handle the recent change to head to make the pageouts
>   write through the buffer cache.)
>
> It will be MFC'd and should be in 11.1.

> is there any relevance of this change to stable/10
Yes, I do plan on MFC'ng this to stable/10. If Kostik doesn't MFC the
VOP_PUTPAGES() changes, I'll just leave the ncl_flush() call out of
nfs_set_text().

When I noted it would be in 11.1, I didn't intend to imply that it would be
MFC'd to stable/11 only.

rick


Re: process killed: text file modification

2017-04-12 Thread Rick Macklem
I have just committed a patch to head (r316745) which should fix this.
(It includes code to handle the recent change to head to make the pageouts
 write through the buffer cache.)

It will be MFC'd and should be in 11.1.

Thanks everyone, for your help with this, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Rick Macklem 
Sent: Friday, March 24, 2017 4:14:45 PM
To: Konstantin Belousov
Cc: Gergely Czuczy; Dimitry Andric; Ian Lepore; FreeBSD Current
Subject: Re: process killed: text file modification

I can't do commits until I get home in mid-April.

That's why it will be waiting until then.

It should make it into stable/11 in plenty of time for 11.1.

Thanks for your help with this, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Konstantin Belousov 
Sent: Friday, March 24, 2017 3:01:41 AM
To: Rick Macklem
Cc: Gergely Czuczy; Dimitry Andric; Ian Lepore; FreeBSD Current
Subject: Re: process killed: text file modification

On Thu, Mar 23, 2017 at 09:39:00PM +0000, Rick Macklem wrote:
> Try whatever you like. However, if the case that failed before doesn't
> fail now, I'd call the test a success.
>
> Thanks, rick
> ps: It looks like Kostik is going to work on converting the NFS
>     vop_putpages() to using the buffer cache. However, if this isn't ready
>     for head/current by mid-April, I will commit this patch to help fix
>     things in the meantime.

I do not see a reason to wait for my work before committing your patch.
IMO, instead, it should be committed ASAP and merged into stable/11 for
upcoming 11.1.

I will do required adjustments if/when _putpages() patch progresses enough.


Re: NFSv2 boot & OLD_NFSV2

2017-03-26 Thread Rick Macklem
Just in case it wasn't clear, I think this is a good idea and I think
you have a handle on any potential problems.

Good luck with it, rick

From: Toomas Soome 
Sent: Tuesday, March 21, 2017 5:04:59 AM
To: Daniel Braniss
Cc: Baptiste Daroussin; Rick Macklem; FreeBSD Current
Subject: Re: NFSv2 boot & OLD_NFSV2

On 21. märts 2017, at 10:50, Daniel Braniss <da...@cs.huji.ac.il> wrote:


On 21 Mar 2017, at 10:13, Baptiste Daroussin <b...@freebsd.org> wrote:

On Tue, Mar 21, 2017 at 09:58:21AM +0200, Daniel Braniss wrote:

On 20 Mar 2017, at 23:55, Toomas Soome <tso...@me.com> wrote:


On 20. märts 2017, at 23:53, Rick Macklem <rmack...@uoguelph.ca> wrote:

Baptiste Daroussin wrote:
On Mon, Mar 20, 2017 at 08:22:12PM +0200, Toomas Soome wrote:
Hi!

The current boot code is building NFSv3, with preprocessor conditional 
OLD_NFSV2. Should NFSv2 code still be kept around or can we burn it?

rgds,
toomas

I vote burn

Bapt
I would be happy to see NFSv2 go away. However, depending on how people
configure their diskless root fs, they do end up using NFSv2 for their
root fs.

Does booting over NFSv3 affect this?

I think the answer is no for a FreeBSD server (since the NFSv2 file handle is
the same as the NFSv3 one, except padded with 0 bytes to 32 bytes long).
However, there might be non-FreeBSD NFS servers where the NFSv2 file handle is
different from the NFSv3 one and for that case, the user would need NFSv2 boot
code (or reconfigure their root fs to use NFSv3).

To be honest, I suspect few realize that they are using NFSv2 for their root
fs. (They'd see it in a packet trace or via "nfsstat -m", but otherwise they
probably think they are using NFSv3 for their root fs.)

rick

if they do not suspect, they most likely use v3 - due to the simple fact that
you have to rebuild the loader to use NFSv2 - it is a compile-time option.


old systems, 8.x, still use/boot v2, and so do old Linux systems.
NetApp has discontinued support for v2, so we had to move these machines to
use a FreeBSD server and the day was saved. So, till these machines get
upgraded/discontinued we have a problem. There are several solutions to this
issue, but as long as it's a matter of getting rid of it for the sake of it,
I would vote to keep it a while longer.

danny


Given you are speaking of 8.x I suppose you are using the loader that comes
with it, meaning you are safe if we remove it from the loader in 12.0 (note,
as said by Toomas, that it is already off by default in the 12.0 loader). Am
I missing something?


as usual, did not read the whole thread, I assumed - wrongly - that support
for v2 would be discontinued.
removing v2 support from the boot process is fine! great, go for it. It will
only involve newer hosts, and simplifying the boot process is always a good
idea.

sorry for the noise.
danny



yes, just to clarify: the current loader code (in current) has the NFS code
implemented as:

#ifdef OLD_NFSV2

v2 implementation is here

#else

v3 implementation is here

#endif

Which means that pxeboot/loader.efi is built by default to use v3 only, but
we do have two parallel implementations of the NFS readers. And yes, the
question is just about the boot loader reader code (we do not implement NFS
writes) - and we are *not* talking about the server side here.

Indeed it also is possible to merge those two implementations, but to be
honest, I see very little point in doing that either. Even if there is some
setup still with a v2-only server, there is still the option to use TFTP-based
boot - especially given that the current boot loader provides the parallel
option to use either NFS or TFTP (via DHCP option 150) with existing binaries,
that is, without having to re-compile.

rgds,
toomas



Re: process killed: text file modification

2017-03-24 Thread Rick Macklem
I can't do commits until I get home in mid-April.

That's why it will be waiting until then.

It should make it into stable/11 in plenty of time for 11.1.

Thanks for your help with this, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Konstantin Belousov 
Sent: Friday, March 24, 2017 3:01:41 AM
To: Rick Macklem
Cc: Gergely Czuczy; Dimitry Andric; Ian Lepore; FreeBSD Current
Subject: Re: process killed: text file modification

On Thu, Mar 23, 2017 at 09:39:00PM +0000, Rick Macklem wrote:
> Try whatever you like. However, if the case that failed before doesn't
> fail now, I'd call the test a success.
>
> Thanks, rick
> ps: It looks like Kostik is going to work on converting the NFS
>     vop_putpages() to using the buffer cache. However, if this isn't ready
>     for head/current by mid-April, I will commit this patch to help fix
>     things in the meantime.

I do not see a reason to wait for my work before committing your patch.
IMO, instead, it should be committed ASAP and merged into stable/11 for
upcoming 11.1.

I will do required adjustments if/when _putpages() patch progresses enough.


Re: crash: umount_nfs: Current

2017-03-24 Thread Rick Macklem
Ok, thanks for testing it. Unless I hear of problems with the patch, I'll
commit it to head in mid-April.

Thanks again for reporting this and doing the testing, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Larry Rosenman 
Sent: Friday, March 24, 2017 11:21:39 AM
To: Rick Macklem; freebsd...@freebsd.org
Cc: freebsd-current@FreeBSD.org
Subject: Re: crash: umount_nfs: Current

I tried my test (umount -t nfs -a && mount -t nfs -a) and no crash (with ~84G
inact cached NFS data).

I suspect it helped.


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281




Re: process killed: text file modification

2017-03-23 Thread Rick Macklem
Try whatever you like. However, if the case that failed before doesn't fail now,
I'd call the test a success.

Thanks, rick
ps: It looks like Kostik is going to work on converting the NFS vop_putpages()
    to using the buffer cache. However, if this isn't ready for head/current
    by mid-April, I will commit this patch to help fix things in the meantime.


From: Gergely Czuczy 
Sent: Thursday, March 23, 2017 2:25:11 AM
To: Rick Macklem; Konstantin Belousov
Cc: Dimitry Andric; Ian Lepore; FreeBSD Current
Subject: Re: process killed: text file modification

On 2017. 03. 21. 3:40, Rick Macklem wrote:
> Gergely Czuczy wrote:
> [stuff snipped]
>> Actually I want to test it, but you guys are so vehemently discussing
>> it, I thought it would be better to do so, once you guys settled your
>> analysis on the code. Also, me not having the problem occurring, I don't
>> think would mean it's solved, since that would only mean, the codepath
>> for my specific usecase works. There might be other things there as
>> well, what I don't hit.
> I hope by vehemently, you didn't find my comments as nasty. If they did
> come out that way, it was not what I intended and I apologize.
>
>> Let me know which patch should I test, and I will see to it in the next
>> couple of days, when I get the time to do it.
> I've attached it here again and, yes, I would agree that the results you get
> from testing are just another data point and not definitive.
> (I'd say this statement is true of all testing of nontrivial code.)
>
> Thanks in advance for any testing you can do, rick
>
So, I've copied the patched kernel over, and apparently it's working
properly. I'm not getting the error anymore.

So far I've only done a quick test; should I do something more extensive,
like build a couple of ports or something over NFS?



Re: crash: umount_nfs: Current

2017-03-22 Thread Rick Macklem
Larry Rosenman wrote:

> Err, I’m at  r315289….
I think the attached patch (only very lightly tested by me) will fix this crash.
If you have an easy way to test it, that would be appreciated, rick



[Attachment: clntcrash.patch]

Re: process killed: text file modification

2017-03-22 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
> Below is something to discuss. This is not finished, but it worked for
> the simple tests I performed. Clustering should be somewhat handled by
> the ncl_write() as is. As an additional advantage, I removed the now
> unneeded phys buffer allocation.
>
> If you agree with the approach on principle, I want to ask what to do
> about the commit stuff there (I simply removed that for now).
Wow, this is looking good to me. I had thought that the simple way to make
ncl_putpages() go through the buffer cache was to replace ncl_writerpc() with
VOP_WRITE(). My concern was all the memory<->memory copying that would
go on between the pages being written and the buffers allocated by VOP_WRITE().
If there is a way to avoid some (if not all) of this memory<->memory copying,
then I think it would be a big improvement.

As far as the commit goes, you don't need to do anything if you are calling
VOP_WRITE(). (The code below VOP_WRITE() takes care of all of that.)
--> You might want to implement a function like nfs_write(), but with extra
    arguments. If you did that, you could indicate when you want the writes
    to happen synchronously vs. async/delayed, and that would decide when
    FILESYNC would be specified.

As far as I know, the unpatched ncl_putpages() is badly broken for the
UNSTABLE/commit case. For UNSTABLE writes, the client is supposed to know how
to write the data again if the server crashes/reboots before a Commit RPC is
successfully done for the data.
(The ncl_clearcommit() function is the one called when the server indicates it
 has rebooted and needs this. It makes no sense whatsoever, and breaks the
 client, to call it in ncl_putpages() when mustcommit is set. All mustcommit
 being set indicates is that the write RPC was done UNSTABLE and the above
 applies to it. Some servers always do FILESYNC, so it isn't ever necessary
 to do a Commit RPC or redo the write RPCs.)

Summary. If you are calling VOP_WRITE() or a similar call above the buffer
cache, then you don't have to worry about any of this.

> Things that needs to be done is to add missed handling of the IO flags to
> ncl_write().

> +   if (error == 0 || !nfs_keep_dirty_on_error)
> vnode_pager_undirty_pages(pages, rtvals, count - uio.uio_resid);
If the data isn't copied, will this data still be available to the NFS buffer
cache code, so that it can redo the writes for the UNSTABLE case, if the
server reboots before a Commit RPC has succeeded?

> -   if (must_commit)
> -   ncl_clearcommit(vp->v_mount);
No matter what else we do, this should go away. As above, it breaks the NFS
client and basically forces all dirty buffer cache blocks to be rewritten when
it shouldn't be necessary.

rick


Re: process killed: text file modification

2017-03-21 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
> By 'impossible' I mean some arbitrary combination of bytes which were
> written by many means to the file at arbitrary moments.  In other words,
> the file content, or even a single page/block content is not atomic
> WRT the client updates.
Yes. For multiple processes writing the same file, I'd agree that's going to
happen unless the processes use advisory byte range locking to order the
updates.

And, I'm pretty sure a process that does both write(2) syscalls on a file and
modifies pages of it that are mmap()'d will produce "interesting" results as you
describe.
[stuff snipped]
> Or, what seems more likely to me, the code was written on a system where
> buffer cache and page queues are not coherent.
>
> Anyway, my position is that nfs VOP_PUTPAGES() should do write through
> buffer cache, not issuing the direct rpc call with the pages as source.
Hmm. Interesting idea. Since a "struct buf" can only refer to contiguous bytes,
I suspect each page might end up as a separate "struct buf", at least until some
clustering algorithm succeeded in merging them.

I would agree that it would be nice to change VOP_PUTPAGES(), since it
currently results in a lot of 4K writes (with FILE_SYNC I think?) and this is
normally slow/inefficient for the server. (It would be interesting to try your
suggestion above and see if the pages would cluster into larger writes. Also,
the "struct buf" code knows how to do UNSTABLE writes followed by a Commit.)
--> I am currently working on a pNFS server (which is coming along fairly
  well), so I have no idea if/when I might get around to trying to do this.

> Then your patch would need an update with the mentioned call to ncl_flush().
Yes.

rick



Re: process killed: text file modification

2017-03-20 Thread Rick Macklem
Gergely Czuczy wrote:
[stuff snipped]
> Actually I want to test it, but you guys are so vehemently discussing
> it, I thought it would be better to do so, once you guys settled your
> analysis on the code. Also, me not having the problem occurring, I don't
> think would mean it's solved, since that would only mean, the codepath
> for my specific usecase works. There might be other things there as
> well, what I don't hit.
I hope by vehemently, you didn't find my comments as nasty. If they did
come out that way, it was not what I intended and I apologize.

> Let me know which patch should I test, and I will see to it in the next
> couple of days, when I get the time to do it.
I've attached it here again and, yes, I would agree that the results you get
from testing are just another data point and not definitive. 
(I'd say this statement is true of all testing of nontrivial code.)

Thanks in advance for any testing you can do, rick



textmod.patch
Description: textmod.patch

Re: process killed: text file modification

2017-03-20 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
> Yes, I have to somewhat retract my claims, but then I have another set
> of surprises.
Righto.

> I realized (remembered) that nfs has its own VOP_PUTPAGES() method.
> Implementation seems to directly initiate write RPC request using the
> pages as the source buffer. I do not see anything in the code which
> would mark the buffers, which possibly contain the pages, as clean,
> or mark the buffer range as undirty.
The only place I know of that the code does this is in the "struct buf's"
hanging off of v_bufobj.bo_dirty.
I imagine there would be a race between the write-back to the NFS server
and further changes to the page by the process. For the most part, the
VOP_PUTPAGES() is likely to happen after the process is done modifying
the pages (often exited). For cases where it happens sooner, I would expect
the page(s) to be written multiple times, but the last write should bring
the file "up to date" on the server.

> At very least, this might cause unnecessary double-write of the same
> data. I am not sure if it could cause coherency issues between data
> written using mappings and write(2). Also, both vm_object_page_clean()
> and vfs_busy_pages() only ensure the shared-busy state on the pages,
> so write(2) can occur while pageout sends data to server, causing
> 'impossible' content transmitted over the wire.
I'm not sure what you mean by impossible content, but for NFS the only
time the file on the NFS server should be "up to date" will be after a file
doing write(2) writing has closed the fd (and only then if options like
"nocto" has not been used) or after an fsync(2) done by the process
doing the writing.
For mmap'd writing, I think msync(2) is about the only
thing the process can do to ensure the data is written back to the server.
(There was a patch to the NFS VOP_CLOSE() that does a vm_object_page_clean()
 but without the OBJPC_SYNC flag which tries to make sure the pages get written
 shortly after the file is closed. Of course, an mmap'd file can still be
 modified by the process after close(2), so it is "just an attempt to make the
 common case work".
 I don't recall, but I don't think I was the author of this patch.)

I also wouldn't be surprised that multiple writes of the same page(s) occur
under certain situations. (NFS has no rules w.r.t. write ordering. Each RPC is
separate and simply writes N bytes at file offset S.) It definitely happens when
there are multiple write(2)s of partial buffers, depending on when a sync() 
happens.

> Could you, please, explain the reasons for such implementation of
> ncl_putpage() ?
Well, I wasn't the author (it was just cribbed from the old NFS client and I
don't know who wrote it), so I'm afraid I don't know. (It's code I avoid
changing because I don't really understand it.)

I suspect that the author assumed that processes would either mmap the file
or use write(2) and wouldn't ever try and mix them together.

Hope this helps, rick


Re: NFSv2 boot & OLD_NFSV2

2017-03-20 Thread Rick Macklem
Baptiste Daroussin wrote:
> On Mon, Mar 20, 2017 at 08:22:12PM +0200, Toomas Soome wrote:
> > Hi!
> >
> > The current boot code is building NFSv3, with preprocessor conditional 
> > OLD_NFSV2. Should NFSv2 code still be kept around or can we burn it?
> >
> > rgds,
> > toomas
>
> I vote burn
>
> Bapt
I would be happy to see NFSv2 go away. However, depending on how people
configure their diskless root fs, they do end up using NFSv2 for their root
fs.

Does booting over NFSv3 affect this?

I think the answer is no for a FreeBSD server (since the NFSv2 File Handle is
the same as the NFSv3 one, except padded with 0 bytes to 32 bytes long).
However, there might be non-FreeBSD NFS servers where the NFSv2 file handle is
different than the NFSv3 one and for that case, the user would need NFSv2 boot
code (or reconfigure their root fs to use NFSv3).

To be honest, I suspect few realize that they are using NFSv2 for their root fs.
(They'd see it in a packet trace or via "nfsstat -m", but otherwise they
  probably think they are using NFSv3 for their root fs.)

rick


Re: process killed: text file modification

2017-03-19 Thread Rick Macklem
Kostik wrote:
[stuff snipped]
>> >> Dirty pages are flushed by writes, so if we have a set of dirty pages and
>> >> async vm_object_page_clean() is called on the vnode' vm_object, we get
>> >> a bunch of delayed-write AKA dirty buffers.  This is possible even after
>> >> VOP_CLOSE() was done, e.g. by syncer performing regular run involving
>> >> vfs_msync().
>> When I was talking about ncl_flush() above, I was referring to buffer cache
>> buffers written by a write(2) syscall, not the case of mmap'd pages.
> But dirty buffers can appear on the vnode queue due to dirty pages msyncing
> by syncer, for instance.
Ok, just to clarify this, in case I don't understand it...
- You aren't saying that anything will be added to v_bufobj.bo_dirty.bv_hd by
  vfs_msync() or similar, after VOP_CLOSE(), right?
--> ncl_flush() { was called nfs_flush() in the old NFS client } only deals with
 "struct buf's" hanging off v_bufobj.bo_dirty.bv_hd, so I don't see a use for
 it in the patch.

As for pages added to v_bufobj.bo_object...the patch assumes that the process
that was writing the executable file mmap'd is done { normally exited } before
the exec() syscall occurs. If it is still dirtying pages when the exec()
occurs, then failing with "Text file modified" seems correct to me. As you
mentioned, another client can do this to the file anyhow.

My understanding is that vm_object_page_clean() will get all the dirty pages
written back to the server at that point and if that is done in VOP_SET_TEXT()
as this patch does, what more can the NFS client do?

[more stuff snipped]
> Syncer does not open the vnode inside the vfs_msync() operations.
Ok, but this doesn't put "struct buf's" on v_bufobj.bo_dirty.bv_hd. Am I right?
(When I said "buffers". I meant "struct buf's" under bo_dirty, not stuff under
 v_bufobj.bo_object.)

> We do track writeability to the file, and do not allow execution if there is
> an active writer, be it a file descriptor opened for write, or a writeable
> mapping.  And in reverse, if the file is executed (VV_TEXT is set), then
> we disallow opening the file for write.
Yes, and that was why I figured doing this in VOP_SET_TEXT(), just before
setting VV_TEXT, was the right place to do it.
[more stuff snipped]
>
> Thanks for testing the patch. Now, if others can test it...rick
>
Again, hopefully others (especially the original reporter) will be able to
test the patch, rick


Re: crash: umount_nfs: Current

2017-03-17 Thread Rick Macklem
Oops, yea I see that now. In the krpc it is very hard to tell when the data
structures (that hold the mutexes) can be safely free'd. Code like
xprt_unregister() gets called asynchronously and locks a mutex as soon as
called.
(The crash fixed by r313735 was a prematurely free'd mutex that
 xprt_unregister() used, but on the server side.)

I'll look at the code and see if I can figure out how to delay freeing the
structure without leaving it allocated "forever". (I'll admit I've been
tempted to just never free it, since the memory leak this would cause would be
small enough nothing would really notice it. And, of course for your case of
shutdown, it would be harmless to just not free it.)

rick

From: Larry Rosenman 
Sent: Thursday, March 16, 2017 7:46:51 PM
To: Rick Macklem; freebsd...@freebsd.org
Cc: freebsd-current@FreeBSD.org
Subject: Re: crash: umount_nfs: Current

Err, I’m at  r315289….


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281



On 3/16/17, 5:51 PM, "Rick Macklem"  wrote:

I believe the cause of this crash was fixed by a recent commit
to head r313735 (which was MFC'd to stable/11 and stable/10).

rick

From: owner-freebsd-curr...@freebsd.org  
on behalf of Larry Rosenman 
Sent: Wednesday, March 15, 2017 10:44:33 PM
To: freebsd...@freebsd.org
Cc: freebsd-current@FreeBSD.org
Subject: crash: umount_nfs: Current

Recent current, playing with new FreeNAS Corral, client is FreeBSD -CURRENT.

Lovely crash:

borg.lerctr.org dumped core - see /var/crash/vmcore.1

Wed Mar 15 21:38:53 CDT 2017

FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r315289: Tue 
Mar 14 20:55:36 CDT 2017 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  
amd64

panic: general protection fault

GNU gdb (GDB) 7.12.1 [GDB v7.12.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from 
/usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 21
instruction pointer = 0x20:0x80a327ae
stack pointer   = 0x28:0xfe535ebb2800
frame pointer   = 0x28:0xfe535ebb2830
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 3172 (umount)
trap number = 9
panic: general protection fault
cpuid = 1
time = 1489631515
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe535ebb2440
vpanic() at vpanic+0x19c/frame 0xfe535ebb24c0
panic() at panic+0x43/frame 0xfe535ebb2520
trap_fatal() at trap_fatal+0x322/frame 0xfe535ebb2570
trap() at trap+0x5e/frame 0xfe535ebb2730
calltrap() at calltrap+0x8/frame 0xfe535ebb2730
--- trap 0x9, rip = 0x80a327ae, rsp = 0xfe535ebb2800, rbp = 
0xfe535ebb2830 ---
__mtx_lock_flags() at __mtx_lock_flags+0x3e/frame 0xfe535ebb2830
xprt_unregister() at xprt_unregister+0x28/frame 0xfe535ebb2850
clnt_reconnect_destroy() at clnt_reconnect_destroy+0x38/frame 
0xfe535ebb2880
nfs_unmount() at nfs_unmount+0x182/frame 0xfe535ebb28d0
dounmount() at dounmount+0x5c1/frame 0xfe535ebb2950
sys_unmount() at sys_unmount+0x30f/frame 0xfe535ebb2a70
amd64_syscall() at amd64_syscall+0x55a/frame 0xfe535ebb2bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe535ebb2bf0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800872b9a, rsp = 
0x7fffde88, rbp = 0x7fffe3c0 ---
Uptime: 2h43m8s
Dumping 5744 out of 131005 
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading sy

Re: process killed: text file modification

2017-03-17 Thread Rick Macklem
Dimitry Andric wrote:
>On 17 Mar 2017, at 15:19, Konstantin Belousov  wrote:
>>
>> On Fri, Mar 17, 2017 at 01:53:46PM +0000, Rick Macklem wrote:
>>> Well, I don't mind adding ncl_flush(), but it shouldn't be
>>> necessary. I actually had it in the first
>>> rendition of the patch, but took it out because it should happen
>>> on the VOP_CLOSE() if any writing to the buffer cache happened
>>> and that code hasn't changed in many years.
>> Dirty pages are flushed by writes, so if we have a set of dirty pages and
>> async vm_object_page_clean() is called on the vnode' vm_object, we get
>> a bunch of delayed-write AKA dirty buffers.  This is possible even after
>> VOP_CLOSE() was done, e.g. by syncer performing regular run involving
>> vfs_msync().
When I was talking about ncl_flush() above, I was referring to buffer cache
buffers written by a write(2) syscall, not the case of mmap'd pages.

>>
>> I agree that the patch would not create new dirty buffers, but it is possible
>> to get them by other means.
To write to a buffer cache block, the file would be opened by another thread and
that is what this sanity check was meant to catch. As for dirtying pages that
are mmap'd, as far as I understand it, the NFS client has no way of knowing if
this will happen more until VOP_INACTIVE() is called on the vnode.

To be honest, this check could easily be deleted. After all, NFS could care
less if a file is being executed (all it sees are reads and writes). Without
the check, the executable might do "interesting" things;-)
[stuff snipped]
> FWIW, Rick's patch seems to do the trick, both for my test case and lld
> itself.  And even with vfs.timestamp_precision=3 on both server and
> client.
Hopefully the original reporter of the problem (Gergely ??) can test the patch
as well.
I think the patch is pretty harmless, although it could be argued that setting
np->n_mtime = np->n_nattr.na_mtime (or close to that)
shouldn't happen for the case where there aren't any dirty pages found to
flush. However, once a file is mmap'd we don't know when it does get modified
anyhow (as discussed above), so setting it here doesn't seem harmful to me.

Thanks for testing the patch. Now, if others can test it...rick



Re: process killed: text file modification

2017-03-17 Thread Rick Macklem
Well, I don't mind adding ncl_flush(), but it shouldn't be
necessary. I actually had it in the first
rendition of the patch, but took it out because it should happen
on the VOP_CLOSE() if any writing to the buffer cache happened
and that code hasn't changed in many years.

What the patch was missing was updating n_mtime after the dirty
page flush.

Btw, a flush without OBJPC_SYNC happens when the file is VOP_CLOSE()'d
unless the default value of vfs.nfs.clean_pages_on_close is changed, which
I think is why the 1sec resolution always seemed to work, at least for the
example where there was an munmap before close.

Attached is an updated version with that in it, rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Konstantin Belousov 
Sent: Friday, March 17, 2017 4:36:05 AM
To: Rick Macklem
Cc: Dimitry Andric; Ian Lepore; Gergely Czuczy; FreeBSD Current
Subject: Re: process killed: text file modification

On Fri, Mar 17, 2017 at 03:10:57AM +, Rick Macklem wrote:
> Hope you don't mind a top post...
> Attached is a little patch you could test maybe?
>
> rick
> 
> From: owner-freebsd-curr...@freebsd.org  
> on behalf of Rick Macklem 
> Sent: Thursday, March 16, 2017 9:57:23 PM
> To: Dimitry Andric; Ian Lepore
> Cc: Gergely Czuczy; FreeBSD Current
> Subject: Re: process killed: text file modification
>
> Dimitry Andric wrote:
> [lots of stuff snipped]
> > I'm also running into this problem, but while using lld.  I must set
> > vfs.timestamp_precision to 1 (e.g. sec + ns accurate to 1/HZ) on both
> > the client and the server, to make it work.
> >
> > Instead of GNU ld, lld uses mmap to write to the output executable.  If
> > this executable is more than one page, and resides on an NFS share,
> > running it will almost always result in "text file modification", if
> > vfs_timestamp_precision >= 2.
> >
> > A small test case: http://www.andric.com/freebsd/test-mmap-write.c,
> > which writes a simple "hello world" i386-freebsd executable file, using
> > the sequence: open() -> ftruncate() -> mmap() -> memcpy() -> munmap() ->
> > close().
> Hopefully Kostik will correct me if I have this wrong, but I don't believe any
> of the above syscalls guarantee that dirty pages have been flushed.
> At least for cases without munmap(), the writes of dirty pages can occur after
> the file descriptor is closed. I run into this in NFSv4, where there is a 
> Close (NFSv4 one)
> that can't be done until VOP_INACTIVE().
> If you look in the NFS VOP_INACTIVE() { called ncl_inactive() } you'll see:
> if (NFS_ISV4(vp) && vp->v_type == VREG) {
> 237 /*
> 238  * Since mmap()'d files do I/O after VOP_CLOSE(), the 
> NFSv4
> 239  * Close operations are delayed until now. Any dirty
> 240  * buffers/pages must be flushed before the close, so 
> that the
> 241  * stateid is available for the writes.
> 242  */
> 243 if (vp->v_object != NULL) {
> 244 VM_OBJECT_WLOCK(vp->v_object);
> 245 retv = vm_object_page_clean(vp->v_object, 0, 
> 0,
> 246 OBJPC_SYNC);
> 247 VM_OBJECT_WUNLOCK(vp->v_object);
> 248 } else
> 249 retv = TRUE;
> 250 if (retv == TRUE) {
> 251 (void)ncl_flush(vp, MNT_WAIT, NULL, ap->a_td, 
> 1, 0);
> 252 (void)nfsrpc_close(vp, 1, ap->a_td);
> 253 }
> 254 }
> Note that nothing like this is done for NFSv3.
> What might work is implementing a VOP_SET_TEXT() vnode op for the NFS
> client that does most of the above (except for nfsrpc_close()) and then sets
> VV_TEXT.
> --> That way, all the dirty pages will be flushed to the server before the 
> executable
>  starts executing.
>
> > Running this on an NFS share, and then attempting to run the resulting
> > 'helloworld' executable will result in the "text file modification"
> > error, and it will be killed.  But if you simply copy the executable to
> > something else, then it runs fine, even if you use -p to retain the
> > properties!
> >
> > IMHO this is a rather surprising problem with the NFS code, and Kostik
> > remarked that the problem seems to be that the VV_TEXT flag is set too
> > early, before the nfs cache is invalidated.  Rick, do you have any ideas
> &

Re: process killed: text file modification

2017-03-16 Thread Rick Macklem
Hope you don't mind a top post...
Attached is a little patch you could test maybe?

rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Rick Macklem 
Sent: Thursday, March 16, 2017 9:57:23 PM
To: Dimitry Andric; Ian Lepore
Cc: Gergely Czuczy; FreeBSD Current
Subject: Re: process killed: text file modification

Dimitry Andric wrote:
[lots of stuff snipped]
> I'm also running into this problem, but while using lld.  I must set
> vfs.timestamp_precision to 1 (e.g. sec + ns accurate to 1/HZ) on both
> the client and the server, to make it work.
>
> Instead of GNU ld, lld uses mmap to write to the output executable.  If
> this executable is more than one page, and resides on an NFS share,
> running it will almost always result in "text file modification", if
> vfs_timestamp_precision >= 2.
>
> A small test case: http://www.andric.com/freebsd/test-mmap-write.c,
> which writes a simple "hello world" i386-freebsd executable file, using
> the sequence: open() -> ftruncate() -> mmap() -> memcpy() -> munmap() ->
> close().
Hopefully Kostik will correct me if I have this wrong, but I don't believe any
of the above syscalls guarantee that dirty pages have been flushed.
At least for cases without munmap(), the writes of dirty pages can occur after
the file descriptor is closed. I run into this in NFSv4, where there is a
Close (NFSv4 one) that can't be done until VOP_INACTIVE().
If you look in the NFS VOP_INACTIVE() { called ncl_inactive() } you'll see:
if (NFS_ISV4(vp) && vp->v_type == VREG) {
	/*
	 * Since mmap()'d files do I/O after VOP_CLOSE(), the NFSv4
	 * Close operations are delayed until now. Any dirty
	 * buffers/pages must be flushed before the close, so that the
	 * stateid is available for the writes.
	 */
	if (vp->v_object != NULL) {
		VM_OBJECT_WLOCK(vp->v_object);
		retv = vm_object_page_clean(vp->v_object, 0, 0,
		    OBJPC_SYNC);
		VM_OBJECT_WUNLOCK(vp->v_object);
	} else
		retv = TRUE;
	if (retv == TRUE) {
		(void)ncl_flush(vp, MNT_WAIT, NULL, ap->a_td, 1, 0);
		(void)nfsrpc_close(vp, 1, ap->a_td);
	}
}
Note that nothing like this is done for NFSv3.
What might work is implementing a VOP_SET_TEXT() vnode op for the NFS
client that does most of the above (except for nfsrpc_close()) and then sets
VV_TEXT.
--> That way, all the dirty pages will be flushed to the server before the
  executable starts executing.

> Running this on an NFS share, and then attempting to run the resulting
> 'helloworld' executable will result in the "text file modification"
> error, and it will be killed.  But if you simply copy the executable to
> something else, then it runs fine, even if you use -p to retain the
> properties!
>
> IMHO this is a rather surprising problem with the NFS code, and Kostik
> remarked that the problem seems to be that the VV_TEXT flag is set too
> early, before the nfs cache is invalidated.  Rick, do you have any ideas
> about this?
I don't think it is the "nfs cache" that needs invalidation, but the dirty
pages written by the mmap'd file need to be flushed, before the VV_TEXT
is set, I think?

If Kostik meant "attribute cache" when he said "nfs cache", I'll note that the
cached attributes (including mtime) are updated by the reply to every write.
(As such, I think it is the writes that must be done before setting VV_TEXT
  that is needed.)

It is a fairly simple patch to create. I'll post one to this thread in a day
or so unless Kostik thinks this isn't correct and not worth trying.

rick




textmod.patch
Description: textmod.patch

Re: process killed: text file modification

2017-03-16 Thread Rick Macklem
Dimitry Andric wrote:
[lots of stuff snipped]
> I'm also running into this problem, but while using lld.  I must set
> vfs.timestamp_precision to 1 (e.g. sec + ns accurate to 1/HZ) on both
> the client and the server, to make it work.
> 
> Instead of GNU ld, lld uses mmap to write to the output executable.  If
> this executable is more than one page, and resides on an NFS share,
> running it will almost always result in "text file modification", if
> vfs_timestamp_precision >= 2.
> 
> A small test case: http://www.andric.com/freebsd/test-mmap-write.c,
> which writes a simple "hello world" i386-freebsd executable file, using
> the sequence: open() -> ftruncate() -> mmap() -> memcpy() -> munmap() ->
> close().
Hopefully Kostik will correct me if I have this wrong, but I don't believe any
of the above syscalls guarantee that dirty pages have been flushed.
At least for cases without munmap(), the writes of dirty pages can occur after
the file descriptor is closed. I run into this in NFSv4, where there is a
Close (NFSv4 one) that can't be done until VOP_INACTIVE().
If you look in the NFS VOP_INACTIVE() { called ncl_inactive() } you'll see:
if (NFS_ISV4(vp) && vp->v_type == VREG) {
	/*
	 * Since mmap()'d files do I/O after VOP_CLOSE(), the NFSv4
	 * Close operations are delayed until now. Any dirty
	 * buffers/pages must be flushed before the close, so that the
	 * stateid is available for the writes.
	 */
	if (vp->v_object != NULL) {
		VM_OBJECT_WLOCK(vp->v_object);
		retv = vm_object_page_clean(vp->v_object, 0, 0,
		    OBJPC_SYNC);
		VM_OBJECT_WUNLOCK(vp->v_object);
	} else
		retv = TRUE;
	if (retv == TRUE) {
		(void)ncl_flush(vp, MNT_WAIT, NULL, ap->a_td, 1, 0);
		(void)nfsrpc_close(vp, 1, ap->a_td);
	}
}
Note that nothing like this is done for NFSv3.
What might work is implementing a VOP_SET_TEXT() vnode op for the NFS
client that does most of the above (except for nfsrpc_close()) and then sets
VV_TEXT.
--> That way, all the dirty pages will be flushed to the server before the
  executable starts executing.

> Running this on an NFS share, and then attempting to run the resulting
> 'helloworld' executable will result in the "text file modification"
> error, and it will be killed.  But if you simply copy the executable to
> something else, then it runs fine, even if you use -p to retain the
> properties!
>
> IMHO this is a rather surprising problem with the NFS code, and Kostik
> remarked that the problem seems to be that the VV_TEXT flag is set too
> early, before the nfs cache is invalidated.  Rick, do you have any ideas
> about this?
I don't think it is the "nfs cache" that needs invalidation, but the dirty
pages written by the mmap'd file need to be flushed, before the VV_TEXT
is set, I think?

If Kostik meant "attribute cache" when he said "nfs cache", I'll note that the
cached attributes (including mtime) are updated by the reply to every write.
(As such, I think it is the writes that must be done before setting VV_TEXT
  that is needed.)

It is a fairly simple patch to create. I'll post one to this thread in a day
or so unless Kostik thinks this isn't correct and not worth trying.

rick




Re: crash: umount_nfs: Current

2017-03-16 Thread Rick Macklem
I believe the cause of this crash was fixed by a recent commit
to head r313735 (which was MFC'd to stable/11 and stable/10).

rick

From: owner-freebsd-curr...@freebsd.org  on 
behalf of Larry Rosenman 
Sent: Wednesday, March 15, 2017 10:44:33 PM
To: freebsd...@freebsd.org
Cc: freebsd-current@FreeBSD.org
Subject: crash: umount_nfs: Current

Recent current, playing with new FreeNAS Corral, client is FreeBSD -CURRENT.

Lovely crash:

borg.lerctr.org dumped core - see /var/crash/vmcore.1

Wed Mar 15 21:38:53 CDT 2017

FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r315289: Tue Mar 
14 20:55:36 CDT 2017 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: general protection fault

GNU gdb (GDB) 7.12.1 [GDB v7.12.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from 
/usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 21
instruction pointer = 0x20:0x80a327ae
stack pointer   = 0x28:0xfe535ebb2800
frame pointer   = 0x28:0xfe535ebb2830
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 3172 (umount)
trap number = 9
panic: general protection fault
cpuid = 1
time = 1489631515
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe535ebb2440
vpanic() at vpanic+0x19c/frame 0xfe535ebb24c0
panic() at panic+0x43/frame 0xfe535ebb2520
trap_fatal() at trap_fatal+0x322/frame 0xfe535ebb2570
trap() at trap+0x5e/frame 0xfe535ebb2730
calltrap() at calltrap+0x8/frame 0xfe535ebb2730
--- trap 0x9, rip = 0x80a327ae, rsp = 0xfe535ebb2800, rbp = 
0xfe535ebb2830 ---
__mtx_lock_flags() at __mtx_lock_flags+0x3e/frame 0xfe535ebb2830
xprt_unregister() at xprt_unregister+0x28/frame 0xfe535ebb2850
clnt_reconnect_destroy() at clnt_reconnect_destroy+0x38/frame 0xfe535ebb2880
nfs_unmount() at nfs_unmount+0x182/frame 0xfe535ebb28d0
dounmount() at dounmount+0x5c1/frame 0xfe535ebb2950
sys_unmount() at sys_unmount+0x30f/frame 0xfe535ebb2a70
amd64_syscall() at amd64_syscall+0x55a/frame 0xfe535ebb2bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe535ebb2bf0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800872b9a, rsp = 
0x7fffde88, rbp = 0x7fffe3c0 ---
Uptime: 2h43m8s
Dumping 5744 out of 131005 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/linux.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/if_lagg.ko.debug...done.
done.
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/coretemp.ko.debug...done.
done.
Reading symbols from /boot/kernel/aesni.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/aesni.ko.debug...done.
done.
Reading symbols from /boot/kernel/filemon.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/filemon.ko.debug...done.
done.
Reading symbols from /boot/kernel/fuse.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/fuse.ko.debug...done.
done.
Reading symbols from /boot/kernel/ichsmb.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/ichsmb.ko.debug...done.
done.
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/smbus.ko.debug...done.
done.
Reading symbols from /boot/kernel/ichwd.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/ichwd.ko.debug...done.
done.
Reading symbols from /boot/kernel/cpuctl.ko...Reading symbols from 
/usr/li

Re: Recent FreeBSD, NFSv4 and /var/db/mounttab

2017-02-01 Thread Rick Macklem
Claude Buisson wrote:
>Hi,
>
>Last month, I started switching all my systems (stable/9, stable/10,
>stable/11 and current) to NFSv4, and I found that:
>
>   on current (svn 312652) an entry is added to /var/db/mounttab by
>mount_nfs(8), but not suppressed by umount(8). It can be suppressed by
>rpc.umntall(8).
>
>The same anomaly appears on stable/11 after upgrading to svn 312950.
>
>It is relatively easy to trace this anomaly to r308871 on current and
>its MFHs (r309517 for stable/11).
>
>Patching sbin/umount/umount.c to restore the RPC call for NFSv4 makes
>umount(8) suppress the mounttab entry as before.
>
>I do not know what is the proper solution, as suppressing the
>modification of mounttab by mount_nfs(8) for NFSv4 could be an (more
>complicated) alternative !
This would be the correct fix. The entries in mounttab are meaningless.
Even for NFSv3, all they do is provide a "best guess" answer for
"showmount".
- The Mount protocol is not part of NFSv4. I had a patch which disabled
  it for NFSv4 servers, but some folk liked the idea of having "showmount -e"
  to work, so I didn't commit it.

rick
ps: I had actually thought mount_nfs(8) didn't do a Mount protocol RPC
  for NFSv4, but I guess it does. That needs to be fixed, since NFSv4 servers
  don't need to support Mount at all.



Re: NFS 4.1

2017-01-17 Thread Rick Macklem
The vmware client will not work with the FreeBSD server at this time. It does
a ReclaimComplete with the file system boolean set "true". This isn't supported
by the FreeBSD server at this time. (vmware is the only client that does this,
as far as I know.)

The fix is probably simple, but since I don't have access to vmware, and those
that reported it haven't been able to give me the information I need ...

If you are willing to test a couple of simple patches for the server in order
to resolve this, email me and I'll send them to you, rick


From: owner-freebsd-curr...@freebsd.org  on 
behalf of Michael Ware 
Sent: Tuesday, January 17, 2017 1:16:06 PM
To: Russell L. Carter
Cc: freebsd-current@freebsd.org
Subject: Re: NFS 4.1

Thanks for the reply Russell,
I'm looking to set up a 4.1 server in order to host vmware images. I have
set up an exports but I get an error stating NFS 4 is not supported when
trying to attach it in VM storage. Is there any documentation for setting
this up?
Thanks
Mike

On Tue, Jan 17, 2017 at 10:02 AM, Russell L. Carter 
wrote:

> On 01/17/17 10:38, Michael Ware wrote:
>
>> Good day,
>> Does anyone know if NFS 4.1 (not 4.0) is available in FreeBSD 11? I have
>> not been able to find any documentation around this.
>> Thanks
>>
>>
> Yes, though I'm not sure what specific feature you're looking for.
> FreeBSD interoperates with my linux NFS 4.1 servers and clients
> just fine.
>
> man nfsv4
>
> $ cat ~/bin/knuth-mount
> #! /bin/sh
>
> # man mount_nfs
> MOUNT="mount_nfs -o nfsv4,minorversion=1"
> #MOUNT="mount_nfs -o nfsv3"
>
> NFS_SERVER_HOST=knuth
> $MOUNT $NFS_SERVER_HOST:/export/packages /mnt/$NFS_SERVER_HOST/packages
> $MOUNT $NFS_SERVER_HOST:/usr/src /usr/src
> $MOUNT $NFS_SERVER_HOST:/usr/obj /usr/obj
>
> HTH,
> Russell
>



--
Michael Ware
UCSC Baskin Engineering
Unix, Network and Security
406-210-4725


Re: NFSv4 performance degradation with 12.0-CURRENT client

2016-11-26 Thread Rick Macklem
Alan Somers wrote:
[stuff snipped]
>Mounting nullfs with the nocache option, ad kib suggested, fixed the
>problem.  Also, applying kib's patch and then mounting nullfs with
>default options also fixed the problem.  Here is the nfsstat output
>for "ls -al" when using kib's patch.  Notice the client has far fewer
>opens:
I did a quick test which confirmed that the opens get closed when the "nocache"
option is used on the nullfs mount as well.

Kostik, I think your patch is a good idea and you can consider it reviewed by me
if you'd like.

I also did a quick test with unionfs and it did not accumulate opens, so it
doesn't seem to suffer from this problem. (It does have issues, as noted by the
BUGS section of the mount_unionfs man page.)

rick



Re: NFSv4 performance degradation with 12.0-CURRENT client

2016-11-26 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
>I thought that the issue was in tracking any opens and mmaps, but from this
>reply it is not that clear.  Do you need callback when all opens and mmaps
>have ended, or only opens and mmaps for write ?  If later, we already have
>a suitable mechanism VOP_ADD_WRITECOUNT().

Not quite. The NFSv4 client needs to Close the NFSv4 Open after all I/O on
the file has been done. This applies to both reads and writes.
Since mmap'd files can generate I/O after the VOP_CLOSE(), the NFSv4 client
can't do the NFSv4 Close in VOP_CLOSE().
Since it can't do it then, it waits until VOP_INACTIVE() to do the NFSv4 Close.

This might be improved by:
- A flag that indicates that an open file descriptor has been mmap()d, which
  VOP_CLOSE() could check to decide if it can do the NFSv4 Close.
  (ie. It could do the NFSv4 Close if the file descriptor hasn't been mmap()d.)
- If the system knows when mmap()d I/O is done (the process has terminated?),
  it could do a VOP_MMAP_IO_DONE() and the NFSv4 client would do the NFSv4 Close
  in it.
  --> I don't know if this is feasible, and I suspect that if it could be done,
      it would usually happen just before VOP_INACTIVE(). { This case of nullfs
      caching was an exception, I think? }

Does this clarify it? rick



Re: NFSv4 performance degradation with 12.0-CURRENT client

2016-11-25 Thread Rick Macklem
Konstantin Belousov wrote:

>On Thu, Nov 24, 2016 at 10:45:51PM +0000, Rick Macklem wrote:
>> asom...@gmail.com wrote:
>> >OpenOwner Opens LockOwner LocksDelegs  LocalOwn LocalOpen 
>> >LocalLOwn
>> > 5638141453 0 0 0 0 0   
>> >   0
>> Ok, I think this shows us the problem. 141453 opens is a lot and the client
>> would have to check these every time another open is done (there goes all
>> that CPU;-).
>>
>> Now, why has this occurred?
>> Well, the NFSv4 client can't close NFSv4 Opens on a vnode until that vnode's
>> v_usecount goes to 0. This is because mmap'd files might do I/O after the 
>> file
>> descriptor is closed.
>> Now, hopefully Kostik will know something about nullfs and can help with 
>> this.
>> My guess is that nullfs ends up acquiring a refcnt on the NFS vnode so the
>> v_usecount doesn't go to 0 and, therefore, the client never closes the NFSv4 
>> Opens.
>> Kostik, do you know if this is the case and whether or not it can be changed?
>You are absolutely right. Nullfs vnode keeps a reference to the lower
>vnode which is below the nullfs one, i.e. to the nfs vnode in this case.
>If cache option is specified for the nullfs mount (default), the nullfs
>vnodes are cached normally to avoid the cost of creating and destroying
>nullfs vnode on each operation, and related cost of the exclusive locks
>on the lower vnode.
>
>An answer to my question in the previous mail to try with nocache
>option would give the confirmation. Really, I suspected that v_hash
>is calculated differently for NFSv3 and v4 mounts, but if opens are
>accumulated until use ref is dropped, that would explain things as well.
Hopefully Alan can test this and let us know if "nocache" on the nullfs mount
fixes the problem.

>Assuming your diagnosis is correct, are you in fact stating that the
>current VFS KPI is flawed ?  It sounds as if either some another callback
>or counter needs to exist to track number of mapping references to the
>vm object of the vnode, in addition to VOP_OPEN/VOP_CLOSE ?
>
>Currently a rough estimation of the number of mappings, which is sometimes
>slightly wrong, can be obtained by the expression
>vp->v_object->ref_count - vp->v_object->shadow_count
Well, ideally there would be a VOP_MMAPDONE() or something like that, which
would tell the NFSv4 client that I/O is done on the vnode so it can close it.
If there was some way for the NFSv4 VOP_CLOSE() to be able to tell if the file
has been mmap'd, that would help since it could close the ones that are not
mmap'd on the last descriptor close.
(A counter wouldn't be as useful, since NFSv4 would have to keep checking it to
 see if it can do the close yet, but it might still be doable.)
>
>> >LocalLock
>> >0
>> >Rpc Info:
>> >TimedOut   Invalid X Replies   Retries  Requests
>> >0 0 0 0   662
>> >Cache Info:
>> >Attr HitsMisses Lkup HitsMisses BioR HitsMisses BioW Hits
>> >Misses
>> > 127558   837   121 0 0 0   
>> >   0
>> >BioRLHitsMisses BioD HitsMisses DirE HitsMisses
>> >1 0 6 0 1 0
>> >
>> [more stuff snipped]
>> >What role could nullfs be playing?
>> As noted above, my hunch is that it is acquiring a refcnt on the NFS client
>> vnode such
>> that the v_usecount doesn't go to zero (at least for a long time) and without
>> a VOP_INACTIVE() on the NFSv4 vnode, the NFSv4 Opens don't get closed and
>> accumulate.
>> (If that isn't correct, it is somehow interfering with the client Closing 
>> the NFSv4 Opens
>>  in some other way.)
>>
>The following patch should automatically unset cache option for nullfs
>mounts over NFSv4 filesystem.
>
>diff --git a/sys/fs/nfsclient/nfs_clvfsops.c b/sys/fs/nfsclient/nfs_clvfsops.c
>index 524a372..a7e9fe3 100644
>--- a/sys/fs/nfsclient/nfs_clvfsops.c
>+++ b/sys/fs/nfsclient/nfs_clvfsops.c
>@@ -1320,6 +1320,8 @@ out:
>MNT_ILOCK(mp);
>mp->mnt_kern_flag |= MNTK_LOOKUP_SHARED | MNTK_NO_IOPF |
>MNTK_USES_BCACHE;
>+   if ((VFSTONFS(mp)->nm_flag & NFSMNT_NFSV4) != 0)
>+   mp->mnt_kern_flag |= MNTK_NULL_NOCACHE;
>MNT_IUNLOCK(mp);
>}
>return (error);
>diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
>index 49bae28..de05e8b 100644
>--- a/sys/fs/nullf

Re: NFSv4 performance degradation with 12.0-CURRENT client

2016-11-24 Thread Rick Macklem
asom...@gmail.com wrote:
[stuff snipped]
>I've reproduced the issue on stock FreeBSD 12, and I've also learned
>that nullfs is a required factor.  Doing the buildworld directly on
>the NFS mount doesn't cause any slowdown, but doing a buildworld on
>the nullfs copy of the NFS mount does.  The slowdown affects the base
>NFS mount as well as the nullfs copy.  Here is the nfsstat output for
>both server and client duing "ls -al" on the client:
>
>nfsstat -e -s -z
If you do this again, avoid using "-z" and I think you'll see the Opens (below 
Server:)
going up and up...
>
>Server Info:
>  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
>  800 0   121 0 0 2 0 0
>   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
>0 0 0 0 0 0 0 8
>MknodFsstatFsinfo  PathConfCommit   LookupP   SetClId SetClIdCf
>   0 0 0 0 1 3 0 0
> Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet GetFH  Lock
>0 0 0 0 0 0   123 0
>LockT LockU CloseVerify   NVerify PutFH  PutPubFH PutRootFH
>0 0 0 0 0   674 0 0
>Renew RestoreFHSaveFH   Secinfo RelLckOwn  V4Create
>0 0 0 0 0 0
>Server:
>RetfailedFaults   Clients
>0 0 0
>OpenOwner Opens LockOwner LocksDelegs
>0 0 0 0 0
Oops, I think this is an nfsstats bug. I don't normally use "-z", so I didn't 
notice
it clears these counts and it probably should not, since they are "how many of
these that are currently allocated".
I'll check this. (Not relevant to this issue, but needs fixin.;-)
>Server Cache Stats:
>   Inprog  Idem  Non-idemMisses CacheSize   TCPPeak
>0 0 0   674 16738 16738
>
>nfsstat -e -c -z
>Client Info:
>Rpc Counts:
> Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
>   60 0   119 0 0 0 0 0
>   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
>0 0 0 0 0 0 0 3
>MknodFsstatFsinfo  PathConfCommit   SetClId SetClIdCf  Lock
>0 0 0 0 0 0 0 0
>LockT LockU  Open   OpenCfr
>0 0 0 0
>OpenOwner Opens LockOwner LocksDelegs  LocalOwn LocalOpen LocalLOwn
> 5638141453 0 0 0 0 0 0
Ok, I think this shows us the problem. 141453 opens is a lot and the client
would have to check these every time another open is done (there goes all that
CPU;-).

Now, why has this occurred?
Well, the NFSv4 client can't close NFSv4 Opens on a vnode until that vnode's
v_usecount goes to 0. This is because mmap'd files might do I/O after the file
descriptor is closed.
Now, hopefully Kostik will know something about nullfs and can help with this.
My guess is that nullfs ends up acquiring a refcnt on the NFS vnode so the
v_usecount doesn't go to 0 and, therefore, the client never closes the NFSv4 
Opens.
Kostik, do you know if this is the case and whether or not it can be changed?
>LocalLock
>0
>Rpc Info:
>TimedOut   Invalid X Replies   Retries  Requests
>0 0 0 0   662
>Cache Info:
>Attr HitsMisses Lkup HitsMisses BioR HitsMisses BioW HitsMisses
> 127558   837   121 0 0 0 0
>BioRLHitsMisses BioD HitsMisses DirE HitsMisses
>1 0 6 0 1 0
>
[more stuff snipped]
>What role could nullfs be playing?
As noted above, my hunch is that it is acquiring a refcnt on the NFS client
vnode such
that the v_usecount doesn't go to zero (at least for a long time) and without
a VOP_INACTIVE() on the NFSv4 vnode, the NFSv4 Opens don't get closed and
accumulate.
(If that isn't correct, it is somehow interfering with the client Closing the 
NFSv4 Opens
 in some other way.)

rick



Re: NFSv4 performance degradation with 12.0-CURRENT client

2016-11-24 Thread Rick Macklem

On Wed, Nov 23, 2016 at 10:17:25PM -0700, Alan Somers wrote:
> I have a FreeBSD 10.3-RELEASE-p12 server exporting its home
> directories over both NFSv3 and NFSv4.  I have a TrueOS client (based
> on 12.0-CURRENT on the drm-next-4.7 branch, built on 28-October)
> mounting the home directories over NFSv4.  At first, everything is
> fine and performance is good.  But if the client does a buildworld
> using sources on NFS and locally stored objects, performance slowly
> degrades.  The degradation is most noticeable with metadata-heavy
> operations.  For example, "ls -l" in a directory with 153 files takes
> less than 0.1 seconds right after booting.  But the longer the
> buildworld goes on, the slower it gets.  Eventually that same "ls -l"
> takes 19 seconds.  When the home directories are mounted over NFSv3
> instead, I see no degradation.
>
> top shows negligible CPU consumption on the server, and very high
> consumption on the client when using NFSv4 (nearly 100%).  The
> NFS-using process is spending almost all of its time in system mode,
> and dtrace shows that almost all of its time is spent in
> ncl_getpages().
>
A couple of things you could do when it is slow (as well as what Kostik suggested):
- nfsstat -c -e on client and nfsstat -e -s on server, to see what RPCs are 
being done
  and how quickly. (nfsstat -s -e will also show you how big the DRC is, 
although a
  large DRC should show up as increased CPU consumption on the server)
- capture packets with tcpdump -s 0 -w test.pcap host 
  - then you can email me test.pcap as an attachment. I can look at it in
    wireshark and see if there seem to be protocol and/or TCP issues. (You can
    look at it in wireshark yourself; look for NFS4ERR_xxx, TCP segment
    retransmits...)

If you are using either "intr" or "soft" on the mounts, try without those mount 
options.
(The Bugs section of mount_nfs recommends against using them. If an RPC fails 
due to
 these options, something called a seqid# can be "out of sync" between 
client/server and
 that causes serious problems.)
--> These seqid#s are not used by NFSv4.1, so you could try that by adding
  "minorversion=1" to your mount options.

Good luck with it, rick


Re: build fails post SVN r304026

2016-08-13 Thread Rick Macklem
Lev Serebryakov wrote:
>On 13.08.2016 16:54, Michael Butler wrote:
>
>> Is anyone else seeing this?
>  Yes, I've posted message to fs@, as it is r304026 for sure (and author
>was CC:ed too).
Should be fixed now. Sorry about the breakage. I didn't realize the old
nfsstat.c wouldn't build with the kernel source patch. (The old nfsstat
binary does still work, as required to avoid a POLA violation.)

Anyhow, the changes to nfsstat.c are now committed to head.

Sorry about the breakage, rick



Re: NFSv4 compatibility with ESX6U2

2016-05-30 Thread Rick Macklem
Michael Butler wrote:
> On 05/29/16 21:05, Michael Butler wrote:
> > I was just fooling around with ESX this evening and trying to add an
> > NFSv4 mount onto it as extra storage. Curiously, given the correct
> > credentials, it will report the total volume size and free remaining but
> > won't display either files or subdirectories :-(
> > 
> > In this case, the underlying file-system is UFS but I was hoping to
> > migrate to a ZFS share once this worked.
> > 
> > Is there something I can do to identify the interoperability issue?
> 
> Never mind - I got it working with the username set to "user@domain" ..
> 
> 
If user<->uid mapping keeps giving you grief, you can:
sysctl vfs.nfsd.enable_stringtouid=1
on the NFS server and then kill off the nfsuserd on both client and server.
(After doing this, the user on the wire is just a string with the uid in it
 like "102". Same applies to groups.)

Also, you need to avoid multiple username->same uid mappings. I delete "toor"
from /etc/passwd for example.

rick

>   imb
> 
> 
> 


Aw: Re: Aw: Re: Partitioning on a MBR table disk fails (and destroys my data...)

2015-12-12 Thread Rick Macklem
> Rick Macklem  wrote:
> 
> > I don't use it, but gpart is the preferred FreeBSD command. You might try
> > that instead.
>
> Does it work with MBR or only GPT? Anyway, I'll try it.
>
It does handle MBR. However, since you are already comfortable with the 
OpenBSD/NetBSD
fdisk, maybe firing up one of those and using their fdisk to clear out the slice
you want to use for FreeBSD would be easier. Especially since you mention below
that you don't want to "touch with FreeBSD anymore".
If you have problems doing this, maybe posting with exactly what error(s) you
get from fdisk might get some specific suggestions w.r.t. fixing it?

If you do choose to use a shell from the FreeBSD installer, either fdisk (I
don't think you need to specify a disk, but if you do, try "/dev/ada0") or
"gpart list"
should show you what FreeBSD thinks the partition table looks like.
(I don't use gpart, so I don't know its commands beyond that. The man page is
 rather long, so if you choose to use it, you've got some reading to do.)
--> If there is anything under the "freebsd" slice, you need to delete those
before creating new ones.
--> If you do "Manual...MBR" from the installer, it should show you the slices
and anything within each slice. If it shows you anything inside the 
"freebsd"
slice, delete those before trying to create any new ones.
The "Manual...MBR" is a front-end to either gpart or fdisk (I don't work on the
installer, so I don't know which) and has always worked fine for me.
(I recently installed using this on the space left over from a Windows install,
 so it understood the MBR the Windows install put on the drive. I did create the
 freebsd slice with this, followed by the partitions within the slice.)
The only "trick" I've noticed is that I needed to know the names for the types 
of
partitions:
freebsd-ufs for a UFS partition
freebsd-swap for a swap partition
freebsd-zfs for a ZFS partition
- because the installer seems to expect you to know these for the "Manual...MBR"
  case.

> > Well, although installing is always a bit scary, if you don't touch the
> > other slices, I'd delete and create the freebsd one. It gets to a certain
> > point when doing the "Manual MBR" before it asks you if you want to save it
> > on disk.
>
> At least creating by the (curses) GUI installer is not possible. It does 
> create
> somewhere instead of asking me and it doesn't even tell me where it
> has created it. And there are numeric bugs in the tool. The numbers it
> displayed changed without reason and became even negative ...
> So the MBR I don't touch with FreeBSD anymore ...
> A simple task for the installer developer: Please let me use an existing
> empty slice. This is no rocket sience.
>
> Carsten
Once you have cleared out the FreeBSD partition with OpenBSD/NetBSD, then
I'd encourage you to use "Manual ..MBR" and avoid the Auto option when you
get to that point in the FreeBSD install.

Good luck with it, rick
ps: I'm not an installer guy. I just use it from time to time.


Re: Aw: Re: Partitioning on a MBR table disk fails (and destroys my data...)

2015-12-11 Thread Rick Macklem
Carsten Kunze wrote:
> Rick Macklem  wrote:
> 
> > Did you use "Manual" when it gets to the partitioning screen?
> > When I've done this, after selecting "Manual MBR" (or whatever it's called,
> > one or two below "Auto"), it should show you the slices
> > (what FreeBSD calls the 4 MBR partitions):
> > - Then I select the "freebsd" (move around until it is highlighted one)
> >   and create partitions within it with "Create" at the bottom of the
> > screen.
> >   (I always have "fun" with the interface, but repeated attempts with Tab
> >   and the arrow keys eventually get me to the right place on the screen;-)
> 
> I think I did select "auto" which brings me into the "manual" screen after
> few
> steps. It does show the slices and does even show NTFS and Linux
> partitions inside the extended partition (I have 3 primary MBR partitions,
> first is freebsd, then two NTFS, then an extendet with further NTFS and
> Linux).
> 
> The first 10MB of the first slice (freebsd) had been cleared with
> "dd if=/dev/zero of=...".  When I put the cursor line on this slice and
> select "create" it doesn't allow me to create the freebsd-ufs for "/".
> 
Sorry, I can't explain why it would fail. I have seen different fdisks have
differing opinions w.r.t. partition alignment in the past.

> > Good luck with it, rick
> > ps: If it doesn't show the slices,I'm guessing the MBR doesn't make sense
> > to
> > FreeBSD's fdisk. You can go to "" instead of "" and
> > then
> > try typing "fdisk".
> 
> I did try the shell and typed "fdisk" and "disklabel" but this did not work
> as
> known from other BSDs.
> 
You didn't say what "fdisk" gave as output.
If it doesn't show your slices, I can only guess that the MBR isn't understood
by FreeBSD for some reason.
I don't use it, but gpart is the preferred FreeBSD command. You might try that
instead.

As I said, it works for me, but I use a simple:
- windows NTFS
- windows NTFS
- freebsd
set of slices and I created freebsd with the "Manual MBR" option of the
installer. I do not know what "auto" might have done.

Over the years, I have found that different variants of "fdisk" have different
ideas w.r.t. alignment.
> The actual issue is that I can't create something in the found freebsd
> slice.  In the past I did simply remove this slice and added a new one
> (since the free space on the disk had exactly been what I wanted to
> use).  But now the seemingly free space is not actually completely free
> so I'd like to not delete the slice.  The installer should support using
> this slice.
> 
Well, although installing is always a bit scary, if you don't touch the other
slices, I'd delete and create the freebsd one. It gets to a certain point when
doing the "Manual MBR" before it asks you if you want to save it on disk.

rick

> Carsten
> 


Re: Partitioning on a MBR table disk fails (and destroys my data...)

2015-12-11 Thread Rick Macklem
Carsten Kunze wrote:
> Hello,
> 
> how is it possible to install FreeBSD in an existing empty MBR partition with
> type "freebsd"?  The installer does not allow this (for unknown reason), it
> returns the error "no space left".  What steps would be necessary to add two
> freebsd-ufs and one freebsd-swap into the existing freebsd partition?  In no
> case I want to delete this partition since I do *not* want FreeBSD to edit
> the MBR (to not have data loss again).  There is unfortunately not much
> information in the handbook "2.6.5. Shell Mode Partitioning" (anyway I'd
> prefer to use the curses UI partition editor).
> 
Did you use "Manual" when it gets to the partitioning screen?
When I've done this, after selecting "Manual MBR" (or whatever it's called,
one or two below "Auto"), it should show you the slices
(what FreeBSD calls the 4 MBR partitions):
- Then I select the "freebsd" (move around until it is highlighted one)
  and create partitions within it with "Create" at the bottom of the screen.
  (I always have "fun" with the interface, but repeated attempts with Tab
   and the arrow keys eventually get me to the right place on the screen;-)

Good luck with it, rick
ps: If it doesn't show the slices, I'm guessing the MBR doesn't make sense to
FreeBSD's fdisk. You can go to "Shell" instead and then
try typing "fdisk".

> Carsten
> 


Re: RPC request sent to 127.0.0.1 becomes from other IP on machine

2015-12-10 Thread Rick Macklem
Ok, I had a hunch it was related to the use of jails.
I am just testing a patch that switches the nfsuserd over to
using an AF_LOCAL socket, so this will be avoided.
(I think it makes more sense anyhow. I just never got around
 to doing it.;-)

Thanks for the info, rick

- Original Message -
> On Thu, 10 Dec 2015, Rick Macklem wrote:
> 
> > Hi,
> >
> > Mark has reported a problem via email where the nfsuserd daemon sees
> > requests coming from an IP# assigned to the machine instead of 127.0.0.1.
> > Here's a snippet from his message:
> >   Ok, I have Plex in a jail and when I scan the remote NFS file share the
> >   *local* server's nfsuserd spams the logs.
> > Spamming the logs refers to the messages nfsuserd generates when it gets
> > a request from an address other than 127.0.0.1.
> >
> > I think the best solution is to switch nfsuserd over to using an AF_LOCAL
> > socket like the gssd uses, but that will take a little coding and probably
> > won't be MFCable.
> >
> > I've sent him the attached patch to try as a workaround.
> >
> > Does anyone happen to know under what circumstances the address 127.0.0.1
> > gets replaced?
> 
> My memory is quite hazy on this subject, but I think that outbound traffic
> from a jail is not permitted to use the system loopback address 127.0.0.1;
> traffic from this address within a jail gets replaced with the jail's
> primary IP address.  It is possible to specify an alternate loopback
> address for use within the jail (e.g., 127.0.0.2) and if that alternate
> address is only bound within the jail, it can be used for outgoing traffic
> to the host.  See jail.conf(5); I appear to have something like:
> 
> kduck {
> host.hostname = "kduck.mit.edu";
> ip4.addr = lo0|127.0.0.2, 18.18.0.52;
> [...]
> }
> 
> Note that there may be some additional magic about the primary address of
> the jail being first (or last?) in the list of addresses.
> 
> -Ben
> 


RPC request sent to 127.0.0.1 becomes from other IP on machine

2015-12-10 Thread Rick Macklem
Hi,

Mark has reported a problem via email where the nfsuserd daemon sees
requests coming from an IP# assigned to the machine instead of 127.0.0.1.
Here's a snippet from his message:
  Ok, I have Plex in a jail and when I scan the remote NFS file share the
  *local* server's nfsuserd spams the logs.
Spamming the logs refers to the messages nfsuserd generates when it gets
a request from an address other than 127.0.0.1.

I think the best solution is to switch nfsuserd over to using an AF_LOCAL
socket like the gssd uses, but that will take a little coding and probably
won't be MFCable.

I've sent him the attached patch to try as a workaround.

Does anyone happen to know under what circumstances the address 127.0.0.1
gets replaced?

And do you know if it will always be replaced with the same
address?
(I'm basically wondering if the workaround needs to be a list of IP addresses
 instead of a single address?)

Thanks in advance for any help with this, rick

--- nfsuserd.c.sav	2015-12-09 18:46:29.284972000 -0500
+++ nfsuserd.c	2015-12-09 18:59:33.564498000 -0500
@@ -40,6 +40,10 @@ __FBSDID("$FreeBSD: head/usr.sbin/nfsuse
 #include 
 #include 
 
+#include <netinet/in.h>
+
+#include <arpa/inet.h>
+
 #include 
 
 #include 
@@ -94,6 +98,7 @@ gid_t defaultgid = (gid_t)32767;
 int verbose = 0, im_a_slave = 0, nfsuserdcnt = -1, forcestart = 0;
 int defusertimeout = DEFUSERTIMEOUT, manage_gids = 0;
 pid_t slaves[MAXNFSUSERD];
+struct in_addr fromip;
 
 int
 main(int argc, char *argv[])
@@ -144,6 +149,7 @@ main(int argc, char *argv[])
 			}
 		}
 	}
+	fromip.s_addr = inet_addr("127.0.0.1");
 	nid.nid_usermax = DEFUSERMAX;
 	nid.nid_usertimeout = defusertimeout;
 
@@ -190,6 +196,15 @@ main(int argc, char *argv[])
 usage();
 			}
 			nid.nid_usertimeout = defusertimeout = i * 60;
+		} else if (!strcmp(*argv, "-fromip")) {
+			if (argc == 1)
+usage();
+			argc--;
+			argv++;
+			if (inet_aton(*argv, &fromip) == 0) {
+fprintf(stderr, "Bad -fromip %s\n", *argv);
+usage();
+			}
 		} else if (nfsuserdcnt == -1) {
 			nfsuserdcnt = atoi(*argv);
 			if (nfsuserdcnt < 1)
@@ -458,22 +473,22 @@ nfsuserdsrv(struct svc_req *rqstp, SVCXP
 	u_short sport;
 	struct info info;
 	struct nfsd_idargs nid;
-	u_int32_t saddr;
 	gid_t grps[NGROUPS];
 	int ngroup;
 
 	/*
-	 * Only handle requests from 127.0.0.1 on a reserved port number.
+	 * Only handle requests from 127.0.0.1 on a reserved port number,
+	 * unless the "-fromip" specified a different address.
 	 * (Since a reserved port # at localhost implies a client with
 	 *  local root, there won't be a security breach. This is about
 	 *  the only case I can think of where a reserved port # means
 	 *  something.)
 	 */
 	sport = ntohs(transp->xp_raddr.sin_port);
-	saddr = ntohl(transp->xp_raddr.sin_addr.s_addr);
 	if ((rqstp->rq_proc != NULLPROC && sport >= IPPORT_RESERVED) ||
-	saddr != 0x7f000001) {
-		syslog(LOG_ERR, "req from ip=0x%x port=%d\n", saddr, sport);
+	transp->xp_raddr.sin_addr.s_addr != fromip.s_addr) {
+		syslog(LOG_ERR, "req from ip=%s port=%d\n",
+		inet_ntoa(transp->xp_raddr.sin_addr), sport);
 		svcerr_weakauth(transp);
 		return;
 	}
@@ -721,5 +736,5 @@ usage(void)
 {
 
 	errx(1,
-	"usage: nfsuserd [-usermax cache_size] [-usertimeout minutes] [-verbose] [-manage-gids] [-domain domain_name] [n]");
+	"usage: nfsuserd [-usermax cache_size] [-usertimeout minutes] [-verbose] [-manage-gids] [-domain domain_name] [-fromip xx.xx.xx.xx] [n]");
 }
--- nfsuserd.8.sav	2015-12-09 19:13:48.173812000 -0500
+++ nfsuserd.8	2015-12-09 19:19:38.522516000 -0500
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD: head/usr.sbin/nfsuserd/nfsuserd.8 276258 2014-12-26 21:56:23Z joel $
 .\"
-.Dd November 1, 2015
+.Dd December 9, 2015
 .Dt NFSUSERD 8
 .Os
 .Sh NAME
@@ -37,6 +37,7 @@ services plus support manage-gids for al
 .Op Fl domain Ar domain_name
 .Op Fl usertimeout Ar minutes
 .Op Fl usermax Ar max_cache_size
+.Op Fl fromip Ar ip_address
 .Op Fl verbose
 .Op Fl force
 .Op Fl manage-gids
@@ -76,6 +77,21 @@ the more kernel memory is used, but the 
 system can afford the memory use, make this the sum of the number of
 entries in your group and password databases.
 The default is 200 entries.
+.It Fl fromip Ar ip_address
+This overrides the default upcall from address of 127.0.0.1.
+Although the upcall connection is done to 127.0.0.1, some network
+configurations will result in another IP address assigned to the machine
+as the from address.
+If you get syslog messages like:
+.sp
+.nf
+Dec  9 19:03:20 nfsv4-laptop nfsuserd:[506]: req from ip=131.104.48.17 port=861
+.fi
+.sp
+then you should use this option to set the from IP address to the one in
+the message.
+Only do this for IP addresses assigned to the machine this daemon is running
+on.
 .It Fl verbose
 When set, the server logs a bunch of information to syslog.
 .It Fl force

Re: slow screen updates on laptop console (i386)

2015-12-08 Thread Rick Macklem
Setting hostuuid: 44454c4c-3300-1038-8047-cac04f443831.
Setting hostid: 0x49e43364.
warning: total configured swap (1310720 pages) exceeds maximum recommended 
amount (439936 pages).
warning: increase kern.maxswzone or reduce amount of swap.
Starting file system checks:
/dev/ada0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ada0s1a: clean, 1131648 free (504 frags, 141393 blocks, 0.0% fragmentation)
/dev/ada0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ada0s1d: clean, 2207358 free (2270 frags, 275636 blocks, 0.1% 
fragmentation)
/dev/ada0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ada0s1e: clean, 1390306 free (105498 frags, 160601 blocks, 3.5% 
fragmentation)
Mounting local file systems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
Setting hostname: nfsv4-laptop.rick.home.
Setting up 
harvesting:[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy:
uhub4: 8 ports with 8 removable, self powered
.
bfe0: link state changed to DOWN
bfe0: link state changed to UP
Starting Network: lo0 bfe0.
lo0: flags=8049 metric 0 mtu 16384
options=63
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
inet 127.0.0.1 netmask 0xff00 
nd6 options=21
groups: lo 
bfe0: flags=8843 metric 0 mtu 1500
options=80008
ether 00:14:22:93:66:a0
inet 192.168.1.4 netmask 0xff00 broadcast 192.168.1.255 
nd6 options=29
media: Ethernet autoselect (100baseTX )
status: active
Starting devd.
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Creating and/or trimming log files.
Starting syslogd.
No core dumps found.
Starting casperd.
Clearing /tmp (X related).
NFS on reserved port only=YES
Starting nfsuserd.
Starting rpcbind.
Starting mountd.
Starting nfsd.
Updating motd:.
Mounting late file systems:.
Configuring vt: blanktime.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.
Starting inetd.

Tue Dec  8 17:02:04 EST 2015
Dec  8 17:02:09 nfsv4-laptop login: ROOT LOGIN (root) ON ttyv0

> Thanks,
> 
> -a
> 
> 
> On 8 December 2015 at 13:19, Rick Macklem  wrote:
> > Adrian Chadd wrote:
> >> Hi,
> >>
> >> Yea - try setting hw.acpi.cpu.cx_lowest=C1 and re-test.
> >>
> > Yep, with this setting, LAPIC seems to work fine.
> >
> > rick
> >
> >>
> >> -a
> >>
> 


Re: slow screen updates on laptop console (i386)

2015-12-08 Thread Rick Macklem
Adrian Chadd wrote:
> Hi,
> 
> Yea - try setting hw.acpi.cpu.cx_lowest=C1 and re-test.
> 
Yep, with this setting, LAPIC seems to work fine.

rick
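For anyone else hitting this, the workaround can be made to survive a reboot by putting the same line in /etc/sysctl.conf (standard sysctl.conf syntax; whether C1 is the right limit on other laptops is machine-dependent):

```
# Cap the deepest allowed C-state at C1 (workaround from this thread)
hw.acpi.cpu.cx_lowest=C1
```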

> 
> -a
> 

