Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-14 Thread J. Bruce Fields
On Mon, Feb 05, 2018 at 06:06:29PM -0500, J. Bruce Fields wrote:
> Or this?:
> 
>   
> https://www.newegg.com/Product/Product.aspx?Item=N82E16820156153_re=ssd_power_loss_protection-_-20-156-153-_-Product

Ugh, Anandtech explains that their marketing is misleading, that drive
can't actually destage its volatile write cache on power loss:

https://www.anandtech.com/show/8528/micron-m600-128gb-256gb-1tb-ssd-review-nda-placeholder

I've been trying to figure this out in part because I wondered what I
might use if I replaced my home server this year.  After some further
looking, the cheapest PCIe-attached SSD with real power loss protection
that I've found is this Intel model, at a little over $300:


http://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p3520-series/dc-p3520-450gb-2-5inch-3d1.html

Kinda ridiculous to buy a 450 gig drive mainly so I can put a half-gig
journal on it.  It might turn out to be best for my case just to RAID a
couple of those SSDs and skip the conventional drives completely.

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-08 Thread J. Bruce Fields
On Thu, Feb 08, 2018 at 08:21:44PM +, Terry Barnaby wrote:
> Doesn't fsync() and perhaps sync() work across NFS then when the server has
> an async export,

No.

On a local filesystem, a file create followed by a sync will ensure
the file create reaches disk.  Normally on NFS, the same is true--for
the trivial reason that the file create already ensured this.  If your
server is Linux knfsd exporting the filesystem with async, the file
create may still not be on disk after the sync.
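
To make the distinction concrete, here's a sketch of the two server-side
choices in /etc/exports (the paths and client subnet are made up for
illustration):

    # /etc/exports
    /export/home     192.168.1.0/24(rw,sync,no_subtree_check)   # default: replies wait for disk
    /export/scratch  192.168.1.0/24(rw,async,no_subtree_check)  # replies may beat the disk

    # re-export after editing
    exportfs -ra

With the "async" line, a client's create or fsync() can succeed even
though the new file only exists in the server's page cache.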

> I don't think a program on a remote system is particularly worse off
> if the NFS server dies, it may have to die if it can't do any special
> recovery.

Well-written applications should be able to deal with recovering after a
crash, *if* the filesystem respects fsync() and friends.  If the
filesystem ignores them and loses data silently, the application is left
in a rather more difficult position!
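
The usual safe pattern is: write to a temporary file, fsync() it, then
rename() it into place and flush the directory.  A rough shell sketch of
that pattern (file names are made up; against a Linux server exporting
with "async", even this gives no real guarantee):

    dd if=new-config of=config.tmp conv=fsync   # write the new contents and fsync() them
    mv config.tmp config                        # rename() atomically into place
    sync -f .                                   # flush the filesystem holding the directory entry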

> > > Only difference from the normal FS conventions I am suggesting is to
> > > allow the server to stipulate "sync" on its mount that forces sync
> > > mode for all clients on that FS.
> > Anyway, we don't have protocol to tell clients to do that.
> As I said NFSv4.3 :)

Protocol extensions are certainly possible.

> > So if you have reliable servers and power, maybe you're comfortable with
> > the risk.  There's a reason that's not the default, though.
> Well, it is the default for local FS mounts so I really don't see why it
> should be different for network mounts.

It's definitely not the default for local mounts to ignore sync().  So,
you understand why I say that the "async" export option is very
different from the mount option with the same name.  (Yes, the name was
a mistake.)  And you can see why a filesystem engineer would get nervous
about recommending that configuration.
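
For comparison, the client-side option only controls when write() returns
on the client; a sketch (server name and paths are made up):

    # client "sync": write() waits for the server to acknowledge a stable WRITE
    mount -t nfs -o vers=4.1,sync server:/export/home /mnt/home

    # client default ("async"): write() returns once the data is in the client's page cache
    mount -t nfs -o vers=4.1 server:/export/home /mnt/home

Neither of these changes what the server does with an "async" export: it
can still acknowledge data that hasn't reached disk.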

> But anyway for my usage NFS sync is completely unusable (as would
> local sync mounts) so it has to be async NFS or local disks (13 secs
> local disk -> 3mins NFS async-> 2 hours NFS sync). I would have
> thought that would go for the majority of NFS usage. No issue to me
> though as long as async can be configured and works well :)

So instead, what I personally use is a hardware configuration that allows
me to get similar performance while still using the default export
options.

> > Sure.  The protocol issues are probably more complicated than they first
> > appear, though!
> Yes, they probably are, most things are below the surface, but I still think
> there are likely to be a lot of improvements that could be made that would
> make using NFS async more tenable to the user.
> If necessary local file caching (to local disk) with delayed NFS writes. I
> do use fscache for the NFS - OpenVPN - FTTP mounts, but the NFS caching time
> tests probably hit the performance of this for reads and I presume writes
> would be write through rather than delayed write. Haven't actually looked at
> the performance of this and I know there are other network file systems that
> may be more suited in that case.

fscache doesn't remove the need for synchronous file creates.

So, in the existing protocol, write delegations are probably what would
help most, which is why they're near the top of my todo list.

But write delegations just cover file data and attributes.  If we want a
client to be able to, for example, respond to creat() with success, we
want write delegations on *directories*.  That's rather more
complicated, and we don't currently even have a protocol proposal for
that.  The idea has come up in the past, and I hope there may be sufficient
time and motivation to make it happen some day.

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-08 Thread Terry Barnaby

On 06/02/18 21:48, J. Bruce Fields wrote:

On Tue, Feb 06, 2018 at 08:18:27PM +, Terry Barnaby wrote:

Well, when a program running on a system calls open(), write() etc. to the
local disk FS the disk's contents is not actually updated. The data is in
server buffers until the next sync/fsync or some time has passed. So, in
your parlance, the OS write() call lies to the program. So it is by default
async unless the "sync" mount option is used when mounting the particular
file system in question.

That's right, but note applications are written with the knowledge that
OS's behave this way, and are given tools (sync, fsync, etc.) to manage
this behavior so that they still have some control over what survives a
crash.

(But sync & friends no longer do what they're supposed to on a Linux
server exporting with async.)
Doesn't fsync() and perhaps sync() work across NFS, then, when the server
has an async export? I thought they did, along with file locking to some
extent.

Although it is different from the current NFS settings methods, I would have
thought that this should be the same for NFS. So if a client mounts a file
system normally it is async, ie write() data is in buffers somewhere (client
or server) unless the client mounts the file system in sync mode.

In fact, this is pretty much how it works, for write().

It didn't use to be that way--NFSv2 writes were all synchronous.

The problem is that if a server power cycles while it still had dirty
data in its caches, what should you do?
You can't ignore it--you'd just be silently losing data.  You could
return an error at some point, but "we just lost some of your data, no
idea what" isn't an error an application can really act on.
Yes, it is tricky error handling. But what does a program do when its
local hard disk or machine dies underneath it anyway? I don't think a
program on a remote system is particularly worse off if the NFS server
dies; it may have to die if it can't do any special recovery. If it was
important to get the data to disk it would have been using fsync(), FS
sync, or some other transaction-based approach; indeed, it shouldn't be
using network remote disk mounts anyway. It all depends on what the
program is doing and its usage requirements. A cc failing once in a
blue moon is not a real issue (as long as it fails and removes its
created files, or at least a make clean can be run). As I have said, I
have used NFS async for about 27+ years on multiple systems with no
problems when servers die, with the type of usage I use NFS for. The
number of times a server has died is low in that time. Client systems
have died many, many more times (user issues, experimental
programs/kernels, random program usage, single cheap disks, cheaper
non-ECC RAM, etc.)

So NFSv3 introduced a separation of write into WRITE and COMMIT.  The
client first sends a WRITE with the data, then later sends a COMMIT
call that says "please don't return till that data I sent before is
actually on disk".

If the server reboots, there's a limited set of data that the client
needs to resend to recover (just data that's been written but not
committed.)

But we only have that for file data, metadata would be more complicated,
so stuff like file creates, setattr, directory operations, etc., are
still synchronous.


Only difference from the normal FS conventions I am suggesting is to
allow the server to stipulate "sync" on its mount that forces sync
mode for all clients on that FS.

Anyway, we don't have protocol to tell clients to do that.

As I said NFSv4.3 :)



In the case of a /home mount for example, or a source code build file
system, it is normally only one client that is accessing the dir, and if a
write fails due to the server going down (an unlikely occurrence), it's not
much of an issue. I have only had this happen a couple of times in 28 years,
and then with no significant issues (power outage, disk failure pre-RAID, etc.).

So if you have reliable servers and power, maybe you're comfortable with
the risk.  There's a reason that's not the default, though.
Well, it is the default for local FS mounts so I really don't see why it 
should be different for network mounts. But anyway for my usage NFS sync 
is completely unusable (as would local sync mounts) so it has to be 
async NFS or local disks (13 secs local disk -> 3mins NFS async-> 2 
hours NFS sync). I would have thought that would go for the majority of 
NFS usage. No issue to me though as long as async can be configured and 
works well :)



4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe this
is worth investigating in the Linux kernel processing (how ?) ?

Yes, that'd be interesting to investigate.  With some kernel tracing I
think it should be possible to get high-resolution timings for the
processing of a single RPC call, which would make a good start.

It'd probably also be interesting to start with the simplest possible RPC
and then work our way up and see when the RTT increases 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-06 Thread J. Bruce Fields
On Tue, Feb 06, 2018 at 08:18:27PM +, Terry Barnaby wrote:
> Well, when a program running on a system calls open(), write() etc. to the
> local disk FS the disk's contents is not actually updated. The data is in
> server buffers until the next sync/fsync or some time has passed. So, in
> your parlance, the OS write() call lies to the program. So it is by default
> async unless the "sync" mount option is used when mounting the particular
> file system in question.

That's right, but note applications are written with the knowledge that
OS's behave this way, and are given tools (sync, fsync, etc.) to manage
this behavior so that they still have some control over what survives a
crash.

(But sync & friends no longer do what they're supposed to on a Linux
server exporting with async.)

> Although it is different from the current NFS settings methods, I would have
> thought that this should be the same for NFS. So if a client mounts a file
> system normally it is async, ie write() data is in buffers somewhere (client
> or server) unless the client mounts the file system in sync mode.

In fact, this is pretty much how it works, for write().

It didn't use to be that way--NFSv2 writes were all synchronous.

The problem is that if a server power cycles while it still had dirty
data in its caches, what should you do?

You can't ignore it--you'd just be silently losing data.  You could
return an error at some point, but "we just lost some of your data, no
idea what" isn't an error an application can really act on.

So NFSv3 introduced a separation of write into WRITE and COMMIT.  The
client first sends a WRITE with the data, then later sends a COMMIT
call that says "please don't return till that data I sent before is
actually on disk".

If the server reboots, there's a limited set of data that the client
needs to resend to recover (just data that's been written but not
committed.)

But we only have that for file data, metadata would be more complicated,
so stuff like file creates, setattr, directory operations, etc., are
still synchronous.
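
If you want to see the WRITE/COMMIT split on a live client, the per-op
counters give a rough picture (just a sketch; look at the "write" and
"commit" columns):

    nfsstat -c -3                                   # NFSv3 client per-operation counts
    grep -E 'WRITE:|COMMIT:' /proc/self/mountstats  # per-mount op counts and timings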

> Only difference from the normal FS conventions I am suggesting is to
> allow the server to stipulate "sync" on its mount that forces sync
> mode for all clients on that FS.

Anyway, we don't have protocol to tell clients to do that.

> In the case of a /home mount for example, or a source code build file
> system, it is normally only one client that is accessing the dir and if a
> write fails due to the server going down (an unlikely occurrence, its not
> much of an issue. I have only had this happen a couple of times in 28 years
> and then with no significant issues (power outage, disk fail pre-raid etc.).

So if you have reliable servers and power, maybe you're comfortable with
the risk.  There's a reason that's not the default, though.

> > > 4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe this
> > > is worth investigating in the Linux kernel processing (how ?) ?
> > Yes, that'd be interesting to investigate.  With some kernel tracing I
> > think it should be possible to get high-resolution timings for the
> > processing of a single RPC call, which would make a good start.
> > 
> > It'd probably also be interesting to start with the simplest possible RPC
> > and then work our way up and see when the RTT increases the most--e.g
> > does an RPC ping (an RPC with procedure 0, empty argument and reply)
> > already have a round-trip time closer to .5ms or .12ms?
> Any pointers to trying this ? I have a small amount of time as work is quiet
> at the moment.

Hm.  I wonder if testing over loopback would give interesting enough
results.  That might simplify testing even if it's not as realistic.
You could start by seeing if latency is still similar.
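
A minimal loopback setup for that might look like this (export path and
mount point are made up; actimeo=0 disables attribute caching so each
stat becomes a GETATTR round trip):

    mount -t nfs -o vers=4.1,actimeo=0 localhost:/export/test /mnt/loop
    time for i in $(seq 1000); do stat /mnt/loop/somefile > /dev/null; done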

You could start by googling around for "ftrace", I think lwn.net's
articles were pretty good introductions.

I don't do this very often and don't have good step-by-step
instructions

I believe the simplest way to do it was using "trace-cmd" (which is
packaged for fedora in a package of the same name).  The man page looks
skimpy, but https://lwn.net/Articles/410200/ looks good.  Maybe run it
while just stat-ing a single file on an NFS partition as a start.

I don't know if that will result in too much data.  Figuring out how to
filter it may be tricky.  Tracing everything may be prohibitive.
Several processes are involved so you don't want to restrict by process.
Maybe restricting to functions in nfsd and sunrpc modules would work,
with something like -l ':mod:nfs' -l ':mod:sunrpc'.
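
Something along these lines might be a reasonable first attempt, assuming
the NFS mount is at /mnt/nfstest (made-up path) and you're tracing on the
client:

    # record function-graph data from the nfs and sunrpc modules while stat-ing one file
    trace-cmd record -p function_graph -l ':mod:nfs' -l ':mod:sunrpc' stat /mnt/nfstest/somefile
    trace-cmd report | less     # timestamps show where the time goes inside the RPC path

On the server you'd presumably want ':mod:nfsd' instead of ':mod:nfs'.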

> We have also found that SSD's or at least NAND flash has quite a few write
> latency peculiarities . We use eMMC NAND flash on a few embedded systems we
> have designed and the write latency patterns are a bit random and not well
> described/defined in datasheets etc. Difficult when you have an embedded
> system with small amounts of RAM doing real-time data capture !

That's one of the reasons you want the "enterprise" 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-06 Thread Terry Barnaby

On 06/02/18 18:55, J. Bruce Fields wrote:

On Tue, Feb 06, 2018 at 06:49:28PM +, Terry Barnaby wrote:

On 05/02/18 14:52, J. Bruce Fields wrote:

Yet another poor NFSv3 performance issue. If I do a "ls -lR" of a certain
NFS mounted directory over a slow link (NFS over Openvpn over FTTP
80/20Mbps), just after mounting the file system (default NFSv4 mount with
async), it takes about 9 seconds. If I run the same "ls -lR" again, just
after, it takes about 60 seconds.

A wireshark trace might help.

Also, is it possible some process is writing while this is happening?

--b.


Ok, I have made some wireshark traces and put these at:

https://www.beam.ltd.uk/files/files//nfs/

There are other processes running, obviously, but nothing that should be
doing anything that should really affect this.

As a naive input, it looks like the client is using a cache but checking the
update times of each file individually using GETATTR. As it is using a
simple GETATTR per file in each directory the latency of these RPC calls is
mounting up. I guess it would be possible to check the cache status of all
files in a dir at once with one call that would allow this to be faster when
a full readdir is in progress, like a "GETATTR_DIR " RPC call. The
overhead of the extra data would probably not affect a single file check
cache time as latency rather than amount of data is the killer.

Yeah, that's effectively what READDIR is--it can request attributes
along with the directory entries.  (In NFSv4--in NFSv3 there's a
separate call called READDIR_PLUS that gets attributes.)

So the client needs some heuristics to decide when to do a lot of
GETATTRs and when to instead do READDIR.  Those heuristics have gotten
some tweaking over time.

What kernel version is your client on again?

--b.


System is Fedora27, Kernel is: 4.14.16-300.fc27.x86_64 on both client 
and server.




Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-06 Thread Terry Barnaby

On 05/02/18 23:06, J. Bruce Fields wrote:

On Thu, Feb 01, 2018 at 08:29:49AM +, Terry Barnaby wrote:

1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE call
all in one. This would reduce the latency of a small file to 1ms rather than
3ms thus 66% faster. Would require the client to delay the OPEN/SETATTR
until the first WRITE. Not sure how possible this is in the implementations.
Maybe READ's could be improved as well but getting the OPEN through quick
may be better in this case ?

2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms vs
3ms).

The protocol doesn't currently let us delay the OPEN like that,
unfortunately.
Yes, I should have thought of that; too focused on network traces and not
thinking about the program/OS API :)

But maybe  OPEN-SETATTR and SETATTR-CLOSE would be possible.


What we can do that might help: we can grant a write delegation in the
reply to the OPEN.  In theory that should allow the following operations
to be performed asynchronously, so the untar can immediately issue the
next OPEN without waiting.  (In practice I'm not sure what the current
client will do.)

I'm expecting to get to write delegations this year

It probably wouldn't be hard to hack the server to return write
delegations even when that's not necessarily correct, just to get an
idea what kind of speedup is available here.
That sounds good. I will have to read up on NFS write delegations, not 
sure how they work. I guess write() errors would be returned later than 
they actually occurred, etc.?



3. On sync/async modes personally I think it would be better for the client
to request the mount in sync/async mode. The setting of sync on the server
side would just enforce sync mode for all clients. If the server is in the
default async mode clients can mount using sync or async as to their
requirements. This seems to match normal VFS semantics and usage patterns
better.

The client-side and server-side options are both named "sync", but they
aren't really related.  The server-side "async" export option causes the
server to lie to clients, telling them that data has reached disk even
when it hasn't.  This affects all clients, whether they mounted with
"sync" or "async".  It violates the NFS specs, so it is not the default.

I don't understand your proposal.  It sounds like you believe that
mounting on the client side with the "sync" option will make your data
safe even if the "async" option is set on the server side?
Unfortunately that's not how it works.
Well, when a program running on a system calls open(), write() etc. to 
the local disk FS, the disk's contents are not actually updated. The data
is in server buffers until the next sync/fsync or some time has passed. 
So, in your parlance, the OS write() call lies to the program. So it is 
by default async unless the "sync" mount option is used when mounting 
the particular file system in question.


Although it is different from the current NFS settings methods, I would 
have thought that this should be the same for NFS. So if a client mounts 
a file system normally it is async, i.e. write() data is in buffers
somewhere (client or server) unless the client mounts the file system in
sync mode. The only difference from the normal FS conventions I am
suggesting is to allow the server to stipulate "sync" on its mount, which
forces sync mode for all clients on that FS. I know it is different from
the standard NFS config but it just seems more logical to me :) The
sync/async option and the ramifications of it are really dependent on
the clients' usage in most cases.


In the case of a /home mount for example, or a source code build file 
system, it is normally only one client that is accessing the dir, and if
a write fails due to the server going down (an unlikely occurrence), it's
not much of an issue. I have only had this happen a couple of times in
28 years, and then with no significant issues (power outage, disk failure
pre-RAID, etc.).


I know that is not how NFS currently "works", it just seems illogical to 
me the way it currently does work :)






4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe this
is worth investigating in the Linux kernel processing (how ?) ?

Yes, that'd be interesting to investigate.  With some kernel tracing I
think it should be possible to get high-resolution timings for the
processing of a single RPC call, which would make a good start.

It'd probably also be interesting to start with the simplest possible RPC
and then work our way up and see when the RTT increases the most--e.g
does an RPC ping (an RPC with procedure 0, empty argument and reply)
already have a round-trip time closer to .5ms or .12ms?
Any pointers to trying this ? I have a small amount of time as work is 
quiet at the moment.



5. The 20ms RPC latency I see in sync mode needs a look at on my system
although async mode is fine for my usage. Maybe this ends up as 2 x 10ms
drive seeks on ext4 and is thus expected.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-06 Thread Terry Barnaby

On 05/02/18 14:52, J. Bruce Fields wrote:

Yet another poor NFSv3 performance issue. If I do a "ls -lR" of a certain
NFS mounted directory over a slow link (NFS over Openvpn over FTTP
80/20Mbps), just after mounting the file system (default NFSv4 mount with
async), it takes about 9 seconds. If I run the same "ls -lR" again, just
after, it takes about 60 seconds.

A wireshark trace might help.

Also, is it possible some process is writing while this is happening?

--b.


Ok, I have made some wireshark traces and put these at:

https://www.beam.ltd.uk/files/files//nfs/

There are other processes running, obviously, but nothing that should be
doing anything that should really affect this.


As a naive input, it looks like the client is using a cache but checking 
the update times of each file individually using GETATTR. As it is using 
a simple GETATTR per file in each directory the latency of these RPC 
calls is mounting up. I guess it would be possible to check the cache 
status of all files in a dir at once with one call that would allow this 
to be faster when a full readdir is in progress, like a "GETATTR_DIR 
" RPC call. The overhead of the extra data would probably not 
affect a single file check cache time as latency rather than amount of 
data is the killer.



So much for caching! I have noticed
Makefile based builds (over Ethernet 1Gbps) taking a long time with a second
or so between each directory; I think this may be why.

Listing the directory using a NFSv3 mount takes 67 seconds on the first
mount and about the same on subsequent ones. No noticeable caching (default
mount options with async), At least NFSv4 is fast the first time !

NFSv4 directory reads after mount:

No. Time   Source Destination   Protocol Length Info
     667 4.560833210    192.168.202.2 192.168.201.1 NFS  304
V4 Call (Reply In 672) READDIR FH: 0xde55a546
     668 4.582809439    192.168.201.1 192.168.202.2 TCP  1405
2049 → 679 [ACK] Seq=304477 Ack=45901 Win=1452 Len=1337 TSval=2646321616
TSecr=913651354 [TCP segment of a reassembled PDU]
     669 4.582986377    192.168.201.1 192.168.202.2 TCP  1405
2049 → 679 [ACK] Seq=305814 Ack=45901 Win=1452 Len=1337 TSval=2646321616
TSecr=913651354 [TCP segment of a reassembled PDU]
     670 4.583003805    192.168.202.2 192.168.201.1 TCP  68
679 → 2049 [ACK] Seq=45901 Ack=307151 Win=1444 Len=0 TSval=913651376
TSecr=2646321616
     671 4.583265423    192.168.201.1 192.168.202.2 TCP  1405
2049 → 679 [ACK] Seq=307151 Ack=45901 Win=1452 Len=1337 TSval=2646321616
TSecr=913651354 [TCP segment of a reassembled PDU]
     672 4.583280603    192.168.201.1 192.168.202.2 NFS  289
V4 Reply (Call In 667) READDIR
     673 4.583291818    192.168.202.2 192.168.201.1 TCP  68
679 → 2049 [ACK] Seq=45901 Ack=308709 Win=1444 Len=0 TSval=913651377
TSecr=2646321616
     674 4.583819172    192.168.202.2 192.168.201.1 NFS  280
V4 Call (Reply In 675) GETATTR FH: 0xb91bfde7
     675 4.605389953    192.168.201.1 192.168.202.2 NFS  312
V4 Reply (Call In 674) GETATTR
     676 4.605491075    192.168.202.2 192.168.201.1 NFS  288
V4 Call (Reply In 677) ACCESS FH: 0xb91bfde7, [Check: RD LU MD XT DL]
     677 4.626848306    192.168.201.1 192.168.202.2 NFS  240
V4 Reply (Call In 676) ACCESS, [Allowed: RD LU MD XT DL]
     678 4.626993773    192.168.202.2 192.168.201.1 NFS  304
V4 Call (Reply In 679) READDIR FH: 0xb91bfde7
     679 4.649330354    192.168.201.1 192.168.202.2 NFS  2408
V4 Reply (Call In 678) READDIR
     680 4.649380840    192.168.202.2 192.168.201.1 TCP  68
679 → 2049 [ACK] Seq=46569 Ack=311465 Win=1444 Len=0 TSval=913651443
TSecr=2646321683
     681 4.649716746    192.168.202.2 192.168.201.1 NFS  280
V4 Call (Reply In 682) GETATTR FH: 0xb6d01f2a
     682 4.671167708    192.168.201.1 192.168.202.2 NFS  312
V4 Reply (Call In 681) GETATTR
     683 4.671281003    192.168.202.2 192.168.201.1 NFS  288
V4 Call (Reply In 684) ACCESS FH: 0xb6d01f2a, [Check: RD LU MD XT DL]
     684 4.692647455    192.168.201.1 192.168.202.2 NFS  240
V4 Reply (Call In 683) ACCESS, [Allowed: RD LU MD XT DL]
     685 4.692825251    192.168.202.2 192.168.201.1 NFS  304
V4 Call (Reply In 690) READDIR FH: 0xb6d01f2a
     686 4.715060586    192.168.201.1 192.168.202.2 TCP  1405
2049 → 679 [ACK] Seq=311881 Ack=47237 Win=1452 Len=1337 TSval=2646321748
TSecr=913651486 [TCP segment of a reassembled PDU]
     687 4.715199557    192.168.201.1 192.168.202.2 TCP  1405
2049 → 679 [ACK] Seq=313218 Ack=47237 Win=1452 Len=1337 TSval=2646321748
TSecr=913651486 [TCP segment of a reassembled PDU]
     688 4.715215055    192.168.202.2 192.168.201.1 TCP  68
679 → 2049 [ACK] Seq=47237 Ack=314555 Win=1444 Len=0 TSval=913651509
TSecr=2646321748
     689 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-06 Thread J. Bruce Fields
On Tue, Feb 06, 2018 at 06:49:28PM +, Terry Barnaby wrote:
> On 05/02/18 14:52, J. Bruce Fields wrote:
> > > Yet another poor NFSv3 performance issue. If I do a "ls -lR" of a certain
> > > NFS mounted directory over a slow link (NFS over Openvpn over FTTP
> > > 80/20Mbps), just after mounting the file system (default NFSv4 mount with
> > > async), it takes about 9 seconds. If I run the same "ls -lR" again, just
> > > after, it takes about 60 seconds.
> > A wireshark trace might help.
> > 
> > Also, is it possible some process is writing while this is happening?
> > 
> > --b.
> > 
> Ok, I have made some wireshark traces and put these at:
> 
> https://www.beam.ltd.uk/files/files//nfs/
> 
> There are other processing running obviously, but nothing that should be
> doing anything that should really affect this.
> 
> As a naive input, it looks like the client is using a cache but checking the
> update times of each file individually using GETATTR. As it is using a
> simple GETATTR per file in each directory the latency of these RPC calls is
> mounting up. I guess it would be possible to check the cache status of all
> files in a dir at once with one call that would allow this to be faster when
> a full readdir is in progress, like a "GETATTR_DIR " RPC call. The
> overhead of the extra data would probably not affect a single file check
> cache time as latency rather than amount of data is the killer.

Yeah, that's effectively what READDIR is--it can request attributes
along with the directory entries.  (In NFSv4--in NFSv3 there's a
separate call called READDIR_PLUS that gets attributes.)

So the client needs some heuristics to decide when to do a lot of
GETATTRs and when to instead do READDIR.  Those heuristics have gotten
some tweaking over time.
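
One crude way to see which way the heuristics went is to diff the v4
client op counters around the listing (mount point is made up):

    nfsstat -c -4 > /tmp/ops.before
    ls -lR /mnt/nfstest > /dev/null
    nfsstat -c -4 > /tmp/ops.after
    diff /tmp/ops.before /tmp/ops.after   # see how much getattr vs. readdir grew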

What kernel version is your client on again?

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-05 Thread J. Bruce Fields
On Thu, Feb 01, 2018 at 08:29:49AM +, Terry Barnaby wrote:
> 1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE call
> all in one. This would reduce the latency of a small file to 1ms rather than
> 3ms thus 66% faster. Would require the client to delay the OPEN/SETATTR
> until the first WRITE. Not sure how possible this is in the implementations.
> Maybe READ's could be improved as well but getting the OPEN through quick
> may be better in this case ?
> 
> 2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms vs
> 3ms).

The protocol doesn't currently let us delay the OPEN like that,
unfortunately.

What we can do that might help: we can grant a write delegation in the
reply to the OPEN.  In theory that should allow the following operations
to be performed asynchronously, so the untar can immediately issue the
next OPEN without waiting.  (In practice I'm not sure what the current
client will do.)

I'm expecting to get to write delegations this year

It probably wouldn't be hard to hack the server to return write
delegations even when that's not necessarily correct, just to get an
idea what kind of speedup is available here.

> 3. On sync/async modes personally I think it would be better for the client
> to request the mount in sync/async mode. The setting of sync on the server
> side would just enforce sync mode for all clients. If the server is in the
> default async mode clients can mount using sync or async as to their
> requirements. This seems to match normal VFS semantics and usage patterns
> better.

The client-side and server-side options are both named "sync", but they
aren't really related.  The server-side "async" export option causes the
server to lie to clients, telling them that data has reached disk even
when it hasn't.  This affects all clients, whether they mounted with
"sync" or "async".  It violates the NFS specs, so it is not the default.

I don't understand your proposal.  It sounds like you believe that
mounting on the client side with the "sync" option will make your data
safe even if the "async" option is set on the server side?
Unfortunately that's not how it works.

> 4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe this
> is worth investigating in the Linux kernel processing (how ?) ?

Yes, that'd be interesting to investigate.  With some kernel tracing I
think it should be possible to get high-resolution timings for the
processing of a single RPC call, which would make a good start.

It'd probably also be interesting to start with the simplest possible RPC
and then work our way up and see when the RTT increases the most--e.g
does an RPC ping (an RPC with procedure 0, empty argument and reply)
already have a round-trip time closer to .5ms or .12ms?
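
A rough approximation of that RPC ping, using the NULL procedure of the
registered NFSv3 service (hostname is made up, and process start-up adds
some noise to the timing):

    time rpcinfo -T tcp server.example.com nfs 3   # calls procedure 0 of the NFS program over TCP
    ping -c 10 server.example.com                  # ICMP round trip for comparison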

> 5. The 20ms RPC latency I see in sync mode needs a look at on my system
> although async mode is fine for my usage. Maybe this ends up as 2 x 10ms
> drive seeks on ext4 and is thus expected.

Yes, this is why dedicated file servers have hardware designed to lower
that latency.

As long as you're exporting with "async" and don't care about data
safety across crashes or power outages, I guess you could go all the way
and mount your ext4 export with "nobarrier", I *think* that will let the
system acknowledge writes as soon as they reach the disk's write cache.
I don't recommend that.

Just for fun I dug around a little for cheap options to get safe
low-latency storage:

For Intel you can cross-reference this list:


https://ark.intel.com/Search/FeatureFilter?productType=solidstatedrives=true

of SSD's with "enhanced power loss data protection" (EPLDP) with
shopping sites and I find e.g. this for US $121:

https://www.newegg.com/Product/Product.aspx?Item=9SIABVR66R5680

See the "device=" option in the ext4 man pages--you can use that to give
your existing ext4 filesystem an external journal on that device.  I
think you want "data=journal" as well, then writes should normally be
acknowledged once they hit that SSD's write cache, which should be quite
quick.
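
Roughly, moving an existing ext4 filesystem's journal onto an SSD
partition looks like the following (device names are made up, the
filesystem must be unmounted and clean, and the journal device's block
size has to match the filesystem's; see mke2fs(8) and tune2fs(8)):

    mke2fs -O journal_dev -b 4096 /dev/sdb1   # format the SSD partition as an external journal
    umount /export
    tune2fs -O ^has_journal /dev/vg0/export   # drop the old internal journal
    tune2fs -J device=/dev/sdb1 /dev/vg0/export
    mount -o data=journal /dev/vg0/export /export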

I was also curious whether there were PCI SSDs, but the cheapest Intel
SSD with EPLDP is the P4800X, at US $1600.

Intel Optane Memory is interesting as it starts at $70.  It doesn't have
EPLDP but latency of the underlying storage might be better even without
that?

I haven't figured out how to get a similar list for other brands.

Just searching for "SSD power loss protection" on newegg:

This also claims "power loss protection" at $53, but I can't find any
reviews:


https://www.newegg.com/Product/Product.aspx?Item=9SIA1K642V2376_re=ssd_power_loss_protection-_-9SIA1K642V2376-_-Product

Or this?:


https://www.newegg.com/Product/Product.aspx?Item=N82E16820156153_re=ssd_power_loss_protection-_-20-156-153-_-Product

This is another interesting discussion of the problem:


https://blogs.technet.microsoft.com/filecab/2016/11/18/dont-do-it-consumer-ssd/

--b.

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-05 Thread J. Bruce Fields
On Mon, Feb 05, 2018 at 08:21:06AM +, Terry Barnaby wrote:
> On 01/02/18 08:29, Terry Barnaby wrote:
> > On 01/02/18 01:34, Jeremy Linton wrote:
> > > On 01/31/2018 09:49 AM, J. Bruce Fields wrote:
> > > > On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:
> > > > > Have you tried this with a '-o nfsvers=3' during mount? Did that help?
> > > > > 
> > > > > I noticed a large decrease in my kernel build times across
> > > > > NFS/lan a while
> > > > > back after a machine/kernel/10g upgrade. After playing with
> > > > > mount/export
> > > > > options filesystem tuning/etc, I got to this point of timing
> > > > > a bunch of
> > > > > these operations vs the older machine, at which point I
> > > > > discovered that
> > > > > simply backing down to NFSv3 solved the problem.
> > > > > 
> > > > > AKA a nfsv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb
> > > > > ethernet, was
> > > > > slower than a modern machine with a 8 disk xfs RAID5 on 10Gb
> > > > > on nfsv4. The
> > > > > effect was enough to change a kernel build from ~45 minutes
> > > > > down to less
> > > > > than 5.
> > > > 
> > Using NFSv3 in async mode is faster than NFSv4 in async mode (still
> > abysmal in sync mode).
> > 
> > NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp;
> > sync)
> > 
> > real    2m25.717s
> > user    0m8.739s
> > sys 0m13.362s
> > 
> > NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp;
> > sync)
> > 
> > real    3m33.032s
> > user    0m8.506s
> > sys 0m16.930s
> > 
> > NFSv3 async: wireshark trace
> > 
> > No. Time   Source Destination   Protocol Length Info
> >   18527 2.815884979    192.168.202.2 192.168.202.1 NFS 
> > 216    V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode:
> > EXCLUSIVE
> >   18528 2.816362338    192.168.202.1 192.168.202.2 NFS 
> > 328    V3 CREATE Reply (Call In 18527)
> >   18529 2.816418841    192.168.202.2 192.168.202.1 NFS 
> > 224    V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
> >   18530 2.816871820    192.168.202.1 192.168.202.2 NFS 
> > 216    V3 SETATTR Reply (Call In 18529)
> >   18531 2.816966771    192.168.202.2 192.168.202.1 NFS 
> > 1148   V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len: 934
> > FILE_SYNC
> >   18532 2.817441291    192.168.202.1 192.168.202.2 NFS 
> > 208    V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
> >   18533 2.817495775    192.168.202.2 192.168.202.1 NFS 
> > 236    V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
> >   18534 2.817920346    192.168.202.1 192.168.202.2 NFS 
> > 216    V3 SETATTR Reply (Call In 18533)
> >   18535 2.818002910    192.168.202.2 192.168.202.1 NFS 
> > 216    V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode:
> > EXCLUSIVE
> >   18536 2.818492126    192.168.202.1 192.168.202.2 NFS 
> > 328    V3 CREATE Reply (Call In 18535)
> > 
> > This is taking about 2ms for a small file write rather than 3ms for
> > NFSv4. There is an extra GETATTR and CLOSE RPC in NFSv4 accounting for
> > the difference.
> > 
> > So where I am:
> > 
> > 1. NFS in sync mode, at least on my two Fedora27 systems for my usage is
> > completely unusable. (sync: 2 hours, async: 3 minutes, localdisk: 13
> > seconds).
> > 
> > 2. NFS async mode is working, but the small writes are still very slow.
> > 
> > 3. NFS in async mode is 30% better with NFSv3 than NFSv4 when writing
> > small files due to the increased latency caused by NFSv4's two extra RPC
> > calls.
> > 
> > I really think that in 2018 we should be able to have better NFS
> > performance when writing many small files such as used in software
> > development. This would speed up any system that was using NFS with this
> > sort of workload dramatically and reduce power usage all for some
> > improvements in the NFS protocol.
> > 
> > I don't know the details of if this would work, or who is responsible
> > for NFS, but it would be good if possible to have some improvements
> > (NFSv4.3 ?). Maybe:
> > 
> > 1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE
> > call all in one. This would reduce the latency of a small file to 1ms
> > rather than 3ms thus 66% faster. Would require the client to delay the
> > OPEN/SETATTR until the first WRITE. Not sure how possible this is in the
> > implementations. Maybe READ's could be improved as well but getting the
> > OPEN through quick may be better in this case ?
> > 
> > 2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms vs
> > 3ms).
> > 
> > 3. On sync/async modes personally I think it would be better for the
> > client to request the mount in sync/async mode. The setting of sync on
> > the server side would just enforce sync mode for all clients. If the
> > server is in the default async mode clients can mount using sync or
> > async as to their requirements. This seems to match normal VFS 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-05 Thread Chris Murphy
http://vger.kernel.org/vger-lists.html#linux-nfs



-- 
Chris Murphy


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-05 Thread Terry Barnaby

On 01/02/18 08:29, Terry Barnaby wrote:

On 01/02/18 01:34, Jeremy Linton wrote:

On 01/31/2018 09:49 AM, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:

Have you tried this with a '-o nfsvers=3' during mount? Did that help?

I noticed a large decrease in my kernel build times across NFS/lan 
a while
back after a machine/kernel/10g upgrade. After playing with 
mount/export
options filesystem tuning/etc, I got to this point of timing a 
bunch of
these operations vs the older machine, at which point I discovered 
that

simply backing down to NFSv3 solved the problem.

AKA a nfsv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb 
ethernet, was
slower than a modern machine with a 8 disk xfs RAID5 on 10Gb on 
nfsv4. The
effect was enough to change a kernel build from ~45 minutes down to 
less

than 5.


Using NFSv3 in async mode is faster than NFSv4 in async mode (still 
abysmal in sync mode).


NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; 
sync)


real    2m25.717s
user    0m8.739s
sys 0m13.362s

NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; 
sync)


real    3m33.032s
user    0m8.506s
sys 0m16.930s

NFSv3 async: wireshark trace

No. Time   Source Destination   Protocol Length Info
  18527 2.815884979    192.168.202.2 192.168.202.1 NFS  
216    V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode: 
EXCLUSIVE
  18528 2.816362338    192.168.202.1 192.168.202.2 NFS  
328    V3 CREATE Reply (Call In 18527)
  18529 2.816418841    192.168.202.2 192.168.202.1 NFS  
224    V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
  18530 2.816871820    192.168.202.1 192.168.202.2 NFS  
216    V3 SETATTR Reply (Call In 18529)
  18531 2.816966771    192.168.202.2 192.168.202.1 NFS  
1148   V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len: 
934 FILE_SYNC
  18532 2.817441291    192.168.202.1 192.168.202.2 NFS  
208    V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
  18533 2.817495775    192.168.202.2 192.168.202.1 NFS  
236    V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
  18534 2.817920346    192.168.202.1 192.168.202.2 NFS  
216    V3 SETATTR Reply (Call In 18533)
  18535 2.818002910    192.168.202.2 192.168.202.1 NFS  
216    V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode: 
EXCLUSIVE
  18536 2.818492126    192.168.202.1 192.168.202.2 NFS  
328    V3 CREATE Reply (Call In 18535)


This is taking about 2ms for a small file write rather than 3ms for 
NFSv4. There is an extra GETATTR and CLOSE RPC in NFSv4 accounting for 
the difference.


So where I am:

1. NFS in sync mode, at least on my two Fedora27 systems for my usage 
is completely unusable. (sync: 2 hours, async: 3 minutes, localdisk: 
13 seconds).


2. NFS async mode is working, but the small writes are still very slow.

3. NFS in async mode is 30% better with NFSv3 than NFSv4 when writing 
small files due to the increased latency caused by NFSv4's two extra 
RPC calls.


I really think that in 2018 we should be able to have better NFS 
performance when writing many small files such as used in software 
development. This would speed up any system that was using NFS with 
this sort of workload dramatically and reduce power usage all for some 
improvements in the NFS protocol.


I don't know the details of whether this would work, or who is responsible
for NFS, but it would be good if possible to have some improvements 
(NFSv4.3 ?). Maybe:


1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE 
call all in one. This would reduce the latency of a small file to 1ms 
rather than 3ms thus 66% faster. Would require the client to delay the 
OPEN/SETATTR until the first WRITE. Not sure how possible this is in 
the implementations. Maybe READ's could be improved as well but 
getting the OPEN through quick may be better in this case ?


2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms 
vs 3ms).


3. On sync/async modes personally I think it would be better for the 
client to request the mount in sync/async mode. The setting of sync on 
the server side would just enforce sync mode for all clients. If the 
server is in the default async mode clients can mount using sync or 
async as to their requirements. This seems to match normal VFS 
semantics and usage patterns better.


4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe 
this is worth investigating in the Linux kernel processing (how ?) ?


5. The 20ms RPC latency I see in sync mode needs a look at on my 
system although async mode is fine for my usage. Maybe this ends up as 
2 x 10ms drive seeks on ext4 and is thus expected.


Yet another poor NFSv3 performance issue. If I do a "ls -lR" of a 
certain NFS mounted directory over a slow link (NFS over Openvpn over 
FTTP 80/20Mbps), just after mounting the file 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-01 Thread J. Bruce Fields
On Wed, Jan 31, 2018 at 07:34:24PM -0600, Jeremy Linton wrote:
> On 01/31/2018 09:49 AM, J. Bruce Fields wrote:
> > In the kernel compile case there's probably also a lot of re-opening and
> > re-reading files too?  NFSv4 is chattier there too.  Read delegations
> > should help compensate, but we need to improve the heuristics that
> > decide when they're given out.
> 
> The main kernel include files get repeatedly hammered, despite them in
> theory being in cache, IIRC. So yes, if the concurrent (re)open path is even
> slightly slower it's going to hurt a lot.
> 
> > All that aside I can't think what would explain that big a difference
> > (45 minutes vs. 5).  It might be interesting to figure out what
> > happened.
> 
> I had already spent more than my time allotted looking in the wrong
> direction at the filesystem/RAID (did turn off intellipark though) by the
> time I discovered the nfsv3/v4 perf delta. Its been sitting way down on the
> "things to look into" list for a long time now. I'm still using it as a NFS
> server so at some point I can take another look if the problem persists.

OK, understood.

Well, if you ever want to take another look at the v4 issue--I've been
meaning to rework the delegation heuristics.  Assuming you're on a
recent kernel, I could give you some experimental (but probably not too
risky) kernel patches if you didn't mind keeping notes on the results.

I'll probably get around to it eventually on my own, but it'd probably
happen sooner with a collaborator.

But the difference you saw was so drastic, there may have just been some
unrelated NFSv4 bug.

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-02-01 Thread Terry Barnaby

On 01/02/18 01:34, Jeremy Linton wrote:

On 01/31/2018 09:49 AM, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:

Have you tried this with a '-o nfsvers=3' during mount? Did that help?

I noticed a large decrease in my kernel build times across NFS/lan a 
while
back after a machine/kernel/10g upgrade. After playing with 
mount/export

options filesystem tuning/etc, I got to this point of timing a bunch of
these operations vs the older machine, at which point I discovered that
simply backing down to NFSv3 solved the problem.

AKA a nfsv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb 
ethernet, was
slower than a modern machine with a 8 disk xfs RAID5 on 10Gb on 
nfsv4. The
effect was enough to change a kernel build from ~45 minutes down to 
less

than 5.


Using NFSv3 in async mode is faster than NFSv4 in async mode (still 
abysmal in sync mode).


NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)

real    2m25.717s
user    0m8.739s
sys 0m13.362s

NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)

real    3m33.032s
user    0m8.506s
sys 0m16.930s

NFSv3 async: wireshark trace

No. Time   Source Destination   Protocol Length Info
  18527 2.815884979    192.168.202.2 192.168.202.1 NFS  
216    V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode: EXCLUSIVE
  18528 2.816362338    192.168.202.1 192.168.202.2 NFS  
328    V3 CREATE Reply (Call In 18527)
  18529 2.816418841    192.168.202.2 192.168.202.1 NFS  
224    V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
  18530 2.816871820    192.168.202.1 192.168.202.2 NFS  
216    V3 SETATTR Reply (Call In 18529)
  18531 2.816966771    192.168.202.2 192.168.202.1 NFS  
1148   V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len: 934 
FILE_SYNC
  18532 2.817441291    192.168.202.1 192.168.202.2 NFS  
208    V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
  18533 2.817495775    192.168.202.2 192.168.202.1 NFS  
236    V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
  18534 2.817920346    192.168.202.1 192.168.202.2 NFS  
216    V3 SETATTR Reply (Call In 18533)
  18535 2.818002910    192.168.202.2 192.168.202.1 NFS  
216    V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode: EXCLUSIVE
  18536 2.818492126    192.168.202.1 192.168.202.2 NFS  
328    V3 CREATE Reply (Call In 18535)


This is taking about 2ms for a small file write rather than 3ms for 
NFSv4. There is an extra GETATTR and CLOSE RPC in NFSv4 accounting for 
the difference.


So where I am:

1. NFS in sync mode, at least on my two Fedora27 systems for my usage is 
completely unusable. (sync: 2 hours, async: 3 minutes, localdisk: 13 
seconds).


2. NFS async mode is working, but the small writes are still very slow.

3. NFS in async mode is 30% better with NFSv3 than NFSv4 when writing 
small files due to the increased latency caused by NFSv4's two extra RPC 
calls.


I really think that in 2018 we should be able to have better NFS 
performance when writing many small files such as used in software 
development. This would speed up any system that was using NFS with this 
sort of workload dramatically and reduce power usage all for some 
improvements in the NFS protocol.


I don't know the details of whether this would work, or who is responsible
for NFS, but it would be good if possible to have some improvements 
(NFSv4.3 ?). Maybe:


1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE 
call all in one. This would reduce the latency of a small file to 1ms 
rather than 3ms thus 66% faster. Would require the client to delay the 
OPEN/SETATTR until the first WRITE. Not sure how possible this is in the 
implementations. Maybe READ's could be improved as well but getting the 
OPEN through quick may be better in this case ?


2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms vs 
3ms).


3. On sync/async modes personally I think it would be better for the 
client to request the mount in sync/async mode. The setting of sync on 
the server side would just enforce sync mode for all clients. If the 
server is in the default async mode clients can mount using sync or 
async as to their requirements. This seems to match normal VFS semantics 
and usage patterns better.


4. The 0.5ms RPC latency seems a bit high (ICMP pings 0.12ms) . Maybe 
this is worth investigating in the Linux kernel processing (how ?) ?


5. The 20ms RPC latency I see in sync mode needs a look at on my system 
although async mode is fine for my usage. Maybe this ends up as 2 x 10ms 
drive seeks on ext4 and is thus expected.




Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-31 Thread Jeremy Linton

On 01/31/2018 09:49 AM, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:

Have you tried this with a '-o nfsvers=3' during mount? Did that help?

I noticed a large decrease in my kernel build times across NFS/lan a while
back after a machine/kernel/10g upgrade. After playing with mount/export
options filesystem tuning/etc, I got to this point of timing a bunch of
these operations vs the older machine, at which point I discovered that
simply backing down to NFSv3 solved the problem.

AKA a nfsv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb ethernet, was
slower than a modern machine with a 8 disk xfs RAID5 on 10Gb on nfsv4. The
effect was enough to change a kernel build from ~45 minutes down to less
than 5.


Did you mean "faster than"?


Yes, sorry about that.



Definitely worth trying, though I wouldn't expect it to make that big a
difference in the untarring-a-kernel-tree case--I think the only RPC
avoided in the v3 case would be the CLOSE, and it should be one of the
faster ones.

In the kernel compile case there's probably also a lot of re-opening and
re-reading files too?  NFSv4 is chattier there too.  Read delegations
should help compensate, but we need to improve the heuristics that
decide when they're given out.


The main kernel include files get repeatedly hammered, despite them in 
theory being in cache, IIRC. So yes, if the concurrent (re)open path is 
even slightly slower it's going to hurt a lot.




All that aside I can't think what would explain that big a difference
(45 minutes vs. 5).  It might be interesting to figure out what
happened.


I had already spent more than my time allotted looking in the wrong 
direction at the filesystem/RAID (did turn off intellipark though) by 
the time I discovered the nfsv3/v4 perf delta. It's been sitting way down
on the "things to look into" list for a long time now. I'm still using
it as an NFS server so at some point I can take another look if the
problem persists.



Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-31 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:
> Have you tried this with a '-o nfsvers=3' during mount? Did that help?
> 
> I noticed a large decrease in my kernel build times across NFS/lan a while
> back after a machine/kernel/10g upgrade. After playing with mount/export
> options filesystem tuning/etc, I got to this point of timing a bunch of
> these operations vs the older machine, at which point I discovered that
> simply backing down to NFSv3 solved the problem.
> 
> AKA a nfsv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb ethernet, was
> slower than a modern machine with a 8 disk xfs RAID5 on 10Gb on nfsv4. The
> effect was enough to change a kernel build from ~45 minutes down to less
> than 5.

Did you mean "faster than"?

Definitely worth trying, though I wouldn't expect it to make that big a
difference in the untarring-a-kernel-tree case--I think the only RPC
avoided in the v3 case would be the CLOSE, and it should be one of the
faster ones.

In the kernel compile case there's probably also a lot of re-opening and
re-reading files too?  NFSv4 is chattier there too.  Read delegations
should help compensate, but we need to improve the heuristics that
decide when they're given out.

All that aside I can't think what would explain that big a difference
(45 minutes vs. 5).  It might be interesting to figure out what
happened.

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 10:30:04PM +, Terry Barnaby wrote:
> Also, on the 0.5ms. Is this effectively the 1ms system tick ie. the NFS
> processing is not processing based on the packet events (not pre-emptive)
> but on the next system tick ?
> 
> An ICMP ping is about 0.13ms (to and fro) between these systems. Although
> 0.5ms is relatively fast, I wouldn't have thought it should have to take
> 0.5ms for a minimal RPC even over TCPIP.

It'd be interesting to break down that latency.  I'm not sure where it's
coming from.  I doubt it has to do with the system tick.

--b.


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby

On 30/01/18 21:31, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 07:03:17PM +, Terry Barnaby wrote:

It looks like each RPC call takes about 0.5ms. Why do there need to be so
many RPC calls for this? The OPEN call could set the attribs, no need for
the later GETATTR or SETATTR calls.

The first SETATTR (which sets ctime and mtime to server's time) seems
unnecessary, maybe there's a client bug.

The second looks like tar's fault, strace shows it doing a utimensat()
on each file.  I don't know why or if that's optional.


Even the CLOSE could be integrated with the WRITE and taking this
further OPEN could do OPEN, SETATTR, and some WRITE all in one.

We'd probably need some new protocol to make it safe to return from the
open systemcall before we've gotten the OPEN reply from the server.

Write delegations might save us from having to wait for the other
operations.

Taking a look at my own setup, I see the same calls taking about 1ms.
The drives can't do that, so I've got a problem somewhere too

--b.


Also, on the 0.5ms: is this effectively the 1ms system tick, i.e. the NFS 
processing is not driven by the packet events (not pre-emptive) but by 
the next system tick ?


An ICMP ping is about 0.13ms (to and fro) between these systems. 
Although 0.5ms is relatively fast, I wouldn't have thought it should 
have to take 0.5ms for a minimal RPC even over TCP/IP.

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 04:31:58PM -0500, J. Bruce Fields wrote:
> On Tue, Jan 30, 2018 at 07:03:17PM +, Terry Barnaby wrote:
> It looks like each RPC call takes about 0.5ms. Why do there need to be so
> many RPC calls for this ? The OPEN call could set the attribs, no need for
> > the later GETATTR or SETATTR calls.
> 
> The first SETATTR (which sets ctime and mtime to server's time) seems
> unnecessary, maybe there's a client bug.
> 
> The second looks like tar's fault, strace shows it doing a utimensat()
> on each file.  I don't know why or if that's optional.
> 
> > Even the CLOSE could be integrated with the WRITE and taking this
> > further OPEN could do OPEN, SETATTR, and some WRITE all in one.
> 
> We'd probably need some new protocol to make it safe to return from the
> open systemcall before we've gotten the OPEN reply from the server.
> 
> Write delegations might save us from having to wait for the other
> operations.
> 
> Taking a look at my own setup, I see the same calls taking about 1ms.
> The drives can't do that, so I've got a problem somewhere too

Whoops, I totally forgot it was still set up with an external journal on
SSD:

# tune2fs -l /dev/mapper/export-export |grep '^Journal'
Journal UUID: dc356049-6e2f-4e74-b185-5357bee73a32
Journal device:   0x0803
Journal backup:   inode blocks
# blkid --uuid dc356049-6e2f-4e74-b185-5357bee73a32
/dev/sda3
# cat /sys/block/sda/device/model 
INTEL SSDSA2M080

So, most of the data is striped across a couple big hard drives, but the
journal is actually on a small partition on an SSD.

If I remember correctly, I initially tried this with an older intel SSD
and didn't get a performance improvement.  Then I replaced it with this
model which has the "Enhanced Power Loss Data Protection" feature, which
I believe means the write cache is durable, so it should be able to
safely acknowledge writes as soon as they reach the SSD's cache.

And weirdly I think I never actually got around to rerunning these tests
after I installed the new SSD.

Anyway, so that might explain the difference we're seeing.

I'm not sure how to find new SSDs with that feature, but it may be worth
considering as a cheap way to accelerate this kind of workload.  It can
be a very small SSD as it only needs to hold the journal.  Adding an
external journal is a quick operation (you don't have to recreate the
filesystem or anything).
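
For reference, the basic recipe is something like the following (device
names are placeholders, the filesystem must be unmounted while switching
journals, and the journal device has to be created with the same block
size as the filesystem):

    # mke2fs -O journal_dev -b 4096 /dev/ssdN1
    # umount /export
    # tune2fs -O ^has_journal /dev/mapper/export-export
    # tune2fs -J device=/dev/ssdN1 /dev/mapper/export-export
    # mount /export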

--b.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 07:03:17PM +, Terry Barnaby wrote:
> It looks like each RPC call takes about 0.5ms. Why do there need to be so
> many RPC calls for this ? The OPEN call could set the attribs, no need for
> the later GETATTR or SETATTR calls.

The first SETATTR (which sets ctime and mtime to server's time) seems
unnecessary, maybe there's a client bug.

The second looks like tar's fault, strace shows it doing a utimensat()
on each file.  I don't know why or if that's optional.
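
(It's restoring the archived mtimes.  If anyone wants to measure how
much that costs, something like the following should do; tar's
-m/--touch option tells it not to restore modification times, so this
is only a diagnostic, not a fix:

    $ strace -f -c -e trace=utimensat tar -xf linux-4.14.15.tar.gz -C /data2/tmp
    $ time tar -m -xf linux-4.14.15.tar.gz -C /data2/tmp

The archive name and target directory are just the ones from earlier in
the thread.)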

> Even the CLOSE could be integrated with the WRITE and taking this
> further OPEN could do OPEN, SETATTR, and some WRITE all in one.

We'd probably need some new protocol to make it safe to return from the
open systemcall before we've gotten the OPEN reply from the server.

Write delegations might save us from having to wait for the other
operations.

Taking a look at my own setup, I see the same calls taking about 1ms.
The drives can't do that, so I've got a problem somewhere too

--b.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Jeremy Linton

Hi,

On 01/30/2018 01:03 PM, Terry Barnaby wrote:


Being a daredevil, I have used the NFS async option for 27 years 
without an issue on multiple systems :)


I have just mounted my ext4 disk with the same options you were using 
and the same NFS export options and the speed here looks the same as I 
had previously. As I can't wait 2+ hours, I'm just looking at 
ksysguard; it is showing a network rate of about 10 KBytes/s and 
the directory on the server is growing in size very, very slowly.


This is using the current Fedora27 kernel 4.14.14-300.fc27.x86_64.

I will have a look at using wireshark to see if this shows anything.


This is a snippet from a wireshark trace of the NFS when untarring the 
linux kernel 4.14.15 sources into an NFSv4.2 mounted directory with the 
"sync" option on my NFS server. The whole untar would take > 2 hours vs 
13 seconds direct to the disk. This is about 850 MBytes of 60k files. 
The following is a single, small file write.


No. Time   Source Destination   Protocol Length Info
    1880 11.928600315   192.168.202.2 192.168.202.1 NFS 380
V4 Call (Reply In 1881) OPEN DH: 0xac0502f2/sysfs-c2port
    1881 11.950329198   192.168.202.1 192.168.202.2 NFS 408
V4 Reply (Call In 1880) OPEN StateID: 0xaa72
    1882 11.950446430   192.168.202.2 192.168.202.1 NFS 304
V4 Call (Reply In 1883) SETATTR FH: 0x825014ee
    1883 11.972608880   192.168.202.1 192.168.202.2 NFS 336
V4 Reply (Call In 1882) SETATTR
    1884 11.972754709   192.168.202.2 192.168.202.1 TCP 
1516   785 → 2049 [ACK] Seq=465561 Ack=183381 Win=8990 Len=1448 
TSval=1663691771 TSecr=3103357902 [TCP segment of a reassembled PDU]
    1885 11.972763078   192.168.202.2 192.168.202.1 TCP 
1516   785 → 2049 [ACK] Seq=467009 Ack=183381 Win=8990 Len=1448 
TSval=1663691771 TSecr=3103357902 [TCP segment of a reassembled PDU]
    1886 11.972979437   192.168.202.2 192.168.202.1 NFS 332
V4 Call (Reply In 1888) WRITE StateID: 0xafdf Offset: 0 Len: 2931
    1887 11.973074490   192.168.202.1 192.168.202.2 TCP 
68 2049 → 785 [ACK] Seq=183381 Ack=468721 Win=24557 Len=0 
TSval=3103357902 TSecr=1663691771
    1888 12.017153631   192.168.202.1 192.168.202.2 NFS 248
V4 Reply (Call In 1886) WRITE
    1889 12.017338766   192.168.202.2 192.168.202.1 NFS 260
V4 Call (Reply In 1890) GETATTR FH: 0x825014ee
    1890 12.017834411   192.168.202.1 192.168.202.2 NFS 312
V4 Reply (Call In 1889) GETATTR
    1891 12.017961690   192.168.202.2 192.168.202.1 NFS 328
V4 Call (Reply In 1892) SETATTR FH: 0x825014ee
    1892 12.039456634   192.168.202.1 192.168.202.2 NFS 336
V4 Reply (Call In 1891) SETATTR
    1893 12.039536705   192.168.202.2 192.168.202.1 NFS 284
V4 Call (Reply In 1894) CLOSE StateID: 0xaa72
    1894 12.039979528   192.168.202.1 192.168.202.2 NFS 248
V4 Reply (Call In 1893) CLOSE
    1895 12.040077180   192.168.202.2 192.168.202.1 NFS 392
V4 Call (Reply In 1896) OPEN DH: 0xac0502f2/sysfs-cfq-target-latency
    1896 12.061903798   192.168.202.1 192.168.202.2 NFS 408
V4 Reply (Call In 1895) OPEN StateID: 0xaa72


It looks like this takes about 100ms to write this small file. With the 
approx 60k files in the archive this would take about 6000 secs, so it is 
in the 2-hour ballpark of the untar that I am seeing.


Looks like OPEN 21ms, SETATTR 22ms, WRITE 44ms, second SETATTR 21ms: a 
lot of time ...


The following is for an "async" mount:

No. Time   Source Destination   Protocol Length Info
   37393 7.630012608    192.168.202.2 192.168.202.1 NFS 396
V4 Call (Reply In 37394) OPEN DH: 0x1f828ac9/vidioc-dbg-g-chip-info.rst
   37394 7.630488451    192.168.202.1 192.168.202.2 NFS 408
V4 Reply (Call In 37393) OPEN StateID: 0xaa72
   37395 7.630525117    192.168.202.2 192.168.202.1 NFS 304
V4 Call (Reply In 37396) SETATTR FH: 0x0f65c554
   37396 7.630980560    192.168.202.1 192.168.202.2 NFS 336
V4 Reply (Call In 37395) SETATTR
   37397 7.631035171    192.168.202.2 192.168.202.1 TCP 
1516   785 → 2049 [ACK] Seq=13054241 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
   37398 7.631038994    192.168.202.2 192.168.202.1 TCP 
1516   785 → 2049 [ACK] Seq=13055689 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
   37399 7.631042228    192.168.202.2 192.168.202.1 TCP 
1516   785 → 2049 [ACK] Seq=13057137 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
   37400 7.631195554    192.168.202.2 192.168.202.1 NFS 448
V4 Call (Reply In 37402) WRITE StateID: 0xafdf Offset: 0 Len: 4493
   37401 7.631277423    192.168.202.1 192.168.202.2 TCP 
68 2049 → 785 [ACK] Seq=3620329 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby


Being a daredevil, I have used the NFS async option for 27 years 
without an issue on multiple systems :)


I have just mounted my ext4 disk with the same options you were using 
and the same NFS export options and the speed here looks the same as I 
had previously. As I can't wait 2+ hours, I'm just looking at 
ksysguard; it is showing a network rate of about 10 KBytes/s and 
the directory on the server is growing in size very, very slowly.


This is using the current Fedora27 kernel 4.14.14-300.fc27.x86_64.

I will have a look at using wireshark to see if this shows anything.


This is a snippet from a wireshark trace of the NFS when untarring the 
linux kernel 4.14.15 sources into an NFSv4.2 mounted directory with the 
"sync" option on my NFS server. The whole untar would take > 2 hours vs 
13 seconds direct to the disk. This is about 850 MBytes of 60k files. 
The following is a single, small file write.


No. Time   Source Destination   Protocol Length Info
   1880 11.928600315   192.168.202.2 192.168.202.1 NFS  
380    V4 Call (Reply In 1881) OPEN DH: 0xac0502f2/sysfs-c2port
   1881 11.950329198   192.168.202.1 192.168.202.2 NFS  
408    V4 Reply (Call In 1880) OPEN StateID: 0xaa72
   1882 11.950446430   192.168.202.2 192.168.202.1 NFS  
304    V4 Call (Reply In 1883) SETATTR FH: 0x825014ee
   1883 11.972608880   192.168.202.1 192.168.202.2 NFS  
336    V4 Reply (Call In 1882) SETATTR
   1884 11.972754709   192.168.202.2 192.168.202.1 TCP  
1516   785 → 2049 [ACK] Seq=465561 Ack=183381 Win=8990 Len=1448 
TSval=1663691771 TSecr=3103357902 [TCP segment of a reassembled PDU]
   1885 11.972763078   192.168.202.2 192.168.202.1 TCP  
1516   785 → 2049 [ACK] Seq=467009 Ack=183381 Win=8990 Len=1448 
TSval=1663691771 TSecr=3103357902 [TCP segment of a reassembled PDU]
   1886 11.972979437   192.168.202.2 192.168.202.1 NFS  
332    V4 Call (Reply In 1888) WRITE StateID: 0xafdf Offset: 0 Len: 2931
   1887 11.973074490   192.168.202.1 192.168.202.2 TCP  
68 2049 → 785 [ACK] Seq=183381 Ack=468721 Win=24557 Len=0 
TSval=3103357902 TSecr=1663691771
   1888 12.017153631   192.168.202.1 192.168.202.2 NFS  
248    V4 Reply (Call In 1886) WRITE
   1889 12.017338766   192.168.202.2 192.168.202.1 NFS  
260    V4 Call (Reply In 1890) GETATTR FH: 0x825014ee
   1890 12.017834411   192.168.202.1 192.168.202.2 NFS  
312    V4 Reply (Call In 1889) GETATTR
   1891 12.017961690   192.168.202.2 192.168.202.1 NFS  
328    V4 Call (Reply In 1892) SETATTR FH: 0x825014ee
   1892 12.039456634   192.168.202.1 192.168.202.2 NFS  
336    V4 Reply (Call In 1891) SETATTR
   1893 12.039536705   192.168.202.2 192.168.202.1 NFS  
284    V4 Call (Reply In 1894) CLOSE StateID: 0xaa72
   1894 12.039979528   192.168.202.1 192.168.202.2 NFS  
248    V4 Reply (Call In 1893) CLOSE
   1895 12.040077180   192.168.202.2 192.168.202.1 NFS  
392    V4 Call (Reply In 1896) OPEN DH: 0xac0502f2/sysfs-cfq-target-latency
   1896 12.061903798   192.168.202.1 192.168.202.2 NFS  
408    V4 Reply (Call In 1895) OPEN StateID: 0xaa72


It looks like this takes about 100ms to write this small file. With the 
approx 60k files in the archive this would take about 6000 secs, so it is 
in the 2-hour ballpark of the untar that I am seeing.


Looks like OPEN 21ms, SETATTR 22ms, WRITE 44ms, second SETATTR 21ms: a 
lot of time ...


The following is for an "async" mount:

No. Time   Source Destination   Protocol Length Info
  37393 7.630012608    192.168.202.2 192.168.202.1 NFS  
396    V4 Call (Reply In 37394) OPEN DH: 
0x1f828ac9/vidioc-dbg-g-chip-info.rst
  37394 7.630488451    192.168.202.1 192.168.202.2 NFS  
408    V4 Reply (Call In 37393) OPEN StateID: 0xaa72
  37395 7.630525117    192.168.202.2 192.168.202.1 NFS  
304    V4 Call (Reply In 37396) SETATTR FH: 0x0f65c554
  37396 7.630980560    192.168.202.1 192.168.202.2 NFS  
336    V4 Reply (Call In 37395) SETATTR
  37397 7.631035171    192.168.202.2 192.168.202.1 TCP  
1516   785 → 2049 [ACK] Seq=13054241 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
  37398 7.631038994    192.168.202.2 192.168.202.1 TCP  
1516   785 → 2049 [ACK] Seq=13055689 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
  37399 7.631042228    192.168.202.2 192.168.202.1 TCP  
1516   785 → 2049 [ACK] Seq=13057137 Ack=3620329 Win=8990 Len=1448 
TSval=1664595527 TSecr=3104261711 [TCP segment of a reassembled PDU]
  37400 7.631195554    192.168.202.2 192.168.202.1 NFS  
448    V4 Call (Reply In 37402) WRITE StateID: 0xafdf Offset: 0 Len: 4493
  37401 7.631277423    192.168.202.1 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby

On 30/01/18 17:54, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 12:31:22PM -0500, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 04:49:41PM +, Terry Barnaby wrote:

I have just tried running the untar on our work systems. These are again
Fedora27 but newer hardware.
I set one of the server's NFS exports to just rw (removed the async option in
/etc/exports and ran exportfs -arv).
Remounted this NFS file system on a Fedora27 client and re-ran the test. I
have only waited 10 mins but the overall network data rate is in the order of
0.1 MBytes/sec so it looks like it will be a multiple hour job as at home.
So I have two completely separate systems with the same performance over
NFS.
With your NFS "sync" test are you sure you set the "sync" mode on the server
and re-exported the file systems ?

Not being a daredevil, I use "sync" by default:

# exportfs -v /export 
(rw,sync,wdelay,hide,no_subtree_check,sec=sys,insecure,no_root_squash,no_all_squash)

For the "async" case I changed the options and actually rebooted, yes.

The filesystem is:

/dev/mapper/export-export on /export type ext4 
(rw,relatime,seclabel,nodelalloc,stripe=32,data=journal)

(I think data=journal is the only non-default, and I don't remember why
I chose that.)

Hah, well, with data=ordered (the default) the same untar (with "sync"
export) took 15m38s.  So... that probably wasn't an accident.

It may be irresponsible for me to guess given the state of my ignorance
about ext4 journaling, but perhaps writing everything to the journal and
delaying writing it out to its real location as long as possible allows
some sort of tradeoff between bandwidth and seeks that helps with this
sync-heavy workload.

--b.


Being a daredevil, I have used the NFS async option for 27 years without 
an issue on multiple systems :)


I have just mounted my ext4 disk with the same options you were using 
and the same NFS export options and the speed here looks the same as I 
had previously. As I can't wait 2+ hours, I'm just looking at 
ksysguard; it is showing a network rate of about 10 KBytes/s and the 
directory on the server is growing in size very, very slowly.


This is using the current Fedora27 kernel 4.14.14-300.fc27.x86_64.

I will have a look at using wireshark to see if this shows anything.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 12:31:22PM -0500, J. Bruce Fields wrote:
> On Tue, Jan 30, 2018 at 04:49:41PM +, Terry Barnaby wrote:
> > I have just tried running the untar on our work systems. These are again
> > Fedora27 but newer hardware.
> > I set one of the server's NFS exports to just rw (removed the async option in
> > /etc/exports and ran exportfs -arv).
> > Remounted this NFS file system on a Fedora27 client and re-ran the test. I
> > have only waited 10 mins but the overall network data rate is in the order of
> > 0.1 MBytes/sec so it looks like it will be a multiple hour job as at home.
> > So I have two completely separate systems with the same performance over
> > NFS.
> > With your NFS "sync" test are you sure you set the "sync" mode on the server
> > and re-exported the file systems ?
> 
> Not being a daredevil, I use "sync" by default:
> 
>   # exportfs -v /export 
> (rw,sync,wdelay,hide,no_subtree_check,sec=sys,insecure,no_root_squash,no_all_squash)
> 
> For the "async" case I changed the options and actually rebooted, yes.
> 
> The filesystem is:
> 
>   /dev/mapper/export-export on /export type ext4 
> (rw,relatime,seclabel,nodelalloc,stripe=32,data=journal) 
> 
> (I think data=journal is the only non-default, and I don't remember why
> I chose that.)

Hah, well, with data=ordered (the default) the same untar (with "sync"
export) took 15m38s.  So... that probably wasn't an accident.

It may be irresponsible for me to guess given the state of my ignorance
about ext4 journaling, but perhaps writing everything to the journal and
delaying writing it out to its real location as long as possible allows
some sort of tradeoff between bandwidth and seeks that helps with this
sync-heavy workload.
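
(For anyone wanting to try the same thing, roughly: ext4 won't switch
data= modes on a remount, so the filesystem has to be unmounted first;
the device name below is just this machine's and is only illustrative:

    # umount /export
    # tune2fs -o journal_data /dev/mapper/export-export
    # mount /export

tune2fs -o journal_data makes data=journal the default mount option, or
it can be passed explicitly via data=journal in /etc/fstab.)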

--b.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 04:49:41PM +, Terry Barnaby wrote:
> I have just tried running the untar on our work systems. These are again
> Fedora27 but newer hardware.
> I set one of the server's NFS exports to just rw (removed the async option in
> /etc/exports and ran exportfs -arv).
> Remounted this NFS file system on a Fedora27 client and re-ran the test. I
> have only waited 10 mins but the overall network data rate is in the order of
> 0.1 MBytes/sec so it looks like it will be a multiple hour job as at home.
> So I have two completely separate systems with the same performance over
> NFS.
> With your NFS "sync" test are you sure you set the "sync" mode on the server
> and re-exported the file systems ?

Not being a daredevil, I use "sync" by default:

# exportfs -v /export 
(rw,sync,wdelay,hide,no_subtree_check,sec=sys,insecure,no_root_squash,no_all_squash)

For the "async" case I changed the options and actually rebooted, yes.

The filesystem is:

/dev/mapper/export-export on /export type ext4 
(rw,relatime,seclabel,nodelalloc,stripe=32,data=journal) 

(I think data=journal is the only non-default, and I don't remember why
I chose that.)

--b.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby

On 30/01/18 16:22, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 03:29:41PM +, Terry Barnaby wrote:

On 30/01/18 15:09, J. Bruce Fields wrote:

By comparison on my little home server (Fedora, ext4, a couple WD Black
1TB drives), with sync, that untar takes 7:44, about 8ms/file.

Ok, that is far more reasonable, so something is up on my systems :)
What speed do you get with the server export set to async ?

I tried just now and got 4m2s.

The drives probably still have to do a seek or two per create, the
difference now is that we don't have to wait for one create to start the
next one, so the drives can work in parallel.

So given that I'm striping across two drives, I *think* it makes sense
that I'm getting about double the performance with the async export
option.

But that doesn't explain the difference between async and local
performance (22s when I tried the same untar directly on the server, 25s
when I included a final sync in the timing).  And your numbers are a
complete mystery.
I have just tried running the untar on our work systems. These are again 
Fedora27 but newer hardware.
I set one of the server's NFS exports to just rw (removed the async 
option in /etc/exports and ran exportfs -arv).
Remounted this NFS file system on a Fedora27 client and re-ran the test. 
I have only waited 10 mins but the overall network data rate is in the 
order of 0.1 MBytes/sec so it looks like it will be a multiple hour job 
as at home.
So I have two completely separate systems with the same performance over 
NFS.
With your NFS "sync" test are you sure you set the "sync" mode on the 
server and re-exported the file systems ?




--b.


What's the disk configuration and what filesystem is this?

Those tests above were to a single: SATA Western Digital Red 3TB, WDC
WD30EFRX-68EUZN0 using ext4.
Most of my tests have been to software RAID1 SATA disks, Western Digital Red
2TB on one server and Western Digital RE4 2TB WDC WD2003FYYS-02W0B1 on
another quad core Xeon server all using ext4 and all having plenty of RAM.
All on stock Fedora27 (both server and client) updated to date.


Is it really expected for NFS to be this bad these days with a reasonably
typical operation and are there no other tuning parameters that can help  ?

It's expected that the performance of single-threaded file creates will
depend on latency, not bandwidth.

I believe high-performance servers use battery backed write caches with
storage behind them that can do lots of IOPS.

(One thing I've been curious about is whether you could get better
performance cheaply on this kind of workload with ext3/4 striped across a few
drives and an external journal on SSD.  But when I experimented with
that a few years ago I found synchronous write latency wasn't much
better.  I didn't investigate why not, maybe that's just the way SSDs
are.)

--b.



___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 03:29:41PM +, Terry Barnaby wrote:
> On 30/01/18 15:09, J. Bruce Fields wrote:
> > By comparison on my little home server (Fedora, ext4, a couple WD Black
> > 1TB drives), with sync, that untar takes 7:44, about 8ms/file.
> Ok, that is far more reasonable, so something is up on my systems :)
> What speed do you get with the server export set to async ?

I tried just now and got 4m2s.

The drives probably still have to do a seek or two per create, the
difference now is that we don't have to wait for one create to start the
next one, so the drives can work in parallel.

So given that I'm striping across two drives, I *think* it makes sense
that I'm getting about double the performance with the async export
option.

But that doesn't explain the difference between async and local
performance (22s when I tried the same untar directly on the server, 25s
when I included a final sync in the timing).  And your numbers are a
complete mystery.

--b.

> > 
> > What's the disk configuration and what filesystem is this?
> Those tests above were to a single: SATA Western Digital Red 3TB, WDC
> WD30EFRX-68EUZN0 using ext4.
> Most of my tests have been to software RAID1 SATA disks, Western Digital Red
> 2TB on one server and Western Digital RE4 2TB WDC WD2003FYYS-02W0B1 on
> another quad core Xeon server all using ext4 and all having plenty of RAM.
> All on stock Fedora27 (both server and client) updated to date.
> 
> > 
> > > Is it really expected for NFS to be this bad these days with a reasonably
> > > typical operation and are there no other tuning parameters that can help  
> > > ?
> > It's expected that the performance of single-threaded file creates will
> > depend on latency, not bandwidth.
> > 
> > I believe high-performance servers use battery backed write caches with
> > storage behind them that can do lots of IOPS.
> > 
> > (One thing I've been curious about is whether you could get better
> > performance cheaply on this kind of workload with ext3/4 striped across a few
> > drives and an external journal on SSD.  But when I experimented with
> > that a few years ago I found synchronous write latency wasn't much
> > better.  I didn't investigate why not, maybe that's just the way SSDs
> > are.)
> > 
> > --b.
> 
> 
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby

On 30/01/18 15:09, J. Bruce Fields wrote:

On Tue, Jan 30, 2018 at 08:49:27AM +, Terry Barnaby wrote:

On 29/01/18 22:28, J. Bruce Fields wrote:

On Mon, Jan 29, 2018 at 08:37:50PM +, Terry Barnaby wrote:

Ok, that's a shame unless NFSv4's write performance with small files/dirs
is relatively ok which it isn't on my systems.
Although async was "unsafe" this was not an issue in main standard
scenarios such as an NFS mounted home directory only being used by one
client.
The async option also does not appear to work when using NFSv3. I guess it
was removed from that protocol at some point as well ?

This isn't related to the NFS protocol version.

I think everybody's confusing the server-side "async" export option with
the client-side mount "async" option.  They're not really related.

The unsafe thing that speeds up file creates is the server-side "async"
option.  Sounds like you tried to use the client-side mount option
instead, which wouldn't do anything.


What is the expected sort of write performance when un-taring, for example,
the linux kernel sources ? Is 2 MBytes/sec on average on a Gigabit link
typical (3 mins to untar 4.14.15) or should it be better ?

It's not bandwidth that matters, it's latency.

The file create isn't allowed to return until the server has created the
file and the change has actually reached disk.

So an RPC has to reach the server, which has to wait for disk, and then
the client has to get the RPC reply.  Usually it's the disk latency that
dominates.

And also the final close after the new file is written can't return
until all the new file data has reached disk.

v4.14.15 has 61305 files:

$ git ls-tree -r  v4.14.15|wc -l
61305

So time to create each file was about 3 minutes/61305 =~ 3ms.

So assuming two roundtrips per file, your disk latency is probably about
1.5ms?

You can improve the storage latency somehow (e.g. with a battery-backed
write cache) or use more parallelism (has anyone ever tried to write a
parallel untar?).  Or you can cheat and set the async export option, and
then the server will no longer wait for disk before replying.  The
problem is that on server reboot/crash, the client's assumptions about
which operations succeeded may turn out to be wrong.

--b.

Many thanks for your reply.

Yes, I understand the above (latency and normally synchronous nature of
NFS). I have async defined in the server's /etc/exports options. I have,
later, also defined it on the client side as the async option on the server
did not appear to be working and I wondered if with ongoing changes it had
been moved there (would make some sense for the client to define it and pass
this option over to the server as it knows, in most cases, if the bad
aspects of async would be an issue to its usage in the situation in
question).

It's a server with large disks, so SSD is not really an option. The use of
async is ok for my usage (mainly /home mounted and users home files only in
use by one client at a time etc etc.).

Note it's not concurrent access that will cause problems, it's server
crashes.  A UPS may reduce the risk a little.


However I have just found that async is actually working! I just did not
believe it was, due to the poor write performance. Without async on the
server the performance is truly abysmal. The figures I get for untaring the
kernel sources (4.14.15 895MBytes untared) using "rm -fr linux-4.14.15;
sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)" are:

Untar on server to its local disk:  13 seconds, effective data rate: 68
MBytes/s

Untar on server over NFSv4.2 with async on server:  3 minutes, effective
data rate: 4.9 MBytes/sec

Untar on server over NFSv4.2 without async on server:  2 hours 12 minutes,
effective data rate: 115 kBytes/s !!

2:12 is 7920 seconds, and you've got 61305 files to write, so that's
about 130ms/file.  That's more than I'd expect even if you're waiting
for a few seeks on each file create, so there may indeed be something
wrong.

By comparison on my little home server (Fedora, ext4, a couple WD Black
1TB drives), with sync, that untar takes 7:44, about 8ms/file.

Ok, that is far more reasonable, so something is up on my systems :)
What speed do you get with the server export set to async ?


What's the disk configuration and what filesystem is this?
Those tests above were to a single: SATA Western Digital Red 3TB, WDC 
WD30EFRX-68EUZN0 using ext4.
Most of my tests have been to software RAID1 SATA disks, Western Digital 
Red 2TB on one server and Western Digital RE4 2TB WDC WD2003FYYS-02W0B1 
on another quad core Xeon server all using ext4 and all having plenty of 
RAM.

All on stock Fedora27 (both server and client) updated to date.




Is it really expected for NFS to be this bad these days with a reasonably
typical operation and are there no other tuning parameters that can help  ?

It's expected that the performance of single-threaded file creates will
depend on latency, not bandwidth.

I believe 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread J. Bruce Fields
On Tue, Jan 30, 2018 at 08:49:27AM +, Terry Barnaby wrote:
> On 29/01/18 22:28, J. Bruce Fields wrote:
> > On Mon, Jan 29, 2018 at 08:37:50PM +, Terry Barnaby wrote:
> > > Ok, that's a shame unless NFSv4's write performance with small files/dirs
> > > is relatively ok which it isn't on my systems.
> > > Although async was "unsafe" this was not an issue in main standard
> > > scenarios such as an NFS mounted home directory only being used by one
> > > client.
> > > The async option also does not appear to work when using NFSv3. I guess it
> > > was removed from that protocol at some point as well ?
> > This isn't related to the NFS protocol version.
> > 
> > I think everybody's confusing the server-side "async" export option with
> > the client-side mount "async" option.  They're not really related.
> > 
> > The unsafe thing that speeds up file creates is the server-side "async"
> > option.  Sounds like you tried to use the client-side mount option
> > instead, which wouldn't do anything.
> > 
> > > What is the expected sort of write performance when un-taring, for 
> > > example,
> > > the linux kernel sources ? Is 2 MBytes/sec on average on a Gigabit link
> > > typical (3 mins to untar 4.14.15) or should it be better ?
> > It's not bandwidth that matters, it's latency.
> > 
> > The file create isn't allowed to return until the server has created the
> > file and the change has actually reached disk.
> > 
> > So an RPC has to reach the server, which has to wait for disk, and then
> > the client has to get the RPC reply.  Usually it's the disk latency that
> > dominates.
> > 
> > And also the final close after the new file is written can't return
> > until all the new file data has reached disk.
> > 
> > v4.14.15 has 61305 files:
> > 
> > $ git ls-tree -r  v4.14.15|wc -l
> > 61305
> > 
> > So time to create each file was about 3 minutes/61305 =~ 3ms.
> > 
> > So assuming two roundtrips per file, your disk latency is probably about
> > 1.5ms?
> > 
> > You can improve the storage latency somehow (e.g. with a battery-backed
> > write cache) or use more parallelism (has anyone ever tried to write a
> > parallel untar?).  Or you can cheat and set the async export option, and
> > then the server will no longer wait for disk before replying.  The
> > problem is that on server reboot/crash, the client's assumptions about
> > which operations succeeded may turn out to be wrong.
> > 
> > --b.
> 
> Many thanks for your reply.
> 
> Yes, I understand the above (latency and normally synchronous nature of
> NFS). I have async defined in the server's /etc/exports options. I have,
> later, also defined it on the client side as the async option on the server
> did not appear to be working and I wondered if with ongoing changes it had
> been moved there (would make some sense for the client to define it and pass
> this option over to the server as it knows, in most cases, if the bad
> aspects of async would be an issue to its usage in the situation in
> question).
> 
> It's a server with large disks, so SSD is not really an option. The use of
> async is ok for my usage (mainly /home mounted and users home files only in
> use by one client at a time etc etc.).

Note it's not concurrent access that will cause problems, it's server
crashes.  A UPS may reduce the risk a little.

> However I have just found that async is actually working! I just did not
> believe it was, due to the poor write performance. Without async on the
> server the performance is truly abysmal. The figures I get for untaring the
> kernel sources (4.14.15 895MBytes untared) using "rm -fr linux-4.14.15;
> sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)" are:
> 
> Untar on server to its local disk:  13 seconds, effective data rate: 68
> MBytes/s
> 
> Untar on server over NFSv4.2 with async on server:  3 minutes, effective
> data rate: 4.9 MBytes/sec
> 
> Untar on server over NFSv4.2 without async on server:  2 hours 12 minutes,
> effective data rate: 115 kBytes/s !!

2:12 is 7920 seconds, and you've got 61305 files to write, so that's
about 130ms/file.  That's more than I'd expect even if you're waiting
for a few seeks on each file create, so there may indeed be something
wrong.

By comparison on my little home server (Fedora, ext4, a couple WD Black
1TB drives), with sync, that untar takes 7:44, about 8ms/file.

What's the disk configuration and what filesystem is this?

> Is it really expected for NFS to be this bad these days with a reasonably
> typical operation and are there no other tuning parameters that can help  ?

It's expected that the performance of single-threaded file creates will
depend on latency, not bandwidth.
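
A crude way to see that directly, independent of tar (the path is just
an example location on the NFS mount):

    $ time (for i in $(seq 1 1000); do : > /data2/tmp/lat-test-$i; done)
    $ rm -f /data2/tmp/lat-test-*

With a sync export each create has to wait for the server's disk, so the
elapsed time divided by 1000 approximates the per-create latency; the
amount of data moved is irrelevant.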

I believe high-performance servers use battery backed write caches with
storage behind them that can do lots of IOPS.

(One thing I've been curious about is whether you could get better
performance cheaply on this kind of workload with ext3/4 striped across a few
drives and an external journal on SSD.  But 

Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-30 Thread Terry Barnaby

On 29/01/18 22:28, J. Bruce Fields wrote:

On Mon, Jan 29, 2018 at 08:37:50PM +, Terry Barnaby wrote:

Ok, that's a shame unless NFSv4's write performance with small files/dirs
is relatively ok which it isn't on my systems.
Although async was "unsafe" this was not an issue in main standard
scenarios such as an NFS mounted home directory only being used by one
client.
The async option also does not appear to work when using NFSv3. I guess it
was removed from that protocol at some point as well ?

This isn't related to the NFS protocol version.

I think everybody's confusing the server-side "async" export option with
the client-side mount "async" option.  They're not really related.

The unsafe thing that speeds up file creates is the server-side "async"
option.  Sounds like you tried to use the client-side mount option
instead, which wouldn't do anything.


What is the expected sort of write performance when un-taring, for example,
the linux kernel sources ? Is 2 MBytes/sec on average on a Gigabit link
typical (3 mins to untar 4.14.15) or should it be better ?

It's not bandwidth that matters, it's latency.

The file create isn't allowed to return until the server has created the
file and the change has actually reached disk.

So an RPC has to reach the server, which has to wait for disk, and then
the client has to get the RPC reply.  Usually it's the disk latency that
dominates.

And also the final close after the new file is written can't return
until all the new file data has reached disk.

v4.14.15 has 61305 files:

$ git ls-tree -r  v4.14.15|wc -l
61305

So time to create each file was about 3 minutes/61305 =~ 3ms.

So assuming two roundtrips per file, your disk latency is probably about
1.5ms?

You can improve the storage latency somehow (e.g. with a battery-backed
write cache) or use more parallelism (has anyone ever tried to write a
parallel untar?).  Or you can cheat and set the async export option, and
then the server will no longer wait for disk before replying.  The
problem is that on server reboot/crash, the client's assumptions about
which operations succeeded may turn out to be wrong.

--b.


Many thanks for your reply.

Yes, I understand the above (latency and normally synchronous nature of 
NFS). I have async defined in the server's /etc/exports options. I have, 
later, also defined it on the client side as the async option on the 
server did not appear to be working and I wondered if with ongoing 
changes it had been moved there (would make some sense for the client to 
define it and pass this option over to the server as it knows, in most 
cases, if the bad aspects of async would be an issue to its usage in the 
situation in question).


It's a server with large disks, so SSD is not really an option. The use 
of async is ok for my usage (mainly /home mounted and users home files 
only in use by one client at a time etc etc.).


However I have just found that async is actually working! I just did not 
believe it was, due to the poor write performance. Without async on the 
server the performance is truly abysmal. The figures I get for untaring 
the kernel sources (4.14.15 895MBytes untared) using "rm -fr 
linux-4.14.15; sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; 
sync)" are:


Untar on server to its local disk:  13 seconds, effective data rate: 68 
MBytes/s


Untar on server over NFSv4.2 with async on server:  3 minutes, effective 
data rate: 4.9 MBytes/sec


Untar on server over NFSv4.2 without async on server:  2 hours 12 
minutes, effective data rate: 115 kBytes/s !!


Is it really expected for NFS to be this bad these days with a 
reasonably typical operation and are there no other tuning parameters 
that can help  ?

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-29 Thread Petr Pisar
On 2018-01-29, J. Bruce Fields  wrote:
> The file create isn't allowed to return until the server has created the
> file and the change has actually reached disk.
>
Why is there such a requirement? This is not true for local file
systems. This is why fsync() exists.

-- Petr
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-29 Thread J. Bruce Fields
On Mon, Jan 29, 2018 at 08:37:50PM +, Terry Barnaby wrote:
> Ok, that's a shame unless NFSv4's write performance with small files/dirs
> is relatively ok which it isn't on my systems.
> Although async was "unsafe" this was not an issue in main standard
> scenarios such as an NFS mounted home directory only being used by one
> client.
> The async option also does not appear to work when using NFSv3. I guess it
> was removed from that protocol at some point as well ?

This isn't related to the NFS protocol version.

I think everybody's confusing the server-side "async" export option with
the client-side mount "async" option.  They're not really related.

The unsafe thing that speeds up file creates is the server-side "async"
option.  Sounds like you tried to use the client-side mount option
instead, which wouldn't do anything.
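
To spell that out (mirroring the export and mount from earlier in the
thread, so the names are only illustrative):

    # server side, /etc/exports -- this is the option that matters:
    /data  *.kingnet(rw,async)
    # exportfs -arv

    # client side -- "async" here is just the default write-back
    # caching behaviour, so adding it changes nothing:
    # mount -o async king.kingnet:/data /data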

> What is the expected sort of write performance when un-taring, for example,
> the linux kernel sources ? Is 2 MBytes/sec on average on a Gigabit link
> typical (3 mins to untar 4.14.15) or should it be better ?

It's not bandwidth that matters, it's latency.

The file create isn't allowed to return until the server has created the
file and the change has actually reached disk.

So an RPC has to reach the server, which has to wait for disk, and then
the client has to get the RPC reply.  Usually it's the disk latency that
dominates.

And also the final close after the new file is written can't return
until all the new file data has reached disk.

v4.14.15 has 61305 files:

$ git ls-tree -r  v4.14.15|wc -l
61305

So time to create each file was about 3 minutes/61305 =~ 3ms.

So assuming two roundtrips per file, your disk latency is probably about
1.5ms?

You can improve the storage latency somehow (e.g. with a battery-backed
write cache) or use more parallelism (has anyone ever tried to write a
parallel untar?).  Or you can cheat and set the async export option, and
then the server will no longer wait for disk before replying.  The
problem is that on server reboot/crash, the client's assumptions about
which operations succeeded may turn out to be wrong.
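
(A poor man's parallel untar can be faked with GNU parallel, at the cost
of decompressing the archive once per job; a sketch, assuming GNU
parallel is installed and using the archive and target paths from
earlier in the thread:

    $ tar -tzf linux-4.14.15.tar.gz | cut -d/ -f1,2 | sort -u | \
          parallel -j8 tar -xzf linux-4.14.15.tar.gz -C /data2/tmp {}

Each job extracts one top-level subtree, so up to eight synchronous
creates can be in flight at once instead of one.)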

--b.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-29 Thread Terry Barnaby

On 29/01/18 19:50, Steve Dickson wrote:


On 01/29/2018 12:42 PM, Steven Whitehouse wrote:



 Forwarded Message 
Subject:Re: Fedora27: NFS v4 terrible write performance, is async 
working
Date:   Sun, 28 Jan 2018 21:17:02 +
From:   Terry Barnaby 
To: Steven Whitehouse , Development discussions related to Fedora 
, Terry Barnaby 
CC: Steve Dickson , Benjamin Coddington 




On 28/01/18 14:38, Steven Whitehouse wrote:

Hi,


On 28/01/18 07:48, Terry Barnaby wrote:

When doing a tar -xzf ... of a big source tar on an NFSv4 file system
the time taken is huge. I am seeing an overall data rate of about 1
MByte per second across the network interface. If I copy a single
large file I see a network data rate of about 110 MBytes/sec which is
about the limit of the Gigabit Ethernet interface I am using.

Now, in the past I have used the NFS "async" mount option to help
with write speed (lots of small files in the case of an untar of a
set of source files).

However, this does not seem to speed this up in Fedora27 and also I
don't see the "async" option listed when I run the "mount" command.
When I use the "sync" option it does show up in the "mount" list.

The question is, is the "async" option actually working with NFS v4
in Fedora27 ?

No. It's something left over from v3 that allowed servers to be unsafe.
With v4, the protocol defines stableness of the writes.

Thanks for the reply.

Ok, that's a shame unless NFSv4's write performance with small 
files/dirs is relatively ok which it isn't on my systems.
Although async was "unsafe" this was not an issue in main standard 
scenarios such as an NFS mounted home directory only being used by one 
client.
The async option also does not appear to work when using NFSv3. I guess 
it was removed from that protocol at some point as well ?



___

What server is in use? Is that Linux too? Also, is this v4.0 or v4.1?
I've copied in some of the NFS team who should be able to assist,

Steve.

Thanks for the reply.

Server is a Fedora27 as well. vers=4.2 the default. Same issue at other
sites with Fedora27.

Server export: "/data *.kingnet(rw,async,fsid=17)"

Client fstab: "king.kingnet:/data /data nfs async,nocto 0 0"

Client mount: "king.kingnet:/data on /data type nfs4
(rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,nocto,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.202.2,local_lock=none,addr=192.168.202.1)"



This looks normal except for setting fsid=17...

The best way to debug this is to open up a bugzilla report
and attach a (compressed) wireshark network trace to see
what is happening on the wire... The entire tar is not needed,
just a good chunk...

steved.


Ok, will try doing the wireshark trace. What should I open a Bugzilla 
report against, the kernel ?


What is the expected sort of write performance when un-taring, for 
example, the linux kernel sources ? Is 2 MBytes/sec on average on a 
Gigabit link typical (3 mins to untar 4.14.15) or should it be better ?

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

2018-01-29 Thread Steve Dickson


On 01/29/2018 12:42 PM, Steven Whitehouse wrote:
> 
> 
> 
>  Forwarded Message 
> Subject:  Re: Fedora27: NFS v4 terrible write performance, is async 
> working
> Date: Sun, 28 Jan 2018 21:17:02 +
> From: Terry Barnaby 
> To:   Steven Whitehouse , Development discussions 
> related to Fedora , Terry Barnaby 
> 
> CC:   Steve Dickson , Benjamin Coddington 
> 
> 
> 
> 
> On 28/01/18 14:38, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 28/01/18 07:48, Terry Barnaby wrote:
>>> When doing a tar -xzf ... of a big source tar on an NFSv4 file system 
>>> the time taken is huge. I am seeing an overall data rate of about 1 
>>> MByte per second across the network interface. If I copy a single 
>>> large file I see a network data rate of about 110 MBytes/sec which is 
>>> about the limit of the Gigabit Ethernet interface I am using.
>>>
>>> Now, in the past I have used the NFS "async" mount option to help 
>>> with write speed (lots of small files in the case of an untar of a 
>>> set of source files).
>>>
>>> However, this does not seem to speed this up in Fedora27 and also I 
>>> don't see the "async" option listed when I run the "mount" command. 
>>> When I use the "sync" option it does show up in the "mount" list.
>>>
>>> The question is, is the "async" option actually working with NFS v4 
>>> in Fedora27 ?
No. It's something left over from v3 that allowed servers to be unsafe.
With v4, the protocol defines stableness of the writes.

>>> ___
>>
>> What server is in use? Is that Linux too? Also, is this v4.0 or v4.1? 
>> I've copied in some of the NFS team who should be able to assist,
>>
>> Steve.
> 
> Thanks for the reply.
> 
> Server is a Fedora27 as well. vers=4.2 the default. Same issue at other 
> sites with Fedora27.
> 
> Server export: "/data *.kingnet(rw,async,fsid=17)"
> 
> Client fstab: "king.kingnet:/data /data nfs async,nocto 0 0"
> 
> Client mount: "king.kingnet:/data on /data type nfs4 
> (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,nocto,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.202.2,local_lock=none,addr=192.168.202.1)"
> 
> 
This looks normal except for setting fsid=17...

The best way to debug this is to open up a bugzilla report
and attach a (compressed) wireshark network trace to see 
what is happening on the wire... The entire tar is not needed,
just a good chunk...
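
Something along these lines should do it (interface name and paths are
placeholders):

    # tcpdump -i eth0 -s 0 -w /tmp/nfs-untar.pcap host king.kingnet and port 2049
    ... let a minute or so of the untar run, then stop the capture ...
    # gzip /tmp/nfs-untar.pcap

and attach the resulting .pcap.gz to the bug.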

steved.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org