[Nfs-ganesha-devel] Announce Push of V2.5-rc6

2017-05-12 Thread Frank Filz
Branch next

Tag:V2.5-rc6

NOTE: Double merges this week due to some significant issues

Release Highlights

* FSAL_MEM can now pass pynfs (it can now store a configurable amount of
data)

* More FSAL stacking and upcall fixes

* Add a run-time command line option to dump a backtrace on SIGSEGV and
similar signals (see the sketch below)
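
For context, a minimal sketch of the general technique (glibc's
backtrace(); illustrative only, not the actual Ganesha implementation,
which is in Gui Hecheng's commit listed below):

    #include <execinfo.h>
    #include <signal.h>
    #include <unistd.h>

    /* Illustrative crash handler: dump a backtrace, then re-raise so
     * the process still dies with the original signal. */
    static void crash_handler(int sig)
    {
        void *frames[64];
        int n = backtrace(frames, 64);

        /* backtrace_symbols_fd() avoids malloc(), unlike
         * backtrace_symbols(), so it is safer in a signal handler. */
        backtrace_symbols_fd(frames, n, STDERR_FILENO);

        signal(sig, SIG_DFL);
        raise(sig);
    }

    static void install_crash_handlers(void)
    {
        signal(SIGSEGV, crash_handler);
        signal(SIGABRT, crash_handler);
        signal(SIGBUS, crash_handler);
    }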

Signed-off-by: Frank S. Filz 

Contents:

412b5c4 Frank S. Filz V2.5-rc6
03441ab Gui Hecheng add signal handlers for SIGSEGV like signals to dump
backtrace
371950c Daniel Gryniewicz MDCACHE - Protect export init from UP calls
56c6eb1 Daniel Gryniewicz Fix up stacking in RGW
0dd13dc Daniel Gryniewicz Fix NFS_MSK log config
274b30c Daniel Gryniewicz FSAL_MEM - Allow writing small amounts of data




Re: [Nfs-ganesha-devel] Any plans to support callback API model for FSALs?

2017-05-12 Thread William Allen Simpson
Soumya and I have been working on-and-off for a couple of months on a
design for both async callback and zero-copy, based upon APIs already
implemented for Gluster.  Once we have something comprehensive and
well-written, I'd like to get feedback from other FSALs.

And of course, zero-copy is the whole point of RDMA.  Earlier Gluster
testing with Samba showed that zero-copy gave a better performance
improvement than async IO.

The underlying Linux OS calls only allow one or the other.  For example,
for TCP output in Ganesha V2.5/ntirpc 1.5, I've eliminated one task
switch, but still issue writev() from a semi-dedicated thread, as there
is no async writev() variant.  We should see a measurable performance
improvement (though it might be masked by all the MDCACHE changes).
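
As a hedged illustration of that semi-dedicated thread (the queue types
here are hypothetical stand-ins for ntirpc's real IOQ machinery):

    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/uio.h>

    /* Hypothetical output queue entry; stands in for ntirpc's IOQ. */
    struct out_entry {
        struct out_entry *next;
        int fd;
        struct iovec iov[8];
        int iovcnt;
    };

    static struct out_entry *out_head;
    static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t out_cond = PTHREAD_COND_INITIALIZER;

    static void *writer_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&out_lock);
            while (out_head == NULL)
                pthread_cond_wait(&out_cond, &out_lock);
            struct out_entry *e = out_head;
            out_head = e->next;
            pthread_mutex_unlock(&out_lock);

            /* No async writev() exists, so the blocking call lives in
             * this semi-dedicated thread, off the request path. */
            writev(e->fd, e->iov, e->iovcnt);
            free(e);
        }
        return NULL;
    }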

For FSALs we have the opportunity to design a combined system.

Here's the current state of the design introduction:

NFS-Ganesha direct data placement
with reduced task switching and zero-copy

Currently/Previously

(Task switch 1.)  Upon signalling (epoll), a master polling thread launches 
other worker threads, one for each signal.

(Task switch 2.)  If there is more than one concurrent request on the same 
transport tuple (IP source, IP destination, source port, destination port), the 
request is added to a stall queue.

(Task switch 3.)  While parsing the NFS input, each thread can wait for more 
data.

(Task switch 4.)  After parsing the NFS input, the thread queues the request 
according to several (4) priorities for handling by another worker thread.  
Requests are not handled in order.

(Task switch 5.)  While executing the NFS request, the thread can stall waiting 
for FSAL data.

(Task switch 6.)  After retrieving the resulting data, the thread hands off the 
output to another thread to handle the system write. [Eliminated in Ganesha 
V2.5/ntirpc v1.5]
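
For concreteness, task switch 1 above roughly corresponds to this pattern
(thread_pool_submit() and handle_event() are hypothetical stand-ins, not
ntirpc functions):

    #include <sys/epoll.h>

    /* Hypothetical stand-ins for the worker-pool dispatch. */
    extern void thread_pool_submit(void (*fn)(void *), void *arg);
    extern void handle_event(void *xprt);

    /* Sketch of the current model: a master polling thread blocks in
     * epoll_wait() and hands each ready transport to a worker thread
     * (task switch 1). */
    static void master_poll_loop(int epfd)
    {
        struct epoll_event ev[64];

        for (;;) {
            int n = epoll_wait(epfd, ev, 64, -1);
            for (int i = 0; i < n; i++)
                thread_pool_submit(handle_event, ev[i].data.ptr);
        }
    }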

Ideally

(Task switch 1.)  Upon signalling (epoll), the worker thread will make only one 
system call to accept the incoming connection.

If there is more than one signal at a time, that same worker will queue the 
additional signals, queue another work request to handle the next signal, then 
continue hot processing the first signal.  Note that this replaces the stall 
queue, as the latter threads utilize a worker pool and are executed 
sequentially in a fair-queuing fashion.

To remain hot, the thread checks for additional work before returning to the 
idle pool.
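
A hedged sketch of the "stay hot" rule (next_queued_work(), do_work(),
and park_idle() are invented helpers for illustration):

    /* Invented helpers standing in for the worker-pool internals. */
    extern void *next_queued_work(void);
    extern void do_work(void *w);
    extern void park_idle(void);

    /* Sketch: a hot worker drains any work queued while it was busy
     * before returning to the idle pool, avoiding a task switch. */
    static void worker_main(void *first)
    {
        void *w = first;

        do {
            do_work(w);
            w = next_queued_work();   /* stay hot if more is pending */
        } while (w != NULL);

        park_idle();
    }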

(Task switch 2.)  Instead of waiting for a read system call to complete, use a 
callback to schedule another worker thread, parse the NFS request, and call the 
appropriate FSAL.

If more [TCP, RDMA] data is needed for the request, the thread will save the 
state for the subsequent signal.
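
To make the callback model concrete, one hypothetical shape for such an
FSAL read (every name here is invented for illustration; the actual
design is still being written):

    #include <sys/types.h>

    /* Hypothetical async-read signature: all names invented here. */
    typedef void (*fsal_read_done_cb)(void *req, ssize_t nread, int err);

    struct fsal_obj_handle;   /* opaque for this sketch */

    /* Returns immediately; done_cb fires (possibly on another thread)
     * once data is available, so no thread blocks in read(). */
    int fsal_read_async(struct fsal_obj_handle *obj,
                        void *buf, size_t len, off_t offset,
                        fsal_read_done_cb done_cb, void *req);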

(Task switch 3.)  While executing the NFS request, the thread can stall waiting 
for FSAL data.  The FSAL will return its result and make a second system call 
to send output.  If the FSAL result does not require a stall, no task switch is 
needed.

To remain hot, the thread checks for additional output data before returning to 
the idle pool. Other threads will queue their output data. (As of Ganesha 
V2.5/ntirpc v1.5, this is implemented for TCP.)

Input signal changes

Currently, the (epoll) signal is blocked per fd after each fd signal.  The 
input signal thread does not reinstate the fd signal until after input 
processing is complete.  This causes a data backlog in the underlying OS until 
data is dropped for lack of signal processing.  There is evidence of sawtooth 
patterns in TCP, as the OS will acknowledge (ack) data until no more can be 
held, causing TCP stall and slow start.

Ideally, the signal should never be blocked.  Until the entire task scheme is 
upgraded according to this plan, that is not possible, so the signal should be 
reinstated as soon as practicable, allowing new signals to be queued quickly 
(see the sketch below).
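
On Linux this per-fd blocking maps naturally onto EPOLLONESHOT; a sketch
of rearming early (an assumed shape, not the ntirpc code):

    #include <sys/epoll.h>

    /* Sketch: with EPOLLONESHOT the fd stops signalling after one
     * event.  Rearming as soon as the pending input has been claimed,
     * rather than after full request processing, lets new signals
     * queue quickly. */
    static void rearm_fd(int epfd, int fd, void *xprt)
    {
        struct epoll_event ev = {
            .events = EPOLLIN | EPOLLONESHOT,
            .data.ptr = xprt,
        };

        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
    }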

The signal queue(s) implemented for RDMA should be used for all signals.  
Preliminary testing by CEA demonstrated that up to 3,000 client connections 
could be handled during cold startup.  However, this cannot be implemented 
until better asynchrony and parallelism are available.

Transport parallelism

Currently, on SVC_RECV() a new transport (SVCXPRT) is spawned for each incoming 
TCP and RDMA connection, but not for UDP connections.  This requires extensive 
locking around UDP receive and send, as each incoming request uses the same 
buffers for input and output, and stores common data fields used by both input 
and output.  There exists a UDP multi-threading window between SVC_RECV() and 
SVC_SEND(); that is, the long-standing code is not MT-safe.

Instead, spawn a new UDP transport for each incoming request.  Rather than 
allocating a separate buffer for each UDP transport, append an IOQ buffer, 
replacing the rpc_buffer() pointer.  This will keep the number of memory 
allocation calls and contention exactly the same as previously, and permit 
usage of the significantly faster duplex IOQ for
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: MDCACHE - Protect export init from UP calls

2017-05-12 Thread GerritHub
From Daniel Gryniewicz:

Daniel Gryniewicz has uploaded this change for review. ( 
https://review.gerrithub.io/360642 )


Change subject: MDCACHE - Protect export init from UP calls
..

MDCACHE - Protect export init from UP calls

As soon as we call create_export() on the Sub-FSAL, it can start making
UP calls.  However, we're not ready to process them yet, since stacking
is not complete.

Add a RW lock to protect this.  UP calls take it for read, and init
takes it for write.  This means UP calls do not block each other, but
all of them wait for init to complete.
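
The pattern, sketched with POSIX rwlocks (structure and function names
here are illustrative, not the actual mdcache code):

    #include <pthread.h>

    struct export_ctx {
        pthread_rwlock_t init_lock;
        /* ... export state ... */
    };

    /* Init holds the lock for write across sub-FSAL create_export()
     * and the rest of stacking; any UP call arriving meanwhile waits. */
    void export_init(struct export_ctx *exp)
    {
        pthread_rwlock_wrlock(&exp->init_lock);
        /* ... create_export() on the sub-FSAL, finish stacking ... */
        pthread_rwlock_unlock(&exp->init_lock);
    }

    /* UP calls take the lock for read: they do not block each other,
     * but all of them wait for init to complete. */
    void up_call(struct export_ctx *exp)
    {
        pthread_rwlock_rdlock(&exp->init_lock);
        /* ... process the upcall ... */
        pthread_rwlock_unlock(&exp->init_lock);
    }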

Change-Id: I4a79fb6f9c187b999f42163e4515eeca9d24eb27
Signed-off-by: Daniel Gryniewicz 
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_export.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_main.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c
4 files changed, 50 insertions(+), 11 deletions(-)



  git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha 
refs/changes/42/360642/1
-- 
To view, visit https://review.gerrithub.io/360642
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4a79fb6f9c187b999f42163e4515eeca9d24eb27
Gerrit-Change-Number: 360642
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Gryniewicz 