On 07/24/2012 04:22 PM, Jeremy Allison wrote:
On Tue, Jul 24, 2012 at 02:35:28PM -0700, Andrew Scherpbier wrote:
Hi Daniel,

Just a note of encouragement...
I have so far written 2 filesystems in Java that use Samba for 2
different companies, so you're not alone!  :-)

The strategy I've used is to write a simple TCP protocol client (the
VFS module) and server (a straightforward threaded Java server).
Works like a charm.  As long as the client side is abstracted enough
so that its samba connection state is independent from the server
connection state, there are no issues with restarting either.  (I
started out using a stateful protocol, but ended up changing to a
completely stateless one, where the individual messages contain
enough information to establish context.  This way, if either end of
the system goes down, recovery is the simple act of building a new
TCP connection.)
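
As a rough illustration of the stateless idea above, here is a minimal sketch in C of a request that carries its own context (the full path and operation) instead of a server-side handle. The wire layout, the op codes, and the names encode_request()/decode_request() are assumptions made up for this example; they are not the protocol actually used in the system described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical operation codes, invented for this sketch. */
enum op_code { OP_FILE_CLOSED = 1, OP_FILE_RENAMED = 2 };

/*
 * Assumed wire layout (all integers big-endian):
 *   u32 total_len | u16 op | u16 path_len | path bytes (no NUL)
 * Every request names the full path, so the server needs no
 * per-connection state to interpret it.
 */
#define HDR_LEN 8

/* Encode one request into buf. Returns bytes written, or 0 if it does not fit. */
size_t encode_request(uint8_t *buf, size_t cap, uint16_t op, const char *path)
{
    assert(buf != NULL && path != NULL);

    size_t path_len = strlen(path);
    size_t total = HDR_LEN + path_len;
    if (path_len > UINT16_MAX || total > cap) {
        return 0;
    }

    uint32_t be_total = htonl((uint32_t)total);
    uint16_t be_op = htons(op);
    uint16_t be_plen = htons((uint16_t)path_len);

    memcpy(buf, &be_total, 4);
    memcpy(buf + 4, &be_op, 2);
    memcpy(buf + 6, &be_plen, 2);
    memcpy(buf + HDR_LEN, path, path_len);
    return total;
}

/* Decode one complete request. Returns 0 on success, -1 on a malformed message. */
int decode_request(const uint8_t *buf, size_t len,
                   uint16_t *op, char *path, size_t path_cap)
{
    assert(buf != NULL && op != NULL && path != NULL);

    if (len < HDR_LEN) {
        return -1;
    }

    uint32_t be_total;
    uint16_t be_op, be_plen;
    memcpy(&be_total, buf, 4);
    memcpy(&be_op, buf + 4, 2);
    memcpy(&be_plen, buf + 6, 2);

    size_t total = ntohl(be_total);
    size_t path_len = ntohs(be_plen);
    if (total != len || path_len != len - HDR_LEN || path_len + 1 > path_cap) {
        return -1;
    }

    *op = ntohs(be_op);
    memcpy(path, buf + HDR_LEN, path_len);
    path[path_len] = '\0';
    return 0;
}
```

Because each message is complete on its own, a client that loses its connection can simply reconnect and resend; nothing has to be renegotiated first.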

I also attempted to use the Apache ActiveMQ C++ library for
communication, but found it buggy and leaky.

I originally looked into hosting the JVM in the VFS module, but that
was going to be a problem because each smbd process would have to
start its own JVM.  The JVM startup time (especially the server JVM)
is very high and the memory overhead would not make it scalable.

TCP through the loopback interface is very fast (at least on the
Linux systems I've developed for), so there was no need to
implement some sort of shared memory interface.

The system I'm working on now manages PB-class storage (currently up
to 10PB) with hundreds of concurrent clients and the VFS module does
this without issues or much overhead.  We're regularly seeing write
speeds in the 400-500MB/s range using 10GbE and multiple Windows
clients.

Good luck!

P.S.:  Blatant plug for my current project:
http://www.cuttedge.com/psca/index.html
Wow - that's really cool stuff !

I'm glad the VFS works so well for you. I wanted to give you
a heads-up on the changes we're making to the VFS moving
forward with 4.0.x and above - take a look at the changes
Volker made for the pread() -> pread_send_fn()/pread_recv_fn()
and pwrite() -> pwrite_send_fn()/pwrite_recv_fn() in order to
make the VFS async (and allow pthreaded implementations to
be hidden under the covers).

Sample implementations are in source3/modules/vfs_default.c
in:

vfswrap_pread_send()/vfswrap_asys_ssize_t_recv()
vfswrap_pwrite_send()/vfswrap_asys_ssize_t_recv()

It makes the VFS a little more complicated, but should
enable you to get more performance out of it.

Interesting stuff. Right now I'm letting default_vfs do all the low-level I/O, so any speed improvements you guys make should be useful immediately! Does this mean that VFS modules will need to be made thread-safe? That would actually be a significant issue for me. I'm not too familiar with pthreads and don't know much about the low-level implications WRT errno, etc. (I'm mostly a Java weenie nowadays, sorry! The last time I used threads in C++ was a couple of years ago, using Boost under Windows.)



We're also thinking longer term about changing the
model of keeping the current working directory as
the root of the exported service and changing the
internals of Samba to chdir() to the parent directory
of any path currently being processed - this allows
easier security checks inside smbd and reduces the
opportunity for pathname check race conditions.
For what I'm doing now, I don't think that matters much, except perhaps for the realpath calls. Since I only deal with files *after* they have been closed, the only thing I'm worried about is getting the right path to the files.
Feedback very welcome - especially from someone
who has implemented a couple of production Samba
VFS modules already :-).

My main gripe with the VFS stuff is the lack of documentation. What I'd like to see is at least a call flow to make it easier for module writers to figure out which calls to hook. For example, does create_file call open, or do both need to be implemented/hooked? I unfortunately happen to have lots of experience with Windows kernel calls because I also wrote a filter-driver-based FS for Windows in a previous life, so I know how complicated the create_file call is (Thanks, Microsoft!). The fact that you don't need to hook it is awesome, but that's not explained anywhere I could find.

Or at least detailed docs on the individual hooks, what they are supposed to do, why they are called, what their side effects are supposed to be, etc. (Doxygen docs in the code would be awesome!)

I spend way too much time running "grep -rn something" on the samba source and following ctags right now :-(


Don't get me wrong! I love working on this stuff, but the VFS module is a small (but important) part of the bigger system and I end up spending a disproportionate amount of time on the module because of the lack of documentation.

Thanks !

Jeremy.

--
Andrew Scherpbier
and...@scherpbier.org

