Re: [Gluster-devel] optimizing gluster fuse

2018-04-09 Thread Manoj Pillai
On Tue, Apr 10, 2018 at 10:02 AM, riya khanna 
wrote:

> On Mon, Apr 9, 2018 at 10:42 PM, Raghavendra Gowdappa  > wrote:
>
>> +Manoj.
>>
>> On Mon, Apr 9, 2018 at 10:18 PM, riya khanna 
>> wrote:
>>
>>> Hi All,
>>>
>>> I'm trying to use the new framework to speed up lookups/attr/xattr
>>> operations by splitting functionality between fast and slow execution
>>> paths. I'd highly appreciate it if you could suggest experiments to
>>> evaluate the performance improvement.
>>>
>>
How about a software build workload, varying the number of source files?
Especially the case where nothing needs to be done because no files have
changed since the last build -- this case should be all metadata operations.

-- Manoj


>> As you've pointed out already, this is a good place for read caches (both
>> data and metadata). While there is an overlap between things cached by the
>> kernel and things cached by glusterfs, there are some things that are
>> cached only by glusterfs but not by the VFS/kernel. I think this is the
>> area we can explore to move these caches into the kernel. Things I can
>> think of:
>>
>>
> Even if things are cached by the VFS (e.g., dir entries, attributes, etc.),
> the size of the VFS dcache is limited and can affect performance when under
> pressure. Have you ever experienced such a case? Nevertheless, the new
> framework can help create your own dir/attr cache managed by the
> user-space daemon - let's call it a self-managed dcache.
>
>
>> * xattr caching - done by md-cache in glusterfs. I am not sure whether
>> the VFS caches xattrs. If not, this can yield good returns for workloads
>> involving xattrs (like POSIX ACLs, etc.).
>>
>
> Thanks! Similar to attr, xattr caching should be doable as well. I can
> start by looking at the existing implementation in md-cache.
>
>
>> * GET kind of interface for small files - done by quick-read in
>> glusterfs. Note that we fetch the file in lookup. If we couple this with
>> pushing open-behind into the kernel, we can prevent open/readv/flush/release
>> from reaching glusterfs completely in suitable workloads (we had earlier
>> found that this boosts performance for webserver use cases). I think in the
>> lookup response, we would have to populate the page cache. Also, the lookup
>> response signature doesn't provide for holding this data. Not sure whether
>> this can be done.
>>
>
> This one is tricky. There are some limitations imposed by the framework.
> Let me think about it.
>
>
>> * Dirent prefetching for directories - done by readdir-ahead.
>>
> In readdir(), the user-space daemon can populate the self-managed dcache.
> Future lookups can be served from this cache entirely within the kernel.
> What kind of workload can benefit from this?
>
>
>> * As you've already pointed out, we can improve on our invalidation
>> strategies.
>> * Since the page cache is already present in the VFS, I don't think
>> read-ahead/io-cache would have any benefit.
>>
>
> The framework can also bypass the fuse user-space daemon during data I/O
> (e.g., read, write) if the file is locally stored by the lower file system.
> This design is called pass-through I/O and has been discussed numerous times
> on the fuse-devel mailing list. Recent discussion:
> https://lwn.net/Articles/674286/
> Does this apply to glusterfs as well, perhaps when a file is cached by the
> client locally?
>
>
>>> As I mentioned in my previous email, I'm caching replies from the fuse
>>> daemon (hashed key/value blobs) in the kernel so that for the same key
>>> (e.g.,  in case of FUSE_LOOKUP), the reply (e.g.,
>>> fuse_entry_out) is served from the kernel itself and no call is delivered
>>> to user-space.
>>>
>>> While this may seem redundant due to entry_timeout/attr_timeout caching
>>> that already exists in FUSE, this design provides more control to the
>>> user-space daemon over when/what to invalidate. For instance, entry_timeout
>>> caching is only valid until a timeout or until the kernel removes the
>>> dentry from its dcache.
>>>
>>> For invalidation, fuse_lowlevel_notify_inval_entry() can also remove
>>> entries from the hash table. Please refer to the figure attached in my last
>>> email.
>>>
>>> Thanks,
>>> Riya
>>>
>>> On Tue, Apr 3, 2018 at 1:45 PM, riya khanna 
>>> wrote:
>>>
 I'm attaching a figure that depicts the architecture of my optimized
 fuse framework. Kindly let me know if you have any questions.

 On Mon, Apr 2, 2018 at 10:57 AM, riya khanna 
 wrote:

> Thanks Amar! Please see my answers inline.
>
> On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi 
> wrote:
>
>> Hi Riya,
>>
>> Thanks for writing to us. Some questions before we start on this.
>>
>> * Where can we see your work of modifying the fuse module to cache
>> the calls? Some reference would help us to provide more specific 
>> pointers.
>> (or ask better questions).
>>
>> I've created a fast 

Re: [Gluster-devel] optimizing gluster fuse

2018-04-09 Thread riya khanna
On Mon, Apr 9, 2018 at 10:42 PM, Raghavendra Gowdappa 
wrote:

> +Manoj.
>
> On Mon, Apr 9, 2018 at 10:18 PM, riya khanna 
> wrote:
>
>> Hi All,
>>
>> I'm trying to use the new framework to speed up lookups/attr/xattr
>> operations by splitting functionality between fast and slow execution
>> paths. I'd highly appreciate it if you could suggest experiments to
>> evaluate the performance improvement.
>>
>
> As you've pointed out already, this is a good place for read caches (both
> data and metadata). While there is an overlap between things cached by the
> kernel and things cached by glusterfs, there are some things that are
> cached only by glusterfs but not by the VFS/kernel. I think this is the
> area we can explore to move these caches into the kernel. Things I can
> think of:
>
>
Even if things are cached by the VFS (e.g., dir entries, attributes, etc.),
the size of the VFS dcache is limited and can affect performance when under
pressure. Have you ever experienced such a case? Nevertheless, the new
framework can help create your own dir/attr cache managed by the
user-space daemon - let's call it a self-managed dcache.
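To make this concrete, here is a rough user-space C sketch of what I mean by
a self-managed dcache: a small hash table keyed, for illustration, on
(parent nodeid, name) and holding fuse_entry_out-style reply fields. All
names and the layout below are purely illustrative, since the actual
fast-path code lives in the kernel and isn't public yet:

    /*
     * Illustrative sketch only -- the real fast-path code is kernel-side
     * and not yet public.  A tiny hash table caching lookup replies,
     * keyed on (parent nodeid, name), holding fuse_entry_out-like fields.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DCACHE_BUCKETS 1024

    struct cached_entry {
        uint64_t parent;            /* parent nodeid */
        char     name[256];         /* entry name under that parent */
        uint64_t nodeid;            /* resolved nodeid */
        uint64_t entry_valid;       /* timeout, as in fuse_entry_out */
        struct cached_entry *next;  /* bucket chain */
    };

    static struct cached_entry *buckets[DCACHE_BUCKETS];

    static unsigned dcache_hash(uint64_t parent, const char *name)
    {
        unsigned h = (unsigned)(parent * 2654435761u);
        while (*name)
            h = h * 31 + (unsigned char)*name++;
        return h % DCACHE_BUCKETS;
    }

    /* Called after the slow (user-space) path has answered a lookup. */
    static void dcache_insert(uint64_t parent, const char *name,
                              uint64_t nodeid, uint64_t entry_valid)
    {
        unsigned h = dcache_hash(parent, name);
        struct cached_entry *e = calloc(1, sizeof(*e));

        if (!e)
            return;
        e->parent = parent;
        snprintf(e->name, sizeof(e->name), "%s", name);
        e->nodeid = nodeid;
        e->entry_valid = entry_valid;
        e->next = buckets[h];
        buckets[h] = e;
    }

    /* Fast path: answer FUSE_LOOKUP from the cache, no upcall needed. */
    static struct cached_entry *dcache_lookup(uint64_t parent, const char *name)
    {
        struct cached_entry *e = buckets[dcache_hash(parent, name)];

        for (; e; e = e->next)
            if (e->parent == parent && strcmp(e->name, name) == 0)
                return e;
        return NULL;   /* miss: fall back to the slow path */
    }

    int main(void)
    {
        dcache_insert(1, "etc", 42, 1);
        struct cached_entry *hit = dcache_lookup(1, "etc");
        printf("nodeid=%llu\n", hit ? (unsigned long long)hit->nodeid : 0ULL);
        return 0;
    }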


> * xattr caching - done by md-cache in glusterfs. I am not sure whether the
> VFS caches xattrs. If not, this can yield good returns for workloads
> involving xattrs (like POSIX ACLs, etc.).
>

Thanks! Similar to attr, xattr caching should be doable as well. I can
start by looking at the existing implementation in md-cache.
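The same kind of table should generalize to xattrs: key on (nodeid, xattr
name) and store an opaque value blob. A purely illustrative entry layout:

    /* Hypothetical xattr-cache entry: same idea as the lookup cache,
     * but keyed on (nodeid, xattr name) and holding an opaque value. */
    #include <stddef.h>
    #include <stdint.h>

    struct cached_xattr {
        uint64_t nodeid;
        char     xname[256];        /* e.g. "system.posix_acl_access" */
        size_t   len;               /* value length */
        unsigned char value[512];   /* value blob, as returned by getxattr */
        struct cached_xattr *next;  /* bucket chain */
    };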


> * GET kind of interface for small files - done by quick-read in glusterfs.
> Note that we fetch the file in lookup. If we couple this with pushing
> open-behind into the kernel, we can prevent open/readv/flush/release from
> reaching glusterfs completely in suitable workloads (we had earlier found
> that this boosts performance for webserver use cases). I think in the lookup
> response, we would have to populate the page cache. Also, the lookup
> response signature doesn't provide for holding this data. Not sure whether
> this can be done.
>

This one is tricky. There are some limitations imposed by the framework.
Let me think about it.
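Just to sketch what I have in mind (this is NOT an existing FUSE message --
as noted above, the lookup reply has no field for file data today): the
reply for a small file could carry an optional inline payload that the fast
path uses to pre-populate the page cache, so later open/read/flush/release
never leave the kernel.

    /* Hypothetical wire-format sketch only -- no such FUSE reply exists.
     * A lookup reply for a small file could be followed by the file data,
     * letting the fast path fill the page cache and serve reads locally. */
    #include <stdint.h>

    struct lookup_reply_with_data {
        /* ... fields equivalent to struct fuse_entry_out ... */
        uint64_t nodeid;
        uint64_t size;       /* file size from the attributes */
        uint32_t data_len;   /* 0 if the file is too big to inline */
        /* followed by data_len bytes of file content */
    };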


> * Dirent prefetching for directories - done by readdir-ahead.
>
In readdir(), the user-space daemon can populate the self-managed dcache.
Future lookups can be served from this cache entirely within the kernel.
What kind of workload can benefit from this?
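(As a sketch of the mechanism, reusing the hypothetical dcache_insert() from
the earlier snippet: the daemon's readdir handler would push every dirent it
returns into the cache, so the per-entry lookups that typically follow a
directory scan stay in the kernel.)

    /* Sketch: while serving readdir in the user-space daemon, also push
     * each returned entry into the hypothetical self-managed dcache so
     * the per-entry FUSE_LOOKUPs that usually follow stay in the kernel.
     * dcache_insert() is from the earlier illustrative snippet. */
    struct dir_result { uint64_t nodeid; const char *name; };  /* illustrative */

    static void prefetch_dirents(uint64_t dir_nodeid,
                                 const struct dir_result *ents, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dcache_insert(dir_nodeid, ents[i].name, ents[i].nodeid,
                          1 /* entry_valid, seconds */);
    }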


> * As you've already pointed out, we can improve on our invalidation
> strategies.
> * Since the page cache is already present in the VFS, I don't think
> read-ahead/io-cache would have any benefit.
>

The framework can also bypass the fuse user-space daemon during data I/O
(e.g., read, write) if the file is locally stored by the lower file system.
This design is called pass-through I/O and has been discussed numerous times
on the fuse-devel mailing list. Recent discussion:
https://lwn.net/Articles/674286/
Does this apply to glusterfs as well, perhaps when a file is cached by the
client locally?


>> As I mentioned in my previous email, I'm caching replies from the fuse
>> daemon (hashed key/value blobs) in the kernel so that for the same key
>> (e.g.,  in case of FUSE_LOOKUP), the reply (e.g.,
>> fuse_entry_out) is served from the kernel itself and no call is delivered
>> to user-space.
>>
>> While this may seem redundant due to entry_timeout/attr_timeout caching
>> that already exists in FUSE, this design provides more control to the
>> user-space daemon over when/what to invalidate. For instance, entry_timeout
>> caching is only valid until a timeout or until the kernel removes the
>> dentry from its dcache.
>>
>> For invalidation, fuse_lowlevel_notify_inval_entry() can also remove
>> entries from the hash table. Please refer to the figure attached in my last
>> email.
>>
>> Thanks,
>> Riya
>>
>> On Tue, Apr 3, 2018 at 1:45 PM, riya khanna 
>> wrote:
>>
>>> I'm attaching a figure that depicts the architecture of my optimized
>>> fuse framework. Kindly let me know if you have any questions.
>>>
>>> On Mon, Apr 2, 2018 at 10:57 AM, riya khanna 
>>> wrote:
>>>
 Thanks Amar! Please see my answers inline.

 On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi 
 wrote:

> Hi Riya,
>
> Thanks for writing to us. Some questions before we start on this.
>
> * Where can we see your work of modifying the fuse module to cache the
> calls? Some reference would help us to provide more specific pointers. (or
> ask better questions).
>
> I've created a fast path framework for FUSE, where the user space
 daemon can load a module and register handlers for file operations (lookup,
 open, r/w, etc.) that must be handled in the kernel itself without an up
 call to the user space. I call them fast path handlers. This design also
 retains the regular FUSE handlers for file system operations in user-space
 (slow path). The fast path and slow path can communicate with each other
 over shared 

Re: [Gluster-devel] optimizing gluster fuse

2018-04-09 Thread Raghavendra Gowdappa
+Manoj.

On Mon, Apr 9, 2018 at 10:18 PM, riya khanna 
wrote:

> Hi All,
>
> I'm trying to use the new framework to speed up lookups/attr/xattr
> operations by splitting functionality between fast and slow execution
> paths. I'd highly appreciate it if you could suggest experiments to
> evaluate the performance improvement.
>

As you've pointed out already, this is a good place for read caches (both
data and metadata). While there is an overlap between things cached by the
kernel and things cached by glusterfs, there are some things that are
cached only by glusterfs but not by the VFS/kernel. I think this is the
area we can explore to move these caches into the kernel. Things I can
think of:

* xattr caching - done by md-cache in glusterfs. I am not sure whether the
VFS caches xattrs. If not, this can yield good returns for workloads
involving xattrs (like POSIX ACLs, etc.).
* GET kind of interface for small files - done by quick-read in glusterfs.
Note that we fetch the file in lookup. If we couple this with pushing
open-behind into the kernel, we can prevent open/readv/flush/release from
reaching glusterfs completely in suitable workloads (we had earlier found
that this boosts performance for webserver use cases). I think in the lookup
response, we would have to populate the page cache. Also, the lookup
response signature doesn't provide for holding this data. Not sure whether
this can be done.
* Dirent prefetching for directories - done by readdir-ahead.
* As you've already pointed out, we can improve on our invalidation
strategies.
* Since the page cache is already present in the VFS, I don't think
read-ahead/io-cache would have any benefit.


> As I mentioned in my previous email, I'm caching replies from the fuse daemon
> (hashed key/value blobs) in the kernel so that for the same key (e.g.,
>  in case of FUSE_LOOKUP), the reply (e.g.,
> fuse_entry_out) is served from the kernel itself and no call is delivered
> to user-space.
>
> While this may seem redundant due to entry_timeout/attr_timeout caching
> that already exists in FUSE, this design provides more control to the
> user-space daemon over when/what to invalidate. For instance, entry_timeout
> caching is only valid until a timeout or until the kernel removes the
> dentry from its dcache.
>
> For invalidation, fuse_lowlevel_notify_inval_entry() can also remove
> entries from the hash table. Please refer to the figure attached in my last
> email.
>
> Thanks,
> Riya
>
> On Tue, Apr 3, 2018 at 1:45 PM, riya khanna 
> wrote:
>
>> I'm attaching a figure that depicts the architecture of my optimized fuse
>> framework. Kindly let me know if you have any questions.
>>
>> On Mon, Apr 2, 2018 at 10:57 AM, riya khanna 
>> wrote:
>>
>>> Thanks Amar! Please see my answers inline.
>>>
>>> On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi 
>>> wrote:
>>>
 Hi Riya,

 Thanks for writing to us. Some questions before we start on this.

 * Where can we see your work of modifying the fuse module to cache the
 calls? Some reference would help us to provide more specific pointers. (or
 ask better questions).

 I've created a fast path framework for FUSE, where the user space
>>> daemon can load a module and register handlers for file operations (lookup,
>>> open, r/w, etc.) that must be handled in the kernel itself without an up
>>> call to the user space. I call them fast path handlers. This design also
>>> retains the regular FUSE handlers for file system operations in user-space
>>> (slow path). The fast path and slow path can communicate with each other
>>> over shared memory or using syscalls to enable/invalidate caching of data
>>> structs (e.g., results of getattr, getxattr, etc.).
>>>
>>> There's a process I need to follow in order to make the code available
>>> publicly. I've already started, but it will take some time. I will try to do
>>> this asap.
>>>
>>> * If the caching happens in the fuse module, and it expects the regular
 arguments as parameters, then there may be no work required at all
 in glusterfs, as it works on the low-level FUSE API.


>>> The fast handlers expect the same interface and args (fuse_args) as the
>>> regular user-space daemon. The fast handler code is fs-specific, therefore,
>>> must come from glusterfs. Changes are also needed in glusterfs code to
>>> communicate with the fast path for enabling/invalidating caching.
>>>
>>>
 * Also, how would we invalidate caches from the user-space program? Because
 GlusterFS could be accessed from multiple clients, this becomes an
 important piece to have.


>>> A server-side invalidate can trigger a system call into the fast-path
>>> framework to invalidate caches.
>>>
>>>
 To see how the codebase integrates with the fuse module, please check the
 directory 'xlators/mount/fuse/src/' and mostly the file 'fuse-bridge.c'.

 Thanks for your interest in the project; it would be great to collaborate 

Re: [Gluster-devel] optimizing gluster fuse

2018-04-09 Thread riya khanna
Hi All,

I'm trying to use the new framework to speed up lookups/attr/xattr
operations by splitting functionality between fast and slow execution
paths. I'd highly appreciate it if you could suggest experiments to
evaluate the performance improvement.

As I mentioned in my previous email, I'm caching replies from the fuse daemon
(hashed key/value blobs) in the kernel so that for the same key (e.g.,
 in case of FUSE_LOOKUP), the reply (e.g.,
fuse_entry_out) is served from the kernel itself and no call is delivered
to user-space.

While this may seem redundant due to entry_timeout/attr_timeout caching
that already exists in FUSE, this design provides more control to the
user-space daemon over when/what to invalidate. For instance, entry_timeout
caching is only valid until a timeout or until the kernel removes the
dentry from its dcache.

For invalidation, fuse_lowlevel_notify_inval_entry() can also remove
entries from the hash table. Please refer to the figure attached in my last
email.
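For reference, the user-space side of the existing invalidation path looks
roughly like this with libfuse 3 (glusterfs's fuse-bridge writes the
equivalent FUSE_NOTIFY_INVAL_ENTRY message to /dev/fuse itself); the extra
hook into the hash table is the part my framework would add, and its name
below is hypothetical:

    #define FUSE_USE_VERSION 31
    #include <fuse_lowlevel.h>
    #include <string.h>

    /* When (parent, name) changes -- e.g. another client modified it --
     * drop it from the kernel dcache and from the reply hash table.
     * fuse_lowlevel_notify_inval_entry() is the real libfuse call;
     * fast_path_inval_entry() is the hypothetical extra hook. */
    static void invalidate_entry(struct fuse_session *se,
                                 fuse_ino_t parent, const char *name)
    {
        fuse_lowlevel_notify_inval_entry(se, parent, name, strlen(name));
        /* fast_path_inval_entry(parent, name);  -- hypothetical */
    }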

Thanks,
Riya

On Tue, Apr 3, 2018 at 1:45 PM, riya khanna 
wrote:

> I'm attaching a figure that depicts the architecture of my optimized fuse
> framework. Kindly let me know if you have any questions.
>
> On Mon, Apr 2, 2018 at 10:57 AM, riya khanna 
> wrote:
>
>> Thanks Amar! Please see my answers inline.
>>
>> On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi 
>> wrote:
>>
>>> Hi Riya,
>>>
>>> Thanks for writing to us. Some questions before we start on this.
>>>
>>> * Where can we see your work of modifying the fuse module to cache the
>>> calls? Some reference would help us to provide more specific pointers. (or
>>> ask better questions).
>>>
>>> I've created a fast path framework for FUSE, where the user space daemon
>> can load a module and register handlers for file operations (lookup, open,
>> r/w, etc.) that must be handled in the kernel itself without an up call to
>> the user space. I call them fast path handlers. This design also retains
>> the regular FUSE handlers for file system operations in user-space (slow
>> path). The fast path and slow path can communicate with each other over
>> shared memory or using syscalls to enable/invalidate caching of data
>> structs (e.g., results of getattr, getxattr, etc.).
>>
>> There's a process I need to follow in order to make the code available
>> publicly. I've already started, but it will take some time. I will try to do
>> this asap.
>>
>> * If the caching happens in the fuse module, and it expects the regular
>>> arguments as parameters, then there may be no work required at all
>>> in glusterfs, as it works on the low-level FUSE API.
>>>
>>>
>> The fast handlers expect the same interface and args (fuse_args) as the
>> regular user-space daemon. The fast handler code is fs-specific, therefore,
>> must come from glusterfs. Changes are also needed in glusterfs code to
>> communicate with the fast path for enabling/invalidating caching.
>>
>>
>>> * Also, how would we invalidate caches from the user-space program? Because
>>> GlusterFS could be accessed from multiple clients, this becomes an
>>> important piece to have.
>>>
>>>
>> A server-side invalidate can trigger a system call into the fast-path
>> framework to invalidate caches.
>>
>>
>>> To see how the codebase integrates with the fuse module, please check the
>>> directory 'xlators/mount/fuse/src/' and mostly the file 'fuse-bridge.c'.
>>>
>>> Thanks for your interest in the project; it would be great to collaborate on
>>> this effort, as it can enhance the performance of glusterfs in many
>>> use cases.
>>>
>>
>> I'm still going through gluster developer documentation, but it'd be
>> helpful if you could mention what kinds of use cases the fast/slow
>> split FUSE framework should enable. I've already applied the framework to
>> accelerate multiple FUSE-based stackable file systems, but want the
>> interface to be generic enough for all FUSE file systems to take advantage
>> of it.
>>
>>
>>> Regards,
>>> Amar
>>>
>>>
>>>
>>>
>>> On Mon, Apr 2, 2018 at 6:34 AM, riya khanna 
>>> wrote:
>>>
 Hi,

 I've modified the FUSE framework to take a part of the user-space daemon
 code and move it into the kernel fuse driver to minimize user-kernel-user
 switches during file system operations. An example would be caching
 getattr/getxattr/lookup/security checks, etc. This design, therefore,
 creates a fast (served directly from the kernel) and a slow (regular fuse)
 execution path. The fast and slow paths can also communicate with each
 other using shared memory.

 I was wondering if it is possible to accelerate glusterfs using this
 design. What pieces could (should) be easily moved to kernel space?
 Any pointers would be highly appreciated. Thanks!

 -Riya

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 

Re: [Gluster-devel] optimizing gluster fuse

2018-04-02 Thread riya khanna
Thanks Amar! Please see my answers inline.

On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi  wrote:

> Hi Riya,
>
> Thanks for writing to us. Some questions before we start on this.
>
> * Where can we see your work of modifying the fuse module to cache the
> calls? Some reference would help us to provide more specific pointers. (or
> ask better questions).
>
> I've created a fast path framework for FUSE, where the user space daemon
can load a module and register handlers for file operations (lookup, open,
r/w, etc.) that must be handled in the kernel itself without an up call to
the user space. I call them fast path handlers. This design also retains
the regular FUSE handlers for file system operations in user-space (slow
path). The fast path and slow path can communicate with each other over
shared memory or using syscalls to enable/invalidate caching of data
structs (e.g., results of getattr, getxattr, etc.).
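Roughly (with made-up names, since the code isn't released yet), the
registration side looks like this: a filesystem-specific module hands the
framework a table of fast-path handlers for a subset of FUSE opcodes, and
anything a handler can't answer returns -ENOSYS and falls through to the
regular user-space path.

    /* Illustrative only -- all names here are made up; the real framework
     * is not yet public.  struct fuse_entry_out / fuse_attr_out are the
     * reply structs from <linux/fuse.h>. */
    #include <linux/fuse.h>
    #include <stddef.h>
    #include <stdint.h>

    struct fast_path_ops {
        /* return 0 and fill 'out' on a hit, -ENOSYS to fall back */
        int (*lookup)(uint64_t parent, const char *name,
                      struct fuse_entry_out *out);
        int (*getattr)(uint64_t nodeid, struct fuse_attr_out *out);
        int (*getxattr)(uint64_t nodeid, const char *name,
                        void *value, size_t size);
    };

    /* hypothetical call exported by the fast-path framework */
    int fuse_fast_path_register(const char *fs_name,
                                const struct fast_path_ops *ops);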

There's a process I need to follow in order to make the code available
publicly. I've already started, but it will take some time. I will try to do
this asap.

* If the caching happens in the fuse module, and it expects the regular
> arguments as parameters, then there may be no work required at all
> in glusterfs, as it works on the low-level FUSE API.
>
>
The fast handlers expect the same interface and args (fuse_args) as the regular
user-space daemon. The fast handler code is fs-specific, therefore, must
come from glusterfs. Changes are also needed in glusterfs code to
communicate with the fast path for enabling/invalidating caching.


> * Also, how would we invalidate caches from the user-space program? Because
> GlusterFS could be accessed from multiple clients, this becomes an important
> piece to have.
>
>
A server-side invalidate can trigger a system call into the fast-path
framework to invalidate caches.
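One plausible shape for that call (neither the control device nor the ioctl
exists today; everything below is a made-up illustration of the interface I
have in mind):

    /* Hypothetical: how the glusterfs client process might tell the
     * fast-path module to drop a cached reply when a server-side
     * invalidation arrives.  Device node and ioctl are made up. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    struct fastpath_inval {
        uint64_t parent;
        char     name[256];
    };

    #define FASTPATH_IOC_INVAL_ENTRY _IOW('F', 1, struct fastpath_inval)

    static int fastpath_invalidate(uint64_t parent, const char *name)
    {
        struct fastpath_inval req = { .parent = parent };
        int fd, ret;

        snprintf(req.name, sizeof(req.name), "%s", name);
        fd = open("/dev/fuse-fastpath", O_RDWR);  /* hypothetical device */
        if (fd < 0)
            return -1;
        ret = ioctl(fd, FASTPATH_IOC_INVAL_ENTRY, &req);
        close(fd);
        return ret;
    }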


> To see how the codebase integrates with the fuse module, please check the
> directory 'xlators/mount/fuse/src/' and mostly the file 'fuse-bridge.c'.
>
> Thanks for your interest in the project; it would be great to collaborate on
> this effort, as it can enhance the performance of glusterfs in many use cases.
>

I'm still going through gluster developer documentation, but it'd be
helpful if you could mention what kinds of use cases the fast/slow
split FUSE framework should enable. I've already applied the framework to
accelerate multiple FUSE-based stackable file systems, but want the
interface to be generic enough for all FUSE file systems to take advantage
of it.


> Regards,
> Amar
>
>
>
>
> On Mon, Apr 2, 2018 at 6:34 AM, riya khanna 
> wrote:
>
>> Hi,
>>
>> I've modified the FUSE framework to take a part of the user-space daemon code
>> and move it into the kernel fuse driver to minimize user-kernel-user switches
>> during file system operations. An example would be caching
>> getattr/getxattr/lookup/security checks, etc. This design, therefore,
>> creates a fast (served directly from the kernel) and a slow (regular fuse)
>> execution path. The fast and slow paths can also communicate with each
>> other using shared memory.
>>
>> I was wondering if it is possible to accelerate glusterfs using this
>> design. What pieces could (should) be easily moved to kernel space? Any
>> pointers would be highly appreciated. Thanks!
>>
>> -Riya
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] optimizing gluster fuse

2018-04-02 Thread Amar Tumballi
Hi Riya,

Thanks for writing to us. Some questions before we start on this.

* Where can we see your work of modifying the fuse module to cache the
calls? Some reference would help us to provide more specific pointers. (or
ask better questions).

* If the caching happens in the fuse module, and it expects the regular
arguments as parameters, then there may be no work required at all
in glusterfs, as it works on the low-level FUSE API.

* Also, how would we invalidate caches from the user-space program? Because
GlusterFS could be accessed from multiple clients, this becomes an important
piece to have.

To see how the codebase integrates with the fuse module, please check the
directory 'xlators/mount/fuse/src/' and mostly the file 'fuse-bridge.c'.

Thanks for your interest in the project; it would be great to collaborate on
this effort, as it can enhance the performance of glusterfs in many use cases.

Regards,
Amar




On Mon, Apr 2, 2018 at 6:34 AM, riya khanna 
wrote:

> Hi,
>
> I've modified the FUSE framework to take a part of the user-space daemon code
> and move it into the kernel fuse driver to minimize user-kernel-user switches
> during file system operations. An example would be caching
> getattr/getxattr/lookup/security checks, etc. This design, therefore,
> creates a fast (served directly from the kernel) and a slow (regular fuse)
> execution path. The fast and slow paths can also communicate with each
> other using shared memory.
>
> I was wondering if it is possible to accelerate glusterfs using this
> design. What pieces could (should) be easily moved to kernel space? Any
> pointers would be highly appreciated. Thanks!
>
> -Riya
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel