Re: [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Manoj Pillai
On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa wrote:

>
>
> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa wrote:
>
>> All,
>>
>> We've found that the perf xlators io-cache and read-ahead do not add any
>> performance improvement. At best, read-ahead is redundant due to kernel
>> read-ahead
>>
>
> One thing we are still figuring out is whether kernel read-ahead is
> tunable. From what we've explored, it _looks_ like (though this may not be
> entirely correct) ra is capped at 128KB. If that's the case, I am interested
> in a few things:
> * Are there any real-world applications/use cases which would benefit from
> larger read-ahead (Manoj says block devices can do ra of 4MB)?
>

Kernel read-ahead is adaptive but influenced by the read-ahead setting on
the block device (/sys/block/<dev>/queue/read_ahead_kb), which can be
tuned. For RHEL specifically, the default is 128KB (last I checked), but the
default RHEL tuned profile, throughput-performance, bumps that up to 4MB.
It should be fairly easy to rig up a test where 4MB read-ahead on the
block device gives better performance than 128KB read-ahead.
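
A minimal sketch of such a test (the device name sdb and the test file path
are placeholders; run as root and adjust for your setup):

  # current block-device read-ahead, in KB (128 is the usual default)
  cat /sys/block/sdb/queue/read_ahead_kb

  # baseline: drop the page cache, then time a single-threaded sequential read
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/test/bigfile of=/dev/null bs=1M

  # bump read-ahead to 4MB (or switch to the throughput-performance tuned profile)
  echo 4096 > /sys/block/sdb/queue/read_ahead_kb
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/test/bigfile of=/dev/null bs=1M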

-- Manoj

> * Is the limit on the kernel ra tunable a hard one? IOW, what does it take
> to make it do higher ra? If it's difficult, can glusterfs read-ahead provide
> the expected performance improvement for these applications that would
> benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>
> I am still inclined to prefer kernel ra, as I think it's more intelligent
> and can identify more sequential patterns than Glusterfs read-ahead [1][2].
> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
> [2] https://lwn.net/Articles/155510/
>
>> and at worst io-cache is degrading performance for workloads that
>> don't involve re-reads. Given that the VFS already has both of these
>> functionalities, I am proposing to have these two translators turned off by
>> default for native fuse mounts.
>>
>> For mounts other than native fuse, like gfapi (NFS-ganesha/samba), we can
>> keep these xlators on via custom profiles. Comments?
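
For anyone who wants to measure the impact ahead of any change in defaults,
both xlators can already be disabled per volume (the volume name below is a
placeholder):

  gluster volume set <volname> performance.read-ahead off
  gluster volume set <volname> performance.io-cache off
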
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>
>> regards,
>> Raghavendra
>>
>

Re: [Gluster-users] [ovirt-users] GlusterFS performance with only one drive per host?

2018-03-24 Thread Manoj Pillai
My take is that unless you have loads of data and are trying to optimize
for cost/TB, HDDs are probably not the right choice. This is particularly
true for random I/O workloads for which HDDs are really quite bad.

I'd recommend a recent gluster release, and some tuning because the default
settings are not optimized for performance. Some options to consider:
client.event-threads
server.event-threads
cluster.choose-local
performance.client-io-threads

You can toggle the last two and see what works for you. You'd probably need
to set the event-threads options to 4 or more. Ideally, you'd tune some of the
thread pools based on observed bottlenecks in collected stats; top (top -bHd 10 >
top_threads.out.txt) is great for this. Using 6 small drives/bricks instead
of 3 is also a good idea, to reduce the likelihood of RPC bottlenecks.
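
As a rough starting point, the settings above translate to something like the
following (volume name gv0 is a placeholder; verify each change against your
own benchmark runs):

  gluster volume set gv0 client.event-threads 4
  gluster volume set gv0 server.event-threads 4
  # toggle these two and compare results for your workload
  gluster volume set gv0 cluster.choose-local off
  gluster volume set gv0 performance.client-io-threads on
  # capture per-thread CPU usage on clients and servers while the test runs
  top -bHd 10 > top_threads.out.txt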

There has been an effort to improve gluster performance over fast SSDs.
Hence the recommendation to try with a recent release. You can also check
in on some of the issues being worked on:
https://github.com/gluster/glusterfs/issues/412
https://github.com/gluster/glusterfs/issues/410

-- Manoj

On Sat, Mar 24, 2018 at 4:14 AM, Jayme <jay...@gmail.com> wrote:

> Do you feel that SSDs are worth the extra cost, or am I better off using
> regular HDDs?  I'm looking for the best performance I can get with glusterFS.
>
> On Fri, Mar 23, 2018 at 12:03 AM, Manoj Pillai <mpil...@redhat.com> wrote:
>
>>
>>
>> On Thu, Mar 22, 2018 at 3:31 PM, Sahina Bose <sab...@redhat.com> wrote:
>>
>>>
>>>
>>> On Mon, Mar 19, 2018 at 5:57 PM, Jayme <jay...@gmail.com> wrote:
>>>
>>>> I'm spec'ing a new oVirt build using three Dell R720's w/ 256GB.  I'm
>>>> considering storage options.  I don't have a requirement for high amounts
>>>> of storage, I have a little over 1TB to store but want some overhead so I'm
>>>> thinking 2TB of usable space would be sufficient.
>>>>
>>>> I've been doing some research on Micron 1100 2TB ssd's and they seem to
>>>> offer a lot of value for the money.  I'm considering using smaller cheaper
>>>> SSDs for boot drives and using one 2TB micron SSD in each host for a
>>>> glusterFS replica 3 setup (on the fence about using an arbiter, I like the
>>>> extra redundancy replicate 3 will give me).
>>>>
>>>> My question is, would I see a performance hit using only one drive in
>>>> each host with glusterFS or should I try to add more physical disks.  Such
>>>> as 6 1TB drives instead of 3 2TB drives?
>>>>
>>>
>> It is possible. With SSDs the rpc layer can become the bottleneck with
>> some workloads, especially if there are not enough connections out to the
>> server side. We had experimented with a multi-connection model for this
>> reason:  https://review.gluster.org/#/c/19133/.
>>
>> -- Manoj
>>
>>>
>>> [Adding gluster-users for inputs here]
>>>
>>>
>>>> Also one other question.  I've read that gluster can only be done in
>>>> groups of three.  Meaning you need 3, 6, or 9 hosts.  Is this true?  If I
>>>> had an operational replicate 3 glusterFS setup and wanted to add more
>>>> capacity I would have to add 3 more hosts, or is it possible for me to add
>>>> a 4th host in to the mix for extra processing power down the road?
>>>>
>>>
>>> In oVirt, we support replica 3 or replica 3 with arbiter (where one of
>>> the 3 bricks is a low storage arbiter brick). To expand storage, you would
>>> need to add in multiples of 3 bricks. However if you only want to expand
>>> compute capacity in your HC environment, you can add a 4th node.
>>>
>>>
>>>> Thanks!
>>>>
>>>>

Re: [Gluster-users] [ovirt-users] GlusterFS performance with only one drive per host?

2018-03-22 Thread Manoj Pillai
On Thu, Mar 22, 2018 at 3:31 PM, Sahina Bose  wrote:

>
>
> On Mon, Mar 19, 2018 at 5:57 PM, Jayme  wrote:
>
>> I'm spec'ing a new oVirt build using three Dell R720's w/ 256GB.  I'm
>> considering storage options.  I don't have a requirement for high amounts
>> of storage, I have a little over 1TB to store but want some overhead so I'm
>> thinking 2TB of usable space would be sufficient.
>>
>> I've been doing some research on Micron 1100 2TB ssd's and they seem to
>> offer a lot of value for the money.  I'm considering using smaller cheaper
>> SSDs for boot drives and using one 2TB micron SSD in each host for a
>> glusterFS replica 3 setup (on the fence about using an arbiter, I like the
>> extra redundancy replicate 3 will give me).
>>
>> My question is, would I see a performance hit using only one drive in
>> each host with glusterFS or should I try to add more physical disks.  Such
>> as 6 1TB drives instead of 3 2TB drives?
>>
>
It is possible. With SSDs the rpc layer can become the bottleneck with some
workloads, especially if there are not enough connections out to the server
side. We had experimented with a multi-connection model for this reason:
https://review.gluster.org/#/c/19133/.

-- Manoj

>
> [Adding gluster-users for inputs here]
>
>
>> Also one other question.  I've read that gluster can only be done in
>> groups of three.  Meaning you need 3, 6, or 9 hosts.  Is this true?  If I
>> had an operational replicate 3 glusterFS setup and wanted to add more
>> capacity I would have to add 3 more hosts, or is it possible for me to add
>> a 4th host in to the mix for extra processing power down the road?
>>
>
> In oVirt, we support replica 3 or replica 3 with arbiter (where one of the
> 3 bricks is a low storage arbiter brick). To expand storage, you would need
> to add in multiples of 3 bricks. However if you only want to expand compute
> capacity in your HC environment, you can add a 4th node.
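
For concreteness, expanding such a volume by one more replica set would look
something like the sketch below (volume name and brick paths are placeholders):

  gluster volume add-brick gv0 host4:/bricks/b1 host5:/bricks/b1 host6:/bricks/b1
  gluster volume rebalance gv0 start
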
>
>
>> Thanks!
>>
>>

Re: [Gluster-users] gluster for home directories?

2018-03-08 Thread Manoj Pillai
Hi Rik,

Nice clarity and detail in the description. Thanks!

inline...

On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys wrote:

> Hi,
>
> We are looking into replacing our current storage solution and are
> evaluating gluster for this purpose. Our current solution uses a SAN
> with two servers attached that serve samba and NFS 4. Clients connect to
> those servers using NFS or SMB. All users' home directories live on this
> server.
>
> I would like to have some insight into who else is using gluster for home
> directories for about 500 users and what performance they get out of the
> solution. Which connectivity method are you using on the clients
> (gluster native, nfs, smb)? Which volume options do you have configured
> for your gluster volume? What hardware are you using? Are you using
> snapshots and/or quota? If so, any numbers on the performance impact?
>
> The solution I had in mind for our setup is multiple servers/bricks with
> replica 3 arbiter 1 volume where each server is also running nfs-ganesha
> and samba in HA. Clients would be connecting to one of the nfs servers
> (dns round robin). In this case the nfs servers would be the gluster
> clients. Gluster traffic would go over a dedicated network with 10G and
> jumbo frames.
>
> I'm currently testing gluster (3.12, now 3.13) on older machines [1] and
> have created a replica 3 arbiter 1 volume, 2x(2+1). I seem to run into all
> sorts of (performance) problems. I must be doing something wrong, but
> I've tried all sorts of benchmarks and nothing seems to make my setup
> live up to what I would expect from this hardware.
>
> * I understand that gluster only starts to work well when multiple
> clients are connecting in parallel, but I did expect the single client
> performance to be better.
>
> * Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem
> followed by a sync takes about 1 minute. Doing the same on the gluster
> volume using the fuse client (the client is one of the brick servers) takes
> over 9 minutes, and neither disk nor CPU nor network is reaching its
> bottleneck. Doing the same over NFS-ganesha (the client is a workstation
> connected through gbit) takes even longer (more than 30 min!?).
>
> I understand that unpacking a lot of small files may be the worst
> workload for a distributed filesystem, but when I look at the file sizes
> of the files in our users' home directories, more than 90% is smaller
> than 1MB.
>
> * A file copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast
> (90MB/s) and then drops to 20MB/s. When I look at the servers during the
> copy, I don't see where the bottleneck is as the cpu, disk and network
> are not maxing out (on none of the bricks). When the same client copies
> the file to our current NFS storage it is limited by the gbit network
> connection of the client.
>

Both untar and cp are single-threaded, which means throughput is mostly
dictated by latency. Latency is generally higher in a distributed FS;
nfs-ganesha has an extra hop to the backend, and hence higher latency for
most operations compared to glusterfs-fuse.

You don't necessarily need multiple clients for good performance with
gluster. Many multi-threaded benchmarks give good performance from a single
client. Here, for example, if you run multiple copy commands in parallel from
the same client, I'd expect your aggregate transfer rate to improve.
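
For instance, a quick way to see this (paths are placeholders) is to launch a
few copies concurrently and compare the aggregate rate against a single cp:

  for i in 1 2 3 4; do
    cp /mnt/glusterfs/src/file$i /mnt/glusterfs/dst/ &
  done
  wait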

It's been a long while since I looked at nfs-ganesha. But in terms of upper
bounds for throughput tests: data needs to flow over the client->nfs-server
link, and then, depending on which servers the file is located on, either
1x (if the nfs-ganesha node is also hosting one copy of the file, and
neglecting the arbiter) or 2x over the server-to-server link. With 1Gbps
links, that means an upper bound between 125 MB/s and 62.5 MB/s in the
steady state, unless I miscalculated.

-- Manoj


>
> * I had the 'cluster.optimize-lookup' option enabled but ran into all
> sorts of issues where ls was showing either the wrong files (the contents of
> a different directory) or claiming a directory does not exist when mkdir
> says it already exists... I currently have the following options set:
>
> server.outstanding-rpc-limit: 256
> client.event-threads: 4
> performance.io-thread-count: 16
> performance.parallel-readdir: on
> server.event-threads: 4
> performance.cache-size: 2GB
> performance.rda-cache-limit: 128MB
> performance.write-behind-window-size: 8MB
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> network.inode-lru-limit: 50
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> transport.address-family: inet
> nfs.disable: on
> cluster.enable-shared-storage: enable
>
> The brick servers have 2 dual-core CPUs, so I've set the client and
> server event threads to 4.
>
> * When using nfs-ganesha I run into bugs that make me wonder who is
> using nfs-ganesha with 

Re: [Gluster-users] [Gluster-devel] CFP for Gluster Developer Summit

2016-08-19 Thread Manoj Pillai

Here's a proposal ...

Title: State of Gluster Performance
Theme: Stability and Performance

I hope to achieve the following in this talk:

* present a brief overview of current performance for the broad
workload classes: large-file sequential and random workloads,
small-file and metadata-intensive workloads.

* highlight some use-cases where we are seeing really good
performance.

* highlight some of the areas of concerns, covering in some detail
the state of analysis and work in progress.

Regards,
Manoj

- Original Message -
> Hey All,
> 
> Gluster Developer Summit 2016 is fast approaching [1] on us. We are
> looking to have talks and discussions related to the following themes in
> the summit:
> 
> 1. Gluster.Next - focusing on features shaping the future of Gluster
> 
> 2. Experience - Description of real world experience and feedback from:
> a> Devops and Users deploying Gluster in production
> b> Developers integrating Gluster with other ecosystems
> 
> 3. Use cases  - focusing on key use cases that drive Gluster.today and
> Gluster.Next
> 
> 4. Stability & Performance - focusing on current improvements to reduce
> our technical debt backlog
> 
> 5. Process & infrastructure  - focusing on improving current workflow,
> infrastructure to make life easier for all of us!
> 
> If you have a talk/discussion proposal that can be part of these themes,
> please send out your proposal(s) by replying to this thread. Please
> clearly mention the theme for which your proposal is relevant when you
> do so. We will be ending the CFP by 12 midnight PDT on August 31st, 2016.
> 
> If you have other topics that do not fit in the themes listed, please
> feel free to propose and we might be able to accommodate some of them as
> lightning talks or something similar.
> 
> Please do reach out to me or Amye if you have any questions.
> 
> Thanks!
> Vijay
> 
> [1] https://www.gluster.org/events/summit2016/