Re: [Gluster-devel] FOP ratelimit?
Ceph is working on something called "dmclock": http://tracker.ceph.com/projects/ceph/wiki/Rados_qos
Maybe we can talk to them?

~Joe

- Original Message -
From: "Jeff Darcy"
To: "Joseph Fernandes"
Cc: "Gluster Devel", "Raghavendra Gowdappa", "Venky Shankar", "Pranith Kumar Karampuri", "Shyamsundar Ranganathan"
Sent: Thursday, September 10, 2015 6:57:51 PM
Subject: Re: [Gluster-devel] FOP ratelimit?

> Have we given thought about other IO scheduling algorithms like mclock
> algorithm [1], used by vmware for their QOS solution.
> Plus another point to keep in mind here is the distributed nature of the
> solution. It's easier to think of a brick
> controlling the throughput for a client or a tenant. But how would this work
> in collaboration and scale with all the
> bricks together, what I am talking about is Distributed QOS.

At the packet level, this is a core problem that SDN has to solve. When we're running in an SDN environment, we should just hand off responsibility for QoS to them. Otherwise, we should probably steal their algorithms. ;) I believe there are some experts elsewhere at Red Hat whose brains we can and should pick.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] FOP ratelimit?
> Have we given thought about other IO scheduling algorithms like mclock
> algorithm [1], used by vmware for their QOS solution.
> Plus another point to keep in mind here is the distributed nature of the
> solution. It's easier to think of a brick
> controlling the throughput for a client or a tenant. But how would this work
> in collaboration and scale with all the
> bricks together, what I am talking about is Distributed QOS.

At the packet level, this is a core problem that SDN has to solve. When we're running in an SDN environment, we should just hand off responsibility for QoS to them. Otherwise, we should probably steal their algorithms. ;) I believe there are some experts elsewhere at Red Hat whose brains we can and should pick.
Re: [Gluster-devel] FOP ratelimit?
Hi Guys,

Have we given any thought to other IO scheduling algorithms, like the mclock algorithm [1] used by VMware for their QoS solution?

Another point to keep in mind here is the distributed nature of the solution. It's easier to think of a brick controlling the throughput for a client or a tenant. But how would this work in collaboration, and scale, with all the bricks together? What I am talking about is distributed QoS.

Regards,
Joe

[1] http://www.gluster.org/community/documentation/index.php/File:Qos.odp

- Original Message -
From: "Venky Shankar"
To: "Raghavendra Gowdappa"
Cc: "Gluster Devel"
Sent: Thursday, September 10, 2015 12:16:41 PM
Subject: Re: [Gluster-devel] FOP ratelimit?

On Thu, Sep 3, 2015 at 11:36 AM, Raghavendra Gowdappa wrote:
>
> - Original Message -
>> From: "Emmanuel Dreyfus"
>> To: "Raghavendra Gowdappa", "Pranith Kumar Karampuri"
>> Cc: gluster-devel@gluster.org
>> Sent: Wednesday, September 2, 2015 8:12:37 PM
>> Subject: Re: [Gluster-devel] FOP ratelimit?
>>
>> Raghavendra Gowdappa wrote:
>>
>> > It's helpful if you can give some pointers on what parameters (like
>> > latency, throughput etc) you want us to consider for QoS.
>>
>> Full blown QoS would be nice, but a first line of defense against
>> resource hogs seems just badly required.
>>
>> A bare minimum could be to process client's FOP in a round robin
>> fashion. That way even if one client sends a lot of FOPs, there is
>> always some window for others to slip in.
>>
>> Any opinion?
>
> As of now we depend on epoll/poll events informing servers about incoming
> messages. All sockets are put in the same event-pool represented by a single
> poll-control fd. So, the order of our processing of msgs from various clients
> really depends on how epoll/poll picks events across multiple sockets. Do
> poll/epoll have any sort of scheduling? or is it random? Any pointers on this
> are appreciated.

I haven't come across any kind of scheduling for picking events for sockets.

Routers use synthetic throttling for traffic shaping. The most commonly used technique is TBF (token bucket filter), which "induces" latency for outbound traffic.

Lustre had some work [1] done for QoS along the lines of TBF.

HTH.

[1]: http://cdn.opensfs.org/wp-content/uploads/2014/10/7-DDN_LiXi_lustre_QoS.pdf

>> --
>> Emmanuel Dreyfus
>> http://hcpnet.free.fr/pubz
>> m...@netbsd.org
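For readers who have not met mClock, the tag-based idea mentioned above can be sketched roughly as follows. This is a simplified, single-server illustration and not VMware's or Ceph's implementation: every client gets a reservation, a limit and a weight, each request is stamped with three tags, and the scheduler first serves overdue reservations and then shares the remaining capacity by weight among clients that are under their limit. The names `MClockSketch`, `Client`, `enqueue` and `dispatch` are made up for this example, idle-client tag adjustment is left out, and the distributed variant (dmclock) is not covered at all.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Client:
    reservation: float  # guaranteed minimum, in requests per second
    limit: float        # hard cap, in requests per second
    weight: float       # share of whatever capacity is left over
    r_tag: Optional[float] = None  # last reservation tag handed out
    l_tag: Optional[float] = None  # last limit tag handed out
    p_tag: Optional[float] = None  # last proportional-share tag handed out
    queue: deque = field(default_factory=deque)  # [r, l, p, request] entries


class MClockSketch:
    """Hypothetical, simplified mClock-style scheduler (illustration only)."""

    def __init__(self):
        self.clients = {}

    def add_client(self, name, reservation, limit, weight):
        self.clients[name] = Client(reservation, limit, weight)

    @staticmethod
    def _next_tag(prev, rate, now):
        # Tags are spaced 1/rate apart but never lag behind the clock.
        return now if prev is None else max(prev + 1.0 / rate, now)

    def enqueue(self, name, request, now):
        c = self.clients[name]
        c.r_tag = self._next_tag(c.r_tag, c.reservation, now)
        c.l_tag = self._next_tag(c.l_tag, c.limit, now)
        c.p_tag = self._next_tag(c.p_tag, c.weight, now)
        c.queue.append([c.r_tag, c.l_tag, c.p_tag, request])

    def dispatch(self, now):
        """Return (client, request) to serve at time `now`, or None."""
        busy = [(n, c) for n, c in self.clients.items() if c.queue]
        # Phase 1: serve overdue reservations first, earliest tag wins.
        due = [(c.queue[0][0], n) for n, c in busy if c.queue[0][0] <= now]
        if due:
            name = min(due)[1]
            return name, self.clients[name].queue.popleft()[3]
        # Phase 2: among clients under their limit, smallest weight tag wins.
        eligible = [(c.queue[0][2], n) for n, c in busy if c.queue[0][1] <= now]
        if not eligible:
            return None
        name = min(eligible)[1]
        c = self.clients[name]
        request = c.queue.popleft()[3]
        # This request was not charged to the reservation, so shift the
        # client's reservation tags back by one slot.
        for entry in c.queue:
            entry[0] -= 1.0 / c.reservation
        c.r_tag -= 1.0 / c.reservation
        return name, request
```

A brick could call `dispatch(now)` whenever a worker thread becomes free; how tags would be coordinated across bricks is exactly the open "distributed QoS" question raised above.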
Re: [Gluster-devel] FOP ratelimit?
On Thu, Sep 3, 2015 at 11:36 AM, Raghavendra Gowdappa wrote:
>
> - Original Message -
>> From: "Emmanuel Dreyfus"
>> To: "Raghavendra Gowdappa", "Pranith Kumar Karampuri"
>> Cc: gluster-devel@gluster.org
>> Sent: Wednesday, September 2, 2015 8:12:37 PM
>> Subject: Re: [Gluster-devel] FOP ratelimit?
>>
>> Raghavendra Gowdappa wrote:
>>
>> > It's helpful if you can give some pointers on what parameters (like
>> > latency, throughput etc) you want us to consider for QoS.
>>
>> Full blown QoS would be nice, but a first line of defense against
>> resource hogs seems just badly required.
>>
>> A bare minimum could be to process client's FOP in a round robin
>> fashion. That way even if one client sends a lot of FOPs, there is
>> always some window for others to slip in.
>>
>> Any opinion?
>
> As of now we depend on epoll/poll events informing servers about incoming
> messages. All sockets are put in the same event-pool represented by a single
> poll-control fd. So, the order of our processing of msgs from various clients
> really depends on how epoll/poll picks events across multiple sockets. Do
> poll/epoll have any sort of scheduling? or is it random? Any pointers on this
> are appreciated.

I haven't come across any kind of scheduling for picking events for sockets.

Routers use synthetic throttling for traffic shaping. The most commonly used technique is TBF (token bucket filter), which "induces" latency for outbound traffic.

Lustre had some work [1] done for QoS along the lines of TBF.

HTH.

[1]: http://cdn.opensfs.org/wp-content/uploads/2014/10/7-DDN_LiXi_lustre_QoS.pdf

>> --
>> Emmanuel Dreyfus
>> http://hcpnet.free.fr/pubz
>> m...@netbsd.org
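To make the token bucket idea concrete, here is a minimal sketch. It is not the bit-rot or Lustre code, just the textbook mechanism: tokens accumulate at `rate` per second up to `capacity`, each operation spends tokens, and a caller that finds the bucket empty is told how long to wait, which is the "induced" latency. The class name `TokenBucket`, the method names and the injectable `clock` parameter are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Textbook token bucket; names here are illustrative, not Gluster's."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so it can be tested
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, n=1):
        """Spend n tokens if available; return whether the op may proceed."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def delay_for(self, n=1):
        """Seconds the caller should wait before n tokens are available."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)
```

A server-side throttle could keep one bucket per client (or per tenant) and, when `try_consume()` fails, park the request for `delay_for()` seconds instead of rejecting it.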
Re: [Gluster-devel] FOP ratelimit?
- Original Message -
> From: "Emmanuel Dreyfus"
> To: "Raghavendra Gowdappa", "Pranith Kumar Karampuri"
> Cc: gluster-devel@gluster.org
> Sent: Wednesday, September 2, 2015 8:12:37 PM
> Subject: Re: [Gluster-devel] FOP ratelimit?
>
> Raghavendra Gowdappa wrote:
>
> > It's helpful if you can give some pointers on what parameters (like
> > latency, throughput etc) you want us to consider for QoS.
>
> Full blown QoS would be nice, but a first line of defense against
> resource hogs seems just badly required.
>
> A bare minimum could be to process client's FOP in a round robin
> fashion. That way even if one client sends a lot of FOPs, there is
> always some window for others to slip in.
>
> Any opinion?

As of now we depend on epoll/poll events informing servers about incoming messages. All sockets are put in the same event pool, represented by a single poll-control fd. So the order in which we process messages from various clients really depends on how epoll/poll picks events across multiple sockets. Do poll/epoll have any sort of scheduling, or is it random? Any pointers on this are appreciated.

> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
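Whatever order the kernel reports ready sockets in, the application still decides how much to read from each one per pass. The sketch below is not Gluster's event code; it uses Python's `selectors` module (level-triggered) and local socket pairs to show that reading at most one message per ready socket per pass lets a quiet client be served alongside a busy one. The fixed 4-byte message size and the names `serve_one_per_ready_socket` and `demo` are assumptions made to keep the example short.

```python
import selectors
import socket

MSG_SIZE = 4  # fixed-size messages, purely to keep the sketch short


def serve_one_per_ready_socket(sel, max_msgs):
    """Read at most one message per ready socket on every pass."""
    order = []
    while len(order) < max_msgs:
        events = sel.select(timeout=0)
        if not events:
            break  # nothing left to read
        for key, _mask in events:
            data = key.fileobj.recv(MSG_SIZE)
            if data:
                order.append((key.data, data))
    return order


def demo():
    sel = selectors.DefaultSelector()
    socks = []
    # "busy" has five messages queued, "quiet" has only one.
    for name, count in (("busy", 5), ("quiet", 1)):
        client, server = socket.socketpair()
        for i in range(count):
            client.sendall(f"{name[0]}{i:03d}".encode())
        sel.register(server, selectors.EVENT_READ, data=name)
        socks += [client, server]
    order = serve_one_per_ready_socket(sel, max_msgs=6)
    sel.close()
    for s in socks:
        s.close()
    return [name for name, _ in order]
```

Because readiness is level-triggered, a socket with leftover data simply shows up again on the next pass, so the busy client is not starved either.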
Re: [Gluster-devel] FOP ratelimit?
Raghavendra Gowdappa wrote:

> It's helpful if you can give some pointers on what parameters (like
> latency, throughput etc) you want us to consider for QoS.

Full-blown QoS would be nice, but a first line of defense against resource hogs seems badly needed.

A bare minimum could be to process clients' FOPs in a round-robin fashion. That way, even if one client sends a lot of FOPs, there is always some window for others to slip in.

Any opinion?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
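The round-robin proposal above can be sketched as one FIFO per client, with the scheduler taking a single FOP from the client at the front of the rotation and then moving that client to the back. This is only an illustration of the idea; `RoundRobinFopQueue`, `submit` and `next_fop` are hypothetical names, not existing Gluster interfaces.

```python
from collections import OrderedDict, deque


class RoundRobinFopQueue:
    """One FIFO per client, served one FOP at a time in rotation."""

    def __init__(self):
        self._queues = OrderedDict()  # client id -> deque of pending FOPs

    def submit(self, client, fop):
        # A client with no pending FOPs joins the back of the rotation.
        self._queues.setdefault(client, deque()).append(fop)

    def next_fop(self):
        """Return (client, fop) for the next FOP to process, or None."""
        if not self._queues:
            return None
        client, pending = next(iter(self._queues.items()))
        fop = pending.popleft()
        if pending:
            self._queues.move_to_end(client)  # back of the rotation
        else:
            del self._queues[client]          # nothing left for this client
        return client, fop
```

With this shape, a client that submits thousands of renames still yields after every single FOP, so another client's lone request waits behind at most one FOP per active client.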
Re: [Gluster-devel] FOP ratelimit?
> Do you have any ideas here on QoS? Can it be provided as a use-case for
> multi-tenancy you were working on earlier?

My interpretation of QoS would include rate limiting, but more per *activity* (e.g. self-heal, rebalance, user I/O) or per *tenant* rather than per *client*. Also, it's easier to implement at the message level (which can be done on the servers) rather than the fop level (which has to be on clients). How well does that apply to what we've been discussing in this thread?
Re: [Gluster-devel] FOP ratelimit?
On Wed, Sep 02, 2015 at 02:04:32PM +0530, Pranith Kumar Karampuri wrote:
> >And more generally, do we have a way to ratelimit FOPs per client, so
> >that one client cannot make the cluster unusable for the others?
> Do you have profile data?

No, it was on a production setup and I was too focused on restoring functionality to think about it.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] FOP ratelimit?
On Wed, Sep 02, 2015 at 02:05:03PM +0530, Venky Shankar wrote:
> > I understand rename on DHT can be very costly because data really have
> > to be moved from a brick to another one just for a file name change.
> > Is there a workaround for this behavior?
>
> Not really. DHT uses pointer files (so called link-to) to work around
> moving file contents on rename().

Then I was misled by the huge number of DHT rename operations in the logs, and the user killed performance some other way. Too bad I did not collect profile data at that time.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] FOP ratelimit?
+Jeff.

Jeff,

Do you have any ideas here on QoS? Can it be provided as a use-case for the multi-tenancy you were working on earlier?

regards,
Raghavendra.

- Original Message -
> From: "Raghavendra Gowdappa"
> To: "Pranith Kumar Karampuri"
> Cc: gluster-devel@gluster.org
> Sent: Wednesday, September 2, 2015 2:11:35 PM
> Subject: Re: [Gluster-devel] FOP ratelimit?
>
> - Original Message -
> > From: "Pranith Kumar Karampuri"
> > To: "Emmanuel Dreyfus", gluster-devel@gluster.org
> > Sent: Wednesday, September 2, 2015 2:04:32 PM
> > Subject: Re: [Gluster-devel] FOP ratelimit?
> >
> > On 09/02/2015 01:59 PM, Emmanuel Dreyfus wrote:
> > > Hi
> > >
> > > Yesterday I experienced the problem of a single user bringing down
> > > a glusterfs cluster to its knees because of a high amount of rename
> > > operations.
> > >
> > > I understand rename on DHT can be very costly because data really have
> > > to be moved from a brick to another one just for a file name change.
> > > Is there a workaround for this behavior?
> > This is not true.
>
> Data is not moved across bricks during rename. So, maybe something else is
> causing the issue. Were you running rebalance while these renames were being
> done?
>
> > > And more generally, do we have a way to ratelimit FOPs per client, so
> > > that one client cannot make the cluster unusable for the others?
> > Do you have profile data?
> >
> > Raghavendra G is working on some QOS related enhancements in gluster.
> > Please let us know if you have any inputs here.
>
> Thanks Pranith.
>
> @Manu and others,
>
> It's helpful if you can give some pointers on what parameters (like latency,
> throughput etc) you want us to consider for QoS. Also, any ideas (like
> interface for QoS) in this area are welcome. With my very basic search, it seems
> like there are not many filesystems with QoS functionality.
>
> regards,
> Raghavendra.
> >
> > Pranith
Re: [Gluster-devel] FOP ratelimit?
- Original Message -
> From: "Pranith Kumar Karampuri"
> To: "Emmanuel Dreyfus", gluster-devel@gluster.org
> Sent: Wednesday, September 2, 2015 2:04:32 PM
> Subject: Re: [Gluster-devel] FOP ratelimit?
>
> On 09/02/2015 01:59 PM, Emmanuel Dreyfus wrote:
> > Hi
> >
> > Yesterday I experienced the problem of a single user bringing down
> > a glusterfs cluster to its knees because of a high amount of rename
> > operations.
> >
> > I understand rename on DHT can be very costly because data really have
> > to be moved from a brick to another one just for a file name change.
> > Is there a workaround for this behavior?
> This is not true.

Data is not moved across bricks during rename. So maybe something else is causing the issue. Were you running rebalance while these renames were being done?

> > And more generally, do we have a way to ratelimit FOPs per client, so
> > that one client cannot make the cluster unusable for the others?
> Do you have profile data?
>
> Raghavendra G is working on some QOS related enhancements in gluster.
> Please let us know if you have any inputs here.

Thanks Pranith.

@Manu and others,

It's helpful if you can give some pointers on what parameters (like latency, throughput etc.) you want us to consider for QoS. Also, any ideas (like an interface for QoS) in this area are welcome. From my very basic search, it seems there are not many filesystems with QoS functionality.

regards,
Raghavendra.

>
> Pranith
Re: [Gluster-devel] FOP ratelimit?
On Wed, Sep 2, 2015 at 2:05 PM, Venky Shankar wrote:
> On Wed, Sep 2, 2015 at 1:59 PM, Emmanuel Dreyfus wrote:
>> Hi
>>
>> Yesterday I experienced the problem of a single user bringing down
>> a glusterfs cluster to its knees because of a high amount of rename
>> operations.
>>
>> I understand rename on DHT can be very costly because data really have
>> to be moved from a brick to another one just for a file name change.
>> Is there a workaround for this behavior?
>
> Not really. DHT uses pointer files (so called link-to) to work around
> moving file contents on rename().
>
>>
>> And more generally, do we have a way to ratelimit FOPs per client, so
>> that one client cannot make the cluster unusable for the others?
>
> There is some form of limiting based on priority (w/ client-pids) in
> io-threads. For bit-rot, I had used token bucket
> based throttling[1] during hash calculation. But that resides on the
> client side for bitrot xlator. It may be beneficial
> to have that on the server side.

[1]: https://github.com/gluster/glusterfs/blob/master/xlators/features/bit-rot/src/bitd/bit-rot-tbf.c

>> --
>> Emmanuel Dreyfus
>> m...@netbsd.org
Re: [Gluster-devel] FOP ratelimit?
On Wed, Sep 2, 2015 at 1:59 PM, Emmanuel Dreyfus wrote:
> Hi
>
> Yesterday I experienced the problem of a single user bringing down
> a glusterfs cluster to its knees because of a high amount of rename
> operations.
>
> I understand rename on DHT can be very costly because data really have
> to be moved from a brick to another one just for a file name change.
> Is there a workaround for this behavior?

Not really. DHT uses pointer files (so called link-to) to work around moving file contents on rename().

> And more generally, do we have a way to ratelimit FOPs per client, so
> that one client cannot make the cluster unusable for the others?

There is some form of limiting based on priority (w/ client-pids) in io-threads. For bit-rot, I had used token bucket based throttling[1] during hash calculation. But that resides on the client side for bitrot xlator. It may be beneficial to have that on the server side.

> --
> Emmanuel Dreyfus
> m...@netbsd.org
Re: [Gluster-devel] FOP ratelimit?
On 09/02/2015 01:59 PM, Emmanuel Dreyfus wrote:
> Hi
>
> Yesterday I experienced the problem of a single user bringing down
> a glusterfs cluster to its knees because of a high amount of rename
> operations.
>
> I understand rename on DHT can be very costly because data really have
> to be moved from a brick to another one just for a file name change.
> Is there a workaround for this behavior?

This is not true.

> And more generally, do we have a way to ratelimit FOPs per client, so
> that one client cannot make the cluster unusable for the others?

Do you have profile data?

Raghavendra G is working on some QoS-related enhancements in gluster. Please let us know if you have any inputs here.

Pranith