Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)
On Monday 25 January 2016 at 22:24 +0100, Niels de Vos wrote: > On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote: > > Hi, > > > > so today, after fixing one last config item, the smoke test jobs run > > fine on the Centos 6 builder in the RH DC, which build things as non > > root, then start the tests, then reboot the server. > > Nice, sounds like great progress! > > Did you need to change anything in the build or test scripts under > /opt/qa? If so, please make sure that the changes land in the > repository: > > https://github.com/gluster/glusterfs-patch-acceptance-tests/ So far, I mostly removed code that was running in the jenkins script (i.e., the cleanup part that kills processes), and added a reboot at the end. Not sure I want to have that in the script :) > > Now, I am looking at the fedora one, but once this one is good, I will > > likely reinstall a few builders as a test, and go on Centos 7 builder. > > I'm not sure yet if I made an error, or what is going on. But for some > reason smoke tests for my patch series fails... This is the smoke result > of the 1st patch in the serie, it only updates the fuse-header to a > newer version. Of course local testing works just fine... The output and > (not available) logs of the smoke test do not really help me :-/ > > https://build.gluster.org/job/smoke/24395/console > > Could this be related to the changes that were made? If not, I'd > appreciate a pointer to my mistake. No, I tested on a separate job so as not to interfere. > > I was also planning to look at jenkins job builder for the jenkins, but > no time yet. Will be after jenkins migration to a new host (which is > still not planned, unlike gerrit where we should be attempting to find a > time for that) > > We also might want to use Jenkins Job Builder for the tests we're adding > to the CentOS CI. Maybe we could experiment with it there first, and > then use our knowledge to the Gluster Jenkins? Why not. I think it would work fine on ours too, as I am not sure it needs to completely take over the server configuration. -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] distributed files/directories and [cm]time updates
Hi Pranith, On 26/01/16 03:47, Pranith Kumar Karampuri wrote: hi, Traditionally gluster has been using ctime/mtime of the files/dirs on the bricks as stat output. Problem we are seeing with this approach is that, software which depends on it gets confused when there are differences in these times. Tar especially gives "file changed as we read it" whenever it detects ctime differences when stat is served from different bricks. The way we have been trying to solve it is to serve the stat structures from same brick in afr, max-time in dht. But it doesn't avoid the problem completely. Because there is no way to change ctime at the moment(lutimes() only allows mtime, atime), there is little we can do to make sure ctimes match after self-heals/xattr updates/rebalance. I am wondering if anyone of you solved these problems before, if yes how did you go about doing it? It seems like applications which depend on this for backups get confused the same way. The only way out I see it is to bring ctime to an xattr, but that will need more iops and gluster has to keep updating it on quite a few fops. I did think about this when I was writing ec at the beginning. The idea was that the point in time at which each fop is executed were controlled by the client by adding an special xattr to each regular fop. Of course this would require support inside the storage/posix xlator. At that time, adding the needed support to other xlators seemed too complex for me, so I decided to do something similar to afr. Anyway, the idea was like this: for example, when a write fop needs to be sent, dht/afr/ec sets the current time in a special xattr, for example 'glusterfs.time'. It can be done in a way that if the time is already set by a higher xlator, it's not modified. This way DHT could set the time in fops involving multiple afr subvolumes. For other fops, would be afr who sets the time. It could also be set directly by the top most xlator (fuse), but that time could be incorrect because lower xlators could delay the fop execution and reorder it. This would need more thinking. That xattr will be received by storage/posix. This xlator will determine what times need to be modified and will change them. In the case of a write, it can decide to modify mtime and, maybe, atime. For a mkdir or create, it will set the times of the new file/directory and also the mtime of the parent directory. It depends on the specific fop being processed. mtime, atime and ctime (or even others) could be saved in a special posix xattr instead of relying on the file system attributes that cannot be modified (at least for ctime). This solution doesn't require extra fops, So it seems quite clean to me. The additional I/O needed in posix could be minimized by implementing a metadata cache in storage/posix that would read all metadata on lookup and update it on disk only at regular intervals and/or on invalidation. All fops would read/write into the cache. This would even reduce the number of I/O we are currently doing for each fop. Xavi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
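To make the xattr idea above concrete, here is a minimal sketch — written in Python purely for illustration, since the real implementation would be C code inside storage/posix — of keeping atime/mtime/ctime in a single extended attribute on the brick file instead of relying on the backend file system. The xattr name, the packing format and the helper names are assumptions made up for this example, not an existing Gluster interface.

===
# Illustrative sketch only: store the three timestamps in one xattr on the
# brick file, so internal operations (heal, rebalance) can record a ctime
# that the backend file system would not let us set directly.
import os
import struct

TIME_XATTR = "user.glusterfs.time"   # assumed name; a real xlator would use a trusted.* key
FMT = "!3Q3I"                        # atime/mtime/ctime seconds, then their nanoseconds

def save_times(path, atime, mtime, ctime):
    """Pack (sec, nsec) tuples for atime/mtime/ctime into one xattr."""
    blob = struct.pack(FMT,
                       atime[0], mtime[0], ctime[0],
                       atime[1], mtime[1], ctime[1])
    os.setxattr(path, TIME_XATTR, blob)

def load_times(path):
    """Return ((atime), (mtime), (ctime)) from the xattr, falling back to stat."""
    try:
        a_s, m_s, c_s, a_ns, m_ns, c_ns = struct.unpack(
            FMT, os.getxattr(path, TIME_XATTR))
        return (a_s, a_ns), (m_s, m_ns), (c_s, c_ns)
    except OSError:
        st = os.stat(path)
        return ((st.st_atime_ns // 10**9, st.st_atime_ns % 10**9),
                (st.st_mtime_ns // 10**9, st.st_mtime_ns % 10**9),
                (st.st_ctime_ns // 10**9, st.st_ctime_ns % 10**9))
===

A write fop would then update the mtime/ctime fields in this blob using the time carried in the (hypothetical) 'glusterfs.time' key set by dht/afr/ec, and the metadata cache described above would batch these setxattr calls so that most fops only touch memory.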
Re: [Gluster-devel] Throttling xlator on the bricks
On Tue, Jan 26, 2016 at 03:11:50AM +, Richard Wareing wrote: > > If there is one bucket per client and one thread per bucket, it would be > > difficult to scale as the number of clients increase. How can we do this > > better? > > On this note... consider that 10's of thousands of clients are not > unrealistic in production :). Using a thread per bucket would also > beunwise.. > > On the idea in general, I'm just wondering if there's specific (real-world) > cases where this has even been an issue where least-prio queuing hasn't been > able to handle? Or is this more of a theoretical concern? I ask as I've not > really encountered situations where I wished I could give more FOPs to SHD vs > rebalance and such. > > In any event, it might be worth having Shreyas detail his throttling feature > (that can throttle any directory hierarchy no less) to illustrate how a > simpler design can achieve similar results to these more complicated (and it > followsbug prone) approaches. TBF isn't complicated at all - it's widely used for traffic shaping, cgroups, UML to rate limit disk I/O. But, I won't hurry up on things and wait to hear out from Shreyas regarding his throttling design. > > Richard > > > From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] > on behalf of Vijay Bellur [vbel...@redhat.com] > Sent: Monday, January 25, 2016 6:44 PM > To: Ravishankar N; Gluster Devel > Subject: Re: [Gluster-devel] Throttling xlator on the bricks > > On 01/25/2016 12:36 AM, Ravishankar N wrote: > > Hi, > > > > We are planning to introduce a throttling xlator on the server (brick) > > process to regulate FOPS. The main motivation is to solve complaints about > > AFR selfheal taking too much of CPU resources. (due to too many fops for > > entry > > self-heal, rchecksums for data self-heal etc.) > > > I am wondering if we can re-use the same xlator for throttling > bandwidth, iops etc. in addition to fops. Based on admin configured > policies we could provide different upper thresholds to different > clients/tenants and this could prove to be an useful feature in > multitenant deployments to avoid starvation/noisy neighbor class of > problems. Has any thought gone in this direction? > > > > > The throttling is achieved using the Token Bucket Filter algorithm > > (TBF). TBF > > is already used by bitrot's bitd signer (which is a client process) in > > gluster to regulate the CPU intensive check-sum calculation. By putting the > > logic on the brick side, multiple clients- selfheal, bitrot, rebalance or > > even the mounts themselves can avail the benefits of throttling. > > > > The TBF algorithm in a nutshell is as follows: There is a bucket which > > is filled > > at a steady (configurable) rate with tokens. Each FOP will need a fixed > > amount > > of tokens to be processed. If the bucket has that many tokens, the FOP is > > allowed and that many tokens are removed from the bucket. If not, the FOP is > > queued until the bucket is filled. > > > > The xlator will need to reside above io-threads and can have different > > buckets, > > one per client. There has to be a communication mechanism between the > > client and > > the brick (IPC?) to tell what FOPS need to be regulated from it, and the > > no. of > > tokens needed etc. These need to be re configurable via appropriate > > mechanisms. > > Each bucket will have a token filler thread which will fill the tokens > > in it. 
> > If there is one bucket per client and one thread per bucket, it would be > difficult to scale as the number of clients increase. How can we do this > better? > > > The main thread will enqueue heals in a list in the bucket if there aren't > > enough tokens. Once the token filler detects some FOPS can be serviced, > > it will > > send a cond-broadcast to a dequeue thread which will process (stack > > wind) all > > the FOPS that have the required no. of tokens from all buckets. > > > > This is just a high level abstraction: requesting feedback on any aspect of > > this feature. what kind of mechanism is best between the client/bricks for > > tuning various parameters? What other requirements do you foresee? > > > > I am in favor of having administrator defined policies or templates > (collection of policies) being used to provide the tuning parameter per > client or a set of clients. We could even have a default template per > use case etc. Is there a specific need to have this negotiation between > clients and servers? > > Thanks, > Vijay > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=aQHnnoxK50Ebw77QHtp3ykjC976mJIt2qrIUzpqEViQ&s=Jitbldlbjwye6QI8V33ZoKtVt6-B64p2_-5piVlfXMQ&e= > ___ > Glu
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/25/16 20:36, Pranith Kumar Karampuri wrote: On 01/26/2016 08:41 AM, Richard Wareing wrote: If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? On this note... consider that 10's of thousands of clients are not unrealistic in production :). Using a thread per bucket would also beunwise.. There is only one thread and this solution is for internal processes(shd, rebalance, quota etc) not coming in the way of clients which do I/O. On the idea in general, I'm just wondering if there's specific (real-world) cases where this has even been an issue where least-prio queuing hasn't been able to handle? Or is this more of a theoretical concern? I ask as I've not really encountered situations where I wished I could give more FOPs to SHD vs rebalance and such. I have seen users resort to offline healing of the bricks whenever a brick is replaced, or new brick is added to replication to increase replica count. When entry self-heal happens or big VM image data self-heals which do rchecksums CPU spikes are seen and I/O becomes useless. This is the recent thread where a user ran into similar problem (just yesterday) (This is a combination of client-side healing and healing-load): http://www.gluster.org/pipermail/gluster-users/2016-January/025051.html We can find more of such threads if we put some time to dig into the mailing list. I personally have seen people even resort to things like, "we let gluster heal over the weekend or in the nights when none of us are working on the volumes" etc. I get at least weekly complaints of such on the IRC channel. A lot of them are in virtual environments (aws). There are people who complain healing is too slow too. We get both kinds of complaints :-). Your multi-threaded shd patch is going to help here. I somehow feel you guys are in this set of people :-). +1 In any event, it might be worth having Shreyas detail his throttling feature (that can throttle any directory hierarchy no less) to illustrate how a simpler design can achieve similar results to these more complicated (and it followsbug prone) approaches. The solution we came up with is about throttling internal I/O. And there are only 4/5 such processes(shd, rebalance, quota, bitd etc). What you are saying above about throttling any directory hierarchy seems a bit different than what we are trying to solve from the looks of it(At least from the small description you gave above :-) ). Shreyas' mail detailing the feature would definitely help us understand what each of us are trying to solve. We want to GA both multi-threaded shd and this feature for 3.8. Pranith Richard From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on behalf of Vijay Bellur [vbel...@redhat.com] Sent: Monday, January 25, 2016 6:44 PM To: Ravishankar N; Gluster Devel Subject: Re: [Gluster-devel] Throttling xlator on the bricks On 01/25/2016 12:36 AM, Ravishankar N wrote: Hi, We are planning to introduce a throttling xlator on the server (brick) process to regulate FOPS. The main motivation is to solve complaints about AFR selfheal taking too much of CPU resources. (due to too many fops for entry self-heal, rchecksums for data self-heal etc.) I am wondering if we can re-use the same xlator for throttling bandwidth, iops etc. in addition to fops. 
Based on admin configured policies we could provide different upper thresholds to different clients/tenants and this could prove to be an useful feature in multitenant deployments to avoid starvation/noisy neighbor class of problems. Has any thought gone in this direction? The throttling is achieved using the Token Bucket Filter algorithm (TBF). TBF is already used by bitrot's bitd signer (which is a client process) in gluster to regulate the CPU intensive check-sum calculation. By putting the logic on the brick side, multiple clients- selfheal, bitrot, rebalance or even the mounts themselves can avail the benefits of throttling. The TBF algorithm in a nutshell is as follows: There is a bucket which is filled at a steady (configurable) rate with tokens. Each FOP will need a fixed amount of tokens to be processed. If the bucket has that many tokens, the FOP is allowed and that many tokens are removed from the bucket. If not, the FOP is queued until the bucket is filled. The xlator will need to reside above io-threads and can have different buckets, one per client. There has to be a communication mechanism between the client and the brick (IPC?) to tell what FOPS need to be regulated from it, and the no. of tokens needed etc. These need to be re configurable via appropriate mechanisms. Each bucket will have a token filler thread which will fill the tokens in it. If there is one bucket per client and one thread per bucket, it would be difficult to sca
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/25/16 18:24, Ravishankar N wrote: On 01/26/2016 01:22 AM, Shreyas Siravara wrote: Just out of curiosity, what benefits do we think this throttling xlator would provide over the "enable-least-priority" option (where we put all the fops from SHD, etc into a least pri queue)? For one, it could provide more granularity on the amount of throttling you want to do, for specific fops, from specific clients. If the only I/O going through the bricks was from the SHD, they would all be least-priority but yet consume an unfair % of the CPU. We could tweak `performance.least-rate-limit` to throttle but it would be a global option. Right, because as it is now, when shd is the only client, it queues up so much iops that higher prioritiy ops are still getting delayed. On Jan 25, 2016, at 12:29 AM, Venky Shankar wrote: On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote: On 01/25/2016 12:56 PM, Venky Shankar wrote: Also, it would be beneficial to have the core TBF implementation as part of libglusterfs so as to be consumable by the server side xlator component to throttle dispatched FOPs and for daemons to throttle anything that's outside "brick" boundary (such as cpu, etc..). That makes sense. We were initially thinking to overload posix_rchecksum() to do the SHA256 sums for the signer. That does have advantages by avoiding network rountrips by computing SHA* locally. TBF could still implement ->rchecksum and throttle that (on behalf of clients, residing on the server - internal daemons). Placing the core implementation as part of libglusterfs would still provide the flexibility. ___ Gluster-devel mailing list Gluster-devel@gluster.org https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=N7LE2BKIHDDBvkYkakYthA&m=9W9xtRg0TIEUvFL-8HpUCux8psoWKkUbEFiwqykRwH4&s=OVF0dZRXt8GFcIxsHlkbNjH-bjD9097q5hjVVHgOFkQ&e= ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/26/2016 08:41 AM, Richard Wareing wrote: If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? On this note... consider that 10's of thousands of clients are not unrealistic in production :). Using a thread per bucket would also beunwise.. There is only one thread and this solution is for internal processes(shd, rebalance, quota etc) not coming in the way of clients which do I/O. On the idea in general, I'm just wondering if there's specific (real-world) cases where this has even been an issue where least-prio queuing hasn't been able to handle? Or is this more of a theoretical concern? I ask as I've not really encountered situations where I wished I could give more FOPs to SHD vs rebalance and such. I have seen users resort to offline healing of the bricks whenever a brick is replaced, or new brick is added to replication to increase replica count. When entry self-heal happens or big VM image data self-heals which do rchecksums CPU spikes are seen and I/O becomes useless. This is the recent thread where a user ran into similar problem (just yesterday) (This is a combination of client-side healing and healing-load): http://www.gluster.org/pipermail/gluster-users/2016-January/025051.html We can find more of such threads if we put some time to dig into the mailing list. I personally have seen people even resort to things like, "we let gluster heal over the weekend or in the nights when none of us are working on the volumes" etc. There are people who complain healing is too slow too. We get both kinds of complaints :-). Your multi-threaded shd patch is going to help here. I somehow feel you guys are in this set of people :-). In any event, it might be worth having Shreyas detail his throttling feature (that can throttle any directory hierarchy no less) to illustrate how a simpler design can achieve similar results to these more complicated (and it followsbug prone) approaches. The solution we came up with is about throttling internal I/O. And there are only 4/5 such processes(shd, rebalance, quota, bitd etc). What you are saying above about throttling any directory hierarchy seems a bit different than what we are trying to solve from the looks of it(At least from the small description you gave above :-) ). Shreyas' mail detailing the feature would definitely help us understand what each of us are trying to solve. We want to GA both multi-threaded shd and this feature for 3.8. Pranith Richard From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on behalf of Vijay Bellur [vbel...@redhat.com] Sent: Monday, January 25, 2016 6:44 PM To: Ravishankar N; Gluster Devel Subject: Re: [Gluster-devel] Throttling xlator on the bricks On 01/25/2016 12:36 AM, Ravishankar N wrote: Hi, We are planning to introduce a throttling xlator on the server (brick) process to regulate FOPS. The main motivation is to solve complaints about AFR selfheal taking too much of CPU resources. (due to too many fops for entry self-heal, rchecksums for data self-heal etc.) I am wondering if we can re-use the same xlator for throttling bandwidth, iops etc. in addition to fops. Based on admin configured policies we could provide different upper thresholds to different clients/tenants and this could prove to be an useful feature in multitenant deployments to avoid starvation/noisy neighbor class of problems. Has any thought gone in this direction? 
The throttling is achieved using the Token Bucket Filter algorithm (TBF). TBF is already used by bitrot's bitd signer (which is a client process) in gluster to regulate the CPU intensive check-sum calculation. By putting the logic on the brick side, multiple clients- selfheal, bitrot, rebalance or even the mounts themselves can avail the benefits of throttling. The TBF algorithm in a nutshell is as follows: There is a bucket which is filled at a steady (configurable) rate with tokens. Each FOP will need a fixed amount of tokens to be processed. If the bucket has that many tokens, the FOP is allowed and that many tokens are removed from the bucket. If not, the FOP is queued until the bucket is filled. The xlator will need to reside above io-threads and can have different buckets, one per client. There has to be a communication mechanism between the client and the brick (IPC?) to tell what FOPS need to be regulated from it, and the no. of tokens needed etc. These need to be re configurable via appropriate mechanisms. Each bucket will have a token filler thread which will fill the tokens in it. If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? The main thread will enqueue heals in a list in the bucket if there aren't enough tokens. Once the token filler detects some
Re: [Gluster-devel] distributed files/directories and [cm]time updates
Two additional cases on ctime: 1. When we do internal operations like rebalance, tier-migration, afr-heal etc., ctime changes; this is not desirable as it is an internal fop. 2. For a compliance case (WORM-Retention), a ctime change means that something has changed on a READ-ONLY file, which is a point of concern. I agree with you on the overhead of bookkeeping for this xattr and syncing it between bricks, and it will add to the random access of the disk. Also there is a window where we can become inconsistent, i.e. ctime has changed as a result of a normal fop and the brick goes down before we get to save it in the xattr. This situation can be handled with replication sync, however. Just thought of bringing it to notice. ~Joe - Original Message - From: "Pranith Kumar Karampuri" To: "Gluster Devel" Sent: Tuesday, January 26, 2016 8:17:14 AM Subject: [Gluster-devel] distributed files/directories and [cm]time updates hi, Traditionally gluster has been using ctime/mtime of the files/dirs on the bricks as stat output. Problem we are seeing with this approach is that, software which depends on it gets confused when there are differences in these times. Tar especially gives "file changed as we read it" whenever it detects ctime differences when stat is served from different bricks. The way we have been trying to solve it is to serve the stat structures from same brick in afr, max-time in dht. But it doesn't avoid the problem completely. Because there is no way to change ctime at the moment(lutimes() only allows mtime, atime), there is little we can do to make sure ctimes match after self-heals/xattr updates/rebalance. I am wondering if anyone of you solved these problems before, if yes how did you go about doing it? It seems like applications which depend on this for backups get confused the same way. The only way out I see it is to bring ctime to an xattr, but that will need more iops and gluster has to keep updating it on quite a few fops. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/26/2016 08:14 AM, Vijay Bellur wrote: On 01/25/2016 12:36 AM, Ravishankar N wrote: Hi, We are planning to introduce a throttling xlator on the server (brick) process to regulate FOPS. The main motivation is to solve complaints about AFR selfheal taking too much of CPU resources. (due to too many fops for entry self-heal, rchecksums for data self-heal etc.) I am wondering if we can re-use the same xlator for throttling bandwidth, iops etc. in addition to fops. Based on admin configured policies we could provide different upper thresholds to different clients/tenants and this could prove to be an useful feature in multitenant deployments to avoid starvation/noisy neighbor class of problems. Has any thought gone in this direction? Nope. It was mainly about internal processes at the moment. The throttling is achieved using the Token Bucket Filter algorithm (TBF). TBF is already used by bitrot's bitd signer (which is a client process) in gluster to regulate the CPU intensive check-sum calculation. By putting the logic on the brick side, multiple clients- selfheal, bitrot, rebalance or even the mounts themselves can avail the benefits of throttling. The TBF algorithm in a nutshell is as follows: There is a bucket which is filled at a steady (configurable) rate with tokens. Each FOP will need a fixed amount of tokens to be processed. If the bucket has that many tokens, the FOP is allowed and that many tokens are removed from the bucket. If not, the FOP is queued until the bucket is filled. The xlator will need to reside above io-threads and can have different buckets, one per client. There has to be a communication mechanism between the client and the brick (IPC?) to tell what FOPS need to be regulated from it, and the no. of tokens needed etc. These need to be re configurable via appropriate mechanisms. Each bucket will have a token filler thread which will fill the tokens in it. If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? It is same thread for all the buckets. Because the number of internal clients at the moment is in single digits. The problem statement we have right now doesn't consider what you are looking for. The main thread will enqueue heals in a list in the bucket if there aren't enough tokens. Once the token filler detects some FOPS can be serviced, it will send a cond-broadcast to a dequeue thread which will process (stack wind) all the FOPS that have the required no. of tokens from all buckets. This is just a high level abstraction: requesting feedback on any aspect of this feature. what kind of mechanism is best between the client/bricks for tuning various parameters? What other requirements do you foresee? I am in favor of having administrator defined policies or templates (collection of policies) being used to provide the tuning parameter per client or a set of clients. We could even have a default template per use case etc. Is there a specific need to have this negotiation between clients and servers? Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Throttling xlator on the bricks
> If there is one bucket per client and one thread per bucket, it would be > difficult to scale as the number of clients increase. How can we do this > better? On this note... consider that 10's of thousands of clients are not unrealistic in production :). Using a thread per bucket would also be unwise. On the idea in general, I'm just wondering if there's specific (real-world) cases where this has even been an issue where least-prio queuing hasn't been able to handle? Or is this more of a theoretical concern? I ask as I've not really encountered situations where I wished I could give more FOPs to SHD vs rebalance and such. In any event, it might be worth having Shreyas detail his throttling feature (that can throttle any directory hierarchy no less) to illustrate how a simpler design can achieve similar results to these more complicated (and, it follows, bug-prone) approaches. Richard From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on behalf of Vijay Bellur [vbel...@redhat.com] Sent: Monday, January 25, 2016 6:44 PM To: Ravishankar N; Gluster Devel Subject: Re: [Gluster-devel] Throttling xlator on the bricks On 01/25/2016 12:36 AM, Ravishankar N wrote: > Hi, > > We are planning to introduce a throttling xlator on the server (brick) > process to regulate FOPS. The main motivation is to solve complaints about > AFR selfheal taking too much of CPU resources. (due to too many fops for > entry > self-heal, rchecksums for data self-heal etc.) I am wondering if we can re-use the same xlator for throttling bandwidth, iops etc. in addition to fops. Based on admin configured policies we could provide different upper thresholds to different clients/tenants and this could prove to be an useful feature in multitenant deployments to avoid starvation/noisy neighbor class of problems. Has any thought gone in this direction? > > The throttling is achieved using the Token Bucket Filter algorithm > (TBF). TBF > is already used by bitrot's bitd signer (which is a client process) in > gluster to regulate the CPU intensive check-sum calculation. By putting the > logic on the brick side, multiple clients- selfheal, bitrot, rebalance or > even the mounts themselves can avail the benefits of throttling. > > The TBF algorithm in a nutshell is as follows: There is a bucket which > is filled > at a steady (configurable) rate with tokens. Each FOP will need a fixed > amount > of tokens to be processed. If the bucket has that many tokens, the FOP is > allowed and that many tokens are removed from the bucket. If not, the FOP is > queued until the bucket is filled. > > The xlator will need to reside above io-threads and can have different > buckets, > one per client. There has to be a communication mechanism between the > client and > the brick (IPC?) to tell what FOPS need to be regulated from it, and the > no. of > tokens needed etc. These need to be re configurable via appropriate > mechanisms. > Each bucket will have a token filler thread which will fill the tokens > in it. If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? > The main thread will enqueue heals in a list in the bucket if there aren't > enough tokens. Once the token filler detects some FOPS can be serviced, > it will > send a cond-broadcast to a dequeue thread which will process (stack > wind) all > the FOPS that have the required no. of tokens from all buckets. 
> > This is just a high level abstraction: requesting feedback on any aspect of > this feature. what kind of mechanism is best between the client/bricks for > tuning various parameters? What other requirements do you foresee? > I am in favor of having administrator defined policies or templates (collection of policies) being used to provide the tuning parameter per client or a set of clients. We could even have a default template per use case etc. Is there a specific need to have this negotiation between clients and servers? Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=aQHnnoxK50Ebw77QHtp3ykjC976mJIt2qrIUzpqEViQ&s=Jitbldlbjwye6QI8V33ZoKtVt6-B64p2_-5piVlfXMQ&e= ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] distributed files/directories and [cm]time updates
hi, Traditionally gluster has been using ctime/mtime of the files/dirs on the bricks as the stat output. The problem we are seeing with this approach is that software which depends on it gets confused when there are differences in these times. Tar especially gives "file changed as we read it" whenever it detects ctime differences when stat is served from different bricks. The way we have been trying to solve it is to serve the stat structures from the same brick in afr, and use max-time in dht. But that doesn't avoid the problem completely. Because there is no way to change ctime at the moment (lutimes() only allows mtime, atime), there is little we can do to make sure ctimes match after self-heals/xattr updates/rebalance. I am wondering if any of you have solved these problems before; if yes, how did you go about doing it? It seems like applications which depend on this for backups get confused the same way. The only way out I see is to bring ctime into an xattr, but that will need more iops and gluster will have to keep updating it on quite a few fops. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
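A quick illustration of the lutimes()/utime() limitation described above (a hypothetical Python snippet, not Gluster code): atime and mtime can be copied back after the fact, but the very act of doing so bumps ctime, which is why ctimes cannot be made to match across bricks after self-heal or rebalance without storing them somewhere else, such as an xattr.

===
# Demonstrates that ctime cannot be restored from userspace: os.utime()
# (like utimensat()/lutimes()) only takes atime/mtime, and setting them
# updates the inode's ctime to "now" as a side effect.
import os, tempfile, time

fd, path = tempfile.mkstemp()
os.close(fd)

before = os.stat(path)
time.sleep(1)

# "Heal" the timestamps: atime/mtime can be copied back...
os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

after = os.stat(path)
assert after.st_mtime_ns == before.st_mtime_ns   # mtime restored
assert after.st_ctime_ns > before.st_ctime_ns    # ...but ctime moved forward
print("ctime drift:", after.st_ctime_ns - before.st_ctime_ns, "ns")
os.unlink(path)
===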
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/25/2016 12:36 AM, Ravishankar N wrote: Hi, We are planning to introduce a throttling xlator on the server (brick) process to regulate FOPS. The main motivation is to solve complaints about AFR selfheal taking too much of CPU resources. (due to too many fops for entry self-heal, rchecksums for data self-heal etc.) I am wondering if we can re-use the same xlator for throttling bandwidth, iops etc. in addition to fops. Based on admin configured policies we could provide different upper thresholds to different clients/tenants and this could prove to be an useful feature in multitenant deployments to avoid starvation/noisy neighbor class of problems. Has any thought gone in this direction? The throttling is achieved using the Token Bucket Filter algorithm (TBF). TBF is already used by bitrot's bitd signer (which is a client process) in gluster to regulate the CPU intensive check-sum calculation. By putting the logic on the brick side, multiple clients- selfheal, bitrot, rebalance or even the mounts themselves can avail the benefits of throttling. The TBF algorithm in a nutshell is as follows: There is a bucket which is filled at a steady (configurable) rate with tokens. Each FOP will need a fixed amount of tokens to be processed. If the bucket has that many tokens, the FOP is allowed and that many tokens are removed from the bucket. If not, the FOP is queued until the bucket is filled. The xlator will need to reside above io-threads and can have different buckets, one per client. There has to be a communication mechanism between the client and the brick (IPC?) to tell what FOPS need to be regulated from it, and the no. of tokens needed etc. These need to be re configurable via appropriate mechanisms. Each bucket will have a token filler thread which will fill the tokens in it. If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increase. How can we do this better? The main thread will enqueue heals in a list in the bucket if there aren't enough tokens. Once the token filler detects some FOPS can be serviced, it will send a cond-broadcast to a dequeue thread which will process (stack wind) all the FOPS that have the required no. of tokens from all buckets. This is just a high level abstraction: requesting feedback on any aspect of this feature. what kind of mechanism is best between the client/bricks for tuning various parameters? What other requirements do you foresee? I am in favor of having administrator defined policies or templates (collection of policies) being used to provide the tuning parameter per client or a set of clients. We could even have a default template per use case etc. Is there a specific need to have this negotiation between clients and servers? Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
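For readers who have not met TBF before, the self-contained Python sketch below mirrors the nutshell description above: one bucket refilled at a steady rate, each fop charged a fixed token cost, and fops queued whenever the bucket runs dry. It is only an illustration of the algorithm — the proposed xlator would be C code sitting above io-threads, with one such bucket per (internal) client, and the refill interval and costs here are invented numbers.

===
# Minimal token-bucket filter in the spirit of the proposal above.
import threading
import time
from collections import deque

class TokenBucket:
    def __init__(self, fill_rate, capacity):
        self.fill_rate = fill_rate          # tokens added per second
        self.capacity = capacity            # bucket never holds more than this
        self.tokens = capacity
        self.queue = deque()                # fops waiting for tokens
        self.lock = threading.Condition()
        threading.Thread(target=self._filler, daemon=True).start()

    def _filler(self):
        # Token filler thread: tops up the bucket and wakes the dequeuer.
        while True:
            time.sleep(0.1)
            with self.lock:
                self.tokens = min(self.capacity,
                                  self.tokens + self.fill_rate * 0.1)
                self.lock.notify_all()      # "cond-broadcast" to the dequeuer

    def submit(self, cost, fop):
        # Fop path: run immediately if tokens are available and nothing is
        # already queued, otherwise queue until the filler provides tokens.
        with self.lock:
            if not self.queue and self.tokens >= cost:
                self.tokens -= cost
                runnable = True
            else:
                self.queue.append((cost, fop))
                runnable = False
        if runnable:
            fop()

    def drain(self):
        # Dequeue thread body: wind any queued fops whose cost is now covered.
        while True:
            with self.lock:
                while not self.queue or self.tokens < self.queue[0][0]:
                    self.lock.wait()
                cost, fop = self.queue.popleft()
                self.tokens -= cost
            fop()
===

A user of this sketch would run drain() in a dedicated dequeue thread and call submit() from the fop path with a callable to be wound; whether to run one dequeue/filler thread overall or one per bucket is exactly the scaling question raised above.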
Re: [Gluster-devel] Throttling xlator on the bricks
On 01/26/2016 01:22 AM, Shreyas Siravara wrote: Just out of curiosity, what benefits do we think this throttling xlator would provide over the "enable-least-priority" option (where we put all the fops from SHD, etc into a least pri queue)? For one, it could provide more granularity on the amount of throttling you want to do, for specific fops, from specific clients. If the only I/O going through the bricks was from the SHD, they would all be least-priority but yet consume an unfair % of the CPU. We could tweak `performance.least-rate-limit` to throttle but it would be a global option. On Jan 25, 2016, at 12:29 AM, Venky Shankar wrote: On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote: On 01/25/2016 12:56 PM, Venky Shankar wrote: Also, it would be beneficial to have the core TBF implementation as part of libglusterfs so as to be consumable by the server side xlator component to throttle dispatched FOPs and for daemons to throttle anything that's outside "brick" boundary (such as cpu, etc..). That makes sense. We were initially thinking to overload posix_rchecksum() to do the SHA256 sums for the signer. That does have advantages by avoiding network rountrips by computing SHA* locally. TBF could still implement ->rchecksum and throttle that (on behalf of clients, residing on the server - internal daemons). Placing the core implementation as part of libglusterfs would still provide the flexibility. ___ Gluster-devel mailing list Gluster-devel@gluster.org https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=N7LE2BKIHDDBvkYkakYthA&m=9W9xtRg0TIEUvFL-8HpUCux8psoWKkUbEFiwqykRwH4&s=OVF0dZRXt8GFcIxsHlkbNjH-bjD9097q5hjVVHgOFkQ&e= ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
On 01/22/2016 09:13 AM, Raghavendra Talur wrote: HI All, I am sure there are many tricks hidden under sleeves of many Gluster developers. I realized this when speaking to new developers. It would be good have a searchable thread of such tricks. Just reply back on this thread with the tricks that you have and I promise I will collate them and add them to developer guide. Looking forward to be amazed! Things that I normally do: 1. Visualizing flow through the stack is one of the first steps that I use in debugging. Tracing a call from the origin (application) to the underlying filesystem is usually helpful in isolating most problems. Looking at logs emanating from the endpoints of each stack (fuse/nfs etc. + client protocols, server + posix) helps in identifying the stack that might be the source of a problem. For understanding the nature of fops happening, you can use the wireshark plugin or the trace translator at appropriate locations in the graph. 2. Use statedump/meta for understanding internal state. For servers statedump is the only recourse, on fuse clients you can get a meta view of the filesystem by cd /mnt/point/.meta and you get to view the statedump information in a hierarchical fashion (including information from individual xlators). 3. Reproduce a problem by minimizing the number of nodes in a graph. This can be done by disabling translators that can be disabled through volume set interface and by having custom volume files. 4. Use error-gen while developing new code to simulate fault injection for fops. 5. If a problem happens only at scale, try reproducing the problem by reducing default limits in code (timeouts, inode table limits etc.). Some of them do require re-compilation of code. 6. Use the wealth of tools available on *nix systems for understanding a performance problem better. This infographic [1] and page [2] by Brendan Gregg is quite handy for using the right tool at the right layer. 7. For isolating regression test failures: - use tests/utils/testn.sh to quickly identify the failing test - grep for the last "Test Summary Report" in the jenkins report for a failed regression run. That usually provides a pointer to the failing test. - In case of a failure due to a core, the gdb command provided in the jenkins report is quite handy to get a backtrace after downloading the core and its runtime to your laptop. 8. Get necessary information to debug a problem as soon as a new bug is logged (a day or two is ideal). If we miss that opportunity, users could have potentially moved on to other things and obtaining information can prove to be difficult. 9. Be paranoid about any code that you write ;-). Anything that is not tested by us will come back to haunt us sometime else in the future. 10. Use terminator [3] for concurrently executing the same command on multiple nodes. Will fill in more when I recollect something useful. -Vijay [1] http://www.brendangregg.com/Perf/linux_observability_tools.png [2] http://www.brendangregg.com/perf.html [3] https://launchpad.net/terminator ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
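On the statedump tip (point 2 above): if you find yourself doing this often, a tiny helper can ask every gluster process on a node for a dump in one go. This assumes the usual behaviour that glusterfsd/glusterfs/glusterd write a statedump (by default under /var/run/gluster) when they receive SIGUSR1; treat it as a convenience sketch and adjust to taste.

===
# Ask every gluster process on this node to write a statedump.
import os
import signal
import subprocess

def dump_all_gluster_procs():
    for name in ("glusterfsd", "glusterfs", "glusterd"):
        try:
            pids = subprocess.check_output(["pidof", name], text=True).split()
        except subprocess.CalledProcessError:
            continue                      # no such process running
        for pid in pids:
            os.kill(int(pid), signal.SIGUSR1)
            print(f"requested statedump from {name} pid {pid}")

if __name__ == "__main__":
    dump_all_gluster_procs()
===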
[Gluster-devel] Vault CFP closes January 29th
The Linux Foundation's Vault (http://events.linuxfoundation.org/events/vault) event focusing on Linux storage and filesystems currently has their call for papers open - but it closes this Friday, January 29th. I'm highlighting this because GlusterFS is mentioned as a suggested topic! This year, it's in Raleigh, North Carolina. From http://events.linuxfoundation.org/events/vault/program/cfp CFP Close: January 29, 2016 CFP Notifications: February 9, 2016 Schedule Announced: February 11, 2016 Suggested Topics We seek proposals on a diverse range of topics related to storage, Linux, and open source, including: Object, Block, and File System Storage Architectures (Ceph, Swift, Cinder, Manila, OpenZFS) Distributed, Clustered, and Parallel Storage Systems (**GlusterFS**, Ceph, Lustre, OrangeFS, XtreemFS, MooseFS, OCFS2, HDFS) Persistent Memory and Other New Hardware Technologies File System Scaling Issues IT Automation and Storage Management (OpenLMI, Ovirt, Ansible) Client/server file systems (NFS, Samba, pNFS) Big Data Storage Long Term, Offline Data Archiving Data Compression and Storage Optimization Software Defined Storage -- Anyone want to put in some great 3.8 talks? -- amye -- Amye Scavarda | a...@redhat.com | Gluster Community Lead ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)
On Mon, Jan 25, 2016 at 10:24:33PM +0100, Niels de Vos wrote: > On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote: > > Hi, > > > > so today, after fixing one last config item, the smoke test jobs run > > fine on the Centos 6 builder in the RH DC, which build things as non > > root, then start the tests, then reboot the server. > > Nice, sounds like great progress! > > Did you need to change anything in the build or test scripts under > /opt/qa? If so, please make sure that the changes land in the > repository: > > https://github.com/gluster/glusterfs-patch-acceptance-tests/ > > > Now, I am looking at the fedora one, but once this one is good, I will > > likely reinstall a few builders as a test, and go on Centos 7 builder. > > I'm not sure yet if I made an error, or what is going on. But for some > reason smoke tests for my patch series fails... This is the smoke result > of the 1st patch in the serie, it only updates the fuse-header to a > newer version. Of course local testing works just fine... The output and > (not available) logs of the smoke test do not really help me :-/ > > https://build.gluster.org/job/smoke/24395/console > > Could this be related to the changes that were made? If not, I'd > appreciate a pointer to my mistake. Well, I guess that this is a limitation in the FUSE kernel module that is part of EL6 and EL7. One of the structures sent is probably too big and the kernel refuses to accept it. I guess I'll need to go back to the drawing board and add a real check for the FUSE version, or something like that. Niels signature.asc Description: PGP signature ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)
Hey Hey Panith, >Maybe give clients a second (or more) chance to "refresh" their locks - in the >sense, when a lock is about to be revoked, notify the client which can then >call for a refresh to conform it's locks holding validity. This would require >some maintainance work on the client to keep >track of locked regions. So we've thought about this as well, however the approach I'd rather is that we (long term) eliminate any need for multi-hour locking. This would put the responsibility on the SHD/rebalance/bitrot daemons to take out another lock request once in a while to signal to the POSIX locks translator that they are still there and alive. The world we want to be in is that locks > N minutes is most _definitely_ a bug or broken client and should be revoked. With this patch it's simply a heuristic to make a judgement call, in our world however we've seen that once you have 1000's of lock requests piled outit's only a matter of time before your entire cluster is going to collapse; so the "correctness" of the locking behavior or however much you might upset SHD/bitrot/rebalance is a completely secondary concern over the availability and stability of the cluster itself. For folks that want to use this feature conservatively, they shouldn't revoke based on time, but rather based on (lock request) queue depth; if you are in a situation like I've described above it's almost certainly a bug or a situation not fully understood by developers. Richard From: Venky Shankar [yknev.shan...@gmail.com] Sent: Sunday, January 24, 2016 9:36 PM To: Pranith Kumar Karampuri Cc: Richard Wareing; Gluster Devel Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x) On Jan 25, 2016 08:12, "Pranith Kumar Karampuri" mailto:pkara...@redhat.com>> wrote: > > > > On 01/25/2016 02:17 AM, Richard Wareing wrote: >> >> Hello all, >> >> Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation >> feature which has had a significant impact on our GFS cluster reliability. >> As such I wanted to share the patch with the community, so here's the >> bugzilla report: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1301401 >> >> = >> Summary: >> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster >> instability and eventual complete unavailability due to failures in >> releasing entry/inode locks in a timely manner. >> >> Classic symptoms on this are increased brick (and/or gNFSd) memory usage due >> the high number of (lock request) frames piling up in the processes. The >> failure-mode results in bricks eventually slowing down to a crawl due to >> swapping, or OOMing due to complete memory exhaustion; during this period >> the entire cluster can begin to fail. End-users will experience this as >> hangs on the filesystem, first in a specific region of the file-system and >> ultimately the entire filesystem as the offending brick begins to turn into >> a zombie (i.e. not quite dead, but not quite alive either). >> >> Currently, these situations must be handled by an administrator detecting & >> intervening via the "clear-locks" CLI command. Unfortunately this doesn't >> scale for large numbers of clusters, and it depends on the correct >> (external) detection of the locks piling up (for which there is little >> signal other than state dumps). >> >> This patch introduces two features to remedy this situation: >> >> 1. Monkey-unlocking - This is a feature targeted at developers (only!) 
to >> help track down crashes due to stale locks, and prove the utility of he lock >> revocation feature. It does this by silently dropping 1% of unlock >> requests; simulating bugs or mis-behaving clients. >> >> The feature is activated via: >> features.locks-monkey-unlocking >> >> You'll see the message >> "[] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY >> LOCKING (forcing stuck lock)!" ... in the logs indicating a request has been >> dropped. >> >> 2. Lock revocation - Once enabled, this feature will revoke a >> *contended*lock (i.e. if nobody else asks for the lock, we will not revoke >> it) either by the amount of time the lock has been held, how many other lock >> requests are waiting on the lock to be freed, or some combination of both. >> Clients which are losing their locks will be notified by receiving EAGAIN >> (send back to their callback function). >> >> The feature is activated via these options: >> features.locks-revocation-secs >> features.locks-revocation-clear-all [on/off] >> features.locks-revocation-max-blocked >> >> Recommended settings are: 1800 seconds for a time based timeout (give >> clients the benefit of the doubt, or chose a max-blocked requires some >> experimentation depending on your workload, but generally values of hundreds >> to low thousands (it's normal for many ten's of locks to be taken out when >> files are being written @ high t
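To make the revocation policy above concrete, here is a small Python model of the decision — revoke a *contended* lock either when it has been held longer than locks-revocation-secs or when more than locks-revocation-max-blocked requests are piled up behind it. It is only an illustration of the heuristic: the real feature lives in the features/locks translator (in C), losing clients are notified with EAGAIN as described above, and my reading of locks-revocation-clear-all is that it additionally flushes the whole wait queue on revocation.

===
# Toy model of the revocation decision for a single posix lock.
import time

def should_revoke(granted_at, num_blocked,
                  revocation_secs=1800, max_blocked=0):
    """Only *contended* locks are candidates; revoke by age, by wait-queue
    depth, or both. A value of 0 disables that particular criterion."""
    if num_blocked == 0:                      # nobody else wants it -> never revoke
        return False
    held_for = time.monotonic() - granted_at
    by_age = revocation_secs and held_for > revocation_secs
    by_depth = max_blocked and num_blocked > max_blocked
    return bool(by_age or by_depth)

# Example: a lock held for an hour with 5 waiters is revoked on age,
# while a 10-minute-old lock with 2000 waiters is revoked on depth.
now = time.monotonic()
print(should_revoke(now - 3600, 5))                      # True
print(should_revoke(now - 600, 2000, max_blocked=1000))  # True
print(should_revoke(now - 3600, 0))                      # False (uncontended)
===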
Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)
On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote: > Hi, > > so today, after fixing one last config item, the smoke test jobs run > fine on the Centos 6 builder in the RH DC, which build things as non > root, then start the tests, then reboot the server. Nice, sounds like great progress! Did you need to change anything in the build or test scripts under /opt/qa? If so, please make sure that the changes land in the repository: https://github.com/gluster/glusterfs-patch-acceptance-tests/ > Now, I am looking at the fedora one, but once this one is good, I will > likely reinstall a few builders as a test, and go on Centos 7 builder. I'm not sure yet if I made an error, or what is going on. But for some reason smoke tests for my patch series fails... This is the smoke result of the 1st patch in the serie, it only updates the fuse-header to a newer version. Of course local testing works just fine... The output and (not available) logs of the smoke test do not really help me :-/ https://build.gluster.org/job/smoke/24395/console Could this be related to the changes that were made? If not, I'd appreciate a pointer to my mistake. > I was also planning to look at jenkins job builder for the jenkins, but > no time yet. Will be after jenkins migration to a new host (which is > still not planned, unlike gerrit where we should be attempting to find a > time for that) We also might want to use Jenkins Job Builder for the tests we're adding to the CentOS CI. Maybe we could experiment with it there first, and then use our knowledge to the Gluster Jenkins? Thanks, Niels signature.asc Description: PGP signature ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Here are the results of "rsync" test. I've got 2 volumes — source and target — performing multiple files rsyncing from one volume to another. Source volume: === root 22259 3.5 1.5 1204200 771004 ? Ssl Jan23 109:42 /usr/sbin/ glusterfs --volfile-server=glusterfs.example.com --volfile-id=source /mnt/net/ glusterfs/source === One may see that memory consumption of source volume is not that high as with "find" test. Here is source volume client statedump: https://gist.github.com/ ef5b798859219e739aeb Here is source volume info: https://gist.github.com/3d2f32e7346df9333004 Target volume: === root 22200 23.8 6.9 3983676 3456252 ? Ssl Jan23 734:57 /usr/sbin/ glusterfs --volfile-server=glusterfs.example.com --volfile-id=target /mnt/net/ glusterfs/target === Here is target volume info: https://gist.github.com/c9de01168071575b109e Target volume RAM consumption is very high (more than 3 GiBs). Here is client statedump too: https://gist.github.com/31e43110eaa4da663435 I see huge DHT-related memory usage, e.g.: === [cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_mem_pool memusage] size=725575592 num_allocs=7552486 max_size=725575836 max_num_allocs=7552489 total_allocs=90843958 [cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_char memusage] size=586404954 num_allocs=7572836 max_size=586405157 max_num_allocs=7572839 total_allocs=80463096 === Ideas? On понеділок, 25 січня 2016 р. 02:46:32 EET Oleksandr Natalenko wrote: > Also, I've repeated the same "find" test again, but with glusterfs process > launched under valgrind. And here is valgrind output: > > https://gist.github.com/097afb01ebb2c5e9e78d > > On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > > Thanks for all your tests and times, it looks promising :) > > > > > > Cordialement, > > Mathieu CHATEAU > > http://www.lotp.fr > > > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > > following > > > patches: > > > > > > === > > > > > > Kaleb S KEITHLEY (1): > > > fuse: use-after-free fix in fuse-bridge, revisited > > > > > > Pranith Kumar K (1): > > > mount/fuse: Fix use-after-free crash > > > > > > Soumya Koduri (3): > > > gfapi: Fix inode nlookup counts > > > inode: Retire the inodes from the lru list in inode_table_destroy > > > upcall: free the xdr* allocations > > > > > > === > > > > > > I run rsync from one GlusterFS volume to another. While memory started > > > from > > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > > not > > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > > going > > > to do several rsync rounds to see if it grows more (with no patches bare > > > 3.7.6 > > > could consume more than 20 GiBs). > > > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > > > Will report more in several days. I hope current patches will be > > > incorporated > > > into 3.7.7. > > > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > > >> loop > > > > >> terminated" error in your logs. > > > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. 
> > > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a > > > > few > > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > > of beating on it. > > > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > > > >> stable: > > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longe > > > v > > > > > > > >> ity /client.out > > > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is > > > > > that > > > > > 3.7.6 + patches? > > > > > > > > I'm running an internally developed I/O load generator written by a > > > > guy > > > > on our perf team. > > > > > > > > it does, create, write, read, rename, stat, delete, and more. > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
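When chasing a leak like the one above, it helps to rank the usage-type sections of a client statedump by size so that entries such as gf_common_mt_mem_pool stand out immediately. The throwaway Python script below does that for statedump text in the format quoted above (a bracketed section header followed by size=/num_allocs= lines); it is just a convenience script for analysis, not part of Gluster.

===
# Summarize "usage-type" memory sections of a glusterfs statedump, largest first.
import re
import sys

def top_usage(path, limit=10):
    sections = {}
    current = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            m = re.match(r"\[(.+usage-type.+)\]$", line)
            if m:
                current = m.group(1)
                sections[current] = {}
            elif current and "=" in line:
                key, _, val = line.partition("=")
                if val.isdigit():
                    sections[current][key] = int(val)
    ranked = sorted(sections.items(),
                    key=lambda kv: kv[1].get("size", 0), reverse=True)
    for name, fields in ranked[:limit]:
        print(f"{fields.get('size', 0):>12}  "
              f"{fields.get('num_allocs', 0):>9}  {name}")

if __name__ == "__main__":
    top_usage(sys.argv[1])
===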
Re: [Gluster-devel] [Gluster-users] Gluster Monthly Newsletter, January 2015 Edition
On Fri, Jan 22, 2016 at 9:29 AM, Niels de Vos wrote: > On Mon, Jan 18, 2016 at 07:46:16PM -0800, Amye Scavarda wrote: >> We're kicking off an updated Monthly Newsletter, coming out mid-month. >> We'll highlight special posts, news and noteworthy threads from the >> mailing lists, events, and other things that are important for the >> Gluster community. > > ... snip! > >> FOSDEM: >> * Gluster roadmap, recent improvements and upcoming features - Niels De Vos > > More details about the talk and related interview here: > > https://fosdem.org/2016/schedule/event/gluster_roadmap/ > https://fosdem.org/2016/interviews/2016-niels-de-vos/ > >> * Go & Plugins - Kaushal Madappa >> * Gluster Stand >> DevConf >> * small Gluster Developer Gathering >> * Heketi GlusterFS volume management - Lusis Pabon >> * Gluster roadmap, recent improvements and upcoming features - Niels De Vos > > Sorry, this is not correct. That talk was proposed, but not accepted. > I'll be giving a workshop though: > > Build your own Scale-Out Storage with Gluster > http://sched.co/5m1X > >> FAST >> >> == >> Questions? Comments? Want to be involved? > > Can the newsletter get posted in a blog as well? I like reading posts > like this through the RSS feed from http://planet.gluster.org/ . > > Thanks! > Niels Thanks for the update! This time around, I just put out the Community Survey followup, but I'll be adding this to our main blog (with updates to reflect changes). -- Amye Scavarda | a...@redhat.com | Gluster Community Lead ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Throttling xlator on the bricks
Just out of curiosity, what benefits do we think this throttling xlator would provide over the "enable-least-priority" option (where we put all the fops from SHD, etc into a least pri queue)? > On Jan 25, 2016, at 12:29 AM, Venky Shankar wrote: > > On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote: >> On 01/25/2016 12:56 PM, Venky Shankar wrote: >>> Also, it would be beneficial to have the core TBF implementation as part of >>> libglusterfs so as to be consumable by the server side xlator component to >>> throttle dispatched FOPs and for daemons to throttle anything that's outside >>> "brick" boundary (such as cpu, etc..). >> That makes sense. We were initially thinking to overload posix_rchecksum() >> to do the SHA256 sums for the signer. > > That does have advantages by avoiding network rountrips by computing SHA* > locally. > TBF could still implement ->rchecksum and throttle that (on behalf of clients, > residing on the server - internal daemons). Placing the core implementation as > part of libglusterfs would still provide the flexibility. > >> >> > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=N7LE2BKIHDDBvkYkakYthA&m=9W9xtRg0TIEUvFL-8HpUCux8psoWKkUbEFiwqykRwH4&s=OVF0dZRXt8GFcIxsHlkbNjH-bjD9097q5hjVVHgOFkQ&e= > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
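For context on the TBF mentioned above: a token-bucket filter gives each operation a token cost, with tokens refilling at a fixed rate and accumulating up to a burst cap, so the long-term rate stays bounded; a least-priority queue, by contrast, only reorders work behind client fops. The sketch below illustrates the algorithm only; it is not the proposed libglusterfs implementation, and the names and numbers in it are made up for illustration.
===
import time
import threading

class TokenBucket(object):
    """Minimal token-bucket (TBF) sketch: 'rate' tokens are added per second,
    'burst' caps how many can accumulate. Conceptual only."""
    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.stamp = time.time()
        self.lock = threading.Lock()

    def throttle(self, cost=1.0):
        # Block until 'cost' tokens are available (cost is assumed <= burst).
        while True:
            with self.lock:
                now = time.time()
                self.tokens = min(self.burst,
                                  self.tokens + (now - self.stamp) * self.rate)
                self.stamp = now
                if self.tokens >= cost:
                    self.tokens -= cost
                    return
                wait = (cost - self.tokens) / self.rate
            time.sleep(wait)

# Hypothetical usage: limit self-heal reads to ~16 MB/s with 64 MB bursts,
# charging the bucket for each chunk before dispatching it.
# bucket = TokenBucket(rate=16 * 1024 * 1024, burst=64 * 1024 * 1024)
# bucket.throttle(cost=len(chunk))
===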
Re: [Gluster-devel] Smoke tests run on the builder in RH DC (at least)
On Mon, Jan 25, 2016 at 11:29 PM, Michael Scherer wrote: > Hi, > > so today, after fixing one last config item, the smoke test jobs run > fine on the Centos 6 builder in the RH DC, which build things as non > root, then start the tests, then reboot the server. > Awesome! > > Now, I am looking at the fedora one, but once this one is good, I will > likely reinstall a few builders as a test, and go on Centos 7 builder. > This is what I had to do to get Fedora working. Ansible lines are shown where applicable. 1. change ownership for python site packages: difference is in version 2.7 when compared to 2.6 of CentOS file: path=/usr/lib/python2.7/site-packages/gluster/ state=directory owner=jenkins group=root 2. Had to give jenkins write permission on /usr/lib/systemd/system/ for installing glusterd service file. > I was also planning to look at jenkins job builder for the jenkins, but > no time yet. Will be after jenkins migration to a new host (which is > still not planned, unlike gerrit where we should be attempting to find a > time for that) > > > -- > Michael Scherer > Sysadmin, Community Infrastructure and Platform, OSAS > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Smoke tests run on the builder in RH DC (at least)
Hi, so today, after fixing one last config item, the smoke test jobs run fine on the Centos 6 builder in the RH DC, which build things as non root, then start the tests, then reboot the server. Now, I am looking at the fedora one, but once this one is good, I will likely reinstall a few builders as a test, and go on Centos 7 builder. I was also planning to look at jenkins job builder for the jenkins, but no time yet. Will be after jenkins migration to a new host (which is still not planned, unlike gerrit where we should be attempting to find a time for that) -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Gerrit down for 10 to 15 minutes for reindexing
On Monday, January 25, 2016 at 15:41 +0100, Michael Scherer wrote: > Hi, > > in order to fix some issues (I hope), I am gonna start a reindex of the > Lucene DB of Gerrit. This requires the server to be put offline for a > while; from a test on another VM, it would take ~10 minutes (it was > 240 seconds on the VM, but that was likely faster since the VM is faster). > > I will do that around 18h UTC, in ~3h (so 1pm Boston time, 23h Pune > time, and 19h Amsterdam time, so it shouldn't impact too many people, who > would either be sleeping and/or eating). > > If people really want to work, we always have bugs to triage :) So it took 3 to 4 minutes, much faster than I thought. And it did fix the issue it was meant to fix, i.e. Prashanth being unable to find his own reviews. This happened because we made some modifications directly in SQL, but Gerrit needs to reindex everything to see such changes, and that requires taking the DB offline (something that is fixed in a newer version of Gerrit). Please ping me if anything weird appears. -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
- Original Message - From: "Jeff Darcy" To: "Richard Wareing" Cc: "Gluster Devel" Sent: Monday, January 25, 2016 7:27:20 PM Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer Oh boy, here we go. ;) I second Richard's suggestion to use cscope or some equivalent. It's a good idea in general, but especially with a codebase as large and complex as Gluster's. I literally wouldn't be able to do my job without it. I also have a set of bash/zsh aliases that will regenerate the cscope database after any git action, so I rarely have to do it myself. JOE : Well cscope and vim is good enough but a good IDE (with its own search and cscope integrated) will also help. I have been using codelite (http://codelite.org/) for over 2 years now and it rocks! Another secondary tip is that in many cases anything you see in the code as "xyz_t" is actually "struct _xyz" so you can save a bit of time (in vim) with ":ta _xyz" instead of going through the meaningless typedef. Unfortunately we're not as consistent as we should be about this convention, but it mostly works. Some day I'll figure out the vim macro syntax enough to create a proper macro and binding for this shortcut. I should probably write a whole new blog post about gdb stuff. Here's one I wrote a while ago: http://pl.atyp.us/hekafs.org/index.php/2013/02/gdb-macros-for-glusterfs/ There's a lot more that could be done in this area. For example, adding loc_t or inode_t or fd_t would all be good exercises. On a more controversial note, I am opposed to the practice of doing "make install" on anything other than a transient VM/container. I've seen too many patches that were broken because they relied on "leftovers" in someone's source directory or elsewhere on the system from previous installs. On my test systems, I always build and install actual RPMs, to make sure new files are properly incorporated in to the configure/rpm system. One of these days I'll set it up so the test system even does a "git clone" (instead of rsync) from my real source tree to catch un-checked-in files as well. I'll probably think of more later, and will update here as I do. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Gerrit down for 10 to 15 minutes for reindexing
Hi, in order to fix some issues (I hope), I am gonna start a reindex of the Lucene DB of Gerrit. This requires the server to be put offline for a while; from a test on another VM, it would take ~10 minutes (it was 240 seconds on the VM, but that was likely faster since the VM is faster). I will do that around 18h UTC, in ~3h (so 1pm Boston time, 23h Pune time, and 19h Amsterdam time, so it shouldn't impact too many people, who would either be sleeping and/or eating). If people really want to work, we always have bugs to triage :) -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
Oh boy, here we go. ;) I second Richard's suggestion to use cscope or some equivalent. It's a good idea in general, but especially with a codebase as large and complex as Gluster's. I literally wouldn't be able to do my job without it. I also have a set of bash/zsh aliases that will regenerate the cscope database after any git action, so I rarely have to do it myself. Another secondary tip is that in many cases anything you see in the code as "xyz_t" is actually "struct _xyz" so you can save a bit of time (in vim) with ":ta _xyz" instead of going through the meaningless typedef. Unfortunately we're not as consistent as we should be about this convention, but it mostly works. Some day I'll figure out the vim macro syntax enough to create a proper macro and binding for this shortcut. I should probably write a whole new blog post about gdb stuff. Here's one I wrote a while ago: http://pl.atyp.us/hekafs.org/index.php/2013/02/gdb-macros-for-glusterfs/ There's a lot more that could be done in this area. For example, adding loc_t or inode_t or fd_t would all be good exercises. On a more controversial note, I am opposed to the practice of doing "make install" on anything other than a transient VM/container. I've seen too many patches that were broken because they relied on "leftovers" in someone's source directory or elsewhere on the system from previous installs. On my test systems, I always build and install actual RPMs, to make sure new files are properly incorporated in to the configure/rpm system. One of these days I'll set it up so the test system even does a "git clone" (instead of rsync) from my real source tree to catch un-checked-in files as well. I'll probably think of more later, and will update here as I do. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
- Original Message - > From: "Niels de Vos" > To: "Rajesh Joseph" > Cc: "Richard Wareing" , "Gluster Devel" > > Sent: Monday, January 25, 2016 6:30:53 PM > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer > > On Mon, Jan 25, 2016 at 06:41:50AM -0500, Rajesh Joseph wrote: > > > > > > - Original Message - > > > From: "Richard Wareing" > > > To: "Raghavendra Talur" > > > Cc: "Gluster Devel" > > > Sent: Monday, January 25, 2016 8:12:53 AM > > > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer > > > > > > Here's my tips: > > > > > > 1. General C tricks > > > - learn to use vim or emacs & read their manuals; customize to suite your > > > style > > > - use vim w/ pathogen plugins for auto formatting (don't use tabs!) & > > > syntax > > > - use ctags to jump around functions > > > - Use ASAN & valgrind to check for memory leaks and heap corruption > > > - learn to use "git bisect" to quickly find where regressions were > > > introduced > > > & revert them > > > - Use a window manager like tmux or screen > > > > > > 2. Gluster specific tricks > > > - Alias "ggrep" to grep through all Gluster source files for some string > > > and > > > show you the line numbers > > > - Alias "gvim" or "gemacs" to open any source file without full path, eg. > > > "gvim afr.c" > > > - GFS specific gdb macros to dump out pretty formatting of various > > > structs > > > (Jeff Darcy has some of these IIRC) > > > > I also use few macros for printing dictionary and walking through the list > > structures. > > I think it would be good to collect these macros, scripts and tool in a > > common place > > so that people can use them. Can we include them in "extras/dev" directory > > under Gluster source tree? > > Yes, but please call it "extras/devel-tools" or something descriptive > like that. "extras/dev" sounds like some device under /dev :) Yes, sure :-) > > Thanks, > Niels > > > > > > > - Write prove tests...for everything you write, and any bug you fix. > > > Make > > > them deterministic (timing/races shouldn't matter). > > > - Bugs/races and/or crashes which are hard or impossible to repro often > > > require the creation of a developer specific feature to simulate the > > > failure > > > and efficiently code/test a fix. Example: "monkey-unlocking" in the lock > > > revocation patch I just posted. > > > - That edge case you are ignoring because you think it's > > > impossible/unlikely? > > > We will find/hit it in 48hrs at large scale (seriously we will) > > > handle > > > it correctly or at a minimum write a (kernel style) "OOPS" log type > > > message. > > > > > > That's all I have off the top of my head. I'll give example aliases in > > > another reply. > > > > > > Richard > > > > > > Sent from my iPhone > > > > > > > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur > > > > wrote: > > > > > > > > HI All, > > > > > > > > I am sure there are many tricks hidden under sleeves of many Gluster > > > > developers. > > > > I realized this when speaking to new developers. It would be good have > > > > a > > > > searchable thread of such tricks. > > > > > > > > Just reply back on this thread with the tricks that you have and I > > > > promise > > > > I will collate them and add them to developer guide. > > > > > > > > > > > > Looking forward to be amazed! 
> > > > > > > > Thanks, > > > > Raghavendra Talur > > > > > > > > ___ > > > > Gluster-devel mailing list > > > > Gluster-devel@gluster.org > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=wVrGhYdkvCanDEZF0xOyVbFg0am_GxaoXR26Cvp7H2U&s=JOrY0up51BoZOq2sKaNJQHPzqKiUS3Bwgn7fr5VPXjw&e= > > > ___ > > > Gluster-devel mailing list > > > Gluster-devel@gluster.org > > > http://www.gluster.org/mailman/listinfo/gluster-devel > > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
On Mon, Jan 25, 2016 at 06:41:50AM -0500, Rajesh Joseph wrote: > > > - Original Message - > > From: "Richard Wareing" > > To: "Raghavendra Talur" > > Cc: "Gluster Devel" > > Sent: Monday, January 25, 2016 8:12:53 AM > > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer > > > > Here's my tips: > > > > 1. General C tricks > > - learn to use vim or emacs & read their manuals; customize to suite your > > style > > - use vim w/ pathogen plugins for auto formatting (don't use tabs!) & syntax > > - use ctags to jump around functions > > - Use ASAN & valgrind to check for memory leaks and heap corruption > > - learn to use "git bisect" to quickly find where regressions were > > introduced > > & revert them > > - Use a window manager like tmux or screen > > > > 2. Gluster specific tricks > > - Alias "ggrep" to grep through all Gluster source files for some string and > > show you the line numbers > > - Alias "gvim" or "gemacs" to open any source file without full path, eg. > > "gvim afr.c" > > - GFS specific gdb macros to dump out pretty formatting of various structs > > (Jeff Darcy has some of these IIRC) > > I also use few macros for printing dictionary and walking through the list > structures. > I think it would be good to collect these macros, scripts and tool in a > common place > so that people can use them. Can we include them in "extras/dev" directory > under Gluster source tree? Yes, but please call it "extras/devel-tools" or something descriptive like that. "extras/dev" sounds like some device under /dev :) Thanks, Niels > > > - Write prove tests...for everything you write, and any bug you fix. Make > > them deterministic (timing/races shouldn't matter). > > - Bugs/races and/or crashes which are hard or impossible to repro often > > require the creation of a developer specific feature to simulate the failure > > and efficiently code/test a fix. Example: "monkey-unlocking" in the lock > > revocation patch I just posted. > > - That edge case you are ignoring because you think it's > > impossible/unlikely? > > We will find/hit it in 48hrs at large scale (seriously we will) handle > > it correctly or at a minimum write a (kernel style) "OOPS" log type message. > > > > That's all I have off the top of my head. I'll give example aliases in > > another reply. > > > > Richard > > > > Sent from my iPhone > > > > > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur wrote: > > > > > > HI All, > > > > > > I am sure there are many tricks hidden under sleeves of many Gluster > > > developers. > > > I realized this when speaking to new developers. It would be good have a > > > searchable thread of such tricks. > > > > > > Just reply back on this thread with the tricks that you have and I promise > > > I will collate them and add them to developer guide. > > > > > > > > > Looking forward to be amazed! 
> > > > > > Thanks, > > > Raghavendra Talur > > > > > > ___ > > > Gluster-devel mailing list > > > Gluster-devel@gluster.org > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=wVrGhYdkvCanDEZF0xOyVbFg0am_GxaoXR26Cvp7H2U&s=JOrY0up51BoZOq2sKaNJQHPzqKiUS3Bwgn7fr5VPXjw&e= > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel signature.asc Description: PGP signature ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
- Original Message - > From: "Richard Wareing" > To: "Raghavendra Talur" > Cc: "Gluster Devel" > Sent: Monday, January 25, 2016 8:12:53 AM > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer > > Here's my tips: > > 1. General C tricks > - learn to use vim or emacs & read their manuals; customize to suite your > style > - use vim w/ pathogen plugins for auto formatting (don't use tabs!) & syntax > - use ctags to jump around functions > - Use ASAN & valgrind to check for memory leaks and heap corruption > - learn to use "git bisect" to quickly find where regressions were introduced > & revert them > - Use a window manager like tmux or screen > > 2. Gluster specific tricks > - Alias "ggrep" to grep through all Gluster source files for some string and > show you the line numbers > - Alias "gvim" or "gemacs" to open any source file without full path, eg. > "gvim afr.c" > - GFS specific gdb macros to dump out pretty formatting of various structs > (Jeff Darcy has some of these IIRC) I also use few macros for printing dictionary and walking through the list structures. I think it would be good to collect these macros, scripts and tool in a common place so that people can use them. Can we include them in "extras/dev" directory under Gluster source tree? > - Write prove tests...for everything you write, and any bug you fix. Make > them deterministic (timing/races shouldn't matter). > - Bugs/races and/or crashes which are hard or impossible to repro often > require the creation of a developer specific feature to simulate the failure > and efficiently code/test a fix. Example: "monkey-unlocking" in the lock > revocation patch I just posted. > - That edge case you are ignoring because you think it's impossible/unlikely? > We will find/hit it in 48hrs at large scale (seriously we will) handle > it correctly or at a minimum write a (kernel style) "OOPS" log type message. > > That's all I have off the top of my head. I'll give example aliases in > another reply. > > Richard > > Sent from my iPhone > > > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur wrote: > > > > HI All, > > > > I am sure there are many tricks hidden under sleeves of many Gluster > > developers. > > I realized this when speaking to new developers. It would be good have a > > searchable thread of such tricks. > > > > Just reply back on this thread with the tricks that you have and I promise > > I will collate them and add them to developer guide. > > > > > > Looking forward to be amazed! > > > > Thanks, > > Raghavendra Talur > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=wVrGhYdkvCanDEZF0xOyVbFg0am_GxaoXR26Cvp7H2U&s=JOrY0up51BoZOq2sKaNJQHPzqKiUS3Bwgn7fr5VPXjw&e= > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Tips and Tricks for Gluster Developer
I don't like installing the bits under /usr/local so I configure and compile them to install in the same place as a Fedora RPM would. Here is my compile command: ./autogen.sh CFLAGS="-g -O0 -Werror -Wall -Wno-error=cpp -Wno-error=maybe-uninitialized" ./configure \ --prefix=/usr \ --exec-prefix=/usr \ --bindir=/usr/bin \ --sbindir=/usr/sbin \ --sysconfdir=/etc \ --datadir=/usr/share \ --includedir=/usr/include \ --libdir=/usr/lib64 \ --libexecdir=/usr/libexec \ --localstatedir=/var \ --sharedstatedir=/var/lib \ --mandir=/usr/share/man \ --infodir=/usr/share/info \ --enable-debug make install On Fri, Jan 22, 2016 at 7:43 PM, Raghavendra Talur wrote: > HI All, > > I am sure there are many tricks hidden under sleeves of many Gluster > developers. > I realized this when speaking to new developers. It would be good have a > searchable thread of such tricks. > > Just reply back on this thread with the tricks that you have and I promise > I will collate them and add them to developer guide. > > > Looking forward to be amazed! > > Thanks, > Raghavendra Talur > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Throttling xlator on the bricks
On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote: > On 01/25/2016 12:56 PM, Venky Shankar wrote: > >Also, it would be beneficial to have the core TBF implementation as part of > >libglusterfs so as to be consumable by the server side xlator component to > >throttle dispatched FOPs and for daemons to throttle anything that's outside > >"brick" boundary (such as cpu, etc..). > That makes sense. We were initially thinking to overload posix_rchecksum() > to do the SHA256 sums for the signer. That does have advantages by avoiding network round trips by computing SHA* locally. TBF could still implement ->rchecksum and throttle that (on behalf of clients, residing on the server - internal daemons). Placing the core implementation as part of libglusterfs would still provide the flexibility. > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
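To make the "compute SHA* locally, but throttled" idea concrete, the sketch below hashes a file in fixed-size chunks and asks a rate limiter for permission before each chunk. The function name and limiter interface are hypothetical; in Gluster this logic would sit on the server side around the checksum path, not in Python.
===
import hashlib

def throttled_sha256(path, limiter=None, chunk_size=128 * 1024):
    """Hash 'path' in fixed-size chunks, consulting a rate limiter first.
    'limiter' is any callable that blocks until the given byte budget is
    available (e.g. the token bucket sketched earlier in this digest)."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        while True:
            if limiter is not None:
                limiter(chunk_size)  # may overcharge slightly on the last chunk
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage with the earlier token-bucket sketch:
# bucket = TokenBucket(rate=32 * 1024 * 1024, burst=128 * 1024 * 1024)
# checksum = throttled_sha256('/bricks/brick1/some/file', limiter=bucket.throttle)
===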
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
The client statedump is at http://pastebin.centos.org/38671/ On Mon, Jan 25, 2016 at 3:33 PM, baul jianguo wrote: > 3.5.7 also hangs. Only the flush op hung. Yes, with performance.client-io-threads off, no hang. > > The hang does not relate to the client kernel version. > > Here is one client statedump of the flush op; anything abnormal? > > [global.callpool.stack.12] > > uid=0 > > gid=0 > > pid=14432 > > unique=16336007098 > > lk-owner=77cb199aa36f3641 > > op=FLUSH > > type=1 > > cnt=6 > > > > [global.callpool.stack.12.frame.1] > > ref_count=1 > > translator=fuse > > complete=0 > > > > [global.callpool.stack.12.frame.2] > > ref_count=0 > > translator=datavolume-write-behind > > complete=0 > > parent=datavolume-read-ahead > > wind_from=ra_flush > > wind_to=FIRST_CHILD (this)->fops->flush > > unwind_to=ra_flush_cbk > > > > [global.callpool.stack.12.frame.3] > > ref_count=1 > > translator=datavolume-read-ahead > > complete=0 > > parent=datavolume-open-behind > > wind_from=default_flush_resume > > wind_to=FIRST_CHILD(this)->fops->flush > > unwind_to=default_flush_cbk > > > > [global.callpool.stack.12.frame.4] > > ref_count=1 > > translator=datavolume-open-behind > > complete=0 > > parent=datavolume-io-threads > > wind_from=iot_flush_wrapper > > wind_to=FIRST_CHILD(this)->fops->flush > > unwind_to=iot_flush_cbk > > > > [global.callpool.stack.12.frame.5] > > ref_count=1 > > translator=datavolume-io-threads > > complete=0 > > parent=datavolume > > wind_from=io_stats_flush > > wind_to=FIRST_CHILD(this)->fops->flush > > unwind_to=io_stats_flush_cbk > > > > [global.callpool.stack.12.frame.6] > > ref_count=1 > > translator=datavolume > > complete=0 > > parent=fuse > > wind_from=fuse_flush_resume > > wind_to=xl->fops->flush > > unwind_to=fuse_err_cbk > > > > On Sun, Jan 24, 2016 at 5:35 AM, Oleksandr Natalenko wrote: >> With "performance.client-io-threads" set to "off" no hangs occurred in 3 >> rsync/rm rounds. Could that be some fuse-bridge lock race? Will bring that >> option back to "on" again and try to get a full statedump. >> >> On Thursday, January 21, 2016, 14:54:47 EET Raghavendra G wrote: >>> On Thu, Jan 21, 2016 at 10:49 AM, Pranith Kumar Karampuri < >>> >>> pkara...@redhat.com> wrote: >>> > On 01/18/2016 02:28 PM, Oleksandr Natalenko wrote: >>> >> XFS. Server side works OK, I'm able to mount the volume again. Brick is 30% >>> >> full. >>> > >>> > Oleksandr, >>> > >>> > Will it be possible to get the statedump of the client and bricks >>> > >>> > output next time it happens? >>> > >>> > https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md#how-to-generate-statedump >>> We also need to dump inode information. To do that you've to add "all=yes" >>> to /var/run/gluster/glusterdump.options before you issue commands to get >>> statedump. >>> >>> > Pranith >>> > >>> >> On Monday, January 18, 2016, 15:07:18 EET baul jianguo wrote: >>> >>> What is your brick file system? And the glusterfsd process and all >>> >>> thread status? >>> >>> I met the same issue when a client app such as rsync stays in D status, and >>> >>> the brick process and related threads are also in D status. >>> >>> And the brick dev disk util is 100%. >>> >>> >>> >>> On Sun, Jan 17, 2016 at 6:13 AM, Oleksandr Natalenko >>> >>> >>> >>> wrote: >>> Wrong assumption, rsync hung again. >>> >>> On Saturday, January 16, 2016,
22:53:04 EET Oleksandr Natalenko wrote: >>> > One possible reason: >>> > >>> > cluster.lookup-optimize: on >>> > cluster.readdir-optimize: on >>> > >>> > I've disabled both optimizations, and at least as of now rsync still >>> > does >>> > its job with no issues. I would like to find out what option causes >>> > such >>> > a >>> > behavior and why. Will test more. >>> > >>> > On Friday, January 15, 2016, 16:09:51 EET Oleksandr Natalenko wrote: >>> >> Another observation: if rsyncing is resumed after a hang, rsync itself >>> >> hangs a lot faster because it does stat of already copied files. So, >>> >> the >>> >> reason may be not the writing itself, but massive stat on the GlusterFS >>> >> volume >>> >> as well. >>> >> >>> >> On 15.01.2016 09:40, Oleksandr Natalenko wrote: >>> >>> While doing rsync over millions of files from an ordinary partition to >>> >>> a GlusterFS volume, just after approx. the first 2 million files an rsync hang >>> >>> happens, and the following info appears in dmesg: >>> >>> >>> >>> === >>> >>> [17075038.924481] INFO: task rsync:10310 blocked for more than 120 >>> >>> seconds. >>> >>> [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> >>> disables this message. >>> >>> [17075038.940748] rsync D 88207fc13680 0 10310 >>> >>> 10309 0x0080 >>> >>> [17075038.940752] 8809c578be18 0086 >>> >>> 8809c578bfd8 >>> >>> 00013680 >>> >>> [17075038.940756] 8809c578bfd8 00013680 >>> >>> 880310cbe6
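The statedump steps described in the quoted thread (add "all=yes" to /var/run/gluster/glusterdump.options, then trigger a dump of the client process) can be scripted roughly as follows. The paths follow the thread and the usual defaults; adjust them, and the target PID, for your setup.
===
#!/usr/bin/env python
# Sketch of the statedump procedure discussed above: enable full (inode/fd)
# dumping, then signal the glusterfs client process.
import os
import signal
import sys

OPTIONS_FILE = '/var/run/gluster/glusterdump.options'

def request_statedump(pid):
    # Ask for inode/fd tables as well, as suggested in the thread above.
    with open(OPTIONS_FILE, 'w') as f:
        f.write('all=yes\n')
    # glusterfs processes write a statedump when they receive SIGUSR1;
    # the dump normally lands under /var/run/gluster/.
    os.kill(pid, signal.SIGUSR1)
    print('Requested statedump from PID %d' % pid)

if __name__ == '__main__':
    request_statedump(int(sys.argv[1]))
===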