Re: [Gluster-devel] Bitrot/Tiering: Bad files get migrated and hence corruption goes undetected.
Well, currently we don't migrate the existing signature; the file starts its life afresh in the new tier (i.e. it gets bitrot version 1 on the new tier). The same holds for any special xattrs/attributes of the file. We rely heavily on the DHT rebalance mechanism for migrations, which also does not carry over special attributes/xattrs.

- Original Message -
From: "Niels de Vos"
To: "Joseph Fernandes"
Cc: "Gluster Devel"
Sent: Friday, February 26, 2016 10:33:11 PM
Subject: Re: [Gluster-devel] Bitrot/Tiering: Bad files get migrated and hence corruption goes undetected.

On Fri, Feb 26, 2016 at 09:32:46AM -0500, Joseph Fernandes wrote:
> Hi All,
>
> This is a discussion mail on the following issue,
>
> 1. Object is corrupted before it could be signed: In this case, the corrupted
>    object is signed and gets migrated upon I/O. There's no way to identify
>    corruption for this set of objects.
>
> 2. Object is signed (but not scrubbed) and corruption happens thereafter:
>    In this case, as of now, integrity checking is not done on the fly
>    and the object would get migrated (and signed again in the hot tier).
>
> (1) is definitely not an issue of bitrot with tiering. But for (2) we can do
> something to avoid the corrupted file from getting migrated. Before we migrate
> files we can scrub them, but that's just a naive thought; any better suggestions?

Is there a reason the existing signature can not be migrated? Why does it become invalid?

Niels

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Need help with bitrot
Thanks Joseph for the precise set of information. I will follow these.

- Ajil

On Fri, Feb 26, 2016 at 7:05 AM, Joseph Fernandes wrote:
> Hope this helps.
>
> Courtesy: Raghavendra Talur (rta...@redhat.com)
>
> 1. Clone the glusterfs repo to your laptop and get acquainted with the dev
> workflow.
> https://gluster.readthedocs.org/en/latest/Developer-guide/Developers-Index/
>
> 2. If you find using your laptop as the test machine for Gluster too scary,
> here is a Vagrant-based mechanism to easily set up VMs on your laptop for
> Gluster testing.
> http://comments.gmane.org/gmane.comp.file-systems.gluster.devel/13494
>
> 3. Find my Gluster introduction blog post here in the preview link:
> https://6227134958232800133_bafac39c28bee4f256bbbef7510c9bb9b44fca05.blogspot.com/b/post-preview?token=s6_4MVIBAAA.zY--3ij00CkDwnitBOwnFBowEvCsKZ0o4ToQ0KYk9Po4pKujPj9ugmn-fm-XUFdLQxU50FmnCxBBr_IkSzuSlA.l_XFe1UvIEAiqkFAZZPdqQ&postId=4168074834715190149&type=POST
>
> 4. Follow all the lessons in the Translator 101 series to build on your
> understanding of Gluster.
> http://pl.atyp.us/hekafs.org/index.php/2011/11/translator-101-class-1-setting-the-stage/
> http://hekafs.org/index.php/2011/11/translator-101-lesson-2-init-fini-and-private-context
> http://hekafs.org/index.php/2011/11/translator-101-lesson-3-this-time-for-real
> http://hekafs.org/index.php/2011/11/translator-101-lesson-4-debugging-a-translator
>
> 5. Try to fix or understand any of the bugs in this list:
> https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Community&f1=keywords&list_id=4424622&o1=substring&product=GlusterFS&query_format=advanced&v1=easyfix
>
> Regards,
> Joe
>
> - Original Message -
> From: "Ajil Abraham"
> To: "FNU Raghavendra Manjunath"
> Cc: "Gluster Devel"
> Sent: Thursday, February 25, 2016 8:58:35 PM
> Subject: Re: [Gluster-devel] Need help with bitrot
>
> Thanks FNU Raghavendra.
> Does the signing happen only when the file data changes, or even when an
> extended attribute changes?
>
> I am also trying to understand the Gluster internal data structures. Are
> there any materials for the same? Similarly for the translators: the way
> they are stacked on the client & server side, and how control flows between
> them. Can somebody please help?
>
> - Ajil
>
> On Thu, Feb 25, 2016 at 7:27 AM, FNU Raghavendra Manjunath <
> rab...@redhat.com > wrote:
>
> Hi Ajil,
>
> The expiry policy tells the signer (the bitrot daemon) to wait for a
> specific period of time before signing an object.
>
> Whenever an object is modified, a notification is sent to the signer by the
> brick process (the bit-rot-stub xlator sitting in the I/O path) upon getting
> a release (i.e. when all the fds of that object are closed). The expiry
> policy tells the signer to wait for some time (by default 120 seconds)
> before signing that object. This is done because, if the signer starts
> signing an object (i.e. read the object + calculate the checksum + store the
> checksum) and the object gets modified again, then a new notification has to
> be sent and the signer has to sign the object again by recalculating the
> checksum. Whereas if the signer waits for some time and receives a new
> notification on the same object while it is waiting, it can skip signing
> for the first notification.
>
> Venky, do you want to add anything more?
>
> Regards,
> Raghavendra
>
> On Wed, Feb 24, 2016 at 12:28 AM, Ajil Abraham < ajil95.abra...@gmail.com
> > wrote:
>
> Hi,
>
> I am a student interested in GlusterFS. I am trying to understand the design
> of GlusterFS and came across the bitrot design document on Google. There is
> a mention of an expiry policy used to sign the files. I did not clearly
> understand what the expiry policy is. Can somebody please help?
>
> -Ajil
Re: [Gluster-devel] Query on healing process
Hi Ravi,

Thanks for the response. We are using GlusterFS 3.7.8.

Here is the use case: we have a logging file which saves event logs for every
board of a node, and these files are kept in sync using GlusterFS. The system
runs in replica 2 mode, meaning that when one brick in a replicated volume
goes offline, the glusterd daemons on the other nodes keep track of all the
files that are not replicated to the offline brick. When the offline brick
becomes available again, the cluster initiates a healing process, replicating
the updated files to that brick.

But in our case, we see that the log file of one board is not in sync and its
format is corrupted. Even the outcome of

  # gluster volume heal c_glusterfs info

shows that there are no pending heals.

Also, the logging file which is updated is of fixed size, and new entries wrap
around, overwriting the old entries. This way we have seen that after a few
restarts the contents of the same file on the two bricks are different, but
volume heal info shows zero entries.

Solution: when we put a delay of more than 5 min before the healing,
everything works fine.

Regards,
Abhishek

On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N wrote:
> On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>
> Hi,
>
> Here I have one query regarding the time taken by the healing process. In
> the current two-node setup, when we rebooted one node, the self-healing
> process started within a 5 min interval on the board, resulting in the
> corruption of some files' data.
>
> Heal should start immediately after the brick process comes up. What
> version of gluster are you using? What do you mean by corruption of data?
> Also, how did you observe that the heal started after 5 minutes?
> -Ravi
>
> And to resolve it I searched on Google and found the following link:
> https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>
> It mentions that the healing process can take up to 10 min to start.
>
> Here is the statement from the link:
>
> "Healing replicated volumes
>
> When any brick in a replicated volume goes offline, the glusterd daemons
> on the remaining nodes keep track of all the files that are not replicated
> to the offline brick. When the offline brick becomes available again, the
> cluster initiates a healing process, replicating the updated files to that
> brick. *The start of this process can take up to 10 minutes, based on
> observation.*"
>
> After giving a delay of more than 5 min, the file corruption problem was
> resolved.
>
> So, my question here is: is there any way through which we can reduce the
> time taken by the healing process to start?
>
> Regards,
> Abhishek Paliwal

--
Regards
Abhishek Paliwal
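[Editorial note: since heal info works off pending-changelog xattrs rather than file contents, one way to confirm the kind of silent divergence described above is to hash the suspect file directly on each brick's backend path and compare. A rough diagnostic sketch in Python; the brick paths are whatever your volume uses, and this is not part of Gluster itself:]

```python
import hashlib

def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def bricks_in_sync(replica_paths):
    """True if every replica copy of the file has identical contents,
    regardless of what the changelog xattrs claim."""
    digests = {file_digest(p) for p in replica_paths}
    return len(digests) == 1
```

Running this against the same file under each brick's export directory would show a content mismatch even when `gluster volume heal <vol> info` reports zero entries.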
Re: [Gluster-devel] Need help with bitrot
Thank you sir. I will read these documents.

- Ajil

On Thu, Feb 25, 2016 at 9:05 PM, FNU Raghavendra Manjunath <
rab...@redhat.com> wrote:
> As of now, signing happens only upon data modification. Metadata changes
> and xattr changes do not trigger signing.
>
> For more information about gluster and its internals, you can look here:
> https://gluster.readthedocs.org/en/latest/
>
> Regards,
> Raghavendra
>
> On Thu, Feb 25, 2016 at 10:28 AM, Ajil Abraham wrote:
>
>> Thanks FNU Raghavendra. Does the signing happen only when the file data
>> changes, or even when an extended attribute changes?
>>
>> I am also trying to understand the Gluster internal data structures. Are
>> there any materials for the same? Similarly for the translators: the way
>> they are stacked on the client & server side, and how control flows
>> between them. Can somebody please help?
>>
>> - Ajil
>>
>> On Thu, Feb 25, 2016 at 7:27 AM, FNU Raghavendra Manjunath <
>> rab...@redhat.com> wrote:
>>
>>> Hi Ajil,
>>>
>>> The expiry policy tells the signer (the bitrot daemon) to wait for a
>>> specific period of time before signing an object.
>>>
>>> Whenever an object is modified, a notification is sent to the signer by
>>> the brick process (the bit-rot-stub xlator sitting in the I/O path) upon
>>> getting a release (i.e. when all the fds of that object are closed). The
>>> expiry policy tells the signer to wait for some time (by default 120
>>> seconds) before signing that object. This is done because, if the signer
>>> starts signing an object (i.e. read the object + calculate the checksum +
>>> store the checksum) and the object gets modified again, then a new
>>> notification has to be sent and the signer has to sign the object again
>>> by recalculating the checksum. Whereas if the signer waits for some time
>>> and receives a new notification on the same object while it is waiting,
>>> it can skip signing for the first notification.
>>>
>>> Venky, do you want to add anything more?
>>>
>>> Regards,
>>> Raghavendra
>>>
>>> On Wed, Feb 24, 2016 at 12:28 AM, Ajil Abraham wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a student interested in GlusterFS. I am trying to understand the
>>>> design of GlusterFS and came across the bitrot design document on
>>>> Google. There is a mention of an expiry policy used to sign the files.
>>>> I did not clearly understand what the expiry policy is. Can somebody
>>>> please help?
>>>>
>>>> -Ajil
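[Editorial note: the waiting behaviour described above is essentially a per-object debounce: each release notification (re)arms a timer, and signing runs only when no new notification arrives within the expiry window. A toy sketch of that idea in Python; this is not the actual bitd code, and `sign_fn` stands in for the read + checksum + store step:]

```python
import threading

class Signer:
    """Debounced signing sketch: wait `expiry` seconds after the latest
    release notification before signing, so rapid rewrites are coalesced
    into a single signing pass."""
    def __init__(self, sign_fn, expiry=120):
        self.sign_fn = sign_fn          # read object + checksum + store
        self.expiry = expiry
        self._timers = {}
        self._lock = threading.Lock()

    def notify_release(self, obj):
        with self._lock:
            old = self._timers.pop(obj, None)
            if old is not None:
                old.cancel()            # a newer notification supersedes it
            t = threading.Timer(self.expiry, self._sign, args=(obj,))
            self._timers[obj] = t
            t.start()

    def _sign(self, obj):
        with self._lock:
            self._timers.pop(obj, None)
        self.sign_fn(obj)
```

With this shape, ten releases of the same object within the expiry window cost one signing pass instead of ten.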
Re: [Gluster-devel] Need help with bitrot
On Fri, Feb 26, 2016 at 7:05 AM, Joseph Fernandes wrote:
> Hope this helps.
>
> Courtesy: Raghavendra Talur (rta...@redhat.com)
>
> 1. Clone the glusterfs repo to your laptop and get acquainted with the dev
> workflow.
> https://gluster.readthedocs.org/en/latest/Developer-guide/Developers-Index/
>
> 2. If you find using your laptop as the test machine for Gluster too scary,
> here is a Vagrant-based mechanism to easily set up VMs on your laptop for
> Gluster testing.
> http://comments.gmane.org/gmane.comp.file-systems.gluster.devel/13494
>
> 3. Find my Gluster introduction blog post here in the preview link:
> https://6227134958232800133_bafac39c28bee4f256bbbef7510c9bb9b44fca05.blogspot.com/b/post-preview?token=s6_4MVIBAAA.zY--3ij00CkDwnitBOwnFBowEvCsKZ0o4ToQ0KYk9Po4pKujPj9ugmn-fm-XUFdLQxU50FmnCxBBr_IkSzuSlA.l_XFe1UvIEAiqkFAZZPdqQ&postId=4168074834715190149&type=POST

This is the public link:
http://blog.raghavendratalur.in/2016/02/gluster-developer-guide-part-1.html

> 4. Follow all the lessons in the Translator 101 series to build on your
> understanding of Gluster.
> http://pl.atyp.us/hekafs.org/index.php/2011/11/translator-101-class-1-setting-the-stage/
> http://hekafs.org/index.php/2011/11/translator-101-lesson-2-init-fini-and-private-context
> http://hekafs.org/index.php/2011/11/translator-101-lesson-3-this-time-for-real
> http://hekafs.org/index.php/2011/11/translator-101-lesson-4-debugging-a-translator
>
> 5. Try to fix or understand any of the bugs in this list:
> https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Community&f1=keywords&list_id=4424622&o1=substring&product=GlusterFS&query_format=advanced&v1=easyfix
>
> Regards,
> Joe
>
> - Original Message -
> From: "Ajil Abraham"
> To: "FNU Raghavendra Manjunath"
> Cc: "Gluster Devel"
> Sent: Thursday, February 25, 2016 8:58:35 PM
> Subject: Re: [Gluster-devel] Need help with bitrot
>
> Thanks FNU Raghavendra.
> Does the signing happen only when the file data changes, or even when an
> extended attribute changes?
>
> I am also trying to understand the Gluster internal data structures. Are
> there any materials for the same? Similarly for the translators: the way
> they are stacked on the client & server side, and how control flows between
> them. Can somebody please help?
>
> - Ajil
>
> On Thu, Feb 25, 2016 at 7:27 AM, FNU Raghavendra Manjunath <
> rab...@redhat.com > wrote:
>
> Hi Ajil,
>
> The expiry policy tells the signer (the bitrot daemon) to wait for a
> specific period of time before signing an object.
>
> Whenever an object is modified, a notification is sent to the signer by the
> brick process (the bit-rot-stub xlator sitting in the I/O path) upon getting
> a release (i.e. when all the fds of that object are closed). The expiry
> policy tells the signer to wait for some time (by default 120 seconds)
> before signing that object. This is done because, if the signer starts
> signing an object (i.e. read the object + calculate the checksum + store the
> checksum) and the object gets modified again, then a new notification has to
> be sent and the signer has to sign the object again by recalculating the
> checksum. Whereas if the signer waits for some time and receives a new
> notification on the same object while it is waiting, it can skip signing
> for the first notification.
>
> Venky, do you want to add anything more?
>
> Regards,
> Raghavendra
>
> On Wed, Feb 24, 2016 at 12:28 AM, Ajil Abraham < ajil95.abra...@gmail.com
> > wrote:
>
> Hi,
>
> I am a student interested in GlusterFS. I am trying to understand the design
> of GlusterFS and came across the bitrot design document on Google. There is
> a mention of an expiry policy used to sign the files. I did not clearly
> understand what the expiry policy is. Can somebody please help?
>
> -Ajil
Re: [Gluster-devel] Bitrot/Tiering: Bad files get migrated and hence corruption goes undetected.
On Fri, Feb 26, 2016 at 09:32:46AM -0500, Joseph Fernandes wrote:
> Hi All,
>
> This is a discussion mail on the following issue,
>
> 1. Object is corrupted before it could be signed: In this case, the corrupted
>    object is signed and gets migrated upon I/O. There's no way to identify
>    corruption for this set of objects.
>
> 2. Object is signed (but not scrubbed) and corruption happens thereafter:
>    In this case, as of now, integrity checking is not done on the fly
>    and the object would get migrated (and signed again in the hot tier).
>
> (1) is definitely not an issue of bitrot with tiering. But for (2) we can do
> something to avoid the corrupted file from getting migrated. Before we
> migrate files we can scrub them, but that's just a naive thought; any better
> suggestions?

Is there a reason the existing signature can not be migrated? Why does it
become invalid?

Niels
[Gluster-devel] Bitrot/Tiering: Bad files get migrated and hence corruption goes undetected.
Hi All,

This is a discussion mail on the following issue:

1. Object is corrupted before it could be signed: In this case, the corrupted
   object is signed and gets migrated upon I/O. There's no way to identify
   corruption for this set of objects.

2. Object is signed (but not scrubbed) and corruption happens thereafter:
   In this case, as of now, integrity checking is not done on the fly and the
   object would get migrated (and signed again in the hot tier).

(1) is definitely not an issue of bitrot with tiering. But for (2) we can do
something to avoid the corrupted file from getting migrated. Before we migrate
files we can scrub them, but that's just a naive thought; any better
suggestions?

Regards,
Joe
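[Editorial note: to make the scrub-before-migrate idea above concrete, here is a rough Python sketch of the check: recompute the object's checksum and compare it against the stored signature before allowing migration. Illustrative only; in Gluster the real signature lives in an xattr maintained by bit-rot-stub, and the function names here are made up:]

```python
import hashlib

def data_checksum(path, chunk=1 << 20):
    """Fresh SHA-256 of the object's data, the 'scrub' step."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def migrate_if_clean(path, stored_signature, migrate_fn):
    """Scrub-before-migrate sketch: refuse to move an object whose data no
    longer matches the signature recorded at signing time (case 2 above)."""
    if data_checksum(path) != stored_signature:
        raise IOError("bad file, refusing to migrate: %s" % path)
    migrate_fn(path)
```

Note this only covers case (2); in case (1) the signature was computed over already-corrupted data, so the comparison would still pass.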
Re: [Gluster-devel] [Gluster-infra] r.g.o not responding or too slow
On 02/26/2016 06:10 PM, Niels de Vos wrote:
> On Fri, Feb 26, 2016 at 05:40:00PM +0530, Atin Mukherjee wrote:
>> As $subj
>
> Please check the infra list archive before sending reports like this :)

I'd say that the other person who sent it should have copied gluster-devel
on the mail too :)

> http://news.gmane.org/gmane.comp.file-systems.gluster.infra already has
> an email about it. Not sure if any of the admins with access to the
> Gerrit server are available; none of them seem to be on IRC in
> #gluster-dev.
>
> Niels
Re: [Gluster-devel] [Gluster-infra] r.g.o not responding or too slow
On Fri, Feb 26, 2016 at 05:40:00PM +0530, Atin Mukherjee wrote:
> As $subj

Please check the infra list archive before sending reports like this :)

http://news.gmane.org/gmane.comp.file-systems.gluster.infra already has an
email about it. Not sure if any of the admins with access to the Gerrit
server are available; none of them seem to be on IRC in #gluster-dev.

Niels
[Gluster-devel] r.g.o not responding or too slow
As $subj
[Gluster-devel] WORM/Retention Feature - 26/02/2016
Hi all,

The current status of the project is:

- It works as a file-level WORM.
- It handles the setattr call if all the write bits are removed.
- It sets an xattr storing the WORM/Retention state along with the retention
  period.
- The atime of the file will point to the time till which the file is
  retained.
- When a write/unlink/rename/truncate request comes for a WORM/Retained
  file, it returns an EROFS error.
- Whenever a fop request comes for a file, it will do a lookup.
- Lookup will do the state transition if the retention period has expired:
  - It will reset the state from WORM/Retained to WORM.
  - The atime of the file will also revert back to the actual atime.
  - The file will still be read-only and will block write, truncate, and
    rename requests.
  - The unlink call will succeed for a WORM file.
- You can transition back to the WORM/Retained state by again doing a
  setattr.

Plans for next week:

- As per Niels' suggestion, preparing a specs document.
- Fixing the bugs in the program.
- Working on handling the ctime change.

You can find the feature page at:
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive

Patch: http://review.gluster.org/#/c/13429/

Your valuable suggestions, wish lists, and reviews are most welcome.

Regards,
Karthik Subrahmanya
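[Editorial note: the lookup-triggered transition described in the status update can be sketched as a small state machine. Illustrative Python only, not the actual xlator code; `retain_until` stands in for the retention timestamp kept in atime:]

```python
import time
from enum import Enum

class State(Enum):
    WORM = 1            # read-only, unlink allowed
    WORM_RETAINED = 2   # read-only, unlink also blocked

class WormFile:
    """Sketch of the lookup-time WORM/Retention state transition."""
    def __init__(self, retain_until):
        self.state = State.WORM_RETAINED
        self.retain_until = retain_until

    def on_lookup(self, now=None):
        now = time.time() if now is None else now
        if self.state is State.WORM_RETAINED and now >= self.retain_until:
            self.state = State.WORM   # retention expired; revert to plain WORM

    def allows(self, fop, now=None):
        self.on_lookup(now)           # every fop triggers the lookup check
        if fop in ("write", "truncate", "rename"):
            return False              # EROFS in both states
        if fop == "unlink":
            return self.state is State.WORM
        return True                   # reads always succeed
```

This mirrors the behaviour described above: after expiry the file stays read-only for write/truncate/rename, but unlink starts succeeding.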
Re: [Gluster-devel] Throttling xlator on the bricks
Hey Shreyas,

I'll be starting on the TBF-based implementation next week, as this needs to
be completed by 3.8. If you can send your patch, I'll see if we can leverage
it too.

Thanks,
Ravi

On 02/13/2016 09:06 AM, Pranith Kumar Karampuri wrote:
> On 02/13/2016 12:13 AM, Richard Wareing wrote:
>> Hey Ravi,
>>
>> I'll ping Shreyas about this today. There's also a patch we'll need for
>> multi-threaded SHD to fix the least-pri queuing. The PID of the process
>> wasn't tagged correctly via the call frame in my original patch. The patch
>> below fixes this (for 3.6.3); I didn't see multi-threaded self-heal on
>> github/master yet, so let me know what branch you need this patch on and I
>> can come up with a clean patch.
>
> Hi Richard,
>
> I reviewed the patch and found that the same needs to be done even for ec.
> So I am thinking of splitting it out into two different patches: one patch
> in syncop-utils which builds the parallelization functionality, and another
> patch which uses this in afr and ec. Do you mind if I give it a go? I can
> complete it by the end of Wednesday.
Pranith

Richard
=============================================================

diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
index 028010d..b0f6248 100644
--- a/xlators/cluster/afr/src/afr-self-heald.c
+++ b/xlators/cluster/afr/src/afr-self-heald.c
@@ -532,6 +532,9 @@ afr_mt_process_entries_done (int ret, call_frame_t *sync_frame,
         pthread_cond_signal (&mt_data->task_done);
     }
     pthread_mutex_unlock (&mt_data->lock);
+
+    if (task_ctx->frame)
+        AFR_STACK_DESTROY (task_ctx->frame);
     GF_FREE (task_ctx);
     return 0;
 }
@@ -787,6 +790,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
     int ret = -1;
     afr_mt_process_entries_task_ctx_t *task_ctx;
     afr_mt_data_t *mt_data;
+    call_frame_t *frame = NULL;

     mt_data = &healer->mt_data;
@@ -799,6 +803,8 @@ _afr_mt_create_process_entries_task (xlator_t *this,
     if (!task_ctx)
         goto err;

+    task_ctx->frame = afr_frame_create (this);
+
     INIT_LIST_HEAD (&task_ctx->list);
     task_ctx->readdir_xl = this;
     task_ctx->healer = healer;
@@ -812,7 +818,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
     // This returns immediately, and afr_mt_process_entries_done will
     // be called when the task is completed, e.g. our queue is empty
     ret = synctask_new (this->ctx->env, afr_mt_process_entries_task,
-                        afr_mt_process_entries_done, NULL,
+                        afr_mt_process_entries_done, task_ctx->frame,
                         (void *)task_ctx);
     if (!ret) {

diff --git a/xlators/cluster/afr/src/afr-self-heald.h b/xlators/cluster/afr/src/afr-self-heald.h
index 817e712..1588fc8 100644
--- a/xlators/cluster/afr/src/afr-self-heald.h
+++ b/xlators/cluster/afr/src/afr-self-heald.h
@@ -74,6 +74,7 @@ typedef struct afr_mt_process_entries_task_ctx_ {
     subvol_healer_t *healer;
     xlator_t        *readdir_xl;
     inode_t         *idx_inode; /* inode ref for xattrop dir */
+    call_frame_t    *frame;
     unsigned int     entries_healed;
     unsigned int     entries_processed;
     unsigned int     already_healed;

Richard

________________________________________
From: Ravishankar N [ravishan...@redhat.com]
Sent: Sunday, February 07, 2016 11:15 PM
To: Shreyas Siravara
Cc: Richard Wareing; Vijay Bellur; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

Hello,

On 01/29/2016 06:51 AM, Shreyas Siravara wrote:
> So the way our throttling works is (intentionally) very simplistic.
>
> (1) When someone mounts an NFS share, we tag the frame with a 32-bit hash
> of the export name they were authorized to mount.
>
> (2) io-stats keeps track of the "current rate" of fops we're seeing for
> that particular mount, using a sampling of fops and a moving average over
> a short period of time.
>
> (3) Based on whether the share violated its allowed rate (which is defined
> in a config file), we tag the FOP as "least-pri". Of course this makes the
> assumption that all NFS endpoints are receiving roughly the same # of
> FOPs. The rate defined in the config file is a *per* NFS endpoint number.
> So if your cluster has 10 NFS endpoints, and you've pre-computed that it
> can do roughly 1000 FOPs per second, the rate in the config file would be
> 100.
>
> (4) IO-Threads then shoves the FOP into the least-pri queue, rather than
> its default. The value is honored all the way down to the bricks.
> The code is actually complete, and I'll put it up for review after we iron
> out a few minor issues.

Did you get a chance to send the patch? Just wanted to run some tests and see
if this is all we need at the moment to regulate shd traffic, especially with
Richa
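[Editorial note: for context on the TBF approach mentioned at the top of the thread, a token-bucket filter in its simplest form keeps a bucket of tokens refilled at the configured rate; a FOP that finds no token available would be tagged least-pri instead of being serviced at normal priority. A toy sketch; illustrative only, not the patch under discussion:]

```python
import time

class TokenBucket:
    """Minimal token-bucket filter (TBF) sketch: admit up to `rate` FOPs per
    second, with bursts of at most `burst`; a caller exceeding the rate would
    be demoted to the least-pri queue rather than dropped."""
    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def admit(self, now=None):
        now = time.monotonic() if now is None else now
        # refill in proportion to elapsed time, capped at the burst size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True    # within the allowed rate: normal priority
        return False       # rate exceeded: tag the FOP least-pri instead
```

Compared with the moving-average scheme Shreyas describes, a token bucket enforces the same long-term rate but also bounds bursts explicitly via the bucket size.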