Re: [Gluster-devel] Pull Request review workflow
I think it's a very good suggestion; I have faced this issue too. I think we should do it now, before we get used to the current process :)

--- Ashish

----- Original Message -----
From: "Xavi Hernandez"
To: "gluster-devel"
Sent: Thursday, October 15, 2020 6:16:06 PM
Subject: Re: [Gluster-devel] Pull Request review workflow

If everyone agrees, I'll prepare a PR with the changes in rfc.sh and documentation to implement this change.

Xavi

On Thu, Oct 15, 2020 at 1:27 PM Ravishankar N <ravishan...@redhat.com> wrote:

On 15/10/20 4:36 pm, Sheetal Pamecha wrote:

+1. Just a note to the maintainers who are merging PRs: have patience and check the commit message when there is more than one commit in a PR.

Makes sense. Another thing to consider is that the rfc.sh script always does a rebase before pushing changes. This rewrites history and changes all commits of a PR. I think we shouldn't do a rebase in rfc.sh. Only if there are conflicts would I do a manual rebase and push the changes.

I think we would also need to rebase if, say, some .t failure was fixed and we need to submit the PR on top of that, unless "run regression" always applies your PR on the latest HEAD of the concerned branch before triggering the regression.

True, actually. Since the migration to GitHub I have not been using ./rfc.sh, and for me it's easier and cleaner.
Me as well :)
-Ravi

___
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
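As a minimal sketch of the manual flow discussed above (update a PR branch by amending or adding commits, without the automatic rebase that rfc.sh performs), the following uses a throwaway local repo so the steps can be tried safely; file names and commit messages are made-up examples:

```shell
# Throwaway repo standing in for a PR branch. In a real PR you would finish
# with `git push -f origin <branch>` -- only your own branch history changes,
# and no rebase onto the latest HEAD is needed unless there are conflicts.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Dev"
echo v1 > file.c
git add file.c
git commit -qm "cluster/ec: fix heal count"
# address review comments by amending (or simply add a new commit instead)
echo v2 > file.c
git add file.c
git commit -q --amend -m "cluster/ec: fix heal count (addressed review comments)"
count=$(git rev-list --count HEAD)
echo "commits on branch: $count"
```

Amending keeps the PR as a single reviewable commit; adding follow-up commits instead preserves review history, which is why checking the commit messages before merging matters.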
Re: [Gluster-devel] Removing problematic language in geo-replication
1. Can I replace master:slave with primary:secondary everywhere in the code and the CLI? Are there any suggestions for more appropriate terminology?
>> Another option could be leader:follower.

2. Is it okay to target the changes to a major release (release-9) and *not* provide backward compatibility for the CLI?
>> I hope so, as it is not impacting functionality.

Ashish

----- Original Message -----
From: "Ravishankar N"
To: "Gluster Devel"
Sent: Wednesday, July 22, 2020 2:34:01 PM
Subject: [Gluster-devel] Removing problematic language in geo-replication

Hi,

The gluster code base has some words and terminology (blacklist, whitelist, master, slave etc.) that can be considered hurtful/offensive to people in a global open source setting. Some of the words can be fixed trivially, but the geo-replication code seems to need extensive rework, more so because these words are used in the CLI itself. Two questions that I had were:

1. Can I replace master:slave with primary:secondary everywhere in the code and the CLI? Are there any suggestions for more appropriate terminology?
2. Is it okay to target the changes to a major release (release-9) and *not* provide backward compatibility for the CLI?

Thanks,
Ravi
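A bulk rename of the kind proposed could be scripted, though real changes would need case-by-case review (option names, on-wire keys, docs, CLI compatibility). The sketch below is illustrative only and operates on a sample string rather than the actual code base:

```shell
# Illustrative only: mechanical master/slave -> primary/secondary rewrite.
# A real pass over the tree (e.g. with grep -rl + sed -i) would need manual
# review of every hit before committing.
sample="geo-replication master volume and slave volume"
renamed=$(printf '%s\n' "$sample" | sed -e 's/master/primary/g' -e 's/slave/secondary/g')
echo "$renamed"
```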
Re: [Gluster-devel] [Gluster-users] "Transport endpoint is not connected" error + long list of files to be healed
Hi Mauro,

Yes, it will take time to heal these files; the time depends on the number of files/dirs created and the amount of data written while the bricks were down. You can just run the following command and keep observing whether the count is changing or not:

gluster volume heal tier2 info | grep entries

--- Ashish

----- Original Message -----
From: "Mauro Tridici"
To: "Gluster Devel"
Cc: "Gluster-users"
Sent: Wednesday, November 13, 2019 7:00:37 PM
Subject: [Gluster-users] "Transport endpoint is not connected" error + long list of files to be healed

Dear All,

our GlusterFS filesystem was showing some problems during simple user actions (for example, during directory or file creation):

mkdir -p test
mkdir: cannot create directory `test': Transport endpoint is not connected

After receiving some user notifications, I investigated the issue and detected that 3 bricks (each one on a separate gluster server) were down. So I forced the bricks up using "gluster vol start tier2 force" and the bricks came back successfully; all the bricks are up. Anyway, I saw from the "gluster vol status" command output that 2 self-heal daemons were also down, and I had to restart the daemons to fix the problem.

Now everything seems to be ok in the output of "gluster vol status" and I can create a test directory on the file system. But during the last check, made using "gluster volume heal tier2 info", I saw a long list of files and directories that need to be healed. The list is very long and the command output is still scrolling in my terminal.

What can I do to fix this issue? Does the self-heal feature automatically fix each file that needs to be healed? Could you please help me understand what I need to do in this case?
You can find below some information about our GlusterFS configuration:

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp

Thank you in advance.
Regards,
Mauro

___
Community Meeting Calendar:
APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314
NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314
Gluster-users mailing list
gluster-us...@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
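The per-brick counts Ashish suggests watching can be summed into one number. The sketch below parses a sample capture of heal-info output (so it runs anywhere); in real use the sample would be replaced by the live output of `gluster volume heal tier2 info`, assuming the usual "Number of entries:" lines:

```shell
# Sum "Number of entries:" across bricks from heal-info output.
# The here-string below is a made-up stand-in for real command output.
sample='Brick s01:/gluster/brick1
Number of entries: 120
Brick s02:/gluster/brick2
Number of entries: 35'
total=$(printf '%s\n' "$sample" | awk '/Number of entries:/ {s += $NF} END {print s}')
echo "pending heal entries: $total"
```

Re-running this periodically and watching the total fall toward zero answers the "is self-heal making progress?" question without scrolling through the full file list.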
[Gluster-devel] Gluster Community Meeting : 2019-07-09
Hi All,

Today we had the Gluster Community Meeting; the minutes of the meeting can be found at the following link:
https://github.com/gluster/community/blob/master/meetings/2019-07-09-Community_meeting.md

--- Ashish

___
Community Meeting Calendar:
APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017
NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Gluster Community Meeting (APAC friendly hours)
[Calendar invite: Gluster Community Meeting (APAC friendly hours)]

Organizer: "Ashish Pandey" <aspan...@redhat.com>
Invitees: gluster-us...@gluster.org; gluster-devel@gluster.org
Time: Tuesday, July 9, 2019, 11:30 AM - 12:30 PM GMT+05:30 (Chennai, Kolkata, Mumbai, New Delhi)
Location/Bridge: https://bluejeans.com/836554017
Minutes: https://hackmd.io/Keo9lk_yRMK24QTEo7qr7g
Previous meeting notes: https://github.com/gluster/community/meetings
Flash talk: Amar would like to talk about glusterfs 8.0 and its roadmap.
Re: [Gluster-devel] Should we enable features.locks-notify.contention by default ?
----- Original Message -----
From: "Xavi Hernandez"
To: "Ashish Pandey"
Cc: "Amar Tumballi Suryanarayan", "gluster-devel"
Sent: Thursday, May 30, 2019 2:03:54 PM
Subject: Re: [Gluster-devel] Should we enable features.locks-notify.contention by default ?

On Thu, May 30, 2019 at 9:03 AM Ashish Pandey <aspan...@redhat.com> wrote:

I am only concerned about in-service upgrades. If a feature/option is not present in V1, then I would prefer not to enable it by default in V2.

The problem is that without enabling it, (other-)eager-lock will cause performance issues in some cases. It doesn't seem good to keep an option disabled if enabling it solves these problems.

We have seen some problems with other-eager-lock when we changed it to enabled by default.

Which problems? I think the only issue with other-eager-lock has been precisely that locks-notify-contention was disabled, plus a bug that needed to be solved anyway.

I was talking about the issue when we have other-eager-lock disabled and then try to do an in-service upgrade to a version where this option is ON by default. Although we don't have a root cause for that, I was wondering if a similar issue could happen in this case also.

The difference will be that upgraded bricks will start sending upcall notifications. If clients are too old, these will simply be ignored. So I don't see any problem right now. Am I missing something?

--- Ashish

From: "Amar Tumballi Suryanarayan" <atumb...@redhat.com>
To: "Xavi Hernandez" <xhernan...@redhat.com>
Cc: "gluster-devel" <gluster-devel@gluster.org>
Sent: Thursday, May 30, 2019 12:04:43 PM
Subject: Re: [Gluster-devel] Should we enable features.locks-notify.contention by default ?

On Thu, May 30, 2019 at 11:34 AM Xavi Hernandez <xhernan...@redhat.com> wrote:

Hi all,

a patch [1] was added some time ago to send upcall notifications from the locks xlator to the current owner of a granted lock when another client tries to acquire the same lock (inodelk or entrylk).
This makes it possible to use eager-locking on the client side, which improves performance significantly, while also keeping good performance when multiple clients access the same files (the current owner of the lock receives the notification and releases the lock as soon as possible, allowing the other client to acquire it and proceed very soon).

Currently both AFR and EC are ready to handle these contention notifications, and both use eager-locking. However, the upcall contention notification is disabled by default. I think we should enable it by default. Does anyone see any possible issue if we do that?

If it helps performance, we should ideally do it. But, considering we are days away from glusterfs-7.0 branching, should we do it now, or wait for the branch out and make it the default for the next version (so that it gets time for testing)? Considering it is about consistency, I would like to hear everyone's opinion here.

Regards,
Amar

Regards,
Xavi

[1] https://review.gluster.org/c/glusterfs/+/14736

--
Amar Tumballi (amarts)
[Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list
Hi,

When we send a mail to the gluster-devel or gluster-users mailing list, the following content gets auto-generated and placed at the end of the mail:

Gluster-users mailing list
gluster-us...@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

In a similar way, is it possible to attach the meeting schedule and bridge link at the end of every such mail? Like this:

Meeting schedule -
* APAC friendly hours
  * Tuesday 14th May 2019, 11:30 AM IST
  * Bridge: https://bluejeans.com/836554017
* NA/EMEA
  * Tuesday 7th May 2019, 01:00 PM EDT
  * Bridge: https://bluejeans.com/486278655

Or just a link to the meeting minutes details?
https://github.com/gluster/community/tree/master/meetings

This will help developers and users of the community know when and where the meetings happen and how to attend them.

--- Ashish
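If the lists run on Mailman 2, the request above could presumably be met by extending the list's msg_footer (under Non-digest options in the list admin UI). This is a hypothetical sketch of such a footer, not the actual list configuration; the %(...)s placeholders are Mailman's standard substitution variables:

```
Community Meeting Calendar:
APAC friendly hours - Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017
NA/EMEA - Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655
Meeting minutes: https://github.com/gluster/community/tree/master/meetings
_______________________________________________
%(real_name)s mailing list
%(real_name)s@%(host_name)s
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
```

Since the footer is per-list, the same block would have to be added to both gluster-devel and gluster-users.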
Re: [Gluster-devel] Should we enable contention notification by default ?
Xavi,

I would like to keep this option (features.lock-notify-contention) enabled by default. However, I can see that there is one more option which impacts its behavior, "notify-contention-delay":

.description = "This value determines the minimum amount of time "
               "(in seconds) between upcall contention notifications "
               "on the same inode. If multiple lock requests are "
               "received during this period, only one upcall will "
               "be sent."

I am not sure what the best value for this option would be if we keep features.lock-notify-contention ON by default. It looks like if we set notify-contention-delay high, say 5 seconds, it will wait that long before sending the upcall notification, which does not look good. Is my understanding correct? What will be the impact of this value, and what should its default be?

--- Ashish

----- Original Message -----
From: "Xavi Hernandez"
To: "gluster-devel"
Cc: "Pranith Kumar Karampuri", "Ashish Pandey", "Amar Tumballi"
Sent: Thursday, May 2, 2019 4:15:38 PM
Subject: Should we enable contention notification by default ?

Hi all,

there's a feature in the locks xlator that sends a notification to the current owner of a lock when another client tries to acquire the same lock. This way the current owner is made aware of the contention and can release the lock as soon as possible to allow the other client to proceed. This is especially useful when eager-locking is used and multiple clients access the same files and directories.

Currently both replicated and dispersed volumes use eager-locking and can use contention notification to force an early release of the lock. Eager-locking reduces the number of network requests required for each operation, improving performance, but can add delays to other clients while it keeps the inode or entry locked. With the contention notification feature we avoid this delay, so we get the best performance with minimal issues in multi-client environments.
Currently the contention notification feature is controlled by the 'features.lock-notify-contention' option and it's disabled by default. Should we enable it by default? I don't see any reason to keep it disabled. Does anyone foresee any problem?

Regards,
Xavi
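For reference, enabling the feature on a volume would presumably look like the commands below. The first option name is taken from the thread; the CLI name of the delay option is an assumption based on the "notify-contention-delay" description quoted above, and "myvol" is a placeholder:

```
# placeholder volume name; run on any node of the trusted pool
gluster volume set myvol features.lock-notify-contention on
# assumed CLI name -- minimum seconds between upcalls on the same inode
gluster volume set myvol features.lock-notify-contention-delay 5
```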
Re: [Gluster-devel] [Gluster-users] Gluster : Improvements on "heal info" command
No, it is not necessary that the first brick would be the local one. I really don't think starting from the local node will make a difference. The major time is not spent in getting the list of entries from the .glusterfs/indices/xattrop folder; the LOCK->CHECK-XATTR->UNLOCK cycle is what takes most of the time, and that is not going to change even if it starts from the local brick.

--- Ashish

----- Original Message -----
From: "Strahil"
To: "Ashish", "Gluster", "Gluster"
Sent: Wednesday, March 6, 2019 10:21:26 PM
Subject: Re: [Gluster-users] Gluster : Improvements on "heal info" command

Hi,

This sounds nice. I would like to ask if the order starts from the local node's bricks first? (I am talking about --brick=one)

Best Regards,
Strahil Nikolov

On Mar 5, 2019 10:51, Ashish Pandey wrote:

Hi All,

We have observed, and heard from gluster users, that the "heal info" command takes a long time. Even when all we want to know is whether a gluster volume is healthy or not, it takes time to list all the files from all the bricks, after which we can be sure if the volume is healthy or not. Here, we have come up with some options for the "heal info" command which provide a report quickly and reliably:

gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]

Problem:
The "gluster v heal info" command picks each subvolume and checks the .glusterfs/indices/xattrop folder of every brick of that subvolume to find out if there is any entry which needs to be healed. It picks each entry and takes a lock on it to check xattrs to find out if that entry actually needs heal or not. This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file.

Let's consider the two most often seen cases for which we use "heal info" and try to understand the improvements.

Case 1: Consider a 4+2 EC volume with all the bricks on 6 different nodes. A brick of the volume is down and a client has written 10K files on one of the mount points of this volume. Entries for these 10K files will be created in ".glusterfs/indices/xattrop" on the remaining 5 bricks. Now the brick is UP, and when we use the "heal info" command for this volume, it goes to all the bricks, picks these 10K file entries, and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all of them. This happens for all the bricks; that means we check 50K files and perform the cycle 50K times, while checking only 10K entries would have been sufficient. It is a very time-consuming operation. If IOs are happening on some of the new files, we check those files too, which adds to the time. Here, all we wanted to know was whether our volume has been healed and is healthy.

Solution:
Whenever a brick goes down and comes up and we use the "heal info" command, our *main intention* is to find out if the volume is *healthy* or *unhealthy*. A volume is unhealthy even if one file is not healthy. So we should scan the bricks one by one, and as soon as we find one brick having entries which require heal, we can stop, list the files, and say the volume is not healthy; there is no need to scan the rest of the bricks. That's where the "--brick=[one,all]" option has been introduced:

gluster v heal vol info --brick=[one,all]

"one" - It will scan the bricks sequentially and, as soon as it finds any unhealthy entries, list them and stop scanning other bricks.
"all" - It will act just like the current behavior and list all the files from all the bricks.

If we do not provide this option, the default (current) behavior applies.

Case 2: Consider a 24 x (4+2) EC volume. Let's say one brick from *only one* of the subvolumes has been replaced and a heal has been triggered. To know if the volume is in a healthy state, we go to each brick of *each and every subvolume* and check if there are any entries in the ".glusterfs/indices/xattrop" folder which need heal. If we know which subvolume participated in the brick replacement, we only need to check the health of that subvolume, not query/check the other subvolumes. If several clients are writing files on this volume, an entry for each of these files will be created in .glusterfs/indices/xattrop, and the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle to find out if these entries need heal, which takes a lot of time. In addition, a client will also see a performance drop, as it will have to release and take the lock again.

Solution:
Provide an option to specify the number of the subvolume for which we want to check heal info:

gluster v heal vol info --subvol=<n>

Here, --subvol is given the number of the subvolume we want to check.
Example: "gluster v heal vol info --subvol=1"
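As a rough illustration of the data "heal info" starts from, the sketch below simulates two bricks' index directories and counts the pending entries; brick paths and gfid file names are made up, and the real command additionally runs the expensive LOCK->CHECK-XATTR->UNLOCK cycle on each entry:

```shell
# Simulate .glusterfs/indices/xattrop on two bricks and count the entries
# that heal info would have to inspect. Everything here is a stand-in.
root=$(mktemp -d)
mkdir -p "$root/brick1/.glusterfs/indices/xattrop" \
         "$root/brick2/.glusterfs/indices/xattrop"
# two pending-heal index entries on brick1 only (made-up gfid names)
touch "$root/brick1/.glusterfs/indices/xattrop/gfid-0001" \
      "$root/brick1/.glusterfs/indices/xattrop/gfid-0002"
total=0
for d in "$root"/brick*/.glusterfs/indices/xattrop; do
    n=$(find "$d" -mindepth 1 | wc -l)
    total=$((total + n))
done
echo "entries needing inspection: $total"
```

With "--brick=one" semantics, the loop above could break out as soon as any brick reports a non-zero count, which is exactly the proposed shortcut.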
[Gluster-devel] Gluster : Improvements on "heal info" command
Hi All,

We have observed, and heard from gluster users, that the "heal info" command takes a long time. Even when all we want to know is whether a gluster volume is healthy or not, it takes time to list all the files from all the bricks, after which we can be sure if the volume is healthy or not. Here, we have come up with some options for the "heal info" command which provide a report quickly and reliably:

gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]

Problem:
The "gluster v heal info" command picks each subvolume and checks the .glusterfs/indices/xattrop folder of every brick of that subvolume to find out if there is any entry which needs to be healed. It picks each entry and takes a lock on it to check xattrs to find out if that entry actually needs heal or not. This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file.

Let's consider the two most often seen cases for which we use "heal info" and try to understand the improvements.

Case 1: Consider a 4+2 EC volume with all the bricks on 6 different nodes. A brick of the volume is down and a client has written 10K files on one of the mount points of this volume. Entries for these 10K files will be created in ".glusterfs/indices/xattrop" on the remaining 5 bricks. Now the brick is UP, and when we use the "heal info" command for this volume, it goes to all the bricks, picks these 10K file entries, and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all of them. This happens for all the bricks; that means we check 50K files and perform the cycle 50K times, while checking only 10K entries would have been sufficient. It is a very time-consuming operation. If IOs are happening on some of the new files, we check those files too, which adds to the time. Here, all we wanted to know was whether our volume has been healed and is healthy.

Solution:
Whenever a brick goes down and comes up and we use the "heal info" command, our *main intention* is to find out if the volume is *healthy* or *unhealthy*. A volume is unhealthy even if one file is not healthy. So we should scan the bricks one by one, and as soon as we find one brick having entries which require heal, we can stop, list the files, and say the volume is not healthy; there is no need to scan the rest of the bricks. That's where the "--brick=[one,all]" option has been introduced:

gluster v heal vol info --brick=[one,all]

"one" - It will scan the bricks sequentially and, as soon as it finds any unhealthy entries, list them and stop scanning other bricks.
"all" - It will act just like the current behavior and list all the files from all the bricks.

If we do not provide this option, the default (current) behavior applies.

Case 2: Consider a 24 x (4+2) EC volume. Let's say one brick from *only one* of the subvolumes has been replaced and a heal has been triggered. To know if the volume is in a healthy state, we go to each brick of *each and every subvolume* and check if there are any entries in the ".glusterfs/indices/xattrop" folder which need heal. If we know which subvolume participated in the brick replacement, we only need to check the health of that subvolume, not query/check the other subvolumes. If several clients are writing files on this volume, an entry for each of these files will be created in .glusterfs/indices/xattrop, and the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle to find out if these entries need heal, which takes a lot of time. In addition, a client will also see a performance drop, as it will have to release and take the lock again.

Solution:
Provide an option to specify the number of the subvolume for which we want to check heal info:

gluster v heal vol info --subvol=<n>

Here, --subvol is given the number of the subvolume we want to check.
Example: "gluster v heal vol info --subvol=1"

===
Performance Data - a quick performance test done on a standalone system:

Type: Distributed-Disperse
Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: apandey:/home/apandey/bricks/gluster/vol-1
Brick2: apandey:/home/apandey/bricks/gluster/vol-2
Brick3: apandey:/home/apandey/bricks/gluster/vol-3
Brick4: apandey:/home/apandey/bricks/gluster/vol-4
Brick5: apandey:/home/apandey/bricks/gluster/vol-5
Brick6: apandey:/home/apandey/bricks/gluster/vol-6
Brick7: apandey:/home/apandey/bricks/gluster/new-1
Brick8: apandey:/home/apandey/bricks/gluster/new-2
Brick9: apandey:/home/apandey/bricks/gluster/new-3
Brick10: apandey:/home/apandey/bricks/gluster/new-4
Brick11: apandey:/home/apandey/bricks/gluster/new-5
Brick12: apandey:/home/apandey/bricks/gluster/new-6

Just disabled the shd to get the data. Killed one brick each from the two subvolumes and wrote 2000 files on the mount point.

[root@apandey
Re: [Gluster-devel] Release 6: Kick off!
Following is the patch I am working on and targeting:
https://review.gluster.org/#/c/glusterfs/+/21933/
It is under review and yet to be merged.

-- Ashish

----- Original Message -----
From: "RAFI KC"
To: "Shyam Ranganathan", "GlusterFS Maintainers", "Gluster Devel"
Sent: Wednesday, January 23, 2019 4:22:42 PM
Subject: Re: [Gluster-devel] Release 6: Kick off!

There are three patches that I'm working on for Gluster-6:

[1] : https://review.gluster.org/#/c/glusterfs/+/22075/
[2] : https://review.gluster.org/#/c/glusterfs/+/21333/
[3] : https://review.gluster.org/#/c/glusterfs/+/21720/

Regards
Rafi KC

On 1/19/19 1:51 AM, Shyam Ranganathan wrote:
> On 12/6/18 9:34 AM, Shyam Ranganathan wrote:
>> On 11/6/18 11:34 AM, Shyam Ranganathan wrote:
>>> ## Schedule
>> We have decided to postpone release-6 by a month, to accommodate
>> late enhancements and the drive towards getting what is required for the
>> GCS project [1] done in core glusterfs.
>>
>> This puts the (modified) schedule for Release-6 as below.
>> Working backwards on the schedule, here's what we have:
>> - Announcement: Week of Mar 4th, 2019
>> - GA tagging: Mar-01-2019
>> - RC1: On demand before GA
>> - RC0: Feb-04-2019
>> - Late features cut-off: Week of Jan-21st, 2019
>> - Branching (feature cutoff date): Jan-14-2019
>> (~45 days prior to branching)
> We are slightly past the branching date. I would like to branch early
> next week, so please respond with a list of patches that need to be part
> of the release and are still pending a merge; this will help focus review
> on them and also help track and branch the release.
>
> Thanks, Shyam

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression health for release-5.next and release-6
I downloaded logs of regression runs 1077 and 1073 and tried to investigate it. In both regression ec/bug-1236065.t is hanging on TEST 70 which is trying to get the online brick count I can see that in mount/bricks and glusterd logs it has not move forward after this test. glusterd.log - [2019-01-06 16:27:51.346408]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count ++ [2019-01-06 16:27:51.645014] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) [0x7f4c37fe06c3] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) [0x7f4c37fd9b3a] -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type [Invalid argument] [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.649335] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-06 16:27:51.932871] I [MSGID: 106499] 
[glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy

It is just taking a lot of time to get the status at this point. It looks like there could be some issue with the connection or the handling of volume status when some bricks are down. --- Ashish - Original Message - From: "Mohit Agrawal" To: "Shyam Ranganathan" Cc: "Gluster Devel" Sent: Saturday, January 12, 2019 6:46:20 PM Subject: Re: [Gluster-devel] Regression health for release-5.next and release-6 The previous logs were related to the client, not the bricks; below are the brick logs. [2019-01-12 12:25:25.893485]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++ The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid
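The hang described above shows up because checks like `online_brick_count` in the .t framework are retried until a timeout, and a glusterd that is slow to answer the status request makes every retry block. The following is a minimal, hypothetical sketch of that retry-until-timeout pattern (the names and timings are illustrative; this is not the actual Gluster test harness):

```python
import time

def wait_for(expected, check, timeout=20.0, interval=0.1):
    """Poll check() until it returns `expected` or `timeout` seconds pass.

    Mirrors EXPECT_WITHIN-style retry logic: if every call to check()
    itself blocks (e.g. a slow 'gluster v status'), the test appears to
    hang even though it is merely retrying.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check() == expected:
            return True
        time.sleep(interval)
    return check() == expected  # one final attempt after the deadline

# Illustrative stand-in for online_brick_count: 5 of 6 bricks are up.
print(wait_for(5, lambda: 5, timeout=0.3))  # prints True immediately
print(wait_for(6, lambda: 5, timeout=0.3))  # prints False after the timeout
```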
Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)
Correction. RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html Patch - Mohit is working on this patch (server side) which is yet to be merged. We can put extra test to make sure bricks are connected to shd before heal begin. Will send a patch for that. --- Ashish - Original Message - From: "Ashish Pandey" To: "Shyam Ranganathan" Cc: "GlusterFS Maintainers" , "Gluster Devel" Sent: Monday, August 13, 2018 10:54:16 AM Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t) RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this issue. Checking if we can put extra test to make sure bricks are connected to shd before heal begin. Will send a patch for that. --- Ashish - Original Message - From: "Shyam Ranganathan" To: "Gluster Devel" , "GlusterFS Maintainers" Sent: Monday, August 13, 2018 6:12:59 AM Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests (testname.t) As a means of keeping the focus going and squashing the remaining tests that were failing sporadically, request each test/component owner to, - respond to this mail changing the subject (testname.t) to the test name that they are responding to (adding more than one in case they have the same RCA) - with the current RCA and status of the same List of tests and current owners as per the spreadsheet that we were tracking are: ./tests/basic/distribute/rebal-all-nodes-migrate.t TBD ./tests/basic/tier/tier-heald.t TBD ./tests/basic/afr/sparse-file-self-heal.t TBD ./tests/bugs/shard/bug-1251824.t TBD ./tests/bugs/shard/configure-lru-limit.t TBD ./tests/bugs/replicate/bug-1408712.t Ravi ./tests/basic/afr/replace-brick-self-heal.t TBD ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik ./tests/basic/stats-dump.t TBD ./tests/bugs/bug-1110262.t TBD 
./tests/basic/ec/ec-data-heal.t Mohit ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t TBD ./tests/basic/ec/ec-5-2.t Sunil ./tests/bugs/shard/bug-shard-discard.t TBD ./tests/bugs/glusterd/remove-brick-testcases.t TBD ./tests/bugs/protocol/bug-808400-repl.t TBD ./tests/bugs/quick-read/bug-846240.t Du ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh ./tests/bugs/ec/bug-1236065.t Pranith ./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh ./tests/basic/ec/ec-1468261.t Ashish ./tests/basic/afr/add-brick-self-heal.t Ravi ./tests/basic/afr/granular-esh/replace-brick.t Pranith ./tests/bugs/core/multiplex-limit-issue-151.t Sanju ./tests/bugs/glusterd/validating-server-quorum.t Atin ./tests/bugs/replicate/bug-1363721.t Ravi ./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t Atin ./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD ./tests/bitrot/bug-1373520.t Kotresh ./tests/bugs/distribute/bug-1117851.t Shyam/Nigel ./tests/bugs/glusterd/quorum-validation.t Atin ./tests/bugs/distribute/bug-1042725.t Shyam ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t Karthik ./tests/bugs/quota/bug-1293601.t TBD ./tests/bugs/bug-1368312.t Du ./tests/bugs/distribute/bug-1122443.t Du ./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam Thanks, Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)
RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this issue. Checking if we can put extra test to make sure bricks are connected to shd before heal begin. Will send a patch for that. --- Ashish - Original Message - From: "Shyam Ranganathan" To: "Gluster Devel" , "GlusterFS Maintainers" Sent: Monday, August 13, 2018 6:12:59 AM Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests (testname.t) As a means of keeping the focus going and squashing the remaining tests that were failing sporadically, request each test/component owner to, - respond to this mail changing the subject (testname.t) to the test name that they are responding to (adding more than one in case they have the same RCA) - with the current RCA and status of the same List of tests and current owners as per the spreadsheet that we were tracking are: ./tests/basic/distribute/rebal-all-nodes-migrate.t TBD ./tests/basic/tier/tier-heald.t TBD ./tests/basic/afr/sparse-file-self-heal.t TBD ./tests/bugs/shard/bug-1251824.t TBD ./tests/bugs/shard/configure-lru-limit.t TBD ./tests/bugs/replicate/bug-1408712.t Ravi ./tests/basic/afr/replace-brick-self-heal.t TBD ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik ./tests/basic/stats-dump.t TBD ./tests/bugs/bug-1110262.t TBD ./tests/basic/ec/ec-data-heal.t Mohit ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t TBD ./tests/basic/ec/ec-5-2.t Sunil ./tests/bugs/shard/bug-shard-discard.t TBD ./tests/bugs/glusterd/remove-brick-testcases.t TBD ./tests/bugs/protocol/bug-808400-repl.t TBD ./tests/bugs/quick-read/bug-846240.t Du ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh 
./tests/bugs/ec/bug-1236065.t Pranith ./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh ./tests/basic/ec/ec-1468261.t Ashish ./tests/basic/afr/add-brick-self-heal.t Ravi ./tests/basic/afr/granular-esh/replace-brick.t Pranith ./tests/bugs/core/multiplex-limit-issue-151.t Sanju ./tests/bugs/glusterd/validating-server-quorum.t Atin ./tests/bugs/replicate/bug-1363721.t Ravi ./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t Atin ./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD ./tests/bitrot/bug-1373520.t Kotresh ./tests/bugs/distribute/bug-1117851.t Shyam/Nigel ./tests/bugs/glusterd/quorum-validation.t Atin ./tests/bugs/distribute/bug-1042725.t Shyam ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t Karthik ./tests/bugs/quota/bug-1293601.t TBD ./tests/bugs/bug-1368312.t Du ./tests/bugs/distribute/bug-1122443.t Du ./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam Thanks, Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Master branch lock down status
I think the problem with this failure is the same one Shyam suspected for the other EC failure: connections to the bricks are not being set up after killing bricks and starting the volume using force.

./tests/basic/ec/ec-1468261.t - Failure reported -

23:03:05 ok 34, LINENUM:79
23:03:05 not ok 35 Got "5" instead of "6", LINENUM:80
23:03:05 FAILED COMMAND: 6 ec_child_up_count patchy 0
23:03:05 not ok 36 Got "1298" instead of "^0$", LINENUM:83
23:03:05 FAILED COMMAND: ^0$ get_pending_heal_count patchy
23:03:05 ok 37, LINENUM:86
23:03:05 ok 38, LINENUM:87
23:03:05 not ok 39 Got "3" instead of "4", LINENUM:88

When I look at the glustershd log, I can see that there is an issue while starting the volume by force to restart the killed bricks. The bricks are not getting connected. I am seeing the following logs in glustershd:

==
[2018-08-06 23:05:45.077699] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: key 'trusted.ec.size' is would not be sent on wire in future [Invalid argument]
[2018-08-06 23:05:45.077724] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' is would not be sent on wire in future [Invalid argument]
[2018-08-06 23:05:45.077744] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: key 'trusted.ec.version' is would not be sent on wire in future [Invalid argument]
[2018-08-06 23:05:46.695719] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-patchy-client-1: changing port to 49152 (from 0)
[2018-08-06 23:05:46.699766] W [MSGID: 114043] [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set the volume [Resource temporarily unavailable]
[2018-08-06 23:05:46.699809] W [MSGID: 114007] [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 'process-uuid' from reply dict [Invalid argument]
[2018-08-06 23:05:46.699833] E [MSGID: 114044] [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on remote-host failed: cleanup flag is set for xlator.
Try again later [Resource temporarily unavailable] [2018-08-06 23:05:46.699855] I [MSGID: 114051] [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending CHILD_CONNECTING event [2018-08-06 23:05:46.699920] I [MSGID: 114018] [client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2018-08-06 23:05:50.702806] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-patchy-client-1: changing port to 49152 (from 0) [2018-08-06 23:05:50.706726] W [MSGID: 114043] [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set the volume [Resource temporarily unavailable] [2018-08-06 23:05:50.706783] W [MSGID: 114007] [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 'process-uuid' from reply dict [Invalid argument] [2018-08-06 23:05:50.706808] E [MSGID: 114044] [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on remote-host failed: cleanup flag is set for xlator. Try again later [Resource temporarily unavailable] [2018-08-06 23:05:50.706831] I [MSGID: 114051] [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending CHILD_CONNECTING event [2018-08-06 23:05:50.706904] I [MSGID: 114018] [client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. 
Client process will keep trying to connect to glusterd until brick's port is available [2018-08-06 23:05:54.713490] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-patchy-client-1: changing port to 49152 (from 0) [2018-08-06 23:05:54.717417] W [MSGID: 114043] [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set the volume [Resource temporarily unavailable] [2018-08-06 23:05:54.717483] W [MSGID: 114007] [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 'process-uuid' from reply dict [Invalid argument] [2018-08-06 23:05:54.717508] E [MSGID: 114044] [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on remote-host failed: cleanup flag is set for xlator. Try again later [Resource temporarily unavailable] [2018-08-06 23:05:54.717530] I [MSGID: 114051] [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending CHILD_CONNECTING event [2018-08-06 23:05:54.717605] I [MSGID: 114018] [client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2018-08-06 23:05:58.204494]:++ G_LOG:./tests/basic/ec/ec-1468261.t: TEST: 83 ^0$ get_pending_heal_count patchy ++ There are many more such logs in this duration Time at which test at line no 80 started - [2018-08-06 23:05:38.652297]:++
Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs
It is disappointing to see the limitation Nvidia has put on low-cost GPU usage in data centers. https://www.theregister.co.uk/2018/01/03/nvidia_server_gpus/ We thought of providing an option in glusterfs by which we can control whether we want to use the GPU or not. That way, the concern about gluster consuming GPUs which could be used by others can be addressed. --- Ashish - Original Message - From: "Jim Kinney" To: gluster-us...@gluster.org, "Lindsay Mathieson" , "Darrell Budic" , "Gluster Users" Cc: "Gluster Devel" Sent: Friday, January 12, 2018 6:00:25 PM Subject: Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs On January 11, 2018 10:58:28 PM EST, Lindsay Mathieson wrote: >On 12/01/2018 3:14 AM, Darrell Budic wrote: >> It would also add physical resource requirements to future client >> deploys, requiring more than 1U for the server (most likely), and I’m > >> not likely to want to do this if I’m trying to optimize for client >> density, especially with the cost of GPUs today. > >Nvidia has banned their GPU's being used in Data Centers now too, I >imagine they are planning to add a licensing fee. Nvidia banned only the lower cost, home user versions of their GPU line from datacenters. > >-- >Lindsay Mathieson > >___ >Gluster-users mailing list >gluster-us...@gluster.org >http://lists.gluster.org/mailman/listinfo/gluster-users -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs
I have updated the comment. Thanks!!! --- Ashish - Original Message - From: "Shyam Ranganathan" <srang...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Gluster Devel" <gluster-devel@gluster.org> Sent: Thursday, January 11, 2018 10:12:54 PM Subject: Re: [Gluster-users] Integration of GPU with glusterfs On 01/11/2018 01:12 AM, Ashish Pandey wrote: > There is a github issue opened for this. Please provide your comment or > reply to this mail. > > A - https://github.com/gluster/glusterfs/issues/388 Ashish, the first comment of the github issue is carrying the default message that we populate. It would be more readable if you could copy the text from your mail there instead (it would also look a lot cleaner). Thanks, Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Integration of GPU with glusterfs
Hi, We have been thinking of exploiting GPU capabilities to enhance the performance of glusterfs, and we would like to know others' thoughts on this. In EC, we do CPU-intensive computations to encode and decode data before writing and reading. This takes a lot of CPU cycles, and we have been observing 100% CPU usage on the client side. Data healing has the same impact, as it also needs to do a read-decode-encode-write cycle. As most modern servers come with GPUs, making glusterfs GPU-ready might give us performance improvements. This is not specific to EC volumes; other features also require a lot of computation and could use this capability, for example: 1 - Encryption/decryption 2 - Compression and de-duplication 3 - Hashing 4 - Any other? [Please add if you have something in mind] Before proceeding further, we would like your input: Do you have any other use case (existing or future) which could perform better on a GPU? Do you think it is worth integrating GPUs with glusterfs, or could this performance gain be achieved in some other, better way? Any input on how we should implement it is welcome. There is a github issue opened for this. Please provide your comments there or reply to this mail. A - https://github.com/gluster/glusterfs/issues/388 --- Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression failure : /tests/basic/ec/ec-1468261.t
I don't think it is an issue with the test you mentioned. You may have to re-trigger the test; this is what I did for one of my patches. -- Ashish - Original Message - From: "Nithya Balachandran" <nbala...@redhat.com> To: "Gluster Devel" <gluster-devel@gluster.org>, "Xavi Hernandez" <jaher...@redhat.com>, "Ashish Pandey" <aspan...@redhat.com> Sent: Monday, November 6, 2017 6:35:24 PM Subject: Regression failure : /tests/basic/ec/ec-1468261.t Can someone take a look at this? The run was aborted ( https://build.gluster.org/job/centos6-regression/7232/console ) Thanks, Nithya ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Need inputs on patch #17985
Raghvendra, I have provided my comment on this patch. I think EC will not have any issue with this approach. However, I would welcome comments from Xavi and Pranith too for any side effects which I may not be able to foresee. Ashish - Original Message - From: "Raghavendra Gowdappa" <rgowd...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Xavier Hernandez" <xhernan...@datalab.es>, "Gluster Devel" <gluster-devel@gluster.org> Sent: Wednesday, August 23, 2017 8:29:48 AM Subject: Need inputs on patch #17985 Hi Ashish, Following are the blockers for making a decision on whether patch [1] can be merged or not: * Evaluation of dentry operations (like rename etc) in dht * Whether EC works fine if a non-lookup fop (like open(dir), stat, chmod etc) hits EC without a single lookup performed on file/inode Can you please comment on the patch? I'll take care of dht part. [1] https://review.gluster.org/#/c/17985/ regards, Raghavendra ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] High load on CPU due to glusterfsd process
Hi, The issue you are seeing is a complex one, but the information you have provided is limited. Please share: - Volume info - Volume status - What kind of IO is going on? - Is any brick down? - A snapshot of the top command. - Anything you are seeing in the glustershd, mount, or brick logs. --- Ashish - Original Message - From: "ABHISHEK PALIWAL" To: "gluster-users" , "Gluster Devel" Sent: Wednesday, August 2, 2017 1:49:30 PM Subject: Re: [Gluster-devel] High load on CPU due to glusterfsd process Could you please respond? On Fri, Jul 28, 2017 at 5:55 PM, ABHISHEK PALIWAL < abhishpali...@gmail.com > wrote: Hi Team, Whenever I perform IO operations on a gluster volume, the load on the CPU increases, sometimes reaching 70-80. When we started debugging, we found that an io_worker thread is created to serve the IO request and consumes high CPU until that request completes. Could you please let me know why the io_worker thread takes this much CPU? Is there any way to resolve this? -- Regards Abhishek Paliwal -- Regards Abhishek Paliwal ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Glusto failures with dispersed volumes + Samba
Hi Nigel, As Pranith has already mentioned, we are getting different gfids in loc and loc->inode. It looks like an issue with DHT: if a revalidate fails for a gfid, a fresh lookup should be done. I don't know if it is related or not, but a similar bug was fixed by Pranith: https://review.gluster.org/#/c/16986/ Ashish - Original Message - From: "Pranith Kumar Karampuri" To: "Anoop C S" Cc: "gluster-devel" Sent: Thursday, June 29, 2017 7:36:45 PM Subject: Re: [Gluster-devel] Glusto failures with dispersed volumes + Samba On Thu, Jun 29, 2017 at 6:49 PM, Anoop C S < anoo...@autistici.org > wrote: On Thu, 2017-06-29 at 16:35 +0530, Nigel Babu wrote: > Hi Pranith and Xavi, > > We seem to be running into a problem with glusto tests when we try to run > them against dispersed > volumes over a CIFS mount[1]. Is this a new test case? If not, was it running successfully before? > You can find the logs attached to the job [2]. VFS stat call failures are seen in Samba logs: [2017/06/29 11:01:55.959374, 0] ../source3/modules/vfs_glusterfs.c:870(vfs_gluster_stat) glfs_stat(.) failed: Invalid argument I could also see the following errors (repeatedly) in glusterfs client logs: [2017-06-29 10:33:43.031198] W [MSGID: 122019] [ec-helpers.c:412:ec_loc_gfid_check] 0- testvol_distributed-dispersed-disperse-0: Mismatching GFID's in loc [2017-06-29 10:33:43.031303] I [MSGID: 109094] [dht-common.c:1016:dht_revalidate_cbk] 0- testvol_distributed-dispersed-dht: Revalidate: subvolume testvol_distributed-dispersed-disperse-0 for /user11 (gfid = 665c515b-3940-480f-af7c-6aaf37731eaa) returned -1 [Invalid argument] This log basically says that EC received a loc which has different gfids in loc->inode->gfid and loc->gfid. > I've triggered a fresh job[3] to confirm that it only fails in these > particular conditions and > certainly seems to be the case. The job is currently ongoing, so you may want > to take a look when > you get some time how this job went.
> > Let me know if you have any questions or need more debugging information. > > [1]: https://ci.centos.org/job/gluster_glusto/325/testReport/ > [2]: https://ci.centos.org/job/gluster_glusto/325/artifact/ > [3]: https://ci.centos.org/job/gluster_glusto/326/console > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://lists.gluster.org/mailman/listinfo/gluster-devel -- Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Disperse volume : Sequential Writes
I think it is a good idea. Maybe we can add more enhancements in this xlator to improve things in the future. - Original Message - From: "Pranith Kumar Karampuri" <pkara...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Xavier Hernandez" <xhernan...@datalab.es>, "Gluster Devel" <gluster-devel@gluster.org> Sent: Monday, July 3, 2017 9:05:54 AM Subject: Re: [Gluster-devel] Disperse volume : Sequential Writes Ashish, Xavi, I think it is better to implement this change as a separate read-after-write caching xlator which we can load between EC and the client xlator. That way EC will not get a lot more functionality than necessary, and maybe this xlator can be used somewhere else in the stack if possible. On Fri, Jun 16, 2017 at 4:19 PM, Ashish Pandey < aspan...@redhat.com > wrote: I think it should be done as we have agreement on the basic design. From: "Pranith Kumar Karampuri" < pkara...@redhat.com > To: "Xavier Hernandez" < xhernan...@datalab.es > Cc: "Ashish Pandey" < aspan...@redhat.com >, "Gluster Devel" < gluster-devel@gluster.org > Sent: Friday, June 16, 2017 3:50:09 PM Subject: Re: [Gluster-devel] Disperse volume : Sequential Writes On Fri, Jun 16, 2017 at 3:12 PM, Xavier Hernandez < xhernan...@datalab.es > wrote: On 16/06/17 10:51, Pranith Kumar Karampuri wrote: On Fri, Jun 16, 2017 at 12:02 PM, Xavier Hernandez < xhernan...@datalab.es > wrote: On 15/06/17 11:50, Pranith Kumar Karampuri wrote: On Thu, Jun 15, 2017 at 11:51 AM, Ashish Pandey < aspan...@redhat.com >> wrote: Hi All, We have been facing some issues in disperse (EC) volumes. We know that currently EC is not good for random IO, as it requires a READ-MODIFY-WRITE fop cycle if the offset or offset+length falls in the middle of a stripe. Unfortunately, this can also happen with sequential writes. Consider an EC volume with configuration 4+2. The stripe size for this would be 512 * 4 = 2048. That is, 2048 bytes of user data stored in one stripe.
Let's say 2048 + 512 = 2560 bytes are already written on this volume; 512 bytes of that are in the second stripe. Now, if a sequential write comes with offset 2560 and size 1 byte, we have to read the whole stripe, encode it with the 1 byte included, and then write it back. The next write, with offset 2561 and size 1 byte, will again READ-MODIFY-WRITE the whole stripe. This causes bad performance. There are tools and scenarios where this kind of load occurs and users are not aware of it, for example fio and zip. Solution: One possible solution is to keep the last stripe in memory. This way, we need not read it again, saving a READ fop over the network. Considering the above example, we would have to keep the last 2048 bytes (maximum) in memory per file. This should not be a big deal, as we already keep some data like xattrs and size info in memory and take decisions based on it. Please provide your thoughts on this, and also any other solution you may have. Just adding more details: the stripe will be in memory only while the lock on the inode is active. I think that's ok. One thing we are yet to decide on is whether we want to read the stripe every time we get the lock, or only after an extending write is performed. I am thinking keeping the stripe in memory just after an extending write is better, as it doesn't involve an extra network operation. I wouldn't read the last stripe unconditionally every time we lock the inode. There's no benefit at all for random writes (in fact it's worse), and a sequential write will issue the read anyway when needed. The only difference is a small delay for the first operation after a lock. Yes, perfect. What I would do is keep the last stripe of every write (we can consider doing it per fd), even if it's not the last stripe of the file (to also optimize sequential rewrites). Ah! Good point.
But if we remember it per fd, one fd's cached data can be over-written by another fd on the disk so we need to also do cache invalidation. We only cache data if we have the inodelk, so all related fd's must be from the same client, and we'll control all its writes so cache invalidation in this case is pretty easy. There exists the possibility to have two fd's from the same client writing to the same region. To control this we would need some range checking in the writes, but all this is local, so it's easy to control it. Anyway, this is probably not a common case, so we could start by caching only the last stripe of the last write, ignoring the fd. May be implementation should consider this possibility. Yet to think about how to do this. But it is a good point. We should consider this. Maybe we could keep a list of cached stripes sorted by offs
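To make the read-modify-write cost discussed in this thread concrete, here is a small, hypothetical sketch (not EC's actual implementation) of a stripe-aligned store with and without caching the last written stripe. With a 2048-byte stripe (512 bytes per data brick in a 4+2 configuration), each unaligned sequential 1-byte write forces a full-stripe read unless the tail stripe is cached:

```python
STRIPE = 2048  # 512 bytes per data brick * 4 data bricks (EC 4+2)

class StripeStore:
    """Toy stripe-aligned backend that counts full-stripe reads."""
    def __init__(self, cache_tail=False):
        self.data = bytearray()
        self.reads = 0
        self.cache_tail = cache_tail
        self.tail = None  # (stripe_index, bytearray) of last written stripe

    def write(self, offset, buf):
        end = offset + len(buf)
        first, last = offset // STRIPE, (end - 1) // STRIPE
        for idx in range(first, last + 1):
            base = idx * STRIPE
            if self.cache_tail and self.tail and self.tail[0] == idx:
                stripe = self.tail[1]          # served from cache: no read
            elif base < len(self.data):
                self.reads += 1                # read-modify-write penalty
                stripe = bytearray(self.data[base:base + STRIPE].ljust(STRIPE, b"\0"))
            else:
                stripe = bytearray(STRIPE)     # brand-new stripe, nothing to read
            lo, hi = max(offset, base), min(end, base + STRIPE)
            stripe[lo - base:hi - base] = buf[lo - offset:hi - offset]
            if len(self.data) < base + STRIPE:
                self.data.extend(b"\0" * (base + STRIPE - len(self.data)))
            self.data[base:base + STRIPE] = stripe
            if self.cache_tail:
                self.tail = (idx, stripe)

def sequential_one_byte_writes(store, start, n):
    for i in range(n):
        store.write(start + i, b"x")
    return store.reads

plain, cached = StripeStore(), StripeStore(cache_tail=True)
plain.write(0, b"a" * 2560)   # 2048 + 512 bytes already on the volume
cached.write(0, b"a" * 2560)
# 100 sequential 1-byte writes starting at offset 2560:
print(sequential_one_byte_writes(plain, 2560, 100))   # prints 100: one stripe read per write
print(sequential_one_byte_writes(cached, 2560, 100))  # prints 0: tail stripe served from cache
```

This also illustrates why caching only helps while writes stay within the cached tail stripe; a random write to a different stripe would still pay the read.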
[Gluster-devel] BUG: Code changes in EC as part of Brick Multiplexing
Hi, There are some code changes in EC which are impacting the response time of "gluster v heal info". I have sent the following patch to initiate discussion on this and to understand why this code change was made. https://review.gluster.org/#/c/17606/1 ec: Increase notification in all the cases Problem: "gluster v heal info" is taking a long time to respond when a brick is down. RCA: The heal info command does a virtual mount. EC waits for 10 seconds before sending the UP call to the upper xlator, to get a notification (DOWN or UP) from all the bricks. Currently, we increase ec->xl_notify_count based on the current status of the brick, so if a DOWN event notification comes and the brick is already down, we do not increase ec->xl_notify_count in ec_handle_down. Solution: Handle a DOWN event as a notification irrespective of the current status of the brick. The code change was made by https://review.gluster.org/#/c/14763/ Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
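The RCA above hinges on when the notification counter advances. A hypothetical sketch of the two behaviors (illustrative names only, not the actual glusterd/EC code): counting only status *changes* means a DOWN event for an already-down brick never advances the count, so EC sits out its full 10-second timer; counting every first notification per brick lets EC proceed as soon as all bricks have reported.

```python
def bricks_reported(events, count_only_changes):
    """Count how many distinct bricks have notified at least once.

    events: list of (brick_id, 'UP' | 'DOWN') notifications.
    With count_only_changes=True (the behavior described in the RCA),
    a DOWN event for a brick already considered down is ignored, so the
    count can stall below the brick total and the caller falls back to
    its 10-second timeout before sending UP to the upper xlator.
    """
    state = {}          # brick_id -> last seen status (default: DOWN)
    notified = set()    # bricks whose notification was counted
    for brick, status in events:
        if count_only_changes:
            if state.get(brick, 'DOWN') != status:
                notified.add(brick)
        else:
            notified.add(brick)  # proposed fix: every event counts
        state[brick] = status
    return len(notified)

# Brick 2 is already down, so its DOWN notification is not a "change":
events = [(0, 'UP'), (1, 'UP'), (2, 'DOWN')]
print(bricks_reported(events, count_only_changes=True))   # prints 2: stalls below 3
print(bricks_reported(events, count_only_changes=False))  # prints 3: all bricks counted
```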
Re: [Gluster-devel] Build failed in Jenkins: regression-test-with-multiplex #60
Ok, I will check if this is catching the data corruption or not after modifying the code in EC. Initially it was not doing so. - Original Message - From: "Atin Mukherjee" <amukh...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Gluster Devel" <gluster-devel@gluster.org> Sent: Monday, June 12, 2017 3:21:29 PM Subject: Re: Build failed in Jenkins: regression-test-with-multiplex #60 On Mon, Jun 12, 2017 at 11:37 AM, Atin Mukherjee < amukh...@redhat.com > wrote: On Mon, Jun 12, 2017 at 11:15 AM, Ashish Pandey < aspan...@redhat.com > wrote: Test is failing because of ENOTCONN. -+ [2017-06-11 21:26:04.650497] I [fuse-bridge.c:4210:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14 [2017-06-11 21:26:04.650546] I [fuse-bridge.c:4840:fuse_graph_sync] 0-fuse: switched to graph 0 [2017-06-11 21:26:04.650890] E [fuse-bridge.c:4276:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2017-06-11 21:26:04.651204] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:04.651231] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:04.651379] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:04.651396] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:05.654880] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:05.654921] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 4: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:05.655105] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:05.655132] E 
[fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 5: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:06.658233] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:06.658294] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 6: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:06.658445] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:06.658471] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 7: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:07.661446] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:07.661487] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 8: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:07.661625] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:07.661642] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 9: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:08.664545] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:08.664598] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 10: GETATTR 1 (----0001) resolution failed In this test, we are trying to kill a brick and starting it using command line. I think that is what actually failing. In multiplexing, can we do it? Or is there some other way of doing the same thing? What's the reason of starting the brick from cmdline? Why can't we start the volume with force? 
Posted a patch : https://review.gluster.org/#/c/17508 Ashish From: "Atin Mukherjee" < amukh...@redhat.com > To: "Ashish Pandey" < aspan...@redhat.com > Cc: "Gluster Devel" < gluster-devel@gluster.org > Sent: Monday, June 12, 2017 10:10:05 AM Subject: Fwd: Build failed in Jenkins: regression-test-with-multiplex #60 https://review.gluster.org/#/c/16985/ has introduced a new test ec-data-heal.t which is now constantly failing with brick multiplexing. Can this be looked at?
Re: [Gluster-devel] Build failed in Jenkins: regression-test-with-multiplex #60
Test is failing because of ENOTCONN. -+ [2017-06-11 21:26:04.650497] I [fuse-bridge.c:4210:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14 [2017-06-11 21:26:04.650546] I [fuse-bridge.c:4840:fuse_graph_sync] 0-fuse: switched to graph 0 [2017-06-11 21:26:04.650890] E [fuse-bridge.c:4276:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2017-06-11 21:26:04.651204] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:04.651231] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:04.651379] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:04.651396] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:05.654880] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:05.654921] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 4: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:05.655105] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:05.655132] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 5: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:06.658233] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:06.658294] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 6: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:06.658445] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:06.658471] E 
[fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 7: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:07.661446] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:07.661487] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 8: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:07.661625] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:07.661642] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 9: GETATTR 1 (----0001) resolution failed [2017-06-11 21:26:08.664545] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: ----0001: failed to resolve (Transport endpoint is not connected) [2017-06-11 21:26:08.664598] E [fuse-bridge.c:881:fuse_getattr_resume] 0-glusterfs-fuse: 10: GETATTR 1 (----0001) resolution failed In this test, we are trying to kill a brick and starting it using command line. I think that is what actually failing. In multiplexing, can we do it? Or is there some other way of doing the same thing? Ashish - Original Message - From: "Atin Mukherjee" <amukh...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Gluster Devel" <gluster-devel@gluster.org> Sent: Monday, June 12, 2017 10:10:05 AM Subject: Fwd: Build failed in Jenkins: regression-test-with-multiplex #60 https://review.gluster.org/#/c/16985/ has introduced a new test ec-data-heal.t which is now constantly failing with brick multiplexing. Can this be looked at? 
-- Forwarded message -- From: < jenk...@build.gluster.org > Date: Mon, Jun 12, 2017 at 6:33 AM Subject: Build failed in Jenkins: regression-test-with-multiplex #60 To: maintain...@gluster.org , j...@pl.atyp.us , avish...@redhat.com , pkara...@redhat.com , amukh...@redhat.com , xhernan...@datalab.es , rgowd...@redhat.com , nde...@redhat.com See < https://build.gluster.org/job/regression-test-with-multiplex/60/display/redirect > -- [...truncated 747.65 KB...] ./tests/basic/glusterd/arbiter-volume-probe.t - 14 second ./tests/basic/gfid-access.t - 14 second ./tests/basic/ec/ec-root-heal.t - 14 second ./tests/basic
Re: [Gluster-devel] Performance experiments with io-stats translator
Please note the bug in fio https://github.com/axboe/fio/issues/376 which is actually impacting performance in the case of EC volumes. I am not sure if this is relevant in your case, but I thought I'd mention it. Ashish - Original Message - From: "Manoj Pillai" To: "Krutika Dhananjay" Cc: "Gluster Devel" Sent: Thursday, June 8, 2017 12:22:19 PM Subject: Re: [Gluster-devel] Performance experiments with io-stats translator Thanks. So I was suggesting a repeat of the test but this time with iodepth=1 in the fio job. If reducing the no. of concurrent requests drastically reduces the high latency you're seeing from the client side, that would strengthen the hypothesis that serialization/contention among concurrent requests at the n/w layers is the root cause here. -- Manoj On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay < kdhan...@redhat.com > wrote: Hi, This is what my job file contains:

[global]
ioengine=libaio
#unified_rw_reporting=1
randrepeat=1
norandommap=1
group_reporting
direct=1
runtime=60
thread
size=16g

[workload]
bs=4k
rw=randread
iodepth=8
numjobs=1
file_service_type=random
filename=/perf5/iotest/fio_5
filename=/perf6/iotest/fio_6
filename=/perf7/iotest/fio_7
filename=/perf8/iotest/fio_8

I have 3 vms reading from one mount, and each of these vms is running the above job in parallel. -Krutika On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai < mpil...@redhat.com > wrote: On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay < kdhan...@redhat.com > wrote: Hi, As part of identifying performance bottlenecks within gluster stack for VM image store use-case, I loaded io-stats at multiple points on the client and brick stack and ran randrd test using fio from within the hosted vms in parallel. Before I get to the results, a little bit about the configuration ... 3 node cluster; 1x3 plain replicate volume with group virt settings, direct-io. 3 FUSE clients, one per node in the cluster (which implies reads are served from the replica that is local to the client).
io-stats was loaded at the following places: On the client stack: Above client-io-threads and above protocol/client-0 (the first child of AFR). On the brick stack: Below protocol/server, above and below io-threads and just above storage/posix. Based on a 60-second run of the randrd test and subsequent analysis of the stats dumped by the individual io-stats instances, the following is what I found:

Translator position                Avg READ fop latency seen here
1. parent of client-io-threads     1666us    ∆(1,2) =   50us
2. parent of protocol/client-0     1616us    ∆(2,3) = 1453us
--- end of client stack ---
--- beginning of brick stack ---
3. child of protocol/server         163us    ∆(3,4) =    7us
4. parent of io-threads             156us    ∆(4,5) =   20us
5. child of io-threads              136us    ∆(5,6) =   11us
6. parent of storage/posix          125us
--- end of brick stack ---

So it seems like the biggest bottleneck here is a combination of the network + epoll, rpc layer? I must admit I am no expert with networks, but I'm assuming if the client is reading from the local brick, then even latency contribution from the actual network won't be much, in which case bulk of the latency is coming from epoll, rpc layer, etc at both client and brick end? Please correct me if I'm wrong. I will, of course, do some more runs and confirm if the pattern is consistent. -Krutika Really interesting numbers! How many concurrent requests are in flight in this test? Could you post the fio job? I'm wondering if/how these latency numbers change if you reduce the number of concurrent requests. -- Manoj ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
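Each ∆ in the measurements quoted in this mail is just the difference between adjacent averages; a small sketch (values copied from the mail, nothing measured anew) that recomputes them and makes the conclusion explicit:

```python
# Per-translator READ latency averages quoted in the mail (microseconds).
latencies_us = [
    ("parent of client-io-threads", 1666),
    ("parent of protocol/client-0", 1616),  # end of client stack
    ("child of protocol/server",     163),  # start of brick stack
    ("parent of io-threads",         156),
    ("child of io-threads",          136),
    ("parent of storage/posix",      125),
]

# Delta between each pair of adjacent measurement points.
deltas = [(a, b, la - lb)
          for (a, la), (b, lb) in zip(latencies_us, latencies_us[1:])]

for a, b, d in deltas:
    print(f"{a} -> {b}: {d}us")

# The hop from protocol/client to protocol/server accounts for 1453us of
# the ~1541us total, i.e. the network/rpc/epoll boundary dominates.
```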
Re: [Gluster-devel] EC Healing Algorithm
If the data was written on at least the minimum number of bricks, heal takes place on the failed brick only. Data is read from the good bricks, re-encoded, and only the fragment for the failed brick is written. - Original Message - From: "jayakrishnan mm" To: "Gluster Devel" Sent: Thursday, April 6, 2017 2:21:26 PM Subject: [Gluster-devel] EC Healing Algorithm Hi I am using GlusterFS 3.7.15. What type of algorithm is used in EC healing? I mean, if a brick fails during writing and it comes back online later, will all the bricks be re-written, or only the failed brick written with the new data? Best regards JK ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
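As a toy illustration of the flow described above (read the good fragments, re-encode, write only the failed fragment), the sketch below substitutes simple XOR parity for the Reed-Solomon-style code EC really uses, so it models the heal flow, not the actual algorithm:

```python
# Toy 2+1 "disperse" set: two data fragments plus one XOR parity
# fragment (real EC uses a Reed-Solomon-style code, not XOR).
def parity(d0: bytes, d1: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(d0, d1))

bricks = {0: b"\x01\x02", 1: b"\x10\x20"}
bricks[2] = parity(bricks[0], bricks[1])  # write succeeded on all bricks

bricks[2] = b"\x00\x00"  # brick 2 was down during a later write: stale

# Heal: read from the good bricks, re-encode, and write ONLY the
# fragment belonging to the failed brick. Bricks 0 and 1 are untouched.
bricks[2] = parity(bricks[0], bricks[1])
```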
Re: [Gluster-devel] [Gluster-users] Proposal to deprecate replace-brick for "distribute only" volumes
- Original Message - From: "Atin Mukherjee"To: "Raghavendra Talur" , gluster-devel@gluster.org, gluster-us...@gluster.org Sent: Thursday, March 16, 2017 4:22:41 PM Subject: Re: [Gluster-devel] [Gluster-users] Proposal to deprecate replace-brick for "distribute only" volumes Makes sense. On Thu, 16 Mar 2017 at 06:51, Raghavendra Talur < rta...@redhat.com > wrote: Hi, In the last few releases, we have changed replace-brick command such that it can be called only with "commit force" option. When invoked, this is what happens to the volume: a. distribute only volume: the given brick is replaced with a empty brick with 100% probability of data loss. b. distribute-replicate: the given brick is replaced with a empty brick and self heal is triggered. If admin is wise enough to monitor self heal status before another replace-brick command, data is safe. c. distribute-disperse: same as above in distribute-replicate My proposal is to fully deprecate replace-brick command for "distribute only" volumes. It should print out a error "The right way to replace brick for distribute only volume is to add brick, wait for rebalance to complete and remove brick" and return a "-1". It makes sense. I just don't see any use of add-brick before remove-brick except the fact that it will help to keep the overall storage capacity of volume intact . What is the guarantee that the files on the brick which we want to replace would migrate to added brick? If a brick, which we want to replace, is healthy and we just want to replace it then perhaps we should provide a command to copy those files to new brick and then remove the old brick. Thoughts? 
Thanks, Raghavendra Talur ___ Gluster-users mailing list gluster-us...@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
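On Ashish's question about what guarantees that files from the brick being replaced migrate to the added brick: DHT places a file by hashing its name over the current layout, so after add-brick + rebalance the old brick's files scatter across all bricks rather than landing on the new one. A toy model (md5 standing in for DHT's real Davies-Meyer hash, hypothetical brick names):

```python
import hashlib

def placement(name: str, bricks: list) -> str:
    # Toy stand-in for DHT's hashed layout: pick a brick by hashing the
    # file name over however many bricks exist (DHT really hashes into
    # per-directory layout ranges, but the point survives the
    # simplification).
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return bricks[h % len(bricks)]

old_layout = ["brick1", "brick2", "brick3"]
new_layout = ["brick1", "brick2", "brick3", "brick4"]  # after add-brick

files = [f"file{i:03d}" for i in range(100)]
on_brick3 = [f for f in files if placement(f, old_layout) == "brick3"]
to_brick4 = [f for f in on_brick3 if placement(f, new_layout) == "brick4"]

# Only some of brick3's files rehash onto the new brick; the rest
# scatter over brick1/brick2 -- hence no guarantee that the replaced
# brick's files end up on the added one.
```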
Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t
Xavi, shd has been disabled in this test on line number 12 and we have also disabled client-side heal. So, nobody is going to try to heal it. Ashish - Original Message - From: "Atin Mukherjee" <amukh...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com>, "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Xavier Hernandez" <xhernan...@datalab.es> Cc: "Gluster Devel" <gluster-devel@gluster.org> Sent: Thursday, January 26, 2017 5:50:00 PM Subject: Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t I've +1ed it now. On Thu, 26 Jan 2017 at 15:05, Xavier Hernandez < xhernan...@datalab.es > wrote: Hi Atin, I don't clearly see what's the problem. Even if the truncate causes a dirty flag to be set, eventually it should be removed before the $HEAL_TIMEOUT value. For now I've marked the test as bad. Patch is: https://review.gluster.org/16470 Xavi On 25/01/17 17:24, Atin Mukherjee wrote: > Can we please address this as early as possible, my patch has hit this > failure 3 out of 4 recheck attempts now. I'm guessing some recent > changes has caused it. > > On Wed, 25 Jan 2017 at 12:10, Ashish Pandey < aspan...@redhat.com > > wrote: > > > Pranith, > > In this test tests/basic/ec/ec-background-heals.t, I think the line > number 86 actually creating a heal entry instead of > helping data heal quickly. What if all the data was already healed > at that moment, truncate came and in preop set the dirty flag and at the > end, as part of the heal, dirty flag was unset on previous good > bricks only and the brick which acted as heal-sink still has dirty > marked by truncate. > That is why we are only seeing "1" as get_pending_heal_count. If a > file was actually not healed it should be "2". > If heal on this file completes and unset of dirty flag happens > before truncate everything will be fine. > > I think we can wait for file to be heal without truncate? 
> > 71 #Test that disabling background-heals still drains the queue > 72 TEST $CLI volume set $V0 disperse.background-heals 1 > 73 TEST touch $M0/{a,b,c,d} > 74 TEST kill_brick $V0 $H0 $B0/${V0}2 > 75 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "1" mount_get_option_value > $M0 $V0-disperse-0 background-heals > 76 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "200" > mount_get_option_value $M0 $V0-disperse-0 heal-wait-qlength > 77 TEST truncate -s 1GB $M0/a > 78 echo abc > $M0/b > 79 echo abc > $M0/c > 80 echo abc > $M0/d > 81 TEST $CLI volume start $V0 force > 82 EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0 > 83 TEST chown root:root $M0/{a,b,c,d} > 84 TEST $CLI volume set $V0 disperse.background-heals 0 > 85 EXPECT_NOT "0" mount_get_option_value $M0 $V0-disperse-0 > heal-waiters > > 86 TEST truncate -s 0 $M0/a # This completes the heal fast ;-) <<<<<<< > > 87 EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0 > > ---- > Ashish > > > > > > > *From: *"Raghavendra Gowdappa" < rgowd...@redhat.com > > > *To: *"Nithya Balachandran" < nbala...@redhat.com > > > *Cc: *"Gluster Devel" < gluster-devel@gluster.org > >, "Pranith Kumar Karampuri" > < pkara...@redhat.com >, "Ashish Pandey" > < aspan...@redhat.com > > *Sent: *Wednesday, January 25, 2017 9:41:38 AM > *Subject: *Re: [Gluster-devel] Spurious regression > failure? tests/basic/ec/ec-background-heals.t > > > Found another failure on same test: > https://build.gluster.org/job/centos6-regression/2874/consoleFull > > - Original Message - > > From: "Nithya Balachandran" < nbala...@redhat.com > > > > To: "Gluster Devel" < gluster-devel@gluster.org > >, "Pranith Kumar Karampuri" > < pkara...@redhat.com >, "Ashish Pandey" > > < aspan...@redhat.com > > > Sent: Tuesday, January 24, 2017 9:16:31 AM > > Subject: [Gluster-devel] Spurious regression > failure? tests/basic/ec/ec-background-heals.t > > > > Hi, > > > > > > Can you please take a look at > > https://build.gluster.org/job/centos6-regression/2859/console ? 
> > > > tests/basic/ec/ec-background-heals.t has failed. > > > > Thanks, > > Nithya > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://lists.gluster.org/mailman/listinfo/gluster-devel > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://lists.gluster.org/mailman/listinfo/gluster-devel > > -- > - Atin (atinm) > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://lists.gluster.org/mailman/listinfo/gluster-devel > -- - Atin (atinm) ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
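If the suspicion voiced in this thread is right, the interleaving can be captured in a few lines of state (invented structures; this models only the reasoning in the mail, not EC's actual dirty-xattr handling):

```python
# 3-brick model: brick 2 was down and is the heal sink; dirty[i] is the
# pending-heal marker on brick i.
dirty = [0, 0, 0]
good, sink = [0, 1], 2

# 1. Heal has finished copying data, but its "clear dirty" post-op has
#    not run yet.
# 2. truncate's pre-op marks dirty on ALL online bricks.
for b in good + [sink]:
    dirty[b] += 1
# 3. Heal's post-op now clears dirty -- but only on the bricks it knew
#    as good when it started, so it misses the sink.
for b in good:
    dirty[b] -= 1

pending_heal_count = sum(1 for d in dirty if d > 0)
# One brick keeps a dirty marker although the data is consistent, which
# matches the single entry reported by get_pending_heal_count.
```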
Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t
Pranith, In this test tests/basic/ec/ec-background-heals.t, I think line number 86 is actually creating a heal entry instead of helping data heal quickly. What if all the data was already healed at that moment, truncate came and in preop set the dirty flag, and at the end, as part of the heal, the dirty flag was unset on the previously good bricks only, while the brick which acted as heal-sink still has dirty marked by truncate. That is why we are only seeing "1" as get_pending_heal_count. If a file was actually not healed it should be "2". If heal on this file completes and the unset of the dirty flag happens before truncate, everything will be fine. I think we can wait for the file to be healed without truncate?

71 #Test that disabling background-heals still drains the queue
72 TEST $CLI volume set $V0 disperse.background-heals 1
73 TEST touch $M0/{a,b,c,d}
74 TEST kill_brick $V0 $H0 $B0/${V0}2
75 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "1" mount_get_option_value $M0 $V0-disperse-0 background-heals
76 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "200" mount_get_option_value $M0 $V0-disperse-0 heal-wait-qlength
77 TEST truncate -s 1GB $M0/a
78 echo abc > $M0/b
79 echo abc > $M0/c
80 echo abc > $M0/d
81 TEST $CLI volume start $V0 force
82 EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
83 TEST chown root:root $M0/{a,b,c,d}
84 TEST $CLI volume set $V0 disperse.background-heals 0
85 EXPECT_NOT "0" mount_get_option_value $M0 $V0-disperse-0 heal-waiters
86 TEST truncate -s 0 $M0/a # This completes the heal fast ;-) <<<<<<<
87 EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0

Ashish - Original Message - From: "Raghavendra Gowdappa" <rgowd...@redhat.com> To: "Nithya Balachandran" <nbala...@redhat.com> Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Ashish Pandey" <aspan...@redhat.com> Sent: Wednesday, January 25, 2017 9:41:38 AM Subject: Re: [Gluster-devel] Spurious regression failure? 
tests/basic/ec/ec-background-heals.t Found another failure on same test: https://build.gluster.org/job/centos6-regression/2874/consoleFull - Original Message - > From: "Nithya Balachandran" <nbala...@redhat.com> > To: "Gluster Devel" <gluster-devel@gluster.org>, "Pranith Kumar Karampuri" > <pkara...@redhat.com>, "Ashish Pandey" > <aspan...@redhat.com> > Sent: Tuesday, January 24, 2017 9:16:31 AM > Subject: [Gluster-devel] Spurious regression failure? > tests/basic/ec/ec-background-heals.t > > Hi, > > > Can you please take a look at > https://build.gluster.org/job/centos6-regression/2859/console ? > > tests/basic/ec/ec-background-heals.t has failed. > > Thanks, > Nithya > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://lists.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Error being logged in disperse volumes
That means EC is not getting the correct trusted.ec.config xattr from the minimum number of bricks. 1 - Did you see any error on the client side while accessing any file? 2 - If yes, check the file's xattrs on all the bricks for such files. There is too little information here to find the cause. If [1] is true then please provide all client logs, the getxattr output from all the bricks for every file giving the error, gluster v status, gluster v info, and also gluster volume heal info. Did you change anything on the volume recently? Ashish - Original Message - From: "Ankireddypalle Reddy" To: gluster-us...@gluster.org, "Gluster Devel (gluster-devel@gluster.org)" Sent: Tuesday, December 20, 2016 7:42:29 PM Subject: [Gluster-users] Error being logged in disperse volumes Hi, I am seeing many instances of the following error in the log files. What does this signify? [2016-12-19 08:14:04.988004] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-1: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988027] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-1: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988038] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-0: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988055] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-0: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988179] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-3: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988193] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-3: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988228] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-2: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988248] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 
0-StoragePool-disperse-2: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988338] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-4: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988350] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-4: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988374] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-5: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988388] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-5: Invalid config xattr [Invalid argument] [2016-12-19 08:14:04.988460] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-7: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:04.988478] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-7: Invalid config xattr [Invalid argument] [2016-12-19 08:14:05.508034] E [MSGID: 122001] [ec-common.c:872:ec_config_check] 0-StoragePool-disperse-6: Invalid or corrupted config [Invalid argument] [2016-12-19 08:14:05.508072] E [MSGID: 122066] [ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-6: Invalid config xattr [Invalid argument] Thanks and Regards, Ram ***Legal Disclaimer*** "This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you." ** ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] 1402538 : Assertion failure during rebalance of symbolic links
Hi All, We have been seeing an issue where rebalancing symbolic links leads to an assertion failure in an EC volume. The root cause is that while migrating a symbolic link to another subvolume, rebalance creates a link file (with the "T" attribute). This link file is a regular file. Now, during migration a setattr comes for this link and, because of a possible race, posix_stat returns the stats of this "T" file. In ec_manager_setattr, we receive the callbacks and check the type of the entry. If it is a regular file, we try to get its size, and if it is not there, we raise an assert. So basically we are checking the size of a symlink (which will not have a size) that has been reported as a regular file, and the assert fires when this condition becomes true. Now, this looks like a problem in rebalance that is difficult to fix there at this point (as per the discussion). We have an alternative fix in EC, but that would be more of a hack than an actual fix. We should not modify EC to deal with an issue that belongs to another translator. Now the question is how to proceed with this. Any suggestions? Details on this bug can be found here - https://bugzilla.redhat.com/show_bug.cgi?id=1402538 Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
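Schematically, the failing check described above reduces to a few lines (invented names; the real code is EC's setattr callback handling):

```python
# Toy model of the described assertion: EC expects every regular file
# to carry a size, while a symlink legitimately has none.
def ec_check(ia_type: str, size) -> None:
    if ia_type == "regular":
        assert size is not None, "regular file without a size"

# Normal case: setattr lands on the symlink itself -- no size expected.
ec_check("symlink", None)

# Race during rebalance: posix stats the "T" link file, a REGULAR
# placeholder for the symlink, so EC sees type == regular with no size.
try:
    ec_check("regular", None)
    hit_assert = False
except AssertionError:
    hit_assert = True
```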
[Gluster-devel] EC volume: Bug caused by race condition during rmdir and inodelk
Hi All, On EC volumes we have been seeing an interesting bug, caused by a subtle race between rmdir and inodelk, which leads to an EIO error. Pranith, Xavi and I discussed this and have some possible solutions. Your inputs are required on this bug and its possible solutions. 1 - Consider rmdir on /a/b and chown on /a/b from 2 different clients/processes. rmdir /a/b takes a lock on "a" and deletes "b". However, chown /a/b will take a lock on "b" to do the setattr fop. Now, in the case of a (4+2) EC volume, inodelk might get ENOENT from 3 bricks (if rmdir /a/b succeeded on those 3 bricks) and might get locks from the remaining 3 bricks. As an operation must succeed on at least 4 bricks, chown will return EIO. This can be solved on the EC side while processing callbacks: based on the errors, we can decide which error should be passed on. In the above case, sending ENOENT would be safer. 2 - rmdir /a/b and rmdir /a/b/c come from 2 different clients/processes. Now, suppose "c" has already been deleted by some other process, so rmdir /a/b succeeds. At this point it is possible that /a/b has been deleted and the inode for "b" has been purged on 3 bricks by the time the inodelk on "b" arrives for rmdir /a/b/c. It will fail on 3 bricks and get the lock on the remaining 3. In this case, again, we get EIO. To solve this, it was suggested to take a lock on the parent as well as on the entry being deleted. So in the above case, for rmdir /a/b/c we will take locks on both "b" and "c"; for rmdir /a/b we will take locks on "a" and "b". This will certainly impact performance, but at the moment it looks like a feasible solution. Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
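The first case is essentially an error-combination policy: with a 4+2 volume, an answer needs agreement from 4 bricks. A sketch of the suggested change (a hypothetical combine() helper, not ec's actual answer-combining code): when the failures are all ENOENT because rmdir already succeeded on those bricks, report ENOENT rather than EIO:

```python
from collections import Counter

FRAGMENTS = 4       # 4+2 disperse volume:
MIN_OK = FRAGMENTS  # an answer needs agreement from 4 bricks

def combine(responses):
    # responses: one entry per brick, 0 for success or an errno name.
    counts = Counter(responses)
    if counts[0] >= MIN_OK:
        return 0
    # Today: nothing reaches quorum -> EIO. Proposed refinement: if the
    # only failures are ENOENT (rmdir already succeeded there), report
    # ENOENT, since the entry is genuinely gone.
    if counts["ENOENT"] and counts[0] + counts["ENOENT"] == len(responses):
        return "ENOENT"
    return "EIO"

# chown /a/b racing with an rmdir that completed on 3 of 6 bricks:
result = combine([0, 0, 0, "ENOENT", "ENOENT", "ENOENT"])  # -> "ENOENT"
```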
[Gluster-devel] Review request for EC - set/unset dirty flag for data/metadata update
Hi, Please review the following patch for EC- http://review.gluster.org/#/c/13733/ Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Compilation failed on latest gluster
As Susant and Atin suggested, I cleaned everything and did installation from scratch and it is working now. - Original Message - From: "Nigel Babu" <nig...@redhat.com> To: "Ashish Pandey" <aspan...@redhat.com> Cc: "Manikandan Selvaganesh" <mselv...@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org> Sent: Thursday, August 25, 2016 11:38:46 AM Subject: Re: [Gluster-devel] Compilation failed on latest gluster Are you using something that's not Centos, NetBSD, or FreeBSD? I'm curious how we managed to slip a build failure despite our smoke tests. On Thu, Aug 25, 2016 at 11:19 AM, Ashish Pandey < aspan...@redhat.com > wrote: Hi, I am trying to build latest code on my laptop and it is giving compilation error - CC cli-rl.o CC cli-cmd-global.o CC cli-cmd-volume.o cli-cmd-volume.c: In function ‘cli_cmd_quota_cbk’: cli-cmd-volume.c:1712:35: error: ‘EVENT_QUOTA_ENABLE’ undeclared (first use in this function) gf_event (EVENT_QUOTA_ENABLE, "volume=%s", volname); ^ cli-cmd-volume.c:1712:35: note: each undeclared identifier is reported only once for each function it appears in cli-cmd-volume.c:1715:35: error: ‘EVENT_QUOTA_DISABLE’ undeclared (first use in this function) gf_event (EVENT_QUOTA_DISABLE, "volume=%s", volname); ^ cli-cmd-volume.c:1718:35: error: ‘EVENT_QUOTA_SET_USAGE_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SET_USAGE_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1723:35: error: ‘EVENT_QUOTA_SET_OBJECTS_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SET_OBJECTS_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1728:35: error: ‘EVENT_QUOTA_REMOVE_USAGE_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_REMOVE_USAGE_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1732:35: error: ‘EVENT_QUOTA_REMOVE_OBJECTS_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_REMOVE_OBJECTS_LIMIT, ^ cli-cmd-volume.c:1736:35: error: ‘EVENT_QUOTA_ALERT_TIME’ undeclared (first use in this function) gf_event 
(EVENT_QUOTA_ALERT_TIME, "volume=%s;time=%s", ^ cli-cmd-volume.c:1740:35: error: ‘EVENT_QUOTA_SOFT_TIMEOUT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SOFT_TIMEOUT, "volume=%s;" ^ cli-cmd-volume.c:1744:35: error: ‘EVENT_QUOTA_HARD_TIMEOUT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_HARD_TIMEOUT, "volume=%s;" ^ cli-cmd-volume.c:1748:35: error: ‘EVENT_QUOTA_DEFAULT_SOFT_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_DEFAULT_SOFT_LIMIT, "volume=%s;" ^ Makefile:539: recipe for target 'cli-cmd-volume.o' failed If I roll back 4 patches and then compile it is working. I am suspecting that http://review.gluster.org/15230 is doing something. Could you please look into this? Do I need to do something to make it work? Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- nigelb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Compilation failed on latest gluster
Hi, I am trying to build latest code on my laptop and it is giving compilation error - CC cli-rl.o CC cli-cmd-global.o CC cli-cmd-volume.o cli-cmd-volume.c: In function ‘cli_cmd_quota_cbk’: cli-cmd-volume.c:1712:35: error: ‘EVENT_QUOTA_ENABLE’ undeclared (first use in this function) gf_event (EVENT_QUOTA_ENABLE, "volume=%s", volname); ^ cli-cmd-volume.c:1712:35: note: each undeclared identifier is reported only once for each function it appears in cli-cmd-volume.c:1715:35: error: ‘EVENT_QUOTA_DISABLE’ undeclared (first use in this function) gf_event (EVENT_QUOTA_DISABLE, "volume=%s", volname); ^ cli-cmd-volume.c:1718:35: error: ‘EVENT_QUOTA_SET_USAGE_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SET_USAGE_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1723:35: error: ‘EVENT_QUOTA_SET_OBJECTS_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SET_OBJECTS_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1728:35: error: ‘EVENT_QUOTA_REMOVE_USAGE_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_REMOVE_USAGE_LIMIT, "volume=%s;" ^ cli-cmd-volume.c:1732:35: error: ‘EVENT_QUOTA_REMOVE_OBJECTS_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_REMOVE_OBJECTS_LIMIT, ^ cli-cmd-volume.c:1736:35: error: ‘EVENT_QUOTA_ALERT_TIME’ undeclared (first use in this function) gf_event (EVENT_QUOTA_ALERT_TIME, "volume=%s;time=%s", ^ cli-cmd-volume.c:1740:35: error: ‘EVENT_QUOTA_SOFT_TIMEOUT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_SOFT_TIMEOUT, "volume=%s;" ^ cli-cmd-volume.c:1744:35: error: ‘EVENT_QUOTA_HARD_TIMEOUT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_HARD_TIMEOUT, "volume=%s;" ^ cli-cmd-volume.c:1748:35: error: ‘EVENT_QUOTA_DEFAULT_SOFT_LIMIT’ undeclared (first use in this function) gf_event (EVENT_QUOTA_DEFAULT_SOFT_LIMIT, "volume=%s;" ^ Makefile:539: recipe for target 'cli-cmd-volume.o' failed If I roll back 4 patches and then compile it is working. 
I am suspecting that http://review.gluster.org/15230 is doing something. Could you please look into this? Do I need to do something to make it work? Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Patch Review
Hi All, I have modified the code for volume file generation to support decompounder translator. Please review this patch and provide me your comments/suggestion. http://review.gluster.org/#/c/13968/ Ashish ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression-test-burn-in crash in EC test
Hi Jeff,

Where can we find the core dump?

---
Ashish

- Original Message -
From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
To: "Jeff Darcy" <jda...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Ashish Pandey" <aspan...@redhat.com>
Sent: Thursday, April 28, 2016 11:58:54 AM
Subject: Re: [Gluster-devel] Regression-test-burn-in crash in EC test

Ashish,

Could you take a look at this?

Pranith

- Original Message -
> From: "Jeff Darcy" <jda...@redhat.com>
> To: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Wednesday, April 27, 2016 11:31:25 PM
> Subject: [Gluster-devel] Regression-test-burn-in crash in EC test
>
> One of the "rewards" of reviewing and merging people's patches is getting
> email if the next regression-test-burn-in should fail - even if it fails for
> a completely unrelated reason. Today I got one that's not among the usual
> suspects. The failure was a core dump in tests/bugs/disperse/bug-1304988.t,
> weighing in at a respectable 42 frames.
>
> #0  0x7fef25976cb9 in dht_rename_lock_cbk
> #1  0x7fef25955f62 in dht_inodelk_done
> #2  0x7fef25957352 in dht_blocking_inodelk_cbk
> #3  0x7fef32e02f8f in default_inodelk_cbk
> #4  0x7fef25c029a3 in ec_manager_inodelk
> #5  0x7fef25bf9802 in __ec_manager
> #6  0x7fef25bf990c in ec_manager
> #7  0x7fef25c03038 in ec_inodelk
> #8  0x7fef25bee7ad in ec_gf_inodelk
> #9  0x7fef25957758 in dht_blocking_inodelk_rec
> #10 0x7fef25957b2d in dht_blocking_inodelk
> #11 0x7fef2597713f in dht_rename_lock
> #12 0x7fef25977835 in dht_rename
> #13 0x7fef32e0f032 in default_rename
> #14 0x7fef32e0f032 in default_rename
> #15 0x7fef32e0f032 in default_rename
> #16 0x7fef32e0f032 in default_rename
> #17 0x7fef32e0f032 in default_rename
> #18 0x7fef32e07c29 in default_rename_resume
> #19 0x7fef32d8ed40 in call_resume_wind
> #20 0x7fef32d98b2f in call_resume
> #21 0x7fef24cfc568 in open_and_resume
> #22 0x7fef24cffb99 in ob_rename
> #23 0x7fef24aee482 in mdc_rename
> #24 0x7fef248d68e5 in io_stats_rename
> #25 0x7fef32e0f032 in default_rename
> #26 0x7fef2ab1b2b9 in fuse_rename_resume
> #27 0x7fef2ab12c47 in fuse_fop_resume
> #28 0x7fef2ab107cc in fuse_resolve_done
> #29 0x7fef2ab108a2 in fuse_resolve_all
> #30 0x7fef2ab10900 in fuse_resolve_continue
> #31 0x7fef2ab0fb7c in fuse_resolve_parent
> #32 0x7fef2ab1077d in fuse_resolve
> #33 0x7fef2ab10879 in fuse_resolve_all
> #34 0x7fef2ab10900 in fuse_resolve_continue
> #35 0x7fef2ab0fb7c in fuse_resolve_parent
> #36 0x7fef2ab1077d in fuse_resolve
> #37 0x7fef2ab10824 in fuse_resolve_all
> #38 0x7fef2ab1093e in fuse_resolve_and_resume
> #39 0x7fef2ab1b40e in fuse_rename
> #40 0x7fef2ab2a96a in fuse_thread_proc
> #41 0x7fef3204daa1 in start_thread
>
> In other words we started at FUSE, went through a bunch of performance
> translators, through DHT to EC, and then crashed on the way back. It seems
> a little odd that we turn the fop around immediately in EC, and that we have
> default_inodelk_cbk at frame 3. Could one of the DHT or EC people please
> take a look at it? Thanks!
>
> https://build.gluster.org/job/regression-test-burn-in/868/console
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size
Hi Serkan,

I have gone through the logs and can see there are some blocked inode lock requests. We have observed that some other users have also faced this issue with similar logs. I think you have tried a rolling update on your setup, or some of the nodes on which you collected these statedumps must have gone down for one reason or another.

We will dig into it further and try to find the root cause. Until then, you can resolve this issue by restarting the volume, which will restart nfs and shd and release any locks taken by these processes. "gluster volume start force" will do the same.

Regards,
Ashish

- Original Message -
From: "Serkan Çoban" <cobanser...@gmail.com>
To: "Ashish Pandey" <aspan...@redhat.com>
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
Sent: Monday, April 18, 2016 11:51:37 AM
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size

You can find the statedumps of server and client in the below link. Gluster version is 3.7.10, 78x(16+4) disperse setup, 60 nodes named node185..node244.

https://www.dropbox.com/s/cc2dgsxwuk48mba/gluster_statedumps.zip?dl=0

On Fri, Apr 15, 2016 at 9:52 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>
> Actually it was my mistake, I overlooked the configuration you provided. It
> will be huge.
> I would suggest taking a statedump on all the nodes and grepping for
> "BLOCKED" in the statedump files on all the nodes.
> See if you can see any such line in any file and send those files. No need
> to send statedumps of all the bricks.
>
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Ashish Pandey" <aspan...@redhat.com>
> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 15, 2016 6:07:00 PM
> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size
>
> Hi Ashish,
>
> Sorry for the question, but do you want all brick statedumps from all
> servers, or all brick dumps from one server?
> All server brick dumps is nearly 700MB zipped..
>
> On Fri, Apr 15, 2016 at 2:16 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>>
>> To get the statedump of the fuse client:
>> 1 - get the PID of the fuse mount process
>> 2 - kill -USR1 <pid>
>>
>> The statedump can be found in the same directory where you get it for the
>> brick process.
>>
>> The following link could be helpful for future reference:
>> https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md
>>
>> Ashish
>>
>> From: "Serkan Çoban" <cobanser...@gmail.com>
>> To: "Ashish Pandey" <aspan...@redhat.com>
>> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
>> Sent: Friday, April 15, 2016 4:02:20 PM
>> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size
>>
>> Yes, it is only one brick on which the error appears. I can send all the
>> other brick dumps too..
>> How can I get a statedump on the fuse client? There is no gluster command
>> there..
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
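The "grep for BLOCKED" suggestion above can be sketched as a small script run on each node. This is illustrative only: the function name is made up, and the default statedump location varies by installation (check `gluster --print-statedumpdir` on your setup).

```python
import glob
import os

def find_blocked(dump_dir, pattern="*.dump.*"):
    """Map each statedump file to the lines mentioning BLOCKED lock requests."""
    hits = {}
    for path in sorted(glob.glob(os.path.join(dump_dir, pattern))):
        with open(path, errors="replace") as f:
            blocked = [line.rstrip() for line in f if "BLOCKED" in line]
        if blocked:
            hits[path] = blocked
    return hits

if __name__ == "__main__":
    # /var/run/gluster is a common default; verify with
    # `gluster --print-statedumpdir` before relying on it
    for path, lines in find_blocked("/var/run/gluster").items():
        print(path, len(lines))
```

Collecting only the files this reports keeps the upload far smaller than the ~700MB of full brick dumps mentioned above.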
Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size
Actually it was my mistake, I overlooked the configuration you provided. It will be huge.

I would suggest taking a statedump on all the nodes and grepping for "BLOCKED" in the statedump files on all the nodes. See if you can see any such line in any file and send those files. No need to send statedumps of all the bricks.

- Original Message -
From: "Serkan Çoban" <cobanser...@gmail.com>
To: "Ashish Pandey" <aspan...@redhat.com>
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 15, 2016 6:07:00 PM
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size

Hi Ashish,

Sorry for the question, but do you want all brick statedumps from all servers, or all brick dumps from one server? All server brick dumps is nearly 700MB zipped..

On Fri, Apr 15, 2016 at 2:16 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>
> To get the statedump of the fuse client:
> 1 - get the PID of the fuse mount process
> 2 - kill -USR1 <pid>
>
> The statedump can be found in the same directory where you get it for the
> brick process.
>
> The following link could be helpful for future reference:
> https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md
>
> Ashish
>
> From: "Serkan Çoban" <cobanser...@gmail.com>
> To: "Ashish Pandey" <aspan...@redhat.com>
> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Friday, April 15, 2016 4:02:20 PM
> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size
>
> Yes, it is only one brick on which the error appears. I can send all the
> other brick dumps too..
> How can I get a statedump on the fuse client? There is no gluster command
> there..
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size
Hi Serkan,

Could you also provide us the statedumps of all the brick processes and clients?

Commands to generate statedumps for brick processes/nfs server/quotad:

For bricks: gluster volume statedump <volname>
For nfs server: gluster volume statedump <volname> nfs

We can find the directory where statedump files are created using 'gluster --print-statedumpdir'.
Also, the mount logs would help us to debug the issue.

Ashish

- Original Message -
From: "Serkan Çoban"
To: "Gluster Users", "Gluster Devel"
Sent: Thursday, April 14, 2016 6:27:10 PM
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size

Here is the related brick log:

/var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700556] E [inodelk.c:309:__inode_unlock_lock] 0-v0-locks: Matching lock not found for unlock 0-9223372036854775807, by 94d29e885e7f on 0x7f037413b990
/var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700639] E [MSGID: 115053] [server-rpc-fops.c:276:server_inodelk_cbk] 0-v0-server: 712984: INODELK /workdir/raw_output/xxx/yyy/zzz.dat.gz.snappy1460474606605 (1191e32e-44ba-4e20-87ca-35ace8519c19) ==> (Invalid argument) [Invalid argument]

On Thu, Apr 14, 2016 at 3:25 PM, Serkan Çoban wrote:
> Hi,
>
> During read/write tests to a 78x(16+4) distributed disperse volume
> from 50 clients, one client hangs on read/write with the following
> logs:
>
> [2016-04-14 11:11:04.728580] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-v0-disperse-6: Mismatching xdata in answers of 'LOOKUP'
> [2016-04-14 11:11:04.728624] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-v0-disperse-6: Operation failed on some subvolumes (up=F, mask=F, remaining=0, good=D, bad=2)
> [2016-04-14 11:11:04.736689] I [MSGID: 122058] [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-6: /workdir/raw_output2: name heal successful on F
> [2016-04-14 11:29:26.718036] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-v0-disperse-1: Mismatching xdata in answers of 'LOOKUP'
> [2016-04-14 11:29:26.718121] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-v0-disperse-1: Operation failed on some subvolumes (up=F, mask=F, remaining=0, good=E, bad=1)
> [2016-04-14 11:29:42.501760] I [MSGID: 122058] [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-1: /workdir/raw_output2: name heal successful on F
> [2016-04-14 11:31:25.714812] E [ec-inode-read.c:1612:ec_manager_stat] (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x91) [0x7f5ec9f942b1] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x57) [0x7f5ec9f94497] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x2c4) [0x7f5ec9faaed4] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, >iatt[0].ia_size)
> [2016-04-14 11:31:25.722372] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-40: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722411] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-41: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722450] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-44: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722477] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-42: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722503] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-43: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722577] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-45: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722605] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-46: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722742] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-49: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722794] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-47: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722818] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-48: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722840] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-50: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722883] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-52: remote operation failed [Invalid argument]
> [2016-04-14 11:31:25.722906] E [MSGID: 114031] >
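A burst of near-identical errors like the one above is easier to read as a per-client tally. The sketch below is illustrative only: the function name is mine, and the regex is tied to the `0-v0-client-NN` translator names of this particular volume.

```python
import re
from collections import Counter

# matches e.g. "... 0-v0-client-40: remote operation failed [Invalid argument]"
CLIENT_RE = re.compile(r"\b(0-v0-client-\d+): remote operation failed")

def tally_failed_clients(log_lines):
    """Count 'remote operation failed' errors per client translator."""
    counts = Counter()
    for line in log_lines:
        m = CLIENT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "[2016-04-14 11:31:25.722372] E [MSGID: 114031] "
    "[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-40: "
    "remote operation failed [Invalid argument]",
    "[2016-04-14 11:31:25.722411] E [MSGID: 114031] "
    "[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-41: "
    "remote operation failed [Invalid argument]",
]
print(tally_failed_clients(sample))
```

Running this over the full mount log would show whether the inodelk failures hit every brick of one disperse subvolume, which is what the EC assertion above suggests.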
Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size
To get the statedump of the fuse client:

1 - get the PID of the fuse mount process
2 - kill -USR1 <pid>

The statedump can be found in the same directory where you get it for the brick process.

The following link could be helpful for future reference:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md

Ashish

- Original Message -
From: "Serkan Çoban" <cobanser...@gmail.com>
To: "Ashish Pandey" <aspan...@redhat.com>
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 15, 2016 4:02:20 PM
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size

Yes, it is only one brick on which the error appears. I can send all the other brick dumps too..
How can I get a statedump on the fuse client? There is no gluster command there..
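The two steps above can be sketched in miniature. glusterfs's real SIGUSR1 handler writes a statedump file to the statedump directory; the toy handler below merely records that the signal arrived, and the process signals itself instead of a fuse mount (Unix-only sketch).

```python
import os
import signal

dump_requests = []

def on_usr1(signum, frame):
    # the real glusterfs handler dumps process state to a file;
    # here we just record that the signal was delivered
    dump_requests.append(signum)

signal.signal(signal.SIGUSR1, on_usr1)

# step 1: find the PID of the target process
# (for a real fuse mount this would come from something like `pgrep -f glusterfs`)
pid = os.getpid()  # stand-in: signal ourselves

# step 2: kill -USR1 <pid>
os.kill(pid, signal.SIGUSR1)

print(dump_requests == [signal.SIGUSR1])  # True once the handler has run
```

The same USR1 mechanism is what `gluster volume statedump` triggers for brick processes behind the scenes, which is why the output lands in the same directory.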
Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size
I think this is the statedump of only one brick. We would require statedumps from all the bricks, and from the client process in the case of FUSE, or the NFS process if it is mounted through NFS.

Ashish

- Original Message -
From: "Serkan Çoban" <cobanser...@gmail.com>
To: "Ashish Pandey" <aspan...@redhat.com>
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
Sent: Friday, April 15, 2016 2:11:57 PM
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size

Sorry for the typo, I meant the brick statedump file.

On Fri, Apr 15, 2016 at 11:41 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
> Hi, I reproduced the problem, the brick statedump file is in the below link:
> https://www.dropbox.com/s/iy09j7mm2hrsf03/bricks-02.5677.dump.1460705370.gz?dl=0
>
> On Thu, Apr 14, 2016 at 8:07 PM, Ashish Pandey <aspan...@redhat.com> wrote:
>> Hi Serkan,
>>
>> Could you also provide us the statedumps of all the brick processes and
>> clients?
>>
>> Commands to generate statedumps for brick processes/nfs server/quotad:
>>
>> For bricks: gluster volume statedump <volname>
>>
>> For nfs server: gluster volume statedump <volname> nfs
>>
>> We can find the directory where statedump files are created using 'gluster
>> --print-statedumpdir'.
>> Also, the mount logs would help us to debug the issue.
>>
>> Ashish
>>
>> From: "Serkan Çoban" <cobanser...@gmail.com>
>> To: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" <gluster-devel@gluster.org>
>> Sent: Thursday, April 14, 2016 6:27:10 PM
>> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size
>>
>> Here is the related brick log:
>>
>> /var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700556] E [inodelk.c:309:__inode_unlock_lock] 0-v0-locks: Matching lock not found for unlock 0-9223372036854775807, by 94d29e885e7f on 0x7f037413b990
>> /var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700639] E [MSGID: 115053] [server-rpc-fops.c:276:server_inodelk_cbk] 0-v0-server: 712984: INODELK /workdir/raw_output/xxx/yyy/zzz.dat.gz.snappy1460474606605 (1191e32e-44ba-4e20-87ca-35ace8519c19) ==> (Invalid argument) [Invalid argument]
>>
>> On Thu, Apr 14, 2016 at 3:25 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
>>> Hi,
>>>
>>> During read/write tests to a 78x(16+4) distributed disperse volume
>>> from 50 clients, one client hangs on read/write with the following
>>> logs:
>>>
>>> [2016-04-14 11:11:04.728580] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-v0-disperse-6: Mismatching xdata in answers of 'LOOKUP'
>>> [2016-04-14 11:11:04.728624] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-v0-disperse-6: Operation failed on some subvolumes (up=F, mask=F, remaining=0, good=D, bad=2)
>>> [2016-04-14 11:11:04.736689] I [MSGID: 122058] [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-6: /workdir/raw_output2: name heal successful on F
>>> [2016-04-14 11:29:26.718036] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-v0-disperse-1: Mismatching xdata in answers of 'LOOKUP'
>>> [2016-04-14 11:29:26.718121] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-v0-disperse-1: Operation failed on some subvolumes (up=F, mask=F, remaining=0, good=E, bad=1)
>>> [2016-04-14 11:29:42.501760] I [MSGID: 122058] [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-1: /workdir/raw_output2: name heal successful on F
>>> [2016-04-14 11:31:25.714812] E [ec-inode-read.c:1612:ec_manager_stat] (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x91) [0x7f5ec9f942b1] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x57) [0x7f5ec9f94497] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x2c4) [0x7f5ec9faaed4] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, >iatt[0].ia_size)
>>> [2016-04-14 11:31:25.722372] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-40: remote opera
[Gluster-devel] Fragment size in Systematic erasure code
Hi Xavi,

I think that for the systematic erasure coded volume you are going to use a fragment size of 512 bytes. Will there be a CLI option to configure this block size?

We were having a discussion, and Manoj suggested having this option, which might improve performance for some workloads. For example, if we can configure it to 8K, a read can be served from only one brick whenever the file size is less than 8K.

Ashish
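The read-locality point above can be made concrete with a little arithmetic. The helper below is purely illustrative (the function name and the round-robin fragment layout are my assumptions, not EC's actual implementation): with a systematic code, the data bricks store plain data fragments, so a read confined to one fragment needs only one brick.

```python
def data_bricks_touched(offset, length, fragment_size, data_bricks):
    """How many data bricks a read of [offset, offset+length) must touch,
    assuming data fragments are laid out round-robin across the data bricks."""
    if length <= 0:
        return 0
    first_frag = offset // fragment_size
    last_frag = (offset + length - 1) // fragment_size
    return min(last_frag - first_frag + 1, data_bricks)

# Reading a whole 6 KB file on a 16+4 volume:
print(data_bricks_touched(0, 6144, 8192, 16))  # 1: the file fits in one 8K fragment
print(data_bricks_touched(0, 6144, 512, 16))   # 12: it spans twelve 512-byte fragments
```

This is the trade-off behind making the fragment size configurable: larger fragments keep small reads on a single brick, while smaller fragments spread large reads across more bricks.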