Re: [Gluster-devel] Regression-test-burn-in crash in EC test
-Atin
Sent from one plus one

On 29-Apr-2016 9:36 PM, "Ashish Pandey" wrote:
>
> Hi Jeff,
>
> Where can we find the core dump?
>
> ---
> Ashish
>
> From: "Pranith Kumar Karampuri"
> To: "Jeff Darcy"
> Cc: "Gluster Devel", "Ashish Pandey" <aspan...@redhat.com>
> Sent: Thursday, April 28, 2016 11:58:54 AM
> Subject: Re: [Gluster-devel] Regression-test-burn-in crash in EC test
>
> Ashish,
> Could you take a look at this?
>
> Pranith
>
> ----- Original Message -----
> > From: "Jeff Darcy"
> > To: "Gluster Devel"
> > Sent: Wednesday, April 27, 2016 11:31:25 PM
> > Subject: [Gluster-devel] Regression-test-burn-in crash in EC test
> >
> > One of the "rewards" of reviewing and merging people's patches is getting
> > email if the next regression-test-burn-in should fail - even if it fails
> > for a completely unrelated reason. Today I got one that's not among the
> > usual suspects. The failure was a core dump in
> > tests/bugs/disperse/bug-1304988.t, weighing in at a respectable 42 frames.
> >
> > #0  0x7fef25976cb9 in dht_rename_lock_cbk
> > #1  0x7fef25955f62 in dht_inodelk_done
> > #2  0x7fef25957352 in dht_blocking_inodelk_cbk
> > #3  0x7fef32e02f8f in default_inodelk_cbk
> > #4  0x7fef25c029a3 in ec_manager_inodelk
> > #5  0x7fef25bf9802 in __ec_manager
> > #6  0x7fef25bf990c in ec_manager
> > #7  0x7fef25c03038 in ec_inodelk
> > #8  0x7fef25bee7ad in ec_gf_inodelk
> > #9  0x7fef25957758 in dht_blocking_inodelk_rec
> > #10 0x7fef25957b2d in dht_blocking_inodelk
> > #11 0x7fef2597713f in dht_rename_lock
> > #12 0x7fef25977835 in dht_rename
> > #13 0x7fef32e0f032 in default_rename
> > #14 0x7fef32e0f032 in default_rename
> > #15 0x7fef32e0f032 in default_rename
> > #16 0x7fef32e0f032 in default_rename
> > #17 0x7fef32e0f032 in default_rename
> > #18 0x7fef32e07c29 in default_rename_resume
> > #19 0x7fef32d8ed40 in call_resume_wind
> > #20 0x7fef32d98b2f in call_resume
> > #21 0x7fef24cfc568 in open_and_resume
> > #22 0x7fef24cffb99 in ob_rename
> > #23 0x7fef24aee482 in mdc_rename
> > #24 0x7fef248d68e5 in io_stats_rename
> > #25 0x7fef32e0f032 in default_rename
> > #26 0x7fef2ab1b2b9 in fuse_rename_resume
> > #27 0x7fef2ab12c47 in fuse_fop_resume
> > #28 0x7fef2ab107cc in fuse_resolve_done
> > #29 0x7fef2ab108a2 in fuse_resolve_all
> > #30 0x7fef2ab10900 in fuse_resolve_continue
> > #31 0x7fef2ab0fb7c in fuse_resolve_parent
> > #32 0x7fef2ab1077d in fuse_resolve
> > #33 0x7fef2ab10879 in fuse_resolve_all
> > #34 0x7fef2ab10900 in fuse_resolve_continue
> > #35 0x7fef2ab0fb7c in fuse_resolve_parent
> > #36 0x7fef2ab1077d in fuse_resolve
> > #37 0x7fef2ab10824 in fuse_resolve_all
> > #38 0x7fef2ab1093e in fuse_resolve_and_resume
> > #39 0x7fef2ab1b40e in fuse_rename
> > #40 0x7fef2ab2a96a in fuse_thread_proc
> > #41 0x7fef3204daa1 in start_thread
> >
> > In other words we started at FUSE, went through a bunch of performance
> > translators, through DHT to EC, and then crashed on the way back. It
> > seems a little odd that we turn the fop around immediately in EC, and
> > that we have default_inodelk_cbk at frame 3. Could one of the DHT or EC
> > people please take a look at it? Thanks!
> >
> > https://build.gluster.org/job/regression-test-burn-in/868/console

This is the one.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
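For anyone picking this up: a backtrace like the one above is usually pulled out of the core dump with gdb in batch mode. The binary and core paths below are placeholders, not the actual locations on the build slaves:

```
# Substitute the real glusterfs binary and core paths from the build slave.
gdb -batch -ex "thread apply all bt full" /usr/local/sbin/glusterfs /path/to/core
```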
[Gluster-devel] [IMPORTANT] Adding release notes for 3.8 features
Hi all,

The branching for 3.8 will happen on April 30th, 2016. Since we are approaching the last stage of the 3.8 release, a public pad [1] has been created for adding release notes. It should mention major changes that may impact the overall working of a feature. For example, we are planning to deprecate Gluster/NFS from 3.8, i.e. when a volume is started, the NFS server won't start by default. The user needs to turn off the "nfs.disable" option to bring up Gluster/NFS.

So I kindly request all the feature owners to update the release notes for their features. Also, please update your progress on 3.8 features in the roadmap [2].

[1] https://public.pad.fsfe.org/p/glusterfs-3.8-release-notes
[2] https://www.gluster.org/community/roadmap/3.8/

Thanks,
Niels & Jiffin
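For reference, a sketch of the command involved in the nfs.disable example above (the volume name is a placeholder; the default change itself is only planned for 3.8 as described):

```
# Bring Gluster/NFS back up on a volume where it has been disabled:
gluster volume set <VOLNAME> nfs.disable off
```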
[Gluster-devel] .glusterfs grown larger than volume content
Hello,

We've noticed that the .glusterfs directory is larger than the contents of the volume. Our application only has access through the client, so I don't suspect anything was deleted directly on the brick.

# du -sh .glusterfs
31G .glusterfs/
# du -sh *
13G dir1
31M dir2

How could we have come into this state? Is there a way to find what is orphaned? We tried looking for any references to deleted files, but it didn't seem to yield much:

# find .glusterfs -links 1 -ls
2211630 lrwxrwxrwx 1 root root 51 Mar  4 14:34 .glusterfs/91/ff/91ffa-f20f-4933-a8d6-abx93074 -> ../../00/00/----0001/dir2
3835150 lrwxrwxrwx 1 root root 59 Mar  4 15:08 .glusterfs/b1/2d/bd5b5-e00c-4bd1-95c6-312a25 -> ../../7e/85/7cxxx90-88e9-4cdd-95fd-dd48/recyclebin
4494050 lrwxrwxrwx 1 root root 51 Mar  4 15:08 .glusterfs/21/28/2102-101e-4177-b775-74379ba -> ../../00/00/----0001/dir2
3941500 lrwxrwxrwx 1 root root 59 Apr  4 13:24 .glusterfs/c7/2b/c728-877-49a-b7d-3b3149c -> ../../e1/09/e10xx94e-c5xcd-4c1f-95f-4824106e/recyclebin
2299340 lrwxrwxrwx 1 root root 60 Mar  4 15:08 .glusterfs/00/00/----0006 -> ../../00/00/----0005/internal_op
2129310 lrwxrwxrwx 1 root root  8 Mar  4 15:08 .glusterfs/00/00/----0001 -> ../../..
4775410 lrwxrwxrwx 1 root root 58 Mar  4 15:08 .glusterfs/00/00/----0005 -> ../../00/00/----0001/.trashcan
3850480 lrwxrwxrwx 1 root root 55 Mar 23 12:02 .glusterfs/b3/21/b3xxb20-4b23-4e93-8db4-3dxx8x6e -> ../../e1/09/e10084e-c5cd-4c1f-95f-482106e/videos
2199364 -rw-r--r-- 1 root root 19 Apr 27 10:54 .glusterfs/health_check
2640270 -- 1 root root 0 Apr 26 13:01 .glusterfs/indices/xattrop/xattrop-2198-d683-431-bxx2-103474
2129410 lrwxrwxrwx 1 root root 51 Mar  4 14:24 .glusterfs/e1/09/e1xxx4e-c5d-4c1f-95f-482xe -> ../../00/00/----0001/dir1
3976650 lrwxrwxrwx 1 root root 51 Mar  4 15:08 .glusterfs/7e/85/757c90-8e9-4cdd-95fd-dd48 -> ../../00/00/----0001/dir1
270337 20 -rw-r--r-- 1 root root 20480 Dec 14 23:03 .glusterfs/data.db

We are running on a single node, but when I added a second node and performed a full heal, the .glusterfs directory size was the same as the volume content size, which is what we expected.

Version: glusterfs 3.7.3
OS: CentOS 5

Any advice would be much appreciated! Thanks!

Vincent
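In case it helps others hitting this: the healthy case is that each regular file under .glusterfs is a hard link to a file in the volume (link count >= 2), so `find .glusterfs -type f -links 1` is a reasonable first pass for orphans. A self-contained sketch of the idea follows; the layout is a toy imitation, not a real brick, and note that real bricks also keep single-link housekeeping files (e.g. health_check) that would need filtering out:

```shell
# Toy brick layout: .glusterfs entries are hard links to the data files.
mkdir -p demo/.glusterfs/ab/cd demo/dir1
echo data > demo/dir1/file
ln demo/dir1/file demo/.glusterfs/ab/cd/gfid-linked   # healthy: link count 2
echo stale > demo/.glusterfs/ab/cd/gfid-orphan        # orphan: link count 1

# Only the orphan shows up:
find demo/.glusterfs -type f -links 1
```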
Re: [Gluster-devel] Regression-test-burn-in crash in EC test
Hi Jeff,

Where can we find the core dump?

---
Ashish

----- Original Message -----
From: "Pranith Kumar Karampuri"
To: "Jeff Darcy"
Cc: "Gluster Devel", "Ashish Pandey"
Sent: Thursday, April 28, 2016 11:58:54 AM
Subject: Re: [Gluster-devel] Regression-test-burn-in crash in EC test

Ashish,
Could you take a look at this?

Pranith

----- Original Message -----
> From: "Jeff Darcy"
> To: "Gluster Devel"
> Sent: Wednesday, April 27, 2016 11:31:25 PM
> Subject: [Gluster-devel] Regression-test-burn-in crash in EC test
>
> One of the "rewards" of reviewing and merging people's patches is getting
> email if the next regression-test-burn-in should fail - even if it fails
> for a completely unrelated reason. Today I got one that's not among the
> usual suspects. The failure was a core dump in
> tests/bugs/disperse/bug-1304988.t, weighing in at a respectable 42 frames.
>
> #0  0x7fef25976cb9 in dht_rename_lock_cbk
> #1  0x7fef25955f62 in dht_inodelk_done
> #2  0x7fef25957352 in dht_blocking_inodelk_cbk
> #3  0x7fef32e02f8f in default_inodelk_cbk
> #4  0x7fef25c029a3 in ec_manager_inodelk
> #5  0x7fef25bf9802 in __ec_manager
> #6  0x7fef25bf990c in ec_manager
> #7  0x7fef25c03038 in ec_inodelk
> #8  0x7fef25bee7ad in ec_gf_inodelk
> #9  0x7fef25957758 in dht_blocking_inodelk_rec
> #10 0x7fef25957b2d in dht_blocking_inodelk
> #11 0x7fef2597713f in dht_rename_lock
> #12 0x7fef25977835 in dht_rename
> #13 0x7fef32e0f032 in default_rename
> #14 0x7fef32e0f032 in default_rename
> #15 0x7fef32e0f032 in default_rename
> #16 0x7fef32e0f032 in default_rename
> #17 0x7fef32e0f032 in default_rename
> #18 0x7fef32e07c29 in default_rename_resume
> #19 0x7fef32d8ed40 in call_resume_wind
> #20 0x7fef32d98b2f in call_resume
> #21 0x7fef24cfc568 in open_and_resume
> #22 0x7fef24cffb99 in ob_rename
> #23 0x7fef24aee482 in mdc_rename
> #24 0x7fef248d68e5 in io_stats_rename
> #25 0x7fef32e0f032 in default_rename
> #26 0x7fef2ab1b2b9 in fuse_rename_resume
> #27 0x7fef2ab12c47 in fuse_fop_resume
> #28 0x7fef2ab107cc in fuse_resolve_done
> #29 0x7fef2ab108a2 in fuse_resolve_all
> #30 0x7fef2ab10900 in fuse_resolve_continue
> #31 0x7fef2ab0fb7c in fuse_resolve_parent
> #32 0x7fef2ab1077d in fuse_resolve
> #33 0x7fef2ab10879 in fuse_resolve_all
> #34 0x7fef2ab10900 in fuse_resolve_continue
> #35 0x7fef2ab0fb7c in fuse_resolve_parent
> #36 0x7fef2ab1077d in fuse_resolve
> #37 0x7fef2ab10824 in fuse_resolve_all
> #38 0x7fef2ab1093e in fuse_resolve_and_resume
> #39 0x7fef2ab1b40e in fuse_rename
> #40 0x7fef2ab2a96a in fuse_thread_proc
> #41 0x7fef3204daa1 in start_thread
>
> In other words we started at FUSE, went through a bunch of performance
> translators, through DHT to EC, and then crashed on the way back. It
> seems a little odd that we turn the fop around immediately in EC, and
> that we have default_inodelk_cbk at frame 3. Could one of the DHT or EC
> people please take a look at it? Thanks!
>
> https://build.gluster.org/job/regression-test-burn-in/868/console
[Gluster-devel] Gluster Download link
Hi,

I am unable to install GlusterFS-Server using the installation instructions provided here:
http://www.gluster.org/community/documentation/index.php/Getting_started_install

I am using CentOS 7, and when I enter the command:

wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/RHEL/glusterfs-epel.repo

I get the message that it was not found. I manually searched for the repo link, as it seems to have changed, and after entering it, if I run "yum install glusterfs" it again fails, saying that the index file was not found. Something is wrong; please help in fixing it. Please look at the links and make the required corrections so I can install the GlusterFS server, as the setup currently seems to be broken.

Thank you,
Shakti Rathore
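A possible workaround until the documented links are fixed: on CentOS 7 the Gluster packages are also published through the CentOS Storage SIG, so something like the following may work (hedged; exact package availability depends on the CentOS release and repo state):

```
yum install centos-release-gluster
yum install glusterfs-server
```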
Re: [Gluster-devel] netbsd smoke failure
Le vendredi 29 avril 2016 à 10:05 -0400, Susant Palai a écrit :
> Hi All,
> On many of my patches the following error is seen from the netbsd smoke test.
>
> Triggered by Gerrit: http://review.gluster.org/13993
> Building remotely on netbsd0.cloud.gluster.org (netbsd_build) in workspace /home/jenkins/root/workspace/netbsd6-smoke
> > git rev-parse --is-inside-work-tree # timeout=10
> Fetching changes from the remote Git repository
> > git config remote.origin.url git://review.gluster.org/glusterfs.git # timeout=10
> ERROR: Error fetching remote repo 'origin'
> hudson.plugins.git.GitException: Failed to fetch from git://review.gluster.org/glusterfs.git
> 	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:810)
> 	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1066)
> 	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1097)
> 	at hudson.scm.SCM.checkout(SCM.java:485)
> 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
> 	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
> 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
> 	at hudson.model.Run.execute(Run.java:1738)
> 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 	at hudson.model.ResourceController.execute(ResourceController.java:98)
> 	at hudson.model.Executor.run(Executor.java:410)
> Caused by: hudson.plugins.git.GitException: Command "git config remote.origin.url git://review.gluster.org/glusterfs.git" returned status code 255:
> stdout:
> stderr: error: could not lock config file .git/config: File exists
>
> Please let me know how this can be resolved.

Since I didn't have access, I ran a Groovy script to add my key as root. Then I rebooted the server (some old processes were still running) and removed the file causing the trouble (.git/config.lock in the repo). I will re-enable the builder soon.
--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
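The failure and the fix can be reproduced locally: a stale lock file is exactly what makes `git config` return status 255. A small sketch (the repo name and URL here are arbitrary):

```shell
git init -q lockdemo
touch lockdemo/.git/config.lock          # simulate a crashed/interrupted git
git -C lockdemo config remote.origin.url git://example/repo.git 2>err || true
grep -q 'could not lock config file' err && echo blocked
rm -f lockdemo/.git/config.lock          # the fix applied on the builder
git -C lockdemo config remote.origin.url git://example/repo.git && echo fixed
```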
Re: [Gluster-devel] [Gluster-users] gluster 3.7.9 permission denied and mv errors
Raghavendra,

This error is occurring in a shell script moving files between directories on a FUSE mount when overwriting an old file with a newer file (it's a backup script, moving an incremental backup of a file into a 'rolling full backup' directory).

As a temporary workaround, we parse the output of this shell script for move errors and handle the errors as they happen. Simply re-moving the files fails, so we stat the destination (to see if we can learn anything about the type of file that causes this behavior), delete the destination, and try the move again (success!). Typical output is as follows:

> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
>   File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
>   Size: 1714  Blocks: 4  IO Block: 131072  regular file
> Device: 13h/19d  Inode: 11051758947722304158  Links: 1
> Access: (0660/-rw-rw)  Uid: ( 628/pkeistler)  Gid: ( 2020/gmirl)
> Access: 2016-01-20 17:20:45.0 -0500
> Modify: 2015-11-06 15:20:41.0 -0500
> Change: 2016-01-27 03:35:00.434712146 -0500
> retry: renaming ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4 -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4

Not sure if that description rings any bells as to what the problem might be, but if not, I added some code to print out the 'getattr' for the source and destination file on all of the bricks (before we delete the destination) and will post to this thread the next time we have that issue.

Thanks,
Patrick

On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G wrote:
>
> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson <david.robin...@corvidtec.com> wrote:
>
>> I am running into two problems (possibly related?).
>>
>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back with an error:
>> rm: cannot remove `DIRNAME`: Directory not empty
>>
>> If I try the 'rm -rf' again after the error, it deletes the directory. The issue is that I have scripts that clean up directories, and they are failing unless I go through the deletes a 2nd time.
>
> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with ENOTEMPTY in some specific cases.
>
>> 2) I have different scripts to move large numbers of files (5-25k) from one directory to another. Sometimes I receive an error:
>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>
> Does ./bkp00/xyz exist on the backend? If yes, what is the value of the gfid xattr (key: "trusted.gfid") for "xyz" and "./bkp00/xyz" on the backend bricks (I need the gfid from all the bricks) when this issue happens?
>
>> The move is done using '/bin/mv -f', so it should overwrite the file if it exists. I have tested this with hundreds of files, and it works as expected. However, every few days the script that moves the files will have problems with 1 or 2 files during the move. This is one move problem out of roughly 10,000 files that are being moved, and I cannot figure out any reason for the intermittent problem.
>>
>> Setup details for my gluster configuration shown below.
>> [root@gfs01bkp logs]# gluster volume info
>>
>> Volume Name: gfsbackup
>> Type: Distribute
>> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
>> Status: Started
>> Number of Bricks: 7
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
>> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
>> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
>> Options Reconfigured:
>> nfs.disable: off
>> server.allow-insecure: on
>> storage.owner-gid: 100
>> server.manage-gids: on
>> cluster.lookup-optimize: on
>> server.event-threads: 8
>> client.event-threads: 8
>> changelog.changelog: off
>> storage.build-pgfid: on
>> performance.readdir-ahead: on
>> diagnostics.brick-log-level: WARNING
>> diagnostics.client-log-level: WARNING
>> cluster.rebal-throttle: aggressive
>> performance.cache-size: 1024MB
>> performance.write-behind-window-size: 10MB
>>
>> [root@gfs01bkp logs]# rpm -qa | grep gluster
>> glusterfs-server-3.7.9-1.el6.x86_64
>> glusterfs-debuginfo-3.7.9-1.el6.x86_64
>> glusterfs-api-3.7.9-1.el6.x86_64
>>
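When the issue recurs, the gfid xattr Raghavendra asked about can be read directly on each brick with getfattr; the brick path below is illustrative, taken from the volume info above:

```
getfattr -n trusted.gfid -e hex /data/brick01bkp/gfsbackup/<path-to-xyz>
```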
[Gluster-devel] netbsd smoke failure
Hi All,

On many of my patches the following error is seen from the netbsd smoke test.

Triggered by Gerrit: http://review.gluster.org/13993
Building remotely on netbsd0.cloud.gluster.org (netbsd_build) in workspace /home/jenkins/root/workspace/netbsd6-smoke
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url git://review.gluster.org/glusterfs.git # timeout=10
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from git://review.gluster.org/glusterfs.git
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:810)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1066)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1097)
	at hudson.scm.SCM.checkout(SCM.java:485)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
	at hudson.model.Run.execute(Run.java:1738)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:410)
Caused by: hudson.plugins.git.GitException: Command "git config remote.origin.url git://review.gluster.org/glusterfs.git" returned status code 255:
stdout:
stderr: error: could not lock config file .git/config: File exists

Please let me know how this can be resolved. Here are a few links of netbsd logs:
https://build.gluster.org/job/netbsd6-smoke/13136/console
https://build.gluster.org/job/netbsd6-smoke/13137/console

Thanks,
Susant
Re: [Gluster-devel] Possible bug in the communications layer ?
With your patch applied, it seems that the bug is not hit. I guess it's a timing issue that the new logging hides. Maybe no more data is available after reading the partial readv header? (it will arrive later).

I'll continue testing...

Xavi

On 29/04/16 13:48, Raghavendra Gowdappa wrote:
> Attaching the patch.
>
> ----- Original Message -----
>> From: "Raghavendra Gowdappa"
>> To: "Xavier Hernandez"
>> Cc: "Gluster Devel"
>> Sent: Friday, April 29, 2016 5:14:02 PM
>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>>
>> ----- Original Message -----
>>> From: "Xavier Hernandez"
>>> To: "Raghavendra Gowdappa"
>>> Cc: "Gluster Devel"
>>> Sent: Friday, April 29, 2016 1:21:57 PM
>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>>>
>>> Hi Raghavendra,
>>>
>>> yes, the readv response contains xdata. The dict length is 38 (0x26)
>>> and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26.
>>
>> rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that
>> the approach used in socket.c to get the length of xdata is correct.
>> However, I cannot find any other way of xdata going into the payload
>> vector other than xdata_len being zero. Just to be double sure, I've a
>> patch containing a debug message printing xdata_len when decoding fails
>> in socket.c. Can you please apply the patch, run the tests and revert
>> back with the results?
>>
>>> Xavi
>>>
>>> On 29/04/16 09:10, Raghavendra Gowdappa wrote:
>>>> ----- Original Message -----
>>>>> From: "Raghavendra Gowdappa"
>>>>> To: "Xavier Hernandez"
>>>>> Cc: "Gluster Devel"
>>>>> Sent: Friday, April 29, 2016 12:36:43 PM
>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Raghavendra Gowdappa"
>>>>>> To: "Xavier Hernandez"
>>>>>> Cc: "Jeff Darcy", "Gluster Devel"
>>>>>> Sent: Friday, April 29, 2016 12:07:59 PM
>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Xavier Hernandez"
>>>>>>> To: "Jeff Darcy"
>>>>>>> Cc: "Gluster Devel"
>>>>>>> Sent: Thursday, April 28, 2016 8:15:36 PM
>>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:
>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
>> problem easily doing the following test:
>>
>> iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>> echo 3 > /proc/sys/vm/drop_caches
>> iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>
>> The error happens soon after starting the read test. As can be seen in
>> the data below, client3_3_readv_cbk() is processing an iovec of 116
>> bytes, however it should be of 154 bytes (the buffer in memory really
>> seems to contain 154 bytes). The data on the network seems ok (at least
>> I haven't been able to identify any problem), so this must be a
>> processing error on the client side. The last field in the cut buffer of
>> the serialized data corresponds to the length of the xdata field: 0x26.
>> So at least 38 more bytes should be present.
>
> Nice detective work, Xavi. It would be *very* interesting to see what the
> value of the "count" parameter is (it's unfortunately optimized out). I'll
> bet it's two, and iov[1].iov_len is 38. I have a weak memory of some
> problems with how this iov is put together, a couple of years ago, and it
> looks like you might have tripped over one more.

It seems you are right. The count is 2 and the first 38 bytes of the second vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and for that matter all the actors/cbks) expects the response in at most two vectors:

1. The program header containing the request or response. This is subjected to decoding/encoding. This vector should point to a buffer that contains the entire program header/response contiguously.

2. If the procedure returns a payload (like a readv response or a write request), the second vector contains the buffer pointing to the entire (contiguous) payload. Note that this payload is raw and is not subjected to encoding/decoding.
In your case, this _clean_ separation is broken, with part of the program header slipping into the 2nd vector that is supposed to contain the read data (maybe because of rpc fragmentation). I think this is a bug in the socket layer. I'll update more on this.

Does your read response include xdata too? I think the code related to reading xdata in the readv response is a bit murky:

case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
        default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                        &rsp);

        proghdr_buf = frag->fragcurrent;
Re: [Gluster-devel] How to enable ACL support in Glusterfs volume
Hi Niels,

Now I am able to run the 'setfacl' command on a Gluster volume using kernel NFS. The problem was that, in the /etc/exports file, I was exporting the Gluster volume mount point which I get after mounting the gluster volume, like:

mount -t glusterfs -o acl :/

But instead of using this mount point, I need to export the volume brick path, i.e. '/tmp/brick/gv0':

gluster volume info

Volume Name: gv0
Type: Distribute
Volume ID: c3d636aa-f718-47b2-90eb-2b5846ad52a2
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 128.224.95.140:/tmp/brick/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on

and mount on the remote host as below:

mount -t nfs -o acl,vers=3 128.224.95.140:/tmp/brick/gv0

then run:

setfacl -m u:nobody:rw /

Now I have one question here, please answer it: is there any side effect to exporting the brick path?

Regards,
Abhishek

On Thu, Apr 28, 2016 at 5:35 PM, ABHISHEK PALIWAL wrote:
>
> On Thu, Apr 28, 2016 at 4:13 PM, Niels de Vos wrote:
>
>> On Thu, Apr 28, 2016 at 12:05:37PM +0530, ABHISHEK PALIWAL wrote:
>>> Hi,
>>>
>>> I have one more query:
>>>
>>> I am using a machine with IP 10.32.0.48 where gluster is running, and I
>>> mounted my gluster volume as follows:
>>>
>>> mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /mnt/c
>>>
>>> and after that I mounted the /mnt/c volume to /tmp/l on the same
>>> machine 10.32.0.48:
>>>
>>> mount -t nfs -o acl,vers=3 10.32.0.48:/mnt/c /tmp/l
>>>
>>> When I run the setfacl command on /tmp/l (mounted as nfs) it is not
>>> working:
>>> # setfacl -m u:application:r /tmp/l/usr
>>> setfacl: /tmp/l/usr: Operation not supported
>>>
>>> but when I run the setfacl command on /mnt/c (mounted as glusterfs) it
>>> is working:
>>> # setfacl -m u:application:r /mnt/c
>>>
>>> Could you please tell me the reason for this.
>>
>> Note that NFSv3 ACLs are not part of the NFS protocol itself. It is
>> handled by a side-band protocol.
>> If all ACL operations on any NFS server fail, make sure to check that
>> the ports for NFSv3 ACLs are open. You can check that with
>> 'rpcinfo -p $NFS_SERVER'.
>
> ACL operations work fine with other NFS servers, and the ports are also
> open.
>
>> Gluster/NFS should have ACLs enabled by default. It is possible to
>> disable support for ACLs in Gluster/NFS with the 'nfs.acl' volume
>> option; just make sure that the option is not set, or is set to 'true'.
>
> I have tried with the Gluster/NFS options
>
> nfs.disable off
> nfs.acl on
>
> but am still getting a setfacl command failure.
>
>> HTH,
>> Niels
>
>>> On Wed, Apr 27, 2016 at 4:56 PM, Niels de Vos wrote:
>>>
>>>> Thank you for your email.
>>>>
>>>> I am out of the office on 27-April-2016 and will return on
>>>> 28-April-2016. While I am out I will have limited access to email.
>>>> When I have returned, I will respond to your message as soon as
>>>> possible.
>>>>
>>>> Many thanks,
>>>> Niels de Vos

--
Regards
Abhishek Paliwal
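For completeness, a minimal /etc/exports sketch matching the brick-path export described above (the options are illustrative assumptions, not taken from the thread):

```
# /etc/exports
/tmp/brick/gv0  *(rw,sync,no_subtree_check)
```

After editing /etc/exports, `exportfs -ra` reloads the kernel NFS export table.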
Re: [Gluster-devel] gluster 3.7.9 permission denied and mv errors
On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson <david.robin...@corvidtec.com> wrote:

> I am running into two problems (possibly related?).
>
> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back with
> an error:
> rm: cannot remove `DIRNAME`: Directory not empty
>
> If I try the 'rm -rf' again after the error, it deletes the directory. The
> issue is that I have scripts that clean up directories, and they are
> failing unless I go through the deletes a 2nd time.

What kind of mount are you using? Is it a FUSE or NFS mount? Recently we saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with ENOTEMPTY in some specific cases.

> 2) I have different scripts to move a large numbers of files (5-25k) from
> one directory to another. Sometimes I receive an error:
> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists

Does ./bkp00/xyz exist on the backend? If yes, what is the value of the gfid xattr (key: "trusted.gfid") for "xyz" and "./bkp00/xyz" on the backend bricks (I need the gfid from all the bricks) when this issue happens?

> The move is done using '/bin/mv -f', so it should overwrite the file if it
> exists. I have tested this with hundreds of files, and it works as
> expected. However, every few days the script that moves the files will
> have problems with 1 or 2 files during the move. This is one move problem
> out of roughly 10,000 files that are being moved and I cannot figure out
> any reason for the intermittent problem.
>
> Setup details for my gluster configuration shown below.
> [root@gfs01bkp logs]# gluster volume info
>
> Volume Name: gfsbackup
> Type: Distribute
> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
> Status: Started
> Number of Bricks: 7
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
> Options Reconfigured:
> nfs.disable: off
> server.allow-insecure: on
> storage.owner-gid: 100
> server.manage-gids: on
> cluster.lookup-optimize: on
> server.event-threads: 8
> client.event-threads: 8
> changelog.changelog: off
> storage.build-pgfid: on
> performance.readdir-ahead: on
> diagnostics.brick-log-level: WARNING
> diagnostics.client-log-level: WARNING
> cluster.rebal-throttle: aggressive
> performance.cache-size: 1024MB
> performance.write-behind-window-size: 10MB
>
> [root@gfs01bkp logs]# rpm -qa | grep gluster
> glusterfs-server-3.7.9-1.el6.x86_64
> glusterfs-debuginfo-3.7.9-1.el6.x86_64
> glusterfs-api-3.7.9-1.el6.x86_64
> glusterfs-resource-agents-3.7.9-1.el6.noarch
> gluster-nagios-common-0.1.1-0.el6.noarch
> glusterfs-libs-3.7.9-1.el6.x86_64
> glusterfs-fuse-3.7.9-1.el6.x86_64
> glusterfs-extra-xlators-3.7.9-1.el6.x86_64
> glusterfs-geo-replication-3.7.9-1.el6.x86_64
> glusterfs-3.7.9-1.el6.x86_64
> glusterfs-cli-3.7.9-1.el6.x86_64
> glusterfs-devel-3.7.9-1.el6.x86_64
> glusterfs-rdma-3.7.9-1.el6.x86_64
> samba-vfs-glusterfs-4.1.11-2.el6.x86_64
> glusterfs-client-xlators-3.7.9-1.el6.x86_64
> glusterfs-api-devel-3.7.9-1.el6.x86_64
> python-gluster-3.7.9-1.el6.noarch

--
Raghavendra G
Re: [Gluster-devel] Gluster + Infiniband + 3.x kernel -> hard crash?
On Thu, Apr 7, 2016 at 2:02 AM, Glomski, Patrick <patrick.glom...@corvidtec.com> wrote:

> We run gluster 3.7 in a distributed replicated setup. Infiniband (tcp)
> links the gluster peers together and clients use the ethernet interface.
>
> This setup is stable running CentOS 6.x and using the most recent
> infiniband drivers provided by Mellanox. Uptime was 170 days when we took
> it down to wipe the systems and update to CentOS 7.
>
> When the exact same setup is loaded onto a CentOS 7 machine (minor setup
> differences, but basically the same; setup is handled by ansible), the
> peers will (seemingly randomly) experience a hard crash and need to be
> power-cycled. There is no output on the screen and nothing in the logs.
> After rebooting, the peer reconnects, heals whatever files it missed, and
> everything is happy again. Maximum uptime for any given peer is 20 days.
> Thanks to the replication, clients maintain connectivity, but from a
> system administration perspective it's driving me crazy!
>
> We run other storage servers with the same infiniband and CentOS 7 setup,
> except that they use NFS instead of gluster. NFS shares are served through
> infiniband to some machines and ethernet to others.
>
> Is it possible that gluster's (and only gluster's) use of the infiniband
> kernel module to send tcp packets to its peers on a 3.x kernel is causing
> the system to have a hard crash?

Please note that Gluster is only a "userspace" consumer of infiniband. So, at least in theory, it shouldn't result in a kernel panic. However, infiniband also allows userspace programs to do some things which can normally be done only by the kernel (like pinning pages to a specific address). I am not very familiar with the internals of infiniband and hence cannot authoritatively comment on whether a kernel panic is possible or impossible. Someone with an understanding of infiniband internals would be in a better position to comment on this.
Pretty specific problem and it doesn't make much sense to me, but that's > sure where the evidence seems to point. > > Anyone running CentOS 7 gluster arrays with infiniband out there to > confirm that it works fine for them? Gluster devs care to chime in with a > better theory? I'd love for this random crashing to stop. > > Thanks, > Patrick > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > -- Raghavendra G ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression-test-burn-in crash in EC test
Seems like I missed adding rtalur/sakshi to the cc list. On Fri, Apr 29, 2016 at 5:25 PM, Raghavendra G wrote: > Raghavendra Talur reported another crash in dht_rename_lock_cbk (which is > similar - not exactly the same - to the bt presented here). I heard Sakshi is > taking a look into this. > > Rtalur/Sakshi, > > Can you please post your findings here? > > regards, > Raghavendra > > On Fri, Apr 29, 2016 at 4:50 PM, Jeff Darcy wrote: > >> > The test is doing renames where source and target directories are >> > different. At the same time a new ec-set is added and rebalance started. >> > Rebalance will cause dht to also move files between bricks. Maybe this >> > is causing some race in dht ? >> > >> > I'll try to continue investigating when I have some time. >> >> That would be great, but if you've pursued this as far as DHT then it >> would be OK to hand it off to that team as well. Thanks! >> > > -- > Raghavendra G > -- Raghavendra G
Re: [Gluster-devel] Possible bug in the communications layer ?
Attaching the patch. - Original Message - > From: "Raghavendra Gowdappa"> To: "Xavier Hernandez" > Cc: "Gluster Devel" > Sent: Friday, April 29, 2016 5:14:02 PM > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > - Original Message - > > From: "Xavier Hernandez" > > To: "Raghavendra Gowdappa" > > Cc: "Gluster Devel" > > Sent: Friday, April 29, 2016 1:21:57 PM > > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > Hi Raghavendra, > > > > yes, the readv response contains xdata. The dict length is 38 (0x26) > > and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26. > > rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the > approach used in socket.c to get the length of xdata is correct. However, I > cannot find any other way of xdata going into payload vector other than > xdata_len being zero. Just to be double sure, I've a patch containing debug > message printing xdata_len when decoding fails in socket.c. Can you please > apply the patch, run the tests and revert back with results? > > > > > Xavi > > > > On 29/04/16 09:10, Raghavendra Gowdappa wrote: > > > > > > > > > - Original Message - > > >> From: "Raghavendra Gowdappa" > > >> To: "Xavier Hernandez" > > >> Cc: "Gluster Devel" > > >> Sent: Friday, April 29, 2016 12:36:43 PM > > >> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > >> > > >> > > >> > > >> - Original Message - > > >>> From: "Raghavendra Gowdappa" > > >>> To: "Xavier Hernandez" > > >>> Cc: "Jeff Darcy" , "Gluster Devel" > > >>> > > >>> Sent: Friday, April 29, 2016 12:07:59 PM > > >>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > >>> > > >>> > > >>> > > >>> - Original Message - > > From: "Xavier Hernandez" > > To: "Jeff Darcy" > > Cc: "Gluster Devel" > > Sent: Thursday, April 28, 2016 8:15:36 PM > > Subject: Re: [Gluster-devel] Possible bug in the communications layer > > ? 
> > > > > > > > Hi Jeff, > > > > On 28.04.2016 15:20, Jeff Darcy wrote: > > > > > > > > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. > > The > > volume is a distributed-disperse 4*(4+2). I'm able to reproduce the > > problem > > easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w > > -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g > > -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after > > starting > > the > > read test. As can be seen in the data below, client3_3_readv_cbk() is > > processing an iovec of 116 bytes, however it should be of 154 bytes > > (the > > buffer in memory really seems to contain 154 bytes). The data on the > > network > > seems ok (at least I haven't been able to identify any problem), so > > this > > must be a processing error on the client side. The last field in cut > > buffer > > of the sequentialized data corresponds to the length of the xdata > > field: > > 0x26. So at least 38 more bytes should be present. > > Nice detective work, Xavi. It would be *very* interesting to see what > > the value of the "count" parameter is (it's unfortunately optimized > > out). > > I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory of > > some problems with how this iov is put together, a couple of years > > ago, > > and it looks like you might have tripped over one more. > > It seems you are right. The count is 2 and the first 38 bytes of the > > second > > vector contains the remaining data of the xdata field. > > >>> > > >>> This is the bug. client3_3_readv_cbk (and for that matter all the > > >>> actors/cbks) expects the response in at most two vectors: > > >>> 1. Program header containing request or response. This is subjected to > > >>> decoding/encoding. This vector should point to a buffer that contains > > >>> the > > >>> entire program header/response contiguously. > > >>> 2.
If the procedure returns payload (like readv response or a write > > >>> request), > > >>> second vector contains the buffer pointing to the entire (contiguous) > > >>> payload. Note that this payload is raw and is not subjected to > > >>> encoding/decoding. > > >>> > > >>> In your case, this _clean_ separation is broken with part of program > > >>> header > > >>> slipping into 2nd vector supposed to contain read data (may be because > > >>>
Re: [Gluster-devel] How use Gluster/NFS
On 04/29/2016 07:34 AM, Rick Macklem wrote: > Abhishek Paliwal wrote: >> Hi Team, >> >> I want to use gluster NFS and export this gluster volume using 'mount -t nfs >> -o acl' command. >> >> i have done the following changes: >> 1. Enable the NFS using nfs.disable off >> 2. Enable the ACL using nfs.acl on >> 3. RPCbind is also running >> 4. Kernel NFS is stopped >> > You could try setting > nfs.register-with-portmap on > I thought it was enabled by default, but maybe that changed > when the default for nfs.disable changed? The default for nfs.disable is _only_ changing starting with GlusterFS-3.8. GlusterFS-3.8 HASN'T BEEN RELEASED YET. IOW the default for nfs.disable has _not_ changed in GlusterFS-3.7 and nfs.register-with-portmap _remains_ enabled by default; and will remain enabled by default even in GlusterFS-3.8. -- Kaleb
Re: [Gluster-devel] Possible bug in the communications layer ?
- Original Message - > From: "Xavier Hernandez"> To: "Raghavendra Gowdappa" > Cc: "Gluster Devel" > Sent: Friday, April 29, 2016 1:21:57 PM > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > Hi Raghavendra, > > yes, the readv response contains xdata. The dict length is 38 (0x26) > and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26. rsp.xdata.xdata_len having 0x26 even when decoding failed indicates that the approach used in socket.c to get the length of xdata is correct. However, I cannot find any other way of xdata going into payload vector other than xdata_len being zero. Just to be double sure, I've a patch containing debug message printing xdata_len when decoding fails in socket.c. Can you please apply the patch, run the tests and revert back with results? > > Xavi > > On 29/04/16 09:10, Raghavendra Gowdappa wrote: > > > > > > - Original Message - > >> From: "Raghavendra Gowdappa" > >> To: "Xavier Hernandez" > >> Cc: "Gluster Devel" > >> Sent: Friday, April 29, 2016 12:36:43 PM > >> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > >> > >> > >> > >> - Original Message - > >>> From: "Raghavendra Gowdappa" > >>> To: "Xavier Hernandez" > >>> Cc: "Jeff Darcy" , "Gluster Devel" > >>> > >>> Sent: Friday, April 29, 2016 12:07:59 PM > >>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > >>> > >>> > >>> > >>> - Original Message - > From: "Xavier Hernandez" > To: "Jeff Darcy" > Cc: "Gluster Devel" > Sent: Thursday, April 28, 2016 8:15:36 PM > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > Hi Jeff, > > On 28.04.2016 15:20, Jeff Darcy wrote: > > > > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The > volume is a distributed-disperse 4*(4+2). 
I'm able to reproduce the > problem > easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w > -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g > -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting > the > read test. As can be seen in the data below, client3_3_readv_cbk() is > processing an iovec of 116 bytes, however it should be of 154 bytes (the > buffer in memory really seems to contain 154 bytes). The data on the > network > seems ok (at least I haven't been able to identify any problem), so this > must be a processing error on the client side. The last field in cut > buffer > of the sequentialized data corresponds to the length of the xdata field: > 0x26. So at least 38 more bytes should be present. > Nice detective work, Xavi. It would be *very* interesting to see what > the value of the "count" parameter is (it's unfortunately optimized > out). > I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory of > some problems with how this iov is put together, a couple of years ago, > and it looks like you might have tripped over one more. > It seems you are right. The count is 2 and the first 38 bytes of the > second > vector contains the remaining data of the xdata field. > >>> > >>> This is the bug. client3_3_readv_cbk (and for that matter all the > >>> actors/cbks) expects the response in at most two vectors: > >>> 1. Program header containing request or response. This is subjected to > >>> decoding/encoding. This vector should point to a buffer that contains the > >>> entire program header/response contiguously. > >>> 2. If the procedure returns payload (like readv response or a write > >>> request), > >>> second vector contains the buffer pointing to the entire (contiguous) > >>> payload. Note that this payload is raw and is not subjected to > >>> encoding/decoding.
> >>> > >>> In your case, this _clean_ separation is broken with part of program > >>> header > >>> slipping into the 2nd vector supposed to contain read data (maybe because of > >>> rpc fragmentation). I think this is a bug in the socket layer. I'll update > >>> more > >>> on this. > >> > >> Does your read response include xdata too? I think the code related to > >> reading xdata in readv response is a bit murky. > >> > >> > >> case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT: > >> default_read_size = xdr_sizeof ((xdrproc_t) > >> xdr_gfs3_read_rsp, > >> &read_rsp); > >> > >> proghdr_buf = frag->fragcurrent; > >> > >> __socket_proto_init_pending (priv,
Re: [Gluster-devel] How use Gluster/NFS
Abhishek Paliwal wrote: > Hi Team, > > I want to use gluster NFS and export this gluster volume using 'mount -t nfs > -o acl' command. > > i have done the following changes: > 1. Enable the NFS using nfs.disable off > 2. Enable the ACL using nfs.acl on > 3. RPCbind is also running > 4. Kernel NFS is stopped > You could try setting nfs.register-with-portmap on I thought it was enabled by default, but maybe that changed when the default for nfs.disable changed? Good luck with it, rick > But still getting follows errors: > > mount.nfs: mount(2): Connection refused > mount.nfs: portmap query retrying: RPC: Program not registered > > mount.nfs: portmap query failed: RPC: Program not registered > > mount.nfs: requested NFS version or transport protocol is not supported > mount.nfs: timeout set for Fri Apr 29 06:13:25 2016 > mount.nfs: trying text-based options > 'acl,vers=4,addr=10.32.0.48,clientaddr=10.32.0.48' > mount.nfs: trying text-based options 'acl,addr=10.32.0.48' > mount.nfs: prog 13, trying vers=3, prot=6 > mount.nfs: prog 13, trying vers=3, prot=17 > > After execute the mount command as follows: > > mount -v -t nfs -o acl 10.32.0.48:/opt/lvmdir/c2/brick /tmp/p > > #rpcinfo -p output > > # rpcinfo -p > program vers proto port service > 10 4 tcp 111 portmapper > 10 3 tcp 111 portmapper > 10 2 tcp 111 portmapper > 10 4 udp 111 portmapper > 10 3 udp 111 portmapper > 10 2 udp 111 portmapper > 100024 1 udp 53564 status > 100024 1 tcp 60246 status > > > Showing no open port for GLuster/NFS so please tell me the steps to enable > the Gluster/NFS > > > -- > > > > > Regards > Abhishek Paliwal
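For reference, the steps discussed in this thread can be sketched as the command sequence below. This is an untested sketch, not verified on the reporter's setup: "vol0" and the server address are placeholders, and note that Gluster/NFS speaks NFSv3 only and exports the *volume name*, not the brick path (the original mount command used the brick path, which Gluster/NFS does not export).

```shell
# Placeholder volume name "vol0" -- substitute your own.
gluster volume set vol0 nfs.disable off               # run the built-in Gluster/NFS server
gluster volume set vol0 nfs.acl on                    # serve NFSACL so 'mount -o acl' works
gluster volume set vol0 nfs.register-with-portmap on  # as Rick suggests (normally on by default)

systemctl stop nfs-server          # kernel NFS must not occupy port 2049
gluster volume start vol0 force    # if the NFS daemon did not start, this usually respawns it

rpcinfo -p | grep -E '100003|100005'   # nfs and mountd should now be registered

# Gluster/NFS is NFSv3 only; export path is the volume name, not the brick path:
mount -t nfs -o acl,vers=3 10.32.0.48:/vol0 /tmp/p
```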
Re: [Gluster-devel] Regression-test-burn-in crash in EC test
> The test is doing renames where source and target directories are > different. At the same time a new ec-set is added and rebalance started. > Rebalance will cause dht to also move files between bricks. Maybe this > is causing some race in dht ? > > I'll try to continue investigating when I have some time. That would be great, but if you've pursued this as far as DHT then it would be OK to hand it off to that team as well. Thanks!
[Gluster-devel] Geo-replication in Tiering based volume - Review request
Hi, Please provide your comments / feedback about handling Geo-replication in a Tiering based volume. link: https://github.com/gluster/glusterfs-specs/blob/master/under_review/Tiering_georeplication.md The following patches fix the issue (already merged): http://review.gluster.org/#/c/12326 http://review.gluster.org/#/c/12355 http://review.gluster.org/#/c/12239 http://review.gluster.org/#/c/12417 http://review.gluster.org/#/c/12844 http://review.gluster.org/#/c/13281 Thanks, Saravana
[Gluster-devel] Requesting lock-migration reviews
Hi All, The following patches need reviews for the lock-migration feature. They are targeted for 3.8. Requesting reviews.

1- http://review.gluster.org/#/c/13970/
2- http://review.gluster.org/#/c/13993/
3- http://review.gluster.org/#/c/13994/
4- http://review.gluster.org/#/c/13995/
5- http://review.gluster.org/#/c/14011/
6- http://review.gluster.org/#/c/14012/
7- http://review.gluster.org/#/c/14013/
8- http://review.gluster.org/#/c/14014/
9- http://review.gluster.org/#/c/14024/
10- http://review.gluster.org/#/c/13493/
11- http://review.gluster.org/#/c/14074/

Thanks, Susant
[Gluster-devel] Bugs with incorrect status
1008839 (mainline) POST: Certain blocked entry lock info not retained after the lock is granted
    [master] Ie37837 features/locks : Certain blocked entry lock info not retained after the lock is granted (ABANDONED)
    ** ata...@redhat.com: Bug 1008839 is in POST, but all changes have been abandoned **

1062437 (mainline) POST: stripe does not work with empty xlator
    [master] I778699 stripe: fix FIRST_CHILD checks to be more general (ABANDONED)
    ** jda...@redhat.com: Bug 1062437 is in POST, but all changes have been abandoned **

1074947 (mainline) ON_QA: add option to bulld rpm without server
    [master] Iaa1498 build: add option to bulld rpm without server (NEW)
    ** nde...@redhat.com: Bug 1074947 should be in POST, change Iaa1498 under review **

1089642 (mainline) POST: Quotad doesn't load io-stats xlator, which implies none of the logging options have any effect on it.
    [master] Iccc033 glusterd: add io-stats to all quotad's sub-graphs (ABANDONED)
    ** spa...@redhat.com: Bug 1089642 is in POST, but all changes have been abandoned **

1092414 (mainline) POST: Disable NFS by default
    [master] Ibdf990 glusterd: default value of nfs.disable, change from false to true (MERGED)
    [master] If52f5e glusterd: default value of nfs.disable, change from false to true (MERGED)
    ** nde...@redhat.com: Bug 1092414 should be MODIFIED, change If52f5e has been merged **

1093768 (3.5.0) POST: Comment typo in gf-history.changelog.c
    ** kschi...@redhat.com: No change posted, but bug 1093768 is in POST **

1094478 (3.5.0) POST: Bad macro in changelog-misc.h
    ** kschi...@redhat.com: No change posted, but bug 1094478 is in POST **

1099294 (3.5.0) POST: Incorrect error message in /features/changelog/lib/src/gf-history-changelog.c
    ** kschi...@redhat.com: No change posted, but bug 1099294 is in POST **

1099460 (3.5.0) NEW: file locks are not released within an acceptable time when a fuse-client uncleanly disconnects
    [release-3.5] I5e5f54 socket: use TCP_USER_TIMEOUT to detect client failures quicker (NEW)
    ** nde...@redhat.com: Bug 1099460 should be in POST, change I5e5f54 under review **

1099683 (3.5.0) POST: Silent error from call to realpath in features/changelog/lib/src/gf-history-changelog.c
    ** vshan...@redhat.com: No change posted, but bug 1099683 is in POST **

110 (mainline) MODIFIED: [RFE] Add regression tests for the component geo-replication
    [master] Ie27848 tests/geo-rep: Automated configuration for geo-rep regression. (NEW)
    [master] I9c9ae8 geo-rep: Regression tests improvements (ABANDONED)
    [master] I433dd8 Geo-rep: Adding regression tests for geo-rep (MERGED)
    [master] Ife8201 Geo-rep: Adding regression tests for geo-rep (ABANDONED)
    ** khire...@redhat.com: Bug 110 should be in POST, change Ie27848 under review **

020 (3.5.0) POST: Unused code changelog_entry_length
    ** kschi...@redhat.com: No change posted, but bug 020 is in POST **

031 (3.5.0) POST: CHANGELOG_FILL_HTIME_DIR macro fills buffer without size limits
    ** kschi...@redhat.com: No change posted, but bug 031 is in POST **

1114415 (mainline) MODIFIED: There is no way to monitor if the healing is successful when the brick is erased
    ** pkara...@redhat.com: No change posted, but bug 1114415 is in MODIFIED **

1116714 (3.5.0) POST: indices/xattrop directory contains stale entries
    [release-3.5] I470cf8 afr : Added xdata flags to indicate probable existence of stale index. (ABANDONED)
    ** ata...@redhat.com: Bug 1116714 is in POST, but all changes have been abandoned **

1117886 (mainline) MODIFIED: Gluster not resolving hosts with IPv6 only lookups
    [master] Icbaa3c Reinstate ipv6 support (NEW)
    [master] Iebc96e glusterd: Bug fixes for IPv6 support (MERGED)
    ** nithind1...@yahoo.in: Bug 1117886 should be in POST, change Icbaa3c under review **

1122120 (3.5.1) MODIFIED: Bricks crashing after disable and re-enabled quota on a volume
    ** kdhan...@redhat.com: No change posted, but bug 1122120 is in MODIFIED **

1131846 (mainline) POST: remove-brick - once you stop remove-brick using stop command, status says ' failed: remove-brick not started.'
    ** gg...@redhat.com: No change posted, but bug 1131846 is in POST **

1132074 (mainline) POST: Document steps to perform for replace-brick
    [master] Ic7292b doc: Steps for Replacing brick in gluster volume (ABANDONED)
    ** pkara...@redhat.com: Bug 1132074 is in POST, but all changes have been abandoned **

1134305 (mainline) POST: rpc actor failed to complete successfully messages in Glusterd
    [master] I094516 protocol/client: Add explicit states for connection sequence (ABANDONED)
    ** pkara...@redhat.com: Bug 1134305 is in POST, but all changes have been abandoned **

1142423 (mainline) MODIFIED: [DHT-REBALANCE]-DataLoss: The data appended to a file during its migration will be lost once the migration is done
    [master] I044a83 cluster/dht Use additional dst_info in inode_ctx (NEW)
    [master] I5c8810 cluster/dht: Fix stale
Re: [Gluster-devel] Regression-test-burn-in crash in EC test
Hi Jeff,

On 27/04/16 20:01, Jeff Darcy wrote:

One of the "rewards" of reviewing and merging people's patches is getting email if the next regression-test-burn-in should fail - even if it fails for a completely unrelated reason. Today I got one that's not among the usual suspects. The failure was a core dump in tests/bugs/disperse/bug-1304988.t, weighing in at a respectable 42 frames.

#0  0x7fef25976cb9 in dht_rename_lock_cbk
#1  0x7fef25955f62 in dht_inodelk_done
#2  0x7fef25957352 in dht_blocking_inodelk_cbk
#3  0x7fef32e02f8f in default_inodelk_cbk
#4  0x7fef25c029a3 in ec_manager_inodelk
#5  0x7fef25bf9802 in __ec_manager
#6  0x7fef25bf990c in ec_manager
#7  0x7fef25c03038 in ec_inodelk
#8  0x7fef25bee7ad in ec_gf_inodelk
#9  0x7fef25957758 in dht_blocking_inodelk_rec
#10 0x7fef25957b2d in dht_blocking_inodelk
#11 0x7fef2597713f in dht_rename_lock
#12 0x7fef25977835 in dht_rename
#13 0x7fef32e0f032 in default_rename
#14 0x7fef32e0f032 in default_rename
#15 0x7fef32e0f032 in default_rename
#16 0x7fef32e0f032 in default_rename
#17 0x7fef32e0f032 in default_rename
#18 0x7fef32e07c29 in default_rename_resume
#19 0x7fef32d8ed40 in call_resume_wind
#20 0x7fef32d98b2f in call_resume
#21 0x7fef24cfc568 in open_and_resume
#22 0x7fef24cffb99 in ob_rename
#23 0x7fef24aee482 in mdc_rename
#24 0x7fef248d68e5 in io_stats_rename
#25 0x7fef32e0f032 in default_rename
#26 0x7fef2ab1b2b9 in fuse_rename_resume
#27 0x7fef2ab12c47 in fuse_fop_resume
#28 0x7fef2ab107cc in fuse_resolve_done
#29 0x7fef2ab108a2 in fuse_resolve_all
#30 0x7fef2ab10900 in fuse_resolve_continue
#31 0x7fef2ab0fb7c in fuse_resolve_parent
#32 0x7fef2ab1077d in fuse_resolve
#33 0x7fef2ab10879 in fuse_resolve_all
#34 0x7fef2ab10900 in fuse_resolve_continue
#35 0x7fef2ab0fb7c in fuse_resolve_parent
#36 0x7fef2ab1077d in fuse_resolve
#37 0x7fef2ab10824 in fuse_resolve_all
#38 0x7fef2ab1093e in fuse_resolve_and_resume
#39 0x7fef2ab1b40e in fuse_rename
#40 0x7fef2ab2a96a in fuse_thread_proc
#41 0x7fef3204daa1 in start_thread
In other words we started at FUSE, went through a bunch of performance translators, through DHT to EC, and then crashed on the way back. It seems a little odd that we turn the fop around immediately in EC, and that we have default_inodelk_cbk at frame 3. Could one of the DHT or EC people please take a look at it? Thanks!

The part relating to ec seems OK. This is uncommon, but can happen. When ec_gf_inodelk() is called, it sends an inodelk request to all its subvolumes. It may happen that the callbacks of all these requests are received before returning from ec_gf_inodelk() itself. This executes the callback inside the same thread as the caller. The reason why default_inodelk_cbk() is seen is that ec uses this function to report the result back to the caller (instead of calling STACK_UNWIND() itself). This seems to be what happened here.

The frames returned by ec to upper xlators are the same ones used by them (the frame in dht_blocking_lock() is the same that receives dht_blocking_inodelk_cbk()) and ec doesn't touch them; however, the frame at 0x7fef1003ca5c is absolutely corrupted. We can see the call state from the core:

(gdb) f 4
#4  0x7fef25c029a3 in ec_manager_inodelk (fop=0x7fef1000d37c, state=5)
    at /home/jenkins/root/workspace/regression-test-burn-in/xlators/cluster/ec/src/ec-locks.c:645
645         fop->cbks.inodelk(fop->req_frame, fop, fop->xl,
(gdb) print fop->answer
$30 = (ec_cbk_data_t *) 0x7fef180094ac
(gdb) print fop->answer->op_ret
$31 = 0
(gdb) print fop->answer->op_errno
$32 = 0
(gdb) print fop->answer->count
$33 = 6
(gdb) print fop->answer->mask
$34 = 63

As we can see, there's an actual answer to the request with a success result (op_ret == 0 and op_errno == 0) composed of the combination of answers from 6 subvolumes (count == 6). Looking at the dht code I have been unable to see any possible cause either.

The test is doing renames where source and target directories are different. At the same time a new ec-set is added and rebalance started. Rebalance will cause dht to also move files between bricks. Maybe this is causing some race in dht?

I'll try to continue investigating when I have some time.

Xavi

https://build.gluster.org/job/regression-test-burn-in/868/console
Re: [Gluster-devel] Possible bug in the communications layer ?
Hi Raghavendra, yes, the readv response contains xdata. The dict length is 38 (0x26) and, at the moment of failure, rsp.xdata.xdata_len already contains 0x26. Xavi On 29/04/16 09:10, Raghavendra Gowdappa wrote: - Original Message - From: "Raghavendra Gowdappa" To: "Xavier Hernandez" Cc: "Gluster Devel" Sent: Friday, April 29, 2016 12:36:43 PM Subject: Re: [Gluster-devel] Possible bug in the communications layer ? - Original Message - From: "Raghavendra Gowdappa" To: "Xavier Hernandez" Cc: "Jeff Darcy" , "Gluster Devel" Sent: Friday, April 29, 2016 12:07:59 PM Subject: Re: [Gluster-devel] Possible bug in the communications layer ? - Original Message - From: "Xavier Hernandez" To: "Jeff Darcy" Cc: "Gluster Devel" Sent: Thursday, April 28, 2016 8:15:36 PM Subject: Re: [Gluster-devel] Possible bug in the communications layer ?

Hi Jeff, On 28.04.2016 15:20, Jeff Darcy wrote:

This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem easily doing the following test:

iozone -t2 -s10g -r1024k -i0 -w -F/iozone{1..2}.dat
echo 3 >/proc/sys/vm/drop_caches
iozone -t2 -s10g -r1024k -i1 -w -F/iozone{1..2}.dat

The error happens soon after starting the read test. As can be seen in the data below, client3_3_readv_cbk() is processing an iovec of 116 bytes, however it should be of 154 bytes (the buffer in memory really seems to contain 154 bytes). The data on the network seems ok (at least I haven't been able to identify any problem), so this must be a processing error on the client side. The last field in the cut buffer of the sequentialized data corresponds to the length of the xdata field: 0x26. So at least 38 more bytes should be present.

Nice detective work, Xavi. It would be *very* interesting to see what the value of the "count" parameter is (it's unfortunately optimized out). I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory of some problems with how this iov is put together, a couple of years ago, and it looks like you might have tripped over one more.

It seems you are right. The count is 2 and the first 38 bytes of the second vector contains the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and for that matter all the actors/cbks) expects the response in at most two vectors:
1. Program header containing request or response. This is subjected to decoding/encoding. This vector should point to a buffer that contains the entire program header/response contiguously.
2. If the procedure returns payload (like readv response or a write request), the second vector contains the buffer pointing to the entire (contiguous) payload. Note that this payload is raw and is not subjected to encoding/decoding.

In your case, this _clean_ separation is broken with part of the program header slipping into the 2nd vector supposed to contain read data (maybe because of rpc fragmentation). I think this is a bug in the socket layer. I'll update more on this.

Does your read response include xdata too? I think the code related to reading xdata in readv response is a bit murky.

case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
        default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                        &read_rsp);

        proghdr_buf = frag->fragcurrent;

        __socket_proto_init_pending (priv, default_read_size);

        frag->call_body.reply.accepted_success_state
                = SP_STATE_READING_PROC_HEADER;

        /* fall through */

case SP_STATE_READING_PROC_HEADER:
        __socket_proto_read (priv, ret);

By this time we've read the readv response _minus_ the xdata (I meant we have read the "readv response header").

        /* there can be 'xdata' in read response, figure it out */
        xdrmem_create (&xdr, proghdr_buf, default_read_size, XDR_DECODE);

We created the xdr stream above with "default_read_size" (this doesn't include xdata).

        /* This will fail if there is xdata sent from server, if not,
           well and good, we don't need to worry about */

What if xdata is present and decoding failed (as the length of the xdr stream above - default_read_size - doesn't include xdata)? Would we have a valid value in read_rsp.xdata.xdata_len? This is the part I am confused about. If read_rsp.xdata.xdata_len is not correct then there is a possibility that xdata might not be entirely present in the vector socket passes to higher layers as progheader (with part or the entire xdata spilling over to the payload vector).

        xdr_gfs3_read_rsp (&xdr, &read_rsp);
Re: [Gluster-devel] Requesting for NetBSD setup
- Original Message - > From: "Emmanuel Dreyfus" > To: "Karthik Subrahmanya" > Cc: "gluster-devel" , gluster-in...@gluster.org > Sent: Friday, April 29, 2016 12:35:24 PM > Subject: Re: [Gluster-devel] Requesting for NetBSD setup > > On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote: > > I would like to ask for a NetBSD setup > > nbslave7[4gh] are disabled in Jenkins right now. They are labeled > "Disconnected by kaushal", but I don't know why. Once it is confirmed > that they are not already used for testing, you could pick one. > > I still do not know who is the password guardian at Red Hat, though. > Thanks for the advice Emmanuel, I think that is going to take some time. Can you point me to some alternative way to test it on my system? I have actually been stuck with this for some time now and I really can't understand why it's failing. Thanks, Karthik Subrahmanya > -- > Emmanuel Dreyfus > m...@netbsd.org >
Re: [Gluster-devel] [Gluster-infra] Requesting for NetBSD setup
On Fri, Apr 29, 2016 at 12:35 PM, Emmanuel Dreyfus wrote: > On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote: >> I would like to ask for a NetBSD setup > > nbslave7[4gh] are disabled in Jenkins right now. They are labeled > "Disconnected by kaushal", but I don't know why. Once it is confirmed > that they are not already used for testing, you could pick one. > > I still do not know who is the password guardian at Red Hat, though. I often disconnect machines that aren't in a working state, and reboot them. If I've left something in the disconnected state, most likely those machines didn't get back to a working state after the reboot. Or it could be that I just forgot. > > -- > Emmanuel Dreyfus > m...@netbsd.org > ___ > Gluster-infra mailing list > gluster-in...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-devel] Possible bug in the communications layer ?
- Original Message - > From: "Raghavendra Gowdappa"> To: "Xavier Hernandez" > Cc: "Gluster Devel" > Sent: Friday, April 29, 2016 12:36:43 PM > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > - Original Message - > > From: "Raghavendra Gowdappa" > > To: "Xavier Hernandez" > > Cc: "Jeff Darcy" , "Gluster Devel" > > > > Sent: Friday, April 29, 2016 12:07:59 PM > > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > > > > > - Original Message - > > > From: "Xavier Hernandez" > > > To: "Jeff Darcy" > > > Cc: "Gluster Devel" > > > Sent: Thursday, April 28, 2016 8:15:36 PM > > > Subject: Re: [Gluster-devel] Possible bug in the communications layer ? > > > > > > > > > > > > Hi Jeff, > > > > > > On 28.04.2016 15:20, Jeff Darcy wrote: > > > > > > > > > > > > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The > > > volume is a distributed-disperse 4*(4+2). I'm able to reproduce the > > > problem > > > easily doing the following test: iozone -t2 -s10g -r1024k -i0 -w > > > -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2 -s10g > > > -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after starting > > > the > > > read test. As can be seen in the data below, client3_3_readv_cbk() is > > > processing an iovec of 116 bytes, however it should be of 154 bytes (the > > > buffer in memory really seems to contain 154 bytes). The data on the > > > network > > > seems ok (at least I haven't been able to identify any problem), so this > > > must be a processing error on the client side. The last field in cut > > > buffer > > > of the sequentialized data corresponds to the length of the xdata field: > > > 0x26. So at least 38 more byte should be present. > > > Nice detective work, Xavi. It would be *very* interesting to see what > > > the value of the "count" parameter is (it's unfortunately optimized out). > > > I'll bet it's two, and iov[1].iov_len is 38. 
> > > I have a weak memory of some problems with how this iov is put
> > > together, a couple of years ago, and it looks like you might have
> > > tripped over one more.
> > >
> > > It seems you are right. The count is 2, and the first 38 bytes of
> > > the second vector contain the remaining data of the xdata field.
> >
> > This is the bug. client3_3_readv_cbk (and, for that matter, all the
> > actors/cbks) expects the response in at most two vectors:
> > 1. The program header containing the request or response. This is
> >    subject to encoding/decoding. This vector should point to a buffer
> >    that contains the entire program header/response contiguously.
> > 2. If the procedure returns a payload (like a readv response or a
> >    write request), the second vector contains the buffer pointing to
> >    the entire (contiguous) payload. Note that this payload is raw and
> >    is not subject to encoding/decoding.
> >
> > In your case, this _clean_ separation is broken, with part of the
> > program header slipping into the 2nd vector that is supposed to
> > contain the read data (maybe because of RPC fragmentation). I think
> > this is a bug in the socket layer. I'll update more on this.
>
> Does your read response include xdata too? I think the code related to
> reading xdata in the readv response is a bit murky.
> case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
>     default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
>                                     &read_rsp);
>
>     proghdr_buf = frag->fragcurrent;
>
>     __socket_proto_init_pending (priv, default_read_size);
>
>     frag->call_body.reply.accepted_success_state
>         = SP_STATE_READING_PROC_HEADER;
>
>     /* fall through */
>
> case SP_STATE_READING_PROC_HEADER:
>     __socket_proto_read (priv, ret);
>
> > By this time we've read the read response _minus_ the xdata

I meant: we have read the "readv response header".

>     /* there can be 'xdata' in read response, figure it out */
>     xdrmem_create (&xdr, proghdr_buf, default_read_size,
>                    XDR_DECODE);
>
> >> We created the xdr stream above with "default_read_size" (this
> >> doesn't include xdata)
>
>     /* This will fail if there is xdata sent from server, if not,
>        well and good, we don't need to worry about */
>
> >> What if xdata is present and decoding failed (as the length of the
> >> xdr stream above - default_read_size - doesn't include xdata)?
> >> Would we have a valid value in read_rsp.xdata.xdata_len? This is
> >> the part I am confused about. If read_rsp.xdata.xdata_len is not
> >> correct
[Gluster-devel] Bitrot Review Request
Hi Pranith,

You had a concern about consuming I/O threads when bit-rot uses the
rchecksum interface for signing, normal scrubbing, and on-demand
scrubbing with tiering.

http://review.gluster.org/#/c/13833/5/xlators/storage/posix/src/posix.c

As discussed in the comments, the concern is valid, so the above patch
is not being taken in and will be abandoned. I have the following patch
where signing and normal scrubbing do not consume io-threads; only
on-demand scrubbing consumes io-threads. I think this should be fine,
as tiering is single threaded and only consumes one I/O thread (as
noted by Joseph on Patch Set 6).

http://review.gluster.org/#/c/13969/

Since on-demand scrubbing is disabled by default, there is a size cap,
and we document increasing the default number of I/O threads, consuming
one I/O thread for scrubbing should be fine, I think. Let me know your
thoughts.

Thanks and Regards,
Kotresh H R

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Possible bug in the communications layer ?
----- Original Message -----
> From: "Raghavendra Gowdappa"
> To: "Xavier Hernandez"
> Cc: "Jeff Darcy", "Gluster Devel"
> Sent: Friday, April 29, 2016 12:07:59 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>
> > ----- Original Message -----
> > From: "Xavier Hernandez"
> > To: "Jeff Darcy"
> > Cc: "Gluster Devel"
> > Sent: Thursday, April 28, 2016 8:15:36 PM
> > Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >
> > Hi Jeff,
> >
> > On 28.04.2016 15:20, Jeff Darcy wrote:
> >
> > This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
> > The volume is a distributed-disperse 4*(4+2). I'm able to reproduce
> > the problem easily doing the following test:
> >
> >   iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
> >   echo 3 > /proc/sys/vm/drop_caches
> >   iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
> >
> > The error happens soon after starting the read test. As can be seen
> > in the data below, client3_3_readv_cbk() is processing an iovec of
> > 116 bytes; however, it should be 154 bytes (the buffer in memory
> > really does contain 154 bytes). The data on the network seems fine
> > (at least I haven't been able to identify any problem), so this must
> > be a processing error on the client side. The last field in the cut
> > buffer of the serialized data corresponds to the length of the xdata
> > field: 0x26. So at least 38 more bytes should be present.
> >
> > Nice detective work, Xavi. It would be *very* interesting to see
> > what the value of the "count" parameter is (it's unfortunately
> > optimized out). I'll bet it's two, and iov[1].iov_len is 38. I have
> > a weak memory of some problems with how this iov is put together, a
> > couple of years ago, and it looks like you might have tripped over
> > one more.
> >
> > It seems you are right. The count is 2, and the first 38 bytes of
> > the second vector contain the remaining data of the xdata field.
>
> This is the bug.
client3_3_readv_cbk (and, for that matter, all the actors/cbks) expects
the response in at most two vectors:
1. The program header containing the request or response. This is
   subject to encoding/decoding. This vector should point to a buffer
   that contains the entire program header/response contiguously.
2. If the procedure returns a payload (like a readv response or a write
   request), the second vector contains the buffer pointing to the
   entire (contiguous) payload. Note that this payload is raw and is
   not subject to encoding/decoding.

In your case, this _clean_ separation is broken, with part of the
program header slipping into the 2nd vector that is supposed to contain
the read data (maybe because of RPC fragmentation). I think this is a
bug in the socket layer. I'll update more on this.

Does your read response include xdata too? I think the code related to
reading xdata in the readv response is a bit murky.

    case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
        default_read_size = xdr_sizeof ((xdrproc_t) xdr_gfs3_read_rsp,
                                        &read_rsp);

        proghdr_buf = frag->fragcurrent;

        __socket_proto_init_pending (priv, default_read_size);

        frag->call_body.reply.accepted_success_state
            = SP_STATE_READING_PROC_HEADER;

        /* fall through */

    case SP_STATE_READING_PROC_HEADER:
        __socket_proto_read (priv, ret);

> By this time we've read the read response _minus_ the xdata

        /* there can be 'xdata' in read response, figure it out */
        xdrmem_create (&xdr, proghdr_buf, default_read_size,
                       XDR_DECODE);

>> We created the xdr stream above with "default_read_size" (this
>> doesn't include xdata)

        /* This will fail if there is xdata sent from server, if not,
           well and good, we don't need to worry about */

>> What if xdata is present and decoding failed (as the length of the
>> xdr stream above - default_read_size - doesn't include xdata)? Would
>> we have a valid value in read_rsp.xdata.xdata_len? This is the part
>> I am confused about.
>> If read_rsp.xdata.xdata_len is not correct, then there is a
>> possibility that xdata might not be entirely present in the vector
>> that the socket passes to higher layers as the program header (with
>> part or the entire xdata spilling over into the payload vector).

        xdr_gfs3_read_rsp (&xdr, &read_rsp);

        free (read_rsp.xdata.xdata_val);

        /* need to round off to proper roof (%4), as XDR packing pads
Re: [Gluster-devel] Requesting for NetBSD setup
On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote:
> I would like to ask for a NetBSD setup

nbslave7[4gh] are disabled in Jenkins right now. They are labeled
"Disconnected by kaushal", but I don't know why. Once it is confirmed
that they are not already being used for testing, you could pick one.
I still don't know who the password guardian at Red Hat is, though.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Possible bug in the communications layer ?
----- Original Message -----
> From: "Xavier Hernandez"
> To: "Jeff Darcy"
> Cc: "Gluster Devel"
> Sent: Thursday, April 28, 2016 8:15:36 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
>
> Hi Jeff,
>
> On 28.04.2016 15:20, Jeff Darcy wrote:
>
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi.
> The volume is a distributed-disperse 4*(4+2). I'm able to reproduce
> the problem easily doing the following test:
>
>   iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>   echo 3 > /proc/sys/vm/drop_caches
>   iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>
> The error happens soon after starting the read test. As can be seen in
> the data below, client3_3_readv_cbk() is processing an iovec of 116
> bytes; however, it should be 154 bytes (the buffer in memory really
> does contain 154 bytes). The data on the network seems fine (at least
> I haven't been able to identify any problem), so this must be a
> processing error on the client side. The last field in the cut buffer
> of the serialized data corresponds to the length of the xdata field:
> 0x26. So at least 38 more bytes should be present.
>
> Nice detective work, Xavi. It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized
> out). I'll bet it's two, and iov[1].iov_len is 38. I have a weak
> memory of some problems with how this iov is put together, a couple of
> years ago, and it looks like you might have tripped over one more.
>
> It seems you are right. The count is 2, and the first 38 bytes of the
> second vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and, for that matter, all the
actors/cbks) expects the response in at most two vectors:
1. The program header containing the request or response. This is
   subject to encoding/decoding. This vector should point to a buffer
   that contains the entire program header/response contiguously.
2.
If the procedure returns a payload (like a readv response or a write
request), the second vector contains the buffer pointing to the entire
(contiguous) payload. Note that this payload is raw and is not subject
to encoding/decoding.

In your case, this _clean_ separation is broken, with part of the
program header slipping into the 2nd vector that is supposed to contain
the read data (maybe because of RPC fragmentation). I think this is a
bug in the socket layer. I'll update more on this.

> The rest of the data in the second vector seems to be the payload of
> the readv fop, plus 2 bytes of padding:
>
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
>     count=<optimized out>, myframe=0x7fdc520d505c) at
>     client-rpc-fops.c:3021
> 3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
> (gdb) f 1
> #1  0x7fdc56dafab0 in rpc_clnt_handle_reply
>     (clnt=clnt@entry=0x7fdc3c1f4bb0, pollin=pollin@entry=0x7fdc34010f20)
>     at rpc-clnt.c:764
> 764         req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
> (gdb) print *pollin
> $3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base =
>     0x7fdc14a4d000, iov_len = 32808}, {iov_base = 0x0, iov_len = 0}
>     <repeats ... times>}, count = 2,
>     vectored = 1 '\001', private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
>     hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
>     count=<optimized out>, myframe=0x7fdc520d505c) at
>     client-rpc-fops.c:3021
> 3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print iov[1]
> $4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
> (gdb) print iov[2]
> $5 = {iov_base = 0x2, iov_len = 140583741974112}
> (gdb) x/128xb 0x7fdc14a4d000
> 0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
> 0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
> 0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
> 0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
> 0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
> 0x7fdc14a4d028:
0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
> 0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fdc14a4d070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00
[Gluster-devel] How use Gluster/NFS
Hi Team,

I want to use Gluster/NFS and export a gluster volume using the
'mount -t nfs -o acl' command. I have made the following changes:

1. Enabled NFS using nfs.disable off
2. Enabled ACL using nfs.acl on
3. rpcbind is running
4. Kernel NFS is stopped

But I am still getting the following errors:

mount.nfs: mount(2): Connection refused
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: timeout set for Fri Apr 29 06:13:25 2016
mount.nfs: trying text-based options 'acl,vers=4,addr=10.32.0.48,clientaddr=10.32.0.48'
mount.nfs: trying text-based options 'acl,addr=10.32.0.48'
mount.nfs: prog 13, trying vers=3, prot=6
mount.nfs: prog 13, trying vers=3, prot=17

after executing the mount command as follows:

mount -v -t nfs -o acl 10.32.0.48:/opt/lvmdir/c2/brick /tmp/p

rpcinfo -p output:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  53564  status
    100024    1   tcp  60246  status

This shows no open port for Gluster/NFS, so please tell me the steps to
enable Gluster/NFS.

--
Regards
Abhishek Paliwal