Re: [Gluster-devel] [Gluster-users] CentOS 7: Gluster Test Framework testcases failure
I installed the 'psmisc' package, which installed the killall command, and reverted pkill to killall in the include.rc file. The testcases started executing properly, and I will send the test failure report soon.

Thanks,
Kiran.

On Thu, Sep 25, 2014 at 12:32 PM, Niels de Vos nde...@redhat.com wrote:

On Thu, Sep 25, 2014 at 10:45:49AM +0530, Kiran Patil wrote:

pkill expects only one pattern, so I did as below in the tests/include.rc file and the test cases started working fine.

pkill glusterfs 2>/dev/null || true;
pkill glusterfsd 2>/dev/null || true;
pkill glusterd 2>/dev/null || true;

Sorry, I'm a little late to the party, but the 'killall' command should be available for CentOS-7 too. It seems to be part of the 'psmisc' package. I guess we should add this as a dependency on the wiki page. Could you check if that works for you too? If not, and you are interested, I'll help you post a patch to make the pkill change.

Thanks,
Niels

On Wed, Sep 24, 2014 at 6:48 PM, Justin Clift jus...@gluster.org wrote:

On 24/09/2014, at 2:07 PM, Kiran Patil wrote:

Some of the reasons I have found so far are as below:

1. Cleanup operation does not work since killall is not part of CentOS 7
2. I used pkill and still testcases fail at first step. Ex: TEST glusterd
3. Subsequent running of testcases does not proceed and hangs at the first testcase (tests/basic/bd.t)

This sounds like there could be a few challenges then. I'm setting up a new Fedora 20 (or 21 alpha) VM in Rackspace for running btrfs regression tests on. Guessing that will experience these same problems as your CentOS 7 test run, so I'm definitely interested in this too.

+ Justin

--
GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients.
My personal twitter: twitter.com/realjustinclift ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
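For reference, the per-process cleanup discussed in this thread can be sketched as a small shell function. This is only a sketch, not the actual contents of tests/include.rc: the function name kill_by_name is hypothetical, and it assumes killall (from psmisc) is preferred when present, with pkill -x as the fallback, one process name per call since pkill accepts a single pattern.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tests/include.rc.
# Sends the default signal (TERM) to each named process, one name per call,
# because pkill accepts only a single pattern.
kill_by_name() {
    for name in "$@"; do
        if command -v killall >/dev/null 2>&1; then
            # killall is provided by the 'psmisc' package on CentOS 7
            killall "$name" 2>/dev/null || true
        else
            # -x restricts pkill to an exact process-name match
            pkill -x "$name" 2>/dev/null || true
        fi
    done
}

# Usage (commented out so that sourcing this file kills nothing):
# kill_by_name glusterfs glusterfsd glusterd
```

The "|| true" keeps the cleanup from failing when a process is simply not running, which matches the intent of the pkill lines quoted above.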
[Gluster-devel] glusterfs-3.6.0beta2 released
Hi All,

I am happy to announce the availability of the 3.6.0beta2 release for testing. This release contains improvements in snapshot, afrv2 and erasure coding, and has a few fixes for portability.

The source tarball is available at [1], and RPMs for Fedora 19, 20, 21, 22 and RHEL/CentOS 5, 6, 7 are available at [2]. The list of patches that were added to this release after beta1 can be found at [3].

Do rev up your test engines and let us know what you find with this release :).

Cheers,
Vijay

[1] http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.6.0beta2.tar.gz
[2] http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.0beta2/
[3] http://www.gluster.org/community/documentation/index.php/3.6.0beta2-changelog
Re: [Gluster-devel] [Gluster-users] CentOS 7: Gluster Test Framework testcases failure
On Thu, Sep 25, 2014 at 12:49:21PM +0530, Kiran Patil wrote:

I installed the 'psmisc' package, which installed the killall command, and reverted pkill to killall in the include.rc file. The testcases started executing properly, and I will send the test failure report soon.

Thanks, I've added 'psmisc' to the list of packages in the wiki:
- http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework#Preparation_steps_for_CentOS_7_.28only.29

Niels

Thanks,
Kiran.

On Thu, Sep 25, 2014 at 12:32 PM, Niels de Vos nde...@redhat.com wrote:

On Thu, Sep 25, 2014 at 10:45:49AM +0530, Kiran Patil wrote:

pkill expects only one pattern, so I did as below in the tests/include.rc file and the test cases started working fine.

pkill glusterfs 2>/dev/null || true;
pkill glusterfsd 2>/dev/null || true;
pkill glusterd 2>/dev/null || true;

Sorry, I'm a little late to the party, but the 'killall' command should be available for CentOS-7 too. It seems to be part of the 'psmisc' package. I guess we should add this as a dependency on the wiki page. Could you check if that works for you too? If not, and you are interested, I'll help you post a patch to make the pkill change.

Thanks,
Niels

On Wed, Sep 24, 2014 at 6:48 PM, Justin Clift jus...@gluster.org wrote:

On 24/09/2014, at 2:07 PM, Kiran Patil wrote:

Some of the reasons I have found so far are as below:

1. Cleanup operation does not work since killall is not part of CentOS 7
2. I used pkill and still testcases fail at first step. Ex: TEST glusterd
3. Subsequent running of testcases does not proceed and hangs at the first testcase (tests/basic/bd.t)

This sounds like there could be a few challenges then. I'm setting up a new Fedora 20 (or 21 alpha) VM in Rackspace for running btrfs regression tests on. Guessing that will experience these same problems as your CentOS 7 test run, so I'm definitely interested in this too.
+ Justin
Re: [Gluster-devel] [Gluster-users] CentOS 7: Gluster Test Framework testcases failure
The failing testcases below are related to xfs, cluster and others. The hardcoded ones I have fixed temporarily by providing the absolute pathname.

Testcase /tests/bugs/bug-767095.t is fixed by changing awk parameter $5 to $4.

Testcase tests/bugs/bug-861542.t is failing at

EXPECT N/A port_field $V0 '0'; # volume status

If I change its value to '1' it passes. Is that correct?

I will keep posting the new findings and possible fixes.

Test Summary Report
./tests/bugs/bug-802417.t (Wstat: 0 Tests: 39 Failed: 2)
  Failed tests: 28, 31
./tests/bugs/bug-821056.t (Wstat: 0 Tests: 21 Failed: 1)
  Failed test: 15
./tests/bugs/bug-861542.t (Wstat: 0 Tests: 13 Failed: 1)
  Failed test: 10
./tests/bugs/bug-908146.t (Wstat: 0 Tests: 10 Failed: 2)
  Failed tests: 8-9
./tests/bugs/bug-913555.t (Wstat: 0 Tests: 11 Failed: 4)
  Failed tests: 4-6, 9
./tests/bugs/bug-948686.t (Wstat: 0 Tests: 19 Failed: 8)
  Failed tests: 5-7, 9, 11, 13-15
./tests/bugs/bug-948729/bug-948729-force.t (Wstat: 0 Tests: 35 Failed: 14)
  Failed tests: 15, 17, 19, 21, 24-27, 29-31, 33-35
./tests/bugs/bug-948729/bug-948729-mode-script.t (Wstat: 0 Tests: 35 Failed: 14)
  Failed tests: 15, 17, 19, 21, 24-27, 29-31, 33-35
./tests/bugs/bug-948729/bug-948729.t (Wstat: 0 Tests: 23 Failed: 4)
  Failed tests: 12, 15, 19, 23
Files=124, Tests=2031, 4470 wallclock secs ( 1.57 usr 0.24 sys + 259.31 cusr 220.81 csys = 481.93 CPU)
Result: FAIL

Thanks,
Kiran.

On Thu, Sep 25, 2014 at 2:17 PM, Niels de Vos nde...@redhat.com wrote:

On Thu, Sep 25, 2014 at 12:49:21PM +0530, Kiran Patil wrote:

I installed the 'psmisc' package, which installed the killall command, and reverted pkill to killall in the include.rc file. The testcases started executing properly, and I will send the test failure report soon.

Thanks, I've added 'psmisc' to the list of packages in the wiki:
- http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework#Preparation_steps_for_CentOS_7_.28only.29

Niels

Thanks,
Kiran.
On Thu, Sep 25, 2014 at 12:32 PM, Niels de Vos nde...@redhat.com wrote:

On Thu, Sep 25, 2014 at 10:45:49AM +0530, Kiran Patil wrote:

pkill expects only one pattern, so I did as below in the tests/include.rc file and the test cases started working fine.

pkill glusterfs 2>/dev/null || true;
pkill glusterfsd 2>/dev/null || true;
pkill glusterd 2>/dev/null || true;

Sorry, I'm a little late to the party, but the 'killall' command should be available for CentOS-7 too. It seems to be part of the 'psmisc' package. I guess we should add this as a dependency on the wiki page. Could you check if that works for you too? If not, and you are interested, I'll help you post a patch to make the pkill change.

Thanks,
Niels

On Wed, Sep 24, 2014 at 6:48 PM, Justin Clift jus...@gluster.org wrote:

On 24/09/2014, at 2:07 PM, Kiran Patil wrote:

Some of the reasons I have found so far are as below:

1. Cleanup operation does not work since killall is not part of CentOS 7
2. I used pkill and still testcases fail at first step. Ex: TEST glusterd
3. Subsequent running of testcases does not proceed and hangs at the first testcase (tests/basic/bd.t)

This sounds like there could be a few challenges then. I'm setting up a new Fedora 20 (or 21 alpha) VM in Rackspace for running btrfs regression tests on. Guessing that will experience these same problems as your CentOS 7 test run, so I'm definitely interested in this too.

+ Justin
Re: [Gluster-devel] CentOS 7: Gluster Test Framework testcases failure
On 25/09/2014, at 2:31 PM, Kiran Patil wrote:

The failing testcases below are related to xfs, cluster and others. The hardcoded ones I have fixed temporarily by providing the absolute pathname.

Testcase /tests/bugs/bug-767095.t is fixed by changing awk parameter $5 to $4.

Interesting. We might need to do something like check which OS it's running on, modifying the awk line depending on the result.

Testcase tests/bugs/bug-861542.t is failing at

EXPECT N/A port_field $V0 '0'; # volume status

If I change its value to '1' it passes. Is that correct?

Similar sort of thing here.

I will keep posting the new findings and possible fixes.

Thanks. :)

Emmanuel Dreyfus and Harsha may have useful insight here too. They've been working through the regression scripts for a while now, making them more cross-platform in order to run on the BSDs (and maybe OSX eventually).

Regards and best wishes,
Justin Clift
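Justin's suggestion of choosing the awk field per OS could look roughly like the sketch below. Everything here is an assumption for illustration: the function names awk_field_for and detect_os are hypothetical, the mapping (field 4 on CentOS 7, field 5 elsewhere) is inferred only from Kiran's report about bug-767095.t, and /etc/os-release is assumed to exist on the newer distributions.

```shell
#!/bin/sh
# Hypothetical mapping from OS id and major version to the awk field number.
# Only the CentOS 7 case comes from the thread; the default of 5 is the
# value the test used before Kiran's change.
awk_field_for() {
    os_id=$1
    os_major=$2
    if [ "$os_id" = "centos" ] && [ "$os_major" = "7" ]; then
        echo 4
    else
        echo 5
    fi
}

# Prints "<id> <major>" read from /etc/os-release, or "unknown 0" when the
# file is missing. The subshell keeps the sourced variables from leaking.
detect_os() {
    if [ -r /etc/os-release ]; then
        (
            . /etc/os-release
            ver=${VERSION_ID:-0}
            echo "${ID:-unknown} ${ver%%.*}"
        )
    else
        echo "unknown 0"
    fi
}

# Usage inside a test (some_command stands in for the real pipeline):
#   set -- $(detect_os)
#   field=$(awk_field_for "$1" "$2")
#   some_command | awk -v f="$field" '{ print $f }'
```

Passing the field number with awk -v keeps the awk program itself identical on every platform, so only the lookup table needs to change when another OS turns out to differ.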
[Gluster-devel] [Gluster-users] heal-failed on 3.5.2
Hi all,

I had an instance of heal-failed today on a 3x2 replicated volume with 17TB on Ubuntu 12.04 XFS bricks running Gluster 3.5.2.

Initially, on the brick log:

Warnings in /var/log/glusterfs/glustershd.log
[2014-09-25 15:56:10.200387] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-sas03-replicate-0: Conflicting entries for /RdB2C_20140917.dat
[2014-09-25 15:56:10.653858] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-sas03-replicate-0: Conflicting entries for /RdB2C_20140917.dat

which says a file has a conflict. No split-brain was detected, but there was a heal-failed:

root@glusterprod001:~# gluster volume heal sas03 info heal-failed
Gathering list of heal failed entries on volume sas03 has been successful

Brick glusterprod001.shopzilla.laxhq:/brick03/gfs
Number of entries: 2
at                    path on brick
-----------------------------------
2014-09-25 15:56:10 /HypDataSata03/data/RdctB2C
2014-09-25 16:06:09 /HypDataSata03/data/RdctB2C

Brick glusterprod002.shopzilla.laxhq:/brick03/gfs
Number of entries: 1
at                    path on brick
-----------------------------------
2014-09-25 15:58:37 /HypDataSata03//data//RdctB2C

Brick glusterprod003.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod004.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod005.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod006.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

I noticed it says the directory is heal-failed instead of the file. Gluster clients see an error on the file, which shows up as an invalid file when doing ls against it.

Then: I tried to restart glusterfs-server on both prod001 and prod002, as that's how I used to resolve the heal-failed, and it became like this:

root@glusterprod001:~# gluster volume heal sas03 info heal-failed
Gathering list of heal failed entries on volume sas03 has been successful

Brick glusterprod001.shopzilla.laxhq:/brick03/gfs
Number of entries: 2
at                    path on brick
-----------------------------------
2014-09-25 16:17:51 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:17:53 gfid:9ec801a3-53d4-4d98-b950-14211920694e

Brick glusterprod002.shopzilla.laxhq:/brick03/gfs
Number of entries: 3
at                    path on brick
-----------------------------------
2014-09-25 16:15:43 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:15:44 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:17:51 gfid:9ec801a3-53d4-4d98-b950-14211920694e

It seems like the folder turned into a gfid.

And then: I identified the file on the brick, removed the invalid copy, and then issued a volume heal:

# gluster volume heal sas03

This fixed the client access to the file, but info heal-failed showed this:

root@glusterprod001:/# gluster volume heal sas03 info heal-failed
Gathering list of heal failed entries on volume sas03 has been successful

Brick glusterprod001.shopzilla.laxhq:/brick03/gfs
Number of entries: 3
at                    path on brick
-----------------------------------
2014-09-25 16:17:51 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:17:53 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:27:53 gfid:9ec801a3-53d4-4d98-b950-14211920694e

Brick glusterprod002.shopzilla.laxhq:/brick03/gfs
Number of entries: 5
at                    path on brick
-----------------------------------
2014-09-25 16:15:43 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:15:44 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:17:51 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:25:44 gfid:9ec801a3-53d4-4d98-b950-14211920694e
2014-09-25 16:34:11 /HypDataSata03/data/RdctB2C

Brick glusterprod003.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod004.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod005.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

Brick glusterprod006.shopzilla.laxhq:/brick03/gfs
Number of entries: 0

which has all the gfid entries and the directory showing up in heal-failed.

Finally: I restarted glusterfs-server on both prod001 and prod002, and that cleared the heal-failed entries.

Is there a better way to resolve the heal-failed and the file conflict?

Thanks
Peter
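The heal-failed listings in this message can be condensed with a little shell. The sketch below assumes the output format shown here ("Brick ..." followed by "Number of entries: N"); both function names are hypothetical. The second helper relies on the usual brick layout, in which each gfid has an entry under .glusterfs/<first two hex characters>/<next two>/<gfid> on the brick, which is one way to find out what a bare gfid refers to.

```shell
#!/bin/sh
# Print "<brick> <count>" for every brick that reports heal-failed entries.
# Reads the output of "gluster volume heal <vol> info heal-failed" on stdin.
heal_failed_summary() {
    awk '
        /^Brick /             { brick = $2 }
        /^Number of entries:/ { if ($4 + 0 > 0) print brick, $4 }
    '
}

# Print the .glusterfs backend path of a gfid on a given brick.
gfid_backend_path() {
    brick=$1
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)    # first two hex characters
    second=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex characters
    echo "$brick/.glusterfs/$first/$second/$gfid"
}

# Usage on a storage node:
#   gluster volume heal sas03 info heal-failed | heal_failed_summary
#   ls -l "$(gfid_backend_path /brick03/gfs 9ec801a3-53d4-4d98-b950-14211920694e)"
```

For a directory the backend entry is normally a symlink, so ls -l on it shows which directory the gfid stands for.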
Re: [Gluster-devel] CentOS 7: Gluster Test Framework testcases failure
On 25/09/2014, at 6:47 PM, Emmanuel Dreyfus wrote:

Justin Clift jus...@gluster.org wrote:

Emmanuel Dreyfus and Harsha may have useful insight here too. They've been working through the regression scripts for a while now, making them more cross-platform in order to run on the BSDs (and maybe OSX eventually).

Sure, but what is the question?

Seems to be some portability issues in the regression tests for CentOS 7 too. Figured you might have some insight into some of them, since you're doing portability stuff around this anyway. :)

Start of thread: http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042361.html

+ Justin
Re: [Gluster-devel] CentOS 7: Gluster Test Framework testcases failure
On 25/09/2014, at 9:28 PM, Lalatendu Mohanty wrote:

<snip>

Have we published somewhere which distributions or OS versions we run regression tests on? If not, let's compile it and publish it, as this will help the community understand which OS distributions are part of the regression testing.

The best we have so far is probably this: http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework

Do we have plans to run regression on a variety of distributions? Not sure how difficult or complex it is to maintain.

The primary OS at the moment is CentOS 6.x (mainly due to it being the primary OS for GlusterFS, I think).

Manu and Harsha have been going through the regression tests recently, making them more cross-platform in order to run on the BSDs. This effort has also highlighted some interesting Linux-specific behaviour in the main GlusterFS code base, and led to fixes there.

In short, we're all for running the regression tests on as many distributions as possible. If Community members want to put VMs or something online (medium to long term), I'd be happy to hook our Jenkins infrastructure up to them to automatically run tests on them.

Is that kind of what you're asking? :)

Regards and best wishes,
Justin Clift
Re: [Gluster-devel] CentOS 7: Gluster Test Framework testcases failure
Justin Clift jus...@gluster.org wrote:

Seems to be some portability issues in the regression tests for CentOS 7 too. Figured you might have some insight into some of them, since you're doing portability stuff around this anyway. :)

Start of thread: (...)

It seems very broken! I would start with the first failing test, and run it by hand line by line to see where it first chokes. Best candidates are race conditions and 32/64-bit issues.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
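One way to follow this advice is to run a single test file on its own and stop at the first failing step. The sketch below assumes the .t files can be run directly with bash and emit TAP-style "ok" / "not ok" lines (consistent with the prove summary earlier in the thread); first_failure is a hypothetical helper name, not part of the test framework.

```shell
#!/bin/sh
# Print the first TAP "not ok" line read from standard input, then stop.
first_failure() {
    awk '/^not ok/ { print; exit }'
}

# Usage (from the glusterfs source tree, as root on a disposable test machine):
#   bash -x ./tests/bugs/bug-861542.t 2>trace.log | first_failure
# trace.log then holds the bash -x trace, which can be read up to the
# command belonging to the reported test number.
```

Starting from the earliest failure matters because later failures in the same file are often just consequences of the first one.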