Re: [Gluster-devel] NetBSD regression tests not Initializing...
On Tue, Jul 07, 2015 at 06:04:44PM +0200, Niels de Vos wrote:
On Tue, Jul 07, 2015 at 07:13:53PM +0530, Kaushal M wrote:
I've taken this slave and one other offline and am rebooting it.

Reminder that you do not need to take the system offline for rebooting. I normally follow these steps to get hung systems functional again:

1. verify the stuck job: is it NFS unmount related?
2. open http://build.gluster.org/view/Infra/job/reboot-vm/build
3. log in on Jenkins
4. start the reboot-vm job for the stuck system
5. wait until the job has finished
6. click the abort [x] link on the stuck job
7. retrigger the job after aborting has been done (reload page)

These hangs do not seem to happen on tests from the master branch anymore, only on release-3.7. I think this confirms that the reference counting for auth-cache structures in gluster/nfs is a working solution. We should backport these changes:

- nfs: add a gf_lock_t for the auth_cache->cache_dict
  http://review.gluster.org/11021
- core: add gf_ref_t for common refcounting structures
  http://review.gluster.org/11022 (already done through http://review.gluster.org/11421)
- nfs: refcount each auth_cache_entry and related data_t
  http://review.gluster.org/11023
- refcount: correct the documentation
  http://review.gluster.org/11328

I'll try to send backports later this week (maybe Thursday?), unless someone else beats me to it. Please reply to this thread if you file a bug for this and send some backports.

The above backports have been posted. These should prevent the Gluster/NFS crashes in the regression tests, and therefore prevent NetBSD from hanging when unmounting NFS (after the NFS server died).

Please check these patches, and merge them when ready:
http://review.gluster.org/#/q/status:open+project:glusterfs+branch:release-3.7+topic:bug-1242515

Thanks, Niels

Thanks, Niels

On Tue, Jul 7, 2015 at 6:44 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Hi Emmanuel, We are seeing these issues again on nbslave7h.cloud.gluster.org
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7974/console
Thanks and Regards, Kotresh H R

- Original Message -
From: Emmanuel Dreyfus m...@netbsd.org
To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Sunday, July 5, 2015 12:52:23 AM
Subject: Re: [Gluster-devel] NetBSD regression tests not Initializing...

Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and retriggered your change, but it went on another machine.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression tests not Initializing...
NetBSD tests are failing again:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8123/console

Triggered by Gerrit: http://review.gluster.org/11616 in silent mode.
Building remotely on nbslave74.cloud.gluster.org http://build.gluster.org/computer/nbslave74.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
git config remote.origin.url http://review.gluster.org/glusterfs.git # timeout=10
Fetching upstream changes from http://review.gluster.org/glusterfs.git
git --version # timeout=10
git -c core.askpass=true fetch --tags --progress http://review.gluster.org/glusterfs.git refs/changes/16/11616/1
ERROR: Error fetching remote repo 'origin'
ERROR: Error fetching remote repo 'origin'
Finished: FAILURE

Thanks, Vijay

On Tuesday 07 July 2015 07:13 PM, Kaushal M wrote:
I've taken this slave and one other offline and am rebooting it.

On Tue, Jul 7, 2015 at 6:44 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Hi Emmanuel, We are seeing these issues again on nbslave7h.cloud.gluster.org
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7974/console
Thanks and Regards, Kotresh H R

- Original Message -
From: Emmanuel Dreyfus m...@netbsd.org
To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Sunday, July 5, 2015 12:52:23 AM
Subject: Re: [Gluster-devel] NetBSD regression tests not Initializing...

Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and retriggered your change, but it went on another machine.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests not Initializing...
Vijaikumar M vmall...@redhat.com wrote:
NetBSD tests are failing again: (...) ERROR: Error fetching remote repo 'origin'

Please reboot it. I am still working on the infamous NFS unmount kernel bug; I hope the NetBSD slaves will behave better with the fix.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests not Initializing...
I've taken this slave and one other offline and am rebooting it.

On Tue, Jul 7, 2015 at 6:44 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Hi Emmanuel, We are seeing these issues again on nbslave7h.cloud.gluster.org
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7974/console
Thanks and Regards, Kotresh H R

- Original Message -
From: Emmanuel Dreyfus m...@netbsd.org
To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Sunday, July 5, 2015 12:52:23 AM
Subject: Re: [Gluster-devel] NetBSD regression tests not Initializing...

Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and retriggered your change, but it went on another machine.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests not Initializing...
Thanks Emmanuel.

Thanks and Regards, Kotresh H R

- Original Message -
From: Emmanuel Dreyfus m...@netbsd.org
To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Sunday, July 5, 2015 12:52:23 AM
Subject: Re: [Gluster-devel] NetBSD regression tests not Initializing...

Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and retriggered your change, but it went on another machine.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests not Initializing...
Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and retriggered your change, but it went on another machine.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
[Gluster-devel] NetBSD regression tests not Initializing...
Hi,

NetBSD regressions are not initializing: the following error occurs consistently across multiple retriggers. I see the same error for quite a few patches.

http://review.gluster.org/#/c/11443/

Building remotely on nbslave72.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
git config remote.origin.url http://review.gluster.org/glusterfs.git # timeout=10
Fetching upstream changes from http://review.gluster.org/glusterfs.git
git --version # timeout=10
git -c core.askpass=true fetch --tags --progress http://review.gluster.org/glusterfs.git refs/changes/43/11443/9
ERROR: Error fetching remote repo 'origin'
ERROR: Error fetching remote repo 'origin'
Finished: FAILURE

Any help is appreciated.

Thanks and Regards, Kotresh H R
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
On Tuesday 16 June 2015 02:19 AM, Emmanuel Dreyfus wrote:
Rajesh Joseph rjos...@redhat.com wrote:
Correct me if I am wrong, but I think interruptible is good with hard mount. Which is good in real deployment scenario. Since we are talking about test scripts, I thought soft mount along with timeout period can be a good option to prevent hangs.

soft mount means an I/O operation can time out and return failure.
interruptible mount means you can kill a process undergoing I/O, which is useful for a cleanup routine.
Both together are like belt and suspenders, but given how likely we are to hang, it does not hurt.

We again hit this problem [1]. Can we use soft mount with some retries and timeouts so that we don't need manual intervention to recover a hung VM?

Thanks, Vijay

[1] http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6971/console
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Vijay Bellur vbel...@redhat.com wrote:
We again hit this problem [1]. Can we use soft mount with some retries and timeouts so that we don't need manual intervention to recover a hung VM?

Sure, but while there, I advise a soft and interruptible mount (on NetBSD, either mount -o soft,intr or mount -i -s).

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Emmanuel Dreyfus m...@netbsd.org wrote:
We again hit this problem [1]. Can we use soft mount with some retries and timeouts so that we don't need manual intervention to recover a hung VM?

Um, looking at the current test scripts, we already do it.

A side note: it seems the hung case is always with dd(1). I have never caught tests using quota.c undergoing the same failure. The only tests that do NFS mount + dd(1) are:

tests/basic/ec/nfs.t
tests/basic/mount-nfs-auth.t
tests/bugs/glusterfs/bug-872923.t
tests/bugs/quota/bug-1153964.t

Perhaps it is time to add options to quota.c and use it everywhere? It would be interesting to understand what makes dd(1) hang while quota.c is fine, though.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Emmanuel Dreyfus m...@netbsd.org wrote:
This means the dd process getting stuck in tstile because glusterfsd died is probably a NetBSD kernel bug. I have to investigate.

I think I found the culprit, but fixing this will need some discussion on the NetBSD lists: dd waits on a vnode lock owned by the ioflush kernel thread, which is responsible for periodic fsync. ioflush is stuck with the following backtrace:

cv_wait
genfs_do_putpages
genfs_putpages
VOP_PUTPAGES
nfs_flush
nfs_fsync
VOP_FSYNC
nfs_sync
sync_fsync

The cv_wait() call in genfs_do_putpages():

	/* Wait for output to complete. */
	if (!wasclean && !async && vp->v_numoutput != 0) {
		while (vp->v_numoutput != 0)
			cv_wait(&vp->v_cv, slock);
	}

cv_wait() is an uninterruptible wait with no timeout, which is obviously wrong there. cv_timedwait_sig() would be better, but that means pulling NFS mount options from a lower layer. Not obvious on the architecture front.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
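The difference between an unbounded wait and a timed one can be sketched in userspace. This is only an analogy, assuming POSIX threads rather than the NetBSD kernel condvar API: struct pending_io and wait_for_output() are hypothetical names, and the sketch covers only the timeout half of what cv_timedwait_sig() would offer, not signal interruption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for a vnode's pending-output counter. */
struct pending_io {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int             numoutput;  /* writes still in flight */
};

/*
 * Wait until numoutput drops to zero, but give up after timeout_ms.
 * Returns 0 when all output completed, ETIMEDOUT if the deadline passed,
 * or another errno value if the wait itself failed.
 */
static int wait_for_output(struct pending_io *io, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);

    /* Condition variables use CLOCK_REALTIME deadlines by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&io->lock);
    /* Unlike a plain wait, this loop ends once the deadline is reached. */
    while (io->numoutput != 0 && rc == 0)
        rc = pthread_cond_timedwait(&io->done, &io->lock, &deadline);
    if (io->numoutput == 0)
        rc = 0;
    pthread_mutex_unlock(&io->lock);

    return rc;
}
```

With a dead server the counter never reaches zero, so the caller gets ETIMEDOUT back and can fail the I/O instead of sleeping forever. Where the timeout value would come from is exactly the layering question raised above.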
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Rajesh Joseph rjos...@redhat.com wrote:
Correct me if I am wrong, but I think interruptible is good with hard mount. Which is good in real deployment scenario. Since we are talking about test scripts, I thought soft mount along with timeout period can be a good option to prevent hangs.

soft mount means an I/O operation can time out and return failure.
interruptible mount means you can kill a process undergoing I/O, which is useful for a cleanup routine.
Both together are like belt and suspenders, but given how likely we are to hang, it does not hurt.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
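The cleanup use case for an interruptible mount can be sketched as a small watchdog: run a command with a deadline and kill it when it overruns. This is a hedged illustration, not part of the Gluster test framework; run_with_deadline() and its return codes are hypothetical. The final kill only helps when the process can actually be interrupted, which is the point of the intr option.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Run argv[0] with a deadline. Returns the child's exit status (0-255),
 * -1 if it could not be started or died from a signal on its own,
 * or -2 if it was killed after timeout_sec seconds.
 */
static int run_with_deadline(char *const argv[], int timeout_sec)
{
    const struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int status;
    pid_t pid;

    assert(timeout_sec > 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }

    /* Poll for completion until the deadline. */
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /*
     * Deadline passed. SIGKILL is only delivered once the process leaves
     * its sleep; a process stuck in an uninterruptible wait would make
     * this final waitpid() block as well.
     */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -2;
}
```

A test harness could wrap a dd invocation this way and treat -2 as a failed test rather than a hung VM, assuming the mount is soft or interruptible.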
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Emmanuel, I am not sure of the feasibility but just wanted to ask you. Do you think there is a possibility to error out operations on the mount when the mount crashes instead of hanging? That would prevent a lot of manual intervention in the future as well.

Pranith.

On 06/15/2015 01:35 PM, Niels de Vos wrote:
Hi,

sometimes the NetBSD regression tests hang with messages like this:

[12:29:07] ./tests/basic/mgmt_v3-locks.t ... ok 79867 ms
No volumes present
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS process. Once the Gluster/NFS server is not reachable anymore, unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For rebooting, the http://build.gluster.org/job/reboot-vm job can be used, and retriggering works by clicking the retrigger link in the left menu once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

   # cd /build/install
   # gdb --core=/glusterfsd.core sbin/glusterfs
   ...
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0 0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8, host_addr=0xb900e400 104.130.205.187, timestamp=0xbf7fd900, can_write=0xbf7fd8fc) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
   164 *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

   (gdb) p *lookup_res
   $1 = {timestamp = 1434284981, item = 0xb901e3b0}
   (gdb) p *lookup_res->item
   $2 = {name = 0xff00 error: Cannot access memory at address 0xff00, opts = 0x}

A fix for this has been sent; it is currently waiting for an update to the proposed reference counting:

- http://review.gluster.org/11022 core: add gf_ref_t for common refcounting structures
- http://review.gluster.org/11023 nfs: refcount each auth_cache_entry and related data_t

Thanks, Niels
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
The hang we observe is not something specific to Gluster. I've observed this kind of hang when a filesystem which is in use goes offline. For example, I've accidentally shut down machines which were serving NFS mounts, which led to the client systems hanging completely and requiring a hard reboot.

If there are ways to avoid these kinds of hangs when they eventually occur, I'm all ears.

On Mon, Jun 15, 2015 at 4:38 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
Emmanuel, I am not sure of the feasibility but just wanted to ask you. Do you think there is a possibility to error out operations on the mount when the mount crashes instead of hanging? That would prevent a lot of manual intervention in the future as well.

Pranith.

On 06/15/2015 01:35 PM, Niels de Vos wrote:
Hi,

sometimes the NetBSD regression tests hang with messages like this:

[12:29:07] ./tests/basic/mgmt_v3-locks.t ... ok 79867 ms
No volumes present
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS process. Once the Gluster/NFS server is not reachable anymore, unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For rebooting, the http://build.gluster.org/job/reboot-vm job can be used, and retriggering works by clicking the retrigger link in the left menu once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

   # cd /build/install
   # gdb --core=/glusterfsd.core sbin/glusterfs
   ...
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0 0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8, host_addr=0xb900e400 104.130.205.187, timestamp=0xbf7fd900, can_write=0xbf7fd8fc) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
   164 *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

   (gdb) p *lookup_res
   $1 = {timestamp = 1434284981, item = 0xb901e3b0}
   (gdb) p *lookup_res->item
   $2 = {name = 0xff00 error: Cannot access memory at address 0xff00, opts = 0x}

A fix for this has been sent; it is currently waiting for an update to the proposed reference counting:

- http://review.gluster.org/11022 core: add gf_ref_t for common refcounting structures
- http://review.gluster.org/11023 nfs: refcount each auth_cache_entry and related data_t

Thanks, Niels
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
On Mon, Jun 15, 2015 at 04:38:54PM +0530, Pranith Kumar Karampuri wrote:
Emmanuel, I am not sure of the feasibility but just wanted to ask you. Do you think there is a possibility to error out operations on the mount when mount crashes instead of hanging? That would prevent a lot of manual intervention even in future.

Your message is a bit contradictory: there are bits quoted about NFS mount, which is native, and bits about glusterfs mount. What information are you looking for?

If we talk about a hanging mount, this is probably an NFS client waiting for an NFS server that will never return. I already wrote how this can be cleaned up with umount -f -R, and about the limitations of that approach.

If we talk about a crashing mount, then this is more likely to be a native mount, for which you have information in the logs, don't you?

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
On Monday 15 June 2015 05:21 PM, Kaushal M wrote:
The hang we observe is not something specific to Gluster. I've observed this kind of hang when a filesystem which is in use goes offline. For example, I've accidentally shut down machines which were serving NFS mounts, which led to the client systems hanging completely and requiring a hard reboot.

If there are ways to avoid these kinds of hangs when they eventually occur, I'm all ears.

For these test cases, can't we use the NFS soft mount option to prevent the hang?

On Mon, Jun 15, 2015 at 4:38 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
Emmanuel, I am not sure of the feasibility but just wanted to ask you. Do you think there is a possibility to error out operations on the mount when the mount crashes instead of hanging? That would prevent a lot of manual intervention in the future as well.

Pranith.

On 06/15/2015 01:35 PM, Niels de Vos wrote:
Hi,

sometimes the NetBSD regression tests hang with messages like this:

[12:29:07] ./tests/basic/mgmt_v3-locks.t ... ok 79867 ms
No volumes present
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS process. Once the Gluster/NFS server is not reachable anymore, unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For rebooting, the http://build.gluster.org/job/reboot-vm job can be used, and retriggering works by clicking the retrigger link in the left menu once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

   # cd /build/install
   # gdb --core=/glusterfsd.core sbin/glusterfs
   ...
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0 0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8, host_addr=0xb900e400 104.130.205.187, timestamp=0xbf7fd900, can_write=0xbf7fd8fc) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
   164 *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

   (gdb) p *lookup_res
   $1 = {timestamp = 1434284981, item = 0xb901e3b0}
   (gdb) p *lookup_res->item
   $2 = {name = 0xff00 error: Cannot access memory at address 0xff00, opts = 0x}

A fix for this has been sent; it is currently waiting for an update to the proposed reference counting:

- http://review.gluster.org/11022 core: add gf_ref_t for common refcounting structures
- http://review.gluster.org/11023 nfs: refcount each auth_cache_entry and related data_t

Thanks, Niels
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
On Mon, Jun 15, 2015 at 06:28:26PM +0530, Rajesh Joseph wrote:
For these test cases can't we use the nfs soft mount option to prevent the hang?

soft mount will not be enough. I think you also need interruptible.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
On Monday 15 June 2015 06:34 PM, Emmanuel Dreyfus wrote:
On Mon, Jun 15, 2015 at 06:28:26PM +0530, Rajesh Joseph wrote:
For these test cases can't we use the nfs soft mount option to prevent the hang?

soft mount will not be enough. I think you also need interruptible.

Correct me if I am wrong, but I think interruptible goes well with a hard mount, which is a good fit for a real deployment scenario. Since we are talking about test scripts, I thought a soft mount along with a timeout period could be a good option to prevent hangs.
[Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Hi,

sometimes the NetBSD regression tests hang with messages like this:

[12:29:07] ./tests/basic/mgmt_v3-locks.t ... ok 79867 ms
No volumes present
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS process. Once the Gluster/NFS server is not reachable anymore, unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For rebooting, the http://build.gluster.org/job/reboot-vm job can be used, and retriggering works by clicking the retrigger link in the left menu once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

   # cd /build/install
   # gdb --core=/glusterfsd.core sbin/glusterfs
   ...
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0 0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8, host_addr=0xb900e400 104.130.205.187, timestamp=0xbf7fd900, can_write=0xbf7fd8fc) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
   164 *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

   (gdb) p *lookup_res
   $1 = {timestamp = 1434284981, item = 0xb901e3b0}
   (gdb) p *lookup_res->item
   $2 = {name = 0xff00 error: Cannot access memory at address 0xff00, opts = 0x}

A fix for this has been sent; it is currently waiting for an update to the proposed reference counting:

- http://review.gluster.org/11022 core: add gf_ref_t for common refcounting structures
- http://review.gluster.org/11023 nfs: refcount each auth_cache_entry and related data_t

Thanks, Niels
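The gdb output above is consistent with a cache entry being released while a lookup still held a pointer to it, although that reading is an assumption. The idea behind the proposed reference counting can be sketched as follows. This is a minimal illustration and not the gf_ref_t API from the patches: struct cache_entry, entry_new(), entry_ref() and entry_unref() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached export entry; not the GlusterFS struct. */
struct cache_entry {
    atomic_int refs;      /* number of holders, including the cache itself */
    int        rw;        /* 1 if the client may write */
    char       name[32];
};

/* Allocate an entry holding one reference, owned by the cache. */
static struct cache_entry *entry_new(const char *name, int rw)
{
    struct cache_entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    atomic_init(&e->refs, 1);
    e->rw = rw;
    strncpy(e->name, name, sizeof(e->name) - 1);
    return e;
}

/* A lookup takes its own reference before using the entry. */
static struct cache_entry *entry_ref(struct cache_entry *e)
{
    assert(atomic_load(&e->refs) > 0);
    atomic_fetch_add(&e->refs, 1);
    return e;
}

/* Drop one reference; the last holder frees. Returns 1 if the entry was freed. */
static int entry_unref(struct cache_entry *e)
{
    assert(atomic_load(&e->refs) > 0);
    if (atomic_fetch_sub(&e->refs, 1) != 1)
        return 0;
    free(e);
    return 1;
}
```

With this scheme, evicting an entry from the cache only drops the cache's own reference, so a concurrent lookup that called entry_ref() can still read rw safely and frees the entry itself when done. Taking the reference must still happen under whatever lock protects the cache container.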
Re: [Gluster-devel] NetBSD regression tests: reviews required
Vijay Bellur vbel...@redhat.com wrote: More the merrier :-). Hi On master, I still have this one pending to fix glustershd: http://review.gluster.com/9071 Same fix on release-3.6 http://review.gluster.com/9084 While I am there, fixes done in master but pending for release-3.6: http://review.gluster.com/9215 (easy buffer overrun fix) http://review.gluster.com/9214 (reviewed +2) -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression tests: reviews required
On Mon, Dec 01, 2014 at 05:49:54AM +0100, Emmanuel Dreyfus wrote:
Here is the latest list of NetBSD fixes for regression tests:

Hi

This is a friendly reminder that I still have the following pending:

http://review.gluster.com/9071
http://review.gluster.com/9075
http://review.gluster.com/9074
http://review.gluster.com/9216 [2]
http://review.gluster.com/9217
http://review.gluster.com/9219
http://review.gluster.com/9220

[2] Here I fix the symptom rather than the cause. Hints are welcome to help fix the cause, but perhaps the symptom fix could be merged as an interim solution so that glustershd stops crashing during the test.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests: reviews required
On 12/01/2014 05:49 AM, Emmanuel Dreyfus wrote:
Vijay Bellur vbel...@redhat.com wrote:
And as the fix crop, I have a few others to share :-)
More the merrier :-).

Here is the latest list of NetBSD fixes for regression tests:

http://review.gluster.com/8982
http://review.gluster.com/9071
http://review.gluster.com/9075
http://review.gluster.com/9074
http://review.gluster.com/9212 [1]
http://review.gluster.com/9216 [2]
http://review.gluster.com/9217
http://review.gluster.com/9219
http://review.gluster.com/9220

[1] Krishnan Parthasarathi will probably want to improve the commit message before merging.
[2] Here I fix the symptom rather than the cause. Hints are welcome to help fix the cause, but perhaps the symptom fix could be merged as an interim solution so that glustershd stops crashing during the test.

The regression.sh script on nbslave71 and nbslave72 still disables two tests that always fail:

./tests/basic/afr/entry-self-heal.t - I am working on it
./tests/basic/ec/quota.t - Xavier Hernandez and Raghavendra Gowdappa may have a word about it.

A temporary solution, if you need to implement this very soon, is to add a sleep of a few seconds between the 'dd' and 'rm' commands in the quota.t script. This prevents the crash on DHT and allows the test to pass. I can do that if needed.

Xavi
Re: [Gluster-devel] NetBSD regression tests: reviews required
On Mon, Dec 01, 2014 at 10:02:32AM +0100, Xavier Hernandez wrote:
A temporary solution, if you need to implement this very soon, is to add a sleep of a few seconds between the 'dd' and 'rm' commands in the quota.t script. This prevents the crash on DHT and allows the test to pass.

Given that NetBSD regressions appear almost as fast as they are fixed, I am in a hurry to have the triggered regression tests operational, so that new regressions are not introduced. Hence yes, I am in favor of temporary fixes so that tests can be run.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests: reviews required
This patch should solve the crash and let the test finish successfully.

http://review.gluster.org/9222

Xavi

On 12/01/2014 10:26 AM, Emmanuel Dreyfus wrote:
On Mon, Dec 01, 2014 at 10:02:32AM +0100, Xavier Hernandez wrote:
A temporary solution, if you need to implement this very soon, is to add a sleep of a few seconds between the 'dd' and 'rm' commands in the quota.t script. This prevents the crash on DHT and allows the test to pass.

Given that NetBSD regressions appear almost as fast as they are fixed, I am in a hurry to have the triggered regression tests operational, so that new regressions are not introduced. Hence yes, I am in favor of temporary fixes so that tests can be run.
Re: [Gluster-devel] NetBSD regression tests: reviews required
- Original Message -
From: Xavier Hernandez xhernan...@datalab.es
To: Emmanuel Dreyfus m...@netbsd.org, Vijay Bellur vbel...@redhat.com, Justin Clift jus...@gluster.org, Pranith Kumar Karampuri pkara...@redhat.com, Krishnan Parthasarathi kpart...@redhat.com, Raghavendra Gowdappa rgowd...@redhat.com
Cc: gluster-devel@gluster.org
Sent: Monday, December 1, 2014 2:32:32 PM
Subject: Re: [Gluster-devel] NetBSD regression tests: reviews required

On 12/01/2014 05:49 AM, Emmanuel Dreyfus wrote:
Vijay Bellur vbel...@redhat.com wrote:
And as the fix crop, I have a few others to share :-)
More the merrier :-).

Here is the latest list of NetBSD fixes for regression tests:

http://review.gluster.com/8982
http://review.gluster.com/9071
http://review.gluster.com/9075
http://review.gluster.com/9074
http://review.gluster.com/9212 [1]
http://review.gluster.com/9216 [2]
http://review.gluster.com/9217
http://review.gluster.com/9219
http://review.gluster.com/9220

[1] Krishnan Parthasarathi will probably want to improve the commit message before merging.
[2] Here I fix the symptom rather than the cause. Hints are welcome to help fix the cause, but perhaps the symptom fix could be merged as an interim solution so that glustershd stops crashing during the test.

The regression.sh script on nbslave71 and nbslave72 still disables two tests that always fail:

./tests/basic/afr/entry-self-heal.t - I am working on it
./tests/basic/ec/quota.t - Xavier Hernandez and Raghavendra Gowdappa may have a word about it.

A temporary solution, if you need to implement this very soon, is to add a sleep of a few seconds between the 'dd' and 'rm' commands in the quota.t script. This prevents the crash on DHT and allows the test to pass. I can do that if needed.

Go ahead :).

Xavi
Re: [Gluster-devel] NetBSD regression tests: reviews required
On Mon, Dec 1, 2014 at 2:32 PM, Xavier Hernandez xhernan...@datalab.es wrote: On 12/01/2014 05:49 AM, Emmanuel Dreyfus wrote: Vijay Bellur vbel...@redhat.com wrote: And as the fix crop, I have a few others to share :-) The more the merrier :-). Here is the latest list of NetBSD fixes for regression tests: http://review.gluster.com/8982 http://review.gluster.com/9071 http://review.gluster.com/9075 http://review.gluster.com/9074 http://review.gluster.com/9212 [1] http://review.gluster.com/9216 [2] http://review.gluster.com/9217 http://review.gluster.com/9219 http://review.gluster.com/9220 [1] Krishnan Parthasarathi will probably want to improve the commit message before merging. [2] Here I fix the symptom rather than the cause. Hints are welcome to help fix the cause, but perhaps the symptom fix could be merged as an interim solution so that glustershd stops crashing during the test. The regression.sh script on nbslave71 and nbslave72 still disables two tests that always fail: ./tests/basic/afr/entry-self-heal.t - I am working on it ./tests/basic/ec/quota.t - Xavier Hernandez and Raghavendra Gowdappa may have a word about it. A temporary solution if you need to implement this very soon is to add a sleep of a few seconds between the 'dd' and 'rm' commands in the quota.t script. This prevents the crash on DHT and allows the test to pass. Do you have the core/backtrace? I am curious to know what caused the crash. I can do that if needed. Xavi -- Raghavendra G
Re: [Gluster-devel] NetBSD regression tests: reviews required
Emmanuel, I think Raghavendra is referring to the crash in tests/basic/ec/quota.t here (as opposed to the one in tests/basic/afr/self-heald.t for which I posted the explanation). -Krutika - Original Message - From: Emmanuel Dreyfus m...@netbsd.org To: Raghavendra G raghaven...@gluster.com, Xavier Hernandez xhernan...@datalab.es Cc: Gluster Devel gluster-devel@gluster.org Sent: Monday, December 1, 2014 6:04:57 PM Subject: Re: [Gluster-devel] NetBSD regression tests: reviews required Raghavendra G raghaven...@gluster.com wrote: Do you have the core/backtrace? I am curious to know what caused the crash. Krutika Dhananjay just posted the full explanation in 1374342506.4067656.1417436203250.javamail.zim...@redhat.com -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests: reviews required
Vijay Bellur vbel...@redhat.com wrote: And as the fix crop, I have a few others to share :-) The more the merrier :-). Here is the latest list of NetBSD fixes for regression tests: http://review.gluster.com/8982 http://review.gluster.com/9071 http://review.gluster.com/9075 http://review.gluster.com/9074 http://review.gluster.com/9212 [1] http://review.gluster.com/9216 [2] http://review.gluster.com/9217 http://review.gluster.com/9219 http://review.gluster.com/9220 [1] Krishnan Parthasarathi will probably want to improve the commit message before merging. [2] Here I fix the symptom rather than the cause. Hints are welcome to help fix the cause, but perhaps the symptom fix could be merged as an interim solution so that glustershd stops crashing during the test. The regression.sh script on nbslave71 and nbslave72 still disables two tests that always fail: ./tests/basic/afr/entry-self-heal.t - I am working on it ./tests/basic/ec/quota.t - Xavier Hernandez and Raghavendra Gowdappa may have a word about it. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests: reviews required
On Sat, 22 Nov 2014 16:55:07 +0100 m...@netbsd.org (Emmanuel Dreyfus) wrote: snip Some news on triggered NetBSD regression tests: we still have a few tests that always fail in basic. I tweaked the regression test launching script to skip them, so that we can get some useful results until they are fixed. But before enabling votes, we still need to merge two change sets to fix spurious failures: http://review.gluster.com/9071 http://review.gluster.com/9075 snip And while I am there, it would be nice if someone could review this one: http://review.gluster.com/9137 Hi all, Does anyone have time to look over some (or all) of these three for correctness? We're trying to get the NetBSD side of things running 100%, and waiting on these is blocking us. ;) Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift
Re: [Gluster-devel] NetBSD regression tests: reviews required
On 11/28/2014 09:25 PM, Justin Clift wrote: On Sat, 22 Nov 2014 16:55:07 +0100 m...@netbsd.org (Emmanuel Dreyfus) wrote: snip Some news on triggered NetBSD regression tests: we still have a few tests that always fail in basic. I tweaked the regression test launching script to skip them, so that we can get some useful results until they are fixed. But before enabling votes, we still need to merge two change sets to fix spurious failures: http://review.gluster.com/9071 http://review.gluster.com/9075 Have dropped a note to Pranith and Raghavendra Bhat to review these. We should get this in soon. snip And while I am there, it would be nice if someone could review this one: http://review.gluster.com/9137 Have merged this. Hi all, Does anyone have time to look over some (or all) of these three for correctness? We're trying to get the NetBSD side of things running 100%, and waiting on these is blocking us. ;) Sorry about this delay. All patches should be in the repo in a bit. Regards, Vijay
Re: [Gluster-devel] NetBSD regression tests: reviews required
On Fri, Nov 28, 2014 at 03:55:22PM +, Justin Clift wrote: We're trying to get the NetBSD side of things running 100%, and waiting on these is blocking us. ;) And as the fix crop, I have a few others to share :-) Currently I test with: http://review.gluster.org/8074 http://review.gluster.org/8982 http://review.gluster.org/9075 http://review.gluster.org/9137 http://review.gluster.org/9171 http://review.gluster.org/9204 http://review.gluster.org/9212 (maybe not ready yet) -- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests: reviews required
On 11/28/2014 09:44 PM, Emmanuel Dreyfus wrote: On Fri, Nov 28, 2014 at 03:55:22PM +, Justin Clift wrote: We're trying to get the NetBSD side of things running 100%, and waiting on these is blocking us. ;) And as the fix crop, I have a few others to share :-) The more the merrier :-). Currently I test with: http://review.gluster.org/8074 s/8074/9074/ ? Pranith - If this happens to be 9074, how would we want to address this? Should we wait for glfsheal to start working in release-3.6 mainline? http://review.gluster.org/8982 Have triggered a regression run for this. Will merge upon successful completion of regression. http://review.gluster.org/9075 Under review (along with 9071). http://review.gluster.org/9137 http://review.gluster.org/9171 http://review.gluster.org/9204 All three patches are now in merged state. http://review.gluster.org/9212 (maybe not ready yet) Should be in once ready. Cheers, Vijay
[Gluster-devel] NetBSD regression tests: reviews required
Hello, Some news on triggered NetBSD regression tests: we still have a few tests that always fail in basic. I tweaked the regression test launching script to skip them, so that we can get some useful results until they are fixed. But before enabling votes, we still need to merge two change sets to fix spurious failures: http://review.gluster.com/9071 http://review.gluster.com/9075 The NetBSD triggered regression history can be seen here: starting at build #79, we have two successes and two spurious failures: http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/ The skipped tests for now are: - tests/basic/afr/self-heald.t Anuradha Talur ata...@redhat.com is working on it - tests/basic/ec/ec.t - tests/basic/ec/self-heal.t Xavier Hernandez xhernan...@datalab.es submitted this fix that needs to be merged: http://review.gluster.org/9151 - tests/basic/ec/quota.t Still being investigated by Xavier. And while I am there, it would be nice if someone could review this one: http://review.gluster.com/9137 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
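The message above mentions tweaking the launching script to skip known-failing tests. The actual change to that script is not shown in the thread, so the following is only a minimal sketch of one way such a skip list could work. The four paths are the ones listed in the message; the variable name SKIP_TESTS and the function name filter_skipped are hypothetical.

```shell
#!/bin/sh
# Sketch only: not the real regression launching script.
# Paths taken from the message above; the names below are hypothetical.
SKIP_TESTS="tests/basic/afr/self-heald.t
tests/basic/ec/ec.t
tests/basic/ec/self-heal.t
tests/basic/ec/quota.t"

# Read test paths on stdin, one per line, and print only those
# that are not in the skip list. Skipped paths are reported on stderr.
filter_skipped() {
    while IFS= read -r t; do
        # -x matches the whole line, -F treats the path as a fixed string.
        if printf '%s\n' "$SKIP_TESTS" | grep -qxF -- "$t"; then
            echo "skipping $t" >&2
        else
            printf '%s\n' "$t"
        fi
    done
}
```

For instance, `find tests/basic -name '*.t' | sort | filter_skipped` would print the remaining tests, which could then be handed to whatever command runs them. Keeping the list in one variable makes it easy to drop an entry once its fix is merged.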
[Gluster-devel] NetBSD regression tests
Hi, We now have NetBSD regression tests triggered on each commit. The tests are restricted to tests/basic. Unfortunately we have 3 tests that almost reliably fail. Help is welcome to fix them (nbslave70.cloud.gluster.org is available for testing): ./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 83 Failed: 1) Failed test: 29 ./tests/basic/ec/quota.t (Wstat: 0 Tests: 22 Failed: 3) Failed tests: 19-21 ./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 1) Failed test: 246 There are also a few spurious failures. Merging these changes would help: http://review.gluster.org/8982 http://review.gluster.org/9071 http://review.gluster.org/9075 -- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests
Hi, On 11/18/2014 09:44 AM, Emmanuel Dreyfus wrote: Hi, We now have NetBSD regression tests triggered on each commit. The tests are restricted to tests/basic. Unfortunately we have 3 tests that almost reliably fail. Help is welcome to fix them (nbslave70.cloud.gluster.org is available for testing): ./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 83 Failed: 1) Failed test: 29 ./tests/basic/ec/quota.t (Wstat: 0 Tests: 22 Failed: 3) Failed tests: 19-21 ./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 1) Failed test: 246 This failure is solved by patches http://review.gluster.org/9117/ (already merged) and http://review.gluster.org/9133/ (I've just reviewed it). I'm still working on the quota.t failure. Xavi
Re: [Gluster-devel] NetBSD regression tests
On Tue, Nov 18, 2014 at 10:30:46AM +0100, Xavier Hernandez wrote: ./tests/basic/ec/quota.t (Wstat: 0 Tests: 22 Failed: 3) Failed tests: 19-21 ./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 1) Failed test: 246 This failure is solved by patches http://review.gluster.org/9117/ (already merged) and http://review.gluster.org/9133/ (I've just reviewed it). Unfortunately http://review.gluster.org/9133 does not always fix test 246 of ./tests/basic/ec/self-heal.t. Running ls 2 here shows that the result is not consistent: we do not have the same objects listed on each try. -- Emmanuel Dreyfus m...@netbsd.org
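The inconsistency described above (different objects listed on each try) can be checked by listing a directory several times and comparing the results. The sketch below is an illustration only: the function name listing_is_stable and the default number of tries are assumptions, and it is not the command that was actually run in the thread.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of the Gluster test suite.
# Returns 0 if repeated listings of a directory are identical, 1 otherwise.
listing_is_stable() {
    dir=$1
    tries=${2:-5}          # number of listings to compare; 5 is arbitrary

    # Sort so that ordering differences alone do not count as a mismatch.
    first=$(ls -a "$dir" | sort)
    i=1
    while [ "$i" -lt "$tries" ]; do
        current=$(ls -a "$dir" | sort)
        if [ "$current" != "$first" ]; then
            echo "listing changed on try $((i + 1))" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

Pointed at a directory on the mounted volume, a non-zero return would reproduce the symptom reported here without having to compare listings by eye.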
Re: [Gluster-devel] NetBSD regression tests
On 11/18/2014 02:05 PM, Emmanuel Dreyfus wrote: On Tue, Nov 18, 2014 at 10:30:46AM +0100, Xavier Hernandez wrote: ./tests/basic/ec/quota.t (Wstat: 0 Tests: 22 Failed: 3) Failed tests: 19-21 ./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 1) Failed test: 246 This failure is solved by patches http://review.gluster.org/9117/ (already merged) and http://review.gluster.org/9133/ (I've just reviewed it). Unfortunately http://review.gluster.org/9133 does not always fix test 246 of ./tests/basic/ec/self-heal.t. Running ls 2 here shows that the result is not consistent: we do not have the same objects listed on each try. Have you applied 9117? When I was doing tests, it was not applied.
Re: [Gluster-devel] NetBSD regression tests
On Tue, Nov 18, 2014 at 02:16:24PM +0100, Xavier Hernandez wrote: Have you applied 9117? When I was doing tests, it was not applied. Yes, I am testing on my own setup to avoid breaking your experiments on quota.t. I will resync everything to master to see if it improves. -- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] NetBSD regression tests
On Tue, Nov 18, 2014 at 02:16:24PM +0100, Xavier Hernandez wrote: Have you applied 9117? When I was doing tests, it was not applied. Yes, it still fails with latest master (and changes 8982, 9071, 9075, 9097, 9133 and 9137). -- Emmanuel Dreyfus m...@netbsd.org
[Gluster-devel] NetBSD regression tests: patches to merge
Hi, I need this one to be merged so that I can set up pre-commit basic regression tests on NetBSD for master: http://review.gluster.org/8936 While there, my life would be simpler if that big one could be merged: http://review.gluster.org/9009 -- Emmanuel Dreyfus m...@netbsd.org