Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Hi Avati,

The good news is that the problem seems to be solved after I added 'entry-timeout=0'. I will test our production script soon and keep you updated.

The bad news is that mount.glusterfs doesn't recognize this option, so I have to tweak it to make it accept the option.

Best,
Manhong

From: Anand Avati [anand.av...@gmail.com]
Sent: Tuesday, September 11, 2012 9:33 AM
To: Dai, Manhong
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine

I meant the other way. I meant to say that 3.2.0 must also have the issue (since you mentioned 3.3.0 in the subject).

Avati

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
I meant the other way. I meant to say that 3.2.0 must also have the issue (since you mentioned 3.3.0 in the subject).

Avati

On Tue, Sep 11, 2012 at 6:27 AM, Manhong Dai wrote:
> Hi Avati,
>
> Thanks a lot for your help! It is good to know that 3.2.x doesn't have
> this problem. So the worst scenario for me is to re-install it with the
> latest 3.2.*. I hope my life won't be that miserable.
>
> Best,
> Manhong
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Hi Avati,

Thanks a lot for your help! It is good to know that 3.2.x doesn't have this problem. So the worst scenario for me is to re-install it with the latest 3.2.*. I hope my life won't be that miserable.

Best,
Manhong

On Mon, 2012-09-10 at 20:53 -0700, Anand Avati wrote:
> Also, I find it very suspect that 3.2.x did not have the same
> behavior!
>
> Avati
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Hi Pranith,

Thanks a lot for your prompt reply! The volume was freshly installed a few weeks ago. Here is its info.

gluster> volume info home

Volume Name: home
Type: Distribute
Volume ID: 948d7b2e-ed78-43c0-9659-d3ac99cd7879
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: camel1:/brick/home
Brick2: camel2:/brick/home
Options Reconfigured:
performance.flush-behind: off
nfs.disable: on

In addition, the glusterfs version is 3.3.0, the OS is CentOS 6.3, and the kernel is 2.6.32-279.5.2.el6.x86_64.

Just let me know if you need any additional info.

Best,
Manhong

On Mon, 2012-09-10 at 22:23 -0400, Pranith Kumar Karampuri wrote:
> Manhong Dai,
> Thanks for the script. Could you give the volume configuration also so
> that we can re-create the problem in our setups.
>
> Pranith.
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Also, I find it very suspect that 3.2.x did not have the same behavior!

Avati

On Mon, Sep 10, 2012 at 8:53 PM, Anand Avati wrote:
> This is a limitation of the 'handle' nature of FUSE filesystems. You will
> have to set a lower entry-timeout (mount option) to fix this problem.
>
> Avati
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
This is a limitation of the 'handle' nature of FUSE filesystems. You will have to set a lower entry-timeout (mount option) to fix this problem.

Avati

On Mon, Sep 10, 2012 at 5:13 PM, Dai, Manhong wrote:
> Hi Avati,
>
> Thanks a lot! In my case, the application that tries to create a new
> file is not inside the folder.
>
> I wrote a simple bash script to demo this problem.
>
> #!/bin/bash
> FOLDER=/home/mengf_lab/daimh/temp/testdir
> for ((i=0; i<100; i++))
> do
>     echo "###$i###"
>     ssh mengf-n1 "rm -r $FOLDER; mkdir $FOLDER"
>     seq 10 | split -l 1 - $FOLDER/a.
> done
>
> And its output is
>
> ###0###
> ###1###
> split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
> ###2###
> split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
> ###3###
> ###4###
>
> Best,
> Manhong
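As a rough illustration of the advice above: the FUSE kernel module caches name lookups for `entry-timeout` seconds, so after another client removes and recreates a directory, this client may keep resolving the path to the old handle until the cached entry expires. The sketch below builds a command line that starts the `glusterfs` client directly with a lower timeout. The server `camel1` and the volume `home` come from the volume info posted in this thread; the mount point `/home` and the helper name `build_mount_cmd` are assumptions made for illustration. The function only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): print a glusterfs client
# command line that mounts a volume with the given entry timeout.
# Arguments: volfile server, volume name, mount point, timeout in seconds.
build_mount_cmd() {
    printf 'glusterfs --volfile-server=%s --volfile-id=%s --entry-timeout=%s %s\n' \
        "$1" "$2" "$4" "$3"
}

# camel1 and home are taken from the thread; /home is an assumed mount point.
build_mount_cmd camel1 home /home 0
```

A timeout of 0 means every path lookup goes back to the filesystem instead of the kernel's cache, which trades some lookup performance for consistency across clients.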
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Manhong Dai,

Thanks for the script. Could you also give the volume configuration so that we can re-create the problem in our setups?

Pranith.

- Original Message -
From: "Manhong Dai"
To: "Anand Avati"
Cc: gluster-users@gluster.org
Sent: Tuesday, September 11, 2012 5:43:40 AM
Subject: Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine

Hi Avati,

Thanks a lot! In my case, the application that tries to create a new file is not inside the folder.

I wrote a simple bash script to demo this problem.

#!/bin/bash
FOLDER=/home/mengf_lab/daimh/temp/testdir
for ((i=0; i<100; i++))
do
    echo "###$i###"
    ssh mengf-n1 "rm -r $FOLDER; mkdir $FOLDER"
    seq 10 | split -l 1 - $FOLDER/a.
done

And its output is

###0###
###1###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###2###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###3###
###4###

Best,
Manhong
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Hi Avati,

Thanks a lot! In my case, the application that tries to create a new file is not inside the folder.

I wrote a simple bash script to demo this problem.

#!/bin/bash
FOLDER=/home/mengf_lab/daimh/temp/testdir
for ((i=0; i<100; i++))
do
    echo "###$i###"
    ssh mengf-n1 "rm -r $FOLDER; mkdir $FOLDER"
    seq 10 | split -l 1 - $FOLDER/a.
done

And its output is

###0###
###1###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###2###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###3###
###4###

Best,
Manhong

From: Anand Avati [anand.av...@gmail.com]
Sent: Monday, September 10, 2012 5:25 PM
To: Dai, Manhong
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine

Is the directory deleted and recreated by another client/mount while the application which attempts to create the file stays cd'ed inside the directory? Can you try to confirm if this is the pattern?

Avati
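To put a number on how often the reproducer fails, its output can be tallied. This is a minimal sketch, assuming the script's stdout and stderr were saved to a file; the helper name `count_failures` and the sample file are hypothetical, and the sample lines are the ones quoted in the message above.

```shell
#!/bin/sh
# Hypothetical helper: summarize a saved run of the reproducer.
# Counts iteration markers (###N###) and split's "No such file" errors.
count_failures() {
    total=$(grep -c '^###[0-9]*###$' "$1")
    failed=$(grep -c '^split: .*No such file or directory$' "$1")
    echo "$failed of $total iterations failed"
}

# Sample log built from the output shown in the message.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
###0###
###1###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###2###
split: /home/mengf_lab/daimh/temp/testdir/a.aa: No such file or directory
###3###
###4###
EOF
count_failures "$sample_log"
rm -f "$sample_log"
```

Running the same tally before and after changing the mount options gives a simple way to compare failure rates across the 100 iterations.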
Re: [Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
On Mon, Sep 10, 2012 at 8:30 AM, Manhong Dai wrote:
> Hi,
>
> We got a huge problem on our sun grid engine cluster with glusterfs 3.3.0.
> Could somebody help me?
>
> Based on my understanding, if a folder is removed and recreated on other
> client node, a program that tries to create a new file under the folder
> fails very often.

Is the directory deleted and recreated by another client/mount while the application which attempts to create the file stays cd'ed inside the directory? Can you try to confirm if this is the pattern?

Avati
[Gluster-users] A problem with gluster 3.3.0 and Sun Grid Engine
Hi,

We got a huge problem on our Sun Grid Engine cluster with glusterfs 3.3.0. Could somebody help me?

Based on my understanding, if a folder is removed and recreated on another client node, a program that tries to create a new file under the folder fails very often.

We partially worked around this problem by running "ls" on the folder before doing anything else in our command. However, Sun Grid Engine tries to create a new log file before executing our script, so we often get the error message "no such file or directory" in the SGE log and cannot do anything about it.

flush-behind is already turned off. What's the next thing we should try?

The error log on the client is

[2012-09-10 11:18:48.129102] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-home-client-0: remote operation failed: Stale NFS file handle. Path: XX (488a7270-1039-473b-9122-ae07b9e2c617)
[2012-09-10 11:18:48.129162] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-home-client-1: remote operation failed: Stale NFS file handle. Path: XX (488a7270-1039-473b-9122-ae07b9e2c617)

The error logs on the two brick servers are

[2012-09-10 11:18:48.046828] I [server3_1-fops.c:1707:server_stat_cbk] 0-home-server: 3531033: STAT (null) (--) ==> -1 (No such file or directory)
[2012-09-10 11:18:48.047176] I [server3_1-fops.c:1707:server_stat_cbk] 0-home-server: 3531035: STAT (null) (--) ==> -1 (No such file or directory)
[2012-09-10 11:18:48.054315] I [server3_1-fops.c:1707:server_stat_cbk] 0-home-server: 5347741: STAT (null) (--) ==> -1 (No such file or directory)
[2012-09-10 11:18:48.054544] I [server3_1-fops.c:1707:server_stat_cbk] 0-home-server: 5347742: STAT (null) (--) ==> -1 (No such file or directory)
[2012-09-10 11:18:48.056036] I [server3_1-fops.c:1707:server_stat_cbk] 0-home-server: 5347746: STAT (null) (--) ==> -1 (No such file or directory)

Best,
Manhong
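The partial workaround described above (listing the folder before using it) can be sketched as a small wrapper. This is only an illustration of the idea; the function name `with_fresh_dir` is hypothetical, and the demo runs on a local temporary directory standing in for the GlusterFS path. As the message notes, a wrapper like this cannot help when the failing file creation happens inside SGE before the job script starts.

```shell
#!/bin/sh
# Hypothetical wrapper: list the directory first (the workaround reported
# in the message), then run the given command.
# Usage: with_fresh_dir DIR COMMAND [ARGS...]
with_fresh_dir() {
    dir=$1
    shift
    # The listing is done only for its side effect; its output is discarded.
    ls "$dir" > /dev/null 2>&1
    "$@"
}

# Demo on a local temporary directory, using the same split command as the
# reproducer posted later in the thread.
demo_dir=$(mktemp -d)
seq 10 | with_fresh_dir "$demo_dir" split -l 1 - "$demo_dir/a."
ls "$demo_dir"
rm -r "$demo_dir"
```

The command's standard input is passed through untouched because `ls` does not read it, so the wrapper can sit in front of a pipeline stage such as `split`.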