Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
-- Warning - threadjack in progress. But I think it might be related. --

Interesting. Could this be similar to the case that I'm seeing with "no space left on device"?

Here's my uneducated assumption:
- at this point I believe some form of I/O error or interrupt causes ocfs2 to error out
- the "remount read only" effect silently kicks in (no log message though)
- now file operations return "no space left on device", but my device is showing 2% use

The reasons I imagine a correlation between these two are:

Tao says:
> The ERESTARTSYS may happen when we get interrupted from ocfs2_cluster_lock.
> I met with it when I rm -rf a very large dir and use "ctrl+c" to stop it
> when I tested bug 1162.

There are also a fair number of posts over in the kernel lists talking about qlogic driver issues (qla2xxx) relating to PCI MSIs causing hangs under moderate I/O load, e.g.:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/268242

The rm -rf on a very large dir would presumably do this. I also hit this when I do a local rsync or untar of very large directory trees.

EMC also recommends the qlogic HBA be set to interrupt after every I/O completion. Could this cause a race condition?

All of these have interrupts in common. Think setting nointr as a mount option would help here?

Best,
James

Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:39:02PM -0700, Joel Becker wrote:
>> On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
>> 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
>
> The workaround is to mount with the 'nointr' option.
>
> Joel

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> Can you file a bug at http://oss.oracle.com/bugzilla? Include
> all the info you have in your emails. Thanks!

Filed as bug 1165.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1165

Joel

--
"Egotist: a person more interested in himself than in me."
        - Ambrose Bierce

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
Hi Joel,

Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
>> On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
>>> Has anyone here had problems with git checkouts on ocfs2?
>> Oh, boy, this is wacky.
>
> No, it's extra wacky:
>
> 5441 lstat64("t/t6015-rev-list-show-all-parents.sh", 0xffc2e318) = -1 ENOENT (No such file or directory)
> 5441 lstat64("t", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
> 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
> 5441 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 5441 sigreturn() = ? (mask now [])
> 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
> 5441 write(2, "error: git-checkout-index: unabl"..., 100) = 100
>
> How on earth is userspace seeing ERESTARTSYS? Did someone forget to
> -ive it?

The ERESTARTSYS may happen when we get interrupted from ocfs2_cluster_lock.
I met with it when I rm -rf a very large dir and use "ctrl+c" to stop it
when I tested bug 1162.

Regards,
Tao
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
On Tue, Sep 01, 2009 at 09:32:14AM +0800, Tao Ma wrote:
> > 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
> > 5441 --- SIGALRM (Alarm clock) @ 0 (0) ---
> > 5441 sigreturn() = ? (mask now [])
> > 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
> > 5441 write(2, "error: git-checkout-index: unabl"..., 100) = 100
> >
> > How on earth is userspace seeing ERESTARTSYS? Did someone forget to
> > -ive it?
> The ERESTARTSYS may happen when we get interrupted from ocfs2_cluster_lock.
> I met with it when I rm -rf a very large dir and use "ctrl+c" to
> stop it when I tested bug 1162.

Yeah, I got there. In the git case, we do ocfs2_add_entry(), but then a signal interrupts ocfs2_dentry_lock(). So ERESTARTSYS is returned, but the file has been created. When entry.S goes to retry the open(O_EXCL), it gets EEXIST.

I've added code to block signals around that dentry lock call, but there's another place that the git code is triggering. I'm hunting that down.

In general, we can't return ERESTARTSYS once we've done something that isn't idempotent. I think we need to audit our code a bit.

Joel

--
Life's Little Instruction Book #173
"Be kinder than necessary."
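A minimal user-space sketch of the sequence Joel describes, for readers who want to see why the retry fails. It does not touch ocfs2 or the kernel: `interrupted_create()` and `open_with_restart()` are hypothetical helpers written for this illustration, and the "interruption" is simulated by returning the kernel-internal restart code (512 on Linux) after the file has already been created.

```python
import errno
import os
import tempfile

# Kernel-internal restart code on Linux. It is not exposed in Python's errno
# module, so it is defined here purely for illustration.
ERESTARTSYS = 512


def interrupted_create(path):
    """Hypothetical stand-in for the buggy path: the create succeeds, then a
    later lock wait is 'interrupted' and a restart is requested."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o777)
    os.close(fd)         # the entry now exists: the non-idempotent step is done
    return -ERESTARTSYS  # simulated signal while waiting for the dentry lock


def open_with_restart(path):
    """Mimic the syscall restart: issue the same exclusive open() again."""
    result = interrupted_create(path)
    if result == -ERESTARTSYS:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o777)
        except FileExistsError as exc:
            return -exc.errno  # the first attempt already created the file
        os.close(fd)
        return 0
    return result


with tempfile.TemporaryDirectory() as tmp:
    outcome = open_with_restart(os.path.join(tmp, "t6015-rev-list-show-all-parents.sh"))
    print(errno.errorcode[-outcome])
```

The restarted call fails with EEXIST because the first attempt left a visible side effect behind, which matches the second open() in the strace output. Under these assumptions, the restart is only safe if it is requested before any such side effect, or if the interrupted work is completed rather than abandoned.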
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
On Mon, Aug 31, 2009 at 12:39:02PM -0700, Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> 5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)

The workaround is to mount with the 'nointr' option.

Joel

--
"Vote early and vote often."
        - Al Capone
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
> > Has anyone here had problems with git checkouts on ocfs2?
>
> Oh, boy, this is wacky.

No, it's extra wacky:

5441 lstat64("t/t6015-rev-list-show-all-parents.sh", 0xffc2e318) = -1 ENOENT (No such file or directory)
5441 lstat64("t", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
5441 --- SIGALRM (Alarm clock) @ 0 (0) ---
5441 sigreturn() = ? (mask now [])
5441 open("t/t6015-rev-list-show-all-parents.sh", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
5441 write(2, "error: git-checkout-index: unabl"..., 100) = 100

How on earth is userspace seeing ERESTARTSYS? Did someone forget to -ive it?

Joel

--
"Reality is merely an illusion, albeit a very persistent one."
        - Albert Einstein
Re: [Ocfs2-users] git checkout on an ocfs2 filesystem
On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
> Has anyone here had problems with git checkouts on ocfs2?

Oh, boy, this is wacky.

> On a freshly created filesystem, i'm getting a bunch of weird errors
> as below, but only if i mount the filesystem on multiple nodes:
>
> [r...@node-2 tmp]# mkfs.ocfs2 -L "testvol" /dev/sdm
> [r...@node-2 tmp]# mount /dev/sdm /tmp/m
> [r...@node-1 tmp]# mount /dev/sdm /tmp/m
> [r...@node-1 tmp]# cd m
> [r...@node-1 m]# git clone git://git.kernel.org/pub/scm/git/git.git
>
> Initialized empty Git repository in /tmp/m/git/.git/
> remote: Counting objects: 104239, done.
> remote: Compressing objects: 100% (29045/29045), done.
> remote: Total 104239 (delta 75710), reused 101527 (delta 73491)
> Receiving objects: 100% (104239/104239), 25.43 MiB | 1299 KiB/s, done.
> Resolving deltas: 100% (75710/75710), done.
> error: git checkout-index: unable to create file contrib/remotes2config.sh (File exists)
> error: git checkout-index: unable to create file t/t4013/diff.diff_--dirstat_master~1_master~2 (File exists)

I got it on this particular file myself. I was using 2.6.18-128.el5 on a ppc64. 1.4.2 of ocfs2 as well.

Can you file a bug at http://oss.oracle.com/bugzilla? Include all the info you have in your emails. Thanks!

Joel

--
"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."