Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread James Harrell

-- Warning - threadjack in progress. But I think it might be related. --

Interesting. Could this be similar to the case that I'm seeing with "no 
space left on device"? Here's my uneducated assumption:
- at this point I believe some form of I/O error or interrupt causes 
  ocfs2 to error out
- the "remount read only" effect silently kicks in (no log message though)
- now file operations return "no space left on device", but my device 
  is showing 2% use


The reasons I imagine a correlation between these two are:
Tao says:
> The ERESTARTSYS may happen when we get interrupted from
> ocfs2_cluster_lock.
> I met with it when I rm -rf a very large dir and use "ctrl+c" to stop it
> when I tested bug 1162.

There are also a fair number of posts over in the kernel lists talking 
about qlogic driver issues (qla2xxx) relating to PCI MSIs causing hangs 
under moderate I/O load.

e.g.: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/268242

An rm -rf on a very large dir would presumably do this. I also hit 
this when I do a local rsync or untar of very large directory trees.

EMC also recommends the qlogic HBA be set to interrupt after every I/O 
completion. Could this cause a race condition?

All of these have interrupts in common. Think setting nointr as a mount 
option would help here?


Best,
James

Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:39:02PM -0700, Joel Becker wrote:
> > On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> > 5441  open("t/t6015-rev-list-show-all-parents.sh", 
> > O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
>
> The workaround is to mount with the 'nointr' option.
>
> Joel

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Joel Becker
On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
>   Can you file a bug at http://oss.oracle.com/bugzilla?  Include
> all the info you have in your emails.  Thanks!

Filed as bug 1165.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1165

Joel


-- 

"Egotist: a person more interested in himself than in me."
 - Ambrose Bierce 

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Tao Ma
Hi Joel,

Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
>> On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
>>> Has anyone here had problems with git checkouts on ocfs2?
>>  Oh, boy, this is wacky.
> 
>   No, it's extra wacky:
> 
> 5441  lstat64("t/t6015-rev-list-show-all-parents.sh", 0xffc2e318) = -1 ENOENT 
> (No such file or directory)
> 5441  lstat64("t", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
> 5441  open("t/t6015-rev-list-show-all-parents.sh", 
> O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
> 5441  --- SIGALRM (Alarm clock) @ 0 (0) ---
> 5441  sigreturn()   = ? (mask now [])
> 5441  open("t/t6015-rev-list-show-all-parents.sh", 
> O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
> 5441  write(2, "error: git-checkout-index: unabl"..., 100) = 100
> 
> How on earth is userspace seeing ERESTARTSYS?  Did someone forget to
> -ive it?
The ERESTARTSYS may happen when we get interrupted from ocfs2_cluster_lock.
I met with it when I rm -rf a very large dir and use "ctrl+c" to stop it 
when I tested bug 1162.

Regards,
Tao



Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Joel Becker
On Tue, Sep 01, 2009 at 09:32:14AM +0800, Tao Ma wrote:
> >5441  open("t/t6015-rev-list-show-all-parents.sh", 
> >O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
> >5441  --- SIGALRM (Alarm clock) @ 0 (0) ---
> >5441  sigreturn()   = ? (mask now [])
> >5441  open("t/t6015-rev-list-show-all-parents.sh", 
> >O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
> >5441  write(2, "error: git-checkout-index: unabl"..., 100) = 100
> >
> >How on earth is userspace seeing ERESTARTSYS?  Did someone forget to
> >-ive it?
> The ERESTARTSYS may happen when we get interrupted from ocfs2_cluster_lock.
> I met with it when I rm -rf a very large dir and use "ctrl+c" to
> stop it when I tested bug 1162.

Yeah, I got there.  In the git case, we do ocfs2_add_entry(),
but then a signal interrupts ocfs2_dentry_lock().  So ERESTARTSYS is
returned, but the file has been created.  When entry.S goes to retry the
open(O_EXCL), it gets EEXIST.

I've added code to block signals around that dentry lock call,
but there's another place that the git code is triggering.  I'm hunting
that down.

In general, we can't return ERESTARTSYS once we've done
something that isn't idempotent.  I think we need to audit our code a
bit.
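
The failure mode described above can be illustrated from userspace. The sketch below is only an illustration, not the ocfs2 code path: it runs against a temporary directory on whatever local filesystem is available, and the helper name create_exclusive is hypothetical. It shows that open() with O_CREAT|O_EXCL is not idempotent, so re-issuing the same call after the file already exists fails with EEXIST, which is what the restarted open() in the strace output reports.

```python
import errno
import os
import tempfile


def create_exclusive(path):
    """Hypothetical helper: create path with O_CREAT|O_EXCL.

    Returns "OK" on success, or the symbolic errno name on failure.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o777)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    os.close(fd)
    return "OK"


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "t6015-rev-list-show-all-parents.sh")

    # First attempt: the file does not exist yet, so the create succeeds.
    # In the ocfs2 case this corresponds to the directory entry having
    # been added before the call is interrupted.
    first = create_exclusive(path)

    # Second attempt stands in for the automatic restart of the syscall:
    # the entry is already there, so O_EXCL makes the call fail.
    second = create_exclusive(path)

print(first, second)  # prints "OK EEXIST"
```

The point of the sketch is that a restart is only safe if the interrupted call left no visible side effect behind; once the entry exists, the retry cannot tell its own earlier work apart from a conflicting file.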

Joel

-- 

Life's Little Instruction Book #173

"Be kinder than necessary."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Joel Becker
On Mon, Aug 31, 2009 at 12:39:02PM -0700, Joel Becker wrote:
> On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> 5441  open("t/t6015-rev-list-show-all-parents.sh", 
> O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)

The workaround is to mount with the 'nointr' option.
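
To confirm that a volume really is mounted with this option, one can inspect the mount table. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options) and assuming the option is listed there as nointr; the helper name ocfs2_mounts_without_nointr and the sample lines are made up for illustration:

```python
def ocfs2_mounts_without_nointr(mounts_text):
    """Return mount points of ocfs2 filesystems whose options lack 'nointr'."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ocfs2" and "nointr" not in options.split(","):
            missing.append(mount_point)
    return missing


# Made-up sample in /proc/mounts format; on a real node the text would
# be read from /proc/mounts instead.
sample = """\
/dev/sdm /tmp/m ocfs2 rw,nointr 0 0
/dev/sdn /tmp/n ocfs2 rw 0 0
/dev/sda1 / ext3 rw 0 0
"""

result = ocfs2_mounts_without_nointr(sample)
print(result)  # prints ['/tmp/n']
```

Any mount point the helper returns would need to be remounted with the option before the workaround applies to it.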

Joel

-- 

"Vote early and vote often." 
- Al Capone

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Joel Becker
On Mon, Aug 31, 2009 at 12:16:36PM -0700, Joel Becker wrote:
> On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
> > Has anyone here had problems with git checkouts on ocfs2?
> 
>   Oh, boy, this is wacky.

No, it's extra wacky:

5441  lstat64("t/t6015-rev-list-show-all-parents.sh", 0xffc2e318) = -1 ENOENT 
(No such file or directory)
5441  lstat64("t", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
5441  open("t/t6015-rev-list-show-all-parents.sh", 
O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = ? ERESTARTSYS (To be restarted)
5441  --- SIGALRM (Alarm clock) @ 0 (0) ---
5441  sigreturn()   = ? (mask now [])
5441  open("t/t6015-rev-list-show-all-parents.sh", 
O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0777) = -1 EEXIST (File exists)
5441  write(2, "error: git-checkout-index: unabl"..., 100) = 100

How on earth is userspace seeing ERESTARTSYS?  Did someone forget to
-ive it?

Joel

-- 

"Reality is merely an illusion, albeit a very persistent one."
- Albert Einstein

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] git checkout on an ocfs2 filesystem

2009-08-31 Thread Joel Becker
On Sun, Aug 30, 2009 at 08:19:08PM -0500, Nathaniel Griswold wrote:
> Has anyone here had problems with git checkouts on ocfs2?

Oh, boy, this is wacky.

> On a freshly created filesystem, i'm getting a bunch of weird errors
> as below, but only if i mount the filesystem on multiple nodes:
> 
> [r...@node-2 tmp]# mkfs.ocfs2 -L "testvol" /dev/sdm
> [r...@node-2 tmp]# mount /dev/sdm /tmp/m
> [r...@node-1 tmp]# mount /dev/sdm /tmp/m
> [r...@node-1 tmp]# cd m
> [r...@node-1 m]# git clone git://git.kernel.org/pub/scm/git/git.git
> 
> Initialized empty Git repository in /tmp/m/git/.git/
> remote: Counting objects: 104239, done.
> remote: Compressing objects: 100% (29045/29045), done.
> remote: Total 104239 (delta 75710), reused 101527 (delta 73491)
> Receiving objects: 100% (104239/104239), 25.43 MiB | 1299 KiB/s, done.
> Resolving deltas: 100% (75710/75710), done.
> error: git checkout-index: unable to create file
> contrib/remotes2config.sh (File exists)
> error: git checkout-index: unable to create file
> t/t4013/diff.diff_--dirstat_master~1_master~2 (File exists)

I got it on this particular file myself.  I was using
2.6.18-128.el5 on a ppc64.  1.4.2 of ocfs2 as well.

Can you file a bug at http://oss.oracle.com/bugzilla?  Include
all the info you have in your emails.  Thanks!

Joel



-- 

"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127
