Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

2008-01-31 Thread Eric Sandeen
Andreas Dilger wrote:
 On Jan 30, 2008  14:49 -0800, Andrew Morton wrote:
 Problem Description:

 Inode size:   256 
 
 This is a bit interesting, since it isn't very common to use large inodes.
 I suspect this relates to the problem.

I think it is somewhat common on samba servers, though.

And it's the new default in the latest e2fsprogs... maybe something will
shake out in the F9 development cycle.

 These are production Samba servers making fairly extensive use of file and
 directory ACLs. Thus far, I've only noticed the corruptions when it came time
 to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
 Note that I've never noticed any issues at runtime because of this - only when
 I later realised that ACLs had been removed from random files and/or
 directories.

 I think I will implement some scripts to unmount and run fsck nightly from
 cron, so I can at least detect the corruption a little earlier. If there is
 some more helpful debugging output I can provide, please let me know.
 
 There is just such a script in the thread "forced fsck (again?)", which is
 applicable since you are using LVs for the filesystem.

Which is on the ext3-users list btw...
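
As a rough illustration of the nightly check discussed above (not the script from the ext3-users thread), the following Python sketch runs a forced, read-only e2fsck and decodes its exit status. The exit-code meanings follow the e2fsck man page; the function names are hypothetical, and the device passed in is assumed to be an unmounted file system or an LVM snapshot of it.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_exit(code):
    """Translate an e2fsck exit status into a list of human-readable conditions."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if code & bit]


def check_readonly(device):
    """Run a forced (-f), read-only (-n) check and return (exit status, output).

    Assumption: `device` is unmounted or is an LVM snapshot, so the result
    is not skewed by concurrent writes.
    """
    proc = subprocess.run(
        ["e2fsck", "-f", "-n", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

A cron job could call check_readonly() on a snapshot device and send mail whenever describe_exit() reports anything other than "no errors". Because -n never modifies the file system, a bad inode is still intact for later inspection.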

 If you are able to reproduce this, could you please dump the inode and EA
 block before fixing the problem?

Do you need instructions on doing that?
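
For readers following along, here is a hedged sketch of one way to capture that state before e2fsck clears it. It only builds debugfs command lines; "stat" and "imap" are standard debugfs requests and -R runs a single request, with debugfs opening the device read-only unless -w is given. The helper names are hypothetical, and the device and inode number are taken from the report. With 256-byte inodes the attribute is presumably stored inside the inode itself, so the raw inode bytes located by "imap" are likely the interesting part.

```python
import shlex


def debugfs_dump_commands(device, inode):
    """Build read-only debugfs invocations for inspecting one inode."""
    requests = [
        "stat <%d>" % inode,  # decoded inode fields
        "imap <%d>" % inode,  # where the raw inode lives in the inode table
    ]
    return [["debugfs", "-R", request, device] for request in requests]


def as_shell(argv):
    """Render an argument vector as a copy-and-paste shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Device and inode number as they appear in the bug report.
COMMANDS = debugfs_dump_commands("/dev/mapper/vg_main-lv_samba", 163841)
for command in COMMANDS:
    print(as_shell(command))
```

The block and offset reported by "imap" can then be used to copy the raw inode out of the inode table with a tool such as dd.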

-Eric
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

2008-01-30 Thread Andrew Morton
On Wed, 30 Jan 2008 14:29:27 -0800 (PST)
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=9855
 
          Summary: ext3 ACL corruption
          Product: File System
          Version: 2.5
    KernelVersion: 2.6.23
         Platform: All
       OS/Version: Linux
             Tree: Mainline
           Status: NEW
         Severity: normal
         Priority: P1
        Component: ext3
       AssignedTo: [EMAIL PROTECTED]
       ReportedBy: [EMAIL PROTECTED]
 
 
 Latest working kernel version: Unknown
 Earliest failing kernel version: Definitely 2.6.23 and 2.6.23.8 but earlier is
 possible
 Distribution: Debian Etch
 Hardware Environment: Multiple x86 machines
 
 Software Environment:
 Filesystem is Ext3 on LVM on RAID-1 (on SATA).
 # e2fsck -V
 e2fsck 1.40-WIP (14-Nov-2006)
 Using EXT2FS Library version 1.40-WIP, 14-Nov-2006
 
 Problem Description:
 On several occasions now I have had e2fsck prune away ACLs on my file systems
 during a file system check after rebooting a number of (reasonably) long
 running Samba servers. This morning I decided to manually run fsck before
 rebooting one of these:
 
 # e2fsck -pfv /dev/mapper/vg_main-lv_samba
 (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
 /dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
 offset (56) which is invalid
 CLEARED.
 (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
 /dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
 offset (56) which is invalid
 CLEARED.
 [ snip lots of (near) identical errors]
 
      8301 inodes used (0.08%)
      1621 non-contiguous inodes (19.5%)
           # of inodes with ind/dind/tind blocks: 3837/24/0
   1108478 blocks used (5.29%)
         0 bad blocks
         1 large file
 
      7590 regular files
       662 directories
         0 character device files
         0 block device files
         0 fifos
         0 links
        40 symbolic links (38 fast symbolic links)
         0 sockets
 
      8292 files
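
Since the snipped output contains many near-identical errors, a small parser helps to list the affected inodes and to read the numbers in each message. The Python sketch below assumes the message wording shown in this report; the function and field names are hypothetical. It derives only what the printed figures imply: the value size (the printed sum minus the value offset) and the difference between the printed sum and "offs".

```python
import re

# Matches one debug line followed by its error message, after whitespace
# has been flattened. The optional ">" tolerates archives that strip it.
PAIR_RE = re.compile(
    r"\(entry->?e_value_offs \+ entry->?e_value_size: (?P<end>\d+), offs: (?P<offs>\d+)\) "
    r"(?P<device>\S+): Extended attribute in inode (?P<inode>\d+) has a value "
    r"offset \((?P<value_offs>\d+)\) which is invalid"
)


def parse_ea_errors(output):
    """Extract inode numbers and the implied sizes from e2fsck output."""
    flat = " ".join(output.split())  # undo mail-client line wrapping
    results = []
    for match in PAIR_RE.finditer(flat):
        end = int(match.group("end"))
        offs = int(match.group("offs"))
        value_offs = int(match.group("value_offs"))
        results.append({
            "device": match.group("device"),
            "inode": int(match.group("inode")),
            "value_offs": value_offs,
            "value_size": end - value_offs,  # implied by the printed sum
            "gap": offs - end,               # printed "offs" minus the sum
        })
    return results


SAMPLE = """\
(entry->e_value_offs + entry->e_value_size: 116, offs: 120)
/dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
offset (56) which is invalid
CLEARED.
(entry->e_value_offs + entry->e_value_size: 116, offs: 120)
/dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
offset (56) which is invalid
CLEARED.
"""

for error in parse_ea_errors(SAMPLE):
    print(error["inode"], error["value_size"], error["gap"])
```

For both inodes quoted above, the figures imply a 60-byte value at offset 56 that ends 4 bytes short of the printed "offs" of 120.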
 
 (Note: after remounting)
 # tune2fs -l /dev/mapper/vg_main-lv_samba 
 tune2fs 1.40-WIP (14-Nov-2006)
 Filesystem volume name:   <none>
 Last mounted on:          <not available>
 Filesystem UUID:          88677414-c1f8-41ba-b737-d9f6170d771b
 Filesystem magic number:  0xEF53
 Filesystem revision #:    1 (dynamic)
 Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
 Filesystem flags:         signed directory hash
 Default mount options:    (none)
 Filesystem state:         clean
 Errors behavior:          Continue
 Filesystem OS type:       Linux
 Inode count:              10485760
 Block count:              20971520
 Reserved block count:     1048576
 Free blocks:              19863038
 Free inodes:              10477459
 First block:              0
 Block size:               4096
 Fragment size:            4096
 Reserved GDT blocks:      1019
 Blocks per group:         32768
 Fragments per group:      32768
 Inodes per group:         16384
 Inode blocks per group:   1024
 Filesystem created:       Wed Feb 21 21:38:33 2007
 Last mount time:          Thu Jan 31 03:18:54 2008
 Last write time:          Thu Jan 31 03:18:54 2008
 Mount count:              1
 Maximum mount count:      30
 Last checked:             Thu Jan 31 03:16:51 2008
 Check interval:           15552000 (6 months)
 Next check after:         Tue Jul 29 02:16:51 2008
 Reserved blocks uid:      0 (user root)
 Reserved blocks gid:      0 (group root)
 First inode:              11
 Inode size:               256
 Journal inode:            8
 Default directory hash:   tea
 Directory Hash Seed:      be8c201b-3563-4fa5-a2a6-e2864e4b73e2
 Journal backup:           inode blocks
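
To check other servers for the same configuration, the "Inode size" and "Filesystem features" fields can be pulled out of tune2fs -l output programmatically. This is a minimal Python sketch with hypothetical helper names; the 128-byte figure is the size of the original ext2/ext3 on-disk inode, and anything larger leaves extra space inside each inode.

```python
GOOD_OLD_INODE_SIZE = 128  # size of the original ext2/ext3 on-disk inode


def parse_tune2fs(output):
    """Turn `tune2fs -l` output into a dict of field name -> value string."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def extra_inode_bytes(fields):
    """Bytes per inode beyond the 128-byte base inode (0 for small inodes)."""
    return max(0, int(fields["Inode size"]) - GOOD_OLD_INODE_SIZE)


# A few lines excerpted from the report for illustration.
SAMPLE = """\
tune2fs 1.40-WIP (14-Nov-2006)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype
Block size:               4096
Inode size:               256
Filesystem created:       Wed Feb 21 21:38:33 2007
"""

FIELDS = parse_tune2fs(SAMPLE)
print(FIELDS["Inode size"], extra_inode_bytes(FIELDS))
```

Splitting on the first colon only keeps timestamp values such as the creation time intact.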
 
 
 Steps to reproduce:
 Unfortunately, precise steps are not known. Restoring all the filesystem's ACLs
 from a recent dump made using getfacl -RP fixes the ACLs without causing the
 corruption to return.
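
Because the loss is only noticed after the fact, comparing a fresh getfacl -RP dump against an older one is one way to see which files lost entries. The Python sketch below assumes the usual getfacl text layout ("# file:" header followed by entry lines); the function names, the path "share/docs" and the user "alice" are hypothetical.

```python
def parse_getfacl(dump):
    """Map each path in a getfacl dump to the set of its ACL entries."""
    acls = {}
    current = None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
            acls[current] = set()
        elif line and not line.startswith("#") and current is not None:
            # Drop trailing "#effective:..." annotations, if present.
            acls[current].add(line.split("#", 1)[0].strip())
    return acls


def lost_entries(before, after):
    """Return {path: sorted entries} present in `before` but missing in `after`."""
    old, new = parse_getfacl(before), parse_getfacl(after)
    changes = {}
    for path in sorted(old):
        lost = old[path] - new.get(path, set())
        if lost:
            changes[path] = sorted(lost)
    return changes


BEFORE = """\
# file: share/docs
# owner: root
# group: staff
user::rwx
user:alice:rwx
group::r-x
mask::rwx
other::---
"""

AFTER = """\
# file: share/docs
# owner: root
# group: staff
user::rwx
group::r-x
other::---
"""

print(lost_entries(BEFORE, AFTER))
```

Any path reported here is a candidate for restoring from the older dump with setfacl --restore.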
 
 These are production Samba servers making fairly extensive use of file and
 directory ACLs. Thus far, I've only noticed the corruptions when it came time
 to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
 Note that I've never noticed any issues at runtime because of this - only when
 I later realised that ACLs had been removed from random files and/or
 directories.
 
 I think I will implement some scripts to unmount and run fsck nightly from
 cron, so I can at least detect the corruption a little earlier. If there is
 some more helpful debugging output I can provide, please let me know.
 



Re: [Bugme-new] [Bug 9855] New: ext3 ACL corruption

2008-01-30 Thread Andreas Dilger
On Jan 30, 2008  14:49 -0800, Andrew Morton wrote:
  Problem Description:
  On several occasions now I have had e2fsck prune away ACLs on my file systems
  during a file system check after rebooting a number of (reasonably) long
  running Samba servers. This morning I decided to manually run fsck before
  rebooting one of these:
  
  # e2fsck -pfv /dev/mapper/vg_main-lv_samba
  (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
  /dev/mapper/vg_main-lv_samba: Extended attribute in inode 163841 has a value
  offset (56) which is invalid
  CLEARED.
  (entry->e_value_offs + entry->e_value_size: 116, offs: 120)
  /dev/mapper/vg_main-lv_samba: Extended attribute in inode 262146 has a value
  offset (56) which is invalid
  CLEARED.

While these error messages still exist in e2fsck, this code appears to
have been changed somewhat because these same error messages no longer
get printed in e2fsprogs 1.40.5.

  Inode size:   256 

This is a bit interesting, since it isn't very common to use large inodes.
I suspect this relates to the problem.

  These are production Samba servers making fairly extensive use of file and
  directory ACLs. Thus far, I've only noticed the corruptions when it came time
  to upgrade to a new kernel and reboot (and the boot scripts then run fsck).
  Note that I've never noticed any issues at runtime because of this - only when
  I later realised that ACLs had been removed from random files and/or
  directories.
  
  I think I will implement some scripts to unmount and run fsck nightly from
  cron, so I can at least detect the corruption a little earlier. If there is
  some more helpful debugging output I can provide, please let me know.

There is just such a script in the thread "forced fsck (again?)", which is
applicable since you are using LVs for the filesystem.

If you are able to reproduce this, could you please dump the inode and EA
block before fixing the problem?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
