Re: [patch 0/2] i_version update

2007-05-31 Thread Trond Myklebust
On Thu, 2007-05-31 at 10:01 +1000, Neil Brown wrote:

 This will provide a change number that normally changes only when the
 file changes and doesn't require any extra storage on disk.
 The change number will change inappropriately only when the inode has
 fallen out of cache and is being reloaded, which is either after a crash
 (hopefully rare) or when a file hasn't been used for a while, implying
 that it is unlikely that any client has it in cache.

It will also change inappropriately if the server is under heavy load
and needs to reclaim memory by tossing out inodes that are cached and
still in use by the clients. That change will trigger clients to
invalidate their caches and to refetch the data from the server, further
cranking up the load.
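
A minimal sketch of the failure mode described above. The details of the proposed scheme are not shown in this thread, so seeding the in-memory change number from a per-load generation counter is an assumption, and every name here (cached_inode, load_generation, inode_load and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory inode: the change number is never stored on disk. */
struct cached_inode {
	uint64_t change;	/* change number handed out to clients */
	int loaded;		/* non-zero while the inode is in cache */
};

/* Hypothetical global, bumped on every load so a reload gets a new base. */
static uint64_t load_generation;

/* Load (or reload) an inode into cache and seed its change number. */
static void inode_load(struct cached_inode *ino)
{
	load_generation++;
	ino->change = load_generation << 32;
	ino->loaded = 1;
}

/* A real modification of the file bumps the change number. */
static void inode_modify(struct cached_inode *ino)
{
	assert(ino->loaded);
	ino->change++;
}

/* Memory pressure: the server drops the inode from its cache. */
static void inode_evict(struct cached_inode *ino)
{
	ino->loaded = 0;
}

/* Client side: cached data is trusted only if the number is unchanged. */
static int client_cache_valid(uint64_t seen, const struct cached_inode *ino)
{
	return seen == ino->change;
}
```

In this model an evict followed by a reload changes the number even though the file did not change, so a client that compares against its remembered value has to refetch, which is the extra load being objected to.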

Not an ideal solution...

Trond

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsprogs coverity patch cid-33.diff

2007-05-31 Thread Theodore Tso
On Tue, May 29, 2007 at 02:51:41PM -0600, Andreas Dilger wrote:
  Did cid-34.diff get lost?
 
 I still have it in my apply atop 1.39-WIP series, so it appears not
 to have made it into Ted's repo.  I'm including the patch again for
 posterity.

Fix applied.

- Ted


Re: Oops on disk write (kernel 2.6.16.y)

2007-05-31 Thread Michal Piotrowski
[linux-ext4 added to CC]

Holger Eitzenberger wrote:
 Hi,
 
 I am currently experiencing the same kernel crash on several machines running
 kernel version 2.6.16.43.  Attached are some dumps of one particular
 machine which crashed several times because of this.  So far I have been
 unable to reproduce this behaviour in the test lab; putting some I/O
 load on the box did not help either.
 
 All of them happened on UP kernels, but this may be just a coincidence.
 From the logs I see that in at least one case the machine didn't stop
 immediately but kept working for a few hours from that point on until it
 hit the wall.
 
 Looking at the traces I can say that all of them follow a codepath from
 the block I/O layer downward to ext3, e.g. here in page writeback path,
 see:
 
 kernel: Unable to handle kernel NULL pointer dereference at virtual address 
 0004
 kernel: printing eip:
 kernel: c018e36a
 kernel: *pde = 
 kernel: Oops:  [#1]
 kernel: Modules linked in: nfnetlink_queue ip_nat_ftp ip_conntrack_ftp
 edd sg sd_mod sr_mod scsi_mod ide_cd cdrom ipt_MASQUERADE ipt_hashlimit
 xt_condition ipt_REDIRECT xt_limit xt_conntrack ipt_esp xt_tcpudp
 ipt_psd ipt_addrtype ip_nat_mms ip_nat_pptp ip_nat_irc iptable_nat
 ebtable_nat ebtables iptable_ips ip_conntrack_mms ip_conntrack_pptp
 ip_conntrack_irc ppp_deflate zlib_deflate bsd_comp sha1
 arc4 ppp_mppe ppp_async crc_ccitt ppp_generic slhc crypto_null blowfish
 cast5 serpent twofish ipsec af_packet ipt_logmark ipt_confirmed
 ipt_owner ipt_REJECT ipt_CONFIRMED evdev ehci_hcd uhci_hcd ohci_hcd
 parport_pc ppdev parport xt_state xt_NOTRACK iptable_raw iptable_filter
 ip_conntrack_netlink ip_nat ipt_LOG ip_conntrack ip_tables x_tables
 nfnetlink_log nfnetlink eepro100 mii e100 capability commoncap loop
 kernel: CPU:0
 kernel: EIP:0060:[c018e36a]Not tainted VLI
 kernel: EFLAGS: 00010286   (2.6.16.43-46-default #1)
 kernel: EIP is at walk_page_buffers+0x1a/0x70
 kernel: eax:    ebx:    ecx:    edx: 
 kernel: esi:    edi: d5f3cb74   ebp:    esp: c34a5d6c
 kernel: ds: 007b   es: 007b   ss: 0068
 kernel: Process pdflush (pid: 22753, threadinfo=c34a4000 task=cd7cb070)
 kernel: Stack: 0d573c574 dbb7c720  dbb7c720 c1114540 dbb7c720
 d5f3cb74 dbb7c720
 kernel: c01919c3 1000  c018e3c0 cb6e1710 0246 c1114540
 000a
 kernel: c01918c0 c34a5f48 c0176979 c1114540 c34a5f48 c34a5e28 
 000e
 kernel: Call Trace:
 kernel: [c01919c3] ext3_ordered_writepage+0x103/0x1f0
 kernel: [c018e3c0] bget_one+0x0/0x10
 kernel: [c01918c0] ext3_ordered_writepage+0x0/0x1f0
 kernel: [c0176979] mpage_writepages+0x1c9/0x3e0
 kernel: [c01918c0] ext3_ordered_writepage+0x0/0x1f0
 kernel: [c013aa79] do_writepages+0x49/0x50
 kernel: [c017504c] __writeback_single_inode+0x8c/0x3c0
 kernel: [c02fbafc] schedule_timeout+0x4c/0xc0
 kernel: [c01755a8] sync_sb_inodes+0x178/0x230
 kernel: [c0175b1f] writeback_inodes+0x6f/0x89
 kernel: [c013ac59] wb_kupdate+0xf9/0x170
 kernel: [c013b5ee] pdflush+0x8e/0x180
 ...
 
 The disassembly of walk_page_buffers() is at [1].  At least some of the
 other crashes happen in the sys_write() path, I have attached some of
 them ([2], [3] and [4]).
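
One way to read a fault at a small address such as 4, offered as an assumption and not as a diagnosis of this oops: reading a member through a NULL structure pointer touches the address equal to that member's offset. The sketch below uses a hypothetical struct fake_bh, which is not the kernel's structure definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer-head-like structure. */
struct fake_bh {
	unsigned long state;	/* first field, offset 0 */
	struct fake_bh *next;	/* second field, offset sizeof(unsigned long) */
};

/* Address that reading p->next would touch, computed without dereferencing p. */
static uintptr_t next_field_address(const struct fake_bh *p)
{
	return (uintptr_t)p + offsetof(struct fake_bh, next);
}
```

With a 32-bit unsigned long, next_field_address(NULL) is 4, so a loop that follows such a link from a NULL head would fault at that address.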
 
 Looking at the LKML archive I can say that
 
  http://lkml.org/lkml/2007/3/4/11
 
 looks similar.
 
 Any help appreciated.
 
 Thanks.
 
   /holger
 
 
 [1]
 http://ftp.astaromail.com/people/heitzenberger/v7/kernel/6313/walk_page_buffers.s
 [2]
 http://ftp.astaromail.com/people/heitzenberger/v7/kernel/6313/kernel-2007-05-03.log.gz
 [3] 
 http://ftp.astaromail.com/people/heitzenberger/v7/kernel/6313/kernel-2007-05-10.log.gz
 [4] 
 http://ftp.astaromail.com/people/heitzenberger/v7/kernel/6313/kernel-2007-05-15.log.gz



Re: e2fsprogs coverity patch cid-33.diff

2007-05-31 Thread Theodore Tso
On Tue, May 29, 2007 at 05:10:25PM -0600, Andreas Dilger wrote:
 I also have another outstanding patch:
 
 ===
 Coverity ID: 6: Forward Null
 
 At the second conditional iter->file could still be NULL. We need to
 check for it again.

Also applied.

- Ted


Re: [RESEND e2progs] get_backup_sb: Consider block size when searching for backups

2007-05-31 Thread Theodore Tso
On Tue, May 29, 2007 at 10:26:44PM +0100, Daniel Drake wrote:
 I sent this in a few weeks ago, and it hasn't been applied to the hg tree.
 Any comments? Thanks.

Thanks, applied.  Sorry for the delay.

- Ted


Re: [RFC][PATCH] Multiple mount protection

2007-05-31 Thread Theodore Tso
On Thu, May 31, 2007 at 02:28:33AM +0530, Kalpak Shah wrote:
 
 So can I assume that the INCOMPAT_MMP flag and the s_mmp_interval and
 s_mmp_block superblock fields will be reserved regardless of whether the
 patches go into ext4? I had attached the patches in the last mail so you
 can share your views on them.

Yes, I've reserved the code point and superblock fields.  I'm not
going to add the INCOMPAT_MMP flag to the supported file until I get and
integrate the patch to ext2fs_open() that actually tests for the flag,
though, since that would be a bit silly.

I assume the patch will add a flag to ext2fs_open which skips the MMP
checking.  After all, tune2fs is allowed to make changes to the
superblock while the filesystem is mounted.  So it needs to be able to
open the filesystem read/only even if it is mounted.

Regards,

- Ted


Re: [RFC] store RAID stride in superblock

2007-05-31 Thread Theodore Tso
On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
 Hello,
 I've added s_raid_stripe_width parameter in superblock.
 I've also incorporated s_raid_stride and s_raid_stripe_width 
 parameters in tune2fs.
 The new options can be specified using  '-E options' in both mke2fs and 
 tune2fs.
 Both the Man pages (mke2fs and tune2fs) are updated accordingly.
 Patch is attached herewith.

Thanks.  I've used a different offset for the raid_stripe_width, to
avoid conflicting with Kalpak's mmp patch.  

Could you send me a signed-off-by for your patch?

Thanks,

- Ted


Re: [RFC] store RAID stride in superblock

2007-05-31 Thread Andreas Dilger
On May 31, 2007  12:21 -0400, Theodore Tso wrote:
 On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
  I've added s_raid_stripe_width parameter in superblock.
  I've also incorporated s_raid_stride and s_raid_stripe_width 
  parameters in tune2fs.
  The new options can be specified using  '-E options' in both mke2fs and 
  tune2fs.
  Both the Man pages (mke2fs and tune2fs) are updated accordingly.
  Patch is attached herewith.
 
 Thanks.  I've used a different offset for the raid_stripe_width, to
 avoid conflicting with Kalpak's mmp patch.  

Ah, we've been doing it the other way around here.  It makes sense to keep
the s_raid_stripe_width fields together.  I think this code is preliminary
enough that nobody has actually started using it yet.  Can you please post
what the end of ext2_super_block looks like (whether you decide to reorder
the fields or not).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [RFC] store RAID stride in superblock

2007-05-31 Thread Kalpak Shah
On Thu, 2007-05-31 at 14:19 -0600, Andreas Dilger wrote:
 On May 31, 2007  12:21 -0400, Theodore Tso wrote:
  On Thu, May 24, 2007 at 07:45:32PM +0530, Rupesh Thakare wrote:
   I've added s_raid_stripe_width parameter in superblock.
   I've also incorporated s_raid_stride and s_raid_stripe_width 
   parameters in tune2fs.
   The new options can be specified using  '-E options' in both mke2fs and 
   tune2fs.
   Both the Man pages (mke2fs and tune2fs) are updated accordingly.
   Patch is attached herewith.
  
  Thanks.  I've used a different offset for the raid_stripe_width, to
  avoid conflicting with Kalpak's mmp patch.  
 
 Ah, we've been doing it the other way around here.  It makes sense to keep
 the s_raid_stripe_width fields together.  I think this code is preliminary
 enough that nobody has actually started using it yet.  Can you please post
 what the end of ext2_super_block looks like (whether you decide to reorder
 the fields or not).

I can update the MMP patches when I actually send them for inclusion. So
I think it makes sense to keep the s_raid_* fields together.

Thanks,
Kalpak.

 
 Cheers, Andreas
 --
 Andreas Dilger
 Principal Software Engineer
 Cluster File Systems, Inc.
 



Re: [RFC][PATCH] Multiple mount protection

2007-05-31 Thread Kalpak Shah
On Thu, 2007-05-31 at 12:16 -0400, Theodore Tso wrote:
 On Thu, May 31, 2007 at 02:28:33AM +0530, Kalpak Shah wrote:
  
  So can I assume that the INCOMPAT_MMP flag and the s_mmp_interval and
  s_mmp_block superblock fields will be reserved regardless of whether the
  patches go into ext4? I had attached the patches in the last mail so you
  can share your views on them.
 
 Yes, I've reserved the code point and superblock fields.

Thanks.

   I'm not going to add INCOMPAT_MMP flag to the supported file until I get and
 integrate the patch ext2fs_open() that actually tests for the flag,
 though, since that would be a bit silly.
 
 I assume the patch will add a flag to ext2fs_open which skips the MMP
 checking.

Yes, I have added an EXT2_FLAG_SKIP_MMP flag to ext2fs_open() to bypass
MMP, which will be set if tune2fs is used with the -f option. Also, the MMP
check will not be run if the filesystem is being opened read-only.
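
A minimal sketch of the decision described above, not the actual patch. The flag values and the names SKETCH_FLAG_RW, SKETCH_FLAG_SKIP_MMP and mmp_check_needed() are hypothetical and chosen only for illustration; the real libext2fs constants may differ.

```c
#include <stdbool.h>

/* Illustrative values only; not taken from the libext2fs headers. */
#define SKETCH_FLAG_RW		0x01	/* filesystem opened read/write */
#define SKETCH_FLAG_SKIP_MMP	0x02	/* caller asked to bypass MMP */

/* Decide whether an open call should run the multiple-mount check. */
static bool mmp_check_needed(int open_flags, bool fs_has_mmp_feature)
{
	if (!fs_has_mmp_feature)
		return false;		/* nothing to protect */
	if (open_flags & SKETCH_FLAG_SKIP_MMP)
		return false;		/* explicit bypass, e.g. a forced tune2fs */
	if (!(open_flags & SKETCH_FLAG_RW))
		return false;		/* read-only opens skip the check */
	return true;
}
```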

Thanks,
Kalpak.

   After all, tune2fs is allowed to make changes to the
 superblock while the filesystem is mounted.  So it needs to be able to
 open the filesystem read/only even if it is mounted.
 
 Regards,
 
   - Ted



[GIT] Please pull ext4 bug fixes

2007-05-31 Thread Theodore Ts'o

Hi Linus,

Please pull from the for_linus branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

It contains the following fixes, which (except for the superblock field
reservations) have all been in the -mm tree for quite a while.

Thanks, regards,

- Ted


 fs/ext4/balloc.c                |    6 -
 fs/ext4/extents.c               |  148 +++-
 fs/ext4/inode.c                 |    4 -
 fs/ext4/namei.c                 |    4 -
 fs/ext4/super.c                 |    2 
 include/linux/ext4_fs.h         |   33 +---
 include/linux/ext4_fs_extents.h |    5 -
 include/linux/ext4_fs_i.h       |    6 -
 8 files changed, 134 insertions(+), 74 deletions(-)

Alex Tomas (1):
  When ext4_ext_insert_extent() fails to insert new blocks

Amit Arora (1):
  ext4: Extent overlap bugfix

Dave Kleikamp (1):
  EXT4: Fix whitespace

Mingming Cao (1):
  Remove unnecessary exported symbols.

Theodore Ts'o (1):
  Define/reserve new ext4 superblock fields



Re: [RFC] store RAID stride in superblock

2007-05-31 Thread Andreas Dilger
On May 31, 2007  17:33 -0400, Theodore Tso wrote:
 Oops, I just pushed a set of bugfixes to Linus that included the
 superblock field reservations.

Oh well.

 What is in the e2fsprogs hg repository ... is:
 
   ..
   __u16   s_raid_stride;       /* RAID stride */
   __u16   s_mmp_interval;      /* # seconds to wait in MMP checking */
   __u64   s_mmp_block;         /* Block for multi-mount protection */
   __u32   s_raid_stripe_width; /* blocks on all data disks (N*stride) */
   __u32   s_reserved[163];     /* Padding to the end of the block */
 };

We're updating our patches to be based on the new HG code.

 One question which does come to mind; is there any reason why we might
 want to know the RAID level and/or the number of disks (as opposed to
 just the stripe width)?

Not so far.  The raid_stride is for bitmap placement (and could also be
used for alignment of random IOs to avoid making 2 disks busy when 1
would do).  The raid_stripe_width is the amount that delalloc+mballoc
will use for allocations+writes to avoid read-modify-write of RAID stripes.
It doesn't really matter what the RAID level is.
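
A rough sketch of the two uses described above, assuming that block 0 of the filesystem sits at the start of a stripe and that all quantities are in filesystem blocks. The helper names are hypothetical and are not taken from the ext4 allocator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round a block count up to a whole number of stripes (stripe_width != 0). */
static uint32_t round_up_to_stripe(uint32_t nblocks, uint32_t stripe_width)
{
	return ((nblocks + stripe_width - 1) / stripe_width) * stripe_width;
}

/* True when [start, start + len) stays inside one stride-sized chunk,
 * i.e. the I/O keeps only one disk busy. */
static bool fits_on_one_disk(uint64_t start, uint32_t len, uint32_t stride)
{
	return len > 0 && start / stride == (start + len - 1) / stride;
}

/* True when a write covers only whole stripes, so the RAID layer does not
 * need to read old data or parity before writing. */
static bool is_full_stripe_write(uint64_t start, uint32_t len,
				 uint32_t stripe_width)
{
	return len > 0 && start % stripe_width == 0 && len % stripe_width == 0;
}
```

For example, with a stride of 16 blocks and 4 data disks the stripe width is 64 blocks; a 100-block request would be rounded up to 128 blocks, and a 4-block I/O starting at block 12 stays on one disk while an 8-block I/O starting there does not.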

 And has anyone investigated whether there are
 magic ioctls or libdevmapper APIs so we can get the RAID parameters
 automatically?  If so, patches so that mke2fs can get the information
 automatically (as opposed to forcing the user to have to specify lots
 of annoying options) would be most welcome

For now we will specify this via mke2fs or tune2fs for existing filesystems.
The XFS folks mentioned they have a library to extract this info for linux
devices (e.g. DM, MD, etc), but of course that still won't work for e.g.
external RAID devices.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Fw: ext3 dir_index causes an error

2007-05-31 Thread Andrew Morton

Ted is dir_index maintainer ;)

That's a nice-looking bug report, btw.  Thanks.


Begin forwarded message:

Date: Fri, 01 Jun 2007 13:01:07 +0900
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: ext3 dir_index causes an error



Hello,

First of all, I really appreciate your great work.
Now I've found a problem around dir_index feature.
Here is a report following linux/REPORTING-BUGS.


[1.] One line summary of the problem:
ext3 dir_index causes an error

[2.] Full description of the problem/report:
This is my local test program to reproduce this problem. The
readdir1.c calls creat(2), opendir(3) and readdir(3). And the shell
script executes it repeatedly with a brand-new ext3fs image on a
loopback device.
When the script adds '-O dir_index' to mkfs, some errors appear.

On a system with linux-2.6.21.3, ext3fs produces these error messages,
and the filesystem seems to be corrupted.
--
kjournald starting.  Commit interval 5 seconds
EXT3 FS on loop0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
:::
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs error (device loop0): htree_dirblock_to_tree: bad entry in directory 
#2: rec_len is too small for name_len - offset=6924, inode=26, rec_len=244, 
name_len=249
EXT3-fs error (device loop0): htree_dirblock_to_tree: bad entry in directory 
#2: rec_len is too small for name_len - offset=6924, inode=26, rec_len=244, 
name_len=249
EXT3-fs error (device loop0): htree_dirblock_to_tree: bad entry in directory 
#2: rec_len is too small for name_len - offset=6924, inode=26, rec_len=244, 
name_len=249
kjournald starting.  Commit interval 5 seconds
:::
--
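
The numbers in the error message can be checked by hand. The sketch below assumes the usual ext2/ext3 directory entry layout (an 8-byte fixed header followed by the name, padded to a 4-byte boundary); the names dir_rec_len() and rec_len_too_small() are hypothetical and only mirror the kind of check that produces this message.

```c
#include <stdbool.h>

#define DIR_ROUND 3u	/* records are padded to a 4-byte boundary */

/* Smallest record that can hold a name of name_len bytes:
 * 8-byte header plus the name, rounded up to a multiple of 4. */
static unsigned int dir_rec_len(unsigned int name_len)
{
	return (name_len + 8u + DIR_ROUND) & ~DIR_ROUND;
}

/* True when the on-disk rec_len cannot hold the recorded name. */
static bool rec_len_too_small(unsigned int rec_len, unsigned int name_len)
{
	return rec_len < dir_rec_len(name_len);
}
```

A 249-byte name needs a record of at least 260 bytes under these assumptions, so the reported rec_len=244 is 16 bytes short, which matches the "rec_len is too small for name_len" complaint.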

On the other system with linux-2.6.18 (debian etch kernel), the same
error appears.
When the script adds '-O ^dir_index' to mkfs, the problem never appears.

These errors do not appear every time, so the shell script
executes the readdir1 test program repeatedly.
Recently I upgraded my debian system from version 3.1 'sarge' to 4.0
'etch'. The debian etch sets the dir_index feature by default. So I
found this problem.

[3.] Keywords (i.e., modules, networking, kernel):
ext3 dir_index

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
[4.2.] Kernel .config file:
[5.] Most recent kernel version which did not have the bug:
[6.] Output of Oops.. message (if applicable) with symbolic information
 resolved (see Documentation/oops-tracing.txt)
[7.] A small shell script or example program which triggers the
 problem (if possible)

(readdir1.c)

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <string.h>
#include <errno.h>

/* Report the failing call and exit. */
void fin(char *s)
{
	perror(s);
	exit(1);
}

void msg(int found, char *fname)
{
	printf("%s%s found\n", fname, found ? "" : " not");
}

/* usage: readdir1 <directory> <filename> */
int
main(int argc, char *argv[])
{
	DIR *dp;
	struct dirent *de;
	int err, found, i;
	char a[250];

	err = chdir(argv[1]);
	if (err)
		fin("chdir");

	/* Create 17 files with 249-character names, plus the target file. */
	memset(a, 'a', sizeof(a)-1);
	a[sizeof(a)-1] = 0;
	for (i = 0; i < 16+1; i++) {
		a[0]++;
		err = creat(a, 0644);
		if (err < 0)
			fin("creat");

		err = creat(argv[2], 0644);
		if (err < 0)
			fin("creat");
	}

#if 0
	err = unlink(argv[2]);
	if (err && errno != ENOENT)
		fin("unlink");
#endif

	dp = opendir(".");
	if (!dp)
		fin("opendir");

	de = readdir(dp);
	if (!de)
		fin("1st readdir");
	assert(strcmp(argv[2], de->d_name));

#if 0
	argv[2][0]++;
	err = creat(argv[2], 0644);
	if (err < 0)
		fin("creat");

	argv[2][0]--;
#endif
	/* Re-create the target while the directory stream is open. */
	err = creat(argv[2], 0644);
	if (err < 0)
		fin("creat");

#if 0
	err = unlink(argv[2]);
	if (err && errno != ENOENT)
		fin("unlink");
#endif

	/* Look for the target by continuing the open stream. */
	found = 0;
	while ((de = readdir(dp)) && !found)
		found = !strcmp(argv[2], de->d_name);
	msg(found, argv[2]);

	/* Look again after rewinding the stream. */
	found = 0;
	rewinddir(dp);
	while ((de = readdir(dp)) && !found)
		found = !strcmp(argv[2], de->d_name);
	msg(found, argv[2]);

	/* Look once more with a freshly opened stream. */
	closedir(dp);
	dp = opendir(".");
	if (!dp)
		fin("opendir");

	found = 0;
	while ((de = readdir(dp)) && !found)
		found = !strcmp(argv[2], de->d_name);
	msg(found, argv[2]);

	return 0;
}
--
#!/bin/sh

img=rw.img
dir=rw
set -e
make /tmp/readdir1

cd /dev/shm
dd if=/dev/zero of=$img bs=1k count=4k 2