Re: Problem with multiple mounts

2006-11-09 Thread Lennart Sorensen
On Wed, Nov 08, 2006 at 04:06:23PM -0700, Andreas Dilger wrote:
 I would suggest that even though this is not supported, it would be prudent
 to fix such a bug.  It might be possible to hit a similar problem if the
 on-disk data in the journal is corrupted, and oopsing the kernel
 isn't a graceful way to deal with bad data on disk.

On the other hand, corrupt data at least doesn't change under you while
you are trying to figure out the filesystem.  In this particular case the
metadata would be changing while you are trying to read it, so what you
read is not consistent from one moment to the next.  There may
be nothing that can be done about it.
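
As a rough illustration of that point, here is a toy Python sketch, not
kernel code.  The `disk` dictionary, its `count` and `entries` fields and
both function names are invented for the example.  Each individual read
returns a perfectly valid value, yet the two reads taken together describe
a state that never existed on disk.

```python
# Shared on-disk state: a count field and the list it describes.
# (Hypothetical layout, made up for this sketch.)
disk = {"count": 2, "entries": ["a", "b"]}

def writer_shrinks(disk):
    """The other host commits an update that drops one entry."""
    disk["entries"] = ["a"]
    disk["count"] = 1

def reader(disk, interleave):
    """Read two related fields, with the other host writing in between."""
    count = disk["count"]       # first read sees the old state
    interleave(disk)            # the other host's write lands here
    entries = disk["entries"]   # second read sees the new state
    return count, entries

count, entries = reader(disk, writer_shrinks)
consistent = (count == len(entries))
```

No amount of checking each field on its own would catch this, since both
values were valid at the moment they were read.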

--
Len Sorensen


Re: Problem with multiple mounts

2006-11-08 Thread Lennart Sorensen
On Wed, Nov 08, 2006 at 11:22:15AM -0800, Suzuki wrote:
 I exported a disk partition using the nbd protocol. On the nbd client, I
 created a reiserfs filesystem on this partition and ran the fsstress test
 case on it. At the same time, I mounted the partition on the nbd server.
 An Oops then appears, as follows:
 
 ReiserFS: sda10: found reiserfs format 3.6 with standard journal
 ReiserFS: sda10: using ordered data mode
 ReiserFS: sda10: journal params: device sda10, size 8192, journal first 
 block 18, max trans len 1024, max batch 900, max commit age 30, max 
 trans age 30
 ReiserFS: sda10: checking transaction log (sda10)
 
 Oops: Kernel access of bad area, sig: 11 [#1]
 
 Call Trace:
 [C00011333090] [C01EDB70] .journal_read+0x165c/0x1b6c (unreliable)
 [C00011333410] [C01EF280] .journal_init+0xdc0/0xee8
 [C00011333530] [C01CDBD8] .reiserfs_fill_super+0xa90/0x1e40
 [C00011333790] [C011E988] .get_sb_bdev+0x208/0x31c
 [C00011333870] [C01CA00C] .get_super_block+0x38/0x60
 [C00011333900] [C011E260] .vfs_kern_mount+0xec/0x198
 [C000113339B0] [C011E3E0] .do_kern_mount+0x88/0xdc
 [C00011333A50] [C01532CC] .do_mount+0xd50/0xe08
 [C00011333D60] [C0175090] .compat_sys_mount+0x368/0x448
 [C00011333E30] [C000861C] syscall_exit+0x0/0x40
 
 But if we do the steps in the reverse order, that is, mount the partition
 on the nbd server first and then run the fsstress tests on the client
 side, things work fine. This was done to ensure that the server does not
 see an incomplete journal created by the client-side runs.
 
 I suspect the Oops in the first scenario is due to the mount finding an
 incomplete journal created by the client-side fsstress runs.
 
 My question is: is this supported? That is, mounting a filesystem which
 is already mounted elsewhere and replaying the (possibly incomplete) journal.

Absolutely not supported.  Unless you have a filesystem that is
specifically designed for simultaneous read-write mounts from multiple
places, you can't do this.  For performance reasons most systems cache
writes and updates, so the data read by one system may be
out of date because another system has an update waiting to go to disk.
You need a filesystem that lets multiple systems talk
to each other about updates, locking and so on.  Look for a
cluster filesystem, or whatever term is used for a
filesystem that supports multiple hosts having it mounted to provide
redundant access.  No normal filesystem can do it unless everyone has it
mounted read-only.  If you want to share it, use NFS.  That's what it's
for.
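
To make the caching problem concrete, here is a toy Python model of two
hosts sharing one block device.  It is only a sketch: the `SharedDisk` and
`Host` classes are invented for illustration and bear no relation to the
real nbd or reiserfs code, and block 18 is borrowed from the journal
parameters in the log above purely as a convenient number.

```python
class SharedDisk:
    """Stand-in for the exported block device: block number -> bytes."""
    def __init__(self):
        self.blocks = {}

class Host:
    """A host with a private write-back cache in front of the shared disk."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}      # blocks this host has read or written
        self.dirty = set()   # blocks written locally but not yet flushed

    def write(self, block, data):
        # Writes land in the local cache first and reach the disk later.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Reads are served from the local cache whenever possible.
        if block not in self.cache:
            self.cache[block] = self.disk.blocks.get(block)
        return self.cache[block]

    def flush(self):
        for block in self.dirty:
            self.disk.blocks[block] = self.cache[block]
        self.dirty.clear()

disk = SharedDisk()
disk.blocks[18] = b"old journal header"
client, server = Host(disk), Host(disk)

client.write(18, b"new journal header")
stale = server.read(18)        # the client's update is still in its cache
client.flush()
still_stale = server.read(18)  # the server keeps serving its own cached copy
```

Neither host has done anything wrong by its own rules, which is why the
hosts need to coordinate with each other rather than just share the disk.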

--
Len Sorensen


Re: Problem with multiple mounts

2006-11-08 Thread Suzuki

Lennart Sorensen wrote:

On Wed, Nov 08, 2006 at 11:22:15AM -0800, Suzuki wrote:

I exported a disk partition using the nbd protocol. On the nbd client, I
created a reiserfs filesystem on this partition and ran the fsstress test
case on it. At the same time, I mounted the partition on the nbd server.
An Oops then appears, as follows:


ReiserFS: sda10: found reiserfs format 3.6 with standard journal
ReiserFS: sda10: using ordered data mode
ReiserFS: sda10: journal params: device sda10, size 8192, journal first 
block 18, max trans len 1024, max batch 900, max commit age 30, max 
trans age 30

ReiserFS: sda10: checking transaction log (sda10)

Oops: Kernel access of bad area, sig: 11 [#1]

Call Trace:
[C00011333090] [C01EDB70] .journal_read+0x165c/0x1b6c (unreliable)
[C00011333410] [C01EF280] .journal_init+0xdc0/0xee8
[C00011333530] [C01CDBD8] .reiserfs_fill_super+0xa90/0x1e40
[C00011333790] [C011E988] .get_sb_bdev+0x208/0x31c
[C00011333870] [C01CA00C] .get_super_block+0x38/0x60
[C00011333900] [C011E260] .vfs_kern_mount+0xec/0x198
[C000113339B0] [C011E3E0] .do_kern_mount+0x88/0xdc
[C00011333A50] [C01532CC] .do_mount+0xd50/0xe08
[C00011333D60] [C0175090] .compat_sys_mount+0x368/0x448
[C00011333E30] [C000861C] syscall_exit+0x0/0x40

But if we do the steps in the reverse order, that is, mount the partition
on the nbd server first and then run the fsstress tests on the client
side, things work fine. This was done to ensure that the server does not
see an incomplete journal created by the client-side runs.

I suspect the Oops in the first scenario is due to the mount finding an
incomplete journal created by the client-side fsstress runs.

My question is: is this supported? That is, mounting a filesystem which
is already mounted elsewhere and replaying the (possibly incomplete) journal.



Absolutely not supported.  Unless you have a filesystem that is
specifically designed for simultaneous read-write mounts from multiple
places, you can't do this.  For performance reasons most systems cache
writes and updates, so the data read by one system may be
out of date because another system has an update waiting to go to disk.
You need a filesystem that lets multiple systems talk
to each other about updates, locking and so on.  Look for a
cluster filesystem, or whatever term is used for a
filesystem that supports multiple hosts having it mounted to provide
redundant access.  No normal filesystem can do it unless everyone has it
mounted read-only.  If you want to share it, use NFS.  That's what it's
for.


Thanks for the response. This problem was reported by one of our test
teams on 2.6.19, so I wanted to confirm that what they are doing is not
supported.




Thanks,

Suzuki


Re: Problem with multiple mounts

2006-11-08 Thread Andreas Dilger
On Nov 08, 2006  14:38 -0800, Suzuki wrote:
 Lennart Sorensen wrote:
 ReiserFS: sda10: checking transaction log (sda10)
 
 Oops: Kernel access of bad area, sig: 11 [#1]
 
 Call Trace:
 [C00011333090] [C01EDB70] .journal_read+0x165c/0x1b6c (unreliable)
 [C00011333410] [C01EF280] .journal_init+0xdc0/0xee8
 [C00011333530] [C01CDBD8] .reiserfs_fill_super+0xa90/0x1e40
 [C00011333790] [C011E988] .get_sb_bdev+0x208/0x31c
 [C00011333870] [C01CA00C] .get_super_block+0x38/0x60
 [C00011333900] [C011E260] .vfs_kern_mount+0xec/0x198
 [C000113339B0] [C011E3E0] .do_kern_mount+0x88/0xdc
 [C00011333A50] [C01532CC] .do_mount+0xd50/0xe08
 [C00011333D60] [C0175090] .compat_sys_mount+0x368/0x448
 [C00011333E30] [C000861C] syscall_exit+0x0/0x40
 
 My question is: is this supported? That is, mounting a filesystem which
 is already mounted elsewhere and replaying the (possibly incomplete) journal.

 Thanks for the response. This problem was reported by one of our test
 teams on 2.6.19, so I wanted to confirm that what they are doing is not
 supported.

I would suggest that even though this is not supported, it would be prudent
to fix such a bug.  It might be possible to hit a similar problem if the
on-disk data in the journal is corrupted, and oopsing the kernel
isn't a graceful way to deal with bad data on disk.
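
The general idea can be sketched in a few lines of Python: check every
value read from the journal against known limits before using it, and
refuse the mount with an error instead of crashing.  This is only an
illustration of the principle.  The three-field header layout, the
`JournalCorrupt` exception and the function names are all hypothetical and
do not reflect the real reiserfs on-disk format or kernel code; the two
limits are the journal size and max trans len from the mount log quoted
earlier in the thread.

```python
import struct

JOURNAL_SIZE = 8192     # blocks, as reported in the mount log
MAX_TRANS_LEN = 1024    # blocks, as reported in the mount log

# Hypothetical header: transaction id, length in blocks, start offset.
HEADER = struct.Struct("<III")

class JournalCorrupt(Exception):
    """Raised when on-disk journal data fails a sanity check."""

def parse_header(raw):
    """Decode a header, rejecting values that cannot be valid."""
    if len(raw) < HEADER.size:
        raise JournalCorrupt("header truncated")
    trans_id, length, offset = HEADER.unpack_from(raw)
    if not 0 < length <= MAX_TRANS_LEN:
        raise JournalCorrupt(f"transaction length {length} out of range")
    if offset >= JOURNAL_SIZE:
        raise JournalCorrupt(f"offset {offset} outside journal")
    return trans_id, length, offset

def try_mount(raw):
    """Fail the mount cleanly on bad data instead of acting on it."""
    try:
        parse_header(raw)
    except JournalCorrupt as err:
        return f"mount refused: {err}"
    return "journal ok"
```

For example, `try_mount(HEADER.pack(7, 12, 100))` accepts the header, while
a header whose length field is 0xFFFFFFFF is refused rather than used as a
loop bound or array index.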

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.