newly created 8.5TB reiserfs fails fsck on amd64 and causes OOPS

2005-07-19 Thread Paul Slootman
This is on a dual-CPU opteron system, with 2 x 3ware 9500 12-channel
SATA controllers for a total of 8.5TB; I've configured a RAID5 over each
3ware controller, and use linux md RAID0 over those two "devices".
There was an issue with linux md RAID0 for that size, but that's been
resolved (at least, the problem I had first :-)

The device itself seems to work fine, as reiser4 works. I
wanted to compare to reiserfs 3.6, so I created a reiserfs, mounted it,
and tried to use it. Running bonnie++ on it caused an oops, apparently
in the reiserfs code:

kernel: Unable to handle kernel paging request at 2aad79ea RIP:
kernel: {scan_bitmap_block+129}
kernel: PGD 954ad067 PUD 9a1af067 PMD b75bc067 PTE 0
kernel: Oops:  [1] SMP
kernel: CPU 0
kernel: Modules linked in: raid0 reiser4 zlib_deflate zlib_inflate raid5 raid6 xor ipv6 evdev tg3 3w_9xxx hw_random i2c_amd756 i2c_amd8111 i2c_core psmouse rtc
kernel: Pid: 12006, comm: bonnie++ Not tainted 2.6.12.2.raid0fixreiser4
kernel: RIP: 0010:[scan_bitmap_block+129/768] {scan_bitmap_block+129}
kernel: RSP: 0018:81007f461a18  EFLAGS: 00010286
kernel: RAX: 2aad79ea RBX: 001c RCX: 21e0
kernel: RDX:  RSI: 001c RDI: 81007f461d98
kernel: RBP: c250a1c0 R08: 0001 R09: 0011
kernel: R10:  R11:  R12: 81007f461a9c
kernel: R13: 21e0 R14: 81007f0ffc00 R15: 0011
kernel: FS:  2b26f8c0() GS:804b7480() knlGS:
kernel: CS:  0010 DS:  ES:  CR0: 8005003b
kernel: CR2: 2aad79ea CR3: d9d11000 CR4: 06e0
kernel: Process bonnie++ (pid: 12006, threadinfo 81007f46, task 81007feee0f0)
kernel: Stack: 0010 801cabce 001c0001 81007f461d98
kernel: 81009f6c9018 001c 81007f0ffc00 001c
kernel: 81007f461d98 0001
kernel: Call Trace: {scan_bitmap+585} {reiserfs_allocate_blocknrs+803}
kernel: {reiserfs_allocate_blocks_for_region+524}
kernel: {alloc_page_buffers+102} {pathrelse+40}
kernel: {autoremove_wake_function+0} {reiserfs_file_write+1045}
kernel: {do_anonymous_page+861} {handle_mm_fault+304}
kernel: {__up_read+33} {do_page_fault+601}
kernel: {_spin_lock+3} {vfs_write+192}
kernel: {sys_write+83} {system_call+126}
kernel:
kernel:
kernel: Code: 8b 00 48 c1 e8 02 a8 01 74 17 49 8b 86 70 02 00 00 48 ff 80
kernel: RIP {scan_bitmap_block+129} RSP 
kernel: CR2: 2aad79ea


I rebooted (hard, as a shutdown didn't work...). After that, I tried a
mkfs followed by an fsck, which gave an error! Here's the console log:


satazilla:~# mkfs.reiserfs /dev/md13
mkfs.reiserfs 3.6.19 (2003 www.namesys.com)

A pair of credits:
Joshua Macdonald wrote the first draft of the transaction manager. Yuri Rupasov
did testing  and benchmarking,  plus he invented the r5 hash  (also used by the
dcache  code).  Yura  Rupasov,  Anatoly Pinchuk,  Igor Krasheninnikov,  Grigory
Zaigralin,  Mikhail  Gilula,   Igor  Zagorovsky,  Roman  Pozlevich,  Konstantin
Shvachko, and Joshua MacDonald are former contributors to the project.

The  Defense  Advanced  Research  Projects Agency (DARPA, www.darpa.mil) is the
primary sponsor of Reiser4.  DARPA  does  not  endorse  this project; it merely 
sponsors it.


Guessing about desired format.. Kernel 2.6.12.2.raid0fixreiser4 is running.
Format 3.6 with standard journal
Count of blocks on the device: 2148377056
Number of blocks consumed by mkreiserfs formatting process: 8239
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 10ae60ee-1abb-49d1-ae55-cf238626c0b5
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
ALL DATA WILL BE LOST ON '/dev/md13'!
Continue (y/n):y
Initializing journal - 0%20%40%60%80%100%
Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/md13.
satazilla:~# reiserfsck  /dev/md13
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md13
Will put log info to 'stdout'

Re: newly created 8.5TB reiserfs fails fsck on amd64 and causes OOPS

2005-07-28 Thread Vitaly Fertman
Hello, 

On Tuesday 19 July 2005 15:47, Paul Slootman wrote:
> This is on a dual-CPU opteron system, with 2 x 3ware 9500 12-channel
> SATA controllers for a total of 8.5TB; I've configured a RAID5 over each
> 3ware controller, and use linux md RAID0 over those two "devices".
> There was an issue with linux md RAID0 for that size, but that's been
> resolved (at least, the problem I had first :-)
> 
> The device itself seems to work fine, as reiser4 works. I
> wanted to compare to reiserfs 3.6, so I created a reiserfs, mounted it,
> and tried to use it. Running bonnie++ on it caused an oops, apparently
> in the reiserfs code:
> 
> I rebooted (hard, as a shutdown didn't work...). After that, I tried a
> mkfs followed by an fsck, which gave an error! Here's the console log:
> 
> 
> satazilla:~# mkfs.reiserfs /dev/md13
> mkfs.reiserfs 3.6.19 (2003 www.namesys.com)
> 
> Guessing about desired format.. Kernel 2.6.12.2.raid0fixreiser4 is running.
> Format 3.6 with standard journal
> Count of blocks on the device: 2148377056

ahh, indeed, this number of blocks needs a bitmap count of 65564,
whereas there is only a 16-bit field in the super block for the
bitmap count. in other words, this limits the reiserfs size to
65535 * BlockSize * 8 * BlockSize; for BlockSize == 4K that is 8TB.

the check for bitmap block count overflow seems to be missing in
progs. hmm, and our FAQ about 16TB is not correct either...
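
For reference, here is the arithmetic behind that limit as a small standalone C program. This is not reiserfs or reiserfsprogs code; the block count and block size are simply taken from the mkfs.reiserfs output above. Note that 65564 truncated to a 16-bit field leaves 28, which matches the "Number of bitmaps: 28" that --rebuild-sb reports further down.

/* Back-of-the-envelope check of the 16-bit bitmap-count limit.
 * Standalone program, not reiserfs code; the block count and block
 * size come from the mkfs.reiserfs output in this thread. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t block_count = 2148377056ULL;        /* from mkfs output  */
    const uint64_t blocksize   = 4096;                  /* bytes per block   */
    const uint64_t blocks_per_bitmap = blocksize * 8;   /* one bit per block */

    /* bitmap blocks needed to cover every block, rounded up */
    uint64_t bitmaps = (block_count + blocks_per_bitmap - 1) / blocks_per_bitmap;

    /* what survives truncation to the 16-bit super block field */
    uint16_t stored = (uint16_t)bitmaps;

    /* largest filesystem the 16-bit field can describe */
    uint64_t max_bytes = 65535ULL * blocks_per_bitmap * blocksize;

    printf("bitmaps needed   : %llu\n", (unsigned long long)bitmaps);  /* 65564  */
    printf("stored in 16 bits: %u\n", (unsigned)stored);               /* 28     */
    printf("max fs size      : %.2f TiB\n",
           (double)max_bytes / (1024.0 * 1024 * 1024 * 1024));         /* ~8 TiB */
    return 0;
}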

> Number of blocks consumed by mkreiserfs formatting process: 8239
> Blocksize: 4096
> Hash function used to sort names: "r5"
> Journal Size 8193 blocks (first block 18)
> Journal Max transaction length 1024
> inode generation number: 0
> UUID: 10ae60ee-1abb-49d1-ae55-cf238626c0b5
> ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
> ALL DATA WILL BE LOST ON '/dev/md13'!
> Continue (y/n):y
> Initializing journal - 0%20%40%60%80%100%
> Syncing..ok
> 
> Tell your friends to use a kernel based on 2.4.18 or later, and especially not
> a kernel based on 2.4.9, when you use reiserFS. Have fun.
> 
> ReiserFS is successfully created on /dev/md13.
> satazilla:~# reiserfsck  /dev/md13
> reiserfsck 3.6.19 (2003 www.namesys.com)
> 
> Will read-only check consistency of the filesystem on /dev/md13
> Will put log info to 'stdout'
> 
> Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
> ###
> reiserfsck --check started at Tue Jul 19 13:29:22 2005
> ###
> Replaying journal..
> No transactions found
> reiserfs_open_ondisk_bitmap: wrong either bitmaps number,
> count of blocks or blocksize, run with --rebuild-sb to fix it
> reiserfsck: Could not open bitmap


 
> When I try running with --rebuild-sb it says:
> 
> Reiserfs super block in block 16 on 0x90d of format 3.6 with standard journal
> Count of blocks on the device: 2148377056
> Number of bitmaps: 28
> Blocksize: 4096
> Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 2148368817
> Root block: 8211
> Filesystem is clean
> Tree height: 2
> Hash function used to sort names: "r5"
> Objectid map size 2, max 972
> Journal parameters:
> Device [0x0]
> Magic [0x168ed58c]
> Size 8193 blocks (including 1 for journal header) (first block 18)
> Max transaction length 1024 blocks
> Max batch size 900 blocks
> Max commit age 30
> Blocks reserved by journal: 0
> Fs state field: 0x0:
> sb_version: 2
> inode generation number: 0
> UUID: 10ae60ee-1abb-49d1-ae55-cf238626c0b5
> LABEL: 
> Set flags in SB:
> ATTRIBUTES CLEAN
> 
> Super block seems to be correct
> 
> 
> 
> Something seems seriously wrong here.
> I'm happy to run any tests or try any patches; this system is mine to play
> with until the end of the month.
> 
> 
> Paul Slootman
> 
> 

-- 
Thanks,
Vitaly Fertman


Re: newly created 8.5TB reiserfs fails fsck on amd64 and causes OOPS

2005-07-28 Thread Jeff Mahoney

Vitaly Fertman wrote:
> Hello, 
> 
> On Tuesday 19 July 2005 15:47, Paul Slootman wrote:
>>This is on a dual-CPU opteron system, with 2 x 3ware 9500 12-channel
>>SATA controllers for a total of 8.5TB; I've configured a RAID5 over each
>>3ware controller, and use linux md RAID0 over those two "devices".
>>There was an issue with linux md RAID0 for that size, but that's been
>>resolved (at least, the problem I had first :-)
>>
>>The device itself seems to work fine, as reiser4 works. I
>>wanted to compare to reiserfs 3.6, so I created a reiserfs, mounted it,
>>and tried to use it. Running bonnie++ on it caused an oops, apparently
>>in the reiserfs code:
>>
>>I rebooted (hard, as a shutdown didn't work...). After that, I tried a
>>mkfs followed by an fsck, which gave an error! Here's the console log:
>>
>>
>>satazilla:~# mkfs.reiserfs /dev/md13
>>mkfs.reiserfs 3.6.19 (2003 www.namesys.com)
>>
>>Guessing about desired format.. Kernel 2.6.12.2.raid0fixreiser4 is running.
>>Format 3.6 with standard journal
>>Count of blocks on the device: 2148377056
> 
> ahh, indeed, this number of blocks needs a bitmap count of 65564,
> whereas there is only a 16-bit field in the super block for the
> bitmap count. in other words, this limits the reiserfs size to
> 65535 * BlockSize * 8 * BlockSize; for BlockSize == 4K that is 8TB.
> 
> the check for bitmap block count overflow seems to be missing in
> progs. hmm, and our FAQ about 16TB is not correct either...

Out of curiosity, why is the number of bitmaps even needed if it can be
calculated?

If that's truly the limiting factor, could we perhaps set s_bmap_nr = 0
and calculate the number of bitmaps at mount time? The s_bmap_nr = 0
would ensure that a mount of the filesystem on a kernel unaware of the
larger size would fail since it would fail allocating memory to store
the buffer heads.

It's not friendly, but neither is advertising a 16TB filesystem size,
when there is a limit at 8TB on most systems.
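
A rough sketch of that idea, purely illustrative: reiserfs_super_stub and bmap_nr_from_sb below are made-up stand-ins, not the real struct reiserfs_super_block or any existing kernel helper. The point is only that a zero on disk means "compute the count from the block count and block size at mount time", so the in-memory value is no longer limited to 16 bits.

/* Sketch of the proposed mount-time fallback.  Illustrative only:
 * the struct and helper names are hypothetical, not reiserfs code. */
#include <stdint.h>

struct reiserfs_super_stub {
    uint32_t s_block_count;   /* total blocks on the device */
    uint16_t s_bmap_nr;       /* 16-bit bitmap count; 0 = "too big, compute me" */
    uint16_t s_blocksize;     /* bytes per block */
};

static uint32_t bmap_nr_from_sb(const struct reiserfs_super_stub *sb)
{
    uint32_t blocks_per_bitmap = (uint32_t)sb->s_blocksize * 8;

    if (sb->s_bmap_nr != 0)
        return sb->s_bmap_nr;   /* small filesystem: trust the stored value */

    /* Large filesystem: the on-disk field was deliberately left at 0,
     * so derive the count.  A kernel without this fallback would try
     * to set up zero bitmap buffer heads and fail the mount instead of
     * misreading the filesystem, which is the safety property
     * described above. */
    return (sb->s_block_count + blocks_per_bitmap - 1) / blocks_per_bitmap;
}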

-Jeff

--
Jeff Mahoney
SuSE Labs


Re: newly created 8.5TB reiserfs fails fsck on amd64 and causes OOPS

2005-07-28 Thread Vladimir V. Saveliev

Hello

Jeff Mahoney wrote:
> Vitaly Fertman wrote:
>> Hello,
>>
>> On Tuesday 19 July 2005 15:47, Paul Slootman wrote:
>>> This is on a dual-CPU opteron system, with 2 x 3ware 9500 12-channel
>>> SATA controllers for a total of 8.5TB; I've configured a RAID5 over each
>>> 3ware controller, and use linux md RAID0 over those two "devices".
>>> There was an issue with linux md RAID0 for that size, but that's been
>>> resolved (at least, the problem I had first :-)
>>>
>>> The device itself seems to work fine, as reiser4 works. I
>>> wanted to compare to reiserfs 3.6, so I created a reiserfs, mounted it,
>>> and tried to use it. Running bonnie++ on it caused an oops, apparently
>>> in the reiserfs code:
>>>
>>> I rebooted (hard, as a shutdown didn't work...). After that, I tried a
>>> mkfs followed by an fsck, which gave an error! Here's the console log:
>>>
>>> satazilla:~# mkfs.reiserfs /dev/md13
>>> mkfs.reiserfs 3.6.19 (2003 www.namesys.com)
>>>
>>> Guessing about desired format.. Kernel 2.6.12.2.raid0fixreiser4 is running.
>>> Format 3.6 with standard journal
>>> Count of blocks on the device: 2148377056
>>
>> ahh, indeed, this number of blocks needs a bitmap count of 65564,
>> whereas there is only a 16-bit field in the super block for the
>> bitmap count. in other words, this limits the reiserfs size to
>> 65535 * BlockSize * 8 * BlockSize; for BlockSize == 4K that is 8TB.
>>
>> the check for bitmap block count overflow seems to be missing in
>> progs. hmm, and our FAQ about 16TB is not correct either...
> 
> Out of curiosity, why is the number of bitmaps even needed if it can be
> calculated?

Well, usually, at least for me, when you look at code you wrote some time
ago (8 years, for example), you always wonder "how could I have written that".
So, reiserfs could get along just fine without it.


> If that's truly the limiting factor, could we perhaps set s_bmap_nr = 0
> and calculate the number of bitmaps at mount time? The s_bmap_nr = 0
> would ensure that a mount of the filesystem on a kernel unaware of the
> larger size would fail since it would fail allocating memory to store
> the buffer heads.
> 
> It's not friendly, but neither is advertising a 16TB filesystem size,
> when there is a limit at 8TB on most systems.
> 
> -Jeff
> 
> --
> Jeff Mahoney
> SuSE Labs