Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Michael Schwarz
Neil:

Relevant stack trace follows. Any suggestions? blk_backing_dev_unplug...
Does that mean the raid subsystem thinks one of the usb drives has been
removed? I assure you that physically this is untrue, but that doesn't
mean that some sort of logical disconnect hasn't happened...

Makes me wonder if one of my USB hub connections is intermittent...
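
One rough way to check for that is to scan the kernel log for USB
disconnect or reset messages logged around the time of the hang. The
sketch below assumes the USB core's usual "USB disconnect" and "reset
... speed USB device" wording, which varies a little between kernel
versions; usb_events is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: read a kernel log on stdin and keep only lines
# that look like USB disconnect or port-reset messages.
# The patterns are an assumption about typical USB core wording.
usb_events() {
    grep -iE 'usb disconnect|reset (low|full|high)[- ]speed usb device'
}

# Typical use on a live system (not run here):
#   dmesg | usb_events
```

No output would not prove the hub is healthy, but repeated resets on one
port would point at a flaky connection.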

I would also welcome any tips on any other developers group to follow up
with. I haven't hacked any kernel code since the 2.2.x kernel and things
have changed a bit! I don't mind digging into this, but I suspect I could
get things cleared up fast if I could find the right subject expert!



 ===
cpD E2FBEDB0  1784  4271   4270 (NOTLB)
   e2fbedb4 00200086 c15dc550 e2fbedb0 0001 00200082 1000

    c15dc550 000a e94182b0 f3161430 26320f40 01c5

   e94183bc c1c8c480  ecd7d300 c04e0bf2 c042e0e4 f7d767f8
003b6622
Call Trace:
 [] blk_backing_dev_unplug+0x73/0x7b
 [] getnstimeofday+0x30/0xb6
 [] io_schedule+0x3a/0x5c
 [] sync_page+0x0/0x3b
 [] sync_page+0x38/0x3b
 [] __wait_on_bit_lock+0x2a/0x52
 [] __lock_page+0x58/0x5e
 [] wake_bit_function+0x0/0x3c
 [] do_generic_mapping_read+0x1e0/0x459
 [] generic_file_aio_read+0x173/0x1a6
 [] file_read_actor+0x0/0xe0
 [] do_sync_read+0xc7/0x10a
 [] autoremove_wake_function+0x0/0x35
 [] do_sync_read+0x0/0x10a
 [] vfs_read+0xa6/0x152
 [] sys_read+0x41/0x67
 [] syscall_call+0x7/0xb
 ===

-- 
Michael Schwarz

> My guess would be a locking bug in the usb storage driver or some
> lower level USB driver.
> A significant difference between raid0 and linear is that a largish IO
> will touch all drives for raid-0, but only one or two for linear.
> That gives much more opportunity for locking bugs to hit.
>
> When it is in the hanging state, do
>   echo t > /proc/sysrq-trigger
>
> and look in the kernel logs for the stack trace of all processes.
> Hopefully the stack trace for the processes in 'D' state will be
> informative.
>
> NeilBrown
>
>
>>
>> Here are my mdadm commands to create the array:
>>
>> mdadm --create /dev/md0 --level=linear --auto=md --chunk=32
>> --raid-devices=7 /dev/sd?
>>
>> (The wildcard works because the seven flash drives are the only scsi
>> devices on the system).
>>
>> The command for the raid-0 array is the same as above except for the
>> "--level=0" it takes to make a raid 0 array.
>>
>> I then use "mkfs" to make the filesystem and mount the resulting array
>> at "/mnt"
>>
>> Can anyone give a raid newbie some tips? Is there something obvious I'm
>> missing? Would it help to provide strace/ltrace/ptrace of the hanging
>> copy command?
>>
>> Any help (including URLs of manuals I should RTFM) would be most
>> welcome.
>>
>> Thanks!
>>
>>
>> --
>> Michael Schwarz
>>
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to [EMAIL PROTECTED]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-16 Thread Neil Brown
On Friday March 16, [EMAIL PROTECTED] wrote:
> I'm not a Linux newbie (I've even written a couple of books and done some
> very light device driver work), but I'm completely new to the software
> raid subsystem.
> 
> I'm doing something rather oddball. I'm making an array of USB flash
> drives and comparing read and write rates.
> 
> Well, I've had great success writing. I've got seven flash drives on a
> hub. I've joined them up both linear and raid0 and written large amounts
> of data to them. But come time to read from them, linear works, but raid0
> hangs after transferring just shy of 2G of data. It doesn't matter if it is
> reading from one file or from many files whose cumulative size is just shy
> of 2G. It doesn't matter if I'm using "dd" or "cp" to read the file or
> files.
> 
> The process doing the transfer is unkillable. Not with a kill -15 or a
> kill -9. It won't die, but it also won't make progress.
> 
> "Linear" always works. Raid-0 always hangs.

My guess would be a locking bug in the usb storage driver or some
lower level USB driver.
A significant difference between raid0 and linear is that a largish IO
will touch all drives for raid-0, but only one or two for linear.
That gives much more opportunity for locking bugs to hit.
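
To put rough numbers on that, here is a small sketch of how many member
devices a single read touches on a raid-0 array. It assumes equal-sized
members and simple round-robin striping; raid0_devices_touched is a
hypothetical helper name, and the offsets and lengths are illustrative.

```shell
# Usage: raid0_devices_touched OFFSET_KIB LENGTH_KIB CHUNK_KIB NDEV
# Prints how many member devices one contiguous IO touches, assuming
# chunks are laid out round-robin across NDEV equal-sized devices.
raid0_devices_touched() {
    first=$(( $1 / $3 ))               # index of the first chunk touched
    last=$(( ($1 + $2 - 1) / $3 ))     # index of the last chunk touched
    chunks=$(( last - first + 1 ))
    # Consecutive chunks sit on consecutive devices, so the count is
    # capped at the number of devices.
    if [ "$chunks" -gt "$4" ]; then echo "$4"; else echo "$chunks"; fi
}

raid0_devices_touched 0 128 32 7   # 4 chunks of 32 KiB: prints 4
raid0_devices_touched 0 256 32 7   # 8 chunks, capped at 7: prints 7
```

With the 32K chunk and seven drives from the original post, any read of
224K or more keeps every drive busy at once, whereas on linear the same
read stays on one drive unless it crosses a device boundary.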

When it is in the hanging state, do
  echo t > /proc/sysrq-trigger

and look in the kernel logs for the stack trace of all processes.
Hopefully the stack trace for the processes in 'D' state will be
informative.
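
A minimal sketch of that procedure, assuming a procps-style ps and that
sysrq is enabled on the machine; list_d_state is a hypothetical helper
name used only for illustration.

```shell
# Hypothetical helper: read "PID STAT COMMAND" lines on stdin and print
# the tasks whose state starts with D (uninterruptible sleep).
list_d_state() {
    awk '$2 ~ /^D/ { print $1, $2, $3 }'
}

# Typical use while the copy is hung (not run here):
#   ps -eo pid,stat,comm | list_d_state   # find the stuck PIDs
#   echo t > /proc/sysrq-trigger          # as root: dump all task stacks
#   dmesg | less                          # then look for those PIDs
```

Matching the PIDs from ps against the dump makes it easier to find the
relevant traces among the output for every task on the system.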

NeilBrown


> 
> Here are my mdadm commands to create the array:
> 
> mdadm --create /dev/md0 --level=linear --auto=md --chunk=32
> --raid-devices=7 /dev/sd?
> 
> (The wildcard works because the seven flash drives are the only scsi
> devices on the system).
> 
> The command for the raid-0 array is the same as above except for the
> "--level=0" it takes to make a raid 0 array.
> 
> I then use "mkfs" to make the filesystem and mount the resulting array at
> "/mnt"
> 
> Can anyone give a raid newbie some tips? Is there something obvious I'm
> missing? Would it help to provide strace/ltrace/ptrace of the hanging copy
> command?
> 
> Any help (including URLs of manuals I should RTFM) would be most welcome.
> 
> Thanks!
> 
> 
> -- 
> Michael Schwarz
> 
> 