On Sat, 2007-11-17 at 19:42 -0500, Tom Sightler wrote:
> Running this command on RHEL5.1 consistently produces errors on the
> filesystem with messages like the following:
> 
> EXT3-fs error (device dm-19) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-19) in ext3_reserve_inode_write: Journal has
> aborted
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> attempt to access beyond end of device
> dm-19: rw=0, want=17247241224, limit=4688363520
> attempt to access beyond end of device
> 
> If we reboot the very same hardware with RHEL4.5, mount the same volume,
> and run the same test it works perfectly every time.
> 
> Has anyone else run significant I/O stress tests on RHEL5.1 yet?  We have
> not been able to reproduce this issue with non-striped volumes, but we're
> still very early in our testing and are just looking for community
> feedback before taking up the problem with Red Hat.

I know it's poor form to reply to myself, but looking deeper into the
test results, it seems the corruption only happens when the underlying
physical volumes use dm-multipath with round-robin load balancing, and
perhaps only with certain hardware.  We can easily reproduce the issue
with a simple partition over a single dm-multipath device to a LUN on
an Apple Xserve RAID.

This still looks like a bug, since the exact same config works
flawlessly with RHEL4.5 and the hardware works fine with round-robin
there.  Changing the path grouping policy from "multibus" to "failover"
seems to work around the problem, since that leaves only one path
active.  We'll do more testing with a wider array of storage next week,
but I'd still love to hear from others running dm-multipath with
round-robin load balancing whether they're seeing any issues with 5.1.
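
For anyone who wants to try the same workaround, here is a minimal
sketch of the change in /etc/multipath.conf.  It assumes the policy is
set globally in the defaults section; a per-device or per-multipath
section would take the same keyword, and the rest of your config will
differ from ours.

```
# /etc/multipath.conf (sketch only; placement in "defaults" is an assumption)
defaults {
        # "failover" puts each path in its own priority group, so only
        # one path carries I/O at a time.  The failing setup used
        # "multibus", which puts all paths in a single group and spreads
        # I/O across them round-robin.
        path_grouping_policy    failover
}
```

Once the maps have been reloaded, "multipath -ll" should show each path
in its own group instead of all paths in a single group.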

Thanks,
Tom


_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
