Christian Kujau wrote:
On Tue, October 16, 2007 00:27, Charles Perreault wrote:
I've got a Ubuntu backup server using JFS on a mdadm raid 1 array.  The
server has been using XFS for months on a 2.6.17 kernel and has been

You are aware of http://oss.sgi.com/projects/xfs/faq.html#dir2 ?
No, I wasn't, but I know about the infamous "bug" or "security feature", call it whatever you like, that sometimes fills files with zeros when the server crashes. That's part of why I'm moving to JFS.
Twelve days ago, the filesystem remounted itself in read-only mode.  I
didn't even know that would happen in case of a problem.

See mount(8):

  Mount options for jfs
  [...]
  errors=continue / errors=remount-ro / errors=panic
Do you really know people who actually read user manuals before they face a problem? Little joke about tech support :P Yeah, I've seen that line since the first time the filesystem got corrupted, and I agree remount-ro is the best default choice.

I did a fsck in read-only mode and remounted read-write.

Did you mean you remounted the device RO and ran fsck, or did you unmount
the device and perform a RO fsck (fsck.jfs -n)? Please try to use a
current version of jfsprogs and fsck the device (without -n).
I mean I didn't remount anything before doing the fsck: the filesystem put itself in read-only mode. No, I didn't use fsck -n; it was a repair fsck that I ran. My jfsprogs are v1.1.11. I'll try the -n check the next time the filesystem gets corrupted.
This morning, I woke up to find my server again remounted in read-only
mode.  A funny thing is that mount reports the drive to be mounted (rw),
but it's impossible to touch any file.

Sometimes /etc/mtab does not represent the contents of /proc/mounts.
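
To check what the kernel itself reports rather than what mount prints from /etc/mtab, something along these lines could help. It is only a rough sketch, assuming the usual Linux /proc/mounts layout (whitespace-separated fields: device, mount point, filesystem type, options). The function names are made up for illustration, and mount points containing escaped spaces are not handled.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not listed.

    mounts_text is the content of a file in /proc/mounts format.
    If the mount point appears more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """True if the listed entry for mount_point carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


def read_kernel_mounts(path="/proc/mounts"):
    """Read the kernel's own mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Calling is_read_only(read_kernel_mounts(), "/backup") (with "/backup" standing in for the real mount point) would show whether the kernel considers the filesystem read-only, whatever mount prints.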

Again, fsck then remount gets my array back online, but this is
annoying.  Does anyone know why this corruption would occur?

Hm, occasional corruption can stem from anything from software bugs to
bad hardware. 2.6.23.1 is out; you could check the changelog to see if there's
anything related to this one. But I'm more curious about the outcome of
fsck on the unmounted device....

Christian.
I'm actually testing the hardware on that server. Memtest86+ ran 75 passes without any error (36 straight hours), and the CPU also passed several stress tests. I had filesystem corruption on another server due to bad RAM in the past, so I know what pain that can cause. The hard drives both passed SMART tests and were zeroed with dd without problems. The only hardware part that would need more extensive testing is the SATA controller. It's a Silicon Image 3114, which libata lists as stable (production ready). Also, since the HDD tests are good and they go through the SATA controller, I'd say it's working correctly. I'll try compiling 2.6.23 later this week.

Charles


_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion
