I have a two-node cluster of HP DL380 G4s. Both machines are attached via SCSI to an external HP disk enclosure. They run 32-bit RH AS 4.0 and OCFS2 1.2.4, the latest release. They were upgraded from 1.2.3 only a few days after 1.2.4 was released. I had reported on the mailing list that my developers were happy and things seemed faster. However, twice since the upgrade the cluster has gone down: the kernel OOM killer starts killing processes, then ASR kicks in and eventually reboots the box.
I am also starting to notice some directory corruption, and errors like this in /var/log/messages:

Feb 18 04:14:37 cyber1 kernel: (23693,1):ocfs2_check_dir_entry:1703 ERROR: bad entry in directory #101726961: rec_len % 4 != 0 - offset=0, inode=3484598105688391, rec_len=18, name_len=128

Sometimes I can't delete a directory; it tells me it's not empty, even though it is. What could this be?

I was hoping that OCFS2 1.2.4 would have fixed the out-of-memory problems, but it looks like I still run into them. What information can I provide that will help?
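In case it helps with gathering information, here is a minimal sketch of how the fields in such an error line could be pulled out, so that all affected directories can be listed from the log. The regular expression is an assumption based only on the single sample line above, and the function name parse_bad_dir_entry is made up for illustration; other OCFS2 messages may be formatted differently.

```python
import re

# Pattern modelled on the one sample line from /var/log/messages quoted above.
# It is an assumption that other "bad entry" lines share this layout.
PATTERN = re.compile(
    r"ocfs2_check_dir_entry:\d+ ERROR: bad entry in directory #(?P<dir>\d+): "
    r"(?P<reason>.+?) - offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def parse_bad_dir_entry(line):
    """Return the numeric fields and the reason text, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Every captured group except the free-text reason is a decimal number.
    entry = {k: int(v) for k, v in match.groupdict().items() if k != "reason"}
    entry["reason"] = match.group("reason")
    return entry


SAMPLE = (
    "Feb 18 04:14:37 cyber1 kernel: (23693,1):ocfs2_check_dir_entry:1703 "
    "ERROR: bad entry in directory #101726961: rec_len % 4 != 0 - offset=0, "
    "inode=3484598105688391, rec_len=18, name_len=128"
)

entry = parse_bad_dir_entry(SAMPLE)
# The message complains that rec_len is not a multiple of 4; recompute that.
misaligned = entry["rec_len"] % 4 != 0
print(entry["dir"], entry["rec_len"], entry["name_len"], misaligned)
```

For the sample line, rec_len is 18, which is not a multiple of 4, matching the reason given in the message. The reported rec_len is also smaller than the reported name_len of 128.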
