Re: [CentOS] [Ocfs2-users] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-26 Thread Fabian Arrotin
Kris Buytaert wrote:
  big snip
Have you already tested with GFS/GFS2? I remember (after discussing it
with the DRBD people) that OCFS2 was more or less supported on
DRBD 7.x, while they advised using GFS/GFS2 on top of DRBD 8.x devices.
Just my two cents though: I've only tested OCFS2 once and never used it
in production ;-)

-- 
--
Fabian Arrotin
  idea=`grep -i clue /dev/brain`
  test -z $idea && echo sorry, init 6 in progress || sh ./answer.sh
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] [Ocfs2-users] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-25 Thread Kris Buytaert
On Wed, 2009-06-24 at 12:02 -0700, Sunil Mushran wrote:
 Do you have a separate network path for drbd traffic? If you do
 not, then you are probably overloading the network. In this case,
 I believe drbd is unable to replicate the I/Os fast enough and thus
 is blocking the o2cb disk heartbeat. One workaround is to increase
 O2CB_HEARTBEAT_THRESHOLD so the timeout exceeds the default of 60 secs.
 Refer to the ocfs2 faq or ocfs2 1.4 user's guide for more on this.
 
I've already modified O2CB_HEARTBEAT_THRESHOLD to different values
(120, 240, etc.), with no change.
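For reference: as far as I know, the OCFS2 FAQ treats O2CB_HEARTBEAT_THRESHOLD as a count of 2-second disk heartbeat iterations rather than seconds, with the timeout being (threshold - 1) * 2 seconds; the default of 31 gives the 60 secs mentioned above. The Python sketch below assumes that formula (the function names are hypothetical) and converts in both directions, so values like 120 and 240 can be read as timeouts of 238 and 478 secs.

```python
"""Convert between O2CB_HEARTBEAT_THRESHOLD iterations and seconds.

Assumption: timeout_secs = (threshold - 1) * 2, i.e. one o2hb disk
heartbeat every 2 seconds, as described in the OCFS2 FAQ.
"""

HEARTBEAT_INTERVAL_SECS = 2  # assumed o2hb disk heartbeat period


def threshold_to_timeout(threshold: int) -> int:
    """Seconds before o2hb declares a node dead for a given threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_to_threshold(timeout_secs: int) -> int:
    """Smallest threshold whose timeout is at least timeout_secs."""
    if timeout_secs < HEARTBEAT_INTERVAL_SECS:
        raise ValueError("timeout must be at least one heartbeat interval")
    # Ceiling division so the resulting timeout never falls short.
    iterations = -(-timeout_secs // HEARTBEAT_INTERVAL_SECS)
    return iterations + 1


if __name__ == "__main__":
    for value in (31, 61, 120, 240):
        print(f"O2CB_HEARTBEAT_THRESHOLD={value} -> {threshold_to_timeout(value)} s")
```

If 120 and 240 were meant as seconds, `timeout_to_threshold` gives the matching thresholds (61 and 121).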


 And if you want to capture the logs, set up netconsole.
 
/dev/console is a serial device connected to a terminal server; so far
the best I got was a partial timestamp before I saw the reboot output
again.

It tries to log, but doesn't finish writing it :( Mostly, though, there is
no activity at all on the serial console :(
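Since the serial console rarely gets a full message out, netconsole may fare better: it sends kernel messages as UDP packets to another host. The Python sketch below is a minimal receiver for that other host, assuming netconsole's default target port of 6666; the log file name and function names are hypothetical. It flushes after every packet so that whatever arrives just before a reboot is kept.

```python
"""Minimal netconsole receiver for the logging host.

Assumptions: netconsole on the crashing node is pointed at this host,
using UDP port 6666 (netconsole's default target port). LOG_PATH and
the function names are hypothetical.
"""
import socket
import sys
import time
from typing import Optional, TextIO, Tuple

DEFAULT_PORT = 6666          # netconsole's default target UDP port
LOG_PATH = "netconsole.log"  # hypothetical output file


def open_receiver(host: str = "0.0.0.0", port: int = DEFAULT_PORT) -> socket.socket:
    """Bind a UDP socket for netconsole packets to arrive on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def format_record(sender: Tuple[str, int], payload: bytes) -> str:
    """Prefix one kernel message with a local timestamp and its source IP."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {sender[0]} {text}"


def receive(sock: socket.socket, out: TextIO, max_messages: Optional[int] = None) -> int:
    """Append each datagram to `out`, flushing after every write so lines
    that arrive just before a reboot are not left sitting in a buffer."""
    count = 0
    while max_messages is None or count < max_messages:
        payload, sender = sock.recvfrom(4096)
        out.write(format_record(sender, payload) + "\n")
        out.flush()
        count += 1
    return count


if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_PORT
    with open_receiver(port=port) as sock, open(LOG_PATH, "a") as log:
        receive(sock, log)
```

On the crashing node, the module parameter has the form `netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]` (see the kernel's netconsole documentation). Given the network-overload theory above, sending it over an interface other than the one carrying DRBD traffic is probably wise.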

Any other ideas?

greetings


Kris 




 Kris Buytaert wrote:
  We're trying to set up a dual-primary DRBD environment, with a shared
  disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
  DRBD82 (but we also tried DRBD83 from testing).
 
  Setting up a single-primary disk and running bonnie++ on it works.
  Setting up a dual-primary disk, mounting it on only one node (ext3), and
  running bonnie++ works.
 
  When setting up OCFS2 on the /dev/drbd0 disk and mounting it on both
  nodes, basic functionality seems in place, but usually less than 5-10
  minutes after I start bonnie++ as a test on one of the nodes, both
  nodes power cycle with no errors in the log files, just a crash.
 
  When at the console at the time of the crash, it looks like a disk I/O
  block happens (you can type, but no actions happen), then a reboot: no
  panics, no oops, nothing. (The sysctl panic values are set to timeouts, etc.)
  Setting up a dual-primary disk with OCFS2, mounting it on only one node,
  and starting bonnie++ causes only that node to crash.
 
  At the DRBD level, I get the following error when that node disappears:
 
  drbd0: PingAck did not arrive in time.
  drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
  drbd0: asender terminated
  drbd0: Terminating asender thread
 
  That, however, is an expected error because of the reboot.
 
  At first I assumed OCFS2 was the root of this problem, so I moved
  on and set up an iSCSI target on a third node, and used that device
  with the same OCFS2 setup. There, no crashes occurred and bonnie++
  completed its test run flawlessly.
 
  So my attention went back to the combination of DRBD and OCFS2.
 
  I tried both DRBD 8.2 (drbd82-8.2.6-1.el5.centos, kmod-drbd82-8.2.6-2) and
  the 8.3 variant from CentOS Testing.
 
  At first I was trying with the ocfs2 1.4.1-1.el5.i386.rpm version, but
  upgrading to 1.4.2-1.el5.i386.rpm didn't change the behaviour.
 
 
  Does anyone have an idea on this?
  How can we get more debug info from OCFS2, apart from heartbeat
  tracing (which hasn't told me anything yet), in order to potentially
  file a valuable bug report?
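To line up the DRBD messages quoted above with each reboot (for example, across logs collected from both nodes), the state changes can be pulled out of the kernel log. The Python sketch below assumes DRBD 8.x's `field( old -> new )` format as seen in those messages; the function names are hypothetical.

```python
"""Extract DRBD state transitions from kernel log lines.

Assumes DRBD 8.x's format, as in the messages quoted above:
    drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) ...
"""
import re
import sys
from typing import Dict, Iterable, List, Tuple

Changes = Dict[str, Tuple[str, str]]

# "drbd0:" device prefix, then one or more "field( old -> new )" groups.
DEVICE_RE = re.compile(r"\b(drbd\d+):")
TRANSITION_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_transitions(line: str) -> Tuple[str, Changes]:
    """Return (device, {field: (old, new)}), or ("", {}) if no state change."""
    device = DEVICE_RE.search(line)
    changes = {field: (old, new) for field, old, new in TRANSITION_RE.findall(line)}
    if device is None or not changes:
        return "", {}
    return device.group(1), changes


def transitions(lines: Iterable[str]) -> List[Tuple[str, Changes]]:
    """Keep only the log lines that record a DRBD state change."""
    return [rec for rec in map(parse_transitions, lines) if rec[0]]


if __name__ == "__main__":
    # Usage: python drbd_states.py < /var/log/messages
    for device, changes in transitions(sys.stdin):
        summary = ", ".join(f"{k}: {a} -> {b}" for k, (a, b) in changes.items())
        print(f"{device}: {summary}")
```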

 



Re: [CentOS] [Ocfs2-users] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-25 Thread Ross Walker
On Jun 25, 2009, at 5:44 AM, Kris Buytaert m...@inuits.be wrote:

 /dev/console is a serial device connected to a terminal server,  so far
 the best I got was a partial timestamp before I saw the output of the
 reboot again ..

 It tries to log .. but doesn't finish writing it :(  But mostly there is
 no activity at all on the serial console :(

 Any other ideas ?

Set up the crash kernel and get a core dump of the system at the time  
of the crash.

It's the only way to find the culprit.
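A minimal pre-flight check for that, as a Python sketch: kdump needs memory reserved with the `crashkernel=` boot parameter and a capture kernel loaded. It assumes the kernel exposes `/sys/kernel/kexec_crash_loaded` (if the file is missing, as it may be on older kernels, the check reports "unknown"); the helper names are hypothetical.

```python
"""Check whether a kdump crash kernel looks ready on this node.

Assumptions: memory is reserved via the crashkernel= boot parameter,
and the kernel exposes /sys/kernel/kexec_crash_loaded. If that file is
absent, the loaded state is reported as unknown (None).
"""
from pathlib import Path
from typing import Optional

CMDLINE = Path("/proc/cmdline")
CRASH_LOADED = Path("/sys/kernel/kexec_crash_loaded")


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the crashkernel= value from a kernel command line, if any."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(flag_text: Optional[str]) -> Optional[bool]:
    """Interpret kexec_crash_loaded contents ("1" = capture kernel loaded)."""
    if flag_text is None:
        return None
    return flag_text.strip() == "1"


def read_optional(path: Path) -> Optional[str]:
    """Read a small procfs/sysfs file, or return None if it is unavailable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    reserved = crashkernel_param(read_optional(CMDLINE) or "")
    loaded = crash_kernel_loaded(read_optional(CRASH_LOADED))
    print(f"crashkernel= reserved: {reserved or 'no'}")
    print(f"capture kernel loaded: {'unknown' if loaded is None else loaded}")
```

On CentOS 5 this typically means adding something like `crashkernel=128M@16M` to the kernel line in grub.conf, rebooting, and enabling the kdump service; the resulting vmcore can then be examined with the `crash` utility.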

-Ross
