Re: [CentOS] HP Proliant ML150 : how do I access disks ?

2010-02-17 Thread Kris Buytaert
On Wed, 2010-02-17 at 15:20 +0100, Rainer Duffner wrote:
> >
> > On 02/17/2010 03:38 PM, Rainer Duffner wrote:
> >   
> > Hello there,
> >
> > I don't know about MLs, but with the DL series CentOS doesn't have any
> > problems at all, and none with seeing disks in particular. So I presume
> > that Rainer is absolutely right: you have to build an array first.
> > Check this link
> > http://docs.hp.com/en/9320/acu.pdf
> >
> >   
> 
> 
> 
> BTW: does 5.4 work on the latest G6 hardware?

It does on my DL360 and BL460s.



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] [DRBD-user] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-30 Thread Kris Buytaert
On Thu, 2009-06-25 at 11:42 +0200, Kris Buytaert wrote:

> > Use a serial console, attach that to some "monitoring" host
> > (you can use USB-to-Serial, they are cheap and work), and log
> > on that one. You'll get the last messages from there.
> > 
> I had indeed hoped to see some output on the serial console when the
> reboots happened, but the best I got so far was a partial timestamp
> with no further explanation before the reboot output started again.
> 
> Any other ideas?
> 

Update:

The problem is indeed OCFS2 fencing off the systems. The logging,
however, does not show up on the serial console; it DOES show up when
using netconsole:


[base-r...@ccmt-a ~]# nc -l -u -p 
(8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device
drbd0 after 478000 milliseconds
(8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active
regions.
ocfs2 is very sorry to be fencing this system by restarting
,

One would think it would write to the serial console before logging
over the network :) It doesn't.
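
For anyone who wants timestamps on the receiving side, here is a minimal
sketch of a UDP receiver that does roughly what the `nc -l -u -p` listener
above does. The function names are made up for illustration, and the port
is a placeholder: the port used in the original command is not shown, so
6666 in the usage note is only an assumption.

```python
import socket
from datetime import datetime, timezone


def format_record(payload: bytes, sender: str, when: datetime) -> str:
    """Return one log line: ISO timestamp, sender address, decoded message."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{when.isoformat()} {sender} {text}"


def listen(port: int, host: str = "0.0.0.0", max_messages=None) -> None:
    """Print every UDP datagram received on host:port with a timestamp.

    max_messages=None (the default) keeps listening until interrupted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        received = 0
        while max_messages is None or received < max_messages:
            payload, (sender, _sender_port) = sock.recvfrom(65535)
            now = datetime.now(timezone.utc)
            # flush so the last lines before a fence are not lost in a buffer
            print(format_record(payload, sender, now), flush=True)
            received += 1
```

Calling `listen(6666)` on the monitoring host would then print each
message as it arrives; redirect the output to a file to keep it.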




The next step is to fiddle some more with the timeout values :)
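
As a rough sketch of the arithmetic behind those timeout values: assuming
the relation given in the OCFS2 FAQ, where the disk heartbeat is written
every 2 seconds and the timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
seconds, the 478000 ms in the log above corresponds to a threshold of 240.
The function names below are illustrative only.

```python
# Assumption (from the OCFS2 FAQ): o2hb writes one heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_MS = 2000


def threshold_to_timeout_ms(threshold: int) -> int:
    """Heartbeat write timeout implied by an O2CB_HEARTBEAT_THRESHOLD value."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def timeout_ms_to_threshold(timeout_ms: int) -> int:
    """Inverse mapping: threshold needed for a desired timeout in ms."""
    return timeout_ms // HEARTBEAT_INTERVAL_MS + 1


for value in (31, 120, 240):
    print(value, threshold_to_timeout_ms(value), "ms")
```

Under that assumption, a threshold of 31 gives the 60 second default
mentioned elsewhere in this thread, and 240 gives the 478000 ms seen in
the netconsole output.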


 



Re: [CentOS] [Ocfs2-users] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-25 Thread Kris Buytaert
On Wed, 2009-06-24 at 12:02 -0700, Sunil Mushran wrote:
> Do you have a separate network path for drbd traffic? If you do
> not, then you are probably overloading the network. In this case,
> I believe drbd is unable to replicate the ios fast enough and thus
> is blocking the o2cb disk heartbeat. One workaround is to increase
> the O2CB_HEARTBEAT_THRESHOLD to more than the default of 60 secs.
> Refer to the ocfs2 faq or ocfs2 1.4 user's guide for more on this.
> 
I've already set O2CB_HEARTBEAT_THRESHOLD to different values
(120, 240, etc.), with no change.


> And if you want to capture the logs, setup netconsole.
> 
/dev/console is a serial device connected to a terminal server. So far
the best I got was a partial timestamp before I saw the output of the
reboot again.

It tries to log but doesn't finish writing :( Mostly, though, there is
no activity at all on the serial console :(

Any other ideas?

greetings


Kris 







[CentOS] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-24 Thread Kris Buytaert


We're trying to set up a dual-primary DRBD environment, with a shared
disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
DRBD82 (we also tried DRBD83 from testing).

Setting up a single-primary disk and running bonnie++ on it works.
Setting up a dual-primary disk, mounting it on only one node (ext3) and
running bonnie++ works.

When setting up OCFS2 on the /dev/drbd0 disk and mounting it on both
nodes, basic functionality seems to be in place, but usually less than
5-10 minutes after I start bonnie++ as a test on one of the nodes, both
nodes power cycle with no errors in the logfiles, just a crash.

When at the console at the time of the crash, it looks like a disk I/O
block happens (you can type, but no actions happen), then a reboot: no
panics, no oops, nothing. (The sysctl panic values are set to timeouts,
etc.)
Setting up a dual-primary disk with OCFS2, mounting it on only one node
and starting bonnie++ causes only that node to crash.

At the DRBD level I get the following error when that node disappears:

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure )
pdsk(UpToDate -> DUnknown )
drbd0: asender terminated
drbd0: Terminating asender thread

That however is an expected error because of the reboot.
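
When comparing many of these state-change lines across reboots, it can
help to pull the transitions out programmatically. The following is only
a sketch: the regular expression is derived from the single log line
shown above (with its wrapped second line joined back on), not from a
DRBD format specification, and the function name is made up.

```python
import re

# Matches "name( Old -> New )" with optional spaces inside the parentheses.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_state_change(line: str) -> dict:
    """Map each state field in a DRBD log line to its (old, new) pair."""
    return {field: (old, new) for field, old, new in TRANSITION.findall(line)}


line = ("drbd0: peer( Primary -> Unknown ) "
        "conn( Connected -> NetworkFailure ) pdsk(UpToDate -> DUnknown )")
print(parse_state_change(line))
```

Lines without any transition, such as "drbd0: asender terminated", simply
yield an empty dict.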

At first I assumed OCFS2 to be the root of this problem, so I moved
forward and set up an iSCSI target on a third node, and used that device
with the same OCFS2 setup. There, no crashes occurred and bonnie++
flawlessly completed its test run.

So my attention went back to the combination of DRBD and OCFS2.

I tried both DRBD 8.2 (drbd82-8.2.6-1.el5.centos, kmod-drbd82-8.2.6-2)
and the 83 variant from CentOS Testing.

At first I was trying with the ocfs2 1.4.1-1.el5.i386.rpm version, but
upgrading to 1.4.2-1.el5.i386.rpm didn't change the behaviour.


Does anyone have an idea on this?
How can we get more debug info from OCFS2, apart from heartbeat
tracing, which hasn't taught me anything yet, in order to potentially
file a useful bug report?


Thanks in advance

Kris 

