Re: [CentOS] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-25 Thread Ian Forde
On Wed, 2009-06-24 at 07:22 -0700, nate wrote:
 Kris Buytaert wrote:
 
 
  We're trying to set up a dual-primary DRBD environment, with a shared
  disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
  DRBD82 (we also tried DRBD83 from testing).
 
 Both OCFS2 and GFS are meant to be used on SANs with shared storage (the
 same LUNs being accessed by multiple servers). I just re-confirmed that
 DRBD is not a shared-storage mechanism but a simple block-mirroring
 technology between a pair of nodes (as I originally thought).

Actually, it's both.
http://www.drbd.org/users-guide-emb/ch-fundamentals.html gives the
overview.  It's shared storage with local disk access. And if you're
using Gig-E for the interconnect, it's *fast*. ;)

 I think you are mixing incompatible technologies. Even if you can
 get it working, just seems like a really bad idea.

That functionality is built in.  DRBD fully supports use of OCFS2 on top
of it in dual-primary mode.  See
http://www.drbd.org/users-guide-emb/ch-ocfs2.html
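
For reference, a minimal resource sketch for dual-primary mode (hostnames,
disks, and addresses below are hypothetical placeholders, not taken from
this thread) could look something like this in drbd.conf:

```
resource r0 {
  startup {
    # let both nodes promote themselves on startup (dual-primary)
    become-primary-on both;
  }
  net {
    # required before OCFS2 can mount the device on both nodes
    allow-two-primaries;
    # conservative automatic split-brain recovery policies
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.1:7789;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.2:7789;
    meta-disk internal;
  }
}
```

The same chapter of the users' guide covers the split-brain policies; the
exact option names may differ slightly between 8.2 and 8.3.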

 Perhaps what you could do is set up an iSCSI target on your DRBD
 cluster, export a LUN to another cluster running OCFS2 or GFS (last I
 checked, GFS required at least three nodes; with fewer than that the
 cluster goes to read-only mode. I didn't see any minimum requirements
 for OCFS2).

You could do that, but it would probably be overkill.  Too many moving
parts.  You'd also slow things down.  You're talking about app node
-> Gig-E -> OCFS2/GFS cluster -> Gig-E -> iSCSI/DRBD cluster.  I'd
rather have app node -> Gig-E -> OCFS2/DRBD cluster.  And it's *much*
easier to set up.  GFS is a bit of a pita to set up.  I used to do it for
RH professionally and it's not entirely painless...

 Though the whole concept of DRBD just screams crap performance to me
 compared to a real shared-storage system; wouldn't touch it with
 a 50-foot pole myself.

Nah... performance is pretty sweet.  Local disk access, sub-second
resync after rebooting one of the nodes, and the cost is *much* lower
than a real shared-storage system... if cost is a factor, I'd
seriously consider trialing the DRBD/OCFS2 combo.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-24 Thread Kris Buytaert


We're trying to set up a dual-primary DRBD environment, with a shared
disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
DRBD82 (we also tried DRBD83 from testing).

Setting up a single-primary disk and running bonnie++ on it works.
Setting up a dual-primary disk, mounting it on only one node (ext3), and
running bonnie++ also works.
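
Roughly, the dual-primary test sequence was (device and mount-point names
below are illustrative, not our exact paths):

```shell
# promote the resource to Primary on BOTH nodes (dual-primary)
drbdadm primary r0              # run on node-a
drbdadm primary r0              # run on node-b

# ext3 test: format and mount on ONE node only
mkfs.ext3 /dev/drbd0            # on node-a
mount /dev/drbd0 /mnt/test
bonnie++ -d /mnt/test -u root   # completes without problems
```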

When setting up OCFS2 on the /dev/drbd0 disk and mounting it on both
nodes, basic functionality seems in place, but usually less than 5-10
minutes after I start bonnie++ as a test on one of the nodes, both
nodes power-cycle with no errors in the logfiles, just a crash.

When I'm at the console at the time of the crash, it looks like a disk
I/O block happens (you can type, but no actions happen), then a reboot;
no panics, no oopses, nothing (sysctl panic values are set to timeouts
etc.).
Setting up a dual-primary disk with OCFS2, mounting it on only one node,
and starting bonnie++ causes only that node to crash.
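
The symptom (reboot, no panic, no oops) looks a lot like o2cb self-fencing
on missed heartbeat writes, so one thing we may still try is raising the
heartbeat threshold, e.g. in /etc/sysconfig/o2cb (values below are
illustrative, not what we currently run):

```
# /etc/sysconfig/o2cb (illustrative values)
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
# a node self-fences after roughly (threshold - 1) * 2 seconds of
# missed heartbeat writes; the default of 31 gives about 60 seconds
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=30000
```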

At the DRBD level I get the following error when that node disappears:

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure )
pdsk( UpToDate -> DUnknown )
drbd0: asender terminated
drbd0: Terminating asender thread

That, however, is an expected error, given the reboot.

At first I assumed OCFS2 to be the root of this problem, so I moved
forward and set up an iSCSI target on a third node, and used that device
with the same OCFS2 setup. There, no crashes occurred and bonnie++
flawlessly completed its test run.

So my attention went back to the combination of DRBD and OCFS2 ...

I tried both DRBD 8.2 (drbd82-8.2.6-1.el5.centos, kmod-drbd82-8.2.6-2)
and the 8.3 variant from CentOS Testing.

At first I was trying the ocfs2 1.4.1-1.el5.i386.rpm version, but
upgrading to 1.4.2-1.el5.i386.rpm didn't change the behaviour.


Does anyone have an idea on this?
How can we get more debug info from OCFS2, apart from heartbeat
tracing, which hasn't taught me anything yet, so that we can potentially
file a useful bug report?
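
So far the only extra tracing I know of is debugfs.ocfs2's log masks
(assuming the ocfs2-tools version here supports them), along these lines:

```shell
# list the available OCFS2 log masks and their current state
debugfs.ocfs2 -l

# turn on heartbeat and cluster-glue logging before the test
debugfs.ocfs2 -l HEARTBEAT allow
debugfs.ocfs2 -l DLM_GLUE allow

# ... run the bonnie++ test, then switch the masks off again
debugfs.ocfs2 -l HEARTBEAT deny
debugfs.ocfs2 -l DLM_GLUE deny
```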


Thanks in advance,

Kris 




Re: [CentOS] Unexplained reboots in DRBD82 + OCFS2 setup

2009-06-24 Thread nate
Kris Buytaert wrote:


 We're trying to set up a dual-primary DRBD environment, with a shared
 disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
 DRBD82 (we also tried DRBD83 from testing).

Both OCFS2 and GFS are meant to be used on SANs with shared storage (the
same LUNs being accessed by multiple servers). I just re-confirmed that
DRBD is not a shared-storage mechanism but a simple block-mirroring
technology between a pair of nodes (as I originally thought).

I think you are mixing incompatible technologies. Even if you can
get it working, it just seems like a really bad idea.

Perhaps what you could do is set up an iSCSI target on your DRBD
cluster, export a LUN to another cluster running OCFS2 or GFS (last I
checked, GFS required at least three nodes; with fewer than that the
cluster goes to read-only mode. I didn't see any minimum requirements
for OCFS2).
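
If you did go that route, exporting the DRBD device over iSCSI with
scsi-target-utils would look roughly like this (the tid, IQN, and device
path are illustrative):

```shell
# create a target and attach the DRBD device as LUN 1
tgtadm --lld iscsi --op new --mode target --tid 1 \
       --targetname iqn.2009-06.example:drbd0
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
       --backing-store /dev/drbd0

# allow any initiator to log in (tighten this in production)
tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL
```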

Though the whole concept of DRBD just screams crap performance to me
compared to a real shared-storage system; I wouldn't touch it with
a 50-foot pole myself.

nate




