Bug#273713: Lustre packaging

2006-09-08 Thread Jimmy Tang
Hi Alastair,

> The 2.6.16 code I have works for light use: survives some tests such as
> bonnie, etc., but hangs in large workloads: I'm debugging this, but would
> prefer to target 2.6.17 for Etch. (Even if we don't get in the Etch
> release, I'd like to support the stable kernel.) Some patches ported to
> 2.6.17.

Out of curiosity, what sort of heavy workloads are you trying out on the
system?

I'd be interested in testing the package out on a small test cluster
here as well for users who have heavy IO needs.

Also, is there any interest in testing these patches for 2.6.16/17 with
the openib patches/stacks?


Jimmy.


-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#273713: Lustre packaging

2006-09-08 Thread Jimmy Tang
Hi Alastair,

On Fri, Sep 08, 2006 at 01:24:44PM +0100, Alastair McKinstry wrote:
> >   
> None at the moment; we've a small test cluster that had driver issues up
> to 2.6.17, and so I'm trying out 2.6.17.

Ah okay, I think I know the problem you may be referring to (the e326
SATA disks and controllers). I think we've patched our SLES9 kernel for
that issue. It's probably the same fix that's in .17.

> Give it a bit to sort out some issues with the packaging. The current
> head-of-tree in the repo is definitely a work in progress, concentrating
> on merging current work by Goswin von Brederlow and myself (and others);
> I plan to get an experimental release worth proper testing, then we can
> add openib patches. I'll email you as soon as that's ready. Do you have
> openib patches for 2.6.16/17?

I think one of the guys had the openib stack/patches working with
2.6.16 a few months ago on a small segment of our cluster. We haven't
been too impressed with openib, as there were a few issues with it, but
it's something that we're probably going to revisit at a later date. We
were just using the patches from the openfabrics svn repo.


Jimmy.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/





Bug#273713: Lustre packaging

2006-09-08 Thread Jimmy Tang
Hi Goswin,

On Fri, Sep 08, 2006 at 03:34:40PM +0200, Goswin von Brederlow wrote:
> >
> >> The 2.6.16 code I have works for light use: survives some tests such as
> >> bonnie, etc., but hangs in large workloads: I'm debugging this, but would
> >> prefer to target 2.6.17 for Etch. (Even if we don't get in the Etch
> >> release, I'd like to support the stable kernel.) Some patches ported to
> >> 2.6.17.
> >
> > Out of curiosity, what sort of heavy workloads are you trying out on the
> > system?
> >
> > I'd be interested in testing the package out on a small test cluster
> > here as well for users who have heavy IO needs.
> 
> We usually do a burn-in test that continuously copies a linux source
> tree to a new dir and compares it. And that with a few clients.
> 
> Also some benchmarks like bonnie with 1-x clients to see how it
> scales.
> 
> > Also, is there any interest in testing these patches for 2.6.16/17 with
> > the openib patches/stacks?
> 
> For that I'm waiting for 2.6.18. I'm assuming you mean the openib2
> driver in the vanilla kernel and not the (extra) Mellanox drivers. With
> 2.6.15 we patch in the Mellanox drivers.
> 

I guess I didn't phrase my initial mail too well, but yes, openib2 in the
vanilla kernel + lustre is something I would like to test, though we haven't
successfully gotten openib2 to work correctly on our compute systems, so
we haven't looked at lustre + openib2 yet.

I guess we should look at getting openib2 working correctly at our site
before I post more to this list in relation to openib2+lustre.
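
For reference, the burn-in test Goswin describes (copy a source tree to a
new dir, compare it, repeat) could be sketched roughly as below. This is
only an illustration under assumptions, not the script anyone on this list
actually runs: the function names, the iteration count and the copy-N
directory naming are all made up here, and it would be pointed at a
directory on the lustre mount from each client.

```python
import filecmp
import shutil
from pathlib import Path


def trees_match(src: Path, dst: Path) -> bool:
    """Return True if both trees hold the same files with identical bytes."""
    cmp = filecmp.dircmp(str(src), str(dst))
    # Anything present on only one side, or not comparable, is a failure.
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # Compare file contents byte by byte (shallow=False), not just stat data.
    _, mismatch, errors = filecmp.cmpfiles(
        str(src), str(dst), cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    # Recurse into the subdirectories both sides share.
    return all(trees_match(src / d, dst / d) for d in cmp.common_dirs)


def burn_in(src: Path, workdir: Path, iterations: int = 10) -> int:
    """Copy src into workdir repeatedly; return how many copies differed."""
    failures = 0
    for i in range(iterations):
        dst = workdir / f"copy-{i}"  # hypothetical naming scheme
        shutil.copytree(src, dst)
        if not trees_match(src, dst):
            failures += 1
        shutil.rmtree(dst)  # free the space before the next round
    return failures
```

Something like burn_in(Path("/usr/src/linux"), Path("/mnt/lustre/burnin"))
would be the idea, with both paths being placeholders. Note that copytree
follows symlinks by default, so a tree with dangling symlinks would need
symlinks=True.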


Thanks,
Jimmy. 

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/





Bug#273713: Lustre packaging

2006-09-11 Thread Jimmy Tang
Hi Alastair,

I've just been poking through the package so far, and I noticed that etch is
using gcc 4.1. Cross-referencing the lustre-discuss list, I noticed that
even though 4.x is targeted, it isn't working right yet; 3.3 / 3.4 seems to be
a better choice of compiler for lustre (at least for now) if one wants a
more stable system.

Quoting the lustre-discuss list (though it's nearly a month old at this
point in time)...

Date: Mon, 14 Aug 2006 11:30:41 -0400
From: "Peter J. Braam" <[EMAIL PROTECTED]>
Subject: RE: [Lustre-discuss] GCC version(s)
To: <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]

It looks like it consumes more stack than the gcc3 family, and we have
seen crashes due to that.  We are not 100% sure about this, but this is
what we are guessing at the moment.

- Peter -

 > -----Original Message-----
 > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 > Sent: Monday, August 14, 2006 9:12 AM
 > To: Peter J. Braam
 > Cc: [EMAIL PROTECTED]
 > Subject: [Lustre-discuss] GCC version(s)
 >
 > From: "Peter J. Braam" <[EMAIL PROTECTED]>
 > Date: Mon, 14 Aug 2006 10:53:02 -0400
 >
 > Hi
 >
 > The gcc4 problem will be tackled during the coming months.  We hope,
 > of course, to increase our agility and keep up a little better.
 >
 > That's good to hear.
 >
 > What exactly is the problem with gcc4?  It won't compile?
 > Weird errors at runtime?  Something else?
 >
 >


Jimmy.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/





Bug#273713: Lustre packaging

2006-09-12 Thread Jimmy Tang
Hi Alastair,


On Mon, Sep 11, 2006 at 12:59:39PM +0100, Alastair McKinstry wrote:
> Hi Jimmy,
> 
> Thanks, I saw that and am using gcc-3.3 at the moment. I will try to
> enforce 3.3 in the packaging (Depends: on gcc, then gcc-3.3 in the
> Makefiles, etc.)
> 
> Regards
> Alastair
---end quoted text---

I've been pulling updates from the svn repo and test-building
the package, and I've run across a build issue.

I've been trying to build the package in sarge with some packages
installed from unstable, and it happily fails to build due to a gcc error:

  multiple definition of `__i686.get_pc_thunk.bx'

sarge's gcc 3.3 version is 3.3.5 (Debian 1:3.3.5-13); upon upgrading to the
gcc-3.3 in unstable/testing, 3.3.6 (Debian 1:3.3.6-13), things happily
build again.

It's probably not a bad idea to just stick a >= 3.3.6-13 version constraint in
the control file for gcc; at least it will stop people from trying to build a
backport only to have it fail on the wrong compiler version.
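
In the control file that would be something along the lines of
gcc-3.3 (>= 1:3.3.6-13), epoch included; whether it belongs in Depends or
Build-Depends is left open here. To show what such a constraint checks,
here is a rough sketch of the comparison for the two versions above. It is
a deliberate simplification and not the real dpkg algorithm (it only looks
at the numeric parts and ignores letters and "~"); for anything real,
dpkg --compare-versions is the thing to use. The function names are made up
for this example.

```python
import re


def parse_debian_version(version: str):
    """Split '[epoch:]upstream[-revision]' into numeric tuples (simplified)."""
    epoch, sep, rest = version.partition(":")
    if not sep:  # no epoch given, Debian treats that as epoch 0
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:  # no Debian revision present
        upstream, revision = rest, "0"

    def numbers(part: str):
        # Assumption: only the digit runs matter; real dpkg also orders letters.
        return tuple(int(n) for n in re.findall(r"\d+", part))

    return (int(epoch), numbers(upstream), numbers(revision))


def is_at_least(installed: str, required: str) -> bool:
    """True if 'installed' satisfies a '>= required' constraint."""
    return parse_debian_version(installed) >= parse_debian_version(required)


# sarge's compiler fails the constraint, the unstable/testing one passes.
print(is_at_least("1:3.3.5-13", "1:3.3.6-13"))  # False
print(is_at_least("1:3.3.6-13", "1:3.3.6-13"))  # True
```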

Jimmy.


-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/

