Re: [Lustre-discuss] Activation Inactive OST

2013-12-22 Thread Arden Wiebe


Yes, for sure check the logs.  At times with my 1.8.7, after starting the clients 
before the MDS, it takes some time for the reconnect.  Hope you resolved it by 
reading the logs.  Otherwise, waiting for a reconnect is an option, but I usually 
reboot the clients and know within 10 seconds of the boot screen whether they 
mounted or not, without having to look.
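
If the OSTs really are up and reachable, a quick look on the MDS along these 
lines usually shows why they stay inactive (a rough sketch only - the log path, 
NID and device number are examples; the device number comes from the lctl dl 
listing further down):

grep -i lustre /var/log/messages | tail -50   # recent connection errors on the MDS
lctl ping <NID-of-an-OSS>                     # can the MDS reach the OSS at all?
lctl --device 5 activate                      # re-enable an OSC that was deactivated by hand;
                                              # 5 = the first OSC's device number in lctl dl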


On Sunday, December 22, 2013 12:53:51 AM, "Dilger, Andreas" 
 wrote:
 
Inactive OST means that the MDS cannot contact the OST for some reason (OST is 
not running, firewall rules, etc). You need to check console logs on MDS or OSS 
to see why.
>
>Cheers, Andreas
>
>On Dec 22, 2013, at 0:04, "Amjad Syed" <amjad...@gmail.com> wrote:
>
>Hello,
>
>My apologies if this is a simple question, but I am a newbie to Lustre and am 
>trying to debug a mount issue.
>
>I have Lustre MDT at 1.8.7 and clients at 1.8.9.
>
>The OSTs are up but not active:
>
>lctl dl
>  0 UP mgs MGS MGS 7
>  1 UP mgc MGC10.129.1.111@o2ib 8b175398-7e96-12ff-78ba-eb735bfdd319 5
>  2 UP mdt MDS MDS_uuid 3
>  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>  4 UP mds lustre-MDT lustre-MDT_UUID 8
>  5 UP osc lustre-OST-osc lustre-mdtlov_UUID 5
>  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
>  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>11 UP osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>12 UP osc lustre-OST0007-osc lustre-mdtlov_UUID 5
>13 UP osc lustre-OST0008-osc lustre-mdtlov_UUID 5
>14 UP osc lustre-OST0009-osc lustre-mdtlov_UUID 5
>15 UP osc lustre-OST000a-osc lustre-mdtlov_UUID 5
>16 UP osc lustre-OST000b-osc lustre-mdtlov_UUID 5
>17 UP osc lustre-OST000c-osc lustre-mdtlov_UUID 5
>18 UP osc lustre-OST000d-osc lustre-mdtlov_UUID 5
>19 UP osc lustre-OST000e-osc lustre-mdtlov_UUID 5
>20 UP osc lustre-OST000f-osc lustre-mdtlov_UUID 5
>21 UP osc lustre-OST0010-osc lustre-mdtlov_UUID 5
>22 UP osc lustre-OST0011-osc lustre-mdtlov_UUID 5
>23 UP osc lustre-OST0012-osc lustre-mdtlov_UUID 5
>24 UP osc lustre-OST0013-osc lustre-mdtlov_UUID 5
>25 UP osc lustre-OST0014-osc lustre-mdtlov_UUID 5
>26 UP osc lustre-OST0015-osc lustre-mdtlov_UUID 5
>27 UP osc lustre-OST0016-osc lustre-mdtlov_UUID 5
>28 UP osc lustre-OST0017-osc lustre-mdtlov_UUID 5
>
>cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
>0: lustre-OST_UUID INACTIVE
>1: lustre-OST0001_UUID INACTIVE
>2: lustre-OST0002_UUID INACTIVE
>3: lustre-OST0003_UUID INACTIVE
>4: lustre-OST0004_UUID INACTIVE
>5: lustre-OST0005_UUID INACTIVE
>6: lustre-OST0006_UUID INACTIVE
>7: lustre-OST0007_UUID INACTIVE
>8: lustre-OST0008_UUID INACTIVE
>9: lustre-OST0009_UUID INACTIVE
>10: lustre-OST000a_UUID INACTIVE
>11: lustre-OST000b_UUID INACTIVE
>12: lustre-OST000c_UUID INACTIVE
>13: lustre-OST000d_UUID INACTIVE
>14: lustre-OST000e_UUID INACTIVE
>15: lustre-OST000f_UUID INACTIVE
>16: lustre-OST0010_UUID INACTIVE
>17: lustre-OST0011_UUID INACTIVE
>18: lustre-OST0012_UUID INACTIVE
>19: lustre-OST0013_UUID INACTIVE
>20: lustre-OST0014_UUID INACTIVE
>21: lustre-OST0015_UUID INACTIVE
>22: lustre-OST0016_UUID INACTIVE
>23: lustre-OST0017_UUID INACTIVE
>
>So the question is: how can I activate the INACTIVE OSTs?
>
>Thanks
>
>___
>Lustre-discuss mailing list
>Lustre-discuss@lists.lustre.org
>http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>___
>Lustre-discuss mailing list
>Lustre-discuss@lists.lustre.org
>http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Applications of Lustre - streaming?

2012-12-08 Thread Arden Wiebe
Lustre works great for a Zoneminder installation.  Zoneminder has no problem 
recording indefinitely on four different cameras to the Lustre mount point.  
Plus I can view it remotely through a virtual session at 30fps on all four 
cameras for as long as necessary.

+1 for MythTV and Lustre.  Can't beat recording indefinitely on eight 
different tuners to 8TB of space.  It seems to take a long time to fill up that 
much space with audio and video.  Works great for archiving all my movies 
and audio tracks.  

Backup and redundancy in my case come from another box with four 2TB drives 
raided together.  I copy big files at around 105Mb/s, and smaller ones, like .frm 
files from database directories, in their own good time, mainly because there 
seem to be hundreds of thousands of them.  

I even take Lustre on the road with MythTV and Zoneminder running in the 
coach.  It works great for providing entertainment along the way and for 
over-the-road security on all four sides as I'm travelling.  Plus it captures 
some nice videos and stills of the memories.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Fw: Re: Unable to activate OST

2010-01-14 Thread Arden Wiebe


--- On Thu, 1/14/10, Arden Wiebe  wrote:
> DM:
> 
> Your mount command is wrong - try this format (this is how I mount one of my clients):
> 
> mount -t lustre 192.168@tcp0:/ioio /mnt/ioio
> 
> > [r...@oss ~]# mkfs.lustre --fsname=datafs --ost --mgsnode=192.168@tcp0
> > [r...@oss ~]# mount -t lustre /dev/lustre/OST /lustre/OSS
> > mount.lustre: mount /dev/lustre/OST at /lustre/OSS failed: Input/output error
> > Is the MGS running?
> 
> So, substituting your names, your client mount line should read:
> 
> mount -t lustre 192.168@tcp0:/datafs /mnt/datafs
> 
> Enjoy the required reading and testing.  I found that naming things uniquely
> helped me clarify what was actually required.  Try calling your filesystem
> "Dusty" or "Mark" and that should make things clearer for you.  
> 
> --- On Thu, 1/14/10, Andreas Dilger 
> wrote:
> 
> > From: Andreas Dilger 
> > Subject: Re: [Lustre-discuss] Unable to activate OST
> > To: "Dusty Marks" 
> > Cc: "lustre-discuss@lists.lustre.org
> discuss" 
> > Date: Thursday, January 14, 2010, 9:03 PM
> > On 2010-01-14, at 23:51, Dusty Marks
> > wrote:
> > > You are correct, there is information in
> messages.
> > Following are the  
> > > entries related the lustre. The line that says
> > 192.168@tcp is  
> > > unreachable makes sense, but what exactly is the
> > problem? I entered  
> > > the line "options lnet networks=tcp" in
> modprobe.conf
> > on the oss and  
> > > mds. The only difference was, i entered that
> line
> > AFTER i setup  
> > > lustre on the OSS. Could that be the problem? I
> don't
> > see why that  
> > > would be the problem, as the oss is trying to
> reach
> > the MDS/MGS,  
> > > which is 192.168.0.2.
> > >
> > > --- /var/log/messages ---
> > > Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.0.2/988
> > > Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 192.168@tcp at host 192.168.0.2 was unreachable: the network or that node may be down, or Lustre may be misconfigured.
> > 
> > 
> > Please read the chapter in the manual about network
> > configuration.  I  
> > suspect the .0.2 network is not your eth0 network
> > interface, and your  
> > modprobe.conf needs to be fixed.
> > 
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
> > 
> > ___
> > Lustre-discuss mailing list
> > Lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> > 
> 
> 
> 
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Large scale delete results in lag on clients

2009-08-12 Thread Arden Wiebe
Jim:

Mag just started a good thread about a live backup.  Depending on your budget, 
if the spare boxes and enough disks are available, just make another Lustre 
filesystem and copy the existing data over with smb.  Here is a screenshot of 
my commodity-hardware rig-up of a 5.4 TB raid10 Lustre filesystem that uses 28 
1TB hard drives and could easily be built for under $1.00 if you shopped a 
little more conservatively or had existing hardware you could utilize in the 
build-out: http://www.ioio.ca/Lustre-tcp-bonding/images.html and 
http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Because I used raid10 for the underlying redundancy, my available storage 
space was reduced substantially.  I'm sure you could squeeze 15TB out of close 
to that number of disks if you used the right raid level.  Here is the hardware 
recipe I used, at http://oil-gas.ca/phpsysinfo, if it helps you to contemplate 
the upgrade route or the backup-then-upgrade route.  Otherwise, if you knew 
someone with a spare 15TB of storage and bandwidth, you could quickly (or not so 
quickly) upload your data and then download it again - just ideas, but the 
thought of doing a 15TB end-to-end data transfer using Lustre is interesting.

Arden

--- On Mon, 8/10/09, Jim McCusker  wrote:

> From: Jim McCusker 
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "Oleg Drokin" 
> Cc: "lustre-discuss" 
> Date: Monday, August 10, 2009, 8:03 PM
> On Monday, August 10, 2009, Oleg
> Drokin 
> wrote:
> > What lustre version is it now?
> >
> > We used to have uncontrolled unlinking where OSTs
> might get swamped with
> > unlink requests.
> > Now we limit to 8 unlinks to OST at any one time. This
> slows down the
> > deletion process, but at least there are no following
> aftershocks.
> > (bug 13843, included into 1.6.5 release
> 
> We're at 1.6.4.x. Is it too late to upgrade?
> 
> Jim
> 
> -- 
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccus...@yale.edu
> | (203) 785-6330
> http://krauthammerlab.med.yale.edu
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Large scale delete results in lag on clients

2009-08-07 Thread Arden Wiebe


--- On Fri, 8/7/09, Jim McCusker  wrote:

> From: Jim McCusker 
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "lustre-discuss" 
> Date: Friday, August 7, 2009, 5:25 AM
> On Fri, Aug 7, 2009 at 6:45 AM, Arden
> Wiebe 
> wrote:
> > --- On Thu, 8/6/09, Andreas Dilger 
> wrote:
> > > Jim McCusker wrote:
> 
> > > > We have a 15 TB luster volume across 4 OSTs
> and we recently deleted over 4
> > > > million files from it in order to free up
> the 80 GB MDT/MDS (going from 100%
> > > > capacity on it to 81%. As a result, after
> the rm completed, there is
> > > > significant lag on most file system
> operations (but fast access once it
> > > > occurs) even after the two servers that host
> the targets were rebooted. It
> > > > seems to clear up for a little while after
> reboot, but comes back after some
> > > > time.
> > > >
> > > > Any ideas?
> > >
> > > The Lustre unlink processing is somewhat
> asynchronous, so you may still be
> > > catching up with unlinks.  You can check this by
> looking at the OSS service
> > > RPC stats file to see if there are still object
> destroys being processed
> > > by the OSTs.  You could also just check the
> system load/io on the OSTs to
> > > see how busy they are in a "no load" situation.
> > >
> > >
> > > > For the curious, we host a large image
> archive (almost 400k images) and do
> > > > research on processing them. We had a lot of
> intermediate files that we
> > > > needed to clean up:
> > > >
> > > >  http://krauthammerlab.med.yale.edu/imagefinder
> (currently laggy and
> > > > unresponsive due to this problem)
> > > >
> >
> > Jim, from the web side perspective it seems
> responsive.  Are you actually serving the images from the
> lustre cluster?  I have ran a few searches looking for
> "Purified HIV Electron Microscope" and your project returns
> 15 pages of results with great links to full abstracts
> almost instantly but obviously none with real purified HIV
> electron microscope images similar to a real pathogenic
> virus like 
> http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D
> 
> The images and the lucene index are both served from the
> lustre
> cluster (as is just about everything else on our network).
> I think
> Andreas is right, it seems to have cleared itself up.
> You're seeing
> typical performance. If you don't find what you're looking
> for, you
> can expand your search to the full text, abstract, or title
> using the
> checkboxes below the search box. Of course, the lack of
> images in
> search has more to do with the availability of open access
> papers on
> the topic than the performance of lustre. :-)
> 

Yeah, I was all over the full-text checkbox as soon as I ran one query.  Great 
project, by the way, as there really is no way for any researcher or doctor to 
read the volumes of scientific journals the pharmaceutical industry pays for 
every month.  Sad how mass consensus has replaced the actual scientific method, 
all for capitalism.  

> > Have you physically separated your MDS/MDT from the
> MGS portion on different servers?  I somehow doubt you
> overlooked this but if you didn't for some reason this could
> be a cause of unresponsiveness on the client side.  Again
> if your serving up the images from the cluster I find it
> works great.
> 
> This server started life as a 1.4.x server, so the MGS is
> still on the
> same partition as MDS/MDT. We have one server with the MGS,
> MDS/MDT,
> and two OSTs, and another server with two more OSTs. The
> first server
> also provides NFS and SMB services for the volume in
> question. I know
> that we're not supposed to mount the volume on a server
> that provides
> it, but limited budget means limited servers, and
> performance has been
> excellent except for this one problem.
> 

I roll the same way at http://oil-gas.ca/phpsysinfo and 
http://linuxguru.ca/phpsysinfo, with the OSTs actually providing TCP routing 
and DNS service for the network that leads surfers to the internal 
Lustre-powered webservers, although at this time I'm only serving one file via 
a symlink from the physically separated (by block device) Lustre cluster, at 
http://workwanted.ca/images/3689011.avi (let me know how fast it 
downloads bac

Re: [Lustre-discuss] Large scale delete results in lag on clients

2009-08-07 Thread Arden Wiebe


--- On Thu, 8/6/09, Andreas Dilger  wrote:

> From: Andreas Dilger 
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "Jim McCusker" 
> Cc: "lustre-discuss" 
> Date: Thursday, August 6, 2009, 1:27 PM
> On Aug 06, 2009  15:08 -0400,
> Jim McCusker wrote:
> > We have a 15 TB luster volume across 4 OSTs and we
> recently deleted over 4
> > million files from it in order to free up the 80 GB
> MDT/MDS (going from 100%
> > capacity on it to 81%. As a result, after the rm
> completed, there is
> > significant lag on most file system operations (but
> fast access once it
> > occurs) even after the two servers that host the
> targets were rebooted. It
> > seems to clear up for a little while after reboot, but
> comes back after some
> > time.
> > 
> > Any ideas?
> 
> The Lustre unlink processing is somewhat asynchronous, so
> you may still be
> catching up with unlinks.  You can check this by
> looking at the OSS service
> RPC stats file to see if there are still object destroys
> being processed
> by the OSTs.  You could also just check the system
> load/io on the OSTs to
> see how busy they are in a "no load" situation.
> 
> 
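For anyone reading this later: on 1.6/1.8 servers those counters live under 
/proc on each OSS.  A rough sketch of the check Andreas describes (exact paths 
and counter names can vary by version):

grep destroy /proc/fs/lustre/ost/OSS/ost/stats
grep destroy /proc/fs/lustre/obdfilter/*/stats
iostat -x 5     # do the OST disks stay busy even with no client load?
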
> > For the curious, we host a large image archive (almost
> 400k images) and do
> > research on processing them. We had a lot of
> intermediate files that we
> > needed to clean up:
> > 
> >  http://krauthammerlab.med.yale.edu/imagefinder
> (currently laggy and
> > unresponsive due to this problem)
> > 

Jim, from the web-side perspective it seems responsive.  Are you actually 
serving the images from the Lustre cluster?  I have run a few searches looking 
for "Purified HIV Electron Microscope" and your project returns 15 pages of 
results with great links to full abstracts almost instantly, but obviously none 
with real purified HIV electron microscope images similar to a real pathogenic 
virus like 
http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D
 

Again though, not surprisingly some of the same proteins in this virus are 
present in molecular clones of HIV. I'll have to agree more now with 
http://www.karymullis.com that using PCR to detect viral infection is a bad 
idea lacking proper viral isolation of HIV that is still overlooked after 25 
years.  No doubt http://ThePerthGroup.com are probably correct in their views 
but enough curiosity.  

Have you physically separated your MDS/MDT from the MGS portion onto different 
servers?  I somehow doubt you overlooked this, but if you didn't for some reason, 
this could be a cause of unresponsiveness on the client side.  Again, if you're 
serving up the images from the cluster, I find it works great.

http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D

> > Thanks,
> > Jim
> > --
> > Jim McCusker
> > Programmer Analyst
> > Krauthammer Lab, Pathology Informatics
> > Yale School of Medicine
> > james.mccus...@yale.edu
> | (203) 785-6330
> > http://krauthammerlab.med.yale.edu
> 
> > ___
> > Lustre-discuss mailing list
> > Lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] a simple question

2009-06-18 Thread Arden Wiebe

Onane:

One of the first commands I issue is lfs df -h to check my newly mounted 
filesystem.  I mount my clients manually with a command similar to this:

mount -t lustre 192.168@tcp0:/ioio /mnt/ioio

After it mounts, in a minute or two, I issue lfs df -h and look at the output.  
While it is mounting I watch the output of tail -f /var/log/messages in a 
separate terminal.  If it's all good, I mount the remaining clients and do some 
random copies to the new filesystem.  You're looking for similar output on your 
client machines.  

[r...@lustreone www.workwanted.ca]# lfs df -h
UUID                    bytes      Used  Available  Use%  Mounted on
ioio-MDT_UUID            1.6T    683.3M       1.5T    0%  /mnt/ioio[MDT:0]
ioio-OST_UUID            2.7T     76.1G       2.5T    2%  /mnt/ioio[OST:0]
ioio-OST0001_UUID        2.7T     61.7G       2.5T    2%  /mnt/ioio[OST:1]

filesystem summary:      5.4T    137.8G       5.0T    2%  /mnt/ioio

Next check how many inodes you have available for the system:

[r...@lustreone www.workwanted.ca]# lfs df -ih
UUID                   Inodes     IUsed      IFree  IUse%  Mounted on
ioio-MDT_UUID          412.3M      5.0M     407.3M     1%  /mnt/ioio[MDT:0]
ioio-OST_UUID          174.7M      2.6M     172.0M     1%  /mnt/ioio[OST:0]
ioio-OST0001_UUID      174.7M      2.3M     172.4M     1%  /mnt/ioio[OST:1]

filesystem summary:    412.3M      5.0M     407.3M     1%  /mnt/ioio

The command lfs has quite a few options you might find useful. 
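
For example (these are standard lfs subcommands; adjust the paths for your own 
mount point):

lfs check servers                      # verify the client can reach every MDT and OST
lfs getstripe /mnt/ioio                # show striping for a file or directory
lfs setstripe -c 2 /mnt/ioio/bigdir    # stripe new files in that directory across 2 OSTs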

Arden

--- On Thu, 6/18/09, Onane  wrote:

> From: Onane 
> Subject: [Lustre-discuss] a simple question
> To: lustre-discuss@lists.lustre.org
> Date: Thursday, June 18, 2009, 1:52 AM
> Hello,
> After installing lustre, how can I test it quickliy if it
> is installed correctly ?
> 
> 
> -Inline Attachment Follows-
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Arden Wiebe

Carlos:

This client of mine works; as a matter of fact, it works on all my clients.

[r...@lustreone]# rpm -qa | grep -i lustre
lustre-ldiskfs-3.0.8-2.6.18_92.1.17.el5_lustre.1.8.0smp
lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
kernel-lustre-smp-2.6.18-92.1.17.el5_lustre.1.8.0

Otherwise, your output for the same command lists only two packages installed, so 
you are missing some packages - namely the client packages, if you don't 
want to use the patched-kernel method of making a client as I have done above.  
If you issue the rpm commands I mentioned in the very first response of this 
thread, you will have a working client.
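
For a patchless client (no patched kernel), the equivalent is just the two 
client packages built against your running kernel - the filenames below are only 
an example and must match your uname -r exactly:

rpm -ivh lustre-client-modules-1.8.0-2.6.18_92.1.17.el5.i686.rpm
rpm -ivh lustre-client-1.8.0-2.6.18_92.1.17.el5.i686.rpm
depmod -a
modprobe -v lustre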

Arden

--- On Wed, 6/17/09, Carlos Santana  wrote:

> From: Carlos Santana 
> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
> To: "Jerome, Ron" 
> Cc: lustre-discuss@lists.lustre.org
> Date: Wednesday, June 17, 2009, 5:10 PM
> Folks,
> 
> It been unsuccessful till now..
> 
> I made a fresh CentOS 5.2 minimum install (2.6.18-92.el5).
> Later, I
> updated kernel to 2.6.18-92.1.17 version. Here is a output
> from uname
> and rpm query:
> 
> [r...@localhost ~]# rpm -qa | grep lustre
> lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> [r...@localhost ~]# uname -a
> Linux localhost.localdomain 2.6.18-92.1.17.el5 #1 SMP Tue
> Nov 4
> 13:45:01 EST 2008 i686 i686 i386 GNU/Linux
> 
> Other details:
> --- --- ---
> [r...@localhost ~]# ls -l /lib/modules | grep 2.6
> drwxr-xr-x 6 root root 4096 Jun 17 18:47
> 2.6.18-92.1.17.el5
> drwxr-xr-x 6 root root 4096 Jun 17 17:38 2.6.18-92.el5
> 
> 
> [r...@localhost modules]# find . | grep lustre
> ./2.6.18-92.1.17.el5/kernel/net/lustre
> ./2.6.18-92.1.17.el5/kernel/net/lustre/libcfs.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/lnet.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/ksocklnd.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/ko2iblnd.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/lnet_selftest.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/osc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/ptlrpc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/obdecho.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lvfs.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/mgc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/llite_lloop.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lov.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/mdc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lquota.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lustre.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/obdclass.ko
> --- --- ---
> 
> 
> I am still having same problem. I seriously doubt, am I
> missing anything?
> I also tried a source install for 'patchless client',
> however I have
> been consistent in its results too.
> 
> Are there any configuration steps needed after rpm (or
> source)
> installation? The one that I know of is restricting
> interfaces in
> modeprobe.conf, however I have tried it on-n-off with no
> success.
> Could anyone please suggest any debugging and tests for the
> same? How
> can I provide you more valuable output to help me? Any
> insights?
> 
> Also, I have a suggestion here. It might be good idea to
> check for
> 'uname -r' check in RPM installation to check for matching
> kernel
> version and if not suggest for source install.
> 
> Thanks for the help. I really appreciate your patience..
> 
> -
> Thanks,
> CS.
> 
> 
> On Wed, Jun 17, 2009 at 10:40 AM, Jerome, Ron
> wrote:
> > I think the problem you have, as Cliff alluded to, is
> a mismatch between
> > your kernel version  and the Luster kernel version
> modules.
> >
> >
> >
> > You have kernel “2.6.18-92.el5” and are installing
> Lustre
> > “2.6.18_92.1.17.el5”   Note the “.1.17” is
> significant as the modules will
> > end up in the wrong directory.  There is an update to
> CentOS to bring the
> > kernel to the matching 2.6.18_92.1.17.el5 version you
> can pull it off the
> > CentOS mirror site in the updates directory.
> >
> >
> >
> >
> >
> > Ron.
> >
> >
> >
> > From: lustre-discuss-boun...@lists.lustre.org
> > [mailto:lustre-discuss-boun...@lists.lustre.org]
> On Behalf Of Carlos Santana
> > Sent: June 17, 2009 11:21 AM
> > To: lustre-discuss@lists.lustre.org
> > Subject: Re: [Lustre-discuss] Lustre installation and
> configuration problems
> >
> >
> >
> > And is there any specific installation order for
> patchless client? Could
> > someone please share it with me?
> >
> > -
> > CS.
> >
> > On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana
> 
> wrote:
> >
> > Huh... :( Sorry to bug you guys again...
> >
> > I am planning to make a fresh start now as nothing
> seems to have worked for
> > me. If you have any comments/feedback please share
> them.
> >
> > I would like to confirm installation order before I
> make a fresh start. From
> > Arden's experience:
> > http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html
> , the
> > lusre-module is installed last. As 

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Arden Wiebe

Cliff:

I have some questions about the client packages.  I am not sure why the roadmap 
or Lustre users require separate client packages, but stating the obvious, some 
people must need separate client packages - is that correct?  

Otherwise, the server packages contain the client anyhow, correct?  If the latter, 
are the client packages for Linux somewhat redundant?  And when will a real 
client .exe for Windows become available?

Arden

--- On Wed, 6/17/09, Sheila Barthel  wrote:

> From: Sheila Barthel 
> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
> To: "Carlos Santana" 
> Cc: "Cliff White" , lustre-discuss@lists.lustre.org
> Date: Wednesday, June 17, 2009, 1:08 PM
> Carlos -
> 
> The installation procedures for Lustre 1.6 and 1.8 are the
> same. The manual's installation procedure includes a table
> that shows which packages to install on servers and clients
> (I've attached a PDF of the table). The procedure also
> describes the installation order for packages (kernel,
> modules, ldiskfs, then utilities/userspace, then
> e2fsprogs).
> 
> http://manual.lustre.org/manual/LustreManual16_HTML/LustreInstallation.html#50401389_pgfId-1291574
> 
> Sheila
> 
> Cliff White wrote:
> > Carlos Santana wrote:
> >   
> >> Huh... :( Sorry to bug you guys again...
> >> 
> >> I am planning to make a fresh start now as nothing
> seems to have worked for me. If you have any
> comments/feedback please share them.
> >> 
> >> I would like to confirm installation order before
> I make a fresh start.  From Arden's experience: 
> http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html
> , the lusre-module is installed last. As I was installing
> Lustre 1.8, I was referring 1.8 operations manual 
> http://manual.lustre.org/index.php?title=Main_Page .
> The installation order in the manual is different than what
> Arden has suggested.
> >> 
> >> Will it make a difference in configuration at
> later stage? Which one should I follow now?
> >> Any comments?
> >>     
> > 
> > RPM installation order really doesn't matter. If you
> install in the 'wrong' order you will get a lot of warnings
> from RPM due to the relationship of the various RPMs. But
> these are harmless - whatever order you install in, it
> should work fine.
> > cliffw
> >   
> >> Thanks,
> >> CS.
> >> 
> >> 
> >> On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana
>  >
> wrote:
> >> 
> >>     Thanks Cliff.
> >> 
> >>     The depmod -a was
> successful before as well. I am using CentOS 5.2
> >>     box. Following are the
> packages installed:
> >>     [r...@localhost tmp]# rpm
> -qa | grep -i lustre
> >> 
>    lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> >> 
>    lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> >> 
> >>     [r...@localhost tmp]#
> uname -a
> >>     Linux
> localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10
> 18:49:47
> >>     EDT 2008 i686 i686 i386
> GNU/Linux
> >> 
> >>     And here is a output from
> strace for mount:
> >>     http://www.heypasteit.com/clip/8WT
> >> 
> >>     Any further debugging
> hints?
> >> 
> >>     Thanks,
> >>     CS.
> >> 
> >>     On 6/16/09, Cliff White
>  >>     >
> wrote:
> >>      > Carlos Santana wrote:
> >>      >> The '$ modprobe -l
> lustre*' did not show any module on a patchless
> >>      >> client. modprobe -v
> returns 'FATAL: Module lustre not found'.
> >>      >>
> >>      >> How do I install a
> patchless client?
> >>      >> I have tried
> lustre-client-modules and lustre-client-ver rpm
> >>     packages in
> >>      >> both sequences. Am I
> missing anything?
> >>      >>
> >>      >
> >>      > Make sure the
> lustre-client-modules package matches your running
> >>     kernel.
> >>      > Run depmod -a to be sure
> >>      > cliffw
> >>      >
> >>      >> Thanks,
> >>      >> CS.
> >>      >>
> >>      >>
> >>      >>
> >>      >> On Tue, Jun 16, 2009
> at 2:28 PM, Cliff White
> >>      
> >>      >>  >>
> wrote:
> >>      >>
> >>      >> 
>    Carlos Santana wrote:
> >>      >>
> >>      >>     
>    The lctlt ping and 'net up' failed with
> the following
> >>     messages:
> >>      >>     
>    --- ---
> >>      >>     
>    [r...@localhost ~]# lctl ping 10.0.0.42
> >>      >>     
>    opening /dev/lnet failed: No such device
> >>      >>     
>    hint: the kernel modules may not be
> loaded
> >>      >>     
>    failed to ping 10.0.0...@tcp: No such
> device
> >>      >>
> >>      >>     
>    [r...@localhost ~]# lctl network up
> >>      >>     
>    opening /dev/lnet failed: No such device
> >>      >>     
>    hint: the kernel modules may not be
> loaded
> >>      >>     
>    LNET configure error 19: No such device
> >>      >>
> >>      >>
> >>      >> 
>    Make sure modules are unloaded, then try
> modprobe -v.
> >>      >> 
>    Looks like you have lnet mis-configured,
> if your module
> >>     op

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Arden Wiebe

Carlos:

Now that the obvious clue has been sleuthed out and identified, the villainous 
deprecated-kernel installation media can be destroyed.  That can take whatever form 
you feel is appropriate, from the good old Frisbee-and-forget to the ever-popular 
coaster contemplation collection.

The order doesn't matter that much - aside from the correct kernel first.  What 
matters is the thoughtful message "Are the modules loaded?"  If you're getting it, 
you have missed installing one of the packages.  When all else fails, remove and 
reinstall, or even force the install, as is sometimes the case with e2fsprogs.  This 
becomes quite a chore when you're installing on more than two computers.
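
A quick sanity check before reinstalling (nothing Lustre-specific here, just 
version matching):

uname -r                    # the kernel you are actually running
rpm -qa | grep -i lustre    # every Lustre package must carry that same kernel string
depmod -a
modprobe -v lustre          # if this loads cleanly, the "modules not loaded" hint goes away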

What is needed is a bare-bones Lustre installation DVD ISO.  I'm sure Brian 
plugged the one Sun offers in another post, and in fact one university runs 
and develops its own in-house distribution that would be very interesting to 
obtain, but it's not public.  Good luck, Carlos, and be sure to have plenty of 
inodes! 

Arden

--- On Wed, 6/17/09, Jerome, Ron  wrote:

> From: Jerome, Ron 
> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
> To: "Carlos Santana" 
> Cc: lustre-discuss@lists.lustre.org
> Date: Wednesday, June 17, 2009, 8:40 AM
> I think the problem you have, as Cliff alluded to, is a mismatch
> between your kernel version and the Luster kernel version modules.
>
> You have kernel “2.6.18-92.el5” and are installing Lustre
> “2.6.18_92.1.17.el5”.  Note the “.1.17” is significant as the modules
> will end up in the wrong directory.  There is an update to CentOS to
> bring the kernel to the matching 2.6.18_92.1.17.el5 version; you can
> pull it off the CentOS mirror site in the updates directory.
>
> Ron.
>
> From: lustre-discuss-boun...@lists.lustre.org
> [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Carlos Santana
> Sent: June 17, 2009 11:21 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
>
> And is there any specific installation order for patchless client? Could
> someone please share it with me?
>
> -
> CS.
>
> On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana  wrote:
>
> Huh... :( Sorry to bug you guys again...
>
> I am planning to make a fresh start now as nothing seems to have worked
> for me. If you have any comments/feedback please share them.
>
> I would like to confirm installation order before I make a fresh start. From
> Arden's experience:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html
> , the lusre-module is installed last. As I was installing Lustre 1.8, I was
> referring 1.8 operations manual
> http://manual.lustre.org/index.php?title=Main_Page
> . The installation order in the manual is different than what Arden has
> suggested.
>
> Will it make a difference in configuration at later stage? Which one should I
> follow now?
>
> Any comments?
>
> Thanks,
> CS.
>
> On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana  wrote:
>
> Thanks Cliff.
>
> The depmod -a was successful before as well. I am using CentOS 5.2
> box. Following are the packages installed:
> [r...@localhost tmp]# rpm -qa | grep -i lustre
> lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
>
> [r...@localhost tmp]# uname -a
> Linux localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47
> EDT 2008 i686 i686 i386 GNU/Linux
>
> And here is a output from strace for mount: http://www.heypasteit.com/clip/8WT
>
> Any further debugging hints?
>
> Thanks,
> CS.
>
> On 6/16/09, Cliff White  wrote:
> > Carlos Santana wrote:
> >> The '$ modprobe -l lustre*' did not show any module on a patchless
> >> client. modprobe -v returns 'FATAL: Module lustre not found'.
> >>
> >> How do I install a patchless client?
> >> I have tried lustre-client-modules and lustre-client-ver rpm packages in
> >> both sequences. Am I missing anything?
> >>
> >
> > Make sure the lustre-client-modules package matches your running kernel.
> > Run depmod -a to be sure
> > cliffw
> >
> >> Thanks,
> >> CS.
> >>
> >> On Tue, Jun 16, 2009 at 2:28 PM, Cliff White  wrote:
> >>
> >>     Carlos Santana wrote:
> >>
> >>         The lctlt ping and 'net up' failed with the following messages:
> >>         --- ---
> >>         [r...@localhost ~]# lctl ping 10.0.0.42
> >>         opening /dev/lnet failed: No such device
> >>         hint: the kernel modules ma

Re: [Lustre-discuss] Kernel bug in combination with bonding

2009-06-16 Thread Arden Wiebe

Tom:

I just reviewed my logs and found similar reports on all five of my quad-core 
machines during a stretch when I had two months of uptime.  So it did not 
cause my production machines to lock up completely and require a power cycle.  
I recently rebooted them all to do some work on the racking; otherwise I'm 
confident they would still have been running.  
[r...@ns1 ~]# uptime
 07:33:25 up 9 days, 10:44,  6 users,  load average: 0.02, 0.06, 0.08

I'll be watching more closely in the future for those messages in the logs, and 
will gauge whether there is any unresponsiveness from the network or whether the 
machines require a reboot.
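
Something as simple as this, run against each box's logs, is enough to catch 
them if they come back:

grep -i "soft lockup" /var/log/messages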

Arden


--- On Tue, 6/16/09, Tom Woezel  wrote:

> From: Tom Woezel 
> Subject: [Lustre-discuss] Kernel bug in combination with bonding
> To: lustre-discuss@lists.lustre.org
> Date: Tuesday, June 16, 2009, 3:57 AM
> Dear all,
> Currently we are running a lustre environment
> with 2 servers for MGS and MDTs and 3 OSDs, all Sun x4140
> with RedHat EL5 and Lustre 1.6.7. Recently we decided to go
> for bonding on the 3 OSDs. We bonded all 4 interfaces
> together and so far the configuration working. Today I
> recognized that one of the OSDs is showing weird behavior
> and some of the clients having problems connecting to the
> filesystem. From what I have learned so far this is a known
> kernel bug with this kernel version 
> (http://bugs.centos.org/view.php?id=3095) and
> I couldn't find a solution for this. 
> I was wondering if any of you has encountered a
> similar problem and if so, how did you fix it?
> Current Kernel is:
> [r...@sososd1 ~]# uname -a
> Linux sososd1 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> The bondig configuration:
>
> [r...@sososd1 ~]# cat /etc/modprobe.conf
> alias eth0 forcedeth
> alias eth1 forcedeth
> alias eth2 forcedeth
> alias eth3 forcedeth
> alias bond0 bonding
> options bond0 miimon=100 mode=4
> alias scsi_hostadapter aacraid
> alias scsi_hostadapter1 sata_nv
> alias scsi_hostadapter2 qla2xxx
> alias scsi_hostadapter3 usb-storage
> options lnet networks="tcp(bond0)"
>
> [r...@sososd1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
> DEVICE=bond0
> IPADDR=xxx.xxx.xxx.xxx
> NETMASK=xxx.xxx.xxx.xxx
> NETWORK=xxx.xxx.xxx.xxx
> BROADCAST=xxx.xxx.xxx.xxx
> GATEWAY=xxx.xxx.xxx.xxx
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
>
> And each of the interfaces is configured like this:
>
> [r...@sososd1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
> # nVidia Corporation MCP55 Ethernet
> DEVICE=eth0
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
> MASTER=bond0
> SLAVE=yes
>
> And this is a extract from the log file:
>
> Jun 16 04:33:38 sososd1 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [bond0:3914]
> Jun 16 04:33:38 sososd1 kernel: CPU 2:
> Jun 16 04:33:38 sososd1 kernel: Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) autofs4(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) bonding(U) dm_rdac(U) dm_round_robin(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) i2c_nforce2(U) sr_mod(U) cdrom(U) pata_acpi(U) i2c_core(U) forcedeth(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U) qla2xxx(U) scsi_transport_fc(U) sata_nv(U) libata(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> Jun 16 04:33:38 sososd1 kernel: Pid: 3914, comm: bond0 Tainted: G      2.6.18-92.1.17.el5_lustre.1.6.7smp #1
> Jun 16 04:33:38 sososd1 kernel: RIP: 0010:[]  [] .text.lock.spinlock+0x2/0x30
> Jun 16 04:33:38 sososd1 kernel: RSP: 0018:81012b993d98  EFLAGS: 0286
> Jun 16 04:33:38 sososd1 kernel: RAX: 0001 RBX: 81012b97a080 RCX: 0004
> Jun 16 04:33:38 sososd1 kernel: RDX: 81012b97a000 RSI: 81012b97a080 RDI: 81012b97a168
> Jun 16 04:33:38 sososd1 kernel: RBP: 81012b993d10 R08:  R09: 810226ad5d28
> Jun 16 04:33:38 sososd1 kernel: R10: 00fe009a R11: 810227efcae0 R12: 8005dc8e
> Jun 16 04:33:38 sososd1 kernel: R13: 81010e39d81e R14: 80076fd7 R15: 81012b993d10
> Jun 16 04:33:38 sososd1 kernel: FS:  2abdd36dc220() GS:810104159240() knlGS:f7f928d0
> Jun 16 04:33:38 sososd1 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> Jun 16 04:33:38 sososd1 kernel: CR2: 2c009000 CR3: 00201000 CR4: 06e0
> Jun 16 04:33:38 sososd1 kernel:
> Jun 16 04:33:38 sososd1 kernel: Call Trace:
> Jun 16 04:33:38 sososd1 kernel:  [] :bonding:ad_rx_machine+0x20/0x502
> Jun 16 04:33:38 sososd1 kernel:  [] :bonding:bond_3ad_lacpdu_recv+0xc1/0x1fc
> Jun 16 04:33:38 sososd1 kernel:  [] try_to_wake_up+0x407/0x418
> Jun 

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-15 Thread Arden Wiebe

Carlos:

I'm not clear on which kernel package you tried to install.  There is pretty 
much a set order to install the packages from my understanding of the wording 
in the manual.  From experience:

rpm -ivh kernel-lustre-smp-2.6.18-92.1.17.el5_lustre.1.8.0.x86_64.rpm
rpm -ivh lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm
rpm -ivh lustre-ldiskfs-3.0.8-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm
rpm -ivh lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm
rpm -Uvh e2fsprogs-1.40.11.sun1-0redhat.rhel5.x86_64.rpm

Hope that helps as that order has worked for me many times.
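
After those, reboot into the Lustre kernel and a quick check confirms the 
install before you format anything (the version string comes from the packages 
above):

uname -r              # should now report 2.6.18-92.1.17.el5_lustre.1.8.0smp
modprobe -v lustre
lctl network up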

Arden


--- On Mon, 6/15/09, Carlos Santana  wrote:

> From: Carlos Santana 
> Subject: [Lustre-discuss] Lustre installation and configuration problems
> To: lustre-discuss@lists.lustre.org
> Date: Monday, June 15, 2009, 2:07 PM
> Hello list,
> 
> I am struggling to install Lustre 1.8 on a CentOS 5.2 box.
> I am referring to Lustre manual 
> http://manual.lustre.org/index.php?title=Main_Page
> and Lustre HowTo http://wiki.lustre.org/index.php/Lustre_Howto
> guide. Following is the installation order and warning/error
> messages (if any) associated with it. 
> 
>  - kernel-lustre patch 
>  - luster-module: http://www.heypasteit.com/clip/8UJ
> 
>  - lustre-ldiskfs http://www.heypasteit.com/clip/8UK
> 
> 
>  - lustre-utilities
>  - e2fsprogs: http://www.heypasteit.com/clip/8UL
> 
> 
> I did not see any test examples under
> /usr/lib/lustre/examples directory as mentioned in the HowTo
> document. In fact, I do not have 'examples' dir at
> all. So I skipped to 
> http://wiki.lustre.org/index.php/Lustre_Howto#Using_Supplied_Configuration_Tools
> section. But I did not have lmc, lconf, and lctl commands
> either. Any clues on how should I proceed with installation
> and configuration? Is there any guide for step-by-step
> installation? Feedback/comments welcome. 
> 
> 
> Thanks,
> CS.  
> 
> 
> -Inline Attachment Follows-
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question on upgrading Lustre 1.6.6 -> 1.8.0

2009-05-17 Thread Arden Wiebe

I concur that the upgrade from 1.6 to 1.8 was as simple as upgrading the 
packages on all the nodes for the clients and the servers. 

I searched the mailing list archives, found other posts that described the same 
upgrade procedure, and did the same.  It was a snap.  The upgrade of four 
machines was complete in under half an hour, minus the tune2fs of course.

[r...@ns2 ~]# uname -rv
2.6.18-92.1.17.el5_lustre.1.8.0smp #1 SMP Thu Mar 5 17:41:12 MST 2009
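
A rough sketch of that package step, assuming the same RPM set listed elsewhere 
in this archive for a fresh 1.8.0 install (unmount clients and targets first, 
then reboot into the new kernel before remounting):

rpm -Uvh kernel-lustre-smp-2.6.18-92.1.17.el5_lustre.1.8.0.x86_64.rpm \
         lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm \
         lustre-ldiskfs-3.0.8-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm \
         lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp.x86_64.rpm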

--- On Sun, 5/17/09, thhsieh  wrote:

> From: thhsieh 
> Subject: [Lustre-discuss] Question on upgrading Lustre 1.6.6 -> 1.8.0
> To: lustre-discuss@lists.lustre.org
> Date: Sunday, May 17, 2009, 1:33 AM
> Dear All,
> 
> I have read the description of Lustre Operation Guide for
> version
> 1.8. But I am still not very sure about the exact
> procedures to
> upgrade from version 1.6.6 to version 1.8.0. Now I try to
> write up
> a plan of upgrading. Please give me your kindly comments on
> my
> procedures. :)
> 
> In our system, we have three Lustre filesystems (they are
> all version
> 1.6.6, for all the MGS, MDT, OST, and clients), which are
> configured
> in the following:
> 
> 1. fsname="chome"
>    MGS: qa1:/dev/sda5
>    MDT: qa1:/dev/sda5  (i.e., exactly
> same disk partition as MGS)
>    OST: qaX:/dev/sdaX  (distributed in
> several OST nodes)
> 
> 2. fsname="cwork"
>    MGS: qa1:/dev/sda5  (shared with
> that of "chome")
>    MDT: qa1:/dev/sda6
>    OST: qaY:/dev/sdaY  (distributed in
> several OST nodes)
> 
> 3. fsname="cwork1"
>    MGS: qa1:/dev/sda5  (shared with
> that of "chome")
>    MDT: qa1:/dev/sda7
>    OST: qaZ:/dev/sdaZ  (distributed in
> several OST nodes)
> 
> We do not have failover configurations in all the
> filesystems.
> 
> I am planing to shutdown all the Lustre filesystems, and
> then perform the
> upgrading, and finally startup them. I guess that would be
> simpler. The
> exact procedures I am going to do are:
> 
> 1. For each of the Lustre filesystems, I will perform the
> following
>    shutdown procedures (chome should be the
> last one to shutdown, since
>    it share the MDT and MGS in the same
> partition):
>    - umount all clients
>    - umount all OSTs
>    - umount MDT
> 
> 2. Install the new Lustre-1.8 software and modules and
> reboot all the
>    nodes. Then I will upgrade "chome" first,
> and then "cwork", and
>    finally "cwork1".
> 
> 3. Upgrade MGS and "MDT for chome":
>    
>    qa1# tunefs.lustre --mgs --mdt
> --fsname=chome /dev/sda5
> 
> 4. Upgrade OSTs for chome:
> 
>    qaX# tunefs.lustre --ost --fsname=chome
> --mgsnode=qa1 /dev/sdaX
> 
>    Up to this point the "chome" part should
> be ready, I guess.
> 
> 
> 5. Now the MDT for "cwork". The manual says that we should
> copy the MDT
>    and client startup logs from the MDT to
> the MGS, so I guess that I should
> 
>    - Mount MGS as ldiskfs:
>      qa1# mount -t ldiskfs /dev/sda5
> /mnt
> 
>    - Run script "lustre_up14" on the MDT of
> "cwork" partition:
>      qa1# lustre_up14 /dev/sda6 cwork
> 
>      then I will get the following
> files:
>      /tmp/logs/cwork-client
>      /tmp/logs/cwork-MDT
> 
>    - Copy these log files to /mnt/CONFIGS/
> 
>    - Umount MGS:
>      qa1# umount /mnt
> 
>    - Upgrade the MDT:
>      qa1# tunefs.lustre --mdt --nomgs
> --fsname=cwork --mgsnode=qa1 /dev/sda6
> 
> 
> 6. Now the OSTs for "cwork":
> 
>    qaY# tunefs.lustre --ost --fsname=cwork1
> --mgsnode=qa1 /dev/sdaY
> 
>    Up to now the filesystem "cwork" should
> be ready.
> 
> 
> 7. For the MDT and OSTs for "cwork1", we can follow the
> same procedures
>    as step 6 and 7.
> 
> 8. Start up the new Lustre filesystems:
> 
>    For chome:
>    qa1# mount -t lustre /dev/sda5
> /cfs/chome_mdt
>    qaX# mount -t lustre /dev/sdaX
> /cfs/chome_ostX
>    mount the clients
> 
>    for cwork:
>    qa1# mount -t lustre /dev/sda6
> /cfs/cwork_mdt
>    qaY# mount -t lustre /dev/sdaY
> /cfs/cwork_ostY
>    mount the clients
> 
>    for cwork1:
>    qa1# mount -t lustre /dev/sda7
> /cfs/cwork1_mdt
>    qaZ# mount -t lustre /dev/sdaZ
> /cfs/cwork1_ostZ
>    mount the clients
> 
> 
> Please kindly give me your comments. Thanks very much.
> 
> 
> Best Regards,
> 
> T.H.Hsieh
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-10 Thread Arden Wiebe

Mag, you're welcome. From the first page returned by a search for "Linux 
bonding", it states:

How many bonding devices can I have?
There is no limit.

How many slaves can a bonding device have?
This is limited only by the number of network interfaces Linux supports and/or 
the number of network cards you can place in your system.

--- On Sun, 5/10/09, Mag Gam  wrote:

> From: Mag Gam 
> Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 
>  1.8
> To: "Arden Wiebe" 
> Cc: "Andreas Dilger" , "Michael Ruepp" 
> , lustre-discuss@lists.lustre.org
> Date: Sunday, May 10, 2009, 5:48 AM
> Thanks for the screen shot Arden.
> 
> What is the maximum # of slaves you can have on a bonded
> interface?
> 
> 
> 
> On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe 
> wrote:
> >
> > Bond0 knows which interface to utilize because all the
> other eth0-5 are designated as slaves in their configuration
> files.  The manual is fairly clear on that.
> >
> > In the screenshot the memory used in gnome system
> monitor is at 452.4 MiB of 7.8 GiB and the sustained
> bandwidth to the OSS and OST is 404.2 MiB/s which
> corresponds roughly to what collectl is showing for KBWrite
> for Disks.  Collectl shows a few different results for
> Disks, Network and Lustre OST and I believe it to be
> measuring the other OST on the network around 170MiB/s if
> you view the other screenshot for OST1 or lustrethree.
> >
> > In the screenshots Lustreone=MGS Lustretwo=MDT
> Lustrethree=OSS+raid10 target Lustrefour=OSS+raid10 target
> >
> > To help clarify the entire network and stress testing
> I did with all the clients I could give it is at
> www.ioio.ca/Lustre-tcp-bonding/images/html and
> www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html
> >
> > Proper benchmarking would be nice though as I just hit
> it with everything I could and it lived so I was happy. I
> found the manual to be lacking in benchmarking and really
> wanted to make nice graphs of it all but failed with iozone
> to do so for some reason.
> >
> > I'll be taking a run at upgrading everything to 1.8 in
> the coming week or so and when I do I'll grab some new
> screenshots and post the relevant items to the wiki.
>  Otherwise if someone else wants to post the existing
> screenshots your welcome to use them as they do detail a
> ground up build. Apparently 1.8 is great with small files
> now so it should work even better with
> www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
> >
> >
> > --- On Sat, 5/9/09, Andreas Dilger 
> wrote:
> >
> >> From: Andreas Dilger 
> >> Subject: Re: [Lustre-discuss] tcp network load
> balancing understanding lustre 1.8
> >> To: "Arden Wiebe" 
> >> Cc: lustre-discuss@lists.lustre.org,
> "Michael Ruepp" 
> >> Date: Saturday, May 9, 2009, 11:31 AM
> >> On May 09, 2009  09:18 -0700,
> >> Arden Wiebe wrote:
> >> > This might help answer some questions.
> >> > http://ioio.ca/Lustre-tcp-bonding/OST2.png which shows
> >> my mostly not
> >> > tuned OSS and OST's pulling 400+MiB/s over
> TCP Bonding
> >> provided by the
> >> > kernel complete with a cat of the
> modeprobe.conf
> >> file.  You have the other
> >> > links I've sent you but the picture above is
> relevant
> >> to your questions.
> >>
> >> Arden, thanks for sharing this info.  Any chance
> you
> >> could post it to
> >> wiki.lustre.org?  It would seem there is one bit
> of
> >> info missing somewhere -
> >> how does bond0 know which interfaces to use?
> >>
> >>
> >> Also, another oddity - the network monitor is
> showing
> >> 450MiB/s Received,
> >> yet the disk is showing only about 170MiB/s going
> to the
> >> disk.  Either
> >> something is wacky with the monitoring (e.g. it is
> counting
> >> Received for
> >> both the eth* networks AND bond0), or Lustre is
> doing
> >> something very
> >> wierd and retransmitting the bulk data like crazy
> (seems
> >> unlikely).
> >>
> >>
> >> > --- On Thu, 5/7/09, Michael Ruepp 
> >> wrote:
> >> >
> >> > > From: Michael Ruepp 
> >> > > Subject: [Lustre-discuss] tcp network
> load
> >> balancing understanding lustre 1.8
> >> > > To: lustre-discuss@lists.lustre.org
> >> > > Date: Thursday, May 7, 2009, 5:50 AM
> >> > > Hi there,
> >> > >

Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-09 Thread Arden Wiebe

Bond0 knows which interface to utilize because all the other eth0-5 are 
designated as slaves in their configuration files.  The manual is fairly clear 
on that.  
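
For reference, the relevant pieces look roughly like this (a sketch only - the 
bonding mode, miimon value and interface names are examples, not my exact files):

# /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise for each slave, eth1-eth5)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/modprobe.conf
alias bond0 bonding
options bond0 miimon=100 mode=4
options lnet networks="tcp0(bond0)"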

In the screenshot the memory used in gnome system monitor is at 452.4 MiB of 
7.8 GiB and the sustained bandwidth to the OSS and OST is 404.2 MiB/s which 
corresponds roughly to what collectl is showing for KBWrite for Disks.  
Collectl shows a few different results for Disks, Network and Lustre OST and I 
believe it to be measuring the other OST on the network around 170MiB/s if you 
view the other screenshot for OST1 or lustrethree.  

In the screenshots Lustreone=MGS Lustretwo=MDT Lustrethree=OSS+raid10 target 
Lustrefour=OSS+raid10 target

To help clarify the entire network, and the stress testing I did with all the 
clients I could give it, see www.ioio.ca/Lustre-tcp-bonding/images.html and 
www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Proper benchmarking would be nice, though; I just hit it with everything I 
could and it lived, so I was happy.  I found the manual to be lacking on 
benchmarking, and I really wanted to make nice graphs of it all, but for some 
reason I failed to do so with iozone.

I'll be taking a run at upgrading everything to 1.8 in the coming week or so, 
and when I do I'll grab some new screenshots and post the relevant items to the 
wiki.  Otherwise, if someone else wants to post the existing screenshots, you're 
welcome to use them, as they do detail a ground-up build.  Apparently 1.8 is 
much better with small files now, so it should work even better with 
www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
 

--- On Sat, 5/9/09, Andreas Dilger  wrote:

> From: Andreas Dilger 
> Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 
> 1.8
> To: "Arden Wiebe" 
> Cc: lustre-discuss@lists.lustre.org, "Michael Ruepp" 
> Date: Saturday, May 9, 2009, 11:31 AM
> On May 09, 2009  09:18 -0700,
> Arden Wiebe wrote:
> > This might help answer some questions.
> > http://ioio.ca/Lustre-tcp-bonding/OST2.png which shows
> my mostly not
> > tuned OSS and OST's pulling 400+MiB/s over TCP Bonding
> provided by the
> > kernel complete with a cat of the modeprobe.conf
> file.  You have the other
> > links I've sent you but the picture above is relevant
> to your questions.
> 
> Arden, thanks for sharing this info.  Any chance you
> could post it to 
> wiki.lustre.org?  It would seem there is one bit of
> info missing somewhere -
> how does bond0 know which interfaces to use? 
> 
> 
> Also, another oddity - the network monitor is showing
> 450MiB/s Received,
> yet the disk is showing only about 170MiB/s going to the
> disk.  Either
> something is wacky with the monitoring (e.g. it is counting
> Received for
> both the eth* networks AND bond0), or Lustre is doing
> something very
> wierd and retransmitting the bulk data like crazy (seems
> unlikely).
> 
> 
> > --- On Thu, 5/7/09, Michael Ruepp 
> wrote:
> > 
> > > From: Michael Ruepp 
> > > Subject: [Lustre-discuss] tcp network load
> balancing understanding lustre 1.8
> > > To: lustre-discuss@lists.lustre.org
> > > Date: Thursday, May 7, 2009, 5:50 AM
> > > Hi there,
> > > 
> > > I am configured a simple tcp lustre 1.8 with one
> mdc (one
> > > nic) and two  
> > > oss (four nic per oss)
> > > As well as in the 1.6 documentation, the
> multihomed
> > > sections is a  
> > > little bit unclear to me.
> > > 
> > > I give every NID a IP in the same subnet, eg:
> > > 10.111.20.35-38 - oss0  
> > > and 10.111.20.39-42 oss1
> > > 
> > > Do I have to make modprobe.conf.local look like
> this to
> > > force lustre  
> > > to use all four interfaces parallel:
> > > 
> > > options lnet networks=tcp0(eth0,eth1,eth2,eth3)
> > > Because on Page 138 the 1.8 Manual says:
> > > "Note – In the case of TCP-only clients, the
> first
> > > available non- 
> > > loopback IP interface
> > > is used for tcp0 since the interfaces are not
> specified. "
> > > 
> > > or do I have to specify it like this:
> > > options lnet networks=tcp
> > > Because on Page 112 the lustre 1.6 Manual says:
> > > "Note – In the case of TCP-only clients, all
> available IP
> > > interfaces  
> > > are used for tcp0
> > > since the interfaces are not specified. If there
> is more
> > > than one, the  
> > > IP of the first one
> > > found is used to construct the tcp0 ID."
> > > 
> > > Which is the opposite of the 1.8 Manual
> &g

Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-09 Thread Arden Wiebe

Michael,

This might help answer some questions: 
http://ioio.ca/Lustre-tcp-bonding/OST2.png shows my mostly untuned OSS and 
OSTs pulling 400+MiB/s over TCP bonding provided by the kernel, complete 
with a cat of the modprobe.conf file.  You have the other links I've sent you, 
but the picture above is relevant to your questions. 

Arden

--- On Thu, 5/7/09, Michael Ruepp  wrote:

> From: Michael Ruepp 
> Subject: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> To: lustre-discuss@lists.lustre.org
> Date: Thursday, May 7, 2009, 5:50 AM
> Hi there,
> 
> I have configured a simple tcp Lustre 1.8 setup with one mdc (one
> nic) and two  
> oss (four nic per oss)
> As in the 1.6 documentation, the multihomed
> section is a little bit unclear to me.
> 
> I give every NID an IP in the same subnet, e.g.:
> 10.111.20.35-38 - oss0  
> and 10.111.20.39-42 oss1
> 
> Do I have to make modprobe.conf.local look like this to
> force lustre  
> to use all four interfaces parallel:
> 
> options lnet networks=tcp0(eth0,eth1,eth2,eth3)
> Because on Page 138 the 1.8 Manual says:
> "Note – In the case of TCP-only clients, the first
> available non- 
> loopback IP interface
> is used for tcp0 since the interfaces are not specified. "
> 
> or do I have to specify it like this:
> options lnet networks=tcp
> Because on Page 112 the lustre 1.6 Manual says:
> "Note – In the case of TCP-only clients, all available IP
> interfaces  
> are used for tcp0
> since the interfaces are not specified. If there is more
> than one, the  
> IP of the first one
> found is used to construct the tcp0 ID."
> 
> Which is the opposite of the 1.8 Manual
> 
> My goal is to let Lustre utilize all four Gb links
> parallel. And my  
> Lustre Clients are equipped with two Gb links which should
> be utilized  
> by the lustre clients as well (eth0, eth1)
> 
> Or is bonding the better solution in terms of performance?
> 
> Thanks very much for input,
> 
> Michael Ruepp
> Schwarzfilm AG
> 
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
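
For reference, the two modprobe.conf variants being weighed above would look 
roughly like this; a sketch only, not a tested recommendation:

# multi-rail style: ksocklnd drives all four interfaces for tcp0 itself
options lnet networks=tcp0(eth0,eth1,eth2,eth3)

# bonded style: build bond0 with the kernel bonding driver first, then hand
# LNET the single bonded interface
options lnet networks=tcp0(bond0)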


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre for web application

2009-02-19 Thread Arden Wiebe
While not typically suited to the small files involved in serving web pages, 
Lustre works for my purposes.  I have not mounted the document root on the 
Lustre mount points, but I may eventually pursue this.  I may also symlink user 
directories directly to the mount points.  What I definitely use the mount 
points for is backing up database files and storing images.

The two OSS nodes I have are now routers each with their own static IP address. 
 They do network address translation and port forward 80 internally to the 
webservers that live on the MGS and MDT nodes.  You can see the webservers at 
http://www.linuxguru.ca/phpsysinfo or http://www.oil-gas.ca/phpsysinfo

All in all it was well worth the effort, as the internal network is 400% faster 
due to the network interface bonding.  I can now load a 3-table MySQL database 
in phpMyAdmin with no problems from either of the Apache/MySQL servers to a 
local machine on the network.  Re-posted in plain text.
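
The NAT/port-forward rules described here were not posted; a rough sketch of 
what they typically look like with iptables, with the external interface and the 
internal web server address as placeholders:

# forward inbound port 80 on the public interface to the internal web server
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 192.168.0.7:80
iptables -A FORWARD -p tcp -d 192.168.0.7 --dport 80 -j ACCEPT
# masquerade traffic leaving on the public interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE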




  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre no longer allows reads/writes (stopped working)?

2009-01-30 Thread Arden Wiebe




>I have setup a lustre system for testing consisting of four OST's and one
>MDT. It seems to work fine for about a day. At the end of about 24 hours,
>the clients can no longer read or write the mount point (although a file
>listing (ls) works). 

That is the problem.  Your clients are mounting incorrectly; you have used 
incorrect formatting of the nodes.

>For example, a mkdir yields a "cannot create directory
>'/datafs/temp': Identifier removed", and the temp dir does not exist. 
>A file listing of the /datafs directory comes back complete and correct,
>but if I try to ls a subdirectory it gives me the error "ls: /datafs 
>>/test2:
>Identifier removed". 

Please review via your bash history the exact commands you used to make the 
underlying filesystem.  Be certain everything is pointing to the correct 
filesystem and to the correct directories.

>The client is mounting the dir to /datafs. This worked fine eariler, I >left
>for the day, came back in and this error is occurring on all clients >(albeit
>I only have three clients for testing). All clients/servers are running
>RHEL5, and the lustre was installed via rpms as per the manual. 

If you followed the manual 100% (it takes practice), the client should be 
mounting your combined MDT/MGS node at that node's IP address over your network, 
for example via tcp0, on a local mount point such as /mnt/datafs.

I found it helped to change the manual's example filesystem names (datafs, 
testfs, spfs) to something of your own; in your case I would recommend the name 
litefs.  Also, there are some ambiguities in the examples around slashes and, I 
might add, around the use or misuse of the = sign after fsname.

By far the best example is further into the manual, in the section about 
mounting external journals.  Also, from everything I have read it is best to 
keep the MGS and MDT separate.  Otherwise, on your combined MDT/MGS node you 
must have two mount points, /mnt/mgs and /mnt/data/mdt.  

>Out of curiosity, if I go to the server and do an ls on /mnt/data/mdt or
>to the OST server and do an ls on /mnt/data/ost1, I get an error that
>it is not a directory (although that could be normal, I am not sure). 

Yes that is normal because those are mount points not directories.

>A cat of /proc/fs/lustre/devices on the mdt does not show anything out >of 
>place
>(or at least, it is the same as when I started the lustre and mounted
>the servers/clients) 

So we assume your combined MDT/MGS is up and running, but is it formatted and 
mounted properly?

>I have configured it all according to 
>http://manual.lustre.org/manual/LustreManual16_HTML 
>>/ConfiguringLustreExamples.html#50548848_pgfId-1286919
>as per section 6.1.1.2 Configuration Generation and Application, using >one 
>server
>for the MGT and MDS, and I have four OSTs, just like the example. 

>Has anyone seen this before? 

Yes, and it is common until you have created enough Lustre filesystems to know 
which formatting and mounting procedures interact to make a live filesystem, and 
until you settle on a procedure you know to be sound.

Robert, to simplify things I'll include some of my .bash_history from the nodes 
for you to examine.  This should considerably shorten your initial 
configuration time.

My configuration differs in that I opt for a separate MGS and MDT.  This first 
history is obviously from the MGS.

umount /mnt/mgs
mdadm -S /dev/md2
mdadm -S /dev/md1
mdadm -S /dev/md0
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --zero-superblock /dev/sde
mdadm --zero-superblock /dev/sdf
mdadm -v --create --assume-clean /dev/md0 --level=raid10 --raid-devices=4 
/dev/sdb /dev/sdc /dev/sdd /dev/sde
sfdisk -uC /dev/sdf << EOF
mke2fs -b 4096 -O journal_dev /dev/sdf1
cat /proc/mdstat
mkfs.lustre --mgs --fsname=ioio --mkfsoptions="-J device=/dev/sdf1" --reformat 
/dev/md0
rm /etc/mdadm.conf
mdadm --detail --scan --verbose > /etc/mdadm.conf
mount -t lustre /dev/md0 /mnt/mgs
e2label /dev/md0
vi /etc/fstab
e2label /dev/md0
cat /proc/mdstat
mount -t lustre 192.168@tcp0:/ioio /mnt/ioio
lctl dl
lfs df -h

This shows a single MGS with an external journal on /dev/sdf1.  The MGS is 
mounted on /mnt/mgs from the /dev/md0 device; its e2label will be MGS, which 
goes into /etc/fstab as LABEL=MGS followed by the mount options.  At the end you 
can see I mount a client against the MGS to test the filesystem, but only after 
the MDT and the OSSes are mounted.

On the MDT

umount /mnt/data/mdt
mdadm -S /dev/md2
mdadm -S /dev/md0
mdadm -S /dev/md1
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --zero-superblock /dev/sde
mdadm --zero-superblock /dev/sdf
mdadm -v --create --assume-clean /dev/md0 --level=raid10 --raid-devices=4 
/dev/sdb /dev/sdc /dev/sdd /dev/sde
sfdisk -uC /dev/sdf << EOF
mke2fs -b 4096 -O journal_dev /dev/sdf1
cat /proc/mdstat
mkfs.lustre --mdt --fsname=ioio --mgsnode=192.168@tcp0 --mkfsoptions="-J 
device=/dev/sdf1" --reformat /dev/md0
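
The history is cut off here in the archive.  For completeness, the matching 
steps on an OSS would look roughly like the following sketch; device names and 
the MGS NID are placeholders patterned on the commands above, not the poster's 
actual history:

mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.x@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
mount -t lustre /dev/md0 /mnt/data/ost0
# and back on the MDT node, once formatted:
mount -t lustre /dev/md0 /mnt/data/mdt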

Re: [Lustre-discuss] Plateau around 200MiB/s bond0

2009-01-28 Thread Arden Wiebe
 
With a 6xGigE bond0, can I test in any other way to get past the 412MiB/s 
plateau?  How do I best interpret these results?
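
The iozone line quoted below (where "-il" is presumably "-i 1") can be run from 
several clients at once against the Lustre mount; summing the per-client figures 
is the usual way to read aggregate throughput.  A sketch, with sizes picked only 
for illustration:

cd /mnt/ioio
# one thread, sequential write (-i 0) then read (-i 1), 4 MB records, 2 GB file
iozone -t 1 -i 0 -i 1 -r 4m -s 2g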

--- On Wed, 1/28/09, Jeremy Mann  wrote:

From: Jeremy Mann 
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
To: "Arden Wiebe" 
Cc: "lustre-discuss@lists.lustre.org" 
Date: Wednesday, January 28, 2009, 1:56 PM

Arden, we also use dual channel gigE (bond0) and in my tests found that
this works best:

options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone
command I used:

 iozone -t1 -i0 -il -r4m -s2g

You will not get anymore performance unless you move to Infiniband or
another interconnect.

Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST? Given
> that you are using 802.3ad link aggregation, it will determine the
> physical NIC by the other party's MAC address. So having multiple OST and
> multiple clients will improve the chances of using more than one NIC of
> the bonding.
>
> What is the maximum performance you obtain on the client with two 1GbE?
>
> jeff
>
>
>
>
> 
> From: lustre-discuss-boun...@lists.lustre.org
> [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Arden Wiebe
> Sent: Sunday, January 25, 2009 12:08 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
>
> So if one OST gets 200MiB/s and another OST gets 200MiB/s does that make
> 400 MiB/s or this is not how to calculate throughput?  I will eventually
> plug the right sequence into iozone to measure it.
>
> From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png
> ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png
>
> --- On Sat, 1/24/09, Arden Wiebe  wrote:
>
> From: Arden Wiebe 
> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> To: lustre-discuss@lists.lustre.org
> Date: Saturday, January 24, 2009, 6:04 PM
>
> 1-2948-SFP Plus Baseline 3Com Switch
> 1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
> 1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
> 2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
> 1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
> 1-CLIENT bond0(eth0,eth1)
> 1-CLIENT eth0
> 1-CLIENT eth0
>
> I fail so far creating external journal for MDT, MGS and OSSx2.  How to
> add the external journal to /etc/fstab specifically the output of e2label
> /dev/sdb followed by what options for fstab?
>
> [r...@lustreone ~]# cat /proc/fs/lustre/devices
>   0 UP mgs MGS MGS 17
>   1 UP mgc mgc192.168@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
>   2 UP lov ioio-clilov-810209363c00
> 7307490a-4a12-4e8c-56ea-448e030a82e4 4
>   3 UP mdc ioio-MDT-mdc-810209363c00
> 7307490a-4a12-4e8c-56ea-448e030a82e4 5
>   4 UP osc ioio-OST-osc-810209363c00
> 7307490a-4a12-4e8c-56ea-448e030a82e4 5
>   5 UP osc ioio-OST0001-osc-810209363c00
> 7307490a-4a12-4e8c-56ea-448e030a82e4 5
> [r...@lustreone ~]# lfs df -h
> UUID                     bytes      Used Available  Use% Mounted on
> ioio-MDT_UUID       815.0G    534.0M    767.9G    0% /mnt/ioio[MDT:0]
> ioio-OST_UUID         3.6T     28.4G      3.4T    0% /mnt/ioio[OST:0]
> ioio-OST0001_UUID         3.6T     18.0G      3.4T    0% /mnt/ioio[OST:1]
>
> filesystem summary:       7.2T     46.4G      6.8T    0% /mnt/ioio
>
> [r...@lustreone ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
>
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> 802.3ad info
> LACP rate: slow
> Active Aggregator Info:
>         Aggregator ID: 1
>         Number of ports: 1
>         Actor Key: 17
>         Partner Key: 1
>         Partner Mac Address: 00:00:00:00:00:00
>
> Slave Interface: eth0
> MII Status: up
> Link Failure Count: 1
> Permanent HW addr: 00:1b:21:28:77:db
> Aggregator ID: 1
>
> Slave Interface: eth1
> MII Status: up
> Link Failure Count: 1
> Permanent HW addr: 00:1b:21:28:77:6c
> Aggregator ID: 2
>
> Slave Interface: eth3
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:94
> Aggregator ID: 3
>
> Slave Interface: eth2
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:93
> Aggregator ID: 4
>
> Slave Interface: eth4
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:22:15:06:3a:95
> Aggregator ID: 5
>
> Slave Interface: eth5
> MII Status: up

Re: [Lustre-discuss] can't find mdc_open_data, but the close succeeded

2009-01-28 Thread Arden Wiebe
Please read the last paragraph before implementing any suggestions below.

I would say you have an interconnect problem with your MGS node.  Examine any 
indicators of port state for the given NID 10.13.32.12, note any deficiencies, 
and correct them.  Don't overlook the switching or cabling.  As a remedy, 
restart the network on the 10.13.32.12 node while tailing your logs and watching 
for signs of reconnection.  It sometimes takes a while for all clients to 
reconnect after changing network settings.

I produced similar errors when the MTU of an interface was incompatible with 
lower MTU values elsewhere, and also with an incorrectly configured switch.  
Optionally you can unload the Lustre modules (with rmmod, I believe).  Failing 
that, reboot or fail over the node and observe the message logs on the MGS, MDT 
and clients as the node comes back up.  Continue to change network settings or 
hardware until communication is restored.  Check the switch configuration.  

Your logs clearly show your MGS node with intermittent network connectivity, 
leading to incomplete communication between the MDT and the client nodes.  That 
causes client node005 to hang, as it never gets to fully reconnect or disconnect 
as the MDT is telling it to in this case.

Consider the filesystem safe across a reboot of the MGS, but on reboot (and even 
before) examine any and all partitions or RAID arrays and be sure they are 
mounted appropriately.  Issue lctl dl immediately after reboot to verify the MGS 
mounted correctly and to rule out basic mounting errors.  Mount a client 
directly against the MGS; if you can't connect, you know it is network 
related.  

Also, if 10.13.32.12 is your failover MGS, pull the plug on it, as it is not 
configured properly, whether due to a switching, cabling or network 
configuration problem.  If focusing on the MGS does not solve it, carry on down 
the line to the MDT and look for anything out of the ordinary, like an unmounted 
RAID array or network configuration deficiencies.  Hard-reboot client nodes you 
suspect are hung.  Check the switch configuration data.

That is my interpretation.  Take it for what it is worth, but understand this: 
I have very limited experience with Lustre.
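
A few of the checks described above, spelled out as commands; a sketch only, 
with the NID taken from the quoted logs and the interface name as a placeholder:

# confirm the interface and its MTU
ip link show bond0
# on an Ethernet/jumbo-frame setup, check that a full-size frame passes unfragmented
ping -M do -s 8972 10.13.32.12
# check LNET reachability of the suspect NID, and list the local NIDs
lctl ping 10.13.32.12@o2ib
lctl list_nids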
 
--- On Wed, 1/28/09, Nico.Budewitz  wrote:

From: Nico.Budewitz 
Subject: [Lustre-discuss] can't find mdc_open_data, but the close succeeded
To: lustre-discuss@lists.lustre.org, "Nico Budewitz" 
Date: Wednesday, January 28, 2009, 2:14 AM

Hi,

I had strange problems this morning:

A parallel application  on 256 cores has been started and recovered 
512GB from the last checkpoint this morning at 4.03. This application 
had to read the last checkpoint (256x2GB) from lustre into the main memory.

It seems that the filesystem was hanging for a while and other 
applications which were using lustre at the same time could not open 
files anymore and crashed instantly .

I received this error messages on the master node of the 256 core job:

Jan 28 04:04:07 node005 kernel: Lustre: Request x1138068 sent from 
aeifs2-MDT-mdc-81021db45400 to NID 10.13.32...@o2ib 8s ago has 
timed out (limit 8s).
Jan 28 04:04:24 node005 kernel: Lustre: 
8957:0:(import.c:410:import_select_connection()) 
aeifs2-MDT-mdc-81021db45400: tried all connections, increasing 
latenc
y to 6s
Jan 28 04:04:24 node005 kernel: Lustre: Changing connection for 
aeifs2-MDT-mdc-81021db45400 to 10.13.32...@o2ib/10.13.32...@o2ib
Jan 28 04:04:24 node005 kernel: LustreError: 167-0: This client was 
evicted by aeifs2-MDT; in progress operations using this service 
will fail.
Jan 28 04:04:24 node005 kernel: LustreError: 
9022:0:(file.c:116:ll_close_inode_openhandle()) inode 4262393 mdc close 
failed: rc = -5
Jan 28 04:04:24 node005 kernel: LustreError: 
10503:0:(client.c:722:ptlrpc_import_delay_req()) @@@ IMP_INVALID  
r...@8101f337be00 x1138066/t0 o35->aeifs2-MDT_UU
i...@10.13.32.11@o2ib:23/10 lens 296/1248 e 0 to 6 dl 0 ref 1 fl Rpc:/0/0 
rc 0/0
Jan 28 04:04:24 node005 kernel: LustreError: 
9022:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous 
similar message
Jan 28 04:04:24 node005 kernel: LustreError: 
15629:0:(mdc_locks.c:598:mdc_enqueue()) ldlm_cli_enqueue: -5
Jan 28 04:04:24 node005 kernel: Lustre: 
aeifs2-MDT-mdc-81021db45400: Connection restored to service 
aeifs2-MDT using nid 10.13.32...@o2ib.
Jan 28 04:04:25 node005 kernel: LustreError: 
15629:0:(mdc_request.c:741:mdc_close()) Unexpected: can't find 
mdc_open_data, but the close succeeded.  Please tell .
Jan 28 04:04:25 node005 kernel: LustreError: 
10503:0:(mdc_request.c:741:mdc_close()) Unexpected: can't find 
mdc_open_data, but the close succeeded.  Please tell .
Jan 28 04:04:26 node005 kernel: LustreError: 
15629:0:(mdc_request.c:741:mdc_close()) Unexpected: can't find 
mdc_open_data, but the close succeeded.  Please tell .
Jan 28 04:04:27 node005 kernel: LustreError: 11-0: an error occurred 
while communicating with 10.13.32...@o2ib. The mds_close operation 
failed with -116
Jan 28 04:04:27 node005 ker

Re: [Lustre-discuss] Performance Expectations of Lustre

2009-01-28 Thread Arden Wiebe
Nick:

On another note, I just had to run mysqlcheck -p --auto-repair on a 23266-table 
database tonight, so it is probably not a good idea to do direct copies of 
/var/lib/mysql to the Lustre filesystem.  Correlated or not, it would be better 
to mysqldump there instead.
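
A sketch of the mysqldump-to-Lustre approach being suggested; paths and options 
are illustrative, and --single-transaction assumes InnoDB tables:

mysqldump -u root -p --all-databases --single-transaction | gzip > /mnt/ioio/backups/mysql-$(date +%F).sql.gz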

Ardently;

Arden Wiebe

--- On Mon, 1/26/09, Nick Jennings  wrote:

From: Nick Jennings 
Subject: [Lustre-discuss] Performance Expectations of Lustre
To: lustre-discuss@lists.lustre.org
Date: Monday, January 26, 2009, 7:51 AM

Hello (and a special hello to all my ex-co-workers from the CFS days :)

  The company where I work now has grown fast in the past year and we 
suddenly find ourselves in need of a lot of storage. For 5 years the 
company ran on a 60gig server, last year we got a 1TB RAID that is now 
almost full. In 1-2 years we could easily be using 10-15TB of storage.

  Instead of just adding another 1TB server, I need to plan for a more 
scalable solution. Immediately Lustre came to mind, but I'm wondering 
about the performance. Basically our company does niche web-hosting for 
"Creative Professionals" so we need fast access to the data in order to 
have snappy web services for our clients. Typically these are smaller 
files (2MB pictures, 50MB videos, .swf files, etc.).

  Also I'm wondering about the best way set this up in terms of speed 
and ease of growth. I want the web-servers and the storage pool to be 
independent of each other. So I can add web-servers as the web traffic 
increases, and add more storage ass our storage needs grow. We have the 
option of an MD3000 or MD3000i for back-end storage.

  I was thinking initially we could start with 2 servers, both attached 
to the storage array. setup as OSS' and functioning as (load balanced) 
web-servers as well. In the future I could separate this out so that we 
have the web-servers on the "front line" mounting the data from the OSS' 
which will be on a private (gigE) network.

  Now, it's been years since I've played with Lustre, I'm sure some 
stuff will come back to me as I start using it again, other things I'll 
probably have to re-learn. I wanted to get some input from the Lustre 
community on whether or not this seems like a reasonable use for Lustre? 
Are there alternatives out there which might fit my needs more? 
(specifically speed and a shared storage pool). Also, what kind of 
performance can I expect, am I out of touch to expect something similar 
to a directly attached RAID array?

  I appreciate any and all feedback, suggestions, comments etc.

Thanks,
- Nick

--
Nick Jennings
Senior Programmer & Systems Administrator
Creative Motion Design
n...@creativemotiondesign.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Performance Expectations of Lustre

2009-01-28 Thread Arden Wiebe
Nick:

In case I mixed up the capitalization in the links I sent you, 
http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html and 
http://www.ioio.ca/Lustre-tcp-bonding/images.html should work.  Go easy on my 
old girl: she only has one processor and is a complete hack put together to 
recover data after the root "stroke" and jail riot last year on the main drive, 
which I couldn't salvage.  Pity it held the only copy of the code I needed 
yesterday.

Aside from the webserver it is served from, it should give a pretty clear 
visual of how far you can take it and for roughly how much TCO.  Again, the 
points about tuning for small file sizes as best as possible stand.  If you 
would like to see a specific small-file benchmark from some angle, I would do my 
best to produce it if you tell me what to write.
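
In the meantime, a trivial small-file exercise of the kind that could be run 
from a client on the mount point; file count and sizes are arbitrary:

mkdir -p /mnt/ioio/smallfile-test && cd /mnt/ioio/smallfile-test
# create 10,000 files of 4 KB and time it, then time reading them back
time bash -c 'for i in $(seq 1 10000); do dd if=/dev/zero of=f$i bs=4k count=1 2>/dev/null; done'
time cat f* > /dev/null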

Ardently;

Arden Wiebe  

 
--- On Mon, 1/26/09, Nick Jennings  wrote:

From: Nick Jennings 
Subject: [Lustre-discuss] Performance Expectations of Lustre
To: lustre-discuss@lists.lustre.org
Date: Monday, January 26, 2009, 7:51 AM

Hello (and a special hello to all my ex-co-workers from the CFS days :)

  The company where I work now has grown fast in the past year and we 
suddenly find ourselves in need of a lot of storage. For 5 years the 
company ran on a 60gig server, last year we got a 1TB RAID that is now 
almost full. In 1-2 years we could easily be using 10-15TB of storage.

  Instead of just adding another 1TB server, I need to plan for a more 
scalable solution. Immediately Lustre came to mind, but I'm wondering 
about the performance. Basically our company does niche web-hosting for 
"Creative Professionals" so we need fast access to the data in order to 
have snappy web services for our clients. Typically these are smaller 
files (2MB pictures, 50MB videos, .swf files, etc.).

  Also I'm wondering about the best way set this up in terms of speed 
and ease of growth. I want the web-servers and the storage pool to be 
independent of each other. So I can add web-servers as the web traffic 
increases, and add more storage ass our storage needs grow. We have the 
option of an MD3000 or MD3000i for back-end storage.

  I was thinking initially we could start with 2 servers, both attached 
to the storage array. setup as OSS' and functioning as (load balanced) 
web-servers as well. In the future I could separate this out so that we 
have the web-servers on the "front line" mounting the data from the OSS' 
which will be on a private (gigE) network.

  Now, it's been years since I've played with Lustre, I'm sure some 
stuff will come back to me as I start using it again, other things I'll 
probably have to re-learn. I wanted to get some input from the Lustre 
community on whether or not this seems like a reasonable use for Lustre? 
Are there alternatives out there which might fit my needs more? 
(specifically speed and a shared storage pool). Also, what kind of 
performance can I expect, am I out of touch to expect something similar 
to a directly attached RAID array?

  I appreciate any and all feedback, suggestions, comments etc.

Thanks,
- Nick

--
Nick Jennings
Senior Programmer & Systems Administrator
Creative Motion Design
n...@creativemotiondesign.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Plateau around 200MiB/s bond0

2009-01-27 Thread Arden Wiebe

--- On Mon, 1/26/09, Brian J. Murrell  wrote:

From: Brian J. Murrell 
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
To: lustre-discuss@lists.lustre.org
Date: Monday, January 26, 2009, 6:59 AM

In general, when writing messages to this list, you need to be more
concise about what you are asking.  I see so much information here, I'm
not sure what is relevant to your few interspersed questions and what is
not.  I will try to answer your specific question...

My apologies for posting my study hacks to the list.  Thanks, Brian, for at 
least trying to answer questions where I have to learn the answer for myself 
before I even know the correct question to ask.  

Also, in the future, please use a simple plain-text format and just copy
and paste for plain-text content.  All of the "quoted-printable"
mime-types are confusing my MUA.

No doubt.  Sorry, I'm not good with MTA or MUA in general but I'll switch to 
plain text in the future.

On Sat, 2009-01-24 at 18:04 -0800, Arden Wiebe wrote:
> 
> I fail so far creating external journal for MDT, MGS and OSSx2.  How
> to add the external journal to /etc/fstab specifically the output of
> e2label /dev/sdb followed by what options for fstab?
> 

You need to look at the mkfs.ext3 manpage on how to create an external
journal (i.e. -O journal_dev external-journal) and attach an external
journal to an ext3 filesystem (i.e. -J device=external-journal) then
apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's
--mkfsoptions option.

All of this is covered in the operations manual in section 10.3
"Creating an External Journal".

Been there, done that; well, sort of.  I managed to give every Lustre 
filesystem an external journal, some even on different controllers.  The 
underlying root/boot layout keeps the RAID separate from the MBR and the 
un-raided root and boot partitions; those could eventually live on a USB memory 
stick, freeing /dev/sda to serve as a hot spare.  

The goal, as far as the root filesystem goes, is eventually a network/cluster 
configuration tool so that root/boot partitions can be delivered over the 
cluster to new and old nodes.  Until then the DVD .iso method works fine and can 
rehabilitate a failed boot drive in the time of a standard CentOS 5.2 install.  

The manual (or the list; I'm not quoting exactly) says in numerous places to 
use no partitions.  There are no partitions in this configuration, save for a 
1TB / partition on /dev/sda1 on all main nodes, plus the external journals on 
/dev/sdf1 on the MDT and MGS and /dev/sdb1 on the two OSTs, each of which 
occupies only ',50,L' (an sfdisk spec) of the entire 1TB drive for what is no 
doubt the ~400MB journal. 
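
Pulling the journal pieces together from the .bash_history earlier in the 
archive, the pattern is roughly the following; devices and labels are examples, 
and the journal device itself never appears in /etc/fstab, only the labelled OST 
does:

# format the small partition as an external journal device
mke2fs -b 4096 -O journal_dev /dev/sdb1
# build the OST against it
mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.x@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
# read the label and use it in fstab, e.g.:
e2label /dev/md0
# LABEL=ioio-OST0000  /mnt/ost0  lustre  defaults,_netdev  0 0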

The solution at the time was to learn the proper syntax for creating a raid10 
device.  So instead of physically making two RAID 1 arrays and one RAID 0 array 
to get a RAID 1+0 configuration, I had to learn the right way to make an md 
raid10 (believe it or not).  Before that, e2label was reporting MGS for two 
drive volumes and fstab was all borked.

To top it all off, I was dealing with a network anomaly that still persists on 
my MGS node, whereby I can't run that node at MTU 9000 while the rest of the 
nodes set that way can.  I even pulled the box off the shelf, checked for 
hardware faults and reseated the cards, then removed all the network interfaces 
and started over.  It still persists, no doubt due to mixing MTU 1500 and MTU 
9000 on the same subnet.

Not sure if this is a proper list deliverable, but I have produced a series of 
pictures that, to my understanding, show a small Lustre Ethernet cluster running 
on commodity hardware doing 400MiB/s on one OST, though also one that needs to 
handle smaller files better: http://www.ioio.ca/Lustre-tcp-bonding/images.html 
and http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html 

Typical usage so far shows that copying /var/lib/mysql is still a 
time-consuming process given 4.9G of data.  Web files in flight are also of 
typically small file size.  Further objectives for the cluster are not 
implemented at this time but would include more of the same, and then some.  
Further suggestions regarding network-specific cluster enhancements, 
partitioning, formatting, benchmarking or modes are appreciated. 

My apologies for the --verbose thread, which I hope is better formatted to fit 
your screen, and also for my lack of specific questions; at times I do not yet 
have enough experience to know the correct ones to ask.

a.

b.


-Inline Attachment Follows-

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Plateau around 200MiB/s bond0

2009-01-25 Thread Arden Wiebe
So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that make 
400MiB/s, or is this not how to calculate throughput?  I will eventually plug 
the right sequence into iozone to measure it.  

From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png 
ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png

--- On Sat, 1/24/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
To: lustre-discuss@lists.lustre.org
Date: Saturday, January 24, 2009, 6:04 PM

1-2948-SFP Plus Baseline 3Com Switch
1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1-CLIENT bond0(eth0,eth1)
1-CLIENT eth0
1-CLIENT eth0

I fail so far creating external journal for MDT, MGS and OSSx2.  How to add the 
external journal to /etc/fstab specifically the output of e2label /dev/sdb 
followed by what options for fstab?

[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 17
  1 UP mgc mgc192.168@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
  2 UP lov ioio-clilov-810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
  3 UP mdc ioio-MDT-mdc-810209363c00 
7307490a-4a12-4e8c-56ea-448e030a82e4 5
  4 UP osc
 ioio-OST-osc-810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  5 UP osc ioio-OST0001-osc-810209363c00 
7307490a-4a12-4e8c-56ea-448e030a82e4 5
[r...@lustreone ~]# lfs df -h
UUID bytes  Used Available  Use% Mounted on
ioio-MDT_UUID   815.0G    534.0M    767.9G    0% /mnt/ioio[MDT:0]
ioio-OST_UUID 3.6T 28.4G  3.4T    0% /mnt/ioio[OST:0]
ioio-OST0001_UUID 3.6T 18.0G  3.4T    0% /mnt/ioio[OST:1]

filesystem summary:  
 7.2T 46.4G  6.8T    0% /mnt/ioio

[r...@lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1
    Actor Key: 17
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1

Slave Interface:
 eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6
[r...@lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
  976762496 blocks [2/2] [UU]

unused devices: 
[r...@lustreone ~]# cat
 /etc/fstab
LABEL=/ /   ext3    defaults    1 1
tmpfs   /dev/shm    tmpfs   defaults    0 0
devpts  /dev/pts    devpts  gid=5,mode=620  0
 0
sysfs   /sys    sysfs   defaults    0 0
proc    /proc   proc    defaults    0 0
LABEL=MGS   /mnt/mgs    lustre  defaults,_netdev 0 0
192.168@tcp0:/ioio 
 /mnt/ioio   lustre  defaults,_netdev,noauto 0 0

[r...@lustreone ~]# ifconfig
bond0 Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
  inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
  RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0
 txqueuelen:0
  RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)

eth0  Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
  inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
  RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0
 GiB)
  Base address:0xec00 Memory:febe-fec0

From what I have read, not having an external journal configured for the OSTs 
is a sure recipe for slowness, which I would rather not have considering the 
goal is around 35

[Lustre-discuss] Plateau around 200MiB/s bond0

2009-01-24 Thread Arden Wiebe
1-2948-SFP Plus Baseline 3Com Switch
1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1-CLIENT bond0(eth0,eth1)
1-CLIENT eth0
1-CLIENT eth0

I fail so far creating external journal for MDT, MGS and OSSx2.  How to add the 
external journal to /etc/fstab specifically the output of e2label /dev/sdb 
followed by what options for fstab?

[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 17
  1 UP mgc mgc192.168@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
  2 UP lov ioio-clilov-810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
  3 UP mdc ioio-MDT-mdc-810209363c00 
7307490a-4a12-4e8c-56ea-448e030a82e4 5
  4 UP osc ioio-OST-osc-810209363c00 
7307490a-4a12-4e8c-56ea-448e030a82e4 5
  5 UP osc ioio-OST0001-osc-810209363c00 
7307490a-4a12-4e8c-56ea-448e030a82e4 5
[r...@lustreone ~]# lfs df -h
UUID bytes  Used Available  Use% Mounted on
ioio-MDT_UUID   815.0G    534.0M    767.9G    0% /mnt/ioio[MDT:0]
ioio-OST_UUID 3.6T 28.4G  3.4T    0% /mnt/ioio[OST:0]
ioio-OST0001_UUID 3.6T 18.0G  3.4T    0% /mnt/ioio[OST:1]

filesystem summary:   7.2T 46.4G  6.8T    0% /mnt/ioio

[r...@lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1
    Actor Key: 17
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6
[r...@lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
  976762496 blocks [2/2] [UU]

unused devices: 
[r...@lustreone ~]# cat /etc/fstab
LABEL=/ /   ext3    defaults    1 1
tmpfs   /dev/shm    tmpfs   defaults    0 0
devpts  /dev/pts    devpts  gid=5,mode=620  0 0
sysfs   /sys    sysfs   defaults    0 0
proc    /proc   proc    defaults    0 0
LABEL=MGS   /mnt/mgs    lustre  defaults,_netdev 0 0
192.168@tcp0:/ioio  /mnt/ioio   lustre  defaults,_netdev,noauto 
0 0

[r...@lustreone ~]# ifconfig
bond0 Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
  inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
  RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)

eth0  Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
  inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
  RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0 GiB)
  Base address:0xec00 Memory:febe-fec0

From what I have read, not having an external journal configured for the OSTs 
is a sure recipe for slowness, which I would rather not have considering the 
goal is around 350MiB/s or more, which should be obtainable.  

Here is how I formated the raid6 device on both OSS's that have identical 
[r...@lustrefour ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start End  Blocks   Id  System
/dev/sda1   *   1  121601   976760001   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev

[Lustre-discuss] Myri Cards and Motherboards?

2009-01-16 Thread Arden Wiebe
Has anyone used Myri NICs with the P5Q series of Asus motherboards?  Would be 
good to know from a compatibility standpoint.


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.

2009-01-11 Thread Arden Wiebe
Here the mkinitrd command was issued.  Notice there is a difference in the 
loaded modules, but none that I would suspect of making the system still hang 
asking "Want me to fall back to /dev/disk/by-id/scsi-SATA y/n".  Trying a clean 
install again with slightly different BIOS settings.  Those are set to AHCI in 
the BIOS, as it doesn't find any drives otherwise.

lustrethree:/boot # mkinitrd

Kernel image:   /boot/vmlinuz-2.6.16.60-0.27_lustre.1.6.6-smp
Initrd image:   /boot/initrd-2.6.16.60-0.27_lustre.1.6.6-smp
Root device:    /dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJDWQA29776-part1 
(/  
 dev/sda1) (mounted on / as ext3)
Kernel Modules: processor thermal scsi_mod libata ahci pata_marvell fan jbd 
ext3   
 edd sd_mod usbcore ohci-hcd uhci-hcd ehci-hcd usbhid
Features:   block usb resume.userspace resume.kernel
Bootsplash: SuSE (1280x1024)
49142 blocks

Kernel image:   /boot/vmlinuz-2.6.22.5-31-default
Initrd image:   /boot/initrd-2.6.22.5-31-default
Root device:    /dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJDWQA29776-part1 
(/dev/sda1) (mounted on / as ext3)
Kernel Modules: processor thermal scsi_mod libata ahci pata_marvell fan jbd 
mbcache ext3 edd sd_mod usbcore ohci-hcd uhci-hcd ehci-hcd ff-memless hid usbhid
Features:   block usb resume.userspace resume.kernel
Bootsplash: SuSE (1280x1024)
25990 blocks
lustrethree:/boot #


--- On Sun, 1/11/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.
To: "Guido Juckeland" , 
lustre-discuss@lists.Lustre.org
Date: Sunday, January 11, 2009, 6:28 PM

Here is what I have done to try and get it working on this box:

lustrethree:~/Desktop # rpm -ivh 
kernel-lustre-smp-2.6.16-60_0.27_lustre.1.6.6.x86_64.rpm
Preparing...    ### [100%]
    package kernel-lustre-smp-2.6.16-60_0.27_lustre.1.6.6 is already 
installed
lustrethree:~/Desktop # rpm -ivh 
lustre-ldiskfs-3.0.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-ldiskfs-3.0.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # rpm -ivh
 lustre-modules-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-modules-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # rpm -ivh 
lustre-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # 

First have to solve for the dependency.  Downloaded from 
http://rpmfind.net//linux/RPM/opensuse/10.3/x86_64/db43-4.3.29-59.x86_64.html
for future reference.

lustrethree:~/Desktop # rpm -ivh
 db43-4.3.29-59.x86_64.rpm
Preparing...    ### [100%]
   1:db43   ### [100%]
lustrethree:~/Desktop # rpm --force -Uvh 
e2fsprogs-1.40.11.sun1-0suse.sles10.x86_64.rpm
Preparing...    ### [100%]
   1:e2fsprogs  ### [100%]
lustrethree:~/Desktop # 

Reboot and cross fingers.
--- On Sun, 1/11/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.
To: "Guido Juckeland" , 
lustre-discuss@lists.Lustre.org
Date: Sunday, January 11, 2009, 5:11 PM


After doing a mkinitrd and rebooting I still fail to load the lustre openSuse 
10.3 kernel.  It fails when asking:

Want me to fall back to /dev/disk/by-id/scsi-SATA y/n

I have tried editing my fstab and menu.lst file.  Is there anything else I must 
edit?

lustrefour:~ # cat /etc/fstab
/dev/sda1
    /    ext3   acl,user_xattr    1 1
proc /proc    proc  
 defaults  0 0
sysfs    /sys sysfs  noauto    0 0
debugfs  /sys/kernel/debug    debugfs    noauto    0 0
usbfs    /proc/bus/usb    usbfs  noauto   
 0 0
devpts   /dev/pts devpts mode=0620,gid=5   0 0
/dev/fd0 /media/floppy    auto   noauto,user,sync  0 0
lustrefour:~ #

# Modified by YaST2. Last modification on Sun Jan 11 20:54:40 UTC 2009
default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/boot/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: lustre###
title lustre
    root (h

Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.

2009-01-11 Thread Arden Wiebe
Here is what I have done to try and get it working on this box:

lustrethree:~/Desktop # rpm -ivh 
kernel-lustre-smp-2.6.16-60_0.27_lustre.1.6.6.x86_64.rpm
Preparing...    ### [100%]
    package kernel-lustre-smp-2.6.16-60_0.27_lustre.1.6.6 is already 
installed
lustrethree:~/Desktop # rpm -ivh 
lustre-ldiskfs-3.0.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-ldiskfs-3.0.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # rpm -ivh 
lustre-modules-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-modules-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # rpm -ivh 
lustre-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp.x86_64.rpm
Preparing...    ### [100%]
    package lustre-1.6.6-2.6.16.60_0.27_lustre.1.6.6_smp is already 
installed
lustrethree:~/Desktop # 

First have to solve for the dependency.  Downloaded from 
http://rpmfind.net//linux/RPM/opensuse/10.3/x86_64/db43-4.3.29-59.x86_64.html
for future reference.

lustrethree:~/Desktop # rpm -ivh db43-4.3.29-59.x86_64.rpm
Preparing...    ### [100%]
   1:db43   ### [100%]
lustrethree:~/Desktop # rpm --force -Uvh 
e2fsprogs-1.40.11.sun1-0suse.sles10.x86_64.rpm
Preparing...    ### [100%]
   1:e2fsprogs  ### [100%]
lustrethree:~/Desktop # 

Reboot and cross fingers.
--- On Sun, 1/11/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.
To: "Guido Juckeland" , 
lustre-discuss@lists.Lustre.org
Date: Sunday, January 11, 2009, 5:11 PM


After doing a mkinitrd and rebooting I still fail to load the lustre openSuse 
10.3 kernel.  It fails when asking:

Want me to fall back to /dev/disk/by-id/scsi-SATA y/n

I have tried editing my fstab and menu.lst file.  Is there anything else I must 
edit?

lustrefour:~ # cat /etc/fstab
/dev/sda1    /    ext3   acl,user_xattr    1 1
proc /proc    proc  
 defaults  0 0
sysfs    /sys sysfs  noauto    0 0
debugfs  /sys/kernel/debug    debugfs    noauto    0 0
usbfs    /proc/bus/usb    usbfs  noauto   
 0 0
devpts   /dev/pts devpts mode=0620,gid=5   0 0
/dev/fd0 /media/floppy    auto   noauto,user,sync  0 0
lustrefour:~ #

# Modified by YaST2. Last modification on Sun Jan 11 20:54:40 UTC 2009
default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/boot/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: lustre###
title lustre
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.16.60-0.27_lustre.1.6.6-smp root=/dev/sda1 
vga=0x317    splash=silent
 showopts
    initrd /boot/initrd-2.6.16.60-0.27_lustre.1.6.6-smp

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 10.3
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.22.5-31-default 
root=/dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJ90QB38092-part1 vga=0x317    
splash=silent showopts
    initrd /boot/initrd-2.6.22.5-31-default

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
    rootnoverify (hd0,0)
    chainloader (fd0)+1

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 10.3
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.22.5-31-default 
root=/dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJ90QB38092-part1 vga=normal 
showopts ide=nodma apm=off
 acpi=off noresume edd=off 3
    initrd /boot/initrd-2.6.22.5-31-default
lustrefour:~ #  
--- On Sun, 1/11/09, Guido Juckeland  wrote:

From: Guido Juckeland 
Subject: Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.
To: "Arden Wiebe" 
Date: Sunday, January 11, 2009, 2:35 PM

I would try an "mkinitrd".

Guido

Arden Wiebe wrote:
> Okay I went to the sun download site and retrieved the lustre RPM's for my 
> platform:
> 
> lustrefour:~/Desktop # uname -a
> Linux lustrefour 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 
> x86_64 x86_64 GNU/Linux
> lustrefour:~/Desktop # cat /etc/grub.conf
> setup
 --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
> quit
> lustrefour:~/Desktop # cat /proc/version
> Linux version 2.6.22.5-31-default (ge...@buildhost) (gcc version 4.2.1 (SUSE 
> Linux)) #1 SMP 2007/09/21 22:29:00 UTC
> lustrefour:~/D

Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.

2009-01-11 Thread Arden Wiebe

After doing a mkinitrd and rebooting I still fail to load the lustre openSuse 
10.3 kernel.  It fails when asking:

Want me to fall back to /dev/disk/by-id/scsi-SATA y/n

I have tried editing my fstab and menu.lst file.  Is there anything else I must 
edit?

lustrefour:~ # cat /etc/fstab
/dev/sda1    /    ext3   acl,user_xattr    1 1
proc /proc    proc   defaults  0 0
sysfs    /sys sysfs  noauto    0 0
debugfs  /sys/kernel/debug    debugfs    noauto    0 0
usbfs    /proc/bus/usb    usbfs  noauto    0 0
devpts   /dev/pts devpts mode=0620,gid=5   0 0
/dev/fd0 /media/floppy    auto   noauto,user,sync  0 0
lustrefour:~ #

# Modified by YaST2. Last modification on Sun Jan 11 20:54:40 UTC 2009
default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/boot/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: lustre###
title lustre
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.16.60-0.27_lustre.1.6.6-smp root=/dev/sda1 
vga=0x317    splash=silent showopts
    initrd /boot/initrd-2.6.16.60-0.27_lustre.1.6.6-smp

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 10.3
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.22.5-31-default 
root=/dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJ90QB38092-part1 vga=0x317    
splash=silent showopts
    initrd /boot/initrd-2.6.22.5-31-default

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
    rootnoverify (hd0,0)
    chainloader (fd0)+1

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 10.3
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.22.5-31-default 
root=/dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJ90QB38092-part1 vga=normal 
showopts ide=nodma apm=off acpi=off noresume edd=off 3
    initrd /boot/initrd-2.6.22.5-31-default
lustrefour:~ #  
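
One thing worth trying, since the entries that boot fine use the by-id device 
path while the lustre entry uses /dev/sda1: point the lustre kernel at the same 
by-id path its initrd appears to be looking for.  A guess, not a verified fix, 
reusing the path from the stock openSUSE stanza:

title lustre
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.16.60-0.27_lustre.1.6.6-smp root=/dev/disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJ90QB38092-part1 vga=0x317 splash=silent showopts
    initrd /boot/initrd-2.6.16.60-0.27_lustre.1.6.6-smp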
--- On Sun, 1/11/09, Guido Juckeland  wrote:

From: Guido Juckeland 
Subject: Re: [Lustre-discuss] How to use the openSuse 10.3 kernel.
To: "Arden Wiebe" 
Date: Sunday, January 11, 2009, 2:35 PM

I would try an "mkinitrd".

Guido

Arden Wiebe wrote:
> Okay I went to the sun download site and retrieved the lustre RPM's for my 
> platform:
> 
> lustrefour:~/Desktop # uname -a
> Linux lustrefour 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 
> x86_64 x86_64 GNU/Linux
> lustrefour:~/Desktop # cat /etc/grub.conf
> setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
> quit
> lustrefour:~/Desktop # cat /proc/version
> Linux version 2.6.22.5-31-default (ge...@buildhost) (gcc version 4.2.1 (SUSE 
> Linux)) #1 SMP 2007/09/21 22:29:00 UTC
> lustrefour:~/Desktop # rpm -qf /boot/vmlinuz
> kernel-default-2.6.22.5-31
> lustrefour:~/Desktop #
> 
> 
> I installed all the rpm's but I can't boot the kernel.  What didn't I do?  
> I'm new to openSuse.
> 
> 
>       
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 

-- 

__
Guido Juckeland, M.Sc.
Senior System Engineer (HPC)

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Trefftz-Bau, HRSK/151
Phone:  (+49) 351 463-39871
Fax:    (+49) 351 463-37773
e-mail: guido.juckel...@tu-dresden.de
WWW:    http://www.tu-dresden.de/zih




  ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] openSuse 10.3 and pata_marvell

2009-01-11 Thread Arden Wiebe
I don't yet have a .config file for the openSuse 10.3 Lustre kernel, so I can't 
check whether the pata_marvell driver is compiled into it.  Can anyone tell me 
if it is?


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] How to use the openSuse 10.3 kernel.

2009-01-11 Thread Arden Wiebe
Okay I went to the sun download site and retrieved the lustre RPM's for my 
platform:

lustrefour:~/Desktop # uname -a
Linux lustrefour 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 
x86_64 x86_64 GNU/Linux
lustrefour:~/Desktop # cat /etc/grub.conf
setup --stage2=/boot/grub/stage2 (hd0,0) (hd0,0)
quit
lustrefour:~/Desktop # cat /proc/version
Linux version 2.6.22.5-31-default (ge...@buildhost) (gcc version 4.2.1 (SUSE 
Linux)) #1 SMP 2007/09/21 22:29:00 UTC
lustrefour:~/Desktop # rpm -qf /boot/vmlinuz
kernel-default-2.6.22.5-31
lustrefour:~/Desktop #


I installed all the RPMs but I can't boot the kernel.  What did I miss?  I'm 
new to openSuse.


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] tcp0 for maximum effect

2009-01-10 Thread Arden Wiebe
I have two boxes that have this:

[r...@lustrethree Desktop]# ifconfig
eth0  Link encap:Ethernet  HWaddr 00:1B:21:2A:17:76
  inet addr:192.168.0.19  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21b:21ff:fe2a:1776/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1501960 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3792561 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:120168321 (114.6 MiB)  TX bytes:5300070662 (4.9 GiB)
  Base address:0xec00 Memory:febe-fec0

eth1  Link encap:Ethernet  HWaddr 00:1B:21:2A:1C:DC
  inet addr:192.168.0.20  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21b:21ff:fe2a:1cdc/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:828283 errors:0 dropped:0 overruns:0 frame:0
  TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:55673426 (53.0 MiB)  TX bytes:846 (846.0 b)
  Base address:0xe880 Memory:feb8-feba

eth2  Link encap:Ethernet  HWaddr 00:22:15:06:3A:0F
  inet addr:192.168.0.21  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::222:15ff:fe06:3a0f/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:828047 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:55657196 (53.0 MiB)  TX bytes:782 (782.0 b)
  Interrupt:185

eth3  Link encap:Ethernet  HWaddr 00:22:15:06:3A:10
  inet addr:192.168.0.22  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::222:15ff:fe06:3a10/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:827857 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:55644744 (53.0 MiB)  TX bytes:782 (782.0 b)
  Interrupt:209

eth4  Link encap:Ethernet  HWaddr 00:22:15:06:3A:11
  inet addr:192.168.0.23  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::222:15ff:fe06:3a11/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:827706 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:55634788 (53.0 MiB)  TX bytes:782 (782.0 b)
  Interrupt:169

eth5  Link encap:Ethernet  HWaddr 00:22:15:06:3A:12
  inet addr:192.168.0.24  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::222:15ff:fe06:3a12/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:827519 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:55622528 (53.0 MiB)  TX bytes:782 (782.0 b)
  Interrupt:193

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:1485135 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1485135 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:5089873659 (4.7 GiB)  TX bytes:5089873659 (4.7 GiB)

[r...@lustrethree Desktop]#

Would it be better to use these two boxes as OSSes, or as the MDT and MGS 
machines?  Currently one is configured as the MGS and the other as the MDT.  
The question is: does LNET use the available tcp0 interfaces differently from 
the OSS perspective than from the MDT or MGS perspective?
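
One way to see what LNET actually does with those addresses, and to pin it down 
explicitly; a sketch, with eth0 as an example interface:

# show the NIDs this node advertises
lctl list_nids
# restrict LNET to one interface in /etc/modprobe.conf if desired
options lnet networks=tcp0(eth0)

By default, with no networks= option, ksocklnd picks the first non-loopback 
interface for tcp0, as the 1.8 manual excerpt quoted elsewhere in this archive 
notes.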


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Optimal OSS OST drives for boxed deployment

2009-01-10 Thread Arden Wiebe
Purchased 4 more 1TB Spinpoint drives for the OSSs.  This should allow for 
proper RAID 6, if the boards, power supplies, and backup power can handle the 
load.  
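
A rough sketch of what building one RAID 6 OST from six member drives might look like, 
under the assumption that the drives are combined into a single md device per OST; the 
device names, the MGS NID placeholder, and the mount point are illustrative only:

  # assemble six drives (assumed names sdb-sdg) into a software RAID 6 array
  mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]

  # format the array as an OST for the existing datafs filesystem
  # (<mgs-nid> is a placeholder for the MGS's NID)
  mkfs.lustre --fsname=datafs --ost --mgsnode=<mgs-nid> /dev/md0

  # mount it to bring the OST online
  mkdir -p /mnt/datafs/ost0
  mount -t lustre /dev/md0 /mnt/datafs/ost0

Whether one large RAID 6 OST per box or several smaller OSTs is preferable depends on the 
controller and workload; the commands above only illustrate the mechanics.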

--- On Sat, 1/10/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: Re: [Lustre-discuss] Optimal OSS OST drives for boxed deployment
To: lustre-discuss@lists.lustre.org
Date: Saturday, January 10, 2009, 1:48 PM

The reason I ask is that I am at the tuning and configuration stage of my 
deployment, and that includes setting up RAID on the servers properly.  So far this is what 
it looks like unmounted from a client.  I've had it all mounted from a client 
connected to an OSS before, but that is not ideal.


[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp 0c8fe823-9b73-df4d-b3d3-73eb8db70038 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter datafs-OST001e datafs-OST001e_UUID 3
  3 UP obdfilter datafs-OST001f datafs-OST001f_UUID 3
  4 UP obdfilter datafs-OST0020 datafs-OST0020_UUID 3
  5 UP obdfilter datafs-OST0021 datafs-OST0021_UUID 3
  6 UP obdfilter datafs-OST0022 datafs-OST0022_UUID 3
[r...@lustreone ~]#

Ouch!
[r...@lustretwo Desktop]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp 19c4feb6-3285-f01f-b528-02dfeaef0b57 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter datafs-OST0023 datafs-OST0023_UUID 5
  3 UP obdfilter datafs-OST0024 datafs-OST0024_UUID 5
  4 UP obdfilter datafs-OST0025 datafs-OST0025_UUID 5
  5 UP obdfilter datafs-OST0026 datafs-OST0026_UUID 5
  6 UP obdfilter datafs-OST0027 datafs-OST0027_UUID 5
[r...@lustretwo Desktop]#  

[r...@lustrethree Desktop]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 11
  1 UP mgc mgc192.168.0...@tcp 3aba1efe-92c2-88dd-c06b-47be63d63f49 5
[r...@lustrethree Desktop]#
[r...@lustrefour Desktop]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp c9c83cf8-2965-4677-5b76-404d738e15bc 5
  1 UP mdt MDS MDS_uuid 3
  2 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
  3 IN osc datafs-OST-osc datafs-mdtlov_UUID 5
  4 IN osc datafs-OST0001-osc datafs-mdtlov_UUID 5
  5 IN osc datafs-OST0002-osc datafs-mdtlov_UUID 5
  6 IN osc datafs-OST0003-osc datafs-mdtlov_UUID 5
  7 IN osc datafs-OST0004-osc datafs-mdtlov_UUID 5
  8 IN osc datafs-OST0005-osc datafs-mdtlov_UUID 5
  9 IN osc datafs-OST0006-osc datafs-mdtlov_UUID 5
 10 IN osc datafs-OST0007-osc datafs-mdtlov_UUID 5
 11 IN osc datafs-OST0008-osc datafs-mdtlov_UUID 5
 12 IN osc datafs-OST0009-osc datafs-mdtlov_UUID 5
 13 IN osc datafs-OST000a-osc datafs-mdtlov_UUID 5
 14 IN osc datafs-OST000b-osc datafs-mdtlov_UUID 5
 15 IN osc datafs-OST000c-osc datafs-mdtlov_UUID 5
 16 IN osc datafs-OST000d-osc datafs-mdtlov_UUID 5
 17 IN osc datafs-OST000e-osc datafs-mdtlov_UUID 5
 18 IN osc datafs-OST000f-osc datafs-mdtlov_UUID 5
 19 IN osc datafs-OST0010-osc datafs-mdtlov_UUID 5
 20 IN osc datafs-OST0011-osc datafs-mdtlov_UUID 5
 21 IN osc datafs-OST0012-osc datafs-mdtlov_UUID 5
 22 IN osc datafs-OST0013-osc datafs-mdtlov_UUID 5
 23 UP mds datafs-MDT datafs-MDT_UUID 5
 24 UP osc datafs-OST0014-osc datafs-mdtlov_UUID 5
 25 UP osc datafs-OST0015-osc datafs-mdtlov_UUID 5
 26 UP osc datafs-OST0016-osc datafs-mdtlov_UUID 5
 27 UP osc datafs-OST0017-osc datafs-mdtlov_UUID 5
 28 UP osc datafs-OST0018-osc datafs-mdtlov_UUID 5
 29 UP osc datafs-OST0019-osc datafs-mdtlov_UUID 5
 30 UP osc datafs-OST001a-osc datafs-mdtlov_UUID 5
 31 UP osc datafs-OST001b-osc datafs-mdtlov_UUID 5
 32 UP osc datafs-OST001c-osc datafs-mdtlov_UUID 5
 33 UP osc datafs-OST001d-osc datafs-mdtlov_UUID 5
 34 UP osc datafs-OST001e-osc datafs-mdtlov_UUID 5
 35 UP osc datafs-OST001f-osc datafs-mdtlov_UUID 5
 36 UP osc datafs-OST0020-osc datafs-mdtlov_UUID 5
 37 UP osc datafs-OST0021-osc datafs-mdtlov_UUID 5
 38 UP osc datafs-OST0022-osc datafs-mdtlov_UUID 5
 39 UP osc datafs-OST0023-osc datafs-mdtlov_UUID 5
 40 UP osc datafs-OST0024-osc datafs-mdtlov_UUID 5
 41 UP osc datafs-OST0025-osc datafs-mdtlov_UUID 5
 42 UP osc datafs-OST0026-osc datafs-mdtlov_UUID 5
 43 UP osc datafs-OST0027-osc datafs-mdtlov_UUID 5
[r...@lustrefour Desktop]#    
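
Several of the OSCs on lustrefour show state IN (inactive).  A hedged sketch of how an 
inactive OSC is normally reactivated on the MDS once the matching OST is actually up and 
reachable, using device numbers from a listing like the one above:

  # list only the inactive devices and note their numbers
  lctl dl | grep " IN "

  # reactivate one of them by device number (3 is the first IN entry above)
  lctl --device 3 activate

  # confirm the LOV now sees the target as active
  cat /proc/fs/lustre/lov/datafs-mdtlov/target_obd

If the OST really is down or unreachable, activating the OSC will not help; the OST has 
to be mounted and able to talk to the MDS first.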

--- On Sat, 1/10/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: [Lustre-discuss] Optimal OSS OST drives for boxed deployment
To: lustre-discuss@lists.lustre.org
Date: Saturday, January 10, 2009, 1:40 PM

I have two OSSs, each with six 1TB drives.  
sda contains the kernel and the operating system.
sdb, sdc, sdd, sde, and sdf are the targets and currently make up only a RAID 5.

Is it advisable to add another drive to each of these OSSs to facilitate RAID 
6 for the targets?

sda has only a / partition and occupies the entire 1TB drive.


      
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



Re: [Lustre-discuss] Optimal OSS OST drives for boxed deployment

2009-01-10 Thread Arden Wiebe
The reason I ask is that I am at the tuning and configuration stage of my 
deployment, and that includes setting up RAID on the servers properly.  So far this is what 
it looks like unmounted from a client.  I've had it all mounted from a client 
connected to an OSS before, but that is not ideal.


[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp 0c8fe823-9b73-df4d-b3d3-73eb8db70038 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter datafs-OST001e datafs-OST001e_UUID 3
  3 UP obdfilter datafs-OST001f datafs-OST001f_UUID 3
  4 UP obdfilter datafs-OST0020 datafs-OST0020_UUID 3
  5 UP obdfilter datafs-OST0021 datafs-OST0021_UUID 3
  6 UP obdfilter datafs-OST0022 datafs-OST0022_UUID 3
[r...@lustreone ~]#

Ouch!
[r...@lustretwo Desktop]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp 19c4feb6-3285-f01f-b528-02dfeaef0b57 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter datafs-OST0023 datafs-OST0023_UUID 5
  3 UP obdfilter datafs-OST0024 datafs-OST0024_UUID 5
  4 UP obdfilter datafs-OST0025 datafs-OST0025_UUID 5
  5 UP obdfilter datafs-OST0026 datafs-OST0026_UUID 5
  6 UP obdfilter datafs-OST0027 datafs-OST0027_UUID 5
[r...@lustretwo Desktop]#  

[r...@lustrethree Desktop]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 11
  1 UP mgc mgc192.168.0...@tcp 3aba1efe-92c2-88dd-c06b-47be63d63f49 5
[r...@lustrethree Desktop]#
[r...@lustrefour Desktop]# cat /proc/fs/lustre/devices
  0 UP mgc mgc192.168.0...@tcp c9c83cf8-2965-4677-5b76-404d738e15bc 5
  1 UP mdt MDS MDS_uuid 3
  2 UP lov datafs-mdtlov datafs-mdtlov_UUID 4
  3 IN osc datafs-OST-osc datafs-mdtlov_UUID 5
  4 IN osc datafs-OST0001-osc datafs-mdtlov_UUID 5
  5 IN osc datafs-OST0002-osc datafs-mdtlov_UUID 5
  6 IN osc datafs-OST0003-osc datafs-mdtlov_UUID 5
  7 IN osc datafs-OST0004-osc datafs-mdtlov_UUID 5
  8 IN osc datafs-OST0005-osc datafs-mdtlov_UUID 5
  9 IN osc datafs-OST0006-osc datafs-mdtlov_UUID 5
 10 IN osc datafs-OST0007-osc datafs-mdtlov_UUID 5
 11 IN osc datafs-OST0008-osc datafs-mdtlov_UUID 5
 12 IN osc datafs-OST0009-osc datafs-mdtlov_UUID 5
 13 IN osc datafs-OST000a-osc datafs-mdtlov_UUID 5
 14 IN osc datafs-OST000b-osc datafs-mdtlov_UUID 5
 15 IN osc datafs-OST000c-osc datafs-mdtlov_UUID 5
 16 IN osc datafs-OST000d-osc datafs-mdtlov_UUID 5
 17 IN osc datafs-OST000e-osc datafs-mdtlov_UUID 5
 18 IN osc datafs-OST000f-osc datafs-mdtlov_UUID 5
 19 IN osc datafs-OST0010-osc datafs-mdtlov_UUID 5
 20 IN osc datafs-OST0011-osc datafs-mdtlov_UUID 5
 21 IN osc datafs-OST0012-osc datafs-mdtlov_UUID 5
 22 IN osc datafs-OST0013-osc datafs-mdtlov_UUID 5
 23 UP mds datafs-MDT datafs-MDT_UUID 5
 24 UP osc datafs-OST0014-osc datafs-mdtlov_UUID 5
 25 UP osc datafs-OST0015-osc datafs-mdtlov_UUID 5
 26 UP osc datafs-OST0016-osc datafs-mdtlov_UUID 5
 27 UP osc datafs-OST0017-osc datafs-mdtlov_UUID 5
 28 UP osc datafs-OST0018-osc datafs-mdtlov_UUID 5
 29 UP osc datafs-OST0019-osc datafs-mdtlov_UUID 5
 30 UP osc datafs-OST001a-osc datafs-mdtlov_UUID 5
 31 UP osc datafs-OST001b-osc datafs-mdtlov_UUID 5
 32 UP osc datafs-OST001c-osc datafs-mdtlov_UUID 5
 33 UP osc datafs-OST001d-osc datafs-mdtlov_UUID 5
 34 UP osc datafs-OST001e-osc datafs-mdtlov_UUID 5
 35 UP osc datafs-OST001f-osc datafs-mdtlov_UUID 5
 36 UP osc datafs-OST0020-osc datafs-mdtlov_UUID 5
 37 UP osc datafs-OST0021-osc datafs-mdtlov_UUID 5
 38 UP osc datafs-OST0022-osc datafs-mdtlov_UUID 5
 39 UP osc datafs-OST0023-osc datafs-mdtlov_UUID 5
 40 UP osc datafs-OST0024-osc datafs-mdtlov_UUID 5
 41 UP osc datafs-OST0025-osc datafs-mdtlov_UUID 5
 42 UP osc datafs-OST0026-osc datafs-mdtlov_UUID 5
 43 UP osc datafs-OST0027-osc datafs-mdtlov_UUID 5
[r...@lustrefour Desktop]#    

--- On Sat, 1/10/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: [Lustre-discuss] Optimal OSS OST drives for boxed deployment
To: lustre-discuss@lists.lustre.org
Date: Saturday, January 10, 2009, 1:40 PM

I have two OSSs, each with six 1TB drives.  
sda contains the kernel and the operating system.
sdb, sdc, sdd, sde, and sdf are the targets and currently make up only a RAID 5.

Is it advisable to add another drive to each of these OSSs to facilitate RAID 
6 for the targets?

sda has only a / partition and occupies the entire 1TB drive.


      
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



  ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Optimal OSS OST drives for boxed deployment

2009-01-10 Thread Arden Wiebe
I have two OSSs, each with six 1TB drives.  
sda contains the kernel and the operating system.
sdb, sdc, sdd, sde, and sdf are the targets and currently make up only a RAID 5.

Is it advisable to add another drive to each of these OSSs to facilitate RAID 
6 for the targets?

sda has only a / partition and occupies the entire 1TB drive.


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] What do clients run on?

2009-01-10 Thread Arden Wiebe
So if the MGS is a trivial load, can I safely mount a client there?
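
For context, a minimal sketch of mounting a client, assuming the datafs filesystem from 
this thread and a placeholder MGS NID; whether doing this on a server node is safe is 
exactly the open question here, and the quoted reply below warns specifically about the 
MDS and OSS cases:

  # on the node that will act as a client; <mgs-nid> is a placeholder
  mkdir -p /mnt/datafs
  mount -t lustre <mgs-nid>@tcp0:/datafs /mnt/datafs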

--- On Sat, 1/10/09, Kevin Van Maren  wrote:

From: Kevin Van Maren 
Subject: Re: [Lustre-discuss] What do clients run on?
To: "Arden Wiebe" 
Cc: "lustre-discuss@lists.lustre.org" 
Date: Saturday, January 10, 2009, 1:26 PM

It's normally phrased as dedicated server machines, but yes: running the client 
on an MDS or OSS can (and does) deadlock under low-memory conditions.

The MGS is a trivial load and is normally on the MDS node.

Kevin

On Jan 10, 2009, at 1:51 PM, Arden Wiebe  wrote:

> I've read it a zillion times but can't seem to find it again.  Can a client 
> run on the same server as an MGS, MDT, or OSS?  Are dedicated client machines 
> necessary?
> 
> 
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss



  ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] What do clients run on?

2009-01-10 Thread Arden Wiebe
Okay, I'll rephrase the question: given a limited deployment, can I mount the 
client on the MDT, MGS, or OSS node?  Or is the best choice to build a dedicated client?

--- On Sat, 1/10/09, Arden Wiebe  wrote:

From: Arden Wiebe 
Subject: [Lustre-discuss] What do clients run on?
To: lustre-discuss@lists.lustre.org
Date: Saturday, January 10, 2009, 12:51 PM

I've read it a zillion times but can't seem to find it again.  Can a client run 
on the same server as an MGS, MDT, or OSS?  Are dedicated client machines 
necessary?


      
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



  ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] What do clients run on?

2009-01-10 Thread Arden Wiebe
I've read it a zillion times but can't seem to find it again.  Can a client run 
on the same server as an MGS, MDT, or OSS?  Are dedicated client machines 
necessary?


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Updated Kernel Question?

2008-12-23 Thread Arden Wiebe
Will Lustre soon be available with a different kernel for the RPMs?  I see 
openSUSE 11.1 is out now and Red Hat 5.3 is in public beta.  Updated RPMs would 
be ideal from a module-compilation perspective.

Recently my experience with some of the newer motherboards on the market has 
not been good in relation to configuring Ethernet devices.  So far I have 
purchased two P5Q motherboards and have continuing problems getting the 
atl1e.ko driver to work with Lustre on CentOS 5.2.  

I also recently purchased two P5Q Premium motherboards and now find that I have 
the same difficulties with the Ethernet configuration of the Marvell Technology 
Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12).  

I imagine a new kernel release is in the works, as I have read much of 
what is available on the wiki :)  In the end, the easiest way for me to work 
around the dated kernel Ethernet drivers is simply to purchase more Ethernet 
cards that work with the kernel, which can't be a bad thing for my 
implementation of Lustre and LNET, as eventually I will just have more bonded 
NICs.  Oh, I recently found README.kernel.source, so I'll keep working on 
how to build against the supplied kernel.
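
For what it's worth, a hedged sketch of the generic out-of-tree module build against the 
running (Lustre-patched) kernel; the paths are assumptions, and individual driver 
makefiles often wrap this differently:

  # requires the matching kernel-devel/source tree so that
  # /lib/modules/$(uname -r)/build points at usable headers
  cd /path/to/driver/src          # placeholder for the unpacked driver source
  make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
  make -C /lib/modules/$(uname -r)/build M=$(pwd) modules_install
  depmod -a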

  ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Kmod, dkms or make to compile network modules.

2008-12-18 Thread Arden Wiebe
I have been attempting to compile 
ftp://ftp.hogchain.net/pub/linux/attansic/atl1e/source/ against the Lustre 
source in order to get the onboard Ethernet working on my platforms.  Otherwise 
I'm having fun working with Lustre on these two test boxes.

[r...@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 7
  1 UP mgc mgc192.168@tcp 62dabb26-7650-e222-bb43-f5c07d60ce8d 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov arden-mdtlov arden-mdtlov_UUID 4
  4 UP mds arden-MDT arden-MDT_UUID 7
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter arden-OST arden-OST_UUID 9
  7 UP osc arden-OST-osc arden-mdtlov_UUID 5
  8 UP obdfilter arden-OST0001 arden-OST0001_UUID 9
  9 UP osc arden-OST0001-osc arden-mdtlov_UUID 5
 10 UP obdfilter arden-OST0002 arden-OST0002_UUID 9
 11 UP osc arden-OST0002-osc arden-mdtlov_UUID 5
 12 UP obdfilter arden-OST0003 arden-OST0003_UUID 9
 13 UP osc arden-OST0003-osc arden-mdtlov_UUID 5
 14 UP lov arden-clilov-81011c180c00 af4bbc1d-bdff-446f-e05a-bdbb69f24787 4
 15 UP mdc arden-MDT-mdc-81011c180c00 
af4bbc1d-bdff-446f-e05a-bdbb69f24787 5
 16 UP osc arden-OST-osc-81011c180c00 
af4bbc1d-bdff-446f-e05a-bdbb69f24787 5
 17 UP osc arden-OST0001-osc-81011c180c00 
af4bbc1d-bdff-446f-e05a-bdbb69f24787 5
 18 UP osc arden-OST0002-osc-81011c180c00 
af4bbc1d-bdff-446f-e05a-bdbb69f24787 5
 19 UP osc arden-OST0003-osc-81011c180c00 
af4bbc1d-bdff-446f-e05a-bdbb69f24787 5
[r...@lustreone ~]# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/sda1 903G  6.3G  850G   1% /
tmpfs 2.0G 0  2.0G   0% /dev/shm
df: `/mnt/arden/mdt': No such file or directory
df: `/mnt/arden/ost0': No such file or directory
df: `/mnt/arden/ost1': No such file or directory
df: `/mnt/arden/ost2': No such file or directory
df: `/mnt/arden/ost3': No such file or directory
192.168@tcp0:/arden
  3.6T  3.2G  3.4T   1% /mnt/arden

**How do I unmount the client off the MDS/MDT/OST/OSS combined node?**
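
A hedged answer sketch for the question above, using the mount point from the df output: 
unmounting the client is just an ordinary umount of /mnt/arden.  The "No such file or 
directory" lines suggest the client mount is shadowing the directories the server targets 
were mounted on, which is one more reason to keep client and servers separate.

  # unmount only the client mount point; the MDT/OST mounts are not touched
  umount /mnt/arden

  # if it is busy, find out what is holding it before forcing anything
  fuser -vm /mnt/arden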

[r...@lustretwo arden]# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/sda1 903G  6.9G  849G   1% /
tmpfs 2.0G 0  2.0G   0% /dev/shm
192.168@tcp0:/arden
  3.6T  3.2G  3.4T   1% /mnt/arden
[r...@lustretwo arden]#   
[r...@lustreone src]# uname -a
Linux lustreone.linuxguru.ca 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 
26 12:16:17 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

[r...@lustreone ~]# rpm -qa kernel\* | sort
kernel-devel-2.6.18-92.1.18.el5
kernel-headers-2.6.18-92.1.18.el5
kernel-lustre-smp-2.6.18-92.1.10.el5_lustre.1.6.6
kernel-lustre-source-2.6.18-92.1.10.el5_lustre.1.6.6
[r...@lustreone l1e-1.0.1.0]# ls
atl1e.7  copying  readme  release_note.txt  src
[r...@lustreone l1e-1.0.1.0]# cd src
[r...@lustreone src]# ls
at_ethtool.c  at.h  at_hw.c  at_hw.h  at_main.c  at_osdep.h  at_param.c  
kcompat.c  kcompat_ethtool.c  kcompat.h  Makefile  Makefile~
[r...@lustreone src]# make install
Makefile:61: >> /lib/modules/2.6.18-92.1.10.el5_lustre.1.6.6smp/build
Makefile:176: *** *** Aborting the build. *** This driver is not supported on 
kernel versions older than 2.4.0.  Stop.
[r...@lustreone src]# 

If I could move past this module compilation I would deploy more boxes and 
utilize more network interfaces.  I partially followed 
http://wiki.lustre.org/index.php?title=BuildLustre.  Is there any way 
to build a kmod for this particular kernel, or a DKMS module for the 
Atheros/Attansic Ethernet driver?  I would be interested to know if anyone can 
get the driver source to "make install" properly.  
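
On the DKMS side, a minimal sketch of what a dkms.conf for a driver like this could look 
like, assuming the source is unpacked to /usr/src/atl1e-1.0.1.0 and that its Makefile can 
be pointed at a kernel tree via a variable such as KSRC (that detail is an assumption). 
Note also that the rpm -qa output above shows kernel-devel 2.6.18-92.1.18 installed while 
the running kernel is 2.6.18-92.1.10.el5_lustre.1.6.6smp; if /lib/modules/<version>/build 
does not resolve to a usable tree for the running kernel, version checks like the 
"older than 2.4.0" abort can misfire.

  # /usr/src/atl1e-1.0.1.0/dkms.conf (illustrative)
  PACKAGE_NAME="atl1e"
  PACKAGE_VERSION="1.0.1.0"
  BUILT_MODULE_NAME[0]="atl1e"
  BUILT_MODULE_LOCATION[0]="src"
  DEST_MODULE_LOCATION[0]="/kernel/drivers/net"
  MAKE[0]="make -C src KSRC=/lib/modules/${kernelver}/build"
  CLEAN="make -C src clean"
  AUTOINSTALL="yes"

  # register and build against the Lustre kernel
  dkms add -m atl1e -v 1.0.1.0
  dkms build -m atl1e -v 1.0.1.0 -k 2.6.18-92.1.10.el5_lustre.1.6.6smp
  dkms install -m atl1e -v 1.0.1.0 -k 2.6.18-92.1.10.el5_lustre.1.6.6smp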


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDS_with_ZFS_DMU

2008-11-17 Thread Arden Wiebe
Without having checked through CVS, is this the preferred method of 
implementing Lustre?  I am RTFMing at every opportunity and implementing on my 
boxes here to develop a growing cluster.  So far I have the latest 1.6 
installed, but it is not really working on one machine.

  Permanent disk data:
Target: spfs-MDT
Index:  unassigned
Lustre FS:  spfs
Mount type: ldiskfs
Flags:  0x75
  (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mdt.group_upcall=/usr/sbin/l_getgroups

checking for existing Lustre data: not found
device size = 953869MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sdb
target name  spfs-MDT
4k blocks 0
options-J size=400 -i 4096 -I 512 -q -O dir_inde
mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDT  -J size=400 -i
Writing CONFIGS/mountdata
[EMAIL PROTECTED] ~]# mkdir -p /mnt/test/mdt
You have new mail in /var/spool/mail/root
[EMAIL PROTECTED] ~]# mount -t lustre /dev/sdb /mnt/test/mdt
[EMAIL PROTECTED] ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 5
  1 UP mgc [EMAIL PROTECTED] e2dcc535-a2e8-0eed-89ef-0a995b
  2 UP mdt MDS MDS_uuid 3
  3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
  4 UP mds spfs-MDT spfs-MDT_UUID 3


Fun stuff.  Installed the following:

[EMAIL PROTECTED] Desktop]# rpm -ivh 
lustre_admin-2-0_2007_09_28_Friday_09h49m25s.noarch.rpm

although I haven't figured out how to start it yet.  I want to install the 
kernel, modules, and tools for the subject matter (MDS with the ZFS DMU), assuming 
that is the way to proceed and bypass the 1.6 branch.


  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss