Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-26 Thread Jerome, Ron
I've had to rebuild against the Mellanox OFED every time I change Lustre or 
OFED versions.  It's a bit of a catch 22 situation because you have to build 
the Mellanox OFED against the Lustre kernel, install the Mellanox OFED, then 
rebuild the Lustre modules against the Mellanox OFED.  The procedure I use is 
as follows...

* install upgraded Lustre kernel and kernel-devel rpms
* rebuild Mellanox OFED against Lustre kernel 
- mount -o loop MLNX_OFED.iso /root/mnt
- /root/mnt/docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED.iso
* install Mellanox OFED from rebuilt  MLNX_OFED.iso 
* install kernel-ib-devel from rebuilt MLNX_OFED.iso 

Now rebuild the lustre-modules RPM to get a ko2iblnd.ko that is compatible with 
the Mellanox kernel-ib drivers...

* cd /usr/src/lustre-x.x.x
* ./configure --with-o2ib=/usr/src/openib
* make rpms
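
As a quick sanity check afterwards (the module path below is only an example;
it may land under updates/ or extra/ depending on the release), compare the
symbol CRCs the new ko2iblnd.ko was built against with what the Mellanox stack
exports:

- modprobe --dump-modversions /lib/modules/$(uname -r)/updates/kernel/net/lustre/ko2iblnd.ko | grep ib_create_cq
- grep ib_create_cq /usr/src/ofa_kernel/Module.symvers

If the CRCs differ, the modules were built against the wrong Module.symvers.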


Ron. 
-Original Message-
From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Ms. Megan Larko
Sent: November 20, 2012 4:21 PM
To: Lustre User Discussion Mailing List
Subject: [Lustre-discuss] lo2iblnd and Mellanox IB question

Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking
confirmation (re-assurance?).

I have built a RHEL 6.2 system with lustre-2.1.2.   I am using the
rpms from the Whamcloud site for linux kernel
2.6.32_220.17.1.el6_lustre.x86_64 along with the version-matching
lustre, lustre-modules, lustre-ldiskfs, and kernel-devel.  I also
have, from the Whamcloud site,
kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related
kernel-ib-devel for same.

The lustre file system works properly for TCP.

I would like to use InfiniBand.   The system has a new Mellanox card
for which the mlxn1 firmware and drivers were installed.   After this was
done (I cannot speak to before), the IB network comes up on boot
and I can copy and ping over it in a traditional network fashion.

Hard Part:  I would like to run the lustre file system on the IB (ib0).
I re-created the lustre network, with /etc/modprobe.d/lustre.conf
pointing to o2ib in place of tcp0.   I rebuilt the mgs/mdt and all
osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and
the osts point to the mgs on the IB net).   When I modprobe lustre to start
the system I receive error messages stating that there are
Input/Output errors on the lustre modules fld.ko, fid.ko, mdc.ko, osc.ko and
lov.ko.   The lustre.ko cannot be started.   A look in
/var/log/messages reveals many Unknown symbol and Disagrees about
version of symbol errors from the ko2iblnd module.
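
For reference, the lnet piece of that change is a one-line
/etc/modprobe.d/lustre.conf, something like the following (assuming the
interface is ib0):

  options lnet networks=o2ib0(ib0)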

A modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko shows it
pointing to the Module.symvers of the lustre kernel.

Am I correct in thinking that, because of the specific Mellanox IB
hardware I have (with its own /usr/src/ofa_kernel/Module.symvers
file), I have to build Lustre-2.1.2 from the tarball and use
configure --with-o2ib=/usr/src/ofa_kernel, mandating that this
system use the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from
the kernel-ib rpms to which Lustre defaults in the Linux kernel?

Is a rebuild of lustre from source mandatory, or is there a way in
which I may point to the appropriate symbols needed by
ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers.To everyone
else in the world, have a great weekend!

Megan Larko
Hewlett-Packard
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre newbie

2012-06-29 Thread Jerome, Ron
Without actual error messages and the version of the installed code, I don't 
think anyone is going to be able to help much.

Personally, the first place I would start is the logs on the MDS. You can also 
get the lustre version from that node by running:

 rpm -qa | grep lustre

Recently, I also inherited a Lustre system (running 1.8.3) which was exhibiting 
similar issues and upgrading all the lustre servers to 1.8.8-wc1 (and CentOS 
5.8) seems to have resolved all the issues.
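
As a quick starting point, something along these lines on the MDS is usually
informative (assuming standard syslog):

  grep -i lustre /var/log/messages | tail -100
  lctl dl        # lists the configured Lustre devices and their state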

Ron. 

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Jason Brooks
 Sent: June 28, 2012 2:35 PM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] Lustre newbie
 
 Hello,
 
 I am totally new to lustre.  I have inherited a couple of clusters which have
 a lustre filesystem mounted on each node via infiniband.
 
 one cluster has 56 nodes on it, the other has about 18.
 
 There are 6 lustre servers, five of which are OSS's and the sixth is the MDS.
 
 I have a problem: namely that the lustre filesystem is not mounting at times,
 or mysteriously unmounts itself.  If I try to mount it, at times I will get an
 error, but I can't recall what it is.  I think the latency values have
 something to do with it, but in truth, I am kind of at a loss where to start.
 
 I have lustre 1.8 installed.  I have the pdf manual by Sun, but what it really
 doesn't seem to illustrate well is how to step into a running system.  Do any
 of you have any recommended reading I should do?
 
 Thanks!
 
 --jason
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Hardware upgrade routes

2009-11-16 Thread Jerome, Ron
Hi Daniel,

 

I have done this on a live file system by deactivating the OSTs,
migrating (see section 26.2 of the manual for a sample script) the data
off the OSTs in question, replacing the hardware and migrating it back.
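
Roughly, the sequence looks like this (device numbers and names below are
placeholders; check them with lctl dl first):

  lctl dl                                      # on the MDS, note the osc device number for the OST
  lctl --device N deactivate                   # stop new objects being allocated on that OST
  lfs find --obd data-OST0004_UUID /mnt/data   # on a client, list files with objects on that OST, then copy them off to re-stripe
  # once the OST is empty, replace the hardware and bring it back / reactivate it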


 

_

Ron Jerome

Programmer/Analyst

National Research Council Canada

1200 Montreal Road, Ottawa, Ontario K1A 0R6

Government of Canada

 

 

From: lustre-discuss-boun...@lists.lustre.org
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of
daniel.ha...@stfc.ac.uk
Sent: November 16, 2009 9:18 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Hardware upgrade routes

 

Hi,

 

I'm interested to find out what possible solutions there are for
upgrading storage hardware within a cluster over time, either following
a failure or just through nodes coming to the end of their normal
working life. We would expect a cluster to exist for many years whereas
the individual nodes may only last a few years each. Ideally it should
be possible to migrate data off an OST as required but there doesn't
appear to be anything in the manual which covers this use case
specifically.

 
The closest thing seems to be in section 4.3.11 of the manual, Removing
and Restoring OSTs
(http://manual.lustre.org/manual/LustreManual18_HTML/ConfiguringLustre.html#50532400_57420):

 

OSTs can be removed from and restored to a Lustre file system.
Currently in Lustre, removing an OST really means that the OST is
'deactivated' in the file system, not permanently removed. A removed OST
still appears in the file system; do not create a new OST with the same
name.

 

Thus one route to migration to new hardware could be to remove an OST
(making sure the name is not reused) then use step 2.5 in section
4.3.11.1 to copy to the _new_ hardware, rather than recovering to the
same hardware.

 

Does anyone have experience with this type of use case or knowledge of
alternative ways of handling this which they could describe for me?

 

Many thanks,

Daniel.

 

***

British Atmospheric Data Centre

STFC Rutherford Appleton Laboratory

Chilton, Didcot, Oxfordshire, OX11 0QX

***

 

 


 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Bad distribution of files among OSTs

2009-11-01 Thread Jerome, Ron
Another question I had with regards to this is how long have your OSS's been 
running without a reboot? 

Mine have been up for 148 days which is probably longer than ever before.  And 
now that I've said this, it just occurred to me that one of them was rebooted 
about three weeks ago and all the others have been up for almost 6 months.  

I don't know if this has any relevance, but it's the only thing I can think of 
that's different.  

Ron.


-Original Message-
From: lustre-discuss-boun...@lists.lustre.org on behalf of Thomas Roth
Sent: Sun 11/1/2009 4:03 AM
To: Andreas Dilger
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Bad distribution of files among OSTs
 
Another question:
Could this situation, 10 full OSTs out of 200, lead to a significant drop in 
performance?
Before, we could usually get the full 110MB/s or so over the 1Gbit/s ethernet 
lines of the clients.
That had dropped to about 50%, but we did not find anything odd other than
the filling levels of the OSTs.

Regards,
Thomas

Andreas Dilger wrote:
 On 2009-10-30, at 12:07, Thomas Roth wrote:
 in our 196 OST - Cluster, the previously perfect distribution of files
 among the OSTs is not working anymore, since ~ 2 weeks.
 The filling for most OSTs is between 57% and 62%, but some (~10)  have
 risen up to 94%. I'm trying to fix that by having these OSTs deactivated
 on the MDT and finding and migrating away data from them, but it seems
 I'm not fast enough and it's a ongoing problem - I've just deactivated
 another OST with threatening 67%.
 
 Is this correlated to some upgrade of Lustre?  What version are you using?
 
 
 Our qos_prio_free is at the default 90%.

 Our OST's sizes are between 2.3TB and 4.5TB. We use striping level 1, so
 it would be possible to fill up an OST by just creating a 2TB file.
 However, I'm not aware of any such gigafiles (using robinhood to get a
 picture of our file system).
 
 To fill the smallest OST from 60% to 90% would only need a few files that
 total 0.3 * 2.3TB, or 690GB.  One way to find such files is to mount the
 full OSTs with ldiskfs and do "find /mnt/ost/O/0 -size +100G" to list the
 object IDs that are very large, and then in bug 21244 I've written a small
 program that dumps the MDS inode number from the specified objects.  You
 can then use debugfs -c -R "ncheck {list of inode numbers}" /dev/${mdsdev}
 on the MDS to find the pathnames of those files.
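
 Putting those steps together, a rough sketch (device names and inode numbers
 are placeholders; the object-to-inode mapping uses the small program from
 bug 21244):

   mount -t ldiskfs -o ro /dev/ostdev /mnt/ost
   find /mnt/ost/O/0 -size +100G
   # run the bug 21244 tool on those objects to get MDS inode numbers, then on the MDS:
   debugfs -c -R "ncheck 123456 234567" /dev/mdsdev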
 
 Cheers, Andreas
 -- 
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

-- 

Thomas Roth
Gesellschaft für Schwerionenforschung
Planckstr. 1- 64291 Darmstadt, Germany
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Bad distribution of files among OSTs

2009-10-30 Thread Jerome, Ron

Strangely (although I'm sure it's not related) I have seen the exact same 
behavior on my Lustre cluster in the last month or so. I have also never seen 
this before, and to the best of my knowledge there is no change in usage 
patterns.

I'm running 1.6.7.2 on the servers.

Ron Jerome
National Research Council Canada.


-Original Message-
From: lustre-discuss-boun...@lists.lustre.org on behalf of Thomas Roth
Sent: Fri 10/30/2009 2:07 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Bad distribution of files among OSTs
 
Hi all,

in our 196-OST cluster, the previously perfect distribution of files
among the OSTs has not been working for about 2 weeks.
The filling for most OSTs is between 57% and 62%, but some (~10) have
risen up to 94%. I'm trying to fix that by having these OSTs deactivated
on the MDT and finding and migrating away data from them, but it seems
I'm not fast enough and it's an ongoing problem - I've just deactivated
another OST with a threatening 67%.

Our qos_prio_free is at the default 90%.

Our OST's sizes are between 2.3TB and 4.5TB. We use striping level 1, so
it would be possible to fill up an OST by just creating a 2TB file.
However, I'm not aware of any such gigafiles (using robinhood to get a
picture of our file system).

In addition, our users' behavior should not have changed recently. In
August, the entire cluster had filled up to almost 80% in a neatly even
distribution among the OSTs, so we extended the cluster by more OSTs,
migrating data to even the filling between old and new ones. This also
succeeded, and up to October there was no indication of something not
working.

There are no error messages in the logs that would point to some OSTs
being favored ;-)

So, what could be the cause of this misdistribution?

Regards,
Thomas



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDT backup procedure

2009-06-19 Thread Jerome, Ron
Hi John,

I migrated my MGS/MDT to new hardware just a few weeks ago without much
difficulty.  I did not use LVM snapshots though, rather the procedure
outlined in the manual (section 15.1.3.1 of the 1.6 manual) using tar
(with the sparse option, this is very important!) and getfattr.  Mine
is a combination MGS/MDT, so I also needed to run tunefs.lustre --writeconf
to get the OSTs to update their configuration logs on the new server.
I gave the new server the same IP address as the old one, so there
weren't any issues with changing nids.  It's been running great ever
since.

FYI, it took a few hours to create the tar and extended attribute files
on the old server (~3.4M inodes) and about half that time to restore
them onto the new server (faster disks :)  All in all, about 4 hours of
down time.
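
For anyone finding this in the archives later, the core of that procedure
boils down to something like the following (device names and mount points are
examples; check the backup/restore section of the manual for your release):

  mount -t ldiskfs /dev/old_mdt /mnt/mdt      # old MDT, with the filesystem stopped
  cd /mnt/mdt
  getfattr -R -d -m '.*' -P . > /tmp/ea.bak   # save the extended attributes
  tar czSf /tmp/mdt.tgz .                     # -S/--sparse is the important part
  cd /; umount /mnt/mdt
  mount -t ldiskfs /dev/new_mdt /mnt/mdt      # new, freshly formatted (mkfs.lustre) MDT
  cd /mnt/mdt
  tar xzpf /tmp/mdt.tgz                       # restore the data
  setfattr --restore=/tmp/ea.bak              # restore the extended attributes
  # then tunefs.lustre --writeconf on the targets if the MGS/MDT has moved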

Ron Jerome
National Research Council Canada.

 
 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of John White
 Sent: June 19, 2009 1:58 AM
 To: Adam Knight
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] MDT backup procedure
 
 On Jun 17, 2009, at 4:09 PM, Adam Knight wrote:
 
  Pertaining to your original email, rather than taking the MDT down to
  back up, it is very convenient to use LVM snapshots.  With this
  functionality it creates an LV duplicate of the MDT and allows you to
  mount that as ldiskfs and back up files from a consistent copy (won't be
  changing even if your MDT continues to add/remove data).  Your lustre
  filesystem will therefore stay operational during the backup.  If you
  time it cleverly, you can snapshot your MDT and OSTs at the same time
  and back up from all of them to have a consistent copy of the whole
  filesystem as well.
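
  For the record, the snapshot step itself is just standard LVM, along the
  lines of (volume names and sizes here are made up):

    lvcreate -L 5G -s -n mdt_snap /dev/vg_mdt/mdt      # snapshot of the MDT logical volume
    mount -t ldiskfs -o ro /dev/vg_mdt/mdt_snap /mnt/mdt_snap
    # ... back up /mnt/mdt_snap ...
    umount /mnt/mdt_snap
    lvremove -f /dev/vg_mdt/mdt_snap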
 
 
 So following this, has anyone migrated an MDT to new storage with this
 sort of procedure?
 
 -create an lvm'd MDT that produces snapshots
 -use it for a while in production
 -get some snazzy new disk
 -shutdown lustre
 -take a snapshot of the MDT and shuffle it off to some different
 storage media
 -create a new LVM with snazzy new disk (specifically of a different
 size from the original MDT)
 -restore snapshot
 -run lfsck for good measure (is this advisable on what could feasibly
 be a clean filesystem?)
 -bring up lustre
 
 Please keep in mind, I've used LVM but haven't used snapshots, I'm not
 familiar with their limitations.  We're looking to create a filesystem
 immediately but would like to get some much faster storage for the MDT
 later without burning and building a new FS.
 
  Thanks for this verbose reply.  It is exactly what I needed and
  what I suspected I would run into.  We are planning on multiple
  backup procedures.  Users will back up at checkpoints in their work
  flow, IT will back up the MDT nightly and we are also looking at the
  possibility of backing up the complete file system.
 
  Thanks again for everyone's input, this gives me some good
  ammunition going forward for proposals.
 
  Thanks,
   Dan Kulinski
 
 
 
  ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Jerome, Ron
I think the problem you have, as Cliff alluded to, is a mismatch between
your kernel version and the Lustre kernel-module version.

You have kernel 2.6.18-92.el5 and are installing Lustre for
2.6.18_92.1.17.el5.  Note that the .1.17 is significant, as the modules
will end up in the wrong directory.  There is an update to CentOS that
brings the kernel to the matching 2.6.18_92.1.17.el5 version; you can pull
it off the CentOS mirror site in the updates directory.
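
A quick way to see the mismatch is to compare the running kernel against what
the lustre-modules package was built for, e.g.:

  uname -r
  rpm -qa 'kernel*'
  rpm -qa | grep lustre-modules

The kernel release string embedded in the lustre-modules package name has to
match uname -r exactly.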

 

 

Ron. 

 

From: lustre-discuss-boun...@lists.lustre.org
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Carlos
Santana
Sent: June 17, 2009 11:21 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre installation and configuration
problems

 

And is there any specific installation order for patchless client? Could
someone please share it with me? 

-
CS. 

On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana neu...@gmail.com
wrote:

Huh... :( Sorry to bug you guys again... 

I am planning to make a fresh start now as nothing seems to have worked
for me. If you have any comments/feedback please share them. 

I would like to confirm the installation order before I make a fresh start.
From Arden's experience:
http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html ,
the lustre-modules package is installed last. As I was installing Lustre 1.8, I
was referring to the 1.8 operations manual
http://manual.lustre.org/index.php?title=Main_Page . The installation
order in the manual is different from what Arden has suggested. 

Will it make a difference in configuration at a later stage? Which one
should I follow now? 
Any comments? 

Thanks,
CS. 

 

On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana neu...@gmail.com
wrote:

Thanks Cliff.

The depmod -a was successful before as well. I am using a CentOS 5.2
box. Following are the packages installed:
[r...@localhost tmp]# rpm -qa | grep -i lustre
lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp

lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp

[r...@localhost tmp]# uname -a

Linux localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47
EDT 2008 i686 i686 i386 GNU/Linux

And here is a output from strace for mount:
http://www.heypasteit.com/clip/8WT

Any further debugging hints?

Thanks,
CS.


On 6/16/09, Cliff White cliff.wh...@sun.com wrote:
 Carlos Santana wrote:
 The '$ modprobe -l lustre*' did not show any module on a patchless
 client. modprobe -v returns 'FATAL: Module lustre not found'.

 How do I install a patchless client?
 I have tried lustre-client-modules and lustre-client-ver rpm packages in
 both sequences. Am I missing anything?


 Make sure the lustre-client-modules package matches your running kernel.
 Run depmod -a to be sure
 cliffw

 Thanks,
 CS.



 On Tue, Jun 16, 2009 at 2:28 PM, Cliff White cliff.wh...@sun.com wrote:

 Carlos Santana wrote:

 The lctl ping and 'net up' failed with the following messages:
 --- ---
 [r...@localhost ~]# lctl ping 10.0.0.42
 opening /dev/lnet failed: No such device
 hint: the kernel modules may not be loaded
 failed to ping 10.0.0...@tcp: No such device

 [r...@localhost ~]# lctl network up
 opening /dev/lnet failed: No such device
 hint: the kernel modules may not be loaded
 LNET configure error 19: No such device


 Make sure modules are unloaded, then try modprobe -v.
 Looks like you have lnet mis-configured; if your module options are
 wrong, you will see an error during the modprobe.
 cliffw

 --- ---


 I tried the lustre_rmmod and depmod commands and they did not return
 any error messages. Any further clues? Reinstall the patchless
 client again?

 -
 CS.


 On Tue, Jun 16, 2009 at 1:32 PM, Cliff White cliff.wh...@sun.com wrote:

Carlos Santana wrote:

I was able to run lustre_rmmod and depmod successfully. The
'$lctl list_nids' returned the server ip address and interface (tcp0).

I tried to mount the file system on a remote client, but it
failed with the following message.
--- ---
[r...@localhost ~]# mount -t lustre 10.0.0...@tcp0:/lustre /mnt/lustre
mount.lustre: mount 10.0.0...@tcp0:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf
--- ---

However, the mounting is successful on a single node
configuration - with client on the same machine as MDS and OST.
Any clues? Where to look for logs and debug 

Re: [Lustre-discuss] MDT backup/restore

2009-05-22 Thread Jerome, Ron
Ok, replying to myself for the benefit of others who stumble upon
this...  

Using the -S (or --sparse) argument on the tar command when archiving
MDT/MDS file systems solves the issue of the restored files being larger
than the original and thus not fitting on the target file system.

I would suggest that adding this to the documentation might be beneficial
to all who attempt to move an MDS file system :-)

Thanks to the Lustre team for all their hard work,

Ron. 


 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Jerome, Ron
 Sent: May 21, 2009 3:39 PM
 To: lustre-discuss@lists.lustre.org
 Cc: Brian J. Murrell
 Subject: Re: [Lustre-discuss] MDT backup/restore
 
 Hmmm, a little research in the archives leads me to believe that the
 --sparse option is required on the tar create command line.
 
 Would this be correct?
 
 BTW, this MDT is running 1.6.7.1
 
 Thanks,
 
 _
 Ron Jerome
 Programmer/Analyst
 National Research Council Canada
 M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
 Government of Canada
 Phone: 613-993-5346
 FAX:   613-941-1571
 _
 
 
  -Original Message-
  From: lustre-discuss-boun...@lists.lustre.org
[mailto:lustre-discuss-
  boun...@lists.lustre.org] On Behalf Of Brian J. Murrell
  Sent: May 21, 2009 2:15 PM
  To: lustre-discuss@lists.lustre.org
  Subject: Re: [Lustre-discuss] MDT backup/restore
 
  On Thu, 2009-05-21 at 13:39 -0400, Jerome, Ron wrote:
   Hi all,
 
  Hi Ron,
 
  I'm attempting to move my MDT to a new server and I'm seeing strange
  behavior when trying to restore the MDT tar file taken from the original
  disk to the new one.  Basically, the existing MDT data appears to use
  about 2.5G, however when I restore the tar file on the new server it
  completely fills a 120G partition and the restoration fails with out of
  disk space errors??
 
  There has been a lot of discussion on this list about backing up the MDT
  for relocation.  Please review the archives.  IIRC, there was even
  somebody reporting this exact issue.
 
  b.
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDT backup/restore

2009-05-21 Thread Jerome, Ron
Hi all, 

I'm attempting to move my MDT to a new server and I'm seeing strange
behavior when trying to restore the MDT tar file taken from the original
disk to the new one.  Basically, the existing MDT data appears to use
about 2.5G, however when I restore the tar file on the new server it
completely fills a 120G partition and the restoration fails with out of
disk space errors??  

I'm a little mystified as to what is happening here since the disk
format is more or less the same; what follows is the disk information
for the original disk followed by the new one.

Any insights would be greatly appreciated...

= Original MDT
=
[r...@mds1 data-MDT]# df -i
Filesystem             Inodes   IUsed      IFree IUse% Mounted on
/dev/md0             78151680 3368714   74782966    5% /mnt/data


[r...@mds1 data-MDT]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0  112G  2.5G  102G   3% /mnt/data

[r...@mds1 data-MDT]# tune2fs -l /dev/md0
tune2fs 1.40.11.sun1 (17-June-2008)
device /dev/md0 mounted by lustre per
/proc/fs/lustre/mds/data-MDT/mntdev
Filesystem volume name:   data-MDT
Last mounted on:  not available
Filesystem UUID:  57449444-0de9-42e3-a919-70d216bc01f5
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  78151680
Block count:  39070048
Reserved block count: 1953502
Free blocks:  28643263
Free inodes:  74782966
First block:  0
Block size:   4096
Fragment size:4096
Reserved GDT blocks:  1024
Blocks per group: 16384
Fragments per group:  16384
Inodes per group: 32768
Inode blocks per group:   4096
Filesystem created:   Tue May 29 13:49:47 2007
Last mount time:  Thu May 21 13:18:25 2009
Last write time:  Thu May 21 13:18:25 2009
Mount count:  180
Maximum mount count:  22
Last checked: Tue May 29 13:49:47 2007
Check interval:   15552000 (6 months)
Next check after: Sun Nov 25 12:49:47 2007
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   512
Journal inode:8
Default directory hash:   tea
Directory Hash Seed:  cf133553-2ae6-42a1-b0b5-cfbad6fd104b
Journal backup:   inode blocks

= New MDT
=

[r...@mds2 ~]# df -i
Filesystem             Inodes  IUsed      IFree IUse% Mounted on
/dev/md2             35848192 384427   35463765    2% /root/data

[r...@mds2 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2  120G  120G 0 100% /root/data

[r...@mds2 ~]# tune2fs -l /dev/md2
tune2fs 1.40.11.sun1 (17-June-2008)
Filesystem volume name:   data-MDT
Last mounted on:  not available
Filesystem UUID:  d197b003-5ce5-4d1b-8253-b196b2009d07
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file uninit_groups
Filesystem flags: signed_directory_hash
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  35848192
Block count:  35842992
Reserved block count: 1792149
Free blocks:  125970
Free inodes:  35463765
First block:  0
Block size:   4096
Fragment size:4096
Reserved GDT blocks:  1015
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 32768
Inode blocks per group:   4096
Filesystem created:   Wed May 20 16:45:54 2009
Last mount time:  Thu May 21 13:27:52 2009
Last write time:  Thu May 21 13:27:52 2009
Mount count:  14
Maximum mount count:  26
Last checked: Wed May 20 16:45:54 2009
Check interval:   15552000 (6 months)
Next check after: Mon Nov 16 15:45:54 2009
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   512
Journal inode:8
Default directory hash:   tea
Directory Hash Seed:  819ff3d7-b341-4300-8cd8-b6a6b3020c4e
Journal backup:   inode blocks
_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346
FAX:   613-941-1571

Re: [Lustre-discuss] MDT backup/restore

2009-05-21 Thread Jerome, Ron
Hmmm, a little research in the archives leads me to believe that the
--sparse option is required on the tar create command line.  

Would this be correct?

BTW, this MDT is running 1.6.7.1

Thanks,

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346
FAX:   613-941-1571
_


 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Brian J. Murrell
 Sent: May 21, 2009 2:15 PM
 To: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] MDT backup/restore
 
 On Thu, 2009-05-21 at 13:39 -0400, Jerome, Ron wrote:
  Hi all,
 
 Hi Ron,
 
  I'm attempting to move my MDT to a new server and I'm seeing strange
  behavior when trying to restore the MDT tar file taken from the original
  disk to the new one.  Basically, the existing MDT data appears to use
  about 2.5G, however when I restore the tar file on the new server it
  completely fills a 120G partition and the restoration fails with out of
  disk space errors??
 
 There has been a lot of discussion on this list about backing up the MDT
 for relocation.  Please review the archives.  IIRC, there was even
 somebody reporting this exact issue.
 
 b.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] speedy server shutdown

2009-02-09 Thread Jerome, Ron
If I'm not mistaken, umount -f will unmount your OSTs (in fact any
lustre mount, be it mds, mgs, ost, or client) without delay.
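
For example, to take down an OST mount in a hurry (the mount point is just an
example):

  umount -f /mnt/ost0

As I understand it, -f skips the wait for connected clients, so they will go
through recovery when the target is next mounted.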


Ron Jerome
National Research Council Canada


 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Robin Humble
 Sent: February 8, 2009 11:29 PM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] speedy server shutdown
 
 Hi,
 
 when shutting down our OSS's and then MDS's we often wait 330s for each
 set of umount's to finish eg.
   Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
   Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...
   ...
 is there a way to speed this up?
 
 we're interested in the (perhaps unusual) case where all clients are gone
 because the power has failed, and the Lustre servers are running on UPS
 and need to be shut down ASAP.
 
 the tangible reward for a quick shutdown is that we can buy a lower
 capacity (cheaper) UPS if we can reliably and cleanly shut down all the
 Lustre servers in 10 mins, and preferably 3 minutes. if we're tweaking
 timeouts to do this then hopefully we can tweak them just before the
 shutdown and avoid running short timeouts in normal operation.
 
 I'm probably missing something obvious, but I have looked through a
 bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the
 Operations Manual and I can't actually see where the default 330s comes
 from... ???
 it seems to be quite repeatable for both OSS's and MDS's.
 
 we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on
 clients with x86_64 RHEL 5.2 everywhere.
 thanks for any help!
 
 cheers,
 robin
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.6.5.1 - 1.6.6

2009-01-07 Thread Jerome, Ron
For what it's worth, I've upgraded through every version of the Lustre
1.6.x series starting at 1.6.0 and now running 1.6.6 on a production
cluster (32TB in size, with 5 OSS's, 15 OST's and one MDS/MDT) and I have
never once run tunefs.lustre during the upgrade process.
 
Ron Jerome
National Research Council Canada.

 
 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Jakob Goldbach
 Sent: January 7, 2009 9:41 AM
 To: lustre-discuss
 Subject: [Lustre-discuss] 1.6.5.1 - 1.6.6
 
 Hi,
 
 The manual says I need to
 
 tunefs.lustre --mgs --writeconf  ...
 
 when upgrading from 1.6.5.1 to 1.6.6.
 
 What happens if I want to downgrade to 1.6.5.1 again?
 
 I come from 1.6.4.3 - does that make a difference ?
 
 Thanks
 /Jakob
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-05 Thread Jerome, Ron
For what it's worth...  I've seen similar problems with clients not
being able to connect to OSSs

SERVER OS: Linux oss1 2.6.18-53.1.14.el5_lustre.1.6.5.1smp #1 SMP Thu
Jun 26 01:38:50 EDT 2008 i686 i686 i386 GNU/Linux
CLIENT OS: Linux x15 2.6.18-53.1.14.el5_lustre.1.6.5smp #1 SMP Mon May
12 22:24:24 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


On the client side I see this...

  CLIENT LOG
=
Sep  1 15:17:22 x15 kernel: Lustre: Request x30990319 sent from
data-OST0004-osc-81022067ec00 to NID [EMAIL PROTECTED] 100s ago has
timed out (limit
100s).
Sep  1 15:17:22 x15 kernel: Lustre: Skipped 9 previous similar messages
Sep  1 15:17:22 x15 kernel: Lustre: data-OST0004-osc-81022067ec00:
Connection to service data-OST0004 via nid [EMAIL PROTECTED] was lost;
in progress
 operations using this service will wait for recovery to complete.
Sep  1 15:17:22 x15 kernel: LustreError:
3834:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -11 from cancel
RPC: canceling anyway
Sep  1 15:17:22 x15 kernel: LustreError:
3834:0:(ldlm_request.c:1575:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11
Sep  1 15:17:22 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:17:22 x15 kernel: LustreError: Skipped 2 previous similar
messages
Sep  1 15:17:47 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 6s
Sep  1 15:17:47 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection()) Skipped 4 previous
similar messages
Sep  1 15:17:47 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:18:37 x15 last message repeated 2 times
Sep  1 15:19:02 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:19:27 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 26s
Sep  1 15:19:27 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection()) Skipped 3 previous
similar messages
Sep  1 15:19:27 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:20:17 x15 last message repeated 2 times
Sep  1 15:21:07 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:21:07 x15 kernel: LustreError: Skipped 1 previous similar
message
Sep  1 15:21:57 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 51s
Sep  1 15:21:57 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection()) Skipped 5 previous
similar messages
Sep  1 15:22:22 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:22:22 x15 kernel: LustreError: Skipped 2 previous similar
messages
Sep  1 15:24:52 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:24:52 x15 kernel: LustreError: Skipped 5 previous similar
messages
Sep  1 15:27:22 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 51s
Sep  1 15:27:22 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection()) Skipped 12 previous
similar messages
Sep  1 15:29:27 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:29:27 x15 kernel: LustreError: Skipped 10 previous similar
messages
Sep  1 15:37:47 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 51s
Sep  1 15:37:47 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection()) Skipped 24 previous
similar messages
Sep  1 15:38:12 x15 kernel: LustreError: 11-0: an error occurred while
communicating with [EMAIL PROTECTED] The ost_connect operation failed
with -16
Sep  1 15:38:12 x15 kernel: LustreError: Skipped 20 previous similar
messages
Sep  1 15:48:12 x15 kernel: Lustre:
3216:0:(import.c:395:import_select_connection())
data-OST0004-osc-81022067ec00: tried all connections, increasing
 latency to 51s
  END CLIENT LOG
=


Server log at corresponding time...

  SERVER LOG
=
Aug 31 04:02:04 oss1 syslogd 

Re: [Lustre-discuss] Problem with e2fsprogs

2008-06-20 Thread Jerome, Ron
Further to this, I was able to get
e2fsprogs-1.40.7.sun3-0redhat.src.rpm  to rebuild by changing the order
in which the libraries are built.

 

In Makefile.in I changed LIB_SUBDIRS from...

 

LIB_SUBDIRS=lib/et lib/ss lib/e2p lib/ext2fs lib/uuid lib/blkid intl

 

To...

 

LIB_SUBDIRS=lib/et lib/ss lib/ext2fs lib/e2p lib/uuid lib/blkid intl

 

i.e. building ext2fs prior to e2p 

 

Otherwise ext2_err.h was missing when trying to build e2p.
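
If you would rather script that edit than make it by hand, something like the
following should do it (verify the pattern against your copy of Makefile.in
first, then re-run the rpmbuild):

  sed -i 's|lib/e2p lib/ext2fs|lib/ext2fs lib/e2p|' Makefile.in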

 

Ron. 

 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jerome,
Ron
Sent: June 19, 2008 9:18 PM
To: David Frioni
Cc: Lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Problem with e2fsprogs

 

Thanks David,

 

I also tried that before posting to the list, but unfortunately it
failed to build, so I didn't really pursue it much further, thinking
perhaps it was as simple as the wrong file being packaged with the RHEL4
release.

 

Ron.

 



From: David Frioni [mailto:[EMAIL PROTECTED]
Sent: Thu 6/19/2008 5:32 PM
To: Jerome, Ron
Subject: Re: [Lustre-discuss] Problem with e2fsprogs

Jerome- 

 

When I tried to install this I also ran into the libdb-4.3.so dependency
problem. I resolved it by downloading the e2fsprogs src rpm from the
lustre site

and building it as follows:

 

rpmbuild --rebuild e2fsprogs-1.40.7.sun3-0redhat.src.rpm

 

 

Hope this helps,

Dave

 

 

 

 

 

On Jun 19, 2008, at 5:10 PM, Jerome, Ron wrote:





I downloaded Lustre(TM) 1.6.5 for Red Hat Enterprise Linux 4, i686 and
when trying to install it on CentOS 4.6, I get dependency errors as
shown below.  It almost looks like this e2fsprogs package is for RHEL5,
not RHEL4; however, I did download it from the 4.x page (I did it twice
just to make sure).

 

---> Package e2fsprogs.i386 0:1.40.7.sun3-0redhat set to be updated

---> Package net-snmp-libs.i386 0:5.1.2-11.el4_6.11.3 set to be updated

---> Package lustre-ldiskfs.i686 0:3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp set to be updated

--> Running transaction check

--> Processing Dependency: libc.so.6(GLIBC_2.4) for package: e2fsprogs

--> Processing Dependency: libdb-4.3.so for package: e2fsprogs

--> Processing Dependency: rtld(GNU_HASH) for package: e2fsprogs

--> Finished Dependency Resolution

Error: Missing Dependency: libc.so.6(GLIBC_2.4) is needed by package
e2fsprogs

Error: Missing Dependency: libdb-4.3.so is needed by package e2fsprogs

Error: Missing Dependency: rtld(GNU_HASH) is needed by package e2fsprogs

 

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346
FAX:   613-941-1571
_


 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Problem with e2fsprogs

2008-06-19 Thread Jerome, Ron
I downloaded Lustre(TM) 1.6.5 for Red Hat Enterprise Linux 4, i686 and
when trying to install it on CentOS 4.6, I get dependency errors as
shown below.  It almost looks like this e2fsprogs package is for RHEL5,
not RHEL4; however, I did download it from the 4.x page (I did it twice
just to make sure).

 

---> Package e2fsprogs.i386 0:1.40.7.sun3-0redhat set to be updated

---> Package net-snmp-libs.i386 0:5.1.2-11.el4_6.11.3 set to be updated

---> Package lustre-ldiskfs.i686 0:3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp set to be updated

--> Running transaction check

--> Processing Dependency: libc.so.6(GLIBC_2.4) for package: e2fsprogs

--> Processing Dependency: libdb-4.3.so for package: e2fsprogs

--> Processing Dependency: rtld(GNU_HASH) for package: e2fsprogs

--> Finished Dependency Resolution

Error: Missing Dependency: libc.so.6(GLIBC_2.4) is needed by package
e2fsprogs

Error: Missing Dependency: libdb-4.3.so is needed by package e2fsprogs

Error: Missing Dependency: rtld(GNU_HASH) is needed by package e2fsprogs

 

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346
FAX:   613-941-1571
_

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.6.3 - 1.6.4.2 upgrade

2008-02-17 Thread Jerome, Ron
I asked the same question a while back, and although I was never given a 100% 
guarantee (they only test between major and minor increments, not across 
multiple minor increments), the bottom line was that it should work and it did 
(from 1.6.0.1 - 1.6.3 without issue).  Basically, just install and reboot.
 
Ron Jerome
National Research Council Canada.



From: [EMAIL PROTECTED] on behalf of Charles Taylor
Sent: Sun 2/17/2008 7:04 PM
To: Lustre-discuss
Subject: Re: [Lustre-discuss] 1.6.3 - 1.6.4.2 upgrade




Turns out that the client upgrade worked just fine.   I had not 
noticed that the ko2iblnd module was not in place.

I'm still wondering if I need to do anything special with regard to 
upgrading the MGS/MDS and OSSs.   I'm hoping to just dump the 
software in place and reboot with live clients.   Seems kind of 
risky, but hey, the docs say you can do it for 1.4 - 1.6.3 so going 
from 1.6.3 - 1.6.4.2 ought to be a no-brainer, right?   :)

Charlie Taylor
UF HPC Center

On Feb 17, 2008, at 3:44 PM, Charles Taylor wrote:



 Just updated a single client from 1.6.3 to 1.6.4.2.  The
 documentation seems to indicate that an upgraded client should still
 be able to mount the file system from a non-upgraded MGS/MDT.  The
 documentation appears to be referring to a 1.4 to 1.6 upgrade but I
 made the leap that similar things would apply to 1.6.3 - 1.6.4.2.

 As I said, the servers are still running 1.6.3 and have not been
 touched.  The client is upgraded to 1.6.4.2.   When I try to mount
 the file system I get...

 Is the MGS specification correct?
 Is the filesystem name correct?
 If upgrading, is the copied client log valid? (see upgrade docs)

 I've double-checked the first two but I have no idea what the third
 item refers to.  The docs talk about using tunefs to manually copy
 client config logs when upgrading an MGS/MDS but they seem to
 indicate that the only issue on the client one needs to worry about
 is the form of the mount command.   In going from 1.6.3 to 1.6.4.2,
 that should not be an issue.
 Have I missed a step?   Do I need to do something to tell the MGS/MDS
 that the client has been upgraded?

 Is there newer documentation for going from 1.6.3 to 1.6.4.2?  I
 was hoping that I could just upgrade the software on the MGS/MDS and
 OSS (in that order) and restart?  Is that not the case?

 Thanks,

 charlie taylor
 uf hpc center
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss