Re: [Lustre-discuss] MDS inode allocation question

2010-04-23 Thread Kevin Van Maren
Not sure if it has been fixed, but there was a bug in Lustre where the wrong
values were returned here.  If you create a bunch of files, the reported
number of inodes should go up until it reaches the value you expect.
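
A quick way to check that (just a sketch; the mount point and file count
below are placeholders from this thread, adjust to your setup):

    # create a pile of empty files, then re-check the reported inode totals
    mkdir /gulfwork/inode-test
    for i in $(seq 1 100000); do touch /gulfwork/inode-test/f$i; done
    lfs df -i /gulfwork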

Note that the number of inodes on the OSTs also limits the number of
creatable files: each file requires an inode on at least one OST (the exact
number depends on how many OSTs the file is striped across).
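
Roughly speaking, the creatable-file limit ends up being the smaller of the
MDT's free inodes and the total free OST inodes divided by the average stripe
count.  'lfs df -i' breaks the numbers down per target (sketch only, using
the filesystem from this thread; output formatting varies by version):

    # per-MDT and per-OST inode usage, as seen from any client
    lfs df -i /gulfwork

    # raw ldiskfs inode numbers on the MDS itself
    df -i /data/mds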

Kevin


Gary Molenkamp wrote:
> When creating the MDS filesystem, I used '-i 1024' on an 860GB logical
> drive to provide approx 800M inodes in the lustre filesystem.  This was
> then verified with 'df -i' on the server:
>
>   /dev/sda        860160000  130452  860029548    1% /data/mds
>
> Later, after completing the OST creation and mounting the full
> filesystem on a client, I noticed that 'df -i' on the client mount is
> only showing 108M inodes in the lfs:
>
> 10.18.1...@tcp:10.18.1...@tcp:/gulfwork
>                   107454606  130452  107324154    1% /gulfwork
>
> A check with 'lfs df -i' shows the MDT only has 108M inodes:
>
> gulfwork-MDT_UUID    107454606    130452  107324154    0%
>   /gulfwork[MDT:0]
>
> Is there a preallocation mechanism in play here, or did I miss something
> critical in the initial setup?  My concern is that the inode count cannot
> be changed after the fact, so it must be correct before the filesystem
> goes into production.
>
> FYI,  the filesystem was created with:
>
> MDS/MGS on 880G logical drive:
> mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024'
>   --failnode=10.18.12.1 /dev/sda
>
> OSSs on 9.1TB logical drives:
> /usr/sbin/mkfs.lustre --fsname gulfwork --ost --mgsnode=10.18.1...@tcp
>   --mgsnode=10.18.1...@tcp /dev/cciss/c0d0
>
> Thanks.
>
>   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDS inode allocation question

2010-04-23 Thread Gary Molenkamp

When creating the MDS filesystem, I used '-i 1024' on an 860GB logical
drive to provide approx 800M inodes in the lustre filesystem.  This was
then verified with 'df -i' on the server:

  /dev/sda        860160000  130452  860029548    1% /data/mds

Later, after completing the OST creation and mounting the full
filesystem on a client, I noticed that 'df -i' on the client mount is
only showing 108M inodes in the lfs:

10.18.1...@tcp:10.18.1...@tcp:/gulfwork
                 107454606  130452  107324154    1% /gulfwork

A check with 'lfs df -i' shows the MDT only has 108M inodes:

gulfwork-MDT_UUID    107454606    130452  107324154    0%
/gulfwork[MDT:0]

Is there a preallocation mechanism in play here, or did I miss something
critical in the initial setup?  My concern is that the inode count cannot
be changed after the fact, so it must be correct before the filesystem
goes into production.

FYI,  the filesystem was created with:

MDS/MGS on 880G logical drive:
mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024'
--failnode=10.18.12.1 /dev/sda

OSSs on 9.1TB logical drives:
/usr/sbin/mkfs.lustre --fsname gulfwork --ost --mgsnode=10.18.1...@tcp
--mgsnode=10.18.1...@tcp /dev/cciss/c0d0

Thanks.

-- 
Gary Molenkamp  SHARCNET
Systems Administrator   University of Western Ontario
g...@sharcnet.cahttp://www.sharcnet.ca
(519) 661-2111 x88429   (519) 661-4000
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Moving files off an OST

2010-04-23 Thread Scott
Apologies if I'm missing something obvious here.

My OSTs are set up in RAID 5 and one of the arrays has a bad stripe, so I 
need to rebuild it.  In preparation for this I want to move all the data 
off of this OST, so I deactivated the OST on the MDS and ran:


lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | while 
read F; do cp $F $F.tmp && mv $F.tmp $F; done
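
For reference, the deactivate step mentioned above is normally done on the
MDS with lctl (sketch only; <devno> is whatever device number 'lctl dl'
reports for the OSC that points at this OST, names will differ per system):

    # on the MDS: find the osc device for nasone-OST0002 and mark it inactive
    lctl dl | grep OST0002
    lctl --device <devno> deactivate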

This ran for quite a while, and after it finished I ran the find command 
again to confirm there were no more files on the OST.

However, if I look at the OSS it still shows 3.4TB of used space 
on that OST (ost2):

# df
Filesystem   1K-blocks  Used Available Use% Mounted on

/dev/sdd 5765425880 5223515676 249043440  96% /mnt/ost4
/dev/sdc 4804519904 3479755816 1080708536  77% /mnt/ost2

Does this make any sense at all, or am I missing something obvious here? 
I was expecting (hoping) to see the used space back to almost zero, so 
does this mean I have quite a bit of lost data?

Any help?

Regards

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8.2 "make debs" for 2.6.22.19

2010-04-23 Thread Brian J. Murrell
On Fri, 2010-04-23 at 08:22 -0500, Hendelman, Rob wrote: 
> checking build system type... x86_64-unknown-linux-gnu
> checking host system type... x86_64-unknown-linux-gnu
> checking target system type... x86_64-unknown-linux-gnu
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for gawk... gawk
> checking whether make sets $(MAKE)... yes
> checking for gcc... gcc-
> checking for C compiler default output file name... configure: error: C 
> compiler cannot create executables
> See `config.log' for more details.

Looks like something in your environment is confusing configure about
what your compiler is.

You can override that test simply by setting CC="gcc" (assuming your
compiler is gcc and in $PATH) and exporting it before running configure
(or make debs I suppose).

$ export CC=gcc
$ make debs

Other than that, you could debug why configure is getting confused.
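
If you do want to dig in, config.log records exactly what was attempted
(a sketch; generic shell, nothing Lustre-specific about it):

    # see which compiler name configure tried and how the test failed
    grep -n 'gcc-' config.log | head
    # check whether CC is already set to something odd in your environment
    env | grep '^CC='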

b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Stuart Midgley
Yes, we suffer hardware failures.  All the time.  That is sort of the point of 
Lustre and a clustered file system :)

We have had double-disk failures with RAID 5 (recovered everything except 
~1MB of data), server failures, MDS failures, etc.  We successfully recovered 
from them all.  Sure, it can be a little stressful... but it all works.

If server hardware fails, our file system basically hangs until we fix it.  
Our most common failure is obviously disks... and those are all covered by 
RAID.  Since we have mostly direct-attached disk, you have a few minutes of 
downtime on a server while you replace the disk... everything continues as 
normal when the server comes back.

-- 
Dr Stuart Midgley
sdm...@gmail.com



On 23/04/2010, at 18:41 , Janne Aho wrote:

> On 23/04/10 11:42, Stu Midgley wrote:
> 
>>> Would lustre have issues if using cheap off the shelf components or
>>> would people here think you need to have high end machines with built in
>>> redundancy for everything?
>> 
>> We run lustre on cheap off the shelf gear.  We have 4 generations of
>> cheapish gear in a single 300TB lustre config (40 oss's)
>> 
>> It has been running very very well for about 3.5 years now.
> 
> This sounds promising.
> 
> Have you had any hardware failures?
> If yes, how well has the cluster coped with the loss of the machine(s)?
> 
> 
> Any advice you can share from your initial setup of lustre?

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Stuart Midgley
Our success is based on simplicity.  Software RAID on direct-attached disks 
with no add-on cards (i.e. make sure the motherboards have Intel PRO/1000 
NICs, at least 6 SATA ports, reliable CPUs, etc.).

Our first generation gear consisted of a Supermicro motherboard, 2GB of 
memory, a single dual-core Intel CPU and 6x750GB direct-attached disks in a 
white-box chassis running software RAID 5.  That was over 3.5 years ago and 
it will actually be decommissioned tomorrow.

2nd generation were the same boxes, just with the latest Supermicro motherboard.

3rd generation were SGI XE250s with 8x1TB direct-attached disks and software 
RAID 5.

4th generation are SGI/Rackable systems with 12x2TB disks and an LSI/3ware 
hardware RAID 6 card.

We absolutely hammer our file system and it has stood the test of time.  I 
think our latest gear went in for ~$420/TB.


-- 
Dr Stuart Midgley
sdm...@gmail.com



On 23/04/2010, at 23:17 , Troy Benjegerdes wrote:

> Taking a break from my current non-computer related work.. 
> 
> My guess based on your success is your gear is not so much cheap, as
> *cost effective high MTBF commodity parts*. 
> 
> If you go for the absolute bargain basement stuff, you'll have problems
> as individual components flake out. 
> 
> If you spend way too much money on high-end multi-redundant whizbangs,
> you generally get two things.. redundancy, which in my mind often only
> serves to make the eventual failure worse, and high-quality, long MTBF
> components.
> 
> If you can get the high MTBF components without all the redundancy
> (and associated complexity nightmare), then you win.
> 
> 
> On Fri, Apr 23, 2010 at 05:42:30PM +0800, Stu Midgley wrote:
>> We run lustre on cheap off the shelf gear.  We have 4 generations of
>> cheapish gear in a single 300TB lustre config (40 oss's)
>> 
>> It has been running very very well for about 3.5 years now.
>> 
>> 
>>> Would lustre have issues if using cheap off the shelf components or
>>> would people here think you need to have high end machines with built in
>>> redundancy for everything?

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Troy Benjegerdes
Taking a break from my current non-computer related work.. 

My guess based on your success is your gear is not so much cheap, as
*cost effective high MTBF commodity parts*. 

If you go for the absolute bargain basement stuff, you'll have problems
as individual components flake out. 

If you spend way too much money on high-end multi-redundant whizbangs,
you generally get two things.. redundancy, which in my mind often only
serves to make the eventual failure worse, and high-quality, long MTBF
components.

If you can get the high MTBF components without all the redundancy
(and associated complexity nightmare), then you win.


On Fri, Apr 23, 2010 at 05:42:30PM +0800, Stu Midgley wrote:
> We run lustre on cheap off the shelf gear.  We have 4 generations of
> cheapish gear in a single 300TB lustre config (40 oss's)
> 
> It has been running very very well for about 3.5 years now.
> 
> 
> > Would lustre have issues if using cheap off the shelf components or
> > would people here think you need to have high end machines with built in
> > redundancy for everything?
> 
 

Troy Benjegerdes 'da hozer'ho...@hozed.org  
CTO, Freedom Fertilizer, Sustainable wind to NH3, t...@freedomfertilizer.com
Benjegerdes Farms   TerraCarbo biofuels

The challenge in changing the world is not in having great ideas, it's in
having stupid simple ideas, as those are the ones that cause change.

Intellectual property is one of those great complicated ideas that
intellectuals like to intellectualize over, lawyers like to bill too
much over, and engineers like to overengineer. Meanwhile, it's the
stupid simple ideas underfoot that create wealth.   -- Troy, Mar 2010
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Kernel oops after cat on /proc/fs/lustre/mgs/MGS/exports/*/stats

2010-04-23 Thread Wojciech Turek
Hi,

This is a known bug that is fixed in 1.8.2

https://bugzilla.lustre.org/show_bug.cgi?id=21420
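
Before and after upgrading you can confirm the version actually loaded on the
MDS (a sketch; this proc file is the usual location on 1.6.x/1.8.x):

    cat /proc/fs/lustre/version
    lctl --version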

Best regards

Wojciech

On 23 April 2010 13:18, Christopher Huhn  wrote:

> Dear lustre wizards,
>
> we are experiencing problems on our MDS and our Lustre expert is abroad
> (he just attended the LUG meeting).
>
> One of the symptoms we observe are reproducible kernel oopses when
> viewing some stats files beneath /proc/fs/lustre/mgs/MGS/exports :
>
>mds:~# cat /proc/fs/lustre/mgs/MGS/exports/10.12...@tcp/stats
>Killed
>mds:~#  mds kernel: Oops:  [38] SMP
>Apr 23 13:23:19 mds kernel: Unable to handle kernel paging request
>at 00040024 RIP:
>Apr 23 13:23:19 mds kernel: []
>:obdclass:lprocfs_stats_seq_show+0x80/0x1e0
>Apr 23 13:23:19 mds kernel: PGD 203067 PUD 0
>Apr 23 13:23:19 mds kernel: Oops:  [38] SMP
>Apr 23 13:23:20 mds kernel: CPU 7
>Apr 23 13:23:20 mds kernel: Modules linked in: mds fsfilt_ldiskfs(F)
>mgs mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc
>obdclass lnet lvfs libcfs xt_tcpudp iptable_filter ip_tables
>x_tables drbd cn button ac battery bonding xfs ipmi_si ipmi_devintf
>ipmi_msghandler serio_raw psmouse joydev pcspkr i2c_i801 i2c_core
>shpchp pci_hotplug evdev parport_pc parport ext3 jbd mbcache
>dm_mirror dm_snapshot dm_mod raid10 raid456 xor raid1 raid0
>multipath linear md_mod sd_mod ide_cd cdrom ata_generic libata
>generic usbhid hid piix 3w_9xxx floppy ide_core ehci_hcd uhci_hcd
>e1000 scsi_mod thermal processor fan
>Apr 23 13:23:20 mds kernel: Pid: 7293, comm: cat Tainted: GF
>2.6.22+lustre1.6.7.2+0.credativ.etch.1 #2
>Apr 23 13:23:20 mds kernel: RIP: 0010:[]
>[] :obdclass:lprocfs_stats_seq_show+0x80/0x1e0
>Apr 23 13:23:20 mds kernel: RSP: 0018:8103ba5f9e48  EFLAGS: 00010282
>Apr 23 13:23:20 mds kernel: RAX: 00040004 RBX:
>7fff RCX: 0006
>Apr 23 13:23:20 mds kernel: RDX: 0101010101010101 RSI:
> RDI: 
>Apr 23 13:23:20 mds kernel: RBP:  R08:
>0008 R09: 
>Apr 23 13:23:20 mds kernel: R10:  R11:
> R12: 
>Apr 23 13:23:20 mds kernel: R13:  R14:
> R15: 8108000a1760
>Apr 23 13:23:20 mds kernel: FS:  2b4a366786d0()
>GS:81081004b840() knlGS:
>Apr 23 13:23:20 mds kernel: CS:  0010 DS:  ES:  CR0:
>8005003b
>Apr 23 13:23:20 mds kernel: CR2: 00040024 CR3:
>00078f018000 CR4: 06e0
>Apr 23 13:23:20 mds kernel: Process cat (pid: 7293, threadinfo
>8103ba5f8000, task 8107dc299530)
>Apr 23 13:23:20 mds kernel: Stack:  0202
> 00040004 81067dae2640
>Apr 23 13:23:20 mds kernel: 4bd18327 000ca54d
> 81067dae2640
>Apr 23 13:23:20 mds kernel: 00040004 00040004
>0400 
>Apr 23 13:23:20 mds kernel: Call Trace:
>Apr 23 13:23:20 mds kernel: [] seq_read+0x105/0x28d
>Apr 23 13:23:20 mds kernel: [] vfs_read+0xcb/0x153
>Apr 23 13:23:20 mds kernel: [] sys_read+0x45/0x6e
>Apr 23 13:23:20 mds kernel: [] system_call+0x7e/0x83
>Apr 23 13:23:20 mds kernel:
>Apr 23 13:23:20 mds kernel:
>Apr 23 13:23:20 mds kernel: Code: 48 8b 50 20 48 8b 48 28 4c 03 60
>10 4c 03 68 18 48 39 d3 48
>Apr 23 13:23:20 mds kernel: RIP  []
>:obdclass:lprocfs_stats_seq_show+0x80/0x1e0
> mds kernel: CR2: 00040024
>Apr 23 13:23:20 mds kernel: RSP 
>Apr 23 13:23:20 mds kernel: CR2: 00040024
>
>
> Server and affected client both run Lustre 1.6.7.2 on Debian Etch/x86_64
> in this case. The behavior does not change after a client reboot.
>
> All hints on how to solve this are really appreciated.
>
> Kind regards,
>Christopher
>
> --
> Christopher Huhn
> Linux therapist
>
> GSI Helmholtzzentrum fuer Schwerionenforschung GmbH
> Planckstr. 1
> 64291 Darmstadt
> http://www.gsi.de/
>
> Gesellschaft mit beschraenkter Haftung
>
> Sitz der Gesellschaft / Registered Office:Darmstadt
> Handelsregister   / Commercial Register:
>Amtsgericht Darmstadt, HRB 1528
>
> Geschaeftsfuehrung/ Managing Directors:
> Professor Dr. Dr. h.c. Horst Stoecker,
>Christiane Neumann,
>   Dr. Hartmut Eickhoff
> Vorsitzende des Aufsichtsrates / Supervisory Board Chair:
>   Dr. Beatrix Vierkorn-Rudolph
> Stellvertreter/ Deputy Chair: Dr. Rolf Bernhard
>
>
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lis

Re: [Lustre-discuss] 1.8.2 "make debs" for 2.6.22.19

2010-04-23 Thread Hendelman, Rob
Good morning Mr. Murrell & List,

I attempted this again & your patch did seem to fix that particular problem.  
Thanks for the patch.

Since I originally posted that question, I've switched to Ubuntu 8.04.4 with 
the included build system (the previous temporary build machine was recycled...)

r...@mag-hardy-change:/usr/src/lustre-1.8.2# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 8.04.4 LTS
Release:8.04
Codename:   hardy

I then applied the patch you posted & did the following:

./configure --with-linux=/usr/src/linux (symlink to linux-2.6.22.19)
make
make debs

This gets to: 
===
# touch files to same date, to avoid auto*
find . -type f -print0 | xargs -0 touch -r COPYING; \
if [ "." != "." ]; then \
mkdir -p ./build ./lustre/contrib ./libsysio; \
cp build/Makefile ./build/; \
cp lustre/contrib/mpich-*.patch ./lustre/contrib/; \
ln -s ../../../libsysio/include ./libsysio/; \
fi
( cd . && \
 ./configure --disable-dependency-tracking \
   --disable-modules \
   --disable-snmp \
   --disable-client \
   --enable-quota \
   --disable-server )
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc-
checking for C compiler default output file name... configure: error: C 
compiler cannot create executables
See `config.log' for more details.
make[1]: *** [configure-stamp] Error 77
make[1]: Leaving directory `/usr/src/lustre-1.8.2'
dpkg-buildpackage: failure: debian/rules build gave error exit status 2
make: *** [debs] Error 2
===

Apparently it thinks my compiler is "gcc-" instead of "gcc"?

config.log shows:
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Lustre configure LUSTRE_VERSION, which was
generated by GNU Autoconf 2.59.  Invocation command line was

  $ ./configure --disable-dependency-tracking --disable-modules --disable-snmp 
--disable-client --enable-quota --disable-server

## - ##
## Platform. ##
## - ##

hostname = mag-hardy-change
uname -m = x86_64
uname -r = 2.6.24-27-server
uname -s = Linux
uname -v = #1 SMP Wed Mar 24 11:32:39 UTC 2010

/usr/bin/uname -p = unknown
/bin/uname -X = unknown

/bin/arch  = unknown
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
hostinfo   = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /usr/share/modass/gcc-4.2
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /sbin
PATH: /bin
PATH: /usr/games


## --- ##
## Core tests. ##
## --- ##

configure:1509: checking build system type
configure:1527: result: x86_64-unknown-linux-gnu
configure:1535: checking host system type
configure:1549: result: x86_64-unknown-linux-gnu
configure:1557: checking target system type
configure:1571: result: x86_64-unknown-linux-gnu
configure:1600: checking for a BSD-compatible install
configure:1655: result: /usr/bin/install -c
configure:1666: checking whether build environment is sane
configure:1709: result: yes
configure:1742: checking for gawk
configure:1758: found /usr/bin/gawk
configure:1768: result: gawk
configure:1778: checking whether make sets $(MAKE)
configure:1798: result: yes
configure:2010: checking for gcc
configure:2036: result: gcc-
configure:2280: checking for C compiler version
configure:2283: gcc- --version &5
./configure: line 2284: gcc-: command not found
configure:2286: $? = 127
configure:2288: gcc- -v &5
./configure: line 2289: gcc-: command not found
configure:2291: $? = 127
configure:2293: gcc- -V &5
./configure: line 2294: gcc-: command not found
configure:2296: $? = 127
configure:2319: checking for C compiler default output file name
configure:2322: gcc- -Wall -g -O2 -O2  -Wl,-Bsymbolic-functions conftest.c  >&5
./configure: line 2323: gcc-: command not found
configure:2325: $? = 127
configure: failed program was:
| /* confdefs.h.  */
| 
| #define PACKAGE_NAME "Lustre"
| #define PACKAGE_TARNAME "lustre"
| #define PACKAGE_VERSION "LUSTRE_VERSION"
| #define PACKAGE_STRING "Lustre LUSTRE_VERSION"
| #define PACKAGE_BUGREPORT "https://bugzilla.lustre.org/";
| #define PACKAGE "lustre"
| #define VERSION "1.8.2"
| /* end confdefs.h.  */
| 
| int
| main ()
| {
| 
|   ;
|   return 0;
| }
configure:2364: error: C compiler cannot create executables
See `config.log' for more details.
##  ##
## Cache variables. ##
##  ##

ac_cv_build=x86_64-unknown-li

[Lustre-discuss] Kernel oops after cat on /proc/fs/lustre/mgs/MGS/exports/*/stats

2010-04-23 Thread Christopher Huhn
Dear lustre wizards,

we are experiencing problems on our MDS and our Lustre expert is abroad
(he just attended the LUG meeting).

One of the symptoms we observe are reproducible kernel oopses when
viewing some stats files beneath /proc/fs/lustre/mgs/MGS/exports :

mds:~# cat /proc/fs/lustre/mgs/MGS/exports/10.12...@tcp/stats
Killed
mds:~#  mds kernel: Oops:  [38] SMP
Apr 23 13:23:19 mds kernel: Unable to handle kernel paging request
at 00040024 RIP:
Apr 23 13:23:19 mds kernel: []
:obdclass:lprocfs_stats_seq_show+0x80/0x1e0
Apr 23 13:23:19 mds kernel: PGD 203067 PUD 0
Apr 23 13:23:19 mds kernel: Oops:  [38] SMP
Apr 23 13:23:20 mds kernel: CPU 7
Apr 23 13:23:20 mds kernel: Modules linked in: mds fsfilt_ldiskfs(F)
mgs mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc
obdclass lnet lvfs libcfs xt_tcpudp iptable_filter ip_tables
x_tables drbd cn button ac battery bonding xfs ipmi_si ipmi_devintf
ipmi_msghandler serio_raw psmouse joydev pcspkr i2c_i801 i2c_core
shpchp pci_hotplug evdev parport_pc parport ext3 jbd mbcache
dm_mirror dm_snapshot dm_mod raid10 raid456 xor raid1 raid0
multipath linear md_mod sd_mod ide_cd cdrom ata_generic libata
generic usbhid hid piix 3w_9xxx floppy ide_core ehci_hcd uhci_hcd
e1000 scsi_mod thermal processor fan
Apr 23 13:23:20 mds kernel: Pid: 7293, comm: cat Tainted: GF 
2.6.22+lustre1.6.7.2+0.credativ.etch.1 #2
Apr 23 13:23:20 mds kernel: RIP: 0010:[] 
[] :obdclass:lprocfs_stats_seq_show+0x80/0x1e0
Apr 23 13:23:20 mds kernel: RSP: 0018:8103ba5f9e48  EFLAGS: 00010282
Apr 23 13:23:20 mds kernel: RAX: 00040004 RBX:
7fff RCX: 0006
Apr 23 13:23:20 mds kernel: RDX: 0101010101010101 RSI:
 RDI: 
Apr 23 13:23:20 mds kernel: RBP:  R08:
0008 R09: 
Apr 23 13:23:20 mds kernel: R10:  R11:
 R12: 
Apr 23 13:23:20 mds kernel: R13:  R14:
 R15: 8108000a1760
Apr 23 13:23:20 mds kernel: FS:  2b4a366786d0()
GS:81081004b840() knlGS:
Apr 23 13:23:20 mds kernel: CS:  0010 DS:  ES:  CR0:
8005003b
Apr 23 13:23:20 mds kernel: CR2: 00040024 CR3:
00078f018000 CR4: 06e0
Apr 23 13:23:20 mds kernel: Process cat (pid: 7293, threadinfo
8103ba5f8000, task 8107dc299530)
Apr 23 13:23:20 mds kernel: Stack:  0202
 00040004 81067dae2640
Apr 23 13:23:20 mds kernel: 4bd18327 000ca54d
 81067dae2640
Apr 23 13:23:20 mds kernel: 00040004 00040004
0400 
Apr 23 13:23:20 mds kernel: Call Trace:
Apr 23 13:23:20 mds kernel: [] seq_read+0x105/0x28d
Apr 23 13:23:20 mds kernel: [] vfs_read+0xcb/0x153
Apr 23 13:23:20 mds kernel: [] sys_read+0x45/0x6e
Apr 23 13:23:20 mds kernel: [] system_call+0x7e/0x83
Apr 23 13:23:20 mds kernel:
Apr 23 13:23:20 mds kernel:
Apr 23 13:23:20 mds kernel: Code: 48 8b 50 20 48 8b 48 28 4c 03 60
10 4c 03 68 18 48 39 d3 48
Apr 23 13:23:20 mds kernel: RIP  []
:obdclass:lprocfs_stats_seq_show+0x80/0x1e0
 mds kernel: CR2: 00040024
Apr 23 13:23:20 mds kernel: RSP 
Apr 23 13:23:20 mds kernel: CR2: 00040024


Server and affected client both run Lustre 1.6.7.2 on Debian Etch/x86_64
in this case. The behavior does not change after a client reboot.

All hints on how to solve this are really appreciated.

Kind regards,
Christopher

-- 
Christopher Huhn
Linux therapist

GSI Helmholtzzentrum fuer Schwerionenforschung GmbH
Planckstr. 1
64291 Darmstadt
http://www.gsi.de/

Gesellschaft mit beschraenkter Haftung

Sitz der Gesellschaft / Registered Office:Darmstadt
Handelsregister   / Commercial Register: 
Amtsgericht Darmstadt, HRB 1528

Geschaeftsfuehrung/ Managing Directors:  
 Professor Dr. Dr. h.c. Horst Stoecker,
Christiane Neumann,
   Dr. Hartmut Eickhoff
Vorsitzende des Aufsichtsrates / Supervisory Board Chair:  
   Dr. Beatrix Vierkorn-Rudolph
Stellvertreter/ Deputy Chair: Dr. Rolf Bernhard


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Janne Aho
On 23/04/10 11:42, Stu Midgley wrote:

>> Would lustre have issues if using cheap off the shelf components or
>> would people here think you need to have high end machines with built in
>> redundancy for everything?
>
> We run lustre on cheap off the shelf gear.  We have 4 generations of
> cheapish gear in a single 300TB lustre config (40 oss's)
>
> It has been running very very well for about 3.5 years now.

This sounds promising.

Have you had any hardware failures?
If yes, how well has the cluster coped with the loss of the machine(s)?


Any advice you can share from your initial setup of lustre?


-- 
Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
Phone: +46 455 690022 | Cell: +46 733 312775
EMail/MSN: ja...@citynetwork.se
ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Stu Midgley
We run lustre on cheap off the shelf gear.  We have 4 generations of
cheapish gear in a single 300TB lustre config (40 oss's)

It has been running very very well for about 3.5 years now.


> Would lustre have issues if using cheap off the shelf components or
> would people here think you need to have high end machines with built in
> redundancy for everything?


-- 
Dr Stuart Midgley
sdm...@gmail.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-23 Thread Janne Aho
On 22/04/10 17:38, Lundgren, Andrew wrote:

(somehow I managed to send this as private mail, while it was meant to be 
sent to the list)

Sorry for being old-fashioned and answering inline, but it feels less risky.


> I think the lustre 2.0 release notes indicated that lustre will continue but 
> may only be supported on Oracle hardware by Oracle.
> If you are doing anything else, it seemed like you would be on your own.

In our cost calculations there isn't much room for support fees, 
and we would be getting some 3rd-party support from one of our partners.


> That said, http://www.clusterstor.com/ is a new company founded by Peter 
> Braam (the guy who invented Lustre).
> They are creating a new cluster file system as well as supporting Lustre.  
> They have a customers link off of their website that indicates some of the 
> notables.

Interesting, but a bit of a hefty price tag for us.


> There is a possibility that there will be a lustre fork in the future.
> Some following Oracle's "opensource" model and the other following the more 
> traditional model.

After reading "After the Software Wars" by Keith Curtis, I think that in the 
long run I'll be betting on the open source project rather than the closed one.

We are still talking a bit here about LustreFS vs GlusterFS; since this will 
be the first time we use a cluster file system, it feels quite difficult to 
decide what to choose, and at the same time we need to keep the total cost 
as low as possible.

Would lustre have issues if using cheap off the shelf components or 
would people here think you need to have high end machines with built in 
redundancy for everything?

-- 
Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
Phone: +46 455 690022 | Cell: +46 733 312775
EMail/MSN: ja...@citynetwork.se
ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss