Re: [lustre-discuss] LUG 2022 REGISTRATION IS NOW OPEN!

2022-04-25 Thread Kilian Cavalotti via lustre-discuss
Hi Kirill,

Thanks for the invitation; the openness and transparency are very much
appreciated.

Now, for the issue at hand, I expect it will be quite hard for people to
justify the registration cost for a fully-virtual event, when the slides
and recording will be posted online and publicly available a few days later.

I don't know how past LUG attendees feel about this, but I'm concerned that
charging users $175 for the privilege of attending LUG 2022 through Zoom may
not be exactly aligned with OpenSFS' mission of promoting Lustre usage,
increasing awareness, and expanding its community.

Anyway, that's my $.02.

Cheers,
-- 
Kilian


On Fri, Apr 22, 2022 at 4:13 PM Kirill Lozinskiy via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:
>
> Kilian,
>
> Thank you for bringing this up and expressing your concerns!
>
> We typically provide an overview of the OpenSFS finances at the Annual
Members Meeting that takes place at LUG. We understand that there is
interest in going deeper into the annual budget, so the OpenSFS Board of
Directors would like to invite any interested OpenSFS Members and
Participants to attend a budget deep dive next week on Wednesday, April 27th
at 12:30 PM Pacific time. If you are interested in attending the Board Zoom
call, please let us know so we can send you the invite. You can email
ad...@opensfs.org if you are interested in attending.
>
> Thank you for bringing this up, and we hope to see you next week!
>
> Warm regards,
>
> Kirill Lozinskiy
> OpenSFS Treasurer
>
>
> On Thu, Apr 21, 2022 at 3:17 PM Kilian Cavalotti via Execs <
ex...@lists.opensfs.org> wrote:
>>
>> Dear OpenSFS,
>>
>> On Thu, Apr 21, 2022 at 9:06 AM OpenSFS Administration via
>> lustre-discuss  wrote:
>>
>> > We’re excited to announce that registration for the Lustre User Group
(LUG) 2022 virtual conference is now open. REGISTER ONLINE.  General
registration is $175.
>>
>> So, just to clarify: the event is entirely virtual and yet
>> participants will be charged a $175 registration fee?
>> That seems a bit steep... :(
>>
>> What's the rationale here? Especially considering that registration
>> for both LUG 2021 and LUG Webinar Series in 2020 (both virtual events
>> as well) was free.
>>
>> Cheers,
>> --
>> Kilian
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LUG 2022 REGISTRATION IS NOW OPEN!

2022-04-21 Thread Kilian Cavalotti via lustre-discuss
Dear OpenSFS,

On Thu, Apr 21, 2022 at 9:06 AM OpenSFS Administration via
lustre-discuss  wrote:

> We’re excited to announce that registration for the Lustre User Group (LUG) 
> 2022 virtual conference is now open. REGISTER ONLINE.  General registration 
> is $175.

So, just to clarify: the event is entirely virtual and yet
participants will be charged a $175 registration fee?
That seems a bit steep... :(

What's the rationale here? Especially considering that registration
for both LUG 2021 and LUG Webinar Series in 2020 (both virtual events
as well) was free.

Cheers,
--
Kilian
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 162, Issue 10

2019-09-23 Thread Kilian Cavalotti
Hi Andrew,

On Mon, Sep 23, 2019 at 5:56 AM Tauferner, Andrew T
 wrote:
> What is the outlook for 2.12.3 and 2.13 availability?  I thought 2.12.3 would 
> already be available but I don't even see a release candidate in git.  Thank 
> you.

As just mentioned during LAD'19 today
(https://www.eofs.eu/_media/events/lad19/lad19_paper_3.pdf):
- 2.12.3 is targeted for the end of the month
- 2.13 for Q4 2019

Cheers,
-- 
Kilian
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS not freeing disk space

2016-08-10 Thread Kilian Cavalotti
Hi Thomas,

On Wed, Aug 10, 2016 at 10:57 AM, Thomas Roth  wrote:
> one of our (Lustre 2.5.3, ZFS 0.6.3) OSTs got filled up to >90%, so I
> deactivated it and am now migrating files off of that OST.
>
> But when I do either 'lfs df' or 'df' on the OSS, I don't see any change
> in terms of bytes, while the migrated files already sum up to several GB.

It's very likely because your OST is deactivated, i.e. disconnected
from the MDS, so freed-up space is not accounted for. When you
reactivate your OST, it will reconnect to the MDS, which will start
cleaning up orphan inodes (i.e. inodes that still exist on the OST but
are no longer referenced by any file on the MDT). You should see
messages like "lustre-OST: deleting orphan objects from
0x0:180570872 to 0x0:180570891" when this happens.

That's actually how it's supposed to work, but there are some
limitations in 2.5 that may require a restart of the MDS. See
https://jira.hpdd.intel.com/browse/LU-7012 for details.

And of course, as soon as you re-activate your OST, new files will be
created on it, so it may skew the counters the other way.
But AFAIK, it's not specific to ZFS at all.
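
For reference, re-activation is typically done from the MDS with something
along these lines (the device names are illustrative, and whether the
parameter lives under osc.* or osp.* depends on your Lustre version):

# lctl dl | grep OST                                       # find the MDS-side device name
# lctl set_param osc.lustre-OST0012-osc-MDT0000.active=1   # re-activate this OST
# dmesg | grep "deleting orphan"                           # watch the orphan cleanup kick in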

Cheers,
-- 
Kilian
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [Lustre-discuss] Performance dropoff for a nearly full Lustre file system

2015-01-14 Thread Kilian Cavalotti
Hi all,

On Wed, Jan 14, 2015 at 7:27 PM, Dilger, Andreas
 wrote:
> Of course, fragmentation also plays a role, which is why ldiskfs will reserve 
> 5% of the disk by default to avoid permanent performance loss caused by 
> fragmentation if the filesystem gets totally full.

Ashley Pittman gave a presentation at LAD'13 about the influence of
fragmentation on performance.
http://www.eofs.eu/fileadmin/lad2013/slides/03_Ashley_Pittman_Fragmentation_lad13.pdf

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Build lustre 2.6 Client on Debian Wheezy

2015-01-07 Thread Kilian Cavalotti
Bonjour Thierry,

> /bin/sh: 1: [: -lt: unexpected operator

I'm pretty sure that's because in Debian, /bin/sh is linked to dash
and the Lustre build script expects bash.

You can try to run:
# dpkg-reconfigure dash
choose No to link /bin/sh to bash, and re-run the make-kpkg part.
Hopefully it will work better.

I suggest re-running "dpkg-reconfigure dash" afterwards to restore dash
as the default shell.
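
For the record, you can quickly check which shell /bin/sh currently points
to (on Debian it's a symlink):

# readlink -f /bin/sh
/bin/dash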

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Unable to write to the Lustre File System as any user except root

2011-10-11 Thread Kilian CAVALOTTI
Hi Carl,

On Tue, Oct 11, 2011 at 9:07 PM, Barberi, Carl E
 wrote:
> “ LustreError: 11-0: an error occurred while communicating with
> 192.168.10.2@o2ib.  The mds_getxattr operation failed with -13.”

You are likely missing authentication information on your MDS for the user
you're trying to write as (-13 is EACCES, i.e. permission denied).
Just configure NIS, LDAP or whatever directory service you're using on your
MDS, and you should be good to go.
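
A quick sanity check on the MDS would be something like this (the username
is a placeholder); it should return the same result as on the clients:

# id someuser
# getent passwd someuser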

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Kilian CAVALOTTI
On Thursday 31 July 2008 17:22:28 Brock Palen wrote:
> Whats a good tool to grab this? Its more than one page long, and the
> machine does not have serial ports.

If your servers do IPMI, you can probably configure Serial-over-LAN to get 
a console and capture the logs.
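
For example, with ipmitool and a lanplus-capable BMC (hostname and
credentials are placeholders):

# ipmitool -I lanplus -H node1-bmc -U admin sol activate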

But a way more convenient solution is netdump. As long as the network 
connection is working on the panicking machine, you should be able to 
transmit the kernel panic info, as well as a stack trace, to a 
netdump server, which will store it in a file.

See http://www.redhat.com/support/wpapers/redhat/netdump/


Cheers,
-- 
Kilian 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Another download other than Sun?

2008-07-31 Thread Kilian CAVALOTTI
Hi Jeremy,

On Thursday 31 July 2008 01:04:24 pm Jeremy Mann wrote:
> I'm having difficulties downloading 1.6.5.1 through Sun. Every time I
> get a "General Protection" error. I really need to get this version
> so I can go home at a decent time tonight. Can somebody point me to
> an alternative location to download 1.6.5.1 for RHEL4?

I get the same error with Konqueror. However, the download page works 
from Firefox, so you may want to try that.

I do agree, though, that the plain Apache DirectoryIndex version from 
pre-Sun times was much easier and more convenient to use (wget love). But 
well... :)

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Luster recovery when clients go away

2008-07-31 Thread Kilian CAVALOTTI
Hi Brock,

On Thursday 31 July 2008 07:30:04 am Brock Palen wrote:
> Is there a a way to tell the OST's to go ahead and evict those two
> clients and finish recovering?  Also "time remaining" has been 0
> since it was booted.  How long will the OSTs wait before it lets
> operations continue?

Well, there should be a timeout, and recovery should be aborted anyway 
when the "time remaining" counter reaches 0, no matter how many 
clients have been recovered (the remaining ones are evicted, I 
believe).

In case this doesn't work, you can still avoid the recovery process by 
mounting your OSTs with -o abort_recov.
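
For example (the device and mount point are illustrative):

# cat /proc/fs/lustre/obdfilter/*/recovery_status     # status, time remaining, clients left
# mount -t lustre -o abort_recov /dev/sdb /mnt/ost0   # skip recovery entirely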

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] collectl

2008-07-30 Thread Kilian CAVALOTTI
Hi Mark,

> useful. I suppose one might also make that argument about things like
> statfs, getattr - the only time I was able to make them change was in
> response to lfs commands. Might that logic also be applied to
> extended attributes and acl counters which I suspect also fall into
> the category of slowly changing counters?  

If you have ACLs enabled on your MDS, then every "ls -l" will induce 
getxattr()s and the mds_getxattr counter will increase accordingly. 
So this one can change quickly. mds_setxattr, on the other hand, may change 
less often, since you usually set ACLs less often than you list files. 
But it can still be interesting to see if mds_setxattr goes through the 
roof.
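
A quick way to watch these counters move would be something like the
following (the mount point is hypothetical, and the exact /proc layout may
differ between versions):

mds# watch -d 'grep xattr /proc/fs/lustre/mds/*/stats'
client# ls -l /lustre/somedir    # triggers getxattr()s when ACLs are enabled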

> On the other hand, it seems like the 'reint' counters are the ones
> that tend to change a lot. Perhaps a clue is they're all prefaced
> with reint which leads me to ask if there is some simple definition
> of what reint actually means other than 'reintegrated operations'? 

I'd bet on "request identification" or something along those lines.

> Perhaps such a definition will help explain why setattr is a reint
> counter but getattr is not.  In fact, I have seen getattr_lock change
> a lot more than getattr.  What is the difference between the 2
> (obviously the latter is some sort of lock but it must be used more
> than just when incrementing getattr since they don't change
> together)?

I'm only speculating here, but I believe that extended attributes which 
are modifiable by a user on a client (like ACLs) are counted in 
*_xattr, while internal extended attributes used by the MDS are 
counted in getattr.

> That all said, it feels like the data to report is all the reints,
> getattr, getattr_lock and sync.  

I would also be interested in seeing (dis)connect (this can probably 
reveal network problems, if it increases too much), as well as quotactl 
and get/setxattr, since I use quotas and ACLs. :)


Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Questions about Lustre ACLs

2008-07-25 Thread Kilian CAVALOTTI
Hi all,

I've got a couple questions about ACLs in Lustre:


1. When they're enabled on the MDS, can a client mount the filesystem 
without them? It doesn't seem to be the case, but at the same time, the 
mount.lustre manpage mentions the noacl option in the "client-specific" 
section.

See, for instance:

Checking ACLs on the MDS:
# lctl get_param -n mdc.home-MDT-mdc-*.connect_flags | grep acl
acl

Mounting the client with no ACLs:
# mount -t lustre -o noacl [EMAIL PROTECTED]:/home /home

ACLs are still in use:
# strace ls -al /home/kilian/mpihw.c 2>&1 | grep xattr
getxattr("/home/kilian/mpihw.c", "system.posix_acl_access"..., 0x0, 0) 
= -1 ENODATA (No data available)
getxattr("/home/kilian/mpihw.c", "system.posix_acl_default"..., 0x0, 0) 
= -1 ENODATA (No data available)

I believe getxattr() should return EOPNOTSUPP instead of ENODATA, if 
ACLs were disabled.


2. My second question is about the overhead induced by ACLs. I 
didn't do any quantifying measurements, but having ACLs enabled seems 
to slow down all MDS operations. An "ls" in a directory containing a lot of 
files "feels" way slower when ACLs are enabled on the MDS. Is that 
something to be expected? 


Thanks,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.

2008-07-25 Thread Kilian CAVALOTTI
Hi Stuart,

On Friday 25 July 2008 11:19:18 am Stuart Marshall wrote:
> The sequence I've used (perhaps not the best) is:
>
> - cd /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/source/
> - cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp .config
> - make clean
> - make mrproper

Doesn't "make mrproper" erase the .config file you just copied? In which 
case you probably end up with a default kernel, which doesn't matter 
too much since it's only about compiling an external module, but I 
guess it can bite you back if you ever consider recompiling the whole 
kernel.
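
A sketch of an ordering that avoids the problem (kernel version taken from
the commands you quoted):

# cd /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/source/
# make mrproper
# cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp .config
# make oldconfig
# make modules_prepare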

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.

2008-07-25 Thread Kilian CAVALOTTI
Hi Brock,

On Friday 25 July 2008 11:03:12 am Brock Palen wrote:
> I just had to copy  genksyms and mod from
> linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 to linux-2.6.9-67.0.7.EL_lustre.
> 1.6.5.1-obj
>
> I figured you should be aware of this, if its a problem with sun's
> build system for their multipath driver or lustre source package.
> This is on RHEL4.  Using the Lustre RPMs from Sun's website.

It's a problem with the fact that Lustre kernels for RHEL4 are packaged 
the SuSE way, with a /usr/src/linux-$VERSION-$RELEASE/ and a 
/usr/src/linux-$VERSION-$RELEASE-obj/$ARCH/$FLAVOR/ directory holding 
the object files, whereas RHEL4 expects everything to be located in 
/usr/src/linux-$VERSION-$RELEASE/.

A workaround for this is to put the .config file into the kernel source 
directory, and prepare the kernel tree manually.

What I usually do is the following (this is for Lustre 1.6.5.1):

# rm /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/build
# ln -s /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 
/lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/build
# cp  /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj/x86_64/smp/.config 
/usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/
# cd  /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/
# [edit Makefile, and replace 'custom' by 'smp' in EXTRAVERSION]
# make oldconfig
# make modules_prepare

And then, you should be able to compile any additional kernel module.

> The next problem I am stuck on is:
>
> In file included from mppLnx26_spinlock_size.c:51:
> /usr/include/linux/autoconf.h:1:2: #error Invalid kernel header
> included in userspace
> mppLnx26_spinlock_size.c: In function `main':
> mppLnx26_spinlock_size.c:102: error: `spinlock_t' undeclared (first
> use in this function)

Can't be sure it will fix this problem too, but it may be worth a try.

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OSS crashes

2008-07-24 Thread Kilian CAVALOTTI
Hi Thomas, 

On Thursday 24 July 2008 09:24:11 am Thomas Roth wrote:
> On the next crash I'll try to get a stack trace, and logging the
> console to more than the xterm buffer surely is something we ought to
> do as well.

If you don't know it or use it already, maybe you could give netdump a 
try: http://www.redhat.com/support/wpapers/redhat/netdump/

It basically allows you to get crash dumps and stack traces from a 
remote machine. Very useful for gathering Lustre debug information.

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] specifying OST

2008-07-14 Thread Kilian CAVALOTTI
Hi Mag, 

On Friday 11 July 2008 04:38:40 am Mag Gam wrote:
> is it possible to create a file on a particular OST?

I guess you can do so using the "lfs setstripe" command. You can set the 
stripping information on a file or directory so that it only uses one 
OST. That's the case by default, but you need to use setstripe to 
specify which OST you want to use.

For instance, the following command will put "yourfile" on the first OST 
(id 0):

$ lfs setstripe --count 1 --index 0 yourfile

$ dd if=/dev/zero of=yourfile count=1 bs=100M
1+0 records in
1+0 records out

$ lfs getstripe yourfile
OBDS:
0: home-OST_UUID ACTIVE
[...]
yourfile
     obdidx        objid        objid        group
          0     33459243    0x1fe8c2b            0

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.6.5 and OFED?

2008-06-17 Thread Kilian CAVALOTTI
On Monday 16 June 2008 04:35:41 am Greenseid, Joseph M. wrote:
> Is there any word on when the IB packages might be making it up to
> the download site for 1.6.5?  As had been previously noted, they were
> missing when the rest of 1.6.5 was pushed.

I'd like to support this request, since this is part of the 1.6.5 
Changelog:
"""
Severity: enhancement 
Bugzilla: 15316 
Description: build kernel-ib packages for OFED 1.3 in our release cycle
"""

Also, the download site lists lustre-client and lustre-client-modules 
RPMs for RHEL5 and SLES10, but not for RHEL4 nor SLES9. Is that by 
design, or are they missing too?

Thanks,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Gluster then DRBD now Lustre?

2008-06-16 Thread Kilian CAVALOTTI
On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote:
> > NYC == New York City?  What
> > is SJC?
>
> SJC == San Jose, California

That's what I thought, but if so, the following part loses me: 

> This is working in a test setup, however there are some down sides.
> The first is that DRBD only supports IP, so we have to run IPoIB over
> our our infiniband adapters, not an ideal solution.

Nathan, you won't be able to use Infiniband between New York City and 
San Jose, CA, anyway, right? Even without considering IB cables' length 
limitation, and unless you can use some kind of dedicated, 
special-purpose link between your sites, the public Internet is not 
really able to provide bandwidth or latencies compatible with 
Infiniband standards.

IP is probably your best bet here, and DRBD would probably be an 
appropriate candidate for this kind of job. You probably don't want your 
synchronization data going unencrypted over the public pipes, though, 
and you may need an extra VPN-ish layer to ensure data confidentiality.

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Mount Crashing

2008-06-02 Thread Kilian CAVALOTTI
On Monday 02 June 2008 08:35:35 am Charles Taylor wrote:
> Unfortunately, getting the messages off the console (in the machine
> room) means using a pencil and paper (you'd think we have something
> as fancy as a ip-kvm console server, but alas, we do things, ahem,
> "inexpensively" here.   

There are a couple solutions to help you there:

* using a serial console connected to a remote machine (costs a serial 
  cable and some configuration).

* having an IPMI-enabled BMC, or any sort of remote-controller card 
  should give you easy access to the machine's console, remotely. Those 
  cards ain't cheap, but if you already got them in your servers, that's
  a good occasion to put them to use.

* and maybe the easiest, most inexpensive (no hardware involved) and 
  most convenient one: using netdump [1]. You configure a netdump client 
  on the machine you want to gather logs and traces from, and a 
  netdump-server on another host, to receive those messages. This 
  solution proved to be really efficient in gathering Lustre's debug 
  logs and crash dumps (a rough setup sketch follows below).

[1] http://www.redhat.com/support/wpapers/redhat/netdump/
and http://docs.freevps.com/doku.php?id=how-to:netdump
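
From memory, the netdump setup on RHEL4 looks roughly like this (variable
and service names may differ slightly between releases):

  client# vi /etc/sysconfig/netdump       # set NETDUMPADDR=<IP of the netdump server>
  client# service netdump propagate       # push the ssh key to the server
  client# chkconfig netdump on && service netdump start
  server# chkconfig netdump-server on && service netdump-server start

Crash dumps and console logs should then land under /var/crash/ on the
server.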

HTH,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Swap on Lustre (was: Client is not accesible when OSS/OST server is down)

2008-04-29 Thread Kilian CAVALOTTI
Hi Brian,

On Tuesday 29 April 2008 07:53:01 am Brian J. Murrell wrote:
> Unless you are using Lustre for your root and/or usr filesystem
> and/or for swap, Lustre should not hang a machine completely.

I was precisely wondering if it was possible to use a file residing on a 
Lustre filesystem as a swap file. I tried the basic steps without any 
success.

On a regular ext3 fs, no problem:

/tmp # dd if=/dev/zero of=./swapfile  bs=1024 count=1024
10240+0 records in
10240+0 records out
/tmp # mkswap ./swapfile
Setting up swapspace version 1, size = 104853 kB
/tmp # swapon -a ./swapfile
/tmp # swapon -s
Filename   TypeSizeUsedPriority
/dev/sda3  partition   4096564 204 -1
/tmp/swapfile  file102392  0   -2

But on a Lustre mount:

# cd /scratch
/scratch # grep /scratch /proc/mounts
[EMAIL PROTECTED]:/scratch /scratch lustre rw 0 0
/scratch # dd if=/dev/zero of=./swapfile  bs=1024 count=1024
10240+0 records in
10240+0 records out
/scratch # mkswap ./swapfile
Setting up swapspace version 1, size = 104853 kB
/scratch # swapon -a ./swapfile
swapon: ./swapfile: Invalid argument

Is that expected?

Thanks,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] HW experience

2008-03-26 Thread Kilian CAVALOTTI
Hi Martin,

On Wednesday 26 March 2008 04:53:31 Martin Gasthuber wrote:
>   we would like to establish a small Lustre instance and for the OST
> planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB Ram)
> and for the disk a JBOD (MD1000) steered by the PE1950 internal Raid
> controller (Raid-6). Any experience (good or bad) with such a config ?

I also have a 50TB Lustre setup based on this hardware: 8 PE1950 OSSes 
connected to two MD1000 OSTs each. The MDS uses an MD3000 as an MDT for 
high availability (redundancy is not currently in use, though; I never 
managed to get it working reliably).

Can't say much about the PERC6 controller, since I'm using its older 
brother, the PERC5, but memory-wise you should be good with 16GB. We planned 
4GB per OSS (2 OSTs each) at the beginning, but we had to double that to 
avoid memory exhaustion [1]. It will depend on the load induced by the 
clients, though.

MD1000s' performance is great as long as you set the read-ahead settings as 
Aaron mentioned.

/scratch $ iozone -c -c -R -b ~/iozone.xls -C -r 64k -s 24m -i 0 -i 1 -i 
2 -i8 -t50   
"Throughput report Y-axis is type of test X-axis is number of processes"
"Record size = 64 Kbytes "
"Output is in Kbytes/sec"

"  Initial write " 1317906.72
"Rewrite " 2423618.81
"   Read " 3484409.47
"Re-read " 4023550.60
"Random read " 3361937.08
" Mixed workload " 2994666.57
"   Random write " 1777569.04


[1]http://lists.lustre.org/pipermail/lustre-discuss/2008-February/004874.html

Cheers,
-- 
Kilian 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre SNMP module

2008-03-20 Thread Kilian CAVALOTTI
On Thursday 20 March 2008 01:15:04 pm Mark Seger wrote:
> not sure if you're talking about collectl 

No, I wasn't; I was referring to the Lustre Monitoring Tool (LMT) from 
LLNL.

> Be careful here.  You can certain stick some data into an rrd but
> certainly not all of it, especially if you want to collect a lot of
> it at a reasonable frequency.  If you want accurate detail plots,
> you've gotta go to the data stored on each separate system.  I just
> don't see any way around this, at least not yet...

Yes, you're absolutely right. Given its intrinsic multi-scale nature, an 
RRD is well suited for keeping historical data on large time scales. 
This could allow a very convenient graphical overview of the different 
system metrics, but would be pointless for debugging purposes, where 
you do need fine-grained data. That's where collectl is the most useful 
for me. 

But what about both? I don't see any reason why collectl couldn't 
provide high-frequency, accurate data to diagnose problems locally, and 
at the same time aggregate less precise values into RRDs for 
global visualization of multi-host systems.

> As a final note, I've put together a tutorial on using collectl in a
> lustre environment and have upload a preliminary copy at
> http://collectl.sourceforge.net/Tutorial-Lustre.html in case anyone
> wants to preview it before I link it into the documentation.  
> If nothing else, look at my very last example where I show what you 
> can see by monitoring lustre at the same time as your network
> interface.  

Very good, thanks for this. The readahead experiment is insightful.

> Did I also mention that collectl is probably one of the few tools
> that can monitor your Infiniband traffic as well?

That's why it rocks. :)

Now the only thing which still makes me want to use other monitoring 
software is the ability to get a global view. Centralized data 
collection and easy graphing (RRD feeding) are still what I need most 
of the time.

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre SNMP module

2008-03-11 Thread Kilian CAVALOTTI
On Tuesday 11 March 2008 01:52:33 am Brian J. Murrell wrote:
> You could do that, but I suspect that if you want to see those
> developments include SNMP access to the stats, you are going to have
> to be more proactive than just following the current development.  I
> don't have any more insight than what's in that thread about the
> plans underway but I'd be very surprised if they currently include
> SNMP.  I could be wrong but I suspect that if you want to see SNMP
> availability you'd have to get active 

Gotcha. Bug #15197, "Feature request: expand SNMP scope"

> either with participating in 
> the design and perhaps some hacking 

I'm not sure I can be of any help in this area, unfortunately. But 
I've seen that some users expressed the same kind of need and rolled up 
their sleeves :) 
http://lists.lustre.org/pipermail/lustre-devel/2008-January/001504.html

> or voicing your desires through 
> your sales channel.

That I can do. :)

Thanks for the advice,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre SNMP module

2008-03-10 Thread Kilian CAVALOTTI
Hi Brian, 

On Monday 10 March 2008 03:04:33 pm Brian J. Murrell wrote:
> I can't disagree with that, especially as Lustre installations get
> bigger and bigger.  Apart from writing custom monitoring tools,
> there's not a lot of "pre-emptive" monitoring options available. 
> There are a few tools out there like collectl (never seen it, just
> heard about it) 

collectl is very nice, but like dstat and such, it has to run on each and
every host. It can provide its results via sockets though, so it could
be used as a centralized monitoring system for a Lustre installation.

And it provides detailed statistics too:

# collectl -sL -O R
waiting for 1 second sample...

# LUSTRE CLIENT DETAIL: READAHEAD
#Filsys   Reads ReadKB  Writes WriteKB  Pend  Hits Misses NotCon MisWin LckFal  
Discrd ZFile ZerWin RA2Eof HitMax
home100192   0   0 0 0100  0  0  0  
0  0100  0  0
scratch 100192   0   0 0 0100  0  0  0  
0  0100  0  0
home102   6294  23 233 0 0 87  0  0  0  
0  0 87  0  0
scratch 102   6294  23 233 0 0 87  0  0  0  
0  0 87  0  0
home 95158  22 222 0 0 81  0  0  0  
0  0 81  0  0
scratch  95158  22 222 0 0 81  0  0  0  
0  0 81  0  0

# collectl -sL -O M
waiting for 1 second sample...

# LUSTRE CLIENT DETAIL: METADATA
#Filsys   Reads ReadKB  Writes WriteKB  Open Close GAttr SAttr  Seek Fsync 
DrtHit DrtMis
home  0  0   0   0 0 0 0 0 0 0  
0  0
scratch   0  0   0   0 0 0 2 0 0 0  
0  0
home  0  0   0   0 0 0 0 0 0 0  
0  0
scratch   0  0   0   0 0 0 0 0 0 0  
0  0
home  0  0   0   0 0 0 0 0 0 0  
0  0
scratch   0  0   0   0 0 0 1 0 0 0  
0  0

# collectl -sL -O B
waiting for 1 second sample...

# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#Ost  Rds  RdK   1K   2K   4K   8K  16K  32K  64K 128K 256K Wrts 
WrtK   1K   2K   4K   8K  16K  32K  64K 128K 256K
home-OST0007000000000000
0000000000
scratch-OST0007 00900000000   12 
3075900000003
home-OST0007000000000000
0000000000
scratch-OST0007 001000000001
2100000000
home-OST0007000000000000
0000000000
scratch-OST0007 001000000001
2100000000


> and LLNL have one on sourceforge, 

Last time I checked, it only supported 1.4 versions, but it's been a while, 
so I'm probably a bit behind.

> but I can certainly  
> see the attraction at being able to monitor Lustre on your servers
> with the same tools as you are using to monitor the servers' health
> themselves.

Yes, that'd be a strong selling point.

> This could wind becoming a lustre-devel@ discussion, but for now, it
> would be interesting to extend the interface(s) we use to
> introduce /proc (and what will soon be it's replacement/augmentation)
> stats files so that they are automagically provided via SNMP.

That sounds like the way to proceed, indeed.

> You know, given the discussion in this thread:
> http://lists.lustre.org/pipermail/lustre-devel/2008-January/001475.ht
>ml now would be a good time for the the community (that perhaps might
> want to contribute) desiring SNMP access to get their foot in the
> door. Ideally, you get SNMP into the generic interface and then SNMP
> access to all current and future variables comes more or less free.

Oh, thanks for pointing this out. It looks like major underlying changes 
are coming. I think I'll subscribe to the lustre-devel ML to try to 
follow them.

> That all said, there are some /proc files which provide a copious
> amount of information, like brw_stats for instance.  I don't know how
> well that sort of thing maps to SNMP, but having an SNMP manager
> watching something as useful as brw_stats for trends over time could
> be quite interesting.

Add some RRD graphs to keep historical variations, and you've got the 
all-in-one Lustre monitoring tool we sysadmins are all waiting for. ;)

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Re: [Lustre-discuss] Lustre SNMP module

2008-03-10 Thread Kilian CAVALOTTI
Hi Klaus,

On Friday 07 March 2008 05:52:51 pm Klaus Steden wrote:
> I was asking that same question a few months ago.

Yes, I remember you weren't overwhelmed with answers. :\

> I can send you my 
> 1.6.2 spec file for reference ... That version also did not bundle
> the SNMP library, so I ended up building it by recompiling the whole
> set of Lustre RPMs to get what I needed, and then just dropped the
> DSO in place.

That's exactly what I did, finally.
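
For reference, once the DSO is built, loading it into net-snmp usually
amounts to something like this (the module path is the one from my build;
yours may differ):

# echo "dlmod lustresnmp /usr/lib/lustre/snmp/lustresnmp.so" >> /etc/snmp/snmpd.conf
# service snmpd restart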

> I'm curious as to what metrics you see to be useful -- I wasn't sure
> what to look for, so while I installed the module, I haven't yet
> thought of good things to ask of it.

So, from what I've seen in the MIB, the current SNMP module mainly 
reports version numbers and free space information.

I think it would also be useful to get "activity metrics", the same kind 
of information which is in /proc/fs/lustre/llite/*/stats on clients (so 
we can see reads/writes and fs operations rates), 
in /proc/fs/lustre/obdfilter/*/stats on OSSes and 
in /proc/fs/lustre/mds/*/stats on MDSes.

Actually, all of the /proc/fs/lustre/*/**/stats could be useful, but I 
guess which precise metric is the most useful heavily depends on what 
you want to see. :)

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre SNMP module

2008-03-07 Thread Kilian CAVALOTTI
On Friday 07 March 2008 05:01:11 pm Kilian CAVALOTTI wrote:
> So I was wondering if there was any plan to include the SNMP module
> back in future RPM versions?

And in addition to that, is there any plan to add more stats through 
this SNMP module (the kind we find  
in /proc/fs/lustre/{llite,ost,mdt}/.../stats)? That'd be an excellent 
starting point to collect metrics and remotely monitor a Lustre setup 
from a central location.

Thanks,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre SNMP module

2008-03-07 Thread Kilian CAVALOTTI
Hi all,

I'd like to get some Lustre info from my OSS/MDSs through SNMP. So I'm 
reading the Lustre manual, and it indicates [1] that the lustresnmp.so 
file should be provided by the "base Lustre RPM". But it's not. :) At 
least not in the 1.6.4.1 RHEL4 x86_64 RPMs.

So I was wondering if there was any plan to include the SNMP module back 
in future RPM versions?

Thanks,
-- 
Kilian

[1]http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-15-1.html
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre dstat plugin

2008-03-07 Thread Kilian CAVALOTTI
Hi Brock,

On Wednesday 05 March 2008 05:21:51 pm Brock Palen wrote:
> I have wrote a lustre dstat plugin.  You can find it on my blog:

That's cool! Very useful for my daily work, thanks!

> It only works on clients, and has not been tested on multiple mounts,
> Its very simple just reads /proc/

It indeed doesn't read stats for multiple mounts. I slightly modified it 
so it can display read/write numbers for all the mounts it finds (see 
the attached patch).

Here's a typical output for an rsync transfer from scratch to home:

-- 8< ---
$ dstat -M lustre

Module dstat_lustre is still experimental.
--scratch---home---
 read write: read write
 110M0 :   0   110M
 183M0 :   0   183M
 184M0 :   0   184M
-- 8< ---

Maybe it could be useful to also add the other metrics from the stats 
file, but I'm not sure which ones would be the most relevant. And it 
would probably be wise to do that in a separate module, like 
lustre_stats, to avoid clutter.

Anyway, great job, and thanks for sharing it!
Cheers,
-- 
Kilian
--- dstat_lustre_orig.py	2008-03-07 15:54:10.0 -0800
+++ dstat_lustre.py	2008-03-07 15:54:36.0 -0800
@@ -5,28 +5,33 @@
 
 class dstat_lustre(dstat):
 	def __init__(self):
-		self.name = 'lustre 1.6 client'
-		for entry in os.listdir("/proc/fs/lustre/llite"):
-			filesystem = '/'.join(['/proc/fs/lustre/llite',entry,'stats'])
-			self.open(filesystem)
+		self.name = []
+		self.vars = []
+		if os.path.exists('/proc/fs/lustre/llite'):
+			for mount in os.listdir('/proc/fs/lustre/llite'):
+				self.vars.append(mount)
+				self.name.append(mount[:mount.rfind('-')])
 		self.format = ('f', 5, 1024)
-		self.vars = ('read', 'write')
-		self.nick = ('read', 'writ')
-		self.init(self.vars, 1)
+		self.nick = ('read', 'write')
+		self.init(self.vars, 2)
 		info(1, 'Module dstat_lustre is still experimental.')
 
 	def extract(self):
-		for line in self.readlines():
-			l = line.split()
-			if not l or l[0] != 'read_bytes': continue
-			self.cn2['read'] = long(l[6])
-		for line in self.readlines():
-			l = line.split()
-			if not l or l[0] != 'write_bytes': continue
-			self.cn2['write'] = long(l[6])
 		for name in self.vars:
-			self.val[name] = (self.cn2[name] - self.cn1[name]) * 1.0 / tick
-		if step == op.delay:
-			self.cn1.update(self.cn2)
+			f = open('/'.join(['/proc/fs/lustre/llite',name,'stats']))
+			lines = f.readlines()
+			for line in lines:
+				l = line.split()
+				if not l or l[0] != 'read_bytes': continue
+				read = long(l[6])
+			for line in lines:
+				l = line.split()
+				if not l or l[0] != 'write_bytes': continue
+				write = long(l[6])
+			self.cn2[name] = (read, write)
+			self.val[name] = ( (self.cn2[name][0] - self.cn1[name][0]) * 1.0 / tick,\
+			   (self.cn2[name][1] - self.cn1[name][1]) * 1.0 / tick ) 
+			if step == op.delay:
+				self.cn1.update(self.cn2)
 
 # vim:ts=4:sw=4
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Downloads

2008-02-14 Thread Kilian CAVALOTTI
On Thursday 14 February 2008 01:34:46 pm Cliff White wrote:
> http://downloads.clusterfs.com/
> should be working now. Please let us know if there are further
> issues. cliffw

That looks way better. :)

Thanks!
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Downloads

2008-02-14 Thread Kilian CAVALOTTI
On Thursday 14 February 2008 12:09:59 pm Canon, Richard Shane wrote:
> I see that the download site has been moved and integrated into the
> Sun site.  It looks like this broke a few things.  For one, I can't
> get to any of the 1.4 releases.  Can this get fixed?

Grrr, I second this, and I don't like having to "register" to download 
a tarball either...

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] how do you mount mountconf (i .e. 1.6) lustre on your servers?

2008-02-14 Thread Kilian CAVALOTTI
On Thursday 14 February 2008 10:44:33 am Brian J. Murrell wrote:
> > on an OSS:
> >   /dev/sdb  /lustre/ost-home  lustre  defaults,_netdev 0 0
>
> No heartbeat or failover then?

Nope. We initially planned to implement failover on our MDS, but I never 
managed to get Heartbeat working reliably on our shared-bus 
configuration. It caused more downtime than it provided 
high availability.

We also had hardware issues, which likely caused the problems, but now 
that our cluster is in production, I can't really bring it down to 
reimplement failover. Users would probably begin to throw things at 
me... :)

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] how do you mount mountconf (i.e. 1.6) lustre on your servers?

2008-02-14 Thread Kilian CAVALOTTI
Hi Brian,

On Thursday 14 February 2008 10:38:45 am Brian J. Murrell wrote:
> I'd like to take a small survey on how those of you using mountconf
> (1.6) are managing the mounting of your Lustre devices on the
> servers. 

We do use /etc/fstab, with the _netdev option (RHEL4):

on a client:
  [EMAIL PROTECTED]:/home   /home  lustre  defaults,flock,_netdev  0 0

on an OSS:
  /dev/sdb  /lustre/ost-home  lustre  defaults,_netdev 0 0


Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] o2iblnd no resources

2008-02-04 Thread Kilian CAVALOTTI
On Sunday 03 February 2008 06:30:16 am Isaac Huang wrote:
> It depends on the architectures of the OSSes - o2iblnd, and I believe
> OFED too, can't use memory in ZONE_HIGHMEM. For example, on x86_64
> where ZONE_HIGHMEM is empty, adding more RAM will certainly help.

Good to know, thanks.

On the strange side, this "no resources" message only appears on one 
client. It gets it from pretty much all of our 8 OSSes, while all the other 
276 clients can still access the filesystem (hence all the 8 OSSes) 
with not a single problem. Rebooting the problematic client doesn't 
help either.

Does that sound like something logic can explain? I would assume 
that if the OSS were out of memory, this would affect all 
the clients indiscriminately, right?

Thanks,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Luster clients getting evicted

2008-02-04 Thread Kilian CAVALOTTI
On Monday 04 February 2008 10:17:37 am Brock Palen wrote:
> The
> cluster IS too big, but there isn't a person at the university who is
> willing to pay for anything other than more cluster nodes.  Enough
> with politics.

That's the first time I hear that a cluster is too big; people usually 
complain about the contrary. :)
But the second part sounds very, very familiar... Anyway.

> I just had another node get evicted while running code causing the
> code to lock up.  This time it was the MDS that evicted it.  Pinging
> work though:
>
> [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]

Ok.

> I have attached the output of lctl dk  from the client and some
> syslog messages from the MDS.

(recover.c:188:ptlrpc_request_handle_notconn()) import 
nobackup-MDT-mdc-01012bd27c00 of 
[EMAIL PROTECTED]@tcp abruptly disconnected: 
reconnecting
(import.c:133:ptlrpc_set_import_discon()) 
nobackup-MDT-mdc-01012bd27c00: Connection to service 
nobackup-MDT via nid [EMAIL PROTECTED] was lost; 

I will let Lustre people comment on this, but this sure looks like a 
network problem to me.

Is there any information you can get out of the switches (logs, dropped 
packets, retries, stats, anything)?

> Nope both servers have 2GB ram, and load is almost 0.  No swapping.

Do you see dropped packets or errors in your ifconfig output, on the 
servers and/or clients?
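
For instance, something like this (the interface name is just an example):

# ifconfig eth0 | grep -E "errors|dropped|overruns"
# ethtool -S eth0 | grep -iE "drop|err"     # if the driver exposes stats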

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Luster clients getting evicted

2008-02-04 Thread Kilian CAVALOTTI
Hi Brock,

On Monday 04 February 2008 07:11:11 am Brock Palen wrote:
> on our cluster that has been running lustre for about 1 month. I have
> 1 MDT/MGS and 1 OSS with 2 OST's.
>
> Our cluster uses all Gige and has about 608 nodes 1854 cores.

This seems to be a lot of clients for only one OSS (and thus for only 
one GigE link to the OSS).

> We have a lot of jobs that die, and/or go into high IO wait,  strace
> shows processes stuck in fstat().
>
> The big problem is (i think) I would like some feedback on it that of
> these 608 nodes 209 of them have in dmesg the string
>
> "This client was evicted by"
>
> Is this normal for clients to be dropped like this?  

I'm not an expert here, but evictions typically occur when a client 
hasn't been seen for a certain period by the OSS/MDS. This is often 
related to network problems. Considering your number of clients, if 
they all do I/O operations on the filesystem concurrently, maybe your 
Ethernet switches are the bottleneck and have to drop packets. Is your 
GigE network working fine outside of Lustre?

To eliminate networking issues from the equation, you can try to lctl 
ping your MDS and OSS from a freshly evicted node, and see what you 
get. (lctl ping )
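
For example (the NID here is purely illustrative):

# lctl ping 192.168.10.2@tcp0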

> Is there some 
> tuning that needs to be done to the server to carry this many nodes
> out of the box?  We are using default lustre install with Gige.

Do your MDS or OSS show any particularly high load or memory usage? Do 
you see any Lustre-related error messages in their logs?

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mkfs.lustre and disk partitions

2008-02-04 Thread Kilian CAVALOTTI
Hi Jim,

On Monday 04 February 2008 07:04:10 am Jim Albin wrote:
> Hello,
>  I've seen several notes mentioning the disadvantages of using disk
> partitions on the storage devices for Lustre OSTs (and/or MDTs). My
> questions, if anyone can help me, are;
>
> 1) Should I delete any existing partitions on the device?

There's no need to explicitly destroy the partitions if you overwrite 
them.

> 2) If not, should I partition the device into a single partition with
> a specific block size (maybe 1mb)?

No need either. One single partition is still a partition.

> 3) Can I just use the disk block device (eg, /dev/sda) when I
> mkfs.lustre and is it
> smart enough to ignore the partition table?

Yes, exactly. Generally speaking, mkfs /dev/sdb will use the whole sdb 
device for the filesystem, and you won't have any partition table. As a 
consequence, you won't be able to boot from it, which is not relevant 
here, but all the other operations will work as on any regular 
partition (tunefs, mount, etc).
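
For example, something along these lines (fsname, MGS NID and mount point
are placeholders):

# mkfs.lustre --ost --fsname=lustre --mgsnode=mds@tcp0 /dev/sda
# mount -t lustre /dev/sda /mnt/ost0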

> I'm trying to set up Lustre 1.6.3 and am seeing poor performance,
> possibly fragmentation on the mdt and ost.
> Thanks in advance for any suggestions.

I don't know what backend hardware you're using, but in case of Dell 
MD1000s, you probably can give a look (and a try) at: 
http://thias.marmotte.net/archives/2008/01/05/Dell-PERC5E-and-MD1000-performance-tweaks.html

HTH,
Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] o2iblnd no resources

2008-02-02 Thread Kilian CAVALOTTI
Hi Liang, 

On Friday 01 February 2008 23:39:09 you wrote:
> I think it's because o2iblnd uses fragmented RDMA by default(Max to
> 256), so we have to set max_send_wr as (concurrent_send * (256 + 1))
> while creating QP by rdma_create_qp(), it takes a lot of resource and
> can make a busy server out of memory sometime.

By the way, is there a way to free some of this memory to resolve the 
problem temporarily, without having to restart the OSS?

Thanks,
-- 
Kilian 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] o2iblnd no resources

2008-02-02 Thread Kilian CAVALOTTI
On Saturday 02 February 2008 00:42:47 Isaac Huang wrote:
> > Here is patch for this problem (using FMR in o2iblnd)
> > https://bugzilla.lustre.org/attachment.cgi?id=15144
>
> This is an experimental patch - nodes with the patch applied are not
> interoperable with those without it. Please don't propagate the patch
> to production systems.

Thanks for the explanation. Since the problem indeed occurs on a production 
system, I'd rather keep experimental patches out of the way.

I assume that adding more RAM on the OSSes is likely to solve this problem, 
right? If that's the case, I'd probably go this way, before the FMR patch 
is landed.

Thanks,
-- 
Kilian 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] o2iblnd no resources

2008-02-01 Thread Kilian CAVALOTTI
Hi all,

What can cause a client to receive a "o2iblnd no resources" message 
from an OSS?
---
Feb  1 15:24:24 node-5-8 kernel: LustreError: 
1893:0:(o2iblnd_cb.c:2448:kiblnd_rejected()) [EMAIL PROTECTED] rejected: 
o2iblnd no resources
---

I suspect an out-of-memory problem, and indeed the OSS logs are filled
up with the following:
---
ib_cm/3: page allocation failure. order:4, mode:0xd0

Call Trace:{__alloc_pages+777} 
{alloc_page_interleave+61}
   {__get_free_pages+11} 
{kmem_getpages+36}
   {cache_alloc_refill+609} 
{__kmalloc+123}
   {:ib_mthca:mthca_alloc_qp_common+668}
   {:ib_mthca:mthca_alloc_qp+178} 
{:ib_mthca:mthca_create_qp+311}
   {:ib_core:ib_create_qp+20} 
{:rdma_cm:rdma_create_qp+43}
   {dma_pool_free+245} 
{:ib_mthca:mthca_init_cq+1073}
   {:ib_mthca:mthca_create_cq+282} 
{alloc_page_interleave+61}
   {:ko2iblnd:kiblnd_cq_completion+0}
   {:ko2iblnd:kiblnd_cq_event+0} 
{:ib_core:ib_create_cq+33}
   {:ko2iblnd:kiblnd_create_conn+3565}
   {:libcfs:cfs_alloc+40} 
{:ko2iblnd:kiblnd_passive_connect+2215}
   {:ib_core:ib_find_cached_gid+244}
   {:rdma_cm:cma_acquire_dev+293} 
{:ko2iblnd:kiblnd_cm_callback+64}
   {:ko2iblnd:kiblnd_cm_callback+0}
   {:rdma_cm:cma_req_handler+863} 
{alloc_layer+67}
   {idr_get_new_above_int+423} 
{:ib_cm:cm_process_work+101}
   {:ib_cm:cm_req_handler+2398} 
{:ib_cm:cm_work_handler+0}
   {:ib_cm:cm_work_handler+46} 
{worker_thread+419}
   {default_wake_function+0} 
{__wake_up_common+67}
   {default_wake_function+0} 
{keventd_create_kthread+0}
   {worker_thread+0} 
{keventd_create_kthread+0}
   {kthread+200} {child_rip+8}
   {keventd_create_kthread+0} 
{kthread+0}
   {child_rip+0}
Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty

Free pages:   35336kB (0kB HighMem)
Active:534156 inactive:127091 dirty:1072 writeback:0 unstable:0 free:8834 
slab:146612 mapped:26222 pagetables:1035
Node 0 DMA free:9832kB min:52kB low:64kB high:76kB active:0kB inactive:0kB 
present:16384kB pages_scanned:37 all_unreclaimable? yes
protections[]: 0 510200 510200
Node 0 Normal free:25504kB min:16328kB low:20408kB high:24492kB 
active:2136624kB inactive:508364kB present:4964352kB pages_scanned:0 
all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB 
present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 
0*2048kB 2*4096kB = 9832kB
Node 0 Normal: 1284*4kB 2290*8kB 126*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 
0*1024kB 0*2048kB 0*4096kB = 25504kB
Node 0 HighMem: empty
Swap cache: add 111, delete 111, find 23/36, race 0+0
Free swap:   4096360kB
1245184 pages of RAM
235840 reserved pages
659867 pages shared
0 pages swap cached
---

IB links are up and working on both the client and the OSS:
---
client# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0005:ad00:0008:af71
base lid:0x83
sm lid:  0x130
state:   4: ACTIVE
phys state:  5: LinkUp
rate:20 Gb/sec (4X DDR)
oss# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0005:ad00:0008:cb11
base lid:0x126
sm lid:  0x130
state:   4: ACTIVE
phys state:  5: LinkUp
rate:20 Gb/sec (4X DDR)
---
And the Subnet Manager doesn't expose any unusual error or skyrocketed 
counter (I use OFED 1.2, kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp).

What I don't really get is that most clients can access files on this
OSS with no issue, and besides, my limited understanding of the kernel
memory mechanisms tends to make me believe that this OSS is not out of 
memory:
---
# 

Re: [Lustre-discuss] Lustre 1.6.4.1 - client lockup

2008-01-25 Thread Kilian CAVALOTTI
Hi Niklas,

On Friday 25 January 2008 07:10:47 am Niklas Edmundsson wrote:
> We're able to consistently kill the lustre client with bonnie in
> combination with striping. 

Out of curiosity, I tried to reproduce your experiment, and didn't 
encounter any problem. All the bonnie processes ran fine.

There are a lot of significant differences between our test 
environments, but I thought it might be useful to know the results of 
your test case on a different system.

> This is Lustre 1.6.4.1, Debian 2.6.18 
> amd64 kernel with lustre patches on both server and clients 

I used Lustre 1.6.4.1, RHEL4 and 2.6.9-55.0.9.EL_lustre.1.6.4.1smp amd64 
x86_64 kernel.

> All machines are dual opterons connected with GigE.

They are Intel quad-cores (E5345) connected with IB.

> We have 5 servers, 1 MDS with 1 MGS and 1 MDT target and 4 OSS's with
> 2 OST targets (~1.2TB) each.

We have 9 servers, 1 MDS with MGS and MDT, and 8 OSSs with 2 OSTs each.

> Jan 25 11:16:23 BUG: soft lockup detected on CPU#1!

> After 10-15 minutes it locks up, this time with a bunch of
> LustreErrors before the stack trace:

They look like a network interruption problem, but it's hard to tell if 
that's the cause or the consequence. Could it be that your Ethernet 
switches dropped some packets?

Cheers,
-- 
Kilian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss