Re: [lustre-discuss] LUG 2022 REGISTRATION IS NOW OPEN!
Hi Kirill, Thanks for the invitation, the openness and transparency are very much appreciated. Now, for the issue at hand, I expect it will be quite hard for people to justify the registration cost for a fully-virtual event, when the slides and recording will be posted online and publicly available a few days later. I don't know how past LUG attendees feel about this, but I'm concerned charging users $175 for the privilege to attend LUG 2022 through Zoom may not be exactly aligned with OpenSFS' missions of promoting Lustre usage, increasing awareness and expanding its community. Anyway, that's my $.02. Cheers, -- Kilian On Fri, Apr 22, 2022 at 4:13 PM Kirill Lozinskiy via lustre-discuss < lustre-discuss@lists.lustre.org> wrote: > > Kilian, > > Thank you for bringing this up and expressing your concerns! > > We typically provide an overview of the OpenSFS finances at the Annual Members Meeting that takes place at LUG. We understand that there is interest in going deeper into the annual budget, so the OpenSFS Board of Directors would like to invite any interested OpenSFS Members and Participants to attend a budget deepdive next week on Wednesday, April 27th at 12:30 PM Pacific time. If you are interested in attending the Board Zoom call, please let us know so we can send you the invite. You can email ad...@opensfs.org if you are interested in attending. > > Thank you for bringing this up, and we hope to see you next week! > > Warm regards, > > Kirill Lozinskiy > OpenSFS Treasurer > > > On Thu, Apr 21, 2022 at 3:17 PM Kilian Cavalotti via Execs < ex...@lists.opensfs.org> wrote: >> >> Dear OpenSFS, >> >> On Thu, Apr 21, 2022 at 9:06 AM OpenSFS Administration via >> lustre-discuss wrote: >> >> > We’re excited to announce that registration for the Lustre User Group (LUG) 2022 virtual conference is now open. REGISTER ONLINE. General registration is $175. >> >> So, just to clarify: the event is entirely virtual and yet >> participants will be charged a $175 registration fee? >> That seems a bit steep... :( >> >> What's the rationale here? Especially considering that registration >> for both LUG 2021 and LUG Webinar Series in 2020 (both virtual events >> as well) was free. >> >> Cheers, >> -- >> Kilian > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] LUG 2022 REGISTRATION IS NOW OPEN!
Dear OpenSFS, On Thu, Apr 21, 2022 at 9:06 AM OpenSFS Administration via lustre-discuss wrote: > We’re excited to announce that registration for the Lustre User Group (LUG) > 2022 virtual conference is now open. REGISTER ONLINE. General registration > is $175. So, just to clarify: the event is entirely virtual and yet participants will be charged a $175 registration fee? That seems a bit steep... :( What's the rationale here? Especially considering that registration for both LUG 2021 and LUG Webinar Series in 2020 (both virtual events as well) was free. Cheers, -- Kilian ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] lustre-discuss Digest, Vol 162, Issue 10
Hi Andrew, On Mon, Sep 23, 2019 at 5:56 AM Tauferner, Andrew T wrote: > What is the outlook for 2.12.3 and 2.13 availability? I thought 2.12.3 would > already be available but I don't even see a release candidate in git. Thank > you.

As just mentioned during LAD'19 today (https://www.eofs.eu/_media/events/lad19/lad19_paper_3.pdf):
- 2.12.3 is targeted for the end of the month
- 2.13 for Q4 2019

Cheers, -- Kilian ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] ZFS not freeing disk space
Hi Thomas, On Wed, Aug 10, 2016 at 10:57 AM, Thomas Roth wrote: > one of our (Lustre 2.5.3, ZFS 0.6.3) OSTs got filled up to >90%, so I > deactivated it and am now migrating files off of that OST. > > But when I do either 'lfs df' or 'df' on the OSS, I don't see any change > in terms of bytes, while the migrated files already sum up to several GB.

It's very likely because your OST is deactivated, i.e. disconnected from the MDS, and thus freed-up space is not accounted for. When you reactivate your OST, it will reconnect to the MDS, which will start cleaning up orphan inodes (i.e. inodes that still exist on the OST but are not referenced by any file on the MDT anymore). You should see messages like "lustre-OST: deleting orphan objects from 0x0:180570872 to 0x0:180570891" when this happens. That's actually how it's supposed to work, but there are some limitations in 2.5 that may require a restart of the MDS. See https://jira.hpdd.intel.com/browse/LU-7012 for details. And of course, as soon as you re-activate your OST, new files will be created on it, so it may skew the counters the other way. But AFAIK, it's not specific to ZFS at all. Cheers, -- Kilian ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
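For reference, a minimal sketch of re-activating the OST from the MDS once the migration is done (device names and exact syntax vary between Lustre versions, so treat this as an assumption to adapt rather than a recipe):

# on the MDS, find the device corresponding to the full OST
lctl dl | grep OST
# re-activate it by device name or number
lctl --device <devname-or-number> activate

Once the MDS has reconnected to the OST, the orphan cleanup described above should kick in and the freed space should become visible in 'lfs df' again.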
Re: [Lustre-discuss] Performance dropoff for a nearly full Lustre file system
Hi all, On Wed, Jan 14, 2015 at 7:27 PM, Dilger, Andreas wrote: > Of course, fragmentation also plays a role, which is why ldiskfs will reserve > 5% of the disk by default to avoid permanent performance loss caused by > fragmentation if the filesystem gets totally full. Ashley Pittman gave a presentation at LAD'13 about the influence of fragmentation on performance. http://www.eofs.eu/fileadmin/lad2013/slides/03_Ashley_Pittman_Fragmentation_lad13.pdf Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Build lustre 2.6 Client on Debian Wheezy
Bonjour Thierry, > /bin/sh: 1: [: -lt: unexpected operator I'm pretty sure that's because in Debian, /bin/sh is linked to dash and the Lustre build script expects bash. You can try to run:

# dpkg-reconfigure dash

choose No to link /bin/sh to bash, and re-run the make-kpkg part. Hopefully it will work better. I suggest re-running "dpkg-reconfigure dash" afterwards to restore dash as the default shell. Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
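For reference, a quick way to check and flip the symlink by hand, which is essentially what dpkg-reconfigure does under the hood (paths assumed for Debian Wheezy):

# ls -l /bin/sh
lrwxrwxrwx 1 root root 4 ... /bin/sh -> dash
# ln -sf bash /bin/sh    # use bash for the duration of the build
# ln -sf dash /bin/sh    # restore the default afterwards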
Re: [Lustre-discuss] Unable to write to the Lustre File System as any user except root
Hi Carl, On Tue, Oct 11, 2011 at 9:07 PM, Barberi, Carl E wrote: > " LustreError: 11-0: an error occurred while communicating with > 192.168.10.2@o2ib. The mds_getxattr operation failed with -13." Your MDS is likely missing identity information about the user you're trying to write as (-13 is EACCES). Just configure NIS, LDAP or whatever directory service you're using on your MDS as well, and you should be good to go. Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
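For reference, a quick sanity check to run on the MDS (the username is a placeholder):

# getent passwd someuser
# id someuser

If these resolve on the clients but return nothing on the MDS, the MDS cannot map the UID, and metadata operations such as mds_getxattr will fail with -13 (EACCES) for that user.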
Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover
On Thursday 31 July 2008 17:22:28 Brock Palen wrote: > What's a good tool to grab this? It's more than one page long, and the > machine does not have serial ports.

If your servers do IPMI, you can probably configure Serial-over-LAN to get a console and capture the logs. But an even more convenient solution is netdump. As long as the network connection is working on the panicking machine, you should be able to transmit the kernel panic info, as well as a stack trace, to a netdump server, which will store it in a file. See http://www.redhat.com/support/wpapers/redhat/netdump/

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
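For reference, a minimal sketch of a netdump setup on RHEL4-era systems (package names, paths and service commands quoted from memory, so treat them as assumptions to verify):

On the machine that will collect the dumps:
# yum install netdump-server
# passwd netdump              # password the clients will use
# service netdump-server start

On each machine expected to panic, point /etc/sysconfig/netdump at the collector:
NETDUMPADDR=10.0.0.1          # collector's IP, placeholder
# service netdump propagate   # exchange credentials with the server
# service netdump start

Oopses, panics and stack traces then land under /var/crash/ on the collector.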
Re: [Lustre-discuss] Another download other than Sun?
Hi Jeremy, On Thursday 31 July 2008 01:04:24 pm Jeremy Mann wrote: > I'm having difficulties downloading 1.6.5.1 through Sun. Every time I > get a "General Protection" error. I really need to get this version > so I can go home at a decent time tonight. Can somebody point me to > an alternative location to download 1.6.5.1 for RHEL4? I get the same error with Konqueror. However, the download page works from Firefox, so you may want to try that. Although I agree that the plain Apache DirectoryIndex version from pre-Sun times was much easier and convenient to use (wget love). But well... :) Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Luster recovery when clients go away
Hi Brock, On Thursday 31 July 2008 07:30:04 am Brock Palen wrote: > Is there a way to tell the OSTs to go ahead and evict those two > clients and finish recovering? Also "time remaining" has been 0 > since it was booted. How long will the OSTs wait before they let > operations continue?

Well, there should be a timeout, and recovery should be aborted anyway when the "time remaining" counter reaches 0, no matter how many clients have been recovered (the remaining ones are evicted, I believe). In case this doesn't work, you can still avoid the recovery process by mounting your OSTs with -o abort_recov.

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
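For reference, a hedged sketch of the abort_recov approach (device and mount point are placeholders):

# umount /lustre/ost0
# mount -t lustre -o abort_recov /dev/sdb /lustre/ost0

Aborting recovery simply evicts whatever clients have not reconnected yet, so it should only be used when you know the missing clients are really gone.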
Re: [Lustre-discuss] collectl
Hi Mark, > useful. I suppose one might also make that argument about things like > statfs, getattr - the only time I was able to make them change was in > response to lfs commands. Might that logic also be applied to > extended attributes and acl counters which I suspect also fall into > the category of slowly changing counters?

If you have ACLs enabled on your MDS, then every "ls -l" will induce getxattr()s and the mds_getxattr counter will be increased by as much. So this can change quickly. mds_setxattr, on the other hand, may change less often, since you usually set ACLs less often than you list files. But it can still be interesting to see if mds_setxattr goes through the roof.

> On the other hand, it seems like the 'reint' counters are the ones > that tend to change a lot. Perhaps a clue is they're all prefaced > with reint which leads me to ask if there is some simple definition > of what reint actually means other than 'reintegrated operations'?

I'd bet on "request identification" or something along those lines.

> Perhaps such a definition will help explain why setattr is a reint > counter but getattr is not. In fact, I have seen getattr_lock change > a lot more than getattr. What is the difference between the 2 > (obviously the latter is some sort of lock but it must be used more > than just when incrementing getattr since they don't change > together)?

I'm only speculating here, but I believe that extended attributes which are modifiable by a user on a client (like ACLs) are counted in *_xattr, while internal extended attributes used by the MDS are counted in getattr.

> That all said, it feels like the data to report is all the reints, > getattr, getattr_lock and sync.

I would also be interested in seeing (dis)connect (this can probably reveal network problems, if it increases too much), as well as quotactl and get/setxattr, since I use quotas and ACLs. :)

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
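For reference, all of these counters can be read straight from /proc on the MDS (the Lustre 1.6 path mentioned elsewhere in this thread), which is a simple way to see which ones actually move under a given workload:

# grep -E 'getattr|setattr|getxattr|setxattr|reint|connect|quotactl' /proc/fs/lustre/mds/*/stats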
[Lustre-discuss] Questions about Lustre ACLs
Hi all, I've got a couple questions about ACLs in Lustre:

1. When they're enabled on the MDS, can a client mount the filesystem without them? It doesn't seem to be the case, but at the same time, the mount.lustre manpage mentions the noacl option in the "client-specific" section. See, for instance:

Checking ACLs on the MDS:
# lctl get_param -n mdc.home-MDT-mdc-*.connect_flags | grep acl
acl

Mounting the client with no ACLs:
# mount -t lustre -o noacl [EMAIL PROTECTED]:/home /home

ACLs are still in use:
# strace ls -al /home/kilian/mpihw.c 2>&1 | grep xattr
getxattr("/home/kilian/mpihw.c", "system.posix_acl_access"..., 0x0, 0) = -1 ENODATA (No data available)
getxattr("/home/kilian/mpihw.c", "system.posix_acl_default"..., 0x0, 0) = -1 ENODATA (No data available)

I believe getxattr() should return EOPNOTSUPP instead of ENODATA if ACLs were disabled.

2. My second question is about the overhead induced by the ACLs. I didn't do any quantifying measurements, but having ACLs enabled seems to slow down all MDS operations. An "ls" in a directory containing a lot of files "feels" way slower when ACLs are enabled on the MDS. Is that something to be expected?

Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Hi Stuart, On Friday 25 July 2008 11:19:18 am Stuart Marshall wrote: > The sequence I've used (perhaps not the best) is: > > - cd /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/source/ > - cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp .config > - make clean > - make mrproper Doesn't "make mrproper" erase the .config file you just copied? In which case you probably end up with a default kernel, which doesn't matter too much since it's only about compiling an external module, but I guess it can bite you back if you ever consider recompiling the whole kernel. Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Can't build sun rdac driver against lustre source.
Hi Brock, On Friday 25 July 2008 11:03:12 am Brock Palen wrote: > I just had to copy genksyms and mod from > linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 to linux-2.6.9-67.0.7.EL_lustre. > 1.6.5.1-obj > > I figured you should be aware of this, if it's a problem with Sun's > build system for their multipath driver or lustre source package. > This is on RHEL4. Using the lustre RPMs from Sun's website.

It's a problem with the fact that Lustre kernels for RHEL4 are packaged the SuSE way, with a /usr/src/linux-$VERSION-$RELEASE/ directory and a /usr/src/linux-$VERSION-$RELEASE-obj/$ARCH/$FLAVOR/ directory holding the object files, whereas RHEL4 expects everything to be located in /usr/src/linux-$VERSION-$RELEASE/.

A workaround for this is to put the .config file into the kernel sources directory and prepare the kernel tree manually. What I usually do is the following (this is for Lustre 1.6.5.1):

# rm /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/build
# ln -s /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1 /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5.1smp/build
# cp /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj/x86_64/smp/.config /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/
# cd /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/
# [edit Makefile, and replace 'custom' by 'smp' in EXTRAVERSION]
# make oldconfig
# make modules_prepare

And then you should be able to compile any additional kernel module.

> The next problem I am stuck on is: > > In file included from mppLnx26_spinlock_size.c:51: > /usr/include/linux/autoconf.h:1:2: #error Invalid kernel header > included in userspace > mppLnx26_spinlock_size.c: In function `main': > mppLnx26_spinlock_size.c:102: error: `spinlock_t' undeclared (first > use in this function)

Can't be sure it will fix this problem too, but it may be worth a try.

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OSS crashes
Hi Thomas, On Thursday 24 July 2008 09:24:11 am Thomas Roth wrote: > On the next crash I'll try to get a stack trace, and logging the > console to more than the xterm buffer surely is something we ought to > do as well. If you don't know it or use it already, maybe you could give netdump a try: http://www.redhat.com/support/wpapers/redhat/netdump/ It basically allows you to get crash dumps and stack traces from a remote machine. Very useful for gathering Lustre debug information. Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] specifying OST
Hi Mag, On Friday 11 July 2008 04:38:40 am Mag Gam wrote: > is it possible to create a file on a particular OST? I guess you can do so using the "lfs setstripe" command. You can set the striping information on a file or directory so that it only uses one OST. That's the case by default, but you need to use setstripe to specify which OST you want to use. For instance, the following commands will put "yourfile" on the first OST (id 0):

$ lfs setstripe --count 1 --index 0 yourfile
$ dd if=/dev/zero of=yourfile count=1 bs=100M
1+0 records in
1+0 records out
$ lfs getstripe yourfile
OBDS:
0: home-OST_UUID ACTIVE
[...]
yourfile
     obdidx     objid      objid      group
          0  33459243  0x1fe8c2b          0

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
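For reference, a quick way to see which OST indices are available before picking one (the mount point is a placeholder):

$ lfs df /home     # each OST target name ends in its index, e.g. home-OST0000_UUID

On recent Lustre releases the equivalent short options are -c and -i, e.g.:

$ lfs setstripe -c 1 -i 0 yourfile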
Re: [Lustre-discuss] 1.6.5 and OFED?
On Monday 16 June 2008 04:35:41 am Greenseid, Joseph M. wrote: > Is there any word on when the IB packages might be making it up to > the download site for 1.6.5? As had been previously noted, they were > missing when the rest of 1.6.5 was pushed.

I'd like to support this request, since this is part of the 1.6.5 Changelog:

"""
Severity: enhancement
Bugzilla: 15316
Description: build kernel-ib packages for OFED 1.3 in our release cycle
"""

Also, the download site lists lustre-client and lustre-client-modules RPMs for RHEL5 and SLES10, but not for RHEL4 nor SLES9. Is that by design, or are they missing too?

Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Gluster then DRBD now Lustre?
On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote: > > NYC == New York City? What > > is SJC? > > SJC == San Jose, California

That's what I thought, but if so, the following part loses me:

> This is working in a test setup, however there are some down sides. > The first is that DRBD only supports IP, so we have to run IPoIB over > our InfiniBand adapters, not an ideal solution.

Nathan, you won't be able to use InfiniBand between New York City and San Jose, CA, anyway, right? Even without considering IB cables' length limitation, and unless you can use some kind of dedicated, special-purpose link between your sites, the public Internet is not really able to provide bandwidth or latencies compatible with InfiniBand standards. IP is probably your best bet here, and DRBD would probably be an appropriate candidate for this kind of job. You probably don't want your synchronization data unencrypted over the public pipes, though, and you may need an extra VPN-ish layer to ensure data confidentiality.

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre Mount Crashing
On Monday 02 June 2008 08:35:35 am Charles Taylor wrote: > Unfortunately, getting the messages off the console (in the machine > room) means using a pencil and paper (you'd think we have something > as fancy as an IP-KVM console server, but alas, we do things, ahem, > "inexpensively" here).

There are a couple of solutions to help you there:

* using a serial console connected to a remote machine (costs a serial cable and some configuration).

* having an IPMI-enabled BMC, or any sort of remote-control card, should give you easy access to the machine's console, remotely. Those cards ain't cheap, but if you already got them in your servers, that's a good occasion to put them to use.

* and maybe the easiest, most inexpensive (no hardware involved) and most convenient one: using netdump [1]. You configure a netdump client on the machine you want to gather logs and traces from, and a netdump server on another host, to receive those messages. This solution proved to be really efficient in gathering Lustre's debug logs and crash dumps.

[1] http://www.redhat.com/support/wpapers/redhat/netdump/ and http://docs.freevps.com/doku.php?id=how-to:netdump

HTH, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Swap on Lustre (was: Client is not accesible when OSS/OST server is down)
Hi Brian, On Tuesday 29 April 2008 07:53:01 am Brian J. Murrell wrote: > Unless you are using Lustre for your root and/or usr filesystem > and/or for swap, Lustre should not hang a machine completely.

I was precisely wondering if it was possible to use a file residing on a Lustre filesystem as a swap file. I tried the basic steps without any success. On a regular ext3 fs, no problem:

/tmp # dd if=/dev/zero of=./swapfile bs=1024 count=1024
10240+0 records in
10240+0 records out
/tmp # mkswap ./swapfile
Setting up swapspace version 1, size = 104853 kB
/tmp # swapon -a ./swapfile
/tmp # swapon -s
Filename        Type        Size      Used  Priority
/dev/sda3       partition   4096564   204   -1
/tmp/swapfile   file        102392    0     -2

But on a Lustre mount:

# cd /scratch
/scratch # grep /scratch /proc/mounts
[EMAIL PROTECTED]:/scratch /scratch lustre rw 0 0
/scratch # dd if=/dev/zero of=./swapfile bs=1024 count=1024
10240+0 records in
10240+0 records out
/scratch # mkswap ./swapfile
Setting up swapspace version 1, size = 104853 kB
/scratch # swapon -a ./swapfile
swapon: ./swapfile: Invalid argument

Is that expected? Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] HW experience
Hi Martin, On Wednesday 26 March 2008 04:53:31 Martin Gasthuber wrote: > we would like to establish a small Lustre instance and for the OST > planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB Ram) > and for the disk a JBOD (MD1000) steered by the PE1950 internal Raid > controller (Raid-6). Any experience (good or bad) with such a config?

I also have a 50TB Lustre setup based on this hardware: 8 PE1950 OSSes connected to two MD1000 OSTs each. The MDS uses an MD3000 as an MDT for high availability (redundancy is not currently in use, though; I never managed to get it working reliably). Can't say much about the PERC6 controller, since I'm using its older brother, the PERC5, but memory-wise, you should be good with 16 GB. We planned 4 GB per OSS (2x OST each) at the beginning, but we had to double that to avoid memory exhaustion [1]. It will depend on the load induced by the clients, though. MD1000 performance is great as long as you set the read-ahead settings as Aaron mentioned.

/scratch $ iozone -c -c -R -b ~/iozone.xls -C -r 64k -s 24m -i 0 -i 1 -i 2 -i 8 -t 50
"Throughput report Y-axis is type of test X-axis is number of processes"
"Record size = 64 Kbytes "
"Output is in Kbytes/sec"
" Initial write "    1317906.72
"Rewrite "           2423618.81
" Read "             3484409.47
"Re-read "           4023550.60
"Random read "       3361937.08
" Mixed workload "   2994666.57
" Random write "     1777569.04

[1] http://lists.lustre.org/pipermail/lustre-discuss/2008-February/004874.html

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
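For reference, the read-ahead tweak referred to above usually boils down to something like the following on each OST block device (device name and value are placeholders; the right value depends on the RAID geometry):

# blockdev --getra /dev/sdb       # current read-ahead, in 512-byte sectors
# blockdev --setra 8192 /dev/sdb  # e.g. 4 MB of read-ahead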
Re: [Lustre-discuss] Lustre SNMP module
On Thursday 20 March 2008 01:15:04 pm Mark Seger wrote: > not sure if you're talking about collectl

No, I wasn't, I was referring to the Lustre Monitoring Tool (LMT) from LLNL.

> Be careful here. You can certainly stick some data into an rrd but > certainly not all of it, especially if you want to collect a lot of > it at a reasonable frequency. If you want accurate detail plots, > you've gotta go to the data stored on each separate system. I just > don't see any way around this, at least not yet...

Yes, you're absolutely right. Given its intrinsic multi-scale nature, an RRD is well suited for keeping historical data on large time scales. This could allow a very convenient graphical overview of the different system metrics, but would be pointless for debugging purposes, where you do need fine-grained data. That's where collectl is the most useful for me. But what about both? I don't see any reason why collectl couldn't provide high-frequency accurate data to diagnose problems locally, and at the same time allow aggregating less precise values in RRDs for global visualization of multi-host systems.

> As a final note, I've put together a tutorial on using collectl in a > lustre environment and have uploaded a preliminary copy at > http://collectl.sourceforge.net/Tutorial-Lustre.html in case anyone > wants to preview it before I link it into the documentation. > If nothing else, look at my very last example where I show what you > can see by monitoring lustre at the same time as your network > interface.

Very good, thanks for this. The readahead experiment is insightful.

> Did I also mention that collectl is probably one of the few tools > that can monitor your Infiniband traffic as well?

That's why it rocks. :) Now the only thing which still makes me want to use other monitoring software is the ability to get a global view. Centralized data collection and easy graphing (RRD feeding) are still what I need most of the time.

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
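For illustration only (file name, step and retention periods are made-up assumptions), the kind of round-robin database that could hold a coarse aggregate of one Lustre counter for the "global view", while collectl keeps the fine-grained data locally:

$ rrdtool create oss_write_bytes.rrd --step 60 \
    DS:write_bytes:COUNTER:120:0:U \
    RRA:AVERAGE:0.5:1:1440 \
    RRA:AVERAGE:0.5:60:720

That keeps one-minute averages for a day and one-hour averages for a month, which is plenty for overview graphs without touching the per-host detail.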
Re: [Lustre-discuss] Lustre SNMP module
On Tuesday 11 March 2008 01:52:33 am Brian J. Murrell wrote: > You could do that, but I suspect that if you want to see those > developments include SNMP access to the stats, you are going to have > to be more proactive than just following the current development. I > don't have any more insight than what's in that thread about the > plans underway but I'd be very surprised if they currently include > SNMP. I could be wrong but I suspect that if you want to see SNMP > availability you'd have to get active Gotcha. Bug #15197, "Feature request: expand SNMP scope" > either with participating in > the design and perhaps some hacking I'm not sure I can be of any help in this area, unfortunately. But I've seen that some users expressed the same kind of need and rolled up their sleeves :) http://lists.lustre.org/pipermail/lustre-devel/2008-January/001504.html > or voicing your desires through > your sales channel. That I can do. :) Thanks for the advice, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre SNMP module
Hi Brian, On Monday 10 March 2008 03:04:33 pm Brian J. Murrell wrote: > I can't disagree with that, especially as Lustre installations get > bigger and bigger. Apart from writing custom monitoring tools, > there's not a lot of "pre-emptive" monitoring options available. > There are a few tools out there like collectl (never seen it, just > heard about it)

collectl is very nice, but like dstat and such, it has to run on each and every host. It can provide its results via sockets though, so it could be used as a centralized monitoring system for a Lustre installation. And it provides detailed statistics too:

# collectl -sL -O R
waiting for 1 second sample...

# LUSTRE CLIENT DETAIL: READAHEAD
#Filsys  Reads ReadKB Writes WriteKB Pend Hits Misses NotCon MisWin LckFal Discrd ZFile ZerWin RA2Eof HitMax
home       100    192      0       0    0    0    100      0      0      0      0     0    100      0      0
scratch    100    192      0       0    0    0    100      0      0      0      0     0    100      0      0
home       102   6294     23     233    0    0     87      0      0      0      0     0     87      0      0
scratch    102   6294     23     233    0    0     87      0      0      0      0     0     87      0      0
home        95    158     22     222    0    0     81      0      0      0      0     0     81      0      0
scratch     95    158     22     222    0    0     81      0      0      0      0     0     81      0      0

# collectl -sL -O M
waiting for 1 second sample...

# LUSTRE CLIENT DETAIL: METADATA
#Filsys  Reads ReadKB Writes WriteKB Open Close GAttr SAttr Seek Fsync DrtHit DrtMis
home         0      0      0       0    0     0     0     0    0     0      0      0
scratch      0      0      0       0    0     0     2     0    0     0      0      0
home         0      0      0       0    0     0     0     0    0     0      0      0
scratch      0      0      0       0    0     0     0     0    0     0      0      0
home         0      0      0       0    0     0     0     0    0     0      0      0
scratch      0      0      0       0    0     0     1     0    0     0      0      0

# collectl -sL -O B
waiting for 1 second sample...

# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#Ost             Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts  WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
home-OST0007       0   0  0  0  0  0   0   0   0    0    0    0     0  0  0  0  0   0   0   0    0    0
scratch-OST0007    0   0  9  0  0  0   0   0   0    0    0   12  3075  9  0  0  0   0   0   0    0    3
home-OST0007       0   0  0  0  0  0   0   0   0    0    0    0     0  0  0  0  0   0   0   0    0    0
scratch-OST0007    0   0  1  0  0  0   0   0   0    0    0    1     2  1  0  0  0   0   0   0    0    0
home-OST0007       0   0  0  0  0  0   0   0   0    0    0    0     0  0  0  0  0   0   0   0    0    0
scratch-OST0007    0   0  1  0  0  0   0   0   0    0    0    1     2  1  0  0  0   0   0   0    0    0

> and LLNL have one on sourceforge,

Last time I checked, it only supported 1.4 versions, but it's been a while, so I'm probably a bit behind.

> but I can certainly > see the attraction at being able to monitor Lustre on your servers > with the same tools as you are using to monitor the servers' health > themselves.

Yes, that'd be a strong selling point.

> This could wind up becoming a lustre-devel@ discussion, but for now, it > would be interesting to extend the interface(s) we use to > introduce /proc (and what will soon be its replacement/augmentation) > stats files so that they are automagically provided via SNMP.

That sounds like the way to proceed, indeed.

> You know, given the discussion in this thread: > http://lists.lustre.org/pipermail/lustre-devel/2008-January/001475.html > now would be a good time for the community (that perhaps might > want to contribute) desiring SNMP access to get their foot in the > door. Ideally, you get SNMP into the generic interface and then SNMP > access to all current and future variables comes more or less free.

Oh, thanks for pointing this out. It looks like major underlying changes are coming. I think I'll subscribe to the lustre-devel ML to try to follow them.

> That all said, there are some /proc files which provide a copious > amount of information, like brw_stats for instance. I don't know how > well that sort of thing maps to SNMP, but having an SNMP manager > watching something as useful as brw_stats for trends over time could > be quite interesting.

Add some RRD graphs to keep historical variations, and you've got the all-in-one Lustre monitoring tool we sysadmins are all waiting for. ;)

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
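For reference, the brw_stats file mentioned above lives on the OSSes, next to the other obdfilter counters referenced elsewhere in this thread (Lustre 1.6 path):

# cat /proc/fs/lustre/obdfilter/*/brw_stats

It breaks reads and writes down into histograms (I/O sizes, discontiguous pages, fragmented disk I/Os, and so on), which is exactly the kind of data that is awkward to squeeze into a flat SNMP MIB.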
Re: [Lustre-discuss] Lustre SNMP module
Hi Klaus, On Friday 07 March 2008 05:52:51 pm Klaus Steden wrote: > I was asking that same question a few months ago. Yes, I remember; you hadn't been overwhelmed with answers. :\ > I can send you my > 1.6.2 spec file for reference ... That version also did not bundle > the SNMP library, so I ended up building it by recompiling the whole > set of Lustre RPMs to get what I needed, and then just dropped the > DSO in place. That's exactly what I did, finally. > I'm curious as to what metrics you see to be useful -- I wasn't sure > what to look for, so while I installed the module, I haven't yet > thought of good things to ask of it. So, from what I've seen in the MIB, the current SNMP module mainly reports version numbers and free space information. I think it would also be useful to get "activity metrics", the same kind of information which is in /proc/fs/lustre/llite/*/stats on clients (so we can see reads/writes and fs operations rates), in /proc/fs/lustre/obdfilter/*/stats on OSSes and in /proc/fs/lustre/mds/*/stats on MDSes. Actually, all the /proc/fs/lustre/*/**/stats could be useful, but I guess what precise metric is the most useful heavily depends on what you want to see. :) Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre SNMP module
On Friday 07 March 2008 05:01:11 pm Kilian CAVALOTTI wrote: > So I was wondering if there was any plan to include the SNMP module > back in future RPM versions? And in addition to that, is there any plan to add more stats through this SNMP module (the kind we find in /proc/fs/lustre/{llite,ost,mdt}/.../stats)? That'd be an excellent starting point to collect metrics and remotely monitor a Lustre setup from a central location. Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre SNMP module
Hi all, I'd like to get some Lustre info from my OSS/MDSs through SNMP. So I'm reading the Lustre manual, and it indicates [1] that the lustresnmp.so file should be provided by the "base Lustre RPM". But it's not. :) At least not in the 1.6.4.1 RHEL4 x86_64 RPMs. So I was wondering if there was any plan to include the SNMP module back in future RPM versions? Thanks, -- Kilian [1]http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-15-1.html ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
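For reference, once lustresnmp.so is present (whether shipped in an RPM or rebuilt locally, as discussed elsewhere in this thread), it is typically loaded into net-snmp with a single dlmod line in snmpd.conf followed by a restart of snmpd; the install path below is an assumption, adjust it to wherever your build puts the DSO:

dlmod lustresnmp /usr/lib/lustre/snmp/lustresnmp.so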
Re: [Lustre-discuss] lustre dstat plugin
Hi Brock, On Wednesday 05 March 2008 05:21:51 pm Brock Palen wrote: > I have written a lustre dstat plugin. You can find it on my blog: That's cool! Very useful for my daily work, thanks!

> It only works on clients, and has not been tested on multiple mounts, > It's very simple, it just reads /proc/

It indeed doesn't read stats for multiple mounts. I slightly modified it so it can display read/write numbers for all the mounts it finds (see the attached patch). Here's a typical output for an rsync transfer from scratch to home:

-- 8< ---
$ dstat -M lustre
Module dstat_lustre is still experimental.
--scratch---home---
 read write: read write
 110M    0 :    0  110M
 183M    0 :    0  183M
 184M    0 :    0  184M
-- 8< ---

Maybe it could be useful to also add the other metrics from the stats file, but I'm not sure which ones would be the most relevant. And it would probably be wise to do that in a separate module, like lustre_stats, to avoid clutter. Anyway, great job, and thanks for sharing it! Cheers, -- Kilian

--- dstat_lustre_orig.py	2008-03-07 15:54:10.0 -0800
+++ dstat_lustre.py	2008-03-07 15:54:36.0 -0800
@@ -5,28 +5,33 @@
 class dstat_lustre(dstat):
     def __init__(self):
-        self.name = 'lustre 1.6 client'
-        for entry in os.listdir("/proc/fs/lustre/llite"):
-            filesystem = '/'.join(['/proc/fs/lustre/llite',entry,'stats'])
-            self.open(filesystem)
+        self.name = []
+        self.vars = []
+        if os.path.exists('/proc/fs/lustre/llite'):
+            for mount in os.listdir('/proc/fs/lustre/llite'):
+                self.vars.append(mount)
+                self.name.append(mount[:mount.rfind('-')])
         self.format = ('f', 5, 1024)
-        self.vars = ('read', 'write')
-        self.nick = ('read', 'writ')
-        self.init(self.vars, 1)
+        self.nick = ('read', 'write')
+        self.init(self.vars, 2)
         info(1, 'Module dstat_lustre is still experimental.')

     def extract(self):
-        for line in self.readlines():
-            l = line.split()
-            if not l or l[0] != 'read_bytes': continue
-            self.cn2['read'] = long(l[6])
-        for line in self.readlines():
-            l = line.split()
-            if not l or l[0] != 'write_bytes': continue
-            self.cn2['write'] = long(l[6])
         for name in self.vars:
-            self.val[name] = (self.cn2[name] - self.cn1[name]) * 1.0 / tick
-            if step == op.delay:
-                self.cn1.update(self.cn2)
+            f = open('/'.join(['/proc/fs/lustre/llite',name,'stats']))
+            lines = f.readlines()
+            for line in lines:
+                l = line.split()
+                if not l or l[0] != 'read_bytes': continue
+                read = long(l[6])
+            for line in lines:
+                l = line.split()
+                if not l or l[0] != 'write_bytes': continue
+                write = long(l[6])
+            self.cn2[name] = (read, write)
+            self.val[name] = ( (self.cn2[name][0] - self.cn1[name][0]) * 1.0 / tick,\
+                               (self.cn2[name][1] - self.cn1[name][1]) * 1.0 / tick )
+            if step == op.delay:
+                self.cn1.update(self.cn2)
 # vim:ts=4:sw=4

___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre Downloads
On Thursday 14 February 2008 01:34:46 pm Cliff White wrote: > http://downloads.clusterfs.com/ > should be working now. Please let us know if there are further > issues. cliffw That looks way better. :) Thanks! -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre Downloads
On Thursday 14 February 2008 12:09:59 pm Canon, Richard Shane wrote: > I see that the download site has been moved and integrated into the > Sun site. It looks like this broke a few things. For one, I can't > get to any of the 1.4 releases. Can this get fixed? Grrr, I support this, and I don't like to have to "register" to download a tarball either... Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] how do you mount mountconf (i.e. 1.6) lustre on your servers?
On Thursday 14 February 2008 10:44:33 am Brian J. Murrell wrote: > > on an OSS: > > /dev/sdb /lustre/ost-home lustre defaults,_netdev 0 0 > No heartbeat or failover then?

Nope. We initially planned to implement failover on our MDS, but I never managed to get Heartbeat working reliably on our shared-bus configuration. It caused more downtime than it provided high availability. We also had hardware issues, which likely caused the problems, but now that our cluster is in production, I can't really bring it down to reimplement failover. Users would probably begin to throw things at me... :)

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] how do you mount mountconf (i.e. 1.6) lustre on your servers?
Hi Brian, On Thursday 14 February 2008 10:38:45 am Brian J. Murrell wrote: > I'd like to take a small survey on how those of you using mountconf > (1.6) are managing the mounting of your Lustre devices on the > servers.

We do use /etc/fstab, with the _netdev option (RHEL4):

on a client:
[EMAIL PROTECTED]:/home /home lustre defaults,flock,_netdev 0 0

on an OSS:
/dev/sdb /lustre/ost-home lustre defaults,_netdev 0 0

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] o2iblnd no resources
On Sunday 03 February 2008 06:30:16 am Isaac Huang wrote: > It depends on the architectures of the OSSes - o2iblnd, and I believe > OFED too, can't use memory in ZONE_HIGHMEM. For example, on x86_64 > where ZONE_HIGHMEM is empty, adding more RAM will certainly help.

Good to know, thanks. On the strange side, this "no resources" message only appears on one client. It gets it from pretty much all of our 8 OSSes, while all the other 276 clients can still access the filesystem (hence all the 8 OSSes) without a single problem. Rebooting the problematic client doesn't help either. Does that sound like something that logic can explain? I would assume that if the OSS were out of memory, this would affect all the clients indiscriminately, right?

Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Luster clients getting evicted
On Monday 04 February 2008 10:17:37 am Brock Palen wrote: > The > cluster IS too big, but there isn't a person at the university who is > willing to pay for anything other than more cluster nodes. Enough > with politics.

That's the first time I hear that a cluster is too big; people usually complain about the contrary. :) The second part sounds very, very familiar, though... Anyway.

> I just had another node get evicted while running code causing the > code to lock up. This time it was the MDS that evicted it. Pinging > works though: > > [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED] > [EMAIL PROTECTED] > [EMAIL PROTECTED]

Ok.

> I have attached the output of lctl dk from the client and some > syslog messages from the MDS.

(recover.c:188:ptlrpc_request_handle_notconn()) import nobackup-MDT-mdc-01012bd27c00 of [EMAIL PROTECTED]@tcp abruptly disconnected: reconnecting
(import.c:133:ptlrpc_set_import_discon()) nobackup-MDT-mdc-01012bd27c00: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost;

I will let Lustre people comment on this, but this sure looks like a network problem to me. Is there any information you can get out of the switches (logs, dropped packets, retries, stats, anything)?

> Nope both servers have 2GB ram, and load is almost 0. No swapping.

Do you see dropped packets or errors in your ifconfig output, on the servers and/or clients? Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
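For reference, a quick way to look for link-level trouble on both ends before blaming Lustre (the interface name is a placeholder):

# ifconfig eth0 | grep -E 'errors|dropped|overruns'
# ethtool -S eth0 | grep -iE 'err|drop|crc'

Steadily increasing error or drop counters on the clients, the servers, or the switch ports usually point at the network rather than at the filesystem.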
Re: [Lustre-discuss] Luster clients getting evicted
Hi Brock, On Monday 04 February 2008 07:11:11 am Brock Palen wrote: > on our cluster that has been running lustre for about 1 month. I have > 1 MDT/MGS and 1 OSS with 2 OSTs. > > Our cluster uses all GigE and has about 608 nodes 1854 cores.

This seems to be a lot of clients for only one OSS (and thus for only one GigE link to the OSS).

> We have a lot of jobs that die, and/or go into high IO wait, strace > shows processes stuck in fstat(). > > The big problem is (I think) I would like some feedback on it that of > these 608 nodes 209 of them have in dmesg the string > > "This client was evicted by" > > Is this normal for clients to be dropped like this?

I'm not an expert here, but evictions typically occur when a client hasn't been seen for a certain period by the OSS/MDS. This is often related to network problems. Considering your number of clients, if they all do I/O operations on the filesystem concurrently, maybe your Ethernet switches are the bottleneck and have to drop packets. Is your GigE network working fine outside of Lustre? To eliminate networking issues from the equation, you can try to lctl ping your MDS and OSS from a freshly evicted node, and see what you get. (lctl ping ) Do your MDS or OSS show any particularly high load or memory usage? Do you see any Lustre-related error messages in their logs?

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] mkfs.lustre and disk partitions
Hi Jim, On Monday 04 February 2008 07:04:10 am Jim Albin wrote: > Hello, > I've seen several notes mentioning the disadvantages of using disk > partitions on the storage devices for Lustre OSTs (and/or MDTs). My > questions, if anyone can help me, are; > > 1) Should I delete any existing partitions on the device?

There's no need to explicitly destroy the partitions if you overwrite them.

> 2) If not, should I partition the device into a single partition with > a specific block size (maybe 1mb)?

No need either. One single partition is still a partition.

> 3) Can I just use the disk block device (eg, /dev/sda) when I > mkfs.lustre and is it > smart enough to ignore the partition table?

Yes, exactly. Generally speaking, mkfs /dev/sdb will use the whole sdb device for the filesystem, and you won't have any partition table. As a consequence, you won't be able to boot from it, which is not relevant here, but all the other operations will work as on any regular partition (tunefs, mount, etc).

> I'm trying to set up Lustre 1.6.3 and am seeing poor performance, > possibly fragmentation on the mdt and ost. > Thanks in advance for any suggestions.

I don't know what backend hardware you're using, but in the case of Dell MD1000s, you can probably give a look (and a try) to: http://thias.marmotte.net/archives/2008/01/05/Dell-PERC5E-and-MD1000-performance-tweaks.html

HTH, Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
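For reference, formatting a whole, unpartitioned device as an OST is just a matter of pointing mkfs.lustre at the block device (fsname, MGS node, device, and mount point below are placeholders):

# mkfs.lustre --fsname=scratch --ost --mgsnode=mds1@tcp0 /dev/sdb
# mkdir -p /lustre/ost-scratch
# mount -t lustre /dev/sdb /lustre/ost-scratch

Whatever partition table was on the device simply gets overwritten in the process.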
Re: [Lustre-discuss] o2iblnd no resources
Hi Liang, On Friday 01 February 2008 23:39:09 you wrote: > I think it's because o2iblnd uses fragmented RDMA by default(Max to > 256), so we have to set max_send_wr as (concurrent_send * (256 + 1)) > while creating QP by rdma_create_qp(), it takes a lot of resource and > can make a busy server out of memory sometime. By the way, is there a way to free some of this memory to resolve the problem temporarily, without having to restart the OSS? Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] o2iblnd no resources
On Saturday 02 February 2008 00:42:47 Isaac Huang wrote: > > Here is patch for this problem (using FMR in o2iblnd) > > https://bugzilla.lustre.org/attachment.cgi?id=15144 > > This is an experimental patch - nodes with the patch applied are not > interoperable with those without it. Please don't propagate the patch > to production systems. Thanks for the explanation. Since the problem indeed occurs on a production system, I'd rather keep experimental patches out of the way. I assume that adding more RAM on the OSSes is likely to solve this problem, right? If that's the case, I'd probably go this way, before the FMR patch is landed. Thanks, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] o2iblnd no resources
Hi all, What can cause a client to receive a "o2iblnd no resources" message from an OSS?

---
Feb 1 15:24:24 node-5-8 kernel: LustreError: 1893:0:(o2iblnd_cb.c:2448:kiblnd_rejected()) [EMAIL PROTECTED] rejected: o2iblnd no resources
---

I suspect an out-of-memory problem, and indeed the OSS logs are filled up with the following:

---
ib_cm/3: page allocation failure. order:4, mode:0xd0
Call Trace:
{__alloc_pages+777} {alloc_page_interleave+61} {__get_free_pages+11}
{kmem_getpages+36} {cache_alloc_refill+609} {__kmalloc+123}
{:ib_mthca:mthca_alloc_qp_common+668} {:ib_mthca:mthca_alloc_qp+178}
{:ib_mthca:mthca_create_qp+311} {:ib_core:ib_create_qp+20}
{:rdma_cm:rdma_create_qp+43} {dma_pool_free+245}
{:ib_mthca:mthca_init_cq+1073} {:ib_mthca:mthca_create_cq+282}
{alloc_page_interleave+61} {:ko2iblnd:kiblnd_cq_completion+0}
{:ko2iblnd:kiblnd_cq_event+0} {:ib_core:ib_create_cq+33}
{:ko2iblnd:kiblnd_create_conn+3565} {:libcfs:cfs_alloc+40}
{:ko2iblnd:kiblnd_passive_connect+2215} {:ib_core:ib_find_cached_gid+244}
{:rdma_cm:cma_acquire_dev+293} {:ko2iblnd:kiblnd_cm_callback+64}
{:ko2iblnd:kiblnd_cm_callback+0} {:rdma_cm:cma_req_handler+863}
{alloc_layer+67} {idr_get_new_above_int+423}
{:ib_cm:cm_process_work+101} {:ib_cm:cm_req_handler+2398}
{:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46}
{worker_thread+419} {default_wake_function+0} {__wake_up_common+67}
{default_wake_function+0} {keventd_create_kthread+0} {worker_thread+0}
{keventd_create_kthread+0} {kthread+200} {child_rip+8}
{keventd_create_kthread+0} {kthread+0} {child_rip+0}
Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty
Free pages: 35336kB (0kB HighMem)
Active:534156 inactive:127091 dirty:1072 writeback:0 unstable:0 free:8834 slab:146612 mapped:26222 pagetables:1035
Node 0 DMA free:9832kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:16384kB pages_scanned:37 all_unreclaimable? yes
protections[]: 0 510200 510200
Node 0 Normal free:25504kB min:16328kB low:20408kB high:24492kB active:2136624kB inactive:508364kB present:4964352kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 9832kB
Node 0 Normal: 1284*4kB 2290*8kB 126*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25504kB
Node 0 HighMem: empty
Swap cache: add 111, delete 111, find 23/36, race 0+0
Free swap: 4096360kB
1245184 pages of RAM
235840 reserved pages
659867 pages shared
0 pages swap cached
---

IB links are up and working on both the client and the OSS:

---
client# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0005:ad00:0008:af71
base lid: 0x83
sm lid: 0x130
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)

oss# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0005:ad00:0008:cb11
base lid: 0x126
sm lid: 0x130
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
---

And the Subnet Manager doesn't expose any unusual error or skyrocketing counter (I use OFED 1.2, kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp). What I don't really get is that most clients can access files on this OSS with no issue, and besides, my limited understanding of the kernel memory mechanisms tends to let me believe that this OSS is not out of memory:

--- #
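For reference, since the allocation failure above is order:4 (i.e. 16 contiguous pages, 64 kB with 4 kB pages), a quick way to check whether the OSS is short on higher-order pages rather than on memory overall is:

# cat /proc/buddyinfo

The columns to the right count progressively larger free contiguous blocks; mostly zeros on the right means memory is fragmented even though plenty appears free in total, which matches the symptoms described here.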
Re: [Lustre-discuss] Lustre 1.6.4.1 - client lockup
Hi Niklas, On Friday 25 January 2008 07:10:47 am Niklas Edmundsson wrote: > We're able to consistently kill the lustre client with bonnie in > combination with striping.

Out of curiosity, I tried to reproduce your experiment, and didn't encounter any problem. All the bonnie processes ran fine. There are a lot of significant differences between our test environments, but I thought it may be useful to know the results of your test case on a different system.

> This is Lustre 1.6.4.1, Debian 2.6.18 > amd64 kernel with lustre patches on both server and clients

I used Lustre 1.6.4.1, RHEL4 and a 2.6.9-55.0.9.EL_lustre.1.6.4.1smp x86_64 kernel.

> All machines are dual opterons connected with GigE.

They are Intel quad-cores (E5345) connected with IB.

> We have 5 servers, 1 MDS with 1 MGS and 1 MDT target and 4 OSS's with > 2 OST targets (~1.2TB) each.

We have 9 servers, 1 MDS with MGS and MDT, and 8 OSSs with 2 OSTs each.

> Jan 25 11:16:23 BUG: soft lockup detected on CPU#1! After 10-15 minutes it locks up, this time with a bunch of LustreErrors before the stack trace:

They look like a network interruption problem, but it's hard to tell if that's the cause or the consequence. Could it be that your Ethernet switches dropped some packets?

Cheers, -- Kilian ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss