[Lustre-discuss] Unable to move MDS using procedure in the manual
I tried to move my MDS from one filesystem on the same machine to another, using the procedure outlined in the Lustre manuals (I didn't use dd, since the underlying disks weren't the same size, and I also did not think it was required). Specifically, I used rsync to copy the files, and getfattr/setfattr to copy over the extended attributes. Some brief poking around seemed to show that the EA information made it into the new filesystem. However, when I went to mount the new MDS partition, it failed with the following errors:

May 30 23:36:50 mds-foo kernel: [ 186.604083] LustreError: 3082:0:(md_local_object.c:433:llo_local_objects_setup()) creating obj [fld] fid = [0x20001:0x3:0x0] rc = -116
May 30 23:36:50 mds-foo kernel: [ 186.698205] LustreError: 3082:0:(mdt_handler.c:4576:mdt_init0()) Can't init device stack, rc -116
May 30 23:36:50 mds-foo kernel: [ 186.797206] LustreError: 3082:0:(obd_config.c:522:class_setup()) setup foo-MDT failed (-116)
May 30 23:36:50 mds-foo kernel: [ 186.806140] LustreError: 3082:0:(obd_config.c:1363:class_config_llog_handler()) Err -116 on cfg command:
May 30 23:36:50 mds-foo kernel: [ 186.815615] Lustre:cmd=cf003 0:foo-MDT 1:foo-MDT_UUID 2:0 3:foo-MDT-mdtlov 4:f

There were more errors, but they all pretty much cascaded from these. I switched back to the original filesystem and everything worked.

I am willing to believe I did something wrong, but I'm not sure what; I did everything the directions said to do. -116 is ESTALE, and I found the place in the code where I believe that error was returned, but it was a little unclear to me what the root cause was. Can anyone offer any advice?

--Ken
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
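Lustre's kernel messages report failures as negated errno values, so "rc = -116" in the log above is errno 116, which on Linux is ESTALE. A minimal C sketch for decoding such codes, assuming a Linux errno table; the helper names are hypothetical and not part of Lustre:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a Lustre-style return code (0, or a negated
 * errno value such as -116) into human-readable text. */
static const char *lustre_rc_to_string(int rc)
{
    assert(rc <= 0);                 /* Lustre reports failures as -errno */
    return rc == 0 ? "success" : strerror(-rc);
}

/* True when rc is -ESTALE; on Linux ESTALE is 116, so "rc = -116" matches. */
static int lustre_rc_is_estale(int rc)
{
    return rc == -ESTALE;
}
```

For example, lustre_rc_to_string(-116) returns the C library's message for ESTALE; the exact wording differs between C library versions, so match on the errno value rather than the text.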
Re: [Lustre-discuss] [HPDD-discuss] Unable to move MDS using procedure in the manual
> Which version?

Whoops, can you believe I forgot that? It's 2.1.2.

--Ken
Re: [Lustre-discuss] [HPDD-discuss] Unable to move MDS using procedure in the manual
> Which version of Lustre is this? File based backup / restore does not work in 2.x. OI scrub, which rebuilds the object index, is available from Lustre 2.3 onwards. So file based backup / restore will work from 2.3 onwards.

Well, crud. I guess that's what Colin was going to tell me, and I see Andreas said the same thing.

So, this leads to a follow-up question: _where_ is the latest and greatest Lustre manual? I used the one labelled 2.0 here:

http://wiki.lustre.org/manual/LustreManual20_HTML/LustreOperations.html

It doesn't actually mention that you can't do a file-level backup on the MDT. Some poking around led me to the Whamcloud one, which actually does say that. Perhaps an upgrade to 2.4 is in order (which we were interested in doing anyway).

--Ken
Re: [Lustre-discuss] problem with installing lustre and ofed
> It's good to know that kernel-ib comes with the stock lustre install. What about the rest of the OFED tools? I mean things like ibdiagnet, ibstatus, etc.? (I will look at the contents of the other rpms and see what I can learn.)

I think Jeff missed a few steps. If you want the _server-side_ packages, what you need to do is:

- Install a Lustre-patched kernel, including the devel packages (you can use the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel and install it.
- Compile Lustre against the Lustre-patched kernel and the OFED. This is the tricky part; you need to make sure to tell Lustre to link against the right OFED package.

There are Lustre build scripts that actually automate all of this; last time I checked, they were only available in the git tree, NOT in the source tarball. Those build scripts are a bit of a pain to use, and I find that I always have to tweak them a bit. But once you figure them all out, they make things easier.

Now as for the userspace utilities ... well, you need to make sure they're not too far off from the kernel. How far is too far? Good question. I don't think they're guaranteed to work when they don't match, but in my limited experience minor version differences are ok.

--Ken
Re: [Lustre-discuss] What does ldlm AST mean?
> When I was reading the ldlm source code, the term AST puzzled me. I think it means callback, but what is its full name?

Asynchronous software trap. If it makes you feel any better, I had to ask as well :-/ I was told the term dates back to VMS.

Hm, some quick Googling suggests that it may really mean Asynchronous SYSTEM Trap; it's possible I misheard or misremember what someone told me.

http://en.wikipedia.org/wiki/Asynchronous_System_Trap

--Ken
Re: [Lustre-discuss] Interoperable issues between 1.8.6 and 2.1
> When you refer to ia64, are you referring to the itanium systems?

I'm referring to systems where uname -p returns ia64. Is that itanium? No idea.

--Ken
Re: [Lustre-discuss] Interoperable issues between 1.8.6 and 2.1
> I am not finding where it says explicitly that a lustre client running 1.8 will successfully be able to read and write to a set of lustre servers running lustre 2.1. Are there any known issues?

I forget where that was written down; I can report that it works fine, WITH THE EXCEPTION of ia64 1.8-based clients; that totally doesn't work.

> Are there any known issues upgrading the oss/ost and mds/mdt systems from 1.8 to 2.1? There's already 16 terabytes in place...

Nope, it was basically umount-upgrade-mount for us.

--Ken
Re: [Lustre-discuss] [Twg] Lustre and cross-platform portability
> Ken, my apologies for this misstatement. I guess that my faulty memory is to blame for the fact that I didn't recall the MacOS code was made publicly available for download.

No problem. Back when I gave the talk at LUG the source wasn't available yet due to issues here, but we got that worked out and I was pushing my changes to a publicly available Oracle git repo. I did send out email to everyone about that, but I'm sure it was easy to miss.

> I don't think I've ever seen patches sent from you to either Oracle or Whamcloud, and unfortunately nobody on our side has had the bandwidth or user demand/funding to be pulling such changes either.

Well, I did actually submit patches to Oracle to start the process of working out at least the portability issues, but I believe that was when Oracle started to implode the Lustre group, so things sort of stalled. I'll take 75% of the blame for that if we assign 25% to Larry Ellison :-)

> This isn't strictly correct. It would be possible to change the libcfs portability layer to export the same API as the Linux kernel to MacOS and Windows. This would simplify getting the client into the Linux kernel, but still allow a native client on MacOS.

Well ... that shifts the burden to the cross-platform people, who would basically have to re-implement the Linux kernel. For some things, that's possible without too much pain. For other things, it's not.

--Ken
Re: [Lustre-discuss] [wc-discuss] Re: [Lustre-devel] [Twg] Lustre and cross-platform portability
> Also, the FUSE client will be able to run on any OS that has a FUSE port, that is, any BSD, OpenSolaris, MacOS, in addition to Windows. That is an easy way to maintain a single client for many OSes.

It is, unfortunately, not quite that simple. I can't claim to be a FUSE expert, but I've been paying attention to it on other platforms. From what I can tell, FUSE works great on Linux, but on other platforms the support is iffy. Also, it's not implemented quite the same way on other operating systems as it is on Linux, which makes porting a Linux FUSE module to other platforms non-trivial; from what I've seen, this is due to the Linux filesystem interface versus the vnode interface used by every Unix except Linux (and this is part of what makes Lustre hard to port).

I guess what I'm saying is: don't fall into the underpants gnomes trap of thinking:

1) Get liblustre working with FUSE
2) ???
3) Lustre client everywhere!

It might make it easier, but I doubt it will make it easy.

--Ken
Re: [Lustre-discuss] [Twg] Lustre and cross-platform portability
> I have no information that the WinNT project will ever be released by Oracle, and as yet there has not been any code released from the MacOS port, so the libcfs portability layer is potentially exacting a high cost in code maintenance and complexity (CLIO being a prime example) for no apparent benefit. Similarly, the liblustre client needs a portability layer for userspace, and suffers from the same apparent lack of interest or users.

In terms of the MacOS X port, I don't think that's quite fair ... the code I did is available and anyone can download it. It was functional in a very basic way but needed some additional love. Okay, I haven't rolled that stuff into the Whamcloud release ... what happened there was that, with all the uncertainty around Oracle Lustre development, I lost momentum and got caught up in other things. I've talked with the guys at Whamcloud about bringing at least the portability changes over, and that's all been on me; it's certainly on my list to work on.

I can say that at least for MacOS X, there has been interest; I can't speak for the amount of interest, and there's a bit of a chicken and egg problem ... people don't plan their Lustre use around MacOS X clients because there isn't one that works well, and people don't put work into it because there aren't people who plan their Lustre use around it.

> I'd like to get some feedback from the Lustre community about removing the libcfs abstraction entirely, or possibly restructuring it to look like the Linux kernel API, and having the other platforms code against it as a Linux portability layer, like ZFS on Linux uses the Solaris Portability Layer (SPL) to avoid changing the core ZFS code. A related topic is whether it would be better to replace all cfs_* functions with standard Linux kernel functions en masse, or migrate away from cfs_* functions slowly?

The only thing I can think of is that if this is done, then officially Lustre is going to be a Linux-only filesystem. I understand there are real costs to maintaining the cfs layer, and I can't speak to whether or not the loss would be worth the gains.

--Ken
Re: [Lustre-discuss] New wc-discuss Lustre Mailing List
> According to this FAQ:
>
> http://groups.google.com/support/bin/answer.py?answer=46438&topic=9257
>
> There's no need for a Google account to join a public Google Group via email. But sending an email to wc-discuss+subscr...@googlegroups.com and wc-discuss-subscr...@googlegroups.com both ended up with an error saying that the recipient address does not exist.

I forgot to follow up on this ... I sent an email to wc-discuss+subscr...@whamcloud.com and I was subscribed right away. I know someone else said they were sent to a login page; all I can say is that didn't happen to me.

--Ken
Re: [Lustre-discuss] speed differences in lustre/infiniband ipoib native ib
> we're struggling mightily to get ubuntu clients working in native IB mode against centos lustre/IB servers. Since we've never had a working native IB client, we have no basis for our assumption that the speed increase should be tremendous and thus justify our struggle.

It really depends on a ton of factors that are impractical to list here. I guess I would summarize it as "significant, most of the time." I wouldn't call it tremendous, compared to just using TCP/IP over the ipoib interface.

But seriously, though ... struggling mightily? Once we got the Lustre IB module loaded, everything Just Worked. If you want to give some details on what's going wrong here, we might be able to help you. (If by some random chance your problem is that you can't get the Lustre IB module loaded because of symbol version issues, then you should check the archives, because that has been discussed plenty of times.)

--Ken
[Lustre-discuss] Poor metadata operation performance
So I guess there are some things I _still_ don't understand about Lustre metadata handling. Specifically, what metadata gets stored on OSTs and why.

What brings this all up is that a) we have users who have lots of files and b) we are currently going through some reorganization that requires changing the groups on lots of these files (this is all running Lustre 1.8.4; we're due for an upgrade in the medium future). I figured okay, this wouldn't be so bad, since those are all metadata server operations. But I started running some tests, and I found out that chown() system calls perform poorly.

Because I was doing some previous metadata performance analysis, I took a source code tree which consists of approximately 50,000 files and put two copies in one of our Lustre filesystems: one with the default striping (across all OSTs) and one where all files have no striping at all. The performance difference between these two trees for stat() calls is large, as you can imagine, but the disparity for chown() calls is even larger. You can run chgrp on all of the files in the unstriped copy in about 3-5 seconds, but the striped copy takes more than 50 seconds.

I did some more digging as to why this is. At first I thought this might be an issue on the client, but there is code in there that skips talking to the OSTs for certain types of metadata updates, and turning on debugging on the client verifies that no setattr RPCs are being sent to the OSSes. Looking more closely at the RPC traces reveals that the issue is on the metadata server; the setattr RPCs simply take longer when the files are striped. I've looked at the metadata server code for a bit, and I've verified that the metadata server does send setattr RPCs to the OSSes, but I see that it's done asynchronously; it shouldn't be waiting for the replies. So I'm stumped as to why this is happening.

I also realize that I'm still puzzled as to what metadata is stored on the OSTs; it seems like the client prefers the metadata from the MDS (except of course for size), but a fair amount of metadata is still stored on the OSSes. Can anyone shed some light on this?

--Ken
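The chgrp comparison above can be reproduced with a small timing harness. This is a minimal sketch, not the test actually run here: it assumes a POSIX system with nftw(), changes only the group via lchown(), and time_chgrp_tree() is a hypothetical helper name. Pointing it at the striped and unstriped copies of the same tree lets you compare the two directly.

```c
#define _XOPEN_SOURCE 700            /* nftw(), lchown(), clock_gettime() */
#include <assert.h>
#include <ftw.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* nftw() callbacks take no user-data pointer, so state lives at file scope. */
static gid_t target_gid;
static long n_changed;
static long n_failed;

static int chgrp_one(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
    (void)sb; (void)typeflag; (void)ftwbuf;
    /* (uid_t)-1 leaves the owner alone; lchown() does not follow symlinks. */
    if (lchown(path, (uid_t)-1, target_gid) == 0)
        n_changed++;
    else
        n_failed++;
    return 0;                        /* keep walking even if one entry fails */
}

/* Hypothetical helper: chgrp every entry under root (like `chgrp -R`) and
 * return the elapsed wall-clock time in seconds, or -1.0 if the walk failed.
 * The number of entries successfully changed is stored in *changed. */
static double time_chgrp_tree(const char *root, gid_t gid, long *changed)
{
    struct timespec start, end;

    assert(root != NULL);
    target_gid = gid;
    n_changed = n_failed = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (nftw(root, chgrp_one, 64, FTW_PHYS) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (changed != NULL)
        *changed = n_changed;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, time_chgrp_tree("/lustre/striped-copy", gid, &n) (path hypothetical) returns the elapsed seconds and stores the number of entries changed in n.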
Re: [Lustre-discuss] Poor metadata operation performance
> Ken, the OSTs need to track the ownership of objects for quota. The more stripes there are on a file, the more RPCs that need to be sent, which is why we don't recommend wide striping unless there is a reason for it (bandwidth, size, etc).

Fair enough; I always forget about quota accounting, because we never use it. But I'm wondering why this in particular causes such a hit, because the MDS sends the setattr RPCs asynchronously; in theory it should just fire them off and not have to wait until they're done. Perhaps it's the overhead of sending those RPCs that is slowing things down? I could believe that, although I would have thought that it wouldn't be that bad.

--Ken
Re: [Lustre-discuss] IB storage as an OST target
> Anybody had any experience using an IB based storage target as an OST?

We do that.

> Apart from the obvious issue of separating the IB SAN (SRP/SER) storage traffic from the Lustre traffic, are there any issues?

We don't actually separate the IB traffic from the Lustre traffic; in our case they actually run over the same IB HCAs. That isn't the setup I would have chosen, but it was the system that was available.

Here is one implementation detail that stands out in my mind. Because the IB storage tends to come online rather late in the boot process, we had to develop a custom boot script that waits around for the IB device nodes to appear before attempting to mount the Lustre filesystems. That was a bit of a pain until we had it all worked out.

As others have pointed out, if your backend storage disappears (which happens more often than I would prefer, but in our case the issues which caused that have been resolved for the most part), that makes Lustre very unhappy very quickly. We've been able to recover from those situations, but it can be a royal pain.

> What about failover?

We use MMP as others have mentioned, but we don't actually have the Lustre failover stuff all up and running; mostly it hasn't been an issue for us, so we haven't seen a need to finish it.

--Ken
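The boot script itself isn't reproduced here, but the core of the wait-for-storage step is just polling until a device node shows up, with a timeout so a missing LUN can't hang the boot forever. A minimal C sketch under those assumptions; the device path mentioned in the comment and the wait_for_path() name are hypothetical:

```c
#define _XOPEN_SOURCE 700            /* clock_gettime(), nanosleep() */
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll until `path` exists (for example a device node
 * such as /dev/disk/by-id/<your-SRP-LUN>, path illustrative only) or until
 * timeout_s seconds have passed. Returns 0 if the path appeared, -1 on timeout. */
static int wait_for_path(const char *path, int timeout_s, long interval_ms)
{
    struct stat sb;
    struct timespec start, now;
    struct timespec nap;

    assert(path != NULL && timeout_s >= 0 && interval_ms > 0);
    nap.tv_sec = interval_ms / 1000;
    nap.tv_nsec = (interval_ms % 1000) * 1000000L;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (stat(path, &sb) == 0)
            return 0;                /* the node is there; safe to try mounting */
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec - start.tv_sec >= timeout_s)
            return -1;               /* give up rather than hang the boot */
        nanosleep(&nap, NULL);       /* an early wake-up from a signal is harmless */
    }
}
```

A caller would run something like wait_for_path(dev, 300, 1000) before invoking mount, and skip the Lustre mount (logging loudly) when it returns -1.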
Re: [Lustre-discuss] clients gets EINTR from time to time
> I don't understand why you don't just fix your application to handle a perfectly valid and expected condition (that it's currently not handling) instead of wasting time trying to find the cause of the expected condition. Even if you find it, it's likely not a bug and not something that can/will be fixed. It's your application that needs to be fixed.

To be fair ... normally disk I/O operations are not interruptible by signals, so it's not unreasonable behavior on the part of an application. I did check POSIX, and it doesn't say that behavior is restricted only to network sockets, so yeah, it's TECHNICALLY allowable behavior according to the standard (although the Linux man page for signal(7) says that it will not happen).

But honestly, I've seen plenty of cases where applications handle this for network I/O; it's normal, everyone knows it will happen there. But for _disk_ I/O? Never seen it done. I'm not saying that there are no applications that handle this case, but it's certainly very uncommon.

I freely admit that network filesystems sort of mix the concepts of network socket and disk I/O together, and what the right behavior is remains unclear. But calling this "perfectly valid and expected" is not quite accurate. It would be interesting to see what other network filesystems do under the same circumstances.

--Ken
Re: [Lustre-discuss] clients gets EINTR from time to time
> I have a report from a user that is getting EINTR when a SIGALRM goes off on his write(). It isn't unexpected to get SIGALRM because he called alarm, but he also has SA_RESTART set. I can't remember whose responsibility it is to restart the call, syscall or wherever, but it seems that someone is dropping the ball, because if EINTR is returned then SA_RESTART didn't seem to do the trick, right?

I would agree with you on that one; if you're setting SA_RESTART then you shouldn't ever get EINTR. It looks like what should be happening is that if you get interrupted, the system call should return ERESTARTSYS, and then after the signal handler is done the system call should be re-run for you by the signal handling code. I see that at least in some cases Lustre will use ERESTARTSYS; just a guess, but maybe somewhere Lustre is returning EINTR itself instead of ERESTARTSYS?

--Ken
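For reference, this is how an application asks for SA_RESTART semantics; a minimal POSIX sketch with hypothetical names (install_handler(), on_alarm()). The flag only helps if the kernel code that was interrupted reports the interruption as restartable (ERESTARTSYS inside the Linux kernel) rather than returning EINTR directly, which is exactly the Lustre behavior in question here.

```c
#define _XOPEN_SOURCE 700            /* sigaction(), SA_RESTART */
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarm_fired = 0;

/* Keep handlers minimal: only async-signal-safe work belongs here. */
static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Hypothetical helper: install `handler` for `sig`. When restart is nonzero,
 * SA_RESTART asks the kernel to restart interruptible system calls (such as a
 * blocking read() or write()) after the handler returns instead of failing
 * them with EINTR. */
static int install_handler(int sig, void (*handler)(int), int restart)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(sig, &sa, NULL);
}
```

Calling install_handler(SIGALRM, on_alarm, 1) before alarm() is the configuration described in the report above.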
Re: [Lustre-discuss] clients gets EINTR from time to time
> OK, the app is used to dealing with standard disks; that is why it is not handling the EINTR signal properly.

I think you're misunderstanding what a signal is in the Unix sense. EINTR isn't a signal; it's a return code from the write() system call that says, "Hey, you got a signal in the middle of this write() call and it didn't complete." It doesn't mean that there was an error writing the file; if that was happening, you'd get a (presumably different) error code.

Signals can be sent by the operating system, but those signals are things like SIGSEGV, which basically means "your program screwed up." Programs can also send signals to each other, with kill(2) and the like.

Now, NORMALLY, system calls like write() are interrupted by signals when you're writing to slow devices, like network sockets. According to the signal(7) man page, disks are not normally considered slow devices, so I can understand the application not being used to handling this. And you know, now that I think about it, I'm not even sure that network filesystems SHOULD allow I/O system calls to be interrupted by signals ... I'd have to think more about it.

I suspect what happened is that something changed between 1.8.5 and the previous version of Lustre that you were using that allowed some operations to be interruptible by signals. Some things to try:

- Check to see if you are, in fact, receiving a signal in your application and Lustre isn't returning EINTR for some other reason.
- If you are receiving a signal, when you set the signal handler for it you could use the SA_RESTART flag to restart the interrupted I/O; I think that would make everything work like it did before.

--Ken
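The application-side alternative to SA_RESTART is to treat EINTR (and short writes) as retryable. A minimal sketch, assuming plain POSIX write(); write_all() is a hypothetical helper, not a Lustre or libc function:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes, retrying when write() is
 * interrupted by a signal (EINTR) and continuing after short writes.
 * Returns 0 on success or -1 with errno set by the failing write(). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before any data was written: retry */
            return -1;               /* a real I/O error: report it */
        }
        p += n;                      /* a signal can also cut a write short */
        len -= (size_t)n;
    }
    return 0;
}
```

The loop distinguishes the two cases POSIX allows: a write() interrupted before transferring anything fails with EINTR, while one interrupted partway returns the byte count written so far.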
Re: [Lustre-discuss] clients gets EINTR from time to time
> As for your questions:
> - I have to mention that I always had this issue, and this is why I upgraded from 1.8.4 to 1.8.5, hoping this would solve it.

Ah, okay, I misunderstood that; my apologies.

> - I will try to have that SA_RESTART flag set in the app... if I can find where the signal handler is set.

Searching for sigaction or signal should help there.

> - How can I see whether lustre is returning EINTR for some other reason? As I said, no logs show anything on either the MDS or the OSSs, but I didn't go through examining lctl debug_kernel yet... which I'm going to do right away...

Weeelll ... that was just a guess on my part. I did a quick grep through the Lustre sources and saw a few places where EINTR was returned, but most of those seemed to deal with the case where I/O was interrupted (those places happened fairly far down in the stack; it wasn't clear to me that those errors would ever bubble back up to a return code from a system call). If _that_ is the issue, then tracking it down will be a challenge.

> My last question is: how can I tell which signal I am receiving? My app doesn't say; it just dumps out the write/pwrite error code.

I think your easiest way is to use strace; something like strace -e trace=signal should do the right thing (that limits the trace to signal-related system calls rather than all of them, and strace still reports each signal as it is delivered).

> And if there is no signal handler, then it should follow the standard actions (as per man 7 signal). On the other hand, my app does not stop or dump core, and the signal is not ignored, so it has to be handled in the code. Correct me if I'm wrong...

That is my understanding as well; if you don't have a signal handler installed, the default action should be taking place, and if the default action is to ignore the signal, then you shouldn't be getting EINTR. But hey, I've been wrong before :-)

--Ken
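If running under strace isn't convenient, the application itself can record which signal arrived. A minimal sketch with hypothetical names: it installs an SA_SIGINFO handler (with SA_RESTART, per the earlier suggestion) for one signal number you suspect, then reports it with strsignal(). Note that installing it replaces whatever handler the application already had for that signal.

```c
#define _XOPEN_SOURCE 700            /* SA_SIGINFO, SA_RESTART, strsignal() */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical diagnostic state: the most recent signal seen by the process. */
static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_code = 0;

static void record_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)ctx;
    last_signo = signo;
    /* si_code says where the signal came from, e.g. SI_USER for kill(). */
    last_code = (info != NULL) ? info->si_code : 0;
}

/* Install the recorder for one signal number; SA_RESTART is included so that
 * interrupted slow calls are restarted where the kernel allows it. */
static int install_recorder(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Text to print alongside a failed write()/pwrite(). */
static const char *last_signal_name(void)
{
    return last_signo != 0 ? strsignal(last_signo) : "no signal recorded";
}
```

The application's existing error path could then print last_signal_name() next to the write/pwrite error code it already reports.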
Re: [Lustre-discuss] Compiling Lustre 2 on SLES10
> FYI, 1.8 and 2.0 used to share the ldiskfs patches (we lost this with the transition to git), so the ldiskfs module shipped with 1.8.5 (which supports SLES11 SP1) already has most of the patches required for lustre 2.0.0. I think it would have been less painful to start from there and to add the missing patches (e.g. data_in_dirent.patch).

Oh, sure, NOW you tell me!

No doubt you're right; I guess I was thinking that since there was a series of patches marked SLES11 in 2.0.0.1, that was the closest place to start from. Live and learn, I guess. On the upside, I now know more about ext4, ldiskfs, and the Lustre build system than I did before :-/

--Ken
Re: [Lustre-discuss] Compiling Lustre 2 on SLES10
> Thanks for putting in this effort. I think that others would be interested (I am). Perhaps you could share your work at github or post to lustre-devel ahead of LUG?

Thanks for the words of encouragement! My boss has no problem with my work being redistributed, but he's not really a fan of putting it on github; for legal reasons, we'd rather be under the umbrella of another organization (like what I am doing with the MacOS X port). I'll check into a few things and see what my options are, and I'll post something here (and to lustre-devel) when we get it out and about (I suppose I could do a simple context diff now, but that's not very manageable).

--Ken
Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network
> Dmesg and syslog are clean and have no entries about the lustre client.

... are you _sure_? Even /var/log/messages? I ask because this sure seems like the Lustre modules are not loaded (you can check that with the lsmod command). If they aren't loaded, then the core issue will be buried somewhere in the messages file (the trick is to look at the earliest related Lustre messages). For example, if you run into the problem that Bob Ball mentioned, where one of the RPC services is using the Lustre port, you can find the "Address already in use" error message if you look at the right spot.

If the Lustre modules are loaded, then it beats me what is causing this problem.

--Ken
Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network
>> But my mount command is failing and that's the issue:
>>
>> mount -t lustre 192.168.0.2@tcp0:/temp /lustre
>> mount.lustre: mount 192.168.0.2@tcp0:/temp at /lustre failed: No such device
>
> Are the lustre modules loaded?

Right, and every time I've seen the mount command fail like this (with ENODEV as the error), the _root cause_ is that the kernel modules are not loading; that can happen for a variety of reasons. The fact that you're getting _nothing_ in the logs is itself rather suspicious; as Brian has already pointed out, Lustre is normally very chatty, even when it is working correctly. You could try loading the modules yourself with insmod; if that's not working, then you should start from there.

--Ken
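mount(2) fails with ENODEV ("No such device") when the requested filesystem type isn't registered with the kernel, which is why a missing module surfaces this way. Besides lsmod, you can check /proc/filesystems, which should list a lustre entry once the Lustre modules have loaded (worth confirming on your version). A minimal C sketch; fs_type_registered() is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `fstype` appears in `list_path`
 * (normally /proc/filesystems), 0 if it does not, -1 if the file
 * could not be opened. */
static int fs_type_registered(const char *list_path, const char *fstype)
{
    char line[256];
    FILE *fp;
    int found = 0;

    assert(list_path != NULL && fstype != NULL);
    fp = fopen(list_path, "r");
    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        /* Lines look like "nodev\tproc\n" or "\text4\n": the name follows the last tab. */
        char *name = strrchr(line, '\t');
        name = (name != NULL) ? name + 1 : line;
        name[strcspn(name, "\n")] = '\0';
        found = (strcmp(name, fstype) == 0);
    }
    fclose(fp);
    return found;
}
```

If fs_type_registered("/proc/filesystems", "lustre") returns 0, the mount will keep failing with ENODEV until the module load problem is fixed.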
Re: [Lustre-discuss] Compiling Lustre 2 on SLES10
> Yes, that is what Oracle had announced in the roadmap. SLES servers are still supported on Lustre 1.8.x, but Oracle announced plans to not support them with Lustre 2.x. Given the similarities between the RHEL6 and SLES11 kernels, I am sure someone could bring SLES support back when RHEL6 is supported, if enough people were willing to pay for it.

If anyone cares ... I did the mechanics of getting Lustre 2.0.0.1 compiled and running under SLES11SP1 (I first tried just SLES11, but it was too hard; the kernel was just too old to make bringing the ext4 patches forward feasible, at least for me). I have it working right now in a small test filesystem I use for non-production work.

It was a pain (the major problems were in ldiskfs), but MOST of the pieces were there; it was mostly a matter of shuffling things around and figuring out what went where (I don't want to say it was EASY; it took a while. But I wouldn't call it _hard_; it was mostly annoying, especially since I wasn't that familiar at the time with ext4/ldiskfs).

I've been thinking of working with the open-source Lustre groups to get this into a future release; perhaps I'll talk with some of them at the upcoming LUG.

--Ken
Re: [Lustre-discuss] recovering formatted OST
> Now I have another problem. After the last segfault I cannot restart the fsck due to MMP. [...] Also, when I try to access the filesystem via debugfs it fails:
>
> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> debugfs 1.41.10.sun2 (24-Feb-2010)
> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> ls: Filesystem not open
>
> Is there a way to clear the MMP flag so it allows fsck to run?

You want tune2fs -f -E clear-mmp

--Ken
Re: [Lustre-discuss] Problem with LNET and openibd on Lustre 1.8.4 while rebooting
> lustre does get unmounted before NFS filesystem as seen in the log message... the problem is due to the fact that LNET is still up when openibd gets removed.

Huh, I'm wondering how it ever worked right before. Certainly on the systems I have at 1.8.1.1, I always had to have a Lustre start/stop script which did a lustre_rmmod as part of the stop sequence.

--Ken
Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP
> We recently experienced a power failure (and subsequent UPS failure) which caused our Lustre filesystem to shut down hard. We were able to bring it back online but started seeing errors where the OSTs were being remounted as read-only. We observed that all of the read-only OSTs were reporting an I/O error on the same block (the MMP block) and generating the following message: [...]

I had a similar issue once, but the issue was that the MMP block was corrupted. What finally fixed it was running tune2fs -E clear-mmp. Maybe that would solve the problem?

--Ken
Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP
> This gives me an MMP error though:
>
> [r...@oss-0-25 log]# tune2fs -E clear-mmp /dev/sdd
> tune2fs 1.40.11.sun1 (17-June-2008)
> tune2fs: MMP: appears fsck currently being run on the filesystem while trying to open /dev/sdd
> Couldn't find valid filesystem superblock.

Oh, I forgot ... did you try adding the -f flag? E.g.:

# tune2fs -f -E clear-mmp /dev/sdd

According to the tune2fs man page, when you use clear-mmp, you also need the -f flag.

Still, being able to mount the filesystem read-only would make sense to me, since that wouldn't affect fsck being run.

--Ken
Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP
> Using 'tune2fs -f -E clear-mmp' causes tune2fs to segfault:

Ewww ... well, not sure what to tell you about that. Did you use a newer version of tune2fs/e2fsprogs?

> Our current version is e2fsprogs-1.40.11.sun1-0redhat. Do you know if it's safe to rev up versions on e2fsprogs while running an older lustre kernel revision (1.6.6)?

I am using e2fsprogs-1.41.6.sun1-0suse ... and I know that is old. I was going to say that I don't know if revving up e2fsprogs is okay, but I see that Andreas already answered that one. I can't be 100% sure that upgrading e2fsprogs _will_ solve your problem, but I think it's worth a shot.

--Ken
Re: [Lustre-discuss] Per directory quota
Without size-on-mds, either way would have to query both the MDS and each OST to get the size info. I'm not that familiar with size-on-mds, but it does seem likely that du would still have to query the OSTs for size info, even when ls -l does not. As someone who has spent the past week or two struggling with the size-on-mds code ... IF everything is working right (a reasonably-sized IF), then it should not. AFAIK, du simply calls stat(), which is the same thing ls -l does. Certainly part of the information you store as part of SOM is the disk block usage, which is what du adds up. --Ken
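To illustrate the point that ls -l and du both come down to stat(): they simply read different fields of the same result. The following is a minimal, POSIX-only Python sketch; the helper names apparent_size and disk_usage are hypothetical, and it says nothing about how a Lustre client actually fills in these fields (with or without SOM).

```python
import os
import tempfile


def apparent_size(path):
    """What `ls -l` shows: the file length in bytes (st_size)."""
    return os.stat(path).st_size


def disk_usage(path):
    """What `du` adds up: allocated blocks; st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    # Write a small file and look at both views of its size.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 1000)
        name = f.name
    try:
        print(apparent_size(name))  # 1000
        print(disk_usage(name))     # depends on the filesystem's block allocation
    finally:
        os.unlink(name)
```

Since both numbers come from a single stat() call, any filesystem that can answer stat() completely from one server can serve du and ls -l equally cheaply.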
Re: [Lustre-discuss] Lustre on FreeBSD
...then build them against your kernel -- unless you mean licensing support under FreeBSD? In terms of licensing ... since Lustre is GPL, I can't see any reason why there would be any licensing conflict. All you have to do is download the sources, make the few minor changes to port Lustre to FreeBSD, and you should be in business. Should be a snap! :-) But seriously ... as someone who's been beating their head against the wall with the Macintosh port, I can tell you that you've got an uphill battle. And by uphill, I'm talking about the Northeast ridge of Mount Everest. Okay, maybe it's not that bad, but it's not for the faint of heart or for people unfamiliar with kernel development. A while ago I did find an effort to port Lustre to FreeBSD (I think it used Liblustre and FUSE), but when I last checked it seemed to have stalled. If you are crazy enough to want to port Lustre to FreeBSD, you might want to check out the Macintosh port. It contains at least the preliminary work you would need to get Lustre working on a vnode-based system. A lot of work will still need to be done, but you could leverage a lot of what I've done (and what I'm still working on). If you're not that crazy and you're asking whether someone has already ported Lustre to FreeBSD ... I think the correct answer is no. --Ken
Re: [Lustre-discuss] MMP Problems
Feature will not be enabled until e2fsprogs is updated and 'tune2fs -O mmp %{device}' is run. Normally MMP should be automatically enabled with Lustre 1.8.x. I also installed the newest e2fsprogs but the error message is the same. The rest works fine. In the specific case of SLES11, not only do you need to install the latest e2fsprogs, you also need some additional libraries. If you look in http://downloads.lustre.org/public/tools/e2fsprogs for newer versions of e2fsprogs, you will find a sles11 directory. In there you will find the e2fsprogs RPM, as well as the other RPMs you need. --Ken
[Lustre-discuss] MacOS X Lustre client source code now available
Greetings all.

Thanks to the good folks at Oracle (who were kind enough to allow me to use their public Git repository infrastructure, and did all the hard work of setting up the repo and educating me on the finer points of pushing to remote Git repos), I am pleased to announce that the source code to the MacOS X Lustre client that I released two weeks ago is now available. The URL for this repo is:

    git://git.lustre.org/nrl/lustre.git

That repo will have two branches: master (which has the latest master source code that I've merged against) and b_master_macosx (the branch on which I do my work). If you are unfamiliar with how Git works, here's a super-brief tutorial to get the source code:

    % git clone git://git.lustre.org/nrl/lustre.git
    % cd lustre
    % git checkout b_master_macosx

If you are actually crazy enough to want to BUILD the source code, well, here's what you should do (you should be running Snow Leopard and have the latest download of Xcode):

    % ./autogen.sh
    % ./configure --disable-server --disable-snmp --disable-liblustre-acl --enable-mpitests=no --enable-pinger
    % make

For the truly curious, the last (and only) release I did can be accessed via the tag macosx-alpha-1. I plan on pushing to this repo on a regular basis, so it may break occasionally; consider yourself warned.

Shar and enjoy!

--Ken
Re: [Lustre-discuss] Future of LusterFS?
Make sure you read and understand the Lustre 2.0 release notes before you buy. There seemed to be some specifics in there about using Oracle hardware. In all fairness ... that only matters if you pay Oracle for support. If you aren't paying Oracle for support (or have no plans to), then it doesn't matter. --Ken
[Lustre-discuss] Early alpha version of MacOS X Lustre client available
Greetings all.

I am pleased to announce that I have made available an early alpha version of my port of the Lustre client to MacOS X. By early alpha, I mean that it works, for me, and it might work for you. But it might crash your system. Actually, it will almost certainly crash your system at some point. My main point is: don't plan on using this in any sort of production system (I don't expect that it will harm your servers, but, hey ... it IS an early alpha).

You can download it here:

    ftp://ftp.cmf.nrl.navy.mil/pub/kenh/macosx-lustre-client-alpha1.tar.gz

There is a brief README in there, and some scripts to install and uninstall the Lustre client.

For those of you going to the LUG, I am giving a talk titled "Porting Lustre to Operating Systems other than Linux", and this client will obviously be the topic of that talk, so feel free to ask me more about it then. However, I have answered some likely questions below. If you ask me something that I've already answered below, I will feel free to mock you mercilessly.

1) Is this based in userspace via FUSE or something like that?
Nope. This is a real, honest-to-god port of the Lustre kernel extensions to MacOS X. You get all of the same kernel extensions that you know and love from Linux, just ported to the Mac (well, I had to write something new to replace llite). None of the server extensions are supported, however.

2) What version of Lustre is this based on?
It is based on HEAD as of ... Thursday (4/8/2010). Specifically, commit d354281 is the last commit from master that I have merged into my local branch for this version.

3) Hey, I noticed that feature X doesn't seem to be supported?
Yes, you are right. Sadly, I was not able to get feature X working due to lack of {time, technical ability, understanding} on my part. But rest assured ... feature X is on the schedule and I plan on implementing it, hopefully before {the next few weeks, the next few months, next year, the heat death of the universe}.

4) Hey, how come performance ... isn't great?
Yeah ... sorry about that. The short answer is that this client doesn't currently implement any caching. At all. Obviously that's a major deficiency and one I plan on correcting as soon as reasonably possible. There's also no readahead, which is part of the reason why read performance is so lousy. Okay, write performance, while better, kinda sucks as well. I'll be working on that too.

5) It seems like timestamps are messed up?
Yeah, I haven't quite had a chance to make that work yet. Any files created with this client will have a Unix timestamp of 0, which means that their dates will be Jan 1, 1970 UTC. Also, the setattr call currently returns a "not supported" error.

6) Which version of MacOS X does this client work with?
It currently supports Intel-based Macintoshes running Snow Leopard (MacOS X 10.6). Specifically, I developed it on systems running 10.6.2 and 10.6.3. There are no plans to support Leopard (10.5) or PowerPC systems.

7) Does this work with Lustre 1.8/1.6 servers?
Sadly, no. Right now it only works with 2.0 servers. I can't take all the blame for that one, though ... from what we've seen here, that's a problem with all 2.0-based clients talking to earlier servers.

8) Is the source code available?
Not yet. I want to release it, my boss wants me to release it ... we just need to figure out our long-term plans for this source code, its eventual home, and the mechanics of distributing the source code. If you are a kernel hacker, and you REALLY want to hack on it, drop me a line and we'll see if we can work something out.

9) Hey, my machine crashed while testing it! What should I do?
Send me the kernel panic log, and I'll take a look at it.

10) Something else weird/strange happened while testing it. What should I do?
Drop me a line, and I'll see if I can figure out what's going on.

Enjoy!

--Ken
Re: [Lustre-discuss] RHEL5's OFED with lustre1.8.2 on IB
Why not just use the binary kernel we provide instead of rebuilding your own? It's the *exact* same kernel that we used in our QA testing and therefore a known quantity. I have to agree with Brian here ... the best success that we've had is to either use _everything_ from Sun/Oracle (I'm just not used to thinking of you guys as Oracle yet!), or compile _everything_ yourself. We do the latter on some systems (for various reasons), but I prefer it when we can do the former. Mixing and matching just leads you into trouble (like the symbol version problems you were encountering). --Ken
Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?
I've also tried to get Lustre 1.8.2 working with RHEL5.4 and OFED 1.5, but I didn't get this trio working. Even with OFED 1.4.2 I had problems when modprobing the lustre module. I think you had problems with the module symbol versions, right? Those are relatively easy to track down, once you know a few tricks; the core problem is that you (or someone else) compiled Lustre by pointing it at the wrong version of OFED. If that's your problem, then let me know; I can give you some guidance on how to figure out what is wrong. --Ken
Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?
You're right, I had problems with the module symbol versions using the Lustre 1.8.2 packages available at the Sun website, kernel 2.6.18-164.11.1.el5 (RHEL 5.4) and OFED 1.5. The same problem happens when using OFED 1.4.2.

Since this comes up now and then, I've cc'd the list.

You can Google around to find more about kernel symbol versioning. The short answer is that there is a CRC associated with each exported symbol in the loaded kernel, and that CRC is recorded in a module when it is compiled. That's all well and good, but figuring out what went wrong when it doesn't work is a pain, because the information isn't all in one place (and nobody has explained it well, at least that I've seen).

When a module (like Lustre) is compiled, it's pointed at a file called Module.symvers; that contains the versions of the symbols that modules are expected to link against, and those versions are recorded in the module object file. When you get a mismatch at module load time, one of two things is happening: the wrong OFed is being loaded, or you linked against the wrong Module.symvers file.

How do you figure out which one is the problem? Well, let's take a common OFed symbol, like rdma_connect. You can find out the version of this symbol by grep'ing /proc/kallsyms. On our system:

    # grep rdma_connect /proc/kallsyms
    a0375510 u rdma_connect [ko2iblnd]
    a0375510 u rdma_connect [rdma_ucm]
    a0375510 u rdma_connect [ib_sdp]
    a0377000 r __ksymtab_rdma_connect [rdma_cm]
    a0377225 r __kstrtab_rdma_connect [rdma_cm]
    a03770f0 r __kcrctab_rdma_connect [rdma_cm]
    0ef3a1e8 a __crc_rdma_connect [rdma_cm]
    a0375510 T rdma_connect [rdma_cm]

The symbol you care about is the absolute symbol, the one prefixed by __crc. So in this case, we are interested in __crc_rdma_connect, and that symbol's version is 0x0ef3a1e8. This is the version used by the currently running kernel.

Which version is Lustre linked against? Well, for that you need to find the ko2iblnd.ko file and dump its __versions section.
    # objdump -s -j __versions ko2iblnd.ko | less
    [...]
    0670
    0680 e8a1f30e 72646d61 5f636f6e  rdma_con
    0690 6e656374                    nect
    06a0

This display isn't as pretty, but you want to look in the hex dump just before the symbol name. In this case, right before rdma_connect, you will see e8a1f30e ... which is the little-endian encoding of our symbol version, 0x0ef3a1e8! So they match up, and everything works.

If you want to find out which symbol version is in a particular OFed module (in this case, we want to look at rdma_cm.ko), you can do this:

    # nm ./kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
    cd7aa3e6 A __crc_rdma_connect

Wrong version! But we're ACTUALLY using the module located here:

    # nm ./updates/kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
    0ef3a1e8 A __crc_rdma_connect

Which is the correct version. But if you LINK against the first version, you'll get these errors when you try to load Lustre. Note that my Module.symvers file for this kernel contains:

    0xcd7aa3e6 rdma_connect drivers/infiniband/core/rdma_cm EXPORT_SYMBOL

Which is wrong! In this case, you need to explicitly point Lustre at the OFed directory which contains the correct Module.symvers file. (Can you tell I've beaten my head against the wall over this issue a WHOLE LOT? :-/)

--Ken
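The manual comparison above (kallsyms vs. the module's __versions section vs. Module.symvers) can be scripted. What follows is a minimal Python sketch, not a tool from this thread: the helper names are hypothetical, it works on text or bytes you have already captured (for example the grep output from /proc/kallsyms, or a raw __versions section extracted with something like objcopy -O binary --only-section=__versions), and it assumes a 64-bit little-endian kernel where each __versions entry is an 8-byte CRC followed by a 56-byte NUL-padded symbol name.

```python
import struct

# Assumption: 64-bit little-endian kernel, where each __versions entry is
# an 8-byte unsigned long CRC followed by a 56-byte NUL-padded name.
MODVERSION_FMT = "<Q56s"
MODVERSION_SIZE = struct.calcsize(MODVERSION_FMT)  # 64 bytes per entry


def kallsyms_crc(kallsyms_text, symbol):
    """Return the CRC of `symbol` from kallsyms- or nm-style text, or None."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # e.g. "0ef3a1e8 a __crc_rdma_connect [rdma_cm]"
        if len(fields) >= 3 and fields[2] == "__crc_" + symbol:
            return int(fields[0], 16)
    return None


def versions_crcs(section):
    """Decode a raw __versions section into a {symbol: crc} dict."""
    crcs = {}
    for off in range(0, len(section) - MODVERSION_SIZE + 1, MODVERSION_SIZE):
        crc, raw_name = struct.unpack_from(MODVERSION_FMT, section, off)
        crcs[raw_name.split(b"\0", 1)[0].decode()] = crc
    return crcs


def symvers_crc(symvers_text, symbol):
    """Return the CRC recorded for `symbol` in Module.symvers text, or None."""
    for line in symvers_text.splitlines():
        fields = line.split()
        # e.g. "0xcd7aa3e6 rdma_connect drivers/infiniband/core/rdma_cm EXPORT_SYMBOL"
        if len(fields) >= 2 and fields[1] == symbol:
            return int(fields[0], 16)
    return None


if __name__ == "__main__":
    kallsyms = "0ef3a1e8 a __crc_rdma_connect [rdma_cm]\n"
    # One entry as it would appear in ko2iblnd.ko's __versions section.
    section = struct.pack(MODVERSION_FMT, 0x0EF3A1E8, b"rdma_connect")
    symvers = "0xcd7aa3e6\trdma_connect\tdrivers/infiniband/core/rdma_cm\tEXPORT_SYMBOL\n"

    running = kallsyms_crc(kallsyms, "rdma_connect")
    linked = versions_crcs(section)["rdma_connect"]
    print(hex(running), hex(linked), running == linked)  # 0xef3a1e8 0xef3a1e8 True
    print(section[:4].hex())  # e8a1f30e (the bytes seen in the objdump output)
    print(symvers_crc(symvers, "rdma_connect") == running)  # False: stale Module.symvers
```

The same kallsyms_crc helper also accepts nm output, since in both formats the CRC is in the first column and the __crc_ name is in the third.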
Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?
Thank you very much for your post, it worked! So ... what was your problem? Was the wrong version of OFed loaded? Or was Lustre compiled using the wrong symbol versions? --Ken
Re: [Lustre-discuss] How to force client-oss communication over IB when the MDS has only ethernet?
At the moment, it seems that all the traffic between clients and OSSes also goes through the slow eth connection. Is it possible to force them to use the faster IB interfaces when communicating with each other, and only use eth to communicate with the MDS? Stupid question time: is it possible that you added the IB interface to the OSSes _after_ you created the filesystems on the OSSes? (I know that the MDS remembers the interfaces that are on it at MDS creation time, and you have to do an extra step to tell it about any new interfaces; I think the same is true of the OSSes, but I am not 100% sure.) --Ken
Re: [Lustre-discuss] How to force client-oss communication over IB when the MDS has only ethernet?
Yes! I created OSTs and did some testing with plain eth configuration first, so this would explain things. How can I tell the MDS that things have changed? You'll have to run tunefs.lustre --writeconf (the Lustre manual explains this in a bit more detail). I have a vague memory that you used to only have to do it on the MDT, but the last time I did it, I had to do it on the MDT and all OSTs. --Ken
[Lustre-discuss] e2fsck: undefined symbol: ext2_attr_index_prefi
I have Lustre 1.8.1 running on a bunch of SLES 11/x86_64 systems. I'm using the stock binaries from www.sun.com. Everything is fine ... except that some of the e2fsprogs utilities are unhappy. Specifically, if I try to run e2fsck, I get:

    # e2fsck /dev/sdb
    e2fsck: symbol lookup error: e2fsck: undefined symbol: ext2_attr_index_prefix

I have, of course, the latest e2fsprogs that were released with 1.8.1:

    # rpm -q -a | grep e2fsprogs
    e2fsprogs-1.41.6.sun1-0suse

(Occasionally tunefs.lustre complains about a missing symbol as well, one with mmp in its name, but that doesn't always happen.)

What am I doing wrong? I was not involved with the installation of the SLES 11 system, but I was under the impression it was pretty vanilla.

--Ken