[Lustre-discuss] Unable to move MDS using procedure in the manual

2013-06-04 Thread Ken Hornstein
I tried to move my MDS from one filesystem on the same machine to another,
using the procedure outlined in the Lustre manuals (I didn't use dd, since
the underlying disks weren't the same size and also I did not think
it was required).

Specifically, I used rsync to copy the files, and also used getfattr/setfattr
to copy over the extended attributes.  Some brief poking around seemed to
show that the EA information made it into the new filesystem.

However, when I went to mount the new MDS partition, it failed with the
following error:

May 30 23:36:50 mds-foo kernel: [  186.604083] LustreError: 3082:0:(md_local_object.c:433:llo_local_objects_setup()) creating obj [fld] fid = [0x20001:0x3:0x0] rc = -116
May 30 23:36:50 mds-foo kernel: [  186.698205] LustreError: 3082:0:(mdt_handler.c:4576:mdt_init0()) Can't init device stack, rc -116
May 30 23:36:50 mds-foo kernel: [  186.797206] LustreError: 3082:0:(obd_config.c:522:class_setup()) setup foo-MDT failed (-116)
May 30 23:36:50 mds-foo kernel: [  186.806140] LustreError: 3082:0:(obd_config.c:1363:class_config_llog_handler()) Err -116 on cfg command:
May 30 23:36:50 mds-foo kernel: [  186.815615] Lustre:cmd=cf003 0:foo-MDT  1:foo-MDT_UUID  2:0  3:foo-MDT-mdtlov  4:f

There were more errors, but they pretty much all cascaded from these
errors.  I switched back to the original filesystem and everything worked.

I am willing to believe I did something wrong, but I'm not sure what; I
did everything the directions said to do.  -116 is ESTALE, and I found
in the code where I believe that error was returned, but it was a little
unclear to me what the root cause was.  Can anyone offer any advice?
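
For anyone else decoding these return codes: the kernel logs errors as
negative errno values, so rc = -116 can be looked up by its absolute value.
Here is a minimal sketch in Python.  The helper name describe_rc is made up
for illustration, and errno numbers are platform-specific (116 is the Linux
value for ESTALE), so run it on the same kind of host that produced the log.

```python
import errno
import os

def describe_rc(rc):
    """Translate a negative kernel-style return code into (name, message).

    describe_rc is a hypothetical helper, not part of Lustre.
    """
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

if __name__ == "__main__":
    print(describe_rc(-116))
```

On a Linux host the first element of the result for -116 should be the
ESTALE name mentioned above.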

--Ken
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] [HPDD-discuss] Unable to move MDS using procedure in the manual

2013-06-04 Thread Ken Hornstein
Which version?

Whoops, can you believe I forgot that?  It's 2.1.2.

--Ken


Re: [Lustre-discuss] [HPDD-discuss] Unable to move MDS using procedure in the manual

2013-06-04 Thread Ken Hornstein
Which version of Lustre is this? File based backup / restore does not work
in 2.x. OI scrub which rebuilds the object index is available from Lustre
2.3 onwards. So file based backup / restore will work from 2.3 onwards.

Well, crud.  I guess that's what Colin was going to tell me, and I see
Andreas said the same thing.

So, this leads to a follow-up question: _where_ is the latest and greatest
Lustre manual?  I used the one labelled 2.0 here:

http://wiki.lustre.org/manual/LustreManual20_HTML/LustreOperations.html

Which doesn't actually mention that you can't do a file-level backup
on the MDT.  Some poking around led me to the Whamcloud one, which actually
does say that.

Perhaps an upgrade to 2.4 is in order (which we were interested in doing
anyway).

--Ken


Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Ken Hornstein
That's good to know kernel-ib comes with the lustre stock install.

What about the rest of the OFED tools?  I mean things like ibdiagnet,
ibstatus, etc?  (I will look at the contents of the other rpms and see
what I can learn)

I think Jeff missed a few steps.  If you want the _server-side_ packages,
what you need to do is:

- Install a Lustre-patched kernel, including devel packages (you can use
  the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel and install it.
- Compile Lustre against the Lustre-patched kernel and the OFED.  This
  is the tricky part; you need to make sure to tell Lustre to link against
  the right OFED package.

There are Lustre build scripts that actually automate all of this; last
time I checked, they were only available in the git tree, NOT in the
source tarball.  Those build scripts are a bit of a pain to use, and I
find that I always have to tweak them a bit.  But once you figure them all
out it makes things easier.

Now as for the userspace utilities ... well, you need to make sure they're
not too far off from the kernel.  How far is too far?  Good question.
I don't think they're guaranteed to work when they don't match, but in my
limited experience minor version differences are ok.

--Ken


Re: [Lustre-discuss] What does ldlm AST mean?

2012-08-28 Thread Ken Hornstein
When I was reading source codes of ldlm, the term AST puzzled me. I think
it means callback, but what is its full name?

Asynchronous software trap.  If it makes you feel any better, I had to
ask as well :-/  I was told the term dates back from VMS.

Hm, some quick Googling suggests that it may really mean Asynchronous
SYSTEM Trap; it's possible I misheard or misremember what someone told me.

http://en.wikipedia.org/wiki/Asynchronous_System_Trap

--Ken


Re: [Lustre-discuss] Interoperable issues between 1.8.6 and 2.1

2012-08-23 Thread Ken Hornstein
When you refer to ia64, are you referring to the itanium systems?  

I'm referring to systems where uname -p returns ia64.  Is that
itanium?  No idea.

--Ken


Re: [Lustre-discuss] Interoperable issues between 1.8.6 and 2.1

2012-08-22 Thread Ken Hornstein
I am not finding where it says explicitly that a lustre client running
1.8 will successfully be able to read and write to a set of lustre
servers running lustre 2.1. are there any known issues?

I forget where that was written down; I can report that it works fine.
WITH THE EXCEPTION of ia64 1.8-based clients; that totally doesn't work.

Are there any known issues upgrading the oss/ost and mds/mdt systems
from 1.8 to 2.1?  There's already  16 terabytes in place...

Nope, it was basically umount-upgrade-mount for us.

--Ken


Re: [Lustre-discuss] [Twg] Lustre and cross-platform portability

2012-03-16 Thread Ken Hornstein
Ken, my apologies for this misstatement.  I guess that my faulty memory
is to blame for the fact that I didn't recall the MacOS code was made
publicly available for download.

No problem.  Back when I gave the talk at LUG the source wasn't available
yet due to issues here, but we got that worked out and I was pushing
my changes to a publicly available Oracle git repo.  I did send out
email to everyone about that, but I'm sure it was easy to miss.

I don't think I've ever seen patches sent from you to either Oracle or
Whamcloud, and unfortunately nobody on our side has had the bandwidth or
user demand/funding to be pulling such changes either.

Well, I did actually submit patches to Oracle to start the process of
working out at least the portability issues, but I believe that was
when Oracle started to implode the Lustre group so things sort of
stalled.  I'll take 75% of the blame for that if we assign 25% to
Larry Ellison :-)

This isn't strictly correct.  It would be possible to change the libcfs
portability layer to export the same API as the Linux kernel to MacOS
and Windows.  This would simplify getting the client into the Linux
kernel, but still allow a native client on MacOS.

Well ... that shifts the burden to cross-platform people basically having
to re-implement the Linux kernel.  For some things, that's possible without
too much pain.  For other things, it's not.

--Ken


Re: [Lustre-discuss] [wc-discuss] Re: [Lustre-devel] [Twg] Lustre and cross-platform portability

2012-03-16 Thread Ken Hornstein
Also fuse client will able to run on any OS have a FUSE porting that is
any BSD, OpenSolaris, MacOS, in additional to the windows.  That is
easy way to maintain a single client for many OS.

It is, unfortunately, not quite that simple.

I can't claim to be a FUSE expert, but I've been paying attention
to it on other platforms.  From what I can tell, FUSE works great
on Linux, but on other platforms the support is iffy.  Also, it's
not quite implemented the same on other operating systems as it is
on Linux, making porting a Linux FUSE module to other platforms not
trivial; from what I've seen, this is due to the Linux filesystem
interface versus the vnode interface used by every Unix except Linux
(and this is part of what makes Lustre hard to port).

I guess what I'm saying is: don't fall into the underwear gnomes
trap of thinking:

1) Get liblustre working with FUSE
2) ???
3) Lustre client everywhere!

It might make it easier, but I doubt it will make it easy.

--Ken


Re: [Lustre-discuss] [Twg] Lustre and cross-platform portability

2012-03-15 Thread Ken Hornstein
I have no information that the WinNT project will ever be released
by Oracle, and as yet there has not been any code released from the
MacOS port, so the libcfs portability layer is potentially exacting
a high cost in code maintenance and complexity (CLIO being a prime
example) for no apparent benefit.  Similarly, the liblustre client needs
a portability layer for userspace, and suffers from the same apparent
lack of interest or users.

In terms of the MacOS X port, I don't think that's quite fair ...
the code I did is available and anyone can download it.  It was
functional in a very basic way but needed some additional love.
Okay, I haven't rolled that stuff into the Whamcloud release ...
what happened there was when there was all the uncertainty with
Oracle and Lustre development I lost momentum and got caught up in
other things.  I've talked with the guys at Whamcloud about bringing
at least the portability changes over, and that's all been on me;
it's certainly on my list to work on.

I can say that at least for MacOS X, there has been interest; I can't
speak for the amount of interest, and there's a bit of a chicken and
egg problem ... people don't plan their Lustre use around MacOS X
clients because there isn't one that works well, and people don't put
work into it because there aren't people who plan their Lustre use
around it.

I'd like to get some feedback from the Lustre community about removing
the libcfs abstraction entirely, or possibly restructuring it to look
like the Linux kernel API, and having the other platforms code against
it as a Linux portability layer, like ZFS on Linux uses the Solaris
Portability Layer (SPL) to avoid changing the core ZFS code.  A related
topic is whether it would be better to replace all cfs_* functions with
standard Linux kernel functions en-masse, or migrate away from cfs_*
functions slowly?

The only thing I can think of is that if this is done, then officially
Lustre is going to be a Linux-only filesystem.  I understand there are
real costs to maintaining the cfs layer, and I can't speak to whether or
not the loss would be worth the gains.

--Ken


Re: [Lustre-discuss] New wc-discuss Lustre Mailing List

2011-07-12 Thread Ken Hornstein
According to this FAQ:
http://groups.google.com/support/bin/answer.py?answer=46438&topic=9257

There's no need for a Google account to join a public Google Group via
email. But sending an email to wc-discuss+subscr...@googlegroups.com
and wc-discuss-subscr...@googlegroups.com both ended up with an error
that recipient address not exist.

I forgot to follow up on this ...

I sent an email to wc-discuss+subscr...@whamcloud.com and I was subscribed
right away.  I know someone else said they were sent to a login page; all
I can say is that didn't happen to me.

--Ken


Re: [Lustre-discuss] speed differences in lustre/infiniband ipoib native ib

2011-06-29 Thread Ken Hornstein
we're struggling mightily to get ubuntu clients working in native IB
mode against centos lustre/IB servers.

since we've never had a working native IB client, we have no basis in
our assumption that the speed increase should be tremendous and thus
justify our struggle.

It really depends on a ton of factors that are impractical to list
here.  I guess I would summarize it as "significant, most of the
time".  I wouldn't call it "tremendous" compared to just using
TCP/IP over the ipoib interface.

But seriously, though ... struggling mightily?  Once we got the Lustre
IB module loaded, everything Just Worked.  If you want to give some
details on what's going wrong here we might be able to help you.

(If by some random chance your problem is you can't get the Lustre IB
module loaded because of symbol version issues, then you should check
the archives because that has been discussed plenty of times).

--Ken


[Lustre-discuss] Poor metadata operation performance

2011-05-20 Thread Ken Hornstein
So I guess there are some things I _still_ don't understand about Lustre
metadata handling.  Specifically, what metadata gets stored on OSTs and
why.

What brings this all up is that a) we have users who have lots of files
and b) we are currently going through some reorganization that requires
changing the groups on lots of these files (this is all running Lustre
1.8.4; we're due for an upgrade in the medium future).

I figured okay, this wouldn't be so bad, since those are all metadata
server operations.  But I started running some tests, and I found out
that chown() system calls perform poorly.

Because I was doing some previous metadata performance analysis, I took
a source code tree which consists of approximately 50,000 files and put
two copies in one of our Lustre filesystems: one with the default striping
(across all OSTs) and one where all files have no striping at all.  The
performance difference between these two trees for stat() calls is large, as
you can imagine, but the disparity between the chown() calls is even larger.
You can run chgrp on all of the files in the no-striped copy in about
3-5 seconds, but the striped copy takes more than 50 seconds.
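
For anyone wanting to repeat the comparison, the measurement can be sketched
roughly as below.  This is an illustration in Python, not the exact test that
produced the numbers above; the function names, the example paths, and the
group id are all made up.

```python
import os
import time

def time_tree_op(root, op):
    """Apply op(path) to every entry under root; return (count, seconds)."""
    count = 0
    start = time.monotonic()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            op(os.path.join(dirpath, name))
            count += 1
    return count, time.monotonic() - start

def chgrp_op(gid):
    """Build an op that changes only the group, much like chgrp does."""
    def op(path):
        # A uid of -1 leaves the owner unchanged; symlinks are not followed.
        os.chown(path, -1, gid, follow_symlinks=False)
    return op

if __name__ == "__main__":
    # Hypothetical paths and gid; substitute the two copies of the tree.
    for tree in ("/lustre/test/striped", "/lustre/test/unstriped"):
        n, secs = time_tree_op(tree, chgrp_op(1234))
        print(tree, n, "entries in", round(secs, 2), "seconds")
```

Running it once per tree gives one wall-clock figure per striping layout,
which is all the comparison above relies on.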

I did some more digging as to why this is.  I thought maybe at first that
this is an issue on the client, but there is code in there that skips
over talking to the OSTs for certain types of metadata updates, and turning
on debugging on the client verifies that no setattr RPCs are being sent
to the OSSes.  Looking more closely at the RPC traces reveals that the issue
is on the metadata server; the setattr RPCs simply take longer when the
files are striped.

I've looked at the metadata server code for a bit, and I've verified
that the metadata server does send setattr RPCs to the OSSes, but I see
that it's done asynchronously; it shouldn't be waiting for the
replies.  So I'm stumped as to why this is happening.  I also realize
that I'm still puzzled as to what metadata is stored on the OSTs; it seems
like the client prefers the metadata from the MDS (except of course for
size), but a fair amount of metadata is still stored on the OSSes.  Can
anyone shed some light on this?

--Ken


Re: [Lustre-discuss] Poor metadata operation performance

2011-05-20 Thread Ken Hornstein
Ken, the OSTs need to track the ownership of objects for quota.  The more
stripes there are on a file, the more RPCs that need to be sent, which is why
we don't recommend wide striping unless there is a reason for it (bandwidth,
size, etc).

Fair enough; I always forget about quota accounting, because we never use
it.  But I'm wondering why this in particular causes such a hit, because
the MDS sends the setattr RPCs asynchronously; in theory it should just
fire them off and not have to wait until they're done.  Perhaps it's the
overhead of sending those RPCs which is slowing things down?  I could believe
that, although I would have thought that it wouldn't be that bad.

--Ken


Re: [Lustre-discuss] IB storage as an OST target

2011-03-28 Thread Ken Hornstein
Anybody had any experience using an IB based storage
target as an OST?

We do that.

Apart from the obvious issue of separating the IB SAN(SRP/SER)
storage traffic from the Lustre traffic are there any issues?

We don't actually separate the IB traffic from the Lustre traffic; in
our cases they actually run over the same IB HCAs.  That isn't the
setup I would have chosen, but it was the system that was available.

Here is one implementation detail that stands out in my mind.  Because
the IB storage tends to come on line rather late in the boot process,
we had to develop a custom boot script that waits around for the IB
device nodes to appear before attempting to mount the Lustre
filesystems.  That was a bit of a pain until we had it all worked out.
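
The idea behind that boot script is simple enough to sketch.  What follows is
not our actual script, just a minimal Python illustration of "wait for the
device nodes, then mount"; the function name, the device paths, and the
timeout values are assumptions.

```python
import os
import time

def wait_for_paths(paths, timeout=300.0, interval=2.0):
    """Poll until every path exists or the timeout expires.

    Returns the list of paths that are still missing (empty on success).
    """
    deadline = time.monotonic() + timeout
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(interval)
        missing = [p for p in missing if not os.path.exists(p)]
    return missing

if __name__ == "__main__":
    # Hypothetical device nodes for the IB-attached OST storage.
    still_missing = wait_for_paths(["/dev/mapper/ost0", "/dev/mapper/ost1"])
    if still_missing:
        raise SystemExit("giving up, devices never appeared: %s" % still_missing)
    # Only at this point would the script go on to mount the Lustre targets.
```

Returning the missing paths instead of a bare true/false makes it easier to
log which device held things up.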

As others have pointed out, if your backend storage disappears (which
happens more often than I would prefer, but in our case the issues
which caused that have been resolved for the most part) then that makes
Lustre very unhappy very quickly.  We've been able to recover from
those situations, but it can be a royal pain.

What about failover?

We use MMP as others have mentioned, but we don't actually have the Lustre
failover stuff all up and running; mostly it hasn't been an issue for us,
so we haven't seen a need to finish it.

--Ken


Re: [Lustre-discuss] clients gets EINTR from time to time

2011-02-25 Thread Ken Hornstein
I don't understand why you don't just fix your application to handle a
perfectly valid and expected condition (that it's currently not
handling) instead of wasting time trying to find the cause of the
expected condition.  Even if you find it, it's likely not a bug and not
something that can/will be fixed.  It's your application that needs to
be fixed.

To be fair ... normally disk I/O operations are not interruptible by
signals, so it's not an unreasonable behavior on the part of an
application.  I did check POSIX, and it doesn't say that behavior is
restricted only to network sockets, so yeah, it's TECHNICALLY allowable
behavior according to the standard (although the Linux manpage for
signal(7) says that it will not happen).  But honestly, I've seen
plenty of cases where applications handle this for network I/O; it's
normal, everyone knows it will happen there.  But for _disk_ I/O?
Never seen it done.  I'm not saying that there are no applications that
handle this case, but it's certainly very uncommon.  I freely admit
that network filesystems sort of mix the concepts of network socket
and disk I/O together, and what is the right behavior is unclear.
But calling this perfectly valid and expected is not quite accurate.
It would be interesting to see what other network filesystems do under
the same circumstances.

--Ken


Re: [Lustre-discuss] clients gets EINTR from time to time

2011-02-25 Thread Ken Hornstein
I have a report from a user that is is getting EINTR when a SIGALRM goes 
off on his write().  It isn't unexpected to get SIGALRM because he 
called the alarm, but he also has SA_RESTART set.  I can't remember 
whose responsibility it is to restart the call, syscall or whereever, 
but it seems that someone is dropping the ball because if EINTR is 
returned then SA_RESTART didn't seem to do the trick, right?

I would agree with you on that one; if you're setting SA_RESTART then
you shouldn't ever get EINTR.  It looks like what should be happening
is that if you get interrupted the system call should return
ERESTARTSYS and then after the signal handler is done the system call
should be re-run for you by the signal handling code.

I see that at least for some cases, Lustre will use ERESTARTSYS; just a
guess, but maybe somewhere Lustre is returning EINTR itself instead of
returning ERESTARTSYS?

--Ken


Re: [Lustre-discuss] clients gets EINTR from time to time

2011-02-24 Thread Ken Hornstein
OK, the app is used to deal with standard disks, that is why it is not
handling the EINTR signal properly.

I think you're misunderstanding what a signal is in the Unix sense.

EINTR isn't a signal; it's a return code from the write() system call
that says, "Hey, you got a signal in the middle of this write() call
and it didn't complete."  It doesn't mean that there was an error
writing the file; if that were happening, you'd get a (presumably
different) error code.  Signals can be sent by the operating system,
but those signals are things like SIGSEGV, which basically means "your
program screwed up."  Programs can also send signals to each other,
with kill(2) and the like.

Now, NORMALLY system calls like write() are interrupted by signals
when you're writing to slow devices, like network sockets.  According
to the signal(7) man page, disks are not normally considered slow
devices, so I can understand the application not being used to handling
this.  And you know, now that I think about it I'm not even sure that
network filesystems SHOULD allow I/O system calls to be interrupted by
signals ... I'd have to think more about it.
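
To make the "EINTR is an error code, not a signal" point concrete, here is a
rough sketch of how an application can cope with an interrupted write by
simply retrying.  It is written in Python for brevity, and the function name
write_all is made up.  Note that Python 3.5 and later already retry most
interrupted calls internally, so the except branch mostly illustrates what a
C program would do around write().

```python
import os

def write_all(fd, data):
    """Write all of data to fd, retrying when a call is interrupted."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            # os.write may write fewer bytes than requested; keep going.
            written += os.write(fd, view[written:])
        except InterruptedError:
            # errno was EINTR: a signal arrived during the call.  Nothing is
            # wrong with the file itself, so just try the write again.
            continue
    return written

if __name__ == "__main__":
    r, w = os.pipe()
    print(write_all(w, b"hello"), os.read(r, 5))
```

The same loop also covers short writes, which an application that only ever
talks to local disks may not be handling either.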

I suspect what happened is that something changed between 1.8.5 and the
previous version of Lustre that you were using that allowed some operations
to be interruptible by signals.  Some things to try:

- Check to see if you are, in fact, receiving a signal in your application
  and Lustre isn't returning EINTR for some other reason.
- If you are receiving a signal, when you set the signal handler for it
  you could use the SA_RESTART flag to restart the interrupted I/O; I think
  that would make everything work like it did before.
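
As a sketch of the second suggestion: in C you would pass SA_RESTART in
sa_flags to sigaction().  The closest equivalent I know of in Python is
signal.siginterrupt() with a false flag, which asks for interrupted system
calls to be restarted.  The helper name below is made up, and whether a
restart actually happens still depends on the kernel and filesystem code
involved.

```python
import signal

def install_restarting_handler(signum, handler):
    """Install handler for signum and request restart of interrupted calls.

    Returns the previously installed handler.
    """
    previous = signal.signal(signum, handler)
    # signal.signal() resets the behaviour to "interrupt", so the restart
    # request has to come after the handler is installed.
    signal.siginterrupt(signum, False)
    return previous

if __name__ == "__main__":
    def on_alarm(signum, frame):
        pass  # the handler itself does not need to do anything here

    install_restarting_handler(signal.SIGALRM, on_alarm)
```

The ordering matters: asking for restart behaviour first and installing the
handler afterwards would quietly undo the request.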

--Ken


Re: [Lustre-discuss] clients gets EINTR from time to time

2011-02-24 Thread Ken Hornstein
As for your questions :
- I have to mention that I always had had this issue, and this is why
I've upgraded from 1.8.4 to 1.8.5, hoping this would solve it.

Ah, okay, I misunderstood that; my apologies.

- I will try to have that SA_RESTART flag set in the app... if I can
find where the signal handler is set.

Searching for sigaction or signal should help there.

- How can I see that lustre is returning EINTR for any other reason ?
As I said no logs shows nothing neither on MDS or OSSs, but I didn't go
through examining lctl debug_kernel yet... which I'm going to do
right away...

Weeelll ... that was just a guess on my part.  I did a quick grep
through the Lustre sources and saw a few places where EINTR was
returned, but most of those seemed to deal with the case where I/O was
interrupted (those places happened fairly far down in the stack; it
wasn't clear to me that those errors would ever bubble back up to a
return code to a system call).  If _that_ is the issue, then tracking
that down will be a challenge.

my last question is : how can I tell which signal I am receiving ?
because my app doesn't say, it just dumps outs the write/pwrite error
code.

I think your easiest way is to use strace; something like strace -e signal
should do the right thing (that will only trace signals, not all system calls).

And if there is no signal handler, then it should follow the standard
actions (as of man 7 signal). On the other hand, my app does not stop
or dump core, and is not ignored, so it has to be handled in the code.
Correct me if I'm wrong...

That is my understanding as well; if you don't have a signal handler
installed, the default action should be taking place, and if the
default action is to ignore the signal then you shouldn't be getting
EINTR.  But hey, I've been wrong before :-)

--Ken


Re: [Lustre-discuss] Compiling Lustre 2 on SLES10

2011-02-22 Thread Ken Hornstein
FYI, 1.8 and 2.0 used to share the ldiskfs patches (we lost this with the
transition to git), so the ldiskfs module shipped with 1.8.5 (which supports
SLES11 SP1) already has most of the patches required for lustre 2.0.0. I
think it would have been less painful to start from there and to add the
missing patches (e.g. data_in_dirent.patch).

Oh, sure, NOW you tell me!

No doubt you're right; I guess I was thinking that since there was a series
of patches marked SLES11 in 2.0.0.1, that was the closest place to start
from.  Live and learn, I guess.  On the upside, I now know more about ext4,
ldiskfs, and the Lustre build system than I did before :-/

--Ken


Re: [Lustre-discuss] Compiling Lustre 2 on SLES10

2011-02-22 Thread Ken Hornstein
Thanks for putting in this effort.  I think that others would be
interested (I am).  Perhaps you could share your work at github or post
to lustre-devel ahead of LUG?

Thanks for the words of encouragement!

My boss has no problem with my work being redistributed, but he's not
really a fan of putting it on github; for legal reasons, we'd rather be
under the umbrella of another organization (like what I am doing with
the MacOS X port).  I'll check into a few things and see what my options
are, and I'll post something here (and to lustre-devel) when we get it
out and about (I suppose I could do a simple context diff now, but that's
not very manageable).

--Ken


Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Ken Hornstein
Dmesg and syslog are clean and has no entries about lustre client.

... are you _sure_?  Even /var/log/messages?

I ask because this sure seems like the Lustre modules are not loaded (you
can check that with the lsmod command).  If they aren't loaded, then the
core issue will be buried somewhere in the messages file (the trick is
to look at the earliest related Lustre messages).  For example, if you
run into the problem that Bob Ball mentioned where one of the RPC services
is using the Lustre port, you can find the Address already in use error
message if you look at the right spot.
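
If you want to script the module check rather than eyeball lsmod output, a
rough Python sketch follows.  It parses text in the format of /proc/modules
(which is what lsmod reads); the function names are made up, and the default
module names checked (lnet and lustre) are my assumption about what a client
needs at minimum.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    # The module name is the first whitespace-separated field on each line.
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}

def missing_modules(proc_modules_text, wanted=("lnet", "lustre")):
    """Return the wanted module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]

if __name__ == "__main__":
    with open("/proc/modules") as f:
        print("missing:", missing_modules(f.read()))
```

An empty list means both modules are present; anything else tells you which
one to go looking for in the messages file.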

If the lustre modules are loaded, then it beats me what is causing this
problem.

--Ken


Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Ken Hornstein
But my mount command is failing and that's the issue:
mount -t lustre 192.168.0.2@tcp0:/temp /lustre
mount.lustre: mount 192.168.0.2@tcp0:/temp at /lustre failed: No such
device
Are the lustre modules loaded?

Right, and every time I've seen the mount command fail like this (with
ENODEV as the error), the _root cause_ is the kernel modules are not
loading; that can happen for a variety of reasons.  The fact you're
getting _nothing_ in the logs is itself rather suspicious; as Brian has
already pointed out Lustre is normally very chatty, even when it is
working correctly.

You could try loading the modules yourself with insmod; if that's not
working, then you should start from there.

--Ken


Re: [Lustre-discuss] Compiling Lustre 2 on SLES10

2011-02-21 Thread Ken Hornstein
Yes, that is what Oracle had announced in the roadmap.

SLES servers are still supported on Lustre 1.8.x, but Oracle announced 
plans to not support them with Lustre 2.x.  Given the similarities 
between the RHEL6 and SLES11 kernels, I am sure someone could bring SLES 
support back when RHEL6 is supported, if enough people were willing to 
pay for it.

If anyone cares ...

I did the mechanics of getting Lustre 2.0.0.1 compiled and running
under SLES11SP1 (I first tried just SLES11, but it was too hard;
the kernel was just too old to make bringing the ext4 patches forward
feasible, at least for me).  I have it working right now in a small
test filesystem I use for non-production work.  It was a pain (the
major problems were in ldiskfs), but MOST of the pieces were there;
it was mostly a matter of shuffling things around and figuring out
what went where (I don't want to say it was EASY; it took a while.
But I wouldn't call it _hard_; it was mostly annoying, especially
since I wasn't that familiar at the time with ext4/ldiskfs).

I've been thinking of working with the open-source Lustre groups to get
this into a future release; perhaps I'll talk with some of them at the
upcoming LUG.

--Ken


Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Ken Hornstein
Now I have another problem. After last segfault I can not restart the fsck
due to MMP.
[...]
Also when I try to access filesystem via debugfs it fails:

debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
debugfs 1.41.10.sun2 (24-Feb-2010)
/dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
ls: Filesystem not open

Is there a way to clear the MMP flag so it allows fsck to run?

You want tune2fs -f -E clear-mmp

--Ken


Re: [Lustre-discuss] Problem with LNET and openibd on Lustre 1.8.4 while rebooting

2010-09-09 Thread Ken Hornstein
lustre does get unmounted before NFS filesystem as seen in the log message...
the problem is due to the fact that LNET is still up when openibd gets 
removed.

Huh, I'm wondering how it ever worked right before.  Certainly on the systems
I have at 1.8.1.1, I always had to have a Lustre start/stop script which did
a lustre_rmmod as part of the stop sequence.

--Ken


Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP

2010-08-09 Thread Ken Hornstein
We recently experienced a power failure (and subsequent UPS failure) 
which caused our Lustre filesystem to shutdown hard.  We were able to 
bring it back online but started seeing errors where the OSTs were being 
remounted as read-only.  We observed that all of the read-only OSTs were 
reporting an I/O error on the same block (the MMP block) and generating 
the following message:
[...]

I had a similar issue once, but the issue was that the MMP block was
corrupted.  What finally fixed it was running tune2fs -E clear-mmp.
Maybe that might solve the problem?

--Ken


Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP

2010-08-09 Thread Ken Hornstein
This gives me an MMP error though:
[r...@oss-0-25 log]# tune2fs -E clear-mmp /dev/sdd
tune2fs 1.40.11.sun1 (17-June-2008)
tune2fs: MMP: appears fsck currently being run on the filesystem while 
trying to open /dev/sdd
Couldn't find valid filesystem superblock.

Oh, I forgot ... did you try adding the -f flag?  E.g.:

# tune2fs -f -E clear-mmp /dev/sdd

According to the tune2fs man page, when you use clear-mmp, you also need
the -f flag.  Still being able to mount the filesystem read-only would
make sense to me, since that wouldn't affect fsck being run.

--Ken


Re: [Lustre-discuss] OST targets not mountable after disabling/enabling MMP

2010-08-09 Thread Ken Hornstein
Using 'tune2fs -f -E clear-mmp' causes tune2fs to segfault:

Ewww ... well, not sure what to tell you about that.

Did you use a newer version of tune2fs/e2fsprogs?  Our current version 
is e2fsprogs-1.40.11.sun1-0redhat.  Do you know if it's safe to rev up 
versions on e2fsprogs while running an older lustre kernel revision (1.6.6)?

I am using e2fsprogs-1.41.6.sun1-0suse ... and I know that is old.

I was going to say that I don't know if revving up e2fsprogs is okay, but
I see that Andreas already answered that one.  I can't be 100% sure that
upgrading e2fsprogs _will_ solve your problem, but I think it's worth
a shot.

--Ken


Re: [Lustre-discuss] Per directory quota

2010-07-16 Thread Ken Hornstein
Without size-on-mds, either way would have to query both the mds and
each OST to get the size info.  Not being that familiar with
size-on-mds, it does seem likely that du would still have to query the
OST for size info, even when ls -l does not.

As someone who has spent the past week or two struggling with the
size-on-mds code ... IF everything is working right (a reasonably-sized
IF), then du should not have to query the OSTs either.  AFAIK, du is
simply calling stat(), which is the same thing ls -l is doing.  Certainly
part of the information you store as part of SOM is the disk block usage,
which is what du is adding up.

--Ken


Re: [Lustre-discuss] Lustre on FreeBSD

2010-06-23 Thread Ken Hornstein
...the build them against your kernel -- unless you mean licensing 
support under FreeBSD?

In terms of licensing ... since Lustre is GPL, I can't see any reason
why there would be any licensing conflict.

All you have to do is download the sources, make the few minor changes
to port Lustre to FreeBSD, and you should be in business.  Should be
a snap! :-)

But seriously ... as someone who's been beating their head against the
wall with regards to the Macintosh port, you've got an uphill battle.
And by uphill, I'm talking about the Northeast ridge of Mount
Everest.  Okay, maybe it's not that bad, but it's not for the faint of
heart or people unfamiliar with kernel development.  I did find an
effort a while ago to port Lustre to FreeBSD (I think it used Liblustre
and FUSE), but when I last checked it seemed to have stalled.

If you are crazy enough to want to port Lustre to FreeBSD, you might want
to check out the Macintosh port.  It is at least the preliminary work
you would need to do to get it working on a vnode-based system.  A lot
of work will still need to be done, but you could leverage a lot of
work from what I've done (and what I'm still working on).

If you're not that crazy and you're asking if someone has ported Lustre
to FreeBSD already ... I think the correct answer is no.

--Ken


Re: [Lustre-discuss] MMP Problems

2010-04-27 Thread Ken Hornstein
Feature will not be enabled until e2fsprogsis updated and 'tune2fs -O 
mmp %{device}' is run.
Normally MMP should be automatically enabled with lustre 1.8.x.
I also installed the newest e2fsprogs but the error message is the same.
The rest works fine.

In the specific case of SLES11, not only do you need to install the latest
e2fsprogs, but you also need some libraries as well.  If you look at
http://downloads.lustre.org/public/tools/e2fsprogs for newer versions of
e2fsprogs you will find a sles11 directory.  In there you will find
the e2fsprogs RPM, as well as other RPMs you need.

--Ken


[Lustre-discuss] MacOS X Lustre client source code now available

2010-04-27 Thread Ken Hornstein
Greetings all.

Thanks to the good folks at Oracle (who were kind enough to allow me to
use their public Git repository infrastructure, and did all the hard
work of setting up the repo and educating me on the finer points of
pushing to remote Git repos) I am pleased to announce that the source
code to the MacOS X Lustre client that I released two weeks ago is now
available.

The URL for this repo is:

git://git.lustre.org/nrl/lustre.git

That repo will have two branches: master (which has the latest master
source code that I've merged against) and b_master_macosx (the branch on
which I do my work).  If you are unfamiliar with how Git works, here's
a super-brief tutorial to get the source code:

% git clone git://git.lustre.org/nrl/lustre.git
% cd lustre
% git checkout b_master_macosx

If you are actually crazy enough to want to BUILD the source code, well,
here's what you should do (you should be running Snow Leopard and have the
latest download of Xcode):

% ./autogen.sh
% ./configure --disable-server --disable-snmp --disable-liblustre-acl \
    --enable-mpitests=no --enable-pinger
% make

For the truly curious, the last (and only) release I did can be accessed
via the tag macosx-alpha-1.  I plan on pushing to this repo on a regular
basis, so it may break occasionally; consider yourself warned.

Share and enjoy!

--Ken


Re: [Lustre-discuss] Future of LusterFS?

2010-04-22 Thread Ken Hornstein
Make sure you read and understand the Lustre 2.0 release notes before you
buy.  There seemed to be some specifics in there about using Oracle hardware.

In all fairness ... that only matters if you pay Oracle for support.  If
you aren't paying Oracle for support (or have no plans to), then it doesn't
matter.

--Ken


[Lustre-discuss] Early alpha version of MacOS X Lustre client available

2010-04-12 Thread Ken Hornstein
Greetings all.

I am pleased to announce that I have made available an early alpha version
of my port of the Lustre client to MacOS X.  By early alpha, I mean that
it works, for me, and it might work for you.  But it might crash your
system.  Actually, it will probably almost certainly crash your system at
some point.  Don't plan on using this in any sort of production system is
really my main point (I don't expect that it will harm your servers, but,
hey ... it IS an early alpha).

You can download it here:

ftp://ftp.cmf.nrl.navy.mil/pub/kenh/macosx-lustre-client-alpha1.tar.gz

There is a brief README in there, and some scripts to install and uninstall
the Lustre client.

For those of you going to the LUG, I am giving a talk titled "Porting
Lustre to Operating Systems other than Linux", and this client will
obviously be the topic of that talk, so feel free to ask me more about
it then.

However, I have answered some likely questions below.  If you ask me something
that I've already answered below, I will feel free to mock you mercilessly.

1) Is this based in userspace via FUSE or something like that?

   Nope.  This is a real, honest-to-god port of the Lustre kernel
   extensions to MacOS X.  You get all of the same kernel extensions
   that you know and love from Linux, just ported to the Mac (well, I
   had to write something new to replace llite).  None of the server
   extensions are supported, however.

2) What version of Lustre is this based on?

   It is based on the HEAD as of ... Thursday (4/8/2010).  Specifically,
   commit d354281 is the last commit from master that I have merged into
   my local branch tree for this version.

3) Hey, I noticed that feature X doesn't seem to be supported?

   Yes, you are right.  Sadly, I was not able to get feature X working
   due to lack of {time, technical ability, understanding} on my part.
   But rest assured ... feature X is on the schedule and I plan on
   implementing it, hopefully before {the next few weeks, the next few
   months, next year, the heat death of the universe}.

4) Hey, how come performance ... isn't great?

   Yeah ... sorry about that.

   The short answer is that this client doesn't currently implement any
   caching.  At all.  Obviously that's a major deficiency and one I plan
   on correcting as soon as reasonably possible.  There's also no readahead,
   so that's part of the reason why read performance is so lousy.  Okay,
   write performance, while better, kinda sucks as well.  I'll be working
   on that as well.

5) It seems like timestamps are messed up?

   Yeah, I haven't quite had a chance to make that work yet.  So any files
   created with this client will have a Unix timestamp of 0, which means
   that their dates will be Jan 1, 1970 UTC.  Also, the setattr call
   will currently return a not supported error.

6) Which version of MacOS X does this client work with?

   It currently supports Intel-based Macintoshes running Snow Leopard
   (Mac OS X 10.6).  Specifically, I developed it on systems running
   10.6.2 and 10.6.3.  There are no plans to support Leopard (10.5)
   or PowerPC systems.

7) Does this work with Lustre 1.8/1.6 servers?

   Sadly, no.  Right now it only works with 2.0 servers.  I can't take
   all the blame for that one, though ... from what we've seen here,
   that's a problem with all 2.0-based clients with earlier servers.

8) Is the source code available?

   Not yet.

   I want to release it, my boss wants me to release it ... we just
   need to figure out our long-term plans for this source code, its
   eventual home, and the mechanics of distributing the source code.

   If you are a kernel hacker, and you REALLY want to hack on it, drop
   me a line and we'll see if we can work something out.

9) Hey, my machine crashed while testing it!  What should I do?

   Send me the kernel panic log, and I'll take a look at it.

10) Something else weird/strange happened while testing it.  What should I do?

    Drop me a line, and I'll see if I can figure out what's going on.

Enjoy!

--Ken


Re: [Lustre-discuss] RHEL5's OFED with lustre1.8.2 on IB

2010-04-08 Thread Ken Hornstein
Why not just use the binary kernel we provide instead of rebuilding your
own?  It's the *exact* same kernel that we used in our QA testing and
therefore a known quantity.

I have to agree with Brian here ... the best success that we've had is to
either use _everything_ from Sun/Oracle (I'm just not used to thinking of
you guys as Oracle yet!), or compile _everything_ yourself.  We do the
latter on some systems (for various reasons), but I prefer it when we
can do the former.  Mixing and matching just leads you into trouble
(like the symbol version problems you were encountering).

--Ken


Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?

2010-03-23 Thread Ken Hornstein
I've tried also to get Lustre 1.8.2 working with RHEL5.4 and OFED 1.5
but I didn't get this trio working. Even with OFED 1.4.2 I had problems
when modprobing lustre module.

I think you had problems with the module symbol versions, right?  Those
are relatively easy to track down, once you know a few tricks; the
core problem is that you (or someone else) compiled Lustre by pointing
it at the wrong version of OFED.

If that's your problem, then let me know; I can give you some guidance
on how to figure out what is wrong.

--Ken


Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?

2010-03-23 Thread Ken Hornstein
You're right, I had problems with the module symbol versions using
Lustre 1.8.2 packages available at Sun website, kernel
2.6.18-164.11.1.el5 (RHEL 5.4) and OFED 1.5. The same problem happens
when using OFED 1.4.2.

So since this comes up now and then, I've cc'd the list.

So you can Google around to find more about kernel symbol versioning.
The short answer is that there is a CRC associated with each exported
symbol in the loaded kernel, and that version is recorded in the module
when it is compiled.  That's all well and good, but figuring out what
went wrong when it doesn't work is a pain, because the information
isn't all in one place (and nobody has explained it well, at least that
I've seen).

When a module (like Lustre) is compiled, it's pointed at a file called
Module.symvers; that contains the versions of the symbols that
modules are expected to link against, and those versions are recorded
in the module object file.  When you get this mismatch at module load
time, one of two things is happening: the wrong OFED is being loaded,
or you linked against the wrong Module.symvers file.

How do you figure out which one is the problem?  Well, let's take a
common OFED symbol, like rdma_connect.  You can find out the version of
this symbol by grep'ing /proc/kallsyms.  On our system:

# grep rdma_connect /proc/kallsyms 
a0375510 u rdma_connect [ko2iblnd]
a0375510 u rdma_connect [rdma_ucm]
a0375510 u rdma_connect [ib_sdp]
a0377000 r __ksymtab_rdma_connect   [rdma_cm]
a0377225 r __kstrtab_rdma_connect   [rdma_cm]
a03770f0 r __kcrctab_rdma_connect   [rdma_cm]
0ef3a1e8 a __crc_rdma_connect   [rdma_cm]
a0375510 T rdma_connect [rdma_cm]

The symbol you care about is the absolute symbol, the one prefixed by
__crc.  So in this case, we are interested in __crc_rdma_connect, and
that symbol's version is 0x0ef3a1e8.  This is the version used by the
currently running kernel.

Which version is Lustre linked against?  Well, for that you need to
find the ko2iblnd.ko file, and dump the __versions section.

# objdump -s -j __versions ko2iblnd.ko | less
[...]
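0670 00000000 00000000 00000000 00000000  ................
0680 e8a1f30e 00000000 72646d61 5f636f6e  ........rdma_con
0690 6e656374 00000000 00000000 00000000  nect............
06a0 00000000 00000000 00000000 00000000  ................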

This display isn't as pretty, but you want to look in the hex dump
just before the symbol name.  In this case, right before rdma_connect,
you will see e8a1f30e ... which is the little-endian version of our
symbol version!  So they match up, and everything works.
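
The comparison above can be scripted.  What follows is a minimal sketch,
assuming kallsyms-style text in the format shown earlier and a 32-bit
CRC; crc_to_le and kallsyms_crc are hypothetical helper names, not
existing tools.

```shell
#!/bin/sh
# Convert a 32-bit CRC as printed in /proc/kallsyms (for example 0ef3a1e8)
# into the byte order that "objdump -s" shows on a little-endian machine.
crc_to_le() {
    # Strip an optional 0x prefix, then reverse the order of the four bytes.
    printf '%s\n' "${1#0x}" | sed -E 's/^(..)(..)(..)(..)$/\4\3\2\1/'
}

# Read kallsyms-style text on stdin and print the CRC of one exported
# symbol.  The symbol is matched as "__crc_<name>" in the third column.
kallsyms_crc() {
    awk -v sym="__crc_$1" '$3 == sym { print $1; exit }'
}
```

For example, crc_to_le "$(kallsyms_crc rdma_connect < /proc/kallsyms)"
prints the byte string to look for just before the symbol name in the
objdump output; for the listing above that is e8a1f30e.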

If you want to find out which symbol version is in a particular OFED module
(in this case, we want to look at rdma_cm.ko), you can do this:

# nm ./kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
cd7aa3e6 A __crc_rdma_connect

Wrong version!  But we're ACTUALLY using the module located here:

# nm ./updates/kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
0ef3a1e8 A __crc_rdma_connect

Which is the correct version.  But if you LINK against the first
version, you'll get these errors when you try to load Lustre.  Note
that my Module.symvers file for this kernel contains:

0xcd7aa3e6  rdma_connect  drivers/infiniband/core/rdma_cm  EXPORT_SYMBOL

Which is wrong!  In this case, you need to explicitly point Lustre at
the OFED directory which contains the Module.symvers file.
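
To check a Module.symvers file before building against it, something
like the following sketch may help.  It assumes the layout shown above
(CRC in the first column, symbol name in the second); symvers_crc and
check_symvers are hypothetical helper names.

```shell
#!/bin/sh
# Print the CRC recorded for a symbol in a Module.symvers file.
# Usage: symvers_crc SYMBOL FILE
symvers_crc() {
    awk -v sym="$1" '$2 == sym { print $1; exit }' "$2"
}

# Compare a Module.symvers entry with the CRC of the loaded module.
# Usage: check_symvers SYMBOL LOADED_CRC FILE
# LOADED_CRC is given without the 0x prefix, as kallsyms prints it.
check_symvers() {
    want="0x$2"
    have=$(symvers_crc "$1" "$3")
    if [ "$have" = "$want" ]; then
        echo "$1: match ($have)"
    else
        echo "$1: MISMATCH (symvers=$have loaded=$want)"
        return 1
    fi
}
```

With the values from this message, check_symvers rdma_connect 0ef3a1e8
Module.symvers would report a mismatch for the kernel's own file, which
is the sign that the build needs to be pointed at the OFED copy instead.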

(Can you tell I've beaten my head against the wall over this issue
a WHOLE LOT? :-/)

--Ken


Re: [Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?

2010-03-23 Thread Ken Hornstein
Thank you very much for your post, it worked!

So ... what was your problem?  Wrong version of OFED loaded?  Or Lustre
was compiled using the wrong symbol versions?

--Ken


Re: [Lustre-discuss] How to force client-oss communication over IB when the MDS has only ethernet?

2010-03-23 Thread Ken Hornstein
At the moment, it seems that all the traffic between clients-OSS goes 
also through the slow eth connection.  Is it possible to force them to 
use faster IB interfaces when communicating with each other, and only
use eth to communicate with the MDS?

Stupid question time: is it possible that you added the IB interface to
the OSSes _after_ you created the filesystems on the OSSes?

(I know that the MDS remembers the interfaces that are on it at MDS
creation time, and you have to do an extra step to tell it about any
new interfaces; I think the same is true of the OSSes, but I am not
100% sure).

--Ken
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] How to force client-oss communication over IB when the MDS has only ethernet?

2010-03-23 Thread Ken Hornstein
Yes! I created OSTs and did some testing with plain eth configuration 
first, so this would explain things.  How can I tell the MDS that things 
have changed?

You'll have to run tunefs.lustre --writeconf (the Lustre manual
explains this in a bit more detail).  I have a vague memory that you
used to have to do it only on the MDT, but the last time I did it I had
to do it on the MDT and all of the OSTs.
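
As a rough sketch of the order of operations (check the manual before
relying on it), the function below only prints the commands rather than
running them.  writeconf_plan is a hypothetical helper, and the device
paths you pass to it are your own; it assumes all clients and targets
are unmounted first.

```shell
#!/bin/sh
# Print (do not run) the commands for regenerating the configuration logs.
# Usage: writeconf_plan MDT_DEVICE OST_DEVICE...
writeconf_plan() {
    mdt=$1
    shift
    echo "# with all clients and all targets unmounted:"
    # The MDT goes first, then every OST.
    echo "tunefs.lustre --writeconf $mdt"
    for ost in "$@"; do
        echo "tunefs.lustre --writeconf $ost"
    done
    echo "# then remount the MDT first, followed by the OSTs"
}
```

Printing the plan first makes it easy to review the device list on each
server before anything is rewritten.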

--Ken
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] e2fsck: undefined symbol: ext2_attr_index_prefi

2009-10-09 Thread Ken Hornstein
I have Lustre 1.8.1 running on a bunch of SLES 11/x86_64 systems.  I'm
using the stock binaries from www.sun.com.  Everything is fine ... except
that some of the e2fsprogs utilities are unhappy.  Specifically, if I try
to run e2fsck, I get:

# e2fsck /dev/sdb
e2fsck: symbol lookup error: e2fsck: undefined symbol: ext2_attr_index_prefix

I have, of course, the latest e2fsprogs that were released with 1.8.1:

# rpm -q -a | grep e2fsprogs
e2fsprogs-1.41.6.sun1-0suse

(Occasionally tunefs.lustre complains about a missing symbol as well, but
that one has mmp in the name, and it doesn't always happen.)

What am I doing wrong?  I was not involved with the installation of the
SLES 11 system, but I was under the impression it was pretty vanilla.

--Ken