Re: [Lustre-discuss] lustre patches for e2fsprogs version 1.41.0?

2008-08-28 Thread Patrick Winnertz
Hello,

> If you are very interested to start working on this, then you can get the
> lustre-e2fsprogs CVS module (put it in a directory called "patches" in
> the e2fsprogs tree) and then run "quilt push -a" to try and apply patches,
> fixing each one as you go.
Where is this module located? I didn't find any hint about it in the Lustre wiki, 
so I have no idea where to check it out from.

> Any contribution is appreciated, even if you didn't finish it, since it
> saves the developer from having to tackle the tricky parts of the integration.
I'll try this next week.

Greetings
Patrick Winnertz
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Softlockup issues. Lustre related?

2008-08-28 Thread Bernd Schubert
Hello Alex,

On Thursday 28 August 2008 05:20:47 Alex Lee wrote:
> Hello Folks,
>
> I have a few client nodes that are getting soft lockup errors. These are
> patchless clients running Lustre 1.6.5.1 with kernel
> 2.6.18-53.1.6.el5-PAPI, more or less stock RHEL 5.1 with the PAPI patch added
> on top. The MDS and OSS are running Lustre 1.6.5.1 with the supplied Lustre
> kernels and OFED 1.3.1.
>
> I remember there was an issue with __d_lookup in the past, but I thought it
> was fixed with the newest release of Lustre. So I don't know if this is
> related in any way at all. I don't see any other real Lustre error messages
> on the client or the MDS/OSS at the time of the soft lockup. Also, wasn't
> there a softirq issue? I don't think this is related to that...

according to the traces it looks like there is double locking somewhere. 
Unfortunately this is hard to debug due to Lustre bug #12752.
In your traces it also looks like it might have locked up 
at "rcu_read_lock();".

Since you have compiled the kernel yourself anyway, could you recompile it with 
debugging symbols and then resolve __d_lookup+0xd2 using gdb?
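Something along these lines should do it (a minimal sketch, assuming a vmlinux
built with CONFIG_DEBUG_INFO; the path to vmlinux is just an example):

  gdb /usr/src/linux/vmlinux
  (gdb) list *(__d_lookup+0xd2)    # maps the symbol+offset back to a file:line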


Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Seeing OST errors on the OSS that doesnt have it mounted

2008-08-28 Thread Bernd Schubert
On Thursday 28 August 2008 05:41:24 Alex Lee wrote:
> Is there any documentation on how to decode the error messages? I feel
> bad about posting to the list for every single error message I don't
> understand.

I don't think so, but you have the source ;) Nathaniel Rutman posted a quite 
useful bash errno() function some time ago:

# errnos
function errno()
{
    # expand tabs to spaces so the value can be matched as " $1 "
    # (i.e. the error number surrounded by spaces)
    for i in `find /usr/include -name 'errno*.h'`; do
        expand "$i" | grep " $1 "
    done
}


So to find out what error 19 is:

[EMAIL PROTECTED] ~> errno 19
#define ENODEV  19  /* No such device */


Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Softlockup issues. Lustre related?

2008-08-28 Thread Alex Lee
Bernd Schubert wrote:
> Hello Alex,
>
> On Thursday 28 August 2008 05:20:47 Alex Lee wrote:
>   
>> Hello Folks,
>>
>> I have a few client nodes that are getting soft lockup errors. These are
>> patchless clients running Lustre 1.6.5.1 with kernel
>> 2.6.18-53.1.6.el5-PAPI, more or less stock RHEL 5.1 with the PAPI patch added
>> on top. The MDS and OSS are running Lustre 1.6.5.1 with the supplied Lustre
>> kernels and OFED 1.3.1.
>>
>> I remember there was an issue with __d_lookup in the past, but I thought it
>> was fixed with the newest release of Lustre. So I don't know if this is
>> related in any way at all. I don't see any other real Lustre error messages
>> on the client or the MDS/OSS at the time of the soft lockup. Also, wasn't
>> there a softirq issue? I don't think this is related to that...
>> 
>
> according to the traces it looks like there is double locking somewhere.
> Unfortunately this is hard to debug due to Lustre bug #12752.
> In your traces it also looks like it might have locked up
> at "rcu_read_lock();".
>
> While you have compiled yourself anyway, could you recompile it with debugging
> symbols and then resolve __d_lookup+0xd2 using gdb?
>
>
> Cheers,
> Bernd
>
> --
> Bernd Schubert
> Q-Leap Networks GmbH
>   
Someone found this bug for me that looks very similar.

https://bugzilla.lustre.org/show_bug.cgi?id=15975

Does this look at all close? I'm pretty clueless about debugging 
kernel traces.

Thanks,
-Alex

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] HAMMER

2008-08-28 Thread Mag Gam
Well, I guess I was intrigued by the replication portion of HAMMER. I
suppose SNS will take care of this for us...



On Sat, Aug 23, 2008 at 10:28 AM, Troy Benjegerdes <[EMAIL PROTECTED]> wrote:
> On Sat, Aug 23, 2008 at 05:51:36PM +0400, Nikita Danilov wrote:
>> Mag Gam writes:
>>  > Looks like there is another parallel filesystem similar to Lustre
>>  > called "HAMMER".
>>  > http://kerneltrap.org/DragonFlyBSD/HAMMER_Filesystem_Design
>>
>> Hello,
>>
>>  >
>>  > Has anyone heard about this? The architecture  seems very similar to 
>> Lustre.
>>
>> while the HAMMER design is very interesting and it looks like M. Dillon
>> plans to ultimately use it as part of his single-image DragonFly clustering,
>> it's a local file system, and as such cannot be fairly compared to
>> Lustre.
>
> I start feeling like an old geezer when I hear about a new filesystem of
> the day. Let's compare this once there's at least 1 top500 machine
> running hammer. What I really want to know is what happens to a hammer
> filesystem when a node starts randomly corrupting memory. Instant
> multi-master corruption replication!!
>
> The issues Hammer seems to try to solve in clustering are a lot of the same
> issues that led me to try running AFS as the root filesystem.
> But it's not a parallel network filesystem.
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] csum errors

2008-08-28 Thread Stuart Midgley
for completeness, here are the logs from 172.16.4.93

Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE  
CHECKSUM: changed on the client after we checksummed it - likely false  
positive due to mmap IO (bug 11742): from [EMAIL PROTECTED] inum  
24522277/1605841060 object 12021/0 extent [10485760-11534335]
Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 
1162:check_write_checksum()) original client csum 2dbc1696 (type 2),  
server csum 9d081697 (type 2), client csum now 9d081697
Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 
1372:osc_brw_redo_request()) @@@ redo for recoverable error   
[EMAIL PROTECTED] x4720217/t820873 o4->p1- 
[EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694  
ref 2 fl Interpret:R/0/0 rc 0/0


-- 
Dr Stuart Midgley
[EMAIL PROTECTED]



On 28/08/2008, at 11:57 AM, Stuart Midgley wrote:

> We recently upgraded from 1.4.10.1 to 1.6.5.1 (clients and servers)  
> and now we are seeing errors like
>
>
> Aug 27 07:49:54 oss025 kernel: LustreError: 3738:0:(ost_handler.c: 
> 1163:ost_brw_write()) client csum 2dbc1696, server csum 9d081697
> Aug 27 07:49:54 oss025 kernel: LustreError: 168-f: p1-OST0018: BAD  
> WRITE CHECKSUM: changed in transit before arrival at OST from  
> [EMAIL PROTECTED] inum 24522277/426969871 object 12021/0 extent  
> [10485760-11534335]
> Aug 27 07:49:55 oss025 kernel: LustreError: 3738:0:(ost_handler.c: 
> 1225:ost_brw_write()) client csum 2dbc1696, original server csum  
> 9d081697, server csum now 9d081697
>
>
> always from the same cluster node...  Should we be worried?  I  
> suspect this means we shouldn't turn checksumming off?  I assume  
> these are rejected and resent from the client?
>
>
> -- 
> Dr Stuart Midgley
> [EMAIL PROTECTED]
>
>
>

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ksocklnd multiple connections

2008-08-28 Thread Tim Burgess
Hi All,

Just wondering if someone can give us some insight into the logic that
ksocklnd uses to decide which connections to make.

There's not so much in the Lustre operations manual about it, but the
impression I get from reading around is that if we have:

options lnet networks=tcp0(eth0,eth1)

on all of our dual-connected hosts, then they will load-balance by making
multiple connections between clients and servers.  Indeed, they do.
However, I would have expected four TCP connection bundles (i.e. 12
connections, eth[0,1]<->eth[0,1]), but we actually get two (i.e. 6
connections: eth0<->eth0 and eth1<->eth1).  How does Lustre know which
combinations to use?

Some important points about our setup:
- This is a shared network segment 172.16.0.0/16
- Three switches (LeftSwitch<->TopSwitch<->RightSwitch)
- all dual connected hosts are connected to both LeftSwitch and RightSwitch
- client network interfaces are 172.16.4.x/16 (eth0,leftswitch) and
172.16.5.x/16 (eth1,rightswitch)
- OSS/MDS network interfaces are 172.16.0.x/16 (eth0,leftswitch) and
172.16.1.x/16 (eth1,rightswitch)
- to get good routing, we have static routes configured as such, on all
dual-connect machines:
Kernel IP routing table
Destination     Gateway         Genmask         Flags  MSS Window  irtt  Iface
172.16.4.0      0.0.0.0         255.255.255.0   U        0 0          0  eth0
172.16.5.0      0.0.0.0         255.255.255.0   U        0 0          0  eth1
172.16.0.0      0.0.0.0         255.255.255.0   U        0 0          0  eth0
172.16.1.0      0.0.0.0         255.255.255.0   U        0 0          0  eth1
(i.e. all traffic between clients and servers shouldn't traverse TopSwitch,
notwithstanding occasional ugly arp issues)

So Lustre has done the right thing in connecting eth0<->eth0 and eth1<->eth1
in this case.  But how does it know?  Does the client connect to both server
addresses and throw away any connections originating from the same address?
Is there some check of the return path?

My motive here is that I also have a set of singly-connected machines, and
want to have their traffic balanced across both server networks (single
connect machines come in via topswitch).  Right now, these clients all
connect to the eth0 address (172.16.0.x) on all OSSes and the MDS.  All the
traffic goes via leftswitch, and my peak bandwidth to a single OSS is
therefore 1 gigabit, even though the disks are capable of more than that.  What if
my single-connect client had a 10gig NIC?  Or there were lots of 1gig single-connect
clients?  It seems that we are getting it wrong in these cases.

I understand the issues with routing return traffic from the OSS - I am
happy/planning to configure source-based routing on the server nodes, but I
only want to go to the effort once I understand whatever black magic
ksocklnd is doing to decide which connections it should make!  If I go ahead
and configure the source routing, will we end up with two connections being
made from a client with a single IP?
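
For what it's worth, the source-based routing I have in mind is roughly the
usual iproute2 recipe (addresses and the table number below are made up for
illustration, assuming an OSS whose eth1 address is 172.16.1.10):

  # send replies sourced from the eth1 address out via eth1 rather than eth0
  ip route add 172.16.0.0/16 dev eth1 src 172.16.1.10 table 101
  ip rule add from 172.16.1.10 table 101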

Thanks for your help,

Tim
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre_config fails trying to access mgs - mdt and mgs are configured together.

2008-08-28 Thread Alexander, Jack

In my Lustre 1.6 config, I have two MSA2000 arrays and two DL380 G5 servers. 
sfs1 and sfs2 are the internal-network Ethernet names of the two servers; the 
corresponding system-interconnect names are ic-sfs1 and ic-sfs2.

I've successfully (I think) run both "lctl ping [EMAIL PROTECTED]" and "lctl 
ping [EMAIL PROTECTED]" from servers sfs1 and sfs2. Does this look correct? How 
do you read the output of this command?
[EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

This is the .csv file I'm using as input to the lustre_config command. Note 
that the mdt and mgs components are mounted together.
hpcsfse1:root> cat src/scripts/hpcsfse_lustre_config.csv
sfs1,options lnet 
networks=o2ib0,/dev/mapper/mpath0,/mnt/mdt_mgs,mdt|mgs,testfs,_netdev,[EMAIL
 PROTECTED]
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,[EMAIL 
PROTECTED]_netdev,[EMAIL PROTECTED]

Configuration of the sfs1 server seems to be OK.
hpcsfse1:root> lustre_config -vfw sfs1 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs1
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and 
"sfs1"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!

lustre_config:  Lustre cluster configuration START 
lustre_config: Explicit MGS target /dev/mapper/mpath0 in host sfs1.
lustre_config: Adding lnet module options to sfs1
lustre_config: Starting lnet network in sfs1
lustre_config: Creating the mount point /mnt/mdt_mgs on sfs1
lustre_config: Formatting Lustre target /dev/mapper/mpath0 on sfs1...
lustre_config: Formatting command line is: ssh -x -q sfs1 
"PATH=$PATH:/sbin:/usr/sbin; /usr/sbin/mkfs.lustre --reformat  --mgs --mdt 
--fsname=testfs [EMAIL PROTECTED] /dev/mapper/mpath0"
lustre_config: Waiting for the return of the remote command...

   Permanent disk data:
Target: testfs-MDT
Index:  unassigned
Lustre FS:  testfs
Mount type: ldiskfs
Flags:  0x75
  (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] 
mdt.group_upcall=/usr/sbin/l_getgroups

device size = 1259804MB
2 6 18
formatting backing filesystem ldiskfs on /dev/mapper/mpath0
target name  testfs-MDT
4k blocks 0
options        -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L testfs-MDT  -J size=400 -i 4096 -I 512 
-q -O dir_index,uninit_groups,mmp -F /dev/mapper/mpath0
Writing CONFIGS/mountdata
lustre_config: Success on all Lustre targets!
lustre_config: Modify /etc/fstab of host sfs1 to add Lustre target 
/dev/mapper/mpath0
lustre_config: /dev/mapper/mpath0   /mnt/mdt_mgs   lustre   _netdev 0 0
lustre_config:  Lustre cluster configuration END **

hpcsfse1:root> mount /mnt/mdt_mgs

Configuration of the sfs2 server fails. How do I debug and/or correct this?
hpcsfse1:root> lustre_config -vfw sfs2 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs2
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and 
"sfs2"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!

lustre_config:  Lustre cluster configuration START 
lustre_config: There is no MGS target in the node list "sfs2".
lustre_config: Creating the mount point /mnt/ost0 on sfs2
lustre_config: Adding lnet module options to sfs2
lustre_config: Starting lnet network in sfs2
lustre_config: Checking lnet connectivity between sfs2 and the MGS node
lustre_config: check_lnet_connect() error: sfs2 cannot contact the MGS node  
with nids - "[EMAIL PROTECTED]"! Check /usr/sbin/lctl command!

hpcsfse1:root> lctl dl
  0 UP mgs MGS MGS 5
  1 UP mgc [EMAIL PROTECTED] c2910fbc-b150-b759-0c41-a5851616e41e 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
  4 UP mds testfs-MDT testfs-MDT_UUID 3



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Patchless Client

2008-08-28 Thread Andreas Dilger
On Aug 27, 2008  20:57 +0300, Ender Güler wrote:
> I'm new to the Lustre community and the Lustre software as well. I have a
> question regarding the patchless Lustre client installation. The OS is Redhat
> EL 5.1. I'm using Voltaire GridStack v5.1.3 for the InfiniBand software stack.
> I installed the lustre-1.6.5.1 servers without any problem. The client and
> server OSes are the same. The problem is installing the patchless client. Is
> there any howto or guide for that? I googled but got nothing about it, just the
> link http://wiki.lustre.org/index.php?title=Patchless_Client . When I tried
> the things I read on that page, no modules were built. Is there
> anything else that must be done for the patchless client installation?
> 
> I ran the following command for configuration:
> 
> ./configure --prefix=/usr/local/lustre --enable-uoss --enable-bgl
> --enable-posix-osd --enable-panic_dumplog --enable-quota
> --enable-health-write --enable-lru-resize --enable-adaptive-timeouts
> --enable-efence --enable-libwrap --enable-snmp --with-o2ib=/usr/src/openib
> --with-linux=/usr/src/kernels/2.6.18-53.1.14.el5
> 
> Then I ran make && make install. None of the above steps returned error
> messages, so I assumed that the compilation and installation were successful.
> But when I try to mount the server, the following error message is returned:

There are actually patchless client RPMs available for download at the
same place as the server RPMs.  They are called "lustre-client" or similar.

Note that it is also possible to run the same "server" kernel and Lustre
RPMs on the client nodes, since the lustre RPM also contains the client.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Softlockup issues. Lustre related?

2008-08-28 Thread Bernd Schubert
Hi Alex,

On Thursday 28 August 2008 14:52:22 Alex Lee wrote:
> Someone found this bug for me that looks very similar.
>
> https://bugzilla.lustre.org/show_bug.cgi?id=15975
>
> Does this look anything close? I'm pretty clueless about debugging
> kernel traces.

yeah, looks like this is your issue. It also explains why we didn't run into it 
ourselves - we use the same server and client kernel for all of our 
customers, so we are already patching d_rehash() for server-side support.


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Softlockup issues. Lustre related?

2008-08-28 Thread Alex Lee
Bernd Schubert wrote:
> Hi Alex,
>
> On Thursday 28 August 2008 14:52:22 Alex Lee wrote:
>   
>> Someone found this bug for me that looks very similar.
>>
>> https://bugzilla.lustre.org/show_bug.cgi?id=15975
>>
>> Does this look anything close? I'm pretty clueless about debugging
>> kernel traces.
>> 
>
> yeah, looks like this is your issue. It also explains why we didn't run into it
> ourselves - we use the same server and client kernel for all of our
> customers, so we are already patching d_rehash() for server-side support.
>
>
> Cheers,
> Bernd
>
>
> --
> Bernd Schubert
> Q-Leap Networks GmbH
>   

Hi Bernd,

You're patching d_rehash() on your servers? I thought the race 
condition which causes the soft lockup only happened on the patchless 
clients (which we are running).
The solution I saw was to run the Lustre-supplied kernel or to apply the 
iopen-misc patch to the kernel.

Or did you mean that you are already patched?

-Alex


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Seeing OST errors on the OSS that doesnt have it mounted

2008-08-28 Thread Andreas Dilger
On Aug 28, 2008  12:41 +0900, Alex Lee wrote:
> Andreas Dilger wrote:
>>> Aug 23 12:27:52 lustre-oss-0-0 kernel: LustreError: 
>>> 2918:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@
>>> @ processing error (-19)  [EMAIL PROTECTED] x52/t0 o8->@:0/0 lens 
>>> 240/0 e 0 to 0 dl 1219462372
>>>  ref 1 fl Interpret:/0/0 rc -19/0
>>
>> The fact that lustre-oss-0-0 returns -ENODEV (-19) isn't a reason to stop
>> trying there, because it may take some time for the OST to fail over from
>> the primary server to the backup.
>>
>> What this really means is that your primary server is having network
>> trouble, or is so severely overloaded that the client has given up on
>> it and is trying the backup.  It could also be a problem on the client
>> I guess.
>
> Is there any documentation on how to decode the error messages? I feel  
> bad about posting to the list for every single error message I don't  
> understand.

There is some documentation about error messages in the manual (in
particular how to decode the above RPC message).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] csum errors

2008-08-28 Thread Andreas Dilger
On Aug 28, 2008  21:49 +0800, Stuart Midgley wrote:
> for completeness, here are the logs from 172.16.4.93
> 
> Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE  
> CHECKSUM: changed on the client after we checksummed it - likely false  
> positive due to mmap IO (bug 11742): from [EMAIL PROTECTED] inum  

This is the important part to note - if you are doing mmap writes then
the VM doesn't protect the pages from being modified while they are
being checksummed and sent over the wire.

> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 
> 1162:check_write_checksum()) original client csum 2dbc1696 (type 2),  
> server csum 9d081697 (type 2), client csum now 9d081697

This means the data changed after the initial checksum was computed,
but now it has settled down.  In some cases the "client csum now" can
have changed again, depending on whether the process is rewriting the
same file repeatedly.

> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 
> 1372:osc_brw_redo_request()) @@@ redo for recoverable error   
> [EMAIL PROTECTED] x4720217/t820873 o4->p1- 
> [EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694  
> ref 2 fl Interpret:R/0/0 rc 0/0

Here it tells you it is resending the RPC.

> > always from the same cluster node...  Should we be worried?  I  
> > suspect this means we shouldn't turn check summing off?  I assume  
> > these are rejected and resent from the client?

If you are NOT doing mmap IO (just normal read/write) then it is possible
your node has memory corruption.  There is an extra check that can be
done to checksum the pages while they are in memory, instead of just
over the wire.  More overhead of course, but can help isolate the problem.

echo 1 > /proc/fs/lustre/llite/*/checksum_pages

This will also enable on-wire checksumming, which is already on
by default.  One caveat is that turning off checksum_pages will
also turn off the on-wire checksums (which can be re-enabled via
/proc/fs/lustre/osc/*/checksums)...  Blame Phil.
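
In shell terms, roughly (a sketch based only on the /proc paths above; loops
are used because the wildcards may match several llite/OSC instances):

  # enable in-memory page checksumming (this also enables on-wire checksums)
  for f in /proc/fs/lustre/llite/*/checksum_pages; do echo 1 > $f; done

  # if checksum_pages is later set back to 0, re-enable the on-wire checksums
  # separately, since turning it off disables those as well
  for f in /proc/fs/lustre/osc/*/checksums; do echo 1 > $f; done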

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre patches for e2fsprogs version 1.41.0?

2008-08-28 Thread Andreas Dilger
On Aug 28, 2008  09:12 +0200, Patrick Winnertz wrote:
> > If you are very interested to start working on this, then you can get the
> > lustre-e2fsprogs CVS module (put it in a directory called "patches" in
> > the e2fsprogs tree) and then run "quilt push -a" to try and apply patches,
> > fixing each one as you go.

> Where is this module located? I didn't find any hint in the lustre wiki about 
> this module and therefore I don't have any idea where to check out.

It _should_ be available in the same place as the lustre CVS, in a module
called "lustre-e2fsprogs" instead of the normal "lustre" module.  That said,
I'm not 100% sure because I don't use the external CVS.

Alternatively, you can get the e2fsprogs-1.40.11.src.rpm.  This includes
_most_ of the patches in .../patches (except a few newer ones only in CVS),
already applied to the sources.  That patches directory can be moved over
to the e2fsprogs-1.41.x tree and applied with "quilt push -a" (which will
fail at the first one, which adds -sun1 to the e2fsprogs version ;-).
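
Roughly, the workflow looks like this (a sketch only - the exact layout inside
the src.rpm may differ, so treat the file and directory names as assumptions):

  rpm2cpio e2fsprogs-1.40.11.src.rpm | cpio -idmv   # extract the src.rpm contents
  tar xzf e2fsprogs-1.40.11.tar.gz                  # unpack the patched source tree
  cp -r e2fsprogs-1.40.11/patches e2fsprogs-1.41.0/patches
  cd e2fsprogs-1.41.0
  quilt push -a     # apply the patches one by one, fixing each rejected one as you go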

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ksocklnd multiple connections

2008-08-28 Thread Andreas Dilger
On Aug 28, 2008  22:11 +0800, Tim Burgess wrote:
> - all dual connected hosts are connected to both LeftSwitch and RightSwitch
> - clients network interfaces are 172.16.4.x/16 (eth0,leftswitch) and
> 172.16.5.x/16 (eth1,rightswitch)
> - OSS/MDS network interfaces are 172.16.0.x/16 (eth0,leftswitch) and
> 172.16.1.x/16 (eth1,rightswitch)
> 
> So lustre has done the right thing in connecting eth0<->eth0 and eth1<->eth1
> in this case.  But how does it know?  Does the client connect to both server
> addresses and throw away any connections originating from the same address?
> Is there some check of the return path?

It does this matching based on the subnet addresses.  I think it refuses
to have multiple connections between nodes on the same interfaces, as
this sometimes happens when there is an A->B vs. B->A connection race.

> My motive here is that I also have a set of singly-connected machines, and
> want to have their traffic balanced across both server networks (single
> connect machines come in via topswitch).  Right now, these clients all
> connect to the eth0 address (172.16.0.x) on all OSSes and the MDS.

Normally what is done is to use the ip2nets module option, so that clients
will load-balance their connections between the two interfaces.  That
doesn't help a single client, but it is normally fine in a cluster.

I believe that is documented in the manual.
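
For illustration only (a sketch - the address ranges here are guesses for your
cluster, and the exact matching rules should be checked against the ip2nets
section of the manual):

  options lnet ip2nets="tcp0(eth0) 172.16.[0,4].*; tcp0(eth1) 172.16.[1,5].*"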

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] csum errors

2008-08-28 Thread Stuart Midgley
Thanks for the information, greatly appreciated.

We are keeping an eye on the client causing these errors and doing a  
few tests.  The mmap issue is interesting.  The code producing these  
errors is running across the entire cluster, so I assume that if it were  
mmap-ing we would be seeing this sort of random error from more than  
just one node.

If the problems persist, we will turn on the extra debugging and see  
where we are at.


-- 
Dr Stuart Midgley
[EMAIL PROTECTED]



On 29/08/2008, at 1:25 AM, Andreas Dilger wrote:

> On Aug 28, 2008  21:49 +0800, Stuart Midgley wrote:
>> for completeness, here are the logs from 172.16.4.93
>>
>> Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE
>> CHECKSUM: changed on the client after we checksummed it - likely  
>> false
>> positive due to mmap IO (bug 11742): from [EMAIL PROTECTED] inum
>
> This is the important part to note - if you are doing mmap writes then
> the VM doesn't protect the pages from being modified while they are
> being checksummed and sent over the wire.
>
>> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:
>> 1162:check_write_checksum()) original client csum 2dbc1696 (type 2),
>> server csum 9d081697 (type 2), client csum now 9d081697
>
> This means the data changed after the initial checksum was computed,
> but now it has settled down.  In some cases the "client csum now" can
> have changed again, depending on whether the process is rewriting the
> same file repeatedly.
>
>> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:
>> 1372:osc_brw_redo_request()) @@@ redo for recoverable error
>> [EMAIL PROTECTED] x4720217/t820873 o4->p1-
>> [EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl  
>> 1219794694
>> ref 2 fl Interpret:R/0/0 rc 0/0
>
> Here it tells you it is resending the RPC.
>
>>> always from the same cluster node...  Should we be worried?  I
>>> suspect this means we shouldn't turn checksumming off?  I assume
>>> these are rejected and resent from the client?
>
> If you are NOT doing mmap IO (just normal read/write) then it is  
> possible
> your node has memory corruption.  There is an extra check that can be
> done to checksum the pages while they are in memory, instead of just
> over the wire.  More overhead of course, but can help isolate the  
> problem.
>
>   echo 1 > /proc/fs/lustre/llite/*/checksum_pages
>
> This will also enable on-wire checksumming, which is already on
> by default.  One caveat is that turning off checksum_pages will
> also turn off the on-wire checksums (which can be re-enabled via
> /proc/fs/lustre/osc/*/checksums)...  Blame Phil.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss