Re: [lustre-discuss] What happens if my stripe count is set to more than my number of stripes

2015-04-20 Thread Michael Kluge

Hi Oleg,

I tried it, and it looks like Lustre actually stores the stripe count of 
128 (at least for directories). lfs getstripe tells me that my dir is now 
striped over 128 OSTs (I have 48).


[/scratch/mkluge] lctl dl | grep osc | wc -l
48
[/scratch/mkluge] mkdir p
[/scratch/mkluge] lfs setstripe -c 128 p
[/scratch/mkluge] lfs getstripe p
p
stripe_count:   128 stripe_size:1048576 stripe_offset:  -1


Regards, Michael

On 20.04.2015 at 18:44, Drokin, Oleg wrote:

Hello!

Current allocator behaviour is such that when you specify more
stripes than you have OSTs, it'll treat it the same as if you set the
stripe count to -1 (that is, the maximum possible number of stripes).
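
For illustration, a minimal sketch of the equivalent explicit request
(the path and file name are assumptions):

# -1 means "stripe over all available OSTs"
lfs setstripe -c -1 /scratch/mkluge/wide_file
lfs getstripe /scratch/mkluge/wide_file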

Bye, Oleg

On Apr 20, 2015, at 4:47 AM, prakrati.agra...@shell.com wrote:


Hi,

I have a question regarding the Lustre file system. If I have a file of
size 64 GB and I set the stripe size to 1 GB, my number of stripes
becomes 64. But if I set my stripe count to 128, what does
Lustre do in that case?

Thanks and Regards, Prakrati








Re: [lustre-discuss] New community release model and 2.5.3 (and 2.x.0) patch lists?

2015-04-16 Thread Michael Kluge

On Wed, Apr 15, 2015 at 11:44 AM, Scott Nolin scott.no...@ssec.wisc.edu
wrote:


Since Intel will not be making community releases for 2.5.4 or 2.x.0
releases now, it seems the community will need to maintain some sort of
patch list against these releases.


I don't think this is how I understood it at LUG. What I took away with me: 
Intel will make 2.x.0 releases every 6 months, including fixes. New 
releases may or may not have new features. But there will be a regular 
release cycle.


Michael

--
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih





Re: [Lustre-discuss] [HPDD-discuss] will obdfilter-survey destroy an already formatted file system

2013-05-22 Thread Michael Kluge
Hi Cory,

I have been running this now for a few weeks. Only a few users are using the 
file system so far. Either I was lucky or Andreas is right: no one has 
complained yet that data got lost. I am running integrity checks in parallel 
and they have not found anything yet. So we can say it is most probably safe :)


Regards, Michael


 Michael,
 
 Unfortunately, the current Lustre Ops Manual indicates the opposite.  From
 section 24.3 Testing OST Performance (obdfilter_survey):
 
 The obdfilter_survey script is destructive and should not be run on
 devices that contain existing data that needs to be preserved. Thus,
 tests using obdfilter_survey should be run before the Lustre file system
 is placed in production.
 
 I opened LUDOC-146 to track the issue previously and updated the details
 to include Andreas' explanation.
 
 Thanks,
 -Cory
 
 
 On 3/21/13 7:18 PM, Dilger, Andreas andreas.dil...@intel.com wrote:
 
 On 2013/21/03 4:09 AM, Michael Kluge michael.kl...@tu-dresden.de
 wrote:
 I have read through the documentation for obdfilter-survey but could not
 found any information on how invasive the test is. Will it destroy an
 already formatted OST or render user data unusable?
 
 It shouldn't - the obdfilter-survey uses a different object sequence (2)
 compared to normal filesystem objects (currently always 0), so the two do
 not collide.
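 
 For reference, a minimal obdfilter-survey invocation of the kind the
 lustre-iokit documentation describes (target name and parameters are
 assumptions, not values from this thread):
 
 # run on the OSS; exercises the obdfilter layer of one OST directly
 nobjhi=2 thrhi=2 size=1024 targets="lustre-OST0000" sh obdfilter-survey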
 
 Cheers, Andreas
 -- 
 Andreas Dilger
 
 Lustre Software Architect
 Intel High Performance Data Division
 
 
 



Re: [Lustre-discuss] Lustre On Two Clusters

2013-05-13 Thread Michael Kluge
Hi Mark,

I remember that the NRL used them. They had a couple of presentations at 
the Lustre User Group. Here is some pretty old stuff:
http://wiki.lustre.org/images/3/3a/JamesHoffman.pdf


Regards, Michael


On 09.05.2013 17:15, Mr. Mark L. Dotson (Contractor) wrote:
 Thanks, Lee.

 Has anyone done any work with Lustre and IB WAN extenders? I need help
 with my configuration.

 Thanks,

 Mark

 On 05/08/13 11:03, Lee, Brett wrote:
 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-
 boun...@lists.lustre.org] On Behalf Of Mr. Mark L. Dotson (Contractor)
 Sent: Tuesday, May 07, 2013 9:16 AM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] Lustre On Two Clusters

 I have Lustre installed and working on 1 cluster. Everything is IB. I can 
 mount
 clients in this cluster with no problems. I want to mount this Lustre FS on
 another cluster that is attached to a separate IB switch.
 What's the best way to do this? Does it require a separate subnet for the IB
 interfaces, or does it matter?

 Hi Mark,

 Good to hear from you on the list.

 Regarding your question, a couple options jump out at me.

 1.  Add additional interfaces to the servers.  This will allow the Lustre 
 servers to be on both IB networks and able to directly serve the file 
 system to the clients.
 2.  Use LNet router(s), the basics of which are documented in the operations 
 manual.

 Either way, you'll need to perform some network configuration in (at least) 
 the servers' lustre.conf.
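
 A minimal sketch of the routed variant (network names and router
 addresses below are assumptions, not your actual configuration):

 # on the router, which sits on both fabrics:
 options lnet networks="o2ib0(ib0),o2ib1(ib1)" forwarding=enabled
 # on the servers (fabric 0), reaching fabric 1 through the router:
 options lnet networks="o2ib0(ib0)" routes="o2ib1 10.0.0.1@o2ib0"
 # on the clients (fabric 1), reaching fabric 0 through the router:
 options lnet networks="o2ib1(ib1)" routes="o2ib0 10.0.1.1@o2ib1"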

 -Brett


 Currently, my /etc/modprobe.d/lustre.conf has the following:

 options lnet networks=o2ib0(ib0)

 Lustre version is 2.3
 OS's are CentOS 6.4.

 Any help would be much appreciated. Thanks.

 Mark


 --
 Mark Dotson
 Systems Administrator
 Lockheed-Martin
 dotsonml@afrl.hpc.mil


 --
 Brett Lee
 Sr. Systems Engineer
 Intel High Performance Data Division






[Lustre-discuss] will obdfilter-survey destroy an already formatted file system

2013-03-21 Thread Michael Kluge
Hi,

I have read through the documentation for obdfilter-survey but could not find 
any information on how invasive the test is. Will it destroy an already 
formatted OST or render user data unusable?


Regards, Michael

--
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih





[Lustre-discuss] df -h question

2012-07-11 Thread Michael Kluge
Dear list,

we are in the process of copying the whole content of a 1.6.7 Lustre FS 
to a 1.8.7 Lustre FS. For this I precreated all individual directories 
on the new FS to set striping information based on the #bytes/#files 
ratio. Then we used a parallel rsync to copy all directories over. All 
of this worked fine. Now, on the old FS the user data consumed 63 TB 
while on the new FS 'df -h' reports only 56 TB as used. I'm sure we 
copied all dirs and all rsyncs finished successfully.
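
For reference, a hedged sketch of the precreation step (paths and the
stripe count are assumptions; the real counts were derived per directory
from the #bytes/#files ratio):

cd /old_fs
find . -type d | while read d; do
    mkdir -p "/new_fs/$d"
    # 4 is illustrative; choose per directory from its bytes/files ratio
    lfs setstripe -c 4 "/new_fs/$d"
done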

Is this difference expected if one moves from 1.6 to 1.8? Or did I miss 
something?


Regards, Michael

-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih



[Lustre-discuss] wrong free inode count on the client side with 1.8.7

2012-06-20 Thread Michael Kluge
Hi list,

the number of free inodes seems to be reported wrongly on the client side. If I 
create files, the number of free inodes does not change. If I delete the files, 
the number of free inodes increases. So, from a client perspective, if I 
repeatedly create and remove files, I see more and more free inodes. I tried to 
find a bug for this in Whamcloud's database but could not find one. 'df -i' for 
the MDT on the MDS looks OK.
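
A quick way to compare the two views (the mount points are assumptions):

lfs df -i /mnt/lustre   # per-target inode counts as the client sees them
df -i /mnt/lustre       # the aggregated client view
# and on the MDS, against the mounted MDT:
df -i /mnt/mdt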

I think the behaviour is described here:
http://lists.lustre.org/pipermail/lustre-discuss/2011-July/015789.html

Right now I don't think this is a big problem. Can this turn into a real 
problem? Like when the number of free inodes as seen by the client exceeds 2^64 
or whatever the limit is there?


Michael

-- 

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih



Re: [Lustre-discuss] sgpdd-survey and /dev/dm-0

2012-06-12 Thread Michael Kluge
Hi Frank,

thanks a lot, that helped.


Regards,
Michael


On Tuesday, 12 June 2012, 14:24:27, Frank Riley wrote:
 Mount your OSTs as raw devices using raw. Do a man raw. I can't remember
 if you create the raw device from the /dev/mapper/* device or the /dev/dm-N
 device, but one of those works. Then run sgpdd_survey on the /dev/rawN
 devices.
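 
 A hedged sketch of that sequence (device names and survey parameters are
 assumptions):
 
 raw /dev/raw/raw1 /dev/mapper/mpatha   # bind the DM device to a raw device
 raw -qa                                # verify the binding
 size=8192 crglo=1 crghi=2 thrlo=1 thrhi=16 scsidevs=/dev/raw/raw1 sh sgpdd-survey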

 From: lustre-discuss-boun...@lists.lustre.org
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Michael Kluge
 Sent: Tuesday, June 12, 2012 5:51 AM
 To: lustre-discuss
 Subject: [Lustre-discuss] sgpdd-survey and /dev/dm-0

 Hi list,

 is there way to run sgpdd-survey on device mapper disks?


 Regards, Michael

 --

 Dr.-Ing. Michael Kluge

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.demailto:michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
--

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih



Re: [Lustre-discuss] performance: hard vs. soft links

2012-05-26 Thread Michael Kluge
 Hard links are only directory entries with refcounts on the target inode, so 
 that when the last link to an inode is removed the inode will be deleted.
 
 Symlinks are inodes with a string that points to the original name. They are 
 not refcounted on the target, but require a new inode to be allocated for each 
 one.

 It isn't obvious which one would be slower, since they both have some 
 overhead.

 Is your sample size large enough?  1000 may only take 1s to complete and may 
 not provide consistent results.

The 1000 creates need between 2.9 and 3.0 s (3 runs) for the hard links 
and 2.2-2.3 s (3 runs as well) for the soft links. I think the numbers 
are not so bad in terms of accuracy. Thanks for the explanation.


Michael


[Lustre-discuss] performance: hard vs. soft links

2012-05-25 Thread Michael Kluge
Hi list,

when creating hard links instead of soft links (1.6.7, 1000 links created by
one process, all in the same subdir, the node is behind one lnet router), I see
about 25% overhead (time) on the client side. Is this OK/normal/expected?
Lustre probably needs to increment some ref. counter on the link target if
hard links are used?
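
A sketch of the kind of measurement this is based on (paths are
assumptions):

cd /mnt/lustre/testdir
touch target
time sh -c 'for i in $(seq 1000); do ln target hard_$i; done'
time sh -c 'for i in $(seq 1000); do ln -s target soft_$i; done'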


Michael

--

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih



[Lustre-discuss] Most recent Linux Kernel on the client for a 1.8.7 server

2012-05-16 Thread Michael Kluge
Hi list,

could someone please tell me what the most recent kernel version (and Lustre 
version) is on the client side if I have to stick to 1.8.7 on the server side? 
I think Lustre 2.1 is not compatible; the 1.8.8 client can be compiled 
against 2.6.32, but I do not know how 2.0 is doing ...


Regards, Michael

-- 

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih





Re: [Lustre-discuss] Most recent Linux Kernel on the client for a 1.8.7 server

2012-05-16 Thread Michael Kluge
Hi Adrian,

OK, thanks. Then the state is the same as I remember.


Regards, Michael

On 16.05.2012 20:14, Adrian Ulrich wrote:

 could someone please tell me what the most recent kernel version (and lustre 
 version) is on the client side, if I have to stick to 1.8.7 on the server 
 side?

 2.x clients will refuse to talk to 1.8.x servers.

 You can build the 1.8.x client with a few patches on CentOS6 (2.6.32), but 
 you should really consider upgrading to 2.x in the future.

 Regards,
   Adrian




-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] IOR writing to a shared file, performance does not scale

2012-02-10 Thread Michael Kluge
Hi Kshitij,

I would recommend running sgpdd-survey on the servers for one and for 
multiple disks, and then obdfilter-survey. Then you know what your 
storage can deliver. Then you could run LNet tests as well to see whether 
the network works fine. If the disks and the network deliver the 
expected performance, IOR will most probably run with good performance 
as well.

Please see:
http://wiki.lustre.org/images/4/40/Wednesday_shpc-2009-benchmarking.pdf
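
For the LNet part, a minimal lnet_selftest sketch between one client and
one server (the NIDs are assumptions; requires the lnet_selftest module):

export LST_SESSION=$$
lst new_session read_write
lst add_group clients 192.168.1.10@tcp
lst add_group servers 192.168.1.2@tcp
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw write size=1M
lst run bulk
lst stat clients servers   # Ctrl+C when done watching
lst end_session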


Regards, Michael

On 10.02.2012 23:27, Kshitij Mehta wrote:
 We have lustre 1.6.7 configured using 64 OSTs.
 I am testing the performance using IOR, which is a file system benchmark.

 When I run IOR using mpi such that processes write to a shared file,
 performance does not scale. I tested with 1,2 and 4 processes, and the
 performance remains constant at 230 MBps.

 When processes write to separate files, performance improves greatly,
 reaching 475 MBps.

 Note that all processes are spawned on a single node.

 Here is the output:
 Writing to a shared file:

 Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
 /fastfs/gabriel/ss_64/km_ior.out
 Machine: Linux deimos102

 Summary:
  api= POSIX
  test filename  = /fastfs/gabriel/ss_64/km_ior.out
  access = single-shared-file
  ordering in a file = sequential offsets
  ordering inter file= no tasks offsets
  clients= 4 (4 per node)
  repetitions= 1
  xfersize   = 32 MiB
  blocksize  = 2 GiB
  aggregate filesize = 8 GiB

 Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)
 ---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------
 write      233.61     233.61     233.61      0.00     7.30       7.30       7.30        0.00     35.06771   EXCEL

 Max Write: 233.61 MiB/sec (244.95 MB/sec)

 Writing to separate files:

 Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
 /fastfs/gabriel/ss_64/km_ior.out -F
 Machine: Linux deimos102

 Summary:
  api= POSIX
  test filename  = /fastfs/gabriel/ss_64/km_ior.out
  access = file-per-process
  ordering in a file = sequential offsets
  ordering inter file= no tasks offsets
  clients= 4 (4 per node)
  repetitions= 1
  xfersize   = 32 MiB
  blocksize  = 2 GiB
  aggregate filesize = 8 GiB

 Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)
 ---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------
 write      475.95     475.95     475.95      0.00     14.87      14.87      14.87       0.00     17.21191   EXCEL

 Max Write: 475.95 MiB/sec (499.07 MB/sec)

 I am trying to understand where the bottleneck is, when processes write
 to a shared file.
 Your help is appreciated.


-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


[Lustre-discuss] 1.8 client loses contact to 1.6 router

2012-02-03 Thread Michael Kluge
Hi list,

we have a 1.6.7 fs running which still works nicely. One node exports this FS 
(via 10GE) to another cluster that has some 1.8.5 patchless clients. These 
clients at some point (randomly, I think) mark the router as down (lctl 
show_route). It is always a different client, and usually a few clients each 
week do this. Although we configured the clients to ping the router 
again from time to time, the route never comes back. On these clients I can 
still ping the IP of the router, but lctl ping gives me an Input/Output 
error. If I do something like:

lctl --net o2ib set_route 172.30.128.241@tcp1 down
sleep 45
lctl --net o2ib del_route 172.30.128.241@tcp1
sleep 45
lctl --net o2ib add_route 172.30.128.241@tcp1
sleep 45
lctl --net o2ib set_route 172.30.128.241@tcp1 up

the route comes back; sometimes the client works again, but sometimes the 
clients issue an unexpected aliveness of peer .. message and need a reboot.

I looked around and could not find a note on whether 1.8 clients and 1.6 routers 
will work together as expected. Has anyone experience with this kind of setup 
or an idea for further debugging?


Regards, Michael

modprobe.d/lustre.conf on the 1.8.5 clients
-8--
options lnet networks=tcp1(eth0)
options lnet routes=o2ib 172.30.128.241@tcp1;
options lnet dead_router_check_interval=60 router_ping_timeout=30
-8--



-- 

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


[Lustre-discuss] MDS failover: SSD+DRBD or shared 15K-SAS-Storage RAID with approx. 10 disks

2012-01-22 Thread Michael Kluge
Hi,

I have been asked which one of the two I would choose for two MDS 
servers (active/passive): whether I would like to have SSDs, maybe two 
(mirrored) in both servers and DRBD for syncing, or a RAID controller 
with 15K disks. I have not done benchmarks on this topic myself 
and would like to ask if anyone has an idea or numbers? The cluster will 
be pretty small, about 50 clients.


Regards, Michael

-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] MDS failover: SSD+DRBD or shared 15K-SAS-Storage RAID with approx. 10 disks

2012-01-22 Thread Michael Kluge
Hi Carlos,

 In my experience SSDs didn't help much, since the MDS bottleneck is not
 only a disk problem but rather the entire Lustre metadata mechanism.

Yes, but one does not need much space on the MDS, and four SSDs (as MDT) 
are way cheaper than a RAID controller with 10 15K disks. So the 
question is basically how the DRBD latency will influence the MDT 
performance. I know sync/async makes a big difference here, but I have 
no idea about the performance impact of either or how the reliability is 
influenced.

 One remark about DRBD: I've seen customers using it, but IMHO an
 active/standby HA-type configuration would be more reliable and will
 provide you better resilience. Again, I don't know about your uptime and
 reliability needs, but the customers I've worked with that require
 minimum downtime in production always go for RAID controllers rather than
 DRBD replication.

OK, thanks. That is good information. So SSD+DRBD is considered to be 
the cheap solution. Even for small clusters?


Regards, Michael


 Regards,
 Carlos.


 --
 Carlos Thomaz | Systems Architect
 Mobile: +1 (303) 519-0578
 ctho...@ddn.com | Skype ID: carlosthomaz
 DataDirect Networks, Inc.
 9960 Federal Dr., Ste 100, Colorado Springs, CO 80921
 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless
 http://twitter.com/ddn_limitless | 1.800.TERABYTE







-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] Client behind Router can't mount with failover mgs

2011-12-20 Thread Michael Kluge
Hi Colin,

  our mgs server (Lustre 1.6.7) failed and we mounted it on the failover
  node. Our clients (1.6.7) on the same IB network are still functional.
 
 Ok.. Well aside from the fact that 1.6.7 is long since deprecated, what
 else isn't functional after failover?

Nothing. Everything is fine. Just the 1.8.5 clients behind an IB-10GE router 
can't mount anymore.

We have exported the fs via a Lustre/10GE router to another cluster
with a patchless 1.8.5. The router works, we can ping around and get
the usual protocol errors. But mounting the fs from the failover node
does not work on these clients. Is this expected or is this supposed
to work?
 
 Sorry, what are you actually trying to do here???

We have a (pretty old) SDR IB based cluster with ~700 nodes and 10 Lustre 
servers. We use an IB-10GE router to attach this Lustre FS to another 
cluster. This works pretty well, but only when the MGS is mounted on the 
primary node, not when the MGS is mounted on the failover node. I just want to 
know whether this is expected behaviour or not.


Regards, Michael

-- 

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




[Lustre-discuss] Client behind Router can't mount with failover mgs

2011-12-18 Thread Michael Kluge
Hi list,

our mgs server (Lustre 1.6.7) failed and we mounted it on the failover node. 
Our clients (1.6.7) on the same IB network are still functional. We have 
exported the fs via a Lustre/10GE router to another cluster with a patchless 
1.8.5. The router works, we can ping around and get the usual protocol errors. 
But mounting the fs from the failover node does not work on these clients. Is 
this expected or is this supposed to work?


Regards, Michael

--
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


[Lustre-discuss] most recent kernel version for patchless client?

2011-12-14 Thread Michael Kluge
Hi list,

I am looking for information on what the most recent kernel version is that I can 
use to build a patchless client. OFED, for example, refuses to build on 
3.0.0 kernels. Has someone recently tried newer kernels with 1.8.7?


Regards, Michael

-- 

Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih





[Lustre-discuss] Interpreting iozone measurements

2011-03-09 Thread Michael Kluge
Hi all,

we have a testbed running with Lustre 1.8.3 and an RTT of ~4 ms (10GE
network cards everywhere) for a ping between client and servers. If I
have read the iozone source code correctly, iozone reports bandwidth in
KB/s and includes the time for the open() call, but not for close(). If
I write a single 64-byte file (with one I/O request), iozone tells me
something like '295548', which means ~295 MB/s. Dividing the file size
by the bandwidth, I get the time that was needed to create the file and
write the 64 bytes (with a single request). In this case, the time is
about 0.2 microseconds (64 B / 295 MB/s), which is way below the RTT.

That means for a Lustre file system, if I create a file and write 64
bytes, the client sends two(?) RPCs to the server and does not wait for
their completion. Is this correct? But it will wait for the completion of
both RPCs when the file is closed?

The numbers look different when I disable the client side cache by
setting max_dirty_mb to 0.


Regards, Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] OSS replacement

2011-02-24 Thread Michael Kluge
Hi Johann,

interesting. Is there no need to set the file system volume name of the new
OSS via tune2fs to the same string?


Michael


On Thursday, 24.02.2011, 10:48 +0100, Johann Lombardi wrote: 
 Hi,
 
 On Thu, Feb 24, 2011 at 10:39:32AM +0100, Gizo Nanava wrote:
we need to replace one of the OSS in the cluster. We wonder whether 
 simply copying (e.g. rsync) over the network
 the content of all /dev/sdX (ldiskfs mounted) from the OSS to be replaced to 
 the new, already Lustre-formatted OSS
 (all /dev/sdX on both servers are the same) will work?
 
 Yes, the procedure is detailed in the manual:
 http://wiki.lustre.org/manual/LustreManual18_HTML/LustreTroubleshooting.html#50651190_pgfId-1291458
 
 Johann
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] Running MGS and OSS on the same machine

2011-02-18 Thread Michael Kluge
Hi Arya,

if I remember correctly, Lustre uses 0@lo for the localhost address. Does 
using the other NID, 192.168.0.10@tcp0, give any error message?
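
A hedged sketch of what I mean (the NID below is taken from your example):

# on the combined MGS/OSS node, list the NIDs it actually has:
lctl list_nids
# then format the OST against that NID rather than localhost:
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1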


Michael

On 18.02.2011 16:10, Arya Mazaheri wrote:
 Hi again,
 I have planned to use one server as MGS and OSS simultaneously. But how
 can I format the OSTs as lustre FS?
 for example, the line below tells the OST that its mgsnode is at
 192.168.0.10@tcp0:
 mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1

 But now the mgsnode is the same machine. I tried to put localhost instead
 of the IP address, but it didn't work.

 What shoud I do?

 Arya





-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


[Lustre-discuss] up a router that is marked down

2011-01-25 Thread Michael Kluge
Hi list,

if a Lustre router is down, comes back to life, and the servers do not
actively test the routers periodically: is it possible to mark a Lustre
router as up? Or to tell the servers to ping the router?

Or can I enable the router pinger in a live system without unloading
and reloading the Lustre kernel modules?


Regards, Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] up a router that is marked down

2011-01-25 Thread Michael Kluge
Jason, Michael,

thanks a lot for your replies. I pinged everyone from all directions, but
the router is still marked down on the client. I even removed and
re-added the router entry via lctl --net tcp1 del_route xyz@o2ib and
lctl --net tcp1 add_route xyz@o2ib. No luck. So I think I'll wait for
the next maintenance window. Oh, and I forgot to mention that the
servers run 1.6.7.2, the router as well, and the clients 1.8.5. Works
well so far. 


Thanks, Michael


On Tuesday, 25.01.2011, 15:12 +0100, Temple Jason wrote: 
 I've found that even with the Protocol Error, it still works.
 
 -Jason
 
 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Michael Shuey
 Sent: Tuesday, 25 January 2011 14:45
 To: Michael Kluge
 Cc: Lustre Diskussionsliste
 Subject: Re: [Lustre-discuss] up a router that is marked down
 
 You'll want to add the dead_router_check_interval lnet module
 parameter as soon as you are able.  As near as I can tell, without
 that there's no automatic check to make sure the router is alive.
 
 I've had some success in getting machines to recognize that a router
 is alive again by doing an lctl ping of their side of a router (e.g.,
 on a tcp0 client, `lctl ping routerIP@tcp0`, then `lctl ping
 routerIP@o2ib0` from an o2ib0 client).  If you have a server/client
 version mismatch, where lctl ping returns a protocol error, you may be
 out of luck.
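 
 A sketch of the module options this refers to (the values are
 illustrative):
 
 # /etc/modprobe.d/lustre.conf on the nodes that use routes; dead routers
 # get pinged again and can come back automatically
 options lnet dead_router_check_interval=60 live_router_check_interval=60 router_ping_timeout=50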
 
 --
 Mike Shuey
 
 
 
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] up a router that is marked down

2011-01-25 Thread Michael Kluge
Hi Jeremy,

yup, it's marked "obsolete (DANGEROUS)"; whatever, it did the
trick :)


Thanks a lot, Michael



On Tuesday, 25.01.2011, 18:55 -0500, Jeremy Filizetti wrote: 
 Though I think it's marked as development or experimental in the Lustre
 documentation or source, lctl set_route has worked fine for me in the
 past with no issues.
  
 lctl set_route <nid> up
  
 is the syntax, I believe.
  
 Jeremy
 
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217

Re: [Lustre-discuss] lnet router immediately marked as down

2010-12-03 Thread Michael Kluge
Hi Liang,

sure, but my current question is: why are the nodes within o2ib 
considering the router as down?

I add the route to a node within o2ib, and instantly afterwards lctl 
show_route says the router is down. That does not make much sense to me.

And if I try to send a message through the router from this node, I see 
that it can't send the message because all routers are down.


Regards, Michael

On 03.12.2010 16:29, liang Zhen wrote:
   Hi Michael,

 To add a router dynamically, you also have to run lctl --net o2ib add_route
 a.b@tcp1 on all nodes of tcp1, so the better choice is using a
 universal modprobe.conf by defining ip2nets and routes; you can see
 some examples here:
 http://wiki.lustre.org/manual/LustreManual18_HTML/MoreComplicatedConfigurations.html
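 
 A universal modprobe.conf sketch of that idea (addresses and patterns are
 assumptions):
 
 # the same file can be deployed on every node; each node picks the
 # ip2nets line matching one of its own IP addresses
 options lnet ip2nets="o2ib 10.10.*.*; tcp1 192.168.1.*" routes="tcp1 10.10.0.1@o2ib; o2ib 192.168.1.1@tcp1"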

 Regards
 Liang

 On 12/3/10 9:32 PM, Michael Kluge wrote:
 Hi list,

 we have a Lustre 1.6.7.2 running on our (IB SDR) cluster and have added
 one additional NIC (tcp1) to one node and would like to use this node as
 a router. I have added an ip2nets statement and forwarding=enabled to the
 modprobe files on the router and reloaded the modules. I see two NIDs
 now and no trouble.

 The MDS server that needs to go through the router to a handful of
 additional clients is in production and I can't take it down. So I added
 the route to the additional network via lctl --net tcp1 add_route
 w.x@o2ib, where W.X.Y.Z is the IPoIB address of the router. When I do
 an lctl show_routes, this router is marked as down. Is there a way to
 bring it to life? I can lctl ping the router node from the MDS but can't
 reload lnet to enable active router tests. Right now on the MDS the only
 option for the lnet module is the network config for the IB network
 interface.

 Any ideas how to enable this router?


 Regards, Michael









Re: [Lustre-discuss] sgpdd-survey provokes DID_BUS_BUSY on an SFA10K

2010-10-23 Thread Michael Kluge
Hi Bernd,

do you have an rpm with OFED 1.4 kernel modules for your kernel? I took a 
2.6.18-164 from the Lustre kernels and OFED won't build against it. The 
OFED backports report lots and lots of symbols as redefined.


Michael

On 22.10.2010 23:30, Bernd Schubert wrote:
 Hello Michael,

 On Friday, October 22, 2010, you wrote:
 Hi Bernd,

 I'm sorry to hear that. Unfortunately, I really do not have the time to
 port this version to your kernel version.

 No worries. I don't expect this :)

 I remember that you use Debian. But I guess you are still using a SLES
 kernel then? You could ask Suse about it, although I guess they only do
 care about SP1 with 2.6.32-sles now. If you use Debian Lenny, the RHEL5
 kernel should work (and besides its name, it is internally more or less
 a 2.6.29 to 2.6.32 kernel). Later Debian and Ubuntu releases have a more
 recent udev, which requires at least 2.6.27.

 OK, if the 2.6.18 works like a charm, I'll give the 2.6.18-194 it a try.

 Just don't forget that -194 requires 1.8.4 (I think you had been at 1.8.3
 previously). We also have this driver added as a Lustre kernel patch in our -ddn
 releases. 1.8.4 is in testing, but I have not uploaded it yet. 1.8.3-ddn also
 includes the driver together with recent security backports.

 http://eu.ddn.com:8080/lustre/lustre/1.8.3/


 Cheers,
 Bernd



-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] sgpdd-survey provokes DID_BUS_BUSY on an SFA10K

2010-10-23 Thread Michael Kluge
Hi Bernd,

I get the same message with your kernel RPMs:

In file included from include/linux/list.h:6,
                 from include/linux/mutex.h:13,
                 from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.4/drivers/infiniband/core/addr.c:36:
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.4/kernel_addons/backport/2.6.18_FC6/include/linux/stddef.h:9: error: redeclaration of enumerator 'false'
include/linux/stddef.h:16: error: previous definition of 'false' was here
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.4/kernel_addons/backport/2.6.18_FC6/include/linux/stddef.h:11: error: redeclaration of enumerator 'true'
include/linux/stddef.h:18: error: previous definition of 'true' was here

Could it be that this '2.6.18 being almost a 2.6.28/29' confuses the 
OFED backports, so that the 2.6.18 backport does not work anymore? Is that 
solvable? I found nothing in the OFED bugzilla.


Michael





-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] sgpdd-survey provokes DID_BUS_BUSY on an SFA10K

2010-10-22 Thread Michael Kluge
Reducing the queue depth from the default of 32 to 8 did not help. It
looks like this problem always shows up when I am writing to more than
one region. 2 regions and 2 threads are enough to see the problem. The
last test that succeeds is one region and 16 threads. 1/32 is not
being tested.

Michael

On Friday, 22.10.2010, 10:48 +0200, Michael Kluge wrote: 
 Hi list,
 
 DID_BUS_BUSY means that the controller is unable to handle the SCSI
 command and is basically asking the host to send it again later. I had, I
 think, just one concurrent region and 32 threads running. What would be
 the appropriate action in this case? Reducing the queue depth on the
 HBA? We have QLogic here; there is an option for the kernel module for
 this.
 
 
 Regards, Michael
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] sgpdd-survey provokes DID_BUS_BUSY on an SFA10K

2010-10-22 Thread Michael Kluge
Hi Bernd,

I have found a RHEL-only release for this version. It does not compile
on a 2.6.27 kernel :( I actually don't want to go back to 2.6.18 just to
get a new driver.


Michael

On Friday, 22.10.2010, 13:34 +0200, Bernd Schubert wrote: 
 On Friday, October 22, 2010, Michael Kluge wrote:
  Hi list,
  
  DID_BUS_BUSY means that the controller is unable to handle the SCSI
  command and is basically asking the host to send it again later. I had I
  think just one concurrent region and 32 threads running. What would be
  the appropriate action in this case? Reducing the queue depth on the
  HBA? We have Qlogic here, there is an option for the kernel module for
  this.
 
 I think you ran into a known issue with the Q-Logic driver and the SFA10K. You 
 will need at least qla2xxx version 8.03.01.06.05.06-k, and the optimal number 
 of commands is likely to be 16 (with 4 OSS connected).
 
 
 Hope it helps,
 Bernd
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] ldiskfs performance vs. XFS performance

2010-10-20 Thread Michael Kluge
Thanks a lot for all the replies. sgpdd shows 700+ MB/s for the device.
We ran into one or two bugs with obdfilter-survey, as lctl has at
least one bug in 1.8.3 when it uses multiple threads, and
obdfilter-survey also causes an LBUG when you Ctrl+C it. We see 600+
MB/s from obdfilter-survey over a reasonable parameter space after we
changed to the ext4-based ldiskfs. So that seems to be the trick.
Michael

On Monday, 18.10.2010, 14:04 -0600, Andreas Dilger wrote: 
 On 2010-10-18, at 10:40, Johann Lombardi wrote:
  On Mon, Oct 18, 2010 at 01:58:40PM +0200, Michael Kluge wrote:
 dd if=/dev/zero of=$RAM_DEV bs=1M count=1000
 mke2fs -O journal_dev -b 4096 $RAM_DEV

 mkfs.lustre --device-size=$((7*1024*1024*1024)) --ost --fsname=luram
 --mgsnode=$MDS_NID --mkfsoptions="-E stride=32,stripe-width=256 -b 4096
 -j -J device=$RAM_DEV" /dev/disk/by-path/...

 mount -t ldiskfs /dev/disk/by-path/... /mnt/ost_1
  
 In fact, Lustre uses additional mount options (see "Persistent mount opts" 
 in the tunefs.lustre output).
 If your ldiskfs module is based on ext3, you should add the "extents" and 
 "mballoc" options, which are known to improve performance.
 
 Even then, the IO submission path of ext3 from userspace is not very good, 
 and such a performance difference is not unexpected.  When submitting IO from 
 userspace to ext3/ldiskfs it is being done in 4kB blocks, and each block is 
 allocated separately (regardless of mballoc, unfortunately).  When Lustre is 
 doing IO from the kernel, the client is aggregating the IO into 1MB chunks 
 and the entire 1MB write is allocated in one operation.
 
 That is why we developed the delalloc code for ext4 - so that userspace 
 could also get better IO performance, and utilize the multi-block allocation 
 (mballoc) routines that have been in ldiskfs for ages, but only accessible 
 from the kernel.
 
 For Lustre performance testing, I would suggest looking at lustre-iokit, and 
 in particular sgpdd to test the underlying block device, and then 
 obdfilter-survey to test the local Lustre IO submission path.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Lustre Technical Lead
 Oracle Corporation Canada Inc.
 
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




[Lustre-discuss] high CPU load limits bandwidth?

2010-10-20 Thread Michael Kluge
Hi list,

is it normal that a 'dd' or an 'IOR' pushing 10 MB blocks to a Lustre
file system shows up with 100% CPU load within 'top'? The reason why I
am asking is that I can write from one client to one OST with 500
MB/s. The CPU load will be at 100% in this case. If I stripe over two
OSTs (which use different OSS servers and different RAID controllers) I
will get 500 as well (seeing 2x250 MB/s on the OSTs). The CPU load will
be at 100% again.

A 'dd' on my desktop pushing 10M blocks to the local disk shows 7-10%
CPU load.

Are there ways to tune this behavior? Changing max_rpcs_in_flight and
max_dirty_mb did not help.


Regards, Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] high CPU load limits bandwidth?

2010-10-20 Thread Michael Kluge
It is the CPU load on the client. The dd/IOR process is using one core 
completely. The clients and the servers are connected via DDR IB. LNET 
bandwidth is at 1.8 GB/s. Servers have 1.8.3, the client has 1.8.3 
patchless.


Micha

On 20.10.2010 18:15, Andreas Dilger wrote:
 Is this client CPU or server CPU?  If you are using Ethernet it will 
 definitely be CPU hungry and can easily saturate a single core.

 Cheers, Andreas




-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


Re: [Lustre-discuss] ldiskfs performance vs. XFS performance

2010-10-20 Thread Michael Kluge
 For your final filesystem you still probably want to enable async
 journals (unless you are willing to enable the S2A unmirrored device cache).

OK, thanks. We'll give this a try.

Michael

 Most obdecho/obdfilter-survey bugs are gone in 1.8.4, except your ctrl+c
 problem, for which a patch exists:

 https://bugzilla.lustre.org/show_bug.cgi?id=21745






 Cheers,
 Bernd





-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] high CPU load limits bandwidth?

2010-10-20 Thread Michael Kluge
Using O_DIRECT reduces the CPU load, but the magical limit of 500 MB/s 
for one thread remains. Are the CRC sums calculated on a per-thread 
basis? Or per stripe? Is there a way to test only the checksumming speed?
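
In the meantime, toggling the checksums per OSC on the client is one way to
compare with and without CRC (a sketch for 1.8):

lctl get_param osc.*.checksums
lctl set_param osc.*.checksums=0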


Michael

On 20.10.2010 18:53, Andreas Dilger wrote:
 On 2010-10-20, at 10:40, Michael Kluge <michael.kl...@tu-dresden.de> wrote:
 It is the CPU load on the client. The dd/IOR process is using one core 
 completely. The clients and the servers are connected via DDR IB. LNET 
 bandwidth is at 1.8 GB/s. Servers have 1.8.3, the client has 1.8.3 patchless.

 If you only have a single threaded write, then this is somewhat unavoidable 
 to saturate a CPU due to copy_from_user().  O_DIRECT will avoid this.

   Also, disabling data checksums and debugging can help considerably. There 
 is a patch in bugzilla to add support for h/w crc32c on Nehalem CPUs to 
 reduce this overhead, but still not as fast as no checksum at all.

 Cheers, Andreas

 On 20.10.2010 18:15, Andreas Dilger wrote:
 Is this client CPU or server CPU?  If you are using Ethernet it will 
 definitely be CPU hungry and can easily saturate a single core.

 Cheers, Andreas

 On 2010-10-20, at 8:41, Michael Kluge <michael.kl...@tu-dresden.de> wrote:

 Hi list,

 is it normal, that a 'dd' or an 'IOR' pushing 10MB blocks to a lustre
 file system shows up with a 100% CPU load within 'top'? The reason why I
 am asking this is that I can write from one client to one OST with 500
 MB/s. The CPU load will be at 100% in this case. If I stripe over two
 OSTs (which use different OSS servers and different RAID controllers) I
 will get 500 as well (seeing 2x250 MB/s on the OSTs). The CPU load will
 be at 100% again.

 A 'dd' on my desktop pushing 10M blocks to the local disk shows 7-10%
 CPU load.

 Are there ways to tune this behavior? Changing max_rpcs_in_flight and
 max_dirty_mb did not help.


 Regards, Michael

 --

 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



 --
 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih



-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] high CPU load limits bandwidth?

2010-10-20 Thread Michael Kluge
Disabling checksums boosts the performance to 660 MB/s for a single 
thread. Now placing 6 IOR processes on my eight-core box gives, with 
some striping, 1.6 GB/s, which is close to the LNET bandwidth. Thanks a 
lot again!
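
A run along those lines would look roughly like this (a sketch; the directory
name and the exact IOR flag values are only illustrative):

lfs setstripe -c 6 /lustre/iortest
mpirun -np 6 IOR -a POSIX -w -t 10m -b 4g -o /lustre/iortest/testfile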

Michael

On 20.10.2010 19:13, Michael Kluge wrote:
 Using O_DIRECT reduces the CPU load but the magical limit of 500 MB/s
 for one thread remains. Are the CRC sums calculated on a per thread
 base? Or stripe base? Is there a way to test the checksumming speed only?


 Michael

 On 20.10.2010 18:53, Andreas Dilger wrote:
 On 2010-10-20, at 10:40, Michael Kluge <michael.kl...@tu-dresden.de> wrote:
 It is the CPU load on the client. The dd/IOR process is using one core 
 completely. The clients and the servers are connected via DDR IB. LNET 
 bandwidth is at 1.8 GB/s. Servers have 1.8.3, the client has 1.8.3 
 patchless.

 If you only have a single threaded write, then this is somewhat unavoidable 
 to saturate a CPU due to copy_from_user().  O_DIRECT will avoid this.

Also, disabling data checksums and debugging can help considerably. There 
 is a patch in bugzilla to add support for h/w crc32c on Nehalem CPUs to 
 reduce this overhead, but still not as fast as no checksum at all.

 Cheers, Andreas

 On 20.10.2010 18:15, Andreas Dilger wrote:
 Is this client CPU or server CPU?  If you are using Ethernet it will 
 definitely be CPU hungry and can easily saturate a single core.

 Cheers, Andreas

 On 2010-10-20, at 8:41, Michael Kluge <michael.kl...@tu-dresden.de> wrote:

 Hi list,

 is it normal, that a 'dd' or an 'IOR' pushing 10MB blocks to a lustre
 file system shows up with a 100% CPU load within 'top'? The reason why I
 am asking this is that I can write from one client to one OST with 500
 MB/s. The CPU load will be at 100% in this case. If I stripe over two
 OSTs (which use different OSS servers and different RAID controllers) I
 will get 500 as well (seeing 2x250 MB/s on the OSTs). The CPU load will
 be at 100% again.

 A 'dd' on my desktop pushing 10M blocks to the local disk shows 7-10%
 CPU load.

 Are there ways to tune this behavior? Changing max_rpcs_in_flight and
 max_dirty_mb did not help.


 Regards, Michael

 --

 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



 --
 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih





-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ldiskfs performance vs. XFS performance

2010-10-18 Thread Michael Kluge
Hi list,

we have Lustre 1.8.3 running on a DDN 9900. One LUN (10 disks) formatted
with XFS shows 400 MB/s when driven with one 'dd' and large block
sizes. One LUN formatted and mounted with ldiskfs (the ext3-based version
that is the default in 1.8.3) shows 110 MB/s. Is this the expected
behaviour? It looks a bit low compared to XFS.

We think that, with help from DDN, we did everything we could from a
hardware perspective. We formatted the LUN with the correct striping and
stripe size, DDN adjusted some controller parameters and we even put the
file system journal on a RAM disk. The LUN has 16 TB capacity. I formatted
only 7 TB for the moment due to the 8 TB limit. 

This is what I did:

MDS_NID=...@somewhere
RAM_DEV=/dev/ram1
dd if=/dev/zero of=$RAM_DEV bs=1M count=1000
mke2fs -O journal_dev -b 4096 $RAM_DEV

mkfs.lustre --device-size=$((7*1024*1024*1024)) --ost --fsname=luram \
  --mgsnode=$MDS_NID \
  --mkfsoptions="-E stride=32,stripe-width=256 -b 4096 -j -J device=$RAM_DEV" \
  /dev/disk/by-path/...

mount -t ldiskfs /dev/disk/by-path/... /mnt/ost_1

Is there a way to push the bandwidth limit for a single data stream any
further?



Michael


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8/2.6.32 support

2010-10-04 Thread Michael Kluge
Hi,

is there any chance to get a 1.8.4 compiled on a 2.6.32+ kernel right
now with the standard Lustre sources that are available through the
download pages? The 'build your own kernel' wiki page points to a
collection of supported kernels
http://downloads.lustre.org/public/kernels/sles11/
which has a 2.6.32 in it but I could not find a working set of patches
for this. Has anyone been more successful?


Michael

On Monday, 26.04.2010, at 12:11 -0600, Andreas Dilger wrote: 
 On 2010-03-31, at 10:16, Stephen Willey wrote:
  Obviously there is no RH-6.0 just yet (at least not beta or release) and as 
  such 2.6.32 is not on the supported kernels list - obviously fair enough.
  
  There are bugzilla entries with patches for 2.6.32 but these all apply to 
  HEAD as opposed to the b1_8 branch.  Particularly all the stuff that 
  applied against libcfs/blah/blah.m4
  
  I'm trying to build an up-to-date patchless 1.8 client for Fedora 12 
  (2.6.32) and given a few hours to mash patches from HEAD into b1_8, it's 
  doable, albeit hacky (I'm not a programmer) whereas I can compile HEAD 
  almost without modification.
  
  Is it the intention to backport these various changes into b1_8 or is that 
  more or less as-is now until the release of 2.0?  We're in a bit of an 
  awkward place since we can't compile 1.6.7.2 on 2.6.32 and 2.0 is still not 
  in a production state.
 
 There is work going on in bugzilla for b1_8 SLES11 SP1(?) kernel support, 
 which will hopefully also be usable for RHEL6, when it is available.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Lustre Technical Lead
 Oracle Corporation Canada Inc.
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ls does not work on ram disk for normal user

2010-09-22 Thread Michael Kluge
Hi all,

I have a 1.8.3 running on a couple of servers connected via IB to a
small cluster. To test the network performance I have one MDS and 14 OSTs
residing in ram disks. On the client it is mounted on /lustre.

I have a file in this directory (created as root and then chown'ed to
'mkluge'):

mkl...@r2i0n0:~ ls -la /lustre/dfddd/ball
-rw-r--r-- 1 mkluge zih 14680064000 2010-09-22 10:14 /lustre/dfddd/ball
mkl...@r2i0n0:~ cd /lustre/dfddd/
mkl...@r2i0n0:/lustre/dfddd ls
/bin/ls: .: Identifier removed
mkl...@r2i0n0:/lustre/dfddd ls -la
/bin/ls: .: Identifier removed

Does anyone have an idea what this could be? I can't even create a
directory in /lustre:

mkl...@r2i0n0:~ mkdir /lustre/ww
mkdir: cannot create directory `/lustre/ww': Identifier removed

'root' is able to create the directory. Setting permissions to '777' or 
'1777' does not help either.

The MDS was formatted to serve the MGS and the MDT from the same ram device.


Regards, Michael


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ls does not work on ram disk for normal user

2010-09-22 Thread Michael Kluge
Ahh. This user has different UIDs on the clients and the server. Do they 
actually have to be the same? I thought the MDS and the OSS servers just store 
files with the uid/gid as reported by the client; I did not expect that the 
servers need to map these UIDs to a user name.
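
A quick check that would have caught this (run on a client and on the MDS
and compare the numeric uid/gid; 'mkluge' is of course just my user):

id mkluge   # on a client
id mkluge   # on the MDS; must report the same uid/gid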

Michael


On 22.09.2010 at 10:57, Thomas Roth wrote:

 Hi Michael,
 
 Identifier removed occured to me when the user data base was not accessible 
 by
 the MDS - when the MDS didn't know about any normal user. root is of course 
 known
 there, but what does e.g. id mkluge say on your MDS?
 
 Regards,
 Thomas
 
 On 09/22/2010 10:29 AM, Michael Kluge wrote:
 Hi all,
 
 I have a 1.8.3 running on a couple of servers connected via IB to a
 small cluster. To test the network performance I have one MDS and 14 OST
 residing in ram disks. One the client it is mounted on /lustre.
 
 I have a file in this directory (created as root and then chown'ed to
 'mkluge'):
 
 mkl...@r2i0n0:~  ls -la /lustre/dfddd/ball
 -rw-r--r-- 1 mkluge zih 14680064000 2010-09-22 10:14 /lustre/dfddd/ball
 mkl...@r2i0n0:~  cd /lustre/dfddd/
 mkl...@r2i0n0:/lustre/dfddd  ls
 /bin/ls: .: Identifier removed
 mkl...@r2i0n0:/lustre/dfddd  ls -la
 /bin/ls: .: Identifier removed
 
 Has anyone an idea what this could be? I can't event create a directory
 in /lustre
 
 mkl...@r2i0n0:~  mkdir /lustre/ww
 mkdir: cannot create directory `/lustre/ww': Identifier removed
 
 'root' is able to create the directory. Setting permissions to '777' or
 '1777' does not help either.
 
 The MDS was formated to use mgt and mgs from the same ram device.
 
 
 Regards, Michael
 
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 -- 
 
 Thomas Roth
 Department: Informationstechnologie
 Location: SB3 1.262
 Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
 
 GSI Helmholtzzentrum für Schwerionenforschung GmbH
 Planckstraße 1
 64291 Darmstadt
 www.gsi.de
 
 Gesellschaft mit beschränkter Haftung
 Sitz der Gesellschaft: Darmstadt
 Handelsregister: Amtsgericht Darmstadt, HRB 1528
 
 Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
 Dr. Hartmut Eickhoff
 
 Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
 Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
 


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lnet router tuning

2010-09-13 Thread Michael Kluge
Hi Eric,

basically right now I have one IB node, one 10GE node and one router node that 
has both types of network interfaces.

I've got a small lnet test script on the router node, that does the work:
export LST_SESSION=$$
lst new_session rw
lst add_group readers 192.168.1...@tcp
lst add_group writers 10.148.0...@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from writers --to readers brw read check=simple size=1M
lst run bulk_rw
lst stat writers & sleep 30; kill $!
lst end_session

Is there a way to figure out the number of messages in flight? I remember 
there is an 'RPCs in flight' tunable, but that belongs to the OSC layer, 
which does not do anything in my case (I think).


Michael



On 13.09.2010 at 03:08, Eric Barton wrote:

  
 Michael,
  
  
 How are you generating load and measuring the throughput?   I’m particularly 
 interested in the number
 of nodes on each side of the router and how many messages you have in flight 
 between each one.
  
  
 Cheers,
Eric
  
  
  
  
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Michael Kluge
 Sent: 11 September 2010 12:56 AM
 To: Michael Kluge
 Cc: Lustre Diskussionsliste
 Subject: Re: [Lustre-discuss] lnet router tuning
  
 And here are my params:
  
 r...@doss05:/home/tests/lnet# for F in /sys/module/lnet/parameters/* ; do 
 echo -n $F: ; cat $F ; done
 /sys/module/lnet/parameters/accept: secure
 /sys/module/lnet/parameters/accept_backlog: 127
 /sys/module/lnet/parameters/accept_port: 988
 /sys/module/lnet/parameters/accept_timeout: 5
 /sys/module/lnet/parameters/auto_down: 1
 /sys/module/lnet/parameters/avoid_asym_router_failure: 0
 /sys/module/lnet/parameters/check_routers_before_use: 0
 /sys/module/lnet/parameters/config_on_load: 0
 /sys/module/lnet/parameters/dead_router_check_interval: 0
 /sys/module/lnet/parameters/forwarding: enabled
 /sys/module/lnet/parameters/ip2nets: 
 /sys/module/lnet/parameters/large_router_buffers: 512
 /sys/module/lnet/parameters/live_router_check_interval: 0
 /sys/module/lnet/parameters/local_nid_dist_zero: 1
 /sys/module/lnet/parameters/networks: tcp0(eth2),o2ib(ib1)
 /sys/module/lnet/parameters/peer_buffer_credits: 0
 /sys/module/lnet/parameters/portals_compatibility: none
 /sys/module/lnet/parameters/router_ping_timeout: 50
 /sys/module/lnet/parameters/routes: 
 /sys/module/lnet/parameters/small_router_buffers: 8192
 /sys/module/lnet/parameters/tiny_router_buffers: 1024
  
 I have not used ip2nets but configure routing but put explict routing 
 statements into the modprobe.d/ files. Is that OK? 
  
  
 Michael
  
  
 On 10.09.2010 at 17:48, Michael Kluge wrote:
 
 
 OK, IB back to back is at 1,2 GB/s, 10GE back to back at 950 MB/s, with 
 additional lnet router I see 550 MB/s. Time for lnet tuning?
  
 Michael
 
 
 Hi Andreas,
  
 On 10.09.2010 at 16:35, Andreas Dilger wrote:
 
 
 On 2010-09-10, at 08:23, Michael Kluge wrote:
 
 I have a Lustre 1.8.3 setup where I'd like to some lnet router performance 
 tests with routing between DDR IB-10GE networks. Currently I have three 
 nodes, one with DDR IB, one with 10GE and one with both that does the 
 routing. A first short lnet test shows 520-550 MB/s performance.
  
 Has anyone an idea which of the variables of the lnet module are worth 
 playing with to get this number a bit closer to 1GB/s?
 
 I would start by testing the performance on just the 10GigE side, and then 
 separately on the IB side, to verify you are getting the expected performance 
 from the components before trying them both together.  Often it is necessary 
 to tune the ethernet send/receive buffers.
  
 Ethernet back to back is at 950 MB/s. I have not looked at IB back to back 
 yet.
  
  
 Michael
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
  
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
  
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
  
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
  
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49

Re: [Lustre-discuss] lnet router tuning

2010-09-13 Thread Michael Kluge
Hi Eric,

--concurrency 2 already boosted the performance to 1026 MB/s. I don't think 
we'll get any more out of this :)
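
For the archive: the change boils down to one flag in the add_test line of
the script I posted earlier (sketch, with the value from this test):

lst add_test --batch bulk_rw --concurrency 2 --from writers --to readers brw read check=simple size=1M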


Thanks a lot, Michael

On 13.09.2010 at 07:55, Eric Barton wrote:

 Michael,
  
 I think you may have only got 1 BRW READ in flight at a time with this script,
 so I would expect the routed throughput to be getting on for half of direct
 throughput.  Can you try “--concurrency 8” to simulate the number of I/Os
 a real client would keep in flight?
  
 Cheers,
Eric
  
  From: Michael Kluge [mailto:michael.kl...@tu-dresden.de] 
 Sent: 13 September 2010 10:35 PM
 To: Eric Barton
 Cc: 'Lustre Diskussionsliste'
 Subject: Re: [Lustre-discuss] lnet router tuning
  
 Hi Eric,
  
 basically right now I have one IB node, one 10GE node and one router node 
 that has both types of network interfaces.
  
 I've got a small lnet test script on the router node, that does the work:
 export LST_SESSION=$$
 lst new_session rw
 lst add_group readers 192.168.1...@tcp
 lst add_group writers 10.148.0...@o2ib
 lst add_batch bulk_rw
 lst add_test --batch bulk_rw --from writers --to readers brw read 
 check=simple size=1M
 lst run bulk_rw
 lst stat writers  sleep 30; kill $!
 lst end_session
  
 Is there a way to figure out the messages in flight? I remember to have a 
 rpc's in flight tunable but this is connected to the OSC layer which does 
 not do anything in my case (I think).
  
  
 Michael
  
  
  
 On 13.09.2010 at 03:08, Eric Barton wrote:
 
 
  
 Michael,
  
  
 How are you generating load and measuring the throughput?   I’m particularly 
 interested in the number
 of nodes on each side of the router and how many messages you have in flight 
 between each one.
  
  
 Cheers,
Eric
  
  
  
  
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Michael Kluge
 Sent: 11 September 2010 12:56 AM
 To: Michael Kluge
 Cc: Lustre Diskussionsliste
 Subject: Re: [Lustre-discuss] lnet router tuning
  
 And here are my params:
  
 r...@doss05:/home/tests/lnet# for F in /sys/module/lnet/parameters/* ; do 
 echo -n $F: ; cat $F ; done
 /sys/module/lnet/parameters/accept: secure
 /sys/module/lnet/parameters/accept_backlog: 127
 /sys/module/lnet/parameters/accept_port: 988
 /sys/module/lnet/parameters/accept_timeout: 5
 /sys/module/lnet/parameters/auto_down: 1
 /sys/module/lnet/parameters/avoid_asym_router_failure: 0
 /sys/module/lnet/parameters/check_routers_before_use: 0
 /sys/module/lnet/parameters/config_on_load: 0
 /sys/module/lnet/parameters/dead_router_check_interval: 0
 /sys/module/lnet/parameters/forwarding: enabled
 /sys/module/lnet/parameters/ip2nets: 
 /sys/module/lnet/parameters/large_router_buffers: 512
 /sys/module/lnet/parameters/live_router_check_interval: 0
 /sys/module/lnet/parameters/local_nid_dist_zero: 1
 /sys/module/lnet/parameters/networks: tcp0(eth2),o2ib(ib1)
 /sys/module/lnet/parameters/peer_buffer_credits: 0
 /sys/module/lnet/parameters/portals_compatibility: none
 /sys/module/lnet/parameters/router_ping_timeout: 50
 /sys/module/lnet/parameters/routes: 
 /sys/module/lnet/parameters/small_router_buffers: 8192
 /sys/module/lnet/parameters/tiny_router_buffers: 1024
  
 I have not used ip2nets but configure routing but put explict routing 
 statements into the modprobe.d/ files. Is that OK? 
  
  
 Michael
  
  
 On 10.09.2010 at 17:48, Michael Kluge wrote:
 
 
 
 OK, IB back to back is at 1,2 GB/s, 10GE back to back at 950 MB/s, with 
 additional lnet router I see 550 MB/s. Time for lnet tuning?
  
 Michael
 
 
 
 Hi Andreas,
  
 On 10.09.2010 at 16:35, Andreas Dilger wrote:
 
 
 
 On 2010-09-10, at 08:23, Michael Kluge wrote:
 
 
 I have a Lustre 1.8.3 setup where I'd like to some lnet router performance 
 tests with routing between DDR IB-10GE networks. Currently I have three 
 nodes, one with DDR IB, one with 10GE and one with both that does the 
 routing. A first short lnet test shows 520-550 MB/s performance.
  
 Has anyone an idea which of the variables of the lnet module are worth 
 playing with to get this number a bit closer to 1GB/s?
 
 I would start by testing the performance on just the 10GigE side, and then 
 separately on the IB side, to verify you are getting the expected performance 
 from the components before trying them both together.  Often it is necessary 
 to tune the ethernet send/receive buffers.
  
 Ethernet back to back is at 950 MB/s. I have not looked at IB back to back 
 yet.
  
  
 Michael
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
  
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman

[Lustre-discuss] lnet router tuning

2010-09-10 Thread Michael Kluge
Hi all,

I have a Lustre 1.8.3 setup where I'd like to do some lnet router
performance tests with routing between DDR IB and 10GE networks. Currently
I have three nodes, one with DDR IB, one with 10GE and one with both
that does the routing. A first short lnet test shows 520-550 MB/s.

Does anyone have an idea which of the variables of the lnet module are
worth playing with to get this number a bit closer to 1 GB/s?

parm:   tiny_router_buffers:# of 0 payload messages to buffer in the router (int)
parm:   small_router_buffers:# of small (1 page) messages to buffer in the router (int)
parm:   large_router_buffers:# of large messages to buffer in the router (int)
parm:   peer_buffer_credits:# router buffer credits per peer (int)

The CPU on the router node is less utilized than it was when I did
back-to-back 10GE tests. I have 6 cores in the machine; 5 were idle and
one showed a load of about 60%.
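
In case it helps others: those parameters go into the lnet module options on
the router, e.g. like this (a sketch; the values are purely illustrative and
an lnet reload is needed for them to take effect):

options lnet networks="tcp0(eth2),o2ib(ib1)" forwarding=enabled
options lnet tiny_router_buffers=2048 small_router_buffers=16384 large_router_buffers=1024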


Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lnet router tuning

2010-09-10 Thread Michael Kluge
Hi Andreas,

On 10.09.2010 at 16:35, Andreas Dilger wrote:

 On 2010-09-10, at 08:23, Michael Kluge wrote:
 I have a Lustre 1.8.3 setup where I'd like to some lnet router performance 
 tests with routing between DDR IB-10GE networks. Currently I have three 
 nodes, one with DDR IB, one with 10GE and one with both that does the 
 routing. A first short lnet test shows 520-550 MB/s performance. 
 
 Has anyone an idea which of the variables of the lnet module are worth 
 playing with to get this number a bit closer to 1GB/s? 
 
 I would start by testing the performance on just the 10GigE side, and then 
 separately on the IB side, to verify you are getting the expected performance 
 from the components before trying them both together.  Often it is necessary 
 to tune the ethernet send/receive buffers.

Ethernet back to back is at 950 MB/s. I have not looked at IB back to back yet.


Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lnet router tuning

2010-09-10 Thread Michael Kluge
OK, IB back to back is at 1.2 GB/s, 10GE back to back at 950 MB/s; with 
the additional lnet router I see 550 MB/s. Time for lnet tuning?

Michael

 Hi Andreas,
 
 On 10.09.2010 at 16:35, Andreas Dilger wrote:
 
 On 2010-09-10, at 08:23, Michael Kluge wrote:
 I have a Lustre 1.8.3 setup where I'd like to some lnet router performance 
 tests with routing between DDR IB-10GE networks. Currently I have three 
 nodes, one with DDR IB, one with 10GE and one with both that does the 
 routing. A first short lnet test shows 520-550 MB/s performance. 
 
 Has anyone an idea which of the variables of the lnet module are worth 
 playing with to get this number a bit closer to 1GB/s? 
 
 I would start by testing the performance on just the 10GigE side, and then 
 separately on the IB side, to verify you are getting the expected 
 performance from the components before trying them both together.  Often it 
 is necessary to tune the ethernet send/receive buffers.
 
 Ethernet back to back is at 950 MB/s. I have not looked at IB back to back 
 yet.
 
 
 Michael
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lnet router tuning

2010-09-10 Thread Michael Kluge
And here are my params:

r...@doss05:/home/tests/lnet# for F in /sys/module/lnet/parameters/* ; do echo 
-n $F: ; cat $F ; done
/sys/module/lnet/parameters/accept: secure
/sys/module/lnet/parameters/accept_backlog: 127
/sys/module/lnet/parameters/accept_port: 988
/sys/module/lnet/parameters/accept_timeout: 5
/sys/module/lnet/parameters/auto_down: 1
/sys/module/lnet/parameters/avoid_asym_router_failure: 0
/sys/module/lnet/parameters/check_routers_before_use: 0
/sys/module/lnet/parameters/config_on_load: 0
/sys/module/lnet/parameters/dead_router_check_interval: 0
/sys/module/lnet/parameters/forwarding: enabled
/sys/module/lnet/parameters/ip2nets: 
/sys/module/lnet/parameters/large_router_buffers: 512
/sys/module/lnet/parameters/live_router_check_interval: 0
/sys/module/lnet/parameters/local_nid_dist_zero: 1
/sys/module/lnet/parameters/networks: tcp0(eth2),o2ib(ib1)
/sys/module/lnet/parameters/peer_buffer_credits: 0
/sys/module/lnet/parameters/portals_compatibility: none
/sys/module/lnet/parameters/router_ping_timeout: 50
/sys/module/lnet/parameters/routes: 
/sys/module/lnet/parameters/small_router_buffers: 8192
/sys/module/lnet/parameters/tiny_router_buffers: 1024

I have not used ip2nets to configure the routing, but put explicit routing 
statements into the modprobe.d/ files instead. Is that OK? 
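
Out of curiosity, the ip2nets variant would presumably look something like
this (a sketch, untested; the address patterns are only illustrative):

options lnet ip2nets="tcp0(eth2) 192.168.1.*; o2ib(ib1) 10.148.0.*"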


Michael


On 10.09.2010 at 17:48, Michael Kluge wrote:

 OK, IB back to back is at 1,2 GB/s, 10GE back to back at 950 MB/s, with 
 additional lnet router I see 550 MB/s. Time for lnet tuning?
 
 Michael
 
 Hi Andreas,
 
 On 10.09.2010 at 16:35, Andreas Dilger wrote:
 
 On 2010-09-10, at 08:23, Michael Kluge wrote:
 I have a Lustre 1.8.3 setup where I'd like to some lnet router performance 
 tests with routing between DDR IB-10GE networks. Currently I have three 
 nodes, one with DDR IB, one with 10GE and one with both that does the 
 routing. A first short lnet test shows 520-550 MB/s performance. 
 
 Has anyone an idea which of the variables of the lnet module are worth 
 playing with to get this number a bit closer to 1GB/s? 
 
 I would start by testing the performance on just the 10GigE side, and then 
 separately on the IB side, to verify you are getting the expected 
 performance from the components before trying them both together.  Often it 
 is necessary to tune the ethernet send/receive buffers.
 
 Ethernet back to back is at 950 MB/s. I have not looked at IB back to back 
 yet.
 
 
 Michael
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lnet router tuning

2010-09-10 Thread Michael Kluge
Does anyone else have a 10GE-IB Lustre router? What are the typical 
performance numbers? How close do you get to 1 GB/s?

Michael


On 10.09.2010 17:55, Michael Kluge wrote:
 And here are my params:

 r...@doss05:/home/tests/lnet# for F in /sys/module/lnet/parameters/* ;
 do echo -n $F: ; cat $F ; done
 /sys/module/lnet/parameters/accept: secure
 /sys/module/lnet/parameters/accept_backlog: 127
 /sys/module/lnet/parameters/accept_port: 988
 /sys/module/lnet/parameters/accept_timeout: 5
 /sys/module/lnet/parameters/auto_down: 1
 /sys/module/lnet/parameters/avoid_asym_router_failure: 0
 /sys/module/lnet/parameters/check_routers_before_use: 0
 /sys/module/lnet/parameters/config_on_load: 0
 /sys/module/lnet/parameters/dead_router_check_interval: 0
 /sys/module/lnet/parameters/forwarding: enabled
 /sys/module/lnet/parameters/ip2nets:
 /sys/module/lnet/parameters/large_router_buffers: 512
 /sys/module/lnet/parameters/live_router_check_interval: 0
 /sys/module/lnet/parameters/local_nid_dist_zero: 1
 /sys/module/lnet/parameters/networks: tcp0(eth2),o2ib(ib1)
 /sys/module/lnet/parameters/peer_buffer_credits: 0
 /sys/module/lnet/parameters/portals_compatibility: none
 /sys/module/lnet/parameters/router_ping_timeout: 50
 /sys/module/lnet/parameters/routes:
 /sys/module/lnet/parameters/small_router_buffers: 8192
 /sys/module/lnet/parameters/tiny_router_buffers: 1024

 I have not used ip2nets but configure routing but put explict routing
 statements into the modprobe.d/ files. Is that OK?


 Michael


 On 10.09.2010 at 17:48, Michael Kluge wrote:

 OK, IB back to back is at 1,2 GB/s, 10GE back to back at 950 MB/s,
 with additional lnet router I see 550 MB/s. Time for lnet tuning?

 Michael

 Hi Andreas,

 On 10.09.2010 at 16:35, Andreas Dilger wrote:

 On 2010-09-10, at 08:23, Michael Kluge wrote:
 I have a Lustre 1.8.3 setup where I'd like to some lnet router
 performance tests with routing between DDR IB-10GE networks.
 Currently I have three nodes, one with DDR IB, one with 10GE and
 one with both that does the routing. A first short lnet test shows
 520-550 MB/s performance.

 Has anyone an idea which of the variables of the lnet module are
 worth playing with to get this number a bit closer to 1GB/s?

 I would start by testing the performance on just the 10GigE side,
 and then separately on the IB side, to verify you are getting the
 expected performance from the components before trying them both
 together. Often it is necessary to tune the ethernet send/receive
 buffers.

 Ethernet back to back is at 950 MB/s. I have not looked at IB back to
 back yet.


 Michael

 --

 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone: (+49) 351 463-34217
 Fax: (+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de mailto:michael.kl...@tu-dresden.de
 WWW: http://www.tu-dresden.de/zih

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org mailto:Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


 --

 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone: (+49) 351 463-34217
 Fax: (+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de mailto:michael.kl...@tu-dresden.de
 WWW: http://www.tu-dresden.de/zih

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org mailto:Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


 --

 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone: (+49) 351 463-34217
 Fax: (+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de mailto:michael.kl...@tu-dresden.de
 WWW: http://www.tu-dresden.de/zih



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] O_DIRECT

2010-08-14 Thread Michael Kluge
Hi all,

how does Lustre handle write() requests to files opened with O_DIRECT? 
Does the OSS enforce that the OST has physically written the data to disk 
before the operation completes, or does the write() call return on the 
client before this? I do not see the whole file content passing through 
the FC port of the RAID controller, but it could also be that my 
measurement is wrong ...
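
The measurement itself is just a dd from a client; with GNU dd, oflag=direct
opens the target with O_DIRECT (a sketch; the file name is only illustrative):

dd if=/dev/zero of=/lustre/testfile bs=1M count=1000 oflag=direct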


Michael


-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Complete lnet routing example

2010-06-24 Thread Michael Kluge
Hi there,

does anyone have a complete lnet routing example that he/she wants to share, 
containing a network diagram and all modprobe.conf options for clients, 
servers and routers? I found only one mail in the mailing list archive, but 
the interesting parts have gone through a filter, and now a lot of the 
configuration options read '[EMAIL PROTECTED]'.


Thanks a lot in advance,
Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Complete lnet routing example

2010-06-24 Thread Michael Kluge
Hi Josh,

thanks a lot!


Michael

On 24.06.2010 at 15:40, Joshua Walgenbach wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hi Michael,
 
 This is what I'm using on my test systems:
 
 I have the servers set up on 192.168.1.0/24 and clients set up on
 192.168.2.0/24, with no network routing between them and a lustre router
 bridging the two networks with ip addresses of 192.168.1.31 and
 192.168.2.31. I've a attached a quick diagram.
 
 modprobe.conf for MDS and OSS servers:
 
 options lnet networks="tcp0(eth2)" routes="tcp1 192.168.1...@tcp0"
 
 modprobe.conf for router:
 
 options lnet networks="tcp0(eth2),tcp1(eth3)" forwarding=enabled
 
 modprobe.conf for clients:
 
 options lnet networks="tcp1(eth2)" routes="tcp0 192.168.2...@tcp1"
 
 What I have is pretty minimal, but it gets the job done.
 
 - -Josh
 
 On 06/24/2010 06:15 AM, Michael Kluge wrote:
 Hi there,
 
 does anyone have a complete lnet routing example that he/she wants to
 share that contains a network diagram and all modprobe.conf options for
 clients, servers and the routers? I found only one mail in the mailing
 list and the interesting parts have gone through a filter and now a lot
 of the configuration options are '[EMAIL PROTECTED]'.
 
 
 Thanks a lot in advance,
 Michael
 
 -- 
 
 Michael Kluge, M.Sc.
 
 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany
 
 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:(+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de mailto:michael.kl...@tu-dresden.de
 WWW:http://www.tu-dresden.de/zih
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.10 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
 iEYEARECAAYFAkwjYEIACgkQcqyJPuRTYp9tTACeIGttWBu44dc4SKB/0IIjHhF9
 i3QAn17sBD38/3MdsYuiGcUOruZVS8j/
 =SLQp
 -END PGP SIGNATURE-
 lustre_routing.png
 lustre_routing.png.sig
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS overload, why?

2009-10-09 Thread Michael Kluge
Hi Arne,

Could be memory pressure and the OOM killer running and shooting at things.
How much memory does your server have?


Michael

On Friday, 09.10.2009, at 10:26 +0200, Arne Brutschy wrote:
 Hi everyone,
 
 2 months ago, we switched our ~80 node cluster from NFS to lustre. 1
 MDS, 4 OSTs, lustre 1.6.7.2 on a rocks 4.2.1/centos 4.2/linux
 2.6.9-78.0.22.
 
 We were quite happy with lustre's performance, especially because
 bottlenecks caused by /home disk access were history.
 
 Saturday, the cluster went down (= was inaccessible). After some
 investigation I found out that the reason seems to be an overloaded MDS.
 Over the following 4 days, this happened multiple times and could only
 be resolved by 1) killing all user jobs and 2) hard-resetting the MDS.
 
 The MDS did not respond to any command, if I managed to get a video
 signal (not often), load was 170. Additionally, 2 times kernel oops got
 displayed, but unfortunately I have to record of them.
 
 The clients showed the following error:
  Oct  8 09:58:55 majorana kernel: LustreError: 
  3787:0:(events.c:66:request_out_callback()) @@@ type 4, status -5  
  r...@f6222800 x8702488/t0 o250-m...@10.255.255.206@tcp:26/25 lens 304/456 
  e 0 to 1 dl 1254988740 ref 2 fl Rpc:N/0/0 rc 0/0
  Oct  8 09:58:55 majorana kernel: LustreError: 
  3787:0:(events.c:66:request_out_callback()) Skipped 33 previous similar 
  messages
 
 So, my question is: what could cause such a load? The cluster was not
 exessively used... Is this a bug or a user's job that creates the load?
 How can I protect lustre against this kind of failure?
 
 Thanks in advance,
 Arne 
 
-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS overload, why?

2009-10-09 Thread Michael Kluge
Hmm. That should be enough. I guess you need to set up a loghost for syslog
then, and a reliable serial console, to get stack traces. Everything else
would be just a wild guess (as the question about the RAM size was).
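
The setup itself is standard syslog plus a serial console, e.g. (a sketch;
the loghost name and the serial port are only illustrative):

# /etc/syslog.conf on the MDS
kern.*  @loghost

# kernel command line in grub.conf
console=ttyS0,115200 console=tty0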

Michael

 Hi,
 
 8GB of ram, 2x 4core Intel Xeon E5410 @ 2.33GHz
 
 Arne
 
 On Fri, 2009-10-09 at 12:16 +0200, Michael Kluge wrote:
  Hi Arne,
  
  could be memory pressure and the OOM running and shooting at things. How
  much memory does you server has?
  
  
  Michael
  
   On Friday, 09.10.2009, at 10:26 +0200, Arne Brutschy wrote:
   Hi everyone,
   
   2 months ago, we switched our ~80 node cluster from NFS to lustre. 1
   MDS, 4 OSTs, lustre 1.6.7.2 on a rocks 4.2.1/centos 4.2/linux
   2.6.9-78.0.22.
   
   We were quite happy with lustre's performance, especially because
   bottlenecks caused by /home disk access were history.
   
   Saturday, the cluster went down (= was inaccessible). After some
   investigation I found out that the reason seems to be an overloaded MDS.
   Over the following 4 days, this happened multiple times and could only
   be resolved by 1) killing all user jobs and 2) hard-resetting the MDS.
   
   The MDS did not respond to any command, if I managed to get a video
   signal (not often), load was 170. Additionally, 2 times kernel oops got
   displayed, but unfortunately I have to record of them.
   
   The clients showed the following error:
Oct  8 09:58:55 majorana kernel: LustreError: 
3787:0:(events.c:66:request_out_callback()) @@@ type 4, status -5  
r...@f6222800 x8702488/t0 o250-m...@10.255.255.206@tcp:26/25 lens 
304/456 e 0 to 1 dl 1254988740 ref 2 fl Rpc:N/0/0 rc 0/0
Oct  8 09:58:55 majorana kernel: LustreError: 
3787:0:(events.c:66:request_out_callback()) Skipped 33 previous similar 
messages
   
   So, my question is: what could cause such a load? The cluster was not
   exessively used... Is this a bug or a user's job that creates the load?
   How can I protect lustre against this kind of failure?
   
   Thanks in advance,
   Arne 
   
-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS overload, why?

2009-10-09 Thread Michael Kluge
LMT (http://code.google.com/p/lmt) might be able to give some hints on
whether users are using the FS in a 'wild' fashion. As for the question
'what can cause this behaviour of my MDS', I guess the answer is: a million
things ;) There is no way of being more specific without more input about
the problem itself.

Michael

On Friday, 09.10.2009, at 16:15 +0200, Arne Brutschy wrote:
 Hi,
 
 thanks for replying!
 
 I understand that without further information we can't do much about the
 oopses. I was more hoping for some information regarding possible
 sources of such an overload. Is it normal that a MDS gets overloaded
 like this, while the OSTs have nothing to do, and what can I do about
 it? How can I find the source of the problem?
 
 More specifically, what are the operations that lead to a lot of MDS
 load and none for the OSTs? Although our MDS (8GB ram, 2x4core, SATA) is
 not a top-notch server, it's fairly recent and I feel the load we're
 experiencing is not handable by a single MDS.
 
 My problem is that I can't make out major problems in the user's jobs
 running on the cluster, and I can't quantify nor track down the problem
 because I don't know what behavior might have caused it. 
 
 As I said, ooppses appeared only twice, and all other problems where
 just apparent by a non-responsive MDS.
 
 Thanks,
 Arne
 
 
  On Fri, 2009-10-09 at 07:44 -0400, Brian J. Murrell wrote:
  On Fri, 2009-10-09 at 10:26 +0200, Arne Brutschy wrote:
   
   The clients showed the following error:
Oct  8 09:58:55 majorana kernel: LustreError: 
3787:0:(events.c:66:request_out_callback()) @@@ type 4, status -5  
r...@f6222800 x8702488/t0 o250-m...@10.255.255.206@tcp:26/25 lens 
304/456 e 0 to 1 dl 1254988740 ref 2 fl Rpc:N/0/0 rc 0/0
Oct  8 09:58:55 majorana kernel: LustreError: 
3787:0:(events.c:66:request_out_callback()) Skipped 33 previous similar 
messages
   
   So, my question is: what could cause such a load? The cluster was not
   exessively used... Is this a bug or a user's job that creates the load?
   How can I protect lustre against this kind of failure?
  
  Without any more information we could not possibly know.  If you really
  are getting oopses then you will need console logs (i.e. serial console)
  so that we can see the stack trace.
  
  b.
  
  ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss
-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Read/Write performance problem

2009-10-07 Thread Michael Kluge
On Tuesday, 06.10.2009, at 09:33 -0600, Andreas Dilger wrote:
  ... bla bla ...
  Is there a reason why an immediate read after a write on the same node
  from/to a shared file is slow? Is there any additional communication,
  e.g. is the client flushing the buffer cache before the first read? The
  statistics show that the average time to complete a 1.44MB read request
  is increasing during the runtime of our program. At some point it hits
  an upper limit or a saturation point and stays there. Is there some kind
  of queue or something that is getting full in this kind of
  write/read-scenario? May tuneable some stuff in /proc/fs/luste?
 
 One possible issue is that you don't have enough extra RAM to cache 1.5GB
 of the checkpoint, so during the write it is being flushed to the OSTs
 and evicted from cache.  When you immediately restart there is still dirty
 data being written from the clients that is contending with the reads to
 restart.
 Cheers, Andreas

Well, I do call fsync() after the write is finished. During the write
process I see a constant stream of 4 GB/s running from the lustre
servers to the raid controllers which finishes when the write process
terminates. When I start reading, there are no more writes going this
way, so I suspect it might be something else ... Even if I wait 5 minutes
between the writes and the reads (all dirty pages should have been flushed
by then), the picture does not change.
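
To take the client cache out of the picture completely, one could also drop
it explicitly between the write and the read phase (a sketch; clearing the
lock LRU flushes the client's cached pages along with the locks, and
drop_caches needs a kernel that supports it):

sync
echo 3 > /proc/sys/vm/drop_caches
lctl set_param ldlm.namespaces.*.lru_size=clear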


Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Read/Write performance problem

2009-10-06 Thread Michael Kluge
Hi all,

our Lustre FS shows an interesting performance problem which I'd like to
discuss as some of you might have seen this kind of things before and
maybe someone has a quick explanation of what's going on.

We are running Lustre 1.6.5.1. The problem shows up when we read a
shared file from multiple nodes that has just been written from the same
set of nodes. 512 processes write a checkpoint (1.5 GB from each node)
into a shared file by seeking to position RANK*1.5GB and writing 1.5GB
in 1.44M chunks. Writing works fine and gives the full file system
performance. The data is written using write() with no flags aside from
O_CREAT and O_WRONLY. Once the checkpoint is written, the program is
terminated and restarted and reads in the same portion of the file. For
some reason this almost immediate reading of the same data that was just
written on the same node is very slow. If we a) change the set of nodes
or b) wait a day, we get the full read performance when we use the same
executable and the same shared file. 

Is there a reason why an immediate read after a write on the same node
from/to a shared file is slow? Is there any additional communication,
e.g. is the client flushing the buffer cache before the first read? The
statistics show that the average time to complete a 1.44MB read request
is increasing during the runtime of our program. At some point it hits
an upper limit or a saturation point and stays there. Is there some kind
of queue or something that is getting full in this kind of write/read
scenario? Maybe there is something tunable in /proc/fs/lustre?


Regards, Michael


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss