Re: [Lustre-discuss] Read/Write performance problem

2009-10-07 Thread Michael Kluge
On Tuesday, 2009-10-06 at 09:33 -0600, Andreas Dilger wrote:
  ... bla bla ...
  Is there a reason why an immediate read after a write on the same node
  from/to a shared file is slow? Is there any additional communication,
  e.g. is the client flushing the buffer cache before the first read? The
  statistics show that the average time to complete a 1.44 MB read request
  increases during the runtime of our program. At some point it hits
  an upper limit or a saturation point and stays there. Is there some kind
  of queue or something that is getting full in this kind of
  write/read scenario? Maybe something can be tuned in /proc/fs/lustre?
 
 One possible issue is that you don't have enough extra RAM to cache 1.5GB
 of the checkpoint, so during the write it is being flushed to the OSTs
 and evicted from cache.  When you immediately restart, there is still dirty
 data being written from the clients, and it contends with the reads needed
 to restart.
 Cheers, Andreas

Well, I do call fsync() after the write has finished. During the write
process I see a constant stream of 4 GB/s running from the Lustre
servers to the RAID controllers, which stops when the write process
terminates. When I start reading, there are no more writes going this
way, so I suspect it might be something else ... Even if I wait 5 minutes
between the writes and the reads (all dirty pages should have been flushed
by then), the picture does not change.
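
A few client-side figures that might help to confirm or rule out the cache
theory (a sketch, assuming a 1.8-style client with the usual /proc/fs/lustre
interface; parameter names may differ on other versions):

  # dirty data each OSC is still holding on this client
  lctl get_param osc.*.cur_dirty_bytes
  # client page-cache limit and read-ahead behaviour
  lctl get_param llite.*.max_cached_mb
  lctl get_param llite.*.read_ahead_stats
  # drop the client page cache between the write and the read phase
  echo 3 > /proc/sys/vm/drop_caches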


Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih




Re: [Lustre-discuss] strange performance with POSIX file capabilities

2009-10-07 Thread Ralf Utermann
Andreas Dilger wrote:
 On Oct 06, 2009  15:13 +0200, Ralf Utermann wrote:
 with newer vanilla kernels we saw strange performance
 data with iozone on patchless clients: some OSTs had a lower write
 bandwidth in the iozone benchmark, getting worse with record
 sizes below 1024.
 After lots of kernel builds, it looks like the kernel config entry
 CONFIG_SECURITY_FILE_CAPABILITIES is the one which
 introduces this problem. If CONFIG_SECURITY_FILE_CAPABILITIES
 is not set, the iozone data look good; if it is compiled into the
 kernel, we see the problem:
 http://www.physik.uni-augsburg.de/~ralfu/LustreTest/Lustre_with_file_caps.html
 
 Just to clarify, you are reporting the above config option affects
 write performance when changed on the client, correct?  It appears

Hi Andreas,

Yes, this option has only been used on the client side. The servers
are running a 2.6.22 kernel and it looks like this option has been
introduced with 2.6.24.

 that this option is off by default in the upstream kernels, so I
 suspect it doesn't get tested much.

This option is switched on by default in the Debian kernels, and that's the config
I usually start with. I think recent Fedora kernels would also have this set,
and so would RHEL6.

 
 Any idea, why file capabilities should affect the write
 performance on Lustre, and why it should only affect some OSTs?
 
 I can imagine that if this is adding some significant overhead on a
 per-system-call basis that it would hurt performance.
 
 It is definitely odd that it would affect the performance of only some
 of the OSTs.  I assume they are otherwise identical?  The only thing

the OSTs are either 4 or 8 data disks on Sun 6140 systems; the 4 with
problems are on 2 OSS, the 3 without problems are on the other 2 OSS.

 I can imagine is that this option is related to SELinux and has some
 overhead in getting extended attributes, but even then the xattrs are
 only stored on the MDS so this would hurt all OSTs uniformly.

As I don't need this option anyway, I will just build my kernels
with it switched off from now on. Of course an unpleasant feeling remains,
not knowing what is really happening ...

As of vanilla kernel 2.6.29 there should be a no_file_caps kernel boot
parameter. I would like to test this setup, but b1_8 only builds fine with
vanilla 2.6.28; I cannot get it running with vanilla 2.6.[29|30] -- but
that should be a different thread ...
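
For reference, a quick way to check how a given client kernel is configured,
and to try the boot-time switch instead of rebuilding (a sketch; the config
file location and the boot loader of course depend on the distribution):

  # is file-capability support compiled into the running kernel?
  grep CONFIG_SECURITY_FILE_CAPABILITIES /boot/config-$(uname -r)
  # on 2.6.29+ kernels the feature can be disabled at boot time instead,
  # e.g. by appending no_file_caps to the kernel line in the boot loader
  cat /proc/cmdline   # verify after reboot that the parameter was picked up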

Bye, Ralf
-- 
Ralf Utermann
_
Universität Augsburg, Institut für Physik   --   EDV-Betreuer
Universitätsstr.1 
D-86135 Augsburg Phone:  +49-821-598-3231
SMTP: ralf.uterm...@physik.uni-augsburg.de Fax: -3411


[Lustre-discuss] Measure disk IO for Lustre client

2009-10-07 Thread Hendra Sumilo
Hi all,

I am new to Lustre. I recently ran some experiments on a cluster
with a Lustre backbone, and I want to measure the disk I/O during the program
execution. The problem with tools like iostat is that they only measure the
read/write operations on the local disk. Is there a tool for the Lustre client
with which I can measure the disk I/O not just on my compute node but also on
the other machines, i.e. how much it reads from and writes to all the other
disks during the execution time?
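
One possible starting point is the statistics the Lustre client itself exports
(a sketch, assuming a 1.8-style client; parameter names can vary between
versions, and the llstat sampling interval below is just an example):

  # aggregate I/O statistics as seen by this client
  lctl get_param llite.*.stats
  # traffic from this client broken down per OST
  lctl get_param osc.*.stats
  # llstat (shipped with Lustre) can sample such a stats file periodically
  llstat -i 5 /proc/fs/lustre/llite/<fsname>-<instance>/stats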

Thanks



Regards,
Hendra


[Lustre-discuss] replication in Lustre 2.0

2009-10-07 Thread anthony garnier

Hi,

I saw that the developers included data replication in version 2.0. Is
this replication only used for recovery in case of a hard disk failure,
or can the clients make use of it?

Thx.

Anthony Garnier 
DTI/DPV/DCPS/RSH/ACCE 
PSA Peugeot Citroën 
IT center, 90160 Bessoncourt, France  
  


[Lustre-discuss] OST retirement

2009-10-07 Thread Arne Wiebalck

Dear list,

is there a way to tell the difference between having deactivated
an OST filesystem-wide with

lctl --device 15 conf_param pps-OST000a.osc.active=0

  or having deactivated it only locally with

lctl set_param osc.pps-OST000a-osc.active=0?

And: after deactivation I still see the OSTs in the device
list (as inactive). Is there a way to remove them completely,
so that new clients have no idea they ever existed (and
the UUIDs eventually get reused)?
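
One way to compare the two states (a sketch; the device name pattern and the
llog_print invocation are from memory and may need adjusting for your
version):

  # what this node currently thinks: active=0 means deactivated here
  lctl get_param osc.pps-OST000a-osc*.active
  # device list with status, on clients and servers
  lctl dl
  # a filesystem-wide conf_param is recorded in the client config log on the MGS
  lctl --device MGS llog_print pps-client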

TIA,
 Arne





[Lustre-discuss] Assertion failure in ldiskfs_get_blocks_handle

2009-10-07 Thread Patrick Winnertz
Hey!

After reinstalling a Lustre test cluster with 1.8.1, one OST crashed with
an assertion failure in inode.c (running a 2.6.18 kernel on amd64).

Is this a known issue, or has anybody else run into this error before?

Greetings
Winnie

Here is the error message:

Lustre: spfs-OST0001: received MDS connection from 192.168@tcp
Assertion failure in ldiskfs_get_blocks_handle() at
/usr/src/modules/lustre/ldiskfs/ldiskfs/inode.c:806:
!(LDISKFS_I(inode)->i_flags & LDISKFS_EXTENTS_FL)
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at /usr/src/modules/lustre/ldiskfs/ldiskfs/inode.c:806
invalid opcode: [1] SMP
CPU 0
Modules linked in: obdfilter fsfilt_ldiskfs ost mgc ldiskfs crc16 lustre lov mdc
lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs ipv6 button ac battery
dm_snapshot dm_mirror dm_mod sbp2 loop evdev sg serio_raw pcspkr psmouse
eth1394 sr_mod cdrom ext3 jbd mbcache sd_mod sata_nv libata usb_storage
scsi_mod ohci1394 e1000 ieee1394 generic amd74xx ide_core ehci_hcd ohci_hcd
thermal processor fan
Pid: 2343, comm: ll_ost_io_01 Tainted: GF
2.6.18+lustre1.8.1+0.credativ.etch.1 #1
RIP: 0010:[885756d0]  [885756d0] :ldiskfs:ldiskfs_get_blocks_handle+0x80/0xd10
RSP: 0018:81003a345490  EFLAGS: 00010286
RAX: 00a0 RBX:  RCX: 80450868
RDX: 80450868 RSI: 0086 RDI: 80450860
RBP: 81003b2261d8 R08: 80450868 R09: 0020
R10: 0046 R11:  R12: 81003a3456d0
R13:  R14: 0001 R15: 81003b2261d8
FS:  2ba142b446d0() GS:80522000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 2b598fc6a360 CR3: 37e66000 CR4: 06e0
Process ll_ost_io_01 (pid: 2343, threadinfo 81003a344000, task 81003a011830)
Stack:  8100c000 0086 89340001 81003d6cfed0
 0001 0001 81003a3456d0 0001
 81003c24c040 81003b226100
Call Trace:
 [8022c31f] __wake_up+0x38/0x4f
 [8840b0c0] :ksocklnd:ksocknal_queue_tx_locked+0x460/0x4a0
 [8840b9cf] :ksocklnd:ksocknal_find_conn_locked+0xcf/0x1f0
 [8840bfec] :ksocklnd:ksocknal_launch_packet+0x2ac/0x3a0
 [8840db25] :ksocklnd:ksocknal_alloc_tx+0x205/0x2b0
 [8857675e] :ldiskfs:ldiskfs_get_block+0xde/0x120
 [88574710] :ldiskfs:ldiskfs_bmap+0x0/0xb0
 [8023110e] generic_block_bmap+0x37/0x41
 [88574710] :ldiskfs:ldiskfs_bmap+0x0/0xb0
 [8860954d] :obdfilter:filter_commitrw_write+0x37d/0x2590
 [80256f2e] cache_alloc_refill+0xde/0x1da
 [8025c11e] thread_return+0x0/0xe7
 [8025ca8a] schedule_timeout+0x92/0xad
 [885c3968] :ost:ost_brw_write+0x1b88/0x2310
 [8027c6b0] default_wake_function+0x0/0xe
 [88388f28] :ptlrpc:lustre_msg_check_version_v2+0x8/0x20
 [885c6f53] :ost:ost_handle+0x2e63/0x5a00
 [802aa138] zone_statistics+0x3e/0x6d
 [8020de5c] __alloc_pages+0x5c/0x2a9
 [882e4838]

Re: [Lustre-discuss] Lustre-discuss Digest, Vol 45, Issue 6

2009-10-07 Thread Andreas Dilger
On Oct 06, 2009  20:24 -0700, Dam Thanh Tung wrote:
  RAID5 over RAID1? Nahh. Consider http://WWW.BAARF.com/ and that
  the storage system of a Lustre pool over DRBD is ideally suited to
  RAID10 (with each pair a DRBD resource). RAID5 may be contributing
  to your speed problem below because of or being rebuilt/syncing
  itself.

 Unfortunately I didn't know that before, so now we can't change anything
 on my RAID partition :( .

It is documented in the Lustre manual that the MDS should be running
on RAID-1 or RAID-1+0.

I would suggest shutting down your MDS, making sure your remote DRBD copy
is up to date, then reformatting the local storage into RAID-1+0, copying
the remote DRBD mirror back to the local system, and then reformatting
the remote DRBD storage to RAID-1+0 as well and copying the data back there.
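
In rough outline, one iteration of that swap might look like this on the MDS
node (a sketch only: the resource name r0, the md device and the member disks
are placeholders, and the exact drbdadm steps depend on your DRBD version):

  umount /mnt/mdt                 # stop the MDS (placeholder mount point)
  cat /proc/drbd                  # make sure the peer is Connected/UpToDate first
  drbdadm down r0                 # release the old local backing device
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[bcde]1
  drbdadm create-md r0            # fresh DRBD metadata on the rebuilt array
  drbdadm up r0
  drbdadm invalidate r0           # full resync of the data from the remote copy

Then repeat the same on the remote side once the local copy is UpToDate again.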

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
