Re: [Lustre-discuss] Read/Write performance problem
On Tuesday, 06.10.2009, at 09:33 -0600, Andreas Dilger wrote:
> > ... bla bla ... Is there a reason why an immediate read after a write
> > on the same node from/to a shared file is slow? Is there any
> > additional communication, e.g. is the client flushing the buffer
> > cache before the first read? The statistics show that the average
> > time to complete a 1.44 MB read request increases during the runtime
> > of our program. At some point it hits an upper limit or saturation
> > point and stays there. Is there some kind of queue that is getting
> > full in this kind of write/read scenario? Is there perhaps something
> > tunable in /proc/fs/lustre?
>
> One possible issue is that you don't have enough extra RAM to cache
> 1.5 GB of the checkpoint, so during the write it is being flushed to
> the OSTs and evicted from cache. When you immediately restart, there
> is still dirty data being written from the clients that contends with
> the reads needed to restart.
>
> Cheers, Andreas

Well, I do call fsync() after the write is finished. During the write
process I see a constant stream of 4 GB/s running from the Lustre
servers to the RAID controllers, which stops when the write process
terminates. When I start reading, there are no more writes going this
way, so I suspect it might be something else ... Even if I wait five
minutes between the writes and the reads (all dirty pages should have
been flushed by then), the picture does not change.

Michael

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden, Germany
Contact: Willersbau, Room A 208
Phone: (+49) 351 463-34217, Fax: (+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW: http://www.tu-dresden.de/zih

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
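[Editorial note: Michael's observation that the average per-read time keeps climbing can be quantified from the client side by diffing successive snapshots of the per-OSC RPC timing counters. The sketch below shows the bookkeeping only; the counter name `ost_read` and the `<name> <count> samples [usec] <min> <max> <sum>` line layout are assumptions modeled on 1.8-era /proc/fs/lustre/osc/*/stats output, not a documented format, so check what your client actually prints.]

```python
# Hypothetical sketch: average microseconds per read RPC issued between
# two snapshots of an OSC stats file. Counter name and line layout are
# assumptions (see note above), not a stable Lustre interface.

def usec_counter(text, name):
    """Return (samples, total_usec) for a [usec]-valued counter line."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 7 and parts[0] == name and parts[3] == "[usec]":
            return int(parts[1]), int(parts[6])
    raise KeyError(name)

def avg_rpc_usec(before, after, name="ost_read"):
    """Average usec per RPC completed between the two snapshots."""
    s0, t0 = usec_counter(before, name)
    s1, t1 = usec_counter(after, name)
    if s1 == s0:
        return 0.0
    return (t1 - t0) / (s1 - s0)

# Fabricated sample lines, for illustration only:
BEFORE = "ost_read 100 samples [usec] 500 20000 1000000"
AFTER = "ost_read 150 samples [usec] 500 40000 2500000"

if __name__ == "__main__":
    # 1500000 usec over 50 new RPCs = 30000.0 usec per read
    print(avg_rpc_usec(BEFORE, AFTER))
```

Sampling this once a minute during the run would show directly whether the per-request latency really saturates, and on which OSCs.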
Re: [Lustre-discuss] strange performance with POSIX file capabilities
Andreas Dilger wrote:
> On Oct 06, 2009 15:13 +0200, Ralf Utermann wrote:
> > with newer vanilla kernels we saw strange performance data with
> > iozone on patchless clients: some OSTs had a lower write bandwidth
> > in the iozone benchmark, getting worse with record sizes below 1024.
> > After lots of kernel builds, it looks like the kernel config entry
> > CONFIG_SECURITY_FILE_CAPABILITIES is the one which introduces this
> > problem. If CONFIG_SECURITY_FILE_CAPABILITIES is not set, the iozone
> > data look good; if it is compiled into the kernel, we see the problem:
> > http://www.physik.uni-augsburg.de/~ralfu/LustreTest/Lustre_with_file_caps.html
>
> Just to clarify, you are reporting that the above config option
> affects write performance when changed on the client, correct?

Hi Andreas,

yes, this option has only been used on the client side. The servers are
running a 2.6.22 kernel, and it looks like this option was introduced
with 2.6.24.

> It appears that this option is off by default in the upstream kernels,
> so I suspect it doesn't get tested much.

This option is on by default in the Debian kernels, and that's the
config I usually start with. I think recent Fedora kernels would also
have this set, and also RHEL6.

> > Any idea why file capabilities should affect the write performance
> > on Lustre, and why it should only affect some OSTs?
>
> I can imagine that if this is adding some significant overhead on a
> per-system-call basis, it would hurt performance. It is definitely odd
> that it would affect the performance of only some of the OSTs. I
> assume they are otherwise identical?

The OSTs are either 4 or 8 data disks on Sun 6140 systems; the 4 with
problems are on 2 OSS, the 3 without problems are on the other 2 OSS.

> The only thing I can imagine is that this option is related to SELinux
> and has some overhead in getting extended attributes, but even then
> the xattrs are only stored on the MDS, so this would hurt all OSTs
> uniformly.

As I don't need this option anyway, I will just build my kernels with
this option off now. Of course an unpleasant feeling remains, not
knowing what really happens ...

As of vanilla kernel 2.6.29 there should be a no_file_caps kernel boot
parameter. I would like to test this setup, but b1_8 only builds fine
with vanilla 2.6.28; I cannot get it running with vanilla 2.6.[29|30]
-- but that should be a different thread ...

Bye, Ralf

--
Ralf Utermann
Universität Augsburg, Institut für Physik -- EDV-Betreuer
Universitätsstr. 1, D-86135 Augsburg
Phone: +49-821-598-3231, Fax: -3411
SMTP: ralf.uterm...@physik.uni-augsburg.de
[Lustre-discuss] Measure disk IO for Lustre client
Hi all,

I am new to Lustre. I recently ran some experiments on a cluster with a
Lustre backbone, and I want to measure the disk I/O during program
execution. The problem with tools like iostat is that they only measure
read/write operations on the local disk. Is there any tool for the
Lustre client with which I can measure the disk I/O not just on my
compute node, but also how much reading and writing it does to all the
other machines' disks during the execution time?

Thanks

Regards,
Hendra
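[Editorial note: the Lustre client itself keeps cluster-wide I/O counters in its llite stats file (typically /proc/fs/lustre/llite/<fsname>-*/stats), which count I/O the client issued to all OSTs rather than to local disk. The sketch below parses that file; the `<name> <count> samples [bytes] <min> <max> <sum>` layout assumed here is modeled on 1.8-era clients and is not a stable interface, so verify against your own output first.]

```python
# Hypothetical sketch: summarize a client's read/write byte counters
# from an llite stats file. The line layout is an assumption (see note
# above); SAMPLE is fabricated data for illustration.

SAMPLE = """\
snapshot_time         1254813178.12 secs.usecs
read_bytes            456 samples [bytes] 4096 1048576 478150656
write_bytes           123 samples [bytes] 4096 1048576 128974848
open                  10 samples [regs]
"""

def parse_llite_stats(text):
    """Return {counter: {samples, min, max, total_bytes}} for byte counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Byte-valued counters carry min/max/sum after the "[bytes]" tag;
        # call counters like "open" have no byte totals and are skipped.
        if len(parts) == 7 and parts[2] == "samples" and parts[3] == "[bytes]":
            counters[parts[0]] = {
                "samples": int(parts[1]),
                "min": int(parts[4]),
                "max": int(parts[5]),
                "total_bytes": int(parts[6]),
            }
    return counters

if __name__ == "__main__":
    for name, c in parse_llite_stats(SAMPLE).items():
        print(f"{name}: {c['total_bytes']} bytes in {c['samples']} calls")
```

On many client versions the counters can be reset (e.g. by writing 0 to the stats file) so that a before/after pair brackets exactly one program run; that reset mechanism should also be checked against your Lustre release.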
[Lustre-discuss] replication in Lustre 2.0
Hi,

I saw that the developers included replication of the data in version
2.0. Is this replication only used for recovery in case of a hard disk
failure, or can the client use it?

Thx.

Anthony Garnier
DTI/DPV/DCPS/RSH/ACCE
PSA Peugeot Citroën IT center, 90160 Bessoncourt, France
[Lustre-discuss] OST retirement
Dear list,

is there a way to see whether I deactivated an OST filesystem-wide via

  lctl --device 15 conf_param pps-OST000a.osc.active=0

or only locally via

  lctl set_param osc.pps-OST000a-osc.active=0

Also: after deactivation I still see the OSTs on the device list (as
inactive). Is there a way to remove them completely, so that new
clients would have no idea they ever existed (and the UUIDs eventually
get reused)?

TIA,
Arne
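[Editorial note: one place the deactivation shows up is the device list printed by `lctl dl`, which Arne mentions above. The sketch below picks the inactive entries out of that listing; the column layout (index, state, type, name, UUID, refcount) and the `IN` state tag for inactive devices are assumptions based on 1.8-era output, so verify against what your own `lctl dl` prints.]

```python
# Hypothetical sketch: list inactive devices from `lctl dl` output.
# Column layout and the "IN" state tag are assumptions (see note above);
# SAMPLE is fabricated output for illustration.

def inactive_devices(lctl_dl_output):
    """Return [(index, type, name)] for devices whose state column is IN."""
    inactive = []
    for line in lctl_dl_output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[1] == "IN":
            inactive.append((int(parts[0]), parts[2], parts[3]))
    return inactive

SAMPLE = """\
  0 UP mgc MGC10.0.0.1@tcp mgc-client-uuid 5
 14 UP osc pps-OST0009-osc pps-client-uuid 5
 15 IN osc pps-OST000a-osc pps-client-uuid 5
"""

if __name__ == "__main__":
    print(inactive_devices(SAMPLE))
```

Comparing this listing across clients would at least distinguish a deactivation that every client sees from one applied only locally, even though it does not remove the entry from the list.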
[Lustre-discuss] Assertion failure in ldiskfs_get_blocks_handle
Hey!

After reinstalling a Lustre test cluster with 1.8.1, one OST crashed
with an assertion failure in inode.c (running a 2.6.18 kernel on
amd64). Is this a known issue, or has anybody else met this error
before?

Greetings, Winnie

Here is the error message:

Lustre: spfs-OST0001: received MDS connection from 192.168@tcp
Assertion failure in ldiskfs_get_blocks_handle() at
/usr/src/modules/lustre/ldiskfs/ldiskfs/inode.c:806:
!(LDISKFS_I(inode)->i_flags & LDISKFS_EXTENTS_FL)
--- [cut here ] --- [please bite here ] ---
Kernel BUG at /usr/src/modules/lustre/ldiskfs/ldiskfs/inode.c:806
invalid opcode: [1] SMP
CPU 0
Modules linked in: obdfilter fsfilt_ldiskfs ost mgc ldiskfs crc16 lustre
lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs ipv6 button
ac battery dm_snapshot dm_mirror dm_mod sbp2 loop evdev sg serio_raw
pcspkr psmouse eth1394 sr_mod cdrom ext3 jbd mbcache sd_mod sata_nv
libata usb_storage scsi_mod ohci1394 e1000 ieee1394 generic amd74xx
ide_core ehci_hcd ohci_hcd thermal processor fan
Pid: 2343, comm: ll_ost_io_01 Tainted: GF 2.6.18+lustre1.8.1+0.credativ.etch.1 #1
RIP: 0010:[885756d0] [885756d0] :ldiskfs:ldiskfs_get_blocks_handle+0x80/0xd10
RSP: 0018:81003a345490 EFLAGS: 00010286
RAX: 00a0 RBX: RCX: 80450868 RDX: 80450868 RSI: 0086 RDI: 80450860
RBP: 81003b2261d8 R08: 80450868 R09: 0020 R10: 0046 R11: R12: 81003a3456d0
R13: R14: 0001 R15: 81003b2261d8
FS: 2ba142b446d0() GS:80522000() knlGS: CS: 0010 DS: ES:
CR0: 8005003b CR2: 2b598fc6a360 CR3: 37e66000 CR4: 06e0
Process ll_ost_io_01 (pid: 2343, threadinfo 81003a344000, task 81003a011830)
Stack: 8100c000 0086 89340001 81003d6cfed0 0001 0001 81003a3456d0 0001
81003c24c040 81003b226100
Call Trace:
 [8022c31f] __wake_up+0x38/0x4f
 [8840b0c0] :ksocklnd:ksocknal_queue_tx_locked+0x460/0x4a0
 [8840b9cf] :ksocklnd:ksocknal_find_conn_locked+0xcf/0x1f0
 [8840bfec] :ksocklnd:ksocknal_launch_packet+0x2ac/0x3a0
 [8840db25] :ksocklnd:ksocknal_alloc_tx+0x205/0x2b0
 [8857675e] :ldiskfs:ldiskfs_get_block+0xde/0x120
 [88574710] :ldiskfs:ldiskfs_bmap+0x0/0xb0
 [8023110e] generic_block_bmap+0x37/0x41
 [88574710] :ldiskfs:ldiskfs_bmap+0x0/0xb0
 [8860954d] :obdfilter:filter_commitrw_write+0x37d/0x2590
 [80256f2e] cache_alloc_refill+0xde/0x1da
 [8025c11e] thread_return+0x0/0xe7
 [8025ca8a] schedule_timeout+0x92/0xad
 [885c3968] :ost:ost_brw_write+0x1b88/0x2310
 [8027c6b0] default_wake_function+0x0/0xe
 [88388f28] :ptlrpc:lustre_msg_check_version_v2+0x8/0x20
 [885c6f53] :ost:ost_handle+0x2e63/0x5a00
 [802aa138] zone_statistics+0x3e/0x6d
 [8020de5c] __alloc_pages+0x5c/0x2a9
 [882e4838]
Re: [Lustre-discuss] Lustre-discuss Digest, Vol 45, Issue 6
On Oct 06, 2009 20:24 -0700, Dam Thanh Tung wrote:
> > RAID5 over RAID1? Nahh. Consider http://WWW.BAARF.com/ and that the
> > storage system of a Lustre pool over DRBD is ideally suited to
> > RAID10 (with each pair a DRBD resource). RAID5 may be contributing
> > to your speed problem below, or it may be rebuilding/syncing itself.
>
> Poor me, I didn't know that before, so now we can't change anything on
> my RAID partition :(

It is documented in the Lustre manual that the MDS should be running on
RAID-1 or RAID-1+0. I would suggest shutting down your MDS, making sure
your remote DRBD copy is up-to-date, then reformatting the local
storage as RAID-1+0, copying the remote DRBD mirror back to the local
system, and then reformatting the remote DRBD storage as RAID-1+0 as
well and copying the data back there.

Cheers, Andreas

--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.