Re: [Lustre-discuss] Read/Write performance problem

2009-10-06 Thread Michael Kluge
On Tuesday, 2009-10-06 at 09:33 -0600, Andreas Dilger wrote:
> > ... bla bla ...
> > Is there a reason why an immediate read after a write on the same node
> > from/to a shared file is slow? Is there any additional communication,
> > e.g. is the client flushing the buffer cache before the first read? The
> > statistics show that the average time to complete a 1.44MB read request
> > is increasing during the runtime of our program. At some point it hits
> > an upper limit or a saturation point and stays there. Is there some kind
> > of queue or something that is getting full in this kind of
> > write/read scenario? Maybe something tunable in /proc/fs/lustre?
> 
> One possible issue is that you don't have enough extra RAM to cache 1.5GB
> of the checkpoint, so during the write it is being flushed to the OSTs
> and evicted from cache.  When you immediately restart there is still dirty
> data being written from the clients that is contending with the reads to
> restart.
> Cheers, Andreas

Well, I do call fsync() after the write is finished. During the write
process I see a constant stream of 4 GB/s running from the lustre
servers to the raid controllers which finishes when the write process
terminates. When I start reading, there are no more writes going this
way, so I suspect it might be something else ... Even if I wait 5 minutes
between the writes and the reads (all dirty pages should have been flushed
by then), the picture does not change.
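
To rule out client-side cache effects entirely, something along these lines
(a sketch only, assuming root access on the clients and the 1.6/1.8 /proc
layout) should drop the cached data on the clients between the write and the
read phase:

  # on every client, after the checkpoint write has completed
  sync
  # cancel unused Lustre locks and drop the pages they protect
  for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do echo clear > $f; done
  # drop the remaining page/dentry/inode caches
  echo 3 > /proc/sys/vm/drop_caches

If the reads are still slow after that, the client cache is probably not the
culprit.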


Michael

-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8.0 Losing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Lundgren, Andrew
I haven’t done anything yet.  The machines seem to reconnect without 
intervention.   But the problem occurs again later.

From: Đào Thị Thảo [mailto:tha...@isds.vn]
Sent: Tuesday, October 06, 2009 9:43 PM
To: Lundgren, Andrew
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several 
minutes and then recovering.

Hi,
I have the same problem as Lundgren.
I don't understand why it happens, since my network is still stable.
The problem keeps recurring. Sometimes a client cannot restore its connection
to a service (OSTs or MDS). My provisional workaround is rebooting the node.
Andrew, can you explain in more detail and give some guidance on how to fix it?
On Wed, Oct 7, 2009 at 12:20 AM, Lundgren, Andrew
<andrew.lundg...@level3.com> wrote:

Oh man, that should have read LOSING!



From: lustre-discuss-boun...@lists.lustre.org
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew
Sent: Tuesday, October 06, 2009 11:14 AM
To: lustre-discuss
Subject: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several 
minutes and then recovering.



We have a few 1.8.0 clusters running.  We have seen multiple instances now 
where the clients lose connectivity to the MDT.  The MDS logs indicate that 
there is some sort of problem on the MDT.



The following is a typical output:



Oct  6 02:56:08 mint1502 kernel: LustreError: 
28999:0:(handler.c:161:mds_sendpage()) @@@ bulk failed: timeout 0(4096), 
evicting 7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid

Oct  6 02:56:08 mint1502 kernel:   r...@8100ac9f4000 x1314647461000449/t0 
o37->7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid:0/0 lens 
408/360 e 1 to 0 dl 1254797793 ref 1 fl Interpret:/0/0 rc 0/0

Oct  6 02:56:16 mint1502 kernel: Lustre: Request x1312747398000879 sent from 
content-OST001d-osc to NID 207.123.49...@tcp 7s ago has timed out (limit 7s).

Oct  6 02:56:16 mint1502 kernel: LustreError: 166-1: content-OST001d-osc: 
Connection to service content-OST001d via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.

Oct  6 02:56:16 mint1502 kernel: LustreError: Skipped 1 previous similar message

Oct  6 02:56:17 mint1502 kernel: LustreError: 166-1: content-OST001c-osc: 
Connection to service content-OST001c via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.

Oct  6 02:56:17 mint1502 kernel: LustreError: Skipped 1 previous similar message

Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.49...@tcp was evicted due to a lock blocking callback to 
207.123.49...@tcp timed out: rc -107

Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.4...@tcp was evicted due to a lock blocking callback to 
207.123.4...@tcp timed out: rc -107

Oct  6 02:56:18 mint1502 kernel: BUG: soft lockup - CPU#2 stuck for 10s! 
[ll_mdt_rdpg_04:28999]

Oct  6 02:56:18 mint1502 kernel: CPU 2:

Oct  6 02:56:18 mint1502 kernel: Modules linked in: mds(U) fsfilt_ldiskfs(U) 
mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) 
ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipv6(U) 
xfrm_nalgo(U) crypto_api(U) sunrpc(U) bonding(U) ip_conntrack_netbios_ns(U) 
ipt_REJECT(U) xt_tcpudp(U) xt_state(U) ip_conntrack(U) nfnetlink(U) 
iptable_filter(U) ip_tables(U) x_tables(U) dm_round_robin(U) dm_rdac(U) 
dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) 
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) 
i2c_i801(U) e1000e(U) sr_mod(U) i2c_core(U) i5000_edac(U) cdrom(U) pata_acpi(U) 
shpchp(U) edac_mc(U) serio_raw(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) 
dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U) scsi_transport_fc(U) ahci(U) 
ata_piix(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) 
sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)

Oct  6 02:56:18 mint1502 kernel: Pid: 28999, comm: ll_mdt_rdpg_04 Tainted: G
  2.6.18-92.1.17.el5_lustre.1.8.0smp #1

Oct  6 02:56:18 mint1502 kernel: RIP: 0010:[]  
[] :obdclass:lustre_hash_for_each_empty+0x1f0/0x290

Oct  6 02:56:18 mint1502 kernel: RSP: 0018:8104c02ad850  EFLAGS: 0206

Oct  6 02:56:18 mint1502 kernel: RAX: 810448dfd200 RBX: 328c 
RCX: ba75

Oct  6 02:56:18 mint1502 kernel: RDX: 5e7d RSI: 802f0d80 
RDI: c200109d78cc

Oct  6 02:56:18 mint1502 kernel: RBP: 8860c8f2 R08: 810001016e60 
R09: 

Oct  6 02:56:18 mint1502 kernel: R10: 8100b7a28500 R11: 0150 
R12: 810448dfd200

Oct  6 02:56:18 mint1502 kernel: R13: 0286 R14: 0286 
R15: 8104c02ad7f0

Oct  6 02:56:18 mint1502 kernel: FS:  2b49ff256220() 
GS:81051fc53d40() knlGS:00

Re: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Đào Thị Thảo
Hi,
I have the same problem as Lundgren.
I don't understand why it happens, since my network is still stable.
The problem keeps recurring. Sometimes a client cannot restore its connection
to a service (OSTs or MDS). My provisional workaround is rebooting the node.
Andrew, can you explain in more detail and give some guidance on how to fix it?

On Wed, Oct 7, 2009 at 12:20 AM, Lundgren, Andrew <
andrew.lundg...@level3.com> wrote:

>  Oh man, that should have read LOSING!
>
>
>
> From: lustre-discuss-boun...@lists.lustre.org [mailto:
> lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew
> Sent: Tuesday, October 06, 2009 11:14 AM
> To: lustre-discuss
> Subject: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for
> several minutes and then recovering.
>
>
>
> We have a few 1.8.0 clusters running.  We have seen multiple instances now
> where the clients lose connectivity to the MDT.  The MDS logs indicate that
> there is some sort of problem on the MDT.
>
>
>
> The following is a typical output:
>
>
>
> Oct  6 02:56:08 mint1502 kernel: LustreError:
> 28999:0:(handler.c:161:mds_sendpage()) @@@ bulk failed: timeout 0(4096),
> evicting 7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid
>
> Oct  6 02:56:08 mint1502 kernel:   r...@8100ac9f4000x1314647461000449/t0
> o37->7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid:0/0
> lens 408/360 e 1 to 0 dl 1254797793 ref 1 fl Interpret:/0/0 rc 0/0
>
> Oct  6 02:56:16 mint1502 kernel: Lustre: Request x1312747398000879 sent
> from content-OST001d-osc to NID 207.123.49...@tcp 7s ago has timed out
> (limit 7s).
>
> Oct  6 02:56:16 mint1502 kernel: LustreError: 166-1: content-OST001d-osc:
> Connection to service content-OST001d via nid 207.123.49...@tcp was lost;
> in progress operations using this service will fail.
>
> Oct  6 02:56:16 mint1502 kernel: LustreError: Skipped 1 previous similar
> message
>
> Oct  6 02:56:17 mint1502 kernel: LustreError: 166-1: content-OST001c-osc:
> Connection to service content-OST001c via nid 207.123.49...@tcp was lost;
> in progress operations using this service will fail.
>
> Oct  6 02:56:17 mint1502 kernel: LustreError: Skipped 1 previous similar
> message
>
> Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A
> client on nid 207.123.49...@tcp was evicted due to a lock blocking
> callback to 207.123.49...@tcp timed out: rc -107
>
> Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A
> client on nid 207.123.4...@tcp was evicted due to a lock blocking callback
> to 207.123.4...@tcp timed out: rc -107
>
> Oct  6 02:56:18 mint1502 kernel: BUG: soft lockup - CPU#2 stuck for 10s!
> [ll_mdt_rdpg_04:28999]
>
> Oct  6 02:56:18 mint1502 kernel: CPU 2:
>
> Oct  6 02:56:18 mint1502 kernel: Modules linked in: mds(U)
> fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U)
> lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U)
> ipv6(U) xfrm_nalgo(U) crypto_api(U) sunrpc(U) bonding(U)
> ip_conntrack_netbios_ns(U) ipt_REJECT(U) xt_tcpudp(U) xt_state(U)
> ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) x_tables(U)
> dm_round_robin(U) dm_rdac(U) dm_multipath(U) video(U) sbs(U) backlight(U)
> i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U)
> parport_pc(U) lp(U) parport(U) joydev(U) i2c_i801(U) e1000e(U) sr_mod(U)
> i2c_core(U) i5000_edac(U) cdrom(U) pata_acpi(U) shpchp(U) edac_mc(U)
> serio_raw(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U)
> dm_mod(U) usb_storage(U) lpfc(U) scsi_transport_fc(U) ahci(U) ata_piix(U)
> libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U)
> scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
>
> Oct  6 02:56:18 mint1502 kernel: Pid: 28999, comm: ll_mdt_rdpg_04 Tainted:
> G  2.6.18-92.1.17.el5_lustre.1.8.0smp #1
>
> Oct  6 02:56:18 mint1502 kernel: RIP: 0010:[]
> [] :obdclass:lustre_hash_for_each_empty+0x1f0/0x290
>
> Oct  6 02:56:18 mint1502 kernel: RSP: 0018:8104c02ad850  EFLAGS:
> 0206
>
> Oct  6 02:56:18 mint1502 kernel: RAX: 810448dfd200 RBX:
> 328c RCX: ba75
>
> Oct  6 02:56:18 mint1502 kernel: RDX: 5e7d RSI:
> 802f0d80 RDI: c200109d78cc
>
> Oct  6 02:56:18 mint1502 kernel: RBP: 8860c8f2 R08:
> 810001016e60 R09: 
>
> Oct  6 02:56:18 mint1502 kernel: R10: 8100b7a28500 R11:
> 0150 R12: 810448dfd200
>
> Oct  6 02:56:18 mint1502 kernel: R13: 0286 R14:
> 0286 R15: 8104c02ad7f0
>
> Oct  6 02:56:18 mint1502 kernel: FS:  2b49ff256220()
> GS:81051fc53d40() knlGS:
>
> Oct  6 02:56:18 mint1502 kernel: CS:  0010 DS:  ES:  CR0:
> 8005003b
>
> Oct  6 02:56:18 mint1502 kernel: CR2: 003d82e985d0 CR3:
> 00201000 CR4: 06e0
>
> Oct  6 02:56:18 mint1502 kernel:
>
> Oct  6 02:56:18 mint1502 kernel: Call Trace:
>
> Oct  6 02:56:18 mint1502 ke

Re: [Lustre-discuss] Lustre-discuss Digest, Vol 45, Issue 6

2009-10-06 Thread Dam Thanh Tung
>
>
> Date: Mon, 5 Oct 2009 15:42:25 +0100
> From: pg_...@lus.for.sabi.co.uk (Peter Grandi)
> Subject: Re: [Lustre-discuss] drbd slow I/O with lustre filesystem
> To: Lustre discussion 
> Message-ID: <19146.1489.532204.6...@tree.ty.sabi.co.uk>
> Content-Type: text/plain; charset=us-ascii
>


> RAID5 over RAID1? Nahh. Consider http://WWW.BAARF.com/ and that
> the storage system of a Lustre pool over DRBD is ideally suited to
> RAID10 (with each pair a DRBD resource). RAID5 may be contributing
> to your speed problem below, because of its write overhead or because
> it is rebuilding/syncing itself.
>
Poor me, I didn't know that before, so now we can't change anything on my RAID
partition :(.


> > After formatting them with Lustre format (using mkfs.lustre),
> > I start to copy data to my drbd devices, but:
>
> > Its I/O wait, when I monitor it with top or iostat, is too high,
> > about 25%
>
> This is not much related to anything... After all you are doing a
> lot of IO, and jumping around on the disk, doing a restore.
>

Could you please explain in more detail what you mean? I don't really
understand it.

>
> > - The copy speed from my web client to our OST using drbd
> > devices is too low, only about 13MB/s, although the client and the OST
> > are in the same 1Gb Ethernet LAN.
>
> Too few details about this. Things to check:
>
> * Raw network speed: I like 'nuttcp' to check it. Using the
>  usual tricks (larger send/receive buffers, jumbo frames, ...) may
>  help if there are issues. But then you were getting 70MB/s above.
>http://lists.centos.org/pipermail/centos/2009-July/079505.html
> * If you are using LVM2, bad news.
>http://archives.free.net.ph/message/20070815.091608.fff62ba9.en.html
> * Using RAID5 as argued above may be detrimental.
> * The DRBD must be configured to allow higher sync speeds:
>http://www.ossramblings.com/drbd_defaults_too_slow
>
> http://www.linux-ha.org/DRBD/FAQ#head-e09d2c15ba7ff691ecd5d5d7b848a50d25a3c3eb
>  Your initial sync however seemed to run at 70MB/s so
>  I wonder. Maybe try tuning the "unplug" watermark in DRBD,
>  or, if you have battery backup, enabling no-flush mode.
>http://archives.free.net.ph/message/20081219.085301.997727d2.en.html
>
> > When i tried using one OST without drbd, it worked quite well
>
> It might mean that it is mainly a DRBD issue. You might want to
> get the latest DRBD version, as some earlier versions had issues. If you
> have RHEL, the ELRepo repository has fairly recent ones.
>
> > So, could any one please tell me where the problem is ? In our
> > drbd devices or because of lustre ? Is there anyone has the same
> > problem with me ? :(
>
> All of the above probably -- max performance here means ensuring
> that write requests are issued as fast as possible and back-to-back
> packets/blocks are then possible both on the network and on the
> storage system...
>
>  http://www.gossamer-threads.com/lists/drbd/users/17991
>  http://lists.linbit.com/pipermail/drbd-user/2007-August/007256.html
>  http://lists.linbit.com/pipermail/drbd-user/2009-January/011165.html
>  http://lists.linbit.com/pipermail/drbd-user/2009-January/011198.html
>

That is really great information. I have checked the links and will consider
using some of the suggestions (e.g. some DRBD options like no-disk-flushes,
no-md-flushes, ...); they may be useful for speed tuning, but I am not sure
they won't affect my system's stability.

Anyway, many thanks for all of them :)
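
For reference, the options mentioned above would go roughly here in drbd.conf
(a sketch only, assuming DRBD 8.x syntax; the resource name r0 and the 100M
rate are made-up values, and the no-*-flushes options are only safe with a
battery-backed write cache):

  resource r0 {
    syncer {
      rate 100M;            # raise the resync bandwidth cap (the default is very low)
    }
    disk {
      no-disk-flushes;      # only with battery-backed write cache
      no-md-flushes;        # likewise, for the DRBD metadata
    }
    # ... existing device/address/meta-disk settings stay as they are
  }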

>
> It may conceivably be quicker for you to load all your data first
> on the primary storage half of the pair, and then reactivate the
> secondary and let resync.
>
I tried it that way, but the speed increase is not remarkable, about
5-7 MB/s.

> My impression is that the problem is unlikely to originate on the
> Lustre side, but more on the underlying layers mentioned above.
> There is a fair bit of material on DRBD optimization, both on its
> site, and more specifically around the MySQL community, where it
> is very commonly used, and they care a lot about performance.
>
>
That's also what I guessed, so I posted my questions to both the Lustre and
DRBD mailing lists and, luckily, I received some useful information and tips.

Again, many thanks for your detailed answer. I really appreciate it :)
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Groups and Use of newgrp

2009-10-06 Thread Andreas Dilger
On Oct 06, 2009  15:27 -0400, Ms. Tammy R. Brown (Contractor) wrote:
> Why do I have to use the "newgrp" command to access files and directories 
> owned by groups to which I belong but are not my primary group?
> 
> Is there a way to mount this file system to cure this problem?

Is the MDS configured to have the supplementary groups upcall running?
Does the MDS have the same /etc/group (or LDAP or whatever) as the
clients?

I believe this is in the manual.
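
As a quick check (a sketch only; the parameter name is from memory of the
1.6/1.8 manual, and <username> is a placeholder), on the MDS you could run:

  # is the supplementary-group upcall configured?
  cat /proc/fs/lustre/mds/*/group_upcall   # usually /usr/sbin/l_getgroups
  # does the MDS resolve the same groups for the user as the clients do?
  id <username>

If the upcall is NONE, the MDS typically does not see your supplementary
groups, which would match the newgrp symptom.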

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Groups and Use of newgrp

2009-10-06 Thread Ms. Tammy R. Brown (Contractor)
Why do I have to use the "newgrp" command to access files and directories 
owned by groups to which I belong but are not my primary group?

Is there a way to mount this file system to cure this problem?

Thanks,
Tammy
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Lundgren, Andrew
Oh man, that should have read LOSING!

From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew
Sent: Tuesday, October 06, 2009 11:14 AM
To: lustre-discuss
Subject: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several 
minutes and then recovering.

We have a few 1.8.0 clusters running.  We have seen multiple instances now 
where the clients lose connectivity to the MDT.  The MDS logs indicate that 
there is some sort of problem on the MDT.

The following is a typical output:

Oct  6 02:56:08 mint1502 kernel: LustreError: 
28999:0:(handler.c:161:mds_sendpage()) @@@ bulk failed: timeout 0(4096), 
evicting 7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid
Oct  6 02:56:08 mint1502 kernel:   r...@8100ac9f4000 x1314647461000449/t0 
o37->7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid:0/0 lens 
408/360 e 1 to 0 dl 1254797793 ref 1 fl Interpret:/0/0 rc 0/0
Oct  6 02:56:16 mint1502 kernel: Lustre: Request x1312747398000879 sent from 
content-OST001d-osc to NID 207.123.49...@tcp 7s ago has timed out (limit 7s).
Oct  6 02:56:16 mint1502 kernel: LustreError: 166-1: content-OST001d-osc: 
Connection to service content-OST001d via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.
Oct  6 02:56:16 mint1502 kernel: LustreError: Skipped 1 previous similar message
Oct  6 02:56:17 mint1502 kernel: LustreError: 166-1: content-OST001c-osc: 
Connection to service content-OST001c via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.
Oct  6 02:56:17 mint1502 kernel: LustreError: Skipped 1 previous similar message
Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.49...@tcp was evicted due to a lock blocking callback to 
207.123.49...@tcp timed out: rc -107
Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.4...@tcp was evicted due to a lock blocking callback to 
207.123.4...@tcp timed out: rc -107
Oct  6 02:56:18 mint1502 kernel: BUG: soft lockup - CPU#2 stuck for 10s! 
[ll_mdt_rdpg_04:28999]
Oct  6 02:56:18 mint1502 kernel: CPU 2:
Oct  6 02:56:18 mint1502 kernel: Modules linked in: mds(U) fsfilt_ldiskfs(U) 
mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) 
ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipv6(U) 
xfrm_nalgo(U) crypto_api(U) sunrpc(U) bonding(U) ip_conntrack_netbios_ns(U) 
ipt_REJECT(U) xt_tcpudp(U) xt_state(U) ip_conntrack(U) nfnetlink(U) 
iptable_filter(U) ip_tables(U) x_tables(U) dm_round_robin(U) dm_rdac(U) 
dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) 
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) 
i2c_i801(U) e1000e(U) sr_mod(U) i2c_core(U) i5000_edac(U) cdrom(U) pata_acpi(U) 
shpchp(U) edac_mc(U) serio_raw(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) 
dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U) scsi_transport_fc(U) ahci(U) 
ata_piix(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) 
sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Oct  6 02:56:18 mint1502 kernel: Pid: 28999, comm: ll_mdt_rdpg_04 Tainted: G
  2.6.18-92.1.17.el5_lustre.1.8.0smp #1
Oct  6 02:56:18 mint1502 kernel: RIP: 0010:[]  
[] :obdclass:lustre_hash_for_each_empty+0x1f0/0x290
Oct  6 02:56:18 mint1502 kernel: RSP: 0018:8104c02ad850  EFLAGS: 0206
Oct  6 02:56:18 mint1502 kernel: RAX: 810448dfd200 RBX: 328c 
RCX: ba75
Oct  6 02:56:18 mint1502 kernel: RDX: 5e7d RSI: 802f0d80 
RDI: c200109d78cc
Oct  6 02:56:18 mint1502 kernel: RBP: 8860c8f2 R08: 810001016e60 
R09: 
Oct  6 02:56:18 mint1502 kernel: R10: 8100b7a28500 R11: 0150 
R12: 810448dfd200
Oct  6 02:56:18 mint1502 kernel: R13: 0286 R14: 0286 
R15: 8104c02ad7f0
Oct  6 02:56:18 mint1502 kernel: FS:  2b49ff256220() 
GS:81051fc53d40() knlGS:
Oct  6 02:56:18 mint1502 kernel: CS:  0010 DS:  ES:  CR0: 
8005003b
Oct  6 02:56:18 mint1502 kernel: CR2: 003d82e985d0 CR3: 00201000 
CR4: 06e0
Oct  6 02:56:18 mint1502 kernel:
Oct  6 02:56:18 mint1502 kernel: Call Trace:
Oct  6 02:56:18 mint1502 kernel:  [] 
:obdclass:class_disconnect+0x378/0x400
Oct  6 02:56:18 mint1502 kernel:  [] 
:ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
Oct  6 02:56:18 mint1502 kernel:  [] 
:mds:mds_disconnect+0x121/0xe30
Oct  6 02:56:18 mint1502 kernel:  [] lock_timer_base+0x1b/0x3c
Oct  6 02:56:19 mint1502 kernel:  [] 
try_to_del_timer_sync+0x51/0x5a
Oct  6 02:56:19 mint1502 kernel:  [] 
:obdclass:class_fail_export+0x384/0x4c0
Oct  6 02:56:19 mint1502 kernel:  [] 
:mds:mds_readpage+0x1571/0x18a0
Oct  6 02:56:19 mint1502 kernel:  [] 
:ptlrpc:lustre_msg_set_limit+0x35/0xf0
Oct  6 02:56:19 mint1502 kernel:  [] 
d

[Lustre-discuss] 1.8.0 Loosing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Lundgren, Andrew
We have a few 1.8.0 clusters running.  We have seen multiple instances now 
where the clients lose connectivity to the MDT.  The MDS logs indicate that 
there is some sort of problem on the MDT.

The following is a typical output:

Oct  6 02:56:08 mint1502 kernel: LustreError: 
28999:0:(handler.c:161:mds_sendpage()) @@@ bulk failed: timeout 0(4096), 
evicting 7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid
Oct  6 02:56:08 mint1502 kernel:   r...@8100ac9f4000 x1314647461000449/t0 
o37->7523f416-2579-5f49-cd3f-54d2438b8...@net_0x2ce213b0b_uuid:0/0 lens 
408/360 e 1 to 0 dl 1254797793 ref 1 fl Interpret:/0/0 rc 0/0
Oct  6 02:56:16 mint1502 kernel: Lustre: Request x1312747398000879 sent from 
content-OST001d-osc to NID 207.123.49...@tcp 7s ago has timed out (limit 7s).
Oct  6 02:56:16 mint1502 kernel: LustreError: 166-1: content-OST001d-osc: 
Connection to service content-OST001d via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.
Oct  6 02:56:16 mint1502 kernel: LustreError: Skipped 1 previous similar message
Oct  6 02:56:17 mint1502 kernel: LustreError: 166-1: content-OST001c-osc: 
Connection to service content-OST001c via nid 207.123.49...@tcp was lost; in 
progress operations using this service will fail.
Oct  6 02:56:17 mint1502 kernel: LustreError: Skipped 1 previous similar message
Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.49...@tcp was evicted due to a lock blocking callback to 
207.123.49...@tcp timed out: rc -107
Oct  6 02:56:18 mint1502 kernel: LustreError: 138-a: content-MDT: A client 
on nid 207.123.4...@tcp was evicted due to a lock blocking callback to 
207.123.4...@tcp timed out: rc -107
Oct  6 02:56:18 mint1502 kernel: BUG: soft lockup - CPU#2 stuck for 10s! 
[ll_mdt_rdpg_04:28999]
Oct  6 02:56:18 mint1502 kernel: CPU 2:
Oct  6 02:56:18 mint1502 kernel: Modules linked in: mds(U) fsfilt_ldiskfs(U) 
mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) 
ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipv6(U) 
xfrm_nalgo(U) crypto_api(U) sunrpc(U) bonding(U) ip_conntrack_netbios_ns(U) 
ipt_REJECT(U) xt_tcpudp(U) xt_state(U) ip_conntrack(U) nfnetlink(U) 
iptable_filter(U) ip_tables(U) x_tables(U) dm_round_robin(U) dm_rdac(U) 
dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) 
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) 
i2c_i801(U) e1000e(U) sr_mod(U) i2c_core(U) i5000_edac(U) cdrom(U) pata_acpi(U) 
shpchp(U) edac_mc(U) serio_raw(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) 
dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U) scsi_transport_fc(U) ahci(U) 
ata_piix(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) 
sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Oct  6 02:56:18 mint1502 kernel: Pid: 28999, comm: ll_mdt_rdpg_04 Tainted: G
  2.6.18-92.1.17.el5_lustre.1.8.0smp #1
Oct  6 02:56:18 mint1502 kernel: RIP: 0010:[]  
[] :obdclass:lustre_hash_for_each_empty+0x1f0/0x290
Oct  6 02:56:18 mint1502 kernel: RSP: 0018:8104c02ad850  EFLAGS: 0206
Oct  6 02:56:18 mint1502 kernel: RAX: 810448dfd200 RBX: 328c 
RCX: ba75
Oct  6 02:56:18 mint1502 kernel: RDX: 5e7d RSI: 802f0d80 
RDI: c200109d78cc
Oct  6 02:56:18 mint1502 kernel: RBP: 8860c8f2 R08: 810001016e60 
R09: 
Oct  6 02:56:18 mint1502 kernel: R10: 8100b7a28500 R11: 0150 
R12: 810448dfd200
Oct  6 02:56:18 mint1502 kernel: R13: 0286 R14: 0286 
R15: 8104c02ad7f0
Oct  6 02:56:18 mint1502 kernel: FS:  2b49ff256220() 
GS:81051fc53d40() knlGS:
Oct  6 02:56:18 mint1502 kernel: CS:  0010 DS:  ES:  CR0: 
8005003b
Oct  6 02:56:18 mint1502 kernel: CR2: 003d82e985d0 CR3: 00201000 
CR4: 06e0
Oct  6 02:56:18 mint1502 kernel:
Oct  6 02:56:18 mint1502 kernel: Call Trace:
Oct  6 02:56:18 mint1502 kernel:  [] 
:obdclass:class_disconnect+0x378/0x400
Oct  6 02:56:18 mint1502 kernel:  [] 
:ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
Oct  6 02:56:18 mint1502 kernel:  [] 
:mds:mds_disconnect+0x121/0xe30
Oct  6 02:56:18 mint1502 kernel:  [] lock_timer_base+0x1b/0x3c
Oct  6 02:56:19 mint1502 kernel:  [] 
try_to_del_timer_sync+0x51/0x5a
Oct  6 02:56:19 mint1502 kernel:  [] 
:obdclass:class_fail_export+0x384/0x4c0
Oct  6 02:56:19 mint1502 kernel:  [] 
:mds:mds_readpage+0x1571/0x18a0
Oct  6 02:56:19 mint1502 kernel:  [] 
:ptlrpc:lustre_msg_set_limit+0x35/0xf0
Oct  6 02:56:19 mint1502 kernel:  [] 
default_wake_function+0x0/0xe
Oct  6 02:56:19 mint1502 kernel:  [] 
:mds:mds_handle+0x2140/0x4c80
Oct  6 02:56:19 mint1502 kernel:  [] __next_cpu+0x19/0x28
Oct  6 02:56:19 mint1502 kernel:  [] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct  6 02:56:19 mint1502 kernel:  [] 
:ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct  6 02:56:19 mint1

Re: [Lustre-discuss] strange performance with POSIX file capabilities

2009-10-06 Thread Andreas Dilger
On Oct 06, 2009  15:13 +0200, Ralf Utermann wrote:
> > with newer vanilla kernels we saw strange performance
> > data with iozone on patchless clients: some OSTs had a lower write
> > bandwidth in the iozone benchmark, getting worse with record
> sizes below 1024.
> After lots of kernel builds, it looks like the kernel config entry
> CONFIG_SECURITY_FILE_CAPABILITIES is the one which
> introduces this problem. If CONFIG_SECURITY_FILE_CAPABILITIES
> is not set, iozone data look good, if it's compiled into the
> kernel, we see the problem:
> http://www.physik.uni-augsburg.de/~ralfu/LustreTest/Lustre_with_file_caps.html

Just to clarify, you are reporting the above config option affects
write performance when changed on the client, correct?  It appears
that this option is off by default in the upstream kernels, so I
suspect it doesn't get tested much.

> Any idea, why file capabilities should affect the write
> performance on Lustre, and why it should only affect some OSTs?

I can imagine that if this is adding some significant overhead on a
per-system-call basis that it would hurt performance.

It is definitely odd that it would affect the performance of only some
of the OSTs.  I assume they are otherwise identical?  The only thing
I can imagine is that this option is related to SELinux and has some
overhead in getting extended attributes, but even then the xattrs are
only stored on the MDS so this would hurt all OSTs uniformly.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Read/Write performance problem

2009-10-06 Thread Andreas Dilger
On Oct 06, 2009  13:24 +0200, Michael Kluge wrote:
> We are running Lustre 1.6.5.1. The problem shows up when we read a
> shared file from multiple nodes that has just been written from the same
> set of nodes. 512 processes write a checkpoint (1.5 GB from each node)
> into a shared file by seeking to position RANK*1.5GB and writing 1.5GB
> in 1.44M chunks. Writing works fine and gives the full file system
> performance. The data is being written by using write() and no flags
> aside O_CREAT and O_WRONLY. If the checkpoint is written, the program is
> terminated and restarted and reads in the same portion of the file. For
> some reason this almost immediate reading of the same data that was just
> written on the same node is very slow. If we a) change the set of nodes
> or b) wait a day, we get the full read performance when we use the same
> executable and the same shared file. 
> 
> Is there a reason why an immediate read after a write on the same node
> from/to a shared file is slow? Is there any additional communication,
> e.g. is the client flushing the buffer cache before the first read? The
> statistics show that the average time to complete a 1.44MB read request
> is increasing during the runtime of our program. At some point it hits
> an upper limit or a saturation point and stays there. Is there some kind
> of queue or something that is getting full in this kind of
> write/read scenario? Maybe something tunable in /proc/fs/lustre?

One possible issue is that you don't have enough extra RAM to cache 1.5GB
of the checkpoint, so during the write it is being flushed to the OSTs
and evicted from cache.  When you immediately restart there is still dirty
data being written from the clients that is contending with the reads to
restart.

As a general rule, avoiding unnecessary IO (i.e. reading back data that
was just written) reduces the time that the application is not doing
useful work (i.e. computing).


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Brian J. Murrell
On Tue, 2009-10-06 at 17:13 +0200, Michael Schwartzkopff wrote:
> 
> Thanks for your fast reply. I think # 20020 is the one we hit.

Certainly seems so.

> Waiting for a solution.

There are patches landed for that bug, but it would appear that it's
been reopened.  You could CC yourself to that bug to follow progress.

b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
On Tuesday, 6 October 2009 at 17:08:44, Brian J. Murrell wrote:
> On Tue, 2009-10-06 at 17:01 +0200, Michael Schwartzkopff wrote:
> > Here is some additional from the logs. Any ideas about that?
> >
> > Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
> > (pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount
> > > segment) failed
>
> Here's the failed assertion.
>
> > Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
> > (pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
>
> Which always leads to an LBUG which is what is putting the thread to
> sleep.
>
> Any time you see an LBUG in a server log file, you need to reboot the
> server.
>
> So now you need to take that ASSERTION message to our bugzilla and see
> if you can find a bug for it already, and if not, file a new one, please.
>
> Cheers,
> b.

Thanks for your fast reply. I think # 20020 is the one we hit.
Waiting for a solution.
Greetings,
-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Brian J. Murrell
On Tue, 2009-10-06 at 17:01 +0200, Michael Schwartzkopff wrote:
> 
> Here is some additional from the logs. Any ideas about that?
> 
> Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
> (pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount > 
> segment) failed

Here's the failed assertion.

> Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
> (pack_generic.c:655:lustre_shrink_reply_v2()) LBUG

Which always leads to an LBUG which is what is putting the thread to
sleep.

Any time you see an LBUG in a server log file, you need to reboot the
server.

So now you need to take that ASSERTION message to our bugzilla and see
if you can find a bug for it already, and if not, file a new one, please.

Cheers,
b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
On Tuesday, 6 October 2009 at 16:22:08, Brian J. Murrell wrote:
> On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
> > Hi,
>
> Hi,
>
> > my system load shows that quite a number of processes are waiting.
>
> Blocked.  I guess the word waiting is similar.
>
> > My questions are:
> > What causes the problem?
>
> In this case, the thread has lbugged previously.
>
> If you look in the syslog for the node with these processes you should find
> entries with LBUG and/or ASSERTION messages.  These are the defects that
> are causing the processes to get blocked (uninterruptible sleep)
(...)

Here is some additional from the logs. Any ideas about that?

Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
(pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount > 
segment) failed
Oct  5 10:26:43 sosmds2 kernel: LustreError: 30617:0:
(pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
Oct  5 10:26:43 sosmds2 kernel: Lustre: 30617:0:(linux-
debug.c:264:libcfs_debug_dumpstack()) showing stack for process 30617
Oct  5 10:26:43 sosmds2 kernel: ll_mdt_47 R  running task   0 30617 
 
1 30618 30616 (L-TLB)
Oct  5 10:26:43 sosmds2 kernel:   0001 
000714a28100 0001
Oct  5 10:26:43 sosmds2 kernel:  0001 0086 
0012 8102212dfe88
Oct  5 10:26:43 sosmds2 kernel:  0001  
802f6aa0 
Oct  5 10:26:43 sosmds2 kernel: Call Trace:
Oct  5 10:26:43 sosmds2 kernel:  [] 
autoremove_wake_function+0x9/0x2e
Oct  5 10:26:43 sosmds2 kernel:  [] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [] vprintk+0x2cb/0x317
Oct  5 10:26:43 sosmds2 kernel:  [] kallsyms_lookup+0xc2/0x17b
Oct  5 10:26:43 sosmds2 last message repeated 3 times
Oct  5 10:26:43 sosmds2 kernel:  [] printk_address+0x9f/0xab
Oct  5 10:26:43 sosmds2 kernel:  [] printk+0x8/0xbd
Oct  5 10:26:43 sosmds2 kernel:  [] printk+0x52/0xbd
Oct  5 10:26:43 sosmds2 kernel:  [] 
module_text_address+0x33/0x3c
Oct  5 10:26:43 sosmds2 kernel:  [] 
kernel_text_address+0x1a/0x26
Oct  5 10:26:43 sosmds2 kernel:  [] dump_trace+0x211/0x23a
Oct  5 10:26:43 sosmds2 kernel:  [] show_trace+0x34/0x47
Oct  5 10:26:43 sosmds2 kernel:  [] _show_stack+0xdb/0xea
Oct  5 10:26:43 sosmds2 kernel:  [] 
:libcfs:lbug_with_loc+0x7a/0xd0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:libcfs:tracefile_init+0x0/0x110
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
Oct  5 10:26:43 sosmds2 kernel:  [] 
:mds:mds_getattr_lock+0xc59/0xce0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:lustre_msg_add_version+0x34/0x110
Oct  5 10:26:43 sosmds2 kernel:  [] 
:lnet:lnet_ni_send+0x93/0xd0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:lnet:lnet_send+0x973/0x9a0
Oct  5 10:26:43 sosmds2 kernel:  [] 
cache_alloc_refill+0x106/0x186
Oct  5 10:26:43 sosmds2 kernel:  [] 
:mds:fixup_handle_for_resent_req+0x5a/0x2c0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:mds:mds_intent_policy+0x636/0xc10
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ldlm_lock_enqueue+0x186/0xb30
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ldlm_export_lock_get+0x6f/0xe0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:obdclass:lustre_hash_add+0x218/0x2e0
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
Oct  5 10:26:43 sosmds2 kernel:  [] 
:mds:mds_handle+0x4080/0x4cb0
Oct  5 10:26:43 sosmds2 kernel:  [] __next_cpu+0x19/0x28
Oct  5 10:26:43 sosmds2 kernel:  [] 
find_busiest_group+0x20d/0x621
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct  5 10:26:43 sosmds2 kernel:  [] enqueue_task+0x41/0x56
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Oct  5 10:26:43 sosmds2 kernel:  [] thread_return+0x62/0xfe
Oct  5 10:26:43 sosmds2 kernel:  [] __wake_up_common+0x3e/0x68
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_main+0x1218/0x13e0
Oct  5 10:26:43 sosmds2 kernel:  [] 
default_wake_function+0x0/0xe
Oct  5 10:26:43 sosmds2 kernel:  [] 
audit_syscall_exit+0x327/0x342
Oct  5 10:26:43 sosmds2 kernel:  [] child_rip+0xa/0x11
Oct  5 10:26:43 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_main+0x0/0x13e0
Oct  5 10:26:43 sosmds2 kernel:  [] child_rip+0x0/0x11
Oct  5 10:26:43 sosmds2 kernel:
Oct  5 10:26:43 sosmds2 kernel: LustreError: dumping log to /tmp/lustre-
log.1254731203.30617


-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter J

Re: [Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Brian J. Murrell
On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
> Hi,

Hi,

> my system load shows that quite a number of processes are waiting.

Blocked.  I guess the word waiting is similar.

> My questions are:
> What causes the problem?

In this case, the thread has lbugged previously.

If you look in the syslog for the node with these processes you should find
entries with LBUG and/or ASSERTION messages.  These are the defects that
are causing the processes to get blocked (uninterruptible sleep)

> Can I kill the "hanging" processes?

Nope.  You have to reboot the node.

Please search bugzilla for the LBUG/ASSERTIONs you are getting and if
you don't find anything that matches, please file a new bug.

> Oct  5 10:28:03 sosmds2 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) 
> Watchdog 
> triggered for pid 28402: it was inactive for 200.00s
> Oct  5 10:28:03 sosmds2 kernel: ll_mdt_35 D 81000100c980 0 28402  
> 
> 1 28403 28388 (L-TLB)
> Oct  5 10:28:03 sosmds2 kernel:  81041c723810 0046 
>  7fff
> Oct  5 10:28:03 sosmds2 kernel:  81041c7237d0 0001 
> 81022f3e60c0 81022f12e080
> Oct  5 10:28:03 sosmds2 kernel:  000177b2feff847c 14df 
> 81022f3e62a8 0001028f
> Oct  5 10:28:03 sosmds2 kernel: Call Trace:
> Oct  5 10:28:03 sosmds2 kernel:  [] 
> default_wake_function+0x0/0xe
> Oct  5 10:28:03 sosmds2 kernel:  [] 
> :libcfs:lbug_with_loc+0xc6/0xd0

Here's where you can see that the thread has lbugged.

b.



signature.asc
Description: This is a digitally signed message part
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] FW: Soft CPU Lockup

2009-10-06 Thread Hendelman, Rob
Correction below:  The stack dumps once a minute or so started at 12:40.
I rebooted client1 at 13:13.

Sorry for the confusion.

Rob

-Original Message-
From: Hendelman, Rob 
Sent: Tuesday, October 06, 2009 8:15 AM
To: 'lustre-discuss@lists.lustre.org'
Subject: Soft CPU Lockup

Hello Mr. Drokin,

Thank you for your prior response.

There was a client eviction just prior to the threads hanging and eating
100%, but NOT prior to the OSS finally dropping cpu usage again.
 
Here is a basic timeline(in hours:min "military" time)

09:07am->12:39:  Client "6" which was cloned from "client1" is being
worked on, rebooted, and connected/disconnected from the lustre servers.
No issues
12:39: final OSS message that says "haven't heard from 
in 240 seconds, I think it's dead and I'm evicting it."
12:40: what appear to be stack dumps on the OSS server for 2 i/o threads
(previously mentioned) 
12:44: client1 has lost its Lustre mounts and is complaining in Nagios.
All other clients are fine.
13:13:  "stack dumps" once a minute or so, but no LBUG.  I leave the
server up and finally reboot client1.  The other clients2-5 are not
affected.  All other clients seem to be working normally so I don't
touch the OSS.
14:10: Final messages on OSS before OSS calms down (no messages after
this)

Oct  5 14:10:56 maglustre04 kernel: 
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Request
x6413848 took longer than estimated (100+5495s); client may timeout.
  r...@81009308c400 x6413848/t0
o101->1b9e4991-1d5e-814d-2607-8c52f432e68d@:0/0 lens 232/288 e 0 to 0 dl
1254764364 ref 1 fl Complete:/0/0 rc 301/301
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13421:0:(watchdog.c:330:lcw_update_time()) Expired watchdog for pid
13421 disabled after 5595.8041s
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(service.c:1317:ptlrpc_server_handle_request()) Skipped 1
previous similar message
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(watchdog.c:330:lcw_update_time()) Expired watchdog for pid
13366 disabled after 5595.8059s

Should I file a new bug?  Is there enough info in /var/log/messages to
file a bug or do I need to turn on some sort of more verbose debugging
in case this happens again?
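
One way to capture more detail if it recurs (a sketch, assuming the 1.6/1.8
debug interface; the dump path is arbitrary) would be to raise the Lustre
debug mask on the OSS beforehand and dump the kernel debug buffer afterwards:

  echo -1 > /proc/sys/lnet/debug      # enable all debug flags (very verbose)
  lctl dk /tmp/lustre-debug.txt       # dump and clear the debug buffer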

Thanks,

Robert

The information contained in this message and its attachments 
is intended only for the private and confidential use of the 
intended recipient(s).  If you are not the intended recipient 
(or have received this e-mail in error) please notify the 
sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-
mail is strictly prohibited.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Soft CPU Lockup

2009-10-06 Thread Hendelman, Rob
Hello Mr. Drokin,

Thank you for your prior response.

There was a client eviction just prior to the threads hanging and eating
100%, but NOT prior to the OSS finally dropping cpu usage again.
 
Here is a basic timeline(in hours:min "military" time)

09:07am->12:39:  Client "6" which was cloned from "client1" is being
worked on, rebooted, and connected/disconnected from the lustre servers.
No issues
12:39: final OSS message that says "haven't heard from 
in 240 seconds, I think it's dead and I'm evicting it."
12:40: what appear to be stack dumps on the OSS server for 2 i/o threads
(previously mentioned) 
12:44: client1 has lost its Lustre mounts and is complaining in Nagios.
All other clients are fine.
13:13:  "stack dumps" once a minute or so, but no LBUG.  I leave the
server up and finally reboot client1.  The other clients2-5 are not
affected.  All other clients seem to be working normally so I don't
touch the OSS.
14:10: Final messages on OSS before OSS calms down (no messages after
this)

Oct  5 14:10:56 maglustre04 kernel: 
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Request
x6413848 took longer than estimated (100+5495s); client may timeout.
  r...@81009308c400 x6413848/t0
o101->1b9e4991-1d5e-814d-2607-8c52f432e68d@:0/0 lens 232/288 e 0 to 0 dl
1254764364 ref 1 fl Complete:/0/0 rc 301/301
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13421:0:(watchdog.c:330:lcw_update_time()) Expired watchdog for pid
13421 disabled after 5595.8041s
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(service.c:1317:ptlrpc_server_handle_request()) Skipped 1
previous similar message
Oct  5 14:10:59 maglustre04 kernel: Lustre:
13366:0:(watchdog.c:330:lcw_update_time()) Expired watchdog for pid
13366 disabled after 5595.8059s

Should I file a new bug?  Is there enough info in /var/log/messages to
file a bug or do I need to turn on some sort of more verbose debugging
in case this happens again?

Thanks,

Robert

The information contained in this message and its attachments 
is intended only for the private and confidential use of the 
intended recipient(s).  If you are not the intended recipient 
(or have received this e-mail in error) please notify the 
sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-
mail is strictly prohibited.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] strange performance with POSIX file capabilities

2009-10-06 Thread Ralf Utermann
Dear list,

with newer vanilla kernels we saw strange performance
data with iozone on patchless clients: some OSTs had a lower write
bandwidth in the iozone benchmark, getting worse with record
sizes below 1024.
After lots of kernel builds, it looks like the kernel config entry
CONFIG_SECURITY_FILE_CAPABILITIES is the one which
introduces this problem. If CONFIG_SECURITY_FILE_CAPABILITIES
is not set, iozone data look good, if it's compiled into the
kernel, we see the problem:
http://www.physik.uni-augsburg.de/~ralfu/LustreTest/Lustre_with_file_caps.html
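
For anyone wanting to check their own clients, the option can usually be read
back from the installed kernel config (assuming the distro ships one; the
path may vary):

  grep CONFIG_SECURITY_FILE_CAPABILITIES /boot/config-$(uname -r)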

Any idea, why file capabilities should affect the write
performance on Lustre, and why it should only affect some OSTs?

bye, Ralf
-- 
Ralf Utermann
_
Universität Augsburg, Institut für Physik   --   EDV-Betreuer
Universitätsstr.1 
D-86135 Augsburg Phone:  +49-821-598-3231
SMTP: ralf.uterm...@physik.uni-augsburg.de Fax: -3411
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Read/Write performance problem

2009-10-06 Thread Michael Kluge
Hi all,

our Lustre FS shows an interesting performance problem which I'd like to
discuss as some of you might have seen this kind of things before and
maybe someone has a quick explanation of what's going on.

We are running Lustre 1.6.5.1. The problem shows up when we read a
shared file from multiple nodes that has just been written from the same
set of nodes. 512 processes write a checkpoint (1.5 GB from each node)
into a shared file by seeking to position RANK*1.5GB and writing 1.5GB
in 1.44M chunks. Writing works fine and gives the full file system
performance. The data is being written by using write() and no flags
besides O_CREAT and O_WRONLY. Once the checkpoint is written, the program is
terminated and restarted, and it reads back the same portion of the file. For
some reason this almost immediate reading of the same data that was just
written on the same node is very slow. If we a) change the set of nodes
or b) wait a day, we get the full read performance when we use the same
executable and the same shared file. 
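
For reference, the per-rank access pattern is roughly the following (a sketch
only; /lustre/ckpt is a made-up path, RANK would come from the MPI rank, and
1066 x 1.44 MB only approximates the 1.5 GB region):

  RANK=0                          # set per process
  # write phase: each rank writes ~1.5 GB at offset RANK*1.5GB in 1.44 MB chunks
  dd if=/dev/zero of=/lustre/ckpt bs=1440k count=1066 \
     seek=$((RANK*1066)) conv=notrunc
  # restart phase: the same rank immediately reads the same region back
  dd if=/lustre/ckpt of=/dev/null bs=1440k count=1066 skip=$((RANK*1066))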

Is there a reason why an immediate read after a write on the same node
from/to a shared file is slow? Is there any additional communication,
e.g. is the client flushing the buffer cache before the first read? The
statistics show that the average time to complete a 1.44MB read request
is increasing during the runtime of our program. At some point it hits
an upper limit or a saturation point and stays there. Is there some kind
of queue or something that is getting full in this kind of
write/read scenario? Maybe something tunable in /proc/fs/lustre?
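
The /proc entries that seem most relevant to this kind of question (paths as
on 1.6/1.8, worth double-checking; listed here as candidates only) are along
the lines of:

  cat /proc/fs/lustre/osc/*/rpc_stats            # per-OSC pages-per-RPC / in-flight histograms
  cat /proc/fs/lustre/llite/*/read_ahead_stats   # client read-ahead hits/misses
  cat /proc/fs/lustre/osc/*/max_dirty_mb         # per-OSC dirty-cache limit (tunable)
  cat /proc/fs/lustre/llite/*/max_read_ahead_mb  # client read-ahead window (tunable)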


Regards, Michael


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:(+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW:http://www.tu-dresden.de/zih


smime.p7s
Description: S/MIME cryptographic signature
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Question about sleeping processes

2009-10-06 Thread Michael Schwartzkopff
Hi,

my system load shows that quite a number of processes are waiting. ps shows me 
the same number of processes in state D (uninterruptible sleep). All processes 
are ll_mdt_NN, where NN is a decimal number.

In the logs I find the entry ( see log below).

My questions are:
What causes the problem?
Can I kill the "hanging" processes?

System: Lustre 1.8.1 on RHEL5.3

thanks for any hints.

---

Oct  5 10:28:03 sosmds2 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog 
triggered for pid 28402: it was inactive for 200.00s
Oct  5 10:28:03 sosmds2 kernel: ll_mdt_35 D 81000100c980 0 28402
  
1 28403 28388 (L-TLB)
Oct  5 10:28:03 sosmds2 kernel:  81041c723810 0046 
 7fff
Oct  5 10:28:03 sosmds2 kernel:  81041c7237d0 0001 
81022f3e60c0 81022f12e080
Oct  5 10:28:03 sosmds2 kernel:  000177b2feff847c 14df 
81022f3e62a8 0001028f
Oct  5 10:28:03 sosmds2 kernel: Call Trace:
Oct  5 10:28:03 sosmds2 kernel:  [] 
default_wake_function+0x0/0xe
Oct  5 10:28:03 sosmds2 kernel:  [] 
:libcfs:lbug_with_loc+0xc6/0xd0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:libcfs:tracefile_init+0x0/0x110
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
Oct  5 10:28:03 sosmds2 kernel:  [] 
:mds:mds_getattr_lock+0xc59/0xce0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:lustre_msg_add_version+0x34/0x110
Oct  5 10:28:03 sosmds2 kernel:  [] 
:lnet:lnet_ni_send+0x93/0xd0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:lnet:lnet_send+0x973/0x9a0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:mds:fixup_handle_for_resent_req+0x5a/0x2c0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:mds:mds_intent_policy+0x636/0xc10
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ldlm_lock_enqueue+0x186/0xb30
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ldlm_export_lock_get+0x6f/0xe0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:obdclass:lustre_hash_add+0x218/0x2e0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
Oct  5 10:28:03 sosmds2 kernel:  [] 
:mds:mds_handle+0x4080/0x4cb0
Oct  5 10:28:03 sosmds2 kernel:  [] 
:lvfs:lprocfs_counter_sub+0x57/0x90
Oct  5 10:28:03 sosmds2 kernel:  [] __next_cpu+0x19/0x28
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct  5 10:28:03 sosmds2 kernel:  [] enqueue_task+0x41/0x56
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Oct  5 10:28:03 sosmds2 kernel:  [] lock_timer_base+0x1b/0x3c
Oct  5 10:28:03 sosmds2 kernel:  [] __wake_up_common+0x3e/0x68
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_main+0x1218/0x13e0
Oct  5 10:28:03 sosmds2 kernel:  [] 
default_wake_function+0x0/0xe
Oct  5 10:28:03 sosmds2 kernel:  [] 
audit_syscall_exit+0x327/0x342
Oct  5 10:28:03 sosmds2 kernel:  [] child_rip+0xa/0x11
Oct  5 10:28:03 sosmds2 kernel:  [] 
:ptlrpc:ptlrpc_main+0x0/0x13e0
Oct  5 10:28:03 sosmds2 kernel:  [] child_rip+0x0/0x11


-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre voltaire configuration

2009-10-06 Thread Aielli Roberto
Hi Dennis,
I want to test both configurations: IPoIB first and native RDMA afterwards.
I've reloaded the modules after the modification, but probably some more
configuration needs to be done.
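
A reload/verification sequence along these lines (a sketch; lustre_rmmod and
lctl are the tools shipped with 1.8, and the mount point is a placeholder)
should confirm whether the new network is active on a node:

  umount /mnt/lustre        # unmount any Lustre targets/clients on the node first
  lustre_rmmod              # unload all Lustre/LNET modules
  modprobe lustre           # reload with the new 'options lnet ...' line
  lctl list_nids            # should now show the ib0 address with @o2ib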
   
Thanks, Roberto

Dennis Nelson wrote:
> On 10/5/09 7:58 AM, "Aielli Roberto"  wrote:
>
>   
>> Hi,
>> I'm trying to configure Lustre 1.8.1 with a Voltaire infiniband network.
>> On the MGS, MDS and OSSs I have two interfaces: eth1 and ib0. I've
>> successfully  completed a test using eth1 so I've mounted a filesystem
>> on client node. Now I want to do the same thing with Voltaire infiniband
>> (ib0) modifying the modprobe.conf on both servers an clients with the line:
>>
>> options lnet networks=tcp(ib0)
>> 
>
> This would use IPoIB, not native InfiniBand.  Is that what you want?  To do
> native InfiniBand, assuming you are using OFED, you need to specify:
>
> options lnet networks=o2ib(ib0)
>
> Did you unload/reload the Lustre modules after changing modprobe.conf.local?
> If not, it would not recognize the changes in modprobe.conf.local.
>
>   
>> When I try to mount the FS on the client node nothing happen and I find
>> the following error in the syslog:
>>
>> Oct  5 13:08:12 xc264 kernel: Lustre:
>> 5468:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -101 connecting
>> 0.0.0.0/1023 -> 172.31.1.25/988
>>
>> Oct  5 13:08:12 xc264 kernel: Lustre:
>> 5468:0:(acceptor.c:95:lnet_connect_console_error()) Connection to
>> 172.31.1...@tcp at host 172.31.1.25 was unreachable: the network or that node
>> may be down, or
>>
>>  Lustre may be misconfigured.
>>
>> Oct  5 13:08:12 xc264 kernel: Lustre:
>> 5468:0:(socklnd_cb.c:421:ksocknal_txlist_done()) Deleting packet type 1 len
>> 368 172.31.65...@tcp->172.31.1...@tcp
>>
>> Oct  5 13:08:17 xc264 kernel: Lustre:
>> 5474:0:(client.c:1383:ptlrpc_expire_one_request()) @@@ Request
>> x1315690795499541 sent from lustre-MDT-mdc-81021f97e400 to NID
>> 172.31.1...@tcp 5s ago ha
>>
>> s timed out (limit 5s).
>>
>> Oct  5 13:08:17 xc264 kernel:   r...@81021828fc00 x1315690795499541/t0
>> o38->lustre-mdt_u...@172.31.1.25@tcp:12/10 lens 368/584 e 0 to 1 dl
>> 1254740897 ref 1 fl Rpc:N/0/0 rc 0/0
>>
>>
>> The main problem is displayed on the first line. The MGS ib0 address is
>> 172.31.65.25, but as you can see the client always tries to connect to the
>> eth1 address (172.31.1.25), even after shutting down eth1.
>>
>> Placing a line with vib(ib0) in modprobe.conf gives a different problem.
>> I've also tried to modify modprobe.conf by changing the options line to:
>>
>> options lnet networks=vib(ib0)
>>
>>
>> but in the syslog I've found:
>>
>> Oct  5 12:33:19 xc264 kernel: LustreError:
>> 4864:0:(api-ni.c:1043:lnet_startup_lndnis()) Can't load LND vib, module
>> kviblnd
>>
>>
>> Are there other configurations for IB that I forgot to make/modify?
>>
>> Thanks in advance for your help
>> ___
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
>
>
>  
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre voltaire configuration

2009-10-06 Thread Aielli Roberto
Megan,
After the writeconf and a few other configurations I've managed to make
it work with IPoIB.
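
The writeconf itself followed roughly the standard procedure (a sketch;
device paths are placeholders): with all targets unmounted, regenerate the
configuration logs on each server and then remount MDT, OSTs and clients in
that order:

  tunefs.lustre --writeconf /dev/<mdt-device>   # on the MDS
  tunefs.lustre --writeconf /dev/<ost-device>   # on each OSS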

Thanks, Roberto


Ms. Megan Larko wrote:
> Hopefully this is a silly question for you.
>
> When you changed your lustre set-up to use the ib0 in place of the
> eth1, did you do the updated writeconf command on the MDS/MDT and the
> OSS/OST to have them point to the new address?
>
> I'm kind of assuming you have but I thought I would ask the question.
>
> Cheers!
> megan
>
>   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss