Re: [Lustre-discuss] Newbie w/issues

2010-04-28 Thread Oleg Drokin
Hello!

On Apr 27, 2010, at 7:29 PM, Brian Andrus wrote:
 Apr 27 16:15:19 nas-0-1 kernel: LustreError: 
 4133:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-107)  
 r...@810669d35c50 x1334203739385128/t0 o400-?@?:0/0 lens 192/0 e 0 
 to 0 dl 1272410135 ref 1 fl Interpret:H/0/0 rc -107/0
 
 Any direction/insight would be most helpful.

That's way too late in the logs to see what happened, aside from the fact that
the server decided to evict some clients for some reason.
The interesting parts should be around where 'evicting' or 'timeout' was first
mentioned.
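
For example, something like this will surface the earliest occurrences rather
than the latest noise (a rough sketch; /var/log/messages is an assumption,
adjust for however your syslog is set up):

# on the MDS/OSS, find where evictions/timeouts first show up
grep -nE 'evict|timed out|timeout' /var/log/messages | head -20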

Bye,
Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Newbie w/issues

2010-04-28 Thread Cliff White
Brian Andrus wrote:
 Ok, I inherited a Lustre filesystem used on a cluster.
 
 I am seeing an issue where, on the frontend, I see all of /work.
 On the nodes, however, I only see SOME of the users' directories.

That's rather odd. The directory structure is all on the MDS, so
it's usually either all there, or not there. Are any of the user errors
permission-related? That's the only thing I can think of that would change
which directories one node sees vs. another.
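
A quick check along those lines (a sketch; 'someuser' and the /work paths are
placeholders) is to compare ownership and modes from the frontend and from an
affected node:

# run on the frontend and on a node that is missing directories, then compare
ls -ldn /work /work/someuser    # numeric uid/gid, in case of id-mapping differences
id someuser                     # confirm the uid/gid match on both hosts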
 
 Work consists of one MDT/MGS and 3 OSTs.
 The OSTs are LVMs served from a DDN via InfiniBand.
 
 Running the kernel modules/client on the nodes/frontend:
 lustre-client-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2
 lustre-client-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2
 
 On the OST/MDT servers:
 lustre-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2
 kernel-2.6.18-164.11.1.el5_lustre.1.8.2
 lustre-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2
 lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.2
 
 I have so many error messages in the logs, I am not sure which to sift 
 through for this issue.
 A quick tail on the MDT:
 =
 Apr 27 16:15:19 nas-0-1 kernel: LustreError: 
 4133:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error 
 (-107)  r...@810669d35c50 x1334203739385128/t0 o400-?@?:0/0 lens 
 192/0 e 0 to 0 dl 1272410135 ref 1 fl Interpret:H/0/0 rc -107/0
 Apr 27 16:15:19 nas-0-1 kernel: LustreError: 
 4133:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 419 previous 
 similar messages
 Apr 27 16:16:38 nas-0-1 kernel: LustreError: 
 4155:0:(handler.c:1518:mds_handle()) operation 400 on unconnected MDS 
 from 12345-10.1.255...@tcp
 Apr 27 16:16:38 nas-0-1 kernel: LustreError: 
 4155:0:(handler.c:1518:mds_handle()) Skipped 177 previous similar messages
 Apr 27 16:25:21 nas-0-1 kernel: LustreError: 
 6789:0:(mgs_handler.c:573:mgs_handle()) lustre_mgs: operation 400 on 
 unconnected MGS
 Apr 27 16:25:21 nas-0-1 kernel: LustreError: 
 6789:0:(mgs_handler.c:573:mgs_handle()) Skipped 229 previous similar 
 messages
 Apr 27 16:25:21 nas-0-1 kernel: LustreError: 
 6789:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error 
 (-107)  r...@810673a78050 x1334009404220652/t0 o400-?@?:0/0 lens 
 192/0 e 0 to 0 dl 1272410737 ref 1 fl Interpret:H/0/0 rc -107/0
 Apr 27 16:25:21 nas-0-1 kernel: LustreError: 
 6789:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 404 previous 
 similar messages
 Apr 27 16:26:41 nas-0-1 kernel: LustreError: 
 4173:0:(handler.c:1518:mds_handle()) operation 400 on unconnected MDS 
 from 12345-10.1.255...@tcp
 Apr 27 16:26:41 nas-0-1 kernel: LustreError: 
 4173:0:(handler.c:1518:mds_handle()) Skipped 181 previous similar messages
 =
 

The ENOTCONN (-107) points at server/network health. I would umount the
clients, verify server health, then verify LNET connectivity.
However, this would not explain the missing directories - in the absence
of other explanations, check the MDT with fsck. That's more of a
generically useful thing to do rather than something indicated by your data.
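
A minimal sketch of those checks (the NID and device name below are
placeholders; run e2fsck read-only first and only with the MDT unmounted):

# from a client: check LNET reachability of the MDS and each OSS
lctl ping 10.1.255.226@tcp

# on the MDS, with the MDT device unmounted:
e2fsck -fn /dev/sdX     # read-only report, no changes
# e2fsck -fp /dev/sdX   # repair pass, only after reviewing the -n output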

I would also look through older logs if available, and see if you can
find a point in time where things go bad. The first error is always the 
most useful.
 Any direction/insight would be most helpful.

Hope this helps
cliffw

 
 Brian Andrus
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Newbie w/issues

2010-04-28 Thread Oleg Drokin
Hello!

On Apr 27, 2010, at 9:38 PM, Brian Andrus wrote:

 Odd, I just went through the log on the MDT and basically it has been
 repeating those errors for over 24 hours (not spewing, but often enough).
 Only ONE other line, on an OST:

Each such message means there was an attempt to send a ping to this server from 
a client that the server does not recognize.

 Apr 26 06:59:45 nas-0-4 kernel: LustreError: 137-5: UUID 'work-OST_UUID' 
 is not available  for connect (no target)

This one tells you that a client tried to contact OST0, but that service is not
hosted on that node (or has not yet started up).
This might be a perfectly valid message if you have failover configured and this
node is currently the passive failover target for the service.
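
To see which targets a given server node has actually started (nothing
node-specific assumed here, just standard tools):

# on the server in question, e.g. nas-0-4:
lctl dl           # list the Lustre devices/targets this node is running
mount -t lustre   # show what is mounted as type lustre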

Bye,
Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Newbie w/issues

2010-04-28 Thread Andreas Dilger
This means that your OST is not available. Maybe it is not mounted?

Cheers, Andreas

On 2010-04-27, at 19:38, Brian Andrus toomuc...@gmail.com wrote:

 On 4/27/2010 6:10 PM, Oleg Drokin wrote:
 Hello!

 On Apr 27, 2010, at 7:29 PM, Brian Andrus wrote:

 Apr 27 16:15:19 nas-0-1 kernel: LustreError: 4133:0:(ldlm_lib.c: 
 1848:target_send_reply_msg()) @@@ processing error (-107)   
 r...@810669d35c50 x1334203739385128/t0 o400-?@?:0/0 lens  
 192/0 e 0 to 0 dl 1272410135 ref 1 fl Interpret:H/0/0 rc -107/0

 Any direction/insight would be most helpful.

 That's way too late in the logs to see what happened, aside from the fact
 that the server decided to evict some clients for some reason.
 The interesting parts should be around where 'evicting' or 'timeout' was
 first mentioned.

 Bye,
 Oleg
 Odd, I just went through the log on the MDT and basically it has been
 repeating those errors for over 24 hours (not spewing, but often
 enough). Only ONE other line, on an OST:

 Apr 26 06:59:45 nas-0-4 kernel: LustreError: 137-5: UUID
 'work-OST_UUID' is not available  for connect (no target)


 Brian

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] LBUG with 1.8.2 during rm

2010-04-28 Thread Patrick Winnertz
Hey,

after my test Lustre filesystem got quite full, I ran 'rm -rf *' on the
Lustre filesystem and got this error message:

[  135.094107] Lustre: mgc192.168@tcp: Reactivating import
[  135.094706] Lustre: Server MGS on device /dev/hda5 has started
[  137.827630] Lustre: spfs-MDT: temporarily refusing client connection 
from 192.168@tcp
[  137.828076] LustreError: 2100:0:(ldlm_lib.c:1848:target_send_reply_msg()) 
@@@ processing error (-11)  r...@decada00 x1333718444276272/t0 o38-?@?:0/0 
lens 368/0 e 0 to 0 dl 1272455883 ref 1 fl Interpret:/0/0 rc -11/0
[  155.830871] Lustre: spfs-MDT: temporarily refusing client connection 
from 192.168@tcp
[  155.831308] LustreError: 2099:0:(ldlm_lib.c:1848:target_send_reply_msg()) 
@@@ processing error (-11)  r...@de96b800 x1333718444276280/t0 o38-?@?:0/0 
lens 368/0 e 0 to 0 dl 1272455901 ref 1 fl Interpret:/0/0 rc -11/0
[  169.705870] Lustre: 2124:0:(mds_lov.c:1167:mds_notify()) MDS spfs-MDT: 
add target spfs-OST_UUID
[  169.770049] Lustre: 2052:0:(mds_lov.c:1203:mds_notify()) MDS spfs-MDT: 
in recovery, not resetting orphans on spfs-OST_UUID
[  169.802342] Lustre: spfs-mdtlov.lov: set parameter stripesize=1048576
[  173.866361] LustreError: 2103:0:(mds_open.c:1666:mds_close()) @@@ no handle 
for file close ino 1738772: cookie 0x5a629f5fe2dc51f1  r...@ded9f600 
x1333718444266421/t0 o35-51740db3-b37e-ec7c-ab23-9e7365d70fab@:0/0 lens 
408/528 e 0 to 0 dl 1272456350 ref 2 fl Interpret:/4/0 rc 0/0
[  173.867518] LustreError: 2103:0:(ldlm_lib.c:1848:target_send_reply_msg()) 
@@@ processing error (-116)  r...@ded9f600 x1333718444266421/t0 o35-51740db3-
b37e-ec7c-ab23-9e7365d70fab@:0/0 lens 408/432 e 0 to 0 dl 1272456350 ref 2 fl 
Interpret:/4/0 rc -116/0
[  174.635986] LustreError: 2104:0:(mds_open.c:1666:mds_close()) @@@ no handle 
for file close ino 1671195: cookie 0x5a629f5fe2dc47a2  r...@dfac4c00 
x1333718444266751/t0 o35-51740db3-b37e-ec7c-ab23-9e7365d70fab@:0/0 lens 
408/528 e 0 to 0 dl 1272455836 ref 2 fl Interpret:/4/0 rc 0/0
[  174.637154] LustreError: 2104:0:(mds_open.c:1666:mds_close()) Skipped 2 
previous similar messages
[  176.182138] LustreError: 2103:0:(mds_open.c:1666:mds_close()) @@@ no handle 
for file close ino 893277: cookie 0x5a629f5fe2dc22e2  r...@dfac9400 
x1333718444267408/t0 o35-51740db3-b37e-ec7c-ab23-9e7365d70fab@:0/0 lens 
408/528 e 0 to 0 dl 1272455838 ref 2 fl Interpret:/4/0 rc 0/0
[  176.183281] LustreError: 2103:0:(mds_open.c:1666:mds_close()) Skipped 4 
previous similar messages
[  177.489667] LustreError: 2099:0:(mds_reint.c:1772:mds_orphan_add_link()) 
ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
[  177.490214] LustreError: 2099:0:(mds_reint.c:1772:mds_orphan_add_link()) 
LBUG
[  177.490559] Pid: 2099, comm: ll_mdt_00
[  177.490759] 
[  177.490760] Call Trace:
[  177.491067]  [e0d0a7a8] libcfs_debug_dumpstack+0x58/0x80 [libcfs]
[  177.491423]  [e0d0aedd] lbug_with_loc+0x6d/0xc0 [libcfs]
[  177.491779]  [e13a30b5] mds_orphan_add_link+0xd85/0xd90 [mds]
[  177.492162]  [e0e9c5c4] __ldiskfs_journal_stop+0x24/0x50 
[ldiskfs]
[  177.492527]  [e13b6bcf] mds_reint_unlink+0x1e8f/0x3b80 [mds]
[  177.492867]  [e13a2093] mds_reint_rec+0x133/0x3d0 [mds]
[  177.493189]  [e138e239] mds_reint+0x229/0x740 [mds]
[  177.493583]  [e15a6d44] lustre_msg_get_flags+0x104/0x200 [ptlrpc]
[  177.493941]  [e1399599] mds_handle+0x17c9/0xa180 [mds]
[  177.494243]  [c02d32ae] _spin_lock+0x5/0x7
[  177.494505]  [c02d3204] _spin_lock_irqsave+0x23/0x29
[  177.494801]  [c012d2fb] lock_timer_base+0x19/0x35
[  177.495081]  [c012d485] __mod_timer+0xc0/0xc9
[  177.495392]  [e15a5cfc] lustre_msg_get_transno+0x10c/0x210 
[ptlrpc]
[  177.495738]  [c02d32ae] _spin_lock+0x5/0x7
[  177.496046]  [e1546ef2] 
target_queue_recovery_request+0xaf2/0x1750 [ptlrpc]
[  177.496428]  [c0129b11] __do_softirq+0x143/0x16b
[  177.496707]  [c02d32ae] _spin_lock+0x5/0x7
[  177.496980]  [e139a783] mds_handle+0x29b3/0xa180 [mds]
[  177.497282]  [c026d23f] net_rx_action+0xa4/0x1be
[  177.497558]  [c0129b11] __do_softirq+0x143/0x16b
[  177.497879]  [e15a5104] lustre_msg_get_conn_cnt+0x104/0x200 
[ptlrpc]
[  177.498238]  [c0104363] common_interrupt+0x23/0x28
[  177.498566]  [e15b4ce6] ptlrpc_update_export_timer+0x56/0x670 
[ptlrpc]
[  177.498972]  [e15b4a86] ptlrpc_check_req+0x16/0x220 [ptlrpc]
[  177.499305]  [e0b1558b] lprocfs_counter_add+0x5b/0x150 [lvfs]
[  177.499673]  [e15b8959] ptlrpc_server_handle_request+0xb29/0x1d90 
[ptlrpc]
[  177.500065]  [c011a5fb] enqueue_task+0x52/0x5d
[  177.500336]  [c012048a] try_to_wake_up+0x15c/0x165
[  177.500637]  [e0d1750b] lc_watchdog_touch+0x9b/0x270 [libcfs]
[  177.501675]  [e0d169b1] lc_watchdog_disable+0x81/0x260 [libcfs]
[  177.502052]  [e15bc277] 

Re: [Lustre-discuss] LBUG with 1.8.2 during rm

2010-04-28 Thread Johann Lombardi
On Wed, Apr 28, 2010 at 01:50:13PM +0200, Patrick Winnertz wrote:
 [  177.489667] LustreError: 2099:0:(mds_reint.c:1772:mds_orphan_add_link()) 
 ASSERTION(inode->i_nlink == 2) failed: dir nlink == 1
 [  177.490214] LustreError: 2099:0:(mds_reint.c:1772:mds_orphan_add_link()) 
 LBUG

This is a known problem with open-unlinked directories in 1.8.2.
There is a fix attached to bug 22177.

Johann
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-28 Thread Janne Aho
Thought I would say thanks for all the input you have given, and I'm 
sorry for misspelling LustreFS.

As we are completely green when it comes to any form of cluster file 
system, we have to consider all the facts before we definitely know 
what we should do and how much time we have to budget to build up a 
system, so that we can see whether we can manage it by ourselves or if we 
would need to bring in external help.

Do you dare to estimate how long it would take to set up and tune a 
system that offers 40TB of storage? The main purpose is to use the storage 
to store VM images which are used by KVM. Let us assume that we use 
hardware RAID6. Today there would be ~20 clients using the storage.

I know it's a really difficult question to answer, especially when I'm not 
saying how many machines there will be and so on (frankly, we don't know 
how it will end up), but I'm not really looking for how many minutes it 
will take, more roughly whether it will be something that may take 
a few days, or rather many weeks or months...


Thanks in advance for any replies.

-- 
Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
Phone: +46 455 690022 | Cell: +46 733 312775
EMail/MSN: ja...@citynetwork.se
ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS inode allocation question

2010-04-28 Thread Gary Molenkamp

Thanks for the details on the inode number, but I'm still having an issue:
I'm not getting the number of inodes I expected from the MDS creation, though I
suspect it's not a reporting error from lfs.

When I created the MDS, I specified '-i 1024' and I can see (locally)
800M inodes, but only part of the available space is allocated.  Also,
when the client mounts the filesystem, the MDS only has 400M blocks
available:

gulfwork-MDT_UUID  430781784  500264  387274084  0%  /gulfwork[MDT:0]

As we were creating files for testing, I saw that each inode allocation
on the MDS was consuming 4k of space, so even though I have 800M inodes
available on the actual MDS partition, it appears that the space
available was only allowing 100M inodes in the Lustre fs.  Am I
understanding that correctly?

I tried to force the MDS creation to use a smaller size per inode but
that produced an error:

mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024 -I
1024' --reformat --failnode=10.18.12.1 /dev/sda
...
   mke2fs: inode_size (1024) * inodes_count (860148736) too big for a
filesystem with 215037184 blocks, specify higher inode_ratio
(-i) or lower inode count (-N).
...

yet the actual drive has many more blocks available:

SCSI device sda: 1720297472 512-byte hdwr sectors (880792 MB)

Is this ext4 setting the block size limit?


FYI, I am using:
  lustre-1.8.2-2.6.18_164.11.1.el5-ext4_lustre.1.8.2.x86_64.rpm
  lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5-ext4_lustre.1.8.2.x86_64.rpm
  e2fsprogs-1.41.6.sun1-0redhat.rhel5.x86_64.rpm




-- 
Gary Molenkamp  SHARCNET
Systems Administrator   University of Western Ontario
g...@sharcnet.ca    http://www.sharcnet.ca
(519) 661-2111 x88429   (519) 661-4000
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-28 Thread Stuart Midgley
At most a day.  Most of that day will be inserting the DVDs with CentOS into 
the machines and installing :)

The mkfs and mount of lustre will take a total of about an hour.
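
For scale, the whole format-and-mount sequence for a small 1.8 setup is only a
handful of commands (a sketch with made-up device names and NIDs, not a tuned
production recipe):

# combined MGS+MDT on one node
mkfs.lustre --fsname=work --mgs --mdt /dev/sdb
mount -t lustre /dev/sdb /mnt/mdt

# each OST, pointing at the MGS NID
mkfs.lustre --fsname=work --ost --mgsnode=10.0.0.1@tcp /dev/sdc
mount -t lustre /dev/sdc /mnt/ost0

# clients
mount -t lustre 10.0.0.1@tcp:/work /mnt/work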


-- 
Dr Stuart Midgley
sdm...@gmail.com



On 28/04/2010, at 21:03 , Janne Aho wrote:

 Thought I would say thanks for all the input you have given, and I'm 
 sorry for misspelling LustreFS.
 
 As we are completely green when it comes to any form of cluster file 
 system, we have to consider all the facts before we definitely know 
 what we should do and how much time we have to budget to build up a 
 system, so that we can see whether we can manage it by ourselves or if we 
 would need to bring in external help.
 
 Do you dare to estimate how long it would take to set up and tune a 
 system that offers 40TB of storage? The main purpose is to use the storage 
 to store VM images which are used by KVM. Let us assume that we use 
 hardware RAID6. Today there would be ~20 clients using the storage.
 
 I know it's a really difficult question to answer, especially when I'm not 
 saying how many machines there will be and so on (frankly, we don't know 
 how it will end up), but I'm not really looking for how many minutes it 
 will take, more roughly whether it will be something that may take 
 a few days, or rather many weeks or months...
 
 
 Thanks in advance for any replies.
 
 -- 
 Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
 Phone: +46 455 690022 | Cell: +46 733 312775
 EMail/MSN: ja...@citynetwork.se
 ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of LusterFS?

2010-04-28 Thread Frank Leers
Your mention of hosting VMs below ("to store VM-images which are used by KVM")
is interesting.  You'd probably benefit from some sort of de-duplication,
which Lustre doesn't do.  The workload also doesn't seem to play to Lustre's
key strengths.

Have you considered using something like a ZFS backend, sharing out either
iSCSI LUNs or NFS to your 20 clients?  On the surface, this looks like it
would be a better fit.  Have a look at the Oracle 7000 appliances if you want
something turnkey, or look into one of the many ZFS appliances built on
OpenSolaris... or build your own.

$.02

-frank


On Apr 28, 2010, at 8:26 AM, Stuart Midgley wrote:

 At most a day.  Most of that day will be inserting the DVDs with CentOS into 
 the machines and installing :)

Surely, you've heard of kickstart (or some other provisioning mechanism).

 
 The mkfs and mount of lustre will take a total of about an hour.
 
 
 -- 
 Dr Stuart Midgley
 sdm...@gmail.com
 
 
 
 On 28/04/2010, at 21:03 , Janne Aho wrote:
 
 Thought I would say thanks for all the input you have given, and I'm 
 sorry for misspelling LustreFS.
 
 As we are completely green when it comes to any form of cluster file 
 system, we have to consider all the facts before we definitely know 
 what we should do and how much time we have to budget to build up a 
 system, so that we can see whether we can manage it by ourselves or if we 
 would need to bring in external help.
 
 Do you dare to estimate how long it would take to set up and tune a 
 system that offers 40TB of storage? The main purpose is to use the storage 
 to store VM images which are used by KVM. Let us assume that we use 
 hardware RAID6. Today there would be ~20 clients using the storage.
 
 I know it's a really difficult question to answer, especially when I'm not 
 saying how many machines there will be and so on (frankly, we don't know 
 how it will end up), but I'm not really looking for how many minutes it 
 will take, more roughly whether it will be something that may take 
 a few days, or rather many weeks or months...
 
 
 Thanks in advance for any replies.
 
 -- 
 Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
 Phone: +46 455 690022 | Cell: +46 733 312775
 EMail/MSN: ja...@citynetwork.se
 ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-28 Thread Jagga Soorma
Hi Johann,

I am actually using 1.8.1 and not 1.8.2:

# rpm -qa | grep -i lustre
lustre-client-1.8.1.1-2.6.27.29_0.1_lustre.1.8.1.1_default
lustre-client-modules-1.8.1.1-2.6.27.29_0.1_lustre.1.8.1.1_default

My kernel version on the SLES 11 clients is:
# uname -r
2.6.27.29-0.1-default

My kernel version on the RHEL 5.3 mds/oss servers is:
# uname -r
2.6.18-128.7.1.el5_lustre.1.8.1.1

Please let me know if you need any further information.  I am still trying
to get the user to help me run his app so that I can run the leak-finder
script and capture more information.
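
In the meantime, a quick way to watch the suspect slab on an affected client
(just standard /proc reading, nothing Lustre-specific assumed):

# snapshot the obdo slab before and after running the workload
grep ll_obdo_cache /proc/slabinfo
# or watch it live
watch -n 5 'grep ll_obdo_cache /proc/slabinfo'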

Regards,
-Simran

On Tue, Apr 27, 2010 at 7:20 AM, Johann Lombardi joh...@sun.com wrote:

 Hi,

 On Tue, Apr 20, 2010 at 09:08:25AM -0700, Jagga Soorma wrote:
  Thanks for your response. I will try to run the leak-finder script and
  hopefully it will point us in the right direction. This only seems to be
  happening on some of my clients:

 Could you please tell us what kernel you use on the client side?

 client104: ll_obdo_cache  0  433506280  208  19  1 : tunables  120  60  8 : slabdata  0  22816120  0
 client116: ll_obdo_cache  0  457366746  208  19  1 : tunables  120  60  8 : slabdata  0  24071934  0
 client113: ll_obdo_cache  0  456778867  208  19  1 : tunables  120  60  8 : slabdata  0  24040993  0
 client106: ll_obdo_cache  0  456372267  208  19  1 : tunables  120  60  8 : slabdata  0  24019593  0
 client115: ll_obdo_cache  0  449929310  208  19  1 : tunables  120  60  8 : slabdata  0  23680490  0
 client101: ll_obdo_cache  0  454318101  208  19  1 : tunables  120  60  8 : slabdata  0  23911479  0
 --
 
 Hopefully this should help. Not sure which application might be causing
 the leaks. Currently R is the only app that users seem to be using
 heavily on these clients. Will let you know what I find.

 Tommi Tervo has filed a bugzilla ticket for this issue, see
 https://bugzilla.lustre.org/show_bug.cgi?id=22701

 Could you please add a comment to this ticket to describe the
 behavior of the application R (fork many threads, write to
 many files, use direct i/o, ...)?

 Cheers,
 Johann

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre MDS unable to start

2010-04-28 Thread Peter Grandi
[ ... ]

 The /etc/modprobe.conf is the same for all nodes:
 --
 alias eth0 e1000e
 alias eth1 e1000e
 alias eth2 8139too
 alias scsi_hostadapter aic79xx
 alias scsi_hostadapter1 ata_piix
 alias ib0 ib_ipoib
 alias ib1 ib_ipoib

Regardless of your Lustre issues, you want to double-check that
these lines are guaranteed to do what you expect them to do.
You may be making very optimistic assumptions as to their effect.
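
One way to verify what actually got bound, rather than what the aliases
suggest (interface names taken from the quoted config):

ethtool -i eth0 | grep driver               # which driver is really behind eth0?
readlink /sys/class/net/eth0/device/driver  # same check via sysfs
# repeat for eth1, eth2, ib0, ib1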
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS inode allocation question

2010-04-28 Thread Andreas Dilger
On 2010-04-28, at 7:44, Gary Molenkamp g...@sharcnet.ca wrote:

 When I create the MDS, I specified '-i 1024' and I can see (locally)
 800M inodes, but only part of the available space is allocated.

This is to be expected. There needs to be free space on the MDS for  
directories, striping and other internal usage.

  Also, when the client mounts the filesystem,  the MDS only has 400M  
 blocks available:

 gulfwork-MDT_UUID  430781784  500264  387274084  0%  /gulfwork[MDT:0]

 As we were creating files for testing, I saw that each inode  
 allocation
 on the MDS was consuming 4k of space,

That depends on how you are striping your files.  If the striping is
larger than will fit inside the inode (13 stripes for 512-byte inodes
IIRC) then each inode will also consume a block for the striping, and
some step-wise fraction of a block for each directory entry. That is
why 'df -i' will return min(free blocks, free inodes), though the
common case is that files do not need an external xattr block for the
striping (see the stripe hint argument for mkfs.lustre) and the number of
'free' inodes will remain constant as files are being created, until
the number of free blocks exceeds the free inode count.
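
To see whether wide default striping is in play (a sketch; /gulfwork is the
mount point from earlier in the thread, and the -d/-c options are as I recall
them from 1.8, so check 'lfs help' on your build):

lfs getstripe -d /gulfwork          # default striping on the filesystem root
lfs getstripe /gulfwork/testfile    # layout of an individual file
lfs setstripe -c 1 /gulfwork        # a one-stripe default keeps the layout in the inode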

 so even though I have 800M inodes available on actual mds partition,  
 it appears that the actual space available was only allowing 100M  
 inodes in the lustre fs.  Am I
 understanding that correctly?

Possibly, yes. If you are striping all files widely by default it can  
happen as you write.

 I tried to force the MDS creation to use a smaller size per inode but
 that produced an error:

 mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024 -I
 1024' --reformat --failnode=10.18.12.1 /dev/sda
 ...
   mke2fs: inode_size (1024) * inodes_count (860148736) too big for a
filesystem with 215037184 blocks, specify higher inode_ratio
(-i) or lower inode count (-N).
 ...

You can't fill the filesystem 100% full of inodes (1 inode per 1024
bytes, and each inode is 1024 bytes in size). If you ARE striping
widely you may try -i 1536 -I 1024, but please make sure this is
actually needed or it will reduce your MDS performance due to 2x larger
inodes.
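
After formatting, the ratio and inode size that mke2fs actually used can be
confirmed on the MDS (a sketch; /dev/sda is the device from your mkfs.lustre
command):

dumpe2fs -h /dev/sda | egrep 'Inode count|Block count|Inode size|Block size'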

 yet the actual drive has many more blocks available:

 SCSI device sda: 1720297472 512-byte hdwr sectors (880792 MB)

 Is this ext4 setting the block size limit?


 FYI, I am using:
  lustre-1.8.2-2.6.18_164.11.1.el5-ext4_lustre.1.8.2.x86_64.rpm
  lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5-ext4_lustre.1.8.2.x86_64.rpm
  e2fsprogs-1.41.6.sun1-0redhat.rhel5.x86_64.rpm




 -- 
 Gary Molenkamp  SHARCNET
 Systems Administrator   University of Western Ontario
 g...@sharcnet.ca    http://www.sharcnet.ca
 (519) 661-2111 x88429   (519) 661-4000
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss