[Lustre-discuss] Client Cannot Mount File System

2014-06-16 Thread Charles Taylor
MDS/OSSs: 1.8.8-wc1_2.6.18_308.4.1.el5_gbc88c4c
Client:   1.8.9-wc1_2.6.32_358.23.2.el6

One (out of hundreds) of our clients has been unable to mount our lustre file 
system.  We could find no host or network issues.  Attempts to mount yielded 
the following on the client

mount -t lustre -o localflock 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs /lfs/scratch 
 
mount.lustre: mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs at /lfs/scratch failed:
Interrupted system call
Error: Failed to mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs

with the following syslog messages.

Jun 10 15:21:05 r15a-s40 kernel: Lustre: 
1269:0:(o2iblnd_cb.c:1813:kiblnd_close_conn_locked()) Closing conn to 
10.13.79.252@o2ib2: error 0(waiting)
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 166-1: MGC10.13.68.1@o2ib: 
Connection to service MGS via nid 10.13.68.1@o2ib was lost; in progress 
operations using this service will fail.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 15c-8: MGC10.13.68.1@o2ib: The 
configuration from log 'lfs-client' failed (-4). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -4
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 1 not cleaned! deathrow=0, lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(lov_obd.c:1012:lov_cleanup()) Skipped 5 previous similar messages
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 13 not cleaned! deathrow=1, 
lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(mdc_request.c:1500:mdc_precleanup()) client import never connected
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Reactivating import
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Connection 
restored to service MGS using nid 10.13.68.1@o2ib.
Jun 10 15:21:05 r15a-s40 kernel: Lustre: client lfs-client(88061e105c00) 
umount complete
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 
4012:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-4)

Nothing noteworthy on the MDS.   

After reconfiguring the client with a new IPoIB IP (and hence, NID), it was 
able to mount with no problems and is working fine.   Additionally, the MDS 
was rebooted at least once during the time that this client was unable to 
mount, so whatever state was blocking the mount survived the reboot - 
presumably it is persisted on the MDT.   

I'm particularly curious about the ll_fill_super message.  To what log is 
it referring?   
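(I assume it is the 'lfs-client' configuration log that the client fetches 
from the MGS at mount time - the one named in the 15c-8 message above.  For 
what it's worth, a rough sketch of how one might dump that log on the MGS, 
assuming the MGS device can be mounted read-only as ldiskfs and that the 
paths below are right - the device name is a placeholder and we have not 
actually tried this yet:

    mount -t ldiskfs -o ro /dev/<mgs_device> /mnt/mgs
    llog_reader /mnt/mgs/CONFIGS/lfs-client | less
    umount /mnt/mgs
)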

Anyone seen this before and have an idea what we need to clear on the MDS/MDT 
to allow this client to successfully mount the file system again?

Thanks,

Charlie Taylor
UF Research Computing




Re: [Lustre-discuss] Thread might be hung, Heavy IO Load messages

2012-02-01 Thread Charles Taylor

You may also want to check and, if necessary, limit the lru_size on your 
clients.   I believe there are guidelines in the ops manual.  We have ~750 
clients and limit ours to 600 per OST.   That, combined with setting 
zone_reclaim_mode=0, should make a big difference.   
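Something along these lines on each client (the lru_size value is 
site-specific; zone_reclaim_mode can also be made persistent via 
/etc/sysctl.conf):

   lctl set_param ldlm.namespaces.*osc*.lru_size=600
   echo 0 > /proc/sys/vm/zone_reclaim_mode    # or: sysctl -w vm.zone_reclaim_mode=0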

Regards,

Charlie Taylor
UF HPC Center


On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:

 Hi David,
 
 You may be facing the same issue discussed on previous threads, which is
 the issue regarding the zone_reclaim_mode.
 
 Take a look at the previous thread where Kevin and I replied to
 Vijesh Ek.
 
 If you don't have access to the previous emails, look at your kernel
 settings for the zone reclaim:
 
 cat /proc/sys/vm/zone_reclaim_mode
 
 It should be set to 0.
 
 Also, look at the number of Lustre OSS service threads. It may be set too
 high...
 
 Rgds.
 Carlos.
 
 
 --
 Carlos Thomaz | HPC Systems Architect
 Mobile: +1 (303) 519-0578
 ctho...@ddn.com | Skype ID: carlosthomaz
 DataDirect Networks, Inc.
 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
 ddn.com http://www.ddn.com/ | Twitter: @ddn_limitless
 http://twitter.com/ddn_limitless | 1.800.TERABYTE
 
 
 
 
 
 On 2/1/12 11:57 AM, David Noriega tsk...@my.utsa.edu wrote:
 
 indicates the system was overloaded (too many service threads, or
 
 

Charles A. Taylor, Ph.D.
Associate Director,
UF HPC Center
(352) 392-4036





[Lustre-discuss] Inactive Service Threads

2011-12-27 Thread Charles Taylor
Lustre Version:
lustre-1.8.6-wc1_2.6.18_238.12.1.el5_lustre.1.8.6.x86_64

OSS Configuration:
--
Dual E5620 Processors (8 2.4 GHz cores)
24 GB RAM:
8 OSTs ( two per controller)
   4 x Adaptec 51245
   2 x RAID-6 LUN's per controller
   7200 RPM Hitachi Drives (SATA HUA722020ALA330)
   128 KB Stripe Size
   512 KB RPCs


We've tested the configuration extensively and know that we can sustain 2.4 
GB/sec to the OSSs for large-block sequential I/O for long periods of time with 
no issues.  The problem comes in production under more typical client work 
loads where we see far too many of the messages below - even when the load on 
the servers is not all that great (8 - 10).   Five minutes for an IOP to 
complete seems like a long time.   Seems like we must be either hitting a 
bug or running out of some resource (locks?).  Iostat tends to show fairly 
typical service, queue, and wait times which further suggests that there is 
more going on here than just busy disks.

We have about 600 clients with the following settings...

   lctl set_param ldlm.namespaces.*osc*.lru_size=600
   lctl set_param ldlm.namespaces.*mdc*.lru_size=600
   max_rpcs_in_flight=32
   max_pages_per_rpc=128
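(The last two are per-OSC proc tunables; on the clients they would be set 
along these lines - the osc.* paths are an assumption on my part:

   lctl set_param osc.*.max_rpcs_in_flight=32
   lctl set_param osc.*.max_pages_per_rpc=128
)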


Note that we have tuned down the number of ost threads from the default to 96 
but it has had little impact.  If we are to believe the messages, we should 
probably reduce the thread count further but it feels like something else is 
wrong.   Perhaps someone else has encountered this or can see an obvious 
problem in our setup.
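For reference, the usual way to pin the thread count (assuming I have the 
module parameter name right - check the ops manual) is in /etc/modprobe.conf 
on the OSSs:

   options ost oss_num_threads=96

followed by reloading the ost module or rebooting.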

Any ideas or suggestions are welcome.

Charlie Taylor
UF HPC Center


Dec 26 15:05:10 hpcoss8 kernel: Lustre: Service thread pid 26929 was inactive 
for 320.00s. The thread might be hung, or it might only be slow and will resume 
later. Dumping the stack trace for debugging purposes:
Dec 26 15:05:10 hpcoss8 kernel: Pid: 26929, comm: ll_ost_io_36
Dec 26 15:05:10 hpcoss8 kernel:
Dec 26 15:05:10 hpcoss8 kernel: Call Trace:
Dec 26 15:05:10 hpcoss8 kernel:  [80047144] try_to_wake_up+0x472/0x484
Dec 26 15:05:10 hpcoss8 kernel:  [8008c871] __wake_up_common+0x3e/0x68
Dec 26 15:05:10 hpcoss8 kernel:  [8028882e] __down_trylock+0x39/0x4e
Dec 26 15:05:10 hpcoss8 kernel:  [8006472d] 
__down_failed_trylock+0x35/0x3a
Dec 26 15:05:10 hpcoss8 kernel:  [800646b9] __down_failed+0x35/0x3a
Dec 26 15:05:10 hpcoss8 kernel:  [88b491e6] 
.text.lock.ldlm_resource+0x7d/0x87 [ptlrpc]
Dec 26 15:05:10 hpcoss8 kernel:  [88b6c337] 
ldlm_pools_shrink+0x247/0x2f0 [ptlrpc]
Dec 26 15:05:10 hpcoss8 kernel:  [80064604] __down_read+0x12/0x92
Dec 26 15:05:10 hpcoss8 kernel:  [8002231e] __up_read+0x19/0x7f
Dec 26 15:05:10 hpcoss8 kernel:  [8003f6c0] shrink_slab+0x60/0x153
Dec 26 15:05:10 hpcoss8 kernel:  [800cdd0a] zone_reclaim+0x235/0x2cd
Dec 26 15:05:10 hpcoss8 kernel:  [800ca13d] __rmqueue+0x44/0xc7
Dec 26 15:05:10 hpcoss8 kernel:  [8000a919] 
get_page_from_freelist+0xbf/0x43a
Dec 26 15:05:10 hpcoss8 kernel:  [8000f41a] __alloc_pages+0x78/0x308
Dec 26 15:05:10 hpcoss8 kernel:  [80025d41] 
find_or_create_page+0x32/0x72
Dec 26 15:05:10 hpcoss8 kernel:  [88e694e5] filter_get_page+0x35/0x70 
[obdfilter]
Dec 26 15:05:10 hpcoss8 kernel:  [88e6b72a] 
filter_preprw+0x14da/0x1e00 [obdfilter]
Dec 26 15:05:10 hpcoss8 kernel:  [88a41a54] 
kiblnd_init_tx_msg+0x154/0x1d0 [ko2iblnd]
Dec 26 15:05:10 hpcoss8 kernel:  [88ad4dc0] 
class_handle2object+0xe0/0x170 [obdclass]
Dec 26 15:05:11 hpcoss8 kernel:  [88a49f2d] kiblnd_send+0x86d/0x8b0 
[ko2iblnd]
Dec 26 15:05:11 hpcoss8 kernel:  [88e1600c] 
ost_brw_write+0xf9c/0x2480 [ost]
Dec 26 15:05:11 hpcoss8 kernel:  [889fe111] LNetMDBind+0x301/0x450 
[lnet]
Dec 26 15:05:11 hpcoss8 kernel:  [88b88c65] 
lustre_msg_set_limit+0x35/0xf0 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88b7eac8] 
ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88b82fe5] 
lustre_msg_get_version+0x35/0xf0 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88b82ef5] 
lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88b830a8] 
lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88e1a09e] ost_handle+0x2bae/0x55b0 
[ost]
Dec 26 15:05:11 hpcoss8 kernel:  [80153e70] __next_cpu+0x19/0x28
Dec 26 15:05:11 hpcoss8 kernel:  [8008dc31] dequeue_task+0x18/0x37
Dec 26 15:05:11 hpcoss8 kernel:  [88b926d9] 
ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [88b92e35] 
ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [8008e435] 
default_wake_function+0x0/0xe
Dec 26 15:05:11 hpcoss8 kernel:  [88b93dc6] ptlrpc_main+0xf66/0x1120 
[ptlrpc]
Dec 26 15:05:11 hpcoss8 kernel:  [8005dfb1] child_rip+0xa/0x11
Dec 26 15:05:11 hpcoss8 kernel:  [88b92e60] 

Re: [Lustre-discuss] RAID cards - what works well with Lustre?

2011-07-05 Thread Charles Taylor

We use adaptec 51245s and 51645s with

1. max_hw_sectors_kb=512
2. RAID5 4+1 or RAID6 4+2
3. RAID chunk size = 128

So each 1 MB lustre RPC results in two 4-way striped writes with no  
read-modify-write penalty.   We can further improve write performance  
by matching the max_pages_per_rpc (per OST on the client side), i.e.  
the max RPC size, to the max_hw_sectors_kb setting for the block  
devices.   In this case


max_pages_per_rpc=128

instead of the default 256 at which point you have 1 raid-stripe write  
per rpc.
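Concretely, something like this (the device name is a placeholder; 
max_hw_sectors_kb is just what the driver reports, while max_pages_per_rpc 
is set per OSC on the clients):

   cat /sys/block/sdb/queue/max_hw_sectors_kb    # 512 with the aacraid settings below
   lctl set_param osc.*.max_pages_per_rpc=128    # 128 x 4 KB pages = 512 KB RPCs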


If you put your OSTs atop LVs (LVM2) as we do, you will want to take  
the additional step of making sure your LVs are aligned as well.


pvcreate --dataalignment 1024S /dev/sd$driveChar

You need a fairly new version of LVM2 that supports the  
--dataalignment option. We are using lvm2-2.02.56-8.el5_5.6.x86_64.
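One way to sanity-check the alignment afterwards (again assuming a 
reasonably recent LVM2) is to look at the first physical extent offset:

   pvs -o pv_name,pe_start

which should report 512.00K (i.e. 1024 sectors) for PVs created as above.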


Note that we attempted to increase the max_hw_sectors_kb for the block  
devices (RAID LDs) to 1024 but in order to do so, we needed to change  
the adaptec driver (aacraid) kernel parameter to acbsize=8192, which we  
found to be unstable.   For our adaptec drivers we use...


options aacraid cache=7 msi=2 expose_physicals=-1 acbsize=4096

Note that most of the information above was the result of testing and  
tuning performed here by Craig Prescott.


We now have close to a PB of such storage in production here at the UF  
HPC Center.   We used Areca cards at first but found them to be a bit  
too flakey for our needs. The adaptecs seem to have some infant  
mortality issues.   We RMA about 10% to 12% of newly purchased  
cards but if they make it past initial burn-in testing, they tend to  
be pretty reliable.


Regards,

Charlie Taylor
UF HPC Center









On Jul 5, 2011, at 12:33 PM, Daire Byrne wrote:


Hi,

I have been testing some LSI 9260 RAID cards for use with Lustre  
v1.8.6 but have found that the megaraid_sas driver is not really  
able to facilitate the 1MB full stripe IOs that Lustre likes. This  
topic has also come up recently in the following two email threads:


http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/65a1fdc312b0eccb#
http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/fcf39d85b7e945ab

I was able to up the max_hw_sectors_kb to 1024 by setting the  
max_sectors megaraid_sas module option but found that the IOs were  
still being pretty fragmented:


disk I/O size  ios   % cum % |  ios   % cum %
4K:   3060   0   0   | 2611   0   0
8K:   3261   0   0   | 2664   0   0
16K:  6408   0   1   | 5296   0   1
32K: 13025   1   2   | 10692   1   2
64K: 48397   4   6   | 26417   2   4
128K:50166   4  10   | 42218   4   9
256K:   113124   9  20   | 86516   8  17
512K:   677242  57  78   | 448231  45  63
1M: 254195  21 100   | 355804  36 100

So next I looked at the sg_tablesize and found it was being set to  
80 by the driver (which queries the firmware). I tried to hack the  
driver and increase this value but bad things happened and so it  
looks like it is a genuine hardware limit with these cards.


The overall throughput isn't exactly terrible because the RAID  
write-back cache does a reasonable job but I suspect it could be better,  
e.g.


ost  3 sz 201326592K rsz 1024K obj  192 thr  192 write 1100.52  
[ 231.75, 529.96] read  940.26 [ 275.70, 357.60]
ost  3 sz 201326592K rsz 1024K obj  192 thr  384 write 1112.19  
[ 184.80, 546.43] read 1169.20 [ 337.63, 462.52]
ost  3 sz 201326592K rsz 1024K obj  192 thr  768 write 1217.79  
[ 219.77, 665.32] read 1532.47 [ 403.58, 552.43]
ost  3 sz 201326592K rsz 1024K obj  384 thr  384 write  920.87  
[ 171.82, 466.77] read  901.03 [ 257.73, 372.87]
ost  3 sz 201326592K rsz 1024K obj  384 thr  768 write 1058.11  
[ 166.83, 681.25] read 1309.63 [ 346.64, 484.51]


All of this brings me to my main question - what internal cards have  
people here used which work well with Lustre?  3ware, Areca or other  
models of LSI?


Cheers,

Daire


[Lustre-discuss] quota_chk_acq_common()

2011-06-17 Thread Charles Taylor
We enabled quotas on two new file systems and are now seeing lots of  
the following in our logs...


Lustre: 31473:0:(quota_interface.c:460:quota_chk_acq_common()) still  
haven't
managed to acquire quota space from the quota master after 10 retries  
(err=0,

rc=0): 2 Time(s)
Looking at the code, it is clearly going through the loop at least 10  
times; however, rc is always zero when the message is printed, so the  
acquire() call is apparently succeeding on the 10th try.   However,  
if I'm reading the code correctly, for that to happen the thread has  
already waited at least 45s (cumulatively), which is a long time to  
us.   It seems like such a long wait would cause other complaints but  
we aren't seeing anything obvious.
Is this normal?   Are others seeing the same messages?   Is there some  
tuning we should be doing?

Note that one of the file systems is 1.8.5 while the other is 2.0.
Thx,
Charlie Taylor
UF HPC Center






[Lustre-discuss] Lustre HA Experiences

2011-05-04 Thread Charles Taylor

We are dipping our toes into the waters of Lustre HA using  
pacemaker. We have 16 7.2 TB OSTs across 4 OSSs (4 OSTs each).
The four OSSs are broken out into two dual-active pairs running Lustre  
1.8.5.   Mostly, the water is fine but we've encountered a few  
surprises.

1. An 8-client  iozone write test in which we write 64 files of 1.7  
TB  each seems to go well - until the end at which point iozone seems  
to finish successfully and begins its cleanup.   That is to say it  
starts to remove all 64 large files.   At this point, the ll_ost  
threads go bananas - consuming all available cpu cycles on all 8 cores  
of each server.   This seems to block the corosync totem exchange  
long enough to initiate a stonith request.

2. We have found that re-mounting the OSTs, either via the HA agent or  
manually, often can take a *very* long time - on the order of four or  
five minutes.   We have not figured out why yet.   An strace of the  
mount process has not yielded much.   The mount seems to just be  
waiting for something but we can't tell what.

We are starting to adjust our HA parameters to compensate for these  
observations but we hate to do this in a vacuum and wonder if others  
have also observed these behaviors and what, if anything, was done to  
compensate/correct?
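(The first knob we are looking at is the corosync totem token timeout, 
presumably something along these lines in corosync.conf - the values are 
placeholders, not recommendations:

   totem {
       version: 2
       token: 20000                               # ms before a token loss is declared
       token_retransmits_before_loss_const: 10
       ...
   }
)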

Regards,

Charlie Taylor
UF HPC Center




[Lustre-discuss] MDS can't recover OSTs

2011-04-28 Thread Charles Taylor

We had a RAID array barf this morning resulting in some OST corruption  
which appeared to be successfully repaired with a combination of fsck  
and ll_recover_lost_found_objs.   The OSTs mounted OK but the MDS  
can't seem to recover its connection to two of the OSTs as we are  
seeing a continuing stream of the following in the MDS syslog.

Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(recover.c: 
67:ptlrpc_initiate_recovery()) crn-OST0013_UUID: starting recovery
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 
608:ptlrpc_connect_import()) 810117426000 crn-OST0013_UUID:  
changing import state from DISCONN to CONNECTING
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 
470:import_select_connection()) crn-OST0013-osc: connect to NID  
10.13.24.92@o2ib last attempt 22689204132
Apr 28 11:37:54 crnmds kernel: Lustre: 31983:0:(import.c: 
544:import_select_connection()) crn-OST0013-osc: import  
810117426000 using connection 10.13.24.92@o2ib/10.13.24.92@o2ib
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 
1091:ptlrpc_connect_interpret()) 810117426000 crn-OST0013_UUID:  
changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 
1137:ptlrpc_connect_interpret()) recovery of crn-OST0013_UUID on  
10.13.24.92@o2ib failed (-16)
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 
1091:ptlrpc_connect_interpret()) 81012e50d000 crn-OST0007_UUID:  
changing import state from CONNECTING to DISCONN
Apr 28 11:37:54 crnmds kernel: Lustre: 31982:0:(import.c: 
1137:ptlrpc_connect_interpret()) recovery of crn-OST0007_UUID on  
10.13.24.91@o2ib failed (-16)

It seems that we never see an 'oscc recovery finished' message on  
crnmds for OST0007 or OST0013.

We have not seen this problem before so we are trying to figure out  
how to get the MDT reconnected to these two OSTs.
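Based on our reading of lctl help (not yet verified), what we are considering 
trying on the MDS is forcing those two imports to reconnect:

   lctl dl | grep -E 'OST0007|OST0013'    # note the device numbers
   lctl --device <devno> recover          # or: lctl --device <devno> activate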

Anyone else been through this before?

Thanks,

Charlie Taylor
UF HPC Center





Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-21 Thread Charles Taylor

On Dec 21, 2010, at 12:39 PM, Andreas Dilger wrote:

 It's unfortunate that you didn't see the thread from a few weeks ago  
 that discussed this exact topic of OST replacement.

Agreed.  :(

 It should get a section in the manual I think.

Agreed.

 This file is at /O/0/LAST_ID (capital 'o' then zero) and should be  
 copied for OSTs you haven't replaced yet, along with the other  
 files.  It can be recreated with a binary editor from the value on  
 the MDS (lctl get_param osc.*.prealloc_next_id) for the 6 OSTs that  
 have already been replaced. Search the list or bugzilla for  
 LAST_ID for a detailed procedure.

This seems to do the trick.   Thank you!   One important  
clarification though... on the MDS, should we be getting the value of  
prealloc_next_id or prealloc_last_id?   Section 23.3.9 of the 2.0 Ops  
manual, "How to fix a Bad LAST_ID on an OST", seems to use  
prealloc_last_id.   Which should we be using?
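(For the archive, a hypothetical sketch of the recreation step as I 
understand it - this assumes the replaced OST is mounted as ldiskfs at 
/mnt/ost, that LAST_ID holds a single little-endian 64-bit object id, and 
that 123456 stands in for the value read from the MDS; please verify against 
the LAST_ID procedure in bugzilla before using it:

   lctl get_param osc.*.prealloc_next_id      # on the MDS, pick out the OST in question
   python -c 'import struct,sys; sys.stdout.write(struct.pack("<Q", 123456))' \
       > /mnt/ost/O/0/LAST_ID                 # on the OST, mounted as ldiskfs
)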

Thank you again,

Charlie Taylor
UF HPC Center




Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Charles Taylor

On Oct 21, 2010, at 9:51 AM, Brock Palen wrote:

 On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:

 On 10/21/2010 09:37 AM, Brock Palen wrote:
 We recently added a new oss, it has 1 1Gb interface and 1 10Gb
 interface,

 The 10Gb interface is eth4 (10.164.0.166); the 1Gb interface is eth0
 (10.164.0.10).

 They look like they are on the same subnet if you are using /24 ...

 You are correct

 Both interfaces are on the same subnet:

 [r...@oss4-gb ~]# route
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
 10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
 10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
 169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
 default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0

 There is no way to mask the lustre service away from the 1Gb  
 interface?

We struggle with this as well but have not found a way to enforce  
it.   You would think that lustre would honor the NID for incoming  
*and* outgoing traffic but apparently the standard linux routing table  
determines the outbound path and lnet is out of the picture. Thus,  
you end up having to assign separate subnets, shut down your eth0 (in  
this case) interface, or use static routes to fine tune the routing  
decisions (where possible).
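For the record, the usual way to at least bind LNET itself to the 10Gb 
interface is the networks option in /etc/modprobe.conf, e.g.

   options lnet networks="tcp0(eth4)"

but, as noted, that does not stop the kernel from routing *outbound* packets 
over eth0 when both interfaces sit on the same subnet.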

We wish that the outgoing decision could be made on the basis of the  
*NID* but that might be too intrusive with regard to the linux  
kernel's network stack so I can understand, somewhat, why it is not  
that way.   Still, it is somewhat counter-intuitive to go through all  
the trouble of having the LNET layer and assigning NIDs only to have  
them disregarded for outbound traffic.

Perhaps there is a way around this that we don't know about.

Regards,

Charlie Taylor
UF HPC Center



[Lustre-discuss] Lustre 1.8.4 Patched Kernel Build

2010-09-03 Thread Charles Taylor

After installing the kernel source...

rpm -Uvh kernel-2.6.18-194.3.1.0.1.el5.src.rpm
rpmbuild -bp kernel-2.6.spec

intalling the lustre source...

lustre-source-1.8.4-2.6.18_194.3.1.0.1.el5_lustre.1.8.4.x86_64.rpm

and patching the kernel...

ln -s /usr/src/lustre-1.8.4/ldiskfs/kernel_patches/patches .
ln -s /usr/src/lustre-1.8.4/ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel5.series series
quilt push -av

we attempt to build the kernel and get

   CC  fs/compat_ioctl.o
In file included from include/linux/ext3_jbd.h:20,
  from fs/compat_ioctl.c:50:
include/linux/ext3_fs.h: In function ‘ext3_new_blocks’:
include/linux/ext3_fs.h:1057: error: ‘EXT2_MOUNT_MBALLOC’ undeclared  
(first use in this function)
include/linux/ext3_fs.h:1057: error: (Each undeclared identifier is  
reported only once
include/linux/ext3_fs.h:1057: error: for each function it appears in.)

This seems easy enough to fix but  doing so just results in more of  
the same (and worse) down the road.

Shouldn't this just work?   Is there a problem with the source RPMs?

Charlie Taylor
UF HPC Center





Re: [Lustre-discuss] two problems

2010-06-03 Thread Charles Taylor

On Jun 3, 2010, at 6:17 PM, Andreas Dilger wrote:

 On 2010-06-03, at 06:23, Stefano Elmopi wrote:
 surely my action was in a test environment; in a production environment, I 
 would have moved all the files off before deleting the OST1 server.
 
 The main problem here is that you have completely erased all knowledge of the 
 failed OST, while there are still files in the filesystem using it (i.e. 
 using lctl --writeconf).
 
 If the OST had simply failed and been marked inactive (which is what is 
 normally done in such situations) it would still be possible to delete the 
 files.  The problem being seen on the MDT now is simply one that cannot 
 happen in any normal failure scenario.

I'm sure I'm speaking out of turn but our recent experience contradicts this.   
 We lost an OST and marked it as inactive and *could not* remove the files 
until we actually replaced the OST with another (using the same index).   Once 
we did that and reactivated the OST we could delete the files which didn't 
really exist other than on the MDT.  

It was kind of annoying.   Our intent was not to replace the OST but it became 
such a hassle for us and our users (recursive file operations would often 
encounter the missing files and error out) that we did so just to be able to 
remove the files that had been on the failed OST.

Regards,

Charlie Taylor
UF HPC Center


Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Charles Taylor

On Apr 18, 2010, at 11:46 AM, Bernd Schubert wrote:

 
 You don't need to take the filesystem offline for lfsck.

You sure about that?   Looking at 
http://wiki.lustre.org/manual/LustreManual18_HTML/LustreRecovery.html#50598012_37365
 step 1 says "Stop the Lustre File System".   

 Also, I have 
 rewritten large parts of lfsck and also fixed the parallelization code. I 
 need 
 to review all patches again and probably also make a hg or git repository out 
 of it. Unfortunately, I always have more tasks to do than I manage to do...
 But given the fact that I fixed several bugs and added safety checks, I think 
 my version actually is better than upstream.
 
 Let me know if you are interested and I can put a tar ball of 
 e2fsprogs-sun-ddn on my home page.

Sure, we can try it but it seems to me that by the time you generate the OST 
data and run lfsck against the MDT, much could change on a file system being 
used by 600+ active clients.

Regards,

Charlie



Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Charles Taylor

On Apr 18, 2010, at 1:14 PM, Andreas Dilger wrote:

 On 2010-04-18, at 07:16, Charles Taylor wrote:
 On Apr 18, 2010, at 9:35 AM, Miguel Afonso Oliveira wrote:
 You are going to have to use unlink with something like this:
 
 for file in lost_files
 unlink $file
 
 Nope.   That's really no different than rm and produces the same result...
 
 unlink /scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop
 unlink: cannot unlink `/scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop': Invalid 
 argument
 
 This surprises me that unlink doesn't work, since that is the answer I was 
 going to give also.  Did you also verify that after this message is posted, 
 that the file isn't actually unlinked?  I suspect that the file name was 
 unlinked, but an error is returned from destroying the OST object, but that 
 is fine since the OST is dead and gone anyway.

Nope.  They are still there following the "Invalid argument" error.  It seems 
that before we deactivated the OST we could remove the files (though we got an 
error message), but once the OST was deactivated, we get the error message and 
the file (err, its metadata) remains.

 
 What error messages are posted on the console log (dmesg/syslog)?

Lots of the following but there is a find running as well so I don't think it 
is necessarily from the rm command.

Lustre: 4286:0:(lov_pack.c:67:lov_dump_lmm_v1()) stripe_size 1048576, 
stripe_count 1
Lustre: 4286:0:(lov_pack.c:76:lov_dump_lmm_v1()) stripe 0 idx 17 subobj 
0x0/0x3dbe6b
Lustre: 4286:0:(lov_pack.c:64:lov_dump_lmm_v1()) objid 0x38f59c8, magic 
0x0bd10bd0, pattern 0x1
Lustre: 4286:0:(lov_pack.c:67:lov_dump_lmm_v1()) stripe_size 1048576, 
stripe_count 1
Lustre: 4286:0:(lov_pack.c:76:lov_dump_lmm_v1()) stripe 0 idx 17 subobj 
0x0/0x3dc0fa


Charlie 




[Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Charles Taylor
We lost an OST several months ago and could not recover it.   We decided to 
deactivate it until we bring some new storage online and can just rebuild the 
entire file system.   However, now, the MDT still knows about all the files 
that were on the lost OST, and this results in things like "invalid argument" 
errors and "??   ?.." entries in directory listings.   The files cannot be 
removed by standard commands.   We end up doing something like

mv Dir Tmp
cp -r Tmp Dir   (this produces lots of 'cp: cannot stat ...' for the missing 
files)
mv Tmp /lost+found (this moves all the missing file names more or less out of 
the way).

Is there some way to remove these files from the MDT - as though they never 
existed - without reformatting the entire file system?

Thanks,

Charlie Taylor
UF HPC Center



Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Charles Taylor

On Apr 18, 2010, at 9:38 AM, Brian J. Murrell wrote:

 On Sun, 2010-04-18 at 09:30 -0400, Charles Taylor wrote: 
 
 Is there some way to remove these files from the MDT - as though they never 
 existed - without reformatting the entire file system?
 
 lfsck is the documented, supported method.

Yes, but we attempted that at one time with a smaller file system (for a 
different reason).   After letting it run for over a day, we estimated that it 
would have taken seven to ten days to finish.   That just wasn't practical for 
us at the time and still isn't.  This file system would probably take a couple 
of weeks to lfsck.  I'm sorry to say we can't take the file system offline for 
that long. 

We may just have to leave it as is until we put some new storage in place and 
can migrate the good data off it.   I just thought I'd ask. 

Thanks for the reply though,  

Charlie Taylor
UF HPC Center
  



Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Charles Taylor

On Apr 18, 2010, at 9:35 AM, Miguel Afonso Oliveira wrote:

 Hi,
 
 You are going to have to use unlink with something like this:
 
 for file in lost_files
   unlink $file

Nope.   That's really no different than rm and produces the same result...

unlink /scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop
unlink: cannot unlink `/scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop': Invalid 
argument

Thanks for the suggestion though,

Charlie Taylor
UF HPC Center

 
 Cheers,
 
 Miguel Afonso Oliveira
 
 P.S.: To build a list of all your lost files you can do a rsync with the 
 dry-run flag.
 
 On Apr 18, 2010, at 2:30 PM, Charles Taylor wrote:
 
 We lost an OST several months ago and could not recover it.We decided to 
 deactivate until we bring some new storage online and can just rebuild the 
 entire file system.However, now, the MDT still knows about all the files 
 that were on the lost OST and this results in things like invalid argument 
 and ??   ?.. in directory listings.The files cannot be 
 removed by standard commands.   We end up doing something like
 
 mv Dir to Tmp
 cp -r Tmp Dir   (this produces lots of 'cp: cannot stat ...' for the missing 
 files)
 mv Tmp /lost+found (this moves all the missing file names more or less out 
 of the way).
 
 Is there some way to remove these files from the MDT - as though they never 
 existed - without reformatting the entire file system?
 
 Thanks,
 
 Charlie Taylor
 UF HPC Center
 


Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Charles Taylor

On Apr 18, 2010, at 10:47 AM, Miguel Afonso Oliveira wrote:

 Hi again,
 
 Sorry I forgot to mention this only works if the offending OST still 
 exists. If at this time you can no longer re-include the OST where these 
 files existed then you can still
 create a new one with the same index and then you can unlink.

Ok, thanks.   We may go ahead and try that.
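If we do, I assume recreating the OST with the original index looks roughly 
like this (fsname, MGS NID, index, and device below are placeholders for our 
setup, and we have not tested it yet):

   mkfs.lustre --ost --fsname=scratch --mgsnode=10.13.24.90@o2ib \
       --index=7 /dev/sdX
   mount -t lustre /dev/sdX /mnt/ost7

after which the dangling entries should finally unlink.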

Charlie

 
 MAO
 On Apr 18, 2010, at 3:16 PM, Charles Taylor wrote:
 
 
 On Apr 18, 2010, at 9:35 AM, Miguel Afonso Oliveira wrote:
 
 Hi,
 
 You are going to have to use unlink with something like this:
 
 for file in lost_files
 unlink $file
 
 Nope.   That's really no different than rm and produces the same result...
 
 unlink /scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop
 unlink: cannot unlink `/scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop': Invalid 
 argument
 
 Thanks for the suggestion though,
 
 Charlie Taylor
 UF HPC Center
 
 
 Cheers,
 
 Miguel Afonso Oliveira
 
 P.S.: To build a list of all your lost files you can do a rsync with the 
 dry-run flag.
 
 On Apr 18, 2010, at 2:30 PM, Charles Taylor wrote:
 
 We lost an OST several months ago and could not recover it.We decided 
 to deactivate until we bring some new storage online and can just rebuild 
 the entire file system.However, now, the MDT still knows about all the 
 files that were on the lost OST and this results in things like invalid 
 argument and ??   ?.. in directory listings.The files 
 cannot be removed by standard commands.   We end up doing something 
 like
 
 mv Dir to Tmp
 cp -r Tmp Dir   (this produces lots of 'cp: cannot stat ...' for the 
 missing files)
 mv Tmp /lost+found (this moves all the missing file names more or less out 
 of the way).
 
 Is there some way to remove these files from the MDT - as though they 
 never existed - without reformatting the entire file system?
 
 Thanks,
 
 Charlie Taylor
 UF HPC Center
 


Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Charles Taylor

While I'm thinking about it, that brings up an interesting question.   All the 
OSTs for this file system were originally formatted under 1.6.3.   We have 
since upgraded to 1.8.x.   If we reformat the missing OST with the same index 
under 1.8.2 and add it back into the file system (sans its data) should we 
expect trouble?   We were reluctant to do so since we doubt that this is a 
tested scenario but perhaps we are being overly paranoid.   

Should it be OK to mix OSTs formatted under different versions (1.6 vs 1.8) of 
Lustre?   Seems like it should be OK but you can't test everything and this 
seems like a bit of an outlier.  

Regards,

Charlie Taylor
UF HPC Center

On Apr 18, 2010, at 10:47 AM, Miguel Afonso Oliveira wrote:

 Hi again,
 
 Sorry I forgot to mention this only works if the offending OST still 
 exists. If at this time you can no longer re-include the OST where these 
 files existed then you can still
 create a new one with the same index and then you can unlink.
 
 MAO
 On Apr 18, 2010, at 3:16 PM, Charles Taylor wrote:
 
 
 On Apr 18, 2010, at 9:35 AM, Miguel Afonso Oliveira wrote:
 
 Hi,
 
 You are going to have to use unlink with something like this:
 
 for file in lost_files
 unlink $file
 
 Nope.   That's really no different than rm and produces the same result...
 
 unlink /scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop
 unlink: cannot unlink `/scratch/crn/bwang/NCS/1O5P/1o5p_wat.prmtop': Invalid 
 argument
 
 Thanks for the suggestion though,
 
 Charlie Taylor
 UF HPC Center
 
 
 Cheers,
 
 Miguel Afonso Oliveira
 
 P.S.: To build a list of all your lost files you can do a rsync with the 
 dry-run flag.
 
 On Apr 18, 2010, at 2:30 PM, Charles Taylor wrote:
 
 We lost an OST several months ago and could not recover it.We decided 
 to deactivate until we bring some new storage online and can just rebuild 
 the entire file system.However, now, the MDT still knows about all the 
 files that were on the lost OST and this results in things like invalid 
 argument and ??   ?.. in directory listings.The files 
 cannot be removed by standard commands.   We end up doing something 
 like
 
 mv Dir to Tmp
 cp -r Tmp Dir   (this produces lots of 'cp: cannot stat ...' for the 
 missing files)
 mv Tmp /lost+found (this moves all the missing file names more or less out 
 of the way).
 
 Is there some way to remove these files from the MDT - as though they 
 never existed - without reformatting the entire file system?
 
 Thanks,
 
 Charlie Taylor
 UF HPC Center
 


Re: [Lustre-discuss] bad write checksums

2009-07-24 Thread Charles Taylor

On Jul 24, 2009, at 10:33 AM, Craig Prescott wrote:


 Hi;

 We've been testing some 1.8.0.1 patchless clients (RHEL5.3, x86_64,  
 RPMs
 from the Sun download page) with out 1.6.4.2 servers.

Just to clarify the typo...

That should have been "with our" 1.6.4.2 servers. We are running  
1.8.0.1 patch-less clients with 1.6.4.2 on the MGS/MDS and OSSs and  
getting the messages Craig refers to below.
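(In case it helps anyone else chasing these: if I recall the tunable name 
correctly, the client-side checksumming can be toggled per OSC while 
debugging, e.g.

   lctl get_param osc.*.checksums
   lctl set_param osc.*.checksums=0    # temporarily, to isolate the problem

though obviously that only hides the symptom rather than explaining it.)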

ct

 The OSS nodes started logging these LustreErrors from the 1.8.0.1  
 clients:

 LustreError: 7302:0:(ost_handler.c:1157:ost_brw_write()) client  
 csum 8448447f, original server csum 66fb7cff, server csum now  
 66fb7cff
 LustreError: 7302:0:(ost_handler.c:1157:ost_brw_write()) Skipped 1  
 previous similar message
 LustreError: 7391:0:(ost_handler.c:1095:ost_brw_write()) client  
 csum 9d8c7d6a, server csum 2cfdcb47
 LustreError: 168-f: ufhpc-OST0004: BAD WRITE CHECKSUM: changed in  
 transit before arrival at OST from 12345-10.13.28...@tcp inum  
 38470778/1485322248 object 67094039/0 extent [0-1023]

 Is this a known issue with running 1.8.0.1 clients against 1.6.4.2
 servers?  We aren't seeing these messages in relation to our 1.6  
 clients.

 Looking through the Lustre bugzilla, I see bug 18296, which discusses
 these messages, but it was logged against Lustre version 1.6.6.

 Cheers,
 Craig


Re: [Lustre-discuss] 1.6.4.2 - 1.8.0.1 Upgrade Question

2009-07-23 Thread Charles Taylor


Thanks Brian, but I'm still a little unsure of the tunefs step in the  
upgrade.   Is it actually necessary as part of the MDS upgrade?   Is  
it safe, and do we do the same when upgrading the OSSs, i.e. do we have  
to run tunefs.lustre on each OST or just the MDT?


Thanks,

Charlie Taylor
UF HPC Center

On Jul 23, 2009, at 9:47 AM, Brian J. Murrell wrote:


Charles,

I have opened bug 20246 to have that section of the manual reviewed.

Thanx for pointing that out.

b.





We are about to upgrade our standalone MGS/MDS (no failover) from  
1.6.4.2 to 1.8.0.1.I'm a little confused by section 13.2.4 of  
the Lustre 1.8 Operations Manual. What is the purpose of the


mdt1# tunefs.lustre --mgs --mdt --fsname=testfs /dev/sda1

command?   I assume it is writing this information to the MDT  
(/dev/sda1) but wasn't that information already put there when the file  
system was created under 1.6.4.2?   Has the format changed between  
the two versions?   Why is the tunefs.lustre step necessary?   I'm  
concerned about overwriting *anything* on the MDT and rendering our  
file system unusable.   I just want to be sure we understand what we  
are doing.


The paragraph labeled "Description" in section 32.2 (describing  
tunefs.lustre) did not exactly give me a warm-fuzzy.
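One thing we will probably do first (assuming I am reading the tunefs.lustre 
man page correctly) is a dry run, which should print the current and target 
parameters without writing anything to the MDT:

   tunefs.lustre --dryrun /dev/sda1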


BTW, we already have a number of 1.8.0.1 clients running against the  
1.6.4.2 servers. Working great so far!


Thanks,

Charlie Taylor
UF HPC Center




Re: [Lustre-discuss] Question on upgrading Lustre 1.6.6 - 1.8.0

2009-05-17 Thread Charles Taylor

On May 17, 2009, at 7:10 AM, Daire Byrne wrote:


 I think the v1.8 manual is still referring to the upgrade of Lustre  
 v1.4 - v1.6. If you are upgrading from v1.6 to v1.8 then you should  
 only need to install the newer packages and reboot. You may need to  
 do a tune2fs if you want to enable newer features but I'm not 100%  
 sure of that.

Wow, I hope that's not the case.   I know documentation is a pain but  
*no* documentation is better than wrong, misleading, or old  
documentation.

We are getting ready to upgrade and were planning to go by this  
procedure.  If it is not for 1.6.x to 1.8.x, then that would be  
good to know.

Note that for clients, we can just reload modules and reboot.   That  
works very nicely.  Haven't done anything on the server side yet.   We  
are proceeding with caution and may wait for 1.8.1.

Charlie Taylor
UF HPC Center




 Daire

 - thhsieh thhs...@piano.rcas.sinica.edu.tw wrote:

 Dear All,

 I have read the description of Lustre Operation Guide for version
 1.8. But I am still not very sure about the exact procedures to
 upgrade from version 1.6.6 to version 1.8.0. Now I try to write up
 a plan of upgrading. Please give me your kindly comments on my
 procedures. :)

 In our system, we have three Lustre filesystems (they are all version
 1.6.6, for all the MGS, MDT, OST, and clients), which are configured
 in the following:

 1. fsname=chome
   MGS: qa1:/dev/sda5
   MDT: qa1:/dev/sda5  (i.e., exactly same disk partition as MGS)
   OST: qaX:/dev/sdaX  (distributed in several OST nodes)

 2. fsname=cwork
   MGS: qa1:/dev/sda5  (shared with that of chome)
   MDT: qa1:/dev/sda6
   OST: qaY:/dev/sdaY  (distributed in several OST nodes)

 3. fsname=cwork1
   MGS: qa1:/dev/sda5  (shared with that of chome)
   MDT: qa1:/dev/sda7
   OST: qaZ:/dev/sdaZ  (distributed in several OST nodes)

 We do not have failover configurations in all the filesystems.

 I am planning to shut down all the Lustre filesystems, and then perform
 the upgrading, and finally start them up. I guess that would be simpler.
 The exact procedures I am going to do are:

 1. For each of the Lustre filesystems, I will perform the following
   shutdown procedures (chome should be the last one to shutdown,
 since
   it share the MDT and MGS in the same partition):
   - umount all clients
   - umount all OSTs
   - umount MDT

 2. Install the new Lustre-1.8 software and modules and reboot all the
   nodes. Then I will upgrade chome first, and then cwork, and
   finally cwork1.

 3. Upgrade MGS and MDT for chome:

   qa1# tunefs.lustre --mgs --mdt --fsname=chome /dev/sda5

 4. Upgrade OSTs for chome:

   qaX# tunefs.lustre --ost --fsname=chome --mgsnode=qa1 /dev/sdaX

   Up to this point the chome part should be ready, I guess.


 5. Now the MDT for cwork. The manual says that we should copy the
 MDT
   and client startup logs from the MDT to the MGS, so I guess that I
 should

   - Mount MGS as ldiskfs:
 qa1# mount -t ldiskfs /dev/sda5 /mnt

   - Run script lustre_up14 on the MDT of cwork partition:
 qa1# lustre_up14 /dev/sda6 cwork

 then I will get the following files:
 /tmp/logs/cwork-client
 /tmp/logs/cwork-MDT

   - Copy these log files to /mnt/CONFIGS/

   - Umount MGS:
 qa1# umount /mnt

   - Upgrade the MDT:
 qa1# tunefs.lustre --mdt --nomgs --fsname=cwork --mgsnode=qa1
 /dev/sda6


 6. Now the OSTs for cwork:

   qaY# tunefs.lustre --ost --fsname=cwork1 --mgsnode=qa1 /dev/sdaY

   Up to now the filesystem cwork should be ready.


 7. For the MDT and OSTs for cwork1, we can follow the same
 procedures
   as step 6 and 7.

 8. Start up the new Lustre filesystems:

   For chome:
   qa1# mount -t lustre /dev/sda5 /cfs/chome_mdt
   qaX# mount -t lustre /dev/sdaX /cfs/chome_ostX
   mount the clients

   for cwork:
   qa1# mount -t lustre /dev/sda6 /cfs/cwork_mdt
   qaY# mount -t lustre /dev/sdaY /cfs/cwork_ostY
   mount the clients

   for cwork1:
   qa1# mount -t lustre /dev/sda7 /cfs/cwork1_mdt
   qaZ# mount -t lustre /dev/sdaZ /cfs/cwork1_ostZ
   mount the clients


 Please kindly give me your comments. Thanks very much.


 Best Regards,

 T.H.Hsieh


[Lustre-discuss] Download

2009-05-12 Thread Charles Taylor

Hmmm.   I tried to download lustre 1.8.0 two days ago (shortly after  
the announcement).   I tried to use my existing Sun account info  
that I established when we downloaded 1.6.x.   The download site  
accepted my user name and password but denied me access saying that my  
account required manual review.   A day later, yesterday, they sent  
me an email saying they needed my full legal name and that the  
initial "A" was not sufficient, or some other nonsense.   So it is now  
two days later and I still can't download what is *supposed* to be  
freely available open-source software.

For the record, I don't mind registering for the download.   I *do*  
mind the security screening.   I'm not opening a bank account here.  
I just want to download some bits with an account I used in the  
past.   Fortunately, I could just fake another account and get the  
software.   My question is simply: why do you want to hassle your  
faithful this way?   We are not long-time lustre users (about 1.5  
years now) but we have touted the benefits of lustre to anyone and  
everyone who would listen and have contributed to the adoption of  
lustre at several other sites.

I'll just add that the multi-tiered download site is also an  
unnecessary hassle and puts off your users.

Stop the insanity,

Charlie Taylor
UF HPC Center


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Charles Taylor

On May 11, 2009, at 8:07 PM, Andreas Dilger wrote:

 On May 11, 2009  14:38 -0700, Hayes, Robert N wrote:
 We will test the mem=12G suggestion. Before attempting the 1.8.0  
 client,
 can you confirm that a 1.8 client should work with a 1.6 server  
 without
 causing any more complications?

 Yes, the 1.8.x clients are interoperable with 1.6.x servers.  If you  
 are
 worried about testing this out during live system time then you can  
 wait
 for an outage window to test the 1.8 client in isolation.  There is
 nothing to do on the server, and just RPM upgrade/downgrades on the  
 client.

And it's a beautiful thing.  :)

Charlie Taylor
UF HPC Center



[Lustre-discuss] LUG 2009

2009-02-27 Thread Charles Taylor

Wow, sorry to waste the BW here but I'm confused.   Are you really  
having the meeting at the Cavallo Point Lodge but the Advanced User  
Seminar (on the 15th) at The Lodge at Sonoma Renaissance Resort and  
Spa? Is that a misprint or are the meeting and the Seminar at two  
different places?

Thanks,

Charlie Taylor
UF HPC Center


Re: [Lustre-discuss] Recovery without end

2009-02-25 Thread Charles Taylor
I'm going to pipe in here.   We too use a very large (1000) timeout  
value.   We have two separate lustre file systems; one of them consists  
of two rather beefy OSSs with 12 OSTs each (FalconIII FC-SATA RAID).
The other consists of 8 OSSs with 3 OSTs each (Xyratex 4900FC).   We  
have about 500 clients and support both tcp and o2ib NIDS.   We run  
Lustre 1.6.4.2 on a patched 2.6.18-8.1.14 CentOS/RH kernel.   It has  
worked *very* well for us for over a year now - very few problems with  
very good performance under very heavy loads.

We've tried setting our timeout to lower values but settled on the  
1000 value (despite the long recovery periods) because if we don't,  
our lustre connectivity starts to break down and our mounts come and go  
with errors like "transport endpoint failure" or "transport endpoint  
not connected" or some such (it's been a while now).   File system  
access comes and goes randomly on nodes.   We tried many tunings and  
looked for other sources of problems (underlying network issues).
Ultimately, the only thing we found that fixed this was to extend the  
timeout value.

I know you will be tempted to tell us that our network must be flakey  
but it simply is not.   We'd love to understand why we need such a  
large timeout value and why, if we don't use a large value, we see  
these transport end-point failures.   However, after spending several  
days trying to understand and resolve the issue, we finally just  
accepted the long timeout as a suitable workaround.

I wonder if there are others who have silently done the same.   We'll  
be upgrading to 1.6.6 or 1.6.7 in the not-too-distant future.   Maybe  
then we'll be able to do away with the long timeout value but until  
then, we need it.  :(
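For anyone wanting to reproduce the workaround, the timeout in question is 
just the standard obd timeout - roughly (the conf_param form is my 
recollection of the newer syntax, and the fsname is a placeholder):

   echo 1000 > /proc/sys/lustre/timeout         # on clients and servers, transient
   lctl conf_param <fsname>.sys.timeout=1000    # on the MGS, persistent (newer 1.6/1.8)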

Just my two cents,

Charlie Taylor
UF HPC Center

On Feb 25, 2009, at 11:03 AM, Brian J. Murrell wrote:

 On Wed, 2009-02-25 at 16:09 +0100, Thomas Roth wrote:

 Our /proc/sys/lustre/timeout is 1000

 That's way too high.  Long recoveries are exactly the reason you don't
 want this number to be huge.

 - there has been some debate on
 this large value here, but most other installation will not run in a
 network environment with a setup as crazy as ours.

 What's so crazy about your set up?  Unless your network is very flaky
 and/or you have not tuned your OSSes properly, there should be no need
 for such a high timeout and if there is you need to address the  
 problems
 requiring it.

 Putting the timeout
 to 100 immediately results in Transport endpoint errors,  
 impossible to
 run Lustre like this.

 300 is the max that we recommend and we have very large production
 clusters that use such values successfully.

 Since this is a 1.6.5.1 system, I activated the adaptive timeouts   
 - and
 put them to equally large values,
 /sys/module/ptlrpc/parameters/at_max = 6000
 /sys/module/ptlrpc/parameters/at_history = 6000
 /sys/module/ptlrpc/parameters/at_early_margin = 50
 /sys/module/ptlrpc/parameters/at_extra = 30

 This is likely not good as well.  I will let somebody more  
 knowledgeable
 about AT comment in detail though.  It's a new feature and not getting
 wide use at all yet, so the real-world experience is still low.

 b.



Re: [Lustre-discuss] Another server question.

2009-02-04 Thread Charles Taylor

On Feb 4, 2009, at 10:39 AM, Robert Minvielle wrote:


 I still can not seem to get this OST to come online. The clients
 are still exhibiting the same behaviour as before. Is there any
 way to get the OST to go into active by force? I ran an ext3 check
 on it using the SUN modded e2fsprogs and it returns

 e2fsck 1.40.11.sun1 (17-June-2008)
 datafs-OST0001: recovering journal
 datafs-OST0001: clean, 472/25608192 files, 1862944/102410358 blocks

 Yet, I still get:

 cd /proc/fs/lustre; find . -name *recov* -exec cat {} \;
 status: INACTIVE

 On the MGS, it seems to show as active...

 [r...@l1storage1 ~]# cat /proc/fs/lustre/lov/datafs-mdtlov/target_obd
 0: datafs-OST_UUID ACTIVE
 1: datafs-OST0001_UUID ACTIVE
 4: datafs-OST0004_UUID ACTIVE
 5: datafs-OST0005_UUID ACTIVE
 6: datafs-OST0006_UUID ACTIVE


We've seen OSTs come up as INACTIVE before.   We are not sure why it  
happens.   Sometimes it will transition into RECOVERY if you remount  
it (umount, mount).   Sometimes you may find that the OST is mounted  
read-only and you can force it back to read-write with mount (as in  
mount -o rw,remount device). Sometimes, if you wait, it will  
transition to ACTIVE on its own (perhaps passing through RECOVERY  
first, I don't know). We've intentionally and unintentionally  
experienced all three.
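By remount I just mean something like the following (device, mount point, and 
the recovery_status path are for your datafs-OST0001; adjust as needed):

   umount /mnt/ost0001
   mount -t lustre /dev/sdX /mnt/ost0001
   cat /proc/fs/lustre/obdfilter/datafs-OST0001/recovery_status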

I think Brian and/or Andreas have already mentioned the remount route.

Don't worry though.   Lustre really does work.   This sounds like  
normal tooth cutting.   You'll be ok.  :)

Charlie Taylor
UF HPC Center


Re: [Lustre-discuss] Another server question.

2009-02-04 Thread Charles Taylor

On Feb 4, 2009, at 4:33 AM, Andreas Dilger wrote:

 On Feb 03, 2009  12:21 -0500, Charles Taylor wrote:
 In our experience, despite what has been said and what we have read,
 if we lose or take down a single OSS, our clients lose access (i/o
 seems blocked) to the file system until that OSS is back up and has
 completed recovery.   That's just our experience and it has been very
 consistent.   We've never seen otherwise, though we would like  
 to.  :)

 To be clear - a client process will wait indefinitely until an OST
 is back alive, unless either the process is killed (this should be
 possible after the Lustre recovery timeout is exceeded, 100s by
 default), or the OST is explicitly marked inactive on the clients:

   lctl --device {failed OSC device on client} deactivate

 After the OSC is marked inactive, then all IO to that OST should
 immediately return with -EIO, and not hang.

Thanks Andreas, I think that clears things up and will help us  
understand what to expect going forward.

 If you have experiences other than this it is a bug.  If this isn't
 explained in the documentation it is a documentation bug.

If that is spelled out clearly in the documentation, I missed it  
(certainly possible).   I hope I indicated that this business has  
never been a show-stopper for us.   Typically, if we lose an OSS or  
OST our top priority is getting it back in service.   As you indicate,  
most clients wait and resume when recovery is complete and this is  
usually fine with us.  In fact, it's awesome and users understand it  
since it is akin to what they were used to w/ NFS - back in the day.

We love you man!   :)

Charlie Taylor
UF HPC Center



Re: [Lustre-discuss] Another server question.

2009-02-03 Thread Charles Taylor

On Feb 3, 2009, at 11:42 AM, Brian J. Murrell wrote:


 I down one of the servers (normal shutdown, not the MGD of course).
 OK, so the clients seem to be frozen in regards to the lustre.

 Only if they want to access objects (files, or file stripes) on that
 server that you shut down, yes.

In our experience, despite what has been said and what we have read,  
if we lose or take down a single OSS, our clients lose access (i/o  
seems blocked) to the file system until that OSS is back up and has  
completed recovery.   That's just our experience and it has been very  
consistent.   We've never seen otherwise, though we would like to.  :)


 Many here
 have noted that it should be ok, with the exception of files that  
 were
 stored on the downed server,

Again, not in our experience.   We are currently running 1.6.4.2 and  
have never seen this work.   Losing a single OSS renders the file  
system pretty much unusable until the OSS has recovered.   We could  
be doing something wrong, I suppose, but I'm not sure what.

 but that does not seem to be the case here.
 That is not my main concern however, the real question is, I bring  
 the server
 back up; check its ID by issuing lctl dl; I check the MGS by a cat / 
 proc/fs/lustre/devices
 and see the ID in there as UP. OK, so it all seems well again, but  
 the client
 is still (somewhat) stuck.

You have to wait for recovery to complete. You can check the  
recovery status on the OSSs and MGS/MDS by

cd /proc/fs/lustre; find . -name "*recov*" -exec cat {} \;

Once all the OSSs/MGS show recovery COMPLETE, clients will be able  
to access the file system again.
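
A slightly more direct form of the same check (same proc files, just
named explicitly; these paths are from our 1.6 servers and may differ
by version):

   cat /proc/fs/lustre/mds/*/recovery_status          # on the MDS
   cat /proc/fs/lustre/obdfilter/*/recovery_status    # on each OSS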

We've been running three separate Lustre file systems for over a year  
now and are *very* happy with it.There are a few things that we  
still don't understand and this is one of them.   We wish that when an  
OSS went down, we only lost access to files/objects on *that* OSS but,  
again, that has not been our experience.    Still, we've kissed a lot  
of distributed/parallel file system frogs.   We'll take Lustre, hands  
down.

Charlie Taylor
UF HPC Center


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Another server question.

2009-02-03 Thread Charles Taylor

On Feb 3, 2009, at 12:28 PM, Brian J. Murrell wrote:

 On Tue, 2009-02-03 at 12:21 -0500, Charles Taylor wrote:

 In our experience, despite what has been said and what we have read,
 if we lose or take down a single OSS, our clients lose access (i/o
 seems blocked) to the file system until that OSS is back up and has
 completed recovery.

 That is likely the real world results of taking down an OSS, indeed.
 But that is more likely simply due to the random distribution of
 files/stripes around your filesystem and that it won't take long for  
 all
 active clients to eventually want something from that missing OSS.

That could certainly be the case.


 Again, not in our experience.

 Have you actually tested your theory in a controlled environment where
 you could be sure that clients that got hung up have never tried to
 access an OST on missing OSS?

No, we've never set out to prove that it works or doesn't.   We are  
not complaining though - just saying that for us the practical  
ramification of an OSS going down is that the file system will be  
unusable until the OSS is back in service and recovery is complete.

  If so, and you are still finding that
 clients that don't touch the downed OSS are getting hung up, please,  
 by
 all means, file a bug.

Will do.   We'll be upgrading to 1.6.6 pretty soon and perhaps we'll  
do some more extensive testing then.

Regards,

Charlie


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lvbo_init failed

2008-07-17 Thread Charles Taylor


We are getting lots of these (always for the same resource) on one of  
our OSSs.


LustreError: 22308:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 1 Time(s)
LustreError: 5:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 1 Time(s)
LustreError: 22277:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 2 Time(s)
LustreError: 22274:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 3 Time(s)
LustreError: 22204:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 1 Time(s)
LustreError: 22193:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 2 Time(s)
LustreError: 22253:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 1 Time(s)
LustreError: 22200:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 2 Time(s)
LustreError: 22264:0:(ldlm_resource.c:719:ldlm_resource_add())  
lvbo_init failed for resource 5820180: rc -2: 1 Time(s)


We've tried to track down the object with lfs find but no joy so  
far.    I'm not even sure that is the right approach.   We found a bug  
pertaining to this in the lustre bugzilla but it looks like it was  
resolved, so I'm not sure that's the issue either.   Anyone else run  
into this before?   Is there something we can do to stop it?
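
For what it's worth, rc -2 is -ENOENT, so the OST apparently cannot find
that object.  One brute-force way to look for a file still referencing
that object id is a client-side stripe sweep - a sketch only (the mount
point is a placeholder, the id is the one from the messages above, and a
full tree walk is slow):

   MNT=/path/to/lustre/mount     # adjust to the affected file system
   find "$MNT" -type f | while read -r f; do
       lfs getstripe "$f" 2>/dev/null | grep -qw 5820180 && echo "$f"
   done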


We are running 1.6.4.2 on CentOS 4.5 with an updated kernel on the OSSs.

Linux hpcio7.ufhpc 2.6.18-8.1.14.el5.L-1642 #1 SMP Mon Feb 18 13:24:27  
EST 2008 x86_64 x86_64 x86_64 GNU/Linux).


This file system has been in production for about six months - first  
time we've seen this.


Charlie Taylor
UF HPC Center





___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre 1.6.5 install problem

2008-06-19 Thread Charles Taylor
Lustre doesn't know where your IB module symbols are.   When you  
configured lustre (in the build sense) you pointed it to a patched  
kernel tree.   In that directory is a Module.symvers file devoid of ib  
module symbols.  You should also have a Module.symvers in your /usr/ 
src/ofa_kernel directory (assuming you built OFED as well).   So...

cat /usr/src/ofa_kernel/Module.symvers >> patched_kernel_dir/Module.symvers

and run make install again and it should be happy.   For a 2.6.9  
kernel, you probably need OFED 1.2.
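
A quick sanity check before re-running make install is to grep the
patched tree's Module.symvers for a couple of the symbols named in the
warnings (the two symbols below are just examples from your output):

   grep -wE 'ib_create_cq|rdma_connect' patched_kernel_dir/Module.symvers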

Charlie Taylor
UF HPC Center

On Jun 18, 2008, at 5:55 AM, Johnlya wrote:

 Install step is:
 rpm -Uvh --nodeps e2fsprogs-devel-1.40.7.sun3-0redhat.x86_64.rpm
 rpm -Uvh e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
 cd ../PyXML/
 tar -zxvf  PyXML-0.8.4.tar.gz
 cd PyXML-0.8.4
 python setup.py build
 python setup.py install
 cd ../../Expect
 rpm -ivh expect-5.42.1-1.src.rpm
 cd ../1.6.5/
 rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.x86_64.rpm
 rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.
 1.6.5smp.x86_64.rpm
 rpm -ivh lustre-modules-1.6.5-2.6.9_67.0.7.EL_lustre.
 1.6.5smp.x86_64.rpm

 when install lustre-modules, it displays warning:
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_create_cq
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_addr
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_dereg_mr
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_reject
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_disconnect
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_route
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_bind_addr
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_create_qp
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_destroy_cq
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_create_id
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_listen
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_qp
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_get_dma_mr
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_alloc_pd
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_connect
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_modify_qp
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_id
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol rdma_accept
 WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/
 lustre/ko2iblnd.ko needs unknown symbol ib_dealloc_pd

 Please tell me why?
 Thank you
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MDS Problems

2008-06-14 Thread Charles Taylor

On Jun 13, 2008, at 5:46 PM, Andreas Dilger wrote:

 On Jun 13, 2008  16:03 -0400, Charles Taylor wrote:
 We have been running the config below on three different lustre file
 systems since early January and, for the most part, things have been
 pretty stable.We are now experiencing frequent hangs on some
 clients - particularly our interactive login nodes.All processes
 get blocked behind Lustre I/O requests.   When this happens there are
 *no* messages in either dmesg or syslog on the clients. They seem
 unaware of a problem.

 This is likely due to client statahead problems.  Please disable  
 this
 with echo 0 > /proc/fs/lustre/llite/*/statahead_max on the clients.
 This should also be fixed in 1.6.5

This seems to have done the trick.   Odd though that we've been  
running this way for several months and it didn't seem to be an issue  
until now.   We saw the discussions of this go by at one point and we  
should have just taken care of it then whether we were seeing it or  
not.   Thanks for reminding us of it.
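
For anyone else doing the same, on the clients that amounts to roughly
the following (a sketch; the loop is only there in case a node has more
than one llite instance):

   for f in /proc/fs/lustre/llite/*/statahead_max; do
       echo 0 > "$f"
   done
   cat /proc/fs/lustre/llite/*/statahead_max    # confirm it took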



 1. A ton of lustre-log.M.N files get dumped into /tmp in a  short
 period of time.   Most of them appear to be full of garbage and
 unprintable characters rather than thread stack traces.   Many of  
 them
 are also zero length.

 The lustre-log files are not stack traces.  They are dumped lustre  
 debug
 logs.

Got it.



 We have been adjusting lru_size on the clients but so far it has made
 no difference.   We have options mds mds_num_threads=512 and our
 system timeout is 1000 (sure, go ahead and flame me but if we don't  
 do
 that we get tons of endpoint transport failures on the clients and
 no, there are no connectivity issues).   :)

 We are open to suggestion and wondering if we should update the MDSs
 to 1.6.5.   Can we do that safely without also upgrading the clients
 and OSTs?

 In general the MDS and OSS nodes should run the same level of  
 software,
 as that is what we test, but there isn't a hard requirement for it.

Would it be reasonable then, to upgrade the MDSs and OSSs but leave  
the clients at 1.6.4.2, or is that asking for trouble?   I think this  
comes up a lot and I'm pretty sure people have said they do it  
successfully.   I'm just wondering if it is a *design* goal that is  
architected in or just something that happens to work most of the time.

Thanks again,

Charlie Taylor
UF HPC Center


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MDS Problems

2008-06-13 Thread Charles Taylor

We have been running the config below on three different lustre file  
systems since early January and, for the most part, things have been  
pretty stable.We are now experiencing frequent hangs on some  
clients - particularly our interactive login nodes.All processes   
get blocked behind Lustre I/O requests.   When this happens there are  
*no* messages in either dmesg or syslog on the clients. They seem  
unaware of a problem.

However, on the MDS we see the following...


1. A ton of lustre-log.M.N files get dumped into /tmp in a  short  
period of time.   Most of them appear to be full of garbage and  
unprintable characters rather than thread stack traces.   Many of them  
are also zero length.

2. Lots of dmesg output similar to that appended (see below).

3. Pretty much the same in syslog.

4. The frequency/period of these events seems to be consistent with the  
timeouts associated with the following messages...

 Lustre: 4534:0:(watchdog.c:312:lcw_update_time())  
Expired watchdog for pid 4534 disabled after 499.9514s


We have been adjusting lru_size on the clients but so far it has made  
no difference.   We have options mds mds_num_threads=512 and our  
system timeout is 1000 (sure, go ahead and flame me but if we don't do  
that we get tons of endpoint transport failures on the clients and  
no, there are no connectivity issues).   :)
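
For reference, those knobs on our setup look roughly like this (the
modprobe.conf location is just the usual EL5 spot, and the lru_size
value in the loop is purely illustrative):

   # /etc/modprobe.conf on the MDS
   options mds mds_num_threads=512

   # per-client lock LRU cap, one lock namespace per target
   for ns in /proc/fs/lustre/ldlm/namespaces/*; do
       echo 100 > "$ns"/lru_size
   done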

We are open to suggestion and wondering if we should update the MDSs  
to 1.6.5.   Can we do that safely without also upgrading the clients  
and OSTs?

Our config is as  below.

Thanks,

Charlie Taylor
UF HPC Center
======================================================================


Config:
   Lustre 1.6.4.2
   CentOS 5.0
   Kernel 2.6.18-8.1.14
   OFED 1.2
   320 o2ib clients
 80  tcp clients


dmesg output:
=
LustreError: dumping log to /tmp/lustre-log.1213385571.4240
ll_mdt_20 S 81023d2c9700 0  4070  1  4071   
4069 (L-TLB)
  81023d2c9700 81023d2c9630 81023d2c9630 000a
  81023fa09100 8101438ca7a0 0002bb03681325fb 8f21
  81023fa092e8 0001 8866eb51 
Call Trace:
  [8866eb51] :ptlrpc:ldlm_run_cp_ast_work+0x161/0x1f0
  [88686da0] :ptlrpc:ldlm_expired_completion_wait+0x0/0x250
  [800611f7] schedule_timeout+0x8a/0xad
  [80092c5e] process_timeout+0x0/0x5
  [886881fd] :ptlrpc:ldlm_completion_ast+0x35d/0x6a0
  [886705e9] :ptlrpc:ldlm_lock_enqueue+0x559/0x5c0
  [80086a74] default_wake_function+0x0/0xe
  [8866ce6a] :ptlrpc:ldlm_lock_addref_internal_nolock+0x3a/ 
0x90
  [88684bb0] :ptlrpc:ldlm_blocking_ast+0x0/0x2d0
  [88685e24] :ptlrpc:ldlm_cli_enqueue_local+0x454/0x510
  [888d5e87] :mds:mds_fid2locked_dentry+0x1d7/0x2a0
  [88687ea0] :ptlrpc:ldlm_completion_ast+0x0/0x6a0
  [888d6647] :mds:mds_getattr_lock+0x6f7/0xc70
  [885b11c4] :ksocklnd:ksocknal_alloc_tx+0x1c4/0x270
  [888d7191] :mds:mds_intent_policy+0x5d1/0xbe0
  [8854dca7] :lnet:lnet_prep_send+0x67/0xb0
  [88673786] :ptlrpc:ldlm_resource_putref+0x1b6/0x3b0
  [88670183] :ptlrpc:ldlm_lock_enqueue+0xf3/0x5c0
  [8866dbbd] :ptlrpc:ldlm_lock_create+0x98d/0x9c0
  [88690660] :ptlrpc:ldlm_server_completion_ast+0x0/0x570
  [8868cda0] :ptlrpc:ldlm_handle_enqueue+0xd90/0x1410
  [88690bd0] :ptlrpc:ldlm_server_blocking_ast+0x0/0x690
  [888e0cad] :mds:mds_handle+0x46dd/0x58ff
  [8860bcb2] :obdclass:class_handle2object+0xd2/0x160
  [886a7280] :ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90
  [886a4e35] :ptlrpc:lustre_swab_buf+0xc5/0xf0
  [886aca8b] :ptlrpc:ptlrpc_server_handle_request+0xb0b/0x1270
  [800608e8] thread_return+0x0/0xea
  [8006b165] do_gettimeofday+0x50/0x92
  [8851b056] :libcfs:lcw_update_time+0x16/0x100
  [8003cee3] lock_timer_base+0x1b/0x3c
  [886af4cc] :ptlrpc:ptlrpc_main+0x7dc/0x950
  [80086a74] default_wake_function+0x0/0xe
  [8005be25] child_rip+0xa/0x11
  [886aecf0] :ptlrpc:ptlrpc_main+0x0/0x950
  [8005be1b] child_rip+0x0/0x11

LustreError: dumping log to /tmp/lustre-log.1213385600.4070
Lustre: 4338:0:(ldlm_lib.c:519:target_handle_reconnect()) hpcdata- 
MDT: a9f365a3-8746-6ed5-e45e-cd292891ece2 reconnecting
Lustre: 4338:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 6  
previous similar messages
Lustre: 4338:0:(ldlm_lib.c:747:target_handle_connect()) hpcdata- 
MDT: refuse reconnection from [EMAIL PROTECTED] 
@o2ib to 0x8101cdbd6000; still busy with 2 active RPCs
Lustre: 4338:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 6  
previous similar messages
ll_mdt_501S 81023ca51700 0  4551  1  4552   
4550 (L-TLB)
  81023ca51700 

Re: [Lustre-discuss] MDS crash and the Dilger Procedure

2008-06-06 Thread Charles Taylor

Paging Dr. Dilger, paging Dr. Dilger.   Dr. Dilger, you are needed in  
the emergency room.   :)

On Jun 5, 2008, at 5:28 PM, Jakob Goldbach wrote:

 Hi,

 I just had to go through the Dilger procedure after MDS crashing  
 when
 mounting MDT. The system is running fine now - glad that I just  
 learned
 about this procedure.

 trace attached.

Stay tuned for more episodes of... As the MDT Mounts.

ct





___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre Mount Crashing

2008-06-02 Thread Charles Taylor

We lost our MDS/MGS to a power failure yesterday evening. Just to  
be safe, we ran e2fsck on the combined MDT/MGT and there were only a  
couple of minor complaints about HTREE issues that it fixed.The  
MDT/MGT now fsck's cleanly. The problem is that, despite the clean  
e2fsck, the MGS is crashing in the lustre mount code when attempting  
to mount the MDT.

It is a scratch file system so it is not backed up.   Still, it is a  
pain to lose the data.I'm assuming this is not normal and there is  
not much in the manual about doing anything more than e2fsck but I  
want to ask if anyone else has seem something like this before and  
might have some additional suggestions before I trash the data and  
reformat the file system.

Thanks,

Charlie Taylor
UF HPC Center
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Mount Crashing

2008-06-02 Thread Charles Taylor


On Jun 2, 2008, at 11:49 AM, Dennis Nelson wrote:




 Unless you are getting some kind of kernel panic, that stack trace
 should be in the syslog.



No, it is going down hard in a kernel panic. All of the stack  
trace I can see at the moment looks like (scribbled by hand... so  
forgive me for leaving off the addresses and offsets).



:libcfs:cfs_alloc
:obdclass:lustre_init_lsi
:obdclass:lustre_fill_super
:obdclass::lustre_fill_super
set_anon_super
set_anon_super
:obd_class:lustre_fill_super
et_sb_nodev
vfs_kern_mount
do_kern_mount
do_mount
__handle_mm_fault
__up_read
do_page_fault
zone_statistics
__alloc_pages
sys_mount
system_call

RIP   .   resched_task


I wish I could get the whole trace to you.   We might try to get kdump  
on there but my luck with kdump has been mixed.   It seems to work  
with some chipsets and not with others.


Anyway, we may just be out of luck.   I just hate to give up too  
easily because it seems like everything is solid yet we crash on or  
just after the mount.   This is on a MDS that has been running without  
a problem for 5 months (lustre 1.6.4.2 ).


uname -a
Linux hpcmds 2.6.18-8.1.14.el5.L-1642 #2 SMP Thu Feb 21 15:42:14 EST  
2008 x86_64 x86_64 x86_64 GNU/Linux


I don't know if that trace is a lot of help to you since it is not  
complete (which is why I didn't post it initially) but maybe there is  
something there of use.


Regards,

Charlie Taylor
UF HPC Center




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Charles Taylor

Sure, we will provide you with more details of our installation but  
let me first say that, if recollection serves, we did not pull that  
number out of a hat.   I believe that there is a formula in one of  
the lustre tuning manuals for calculating the recommended timeout  
value.   I'll have to take a moment to go back and find it.   Anyway,  
if you use that formula for our cluster, the recommended timeout  
value, I think, comes out to be *much* larger than 1000.

Later this morning, we will go back and find that formula and share  
with the list how we came up w/ our timeout.   Perhaps you can show  
us where we are going wrong.

One more comment: We just brought up our second large lustre file  
system.   It is 80+ TB served by 24 OSTs on two (pretty beefy)  
OSSs.   We just achieved over 2GB/sec of sustained (large block,  
sequential) I/O from an aggregate of 20 clients.Our design target  
was 1.0 GB/sec/OSS and we hit that pretty comfortably.   That said,  
when we first mounted the new (1.6.4.2) file system across all 400  
nodes in our cluster, we immediately started getting transport  
endpoint failures and evictions.   We looked rather intensively for  
network/fabric problems (we have both o2ib and tcp nids) and could  
find none.   All of our MPI apps are/were running just fine.   The  
only way we could get rid of the evictions and transport endpoint  
failures was by increasing the timeout.   Also, we knew to do this  
based on our experience with our first lustre file system (1.6.3 +  
patches) where we had to do the same thing.

Like I said, a little bit later, Craig or I will post more details  
about our implementation.   If we are doing something wrong with  
regard to this timeout business, I would love to know what it is.

Thanks,

Charlie Taylor
UF HPC Center

On Mar 4, 2008, at 4:04 PM, Brian J. Murrell wrote:

 On Tue, 2008-03-04 at 15:55 -0500, Aaron S. Knister wrote:
 I think I tried that before and it didn't help, but I will try it
 again. Thanks for the suggestion.

 Just so you guys know, 1000 seconds for the obd_timeout is very, very
 large!  As you could probably guess, we have some very, very big  
 Lustre
 installations and to the best of my knowledge none of them are using
 anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
 experience to some of these very large clusters might correct me) the
 largest value that the largest clusters are using is in the
 neighbourhood of 300s.  There has to be some other problem at play  
 here
 that you need 1000s.

 Can you both please report your lustre and kernel versions?  I know  
 you
 said latest Aaron, but some version numbers might be more solid  
 to go
 on.

 b.


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-05 Thread Charles Taylor
Well, go figure.    We are running...

Lustre: 1.6.4.2 on clients and servers
Kernel: 2.6.18-8.1.14.el5Lustre (clients and servers)
Platform: X86_64 (opteron 275s, mostly)
Interconnect: IB,  Ethernet
IB Stack: OFED 1.2

We already posted our procedure for patching the kernel, building  
OFED, and building lustre so I don't think I'll go into that  
again.    Like I said, we just brought a new file system online.
Everything looked fine at first with just a few clients mounted. 
Once we mounted all 408 (or so), we started getting all kinds of  
transport endpoint failures and the MGSs and OSTs were evicting  
clients left and right.    We looked for network problems and could  
not find any of any substance.    Once we increased the obd/lustre/system  
timeout setting as previously discussed, the errors  
vanished.    This was consistent with our experience with 1.6.3 as  
well.    That file system has been online since early December.
Both file systems appear to be working well.

I'm not sure what to make of it.    Perhaps we are just masking  
another problem.    Perhaps there are some other, related values  
that need to be tuned.    We've done the best we could but I'm sure  
there is still much about Lustre we don't know.   We'll try to get  
someone out to the next class but until then, we're on our own, so to  
speak.

Charlie Taylor
UF HPC Center


 Just so you guys know, 1000 seconds for the obd_timeout is very, very
 large!  As you could probably guess, we have some very, very big  
 Lustre
 installations and to the best of my knowledge none of them are using
 anywhere near that.  AFAIK (and perhaps a Sun engineer with closer
 experience to some of these very large clusters might correct me) the
 largest value that the largest clusters are using is in the
 neighbourhood of 300s.  There has to be some other problem at play  
 here
 that you need 1000s.

 I can confirm that at a recent large installation with several  
 thousand
 clients, the default of 100 is in effect.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Charles Taylor
We've seen this before as well.Our experience is that the  
obd_timeout is  far too small for large clusters (ours is 400+  
nodes)  and the only way we avoid these errors is by setting it to  
1000 which seems high to us but  appears to work and puts an end to  
the transport endpoint shutdowns.

On the MDS

lctl conf_param srn.sys.timeout=1000

You may have to do this on the OSS's as well unless you restart the  
OSS's but I could be wrong on that.   You should check it everywhere  
with...

cat /proc/sys/lustre/timeout
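
If you just want to try the larger value on one node first, the same
proc file is writable (a sketch; unlike conf_param this is not
persistent and is lost on reboot/remount):

echo 1000 > /proc/sys/lustre/timeout
cat /proc/sys/lustre/timeout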


On Mar 4, 2008, at 3:31 PM, Aaron S. Knister wrote:

 This morning I've had both my infiniband and tcp lustre clients  
 hiccup. They are evicted from the server presumably as a result of  
 their high load and consequent timeouts. My question is- why don't  
 the clients re-connect. The infiniband and tcp clients both give  
 the following message when I type df - Cannot send after  
 transport endpoint shutdown (-108). I've been battling with this on  
 and off now for a few months. I've upgraded my infiniband switch  
 firmware, all the clients and servers are running the latest  
 version of lustre and the lustre patched kernel. Any ideas?

 -Aaron
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Bug 13917

2008-02-20 Thread Charles Taylor

Sure enough, we updated enough clients with the 1.6.4.2 bits to  
accommodate our 512-way job and it now fires right up.   That sure  
read like a server side fix to us but thanks for setting us  
straight.    Thanks for clarifying the upgrade questions as well.

Regards,

Charlie Taylor
UF HPC Center

On Feb 20, 2008, at 9:27 AM, Brian J. Murrell wrote:

 On Sun, 2008-02-17 at 08:04 -0500, Charles Taylor wrote:
 We are running lustre 1.6.3 with some patches we applied by hand with
 a patched 2.6.18-8.1.14 kernel on both the clients and servers.
 We think we are now hitting lustre bug 13197

 Do you mean 13917?

 and can no longer
 operate without a fix.We could apply the 13197 patch to 1.6.3 and
 keep going as we are but we would like to start moving  to 1.6.4.2.

 Good idea.

 Would it be insane to update to 1.6.4.2 on the MDS and OSSs while
 continuing to run 1.6.3 on the clients

 If you want to fix 13917, yes.  13917 is a client-side fix.

 or is 1.6.4.2 similar enough
 to interoperate with 1.6.3 clients?

 Our interoperability commitment provides that 1.6.3 and 1.6.4.2 will
 inter-operate, however upgrading the OSS and MDS only will not fix
 13917.

 b.

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldlm_enqueue operation failures

2008-02-19 Thread Charles Taylor
:(ldlm_lib.c: 
1442:target_send_reply_msg()) Skipped 207 previous similar messages
Feb 19 08:03:03 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c: 
515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
Feb 19 08:03:03 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c: 
515:mgs_handle()) Skipped 203 previous similar messages
Feb 19 08:07:30 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) @@@ processing error (-107)   
[EMAIL PROTECTED] x6548994/t0 o101-?@?:-1 lens 232/0 ref 0 fl  
Interpret:/0/0 rc -107/0
Feb 19 08:07:30 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) Skipped 205 previous similar messages
Feb 19 08:13:05 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
Feb 19 08:13:05 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
515:mgs_handle()) Skipped 207 previous similar messages
Feb 19 08:17:33 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) @@@ processing error (-107)   
[EMAIL PROTECTED] x680167/t0 o101-?@?:-1 lens 232/0 ref 0 fl  
Interpret:/0/0 rc -107/0
Feb 19 08:17:33 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) Skipped 209 previous similar messages
Feb 19 08:23:07 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c: 
515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
Feb 19 08:23:07 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c: 
515:mgs_handle()) Skipped 205 previous similar messages



On Feb 19, 2008, at 12:15 AM, Oleg Drokin wrote:

 Hello!

 On Feb 18, 2008, at 5:13 PM, Charles Taylor wrote:
 Feb 18 15:32:47 r5b-s42 kernel: LustreError: 11-0: an error occurred
 while communicating with [EMAIL PROTECTED] The mds_close operation
 failed with -116
 Feb 18 15:32:47 r5b-s42 kernel: LustreError: Skipped 3 previous
 similar messages
 Feb 18 15:32:47 r5b-s42 kernel: LustreError: 7828:0:(file.c:
 97:ll_close_inode_openhandle()) inode 17243099 mdc close failed: rc =
 -116
 Feb 18 15:32:47 r5b-s42 kernel: LustreError: 7828:0:(file.c:
 97:ll_close_inode_openhandle()) Skipped 1 previous similar messages

 These mean client was evicted (And later successfully reconnected)  
 after
 opening file successfully.

 We need all the failure/evictions info since job started to make any
 meaningful progress, because as of now I have no idea why clients
 were evicted.

 Bye,
 Oleg

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldlm_enqueue operation failures

2008-02-19 Thread Charles Taylor
One more thing worth mentioning, we have no more callback or watchdog  
timer expired messages so 1.6.4.2 seems to have  fixed that.  So  
it just seems like if 512 threads try to open the same file at  
roughly the same time, we are running out of some resource on the MDS  
or OSSs that keeps Lustre from satisfying the request.

Charlie

On Feb 19, 2008, at 8:45 AM, Charles Taylor wrote:

 Yes, I understand.Right now we are just trying to isolate our
 problems so that we don't provide information that is not related to
 the issue.  Just to recap we were running pretty well with our
 patched 1.6.3 implementation.   However, we could not start a 512-way
 job in which each thread tries to open a single copy of the same
 file.    Inevitably, one or more threads would get a can not open
 file error and call mpi_abort() even though the file is there and
 many other threads open it successfully. We thought we were
 hitting lustre bug 13197 which is supposed to be fixed in 1.6.4.2 so
 we upgraded our MGS/MDS and OSSs to 1.6.4.2.   We have *not* upgraded
 the clients (400+ of them) and were hoping to avoid that for the  
 moment.

 The upgrade seemed to go well and the file system is accessible on
 all the clients. However, our 512-way application still cannot
 run.We tried modifying the app so that each thread opens its own
 copy of the input file (i.e. file.in.rank) and duplicated the input
 file 512 times.    This allowed the job to start but it eventually
 failed anyway with another can not open file error.

 ERROR (proc. 00410) - cannot open file: ./
 skews_ms2p0.mixt.cva_00411_5.3E-04


 This seems to clearly indicate a problem with Lustre and/or our
 implementation.

 On a perhaps separate note (perhaps not), since the upgrade
 yesterday, we are seeing the messages below every ten minutes.
 Perhaps we need to shut down and impose some sanity on all this but in
 reality, this is the only job that is having trouble (out of
 hundreds, sometimes thousands) and the file system seems to be
 operating just fine otherwise.

 Any insight is appreciated at this point.   We've put a lot of effort
 into lustre at this point and would like to stick with it but right
 now it looks like it can't scale to a 512 way job.

 Thanks for the help,

 Charlie



 Feb 19 07:07:09 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) Skipped 202 previous similar messages
 Feb 19 07:12:41 hpcmds kernel: LustreError: 6056:0:(mgs_handler.c:
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 19 07:12:41 hpcmds kernel: LustreError: 6056:0:(mgs_handler.c:
 515:mgs_handle()) Skipped 201 previous similar messages
 Feb 19 07:17:12 hpcmds kernel: LustreError: 7162:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) @@@ processing error (-107)
 [EMAIL PROTECTED] x36818597/t0 o101-?@?:-1 lens 232/0 ref 0
 fl Interpret:/0/0 rc -107/0
 Feb 19 07:17:12 hpcmds kernel: LustreError: 7162:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) Skipped 207 previous similar messages
 Feb 19 07:22:42 hpcmds kernel: LustreError: 6056:0:(mgs_handler.c:
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 19 07:22:42 hpcmds kernel: LustreError: 6056:0:(mgs_handler.c:
 515:mgs_handle()) Skipped 209 previous similar messages
 Feb 19 07:27:16 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) @@@ processing error (-107)
 [EMAIL PROTECTED] x679809/t0 o101-?@?:-1 lens 232/0 ref 0 fl
 Interpret:/0/0 rc -107/0
 Feb 19 07:27:16 hpcmds kernel: LustreError: 6056:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) Skipped 207 previous similar messages
 Feb 19 07:32:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c:
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 19 07:32:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c:
 515:mgs_handle()) Skipped 205 previous similar messages
 Feb 19 07:37:16 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) @@@ processing error (-107)
 [EMAIL PROTECTED] x140057135/t0 o101-?@?:-1 lens 232/0 ref 0
 fl Interpret:/0/0 rc -107/0
 Feb 19 07:37:16 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) Skipped 201 previous similar messages
 Feb 19 07:42:52 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c:
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 19 07:42:52 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c:
 515:mgs_handle()) Skipped 205 previous similar messages
 Feb 19 07:47:17 hpcmds kernel: LustreError: 7162:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) @@@ processing error (-107)
 [EMAIL PROTECTED] x5243687/t0 o101-?@?:-1 lens 232/0 ref 0 fl
 Interpret:/0/0 rc -107/0
 Feb 19 07:47:17 hpcmds kernel: LustreError: 7162:0:(ldlm_lib.c:
 1442:target_send_reply_msg()) Skipped 207 previous similar messages
 Feb 19 07:52:59 hpcmds kernel: LustreError: 6057:0:(mgs_handler.c:
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 19 07

[Lustre-discuss] ldlm_enqueue operation failures

2008-02-18 Thread Charles Taylor


FWIW, we got our  MGS/MDS and OSSs upgraded to 1.6.4.2 and they seem  
to be fine.The clients are still running 1.6.3.

Unfortunately, the upgrade did not resolve our issue.    One of our  
users has an mpi app where every thread opens the same input file  
(actually several in succession).Although we have run this job  
successfully before on up to 512 procs, it is not working now. 
Lustre seems to be locking up when all the threads go after the same  
file (to open) and we see things such as ...

Feb 18 15:42:11 r3b-s16 kernel: LustreError: 11-0: an error occurred  
while communicating with [EMAIL PROTECTED] The ldlm_enqueue operation  
failed with -107
Feb 18 15:42:11 r3b-s16 kernel: LustreError: Skipped 21 previous  
similar messages
Feb 18 15:52:51 r3b-s16 kernel: LustreError: 11-0: an error occurred  
while communicating with [EMAIL PROTECTED] The ldlm_enqueue operation  
failed with -107
Feb 18 15:52:51 r3b-s16 kernel: LustreError: Skipped 19 previous  
similar messages

[EMAIL PROTECTED] is our MDS.   We have 512 ll_mdt threads (the max).

The actual error in the code on some of the threads will be that the  
file was not found (even though it was clearly there) and this only  
happens after about an 8 minute timeout.

Note that we have the file system mounted with the -o flock  
option. Is this part of the problem or are we hitting yet another  
bug?

Thanks,

Charlie Taylor
UF HPC Center
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldlm_enqueue operation failures

2008-02-18 Thread Charles Taylor
Well, the log on the MDS at the time of the failure looks like...

Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
515:mgs_handle()) Skipped 263 previous similar messages
Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) @@@ processing error (-107)   
[EMAIL PROTECTED] x1602651/t0 o101-?@?:-1 lens 232/0 ref 0 fl  
Interpret:/0/0 rc -107/0
Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
1442:target_send_reply_msg()) Skipped 427 previous similar messages
Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
1474:mds_close()) @@@ no handle for file close ino 43116025: cookie  
0x1938027bf9d67349  [EMAIL PROTECTED] x1789/t0 o35- 
 beb7df79-6127-c0ca-9d36-2a96817a77a9@:-1 lens 296/1736 ref 0 fl  
Interpret:/0/0 rc 0/0
Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
1474:mds_close()) Skipped 161 previous similar messages
Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
210:waiting_locks_callback()) ### lock callback timer expired:  
evicting client 2bdea9d4-43c3-a0b0-2822- 
[EMAIL PROTECTED] nid [EMAIL PROTECTED]  ns: mds- 
ufhpc-MDT_UUID lock: 810053d3f100/0x688cfbc7df2ef487 lrc:  
1/0,0 mode: CR/CR res: 21878337/3424633214 bits 0x3 rrc: 582 type:  
IBT flags: 430 remote: 0x95c1d2685c2c76d9 expref: 21 pid 6090
Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
210:waiting_locks_callback()) Skipped 3 previous similar messages
Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
962:ldlm_handle_enqueue()) ### lock on destroyed export  
8101096ec000 ns: mds-ufhpc-MDT_UUID lock:  
810225fe12c0/0x688cfbc7df2ef505 lrc: 2/0,0 mode: CR/CR res:  
21878337/3424633214 bits 0x3 rrc: 579 type: IBT flags: 430  
remote: 0x95c1d2685c2c76e0 expref: 6 pid 6265
Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
962:ldlm_handle_enqueue()) Skipped 3 previous similar messages
Feb 18 15:33:17 hpcmds kernel: Lustre: 6061:0:(mds_reint.c: 
127:mds_finish_transno()) commit transaction for disconnected client  
2bdea9d4-43c3-a0b0-2822-c49ecfe6e044: rc 0

We don't have any watchdog timeouts associated with the event so I  
don't have any tracebacks from those.One one of the clients we  
have...

Feb 18 15:33:17 r1b-s23 kernel: LustreError: 11-0: an error occurred  
while communicating with [EMAIL PROTECTED] The ldlm_enqueue operation  
failed with -107
Feb 18 15:33:17 r1b-s23 kernel: LustreError: Skipped 2 previous  
similar messages
Feb 18 15:33:17 r1b-s23 kernel: Lustre: ufhpc-MDT-mdc- 
81012d370800: Connection to service ufhpc-MDT via nid  
[EMAIL PROTECTED] was lost; in progress operations using thi\
s service will wait for recovery to complete.
Feb 18 15:33:17 r1b-s23 kernel: Lustre: Skipped 2 previous similar  
messages
Feb 18 15:33:17 r1b-s23 kernel: LustreError: 167-0: This client was  
evicted by ufhpc-MDT; in progress operations using this service  
will fail.
Feb 18 15:33:17 r1b-s23 kernel: LustreError: Skipped 2 previous  
similar messages
Feb 18 15:33:17 r1b-s23 kernel: LustreError: 12004:0:(mdc_locks.c: 
423:mdc_finish_enqueue()) ldlm_cli_enqueue: -5
Feb 18 15:33:17 r1b-s23 kernel: LustreError: 12004:0:(mdc_locks.c: 
423:mdc_finish_enqueue()) Skipped 3 previous similar messages
Feb 18 15:33:17 r1b-s23 kernel: Lustre: ufhpc-MDT-mdc- 
81012d370800: Connection restored to service ufhpc-MDT using  
nid [EMAIL PROTECTED]
Feb 18 15:33:17 r1b-s23 kernel: Lustre: Skipped 2 previous similar  
messages


ct



On Feb 18, 2008, at 4:42 PM, Oleg Drokin wrote:

 Hello!

 On Feb 18, 2008, at 4:29 PM, Charles Taylor wrote:

 Unfortunately, the upgrade did not resolve our issue.    One of our
 users has an mpi app where every thread opens the same input file
 (actually several in succession).Although we have run this job
 successfully before on up to 512 procs, it is not working now.
 Lustre seems to be locking up when all the threads go after the same
 file (to open) and we see things such as ...

 Can you upload full log from start of problematic job to end  
 somewhere?
 Also somewhere when first watchdog timeouts hit, it would be nice  
 if you
 can do sysrq-t on MDS too to get traces of all threads (you need to  
 have
 big dmesg buffer for them to fit, of use serial console).
 Is the job uses flocks/fcntl locks at all? if not, then don't worry  
 about
 mounting with -o flock.

 Bye,
 Oleg

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldlm_enqueue operation failures

2008-02-18 Thread Charles Taylor
Well, yes.   But the evictions are the result of the job trying to  
start.   Absent that, there are no evictions.A bunch of threads  
trying to open the same file should not cause the clients to be  
evicted.That's an odd way of dealing with concurrency.  :)

Charlie

On Feb 18, 2008, at 4:57 PM, Oleg Drokin wrote:

 Hello!

 On Feb 18, 2008, at 4:55 PM, Charles Taylor wrote:
 Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
 515:mgs_handle()) Skipped 263 previous similar messages
 Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
 1442:target_send_reply_msg()) @@@ processing error (-107)   
 [EMAIL PROTECTED] x1602651/t0 o101-?@?:-1 lens 232/0 ref 0  
 fl Interpret:/0/0 rc -107/0
 Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
 1442:target_send_reply_msg()) Skipped 427 previous similar messages
 Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
 1474:mds_close()) @@@ no handle for file close ino 43116025:  
 cookie 0x1938027bf9d67349  [EMAIL PROTECTED] x1789/t0 o35- 
 beb7df79-6127-c0ca-9d36-2a96817a77a9@:-1 lens 296/1736 ref 0 fl  
 Interpret:/0/0 rc 0/0
 Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
 1474:mds_close()) Skipped 161 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
 210:waiting_locks_callback()) ### lock callback timer expired:  
 evicting client 2bdea9d4-43c3-a0b0-2822- 
 [EMAIL PROTECTED] nid [EMAIL PROTECTED]  ns:  
 mds-ufhpc-MDT_UUID lock: 810053d3f100/0x688cfbc7df2ef487  
 lrc: 1/0,0 mode: CR/CR res: 21878337/3424633214 bits 0x3 rrc: 582  
 type: IBT flags: 430 remote: 0x95c1d2685c2c76d9 expref: 21 pid  
 6090
 Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
 210:waiting_locks_callback()) Skipped 3 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
 962:ldlm_handle_enqueue()) ### lock on destroyed export  
 8101096ec000 ns: mds-ufhpc-MDT_UUID lock:  
 810225fe12c0/0x688cfbc7df2ef505 lrc: 2/0,0 mode: CR/CR res:  
 21878337/3424633214 bits 0x3 rrc: 579 type: IBT flags: 430  
 remote: 0x95c1d2685c2c76e0 expref: 6 pid 6265
 Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
 962:ldlm_handle_enqueue()) Skipped 3 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: Lustre: 6061:0:(mds_reint.c: 
 127:mds_finish_transno()) commit transaction for disconnected  
 client 2bdea9d4-43c3-a0b0-2822-c49ecfe6e044: rc 0

 This looks like in the middle of eviction storm, and by this point  
 MDS and MGS anlready evicted tons of clients for unknown reasons  
 (should be in the log before those messages).

 Bye,
 Oleg

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldlm_enqueue operation failures

2008-02-18 Thread Charles Taylor

We also see these on some of the clients...

Feb 18 15:32:47 r5b-s42 kernel: LustreError: 11-0: an error occurred  
while communicating with [EMAIL PROTECTED] The mds_close operation  
failed with -116
Feb 18 15:32:47 r5b-s42 kernel: LustreError: Skipped 3 previous  
similar messages
Feb 18 15:32:47 r5b-s42 kernel: LustreError: 7828:0:(file.c: 
97:ll_close_inode_openhandle()) inode 17243099 mdc close failed: rc =  
-116
Feb 18 15:32:47 r5b-s42 kernel: LustreError: 7828:0:(file.c: 
97:ll_close_inode_openhandle()) Skipped 1 previous similar message

I'm assuming some of the threads succeed in opening the file.   When  
one fails, it calls mpi_abort() at which point all those threads that  
successfully opened the file then try to close it.Apparently they  
can't close the file at that point either.I'm guessing of course,  
but it seems plausible.

ct

On Feb 18, 2008, at 4:57 PM, Oleg Drokin wrote:

 Hello!

 On Feb 18, 2008, at 4:55 PM, Charles Taylor wrote:
 Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
 Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
 515:mgs_handle()) Skipped 263 previous similar messages
 Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
 1442:target_send_reply_msg()) @@@ processing error (-107)   
 [EMAIL PROTECTED] x1602651/t0 o101-?@?:-1 lens 232/0 ref 0  
 fl Interpret:/0/0 rc -107/0
 Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
 1442:target_send_reply_msg()) Skipped 427 previous similar messages
 Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
 1474:mds_close()) @@@ no handle for file close ino 43116025:  
 cookie 0x1938027bf9d67349  [EMAIL PROTECTED] x1789/t0 o35- 
 beb7df79-6127-c0ca-9d36-2a96817a77a9@:-1 lens 296/1736 ref 0 fl  
 Interpret:/0/0 rc 0/0
 Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
 1474:mds_close()) Skipped 161 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
 210:waiting_locks_callback()) ### lock callback timer expired:  
 evicting client 2bdea9d4-43c3-a0b0-2822- 
 [EMAIL PROTECTED] nid [EMAIL PROTECTED]  ns:  
 mds-ufhpc-MDT_UUID lock: 810053d3f100/0x688cfbc7df2ef487  
 lrc: 1/0,0 mode: CR/CR res: 21878337/3424633214 bits 0x3 rrc: 582  
 type: IBT flags: 430 remote: 0x95c1d2685c2c76d9 expref: 21 pid  
 6090
 Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
 210:waiting_locks_callback()) Skipped 3 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
 962:ldlm_handle_enqueue()) ### lock on destroyed export  
 8101096ec000 ns: mds-ufhpc-MDT_UUID lock:  
 810225fe12c0/0x688cfbc7df2ef505 lrc: 2/0,0 mode: CR/CR res:  
 21878337/3424633214 bits 0x3 rrc: 579 type: IBT flags: 430  
 remote: 0x95c1d2685c2c76e0 expref: 6 pid 6265
 Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
 962:ldlm_handle_enqueue()) Skipped 3 previous similar messages
 Feb 18 15:33:17 hpcmds kernel: Lustre: 6061:0:(mds_reint.c: 
 127:mds_finish_transno()) commit transaction for disconnected  
 client 2bdea9d4-43c3-a0b0-2822-c49ecfe6e044: rc 0

 This looks like in the middle of eviction storm, and by this point  
 MDS and MGS anlready evicted tons of clients for unknown reasons  
 (should be in the log before those messages).

 Bye,
 Oleg

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ext3-unlink-race.patch

2008-02-17 Thread Charles Taylor

When trying to build 1.6.4.2 from a clean source tree I get the  
following after a successful configure.

cd linux-stage && quilt push -a -q
Applying patch patches/ext3-wantedi-2.6-rhel4.patch
Applying patch patches/iopen-2.6.18-rhel5.patch
Applying patch patches/ext3-map_inode_page-2.6.18.patch
Applying patch patches/export-ext3-2.6-rhel4.patch
Applying patch patches/ext3-include-fixes-2.6-rhel4.patch
Applying patch patches/ext3-extents-2.6.18-vanilla.patch
Applying patch patches/ext3-mballoc3-core.patch
Applying patch patches/ext3-mballoc3-2.6.18.patch
Applying patch patches/ext3-nlinks-2.6.9.patch
Applying patch patches/ext3-ialloc-2.6.patch
Applying patch patches/ext3-remove-cond_resched-calls-2.6.12.patch
Applying patch patches/ext3-filterdata-sles10.patch
Applying patch patches/ext3-uninit-2.6.18.patch
Applying patch patches/ext3-nanosecond-2.6.18-vanilla.patch
Applying patch patches/ext3-inode-version-2.6.18-vanilla.patch
Applying patch patches/ext3-mmp-2.6.18-vanilla.patch
Applying patch patches/ext3-unlink-race.patch
1 out of 1 hunk FAILED
Patch patches/ext3-unlink-race.patch does not apply (enforce with -f)


Looking at the patch and the source, I can see why the hunk failed.
I don't see a place to put it either.I'm going to just remove  
this patch file from the list but if anyone has encountered this or  
knows why this patch doesn't apply successfully and wants to save me  
from impending disaster, I'd be happy to hear from you.  :)
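
(For reference, the failing hunk can be inspected by forcing the push,
as the error message suggests, and reading the reject file it leaves
behind - a sketch:

   cd linux-stage
   quilt push -f
   find . -name '*.rej' -exec cat {} \;

That at least shows which part of ext3-unlink-race.patch no longer
matches the staged ext3 source.)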

Thanks,

Charlie Taylor
UF HPC Center
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.6.3 - 1.6.4.2 upgrade

2008-02-17 Thread Charles Taylor

Turns out that the client upgrade worked just fine.   I had not  
noticed that the ko2iblnd module was not in place.

I'm still wondering if I need to do anything special with regard to  
upgrading the MGS/MDS and OSSs.   I'm hoping to just dump the  
software in place and reboot with live clients.   Seems kind of  
risky, but hey, the docs say you can do it for 1.4 - 1.6.3 so going  
from 1.6.3 - 1.6.4.2 ought to be a no-brainer, right?   :)

Charlie Taylor
UF HPC Center

On Feb 17, 2008, at 3:44 PM, Charles Taylor wrote:



 Just updated a single client from 1.6.3 to 1.6.4.2.The
 documentation seems to indicate that an upgraded client should still
 be able to mount the file system from a non-upgraded MGS/MDT.    The
 documentation appears to be referring to a 1.4 to 1.6 upgrade but I
 made the leap that similar things would apply to 1.6.3 - 1.6.4.2.

 As I said, the servers are still running 1.6.3 and have not been
 touched.    The client is upgraded to 1.6.4.2.   When I try to mount
 the file system I get...

 Is the MGS specification correct?
 Is the filesystem name correct?
 If upgrading, is the copied client log valid? (see upgrade docs)

 I've double-checked the first two but I have no idea what the third
 item refers to.    The docs talk about using tunefs to manually copy
 client config logs when upgrading an MGS/MDS but they seem to
 indicate that the only issue on the client one needs to worry about
 is the form of the mount command.   In going from 1.6.3 to 1.6.4.2,
 that should not be an issue.
 Have I missed a step?   Do I need to do something to tell the MGS/MDS
 that the client has been upgraded?

 Is there newer documentation for going from 1.6.3 to 1.6.4.2?I
 was hoping that I could just upgrade the software on the MGS/MDS and
 OSS (in that order) and restart?Is that not the case?

 Thanks,

 charlie taylor
 uf hpc center
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss