[lustre-discuss] LAD'15: Call for papers extended up to August 4th

2015-07-27 Thread DEGREMONT Aurelien

*LAD'15 - Lustre Administrator and Developer Workshop*
September 22nd - 23rd, 2015
Paris Marriott Champs Elysees Hotel, Paris - France

*CALL FOR PAPERS*
We are extending the call for papers to August 4th, 2015. You have one 
more week to send your abstract!
We invite community members to send proposals for presentations at 
this event. No proceedings are required, just an abstract of a 30-minute 
(technical) presentation.

Please send your abstract to l...@eofs.eu

Topics may include (but are not limited to): site updates or future 
projects, Lustre administration, monitoring and tools, Lustre feature 
overviews, Lustre client performance, benefits of hardware evolution 
(such as SSDs or many-core CPUs) to Lustre, comparisons between Lustre 
and other parallel file systems (performance and/or features), Lustre 
and Exascale I/O, tunings, etc.


*REGISTRATION*
Registration for the workshop is open:
http://lad.eofs.org/register.php

*WEB SITE*
Get all details on http://www.eofs.eu/?id=lad15

*SPONSORS*
We are very pleased that this event is made possible by the following 
generous sponsors:

ATOS, CEA, CRAY, DDN, INTEL and SEAGATE

For any other information, please contact l...@eofs.eu


[lustre-discuss] Problems moving an OSS from an old Lustre installation to a new one

2015-07-27 Thread Massimo Sgaravatto

Hi

We are migrating from an old Lustre installation, composed of 1 MDS and 
2 OSSs, to a new Lustre 2.5.3 installation.


For this second installation we installed from scratch a new MDS + a 
new OST, and we migrated the data from the old Lustre system.



Problems started when we tried to move an OSS from the old installation 
to the new one.


For this OSS server we reinstalled the operating system from scratch 
(keeping the same hostname and IP address).

Then for the OSTs we formatted the file systems using commands such as:


 mkfs.lustre --reformat --fsname=cmswork 
--mgsnode=t2-mds-01.lnl.infn.it@tcp0 --ost --param ost.quota_type=ug 
--index=3 --mkfsoptions='-i 65536' /dev/mapper/MD1200_1p1



(t2-mds-01.lnl.infn.it is the new MDS)

and then we mounted the file systems.

Apparently this worked.
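
(For reference, the mount step is just the standard Lustre target 
mount; a minimal sketch, where the mount point path is illustrative 
rather than the one actually used:

 # on the OSS: mount the newly formatted OST
 mkdir -p /mnt/cmswork-ost3
 mount -t lustre /dev/mapper/MD1200_1p1 /mnt/cmswork-ost3
)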


After a while we realized that in the syslog of this moved OSS there 
were messages such as:


Jul 25 10:54:02 t2-oss-03 kernel: Lustre: cmswork-OST0003: haven't heard 
from client cmswork-MDT-mdtlov_UUID (at 10.60.16.8@tcp) in 232 
seconds. I think it's dead, and I am evicting it. exp 8803123bf400, 
cur 1437814442 expire 1437814292 last 1437814210



10.60.16.8 is the IP address of the old MDS!


No idea why the OST was still expecting communication from it!
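
To see which NIDs an OST currently holds exports for, the export list 
can be read via lctl; a sketch (the target name must match your OST):

 # on the OSS: list the UUIDs of all exports held by this OST
 lctl get_param obdfilter.cmswork-OST0003.exports.*.uuid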
In any case, on this old MDS I unmounted the MGS and MDT file systems.


After a while, users started complaining about problems with some (not 
all) files written to the new OSTs, e.g.:


# ls -l
/lustre/cmswork/ronchese/pat_ntu/cmssw53B_slc6/dev08tmp/src/PDAnalysis/EDM/bin/ntu.root

ls: cannot access
/lustre/cmswork/ronchese/pat_ntu/cmssw53B_slc6/dev08tmp/src/PDAnalysis/EDM/bin/ntu.root:
Cannot allocate memory


In the syslog of the client:

Jul 26 08:01:09 t2-ui-13 kernel: LustreError: 11-0:
cmswork-OST0003-osc-880818e5: Communicating with 10.60.16.9@tcp,
operation ldlm_enqueue failed with -12.


10.60.16.9 is the IP of the moved OSS.
In its syslog:


Jul 26 08:01:09 t2-oss-03 kernel: LustreError:
8114:0:(ldlm_resource.c:1188:ldlm_resource_get()) cmswork-OST0003:
lvbo_init failed for resource 0xb9:0x0: rc = -2
Jul 26 08:01:09 t2-oss-03 kernel: LustreError:
8114:0:(ldlm_resource.c:1188:ldlm_resource_get()) Skipped 1 previous
similar message


Reading:

https://jira.hpdd.intel.com/browse/LU-4034

I guess memory is not the real problem: -12 is just ENOMEM as reported 
to the client. The real problem is that the object was not found on the 
OST (the lvbo_init rc = -2 is ENOENT).
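
One way to confirm that an object is really missing is to map a broken 
file to its backing OST object and look for it on the OSS; a sketch, 
with an illustrative object id:

 # on a client: show which OST index (obdidx) and object id (objid)
 # back each stripe of the file
 lfs getstripe -v /path/to/broken/file

 # on the OSS, with the target unmounted: look up the object in the
 # ldiskfs namespace; objects live under /O/0/d(objid % 32)/objid,
 # e.g. objid 135 sits in d7 because 135 % 32 = 7
 debugfs -c -R 'stat /O/0/d7/135' /dev/mapper/MD1200_1p1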



Some interesting messages found in the syslog of the moved OSS:

Jul 24 14:56:25 t2-oss-03 kernel: Lustre: cmswork-OST0003: Received MDS 
connection from 10.60.16.8@tcp, removing former export from 10.60.16.38@tcp


Jul 24 14:56:27 t2-oss-03 kernel: Lustre: cmswork-OST0003: already 
connected client cmswork-MDT-mdtlov_UUID (at 10.60.16.8@tcp) with 
handle 0xdb376ec08bf7d020. Rejecting client with the same UUID trying 
to reconnect with handle 0x6dffb49bb9b3bc70

10.60.16.8 is the IP of the old MDS
10.60.16.38 is the IP of the new MDS


For the time being we disabled the OSTs hosted on the moved OSS so that 
new objects are not written there.
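
For reference, the usual way to stop new object allocation on an OST is 
to deactivate it on the MDS; a sketch (the device name must be adapted 
to your setup):

 # on the MDS: deactivate the OST so the allocator skips it;
 # existing objects remain readable by clients
 lctl --device cmswork-OST0003-osc-MDT0000 deactivate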



Any idea what the problem is and how we could recover the system?



Thanks, Massimo





Re: [lustre-discuss] quota only in- but not decreasing after upgrading to Lustre 2.5.3

2015-07-27 Thread Torsten Harenberg
Good morning Malcolm, all,

On 28.07.15 at 03:23, Cowe, Malcolm J wrote:
 lctl conf_param fsname.quota.ost=ug
 
 You can verify status on the servers with:
 
 lctl get_param *.*.quota_slave.info

I think we did that correctly:

on the MDS/MGT:

[root@lustre2 ~]# lctl get_param *.*.quota_slave.info
osd-ldiskfs.lustre-MDT.quota_slave.info=
target name:lustre-MDT
pool ID:0
type:   md
quota enabled:  ug
conn to master: setup
space acct: ug
user uptodate:  glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
[root@lustre2 ~]#

on the OSTs:

[root@lustre4 ~]# lctl get_param *.*.quota_slave.info | grep "quota enable" | uniq -c
  7 quota enabled:  ug
[root@lustre4 ~]#

[root@lustre3 ~]# lctl get_param *.*.quota_slave.info | grep "quota enable" | uniq -c
  8 quota enabled:  ug
[root@lustre3 ~]#
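
For what it's worth, the symptom itself (usage only ever growing) can 
be watched directly with lfs quota; a sketch, with user name and mount 
point illustrative:

 # on a client: per-user usage and limits; -v adds the per-OST breakdown
 lfs quota -u someuser /lustre
 lfs quota -v -u someuser /lustre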

Best regards

  Torsten

-- 

  
Dr. Torsten Harenberg    torsten.harenb...@cern.ch
Bergische Universitaet
FB C - Physik            Tel.: +49 (0)202 439-3521
Gaussstr. 20             Fax : +49 (0)202 439-2811
42097 Wuppertal          @CERN: Bat. 1-1-049

Of course it runs NetBSD http://www.netbsd.org


Re: [lustre-discuss] Problems moving an OSS from an old Lustre installation to a new one

2015-07-27 Thread Oliver Mangold
On 28.07.2015 07:17, Massimo Sgaravatto wrote:


 Some interesting messages found in the syslog of the moved OSS:

 Jul 24 14:56:25 t2-oss-03 kernel: Lustre: cmswork-OST0003: Received
 MDS connection from 10.60.16.8@tcp, removing former export from
 10.60.16.38@tcp

 Jul 24 14:56:27 t2-oss-03 kernel: Lustre: cmswork-OST0003: already
 connected client cmswork-MDT-mdtlov_UUID (at 10.60.16.8@tcp) with
 handle 0xdb376ec08bf7d020. Rejecting client with the same UUID trying
 to reconnect with handle 0x6dffb49bb9b3bc70

 10.60.16.8 is the IP of the old MDS
 10.60.16.38 is the IP of the new MDS


 For the time being we disabled the OSTs hosted on the moved OSS so
 that new objects are not written there.


 Any idea what the problem is and how we could recover the system?

Do I understand correctly that the old MGS/MDS is still up and running?
My reading is that it still tries to find an OST at 10.60.16.9@tcp
(that information is stored in the llog on the MGS). But I am also
confused about why it accepts the new OST as the one it is looking for:
the new OST has a new UUID, so the mismatch should be detected. Anyway,
I would first shut down the old MGS/MDS before trying to write any more
data to the new OST.
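
If you want to check what the old MGS still has registered, its 
configuration llogs can be printed; a sketch (run on the MGS node with 
the MGT mounted; log names follow the <fsname>-client convention):

 # list the configuration logs held by the MGS
 lctl --device MGS llog_catlist

 # print the client configuration log, which records the registered
 # OST NIDs for the file system
 lctl --device MGS llog_print cmswork-client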

-- 
Dr. Oliver Mangold
System Analyst
NEC Deutschland GmbH
HPC Division
Raiffeisenstraße 14
70771 Leinfelden-Echterdingen
Germany
Phone: +49 711 78055 13
Mail: oliver.mang...@emea.nec.com
