[lustre-discuss] LAD'15: Call for papers extended up to August 4th
*LAD'15 - Lustre Administrator and Developer Workshop*
September 22nd - 23rd, 2015
Paris Marriott Champs Elysees Hotel, Paris - France

*CALL FOR PAPERS*

We are extending the call for papers to August 4th, 2015. You have one more week to send your abstract!

We are inviting community members to send proposals for presentations at this event. No proceedings are required, just an abstract of a 30-min (technical) presentation. Please send this to l...@eofs.eu

Topics may include (but are not limited to): site updates or future projects, Lustre administration, monitoring and tools, Lustre feature overview, Lustre client performance (benefits of hardware evolution to Lustre, like SSD, many-cores…), comparison between Lustre and other parallel file systems (perf. and/or features), Lustre and Exascale I/O, tunings, etc.

*REGISTRATION*

Registration for the workshop is open: http://lad.eofs.org/register.php

*WEB SITE*

Get all details on http://www.eofs.eu/?id=lad15

*SPONSORS*

We are very pleased that this event is organized thanks to the following generous sponsors: ATOS, CEA, CRAY, DDN, INTEL and SEAGATE

For any other information, please contact l...@eofs.eu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Problems moving an OSS from an old Lustre installation to a new one
Hi

We are migrating from an old Lustre installation composed of 1 MDS and 2 OSS to a new Lustre 2.5.3 installation. For this second installation we installed from scratch a new MDS + a new OST and we migrated the data from the old Lustre system.

Problems started when we tried to move an OSS from the old installation to the new one. For this OSS server we reinstalled the operating system from scratch (keeping the same IP name and number). Then for the OSTs we formatted the file systems using commands such as:

mkfs.lustre --reformat --fsname=cmswork --mgsnode=t2-mds-01.lnl.infn.it@tcp0 --ost --param ost.quota_type=ug --index=3 --mkfsoptions='-i 65536' /dev/mapper/MD1200_1p1

(t2-mds-01.lnl.infn.it is the new MDS) and then we mounted the file systems. Apparently this worked.

After a while we realized that in the syslog of this moved OSS there were messages such as:

Jul 25 10:54:02 t2-oss-03 kernel: Lustre: cmswork-OST0003: haven't heard from client cmswork-MDT-mdtlov_UUID (at 10.60.16.8@tcp) in 232 seconds. I think it's dead, and I am evicting it. exp 8803123bf400, cur 1437814442 expire 1437814292 last 1437814210

10.60.16.8 is the IP address of the old MDS! No idea why it was expecting communications from it! At any rate, on this old MDS I unmounted the MGS and MDT file systems.

After a while users complained that there were problems with some (not all) files written to the new OSTs, e.g.:

# ls -l /lustre/cmswork/ronchese/pat_ntu/cmssw53B_slc6/dev08tmp/src/PDAnalysis/EDM/bin/ntu.root
ls: cannot access /lustre/cmswork/ronchese/pat_ntu/cmssw53B_slc6/dev08tmp/src/PDAnalysis/EDM/bin/ntu.root: Cannot allocate memory

In the syslog of the client:

Jul 26 08:01:09 t2-ui-13 kernel: LustreError: 11-0: cmswork-OST0003-osc-880818e5: Communicating with 10.60.16.9@tcp, operation ldlm_enqueue failed with -12.

10.60.16.9 is the IP of the moved OSS.
In its syslog:

Jul 26 08:01:09 t2-oss-03 kernel: LustreError: 8114:0:(ldlm_resource.c:1188:ldlm_resource_get()) cmswork-OST0003: lvbo_init failed for resource 0xb9:0x0: rc = -2
Jul 26 08:01:09 t2-oss-03 kernel: LustreError: 8114:0:(ldlm_resource.c:1188:ldlm_resource_get()) Skipped 1 previous similar message

Reading:

https://jira.hpdd.intel.com/browse/LU-4034

I guess that memory is not the real problem. The problem is that the object was not found on the OST.

Some interesting messages found in the syslog of the moved OSS:

Jul 24 14:56:25 t2-oss-03 kernel: Lustre: cmswork-OST0003: Received MDS connection from 10.60.16.8@tcp, removing former export from 10.60.16.38@tcp
Jul 24 14:56:27 t2-oss-03 kernel: Lustre: cmswork-OST0003: already connected client cmswork-MDT-mdtlov_UUID \
(at 10.60.16.8@tcp) with handle 0xdb376ec08bf7d020. Rejecting client with the same UUID trying to reconnect with\
handle 0x6dffb49bb9b3bc70

10.60.16.8 is the IP of the old MDS
10.60.16.38 is the IP of the new MDS

For the time being we have disabled the OSTs hosted on the moved OSS so that new objects are not written there.

Any idea what the problem is and how we could recover the system?

Thanks, Massimo
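The return codes in the kernel messages above are negated Linux errno values, so they can be decoded mechanically. The following is a minimal Python sketch, not part of the original message, that maps `-12` and `-2` to their symbolic names and checks that the "232 seconds" in the eviction message agrees with its `cur` and `last` timestamps. The helper names `decode_rc` and `eviction_gap` are hypothetical, and the regular expression assumes the message layout quoted in this thread.

```python
import errno
import re
from typing import Tuple


def decode_rc(rc: int) -> str:
    """Return the symbolic errno name for a (negative) kernel return code."""
    return errno.errorcode.get(abs(rc), "UNKNOWN")


# Assumed layout: "... in N seconds. ... cur A expire B last C"
EVICT_RE = re.compile(r"in (\d+) seconds\..*cur (\d+) expire (\d+) last (\d+)")


def eviction_gap(line: str) -> Tuple[int, int]:
    """Return (reported seconds, cur - last) for an eviction message."""
    m = EVICT_RE.search(line)
    if m is None:
        raise ValueError("not an eviction message")
    reported, cur, _expire, last = map(int, m.groups())
    return reported, cur - last


# Eviction message copied from the OSS syslog quoted in the thread.
EVICTION_LINE = (
    "Jul 25 10:54:02 t2-oss-03 kernel: Lustre: cmswork-OST0003: haven't heard "
    "from client cmswork-MDT-mdtlov_UUID (at 10.60.16.8@tcp) in 232 seconds. "
    "I think it's dead, and I am evicting it. exp 8803123bf400, "
    "cur 1437814442 expire 1437814292 last 1437814210"
)

if __name__ == "__main__":
    print("client ldlm_enqueue rc -12:", decode_rc(-12))
    print("OSS lvbo_init rc -2:", decode_rc(-2))
    print("eviction (reported, cur - last):", eviction_gap(EVICTION_LINE))
```

The client-side `-12` is ENOMEM, which is what `ls` prints as "Cannot allocate memory", while the OSS-side `-2` is ENOENT. This is consistent with the reading in the message that the object is missing on the OST and memory is not the real issue.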
Re: [lustre-discuss] quota only in- but not decreasing after upgrading to Lustre 2.5.3
Good morning Malcolm, all,

On 28.07.15 at 03:23, Cowe, Malcolm J wrote:
> lctl conf_param fsname.quota.ost=ug
>
> You can verify status on the servers with:
>
> lctl get_param *.*.quota_slave.info

I think we did that correctly:

on the MDS/MGT:

[root@lustre2 ~]# lctl get_param *.*.quota_slave.info
osd-ldiskfs.lustre-MDT.quota_slave.info=
target name: lustre-MDT
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
[root@lustre2 ~]#

on the OSTs:

[root@lustre4 ~]# lctl get_param *.*.quota_slave.info | grep "quota enable" | uniq -c
      7 quota enabled: ug
[root@lustre4 ~]#

[root@lustre3 ~]# lctl get_param *.*.quota_slave.info | grep "quota enable" | uniq -c
      8 quota enabled: ug
[root@lustre3 ~]#

Best regards

Torsten

--
Dr. Torsten Harenberg torsten.harenb...@cern.ch
Bergische Universitaet
FB C - Physik Tel.: +49 (0)202 439-3521
Gaussstr. 20 Fax : +49 (0)202 439-2811
42097 Wuppertal @CERN: Bat. 1-1-049

Of course it runs NetBSD http://www.netbsd.org
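The per-target check done above with `grep` and `uniq -c` can also be scripted when there are many servers. The following minimal Python sketch parses saved `lctl get_param *.*.quota_slave.info` output and counts the `quota enabled` value per target. It assumes the output has one `key: value` field per line under a header line ending in `quota_slave.info=`. The function names and the second target in the sample (`example-OST0000`, with quota disabled) are hypothetical and only illustrate a mismatch.

```python
from collections import Counter
from typing import Dict


def parse_quota_slave_info(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'key: value' lines under each '...quota_slave.info=' header."""
    targets: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("quota_slave.info="):
            current = {}
            targets[line[:-1]] = current  # parameter name without '='
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return targets


def quota_enabled_counts(targets: Dict[str, Dict[str, str]]) -> Counter:
    """Count targets per 'quota enabled' value, like grep | uniq -c."""
    return Counter(t.get("quota enabled", "missing") for t in targets.values())


# First block: MDT output from the message. Second block: hypothetical target.
SAMPLE = """\
osd-ldiskfs.lustre-MDT.quota_slave.info=
target name: lustre-MDT
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
osd-ldiskfs.example-OST0000.quota_slave.info=
target name: example-OST0000
type: dt
quota enabled: none
"""

if __name__ == "__main__":
    parsed = parse_quota_slave_info(SAMPLE)
    for value, count in quota_enabled_counts(parsed).items():
        print(count, "quota enabled:", value)
```

Any target whose value differs from `ug` would show up as a separate count, which is the same signal the `uniq -c` output gives on each OSS.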
Re: [lustre-discuss] Problems moving an OSS from an old Lustre installation to a new one
On 28.07.2015 07:17, Massimo Sgaravatto wrote:
> Some interesting messages found in the syslog of the moved OSS:
>
> Jul 24 14:56:25 t2-oss-03 kernel: Lustre: cmswork-OST0003: Received MDS connection from 10.60.16.8@tcp, removing former export from 10.60.16.38@tcp
> Jul 24 14:56:27 t2-oss-03 kernel: Lustre: cmswork-OST0003: already connected client cmswork-MDT-mdtlov_UUID \
> (at 10.60.16.8@tcp) with handle 0xdb376ec08bf7d020. Rejecting client with the same UUID trying to reconnect with\
> handle 0x6dffb49bb9b3bc70
>
> 10.60.16.8 is the IP of the old MDS
> 10.60.16.38 is the IP of the new MDS
>
> For the time being we have disabled the OSTs hosted on the moved OSS so that new objects are not written there.
>
> Any idea what the problem is and how we could recover the system?

Do I see it correctly that the old MGS/MDS is still up and running? As I understand it, it is still trying to find an OST at 10.60.16.9@tcp (that info is stored in the llog on the MGS). But I'm also confused as to why it should think that the new OST is the one it is looking for. It has a new UUID, so it should be detected.

Anyway, I would first shut down the old MGS/MDS before trying to write any more data to the new OST.

--
Dr. Oliver Mangold
System Analyst
NEC Deutschland GmbH
HPC Division
Raiffeisenstraße 14
70771 Leinfelden-Echterdingen
Germany
Phone: +49 711 78055 13
Mail: oliver.mang...@emea.nec.com
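One way to confirm whether the old MDS is still contacting the moved OSS is to scan the OSS syslog for MDS connections from any NID other than the new MDS. The following is a minimal Python sketch under the assumption that the "Received MDS connection from ..." message has the layout quoted in this thread. The function name `unexpected_mds_peers` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout:
# "<target>: Received MDS connection from <nid>, removing former export from <nid>"
MDS_CONN_RE = re.compile(
    r"(\S+): Received MDS connection from (\S+?), "
    r"removing former export from (\S+)"
)


def unexpected_mds_peers(
    lines: Iterable[str], expected_nid: str
) -> List[Tuple[str, str]]:
    """Return (target, nid) for MDS connections not coming from expected_nid."""
    hits = []
    for line in lines:
        m = MDS_CONN_RE.search(line)
        if m and m.group(2) != expected_nid:
            hits.append((m.group(1), m.group(2)))
    return hits


# Log line copied from the thread; 10.60.16.38@tcp is the new MDS.
LOG = [
    "Jul 24 14:56:25 t2-oss-03 kernel: Lustre: cmswork-OST0003: Received MDS "
    "connection from 10.60.16.8@tcp, removing former export from 10.60.16.38@tcp",
]

if __name__ == "__main__":
    for target, nid in unexpected_mds_peers(LOG, "10.60.16.38@tcp"):
        print(target, "received an MDS connection from unexpected NID", nid)
```

If this list stays empty after the old MGS/MDS has been shut down, that suggests the stale connections have stopped. It does not by itself show that the objects already written are consistent.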