Re: [lustre-discuss] confused about mdt space
Thanks a lot. I have two more questions:

1) Assume I estimate the MDT space using the method described in the Lustre manual, and by calculation the metadata space is 400GB. After formatting (with default options), about 160GB (40% of 400GB) is preallocated for inodes, so the number of available inodes is less than estimated, right?

2) The MDS needs additional space for other uses, such as logs, ACLs, and xattrs. How can I estimate that space?

Thanks!

Mohr Jr, Richard Frank wrote on Tuesday, March 31, 2020 at 9:57 PM:

> > On Mar 30, 2020, at 10:56 PM, 肖正刚 wrote:
> >
> > Hello, I have some questions about metadata space.
> > 1) I have ten 960GB SAS SSDs for the MDT. After creating a RAID 10, we have 4.7TB of free space.
> > After formatting as an MDT, we only have 2.6TB free; so where did the 2.1TB of space go?
> > 2) What is the 2.6TB of space used for?
>
> That space is used by inodes. I believe recent Lustre versions use a 1KB inode size by default, and the default format options create one inode for every 2.5KB of MDT space. So about 40% of your disk space will be consumed by inodes.
>
> —
> Rick Mohr
> Senior HPC System Administrator
> Joint Institute for Computational Sciences
> University of Tennessee

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
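For reference, the 40% figure quoted in this thread follows directly from the default ldiskfs format parameters (a 1KB inode size and one inode created per 2.5KB of MDT space). A minimal sketch of the arithmetic, treating those two values as assumptions since the actual defaults can vary by Lustre version and mkfs options:

```python
# Estimate MDT inode capacity and the space consumed by inode tables.
# Assumed defaults (verify against your Lustre version / mkfs options):
#   inode size      = 1 KiB  (mkfs.lustre default for MDTs)
#   bytes per inode = 2.5 KiB (one inode per 2.5 KiB of MDT space)

def mdt_inode_estimate(mdt_bytes, inode_size=1024, bytes_per_inode=2560):
    """Return (number_of_inodes, bytes_used_by_inode_tables)."""
    num_inodes = mdt_bytes // bytes_per_inode
    return num_inodes, num_inodes * inode_size

mdt_size = 400 * 1024**3  # the 400GB MDT from the question above
inodes, inode_space = mdt_inode_estimate(mdt_size)
print(f"inodes: {inodes:,}")                                       # ~168 million
print(f"space for inode tables: {inode_space / 1024**3:.0f} GiB")  # 160 GiB (40%)
```

This matches the numbers in the question: 160GB of a 400GB MDT goes to inode tables, and the remaining space must also cover logs, directory blocks, and wide xattrs, so the usable inode count ends up somewhat below the raw estimate.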
Re: [lustre-discuss] OST recovery
On Mar 29, 2020, at 20:04, Gong-Do Hwang <grover.hw...@gmail.com> wrote:

> Thanks Andreas, I ran "mkfs.lustre --ost --reformat --fsname lfs_home --index 6 --mgsnode 10.10.0.10@o2ib --servicenode 10.10.0.13@o2ib --failnode 10.10.0.14@o2ib /dev/mapper/mpathx", and at that time /dev/mapper/mpathx was mounted and serving as an OST under the filesystem "lfs". The "lfs" filesystem ran well until I unmounted /dev/mapper/mpathx in order to restart the MGT/MGS.

The issue here is that the "--reformat" option overrides the checks for a filesystem already existing on the device. It should not normally be used.

> And after I re-mounted the OST I got the message:
>
> mount.lustre FATAL: failed to write local files: Invalid argument
> mount.lustre: mount /dev/mapper/mpathx at /lfs/ost8 failed: Invalid argument
>
> and the "tunefs.lustre --dryrun /dev/mapper/mpathx" output is:
>
> tunefs.lustre --dryrun /dev/mapper/mpathx
> checking for existing Lustre data: found
> Reading CONFIGS/mountdata
> Read previous values:
> Target: lfs-OST0008

This shows that the device was previously part of the "lfs" filesystem at index 8. While it is possible to change the filesystem name, the OST index should never change, so there is no tool for this.

Two things need to be done. First, you can rewrite the filesystem label with "e2label /dev/mapper/mpathx lfs-OST0008". Second, you need to rebuild the "CONFIGS/mountdata" file. The easiest way to generate a new mountdata file is to run "mkfs.lustre" with the same options as the original OST on a temporary device (e.g. a loopback device), but add the "--replace" option so that the OST doesn't try to add itself to the filesystem as a new OST. Then mount both the temporary and original OSTs as type ldiskfs and copy the file CONFIGS/mountdata from the temporary OST to the original OST, replacing the broken one (it is probably a good idea to make a backup first).

Hopefully with these two changes you can mount your OST again.
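The two-step recovery Andreas describes can be sketched as a shell procedure. This is an untested outline under the assumptions stated in the thread (original fsname "lfs", index 8, mgsnode 10.10.0.10@o2ib); the loopback image path and size are made up for illustration, the exact mkfs.lustre options must match the original OST, and a backup of the broken mountdata should be kept:

```sh
# Step 1: restore the original filesystem label on the damaged OST.
e2label /dev/mapper/mpathx lfs-OST0008

# Step 2: regenerate CONFIGS/mountdata on a throwaway loopback image,
# using the ORIGINAL fsname/index, plus --replace so this target is
# not registered with the MGS as a new OST.
dd if=/dev/zero of=/tmp/ost.img bs=1M count=512
mkfs.lustre --ost --replace --fsname lfs --index 8 \
    --mgsnode 10.10.0.10@o2ib /tmp/ost.img

# Mount both targets as plain ldiskfs and copy the regenerated
# mountdata across, keeping a backup of the broken copy first.
mkdir -p /mnt/tmp_ost /mnt/orig_ost
mount -t ldiskfs -o loop /tmp/ost.img /mnt/tmp_ost
mount -t ldiskfs /dev/mapper/mpathx /mnt/orig_ost
cp /mnt/orig_ost/CONFIGS/mountdata /root/mountdata.broken.bak
cp /mnt/tmp_ost/CONFIGS/mountdata /mnt/orig_ost/CONFIGS/mountdata
umount /mnt/tmp_ost /mnt/orig_ost
```

After this, the OST should be mountable as a Lustre target again; verify with "tunefs.lustre --dryrun /dev/mapper/mpathx" before mounting.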
Cheers, Andreas

> Index: 6
> Lustre FS: lfs_home
> Mount type: ldiskfs
> Flags: 0x1042 (OST update no_primnode)
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.10.0.10@o2ib failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib
>
> Permanent disk data:
> Target: lfs_home-OST0006
> Index: 6
> Lustre FS: lfs_home
> Mount type: ldiskfs
> Flags: 0x1042 (OST update no_primnode)
> Persistent mount opts: ,errors=remount-ro
> Parameters: mgsnode=10.10.0.10@o2ib failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib
>
> I guess I actually ran the mkfs command twice, so the Lustre FS in the previous values became lfs_home (originally it was lfs). I tried to mount the partition using the backup superblocks, and all of them are empty. But from the dumpe2fs info:
>
> Inode count: 41943040
> Block count: 10737418240
> Reserved block count: 536870912
> Free blocks: 1459812475
> Free inodes: 39708575
>
> it seems there is still data on it.

The backup superblocks are for the underlying ext4/ldiskfs filesystem, so they are not really related to this problem.

> So my problem is: if the data on the partition is still intact, is there any way I can rebuild the file index? And is there any way I can rewrite CONFIGS/mountdata back to its original values? Sorry for the lengthy messages, and I really appreciate your help!
>
> Best Regards,
> Grover

On Mon, Mar 30, 2020 at 7:14 AM Andreas Dilger <adil...@whamcloud.com> wrote:

It would be useful if you provided the actual error messages, so we can see where the problem is. What command did you run on the OST? Does the OST still show that it has data in it (e.g. "df" or "dumpe2fs -h" shows lots of used blocks)?

On Mar 25, 2020, at 10:05, Gong-Do Hwang <grover.hw...@gmail.com> wrote:

> Dear Lustre,
>
> Months ago, when I tried to add a new disk to my new Lustre FS, I accidentally targeted mkfs.lustre at a then-mounted OST partition of another Lustre FS.
> Weirdly enough, the command went through, and without paying attention to it, I unmounted the partition months later and couldn't mount it back; then I realized the mkfs.lustre command had actually taken effect. But my old Lustre FS worked well through these months, so I guess the data in that OST is still there. Now, however, the permanent CONFIGS/mountdata is the new one, and I can still see my old config in the previous values. My question is: is there any way I can write back the old CONFIGS/mountdata and still keep all my files in that OST?
>
> I am using Lustre 2.13.0 for my MGS/MDT/OST. Thanks for your help, and I really appreciate it!
>
> Grover

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
Re: [lustre-discuss] Files hanging on lustre clients
> On Mar 31, 2020, at 3:43 PM, Kurt Strosahl wrote:
>
> I can't tell; any commands I run against the files in question hang indefinitely. It seems very suspicious though.

The fact that the same OST appeared in error messages on two different clients made me think the problem might be with the OST. Would you be able to deactivate that OST so no new files get created on it? Then you could see if the problem goes away for newly created files.

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee
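For anyone following the thread, deactivating an OST for new file creation is normally done on the MDS. A hedged sketch, using the "lustre19" filesystem name and OST0028 from the logs in this thread; the exact parameter names vary across Lustre releases, so check what your version exposes (e.g. with "lctl get_param -N osp.*") before running anything:

```sh
# On the MDS: stop new object creation on OST0028 while leaving
# existing files on it readable. On recent Lustre 2.x this is done
# by zeroing the OSP create count for that target:
lctl set_param osp.lustre19-OST0028-osc-MDT0000.max_create_count=0

# Older releases deactivate the OSC device on the MDS instead:
#   lctl --device lustre19-OST0028-osc-MDT0000 deactivate

# To allow creation again later, restore a nonzero create count
# (20000 is a commonly seen default, but verify yours first):
#   lctl set_param osp.lustre19-OST0028-osc-MDT0000.max_create_count=20000
```

Note that deactivating the OSC on a client ("lctl set_param osc.*OST0028*.active=0") is different: it blocks that client's access to the OST entirely rather than just steering new files elsewhere.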
Re: [lustre-discuss] Files hanging on lustre clients
> On Mar 31, 2020, at 2:36 PM, Kurt Strosahl wrote:
>
> an strace on an ls command run against some of these files produced the following:
>
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",
>
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 (at 172.17.0.99@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to lustre19-OST0028 (at 172.17.0.99@o2ib)

Of the files listed in the strace above that gave errors, are all those files striped across OST0028?

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee
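A file's stripe placement can be checked with "lfs getstripe", whose obdidx column lists the OST index of each stripe (OST0028 is index 40 in decimal). A sketch using a path from the strace above; getstripe queries only the MDS, so it usually works even when reads from the OST hang, though that is not guaranteed here:

```sh
# Show the layout of one of the hanging files; obdidx == 40 means
# a stripe lives on OST0028 (0x28 hex).
lfs getstripe /volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err

# Scan the whole log directory for files with any stripe on that OST
# (the --ost option also accepts the UUID form lustre19-OST0028_UUID):
lfs find /volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408 --ost 40
```

If every hanging file turns out to have a stripe on OST0028 and unaffected files do not, that points strongly at the OST rather than the MDT or the clients.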
Re: [lustre-discuss] Files hanging on lustre clients
I can't tell; any commands I run against the files in question hang indefinitely. It seems very suspicious though.

From: Mohr Jr, Richard Frank
Sent: Tuesday, March 31, 2020 3:41 PM
To: Kurt Strosahl
Cc: lustre-discuss@lists.lustre.org; sci...@jlab.org
Subject: [EXTERNAL] Re: [lustre-discuss] Files hanging on lustre clients

> On Mar 31, 2020, at 2:36 PM, Kurt Strosahl wrote:
>
> an strace on an ls command run against some of these files produced the following:
>
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",
>
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 (at 172.17.0.99@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to lustre19-OST0028 (at 172.17.0.99@o2ib)

Of the files listed in the strace above that gave errors, are all those files striped across OST0028?

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee
[lustre-discuss] Files hanging on lustre clients
Hello,

I'm tracking a very vexing issue. Somehow users are creating files that cause any attempt to examine or manipulate them to hang. They can't be removed, they can't be examined; even an ls command will hang.

An strace on an ls command run against some of these files produced the following:

getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0
getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",

On one client that was being used to debug this problem (running CentOS 6.5 and Lustre 2.5.42) I see the following:

Lustre: 2753:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1585677903/real 1585677903] req@880238fdd000 x1657754295856608/t0(0) o101->lustre19-OST0028-osc-88105fecd000@172.17.0.99@o2ib:28/4 lens 328/400 e 1 to 1 dl 1585678504 ref 1 fl Rpc:X/0/ rc 0/-1
Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 (at 172.17.0.99@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to lustre19-OST0028 (at 172.17.0.99@o2ib)
Lustre: 2733:0:(client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 465 > total measured time 96
LustreError: 2733:0:(layout.c:2005:__req_capsule_get()) @@@ Wrong buffer for field `dlm_rep' (1 of 1) in format `LDLM_ENQUEUE_LVB': 0 vs. 112 (server) req@8802149c5800 x1657754295862208/t0(0) o101->lustre19-OST0028-osc-88105fecd000@172.17.0.99@o2ib:28/4 lens 328/192 e 0 to 0 dl 1585679147 ref 1 fl Interpret:R/2/0 rc 0/0
LustreError: 2733:0:(layout.c:2005:__req_capsule_get()) Skipped 1 previous similar message

On newer systems (RHEL 7.7 running Lustre 2.10.8-1) the problem also occurs, and I see the following in dmesg:

[5437818.727792] Lustre: lustre19-OST0028-osc-886229244000: Connection restored to 172.17.0.99@o2ib (at 172.17.0.99@o2ib)
[5438419.769959] Lustre: 2747:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1585678345/real 1585678345] req@884834ec0600 x1657002677904272/t0(0) o101->lustre19-OST0028-osc-886229244000@172.17.0.99@o2ib:28/4 lens 344/400 e 24 to 1 dl 1585678946 ref 1 fl Rpc:X/2/ rc -11/-1
[5438419.769973] Lustre: lustre19-OST0028-osc-886229244000: Connection to lustre19-OST0028 (at 172.17.0.99@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[5438419.770435] Lustre: lustre19-OST0028-osc-886229244000: Connection restored to 172.17.0.99@o2ib (at 172.17.0.99@o2ib)

df and lfs df commands on these systems aren't hanging.

On the OSS (running Lustre 2.12.1-1 and CentOS 7.6) I'm seeing the following:

[Mar31 14:26] LNet: Service thread pid 309349 was inactive for 1121.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ +0.07] Pid: 309349, comm: ll_ost00_088 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[ +0.02] Call Trace:
[ +0.14] [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
[ +0.74] [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ +0.39] [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
[ +0.45] [] ofd_intent_policy+0x69b/0x920 [ofd]
[ +0.15] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[ +0.38] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[ +0.46] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[ +0.72] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ +0.65] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ +0.56] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[ +0.54] [] kthread+0xd1/0xe0
[ +0.07] [] ret_from_fork_nospec_begin+0x7/0x21
[ +0.09] [] 0x
[ +0.36] LustreError: dumping log to /tmp/lustre-log.1585679216.309349

Thank you,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
[lustre-discuss] CFP: First International Workshop on Challenges and Opportunities of HPC Storage Systems (CHAOSS)
# First International Workshop on Challenges and Opportunities of HPC Storage Systems (CHAOSS)

The workshop is aimed at researchers, developers of scientific applications, engineers, and everyone interested in the evolution of HPC storage systems. As the development of computing power, storage, and network technologies continues to diverge, the performance gap between them widens. This trend, combined with growing data volumes, results in I/O and storage bottlenecks that become increasingly serious, especially for large-scale HPC storage systems. The hierarchy of different storage technologies introduced to ease this situation leads to a complex environment that will become even more challenging for future exascale systems.

This workshop is a venue for papers exploring topics related to data organization and management, along with the impacts of multi-tier memory and storage on optimizing application throughput. It will take place at the Euro-Par 2020 conference in Warsaw, Poland on either August 24 or 25, 2020. More information is available at: https://wr.informatik.uni-hamburg.de/events/2020/chaoss

## Important Dates

- Paper Submission: May 8, 2020
- Notification to Authors: June 30, 2020
- Registration: July 10, 2020
- Camera-Ready Deadline (Informal Proceedings): July 10, 2020
- Workshop Date: August 24 or 25, 2020
- Camera-Ready Deadline: September 11, 2020

## Submission Guidelines

Papers should not exceed 12 pages (including title, text, figures, appendices, and references). Papers of fewer than 10 pages will be considered short papers that can be presented at the conference but will not be published in the proceedings. Papers must be formatted according to Springer's LNCS guidelines, available at https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines. Accepted papers will be published in a separate LNCS workshop volume after the conference. One author of each accepted paper is required to register for the workshop and present the paper.
Submissions will be managed via EasyChair at: https://easychair.org/conferences/?conf=europar2020workshop

## Topics of Interest

Submissions may be more hands-on than research papers, and we therefore explicitly encourage submissions in the early stages of research. Topics of interest include, but are not limited to:

- Kernel and user space file/storage systems
- Parallel and distributed file/storage systems
- Data management approaches for heterogeneous storage systems
- Management of self-describing data formats
- Metadata management
- Approaches using query and database interfaces
- Hybrid solutions using file systems and databases
- Optimized indexing techniques
- Data organizations to support online workflows
- Domain-specific data management solutions
- Related experiences from users: what worked, what didn't?

## Program Committee

- Gabriel Antoniu (INRIA)
- Konstantinos Chasapis (DDN)
- Andreas Dilger (Whamcloud/DDN)
- Kira Duwe (UHH)
- Wolfgang Frings (JSC)
- Elsa Gonsiororowski (LLNL)
- Anthony Kougkas (IIT)
- Michael Kuhn (UHH)
- Margaret Lawson (UIUC SNL)
- Jay Lofstead (SNL)
- Johann Lombardi (Intel)
- Jakob Lüttgau (DKRZ)
- Anna Queralt (BSC)
- Yue Zhu (FSU)

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
Re: [lustre-discuss] confused about mdt space
> On Mar 30, 2020, at 10:56 PM, 肖正刚 wrote:
>
> Hello, I have some questions about metadata space.
> 1) I have ten 960GB SAS SSDs for the MDT. After creating a RAID 10, we have 4.7TB of free space.
> After formatting as an MDT, we only have 2.6TB free; so where did the 2.1TB of space go?
> 2) What is the 2.6TB of space used for?

That space is used by inodes. I believe recent Lustre versions use a 1KB inode size by default, and the default format options create one inode for every 2.5KB of MDT space. So about 40% of your disk space will be consumed by inodes.

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee