Re: [lustre-discuss] confused about mdt space

2020-03-31 Thread 肖正刚
Thanks a lot.
I have two more questions:
1) Assume I estimate the MDT space using the method described in the Lustre
manual, and the calculation gives 400 GB of metadata space.
After formatting (with default options), about 160 GB (40% of 400 GB) is
preallocated for inodes, so the number of available inodes is less than
estimated, right?
2) The MDS needs additional space for other uses, such as logs, ACLs, and
xattrs; how do I estimate that space?

Thanks!

Mohr Jr, Richard Frank  wrote on Tuesday, March 31, 2020 at 9:57 PM:

>
>
> > On Mar 30, 2020, at 10:56 PM, 肖正刚  wrote:
> >
> > Hello, I have some questions about metadata space.
> > 1) I have ten 960 GB SAS SSDs for the MDT; after creating a RAID 10 array,
> > we have 4.7 TB of space free.
> > After formatting it as an MDT, we only have 2.6 TB free; so where did the
> > 2.1 TB of space go?
> > 2) What is the remaining 2.6 TB used for?
>
> That space is used by inodes.  I believe recent Lustre versions use a
> 1 KB inode size by default, and the default format options create one inode
> for every 2.5 KB of MDT space.  So about 40% of your disk space will be
> consumed by inodes.
>
> —
> Rick Mohr
> Senior HPC System Administrator
> Joint Institute for Computational Sciences
> University of Tennessee
>


Re: [lustre-discuss] OST recovery

2020-03-31 Thread Andreas Dilger

On Mar 29, 2020, at 20:04, Gong-Do Hwang  wrote:

Thanks Andreas,

I ran "mkfs.lustre --ost --reformat --fsname lfs_home --index 6 --mgsnode
10.10.0.10@o2ib --servicenode 10.10.0.13@o2ib --failnode 10.10.0.14@o2ib
/dev/mapper/mpathx", and at that time /dev/mapper/mpathx was mounted and serving
as an OST under the FS lfs. The FS lfs ran well until I unmounted
/dev/mapper/mpathx in order to restart the MGT/MGS.

The issue here is that the "--reformat" option overrides the check for whether a
filesystem already exists on the device.  It should not normally be used.


And after I re-mounted the OST I got the message "mount.lustre FATAL: failed to
write local files: Invalid argument
mount.lustre: mount /dev/mapper/mpathx at /lfs/ost8 failed: Invalid argument",
and the "tunefs.lustre --dryrun /dev/mapper/mpathx" output is "tunefs.lustre
--dryrun /dev/mapper/mpathx
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target: lfs-OST0008

This shows that the device was previously part of the "lfs" filesystem at index 
8.  While it is possible to change the filesystem name, the OST index should 
never change, so there is no tool for this.

Two things need to be done.  You can rewrite the filesystem label with "e2label 
/dev/mapper/mpathx lfs-OST0008".  Then you need to rebuild the 
"CONFIGS/mountdata" file.

The easiest way to generate a new mountdata file would be to run "mkfs.lustre" 
with the same options as the original OST on a temporary device (e.g. loopback 
device) but add in the "--replace" option so that the OST doesn't try to add 
itself to the filesystem as a new OST.  Then mount the temporary and original 
OSTs as type ldiskfs and copy the file CONFIGS/mountdata from temp to original 
OST to replace the broken one (probably a good idea to make a backup first).
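
A rough sketch of those steps (assuming a spare loop device is available; the image
size, loop device, mount points, and the placeholder NIDs below are illustrative
only, not taken from your system, and the mkfs options must match the ORIGINAL
"lfs" OST, not the accidental reformat):

  # regenerate the mountdata on a throwaway device, using the original options plus --replace
  dd if=/dev/zero of=/tmp/ost8.img bs=1M count=1024
  losetup /dev/loop0 /tmp/ost8.img
  mkfs.lustre --ost --replace --fsname lfs --index 8 \
      --mgsnode <original MGS NID> --servicenode <original service NIDs> /dev/loop0

  # mount both targets as ldiskfs and copy the regenerated mountdata into place
  mkdir -p /mnt/tmp_ost /mnt/orig_ost
  mount -t ldiskfs /dev/loop0 /mnt/tmp_ost
  mount -t ldiskfs /dev/mapper/mpathx /mnt/orig_ost
  cp /mnt/orig_ost/CONFIGS/mountdata /root/mountdata.broken    # keep a backup of the broken copy
  cp /mnt/tmp_ost/CONFIGS/mountdata /mnt/orig_ost/CONFIGS/mountdata
  umount /mnt/tmp_ost /mnt/orig_ost
  losetup -d /dev/loop0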

Hopefully with these two changes you can mount your OST again.

Cheers, Andreas

Index:  6
Lustre FS:  lfs_home
Mount type: ldiskfs
Flags:  0x1042
  (OST update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.10.0.10@o2ib  
failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib


   Permanent disk data:
Target: lfs_home-OST0006
Index:  6
Lustre FS:  lfs_home
Mount type: ldiskfs
Flags:  0x1042
  (OST update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.10.0.10@o2ib  
failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib"

I guess I actually ran the mkfs command twice, so the Lustre FS in the previous
values became lfs_home (originally it was lfs).

I tried to mount the partition using the backup superblocks, and all of them are
empty. But from the dumpe2fs info,
"Inode count:  41943040
Block count:  10737418240
Reserved block count: 536870912
Free blocks:  1459812475
Free inodes:  39708575"
it seems there is still data on it.
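
(Rough arithmetic, assuming the default 4 KiB block size: 10737418240 - 1459812475
is roughly 9.3 billion blocks allocated, i.e. about 35 TiB in use, and
41943040 - 39708575 is roughly 2.2 million inodes in use.)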

The backup superblocks are for the underlying ext4/ldiskfs filesystem, so they are
not really related to this problem.


So my problem is: if the data on the partition is still intact, is there any way I
can rebuild the file index? And is there any way I can rewrite the
CONFIGS/mountdata back to its original values?
Sorry for the lengthy messages, and I really appreciate your help!

Best Regards,

Grover

On Mon, Mar 30, 2020 at 7:14 AM Andreas Dilger  wrote:
It would be useful if you provided the actual error messages, so we can see 
where the problem is.

What command did you run on the OST?

Does the OST still show that it has data in it (e.g. "df" or "dumpe2fs -h" 
shows lots of used blocks)?

On Mar 25, 2020, at 10:05, Gong-Do Hwang  wrote:

Dear Lustre,

Months ago, when I tried to add a new disk to my new Lustre FS, I accidentally
pointed mkfs.lustre at a then-mounted OST partition of another Lustre FS.
Strangely enough, the command went through, and without paying attention to it, I
unmounted the partition months later and couldn't mount it back; only then did I
realize the mkfs.lustre command had actually taken effect.

But my old Lustre FS worked well through these months, so I guess the data on that
OST is still there. Now, however, the permanent CONFIGS/mountdata is the new one,
and I can still see my old config in the previous values.

My question is: is there any way I can write back the old CONFIGS/mountdata and
still keep all my files on that OST?

I am using Lustre 2.13.0 for my MGS/MDT/OST.

Thanks for your help and I really appreciate it!

Grover

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud








Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Mohr Jr, Richard Frank


> On Mar 31, 2020, at 3:43 PM, Kurt Strosahl  wrote:
> 
> I can't tell; any commands I run against the files in question hang
> indefinitely.  It seems very suspicious though.

The fact that the  same OST appeared in error messages on two different clients 
made me think the problem might be with the OST.

Would you be able to deactivate that OST so no new files get created on it?  
Then you could see if the problem goes away for newly created files.
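
For example, a minimal sketch of one way to do that (assuming the suspect OST is
lustre19-OST0028 on a single-MDT filesystem, the commands are run on the MDS, and
the exact parameter name may vary slightly between Lustre versions):

  # on the MDS: stop new object (file) creation on the suspect OST
  lctl set_param osp.lustre19-OST0028-osc-MDT0000.max_create_count=0

  # ... create some test files from a client and see if they behave ...

  # restore object creation afterwards (20000 is the usual default)
  lctl set_param osp.lustre19-OST0028-osc-MDT0000.max_create_count=20000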

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee






Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Mohr Jr, Richard Frank


> On Mar 31, 2020, at 2:36 PM, Kurt Strosahl  wrote:
> 
> an strace on an ls command run against some of these files produced the 
> following:
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out",
>  "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",



> Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 
> (at 172.17.0.99@o2ib) was lost; in progress operations using this service 
> will wait for recovery to complete
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to 
> lustre19-OST0028 (at 172.17.0.99@o2ib)

Of the files listed in the strace above that gave errors, are all those files 
striped across OST0028?
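
For example, a quick check from a client (a sketch, using one of the paths from the
strace output above; OST0028 is decimal index 40):

  # show which OST index the file's first object lives on
  lfs getstripe -i /volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err

  # or list every file in that directory with an object on OST0028 (index 40)
  lfs find /volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408 --ost 40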

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee






Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Kurt Strosahl
I can't tell; any commands I run against the files in question hang
indefinitely.  It seems very suspicious though.


From: Mohr Jr, Richard Frank 
Sent: Tuesday, March 31, 2020 3:41 PM
To: Kurt Strosahl 
Cc: lustre-discuss@lists.lustre.org; sci...@jlab.org
Subject: [EXTERNAL] Re: [lustre-discuss] Files hanging on lustre clients



> On Mar 31, 2020, at 2:36 PM, Kurt Strosahl  wrote:
>
> an strace on an ls command run against some of these files produced the 
> following:
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out",
>  "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
> getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
>  "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
> lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",



> Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 
> (at 172.17.0.99@o2ib) was lost; in progress operations using this service 
> will wait for recovery to complete
> Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to 
> lustre19-OST0028 (at 172.17.0.99@o2ib)

Of the files listed in the strace above that gave errors, are all those files 
striped across OST0028?

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee






[lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Kurt Strosahl
Hello,

   I'm tracking a very vexing issue.  Somehow users are creating files that cause
any attempt to examine or manipulate them to hang.  They can't be removed and they
can't be examined; even an ls command will hang.

an strace on an ls command run against some of these files produced the 
following:

getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out",
 "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)

lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
 {st_mode=S_IFREG|0644, st_size=16979, ...}) = 0

getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
 "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)

getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stderr.030408_118.err",
 "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)

lstat("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_000.out",

On one client that was being used to debug this problem I see the following
(running CentOS 6.5 and Lustre 2.5.42):
Lustre: 2753:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has 
timed out for slow reply: [sent 1585677903/real 1585677903]  
req@880238fdd000 x1657754295856608/t0(0) 
o101->lustre19-OST0028-osc-88105fecd000@172.17.0.99@o2ib:28/4 lens 328/400 
e 1 to 1 dl 1585678504 ref 1 fl Rpc:X/0/ rc 0/-1
Lustre: lustre19-OST0028-osc-88105fecd000: Connection to lustre19-OST0028 
(at 172.17.0.99@o2ib) was lost; in progress operations using this service will 
wait for recovery to complete
Lustre: lustre19-OST0028-osc-88105fecd000: Connection restored to 
lustre19-OST0028 (at 172.17.0.99@o2ib)
Lustre: 2733:0:(client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 
465 > total measured time 96
LustreError: 2733:0:(layout.c:2005:__req_capsule_get()) @@@ Wrong buffer for 
field `dlm_rep' (1 of 1) in format `LDLM_ENQUEUE_LVB': 0 vs. 112 (server)
  req@8802149c5800 x1657754295862208/t0(0) 
o101->lustre19-OST0028-osc-88105fecd000@172.17.0.99@o2ib:28/4 lens 328/192 
e 0 to 0 dl 1585679147 ref 1 fl Interpret:R/2/0 rc 0/0
LustreError: 2733:0:(layout.c:2005:__req_capsule_get()) Skipped 1 previous 
similar message

On newer systems (RHEL 7.7 running 2.10.8-1)  the problem also occurs and I see 
the following in dmesg
[5437818.727792] Lustre: lustre19-OST0028-osc-886229244000: Connection 
restored to 172.17.0.99@o2ib (at 172.17.0.99@o2ib)
[5438419.769959] Lustre: 2747:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1585678345/real 1585678345]  
req@884834ec0600 x1657002677904272/t0(0) 
o101->lustre19-OST0028-osc-886229244000@172.17.0.99@o2ib:28/4 lens 344/400 
e 24 to 1 dl 1585678946 ref 1 fl Rpc:X/2/ rc -11/-1
[5438419.769973] Lustre: lustre19-OST0028-osc-886229244000: Connection to 
lustre19-OST0028 (at 172.17.0.99@o2ib) was lost; in progress operations using 
this service will wait for recovery to complete
[5438419.770435] Lustre: lustre19-OST0028-osc-886229244000: Connection 
restored to 172.17.0.99@o2ib (at 172.17.0.99@o2ib)

df and lfs df commands on these systems aren't hanging.

On the OSS (we are running Lustre 2.12.1-1 and CentOS 7.6) I'm seeing the
following:
[Mar31 14:26] LNet: Service thread pid 309349 was inactive for 1121.43s. The 
thread might be hung, or it might only be slow and will resume later. Dumping 
the stack trace for debugging purposes:
[  +0.07] Pid: 309349, comm: ll_ost00_088 3.10.0-957.10.1.el7_lustre.x86_64 
#1 SMP Tue Apr 30 22:18:15 UTC 2019
[  +0.02] Call Trace:
[  +0.14]  [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
[  +0.74]  [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[  +0.39]  [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
[  +0.45]  [] ofd_intent_policy+0x69b/0x920 [ofd]
[  +0.15]  [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[  +0.38]  [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[  +0.46]  [] tgt_enqueue+0x62/0x210 [ptlrpc]
[  +0.72]  [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[  +0.65]  [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc]
[  +0.56]  [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[  +0.54]  [] kthread+0xd1/0xe0
[  +0.07]  [] ret_from_fork_nospec_begin+0x7/0x21
[  +0.09]  [] 0x
[  +0.36] LustreError: dumping log to /tmp/lustre-log.1585679216.309349

Thank you,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility


[lustre-discuss] CFP: First International CHAOSS Workshop

2020-03-31 Thread Andreas Dilger
# First International Workshop on Challenges and Opportunities of HPC Storage 
Systems (CHAOSS)

The workshop is aimed at researchers, developers of scientific applications, 
engineers, and everyone interested in the evolution of HPC storage systems. As 
computing power, storage, and network technologies continue to develop at diverging 
rates, the performance gap between them widens. This trend, combined with growing 
data volumes, results in I/O and storage bottlenecks that become increasingly 
serious, especially for large-scale HPC storage systems. The hierarchy of different 
storage technologies introduced to ease this situation leads to a complex 
environment which will become even more challenging for future exascale systems.

This workshop is a venue for papers exploring topics related to data 
organization and management along with the impacts of multi-tier memory and 
storage for optimizing application throughput. It will take place at the 
Euro-Par 2020 conference in Warsaw, Poland on either August 24 or 25, 2020. 
More information is available at: 
https://wr.informatik.uni-hamburg.de/events/2020/chaoss

## Important Dates

Paper Submission:May 8, 2020
Notification to Authors: June 30, 2020
Registration:July 10, 2020
Camera-Ready Deadline
(Informal Proceedings):  July 10, 2020
Workshop Dates:  August 24 or 25, 2020
Camera-Ready Deadline:   September 11, 2020

## Submission Guidelines

Papers should not exceed 12 pages (including title, text, figures, appendices 
and references). Papers of less than 10 pages will be considered as short 
papers that can be presented at the conference but will not be published in the 
proceedings.

Papers must be formatted according to Springer's LNCS guidelines available at 
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines.
 Accepted papers will be published in a separate LNCS workshop volume after the 
conference. One author of each accepted paper is required to register for the 
workshop and present the paper.

Submissions will be managed via EasyChair at: 
https://easychair.org/conferences/?conf=europar2020workshop

## Topics of Interest

Submissions may be more hands-on than research papers and we therefore 
explicitly encourage submissions in the early stages of research. Topics of 
interest include, but are not limited to:

- Kernel and user space file/storage systems
- Parallel and distributed file/storage systems
- Data management approaches for heterogeneous storage systems
- Management of self-describing data formats
- Metadata management
- Approaches using query and database interfaces
- Hybrid solutions using file systems and databases
- Optimized indexing techniques
- Data organizations to support online workflows
- Domain-specific data management solutions
- Related experiences from users: what worked, what didn't?

## Program Committee

- Gabriel Antoniu (INRIA)
- Konstantinos Chasapis (DDN)
- Andreas Dilger (Whamcloud/DDN)
- Kira Duwe (UHH)
- Wolfgang Frings (JSC)
- Elsa Gonsiororowski (LLNL)
- Anthony Kougkas (IIT)
- Michael Kuhn (UHH)
- Margaret Lawson (UIUC SNL)
- Jay Lofstead (SNL)
- Johann Lombardi (Intel)
- Jakob Lüttgau (DKRZ)
- Anna Queralt (BSC)
- Yue Zhu (FSU)

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud








Re: [lustre-discuss] confused about mdt space

2020-03-31 Thread Mohr Jr, Richard Frank


> On Mar 30, 2020, at 10:56 PM, 肖正刚  wrote:
> 
> Hello, I have some questions about metadata space.
> 1) I have ten 960 GB SAS SSDs for the MDT; after creating a RAID 10 array, we
> have 4.7 TB of space free.
> After formatting it as an MDT, we only have 2.6 TB free; so where did the 2.1 TB
> of space go?
> 2) What is the remaining 2.6 TB used for?

That space is used by inodes.  I believe recent Lustre versions use a 1 KB inode
size by default, and the default format options create one inode for every 2.5 KB
of MDT space.  So about 40% of your disk space will be consumed by inodes.
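
As a rough sanity check against your numbers (an estimate only, assuming those
defaults): 4.7 TB / 2.5 KB is roughly 1.9 billion inodes, and 1.9 billion x 1 KB is
roughly 1.9 TB consumed by the inode tables alone.  That leaves about 2.8 TB, which
is close to the 2.6 TB you observed; the rest goes to the journal, reserved blocks,
and other filesystem metadata.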

—
Rick Mohr
Senior HPC System Administrator
Joint Institute for Computational Sciences
University of Tennessee



