Re: [lustre-discuss] Lustre compilation error

2017-10-19 Thread Parag Khuraswar
Hi,

 

Any solution on the below issue?

 

Regards,

Parag

 

 

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of parag_k
Sent: Thursday, October , 2017 8:15 AM
To: Dilger, Andreas
Cc: Lustre User Discussion Mailing List
Subject: Re: [lustre-discuss] Lustre compilation error

 

Hi Dilger,

 

I extracted the src rpm of lustre 2.10.0 using 7zip and got the tarball of 
lustre 2.10.0.
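
For reference, the same tarball can also be pulled out of the src rpm without 
7zip using rpm2cpio; the exact rpm and tarball filenames below are 
illustrative:

rpm2cpio lustre-2.10.0-1.src.rpm | cpio -idmv
tar xzf lustre-2.10.0.tar.gz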

 

Also, if you open the below-mentioned link in a browser, you will find a 
snapshot option and can download Lustre.

 

But once you get the tarball, the compilation procedure will be the same, I 
guess, as what I mentioned in my last mail.

 

Regards,

Parag

 

 Original message 

From: "Dilger, Andreas"  

Date: 19/10/2017 7:48 am (GMT+05:30) 

To: parag_k  

Cc: Chris Horn , Lustre User Discussion Mailing List 
 

Subject: Re: [lustre-discuss] Lustre compilation error 

 

On Oct 18, 2017, at 07:44, parag_k  wrote:
> 
> I got the source from github.

Lustre isn't hosted on GitHub (unless someone is cloning it there), so it isn't 
clear what you are compiling.

You should download sources from git://git.hpdd.intel.com/fs/lustre-release.git
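
Roughly, building rpms from a checkout looks like the following (the tag name 
here is an assumption, so verify it with "git tag", and adjust the configure 
options to your environment):

git clone git://git.hpdd.intel.com/fs/lustre-release.git
cd lustre-release
git checkout v2_10_0      # tag name is a guess; check "git tag" output
sh autogen.sh
./configure --disable-client --with-o2ib=/usr/src/ofa_kernel/default
make rpms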

Cheers, Andreas

> My configure line is-
> 
> ./configure --disable-client 
> --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/ 
> --with-o2ib=/usr/src/ofa_kernel/default/
> 
> There are two things I was trying to do.
> 
> 1)  Creating rpms from source. The error mailed below is from making 
> rpms.
> 
>  
> 
> 2)  Compiling from source which is mentioned in the attached guide.
> 
>  
> 
> I also tried by extracting the src rpm and getting tar.gz from there.
> 
>  
> 
> 
> 
> Regards,
> Parag
> 
>  Original message 
> From: Chris Horn 
> Date: 18/10/2017 10:31 am (GMT+05:30)
> To: Parag Khuraswar , 'Lustre User Discussion Mailing 
> List' 
> Subject: Re: [lustre-discuss] Lustre compilation error
> 
> It would be helpful if you provided more context. How did you acquire the 
> source? What was your configure line? Is there a set of build instructions 
> that you are following?
> 
>  
> 
> Chris Horn
> 
>  
> 
> From: lustre-discuss  on behalf of 
> Parag Khuraswar 
> Date: Tuesday, October 17, 2017 at 11:52 PM
> To: 'Lustre User Discussion Mailing List' 
> Subject: Re: [lustre-discuss] Lustre compilation error
> 
>  
> 
> Hi,
> 
>  
> 
> Does anyone have any idea on the below issue?
> 
>  
> 
> Regards,
> 
> Parag
> 
>  
> 
>  
> 
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
> Behalf Of Parag Khuraswar
> Sent: Tuesday, October , 2017 6:11 PM
> To: 'Lustre User Discussion Mailing List'
> Subject: [lustre-discuss] Lustre compilation error
> 
>  
> 
> Hi,
> 
>  
> 
> I am trying to make rpms from lustre 2.10.0 source. I get the below error 
> when I run “make”
> 
>  
> 
> ==
> 
> make[4]: *** No rule to make target `fld.ko', needed by `all-am'.  Stop.
> 
> make[3]: *** [all-recursive] Error 1
> 
> make[2]: *** [all-recursive] Error 1
> 
> make[1]: *** [all] Error 2
> 
> error: Bad exit status from 
> /tmp/rpmbuild-lustre-root-Ssi5N0Xv/TMP/rpm-tmp.bKMjSO (%build)
> 
>  
> 
>  
> 
> RPM build errors:
> 
> Bad exit status from 
> /tmp/rpmbuild-lustre-root-Ssi5N0Xv/TMP/rpm-tmp.bKMjSO (%build)
> 
> make: *** [rpms] Error 1
> 
> ==
> 
>  
> 
> Regards,
> 
> Parag
> 
>  
> 
>  
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation



Re: [lustre-discuss] Acceptable thresholds

2017-10-19 Thread Patrick Farrell
Several processes per CPU core, probably?  It’s a lot.

But there’s a lot of environmental and configuration dependence here too.

Why not look at how many you have running currently when Lustre is set up and 
set the limit to double that?  Watching process count isn't a good way to 
measure load anyway; it's probably only good for catching a fork-bomb type 
situation, where the process count runs away.  So why not configure it to catch 
that and otherwise not worry about it?
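
As a rough baseline, something like this on a server would give you the current 
counts (the thread-name patterns are just examples of typical Lustre server 
thread prefixes; check the ps output on your own servers):

# total processes
ps -e --no-headers | wc -l
# just Lustre service threads
ps -e -o comm= | grep -cE '^(ll_ost|mdt|ldlm)'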

- Patrick

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
"E.S. Rosenberg" <esr+lus...@mail.hebrew.edu>
Date: Thursday, October 19, 2017 at 2:20 PM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Acceptable thresholds

Hi,
This question is, I guess, not truly answerable because it is probably very 
specific to each environment, but I am still going to ask it to get a general 
idea.

We started testing monitoring using Zabbix; its default 'too many processes' 
threshold is not very high, so I already raised it to 1024, but the Lustre 
servers are still well over even that count.

So what is a 'normal' process count for Lustre servers?
Should I assume X processes per client? What is X?

Thanks,
Eli




[lustre-discuss] OSTs remounting read-only after ldiskfs journal error

2017-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
Recently, I ran into an issue where several of the OSTs on my Lustre file 
system went read-only.  When I checked the logs, I saw messages like these for 
several OSTs:

Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs: ldiskfs_getblk:834: aborting 
transaction: error 28 in __ldiskfs_handle_dirty_metadata
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs error (device sfa0023): 
ldiskfs_getblk:834: inode #81: block 688560124: comm ll_ost00_022: 
journal_dirty_metadata failed: handle type 0 started at line 1723, credits 8/0, 
errcode -28
Oct  6 23:27:11 haven-oss2 kernel: Aborting journal on device sfa0023-8.
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs (sfa0023): Remounting filesystem 
read-only
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs error (device sfa0023) in 
osd_trans_stop:1830: error 28

(Error 28 is ENOSPC, so this looks a lot like LU-9740.)

In an effort to get the file system back up, I unmounted the OSTs, rebooted the 
OSS servers, and then remounted the OSTs.  Most of the OSTs that had gone 
read-only mounted back up.  There were complaints that the file systems were 
“clean with errors” and needed fsck, but otherwise they seemed fine.  However, 
there were two OSTs that would still fall back to read-only and report errors 
like this:

Lustre: haven-OST001a: Recovery over after 0:10, of 90 clients 90 recovered and 
0 were evicted.
Lustre: haven-OST001a: deleting orphan objects from 0x0:1124076 to 0x0:1124289
LDISKFS-fs: ldiskfs_getblk:834: aborting transaction: error 28 in 
__ldiskfs_handle_dirty_metadata
LDISKFS-fs error (device sfa0027): ldiskfs_getblk:834: inode #81: block 
72797184: comm ll_ost00_002: journal_dirty_metadata failed: handle type 0 
started at line 1723, credits 8/0, errcode -28
Aborting journal on device sfa0027-8.
LDISKFS-fs (sfa0027): Remounting filesystem read-only
LustreError: 16018:0:(osd_io.c:1679:osd_ldiskfs_write_record()) sfa0027: error 
reading offset 20480 (block 5): rc = -28
LDISKFS-fs error (device sfa0027) in osd_trans_stop:1830: error 28
LustreError: 16954:0:(osd_handler.c:1553:osd_trans_commit_cb()) transaction 
@0x8807b97ba500 commit error: 2
LustreError: 16954:0:(osd_handler.c:1553:osd_trans_commit_cb()) Skipped 3 
previous similar messages
LustreError: 16018:0:(osd_handler.c:1833:osd_trans_stop()) haven-OST001a: 
failed to stop transaction: rc = -28
LDISKFS-fs warning (device sfa0027): kmmpd:187: kmmpd being stopped since 
filesystem has been remounted as readonly.
LustreError: 16017:0:(tgt_lastrcvd.c:980:tgt_client_new()) haven-OST000e: 
Failed to write client lcd at idx 96, rc -30

I ended up unmounting all the OSTs in the file system and running “e2fsck -fn” 
on them.  There were no problems reported.  I then ran “e2fsck -fp” on the OSTs 
that were “clean with errors” so that the file system state would get reset to 
“clean”.  When I remounted everything, the same two OSTs would always go 
read-only.

I did some digging with debugfs (roughly the command sketched after the list 
below), and it looks like inode 81 corresponds to the last_rcvd file.  So I am 
wondering if there might be one of two things happening:

1) The journal is corrupted.  When it tries to replay a transaction that 
modifies the last_rcvd file, that transaction fails and the journal replay 
aborts.  (In which case, is there some way to get around a corrupted journal?)

2) The journal is fine, but the last_rcvd file is somehow corrupted which is 
preventing the transaction from replaying.  (If that is the case, will Lustre 
regenerate the last_rcvd file if I delete it?)
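
For reference, the inode-to-file mapping above was done with something along 
these lines (the device path is illustrative; debugfs opens the device 
read-only by default):

debugfs -R 'ncheck 81' /dev/mapper/sfa0027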

Of course, it could be that it's neither of those two options.

I am hoping that someone on the mailing list might have some experience with 
this so they can share their wisdom with me.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
