Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-20 Thread Harr, Cameron
I use ltop heavily:

https://github.com/LLNL/lmt


On 12/20/18 9:15 AM, Alexander I Kulyavtsev wrote:

1) cerebro + ltop still work.


2) Telegraf + InfluxDB (collector, time series DB). Telegraf has input plugins
for Lustre ("lustre2"), ZFS, and many others. Grafana can plot live data from
the DB. Also, InfluxDB integrates with Prometheus.

Basically, each component can feed data to different output types through
plugins, or take data from multiple types of sources, so you can use different
combinations for your monitoring stack.


For the simplest tool, you could check whether Telegraf from the InfluxDB stack
has a suitable output plugin (see influxdata on GitHub).


Alex.


From: lustre-discuss on behalf of Laifer, Roland (SCC)
Sent: Thursday, December 20, 2018 8:04:55 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Command line tool to monitor Lustre I/O ?

Dear Lustre administrators,

what is a good command line tool to monitor current Lustre metadata and
throughput operations on the local client or server? Up to now we had
used collectl but this no longer works for Lustre 2.10.

Some background about collectl: The Lustre support of collectl was
removed many years ago but up to Lustre 2.7 it was still possible to
monitor metadata and throughput operations on clients. In addition,
there were plugins which also worked for the server side, see
http://wiki.lustre.org/Collectl
However, it seems that there was no update for these plugins to adapt
them for Lustre 2.10.

Regards,
  Roland
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-20 Thread Daniel Kobras
Hi Roland!

> On 20.12.2018 at 15:04, Laifer, Roland (SCC) wrote:
> 
> what is a good command line tool to monitor current Lustre metadata and
> throughput operations on the local client or server? Up to now we had
> used collectl but this no longer works for Lustre 2.10.

The Lustre exporter (https://github.com/HewlettPackard/lustre_exporter) for
Prometheus copes well with 2.10. Calling it a command-line tool is a bit of a
stretch (hey, there’s curl after all!), but it can certainly step in for
collectl’s non-interactive mode of operation.
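
To make the curl remark concrete, here is a minimal sketch of reading the
exporter's output from a script instead. It assumes the exporter serves the
usual Prometheus text format; the URL (port 9169, path /metrics) and the
"lustre_" name prefix are assumptions about a typical setup and are not taken
from this thread, so check them against your own exporter configuration.

```python
import re
import urllib.request

# Assumed endpoint; adjust host, port and path to your exporter.
METRICS_URL = "http://localhost:9169/metrics"

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(
    r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$"
)


def parse_metrics(text, prefix="lustre_"):
    """Return (name, labels, value) for every sample whose name has the prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            samples.append(
                (match.group(1), match.group(2) or "", float(match.group(3)))
            )
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch the raw metrics text, much like `curl <url>` would."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling parse_metrics(fetch_metrics()) in a loop and printing the tuples gives
a crude non-interactive view; the metric names you get depend entirely on the
exporter version.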

Kind regards,

Daniel
-- 
Daniel Kobras
Principal Architect
Puzzle ITC Deutschland
https://www.puzzle-itc.de

-- 
Puzzle ITC Deutschland GmbH
Registered office: Jurastr. 27/1, 72072 Tübingen
Registered at Amtsgericht Stuttgart, HRB 765802
Managing director: Lukas Kallies


[lustre-discuss] lfsck oi_scrub failed counts

2018-12-20 Thread Josh Samuelson
Greetings,

We're looking for suggestions on how to interpret the status output of
the various stages of the 'lctl lfsck_start' command, in particular
the oi_scrub failed counts.

The manual states the following in the 'LFSCK status of OI Scrub' section:
'Failed - total number of objects that failed to be repaired.'
 
A recent 'lctl lfsck_start -M Name-MDT' run to verify OI, layout and
namespace reported high failed counts in oi_scrub for all of the OSTs
in the FS.  This was unexpected.  We were running an online lfsck
because a single OST had gone read-only while the underlying RAID6
hardware was rebuilding a disk and went a long time without responding
to I/O (I'll spare you this tale of woe).  After e2fsck, that OST had 6
zero-sized "Unattached inode" entries carrying a trusted.lma extended
attribute, which were manually routed to /lost+found.  These 6 inodes
showed up in the /proc/fs/lustre/osd-ldiskfs//oi_scrub file:

lf_scanned: 6
lf_repaired: 6

However, this and the 49 other OSTs also showed 'failed:' counts in
oi_scrub, ranging from ~45000 at the low end to ~50200 at the high end.
A snippet from the OST with the above lf_* counts:

first_failure_position: 87
checked: 1784231
updated: 327
failed: 47725
prior_updated: 0
noscrub: 225
igif: 1
success_count: 1

All of the OST oi_scrub status files had the following:
first_failure_position: 87
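
To put the failed count in proportion, the status text can be read as plain
"key: value" pairs. The sketch below is only an illustration of that idea: it
uses the four relevant lines from the snippet above, and the function names
are made up for this example.

```python
def parse_status(text):
    """Turn 'key: value' status lines into a dict, converting integers."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        value = value.strip()
        is_int = value.lstrip("-").isdigit()
        fields[key.strip()] = int(value) if is_int else value
    return fields


def failed_fraction(fields):
    """Share of checked objects that oi_scrub reports as failed."""
    checked = fields.get("checked", 0)
    return fields.get("failed", 0) / checked if checked else 0.0


# Values copied from the status snippet quoted above.
SNIPPET = """\
first_failure_position: 87
checked: 1784231
updated: 327
failed: 47725
"""

stats = parse_status(SNIPPET)
print(f"{failed_fraction(stats):.2%} of checked objects failed")
```

Running the same two functions over the oi_scrub file of every OST makes it
easy to see whether the ratio is similar everywhere.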

All the OSSs have the following default debug settings:

lctl get_param debug
debug=ioctl neterror warning error emerg ha config console lfsck

After running 'lctl debug_kernel dk.txt' on the OSSs, the number of LFSCK
subsystem/debug_mask lines that appeared to be involved with scrub activities
was _much_ smaller than the failed counts.  The scrub LFSCK debug lines
looked similar to the following:

0010:1000:17.0:1545248778.311791:0:13132:0:(osd_scrub.c:454:osd_scrub_convert_ff()) Name-OST002a-osd: fail to convert ff [0x1:0xb0:0x0]: rc = -17

and I assume -17 is -EEXIST.
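
The errno guess can be checked, and the debug lines tallied per function and
error, with a short script. This is a rough sketch: it assumes each record in
dk.txt sits on a single line in the "(file.c:line:function()) ... rc = N"
shape of the line quoted above, and the helper names are invented for this
example.

```python
import errno
import re
from collections import Counter

# Matches "(file.c:line:function()) ... rc = N" as in the quoted debug line.
DEBUG_LINE = re.compile(r"\(([\w.]+):\d+:(\w+)\(\)\).*rc = (-?\d+)")


def errno_name(rc):
    """Map a kernel-style negative return code to its symbolic errno name."""
    return errno.errorcode.get(abs(rc), "unknown")


def tally_failures(lines):
    """Count matching debug lines per (function, errno name) pair."""
    counts = Counter()
    for line in lines:
        match = DEBUG_LINE.search(line)
        if match:
            counts[(match.group(2), errno_name(int(match.group(3))))] += 1
    return counts


print(errno_name(-17))
```

Passing an open dk.txt file handle to tally_failures() gives totals that can
be compared against the 'failed:' counter on the same OST.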

Should we be concerned about these failed counts?  If so, how do we match
failed counts in LFSCK status output to Lustre debug lines so we can find
the cause and try to resolve the problem?

We're running Lustre 2.8.0 on the servers and clients in case that matters
at all.

Thank you in advance for any wisdom you can share,
-Josh


Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-20 Thread Alexander I Kulyavtsev
1) cerebro + ltop still work.


2) Telegraf + InfluxDB (collector, time series DB). Telegraf has input plugins
for Lustre ("lustre2"), ZFS, and many others. Grafana can plot live data from
the DB. Also, InfluxDB integrates with Prometheus.

Basically, each component can feed data to different output types through
plugins, or take data from multiple types of sources, so you can use different
combinations for your monitoring stack.


For the simplest tool, you could check whether Telegraf from the InfluxDB stack
has a suitable output plugin (see influxdata on GitHub).


Alex.


From: lustre-discuss on behalf of Laifer, Roland (SCC)
Sent: Thursday, December 20, 2018 8:04:55 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Command line tool to monitor Lustre I/O ?

Dear Lustre administrators,

what is a good command line tool to monitor current Lustre metadata and
throughput operations on the local client or server? Up to now we had
used collectl but this no longer works for Lustre 2.10.

Some background about collectl: The Lustre support of collectl was
removed many years ago but up to Lustre 2.7 it was still possible to
monitor metadata and throughput operations on clients. In addition,
there were plugins which also worked for the server side, see
http://wiki.lustre.org/Collectl
However, it seems that there was no update for these plugins to adapt
them for Lustre 2.10.

Regards,
  Roland


Re: [lustre-discuss] Quota Reporting (for all users and/or groups)

2018-12-20 Thread Marion Hakanson
We've just been iterating through all our group IDs, doing "lfs quota -g" for 
each one.  Collect them up in a text file and import into a spreadsheet, and 
you can get totals pretty easily.
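
The loop described above could be scripted roughly as follows. This is a
sketch, not our actual script: the group list and the mount point are supplied
by the caller, the function names are hypothetical, and the raw "lfs quota -g"
output is stored unparsed because its column layout differs between versions.

```python
import csv
import io
import subprocess


def collect_group_quotas(groups, mountpoint, run=subprocess.run):
    """Run `lfs quota -g <group> <mountpoint>` per group and keep the raw output."""
    rows = []
    for name in groups:
        proc = run(
            ["lfs", "quota", "-g", name, mountpoint],
            capture_output=True,
            text=True,
            check=False,
        )
        rows.append((name, proc.stdout.strip()))
    return rows


def to_csv(rows):
    """Render (group, output) pairs as CSV text for import into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["group", "lfs_quota_output"])
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, to_csv(collect_group_quotas(["groupa", "groupb"], "/mnt/lustre"))
would produce one row per group; the group names and /mnt/lustre here are
placeholders.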

While we're at it, I would really like to have a way to list all quotas that 
have been set with "lfs setquota".  Does anyone know how to get such a list?

Regards,

Marion


On Dec 20, 2018, at 06:50, Paul Edmon <ped...@cfa.harvard.edu> wrote:


I'm not aware of one, but I too would love to either learn of a tool to do this 
or advocate for Lustre to add it.


-Paul Edmon-


On 12/20/18 9:41 AM, Jason Williams wrote:

It is entirely possible that this already exists, but my google-fu is not what
it used to be.  However, I've searched around the internet and it seems as
though it doesn't really exist.  There are a handful of now-defunct or
unmaintained projects out there, but nothing that seems to report all of the
user and/or group quotas.


Does anyone know of a good quota reporting tool that can give quota information 
in the same way as a 'repquota -u' or 'repquota -g' would?


--

Jason Williams
Assistant Director
Systems and Data Center Operations.
Maryland Advanced Research Computing Center (MARCC)
Johns Hopkins University
jas...@jhu.edu






Re: [lustre-discuss] Quota Reporting (for all users and/or groups)

2018-12-20 Thread Christopher Johnston
I have been using Robinhood and some scripts I wrote to run daily checks and
send reports to my users, etc.  You may be able to accomplish something
similar there.

On Thu, Dec 20, 2018 at 9:49 AM Paul Edmon wrote:

> I'm not aware of one, but I too would love to either learn of a tool to do
> this or advocate for Lustre to add it.
>
>
> -Paul Edmon-
>
>
> On 12/20/18 9:41 AM, Jason Williams wrote:
>
> It is entirely possible that this already exists, but my google-fu is not
> what it used to be.  However, I've searched around the internet and it
> seems as though it doesn't really exist.  There are a handful of now-defunct
> or unmaintained projects out there, but nothing that seems to report all
> of the user and/or group quotas.
>
>
> Does anyone know of a good quota reporting tool that can give quota
> information in the same way as a 'repquota -u' or 'repquota -g' would?
>
>
> --
> Jason Williams
> Assistant Director
> Systems and Data Center Operations.
> Maryland Advanced Research Computing Center (MARCC)
> Johns Hopkins University
> jas...@jhu.edu
>
>


Re: [lustre-discuss] Quota Reporting (for all users and/or groups)

2018-12-20 Thread Paul Edmon
I'm not aware of one, but I too would love to either learn of a tool to 
do this or advocate for Lustre to add it.



-Paul Edmon-


On 12/20/18 9:41 AM, Jason Williams wrote:


It is entirely possible that this already exists, but my google-fu is
not what it used to be.  However, I've searched around the internet
and it seems as though it doesn't really exist.  There are a handful of
now-defunct or unmaintained projects out there, but nothing that
seems to report all of the user and/or group quotas.



Does anyone know of a good quota reporting tool that can give quota 
information in the same way as a 'repquota -u' or 'repquota -g' would?



--

Jason Williams
Assistant Director
Systems and Data Center Operations.
Maryland Advanced Research Computing Center (MARCC)
Johns Hopkins University
jas...@jhu.edu 




[lustre-discuss] Quota Reporting (for all users and/or groups)

2018-12-20 Thread Jason Williams
It is entirely possible that this already exists, but my google-fu is not what
it used to be.  However, I've searched around the internet and it seems as
though it doesn't really exist.  There are a handful of now-defunct or
unmaintained projects out there, but nothing that seems to report all of the
user and/or group quotas.


Does anyone know of a good quota reporting tool that can give quota information 
in the same way as a 'repquota -u' or 'repquota -g' would?


--

Jason Williams
Assistant Director
Systems and Data Center Operations.
Maryland Advanced Research Computing Center (MARCC)
Johns Hopkins University
jas...@jhu.edu



[lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-20 Thread Laifer, Roland (SCC)
Dear Lustre administrators,

what is a good command line tool to monitor current Lustre metadata and
throughput operations on the local client or server? Up to now we had
used collectl but this no longer works for Lustre 2.10.

Some background about collectl: The Lustre support of collectl was
removed many years ago but up to Lustre 2.7 it was still possible to
monitor metadata and throughput operations on clients. In addition,
there were plugins which also worked for the server side, see
http://wiki.lustre.org/Collectl
However, it seems that there was no update for these plugins to adapt
them for Lustre 2.10.
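
For reference, the kind of client-side numbers meant here can be approximated
by sampling the stats counters twice and taking the difference. The sketch
below is only an illustration under assumptions: it expects text in the
"name count samples [unit] min max sum" layout that "lctl get_param -n
llite.*.stats" commonly prints, and the function names are made up for this
example.

```python
def parse_stats(text):
    """Parse stats text into {counter: {"samples": n, "sum": bytes (if present)}}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "snapshot_time":
            continue
        name, count = fields[0], fields[1]
        if not count.isdigit():
            continue
        entry = {"samples": int(count)}
        # Byte counters carry min, max and sum after the "[bytes]" unit.
        if len(fields) >= 7 and fields[3] == "[bytes]":
            entry["sum"] = int(fields[6])
        counters[name] = entry
    return counters


def rates(before, after, seconds):
    """Per-second rate per counter: bytes/s where a sum exists, else ops/s."""
    result = {}
    for name, current in after.items():
        previous = before.get(name, {})
        key = "sum" if "sum" in current else "samples"
        result[name] = (current[key] - previous.get(key, 0)) / seconds
    return result
```

Taking two snapshots a few seconds apart and passing both to rates() yields
throughput for the byte counters and operations per second for the rest.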

Regards,
  Roland