Re: [lustre-discuss] LNET ports and connections
LNet is a peer-to-peer protocol, it has no concept of client and server. If one host needs to send a message to another but doesn't already have a connection, it creates a new connection. I don't yet know enough specifics of the lustre protocol to be certain of the circumstances when a lustre server will need to initiate a message to a client, but I imagine that recalling a lock might be one. I think you should assume that any LNet node might receive a connection from any other LNet node (for which they share an LNet network), and that the connection could come from any port between 512 and 1023 (LNET_ACCEPTOR_MIN_PORT to LNET_ACCEPTOR_MAX_PORT). NeilBrown On Mon, Feb 17 2020, Degremont, Aurelien wrote: > Hi all, > > From what I've understood so far, LNET listens on port 988 by default and > peers connect to it using 1021-1023 TCP ports as source ports. > At Lustre level, servers listen on 988 and clients connect to them using the > same source ports 1021-1023. > So only accepting connections to port 988 on server side sounded pretty safe > to me. However, I've seen connections from 1021-1023 to 988, from server > hosts to client hosts sometimes. > I can't understand what mechanism could trigger these connections. Did I miss > something? > > Thanks > > Aurélien > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org signature.asc Description: PGP signature ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] enable quota enforcement on the fly?
We recently noticed we apparently did not enable group quota enforcement early last year during the most recent rebuild of our Lustre filesystem. Is it possible to do so on the fly, or is it better/required for the filesystem to be quiesced first? We are using Lustre 2.10.7 with ZFS 0.7.5 (project quotas are not needed).

[loforbes@mds01 ~]$ lctl get_param osd-*.*.quota_slave.info
osd-zfs.lustre2-MDT.quota_slave.info=
target name:      lustre2-MDT
pool ID:          0
type:             md
quota enabled:    none
conn to master:   setup
space acct:       ug
user uptodate:    glb[0],slv[0],reint[0]
group uptodate:   glb[0],slv[0],reint[0]
project uptodate: glb[0],slv[0],reint[0]

Thank you!

--
Regards,
-liam

-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes                                  lofor...@alaska.edu
ph: 907-450-8618                             fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer              CISSP
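For reference, the Lustre manual describes quota enforcement as something that can be switched at runtime from the MGS; a sketch for the "lustre2" filesystem shown above, assuming the 2.10-era conf_param interface, would be:

  # On the MGS: enable user+group quota enforcement for all MDTs and OSTs of
  # the "lustre2" filesystem. Space accounting ("space acct: ug" above) is
  # already being maintained, so only enforcement is being switched on.
  lctl conf_param lustre2.quota.mdt=ug
  lctl conf_param lustre2.quota.ost=ug

  # Then verify on the MDS that enforcement is reported as active.
  lctl get_param osd-*.*.quota_slave.info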
[lustre-discuss] LNET ports and connections
Hi all,

From what I've understood so far, LNET listens on port 988 by default and peers connect to it using TCP ports 1021-1023 as source ports. At the Lustre level, servers listen on 988 and clients connect to them using the same source ports 1021-1023. So only accepting connections to port 988 on the server side sounded pretty safe to me. However, I've sometimes seen connections from 1021-1023 to 988, from server hosts to client hosts. I can't understand what mechanism could trigger these connections. Did I miss something?

Thanks

Aurélien
Re: [lustre-discuss] Jobstats harvesting
On Mon., 17 Feb. 2020, 18:06 Andreas Dilger wrote:

> You don't mention which Lustre release you are using, but newer
> releases allow "complex JobIDs" that can contain both the SLURM JobID
> as well as other constant strings (e.g. cluster name), hostname, UID, GID,
> and process name.

Yeah, I twigged that once I'd sent the mail: we're still on 2.10.8 in production, so having the option of the more complex JobID string is another reason for upgrading.

Related: I've found the DDN fork of collectd, and I see the lustre2.c plugin is GPL2, but are there any plans to get it merged upstream?

Andrew

(Also, who's mad enough to be running MythTV on Lustre, judging from the examples?)
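For completeness, once on a release new enough to support it, a minimal sketch of the complex-JobID setup (the cluster prefix "mycluster" is only a placeholder, and the %j escape is as described in the manual section Andreas references) would be:

  # On the clients (or pushed persistently from the MGS with "lctl set_param -P"):
  # take the numeric job ID from Slurm's environment variable ...
  lctl set_param jobid_var=SLURM_JOB_ID
  # ... and prepend a constant cluster identifier; %j expands to the value of
  # the variable named by jobid_var.
  lctl set_param jobid_name="mycluster-%j"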
[lustre-discuss] Speed of deleting small files on OST vs DoM
Hi!

Is there a good reason why deleting lots of small files (io500 test md_easy/hard_delete) with the files on OSTs is up to two times faster than when using DoM with the whole file(s) on the MDT?

Using server/client 2.13.0, DoM up to 64k, test files < 4k.

I can see that the actual data deletion with the data on an OST is asynchronous, but I see no reason for it to be almost two times faster. Both the MDTs and OSTs are SSDs. The situation is basically the same for a single task and for multiple clients with multiple tasks per client.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90-580 14
Mobile: +46 70 7716134   WWW: http://www.hpc2n.umu.se
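For anyone wanting to reproduce the comparison, the DoM configuration being described (first 64k of every file on the MDT, so the <4k test files live entirely on the MDT) would typically come from a composite layout along these lines; the directory path is just an example:

  # Hypothetical test directory; files created beneath it inherit the layout.
  # First component: bytes 0-64K stored on the MDT (Data-on-MDT).
  # Second component: anything past 64K striped over a single OST.
  lfs setstripe -E 64K -L mdt -E -1 -c 1 /mnt/lustre/dom_test

  # Confirm where a small file's data actually lives.
  lfs getstripe /mnt/lustre/dom_test/somefile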
Re: [lustre-discuss] Jobstats harvesting
You don't mention which Lustre release you are using, but newer releases allow "complex JobIDs" that can contain both the SLURM JobID as well as other constant strings (e.g. cluster name), hostname, UID, GID, and process name. This is documented in the Lustre manual at:

http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.jobstats

Cheers, Andreas

On Feb 14, 2020, at 19:13, Andrew Elwell wrote:

> Hi folks,
>
> I've finally got round to enabling jobstats on a test system. As we're a Slurm
> shop, setting jobid_var=SLURM_JOB_ID works OK, but is it possible to use a
> combination of variables, i.e. ${PAWSEY_CLUSTER}-${SLURM_JOB_ID} (or even
> SLURM_CLUSTER_NAME, which is the same as $PAWSEY_CLUSTER)? If so, what's the
> syntax? (Yes, I know that setting it to federated would jump up the JobID
> namespace to include a cluster identifier, but that's not happening for now.)
>
> However, the main reason for this mail is to find out what people use to
> harvest the stats off the MDTs/OSTs. I'm aware of Roland Laifer's LAD15
> presentation (sadly his tarball misses out a sample config file, so it's taken
> me a bit of iteration over the Perl scripts to recreate the syntax), which
> saves to a file-based structure, and I've seen others using Prometheus (via
> https://grafana.com/grafana/dashboards/9671).
>
> We've got InfluxDB (LNet / MDS / OST stats gathered as well as regular
> collectd output) and MariaDB (slurmdbd and robinhood) databases available, so
> I'd rather go with something that fed into those. We're not doing serious high
> throughput (financial style) but more traditional HPC with a lot (sigh) of
> single-node jobs over 4 production filesystems (of which 3 are non-appliance
> LTS releases maintained by us).
>
> Hopefully the discussion here will lead to some updated content at
> http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide (hat tip to
> Scott for a great start)
>
> Many thanks
>
> Andrew
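On the harvesting side of the question, whichever collector ends up feeding InfluxDB/MariaDB ultimately reads the same per-target job_stats files on the servers; a rough sketch of pulling and resetting them by hand (parameter paths as in current releases, the interval value is arbitrary) is:

  # On each MDS/OSS: dump the accumulated per-job statistics for every target.
  lctl get_param mdt.*.job_stats
  lctl get_param obdfilter.*.job_stats

  # Keep idle job entries around at least as long as the collection interval
  # (seconds), then clear the counters after each harvesting pass.
  lctl set_param mdt.*.job_cleanup_interval=600
  lctl set_param obdfilter.*.job_cleanup_interval=600
  lctl set_param mdt.*.job_stats=clear
  lctl set_param obdfilter.*.job_stats=clear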