Re: [lustre-discuss] Limit to number of OSS?

2019-10-10 Thread Andreas Dilger
On Oct 10, 2019, at 11:20, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> On Mon, Oct 7, 2019 at 6:33 PM Andreas Dilger <adil...@whamcloud.com> wrote:
>>
>> With socklnd there are 3 TCP connections per client-server pair.
>> For IB there is no such connection limit that I'm aware of.
>
> just out of morbid curiosity, can you very briefly explain the
> connectivity differences between TCP/IB?  Does IB use the same 3
> connections as TCP?  If not, is that why the connectivity limit
> doesn't exist with IB or is there some other overriding design
> principle in IB that allows lustre to push past TCP?  Not that any of
> this has any relevance to anything i do, i'm just curious.
>
> i'd love to have 2000 OSS's and 20k clients, but sadly i do not... :(

This is a fundamental difference between TCP and IB.  TCP needs a persistent
connection between peers (socket) to manage state, and TCP's (very ancient)
16-bit port field caps a single node at 65536 ports, which in turn bounds how
many such connections it can hold.
When computers had 1-2MB of RAM that was more than enough...

IB does not have this limitation, though it does consume some memory for each
peer that it is communicating with.  o2iblnd can establish multiple connections
to a single peer to get better bandwidth, and this is important for OPA
(Omni-Path) performance, but is not critical for IB networks.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Limit to number of OSS?

2019-10-10 Thread Michael Di Domenico
On Mon, Oct 7, 2019 at 6:33 PM Andreas Dilger wrote:
>
> With socklnd there are 3 TCP connections per client-server pair.
> For IB there is no such connection limit that I'm aware of.

just out of morbid curiosity, can you very briefly explain the
connectivity differences between TCP/IB?  Does IB use the same 3
connections as TCP?  If not, is that why the connectivity limit
doesn't exist with IB or is there some other overriding design
principle in IB that allows lustre to push past TCP?  Not that any of
this has any relevance to anything i do, i'm just curious.

i'd love to have 2000 OSS's and 20k clients, but sadly i do not... :(


Re: [lustre-discuss] Limit to number of OSS?

2019-10-07 Thread Andreas Dilger
Whether there are problems with a large number of OSS and/or MDS nodes depends 
on whether you are using TCP or IB networking.

With socklnd there are 3 TCP connections per client-server pair (bulk read,
bulk write, and small message), so the maximum you could have would be around
(65536 - 1024)/3 = 21500 (or likely fewer) clients or servers, unless you also
configured LNet routers in between (which would allow more clients, but not
more servers).  That isn't a limitation for most deployments, but it is at
least one known limit.  For IB there is no such connection limit that I'm
aware of.
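
The arithmetic above can be checked with a short sketch. It only restates the
estimate in this message; the constant and function names are made up for
illustration, and it does not reflect how socklnd actually allocates ports.

```python
# Back-of-the-envelope estimate of the socklnd peer limit described above.
# ASSUMPTIONS (illustrative, not taken from the Lustre source): a 16-bit
# TCP port space, 1024 ports set aside, and 3 connections per peer pair.

TCP_PORT_SPACE = 2 ** 16      # 65536 port numbers
RESERVED_PORTS = 1024         # ports subtracted in the estimate above
CONNS_PER_PEER = 3            # bulk read, bulk write, small message


def max_socklnd_peers(port_space=TCP_PORT_SPACE,
                      reserved=RESERVED_PORTS,
                      conns_per_peer=CONNS_PER_PEER):
    """Return the upper bound on peers one node could hold connections to."""
    return (port_space - reserved) // conns_per_peer


print(max_socklnd_peers())  # 21504
```

The exact quotient is 21504, which matches the "around 21500" figure; as noted
above, the practical limit is likely lower.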

There are likely other factors such as memory consumption per target, but I 
don't think that would be the first thing to cause problems on modern systems 
with hundreds of GB of RAM.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud








Re: [lustre-discuss] Limit to number of OSS?

2019-10-04 Thread Degremont, Aurelien
Thanks for this info. But actually I was really asking about the number of
OSSes, not OSTs :)
The question is more about how Lustre client nodes and the MDT will cope with
a very large number of OSSes.



Re: [lustre-discuss] Limit to number of OSS?

2019-10-03 Thread Andreas Dilger
On Oct 3, 2019, at 07:55, Degremont, Aurelien <degre...@amazon.com> wrote:
>
> Hello all!
>
> This doc from the wiki says "Lustre can support up to 2000 OSS per file system"
> (http://wiki.lustre.org/Lustre_Server_Requirements_Guidelines).
>
> I'm a bit surprised by this statement. Does somebody have information about the
> upper limit to the number of OSSes?
> Or what could be the limiting factor for this number of OSS? Network limit?
> Memory consumption? Other?

That's likely a combination of a bit of confusion and a bit of safety on the 
part of Intel writing that document.

The Lustre Operations Manual writes:

Although a single file can only be striped over 2000 objects, Lustre file 
systems can have thousands of OSTs. The I/O bandwidth to access a single file 
is the aggregated I/O bandwidth to the objects in a file, which can be as much 
as a bandwidth of up to 2000 servers. On systems with more than 2000 OSTs, 
clients can do I/O using multiple files to utilize the full file system 
bandwidth.

I think PNNL once tested up to 4000 OSTs, and I think the compile-time limit 
is/was 8000 OSTs (maybe it was made dynamic, I don't recall offhand), but the 
current code could _probably_ handle up to 65000 OSTs without significant 
problems.  Beyond that, there is the 16-bit OST index limit in the filesystem 
device labels and the __u16 lov_user_md_v1->lmm_stripe_offset to specify the 
starting OST index for "lfs setstripe", but that could be overcome with some 
changes.

Given that OSTs are starting to approach 1PB with large drives and
declustered-parity RAID, this would get us in the range of 8-65EB, the upper
end of which is well over 2^64 bytes (16EB), so I don't think it is an
immediate concern.  Let me know if you have any trouble with a 9000-OST
filesystem... :-)
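
As a rough check of the capacity figures above, here is a small sketch. It
assumes binary units and exactly 1 PiB per OST (the message only says OSTs are
"approaching" 1PB), and the helper names are invented for illustration.

```python
# Capacity arithmetic for the OST counts mentioned above.
# ASSUMPTIONS: binary units (1 PiB = 2**50 bytes, 1 EiB = 2**60 bytes) and
# exactly 1 PiB per OST. Helper names are hypothetical.

PIB = 2 ** 50
EIB = 2 ** 60
MAX_U16_INDEXES = 2 ** 16     # distinct values a 16-bit OST index can hold


def capacity_eib(num_osts, ost_bytes=PIB):
    """Total filesystem capacity in EiB for num_osts equally sized OSTs."""
    return num_osts * ost_bytes / EIB


def osts_to_reach(total_bytes, ost_bytes=PIB):
    """Number of OSTs of ost_bytes each needed to reach total_bytes."""
    return total_bytes // ost_bytes


print(capacity_eib(8000))       # 7.8125
print(capacity_eib(65000))      # 63.4765625
print(osts_to_reach(2 ** 64))   # 16384
print(MAX_U16_INDEXES)          # 65536
```

Under these assumptions a filesystem would cross 2^64 bytes at about 16384
OSTs, well before the 16-bit OST index runs out.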

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud








Re: [lustre-discuss] limit on number of oss/ost's?

2018-10-11 Thread Patrick Farrell
The 160 limit has been raised.  I don't know what the new one is, but it is 
*quite* large.  I'm pretty sure it's beyond practical interest today.

There are a few issues with having extremely large numbers of OSTs, especially 
if you are explicitly trading off 1 vs many OSTs.

There are no particular scaling issues with the number of OSTs per OSS, so if
you took the same storage and subdivided it to create more OSTs, there's no
particular concern there.  But that assumes you're taking the same storage and
deciding how to subdivide it.  Obviously, a given amount of CPU/RAM/network on
the OSS can only "feed" so much storage, so if you're just *adding* storage,
you will quickly exhaust your OSS resources.  (Generally speaking one tries to
match the two, so one does not have too much CPU/RAM/network bandwidth OR too
much storage.)

The two main problems I see with "many" OSTs are:
1. They can get rather small, and so they can fill up relatively easily.  If 
your OSTs are really small and a few of the files assigned to that OST become 
large (so, they're assigned there when the OST is mostly empty, and then grow 
large), you'll run out of space on that OST and will no longer be able to write 
to files striped there.
2. As file stripe counts go up, the file layout - basically, the mapping from 
the byte range as seen in userspace to the actual objects on the OSTs - can 
become large enough that sending it around is a performance bottleneck.  
Opening a single file with hundreds of stripes from thousands of clients - like 
a large supercomputer center might do - can take a significant period of time.

The second is the only scaling issue with OST *count* that I'm aware of, other
than that there is a bit of memory overhead for tracking each OST - so 10 OSTs
instead of 1 OST will use marginally more memory on servers and clients.  This
is pretty small, though.
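
The layout-size concern in point 2 above can be sketched numerically. The byte
sizes below are assumptions for illustration only (a fixed header plus a fixed
per-stripe entry); they are not given in this thread, so check the headers of
your Lustre version before relying on them.

```python
# Rough estimate of how layout size grows with stripe count.
# ASSUMPTION: a plain (non-composite) layout of about 32 bytes of header
# plus 24 bytes per stripe. These sizes are illustrative, not from the
# thread, and the function names are hypothetical.

LAYOUT_HEADER_BYTES = 32   # assumed fixed header size
BYTES_PER_STRIPE = 24      # assumed per-stripe entry size


def layout_bytes(stripe_count):
    """Approximate size of one file layout, in bytes."""
    return LAYOUT_HEADER_BYTES + BYTES_PER_STRIPE * stripe_count


def open_traffic_bytes(stripe_count, client_count):
    """Approximate layout bytes sent if every client opens the same file."""
    return layout_bytes(stripe_count) * client_count


for stripes in (1, 160, 2000):
    total = open_traffic_bytes(stripes, client_count=10_000)
    print(f"{stripes:5d} stripes: {layout_bytes(stripes):6d} B/layout, "
          f"{total / 1e6:8.2f} MB for 10000 opens")
```

The point is only the linear growth: under these assumed sizes the layout for
a widely striped file is hundreds of times larger than for a single-stripe
file, and that cost is paid again by every client that opens it.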

So in general, I would say you'd be happier with fewer & faster, rather than 
more & slower, especially when talking about very large OST counts.  There are 
some performance issues with multiple writers to single files with low stripe 
counts, so it doesn't hold in extremis.  This is all to say you'd be much 
better served with 10 OSTs than with *1*, but 100 is probably not a better idea 
than 10.

- Patrick

On 10/11/18, 1:07 PM, "lustre-discuss on behalf of Michael Di Domenico" wrote:

> Is there a limit on the number of oss servers you can have in a single
> filesystem?  is there one for ost's?
>
> I'm curious of the performance implications between two different
> configurations (this is just theory mind you)...
>
> 1000 oss with 1 ost each
> vs
> 100 oss with 10 ost each
>
> one could scale this up further 2000, 5000, 1 oss's with 1 single ost each
>
> i did note two references one from 2012 by Oleg Drokin that tested
> 1300 OST's at ornl which "mostly worked" and a note from Andreas last
> year that quoted 2000 OST's before scaling issues.
>
> I'm curious if there's a fundamental issue with scaling lustre, which
> might be based on the presumption that oss's are typically fatter
> nodes (getting fatter everyday) rather than large quantities of skinny
> nodes
>
> I'm also curious if there is still a 160 OST limit for file striping as well.