Re: [lustre-discuss] Limit to number of OSS?
On Oct 10, 2019, at 11:20, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> On Mon, Oct 7, 2019 at 6:33 PM Andreas Dilger <adil...@whamcloud.com> wrote:
>> With socklnd there are 3 TCP connections per client-server pair.
>> For IB there is no such connection limit that I'm aware of.
>
> just out of morbid curiosity, can you very briefly explain the connectivity differences between TCP/IB. Does IB use the same 3 connections as TCP? If not, is that why the connectivity limit doesn't exist with IB or is there some other overriding design principle in IB that allows lustre to push past TCP?
>
> Not that any of this has any relevance to anything i do, i'm just curious. i'd love to have 2000 OSS's and 20k clients, but sadly i do not... :(

This is a fundamental difference between TCP and IB. TCP needs a persistent connection between peers (socket) to manage state, and the (very ancient) IP protocol on which TCP is built has a limit of 65536 connections on a single node. When computers had 1-2MB of RAM that was more than enough...

IB does not have this limitation, though it does consume some memory for each peer that it is communicating with. o2iblnd can establish multiple connections to a single peer to get better bandwidth, and this is important for OPA performance, but is not critical for IB networks.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Limit to number of OSS?
On Mon, Oct 7, 2019 at 6:33 PM Andreas Dilger wrote:
>
> With socklnd there are 3 TCP connections per client-server pair.
> For IB there is no such connection limit that I'm aware of.

just out of morbid curiosity, can you very briefly explain the connectivity differences between TCP/IB. Does IB use the same 3 connections as TCP? If not, is that why the connectivity limit doesn't exist with IB or is there some other overriding design principle in IB that allows lustre to push past TCP?

Not that any of this has any relevance to anything i do, i'm just curious. i'd love to have 2000 OSS's and 20k clients, but sadly i do not... :(
Re: [lustre-discuss] Limit to number of OSS?
Whether there are problems with a large number of OSS and/or MDS nodes depends on whether you are using TCP or IB networking.

With socklnd there are 3 TCP connections per client-server pair (bulk read, bulk write, and small message), so the maximum you could have would be around (65536 - 1024)/3 = 21500 (or likely fewer) clients or servers, unless you also configured LNet routers in between (which would allow more clients, but not more servers). That isn't a limitation for most deployments, but it is at least one known limitation.

For IB there is no such connection limit that I'm aware of.

There are likely other factors such as memory consumption per target, but I don't think that would be the first thing to cause problems on modern systems with hundreds of GB of RAM.

Cheers, Andreas

On Oct 4, 2019, at 01:45, Degremont, Aurelien <degre...@amazon.com> wrote:
> Thanks for this info. But actually I was really looking at the number of OSSes, not OSTs :)
> This is really more about how Lustre client nodes and the MDT will cope with a very large number of OSSes.

--
Andreas Dilger
Principal Lustre Architect
Whamcloud
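The socklnd arithmetic in the message above can be sketched in a few lines of Python. This is only an illustration of the back-of-the-envelope estimate, not anything taken from the Lustre or LNet source: the function names are hypothetical, and the constants (65536 ports, 1024 set aside, 3 connections per peer) are the figures quoted in the message.

```python
# Back-of-the-envelope sketch of the socklnd peer limit described above.
# All names are hypothetical; the constants come from the message text.

TCP_PORTS = 65536        # size of the 16-bit port space on one address
RESERVED_PORTS = 1024    # low ports subtracted in the message's estimate
CONNS_PER_PEER = 3       # bulk read, bulk write, small message


def max_socklnd_peers(ports=TCP_PORTS, reserved=RESERVED_PORTS,
                      conns_per_peer=CONNS_PER_PEER):
    """Upper bound on peers one node can hold connections to."""
    return (ports - reserved) // conns_per_peer


def connections_needed(peers, conns_per_peer=CONNS_PER_PEER):
    """TCP connections one node needs to talk to `peers` peers."""
    return peers * conns_per_peer


if __name__ == "__main__":
    print("max peers per node:", max_socklnd_peers())
    # The "2000 OSS's and 20k clients" case raised elsewhere in the thread:
    # each server would need this many connections for 20000 clients.
    print("connections for 20000 clients:", connections_needed(20000))
```

Under these assumptions the bound works out to 21504 peers, which matches the "around 21500" in the message, and a server facing 20000 clients would need 60000 connections, just under the 64512 available.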
Re: [lustre-discuss] Limit to number of OSS?
Thanks for this info. But actually I was really looking at the number of OSSes, not OSTs :)
This is really more about how Lustre client nodes and the MDT will cope with a very large number of OSSes.

From: Andreas Dilger
Date: Friday, October 4, 2019 at 04:54
To: "Degremont, Aurelien"
Cc: "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] Limit to number of OSS?

> On Oct 3, 2019, at 07:55, Degremont, Aurelien <degre...@amazon.com> wrote:
>> Hello all!
>>
>> This doc from the wiki says "Lustre can support up to 2000 OSS per file system" (http://wiki.lustre.org/Lustre_Server_Requirements_Guidelines).
>>
>> I'm a bit surprised by this statement. Does somebody have information about the upper limit to the number of OSSes? Or what could be the scaling limiting factor for this number of OSS? Network limit? Memory consumption? Other?
>
> That's likely a combination of a bit of confusion and a bit of safety on the part of Intel writing that document. The Lustre Operations Manual writes:
>
>     Although a single file can only be striped over 2000 objects, Lustre file systems can have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000 servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to utilize the full file system bandwidth.
>
> I think PNNL once tested up to 4000 OSTs, and I think the compile-time limit is/was 8000 OSTs (maybe it was made dynamic, I don't recall offhand), but the current code could _probably_ handle up to 65000 OSTs without significant problems. Beyond that, there is the 16-bit OST index limit in the filesystem device labels and the __u16 lov_user_md_v1->lmm_stripe_offset to specify the starting OST index for "lfs setstripe", but that could be overcome with some changes.
>
> Given OSTs are starting to approach 1PB with large drives and declustered-parity RAID, this would get us in the range 8-65EB, which is over 2^64 bytes (16EB), so I don't think it is an immediate concern. Let me know if you have any trouble with a 9000-OST filesystem... :-)
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
Re: [lustre-discuss] Limit to number of OSS?
On Oct 3, 2019, at 07:55, Degremont, Aurelien <degre...@amazon.com> wrote:
> Hello all!
>
> This doc from the wiki says "Lustre can support up to 2000 OSS per file system" (http://wiki.lustre.org/Lustre_Server_Requirements_Guidelines).
>
> I'm a bit surprised by this statement. Does somebody have information about the upper limit to the number of OSSes? Or what could be the scaling limiting factor for this number of OSS? Network limit? Memory consumption? Other?

That's likely a combination of a bit of confusion and a bit of safety on the part of Intel writing that document. The Lustre Operations Manual writes:

    Although a single file can only be striped over 2000 objects, Lustre file systems can have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000 servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to utilize the full file system bandwidth.

I think PNNL once tested up to 4000 OSTs, and I think the compile-time limit is/was 8000 OSTs (maybe it was made dynamic, I don't recall offhand), but the current code could _probably_ handle up to 65000 OSTs without significant problems. Beyond that, there is the 16-bit OST index limit in the filesystem device labels and the __u16 lov_user_md_v1->lmm_stripe_offset to specify the starting OST index for "lfs setstripe", but that could be overcome with some changes.

Given OSTs are starting to approach 1PB with large drives and declustered-parity RAID, this would get us in the range 8-65EB, which is over 2^64 bytes (16EB), so I don't think it is an immediate concern. Let me know if you have any trouble with a 9000-OST filesystem... :-)

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
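The capacity estimate in the message above can be checked with a short Python sketch. The helper names are hypothetical, and it assumes decimal units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) for the per-OST size; the "16EB" figure for 2^64 bytes is the binary reading (16 EiB), which is roughly 18.4 EB in decimal units.

```python
# Rough check of the OST-count and capacity figures quoted above.
# Helper names are hypothetical; unit choices are assumptions.

PB = 10**15                      # decimal petabyte, assumed OST size unit
EB = 10**18                      # decimal exabyte
OST_INDEX_SPACE = 2**16          # values representable in a 16-bit index
BYTES_64BIT = 2**64              # the 2^64-byte mark mentioned in the message


def total_capacity_eb(num_osts, ost_size_bytes=PB):
    """Aggregate capacity in decimal EB for `num_osts` equal-sized OSTs."""
    return num_osts * ost_size_bytes / EB


def osts_to_exceed(limit_bytes, ost_size_bytes=PB):
    """Smallest OST count whose total capacity reaches `limit_bytes`."""
    return -(-limit_bytes // ost_size_bytes)  # ceiling division


if __name__ == "__main__":
    for count in (8000, 65000):
        print(count, "OSTs of 1 PB:", total_capacity_eb(count), "EB")
    print("2^64 bytes in EiB:", BYTES_64BIT // 2**60)
    print("1 PB OSTs needed to reach 2^64 bytes:", osts_to_exceed(BYTES_64BIT))
    print("16-bit OST index space:", OST_INDEX_SPACE)
```

With 1 PB OSTs, the 2^64-byte mark is crossed at about 18447 OSTs, which is well inside the 16-bit index space of 65536, consistent with the message's point that the index width is not the first limit to worry about.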
Re: [lustre-discuss] limit on number of oss/ost's?
The 160 limit has been raised. I don't know what the new one is, but it is *quite* large. I'm pretty sure it's beyond practical interest today.

There are a few issues with having extremely large numbers of OSTs, especially if you are explicitly trading off 1 vs many OSTs.

There are no particular scaling issues with the number of OSTs on an OSS, so if you took the same storage and subdivided it to create more OSTs, there's no particular concern there. But that assumes you're taking the same storage and deciding how to subdivide it. Obviously, a given amount of CPU/RAM/network on the OSS can only "feed" so much storage, so if you're just *adding* storage, you will quickly exhaust your OSS resources. (Generally speaking one tries to match the two, so one does not have too much CPU/RAM/network bandwidth OR too much storage.)

The two main problems I see with "many" OSTs are:

1. They can get rather small, and so they can fill up relatively easily. If your OSTs are really small and a few of the files assigned to that OST become large (so, they're assigned there when the OST is mostly empty, and then grow large), you'll run out of space on that OST and will no longer be able to write to files striped there.

2. As file stripe counts go up, the file layout - basically, the mapping from the byte range as seen in userspace to the actual objects on the OSTs - can become large enough that sending it around is a performance bottleneck. Opening a single file with hundreds of stripes from thousands of clients - like a large supercomputer center might do - can take a significant period of time.

That second one is the only scaling issue with OST *count* that I'm aware of, other than that there is a bit of memory overhead for tracking each OST - so 10 OSTs instead of 1 OST will use marginally more memory on servers and clients. This is pretty small, though.

So in general, I would say you'd be happier with fewer & faster, rather than more & slower, especially when talking about very large OST counts. There are some performance issues with multiple writers to single files with low stripe counts, so it doesn't hold in extremis.

This is all to say you'd be much better served with 10 OSTs than with *1*, but 100 is probably not a better idea than 10.

- Patrick

On 10/11/18, 1:07 PM, "lustre-discuss on behalf of Michael Di Domenico" wrote:

> Is there a limit on the number of oss servers you can have in a single filesystem? is there one for ost's?
>
> I'm curious of the performance implications between two different configurations (this is just theory mind you)...
>
> 1000 oss with 1 ost each
> vs
> 100 oss with 10 ost each
>
> one could scale this up further 2000, 5000, 1 oss's with 1 single ost each
>
> i did note two references one from 2012 by Oleg Drokin that tested 1300 OST's at ornl which "mostly worked" and a note from Andreas last year that quoted 2000 OST's before scaling issues.
>
> I'm curious if there's a fundamental issue with scaling lustre, which might be based on the presumption that oss's are typically fatter nodes (getting fatter everyday) rather than large quantities of skinny nodes
>
> I'm also curious if there is still a 160 OST limit for file striping as well.
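To get a feel for the second problem in Patrick's reply (layout size growing with stripe count), here is a minimal Python sketch. It is only an illustration: the thread gives no byte sizes, so the fixed header size and per-stripe entry size below are assumed placeholder values, the function names are hypothetical, and real layouts may differ by Lustre version and layout type.

```python
# Illustrative model of layout traffic when many clients open one
# widely striped file. Sizes are ASSUMED for illustration only; they
# are not stated in the thread.

LAYOUT_HEADER_BYTES = 32   # assumed fixed part of a file layout
BYTES_PER_STRIPE = 24      # assumed per-stripe (per-object) entry size


def layout_bytes(stripe_count):
    """Approximate size of one file's layout for a given stripe count."""
    return LAYOUT_HEADER_BYTES + BYTES_PER_STRIPE * stripe_count


def open_storm_bytes(stripe_count, num_clients):
    """Total layout data sent if every client opens the same file once."""
    return layout_bytes(stripe_count) * num_clients


if __name__ == "__main__":
    clients = 5000
    for stripes in (1, 10, 100, 1000):
        total = open_storm_bytes(stripes, clients)
        print(f"{stripes:>5} stripes: {layout_bytes(stripes):>6} B per open, "
              f"{total / 1e6:.2f} MB across {clients} clients")
```

The point of the model is only the shape of the growth: the per-open cost rises linearly with stripe count and is then multiplied by the number of clients opening the file, which is why "hundreds of stripes from thousands of clients" is the case that hurts.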