Hi Niu,

This is a good review of connection scalability, but I have some questions.
I have now cc'd lustre-devel to get the discussion out in the open.

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Niu YaWei
> Sent: Wednesday, November 15, 2006 8:29 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: [Arch] scalability study: single client _CONNECTS_ to a
> very large number of OSS servers
> 

Review form:

> 1. Use case identifier:  single client _CONNECTS_ to a very 
> large number of OSS servers
> 
> 2. Link to architectural information: None
> 
> 3. HLD available: YES
> 
> 4. Patterns of basic operations:
>         a. RPCs:
>                 - One OST_CONNECT RPC for each OST.
>         b. fs/obd other methods:
>                 - obd_connect.
>         c. cache: None.
>         d. Lustre & Linux locks: No suspect locks.
>         e. lists, arrays, queues @runtime:
>                 - The obd array obd_devs: the maximum device
> count is 8k, so the OSC count must be less than 8k, which I
> think is enough.

Nope - we want this to be far more scalable than 8K OSCs.  Last week we
heard that Evan Felix ran with 4000 OSCs (getting a whopping 130GB/sec read
from Lustre!).

The array needs to go away.  I think Nathan is already working on this for
the load simulator btw.
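
For illustration, here's a rough userspace-style sketch (names are made up,
not the real obd_devs code) of a device table that grows on demand instead
of stopping at a compile-time cap:

/* Illustration only: grow the device table on demand instead of a
 * compile-time cap.  Names are stand-ins, not the real obd_devs code. */
#include <stdlib.h>

struct obd_device;                      /* opaque in this sketch */

struct obd_table {
        struct obd_device **devs;       /* grown with realloc()  */
        size_t              used;
        size_t              capacity;
};

/* Doubles the table when it fills, so there is no fixed 8k ceiling. */
static int obd_table_add(struct obd_table *t, struct obd_device *obd)
{
        if (t->used == t->capacity) {
                size_t ncap = t->capacity ? t->capacity * 2 : 64;
                struct obd_device **nd = realloc(t->devs, ncap * sizeof(*nd));

                if (nd == NULL)
                        return -1;      /* would be -ENOMEM in the kernel */
                t->devs = nd;
                t->capacity = ncap;
        }
        t->devs[t->used++] = obd;
        return 0;
}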

Hmm, I don't see a server-side consideration of this problem.  Am I missing
something?


>                 - qos_add_tgt() will search and maintain the
> lq_oss_list; this list grows as the number of OSSes grows.
>                 - Need to search for the connection in the
> imp_conn_list, but this list is quite small and will never grow.
>         f. startup data: None.
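
One note on 4(e): the lq_oss_list maintenance described above is a textbook
linear scan.  Roughly (hypothetical names, not the actual LOV code) it looks
like this:

/* Illustration only: the kind of linear search 4(e) describes.  Names
 * (qos_oss, lq_oss_list) are stand-ins, not the real LOV structures. */
#include <stddef.h>
#include <string.h>

struct qos_oss {
        struct qos_oss *next;           /* linkage on the OSS list */
        char            uuid[40];       /* UUID of this OSS        */
        int             ost_count;
};

/* Each call walks the whole list: O(N) per added target, O(N^2) when
 * N targets are added one after another at setup time. */
static struct qos_oss *qos_find_oss(struct qos_oss *lq_oss_list,
                                    const char *uuid)
{
        struct qos_oss *oss;

        for (oss = lq_oss_list; oss != NULL; oss = oss->next)
                if (strcmp(oss->uuid, uuid) == 0)
                        return oss;
        return NULL;
}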
> 
> 5. Scalable use pattern of basic operations:
>         - One client performs a mount.
>         - MDS setup.

Is connect also used against the management server?

> 6. Scalability measures:
>         - The number of OST_CONNECT RPCs is N (the OST count);
> since the RPCs are sent asynchronously, the connect phase runs
> in O(1) time.
>         - Unless we are going to build a cluster with more
> than 8k OSTs, we can't run out of obd_devs.
>         - The time complexity of qos_add_tgt() is O(N), and
> it should only happen when the MDS connects to the OSSes, so
>           there is no need to improve it.

On the server side, isn't there a scan of the list of existing connections
to see whether a connection's UUID is already in the list?  Isn't that list
O(N) long?  If so, N clients connecting makes the total cost O(N^2).

Eeb - can you confirm one more time that connection setup, which is likely
to happen at this point in LNET, has no linear scans?
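
To make the concern concrete, the pattern I'm worried about looks roughly
like this (hypothetical names, not the actual export code): every connect
walks the list of known clients, so N connects cost O(N^2) in total:

/* Illustration of the server-side pattern in question; names are
 * hypothetical, not the real obd_export code. */
#include <stddef.h>
#include <string.h>

struct client_export {
        struct client_export *next;
        char                  client_uuid[40];
};

/* One connect costs O(N); N clients connecting costs O(N^2) overall. */
static struct client_export *find_export(struct client_export *exports,
                                         const char *uuid)
{
        struct client_export *exp;

        for (exp = exports; exp != NULL; exp = exp->next)
                if (strcmp(exp->client_uuid, uuid) == 0)
                        return exp;     /* reconnect from a known client */
        return NULL;                    /* brand-new client */
}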

> 
> 7. Experiment description and findings:
>         - No test for it.

Nathan - will the load simulator do this?  I think it could even be used
over the net?
 
> 8. Recommendations for improvements:
>         - No recommendation on implementation improvements.

A. Kill the array (P2)
B. Fix the searching on the server (P1)
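
For B, one obvious shape (again with made-up names, only a sketch) is to key
the exports by client UUID in a hash table, so each connect is a near-O(1)
bucket lookup instead of a list walk:

/* Sketch of the direction B suggests (all names hypothetical). */
#include <stddef.h>
#include <string.h>

#define EXPORT_HASH_SIZE 1024

struct client_export {
        struct client_export *next;             /* bucket chain */
        char                  client_uuid[40];
};

struct export_hash {
        struct client_export *buckets[EXPORT_HASH_SIZE];
};

static unsigned int uuid_hash(const char *uuid)
{
        unsigned int h = 0;

        while (*uuid)
                h = h * 31u + (unsigned char)*uuid++;
        return h % EXPORT_HASH_SIZE;
}

/* Lookup touches one short bucket instead of the whole export list. */
static struct client_export *export_find(struct export_hash *eh,
                                         const char *uuid)
{
        struct client_export *exp;

        for (exp = eh->buckets[uuid_hash(uuid)]; exp != NULL; exp = exp->next)
                if (strcmp(exp->client_uuid, uuid) == 0)
                        return exp;
        return NULL;
}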

> 9. Non scalable issues encountered, not identified by this process:
>         - The qos lists are useless for the client's lov, but we
> have to set them up since the MDS and the client use
>           the same lov driver; this needless list maintenance
>           work burdens each client mount, and we should
>           avoid it.

Hmm.  But this will change in the future when the client has a full WB
cache, so let's leave them in.  Does the setup scale well?

- Peter -



_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
