On Nov 16, 2006 11:29 +0800, Niu YaWei wrote:
> 1. Use case identifier: single client _CONNECTS_ to a very large number
> of OSS servers
Just to clarify, will there be a separate use case for many clients
connecting to a single OSS?
> e. lists, arrays, queues @runtime:
> - obd array obd_devs, the maximum device count is 8k, so
> the osc count must be less than 8k, I think it's enough.
There has already been a real test with almost 4000 OSTs, though nothing
in production with more than ~500 OSTs AFAIK. I would still consider this
a scalability issue, and a bug should be filed that blocks the
scalability-tracking bug.
Yu did a good job making the obd_devs[] _elements_ dynamically allocated,
so the work to make the array itself grow should be relatively small.
Don't do this with ever-increasing kmallocs, as that will eventually fail
too (likely because of memory fragmentation, even before hitting the 128kB
kmalloc limit, which is only 16k 8-byte pointers).
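
One way to grow the table without any large contiguous allocation is a
two-level table of fixed-size chunks. The following is only a user-space
sketch of that idea, not existing Lustre code: the names dev_table,
dev_table_set(), dev_table_get(), DEVS_PER_CHUNK and MAX_CHUNKS are all
hypothetical, and calloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes: 512 8-byte pointers fill one 4kB page, so no
 * single allocation is ever larger than a page. */
#define DEVS_PER_CHUNK 512
#define MAX_CHUNKS     512   /* 512 * 512 = 262144 device slots */

struct dev_table {
    void **chunks[MAX_CHUNKS];   /* chunks[c] is NULL until first use */
};

/* Store dev at index idx, allocating the chunk on demand.
 * Returns 0 on success, -1 if idx is out of range or allocation fails. */
int dev_table_set(struct dev_table *t, unsigned int idx, void *dev)
{
    unsigned int c = idx / DEVS_PER_CHUNK;

    if (c >= MAX_CHUNKS)
        return -1;
    if (t->chunks[c] == NULL) {
        t->chunks[c] = calloc(DEVS_PER_CHUNK, sizeof(void *));
        if (t->chunks[c] == NULL)
            return -1;
    }
    t->chunks[c][idx % DEVS_PER_CHUNK] = dev;
    return 0;
}

/* Look up index idx; returns NULL for unused or out-of-range slots. */
void *dev_table_get(const struct dev_table *t, unsigned int idx)
{
    unsigned int c = idx / DEVS_PER_CHUNK;

    if (c >= MAX_CHUNKS || t->chunks[c] == NULL)
        return NULL;
    return t->chunks[c][idx % DEVS_PER_CHUNK];
}

/* Release every chunk; the devices themselves are owned by the caller. */
void dev_table_free(struct dev_table *t)
{
    unsigned int c;

    for (c = 0; c < MAX_CHUNKS; c++) {
        free(t->chunks[c]);
        t->chunks[c] = NULL;
    }
}
```

Lookup stays O(1) (one divide and two dereferences), and existing chunks
never move when the table grows, so pointers into it remain valid.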
> - The time complexity of qos_add_tgt() is O(N), and it should
> only happen when MDS connect OSS, so no need to improve it.
Hmm, later you (correctly) say that qos lists are also set up on
clients... The good news is that the existing "UUID HASH" work being
done can help avoid much of the list walking here (at least the "find OSS"
part) by adding the OSS UUID to the hash for fast lookup, though not the
"insert OSS into sorted-by-num-OSTs list" part.
The latter can be fixed by changing the algorithm to avoid searching:
- for each #OST count N {1..MAX_OSTS_PER_OSS}, keep a pointer to one OSS
with N OSTs on it, and insert AFTER that OSS when adding/moving an OSS.
- when adding a single-OST OSS, insert it at the end of the list; it may
be set as the reference for N=1
- if the "reference" OSS for N is incremented, check .next for an OSS with
the same N and use that as the new "reference" for N, or a NULL pointer if
none is found. No list move is needed since the OSS is already at the
boundary, though it may become the new reference OSS for N+1.
- a MAX_OSTS_PER_OSS of 16 is safe (generally only 2 or 4 OSTs/OSS, and each
entry is only a pointer, so the array can be reallocated if this is exceeded)
This will keep the list sorting O(1), and benefits from the fact that we
know there will be a very small range of N (essentially implementing this
as a bunch of buckets instead of a list). It will need some special
handling to keep the references in sync when OSTs are removed from the OSS.
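
To make the reference-pointer scheme concrete, here is a rough user-space
sketch of the "add an OST to an OSS" path. It is my reading of the steps
above, not existing code: struct oss, struct oss_list, oss_add_ost() and
oss_list_is_sorted() are hypothetical names, the list is assumed to be
sorted with the most OSTs first (so single-OST OSSes sit at the end), and
ref[n] is assumed to be the first OSS in the run with n OSTs. Removal of
OSTs is not handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OSTS_PER_OSS 16

struct oss {
    struct oss *prev, *next;
    unsigned int nr_osts;            /* 0 means "not on the list yet" */
};

struct oss_list {
    struct oss head;                         /* sentinel, circular list */
    struct oss *ref[MAX_OSTS_PER_OSS + 1];   /* ref[n]: first OSS with n OSTs */
};

void oss_list_init(struct oss_list *l)
{
    unsigned int n;

    l->head.prev = l->head.next = &l->head;
    l->head.nr_osts = 0;
    for (n = 0; n <= MAX_OSTS_PER_OSS; n++)
        l->ref[n] = NULL;
}

static void link_after(struct oss *pos, struct oss *o)
{
    o->prev = pos;
    o->next = pos->next;
    pos->next->prev = o;
    pos->next = o;
}

static void unlink_oss(struct oss *o)
{
    o->prev->next = o->next;
    o->next->prev = o->prev;
}

/* Account one more OST on o, keeping the list sorted without any
 * searching.  Returns 0 on success, -1 if the ref[] array is too small. */
int oss_add_ost(struct oss_list *l, struct oss *o)
{
    unsigned int n = o->nr_osts;

    if (n >= MAX_OSTS_PER_OSS)
        return -1;
    if (n == 0) {
        /* New single-OST OSS: append at the end of the list. */
        link_after(l->head.prev, o);
    } else if (l->ref[n] == o) {
        /* o sits on the N / N+1 boundary, so it stays in place;
         * its successor takes over as reference for N, if any. */
        struct oss *nx = o->next;
        l->ref[n] = (nx != &l->head && nx->nr_osts == n) ? nx : NULL;
    } else {
        /* o is inside the run for N: move it next to the N+1 run,
         * or just in front of the N run if no N+1 run exists yet. */
        unlink_oss(o);
        if (l->ref[n + 1] != NULL)
            link_after(l->ref[n + 1], o);
        else
            link_after(l->ref[n]->prev, o);
    }
    o->nr_osts = n + 1;
    if (l->ref[n + 1] == NULL)
        l->ref[n + 1] = o;
    return 0;
}

/* Debug helper: 1 if OST counts never increase from head to tail. */
int oss_list_is_sorted(const struct oss_list *l)
{
    const struct oss *o;

    for (o = l->head.next; o != &l->head && o->next != &l->head; o = o->next)
        if (o->nr_osts < o->next->nr_osts)
            return 0;
    return 1;
}
```

Every path does a constant number of pointer updates, so the cost does not
depend on the number of OSSes. The caller would zero a struct oss before
its first oss_add_ost() call.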
Can you please file a separate bug on this, and have it block
scalability-tracking as well?
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel