On 28/07/10 13:07, Emmanuel Lecharny wrote:
On 7/28/10 11:31 AM, Stefan Seelmann wrote:
I was thinking lately about the DN class. I know that OpenDS (and probably
UnboundID, but not sure) has a DN.valueOf( "<a DN>" ) factory that returns a
DN, potentially leveraging a cache associated with a ThreadLocal.
...
I don't think it's such a good idea:
- first, as it's ThreadLocal based, you will have as many caches as you have
threads processing requests. Not sure it competes with a single shared cache,
and not sure either that we can't use the memory in a better way...
An advantage of using ThreadLocal is that you don't need to synchronize
access to the cache. It could be worth measuring the performance.
Using ConcurrentHashMap should not be a major performance penalty. I
mean, it *will* be more costly than not having any synchronization, but
it sounds acceptable.
Unfortunately a CHM won't help either, since you need to manage cache
eviction, assuming that you want the cache to have a finite size.
LinkedHashMap has an eviction strategy which can be defined by
overriding the removeEldestEntry method, but unfortunately LHM is not
thread safe.
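For illustration, a minimal sketch of that combination: a bounded LRU cache
built on LinkedHashMap's removeEldestEntry hook, made thread safe by wrapping
it with Collections.synchronizedMap. The DN type, parse method, and capacity
below are placeholders, not the actual OpenDS classes:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Placeholder DN type, only here to keep the sketch self-contained.
final class DN {
    final String value;
    DN(String value) { this.value = value; }
    static DN parse(String s) { return new DN(s); } // stands in for real parsing
}

// Bounded LRU cache: removeEldestEntry handles eviction, and
// Collections.synchronizedMap supplies the (single-lock) thread safety.
final class BoundedDnCache {
    private static final int MAX_SIZE = 1024; // illustrative capacity

    private final Map<String, DN> cache = Collections.synchronizedMap(
        new LinkedHashMap<String, DN>(16, 0.75f, true /* access order */) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, DN> eldest) {
                return size() > MAX_SIZE; // evict the least-recently-used entry
            }
        });

    DN valueOf(String dnString) {
        DN dn = cache.get(dnString);
        if (dn == null) {
            dn = DN.parse(dnString);  // parse on a cache miss
            cache.put(dnString, dn);  // a racing thread may also parse; harmless
        }
        return dn;
    }
}

Note the single lock: every get and put goes through the same monitor, which is
exactly the contention problem being discussed here.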
Another possibility is to use a CopyOnWriteArraySet, but I'm afraid
that it will crawl if many new DNs are added.
Yes - this is not an appropriate collection to use.
[...] difference, I wonder if the OpenDS team did some performance analysis?
I did some testing some time back, though I have forgotten the exact figures
I got. I do remember finding a substantial performance improvement when
parsing DNs with caching enabled: something like 30ns with caching vs 300ns
without for DNs containing 4 RDN components (i.e. about an order of
magnitude, IIRC).
We implement our DNs using a recursive RDN + parent DN structure, so we
are usually able to fast-track the decoding process to just a single RDN
for DNs having a common ancestor (a pretty common case).
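To make that concrete, a rough sketch of the structure (class names are
illustrative, not the actual OpenDS SDK types):

// A DN is an RDN plus a reference to its parent DN.
final class Rdn {
    final String value; // e.g. "dc=example"
    Rdn(String value) { this.value = value; }
}

final class Dn {
    final Rdn rdn;   // leftmost component
    final Dn parent; // null for the root DN
    Dn(Rdn rdn, Dn parent) { this.rdn = rdn; this.parent = parent; }
}

// Decoding "ou=people,dc=example,dc=com" when "dc=example,dc=com" is
// already cached only requires parsing the single leading RDN:
//   Dn parent = cache.get("dc=example,dc=com");
//   Dn dn = new Dn(new Rdn("ou=people"), parent);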
We opted for the ThreadLocal approach due to the synchronization
limitations of using a single global cache. However, I have often
worried about this approach as it will not scale for applications having
large numbers of threads, resulting in OOM exceptions.
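For comparison, here is roughly what the per-thread variant looks like (a
hypothetical sketch; sizes and names are illustrative). Each thread owns a
small bounded map, so lookups need no locking, but total memory grows with
the thread count, which is exactly the scaling worry above:

import java.util.LinkedHashMap;
import java.util.Map;

final class ThreadLocalDnCache {
    private static final int MAX_SIZE = 256; // per-thread capacity

    private static final ThreadLocal<Map<String, String>> CACHE =
        new ThreadLocal<Map<String, String>>() {
            @Override
            protected Map<String, String> initialValue() {
                return new LinkedHashMap<String, String>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                        return size() > MAX_SIZE; // per-thread LRU eviction
                    }
                };
            }
        };

    static String getNormalized(String dn) {
        Map<String, String> cache = CACHE.get(); // thread-confined: no locking
        String normalized = cache.get(dn);
        if (normalized == null) {
            normalized = dn.toLowerCase(); // stand-in for real DN normalization
            cache.put(dn, normalized);
        }
        return normalized;
    }
}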
Another approach I have thought about is to use a single global
two-level cache comprising a fixed-size array of LinkedHashMaps
(think of it as a Map of Maps), each one having its own synchronization.
We then distribute the DNs over the LHMs and amortize the
synchronization costs across multiple locks (in a similar manner to CHM).
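Something along these lines (a hypothetical sketch, assuming per-segment
monitors and LRU eviction; the segment count and sizing are illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

// Fixed array of bounded LinkedHashMaps, each guarded by its own monitor.
// DNs are spread across segments by hash, amortizing contention in the
// spirit of ConcurrentHashMap's lock striping.
final class StripedDnCache {
    private static final int SEGMENTS = 16;         // number of locks/maps
    private static final int MAX_PER_SEGMENT = 256; // eviction threshold

    @SuppressWarnings("unchecked")
    private final Map<String, String>[] segments = new Map[SEGMENTS];

    StripedDnCache() {
        for (int i = 0; i < SEGMENTS; i++) {
            segments[i] = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                    return size() > MAX_PER_SEGMENT; // per-segment LRU eviction
                }
            };
        }
    }

    String getNormalized(String dn) {
        // pick a segment by hash so different DNs contend on different locks
        Map<String, String> seg = segments[(dn.hashCode() & 0x7fffffff) % SEGMENTS];
        synchronized (seg) {
            String normalized = seg.get(dn);
            if (normalized == null) {
                normalized = dn.toLowerCase(); // stand-in for real normalization
                seg.put(dn, normalized);
            }
            return normalized;
        }
    }
}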
This idea needs testing. In particular, we'd need to figure out the
optimal array size (i.e. number of locks / LHMs). For example,
distributing the cache over 16 LHMs is not going to help much for really
big multi-threaded apps containing 16000+ threads (1000 threads
contending per lock).
A major problem with this approach if we choose to use it in the OpenDS
SDK is that common ancestor DNs (e.g. "dc=com") are going to end up in a
single LHM so, using our current design (RDN + parent DN), all decoding
attempts will usually end up contending on the same lock anyway :-( So
we may need to change our DN implementation to better cope with this
caching strategy.
We are not alone though: a concurrent Map implementation which can be
used for caching in a similar manner to LHM is one of the most
frequently demanded enhancements to the java.util.concurrent library.
They compared the performance they get with a ThreadLocal cache and with
no cache: the gain was significant (I don't have the numbers for OpenDS).
FYI, DN parsing accounts for more or less 13% of the total CPU needed
internally (network excluded) to process a simple search, and
normalization costs an extra 10%. There is most certainly a net
potential gain in implementing a DN cache!
Agreed, DN parsing can be expensive, especially normalization.
Matt