Re: ThreadLocal in SegmentReader

robert engels Fri, 11 Jul 2008 06:18:29 -0700

As always, you still have the issue that if the object in the ThreadLocal has a reference to a native resource (e.g. a file handle), you might run out of file handles before any OOM triggers the GC (which would close the file handle, if you are relying on finalization).
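One way to avoid depending on GC for native resources is to keep track of every per-thread value and close them explicitly. The following is only a sketch under that assumption; `TrackedThreadLocal` is a hypothetical name and not a class in Lucene or the JDK.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: per-thread values that the owner can close explicitly. */
final class TrackedThreadLocal<T extends Closeable> implements Closeable {
    private final Supplier<T> factory;
    private final List<T> created = new ArrayList<>(); // guarded by "this"
    private final ThreadLocal<T> local = new ThreadLocal<>();

    TrackedThreadLocal(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's value, creating and recording it on first use. */
    T get() {
        T value = local.get();
        if (value == null) {
            value = factory.get();
            synchronized (this) {
                created.add(value);
            }
            local.set(value);
        }
        return value;
    }

    /** Closes every value created so far, regardless of which thread created it. */
    @Override
    public synchronized void close() throws IOException {
        for (T value : created) {
            value.close();
        }
        created.clear();
        local.remove(); // clears only the calling thread's slot
    }
}
```

The trade-off is that the list holds strong references, so values stay reachable until close() is called, even if their thread has died.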


On Jul 11, 2008, at 4:54 AM, Michael McCandless wrote:

OK, I created a simple test to test this (attached). The test just runs 10 threads, each one creating a 100 KB byte array which is stored into a ThreadLocal, and then periodically the ThreadLocal is replaced with a new one. This is to test whether GC of a ThreadLocal, even though the thread is still alive, in fact leads to GC of the objects held in the ThreadLocal.
Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects are in fact properly collected.
So this is not a leak but rather a "delayed collection" issue. Java's GC is never guaranteed to be immediate, and apparently when using ThreadLocals it's even less immediate than "normal". In the original issue, if other things create ThreadLocals, then eventually Lucene's unreferenced ThreadLocals would be properly collected.
So I think we continue to use non-static ThreadLocals in Lucene...

Mike
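For readers without the attachment, here is a minimal sketch of a test along the lines described above. It is not Mike's ThreadTest.java; the class name `ThreadLocalChurn`, the `Holder` wrapper and the iteration counts are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ThreadLocalChurn {
    static final int THREADS = 10;
    static final int ARRAY_BYTES = 100 * 1024;

    /** Owns an instance (non-static) ThreadLocal, as SegmentReader does. */
    static final class Holder {
        final ThreadLocal<byte[]> local = new ThreadLocal<>();
    }

    /** Runs the churn and returns how many ThreadLocals were created in total. */
    static long run(int iterationsPerThread, int replaceEvery) throws InterruptedException {
        AtomicLong created = new AtomicLong();
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                Holder holder = null;
                for (int i = 0; i < iterationsPerThread; i++) {
                    if (i % replaceEvery == 0) {
                        // Dropping the old holder leaves its ThreadLocal unreferenced
                        // while this thread is still alive.
                        holder = new Holder();
                        created.incrementAndGet();
                    }
                    holder.local.set(new byte[ARRAY_BYTES]);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long created = run(100_000, 10);
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("ThreadLocals created: " + created + ", heap in use (MB): " + usedMb);
    }
}
```

Running this with a small maximum heap (for example with -Xmx) is one way to see whether the values behind unreferenced ThreadLocals are eventually collected; if they were never collected, the run would be expected to end in an OutOfMemoryError.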

<ThreadTest.java>


robert engels wrote:
Once again, these are "static" thread locals. A completely different issue. Since the object is available statically, the weak reference cannot be cleared so stale entries will never be cleared as long as the thread is alive.
On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
Just a few examples of "problems" using ThreadLocals.

http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
http://www.theserverside.com/news/thread.tss?thread_id=41473
Once again, I'm not pointing to Lucene SegmentReader as a "bad" implementation, and maybe the current "problems" of ThreadLocals are not a problem for SegmentReader but it seems safer to use ThreadLocals to pass context information which is cleared when the call exits instead of storing long-lived objects.
robert engels wrote:
Aside from the pre-1.5 thread local "perceived leak", there are no issues with ThreadLocals if used properly.
There is no need for try/finally blocks unless you MUST release resources immediately. Usually this is not the case, which is why a ThreadLocal is used in the first place.
From the ThreadLocalMap javadoc...

    /**
     * ThreadLocalMap is a customized hash map suitable only for
     * maintaining thread local values. No operations are exported
     * outside of the ThreadLocal class. The class is package private to
     * allow declaration of fields in class Thread. To help deal with
     * very large and long-lived usages, the hash table entries use
     * WeakReferences for keys. However, since reference queues are not
     * used, stale entries are guaranteed to be removed only when
     * the table starts running out of space.
     */

        /**
         * Heuristically scan some cells looking for stale entries.
         * This is invoked when either a new element is added, or
         * another stale one has been expunged. It performs a
         * logarithmic number of scans, as a balance between no
         * scanning (fast but retains garbage) and a number of scans
         * proportional to number of elements, that would find all
         * garbage but would cause some insertions to take O(n) time.
         *
         * @param i a position known NOT to hold a stale entry. The
         * scan starts at the element after i.
         *
         * @param n scan control: <tt>log2(n)</tt> cells are scanned,
         * unless a stale entry one is found, in which case
         * <tt>log2(table.length)-1</tt> additional cells are scanned.
         * When called from insertions, this parameter is the number
         * of elements, but when from replaceStaleEntry, it is the
         * table length. (Note: all this could be changed to be either
         * more or less aggressive by weighting n instead of just
         * using straight log n. But this version is simple, fast, and
         * seems to work well.)
         *
         * @return true if any stale entries have been removed.
         */
The instance ThreadLocals (and what they refer to) will be GC'd when the containing Object is GC'd.
There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to an object that has native resources (e.g. file handles), it may not be released until other thread locals are created by the thread (or the thread terminates).
You can avoid this "delay" by calling remove(), but in most applications it should never be necessary - unless a very strange usage...
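For the case where a value must be released immediately, a sketch of calling remove() in a finally block follows. `RequestContext`, `runAs` and `currentUser` are hypothetical names used only for illustration; ThreadLocal.remove() itself is available from Java 5 onward.

```java
/** Hypothetical context holder that is cleared before the call returns. */
final class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Returns the user bound to the calling thread, or null if none is bound. */
    static String currentUser() {
        return CURRENT_USER.get();
    }

    /** Binds the user for the duration of work, then removes the binding. */
    static void runAs(String user, Runnable work) {
        CURRENT_USER.set(user);
        try {
            work.run();
        } finally {
            // Removes this thread's entry right away instead of waiting for
            // the map's own stale-entry cleanup.
            CURRENT_USER.remove();
        }
    }
}
```

This fits short-lived context passing. It does not fit a cached per-thread reader, where clearing on every call would defeat the purpose of the cache.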
On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
From what I know, storing objects in ThreadLocal is safe as long as you release the object within a try {} finally {} block or store objects which are independent of the rest of the code (no dependencies). Otherwise it can get pretty tricky (memory leaks, classloader problems) after a while.
It is pretty convenient to pass HTTP request information with a ThreadLocal in a servlet (but you should clean up the variable before leaving the servlet) but I'm not sure how safe it is in this case.
robert engels wrote:
Using synchronization is a poor/invalid substitute for thread locals in many cases.
The point of the thread local in these referenced cases is to allow streaming reads on a file descriptor. If you use a shared file descriptor/buffer you are going to continually invalidate the buffer.
On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
Well ... SegmentReader uses ThreadLocal to hold a thread-private instance of TermVectorsReader, to avoid synchronizing per-document when loading term vectors.
Clearing this ThreadLocal value per call to SegmentReader's methods that load term vectors would defeat its purpose.
Though, of course, we then synchronize on the underlying file (when using FSDirectory), so perhaps we are really not saving much by using ThreadLocal here. But we are looking to relax that low level synchronization with LUCENE-753.
Maybe we could make our own ThreadLocal that just uses a HashMap, which we'd have to synchronize on when getting the per-thread instances. Or, go back to sharing a single TermVectorsReader and synchronize per-document.
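The HashMap idea could look roughly like the sketch below. `MapBackedThreadLocal` is a hypothetical name, and this is one possible reading of the suggestion rather than anything that exists in Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical ThreadLocal replacement backed by a synchronized HashMap. */
final class MapBackedThreadLocal<T> {
    private final Map<Thread, T> values = new HashMap<>(); // guarded by "this"
    private final Supplier<T> factory;

    MapBackedThreadLocal(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's instance, creating it on first use. */
    synchronized T get() {
        Thread current = Thread.currentThread();
        T value = values.get(current);
        if (value == null) {
            value = factory.get();
            values.put(current, value);
        }
        return value;
    }

    /** Drops all per-thread instances, e.g. when the owning reader is closed. */
    synchronized void clear() {
        values.clear();
    }

    synchronized int size() {
        return values.size();
    }
}
```

The synchronization cost is paid once per get() rather than per document, and the owner can clear everything deterministically. The map holds strong references to Thread keys, so entries for finished threads stay until clear() is called.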
Jason has suggested moving to a model where you ask the IndexReader for an object that can return term vectors / stored fields / etc, and then you interact with that many times to retrieve each doc. We could then synchronize only on retrieving that object, and provide a thread-private instance.
It seems like we should move away from using ThreadLocal in Lucene and do "normal" synchronization instead.
Mike

Adrian Tarau wrote:
Usually ThreadLocal.remove() should be called at the end (in a finally block), before the current call leaves your code.
Ex: if during searching ThreadLocal is used, every search(..) method should clean up any ThreadLocal variables, or even deeper in the implementation. When the call leaves Lucene any used ThreadLocal should be cleaned up.
Michael McCandless wrote:
ThreadLocal, which we use in several places in Lucene, causes a leak in app servers because the classloader never fully deallocates Lucene's classes because the ThreadLocal is holding strong references.
Yet, ThreadLocal is very convenient for avoiding synchronization.
Does anyone have any ideas on how to solve this w/o falling back to "normal" synchronization?
Mike

Begin forwarded message:
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Date: July 7, 2008 3:30:28 PM EDT
To: [EMAIL PROTECTED]
Subject: Re: ThreadLocal in SegmentReader
Reply-To: [EMAIL PROTECTED]

On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
So now I'm confused: the SegmentReader itself should no longer be reachable, assuming you are not holding any references to your IndexReader.
Which means the ThreadLocal instance should no longer be reachable.
It will still be referenced from the Thread(s) ThreadLocalMap
The key (the ThreadLocal) will be weakly referenced, but the values (now stale) are strongly referenced and won't be actually removed
until the table is resized (under the Java6 impl at least).
Nice huh?

-Yonik
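The weak-key / strong-value arrangement described above can be modelled in a few lines. This is a simplified sketch, not the JDK source: `StaleEntryDemo` and `expungeStale` are hypothetical names, the `Entry` class only mirrors the shape of the JDK's internal entry (a WeakReference to the key plus a strong value field), and the full-table scan stands in for the JDK's heuristic partial scan.

```java
import java.lang.ref.WeakReference;

final class StaleEntryDemo {
    /** Weak reference to the key, strong reference to the value. */
    static final class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }

        /** An entry is stale once its key has been cleared. */
        boolean isStale() {
            return get() == null;
        }
    }

    /** Drops stale entries so their values become collectable; returns the count. */
    static int expungeStale(Entry[] table) {
        int removed = 0;
        for (int i = 0; i < table.length; i++) {
            Entry e = table[i];
            if (e != null && e.isStale()) {
                e.value = null;
                table[i] = null;
                removed++;
            }
        }
        return removed;
    }
}
```

Until something like expungeStale runs, the table still strongly references the value of an entry whose key is gone, which is the "delayed collection" effect discussed in this thread.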
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-[EMAIL PROTECTED]
For additional commands, e-mail: java-user-[EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: java-dev-[EMAIL PROTECTED]