[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2329:
----------------------------------------


Reopening -- this fix causes an intermittent deadlock in
TestStressIndexing2.

It's actually a pre-existing issue: if a flush happens only because
of deletions (i.e. no indexed docs) and you're using multiple
threads, some idle threads may never be notified to wake up and
continue indexing once the flush completes.

The fix here increased the chance of hitting that bug because the RAM
accounting has a bug of its own: it flushes too aggressively because
of deletions, i.e. rather than freeing up RAM that is allocated but
not used for indexing, it flushes.

I first fixed the deadlock case (we need to clear DocumentsWriter's
flushPending when we only flush deletes).
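
To illustrate the deadlock, here is a minimal sketch of a pending-flush
gate. FlushGate and all of its methods are hypothetical names invented
for this example; this is not DocumentsWriter's actual code, only the
general wait/notify pattern the fix is about. The clearOnDeletesOnly
parameter is an assumption added so both behaviors can be shown.

```java
// Hypothetical sketch; FlushGate is an illustrative name, not a Lucene class.
final class FlushGate {
    private boolean flushPending;

    /** Marks a flush as pending; indexing threads now block in waitUntilReady. */
    synchronized void requestFlush() {
        flushPending = true;
    }

    /**
     * Indexing threads call this before adding a document.
     * Returns false if the flag is still set when the timeout expires.
     */
    synchronized boolean waitUntilReady(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (flushPending) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // still blocked: without a timeout this is the deadlock
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /**
     * Called when a flush completes. With clearOnDeletesOnly == false the flag
     * is only cleared when documents were flushed, so a deletes-only flush
     * leaves waiting threads stuck. Passing true models the fixed behavior.
     */
    synchronized void finishFlush(boolean flushedDocs, boolean clearOnDeletesOnly) {
        if (flushedDocs || clearOnDeletesOnly) {
            flushPending = false;
            notifyAll(); // wake every idle indexing thread
        }
    }
}
```

In this sketch, a deletes-only flush finished with clearOnDeletesOnly set
to false leaves waitUntilReady timing out, while the same flush with it
set to true lets the waiter proceed.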

Then I fixed the RAM accounting shared between deletes and indexing by:

  * Not reusing the RAM for postings arrays -- we now null this out
    for every field after flushing

  * Calling balanceRAM when deletes have filled up RAM before deciding
    to flush, because this can free RAM up, making more space for
    deletes.
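
The second point can be sketched as follows. RamPolicy, its fields and
its method names are assumptions made up for this example, and the
reclaim step is simplified to "release everything unused"; the real
balanceRAM logic is not reproduced here.

```java
// Hypothetical sketch; RamPolicy is an illustrative name, not a Lucene class.
final class RamPolicy {
    private final long budgetBytes;
    private long postingsUsedBytes; // RAM holding live indexing data
    private long postingsFreeBytes; // RAM allocated for indexing but currently unused
    private long deletesBytes;      // RAM holding buffered deletes

    RamPolicy(long budgetBytes, long postingsUsedBytes, long postingsFreeBytes) {
        this.budgetBytes = budgetBytes;
        this.postingsUsedBytes = postingsUsedBytes;
        this.postingsFreeBytes = postingsFreeBytes;
    }

    long totalBytes() {
        return postingsUsedBytes + postingsFreeBytes + deletesBytes;
    }

    /** Releases allocated-but-unused indexing RAM and returns the bytes freed. */
    long balanceRam() {
        long freed = postingsFreeBytes;
        postingsFreeBytes = 0;
        return freed;
    }

    /**
     * Buffers a delete and reports whether a flush is needed. Reclaiming
     * unused RAM is tried first; a flush is requested only if the total is
     * still over budget afterwards.
     */
    boolean shouldFlushAfterDelete(long deleteBytes) {
        deletesBytes += deleteBytes;
        if (totalBytes() <= budgetBytes) {
            return false;
        }
        balanceRam();
        return totalBytes() > budgetBytes;
    }
}
```

The design point is the ordering: reclaiming comes before the flush
decision, so deletes that would fit once idle indexing RAM is released
do not trigger a flush.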

I also simplified things further -- there is no longer a separate call
to doBalanceRAM -- and added a fun unit test that randomly alternates
between pure indexing and pure deleting, asserting that the flushing
doesn't "run hot" on any of those transitions.


> Use parallel arrays instead of PostingList objects
> --------------------------------------------------
>
>                 Key: LUCENE-2329
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2329
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: lucene-2329-2.patch, LUCENE-2329.patch, 
> LUCENE-2329.patch, LUCENE-2329.patch, lucene-2329.patch, lucene-2329.patch, 
> lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-living PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to dense termIDs.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and will remove the overhead of object 
> initialization and garbage collection.  Garbage collection especially should 
> benefit significantly when the JVM runs out of memory, because in such a 
> situation the gc mark times can get very long if there is a large number of 
> long-living objects in memory.
> Another benefit could be to build more efficient TermVectors.  We could avoid 
> the need of having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  We can make this improvement in a separate JIRA issue, though.
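
The parallel-array layout described above can be sketched roughly like
this. ParallelPostings and its fields are hypothetical names for this
example, and a HashMap stands in for the int[] term hash purely to keep
the sketch short; it is not how TermsHashPerField is implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; ParallelPostings is an illustrative name, not a Lucene class.
final class ParallelPostings {
    // Assumption: a HashMap replaces the int[] hash that maps terms to dense termIDs.
    private final Map<String, Integer> termToId = new HashMap<>();

    // Per-term data lives in parallel arrays indexed by termID,
    // instead of one long-living object per term.
    private int[] freqs = new int[4];
    private int[] lastDocIds = new int[4];
    private int numTerms;

    /** Records one occurrence of term in docId and returns the term's dense ID. */
    int add(String term, int docId) {
        Integer id = termToId.get(term);
        if (id == null) {
            id = numTerms++;
            if (id == freqs.length) {
                // Grow all parallel arrays together so they stay aligned.
                int newSize = freqs.length * 2;
                freqs = Arrays.copyOf(freqs, newSize);
                lastDocIds = Arrays.copyOf(lastDocIds, newSize);
            }
            termToId.put(term, id);
        }
        freqs[id]++;
        lastDocIds[id] = docId;
        return id;
    }

    int freq(int termId) {
        return freqs[termId];
    }

    int lastDocId(int termId) {
        return lastDocIds[termId];
    }

    int size() {
        return numTerms;
    }
}
```

Adding a term allocates no per-term object in the arrays themselves: a
new term only claims the next slot, and the garbage collector sees a few
large primitive arrays rather than many small long-living objects.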
