[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Jason Rutherglen (JIRA) Sat, 25 Sep 2010 10:25:56 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914838#action_12914838 ]


Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Can you explain what's the "copy on write ByteBlockPool"?
Exactly when do we make a copy....? {quote}

A copy of the byte[][] refs is made when getReader is called.
Each DWPT is locked (ie, writes stop) and a copy of the byte[][]
is made (just the refs) for that reader. The issue at the moment
is that I'm using a boolean[] to signify whether a byte[] needs
to be copied before being written to. As with BV and norms
cloning, read-only references are carried forward, which would
imply making copies of the boolean[] as well. In other words, as
with BV and norms, I think we need ref counts on the individual
byte[]s so that read-only references to byte[]s are carried
forward properly. However, this implies creating a BytesRefCount
object, because a parallel array cannot point back to the same
underlying byte[] if the byte[] in the byte[][] can be replaced
when a copy is made.
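
To make the ref-counting idea concrete, here is a minimal sketch
of how it might look. It is not the code in the patch: the class
CowBlockPool, its methods, and the fields of BytesRefCount are
all assumptions made for illustration, and a single lock on the
pool stands in for locking the DWPT.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder: one byte[] plus a count of the holders (writer and readers) sharing it. */
final class BytesRefCount {
    final byte[] bytes;
    final AtomicInteger refs = new AtomicInteger(1); // the creator holds the first reference

    BytesRefCount(byte[] bytes) {
        this.bytes = bytes;
    }
}

/** Hypothetical copy-on-write pool; not the ByteBlockPool from the patch. */
final class CowBlockPool {
    private final int blockSize;
    private final BytesRefCount[] blocks;

    CowBlockPool(int blockCount, int blockSize) {
        this.blockSize = blockSize;
        this.blocks = new BytesRefCount[blockCount];
        for (int i = 0; i < blockCount; i++) {
            blocks[i] = new BytesRefCount(new byte[blockSize]);
        }
    }

    /** What getReader might do: copy only the refs and mark every block as shared. */
    synchronized BytesRefCount[] snapshot() {
        BytesRefCount[] copy = blocks.clone();
        for (BytesRefCount block : copy) {
            block.refs.incrementAndGet();
        }
        return copy;
    }

    /** Writes in place unless a reader still shares the block, in which case it copies first. */
    synchronized void write(int pos, byte value) {
        int idx = pos / blockSize;
        BytesRefCount block = blocks[idx];
        if (block.refs.get() > 1) {
            BytesRefCount priv = new BytesRefCount(block.bytes.clone());
            block.refs.decrementAndGet(); // the writer gives up the shared block
            blocks[idx] = priv;
            block = priv;
        }
        block.bytes[pos % blockSize] = value;
    }

    synchronized byte read(int pos) {
        return blocks[pos / blockSize].bytes[pos % blockSize];
    }

    /** Called when a reader closes; a block whose count reaches 0 could be recycled. */
    static void release(BytesRefCount[] snapshot) {
        for (BytesRefCount block : snapshot) {
            block.refs.decrementAndGet();
        }
    }
}
```

Because the count lives next to the byte[] rather than in a
parallel array, replacing an entry in the writer's byte[][] does
not disturb the count seen by readers that still hold the old
block.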

{quote}Do we have a design thought out for this? The challenge
is because every doc state now has its own private docID
stream{quote}

It sounded easy when I first heard it; however, I needed to
write it down to fully understand and work through what's going
on. Those notes are in LUCENE-2558.

{quote}Well, I was thinking only implement the single-level skip
case (since it ought to be a lot simpler than the
MLSLW/R)....{quote}

I started on this, ie, implementing a single-level skip list
that reads from and writes to the BBP. It's a good lesson in how
to use the BBP.
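
For reference, the single-level idea can be sketched as below.
This is only an illustration under simplifying assumptions: the
class name and the skip interval are made up, and plain int[]
arrays stand in for postings and skip data that would really
live in the BBP.

```java
/** Hypothetical single-level skip list over a sorted docID list. */
final class SingleLevelSkipSketch {
    private final int[] docs;     // sorted docIDs (stand-in for postings in the block pool)
    private final int[] skipDocs; // skipDocs[i] == docs[i * interval]
    private final int interval;   // one skip entry per `interval` docs

    SingleLevelSkipSketch(int[] sortedDocs, int interval) {
        this.docs = sortedDocs.clone();
        this.interval = interval;
        this.skipDocs = new int[(docs.length + interval - 1) / interval];
        for (int i = 0; i < skipDocs.length; i++) {
            skipDocs[i] = docs[i * interval];
        }
    }

    /** Returns the first docID >= target, or -1 if there is none. */
    int advance(int target) {
        int start = 0;
        // Walk the skip entries to find the last block that starts at or before target.
        for (int i = 0; i < skipDocs.length && skipDocs[i] <= target; i++) {
            start = i * interval;
        }
        // Then scan linearly inside that block (and beyond, if needed).
        for (int i = start; i < docs.length; i++) {
            if (docs[i] >= target) {
                return docs[i];
            }
        }
        return -1;
    }
}
```

With docIDs {1, 3, 7, 10, 15, 22, 30, 41} and an interval of 4,
advancing to 16 jumps straight to the block starting at 15
instead of scanning from the beginning.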

{quote}Actually, conjunction (AND) queries, and also
PhraseQuery{quote}

Both are very common types of queries, so we probably need some
type of skipping. We'll have it; it will just be single-level.

{quote}Probably we should stop reusing the byte[] with this
change? So when all readers using a given byte[] are finally
GCd, is when that byte[] is reclaimed.{quote}

I suspect we'll change our minds about pooling byte[]s. We may
end up implementing ref counting anyway (as described above),
and the sudden garbage generated *could* be a massive change for
users. Of course, ref counting was difficult to implement the
first time around in LUCENE-1314; perhaps it'll be easier the
second time.

As a side note, there is still an issue in my mind around the
term frequencies parallel array (introduced in these patches),
in that we'd need to make a copy of it for each reader (because
if it changes, the scoring model becomes inaccurate?). However,
we could in fact use a two-dimensional PagedBytes (in this case,
PagedInts) for this purpose. Or is the garbage of an int[] the
size of the number of docs OK per reader? There is also the
lookup cost to consider.
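
A rough sketch of the paged alternative follows. Nothing like
this exists in the patches yet: the class name, the page size,
and the per-page "owned" flags are all assumptions, and the
caller is assumed to hold the writer's lock around set and
snapshot. It relies on GC to reclaim old pages rather than on
ref counts.

```java
import java.util.Arrays;

/** Hypothetical paged int array with per-page copy-on-write. */
final class PagedIntsSketch {
    private static final int PAGE_BITS = 10;            // assumed page size: 1024 ints
    private static final int PAGE_SIZE = 1 << PAGE_BITS;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final int[][] pages;
    private final boolean[] owned; // true if this instance may write the page in place

    PagedIntsSketch(int size) {
        int pageCount = (size + PAGE_SIZE - 1) >>> PAGE_BITS;
        pages = new int[pageCount][PAGE_SIZE];
        owned = new boolean[pageCount];
        Arrays.fill(owned, true);
    }

    private PagedIntsSketch(int[][] sharedPages) {
        pages = sharedPages;
        owned = new boolean[sharedPages.length]; // all false: every page is shared
    }

    /** The lookup cost: one shift, one mask, and two array dereferences. */
    int get(int index) {
        return pages[index >>> PAGE_BITS][index & PAGE_MASK];
    }

    /** Copies a page the first time it is written after a snapshot. */
    void set(int index, int value) {
        int page = index >>> PAGE_BITS;
        if (!owned[page]) {
            pages[page] = pages[page].clone();
            owned[page] = true;
        }
        pages[page][index & PAGE_MASK] = value;
    }

    /** Per-reader view: copies only the page refs, not one int per doc. */
    PagedIntsSketch snapshot() {
        Arrays.fill(owned, false);
        return new PagedIntsSketch(pages.clone());
    }

    int pageCount() {
        return pages.length;
    }
}
```

The garbage per reader is then one int[][] of page refs plus a
copy of each page that is later modified, rather than a full
int[] sized to the number of docs.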

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
