[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915685#action_12915685 ]
Michael McCandless commented on LUCENE-2575:
--------------------------------------------

bq. A copy of the byte[][] refs is made when getReader is called.

Hmm, why can't the reader just use the current byte[][]? The writer only adds new blocks to this array (it doesn't overwrite already written blocks until flush), and then allocates a new byte[][] once that array is full?

{quote}
I think the issue at the moment is I'm using a boolean[] to signify if a byte[] needs to be copied before being written to
{quote}

Hmm, so we also copy-on-write a given byte[] block? Is this because the JMM can't make the guarantees we need about other threads reading the bytes written? (A rough sketch of this copy-on-write idea is appended after the issue details below.)

{quote}
I have a suspicion we'll change our minds about pooling byte[]s. We may end up implementing ref counting anyways (as described above), and the sudden garbage generated could be a massive change for users?
{quote}

But even if we do reuse, we will cause tons of garbage until the still-open readers are closed? Ie we cannot re-use the byte[]s being "held open" by any NRT reader that's still referencing the in-RAM segment after that segment has been flushed to disk.

Also, the garbage shouldn't be that bad, since each object is large. It's not like 3.x's situation with FieldCache or the terms dict index, for example....

I would start simple by dropping reuse. We can then add it back if we see perf issues?

{quote}
Both very common types of queries, so we probably need some type of skipping, which we will, it'll just be single-level.
{quote}

I would start simple here and make skipping stupid, ie just scan. You can get everything working, all tests passing, etc., and then adding skipping is a much more isolated change. You need all the isolation you can get here! This stuff is *hairy*.

{quote}
As a side note, there is still an issue in my mind around the term frequencies parallel array (introduced in these patches), in that we'd need to make a copy of it for each reader (because if it changes, the scoring model becomes inaccurate?).
{quote}

Hmm, you're right that each reader needs a private copy, to remain truly "point in time". This (4 bytes per unique term X number of readers reading that term) is a non-trivial addition of RAM.

BTW I'm assuming IW will now be modal? Ie the caller must tell IW up front whether NRT readers will be used? Because non-NRT users shouldn't have to pay all this added RAM cost?

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable, random-access-file-like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
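For illustration only (not from the attached patches), here is a minimal sketch of the copy-on-write block pool idea discussed above: readers snapshot the current byte[][], and the writer copies a block before mutating it if any snapshot may still reference it. All class and method names here are hypothetical.

{code:java}
/**
 * Hypothetical sketch of a copy-on-write byte block pool: getReaderSnapshot()
 * marks the current blocks as shared, and the writer copies a shared block
 * before writing into it, so reader snapshots stay point-in-time.
 */
public class CopyOnWriteBlockPool {
  public static final int BLOCK_SIZE = 32768;

  private byte[][] blocks = new byte[16][];
  private boolean[] shared = new boolean[16];   // true if a reader snapshot may hold this block
  private int blockUpto = -1;                   // index of the block currently being written

  /** Returns a point-in-time view of the written blocks. */
  public synchronized byte[][] getReaderSnapshot() {
    // Mark all current blocks as shared so future writes copy first.
    for (int i = 0; i <= blockUpto; i++) {
      shared[i] = true;
    }
    byte[][] snapshot = new byte[blockUpto + 1][];
    System.arraycopy(blocks, 0, snapshot, 0, blockUpto + 1);
    return snapshot;
  }

  /** Returns a block safe to write into, copying it first if a reader may see it. */
  public synchronized byte[] writableBlock(int index) {
    if (shared[index]) {
      // A reader snapshot still points at the original block; copy before writing
      // so the reader keeps seeing the bytes as they were at snapshot time.
      blocks[index] = java.util.Arrays.copyOf(blocks[index], BLOCK_SIZE);
      shared[index] = false;
    }
    return blocks[index];
  }

  /** Allocates a new block once the current one fills up. */
  public synchronized byte[] newBlock() {
    blockUpto++;
    if (blockUpto == blocks.length) {
      blocks = java.util.Arrays.copyOf(blocks, blocks.length * 2);
      shared = java.util.Arrays.copyOf(shared, shared.length * 2);
    }
    blocks[blockUpto] = new byte[BLOCK_SIZE];
    return blocks[blockUpto];
  }
}
{code}

A reader built from getReaderSnapshot() keeps referencing the original blocks even after the writer has copied them and continued appending, which is the "point in time" behavior discussed above; the cost is the extra garbage generated per copied block.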