[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914684#action_12914684 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Maybe we can not skip until we've hit the max slice? This way skipping would always know it's on the max slice. This works out to 429 bytes into the stream... likely this is fine.{quote}

Me like-y. I'll implement the skip list to point to the largest-level slices.

{quote}Can we just have IW allocate a new byte[][] after flush? So then any open readers can keep using the one they have?{quote}

This means the prior byte[]s will still be recycled once all readers from previous flushes are closed? If there are multiple readers from the previous flush, we'd probably still need reference counting (ala bitvector and norms)? Unfortunately a reference-count parallel array won't quite work because we're copy-on-writing the byte[]s, eg, there's nothing consistent for the array index to point to. A hash map of byte[]s would likely be too heavyweight? We may need to implement a ByteArray object composed of a byte[] and a refcount (a rough sketch follows at the end of this comment). This is somewhat counter to our parallel-array memory-savings strategy, though it is directly analogous to the way norms are implemented in SegmentReader.

{quote}it's possible single level skipping, with a larger skip interval, is fine for even large RAM buffers.{quote}

True, I'll implement a default of one level and a default large-ish skip interval (also sketched at the end of this comment).

{quote}Maybe we can get an initial version of this working, without the skipping? Ie skipping is implemented as scanning.{quote}

How many scorers use skipping, and how often? It's mostly for disjunction queries? If we limit the skip list to one level and don't implement the BBP level byte at the beginning of the slice, the MLSL will be a lot easier (ie, faster) to implement and test.

I'd like to see BytesHash get out of THPF (eg, LUCENE-2662), get deletes working in the RT branch, and merge the flush-by-DWPT work to trunk. Concurrently I'll work on search on the RAM buffer, which is most of the way complete. I'd prefer to test a more complete version of LUCENE-2312 with skip lists (which can easily be turned off), so that when we do take it through the laundromat of testing, we won't need to retrofit anything back in, re-test, and possibly re-design.

On a side note related to testing: one naive way I've tested is to do the copy-on-write of the BBP when the segment needs to be flushed to disk, and write the segment from the read-only copy of the BBP. If the segment is correct, then at least we know the copy worked properly and nothing's missing.

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
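As a concrete illustration of the ByteArray idea above, here's a rough sketch (the class name and shape are my assumption, not anything in the attached patches): a byte[] paired with a refcount, handled the same way SegmentReader refcounts shared norms.

{code:java}
// Hypothetical sketch, not from the patch: a recyclable byte[] plus a refcount.
// A reader opened at flush time incRef()s every buffer it can see; when the last
// reference is released the buffer can go back to the allocator for reuse.
final class ByteArray {
  final byte[] bytes;
  private int refCount = 1; // the writer that allocated the buffer holds the first ref

  ByteArray(byte[] bytes) {
    this.bytes = bytes;
  }

  synchronized void incRef() {
    assert refCount > 0;
    refCount++;
  }

  /** Returns true when the last reference is dropped and the buffer may be recycled. */
  synchronized boolean decRef() {
    assert refCount > 0;
    return --refCount == 0;
  }
}
{code}

Each reader from a given flush would incRef() the buffers it references when opened and decRef() them on close, so a buffer is only recycled once every reader that could still see it is gone.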
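And a minimal sketch of single-level skipping with a large-ish interval, just to show the shape of it (plain arrays stand in for the postings here; the real thing would store slice offsets into the BBP rather than array indexes):

{code:java}
// Illustrative only: one skip entry per `interval` docs over a sorted docID list.
// advance(target) jumps over whole blocks whose last doc is still < target, then
// finishes with a linear scan inside the final block.
final class SingleLevelSkip {
  private final int[] docs;        // sorted docIDs (stand-in for a postings list)
  private final int[] skipDocs;    // last docID of each block
  private final int[] skipIndexes; // index into docs[] of that last docID

  SingleLevelSkip(int[] docs, int interval) {
    this.docs = docs;
    int n = docs.length / interval;
    skipDocs = new int[n];
    skipIndexes = new int[n];
    for (int i = 0; i < n; i++) {
      skipIndexes[i] = (i + 1) * interval - 1;
      skipDocs[i] = docs[skipIndexes[i]];
    }
  }

  /** Index of the first doc >= target, or docs.length if there is none. */
  int advance(int target) {
    int pos = 0;
    for (int i = 0; i < skipDocs.length && skipDocs[i] < target; i++) {
      pos = skipIndexes[i] + 1; // the whole block is < target, skip past it
    }
    while (pos < docs.length && docs[pos] < target) {
      pos++; // scan the remainder
    }
    return pos;
  }
}
{code}

With an interval of a few thousand docs the skip structure stays tiny, which is the point of using a single level over an in-RAM buffer.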