[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Jason Rutherglen (JIRA) Tue, 28 Sep 2010 20:00:02 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916000#action_12916000
 ]


Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

OK, I think there's a way to avoid copying the actual byte[]s,
but we'd need to alter the behavior of the BBPs (ByteBlockPools).
It would require always leaving 3 empty bytes at the end of a
slice for the forwarding address. Today we instead write the
postings right up to the end of the slice and then, when
allocating a new slice, copy the last 3 bytes forward to the new
slice location. We would also need to pass each reader its own
parallel posting upto array. That is required so the reader never
reads past the point up to which a slice had been written when
the reader was instantiated.
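
To make the idea concrete, here is a rough sketch of the scheme
described above. It is not Lucene's ByteBlockPool: the class name
ReservedTailSlices, the fixed 8-byte slice size, the single shared
byte[] and the Reader class are all assumptions made up for
illustration, and the sketch is single-threaded (it ignores memory
visibility between the writer and reader threads).

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch only; names and sizes are not from Lucene. */
final class ReservedTailSlices {
    static final int SLICE_SIZE = 8;                    // assumed fixed slice size
    static final int ADDR_BYTES = 3;                    // tail reserved for the forwarding address
    static final int PAYLOAD = SLICE_SIZE - ADDR_BYTES; // usable bytes per slice

    private final byte[] buffer = new byte[1 << 10];    // one shared block, never copied
    private int nextFree = 0;
    private final int[] start;                          // first slice offset per term
    private final int[] upto;                           // next write position per term

    ReservedTailSlices(int numTerms) {
        start = new int[numTerms];
        upto = new int[numTerms];
        for (int t = 0; t < numTerms; t++) {
            start[t] = upto[t] = newSlice();
        }
    }

    private int newSlice() {
        if (nextFree + SLICE_SIZE > buffer.length) {
            throw new IllegalStateException("sketch buffer is full");
        }
        int offset = nextFree;
        nextFree += SLICE_SIZE;
        return offset;
    }

    /** Appends one byte to the term's stream; existing postings are never moved. */
    void writeByte(int term, byte b) {
        int pos = upto[term];
        if (pos % SLICE_SIZE == PAYLOAD) {
            // Payload is full: fill the reserved tail with the address of a new slice.
            int next = newSlice();
            buffer[pos] = (byte) (next >>> 16);
            buffer[pos + 1] = (byte) (next >>> 8);
            buffer[pos + 2] = (byte) next;
            pos = next;
        }
        buffer[pos] = b;
        upto[term] = pos + 1;
    }

    /** Each reader gets its own copy of the upto array, taken at creation time. */
    Reader newReader() {
        return new Reader(upto.clone());
    }

    final class Reader {
        private final int[] uptoSnapshot;

        private Reader(int[] uptoSnapshot) {
            this.uptoSnapshot = uptoSnapshot;
        }

        /** Reads the term's bytes, stopping at the position captured in the snapshot. */
        byte[] read(int term) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int pos = start[term];
            int limit = uptoSnapshot[term];
            while (pos != limit) {
                if (pos % SLICE_SIZE == PAYLOAD) {
                    // Follow the forwarding address to the next slice.
                    pos = ((buffer[pos] & 0xFF) << 16)
                        | ((buffer[pos + 1] & 0xFF) << 8)
                        | (buffer[pos + 2] & 0xFF);
                } else {
                    out.write(buffer[pos++]);
                }
            }
            return out.toByteArray();
        }
    }
}
```

A reader created before later writes keeps seeing only what had
been written at that time, even though it shares the same byte[]
with the writer: its upto snapshot stops it before any bytes or
forwarding address written afterwards.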

This would yield significant savings because we would not be
generating garbage from the byte[]s, which are 32 KB each. They
add up if, for example, indexing touches many different byte[]s.
With this solution, incremental indexing would generate
essentially no garbage; garbage would appear only after a DWPT's
segment is flushed (and all of its readers have also been GCed).

The only downside is that we'd be leaving those 3 bytes per term
unused at all times, which is not a very high price. Perhaps
more significant is the posting upto array per reader, which
would cost 4 bytes per term, the same as the term freq array.
It's a pick-your-poison problem.
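
As a back-of-the-envelope check on that trade-off, the arithmetic
can be written down as below. Only the per-unit sizes (3 bytes, 4
bytes, 32 KB) come from the comment above; the class and method
names are made up for illustration, and the term, reader and
block counts you plug in are up to you.

```java
/** Hypothetical cost estimate; only the per-unit sizes come from the comment. */
final class OverheadEstimate {
    static final long BLOCK_BYTES = 32L * 1024; // one byte[] block

    /** 3 bytes left unused at the end of each term's current slice. */
    static long reservedTailBytes(long numTerms) {
        return 3L * numTerms;
    }

    /** One 4-byte posting upto entry per term, for each open reader. */
    static long uptoArrayBytes(long numTerms, int numReaders) {
        return 4L * numTerms * numReaders;
    }

    /** Garbage produced if the given number of touched blocks were copied instead. */
    static long copiedBlockBytes(long touchedBlocks) {
        return BLOCK_BYTES * touchedBlocks;
    }
}
```

For instance, with 1,000,000 terms and 4 open readers (made-up
numbers), the reserved tails come to 3,000,000 bytes and the upto
arrays to 16,000,000 bytes, while copying 100 touched blocks
would produce 3,276,800 bytes of garbage each time it happened.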

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (i.e., postings, etc.). I'm not sure I understand the
> slices concept; it seems like it'd be easier to implement a
> seekable, random-access-file-like API. One would seek to a given
> position, then read or write from there. The underlying
> management of byte arrays could then be hidden?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


