[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921784#action_12921784 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The issue with the model as given is that the posting-upto is handed to the byte slice reader as the end index; however, per the JMM, newly written bytes may not actually be visible to a reader thread, so a reader may land on partially written bytes. There doesn't seem to be a way to tell the reader it has reached the end of the written bytes, so we'd probably need to add two paged-ints arrays, for the freq and prox uptos respectively. That would be unfortunate, because the paged ints would need to be updated either during the getReader call or during indexing. Either could hurt performance, though the net would still be faster than the current NRT solution.

The alternative is to simply copy-on-write the byte blocks, though that'd need to include the int blocks as well. If we stay with paged ints, I think we'd want to update them during indexing; otherwise we should discount that solution, because updating in the getReader call would require full array iterations to compare and update. The advantage of copy-on-write of the blocks (sketched after the quoted description below) is that neither indexing speed nor read speed would be affected; the main potential performance drag is the garbage generated by the byte and int arrays thrown away on reader close, which would depend on how many blocks were updated between getReader calls. We probably need to implement both solutions, try them out, and measure the performance difference.

There's also Michael B.'s design of multiple slice levels linked together by atomic int arrays, illustrated here: http://www.box.net/shared/hivdg1hge9

After reading that, the main idea I think we can use is, instead of paged ints, to simply maintain two upto arrays: one that's being written to, and a second that's guaranteed to be in sync with the byte blocks (also sketched below). This would save on garbage and on lookups into paged ints; the cost would be the array copy in the getReader lock. Given that the array already exists, the copy should be fast. Perhaps this is the go-ahead solution?

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
> The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (i.e., postings etc.). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable, random-access-file-like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
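To make the copy-on-write alternative concrete, here is a minimal sketch; it is not the attached patch, and every name in it (CopyOnWriteBytePool, writeByte, snapshotForReader, the BitSet tracking) is invented for illustration. Only byte blocks are shown; the int blocks would need the same treatment, as noted above. The writer clones a block on its first write after a reader snapshot, so the snapshot a reader holds never mutates:

{code:java}
import java.util.BitSet;

// Hypothetical sketch of the copy-on-write idea; not Lucene code.
final class CopyOnWriteBytePool {
  private final byte[][] blocks;            // the live block table
  private final BitSet sharedWithReader = new BitSet();

  CopyOnWriteBytePool(int numBlocks, int blockSize) {
    blocks = new byte[numBlocks][blockSize];
  }

  // Indexing thread: the first write into a block that a reader snapshot
  // still references clones it first, so the reader's view never changes.
  void writeByte(int block, int offset, byte b) {
    if (sharedWithReader.get(block)) {
      blocks[block] = blocks[block].clone();
      sharedWithReader.clear(block);
    }
    blocks[block][offset] = b;
  }

  // getReader() path: assumed to run at a sync point with the indexing
  // thread (e.g. under a shared lock), which also supplies the JMM
  // visibility guarantee. The reader keeps these arrays until it closes,
  // at which point the superseded blocks become garbage.
  byte[][] snapshotForReader() {
    sharedWithReader.set(0, blocks.length);
    return blocks.clone();                  // shallow copy of the table
  }
}
{code}

The garbage profile matches the concern above: what a closed reader releases is proportional to how many blocks were touched between getReader calls.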
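And a sketch of the two-upto-array idea, again with invented names (advance, publish, snapshot). The indexing thread advances its private array freely and copies it into the reader-visible array at a document boundary; the paired lock release/acquire supplies the happens-before edge the JMM requires, so a reader never sees an upto pointing past fully written bytes:

{code:java}
// Hypothetical sketch of the two-upto-array idea; not Lucene code.
final class UptoPublisher {
  private final int[] writeUptos;  // advanced freely by the indexing thread
  private final int[] readUptos;   // touched only under the lock
  private final Object lock = new Object();

  UptoPublisher(int numTerms) {
    writeUptos = new int[numTerms];
    readUptos = new int[numTerms];
  }

  // Indexing thread only: the bytes for termID are written into the
  // block pool *before* its upto is advanced.
  void advance(int termID, int newUpto) {
    writeUptos[termID] = newUpto;
  }

  // Indexing thread, at a document boundary: publish a consistent
  // snapshot. Releasing the lock here pairs with the acquire in
  // snapshot() to establish happens-before for the byte writes too.
  void publish() {
    synchronized (lock) {
      System.arraycopy(writeUptos, 0, readUptos, 0, writeUptos.length);
    }
  }

  // getReader() path: every upto visible here points at fully written
  // bytes, because the writer published it under the same lock.
  int[] snapshot() {
    synchronized (lock) {
      return readUptos.clone();
    }
  }
}
{code}

This sketch does the copy on the indexing side; doing it in the getReader lock instead, as suggested above, works the same way provided both threads synchronize on the same monitor, since it's the lock, not the copy itself, that makes the byte-block writes visible.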