[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696438#action_12696438 ]

Michael McCandless commented on LUCENE-1313:
--------------------------------------------

{quote}
> I'd be very interested to compare (benchmark) this approach
> vs solely LUCENE-1516.

Is the .alg using the NearRealtimeReader from LUCENE-1516 our
best measure of realtime performance?
{quote}

So far, I think so?  You get to set an update rate (delete + add), eg 50 
docs/sec, and a pause time between NRT reopens.
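Just to make those two knobs concrete, here's a toy model of the load the .alg generates; the class name and rates are illustrative only, not part of the patch or the benchmark framework:

```java
// Toy model of the synthetic NRT benchmark load: an update rate
// (each update = one delete + one add) plus an NRT reopen rate.
// Counts how many index operations and reopens a window produces.
public class NrtLoadModel {
    final int updatesPerSec;  // eg 50 docs/sec updated
    final int reopensPerSec;  // how often the NRT reader is reopened

    NrtLoadModel(int updatesPerSec, int reopensPerSec) {
        this.updatesPerSec = updatesPerSec;
        this.reopensPerSec = reopensPerSec;
    }

    // Each update is a delete plus an add, so two index operations.
    long indexOpsIn(int seconds) {
        return 2L * updatesPerSec * seconds;
    }

    long reopensIn(int seconds) {
        return (long) reopensPerSec * seconds;
    }
}
```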

Still, it's synthetic.  If you guys (LinkedIn) have a way to fold some 
realism into the test, that'd be great, even if it's only "our app ingests at X 
docs (MB)/sec and reopens the NRT reader X times per second" to set our ballpark.

{quote}
> the transactional restriction could/should layer on
> top of this performance optimization for near-realtime search?

The transactional system should be able to support both methods.
Perhaps a non-locking setting would allow the same RealtimeIndex
class support both modes of operation?
{quote}

Sorry, what are both "modes" of operation?

I think there are two different "layers" here -- the first layer optimizes NRT by 
flushing small segments to a RAMDir first.  This seems generally useful and in 
theory has no impact on the API IndexWriter exposes (it's "merely" an internal 
optimization).  The second layer adds this new Transaction object, such that N 
adds/deletes, a commit, and an NRT reader reopen can be done atomically wrt other 
pending Transaction objects.
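A minimal, self-contained sketch of what that second layer's semantics might look like; Transaction and RealtimeIndex follow the names in this thread, but none of this is a real Lucene API, and real segments/deletes are far more involved -- readers here just swap to a new snapshot only when a commit publishes it atomically:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: batched adds/deletes are invisible to
// readers until commit() atomically publishes a new snapshot.
class RealtimeIndex {
    private volatile List<String> published = new ArrayList<>();

    // A snapshot, analogous to an NRT reader.
    List<String> reader() { return published; }

    class Transaction {
        private final List<String> pendingAdds = new ArrayList<>();
        private final List<String> pendingDeletes = new ArrayList<>();

        void add(String doc) { pendingAdds.add(doc); }
        void delete(String doc) { pendingDeletes.add(doc); }

        // Atomically apply all pending ops wrt other transactions.
        void commit() {
            synchronized (RealtimeIndex.this) {
                List<String> next = new ArrayList<>(published);
                next.removeAll(pendingDeletes);
                next.addAll(pendingAdds);
                published = next;  // readers switch snapshots atomically
            }
        }
    }

    Transaction begin() { return new Transaction(); }
}
```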

{quote}
We'll need to integrate the RAM based indexer into IndexWriter
to carry over the deletes to the ram index while it's copied to
disk. This is similar to IndexWriter.commitMergedDeletes
carrying deletes over at the segment reader level based on a
comparison of the current reader and the cloned reader.
Otherwise there's redundant deletions to the disk index using
IW.deleteDocuments which can be unnecessarily expensive. To make
external we would need to do the delete by doc id genealogy.
{quote}

Right, I think the RAMDir optimization would go directly into IW, if we can 
separate it out from Transaction.  It could naturally derive from the existing 
RAMBufferSizeMB, ie if NRT forces a flush, so long as it's tiny, put it into the 
local RAMDir instead of the actual Dir, then "deduct" that size from the 
allowed budget of DW's ram usage.  When RAMDir + DW exceeds RAMBufferSizeMB, we 
then merge all of RAMDir's segments into a "real" segment in the directory.
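The accounting I have in mind could be sketched like this; sizes, names, and the merge trigger are all illustrative, standing in for the real DocumentsWriter/Directory plumbing:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed budget: tiny NRT flushes land in a
// RAMDir-like buffer whose bytes are deducted from the single
// RAMBufferSizeMB budget.  Once RAMDir + DW RAM would exceed the
// budget, all RAM segments merge into one "real" on-disk segment.
class NrtFlushBudget {
    final long ramBufferSizeBytes;
    final List<Long> ramSegments = new ArrayList<>();  // RAMDir segment sizes
    long diskSegments = 0;  // count of real segments in the directory

    NrtFlushBudget(long ramBufferSizeBytes) {
        this.ramBufferSizeBytes = ramBufferSizeBytes;
    }

    long ramDirBytes() {
        long sum = 0;
        for (long s : ramSegments) sum += s;
        return sum;
    }

    // An NRT reopen forced a flush of segmentBytes into the RAMDir.
    void flushForNrt(long segmentBytes, long dwBytes) {
        ramSegments.add(segmentBytes);
        // RAMDir counts against DW's allowed budget; when the
        // combination exceeds it, merge RAMDir down to disk.
        if (ramDirBytes() + dwBytes > ramBufferSizeBytes) {
            ramSegments.clear();
            diskSegments++;
        }
    }
}
```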

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory 
> enables replication.  It is difficult to replicate using other methods 
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and 
> searching concurrently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

