[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696438#action_12696438 ]
Michael McCandless commented on LUCENE-1313:
--------------------------------------------

{quote}
> I'd be very interested to compare (benchmark) this approach
> vs solely LUCENE-1516.

Is the .alg using the NearRealtimeReader from LUCENE-1516 our best measure of realtime performance?
{quote}

So far, I think so?  You get to set an update rate (delete + add docs), eg 50 docs/sec, and a pause time between NRT reopens.  Still, it's synthetic.  If you guys (LinkedIn) have a way to fold some realism into the test, that'd be great -- even just "our app ingests at X docs (MB)/sec and reopens the NRT reader X times per second" would set our ballpark.  (A rough sketch of such an update/reopen loop is at the end of this message.)

{quote}
> the transactional restriction could/should layer on
> top of this performance optimization for near-realtime search?

The transactional system should be able to support both methods. Perhaps a non-locking setting would allow the same RealtimeIndex class to support both modes of operation?
{quote}

Sorry, what are both "modes" of operation?

I think there are two different "layers" here -- the first layer optimizes NRT by flushing small segments to a RAMDir first.  This seems generally useful and in theory has no impact on the API IndexWriter exposes (it's "merely" an internal optimization).

The second layer adds this new Transaction object, such that N adds/deletes/commits/NRT-reader-reopens can be done atomically wrt other pending Transaction objects.

{quote}
We'll need to integrate the RAM based indexer into IndexWriter to carry over the deletes to the ram index while it's copied to disk. This is similar to IndexWriter.commitMergedDeletes carrying deletes over at the segment reader level based on a comparison of the current reader and the cloned reader. Otherwise there's redundant deletions to the disk index using IW.deleteDocuments which can be unnecessarily expensive. To make external we would need to do the delete by doc id genealogy.
{quote}

Right, I think the RAMDir optimization would go directly into IW, if we can separate it out from Transaction.  It could naturally derive from the existing RAMBufferSizeMB: ie, if NRT forces a flush, so long as it's tiny, put it into the local RAMDir instead of the actual Dir, then "deduct" that size from the allowed budget of DW's RAM usage.  When RAMDir + DW exceeds RAMBufferSizeMB, we then merge all of RAMDir's segments into a "real" segment in the directory.  (A rough sketch of this accounting also follows below.)

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.
>
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
>
> Encoding each transaction into a set of bytes by writing to a RAMDirectory enables replication.  It is difficult to replicate using other methods because, while the document may easily be serialized, the analyzer cannot.
>
> I think this issue can hold realtime benchmarks which include indexing and searching concurrently.
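For illustration, here's a minimal sketch (not the actual NearRealtimeReader .alg) of the kind of synthetic update/reopen loop described above, using the IndexWriter.getReader() NRT API from LUCENE-1516; the 50 docs/sec rate, 200 ms reopen pause, and field names are arbitrary placeholders:

{code:java}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class NrtUpdateLoop {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
        IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader reader = writer.getReader();      // NRT reader (LUCENE-1516)

    final int docsPerSec = 50;                    // synthetic update rate
    final int reopenPauseMs = 200;                // pause between NRT reopens

    for (int sec = 0; sec < 10; sec++) {
      for (int i = 0; i < docsPerSec; i++) {
        Document doc = new Document();
        String id = Integer.toString(i);
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // updateDocument = delete-by-term + add, ie one "update"
        writer.updateDocument(new Term("id", id), doc);
      }
      Thread.sleep(reopenPauseMs);
      IndexReader newReader = reader.reopen();    // same instance if nothing changed
      if (newReader != reader) {
        reader.close();
        reader = newReader;
      }
      // run queries against 'reader' here and record reopen/search latency
    }
    reader.close();
    writer.close();
  }
}
{code}

A realistic version would also run queries against each reopened reader and record the reopen and search latencies, rather than just driving updates.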
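And a rough sketch of the RAMDir flush/budget accounting described above; all names here (RamSegmentBudget, mergeRamSegmentsToDisk, etc.) are hypothetical and are not IndexWriter's actual internals:

{code:java}
import java.io.IOException;

/**
 * Rough sketch: tiny NRT-forced flushes go to a local RAMDir, their size is
 * deducted from the RAMBufferSizeMB budget, and when the budget is exceeded
 * the RAMDir segments are merged into one real segment on disk.
 */
class RamSegmentBudget {
  private final long budgetBytes;   // corresponds to RAMBufferSizeMB
  private long ramSegmentsBytes;    // bytes of tiny NRT segments held in the RAMDir

  RamSegmentBudget(double ramBufferSizeMB) {
    this.budgetBytes = (long) (ramBufferSizeMB * 1024 * 1024);
  }

  /** Called when an NRT reopen forces a flush of a tiny segment. */
  void onNrtFlush(long flushedSegmentBytes, long docWriterBytes) throws IOException {
    // The tiny segment went to the local RAMDir, not the real Directory,
    // so its size now counts against DW's allowed budget.
    ramSegmentsBytes += flushedSegmentBytes;

    // Once buffered docs (DW) + RAM segments exceed RAMBufferSizeMB, merge
    // all RAMDir segments into one "real" segment on disk and reclaim the RAM.
    if (docWriterBytes + ramSegmentsBytes > budgetBytes) {
      mergeRamSegmentsToDisk();
      ramSegmentsBytes = 0;
    }
  }

  private void mergeRamSegmentsToDisk() throws IOException {
    // hypothetical: merge the RAMDir's segments into a single segment in
    // the main Directory (eg an addIndexes()-style copy/merge)
  }
}
{code}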