[
https://issues.apache.org/jira/browse/LUCENE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905331#comment-13905331
]
Shai Erera commented on LUCENE-5438:
------------------------------------
I committed NRTIndexRevision and a matching test:
* NRTIndexRevision lists the files in the SIS obtained from
IW.flushAndIncRef() and holds that SIS in memory as a byte[] (written via
{{SIS.write(DataOutput)}}).
* The segments_N file is listed as segments_nrt_N, where N is SIS.getVersion().
This file isn't actually materialized on any Directory; it's only listed so
that the replica can request it.
* There's a nocommit about how to handle commits: if you call addDoc(),
commit(), addDoc(), publish(new NRTRev()), then SIS.listFiles() contains
segments_N as well.
** On one hand I think it's good to list that file as well, so that the replica
can replicate it and, if it crashes, recover to the last known commit.
** It also simplifies how the app integrates its NRT and commit() with the
replicator.
** But if we choose to pass that file as well, we have to handle it on the
replica side, e.g. by sync'ing it (which we don't do for the in-memory SIS).
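To make the "listed but never materialized" idea concrete, here is a minimal plain-Java sketch. It deliberately uses no Lucene classes: {{NrtRevisionSketch}}, its constructor arguments and {{open()}} are hypothetical names for illustration only, not the committed NRTIndexRevision API. It only shows that the virtual segments_nrt_N entry can be served from an in-memory byte[] while the other listed files would come from the Directory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not the committed NRTIndexRevision. */
final class NrtRevisionSketch {
  private final String virtualName;  // "segments_nrt_" + version, never written to disk
  private final byte[] infosBytes;   // stands in for the bytes produced by SIS.write(DataOutput)
  private final List<String> files;  // real segment files plus the virtual entry

  NrtRevisionSketch(long version, List<String> segmentFiles, byte[] serializedInfos) {
    this.virtualName = "segments_nrt_" + version;
    this.infosBytes = serializedInfos.clone();
    List<String> all = new ArrayList<>(segmentFiles);
    all.add(virtualName);
    this.files = Collections.unmodifiableList(all);
  }

  /** Everything a replica may request, including the virtual file. */
  List<String> listFiles() {
    return files;
  }

  boolean isVirtual(String name) {
    return virtualName.equals(name);
  }

  /** Serves the virtual file from memory; a real revision would open other files from the Directory. */
  InputStream open(String name) {
    if (isVirtual(name)) {
      return new ByteArrayInputStream(infosBytes);
    }
    throw new IllegalArgumentException("sketch only serves the virtual file, not " + name);
  }
}
```

A replica-side counterpart would treat a segments_nrt_N name as "load these bytes into an in-memory SIS" rather than "copy and sync a file", which is where the open question about also shipping segments_N comes in.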
I think it would be good if the framework allowed the app to publish
IndexRevision (commits) and NRTIndexRevision (NRT) seamlessly, so the app can
choose what to replicate. That way NRTIndexRevision doesn't need to know about
SnapshotDeletionPolicy and its code path remains simple (IW.flushAndIncRef +
IW.decRefDeleter). Once I add support on the replica side, I'll write a test
which demonstrates how to mix commits and NRT with one replicator.
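As a rough illustration of publishing both kinds of revision through one path, here is a plain-Java sketch. All names ({{RevisionSketch}}, {{CommitRevisionSketch}}, {{NrtRefSketch}}, {{PublisherSketch}}) are hypothetical and no Lucene API is used. It also assumes the previous revision can be released as soon as a new one is published, whereas a real replicator would have to wait for replicas that are still copying it.

```java
import java.util.List;

/** Hypothetical stand-in for the replicator's revision abstraction. */
interface RevisionSketch {
  String describe();

  /** Called once the publisher no longer needs this revision. */
  void release();
}

/** Commit-based revision: release() stands for handing the commit back to a snapshot policy. */
final class CommitRevisionSketch implements RevisionSketch {
  private final long generation;
  private final List<String> log;

  CommitRevisionSketch(long generation, List<String> log) {
    this.generation = generation;
    this.log = log;
  }

  public String describe() {
    return "commit:" + generation;
  }

  public void release() {
    log.add("release snapshot of " + describe());
  }
}

/** NRT revision: release() stands for dropping the reference taken when the writer was flushed. */
final class NrtRefSketch implements RevisionSketch {
  private final long version;
  private final List<String> log;

  NrtRefSketch(long version, List<String> log) {
    this.version = version;
    this.log = log;
  }

  public String describe() {
    return "nrt:" + version;
  }

  public void release() {
    log.add("drop writer reference for " + describe());
  }
}

/** Publishes either kind of revision; it never needs to know which kind it holds. */
final class PublisherSketch {
  private RevisionSketch current;

  synchronized void publish(RevisionSketch next) {
    RevisionSketch previous = current;
    current = next;
    if (previous != null) {
      previous.release();  // simplification: no tracking of in-flight replica sessions
    }
  }

  synchronized RevisionSketch current() {
    return current;
  }
}
```

Because each revision owns its own release logic, the NRT variant stays unaware of any snapshot policy, which is the simplification argued for above.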
> add near-real-time replication
> ------------------------------
>
> Key: LUCENE-5438
> URL: https://issues.apache.org/jira/browse/LUCENE-5438
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/replicator
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.7, 5.0
>
> Attachments: LUCENE-5438.patch, LUCENE-5438.patch
>
>
> Lucene's replication module makes it easy to incrementally sync index
> changes from a master index to any number of replicas, and it
> handles/abstracts all the underlying complexity of holding a
> time-expiring snapshot, finding which files need copying, syncing more
> than one index (e.g., taxo + index), etc.
> But today you must first commit on the master, and then again the
> replica's copied files are fsync'd, because the code operates on
> commit points. But this isn't "technically" necessary, and it mixes
> up durability and fast turnaround time.
> Long ago we added near-real-time readers to Lucene, for the same
> reason: you shouldn't have to commit just to see the new index
> changes.
> I think we should do the same for replication: allow the new segments
> to be copied out to replica(s), and new NRT readers to be opened, to
> fully decouple committing from visibility. This way apps can then
> separately choose when to replicate (for freshness), and when to
> commit (for durability).
> I think for some apps this could be a compelling alternative to the
> "re-index all documents on each shard" approach that Solr Cloud /
> ElasticSearch implement today, and it may also mean that the
> transaction log can remain external to / above the cluster.