[ https://issues.apache.org/jira/browse/SOLR-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146174#comment-14146174 ]
Renaud Delbru commented on SOLR-6460:
-------------------------------------

Hi,

here is an initial analysis and proposal of the modifications of the UpdateLog for the CDCR scenario.

Most of the original workflow of the UpdateLog can be left untouched. However, the concept of "maximum number of records to keep" must be kept (except for the cleaning of old transaction logs) so as not to interfere with the normal workflow.

h4. Cleaning of Old Transaction Logs

The logic to remove old tlog files should be modified so that it relies on pointers instead of a limit defined by the maximum number of records to keep. The UpdateLog should be in charge of keeping the list of pointers and of managing their life-cycle (or of delegating it to the LogReader, which is presented next). Such a pointer, denoted LogPointer, should be composed of a tlog file and an associated file pointer.

h4. Log Reader

The UpdateLog must provide a log reader, denoted LogReader, that will be used by the CDC Replicator to search, scan and read the update logs. The LogReader will wrap a LogPointer and hide its management (e.g., instantiation, increment, release).

The operations that must be provided by the LogReader are:
* Scan: move the LogPointer to the next entry
* Read: read the log entry specified by the LogPointer
* Lookup: look up a version number - this will be performed during the initialisation of the CDC Replicator / election of a new leader, and therefore rarely.

The LogReader must read not only old tlog files, but also the new tlog file (i.e., the transaction log being written). This requires specific logic, since a LogReader can be exhausted at a time t1 and have new entries available at a later time t2.

h4. Log Index

In order to support efficient lookup of version numbers across a large number of tlog files, we need a pre-computed index of version numbers across tlog files.
The index could be designed as a list of tlog files, each associated with its lower and upper bound in terms of version numbers. A search will first read this index to quickly find the tlog file containing a given version number, then read that tlog file to find the associated entry.

However, a single tlog file can be large in certain scenarios. Therefore, we could add a secondary index per tlog file. This index will contain a list of <version, pointer> pairs, which will allow the LogReader to quickly find an entry without having to scan the full tlog file. This index will be created and managed by the TransactionLog.

This secondary index, however, duplicates the version number of each log entry. A possible optimisation is to modify the format of the transaction log so that the version number is not stored as part of the log entry.

h4. Transaction Log

The TransactionLog class opens the tlog file in its constructor. This could be problematic with a large number of tlog files, as it will exhaust the file descriptors. One possible solution is to create a subclass for read-only mode that will not open the file in the constructor. Instead, the file will be opened and closed on demand by the TransactionLog#LogReader.

The CDCR Update Logs will take care of converting old transaction log objects into a read-only version.

This, however, has indirect consequences on the initialisation of the UpdateLog, more precisely in the recovery phase (#recoverFromLog), as the UpdateLog might write a commit (line 1418) at the end of an old tlog during replaying.

h4. Integration within the UpdateHandler

We will have to extend the UpdateHandler constructor so that the UpdateLog implementation can be switched based on some configuration keys in the solrconfig.xml file.

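To make the two-level lookup from the Log Index section more concrete, here is a minimal Java sketch. Everything in it is hypothetical: the names `TlogVersionIndexSketch`, `TlogIndex` and `Pointer` do not exist in Solr, `Pointer` is only a stand-in for the proposed LogPointer, and the file names are illustrative. The sketch also assumes that the version ranges of different tlog files do not overlap and that an in-memory sorted map is an acceptable representation for both indexes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch of a two-level version index; not part of Solr. */
public class TlogVersionIndexSketch {

    /** Stand-in for the proposed LogPointer: a tlog file plus a file offset. */
    public static final class Pointer {
        public final String tlogFile;
        public final long offset;

        public Pointer(String tlogFile, long offset) {
            this.tlogFile = tlogFile;
            this.offset = offset;
        }
    }

    /** Second level: for one tlog file, maps a version to the offset of its entry. */
    public static final class TlogIndex {
        public final String tlogFile;
        private final TreeMap<Long, Long> offsetsByVersion = new TreeMap<>();

        public TlogIndex(String tlogFile) {
            this.tlogFile = tlogFile;
        }

        public void add(long version, long offset) {
            offsetsByVersion.put(version, offset);
        }

        public boolean isEmpty() { return offsetsByVersion.isEmpty(); }
        public long minVersion() { return offsetsByVersion.firstKey(); }
        public long maxVersion() { return offsetsByVersion.lastKey(); }
        public Long offsetOf(long version) { return offsetsByVersion.get(version); }
    }

    /** First level: tlog files keyed by their lowest version (ranges assumed disjoint). */
    private final TreeMap<Long, TlogIndex> filesByMinVersion = new TreeMap<>();

    public void register(TlogIndex index) {
        if (index.isEmpty()) {
            throw new IllegalArgumentException("cannot register an empty tlog index");
        }
        filesByMinVersion.put(index.minVersion(), index);
    }

    /** Finds the file whose bounds contain the version, then the entry inside that file. */
    public Optional<Pointer> lookup(long version) {
        Map.Entry<Long, TlogIndex> candidate = filesByMinVersion.floorEntry(version);
        if (candidate == null) {
            return Optional.empty(); // older than every indexed tlog file
        }
        TlogIndex index = candidate.getValue();
        if (version > index.maxVersion()) {
            return Optional.empty(); // falls in a gap between two tlog files
        }
        Long offset = index.offsetOf(version);
        if (offset == null) {
            return Optional.empty(); // inside the bounds, but no such entry
        }
        return Optional.of(new Pointer(index.tlogFile, offset));
    }

    public static void main(String[] args) {
        TlogVersionIndexSketch index = new TlogVersionIndexSketch();

        TlogIndex first = new TlogIndex("tlog.0000000000000000001");
        first.add(100L, 0L);
        first.add(105L, 64L);
        index.register(first);

        TlogIndex second = new TlogIndex("tlog.0000000000000000002");
        second.add(200L, 0L);
        second.add(210L, 128L);
        index.register(second);

        index.lookup(210L).ifPresent(p ->
            System.out.println(p.tlogFile + " @ " + p.offset));
    }
}
```

The first level narrows the search to a single tlog file using only its version bounds, so the per-file index needs to be consulted (and, in a real implementation, possibly loaded) for that one file only.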
> Keep transaction logs around longer
> -----------------------------------
>
>                 Key: SOLR-6460
>                 URL: https://issues.apache.org/jira/browse/SOLR-6460
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>
> Transaction logs are currently deleted relatively quickly... but we need to
> keep them around much longer to be used as a source for cross-datacenter
> recovery. This will also be useful in the future for enabling peer-sync to
> use more historical updates before falling back to replication.