FWIW, using the NamespaceGroupingStrategy introduced in HBASE-14456 together with multiwal may make the design simpler.
I think the work is valuable and HBASE-16415 is waiting for an owner, so just go ahead and take it. Further discussion can happen on HBASE-16415, as Ted suggested. Thanks.

Best Regards,
Yu

On 9 June 2017 at 06:49, Ted Yu <[email protected]> wrote:

> Mind putting the below proposal on HBASE-16415 ?
>
> Thanks
>
> On Thu, Jun 8, 2017 at 3:24 PM, Jan Kunigk <[email protected]> wrote:
>
> > Hi, with regards to the above JIRA I would like to make the following
> > contribution.
> > I am looking very much forward to feedback and comments.
> >
> > ReplicationSourceWALReaderThread continuously follows WALEntries to be
> > replicated for a specified WAL via WAL.Reader's next() method and adds
> > them to WALEntryBatches.
> >
> > As far as I can see, those WALEntries are copies of the originally
> > persisted local WALs. In order to direct these entries to TableNames
> > different from the source, I propose to intercept the copied WALEntries
> > on the source cluster and probe whether they belong to a TableName that
> > is to be rewritten.
> >
> > If such a probe is successful, then the WALKey of any such WALEntry needs
> > to be changed accordingly. WALKey provides a getTableName() method, but
> > currently not a setTableName() method, which would simply have to be
> > added to change the private TableName member.
> >
> > I propose to intercept the entries via a new method redirectEntry(),
> > which is invoked shortly before the entry is added to its WALEntryBatch
> > and immediately after the entry has been filtered by filterEntry(), like
> > so:
> >
> >   Entry entry = entryStream.next();
> >   if (updateSerialReplPos(batch, entry)) {
> >     batch.lastWalPosition = entryStream.getPosition();
> >     break;
> >   }
> >   entry = filterEntry(entry);
> >   entry = redirectEntry(entry); // <--
> >   if (entry != null) {
> >     WALEdit edit = entry.getEdit();
> >     if (edit != null && !edit.isEmpty()) {
> >       long entrySize = getEntrySize(entry);
> >       batch.addEntry(entry);
> >
> > redirectEntry() bases its decisions on a 'Map<TableName, TableName>
> > redirections', where the keys are the source table names and the values
> > are the destination table names. The Map would be included in the
> > ReplicationPeerConfig, which can be obtained from within
> > ReplicationSourceWALReaderThread via the instance of
> > ReplicationSourceManager, which is in turn passed as an argument to both
> > available constructors.
> >
> > When a TableName object from a WALKey from the WALEntryStream matches the
> > key of any of the entries in the redirections map, that WALKey's
> > TableName is replaced by the value of that entry.
> >
> > The rationale for intercepting on the sending side is that the setup and
> > peer management is performed on the source today already and there is no
> > mechanism I can see which would carry the redirection rules themselves
> > across.
> >
> > Similarly to the way the hbase shell lets you specify the tables and
> > column families to be replicated (set_peer_tableCFs), I propose a new
> > command (also on the sending side), 'set_peer_table_redirections', which
> > accepts a map of Strings corresponding to the required final
> > specification of the redirections as TableNames:
> >
> >   set_peer_table_redirections['ns_source1:table_source1' : 'ns_dest1:table_dest1',
> >     'ns_source2:table_source2' : 'ns_dest2:table_dest2', ...
> >     'ns_sourcen:table_sourcen' : 'ns_destn:table_destn', ]
> >
> > Thanks, best, J
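
To make the lookup in the quoted proposal concrete, here is a minimal, self-contained Java sketch of what redirectEntry() might do. It is only an illustration under assumptions: TableName, Key and Entry are simplified stand-ins written for this example, not the HBase classes; setTableName() is the setter the proposal says would have to be added; parseRedirections() is a hypothetical helper showing how the string pairs from the proposed shell command could become a Map<TableName, TableName>; and the map is passed as a parameter here rather than read from ReplicationPeerConfig.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: TableName, Key and Entry are simplified stand-ins for
// HBase's TableName, WALKey and WAL.Entry, not the real classes.
class RedirectSketch {

  static final class TableName {
    final String namespace;
    final String qualifier;

    TableName(String namespace, String qualifier) {
      this.namespace = namespace;
      this.qualifier = qualifier;
    }

    // Parses "ns:table"; a bare "table" is placed in the "default" namespace.
    static TableName valueOf(String s) {
      int i = s.indexOf(':');
      return i < 0 ? new TableName("default", s)
                   : new TableName(s.substring(0, i), s.substring(i + 1));
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof TableName)) return false;
      TableName t = (TableName) o;
      return namespace.equals(t.namespace) && qualifier.equals(t.qualifier);
    }

    @Override public int hashCode() { return Objects.hash(namespace, qualifier); }

    @Override public String toString() { return namespace + ":" + qualifier; }
  }

  // Stand-in for WALKey, with the setTableName() the proposal would add.
  static final class Key {
    private TableName tableName;
    Key(TableName tableName) { this.tableName = tableName; }
    TableName getTableName() { return tableName; }
    void setTableName(TableName tableName) { this.tableName = tableName; }
  }

  static final class Entry {
    private final Key key;
    Entry(Key key) { this.key = key; }
    Key getKey() { return key; }
  }

  // Hypothetical helper: turns the shell command's string pairs into the map.
  static Map<TableName, TableName> parseRedirections(Map<String, String> raw) {
    Map<TableName, TableName> out = new HashMap<>();
    for (Map.Entry<String, String> e : raw.entrySet()) {
      out.put(TableName.valueOf(e.getKey()), TableName.valueOf(e.getValue()));
    }
    return out;
  }

  // Rewrites the entry's table name when a redirection matches. A null entry
  // (one dropped earlier by filterEntry()) is passed through unchanged.
  static Entry redirectEntry(Entry entry, Map<TableName, TableName> redirections) {
    if (entry == null) return null;
    TableName target = redirections.get(entry.getKey().getTableName());
    if (target != null) entry.getKey().setTableName(target);
    return entry;
  }

  public static void main(String[] args) {
    Map<String, String> raw = new HashMap<>();
    raw.put("ns_source1:table_source1", "ns_dest1:table_dest1");
    Map<TableName, TableName> redirections = parseRedirections(raw);

    Entry hit = new Entry(new Key(TableName.valueOf("ns_source1:table_source1")));
    Entry miss = new Entry(new Key(TableName.valueOf("ns_other:table_other")));
    // prints ns_dest1:table_dest1
    System.out.println(redirectEntry(hit, redirections).getKey().getTableName());
    // prints ns_other:table_other
    System.out.println(redirectEntry(miss, redirections).getKey().getTableName());
  }
}
```

Returning null for a null input keeps the existing `if (entry != null)` check in the reader loop working unchanged when filterEntry() has already dropped the entry.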
