[ https://issues.apache.org/jira/browse/HBASE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632330#comment-14632330 ]
Lars Hofhansl commented on HBASE-14014:
---------------------------------------

Some more thoughts. Would it be correct to batch edits for multiple WALKeys and mark the result with both the latest write time and the latest seqnum seen?
If so, I can freely recombine edits for the same table and source cluster ids, and hence group Cells by row across WALEdits.

I.e. if we had two edits, one with write time T1 and seqnum N1 and one with write time T2 and seqnum N2, with cells for the same table, cluster ids, and rows, I would recombine these into a single WALEdit with write time T2 and seqnum N2.

As a side effect, this would also reduce the amount of data to be sent to the sink.

> Explore row-by-row grouping options
> -----------------------------------
>
>                 Key: HBASE-14014
>                 URL: https://issues.apache.org/jira/browse/HBASE-14014
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Lars Hofhansl
>
> See discussion in parent.
> We need to consider the following attributes of WALKey:
> * The cluster ids
> * Table Name
> * write time (here we could use the latest of any batch)
> * seqNum
> As long as we preserve these we can rearrange the cells between WALEdits.
> Since seqNum is unique this will be a challenge. Currently it is not used,
> but we shouldn't design anything that prevents us from providing better
> ordering guarantees using seqNum.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
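
For illustration only, the recombination proposed in the comment could be sketched roughly as follows. This is a minimal sketch, not HBase code: `Key`, `Cell`, `WalEntry`, and `merge` are hypothetical, simplified stand-ins for WALKey, Cell, and a WALKey/WALEdit pair, and taking the maximum of the write times and seqnums is an assumed reading of "latest seen".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WalEditMerger {
    // Hypothetical, simplified stand-ins; these are NOT the real HBase classes.
    record Key(String table, Set<String> clusterIds, long writeTime, long seqNum) {}
    record Cell(String row, String qualifier, String value) {}
    record WalEntry(Key key, List<Cell> cells) {}

    // Edits may only be recombined when table and cluster ids match.
    record GroupId(String table, Set<String> clusterIds) {}

    // Mutable accumulator for one (table, cluster ids) group.
    static final class Acc {
        long writeTime = Long.MIN_VALUE;
        long seqNum = Long.MIN_VALUE;
        // LinkedHashMap keeps rows in first-seen order.
        final Map<String, List<Cell>> cellsByRow = new LinkedHashMap<>();
    }

    static List<WalEntry> merge(List<WalEntry> entries) {
        Map<GroupId, Acc> groups = new LinkedHashMap<>();
        for (WalEntry e : entries) {
            GroupId id = new GroupId(e.key().table(), e.key().clusterIds());
            Acc acc = groups.computeIfAbsent(id, k -> new Acc());
            // Assumption: "latest" means the maximum value seen in the batch.
            acc.writeTime = Math.max(acc.writeTime, e.key().writeTime());
            // Collapsing seqnums drops the intermediate ones; the issue
            // description flags this as a concern for future ordering guarantees.
            acc.seqNum = Math.max(acc.seqNum, e.key().seqNum());
            for (Cell c : e.cells()) {
                acc.cellsByRow.computeIfAbsent(c.row(), r -> new ArrayList<>()).add(c);
            }
        }

        List<WalEntry> merged = new ArrayList<>();
        for (Map.Entry<GroupId, Acc> g : groups.entrySet()) {
            Acc acc = g.getValue();
            // Emit the cells row by row so each row's cells are adjacent.
            List<Cell> cells = new ArrayList<>();
            acc.cellsByRow.values().forEach(cells::addAll);
            Key key = new Key(g.getKey().table(), g.getKey().clusterIds(),
                              acc.writeTime, acc.seqNum);
            merged.add(new WalEntry(key, cells));
        }
        return merged;
    }
}
```

With two entries for the same table and cluster ids, (T1, N1) and (T2, N2), `merge` returns a single entry stamped with the larger write time and seqnum, and all cells for a given row sit next to each other. Whether collapsing seqnums this way is acceptable is exactly the open question raised in the comment.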