[ https://issues.apache.org/jira/browse/PHOENIX-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Palash Chauhan resolved PHOENIX-7751.
-------------------------------------
Fix Version/s: 5.4.0, 5.3.1
Resolution: Fixed
> Feature to validate table data using PhoenixSyncTable tool b/w source and
> target cluster
> ----------------------------------------------------------------------------------------
>
> Key: PHOENIX-7751
> URL: https://issues.apache.org/jira/browse/PHOENIX-7751
> Project: Phoenix
> Issue Type: Sub-task
> Affects Versions: 5.2.0, 5.2.1, 5.3.0
> Reporter: Rahul Kumar
> Assignee: Rahul Kumar
> Priority: Major
> Fix For: 5.4.0, 5.3.1
>
>
> The tool runs on the source cluster and gets the list of region boundaries
> for a table (or a section of a table) from that cluster. These boundaries
> become the splits for the MR job, one mapper region per table region. For
> checkpointing, the tool records each chunk and each mapper region in the
> output table as its processing completes.
>
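As a rough illustration of the split planning and checkpointing described above, the Java sketch below turns a list of region start keys into mapper regions and skips those already recorded as complete. Everything in it is hypothetical: `SplitPlanner`, `MapperRegion`, the string row keys, the `id()` format, and the in-memory set standing in for the output table are assumptions, not PhoenixSyncTable code. The real tool works with HBase `byte[]` keys, and restricting the run to a table section would additionally clip the first and last ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: source-cluster region boundaries -> mapper splits,
// skipping splits a previous run already checkpointed in the output table.
final class SplitPlanner {

    /** One mapper split: rows in [startKey, endKey); "" as endKey means end of table. */
    static final class MapperRegion {
        final String startKey;
        final String endKey;

        MapperRegion(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        /** Key under which completion is checkpointed (hypothetical format). */
        String id() {
            return startKey + "|" + endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /**
     * regionStartKeys: sorted start keys of the table's regions on the source
     * cluster (the first region starts at ""). checkpointed: ids of mapper
     * regions the output table already marks as done.
     */
    static List<MapperRegion> planSplits(List<String> regionStartKeys, Set<String> checkpointed) {
        List<MapperRegion> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // Each region ends where the next one starts; the last is unbounded.
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            MapperRegion region = new MapperRegion(start, end);
            // A rerun only schedules mapper regions that have not completed yet.
            if (!checkpointed.contains(region.id())) {
                splits.add(region);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three source regions; the first one finished in an earlier run.
        List<String> startKeys = Arrays.asList("", "g", "p");
        Set<String> done = new HashSet<>(Arrays.asList("|g"));
        System.out.println(planSplits(startKeys, done));
    }
}
```

Running `main` prints `[[g, p), [p, )]`: only the two unfinished mapper regions are scheduled again.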
> Chunk formation is done on the server side by the Phoenix coprocessor
> UngroupedAggregateRegionObserver. A mapper opens a scan for its mapper
> region on both the source and the target cluster. The mapper region
> boundaries are serialized into a scan attribute on these scans, and a
> second attribute signals the Phoenix coprocessor that the scans are for
> chunk formation. Each scan returns one chunk at a time. A chunk is either
> full or partial. A partial chunk is returned only when the table region
> ends before the mapper region does. On the source cluster, this can happen
> if region splits or merges change the table region boundaries while the
> tool is running. Partial chunks are expected more often on the target
> cluster, because mapper regions are aligned with the table regions of the
> source cluster, not the target. When a partial chunk is returned, the
> mapper opens another scan that continues from where the previous scan
> ended. This scan carries a scan attribute for the partial chunk so that
> the server side can complete it.
>
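The full/partial chunk behavior can be modeled in memory. In this hypothetical sketch, `ChunkFormationSketch` and its methods are illustrative names, not Phoenix classes, and a chunk being a fixed number of rows is an assumption. The server-side scan is simulated as a loop that cannot cross the end of the table region it starts in, and the carried `current` list plays the role of the partial-chunk scan attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of chunk formation; not the coprocessor code.
// Row keys are Strings, chunks are fixed row counts, "" as an end key = unbounded.
final class ChunkFormationSketch {

    // True if key sorts before the (exclusive) end key.
    static boolean beforeEnd(String key, String end) {
        return end.isEmpty() || key.compareTo(end) < 0;
    }

    // The earlier of two exclusive end keys.
    static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    // End key of the table region (on this cluster) that contains key.
    static String regionEndFor(List<String> regionStarts, String key) {
        for (String start : regionStarts) {
            if (start.compareTo(key) > 0) return start;
        }
        return "";
    }

    /** Forms chunks for mapper region [mapperStart, mapperEnd) on one cluster. */
    static List<List<String>> formChunks(List<String> rows, List<String> regionStarts,
                                         String mapperStart, String mapperEnd, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>(); // partial chunk carried between scans
        String from = mapperStart;
        while (true) {
            // One "scan": it stops at the end of its table region or mapper region.
            String regionEnd = regionEndFor(regionStarts, from);
            String scanEnd = minEnd(regionEnd, mapperEnd);
            for (String row : rows) {
                if (row.compareTo(from) < 0 || !beforeEnd(row, scanEnd)) continue;
                current.add(row);
                if (current.size() == chunkSize) { // full chunk
                    chunks.add(current);
                    current = new ArrayList<>();
                }
            }
            if (scanEnd.equals(mapperEnd)) { // mapper region exhausted
                if (!current.isEmpty()) chunks.add(current);
                return chunks;
            }
            // Table region ended first: 'current' is a partial chunk. The next
            // scan resumes at regionEnd and keeps filling the same chunk.
            from = regionEnd;
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "f", "g");
        // Source: one table region covers the mapper region ["", "m").
        System.out.println(formChunks(rows, Arrays.asList("", "m"), "", "m", 3));
        // Target: an extra split at "e" forces a partial chunk and a second scan.
        System.out.println(formChunks(rows, Arrays.asList("", "e", "m"), "", "m", 3));
    }
}
```

Running `main` prints `[[a, b, c], [d, f, g]]` twice: in this model the extra target split at `e` changes how many scans are needed but not the chunk boundaries, which is what lets the two copies of a chunk be compared row for row.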
> After receiving both copies of a chunk, one from the source cluster and one
> from the target cluster, the mapper compares them. If the copies differ, the
> tool optionally repairs the target copy (that is, if the current run is
> configured for repair). Repair requires scanning the chunk's rows with a raw
> scan on both clusters. It is also done inline in the same mapper, on a
> best-effort basis, i.e. it repairs as much as possible for a given mapper or
> chunk.
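One way to picture the compare-and-repair step is sketched below. The SHA-256 digest comparison, the put/delete `Mutation` model, the `repairEnabled` flag, and the `ChunkRepairSketch` names are all assumptions for illustration; the description above only says the chunk copies are compared and that repair uses raw scans of both clusters. Each mutation is applied independently and failures are counted rather than aborting, mirroring the best-effort behavior described.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical compare-then-repair for one chunk. A rowKey -> value map stands
// in for the cells a raw scan would return from each cluster.
final class ChunkRepairSketch {

    /** A hypothetical target-side mutation: a put or a delete of one row. */
    static final class Mutation {
        final boolean delete;
        final String key;
        final String value; // null for deletes

        Mutation(boolean delete, String key, String value) {
            this.delete = delete;
            this.key = key;
            this.value = value;
        }
    }

    // Length-prefix each field so concatenation of different rows cannot collide.
    private static void updateField(MessageDigest md, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        md.update(bytes);
    }

    /** Digest of a chunk's rows in key order (TreeMap iterates sorted). */
    static byte[] digest(TreeMap<String, String> chunk) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> row : chunk.entrySet()) {
                updateField(md, row.getKey());
                updateField(md, row.getValue());
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Mutations that would make the target copy match the source copy. */
    static List<Mutation> diff(TreeMap<String, String> source, TreeMap<String, String> target) {
        List<Mutation> out = new ArrayList<>();
        for (Map.Entry<String, String> row : source.entrySet()) {
            if (!row.getValue().equals(target.get(row.getKey()))) {
                out.add(new Mutation(false, row.getKey(), row.getValue())); // missing or stale
            }
        }
        for (String key : target.keySet()) {
            if (!source.containsKey(key)) {
                out.add(new Mutation(true, key, null)); // extra row on target
            }
        }
        return out;
    }

    /** Best effort: try every mutation, count failures instead of stopping. */
    static int repair(List<Mutation> mutations, Consumer<Mutation> applyToTarget) {
        int failed = 0;
        for (Mutation m : mutations) {
            try {
                applyToTarget.accept(m);
            } catch (RuntimeException e) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        TreeMap<String, String> source = new TreeMap<>();
        source.put("a", "1");
        source.put("b", "2");
        source.put("c", "3");
        TreeMap<String, String> target = new TreeMap<>(source);
        target.put("b", "stale");
        target.remove("c");
        target.put("d", "4");

        boolean repairEnabled = true; // hypothetical stand-in for the repair option
        if (!MessageDigest.isEqual(digest(source), digest(target)) && repairEnabled) {
            List<Mutation> mutations = diff(source, target);
            int failed = repair(mutations, m -> {
                if (m.delete) target.remove(m.key); else target.put(m.key, m.value);
            });
            System.out.println(mutations.size() + " mutations, " + failed
                    + " failed, in sync: " + source.equals(target));
        }
    }
}
```

With the sample data, `main` prints `3 mutations, 0 failed, in sync: true`: one put fixes the stale value of `b`, one put restores the missing `c`, and one delete removes the extra `d`.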
--
This message was sent by Atlassian Jira
(v8.20.10#820010)