[ https://issues.apache.org/jira/browse/PHOENIX-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Palash Chauhan resolved PHOENIX-7751.
-------------------------------------
    Fix Version/s: 5.4.0
                   5.3.1
       Resolution: Fixed

> Feature to validate table data using PhoenixSyncTable tool b/w source and 
> target cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7751
>             Project: Phoenix
>          Issue Type: Sub-task
>    Affects Versions: 5.2.0, 5.2.1, 5.3.0
>            Reporter: Rahul Kumar
>            Assignee: Rahul Kumar
>            Priority: Major
>             Fix For: 5.4.0, 5.3.1
>
>
> The tool runs on the source cluster and retrieves, from that cluster, the
> list of region boundaries for a table or a section of a table. This list
> becomes the list of splits for the MapReduce (MR) job. For checkpointing
> purposes, the tool records chunks and mapper regions in the output table as
> their processing completes.
>  
> Chunk formation is done on the server side by the Phoenix coprocessor
> UngroupedAggregateRegionObserver. A mapper opens a scan for its mapper region
> on both the source and the target cluster. The mapper region boundaries are
> serialized into a scan attribute on these scans, and the scans also carry an
> attribute that signals the Phoenix coprocessor that they are for chunk
> formation. A scan returns one chunk at a time, which can be either a full
> chunk or a partial chunk. A partial chunk is returned only when the table
> region ends before the mapper region does. On the source cluster, this can
> happen if table region boundaries change due to region splits and merges
> while the tool is running. Partial chunks are expected more often on the
> target cluster, since mapper regions are aligned with the table regions of
> the source cluster, not the target. When a partial chunk is returned, the
> mapper opens another scan to continue from where the previous scan ended.
> This scan also includes a scan attribute for the partial chunk so that the
> scan can complete it.
>  
> After receiving the two copies of a chunk, one from the source and one from
> the target cluster, the mapper compares them. If the copies differ, the tool
> optionally repairs the target copy (that is, if the current run of the tool
> is configured for repair). The repair operation scans the rows of the chunk
> with a raw scan on both clusters. Repair is also done inline in the same
> mapper and is best-effort, i.e., it repairs as much as possible for a given
> mapper or chunk.

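The full/partial chunk protocol in the second paragraph can be simulated in a few lines. This is a hedged sketch, not the coprocessor's logic: chunks are assumed to be a fixed number of rows (`CHUNK_ROWS`), row keys are plain strings, `server_scan` stands in for one scan served by UngroupedAggregateRegionObserver, the `carried` argument stands in for the partial-chunk scan attribute, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

CHUNK_ROWS = 3  # hypothetical chunk size; the real tool's chunk sizing may differ


def region_end_for(start: str, boundaries: List[str]) -> Optional[str]:
    """End key of the table region containing `start` (None for the last region)."""
    i = bisect_right(boundaries, start)
    return boundaries[i] if i < len(boundaries) else None


def server_scan(rows: List[str], boundaries: List[str], start: str,
                mapper_end: Optional[str], carried: List[str]
                ) -> Tuple[List[str], str, Optional[str]]:
    """Simulated server-side chunk formation for a single scan.

    Extends the carried partial chunk with rows from one table region and
    returns (chunk, status, next_start), where status is "full", "partial"
    (the table region ended before the mapper region) or "last" (the mapper
    region is exhausted).
    """
    region_end = region_end_for(start, boundaries)
    limits = [k for k in (region_end, mapper_end) if k is not None]
    stop = min(limits) if limits else None
    chunk = list(carried)
    i = bisect_left(rows, start)
    while i < len(rows) and (stop is None or rows[i] < stop):
        chunk.append(rows[i])
        if len(chunk) == CHUNK_ROWS:
            return chunk, "full", rows[i] + "\x00"  # resume just after this row
        i += 1
    if region_end is not None and (mapper_end is None or region_end < mapper_end):
        return chunk, "partial", region_end
    return chunk, "last", None


def mapper_chunks(rows: List[str], boundaries: List[str],
                  mapper_start: str, mapper_end: Optional[str]) -> List[List[str]]:
    """Client loop: keep opening scans until the mapper region is exhausted,
    passing each partial chunk to the next scan so it can be completed."""
    chunks: List[List[str]] = []
    start: str = mapper_start
    carried: List[str] = []
    while True:
        chunk, status, next_start = server_scan(rows, boundaries, start, mapper_end, carried)
        if status == "partial":
            carried, start = chunk, next_start
            continue
        carried = []
        if chunk:
            chunks.append(chunk)
        if status == "last":
            return chunks
        start = next_start
```

In this model, a single source region and target regions split at "c" and "e" produce the same chunks for a mapper region, because each chunk cut short by a target region boundary is completed by the follow-up scan.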

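The compare-then-repair step in the third paragraph could look roughly like the sketch below. Assumptions to flag: each chunk copy is modeled as a dict from row key to serialized row contents, the copies are compared via a SHA-256 digest (the description does not say how Phoenix compares chunk copies), and `put`/`delete` are hypothetical callbacks that write to the target cluster. Best-effort repair is modeled by recording failed keys and continuing rather than aborting.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Rows = Dict[str, str]  # hypothetical: row key -> serialized row contents


def chunk_digest(rows: Rows) -> str:
    """Digest of a chunk copy: rows are hashed in sorted key order, and each
    field is length-prefixed so field boundaries are unambiguous."""
    h = hashlib.sha256()
    for key in sorted(rows):
        for field in (key, rows[key]):
            data = field.encode("utf-8")
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.hexdigest()


def compare_and_repair(source: Rows, target: Rows, repair: bool,
                       put: Callable[[str, str], None],
                       delete: Callable[[str], None]) -> Tuple[bool, List[str]]:
    """Compare two copies of a chunk; if they differ and repair is enabled,
    fix the target row by row on a best-effort basis.

    Returns (matched, failed_keys); failed_keys lists rows whose repair raised.
    """
    if chunk_digest(source) == chunk_digest(target):
        return True, []
    failed: List[str] = []
    if not repair:
        return False, failed
    for key in sorted(source):
        if target.get(key) != source[key]:  # missing or different on target
            try:
                put(key, source[key])
            except Exception:
                failed.append(key)  # best effort: record and keep going
    for key in sorted(target.keys() - source.keys()):  # extra rows on target
        try:
            delete(key)
        except Exception:
            failed.append(key)
    return False, failed
```

The dicts stand in for the rows returned by the raw scans on each cluster; an HBase raw scan also returns delete markers and deleted cells, which this sketch ignores.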

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
