[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-20 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Status: Closed  (was: Patch Available)

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>
>
> Below is the log output from running `TestHoodieBackedMetadata#testSync` on
> MOR tables. This seems to be a more fundamental issue with deleting instant
> files during restore.
> What happens is that the restore rolls back a delta commit that has not been
> synced to the metadata table yet (20210103224054 in the example). That delta
> commit introduced a new log file, which was never added to the metadata
> table, but the restore effectively deletes 20210103224054.deltacommit, so the
> metadata sync never sees that commit.
> {code}
> Commit 20210103224042 added HoodieKey { recordKey=2016/03/15 
> partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, 
> creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_4-2-6_20210103224041.parquet],
>  deletions=[], }
>   HoodieKey { recordKey=2015/03/16 partitionPath=files} 
> HoodieMetadataPayload {key=2015/03/16, type=2, 
> creations=[028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet,
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet], 
> deletions=[], }
>   HoodieKey { recordKey=2015/03/17 partitionPath=files} 
> HoodieMetadataPayload {key=2015/03/17, type=2, 
> creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_1-2-3_20210103224041.parquet,
>  4733dbda-7824-4411-a708-4b2d978f887b-0_4-9-22_20210103224042.parquet, 
> 532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_3-9-21_20210103224042.parquet, 
> 6842e596-46b3-4546-9faa-8a7f8c674a17-0_0-2-2_20210103224041.parquet, 
> 7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_3-2-5_20210103224041.parquet, 
> d1906fdc-66ca-48a4-86b6-687c865d939d-0_2-9-20_20210103224042.parquet, 
> fd446460-a662-434a-a6ab-1cd498af94ca-0_2-2-4_20210103224041.parquet], 
> deletions=[], }
>   HoodieKey { recordKey=__all_partitions__ partitionPath=files} 
> HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 
> 2015/03/17, 2016/03/15], deletions=[], } 
>  Syncing [20210103224045__deltacommit__COMPLETED] to metadata table.
> Commit 20210103224045 added HoodieKey { recordKey=2016/03/15 
> partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, 
> creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_0-31-52_20210103224045.parquet],
>  deletions=[], }
>   HoodieKey { recordKey=2015/03/16 partitionPath=files} 
> HoodieMetadataPayload {key=2015/03/16, type=2, 
> creations=[25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet],
>  deletions=[], }
>   HoodieKey { recordKey=2015/03/17 partitionPath=files} 
> HoodieMetadataPayload {key=2015/03/17, type=2, 
> creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_2-31-54_20210103224045.parquet],
>  deletions=[], }
>   HoodieKey { recordKey=__all_partitions__ partitionPath=files} 
> HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 
> 2015/03/17, 2016/03/15], deletions=[], }
>
> >>> (after compaction) State at 20210103224051 files
>  .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148
>  028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet
>  028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet
>
> >>> (after delete) State at 20210103224052 files
>  .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148
>  028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet
>  028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet
>
> >>> (after clean) State at 20210103224053 files
>  028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
>  25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet
>
> >>> (after update) State at 20210103224054 files
>  .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-20 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Status: Patch Available  (was: In Progress)

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1502:
-
Labels: pull-request-available  (was: )

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-04 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Status: Open  (was: New)

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-04 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Parent: HUDI-1292
Issue Type: Sub-task  (was: Bug)

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-04 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Status: In Progress  (was: Open)

> Restore on MOR table leaves metadata table out-of-sync from data table
> --
>
> Key: HUDI-1502
> URL: https://issues.apache.org/jira/browse/HUDI-1502
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: image-2021-01-03-22-48-54-646.png
>

[jira] [Updated] (HUDI-1502) Restore on MOR table leaves metadata table out-of-sync from data table

2021-01-03 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1502:
-
Description: 
Below is the log output from running `TestHoodieBackedMetadata#testSync` on
MOR tables. This seems to be a more fundamental issue with deleting instant
files during restore.

What happens is that the restore rolls back a delta commit that has not been
synced to the metadata table yet (20210103224054 in the example). That delta
commit introduced a new log file, which was never added to the metadata table,
but the restore effectively deletes 20210103224054.deltacommit, so the metadata
sync never sees that commit.
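
The failure mode described above can be illustrated with a toy model. This is a minimal Python sketch, not Hudi code. It assumes that the metadata table is synced by replaying completed data-table instants newer than the last synced one, and that each instant records the files it added. `Instant`, `DataTable`, `MetadataTable`, the restore instant time `20210103224055` and the restore target `20210103224053` are all hypothetical. The two file names are taken from the listings in the log: the `.log.1` file written at 20210103224054 is still present in the `(after restore)` listing.

```python
# Toy model (hypothetical, not Hudi code): a restore that deletes an unsynced
# deltacommit instant leaves that commit's log file invisible to the metadata table.
from dataclasses import dataclass, field


@dataclass
class Instant:
    ts: str                 # instant time, e.g. "20210103224054"
    action: str             # "commit", "deltacommit", "restore", ...
    files_added: list[str]  # files written by this instant


@dataclass
class DataTable:
    timeline: list[Instant] = field(default_factory=list)  # completed instants
    storage: set[str] = field(default_factory=set)          # files on storage

    def write(self, ts: str, action: str, files: list[str]) -> None:
        self.timeline.append(Instant(ts, action, files))
        self.storage.update(files)

    def restore(self, ts: str, to_ts: str) -> None:
        # Drop every instant newer than to_ts (its instant file is deleted).
        # Assumption: on MOR the log files those instants wrote stay on storage,
        # matching the "(after restore)" listing; this model does not track the
        # appended rollback blocks (.log.2 files).
        self.timeline = [i for i in self.timeline if i.ts <= to_ts]
        self.timeline.append(Instant(ts, "restore", []))


@dataclass
class MetadataTable:
    last_synced: str = ""
    files: set[str] = field(default_factory=set)

    def sync(self, table: DataTable) -> None:
        # Replay completed instants newer than the last synced one, in order.
        for instant in table.timeline:
            if instant.ts > self.last_synced:
                self.files.update(instant.files_added)
                self.last_synced = instant.ts


BASE = "028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet"
LOG = ".028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262"

data, meta = DataTable(), MetadataTable()
data.write("20210103224051", "commit", [BASE])
meta.sync(data)                                        # metadata now knows BASE
data.write("20210103224054", "deltacommit", [LOG])     # not synced yet
data.restore("20210103224055", to_ts="20210103224053")  # hypothetical instant times
meta.sync(data)                                        # 054 is gone, LOG never seen

missing = data.storage - meta.files
print(sorted(missing))
```

The set difference is what a listing-based comparison between storage and the metadata table would flag: the log file exists on storage, but no instant left on the timeline says it was added, so replaying the timeline cannot recover it.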

{code}
Commit 20210103224042 added HoodieKey { recordKey=2016/03/15 
partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, 
creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_4-2-6_20210103224041.parquet],
 deletions=[], }
HoodieKey { recordKey=2015/03/16 partitionPath=files} 
HoodieMetadataPayload {key=2015/03/16, type=2, 
creations=[028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet,
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet], 
deletions=[], }
HoodieKey { recordKey=2015/03/17 partitionPath=files} 
HoodieMetadataPayload {key=2015/03/17, type=2, 
creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_1-2-3_20210103224041.parquet, 
4733dbda-7824-4411-a708-4b2d978f887b-0_4-9-22_20210103224042.parquet, 
532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_3-9-21_20210103224042.parquet, 
6842e596-46b3-4546-9faa-8a7f8c674a17-0_0-2-2_20210103224041.parquet, 
7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_3-2-5_20210103224041.parquet, 
d1906fdc-66ca-48a4-86b6-687c865d939d-0_2-9-20_20210103224042.parquet, 
fd446460-a662-434a-a6ab-1cd498af94ca-0_2-2-4_20210103224041.parquet], 
deletions=[], }
HoodieKey { recordKey=__all_partitions__ partitionPath=files} 
HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 
2015/03/17, 2016/03/15], deletions=[], } 
 Syncing [20210103224045__deltacommit__COMPLETED] to metadata table.
Commit 20210103224045 added HoodieKey { recordKey=2016/03/15 
partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, 
creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_0-31-52_20210103224045.parquet],
 deletions=[], }
HoodieKey { recordKey=2015/03/16 partitionPath=files} 
HoodieMetadataPayload {key=2015/03/16, type=2, 
creations=[25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet],
 deletions=[], }
HoodieKey { recordKey=2015/03/17 partitionPath=files} 
HoodieMetadataPayload {key=2015/03/17, type=2, 
creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_2-31-54_20210103224045.parquet],
 deletions=[], }
HoodieKey { recordKey=__all_partitions__ partitionPath=files} 
HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 
2015/03/17, 2016/03/15], deletions=[], }

>>> (after compaction) State at 20210103224051 files
 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet

>>> (after delete) State at 20210103224052 files
 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet

>>> (after clean) State at 20210103224053 files
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet

>>> (after update) State at 20210103224054 files
 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262
 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.1_2-160-263
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet

>>> (after restore) State after restore files
 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262
 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.2_1-0-1
 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.1_2-160-263
 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.2_1-0-1
 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet
 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet

Syncing