[ https://issues.apache.org/jira/browse/HUDI-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17915917#comment-17915917 ]
Y Ethan Guo commented on HUDI-8632:
-----------------------------------
Found another issue: HUDI-8896. Reading partition columns for a bootstrapped
table (bootstrap skeleton + data file) currently relies on engine-specific
handling rather than on the file group reader. As a result, compaction and
clustering that use the file group reader to read records from a metadata-only
bootstrapped table can encounter nulls in the partition columns, so the new
base file is written with nulls in those columns.
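
To illustrate the gap, here is a minimal sketch of the kind of engine-agnostic
fill-in the file group reader would need. It assumes the partition values can
be recovered from a Hive-style partition path and models a record as a plain
map. The class and method names (BootstrapPartitionFill, parsePartitionPath,
fillPartitionColumns) are hypothetical and are not Hudi APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi: shows how partition columns that a
// metadata-only bootstrap read leaves as null could be filled from the path.
final class BootstrapPartitionFill {

    // Parses a Hive-style partition path such as "datestr=2025-01-01/region=us"
    // into an ordered field -> value map. Segments without '=' are skipped.
    static Map<String, String> parsePartitionPath(String partitionPath) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Returns a copy of the record in which every listed partition field that
    // is missing or null takes its value from the partition path. Values stay
    // strings here; a real reader would have to cast them to the schema type.
    static Map<String, Object> fillPartitionColumns(
            Map<String, Object> record, List<String> partitionFields, String partitionPath) {
        Map<String, String> fromPath = parsePartitionPath(partitionPath);
        Map<String, Object> filled = new LinkedHashMap<>(record);
        for (String field : partitionFields) {
            if (filled.get(field) == null && fromPath.containsKey(field)) {
                filled.put(field, fromPath.get(field));
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        // Record as merged from skeleton + data file, partition column still null.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("_hoodie_record_key", "key1");
        record.put("fare", 12.5);
        record.put("datestr", null);
        System.out.println(
            fillPartitionColumns(record, Arrays.asList("datestr"), "datestr=2025-01-01"));
    }
}
```

Without a step like this inside the reader itself, the merged record keeps the
null and the rewritten base file inherits it.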
> Support bootstrap files in file group reader-based compaction
> -------------------------------------------------------------
>
> Key: HUDI-8632
> URL: https://issues.apache.org/jira/browse/HUDI-8632
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Assignee: Y Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.1
>
> Original Estimate: 4h
> Time Spent: 8h
> Remaining Estimate: 2h
>
> testMetadataBootstrapMORPartitionedInlineCompactionOn fails with file group
> reader-based compaction (HoodieSparkMergeHandleV2). Currently, if the
> compaction plan and its operations contain bootstrap files, the compaction
> goes through the old flow using the regular HoodieMergeHandle. We should
> support bootstrap files in file group reader-based compaction in
> HoodieSparkMergeHandleV2.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)