[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC
[ https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-6962:
----------------------------
    Fix Version/s: 1.0.0

> Correct the behavior of bulk insert for NB-CC
> ---------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> How should the case be handled where the multiple writers include a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because other
> jobs will use the fixed file group ID suffix instead of a random UUID
> suffix. The behavior needs to be consistent to prevent later writer jobs
> from writing records with the same primary key to different file groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot
> be deferred to the compaction phase. Because bulk insert writers flush data
> into base files, if there are multiple bulk insert jobs, there might exist
> multiple base files in the same bucket.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC
[ https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6962:
--------------------------------
    Labels: pull-request-available  (was: )

> Correct the behavior of bulk insert for NB-CC
> ---------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> How should the case be handled where the multiple writers include a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because other
> jobs will use the fixed file group ID suffix instead of a random UUID
> suffix. The behavior needs to be consistent to prevent later writer jobs
> from writing records with the same primary key to different file groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot
> be deferred to the compaction phase. Because bulk insert writers flush data
> into base files, if there are multiple bulk insert jobs, there might exist
> multiple base files in the same bucket.
[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC
[ https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhang updated HUDI-6962:
----------------------------
    Description: 
How should the case be handled where the multiple writers include a job with a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because other jobs will use the fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
2. Deal with the transaction: The conflict resolution of bulk insert cannot be deferred to the compaction phase. Because bulk insert writers flush data into base files, if there are multiple bulk insert jobs, there might exist multiple base files in the same bucket.

  was:
How should the case be handled where the multiple writers include a job with a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all subsequent jobs will use the fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
2. Deal with the transaction: The conflict resolution of bulk insert cannot be deferred to the compaction phase. Because bulk insert writers flush data into base files, if there are multiple bulk insert jobs, there might exist multiple base files in the same bucket.

> Correct the behavior of bulk insert for NB-CC
> ---------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>
> How should the case be handled where the multiple writers include a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because other
> jobs will use the fixed file group ID suffix instead of a random UUID
> suffix. The behavior needs to be consistent to prevent later writer jobs
> from writing records with the same primary key to different file groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot
> be deferred to the compaction phase. Because bulk insert writers flush data
> into base files, if there are multiple bulk insert jobs, there might exist
> multiple base files in the same bucket.
[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC
[ https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhang updated HUDI-6962:
----------------------------
    Description: 
How should the case be handled where the multiple writers include a job with a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all subsequent jobs will use the fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
2. Deal with the transaction: The conflict resolution of bulk insert cannot be deferred to the compaction phase. Because bulk insert writers flush data into base files, if there are multiple bulk insert jobs, there might exist multiple base files in the same bucket.

  was:
How should the case be handled where the multiple writers include a job with a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all subsequent jobs will use the fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
2. Resolve conflict: The conflict resolution of bulk insert cannot be deferred to the compaction phase. Because bulk insert writers flush data into base files, if there are multiple bulk insert jobs, there might exist multiple base files in the same bucket.

> Correct the behavior of bulk insert for NB-CC
> ---------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>
> How should the case be handled where the multiple writers include a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because all
> subsequent jobs will use the fixed file group ID suffix instead of a random
> UUID suffix. The behavior needs to be consistent to prevent later writer
> jobs from writing records with the same primary key to different file
> groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot
> be deferred to the compaction phase. Because bulk insert writers flush data
> into base files, if there are multiple bulk insert jobs, there might exist
> multiple base files in the same bucket.
[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC
[ https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhang updated HUDI-6962:
----------------------------
    Description: 
How should the case be handled where the multiple writers include a job with a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all subsequent jobs will use the fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
2. Resolve conflict: The conflict resolution of bulk insert cannot be deferred to the compaction phase. Because bulk insert writers flush data into base files, if there are multiple bulk insert jobs, there might exist multiple base files in the same bucket.

> Correct the behavior of bulk insert for NB-CC
> ---------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>
> How should the case be handled where the multiple writers include a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because all
> subsequent jobs will use the fixed file group ID suffix instead of a random
> UUID suffix. The behavior needs to be consistent to prevent later writer
> jobs from writing records with the same primary key to different file
> groups.
> 2. Resolve conflict: The conflict resolution of bulk insert cannot be
> deferred to the compaction phase. Because bulk insert writers flush data
> into base files, if there are multiple bulk insert jobs, there might exist
> multiple base files in the same bucket.
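
The two points in the issue description can be illustrated with a small sketch. This is plain Python, not Hudi code: the hash function, the bucket count, the ID layout (a zero-padded bucket number plus a suffix), the all-zero fixed suffix, and every function name below are assumptions made up for the example. It only shows why a fixed suffix makes independent writers agree on a file group, and why two bulk insert jobs that target the same bucket need a conflict check at commit time.

```python
import hashlib
import uuid
from typing import Set

NUM_BUCKETS = 4  # hypothetical bucket count for the example


def bucket_of(primary_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a primary key to a bucket with a stable hash (illustrative only)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def random_file_group_id(bucket: int) -> str:
    """Bucket prefix plus a random UUID suffix: each call yields a new ID."""
    return f"{bucket:08d}-{uuid.uuid4()}"


def fixed_file_group_id(bucket: int) -> str:
    """Bucket prefix plus a fixed suffix (hypothetical format): every writer
    derives the same ID for the same bucket without coordination."""
    return f"{bucket:08d}-0000-0000-0000-000000000000"


def conflicting_file_groups(committed: Set[str], incoming: Set[str]) -> Set[str]:
    """File groups already written by a committed bulk insert that an incoming
    bulk insert also wants to write as base files. A non-empty result means the
    incoming commit has to be resolved now, not at compaction."""
    return committed & incoming


if __name__ == "__main__":
    key = "user-42"
    b = bucket_of(key)
    # Two writers using random suffixes disagree on the file group for one key.
    print(random_file_group_id(b), random_file_group_id(b))
    # Two writers using the fixed suffix agree.
    print(fixed_file_group_id(b), fixed_file_group_id(b))
    # Two bulk insert jobs touching the same bucket overlap on a file group.
    job_a = {fixed_file_group_id(1), fixed_file_group_id(2)}
    job_b = {fixed_file_group_id(2), fixed_file_group_id(3)}
    print(conflicting_file_groups(job_a, job_b))
```

With random suffixes, a later writer could place a record with the same primary key in a different file group than the bulk insert job did; with the fixed suffix, both resolve to the same one. The overlap check mirrors the second point: since bulk insert writes base files directly, two such jobs hitting the same bucket would leave multiple base files there unless the overlap is caught when committing.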