[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3883:
----------------------------------
    Fix Version/s: 0.13.1
                   (was: 0.13.0)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> -----------------------------------------------------------
>
>                 Key: HUDI-3883
>                 URL: https://issues.apache.org/jira/browse/HUDI-3883
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>         Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that when writing a partitioned table,
> file sizing doesn't seem to be properly respected: in that case I was running
> an ingestion job with the following configs, which was supposed to yield
> ~100MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100MB
>   "hoodie.parquet.max.file.size" -> String.valueOf(120 * 1024 * 1024) // 120MB
> ) {code}
>
> Instead, my table contains a lot of very small (~1MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png|width=742,height=422!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
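For reference, the byte values behind the two sizing settings in the report can be spelled out as below. This is only a sketch: the report shows the two `hoodie.parquet.*` keys, while the `bulk_insert` operation and the "NONE" sort mode are taken from the issue title and expressed here with the keys `hoodie.datasource.write.operation` and `hoodie.bulkinsert.sort.mode`, which is an assumption about how the job was configured. The object name `FileSizingOptions` is hypothetical.

```scala
// Sketch only: restates the sizing settings from the report and adds the
// bulk-insert settings implied by the issue title (assumed, not quoted).
object FileSizingOptions {
  private val MiB: Long = 1024L * 1024L

  // Keys and values quoted in the report.
  val sizingOpts: Map[String, String] = Map(
    "hoodie.parquet.small.file.limit" -> (100 * MiB).toString, // 104857600 bytes
    "hoodie.parquet.max.file.size"    -> (120 * MiB).toString  // 125829120 bytes
  )

  // Assumption: the report does not show these two options; they mirror
  // the "Bulk-insert w/ sort-mode NONE" wording of the issue title.
  val bulkInsertOpts: Map[String, String] = Map(
    "hoodie.datasource.write.operation" -> "bulk_insert",
    "hoodie.bulkinsert.sort.mode"       -> "NONE"
  )

  // Print the combined options as key=value pairs for inspection.
  def main(args: Array[String]): Unit =
    (sizingOpts ++ bulkInsertOpts).foreach { case (k, v) => println(s"$k=$v") }
}
```

With these values the expected files are roughly 100 to 120 times larger than the ~1MB files the reporter observed.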
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-3883:
----------------------------
    Description: 
Even after HUDI-3709, I still see that when writing a partitioned table, file sizing doesn't seem to be properly respected: in that case I was running an ingestion job with the following configs, which was supposed to yield ~100MB files:
{code:java}
Map(
  "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100MB
  "hoodie.parquet.max.file.size" -> String.valueOf(120 * 1024 * 1024) // 120MB
) {code}

Instead, my table contains a lot of very small (~1MB) files:
!Screen Shot 2022-04-14 at 1.08.19 PM.png|width=742,height=422!

  was:
Even after HUDI-3709, I still see that when writing a partitioned table, file sizing doesn't seem to be properly respected: in that case I was running an ingestion job with the following configs, which was supposed to yield ~100MB files:
{code:java}
Map(
  "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100MB
  "hoodie.parquet.max.file.size" -> String.valueOf(120 * 1024 * 1024) // 120MB
) {code}

Instead, my table contains a lot of very small (~1MB) files:
!Screen Shot 2022-04-14 at 1.08.19 PM.png!
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3883:
-----------------------------
    Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31
            (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31, 2022/08/22)
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3883:
--------------------------------------
    Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31, 2022/08/22
            (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31, 2022/08/08)
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3883:
-----------------------------
    Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31, 2022/08/08
            (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31)
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3883:
----------------------------------
    Fix Version/s: 0.13.0
                   (was: 0.12.0)
[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3883:
----------------------------------
    Summary: Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
             (was: File-sizing issues when writing COW table to S3)