[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2023-02-06 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3883:
--
Fix Version/s: 0.13.1
   (was: 0.13.0)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png|width=742,height=422!
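
For reference, a minimal sketch of what the full option set for such a job might look like. Only the two file-sizing entries come from the report; the bulk-insert operation and sort-mode entries are assumptions inferred from the issue title, and the object name is hypothetical.

```scala
// Sketch only, not the reporter's actual job configuration.
object FileSizingOpts {
  private val MB: Long = 1024L * 1024L

  val opts: Map[String, String] = Map(
    // Assumed from the issue title; not shown in the report itself.
    "hoodie.datasource.write.operation" -> "bulk_insert",
    "hoodie.bulkinsert.sort.mode"       -> "NONE",
    // Taken from the report: target ~100 MB files, capped at 120 MB.
    "hoodie.parquet.small.file.limit"   -> String.valueOf(100 * MB), // 104857600 bytes
    "hoodie.parquet.max.file.size"      -> String.valueOf(120 * MB)  // 125829120 bytes
  )
}
```

Such a map would typically be passed to a Spark writer via `.options(FileSizingOpts.opts)`; that call is not part of the report either.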



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-12-06 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-3883:

Description: 
Even after HUDI-3709, I still see that file sizing is not properly respected
when writing a partitioned table. In this case I was running an ingestion job
with the following configs, which were supposed to yield ~100 MB files:
{code:java}
Map(
  "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
  "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
)
{code}

Instead, my table contains a lot of very small (~1 MB) files:

!Screen Shot 2022-04-14 at 1.08.19 PM.png|width=742,height=422!

  was:
Even after HUDI-3709, I still see that file sizing is not properly respected
when writing a partitioned table. In this case I was running an ingestion job
with the following configs, which were supposed to yield ~100 MB files:
{code:java}
Map(
  "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
  "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
)
{code}

Instead, my table contains a lot of very small (~1 MB) files:

!Screen Shot 2022-04-14 at 1.08.19 PM.png!


> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png|width=742,height=422!





[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-08-22 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3883:
-
Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 
2022/05/31  (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 
2022/05/16, 2022/05/31, 2022/08/22)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!





[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-08-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3883:
--
Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 
2022/05/31, 2022/08/22  (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 
2022/05/02, 2022/05/16, 2022/05/31, 2022/08/08)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!





[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-08-07 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3883:
-
Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 
2022/05/31, 2022/08/08  (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 
2022/05/02, 2022/05/16, 2022/05/31)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!





[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-07-28 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3883:
--
Fix Version/s: 0.13.0
   (was: 0.12.0)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-3883) Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues

2022-06-15 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3883:
--
Summary: Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues  (was: 
File-sizing issues when writing COW table to S3)

> Bulk-insert w/ sort-mode "NONE" leads to file-sizing issues
> ---
>
> Key: HUDI-3883
> URL: https://issues.apache.org/jira/browse/HUDI-3883
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: Screen Shot 2022-04-14 at 1.08.19 PM.png
>
>
> Even after HUDI-3709, I still see that file sizing is not properly respected
> when writing a partitioned table. In this case I was running an ingestion job
> with the following configs, which were supposed to yield ~100 MB files:
> {code:java}
> Map(
>   "hoodie.parquet.small.file.limit" -> String.valueOf(100 * 1024 * 1024), // 100 MB
>   "hoodie.parquet.max.file.size"    -> String.valueOf(120 * 1024 * 1024)  // 120 MB
> )
> {code}
>
> Instead, my table contains a lot of very small (~1 MB) files:
> !Screen Shot 2022-04-14 at 1.08.19 PM.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)