[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644175#comment-17644175 ]
Zhe Dong commented on SPARK-41386:
----------------------------------

Hi [~podongfeng], that was my mistake. I removed it. Sorry for that.

> There are some small files when using rebalance(column)
> -------------------------------------------------------
>
>                 Key: SPARK-41386
>                 URL: https://issues.apache.org/jira/browse/SPARK-41386
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Zhe Dong
>            Priority: Minor
>
> *Problem (REBALANCE(column))*:
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5")
> {noformat}
> So we expect every output file to be at least 20m * 0.5 = 10m.
> But in fact we got some small files, like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-00000-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-00001-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-00002-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-00003-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-00004-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-00005-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> {noformat}
> 9.1 M and 3.0 M are smaller than 10 M, so we have to handle these small files in another way.
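The threshold arithmetic in the report can be checked with a small standalone sketch. It only reproduces the 20m * 0.5 calculation and compares it with the listed file sizes; it does not model how Spark itself measures partition size. The shortened file names and the helper `small_files` are hypothetical, introduced for illustration only.

```python
# Values taken from the SparkSession config quoted in the report.
ADVISORY_MIB = 20.0   # spark.sql.adaptive.advisoryPartitionSizeInBytes = "20m"
SMALL_FACTOR = 0.5    # spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor = "0.5"

# Sizes (in M) copied from the file listing; names shortened for readability.
file_sizes_mib = {
    "part-00000": 12.1,
    "part-00001": 12.1,
    "part-00002": 12.1,
    "part-00003": 12.1,
    "part-00004": 9.1,
    "part-00005": 3.0,
}


def small_files(sizes, advisory_mib, factor):
    """Return the expected minimum size and the files that fall below it."""
    threshold = advisory_mib * factor
    below = sorted(name for name, size in sizes.items() if size < threshold)
    return threshold, below


threshold_mib, below = small_files(file_sizes_mib, ADVISORY_MIB, SMALL_FACTOR)
print(f"expected minimum: {threshold_mib} M")
for name in below:
    print(f"{name}: {file_sizes_mib[name]} M is below the expected minimum")
```

With the sizes listed in the report, only part-00004 and part-00005 fall below the expected minimum.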