[ https://issues.apache.org/jira/browse/SPARK-35216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
qian wang updated SPARK-35216:
------------------------------
    Summary: a general auto merge output small files feature for datasource api  (was: a general auto merge output files feature for datasource api)

> a general auto merge output small files feature for datasource api
> ------------------------------------------------------------------
>
>                 Key: SPARK-35216
>                 URL: https://issues.apache.org/jira/browse/SPARK-35216
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: qian wang
>            Priority: Major
>
> In most cases, users write data to a Hive table or an HDFS directory with
> Spark SQL. Since Spark 3.0 was released, the project has discouraged using
> the Hive module to read and write Hive tables and prefers switching from the
> Hive strategy rule to the DataSource API, so that I/O operations are
> centralized in one module.
> A general ability to automatically merge output files in the DataSource API
> would therefore resolve the small-files problem that many users face in
> production. It can be tightly integrated with the DataSource write
> framework, so that the merge process is transparent to users, and it can
> handle all kinds of write paths, such as writing to an HDFS directory, a
> non-partitioned Hive table, or a dynamic-partition Hive table.
> This is my own implementation of the functionality, and it has been stable
> in my company's production environment.
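
The issue text does not show how the merge is implemented, so the following is only a minimal sketch of one way a post-write merge step could be planned, not the reporter's implementation. It is plain Python with no Spark API involved. The names `plan_merges`, `small_bytes`, and `target_bytes`, the example paths, and the grouping policy (merge only files below a size threshold, and only within the same output directory so that dynamic partitions are never mixed) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_merges(
    files: List[Tuple[str, int]],
    small_bytes: int,
    target_bytes: int,
) -> Dict[str, List[List[str]]]:
    """Group small output files into merge batches, one directory at a time.

    files        -- (path, size_in_bytes) pairs produced by a write job
    small_bytes  -- hypothetical threshold: files below it count as "small"
    target_bytes -- hypothetical upper bound on the total size of one batch
    """
    # Bucket small files by parent directory so that files belonging to
    # different partitions (e.g. dt=1 vs. dt=2) are never merged together.
    by_dir: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for path, size in files:
        if size < small_bytes:
            directory = path.rsplit("/", 1)[0]
            by_dir[directory].append((path, size))

    plan: Dict[str, List[List[str]]] = {}
    for directory, entries in by_dir.items():
        batches: List[List[str]] = []
        current: List[str] = []
        current_size = 0
        for path, size in sorted(entries):
            # Close the current batch if adding this file would exceed the target.
            if current and current_size + size > target_bytes:
                batches.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append(current)
        # A batch with a single file has nothing to merge, so skip it.
        worthwhile = [batch for batch in batches if len(batch) > 1]
        if worthwhile:
            plan[directory] = worthwhile
    return plan


# Hypothetical output of a dynamic-partition write (sizes are in bytes).
example_files = [
    ("/warehouse/t/dt=1/part-0", 10),
    ("/warehouse/t/dt=1/part-1", 20),
    ("/warehouse/t/dt=1/part-2", 200),
    ("/warehouse/t/dt=2/part-0", 30),
    ("/warehouse/t/dt=2/part-1", 90),
    ("/warehouse/t/dt=2/part-2", 40),
]
plan = plan_merges(example_files, small_bytes=100, target_bytes=128)
```

Under these assumptions, each batch in `plan` could be rewritten as a single file by one task after the main write finishes. A non-partitioned table or a plain HDFS directory is simply the case where every file shares one parent directory.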