[ 
https://issues.apache.org/jira/browse/SPARK-35216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qian wang updated SPARK-35216:
------------------------------
    Summary: a general auto merge output small files feature for datasource api 
 (was: a general auto merge output files feature for datasource api)

> a general auto merge output small files feature for datasource api
> ------------------------------------------------------------------
>
>                 Key: SPARK-35216
>                 URL: https://issues.apache.org/jira/browse/SPARK-35216
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: qian wang
>            Priority: Major
>
> In most cases, users write data to a Hive table or an HDFS directory with 
> Spark SQL. Since the Spark 3.0 release, the Spark project has discouraged 
> using the Hive module to read and write Hive tables and prefers switching 
> from the Hive strategy rules to the datasource API, so that I/O operations 
> are centralized in one module.
> A general ability to automatically merge output files in the datasource API 
> would therefore resolve the small-files problem that many users face in 
> production. It can be bound tightly to the datasource write framework, so 
> that the merge process is transparent to users, and it can handle all kinds 
> of writes, such as writing to an HDFS directory, a non-partitioned Hive 
> table, or a dynamic-partition Hive table.
> This is my own implementation of the functionality, and it has been stable 
> in my company's production environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
