[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries
     [ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-24940:
----------------------------
    Target Version/s: 2.4.0

> Coalesce Hint for SQL Queries
> -----------------------------
>
>                 Key: SPARK-24940
>                 URL: https://issues.apache.org/jira/browse/SPARK-24940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: John Zhuge
>            Priority: Major
>
> Many Spark SQL users in my company have asked for a way to control the number
> of output files in Spark SQL. These users prefer not to use the functions
> repartition\(n\) or coalesce(n, shuffle), which require them to write and
> deploy Scala/Java/Python code.
>
> There are use cases for both reducing and increasing the number of files.
>
> The DataFrame API has had repartition/coalesce for a long time. However, there
> is no equivalent functionality in SQL queries. We propose adding the
> following Hive-style hints to Spark SQL.
> {noformat}
> /*+ COALESCE(n, shuffle) */
> /*+ REPARTITION(n) */
> {noformat}
> REPARTITION\(n\) is equivalent to COALESCE(n, shuffle=true).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
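The relationship stated in the description, REPARTITION\(n\) being COALESCE(n, shuffle=true), may be easier to follow with a toy model. The sketch below is plain Python, not Spark code: partitions are modelled as lists of rows, and the helper names `coalesce` and `repartition` are illustrative stand-ins that only mimic the partition-count behaviour of the DataFrame/RDD operations the issue mentions. It assumes each partition ends up as roughly one output file, and that a shuffle may spread rows round-robin; real Spark uses its own partitioners and the proposed hints may end up with a different signature.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(parts: List[List[T]], n: int, shuffle: bool = False) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle) over a list of partitions."""
    if n <= 0:
        raise ValueError("number of partitions must be positive")

    if shuffle:
        # With a shuffle every row may move, so the partition count can go
        # up or down. Round-robin placement is an assumption of this sketch.
        rows = [row for part in parts for row in part]
        out: List[List[T]] = [[] for _ in range(n)]
        for i, row in enumerate(rows):
            out[i % n].append(row)
        return out

    # Without a shuffle, existing partitions can only be merged, so asking
    # for more partitions than currently exist leaves the count unchanged.
    if n >= len(parts):
        return [list(part) for part in parts]

    out = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Map contiguous groups of input partitions onto each output partition.
        out[i * n // len(parts)].extend(part)
    return out


def repartition(parts: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of repartition(n): coalesce with shuffle enabled."""
    return coalesce(parts, n, shuffle=True)


if __name__ == "__main__":
    data = [[1, 2], [3], [4, 5, 6], [7]]
    print("coalesce to 2:   ", coalesce(data, 2))
    print("coalesce to 8:   ", len(coalesce(data, 8)), "partitions")
    print("repartition to 8:", len(repartition(data, 8)), "partitions")
```

In this model, reducing the file count works either way, while increasing it requires the shuffle variant, which matches the "reduce or increase" use cases in the description.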
[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries
     [ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-24940:
---------------------------------
    Target Version/s:   (was: 2.4.0)

> Coalesce Hint for SQL Queries
> -----------------------------
>
>                 Key: SPARK-24940
>                 URL: https://issues.apache.org/jira/browse/SPARK-24940
[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries
     [ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Zhuge updated SPARK-24940:
-------------------------------
    Summary: Coalesce Hint for SQL Queries  (was: Coalesce Hint for SQL)

> Coalesce Hint for SQL Queries
> -----------------------------
>
>                 Key: SPARK-24940
>                 URL: https://issues.apache.org/jira/browse/SPARK-24940
[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL
     [ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Zhuge updated SPARK-24940:
-------------------------------
    Description: 
Many Spark SQL users in my company have asked for a way to control the number of output files in Spark SQL. These users prefer not to use the functions repartition\(n\) or coalesce(n, shuffle), which require them to write and deploy Scala/Java/Python code.

There are use cases for both reducing and increasing the number of files.

The DataFrame API has had repartition/coalesce for a long time. However, there is no equivalent functionality in SQL queries. We propose adding the following Hive-style hints to Spark SQL.
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION\(n\) is equivalent to COALESCE(n, shuffle=true).

  was:
Many Spark SQL users in my company have asked for a way to control the number of output files in Spark SQL. These users prefer not to use the functions repartition(n) or coalesce(n, shuffle), which require them to write and deploy Scala/Java/Python code.

There are use cases for both reducing and increasing the number of files.

The DataFrame API has had repartition/coalesce for a long time. However, there is no equivalent functionality in SQL queries. We propose adding the following Hive-style hints to Spark SQL.
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).

> Coalesce Hint for SQL
> ---------------------
>
>                 Key: SPARK-24940
>                 URL: https://issues.apache.org/jira/browse/SPARK-24940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: John Zhuge
>            Priority: Major