[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries

2018-08-02 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-24940:

Target Version/s: 2.4.0

> Coalesce Hint for SQL Queries
> -
>
> Key: SPARK-24940
> URL: https://issues.apache.org/jira/browse/SPARK-24940
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: John Zhuge
>Priority: Major
>
> Many Spark SQL users in my company have asked for a way to control the number 
> of output files in Spark SQL. They prefer not to use the functions 
> repartition(n) or coalesce(n, shuffle), which require them to write and 
> deploy Scala/Java/Python code.
>   
>  There are use cases for both reducing and increasing the number.
>   
>  The DataFrame API has had repartition/coalesce for a long time. However, we do 
> not have equivalent functionality in SQL queries. We propose adding the 
> following Hive-style Coalesce hints to Spark SQL.
> {noformat}
> /*+ COALESCE(n, shuffle) */
> /*+ REPARTITION(n) */
> {noformat}
> REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).
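
To make the proposed hint semantics concrete, here is a minimal Python sketch that recognizes the two hint comments in a SQL string and reduces both to a (partition count, shuffle) pair. It is not Spark's parser: the regular expression, the PartitionHint type, and the parse_partition_hint function are hypothetical names used only for illustration, and defaulting shuffle to false when COALESCE omits it is an assumption, not something the issue states.

```python
import re
from typing import NamedTuple, Optional


class PartitionHint(NamedTuple):
    """Hypothetical normalized form of a COALESCE/REPARTITION hint."""
    num_partitions: int
    shuffle: bool


# Hypothetical pattern for the two proposed hints; not Spark's actual grammar.
# Accepts COALESCE(n), COALESCE(n, true), COALESCE(n, shuffle=true), REPARTITION(n).
_HINT_RE = re.compile(
    r"/\*\+\s*(COALESCE|REPARTITION)\s*\(\s*(\d+)\s*"
    r"(?:,\s*(?:shuffle\s*=\s*)?(true|false)\s*)?\)\s*\*/",
    re.IGNORECASE,
)


def parse_partition_hint(sql: str) -> Optional[PartitionHint]:
    """Return the first partitioning hint found in `sql`, or None if absent."""
    match = _HINT_RE.search(sql)
    if match is None:
        return None
    name, count, shuffle = match.groups()
    if name.upper() == "REPARTITION":
        if shuffle is not None:
            raise ValueError("REPARTITION takes only a partition count")
        # Per the issue text: REPARTITION(n) behaves like COALESCE(n, shuffle=true).
        return PartitionHint(int(count), True)
    # Assumption: COALESCE without a shuffle argument means shuffle=false.
    return PartitionHint(int(count), shuffle is not None and shuffle.lower() == "true")


if __name__ == "__main__":
    print(parse_partition_hint("SELECT /*+ REPARTITION(10) */ * FROM t"))
    print(parse_partition_hint("SELECT /*+ COALESCE(4, shuffle=false) */ * FROM t"))
```

Under these assumptions, a query hinted with REPARTITION(10) and one hinted with COALESCE(10, shuffle=true) normalize to the same value, which mirrors the equivalence stated above.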



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries

2018-07-27 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-24940:
-
Target Version/s:   (was: 2.4.0)




[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries

2018-07-26 Thread John Zhuge (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated SPARK-24940:
---
Summary: Coalesce Hint for SQL Queries  (was: Coalesce Hint for SQL)




[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL

2018-07-26 Thread John Zhuge (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated SPARK-24940:
---
Description: 
Many Spark SQL users in my company have asked for a way to control the number 
of output files in Spark SQL. They prefer not to use the functions 
repartition\(n\) or coalesce(n, shuffle), which require them to write and deploy 
Scala/Java/Python code.
  
 There are use cases for both reducing and increasing the number.
  
 The DataFrame API has had repartition/coalesce for a long time. However, we do not 
have equivalent functionality in SQL queries. We propose adding the 
following Hive-style Coalesce hints to Spark SQL.
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION\(n\) is equivalent to COALESCE(n, shuffle=true).

  was:
Many Spark SQL users in my company have asked for a way to control the number 
of output files in Spark SQL. They prefer not to use the functions 
repartition(n) or coalesce(n, shuffle), which require them to write and deploy 
Scala/Java/Python code.
  
 There are use cases for both reducing and increasing the number.
  
 The DataFrame API has had repartition/coalesce for a long time. However, we do not 
have equivalent functionality in SQL queries. We propose adding the 
following Hive-style Coalesce hints to Spark SQL.
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).

