[jira] [Updated] (SPARK-12974) Add Python API for spark.ml bisecting k-means

2016-01-24 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-12974:

Component/s: PySpark, ML

> Add Python API for spark.ml bisecting k-means
> -
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> Add Python API for spark.ml bisecting k-means






[jira] [Assigned] (SPARK-12974) Add Python API for spark.ml bisecting k-means

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12974:


Assignee: (was: Apache Spark)

> Add Python API for spark.ml bisecting k-means
> -
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> Add Python API for spark.ml bisecting k-means






[jira] [Commented] (SPARK-12974) Add Python API for spark.ml bisecting k-means

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114258#comment-15114258
 ] 

Apache Spark commented on SPARK-12974:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10889

> Add Python API for spark.ml bisecting k-means
> -
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> Add Python API for spark.ml bisecting k-means






[jira] [Assigned] (SPARK-12974) Add Python API for spark.ml bisecting k-means

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12974:


Assignee: Apache Spark

> Add Python API for spark.ml bisecting k-means
> -
>
> Key: SPARK-12974
> URL: https://issues.apache.org/jira/browse/SPARK-12974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Assignee: Apache Spark
>Priority: Minor
>
> Add Python API for spark.ml bisecting k-means






[jira] [Assigned] (SPARK-12973) Support to set priority when submit spark application to YARN

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12973:


Assignee: (was: Apache Spark)

> Support to set priority when submit spark application to YARN
> -
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.1
>Reporter: Chaozhong Yang
>







[jira] [Assigned] (SPARK-12973) Support to set priority when submit spark application to YARN

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12973:


Assignee: Apache Spark

> Support to set priority when submit spark application to YARN
> -
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.1
>Reporter: Chaozhong Yang
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-12973) Support to set priority when submit spark application to YARN

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114250#comment-15114250
 ] 

Apache Spark commented on SPARK-12973:
--

User 'debugger87' has created a pull request for this issue:
https://github.com/apache/spark/pull/10888

> Support to set priority when submit spark application to YARN
> -
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.1
>Reporter: Chaozhong Yang
>







[jira] [Created] (SPARK-12974) Add Python API for spark.ml bisecting k-means

2016-01-24 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-12974:
---

 Summary: Add Python API for spark.ml bisecting k-means
 Key: SPARK-12974
 URL: https://issues.apache.org/jira/browse/SPARK-12974
 Project: Spark
  Issue Type: Improvement
Reporter: Yanbo Liang
Priority: Minor


Add Python API for spark.ml bisecting k-means






[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-24 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114721#comment-15114721
 ] 

Yin Huai commented on SPARK-9740:
-

[~emlyn] You can use {{expr}} function provided by 
{{org.apache.spark.sql.functions}} to do that. For example, 
{{expr("first(colName, true)")}}.
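
For reference, a minimal sketch of that suggestion (the {{df}}, {{groupCol}} and 
{{colName}} names below are placeholders, not from this issue):

{code}
// expr() parses a SQL expression string, so first(colName, true) returns the
// first non-null value of colName within each group.
// df, groupCol and colName stand in for an existing DataFrame and its columns.
import org.apache.spark.sql.functions.expr

val firstNonNull = df.groupBy("groupCol").agg(expr("first(colName, true)"))
firstNonNull.show()
{code}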

> first/last aggregate NULL behavior
> --
>
> Key: SPARK-9740
> URL: https://issues.apache.org/jira/browse/SPARK-9740
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Herman van Hovell
>Assignee: Yin Huai
>  Labels: releasenotes
> Fix For: 1.6.0
>
>
> The FIRST/LAST aggregates implemented as part of the new UDAF interface, 
> return the first or last non-null value (if any) found. This is a departure 
> from the behavior of the old FIRST/LAST aggregates and from the 
> FIRST_VALUE/LAST_VALUE aggregates in Hive. These would return a null value, 
> if that happened to be the first/last value seen. SPARK-9592 tries to 'fix' 
> this behavior for the old UDAF interface.
> Hive makes this behavior configurable, by adding a skipNulls flag. I would 
> suggest to do the same, and make the default behavior compatible with Hive.






[jira] [Commented] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set

2016-01-24 Thread Jack Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114722#comment-15114722
 ] 

Jack Hu commented on SPARK-6847:


Tested on the latest 1.6 branch (f913f7e, [SPARK-12120][PYSPARK] Improve exception 
message when failing to init); the issue still exists.

> Stack overflow on updateStateByKey which followed by a dstream with 
> checkpoint set
> --
>
> Key: SPARK-6847
> URL: https://issues.apache.org/jira/browse/SPARK-6847
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.0
>Reporter: Jack Hu
>  Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code, which uses {{updateStateByKey}} 
> followed by a {{map}} with a checkpoint interval of 10 seconds:
> {code}
> val sparkConf = new SparkConf().setAppName("test")
> val streamingContext = new StreamingContext(sparkConf, Seconds(10))
> streamingContext.checkpoint("""checkpoint""")
> val source = streamingContext.socketTextStream("localhost", )
> val updatedResult = source.map(
> (1,_)).updateStateByKey(
> (newlist : Seq[String], oldstate : Option[String]) => 
> newlist.headOption.orElse(oldstate))
> updatedResult.map(_._2)
> .checkpoint(Seconds(10))
> .foreachRDD((rdd, t) => {
>   println("Deep: " + rdd.toDebugString.split("\n").length)
>   println(t.toString() + ": " + rdd.collect.length)
> })
> streamingContext.start()
> streamingContext.awaitTermination()
> {code}
> From the output, we can see that the dependency chain keeps growing over 
> time, the {{updateStateByKey}} state never gets check-pointed, and eventually the 
> stack overflow happens. 
> Note:
> * The RDD in {{updatedResult.map(_._2)}} gets check-pointed in this case, but 
> not the {{updateStateByKey}} state 
> * If the {{checkpoint(Seconds(10))}} is removed from the map result ( 
> {{updatedResult.map(_._2)}} ), the stack overflow does not happen






[jira] [Created] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns

2016-01-24 Thread Xiao Li (JIRA)
Xiao Li created SPARK-12975:
---

 Summary: Eliminate Bucketing Columns that are part of Partitioning 
Columns
 Key: SPARK-12975
 URL: https://issues.apache.org/jira/browse/SPARK-12975
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


When users are using partitionBy and bucketBy at the same time, some bucketing 
columns might be part of partitioning columns. For example, 
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}

However, in the above case, adding column `i` is useless. It is just wasting 
extra CPU when reading or writing bucket tables. Thus, we can automatically 
remove these overlapping columns from bucketing columns. 
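
A minimal sketch of the intended elimination (plain Scala, not the actual 
implementation):

{code}
// Drop bucketing columns that already appear among the partitioning columns.
val partitionCols = Seq("i")
val bucketCols = Seq("i", "k")

val effectiveBucketCols = bucketCols.filterNot(partitionCols.contains)
// effectiveBucketCols: Seq[String] = List(k)
{code}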






[jira] [Created] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.

2016-01-24 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-12976:
-

 Summary: Add LazilyGenerateOrdering and use it for 
RangePartitioner of Exchange.
 Key: SPARK-12976
 URL: https://issues.apache.org/jira/browse/SPARK-12976
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Takuya Ueshin


Add LazilyGenerateOrdering to support generated ordering for RangePartitioner 
of Exchange instead of InterpretedOrdering.






[jira] [Assigned] (SPARK-12901) Refactor options to be correctly formed in a case class

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12901:


Assignee: (was: Apache Spark)

> Refactor options to be correctly formed in a case class
> ---
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal 
> class.
> This might be refactored similarly to {{JSONOptions}}.






[jira] [Assigned] (SPARK-12901) Refactor options to be correctly formed in a case class

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12901:


Assignee: Apache Spark

> Refactor options to be correctly formed in a case class
> ---
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal 
> class.
> This might be refactored similarly to {{JSONOptions}}.






[jira] [Commented] (SPARK-12901) Refactor options to be correctly formed in a case class

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114761#comment-15114761
 ] 

Apache Spark commented on SPARK-12901:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/10895

> Refactor options to be correctly formed in a case class
> ---
>
> Key: SPARK-12901
> URL: https://issues.apache.org/jira/browse/SPARK-12901
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The {{CSVParameters}} class is a case class but looks more like a normal 
> class.
> This might be refactored similarly to {{JSONOptions}}.






[jira] [Commented] (SPARK-12917) Add DML support to Spark SQL for HIVE

2016-01-24 Thread Hemang Nagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114616#comment-15114616
 ] 

Hemang Nagar commented on SPARK-12917:
--

Update and Delete operations have been supported in Hive since 0.14, and we need 
Spark to support them as well. INSERT ... VALUES also needs to be supported. 

For example, {{insert into table values(1, "john doe")}} currently throws an 
unsupported operation exception in Spark. 

> Add DML support to Spark SQL for HIVE
> -
>
> Key: SPARK-12917
> URL: https://issues.apache.org/jira/browse/SPARK-12917
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hemang Nagar
>Priority: Blocker
>
> Spark SQL should be updated to support the DML operations that are being 
> supported by Hive since 0.14






[jira] [Commented] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114745#comment-15114745
 ] 

Apache Spark commented on SPARK-12976:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/10894

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
> ---
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Takuya Ueshin
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner 
> of Exchange instead of InterpretedOrdering.






[jira] [Assigned] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12976:


Assignee: Apache Spark

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
> ---
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner 
> of Exchange instead of InterpretedOrdering.






[jira] [Assigned] (SPARK-12976) Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12976:


Assignee: (was: Apache Spark)

> Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.
> ---
>
> Key: SPARK-12976
> URL: https://issues.apache.org/jira/browse/SPARK-12976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Takuya Ueshin
>
> Add LazilyGenerateOrdering to support generated ordering for RangePartitioner 
> of Exchange instead of InterpretedOrdering.






[jira] [Commented] (SPARK-12946) The SQL page is empty

2016-01-24 Thread KaiXinXIaoLei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114650#comment-15114650
 ] 

KaiXinXIaoLei commented on SPARK-12946:
---

I use the default config. In local mode, the problem exists too. The way to 
test: build from the current master branch, run "bin/spark-sql", and then run 
"create table a(i int);". Finally, check the SQL page at http://IP:4040; I 
find the page is empty.

> The SQL page is empty
> -
>
> Key: SPARK-12946
> URL: https://issues.apache.org/jira/browse/SPARK-12946
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
> Attachments: SQLpage.png
>
>
> I run a SQL query using "bin/spark-sql --master yarn". Then I open the UI 
> and find the SQL page is empty






[jira] [Updated] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns

2016-01-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12975:

Description: 
When users are using partitionBy and bucketBy at the same time, some bucketing 
columns might be part of partitioning columns. For example, 
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}

However, in the above case, adding column `i` is useless. It is just wasting 
extra CPU when reading or writing bucket tables. Thus, we can automatically 
remove these overlapping columns from the bucketing columns. 

  was:
When users are using partitionBy and bucketBy at the same time, some bucketing 
columns might be part of partitioning columns. For example, 
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}

However, in the above case, adding column `i` is useless. It is just wasting 
extra CPU when reading or writing bucket tables. Thus, we can automatically 
remove these overlapping columns from bucketing columns. 


> Eliminate Bucketing Columns that are part of Partitioning Columns
> -
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some 
> bucketing columns might be part of partitioning columns. For example, 
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting 
> extra CPU when reading or writing bucket tables. Thus, we can automatically 
> remove these overlapping columns from the bucketing columns. 






[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown

2016-01-24 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114717#comment-15114717
 ] 

Yin Huai commented on SPARK-10911:
--

Could you quote or provide a link to the new comments about this issue?

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and jvm shuts down.
> We ran into a case where an Executor was left around for days trying to 
> shutdown because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the jvm to go away with 
> System.exit.






[jira] [Resolved] (SPARK-2004) Automate QA of Spark Build/Deploy Matrix

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2004.
---
Resolution: Later

Resolving this for now since I think this JIRA isn't actionable right now and 
it's kind of broad / vague. We can re-open when we have more concrete plans.

> Automate QA of Spark Build/Deploy Matrix
> 
>
> Key: SPARK-2004
> URL: https://issues.apache.org/jira/browse/SPARK-2004
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Project Infra
>Reporter: Xiangrui Meng
>Assignee: Nicholas Chammas
>
> This is an umbrella JIRA to track QA automation tasks. Spark supports
> * several deploy modes
> ** local
> ** standalone
> ** yarn
> ** mesos
> * three languages
> ** scala
> ** java
> ** python
> * several hadoop versions
> ** 0.x
> ** 1.x
> ** 2.x
> * job submission from different systems
> ** linux
> ** mac os x
> ** windows
> The cross product of them creates a big deployment matrix. QA automation is 
> really necessary to avoid regression.






[jira] [Resolved] (SPARK-2005) Investigate linux container-based solution

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2005.
---
Resolution: Later

Resolving as "later".

> Investigate linux container-based solution
> --
>
> Key: SPARK-2005
> URL: https://issues.apache.org/jira/browse/SPARK-2005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Xiangrui Meng
>
> We can set up container-based cluster environment and automatically test 
> against a deployment matrix.






[jira] [Updated] (SPARK-12948) Consider reducing size of broadcasts in OrcRelation

2016-01-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-12948:
-
Attachment: SPARK-12948.mem.prof.snapshot.png

> Consider reducing size of broadcasts in OrcRelation
> ---
>
> Key: SPARK-12948
> URL: https://issues.apache.org/jira/browse/SPARK-12948
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
> Attachments: SPARK-12948.mem.prof.snapshot.png, 
> SPARK-12948_cpuProf.png
>
>
> The size of the broadcasted data in OrcRelation was significantly higher when 
> running a query with a large number of partitions (e.g. TPC-DS). Consider reducing 
> the size of the broadcasted data in OrcRelation, as it has an impact on the job 
> runtime.






[jira] [Assigned] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12975:


Assignee: Apache Spark

> Eliminate Bucketing Columns that are part of Partitioning Columns
> -
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> When users are using partitionBy and bucketBy at the same time, some 
> bucketing columns might be part of partitioning columns. For example, 
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting 
> extra CPU when reading or writing bucket tables. Thus, we can automatically 
> remove these overlapping columns from the bucketing columns. 






[jira] [Assigned] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12975:


Assignee: (was: Apache Spark)

> Eliminate Bucketing Columns that are part of Partitioning Columns
> -
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some 
> bucketing columns might be part of partitioning columns. For example, 
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting 
> extra CPU when reading or writing bucket tables. Thus, we can automatically 
> remove these overlapping columns from the bucketing columns. 






[jira] [Commented] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114651#comment-15114651
 ] 

Apache Spark commented on SPARK-12975:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/10891

> Eliminate Bucketing Columns that are part of Partitioning Columns
> -
>
> Key: SPARK-12975
> URL: https://issues.apache.org/jira/browse/SPARK-12975
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When users are using partitionBy and bucketBy at the same time, some 
> bucketing columns might be part of partitioning columns. For example, 
> {code}
> df.write
>   .format(source)
>   .partitionBy("i")
>   .bucketBy(8, "i", "k")
>   .sortBy("k")
>   .saveAsTable("bucketed_table")
> {code}
> However, in the above case, adding column `i` is useless. It is just wasting 
> extra CPU when reading or writing bucket tables. Thus, we can automatically 
> remove these overlapping columns from the bucketing columns. 






[jira] [Commented] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114676#comment-15114676
 ] 

Apache Spark commented on SPARK-5175:
-

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/10892

> bug in updating counters when starting multiple workers/supervisors in 
> actor-based receiver
> ---
>
> Key: SPARK-5175
> URL: https://issues.apache.org/jira/browse/SPARK-5175
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Nan Zhu
>
> when starting multiple workers (ActorReceiver.scala), we don't update the 
> counters in it






[jira] [Commented] (SPARK-5174) Missing Document for starting multiple workers/supervisors in actor-based receiver

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114675#comment-15114675
 ] 

Apache Spark commented on SPARK-5174:
-

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/10892

> Missing Document for starting multiple workers/supervisors in actor-based 
> receiver
> --
>
> Key: SPARK-5174
> URL: https://issues.apache.org/jira/browse/SPARK-5174
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Nan Zhu
>Priority: Minor
>
> Currently, the documentation about starting multiple supervisors/workers is 
> missing, though the implementation provides this capability
> {code:title=ActorReceiver.scala|borderStyle=solid}
> case props: Props =>
> val worker = context.actorOf(props)
> logInfo("Started receiver worker at:" + worker.path)
> sender ! worker
>   case (props: Props, name: String) =>
> val worker = context.actorOf(props, name)
> logInfo("Started receiver worker at:" + worker.path)
> sender ! worker
>   case _: PossiblyHarmful => hiccups.incrementAndGet()
>   case _: Statistics =>
> val workers = context.children
> sender ! Statistics(n.get, workers.size, hiccups.get, 
> workers.mkString("\n"))
> {code}






[jira] [Commented] (SPARK-12934) Count-min sketch serialization

2016-01-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114682#comment-15114682
 ] 

Apache Spark commented on SPARK-12934:
--

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/10893

> Count-min sketch serialization
> --
>
> Key: SPARK-12934
> URL: https://issues.apache.org/jira/browse/SPARK-12934
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>







[jira] [Assigned] (SPARK-12934) Count-min sketch serialization

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12934:


Assignee: Apache Spark  (was: Cheng Lian)

> Count-min sketch serialization
> --
>
> Key: SPARK-12934
> URL: https://issues.apache.org/jira/browse/SPARK-12934
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>







[jira] [Assigned] (SPARK-12934) Count-min sketch serialization

2016-01-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12934:


Assignee: Cheng Lian  (was: Apache Spark)

> Count-min sketch serialization
> --
>
> Key: SPARK-12934
> URL: https://issues.apache.org/jira/browse/SPARK-12934
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>







[jira] [Updated] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12624:
-
Assignee: Cheng Lian

> When schema is specified, we should give better error message if actual row 
> length doesn't match
> 
>
> Key: SPARK-12624
> URL: https://issues.apache.org/jira/browse/SPARK-12624
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.6.1, 2.0.0
>
>
> The following code snippet reproduces this issue:
> {code}
> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
> from pyspark.sql.types import Row
> schema = StructType([StructField("a", IntegerType()), StructField("b", 
> StringType())])
> rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
> df = sqlContext.createDataFrame(rdd, schema)
> df.show()
> {code}
> An unintuitive {{ArrayIndexOutOfBoundsException}} exception is thrown in this 
> case:
> {code}
> ...
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
> ...
> {code}
> We should give a better error message here.






[jira] [Resolved] (SPARK-12624) When schema is specified, we should give better error message if actual row length doesn't match

2016-01-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-12624.
--
   Resolution: Fixed
Fix Version/s: 1.6.1, 2.0.0

Issue resolved by pull request 10886
[https://github.com/apache/spark/pull/10886]

> When schema is specified, we should give better error message if actual row 
> length doesn't match
> 
>
> Key: SPARK-12624
> URL: https://issues.apache.org/jira/browse/SPARK-12624
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Priority: Blocker
> Fix For: 2.0.0, 1.6.1
>
>
> The following code snippet reproduces this issue:
> {code}
> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
> from pyspark.sql.types import Row
> schema = StructType([StructField("a", IntegerType()), StructField("b", 
> StringType())])
> rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
> df = sqlContext.createDataFrame(rdd, schema)
> df.show()
> {code}
> An unintuitive {{ArrayIndexOutOfBoundsException}} exception is thrown in this 
> case:
> {code}
> ...
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
> at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
> ...
> {code}
> We should give a better error message here.






[jira] [Commented] (SPARK-12970) Error in documentation

2016-01-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114292#comment-15114292
 ] 

Sean Owen commented on SPARK-12970:
---

The example works, except for two minor issues: you need to {{import 
org.apache.spark.sql.types._}} as well, and the double-bracket syntax used in 
the result printed on the last line causes that strange {{@link ...}} 
to appear.

As far as I know the {{Row}} here is of the correct type to use with the struct 
schema, though it's not shown actually used here. Do you see a particular 
problem?
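
For reference, a sketch of the doc example with the missing import added (the 
field names and types are illustrative):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// A struct type with three fields (field names are illustrative).
val struct = StructType(
  StructField("a", IntegerType) ::
  StructField("b", LongType) ::
  StructField("c", BooleanType) :: Nil)

// A Row carrying a nested Row whose values line up with `struct`,
// as in the documentation example.
val row = Row(Row(1, 2L, true))
{code}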

> Error in documentation 
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating a Row from a Struct is wrong:
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> The above example does not create a Row object with a schema.
> This error is in the Scala docs too. 






[jira] [Commented] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors

2016-01-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114283#comment-15114283
 ] 

Sean Owen commented on SPARK-4878:
--

[~viirya] maybe you know better; I still see this code in 
{{CoarseGrainedExecutorBackend}}, but I am not clear whether it's still live 
and used?

{code}
  // Bootstrap to fetch the driver's Spark properties.
  val executorConf = new SparkConf
  val port = executorConf.getInt("spark.executor.port", 0)
  val fetcher = RpcEnv.create(
"driverPropsFetcher",
hostname,
port,
executorConf,
new SecurityManager(executorConf),
clientMode = true)
  val driver = fetcher.setupEndpointRefByURI(driverUrl)
  val props = driver.askWithRetry[Seq[(String, 
String)]](RetrieveSparkProps) ++
Seq[(String, String)](("spark.app.id", appId))
  fetcher.shutdown()
{code}

> driverPropsFetcher causes spurious Akka disassociate errors
> ---
>
> Key: SPARK-4878
> URL: https://issues.apache.org/jira/browse/SPARK-4878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Stephen Haberman
>Priority: Minor
>
> The dedicated Akka system to fetching driver properties seems fine, but it 
> leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. 
> sort of messages that can lead the user to believe something is wrong with 
> the cluster.
> (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile 
> poking around until I saw in the code that driverPropsFetcher is 
> purposefully/immediately shutdown.)
> Is there any way to cleanly shutdown that initial akka system so that the 
> driver doesn't log these errors?






[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2016-01-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794
 ] 

Hyukjin Kwon commented on SPARK-12890:
--

In that case, it will not read all the data but only footer (metadata), 
{{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty 
because the required column is a partition column.
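
A sketch of the scenario being discussed (assuming an existing {{sqlContext}}; 
the path and column names are illustrative):

{code}
// A table partitioned by `date`; the query touches only the partition column,
// so ideally it should be answerable from partition metadata alone.
val events = sqlContext.read.parquet("/data/events")
events.registerTempTable("events")
sqlContext.sql("SELECT max(date) FROM events").show()
{code}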

> Spark SQL query related to only partition fields should not scan the whole 
> data.
> 
>
> Key: SPARK-12890
> URL: https://issues.apache.org/jira/browse/SPARK-12890
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up 
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.






[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2016-01-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794
 ] 

Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:49 AM:
---

In that case, it will not read all the data but only footers (metadata) for 
each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would 
be empty because the required column is a partition column.




was (Author: hyukjin.kwon):
In that case, it will not read all the data but only footer (metadata), 
{{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would be empty 
because the required column is a partition column.

> Spark SQL query related to only partition fields should not scan the whole 
> data.
> 
>
> Key: SPARK-12890
> URL: https://issues.apache.org/jira/browse/SPARK-12890
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up 
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.






[jira] [Commented] (SPARK-12946) The SQL page is empty

2016-01-24 Thread KaiXinXIaoLei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114816#comment-15114816
 ] 

KaiXinXIaoLei commented on SPARK-12946:
---

This is the same problem as SPARK-12492, so I am closing this JIRA now. Thanks.

> The SQL page is empty
> -
>
> Key: SPARK-12946
> URL: https://issues.apache.org/jira/browse/SPARK-12946
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
> Attachments: SQLpage.png
>
>
> I run a SQL query using "bin/spark-sql --master yarn". Then I open the UI 
> and find the SQL page is empty






[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2016-01-24 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114829#comment-15114829
 ] 

Simeon Simeonov commented on SPARK-12890:
-

Thanks for the clarification, [~hyukjin.kwon]. Still, there is no reason why it 
should be looking at the files at all. This is especially a problem when the 
Parquet files are in an object store such as S3, because there is no such thing 
as reading the footer of an S3 object. 

> Spark SQL query related to only partition fields should not scan the whole 
> data.
> 
>
> Key: SPARK-12890
> URL: https://issues.apache.org/jira/browse/SPARK-12890
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up 
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.






[jira] [Commented] (SPARK-12941) Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

2016-01-24 Thread Thomas Sebastian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114860#comment-15114860
 ] 

Thomas Sebastian commented on SPARK-12941:
--

Sure. Working on it.

> Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR 
> datatype
> --
>
> Key: SPARK-12941
> URL: https://issues.apache.org/jira/browse/SPARK-12941
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
> Environment: Apache Spark 1.4.2.2
>Reporter: Jose Martinez Poblete
>
> When exporting data from Spark to Oracle, string datatypes are translated to 
> TEXT for Oracle, this is leading to the following error
> {noformat}
> java.sql.SQLSyntaxErrorException: ORA-00902: invalid datatype
> {noformat}
> As per the following code:
> https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/jdbc/jdbc.scala#L144
> See also:
> http://stackoverflow.com/questions/31287182/writing-to-oracle-database-using-apache-spark-1-4-0
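
Until the dialect is fixed, a possible workaround is to register a custom JDBC 
dialect; a minimal sketch (assuming the {{JdbcDialects}} registration API 
available since Spark 1.4, with an arbitrary VARCHAR2 length):

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Map Catalyst StringType to Oracle's VARCHAR2 instead of TEXT.
object OracleStringDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR2(255)", Types.VARCHAR))
    case _          => None
  }
}

JdbcDialects.registerDialect(OracleStringDialect)
{code}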






[jira] [Commented] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors

2016-01-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861
 ] 

Liang-Chi Hsieh commented on SPARK-4878:


I think it is still alive and used. The above code sends the message 
{code}RetrieveSparkProps{code}  to 
{code}CoarseGrainedSchedulerBackend.DriverEndpoint{code} and receives spark 
properties back.

> driverPropsFetcher causes spurious Akka disassociate errors
> ---
>
> Key: SPARK-4878
> URL: https://issues.apache.org/jira/browse/SPARK-4878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Stephen Haberman
>Priority: Minor
>
> The dedicated Akka system to fetching driver properties seems fine, but it 
> leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. 
> sort of messages that can lead the user to believe something is wrong with 
> the cluster.
> (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile 
> poking around until I saw in the code that driverPropsFetcher is 
> purposefully/immediately shutdown.)
> Is there any way to cleanly shutdown that initial akka system so that the 
> driver doesn't log these errors?






[jira] [Comment Edited] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors

2016-01-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861
 ] 

Liang-Chi Hsieh edited comment on SPARK-4878 at 1/25/16 7:19 AM:
-

I think it is still alive and used. The above code sends the message 
{{RetrieveSparkProps}}  to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and 
receives spark properties back. But of course it doesn't use Akka anymore.


was (Author: viirya):
I think it is still alive and used. The above code sends the message 
{{RetrieveSparkProps}}  to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and 
receives spark properties back.

> driverPropsFetcher causes spurious Akka disassociate errors
> ---
>
> Key: SPARK-4878
> URL: https://issues.apache.org/jira/browse/SPARK-4878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Stephen Haberman
>Priority: Minor
>
> The dedicated Akka system to fetching driver properties seems fine, but it 
> leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. 
> sort of messages that can lead the user to believe something is wrong with 
> the cluster.
> (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile 
> poking around until I saw in the code that driverPropsFetcher is 
> purposefully/immediately shutdown.)
> Is there any way to cleanly shutdown that initial akka system so that the 
> driver doesn't log these errors?






[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2016-01-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794
 ] 

Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:53 AM:
---

In that case, it will not read all the data but only footers (metadata) for 
each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would 
be empty because the required column is a partition column.


was (Author: hyukjin.kwon):
In that case, it will not read all the data but only footers (metadata) for 
each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would 
be empty because the required column is a partition column.

Oh, if you meant not filtering row groups, yes it will read all the row groups

> Spark SQL query related to only partition fields should not scan the whole 
> data.
> 
>
> Key: SPARK-12890
> URL: https://issues.apache.org/jira/browse/SPARK-12890
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up 
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.






[jira] [Comment Edited] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2016-01-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114794#comment-15114794
 ] 

Hyukjin Kwon edited comment on SPARK-12890 at 1/25/16 5:52 AM:
---

In that case, it will not read all the data but only footers (metadata) for 
each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would 
be empty because the required column is a partition column.

Oh, if you meant not filtering row groups, yes it will read all the row groups


was (Author: hyukjin.kwon):
In that case, it will not read all the data but only footers (metadata) for 
each file, {{_METADATA}} or {{_COMMON_METADATA}} as the requested columns would 
be empty because the required column is a partition column.



> Spark SQL query related to only partition fields should not scan the whole 
> data.
> 
>
> Key: SPARK-12890
> URL: https://issues.apache.org/jira/browse/SPARK-12890
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up 
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.






[jira] [Commented] (SPARK-12946) The SQL page is empty

2016-01-24 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114838#comment-15114838
 ] 

Josh Rosen commented on SPARK-12946:


Ah, thanks for providing these extra details; this is very helpful.

My hunch is that DDL operations like CREATE TABLE aren't triggering the 
execution of Spark jobs, which would explain why you don't see any queries on the SQL 
page. I suspect you'd see some output there if you ran an actual query 
against that table after creating it, though.

> The SQL page is empty
> -
>
> Key: SPARK-12946
> URL: https://issues.apache.org/jira/browse/SPARK-12946
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: KaiXinXIaoLei
> Attachments: SQLpage.png
>
>
> I run a SQL query using "bin/spark-sql --master yarn". Then I open the UI 
> and find the SQL page is empty






[jira] [Commented] (SPARK-12973) Support to set priority when submit spark application to YARN

2016-01-24 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114807#comment-15114807
 ] 

Saisai Shao commented on SPARK-12973:
-

I think there's a similar JIRA SPARK-10879 about this issue, and there's a 
closed PR about it.
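
For background on what such support would hook into: YARN exposes application 
priority on the submission context. A rough sketch of the plain Hadoop YARN client 
API follows (this is not Spark code, and the priority value of 10 is arbitrary):

{code}
// Sketch of the YARN-side API a Spark-level option would ultimately need to call.
import org.apache.hadoop.yarn.api.records.Priority
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object YarnPrioritySketch {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    val app = yarnClient.createApplication()
    val appContext = app.getApplicationSubmissionContext
    appContext.setPriority(Priority.newInstance(10))   // the knob this JIRA asks Spark to expose

    // ... then set the queue, resources, and AM container spec, and call
    // yarnClient.submitApplication(appContext) as usual.
    yarnClient.stop()
  }
}
{code}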

> Support to set priority when submit spark application to YARN
> -
>
> Key: SPARK-12973
> URL: https://issues.apache.org/jira/browse/SPARK-12973
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.1
>Reporter: Chaozhong Yang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4878) driverPropsFetcher causes spurious Akka disassociate errors

2016-01-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114861#comment-15114861
 ] 

Liang-Chi Hsieh edited comment on SPARK-4878 at 1/25/16 7:17 AM:
-

I think it is still alive and used. The above code sends the message 
{{RetrieveSparkProps}}  to {{CoarseGrainedSchedulerBackend.DriverEndpoint}} and 
receives spark properties back.


was (Author: viirya):
I think it is still alive and used. The above code sends the message 
{code}RetrieveSparkProps{code}  to 
{code}CoarseGrainedSchedulerBackend.DriverEndpoint{code} and receives spark 
properties back.

> driverPropsFetcher causes spurious Akka disassociate errors
> ---
>
> Key: SPARK-4878
> URL: https://issues.apache.org/jira/browse/SPARK-4878
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Stephen Haberman
>Priority: Minor
>
> The dedicated Akka system to fetching driver properties seems fine, but it 
> leads to very misleading "AssociationHandle$Disassociated", dead letter, etc. 
> sort of messages that can lead the user to believe something is wrong with 
> the cluster.
> (E.g. personally I thought it was a Spark -rc1/-rc2 bug and spent awhile 
> poking around until I saw in the code that driverPropsFetcher is 
> purposefully/immediately shutdown.)
> Is there any way to cleanly shutdown that initial akka system so that the 
> driver doesn't log these errors?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12977) Factoring out StreamingListener and UI to support history UI

2016-01-24 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-12977:
---

 Summary: Factoring out StreamingListener and UI to support history 
UI
 Key: SPARK-12977
 URL: https://issues.apache.org/jira/browse/SPARK-12977
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 1.6.0
Reporter: Saisai Shao






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12783) Dataset map serialization error

2016-01-24 Thread Muthu Jayakumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114340#comment-15114340
 ] 

Muthu Jayakumar commented on SPARK-12783:
-

I moved it to another file altogether (as attached).
I have another file that has the main thread like shown below..
{code}
object SparkJira extends App{
  val sc = //get sc.

  private val sqlContext: SQLContext = sc._2.sqlContext

  import sqlContext.implicits._
  val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
TestCaseClass("2015-05-01", "data2"))).toDF()

  df1.as[TestCaseClass].map(_.toStr).show() //works fine
  df1.as[TestCaseClass].map(_.toMyMap).show() //error
}

{code} 
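
For completeness, a fully self-contained variant of the snippet above (a sketch: the 
local-mode setup and object names are assumptions on my side, the case classes are 
the ones from the issue description, and collapsing everything into one file may or 
may not affect the behaviour being reported):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class MyMap(map: Map[String, String])

case class TestCaseClass(a: String, b: String) {
  def toMyMap: MyMap = MyMap(Map(a -> b))
  def toStr: String = a
}

object SparkJiraRepro {
  def main(args: Array[String]): Unit = {
    // Local-mode setup, standing in for the elided "//get sc." above.
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-12783-repro").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df1 = sqlContext.createDataset(Seq(
      TestCaseClass("2015-05-01", "data1"),
      TestCaseClass("2015-05-01", "data2"))).toDF()

    df1.as[TestCaseClass].map(_.toStr).show()   // works fine
    df1.as[TestCaseClass].map(_.toMyMap).show() // reported to fail with NotSerializableException

    sc.stop()
  }
}
{code}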

> Dataset map serialization error
> ---
>
> Key: SPARK-12783
> URL: https://issues.apache.org/jira/browse/SPARK-12783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Muthu Jayakumar
>Assignee: Wenchen Fan
>Priority: Critical
> Attachments: MyMap.scala
>
>
> When Dataset API is used to map to another case class, an error is thrown.
> {code}
> case class MyMap(map: Map[String, String])
> case class TestCaseClass(a: String, b: String){
>   def toMyMap: MyMap = {
> MyMap(Map(a->b))
>   }
>   def toStr: String = {
> a
>   }
> }
> //Main method section below
> import sqlContext.implicits._
> val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
> TestCaseClass("2015-05-01", "data2"))).toDF()
> df1.as[TestCaseClass].map(_.toStr).show() //works fine
> df1.as[TestCaseClass].map(_.toMyMap).show() //fails
> {code}
> Error message:
> {quote}
> Caused by: java.io.NotSerializableException: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1
> Serialization stack:
>   - object not serializable (class: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: 
> package lang)
>   - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: 
> class scala.reflect.internal.Symbols$Symbol)
>   - object (class scala.reflect.internal.Types$UniqueThisType, 
> java.lang.type)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: 
> class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, 
> type: class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String)
>   - field (class: 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, 
> type: class scala.reflect.api.Types$TypeApi)
>   - object (class 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, )
>   - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, 
> name: function, type: interface scala.Function1)
>   - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, 
> mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType))
>   - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: 
> targetObject, type: class 
> org.apache.spark.sql.catalyst.expressions.Expression)
>   - object (class org.apache.spark.sql.catalyst.expressions.Invoke, 
> invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;)))
>   - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@4c7e3aab)
>   - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.$colon$colon, 
> List(invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;)), 
> invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),valueArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;
>   - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, 
> name: arguments, type: interface scala.collection.Seq)
>

[jira] [Comment Edited] (SPARK-12783) Dataset map serialization error

2016-01-24 Thread Muthu Jayakumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114340#comment-15114340
 ] 

Muthu Jayakumar edited comment on SPARK-12783 at 1/24/16 3:12 PM:
--

I moved it to another file altogether (as attached).
I have another file that has the main thread like shown below..
{code}
object SparkJira extends App{
  val sc = //get sc.

  private val sqlContext: SQLContext = sc._2.sqlContext

  import sqlContext.implicits._
  val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
TestCaseClass("2015-05-01", "data2"))).toDF()

  df1.as[TestCaseClass].map(_.toStr).show() //works fine
  df1.as[TestCaseClass].map(_.toMyMap).show() //error
}
{code} 

I am using the 1.6 release version for testing. Would you like me to try some 
other version?


was (Author: babloo80):
I moved it to another file altogether (as attached).
I have another file that has the main thread like shown below..
{code}
object SparkJira extends App{
  val sc = //get sc.

  private val sqlContext: SQLContext = sc._2.sqlContext

  import sqlContext.implicits._
  val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
TestCaseClass("2015-05-01", "data2"))).toDF()

  df1.as[TestCaseClass].map(_.toStr).show() //works fine
  df1.as[TestCaseClass].map(_.toMyMap).show() //error
}

{code} 

> Dataset map serialization error
> ---
>
> Key: SPARK-12783
> URL: https://issues.apache.org/jira/browse/SPARK-12783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Muthu Jayakumar
>Assignee: Wenchen Fan
>Priority: Critical
> Attachments: MyMap.scala
>
>
> When Dataset API is used to map to another case class, an error is thrown.
> {code}
> case class MyMap(map: Map[String, String])
> case class TestCaseClass(a: String, b: String){
>   def toMyMap: MyMap = {
> MyMap(Map(a->b))
>   }
>   def toStr: String = {
> a
>   }
> }
> //Main method section below
> import sqlContext.implicits._
> val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
> TestCaseClass("2015-05-01", "data2"))).toDF()
> df1.as[TestCaseClass].map(_.toStr).show() //works fine
> df1.as[TestCaseClass].map(_.toMyMap).show() //fails
> {code}
> Error message:
> {quote}
> Caused by: java.io.NotSerializableException: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1
> Serialization stack:
>   - object not serializable (class: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: 
> package lang)
>   - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: 
> class scala.reflect.internal.Symbols$Symbol)
>   - object (class scala.reflect.internal.Types$UniqueThisType, 
> java.lang.type)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: 
> class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, 
> type: class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String)
>   - field (class: 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, 
> type: class scala.reflect.api.Types$TypeApi)
>   - object (class 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, )
>   - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, 
> name: function, type: interface scala.Function1)
>   - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, 
> mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType))
>   - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: 
> targetObject, type: class 
> org.apache.spark.sql.catalyst.expressions.Expression)
>   - object (class org.apache.spark.sql.catalyst.expressions.Invoke, 
> invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;)))
>   - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@4c7e3aab)
>   - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.$colon$colon, 
> 

[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue

2016-01-24 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114383#comment-15114383
 ] 

Mark Grover commented on SPARK-11796:
-

[~blbradley] I put instructions 
[here|https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-RunningDockerintegrationtests]
 on how to make tests pass.

> Docker JDBC integration tests fail in Maven build due to dependency issue
> -
>
> Key: SPARK-11796
> URL: https://issues.apache.org/jira/browse/SPARK-11796
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Mark Grover
> Fix For: 1.6.0
>
>
> Our new Docker integration tests for JDBC dialects are failing in the Maven 
> builds. For now, I've disabled this for Maven by adding the 
> {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins 
> builds, but we should fix this soon. The test failures seem to be related to 
> dependency or classpath issues:
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> {code}
> To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12783) Dataset map serialization error

2016-01-24 Thread Muthu Jayakumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muthu Jayakumar updated SPARK-12783:

Attachment: MyMap.scala

> Dataset map serialization error
> ---
>
> Key: SPARK-12783
> URL: https://issues.apache.org/jira/browse/SPARK-12783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Muthu Jayakumar
>Assignee: Wenchen Fan
>Priority: Critical
> Attachments: MyMap.scala
>
>
> When Dataset API is used to map to another case class, an error is thrown.
> {code}
> case class MyMap(map: Map[String, String])
> case class TestCaseClass(a: String, b: String){
>   def toMyMap: MyMap = {
> MyMap(Map(a->b))
>   }
>   def toStr: String = {
> a
>   }
> }
> //Main method section below
> import sqlContext.implicits._
> val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), 
> TestCaseClass("2015-05-01", "data2"))).toDF()
> df1.as[TestCaseClass].map(_.toStr).show() //works fine
> df1.as[TestCaseClass].map(_.toMyMap).show() //fails
> {code}
> Error message:
> {quote}
> Caused by: java.io.NotSerializableException: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1
> Serialization stack:
>   - object not serializable (class: 
> scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: 
> package lang)
>   - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: 
> class scala.reflect.internal.Symbols$Symbol)
>   - object (class scala.reflect.internal.Types$UniqueThisType, 
> java.lang.type)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: 
> class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String)
>   - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, 
> type: class scala.reflect.internal.Types$Type)
>   - object (class scala.reflect.internal.Types$AliasNoArgsTypeRef, String)
>   - field (class: 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, 
> type: class scala.reflect.api.Types$TypeApi)
>   - object (class 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, )
>   - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, 
> name: function, type: interface scala.Function1)
>   - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, 
> mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType))
>   - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: 
> targetObject, type: class 
> org.apache.spark.sql.catalyst.expressions.Expression)
>   - object (class org.apache.spark.sql.catalyst.expressions.Invoke, 
> invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;)))
>   - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@4c7e3aab)
>   - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy)
>   - object (class scala.collection.immutable.$colon$colon, 
> List(invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;)), 
> invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),valueArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  [Ljava.lang.Object;
>   - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, 
> name: arguments, type: interface scala.collection.Seq)
>   - object (class org.apache.spark.sql.catalyst.expressions.StaticInvoke, 
> staticinvoke(class 
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData$,ObjectType(interface 
> scala.collection.Map),toScalaMap,invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),-
>  field (class: "scala.collection.immutable.Map", name: "map"),- root class: 
> "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class
>  
> 

[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue

2016-01-24 Thread Brandon Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114384#comment-15114384
 ] 

Brandon Bradley commented on SPARK-11796:
-

[~mgrover] The test runs on the command line but not in IntelliJ. Looks like a 
classpath dependency issue, having trouble sorting it out.

> Docker JDBC integration tests fail in Maven build due to dependency issue
> -
>
> Key: SPARK-11796
> URL: https://issues.apache.org/jira/browse/SPARK-11796
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Mark Grover
> Fix For: 1.6.0
>
>
> Our new Docker integration tests for JDBC dialects are failing in the Maven 
> builds. For now, I've disabled this for Maven by adding the 
> {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins 
> builds, but we should fix this soon. The test failures seem to be related to 
> dependency or classpath issues:
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> {code}
> To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)

2016-01-24 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114420#comment-15114420
 ] 

Xiao Li commented on SPARK-12850:
-

To make the implementation of this JIRA simpler, I will first submit a separate 
PR to handle table bucketing when the partitioning columns overlap with the 
bucketing columns.

> Support bucket pruning (predicate pushdown for bucketed tables)
> ---
>
> Key: SPARK-12850
> URL: https://issues.apache.org/jira/browse/SPARK-12850
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> We now support bucketing. One optimization opportunity is to push some 
> predicates into the scan to skip scanning files that definitely won't match 
> the values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12970) Error in documentation

2016-01-24 Thread Haidar Hadi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114485#comment-15114485
 ] 

Haidar Hadi commented on SPARK-12970:
-

Let's consider the following code:

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val struct = StructType(StructField("f1", StringType, true) :: Nil)
val row = Row(1)
println(row.fieldIndex("f1"))
{code}

which generates the following error when executed:

{code}
Exception in thread "main" java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined.
{code}
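
For comparison, rows do carry a schema when they come out of a DataFrame that was 
built with an explicit StructType. A small sketch using the public API ({{sc}} and 
{{sqlContext}} are assumed to already exist, as in spark-shell; the data is made up):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val struct = StructType(StructField("f1", StringType, true) :: Nil)
val rdd = sc.parallelize(Seq(Row("hello")))
val df = sqlContext.createDataFrame(rdd, struct)

// Rows taken back out of the DataFrame have the schema attached, so fieldIndex works.
println(df.first().fieldIndex("f1"))   // prints 0
{code}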


> Error in documentation 
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating Row from Struct is wrong
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> the above example does not create a Row object with schema.
> this error is in the scala docs too. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue

2016-01-24 Thread Brandon Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114484#comment-15114484
 ] 

Brandon Bradley commented on SPARK-11796:
-

[~mgrover] I figured it out! I believe IntelliJ doesn't support shaded 
dependencies.

> Docker JDBC integration tests fail in Maven build due to dependency issue
> -
>
> Key: SPARK-11796
> URL: https://issues.apache.org/jira/browse/SPARK-11796
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Mark Grover
> Fix For: 1.6.0
>
>
> Our new Docker integration tests for JDBC dialects are failing in the Maven 
> builds. For now, I've disabled this for Maven by adding the 
> {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins 
> builds, but we should fix this soon. The test failures seem to be related to 
> dependency or classpath issues:
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> {code}
> To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10498) Add requirements file for create dev python tools

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-10498.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10871
[https://github.com/apache/spark/pull/10871]

> Add requirements file for create dev python tools
> -
>
> Key: SPARK-10498
> URL: https://issues.apache.org/jira/browse/SPARK-10498
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: holdenk
>Priority: Minor
> Fix For: 2.0.0
>
>
> Minor since so few people use them, but it would probably be good to have a 
> requirements file for our python release tools for easier setup (also version 
> pinning).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12970:
---
Summary: Error in documentation on creating rows with schemas defined by 
structs  (was: Error in documentation )

> Error in documentation on creating rows with schemas defined by structs
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating Row from Struct is wrong
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> the above example does not create a Row object with schema.
> this error is in the scala docs too. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12971) Address test isolation problems which broke Hive tests on Hadoop 2.3 SBT build

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-12971.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10884
[https://github.com/apache/spark/pull/10884]

> Address test isolation problems which broke Hive tests on Hadoop 2.3 SBT build
> --
>
> Key: SPARK-12971
> URL: https://issues.apache.org/jira/browse/SPARK-12971
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> ErrorPositionSuite and one of the HiveComparisonTest tests have been 
> consistently failing on the Hadoop 2.3 SBT build (but on no other builds). I 
> believe that this is due to test isolation issues (e.g. tests sharing state 
> via the sets of temporary tables that are registered to TestHive).
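
For readers unfamiliar with this failure mode, a generic sketch of the kind of 
per-test cleanup involved (this is not the actual patch in the pull request above; 
the suite and the shared context are made up, with a plain SQLContext standing in 
for TestHive):

{code}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterEach, FunSuite}

// Drop every temporary table a test registered, so later tests and suites that
// share the same context start from a clean slate.
abstract class IsolatedSqlSuite(sqlContext: SQLContext) extends FunSuite with BeforeAndAfterEach {
  override protected def afterEach(): Unit = {
    try {
      sqlContext.tableNames().foreach(sqlContext.dropTempTable)
    } finally {
      super.afterEach()
    }
  }
}
{code}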



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue

2016-01-24 Thread Brandon Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114484#comment-15114484
 ] 

Brandon Bradley edited comment on SPARK-11796 at 1/24/16 7:38 PM:
--

[~mgrover] I figured it out! I believe IntelliJ 15 doesn't support shaded 
dependencies from SBT. It imports jars from the shaded dependencies as well.


was (Author: blbradley):
[~mgrover] I figured it out! I believe IntelliJ doesn't support shaded 
dependencies.

> Docker JDBC integration tests fail in Maven build due to dependency issue
> -
>
> Key: SPARK-11796
> URL: https://issues.apache.org/jira/browse/SPARK-11796
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Mark Grover
> Fix For: 1.6.0
>
>
> Our new Docker integration tests for JDBC dialects are failing in the Maven 
> builds. For now, I've disabled this for Maven by adding the 
> {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins 
> builds, but we should fix this soon. The test failures seem to be related to 
> dependency or classpath issues:
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> {code}
> To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12970) Error in documentation

2016-01-24 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114500#comment-15114500
 ] 

Josh Rosen commented on SPARK-12970:


Also, general note regarding JIRA titles: "error in documentation" is a really 
bad title since that could mean anything. Next time, please choose a more 
descriptive-yet-concise title, since that makes issues easier to search and 
scan and helps the emails to have better subject lines in our inboxes.

> Error in documentation 
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating Row from Struct is wrong
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> the above example does not create a Row object with schema.
> this error is in the scala docs too. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)

2016-01-24 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114503#comment-15114503
 ] 

Reynold Xin commented on SPARK-12850:
-

I'd just do the simple cases first.


> Support bucket pruning (predicate pushdown for bucketed tables)
> ---
>
> Key: SPARK-12850
> URL: https://issues.apache.org/jira/browse/SPARK-12850
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> We now support bucketing. One optimization opportunity is to push some 
> predicates into the scan to skip scanning files that definitely won't match 
> the values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)

2016-01-24 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114415#comment-15114415
 ] 

Xiao Li commented on SPARK-12850:
-

I am doing the design and prototype. I noticed a few issues we need to consider:
- Partitioning columns could overlap with the bucketing columns;
- The predicates we can use for bucket pruning: EqualTo, EqualNullSafe, IsNull, 
In, InSet;
- We need to support mixed And and Or in the filters;
- After generating the set of buckets we need to scan, we should remove the 
corresponding filters, if possible.

Maybe I will just submit a simplified version at first.
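
To make the pruning idea concrete, here is a toy sketch of mapping equality 
predicates on the bucketing column to candidate bucket ids (the hash below is a 
placeholder; a real implementation must reuse exactly the hash function the writer 
used to assign buckets):

{code}
import org.apache.spark.sql.sources.{EqualTo, Filter}

// Returns the set of bucket ids that can possibly contain matching rows, or None
// if no usable predicate was found (in which case every bucket must be scanned).
// Taking the union of candidates is safe but not tight; intersecting per-filter
// candidate sets would prune harder when several equality filters apply.
def prunedBuckets(bucketColumn: String, numBuckets: Int, filters: Seq[Filter]): Option[Set[Int]] = {
  def bucketIdOf(value: Any): Int = {
    val h = value.hashCode() % numBuckets   // placeholder hash, for illustration only
    if (h < 0) h + numBuckets else h
  }
  val ids = filters.collect {
    case EqualTo(col, value) if col == bucketColumn => bucketIdOf(value)
  }
  if (ids.isEmpty) None else Some(ids.toSet)
}

// e.g. prunedBuckets("user_id", 8, Seq(EqualTo("user_id", 42))) keeps a single bucket.
{code}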

> Support bucket pruning (predicate pushdown for bucketed tables)
> ---
>
> Key: SPARK-12850
> URL: https://issues.apache.org/jira/browse/SPARK-12850
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> We now support bucketing. One optimization opportunity is to push some 
> predicates into the scan to skip scanning files that definitely won't match 
> the values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10498) Add requirements file for create dev python tools

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10498:
---
Assignee: holdenk

> Add requirements file for create dev python tools
> -
>
> Key: SPARK-10498
> URL: https://issues.apache.org/jira/browse/SPARK-10498
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: holdenk
>Assignee: holdenk
>Priority: Minor
> Fix For: 2.0.0
>
>
> Minor since so few people use them, but it would probably be good to have a 
> requirements file for our python release tools for easier setup (also version 
> pinning).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12850) Support bucket pruning (predicate pushdown for bucketed tables)

2016-01-24 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114515#comment-15114515
 ] 

Xiao Li commented on SPARK-12850:
-

Sure, will make a try. Thanks!

> Support bucket pruning (predicate pushdown for bucketed tables)
> ---
>
> Key: SPARK-12850
> URL: https://issues.apache.org/jira/browse/SPARK-12850
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>
> We now support bucketing. One optimization opportunity is to push some 
> predicates into the scan to skip scanning files that definitely won't match 
> the values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12120:
---
Assignee: Jeff Zhang

> Improve exception message when failing to initialize HiveContext in PySpark
> ---
>
> Key: SPARK-12120
> URL: https://issues.apache.org/jira/browse/SPARK-12120
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
>
> I get the following exception message when failing to initialize HiveContext. 
> This is hard to figure out why HiveContext failed to initialize. Actually I 
> build spark with hive profile enabled. The reason the HiveContext failed is 
> due to I didn't start hdfs service. And actually I can see the full 
> stacktrace in spark-shell. And I also can see the full stack trace in 
> python2. The issue only exists in python2.x
> {code}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, 
> in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, 
> in _ssql_ctx
> "build/sbt assembly", e)
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
> build/sbt assembly", Py4JJavaError(u'An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs

2016-01-24 Thread Haidar Hadi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114539#comment-15114539
 ] 

Haidar Hadi commented on SPARK-12970:
-

Sure, [~jrose], I understand.

> Error in documentation on creating rows with schemas defined by structs
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating Row from Struct is wrong
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> the above example does not create a Row object with schema.
> this error is in the scala docs too. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12917) Add DML support to Spark SQL for HIVE

2016-01-24 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114513#comment-15114513
 ] 

Herman van Hovell commented on SPARK-12917:
---

Could you be a bit more specific? What dml operations are you missing?

> Add DML support to Spark SQL for HIVE
> -
>
> Key: SPARK-12917
> URL: https://issues.apache.org/jira/browse/SPARK-12917
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hemang Nagar
>Priority: Blocker
>
> Spark SQL should be updated to support the DML operations that Hive has 
> supported since 0.14.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark

2016-01-24 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114523#comment-15114523
 ] 

Josh Rosen commented on SPARK-12120:


Fixed in 1.6.1 and 2.0.0.

> Improve exception message when failing to initialize HiveContext in PySpark
> ---
>
> Key: SPARK-12120
> URL: https://issues.apache.org/jira/browse/SPARK-12120
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
> Fix For: 1.6.1, 2.0.0
>
>
> I get the following exception message when failing to initialize HiveContext. 
> This is hard to figure out why HiveContext failed to initialize. Actually I 
> build spark with hive profile enabled. The reason the HiveContext failed is 
> due to I didn't start hdfs service. And actually I can see the full 
> stacktrace in spark-shell. And I also can see the full stack trace in 
> python2. The issue only exists in python2.x
> {code}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, 
> in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, 
> in _ssql_ctx
> "build/sbt assembly", e)
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
> build/sbt assembly", Py4JJavaError(u'An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12120) Improve exception message when failing to initialize HiveContext in PySpark

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-12120.

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.6.1

> Improve exception message when failing to initialize HiveContext in PySpark
> ---
>
> Key: SPARK-12120
> URL: https://issues.apache.org/jira/browse/SPARK-12120
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
> Fix For: 1.6.1, 2.0.0
>
>
> I get the following exception message when failing to initialize HiveContext. 
> This is hard to figure out why HiveContext failed to initialize. Actually I 
> build spark with hive profile enabled. The reason the HiveContext failed is 
> due to I didn't start hdfs service. And actually I can see the full 
> stacktrace in spark-shell. And I also can see the full stack trace in 
> python2. The issue only exists in python2.x
> {code}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 430, 
> in createDataFrame
> jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 691, 
> in _ssql_ctx
> "build/sbt assembly", e)
> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run 
> build/sbt assembly", Py4JJavaError(u'An error occurred while calling 
> None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o34))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10345) Flaky test: HiveCompatibilitySuite.nonblock_op_deduplicate

2016-01-24 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-10345.

Resolution: Cannot Reproduce

> Flaky test: HiveCompatibilitySuite.nonblock_op_deduplicate
> --
>
> Key: SPARK-10345
> URL: https://issues.apache.org/jira/browse/SPARK-10345
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41759/testReport/org.apache.spark.sql.hive.execution/HiveCompatibilitySuite/nonblock_op_deduplicate/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12970) Error in documentation on creating rows with schemas defined by structs

2016-01-24 Thread Haidar Hadi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114485#comment-15114485
 ] 

Haidar Hadi edited comment on SPARK-12970 at 1/24/16 9:09 PM:
--

[~srowen]
Let's consider the following code:

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val struct = StructType(StructField("f1", StringType, true) :: Nil)
val row = Row(1)
println(row.fieldIndex("f1"))
{code}

which generates the following error when executed:

{code}
Exception in thread "main" java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined.
{code}

Therefore, I do not think the struct schema is taken as a parameter when 
constructing the Row object.


was (Author: hhadi):
let's consider the following code:

import org.apache.spark.sql.types._
 val struct = StructType(StructField("f1", StringType, true) :: Nil)
 val row = Row(1)
 println(row.fieldIndex("f1"))
 
which generates the following error when executed:
Exception in thread "main" java.lang.UnsupportedOperationException: fieldIndex 
on a Row without schema is undefined.


> Error in documentation on creating rows with schemas defined by structs
> ---
>
> Key: SPARK-12970
> URL: https://issues.apache.org/jira/browse/SPARK-12970
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Haidar Hadi
>Priority: Minor
>  Labels: documentation
>
> The provided example in this doc 
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/StructType.html
>  for creating Row from Struct is wrong
>  // Create a Row with the schema defined by struct
>  val row = Row(Row(1, 2, true))
>  // row: Row = {@link 1,2,true}
>  
> the above example does not create a Row object with schema.
> this error is in the scala docs too. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12858) Remove duplicated code in metrics

2016-01-24 Thread Benjamin Fradet (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Fradet closed SPARK-12858.
---
Resolution: Not A Problem

> Remove duplicated code in metrics
> -
>
> Key: SPARK-12858
> URL: https://issues.apache.org/jira/browse/SPARK-12858
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Benjamin Fradet
>Priority: Minor
>
> I noticed there is some duplicated code in the sinks regarding the poll 
> period.
> Also, parts of the metrics.properties template are unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11796) Docker JDBC integration tests fail in Maven build due to dependency issue

2016-01-24 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114590#comment-15114590
 ] 

Mark Grover commented on SPARK-11796:
-

Awesome, thanks for sharing.

> Docker JDBC integration tests fail in Maven build due to dependency issue
> -
>
> Key: SPARK-11796
> URL: https://issues.apache.org/jira/browse/SPARK-11796
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Mark Grover
> Fix For: 1.6.0
>
>
> Our new Docker integration tests for JDBC dialects are failing in the Maven 
> builds. For now, I've disabled this for Maven by adding the 
> {{-Dtest.exclude.tags=org.apache.spark.tags.DockerTest}} flag to our Jenkins 
> builds, but we should fix this soon. The test failures seem to be related to 
> dependency or classpath issues:
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at 
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at 
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at 
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at 
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at 
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at 
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at 
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>   ...
> {code}
> To reproduce locally: {{build/mvn -pl docker-integration-tests package}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org