[jira] [Assigned] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16043:


Assignee: (was: Apache Spark)

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> The GenericArrayData class has a TODO to eliminate boxing/unboxing for a 
> primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to provide a GenericArrayData implementation specialized for a 
> primitive array, eliminating boxing/unboxing to reduce the runtime memory 
> footprint and improve performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-17 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16044:


 Summary: input_file_name() returns empty strings in data sources 
based on NewHadoopRDD.
 Key: SPARK-16044
 URL: https://issues.apache.org/jira/browse/SPARK-16044
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Hyukjin Kwon


The issue is that the {{input_file_name()}} function returns empty strings 
instead of file paths when a data source uses {{NewHadoopRDD}}; it is currently 
supported only for {{FileScanRDD}} and {{HadoopRDD}}.

To be clear, this does not affect Spark's internal data sources, because none 
of them currently uses {{NewHadoopRDD}}.

However, several external data sources do use it. For example:

spark-redshift - 
[here|https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149]
spark-xml - 
[here|https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47]

Currently, using this function produces the output below:

{code}
+-+
|input_file_name()|
+-+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+-+
{code}
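
A minimal repro sketch follows, assuming Spark 2.0 with the spark-xml package on 
the classpath and a local file {{books.xml}} (both the file and the row tag are 
illustrative assumptions, not part of this report). spark-xml reads through 
{{NewHadoopRDD}}, so {{input_file_name()}} currently comes back empty for it:

{code}
// Illustrative sketch only: reproduce the empty input_file_name() output
// for a NewHadoopRDD-based data source such as spark-xml.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

val spark = SparkSession.builder().appName("InputFileNameRepro").getOrCreate()

val df = spark.read
  .format("com.databricks.spark.xml")  // NewHadoopRDD-based data source
  .option("rowTag", "book")            // hypothetical row tag for the sample file
  .load("books.xml")                   // hypothetical input file

df.select(input_file_name()).show()    // expected: file paths; actual: empty strings
{code}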



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337617#comment-15337617
 ] 

Apache Spark commented on SPARK-16043:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/13758

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> The GenericArrayData class has a TODO to eliminate boxing/unboxing for a 
> primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to provide a GenericArrayData implementation specialized for a 
> primitive array, eliminating boxing/unboxing to reduce the runtime memory 
> footprint and improve performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16043:


Assignee: Apache Spark

> Prepare GenericArrayData implementation specialized for a primitive array
> -
>
> Key: SPARK-16043
> URL: https://issues.apache.org/jira/browse/SPARK-16043
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> The GenericArrayData class has a TODO to eliminate boxing/unboxing for a 
> primitive array (described 
> [here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).
> It would be good to provide a GenericArrayData implementation specialized for a 
> primitive array, eliminating boxing/unboxing to reduce the runtime memory 
> footprint and improve performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16022) Input size is different when I use 1 or 3 nodes but the shuffle size remains +- equal, do you know why?

2016-06-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337615#comment-15337615
 ] 

Sean Owen commented on SPARK-16022:
---

This belongs on the u...@spark.apache.org mailing list rather than JIRA; see http://spark.apache.org/community.html

> Input size is different when I use 1 or 3 nodes but the shuffle size remains 
> +- equal, do you know why?
> --
>
> Key: SPARK-16022
> URL: https://issues.apache.org/jira/browse/SPARK-16022
> Project: Spark
>  Issue Type: Test
>Reporter: jon
>
> I run some queries on Spark with just one node and then with 3 nodes, and in 
> the Spark UI on port 4040 I see something that I don't understand.
> For example, after executing a query with 3 nodes and checking the results in 
> the Spark UI, the "Input" tab shows 2.8 GB, so Spark read 2.8 GB from Hadoop. 
> The same query with just one node in local mode shows 7.3 GB, so Spark read 
> 7.3 GB from Hadoop. Shouldn't these values be equal?
> The shuffle size, for example, stays roughly equal with one node vs. 3 nodes. 
> Why doesn't the input value stay equal? The same amount of data must be read 
> from HDFS, so I don't understand.
> Do you know why?
> Single node:
> Input: 7.3 GB
> Shuffle read: 208.1 KB
> Shuffle write: 208.1 KB
> 3 nodes:
> Input: 2.8 GB
> Shuffle read: 193.3 KB
> Shuffle write: 208.1 KB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16040:
--
Priority: Trivial  (was: Minor)

OK, this does not need a JIRA

> spark.mllib PIC document extra line of reference
> 
>
> Key: SPARK-16040
> URL: https://issues.apache.org/jira/browse/SPARK-16040
> Project: Spark
>  Issue Type: Documentation
>Reporter: Miao Wang
>Priority: Trivial
>
> In the 2.0 documentation, the line "A full example that produces the experiment 
> described in the PIC paper can be found under examples/." is redundant. 
> The page already says "Find full example code at 
> "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
>  in the Spark repo.".
> We should remove the first line, to be consistent with other documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2016-06-17 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-16043:


 Summary: Prepare GenericArrayData implementation specialized for a 
primitive array
 Key: SPARK-16043
 URL: https://issues.apache.org/jira/browse/SPARK-16043
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Kazuaki Ishizaki


The GenericArrayData class has a TODO to eliminate boxing/unboxing for a 
primitive array (described 
[here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L31]).

It would be good to provide a GenericArrayData implementation specialized for a 
primitive array, eliminating boxing/unboxing to reduce the runtime memory 
footprint and improve performance.
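
For illustration only, here is a minimal sketch (hypothetical class and method 
names, not the actual Spark patch) contrasting the boxed storage that 
GenericArrayData uses today with a variant specialized for a primitive array:

{code}
// Today: element storage is Array[Any], so every primitive read unboxes.
class BoxedArrayData(values: Array[Any]) {
  def getInt(ordinal: Int): Int = values(ordinal).asInstanceOf[Int]  // unboxing on each access
}

// Sketch of a specialized variant: keep the primitive array as-is.
class IntArrayData(values: Array[Int]) {
  def numElements(): Int = values.length
  def getInt(ordinal: Int): Int = values(ordinal)   // no boxing/unboxing
  def toIntArray(): Array[Int] = values.clone()     // no per-element conversion
}
{code}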





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15973) Fix GroupedData Documentation

2016-06-17 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15973.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix GroupedData Documentation
> -
>
> Key: SPARK-15973
> URL: https://issues.apache.org/jira/browse/SPARK-15973
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vladimir Feinberg
>Priority: Trivial
> Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> (1)
> {{GroupedData.pivot}} documentation uses {{//}} instead of {{#}} for doctest 
> Python comments, which messes up formatting in the documentation as well as 
> the doctests themselves.
> A PR resolving this should probably also fix the other places this happens in 
> PySpark.
> (2)
> Simple aggregation functions which take column names {{cols}} as varargs 
> arguments show up in documentation with the argument {{args}}, but their 
> documentation refers to {{cols}}.
> The discrepancy is caused by an annotation, {{df_varargs_api}}, which 
> produces a temporary function with arguments {{args}} instead of {{cols}}, 
> creating the confusing documentation.
> (3)
> The {{pyspark.sql.GroupedData}} object stores the Java object it wraps in the 
> member variable {{self._jdf}}, which is exactly the same name that 
> {{pyspark.sql.DataFrame}} uses when referring to its wrapped object.
> The acronym is incorrect, standing for "Java DataFrame" instead of what 
> should be "Java GroupedData". As such, the name should be changed to 
> {{self._jgd}}; in fact, in the {{DataFrame.groupBy}} implementation, the 
> Java object is referred to as exactly {{jgd}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16025) Document OFF_HEAP storage level in 2.0

2016-06-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16025:
--
Priority: Minor  (was: Major)

> Document OFF_HEAP storage level in 2.0
> --
>
> Key: SPARK-16025
> URL: https://issues.apache.org/jira/browse/SPARK-16025
> Project: Spark
>  Issue Type: Documentation
>Reporter: Eric Liang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16023) Move InMemoryRelation to its own file

2016-06-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16023:
--
Issue Type: Improvement  (was: Bug)

> Move InMemoryRelation to its own file
> -
>
> Key: SPARK-16023
> URL: https://issues.apache.org/jira/browse/SPARK-16023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>
> Just to make InMemoryTableScanExec a little smaller and more readable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16023) Move InMemoryRelation to its own file

2016-06-17 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16023.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Move InMemoryRelation to its own file
> -
>
> Key: SPARK-16023
> URL: https://issues.apache.org/jira/browse/SPARK-16023
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Fix For: 2.0.0
>
>
> Just to make InMemoryTableScanExec a little smaller and more readable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16042:


Assignee: Apache Spark

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> When we run a Spark program with a projection over an array type, a null check 
> is generated at each call that writes an array element. If we know at 
> compilation time that none of the elements is {{null}}, we can eliminate the 
> null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16042:


Assignee: (was: Apache Spark)

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> When we run a Spark program with a projection over an array type, a null check 
> is generated at each call that writes an array element. If we know at 
> compilation time that none of the elements is {{null}}, we can eliminate the 
> null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337594#comment-15337594
 ] 

Apache Spark commented on SPARK-16042:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/13757

> Eliminate nullcheck code at projection for an array type
> 
>
> Key: SPARK-16042
> URL: https://issues.apache.org/jira/browse/SPARK-16042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> When we run a Spark program with a projection over an array type, a null check 
> is generated at each call that writes an array element. If we know at 
> compilation time that none of the elements is {{null}}, we can eliminate the 
> null-check code.
> {code}
> val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
> df.selectExpr("Array(v + 2.2, v + 3.3)").collect
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16042) Eliminate nullcheck code at projection for an array type

2016-06-17 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-16042:


 Summary: Eliminate nullcheck code at projection for an array type
 Key: SPARK-16042
 URL: https://issues.apache.org/jira/browse/SPARK-16042
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Kazuaki Ishizaki


When we run a Spark program with a projection over an array type, a null check 
is generated at each call that writes an array element. If we know at 
compilation time that none of the elements is {{null}}, we can eliminate the 
null-check code.

{code}
val df = sparkContext.parallelize(Seq(1.0, 2.0), 1).toDF("v")
df.selectExpr("Array(v + 2.2, v + 3.3)").collect
{code}
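
As an illustration of the idea only (hand-written Scala, not the code that 
Spark's codegen actually emits), the per-element null branch can be dropped 
when the array's element type is known to be non-nullable:

{code}
// Write loop when elements may be null: every iteration pays for the branch.
def copyNullable(values: Array[java.lang.Double]): Array[Double] = {
  val out = new Array[Double](values.length)
  var i = 0
  while (i < values.length) {
    if (values(i) != null) out(i) = values(i)  // per-element null check
    i += 1
  }
  out
}

// Write loop when containsNull = false is known at compilation time.
def copyNonNullable(values: Array[Double]): Array[Double] = {
  val out = new Array[Double](values.length)
  var i = 0
  while (i < values.length) {
    out(i) = values(i)                         // null check eliminated
    i += 1
  }
  out
}
{code}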



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15803) Support with statement syntax for SparkSession

2016-06-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-15803:
---
Assignee: Jeff Zhang

> Support with statement syntax for SparkSession
> --
>
> Key: SPARK-15803
> URL: https://issues.apache.org/jira/browse/SPARK-15803
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> It would be nice to support the {{with}} statement syntax for SparkSession, 
> like the following:
> {code}
> with SparkSession.builder.(...).getOrCreate() as session:
>   session.sql("show tables").show()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15803) Support with statement syntax for SparkSession

2016-06-17 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-15803.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13541
[https://github.com/apache/spark/pull/13541]

> Support with statement syntax for SparkSession
> --
>
> Key: SPARK-15803
> URL: https://issues.apache.org/jira/browse/SPARK-15803
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Jeff Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> It would be nice to support the {{with}} statement syntax for SparkSession, 
> like the following:
> {code}
> with SparkSession.builder.(...).getOrCreate() as session:
>   session.sql("show tables").show()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16035:
--
Assignee: Andrea Pasqua

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Assignee: Andrea Pasqua
>Priority: Minor
> Fix For: 1.6.2, 2.0.0
>
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses the string as if 
> there were an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>       raise ValueError("Tuple should end with ')'")
> with
>   if end == -1:
>       raise ValueError("Tuple should end with ')'")
> Please see the posted PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-16035.
---
   Resolution: Fixed
Fix Version/s: 1.6.2
   2.0.0

Issue resolved by pull request 13750
[https://github.com/apache/spark/pull/13750]

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
> Fix For: 2.0.0, 1.6.2
>
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses the string as if 
> there were an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>       raise ValueError("Tuple should end with ')'")
> with
>   if end == -1:
>       raise ValueError("Tuple should end with ')'")
> Please see the posted PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16041:


Assignee: Apache Spark

> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> Duplicate columns should not be allowed in `partitionBy`, `blockBy`, or 
> `sortBy` in DataFrameWriter. Duplicate columns could cause unpredictable 
> results, for example resolution failures.
> We should detect the duplicates and throw exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16041:


Assignee: (was: Apache Spark)

> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Duplicate columns should not be allowed in `partitionBy`, `blockBy`, or 
> `sortBy` in DataFrameWriter. Duplicate columns could cause unpredictable 
> results, for example resolution failures.
> We should detect the duplicates and throw exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337552#comment-15337552
 ] 

Apache Spark commented on SPARK-16041:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13756

> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Duplicate columns should not be allowed in `partitionBy`, `blockBy`, or 
> `sortBy` in DataFrameWriter. Duplicate columns could cause unpredictable 
> results, for example resolution failures.
> We should detect the duplicates and throw exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-16041:

Description: 
Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy` in . 
The duplicate columns could cause unpredictable results. For example, the 
resolution failure. 

We should detect the duplicates and issue exceptions with appropriate messages.


  was:
Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy`. The 
duplicate columns could cause unpredictable results. For example, the 
resolution failure. 

We should detect the duplicates and issue exceptions with appropriate messages.



> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy` in . 
> The duplicate columns could cause unpredictable results. For example, the 
> resolution failure. 
> We should detect the duplicates and issue exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-16041:

Description: 
Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy` in 
DataFrameWriter. The duplicate columns could cause unpredictable results. For 
example, the resolution failure. 

We should detect the duplicates and issue exceptions with appropriate messages.


  was:
Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy` in . 
The duplicate columns could cause unpredictable results. For example, the 
resolution failure. 

We should detect the duplicates and issue exceptions with appropriate messages.



> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Duplicate columns are not allowed in `partitionBy`, `blockBy`, `sortBy` in 
> DataFrameWriter. The duplicate columns could cause unpredictable results. For 
> example, the resolution failure. 
> We should detect the duplicates and issue exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy`

2016-06-17 Thread Xiao Li (JIRA)
Xiao Li created SPARK-16041:
---

 Summary: Disallow Duplicate Columns in `partitionBy`, `blockBy` 
and `sortBy` 
 Key: SPARK-16041
 URL: https://issues.apache.org/jira/browse/SPARK-16041
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


Duplicate columns should not be allowed in `partitionBy`, `blockBy`, or 
`sortBy`. Duplicate columns could cause unpredictable results, for example 
resolution failures.

We should detect the duplicates and throw exceptions with appropriate messages.
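
A minimal sketch of the kind of check this proposes (the helper name and error 
message below are illustrative, not the actual DataFrameWriter change):

{code}
// Reject duplicate column names before writing; `kind` names the offending API
// (partitionBy / blockBy / sortBy) so the error message points at the right call.
def assertNoDuplicates(kind: String, cols: Seq[String]): Unit = {
  val dups = cols.groupBy(identity).collect { case (c, occurrences) if occurrences.size > 1 => c }
  if (dups.nonEmpty) {
    throw new IllegalArgumentException(
      s"Found duplicate column(s) in $kind: ${dups.mkString(", ")}")
  }
}

assertNoDuplicates("partitionBy", Seq("year", "month"))  // ok
// assertNoDuplicates("sortBy", Seq("year", "year"))     // would throw
{code}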




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16041) Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in DataFrameWriter

2016-06-17 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-16041:

Summary: Disallow Duplicate Columns in `partitionBy`, `blockBy` and 
`sortBy` in DataFrameWriter  (was: Disallow Duplicate Columns in `partitionBy`, 
`blockBy` and `sortBy` )

> Disallow Duplicate Columns in `partitionBy`, `blockBy` and `sortBy` in 
> DataFrameWriter
> --
>
> Key: SPARK-16041
> URL: https://issues.apache.org/jira/browse/SPARK-16041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Duplicate columns should not be allowed in `partitionBy`, `blockBy`, or 
> `sortBy`. Duplicate columns could cause unpredictable results, for example 
> resolution failures.
> We should detect the duplicates and throw exceptions with appropriate 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16040:


Assignee: (was: Apache Spark)

> spark.mllib PIC document extra line of reference
> 
>
> Key: SPARK-16040
> URL: https://issues.apache.org/jira/browse/SPARK-16040
> Project: Spark
>  Issue Type: Documentation
>Reporter: Miao Wang
>Priority: Minor
>
> In the 2.0 documentation, the line "A full example that produces the experiment 
> described in the PIC paper can be found under examples/." is redundant. 
> The page already says "Find full example code at 
> "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
>  in the Spark repo.".
> We should remove the first line, to be consistent with other documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16040:


Assignee: Apache Spark

> spark.mllib PIC document extra line of reference
> 
>
> Key: SPARK-16040
> URL: https://issues.apache.org/jira/browse/SPARK-16040
> Project: Spark
>  Issue Type: Documentation
>Reporter: Miao Wang
>Assignee: Apache Spark
>Priority: Minor
>
> In the 2.0 documentation, the line "A full example that produces the experiment 
> described in the PIC paper can be found under examples/." is redundant. 
> The page already says "Find full example code at 
> "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
>  in the Spark repo.".
> We should remove the first line, to be consistent with other documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337550#comment-15337550
 ] 

Apache Spark commented on SPARK-16040:
--

User 'wangmiao1981' has created a pull request for this issue:
https://github.com/apache/spark/pull/13755

> spark.mllib PIC document extra line of reference
> 
>
> Key: SPARK-16040
> URL: https://issues.apache.org/jira/browse/SPARK-16040
> Project: Spark
>  Issue Type: Documentation
>Reporter: Miao Wang
>Priority: Minor
>
> In the 2.0 documentation, the line "A full example that produces the experiment 
> described in the PIC paper can be found under examples/." is redundant. 
> The page already says "Find full example code at 
> "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
>  in the Spark repo.".
> We should remove the first line, to be consistent with other documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16040) spark.mllib PIC document extra line of reference

2016-06-17 Thread Miao Wang (JIRA)
Miao Wang created SPARK-16040:
-

 Summary: spark.mllib PIC document extra line of reference
 Key: SPARK-16040
 URL: https://issues.apache.org/jira/browse/SPARK-16040
 Project: Spark
  Issue Type: Documentation
Reporter: Miao Wang
Priority: Minor


In the 2.0 documentation, the line "A full example that produces the experiment 
described in the PIC paper can be found under examples/." is redundant. 

The page already says "Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
 in the Spark repo.".

We should remove the first line, to be consistent with other documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16020) Fix complete mode aggregation with console sink

2016-06-17 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-16020.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix complete mode aggregation with console sink
> ---
>
> Key: SPARK-16020
> URL: https://issues.apache.org/jira/browse/SPARK-16020
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> Complete mode aggregation doesn't work with console sink. ConsoleSink just 
> shows the new data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16037:


Assignee: Apache Spark  (was: Wenchen Fan)

> use by-position resolution when inserting into a Hive table
> --
>
> Key: SPARK-16037
> URL: https://issues.apache.org/jira/browse/SPARK-16037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>
> INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
> For a Hive table, the result is 1, 3, 2, which is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16036:


Assignee: Wenchen Fan  (was: Apache Spark)

> better error message if the number of columns in SELECT clause doesn't match 
> the table schema
> -
>
> Key: SPARK-16036
> URL: https://issues.apache.org/jira/browse/SPARK-16036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16036:


Assignee: Apache Spark  (was: Wenchen Fan)

> better error message if the number of columns in SELECT clause doesn't match 
> the table schema
> -
>
> Key: SPARK-16036
> URL: https://issues.apache.org/jira/browse/SPARK-16036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>
> INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337543#comment-15337543
 ] 

Apache Spark commented on SPARK-16037:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/13754

> use by-position resolution when inserting into a Hive table
> --
>
> Key: SPARK-16037
> URL: https://issues.apache.org/jira/browse/SPARK-16037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
> For a Hive table, the result is 1, 3, 2, which is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16037:


Assignee: Wenchen Fan  (was: Apache Spark)

> use by-position resolution when inserting into a Hive table
> --
>
> Key: SPARK-16037
> URL: https://issues.apache.org/jira/browse/SPARK-16037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;
> For a Hive table, the result is 1, 3, 2, which is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337542#comment-15337542
 ] 

Apache Spark commented on SPARK-16036:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/13754

> better error message if the number of columns in SELECT clause doesn't match 
> the table schema
> -
>
> Key: SPARK-16036
> URL: https://issues.apache.org/jira/browse/SPARK-16036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16029) Deprecate dropTempTable in SparkR

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16029:


Assignee: Apache Spark

> Deprecate dropTempTable in SparkR
> -
>
> Key: SPARK-16029
> URL: https://issues.apache.org/jira/browse/SPARK-16029
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>
> This should be called dropTempView to match the new Scala API



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16029) Deprecate dropTempTable in SparkR

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16029:


Assignee: (was: Apache Spark)

> Deprecate dropTempTable in SparkR
> -
>
> Key: SPARK-16029
> URL: https://issues.apache.org/jira/browse/SPARK-16029
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> This should be called dropTempView to match the new Scala API



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16029) Deprecate dropTempTable in SparkR

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337536#comment-15337536
 ] 

Apache Spark commented on SPARK-16029:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/13753

> Deprecate dropTempTable in SparkR
> -
>
> Key: SPARK-16029
> URL: https://issues.apache.org/jira/browse/SPARK-16029
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> This should be called dropTempView to match the new Scala API



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-16038) we can omit the partition list when inserting into a Hive table

2016-06-17 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan deleted SPARK-16038:



> we can omit the partition list when inserting into a Hive table
> --
>
> Key: SPARK-16038
> URL: https://issues.apache.org/jira/browse/SPARK-16038
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16028) Remove the need to pass in a SparkContext for spark.lapply

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16028:


Assignee: Apache Spark

> Remove the need to pass in a SparkContext for spark.lapply 
> ---
>
> Key: SPARK-16028
> URL: https://issues.apache.org/jira/browse/SPARK-16028
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>
> Similar to https://github.com/apache/spark/pull/9192 and SPARK-10903, we 
> should remove the need to pass in a SparkContext to `spark.lapply`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16028) Remove the need to pass in a SparkContext for spark.lapply

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16028:


Assignee: (was: Apache Spark)

> Remove the need to pass in a SparkContext for spark.lapply 
> ---
>
> Key: SPARK-16028
> URL: https://issues.apache.org/jira/browse/SPARK-16028
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> Similar to https://github.com/apache/spark/pull/9192 and SPARK-10903, we 
> should remove the need to pass in a SparkContext to `spark.lapply`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16028) Remove the need to pass in a SparkContext for spark.lapply

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337534#comment-15337534
 ] 

Apache Spark commented on SPARK-16028:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/13752

> Remove the need to pass in a SparkContext for spark.lapply 
> ---
>
> Key: SPARK-16028
> URL: https://issues.apache.org/jira/browse/SPARK-16028
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> Similar to https://github.com/apache/spark/pull/9192 and SPARK-10903, we 
> should remove the need to pass in a SparkContext to `spark.lapply`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15159) SparkSession R API

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337532#comment-15337532
 ] 

Apache Spark commented on SPARK-15159:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/13751

> SparkSession R API
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>Assignee: Felix Cheung
>Priority: Blocker
> Fix For: 2.0.0
>
>
> HiveContext is to be deprecated in 2.0. Replace it with 
> SparkSession.builder.enableHiveSupport in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9857) Add expression functions into SparkR which conflict with the existing R's generic

2016-06-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337526#comment-15337526
 ] 

Shivaram Venkataraman commented on SPARK-9857:
--

[~yuu.ishik...@gmail.com] [~sunrui] Do we know what other functions fall into 
this category? I'm trying to see whether this work is done or whether we have 
missed something here.

> Add expression functions into SparkR which conflict with the existing R's 
> generic
> -
>
> Key: SPARK-9857
> URL: https://issues.apache.org/jira/browse/SPARK-9857
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>
> Add expression functions into SparkR which conflict with the existing R's 
> generic, like {{coalesce(e: Column*)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15124) R 2.0 QA: New R APIs and API docs

2016-06-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337523#comment-15337523
 ] 

Shivaram Venkataraman commented on SPARK-15124:
---

One more item on this list is the SparkSession change we merged recently. We'll 
need to update the examples and programming guide to reflect this.

cc [~dongjoon]

> R 2.0 QA: New R APIs and API docs
> -
>
> Key: SPARK-15124
> URL: https://issues.apache.org/jira/browse/SPARK-15124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Priority: Blocker
>
> Audit new public R APIs.  Take note of:
> * Correctness and uniformity of API
> * Documentation: Missing?  Bad links or formatting?
> ** Check both the generated docs linked from the user guide and the R command 
> line docs `?read.df`. These are generated using roxygen.
> As you find issues, please create JIRAs and link them to this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6817) DataFrame UDFs in R

2016-06-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337525#comment-15337525
 ] 

Shivaram Venkataraman commented on SPARK-6817:
--

I think all the ones we need for 2.0 are completed here.

[~srowen] Is there a clean way to mark the umbrella as complete for 2.0 and 
retarget the remaining for 2.1?

> DataFrame UDFs in R
> ---
>
> Key: SPARK-6817
> URL: https://issues.apache.org/jira/browse/SPARK-6817
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> This depends on some internal interface of Spark SQL, should be done after 
> merging into Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15159) SparkSession R API

2016-06-17 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-15159.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13635
[https://github.com/apache/spark/pull/13635]

> SparkSession R API
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>Priority: Blocker
> Fix For: 2.0.0
>
>
> HiveContext is to be deprecated in 2.0. Replace it with 
> SparkSession.builder.enableHiveSupport in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15159) SparkSession R API

2016-06-17 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-15159:
--
Assignee: Felix Cheung

> SparkSession R API
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>Assignee: Felix Cheung
>Priority: Blocker
> Fix For: 2.0.0
>
>
> HiveContext is to be deprecated in 2.0. Replace it with 
> SparkSession.builder.enableHiveSupport in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15946) Wrap the conversion utils in Python

2016-06-17 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang resolved SPARK-15946.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Wrap the conversion utils in Python
> ---
>
> Key: SPARK-15946
> URL: https://issues.apache.org/jira/browse/SPARK-15946
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Fix For: 2.0.0
>
>
> This is to wrap SPARK-15945 in Python, so that Python users can use it to 
> convert DataFrames with vector columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15129) Clarify conventions for calling Spark and MLlib from R

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-15129.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13285
[https://github.com/apache/spark/pull/13285]

> Clarify conventions for calling Spark and MLlib from R
> --
>
> Key: SPARK-15129
> URL: https://issues.apache.org/jira/browse/SPARK-15129
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Gayathri Murali
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Since some R API modifications happened in 2.0, we need to make the new 
> standards clear in the user guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15892) Incorrectly merged AFTAggregator with zero total count

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15892:
--
Fix Version/s: 2.0.0

> Incorrectly merged AFTAggregator with zero total count
> --
>
> Key: SPARK-15892
> URL: https://issues.apache.org/jira/browse/SPARK-15892
> Project: Spark
>  Issue Type: Bug
>  Components: Examples, ML, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Hyukjin Kwon
> Fix For: 1.6.2, 2.0.0
>
>
> Running the example (after the fix in 
> [https://github.com/apache/spark/pull/13393]) causes this failure:
> {code}
> Traceback (most recent call last):
>   
>   File 
> "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py",
>  line 49, in 
> model = aft.fit(training)
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", 
> line 64, in fit
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", 
> line 213, in _fit
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", 
> line 210, in _fit_java
>   File 
> "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 933, in __call__
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", 
> line 79, in deco
> pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number 
> of instances should be greater than 0.0, but got 0.'
> {code}
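
For illustration, a minimal Scala sketch (not the actual Spark patch) of the kind of guard that avoids this failure: when partitioned aggregators are merged, one that has seen zero instances is skipped so it cannot corrupt the combined statistics. The class and field names below are hypothetical.

{code}
// Hypothetical aggregator: only the zero-count merge guard matters here.
class SurvivalAggregator extends Serializable {
  var count: Long = 0L
  var lossSum: Double = 0.0

  def add(loss: Double): this.type = {
    count += 1
    lossSum += loss
    this
  }

  def merge(other: SurvivalAggregator): this.type = {
    // Skip partitions that processed no instances instead of failing later
    // on a "number of instances should be greater than 0" requirement.
    if (other.count != 0L) {
      count += other.count
      lossSum += other.lossSum
    }
    this
  }
}
{code}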



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15892) Incorrectly merged AFTAggregator with zero total count

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-15892.
---
   Resolution: Fixed
Fix Version/s: (was: 2.0.0)
   1.6.2

Issue resolved by pull request 13725
[https://github.com/apache/spark/pull/13725]

> Incorrectly merged AFTAggregator with zero total count
> --
>
> Key: SPARK-15892
> URL: https://issues.apache.org/jira/browse/SPARK-15892
> Project: Spark
>  Issue Type: Bug
>  Components: Examples, ML, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Hyukjin Kwon
> Fix For: 1.6.2
>
>
> Running the example (after the fix in 
> [https://github.com/apache/spark/pull/13393]) causes this failure:
> {code}
> Traceback (most recent call last):
>   
>   File 
> "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py",
>  line 49, in 
> model = aft.fit(training)
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", 
> line 64, in fit
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", 
> line 213, in _fit
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", 
> line 210, in _fit_java
>   File 
> "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 933, in __call__
>   File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", 
> line 79, in deco
> pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number 
> of instances should be greater than 0.0, but got 0.'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15603) Replace SQLContext with SparkSession in ML/MLLib

2016-06-17 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-15603.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Replace SQLContext with SparkSession in ML/MLLib
> 
>
> Key: SPARK-15603
> URL: https://issues.apache.org/jira/browse/SPARK-15603
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> This issue replaces all deprecated `SQLContext` occurrences with 
> `SparkSession` in `ML/MLLib` module except the following two classes. These 
> two classes use `SQLContext` as their function arguments.
> - ReadWrite.scala
> - TreeModels.scala
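
For readers following along, a minimal before/after sketch of the replacement this ticket performs; the app name and path below are placeholders.

{code}
import org.apache.spark.sql.SparkSession

// Before (deprecated in 2.0):
//   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//   val df = sqlContext.read.parquet("/tmp/data")

// After: obtain a SparkSession and use the same reader API on it.
val spark = SparkSession.builder()
  .appName("sqlcontext-to-sparksession")
  .getOrCreate()

val df = spark.read.parquet("/tmp/data")
{code}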



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-16033.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13747
[https://github.com/apache/spark/pull/13747]

> DataFrameWriter.partitionBy() can't be used together with 
> DataFrameWriter.insertInto()
> --
>
> Key: SPARK-16033
> URL: https://issues.apache.org/jira/browse/SPARK-16033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
> Fix For: 2.0.0
>
>
> When inserting into an existing partitioned table, partitioning columns 
> should always be determined by catalog metadata of the existing table to be 
> inserted. Extra {{partitionBy()}} calls don't make sense, and mess up 
> existing data because newly inserted data may have wrong partitioning 
> directory layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16028) Remove the need to pass in a SparkContext for spark.lapply

2016-06-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337475#comment-15337475
 ] 

Felix Cheung commented on SPARK-16028:
--

Fix ready as soon as the parent PR is merged.

> Remove the need to pass in a SparkContext for spark.lapply 
> ---
>
> Key: SPARK-16028
> URL: https://issues.apache.org/jira/browse/SPARK-16028
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> Similar to https://github.com/apache/spark/pull/9192 and SPARK-10903 we 
> should remove the need to pass in SparkContext to `spark.lapply`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16027) Fix SparkR session unit test

2016-06-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337476#comment-15337476
 ] 

Felix Cheung commented on SPARK-16027:
--

Fix ready as soon as parent PR is merged.


> Fix SparkR session unit test
> 
>
> Key: SPARK-16027
> URL: https://issues.apache.org/jira/browse/SPARK-16027
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Shivaram Venkataraman
>
> As described in https://github.com/apache/spark/pull/13635/files, the test 
> titled "repeatedly starting and stopping SparkR" does not seem to work 
> consistently with the new sparkR.session code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16039) Spark SQL - Number of rows inserted by Insert Sql

2016-06-17 Thread Prabhu Kasinathan (JIRA)
Prabhu Kasinathan created SPARK-16039:
-

 Summary: Spark SQL - Number of rows inserted by Insert Sql
 Key: SPARK-16039
 URL: https://issues.apache.org/jira/browse/SPARK-16039
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.1
Reporter: Prabhu Kasinathan


An INSERT statement in Spark SQL currently returns only "OK" and the time taken. It 
would be good if the INSERT statement also returned the number of rows inserted into 
the target table.

Example:

{code}
INSERT INTO TABLE target
SELECT * FROM source;
1000 rows inserted
OK
Time taken: 1 min 30 secs
{code}
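
Until such a count is reported, a hedged workaround sketch (assuming a SparkSession named {{spark}}, that the {{source}} and {{target}} tables already exist, and that no rows fail to write) is to count the source relation around the insert:

{code}
// Count the rows the INSERT will write, then run the statement itself.
val toInsert = spark.table("source").count()
spark.sql("INSERT INTO TABLE target SELECT * FROM source")
println(s"$toInsert rows inserted")
{code}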




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15340) Limit the size of the map used to cache JobConfs to avoid OOM

2016-06-17 Thread Zhongshuai Pei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337421#comment-15337421
 ] 

Zhongshuai Pei edited comment on SPARK-15340 at 6/18/16 1:47 AM:
-

[~clockfly]

1. I run in cluster mode on YARN and use beeline.
2. I run TPC-DS (500 GB, and it must be ORC) and set driver.memory to 30g.
3. It is a heap-space OOM. You can run "jstat -gc pid" and will find that the memory 
of the old generation grows fast and is not released.
4. I ran TPC-DS for 5 hours and the OOM happened.



was (Author: doingdone9):
[~clockfly]

1. I run in cluster mode on YARN and use spark-sql.
2. I run TPC-DS (500 GB, and it must be ORC) and set driver.memory to 30g.
3. It is a heap-space OOM. You can run "jstat -gc pid" and will find that the memory 
of the old generation grows fast and is not released.
4. I ran TPC-DS for 5 hours and the OOM happened.


> Limit the size of the map used to cache JobConfs to avoid OOM
> 
>
> Key: SPARK-15340
> URL: https://issues.apache.org/jira/browse/SPARK-15340
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Zhongshuai Pei
>Priority: Critical
>
> When I run TPC-DS (ORC) through the JDBC server, the driver always OOMs.
> I find tens of thousands of JobConf instances in the heap dump, and these JobConfs 
> cannot be recycled, so we should limit the size of the map used to cache JobConfs.
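
A minimal sketch of one way to bound such a cache, assuming Guava (already on Spark's classpath) is used; the cache key and the JobConf factory below are placeholders, not the actual HadoopRDD code.

{code}
import com.google.common.cache.{Cache, CacheBuilder}
import org.apache.hadoop.mapred.JobConf

// Bounded, GC-friendly cache instead of an unbounded HashMap.
val cachedJobConfs: Cache[String, JobConf] = CacheBuilder.newBuilder()
  .maximumSize(1000)   // evict entries beyond this many
  .softValues()        // let the GC reclaim values under memory pressure
  .build[String, JobConf]()

// Placeholder factory; real code would copy the broadcast Hadoop configuration.
def createJobConf(key: String): JobConf = new JobConf()

def getJobConf(key: String): JobConf = {
  val cached = cachedJobConfs.getIfPresent(key)
  if (cached != null) {
    cached
  } else {
    val conf = createJobConf(key)
    cachedJobConfs.put(key, conf)
    conf
  }
}
{code}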



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16016) where i can find the code of Extreme Learning Machine(elm) on spark

2016-06-17 Thread yueyou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337423#comment-15337423
 ] 

yueyou commented on SPARK-16016:


you say nothing

> where i can find the code of Extreme Learning Machine(elm) on spark
> ---
>
> Key: SPARK-16016
> URL: https://issues.apache.org/jira/browse/SPARK-16016
> Project: Spark
>  Issue Type: IT Help
>  Components: MLlib
>Affects Versions: 1.6.0
>Reporter: yueyou
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I can't find the code for Extreme Learning Machine (ELM) on Spark. Can someone 
> help me?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15340) Limit the size of the map used to cache JobConfs to avoid OOM

2016-06-17 Thread Zhongshuai Pei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337421#comment-15337421
 ] 

Zhongshuai Pei commented on SPARK-15340:


[~clockfly]

1. I run in cluster mode on YARN and use spark-sql.
2. I run TPC-DS (500 GB, and it must be ORC) and set driver.memory to 30g.
3. It is a heap-space OOM. You can run "jstat -gc pid" and will find that the memory 
of the old generation grows fast and is not released.
4. I ran TPC-DS for 5 hours and the OOM happened.


> Limit the size of the map used to cache JobConfs to avoid OOM
> 
>
> Key: SPARK-15340
> URL: https://issues.apache.org/jira/browse/SPARK-15340
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Zhongshuai Pei
>Priority: Critical
>
> When I run TPC-DS (ORC) through the JDBC server, the driver always OOMs.
> I find tens of thousands of JobConf instances in the heap dump, and these JobConfs 
> cannot be recycled, so we should limit the size of the map used to cache JobConfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337389#comment-15337389
 ] 

Andrea Pasqua commented on SPARK-16035:
---

https://github.com/apache/spark/pull/13750

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrea Pasqua updated SPARK-16035:
--
Comment: was deleted

(was: https://github.com/apache/spark/pull/13750)

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16035:


Assignee: (was: Apache Spark)

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337388#comment-15337388
 ] 

Apache Spark commented on SPARK-16035:
--

User 'andreapasqua' has created a pull request for this issue:
https://github.com/apache/spark/pull/13750

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16035:


Assignee: Apache Spark

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Assignee: Apache Spark
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16034:


Assignee: (was: Apache Spark)

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16034:


Assignee: Apache Spark

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>Assignee: Apache Spark
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337385#comment-15337385
 ] 

Apache Spark commented on SPARK-16034:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/13749

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16031) Add debug-only socket source in Structured Streaming

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16031:


Assignee: Matei Zaharia  (was: Apache Spark)

> Add debug-only socket source in Structured Streaming
> 
>
> Key: SPARK-16031
> URL: https://issues.apache.org/jira/browse/SPARK-16031
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL, Streaming
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>
> This is a debug-only version of SPARK-15842: for tutorials and debugging of 
> streaming apps, it would be nice to have a text-based socket source similar 
> to the one in Spark Streaming. It will clearly be marked as debug-only so 
> that users don't try to run it in production applications, because this type 
> of source cannot provide HA without storing a lot of state in Spark.
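
For context, a usage sketch of what such a source could look like from the DataFrame reader side; the "socket" format name, host, and port below are assumptions for illustration, not the final API.

{code}
// Read lines of text from a TCP socket and echo them to the console.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = lines.writeStream
  .format("console")
  .start()

query.awaitTermination()
{code}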



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16031) Add debug-only socket source in Structured Streaming

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337373#comment-15337373
 ] 

Apache Spark commented on SPARK-16031:
--

User 'mateiz' has created a pull request for this issue:
https://github.com/apache/spark/pull/13748

> Add debug-only socket source in Structured Streaming
> 
>
> Key: SPARK-16031
> URL: https://issues.apache.org/jira/browse/SPARK-16031
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL, Streaming
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>
> This is a debug-only version of SPARK-15842: for tutorials and debugging of 
> streaming apps, it would be nice to have a text-based socket source similar 
> to the one in Spark Streaming. It will clearly be marked as debug-only so 
> that users don't try to run it in production applications, because this type 
> of source cannot provide HA without storing a lot of state in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16031) Add debug-only socket source in Structured Streaming

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16031:


Assignee: Apache Spark  (was: Matei Zaharia)

> Add debug-only socket source in Structured Streaming
> 
>
> Key: SPARK-16031
> URL: https://issues.apache.org/jira/browse/SPARK-16031
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL, Streaming
>Reporter: Matei Zaharia
>Assignee: Apache Spark
>
> This is a debug-only version of SPARK-15842: for tutorials and debugging of 
> streaming apps, it would be nice to have a text-based socket source similar 
> to the one in Spark Streaming. It will clearly be marked as debug-only so 
> that users don't try to run it in production applications, because this type 
> of source cannot provide HA without storing a lot of state in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16038) we can omit the partition list when inserting into a Hive table

2016-06-17 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-16038:
---

 Summary: we can omit the partition list when inserting into a Hive table
 Key: SPARK-16038
 URL: https://issues.apache.org/jira/browse/SPARK-16038
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16037) use by-position resolution when inserting into a Hive table

2016-06-17 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-16037:
---

 Summary: use by-position resolution when inserting into a Hive table
 Key: SPARK-16037
 URL: https://issues.apache.org/jira/browse/SPARK-16037
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan


INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b;

The result is 1, 3, 2 for the Hive table, which is wrong.
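
A short sketch of the expected by-position behavior, assuming the table was created as {{CREATE TABLE src (a INT, b INT, c INT)}} in Hive and a SparkSession named {{spark}}:

{code}
// With by-position resolution the SELECT list maps to (a, b, c) in order,
// so the aliases "c" and "b" must be ignored.
spark.sql("INSERT INTO TABLE src SELECT 1, 2 AS c, 3 AS b")

// Expected row: a = 1, b = 2, c = 3 (not 1, 3, 2).
spark.sql("SELECT a, b, c FROM src").show()
{code}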



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrea Pasqua updated SPARK-16035:
--
Description: 
Running
  SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
will not raise an exception as expected, although it parses it as if there was 
an end parenthesis.

This can be fixed by replacing

  if start == -1:
   raise ValueError("Tuple should end with ')'")

with
 if end == -1:
   raise ValueError("Tuple should end with ')'")

Please see posted PR

  was:
Running
```
SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
```


> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrea Pasqua updated SPARK-16035:
--
Component/s: PySpark

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
>   SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> will not raise an exception as expected, although it parses it as if there 
> was an end parenthesis.
> This can be fixed by replacing
>   if start == -1:
>raise ValueError("Tuple should end with ')'")
> with
>  if end == -1:
>raise ValueError("Tuple should end with ')'")
> Please see posted PR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-17 Thread Sean Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Zhong updated SPARK-16034:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-16032

> Checks the partition columns when calling 
> dataFrame.write.mode("append").saveAsTable
> 
>
> Key: SPARK-16034
> URL: https://issues.apache.org/jira/browse/SPARK-16034
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Sean Zhong
>
> Suppose we have defined a partitioned table:
> {code}
> CREATE TABLE src (a INT, b INT, c INT)
> USING PARQUET
> PARTITIONED BY (a, b);
> {code}
> We should check the partition columns when appending DataFrame data to 
> existing table: 
> {code}
> val df = Seq((1, 2, 3)).toDF("a", "b", "c")
> df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16036) better error message if the number of columns in SELECT clause doesn't match the table schema

2016-06-17 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-16036:
---

 Summary: better error message if the number of columns in SELECT 
clause doesn't match the table schema
 Key: SPARK-16036
 URL: https://issues.apache.org/jira/browse/SPARK-16036
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan


INSERT INTO TABLE src PARTITION(b=2, c=3) SELECT 4, 5, 6;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrea Pasqua updated SPARK-16035:
--
Description: 
Running
```
SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
```

> The SparseVector parser fails checking for valid end parenthesis
> 
>
> Key: SPARK-16035
> URL: https://issues.apache.org/jira/browse/SPARK-16035
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Andrea Pasqua
>Priority: Minor
>
> Running
> ```
> SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] ')
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16035) The SparseVector parser fails checking for valid end parenthesis

2016-06-17 Thread Andrea Pasqua (JIRA)
Andrea Pasqua created SPARK-16035:
-

 Summary: The SparseVector parser fails checking for valid end 
parenthesis
 Key: SPARK-16035
 URL: https://issues.apache.org/jira/browse/SPARK-16035
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.6.1, 2.0.0
Reporter: Andrea Pasqua
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16034) Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-17 Thread Sean Zhong (JIRA)
Sean Zhong created SPARK-16034:
--

 Summary: Checks the partition columns when calling 
dataFrame.write.mode("append").saveAsTable
 Key: SPARK-16034
 URL: https://issues.apache.org/jira/browse/SPARK-16034
 Project: Spark
  Issue Type: Bug
Reporter: Sean Zhong


Suppose we have defined a partitioned table:
{code}
CREATE TABLE src (a INT, b INT, c INT)
USING PARQUET
PARTITIONED BY (a, b);
{code}

We should check the partition columns when appending DataFrame data to existing 
table: 
{code}
val df = Seq((1, 2, 3)).toDF("a", "b", "c")
df.write.partitionBy("b", "a").mode("append").saveAsTable("src")
{code}
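
A hypothetical sketch (not the actual patch) of the kind of check this asks for: on append, compare the columns passed to partitionBy() with the partition columns recorded in the catalog for the existing table.

{code}
// All names here are illustrative; the real check would live in the write path.
def checkPartitionColumns(
    requested: Seq[String],
    existing: Seq[String],
    table: String): Unit = {
  if (requested.nonEmpty && requested != existing) {
    throw new IllegalArgumentException(
      s"Specified partitioning [${requested.mkString(", ")}] does not match " +
      s"existing partitioning [${existing.mkString(", ")}] of table $table")
  }
}

// The example above should then fail fast:
// checkPartitionColumns(Seq("b", "a"), Seq("a", "b"), "src")
{code}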





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16033:


Assignee: Cheng Lian  (was: Apache Spark)

> DataFrameWriter.partitionBy() can't be used together with 
> DataFrameWriter.insertInto()
> --
>
> Key: SPARK-16033
> URL: https://issues.apache.org/jira/browse/SPARK-16033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> When inserting into an existing partitioned table, partitioning columns 
> should always be determined by catalog metadata of the existing table to be 
> inserted. Extra {{partitionBy()}} calls don't make sense, and mess up 
> existing data because newly inserted data may have wrong partitioning 
> directory layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16033:


Assignee: Apache Spark  (was: Cheng Lian)

> DataFrameWriter.partitionBy() can't be used together with 
> DataFrameWriter.insertInto()
> --
>
> Key: SPARK-16033
> URL: https://issues.apache.org/jira/browse/SPARK-16033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>
> When inserting into an existing partitioned table, partitioning columns 
> should always be determined by catalog metadata of the existing table to be 
> inserted. Extra {{partitionBy()}} calls don't make sense, and mess up 
> existing data because newly inserted data may have wrong partitioning 
> directory layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16030:


Assignee: Apache Spark  (was: Yin Huai)

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Apache Spark
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337341#comment-15337341
 ] 

Apache Spark commented on SPARK-16030:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/13746

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16030:


Assignee: Yin Huai  (was: Apache Spark)

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337343#comment-15337343
 ] 

Apache Spark commented on SPARK-16033:
--

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/13747

> DataFrameWriter.partitionBy() can't be used together with 
> DataFrameWriter.insertInto()
> --
>
> Key: SPARK-16033
> URL: https://issues.apache.org/jira/browse/SPARK-16033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> When inserting into an existing partitioned table, partitioning columns 
> should always be determined by catalog metadata of the existing table to be 
> inserted. Extra {{partitionBy()}} calls don't make sense, and mess up 
> existing data because newly inserted data may have wrong partitioning 
> directory layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-17 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337335#comment-15337335
 ] 

Gayathri Murali commented on SPARK-15997:
-

https://github.com/apache/spark/pull/13745 - This is the right link to the PR.

> Audit ml.feature Update documentation for ml feature transformers
> -
>
> Key: SPARK-15997
> URL: https://issues.apache.org/jira/browse/SPARK-15997
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Gayathri Murali
>Assignee: Gayathri Murali
>
> This JIRA is a subtask of SPARK-15100 and improves documentation for new 
> features added to 
> 1. HashingTF
> 2. CountVectorizer
> 3. QuantileDiscretizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-16033:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-16032

> DataFrameWriter.partitionBy() can't be used together with 
> DataFrameWriter.insertInto()
> --
>
> Key: SPARK-16033
> URL: https://issues.apache.org/jira/browse/SPARK-16033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> When inserting into an existing partitioned table, partitioning columns 
> should always be determined by catalog metadata of the existing table to be 
> inserted. Extra {{partitionBy()}} calls don't make sense, and mess up 
> existing data because newly inserted data may have wrong partitioning 
> directory layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16033) DataFrameWriter.partitionBy() can't be used together with DataFrameWriter.insertInto()

2016-06-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16033:
--

 Summary: DataFrameWriter.partitionBy() can't be used together with 
DataFrameWriter.insertInto()
 Key: SPARK-16033
 URL: https://issues.apache.org/jira/browse/SPARK-16033
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


When inserting into an existing partitioned table, partitioning columns should 
always be determined by catalog metadata of the existing table to be inserted. 
Extra {{partitionBy()}} calls don't make sense, and mess up existing data 
because newly inserted data may have wrong partitioning directory layout.
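
A short sketch of the combination this ticket rejects, assuming an existing table {{logs}} partitioned by {{ds}} in the catalog and a SparkSession named {{spark}}:

{code}
import spark.implicits._

val df = Seq(("2016-06-17", "event")).toDF("ds", "msg")

// partitionBy() conflicts with the partitioning already recorded in the
// catalog, so this call should be rejected rather than silently honored.
df.write.partitionBy("ds").insertInto("logs")

// Intended usage: let the existing table's metadata drive the layout.
df.write.insertInto("logs")
{code}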



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-16030:
---
Assignee: Yin Huai

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-16032:
--

 Summary: Audit semantics of various insertion operations related 
to partitioned tables
 Key: SPARK-16032
 URL: https://issues.apache.org/jira/browse/SPARK-16032
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Wenchen Fan
Priority: Blocker


We found that the semantics of various insertion operations related to partitioned 
tables can be inconsistent. This is an umbrella ticket for all related tickets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-16030:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-16032

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15997:


Assignee: Gayathri Murali  (was: Apache Spark)

> Audit ml.feature Update documentation for ml feature transformers
> -
>
> Key: SPARK-15997
> URL: https://issues.apache.org/jira/browse/SPARK-15997
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Gayathri Murali
>Assignee: Gayathri Murali
>
> This JIRA is a subtask of SPARK-15100 and improves documentation for new 
> features added to 
> 1. HashingTF
> 2. CountVectorizer
> 3. QuantileDiscretizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337285#comment-15337285
 ] 

Apache Spark commented on SPARK-15997:
--

User 'GayathriMurali' has created a pull request for this issue:
https://github.com/apache/spark/pull/13176

> Audit ml.feature Update documentation for ml feature transformers
> -
>
> Key: SPARK-15997
> URL: https://issues.apache.org/jira/browse/SPARK-15997
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Gayathri Murali
>Assignee: Gayathri Murali
>
> This JIRA is a subtask of SPARK-15100 and improves documentation for new 
> features added to 
> 1. HashingTF
> 2. CountVectorizer
> 3. QuantileDiscretizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15997:


Assignee: Apache Spark  (was: Gayathri Murali)

> Audit ml.feature Update documentation for ml feature transformers
> -
>
> Key: SPARK-15997
> URL: https://issues.apache.org/jira/browse/SPARK-15997
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Gayathri Murali
>Assignee: Apache Spark
>
> This JIRA is a subtask of SPARK-15100 and improves documentation for new 
> features added to 
> 1. HashingTF
> 2. CountVectorizer
> 3. QuantileDiscretizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16030) Allow specifying static partitions in an INSERT statement for data source tables

2016-06-17 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-16030:
-
Priority: Critical  (was: Major)

> Allow specifying static partitions in an INSERT statement for data source 
> tables
> 
>
> Key: SPARK-16030
> URL: https://issues.apache.org/jira/browse/SPARK-16030
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15916) JDBC AND/OR operator push down does not respect lower OR operator precedence

2016-06-17 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-15916:
---
Description: 
A table from SQL server Northwind database was registered as a JDBC dataframe.

A query was executed on Spark SQL, the {{northwind_dbo_Categories}} table is a 
temporary table which is a JDBC dataframe to 
{{\[northwind\].\[dbo\].\[Categories\]}} SQL server table:

SQL executed on Spark sql context:

{code:sql}
SELECT CategoryID FROM northwind_dbo_Categories
WHERE (CategoryID = 1 OR CategoryID = 2) AND CategoryName = 'Beverages'
{code}

Spark has done a proper predicate pushdown to JDBC; however, the parentheses around 
the two {{OR}} conditions were removed. Instead, the following query was sent over 
JDBC to SQL Server:

{code:sql}
SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE (CategoryID = 1) 
OR (CategoryID = 2) AND CategoryName = 'Beverages'
{code}

As a result, the last two conditions (around the AND operator) were considered 
as the highest precedence: {{(CategoryID = 2) AND CategoryName = 'Beverages'}}

Finally SQL Server has executed a query like this:

{code:sql}
SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE CategoryID = 1 OR 
(CategoryID = 2 AND CategoryName = 'Beverages')
{code}


  was:
A table from sql server Northwind database was registered as a JDBC dataframe.
A query was executed on Spark SQL, the "northwind_dbo_Categories" table is a 
temporary table which is a JDBC dataframe to "[northwind].[dbo].[Categories]" 
sql server table:

SQL executed on Spark sql context:
SELECT CategoryID FROM northwind_dbo_Categories
WHERE (CategoryID = 1 OR CategoryID = 2) AND CategoryName = 'Beverages'


Spark has done a proper predicate pushdown to JDBC, however parenthesis around 
two OR conditions was removed. Instead the following query was sent over JDBC 
to SQL Server:
SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE (CategoryID = 1) 
OR (CategoryID = 2) AND CategoryName = 'Beverages'


As a result, the last two conditions (around the AND operator) were considered 
as the highest precedence: (CategoryID = 2) AND CategoryName = 'Beverages'

Finally SQL Server has executed a query like this:
SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE CategoryID = 1 OR 
(CategoryID = 2 AND CategoryName = 'Beverages')



> JDBC AND/OR operator push down does not respect lower OR operator precedence
> 
>
> Key: SPARK-15916
> URL: https://issues.apache.org/jira/browse/SPARK-15916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Piotr Czarnas
>Assignee: Hyukjin Kwon
> Fix For: 2.0.0
>
>
> A table from the SQL Server Northwind database was registered as a JDBC DataFrame.
> A query was executed on Spark SQL; the {{northwind_dbo_Categories}} table is 
> a temporary table backed by a JDBC DataFrame over the 
> {{\[northwind\].\[dbo\].\[Categories\]}} SQL Server table.
> SQL executed on the Spark SQL context:
> {code:sql}
> SELECT CategoryID FROM northwind_dbo_Categories
> WHERE (CategoryID = 1 OR CategoryID = 2) AND CategoryName = 'Beverages'
> {code}
> Spark has done a proper predicate pushdown to JDBC; however, the parentheses 
> around the two {{OR}} conditions were removed. Instead, the following query was 
> sent over JDBC to SQL Server:
> {code:sql}
> SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE (CategoryID = 
> 1) OR (CategoryID = 2) AND CategoryName = 'Beverages'
> {code}
> As a result, the last two conditions (around the AND operator) were 
> considered as the highest precedence: {{(CategoryID = 2) AND CategoryName = 
> 'Beverages'}}
> Finally SQL Server has executed a query like this:
> {code:sql}
> SELECT "CategoryID" FROM [northwind].[dbo].[Categories] WHERE CategoryID = 1 
> OR (CategoryID = 2 AND CategoryName = 'Beverages')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15984) WARN message "o.a.h.y.s.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 8 is invalid" when starting application on YARN

2016-06-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337278#comment-15337278
 ] 

Saisai Shao commented on SPARK-15984:
-

Is there any problem? I guess you might have set the max app attempts to 0, so you 
get such a warning log. It is illegal to set 0 for max app attempts; it should be 
>= 1.

Here is the YARN code:

{code}
 int globalMaxAppAttempts = 
conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
if (individualMaxAppAttempts <= 0 ||
individualMaxAppAttempts > globalMaxAppAttempts) {
  this.maxAppAttempts = globalMaxAppAttempts;
  LOG.warn("The specific max attempts: " + individualMaxAppAttempts
  + " for application: " + applicationId.getId()
  + " is invalid, because it is out of the range [1, "
  + globalMaxAppAttempts + "]. Use the global max attempts instead.");
} else {
  this.maxAppAttempts = individualMaxAppAttempts;
}
{code}
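
If the submission really is carrying 0, one hedged way to rule this out is to set a valid value explicitly on the Spark side ({{spark.yarn.maxAppAttempts}} is an existing Spark-on-YARN setting); whether that is the cause of this particular report is not confirmed here.

{code}
import org.apache.spark.SparkConf

// Keep the per-application value within [1, yarn.resourcemanager.am.max-attempts]
// so YARN does not fall back to the global default with a warning.
val conf = new SparkConf()
  .setMaster("yarn")
  .set("spark.yarn.maxAppAttempts", "2")
{code}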

> WARN message "o.a.h.y.s.resourcemanager.rmapp.RMAppImpl: The specific max 
> attempts: 0 for application: 8 is invalid" when starting application on YARN
> --
>
> Key: SPARK-15984
> URL: https://issues.apache.org/jira/browse/SPARK-15984
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> When executing {{spark-shell}} on Spark on YARN 2.7.2 on Mac OS as follows:
> {code}
> YARN_CONF_DIR=hadoop-conf ./bin/spark-shell --master yarn -c 
> spark.shuffle.service.enabled=true --deploy-mode client -c 
> spark.scheduler.mode=FAIR
> {code}
> it ends up with the following WARN in the logs:
> {code}
> 2016-06-16 08:33:05,308 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 8
> 2016-06-16 08:33:07,305 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific 
> max attempts: 0 for application: 8 is invalid, because it is out of the range 
> [1, 2]. Use the global max attempts instead.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


