[jira] [Assigned] (SPARK-17101) Provide format identifier for TextFileFormat
     [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17101:
------------------------------------

    Assignee: Apache Spark

> Provide format identifier for TextFileFormat
> --------------------------------------------
>
>                 Key: SPARK-17101
>                 URL: https://issues.apache.org/jira/browse/SPARK-17101
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jacek Laskowski
>            Assignee: Apache Spark
>            Priority: Trivial
>
> Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format.
> {code}
> scala> spark.read.text("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[value#24] text
>
> == Analyzed Logical Plan ==
> value: string
> Relation[value#24] text
>
> == Optimized Logical Plan ==
> InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
>
> == Physical Plan ==
> InMemoryTableScan [value#24]
>    +- InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> {code}
> When you {{explain}} csv format you can see {{Format: CSV}}.
> {code}
> scala> spark.read.csv("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
>
> == Analyzed Logical Plan ==
> _c0: string, _c1: string, _c2: string, _c3: string
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
>
> == Optimized Logical Plan ==
> InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
>
> == Physical Plan ==
> InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
>    +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
> {code}
> The custom format is defined for JSON, too.
> {code}
> scala> spark.read.json("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_corrupt_record#93] json
>
> == Analyzed Logical Plan ==
> _corrupt_record: string
> Relation[_corrupt_record#93] json
>
> == Optimized Logical Plan ==
> InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
>
> == Physical Plan ==
> InMemoryTableScan [_corrupt_record#93]
>    +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
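The {{Format:}} entry for text falls back to the JVM's default {{Object.toString}}, which is why the plan prints {{org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c}} while CSV and JSON print short identifiers. A minimal Python sketch of the same idea (the class names and `describe_scan` helper below are illustrative, not Spark's implementation; Spark's actual fix would be an override of `toString` in the Scala `TextFileFormat` class):

```python
# Sketch: a class without an explicit repr falls back to the runtime default
# (analogous to Java's ClassName@hashcode), while an explicit override yields
# the short identifier seen for CSV in the plans above.

class FileFormat:
    """Base class; inherits the default object repr."""
    pass

class TextFileFormat(FileFormat):
    pass  # no __repr__ -> plan text would embed '<...TextFileFormat object at 0x...>'

class CsvFileFormat(FileFormat):
    def __repr__(self):
        return "CSV"  # short, stable identifier for plan output

def describe_scan(fmt):
    # Mimics how a plan line embeds the format's string form.
    return f"FileScan Format: {fmt!r}"

print(describe_scan(CsvFileFormat()))                     # FileScan Format: CSV
print("object at 0x" in describe_scan(TextFileFormat()))  # True
```

The design point is simply that the identifier should be defined once on the format class, so every plan that embeds the format prints the same short name.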
[jira] [Commented] (SPARK-17101) Provide format identifier for TextFileFormat
    [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424001#comment-15424001 ]

Apache Spark commented on SPARK-17101:
--------------------------------------

User 'jaceklaskowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/14680

> Provide format identifier for TextFileFormat
> --------------------------------------------
>
>                 Key: SPARK-17101
>                 URL: https://issues.apache.org/jira/browse/SPARK-17101
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
[jira] [Updated] (SPARK-16391) KeyValueGroupedDataset.reduceGroups should support partial aggregation
     [ https://issues.apache.org/jira/browse/SPARK-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-16391:
--------------------------------
    Target Version/s: 2.0.1, 2.1.0  (was: 2.1.0)

> KeyValueGroupedDataset.reduceGroups should support partial aggregation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-16391
>                 URL: https://issues.apache.org/jira/browse/SPARK-16391
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> KeyValueGroupedDataset.reduceGroups is currently implemented via flatMapGroups, which is very inefficient since it effectively does a physical group-by.
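Partial aggregation applies here because a reduce function is associative: values can be pre-combined within each partition before any shuffle, instead of materializing every group's full value list the way a flatMapGroups-based implementation must. A rough Python sketch of the two strategies (toy data structures, not Spark's API; the function names are illustrative):

```python
# Toy model: 'partitions' is a list of lists of (key, value) rows.
# reduce_groups_full_groups materializes whole groups first (flatMapGroups-style);
# reduce_groups_partial pre-combines per partition (map-side combine), so only
# one partial value per key per partition would need to cross the shuffle.

def reduce_groups_full_groups(partitions, f):
    groups = {}
    for part in partitions:          # conceptually: shuffle every row
        for k, v in part:
            groups.setdefault(k, []).append(v)
    out = {}
    for k, vs in groups.items():     # reduce each fully-materialized group
        acc = vs[0]
        for v in vs[1:]:
            acc = f(acc, v)
        out[k] = acc
    return out

def reduce_groups_partial(partitions, f):
    partials = []
    for part in partitions:          # combine locally before "shuffling"
        acc = {}
        for k, v in part:
            acc[k] = f(acc[k], v) if k in acc else v
        partials.append(acc)
    merged = {}                      # merge one partial per key per partition
    for acc in partials:
        for k, v in acc.items():
            merged[k] = f(merged[k], v) if k in merged else v
    return merged

parts = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
add = lambda x, y: x + y
print(reduce_groups_full_groups(parts, add))  # {'a': 8, 'b': 7}
print(reduce_groups_partial(parts, add))      # {'a': 8, 'b': 7}
```

Both strategies produce the same result; the partial version simply moves far less data per key, which is the efficiency the issue asks for.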
[jira] [Assigned] (SPARK-17102) bypass UserDefinedGenerator for json format check
     [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17102:
------------------------------------

    Assignee: Apache Spark  (was: Wenchen Fan)

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Apache Spark
[jira] [Commented] (SPARK-17102) bypass UserDefinedGenerator for json format check
    [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423989#comment-15423989 ]

Apache Spark commented on SPARK-17102:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14679

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
[jira] [Assigned] (SPARK-17102) bypass UserDefinedGenerator for json format check
     [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17102:
------------------------------------

    Assignee: Wenchen Fan  (was: Apache Spark)

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
[jira] [Updated] (SPARK-17101) Provide format identifier for TextFileFormat
     [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Laskowski updated SPARK-17101:
------------------------------------
    Description: 
Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format.

{code}
scala> spark.read.text("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[value#24] text

== Analyzed Logical Plan ==
value: string
Relation[value#24] text

== Optimized Logical Plan ==
InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct

== Physical Plan ==
InMemoryTableScan [value#24]
   +- InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}

When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}

  was:
Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format. When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}
[jira] [Created] (SPARK-17102) bypass UserDefinedGenerator for json format check
Wenchen Fan created SPARK-17102:
-----------------------------------

             Summary: bypass UserDefinedGenerator for json format check
                 Key: SPARK-17102
                 URL: https://issues.apache.org/jira/browse/SPARK-17102
             Project: Spark
          Issue Type: Test
          Components: SQL
            Reporter: Wenchen Fan
            Assignee: Wenchen Fan
[jira] [Created] (SPARK-17101) Provide format identifier for TextFileFormat
Jacek Laskowski created SPARK-17101:
---------------------------------------

             Summary: Provide format identifier for TextFileFormat
                 Key: SPARK-17101
                 URL: https://issues.apache.org/jira/browse/SPARK-17101
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Jacek Laskowski
            Priority: Trivial

Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format. When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}
[jira] [Resolved] (SPARK-17068) Retain view visibility information through out Analysis
     [ https://issues.apache.org/jira/browse/SPARK-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-17068.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.1

> Retain view visibility information through out Analysis
> --------------------------------------------------------
>
>                 Key: SPARK-17068
>                 URL: https://issues.apache.org/jira/browse/SPARK-17068
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>             Fix For: 2.0.1, 2.1.0
>
> Views in Spark SQL are replaced by their backing {{LogicalPlan}} during analysis. This can be confusing when dealing with and debugging large {{LogicalPlan}}s. I propose to add an identifier to the subquery alias in order to improve this.
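The proposal amounts to keeping the view's name attached to the alias node that replaces the view, so a large analyzed plan still shows where a view's subtree begins. A rough Python sketch of the idea (toy plan nodes; `SubqueryAlias` mirrors Spark's node name, but the fields and printing here are illustrative, not Spark's implementation):

```python
# Toy logical-plan nodes: tagging the alias node with the view identifier
# keeps view boundaries visible after the view is inlined during analysis.

class Relation:
    def __init__(self, name):
        self.name = name
    def pretty(self, indent=0):
        return " " * indent + f"Relation[{self.name}]"

class SubqueryAlias:
    def __init__(self, alias, child, view=None):
        self.alias, self.child, self.view = alias, child, view
    def pretty(self, indent=0):
        # With a retained view identifier, the plan names the inlined view.
        tag = f", view={self.view}" if self.view else ""
        return (" " * indent + f"SubqueryAlias {self.alias}{tag}\n"
                + self.child.pretty(indent + 2))

plan = SubqueryAlias("t", Relation("employees"), view="hr_view")
print(plan.pretty())
# SubqueryAlias t, view=hr_view
#   Relation[employees]
```

Without the `view=` tag the inlined subtree is indistinguishable from an ordinary aliased subquery, which is exactly the debugging pain the issue describes.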
[jira] [Updated] (SPARK-17068) Retain view visibility information through out Analysis
     [ https://issues.apache.org/jira/browse/SPARK-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17068:
--------------------------------
    Fix Version/s:     (was: 2.0.1)

> Retain view visibility information through out Analysis
> --------------------------------------------------------
>
>                 Key: SPARK-17068
>                 URL: https://issues.apache.org/jira/browse/SPARK-17068
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>             Fix For: 2.1.0
[jira] [Commented] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
    [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423917#comment-15423917 ]

Tim Sell commented on SPARK-17100:
----------------------------------

I don't know why, but using `dataframe.cache()` before the filter is a workaround.

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Updated] (SPARK-17084) Rename ParserUtils.assert to validate
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17084:
--------------------------------
    Summary: Rename ParserUtils.assert to validate  (was: Rename ParserUtils.assert to require)

> Rename ParserUtils.assert to validate
> --------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
>
> We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{require}}
[jira] [Updated] (SPARK-17084) Rename ParserUtils.assert to validate
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17084:
--------------------------------
    Description: We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{validate}}.  (was: We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{require}})

> Rename ParserUtils.assert to validate
> --------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
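The distinction being drawn is that an assert flags an internal invariant violation (a bug in the program, often compiled out in production), while a requirement check rejects invalid user input with a parse error that always fires. A small Python sketch of the renamed helper (hypothetical names and signature; Spark's actual method lives in Scala's ParserUtils):

```python
# Sketch of a 'validate'-style helper: unlike a bare assert, it always runs
# and raises an input-level error rather than signalling an internal bug.
# The ParseException class and message here are illustrative, not Spark's.

class ParseException(Exception):
    """Raised when user-supplied SQL text fails a requirement."""

def validate(condition: bool, message: str) -> None:
    if not condition:
        raise ParseException(message)

# Usage: reject an invalid unit in some hypothetical parser rule.
unit = "lightyears"
try:
    validate(unit in {"year", "month", "day"}, f"Unknown interval unit: {unit}")
except ParseException as e:
    print(e)  # Unknown interval unit: lightyears
```

Naming it `validate` (rather than `assert` or `require`) makes it explicit at each call site that the check guards user input, not program state.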
[jira] [Resolved] (SPARK-17084) Rename ParserUtils.assert to require
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-17084.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
                   2.0.1

> Rename ParserUtils.assert to require
> -------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
[jira] [Updated] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
     [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated SPARK-17100:
-----------------------------
    Attachment: test_bug.py

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Updated] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
     [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated SPARK-17100:
-----------------------------
    Attachment: bug.py

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Created] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
Tim Sell created SPARK-17100: Summary: pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException Key: SPARK-17100 URL: https://issues.apache.org/jira/browse/SPARK-17100 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.0 Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3. Reporter: Tim Sell In pyspark, when filtering on a udf derived column after some join types, the optimized logical plan results in a java.lang.UnsupportedOperationException. I could not replicate this in Scala code from the shell, just Python. It is a PySpark regression from Spark 1.6.2. This can be replicated with: bin/spark-submit bug.py {code:python:title=bug.py} import pyspark.sql.functions as F from pyspark.sql import Row, SparkSession if __name__ == '__main__': spark = SparkSession.builder.appName("test").getOrCreate() left = spark.createDataFrame([Row(a=1)]) right = spark.createDataFrame([Row(a=1)]) df = left.join(right, on='a', how='left_outer') df = df.withColumn('b', F.udf(lambda x: 'x')(df.a)) df = df.filter('b = "x"') df.explain(extended=True) {code} The output is: {code} == Parsed Logical Plan == 'Filter ('b = x) +- Project [a#0L, (a#0L) AS b#8] +- Project [a#0L] +- Join LeftOuter, (a#0L = a#3L) :- LogicalRDD [a#0L] +- LogicalRDD [a#3L] == Analyzed Logical Plan == a: bigint, b: string Filter (b#8 = x) +- Project [a#0L, (a#0L) AS b#8] +- Project [a#0L] +- Join LeftOuter, (a#0L = a#3L) :- LogicalRDD [a#0L] +- LogicalRDD [a#3L] == Optimized Logical Plan == java.lang.UnsupportedOperationException: Cannot evaluate expression: (input[0, bigint, true]) == Physical Plan == java.lang.UnsupportedOperationException: Cannot evaluate expression: (input[0, bigint, true]) {code} It fails when the join is: * how='outer', on=column expression * how='left_outer', on=string or column expression * how='right_outer', on=string or column expression It passes when the join is: * how='inner', on=string or column expression * how='outer', on=string
I made some tests to demonstrate each of these. Run with bin/spark-submit test_bug.py
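The pass/fail matrix above could be driven by a small parametrized harness along these lines (a sketch only, with hypothetical names; this is not the attached test_bug.py). The pyspark imports are deferred into the helper so the case table itself is plain data:

```python
# Hypothetical parametrization of the matrix from the report:
# (join type, how the join key is given, whether the filter is expected to work).
CASES = [
    ("inner",       "string", True),
    ("inner",       "column", True),
    ("outer",       "string", True),
    ("outer",       "column", False),
    ("left_outer",  "string", False),
    ("left_outer",  "column", False),
    ("right_outer", "string", False),
    ("right_outer", "column", False),
]

def run_case(spark, how, on_type):
    """Rebuild the join from bug.py and filter on a UDF-derived column."""
    import pyspark.sql.functions as F  # deferred: only needed on a live session
    from pyspark.sql import Row
    left = spark.createDataFrame([Row(a=1)])
    right = spark.createDataFrame([Row(a=1)])
    on = "a" if on_type == "string" else left.a == right.a
    df = left.join(right, on=on, how=how)
    df = df.withColumn("b", F.udf(lambda x: "x")(df.a))
    # On the affected 2.0.0 configurations this raises during optimization.
    return df.filter('b = "x"').count()
```

Each failing case corresponds to an outer-style join where the UDF input column can be null-padded.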
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by ChunkedByteBuffer (was: The size of ByteBuffers can not be greater than 2G, should be replaced by Java) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by > ChunkedByteBuffer
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by Java (was: the various 2G limit we have in Spark, due to the use of ByteBuffers.) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by Java
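The 2 GB ceiling exists because a single java.nio.ByteBuffer indexes its contents with an int, capping it at Integer.MAX_VALUE bytes. The idea behind a chunked buffer can be sketched in plain Python (an illustration of the concept only, not Spark's ChunkedByteBuffer API):

```python
# Illustrative sketch: instead of one contiguous buffer whose size must fit in
# a single (bounded) index, hold a list of small chunks; total size is a sum
# over chunks and is not capped by any single chunk's maximum length.
class ChunkedBuffer:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = [bytearray()]

    def write(self, data):
        """Append bytes, spilling into a fresh chunk whenever the current
        chunk reaches chunk_size."""
        for b in bytes(data):
            if len(self.chunks[-1]) >= self.chunk_size:
                self.chunks.append(bytearray())
            self.chunks[-1].append(b)

    def size(self):
        return sum(len(c) for c in self.chunks)

buf = ChunkedBuffer(chunk_size=4)
buf.write(b"0123456789")
# buf.chunks now holds three chunks of 4, 4 and 2 bytes; buf.size() -> 10
```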
[jira] [Commented] (SPARK-17093) Roundtrip encoding of array<struct<...>> fields is wrong when whole-stage codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423777#comment-15423777 ] Liwei Lin commented on SPARK-17093: --- Oh the interpreted evaluation codepath indeed forgot to {{copy}} somewhere. I'll submit a patch shortly, thanks. > Roundtrip encoding of array<struct<...>> fields is wrong when whole-stage > codegen is disabled > -- > > Key: SPARK-17093 > URL: https://issues.apache.org/jira/browse/SPARK-17093 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Priority: Critical > > The following failing test demonstrates a bug where Spark mis-encodes > array-of-struct fields if whole-stage codegen is disabled: > {code} > withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { > val data = Array(Array((1, 2), (3, 4))) > val ds = spark.sparkContext.parallelize(data).toDS() > assert(ds.collect() === data) > } > {code} > When wholestage codegen is enabled (the default), this works fine. When it's > disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. > Because the last element of the array appears to be repeated my best guess is > that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
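The suspected missing {{copy()}} is a classic aliasing bug: a reused mutable row buffer is stored without being copied, so every stored entry ends up aliasing the last value written. A minimal Python illustration of the failure mode (not Spark's actual codepath):

```python
def collect_rows(values, copy_before_store):
    """Fill a reused two-slot buffer per row (as generated code reuses an
    internal row) and optionally copy it before storing."""
    buf = [None, None]
    out = []
    for a, b in values:
        buf[0], buf[1] = a, b
        out.append(list(buf) if copy_before_store else buf)
    return out

data = [(1, 2), (3, 4)]
# Without the copy, both stored entries alias the same buffer, which ends up
# holding the last row -- the [[3, 4], [3, 4]] shape seen in the report.
broken = collect_rows(data, copy_before_store=False)
# With the copy, each row is preserved: [[1, 2], [3, 4]].
fixed = collect_rows(data, copy_before_store=True)
```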
[jira] [Updated] (SPARK-17099) Incorrect result when HAVING clause is added to group by query
[ https://issues.apache.org/jira/browse/SPARK-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17099: --- Description: Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by the {{HAVING}} clause: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. was: Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. 
With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by it: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. > Incorrect result when HAVING clause is added to group by query > -- > > Key: SPARK-17099 > URL: https://issues.apache.org/jira/browse/SPARK-17099 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Priority: Critical > Fix For: 2.1.0 > > > Random query generation uncovered the following que
[jira] [Created] (SPARK-17099) Incorrect result when complex HAVING clause is added to query
Josh Rosen created SPARK-17099: -- Summary: Incorrect result when complex HAVING clause is added to query Key: SPARK-17099 URL: https://issues.apache.org/jira/browse/SPARK-17099 Project: Spark Issue Type: Bug Affects Versions: 2.1.0 Reporter: Josh Rosen Priority: Critical Fix For: 2.1.0 Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by it: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. 
[jira] [Updated] (SPARK-17099) Incorrect result when HAVING clause is added to group by query
[ https://issues.apache.org/jira/browse/SPARK-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17099: --- Summary: Incorrect result when HAVING clause is added to group by query (was: Incorrect result when complex HAVING clause is added to query) > Incorrect result when HAVING clause is added to group by query > -- > > Key: SPARK-17099 > URL: https://issues.apache.org/jira/browse/SPARK-17099 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Priority: Critical > Fix For: 2.1.0 > > > Random query generation uncovered the following query which returns incorrect > results when run on Spark SQL. This wasn't the original query uncovered by > the generator, since I performed a bit of minimization to try to make it more > understandable. > With the following tables: > {code} > val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") > val t2 = sc.parallelize( > Seq( > (-769, -244), > (-800, -409), > (940, 86), > (-507, 304), > (-367, 158)) > ).toDF("int_col_2", "int_col_5") > t1.registerTempTable("t1") > t2.registerTempTable("t2") > {code} > Run > {code} > SELECT > (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) > FROM t1 > RIGHT JOIN t2 > ON (t2.int_col_2) = (t1.int_col_5) > GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), > COALESCE(t1.int_col_5, t2.int_col_2) > HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, > t2.int_col_2)) * 2) > {code} > In Spark SQL, this returns an empty result set, whereas Postgres returns four > rows. 
However, if I omit the {{HAVING}} clause I see that the group's rows > are being incorrectly filtered by it: > {code} > +--+---+--+ > | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) > | > +--+---+--+ > | -507 | -1014 > | > | 940 | 1880 > | > | -769 | -1538 > | > | -367 | -734 > | > | -800 | -1600 > | > +--+---+--+ > {code} > Based on this, the output after adding the {{HAVING}} should contain four > rows, not zero. > I'm not sure how to further shrink this in a straightforward way, so I'm > opening this bug to get help in triaging further.
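The expected four rows can be checked outside Spark by simulating the RIGHT JOIN, GROUP BY and HAVING from the query above in plain Python (a sketch specific to this data, not general SQL semantics):

```python
# Data from the report: t1.int_col_5, and t2 as (int_col_2, int_col_5).
t1 = [-234, 145, 367, 975, 298]
t2 = [(-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)]

def coalesce(*xs):
    return next(x for x in xs if x is not None)

# RIGHT JOIN on t2.int_col_2 = t1.int_col_5: no key matches here, so every
# joined row carries NULL for the t1 side.
joined = [(None, c2, c5) for (c2, c5) in t2]

groups = {}
for t1_5, t2_2, t2_5 in joined:
    key = (max(coalesce(t2_5, 109), coalesce(t1_5, -449)),  # GREATEST(...)
           coalesce(t1_5, t2_2))
    groups.setdefault(key, []).append(coalesce(t1_5, t2_2))

# HAVING SUM(COALESCE(...)) > COALESCE(...) * 2
result = sorted((sum(vs), key[1] * 2) for key, vs in groups.items()
                if sum(vs) > key[1] * 2)
# result -> [(-800, -1600), (-769, -1538), (-507, -1014), (-367, -734)],
# i.e. the four rows Postgres returns and Spark SQL drops.
```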
[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423745#comment-15423745 ] Jeff Zhang commented on SPARK-17054: I pushed another commit to disable downloading Spark if it is cluster mode. After that I can run SparkR successfully in yarn-cluster mode. > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is due to it downloading SparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... 
sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423741#comment-15423741 ] Jeff Zhang commented on SPARK-16578: Another scenario I'd like to clarify: if we launch the R process on the client machine and the R backend process in the AM container, is this client mode or cluster mode? > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it
[jira] [Assigned] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17054: Assignee: Apache Spark > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang >Assignee: Apache Spark > > This is due to it download sparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. 
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Assigned] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17054: Assignee: (was: Apache Spark) > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is due to it download sparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. 
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-16757) Set up caller context to HDFS
[ https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423730#comment-15423730 ] Weiqing Yang commented on SPARK-16757: -- Hi, [~srowen] Could you help review this PR please? > Set up caller context to HDFS > - > > Key: SPARK-16757 > URL: https://issues.apache.org/jira/browse/SPARK-16757 > Project: Spark > Issue Type: Sub-task >Reporter: Weiqing Yang > > In this jira, Spark will invoke hadoop caller context api to set up its > caller context to HDFS.
[jira] [Commented] (SPARK-16947) Support type coercion and foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423678#comment-15423678 ] Apache Spark commented on SPARK-16947: -- User 'petermaxlee' has created a pull request for this issue: https://github.com/apache/spark/pull/14676 > Support type coercion and foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
[jira] [Updated] (SPARK-16947) Support type coercion and foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee updated SPARK-16947: -- Summary: Support type coercion and foldable expression for inline tables (was: Improve type coercion and support foldable expression for inline tables) > Support type coercion and foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
[jira] [Updated] (SPARK-16947) Improve type coercion and support foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee updated SPARK-16947: -- Summary: Improve type coercion and support foldable expression for inline tables (was: Improve type coercion of inline tables) > Improve type coercion and support foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
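The difference between the two coercion strategies described above can be sketched in plain Python (illustrative only; Spark's actual type-coercion rules are far richer than this three-type widening chain):

```python
def first_row_coercion(rows):
    """Old behaviour: every later value is cast to the first row's type,
    which can silently truncate (the 'least surprise' violation)."""
    target = [type(v) for v in rows[0]]
    return [[t(v) for t, v in zip(target, row)] for row in rows]

WIDENING = [int, float, str]  # simplified widening chain for the sketch

def common_type(a, b):
    return WIDENING[max(WIDENING.index(a), WIDENING.index(b))]

def union_coercion(rows):
    """Union-style behaviour: widen each column to the least common type
    across all rows, then cast every value to it."""
    target = [type(v) for v in rows[0]]
    for row in rows[1:]:
        target = [common_type(t, type(v)) for t, v in zip(target, row)]
    return [[t(v) for t, v in zip(target, row)] for row in rows]

rows = [[1, 'A'], [2.5, 'B']]
first_row_coercion(rows)  # truncates 2.5 -> 2, because row one had an int
union_coercion(rows)      # widens the column to float: 1 -> 1.0, 2.5 kept
```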
[jira] [Updated] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-17096: -- Description: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} was: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace. 
> For example, if there is a / by zero exception in a task, the > QueryTerminated.stackTrace will have > {code} > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) > {code}
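In Python terms, the proposed fix amounts to reporting the root cause rather than the wrapper; a sketch of the idea (not the actual Scala change in the PR):

```python
import traceback

def root_cause(exc):
    """Follow the __cause__ chain to the original exception, analogous to
    unwrapping StreamingQueryException to reach the real task failure."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

try:
    try:
        1 / 0  # the actual failure inside a "task"
    except ZeroDivisionError as e:
        raise RuntimeError("Query terminated with exception") from e
except RuntimeError as wrapper:
    cause = root_cause(wrapper)
    # The useful report is the cause's type, message and stack trace --
    # not the wrapper's, which only shows the driver-side run loop.
    report = "".join(
        traceback.format_exception(type(cause), cause, cause.__traceback__))
```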
[jira] [Updated] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-17096: -- Description: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} was:Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace. > For example, if there is a / by zero exception in a task, the > QueryTerminated.stackTrace will have > {code} > > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) > > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) > {code}
[jira] [Assigned] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17096: Assignee: Apache Spark (was: Tathagata Das) > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Commented] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423650#comment-15423650 ] Apache Spark commented on SPARK-17096: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/14675 > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Assigned] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17096: Assignee: Tathagata Das (was: Apache Spark) > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Commented] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
[ https://issues.apache.org/jira/browse/SPARK-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423634#comment-15423634 ] Josh Rosen commented on SPARK-17098: Actually, given the error here I think that the problem could be that sometimes {{WindowExpression.foldable == true}} even though {{WindowExpression}} is {{Unevaluable}}: {code} case class WindowExpression( windowFunction: Expression, windowSpec: WindowSpecDefinition) extends Expression with Unevaluable { override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil override def dataType: DataType = windowFunction.dataType override def foldable: Boolean = windowFunction.foldable override def nullable: Boolean = windowFunction.nullable override def toString: String = s"$windowFunction $windowSpec" override def sql: String = windowFunction.sql + " OVER " + windowSpec.sql } {code} /cc [~hvanhovell], FYI. > "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during > analysis > - > > Key: SPARK-17098 > URL: https://issues.apache.org/jira/browse/SPARK-17098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen > > Running > {code} > SELECT COUNT(NULL) OVER () > {code} > throws an UnsupportedOperationException during analysis: > {code} > java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 > as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) > at > org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNod
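The interaction Josh Rosen describes can be reduced to a small, self-contained sketch. The `Expr`, `Lit`, `WindowLike`, and `ConstantFold` names below are illustrative stand-ins, not Spark's actual Catalyst classes: a node that delegates `foldable` to its child while being unevaluable itself will crash any folding pass that trusts the flag.

```scala
// Minimal model of the bug: an unevaluable node that still reports foldable = true.
sealed trait Expr {
  def foldable: Boolean
  def eval(): Any
}

case class Lit(value: Any) extends Expr {
  def foldable: Boolean = true
  def eval(): Any = value
}

// Stand-in for WindowExpression: delegates foldable to its child (the bug),
// but throws when evaluated, just like an Unevaluable expression.
case class WindowLike(child: Expr) extends Expr {
  def foldable: Boolean = child.foldable
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
}

// A constant-folding pass that trusts `foldable` and calls eval() -- mirroring
// how ConstantFolding ends up calling WindowExpression.eval in the stack trace.
object ConstantFold {
  def apply(e: Expr): Expr = if (e.foldable) Lit(e.eval()) else e
}
```

With the delegation in place, `ConstantFold(WindowLike(Lit(0L)))` throws `UnsupportedOperationException`. Forcing `foldable` to `false` on the window-like node is one plausible direction for a fix (not necessarily the one eventually committed), since the pass would then leave the node alone.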
[jira] [Updated] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
[ https://issues.apache.org/jira/browse/SPARK-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17098: --- Description: Running {code} SELECT COUNT(NULL) OVER () {code} throws an UnsupportedOperationException during analysis: {code} java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) at org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren
[jira] [Created] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
Josh Rosen created SPARK-17098: -- Summary: "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis Key: SPARK-17098 URL: https://issues.apache.org/jira/browse/SPARK-17098 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Josh Rosen Running {code} SELECT COUNT(NULL) OVER () {code} throws an UnsupportedOperationException during analysis: {code} java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) at org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeN
[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423602#comment-15423602 ] Jakob Odersky commented on SPARK-17095: --- Since this bug also occurs when there are no opening braces (}}} anywhere in the doc is sufficient), I think this is an issue with scaladoc itself. I would recommend creating a bug report on the Scala tracker https://issues.scala-lang.org/secure/Dashboard.jspa. Ideally, code blocks could be delimited with an arbitrary number of opening symbols followed by an arbitrary number of closing symbols (e.g. you could use four braces to delimit code that itself contains three braces, such as "}}}"). > Latex and Scala doc do not play nicely > -- > > Key: SPARK-17095 > URL: https://issues.apache.org/jira/browse/SPARK-17095 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Seth Hendrickson >Priority: Minor > Labels: starter > > In Latex, it is common to find "}}}" when closing several expressions at > once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added > Mathjax to render Latex equations in scaladoc. However, when scala doc sees > "}}}" or "{{{" it treats it as a special character for code block. This > results in some very strange output. > A poor workaround is to use "}}\,}" in latex which inserts a small > whitespace. This is not ideal, and we can hopefully find a better solution.
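A concrete illustration of the clash and the workaround quoted in the issue; the method itself is a hypothetical example, and only the "}}\,}" trick comes from the report.

```scala
object LatexDocExample {
  /** Computes \(e^{x^{2}}\).
    *
    * The natural LaTeX closing sequence for a nested exponent ends in
    * "}}}", which scaladoc parses as a code-block delimiter rather than
    * as LaTeX. The workaround from the issue writes "}}\,}" instead,
    * inserting a small LaTeX whitespace before the final brace so the
    * three-brace run never appears literally in the comment.
    */
  def expSquare(x: Double): Double = math.exp(x * x)
}
```

Note that the problem is purely lexical: scaladoc scans the raw comment text for `{{{`/`}}}` before MathJax ever sees it, which is why the thin-space hack works at all.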
[jira] [Updated] (SPARK-16700) StructType doesn't accept Python dicts anymore
[ https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-16700: --- Labels: releasenotes (was: ) > StructType doesn't accept Python dicts anymore > -- > > Key: SPARK-16700 > URL: https://issues.apache.org/jira/browse/SPARK-16700 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Davies Liu > Labels: releasenotes > Fix For: 2.1.0 > > > Hello, > I found this issue while testing my codebase with 2.0.0-rc5 > StructType in Spark 1.6.2 accepts the Python {{dict}} type, which is very > handy. 2.0.0-rc5 does not and throws an error. > I don't know if this was intended but I'd advocate for this behaviour to > remain the same. MapType is probably wasteful when your key names never > change and switching to Python tuples would be cumbersome. > Here is a minimal script to reproduce the issue: > {code} > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > struct_schema = SparkTypes.StructType([ > SparkTypes.StructField("id", SparkTypes.LongType()) > ]) > rdd = sc.parallelize([{"id": 0}, {"id": 1}]) > df = sqlc.createDataFrame(rdd, struct_schema) > print df.collect() > # 1.6.2 prints [Row(id=0), Row(id=1)] > # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in > type > {code} > Thanks!
[jira] [Created] (SPARK-17097) Pregel does not keep vertex state properly; fails to terminate
Seth Bromberger created SPARK-17097: --- Summary: Pregel does not keep vertex state properly; fails to terminate Key: SPARK-17097 URL: https://issues.apache.org/jira/browse/SPARK-17097 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.6.0 Environment: Scala 2.10.5, Spark 1.6.0 with GraphX and Pregel Reporter: Seth Bromberger Consider the following minimum example: {code:title=PregelBug.scala|borderStyle=solid} package testGraph import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.graphx.{Edge, EdgeTriplet, Graph, _} object PregelBug { def main(args: Array[String]) = { //FIXME breaks if TestVertex is a case class; works if not case class case class TestVertex(inId: VertexId, inData: String, inLabels: collection.mutable.HashSet[String]) extends Serializable { val id = inId val value = inData val labels = inLabels } class TestLink(inSrc: VertexId, inDst: VertexId, inData: String) extends Serializable { val src = inSrc val dst = inDst val data = inData } val startString = "XXXSTARTXXX" val conf = new SparkConf().setAppName("pregeltest").setMaster("local[*]") val sc = new SparkContext(conf) val vertexes = Vector( new TestVertex(0, "label0", collection.mutable.HashSet[String]()), new TestVertex(1, "label1", collection.mutable.HashSet[String]()) ) val links = Vector( new TestLink(0, 1, "linkData01") ) val vertexes_packaged = vertexes.map(v => (v.id, v)) val links_packaged = links.map(e => Edge(e.src, e.dst, e)) val graph = Graph[TestVertex, TestLink](sc.parallelize(vertexes_packaged), sc.parallelize(links_packaged)) def vertexProgram (vertexId: VertexId, vdata: TestVertex, message: Vector[String]): TestVertex = { message.foreach { case `startString` => if (vdata.id == 0L) vdata.labels.add(vdata.value) case m => if (!vdata.labels.contains(m)) vdata.labels.add(m) } new TestVertex(vdata.id, vdata.value, vdata.labels) } def sendMessage (triplet: EdgeTriplet[TestVertex, TestLink]): Iterator[(VertexId, Vector[String])] = { val srcLabels 
= triplet.srcAttr.labels val dstLabels = triplet.dstAttr.labels val msgsSrcDst = srcLabels.diff(dstLabels) .map(label => (triplet.dstAttr.id, Vector[String](label))) val msgsDstSrc = dstLabels.diff(srcLabels) .map(label => (triplet.srcAttr.id, Vector[String](label))) msgsSrcDst.toIterator ++ msgsDstSrc.toIterator } def mergeMessage (m1: Vector[String], m2: Vector[String]): Vector[String] = m1.union(m2).distinct val g = graph.pregel(Vector[String](startString))(vertexProgram, sendMessage, mergeMessage) println("---pregel done---") println("vertex info:") g.vertices.foreach( v => { val labels = v._2.labels println( "vertex " + v._1 + ": name = " + v._2.id + ", labels = " + labels) } ) } } {code} This code never terminates even though we expect it to. To fix, we simply remove the "case" designation for the TestVertex class (see FIXME comment), and then it behaves as expected. (Apologies if this has been fixed in later versions; we're unfortunately pegged to 2.10.5 / 1.6.0 for now.)
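A standalone sketch (not GraphX itself) of why the "case" keyword in the report's FIXME matters: a case class gets structural equality, so two vertex snapshots that share one in-place-mutated `HashSet` always compare equal, which can defeat change-detection logic such as Pregel's; a plain class keeps reference equality. This only demonstrates the equality difference, not the full nontermination mechanism.

```scala
import scala.collection.mutable

// Structural equality: two snapshots sharing the same mutated set compare equal.
case class CaseVertex(id: Long, labels: mutable.HashSet[String])

// Reference equality: distinct instances never compare equal.
class PlainVertex(val id: Long, val labels: mutable.HashSet[String])

object EqualityDemo {
  def demo(): (Boolean, Boolean) = {
    val shared = mutable.HashSet[String]()
    val before = CaseVertex(0L, shared)
    shared += "label0"                     // mutate in place, as vertexProgram does
    val after = CaseVertex(0L, shared)

    val p1 = new PlainVertex(0L, shared)
    val p2 = new PlainVertex(0L, shared)
    (before == after, p1 == p2)            // (true, false)
  }
}
```

The `CaseVertex` comparison returns `true` even though a label was added between the two snapshots, so any framework that uses `==` to ask "did this vertex change?" sees no change.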
[jira] [Assigned] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17002: Assignee: Apache Spark > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt >Assignee: Apache Spark > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
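The fail-fast behavior the report asks for can be sketched as follows. `settings` stands in for `SparkConf`, and the message text is illustrative; only the property names (`spark.ssl.enabled`, `spark.ssl.protocol`) come from the issue.

```scala
object SslConfigCheck {
  // Validate the SSL settings up front instead of letting SSLContext.init
  // fail later with an opaque KeyManagementException.
  def requiredProtocol(settings: Map[String, String]): Option[String] = {
    val sslEnabled = settings.get("spark.ssl.enabled").contains("true")
    val protocol = settings.get("spark.ssl.protocol")
    if (sslEnabled && protocol.isEmpty) {
      throw new IllegalArgumentException(
        "spark.ssl.enabled is true but spark.ssl.protocol is not set; " +
        "configure a JSSE protocol name (e.g. TLSv1.2) before starting the master")
    }
    protocol
  }
}
```

The point of the check is to surface the missing property by name at startup, rather than inside `SecurityManager` where the "Default SSLContext is initialized automatically" error gives no hint of the cause.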
[jira] [Commented] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423564#comment-15423564 ] Apache Spark commented on SPARK-17002: -- User 'wangmiao1981' has created a pull request for this issue: https://github.com/apache/spark/pull/14674 > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
[jira] [Assigned] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17002: Assignee: (was: Apache Spark) > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423559#comment-15423559 ] Marcelo Vanzin commented on SPARK-16725: Don't get me wrong, I feel your pain, and I really hope Hadoop 3.x will fix this mess. But upgrading the version of Guava in Spark doesn't really solve *the* problem. It might solve your specific problem, but then there are ways of solving your problem that do not involve changing Spark, too. > Migrate Guava to 16+? > - > > Key: SPARK-16725 > URL: https://issues.apache.org/jira/browse/SPARK-16725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.0.1 >Reporter: Min Wei >Priority: Minor > Original Estimate: 12h > Remaining Estimate: 12h > > Currently Spark depends on an old version of Guava, version 14. However > Spark-cassandra driver asserts on Guava version 16 and above. > It would be great to update the Guava dependency to version 16+ > diff --git a/core/src/main/scala/org/apache/spark/SecurityManager.scala > b/core/src/main/scala/org/apache/spark/SecurityManager.scala > index f72c7de..abddafe 100644 > --- a/core/src/main/scala/org/apache/spark/SecurityManager.scala > +++ b/core/src/main/scala/org/apache/spark/SecurityManager.scala > @@ -23,7 +23,7 @@ import java.security.{KeyStore, SecureRandom} > import java.security.cert.X509Certificate > import javax.net.ssl._ > > -import com.google.common.hash.HashCodes > +import com.google.common.hash.HashCode > import com.google.common.io.Files > import org.apache.hadoop.io.Text > > @@ -432,7 +432,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) > val secret = new Array[Byte](length) > rnd.nextBytes(secret) > > -val cookie = HashCodes.fromBytes(secret).toString() > +val cookie = HashCode.fromBytes(secret).toString() > SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, > cookie) > cookie >} else { > diff --git 
a/core/src/main/scala/org/apache/spark/SparkEnv.scala > b/core/src/main/scala/org/apache/spark/SparkEnv.scala > index af50a6d..02545ae 100644 > --- a/core/src/main/scala/org/apache/spark/SparkEnv.scala > +++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala > @@ -72,7 +72,7 @@ class SparkEnv ( > >// A general, soft-reference map for metadata needed during HadoopRDD > split computation >// (e.g., HadoopFileRDD uses this to cache JobConfs and InputFormats). > - private[spark] val hadoopJobMetadata = new > MapMaker().softValues().makeMap[String, Any]() > + private[spark] val hadoopJobMetadata = new > MapMaker().weakValues().makeMap[String, Any]() > >private[spark] var driverTmpDir: Option[String] = None > > diff --git a/pom.xml b/pom.xml > index d064cb5..7c3e036 100644 > --- a/pom.xml > +++ b/pom.xml > @@ -368,8 +368,7 @@ > > com.google.guava > guava > -14.0.1 > -provided > +19.0 > > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423555#comment-15423555 ] Russell Spitzer commented on SPARK-16725: - I'm well aware, as we've been dealing with this since 1.0; that's why we began the process of shading Guava for Hadoop-based builds. Now, though, we are stuck doing it for all builds :(
[jira] [Created] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
Tathagata Das created SPARK-17096: - Summary: Fix StreamingQueryListener to return message and stacktrace of actual exception Key: SPARK-17096 URL: https://issues.apache.org/jira/browse/SPARK-17096 Project: Spark Issue Type: Sub-task Components: SQL, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Minor Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423544#comment-15423544 ] Marcelo Vanzin commented on SPARK-16725: Welcome to dependency hell. See my comment above for why this issue is not new and why you could have it even with Spark 1.x.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423542#comment-15423542 ] Marcelo Vanzin commented on SPARK-16725: Spark's use of Guava is shaded. That comment refers to Hadoop's use of Guava. If you download the 1.6.x builds of Spark that say "without Hadoop" and add the Hadoop libraries from your distro to Spark, you'll have the same issue, since those will include Hadoop's Guava dependency. This dependency hell is why shading is the way to go.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423539#comment-15423539 ] Russell Spitzer commented on SPARK-16725: - In our case it's exposing a library which exposes the shaded code. I.e., we include the Cassandra Java Driver, which publicly exposes Guava in some places. So those access points are necessarily broken, but it's not something we can directly control.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423534#comment-15423534 ] Brian Hess commented on SPARK-16725: - But it looks like it is not being shaded in 2.0: https://github.com/apache/spark/blob/branch-2.0/assembly/pom.xml#L79-L89 From the comment there: "Because we don't shade dependencies anymore..."
[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423514#comment-15423514 ] Seth Hendrickson commented on SPARK-17095: -- cc [~lins05] [~srowen] [~jodersky]
[jira] [Created] (SPARK-17095) Latex and Scala doc do not play nicely
Seth Hendrickson created SPARK-17095: Summary: Latex and Scala doc do not play nicely Key: SPARK-17095 URL: https://issues.apache.org/jira/browse/SPARK-17095 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Seth Hendrickson Priority: Minor In LaTeX, it is common to find "}}}" when closing several expressions at once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added MathJax to render LaTeX equations in scaladoc. However, when scaladoc sees "}}}" or "{{{", it treats it as a special token for a code block. This results in some very strange output. A poor workaround is to use "}}\,}" in LaTeX, which inserts a small whitespace. This is not ideal, and we can hopefully find a better solution.
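The collision is easy to see at the string level: any LaTeX that closes three groups back-to-back contains exactly the token scaladoc uses to close a code block. A small sketch (the formula and names are illustrative, not from the issue):

```scala
// Illustration: LaTeX closing three groups emits "}}}", which is also
// scaladoc's code-block closing delimiter.
object LatexScaladocClash {
  // Gaussian exponent: three groups close in a row at the end.
  val latex = """e^{-\frac{(x-\mu)^2}{2\sigma^{2}}}"""

  val scaladocCodeClose = "}}}"

  // True when scaladoc would misread part of the formula as a code fence.
  def clashes(s: String): Boolean = s.contains(scaladocCodeClose)

  // The workaround from the issue: insert a thin space between braces.
  val workaround = latex.replace("}}}", """}}\,}""")
}
```

Here `clashes(latex)` is true, while `clashes(workaround)` is false, which is why the `}}\,}` trick avoids the mis-rendering at the cost of a stray thin space in the output.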
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423505#comment-15423505 ] Marcelo Vanzin commented on SPARK-16725: Exposing 3rd-party libraries in an API should be considered a bug, unless there's really no way around it (e.g. Spark needs to expose parts of the Hadoop API). Spark 1.x did that with Guava, but that went out before it could be fixed, so the shading in 1.x was not complete. Spark 2.x fixes that (there's no more Guava anything in the public API).
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423498#comment-15423498 ] Russell Spitzer commented on SPARK-16725: - I think *But it works* is a bit of an overstatement. It "works" when those shaded libraries are never exposed through a public API, but it is basically broken whenever they are.
[jira] [Created] (SPARK-17094) provide simplified API for ML pipeline
yuhao yang created SPARK-17094: -- Summary: provide simplified API for ML pipeline Key: SPARK-17094 URL: https://issues.apache.org/jira/browse/SPARK-17094 Project: Spark Issue Type: New Feature Components: ML Reporter: yuhao yang Many machine learning libraries have an API for easily assembling transformers. One example would be: val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data). Feedback and suggestions are appreciated.
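The proposed call shape can be mocked outside Spark ML. Note this is a hypothetical sketch of the suggested string-based constructor: Spark's real Pipeline takes PipelineStage instances via setStages, and the stage names and registry below are invented for illustration.

```scala
// Hypothetical sketch of the proposed string-based Pipeline API (not Spark's real API).
object PipelineSketch {
  // A "stage" here is just a Seq[String] => Seq[String] transform.
  type Stage = Seq[String] => Seq[String]

  // Toy registry mapping assumed stage names to transforms.
  val registry: Map[String, Stage] = Map(
    "tokenizer" -> (docs => docs.flatMap(_.split("\\s+"))),
    "lowercase" -> (tokens => tokens.map(_.toLowerCase))
  )

  class Pipeline(stageNames: String*) {
    private val stages = stageNames.map(registry)
    // "fit" here simply runs the transforms in declaration order.
    def fit(data: Seq[String]): Seq[String] =
      stages.foldLeft(data)((d, s) => s(d))
  }
}
```

With this mock, `new PipelineSketch.Pipeline("tokenizer", "lowercase").fit(Seq("Hello World"))` yields `Seq("hello", "world")`, matching the one-liner usage the issue proposes.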
[jira] [Created] (SPARK-17093) Roundtrip encoding of array-of-struct fields is wrong when whole-stage codegen is disabled
Josh Rosen created SPARK-17093: -- Summary: Roundtrip encoding of array-of-struct fields is wrong when whole-stage codegen is disabled Key: SPARK-17093 URL: https://issues.apache.org/jira/browse/SPARK-17093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Josh Rosen Priority: Critical The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled: {code} withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { val data = Array(Array((1, 2), (3, 4))) val ds = spark.sparkContext.parallelize(data).toDS() assert(ds.collect() === data) } {code} When whole-stage codegen is enabled (the default), this works fine. When it's disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
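The suspected missing copy() can be reproduced in plain Scala, without Spark: if a producer reuses one mutable row object and a consumer stores references without copying, every stored element aliases the last value written. This is a sketch of the failure mode the reporter hypothesizes, not Spark's actual codepath.

```scala
// Sketch: why a missing copy() makes the last element repeat (hypothetical, not Spark code).
object CopyBugSketch {
  final class MutableRow(var a: Int, var b: Int) {
    def copy(): MutableRow = new MutableRow(a, b)
  }

  // A producer that reuses a single MutableRow for every element,
  // like a row iterator reusing its internal buffer.
  def produce(data: Seq[(Int, Int)]): Iterator[MutableRow] = {
    val row = new MutableRow(0, 0)
    data.iterator.map { case (a, b) => row.a = a; row.b = b; row }
  }

  val data = Seq((1, 2), (3, 4))

  // Forgetting copy(): both stored references alias the reused row,
  // so the collected result repeats the final value, (3,4).
  val wrong = produce(data).toArray.map(r => (r.a, r.b)).toSeq

  // A defensive copy() captures each element before the next mutation.
  val right = produce(data).map(_.copy()).toArray.map(r => (r.a, r.b)).toSeq
}
```

Here `wrong` is `Seq((3,4), (3,4))` while `right` is `Seq((1,2), (3,4))`, mirroring the shape of the wrong answer reported in the issue.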
[jira] [Commented] (SPARK-15083) History Server would OOM due to unlimited TaskUIData in some stages
[ https://issues.apache.org/jira/browse/SPARK-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423393#comment-15423393 ] Apache Spark commented on SPARK-15083: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/14673 > History Server would OOM due to unlimited TaskUIData in some stages > --- > > Key: SPARK-15083 > URL: https://issues.apache.org/jira/browse/SPARK-15083 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.5.2, 1.6.0, 2.0.0 >Reporter: Zheng Tan > Attachments: Screen Shot 2016-05-01 at 3.50.02 PM.png, Screen Shot > 2016-05-01 at 3.51.01 PM.png, Screen Shot 2016-05-01 at 3.51.59 PM.png, > Screen Shot 2016-05-01 at 3.55.30 PM.png > > > The History Server will load all tasks in a stage, which can cause a memory leak if the tasks occupy too much memory. > In the following example, a single application consumes 1.1 GB of History Server memory. > I think we should limit task memory usage by adding spark.ui.retainedTasks.
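If the proposed cap lands, it would presumably be set like Spark's other UI retention limits. The property name below is the one proposed in the issue description; the value and exact semantics are assumptions until the patch is merged:

```
# spark-defaults.conf (hypothetical until the patch is merged)
# Cap the number of TaskUIData entries retained per stage in the UI/History Server.
spark.ui.retainedTasks  100000
```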
[jira] [Created] (SPARK-17092) DataFrame with large number of columns causing code generation error
Aris V created SPARK-17092: -- Summary: DataFrame with large number of columns causing code generation error Key: SPARK-17092 URL: https://issues.apache.org/jira/browse/SPARK-17092 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: On vanilla Spark hadoop 2.7 Scala 2.11 in Linux CentOS, cluster with 9 slaves. Amazon AWS node size m3.2xlarge. Reporter: Aris V On vanilla Spark hadoop 2.7 Scala 2.11: When I use randomSplit on a DataFrame with several hundred columns, I get Janino code generation errors. The lowest number of columns that triggers the bug is around 500 or fewer. The error message:

```
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
```

Here is a small code sample which causes it in spark-shell:

```
import org.apache.spark.sql.types.{DoubleType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

val COLMAX: Double = 500.0
val ROWSIZE: Int = 1000
val intToRow: Int => Row = (i: Int) => Row.fromSeq(Range.Double.inclusive(1.0, COLMAX, 1.0).toSeq)
val schema: StructType = (1 to COLMAX.toInt).foldLeft(new StructType())((s, i) => s.add(i.toString, DoubleType, nullable = true))
val rdds = spark.sparkContext.parallelize((1 to ROWSIZE).map(intToRow))
val df = spark.createDataFrame(rdds, schema)
val Array(left, right) = df.randomSplit(Array(.8, .2))
// This crashes
left.count
```
[jira] [Commented] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression
[ https://issues.apache.org/jira/browse/SPARK-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423376#comment-15423376 ] Apache Spark commented on SPARK-17034: -- User 'petermaxlee' has created a pull request for this issue: https://github.com/apache/spark/pull/14672 > Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression > - > > Key: SPARK-17034 > URL: https://issues.apache.org/jira/browse/SPARK-17034 > Project: Spark > Issue Type: Bug >Reporter: Sean Zhong >Assignee: Sean Zhong > Fix For: 2.1.0 > > > Ordinals in GROUP BY or ORDER BY, like "1" in "order by 1" or "group by 1", should be considered unresolved before analysis. But the current code uses a "Literal" expression to store the ordinal. This is inappropriate, as "Literal" itself is a resolved expression; it gives the user the wrong impression that the ordinal has already been resolved.
[jira] [Updated] (SPARK-16656) CreateTableAsSelectSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-16656: - Fix Version/s: 1.6.3 > CreateTableAsSelectSuite is flaky > - > > Key: SPARK-16656 > URL: https://issues.apache.org/jira/browse/SPARK-16656 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > Fix For: 1.6.3, 2.0.1, 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62593/testReport/junit/org.apache.spark.sql.sources/CreateTableAsSelectSuite/create_a_table__drop_it_and_create_another_one_with_the_same_name/
[jira] [Resolved] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17089. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Remove link of api doc for mapReduceTriplets because its removed from api. > --- > > Key: SPARK-17089 > URL: https://issues.apache.org/jira/browse/SPARK-17089 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.0.0 >Reporter: sandeep purohit >Assignee: sandeep purohit >Priority: Trivial > Fix For: 2.0.1, 2.1.0 > > > Remove the link to the API doc for mapReduceTriplets because it has been removed > from the API: when users are redirected to the latest API doc they cannot find > any description for mapReduceTriplets
[jira] [Updated] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17089: Assignee: sandeep purohit
[jira] [Commented] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423245#comment-15423245 ] Apache Spark commented on SPARK-17091: -- User 'andreweduffy' has created a pull request for this issue: https://github.com/apache/spark/pull/14671 > ParquetFilters rewrite IN to OR of Eq > - > > Key: SPARK-17091 > URL: https://issues.apache.org/jira/browse/SPARK-17091 > Project: Spark > Issue Type: Bug >Reporter: Andrew Duffy > > Past attempts at pushing down the InSet operation for Parquet relied on > user-defined predicates. It would be simpler to rewrite an IN clause into the > corresponding OR union of a set of equality conditions.
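The rewrite proposed in this issue is mechanical: an IN over a value set becomes a chain of OR'ed equality predicates. A hedged sketch of the idea, using plain tuples as a stand-in for filter predicates (the tuple encoding is an assumption for illustration, not the ParquetFilters or parquet-mr API):

```python
from functools import reduce

def rewrite_in_to_or(column, values):
    """Rewrite `column IN (v1, v2, ...)` as
    `(column = v1) OR (column = v2) OR ...` (left-nested)."""
    if not values:
        raise ValueError("IN over an empty value set has no equality rewrite")
    equalities = [("eq", column, v) for v in values]
    return reduce(lambda left, right: ("or", left, right), equalities)

# x IN (1, 2, 3) becomes ((x = 1 OR x = 2) OR x = 3)
assert rewrite_in_to_or("x", [1, 2, 3]) == \
    ("or", ("or", ("eq", "x", 1), ("eq", "x", 2)), ("eq", "x", 3))
```

A singleton IN degenerates to a single equality, which is why this avoids the user-defined-predicate machinery entirely.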
[jira] [Assigned] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17091: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17091: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-16654) UI Should show blacklisted executors & nodes
[ https://issues.apache.org/jira/browse/SPARK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423240#comment-15423240 ] Alex Bozarth commented on SPARK-16654: -- I don't have time right now to tackle this so go right ahead. And the other part of my comment was an implementation suggestion. We currently have a "status" column that lists either Alive or Dead. I'm suggesting that when shown, blacklisted nodes are listed as Blacklisted or Alive (Blacklisted) in the status column; this would keep the UI change minimal for the user even though it'll take a good chunk of code to make it work behind the scenes. > UI Should show blacklisted executors & nodes > > > Key: SPARK-16654 > URL: https://issues.apache.org/jira/browse/SPARK-16654 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Web UI >Affects Versions: 2.0.0 >Reporter: Imran Rashid > > SPARK-8425 will add the ability to blacklist entire executors and nodes to > deal w/ faulty hardware. However, without displaying it on the UI, it can be > hard to realize which executor is bad, and why tasks aren't getting scheduled > on certain executors. > As a first step, we should just show nodes and executors that are blacklisted > for the entire application (no need to show blacklisting for tasks & stages). > This should also ensure that blacklisting events get into the event logs for > the history server.
[jira] [Commented] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423235#comment-15423235 ] Apache Spark commented on SPARK-15285: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/14670 > Generated SpecificSafeProjection.apply method grows beyond 64 KB > > > Key: SPARK-15285 > URL: https://issues.apache.org/jira/browse/SPARK-15285 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1, 2.0.0 >Reporter: Konstantin Shaposhnikov >Assignee: Kazuaki Ishizaki > Fix For: 2.0.0 > > > The following code snippet results in > {noformat} > org.codehaus.janino.JaninoRuntimeException: Code of method > "(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection" > grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941) > {noformat} > {code} > case class S100(s1:String="1", s2:String="2", s3:String="3", s4:String="4", > s5:String="5", s6:String="6", s7:String="7", s8:String="8", s9:String="9", > s10:String="10", s11:String="11", s12:String="12", s13:String="13", > s14:String="14", s15:String="15", s16:String="16", s17:String="17", > s18:String="18", s19:String="19", s20:String="20", s21:String="21", > s22:String="22", s23:String="23", s24:String="24", s25:String="25", > s26:String="26", s27:String="27", s28:String="28", s29:String="29", > s30:String="30", s31:String="31", s32:String="32", s33:String="33", > s34:String="34", s35:String="35", s36:String="36", s37:String="37", > s38:String="38", s39:String="39", s40:String="40", s41:String="41", > s42:String="42", s43:String="43", s44:String="44", s45:String="45", > s46:String="46", s47:String="47", s48:String="48", s49:String="49", > s50:String="50", s51:String="51", s52:String="52", s53:String="53", > s54:String="54", s55:String="55", s56:String="56", s57:String="57", > s58:String="58", 
s59:String="59", s60:String="60", s61:String="61", > s62:String="62", s63:String="63", s64:String="64", s65:String="65", > s66:String="66", s67:String="67", s68:String="68", s69:String="69", > s70:String="70", s71:String="71", s72:String="72", s73:String="73", > s74:String="74", s75:String="75", s76:String="76", s77:String="77", > s78:String="78", s79:String="79", s80:String="80", s81:String="81", > s82:String="82", s83:String="83", s84:String="84", s85:String="85", > s86:String="86", s87:String="87", s88:String="88", s89:String="89", > s90:String="90", s91:String="91", s92:String="92", s93:String="93", > s94:String="94", s95:String="95", s96:String="96", s97:String="97", > s98:String="98", s99:String="99", s100:String="100") > case class S(s1: S100=S100(), s2: S100=S100(), s3: S100=S100(), s4: > S100=S100(), s5: S100=S100(), s6: S100=S100(), s7: S100=S100(), s8: > S100=S100(), s9: S100=S100(), s10: S100=S100()) > val ds = Seq(S(),S(),S()).toDS > ds.show() > {code} > I could reproduce this with Spark built from 1.6 branch and with > https://home.apache.org/~pwendell/spark-nightly/spark-master-bin/spark-2.0.0-SNAPSHOT-2016_05_11_01_03-8beae59-bin/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
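The usual remedy for this class of failure, applied in several Spark codegen fixes, is to split one huge generated method into many small helpers, each safely under the JVM's 64 KB method-bytecode limit. A hedged, self-contained sketch of the chunking idea — the chunk size, method names, and emitted Java snippets are illustrative, not Spark's actual code generator:

```python
def split_into_methods(statements, max_per_method=100):
    """Emit one apply() that delegates to helper methods, each holding at
    most max_per_method statements, instead of one oversized method body."""
    chunks = [statements[i:i + max_per_method]
              for i in range(0, len(statements), max_per_method)]
    methods = []
    for idx, chunk in enumerate(chunks):
        body = "\n".join("  " + s for s in chunk)
        methods.append(f"private void apply_{idx}(Object in) {{\n{body}\n}}")
    calls = "\n".join(f"  apply_{i}(in);" for i in range(len(chunks)))
    driver = f"public Object apply(Object in) {{\n{calls}\n  return out;\n}}"
    return [driver] + methods

# 250 field assignments become a small driver plus 3 helper methods,
# so no single method body grows without bound
generated = split_into_methods([f"out.set({i}, read({i}));" for i in range(250)])
assert len(generated) == 4
assert "apply_2(in);" in generated[0]
```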
[jira] [Assigned] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15285: Assignee: Apache Spark (was: Kazuaki Ishizaki)
[jira] [Assigned] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15285: Assignee: Kazuaki Ishizaki (was: Apache Spark)
[jira] [Updated] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-16519: -- Assignee: Felix Cheung > Handle SparkR RDD generics that create warnings in R CMD check > -- > > Key: SPARK-16519 > URL: https://issues.apache.org/jira/browse/SPARK-16519 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Felix Cheung > Fix For: 2.0.1, 2.1.0 > > > One of the warnings we get from R CMD check is that RDD implementations of > some of the generics are not documented. These generics are shared between > RDDs and DataFrames in SparkR. The list includes > {quote} > WARNING > Undocumented S4 methods: > generic 'cache' and siglist 'RDD' > generic 'collect' and siglist 'RDD' > generic 'count' and siglist 'RDD' > generic 'distinct' and siglist 'RDD' > generic 'first' and siglist 'RDD' > generic 'join' and siglist 'RDD,RDD' > generic 'length' and siglist 'RDD' > generic 'partitionBy' and siglist 'RDD' > generic 'persist' and siglist 'RDD,character' > generic 'repartition' and siglist 'RDD' > generic 'show' and siglist 'RDD' > generic 'take' and siglist 'RDD,numeric' > generic 'unpersist' and siglist 'RDD' > {quote} > As described in > https://stat.ethz.ch/pipermail/r-devel/2003-September/027490.html this looks > like a limitation of R where exporting a generic from a package also exports > all the implementations of that generic. > One way to get around this is to remove the RDD API or rename the methods in > Spark 2.1
[jira] [Resolved] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-16519. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull request 14626 [https://github.com/apache/spark/pull/14626]
[jira] [Created] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
Andrew Duffy created SPARK-17091: Summary: ParquetFilters rewrite IN to OR of Eq Key: SPARK-17091 URL: https://issues.apache.org/jira/browse/SPARK-17091 Project: Spark Issue Type: Bug Reporter: Andrew Duffy Past attempts at pushing down the InSet operation for Parquet relied on user-defined predicates. It would be simpler to rewrite an IN clause into the corresponding OR union of a set of equality conditions.
[jira] [Commented] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423156#comment-15423156 ] Miao Wang commented on SPARK-17002: --- I think we can add a require() statement and remove the getOrElse part. Thus, in your case, it will throw a meaningful message. I can create a PR for this one. > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
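The commenter's suggestion — fail fast with a clear message instead of falling through to a bogus default — looks like this in outline. This is a Python sketch of the validation logic only (the actual fix in SecurityManager would be a Scala require()), and the helper name is an assumption:

```python
def ssl_protocol(conf):
    """Return the configured SSL protocol, failing fast with a clear
    message instead of passing a bogus default on to SSLContext."""
    if conf.get("spark.ssl.enabled") != "true":
        return None
    protocol = conf.get("spark.ssl.protocol")
    if protocol is None:
        # fail here, with an actionable message, rather than deep inside
        # SSLContext.init with "Default SSLContext is initialized automatically"
        raise ValueError(
            "spark.ssl.protocol is required when spark.ssl.enabled is true")
    return protocol

assert ssl_protocol({"spark.ssl.enabled": "false"}) is None
assert ssl_protocol({"spark.ssl.enabled": "true",
                     "spark.ssl.protocol": "TLSv1.2"}) == "TLSv1.2"
```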
[jira] [Commented] (SPARK-17090) Make tree aggregation level in linear/logistic regression configurable
[ https://issues.apache.org/jira/browse/SPARK-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423151#comment-15423151 ] Seth Hendrickson commented on SPARK-17090: -- cc [~dbtsai] > Make tree aggregation level in linear/logistic regression configurable > -- > > Key: SPARK-17090 > URL: https://issues.apache.org/jira/browse/SPARK-17090 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Seth Hendrickson >Priority: Minor > > Linear/logistic regression use treeAggregate with default aggregation depth > for collecting coefficient gradient updates to the driver. For high > dimensional problems, this can cause OOM errors on the driver. We should make > it configurable, perhaps via an expert param, so that users can avoid this > problem if their data has many features.
[jira] [Created] (SPARK-17090) Make tree aggregation level in linear/logistic regression configurable
Seth Hendrickson created SPARK-17090: Summary: Make tree aggregation level in linear/logistic regression configurable Key: SPARK-17090 URL: https://issues.apache.org/jira/browse/SPARK-17090 Project: Spark Issue Type: Improvement Components: ML Reporter: Seth Hendrickson Priority: Minor Linear/logistic regression use treeAggregate with default aggregation depth for collecting coefficient gradient updates to the driver. For high dimensional problems, this can cause OOM errors on the driver. We should make it configurable, perhaps via an expert param, so that users can avoid this problem if their data has many features.
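The driver-OOM risk comes from how many partial results the driver must merge at once; treeAggregate's depth parameter bounds that fan-in by combining partials in intermediate rounds. A simplified single-process model of the idea — the fan-in formula below is an illustrative assumption, not Spark's distributed treeAggregate:

```python
import math

def reduce_chunk(chunk, comb_op):
    """Fold one chunk of partial results with the combine operator."""
    acc = chunk[0]
    for item in chunk[1:]:
        acc = comb_op(acc, item)
    return acc

def tree_aggregate(partials, comb_op, depth=2):
    """Merge per-partition partials in up to `depth` rounds, so no single
    step (in Spark: the driver) has to combine all of them at once."""
    while depth > 1 and len(partials) > 2:
        # shrink the number of partials by a bounded per-round fan-in
        fan_in = max(2, math.ceil(len(partials) ** (1.0 / depth)))
        partials = [reduce_chunk(partials[i:i + fan_in], comb_op)
                    for i in range(0, len(partials), fan_in)]
        depth -= 1
    return reduce_chunk(partials, comb_op)

# summing 1000 per-partition "gradients" in rounds gives the same result
# as a flat sum, but the final merge sees far fewer inputs
partials = list(range(1000))
assert tree_aggregate(partials, lambda a, b: a + b, depth=2) == sum(range(1000))
```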
[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423139#comment-15423139 ] Miao Wang commented on SPARK-17054: --- Maybe, I can try it out. > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is because it downloads Spark to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead.
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423135#comment-15423135 ] Apache Spark commented on SPARK-17089: -- User 'phalodi' has created a pull request for this issue: https://github.com/apache/spark/pull/14669
[jira] [Assigned] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17089: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17089: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
sandeep purohit created SPARK-17089: --- Summary: Remove link of api doc for mapReduceTriplets because its removed from api. Key: SPARK-17089 URL: https://issues.apache.org/jira/browse/SPARK-17089 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.0.0 Reporter: sandeep purohit Priority: Trivial Remove the link to the API doc for mapReduceTriplets because it has been removed from the API: when users are redirected to the latest API doc they cannot find any description for mapReduceTriplets
[jira] [Created] (SPARK-17088) IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false
Marcelo Vanzin created SPARK-17088: -- Summary: IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false Key: SPARK-17088 URL: https://issues.apache.org/jira/browse/SPARK-17088 Project: Spark Issue Type: Bug Components: SQL Reporter: Marcelo Vanzin Priority: Minor There's a bug in a very rare code path in {{IsolatedClientLoader}}: {code} case e: RuntimeException if e.getMessage.contains("hadoop") => // If the error message contains hadoop, it is probably because the hadoop // version cannot be resolved (e.g. it is a vendor specific version like // 2.0.0-cdh4.1.1). If it is the case, we will try just // "org.apache.hadoop:hadoop-client:2.4.0". "org.apache.hadoop:hadoop-client:2.4.0" // is used just because we used to hard code it as the hadoop artifact to download. logWarning(s"Failed to resolve Hadoop artifacts for the version ${hadoopVersion}. " + s"We will change the hadoop version from ${hadoopVersion} to 2.4.0 and try again. " + "Hadoop classes will not be shared between Spark and Hive metastore client. " + "It is recommended to set jars used by Hive metastore client through " + "spark.sql.hive.metastore.jars in the production environment.") sharesHadoopClasses = false {code} That's the rare part. But when {{sharesHadoopClasses}} is set to false, the instantiation of {{HiveClientImpl}} fails: {code} classLoader .loadClass(classOf[HiveClientImpl].getName) .getConstructors.head .newInstance(version, sparkConf, hadoopConf, config, classLoader, this) .asInstanceOf[HiveClient] {code} {{hadoopConf}} here is an instance of {{Configuration}} loaded by the main Spark class loader, but in this case {{HiveClientImpl}} expects an instance of {{Configuration}} loaded by the isolated class loader (yay class loaders are fun). 
So you get an error like this:
{noformat}
2016-08-10 13:51:20.742 - stderr> Exception in thread "main" java.lang.IllegalArgumentException: argument type mismatch
2016-08-10 13:51:20.743 - stderr> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2016-08-10 13:51:20.743 - stderr> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
2016-08-10 13:51:20.743 - stderr> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016-08-10 13:51:20.743 - stderr> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:354)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:258)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
2016-08-10 13:51:20.745 - stderr> at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
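The class-identity problem behind that "argument type mismatch" can be sketched outside the JVM as well. The following is a hedged Python analogue (illustrative only, not Spark code): the same class definition executed under two independent "loaders" produces two distinct types, so an instance of one fails an isinstance check against the other, just as a {{Configuration}} from the main class loader fails the reflective constructor lookup in the isolated one.

```python
import types

# The same class source, "loaded" twice -- simulating two class loaders.
SRC = "class Configuration:\n    pass\n"

def load_module(name):
    # Each exec creates a fresh, independent Configuration class object.
    mod = types.ModuleType(name)
    exec(SRC, mod.__dict__)
    return mod

main_loader = load_module("main_loader")
isolated_loader = load_module("isolated_loader")

conf = main_loader.Configuration()
# Same class name, but a different type object entirely:
print(isinstance(conf, isolated_loader.Configuration))  # False
```

The two `Configuration` classes compare unequal even though their definitions are identical, which is why the fix for the JVM case is to hand the isolated client a `Configuration` instance built from the right loader.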
[jira] [Resolved] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17035. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14631 [https://github.com/apache/spark/pull/14631] > Conversion of datetime.max to microseconds produces incorrect value > --- > > Key: SPARK-17035 > URL: https://issues.apache.org/jira/browse/SPARK-17035 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Michael Styles >Priority: Minor > Fix For: 2.1.0 > > > Conversion of datetime.max to microseconds produces incorrect value. For > example, > {noformat} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, TimestampType > schema = StructType([StructField("dt", TimestampType(), False)]) > data = [{"dt": datetime.max}] > # convert python objects to sql data > sql_data = [schema.toInternal(row) for row in data] > # Value is wrong. > sql_data > [(2.534023188e+17,)] > {noformat} > This value should be [(253402318799999999,)]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
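The failure mode above is easy to reproduce in plain Python. This is a hedged sketch, not PySpark's actual code path (which goes through time.mktime and is timezone-dependent): doing the epoch-microsecond arithmetic through a float drops the low-order digits at datetime.max's magnitude, while integer arithmetic on the timedelta fields is exact.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def to_micros_float(dt):
    # Float seconds * 1e6: a double only carries ~15-16 significant
    # digits, so the trailing microseconds are rounded away.
    return (dt - EPOCH).total_seconds() * 1e6

def to_micros_int(dt):
    # Integer arithmetic on the timedelta fields keeps every digit.
    d = dt - EPOCH
    return (d.days * 86400 + d.seconds) * 10**6 + d.microseconds

dt = datetime.max                 # 9999-12-31 23:59:59.999999
exact = to_micros_int(dt)         # 253402300799999999 (UTC-based epoch)
lossy = to_micros_float(dt)       # a float near 2.5340230...e+17
print(int(lossy) == exact)        # False
```

The comparison prints False because 253402300799999999 is not representable as a double, which is exactly why the cached internal value came back as 2.534023188e+17 instead of the precise integer.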
[jira] [Commented] (SPARK-16656) CreateTableAsSelectSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423051#comment-15423051 ] Apache Spark commented on SPARK-16656: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/14668 > CreateTableAsSelectSuite is flaky > - > > Key: SPARK-16656 > URL: https://issues.apache.org/jira/browse/SPARK-16656 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > Fix For: 2.0.1, 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62593/testReport/junit/org.apache.spark.sql.sources/CreateTableAsSelectSuite/create_a_table__drop_it_and_create_another_one_with_the_same_name/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423007#comment-15423007 ] Apache Spark commented on SPARK-17087: -- User 'skonto' has created a pull request for this issue: https://github.com/apache/spark/pull/14667 > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17087: Assignee: (was: Apache Spark) > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17087: Assignee: Apache Spark > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos >Assignee: Apache Spark > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17087: Description: Need to add the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 was: Adds the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422996#comment-15422996 ] Stavros Kontopoulos commented on SPARK-17087: - WIP > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
Stavros Kontopoulos created SPARK-17087: --- Summary: Make Spark on Mesos honor port restrictions - Documentation Key: SPARK-17087 URL: https://issues.apache.org/jira/browse/SPARK-17087 Project: Spark Issue Type: Documentation Components: Mesos Reporter: Stavros Kontopoulos Adds the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
Barry Becker created SPARK-17086: Summary: QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data Key: SPARK-17086 URL: https://issues.apache.org/jira/browse/SPARK-17086 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.1.0 Reporter: Barry Becker I discovered this bug when working with a build from the master branch (which I believe is 2.1.0). This used to work fine when running Spark 1.6.2. I have a dataframe with an "intData" column that has values like
{code}
1 3 2 1 1 2 3 2 2 2 1 3
{code}
I have a stage in my pipeline that uses the QuantileDiscretizer to produce equal-weight splits like this:
{code}
new QuantileDiscretizer()
  .setInputCol("intData")
  .setOutputCol("intData_bin")
  .setNumBuckets(10)
  .fit(df)
{code}
But when that gets run, it (incorrectly) throws this error:
{code}
parameter splits given invalid value [-Infinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, Infinity]
{code}
There shouldn't be duplicate splits generated, should there? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
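The duplicate splits arise naturally from the data: with only three distinct values and ten requested buckets, several quantile boundaries must land on the same value. A hedged Python sketch (illustrative names, not Spark's implementation) shows the collision and the obvious remedy of de-duplicating the raw boundaries before building the splits array:

```python
data = [1, 3, 2, 1, 1, 2, 3, 2, 2, 2, 1, 3]

def quantile_splits(values, num_buckets):
    s = sorted(values)
    n = len(s)
    # Raw boundaries at the k/num_buckets quantiles; with only three
    # distinct values and ten buckets, many of these collide.
    raw = [s[min(n - 1, (k * n) // num_buckets)] for k in range(1, num_buckets)]
    # De-duplicate before using them as bucket boundaries, so the
    # splits array is strictly increasing.
    deduped = sorted(set(raw))
    return [float("-inf")] + [float(x) for x in deduped] + [float("inf")]

print(quantile_splits(data, 10))   # [-inf, 1.0, 2.0, 3.0, inf]
```

Fewer buckets than requested come out, but the splits are strictly increasing, which is the invariant the `splits` parameter validation enforces.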
[jira] [Assigned] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16578: Assignee: (was: Apache Spark) > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422978#comment-15422978 ] Apache Spark commented on SPARK-16578: -- User 'junyangq' has created a pull request for this issue: https://github.com/apache/spark/pull/14666 > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16578: Assignee: Apache Spark > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Apache Spark > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16484) Incremental Cardinality estimation operations with Hyperloglog
[ https://issues.apache.org/jira/browse/SPARK-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422887#comment-15422887 ] Yongjia Wang commented on SPARK-16484: -- Here is my solution using Spark UDAF and UDT https://github.com/yongjiaw/Spark_HLL > Incremental Cardinality estimation operations with Hyperloglog > -- > > Key: SPARK-16484 > URL: https://issues.apache.org/jira/browse/SPARK-16484 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yongjia Wang > > Efficient cardinality estimation is very important, and SparkSQL has had > approxCountDistinct based on Hyperloglog for quite some time. However, there > isn't a way to do incremental estimation. For example, if we want to get > updated distinct counts of the last 90 days, we need to do the aggregation > for the entire window over and over again. The more efficient way involves > serializing the counter for smaller time windows (such as hourly) so the > counts can be efficiently updated in an incremental fashion for any time > window. > With the support of custom UDAF, Binary DataType and the HyperloglogPlusPlus > implementation in the current Spark version, it's easy enough to extend the > functionality to include incremental counting, and even other general set > operations such as intersection and set difference. Spark API is already as > elegant as it can be, but it still takes quite some effort to do a custom > implementation of the aforementioned operations, which are supposed to be in > high demand. I have been searching but failed to find a usable existing > solution or any ongoing effort for this. The closest I got is the following, > but it does not work with Spark 1.6 due to API changes. > https://github.com/collectivemedia/spark-hyperloglog/blob/master/src/main/scala/org/apache/spark/sql/hyperloglog/aggregates.scala > I wonder if it's worth integrating such operations into SparkSQL. 
The only > problem I see is that it depends on serialization of a specific HLL implementation > and introduces compatibility issues. But as long as the user is aware of such an > issue, it should be fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
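The incremental property described in this issue rests on the fact that HLL registers merge by elementwise max. A toy register sketch in Python (hedged: deliberately simplified, with no bias correction or cardinality estimation, and not the HyperLogLog++ implementation Spark uses) shows why hourly sketches can be unioned into any window without rescanning the raw data:

```python
import hashlib

P = 2 ** 10  # number of registers (toy-sized)

def _hash(x):
    # Stable 64-bit hash of the value's string form.
    return int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], "big")

def empty():
    return [0] * P

def add(registers, value):
    h = _hash(value)
    idx = h % P          # which register this value updates
    rest = h // P
    rank = 1             # 1 + number of trailing zero bits, capped
    while rest % 2 == 0 and rank < 64:
        rank += 1
        rest //= 2
    registers[idx] = max(registers[idx], rank)

def merge(a, b):
    # Union of two sketches is the elementwise max of their registers.
    return [max(x, y) for x, y in zip(a, b)]

hour1, hour2 = empty(), empty()
for v in range(1000):
    add(hour1, v)
for v in range(500, 1500):
    add(hour2, v)
window = merge(hour1, hour2)   # union sketch for both hours, no rescan
```

Because merge is commutative and idempotent, per-hour sketches serialized to a binary column can be combined into a 90-day window in a single aggregation pass, which is the efficiency the issue is asking for.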
[jira] [Commented] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422864#comment-15422864 ] Sean Owen commented on SPARK-17085: --- Yes I think the first doc link is wrong. Go ahead with a pull request. > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Priority: Minor > > Spark Structured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422856#comment-15422856 ] Stavros Kontopoulos commented on SPARK-11714: - I will create another one for the documentation. I guess we will need to document the behavior. > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen >Assignee: Stavros Kontopoulos > Fix For: 2.1.0 > > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samritti updated SPARK-17085: - Priority: Minor (was: Major) > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Priority: Minor > > Spark Structured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org