[jira] [Assigned] (SPARK-17101) Provide format identifier for TextFileFormat
     [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17101:
------------------------------------

    Assignee: Apache Spark

> Provide format identifier for TextFileFormat
> --------------------------------------------
>
>                 Key: SPARK-17101
>                 URL: https://issues.apache.org/jira/browse/SPARK-17101
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jacek Laskowski
>            Assignee: Apache Spark
>            Priority: Trivial
>
> Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format.
> {code}
> scala> spark.read.text("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[value#24] text
>
> == Analyzed Logical Plan ==
> value: string
> Relation[value#24] text
>
> == Optimized Logical Plan ==
> InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
>
> == Physical Plan ==
> InMemoryTableScan [value#24]
>    +- InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> {code}
> When you {{explain}} csv format you can see {{Format: CSV}}.
> {code}
> scala> spark.read.csv("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
>
> == Analyzed Logical Plan ==
> _c0: string, _c1: string, _c2: string, _c3: string
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
>
> == Optimized Logical Plan ==
> InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
>
> == Physical Plan ==
> InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
>    +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
> {code}
> The custom format is defined for JSON, too.
> {code}
> scala> spark.read.json("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_corrupt_record#93] json
>
> == Analyzed Logical Plan ==
> _corrupt_record: string
> Relation[_corrupt_record#93] json
>
> == Optimized Logical Plan ==
> InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
>
> == Physical Plan ==
> InMemoryTableScan [_corrupt_record#93]
>    +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
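The {{Format:}} entry for text falls back to the JVM's default {{Object.toString}}, which is why the plan prints {{org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c}} while CSV and JSON print short identifiers. A minimal Python sketch of the same idea (the class names and `describe_scan` helper below are illustrative, not Spark's implementation; Spark's actual fix would be an override of `toString` in the Scala `TextFileFormat` class):

```python
# Sketch: a class without an explicit repr falls back to the runtime default
# (analogous to Java's ClassName@hashcode), while an explicit override yields
# the short identifier seen for CSV in the plans above.

class FileFormat:
    """Base class; inherits the default object repr."""
    pass

class TextFileFormat(FileFormat):
    pass  # no __repr__ -> plan text would embed '<...TextFileFormat object at 0x...>'

class CsvFileFormat(FileFormat):
    def __repr__(self):
        return "CSV"  # short, stable identifier for plan output

def describe_scan(fmt):
    # Mimics how a plan line embeds the format's string form.
    return f"FileScan Format: {fmt!r}"

print(describe_scan(CsvFileFormat()))                     # FileScan Format: CSV
print("object at 0x" in describe_scan(TextFileFormat()))  # True
```

The design point is simply that the identifier should be defined once on the format class, so every plan that embeds the format prints the same short name.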
[jira] [Commented] (SPARK-17101) Provide format identifier for TextFileFormat
    [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424001#comment-15424001 ]

Apache Spark commented on SPARK-17101:
--------------------------------------

User 'jaceklaskowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/14680

> Provide format identifier for TextFileFormat
> --------------------------------------------
>
>                 Key: SPARK-17101
>                 URL: https://issues.apache.org/jira/browse/SPARK-17101
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
[jira] [Updated] (SPARK-16391) KeyValueGroupedDataset.reduceGroups should support partial aggregation
     [ https://issues.apache.org/jira/browse/SPARK-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-16391:
--------------------------------
    Target Version/s: 2.0.1, 2.1.0  (was: 2.1.0)

> KeyValueGroupedDataset.reduceGroups should support partial aggregation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-16391
>                 URL: https://issues.apache.org/jira/browse/SPARK-16391
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> KeyValueGroupedDataset.reduceGroups is currently implemented via flatMapGroups, which is very inefficient since it effectively does a physical group-by.
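Partial aggregation applies here because a reduce function is associative: values can be pre-combined within each partition before any shuffle, instead of materializing every group's full value list the way a flatMapGroups-based implementation must. A rough Python sketch of the two strategies (toy data structures, not Spark's API; the function names are illustrative):

```python
# Toy model: 'partitions' is a list of lists of (key, value) rows.
# reduce_groups_full_groups materializes whole groups first (flatMapGroups-style);
# reduce_groups_partial pre-combines per partition (map-side combine), so only
# one partial value per key per partition would need to cross the shuffle.

def reduce_groups_full_groups(partitions, f):
    groups = {}
    for part in partitions:          # conceptually: shuffle every row
        for k, v in part:
            groups.setdefault(k, []).append(v)
    out = {}
    for k, vs in groups.items():     # reduce each fully-materialized group
        acc = vs[0]
        for v in vs[1:]:
            acc = f(acc, v)
        out[k] = acc
    return out

def reduce_groups_partial(partitions, f):
    partials = []
    for part in partitions:          # combine locally before "shuffling"
        acc = {}
        for k, v in part:
            acc[k] = f(acc[k], v) if k in acc else v
        partials.append(acc)
    merged = {}                      # merge one partial per key per partition
    for acc in partials:
        for k, v in acc.items():
            merged[k] = f(merged[k], v) if k in merged else v
    return merged

parts = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
add = lambda x, y: x + y
print(reduce_groups_full_groups(parts, add))  # {'a': 8, 'b': 7}
print(reduce_groups_partial(parts, add))      # {'a': 8, 'b': 7}
```

Both strategies produce the same result; the partial version simply moves far less data per key, which is the efficiency the issue asks for.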
[jira] [Assigned] (SPARK-17102) bypass UserDefinedGenerator for json format check
     [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17102:
------------------------------------

    Assignee: Apache Spark  (was: Wenchen Fan)

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Apache Spark
[jira] [Commented] (SPARK-17102) bypass UserDefinedGenerator for json format check
    [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423989#comment-15423989 ]

Apache Spark commented on SPARK-17102:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14679

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
[jira] [Assigned] (SPARK-17102) bypass UserDefinedGenerator for json format check
     [ https://issues.apache.org/jira/browse/SPARK-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17102:
------------------------------------

    Assignee: Wenchen Fan  (was: Apache Spark)

> bypass UserDefinedGenerator for json format check
> -------------------------------------------------
>
>                 Key: SPARK-17102
>                 URL: https://issues.apache.org/jira/browse/SPARK-17102
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
[jira] [Updated] (SPARK-17101) Provide format identifier for TextFileFormat
     [ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Laskowski updated SPARK-17101:
------------------------------------
    Description: 
Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format.

{code}
scala> spark.read.text("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[value#24] text

== Analyzed Logical Plan ==
value: string
Relation[value#24] text

== Optimized Logical Plan ==
InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct

== Physical Plan ==
InMemoryTableScan [value#24]
   +- InMemoryRelation [value#24], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}

When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}

  was:
Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format. When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}
[jira] [Created] (SPARK-17102) bypass UserDefinedGenerator for json format check
Wenchen Fan created SPARK-17102:
-----------------------------------

             Summary: bypass UserDefinedGenerator for json format check
                 Key: SPARK-17102
                 URL: https://issues.apache.org/jira/browse/SPARK-17102
             Project: Spark
          Issue Type: Test
          Components: SQL
            Reporter: Wenchen Fan
            Assignee: Wenchen Fan
[jira] [Created] (SPARK-17101) Provide format identifier for TextFileFormat
Jacek Laskowski created SPARK-17101:
---------------------------------------

             Summary: Provide format identifier for TextFileFormat
                 Key: SPARK-17101
                 URL: https://issues.apache.org/jira/browse/SPARK-17101
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Jacek Laskowski
            Priority: Trivial

Define the format identifier that is used in {{Optimized Logical Plan}} in {{explain}} for {{text}} file format. When you {{explain}} csv format you can see {{Format: CSV}}.

{code}
scala> spark.read.csv("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Analyzed Logical Plan ==
_c0: string, _c1: string, _c2: string, _c3: string
Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv

== Optimized Logical Plan ==
InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>

== Physical Plan ==
InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
   +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
{code}

The custom format is defined for JSON, too.

{code}
scala> spark.read.json("people.csv").cache.explain(extended = true)
== Parsed Logical Plan ==
Relation[_corrupt_record#93] json

== Analyzed Logical Plan ==
_corrupt_record: string
Relation[_corrupt_record#93] json

== Optimized Logical Plan ==
InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>

== Physical Plan ==
InMemoryTableScan [_corrupt_record#93]
   +- InMemoryRelation [_corrupt_record#93], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
{code}
[jira] [Resolved] (SPARK-17068) Retain view visibility information through out Analysis
     [ https://issues.apache.org/jira/browse/SPARK-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-17068.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.1

> Retain view visibility information through out Analysis
> --------------------------------------------------------
>
>                 Key: SPARK-17068
>                 URL: https://issues.apache.org/jira/browse/SPARK-17068
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>             Fix For: 2.0.1, 2.1.0
>
> Views in Spark SQL are replaced by their backing {{LogicalPlan}} during analysis. This can be confusing when dealing with and debugging large {{LogicalPlan}}s. I propose to add an identifier to the subquery alias in order to improve this.
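The proposal amounts to keeping the view's name attached to the alias node that replaces the view, so a large analyzed plan still shows where a view's subtree begins. A rough Python sketch of the idea (toy plan nodes; `SubqueryAlias` mirrors Spark's node name, but the fields and printing here are illustrative, not Spark's implementation):

```python
# Toy logical-plan nodes: tagging the alias node with the view identifier
# keeps view boundaries visible after the view is inlined during analysis.

class Relation:
    def __init__(self, name):
        self.name = name
    def pretty(self, indent=0):
        return " " * indent + f"Relation[{self.name}]"

class SubqueryAlias:
    def __init__(self, alias, child, view=None):
        self.alias, self.child, self.view = alias, child, view
    def pretty(self, indent=0):
        # With a retained view identifier, the plan names the inlined view.
        tag = f", view={self.view}" if self.view else ""
        return (" " * indent + f"SubqueryAlias {self.alias}{tag}\n"
                + self.child.pretty(indent + 2))

plan = SubqueryAlias("t", Relation("employees"), view="hr_view")
print(plan.pretty())
# SubqueryAlias t, view=hr_view
#   Relation[employees]
```

Without the `view=` tag the inlined subtree is indistinguishable from an ordinary aliased subquery, which is exactly the debugging pain the issue describes.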
[jira] [Updated] (SPARK-17068) Retain view visibility information through out Analysis
     [ https://issues.apache.org/jira/browse/SPARK-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17068:
--------------------------------
    Fix Version/s:     (was: 2.0.1)

> Retain view visibility information through out Analysis
> --------------------------------------------------------
>
>                 Key: SPARK-17068
>                 URL: https://issues.apache.org/jira/browse/SPARK-17068
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>             Fix For: 2.1.0
[jira] [Commented] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
    [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423917#comment-15423917 ]

Tim Sell commented on SPARK-17100:
----------------------------------

I don't know why, but using `dataframe.cache()` before the filter is a workaround.

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Updated] (SPARK-17084) Rename ParserUtils.assert to validate
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17084:
--------------------------------
    Summary: Rename ParserUtils.assert to validate  (was: Rename ParserUtils.assert to require)

> Rename ParserUtils.assert to validate
> --------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
>
> We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{require}}
[jira] [Updated] (SPARK-17084) Rename ParserUtils.assert to validate
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17084:
--------------------------------
    Description: We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{validate}}.  (was: We currently have an assert method in ParserUtils. This is, however, not used as an assert (a failed assert meaning that the program has reached an invalid state) but is used to check requirements. I propose to rename this method to {{require}})

> Rename ParserUtils.assert to validate
> --------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
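The distinction being drawn is that an assert flags an internal invariant violation (a bug in the program, often compiled out in production), while a requirement check rejects invalid user input with a parse error that always fires. A small Python sketch of the renamed helper (hypothetical names and signature; Spark's actual method lives in Scala's ParserUtils):

```python
# Sketch of a 'validate'-style helper: unlike a bare assert, it always runs
# and raises an input-level error rather than signalling an internal bug.
# The ParseException class and message here are illustrative, not Spark's.

class ParseException(Exception):
    """Raised when user-supplied SQL text fails a requirement."""

def validate(condition: bool, message: str) -> None:
    if not condition:
        raise ParseException(message)

# Usage: reject an invalid unit in some hypothetical parser rule.
unit = "lightyears"
try:
    validate(unit in {"year", "month", "day"}, f"Unknown interval unit: {unit}")
except ParseException as e:
    print(e)  # Unknown interval unit: lightyears
```

Naming it `validate` (rather than `assert` or `require`) makes it explicit at each call site that the check guards user input, not program state.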
[jira] [Resolved] (SPARK-17084) Rename ParserUtils.assert to require
     [ https://issues.apache.org/jira/browse/SPARK-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-17084.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
                   2.0.1

> Rename ParserUtils.assert to require
> -------------------------------------
>
>                 Key: SPARK-17084
>                 URL: https://issues.apache.org/jira/browse/SPARK-17084
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Herman van Hovell
>            Assignee: Herman van Hovell
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
[jira] [Updated] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
     [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated SPARK-17100:
-----------------------------
    Attachment: test_bug.py

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Updated] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
     [ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated SPARK-17100:
-----------------------------
    Attachment: bug.py

> pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17100
>                 URL: https://issues.apache.org/jira/browse/SPARK-17100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3.
>            Reporter: Tim Sell
>         Attachments: bug.py, test_bug.py
[jira] [Created] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
Tim Sell created SPARK-17100: Summary: pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException Key: SPARK-17100 URL: https://issues.apache.org/jira/browse/SPARK-17100 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.0 Environment: spark-2.0.0-bin-hadoop2.7. Python2 and Python3. Reporter: Tim Sell In pyspark, when filtering on a udf derived column after some join types, the optimized logical plan results in a java.lang.UnsupportedOperationException. I could not replicate this in Scala code from the shell, just Python. It is a PySpark regression from Spark 1.6.2. This can be replicated with: bin/spark-submit bug.py {code:python:title=bug.py} import pyspark.sql.functions as F from pyspark.sql import Row, SparkSession if __name__ == '__main__': spark = SparkSession.builder.appName("test").getOrCreate() left = spark.createDataFrame([Row(a=1)]) right = spark.createDataFrame([Row(a=1)]) df = left.join(right, on='a', how='left_outer') df = df.withColumn('b', F.udf(lambda x: 'x')(df.a)) df = df.filter('b = "x"') df.explain(extended=True) {code} The output is: {code} == Parsed Logical Plan == 'Filter ('b = x) +- Project [a#0L, (a#0L) AS b#8] +- Project [a#0L] +- Join LeftOuter, (a#0L = a#3L) :- LogicalRDD [a#0L] +- LogicalRDD [a#3L] == Analyzed Logical Plan == a: bigint, b: string Filter (b#8 = x) +- Project [a#0L, (a#0L) AS b#8] +- Project [a#0L] +- Join LeftOuter, (a#0L = a#3L) :- LogicalRDD [a#0L] +- LogicalRDD [a#3L] == Optimized Logical Plan == java.lang.UnsupportedOperationException: Cannot evaluate expression: (input[0, bigint, true]) == Physical Plan == java.lang.UnsupportedOperationException: Cannot evaluate expression: (input[0, bigint, true]) {code} It fails when the join is: * how='outer', on=column expression * how='left_outer', on=string or column expression * how='right_outer', on=string or column expression It passes when the join is: * how='inner', on=string or column expression * how='outer', on=string
I made some tests to demonstrate each of these. Run with bin/spark-submit test_bug.py
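The pass/fail matrix above could be driven by a small parametrized harness along these lines (a sketch only, with hypothetical names; this is not the attached test_bug.py). The pyspark imports are deferred into the helper so the case table itself is plain data:

```python
# Hypothetical parametrization of the matrix from the report:
# (join type, how the join key is given, whether the filter is expected to work).
CASES = [
    ("inner",       "string", True),
    ("inner",       "column", True),
    ("outer",       "string", True),
    ("outer",       "column", False),
    ("left_outer",  "string", False),
    ("left_outer",  "column", False),
    ("right_outer", "string", False),
    ("right_outer", "column", False),
]

def run_case(spark, how, on_type):
    """Rebuild the join from bug.py and filter on a UDF-derived column."""
    import pyspark.sql.functions as F  # deferred: only needed on a live session
    from pyspark.sql import Row
    left = spark.createDataFrame([Row(a=1)])
    right = spark.createDataFrame([Row(a=1)])
    on = "a" if on_type == "string" else left.a == right.a
    df = left.join(right, on=on, how=how)
    df = df.withColumn("b", F.udf(lambda x: "x")(df.a))
    # On the affected 2.0.0 configurations this raises during optimization.
    return df.filter('b = "x"').count()
```

Each failing case corresponds to an outer-style join where the UDF input column can be null-padded.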
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by ChunkedByteBuffer (was: The size of ByteBuffers can not be greater than 2G, should be replaced by Java) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by > ChunkedByteBuffer
[jira] [Updated] (SPARK-17082) Replace ByteBuffer with ChunkedByteBuffer
[ https://issues.apache.org/jira/browse/SPARK-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-17082: Description: The size of ByteBuffers can not be greater than 2G, should be replaced by Java (was: the various 2G limit we have in Spark, due to the use of ByteBuffers.) > Replace ByteBuffer with ChunkedByteBuffer > - > > Key: SPARK-17082 > URL: https://issues.apache.org/jira/browse/SPARK-17082 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Reporter: Guoqiang Li > > The size of ByteBuffers can not be greater than 2G, should be replaced by Java
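The 2 GB ceiling exists because a single java.nio.ByteBuffer indexes its contents with an int, capping it at Integer.MAX_VALUE bytes. The idea behind a chunked buffer can be sketched in plain Python (an illustration of the concept only, not Spark's ChunkedByteBuffer API):

```python
# Illustrative sketch: instead of one contiguous buffer whose size must fit in
# a single (bounded) index, hold a list of small chunks; total size is a sum
# over chunks and is not capped by any single chunk's maximum length.
class ChunkedBuffer:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = [bytearray()]

    def write(self, data):
        """Append bytes, spilling into a fresh chunk whenever the current
        chunk reaches chunk_size."""
        for b in bytes(data):
            if len(self.chunks[-1]) >= self.chunk_size:
                self.chunks.append(bytearray())
            self.chunks[-1].append(b)

    def size(self):
        return sum(len(c) for c in self.chunks)

buf = ChunkedBuffer(chunk_size=4)
buf.write(b"0123456789")
# buf.chunks now holds three chunks of 4, 4 and 2 bytes; buf.size() -> 10
```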
[jira] [Commented] (SPARK-17093) Roundtrip encoding of array<struct<...>> fields is wrong when whole-stage codegen is disabled
[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423777#comment-15423777 ] Liwei Lin commented on SPARK-17093: --- Oh the interpreted evaluation codepath indeed forgot to {{copy}} somewhere. I'll submit a patch shortly, thanks. > Roundtrip encoding of array<struct<...>> fields is wrong when whole-stage > codegen is disabled > -- > > Key: SPARK-17093 > URL: https://issues.apache.org/jira/browse/SPARK-17093 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Priority: Critical > > The following failing test demonstrates a bug where Spark mis-encodes > array-of-struct fields if whole-stage codegen is disabled: > {code} > withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { > val data = Array(Array((1, 2), (3, 4))) > val ds = spark.sparkContext.parallelize(data).toDS() > assert(ds.collect() === data) > } > {code} > When wholestage codegen is enabled (the default), this works fine. When it's > disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. > Because the last element of the array appears to be repeated my best guess is > that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
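The suspected missing {{copy()}} is a classic aliasing bug: a reused mutable row buffer is stored without being copied, so every stored entry ends up aliasing the last value written. A minimal Python illustration of the failure mode (not Spark's actual codepath):

```python
def collect_rows(values, copy_before_store):
    """Fill a reused two-slot buffer per row (as generated code reuses an
    internal row) and optionally copy it before storing."""
    buf = [None, None]
    out = []
    for a, b in values:
        buf[0], buf[1] = a, b
        out.append(list(buf) if copy_before_store else buf)
    return out

data = [(1, 2), (3, 4)]
# Without the copy, both stored entries alias the same buffer, which ends up
# holding the last row -- the [[3, 4], [3, 4]] shape seen in the report.
broken = collect_rows(data, copy_before_store=False)
# With the copy, each row is preserved: [[1, 2], [3, 4]].
fixed = collect_rows(data, copy_before_store=True)
```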
[jira] [Updated] (SPARK-17099) Incorrect result when HAVING clause is added to group by query
[ https://issues.apache.org/jira/browse/SPARK-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17099: --- Description: Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by the {{HAVING}} clause: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. was: Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. 
With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by it: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. > Incorrect result when HAVING clause is added to group by query > -- > > Key: SPARK-17099 > URL: https://issues.apache.org/jira/browse/SPARK-17099 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Priority: Critical > Fix For: 2.1.0 > > > Random query generation uncovered the following que
[jira] [Created] (SPARK-17099) Incorrect result when complex HAVING clause is added to query
Josh Rosen created SPARK-17099: -- Summary: Incorrect result when complex HAVING clause is added to query Key: SPARK-17099 URL: https://issues.apache.org/jira/browse/SPARK-17099 Project: Spark Issue Type: Bug Affects Versions: 2.1.0 Reporter: Josh Rosen Priority: Critical Fix For: 2.1.0 Random query generation uncovered the following query which returns incorrect results when run on Spark SQL. This wasn't the original query uncovered by the generator, since I performed a bit of minimization to try to make it more understandable. With the following tables: {code} val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") val t2 = sc.parallelize( Seq( (-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)) ).toDF("int_col_2", "int_col_5") t1.registerTempTable("t1") t2.registerTempTable("t2") {code} Run {code} SELECT (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) FROM t1 RIGHT JOIN t2 ON (t2.int_col_2) = (t1.int_col_5) GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), COALESCE(t1.int_col_5, t2.int_col_2) HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) {code} In Spark SQL, this returns an empty result set, whereas Postgres returns four rows. However, if I omit the {{HAVING}} clause I see that the group's rows are being incorrectly filtered by it: {code} +--+---+--+ | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) | +--+---+--+ | -507 | -1014 | | 940 | 1880 | | -769 | -1538 | | -367 | -734 | | -800 | -1600 | +--+---+--+ {code} Based on this, the output after adding the {{HAVING}} should contain four rows, not zero. I'm not sure how to further shrink this in a straightforward way, so I'm opening this bug to get help in triaging further. 
[jira] [Updated] (SPARK-17099) Incorrect result when HAVING clause is added to group by query
[ https://issues.apache.org/jira/browse/SPARK-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17099: --- Summary: Incorrect result when HAVING clause is added to group by query (was: Incorrect result when complex HAVING clause is added to query) > Incorrect result when HAVING clause is added to group by query > -- > > Key: SPARK-17099 > URL: https://issues.apache.org/jira/browse/SPARK-17099 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Priority: Critical > Fix For: 2.1.0 > > > Random query generation uncovered the following query which returns incorrect > results when run on Spark SQL. This wasn't the original query uncovered by > the generator, since I performed a bit of minimization to try to make it more > understandable. > With the following tables: > {code} > val t1 = sc.parallelize(Seq(-234, 145, 367, 975, 298)).toDF("int_col_5") > val t2 = sc.parallelize( > Seq( > (-769, -244), > (-800, -409), > (940, 86), > (-507, 304), > (-367, 158)) > ).toDF("int_col_2", "int_col_5") > t1.registerTempTable("t1") > t2.registerTempTable("t2") > {code} > Run > {code} > SELECT > (SUM(COALESCE(t1.int_col_5, t2.int_col_2))), > ((COALESCE(t1.int_col_5, t2.int_col_2)) * 2) > FROM t1 > RIGHT JOIN t2 > ON (t2.int_col_2) = (t1.int_col_5) > GROUP BY GREATEST(COALESCE(t2.int_col_5, 109), COALESCE(t1.int_col_5, -449)), > COALESCE(t1.int_col_5, t2.int_col_2) > HAVING (SUM(COALESCE(t1.int_col_5, t2.int_col_2))) > ((COALESCE(t1.int_col_5, > t2.int_col_2)) * 2) > {code} > In Spark SQL, this returns an empty result set, whereas Postgres returns four > rows. 
However, if I omit the {{HAVING}} clause I see that the group's rows > are being incorrectly filtered by it: > {code} > +--+---+--+ > | sum(coalesce(int_col_5, int_col_2)) | (coalesce(int_col_5, int_col_2) * 2) > | > +--+---+--+ > | -507 | -1014 > | > | 940 | 1880 > | > | -769 | -1538 > | > | -367 | -734 > | > | -800 | -1600 > | > +--+---+--+ > {code} > Based on this, the output after adding the {{HAVING}} should contain four > rows, not zero. > I'm not sure how to further shrink this in a straightforward way, so I'm > opening this bug to get help in triaging further.
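The expected four rows can be checked outside Spark by simulating the RIGHT JOIN, GROUP BY and HAVING from the query above in plain Python (a sketch specific to this data, not general SQL semantics):

```python
# Data from the report: t1.int_col_5, and t2 as (int_col_2, int_col_5).
t1 = [-234, 145, 367, 975, 298]
t2 = [(-769, -244), (-800, -409), (940, 86), (-507, 304), (-367, 158)]

def coalesce(*xs):
    return next(x for x in xs if x is not None)

# RIGHT JOIN on t2.int_col_2 = t1.int_col_5: no key matches here, so every
# joined row carries NULL for the t1 side.
joined = [(None, c2, c5) for (c2, c5) in t2]

groups = {}
for t1_5, t2_2, t2_5 in joined:
    key = (max(coalesce(t2_5, 109), coalesce(t1_5, -449)),  # GREATEST(...)
           coalesce(t1_5, t2_2))
    groups.setdefault(key, []).append(coalesce(t1_5, t2_2))

# HAVING SUM(COALESCE(...)) > COALESCE(...) * 2
result = sorted((sum(vs), key[1] * 2) for key, vs in groups.items()
                if sum(vs) > key[1] * 2)
# result -> [(-800, -1600), (-769, -1538), (-507, -1014), (-367, -734)],
# i.e. the four rows Postgres returns and Spark SQL drops.
```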
[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423745#comment-15423745 ] Jeff Zhang commented on SPARK-17054: I pushed another commit to disable downloading Spark if it is cluster mode. After that I can run SparkR successfully in yarn-cluster mode. > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is due to it downloading SparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... 
sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423741#comment-15423741 ] Jeff Zhang commented on SPARK-16578: Another scenario I'd like to clarify: if we launch the R process on the client machine and the R backend process in the AM container, is this client mode or cluster mode? > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it
[jira] [Assigned] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17054: Assignee: Apache Spark > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang >Assignee: Apache Spark > > This is due to it download sparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. 
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Assigned] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17054: Assignee: (was: Apache Spark) > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is due to it download sparkR to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead. 
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-16757) Set up caller context to HDFS
[ https://issues.apache.org/jira/browse/SPARK-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423730#comment-15423730 ] Weiqing Yang commented on SPARK-16757: -- Hi, [~srowen] Could you help review this PR please? > Set up caller context to HDFS > - > > Key: SPARK-16757 > URL: https://issues.apache.org/jira/browse/SPARK-16757 > Project: Spark > Issue Type: Sub-task >Reporter: Weiqing Yang > > In this jira, Spark will invoke hadoop caller context api to set up its > caller context to HDFS.
[jira] [Commented] (SPARK-16947) Support type coercion and foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423678#comment-15423678 ] Apache Spark commented on SPARK-16947: -- User 'petermaxlee' has created a pull request for this issue: https://github.com/apache/spark/pull/14676 > Support type coercion and foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
[jira] [Updated] (SPARK-16947) Support type coercion and foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee updated SPARK-16947: -- Summary: Support type coercion and foldable expression for inline tables (was: Improve type coercion and support foldable expression for inline tables) > Support type coercion and foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
[jira] [Updated] (SPARK-16947) Improve type coercion and support foldable expression for inline tables
[ https://issues.apache.org/jira/browse/SPARK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee updated SPARK-16947: -- Summary: Improve type coercion and support foldable expression for inline tables (was: Improve type coercion of inline tables) > Improve type coercion and support foldable expression for inline tables > --- > > Key: SPARK-16947 > URL: https://issues.apache.org/jira/browse/SPARK-16947 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell > > Inline tables were added to Spark SQL in 2.0, e.g.: {{select * from values > (1, 'A'), (2, 'B') as tbl(a, b)}} > This is currently implemented using a {{LocalRelation}} and this relation is > created during parsing. This has several weaknesses: you can only use simple > expressions in such a plan, and type coercion is based on the first row in > the relation, and all subsequent values are cast into this type. The latter > violates the principle of least surprise. > I would like to rewrite this into a union of projects; each of these projects > would contain a single table row. We apply better type coercion rules to a > union, and we should be able to rewrite this into a local relation during > optimization.
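The difference between the two coercion strategies described above can be sketched in plain Python (illustrative only; Spark's actual type-coercion rules are far richer than this three-type widening chain):

```python
def first_row_coercion(rows):
    """Old behaviour: every later value is cast to the first row's type,
    which can silently truncate (the 'least surprise' violation)."""
    target = [type(v) for v in rows[0]]
    return [[t(v) for t, v in zip(target, row)] for row in rows]

WIDENING = [int, float, str]  # simplified widening chain for the sketch

def common_type(a, b):
    return WIDENING[max(WIDENING.index(a), WIDENING.index(b))]

def union_coercion(rows):
    """Union-style behaviour: widen each column to the least common type
    across all rows, then cast every value to it."""
    target = [type(v) for v in rows[0]]
    for row in rows[1:]:
        target = [common_type(t, type(v)) for t, v in zip(target, row)]
    return [[t(v) for t, v in zip(target, row)] for row in rows]

rows = [[1, 'A'], [2.5, 'B']]
first_row_coercion(rows)  # truncates 2.5 -> 2, because row one had an int
union_coercion(rows)      # widens the column to float: 1 -> 1.0, 2.5 kept
```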
[jira] [Updated] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-17096: -- Description: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} was: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace. 
> For example, if there is a / by zero exception in a task, the > QueryTerminated.stackTrace will have > {code} > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) > {code}
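In Python terms, the proposed fix amounts to reporting the root cause rather than the wrapper; a sketch of the idea (not the actual Scala change in the PR):

```python
import traceback

def root_cause(exc):
    """Follow the __cause__ chain to the original exception, analogous to
    unwrapping StreamingQueryException to reach the real task failure."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

try:
    try:
        1 / 0  # the actual failure inside a "task"
    except ZeroDivisionError as e:
        raise RuntimeError("Query terminated with exception") from e
except RuntimeError as wrapper:
    cause = root_cause(wrapper)
    # The useful report is the cause's type, message and stack trace --
    # not the wrapper's, which only shows the driver-side run loop.
    report = "".join(
        traceback.format_exception(type(cause), cause, cause.__traceback__))
```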
[jira] [Updated] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-17096: -- Description: Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. For example, if there is a / by zero exception in a task, the QueryTerminated.stackTrace will have {code} org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) {code} was:Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace. > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace. > For example, if there is a / by zero exception in a task, the > QueryTerminated.stackTrace will have > {code} > > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) > > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) > {code}
[jira] [Assigned] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17096: Assignee: Apache Spark (was: Tathagata Das) > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Commented] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423650#comment-15423650 ] Apache Spark commented on SPARK-17096: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/14675 > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Assigned] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
[ https://issues.apache.org/jira/browse/SPARK-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17096: Assignee: Tathagata Das (was: Apache Spark) > Fix StreamingQueryListener to return message and stacktrace of actual > exception > --- > > Key: SPARK-17096 > URL: https://issues.apache.org/jira/browse/SPARK-17096 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > Currently only the stacktrace of StreamingQueryException is returned through > StreamingQueryListener, which is useless as it hides the actual exception's > stacktrace.
[jira] [Commented] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
[ https://issues.apache.org/jira/browse/SPARK-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423634#comment-15423634 ] Josh Rosen commented on SPARK-17098: Actually, given the error here I think that the problem could be that sometimes {{WindowExpression.foldable == true}} even though {{WindowExpression}} is {{Unevaluable}}: {code} case class WindowExpression( windowFunction: Expression, windowSpec: WindowSpecDefinition) extends Expression with Unevaluable { override def children: Seq[Expression] = windowFunction :: windowSpec :: Nil override def dataType: DataType = windowFunction.dataType override def foldable: Boolean = windowFunction.foldable override def nullable: Boolean = windowFunction.nullable override def toString: String = s"$windowFunction $windowSpec" override def sql: String = windowFunction.sql + " OVER " + windowSpec.sql } {code} /cc [~hvanhovell], FYI. > "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during > analysis > - > > Key: SPARK-17098 > URL: https://issues.apache.org/jira/browse/SPARK-17098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen > > Running > {code} > SELECT COUNT(NULL) OVER () > {code} > throws an UnsupportedOperationException during analysis: > {code} > java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 > as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING) > at > org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) > at > org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNod
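The interaction Josh Rosen describes can be reduced to a small, self-contained sketch. The `Expr`, `Lit`, `WindowLike`, and `ConstantFold` names below are illustrative stand-ins, not Spark's actual Catalyst classes: a node that delegates `foldable` to its child while being unevaluable itself will crash any folding pass that trusts the flag.

```scala
// Minimal model of the bug: an unevaluable node that still reports foldable = true.
sealed trait Expr {
  def foldable: Boolean
  def eval(): Any
}

case class Lit(value: Any) extends Expr {
  def foldable: Boolean = true
  def eval(): Any = value
}

// Stand-in for WindowExpression: delegates foldable to its child (the bug),
// but throws when evaluated, just like an Unevaluable expression.
case class WindowLike(child: Expr) extends Expr {
  def foldable: Boolean = child.foldable
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
}

// A constant-folding pass that trusts `foldable` and calls eval() -- mirroring
// how ConstantFolding ends up calling WindowExpression.eval in the stack trace.
object ConstantFold {
  def apply(e: Expr): Expr = if (e.foldable) Lit(e.eval()) else e
}
```

With the delegation in place, `ConstantFold(WindowLike(Lit(0L)))` throws `UnsupportedOperationException`. Forcing `foldable` to `false` on the window-like node is one plausible direction for a fix (not necessarily the one eventually committed), since the pass would then leave the node alone.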
[jira] [Updated] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
[ https://issues.apache.org/jira/browse/SPARK-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17098: --- Description: Running {code} SELECT COUNT(NULL) OVER () {code} throws an UnsupportedOperationException during analysis: {code} java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) at org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren
[jira] [Created] (SPARK-17098) "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
Josh Rosen created SPARK-17098: -- Summary: "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis Key: SPARK-17098 URL: https://issues.apache.org/jira/browse/SPARK-17098 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Josh Rosen Running {code} SELECT COUNT(NULL) OVER () {code} throws an UnsupportedOperationException during analysis: {code} java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:221) at org.apache.spark.sql.catalyst.expressions.WindowExpression.eval(windowExpressions.scala:288) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:759) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18$$anonfun$applyOrElse$3.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:752) at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$18.applyOrElse(Optimizer.scala:751) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) at org.apache.spark.sql.catalyst.trees.TreeN
[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423602#comment-15423602 ] Jakob Odersky commented on SPARK-17095: --- Since this bug also occurs when there are no opening braces (}}} anywhere in the doc is sufficient), I think this is an issue with scaladoc itself. I would recommend creating a bug report on the Scala tracker https://issues.scala-lang.org/secure/Dashboard.jspa. Ideally, code blocks could be delimited with an arbitrary number of opening symbols followed by an arbitrary number of closing symbols (e.g. you could use four braces to delimit code that itself contains three braces, such as "}}}"). > Latex and Scala doc do not play nicely > -- > > Key: SPARK-17095 > URL: https://issues.apache.org/jira/browse/SPARK-17095 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Seth Hendrickson >Priority: Minor > Labels: starter > > In Latex, it is common to find "}}}" when closing several expressions at > once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added > Mathjax to render Latex equations in scaladoc. However, when scala doc sees > "}}}" or "{{{" it treats it as a special character for code block. This > results in some very strange output. > A poor workaround is to use "}}\,}" in latex which inserts a small > whitespace. This is not ideal, and we can hopefully find a better solution.
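A concrete illustration of the clash and the workaround quoted in the issue; the method itself is a hypothetical example, and only the "}}\,}" trick comes from the report.

```scala
object LatexDocExample {
  /** Computes \(e^{x^{2}}\).
    *
    * The natural LaTeX closing sequence for a nested exponent ends in
    * "}}}", which scaladoc parses as a code-block delimiter rather than
    * as LaTeX. The workaround from the issue writes "}}\,}" instead,
    * inserting a small LaTeX whitespace before the final brace so the
    * three-brace run never appears literally in the comment.
    */
  def expSquare(x: Double): Double = math.exp(x * x)
}
```

Note that the problem is purely lexical: scaladoc scans the raw comment text for `{{{`/`}}}` before MathJax ever sees it, which is why the thin-space hack works at all.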
[jira] [Updated] (SPARK-16700) StructType doesn't accept Python dicts anymore
[ https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-16700: --- Labels: releasenotes (was: ) > StructType doesn't accept Python dicts anymore > -- > > Key: SPARK-16700 > URL: https://issues.apache.org/jira/browse/SPARK-16700 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Davies Liu > Labels: releasenotes > Fix For: 2.1.0 > > > Hello, > I found this issue while testing my codebase with 2.0.0-rc5 > StructType in Spark 1.6.2 accepts the Python {{dict}} type, which is very > handy. 2.0.0-rc5 does not and throws an error. > I don't know if this was intended but I'd advocate for this behaviour to > remain the same. MapType is probably wasteful when your key names never > change and switching to Python tuples would be cumbersome. > Here is a minimal script to reproduce the issue: > {code} > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > struct_schema = SparkTypes.StructType([ > SparkTypes.StructField("id", SparkTypes.LongType()) > ]) > rdd = sc.parallelize([{"id": 0}, {"id": 1}]) > df = sqlc.createDataFrame(rdd, struct_schema) > print df.collect() > # 1.6.2 prints [Row(id=0), Row(id=1)] > # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in > type > {code} > Thanks!
[jira] [Created] (SPARK-17097) Pregel does not keep vertex state properly; fails to terminate
Seth Bromberger created SPARK-17097: --- Summary: Pregel does not keep vertex state properly; fails to terminate Key: SPARK-17097 URL: https://issues.apache.org/jira/browse/SPARK-17097 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.6.0 Environment: Scala 2.10.5, Spark 1.6.0 with GraphX and Pregel Reporter: Seth Bromberger Consider the following minimum example: {code:title=PregelBug.scala|borderStyle=solid} package testGraph import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.graphx.{Edge, EdgeTriplet, Graph, _} object PregelBug { def main(args: Array[String]) = { //FIXME breaks if TestVertex is a case class; works if not case class case class TestVertex(inId: VertexId, inData: String, inLabels: collection.mutable.HashSet[String]) extends Serializable { val id = inId val value = inData val labels = inLabels } class TestLink(inSrc: VertexId, inDst: VertexId, inData: String) extends Serializable { val src = inSrc val dst = inDst val data = inData } val startString = "XXXSTARTXXX" val conf = new SparkConf().setAppName("pregeltest").setMaster("local[*]") val sc = new SparkContext(conf) val vertexes = Vector( new TestVertex(0, "label0", collection.mutable.HashSet[String]()), new TestVertex(1, "label1", collection.mutable.HashSet[String]()) ) val links = Vector( new TestLink(0, 1, "linkData01") ) val vertexes_packaged = vertexes.map(v => (v.id, v)) val links_packaged = links.map(e => Edge(e.src, e.dst, e)) val graph = Graph[TestVertex, TestLink](sc.parallelize(vertexes_packaged), sc.parallelize(links_packaged)) def vertexProgram (vertexId: VertexId, vdata: TestVertex, message: Vector[String]): TestVertex = { message.foreach { case `startString` => if (vdata.id == 0L) vdata.labels.add(vdata.value) case m => if (!vdata.labels.contains(m)) vdata.labels.add(m) } new TestVertex(vdata.id, vdata.value, vdata.labels) } def sendMessage (triplet: EdgeTriplet[TestVertex, TestLink]): Iterator[(VertexId, Vector[String])] = { val srcLabels 
= triplet.srcAttr.labels val dstLabels = triplet.dstAttr.labels val msgsSrcDst = srcLabels.diff(dstLabels) .map(label => (triplet.dstAttr.id, Vector[String](label))) val msgsDstSrc = dstLabels.diff(srcLabels) .map(label => (triplet.srcAttr.id, Vector[String](label))) msgsSrcDst.toIterator ++ msgsDstSrc.toIterator } def mergeMessage (m1: Vector[String], m2: Vector[String]): Vector[String] = m1.union(m2).distinct val g = graph.pregel(Vector[String](startString))(vertexProgram, sendMessage, mergeMessage) println("---pregel done---") println("vertex info:") g.vertices.foreach( v => { val labels = v._2.labels println( "vertex " + v._1 + ": name = " + v._2.id + ", labels = " + labels) } ) } } {code} This code never terminates even though we expect it to. To fix, we simply remove the "case" designation for the TestVertex class (see FIXME comment), and then it behaves as expected. (Apologies if this has been fixed in later versions; we're unfortunately pegged to 2.10.5 / 1.6.0 for now.)
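A standalone sketch (not GraphX itself) of why the "case" keyword in the report's FIXME matters: a case class gets structural equality, so two vertex snapshots that share one in-place-mutated `HashSet` always compare equal, which can defeat change-detection logic such as Pregel's; a plain class keeps reference equality. This only demonstrates the equality difference, not the full nontermination mechanism.

```scala
import scala.collection.mutable

// Structural equality: two snapshots sharing the same mutated set compare equal.
case class CaseVertex(id: Long, labels: mutable.HashSet[String])

// Reference equality: distinct instances never compare equal.
class PlainVertex(val id: Long, val labels: mutable.HashSet[String])

object EqualityDemo {
  def demo(): (Boolean, Boolean) = {
    val shared = mutable.HashSet[String]()
    val before = CaseVertex(0L, shared)
    shared += "label0"                     // mutate in place, as vertexProgram does
    val after = CaseVertex(0L, shared)

    val p1 = new PlainVertex(0L, shared)
    val p2 = new PlainVertex(0L, shared)
    (before == after, p1 == p2)            // (true, false)
  }
}
```

The `CaseVertex` comparison returns `true` even though a label was added between the two snapshots, so any framework that uses `==` to ask "did this vertex change?" sees no change.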
[jira] [Assigned] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17002: Assignee: Apache Spark > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt >Assignee: Apache Spark > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
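The fail-fast behavior the report asks for can be sketched as follows. `settings` stands in for `SparkConf`, and the message text is illustrative; only the property names (`spark.ssl.enabled`, `spark.ssl.protocol`) come from the issue.

```scala
object SslConfigCheck {
  // Validate the SSL settings up front instead of letting SSLContext.init
  // fail later with an opaque KeyManagementException.
  def requiredProtocol(settings: Map[String, String]): Option[String] = {
    val sslEnabled = settings.get("spark.ssl.enabled").contains("true")
    val protocol = settings.get("spark.ssl.protocol")
    if (sslEnabled && protocol.isEmpty) {
      throw new IllegalArgumentException(
        "spark.ssl.enabled is true but spark.ssl.protocol is not set; " +
        "configure a JSSE protocol name (e.g. TLSv1.2) before starting the master")
    }
    protocol
  }
}
```

The point of the check is to surface the missing property by name at startup, rather than inside `SecurityManager` where the "Default SSLContext is initialized automatically" error gives no hint of the cause.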
[jira] [Commented] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423564#comment-15423564 ] Apache Spark commented on SPARK-17002: -- User 'wangmiao1981' has created a pull request for this issue: https://github.com/apache/spark/pull/14674 > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
[jira] [Assigned] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17002: Assignee: (was: Apache Spark) > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423559#comment-15423559 ] Marcelo Vanzin commented on SPARK-16725: Don't get me wrong, I feel your pain, and I really hope Hadoop 3.x will fix this mess. But upgrading the version of Guava in Spark doesn't really solve *the* problem. It might solve your specific problem, but then there are ways of solving your problem that do not involve changing Spark, too. > Migrate Guava to 16+? > - > > Key: SPARK-16725 > URL: https://issues.apache.org/jira/browse/SPARK-16725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.0.1 >Reporter: Min Wei >Priority: Minor > Original Estimate: 12h > Remaining Estimate: 12h > > Currently Spark depends on an old version of Guava, version 14. However > Spark-cassandra driver asserts on Guava version 16 and above. > It would be great to update the Guava dependency to version 16+ > diff --git a/core/src/main/scala/org/apache/spark/SecurityManager.scala > b/core/src/main/scala/org/apache/spark/SecurityManager.scala > index f72c7de..abddafe 100644 > --- a/core/src/main/scala/org/apache/spark/SecurityManager.scala > +++ b/core/src/main/scala/org/apache/spark/SecurityManager.scala > @@ -23,7 +23,7 @@ import java.security.{KeyStore, SecureRandom} > import java.security.cert.X509Certificate > import javax.net.ssl._ > > -import com.google.common.hash.HashCodes > +import com.google.common.hash.HashCode > import com.google.common.io.Files > import org.apache.hadoop.io.Text > > @@ -432,7 +432,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) > val secret = new Array[Byte](length) > rnd.nextBytes(secret) > > -val cookie = HashCodes.fromBytes(secret).toString() > +val cookie = HashCode.fromBytes(secret).toString() > SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, > cookie) > cookie >} else { > diff --git 
a/core/src/main/scala/org/apache/spark/SparkEnv.scala > b/core/src/main/scala/org/apache/spark/SparkEnv.scala > index af50a6d..02545ae 100644 > --- a/core/src/main/scala/org/apache/spark/SparkEnv.scala > +++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala > @@ -72,7 +72,7 @@ class SparkEnv ( > >// A general, soft-reference map for metadata needed during HadoopRDD > split computation >// (e.g., HadoopFileRDD uses this to cache JobConfs and InputFormats). > - private[spark] val hadoopJobMetadata = new > MapMaker().softValues().makeMap[String, Any]() > + private[spark] val hadoopJobMetadata = new > MapMaker().weakValues().makeMap[String, Any]() > >private[spark] var driverTmpDir: Option[String] = None > > diff --git a/pom.xml b/pom.xml > index d064cb5..7c3e036 100644 > --- a/pom.xml > +++ b/pom.xml > @@ -368,8 +368,7 @@ > > com.google.guava > guava > -14.0.1 > -provided > +19.0 > > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423555#comment-15423555 ] Russell Spitzer commented on SPARK-16725: - I'm well aware, as we've been dealing with this since 1.0; that's why we began the process of shading Guava for Hadoop-based builds. Now, though, we are stuck doing it for all builds :(
[jira] [Created] (SPARK-17096) Fix StreamingQueryListener to return message and stacktrace of actual exception
Tathagata Das created SPARK-17096: - Summary: Fix StreamingQueryListener to return message and stacktrace of actual exception Key: SPARK-17096 URL: https://issues.apache.org/jira/browse/SPARK-17096 Project: Spark Issue Type: Sub-task Components: SQL, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Minor Currently only the stacktrace of StreamingQueryException is returned through StreamingQueryListener, which is useless as it hides the actual exception's stacktrace.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423544#comment-15423544 ] Marcelo Vanzin commented on SPARK-16725: Welcome to dependency hell. See my comment above for why this issue is not new and why you could have it even with Spark 1.x.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423542#comment-15423542 ] Marcelo Vanzin commented on SPARK-16725: Spark's use of Guava is shaded. That comment refers to Hadoop's use of Guava. If you download the 1.6.x builds of Spark that say "without Hadoop" and add the Hadoop libraries from your distro to Spark, you'll have the same issue, since those will include Hadoop's Guava dependency. This dependency hell is why shading is the way to go.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423539#comment-15423539 ] Russell Spitzer commented on SPARK-16725: - In our case it's exposing a library which exposes the shaded code. I.e., we include the Cassandra Java Driver, which publicly exposes Guava in some places. So those access points are necessarily broken, but it's not something we can directly control.
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423534#comment-15423534 ] Brian Hess commented on SPARK-16725: - But it looks like it is not being shaded in 2.0: https://github.com/apache/spark/blob/branch-2.0/assembly/pom.xml#L79-L89 From the comment there: "Because we don't shade dependencies anymore..."
[jira] [Commented] (SPARK-17095) Latex and Scala doc do not play nicely
[ https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423514#comment-15423514 ] Seth Hendrickson commented on SPARK-17095: -- cc [~lins05] [~srowen] [~jodersky]
[jira] [Created] (SPARK-17095) Latex and Scala doc do not play nicely
Seth Hendrickson created SPARK-17095: Summary: Latex and Scala doc do not play nicely Key: SPARK-17095 URL: https://issues.apache.org/jira/browse/SPARK-17095 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Seth Hendrickson Priority: Minor In LaTeX, it is common to find "}}}" when closing several expressions at once. [SPARK-16822|https://issues.apache.org/jira/browse/SPARK-16822] added MathJax to render LaTeX equations in scaladoc. However, when scaladoc sees "}}}" or "{{{", it treats it as a special token for a code block. This results in some very strange output. A poor workaround is to use "}}\,}" in LaTeX, which inserts a small whitespace. This is not ideal, and we can hopefully find a better solution.
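The collision is easy to see at the string level: any LaTeX that closes three groups back-to-back contains exactly the token scaladoc uses to close a code block. A small sketch (the formula and names are illustrative, not from the issue):

```scala
// Illustration: LaTeX closing three groups emits "}}}", which is also
// scaladoc's code-block closing delimiter.
object LatexScaladocClash {
  // Gaussian exponent: three groups close in a row at the end.
  val latex = """e^{-\frac{(x-\mu)^2}{2\sigma^{2}}}"""

  val scaladocCodeClose = "}}}"

  // True when scaladoc would misread part of the formula as a code fence.
  def clashes(s: String): Boolean = s.contains(scaladocCodeClose)

  // The workaround from the issue: insert a thin space between braces.
  val workaround = latex.replace("}}}", """}}\,}""")
}
```

Here `clashes(latex)` is true, while `clashes(workaround)` is false, which is why the `}}\,}` trick avoids the mis-rendering at the cost of a stray thin space in the output.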
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423505#comment-15423505 ] Marcelo Vanzin commented on SPARK-16725: Exposing 3rd-party libraries in an API should be considered a bug, unless there's really no way around it (e.g. Spark needs to expose parts of the Hadoop API). Spark 1.x did that with Guava, but that went out before it could be fixed, so the shading in 1.x was not complete. Spark 2.x fixes that (there's no more Guava anything in the public API).
[jira] [Commented] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423498#comment-15423498 ] Russell Spitzer commented on SPARK-16725: - I think *But it works* is a bit of an overstatement. It "works" when those shaded libraries are never exposed through a public API, but it is basically broken whenever they are.
[jira] [Created] (SPARK-17094) provide simplified API for ML pipeline
yuhao yang created SPARK-17094: -- Summary: provide simplified API for ML pipeline Key: SPARK-17094 URL: https://issues.apache.org/jira/browse/SPARK-17094 Project: Spark Issue Type: New Feature Components: ML Reporter: yuhao yang Many machine learning libraries have an API for easily assembling transformers. One example would be: val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data). Feedback and suggestions are appreciated.
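The proposed call shape can be mocked outside Spark ML. Note this is a hypothetical sketch of the suggested string-based constructor: Spark's real Pipeline takes PipelineStage instances via setStages, and the stage names and registry below are invented for illustration.

```scala
// Hypothetical sketch of the proposed string-based Pipeline API (not Spark's real API).
object PipelineSketch {
  // A "stage" here is just a Seq[String] => Seq[String] transform.
  type Stage = Seq[String] => Seq[String]

  // Toy registry mapping assumed stage names to transforms.
  val registry: Map[String, Stage] = Map(
    "tokenizer" -> (docs => docs.flatMap(_.split("\\s+"))),
    "lowercase" -> (tokens => tokens.map(_.toLowerCase))
  )

  class Pipeline(stageNames: String*) {
    private val stages = stageNames.map(registry)
    // "fit" here simply runs the transforms in declaration order.
    def fit(data: Seq[String]): Seq[String] =
      stages.foldLeft(data)((d, s) => s(d))
  }
}
```

With this mock, `new PipelineSketch.Pipeline("tokenizer", "lowercase").fit(Seq("Hello World"))` yields `Seq("hello", "world")`, matching the one-liner usage the issue proposes.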
[jira] [Created] (SPARK-17093) Roundtrip encoding of array-of-struct fields is wrong when whole-stage codegen is disabled
Josh Rosen created SPARK-17093: -- Summary: Roundtrip encoding of array-of-struct fields is wrong when whole-stage codegen is disabled Key: SPARK-17093 URL: https://issues.apache.org/jira/browse/SPARK-17093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Josh Rosen Priority: Critical The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled: {code} withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { val data = Array(Array((1, 2), (3, 4))) val ds = spark.sparkContext.parallelize(data).toDS() assert(ds.collect() === data) } {code} When whole-stage codegen is enabled (the default), this works fine. When it's disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
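The suspected missing copy() can be reproduced in plain Scala, without Spark: if a producer reuses one mutable row object and a consumer stores references without copying, every stored element aliases the last value written. This is a sketch of the failure mode the reporter hypothesizes, not Spark's actual codepath.

```scala
// Sketch: why a missing copy() makes the last element repeat (hypothetical, not Spark code).
object CopyBugSketch {
  final class MutableRow(var a: Int, var b: Int) {
    def copy(): MutableRow = new MutableRow(a, b)
  }

  // A producer that reuses a single MutableRow for every element,
  // like a row iterator reusing its internal buffer.
  def produce(data: Seq[(Int, Int)]): Iterator[MutableRow] = {
    val row = new MutableRow(0, 0)
    data.iterator.map { case (a, b) => row.a = a; row.b = b; row }
  }

  val data = Seq((1, 2), (3, 4))

  // Forgetting copy(): both stored references alias the reused row,
  // so the collected result repeats the final value, (3,4).
  val wrong = produce(data).toArray.map(r => (r.a, r.b)).toSeq

  // A defensive copy() captures each element before the next mutation.
  val right = produce(data).map(_.copy()).toArray.map(r => (r.a, r.b)).toSeq
}
```

Here `wrong` is `Seq((3,4), (3,4))` while `right` is `Seq((1,2), (3,4))`, mirroring the shape of the wrong answer reported in the issue.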
[jira] [Commented] (SPARK-15083) History Server would OOM due to unlimited TaskUIData in some stages
[ https://issues.apache.org/jira/browse/SPARK-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423393#comment-15423393 ] Apache Spark commented on SPARK-15083: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/14673 > History Server would OOM due to unlimited TaskUIData in some stages > --- > > Key: SPARK-15083 > URL: https://issues.apache.org/jira/browse/SPARK-15083 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.5.2, 1.6.0, 2.0.0 >Reporter: Zheng Tan > Attachments: Screen Shot 2016-05-01 at 3.50.02 PM.png, Screen Shot > 2016-05-01 at 3.51.01 PM.png, Screen Shot 2016-05-01 at 3.51.59 PM.png, > Screen Shot 2016-05-01 at 3.55.30 PM.png > > > The History Server will load all tasks in a stage, which can cause a memory leak if the tasks occupy too much memory. > In the following example, a single application consumes 1.1 GB of History Server memory. > I think we should limit task memory usage by adding spark.ui.retainedTasks.
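If the proposed cap lands, it would presumably be set like Spark's other UI retention limits. The property name below is the one proposed in the issue description; the value and exact semantics are assumptions until the patch is merged:

```
# spark-defaults.conf (hypothetical until the patch is merged)
# Cap the number of TaskUIData entries retained per stage in the UI/History Server.
spark.ui.retainedTasks  100000
```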
[jira] [Created] (SPARK-17092) DataFrame with large number of columns causing code generation error
Aris V created SPARK-17092: -- Summary: DataFrame with large number of columns causing code generation error Key: SPARK-17092 URL: https://issues.apache.org/jira/browse/SPARK-17092 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: On vanilla Spark hadoop 2.7 Scala 2.11 in Linux CentOS, cluster with 9 slaves. Amazon AWS node size m3.2xlarge. Reporter: Aris V On vanilla Spark hadoop 2.7 Scala 2.11: When I use randomSplit on a DataFrame with several hundred columns, I get Janino code generation errors. The lowest number of columns that triggers the bug is around 500 or fewer. The error message:

```
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
```

Here is a small code sample which causes it in spark-shell:

```
import org.apache.spark.sql.types.{DoubleType, StructType}
import org.apache.spark.sql.{Row, SparkSession}

val COLMAX: Double = 500.0
val ROWSIZE: Int = 1000
val intToRow: Int => Row = (i: Int) => Row.fromSeq(Range.Double.inclusive(1.0, COLMAX, 1.0).toSeq)
val schema: StructType = (1 to COLMAX.toInt).foldLeft(new StructType())((s, i) => s.add(i.toString, DoubleType, nullable = true))
val rdds = spark.sparkContext.parallelize((1 to ROWSIZE).map(intToRow))
val df = spark.createDataFrame(rdds, schema)
val Array(left, right) = df.randomSplit(Array(.8, .2))
// This crashes
left.count
```
[jira] [Commented] (SPARK-17034) Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression
[ https://issues.apache.org/jira/browse/SPARK-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423376#comment-15423376 ] Apache Spark commented on SPARK-17034: -- User 'petermaxlee' has created a pull request for this issue: https://github.com/apache/spark/pull/14672 > Ordinal in ORDER BY or GROUP BY should be treated as an unresolved expression > - > > Key: SPARK-17034 > URL: https://issues.apache.org/jira/browse/SPARK-17034 > Project: Spark > Issue Type: Bug >Reporter: Sean Zhong >Assignee: Sean Zhong > Fix For: 2.1.0 > > > Ordinals in GROUP BY or ORDER BY, like "1" in "order by 1" or "group by 1", should be considered unresolved before analysis. But the current code uses a "Literal" expression to store the ordinal. This is inappropriate, as "Literal" itself is a resolved expression; it gives the user the wrong impression that the ordinal has already been resolved.
[jira] [Updated] (SPARK-16656) CreateTableAsSelectSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-16656: - Fix Version/s: 1.6.3 > CreateTableAsSelectSuite is flaky > - > > Key: SPARK-16656 > URL: https://issues.apache.org/jira/browse/SPARK-16656 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > Fix For: 1.6.3, 2.0.1, 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62593/testReport/junit/org.apache.spark.sql.sources/CreateTableAsSelectSuite/create_a_table__drop_it_and_create_another_one_with_the_same_name/
[jira] [Resolved] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17089. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Remove link of api doc for mapReduceTriplets because its removed from api. > --- > > Key: SPARK-17089 > URL: https://issues.apache.org/jira/browse/SPARK-17089 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.0.0 >Reporter: sandeep purohit >Assignee: sandeep purohit >Priority: Trivial > Fix For: 2.0.1, 2.1.0 > > > Remove the link to the API doc for mapReduceTriplets because it has been removed > from the API: when users are redirected to the latest API doc they cannot find > any description for mapReduceTriplets
[jira] [Updated] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17089: Assignee: sandeep purohit
[jira] [Commented] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423245#comment-15423245 ] Apache Spark commented on SPARK-17091: -- User 'andreweduffy' has created a pull request for this issue: https://github.com/apache/spark/pull/14671 > ParquetFilters rewrite IN to OR of Eq > - > > Key: SPARK-17091 > URL: https://issues.apache.org/jira/browse/SPARK-17091 > Project: Spark > Issue Type: Bug >Reporter: Andrew Duffy > > Past attempts at pushing down the InSet operation for Parquet relied on > user-defined predicates. It would be simpler to rewrite an IN clause into the > corresponding OR union of a set of equality conditions.
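The rewrite proposed in this issue is mechanical: an IN over a value set becomes a chain of OR'ed equality predicates. A hedged sketch of the idea, using plain tuples as a stand-in for filter predicates (the tuple encoding is an assumption for illustration, not the ParquetFilters or parquet-mr API):

```python
from functools import reduce

def rewrite_in_to_or(column, values):
    """Rewrite `column IN (v1, v2, ...)` as
    `(column = v1) OR (column = v2) OR ...` (left-nested)."""
    if not values:
        raise ValueError("IN over an empty value set has no equality rewrite")
    equalities = [("eq", column, v) for v in values]
    return reduce(lambda left, right: ("or", left, right), equalities)

# x IN (1, 2, 3) becomes ((x = 1 OR x = 2) OR x = 3)
assert rewrite_in_to_or("x", [1, 2, 3]) == \
    ("or", ("or", ("eq", "x", 1), ("eq", "x", 2)), ("eq", "x", 3))
```

A singleton IN degenerates to a single equality, which is why this avoids the user-defined-predicate machinery entirely.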
[jira] [Assigned] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17091: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17091: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-16654) UI Should show blacklisted executors & nodes
[ https://issues.apache.org/jira/browse/SPARK-16654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423240#comment-15423240 ] Alex Bozarth commented on SPARK-16654: -- I don't have time right now to tackle this so go right ahead. And the other part of my comment was an implementation suggestion. We currently have a "status" column that lists either Alive or Dead. I'm suggesting that when shown, blacklisted nodes are listed as Blacklisted or Alive (Blacklisted) in the status column; this would keep the UI change minimal for the user even though it'll take a good chunk of code to make it work behind the scenes. > UI Should show blacklisted executors & nodes > > > Key: SPARK-16654 > URL: https://issues.apache.org/jira/browse/SPARK-16654 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Web UI >Affects Versions: 2.0.0 >Reporter: Imran Rashid > > SPARK-8425 will add the ability to blacklist entire executors and nodes to > deal w/ faulty hardware. However, without displaying it on the UI, it can be > hard to realize which executor is bad, and why tasks aren't getting scheduled > on certain executors. > As a first step, we should just show nodes and executors that are blacklisted > for the entire application (no need to show blacklisting for tasks & stages). > This should also ensure that blacklisting events get into the event logs for > the history server.
[jira] [Commented] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423235#comment-15423235 ] Apache Spark commented on SPARK-15285: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/14670 > Generated SpecificSafeProjection.apply method grows beyond 64 KB > > > Key: SPARK-15285 > URL: https://issues.apache.org/jira/browse/SPARK-15285 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1, 2.0.0 >Reporter: Konstantin Shaposhnikov >Assignee: Kazuaki Ishizaki > Fix For: 2.0.0 > > > The following code snippet results in > {noformat} > org.codehaus.janino.JaninoRuntimeException: Code of method > "(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection" > grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941) > {noformat} > {code} > case class S100(s1:String="1", s2:String="2", s3:String="3", s4:String="4", > s5:String="5", s6:String="6", s7:String="7", s8:String="8", s9:String="9", > s10:String="10", s11:String="11", s12:String="12", s13:String="13", > s14:String="14", s15:String="15", s16:String="16", s17:String="17", > s18:String="18", s19:String="19", s20:String="20", s21:String="21", > s22:String="22", s23:String="23", s24:String="24", s25:String="25", > s26:String="26", s27:String="27", s28:String="28", s29:String="29", > s30:String="30", s31:String="31", s32:String="32", s33:String="33", > s34:String="34", s35:String="35", s36:String="36", s37:String="37", > s38:String="38", s39:String="39", s40:String="40", s41:String="41", > s42:String="42", s43:String="43", s44:String="44", s45:String="45", > s46:String="46", s47:String="47", s48:String="48", s49:String="49", > s50:String="50", s51:String="51", s52:String="52", s53:String="53", > s54:String="54", s55:String="55", s56:String="56", s57:String="57", > s58:String="58", 
s59:String="59", s60:String="60", s61:String="61", > s62:String="62", s63:String="63", s64:String="64", s65:String="65", > s66:String="66", s67:String="67", s68:String="68", s69:String="69", > s70:String="70", s71:String="71", s72:String="72", s73:String="73", > s74:String="74", s75:String="75", s76:String="76", s77:String="77", > s78:String="78", s79:String="79", s80:String="80", s81:String="81", > s82:String="82", s83:String="83", s84:String="84", s85:String="85", > s86:String="86", s87:String="87", s88:String="88", s89:String="89", > s90:String="90", s91:String="91", s92:String="92", s93:String="93", > s94:String="94", s95:String="95", s96:String="96", s97:String="97", > s98:String="98", s99:String="99", s100:String="100") > case class S(s1: S100=S100(), s2: S100=S100(), s3: S100=S100(), s4: > S100=S100(), s5: S100=S100(), s6: S100=S100(), s7: S100=S100(), s8: > S100=S100(), s9: S100=S100(), s10: S100=S100()) > val ds = Seq(S(),S(),S()).toDS > ds.show() > {code} > I could reproduce this with Spark built from 1.6 branch and with > https://home.apache.org/~pwendell/spark-nightly/spark-master-bin/spark-2.0.0-SNAPSHOT-2016_05_11_01_03-8beae59-bin/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
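The usual remedy for this class of failure, applied in several Spark codegen fixes, is to split one huge generated method into many small helpers, each safely under the JVM's 64 KB method-bytecode limit. A hedged, self-contained sketch of the chunking idea — the chunk size, method names, and emitted Java snippets are illustrative, not Spark's actual code generator:

```python
def split_into_methods(statements, max_per_method=100):
    """Emit one apply() that delegates to helper methods, each holding at
    most max_per_method statements, instead of one oversized method body."""
    chunks = [statements[i:i + max_per_method]
              for i in range(0, len(statements), max_per_method)]
    methods = []
    for idx, chunk in enumerate(chunks):
        body = "\n".join("  " + s for s in chunk)
        methods.append(f"private void apply_{idx}(Object in) {{\n{body}\n}}")
    calls = "\n".join(f"  apply_{i}(in);" for i in range(len(chunks)))
    driver = f"public Object apply(Object in) {{\n{calls}\n  return out;\n}}"
    return [driver] + methods

# 250 field assignments become a small driver plus 3 helper methods,
# so no single method body grows without bound
generated = split_into_methods([f"out.set({i}, read({i}));" for i in range(250)])
assert len(generated) == 4
assert "apply_2(in);" in generated[0]
```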
[jira] [Assigned] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15285: Assignee: Apache Spark (was: Kazuaki Ishizaki)
[jira] [Assigned] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15285: Assignee: Kazuaki Ishizaki (was: Apache Spark)
[jira] [Updated] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-16519: -- Assignee: Felix Cheung > Handle SparkR RDD generics that create warnings in R CMD check > -- > > Key: SPARK-16519 > URL: https://issues.apache.org/jira/browse/SPARK-16519 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Felix Cheung > Fix For: 2.0.1, 2.1.0 > > > One of the warnings we get from R CMD check is that RDD implementations of > some of the generics are not documented. These generics are shared between > RDDs and DataFrames in SparkR. The list includes > {quote} > WARNING > Undocumented S4 methods: > generic 'cache' and siglist 'RDD' > generic 'collect' and siglist 'RDD' > generic 'count' and siglist 'RDD' > generic 'distinct' and siglist 'RDD' > generic 'first' and siglist 'RDD' > generic 'join' and siglist 'RDD,RDD' > generic 'length' and siglist 'RDD' > generic 'partitionBy' and siglist 'RDD' > generic 'persist' and siglist 'RDD,character' > generic 'repartition' and siglist 'RDD' > generic 'show' and siglist 'RDD' > generic 'take' and siglist 'RDD,numeric' > generic 'unpersist' and siglist 'RDD' > {quote} > As described in > https://stat.ethz.ch/pipermail/r-devel/2003-September/027490.html this looks > like a limitation of R where exporting a generic from a package also exports > all the implementations of that generic. > One way to get around this is to remove the RDD API or rename the methods in > Spark 2.1
[jira] [Resolved] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-16519. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull request 14626 [https://github.com/apache/spark/pull/14626]
[jira] [Created] (SPARK-17091) ParquetFilters rewrite IN to OR of Eq
Andrew Duffy created SPARK-17091: Summary: ParquetFilters rewrite IN to OR of Eq Key: SPARK-17091 URL: https://issues.apache.org/jira/browse/SPARK-17091 Project: Spark Issue Type: Bug Reporter: Andrew Duffy Past attempts at pushing down the InSet operation for Parquet relied on user-defined predicates. It would be simpler to rewrite an IN clause into the corresponding OR union of a set of equality conditions.
[jira] [Commented] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
[ https://issues.apache.org/jira/browse/SPARK-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423156#comment-15423156 ] Miao Wang commented on SPARK-17002: --- I think we can add a require() statement and remove the getOrElse part. Thus, in your case, it will throw a meaningful message. I can create a PR for this one. > Document that spark.ssl.protocol. is required for SSL > - > > Key: SPARK-17002 > URL: https://issues.apache.org/jira/browse/SPARK-17002 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Michael Gummelt > > cc [~jlewandowski] > I was trying to start the Spark master. When setting > {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get > this none-too-helpful error message: > {code} > 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mgummelt); users > with modify permissions: Set(mgummelt) > 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for > SSL connections. > Exception in thread "main" java.security.KeyManagementException: Default > SSLContext is initialized automatically > at > sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) > at javax.net.ssl.SSLContext.init(SSLContext.java:282) > at org.apache.spark.SecurityManager.(SecurityManager.scala:284) > at > org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) > at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) > at org.apache.spark.deploy.master.Master.main(Master.scala) > {code} > We should document that {{spark.ssl.protocol}} is required, and throw a more > helpful error message when it isn't set. 
In fact, we should remove the > `getOrElse` here: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, > since the following line fails when the protocol is set to "Default"
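The commenter's suggestion — fail fast with a clear message instead of falling through to a bogus default — looks like this in outline. This is a Python sketch of the validation logic only (the actual fix in SecurityManager would be a Scala require()), and the helper name is an assumption:

```python
def ssl_protocol(conf):
    """Return the configured SSL protocol, failing fast with a clear
    message instead of passing a bogus default on to SSLContext."""
    if conf.get("spark.ssl.enabled") != "true":
        return None
    protocol = conf.get("spark.ssl.protocol")
    if protocol is None:
        # fail here, with an actionable message, rather than deep inside
        # SSLContext.init with "Default SSLContext is initialized automatically"
        raise ValueError(
            "spark.ssl.protocol is required when spark.ssl.enabled is true")
    return protocol

assert ssl_protocol({"spark.ssl.enabled": "false"}) is None
assert ssl_protocol({"spark.ssl.enabled": "true",
                     "spark.ssl.protocol": "TLSv1.2"}) == "TLSv1.2"
```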
[jira] [Commented] (SPARK-17090) Make tree aggregation level in linear/logistic regression configurable
[ https://issues.apache.org/jira/browse/SPARK-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423151#comment-15423151 ] Seth Hendrickson commented on SPARK-17090: -- cc [~dbtsai] > Make tree aggregation level in linear/logistic regression configurable > -- > > Key: SPARK-17090 > URL: https://issues.apache.org/jira/browse/SPARK-17090 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Seth Hendrickson >Priority: Minor > > Linear/logistic regression use treeAggregate with default aggregation depth > for collecting coefficient gradient updates to the driver. For high > dimensional problems, this can cause OOM errors on the driver. We should make > it configurable, perhaps via an expert param, so that users can avoid this > problem if their data has many features.
[jira] [Created] (SPARK-17090) Make tree aggregation level in linear/logistic regression configurable
Seth Hendrickson created SPARK-17090: Summary: Make tree aggregation level in linear/logistic regression configurable Key: SPARK-17090 URL: https://issues.apache.org/jira/browse/SPARK-17090 Project: Spark Issue Type: Improvement Components: ML Reporter: Seth Hendrickson Priority: Minor Linear/logistic regression use treeAggregate with default aggregation depth for collecting coefficient gradient updates to the driver. For high dimensional problems, this can cause OOM errors on the driver. We should make it configurable, perhaps via an expert param, so that users can avoid this problem if their data has many features.
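The driver-OOM risk comes from how many partial results the driver must merge at once; treeAggregate's depth parameter bounds that fan-in by combining partials in intermediate rounds. A simplified single-process model of the idea — the fan-in formula below is an illustrative assumption, not Spark's distributed treeAggregate:

```python
import math

def reduce_chunk(chunk, comb_op):
    """Fold one chunk of partial results with the combine operator."""
    acc = chunk[0]
    for item in chunk[1:]:
        acc = comb_op(acc, item)
    return acc

def tree_aggregate(partials, comb_op, depth=2):
    """Merge per-partition partials in up to `depth` rounds, so no single
    step (in Spark: the driver) has to combine all of them at once."""
    while depth > 1 and len(partials) > 2:
        # shrink the number of partials by a bounded per-round fan-in
        fan_in = max(2, math.ceil(len(partials) ** (1.0 / depth)))
        partials = [reduce_chunk(partials[i:i + fan_in], comb_op)
                    for i in range(0, len(partials), fan_in)]
        depth -= 1
    return reduce_chunk(partials, comb_op)

# summing 1000 per-partition "gradients" in rounds gives the same result
# as a flat sum, but the final merge sees far fewer inputs
partials = list(range(1000))
assert tree_aggregate(partials, lambda a, b: a + b, depth=2) == sum(range(1000))
```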
[jira] [Commented] (SPARK-17054) SparkR can not run in yarn-cluster mode on mac os
[ https://issues.apache.org/jira/browse/SPARK-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423139#comment-15423139 ] Miao Wang commented on SPARK-17054: --- Maybe, I can try it out. > SparkR can not run in yarn-cluster mode on mac os > - > > Key: SPARK-17054 > URL: https://issues.apache.org/jira/browse/SPARK-17054 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Jeff Zhang > > This is because it downloads Spark to the wrong place. > {noformat} > Warning message: > 'sparkR.init' is deprecated. > Use 'sparkR.session' instead. > See help("Deprecated") > Spark not found in SPARK_HOME: . > To search in the cache directory. Installation will start if not found. > Mirror site not provided. > Looking for site suggested from apache website... > Preferred mirror site found: http://apache.mirror.cdnetworks.com/spark > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://apache.mirror.cdnetworks.com/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://apache.mirror.cdnetworks.com/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > To use backup site... > Downloading Spark spark-2.0.0 for Hadoop 2.7 from: > - > http://www-us.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz > Fetch failed from http://www-us.apache.org/dist/spark > open destfile '/home//Library/Caches/spark/spark-2.0.0-bin-hadoop2.7.tgz', > reason 'No such file or directory'> > Error in robust_download_tar(mirrorUrl, version, hadoopVersion, packageName, > : > Unable to download Spark spark-2.0.0 for Hadoop 2.7. Please check network > connection, Hadoop version, or provide other mirror sites. > Calls: sparkRSQL.init ... sparkR.session -> install.spark -> > robust_download_tar > In addition: Warning messages: > 1: 'sparkRSQL.init' is deprecated. > Use 'sparkR.session' instead.
> See help("Deprecated") > 2: In dir.create(localDir, recursive = TRUE) : > cannot create dir '/home//Library', reason 'Operation not supported' > Execution halted > {noformat}
[jira] [Commented] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423135#comment-15423135 ] Apache Spark commented on SPARK-17089: -- User 'phalodi' has created a pull request for this issue: https://github.com/apache/spark/pull/14669
[jira] [Assigned] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17089: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
[ https://issues.apache.org/jira/browse/SPARK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17089: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-17089) Remove link of api doc for mapReduceTriplets because its removed from api.
sandeep purohit created SPARK-17089: --- Summary: Remove link of api doc for mapReduceTriplets because its removed from api. Key: SPARK-17089 URL: https://issues.apache.org/jira/browse/SPARK-17089 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.0.0 Reporter: sandeep purohit Priority: Trivial Remove the link to the API doc for mapReduceTriplets because it has been removed from the API: when users are redirected to the latest API doc they cannot find any description for mapReduceTriplets
[jira] [Created] (SPARK-17088) IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false
Marcelo Vanzin created SPARK-17088: -- Summary: IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false Key: SPARK-17088 URL: https://issues.apache.org/jira/browse/SPARK-17088 Project: Spark Issue Type: Bug Components: SQL Reporter: Marcelo Vanzin Priority: Minor There's a bug in a very rare code path in {{IsolatedClientLoader}}: {code} case e: RuntimeException if e.getMessage.contains("hadoop") => // If the error message contains hadoop, it is probably because the hadoop // version cannot be resolved (e.g. it is a vendor specific version like // 2.0.0-cdh4.1.1). If it is the case, we will try just // "org.apache.hadoop:hadoop-client:2.4.0". "org.apache.hadoop:hadoop-client:2.4.0" // is used just because we used to hard code it as the hadoop artifact to download. logWarning(s"Failed to resolve Hadoop artifacts for the version ${hadoopVersion}. " + s"We will change the hadoop version from ${hadoopVersion} to 2.4.0 and try again. " + "Hadoop classes will not be shared between Spark and Hive metastore client. " + "It is recommended to set jars used by Hive metastore client through " + "spark.sql.hive.metastore.jars in the production environment.") sharesHadoopClasses = false {code} That's the rare part. But when {{sharesHadoopClasses}} is set to false, the instantiation of {{HiveClientImpl}} fails: {code} classLoader .loadClass(classOf[HiveClientImpl].getName) .getConstructors.head .newInstance(version, sparkConf, hadoopConf, config, classLoader, this) .asInstanceOf[HiveClient] {code} {{hadoopConf}} here is an instance of {{Configuration}} loaded by the main Spark class loader, but in this case {{HiveClientImpl}} expects an instance of {{Configuration}} loaded by the isolated class loader (yay class loaders are fun). 
So you get an error like this:
{noformat}
2016-08-10 13:51:20.742 - stderr> Exception in thread "main" java.lang.IllegalArgumentException: argument type mismatch
2016-08-10 13:51:20.743 - stderr> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2016-08-10 13:51:20.743 - stderr> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
2016-08-10 13:51:20.743 - stderr> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016-08-10 13:51:20.743 - stderr> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:354)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:258)
2016-08-10 13:51:20.744 - stderr> at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
2016-08-10 13:51:20.745 - stderr> at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
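The class-identity problem behind that "argument type mismatch" can be sketched outside the JVM as well. The following is a hedged Python analogue (illustrative only, not Spark code): the same class definition executed under two independent "loaders" produces two distinct types, so an instance of one fails an isinstance check against the other, just as a {{Configuration}} from the main class loader fails the reflective constructor lookup in the isolated one.

```python
import types

# The same class source, "loaded" twice -- simulating two class loaders.
SRC = "class Configuration:\n    pass\n"

def load_module(name):
    # Each exec creates a fresh, independent Configuration class object.
    mod = types.ModuleType(name)
    exec(SRC, mod.__dict__)
    return mod

main_loader = load_module("main_loader")
isolated_loader = load_module("isolated_loader")

conf = main_loader.Configuration()
# Same class name, but a different type object entirely:
print(isinstance(conf, isolated_loader.Configuration))  # False
```

The two `Configuration` classes compare unequal even though their definitions are identical, which is why the fix for the JVM case is to hand the isolated client a `Configuration` instance built from the right loader.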
[jira] [Resolved] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17035. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14631 [https://github.com/apache/spark/pull/14631] > Conversion of datetime.max to microseconds produces incorrect value > --- > > Key: SPARK-17035 > URL: https://issues.apache.org/jira/browse/SPARK-17035 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0 >Reporter: Michael Styles >Priority: Minor > Fix For: 2.1.0 > > > Conversion of datetime.max to microseconds produces incorrect value. For > example, > {noformat} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, TimestampType > schema = StructType([StructField("dt", TimestampType(), False)]) > data = [{"dt": datetime.max}] > # convert python objects to sql data > sql_data = [schema.toInternal(row) for row in data] > # Value is wrong. > sql_data > [(2.534023188e+17,)] > {noformat} > This value should be [(253402318799999999,)]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
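The failure mode above is easy to reproduce in plain Python. This is a hedged sketch, not PySpark's actual code path (which goes through time.mktime and is timezone-dependent): doing the epoch-microsecond arithmetic through a float drops the low-order digits at datetime.max's magnitude, while integer arithmetic on the timedelta fields is exact.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def to_micros_float(dt):
    # Float seconds * 1e6: a double only carries ~15-16 significant
    # digits, so the trailing microseconds are rounded away.
    return (dt - EPOCH).total_seconds() * 1e6

def to_micros_int(dt):
    # Integer arithmetic on the timedelta fields keeps every digit.
    d = dt - EPOCH
    return (d.days * 86400 + d.seconds) * 10**6 + d.microseconds

dt = datetime.max                 # 9999-12-31 23:59:59.999999
exact = to_micros_int(dt)         # 253402300799999999 (UTC-based epoch)
lossy = to_micros_float(dt)       # a float near 2.5340230...e+17
print(int(lossy) == exact)        # False
```

The comparison prints False because 253402300799999999 is not representable as a double, which is exactly why the cached internal value came back as 2.534023188e+17 instead of the precise integer.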
[jira] [Commented] (SPARK-16656) CreateTableAsSelectSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423051#comment-15423051 ] Apache Spark commented on SPARK-16656: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/14668 > CreateTableAsSelectSuite is flaky > - > > Key: SPARK-16656 > URL: https://issues.apache.org/jira/browse/SPARK-16656 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai > Fix For: 2.0.1, 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62593/testReport/junit/org.apache.spark.sql.sources/CreateTableAsSelectSuite/create_a_table__drop_it_and_create_another_one_with_the_same_name/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423007#comment-15423007 ] Apache Spark commented on SPARK-17087: -- User 'skonto' has created a pull request for this issue: https://github.com/apache/spark/pull/14667 > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17087: Assignee: (was: Apache Spark) > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17087: Assignee: Apache Spark > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos >Assignee: Apache Spark > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-17087: Description: Need to add the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 was: Adds the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
[ https://issues.apache.org/jira/browse/SPARK-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422996#comment-15422996 ] Stavros Kontopoulos commented on SPARK-17087: - WIP > Make Spark on Mesos honor port restrictions - Documentation > --- > > Key: SPARK-17087 > URL: https://issues.apache.org/jira/browse/SPARK-17087 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Stavros Kontopoulos > > Need to add the documentation missing from: > https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17087) Make Spark on Mesos honor port restrictions - Documentation
Stavros Kontopoulos created SPARK-17087: --- Summary: Make Spark on Mesos honor port restrictions - Documentation Key: SPARK-17087 URL: https://issues.apache.org/jira/browse/SPARK-17087 Project: Spark Issue Type: Documentation Components: Mesos Reporter: Stavros Kontopoulos Adds the documentation missing from: https://issues.apache.org/jira/browse/SPARK-11714 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
Barry Becker created SPARK-17086: Summary: QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data Key: SPARK-17086 URL: https://issues.apache.org/jira/browse/SPARK-17086 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.1.0 Reporter: Barry Becker I discovered this bug when working with a build from the master branch (which I believe is 2.1.0). This used to work fine when running Spark 1.6.2. I have a dataframe with an "intData" column that has values like
{code}
1 3 2 1 1 2 3 2 2 2 1 3
{code}
I have a stage in my pipeline that uses the QuantileDiscretizer to produce equal-weight splits like this:
{code}
new QuantileDiscretizer()
  .setInputCol("intData")
  .setOutputCol("intData_bin")
  .setNumBuckets(10)
  .fit(df)
{code}
But when that gets run, it (incorrectly) throws this error:
{code}
parameter splits given invalid value [-Infinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, Infinity]
{code}
There shouldn't be duplicate splits generated, should there? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
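The duplicate splits arise naturally from the data: with only three distinct values and ten requested buckets, several quantile boundaries must land on the same value. A hedged Python sketch (illustrative names, not Spark's implementation) shows the collision and the obvious remedy of de-duplicating the raw boundaries before building the splits array:

```python
data = [1, 3, 2, 1, 1, 2, 3, 2, 2, 2, 1, 3]

def quantile_splits(values, num_buckets):
    s = sorted(values)
    n = len(s)
    # Raw boundaries at the k/num_buckets quantiles; with only three
    # distinct values and ten buckets, many of these collide.
    raw = [s[min(n - 1, (k * n) // num_buckets)] for k in range(1, num_buckets)]
    # De-duplicate before using them as bucket boundaries, so the
    # splits array is strictly increasing.
    deduped = sorted(set(raw))
    return [float("-inf")] + [float(x) for x in deduped] + [float("inf")]

print(quantile_splits(data, 10))   # [-inf, 1.0, 2.0, 3.0, inf]
```

Fewer buckets than requested come out, but the splits are strictly increasing, which is the invariant the `splits` parameter validation enforces.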
[jira] [Assigned] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16578: Assignee: (was: Apache Spark) > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422978#comment-15422978 ] Apache Spark commented on SPARK-16578: -- User 'junyangq' has created a pull request for this issue: https://github.com/apache/spark/pull/14666 > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16578) Configurable hostname for RBackend
[ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16578: Assignee: Apache Spark > Configurable hostname for RBackend > -- > > Key: SPARK-16578 > URL: https://issues.apache.org/jira/browse/SPARK-16578 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Apache Spark > > One of the requirements that comes up with SparkR being a standalone package > is that users can now install just the R package on the client side and > connect to a remote machine which runs the RBackend class. > We should check if we can support this mode of execution and what are the > pros / cons of it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16484) Incremental Cardinality estimation operations with Hyperloglog
[ https://issues.apache.org/jira/browse/SPARK-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422887#comment-15422887 ] Yongjia Wang commented on SPARK-16484: -- Here is my solution using Spark UDAF and UDT https://github.com/yongjiaw/Spark_HLL > Incremental Cardinality estimation operations with Hyperloglog > -- > > Key: SPARK-16484 > URL: https://issues.apache.org/jira/browse/SPARK-16484 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yongjia Wang > > Efficient cardinality estimation is very important, and SparkSQL has had > approxCountDistinct based on Hyperloglog for quite some time. However, there > isn't a way to do incremental estimation. For example, if we want to get > updated distinct counts of the last 90 days, we need to do the aggregation > for the entire window over and over again. The more efficient way involves > serializing the counter for smaller time windows (such as hourly) so the > counts can be efficiently updated in an incremental fashion for any time > window. > With the support of custom UDAF, Binary DataType and the HyperloglogPlusPlus > implementation in the current Spark version, it's easy enough to extend the > functionality to include incremental counting, and even other general set > operations such as intersection and set difference. Spark API is already as > elegant as it can be, but it still takes quite some effort to do a custom > implementation of the aforementioned operations, which are supposed to be in > high demand. I have been searching but failed to find a usable existing > solution or any ongoing effort for this. The closest I got is the following, > but it does not work with Spark 1.6 due to API changes. > https://github.com/collectivemedia/spark-hyperloglog/blob/master/src/main/scala/org/apache/spark/sql/hyperloglog/aggregates.scala > I wonder if it's worth integrating such operations into SparkSQL. 
The only > problem I see is that it depends on serialization of a specific HLL implementation > and introduces compatibility issues. But as long as the user is aware of such an > issue, it should be fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
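The incremental property described in this issue rests on the fact that HLL registers merge by elementwise max. A toy register sketch in Python (hedged: deliberately simplified, with no bias correction or cardinality estimation, and not the HyperLogLog++ implementation Spark uses) shows why hourly sketches can be unioned into any window without rescanning the raw data:

```python
import hashlib

P = 2 ** 10  # number of registers (toy-sized)

def _hash(x):
    # Stable 64-bit hash of the value's string form.
    return int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], "big")

def empty():
    return [0] * P

def add(registers, value):
    h = _hash(value)
    idx = h % P          # which register this value updates
    rest = h // P
    rank = 1             # 1 + number of trailing zero bits, capped
    while rest % 2 == 0 and rank < 64:
        rank += 1
        rest //= 2
    registers[idx] = max(registers[idx], rank)

def merge(a, b):
    # Union of two sketches is the elementwise max of their registers.
    return [max(x, y) for x, y in zip(a, b)]

hour1, hour2 = empty(), empty()
for v in range(1000):
    add(hour1, v)
for v in range(500, 1500):
    add(hour2, v)
window = merge(hour1, hour2)   # union sketch for both hours, no rescan
```

Because merge is commutative and idempotent, per-hour sketches serialized to a binary column can be combined into a 90-day window in a single aggregation pass, which is the efficiency the issue is asking for.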
[jira] [Commented] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422864#comment-15422864 ] Sean Owen commented on SPARK-17085: --- Yes I think the first doc link is wrong. Go ahead with a pull request. > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Priority: Minor > > Spark Structured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422856#comment-15422856 ] Stavros Kontopoulos commented on SPARK-11714: - I will create another one for the documentation. I guess we will need to document the behavior. > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen >Assignee: Stavros Kontopoulos > Fix For: 2.1.0 > > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samritti updated SPARK-17085: - Priority: Minor (was: Major) > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Priority: Minor > > Spark Structured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org