[jira] [Assigned] (SPARK-13812) Fix SparkR lint-r test errors
[ https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13812: Assignee: Apache Spark > Fix SparkR lint-r test errors > - > > Key: SPARK-13812 > URL: https://issues.apache.org/jira/browse/SPARK-13812 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.1 >Reporter: Sun Rui >Assignee: Apache Spark > > After being updated from GitHub, the lintr package can detect errors that were > not detected by previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13812) Fix SparkR lint-r test errors
[ https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13812: Assignee: (was: Apache Spark) > Fix SparkR lint-r test errors > - > > Key: SPARK-13812 > URL: https://issues.apache.org/jira/browse/SPARK-13812 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.1 >Reporter: Sun Rui > > After being updated from GitHub, the lintr package can detect errors that were > not detected by previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13812) Fix SparkR lint-r test errors
[ https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190607#comment-15190607 ] Apache Spark commented on SPARK-13812: -- User 'sun-rui' has created a pull request for this issue: https://github.com/apache/spark/pull/11652 > Fix SparkR lint-r test errors > - > > Key: SPARK-13812 > URL: https://issues.apache.org/jira/browse/SPARK-13812 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.1 >Reporter: Sun Rui > > After being updated from GitHub, the lintr package can detect errors that were > not detected by previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13815) UnsupportedOperationException: empty collection when metadata for a Pipeline is an empty file
Jacek Laskowski created SPARK-13815: --- Summary: UnsupportedOperationException: empty collection when metadata for a Pipeline is an empty file Key: SPARK-13815 URL: https://issues.apache.org/jira/browse/SPARK-13815 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 2.0.0 Environment: today's build of 2.0.0-SNAPSHOT Reporter: Jacek Laskowski Priority: Minor The following code that loads a {{Pipeline}} from an empty {{metadata}} file throws an exception (expected) that says nothing about the real cause of it. {code} $ ls -l hello-pipeline/metadata -rw-r--r-- 1 jacek staff 0 11 mar 09:00 hello-pipeline/metadata scala> Pipeline.read.load("hello-pipeline") ... java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1344) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.first(RDD.scala:1341) at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:285) at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:253) at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:203) at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:197) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13804) Spark SQL's DataFrame.count() Major Divergent (Non-Linear) Performance Slowdown going from 4million rows to 16+ million rows
[ https://issues.apache.org/jira/browse/SPARK-13804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13804. --- Resolution: Invalid So, this started as an entirely different issue. This is better since the original sounds like a duplicate of your other JIRA. This, however, should be a question to user@ to start. There are too many possibilities that don't mean there's a bug (a major one being not being able to cache the data set). > Spark SQL's DataFrame.count() Major Divergent (Non-Linear) Performance > Slowdown going from 4million rows to 16+ million rows > - > > Key: SPARK-13804 > URL: https://issues.apache.org/jira/browse/SPARK-13804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: - 3 nodes Spark cluster: 1 master node and 2 slave nodes > - Each node is an EC2 with c3.4xlarge > - Each node has 16 cores and 30GB of RAM >Reporter: Michael Nguyen > > Spark SQL is used to load CSV files via com.databricks.spark.csv and then run > dataFrame.count() > In the same environment with plenty of CPU and RAM, Spark SQL takes > - 18.25 seconds to load a table with 4 million rows vs > - 346.624 seconds (5.77 minutes) to load a table with 16 million rows. > Even though the number of rows increases by 4 times, the time it takes Spark > SQL to run dataframe.count() increases by 19.22 times. So the performance of > dataframe.count() diverges drastically. > 1. Why is Spark SQL's performance not proportional to the number of rows > while there is plenty of CPU and RAM (it uses only 10GB out of 30GB RAM)? > 2. What can be done to fix this performance issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
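One quick way to sanity-check the caching point raised above is to cache the DataFrame before counting, so that repeated counts are not re-parsing the CSV from disk on every action. A minimal sketch, assuming Spark 1.6 with the com.databricks:spark-csv package; the input path is hypothetical:

{code}
// Minimal sketch (Spark 1.6 + spark-csv); the file path is hypothetical.
// `sc` is the SparkContext provided by the shell.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/data/events.csv")  // hypothetical path

df.cache()
df.count()  // first count pays the CSV parse + cache-population cost
df.count()  // later counts read from the cache and scale ~linearly with rows
{code}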
[jira] [Created] (SPARK-13816) Add parameter checks for algorithms in Graphx
zhengruifeng created SPARK-13816: Summary: Add parameter checks for algorithms in Graphx Key: SPARK-13816 URL: https://issues.apache.org/jira/browse/SPARK-13816 Project: Spark Issue Type: Improvement Components: GraphX Reporter: zhengruifeng Priority: Trivial Add parameter checks in Graphx-Algorithms: maxIterations in Pregel maxSteps in LabelPropagation numIter,resetProb,tol in PageRank maxIters,maxVal,minVal in SVDPlusPlus -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
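The checks proposed here are presumably simple preconditions. A minimal sketch of what one such check might look like; the method shape and message wording are assumed rather than taken from the actual patch:

{code}
// Illustrative sketch only, not the actual PR: a Spark-style parameter
// check uses require() to fail fast with a descriptive message.
def runWithChecks(maxIterations: Int, resetProb: Double): Unit = {
  require(maxIterations > 0,
    s"Maximum number of iterations must be greater than 0, but got $maxIterations")
  require(resetProb >= 0 && resetProb <= 1,
    s"Random reset probability must belong to [0, 1], but got $resetProb")
  // ... run the algorithm ...
}
{code}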
[jira] [Assigned] (SPARK-13816) Add parameter checks for algorithms in Graphx
[ https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13816: Assignee: (was: Apache Spark) > Add parameter checks for algorithms in Graphx > -- > > Key: SPARK-13816 > URL: https://issues.apache.org/jira/browse/SPARK-13816 > Project: Spark > Issue Type: Improvement > Components: GraphX >Reporter: zhengruifeng >Priority: Trivial > > Add parameter checks in Graphx-Algorithms: > maxIterations in Pregel > maxSteps in LabelPropagation > numIter,resetProb,tol in PageRank > maxIters,maxVal,minVal in SVDPlusPlus -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13816) Add parameter checks for algorithms in Graphx
[ https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190703#comment-15190703 ] Apache Spark commented on SPARK-13816: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/11655 > Add parameter checks for algorithms in Graphx > -- > > Key: SPARK-13816 > URL: https://issues.apache.org/jira/browse/SPARK-13816 > Project: Spark > Issue Type: Improvement > Components: GraphX >Reporter: zhengruifeng >Priority: Trivial > > Add parameter checks in Graphx-Algorithms: > maxIterations in Pregel > maxSteps in LabelPropagation > numIter,resetProb,tol in PageRank > maxIters,maxVal,minVal in SVDPlusPlus -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13816) Add parameter checks for algorithms in Graphx
[ https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13816: Assignee: Apache Spark > Add parameter checks for algorithms in Graphx > -- > > Key: SPARK-13816 > URL: https://issues.apache.org/jira/browse/SPARK-13816 > Project: Spark > Issue Type: Improvement > Components: GraphX >Reporter: zhengruifeng >Assignee: Apache Spark >Priority: Trivial > > Add parameter checks in Graphx-Algorithms: > maxIterations in Pregel > maxSteps in LabelPropagation > numIter,resetProb,tol in PageRank > maxIters,maxVal,minVal in SVDPlusPlus -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13553) Migrate basic inspection operations
[ https://issues.apache.org/jira/browse/SPARK-13553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13553: --- > Migrate basic inspection operations > --- > > Key: SPARK-13553 > URL: https://issues.apache.org/jira/browse/SPARK-13553 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian >Assignee: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Basic inspection operations > - dtypes > - columns > - printSchema > - explain > - Column accessors > - col > - apply > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13554) Migrate typed relational operations
[ https://issues.apache.org/jira/browse/SPARK-13554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13554: --- > Migrate typed relational operations > --- > > Key: SPARK-13554 > URL: https://issues.apache.org/jira/browse/SPARK-13554 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian >Assignee: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Relational operations > - Typed relational operations > - as(String): Dataset[T] // Subquery > - filter(Column): Dataset[T] > - filter(String): Dataset[T] > - where(Column): Dataset[T] > - where(String): Dataset[T] > - limit(n): Dataset[T] > - sortWithinPartitions(String, String*): Dataset[T] > - sortWithinPartitions(Column*): Dataset[T] > - sort(String, String*): Dataset[T] > - sort(Column*): Dataset[T] > - orderBy(String, String*): Dataset[T] > - orderBy(Column*): Dataset[T] > - randomSplit(Array[Double], Long): Array[Dataset[T]] > - randomSplit(Array[Double]): Array[Dataset[T]] > - Set operations > - unionAll // alias of union (remove it?) > - except // alias of substract (remove it?) > - Repartitioning > - repartition(Int, Column*): Dataset[T] > - repartition(Column*): Dataset[T] > - explode[A <: Product: TypeTag](Column*)(Row => TraversableOnce[A]): > Dataset[A] > - explode[A, B: TypeTag](String, String)(A => TraversableOnce[B]): > Dataset[B] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13555) Migrate untyped relational operations
[ https://issues.apache.org/jira/browse/SPARK-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13555: --- > Migrate untyped relational operations > - > > Key: SPARK-13555 > URL: https://issues.apache.org/jira/browse/SPARK-13555 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian >Assignee: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Relational operations > - Untyped relational operations > - select(Column*): Dataset[Row] > - select(String, String*): Dataset[Row] > - selectExpr(String*): Dataset[Row] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13556) Migrate untyped joins
[ https://issues.apache.org/jira/browse/SPARK-13556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13556: --- > Migrate untyped joins > - > > Key: SPARK-13556 > URL: https://issues.apache.org/jira/browse/SPARK-13556 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian >Assignee: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Joins > - Untyped joins > - join[U: Encoder](Dataset[U]): Dataset[Row] > - join[U: Encoder](Dataset[U], String): Dataset[Row] > - join[U: Encoder](Dataset[U], Seq[String]): Dataset[Row] > - join[U: Encoder](Dataset[U], Seq[String], String): Dataset[Row] > - join[U: Encoder](Dataset[U], Column): Dataset[Row] > - join[U: Encoder](Dataset[U], Column, String): Dataset[Row] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13557) Migrate gather-to-driver actions
[ https://issues.apache.org/jira/browse/SPARK-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13557: --- > Migrate gather-to-driver actions > > > Key: SPARK-13557 > URL: https://issues.apache.org/jira/browse/SPARK-13557 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Gather-to-driver actions > - head(Int): Array[T] > - head(): T > - first(): T > - collect(): Array[T] > - collectAsList(): java.util.List[T] > - take(Int): Array[T] > - takeAsList(Int): java.util.List[T] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13558) Migrate basic GroupedDataset methods
[ https://issues.apache.org/jira/browse/SPARK-13558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13558: --- > Migrate basic GroupedDataset methods > > > Key: SPARK-13558 > URL: https://issues.apache.org/jira/browse/SPARK-13558 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian >Assignee: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Aggregations > - GroupedDataset > - Support GroupType (GroupBy/GroupingSet/Rollup/Cube) > - Untyped aggregations > - agg((String, String), (String, String)*): Dataset[Row] > - agg(Map[String, String]): Dataset[Row] > - agg(java.util.Map[String, String]): Dataset[Row] > - agg(Column, Column*): Dataset[Row] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13559) Migrate common GroupedDataset aggregations
[ https://issues.apache.org/jira/browse/SPARK-13559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13559: --- > Migrate common GroupedDataset aggregations > -- > > Key: SPARK-13559 > URL: https://issues.apache.org/jira/browse/SPARK-13559 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Aggregations > - GroupedDataset > - Common untyped aggregations > - mean(String*): Dataset[Row] > - max(String*): Dataset[Row] > - avg(String*): Dataset[Row] > - min(String*): Dataset[Row] > - sum(String*): Dataset[Row] > - Common typed aggregations > - count(): Dataset[(K, Long)] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13560) Migrate GroupedDataset pivoting methods
[ https://issues.apache.org/jira/browse/SPARK-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13560: --- > Migrate GroupedDataset pivoting methods > --- > > Key: SPARK-13560 > URL: https://issues.apache.org/jira/browse/SPARK-13560 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Aggregations > - GroupedDataset > - Pivoting > - pivot(String): GroupedDataset[Row, V] > - pivot(String, Seq[Any]): GroupedDataset[Row, V] > - pivot(String, java.util.List[Any]): GroupedDataset[Row, V] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
Cheng Lian created SPARK-13817: -- Summary: Re-enable MiMA check after unifying DataFrame and Dataset API Key: SPARK-13817 URL: https://issues.apache.org/jira/browse/SPARK-13817 Project: Spark Issue Type: Test Components: Build Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the DataFrame and Dataset APIs. Since this PR made tons of API changes, we temporarily disabled the MiMA check for convenience. Now that it is merged, we should re-enable the MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
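Re-enabling MiMA after an intentional API break usually also means recording exclusions for the changed signatures. A hedged sketch of what such an entry might look like; the excluded method name below is a made-up placeholder, and the real exclusions live in Spark's project/MimaExcludes.scala:

{code}
// Sketch only: MiMA exclusion entries use ProblemFilters from mima-core.
// The method name below is hypothetical, not an actual exclusion.
import com.typesafe.tools.mima.core._

val excludes = Seq(
  ProblemFilters.exclude[MissingMethodProblem](
    "org.apache.spark.sql.DataFrame.someRemovedMethod")
)
{code}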
[jira] [Commented] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190744#comment-15190744 ] Apache Spark commented on SPARK-13817: -- User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/11656 > Re-enable MiMA check after unifying DataFrame and Dataset API > - > > Key: SPARK-13817 > URL: https://issues.apache.org/jira/browse/SPARK-13817 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > In [PR #11443|https://github.com/apache/spark/pull/11443], we unified > DataFrame and Dataset API. Since this PR did tons of API changes, we disabled > MiMA check temporarily for convenience. Now it is merged, we should re-enable > MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13817: Assignee: Cheng Lian (was: Apache Spark) > Re-enable MiMA check after unifying DataFrame and Dataset API > - > > Key: SPARK-13817 > URL: https://issues.apache.org/jira/browse/SPARK-13817 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > In [PR #11443|https://github.com/apache/spark/pull/11443], we unified > DataFrame and Dataset API. Since this PR did tons of API changes, we disabled > MiMA check temporarily for convenience. Now it is merged, we should re-enable > MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13817: Assignee: Apache Spark (was: Cheng Lian) > Re-enable MiMA check after unifying DataFrame and Dataset API > - > > Key: SPARK-13817 > URL: https://issues.apache.org/jira/browse/SPARK-13817 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Apache Spark > > In [PR #11443|https://github.com/apache/spark/pull/11443], we unified > DataFrame and Dataset API. Since this PR did tons of API changes, we disabled > MiMA check temporarily for convenience. Now it is merged, we should re-enable > MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13564) Migrate DataFrameStatFunctions to Dataset
[ https://issues.apache.org/jira/browse/SPARK-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13564: --- > Migrate DataFrameStatFunctions to Dataset > - > > Key: SPARK-13564 > URL: https://issues.apache.org/jira/browse/SPARK-13564 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > After the migration, we should have a separate namespace {{Dataset.stat}} for > statistics methods, just like {{DataFrame.stat}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13563) Migrate DataFrameNaFunctions to Dataset
[ https://issues.apache.org/jira/browse/SPARK-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13563: --- > Migrate DataFrameNaFunctions to Dataset > --- > > Key: SPARK-13563 > URL: https://issues.apache.org/jira/browse/SPARK-13563 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > After the migration, we should have a separate namespace {{Dataset.na}}, just > like {{DataFrame.na}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13562) Migrate Dataset typed aggregations
[ https://issues.apache.org/jira/browse/SPARK-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13562: --- > Migrate Dataset typed aggregations > -- > > Key: SPARK-13562 > URL: https://issues.apache.org/jira/browse/SPARK-13562 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Aggregations > - Untyped aggregations (depends on GroupedDataset) > - groupBy(Column*): GroupedDataset[Row, T] > - groupBy(String, String*): GroupedDataset[Row, T] > - rollup(Column*): GroupedDataset[Row, T] > - rollup(String, String*): GroupedDataset[Row, T] > - cube(Column*): GroupedDataset[Row, T] > - cube(String, String*): GroupedDataset[Row, T] > - agg((String, String), (String, String)*): Dataset[Row] > - agg(Map[String, String]): Dataset[Row] > - agg(java.util.Map[String, String]): Dataset[Row] > - agg(Column, Column*): Dataset[Row] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13815) UnsupportedOperationException: empty collection when metadata for a Pipeline is an empty file
[ https://issues.apache.org/jira/browse/SPARK-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190765#comment-15190765 ] Sean Owen commented on SPARK-13815: --- This is more generally what happens when you call something like first or take on an empty RDD. Is it that misleading? It says you're doing something you can't because a collection is empty. You can add an isEmpty check or something and throw a more specific exception, sure. > UnsupportedOperationException: empty collection when metadata for a Pipeline > is an empty file > - > > Key: SPARK-13815 > URL: https://issues.apache.org/jira/browse/SPARK-13815 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.0.0 > Environment: today's build of 2.0.0-SNAPSHOT >Reporter: Jacek Laskowski >Priority: Minor > > The following code that loads a {{Pipeline}} from an empty {{metadata}} file > throws an exception (expected) that says nothing about the real cause of it. > {code} > $ ls -l hello-pipeline/metadata > -rw-r--r-- 1 jacek staff 0 11 mar 09:00 hello-pipeline/metadata > scala> Pipeline.read.load("hello-pipeline") > ... > java.lang.UnsupportedOperationException: empty collection > at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1344) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) > at org.apache.spark.rdd.RDD.first(RDD.scala:1341) > at > org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:285) > at > org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:253) > at > org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:203) > at > org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:197) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
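The isEmpty guard suggested above could look roughly like the following inside DefaultParamsReader.loadMetadata. This is a sketch of the idea, not the committed fix, and the exception message is made up:

{code}
// Sketch of the suggested guard (not the actual patch): check for an
// empty metadata RDD before calling first(), and fail with a clear message.
val metadataRDD = sc.textFile(metadataPath, 1)
if (metadataRDD.isEmpty()) {
  throw new IllegalArgumentException(
    s"Metadata file at $metadataPath is empty; cannot load Pipeline metadata.")
}
val metadataStr = metadataRDD.first()
{code}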
[jira] [Created] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch
yuemeng created SPARK-13818: --- Summary: the spark streaming job will be always processing status when restart elasticsearch Key: SPARK-13818 URL: https://issues.apache.org/jira/browse/SPARK-13818 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.5.0, 1.4.0, 1.3.0 Reporter: yuemeng Priority: Blocker Fix For: 1.4.2, 1.5.3 Using Spark Streaming to write data into an elasticsearch-hadoop system, when we restart the Elasticsearch system, tasks in some jobs at this time will get the following error: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state volatile; cannot find node backing shards - please check whether your cluster is stable at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370) at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425) at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: and this batch will always be in the processing status, never failed or finished, which may cause resources for this batch to never be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13819) using a regexp_replace in a group by clause raises a nullpointerexception
Javier Pérez created SPARK-13819: Summary: using a regexp_replace in a group by clause raises a nullpointerexception Key: SPARK-13819 URL: https://issues.apache.org/jira/browse/SPARK-13819 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Javier Pérez 1. Start start-thriftserver.sh 2. Connect with beeline 3. Perform the following query over a table: SELECT t0.textsample FROM test t0 ORDER BY regexp_replace( t0.code, concat('\\Q', 'a', '\\E'), regexp_replace( regexp_replace('zz', '', ''), '\\$', '\\$')) DESC; Problem: NullPointerException Trace: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.RegExpReplace.nullSafeEval(regexpExpressions.scala:224) at org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:458) at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:36) at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:27) at scala.math.Ordering$class.gt(Ordering.scala:97) at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.gt(ordering.scala:27) at org.apache.spark.RangePartitioner.getPartition(Partitioner.scala:168) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
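Until the null handling in RegExpReplace is sorted out, one possible workaround (an untested sketch, assuming the `test` table from the report is registered with the SQLContext) is to materialize the replacement into a concrete column before sorting, so the ordering runs on a plain string column rather than evaluating regexp_replace inside the ORDER BY:

{code}
// Untested workaround sketch: precompute the sort key as a column.
import org.apache.spark.sql.functions.regexp_replace

val df = sqlContext.table("test")  // assumes the table from the report
val withKey = df.withColumn(
  "sort_key", regexp_replace(df("code"), "\\Qa\\E", "zz"))
withKey.orderBy(withKey("sort_key").desc).select("textsample")
{code}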
[jira] [Commented] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch
[ https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190783#comment-15190783 ] yuemeng commented on SPARK-13818: - code like: stream.foreachRDD( rdd => { val ep = esPath + getIndexName("") + "/event"; rdd.saveToEs(ep) }) When Spark Streaming runs well and we restart Elasticsearch, tasks at this point will fail, but this batch never finishes or fails. In the streaming web UI, we can see that this job had tasks fail because of the error (org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state volatile; cannot find node backing shards - please check whether your cluster is stable), but this batch will always be in processing status. In my opinion, if a job fails because of task failure for some reason, this batch's status should be finished or failed instead of processing. Would anyone like to check this issue? Thanks. [~zsxwing] can you help me check this issue? > the spark streaming job will be always processing status when restart > elasticsearch > > > Key: SPARK-13818 > URL: https://issues.apache.org/jira/browse/SPARK-13818 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: yuemeng >Priority: Blocker > Fix For: 1.4.2, 1.5.3 > > Using Spark Streaming to write data into an elasticsearch-hadoop system, when we > restart the Elasticsearch system, tasks in some jobs at this time will get the > following error: > Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): > org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state > volatile; cannot find node backing shards - please check whether your cluster > is stable > at > org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370) > at > org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425) > at > org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393) > at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > and this batch will always be in the processing status, never failed or > finished, which may cause resources for this batch to never be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13561) Migrate Dataset untyped aggregations
[ https://issues.apache.org/jira/browse/SPARK-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13561: --- > Migrate Dataset untyped aggregations > > > Key: SPARK-13561 > URL: https://issues.apache.org/jira/browse/SPARK-13561 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > Should migrate the following methods and corresponding tests to Dataset: > {noformat} > - Aggregations > - Typed aggregations (depends on GroupedDataset) > - groupBy[K: Encoder](T => K): GroupedDataset[K, T] // rename to > groupByKey > - groupBy[K](MapFunction[T, K], Encoder[K]): GroupedDataset[K, T] // > Rename to groupByKey > - count > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-13565) Migrate DataFrameReader/DataFrameWriter to Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian deleted SPARK-13565: --- > Migrate DataFrameReader/DataFrameWriter to Dataset API > -- > > Key: SPARK-13565 > URL: https://issues.apache.org/jira/browse/SPARK-13565 > Project: Spark > Issue Type: Sub-task >Reporter: Cheng Lian > > We'd like to be able to read/write a Dataset from/to specific data sources. > After the migration, we should have {{Dataset.read}}/{{Dataset.write}}, just > like {{DataFrame.read}}/{{DataFrame.write}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13820) TPC-DS Query 10 fails to compile
Roy Cecil created SPARK-13820: - Summary: TPC-DS Query 10 fails to compile Key: SPARK-13820 URL: https://issues.apache.org/jira/browse/SPARK-13820 Project: Spark Issue Type: Bug Affects Versions: 1.6.1 Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux Reporter: Roy Cecil TPC-DS Query 10 fails to compile with the following error. Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );]) at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) at org.antlr.runtime.DFA.predict(DFA.java:144) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177) Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );]) at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) at org.antlr.runtime.DFA.predict(DFA.java:144) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177) Query is pasted here for easy reproduction select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3, cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5, cd_dep_college_count, count(*) cnt6 from customer c JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk LEFT SEMI JOIN (select ss_customer_sk from store_sales JOIN date_dim ON ss_sold_date_sk = d_date_sk where d_year = 2002 and d_moy between 1 and 1+3) ss_wh1 ON c.c_customer_sk = ss_wh1.ss_customer_sk where ca_county in ('Rush County','Toole County','Jefferson County','Dona Ana County','La Porte County') and exists ( select tmp.customer_sk from ( select ws_bill_customer_sk as customer_sk from web_sales,date_dim where web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 2002 and d_moy between 1 and 1+3 UNION ALL select cs_ship_customer_sk as customer_sk from catalog_sales,date_dim where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and d_year = 2002 and d_moy between 1 and 1+3 ) tmp where c.c_customer_sk = tmp.customer_sk ) group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
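The alternative the parser trips over appears to be the correlated exists (...) predicate, which the Hive parser bundled with Spark 1.6 does not accept. A hedged sketch of the usual rewrite, following the same LEFT SEMI JOIN pattern the query already uses for ss_wh1 (column list abbreviated; not verified for TPC-DS result correctness):

{code}
// Sketch of the usual EXISTS -> LEFT SEMI JOIN rewrite; abbreviated and
// unverified, shown only to illustrate the parser workaround.
sqlContext.sql("""
  SELECT cd_gender, cd_marital_status, count(*) AS cnt1
  FROM customer c
    JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
    LEFT SEMI JOIN (
      SELECT ws_bill_customer_sk AS customer_sk
      FROM web_sales JOIN date_dim ON ws_sold_date_sk = d_date_sk
      WHERE d_year = 2002 AND d_moy BETWEEN 1 AND 1+3
      UNION ALL
      SELECT cs_ship_customer_sk AS customer_sk
      FROM catalog_sales JOIN date_dim ON cs_sold_date_sk = d_date_sk
      WHERE d_year = 2002 AND d_moy BETWEEN 1 AND 1+3
    ) tmp ON c.c_customer_sk = tmp.customer_sk
  GROUP BY cd_gender, cd_marital_status
""")
{code}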
[jira] [Updated] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch
[ https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13818: -- Priority: Major (was: Blocker) Fix Version/s: (was: 1.5.3) (was: 1.4.2) @yuemeng Please don't open a JIRA until you read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark You should not set blocker, and it does not make sense to set fix versions. Further, this is an Elasticsearch issue, not Spark (at this stage at least). I'm going to close it. > the spark streaming job will be always processing status when restart > elasticsearch > > > Key: SPARK-13818 > URL: https://issues.apache.org/jira/browse/SPARK-13818 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: yuemeng > > Using spark streaming to write data into elasticsearch-hadoop system ,when we > restart elasticsearch system,tasks in some job at this time will be get > follow error: > Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): > org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state > volatile; cannot find node backing shards - please check whether your cluster > is stable > at > org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370) > at > org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425) > at > org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393) > at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > and this batch will be always in the status of processing,Never failed or > finished,it maybe cause resources for this batch never release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13821) TPC-DS Query 20 fails to compile
Roy Cecil created SPARK-13821: - Summary: TPC-DS Query 20 fails to compile Key: SPARK-13821 URL: https://issues.apache.org/jira/browse/SPARK-13821 Project: Spark Issue Type: Bug Affects Versions: 1.6.1 Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux Reporter: Roy Cecil TPC-DS Query 20 fails to compile with the following error message: Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch
[ https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13818. --- Resolution: Invalid > the spark streaming job will be always processing status when restart > elasticsearch > > > Key: SPARK-13818 > URL: https://issues.apache.org/jira/browse/SPARK-13818 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: yuemeng > > Using spark streaming to write data into elasticsearch-hadoop system ,when we > restart elasticsearch system,tasks in some job at this time will be get > follow error: > Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): > org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state > volatile; cannot find node backing shards - please check whether your cluster > is stable > at > org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370) > at > org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425) > at > org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393) > at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at > org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > and this batch will be always in the status of processing,Never failed or > finished,it maybe cause resources for this batch never release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13822) Follow-ups of DataFrame/Dataset API unification
Cheng Lian created SPARK-13822: -- Summary: Follow-ups of DataFrame/Dataset API unification Key: SPARK-13822 URL: https://issues.apache.org/jira/browse/SPARK-13822 Project: Spark Issue Type: Improvement Components: Build, SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian This is an umbrella ticket for all follow-up work of DataFrame/Dataset API unification (SPARK-13244). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190836#comment-15190836 ] Roy Cecil commented on SPARK-13821: --- Query Text is select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Books', 'Home') and cs_sold_date_sk = d_date_sk and d_date between cast('1999-02-22' as date) and date_add(cast('1999-02-22' as date), 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio LIMIT 100; > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the follwing Error Message > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
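The select item the parser rejects is presumably the nested aggregate under a window, sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class). One common workaround, sketched below and untested here, is to aggregate in an inner query first and apply the window function over the aggregated column:

{code}
// Untested sketch: split the aggregate and the window function into two
// levels so no select item contains sum(sum(...)) OVER (...).
sqlContext.sql("""
  SELECT i_item_id, i_class, itemrevenue,
         itemrevenue * 100 / sum(itemrevenue) OVER (PARTITION BY i_class)
           AS revenueratio
  FROM (
    SELECT i_item_id, i_class, sum(cs_ext_sales_price) AS itemrevenue
    FROM catalog_sales JOIN item ON cs_item_sk = i_item_sk
    GROUP BY i_item_id, i_class
  ) agg
""")
{code}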
[jira] [Updated] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13817: --- Issue Type: Sub-task (was: Test) Parent: SPARK-13822 > Re-enable MiMA check after unifying DataFrame and Dataset API > - > > Key: SPARK-13817 > URL: https://issues.apache.org/jira/browse/SPARK-13817 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the > DataFrame and Dataset APIs. Since that PR made a large number of API changes, > we temporarily disabled the MiMA check for convenience. Now that the PR is > merged, we should re-enable the MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
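For context, a hedged sketch of what a re-enabled MiMA binary-compatibility check looks like with sbt-mima-plugin. Spark actually wires this through its own {{project/MimaBuild.scala}} and {{MimaExcludes.scala}}, so the setting below is illustrative rather than the Spark build code.
{code}
// build.sbt sketch (standard sbt-mima-plugin settings, assumed):
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-sql" % "1.6.0")

// Then report binary-incompatible changes against the previous release:
//   sbt mimaReportBinaryIssues
{code}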
[jira] [Created] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
Sean Owen created SPARK-13823: - Summary: Always specify Charset in String <-> byte[] conversions (and remaining Coverity items) Key: SPARK-13823 URL: https://issues.apache.org/jira/browse/SPARK-13823 Project: Spark Issue Type: Improvement Components: Spark Core, SQL, Streaming Affects Versions: 2.0.0 Reporter: Sean Owen Assignee: Sean Owen Priority: Minor Most of the remaining items from the last Coverity scan concern using, for example, the constructor {{new String(byte[])}} or the method {{String.getBytes()}}, or similarly the constructors of {{InputStreamReader}} and {{OutputStreamWriter}}. These use the platform default encoding, which means their behavior may change in different locales; this is undesirable in all cases in Spark. It makes sense to specify UTF-8 as the default everywhere; where already specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is a superset. We should also consistently use {{StandardCharsets.UTF_8}} rather than "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this. (Finally, we should touch up the other few remaining Coverity scan items, which are trivial, while we're here.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
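An illustration of the pattern the issue proposes (not code from the patch itself):
{code}
import java.nio.charset.StandardCharsets

val s = "naïve"

// Platform-dependent: uses the JVM default charset, so the resulting bytes
// can differ between machines and locales.
val fragile: Array[Byte] = s.getBytes

// Deterministic: always UTF-8, using the JDK constant rather than the
// string literal "UTF-8" or Guava's Charsets.UTF_8.
val stable: Array[Byte] = s.getBytes(StandardCharsets.UTF_8)
val roundTrip: String = new String(stable, StandardCharsets.UTF_8)
{code}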
[jira] [Resolved] (SPARK-13684) Possible unsafe bytesRead increment in StreamInterceptor
[ https://issues.apache.org/jira/browse/SPARK-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13684. --- Resolution: Duplicate If you don't mind, I'm going to bundle this up with the resolution for all the remaining Coverity issues. > Possible unsafe bytesRead increment in StreamInterceptor > > > Key: SPARK-13684 > URL: https://issues.apache.org/jira/browse/SPARK-13684 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: holdenk >Priority: Trivial > > We unsafely increment a volatile ({{bytesRead}}) in a callback; if two > callbacks are triggered we may undercount bytesRead. This issue was found > using Coverity. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
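A minimal sketch of the race being described and one safe alternative (the shape is assumed for illustration; it is not the Spark code):
{code}
import java.util.concurrent.atomic.AtomicLong

object Counters {
  // Racy: += on a @volatile var is a read-modify-write, so two concurrent
  // callbacks can read the same old value and one increment is lost.
  @volatile var bytesReadUnsafe: Long = 0L
  def onChunkUnsafe(n: Long): Unit = bytesReadUnsafe += n

  // Atomic: addAndGet performs the read-modify-write as one operation.
  private val bytesRead = new AtomicLong(0L)
  def onChunk(n: Long): Unit = bytesRead.addAndGet(n)
  def total: Long = bytesRead.get()
}
{code}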
[jira] [Commented] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
[ https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190850#comment-15190850 ] Apache Spark commented on SPARK-13823: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/11657 > Always specify Charset in String <-> byte[] conversions (and remaining > Coverity items) > -- > > Key: SPARK-13823 > URL: https://issues.apache.org/jira/browse/SPARK-13823 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Streaming >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > Most of the remaining items from the last Coverity scan concern using, for > example, the constructor {{new String(byte[])}} or the method > {{String.getBytes()}}, or similarly the constructors of {{InputStreamReader}} > and {{OutputStreamWriter}}. These use the platform default encoding, which > means their behavior may change in different locales; this is undesirable in > all cases in Spark. > It makes sense to specify UTF-8 as the default everywhere; where already > specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is > a superset. > We should also consistently use {{StandardCharsets.UTF_8}} rather than > "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this. > (Finally, we should touch up the other few remaining Coverity scan items, > which are trivial, while we're here.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
[ https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13823: Assignee: Apache Spark (was: Sean Owen) > Always specify Charset in String <-> byte[] conversions (and remaining > Coverity items) > -- > > Key: SPARK-13823 > URL: https://issues.apache.org/jira/browse/SPARK-13823 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Streaming >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Minor > > Most of the remaining items from the last Coverity scan concern using, for > example, the constructor {{new String(byte[])}} or the method > {{String.getBytes()}}, or similarly the constructors of {{InputStreamReader}} > and {{OutputStreamWriter}}. These use the platform default encoding, which > means their behavior may change in different locales; this is undesirable in > all cases in Spark. > It makes sense to specify UTF-8 as the default everywhere; where already > specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is > a superset. > We should also consistently use {{StandardCharsets.UTF_8}} rather than > "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this. > (Finally, we should touch up the other few remaining Coverity scan items, > which are trivial, while we're here.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
[ https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13823: Assignee: Sean Owen (was: Apache Spark) > Always specify Charset in String <-> byte[] conversions (and remaining > Coverity items) > -- > > Key: SPARK-13823 > URL: https://issues.apache.org/jira/browse/SPARK-13823 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Streaming >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > Most of the remaining items from the last Coverity scan concern using, for > example, the constructor {{new String(byte[])}} or the method > {{String.getBytes()}}, or similarly the constructors of {{InputStreamReader}} > and {{OutputStreamWriter}}. These use the platform default encoding, which > means their behavior may change in different locales; this is undesirable in > all cases in Spark. > It makes sense to specify UTF-8 as the default everywhere; where already > specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is > a superset. > We should also consistently use {{StandardCharsets.UTF_8}} rather than > "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this. > (Finally, we should touch up the other few remaining Coverity scan items, > which are trivial, while we're here.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13824) Upgrade to Scala 2.11.8
Jacek Laskowski created SPARK-13824: --- Summary: Upgrade to Scala 2.11.8 Key: SPARK-13824 URL: https://issues.apache.org/jira/browse/SPARK-13824 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Jacek Laskowski Priority: Minor Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13825) Upgrade to Scala 2.11.8
Jacek Laskowski created SPARK-13825: --- Summary: Upgrade to Scala 2.11.8 Key: SPARK-13825 URL: https://issues.apache.org/jira/browse/SPARK-13825 Project: Spark Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Jacek Laskowski Priority: Minor Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13824) Upgrade to Scala 2.11.8
[ https://issues.apache.org/jira/browse/SPARK-13824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13824. --- Resolution: Duplicate > Upgrade to Scala 2.11.8 > --- > > Key: SPARK-13824 > URL: https://issues.apache.org/jira/browse/SPARK-13824 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> > http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13825) Upgrade to Scala 2.11.8
[ https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190914#comment-15190914 ] Sean Owen commented on SPARK-13825: --- Yes, I think it's OK to update the version in branch 1.6 too. > Upgrade to Scala 2.11.8 > --- > > Key: SPARK-13825 > URL: https://issues.apache.org/jira/browse/SPARK-13825 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> > http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13825) Upgrade to Scala 2.11.8
[ https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190930#comment-15190930 ] Jacek Laskowski commented on SPARK-13825: - OK. Thanks. I'm going to send a pull request later today. > Upgrade to Scala 2.11.8 > --- > > Key: SPARK-13825 > URL: https://issues.apache.org/jira/browse/SPARK-13825 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> > http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13577) Allow YARN to handle multiple jars, archive when uploading Spark dependencies
[ https://issues.apache.org/jira/browse/SPARK-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13577: -- Assignee: Marcelo Vanzin > Allow YARN to handle multiple jars, archive when uploading Spark dependencies > - > > Key: SPARK-13577 > URL: https://issues.apache.org/jira/browse/SPARK-13577 > Project: Spark > Issue Type: Sub-task > Components: YARN >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > > See parent bug for more details. > Before we remove assemblies from Spark, we need the YARN backend to > understand how to find and upload multiple jars containing the Spark code. As > a feature request made during spec review, we should also allow the Spark > code to be provided as an archive that would be uploaded as a single file to > the cluster, but exploded when downloaded to the containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13577) Allow YARN to handle multiple jars, archive when uploading Spark dependencies
[ https://issues.apache.org/jira/browse/SPARK-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13577. --- Resolution: Fixed > Allow YARN to handle multiple jars, archive when uploading Spark dependencies > - > > Key: SPARK-13577 > URL: https://issues.apache.org/jira/browse/SPARK-13577 > Project: Spark > Issue Type: Sub-task > Components: YARN >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > > See parent bug for more details. > Before we remove assemblies from Spark, we need the YARN backend to > understand how to find and upload multiple jars containing the Spark code. As > a feature request made during spec review, we should also allow the Spark > code to be provided as an archive that would be uploaded as a single file to > the cluster, but exploded when downloaded to the containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
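A sketch of how the two modes surface to users. The {{spark.yarn.jars}} and {{spark.yarn.archive}} property names are as documented for Spark 2.0; the paths are illustrative.
{code}
import org.apache.spark.SparkConf

// Spark code supplied as multiple jars already staged on the cluster FS:
val asJars = new SparkConf()
  .set("spark.yarn.jars", "hdfs:///apps/spark/jars/*.jar")

// Or as a single archive, uploaded once and exploded in each container:
val asArchive = new SparkConf()
  .set("spark.yarn.archive", "hdfs:///apps/spark/spark-libs.zip")
{code}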
[jira] [Created] (SPARK-13826) Revise ScalaDoc of the new Dataset API
Cheng Lian created SPARK-13826: -- Summary: Revise ScalaDoc of the new Dataset API Key: SPARK-13826 URL: https://issues.apache.org/jira/browse/SPARK-13826 Project: Spark Issue Type: Sub-task Components: Documentation, SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Tons of DataFrame operations were migrated to Dataset in SPARK-13244. We should revise the ScalaDoc of these APIs. The following things should be updated: - {{@since}} tag - {{@group}} tag - Example code -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
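A sketch of the tags in question, as they would appear on a Dataset method; the method shown, its body, and the group name are illustrative, not a prescribed final form.
{code}
/**
 * Returns a new Dataset that contains only the rows satisfying `condition`.
 *
 * {{{
 *   ds.filter($"age" > 15)
 * }}}
 *
 * @group typedrel
 * @since 2.0.0
 */
def filter(condition: Column): Dataset[T] = ???
{code}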
[jira] [Resolved] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13817. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11656 [https://github.com/apache/spark/pull/11656] > Re-enable MiMA check after unifying DataFrame and Dataset API > - > > Key: SPARK-13817 > URL: https://issues.apache.org/jira/browse/SPARK-13817 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the > DataFrame and Dataset APIs. Since that PR made a large number of API changes, > we temporarily disabled the MiMA check for convenience. Now that the PR is > merged, we should re-enable the MiMA check. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string
Wenchen Fan created SPARK-13827: --- Summary: Can't add subquery to an operator with same-name outputs while generate SQL string Key: SPARK-13827 URL: https://issues.apache.org/jira/browse/SPARK-13827 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
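The ticket was filed without a description. Based on the title alone, an assumed illustration of the problem (not taken from the ticket or its pull request): a self-join yields two output columns with the same name, and wrapping that plan in a subquery alias {{t}} while generating a SQL string would force both columns to be referenced as the ambiguous {{t.id}}.
{code}
// Assumed repro sketch, run from a spark-shell with sqlContext predefined:
val df = sqlContext.range(3)
val dup = df.as("a").join(df.as("b"))  // output schema: id, id
dup.printSchema()
{code}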
[jira] [Created] (SPARK-13828) QueryExecution's assertAnalyzed needs to preserve the stacktrace
Cheng Lian created SPARK-13828: -- Summary: QueryExecution's assertAnalyzed needs to preserve the stacktrace Key: SPARK-13828 URL: https://issues.apache.org/jira/browse/SPARK-13828 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian SPARK-13244 made Datasets always eagerly analyzed, and added an extra {{plan}} argument to {{AnalysisException}} to facilitate logical plan analysis debugging using {{QueryExecution.assertAnalyzed}}. (Previously we used to temporarily disable DataFrame eager analysis to report the partially analyzed plan tree.) However, the exception stack trace wasn't properly preserved. It should be added back. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
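A minimal sketch of the fix being described; the surrounding names ({{analyzer}}, {{analyzed}}) are assumed from context, and this is not the merged patch.
{code}
def assertAnalyzed(): Unit = {
  try {
    analyzer.checkAnalysis(analyzed)
  } catch {
    case e: AnalysisException =>
      // Rebuild the exception with the partially analyzed plan attached,
      // but carry over the original frames instead of a fresh stack trace.
      val withPlan = new AnalysisException(
        e.message, e.line, e.startPosition, plan = Some(analyzed))
      withPlan.setStackTrace(e.getStackTrace)
      throw withPlan
  }
}
{code}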
[jira] [Commented] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string
[ https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191022#comment-15191022 ] Apache Spark commented on SPARK-13827: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/11658 > Can't add subquery to an operator with same-name outputs while generate SQL > string > -- > > Key: SPARK-13827 > URL: https://issues.apache.org/jira/browse/SPARK-13827 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string
[ https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13827: Assignee: Apache Spark > Can't add subquery to an operator with same-name outputs while generate SQL > string > -- > > Key: SPARK-13827 > URL: https://issues.apache.org/jira/browse/SPARK-13827 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string
[ https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13827: Assignee: (was: Apache Spark) > Can't add subquery to an operator with same-name outputs while generate SQL > string > -- > > Key: SPARK-13827 > URL: https://issues.apache.org/jira/browse/SPARK-13827 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string
[ https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13827: Assignee: Apache Spark > Can't add subquery to an operator with same-name outputs while generate SQL > string > -- > > Key: SPARK-13827 > URL: https://issues.apache.org/jira/browse/SPARK-13827 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13825) Upgrade to Scala 2.11.8
[ https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13825: -- Component/s: Spark Core > Upgrade to Scala 2.11.8 > --- > > Key: SPARK-13825 > URL: https://issues.apache.org/jira/browse/SPARK-13825 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Minor > > Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> > http://www.scala-lang.org/news/2.11.8/. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13821: -- Component/s: SQL > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 fails to compile with the following error message: > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13790) Speed up ColumnVector's getDecimal
[ https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13790: -- Assignee: Nong Li > Speed up ColumnVector's getDecimal > -- > > Key: SPARK-13790 > URL: https://issues.apache.org/jira/browse/SPARK-13790 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Nong Li >Assignee: Nong Li >Priority: Minor > Fix For: 2.0.0 > > > This should reuse a decimal object for the simple case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
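A minimal sketch of the reuse idea (class shape and names assumed, not the actual patch): keep one mutable {{Decimal}} per vector and overwrite it on each call, so the common small-precision path allocates nothing.
{code}
import org.apache.spark.sql.types.Decimal

class DecimalColumn(unscaled: Array[Long]) {
  private val reused = new Decimal

  // set(unscaled, precision, scale) mutates `reused` in place and returns
  // it, avoiding a fresh Decimal allocation on every call.
  def getDecimal(rowId: Int, precision: Int, scale: Int): Decimal =
    reused.set(unscaled(rowId), precision, scale)
}
{code}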
[jira] [Updated] (SPARK-13820) TPC-DS Query 10 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13820: -- Component/s: SQL > TPC-DS Query 10 fails to compile > > > Key: SPARK-13820 > URL: https://issues.apache.org/jira/browse/SPARK-13820 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 10 fails to compile with the following error. > Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( > TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );]) > at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) > at org.antlr.runtime.DFA.predict(DFA.java:144) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177) > Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( > TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );]) > at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) > at org.antlr.runtime.DFA.predict(DFA.java:144) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177) > Query is pasted here for easy reproduction > select > cd_gender, > cd_marital_status, > cd_education_status, > count(*) cnt1, > cd_purchase_estimate, > count(*) cnt2, > cd_credit_rating, > count(*) cnt3, > cd_dep_count, > count(*) cnt4, > cd_dep_employed_count, > count(*) cnt5, > cd_dep_college_count, > count(*) cnt6 > from > customer c > JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk > JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk > LEFT SEMI JOIN (select ss_customer_sk > from store_sales >JOIN date_dim ON ss_sold_date_sk = d_date_sk > where > d_year = 2002 and > d_moy between 1 and 1+3) ss_wh1 ON c.c_customer_sk = > ss_wh1.ss_customer_sk > where > ca_county in ('Rush County','Toole County','Jefferson County','Dona Ana > County','La Porte County') and >exists ( > select tmp.customer_sk from ( > select ws_bill_customer_sk as customer_sk > from web_sales,date_dim > where > web_sales.ws_sold_date_sk = date_dim.d_date_sk and > d_year = 2002 and > d_moy between 1 and 1+3 > UNION ALL > select cs_ship_customer_sk as customer_sk > from catalog_sales,date_dim > where > catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and > d_year = 2002 and > d_moy between 1 and 1+3 > ) tmp where c.c_customer_sk = tmp.customer_sk > ) > group by cd_gender, > cd_marital_status, > cd_education_status, > cd_purchase_estimate, > cd_credit_rating, > cd_dep_count, > cd_dep_employed_count, > cd_dep_college_count > order by cd_gender, > cd_marital_status, > cd_education_status, > cd_purchase_estimate, > cd_credit_rating, > cd_dep_count, > cd_dep_employed_count, > cd_dep_college_count > limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
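The parser here rejects the {{exists (...)}} predicate. A hedged sketch of the usual rewrite, with generic table names rather than the TPC-DS schema: express the predicate as a LEFT SEMI JOIN, the same construct the query already uses for its first subquery.
{code}
// exists-form (rejected by this parser):
//   select c.* from customer c
//   where exists (select 1 from orders o where o.customer_sk = c.customer_sk)

// semi-join form (accepted), assuming a spark-shell sqlContext:
val rewritten = sqlContext.sql("""
  select c.*
  from customer c
  left semi join orders o on o.customer_sk = c.customer_sk""")
{code}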
[jira] [Updated] (SPARK-13732) Remove projectList from Windows
[ https://issues.apache.org/jira/browse/SPARK-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13732: -- Assignee: Xiao Li > Remove projectList from Windows > --- > > Key: SPARK-13732 > URL: https://issues.apache.org/jira/browse/SPARK-13732 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > projectList is useless. Remove it from the class Window. It simplifies the > code in the Analyzer and Optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13797) Eliminate Unnecessary Window
[ https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-13797: -- Assignee: Xiao Li > Eliminate Unnecessary Window > > > Key: SPARK-13797 > URL: https://issues.apache.org/jira/browse/SPARK-13797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > If the Window does not have any window expression, it is useless. It might > happen after column pruning -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
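A sketch of what such an elimination rule looks like in Catalyst; the rule name and its placement in the Optimizer are assumed for illustration.
{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Window}
import org.apache.spark.sql.catalyst.rules.Rule

object EliminateUselessWindow extends Rule[LogicalPlan] {
  // A Window that computes no window expressions contributes nothing,
  // so replace it with its child.
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case w: Window if w.windowExpressions.isEmpty => w.child
  }
}
{code}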
[jira] [Commented] (SPARK-13776) Web UI is not available after ./sbin/start-master.sh
[ https://issues.apache.org/jira/browse/SPARK-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191233#comment-15191233 ] Erik O'Shaughnessy commented on SPARK-13776: [~zsxwing] I've got your PR building, should have a test completed in an hour or so. > Web UI is not available after ./sbin/start-master.sh > > > Key: SPARK-13776 > URL: https://issues.apache.org/jira/browse/SPARK-13776 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 > Environment: Solaris 11.3, Oracle SPARC T-5 8 with 1024 hardware > threads >Reporter: Erik O'Shaughnessy >Priority: Minor > > The Apache Spark Web UI fails to become available after starting a Spark > master in stand-alone mode: > $ ./sbin/start-master.sh > The log file contains the following: > {quote} > cat spark-hadoop-org.apache.spark.deploy.master.Master-1-t5-8-002.out > Spark Command: /usr/java/bin/java -cp > /usr/local/spark-1.6.0_nohadoop/conf/:/usr/local/spark-1.6.0_nohadoop/assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.2.0.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-core-3.2.10.jar > -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-002 --port > 7077 --webui-port 8080 > > 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for > SelectChannelConnector@0.0.0.0:8080 > 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for > SelectChannelConnector@t5-8-002:6066 > {quote} > I did some poking around and it seems that message is coming from Jetty and > indicates a mismatch between Jetty's default maxThreads configuration and the > actual number of CPUs available on the hardware (1024). I was not able to > find a way to successfully change Jetty's configuration at run-time. > Our work around was to disable CPUs until the WARN messages did not occur in > the log file, which was when NCPUs = 504. > I don't know for certain that this is isn't a known problem in Jetty from > looking at their bug reports, but I wasn't able to locate a Jetty issue that > described this problem. > While not specifically an Apache Spark problem, I thought documenting it > would at least be helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster
Steve Loughran created SPARK-13829: -- Summary: Spark submit with keytab can't submit to a non-HDFS yarn cluster Key: SPARK-13829 URL: https://issues.apache.org/jira/browse/SPARK-13829 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.6.1 Environment: YARN cluster, spark-submit launched with keytab & principal, cluster filesystem is *not* HDFS. Reporter: Steve Loughran If you try to submit work to a secure YARN cluster running on any FS other than HDFS, using a keytab+principal rather than a kinit-ed user, you get a stack trace from inside {{Client.getTokenRenewalInterval}}. Root cause: there is no HDFS from which to get a delegation token, hence no delegation token to examine for a renewal interval -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster
[ https://issues.apache.org/jira/browse/SPARK-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191315#comment-15191315 ] Steve Loughran commented on SPARK-13829:
{code}
16/03/11 17:34:51 ERROR SparkContext: Error initializing SparkContext.
java.util.NoSuchElementException: head of empty list
  at scala.collection.immutable.Nil$.head(List.scala:337)
  at scala.collection.immutable.Nil$.head(List.scala:334)
  at org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:603)
  at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:632)
  at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:732)
  at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
  at com.github.ehiggs.spark.terasort.TeraGen$.main(TeraGen.scala:49)
  at com.github.ehiggs.spark.terasort.TeraGen.main(TeraGen.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
> Spark submit with keytab can't submit to a non-HDFS yarn cluster > > > Key: SPARK-13829 > URL: https://issues.apache.org/jira/browse/SPARK-13829 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.1 > Environment: YARN cluster, spark-submit launched with keytab & principal, > cluster filesystem is *not* HDFS. >Reporter: Steve Loughran > > If you try to submit work to a secure YARN cluster running on any FS other > than HDFS, using a keytab+principal rather than a kinit-ed user, you get a > stack trace from inside {{Client.getTokenRenewalInterval}}. > Root cause: there is no HDFS from which to get a delegation token, hence no > delegation token to examine for a renewal interval -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
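A minimal sketch of the failure mode and a defensive variant (the names and the fallback value are illustrative, not the Spark code): with no HDFS there are no delegation tokens, so the list of renewal intervals is empty and {{.head}} throws.
{code}
val renewalIntervals: Seq[Long] = Seq.empty  // no HDFS => no tokens

// renewalIntervals.head
//   => java.util.NoSuchElementException: head of empty list

val fallback = 24L * 60 * 60 * 1000          // illustrative default
val interval = renewalIntervals.headOption.getOrElse(fallback)
{code}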
[jira] [Commented] (SPARK-13776) Web UI is not available after ./sbin/start-master.sh
[ https://issues.apache.org/jira/browse/SPARK-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191330#comment-15191330 ] Erik O'Shaughnessy commented on SPARK-13776: Looks good. Here is the log without a conf/spark-default.conf file: {quote} Spark Command: /usr/java/bin/java -cp /home/eoshaugh/local/spark/conf/:/home/eoshaugh/local/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.2.0.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-003 --port 7077 --webui-port 8080 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 16/03/11 10:18:33 INFO Master: Started daemon with process name: 114940@t5-8-003 16/03/11 10:18:33 INFO Master: Registered signal handlers for [TERM, HUP, INT] 16/03/11 10:18:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/03/11 10:18:33 INFO SecurityManager: Changing view acls to: eoshaugh 16/03/11 10:18:33 INFO SecurityManager: Changing modify acls to: eoshaugh 16/03/11 10:18:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(eoshaugh); users with modify permissions: Set(eoshaugh) 16/03/11 10:18:34 INFO Utils: Successfully started service 'sparkMaster' on port 7077. 16/03/11 10:18:34 INFO Master: Starting Spark master at spark://t5-8-003:7077 16/03/11 10:18:34 INFO Master: Running Spark version 2.0.0-SNAPSHOT 16/03/11 10:18:34 INFO Utils: Successfully started service 'MasterUI' on port 8080. 16/03/11 10:18:34 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://10.137.232.160:8080 16/03/11 10:18:34 WARN AbstractConnector: insufficient threads configured for SelectChannelConnector@t5-8-003:6066 16/03/11 10:18:34 INFO Utils: Successfully started service on port 6066. 16/03/11 10:18:34 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066 16/03/11 10:18:35 INFO Master: I have been elected leader! 
New state: ALIVE {quote} > Web UI is not available after ./sbin/start-master.sh > > > Key: SPARK-13776 > URL: https://issues.apache.org/jira/browse/SPARK-13776 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 > Environment: Solaris 11.3, Oracle SPARC T-5 8 with 1024 hardware > threads >Reporter: Erik O'Shaughnessy >Priority: Minor > > The Apache Spark Web UI fails to become available after starting a Spark > master in stand-alone mode: > $ ./sbin/start-master.sh > The log file contains the following: > {quote} > cat spark-hadoop-org.apache.spark.deploy.master.Master-1-t5-8-002.out > Spark Command: /usr/java/bin/java -cp > /usr/local/spark-1.6.0_nohadoop/conf/:/usr/local/spark-1.6.0_nohadoop/assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.2.0.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-core-3.2.10.jar > -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-002 --port > 7077 --webui-port 8080 > > 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for > SelectChannelConnector@0.0.0.0:8080 > 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for > SelectChannelConnector@t5-8-002:6066 > {quote} > I did some poking around and it seems that message is coming from Jetty and > indicates a mismatch between Jetty's default maxThreads configuration and the > actual number of CPUs available on the hardware (1024). I was not able to > find a way to successfully change Jetty's configuration at run-time. > Our work around was to disable CPUs until the WARN messages did not occur in > the log file, which was when NCPUs = 504. > I don't know for certain that this is isn't a known problem in Jetty from > looking at their bug reports, but I wasn't able to locate a Jetty issue that > described this problem. > While not specifically an Apache Spark problem, I thought documenting it > would at least be helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
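For reference, a sketch of the kind of change being verified above (assumed shape, not the actual pull request): give Jetty an explicit, bounded thread pool so connector sizing no longer scales with the machine's 1024 hardware threads.
{code}
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Bounded pool: 200 max / 8 min threads regardless of CPU count.
val pool = new QueuedThreadPool(200, 8)
pool.setDaemon(true)
val server = new Server(pool)
{code}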
[jira] [Resolved] (SPARK-13780) SQL "incremental" build in maven is broken
[ https://issues.apache.org/jira/browse/SPARK-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-13780. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.0.0 > SQL "incremental" build in maven is broken > -- > > Key: SPARK-13780 > URL: https://issues.apache.org/jira/browse/SPARK-13780 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 2.0.0 > > > If you build Spark, and later try to build just the SQL module like this: > {code} > mvn ... -pl :spark-sql_2.11 > {code} > You end up with a nasty error: > {noformat} > [error] uncaught exception during compilation: > scala.reflect.internal.Types$TypeError > scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature > in WebUI.class refers to term servlet > in value org.jetty which is not available. > It may be completely missing from the current classpath, or the version on > {noformat} > This is because of bad interaction between shading, Scala's signature field, > and internal APIs exposing shaded classes. > The fix is simple, we just need to add an explicit dependency on the jetty > artifacts to the sql module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Sriharsha updated SPARK-13821: -- Description: TPC-DS Query 20 Fails to compile with the follwing Error Message {format} Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) {format} was: TPC-DS Query 20 Fails to compile with the follwing Error Message Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? 
) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the follwing Error Message > {format} > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.jav
[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Sriharsha updated SPARK-13821: -- Description: TPC-DS Query 20 Fails to compile with the follwing Error Message {noformat} Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) {noformat} was: TPC-DS Query 20 Fails to compile with the follwing Error Message {format} Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN ) )? 
) -> ^( TOK_SELEXPR expression ( identifier )* ) );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) {format} > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the follwing Error Message > {noformat} > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runti
[jira] [Commented] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster
[ https://issues.apache.org/jira/browse/SPARK-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191372#comment-15191372 ] Steve Loughran commented on SPARK-13829: it's a bit subtle here, as it's not clear you need a keytab+principal for long-lived work on a non-HDFS cluster unless you need tokens to talk to Hive or HBase. No HDFS ==> no hdfs delegation tokens to renew, refresh and propagate. > Spark submit with keytab can't submit to a non-HDFS yarn cluster > > > Key: SPARK-13829 > URL: https://issues.apache.org/jira/browse/SPARK-13829 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.1 > Environment: Yarn cluster, spark submit launching with keytab & > principal, cluster filesystem is *not* HDFS. >Reporter: Steve Loughran > > If you try to submit work to a secure YARN cluster running on any FS other > than HDFS, using a keytab+principal over a kinited user, you get to see a stack > trace from inside {{Client.getTokenRenewalInterval}} > root cause: there is no HDFS to get a delegation token, hence no delegation > token to examine for a renewal interval -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
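For reference, a sketch of the kind of launch that hits {{Client.getTokenRenewalInterval}}; the principal, keytab path, class and jar names below are hypothetical:
{code}
# Submission against a secure YARN cluster whose default filesystem is not
# HDFS; --principal/--keytab put the launch on the token-renewal code path.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/user.keytab \
  --class com.example.MyApp \
  myapp.jar
{code}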
[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roy Cecil updated SPARK-13821: -- Comment: was deleted (was: Query Text is select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Books', 'Home') and cs_sold_date_sk = d_date_sk and d_date between cast('1999-02-22' as date) and date_add(cast('1999-02-22' as date), 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio LIMIT 100;) > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the following Error Message > {noformat} > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roy Cecil updated SPARK-13821: -- Comment: was deleted (was: select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Books', 'Home') and cs_sold_date_sk = d_date_sk and d_date between cast('1999-02-22' as date) and date_add(cast('1999-02-22' as date), 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio LIMIT 100;) > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the following Error Message > {noformat} > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile
[ https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roy Cecil updated SPARK-13821: -- Comment: was deleted (was: Query Text is select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Books', 'Home') and cs_sold_date_sk = d_date_sk and d_date between cast('1999-02-22' as date) and date_add(cast('1999-02-22' as date), 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio LIMIT 100;) > TPC-DS Query 20 fails to compile > > > Key: SPARK-13821 > URL: https://issues.apache.org/jira/browse/SPARK-13821 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS Query 20 Fails to compile with the following Error Message > {noformat} > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( > tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( > expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA > identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) > );]) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835) > at org.antlr.runtime.DFA.predict(DFA.java:80) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13830) Fetching large direct results from executors is very slow
Davies Liu created SPARK-13830: -- Summary: Fetching large direct results from executors is very slow Key: SPARK-13830 URL: https://issues.apache.org/jira/browse/SPARK-13830 Project: Spark Issue Type: Task Components: Spark Core Reporter: Davies Liu Given two tasks with a 100+ MB result each, it takes more than 50 seconds to fetch the results. The RPC layer may not be designed to handle large blocks; we should use the block manager for that. But currently the choice is based on spark.rpc.message.maxSize, which is usually set very large (> 128M) to be safe, and that is too large for handling results. We also count the time to fetch the direct result (and deserialize it) as scheduler delay, so it also makes sense to only fetch much smaller blocks via DirectResult. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
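A minimal reproduction sketch under the stated setup (two ~100 MB task results, both small enough to come back as direct results over RPC); the application name is arbitrary:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: two tasks, each returning a ~100 MB byte array to the driver.
// With spark.rpc.message.maxSize left at a large value, both results come
// back as DirectResult over RPC rather than through the block manager.
object BigDirectResult {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("big-direct-result"))
    val results = sc.parallelize(1 to 2, numSlices = 2)
      .map(_ => Array.fill(100 * 1024 * 1024)(0.toByte))
      .collect() // the slow fetch is observed here
    println(results.map(_.length.toLong).sum)
    sc.stop()
  }
}
{code}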
[jira] [Created] (SPARK-13831) TPC-DS Query 35 fails with the following compile error
Roy Cecil created SPARK-13831: - Summary: TPC-DS Query 35 fails with the following compile error Key: SPARK-13831 URL: https://issues.apache.org/jira/browse/SPARK-13831 Project: Spark Issue Type: Bug Components: SQL Reporter: Roy Cecil TPC-DS Query 35 fails with the following compile error. Scala.NotImplementedError: scala.NotImplementedError: No parse rules for ASTNode type: 864, text: TOK_SUBQUERY_EXPR : TOK_SUBQUERY_EXPR 1, 439,797, 1370 TOK_SUBQUERY_OP 1, 439,439, 1370 exists 1, 439,439, 1370 TOK_QUERY 1, 441,797, 1508 Pasting Query 35 for easy reference. select ca_state, cd_gender, cd_marital_status, cd_dep_count, count(*) cnt1, min(cd_dep_count) cd_dep_count1, max(cd_dep_count) cd_dep_count2, avg(cd_dep_count) cd_dep_count3, cd_dep_employed_count, count(*) cnt2, min(cd_dep_employed_count) cd_dep_employed_count1, max(cd_dep_employed_count) cd_dep_employed_count2, avg(cd_dep_employed_count) cd_dep_employed_count3, cd_dep_college_count, count(*) cnt3, min(cd_dep_college_count) cd_dep_college_count1, max(cd_dep_college_count) cd_dep_college_count2, avg(cd_dep_college_count) cd_dep_college_count3 from customer c JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk LEFT SEMI JOIN (select ss_customer_sk from store_sales JOIN date_dim ON ss_sold_date_sk = d_date_sk where d_year = 2002 and d_qoy < 4) ss_wh1 ON c.c_customer_sk = ss_wh1.ss_customer_sk where exists ( select tmp.customer_sk from ( select ws_bill_customer_sk as customer_sk from web_sales,date_dim where ws_sold_date_sk = d_date_sk and d_year = 2002 and d_qoy < 4 UNION ALL select cs_ship_customer_sk as customer_sk from catalog_sales,date_dim where cs_sold_date_sk = d_date_sk and d_year = 2002 and d_qoy < 4 ) tmp where c.c_customer_sk = tmp.customer_sk ) group by ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
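Since the parser has no rule for TOK_SUBQUERY_EXPR (the EXISTS subquery) but the query already parses a LEFT SEMI JOIN for ss_wh1, one possible workaround sketch is to express the EXISTS predicate the same way. This is simplified to the customer keys and is not a verified fix:
{code}
sqlContext.sql("""
  SELECT c.c_customer_sk
  FROM customer c
  LEFT SEMI JOIN (
    SELECT ws_bill_customer_sk AS customer_sk
    FROM web_sales, date_dim
    WHERE ws_sold_date_sk = d_date_sk AND d_year = 2002 AND d_qoy < 4
    UNION ALL
    SELECT cs_ship_customer_sk AS customer_sk
    FROM catalog_sales, date_dim
    WHERE cs_sold_date_sk = d_date_sk AND d_year = 2002 AND d_qoy < 4
  ) tmp ON c.c_customer_sk = tmp.customer_sk
""")
{code}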
[jira] [Created] (SPARK-13832) TPC-DS Query 36 fails with Parser error
Roy Cecil created SPARK-13832: - Summary: TPC-DS Query 36 fails with Parser error Key: SPARK-13832 URL: https://issues.apache.org/jira/browse/SPARK-13832 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1 Reporter: Roy Cecil TPC-DS query 36 fails with the following error -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13832) TPC-DS Query 36 fails with Parser error
[ https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roy Cecil updated SPARK-13832: -- Description: TPC-DS query 36 fails with the following error Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'i_category' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) Query Text pasted here for quick reference. select sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin ,i_category ,i_class ,grouping__id as lochierarchy ,rank() over ( partition by grouping__id, case when grouping__id = 0 then i_category end order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as rank_within_parent from store_sales ,date_dim d1 ,item ,store where d1.d_year = 2001 and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and s_state in ('TN','TN','TN','TN', 'TN','TN','TN','TN') group by i_category,i_class WITH ROLLUP order by lochierarchy desc ,case when lochierarchy = 0 then i_category end ,rank_within_parent limit 100; was: TPC-DS query 36 fails with the following error > TPC-DS Query 36 fails with Parser error > --- > > Key: SPARK-13832 > URL: https://issues.apache.org/jira/browse/SPARK-13832 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Roy Cecil > > TPC-DS query 36 fails with the following error > Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed > Exception in thread "main" org.apache.spark.sql.AnalysisException: expression > 'i_category' is neither present in the group by, nor is it an aggregate > function. Add to group by or wrap in first() (or first_value) if you don't > care which value you get.; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > Query Text pasted here for quick reference. > select > sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin >,i_category >,i_class >,grouping__id as lochierarchy >,rank() over ( > partition by grouping__id, > case when grouping__id = 0 then i_category end > order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as > rank_within_parent > from > store_sales >,date_dim d1 >,item >,store > where > d1.d_year = 2001 > and d1.d_date_sk = ss_sold_date_sk > and i_item_sk = ss_item_sk > and s_store_sk = ss_store_sk > and s_state in ('TN','TN','TN','TN', > 'TN','TN','TN','TN') > group by i_category,i_class WITH ROLLUP > order by >lochierarchy desc > ,case when lochierarchy = 0 then i_category end > ,rank_within_parent > limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
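The AnalysisException itself is the generic non-grouped-column check; a toy illustration of that check (hypothetical table t with columns a, b, c), separate from the grouping__id/WITH ROLLUP interaction that actually triggers it here:
{code}
// Fails analysis the same way: 'a' is neither grouped nor aggregated.
sqlContext.sql("SELECT a, sum(b) FROM t GROUP BY c")
// Passes analysis, per the hint in the error message.
sqlContext.sql("SELECT first(a), sum(b) FROM t GROUP BY c")
{code}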
[jira] [Updated] (SPARK-13832) TPC-DS Query 36 fails with Parser error
[ https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roy Cecil updated SPARK-13832: -- Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux > TPC-DS Query 36 fails with Parser error > --- > > Key: SPARK-13832 > URL: https://issues.apache.org/jira/browse/SPARK-13832 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 > Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo) > Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 > 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Roy Cecil > > TPC-DS query 36 fails with the following error > Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed > Exception in thread "main" org.apache.spark.sql.AnalysisException: expression > 'i_category' is neither present in the group by, nor is it an aggregate > function. Add to group by or wrap in first() (or first_value) if you don't > care which value you get.; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > Query Text pasted here for quick reference. > select > sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin >,i_category >,i_class >,grouping__id as lochierarchy >,rank() over ( > partition by grouping__id, > case when grouping__id = 0 then i_category end > order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as > rank_within_parent > from > store_sales >,date_dim d1 >,item >,store > where > d1.d_year = 2001 > and d1.d_date_sk = ss_sold_date_sk > and i_item_sk = ss_item_sk > and s_store_sk = ss_store_sk > and s_state in ('TN','TN','TN','TN', > 'TN','TN','TN','TN') > group by i_category,i_class WITH ROLLUP > order by >lochierarchy desc > ,case when lochierarchy = 0 then i_category end > ,rank_within_parent > limit 100; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13328) Possible poor read performance for broadcast variables with dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13328. --- Resolution: Fixed Assignee: Nezih Yigitbasi Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Possible poor read performance for broadcast variables with dynamic resource > allocation > --- > > Key: SPARK-13328 > URL: https://issues.apache.org/jira/browse/SPARK-13328 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Nezih Yigitbasi >Assignee: Nezih Yigitbasi > Fix For: 2.0.0 > > > When dynamic resource allocation is enabled, fetching broadcast variables from > removed executors was causing job failures, and SPARK-9591 fixed this problem > by trying all locations of a block before giving up. However, the locations > of a block are retrieved only once from the driver in this process, and the > locations in this list can be stale due to dynamic resource allocation. This > situation gets worse when running on a large cluster, as the size of this > location list can be on the order of several hundred, out of which there may > be tens of stale entries. What we have observed is that with the default settings > of 3 max retries and 5s between retries (that's 15s per location) the time it > takes to read a broadcast variable can be as high as ~17m (the log below shows > the 70th failed block fetch attempt, where each attempt takes 15s) > {code} > ... > 16/02/13 01:02:27 WARN storage.BlockManager: Failed to fetch remote block > broadcast_18_piece0 from BlockManagerId(8, ip-10-178-77-38.ec2.internal, > 60675) (failed attempt 70) > ... > 16/02/13 01:02:27 INFO broadcast.TorrentBroadcast: Reading broadcast variable > 18 took 1051049 ms > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
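For scale: at 3 retries with 5s between them, each stale location costs about 15s, so the 70 failed attempts in the log account for roughly 70 x 15s = 1050s, which matches the reported 1051049 ms (~17.5 minutes).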
[jira] [Assigned] (SPARK-13830) Fetching large direct results from executors is very slow
[ https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13830: Assignee: (was: Apache Spark) > Fetching large direct results from executors is very slow > -- > > Key: SPARK-13830 > URL: https://issues.apache.org/jira/browse/SPARK-13830 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Davies Liu > > Given two tasks with a 100+ MB result each, it takes more than 50 seconds to > fetch the results. > The RPC layer may not be designed to handle large blocks; we should use the > block manager for that. But currently the choice is based on spark.rpc.message.maxSize, > which is usually set very large (> 128M) to be safe, and that is too large for handling > results. > We also count the time to fetch the direct result (and deserialize it) as > scheduler delay, so it also makes sense to only fetch much smaller blocks via > DirectResult. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13830) Fetching large direct results from executors is very slow
[ https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191511#comment-15191511 ] Apache Spark commented on SPARK-13830: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11659 > Fetching large direct results from executors is very slow > -- > > Key: SPARK-13830 > URL: https://issues.apache.org/jira/browse/SPARK-13830 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Davies Liu > > Given two tasks with a 100+ MB result each, it takes more than 50 seconds to > fetch the results. > The RPC layer may not be designed to handle large blocks; we should use the > block manager for that. But currently the choice is based on spark.rpc.message.maxSize, > which is usually set very large (> 128M) to be safe, and that is too large for handling > results. > We also count the time to fetch the direct result (and deserialize it) as > scheduler delay, so it also makes sense to only fetch much smaller blocks via > DirectResult. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13830) Fetching large direct results from executors is very slow
[ https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13830: Assignee: Apache Spark > Fetching large direct results from executors is very slow > -- > > Key: SPARK-13830 > URL: https://issues.apache.org/jira/browse/SPARK-13830 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Davies Liu >Assignee: Apache Spark > > Given two tasks with a 100+ MB result each, it takes more than 50 seconds to > fetch the results. > The RPC layer may not be designed to handle large blocks; we should use the > block manager for that. But currently the choice is based on spark.rpc.message.maxSize, > which is usually set very large (> 128M) to be safe, and that is too large for handling > results. > We also count the time to fetch the direct result (and deserialize it) as > scheduler delay, so it also makes sense to only fetch much smaller blocks via > DirectResult. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
Josh Rosen created SPARK-13833: -- Summary: Guard against race condition when re-caching spilled bytes in memory Key: SPARK-13833 URL: https://issues.apache.org/jira/browse/SPARK-13833 Project: Spark Issue Type: Improvement Components: Block Manager Reporter: Josh Rosen Assignee: Josh Rosen When reading data from the DiskStore and attempting to cache it back into the memory store, we should guard against race conditions where multiple readers are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
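A minimal sketch of the general guard being described, using hypothetical names rather than Spark's actual MemoryStore API, which is more involved:
{code}
import java.util.concurrent.ConcurrentHashMap

// Sketch: many readers may race to re-cache the same spilled block; an atomic
// putIfAbsent ensures only one copy is inserted and every reader sees that copy.
class RecacheGuard {
  private val cached = new ConcurrentHashMap[String, Array[Byte]]()

  def recache(blockId: String, bytesFromDisk: Array[Byte]): Array[Byte] = {
    val prev = cached.putIfAbsent(blockId, bytesFromDisk)
    if (prev != null) prev else bytesFromDisk
  }
}
{code}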
[jira] [Updated] (SPARK-13806) SQL round() produces incorrect results for negative values
[ https://issues.apache.org/jira/browse/SPARK-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamstra updated SPARK-13806: - Description: Round in catalyst/expressions/mathExpressions.scala appears to be untested with negative values, and it doesn't handle them correctly. There are at least two issues here: First, in the genCode for FloatType and DoubleType with _scale == 0, round() will not produce the same results as for the BigDecimal.ROUND_HALF_UP strategy used in all other cases. This is because Math.round is used for these _scale == 0 cases. For example, Math.round(-3.5) is -3, while BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. Even after this bug is fixed with something like... {code} if (${ce.value} < 0) { ${ev.value} = -1 * Math.round(-1 * ${ce.value}); } else { ${ev.value} = Math.round(${ce.value}); } {code} ...which will allow an additional test like this to succeed in MathFunctionsSuite.scala: {code} checkEvaluation(Round(-3.5D, 0), -4.0D, EmptyRow) {code} ...there still appears to be a problem on at least the checkEvalutionWithUnsafeProjection path, where failures like this are produced: {code} Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: [0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145) {code} was: Round in catalyst/expressions/mathExpressions.scala appears to be untested with negative values, and it doesn't handle them correctly. There are at least two issues here: First, in the genCode for FloatType and DoubleType with _scale == 0, round() will not produce the same results as for the BigDecimal.ROUND_HALF_UP strategy used in all other cases. This is because Math.round is used for these _scale == 0 cases. For example, Math.round(-3.5) is -3, while BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. Even after this bug is fixed with something like... {code} if (${ce.value} < 0) { ${ev.value} = -1 * Math.round(-1 * ${ce.value}); } else { ${ev.value} = Math.round(${ce.value}); } {code} ...which will allow an additional test like this to succeed in MathFunctionsSuite.scala: {code} checkEvaluation(Round(-3.5D, 0), -4.0D) {code} ...there still appears to be a problem on at least the checkEvalutionWithUnsafeProjection path, where failures like this are produced: {code} Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: [0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145) {code} > SQL round() produces incorrect results for negative values > -- > > Key: SPARK-13806 > URL: https://issues.apache.org/jira/browse/SPARK-13806 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 2.0.0 >Reporter: Mark Hamstra > > Round in catalyst/expressions/mathExpressions.scala appears to be untested > with negative values, and it doesn't handle them correctly. > There are at least two issues here: > First, in the genCode for FloatType and DoubleType with _scale == 0, round() > will not produce the same results as for the BigDecimal.ROUND_HALF_UP > strategy used in all other cases. This is because Math.round is used for > these _scale == 0 cases. For example, Math.round(-3.5) is -3, while > BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. > Even after this bug is fixed with something like... 
> {code} > if (${ce.value} < 0) { > ${ev.value} = -1 * Math.round(-1 * ${ce.value}); > } else { > ${ev.value} = Math.round(${ce.value}); > } > {code} > ...which will allow an additional test like this to succeed in > MathFunctionsSuite.scala: > {code} > checkEvaluation(Round(-3.5D, 0), -4.0D, EmptyRow) > {code} > ...there still appears to be a problem on at least the > checkEvalutionWithUnsafeProjection path, where failures like this are > produced: > {code} > Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: > [0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
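The _scale == 0 discrepancy described above is reproducible in a plain Scala REPL, without Spark:
{code}
scala> Math.round(-3.5d)        // rounds the half toward positive infinity
res0: Long = -3

scala> BigDecimal(-3.5).setScale(0, BigDecimal.RoundingMode.HALF_UP)
res1: scala.math.BigDecimal = -4
{code}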
[jira] [Commented] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191598#comment-15191598 ] Apache Spark commented on SPARK-13833: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/11660 > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Josh Rosen > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13833: Assignee: Josh Rosen (was: Apache Spark) > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Josh Rosen > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13833: Assignee: Apache Spark (was: Josh Rosen) > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Apache Spark > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13833: Assignee: Apache Spark (was: Josh Rosen) > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Apache Spark > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13833: Assignee: Josh Rosen (was: Apache Spark) > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Josh Rosen > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13834) Update sbt for 2.x
Dongjoon Hyun created SPARK-13834: - Summary: Update sbt for 2.x Key: SPARK-13834 URL: https://issues.apache.org/jira/browse/SPARK-13834 Project: Spark Issue Type: Improvement Reporter: Dongjoon Hyun Priority: Minor For 2.0.0, we had better bump `sbt`, too. {code:title=project/build.properties|borderStyle=solid} -sbt.version=0.13.9 +sbt.version=0.13.11 {code} SBT 0.13.11 fixes incorrect warnings and improves incremental compilation. *REFERENCE* https://github.com/sbt/sbt/releases/tag/v0.13.11 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
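After the bump, the pinned version can be double-checked from the repo root, assuming the usual build/sbt wrapper script:
{code}
$ ./build/sbt sbtVersion
{code}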
[jira] [Created] (SPARK-13835) IsNotNull Filters for the BinaryComparison inside Not
Xiao Li created SPARK-13835: --- Summary: IsNotNull Filters for the BinaryComparison inside Not Key: SPARK-13835 URL: https://issues.apache.org/jira/browse/SPARK-13835 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li So far, inside Not, we only generate IsNotNull Constraints for Equal. However, we can also do it for the others: LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
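A sketch of why the extended inference is sound (hypothetical table t): a comparison such as a > b evaluates to NULL when either side is NULL, and Not(NULL) is still NULL, so a row with a null a or b can never satisfy the filter.
{code}
// The optimizer could therefore plan
sqlContext.sql("SELECT * FROM t WHERE NOT (a > b)")
// as if the filter had been written
sqlContext.sql("SELECT * FROM t WHERE a IS NOT NULL AND b IS NOT NULL AND NOT (a > b)")
{code}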
[jira] [Commented] (SPARK-13834) Update sbt for 2.x
[ https://issues.apache.org/jira/browse/SPARK-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191616#comment-15191616 ] Apache Spark commented on SPARK-13834: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/11661 > Update sbt for 2.x > -- > > Key: SPARK-13834 > URL: https://issues.apache.org/jira/browse/SPARK-13834 > Project: Spark > Issue Type: Improvement >Reporter: Dongjoon Hyun >Priority: Minor > > For 2.0.0, we had better bump `sbt`, too. > {code:title=project/build.properties|borderStyle=solid} > -sbt.version=0.13.9 > +sbt.version=0.13.11 > {code} > SBT 0.13.11 fixes incorrect warnings and improves incremental compilation. > *REFERENCE* > https://github.com/sbt/sbt/releases/tag/v0.13.11 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13836) IsNotNull Constraints for the BinaryComparison inside Not
Xiao Li created SPARK-13836: --- Summary: IsNotNull Constraints for the BinaryComparison inside Not Key: SPARK-13836 URL: https://issues.apache.org/jira/browse/SPARK-13836 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li So far, inside Not, we only generate IsNotNull Constraints for Equal. However, we can also do it for the others: LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13834) Update sbt for 2.x
[ https://issues.apache.org/jira/browse/SPARK-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13834: Assignee: Apache Spark > Update sbt for 2.x > -- > > Key: SPARK-13834 > URL: https://issues.apache.org/jira/browse/SPARK-13834 > Project: Spark > Issue Type: Improvement >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > > For 2.0.0, we had better bump `sbt`, too. > {code:title=project/build.properties|borderStyle=solid} > -sbt.version=0.13.9 > +sbt.version=0.13.11 > {code} > SBT 0.13.11 fixes incorrect warnings and improves incremental compilation. > *REFERENCE* > https://github.com/sbt/sbt/releases/tag/v0.13.11 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org