[jira] [Assigned] (SPARK-13812) Fix SparkR lint-r test errors

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13812:


Assignee: Apache Spark

> Fix SparkR lint-r test errors
> -
>
> Key: SPARK-13812
> URL: https://issues.apache.org/jira/browse/SPARK-13812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>Assignee: Apache Spark
>
> After being updated from GitHub, the lintr package can detect errors that were 
> not detected by previous versions.






[jira] [Assigned] (SPARK-13812) Fix SparkR lint-r test errors

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13812:


Assignee: (was: Apache Spark)

> Fix SparkR lint-r test errors
> -
>
> Key: SPARK-13812
> URL: https://issues.apache.org/jira/browse/SPARK-13812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> After being updated from GitHub, the lintr package can detect errors that were 
> not detected by previous versions.






[jira] [Commented] (SPARK-13812) Fix SparkR lint-r test errors

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190607#comment-15190607
 ] 

Apache Spark commented on SPARK-13812:
--

User 'sun-rui' has created a pull request for this issue:
https://github.com/apache/spark/pull/11652

> Fix SparkR lint-r test errors
> -
>
> Key: SPARK-13812
> URL: https://issues.apache.org/jira/browse/SPARK-13812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> After being updated from GitHub, the lintr package can detect errors that were 
> not detected by previous versions.






[jira] [Created] (SPARK-13815) UnsupportedOperationException: empty collection when metadata for a Pipeline is an empty file

2016-03-11 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-13815:
---

 Summary: UnsupportedOperationException: empty collection when 
metadata for a Pipeline is an empty file
 Key: SPARK-13815
 URL: https://issues.apache.org/jira/browse/SPARK-13815
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 2.0.0
 Environment: today's build of 2.0.0-SNAPSHOT
Reporter: Jacek Laskowski
Priority: Minor


The following code, which loads a {{Pipeline}} from an empty {{metadata}} file, 
throws an exception (expected), but the message says nothing about the real cause.

{code}
$ ls -l hello-pipeline/metadata
-rw-r--r--  1 jacek  staff  0 11 mar 09:00 hello-pipeline/metadata

scala> Pipeline.read.load("hello-pipeline")
...
java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1344)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.first(RDD.scala:1341)
at 
org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:285)
at 
org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:253)
at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:203)
at org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:197)
{code}






[jira] [Resolved] (SPARK-13804) Spark SQL's DataFrame.count() Major Divergent (Non-Linear) Performance Slowdown going from 4million rows to 16+ million rows

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13804.
---
Resolution: Invalid

So, this started as an entirely different issue. This is better, since the 
original sounds like a duplicate of your other JIRA. This, however, should be a 
question to user@ to start. There are too many possibilities that don't mean 
there's a bug (a major one being not being able to cache the data set).
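
For reference, a minimal sketch of what caching before counting could look like on 1.6 with spark-csv (the path, options, and file layout below are hypothetical, not taken from the report):

{code}
// Hypothetical sketch for Spark 1.6 + spark-csv; path and options are illustrative.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true") // schema inference adds an extra pass over the data
  .load("/data/table_with_16m_rows.csv")

df.cache()
df.count() // first count scans the CSV files and materializes the cache
df.count() // subsequent counts read from memory and should scale far better
{code}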

> Spark SQL's DataFrame.count()  Major Divergent (Non-Linear) Performance 
> Slowdown going from 4million rows to 16+ million rows
> -
>
> Key: SPARK-13804
> URL: https://issues.apache.org/jira/browse/SPARK-13804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: - 3 nodes Spark cluster: 1 master node and 2 slave nodes
> - Each node is an EC2 with c3.4xlarge
> - Each node has 16 cores and 30GB of RAM
>Reporter: Michael Nguyen
>
> Spark SQL is used to load CSV files via com.databricks.spark.csv and then run 
> dataFrame.count().
> In the same environment, with plenty of CPU and RAM, Spark SQL takes
> - 18.25 seconds to load a table with 4 million rows, vs.
> - 346.624 seconds (5.77 minutes) to load a table with 16 million rows.
> Even though the number of rows increases by 4 times, the time it takes Spark 
> SQL to run dataFrame.count() increases by 19.22 times, so the performance of 
> dataFrame.count() diverges drastically.
> 1. Why is Spark SQL's performance not proportional to the number of rows 
> while there is plenty of CPU and RAM (it uses only 10 GB out of 30 GB of RAM)?
> 2. What can be done to fix this performance issue?






[jira] [Created] (SPARK-13816) Add parameter checks for algorithms in Graphx

2016-03-11 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-13816:


 Summary: Add parameter checks for algorithms in Graphx 
 Key: SPARK-13816
 URL: https://issues.apache.org/jira/browse/SPARK-13816
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Reporter: zhengruifeng
Priority: Trivial


Add parameter checks in the GraphX algorithms (a sketch of the intended checks follows the list):

maxIterations in Pregel
maxSteps in LabelPropagation
numIter, resetProb, tol in PageRank
maxIters, maxVal, minVal in SVDPlusPlus
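
A minimal sketch of the kind of check intended, using PageRank as an example (the wrapper below is hypothetical and purely illustrative; the actual change would add the {{require}} calls inside the algorithms themselves):

{code}
import scala.reflect.ClassTag

import org.apache.spark.graphx.Graph
import org.apache.spark.graphx.lib.PageRank

// Hypothetical wrapper, not GraphX API: validate parameters before running PageRank.
def checkedPageRank[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], numIter: Int, resetProb: Double): Graph[Double, Double] = {
  require(numIter > 0, s"numIter must be greater than 0, but got $numIter")
  require(resetProb >= 0.0 && resetProb <= 1.0,
    s"resetProb must belong to [0, 1], but got $resetProb")
  PageRank.run(graph, numIter, resetProb)
}
{code}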








[jira] [Assigned] (SPARK-13816) Add parameter checks for algorithms in Graphx

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13816:


Assignee: (was: Apache Spark)

> Add parameter checks for algorithms in Graphx 
> --
>
> Key: SPARK-13816
> URL: https://issues.apache.org/jira/browse/SPARK-13816
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX
>Reporter: zhengruifeng
>Priority: Trivial
>
> Add parameter checks in Graphx-Algorithms:
> maxIterations in Pregel 
> maxSteps in LabelPropagation
> numIter,resetProb,tol in PageRank
> maxIters,maxVal,minVal in SVDPlusPlus






[jira] [Commented] (SPARK-13816) Add parameter checks for algorithms in Graphx

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190703#comment-15190703
 ] 

Apache Spark commented on SPARK-13816:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/11655

> Add parameter checks for algorithms in Graphx 
> --
>
> Key: SPARK-13816
> URL: https://issues.apache.org/jira/browse/SPARK-13816
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX
>Reporter: zhengruifeng
>Priority: Trivial
>
> Add parameter checks in Graphx-Algorithms:
> maxIterations in Pregel 
> maxSteps in LabelPropagation
> numIter,resetProb,tol in PageRank
> maxIters,maxVal,minVal in SVDPlusPlus






[jira] [Assigned] (SPARK-13816) Add parameter checks for algorithms in Graphx

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13816:


Assignee: Apache Spark

> Add parameter checks for algorithms in Graphx 
> --
>
> Key: SPARK-13816
> URL: https://issues.apache.org/jira/browse/SPARK-13816
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Trivial
>
> Add parameter checks in Graphx-Algorithms:
> maxIterations in Pregel 
> maxSteps in LabelPropagation
> numIter,resetProb,tol in PageRank
> maxIters,maxVal,minVal in SVDPlusPlus






[jira] [Deleted] (SPARK-13553) Migrate basic inspection operations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13553:
---


> Migrate basic inspection operations
> ---
>
> Key: SPARK-13553
> URL: https://issues.apache.org/jira/browse/SPARK-13553
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Basic inspection operations
>   - dtypes
>   - columns
>   - printSchema
>   - explain
>   - Column accessors
> - col
> - apply
> {noformat}






[jira] [Deleted] (SPARK-13554) Migrate typed relational operations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13554:
---


> Migrate typed relational operations
> ---
>
> Key: SPARK-13554
> URL: https://issues.apache.org/jira/browse/SPARK-13554
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Relational operations
>   - Typed relational operations
> - as(String): Dataset[T] // Subquery
> - filter(Column): Dataset[T]
> - filter(String): Dataset[T]
> - where(Column): Dataset[T]
> - where(String): Dataset[T]
> - limit(n): Dataset[T]
> - sortWithinPartitions(String, String*): Dataset[T]
> - sortWithinPartitions(Column*): Dataset[T]
> - sort(String, String*): Dataset[T]
> - sort(Column*): Dataset[T]
> - orderBy(String, String*): Dataset[T]
> - orderBy(Column*): Dataset[T]
> - randomSplit(Array[Double], Long): Array[Dataset[T]]
> - randomSplit(Array[Double]): Array[Dataset[T]]
> - Set operations
>   - unionAll // alias of union (remove it?)
> - except // alias of subtract (remove it?)
> - Repartitioning
>   - repartition(Int, Column*): Dataset[T]
>   - repartition(Column*): Dataset[T]
> - explode[A <: Product: TypeTag](Column*)(Row => TraversableOnce[A]): 
> Dataset[A]
> - explode[A, B: TypeTag](String, String)(A => TraversableOnce[B]): 
> Dataset[B]
> {noformat}






[jira] [Deleted] (SPARK-13553) Migrate basic inspection operations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13553:
---


> Migrate basic inspection operations
> ---
>
> Key: SPARK-13553
> URL: https://issues.apache.org/jira/browse/SPARK-13553
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Basic inspection operations
>   - dtypes
>   - columns
>   - printSchema
>   - explain
>   - Column accessors
> - col
> - apply
> {noformat}






[jira] [Deleted] (SPARK-13555) Migrate untyped relational operations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13555:
---


> Migrate untyped relational operations
> -
>
> Key: SPARK-13555
> URL: https://issues.apache.org/jira/browse/SPARK-13555
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Relational operations
>   - Untyped relational operations
> - select(Column*): Dataset[Row]
> - select(String, String*): Dataset[Row]
> - selectExpr(String*): Dataset[Row]
> {noformat}






[jira] [Deleted] (SPARK-13556) Migrate untyped joins

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13556:
---


> Migrate untyped joins
> -
>
> Key: SPARK-13556
> URL: https://issues.apache.org/jira/browse/SPARK-13556
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Joins
>   - Untyped joins
> - join[U: Encoder](Dataset[U]): Dataset[Row]
> - join[U: Encoder](Dataset[U], String): Dataset[Row]
> - join[U: Encoder](Dataset[U], Seq[String]): Dataset[Row]
> - join[U: Encoder](Dataset[U], Seq[String], String): Dataset[Row]
> - join[U: Encoder](Dataset[U], Column): Dataset[Row]
> - join[U: Encoder](Dataset[U], Column, String): Dataset[Row]
> {noformat}






[jira] [Deleted] (SPARK-13557) Migrate gather-to-driver actions

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13557:
---


> Migrate gather-to-driver actions
> 
>
> Key: SPARK-13557
> URL: https://issues.apache.org/jira/browse/SPARK-13557
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Gather-to-driver actions
>   - head(Int): Array[T]
>   - head(): T
>   - first(): T
>   - collect(): Array[T]
>   - collectAsList(): java.util.List[T]
>   - take(Int): Array[T]
>   - takeAsList(Int): java.util.List[T]
> {noformat}






[jira] [Deleted] (SPARK-13558) Migrate basic GroupedDataset methods

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13558:
---


> Migrate basic GroupedDataset methods
> 
>
> Key: SPARK-13558
> URL: https://issues.apache.org/jira/browse/SPARK-13558
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Aggregations
>   - GroupedDataset
> - Support GroupType (GroupBy/GroupingSet/Rollup/Cube)
> - Untyped aggregations
>   - agg((String, String), (String, String)*): Dataset[Row]
>   - agg(Map[String, String]): Dataset[Row]
>   - agg(java.util.Map[String, String]): Dataset[Row]
>   - agg(Column, Column*): Dataset[Row]
> {noformat}






[jira] [Deleted] (SPARK-13559) Migrate common GroupedDataset aggregations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13559:
---


> Migrate common GroupedDataset aggregations
> --
>
> Key: SPARK-13559
> URL: https://issues.apache.org/jira/browse/SPARK-13559
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Aggregations
>   - GroupedDataset
> - Common untyped aggregations
>   - mean(String*): Dataset[Row]
>   - max(String*): Dataset[Row]
>   - avg(String*): Dataset[Row]
>   - min(String*): Dataset[Row]
>   - sum(String*): Dataset[Row]
> - Common typed aggregations
>   - count(): Dataset[(K, Long)]
> {noformat}






[jira] [Deleted] (SPARK-13560) Migrate GroupedDataset pivoting methods

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13560:
---


> Migrate GroupedDataset pivoting methods
> ---
>
> Key: SPARK-13560
> URL: https://issues.apache.org/jira/browse/SPARK-13560
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Aggregations
>   - GroupedDataset
> - Pivoting
>   - pivot(String): GroupedDataset[Row, V]
>   - pivot(String, Seq[Any]): GroupedDataset[Row, V]
>   - pivot(String, java.util.List[Any]): GroupedDataset[Row, V]
> {noformat}






[jira] [Created] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-13817:
--

 Summary: Re-enable MiMA check after unifying DataFrame and Dataset 
API
 Key: SPARK-13817
 URL: https://issues.apache.org/jira/browse/SPARK-13817
 Project: Spark
  Issue Type: Test
  Components: Build
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the DataFrame 
and Dataset APIs. Since this PR made a large number of API changes, we temporarily 
disabled the MiMA check for convenience. Now that it is merged, we should re-enable 
the MiMA check.






[jira] [Commented] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190744#comment-15190744
 ] 

Apache Spark commented on SPARK-13817:
--

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/11656

> Re-enable MiMA check after unifying DataFrame and Dataset API
> -
>
> Key: SPARK-13817
> URL: https://issues.apache.org/jira/browse/SPARK-13817
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the 
> DataFrame and Dataset APIs. Since this PR made a large number of API changes, we 
> temporarily disabled the MiMA check for convenience. Now that it is merged, we 
> should re-enable the MiMA check.






[jira] [Assigned] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13817:


Assignee: Cheng Lian  (was: Apache Spark)

> Re-enable MiMA check after unifying DataFrame and Dataset API
> -
>
> Key: SPARK-13817
> URL: https://issues.apache.org/jira/browse/SPARK-13817
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the 
> DataFrame and Dataset APIs. Since this PR made a large number of API changes, we 
> temporarily disabled the MiMA check for convenience. Now that it is merged, we 
> should re-enable the MiMA check.






[jira] [Assigned] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13817:


Assignee: Apache Spark  (was: Cheng Lian)

> Re-enable MiMA check after unifying DataFrame and Dataset API
> -
>
> Key: SPARK-13817
> URL: https://issues.apache.org/jira/browse/SPARK-13817
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>
> In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the 
> DataFrame and Dataset APIs. Since this PR made a large number of API changes, we 
> temporarily disabled the MiMA check for convenience. Now that it is merged, we 
> should re-enable the MiMA check.






[jira] [Deleted] (SPARK-13564) Migrate DataFrameStatFunctions to Dataset

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13564:
---


> Migrate DataFrameStatFunctions to Dataset
> -
>
> Key: SPARK-13564
> URL: https://issues.apache.org/jira/browse/SPARK-13564
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> After the migration, we should have a separate namespace {{Dataset.stat}} for 
> statistics methods, just like {{DataFrame.stat}}.






[jira] [Deleted] (SPARK-13563) Migrate DataFrameNaFunctions to Dataset

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13563:
---


> Migrate DataFrameNaFunctions to Dataset
> ---
>
> Key: SPARK-13563
> URL: https://issues.apache.org/jira/browse/SPARK-13563
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> After the migration, we should have a separate namespace {{Dataset.na}}, just 
> like {{DataFrame.na}}.






[jira] [Deleted] (SPARK-13562) Migrate Dataset typed aggregations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13562:
---


> Migrate Dataset typed aggregations
> --
>
> Key: SPARK-13562
> URL: https://issues.apache.org/jira/browse/SPARK-13562
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Aggregations
>   - Untyped aggregations (depends on GroupedDataset)
> - groupBy(Column*): GroupedDataset[Row, T]
> - groupBy(String, String*): GroupedDataset[Row, T]
> - rollup(Column*): GroupedDataset[Row, T]
> - rollup(String, String*): GroupedDataset[Row, T]
> - cube(Column*): GroupedDataset[Row, T]
> - cube(String, String*): GroupedDataset[Row, T]
> - agg((String, String), (String, String)*): Dataset[Row]
> - agg(Map[String, String]): Dataset[Row]
> - agg(java.util.Map[String, String]): Dataset[Row]
> - agg(Column, Column*): Dataset[Row]
> {noformat}






[jira] [Commented] (SPARK-13815) UnsupportedOperationException: empty collection when metadata for a Pipeline is an empty file

2016-03-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190765#comment-15190765
 ] 

Sean Owen commented on SPARK-13815:
---

This is more generally what happens when you call something like first or take 
on an empty RDD. Is it that misleading? It says you're doing something you 
can't because a collection is empty. You can add an isEmpty check or something 
and throw a more specific exception, sure.
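
For illustration, a hedged sketch of such a guard (this is not the actual DefaultParamsReader code; {{readMetadataString}} and {{path}} are hypothetical names):

{code}
import org.apache.spark.SparkContext

// Hypothetical helper: read the single-line metadata file, but fail with a clearer
// message when the file is empty instead of letting RDD.first() throw
// "UnsupportedOperationException: empty collection".
def readMetadataString(sc: SparkContext, path: String): String = {
  val metadataRDD = sc.textFile(path, 1)
  if (metadataRDD.isEmpty()) {
    throw new IllegalArgumentException(s"Metadata file at $path is empty or missing")
  }
  metadataRDD.first()
}
{code}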

> UnsupportedOperationException: empty collection when metadata for a Pipeline 
> is an empty file
> -
>
> Key: SPARK-13815
> URL: https://issues.apache.org/jira/browse/SPARK-13815
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
> Environment: today's build of 2.0.0-SNAPSHOT
>Reporter: Jacek Laskowski
>Priority: Minor
>
> The following code, which loads a {{Pipeline}} from an empty {{metadata}} file, 
> throws an exception (expected), but the message says nothing about the real cause.
> {code}
> $ ls -l hello-pipeline/metadata
> -rw-r--r--  1 jacek  staff  0 11 mar 09:00 hello-pipeline/metadata
> scala> Pipeline.read.load("hello-pipeline")
> ...
> java.lang.UnsupportedOperationException: empty collection
> at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1344)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
> at org.apache.spark.rdd.RDD.first(RDD.scala:1341)
> at 
> org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:285)
> at 
> org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:253)
> at 
> org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:203)
> at 
> org.apache.spark.ml.Pipeline$PipelineReader.load(Pipeline.scala:197)
> {code}






[jira] [Created] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch

2016-03-11 Thread yuemeng (JIRA)
yuemeng created SPARK-13818:
---

 Summary: the spark streaming job will be always processing status 
when restart elasticsearch 
 Key: SPARK-13818
 URL: https://issues.apache.org/jira/browse/SPARK-13818
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.5.0, 1.4.0, 1.3.0
Reporter: yuemeng
Priority: Blocker
 Fix For: 1.4.2, 1.5.3


We use Spark Streaming to write data into an elasticsearch-hadoop system. When we 
restart the Elasticsearch system, tasks in some jobs running at that time get the 
following error:
Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): 
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state volatile; 
cannot find node backing shards - please check whether your cluster is stable
at 
org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370)
at 
org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at 
org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at 
org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:


The batch then stays in the processing status forever, never failing or finishing, 
which may cause the resources for that batch never to be released.







[jira] [Created] (SPARK-13819) using a regexp_replace in a group by clause raises a NullPointerException

2016-03-11 Thread JIRA
Javier Pérez created SPARK-13819:


 Summary: using a regexp_replace in a group by clause raises a 
NullPointerException
 Key: SPARK-13819
 URL: https://issues.apache.org/jira/browse/SPARK-13819
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Javier Pérez


1. Start start-thriftserver.sh
2. Connect with beeline
3. Perform the following query over a table:
  SELECT t0.textsample
  FROM test t0
  ORDER BY regexp_replace(
    t0.code,
    concat('\\Q', 'a', '\\E'),
    regexp_replace(
      regexp_replace('zz', '', ''),
      '\\$',
      '\\$')) DESC;
Problem: NullPointerException

Trace:

 java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.RegExpReplace.nullSafeEval(regexpExpressions.scala:224)
at 
org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:458)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:36)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:27)
at scala.math.Ordering$class.gt(Ordering.scala:97)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.gt(ordering.scala:27)
at org.apache.spark.RangePartitioner.getPartition(Partitioner.scala:168)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)






[jira] [Commented] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch

2016-03-11 Thread yuemeng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190783#comment-15190783
 ] 

yuemeng commented on SPARK-13818:
-

The code looks like:

stream.foreachRDD { rdd =>
  val ep = esPath + getIndexName("") + "/event"
  rdd.saveToEs(ep)
}

While the Spark Streaming job is running normally, we restart Elasticsearch. The 
tasks running at that point fail, but the batch never finishes or fails. In the 
streaming web UI we can see that the job had tasks fail because of the error 
(org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state volatile; 
cannot find node backing shards - please check whether your cluster is stable), 
yet the batch stays in the processing status forever. In my opinion, if the job 
fails because of task failures, the batch's status should become finished or 
failed instead of remaining in processing.

Would anyone like to check this issue? Thanks.
[~zsxwing], can you help me check this issue?




> the spark streaming job will be always processing status when restart 
> elasticsearch 
> 
>
> Key: SPARK-13818
> URL: https://issues.apache.org/jira/browse/SPARK-13818
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: yuemeng
>Priority: Blocker
> Fix For: 1.4.2, 1.5.3
>
>
> We use Spark Streaming to write data into an elasticsearch-hadoop system. When 
> we restart the Elasticsearch system, tasks in some jobs running at that time get 
> the following error:
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): 
> org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state 
> volatile; cannot find node backing shards - please check whether your cluster 
> is stable
> at 
> org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370)
> at 
> org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425)
> at 
> org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> The batch then stays in the processing status forever, never failing or 
> finishing, which may cause the resources for that batch never to be released.






[jira] [Deleted] (SPARK-13561) Migrate Dataset untyped aggregations

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13561:
---


> Migrate Dataset untyped aggregations
> 
>
> Key: SPARK-13561
> URL: https://issues.apache.org/jira/browse/SPARK-13561
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> Should migrate the following methods and corresponding tests to Dataset:
> {noformat}
> - Aggregations
>   - Typed aggregations (depends on GroupedDataset)
> - groupBy[K: Encoder](T => K): GroupedDataset[K, T] // rename to 
> groupByKey
> - groupBy[K](MapFunction[T, K], Encoder[K]): GroupedDataset[K, T] // 
> Rename to groupByKey
> - count
> {noformat}






[jira] [Deleted] (SPARK-13565) Migrate DataFrameReader/DataFrameWriter to Dataset API

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian deleted SPARK-13565:
---


> Migrate DataFrameReader/DataFrameWriter to Dataset API
> --
>
> Key: SPARK-13565
> URL: https://issues.apache.org/jira/browse/SPARK-13565
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Cheng Lian
>
> We'd like to be able to read/write a Dataset from/to specific data sources.
> After the migration, we should have {{Dataset.read}}/{{Dataset.write}}, just 
> like {{DataFrame.read}}/{{DataFrame.write}}.






[jira] [Created] (SPARK-13820) TPC-DS Query 10 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)
Roy Cecil created SPARK-13820:
-

 Summary: TPC-DS Query 10 fails to compile
 Key: SPARK-13820
 URL: https://issues.apache.org/jira/browse/SPARK-13820
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1
 Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 
EST 2015 x86_64 x86_64 x86_64 GNU/Linux

Reporter: Roy Cecil


TPC-DS Query 10 fails to compile with the following error.

Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)
Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)

Query is pasted here for easy reproduction

 select
  cd_gender,
  cd_marital_status,
  cd_education_status,
  count(*) cnt1,
  cd_purchase_estimate,
  count(*) cnt2,
  cd_credit_rating,
  count(*) cnt3,
  cd_dep_count,
  count(*) cnt4,
  cd_dep_employed_count,
  count(*) cnt5,
  cd_dep_college_count,
  count(*) cnt6
 from
  customer c
  JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
  JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
  LEFT SEMI JOIN (select ss_customer_sk
  from store_sales
    JOIN date_dim ON ss_sold_date_sk = d_date_sk
  where
    d_year = 2002 and
    d_moy between 1 and 1+3) ss_wh1 ON c.c_customer_sk = ss_wh1.ss_customer_sk
 where
  ca_county in ('Rush County','Toole County','Jefferson County','Dona Ana County',
    'La Porte County') and
  exists (
select tmp.customer_sk from (
select ws_bill_customer_sk as customer_sk
from web_sales,date_dim
where
  web_sales.ws_sold_date_sk = date_dim.d_date_sk and
  d_year = 2002 and
  d_moy between 1 and 1+3
UNION ALL
select cs_ship_customer_sk as customer_sk
from catalog_sales,date_dim
where
  catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and
  d_year = 2002 and
  d_moy between 1 and 1+3
  ) tmp where c.c_customer_sk = tmp.customer_sk
)
 group by cd_gender,
  cd_marital_status,
  cd_education_status,
  cd_purchase_estimate,
  cd_credit_rating,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
 order by cd_gender,
  cd_marital_status,
  cd_education_status,
  cd_purchase_estimate,
  cd_credit_rating,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
  limit 100;






[jira] [Updated] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13818:
--
 Priority: Major  (was: Blocker)
Fix Version/s: (was: 1.5.3)
   (was: 1.4.2)

@yuemeng Please don't open a JIRA until you read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark  You 
should not set blocker, and it does not make sense to set fix versions. 
Further, this is an Elasticsearch issue, not Spark (at this stage at least). 
I'm going to close it.

> the spark streaming job will be always processing status when restart 
> elasticsearch 
> 
>
> Key: SPARK-13818
> URL: https://issues.apache.org/jira/browse/SPARK-13818
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: yuemeng
>
> We use Spark Streaming to write data into an elasticsearch-hadoop system. When 
> we restart the Elasticsearch system, tasks in some jobs running at that time get 
> the following error:
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): 
> org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state 
> volatile; cannot find node backing shards - please check whether your cluster 
> is stable
> at 
> org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370)
> at 
> org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425)
> at 
> org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> The batch then stays in the processing status forever, never failing or 
> finishing, which may cause the resources for that batch never to be released.






[jira] [Created] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)
Roy Cecil created SPARK-13821:
-

 Summary: TPC-DS Query 20 fails to compile
 Key: SPARK-13821
 URL: https://issues.apache.org/jira/browse/SPARK-13821
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1
 Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 
EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Roy Cecil


TPC-DS Query 20 fails to compile with the following error message.

Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)








[jira] [Updated] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13818:
--

@yuemeng Please don't open a JIRA until you read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark  You 
should not set blocker, and it does not make sense to set fix versions. 
Further, this is an Elasticsearch issue, not Spark (at this stage at least). 
I'm going to close it.

> the spark streaming job will be always processing status when restart 
> elasticsearch 
> 
>
> Key: SPARK-13818
> URL: https://issues.apache.org/jira/browse/SPARK-13818
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: yuemeng
>
> We use Spark Streaming to write data into an elasticsearch-hadoop system. When 
> we restart the Elasticsearch system, tasks in some jobs running at that time get 
> the following error:
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): 
> org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state 
> volatile; cannot find node backing shards - please check whether your cluster 
> is stable
> at 
> org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370)
> at 
> org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425)
> at 
> org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> The batch then stays in the processing status forever, never failing or 
> finishing, which may cause the resources for that batch never to be released.






[jira] [Resolved] (SPARK-13818) the spark streaming job will be always processing status when restart elasticsearch

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13818.
---
Resolution: Invalid

> the spark streaming job will be always processing status when restart 
> elasticsearch 
> 
>
> Key: SPARK-13818
> URL: https://issues.apache.org/jira/browse/SPARK-13818
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: yuemeng
>
> We use Spark Streaming to write data into an elasticsearch-hadoop system. When 
> we restart the Elasticsearch system, tasks in some jobs running at that time get 
> the following error:
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 4.0 (TID 75, CIS-store02): 
> org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state 
> volatile; cannot find node backing shards - please check whether your cluster 
> is stable
> at 
> org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:370)
> at 
> org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:425)
> at 
> org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:393)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at 
> org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> The batch then stays in the processing status forever, never failing or 
> finishing, which may cause the resources for that batch never to be released.






[jira] [Created] (SPARK-13822) Follow-ups of DataFrame/Dataset API unification

2016-03-11 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-13822:
--

 Summary: Follow-ups of DataFrame/Dataset API unification
 Key: SPARK-13822
 URL: https://issues.apache.org/jira/browse/SPARK-13822
 Project: Spark
  Issue Type: Improvement
  Components: Build, SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


This is an umbrella ticket for all follow-up work of DataFrame/Dataset API 
unification (SPARK-13244).






[jira] [Commented] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190836#comment-15190836
 ] 

Roy Cecil commented on SPARK-13821:
---

Query text is:

select i_item_id
       ,i_item_desc
       ,i_category
       ,i_class
       ,i_current_price
       ,sum(cs_ext_sales_price) as itemrevenue
       ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
          (partition by i_class) as revenueratio
 from catalog_sales
     ,item
     ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
   and d_date between cast('1999-02-22' as date)
                  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
         ,i_item_desc
         ,i_category
         ,i_class
         ,i_current_price
 order by i_category
         ,i_class
         ,i_item_id
         ,i_item_desc
         ,revenueratio
LIMIT 100;

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message.
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)






[jira] [Commented] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190835#comment-15190835
 ] 

Roy Cecil commented on SPARK-13821:
---

Query Text is
select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message:
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190837#comment-15190837
 ] 

Roy Cecil commented on SPARK-13821:
---

Query Text is
select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message:
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190839#comment-15190839
 ] 

Roy Cecil commented on SPARK-13821:
---

select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message:
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-13817:
---
Issue Type: Sub-task  (was: Test)
Parent: SPARK-13822

> Re-enable MiMA check after unifying DataFrame and Dataset API
> -
>
> Key: SPARK-13817
> URL: https://issues.apache.org/jira/browse/SPARK-13817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>
> In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the 
> DataFrame and Dataset APIs. Since this PR made tons of API changes, we 
> disabled the MiMA check temporarily for convenience. Now that it is merged, we 
> should re-enable the MiMA check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)

2016-03-11 Thread Sean Owen (JIRA)
Sean Owen created SPARK-13823:
-

 Summary: Always specify Charset in String <-> byte[] conversions 
(and remaining Coverity items)
 Key: SPARK-13823
 URL: https://issues.apache.org/jira/browse/SPARK-13823
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL, Streaming
Affects Versions: 2.0.0
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor


Most of the remaining items from the last Coverity scan concern using, for 
example, the constructor {{new String(byte[])}} or the method 
{{String.getBytes()}}, or similarly for constructors of {{InputStreamReader}} 
and {{OutputStreamWriter}}. These use the platform default encoding, which 
means their behavior may change in different locales, which is undesirable in 
all cases in Spark.

It makes sense to specify UTF-8 as the default everywhere; where already 
specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is a 
superset.

We should also consistently use {{StandardCharsets.UTF_8}} rather than "UTF-8" 
or Guava's {{Charsets.UTF_8}} to specify this.

(Finally, we should touch up the other few remaining Coverity scan items, which 
are trivial, while we're here.)
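
A minimal Scala sketch of the convention being proposed (the string values are just placeholders):

{code}
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Locale-dependent: relies on the platform default encoding.
val defaultBytes = "héllo".getBytes

// Locale-independent: always UTF-8, via the constant rather than the "UTF-8" literal.
val utf8Bytes = "héllo".getBytes(StandardCharsets.UTF_8)
val roundTrip = new String(utf8Bytes, StandardCharsets.UTF_8)
val reader    = new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)
{code}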



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13684) Possible unsafe bytesRead increment in StreamInterceptor

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13684.
---
Resolution: Duplicate

If you don't mind, I'm going to bundle this up with the resolution of all the 
remaining Coverity issues.

> Possible unsafe bytesRead increment in StreamInterceptor
> 
>
> Key: SPARK-13684
> URL: https://issues.apache.org/jira/browse/SPARK-13684
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: holdenk
>Priority: Trivial
>
> We unsafely increment a volatile (bytesRead) in a call back, if two call 
> backs are triggered we may under count bytesRead. This issue was found using 
> coverity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190850#comment-15190850
 ] 

Apache Spark commented on SPARK-13823:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11657

> Always specify Charset in String <-> byte[] conversions (and remaining 
> Coverity items)
> --
>
> Key: SPARK-13823
> URL: https://issues.apache.org/jira/browse/SPARK-13823
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Streaming
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Most of the remaining items from the last Coverity scan concern using, for 
> example, the constructor {{new String(byte[])}} or the method 
> {{String.getBytes()}}, or similarly for constructors of {{InputStreamReader}} 
> and {{OutputStreamWriter}}. These use the platform default encoding, which 
> means their behavior may change in different locales, which is undesirable 
> in all cases in Spark.
> It makes sense to specify UTF-8 as the default everywhere; where already 
> specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is 
> a superset.
> We should also consistently use {{StandardCharsets.UTF_8}} rather than 
> "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this.
> (Finally, we should touch up the other few remaining Coverity scan items, 
> which are trivial, while we're here.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13823:


Assignee: Apache Spark  (was: Sean Owen)

> Always specify Charset in String <-> byte[] conversions (and remaining 
> Coverity items)
> --
>
> Key: SPARK-13823
> URL: https://issues.apache.org/jira/browse/SPARK-13823
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Streaming
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Minor
>
> Most of the remaining items from the last Coverity scan concern using, for 
> example, the constructor {{new String(byte[])}} or the method 
> {{String.getBytes()}}, or similarly for constructors of {{InputStreamReader}} 
> and {{OutputStreamWriter}}. These use the platform default encoding, which 
> means their behavior may change in different locales, which is undesirable 
> in all cases in Spark.
> It makes sense to specify UTF-8 as the default everywhere; where already 
> specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is 
> a superset.
> We should also consistently use {{StandardCharsets.UTF_8}} rather than 
> "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this.
> (Finally, we should touch up the other few remaining Coverity scan items, 
> which are trivial, while we're here.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13823) Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13823:


Assignee: Sean Owen  (was: Apache Spark)

> Always specify Charset in String <-> byte[] conversions (and remaining 
> Coverity items)
> --
>
> Key: SPARK-13823
> URL: https://issues.apache.org/jira/browse/SPARK-13823
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Streaming
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Most of the remaining items from the last Coverity scan concern using, for 
> example, the constructor {{new String(byte[])}} or the method 
> {{String.getBytes()}}, or similarly for constructors of {{InputStreamReader}} 
> and {{OutputStreamWriter}}. These use the platform default encoding, which 
> means their behavior may change in different locales, which is undesirable 
> in all cases in Spark.
> It makes sense to specify UTF-8 as the default everywhere; where already 
> specified, it's UTF-8 in 95% of cases. A few tests set US-ASCII, but UTF-8 is 
> a superset.
> We should also consistently use {{StandardCharsets.UTF_8}} rather than 
> "UTF-8" or Guava's {{Charsets.UTF_8}} to specify this.
> (Finally, we should touch up the other few remaining Coverity scan items, 
> which are trivial, while we're here.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13824) Upgrade to Scala 2.11.8

2016-03-11 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-13824:
---

 Summary: Upgrade to Scala 2.11.8
 Key: SPARK-13824
 URL: https://issues.apache.org/jira/browse/SPARK-13824
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Jacek Laskowski
Priority: Minor


Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13825) Upgrade to Scala 2.11.8

2016-03-11 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-13825:
---

 Summary: Upgrade to Scala 2.11.8
 Key: SPARK-13825
 URL: https://issues.apache.org/jira/browse/SPARK-13825
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Jacek Laskowski
Priority: Minor


Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13824) Upgrade to Scala 2.11.8

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13824.
---
Resolution: Duplicate

> Upgrade to Scala 2.11.8
> ---
>
> Key: SPARK-13824
> URL: https://issues.apache.org/jira/browse/SPARK-13824
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
> http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13825) Upgrade to Scala 2.11.8

2016-03-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190914#comment-15190914
 ] 

Sean Owen commented on SPARK-13825:
---

Yes, I think it's OK to update the version in branch 1.6 too.

> Upgrade to Scala 2.11.8
> ---
>
> Key: SPARK-13825
> URL: https://issues.apache.org/jira/browse/SPARK-13825
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
> http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13825) Upgrade to Scala 2.11.8

2016-03-11 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190930#comment-15190930
 ] 

Jacek Laskowski commented on SPARK-13825:
-

OK. Thanks. I'm going to send a pull request later today.

> Upgrade to Scala 2.11.8
> ---
>
> Key: SPARK-13825
> URL: https://issues.apache.org/jira/browse/SPARK-13825
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
> http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13577) Allow YARN to handle multiple jars, archive when uploading Spark dependencies

2016-03-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13577:
--
Assignee: Marcelo Vanzin

> Allow YARN to handle multiple jars, archive when uploading Spark dependencies
> -
>
> Key: SPARK-13577
> URL: https://issues.apache.org/jira/browse/SPARK-13577
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>
> See parent bug for more details.
> Before we remove assemblies from Spark, we need the YARN backend to 
> understand how to find and upload multiple jars containing the Spark code. As 
> a feature request made during spec review, we should also allow the Spark 
> code to be provided as an archive that would be uploaded as a single file to 
> the cluster, but exploded when downloaded to the containers.
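
A minimal sketch of how the requested behavior might be driven from configuration; the property names here are assumptions for illustration, not confirmed keys:

{code}
import org.apache.spark.SparkConf

// Illustrative only: point YARN at a set of jars, or alternatively at a
// single archive that the containers explode locally.
val conf = new SparkConf()
  .set("spark.yarn.jars", "hdfs:///spark/jars/*")
  // .set("spark.yarn.archive", "hdfs:///spark/spark-libs.zip")
{code}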



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13577) Allow YARN to handle multiple jars, archive when uploading Spark dependencies

2016-03-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13577.
---
Resolution: Fixed

> Allow YARN to handle multiple jars, archive when uploading Spark dependencies
> -
>
> Key: SPARK-13577
> URL: https://issues.apache.org/jira/browse/SPARK-13577
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>
> See parent bug for more details.
> Before we remove assemblies from Spark, we need the YARN backend to 
> understand how to find and upload multiple jars containing the Spark code. As 
> a feature request made during spec review, we should also allow the Spark 
> code to be provided as an archive that would be uploaded as a single file to 
> the cluster, but exploded when downloaded to the containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13826) Revise ScalaDoc of the new Dataset API

2016-03-11 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-13826:
--

 Summary: Revise ScalaDoc of the new Dataset API
 Key: SPARK-13826
 URL: https://issues.apache.org/jira/browse/SPARK-13826
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


Tons of DataFrame operations were migrated to Dataset in SPARK-13244. We should 
revise the ScalaDoc of these APIs. The following things should be updated (a 
short sketch follows the list):

- {{@since}} tag
- {{@group}} tag
- Example code
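
A minimal sketch of the shape such a ScalaDoc header could take; the method, group name, and version are placeholders, not actual Dataset API docs:

{code}
/** Doubles every element of the input sequence.
  *
  * Example:
  * {{{
  *   doubled(Seq(1, 2, 3))   // List(2, 4, 6)
  * }}}
  *
  * @group transform
  * @since 2.0.0
  */
def doubled(xs: Seq[Int]): Seq[Int] = xs.map(_ * 2)
{code}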



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13817) Re-enable MiMA check after unifying DataFrame and Dataset API

2016-03-11 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-13817.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11656
[https://github.com/apache/spark/pull/11656]

> Re-enable MiMA check after unifying DataFrame and Dataset API
> -
>
> Key: SPARK-13817
> URL: https://issues.apache.org/jira/browse/SPARK-13817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
> Fix For: 2.0.0
>
>
> In [PR #11443|https://github.com/apache/spark/pull/11443], we unified the 
> DataFrame and Dataset APIs. Since this PR made tons of API changes, we 
> disabled the MiMA check temporarily for convenience. Now that it is merged, we 
> should re-enable the MiMA check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-11 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-13827:
---

 Summary: Can't add subquery to an operator with same-name outputs 
while generate SQL string
 Key: SPARK-13827
 URL: https://issues.apache.org/jira/browse/SPARK-13827
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13828) QueryExecution's assertAnalyzed needs to preserve the stacktrace

2016-03-11 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-13828:
--

 Summary: QueryExecution's assertAnalyzed needs to preserve the 
stacktrace
 Key: SPARK-13828
 URL: https://issues.apache.org/jira/browse/SPARK-13828
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


SPARK-13244 made Dataset always eagerly analyzed, and added an extra {{plan}} 
argument to {{AnalysisException}} to facilitate logical plan analysis debugging 
using {{QueryExecution.assertAnalyzed}}. (Previously we used to temporarily 
disable DataFrame eager analysis to report the partially analyzed plan tree.) 
However, the exception stack trace wasn't properly preserved. It should be 
added back.
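
A minimal sketch of the pattern the fix needs, using plain exceptions rather than Spark's actual {{AnalysisException}} signature:

{code}
// Illustrative only: wrap a failure with extra context (here, a plan string)
// while preserving the original stack trace via Throwable.setStackTrace.
def assertAnalyzed(planDescription: String)(analysis: => Unit): Unit =
  try analysis catch {
    case e: RuntimeException =>
      val wrapped = new RuntimeException(s"${e.getMessage}; plan: $planDescription")
      wrapped.setStackTrace(e.getStackTrace)  // keep the original failure location
      throw wrapped
  }
{code}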



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191022#comment-15191022
 ] 

Apache Spark commented on SPARK-13827:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/11658

> Can't add subquery to an operator with same-name outputs while generate SQL 
> string
> --
>
> Key: SPARK-13827
> URL: https://issues.apache.org/jira/browse/SPARK-13827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13827:


Assignee: Apache Spark

> Can't add subquery to an operator with same-name outputs while generate SQL 
> string
> --
>
> Key: SPARK-13827
> URL: https://issues.apache.org/jira/browse/SPARK-13827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13827:


Assignee: (was: Apache Spark)

> Can't add subquery to an operator with same-name outputs while generate SQL 
> string
> --
>
> Key: SPARK-13827
> URL: https://issues.apache.org/jira/browse/SPARK-13827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13827) Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13827:


Assignee: Apache Spark

> Can't add subquery to an operator with same-name outputs while generate SQL 
> string
> --
>
> Key: SPARK-13827
> URL: https://issues.apache.org/jira/browse/SPARK-13827
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13825) Upgrade to Scala 2.11.8

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13825:
--
Component/s: Spark Core

> Upgrade to Scala 2.11.8
> ---
>
> Key: SPARK-13825
> URL: https://issues.apache.org/jira/browse/SPARK-13825
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Scala 2.11.8 is out so...time to upgrade before 2.0.0 is out -> 
> http://www.scala-lang.org/news/2.11.8/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13821:
--
Component/s: SQL

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message:
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13790) Speed up ColumnVector's getDecimal

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13790:
--
Assignee: Nong Li

> Speed up ColumnVector's getDecimal
> --
>
> Key: SPARK-13790
> URL: https://issues.apache.org/jira/browse/SPARK-13790
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Nong Li
>Priority: Minor
> Fix For: 2.0.0
>
>
> This should reuse a decimal object for the simple case.
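
A minimal sketch of the reuse pattern being suggested; the classes here are illustrative stand-ins, not the actual {{ColumnVector}} internals:

{code}
// Illustrative only: one mutable holder per reader, reset on every call,
// instead of allocating a fresh decimal object for every row.
final class MutableDecimal(var unscaled: Long = 0L, var scale: Int = 0) {
  def set(u: Long, s: Int): MutableDecimal = { unscaled = u; scale = s; this }
  override def toString: String = java.math.BigDecimal.valueOf(unscaled, scale).toString
}

final class DecimalColumn(values: Array[Long], scale: Int) {
  private val reused = new MutableDecimal()          // allocated once

  /** Returns the decimal at rowId, reusing the same holder on every call. */
  def getDecimal(rowId: Int): MutableDecimal = reused.set(values(rowId), scale)
}
{code}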



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13820) TPC-DS Query 10 fails to compile

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13820:
--
Component/s: SQL

> TPC-DS Query 10 fails to compile
> 
>
> Key: SPARK-13820
> URL: https://issues.apache.org/jira/browse/SPARK-13820
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 10 fails to compile with the following error.
> Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
> TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)
> Parsing error: KW_SELECT )=> ( KW_EXISTS subQueryExpression ) -> ^( 
> TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8155)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9177)
> Query is pasted here for easy reproduction
>  select
>   cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   count(*) cnt1,
>   cd_purchase_estimate,
>   count(*) cnt2,
>   cd_credit_rating,
>   count(*) cnt3,
>   cd_dep_count,
>   count(*) cnt4,
>   cd_dep_employed_count,
>   count(*) cnt5,
>   cd_dep_college_count,
>   count(*) cnt6
>  from
>   customer c
>   JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
>   JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
>   LEFT SEMI JOIN (select ss_customer_sk
>   from store_sales
>JOIN date_dim ON ss_sold_date_sk = d_date_sk
>   where
> d_year = 2002 and
> d_moy between 1 and 1+3) ss_wh1 ON c.c_customer_sk = 
> ss_wh1.ss_customer_sk
>  where
>   ca_county in ('Rush County','Toole County','Jefferson County','Dona Ana 
> County','La Porte County') and
>exists (
> select tmp.customer_sk from (
> select ws_bill_customer_sk as customer_sk
> from web_sales,date_dim
> where
>   web_sales.ws_sold_date_sk = date_dim.d_date_sk and
>   d_year = 2002 and
>   d_moy between 1 and 1+3
> UNION ALL
> select cs_ship_customer_sk as customer_sk
> from catalog_sales,date_dim
> where
>   catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and
>   d_year = 2002 and
>   d_moy between 1 and 1+3
>   ) tmp where c.c_customer_sk = tmp.customer_sk
> )
>  group by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
>  order by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
>   limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13732) Remove projectList from Windows

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13732:
--
Assignee: Xiao Li

> Remove projectList from Windows
> ---
>
> Key: SPARK-13732
> URL: https://issues.apache.org/jira/browse/SPARK-13732
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> projectList is useless. Remove it from the class Window. It simplifies the 
> codes in Analyzer and Optimizer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13797) Eliminate Unnecessary Window

2016-03-11 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13797:
--
Assignee: Xiao Li

> Eliminate Unnecessary Window
> 
>
> Key: SPARK-13797
> URL: https://issues.apache.org/jira/browse/SPARK-13797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> If the Window does not have any window expression, it is useless. It might 
> happen after column pruning
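
A minimal sketch of what such a rule could look like in the optimizer, simplified rather than copied from the actual patch:

{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Window}
import org.apache.spark.sql.catalyst.rules.Rule

// A Window operator with no window expressions computes nothing extra,
// so it can be replaced by its child.
object EliminateEmptyWindow extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case w: Window if w.windowExpressions.isEmpty => w.child
  }
}
{code}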



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13776) Web UI is not available after ./sbin/start-master.sh

2016-03-11 Thread Erik O'Shaughnessy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191233#comment-15191233
 ] 

Erik O'Shaughnessy commented on SPARK-13776:


[~zsxwing] I've got your PR building, should have a test completed in an hour 
or so. 

> Web UI is not available after ./sbin/start-master.sh
> 
>
> Key: SPARK-13776
> URL: https://issues.apache.org/jira/browse/SPARK-13776
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
> Environment: Solaris 11.3, Oracle SPARC T-5 8 with 1024 hardware 
> threads
>Reporter: Erik O'Shaughnessy
>Priority: Minor
>
> The Apache Spark Web UI fails to become available after starting a Spark 
> master in stand-alone mode:
> $ ./sbin/start-master.sh
> The log file contains the following:
> {quote}
> cat spark-hadoop-org.apache.spark.deploy.master.Master-1-t5-8-002.out
> Spark Command: /usr/java/bin/java -cp 
> /usr/local/spark-1.6.0_nohadoop/conf/:/usr/local/spark-1.6.0_nohadoop/assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.2.0.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-core-3.2.10.jar
>  -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-002 --port 
> 7077 --webui-port 8080
> 
> 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for 
> SelectChannelConnector@0.0.0.0:8080
> 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for 
> SelectChannelConnector@t5-8-002:6066
> {quote}
> I did some poking around and it seems that message is coming from Jetty and 
> indicates a mismatch between Jetty's default maxThreads configuration and the 
> actual number of CPUs available on the hardware (1024). I was not able to 
> find a way to successfully change Jetty's configuration at run-time. 
> Our workaround was to disable CPUs until the WARN messages did not occur in 
> the log file, which was when NCPUs = 504. 
> I don't know for certain that this isn't a known problem in Jetty from 
> looking at their bug reports, but I wasn't able to locate a Jetty issue that 
> described this problem.
> While not specifically an Apache Spark problem, I thought documenting it 
> would at least be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster

2016-03-11 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-13829:
--

 Summary: Spark submit with keytab can't submit to a non-HDFS yarn 
cluster
 Key: SPARK-13829
 URL: https://issues.apache.org/jira/browse/SPARK-13829
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.6.1
 Environment: Yarn cluster, spark submit launching with keytab & 
principal, cluster filesystem is *not* HDFS.
Reporter: Steve Loughran


If you try to submit work to a secure YARN cluster running on any FS other than 
HDFS, using a keytab + principal rather than a kinited user, you get a stack 
trace from inside {{Client.getTokenRenewalInterval}}.

Root cause: there is no HDFS from which to obtain a delegation token, hence no 
delegation token to examine for a renewal interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster

2016-03-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191315#comment-15191315
 ] 

Steve Loughran commented on SPARK-13829:


{code}
16/03/11 17:34:51 ERROR SparkContext: Error initializing 
SparkContext.java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:337)
at scala.collection.immutable.Nil$.head(List.scala:334)
at 
org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:603)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:632)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:732)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.(SparkContext.scala:530)
at com.github.ehiggs.spark.terasort.TeraGen$.main(TeraGen.scala:49)
at com.github.ehiggs.spark.terasort.TeraGen.main(TeraGen.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
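
The {{Nil.head}} frames above suggest the renewal-interval lookup assumes at least one HDFS delegation token. A minimal sketch of a defensive variant (names and types are illustrative, not the actual {{Client}} code):

{code}
// Illustrative only: fall back to a default when the default filesystem
// issued no delegation tokens (e.g. a non-HDFS cluster filesystem).
def tokenRenewalInterval(renewalTimes: Seq[Long], defaultInterval: Long): Long =
  renewalTimes.reduceLeftOption(math.min).getOrElse(defaultInterval)
{code}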

> Spark submit with keytab can't submit to a non-HDFS yarn cluster
> 
>
> Key: SPARK-13829
> URL: https://issues.apache.org/jira/browse/SPARK-13829
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.1
> Environment: Yarn cluster, spark submit launching with keytab & 
> principal, cluster filesystem is *not* HDFS.
>Reporter: Steve Loughran
>
> If you try to submit work to a secure YARN cluster running on any FS other 
> than HDFS, using a keytab + principal rather than a kinited user, you get a 
> stack trace from inside {{Client.getTokenRenewalInterval}}.
> Root cause: there is no HDFS from which to obtain a delegation token, hence no 
> delegation token to examine for a renewal interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13776) Web UI is not available after ./sbin/start-master.sh

2016-03-11 Thread Erik O'Shaughnessy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191330#comment-15191330
 ] 

Erik O'Shaughnessy commented on SPARK-13776:


Looks good. Here is the log without a conf/spark-defaults.conf file:

{quote}
Spark Command: /usr/java/bin/java -cp 
/home/eoshaugh/local/spark/conf/:/home/eoshaugh/local/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.2.0.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/home/eoshaugh/local/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar
 -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-003 --port 7077 
--webui-port 8080

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/11 10:18:33 INFO Master: Started daemon with process name: 114940@t5-8-003
16/03/11 10:18:33 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/03/11 10:18:33 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
16/03/11 10:18:33 INFO SecurityManager: Changing view acls to: eoshaugh
16/03/11 10:18:33 INFO SecurityManager: Changing modify acls to: eoshaugh
16/03/11 10:18:33 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(eoshaugh); users 
with modify permissions: Set(eoshaugh)
16/03/11 10:18:34 INFO Utils: Successfully started service 'sparkMaster' on 
port 7077.
16/03/11 10:18:34 INFO Master: Starting Spark master at spark://t5-8-003:7077
16/03/11 10:18:34 INFO Master: Running Spark version 2.0.0-SNAPSHOT
16/03/11 10:18:34 INFO Utils: Successfully started service 'MasterUI' on port 
8080.
16/03/11 10:18:34 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started 
at http://10.137.232.160:8080
16/03/11 10:18:34 WARN AbstractConnector: insufficient threads configured for 
SelectChannelConnector@t5-8-003:6066
16/03/11 10:18:34 INFO Utils: Successfully started service on port 6066.
16/03/11 10:18:34 INFO StandaloneRestServer: Started REST server for submitting 
applications on port 6066
16/03/11 10:18:35 INFO Master: I have been elected leader! New state: ALIVE
{quote}

> Web UI is not available after ./sbin/start-master.sh
> 
>
> Key: SPARK-13776
> URL: https://issues.apache.org/jira/browse/SPARK-13776
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
> Environment: Solaris 11.3, Oracle SPARC T-5 8 with 1024 hardware 
> threads
>Reporter: Erik O'Shaughnessy
>Priority: Minor
>
> The Apache Spark Web UI fails to become available after starting a Spark 
> master in stand-alone mode:
> $ ./sbin/start-master.sh
> The log file contains the following:
> {quote}
> cat spark-hadoop-org.apache.spark.deploy.master.Master-1-t5-8-002.out
> Spark Command: /usr/java/bin/java -cp 
> /usr/local/spark-1.6.0_nohadoop/conf/:/usr/local/spark-1.6.0_nohadoop/assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.2.0.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.6.0_nohadoop/lib_managed/jars/datanucleus-core-3.2.10.jar
>  -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip t5-8-002 --port 
> 7077 --webui-port 8080
> 
> 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for 
> SelectChannelConnector@0.0.0.0:8080
> 16/01/27 12:00:42 WARN AbstractConnector: insufficient threads configured for 
> SelectChannelConnector@t5-8-002:6066
> {quote}
> I did some poking around and it seems that message is coming from Jetty and 
> indicates a mismatch between Jetty's default maxThreads configuration and the 
> actual number of CPUs available on the hardware (1024). I was not able to 
> find a way to successfully change Jetty's configuration at run-time. 
> Our workaround was to disable CPUs until the WARN messages did not occur in 
> the log file, which was when NCPUs = 504. 
> I don't know for certain that this isn't a known problem in Jetty from 
> looking at their bug reports, but I wasn't able to locate a Jetty issue that 
> described this problem.
> While not specifically an Apache Spark problem, I thought documenting it 
> would at least be helpful.
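
A minimal sketch of the kind of sizing that avoids the warning, assuming Jetty 9's {{QueuedThreadPool}} and {{Server}} constructors; the formula is illustrative, not Spark's actual connector setup:

{code}
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Acceptor/selector threads scale with the core count, so on a 1024-thread
// host the pool must be sized well above Jetty's fixed default maximum.
val cores      = Runtime.getRuntime.availableProcessors()
val maxThreads = math.max(200, cores / 2)      // illustrative formula only
val server     = new Server(new QueuedThreadPool(maxThreads))
{code}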



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13780) SQL "incremental" build in maven is broken

2016-03-11 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-13780.

   Resolution: Fixed
 Assignee: Marcelo Vanzin
Fix Version/s: 2.0.0

> SQL "incremental" build in maven is broken
> --
>
> Key: SPARK-13780
> URL: https://issues.apache.org/jira/browse/SPARK-13780
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.0.0
>
>
> If you build Spark, and later try to build just the SQL module like this:
> {code}
> mvn ... -pl :spark-sql_2.11
> {code}
> You end up with a nasty error:
> {noformat}
> [error] uncaught exception during compilation: 
> scala.reflect.internal.Types$TypeError
> scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature 
> in WebUI.class refers to term servlet
> in value org.jetty which is not available.
> It may be completely missing from the current classpath, or the version on
> {noformat}
> This is because of a bad interaction between shading, Scala's signature field, 
> and internal APIs exposing shaded classes.
> The fix is simple: we just need to add an explicit dependency on the Jetty 
> artifacts to the sql module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Ram Sriharsha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Sriharsha updated SPARK-13821:
--
Description: 
TPC-DS Query 20 fails to compile with the following error message:
{format}
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)

{format}

  was:
TPC-DS Query 20 fails to compile with the following error message:

Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)




> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message
> {format}
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.jav

[jira] [Updated] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Ram Sriharsha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Sriharsha updated SPARK-13821:
--
Description: 
TPC-DS Query 20 fails to compile with the following error message
{noformat}
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)

{noformat}

  was:
TPC-DS Query 20 fails to compile with the following error message
{format}
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( tableAllColumns 
)=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( expression ( ( ( 
KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA identifier )* RPAREN 
) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) );])
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)

{format}


> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message
> {noformat}
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runti

[jira] [Commented] (SPARK-13829) Spark submit with keytab can't submit to a non-HDFS yarn cluster

2016-03-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191372#comment-15191372
 ] 

Steve Loughran commented on SPARK-13829:


It's a bit subtle here, as it's not clear you need a keytab+principal for 
long-lived work on a non-HDFS cluster unless you need tokens to talk to Hive or 
HBase. No HDFS ==> no HDFS delegation tokens to renew, refresh and propagate.



> Spark submit with keytab can't submit to a non-HDFS yarn cluster
> 
>
> Key: SPARK-13829
> URL: https://issues.apache.org/jira/browse/SPARK-13829
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.1
> Environment: Yarn cluster, spark submit launching with keytab& 
> principal, cluster filesystem is *not* HDFS.
>Reporter: Steve Loughran
>
> If you try to submit work to a secure YARN cluster running on any FS other 
> than HDFS, using a keytab+principal over kinited user, you get to see a stack 
> trace from inside {{Client.getTokenRenewalInterval}}
> root cause: there is no HDFS to get a delegation token, hence no delegation 
> token to examine for a renewal interval



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roy Cecil updated SPARK-13821:
--
Comment: was deleted

(was: Query Text is
select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;)

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message
> {noformat}
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roy Cecil updated SPARK-13821:
--
Comment: was deleted

(was: select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;)

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message
> {noformat}
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-13821) TPC-DS Query 20 fails to compile

2016-03-11 Thread Roy Cecil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roy Cecil updated SPARK-13821:
--
Comment: was deleted

(was: Query Text is
select i_item_id
   ,i_item_desc
   ,i_category
   ,i_class
   ,i_current_price
   ,sum(cs_ext_sales_price) as itemrevenue
   ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from   catalog_sales
 ,item
 ,date_dim
 where cs_item_sk = i_item_sk
   and i_category in ('Sports', 'Books', 'Home')
   and cs_sold_date_sk = d_date_sk
 and d_date between cast('1999-02-22' as date)
  and date_add(cast('1999-02-22' as date), 30)
 group by i_item_id
 ,i_item_desc
 ,i_category
 ,i_class
 ,i_current_price
 order by i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
LIMIT 100;)

> TPC-DS Query 20 fails to compile
> 
>
> Key: SPARK-13821
> URL: https://issues.apache.org/jira/browse/SPARK-13821
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS Query 20 fails to compile with the following error message
> {noformat}
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> Parsing error: NoViableAltException(10@[127:1: selectItem : ( ( 
> tableAllColumns )=> tableAllColumns -> ^( TOK_SELEXPR tableAllColumns ) | ( 
> expression ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN identifier ( COMMA 
> identifier )* RPAREN ) )? ) -> ^( TOK_SELEXPR expression ( identifier )* ) 
> );])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA17.specialStateTransition(HiveParser_SelectClauseParser.java:11835)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2853)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1401)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13830) Fetch large directly result from executor is very slow

2016-03-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-13830:
--

 Summary: Fetch large directly result from executor is very slow
 Key: SPARK-13830
 URL: https://issues.apache.org/jira/browse/SPARK-13830
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Davies Liu


Given two tasks with 100+ MB results each, it takes more than 50 seconds to fetch 
the results.

The RPC layer may not be designed to handle large blocks; we should use the block 
manager for that. But currently the cutoff is based on spark.rpc.message.maxSize, 
which is usually set very large (> 128 MB) to be safe, and that is too large for 
handling results.

We also count the time to fetch (and deserialize) the direct result as scheduler 
delay, so it also makes sense to fetch only much smaller blocks via DirectResult.
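
As a rough, self-contained sketch of the idea above (the names and the 1 MB 
cutoff below are made up for illustration and are not Spark's actual 
Executor/TaskResult API): results above a small threshold would be written to 
the block manager and fetched as a block, while only small results travel 
inline over RPC.

{code}
// Illustrative sketch only: models the routing decision argued for above.
// The threshold name and helper types are hypothetical, not Spark internals.
object ResultRoutingSketch {
  // Assumed cutoff: much smaller than the RPC max message size (e.g. 128 MB).
  val maxDirectResultSize: Long = 1L * 1024 * 1024 // hypothetical 1 MB cutoff

  sealed trait ResultRoute
  case object Direct extends ResultRoute          // sent inline with the status update
  case object ViaBlockManager extends ResultRoute // stored as a block, fetched separately

  def route(serializedResultSize: Long): ResultRoute =
    if (serializedResultSize > maxDirectResultSize) ViaBlockManager else Direct

  def main(args: Array[String]): Unit = {
    println(route(100L * 1024 * 1024)) // a 100+ MB result, as in the report -> ViaBlockManager
    println(route(64L * 1024))         // small result -> Direct
  }
}
{code}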



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13831) TPC-DS Query 35 fails with the following compile error

2016-03-11 Thread Roy Cecil (JIRA)
Roy Cecil created SPARK-13831:
-

 Summary: TPC-DS Query 35 fails with the following compile error
 Key: SPARK-13831
 URL: https://issues.apache.org/jira/browse/SPARK-13831
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Roy Cecil


TPC-DS Query 35 fails with the following compile error.

Scala.NotImplementedError: 
scala.NotImplementedError: No parse rules for ASTNode type: 864, text: 
TOK_SUBQUERY_EXPR :
TOK_SUBQUERY_EXPR 1, 439,797, 1370
  TOK_SUBQUERY_OP 1, 439,439, 1370
exists 1, 439,439, 1370
  TOK_QUERY 1, 441,797, 1508

Pasting Query 35 for easy reference.
select
  ca_state,
  cd_gender,
  cd_marital_status,
  cd_dep_count,
  count(*) cnt1,
  min(cd_dep_count) cd_dep_count1,
  max(cd_dep_count) cd_dep_count2,
  avg(cd_dep_count) cd_dep_count3,
  cd_dep_employed_count,
  count(*) cnt2,
  min(cd_dep_employed_count) cd_dep_employed_count1,
  max(cd_dep_employed_count) cd_dep_employed_count2,
  avg(cd_dep_employed_count) cd_dep_employed_count3,
  cd_dep_college_count,
  count(*) cnt3,
  min(cd_dep_college_count) cd_dep_college_count1,
  max(cd_dep_college_count) cd_dep_college_count2,
  avg(cd_dep_college_count) cd_dep_college_count3
 from
  customer c
  JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
  JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
  LEFT SEMI JOIN
  (select ss_customer_sk
  from store_sales
   JOIN date_dim ON ss_sold_date_sk = d_date_sk
  where
d_year = 2002 and
d_qoy < 4) ss_wh1
  ON c.c_customer_sk = ss_wh1.ss_customer_sk
 where
   exists (
select tmp.customer_sk from (
select ws_bill_customer_sk  as customer_sk
from web_sales,date_dim
where
  ws_sold_date_sk = d_date_sk and
  d_year = 2002 and
  d_qoy < 4
   UNION ALL
select cs_ship_customer_sk  as customer_sk
from catalog_sales,date_dim
where
  cs_sold_date_sk = d_date_sk and
  d_year = 2002 and
  d_qoy < 4
  ) tmp where c.c_customer_sk = tmp.customer_sk
)
 group by ca_state,
  cd_gender,
  cd_marital_status,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
 order by ca_state,
  cd_gender,
  cd_marital_status,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
 limit 100;

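Since the failure points at the TOK_SUBQUERY_EXPR node produced by the 
exists (...) predicate, one possible workaround (a sketch only, not a verified 
fix for this ticket) is to express the same semi-join without a correlated 
subquery, mirroring the LEFT SEMI JOIN already used for ss_wh1:

{code}
-- Sketch: replace the exists(...) predicate with a second LEFT SEMI JOIN,
-- keeping the union of web_sales/catalog_sales customer keys as the right side.
-- (Fragment only; the rest of the query stays as written above.)
 ...
 LEFT SEMI JOIN (
   select ws_bill_customer_sk as customer_sk
   from web_sales, date_dim
   where ws_sold_date_sk = d_date_sk and d_year = 2002 and d_qoy < 4
   UNION ALL
   select cs_ship_customer_sk as customer_sk
   from catalog_sales, date_dim
   where cs_sold_date_sk = d_date_sk and d_year = 2002 and d_qoy < 4
 ) tmp
 ON c.c_customer_sk = tmp.customer_sk
 ...
{code}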



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13832) TPC-DS Query 36 fails with Parser error

2016-03-11 Thread Roy Cecil (JIRA)
Roy Cecil created SPARK-13832:
-

 Summary: TPC-DS Query 36 fails with Parser error
 Key: SPARK-13832
 URL: https://issues.apache.org/jira/browse/SPARK-13832
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: Roy Cecil


TPC-DS query 36 fails with the following error




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13832) TPC-DS Query 36 fails with Parser error

2016-03-11 Thread Roy Cecil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roy Cecil updated SPARK-13832:
--
Description: 
TPC-DS query 36 fails with the following error
Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed
Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 
'i_category' is neither present in the group by, nor is it an aggregate 
function. Add to group by or wrap in first() (or first_value) if you don't care 
which value you get.;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)

Query Text pasted here for quick reference.
  select
sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin
   ,i_category
   ,i_class
   ,grouping__id as lochierarchy
   ,rank() over (
partition by grouping__id,
case when grouping__id = 0 then i_category end
order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as 
rank_within_parent
 from
store_sales
   ,date_dim   d1
   ,item
   ,store
 where
d1.d_year = 2001
 and d1.d_date_sk = ss_sold_date_sk
 and i_item_sk  = ss_item_sk
 and s_store_sk  = ss_store_sk
 and s_state in ('TN','TN','TN','TN',
 'TN','TN','TN','TN')
 group by i_category,i_class WITH ROLLUP
 order by
   lochierarchy desc
  ,case when lochierarchy = 0 then i_category end
  ,rank_within_parent
limit 100;


  was:
TPC-DS query 36 fails with the following error



> TPC-DS Query 36 fails with Parser error
> ---
>
> Key: SPARK-13832
> URL: https://issues.apache.org/jira/browse/SPARK-13832
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Roy Cecil
>
> TPC-DS query 36 fails with the following error
> Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed
> Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 
> 'i_category' is neither present in the group by, nor is it an aggregate 
> function. Add to group by or wrap in first() (or first_value) if you don't 
> care which value you get.;
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
> Query Text pasted here for quick reference.
>   select
> sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin
>,i_category
>,i_class
>,grouping__id as lochierarchy
>,rank() over (
> partition by grouping__id,
> case when grouping__id = 0 then i_category end
> order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as 
> rank_within_parent
>  from
> store_sales
>,date_dim   d1
>,item
>,store
>  where
> d1.d_year = 2001
>  and d1.d_date_sk = ss_sold_date_sk
>  and i_item_sk  = ss_item_sk
>  and s_store_sk  = ss_store_sk
>  and s_state in ('TN','TN','TN','TN',
>  'TN','TN','TN','TN')
>  group by i_category,i_class WITH ROLLUP
>  order by
>lochierarchy desc
>   ,case when lochierarchy = 0 then i_category end
>   ,rank_within_parent
> limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13832) TPC-DS Query 36 fails with Parser error

2016-03-11 Thread Roy Cecil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roy Cecil updated SPARK-13832:
--
Environment: 
Red Hat Enterprise Linux Server release 7.1 (Maipo)
Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 
EST 2015 x86_64 x86_64 x86_64 GNU/Linux

> TPC-DS Query 36 fails with Parser error
> ---
>
> Key: SPARK-13832
> URL: https://issues.apache.org/jira/browse/SPARK-13832
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Roy Cecil
>
> TPC-DS query 36 fails with the following error
> Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed
> Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 
> 'i_category' is neither present in the group by, nor is it an aggregate 
> function. Add to group by or wrap in first() (or first_value) if you don't 
> care which value you get.;
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
> Query Text pasted here for quick reference.
>   select
> sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin
>,i_category
>,i_class
>,grouping__id as lochierarchy
>,rank() over (
> partition by grouping__id,
> case when grouping__id = 0 then i_category end
> order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as 
> rank_within_parent
>  from
> store_sales
>,date_dim   d1
>,item
>,store
>  where
> d1.d_year = 2001
>  and d1.d_date_sk = ss_sold_date_sk
>  and i_item_sk  = ss_item_sk
>  and s_store_sk  = ss_store_sk
>  and s_state in ('TN','TN','TN','TN',
>  'TN','TN','TN','TN')
>  group by i_category,i_class WITH ROLLUP
>  order by
>lochierarchy desc
>   ,case when lochierarchy = 0 then i_category end
>   ,rank_within_parent
> limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13328) Possible poor read performance for broadcast variables with dynamic resource allocation

2016-03-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13328.
---
  Resolution: Fixed
Assignee: Nezih Yigitbasi
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Possible poor read performance for broadcast variables with dynamic resource 
> allocation
> ---
>
> Key: SPARK-13328
> URL: https://issues.apache.org/jira/browse/SPARK-13328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Fix For: 2.0.0
>
>
> When dynamic resource allocation is enabled, fetching broadcast variables from 
> removed executors was causing job failures, and SPARK-9591 fixed this problem 
> by trying all locations of a block before giving up. However, the locations 
> of a block are retrieved only once from the driver in this process, and the 
> locations in this list can be stale due to dynamic resource allocation. This 
> situation gets worse when running on a large cluster, as the size of this 
> location list can be on the order of several hundred, out of which there may 
> be tens of stale entries. What we have observed is that, with the default 
> settings of 3 max retries and 5s between retries (that's 15s per location), the 
> time it takes to read a broadcast variable can be as high as ~17m (the log 
> below shows the failed 70th block fetch attempt, where each attempt takes 15s)
> {code}
> ...
> 16/02/13 01:02:27 WARN storage.BlockManager: Failed to fetch remote block 
> broadcast_18_piece0 from BlockManagerId(8, ip-10-178-77-38.ec2.internal, 
> 60675) (failed attempt 70)
> ...
> 16/02/13 01:02:27 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 18 took 1051049 ms
> {code}
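
As a quick sanity check on the numbers above: 70 locations x 15 s per location 
(3 retries x 5 s each) = 1,050 s, roughly 17.5 minutes, which matches the 
1051049 ms read time reported in the log.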



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13830) Fetch large directly result from executor is very slow

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13830:


Assignee: (was: Apache Spark)

> Fetch large directly result from executor is very slow
> --
>
> Key: SPARK-13830
> URL: https://issues.apache.org/jira/browse/SPARK-13830
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Davies Liu
>
> Given two tasks with 100+ MB results each, it takes more than 50 seconds to 
> fetch the results.
> The RPC layer may not be designed to handle large blocks; we should use the 
> block manager for that. But currently the cutoff is based on 
> spark.rpc.message.maxSize, which is usually set very large (> 128 MB) to be 
> safe, and that is too large for handling results.
> We also count the time to fetch (and deserialize) the direct result as 
> scheduler delay, so it also makes sense to fetch only much smaller blocks via 
> DirectResult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13830) Fetch large directly result from executor is very slow

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191511#comment-15191511
 ] 

Apache Spark commented on SPARK-13830:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/11659

> Fetch large directly result from executor is very slow
> --
>
> Key: SPARK-13830
> URL: https://issues.apache.org/jira/browse/SPARK-13830
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Davies Liu
>
> Given two tasks with 100+ MB results each, it takes more than 50 seconds to 
> fetch the results.
> The RPC layer may not be designed to handle large blocks; we should use the 
> block manager for that. But currently the cutoff is based on 
> spark.rpc.message.maxSize, which is usually set very large (> 128 MB) to be 
> safe, and that is too large for handling results.
> We also count the time to fetch (and deserialize) the direct result as 
> scheduler delay, so it also makes sense to fetch only much smaller blocks via 
> DirectResult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13830) Fetch large directly result from executor is very slow

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13830:


Assignee: Apache Spark

> Fetch large directly result from executor is very slow
> --
>
> Key: SPARK-13830
> URL: https://issues.apache.org/jira/browse/SPARK-13830
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> Given two tasks with 100+ MB results each, it takes more than 50 seconds to 
> fetch the results.
> The RPC layer may not be designed to handle large blocks; we should use the 
> block manager for that. But currently the cutoff is based on 
> spark.rpc.message.maxSize, which is usually set very large (> 128 MB) to be 
> safe, and that is too large for handling results.
> We also count the time to fetch (and deserialize) the direct result as 
> scheduler delay, so it also makes sense to fetch only much smaller blocks via 
> DirectResult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-13833:
--

 Summary: Guard against race condition when re-caching spilled 
bytes in memory
 Key: SPARK-13833
 URL: https://issues.apache.org/jira/browse/SPARK-13833
 Project: Spark
  Issue Type: Improvement
  Components: Block Manager
Reporter: Josh Rosen
Assignee: Josh Rosen


When reading data from the DiskStore and attempting to cache it back into the 
memory store, we should guard against race conditions where multiple readers 
are attempting to re-cache the same block in memory.
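
A minimal standalone sketch of the kind of guard described above (illustrative 
only; the stand-in map and method names are not the actual 
BlockManager/MemoryStore code):

{code}
// Illustrative sketch of the guard only; the real fix may differ.
import java.util.concurrent.ConcurrentHashMap

object RecacheGuardSketch {
  private val memoryStore = new ConcurrentHashMap[String, Array[Byte]]()

  // Several readers may race to re-cache the same spilled block. With
  // putIfAbsent, each racer may still read the bytes from disk, but only one
  // copy wins the cache slot and every caller ends up using that same copy.
  def getOrRecache(blockId: String, readFromDisk: () => Array[Byte]): Array[Byte] = {
    val cached = memoryStore.get(blockId)
    if (cached != null) return cached
    val fromDisk = readFromDisk()
    val winner = memoryStore.putIfAbsent(blockId, fromDisk)
    if (winner != null) winner else fromDisk
  }
}
{code}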



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13806) SQL round() produces incorrect results for negative values

2016-03-11 Thread Mark Hamstra (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamstra updated SPARK-13806:
-
Description: 
Round in catalyst/expressions/mathExpressions.scala appears to be untested with 
negative values, and it doesn't handle them correctly.

There are at least two issues here:

First, in the genCode for FloatType and DoubleType with _scale == 0, round() 
will not produce the same results as for the BigDecimal.ROUND_HALF_UP strategy 
used in all other cases.  This is because Math.round is used for these _scale 
== 0 cases.  For example, Math.round(-3.5) is -3, while 
BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. 

Even after this bug is fixed with something like...
{code}
if (${ce.value} < 0) {
  ${ev.value} = -1 * Math.round(-1 * ${ce.value});
} else {
  ${ev.value} = Math.round(${ce.value});
}
{code}
...which will allow an additional test like this to succeed in 
MathFunctionsSuite.scala:
{code}
checkEvaluation(Round(-3.5D, 0), -4.0D, EmptyRow)
{code}
...there still appears to be a problem on at least the 
checkEvalutionWithUnsafeProjection path, where failures like this are produced:
{code}
Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: 
[0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145)
{code} 

  was:
Round in catalyst/expressions/mathExpressions.scala appears to be untested with 
negative values, and it doesn't handle them correctly.

There are at least two issues here:

First, in the genCode for FloatType and DoubleType with _scale == 0, round() 
will not produce the same results as for the BigDecimal.ROUND_HALF_UP strategy 
used in all other cases.  This is because Math.round is used for these _scale 
== 0 cases.  For example, Math.round(-3.5) is -3, while 
BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. 

Even after this bug is fixed with something like...
{code}
if (${ce.value} < 0) {
  ${ev.value} = -1 * Math.round(-1 * ${ce.value});
} else {
  ${ev.value} = Math.round(${ce.value});
}
{code}
...which will allow an additional test like this to succeed in 
MathFunctionsSuite.scala:
{code}
checkEvaluation(Round(-3.5D, 0), -4.0D)
{code}
...there still appears to be a problem on at least the 
checkEvalutionWithUnsafeProjection path, where failures like this are produced:
{code}
Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: 
[0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145)
{code} 


> SQL round() produces incorrect results for negative values
> --
>
> Key: SPARK-13806
> URL: https://issues.apache.org/jira/browse/SPARK-13806
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Mark Hamstra
>
> Round in catalyst/expressions/mathExpressions.scala appears to be untested 
> with negative values, and it doesn't handle them correctly.
> There are at least two issues here:
> First, in the genCode for FloatType and DoubleType with _scale == 0, round() 
> will not produce the same results as for the BigDecimal.ROUND_HALF_UP 
> strategy used in all other cases.  This is because Math.round is used for 
> these _scale == 0 cases.  For example, Math.round(-3.5) is -3, while 
> BigDecimal.ROUND_HALF_UP at scale 0 for -3.5 is -4. 
> Even after this bug is fixed with something like...
> {code}
> if (${ce.value} < 0) {
>   ${ev.value} = -1 * Math.round(-1 * ${ce.value});
> } else {
>   ${ev.value} = Math.round(${ce.value});
> }
> {code}
> ...which will allow an additional test like this to succeed in 
> MathFunctionsSuite.scala:
> {code}
> checkEvaluation(Round(-3.5D, 0), -4.0D, EmptyRow)
> {code}
> ...there still appears to be a problem on at least the 
> checkEvalutionWithUnsafeProjection path, where failures like this are 
> produced:
> {code}
> Incorrect evaluation in unsafe mode: round(-3.141592653589793, -6), actual: 
> [0,0], expected: [0,8000] (ExpressionEvalHelper.scala:145)
> {code} 
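
For reference, a minimal standalone snippet (not taken from the Spark code 
base) that reproduces the Math.round vs BigDecimal.ROUND_HALF_UP discrepancy 
described in the report:

{code}
// Standalone illustration of the discrepancy: Math.round(double) adds 0.5 and
// floors (ties go toward positive infinity), while BigDecimal's HALF_UP mode
// rounds ties away from zero, so they disagree for negative .5 values.
object RoundDemo {
  def main(args: Array[String]): Unit = {
    val x = -3.5
    println(Math.round(x))                                               // -3
    println(BigDecimal(x).setScale(0, BigDecimal.RoundingMode.HALF_UP))  // -4
  }
}
{code}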



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191598#comment-15191598
 ] 

Apache Spark commented on SPARK-13833:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11660

> Guard against race condition when re-caching spilled bytes in memory
> 
>
> Key: SPARK-13833
> URL: https://issues.apache.org/jira/browse/SPARK-13833
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> When reading data from the DiskStore and attempting to cache it back into the 
> memory store, we should guard against race conditions where multiple readers 
> are attempting to re-cache the same block in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13833:


Assignee: Josh Rosen  (was: Apache Spark)

> Guard against race condition when re-caching spilled bytes in memory
> 
>
> Key: SPARK-13833
> URL: https://issues.apache.org/jira/browse/SPARK-13833
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> When reading data from the DiskStore and attempting to cache it back into the 
> memory store, we should guard against race conditions where multiple readers 
> are attempting to re-cache the same block in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13833:


Assignee: Apache Spark  (was: Josh Rosen)

> Guard against race condition when re-caching spilled bytes in memory
> 
>
> Key: SPARK-13833
> URL: https://issues.apache.org/jira/browse/SPARK-13833
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> When reading data from the DiskStore and attempting to cache it back into the 
> memory store, we should guard against race conditions where multiple readers 
> are attempting to re-cache the same block in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13833:


Assignee: Apache Spark  (was: Josh Rosen)

> Guard against race condition when re-caching spilled bytes in memory
> 
>
> Key: SPARK-13833
> URL: https://issues.apache.org/jira/browse/SPARK-13833
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> When reading data from the DiskStore and attempting to cache it back into the 
> memory store, we should guard against race conditions where multiple readers 
> are attempting to re-cache the same block in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13833:


Assignee: Josh Rosen  (was: Apache Spark)

> Guard against race condition when re-caching spilled bytes in memory
> 
>
> Key: SPARK-13833
> URL: https://issues.apache.org/jira/browse/SPARK-13833
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> When reading data from the DiskStore and attempting to cache it back into the 
> memory store, we should guard against race conditions where multiple readers 
> are attempting to re-cache the same block in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13834) Update sbt for 2.x

2016-03-11 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-13834:
-

 Summary: Update sbt for 2.x
 Key: SPARK-13834
 URL: https://issues.apache.org/jira/browse/SPARK-13834
 Project: Spark
  Issue Type: Improvement
Reporter: Dongjoon Hyun
Priority: Minor


For 2.0.0, we had better bump `sbt`, too.

{code:title=project/build.properties|borderStyle=solid}
-sbt.version=0.13.9
+sbt.version=0.13.11
{code}

SBT 0.13.11 fixes incorrect warnings and improves incremental compilation.

*REFERENCE*
https://github.com/sbt/sbt/releases/tag/v0.13.11




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13835) IsNotNull Filters for the BinaryComparison inside Not

2016-03-11 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13835:
---

 Summary: IsNotNull Filters for the BinaryComparison inside Not
 Key: SPARK-13835
 URL: https://issues.apache.org/jira/browse/SPARK-13835
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


So far, inside Not, we only generate IsNotNull constraints for Equal. However, we 
can also do it for the other comparisons: LessThan, LessThanOrEqual, GreaterThan, 
and GreaterThanOrEqual.
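
A toy, self-contained sketch of the inference (illustrative only; the real 
logic lives in Catalyst's constraint propagation and differs in detail). Under 
SQL three-valued logic, Not over any binary comparison can only evaluate to 
true when both operands are non-null, so IsNotNull can be inferred for both 
sides of every comparison, not just Equal:

{code}
// Toy model of the constraint inference; LessThanOrEqual/GreaterThanOrEqual
// are omitted for brevity but behave the same way.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Not(child: Expr) extends Expr
case class IsNotNull(child: Expr) extends Expr
sealed trait BinaryComparison extends Expr { def left: Expr; def right: Expr }
case class EqualTo(left: Expr, right: Expr) extends BinaryComparison
case class LessThan(left: Expr, right: Expr) extends BinaryComparison
case class GreaterThan(left: Expr, right: Expr) extends BinaryComparison

object NotNullInference {
  def infer(condition: Expr): Set[Expr] = condition match {
    // The case this ticket is about: any comparison wrapped in Not.
    case Not(cmp: BinaryComparison) => Set(IsNotNull(cmp.left), IsNotNull(cmp.right))
    // Plain comparisons already imply non-null operands as well.
    case cmp: BinaryComparison      => Set(IsNotNull(cmp.left), IsNotNull(cmp.right))
    case _                          => Set.empty
  }

  def main(args: Array[String]): Unit = {
    println(infer(Not(LessThan(Attr("a"), Attr("b")))))
    // -> Set(IsNotNull(Attr(a)), IsNotNull(Attr(b)))
  }
}
{code}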



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13834) Update sbt for 2.x

2016-03-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191616#comment-15191616
 ] 

Apache Spark commented on SPARK-13834:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/11661

> Update sbt for 2.x
> --
>
> Key: SPARK-13834
> URL: https://issues.apache.org/jira/browse/SPARK-13834
> Project: Spark
>  Issue Type: Improvement
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> For 2.0.0, we had better bump `sbt`, too.
> {code:title=project/build.properties|borderStyle=solid}
> -sbt.version=0.13.9
> +sbt.version=0.13.11
> {code}
> SBT 0.13.11 fixes incorrect warnings and improves incremental compilation.
> *REFERENCE*
> https://github.com/sbt/sbt/releases/tag/v0.13.11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13836) IsNotNull Constraints for the BinaryComparison inside Not

2016-03-11 Thread Xiao Li (JIRA)
Xiao Li created SPARK-13836:
---

 Summary: IsNotNull Constraints for the BinaryComparison inside Not
 Key: SPARK-13836
 URL: https://issues.apache.org/jira/browse/SPARK-13836
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


So far, inside Not, we only generate IsNotNull constraints for Equal. However, we 
can also do it for the other comparisons: LessThan, LessThanOrEqual, GreaterThan, 
and GreaterThanOrEqual.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13834) Update sbt for 2.x

2016-03-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13834:


Assignee: Apache Spark

> Update sbt for 2.x
> --
>
> Key: SPARK-13834
> URL: https://issues.apache.org/jira/browse/SPARK-13834
> Project: Spark
>  Issue Type: Improvement
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> For 2.0.0, we had better bump `sbt`, too.
> {code:title=project/build.properties|borderStyle=solid}
> -sbt.version=0.13.9
> +sbt.version=0.13.11
> {code}
> SBT 0.13.11 fixes incorrect warnings and improves incremental compilation.
> *REFERENCE*
> https://github.com/sbt/sbt/releases/tag/v0.13.11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



  1   2   >