[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14165
  
**[Test build #62216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62216/consoleFull)** for PR 14165 at commit [`1ea0247`](https://github.com/apache/spark/commit/1ea0247cfd68823ce6175cec42e2027334d31451).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14106#discussion_r70585442
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
+   * attribute.
+   */
   private def isAliasOnly(
       projectList: Seq[NamedExpression],
       childOutput: Seq[Attribute]): Boolean = {
-    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
+    if (projectList.length != childOutput.length) {
       false
     } else {
-      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
-        a.child match {
-          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
-          case _ => false
-        }
+      stripAliasOnAttribute(projectList).zip(childOutput).forall {
+        case (a: Attribute, o) if a semanticEquals o => true
+        case _ => false
       }
     }
   }
 
+  private def stripAliasOnAttribute(projectList: Seq[NamedExpression]) = {
+    projectList.map {
+      // Alias with metadata can not be striped, or the metadata will be lost.
+      // If the alias name is different from attribute name, we can't strip it either, or we may
+      // accidentally change the output schema name of the root plan.
+      case a @ Alias(attr: Attribute, name) if a.metadata == Metadata.empty && name == attr.name =>
+        attr
+      case other => other
+    }
+  }
+
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val aliasOnlyProject = plan.find {
-      case Project(pList, child) if isAliasOnly(pList, child.output) => true
-      case _ => false
+    val aliasOnlyProject = plan.collectFirst {
+      case p @ Project(pList, child) if isAliasOnly(pList, child.output) => p
     }
 
-    aliasOnlyProject.map { case p: Project =>
-      val aliases = p.projectList.map(_.asInstanceOf[Alias])
-      val attrMap = AttributeMap(aliases.map(a => (a.toAttribute, a.child)))
+    aliasOnlyProject.map { case proj =>
+      val attributesToReplace = proj.output.zip(proj.child.output).filterNot {
+        case (a1, a2) => a1 semanticEquals a2
+      }
+      val attrMap = AttributeMap(attributesToReplace)
       plan.transformAllExpressions {
         case a: Attribute if attrMap.contains(a) => attrMap(a)
       }.transform {
-        case op: Project if op.eq(p) => op.child
+        case plan: Project if plan eq proj => plan.child
       }
     }.getOrElse(plan)
   }
--- End diff --

Can we use a `plan.transform` to implement this rule?
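For illustration, here is a toy sketch of the single-pass `transform` formulation the comment suggests. All names here are hypothetical and this is not Spark's actual Catalyst API: it models plans as a minimal tree with a bottom-up `transformUp`, and removes an alias-only `Project` in one traversal. It deliberately ignores the attribute-id rewriting that the real rule performs with `transformAllExpressions`, which is the subtle part a real `plan.transform` implementation would also have to handle.

```scala
// Minimal stand-in for a logical plan tree (hypothetical, not Catalyst).
sealed trait Plan {
  def output: Seq[String]
  // Bottom-up transform: rewrite children first, then apply the rule here.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = this match {
      case Project(aliases, child) => Project(aliases, child.transformUp(rule))
      case leaf => leaf
    }
    rule.applyOrElse(withNewChildren, identity[Plan])
  }
}
case class Relation(output: Seq[String]) extends Plan
// Each alias pairs an output name with the underlying child attribute name.
case class Project(aliases: Seq[(String, String)], child: Plan) extends Plan {
  def output: Seq[String] = aliases.map(_._1)
}

object RemoveAliasOnlyProjectSketch {
  // Alias-only: same arity, and every alias just renames an attribute to itself.
  private def isAliasOnly(aliases: Seq[(String, String)], childOutput: Seq[String]): Boolean =
    aliases.length == childOutput.length &&
      aliases.zip(childOutput).forall { case ((name, attr), out) => name == attr && attr == out }

  // The whole rule becomes a single transform, with no find-then-transform phase.
  def apply(plan: Plan): Plan = plan.transformUp {
    case Project(aliases, child) if isAliasOnly(aliases, child.output) => child
  }
}
```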





[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14106#discussion_r70584787
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
+   * attribute.
+   */
   private def isAliasOnly(
       projectList: Seq[NamedExpression],
       childOutput: Seq[Attribute]): Boolean = {
-    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
+    if (projectList.length != childOutput.length) {
       false
     } else {
-      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
-        a.child match {
-          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
-          case _ => false
-        }
+      stripAliasOnAttribute(projectList).zip(childOutput).forall {
+        case (a: Attribute, o) if a semanticEquals o => true
+        case _ => false
       }
     }
   }
 
+  private def stripAliasOnAttribute(projectList: Seq[NamedExpression]) = {
+    projectList.map {
+      // Alias with metadata can not be striped, or the metadata will be lost.
--- End diff --

Nit: striped => stripped





[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...

2016-07-13 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14106#discussion_r70584778
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
--- End diff --

Nit: "... same with ..." => "... same as ..."





[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...

2016-07-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14173#discussion_r70583775
  
--- Diff: R/pkg/R/column.R ---
@@ -235,20 +248,16 @@ setMethod("cast",
   function(x, dataType) {
 if (is.character(dataType)) {
   column(callJMethod(x@jc, "cast", dataType))
-} else if (is.list(dataType)) {
--- End diff --

breaking change? if intended, remove example on L243?





[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...

2016-07-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14173#discussion_r70583496
  
--- Diff: R/pkg/R/column.R ---
@@ -44,6 +44,9 @@ setMethod("initialize", "Column", function(.Object, jc) {
   .Object
 })
 
+#' @rdname column
+#' @name column
+#' @aliases column,jobj-method
--- End diff --

I thought we don't export this?





[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...

2016-07-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14173#discussion_r70583340
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -267,6 +267,10 @@ as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) {
   createDataFrame(data, schema, samplingRatio)
 }
 
+#' @rdname createDataFrame
+#' @aliases createDataFrame
--- End diff --

should the aliases here be as.DataFrame so it could be found via `?`?





[GitHub] spark issue #14178: [SPARKR][DOCS][MINOR] R programming guide to include csv...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14178
  
**[Test build #62228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62228/consoleFull)** for PR 14178 at commit [`30c7c81`](https://github.com/apache/spark/commit/30c7c81de962e8cc577b1c9786939521fe1c6899).





[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...

2016-07-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14173#discussion_r70583224
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2950,6 +3038,10 @@ setMethod("drop",
   })
 
 # Expose base::drop
+#' @name drop
+#' @rdname drop
--- End diff --

this would add a fairly empty Rd page for drop... I wonder if there is a way to avoid that? Perhaps add a link to base::drop?





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62227/
Test FAILed.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14176
  
**[Test build #62227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62227/consoleFull)** for PR 14176 at commit [`a3360e0`](https://github.com/apache/spark/commit/a3360e0ab1223dd43f891e755e648680a402b7df).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14178: [SPARKR][DOCS][MINOR] R programming guide to incl...

2016-07-13 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/14178

[SPARKR][DOCS][MINOR] R programming guide to include csv data source example

## What changes were proposed in this pull request?

Minor documentation update for a code example, code style, and a missed reference to "sparkR.init"


## How was this patch tested?

manual

@shivaram 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rcsvprogrammingguide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14178


commit 30c7c81de962e8cc577b1c9786939521fe1c6899
Author: Felix Cheung 
Date:   2016-07-13T06:42:26Z

update







[GitHub] spark issue #14177: [SPARK-16027][SPARKR] Fix R tests SparkSession init/stop

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14177
  
**[Test build #62226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62226/consoleFull)** for PR 14177 at commit [`1a86e85`](https://github.com/apache/spark/commit/1a86e857ab954620fb33dde8667f3a2a7d5138dc).





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14176
  
**[Test build #62227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62227/consoleFull)** for PR 14176 at commit [`a3360e0`](https://github.com/apache/spark/commit/a3360e0ab1223dd43f891e755e648680a402b7df).





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-13 Thread ooq
Github user ooq commented on the issue:

https://github.com/apache/spark/pull/14174
  
cc @sameeragarwal @davies @rxin 





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-13 Thread ooq
Github user ooq commented on the issue:

https://github.com/apache/spark/pull/14176
  
cc @sameeragarwal @davies @rxin 





[GitHub] spark pull request #14177: [SPARK-16027][SPARKR] Fix R tests SparkSession in...

2016-07-13 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/14177

[SPARK-16027][SPARKR] Fix R tests SparkSession init/stop

## What changes were proposed in this pull request?

Fix R SparkSession init/stop, and warnings of reusing existing Spark Context


## How was this patch tested?

unit tests

@shivaram 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rsessiontest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14177


commit 72fffbb593de289fb4434c730c592e04b50fb13f
Author: Felix Cheung 
Date:   2016-07-13T05:42:01Z

fix session start/stop in tests

commit 614a63e091a8164696a4316564bdae53257953de
Author: Felix Cheung 
Date:   2016-07-13T06:56:56Z

fix test

commit 1a86e857ab954620fb33dde8667f3a2a7d5138dc
Author: Felix Cheung 
Date:   2016-07-13T07:56:09Z

fix style







[GitHub] spark pull request #14176: [SPARK-16525][SQL] Enable Row Based HashMap in Ha...

2016-07-13 Thread ooq
GitHub user ooq opened a pull request:

https://github.com/apache/spark/pull/14176

[SPARK-16525][SQL] Enable Row Based HashMap in HashAggregateExec

## What changes were proposed in this pull request?

This PR is the second step for the following feature:

For hash aggregation in Spark SQL, we use a fast aggregation hashmap that acts as a "cache" to boost aggregation performance. Previously, the hashmap was backed by a `ColumnarBatch`. This has performance issues when the aggregation table has a wide schema (a large number of key or value fields). In this JIRA, we support another implementation of the fast hashmap, backed by a `RowBatch`. We then automatically pick between the two implementations based on certain knobs.

In this second-step PR, we enable `RowBasedHashMapGenerator` in `HashAggregateExec`.
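
As a rough illustration of the "fast map as cache" idea described above, here is a toy sketch with hypothetical names. It is not the generated code in this PR: a small bounded map absorbs hot grouping keys, and keys that miss once it is full fall through to a general map, which stands in for Spark's actual spill/fallback path.

```scala
import scala.collection.mutable

// Toy two-level aggregation map: a bounded "fast" level plus an
// unbounded fallback level (hypothetical sketch, not Spark internals).
class TwoLevelAggMap(fastCapacity: Int) {
  private val fast = mutable.HashMap.empty[String, Long]
  private val fallback = mutable.HashMap.empty[String, Long]

  def add(key: String, value: Long): Unit = {
    // Keys already in the fast map keep aggregating there; new keys enter
    // the fast map only while there is capacity, otherwise they fall
    // through to the fallback map.
    val target = if (fast.contains(key) || fast.size < fastCapacity) fast else fallback
    target.update(key, target.getOrElse(key, 0L) + value)
  }

  // Merge partial sums from both levels into the final result.
  def result: Map[String, Long] =
    (fast.keySet ++ fallback.keySet).iterator
      .map(k => k -> (fast.getOrElse(k, 0L) + fallback.getOrElse(k, 0L)))
      .toMap
}
```

The trade-off this sketches is the one the PR description alludes to: the fast level is cheap per lookup but bounded, so correctness relies on merging it with the fallback at the end.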

## How was this patch tested?

Tests and benchmarks will be added in a separate PR in the series. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14176


commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu 
Date:   2016-07-13T07:35:06Z

add RowBatch and RowBasedHashMapGenerator

commit a3360e0ab1223dd43f891e755e648680a402b7df
Author: Qifan Pu 
Date:   2016-07-13T08:08:35Z

enable row based hashmap







[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...

2016-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14119





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14175
  
**[Test build #62225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62225/consoleFull)** for PR 14175 at commit [`6fe96e5`](https://github.com/apache/spark/commit/6fe96e5879fd97aa630839e670e3d8b17de785be).





[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...

2016-07-13 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14119
  
LGTM, I've merged this to master and branch-2.0. Thanks for working on this!

I only observed one weird rendering issue caused by the blank lines before `{% include_example %}`; maybe my local Jekyll version is too old. I think it's fine to leave the other lines as is. The over-length lines should be OK.

Could you please remove the WIP tag from the PR title? (I've removed it 
manually while merging this PR.)





[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-13 Thread sun-rui
GitHub user sun-rui opened a pull request:

https://github.com/apache/spark/pull/14175

[SPARK-16522][MESOS] Spark application throws exception on exit.

## What changes were proposed in this pull request?
Spark applications running on Mesos throw an exception upon exit. For details, refer to https://issues.apache.org/jira/browse/SPARK-16522.

I am not sure if there is a better fix, so I'll wait for review comments.


## How was this patch tested?
Manual test. Observed that the exception is gone upon application exit.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sun-rui/spark SPARK-16522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14175


commit 6fe96e5879fd97aa630839e670e3d8b17de785be
Author: Sun Rui 
Date:   2016-07-13T07:43:38Z

[SPARK-16522][MESOS] Spark application throws exception on exit.







[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14174
  
**[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 14174 at commit [`c87f26b`](https://github.com/apache/spark/commit/c87f26b318b5d673ac95454df5c1cb9a56c677eb).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class RowBatch extends MemoryConsumer`
  * `class RowBasedHashMapGenerator(`
  * `  case class Buffer(dataType: DataType, name: String)`
  * `   |public class $generatedClassName extends org.apache.spark.memory.MemoryConsumer`





[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14165
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14165
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62223/
Test FAILed.





[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14165
  
**[Test build #62223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62223/consoleFull)** for PR 14165 at commit [`b4372f7`](https://github.com/apache/spark/commit/b4372f75dea7d486c03a4d35b48d65779c316831).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14174
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14174
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/6/
Test FAILed.





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14174
  
**[Test build #6 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)**
 for PR 14174 at commit 
[`c87f26b`](https://github.com/apache/spark/commit/c87f26b318b5d673ac95454df5c1cb9a56c677eb).





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #62224 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62224/consoleFull)**
 for PR 14036 at commit 
[`16eff20`](https://github.com/apache/spark/commit/16eff2071a1ce2f532000e61f6990eb9d77c173f).





[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14165
  
**[Test build #62223 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62223/consoleFull)**
 for PR 14165 at commit 
[`b4372f7`](https://github.com/apache/spark/commit/b4372f75dea7d486c03a4d35b48d65779c316831).





[GitHub] spark pull request #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...

2016-07-13 Thread ooq
GitHub user ooq opened a pull request:

https://github.com/apache/spark/pull/14174

[SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGenerator

## What changes were proposed in this pull request?

This PR is the first step for the following feature:

For hash aggregation in Spark SQL, we use a fast aggregation hashmap to act 
as a "cache" in order to boost aggregation performance. Previously, the 
hashmap was backed by a `ColumnarBatch`. This has performance issues when 
the aggregation table has a wide schema (a large number of key or value 
fields). In this JIRA, we support another implementation of the fast 
hashmap, backed by a `RowBatch`, and automatically pick between the two 
implementations based on certain knobs.

In this first-step PR, implementations for `RowBatch` and 
`RowBasedHashMapGenerator` are added. 

## How was this patch tested?

`RowBatch` can be covered by unit tests (to be added later). Other tests 
and benchmarks will be added in a separate PR in the series. 
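The two-level scheme described above — a bounded fast map with a general 
fallback — can be sketched in plain Scala. This is a hypothetical 
illustration with invented names (`FastAggMap`, `add`, `result`), not the 
actual `RowBatch` or hashmap-generator API; keys and values are `Long`s for 
simplicity, where the real map aggregates rows.

```scala
import scala.collection.mutable

// Hypothetical sketch: a bounded "fast" hash map in front of a general
// fallback map, mirroring the cache role of the fast aggregation hashmap.
class FastAggMap(capacity: Int) {
  private val fast = mutable.HashMap.empty[Long, Long]     // fast path
  private val fallback = mutable.HashMap.empty[Long, Long] // general path

  def add(key: Long, value: Long): Unit = {
    // Stay on the fast path while the key is already there or space remains.
    if (fast.contains(key) || fast.size < capacity) {
      fast(key) = fast.getOrElse(key, 0L) + value
    } else {
      fallback(key) = fallback.getOrElse(key, 0L) + value
    }
  }

  // Merge both maps; any given key only ever lives in one of them.
  def result: Map[Long, Long] =
    (fast.keySet ++ fallback.keySet).map { k =>
      k -> (fast.getOrElse(k, 0L) + fallback.getOrElse(k, 0L))
    }.toMap
}
```

The "knobs" the PR mentions would then decide, per query, which concrete map 
implementation backs the fast path.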



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14174


commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu 
Date:   2016-07-13T07:35:06Z

add RowBatch and RowBasedHashMapGenerator







[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@cloud-fan Done 👍 






[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14165
  
**[Test build #62221 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62221/consoleFull)**
 for PR 14165 at commit 
[`1ea0247`](https://github.com/apache/spark/commit/1ea0247cfd68823ce6175cec42e2027334d31451).





[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14165
  
**[Test build #62220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62220/consoleFull)**
 for PR 14165 at commit 
[`b0a724e`](https://github.com/apache/spark/commit/b0a724e65a947a0becda3fd17370acd0e695e42a).





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14036
  
LGTM except 2 naming comments, thanks for working on it!





[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70580188
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -207,20 +207,12 @@ case class Multiply(left: Expression, right: 
Expression)
   protected override def nullSafeEval(input1: Any, input2: Any): Any = 
numeric.times(input1, input2)
 }
 
-@ExpressionDescription(
-  usage = "a _FUNC_ b - Divides a by b.",
-  extended = "> SELECT 3 _FUNC_ 2;\n 1.5")
-case class Divide(left: Expression, right: Expression)
-extends BinaryArithmetic with NullIntolerant {
-
-  override def inputType: AbstractDataType = TypeCollection(DoubleType, 
DecimalType)
-
-  override def symbol: String = "/"
-  override def decimalMethod: String = "$div"
+abstract class DivisionArithmetic extends BinaryArithmetic with 
NullIntolerant {
--- End diff --

how about `DivideBase`?





[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70580079
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -285,6 +278,28 @@ case class Divide(left: Expression, right: Expression)
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Fraction Division a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1.5")
+case class Divide(left: Expression, right: Expression)
+extends DivisionArithmetic {
+
+  override def inputType: AbstractDataType = TypeCollection(DoubleType, 
DecimalType)
+
+  override def symbol: String = "/"
+}
+
+@ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1")
+case class IntegerDivide(left: Expression, right: Expression)
--- End diff --

`IntegralDivide`?
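The naming distinction maps onto two different division semantics. A tiny 
plain-Scala illustration (these hypothetical helpers are not the Catalyst 
expressions themselves, just their arithmetic behavior):

```scala
// Fractional vs. integral division, as carried by the two expression names.
def fractionalDivide(a: Double, b: Double): Double = a / b // SQL `/`
def integralDivide(a: Long, b: Long): Long = a / b         // truncating divide

// fractionalDivide(3, 2) == 1.5, integralDivide(3, 2) == 1
```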





[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...

2016-07-13 Thread lianhuiwang
Github user lianhuiwang commented on the issue:

https://github.com/apache/spark/pull/14111
  
@cloud-fan At first I implemented it as you said. But the following 
situation, which has a broadcast join, will raise the error 'ScalarSubquery 
has not finished'; example (from SPARK-14791):
val df = (1 to 3).map(i => (i, i)).toDF("key", "value")
  df.createOrReplaceTempView("t1")
  df.createOrReplaceTempView("t2")
  df.createOrReplaceTempView("t3")
  val q = sql("select * from t1 join (select key, value from t2 " +
" where key > (select avg (key) from t3))t on (t1.key = t.key)")
Before:
```
*BroadcastHashJoin [key#5], [key#26], Inner, BuildRight
:- *Project [_1#2 AS key#5, _2#3 AS value#6]
:  +- *Filter (cast(_1#2 as double) > subquery#13)
: :  +- Subquery subquery#13
: : +- *HashAggregate(keys=[], functions=[avg(cast(key#5 as 
bigint))], output=[avg(key)#25])
: :+- Exchange SinglePartition
: :   +- *HashAggregate(keys=[], 
functions=[partial_avg(cast(key#5 as bigint))], output=[sum#30, count#31L])
: :  +- LocalTableScan [key#5]
: +- LocalTableScan [_1#2, _2#3]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
   +- *Project [_1#2 AS key#26, _2#3 AS value#27]
  +- *Filter (cast(_1#2 as double) > subquery#13)
 :  +- Subquery subquery#13
 : +- *HashAggregate(keys=[], functions=[avg(cast(key#5 as 
bigint))], output=[avg(key)#25])
 :+- Exchange SinglePartition
 :   +- *HashAggregate(keys=[], 
functions=[partial_avg(cast(key#5 as bigint))], output=[sum#30, count#31L])
 :  +- LocalTableScan [key#5]
 +- LocalTableScan [_1#2, _2#3]
```
The steps are as follows:
1. BroadcastHashJoin.prepare()
2. t1.Filter.prepareSubqueries, it will prepare subquery.
3. BroadcastExchange.prepare()
4. t2.Filter.prepareSubqueries, it will prepare subquery.
5. BroadcastExchange.doPrepare()
6. t2.Filter.execute()
7. t2.Filter.waitForSubqueries(), it will wait for subquery.
8. BroadcastHashJoin.doExecute()
9. BroadcastExchange.executeBroadcast()
10. t1.Filter.execute()
11. t1.Filter.waitForSubqueries(), it will wait for subquery.
Because before this PR these are two different subqueries, they cannot 
reuse each other's results.

But after this PR they are the same subquery, and the steps are as follows:
1. t1.Filter.prepareSubqueries, it will prepare the subquery.
2. t2.Filter.prepareSubqueries, it will not submit the subquery's 
execute() again.
3. t2.Filter.waitForSubqueries(), it can wait for the subquery that step 1 
submitted before.
4. t1.Filter.waitForSubqueries(), it does not need to await the subquery's 
results, because step 3 has already updated them.
So I added some logic to ScalarSubquery to handle this.
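The sharing described above can be sketched as a cache keyed by the 
canonicalized subquery, so the plan body runs only once no matter how many 
operators prepare it. This is a minimal hypothetical sketch (`SubqueryCache`, 
`getOrExecute` are invented names, not Spark's actual API):

```scala
import scala.collection.mutable

// Identical (canonicalized) subqueries share one cached result, so the
// subquery executes once even when several operators prepare it.
class SubqueryCache {
  private val cache = mutable.HashMap.empty[String, Any]
  private var executions = 0

  // `canonicalKey` stands in for a canonicalized subquery plan; the
  // by-name `run` stands in for the subquery's physical execution.
  def getOrExecute(canonicalKey: String)(run: => Any): Any = synchronized {
    cache.getOrElseUpdate(canonicalKey, { executions += 1; run })
  }

  def executionCount: Int = synchronized(executions)
}
```

With this shape, the second `prepareSubqueries` call finds the entry created 
by the first and never submits a duplicate execution.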






[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62213/
Test PASSed.





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

2016-07-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14148
  
It's easy to infer the schema once when we create the table and store it 
into the external catalog. However, that is a breaking change: users could 
no longer change the underlying data files' schema after the table is 
created. It's a bad design we need to fix, but we also need to go through 
the code path to make sure we don't break other things.





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #62213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62213/consoleFull)**
 for PR 14036 at commit 
[`8d9a04d`](https://github.com/apache/spark/commit/8d9a04d61a155f5bc131cc7a06a1f9378ceb1cbe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

2016-07-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14148#discussion_r70578153
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -413,38 +413,36 @@ case class DescribeTableCommand(table: 
TableIdentifier, isExtended: Boolean, isF
 } else {
   val metadata = catalog.getTableMetadata(table)
 
+  if (DDLUtils.isDatasourceTable(metadata)) {
+DDLUtils.getSchemaFromTableProperties(metadata) match {
+  case Some(userSpecifiedSchema) => 
describeSchema(userSpecifiedSchema, result)
+  case None => 
describeSchema(catalog.lookupRelation(table).schema, result)
+}
+  } else {
+describeSchema(metadata.schema, result)
+  }
--- End diff --

@yhuai I just gave it a try. We have to pass `CatalogTable` to avoid 
another call to `getTableMetadata`. We also need to pass `SessionCatalog` 
for calling `lookupRelation`. Do you prefer this function, or keeping the 
existing one? Thanks!

```Scala
  private def describeSchema(
  tableDesc: CatalogTable,
  catalog: SessionCatalog,
  buffer: ArrayBuffer[Row]): Unit = {
if (DDLUtils.isDatasourceTable(tableDesc)) {
  DDLUtils.getSchemaFromTableProperties(tableDesc) match {
case Some(userSpecifiedSchema) => 
describeSchema(userSpecifiedSchema, buffer)
case None => describeSchema(catalog.lookupRelation(table).schema, 
buffer)
  }
} else {
  describeSchema(tableDesc.schema, buffer)
}
  }
```





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62212/
Test PASSed.





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14036
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-07-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13701
  
@yhuai OK. Thanks for letting me know that.





[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14036
  
**[Test build #62212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62212/consoleFull)**
 for PR 14036 at commit 
[`ab6858c`](https://github.com/apache/spark/commit/ab6858cac3f8f53a3437038b7cd767e73d170eaa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13658: [SPARK-15937] [yarn] Improving the logic to wait for an ...

2016-07-13 Thread subrotosanyal
Github user subrotosanyal commented on the issue:

https://github.com/apache/spark/pull/13658
  
hi @vanzin 

I am also surprised that the notify was somehow not triggered.
> Is your code perhaps setting "spark.master" to "local" or something that 
is not "yarn-cluster" before you create the SparkContext?

I would say we don't set it to local. Furthermore, the issue was happening 
only once in a while, even though the client code remained the same. 
For the time being I have applied the patch and built a custom Spark 
distribution to get rid of this random failure, but in the long run I would 
prefer not to use a custom distribution.





[GitHub] spark pull request #14165: [SPARK-16503] SparkSession should provide Spark v...

2016-07-13 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14165#discussion_r70575753
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -79,6 +79,9 @@ class SparkSession private(
 
   sparkContext.assertNotStopped()
 
+  /** The version of Spark on which this application is running. */
+  def version: String = SPARK_VERSION
--- End diff --

@rxin May I ask when we should use the `@Since` Java annotation and when 
the `@since` Javadoc tag?





[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62214/
Test PASSed.





[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...

2016-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14172
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...

2016-07-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14172
  
**[Test build #62214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62214/consoleFull)**
 for PR 14172 at commit 
[`ade0ad2`](https://github.com/apache/spark/commit/ade0ad27459248d3db1c7e453cbf724596a50a2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14152: [SPARK-16395] [STREAMING] Fail if too many Checkp...

2016-07-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14152#discussion_r70575075
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
@@ -18,8 +18,8 @@
 package org.apache.spark.streaming
 
 import java.io._
-import java.util.concurrent.Executors
-import java.util.concurrent.RejectedExecutionException
+import java.util.concurrent.{ArrayBlockingQueue, 
RejectedExecutionException,
--- End diff --

Yeah, it's the style I see elsewhere, like 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L39. 
Arguably it's time for a `_` import at this stage; I'm indifferent here.





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

2016-07-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14148
  
Tomorrow I will try to dig into this deeper and check whether schema 
evolution could be an issue if the schema is fixed when creating tables. 





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

2016-07-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14148
  
uh... I see what you mean. Agree. 





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

2016-07-13 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14148
  
I was not talking about caching here. Caching is transient. I want the 
behavior to be the same regardless of how many times I'm restarting Spark ...

And this has nothing to do with refresh. For tables in the catalog, NEVER 
change the schema implicitly, only do it when it is specified by the user.






[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

2016-07-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14148#discussion_r70573373
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
     } else {
       val metadata = catalog.getTableMetadata(table)
 
+      if (DDLUtils.isDatasourceTable(metadata)) {
+        DDLUtils.getSchemaFromTableProperties(metadata) match {
+          case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
+          case None => describeSchema(catalog.lookupRelation(table).schema, result)
+        }
+      } else {
+        describeSchema(metadata.schema, result)
+      }
--- End diff --

Sure. Let me do it now

BTW, previously, `describeExtended` and `describeFormatted` also contained 
the schema; both called the original `describe` function. 
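The fallback in the diff above can be sketched as a tiny standalone snippet. This is an illustrative model only: `Field`, `TableMetadata`, and `schemaToDescribe` are made-up names for this sketch, not Spark's actual API (the real code works with `CatalogTable` and `StructType`).

```scala
// Hypothetical, simplified model of the describe-schema fallback: for a
// data source table, prefer the user-specified schema stored in the table
// properties, and only resolve the relation (which may trigger schema
// inference) when no schema was recorded; for other tables, use the
// schema kept in the catalog metadata.
case class Field(name: String, dataType: String)

case class TableMetadata(
    isDatasourceTable: Boolean,
    schemaFromProperties: Option[Seq[Field]],
    catalogSchema: Seq[Field])

def schemaToDescribe(meta: TableMetadata, inferSchema: () => Seq[Field]): Seq[Field] =
  if (meta.isDatasourceTable) {
    // Stored schema wins; inference runs only when nothing was recorded.
    meta.schemaFromProperties.getOrElse(inferSchema())
  } else {
    meta.catalogSchema
  }
```

Note the by-need second argument: when a schema was stored, the (potentially expensive) inference function is never invoked.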





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

2016-07-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14148
  
@rxin Currently, we do not run schema inference when the metadata cache 
already contains the plan. Based on my understanding, that is the major 
reason why we introduced the metadata cache in the first place. 

I think it is not hard to store the schema of data source tables in the 
external catalog (Hive metastore). However, `REFRESH TABLE` only refreshes the 
metadata cache and the data cache; it does not update the schema stored in the 
external catalog. As long as we do not store the schema in the external 
catalog, this works well. Otherwise, we would also have to refresh the schema 
info in the external catalog.

To implement your idea, I can submit a PR for the release 2.1 tomorrow. We 
can discuss it in a separate PR.
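A minimal sketch of the separation being described, using a hypothetical toy catalog (not Spark's `SessionCatalog`): `refreshTable` only drops the cached plan, while the schema persisted at table-creation time stays untouched until an explicit alter.

```scala
import scala.collection.mutable

// Illustrative only: a toy catalog showing why refreshing caches does not
// update a schema that has been persisted in the external catalog.
class ToyCatalog {
  private val persistedSchema = mutable.Map.empty[String, Seq[String]]
  private val planCache = mutable.Map.empty[String, String]

  def createTable(name: String, schema: Seq[String]): Unit =
    persistedSchema(name) = schema                 // written once, at creation
  def cachePlan(name: String, plan: String): Unit =
    planCache(name) = plan
  def refreshTable(name: String): Unit =
    planCache.remove(name)                         // touches caches only
  def alterTableSchema(name: String, schema: Seq[String]): Unit =
    persistedSchema(name) = schema                 // explicit user action
  def schemaOf(name: String): Seq[String] = persistedSchema(name)
}
```

In this model, a file-level schema change after `refreshTable` would be re-inferred into the plan cache but would never reach `persistedSchema`, which is the inconsistency discussed above.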




