[GitHub] spark pull request: [Minor] Alter description of some configuratio...

2015-04-14 Thread ArcherShao
GitHub user ArcherShao opened a pull request:

https://github.com/apache/spark/pull/5519

[Minor] Alter description of some configuration in yarn and mesos

The values of these configurations are calculated by `math.max(a, b)`, but the 
description reads 'a with minimum of b'; alter it to 'a with maximum of b'.
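For context, the computation pattern the description refers to can be sketched in plain Scala. The configuration name, factor, and numbers below are purely illustrative, not taken from the patch:

```scala
// Illustrative only: how a configuration value derived via math.max(a, b) behaves.
// The memory size, factor, and floor here are hypothetical example values.
val executorMemoryMb = 2048
val overheadFactor = 0.10
val overheadFloorMb = 384

// 2048 * 0.10 = 204 MB, which is below the floor, so math.max picks the floor.
val overheadMb = math.max((executorMemoryMb * overheadFactor).toInt, overheadFloorMb)
```

`math.max(a, b)` simply returns whichever of its two arguments is larger; with these example numbers `overheadMb` is 384.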

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ArcherShao/spark conf-des

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5519


commit 7d23311dffb919a44bb8e0559159fb616771b59c
Author: ArcherShao 
Date:   2015-04-15T06:50:17Z

[Minor] Alter description of some configuration in yarn and mesos




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93224019
  
[Test build #30304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30304/consoleFull) for PR 5208 at commit [`ec8061b`](https://github.com/apache/spark/commit/ec8061b7f36b87c883af111438ac9ff0304050d7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93224051
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30304/
Test PASSed.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93223685
  
[Test build #30303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30303/consoleFull) for PR 5350 at commit [`3b7bfa8`](https://github.com/apache/spark/commit/3b7bfa8f37e7f2b9aefdfd0e5e57d7b5c6b516ce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CaseConversionExpression `
  * `final class UTF8String extends Ordered[UTF8String] with Serializable `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93223712
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30303/
Test PASSed.





[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5358#issuecomment-93222352
  
[Test build #30322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30322/consoleFull) for PR 5358 at commit [`6014acc`](https://github.com/apache/spark/commit/6014acc11e570c880657238dc4a444ba6335bc13).





[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5242#issuecomment-93222375
  
[Test build #30323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30323/consoleFull) for PR 5242 at commit [`e377ed6`](https://github.com/apache/spark/commit/e377ed61e398dbbbda976ba2e61eb0c8488f4c7f).





[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode

2015-04-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/5518#issuecomment-93222096
  
Please add a test in [TreeNodeSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala).





[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/5247#discussion_r28397087
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -139,11 +139,13 @@ class DataFrame private[sql](
   @transient protected[sql] val logicalPlan: LogicalPlan = queryExecution.logical match {
     // For various commands (like DDL) and queries with side effects, we force query
     // optimization to happen right away to let these side effects take place eagerly.
-    case _: Command |
-         _: InsertIntoTable |
-         _: CreateTableAsSelect[_] |
-         _: CreateTableUsingAsSelect |
-         _: WriteToFile =>
+    case _ : Command =>
+      queryExecution.sparkPlan.executeCollect()
+      queryExecution.analyzed
--- End diff --

This will lead to the command being executed twice when we run an action operator on 
the DataFrame, such as
`sql(s"CREATE DATABASE xxx").count()`:
the first execution happens when constructing the DataFrame,
and the second when executing `count`.

So maybe we still need to construct a LocalRelation here?
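The double-execution hazard described above can be sketched outside Spark with a plain counter. All names below are stand-ins for illustration, not actual Spark classes:

```scala
// Sketch of the hazard: if constructing the frame eagerly runs the command,
// and an action runs it again, the side effect happens twice.
var executions = 0
def executeCommand(): Unit = executions += 1

class EagerFrame {
  executeCommand()                               // first run: eager execution at construction
  def count(): Long = { executeCommand(); 1L }   // second run: the action re-executes
}

new EagerFrame().count()
// executions is now 2, not 1
```

For a side-effecting command such as CREATE DATABASE, running it twice is at best wasteful and at worst an error, which is why the comment suggests returning a pre-computed LocalRelation instead of re-planning on each action.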





[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5518#issuecomment-93221829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30318/
Test FAILed.





[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5518#issuecomment-93221818
  
[Test build #30318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30318/consoleFull) for PR 5518 at commit [`1ccbfa8`](https://github.com/apache/spark/commit/1ccbfa8ef27b284ace64e605b21f0e4915b53393).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...

2015-04-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5480





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93220011
  
[Test build #30321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30321/consoleFull) for PR 5208 at commit [`2493b9f`](https://github.com/apache/spark/commit/2493b9f9548c4a63a3d31dc600588ac65968b611).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93219531
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30319/
Test FAILed.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93219524
  
[Test build #30319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30319/consoleFull) for PR 5208 at commit [`5049d88`](https://github.com/apache/spark/commit/5049d882fbfcf9b7c63e95ec20d3a15310068752).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-93219403
  
[Test build #30320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30320/consoleFull) for PR 4723 at commit [`aaf4c5a`](https://github.com/apache/spark/commit/aaf4c5a4a06cd3fe9cf44e48dbfa6d209a4e75f1).





[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5518#issuecomment-93219321
  
[Test build #30318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30318/consoleFull) for PR 5518 at commit [`1ccbfa8`](https://github.com/apache/spark/commit/1ccbfa8ef27b284ace64e605b21f0e4915b53393).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93219080
  
[Test build #30319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30319/consoleFull) for PR 5208 at commit [`5049d88`](https://github.com/apache/spark/commit/5049d882fbfcf9b7c63e95ec20d3a15310068752).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93218845
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30315/
Test FAILed.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93218835
  
[Test build #30315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30315/consoleFull) for PR 5208 at commit [`f91a2ae`](https://github.com/apache/spark/commit/f91a2aecf795b2a2b2b834bf69b21875ef6f0b6f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`

 * This patch **adds the following new dependencies:**
   * `commons-math3-3.4.1.jar`
   * `snappy-java-1.1.1.7.jar`

 * This patch **removes the following dependencies:**
   * `commons-math3-3.1.1.jar`
   * `snappy-java-1.1.1.6.jar`






[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode

2015-04-14 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/5518

[SQL][Minor] Fix foreachUp of treenode

`foreachUp` should run the given function recursively on [[children]] first, then on 
this node (just like `transformUp`). The current implementation does not follow this.

As a result, `checkAnalysis` does not check from the bottom of the logical tree.
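A minimal sketch of the intended bottom-up visit order, using a simplified stand-in for the actual Catalyst `TreeNode` class:

```scala
// Simplified tree node: foreachUp visits all children (recursively) before
// applying the function to the node itself, mirroring transformUp's order.
case class Node(value: Int, children: Seq[Node] = Nil) {
  def foreachUp(f: Node => Unit): Unit = {
    children.foreach(_.foreachUp(f))
    f(this)
  }
}

val tree = Node(1, Seq(Node(2), Node(3, Seq(Node(4)))))
val visited = scala.collection.mutable.ArrayBuffer[Int]()
tree.foreachUp(n => visited += n.value)
// visited order: 2, 4, 3, 1 -- leaves before their parents, root last
```

Visiting bottom-up matters for analysis checks: an error at a leaf should be reported before any error at an ancestor that merely inherits it.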

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5518


commit 1ccbfa8ef27b284ace64e605b21f0e4915b53393
Author: Fei Wang 
Date:   2015-04-15T06:31:21Z

fix foreachUp







[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5247#issuecomment-93218343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30316/
Test FAILed.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28396290
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPl
          case Seq(a,b) => a compatibleWith b
        }.exists(!_)
 
-      // Check if the partitioning we want to ensure is the same as the child's output
-      // partitioning. If so, we do not need to add the Exchange operator.
-      def addExchangeIfNecessary(partitioning: Partitioning, child: SparkPlan): SparkPlan =
-        if (child.outputPartitioning != partitioning) Exchange(partitioning, child) else child
+      // Adds Exchange or Sort operators as required
+      def addOperatorsIfNecessary(
+          partitioning: Partitioning,
+          rowOrdering: Seq[SortOrder],
+          child: SparkPlan): SparkPlan = {
+        val needSort = rowOrdering.nonEmpty && child.outputOrdering != rowOrdering
+        val needsShuffle = child.outputPartitioning != partitioning
+        val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, rowOrdering)
+
+        if (needSort && needsShuffle && canSortWithShuffle) {
+          Exchange(partitioning, rowOrdering, child)
+        } else {
+          val withShuffle = if (needsShuffle) {
+            Exchange(partitioning, Nil, child)
+          } else {
+            child
+          }
 
-      if (meetsRequirements && compatible) {
+          val withSort = if (needSort) {
+            Sort(rowOrdering, global = false, withShuffle)
+          } else {
+            withShuffle
+          }
+
+          withSort
+        }
+      }
+
+      if (meetsRequirements && compatible && !needsAnySort) {
         operator
       } else {
         // At least one child does not satisfies its required data distribution or
         // at least one child's outputPartitioning is not compatible with another child's
         // outputPartitioning. In this case, we need to add Exchange operators.
-        val repartitionedChildren = operator.requiredChildDistribution.zip(operator.children).map {
-          case (AllTuples, child) =>
-            addExchangeIfNecessary(SinglePartition, child)
-          case (ClusteredDistribution(clustering), child) =>
-            addExchangeIfNecessary(HashPartitioning(clustering, numPartitions), child)
-          case (OrderedDistribution(ordering), child) =>
-            addExchangeIfNecessary(RangePartitioning(ordering, numPartitions), child)
-          case (UnspecifiedDistribution, child) => child
-          case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+        val requirements =
+          (operator.requiredChildDistribution, operator.requiredChildOrdering, operator.children)
+
+        val fixedChildren = requirements.zipped.map {
+          case (AllTuples, rowOrdering, child) =>
+            addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+          case (ClusteredDistribution(clustering), rowOrdering, child) =>
+            addOperatorsIfNecessary(HashPartitioning(clustering, numPartitions), rowOrdering, child)
+          case (OrderedDistribution(ordering), rowOrdering, child) =>
+            addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --

OK, let's add it then.
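The decision logic in the diff above can be sketched with plain Scala stand-ins (these are simplified toy types, not the real `SparkPlan` operators): shuffle only when the partitioning differs, sort only when a required ordering is not already provided, and fold the sort into the shuffle when the exchange can sort for us.

```scala
// Toy plan nodes standing in for SparkPlan, Exchange, and Sort.
sealed trait Plan
case object Leaf extends Plan
case class Shuffle(sortWithinPartitions: Boolean, child: Plan) extends Plan
case class Sort(child: Plan) extends Plan

// Mirrors the shape of addOperatorsIfNecessary in the diff: the three booleans
// stand in for the partitioning/ordering comparisons computed there.
def addOperatorsIfNecessary(needsShuffle: Boolean, needsSort: Boolean,
                            canSortWithShuffle: Boolean, child: Plan): Plan =
  if (needsSort && needsShuffle && canSortWithShuffle) {
    Shuffle(sortWithinPartitions = true, child)    // one sorting shuffle, no extra Sort
  } else {
    val withShuffle = if (needsShuffle) Shuffle(sortWithinPartitions = false, child) else child
    if (needsSort) Sort(withShuffle) else withShuffle
  }
```

The interesting case is the first branch: when a shuffle is needed anyway and the exchange knows how to sort, emitting a single sorting shuffle avoids a separate Sort pass over the shuffled data.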





[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5247#issuecomment-93217874
  
[Test build #30317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30317/consoleFull) for PR 5247 at commit [`7f51f7e`](https://github.com/apache/spark/commit/7f51f7e7c3406611b20b5570e71872cea44f93e8).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5247#issuecomment-93217876
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30317/
Test FAILed.





[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/5247#issuecomment-93217820
  
Here is a problem that needs to be fixed: the DDL command will be executed twice.





[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4435#issuecomment-93217736
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30301/
Test FAILed.





[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4435#issuecomment-93217730
  
**[Test build #30301 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30301/consoleFull)** for PR 4435 at commit [`c22b11f`](https://github.com/apache/spark/commit/c22b11f0a808135e492cb50c5b5bdebcfd73b1a5) after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28396071
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -87,7 +126,12 @@ case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends Una
         implicit val ordering = new RowOrdering(sortingExpressions, child.output)
--- End diff --

oh, I see... For RangePartitioner..





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93217496
  
[Test build #30313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30313/consoleFull) for PR 5511 at commit [`48e3e57`](https://github.com/apache/spark/commit/48e3e57e2dd7ac11002515bcb8906eb1215ab0cf).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedAttribute(nameParts: Seq[String])`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93217501
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30313/
Test FAILed.





[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5247#issuecomment-93217483
  
  [Test build #30317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30317/consoleFull)
 for   PR 5247 at commit 
[`7f51f7e`](https://github.com/apache/spark/commit/7f51f7e7c3406611b20b5570e71872cea44f93e8).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395998
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -87,7 +126,12 @@ case class Exchange(newPartitioning: Partitioning, 
child: SparkPlan) extends Una
 implicit val ordering = new RowOrdering(sortingExpressions, 
child.output)
--- End diff --

maybe this line is redundant?





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395922
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
+  } else {
+withShuffle
+  }
+
+  withSort
+}
+  }
+
+  if (meetsRequirements && compatible  && !needsAnySort) {
 operator
   } else {
 // At least one child does not satisfies its required data 
distribution or
 // at least one child's outputPartitioning is not compatible with 
another child's
 // outputPartitioning. In this case, we need to add Exchange 
operators.
-val repartitionedChildren = 
operator.requiredChildDistribution.zip(operator.children).map {
-  case (AllTuples, child) =>
-addExchangeIfNecessary(SinglePartition, child)
-  case (ClusteredDistribution(clustering), child) =>
-addExchangeIfNecessary(HashPartitioning(clustering, 
numPartitions), child)
-  case (OrderedDistribution(ordering), child) =>
-addExchangeIfNecessary(RangePartitioning(ordering, 
numPartitions), child)
-  case (UnspecifiedDistribution, child) => child
-  case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+val requirements =
+  (operator.requiredChildDistribution, 
operator.requiredChildOrdering, operator.children)
+
+val fixedChildren = requirements.zipped.map {
+  case (AllTuples, rowOrdering, child) =>
+addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+  case (ClusteredDistribution(clustering), rowOrdering, child) =>
+addOperatorsIfNecessary(HashPartitioning(clustering, 
numPartitions), rowOrdering, child)
+  case (OrderedDistribution(ordering), rowOrdering, child) =>
+addOperatorsIfNecessary(RangePartitioning(ordering, 
numPartitions), Nil, child)
--- End diff --

@yhuai Good catch.

@adrian-wang I already modified `addOperatorsIfNecessary` and `Exchange` so 
that they could handle ordering for `RangePartitioning`. We just need to pass 
the information from the `match` into the function call.  The problem with not 
propagating the information here is that we will silently fail to order 
correctly, instead of throwing an error.
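The branching that `addOperatorsIfNecessary` performs in the quoted diff can be modeled with plain Scala, stripped of Spark's actual operator classes (the case classes below are simplified stand-ins, not Spark's API): when the shuffle itself can produce the required ordering, a single `Exchange` suffices; otherwise a shuffle and a partition-local sort are composed.

```scala
// Simplified stand-ins for Spark's physical operators (not the real classes).
sealed trait Plan
case object Child extends Plan
case class Exchange(ordering: Seq[String], child: Plan) extends Plan
case class Sort(ordering: Seq[String], child: Plan) extends Plan

// Mirrors the branching in addOperatorsIfNecessary: when the shuffle can also
// sort, emit a single Exchange carrying the ordering; otherwise compose a
// plain shuffle (if needed) with a partition-local Sort (if needed).
def addOperatorsIfNecessary(
    needsShuffle: Boolean,
    needSort: Boolean,
    canSortWithShuffle: Boolean,
    ordering: Seq[String],
    child: Plan): Plan =
  if (needSort && needsShuffle && canSortWithShuffle) {
    Exchange(ordering, child)
  } else {
    val withShuffle = if (needsShuffle) Exchange(Nil, child) else child
    if (needSort) Sort(ordering, withShuffle) else withShuffle
  }

// One Exchange when it can sort during the shuffle...
assert(addOperatorsIfNecessary(true, true, true, Seq("key"), Child) ==
  Exchange(Seq("key"), Child))
// ...otherwise a Sort layered on top of a plain Exchange.
assert(addOperatorsIfNecessary(true, true, false, Seq("key"), Child) ==
  Sort(Seq("key"), Exchange(Nil, Child)))
// No requirements: the child is returned unchanged.
assert(addOperatorsIfNecessary(false, false, false, Nil, Child) == Child)
```

Passing `Nil` in the `OrderedDistribution` case drops the `ordering` argument on the floor, which is the silent failure mode described above.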





[GitHub] spark pull request: [SPARK-6443][Spark Submit]Could not submit app...

2015-04-14 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5116#issuecomment-93216337
  
ping? @andrewor14 





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395812
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
+  } else {
+withShuffle
+  }
+
+  withSort
+}
+  }
+
+  if (meetsRequirements && compatible  && !needsAnySort) {
 operator
   } else {
 // At least one child does not satisfies its required data 
distribution or
 // at least one child's outputPartitioning is not compatible with 
another child's
 // outputPartitioning. In this case, we need to add Exchange 
operators.
-val repartitionedChildren = 
operator.requiredChildDistribution.zip(operator.children).map {
-  case (AllTuples, child) =>
-addExchangeIfNecessary(SinglePartition, child)
-  case (ClusteredDistribution(clustering), child) =>
-addExchangeIfNecessary(HashPartitioning(clustering, 
numPartitions), child)
-  case (OrderedDistribution(ordering), child) =>
-addExchangeIfNecessary(RangePartitioning(ordering, 
numPartitions), child)
-  case (UnspecifiedDistribution, child) => child
-  case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+val requirements =
+  (operator.requiredChildDistribution, 
operator.requiredChildOrdering, operator.children)
+
+val fixedChildren = requirements.zipped.map {
+  case (AllTuples, rowOrdering, child) =>
+addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+  case (ClusteredDistribution(clustering), rowOrdering, child) =>
+addOperatorsIfNecessary(HashPartitioning(clustering, 
numPartitions), rowOrdering, child)
+  case (OrderedDistribution(ordering), rowOrdering, child) =>
+addOperatorsIfNecessary(RangePartitioning(ordering, 
numPartitions), Nil, child)
--- End diff --

Since we already have all of the needed functions, why not pass the 
`rowOrdering` back (if it is indeed wrong to ignore it)? If we leave it as is, 
then if a future `SparkPlan` requires both `OrderedDistribution` and some 
kind of row ordering (for example, using a range partitioner to handle data 
skew), the physical plan will be wrong.





[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...

2015-04-14 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5491#issuecomment-93216107
  
@vanzin Now I use an extra global ListBuffer to store the apps to clean, 
updating its contents and deleting their dirs/files in every cleaning round.

I know the elements in this ListBuffer could be of type `Path` or `String` 
to occupy less space, but to keep the logic simple I just left them as 
`FsApplicationHistoryInfo`.
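The approach described above can be sketched in isolation (all names here, such as `AppInfo` and `cleanRound`, are hypothetical stand-ins for the history server's real types): a global buffer accumulates apps marked for cleanup, and each round drains it, retrying any deletions that failed.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for FsApplicationHistoryInfo.
case class AppInfo(id: String, logPath: String)

// Global buffer of apps whose logs are pending deletion.
val appsToClean = ListBuffer[AppInfo]()

// Stand-in for directory deletion; the real code would delete files on disk.
def deleteLogs(app: AppInfo): Boolean = app.logPath.nonEmpty

// One cleaning round: attempt every pending deletion, keep failures
// in the buffer so the next round retries them, and report successes.
def cleanRound(): Seq[AppInfo] = {
  val (cleaned, failed) = appsToClean.toList.partition(deleteLogs)
  appsToClean.clear()
  appsToClean ++= failed
  cleaned
}

appsToClean += AppInfo("app-1", "/tmp/app-1")
assert(cleanRound().map(_.id) == List("app-1"))
assert(appsToClean.isEmpty) // nothing left to retry
```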





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93215769
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30309/
Test FAILed.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93215758
  
  [Test build #30309 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30309/consoleFull)
 for   PR 5208 at commit 
[`f515cd2`](https://github.com/apache/spark/commit/f515cd29bbe7765eefbb185ad26b5dbb9e2d7380).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`

 * This patch **adds the following new dependencies:**
   * `snappy-java-1.1.1.7.jar`

 * This patch **removes the following dependencies:**
   * `snappy-java-1.1.1.6.jar`






[GitHub] spark pull request: [SPARK-6352] [SQL] Add DirectParquetOutputComm...

2015-04-14 Thread ypcat
Github user ypcat commented on the pull request:

https://github.com/apache/spark/pull/5042#issuecomment-93214702
  
I cannot find a way to unset a config value in the hadoop 1.x API. The closest 
thing is to set it to a default value, which I think should be fine in test 
code.
Also, I found I cannot add more commits to this PR since it is closed. Should 
we reopen it or open a new PR?





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93214000
  
  [Test build #30313 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30313/consoleFull)
 for   PR 5511 at commit 
[`48e3e57`](https://github.com/apache/spark/commit/48e3e57e2dd7ac11002515bcb8906eb1215ab0cf).





[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5491#issuecomment-93213684
  
  [Test build #30314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30314/consoleFull)
 for   PR 5491 at commit 
[`d7455d8`](https://github.com/apache/spark/commit/d7455d8df310d690d8104663dc39508011726d12).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93213692
  
  [Test build #30315 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30315/consoleFull)
 for   PR 5208 at commit 
[`f91a2ae`](https://github.com/apache/spark/commit/f91a2aecf795b2a2b2b834bf69b21875ef6f0b6f).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395453
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
+  } else {
+withShuffle
+  }
+
+  withSort
+}
+  }
+
+  if (meetsRequirements && compatible  && !needsAnySort) {
 operator
   } else {
 // At least one child does not satisfies its required data 
distribution or
 // at least one child's outputPartitioning is not compatible with 
another child's
 // outputPartitioning. In this case, we need to add Exchange 
operators.
-val repartitionedChildren = 
operator.requiredChildDistribution.zip(operator.children).map {
-  case (AllTuples, child) =>
-addExchangeIfNecessary(SinglePartition, child)
-  case (ClusteredDistribution(clustering), child) =>
-addExchangeIfNecessary(HashPartitioning(clustering, 
numPartitions), child)
-  case (OrderedDistribution(ordering), child) =>
-addExchangeIfNecessary(RangePartitioning(ordering, 
numPartitions), child)
-  case (UnspecifiedDistribution, child) => child
-  case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+val requirements =
+  (operator.requiredChildDistribution, 
operator.requiredChildOrdering, operator.children)
+
+val fixedChildren = requirements.zipped.map {
+  case (AllTuples, rowOrdering, child) =>
+addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+  case (ClusteredDistribution(clustering), rowOrdering, child) =>
+addOperatorsIfNecessary(HashPartitioning(clustering, 
numPartitions), rowOrdering, child)
+  case (OrderedDistribution(ordering), rowOrdering, child) =>
+addOperatorsIfNecessary(RangePartitioning(ordering, 
numPartitions), Nil, child)
--- End diff --

Currently only sort merge join requires `childOrdering`, and in that 
case the partitioning cannot be `RangePartitioning`, so it doesn't matter if 
we don't handle `rowOrdering` for now.





[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5491#issuecomment-93210950
  
  [Test build #30312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30312/consoleFull)
 for   PR 5491 at commit 
[`b0abca5`](https://github.com/apache/spark/commit/b0abca54d693399ec2ebd966309b0aded735dd06).





[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5343#issuecomment-93210723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30302/
Test PASSed.





[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5343#issuecomment-93210645
  
  [Test build #30302 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull)
 for   PR 5343 at commit 
[`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `commons-math3-3.4.1.jar`
   * `snappy-java-1.1.1.7.jar`

 * This patch **removes the following dependencies:**
   * `commons-math3-3.1.1.jar`
   * `snappy-java-1.1.1.6.jar`






[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5358#issuecomment-93208166
  
  [Test build #30311 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30311/consoleFull)
 for   PR 5358 at commit 
[`8909a5d`](https://github.com/apache/spark/commit/8909a5d14dccc1933a451261bc0e56a2cc876897).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395272
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -102,6 +106,8 @@ case class Limit(limit: Int, child: SparkPlan)
   override def output: Seq[Attribute] = child.output
   override def outputPartitioning: Partitioning = SinglePartition
 
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
--- End diff --

I am not sure this is correct. We are merging rows from multiple partitions 
into a single partition, and `outputOrdering` only guarantees the row ordering 
within a single partition. It seems that without a merge sort we cannot use 
`child.outputOrdering`. How about we just remove it for now?
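The objection can be seen with plain collections: each partition may be sorted, yet collapsing them into a single partition by concatenation, as a single-partition `Limit` effectively does, does not yield a sorted result unless the sorted runs are merged. A minimal sketch:

```scala
// Two partitions, each individually sorted.
val partitions = Seq(Seq(1, 4, 7), Seq(2, 3, 9))

// Collapsing to a single partition by concatenation loses the ordering...
val concatenated = partitions.flatten
assert(concatenated == Seq(1, 4, 7, 2, 3, 9))
assert(concatenated != concatenated.sorted)

// ...whereas merging the sorted runs (modeled here with a full sort)
// preserves it, which is exactly the step the plain operator skips.
val merged = partitions.flatten.sorted
assert(merged == Seq(1, 2, 3, 4, 7, 9))
```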





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28395119
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
+  } else {
+withShuffle
+  }
+
+  withSort
+}
+  }
+
+  if (meetsRequirements && compatible  && !needsAnySort) {
 operator
   } else {
 // At least one child does not satisfies its required data 
distribution or
 // at least one child's outputPartitioning is not compatible with 
another child's
 // outputPartitioning. In this case, we need to add Exchange 
operators.
-val repartitionedChildren = 
operator.requiredChildDistribution.zip(operator.children).map {
-  case (AllTuples, child) =>
-addExchangeIfNecessary(SinglePartition, child)
-  case (ClusteredDistribution(clustering), child) =>
-addExchangeIfNecessary(HashPartitioning(clustering, 
numPartitions), child)
-  case (OrderedDistribution(ordering), child) =>
-addExchangeIfNecessary(RangePartitioning(ordering, 
numPartitions), child)
-  case (UnspecifiedDistribution, child) => child
-  case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+val requirements =
+  (operator.requiredChildDistribution, 
operator.requiredChildOrdering, operator.children)
+
+val fixedChildren = requirements.zipped.map {
+  case (AllTuples, rowOrdering, child) =>
+addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+  case (ClusteredDistribution(clustering), rowOrdering, child) =>
+addOperatorsIfNecessary(HashPartitioning(clustering, 
numPartitions), rowOrdering, child)
+  case (OrderedDistribution(ordering), rowOrdering, child) =>
+addOperatorsIfNecessary(RangePartitioning(ordering, 
numPartitions), Nil, child)
--- End diff --

@marmbrus Seems we should not ignore the `rowOrdering`?





[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...

2015-04-14 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/5063#issuecomment-93206397
  
@andrewor14 Thanks for the overall review. I'll address the issues you raised.





[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...

2015-04-14 Thread twinkle-sachdeva
Github user twinkle-sachdeva commented on a diff in the pull request:

https://github.com/apache/spark/pull/5449#discussion_r28394987
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -94,6 +98,14 @@ private[yarn] class YarnAllocator(
   // Additional memory overhead.
   protected val memoryOverhead: Int = 
sparkConf.getInt("spark.yarn.executor.memoryOverhead",
 math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, 
MEMORY_OVERHEAD_MIN))
+
+  // Make the maximum executor failure check to be relative with respect 
to duration
+  private val relativeMaxExecutorFailureCheck = 
--- End diff --

Sounds reasonable. 
Added the property as spark.yarn.max.executor.failuresPerMinute





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93205970
  
  [Test build #30309 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30309/consoleFull)
 for   PR 5208 at commit 
[`f515cd2`](https://github.com/apache/spark/commit/f515cd29bbe7765eefbb185ad26b5dbb9e2d7380).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394994
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
+  } else {
+withShuffle
+  }
+
+  withSort
+}
+  }
+
+  if (meetsRequirements && compatible  && !needsAnySort) {
 operator
   } else {
 // At least one child does not satisfies its required data 
distribution or
 // at least one child's outputPartitioning is not compatible with 
another child's
 // outputPartitioning. In this case, we need to add Exchange 
operators.
-val repartitionedChildren = 
operator.requiredChildDistribution.zip(operator.children).map {
-  case (AllTuples, child) =>
-addExchangeIfNecessary(SinglePartition, child)
-  case (ClusteredDistribution(clustering), child) =>
-addExchangeIfNecessary(HashPartitioning(clustering, 
numPartitions), child)
-  case (OrderedDistribution(ordering), child) =>
-addExchangeIfNecessary(RangePartitioning(ordering, 
numPartitions), child)
-  case (UnspecifiedDistribution, child) => child
-  case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+val requirements =
+  (operator.requiredChildDistribution, 
operator.requiredChildOrdering, operator.children)
+
+val fixedChildren = requirements.zipped.map {
+  case (AllTuples, rowOrdering, child) =>
+addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+  case (ClusteredDistribution(clustering), rowOrdering, child) =>
+addOperatorsIfNecessary(HashPartitioning(clustering, 
numPartitions), rowOrdering, child)
+  case (OrderedDistribution(ordering), rowOrdering, child) =>
+addOperatorsIfNecessary(RangePartitioning(ordering, 
numPartitions), Nil, child)
+
+  case (UnspecifiedDistribution, Seq(), child) =>
+child
+  case (UnspecifiedDistribution, rowOrdering, child) =>
+Sort(rowOrdering, global = false, child)
--- End diff --

Use `execution.ExternalSort` when `sqlContext.conf.externalSortEnabled` is 
true.
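To illustrate the suggestion, here is a toy sketch of the conditional operator choice. The case classes below are stand-ins I made up for illustration, not the real `SparkPlan` operators:

```scala
// Toy stand-ins for the physical plan nodes; the real operators live in
// org.apache.spark.sql.execution. This only illustrates picking ExternalSort
// over the in-memory Sort when the externalSortEnabled flag is set.
sealed trait PlanNode
case object Leaf extends PlanNode
case class Sort(ordering: Seq[String], global: Boolean, child: PlanNode) extends PlanNode
case class ExternalSort(ordering: Seq[String], global: Boolean, child: PlanNode) extends PlanNode

def sortOperator(externalSortEnabled: Boolean, ordering: Seq[String], child: PlanNode): PlanNode =
  if (externalSortEnabled) ExternalSort(ordering, global = false, child)
  else Sort(ordering, global = false, child)
```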





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-93205694
  
  [Test build #30310 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30310/consoleFull)
 for   PR 4723 at commit 
[`dc0cf6f`](https://github.com/apache/spark/commit/dc0cf6ffdd6f4c4c58a47f69ecef3f9103caef4f).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394950
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: 
SQLContext) extends Rule[SparkPl
 case Seq(a,b) => a compatibleWith b
   }.exists(!_)
 
-  // Check if the partitioning we want to ensure is the same as the 
child's output
-  // partitioning. If so, we do not need to add the Exchange operator.
-  def addExchangeIfNecessary(partitioning: Partitioning, child: 
SparkPlan): SparkPlan =
-if (child.outputPartitioning != partitioning) 
Exchange(partitioning, child) else child
+  // Adds Exchange or Sort operators as required
+  def addOperatorsIfNecessary(
+  partitioning: Partitioning,
+  rowOrdering: Seq[SortOrder],
+  child: SparkPlan): SparkPlan = {
+val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
+val needsShuffle = child.outputPartitioning != partitioning
+val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, 
rowOrdering)
+
+if (needSort && needsShuffle && canSortWithShuffle) {
+  Exchange(partitioning, rowOrdering, child)
+} else {
+  val withShuffle = if (needsShuffle) {
+Exchange(partitioning, Nil, child)
+  } else {
+child
+  }
 
-  if (meetsRequirements && compatible) {
+  val withSort = if (needSort) {
+Sort(rowOrdering, global = false, withShuffle)
--- End diff --

As we do in `SparkStrategies`, use `execution.ExternalSort` when `sqlContext.conf.externalSortEnabled` is `true`.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394813
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -19,24 +19,39 @@ package org.apache.spark.sql.execution
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.shuffle.sort.SortShuffleManager
-import org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.{SparkEnv, HashPartitioner, RangePartitioner, 
SparkConf}
 import org.apache.spark.rdd.{RDD, ShuffledRDD}
 import org.apache.spark.sql.{SQLContext, Row}
 import org.apache.spark.sql.catalyst.errors.attachTree
-import org.apache.spark.sql.catalyst.expressions.{Attribute, RowOrdering}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.util.MutablePair
 
+object Exchange {
+  /** Returns true when the ordering expressions are a subset of the key. 
*/
+  def canSortWithShuffle(partitioning: Partitioning, desiredOrdering: 
Seq[SortOrder]): Boolean = {
--- End diff --

It would be good to also explain that we need the ordering expressions to be a subset of the key, because we are taking advantage of `ShuffledRDD`'s `KeyOrdering` for sorting.
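As a rough illustration of that constraint, sorting can ride on the shuffle only when every ordering expression is also part of the partitioning key. The types below are simplified stand-ins (plain strings instead of Catalyst expressions), not the actual `Exchange` companion-object code:

```scala
// Toy model: an ordering can be produced by the shuffle itself only if every
// ordering expression is among the partitioning (key) expressions, since
// ShuffledRDD can only sort by its shuffle key.
case class SortOrder(expr: String)
case class HashPartitioning(expressions: Seq[String])

def canSortWithShuffle(partitioning: HashPartitioning, desiredOrdering: Seq[SortOrder]): Boolean =
  desiredOrdering.nonEmpty &&
    desiredOrdering.forall(o => partitioning.expressions.contains(o.expr))
```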





[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5488#issuecomment-93203795
  
  [Test build #30307 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30307/consoleFull)
 for   PR 5488 at commit 
[`1dcc929`](https://github.com/apache/spark/commit/1dcc9294d0a5a6e9ac58536c0b39ccb433b89b1c).





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93203778
  
  [Test build #30306 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30306/consoleFull)
 for   PR 5511 at commit 
[`820dc45`](https://github.com/apache/spark/commit/820dc4515f968fbbee01dc073fc3813a4fc9d9d0).





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-93203535
  
  [Test build #30308 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30308/consoleFull)
 for   PR 4723 at commit 
[`9da49be`](https://github.com/apache/spark/commit/9da49be0cf2e569a9c871dd7bbb3aee7820f9e0e).





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394599
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -120,27 +161,34 @@ case class Exchange(newPartitioning: Partitioning, 
child: SparkPlan) extends Una
  * Ensures that the 
[[org.apache.spark.sql.catalyst.plans.physical.Partitioning Partitioning]]
  * of input data meets the
  * [[org.apache.spark.sql.catalyst.plans.physical.Distribution 
Distribution]] requirements for
- * each operator by inserting [[Exchange]] Operators where required.
+ * each operator by inserting [[Exchange]] Operators where required.  Also 
ensure that the
+ * required input partition ordering requirements are met.
  */
-private[sql] case class AddExchange(sqlContext: SQLContext) extends 
Rule[SparkPlan] {
+private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends 
Rule[SparkPlan] {
   // TODO: Determine the number of partitions.
   def numPartitions: Int = sqlContext.conf.numShufflePartitions
 
   def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
 case operator: SparkPlan =>
-  // Check if every child's outputPartitioning satisfies the 
corresponding
+  // True iff every child's outputPartitioning satisfies the 
corresponding
   // required data distribution.
   def meetsRequirements: Boolean =
-!operator.requiredChildDistribution.zip(operator.children).map {
+operator.requiredChildDistribution.zip(operator.children).forall {
   case (required, child) =>
 val valid = child.outputPartitioning.satisfies(required)
 logDebug(
   s"${if (valid) "Valid" else "Invalid"} distribution," +
 s"required: $required current: 
${child.outputPartitioning}")
 valid
-}.exists(!_)
+}
 
-  // Check if outputPartitionings of children are compatible with each 
other.
+  // True iff any of the children are incorrectly sorted.
+  def needsAnySort: Boolean =
+operator.requiredChildOrdering.zip(operator.children).exists {
+  case (required, child) => required.nonEmpty && required != child
--- End diff --

It seems you want `case (required, child) => required.nonEmpty && required != child.outputOrdering`?
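The suggested comparison can be sketched with toy stand-ins for `SparkPlan` and `SortOrder` (the real types carry far more state; this only isolates the check being discussed):

```scala
// Toy stand-ins: compare the required ordering against the child's
// outputOrdering, not against the child plan itself.
case class Plan(outputOrdering: Seq[String])
case class Operator(requiredChildOrdering: Seq[Seq[String]], children: Seq[Plan])

// True iff any child is incorrectly sorted.
def needsAnySort(op: Operator): Boolean =
  op.requiredChildOrdering.zip(op.children).exists {
    case (required, child) => required.nonEmpty && required != child.outputOrdering
  }
```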





[GitHub] spark pull request: [SPARK-6899][SQL] Fix type mismatch when using...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5517#issuecomment-93202277
  
  [Test build #30305 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30305/consoleFull)
 for   PR 5517 at commit 
[`8ae5f65`](https://github.com/apache/spark/commit/8ae5f6505a68f6ef0bed2cd3fb3bd72a61156e22).





[GitHub] spark pull request: [SPARK-6899][SQL] Fix type mismatch when using...

2015-04-14 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/5517

[SPARK-6899][SQL] Fix type mismatch when using codegen with Average

JIRA https://issues.apache.org/jira/browse/SPARK-6899


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 fix_codegen_average

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5517


commit 8ae5f6505a68f6ef0bed2cd3fb3bd72a61156e22
Author: Liang-Chi Hsieh 
Date:   2015-04-15T05:31:04Z

Add the case of DecimalType.Unlimited to Average.







[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394315
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -91,6 +94,16 @@ class JoinSuite extends QueryTest with 
BeforeAndAfterEach {
   ("SELECT * FROM testData full JOIN testData2 ON (key * a != key + 
a)",
 classOf[BroadcastNestedLoopJoin])
 ).foreach { case (query, joinClass) => assertJoin(query, joinClass) }
+try {
+  conf.setConf("spark.sql.planner.sortMergeJoin", "true")
+  Seq(
+("SELECT * FROM testData JOIN testData2 ON key = a", 
classOf[SortMergeJoin]),
+("SELECT * FROM testData JOIN testData2 ON key = a and key = 2", 
classOf[SortMergeJoin]),
+("SELECT * FROM testData JOIN testData2 ON key = a where key = 2", 
classOf[SortMergeJoin])
+  ).foreach { case (query, joinClass) => assertJoin(query, joinClass) }
+} finally {
+  conf.setConf("spark.sql.planner.sortMergeJoin", 
SORTMERGEJOIN_ENABLED.toString)
+}
   }
 
   test("broadcasted hash join operator selection") {
--- End diff --

Let's also add a test in this one to make sure a broadcast join is still selected when sort merge join is enabled.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5208#discussion_r28394224
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -51,6 +51,8 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach 
{
   case j: CartesianProduct => j
   case j: BroadcastNestedLoopJoin => j
   case j: BroadcastLeftSemiJoinHash => j
+  case j: ShuffledHashJoin => j
--- End diff --

It seems it is the first `case`.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93195996
  
LGTM





[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...

2015-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5488#discussion_r28394137
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala ---
@@ -50,9 +50,11 @@ private[sql] object JDBCRelation {
* Given a partitioning schematic (a column of integral type, a number of
* partitions, and upper and lower bounds on the column's value), 
generate
* WHERE clauses for each partition so that each row in the table appears
-   * exactly once.  The parameters minValue and maxValue are advisory in 
that
+   * exactly once. The parameters minValue and maxValue are advisory in 
that
* incorrect values may cause the partitioning to be poor, but no data
-   * will fail to be represented.
+   * will fail to be represented. Note: the upper and lower bounds are just
+   * used to decide partition stride, not for filtering. So all the rows in
+   * table will be partitioned.
--- End diff --

> The parameters minValue and maxValue are advisory in that incorrect 
values may cause the partitioning to be poor, but no data will fail to be 
represented.

The sentence above already explains that the filters are only used for 
partitioning and that all data will always be returned.  I think the best place 
to update would be in the [SQL programming 
guide](https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md),
 in the table under the section "JDBC To Other Databases".
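For readers following along, the stride behavior can be sketched roughly like this. This is a simplified toy, not the actual `JDBCRelation` code, and it ignores nulls and degenerate partition counts:

```scala
// Simplified sketch: the lower/upper bounds only determine the stride. The
// first and last partitions are left open-ended, so rows outside the
// advertised bounds still land in exactly one partition (no data is lost).
def partitionWhereClauses(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi"
    else if (i == numPartitions - 1) s"$column >= $lo"
    else s"$column >= $lo AND $column < $hi"
  }
}
```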





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5350#discussion_r28394119
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -76,6 +76,12 @@ case class DropTable(
 private[hive]
 case class AddJar(path: String) extends RunnableCommand {
 
+  override val output: Seq[Attribute] = {
+val schema = StructType(
+  StructField("result", IntegerType, false) :: Nil)
+schema.toAttributes
+  }
--- End diff --

OK, the reason is to match the behavior of Hive... This change looks good.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93194500
  
  [Test build #30304 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30304/consoleFull)
 for   PR 5208 at commit 
[`ec8061b`](https://github.com/apache/spark/commit/ec8061b7f36b87c883af111438ac9ff0304050d7).





[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4586#discussion_r28394095
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
 hiveContext.runSqlHive(s"ADD JAR $path")
 hiveContext.sparkContext.addJar(path)
-Seq.empty[Row]
+Seq(Row(0))
--- End diff --

(I thought it might be better to comment on the original PR.)

OK, I see. In the future, let's make sure we also update the `output` if the result of a command is not an empty Seq (#5350 will change the schema for `AddJar`).





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5511#discussion_r28394069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -59,7 +64,7 @@ case class UnresolvedAttribute(name: String) extends 
Attribute with trees.LeafNo
   override def newInstance(): UnresolvedAttribute = this
   override def withNullability(newNullability: Boolean): 
UnresolvedAttribute = this
   override def withQualifiers(newQualifiers: Seq[String]): 
UnresolvedAttribute = this
-  override def withName(newName: String): UnresolvedAttribute = 
UnresolvedAttribute(name)
--- End diff --

No, that seems wrong to me.





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/5511#discussion_r28394051
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -59,7 +64,7 @@ case class UnresolvedAttribute(name: String) extends 
Attribute with trees.LeafNo
   override def newInstance(): UnresolvedAttribute = this
   override def withNullability(newNullability: Boolean): 
UnresolvedAttribute = this
   override def withQualifiers(newQualifiers: Seq[String]): 
UnresolvedAttribute = this
-  override def withName(newName: String): UnresolvedAttribute = 
UnresolvedAttribute(name)
--- End diff --

The original code ignores the `newName`. Is this intended?





[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar

2015-04-14 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/4586#discussion_r28394029
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
 hiveContext.runSqlHive(s"ADD JAR $path")
 hiveContext.sparkContext.addJar(path)
-Seq.empty[Row]
+Seq(Row(0))
--- End diff --

Hive would return a `0` for the add jar command.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5350#discussion_r28393963
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -76,6 +76,12 @@ case class DropTable(
 private[hive]
 case class AddJar(path: String) extends RunnableCommand {
 
+  override val output: Seq[Attribute] = {
+val schema = StructType(
+  StructField("result", IntegerType, false) :: Nil)
+schema.toAttributes
+  }
--- End diff --

I do not really know why the result of AddJar is a `Row(0)` (see a few lines below). But we can figure that out after we merge it.





[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4586#discussion_r28393926
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
 hiveContext.runSqlHive(s"ADD JAR $path")
 hiveContext.sparkContext.addJar(path)
-Seq.empty[Row]
+Seq(Row(0))
--- End diff --

@adrian-wang Why do we need a Row with a value of 0 here?





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93192357
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30298/
Test PASSed.





[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5511#issuecomment-93192335
  
  [Test build #30298 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30298/consoleFull)
 for   PR 5511 at commit 
[`d81ad43`](https://github.com/apache/spark/commit/d81ad43e5e07fe2227db7bb383c98c6d2c0fb875).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `arpack_combined_all-0.1.jar`
   * `breeze-macros_2.10-0.3.1.jar`
   * `breeze_2.10-0.10.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-codec-1.5.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-el-1.0.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.4.jar`
   * `commons-lang-2.4.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.1.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `hadoop-client-1.0.4.jar`
   * `hadoop-core-1.0.4.jar`
   * `hsqldb-1.8.0.10.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.3.0.jar`
   * `jackson-core-2.3.0.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.3.0.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `jblas-1.2.3.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jets3t-0.7.1.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   * `scala-reflect-2.10.4.jar`
   * `scalap-2.10.4.jar`
   * `scalatest_2.10-2.2.1.jar`
   * `slf4j-api-1.7.10.jar`
   * `slf4j-log4j12-1.7.10.jar`
   * `snappy-java-1.1.1.6.jar`
   * `spark-bagel_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-catalyst_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-core_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-graphx_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-mllib_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-network-common_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-network-shuffle_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-repl_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-sql_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-streaming_2.10-1.3.0-SNAPSHOT.jar`
   * `spire-macros_2.10-0.7.4.jar`
   * `spire_2.10-0.7.4.jar`
   * `stream-2.7.0.jar`
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`
   * `uncommons-maths-1.2.2a.jar`
   * `unused-1.0.0.jar`
   * `xmlenc-0.52.jar`
   * `zookeeper-3.4.5.jar`






[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93190389
  
@yhuai can you do another pass over `Exchange.scala`?  I made several 
changes.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93189576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30299/
Test PASSed.





[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5208#issuecomment-93189565
  
  [Test build #30299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30299/consoleFull)
 for   PR 5208 at commit 
[`413fd24`](https://github.com/apache/spark/commit/413fd24a53d3b86eed7a57c130973da4417e8393).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`

 * This patch does not change any dependencies.
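
For readers following this thread: a sort merge join merges two inputs that are both pre-sorted on the join key. A minimal standalone sketch of the core merge step (plain Python, not Spark's `SortMergeJoin`, which additionally handles iterator inputs, null keys, and spilling):

```python
def sort_merge_join(left, right, key=lambda kv: kv[0]):
    """Join two lists of (key, value) pairs pre-sorted by key.

    Returns (key, left_value, right_value) tuples. Sketch of the
    classic algorithm only.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # advance the smaller side
        elif lk > rk:
            j += 1
        else:
            # collect the full run of equal keys on both sides,
            # then emit the cross product of the two runs
            i2 = i
            while i2 < len(left) and key(left[i2]) == lk:
                i2 += 1
            j2 = j
            while j2 < len(right) and key(right[j2]) == lk:
                j2 += 1
            for a in range(i, i2):
                for b in range(j, j2):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i2, j2
    return out
```

Because both sides are consumed in a single forward pass, neither side has to be materialized as a hash table, which is the main win over hash joins for large sorted inputs.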





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93188608
  
  [Test build #30303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30303/consoleFull)
 for   PR 5350 at commit 
[`3b7bfa8`](https://github.com/apache/spark/commit/3b7bfa8f37e7f2b9aefdfd0e5e57d7b5c6b516ce).





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93188052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30297/
Test FAILed.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93188027
  
  [Test build #30297 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30297/consoleFull)
 for   PR 5350 at commit 
[`2772f0d`](https://github.com/apache/spark/commit/2772f0d8face2f9c634718fb8719fe56c5d8d676).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CaseConversionExpression `
  * `final class UTF8String extends Ordered[UTF8String] with Serializable `

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/5350#discussion_r28392750
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 ---
@@ -284,9 +321,9 @@ object CatalystTypeConverters {
   row: Row,
   schema: StructType,
   converters: Array[Any => Any]): Row = {
-val ar = new Array[Any](row.size)
+val ar = new Array[Any](converters.size)
 var idx = 0
-while (idx < row.size) {
+while (idx < converters.size && idx < row.size) {
--- End diff --

It's a new test case in master; I had to merge with master to debug it.
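
The guarded loop in the diff above caps iteration at both the converter count and the row length; the same defensive bound in plain Python (sketch, not Catalyst's code):

```python
def convert_row_fields(row, converters):
    # Convert only as many fields as BOTH the row and the converter
    # array cover, mirroring `idx < converters.size && idx < row.size`
    # in the diff above. Extra converters (or extra row fields) are
    # skipped instead of raising an IndexError.
    n = min(len(row), len(converters))
    return [converters[i](row[i]) for i in range(n)]
```

The original code indexed by `row.size` alone, which fails when the converter array is shorter than the row; bounding on both lengths makes the mismatch benign.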





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5350#issuecomment-93184082
  
BTW - since this changes so many files, it'd be great to merge this as soon 
as possible. We can fix minor problems later in follow-up PRs.






[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support

2015-04-14 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/5173#issuecomment-93184039
  
@shaananc  It works fine here:
```
Using Python version 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014 00:54:21)
SparkContext available as sc, SQLContext available as sqlContext.
>>> data = (1, 2)
>>> sc.parallelize(data).reduce(lambda a, b: a + b)
3
```

What is your environment?





[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...

2015-04-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5350#discussion_r28392622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 ---
@@ -284,9 +321,9 @@ object CatalystTypeConverters {
   row: Row,
   schema: StructType,
   converters: Array[Any => Any]): Row = {
-val ar = new Array[Any](row.size)
+val ar = new Array[Any](converters.size)
 var idx = 0
-while (idx < row.size) {
+while (idx < converters.size && idx < row.size) {
--- End diff --

btw, where is "ADD JAR command 2"? I could not find it...





[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5480#issuecomment-93183523
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30295/
Test PASSed.





[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5480#issuecomment-93183518
  
  [Test build #30295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30295/consoleFull)
 for   PR 5480 at commit 
[`4da3712`](https://github.com/apache/spark/commit/4da3712f34b4e8672bf143f90fe1279cd114daab).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `snappy-java-1.1.1.7.jar`

 * This patch **removes the following dependencies:**
   * `snappy-java-1.1.1.6.jar`






[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5488#issuecomment-93183475
  
  [Test build #30296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30296/consoleFull)
 for   PR 5488 at commit 
[`3eb74d6`](https://github.com/apache/spark/commit/3eb74d614a05d33a3071d586fedc20bc4f2e88d6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5818][SQL] unable to use "add jar" in h...

2015-04-14 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/spark/pull/5393#discussion_r28392460
  
--- Diff: repl/pom.xml ---
@@ -150,6 +150,16 @@
   
   
 
+  <profile>
+    <id>hive</id>
+    <dependencies>
+      <dependency>
+        <groupId>org.apache.spark</groupId>
+        <artifactId>spark-hive_${scala.binary.version}</artifactId>
+        <version>${project.version}</version>
+      </dependency>
+    </dependencies>
+  </profile>
--- End diff --

No, the dependency is added only in the hive profile. If the assembly is built 
with the -Phive option, the hive dependency is added to repl, so hive is 
available on the classpath for the repl.
If the assembly is built without -Phive, the dependency is not added and 
the test case is ignored. The test case also checks at runtime whether the 
HiveContext class is available; if not, the test case is ignored.
I have manually tested both cases, building with and without -Phive. It will 
not impact assembly creation.





[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5488#issuecomment-93183480
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30296/
Test PASSed.





[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5358#issuecomment-93183316
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30300/
Test FAILed.





[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5358#issuecomment-93183311
  
  [Test build #30300 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30300/consoleFull)
 for   PR 5358 at commit 
[`9e7aaec`](https://github.com/apache/spark/commit/9e7aaecc5d28914a86a4d8b8da47504efd68bde6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6865][SQL] DataFrame column names shoul...

2015-04-14 Thread rxin
Github user rxin closed the pull request at:

https://github.com/apache/spark/pull/5505





[GitHub] spark pull request: [SPARK-6865][SQL] DataFrame column names shoul...

2015-04-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5505#issuecomment-93182797
  
I discussed with Michael offline -- given this would break self-joins, we've 
decided to treat the dot character (i.e. ".") as a special case.





[GitHub] spark pull request: SPARK-6919 Add asDict method to StatCounter

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5516#issuecomment-93181762
  
  [Test build #30292 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30292/consoleFull)
 for   PR 5516 at commit 
[`c933af7`](https://github.com/apache/spark/commit/c933af75aaeac641e71fead81f5d6804f882eff8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `snappy-java-1.1.1.7.jar`

 * This patch **removes the following dependencies:**
   * `snappy-java-1.1.1.6.jar`






[GitHub] spark pull request: SPARK-6919 Add asDict method to StatCounter

2015-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5516#issuecomment-93181791
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30292/
Test PASSed.
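
SPARK-6919 proposes an `asDict` accessor on PySpark's `StatCounter`; a minimal standalone sketch of what such an accessor might return over the usual running statistics (field names and the class itself are assumptions, not the PR's actual code):

```python
import math

class MiniStatCounter:
    # Tracks count/mean/min/max with Welford-style streaming updates
    # and exposes the summary as a dict. Hypothetical sketch of the
    # asDict idea, not PySpark's StatCounter.
    def __init__(self, values=()):
        self.n, self.mu, self.m2 = 0, 0.0, 0.0
        self.lo, self.hi = float("inf"), float("-inf")
        for v in values:
            self.merge(v)

    def merge(self, v):
        self.n += 1
        delta = v - self.mu
        self.mu += delta / self.n
        self.m2 += delta * (v - self.mu)   # running sum of squared deviations
        self.lo, self.hi = min(self.lo, v), max(self.hi, v)

    def as_dict(self):
        return {
            "count": self.n,
            "mean": self.mu,
            "min": self.lo,
            "max": self.hi,
            "stdev": math.sqrt(self.m2 / self.n) if self.n else 0.0,
        }
```

A dict-shaped summary is convenient because it can be serialized to JSON or fed into a DataFrame row directly, which is the usual motivation for such accessors.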





[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...

2015-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5343#issuecomment-93181658
  
  [Test build #30302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull)
 for   PR 5343 at commit 
[`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).





[GitHub] spark pull request: [SPARK-6911] [SQL] improve accessor for nested...

2015-04-14 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5513#discussion_r28392080
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -515,14 +515,15 @@ class Column(protected[sql] val expr: Expression) 
extends Logging {
   def rlike(literal: String): Column = RLike(expr, lit(literal).expr)
 
   /**
-   * An expression that gets an item at position `ordinal` out of an array.
+   * An expression that gets an item at position `ordinal` out of an array,
+   * or gets a value by key `key` in a [[MapType]].
*
* @group expr_ops
*/
-  def getItem(ordinal: Int): Column = GetItem(expr, Literal(ordinal))
+  def getItem(key: Any): Column = GetItem(expr, Literal(key))
--- End diff --

that makes sense. @davies can you add a unit test to scala? in 
ColumnExpressionSuite or DataFrameSuite.
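
The semantics being extended here — a single accessor that does ordinal lookup on arrays and key lookup on maps — can be sketched in plain Python (illustrative only; Spark's `getItem` builds a Catalyst `GetItem` expression rather than evaluating eagerly):

```python
def get_item(container, key):
    # One accessor covering both cases from the diff above: an
    # integer ordinal into an array, or an arbitrary key into a map.
    # Returns None for a missing map key, loosely matching SQL's
    # null-on-missing-key behavior. Sketch, not Spark's code.
    if isinstance(container, (list, tuple)):
        return container[key]
    if isinstance(container, dict):
        return container.get(key)
    raise TypeError("getItem expects an array or a map")
```

Widening the parameter from `Int` to `Any` is what lets one method serve both container types, which is why the diff changes the signature rather than adding a second method.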




