[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16985
  
LGTM except one comment for the test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18916: [CORE][DOC]Add spark.internal.config parameter de...

2017-08-10 Thread heary-cao
GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/18916

[CORE][DOC]Add spark.internal.config parameter description

## What changes were proposed in this pull request?

Currently, some of the configuration parameters in spark.internal.config 
lack a description, which is incorrect. Based on the Property Name entries at 
http://spark.apache.org/docs/latest/configuration.html, this PR supplements 
the spark.internal.config parameter descriptions.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark doc_config_package

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18916


commit ebfe016799900271df84c765c2f349d815ec307d
Author: caoxuewen 
Date:   2017-08-11T06:23:18Z

Add spark.internal.config parameter description







[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r132624167
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -543,6 +551,68 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
 )
   }
 
+  test("SPARK-19122 Re-order join predicates if they match with the child's output partitioning") {
+    val bucketedTableTestSpec = BucketedTableTestSpec(
+      Some(BucketSpec(8, Seq("i", "j", "k"), Seq("i", "j", "k"))),
+      numPartitions = 1,
+      expectedShuffle = false,
+      expectedSort = false)
+
+    def testBucketingWithPredicate(
+        joinCondition: (DataFrame, DataFrame) => Column,
+        expectedResult: Option[Array[Row]]): Array[Row] = {
+      testBucketing(
+        bucketedTableTestSpecLeft = bucketedTableTestSpec,
+        bucketedTableTestSpecRight = bucketedTableTestSpec,
+        joinCondition = joinCondition,
+        expectedResult = expectedResult
+      )
+    }
+
+    // Irrespective of the ordering of keys in the join predicate, the query plan and
+    // query results should always be the same
--- End diff --

I think we don't need to test this here; it is an existing property of the 
join implementation in Spark SQL and should already be well tested. Then we 
wouldn't need to change `testBucketing`.







[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80522/
Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80522/testReport)** for PR 18810 at commit [`8b32b54`](https://github.com/apache/spark/commit/8b32b54d0586b8878ea231919266e429f613e8c7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18915: [SPARK-21176][WEB UI] Format worker page links to work w...

2017-08-10 Thread aosagie
Github user aosagie commented on the issue:

https://github.com/apache/spark/pull/18915
  
Hey @cloud-fan and @ajbozarth 
Thanks for checking my last PR. Any chance I could get one more look at 
this cleanup? Some of the existing link handling differed between the 
Applications and Workers pages.





[GitHub] spark pull request #18915: [SPARK-21176][WEB UI] Format worker page links to...

2017-08-10 Thread aosagie
GitHub user aosagie opened a pull request:

https://github.com/apache/spark/pull/18915

[SPARK-21176][WEB UI] Format worker page links to work with proxy

## What changes were proposed in this pull request?

Several links on the worker page do not work correctly with the proxy 
because:
1) They don't acknowledge the proxy
2) They use relative paths (unlike the Application Page which uses full 
paths)

This patch fixes that. It also fixes a mistake in the proxy's Location 
header parsing which caused it to incorrectly handle redirects.

## How was this patch tested?

I checked the validity of every link with the proxy on and off.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aosagie/spark fix/proxy-links

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18915


commit 2ab211b3c4d15c9f3fa8cab6af1f1d944bae3721
Author: Anderson Osagie 
Date:   2017-08-10T03:08:42Z

[SPARK-21176][WEB UI] Format worker page links to work with proxy. Fix 
proxy location header creation







[GitHub] spark pull request #18843: [SPARK-21595] Separate thresholds for buffering a...

2017-08-10 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/18843#discussion_r132621976
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1139,9 +1154,14 @@ class SQLConf extends Serializable with Logging {
 
   def windowExecBufferSpillThreshold: Int = getConf(WINDOW_EXEC_BUFFER_SPILL_THRESHOLD)
 
+  def windowExecBufferInMemoryThreshold: Int = getConf(WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD)
+
   def sortMergeJoinExecBufferSpillThreshold: Int =
     getConf(SORT_MERGE_JOIN_EXEC_BUFFER_SPILL_THRESHOLD)
 
+  def sortMergeJoinExecBufferInMemoryThreshold: Int =
--- End diff --

Sure. Since there was no in-memory buffer for cartesian product before, I 
am using a conservative value of 4096 for the in-memory buffer threshold. 
However, the spill threshold is set to 
`UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD`, as it was 
before.
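The interaction between the two thresholds can be sketched as follows. This is a minimal, self-contained illustration: `SimpleSpillableBuffer` and its members are invented for this sketch, not Spark's actual `ExternalAppendOnlyUnsafeRowArray`. Rows stay in an on-heap buffer until the in-memory threshold is crossed, after which the buffer switches to spilling.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch of the two-threshold design: rows stay on-heap until
// `inMemoryThreshold` is crossed, then everything moves to a (simulated)
// spill store. SimpleSpillableBuffer is invented for this sketch; Spark's
// real class is ExternalAppendOnlyUnsafeRowArray.
class SimpleSpillableBuffer[T](inMemoryThreshold: Int) {
  private val inMemory = ArrayBuffer.empty[T]
  private val spilled = ArrayBuffer.empty[T] // stand-in for on-disk storage
  private var spillingMode = false

  def append(row: T): Unit = {
    if (!spillingMode && inMemory.length >= inMemoryThreshold) {
      // Threshold crossed: migrate the in-memory rows to the spill store.
      spilled ++= inMemory
      inMemory.clear()
      spillingMode = true
    }
    if (spillingMode) spilled += row else inMemory += row
  }

  def isSpilling: Boolean = spillingMode
  def size: Int = inMemory.length + spilled.length
}

val buf = new SimpleSpillableBuffer[Int](inMemoryThreshold = 4)
(1 to 3).foreach(buf.append)
val beforeSpill = buf.isSpilling // false: still under the threshold
(4 to 10).foreach(buf.append)
val afterSpill = buf.isSpilling  // true: the threshold was crossed
```

A low in-memory threshold trades some CPU for a smaller heap footprint, which is why a conservative default like 4096 is reasonable for an operator that previously had no in-memory buffer at all.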





[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16985
  
retest this please





[GitHub] spark issue #18499: [SPARK-21176][WEB UI] Use a single ProxyServlet to proxy...

2017-08-10 Thread aosagie
Github user aosagie commented on the issue:

https://github.com/apache/spark/pull/18499
  
Yes, I agree with @IngoSchuster 





[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-08-10 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16985
  
BTW: The "summary of this patch" in your comment accurately captures what 
this PR is doing.





[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-08-10 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16985
  
@cloud-fan : I was on a long vacation for quite some time, so I couldn't get 
to this. With regard to the concern you had, I have replied to that discussion 
in the PR: https://github.com/apache/spark/pull/16985/files#r132620278, along 
with a test case that covers the scenario you mentioned.





[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-08-10 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r132620278
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ReorderJoinPredicates.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * When the physical operators are created for JOIN, the ordering of join keys is based on order
+ * in which the join keys appear in the user query. That might not match with the output
+ * partitioning of the join node's children (thus leading to extra sort / shuffle being
+ * introduced). This rule will change the ordering of the join keys to match with the
+ * partitioning of the join nodes' children.
+ */
+class ReorderJoinPredicates extends Rule[SparkPlan] {
+  private def reorderJoinKeys(
+      leftKeys: Seq[Expression],
+      rightKeys: Seq[Expression],
+      leftPartitioning: Partitioning,
+      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+    def reorder(
+        expectedOrderOfKeys: Seq[Expression],
+        currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
+      val leftKeysBuffer = ArrayBuffer[Expression]()
+      val rightKeysBuffer = ArrayBuffer[Expression]()
+
+      expectedOrderOfKeys.foreach(expression => {
+        val index = currentOrderOfKeys.indexWhere(e => e.semanticEquals(expression))
+        leftKeysBuffer.append(leftKeys(index))
+        rightKeysBuffer.append(rightKeys(index))
+      })
+      (leftKeysBuffer, rightKeysBuffer)
+    }
+
+
+    if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) {
+      leftPartitioning match {
+        case HashPartitioning(leftExpressions, _)
+          if leftExpressions.length == leftKeys.length &&
--- End diff --

The contract for reordering is that the set of join keys must be equal to 
the set of the child's partitioning columns (implemented at L58-L59 in this 
file). Thus there won't be any reordering for the case you pointed out. I 
have added a test case for this.
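The reordering itself can be illustrated with a small self-contained sketch of the `reorder` helper from the diff above, using plain `String` keys in place of Catalyst `Expression`s (so `semanticEquals` becomes `==`); the names mirror the PR, but the snippet is illustrative, not the PR's exact code:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of the reorder helper from ReorderJoinPredicates, with plain
// String keys standing in for Catalyst Expressions (so semanticEquals
// becomes ==). Illustrative only; not the PR's exact code.
def reorder(
    leftKeys: Seq[String],
    rightKeys: Seq[String],
    expectedOrderOfKeys: Seq[String]): (Seq[String], Seq[String]) = {
  val leftKeysBuffer = ArrayBuffer[String]()
  val rightKeysBuffer = ArrayBuffer[String]()
  expectedOrderOfKeys.foreach { expression =>
    // Emit the left/right key pair at the position the child's
    // partitioning expects it.
    val index = leftKeys.indexWhere(_ == expression)
    leftKeysBuffer.append(leftKeys(index))
    rightKeysBuffer.append(rightKeys(index))
  }
  (leftKeysBuffer.toSeq, rightKeysBuffer.toSeq)
}

// Join written as "b = y AND a = x", children hash-partitioned on (a, b):
// the keys are permuted so the existing partitioning is reusable.
val (reorderedLeft, reorderedRight) = reorder(
  leftKeys = Seq("b", "a"),
  rightKeys = Seq("y", "x"),
  expectedOrderOfKeys = Seq("a", "b"))
// reorderedLeft == Seq("a", "b"), reorderedRight == Seq("x", "y")
```

Because the contract requires the key set and the partitioning-column set to be equal, `indexWhere` always finds a match and the permutation is total.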





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80523/
Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80518/testReport)** for PR 18810 at commit [`6814047`](https://github.com/apache/spark/commit/6814047231044627ee85aef458c108345e129bee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/18899
  
retest this please





[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-10 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18736#discussion_r132618802
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
@@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
 
   /** @group setParam */
   @Since("1.2.0")
-  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+  def setNumFeatures(value: Int): this.type = {
+    hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
--- End diff --

@WeichenXu123  How about adding a `setNumFeatures` method to mllib? 





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80517/
Test FAILed.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80517/testReport)** for PR 18899 at commit [`4404b10`](https://github.com/apache/spark/commit/4404b1050770c5d9206d00e42cb80cd61ab826ef).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18806: [SPARK-21600][docs] The description of "this requires sp...

2017-08-10 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18806
  
@srowen Could you help review the code? Thanks.





[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18253
  
I think `BusQueue` is a good abstraction, but we can still simplify other 
parts. My proposal:
```
// Do we need a better name like ListenersGroup? It's very similar to the
// previous LiveListenerBus.
class BusQueue extends SparkListenerBus {
  val eventQueue = ...
}

class LiveListenerBus {
  val listenerGroups: Map[String, BusQueue] = ...

  def addListener(listener, groupName = "default") {
    val group = listenerGroups.getOrElseUpdate(groupName, new BusQueue ...)
    group.addListener(listener)
  }

  def post(event) {
    listenerGroups.foreach { case (_, group) =>
      group.post(event)
    }
  }
}

```
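A runnable sketch of the proposal above (the `Listener` trait, the `String` event type, and the synchronous `dispatch()` are simplifications invented for illustration; the real bus drains each queue on its own thread):

```scala
import scala.collection.mutable

// Runnable sketch of the queue-per-group proposal. Listener and the
// synchronous dispatch() are simplifications; the real LiveListenerBus
// drains each queue asynchronously on a dedicated thread.
trait Listener { def onEvent(event: String): Unit }

class BusQueue {
  private val listeners = mutable.ArrayBuffer.empty[Listener]
  private val eventQueue = mutable.Queue.empty[String]

  def addListener(l: Listener): Unit = listeners += l
  def post(event: String): Unit = eventQueue.enqueue(event)

  // Deliver every queued event to every listener in this group.
  def dispatch(): Unit =
    while (eventQueue.nonEmpty) {
      val e = eventQueue.dequeue()
      listeners.foreach(_.onEvent(e))
    }
}

class LiveListenerBus {
  private val listenerGroups = mutable.Map.empty[String, BusQueue]

  def addListener(listener: Listener, groupName: String = "default"): Unit =
    listenerGroups.getOrElseUpdate(groupName, new BusQueue).addListener(listener)

  // Every group receives every event; a slow listener only delays its group.
  def post(event: String): Unit = listenerGroups.values.foreach(_.post(event))
  def dispatchAll(): Unit = listenerGroups.values.foreach(_.dispatch())
}

val received = mutable.ArrayBuffer.empty[(String, String)]
val bus = new LiveListenerBus
bus.addListener(e => received += (("default", e)))
bus.addListener(e => received += (("eventLog", e)), groupName = "eventLog")
bus.post("SparkListenerJobStart")
bus.dispatchAll()
// received holds the event once per group
```

The key property of the design is visible here: `post` fans out to every group's queue, so one group's backlog never blocks another group's listeners.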





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80521/
Test FAILed.





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80521/testReport)** for PR 18907 at commit [`f9d3d17`](https://github.com/apache/spark/commit/f9d3d172da658900e59eb8e370296da5f1c46838).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode `
   * `case class HiveTableRelation(`





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132616342
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

@gatorsmile, which do you think is better to use for the default value, 1500 
or Int.Max?
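As a sketch of the check being discussed (`shouldUseWholeStageCodegen` is a hypothetical helper invented here, not Spark's actual API; 1500 mirrors the proposed default), the config would gate codegen on the generated function's line count:

```scala
// Sketch of the check under discussion: count the lines of a generated
// Java function and skip whole-stage codegen when it exceeds the
// configured maximum. shouldUseWholeStageCodegen is a hypothetical helper,
// not Spark's actual API; 1500 mirrors the proposed default.
def shouldUseWholeStageCodegen(generatedSource: String, maxLinesPerFunction: Int): Boolean =
  generatedSource.split("\n", -1).length <= maxLinesPerFunction

val shortFunc = (1 to 100).map(i => s"int v$i = $i;").mkString("\n")
val longFunc  = (1 to 2000).map(i => s"int v$i = $i;").mkString("\n")

val keepCodegen = shouldUseWholeStageCodegen(shortFunc, maxLinesPerFunction = 1500) // true
val fallBack    = !shouldUseWholeStageCodegen(longFunc, maxLinesPerFunction = 1500) // true
```

The trade-off behind the default: a finite threshold such as 1500 protects against JIT deoptimization of huge generated methods, while `Int.MaxValue` would effectively disable the guard.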





[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18900
  
+1, similar to `CatalogTable.createTime`, we should have a 
`CatalogTablePartition.createTime`





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80522 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80522/testReport)**
 for PR 18810 at commit 
[`8b32b54`](https://github.com/apache/spark/commit/8b32b54d0586b8878ea231919266e429f613e8c7).





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80521 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80521/testReport)**
 for PR 18907 at commit 
[`f9d3d17`](https://github.com/apache/spark/commit/f9d3d172da658900e59eb8e370296da5f1c46838).





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18907
  
retest this please





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80520/testReport)**
 for PR 18907 at commit 
[`f9d3d17`](https://github.com/apache/spark/commit/f9d3d172da658900e59eb8e370296da5f1c46838).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode `
  * `case class HiveTableRelation(`





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80520/
Test FAILed.





[GitHub] spark issue #18843: [SPARK-21595] Separate thresholds for buffering and spil...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18843
  
LGTM except one question, thanks for the fix!





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132616033
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -149,4 +150,56 @@ class WholeStageCodegenSuite extends SparkPlanTest 
with SharedSQLContext {
   assert(df.collect() === Array(Row(1), Row(2)))
 }
   }
+
+  def genGroupByCodeGenContext(caseNum: Int, maxLinesPerFunction: Int): 
CodegenContext = {
+val caseExp = (1 to caseNum).map { i =>
+  s"case when id > $i and id <= ${i + 1} then 1 else 0 end as v$i"
+}.toList
+
+spark.conf.set("spark.sql.codegen.maxLinesPerFunction", 
maxLinesPerFunction)
--- End diff --

Ok, modified, thanks.





[GitHub] spark pull request #18843: [SPARK-21595] Separate thresholds for buffering a...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18843#discussion_r132615896
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -844,24 +844,39 @@ object SQLConf {
   .stringConf
   .createWithDefaultFunction(() => TimeZone.getDefault.getID)
 
+  val WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD =
+buildConf("spark.sql.windowExec.buffer.in.memory.threshold")
+  .internal()
+  .doc("Threshold for number of rows guaranteed to be held in memory 
by the window operator")
+  .intConf
+  .createWithDefault(4096)
+
   val WINDOW_EXEC_BUFFER_SPILL_THRESHOLD =
 buildConf("spark.sql.windowExec.buffer.spill.threshold")
   .internal()
-  .doc("Threshold for number of rows buffered in window operator")
+  .doc("Threshold for number of rows to be spilled by window operator")
   .intConf
-  .createWithDefault(4096)
+  
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
+
+  val SORT_MERGE_JOIN_EXEC_BUFFER_IN_MEMORY_THRESHOLD =
+buildConf("spark.sql.sortMergeJoinExec.buffer.in.memory.threshold")
+  .internal()
+  .doc("Threshold for number of rows guaranteed to be held in memory 
by the sort merge " +
+"join operator")
+  .intConf
+  .createWithDefault(Int.MaxValue)
--- End diff --

got it





[GitHub] spark pull request #18843: [SPARK-21595] Separate thresholds for buffering a...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18843#discussion_r132615780
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -844,24 +844,39 @@ object SQLConf {
   .stringConf
   .createWithDefaultFunction(() => TimeZone.getDefault.getID)
 
+  val WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD =
+buildConf("spark.sql.windowExec.buffer.in.memory.threshold")
+  .internal()
+  .doc("Threshold for number of rows guaranteed to be held in memory 
by the window operator")
+  .intConf
+  .createWithDefault(4096)
+
   val WINDOW_EXEC_BUFFER_SPILL_THRESHOLD =
 buildConf("spark.sql.windowExec.buffer.spill.threshold")
   .internal()
-  .doc("Threshold for number of rows buffered in window operator")
+  .doc("Threshold for number of rows to be spilled by window operator")
   .intConf
-  .createWithDefault(4096)
+  
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
+
+  val SORT_MERGE_JOIN_EXEC_BUFFER_IN_MEMORY_THRESHOLD =
--- End diff --

ok let's keep them separated for each operator.
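To make the per-operator split concrete, a hedged sketch of the resulting conf surface in `spark-defaults.conf` form (the values here are illustrative, not the actual defaults):

```properties
# Rows the window operator guarantees to hold in memory
spark.sql.windowExec.buffer.in.memory.threshold         4096
# Rows after which the window operator's buffer spills to disk
spark.sql.windowExec.buffer.spill.threshold             1000000
# The analogous in-memory threshold, kept separate for sort merge join
spark.sql.sortMergeJoinExec.buffer.in.memory.threshold  2147483647
```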





[GitHub] spark pull request #18843: [SPARK-21595] Separate thresholds for buffering a...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18843#discussion_r132615759
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1139,9 +1154,14 @@ class SQLConf extends Serializable with Logging {
 
   def windowExecBufferSpillThreshold: Int = 
getConf(WINDOW_EXEC_BUFFER_SPILL_THRESHOLD)
 
+  def windowExecBufferInMemoryThreshold: Int = 
getConf(WINDOW_EXEC_BUFFER_IN_MEMORY_THRESHOLD)
+
   def sortMergeJoinExecBufferSpillThreshold: Int =
 getConf(SORT_MERGE_JOIN_EXEC_BUFFER_SPILL_THRESHOLD)
 
+  def sortMergeJoinExecBufferInMemoryThreshold: Int =
--- End diff --

shall we introduce a similar config for cartesian product?





[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80520 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80520/testReport)**
 for PR 18907 at commit 
[`f9d3d17`](https://github.com/apache/spark/commit/f9d3d172da658900e59eb8e370296da5f1c46838).





[GitHub] spark pull request #18907: [SPARK-18464][SQL][followup] support old table wh...

2017-08-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18907#discussion_r132615143
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -412,6 +412,11 @@ case class CatalogRelation(
   assert(tableMeta.partitionSchema.sameType(partitionCols.toStructType))
   assert(tableMeta.dataSchema.sameType(dataCols.toStructType))
 
+  // If table schema is empty, Spark will infer the schema at runtime, so 
we should mark this
+  // relation as unresolved and wait it to be replaced by relation with 
actual schema, before
+  // resolving parent plans.
+  override lazy val resolved: Boolean = tableMeta.schema.nonEmpty
--- End diff --

https://issues.apache.org/jira/browse/SPARK-20008 is another JIRA





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132615092
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -186,6 +186,18 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row("""[{"_1":1}]""") :: Nil)
   }
 
+  test("to_json - map") {
+val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+
+checkAnswer(
+  df1.select(to_json($"a")),
+  Row("""{"a":{"_1":1}}""") :: Nil)
+checkAnswer(
+  df2.select(to_json($"a")),
+  Row("""{"[0,1]":{"_1":1}}""") :: Nil)
--- End diff --

Actually, I'm not sure what answer it is, but I got `[0,1]` using
```
scala> Seq(Tuple1(Tuple1(Map(Tuple1(1) -> Tuple1(1))))).toDF("a").select(to_json($"a")).show()
+--------------------+
|    structstojson(a)|
+--------------------+
|[{"_1":{"[0,1]":{...|
+--------------------+
```
so I think this answer should be correct.
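Putting the two cases side by side — a minimal sketch using only the `to_json` API discussed in this PR, with the expected strings taken from the test above (the rendering of struct-typed keys is exactly what is being debated):

```scala
import org.apache.spark.sql.functions.to_json
import spark.implicits._

// String keys serialize directly:
Seq(Map("a" -> Tuple1(1))).toDF("a")
  .select(to_json($"a"))        // {"a":{"_1":1}}

// Struct keys have no native JSON form, so they are rendered via the
// key row's string representation (hence the "[0,1]" seen above):
Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
  .select(to_json($"a"))        // {"[0,1]":{"_1":1}}
```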





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614716
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
--- End diff --

If we support `MapType` in general, I think `ArrayType(MapType)` would be allowed too? We don't need to address it in this PR.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614534
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
 ---
@@ -590,4 +590,24 @@ class JsonExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   """{"t":"2015-12-31T16:00:00"}"""
 )
   }
+
+  test("SPARK-21513: to_json support map[string, struct] to json") {
+val schema = MapType(StringType, StructType(StructField("a", 
IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map("test" -> 
InternalRow(1))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
+  """{"test":{"a":1}}"""
+)
+  }
+
+  test("SPARK-21513: to_json support map[struct, struct] to json") {
+val schema = MapType(StructType(StructField("a", IntegerType) :: Nil),
+  StructType(StructField("b", IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map(InternalRow(1) -> 
InternalRow(2))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614524
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
 ---
@@ -590,4 +590,24 @@ class JsonExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   """{"t":"2015-12-31T16:00:00"}"""
 )
   }
+
+  test("SPARK-21513: to_json support map[string, struct] to json") {
+val schema = MapType(StringType, StructType(StructField("a", 
IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map("test" -> 
InternalRow(1))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
--- End diff --

I think we don't need `gmtId` here.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614326
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -186,6 +186,18 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row("""[{"_1":1}]""") :: Nil)
   }
 
+  test("to_json - map") {
+val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+
+checkAnswer(
+  df1.select(to_json($"a")),
+  Row("""{"a":{"_1":1}}""") :: Nil)
+checkAnswer(
+  df2.select(to_json($"a")),
+  Row("""{"[0,1]":{"_1":1}}""") :: Nil)
--- End diff --

`[0,1]` is for the key `Tuple1(1)`? I think it should be `[1]`?





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614036
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3121,8 +3121,9 @@ object functions {
 to_json(e, options.asScala.toMap)
 
   /**
-   * Converts a column containing a `StructType` or `ArrayType` of 
`StructType`s into a JSON string
-   * with the specified schema. Throws an exception, in the case of an 
unsupported type.
+   * Converts a column containing a `StructType`, `ArrayType` of 
`StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614025
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3106,9 +3106,9 @@ object functions {
   }
 
   /**
-   * (Java-specific) Converts a column containing a `StructType` or 
`ArrayType` of `StructType`s
-   * into a JSON string with the specified schema. Throws an exception, in 
the case of an
-   * unsupported type.
+   * (Java-specific) Converts a column containing a `StructType`, 
`ArrayType` of `StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614002
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3090,9 +3090,9 @@ object functions {
   }
 
   /**
-   * (Scala-specific) Converts a column containing a `StructType` or 
`ArrayType` of `StructType`s
-   * into a JSON string with the specified schema. Throws an exception, in 
the case of an
-   * unsupported type.
+   * (Scala-specific) Converts a column containing a `StructType`, 
`ArrayType` of `StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

nit: Value -> value.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132613913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I was thinking the function name for `StructsToJson` might use similar wording, but it looks like it is just `to_json`, with no mention of struct. If so, since `StructsToJson` is not a public API, I think it is fine to change it.

Because this is his first PR, I'd prefer to keep the complexity low. Supporting an arbitrary `MapType` requires changing a few more places and would make the review longer. It would be great to get this in and let @goldmedal work on the rest in another PR.





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
I tested the performance on a small dataset; the values in the following table are the average durations in seconds:

|numColumns| Old Mean | Old Median | New Mean | New Median |
|--|--|--|--|--|
|1|0.0771394713|0.0658712813|0.080779802|0.04816598149996|
|10|0.723434063099|0.5954440414|0.0867935197|0.1326342865998|
|100|7.3756451568|6.2196631259|0.1911931552|0.862537681701|

We can see that, even on this small dataset, the speedup is significant. On big datasets that do not fit in memory, we should obtain an even better speedup.

The test code is here:

```scala
import org.apache.spark.ml.feature._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.implicits._
import scala.util.Random

val seed = 123L
val random = new Random(seed)
val n = 1
val m = 100
val rows = sc.parallelize(1 to n).map(i => Row(Array.fill(m)(random.nextDouble): _*))
val struct = new StructType(Array.range(0, m, 1).map(i => StructField(s"c$i", DoubleType, true)))
val df = spark.createDataFrame(rows, struct)
df.persist()
df.count()

for (strategy <- Seq("mean", "median"); k <- Seq(1, 10, 100)) {
  val imputer = new Imputer()
    .setStrategy(strategy)
    .setInputCols(Array.range(0, k, 1).map(i => s"c$i"))
    .setOutputCols(Array.range(0, k, 1).map(i => s"o$i"))
  var duration = 0.0
  for (i <- 0 until 10) {
    val start = System.nanoTime()
    imputer.fit(df)
    val end = System.nanoTime()
    duration += (end - start) / 1e9
  }
  println((strategy, k, duration / 10))
}
```





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18756
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80515/
Test FAILed.





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18756
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18914
  
**[Test build #80519 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80519/testReport)**
 for PR 18914 at commit 
[`b537ce0`](https://github.com/apache/spark/commit/b537ce074a808c10f169357741fb0b0cc256e741).





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18756
  
**[Test build #80515 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80515/testReport)**
 for PR 18756 at commit 
[`7277eaa`](https://github.com/apache/spark/commit/7277eaadd7beee5076a6337d11bd318cf04ff461).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test

2017-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18914
  
ok to test





[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test

2017-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18914
  
The change looks fine. @HyukjinKwon @gatorsmile Can you help trigger 
jenkins? Thanks.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132613088
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I am quite sure we keep compatibility in the name although it might be best 
to not change if possible. I think we can also add `prettyName` to `to_json` 
for the column name concern too.

What do you think about renaming this in another minor PR or a followup 
just for less line diff, less complexity and for an easier review, considering 
this is his very first PR? I am also fine with doing this here if the opposite 
makes more sense to you.
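For context, the behavior under discussion — `to_json` serializing a map whose values are structs into a single JSON string — can be sketched in plain Python. This is an illustrative stand-in only (`to_json_like` is a hypothetical name; Spark's actual implementation works on `MapData`/`InternalRow`, not dicts):

```python
import json

def to_json_like(value):
    # Hypothetical sketch: serialize a map-of-structs column value to a
    # JSON string, roughly what to_json on MapType(_, StructType, _) emits.
    # Structs are modeled here as plain dicts.
    return json.dumps(value, separators=(",", ":"), sort_keys=True)

# A map column value: keys mapping to struct values.
row = {"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}}
print(to_json_like(row))  # {"a":{"x":1,"y":2},"b":{"x":3,"y":4}}
```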





[GitHub] spark pull request #18907: [SPARK-18464][SQL][followup] support old table wh...

2017-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18907#discussion_r132611451
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -412,6 +412,11 @@ case class CatalogRelation(
   assert(tableMeta.partitionSchema.sameType(partitionCols.toStructType))
   assert(tableMeta.dataSchema.sameType(dataCols.toStructType))
 
+  // If table schema is empty, Spark will infer the schema at runtime, so 
we should mark this
+  // relation as unresolved and wait it to be replaced by relation with 
actual schema, before
+  // resolving parent plans.
+  override lazy val resolved: Boolean = tableMeta.schema.nonEmpty
--- End diff --

That's a temp view; I'm really wondering how useful a 0-column table is





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18810
  
LGTM except for few comments.





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132611417
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -149,4 +150,56 @@ class WholeStageCodegenSuite extends SparkPlanTest 
with SharedSQLContext {
   assert(df.collect() === Array(Row(1), Row(2)))
 }
   }
+
+  def genGroupByCodeGenContext(caseNum: Int, maxLinesPerFunction: Int): 
CodegenContext = {
+val caseExp = (1 to caseNum).map { i =>
+  s"case when id > $i and id <= ${i + 1} then 1 else 0 end as v$i"
+}.toList
+
+spark.conf.set("spark.sql.codegen.maxLinesPerFunction", 
maxLinesPerFunction)
--- End diff --

withSQLConf?
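The suggestion refers to Spark's test helper that sets a SQL conf only for the duration of a block and restores the previous value afterwards, instead of mutating the shared session conf. A minimal Python sketch of that pattern (the `conf` dict is a hypothetical stand-in for the session conf, not Spark's API):

```python
from contextlib import contextmanager

# Hypothetical stand-in for the shared session conf.
conf = {"spark.sql.codegen.maxLinesPerFunction": "1500"}

@contextmanager
def with_sql_conf(pairs):
    # Set config entries for the duration of a block, then restore the
    # previous values -- the pattern behind Spark's withSQLConf helper.
    old = {k: conf.get(k) for k in pairs}
    conf.update(pairs)
    try:
        yield
    finally:
        for k, v in old.items():
            if v is None:
                conf.pop(k, None)
            else:
                conf[k] = v

with with_sql_conf({"spark.sql.codegen.maxLinesPerFunction": "100"}):
    assert conf["spark.sql.codegen.maxLinesPerFunction"] == "100"
# Restored after the block, so other tests are unaffected.
assert conf["spark.sql.codegen.maxLinesPerFunction"] == "1500"
```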





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132611346
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxLinesPerFunction")
+.internal()
+.doc("The maximum lines of a single Java function generated by 
whole-stage codegen. " +
+  "When the generated function exceeds this threshold, " +
+  "the whole-stage codegen is deactivated for this subtree of the 
current query plan.")
+.intConf
+.createWithDefault(1500)
--- End diff --

Or maybe`Int.Max` as default value so we won't change current behavior.





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132611289
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -149,4 +150,56 @@ class WholeStageCodegenSuite extends SparkPlanTest 
with SharedSQLContext {
   assert(df.collect() === Array(Row(1), Row(2)))
 }
   }
+
+  def genGroupByCodeGenContext(caseNum: Int, maxLinesPerFunction: Int): 
CodegenContext = {
+val caseExp = (1 to caseNum).map { i =>
+  s"case when id > $i and id <= ${i + 1} then 1 else 0 end as v$i"
+}.toList
+
+spark.conf.set("spark.sql.codegen.maxLinesPerFunction", 
maxLinesPerFunction)
+
+val keyExp = List(
+  "id",
+  "(id & 1023) as k1",
+  "cast(id & 1023 as double) as k2",
+  "cast(id & 1023 as int) as k3"
+)
+
+val ds = spark.range(10)
+  .selectExpr(keyExp:::caseExp: _*)
+  .groupBy("k1", "k2", "k3")
+  .sum()
+val plan = ds.queryExecution.executedPlan
+
+val wholeStageCodeGenExec = plan.find(p => p match {
+  case wp: WholeStageCodegenExec => wp.child match {
+case hp: HashAggregateExec if (hp.child.isInstanceOf[ProjectExec]) 
=> true
+case _ => false
+  }
+  case _ => false
+})
+
+assert(wholeStageCodeGenExec.isDefined)
+
wholeStageCodeGenExec.get.asInstanceOf[WholeStageCodegenExec].doCodeGen()._1
+  }
+
+  test("SPARK-21603 check there is a too long generated function") {
+val ctx = genGroupByCodeGenContext(30, 1500)
+assert(ctx.isTooLongGeneratedFunction === true)
+  }
+
+  test("SPARK-21603 check there is not a too long generated function") {
+val ctx = genGroupByCodeGenContext(1, 1500)
+assert(ctx.isTooLongGeneratedFunction === false)
+  }
+
+  test("SPARK-21603 check there is not a too long generated function when 
threshold is Int.Max") {
+val ctx = genGroupByCodeGenContext(30, Int.MaxValue)
+assert(ctx.isTooLongGeneratedFunction === false)
+  }
+
+  test("SPARK-21603 check there is not a too long generated function when 
threshold is 0") {
--- End diff --

`there is not a ...` -> `there is a`





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132611153
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) 
extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
 val (ctx, cleanedSource) = doCodeGen()
+if (ctx.isTooLongGeneratedFunction) {
+  logWarning("Found too long generated codes and JIT optimization 
might not work, " +
+"Whole-stage codegen disabled for this plan, " +
+"You can change the config spark.sql.codegen.MaxFunctionLength " +
+"to adjust the function length limit:\n "
++ s"$treeString")
+  return child.execute()
+}
--- End diff --

@gatorsmile Do you mean `pipelineTime` metric?
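The guard quoted in the diff above — measure the generated function, and when it exceeds the threshold log a warning and fall back to the child plan's interpreted execution — follows a simple pattern. A hedged Python sketch (all names here are illustrative, not Spark's API):

```python
import logging

# Mirrors the spark.sql.codegen.maxLinesPerFunction threshold.
MAX_LINES_PER_FUNCTION = 1500

def execute(generated_source, codegen_execute, fallback_execute):
    # Run the codegen path unless the generated function is too long for
    # JIT optimization, in which case warn and fall back to the child plan.
    if len(generated_source.splitlines()) > MAX_LINES_PER_FUNCTION:
        logging.warning(
            "Found too long generated code and JIT optimization might not "
            "work; whole-stage codegen disabled for this plan")
        return fallback_execute()
    return codegen_execute()

# Short code takes the codegen path; overly long code falls back.
assert execute("line\n" * 10, lambda: "codegen", lambda: "fallback") == "codegen"
assert execute("line\n" * 2000, lambda: "codegen", lambda: "fallback") == "fallback"
```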





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132610861
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxLinesPerFunction")
+.internal()
+.doc("The maximum lines of a single Java function generated by 
whole-stage codegen. " +
+  "When the generated function exceeds this threshold, " +
+  "the whole-stage codegen is deactivated for this subtree of the 
current query plan.")
+.intConf
+.createWithDefault(1500)
--- End diff --

I think it applies to other Java programs running on the Java HotSpot VM.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/18899
  
Hi @sethah, the unit test is added. Thanks





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80518/testReport)**
 for PR 18810 at commit 
[`6814047`](https://github.com/apache/spark/commit/6814047231044627ee85aef458c108345e129bee).





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-10 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132610543
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -149,4 +149,75 @@ class WholeStageCodegenSuite extends SparkPlanTest 
with SharedSQLContext {
   assert(df.collect() === Array(Row(1), Row(2)))
 }
   }
+
+  test("SPARK-21603 check there is a too long generated function") {
+val ds = spark.range(10)
--- End diff --

Ok, I have modified it as you suggested above; would you like to review 
it again? Thanks.





[GitHub] spark issue #18913: [SPARK-21563][CORE] Fix race condition when serializing ...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18913
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80512/
Test PASSed.





[GitHub] spark issue #18913: [SPARK-21563][CORE] Fix race condition when serializing ...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18913
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18913: [SPARK-21563][CORE] Fix race condition when serializing ...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18913
  
**[Test build #80512 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80512/testReport)**
 for PR 18913 at commit 
[`b06425f`](https://github.com/apache/spark/commit/b06425f7267e2f0e478000c30b60b963291aacb0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18899#discussion_r132610165
  
--- Diff: project/MimaExcludes.scala ---
@@ -1012,6 +1012,10 @@ object MimaExcludes {
   
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.classification.RandomForestClassificationModel.setFeatureSubsetStrategy"),
   
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.RandomForestRegressionModel.numTrees"),
   
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.RandomForestRegressionModel.setFeatureSubsetStrategy")
+) ++ Seq(
+  // [SPARK-21680][ML][MLLIB]optimzie Vector coompress
--- End diff --

The error message is "method toSparse(nnz: Int) in trait is present only in 
current version".





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80517 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80517/testReport)**
 for PR 18899 at commit 
[`4404b10`](https://github.com/apache/spark/commit/4404b1050770c5d9206d00e42cb80cd61ab826ef).





[GitHub] spark pull request #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18899#discussion_r132610049
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -635,8 +642,9 @@ class SparseVector @Since("2.0.0") (
 nnz
   }
 
-  override def toSparse: SparseVector = {
-val nnz = numNonzeros
+  override def toSparse: SparseVector = toSparse(numNonzeros)
--- End diff --

If we define 
`def toSparse: SparseVector = toSparse(numNonzeros)`
in the superclass, then calling dv.toSparse (there are calls of this kind in 
the code) produces the error message:
Both toSparse in the DenseVector of type (nnz:Int) 
org.apache.spark.ml.linalg.SparseVector and toSparse in trait Vector of type 
=>org.apache.spark.ml.linalg.SparseVector match.
So we should change the name of toSparse(nnz: Int), maybe to 
toSparseWithSize(nnz: Int).
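The optimization being discussed — letting a caller that already knows the non-zero count pass it into the dense-to-sparse conversion instead of recomputing it — can be sketched in Python. This is illustrative only (in Spark's Scala code the point of `toSparse(nnz: Int)` is to skip the extra counting pass and preallocate arrays of size `nnz`):

```python
def to_sparse(values, nnz=None):
    # Convert a dense vector to (indices, values) sparse form.
    # If nnz (the non-zero count) is already known, the caller passes it in
    # and the counting pass below is skipped -- in Scala, nnz would also be
    # used to preallocate the index and value arrays.
    if nnz is None:
        nnz = sum(1 for v in values if v != 0.0)  # the pass the overload avoids
    indices = []
    nonzeros = []
    for i, v in enumerate(values):
        if v != 0.0:
            indices.append(i)
            nonzeros.append(v)
    assert len(indices) == nnz  # caller-supplied count must match the data
    return indices, nonzeros

print(to_sparse([0.0, 3.0, 0.0, 5.0]))  # ([1, 3], [3.0, 5.0])
```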






[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132609949
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Is there no compatibility issue if we rename the expression?

I've looked around and I think it should not be difficult to support 
`MapType` like `StructType`, although we need to change a few more places.







[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17583
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80516/testReport)**
 for PR 18810 at commit 
[`b83cd1c`](https://github.com/apache/spark/commit/b83cd1c2e291e2ec429125933db7db4f7c41a11b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80516/
Test FAILed.





[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17583
  
**[Test build #80514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80514/testReport)**
 for PR 17583 at commit 
[`47aa749`](https://github.com/apache/spark/commit/47aa7492e0f3edf3549e5e7b1eeb6074fb5d6f8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80514/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80516 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80516/testReport)**
 for PR 18810 at commit 
[`b83cd1c`](https://github.com/apache/spark/commit/b83cd1c2e291e2ec429125933db7db4f7c41a11b).





[GitHub] spark issue #18897: [SPARK-21655][YARN] Support Kill CLI for Yarn mode

2017-08-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18897
  
I think this RPC layer could also serve for 
[SPARK-19143](https://issues.apache.org/jira/browse/SPARK-19143?focusedCommentId=15815078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15815078).





[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18877
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18877
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80511/
Test PASSed.





[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18877
  
**[Test build #80511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80511/testReport)** for PR 18877 at commit [`def2329`](https://github.com/apache/spark/commit/def23297b4c3b1542696e5fd2f7efb6aca9274c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18756
  
**[Test build #80515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80515/testReport)** for PR 18756 at commit [`7277eaa`](https://github.com/apache/spark/commit/7277eaadd7beee5076a6337d11bd318cf04ff461).





[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18914
  
Can one of the admins verify this patch?





[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18519
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80509/
Test PASSed.





[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18519
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18519
  
**[Test build #80509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80509/testReport)** for PR 18519 at commit [`cdd3030`](https://github.com/apache/spark/commit/cdd3030fabcecaa00fdf78be489187a5eb1001e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...

2017-08-10 Thread heary-cao
GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/18914

[MINOR][SQL][TEST]no uncache table in joinsuite test

## What changes were proposed in this pull request?

This may be a small mistake; let's fix it. Thanks.

## How was this patch tested?
Existing test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark uncache_table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18914


commit b537ce074a808c10f169357741fb0b0cc256e741
Author: caoxuewen 
Date:   2017-08-11T02:14:22Z

no uncache table in joinsuite test







[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@hhbyyh Yes, I will test the performance.
Btw, the median computation via `stat.approxQuantile` will also transform the DataFrame into an RDD before aggregation; see 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala#L102
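For readers unfamiliar with approximate quantiles: Spark's `approxQuantile` aggregates a quantile summary over the data (the RDD aggregation the link above points at) rather than sorting everything. The toy Python sketch below only illustrates the general idea of reading a quantile off a rank in a (sampled) ordering; the function name and parameters are invented for this example, and it is not Spark's actual Greenwald–Khanna implementation.

```python
import random

def approx_quantile(values, prob, sample_size=1000, seed=42):
    # Toy illustration: take a uniform sample when the input is large,
    # then read the prob-quantile off the sorted sample by rank.
    # (Spark instead merges per-partition quantile summaries.)
    rng = random.Random(seed)
    data = list(values)
    sample = data if len(data) <= sample_size else rng.sample(data, sample_size)
    ordered = sorted(sample)
    idx = min(int(prob * len(ordered)), len(ordered) - 1)
    return ordered[idx]

data = list(range(1, 101))           # 1..100
median = approx_quantile(data, 0.5)  # exact here, since 100 <= sample_size
```

On a small input the answer is exact; the approximation error only appears once sampling (or, in Spark's case, the summary's `relativeError`) kicks in.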





[GitHub] spark pull request #18912: [SPARK-21699][SQL] Remove unused getTableOption i...

2017-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18912





[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18912
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80510/
Test PASSed.





[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18912
  
Merging in master.






[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18912
  
**[Test build #80510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80510/testReport)** for PR 18912 at commit [`5be8011`](https://github.com/apache/spark/commit/5be8011f046dcfc1b353cc7a1de3f48ac4a3ba73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18912
  
Merged build finished. Test PASSed.




