date:20180411

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180664561
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -274,3 +279,7 @@ abstract class BinaryNode extends LogicalPlan {
 
   override final def children: Seq[LogicalPlan] = Seq(left, right)
 }
+
+abstract class KeepOrderUnaryNode extends UnaryNode {
--- End diff --

thanks for the suggestion. I'd love to hear also @cloud-fan's and @wzhfy's 
opinion on this in order to choose all together the best name for it. What do 
you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180664594
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -22,10 +22,11 @@ import org.apache.spark.sql.{execution, Row}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{Cross, FullOuter, Inner, 
LeftOuter, RightOuter}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
Repartition}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
Repartition, Sort}
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.execution.columnar.InMemoryRelation
-import org.apache.spark.sql.execution.exchange.{EnsureRequirements, 
ReusedExchangeExec, ReuseExchange, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.exchange.{EnsureRequirements, 
ReusedExchangeExec, ReuseExchange,
+  ShuffleExchangeExec}
--- End diff --

why?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-11 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/21019
  
@squito @cloud-fan 
How do you think this change ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21034
  
**[Test build #89183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89183/testReport)**
 for PR 21034 at commit 
[`3a76d87`](https://github.com/apache/spark/commit/3a76d87492a8d03a29aeee877f5564071b0a68a8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21039
  
**[Test build #89185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89185/testReport)**
 for PR 21039 at commit 
[`0b5a075`](https://github.com/apache/spark/commit/0b5a075b9de40d36379551720d7e0e07ad6deb40).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

2018-04-11 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/21039
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2202/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpa...

2018-04-11 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21007#discussion_r180710305
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/TestQueryExecutionListener.scala 
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.sql.util.QueryExecutionListener
+
+
+class TestQueryExecutionListener extends QueryExecutionListener with 
Logging {
--- End diff --

oops true.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20931
  
**[Test build #89187 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89187/testReport)**
 for PR 20931 at commit 
[`08b9601`](https://github.com/apache/spark/commit/08b9601ee9a0850f8fea0af7c9027bdef2016857).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19868#discussion_r180713699
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -176,12 +176,13 @@ class HadoopTableReader(
   val matches = fs.globStatus(pathPattern)
   matches.foreach(fileStatus => existPathSet += 
fileStatus.getPath.toString)
 }
-// convert  /demo/data/year/month/day  to  /demo/data/*/*/*/
+// convert  /demo/data/year/month/day  to  
/demo/data/year/month/*/
--- End diff --

This is a pretty old logic. Can you explain what's going on here and why 
your change works? It can help other people to understand your change quickly,


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

2018-04-11 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21039#discussion_r180719456
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ---
@@ -236,6 +236,8 @@ case class HashAggregateExec(
  | }
""".stripMargin)
 
+bufVars = null  // explicitly null this field out to allow the 
referent to be GC'd sooner
--- End diff --

I am curious what happens when `bufVars ` is accessed in 
`doConsumeWithoutKeys`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2198/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...

2018-04-11 Thread tiboun

Github user tiboun commented on the issue:

https://github.com/apache/spark/pull/21014
  
Ah yes, correct Vanzin, I didn't think about that use case. So I will apply 
shellEscape for both in this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2199/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpa...

2018-04-11 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21007#discussion_r180705768
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/TestQueryExecutionListener.scala 
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.sql.util.QueryExecutionListener
+
+
+class TestQueryExecutionListener extends QueryExecutionListener with 
Logging {
--- End diff --

No need to with Logging now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-11 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20981
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180716400
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 ---
@@ -169,4 +169,6 @@ case class InMemoryRelation(
 
   override protected def otherCopyArgs: Seq[AnyRef] =
 Seq(_cachedColumnBuffers, sizeInBytesStats, statsOfPlanToCache)
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
--- End diff --

We should carry the logical ordering from the cached logical plan when 
building the `InMemoryRelation`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2203/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20871
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2205/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20874
  
**[Test build #89190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89190/testReport)**
 for PR 20874 at commit 
[`9ef19df`](https://github.com/apache/spark/commit/9ef19dfcde9dc84f494bff5f03a56db840741496).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21031
  
**[Test build #89189 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89189/testReport)**
 for PR 21031 at commit 
[`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21037
  
**[Test build #89188 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89188/testReport)**
 for PR 21037 at commit 
[`535e511`](https://github.com/apache/spark/commit/535e511a4d33d815a621b57c47901886b2ce8fc7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20871
  
**[Test build #89191 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89191/testReport)**
 for PR 20871 at commit 
[`a7665fd`](https://github.com/apache/spark/commit/a7665fd64875e1343796c8923126d715b5376808).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21031
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21031
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2204/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20871
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2206/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...

2018-04-11 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21035
  
Merged to master.

Thanks @felixcheung.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2207/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka connector partition rev...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21038
  
**[Test build #89177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89177/testReport)**
 for PR 21038 at commit 
[`f317dec`](https://github.com/apache/spark/commit/f317dec0d863a717dc424707571453b11c43e700).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka connector partition rev...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21038
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2197/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka connector partition rev...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21038
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-11 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15770
  
@wangmiao1981 If you're busy I can help take over this. -:)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #89178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89178/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20931
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21034
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89183/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20535: [SPARK-23341][SQL] define some standard options f...

2018-04-11 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20535#discussion_r180714763
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceOptions.java 
---
@@ -17,16 +17,61 @@
 
 package org.apache.spark.sql.sources.v2;
 
+import java.io.IOException;
 import java.util.HashMap;
 import java.util.Locale;
 import java.util.Map;
 import java.util.Optional;
+import java.util.stream.Stream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
 
 import org.apache.spark.annotation.InterfaceStability;
 
 /**
  * An immutable string-to-string map in which keys are case-insensitive. 
This is used to represent
  * data source options.
+ *
+ * Each data source implementation can define its own options and teach 
its users how to set them.
+ * Spark doesn't have any restrictions about what options a data source 
should or should not have.
+ * Instead Spark defines some standard options that data sources can 
optionally adopt. It's possible
+ * that some options are very common and many data sources use them. 
However different data
+ * sources may define the common options(key and meaning) differently, 
which is quite confusing to
+ * end users.
+ *
+ * The standard options defined by Spark:
+ * 
+ *   
+ * Option key
+ * Option value
+ *   
+ *   
+ * path
+ * A path string of the data files/directories, like
+ * path1, /absolute/file2, 
path3/*. The path can
+ * either be relative or absolute, points to either file or directory, 
and can contain
+ * wildcards. This option is commonly used by file-based data 
sources.
+ *   
+ *   
+ * paths
+ * A JSON array style paths string of the data files/directories, 
like
+ * ["path1", "/absolute/file2"]. The format of each path 
is same as the
+ * path option, plus it should follow JSON string literal 
format, e.g. quotes
+ * should be escaped, pa\"th means pa"th.
--- End diff --

pa\"th?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21034
  
**[Test build #89183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89183/testReport)**
 for PR 21034 at commit 
[`3a76d87`](https://github.com/apache/spark/commit/3a76d87492a8d03a29aeee877f5564071b0a68a8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Reverse(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180714835
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -274,3 +279,7 @@ abstract class BinaryNode extends LogicalPlan {
 
   override final def children: Seq[LogicalPlan] = Seq(left, right)
 }
+
+abstract class KeepOrderUnaryNode extends UnaryNode {
--- End diff --

OrderPreservingUnaryNode sounds better.

It only makes sense for unary node, so I don't think mixin trait is a good 
idea.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21034
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21009
  
**[Test build #89181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89181/testReport)**
 for PR 21009 at commit 
[`79beb00`](https://github.com/apache/spark/commit/79beb00026fb2dd2df295bccd797aaf21746ebdb).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21014
  
**[Test build #89180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89180/testReport)**
 for PR 21014 at commit 
[`4732c4b`](https://github.com/apache/spark/commit/4732c4ba0f47d35d9f3b8a67a3fc272a78f82d7c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89179/testReport)**
 for PR 21021 at commit 
[`0203920`](https://github.com/apache/spark/commit/020392024f19cbc1ce172051643e15dade7e7b19).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180664857
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -733,6 +735,17 @@ object EliminateSorts extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Removes Sort operation if the child is already sorted
+ */
+object RemoveRedundantSorts extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case Sort(orders, true, child) if child.outputOrdering.nonEmpty
+&& SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+  child
+  }
--- End diff --

Yes, you're right. Probably we can do this in other PR. May you open a JIRA 
for this? Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21014
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21014
  
**[Test build #89180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89180/testReport)**
 for PR 21014 at commit 
[`4732c4b`](https://github.com/apache/spark/commit/4732c4ba0f47d35d9f3b8a67a3fc272a78f82d7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21014: [SPARK-23941][Mesos] Mesos task failed on specific spark...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89180/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20983: [SPARK-23747][Structured Streaming] Add EpochCoor...

2018-04-11 Thread efimpoberezkin

Github user efimpoberezkin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20983#discussion_r180679855
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/EpochCoordinatorSuite.scala
 ---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.mockito.InOrder
+import org.mockito.Matchers.{any, eq => eqTo}
+import org.mockito.Mockito._
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcEndpointRef
+import org.apache.spark.sql.execution.streaming.continuous._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, 
PartitionOffset}
+import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage
+import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
+import org.apache.spark.sql.test.SharedSparkSession
+
+class EpochCoordinatorSuite
+  extends SparkFunSuite
+with SharedSparkSession
+with MockitoSugar
+with BeforeAndAfterEach {
+
+  private var epochCoordinator: RpcEndpointRef = _
+
+  private var writer: StreamWriter = _
+  private var query: ContinuousExecution = _
+  private var orderVerifier: InOrder = _
+
+  override def beforeEach(): Unit = {
+val reader = mock[ContinuousReader]
+writer = mock[StreamWriter]
+query = mock[ContinuousExecution]
+orderVerifier = inOrder(writer, query)
+
+epochCoordinator
+  = EpochCoordinatorRef.create(writer, reader, query, "test", 1, 
spark, SparkEnv.get)
+  }
+
+  override def afterEach(): Unit = {
+SparkEnv.get.rpcEnv.stop(epochCoordinator)
+  }
+
+  test("single epoch") {
+setWriterPartitions(3)
+setReaderPartitions(2)
+
+commitPartitionEpoch(0, 1)
+commitPartitionEpoch(1, 1)
+commitPartitionEpoch(2, 1)
+reportPartitionOffset(0, 1)
+reportPartitionOffset(1, 1)
+
+// Here and in subsequent tests this is called to make a synchronous 
call to EpochCoordinator
+// so that mocks would have been acted upon by the time verification 
happens
+makeSynchronousCall()
+
+verifyCommit(1)
+  }
+
+  test("single epoch, all but one writer partition has committed") {
+setWriterPartitions(3)
+setReaderPartitions(2)
+
+commitPartitionEpoch(0, 1)
+commitPartitionEpoch(1, 1)
+reportPartitionOffset(0, 1)
+reportPartitionOffset(1, 1)
+
+makeSynchronousCall()
+
+verifyCommitHasntHappened(1)
+  }
+
+  test("single epoch, all but one reader partition has reported an 
offset") {
+setWriterPartitions(3)
+setReaderPartitions(2)
+
+commitPartitionEpoch(0, 1)
+commitPartitionEpoch(1, 1)
+commitPartitionEpoch(2, 1)
+reportPartitionOffset(0, 1)
+
+makeSynchronousCall()
+
+verifyCommitHasntHappened(1)
--- End diff --

@tdas @jose-torres done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-11 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180685351
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
 case v: MapData => v.copy()
 case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The 
returned accessor
+   * actually takes a `SpecializedGetters` input because it can be 
generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = 
dataType match {
--- End diff --

Hehe - good point. Let's leave it here for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20931
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180715118
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -733,6 +735,17 @@ object EliminateSorts extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Removes Sort operation if the child is already sorted
+ */
+object RemoveRedundantSorts extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case Sort(orders, true, child) if child.outputOrdering.nonEmpty
+&& SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+  child
+  }
--- End diff --

This is a good follow-up


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21038
  
@koeninger would you please help to review, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #89192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89192/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21036: [SPARK-23958][CORE] HadoopRdd filters empty files to avo...

2018-04-11 Thread guoxiaolongzte

Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/21036
  
Thanks, I will try to add test cases. @felixcheung 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21038: [SPARK-22968][DStream] Fix Kafka connector partit...

2018-04-11 Thread jerryshao

GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/21038

[SPARK-22968][DStream] Fix Kafka connector partition revoked issue

## What changes were proposed in this pull request?

Kafka partitions can be revoked when new consumers joined in the consumer 
group to rebalance the partitions. But current Spark Kafka connector code 
doesn't consider partition revoking scenarios, trying to get latest offset from 
revoked partitions, which will throw exceptions as JIRA mentioned.

To reproduce this issue, user could start two `DirectKafkaWordCount` 
example subsequently. When the later one is started, it will trigger rebalance 
and reallocate the partitions across all the consumers, then the former one 
will throw an exception and stop the app.

To fix it, Spark Kafka connector should consider partition revoking 
scenarios and don't maintain offsets for revoked partitions.

Besides, this PR also fixes bugs in `DirectKafkaWordCount`, this example 
simply cannot be worked without the fix.

```
8/01/05 09:48:27 INFO internals.ConsumerCoordinator: Revoking previously 
assigned partitions [kssh-7, kssh-4, kssh-3, kssh-6, kssh-5, kssh-0, kssh-2, 
kssh-1] for group use_a_separate_group_id_for_each_stream
18/01/05 09:48:27 INFO internals.AbstractCoordinator: (Re-)joining group 
use_a_separate_group_id_for_each_stream
18/01/05 09:48:27 INFO internals.AbstractCoordinator: Successfully joined 
group use_a_separate_group_id_for_each_stream with generation 4
18/01/05 09:48:27 INFO internals.ConsumerCoordinator: Setting newly 
assigned partitions [kssh-7, kssh-4, kssh-6, kssh-5] for group 
use_a_separate_group_id_for_each_stream
```

## How was this patch tested?

This is manually verified in local cluster, unfortunately I'm not sure how 
to simulate it in UT, so propose the PR without UT added.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-22968

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21038.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21038


commit f317dec0d863a717dc424707571453b11c43e700
Author: jerryshao 
Date:   2018-04-11T06:59:15Z

Fix Kafka connector partition revoked issue




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-11 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21007
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #89178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89178/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21004#discussion_r180720924
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala
 ---
@@ -81,7 +81,7 @@ class PartitionProviderCompatibilitySuite
   HiveCatalogMetrics.reset()
   assert(spark.sql("select * from test where partCol < 2").count() 
== 2)
   assert(HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount() 
== 2)
-  assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 
2)
+  assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 
7)
--- End diff --

what happened here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21038
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21038
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89177/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21038
  
**[Test build #89177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89177/testReport)**
 for PR 21038 at commit 
[`f317dec`](https://github.com/apache/spark/commit/f317dec0d863a717dc424707571453b11c43e700).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-11 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21021
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21034: [SPARK-23926][SQL] Extending reverse function to ...

2018-04-11 Thread mn-mikke

Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21034#discussion_r180686345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -212,6 +213,96 @@ case class SortArray(base: Expression, ascendingOrder: 
Expression)
   override def prettyName: String = "sort_array"
 }
 
+/**
+ * Returns a reversed string or an array with reverse order of elements.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns a reversed string or an array with 
reverse order of elements.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('Spark SQL');
+   LQS krapS
+  > SELECT _FUNC_(array(2, 1, 4, 3));
+   [3, 4, 1, 2]
+  """,
+  since = "2.4.0")
+case class Reverse(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  // Input types are utilized by type coercion in ImplicitTypeCasts.
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType)
+
+  val allowedTypes = Seq(StringType, ArrayType)
+
+  override def dataType: DataType = child.dataType
+
+  lazy val elementType: DataType = 
dataType.asInstanceOf[ArrayType].elementType
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Great idea!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21034
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20888
  
**[Test build #89184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89184/testReport)**
 for PR 20888 at commit 
[`4e7be70`](https://github.com/apache/spark/commit/4e7be70e6d35f290a8ddd6278a9feccbc06791aa).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-11 Thread gaborgsomogyi

Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
Updated the title and the description to reflect the changes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89186/testReport)**
 for PR 20981 at commit 
[`5711ea1`](https://github.com/apache/spark/commit/5711ea12574a67ee4e309b81c6d616e8841c4dd6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21007
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2200/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21007
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-11 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180684792
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,28 @@ object InternalRow {
 case v: MapData => v.copy()
 case _ => value
   }
+
+  /**
+   * Returns an accessor for an `InternalRow` with given data type. The 
returned accessor
+   * actually takes a `SpecializedGetters` input because it can be 
generalized to other classes
+   * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
+   */
+  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = 
dataType match {
--- End diff --

As `SpecializedGetters` is a java interface, seems we can't create a 
companion object for it? Or I miss something?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

2018-04-11 Thread rednaxelafx

GitHub user rednaxelafx opened a pull request:

https://github.com/apache/spark/pull/21039

[SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars as transient

## What changes were proposed in this pull request?

Mark `HashAggregateExec.bufVars` as transient to avoid it from being 
serialized.
Also manually null out this field at the end of `doProduceWithoutKeys()` to 
shorten its lifecycle, because it'll no longer be used after that.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rednaxelafx/apache-spark codegen-improve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21039


commit 0b5a075b9de40d36379551720d7e0e07ad6deb40
Author: Kris Mok 
Date:   2018-04-11T09:35:31Z

SPARK-23960: Mark HashAggregateExec.bufVars as transient




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21029: [WIP][SPARK-23952] remove type parameter in DataR...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21029#discussion_r180705366
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala 
---
@@ -143,7 +144,7 @@ case class MemoryStream[A : Encoder](id: Int, 
sqlContext: SQLContext)
   logDebug(generateDebugString(newBlocks.flatten, startOrdinal, 
endOrdinal))
 
   newBlocks.map { block =>
-new 
MemoryStreamDataReaderFactory(block).asInstanceOf[DataReaderFactory[UnsafeRow]]
+new 
MemoryStreamDataReaderFactory(block).asInstanceOf[DataReaderFactory]
--- End diff --

Array is a java-friendly type.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21029: [WIP][SPARK-23952] remove type parameter in DataR...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21029#discussion_r180705219
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
 ---
@@ -95,21 +77,29 @@ case class DataSourceV2ScanExec(
   
sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
   sparkContext.env)
 .askSync[Unit](SetReaderPartitions(readerFactories.size))
-  new ContinuousDataSourceRDD(sparkContext, sqlContext, 
readerFactories)
-.asInstanceOf[RDD[InternalRow]]
-
-case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
-  new DataSourceRDD(sparkContext, 
batchReaderFactories).asInstanceOf[RDD[InternalRow]]
-
+  if (readerFactories.exists(_.dataFormat() == 
DataFormat.COLUMNAR_BATCH)) {
+throw new IllegalArgumentException(
+  "continuous stream reader does not support columnar read yet.")
--- End diff --

We use a type erase hack, and lie to the Scala compiler that we are 
outputting InternalRow. At runtime, we cast the data to ColumnarBatch in 
codegen.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20874
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21031
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock

2018-04-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20871
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21037
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21004#discussion_r180720733
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
 ---
@@ -126,35 +126,35 @@ abstract class PartitioningAwareFileIndex(
 val caseInsensitiveOptions = CaseInsensitiveMap(parameters)
 val timeZoneId = 
caseInsensitiveOptions.get(DateTimeUtils.TIMEZONE_OPTION)
   .getOrElse(sparkSession.sessionState.conf.sessionLocalTimeZone)
-
-userPartitionSchema match {
+val inferredPartitionSpec = PartitioningUtils.parsePartitions(
+  leafDirs,
+  typeInference = 
sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled,
+  basePaths = basePaths,
+  timeZoneId = timeZoneId)
+userSpecifiedSchema match {
   case Some(userProvidedSchema) if userProvidedSchema.nonEmpty =>
-val spec = PartitioningUtils.parsePartitions(
-  leafDirs,
-  typeInference = false,
-  basePaths = basePaths,
-  timeZoneId = timeZoneId)
+val userPartitionSchema =
+  
combineInferredAndUserSpecifiedPartitionSchema(inferredPartitionSpec)
 
-// Without auto inference, all of value in the `row` should be 
null or in StringType,
 // we need to cast into the data type that user specified.
 def castPartitionValuesToUserSchema(row: InternalRow) = {
   InternalRow((0 until row.numFields).map { i =>
+val expr = 
inferredPartitionSpec.partitionColumns.fields(i).dataType match {
+  case StringType => Literal.create(row.getUTF8String(i), 
StringType)
--- End diff --

why special case string type?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-11 Thread guoxiaolongzte

Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r180655799
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -55,7 +56,8 @@ private[spark] class HadoopPartition(rddId: Int, override 
val index: Int, s: Inp
 
   /**
* Get any environment variables that should be added to the users 
environment when running pipes
-   * @return a Map with the environment variables and corresponding 
values, it could be empty
+*
+* @return a Map with the environment variables and corresponding 
values, it could be empty
--- End diff --

Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21007
  
**[Test build #89182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89182/testReport)**
 for PR 21007 at commit 
[`deacb17`](https://github.com/apache/spark/commit/deacb17816c21811efd630e91cee4c30d421eb36).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread Hackeruncle

Github user Hackeruncle commented on the issue:

https://github.com/apache/spark/pull/21038
  
@jerryshao Thank you very much for this issue.
I go to compile and test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89178/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-04-11 Thread gaborgsomogyi

Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20888#discussion_r180691534
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -164,10 +164,13 @@ class DataFrameRangeSuite extends QueryTest with 
SharedSQLContext with Eventuall
 sparkContext.addSparkListener(listener)
 for (codegen <- Seq(true, false)) {
   withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> 
codegen.toString()) {
-DataFrameRangeSuite.stageToKill = -1
+DataFrameRangeSuite.stageToKill = 
DataFrameRangeSuite.INVALID_STAGE_ID
 val ex = intercept[SparkException] {
   spark.range(0, 1000L, 1, 1).map { x =>
-DataFrameRangeSuite.stageToKill = TaskContext.get().stageId()
+val taskContext = TaskContext.get()
+if (!taskContext.isInterrupted()) {
--- End diff --

To answer your question the whole point of this change is to block 
`DataFrameRangeSuite.stageToKill` overwrite while the first iteration's thread 
is running after `DataFrameRangeSuite.stageToKill = 
DataFrameRangeSuite.INVALID_STAGE_ID` happened. This would work also but would 
be less trivial.

I agree if `DataFrameRangeSuite.stageToKill` object member removed and 
switch to `onTaskStart` then the whole mumbo-jumbo is not required.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21039
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2201/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

2018-04-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21039
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21019
  
cc @jiangxb1987 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180716173
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -22,10 +22,11 @@ import org.apache.spark.sql.{execution, Row}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{Cross, FullOuter, Inner, 
LeftOuter, RightOuter}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
Repartition}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
Repartition, Sort}
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.execution.columnar.InMemoryRelation
-import org.apache.spark.sql.execution.exchange.{EnsureRequirements, 
ReusedExchangeExec, ReuseExchange, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.exchange.{EnsureRequirements, 
ReusedExchangeExec, ReuseExchange,
+  ShuffleExchangeExec}
--- End diff --

it's a unnecessary change. We don't have length limit for imports


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20560#discussion_r180716049
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 ---
@@ -169,4 +169,6 @@ case class InMemoryRelation(
 
   override protected def otherCopyArgs: Seq[AnyRef] =
 Seq(_cachedColumnBuffers, sizeInBytesStats, statsOfPlanToCache)
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
--- End diff --

in SparkPlan
```
/** Specifies how data is ordered in each partition. */
def outputOrdering: Seq[SortOrder] = Nil
```

So we can't do this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21018: [SPARK-23880][SQL] Do not trigger any jobs for ca...

2018-04-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21018#discussion_r180717823
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -119,26 +119,60 @@ class CacheManager extends Logging {
 while (it.hasNext) {
   val cd = it.next()
   if (cd.plan.find(_.sameResult(plan)).isDefined) {
-cd.cachedRepresentation.cachedColumnBuffers.unpersist(blocking)
+cd.cachedRepresentation.clearCache(blocking)
 it.remove()
   }
 }
   }
 
+  /**
+   * Materialize the cache that refers to the given physical plan.
--- End diff --

why do we do this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20451: [SPARK-23146][WIP] Support client mode for Kubernetes cl...

2018-04-11 Thread inetfuture

Github user inetfuture commented on the issue:

https://github.com/apache/spark/pull/20451
  
May I ask, what's the release plan for this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

2018-04-11 Thread rednaxelafx

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/21039#discussion_r180721843
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ---
@@ -236,6 +236,8 @@ case class HashAggregateExec(
  | }
""".stripMargin)
 
+bufVars = null  // explicitly null this field out to allow the 
referent to be GC'd sooner
--- End diff --

The workflow of whole-stage codegen ensures that `doConsumeWithoutKeys` can 
only be on the call stack when `doProduceWithoutKeys` is also on the call 
stack; the liveness of the former is strictly a subset of the latter.
That's because for a plan tree that looks like:
```
+- A
  +- B
+- C
```
The whole-stage codegen system (mostly) works like:
```
A.produce
 |--> B.produce
 | |--> C.produce
 | | |--> B.consume
 | | | |--> A.consume
 | | | ||
 | | | |<---o
 | | |

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-11 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 580 matches

Mail list logo