[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95618/
Test PASSed.





[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22226
  
**[Test build #95622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95622/testReport)**
 for PR 22226 at commit 
[`72d2628`](https://github.com/apache/spark/commit/72d2628323af4e44da1083c99c0d4996c34e4c8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #22288: [SPARK-22148][Scheduler] Acquire new executors to...

2018-09-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r214719743
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -414,9 +425,54 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
-}
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case taskIndex: Some[Int] => // Returns the taskIndex which 
was unschedulable
+  if (conf.getBoolean("spark.dynamicAllocation.enabled", 
false)) {
+// If the taskSet is unschedulable we kill the existing 
blacklisted executor/s and
+// kick off an abortTimer which after waiting will abort 
the taskSet if we were
+// unable to get new executors and couldn't schedule a 
task from the taskSet.
+// Note: We keep a track of schedulability on a per 
taskSet basis rather than on a
+// per task basis.
+if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) {
+  hostToExecutors.valuesIterator.foreach(executors => 
executors.foreach({
+executor =>
+  logDebug("Killing executor because of task 
unschedulability: " + executor)
+  blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedExecutor(executor))
--- End diff --

Seriously? You killed all executors? What if other taskSets' tasks are 
running on them?

BTW, if you want to refresh executors, you also have to enable 
`spark.blacklist.killBlacklistedExecutors`.
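
To illustrate the concern, here is a minimal sketch in plain Scala (not the actual `TaskSchedulerImpl` code) that restricts the kill to executors which are blacklisted and have no running tasks. `ExecutorState`, `isBlacklisted`, `runningTasks` and `executorsSafeToKill` are hypothetical names used only for this example.

```scala
// Hypothetical model of an executor; this is not a Spark class.
final case class ExecutorState(id: String, isBlacklisted: Boolean, runningTasks: Int)

object KillCandidates {
  // Keep only executors that are blacklisted AND idle, so that tasks
  // belonging to other taskSets are not interrupted.
  def executorsSafeToKill(hostToExecutors: Map[String, Seq[ExecutorState]]): List[String] =
    hostToExecutors.valuesIterator
      .flatMap(_.iterator)
      .filter(e => e.isBlacklisted && e.runningTasks == 0)
      .map(_.id)
      .toList
      .sorted
}
```

With this filter, a blacklisted executor that is still running a task from another taskSet is left alone.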





[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...

2018-09-03 Thread peter-toth
Github user peter-toth commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r214732767
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -817,7 +819,7 @@ class Analyzer(
   case s: SubqueryExpression =>
 s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, 
attributeRewrites))
 }
-  }
+  }, attributeRewrites)
--- End diff --

done





[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...

2018-09-03 Thread peter-toth
Github user peter-toth commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r214732751
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -805,10 +807,10 @@ class Analyzer(
* that this rule cannot handle. When that is the case, there 
must be another rule
* that resolves these conflicts. Otherwise, the analysis will 
fail.
*/
-  right
+  (right, AttributeMap.empty[Attribute])
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-  right transformUp {
+  (right transformUp {
--- End diff --

done





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214735437
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -754,6 +754,47 @@ class HiveDDLSuite
 }
   }
 
+  test("Insert overwrite Hive table should output correct schema") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+spark.sql("CREATE TABLE tbl(id long)")
--- End diff --

I am not familiar with Hive. But as I look at the debug message of this 
logical plan, the top level is `InsertIntoHiveTable `default`.`tbl2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, true, false, [ID]`. It 
should not be related to this configuration, right?





[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22320
  
**[Test build #95633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)**
 for PR 22320 at commit 
[`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce).





[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22319
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95630/
Test FAILed.





[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22319
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22306
  
**[Test build #95621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95621/testReport)**
 for PR 22306 at commit 
[`8d7baee`](https://github.com/apache/spark/commit/8d7baee91199141f5999f0e49ab3092fb121cc41).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22306: [SPARK-25300][CORE]Unified the configuration parameter `...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95621/
Test PASSed.





[GitHub] spark pull request #22313: [SPARK-25306][SQL] Use cache to speed up `createF...

2018-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22313#discussion_r214743988
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -55,19 +59,52 @@ import org.apache.spark.sql.types._
  * known to be convertible.
  */
 private[orc] object OrcFilters extends Logging {
+  case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, 
DataType])
+
+  private lazy val cacheExpireTimeout =
+
org.apache.spark.sql.execution.datasources.orc.OrcFilters.cacheExpireTimeout
+
+  private lazy val searchArgumentCache = CacheBuilder.newBuilder()
+.expireAfterAccess(cacheExpireTimeout, TimeUnit.SECONDS)
+.build(
+  new CacheLoader[FilterWithTypeMap, Option[Builder]]() {
+override def load(typeMapAndFilter: FilterWithTypeMap): 
Option[Builder] = {
+  buildSearchArgument(
+typeMapAndFilter.typeMap, typeMapAndFilter.filter, 
SearchArgumentFactory.newBuilder())
+}
+  })
+
+  private def getOrBuildSearchArgumentWithNewBuilder(
+  dataTypeMap: Map[String, DataType],
+  expression: Filter): Option[Builder] = {
+// When `spark.sql.orc.cache.sarg.timeout` is 0, cache is disabled.
+if (cacheExpireTimeout > 0) {
+  searchArgumentCache.get(FilterWithTypeMap(expression, dataTypeMap))
+} else {
+  buildSearchArgument(dataTypeMap, expression, 
SearchArgumentFactory.newBuilder())
--- End diff --

Ya. It's possible. But if we create a Guava loading cache and go through 
all of Guava's cache management logic, that means more overhead than this PR. 
In this PR, `spark.sql.orc.cache.sarg.timeout=0` means not creating the loading 
cache at all.
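
A minimal sketch of that point, in plain Scala rather than Guava: because the cache is a `lazy val`, it is never constructed when the timeout is 0. `SargCacheSketch`, `timeoutSeconds`, `cacheCreated` and `getOrBuild` are hypothetical names, and a `mutable.Map` stands in for the Guava loading cache (expiry is not modelled).

```scala
import scala.collection.mutable

// Hypothetical stand-in for the object in the diff; this is not Spark code.
class SargCacheSketch(timeoutSeconds: Long) {
  // Flipped only when the lazy cache is actually initialised.
  var cacheCreated = false

  // A mutable.Map stands in for the Guava LoadingCache; expiry is not modelled.
  private lazy val cache: mutable.Map[String, String] = {
    cacheCreated = true
    mutable.Map.empty[String, String]
  }

  // Stand-in for the expensive buildSearchArgument call.
  private def build(filter: String): String = s"sarg($filter)"

  def getOrBuild(filter: String): String =
    if (timeoutSeconds > 0) cache.getOrElseUpdate(filter, build(filter))
    else build(filter) // timeout 0: the lazy val is never touched
}
```

With a timeout of 0 every call goes straight to `build`, so no cache object is allocated.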





[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...

2018-09-03 Thread peter-toth
Github user peter-toth commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r214793247
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,14 @@ class DataFrameJoinSuite extends QueryTest with 
SharedSQLContext {
   df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan
 }
   }
+
+  test("SPARK-25150: Attribute deduplication handles attributes in join 
condition properly") {
+val a = spark.range(1, 5)
+val b = spark.range(10)
+val c = b.filter($"id" % 2 === 0)
+
+val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === 
c("id"), "inner")
--- End diff --

That simpler join doesn't hit the issue. It is handled by a different rule, 
`ResolveNaturalAndUsingJoin`.





[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22324
  
**[Test build #95645 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95645/testReport)**
 for PR 22324 at commit 
[`510d729`](https://github.com/apache/spark/commit/510d729b0ed6f83b05a3b0f06c2631163d62ef1a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FileSourceSuite extends SharedSQLContext `





[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22324
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95645/
Test PASSed.





[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join

2018-09-03 Thread peter-toth
Github user peter-toth commented on the issue:

https://github.com/apache/spark/pull/22318
  
@mgaido91 , 2.2 also suffered from this.





[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...

2018-09-03 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22314
  
@ueshin Just verified in 2.3. This problem does not exist in 2.3, because 
the implementation of `nullSafeCodeGen` in 2.3 differs from the one in master. 
However, 2.3 is missing the test cases we added in these PRs. Should we check 
the test cases into the branch? I am afraid that if we ever backport the PR 
that changed `nullSafeCodeGen`, we may introduce this bug. Please advise.





[GitHub] spark issue #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFileSize in...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22324
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22316#discussion_r214754379
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
 new RelationalGroupedDataset(
   df,
   groupingExprs,
-  RelationalGroupedDataset.PivotType(pivotColumn.expr, 
values.map(Literal.apply)))
+  RelationalGroupedDataset.PivotType(pivotColumn.expr, 
values.map(lit(_).expr)))
--- End diff --

Don't see any advantages of this. It is longer and slower.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22315
  
**[Test build #95636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95636/testReport)**
 for PR 22315 at commit 
[`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95639/testReport)**
 for PR 22240 at commit 
[`b6a3c5b`](https://github.com/apache/spark/commit/b6a3c5b3de3ef145805542511770da4f59886858).





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2808/
Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95641/testReport)**
 for PR 22240 at commit 
[`47ebd08`](https://github.com/apache/spark/commit/47ebd0849ec3344f05eb8eb74df36d7bfda7e130).





[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214762021
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
 assert(!ser2.getAutoReset)
   }
 
+  test("ClassCastException when writing a Map after previously " +
--- End diff --

Since this is a bug fix test case, could you add `SPARK-25176` like 
`SPARK-25176 ClassCastException ...`?





[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22317
  
**[Test build #95629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95629/testReport)**
 for PR 22317 at commit 
[`7c5b656`](https://github.com/apache/spark/commit/7c5b65657f6e58534ff2ad897f1dfa0618634287).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22315
  
**[Test build #95636 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95636/testReport)**
 for PR 22315 at commit 
[`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8).





[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...

2018-09-03 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22145
  
what's the latest on this, btw?





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214750815
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest 
with SharedSQLContext with Be
 }
   }
 
+  test("Insert overwrite table command should output correct schema: 
basic") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).toDF("id")
--- End diff --

Why is `toDF("id")` required? Why not `spark.range(10)` alone?





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214751930
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -754,6 +754,47 @@ class HiveDDLSuite
 }
   }
 
+  test("Insert overwrite Hive table should output correct schema") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+spark.sql("CREATE TABLE tbl(id long)")
+spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
+spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+spark.sql("CREATE TABLE tbl2(ID long)")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+checkAnswer(spark.table("tbl2"), Seq(Row(4)))
+  }
+}
+  }
+
+  test("Insert into Hive directory should output correct schema") {
+withTable("tbl") {
+  withView("view1") {
+withTempPath { path =>
+  spark.sql("CREATE TABLE tbl(id long)")
+  spark.sql("INSERT OVERWRITE TABLE tbl SELECT 4")
--- End diff --

`s/SELECT/VALUES` as it could be a bit more Spark-idiomatic?





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214751219
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest 
with SharedSQLContext with Be
 }
   }
 
+  test("Insert overwrite table command should output correct schema: 
basic") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).toDF("id")
+df.write.format("parquet").saveAsTable("tbl")
+spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+val identifier = TableIdentifier("tbl2", Some("default"))
+val location = 
spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+val expectedSchema = StructType(Seq(StructField("ID", LongType, 
true)))
+assert(spark.read.parquet(location).schema == expectedSchema)
+checkAnswer(spark.table("tbl2"), df)
+  }
+}
+  }
+
+  test("Insert overwrite table command should output correct schema: 
complex") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).map(x => (x, x.toInt, 
x.toInt)).toDF("col1", "col2", "col3")
+df.write.format("parquet").saveAsTable("tbl")
+spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING 
parquet PARTITIONED " +
+  "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 
FROM view1")
+val identifier = TableIdentifier("tbl2", Some("default"))
+val location = 
spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+val expectedSchema = StructType(Seq(
+  StructField("COL1", LongType, true),
--- End diff --

`nullable` is `true` by default.
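
For illustration, a sketch assuming the Spark SQL types are on the classpath: `StructField`'s third parameter, `nullable`, defaults to `true`, so the two-argument form builds an equal field. The object name `NullableDefault` is only a wrapper for this example.

```scala
import org.apache.spark.sql.types.{LongType, StructField}

object NullableDefault {
  // Explicit form, as written in the test under review.
  val explicit: StructField = StructField("COL1", LongType, true)

  // Same field relying on the default value of `nullable`.
  val defaulted: StructField = StructField("COL1", LongType)
}
```

`StructField` is a case class, so `explicit == defaulted` compares field by field.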





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214751023
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest 
with SharedSQLContext with Be
 }
   }
 
+  test("Insert overwrite table command should output correct schema: 
basic") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).toDF("id")
+df.write.format("parquet").saveAsTable("tbl")
+spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --

`default` is the default database name, isn't it? I'd remove it from the 
test or use `spark.catalog.currentDatabase`.
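
A sketch of both options, assuming Spark SQL is on the classpath; the wrapper object `TableIds` and its member names are hypothetical. `TableIdentifier("tbl2")` leaves the database unset, and as far as I know the session catalog then falls back to the current database.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

object TableIds {
  // Option 1: no database given; `database` is None.
  val implicitDb: TableIdentifier = TableIdentifier("tbl2")

  // Option 2: explicit, but without hard-coding "default".
  def explicitDb(spark: SparkSession): TableIdentifier =
    TableIdentifier("tbl2", Some(spark.catalog.currentDatabase))
}
```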





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214751748
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -63,7 +63,7 @@ case class CreateHiveTableAsSelectCommand(
 query,
 overwrite = false,
 ifPartitionNotExists = false,
-outputColumns = outputColumns).run(sparkSession, child)
+outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --

Can you remove one `outputColumnNames`?





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214751169
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest 
with SharedSQLContext with Be
 }
   }
 
+  test("Insert overwrite table command should output correct schema: 
basic") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).toDF("id")
+df.write.format("parquet").saveAsTable("tbl")
+spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
+spark.sql("CREATE TABLE tbl2(ID long) USING parquet")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
+val identifier = TableIdentifier("tbl2", Some("default"))
+val location = 
spark.sessionState.catalog.getTableMetadata(identifier).location.toString
+val expectedSchema = StructType(Seq(StructField("ID", LongType, 
true)))
+assert(spark.read.parquet(location).schema == expectedSchema)
+checkAnswer(spark.table("tbl2"), df)
+  }
+}
+  }
+
+  test("Insert overwrite table command should output correct schema: 
complex") {
+withTable("tbl", "tbl2") {
+  withView("view1") {
+val df = spark.range(10).map(x => (x, x.toInt, 
x.toInt)).toDF("col1", "col2", "col3")
+df.write.format("parquet").saveAsTable("tbl")
+spark.sql("CREATE VIEW view1 AS SELECT * FROM tbl")
+spark.sql("CREATE TABLE tbl2(COL1 long, COL2 int, COL3 int) USING 
parquet PARTITIONED " +
+  "BY (COL2) CLUSTERED BY (COL3) INTO 3 BUCKETS")
+spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT COL1, COL2, COL3 
FROM view1")
+val identifier = TableIdentifier("tbl2", Some("default"))
--- End diff --

Same as above.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2807/
Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95640/testReport)**
 for PR 22240 at commit 
[`ae4a8e6`](https://github.com/apache/spark/commit/ae4a8e6b784519a2f2a237be258ed1059e91be64).





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95639 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95639/testReport)**
 for PR 22240 at commit 
[`b6a3c5b`](https://github.com/apache/spark/commit/b6a3c5b3de3ef145805542511770da4f59886858).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95639/
Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95641/testReport)**
 for PR 22240 at commit 
[`47ebd08`](https://github.com/apache/spark/commit/47ebd0849ec3344f05eb8eb74df36d7bfda7e130).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22179
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2810/
Test PASSed.





[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22316#discussion_r214761811
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
 
 assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+val df = trainingSales
+  .groupBy($"sales.year")
+  .pivot(struct(lower($"sales.course"), $"training"), Seq(
+struct(lit("dotnet"), lit("Experts")),
+struct(lit("java"), lit("Dummies")))
+  ).agg(sum($"sales.earnings"))
+
+checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+val exception = intercept[RuntimeException] {
+  trainingSales
+.groupBy($"sales.year")
+.pivot(struct(lower($"sales.course"), $"training"))
+.agg(sum($"sales.earnings"))
+.collect()
--- End diff --

I tried this on your branch:
```
scala> df.show
+--------+--------------------+
|training|               sales|
+--------+--------------------+
| Experts|[dotNET, 2012, 10...|
| Experts|[JAVA, 2012, 2000...|
| Dummies|[dotNet, 2012, 50...|
| Experts|[dotNET, 2013, 48...|
| Dummies|[Java, 2013, 3000...|
+--------+--------------------+

scala> df.groupBy($"sales.year").pivot(struct(lower($"sales.course"), $"training")).agg(sum($"sales.earnings"))
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [dotnet,Dummies]
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at scala.util.Try.getOrElse(Try.scala:79)
  at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:163)
  at org.apache.spark.sql.functions$.typedLit(functions.scala:127)
```
Am I missing something?





[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r214761843
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -69,7 +69,7 @@ case class InsertIntoHiveTable(
 query: LogicalPlan,
 overwrite: Boolean,
 ifPartitionNotExists: Boolean,
-outputColumns: Seq[Attribute]) extends SaveAsHiveFile {
+outputColumnNames: Seq[String]) extends SaveAsHiveFile {
--- End diff --

thanks!





[GitHub] spark pull request #22313: [SPARK-25306][SQL] Use cache to speed up `createF...

2018-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22313#discussion_r214744306
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -55,19 +59,52 @@ import org.apache.spark.sql.types._
  * known to be convertible.
  */
 private[orc] object OrcFilters extends Logging {
+  case class FilterWithTypeMap(filter: Filter, typeMap: Map[String, DataType])
+
+  private lazy val cacheExpireTimeout =
+org.apache.spark.sql.execution.datasources.orc.OrcFilters.cacheExpireTimeout
+
+  private lazy val searchArgumentCache = CacheBuilder.newBuilder()
+.expireAfterAccess(cacheExpireTimeout, TimeUnit.SECONDS)
+.build(
+  new CacheLoader[FilterWithTypeMap, Option[Builder]]() {
+override def load(typeMapAndFilter: FilterWithTypeMap): Option[Builder] = {
+  buildSearchArgument(
+typeMapAndFilter.typeMap, typeMapAndFilter.filter, SearchArgumentFactory.newBuilder())
+}
+  })
+
+  private def getOrBuildSearchArgumentWithNewBuilder(
+  dataTypeMap: Map[String, DataType],
+  expression: Filter): Option[Builder] = {
+// When `spark.sql.orc.cache.sarg.timeout` is 0, cache is disabled.
+if (cacheExpireTimeout > 0) {
+  searchArgumentCache.get(FilterWithTypeMap(expression, dataTypeMap))
+} else {
+  buildSearchArgument(dataTypeMap, expression, SearchArgumentFactory.newBuilder())
+}
+  }
+
   def createFilter(schema: StructType, filters: Array[Filter]): Option[SearchArgument] = {
 val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
 
 // First, tries to convert each filter individually to see whether it's convertible, and then
 // collect all convertible ones to build the final `SearchArgument`.
 val convertibleFilters = for {
   filter <- filters
-  _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder())
+  _ <- getOrBuildSearchArgumentWithNewBuilder(dataTypeMap, filter)
 } yield filter
 
 for {
   // Combines all convertible filters using `And` to produce a single conjunction
-  conjunction <- convertibleFilters.reduceOption(And)
+  conjunction <- convertibleFilters.reduceOption { (x, y) =>
+val newFilter = org.apache.spark.sql.sources.And(x, y)
+if (cacheExpireTimeout > 0) {
+  // Build in a bottom-up manner
+  getOrBuildSearchArgumentWithNewBuilder(dataTypeMap, newFilter)
+}
--- End diff --

Final conjunction? All sub-function results will be cached in the end.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95634/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22315
  
**[Test build #95634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95634/testReport)**
 for PR 22315 at commit 
[`712542c`](https://github.com/apache/spark/commit/712542c0480bfb51317a6d4905bbaa0349c940e8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22317
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95629/
Test PASSed.





[GitHub] spark issue #22317: [SPARK-25310][SQL] ArraysOverlap may throw a Compilation...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22317
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22313: [SPARK-25306][SQL] Use cache to speed up `createFilter` ...

2018-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22313
  
Thank you for the review and advice, @cloud-fan. It turns out that my initial 
assessment was not sufficient.

First of all, from the beginning, 
[SPARK-2883](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R75)
 was designed as a recursive function like the following. Please see `tryLeft` 
and `tryRight`. Each is purely a computation to check whether the conversion 
succeeds. There is no reuse here. So, I tried to cache the first two `tryLeft` 
and `tryRight` operations since they can be reused.
```scala
val tryLeft = buildSearchArgument(left, newBuilder)
val tryRight = buildSearchArgument(right, newBuilder)
val conjunction = for {
  _ <- tryLeft
  _ <- tryRight
  lhs <- buildSearchArgument(left, builder.startAnd())
  rhs <- buildSearchArgument(right, lhs)
} yield rhs.end()
```

However, before that, `createFilter` generates the target tree with 
[reduceOption(And)](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R35)
 as a deeply skewed tree. That was the root cause. I'll update this PR.
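
For illustration only, here is a minimal sketch of why the shape of the tree matters. It is not code from this PR: `Leaf`, `And`, `visits`, and `balanced` are hypothetical stand-ins rather than Spark classes, and `visits` is a simplified cost model that assumes every `And` node converts each child twice (once as `tryLeft`/`tryRight`, once for the real build), as in the snippet quoted above.

```scala
object SkewDemo {
  // Hypothetical stand-ins for Spark's data source filters; not the real classes.
  sealed trait Filter
  final case class Leaf(name: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter

  // Depth of a filter tree; a single leaf has depth 1.
  def depth(f: Filter): Int = f match {
    case Leaf(_)   => 1
    case And(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // Node visits under the simplified model: every And converts each child
  // twice (once to "try" it, once for the real build).
  def visits(f: Filter): Long = f match {
    case Leaf(_)   => 1L
    case And(l, r) => 1L + 2L * (visits(l) + visits(r))
  }

  // One possible alternative (an assumption, not taken from the PR): split the
  // filters in half recursively so the depth grows logarithmically.
  def balanced(fs: IndexedSeq[Filter]): Filter = {
    require(fs.nonEmpty, "need at least one filter")
    if (fs.length == 1) fs.head
    else {
      val (l, r) = fs.splitAt(fs.length / 2)
      And(balanced(l), balanced(r))
    }
  }

  val leaves: IndexedSeq[Filter] = (1 to 16).map(i => Leaf(s"c$i"))

  // Same left-deep shape that reducing with And produces:
  // And(And(And(c1, c2), c3), ...).
  val skewed: Filter = leaves.reduceOption[Filter](And(_, _)).get
  val even: Filter = balanced(leaves)

  def main(args: Array[String]): Unit = {
    println(s"skewed:   depth=${depth(skewed)}, visits=${visits(skewed)}")
    println(s"balanced: depth=${depth(even)}, visits=${visits(even)}")
  }
}
```

With 16 leaves, the left-deep tree has depth 16 while the balanced one has depth 5, and under this model the visit count grows exponentially with depth. That is consistent with the skewed tree, rather than the individual `tryLeft`/`tryRight` calls, being the root cause.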





[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22318
  
**[Test build #95632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95632/testReport)**
 for PR 22318 at commit 
[`d6e316a`](https://github.com/apache/spark/commit/d6e316a92cc4283f52f9cf141fe57bcece2cdf6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22320
  
**[Test build #95633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95633/testReport)**
 for PR 22320 at commit 
[`98bf027`](https://github.com/apache/spark/commit/98bf027df9c4467adc2673097c6762a3ec5210ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22320
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95633/
Test PASSed.





[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22314
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2806/
Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95640/
Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95640 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95640/testReport)**
 for PR 22240 at commit 
[`ae4a8e6`](https://github.com/apache/spark/commit/ae4a8e6b784519a2f2a237be258ed1059e91be64).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22315
  
retest this please





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2803/
Test PASSed.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95631/
Test PASSed.





[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22316
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22316
  
**[Test build #95631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95631/testReport)**
 for PR 22316 at commit 
[`673ef00`](https://github.com/apache/spark/commit/673ef001adf9b64d644c782eed2aefecc029ed81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22313: [SPARK-25306][SQL] Use cache to speed up `createFilter` ...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22313
  
**[Test build #95637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95637/testReport)**
 for PR 22313 at commit 
[`4acbaf8`](https://github.com/apache/spark/commit/4acbaf8be9e572c5cdbc61c49b488e8aef9e646b).





[GitHub] spark issue #22318: [SPARK-25150][SQL] Fix attribute deduplication in join

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95632/
Test PASSed.





[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r214752480
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
   df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan
 }
   }
+
+  test("SPARK-25150: Attribute deduplication handles attributes in join condition properly") {
+val a = spark.range(1, 5)
+val b = spark.range(10)
+val c = b.filter($"id" % 2 === 0)
+
+val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === c("id"), "inner")
--- End diff --

Why isn't this the simpler `a.join(b, "id").join(c, "id")`?





[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22316#discussion_r214752855
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
 new RelationalGroupedDataset(
   df,
   groupingExprs,
-  RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+  RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
--- End diff --

What do you think about `map(lit).map(_.expr)` instead?





[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22314
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95635/
Test PASSed.





[GitHub] spark issue #22314: [SPARK-25307][SQL] ArraySort function may return an erro...

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22314
  
**[Test build #95635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95635/testReport)**
 for PR 22314 at commit 
[`d27256e`](https://github.com/apache/spark/commit/d27256ec70868f3fc66901abec97b4ccd75977ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22315: [SPARK-25308][SQL] ArrayContains function may return a e...

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95636/
Test FAILed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2809/
Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95642/testReport)**
 for PR 22240 at commit 
[`c61eec3`](https://github.com/apache/spark/commit/c61eec363f78d586070c673e44e9120eb10b83b5).





[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95641/
Test FAILed.





[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22179
  
**[Test build #95643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95643/testReport)**
 for PR 22179 at commit 
[`f2fb28d`](https://github.com/apache/spark/commit/f2fb28da3eb272651530b77dbd4ea33511f0727d).





[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22179
  
retest this please




