[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...

2018-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22661#discussion_r224676911
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
 
 import org.apache.spark.sql.execution.joins._
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 
 /**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
  */
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
 
-  ignore("broadcast hash join, long key") {
+  def broadcastHashJoinLongKey(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
 
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
-    runBenchmark("Join w long", N) {
-      val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+    val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+    codegenBenchmark("Join w long", N) {
+      val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
       assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
       df.count()
     }
-
-    /*
-    Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-    Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-    Join w long:                  Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
-    ---------------------------------------------------------------------------------------
-    Join w long codegen=false           3002 / 3262          7.0          143.2        1.0X
-    Join w long codegen=true             321 /  371         65.3           15.3        9.3X
-    */
   }
 
-  ignore("broadcast hash join, long key with duplicates") {
+  def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
-
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
-    runBenchmark("Join w long duplicated", N) {
-      val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
-      val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+    val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+    codegenBenchmark("Join w long duplicated", N) {
+      val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
       assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
       df.count()
     }
-
-    /*
-     *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-     *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-     *Join w long duplicated:       Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
-     *--------------------------------------------------------------------------------------
-     *Join w long duplicated codegen=false     3446 / 3478          6.1          164.3        1.0X
-     *Join w long duplicated codegen=true       322 /  351         65.2           15.3       10.7X
-     */
   }
 
-  ignore("broadcast hash join, two int key") {
+  def broadcastHashJoinTwoIntKey(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
-    val dim2 = broadcast(sparkSession.range(M)
+    val dim2 = broadcast(spark.range(M)
      .selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
 
-    runBenchmark("Join w 2 ints", N) {
-      val df = sparkSession.range(N).join(dim2,
+    codegenBenchmark("Join w 2 ints", N) {
+      val df = spark.range(N).join(dim2,
        (col("id") % M).cast(IntegerType) === col("k1")
          && (col("id") % M).cast(IntegerType) === col("k2"))
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
      df.count()
    }
-
-    /*
-     *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-     *Intel(R) Core(TM) 
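
For readers unfamiliar with the new harness, here is a rough, illustrative sketch of what a `codegenBenchmark`-style helper in `SqlBasedBenchmark` does conceptually: run the same body with whole-stage codegen off and on. The helper name and constructor arguments below are assumptions for illustration, not the exact Spark code.

```scala
// Illustrative sketch only (not the actual SqlBasedBenchmark source): run the
// same body twice, toggling whole-stage codegen through SQLConf.
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

def codegenBenchmarkSketch(spark: SparkSession, name: String, cardinality: Long)(body: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality)
  Seq(false, true).foreach { enabled =>
    benchmark.addCase(s"$name wholestage ${if (enabled) "on" else "off"}") { _ =>
      spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, enabled.toString)
      body
    }
  }
  benchmark.run()
}
```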

[GitHub] spark issue #22668: [SPARK-25675] [Spark Job History] Job UI page does not s...

2018-10-11 Thread shivusondur
Github user shivusondur commented on the issue:

https://github.com/apache/spark/pull/22668
  
@gengliangwang @felixcheung If everything is okay, could you please merge the PR?


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97289/
Test PASSed.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22696
  
**[Test build #97289 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97289/testReport)**
 for PR 22696 at commit 
[`b0dc140`](https://github.com/apache/spark/commit/b0dc140cd125498070143f67abf51204373fa14c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22646#discussion_r224671775
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -1115,9 +1126,38 @@ object SQLContext {
 })
 }
 }
-    def createConverter(cls: Class[_], dataType: DataType): Any => Any = dataType match {
-      case struct: StructType => createStructConverter(cls, struct.map(_.dataType))
-      case _ => CatalystTypeConverters.createToCatalystConverter(dataType)
+    def createConverter(t: Type, dataType: DataType): Any => Any = (t, dataType) match {
+      case (cls: Class[_], struct: StructType) =>
+        // bean type
+        createStructConverter(cls, struct.map(_.dataType))
+      case (arrayType: Class[_], array: ArrayType) if arrayType.isArray =>
+        // array type
+        val converter = createConverter(arrayType.getComponentType, array.elementType)
+        value => new GenericArrayData(
+          (0 until JavaArray.getLength(value)).map(i =>
+            converter(JavaArray.get(value, i))).toArray)
+      case (_, array: ArrayType) =>
+        // java.util.List type
+        val cls = classOf[java.util.List[_]]
--- End diff --

Seems like `JavaTypeInference.inferDataType()` supports 
`java.lang.Iterable`, not only `List`, but serializer/deserializer don't. 
Should we change `inferDataType()`?


---




[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-10-11 Thread boy-uber
Github user boy-uber commented on the issue:

https://github.com/apache/spark/pull/22429
  
> @boy-uber the thing you are suggesting is a pretty big undertaking and 
beyond the scope of this PR.
> 
> If you are going to add structured plans to the explain output, you 
probably also want some guarantees about stability over multiple spark versions 
and you probably also want to be able to reconstruct the plan. Neither is easy. 
If you want to have this in Spark, then I suggest sending a proposal to the dev 
list.

Yeah, that is a larger change and may need more discussion. Your points about adding structured plans like that are great! Let me send an email to the dev list then! Thanks for the suggestion :)




---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r224667371
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2993,6 +2990,7 @@ def test_current_database(self):
 AnalysisException,
 "does_not_exist",
 lambda: spark.catalog.setCurrentDatabase("does_not_exist"))
+spark.sql("DROP DATABASE some_db")
--- End diff --

We should surround with try-finally?


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-11 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r224666263
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -350,9 +350,6 @@ def test_sqlcontext_reuses_sparksession(self):
 def tearDown(self):
--- End diff --

Now we can remove this method?


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22703
  
**[Test build #97294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97294/testReport)**
 for PR 22703 at commit 
[`6e34ce7`](https://github.com/apache/spark/commit/6e34ce7ab7961531d97655e0733ed92f701fbbfd).


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3913/
Test PASSed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97288/
Test PASSed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22702
  
**[Test build #97288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)**
 for PR 22702 at commit 
[`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22701


---




[GitHub] spark issue #22677: [SPARK-25683][Core] Make AsyncEventQueue.lastReportTimes...

2018-10-11 Thread shivusondur
Github user shivusondur commented on the issue:

https://github.com/apache/spark/pull/22677
  
@jiangxb1987 
Thanks for your comment. I think printing "since Wed Dec 31 16:00:00 PST 1969" still looks strange.
Instead, we can print "**since the start of the application**" for the first dropped-event warning, which looks more appropriate.

So the first log message would look like this:
**18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events since the start of the application**

instead of:
**18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since Wed Dec 31 16:00:00 PST 1969.**

Please correct me if I am wrong.
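
A tiny sketch of the wording I am suggesting; this is purely illustrative (the helper and parameter names are mine, not the actual AsyncEventQueue code):

```scala
// Illustrative only: format the "dropped events" warning, using the suggested
// wording when no report has happened yet (lastReportTimestamp == 0).
def droppedEventsMessage(queueName: String, dropped: Long, lastReportTimestamp: Long): String = {
  val since =
    if (lastReportTimestamp == 0L) "since the start of the application"
    else s"since ${new java.util.Date(lastReportTimestamp)}"
  s"Dropped $dropped events from $queueName $since."
}

// droppedEventsMessage("eventLog", 1, 0L)
//   -> "Dropped 1 events from eventLog since the start of the application."
```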



---




[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

2018-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22701
  
LGTM

Thanks! Merged to master/2.4


---




[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-10-11 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/spark/pull/22575
  
What should we do if we want to join two Kafka streams and sink the result to another stream?


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97287/
Test PASSed.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...

2018-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22375


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22661
  
**[Test build #97287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97287/testReport)**
 for PR 22661 at commit 
[`3be13b1`](https://github.com/apache/spark/commit/3be13b16f1a59ffbd158265f54ad4f8d511d2018).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22375: [SPARK-25388][Test][SQL] Detect incorrect nullable of Da...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22375
  
thanks, merging to master!


---




[GitHub] spark issue #22706: [SPARK-25716][SQL][MINOR] remove unnecessary collection ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22706
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22706: [SPARK-25716][SQL][MINOR] remove unnecessary collection ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22706
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22706: [SPARK-25716][SQL][MINOR] remove unnecessary collection ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22706
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22698: [SPARK-25710][SQL] range should report metrics correctly

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22698
  
**[Test build #97293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97293/testReport)**
 for PR 22698 at commit 
[`4058a21`](https://github.com/apache/spark/commit/4058a21bcffbf73a3d01edd76fb67ead434fb91c).


---




[GitHub] spark pull request #22698: [SPARK-25710][SQL] range should report metrics co...

2018-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22698#discussion_r224659990
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -506,18 +513,18 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
   |   $numElementsTodo = 0;
   |   if ($nextBatchTodo == 0) break;
   | }
-  | $numOutput.add($nextBatchTodo);
-  | $inputMetrics.incRecordsRead($nextBatchTodo);
   | $batchEnd += $nextBatchTodo * ${step}L;
   |   }
   |
   |   int $localEnd = (int)(($batchEnd - $nextIndex) / ${step}L);
   |   for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
   | long $value = ((long)$localIdx * ${step}L) + $nextIndex;
   | ${consume(ctx, Seq(ev))}
-  | $shouldStop
+  | $stopCheck
   |   }
   |   $nextIndex = $batchEnd;
+  |   $numOutput.add($localEnd);
--- End diff --

If that's the case, then there is no problem. I was thinking that the number-of-output-rows metric of the range operator should be 100 if it is followed by a limit(100) operator.


---




[GitHub] spark issue #22698: [SPARK-25710][SQL] range should report metrics correctly

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22699
  
**[Test build #97292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97292/testReport)**
 for PR 22699 at commit 
[`5e05c60`](https://github.com/apache/spark/commit/5e05c604fdc9913a1424a569deb16ec3301bd4e4).


---




[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22704
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97286/
Test PASSed.


---




[GitHub] spark pull request #22706: [SPARK-25716][SQL][MINOR] remove unnecessary coll...

2018-10-11 Thread SongYadong
GitHub user SongYadong opened a pull request:

https://github.com/apache/spark/pull/22706

[SPARK-25716][SQL][MINOR] remove unnecessary collection operation in valid 
constraints generation

## What changes were proposed in this pull request?

The Project logical operator generates valid constraints using two opposite operations: it subtracts the child's constraints from all constraints, then unions the child's constraints back in. I think this may not be necessary.
The Aggregate operator has the same problem as Project.

This PR tries to remove these two opposite collection operations.
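
A small, self-contained illustration (not Spark code; the constraint strings are made up) of why subtracting the child's constraints and then unioning them back is a no-op when the child's constraints are already a subset of the full set:

```scala
// Illustrative only: if childConstraints is a subset of allConstraints,
// removing them and adding them back yields the original set.
val childConstraints = Set("a > 0", "b IS NOT NULL")
val allConstraints   = childConstraints ++ Set("a + b > 0")

val roundTrip = (allConstraints -- childConstraints) ++ childConstraints
assert(roundTrip == allConstraints)
```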

## How was this patch tested?

Related unit tests:
ProjectEstimationSuite
CollapseProjectSuite
PushProjectThroughUnionSuite
UnsafeProjectionBenchmark
GeneratedProjectionSuite
CodeGeneratorWithInterpretedFallbackSuite
TakeOrderedAndProjectSuite
GenerateUnsafeProjectionSuite
BucketedRandomProjectionLSHSuite
RemoveRedundantAliasAndProjectSuite
AggregateBenchmark
AggregateOptimizeSuite
AggregateEstimationSuite
DecimalAggregatesSuite
DateFrameAggregateSuite
ObjectHashAggregateSuite
TwoLevelAggregateHashMapSuite
ObjectHashAggregateExecBenchmark
SingleLevelAggregateHaspMapSuite
TypedImperativeAggregateSuite
RewriteDistinctAggregatesSuite
HashAggregationQuerySuite
HashAggregationQueryWithControlledFallbackSuite
TypedImperativeAggregateSuite
TwoLevelAggregateHashMapWithVectorizedMapSuite


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SongYadong/spark generate_constraints

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22706.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22706


commit fab5faaa838295affdb9a1bfeae1d613eddfb7a1
Author: SongYadong 
Date:   2018-10-11T14:12:05Z

remove unnecessary collection operation in valid constraints generation




---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-11 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22699
  
retest this please.


---




[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22702#discussion_r224658881
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
       case a And b if a.semanticEquals(b) => a
       case a Or b if a.semanticEquals(b) => a
 
-      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
-      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
-      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
-      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
-      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
-      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
-      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
-      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --

After more thought, `a And (b Or c)` should be better than `If(IsNull(a), null, And(a, c))`, as it's more likely to get pushed down to the data source, so the changes here LGTM.


---




[GitHub] spark issue #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune renewa...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22704
  
**[Test build #97286 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97286/testReport)**
 for PR 22704 at commit 
[`6e807e1`](https://github.com/apache/spark/commit/6e807e169cc9113c5fcd1653e610ec473c1ff8e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22698: [SPARK-25710][SQL] range should report metrics correctly

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3912/
Test PASSed.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3911/
Test PASSed.


---




[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22375#discussion_r224660195
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -69,11 +69,22 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa
 
   /**
    * Check the equality between result of expression and expected value, it will handle
-   * Array[Byte], Spread[Double], MapData and Row.
+   * Array[Byte], Spread[Double], MapData and Row. Also check whether nullable in expression is
+   * true if result is null
    */
-  protected def checkResult(result: Any, expected: Any, exprDataType: DataType): Boolean = {
+  protected def checkResult(result: Any, expected: Any, expression: Expression): Boolean = {
+    checkResult(result, expected, expression.dataType, expression.nullable)
+  }
+
+  protected def checkResult(
+      result: Any,
+      expected: Any,
+      exprDataType: DataType,
+      exprNullable: Boolean): Boolean = {
     val dataType = UserDefinedType.sqlType(exprDataType)
 
+    // The result is null for a non-nullable expression
+    assert(result != null || exprNullable, "exprNullable should be true if result is null")
--- End diff --

nit: how about "result cannot be null since it's not nullable."


---




[GitHub] spark pull request #22698: [SPARK-25710][SQL] range should report metrics co...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22698#discussion_r224659380
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -506,18 +513,18 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
   |   $numElementsTodo = 0;
   |   if ($nextBatchTodo == 0) break;
   | }
-  | $numOutput.add($nextBatchTodo);
-  | $inputMetrics.incRecordsRead($nextBatchTodo);
   | $batchEnd += $nextBatchTodo * ${step}L;
   |   }
   |
   |   int $localEnd = (int)(($batchEnd - $nextIndex) / ${step}L);
   |   for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
   | long $value = ((long)$localIdx * ${step}L) + $nextIndex;
   | ${consume(ctx, Seq(ev))}
-  | $shouldStop
+  | $stopCheck
   |   }
   |   $nextIndex = $batchEnd;
+  |   $numOutput.add($localEnd);
--- End diff --

More background: the stop check for limit is done at batch granularity, while the stop check for the result buffer is done at row granularity.

That said, even if the limit is smaller than the batch size, the range operator still outputs an entire batch, physically.
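
For anyone who wants to see this locally, a rough sketch of how one might inspect the range operator's output-rows metric after a limit (assumes a local SparkSession named `spark`; the exact value you observe depends on the batch granularity discussed above):

```scala
// Illustrative sketch: look at RangeExec's "number of output rows" metric.
import org.apache.spark.sql.execution.RangeExec

val df = spark.range(0, 100000).limit(100)
df.collect()
df.queryExecution.executedPlan.collectFirst { case r: RangeExec => r }
  .foreach(r => println(r.metrics("numOutputRows").value))
```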


---




[GitHub] spark pull request #22698: [SPARK-25710][SQL] range should report metrics co...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22698#discussion_r224659093
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -506,18 +513,18 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
   |   $numElementsTodo = 0;
   |   if ($nextBatchTodo == 0) break;
   | }
-  | $numOutput.add($nextBatchTodo);
-  | $inputMetrics.incRecordsRead($nextBatchTodo);
   | $batchEnd += $nextBatchTodo * ${step}L;
   |   }
   |
   |   int $localEnd = (int)(($batchEnd - $nextIndex) / ${step}L);
   |   for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
   | long $value = ((long)$localIdx * ${step}L) + $nextIndex;
   | ${consume(ctx, Seq(ev))}
-  | $shouldStop
+  | $stopCheck
   |   }
   |   $nextIndex = $batchEnd;
+  |   $numOutput.add($localEnd);
--- End diff --

That's expected, isn't it? The range operator does output 1000 rows; the limit operator takes 1000 inputs but only outputs about 100 rows.


---




[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224658264
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2150,8 +2150,10 @@ class Analyzer(
 
     // TODO: skip null handling for not-nullable primitive inputs after we can completely
 // trust the `nullable` information.
+val needsNullCheck = (nullable: Boolean, expr: Expression) =>
--- End diff --

Yes, that's because "nullableType" is flipped around here. "nullableType" should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in another PR.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97291/testReport)**
 for PR 22674 at commit 
[`6e3a345`](https://github.com/apache/spark/commit/6e3a345dd2cfc8071efdacf2a37677a588e00b6d).


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3910/
Test PASSed.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22702#discussion_r224655860
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
       case a And b if a.semanticEquals(b) => a
       case a Or b if a.semanticEquals(b) => a
 
-      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
-      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
-      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
-      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
-      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
-      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
-      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
-      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --

Since this is complicated, shall we put a comment to explain it?


---




[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22019
  
@viirya and @MaxGekk, are you busy? Do you mind if I ask you to take this over? We will completely disallow empty strings for other types and target it for 3.0.0. The changes wouldn't be too big, and it requires updating the migration guide.

I will be busy for a couple of weeks, so I would appreciate it if you could find some time to take this over.

Otherwise, I will start to work on this after a couple of weeks.


---




[GitHub] spark issue #20125: [SPARK-17967][SQL] Support for array as an option in SQL...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20125
  
I am sorry it's been inactive. Let me update this one within a week.


---




[GitHub] spark pull request #22702: [SPARK-25714] Fix Null Handling in the Optimizer ...

2018-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22702#discussion_r224655771
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -276,15 +276,15 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
       case a And b if a.semanticEquals(b) => a
       case a Or b if a.semanticEquals(b) => a
 
-      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
-      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
-      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
-      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
-
-      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
-      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
-      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
-      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
+      case a And (b Or c) if !a.nullable && Not(a).semanticEquals(b) => And(a, c)
--- End diff --

Assuming a is null, then b is also null.
If c is null: `a And (b Or c)` -> null, And(a, c) -> null
If c is true: `a And (b Or c)` -> null, And(a, c) -> null
If c is false: `a And (b Or c)` -> null, And(a, c) -> false

So yes, this is a bug, and we should rewrite it to `If(IsNull(a), a, And(a, c))`.
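
To double-check the table above, a small model of SQL's three-valued logic in plain Scala (None stands for NULL; purely illustrative, not Spark code):

```scala
// Model NULL-aware AND/OR/NOT with Option[Boolean] and replay the cases above.
def and(x: Option[Boolean], y: Option[Boolean]): Option[Boolean] = (x, y) match {
  case (Some(false), _) | (_, Some(false)) => Some(false)
  case (Some(true), Some(true))            => Some(true)
  case _                                   => None
}
def or(x: Option[Boolean], y: Option[Boolean]): Option[Boolean] = (x, y) match {
  case (Some(true), _) | (_, Some(true)) => Some(true)
  case (Some(false), Some(false))        => Some(false)
  case _                                 => None
}
def not(x: Option[Boolean]): Option[Boolean] = x.map(!_)

val a: Option[Boolean] = None        // a is NULL, so b = NOT(a) is NULL too
val b = not(a)
val c: Option[Boolean] = Some(false) // the problematic case
println(and(a, or(b, c)))            // None (NULL)
println(and(a, c))                   // Some(false) -- differs, hence the bug
```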


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20877
  
@MaxGekk, are you busy? Do you have some time to go for CSV's lineSep? I think I won't have time within the next couple of weeks. If you have some time, I would appreciate it if you could go ahead. Otherwise, I will try this one after a couple of weeks.

The problem with CSV's lineSep is multiline support. As you might already know, CSV's multiline mode is different from JSON's in that it parses line by line from the stream, whereas JSON treats the input as a whole record in general - so we should set the lineSep on the Univocity parser as well.

The problem is, `lineSep` in the Univocity parser has some limitations 
(https://github.com/apache/spark/pull/18581#issuecomment-314037750 and see also 
`https://github.com/uniVocity/univocity-parsers/issues/170`).

There are some changes made in https://github.com/apache/spark/pull/18581 . It might be possible to extract the CSV-related change and make some additions and deletions.

If it's difficult to support a `lineSep` longer than one character because of that limitation, I think we can restrict lineSep to a single character in `multiLine` mode.
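
A minimal sketch of the restriction I mean, with an illustrative options class (not Spark's actual CSVOptions):

```scala
// Illustrative only: reject a multi-character lineSep when multiLine is enabled.
final case class CsvReadOptions(lineSeparator: Option[String], multiLine: Boolean) {
  lineSeparator.foreach { sep =>
    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    require(!multiLine || sep.length == 1,
      "'lineSep' can contain only 1 character in multiLine mode.")
  }
}
```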


---




[GitHub] spark issue #22705: [SPARK-25704][CORE][WIP] Allocate a bit less than Int.Ma...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22705
  
**[Test build #97290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97290/testReport)**
 for PR 22705 at commit 
[`cb07bad`](https://github.com/apache/spark/commit/cb07badcd853da0e4083b7e02bdfdf86c9d295f1).


---




[GitHub] spark issue #22705: [SPARK-25704][CORE][WIP] Allocate a bit less than Int.Ma...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22705
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22705: [SPARK-25704][CORE][WIP] Allocate a bit less than Int.Ma...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3909/
Test PASSed.


---




[GitHub] spark pull request #22705: [SPARK-25704][CORE][WIP] Allocate a bit less than...

2018-10-11 Thread squito
GitHub user squito opened a pull request:

https://github.com/apache/spark/pull/22705

[SPARK-25704][CORE][WIP] Allocate a bit less than Int.MaxValue

JVMs don't let you allocate arrays of length exactly Int.MaxValue, so leave
a little extra room.  This is necessary when reading blocks >2GB off
the network (for remote reads or for cache replication).

WIP because I'm still running tests on a real cluster
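
For illustration, the usual shape of this kind of fix, sketched under my own names (an assumption, not the actual patch): keep a maximum array size a few elements below Int.MaxValue and check requested sizes against it.

```scala
// Illustrative sketch only: many JVMs reserve a few words for the array header,
// so asking for exactly Int.MaxValue elements can fail with OutOfMemoryError.
object ByteArrayLimits {
  val MaxArraySize: Int = Int.MaxValue - 8  // leave a little headroom

  def safeAllocate(requested: Long): Array[Byte] = {
    require(requested <= MaxArraySize,
      s"Cannot allocate $requested bytes; the practical JVM array limit is about $MaxArraySize")
    new Array[Byte](requested.toInt)
  }
}
```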

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark SPARK-25704

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22705


commit cb07badcd853da0e4083b7e02bdfdf86c9d295f1
Author: Imran Rashid 
Date:   2018-10-12T01:54:34Z

[SPARK-25704][CORE] Allocate a bit less than Int.MaxValue

JVMs don't let you allocate arrays of length exactly Int.MaxValue, so leave
a little extra room.  This is necessary when reading blocks >2GB off
the network (for remote reads or for cache replication).




---




[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22655
  
@viirya and @BryanCutler, do you guys have some time to go for the Pandas one? I think I won't have time within the next couple of weeks. If you guys have some time, I would appreciate it if you could go ahead. Otherwise, I will start this one after a couple of weeks.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Allow start-history-server.sh to sho...

2018-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/22699
  
Let's also update the title to include the deprecation changes.


---




[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22593
  
Also, let's mention that this PR targets fixing the javadoc in the PR description and/or title.


---




[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22593
  
Also, let's mention that this PR targets fixing the javadoc in the PR description, title and/or JIRA.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Allow start-history-server.sh to sho...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22699
  
**[Test build #4373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4373/testReport)**
 for PR 22699 at commit 
[`5e05c60`](https://github.com/apache/spark/commit/5e05c604fdc9913a1424a569deb16ec3301bd4e4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97283/
Test PASSed.


---




[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22701
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22701
  
**[Test build #97283 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97283/testReport)**
 for PR 22701 at commit 
[`dfa301e`](https://github.com/apache/spark/commit/dfa301ebdf289d6501a8c0edf44e35e76a043c7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22695: [MINOR][SQL]remove Redundant semicolons

2018-10-11 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/22695
  
@srowen, thanks!


---




[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22379
  
Looks pretty much getting close to go.


---




[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22379
  
Looks pretty much getting close to go.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224649633
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ---
@@ -19,8 +19,8 @@ package org.apache.spark.sql.execution.datasources.csv
 
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.Dataset
+import org.apache.spark.sql.catalyst.csv.CSVOptions
 import org.apache.spark.sql.functions._
-import org.apache.spark.sql.types._
 
 object CSVUtils {
--- End diff --

@MaxGekk, actually I was wondering if it's difficult to move this under 
catalyst package as well.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224649495
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3854,6 +3854,38 @@ object functions {
   @scala.annotation.varargs
   def map_concat(cols: Column*): Column = withExpr { MapConcat(cols.map(_.expr)) }
 
+  /**
+   * Parses a column containing a CSV string into a `StructType` with the specified schema.
+   * Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. accepts the same options and the
+   *                CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column = withExpr {
--- End diff --

I would like to suggest avoiding additional overloads for now ... it has one Java-specific version, so it should be usable from the Java side.
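
For reference, a usage sketch based only on the signature shown in the diff; it assumes a SparkSession named `spark` and a Spark build that already contains this PR (the function targets 3.0.0):

```scala
// Illustrative usage of the proposed from_csv(e, schema, options) overload.
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import spark.implicits._

val schema = new StructType().add("a", IntegerType).add("b", StringType)
val parsed = Seq("1,abc").toDF("csv")
  .select(from_csv($"csv", schema, Map.empty[String, String]).as("parsed"))
parsed.show()
```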


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22666
  
Let's add from_csv first.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224649188
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3854,6 +3854,38 @@ object functions {
   @scala.annotation.varargs
   def map_concat(cols: Column*): Column = withExpr { MapConcat(cols.map(_.expr)) }
 
+  /**
+   * Parses a column containing a CSV string into a `StructType` with the specified schema.
+   * Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. accepts the same options and the
+   *                CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column = withExpr {
+    CsvToStructs(schema, options, e.expr)
+  }
+
+  /**
+   * (Java-specific) Parses a column containing a CSV string into a `StructType`
+   * with the specified schema. Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. accepts the same options and the
+   *                CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
--- End diff --

@MaxGekk, can we replace `schema: String` to `schema: Column` for 
`schema_of_csv`?


---




[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-10-11 Thread stczwd
Github user stczwd commented on the issue:

https://github.com/apache/spark/pull/22575
  

@WangTaoTheTonic 
Adding the 'stream' keyword has two purposes:

- **Mark the entire SQL query as a stream query and generate the SQLStreaming plan tree.**
- **Mark the table type as UnResolvedStreamRelation.** The table is then parsed as a StreamingRelation or another relation, which matters especially for stream-join-batch queries such as Kafka join MySQL.

**Besides, the keyword 'stream' makes it easier to express Structured Streaming in pure SQL.**
A small example to show the importance of 'stream': read a stream from a Kafka stream table and join MySQL to count user messages.

  - with 'stream'
    - `select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
      - **This is treated as a streaming query using the Console sink**; kafka_sql_test is parsed as a StreamingRelation and mysql_test as a JDBCRelation, not a streaming relation.
    - `insert into csv_sql_table select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
      - **This is treated as a streaming query using the FileStream sink**; kafka_sql_test is parsed as a StreamingRelation and mysql_test as a JDBCRelation, not a streaming relation.

  - without 'stream'
    - `select kafka_sql.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name`
      - **This is treated as a batch query**; kafka_sql_test is parsed as a KafkaRelation and mysql_test as a JDBCRelation.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3908/
Test PASSed.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22696
  
**[Test build #97289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97289/testReport)**
 for PR 22696 at commit 
[`b0dc140`](https://github.com/apache/spark/commit/b0dc140cd125498070143f67abf51204373fa14c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224648638
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala
 ---
@@ -40,16 +40,6 @@ object CSVUtils {
 }
   }
 
-  /**
-   * Filter ignorable rows for CSV iterator (lines empty and starting with 
`comment`).
-   * This is currently being used in CSV reading path and CSV schema 
inference.
-   */
-  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): 
Iterator[String] = {
--- End diff --

Nope, it's under the execution package.


---




[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22676


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224648258
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
 ---
@@ -254,7 +256,7 @@ object TextInputCSVDataSource extends CSVDataSource {
 val header = makeSafeHeader(firstRow, caseSensitive, parsedOptions)
 val sampled: Dataset[String] = CSVUtils.sample(csv, parsedOptions)
 val tokenRDD = sampled.rdd.mapPartitions { iter =>
-  val filteredLines = CSVUtils.filterCommentAndEmpty(iter, 
parsedOptions)
+  val filteredLines = filterCommentAndEmpty(iter, parsedOptions)
--- End diff --

Not a big deal, but let's just use the `CSVUtils...` form for consistency 
in this file.


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22676
  
Merged to master.


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22676
  
Thank you @cloud-fan and @MaxGekk for reviewing this.


---




[GitHub] spark issue #22697: [SPARK-25700][SQL][BRANCH-2.4] Partially revert append m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97281/
Test PASSed.


---




[GitHub] spark issue #22697: [SPARK-25700][SQL][BRANCH-2.4] Partially revert append m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22697
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97277/
Test FAILed.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22697: [SPARK-25700][SQL][BRANCH-2.4] Partially revert append m...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22697
  
**[Test build #97281 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97281/testReport)**
 for PR 22697 at commit 
[`b836625`](https://github.com/apache/spark/commit/b836625c0d4404d1ca885d172cef5f820efc187c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97277 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97277/testReport)**
 for PR 22674 at commit 
[`0bfc240`](https://github.com/apache/spark/commit/0bfc2408a5941d7da8d93582668ba77a7394ac66).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21364: [SPARK-24317][SQL]Float-point numbers are displayed with...

2018-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21364
  
cc @srinathshankar @yuchenhuo 


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22702
  
**[Test build #97288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97288/testReport)**
 for PR 22702 at commit 
[`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3907/
Test PASSed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22702
  
retest this please


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97280/
Test PASSed.


---




[GitHub] spark issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global ...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22696
  
**[Test build #97280 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97280/testReport)**
 for PR 22696 at commit 
[`78a1689`](https://github.com/apache/spark/commit/78a1689ecd7854a11ba709853462897d5e0d1a28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r224639756
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 getAllPartitionsMethod.invoke(hive, 
table).asInstanceOf[JSet[Partition]]
   } else {
 logDebug(s"Hive metastore filter is '$filter'.")
-val tryDirectSqlConfVar = 
HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-// We should get this config value from the metaStore. otherwise 
hit SPARK-18681.
-// To be compatible with hive-0.12 and hive-0.13, In the future we 
can achieve this by:
-// val tryDirectSql = 
hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-val tryDirectSql = 
hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-  tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
 try {
   // Hive may throw an exception when calling this method in some 
circumstances, such as
-  // when filtering on a non-string partition column when the hive 
config key
-  // hive.metastore.try.direct.sql is false
+  // when filtering on a non-string partition column.
   getPartitionsByFilterMethod.invoke(hive, table, filter)
 .asInstanceOf[JArrayList[Partition]]
 } catch {
-  case ex: InvocationTargetException if 
ex.getCause.isInstanceOf[MetaException] &&
-  !tryDirectSql =>
+  case ex: InvocationTargetException if 
ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

@kmanamcheri: Let's do this:
- We should prefer `getPartitionsByFilterMethod()`. If it fails, we 
retry with an increasing delay across retries.
- If the retries are exhausted, we could fetch all the partitions of the 
table. Some people might not want this, so let's control it with a conf 
flag. For those who don't want it, the query could fail at this point.

What do you think? (A rough sketch of this flow follows below.)
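
A minimal, hypothetical sketch of that retry-then-fallback flow; the method 
name, parameters, and conf-flag plumbing are made up for illustration, and 
only the `InvocationTargetException`/`MetaException` pattern comes from the 
diff above:

```scala
import java.lang.reflect.InvocationTargetException

import scala.annotation.tailrec

import org.apache.hadoop.hive.metastore.api.MetaException

object PartitionFetchRetrySketch {

  def getPartitionsWithRetry[T](
      maxRetries: Int,
      initialDelayMs: Long,
      fallbackToAllPartitions: Boolean)(
      byFilter: () => T)(
      allPartitions: () => T): T = {

    @tailrec
    def loop(attempt: Int, delayMs: Long): T = {
      val result =
        try Right(byFilter())
        catch {
          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
            Left(ex)
        }
      result match {
        case Right(partitions) => partitions
        case Left(_) if attempt < maxRetries =>
          Thread.sleep(delayMs)
          loop(attempt + 1, delayMs * 2)               // increasing delay across retries
        case Left(ex) =>
          if (fallbackToAllPartitions) allPartitions() // conf-gated fallback to all partitions
          else throw ex                                // otherwise let the query fail here
      }
    }

    loop(attempt = 0, delayMs = initialDelayMs)
  }
}
```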


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97284/
Test FAILed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22702
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22702: [SPARK-25714] Fix Null Handling in the Optimizer rule Bo...

2018-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22702
  
**[Test build #97284 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97284/testReport)**
 for PR 22702 at commit 
[`a9359ab`](https://github.com/apache/spark/commit/a9359abff62017f46f33ef18d7f56f97c885af3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3906/
Test PASSed.


---



