[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22411
  
Using pattern matching runs into a problem: `InsertIntoHiveDirCommand`, `CreateHiveTableAsSelectCommand` and `InsertIntoHiveTable` all live in the spark-hive module, so SparkPlanInfo cannot reference them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96166/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22447
  
**[Test build #96166 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96166/testReport)**
 for PR 22447 at commit 
[`7193de3`](https://github.com/apache/spark/commit/7193de3ad8675229eef131214ed62f2ece5cd416).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22441#discussion_r218304237
  
--- Diff: dev/create-release/release-build.sh ---
@@ -111,13 +111,17 @@ fi
 # different versions of Scala are supported.
 BASE_PROFILES="-Pmesos -Pyarn"
 PUBLISH_SCALA_2_10=0
+PUBLISH_SCALA_2_12=0
 SCALA_2_10_PROFILES="-Pscala-2.10"
 SCALA_2_11_PROFILES=
 SCALA_2_12_PROFILES="-Pscala-2.12"
 
 if [[ $SPARK_VERSION > "2.3" ]]; then
   BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume"
   SCALA_2_11_PROFILES="-Pkafka-0-8"
+  if [[ $SPARK_VERSION > "2.4" ]]; then
--- End diff --

Then we would need to set BASE_PROFILES and SCALA_2_11_PROFILES again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22441
  
**[Test build #96168 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96168/testReport)**
 for PR 22441 at commit 
[`110913a`](https://github.com/apache/spark/commit/110913a6d4a10087ee31247d5357a0432ffbd8b7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22441
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22441#discussion_r218304060
  
--- Diff: dev/create-release/release-build.sh ---
@@ -183,8 +188,17 @@ if [[ "$1" == "package" ]]; then
   # Updated for each binary build
   make_binary_release() {
 NAME=$1
-FLAGS="$MVN_EXTRA_OPTS -B $SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES 
$2"
-BUILD_PACKAGE=$3
+SCALA_VERSION=$2
+SCALA_PROFILES=
+if [[ SCALA_VERSION == "2.10" ]]; then
--- End diff --

People may run the release script on the master branch to release Spark 2.1. I'd like to keep the handling of Scala 2.10 until we announce that we no longer support Spark versions prior to 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3181/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r218303899
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -99,10 +99,11 @@ private[spark] object Utils extends Logging {
* by setting the 'spark.debug.maxToStringFields' conf in SparkEnv.
*/
   val DEFAULT_MAX_TO_STRING_FIELDS = 25
+  val MAX_TO_STRING_FIELDS = "spark.debug.maxToStringFields"
--- End diff --

Can we move it to org.apache.spark.internal.config? I think our core module should follow the same approach we use in SQLConf.
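
A minimal sketch of what that move could look like, assuming core's existing `ConfigBuilder` DSL (the object name and doc text here are illustrative only; the entry would normally go into the org.apache.spark.internal.config package object):

```scala
package org.apache.spark.internal

import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical placement for illustration only; in core this entry would sit
// next to the other definitions in the org.apache.spark.internal.config package.
object DebugConf {
  val MAX_TO_STRING_FIELDS = ConfigBuilder("spark.debug.maxToStringFields")
    .doc("Maximum number of fields of a sequence-like entry that is converted " +
      "to a string in debug output.")
    .intConf
    .createWithDefault(25)
}
```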


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22441#discussion_r218303693
  
--- Diff: dev/create-release/release-build.sh ---
@@ -111,13 +111,17 @@ fi
 # different versions of Scala are supported.
 BASE_PROFILES="-Pmesos -Pyarn"
 PUBLISH_SCALA_2_10=0
+PUBLISH_SCALA_2_12=0
 SCALA_2_10_PROFILES="-Pscala-2.10"
 SCALA_2_11_PROFILES=
 SCALA_2_12_PROFILES="-Pscala-2.12"
--- End diff --

Here we may be missing the "-Pkafka-0-8" profile.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22411
  
Agreed. Since this field is important to us, could I refactor it following your advice and file a discussion in another JIRA?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22441#discussion_r218301766
  
--- Diff: dev/create-release/release-build.sh ---
@@ -111,13 +111,17 @@ fi
 # different versions of Scala are supported.
 BASE_PROFILES="-Pmesos -Pyarn"
 PUBLISH_SCALA_2_10=0
+PUBLISH_SCALA_2_12=0
 SCALA_2_10_PROFILES="-Pscala-2.10"
 SCALA_2_11_PROFILES=
 SCALA_2_12_PROFILES="-Pscala-2.12"
 
 if [[ $SPARK_VERSION > "2.3" ]]; then
   BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume"
   SCALA_2_11_PROFILES="-Pkafka-0-8"
+  if [[ $SPARK_VERSION > "2.4" ]]; then
--- End diff --

nit: I think we can move this branch out of the scope of `if [[ 
$SPARK_VERSION > "2.3" ]]; then`. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22441
  
**[Test build #96167 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96167/testReport)**
 for PR 22441 at commit 
[`b5237ec`](https://github.com/apache/spark/commit/b5237ecfed1831305cec482ea982c0050e0e3970).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96163/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3180/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22441
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22402
  
**[Test build #96163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96163/testReport)**
 for PR 22402 at commit 
[`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96165/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22402
  
**[Test build #96165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96165/testReport)**
 for PR 22402 at commit 
[`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96164/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17400
  
**[Test build #96164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96164/testReport)**
 for PR 17400 at commit 
[`5482b1b`](https://github.com/apache/spark/commit/5482b1be6308ddf7e77dc25c0bdfca3ede2d61a7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait AliasAwareOutputPartitioning extends UnaryExecNode `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22408#discussion_r218295350
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 ## Upgrading From Spark SQL 2.3 to 2.4
 
+  - In Spark version 2.3 and earlier, the second parameter to 
array_contains function is implicitly promoted to the element type of first 
array type parameter. This type promotion can be lossy and may cause 
`array_contains` function to return wrong result. This problem has been 
addressed in 2.4 by employing a safer type promotion mechanism. This can cause 
some change in behavior and are illustrated in the table below.
+  <table>
+    <tr>
+      <th>Query</th>
+      <th>Result Spark 2.3 or Prior</th>
+      <th>Result Spark 2.4</th>
+      <th>Remarks</th>
+    </tr>
+    <tr>
+      <td>SELECT array_contains(array(1), 1.34D);</td>
+      <td>true</td>
+      <td>false</td>
+      <td>In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively.</td>
+    </tr>
+    <tr>
+      <td>SELECT array_contains(array(1), 1.34);</td>
+      <td>true</td>
+      <td>AnalysisException is thrown since integer type can not be promoted to decimal type in a loss-less manner.</td>
--- End diff --

I left a few comments. Please send a PR, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22411
  
Since this is a new feature, we can't just merge it like #22353 without a proper design.

Making the event logs a structured, unified and reliable source for Spark metrics looks like a good idea. Let's write a design doc that explains what we already have in the event logs, what is missing, how to make it reliable, and what the issues are if we read it in real time. It's better to discuss it on the dev list and see whether other people have different ideas for getting Spark metrics.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22411
  
Most of the information we want can be extracted from the event log, except for some executor-side metrics that are not sent to the driver via heartbeats, e.g. the RPC count against the NameNode. Another case is #21221; before that we had to hack the code to get similar metrics. As a structured, unified, replayable record of the whole application, the event log makes offline and even real-time analysis possible. We prefer to use it because the history UI exposes less information than users expect, and it is hard to customize. We are working on this based on the event log. Thanks @cloud-fan. I suggest adding this interface to `DataWritingCommand`; pattern matching on each implementation looks tricky. The field looks common enough that it could also be used in physical plan optimization in the future.
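
A minimal sketch of the shape of that proposal (illustrative only; the trait and command names below are placeholders, not the actual Spark classes):

```scala
// Add an abstract output location to the write-command abstraction so that
// listeners (e.g. ones driven by the event log) can read the destination
// generically instead of pattern matching on every implementation.
trait HasOutputPath {
  /** Destination of the write, if the command targets a single location. */
  def outputPath: Option[String]
}

// A hypothetical command mixing it in:
case class ExampleInsertIntoDirCommand(path: String) extends HasOutputPath {
  override def outputPath: Option[String] = Some(path)
}
```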


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchm...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22443#discussion_r218293821
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -17,29 +17,28 @@
 
 package org.apache.spark.sql.execution.benchmark
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
 import scala.util.{Random, Try}
 
-import org.scalatest.{BeforeAndAfterEachTestData, Suite, TestData}
-
 import org.apache.spark.SparkConf
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.{DataFrame, SparkSession}
 import org.apache.spark.sql.functions.monotonically_increasing_id
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
 import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType, 
TimestampType}
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => 
FileBenchmarkBase, Utils}
 
 /**
  * Benchmark to measure read performance with Filter pushdown.
- * To run this:
- *  build/sbt "sql/test-only *FilterPushdownBenchmark"
- *
- * Results will be written to 
"benchmarks/FilterPushdownBenchmark-results.txt".
+ * To run this benchmark:
+ *  1. without sbt: bin/spark-submit --class  
+ *  2. build/sbt "sql/test:runMain "
+ *  3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
--- End diff --

IIRC there is a special `OutputStream` that can print the output to console.
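
For reference, a minimal sketch of that kind of stream (a plain java.io tee; this may or may not be the specific class being remembered here):

```scala
import java.io.OutputStream

// Forward every byte to both the console and the target stream, so benchmark
// output stays visible on the console even while a result file is generated.
class TeeToConsoleOutputStream(target: OutputStream) extends OutputStream {
  override def write(b: Int): Unit = { System.out.write(b); target.write(b) }
  override def flush(): Unit = { System.out.flush(); target.flush() }
  override def close(): Unit = target.close()
}
```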


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchm...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22443#discussion_r218293752
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
 ---
@@ -17,29 +17,28 @@
 
 package org.apache.spark.sql.execution.benchmark
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
 import scala.util.{Random, Try}
 
-import org.scalatest.{BeforeAndAfterEachTestData, Suite, TestData}
-
 import org.apache.spark.SparkConf
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.{DataFrame, SparkSession}
 import org.apache.spark.sql.functions.monotonically_increasing_id
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
 import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType, 
TimestampType}
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => 
FileBenchmarkBase, Utils}
 
 /**
  * Benchmark to measure read performance with Filter pushdown.
- * To run this:
- *  build/sbt "sql/test-only *FilterPushdownBenchmark"
- *
- * Results will be written to 
"benchmarks/FilterPushdownBenchmark-results.txt".
+ * To run this benchmark:
+ *  1. without sbt: bin/spark-submit --class  
+ *  2. build/sbt "sql/test:runMain "
+ *  3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
--- End diff --

shall we print the benchmark result if `SPARK_GENERATE_BENCHMARK_FILES` is 
not set?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21403#discussion_r218293687
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out
 ---
@@ -0,0 +1,70 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 7
+
+
+-- !query 0
+create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view struct_tab as select struct(col1 as a, col2 as b) as 
record from
+ values (1, 1), (1, 2), (2, 1), (2, 2)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
+-- !query 3 schema
+struct<1:int>
+-- !query 3 output
+
+
+
+-- !query 4
+select 1 from tab_a where (a1, b1) not in (select (a2, b2) from tab_b)
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22420: [SPARK-25429][SQL]Use Set improve SparkListenerBu...

2018-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22420#discussion_r218293704
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
 ---
@@ -83,7 +83,7 @@ class SQLAppStatusListener(
 // track of the metrics knows which accumulators to look at.
 val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
 event.stageIds.foreach { id =>
-  stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, 
new ConcurrentHashMap()))
+  stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toSet, new 
ConcurrentHashMap()))
--- End diff --

cc @gengliangwang 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21403#discussion_r218293670
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out
 ---
@@ -0,0 +1,70 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 7
+
+
+-- !query 0
+create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view struct_tab as select struct(col1 as a, col2 as b) as 
record from
+ values (1, 1), (1, 2), (2, 1), (2, 2)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
--- End diff --

what's the result of this query without this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22444: [SPARK-25409][Core]Speed up Spark History loading...

2018-09-17 Thread jianjianjiao
Github user jianjianjiao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22444#discussion_r218292773
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -465,20 +475,31 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
   } catch {
 case _: NoSuchElementException =>
-  // If the file is currently not being tracked by the SHS, 
add an entry for it and try
-  // to parse it. This will allow the cleaner code to detect 
the file as stale later on
-  // if it was not possible to parse it.
-  listing.write(LogInfo(entry.getPath().toString(), 
newLastScanTime, None, None,
-entry.getLen()))
--- End diff --

Hi @squito, thanks for looking into this PR.

When the Spark History Server starts, it scans the event-log folder and handles the entries with multiple threads, and it will not start the next scan before the first one finishes. That is the problem: in our cluster there are about 20K event-log files (often bigger than 1G), including roughly 1K .inprogress files, and the first scan takes about two and a half hours. During those 2.5 hours, if a user submits a Spark application and it finishes, the user cannot find it in the Spark History UI and has to wait for the next scan.

That is why I added a limit on how many files to scan each time, e.g. 3K. No matter how many log files are in the event-log folder, the first scan handles only the first 3K before the second scan starts. Suppose that during the first scan 5 applications are scanned and another 10 applications are updated; then the second scan handles these 15 applications plus another 2885 files (from 3001 to 5885) in the event folder.

checkForLogs scans the event-log folder and only handles files that have been updated or not yet handled.
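
A minimal sketch of the idea described above (the method name, the ordering policy and the 3K default are illustrative only, not the actual patch):

```scala
import org.apache.hadoop.fs.FileStatus

// Cap how many updated event logs a single checkForLogs() pass will process;
// anything beyond the cap is left for the next scan.
def selectLogsForThisScan(
    updatedLogs: Seq[FileStatus],
    maxFilesPerScan: Int = 3000): Seq[FileStatus] = {
  updatedLogs
    .sortBy(_.getModificationTime)(Ordering[Long].reverse) // newest first, one possible policy
    .take(maxFilesPerScan)
}
```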



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-09-17 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22138
  
The vote for Spark 2.4 is now in progress. Unless we need to stay on standby for blocker issues in the Spark 2.4 RC, I'd be really happy if someone could revisit this and continue reviewing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...

2018-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22439


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...

2018-09-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22439
  
Thanks! merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...

2018-09-17 Thread suryag10
Github user suryag10 commented on the issue:

https://github.com/apache/spark/pull/22433
  
> Agreed with @mridulm that the naming restriction is specific to k8s and 
should be handled in a k8s specific way, e.g., somewhere around 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L208.

OK, I will update the PR accordingly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22447
  
**[Test build #96166 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96166/testReport)**
 for PR 22447 at commit 
[`7193de3`](https://github.com/apache/spark/commit/7193de3ad8675229eef131214ed62f2ece5cd416).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3179/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule u...

2018-09-17 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/22447

[SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for 
project expressions in each Union child, causing mistakes in constant 
propagation

## What changes were proposed in this pull request?

The problem was caused by the PushProjectThroughUnion rule, which, when creating a new Project for each child of the Union, reuses the same exprId for expressions in the same position. This is wrong because the expressions are independent for each child of the Union, and it can lead to a wrong result if other rules like FoldablePropagation kick in and treat two different expressions as the same one.

The fix is to create new expressions in the new Project for each child of the Union.
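
A sketch of the idea (illustrative only, not the exact code of this patch): when the projection is copied onto another Union child, each named expression is re-created so that it gets a fresh exprId.

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, NamedExpression}

// Re-create each named expression so the copy gets its own ExprId instead of
// sharing the ExprId produced for the first Union child.
def withFreshExprIds(projectList: Seq[NamedExpression]): Seq[NamedExpression] =
  projectList.map {
    case a: Alias        => Alias(a.child, a.name)()   // a new ExprId is allocated here
    case attr: Attribute => attr                       // attributes keep their own ids
    case other           => Alias(other, other.name)()
  }
```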

## How was this patch tested?

Added UT.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark push-project-thru-union-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22447


commit 7193de3ad8675229eef131214ed62f2ece5cd416
Author: maryannxue 
Date:   2018-09-18T02:56:07Z

fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96159/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22381
  
**[Test build #96159 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96159/testReport)**
 for PR 22381 at commit 
[`3a2db16`](https://github.com/apache/spark/commit/3a2db16813a2bab3160f444fe0855855187a6178).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22446
  
Yes. I can't find any more references to the old JDK docs either.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22395
  
Looks like a use case for a legacy config.

On Mon, Sep 17, 2018 at 6:41 PM Wenchen Fan wrote:

> To clarify, it's not following hive, but following the behavior of
> previous Spark versions, which is same as hive.
>
> I also think returning left operand's type is more reasonable, but we
> should do it in another PR since it's a behavior change, and we should also
> add migration guide for it.
>
> @mgaido91  do you have time to do this
> change? Thanks!
>
-- 
--
excuse the brevity and lower case due to wrist injury
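
For reference, a legacy flag along the lines suggested above would be defined like the other SQLConf entries (a sketch only, written as it would appear inside SQLConf where buildConf is in scope; the key name and doc text are invented here):

```scala
// Hypothetical sketch only: a flag to preserve the old IntegralDivide result type.
val LEGACY_INTEGRAL_DIVIDE_RETURNS_LONG =
  buildConf("spark.sql.legacy.integralDivide.returnLong")
    .doc("When true, the div operator returns a long as in previous Spark versions; " +
      "when false, it returns the type of the left operand.")
    .booleanConf
    .createWithDefault(true)
```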



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22446
  
Some references point to the Java 7 docs, and some to Java 6.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96155/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #96155 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96155/testReport)**
 for PR 22305 at commit 
[`278abbf`](https://github.com/apache/spark/commit/278abbf5a8d9a7c3e3c660faf7a73d4ef1c7532a).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22418#discussion_r218282733
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
   .createOrReplaceTempView("orc_temp_table")
   }
 
+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+val tableName = "bloomFilter"
+
+withTempDir { dir =>
+  withTable(tableName) {
+val sqlStatement = orcImp match {
+  case "native" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |USING ORC
+   |OPTIONS (
+   |  path '${dir.toURI}',
+   |  orc.bloom.filter.columns '*',
+   |  orc.bloom.filter.fpp 0.1
+   |)
+""".stripMargin
+  case "hive" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |STORED AS ORC
+   |LOCATION '${dir.toURI}'
+   |TBLPROPERTIES (
+   |  orc.bloom.filter.columns='*',
+   |  orc.bloom.filter.fpp=0.1
+   |)
+""".stripMargin
+  case impl =>
+throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
+}
+
+sql(sqlStatement)
+sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+val partFiles = dir.listFiles()
+  .filter(f => f.isFile && !f.getName.startsWith(".") && 
!f.getName.startsWith("_"))
+assert(partFiles.length === 1)
+
+val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+val readerOptions = OrcFile.readerOptions(new Configuration())
+val reader = OrcFile.createReader(orcFilePath, readerOptions)
+var recordReader: RecordReaderImpl = null
+try {
+  recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+  // BloomFilter array is created for all types; `struct`, int 
(`a`), string (`b`)
+  val sargColumns = Array(true, true, true)
+  val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+  // Check the types and counts of bloom filters
+  assert(orcIndex.getBloomFilterKinds.forall(_ === 
bloomFilterKind))
--- End diff --

Thank you for the advice, @HyukjinKwon.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22402
  
**[Test build #96165 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96165/testReport)**
 for PR 22402 at commit 
[`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3178/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22418#discussion_r218281814
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
   .createOrReplaceTempView("orc_temp_table")
   }
 
+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+val tableName = "bloomFilter"
+
+withTempDir { dir =>
+  withTable(tableName) {
+val sqlStatement = orcImp match {
+  case "native" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |USING ORC
+   |OPTIONS (
+   |  path '${dir.toURI}',
+   |  orc.bloom.filter.columns '*',
+   |  orc.bloom.filter.fpp 0.1
+   |)
+""".stripMargin
+  case "hive" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |STORED AS ORC
+   |LOCATION '${dir.toURI}'
+   |TBLPROPERTIES (
+   |  orc.bloom.filter.columns='*',
+   |  orc.bloom.filter.fpp=0.1
+   |)
+""".stripMargin
+  case impl =>
+throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
+}
+
+sql(sqlStatement)
+sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+val partFiles = dir.listFiles()
+  .filter(f => f.isFile && !f.getName.startsWith(".") && 
!f.getName.startsWith("_"))
+assert(partFiles.length === 1)
+
+val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+val readerOptions = OrcFile.readerOptions(new Configuration())
+val reader = OrcFile.createReader(orcFilePath, readerOptions)
+var recordReader: RecordReaderImpl = null
+try {
+  recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+  // BloomFilter array is created for all types; `struct`, int 
(`a`), string (`b`)
+  val sargColumns = Array(true, true, true)
+  val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+  // Check the types and counts of bloom filters
+  assert(orcIndex.getBloomFilterKinds.forall(_ === 
bloomFilterKind))
--- End diff --

FWIW, there might be one more thing to consider in Parquet too: block-level filtering and record-level filtering.

Also, whether pushed filters are actually handled properly within each source (file format specifically).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22438: [SPARK-25443][BUILD] fix issues when building doc...

2018-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22438


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22418#discussion_r218281581
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
   .createOrReplaceTempView("orc_temp_table")
   }
 
+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+val tableName = "bloomFilter"
+
+withTempDir { dir =>
+  withTable(tableName) {
+val sqlStatement = orcImp match {
+  case "native" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |USING ORC
+   |OPTIONS (
+   |  path '${dir.toURI}',
+   |  orc.bloom.filter.columns '*',
+   |  orc.bloom.filter.fpp 0.1
+   |)
+""".stripMargin
+  case "hive" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |STORED AS ORC
+   |LOCATION '${dir.toURI}'
+   |TBLPROPERTIES (
+   |  orc.bloom.filter.columns='*',
+   |  orc.bloom.filter.fpp=0.1
+   |)
+""".stripMargin
+  case impl =>
+throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
+}
+
+sql(sqlStatement)
+sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+val partFiles = dir.listFiles()
+  .filter(f => f.isFile && !f.getName.startsWith(".") && 
!f.getName.startsWith("_"))
+assert(partFiles.length === 1)
+
+val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+val readerOptions = OrcFile.readerOptions(new Configuration())
+val reader = OrcFile.createReader(orcFilePath, readerOptions)
+var recordReader: RecordReaderImpl = null
+try {
+  recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+  // BloomFilter array is created for all types; `struct`, int 
(`a`), string (`b`)
+  val sargColumns = Array(true, true, true)
+  val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+  // Check the types and counts of bloom filters
+  assert(orcIndex.getBloomFilterKinds.forall(_ === 
bloomFilterKind))
--- End diff --

Got it. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17400
  
**[Test build #96164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96164/testReport)**
 for PR 17400 at commit 
[`5482b1b`](https://github.com/apache/spark/commit/5482b1be6308ddf7e77dc25c0bdfca3ede2d61a7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3177/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22402
  
**[Test build #96163 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96163/testReport)**
 for PR 22402 at commit 
[`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22402
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22402
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96160/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22402
  
**[Test build #96160 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96160/testReport)**
 for PR 22402 at commit 
[`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22446
  
Are these all the references to the Java 7 docs?
Also, does the advice around GC need a bit of updating, since we now require Java 8?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22438: [SPARK-25443][BUILD] fix issues when building docs with ...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22438
  
thanks, merging to master/2.4!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17400
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96161/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17400
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17400
  
**[Test build #96161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96161/testReport)**
 for PR 17400 at commit 
[`5482b1b`](https://github.com/apache/spark/commit/5482b1be6308ddf7e77dc25c0bdfca3ede2d61a7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait AliasAwareOutputPartitioning extends UnaryExecNode `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

2018-09-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22414
  
Please do not change the PR description format: use `What changes were proposed in this pull request?` rather than `Problem`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

2018-09-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22414#discussion_r218280823
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
 ---
@@ -122,6 +123,51 @@ class TimeWindowSuite extends SparkFunSuite with 
ExpressionEvalHelper with Priva
 }
   }
 
+  test("windowDuration and slideDuration should be positive.") {
+val fractions = Table(
+  ("windowDuration", "slideDuration"), // First tuple defines column 
names
+  ("-2 seconds", "1 seconds"),
+  ("1 seconds", "-2 seconds"),
+  ("0 seconds", "1 seconds"),
+  ("1 seconds", "0 seconds"),
+  ("-2 seconds", "-2 seconds"),
+  ("-2 seconds", "-2 hours"),
+  ("0 seconds", "0 seconds"),
+  (-2L, 2L),
+  (2L, -2L),
+  (-2, 2),
+  (2, -2)
+)
+forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
+  logInfo(s"windowDuration = $windowDuration slideDuration = 
$slideDuration")
--- End diff --

What does this log mean?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

2018-09-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22414#discussion_r218280792
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
 ---
@@ -35,6 +35,10 @@ case class TimeWindow(
   with ImplicitCastInputTypes
   with Unevaluable
   with NonSQLExpression {
+  require(windowDuration > 0, "The window duration must be " +
+s"a positive integer, long or string literal, found: $windowDuration")
+  require(slideDuration > 0, "The slide duration must be " +
+s"a positive integer, long or string literal, found: $slideDuration")
--- End diff --

Either way, we'd better avoid duplicate error checks. Can we make this simpler, e.g. by making a helper function to check these requirements?
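
A minimal sketch of such a helper (illustrative only):

```scala
// One code path to validate both durations instead of two near-identical requires.
def requirePositive(name: String, duration: Long): Unit =
  require(duration > 0,
    s"The $name duration must be a positive integer, long or string literal, found: $duration")

// Inside TimeWindow this would replace the two separate checks:
//   requirePositive("window", windowDuration)
//   requirePositive("slide", slideDuration)
```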


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22446
  
cc @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22444: [SPARK-25409][Core]Speed up Spark History loading...

2018-09-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22444#discussion_r218279175
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -465,20 +475,31 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 }
   } catch {
 case _: NoSuchElementException =>
-  // If the file is currently not being tracked by the SHS, 
add an entry for it and try
-  // to parse it. This will allow the cleaner code to detect 
the file as stale later on
-  // if it was not possible to parse it.
-  listing.write(LogInfo(entry.getPath().toString(), 
newLastScanTime, None, None,
-entry.getLen()))
--- End diff --

If you don't do this here for all entries, I think the cleanup around line 
522 isn't going to work.
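
To make the concern concrete, a generic, hypothetical sketch (not FsHistoryProvider 
code): a cleanup pass that iterates over the listing can only see files that were 
written to it at scan time.

```scala
// Generic, hypothetical sketch of why every scanned file needs a listing entry:
// the cleanup pass only sees what is in the listing.
object ListingCleanupSketch {
  final case class LogInfo(path: String, lastScanTime: Long, appId: Option[String])

  def main(args: Array[String]): Unit = {
    val listing = scala.collection.mutable.Map.empty[String, LogInfo]
    val scanTime = 1000L

    // Write an entry for every discovered file, even ones that could not be parsed.
    listing("app-1") = LogInfo("app-1", scanTime, Some("app-1"))
    listing("broken-log") = LogInfo("broken-log", scanTime, appId = None)

    // A later cleanup pass can mark the unparsable file as stale only because
    // it was recorded in the listing when it was scanned.
    val stale = listing.values.filter(_.appId.isEmpty)
    stale.foreach(info => println(s"stale log eligible for cleanup: ${info.path}"))
  }
}
```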


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22446
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96162/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22446
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22446
  
**[Test build #96162 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96162/testReport)**
 for PR 22446 at commit 
[`f8924bb`](https://github.com/apache/spark/commit/f8924bb0ce876beb35309ea51f1c1c42497d26e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96158/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #96158 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96158/testReport)**
 for PR 22305 at commit 
[`88aa5c3`](https://github.com/apache/spark/commit/88aa5c31b668b9b476ccb963f35375fcf6d41462).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22395
  
To clarify, it's not following Hive; it's following the behavior of previous 
Spark versions, which happens to match Hive.

I also think returning the left operand's type is more reasonable, but we 
should do that in another PR since it's a behavior change, and we should also 
add a migration guide entry for it.

@mgaido91 do you have time to do this change? Thanks!
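
As a rough illustration of the type question (a sketch that assumes a local 
SparkSession and that the `div` operator discussed here is available; the expected 
output is an assumption, not a verified result): with the current Hive-compatible 
rule the result column is LongType regardless of the operand types, whereas the 
proposed change would preserve the left operand's type.

```scala
// Hedged sketch: inspect the result type of integral division.
// Assumes a local SparkSession and the `div` operator discussed in this PR.
import org.apache.spark.sql.SparkSession

object IntegralDivideTypeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("integral-divide-type-demo")
      .getOrCreate()

    // Both literals are IntegerType; under the current rule the schema of `q` is
    // expected to be LongType, while the proposed change would keep IntegerType.
    spark.sql("SELECT 7 div 2 AS q").printSchema()

    spark.stop()
  }
}
```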


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22446
  
**[Test build #96162 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96162/testReport)**
 for PR 22446 at commit 
[`f8924bb`](https://github.com/apache/spark/commit/f8924bb0ce876beb35309ea51f1c1c42497d26e0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22355
  
cc: @gatorsmile @rednaxelafx Could you check again and merge?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22446
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JD...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22446
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3176/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17400: [SPARK-19981][SQL] Respect aliases in output part...

2018-09-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17400#discussion_r218277145
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioning.scala
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, Expression, 
NamedExpression}
+import org.apache.spark.sql.catalyst.plans.physical._
+
+trait AliasAwareOutputPartitioning extends UnaryExecNode {
--- End diff --

OK, I'll wait for @maryannxue's suggestion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22446: [SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to...

2018-09-17 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22446

[SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JDK8

## What changes were proposed in this pull request?

Update `tuning.md` and `building-spark.md` to use JDK8.

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark java8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22446


commit f8924bb0ce876beb35309ea51f1c1c42497d26e0
Author: Yuming Wang 
Date:   2018-09-18T01:14:04Z

To java 8




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96156/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #96156 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96156/testReport)**
 for PR 22305 at commit 
[`d947317`](https://github.com/apache/spark/commit/d947317e150eb426b0b6da0c116489735975a8b8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22418#discussion_r218272427
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
   .createOrReplaceTempView("orc_temp_table")
   }
 
+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+val tableName = "bloomFilter"
+
+withTempDir { dir =>
+  withTable(tableName) {
+val sqlStatement = orcImp match {
+  case "native" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |USING ORC
+   |OPTIONS (
+   |  path '${dir.toURI}',
+   |  orc.bloom.filter.columns '*',
+   |  orc.bloom.filter.fpp 0.1
+   |)
+""".stripMargin
+  case "hive" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |STORED AS ORC
+   |LOCATION '${dir.toURI}'
+   |TBLPROPERTIES (
+   |  orc.bloom.filter.columns='*',
+   |  orc.bloom.filter.fpp=0.1
+   |)
+""".stripMargin
+  case impl =>
+throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
+}
+
+sql(sqlStatement)
+sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+val partFiles = dir.listFiles()
+  .filter(f => f.isFile && !f.getName.startsWith(".") && 
!f.getName.startsWith("_"))
+assert(partFiles.length === 1)
+
+val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+val readerOptions = OrcFile.readerOptions(new Configuration())
+val reader = OrcFile.createReader(orcFilePath, readerOptions)
+var recordReader: RecordReaderImpl = null
+try {
+  recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+  // BloomFilter array is created for all types; `struct`, int 
(`a`), string (`b`)
+  val sargColumns = Array(true, true, true)
+  val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+  // Check the types and counts of bloom filters
+  assert(orcIndex.getBloomFilterKinds.forall(_ === 
bloomFilterKind))
--- End diff --

Something like

```
== Physical Plan ==
*(1) Project [_1#3]
+- *(1) Filter (isnotnull(_1#3) && (_1#3._1 = true))
   +- *(1) FileScan parquet [_1#3] Batched: false, Format: Orc, 
  PushedFilters: [IsNotNull(_1), EqualTo(_1._1,true)]
  BloomFilters: [some information]
```

Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22418#discussion_r218271729
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
   .createOrReplaceTempView("orc_temp_table")
   }
 
+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+val tableName = "bloomFilter"
+
+withTempDir { dir =>
+  withTable(tableName) {
+val sqlStatement = orcImp match {
+  case "native" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |USING ORC
+   |OPTIONS (
+   |  path '${dir.toURI}',
+   |  orc.bloom.filter.columns '*',
+   |  orc.bloom.filter.fpp 0.1
+   |)
+""".stripMargin
+  case "hive" =>
+s"""
+   |CREATE TABLE $tableName (a INT, b STRING)
+   |STORED AS ORC
+   |LOCATION '${dir.toURI}'
+   |TBLPROPERTIES (
+   |  orc.bloom.filter.columns='*',
+   |  orc.bloom.filter.fpp=0.1
+   |)
+""".stripMargin
+  case impl =>
+throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
+}
+
+sql(sqlStatement)
+sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+val partFiles = dir.listFiles()
+  .filter(f => f.isFile && !f.getName.startsWith(".") && 
!f.getName.startsWith("_"))
+assert(partFiles.length === 1)
+
+val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+val readerOptions = OrcFile.readerOptions(new Configuration())
+val reader = OrcFile.createReader(orcFilePath, readerOptions)
+var recordReader: RecordReaderImpl = null
+try {
+  recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+  // BloomFilter array is created for all types; `struct`, int 
(`a`), string (`b`)
+  val sargColumns = Array(true, true, true)
+  val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+  // Check the types and counts of bloom filters
+  assert(orcIndex.getBloomFilterKinds.forall(_ === 
bloomFilterKind))
--- End diff --

Thank you for the review, @dbtsai. Definitely, it would be great if we could show 
some information.
BTW, could you elaborate a bit more on what you mean by `in the physical plan 
like predicate pushdown in parquet`, specifically?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22445: Branch 2.3 udf nullability

2018-09-17 Thread ptkool
Github user ptkool closed the pull request at:

https://github.com/apache/spark/pull/22445


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22445: Branch 2.3 udf nullability

2018-09-17 Thread ptkool
GitHub user ptkool opened a pull request:

https://github.com/apache/spark/pull/22445

Branch 2.3 udf nullability

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shopify/spark branch-2.3-udf_nullability

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22445


commit 4ff52d14df0f40d6114398f0355c074c7f8b8530
Author: ptkool 
Date:   2018-08-14T11:18:01Z

Branch 2.3 udf nullability




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22429
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96154/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22429
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22429
  
**[Test build #96154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96154/testReport)**
 for PR 22429 at commit 
[`71ff7d1`](https://github.com/apache/spark/commit/71ff7d1387fbe7d30299fe38471bce26fe73dad5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


