[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

2018-05-06 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r186329528
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
 ---
@@ -0,0 +1,34 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ARG base_img
+FROM $base_img
+WORKDIR /
+RUN mkdir ${SPARK_HOME}/python
+COPY python/lib ${SPARK_HOME}/python/lib
+RUN apk add --no-cache python && \
+apk add --no-cache python3 && \
+python -m ensurepip && \
+python3 -m ensurepip && \
+rm -r /usr/lib/python*/ensurepip && \
+pip install --upgrade pip setuptools && \
--- End diff --

Correct. I would love recommendations on dependency management with regard to `pip`, as it's tricky to allow for both pip and pip3 installation, unless I use two separate virtual environments for dependency management.


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90288/
Test PASSed.


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21231
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2979/
Test PASSed.


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21231
  
**[Test build #90288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90288/testReport)**
 for PR 21231 at commit 
[`dc9a070`](https://github.com/apache/spark/commit/dc9a0702584c9fd16a1d25c675aaf2b424cd7623).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21252
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/21252
  
> Instead of touching inside of TakeOrderedAndProjectExec, how about we 
don't replace Sort + Limit with TakeOrderedAndProjectExec when reaching the 
threshold?

Yes, the code will be much cleaner. I updated the change.
Note that when the limit is above the threshold, all data will still be sorted, and all data will end up in one partition after the limit operator.


---




[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21252
  
**[Test build #90296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90296/testReport)**
 for PR 21252 at commit 
[`0cbacc3`](https://github.com/apache/spark/commit/0cbacc36fcb2cf965f0697effe1bb6f02db5fe8f).


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21198


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21198
  
Merged to master.


---




[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

2018-05-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r186326478
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
 ---
@@ -0,0 +1,34 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ARG base_img
+FROM $base_img
+WORKDIR /
+RUN mkdir ${SPARK_HOME}/python
+COPY python/lib ${SPARK_HOME}/python/lib
+RUN apk add --no-cache python && \
+apk add --no-cache python3 && \
+python -m ensurepip && \
+python3 -m ensurepip && \
+rm -r /usr/lib/python*/ensurepip && \
+pip install --upgrade pip setuptools && \
--- End diff --

this goes to python2 only, I think?


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21198
  
yes, for R, during package check it can only write to the R tempdir() 
(which can be changed in a number of ways)



---




[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21070#discussion_r186325093
  
--- Diff: dev/deps/spark-deps-hadoop-2.7 ---
@@ -163,13 +163,13 @@ orc-mapreduce-1.4.3-nohive.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.8.2.jar
-parquet-common-1.8.2.jar
-parquet-encoding-1.8.2.jar
-parquet-format-2.3.1.jar
-parquet-hadoop-1.8.2.jar
+parquet-column-1.10.0.jar
+parquet-common-1.10.0.jar
+parquet-encoding-1.10.0.jar
+parquet-format-2.4.0.jar
+parquet-hadoop-1.10.0.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.8.2.jar
+parquet-jackson-1.10.0.jar
--- End diff --

(btw, don't forget to fix 
https://github.com/apache/spark/blob/ce7ba2e98e0a3b038e881c271b5905058c43155b/dev/deps/spark-deps-hadoop-3.1#L184-L190
 too)


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2978/
Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2895/



---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2977/
Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2976/
Test PASSed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90295/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21240
  
retest this please.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90287/
Test FAILed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90287/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2895/



---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2975/
Test PASSed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21252
  
Instead of touching the inside of `TakeOrderedAndProjectExec`, how about we don't replace `Sort` + `Limit` with `TakeOrderedAndProjectExec` when the limit reaches the threshold? I.e.:

```scala
object SpecialLimits extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ReturnAnswer(rootPlan) => rootPlan match {
      case Limit(IntegerLiteral(limit), Sort(order, true, child))
          if limit < conf.sortInMemForLimitThreshold =>
        TakeOrderedAndProjectExec(limit, order, child.output, planLater(child)) :: Nil
      ...
    }
```



---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2894/



---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #90294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90294/testReport)**
 for PR 16478 at commit 
[`ae00de1`](https://github.com/apache/spark/commit/ae00de13dd779a2a09b142c54a2fcc144d7f8c23).


---




[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

2018-05-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21253
  
@brkyvz Can you take a look?


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2894/



---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21070
  
> sql("select * from parquetTable where value = '0'")
> sql("select * from parquetTable where value = 0")

Semantically, they are different. The results should also be different. The plan shows whether these predicates are pushed down or not.

Without the upgrade to 1.10.0, we can't get the benefit of predicate pushdown for queries like the two above. I also double-checked it in my local environment. Now it sounds like a good reason to upgrade the version, since string predicates are pretty common.
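
For readers who want to reproduce this, a quick way to see the difference is to compare the physical plans. This is only an illustrative spark-shell snippet using the table and column from the quoted queries, and the exact `PushedFilters` text may vary by version:

```scala
// Assumes a Parquet-backed table `parquetTable` with a string column `value`,
// as in the two queries quoted above.
spark.sql("select * from parquetTable where value = '0'").explain()
// With Parquet 1.10.0, the scan node should report the string predicate as pushed down,
// e.g. something like: PushedFilters: [IsNotNull(value), EqualTo(value,0)]

spark.sql("select * from parquetTable where value = 0").explain()
// Here `value` is implicitly cast for the comparison, so the filter is typically not pushed down.
```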

@rdblue @maropu Thank you for your great efforts and patience! This upgrade 
is definitely good to have. We should try to make it in the next release.

@maropu Could you also submit a separate PR for your micro benchmark? 
Thanks!

cc @cloud-fan @hvanhovell @mswit-databricks Do you have any comment about 
the implementation of this PR? 


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2974/
Test PASSed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #90293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90293/testReport)**
 for PR 16677 at commit 
[`062b8fd`](https://github.com/apache/spark/commit/062b8fd58ae13f252b1e6f61c70b69ed05521715).


---




[GitHub] spark issue #21239: [SPARK-24040][SS] Support single partition aggregates in...

2018-05-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21239
  
This looks pretty good to me. My only major concern is how we are enabling it. Adding such a wide enable-all flag to enable this is not a good idea. Rather, I would like it to be enabled surgically (a rough sketch follows the list below):

- The unsupported operation checker allows the query when there is only one aggregate on the streaming query (this does not need to change when we add multi-partition aggregates).

- ContinuousExecution/UnsupportedOperationChecker adds one additional check to verify that the number of shuffle partitions is 1 (a single-line check that can be deleted when we add multi-partition aggregates).
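
A minimal sketch of what that surgical check could look like, purely for illustration; the helper name and messages are assumptions, not actual Spark code (only `SQLConf.numShufflePartitions` and the `Aggregate` plan node are existing APIs):

```scala
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
import org.apache.spark.sql.internal.SQLConf

// Hypothetical helper: returns an error message when a continuous query uses
// more than one aggregate, or one aggregate with more than one shuffle partition.
// The caller (e.g. the unsupported-operation check) would surface this as an AnalysisException.
def continuousAggregationError(plan: LogicalPlan, conf: SQLConf): Option[String] = {
  val aggregates = plan.collect { case a: Aggregate => a }
  if (aggregates.size > 1) {
    Some("Continuous processing supports at most one Aggregate operation")
  } else if (aggregates.nonEmpty && conf.numShufflePartitions != 1) {
    Some("Continuous aggregation currently requires spark.sql.shuffle.partitions = 1")
  } else {
    None
  }
}
```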



---




[GitHub] spark pull request #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLL...

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21254#discussion_r186322250
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2313,6 +2313,25 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 }
   }
 
+  test("SPARK-23723: write json in UTF-16/32 with multiline off") {
+Seq("UTF-16", "UTF-32").foreach { encoding =>
+  withTempPath { path =>
+val ds = spark.createDataset(Seq(
+  ("a", 1), ("b", 2), ("c", 3))
+).repartition(2)
--- End diff --

we don't have to repartition though.


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186322252
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStream
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.OutputMode
+
+class ContinuousAggregationSuite extends ContinuousSuiteBase {
+  import testImplicits._
+
+  test("not enabled") {
+val ex = intercept[AnalysisException] {
+  val input = ContinuousMemoryStream.singlePartition[Int]
+
+  testStream(input.toDF().agg(max('value)), OutputMode.Complete)(
+AddData(input, 0, 1, 2),
+CheckAnswer(2),
+StopStream,
+AddData(input, 3, 4, 5),
+StartStream(),
+CheckAnswer(5),
+AddData(input, -1, -2, -3),
+CheckAnswer(5))
+}
+
+assert(ex.getMessage.contains("Continuous processing does not support 
Aggregate operations"))
+  }
+
+  test("basic") {
+withSQLConf(("spark.sql.streaming.continuous.allowAllOperators", 
"true")) {
+  val input = ContinuousMemoryStream.singlePartition[Int]
+
+  testStream(input.toDF().agg(max('value)), OutputMode.Complete)(
--- End diff --

Is only complete mode allowed?  


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21254
  
Nope, I am quite sure that we don't have any kind of hidden behaviour change. Both the `lineSep` and `encoding` options are new. Also, they are currently more restrictive than they need to be. Writing can actually work, and https://github.com/apache/spark/pull/21247 tries to allow it; however, I left a comment for him to just focus on getting rid of the restrictions, which is his final goal in 2.4.0 (or 3.0.0).
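
For reference, a minimal usage sketch of the two new options discussed here; the output path is a placeholder, and whether `lineSep` must be set explicitly for non-UTF-8 encodings depends on the restrictions discussed above:

```scala
import spark.implicits._  // assumes an active SparkSession named `spark`, e.g. in spark-shell

val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("name", "value")
df.write
  .option("encoding", "UTF-16")  // new option for the JSON writer
  .option("lineSep", "\n")       // set explicitly; the current checks may require it for UTF-16/UTF-32
  .json("/tmp/json-utf16")       // hypothetical output path
```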


---




[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21253
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21253: [SPARK-24158][SS] Enable no-data batches for streaming j...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2973/
Test PASSed.


---




[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

2018-05-06 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r186321782
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -63,10 +67,17 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
 .map(str => str.split(",").toSeq)
 .getOrElse(Seq.empty[String])
 
-  def sparkFiles(): Seq[String] = sparkConf
-.getOption("spark.files")
-.map(str => str.split(",").toSeq)
-.getOrElse(Seq.empty[String])
+  def pyFiles(): Option[String] = sparkConf
+.get(KUBERNETES_PYSPARK_PY_FILES)
+
+  def pySparkMainResource(): Option[String] = sparkConf
--- End diff --

I need to parse out the MainAppResource (which I thought we should be doing only once); as such, I thought it would be cleaner to do this.


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321753
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStream
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.OutputMode
+
+class ContinuousAggregationSuite extends ContinuousSuiteBase {
+  import testImplicits._
+
+  test("not enabled") {
+val ex = intercept[AnalysisException] {
+  val input = ContinuousMemoryStream.singlePartition[Int]
+
+  testStream(input.toDF().agg(max('value)), OutputMode.Complete)(
+AddData(input, 0, 1, 2),
+CheckAnswer(2),
+StopStream,
+AddData(input, 3, 4, 5),
+StartStream(),
+CheckAnswer(5),
+AddData(input, -1, -2, -3),
--- End diff --

You don't need this many actions just to check whether it throws an AnalysisException.
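
For example, a trimmed-down version of the quoted test could be as short as this; it is a sketch that reuses only the helpers already present in the quoted suite:

```scala
// One AddData/CheckAnswer pair is enough to trigger the analysis check.
val ex = intercept[AnalysisException] {
  val input = ContinuousMemoryStream.singlePartition[Int]
  testStream(input.toDF().agg(max('value)), OutputMode.Complete)(
    AddData(input, 0, 1, 2),
    CheckAnswer(2))
}
assert(ex.getMessage.contains("Continuous processing does not support Aggregate operations"))
```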


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321671
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.continuous
+
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStream
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.OutputMode
+
+class ContinuousAggregationSuite extends ContinuousSuiteBase {
+  import testImplicits._
+
+  test("not enabled") {
+val ex = intercept[AnalysisException] {
+  val input = ContinuousMemoryStream.singlePartition[Int]
+
+  testStream(input.toDF().agg(max('value)), OutputMode.Complete)(
+AddData(input, 0, 1, 2),
+CheckAnswer(2),
+StopStream,
+AddData(input, 3, 4, 5),
+StartStream(),
+CheckAnswer(5),
+AddData(input, -1, -2, -3),
+CheckAnswer(5))
+}
+
+assert(ex.getMessage.contains("Continuous processing does not support 
Aggregate operations"))
+  }
+
+  test("basic") {
+withSQLConf(("spark.sql.streaming.continuous.allowAllOperators", 
"true")) {
--- End diff --

Why are we using this flag to enable this? I would expect that this will be enabled by default, with a more specific feature flag (say, continuousAggregationsEnabled) to turn it off. Also, in that case, the checks inside UnsupportedOperationChecker will be more specific (single aggregation, with shuffle partitions = 1, no watermark, etc.).


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21254
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21254
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2972/
Test FAILed.


---




[GitHub] spark pull request #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to b...

2018-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21122


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321416
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala
 ---
@@ -71,8 +72,15 @@ class StateStoreRDD[T: ClassTag, U: ClassTag](
   StateStoreId(checkpointLocation, operatorId, partition.index),
   queryRunId)
 
+// If we're in continuous processing mode, we should get the store 
version for the current
+// epoch rather than the one at planning time.
+val currentVersion = EpochTracker.getCurrentEpoch match {
+  case -1 => storeVersion
--- End diff --

I don't like the fact that -1 is used as the value for non-existence, especially since this value is defined deep inside the EpochTracker. Might as well use Option and return None; then the use of -1 will be contained within the EpochTracker class.
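
A minimal sketch of that Option-based shape; the tracker internals shown here are assumptions for illustration, not the PR's actual code:

```scala
// Hypothetical sketch: the "no epoch" case never leaves the tracker as a sentinel value.
object EpochTracker {
  // Epoch for the current task thread; None means "not running in continuous processing".
  private val currentEpoch = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  def getCurrentEpoch: Option[Long] = currentEpoch.get()
  def setCurrentEpoch(epoch: Long): Unit = currentEpoch.set(Some(epoch))
}

// Caller side (e.g. StateStoreRDD.compute) no longer needs to know about -1:
// val currentVersion = EpochTracker.getCurrentEpoch.getOrElse(storeVersion)
```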


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90290/
Test FAILed.


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21254
  
**[Test build #90292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90292/testReport)**
 for PR 21254 at commit 
[`d4c290e`](https://github.com/apache/spark/commit/d4c290e85eab07706a8a612dbbf58d5c14588b43).


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321310
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ContinuousMemoryStream.scala
 ---
@@ -47,18 +47,17 @@ import org.apache.spark.util.RpcUtils
  *ContinuousMemoryStreamDataReader instances to poll. It returns the 
record at the specified
  *offset within the list, or null if that offset doesn't yet have a 
record.
  */
-class ContinuousMemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
+class ContinuousMemoryStream[A : Encoder](id: Int, sqlContext: SQLContext, 
numPartitions: Int = 2)
--- End diff --

+1, I had an itch about this in the continuous memory stream PR.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #90290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90290/testReport)**
 for PR 16478 at commit 
[`9fd6c2f`](https://github.com/apache/spark/commit/9fd6c2f21353b41e11748a3f9edaf17647c8bd7f).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321237
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochTracker.scala
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.continuous
+
+import java.util.concurrent.atomic.AtomicLong
+
+object EpochTracker {
+  // The current epoch. Note that this is a shared reference; 
ContinuousWriteRDD.compute() will
--- End diff --

This goes for other methods below.


---




[GitHub] spark pull request #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to b...

2018-05-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21122#discussion_r186321180
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1354,7 +1354,8 @@ class HiveDDLSuite
 val indexName = tabName + "_index"
 withTable(tabName) {
   // Spark SQL does not support creating index. Thus, we have to use 
Hive client.
-  val client = 
spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+  val client =
+
spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
--- End diff --

The issue is out of the scope of this PR. We can continue addressing the related issues in future PRs.


---




[GitHub] spark issue #21122: [SPARK-24017] [SQL] Refactor ExternalCatalog to be an in...

2018-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21122
  
Thanks! Merged to master


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21254
  
cc @MaxGekk @HyukjinKwon Do we have any behavior change after the previous 
PR: https://github.com/apache/spark/pull/20937?


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186321087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochTracker.scala
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.continuous
+
+import java.util.concurrent.atomic.AtomicLong
+
+object EpochTracker {
+  // The current epoch. Note that this is a shared reference; 
ContinuousWriteRDD.compute() will
--- End diff --

Better not to refer to specific usages of the method, but rather to its general purpose. In the future, classes other than ContinuousWriteRDD can use it.


---




[GitHub] spark pull request #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLL...

2018-05-06 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/21254

[SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] Support custom 
encoding for json files

## What changes were proposed in this pull request?
This is to add a test case to check the behaviors when users write json in 
the specified UTF-16/UTF-32 encoding with multiline off.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark followupSPARK-23094

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21254


commit 07de099d9fe3b2a39380ce08d095a564ce6c00f4
Author: gatorsmile 
Date:   2018-05-07T03:35:52Z

test case

commit d4c290e85eab07706a8a612dbbf58d5c14588b43
Author: gatorsmile 
Date:   2018-05-07T03:37:02Z

name




---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186320975
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochTracker.scala
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.continuous
+
+import java.util.concurrent.atomic.AtomicLong
+
+object EpochTracker {
--- End diff --

Add docs to explain the purpose and usage of this class. 


---




[GitHub] spark issue #21253: [SPARK-24158][SS] Enabled no-data batches for streaming ...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21253
  
**[Test build #90291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90291/testReport)**
 for PR 21253 at commit 
[`cb5f55b`](https://github.com/apache/spark/commit/cb5f55b4622fc8637950013a5f6a7005cecf9a07).


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186320836
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -127,13 +127,36 @@ case class TakeOrderedAndProjectExec(
 projectList: Seq[NamedExpression],
 child: SparkPlan) extends UnaryExecNode {
 
+  val sortInMemThreshold = conf.sortInMemForLimitThreshold
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+if (limit < sortInMemThreshold) {
+  super.requiredChildDistribution
+} else {
+  Seq(AllTuples)
+}
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
+if (limit < sortInMemThreshold) {
+  super.requiredChildOrdering
+} else {
+  Seq(sortOrder)
+}
+  }
--- End diff --

Thanks a lot for the comments :)
Our users have queries with a big limit number, which hit OOM with the current code. So I added a config: below the threshold we go with the current code; otherwise we do a global sort.
Yes, there will be performance degradation, but it's better than OOM.
I think a limited heap sort that spills to disk would be a better idea, but I didn't find an existing implementation and don't want to make it too complicated here.
Thanks again for reviewing this @viirya
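
For readers following along, a sketch of how such a threshold could be declared in `SQLConf`; the accessor name comes from the quoted diff, while the config key string, doc text, and default below are assumptions:

```scala
// Hypothetical SQLConf entry backing conf.sortInMemForLimitThreshold
// (would live in object SQLConf, where buildConf is defined; key and default are assumed).
val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
  buildConf("spark.sql.execution.sortInMemForLimitThreshold")
    .doc("When the limit of a Sort + Limit query is below this value, " +
      "TakeOrderedAndProjectExec keeps the current in-memory top-K behaviour; " +
      "above it, the plan falls back to a global, spillable sort to avoid OOM.")
    .intConf
    .createWithDefault(1000000)
```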


---




[GitHub] spark pull request #21253: [SPARK-24158][SS] Enabled no-data batches for str...

2018-05-06 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/21253

[SPARK-24158][SS] Enabled no-data batches for streaming joins

## What changes were proposed in this pull request?

This is a continuation of the larger task of enabling zero-data batches for 
more eager state cleanup. This PR enables it for stream-stream joins. 

## How was this patch tested?
- Updated join tests. Additionally, updated them to not use `CheckLastBatch` anywhere, to set a good precedent for the future (see the sketch below).
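
To illustrate why `CheckLastBatch` becomes brittle once no-data batches can run, here is a sketch in the style of the existing join tests; the stream names (`leftInput`, `rightInput`, `joined`) are placeholders rather than the suite's actual values:

```scala
// With no-data batches enabled, an extra empty batch may run after the data batch
// (for eager state cleanup), so "the last batch" can legitimately be empty.
// Asserting on the accumulated result keeps the test independent of batch boundaries.
testStream(joined)(
  AddData(leftInput, 1, 2, 3),
  AddData(rightInput, 1, 2),
  // CheckLastBatch((1, 1), (2, 2)) would be brittle here; prefer the total answer:
  CheckAnswer((1, 1), (2, 2))
)
```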

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-24158

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21253


commit cb5f55b4622fc8637950013a5f6a7005cecf9a07
Author: Tathagata Das 
Date:   2018-05-03T02:55:35Z

Enabled no-data batches for joins




---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2971/
Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21239: [SPARK-24040][SS] Support single partition aggreg...

2018-05-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21239#discussion_r186320417
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
 ---
@@ -123,15 +123,7 @@ class ContinuousExecution(
 }
 committedOffsets = nextOffsets.toStreamProgress(sources)
 
-// Get to an epoch ID that has definitely never been sent to a 
sink before. Since sink
-// commit happens between offset log write and commit log write, 
this means an epoch ID
-// which is not in the offset log.
-val (latestOffsetEpoch, _) = offsetLog.getLatest().getOrElse {
-  throw new IllegalStateException(
-s"Offset log had no latest element. This shouldn't be possible 
because nextOffsets is" +
-  s"an element.")
-}
-currentBatchId = latestOffsetEpoch + 1
+currentBatchId = latestEpochId + 1
--- End diff --

nit: remove the line above.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #90290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90290/testReport)**
 for PR 16478 at commit 
[`9fd6c2f`](https://github.com/apache/spark/commit/9fd6c2f21353b41e11748a3f9edaf17647c8bd7f).


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186318646
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -127,13 +127,36 @@ case class TakeOrderedAndProjectExec(
 projectList: Seq[NamedExpression],
 child: SparkPlan) extends UnaryExecNode {
 
+  val sortInMemThreshold = conf.sortInMemForLimitThreshold
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+if (limit < sortInMemThreshold) {
+  super.requiredChildDistribution
+} else {
+  Seq(AllTuples)
+}
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
+if (limit < sortInMemThreshold) {
+  super.requiredChildOrdering
+} else {
+  Seq(sortOrder)
+}
+  }
--- End diff --

It was only shuffling the limited rows from each partition. The final sort was also only applied to the limited data from all partitions.

Now it requires sorting and shuffling the whole data. I suspect that it'd degrade performance in the end.



---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21198
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21198
  
**[Test build #90289 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90289/testReport)**
 for PR 21198 at commit 
[`474baf9`](https://github.com/apache/spark/commit/474baf9bc61363ae2d06dbb638abb895005f55fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21198
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90289/
Test PASSed.


---




[GitHub] spark pull request #21235: [SPARK-24181][SQL] Better error message for writi...

2018-05-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21235#discussion_r186316448
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
 }
   }
 
-  private def assertNotBucketed(operation: String): Unit = {
-if (numBuckets.isDefined || sortColumnNames.isDefined) {
-  throw new AnalysisException(s"'$operation' does not support 
bucketing right now")
--- End diff --

How about keeping the function name unchanged and just changing this message, listing the sort columns if any? Something like:

> '$operation' does not support bucketing. Number of buckets: ...; sortBy: ...;
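
A possible shape for that message, purely as a sketch of the suggestion. It would live inside `DataFrameWriter` and relies on its existing `numBuckets: Option[Int]` and `sortColumnNames: Option[Seq[String]]` fields; the exact wording is up to the author:

```scala
private def assertNotBucketed(operation: String): Unit = {
  if (numBuckets.isDefined || sortColumnNames.isDefined) {
    // Surface the offending settings instead of a bare "not supported" message.
    val bucketInfo = numBuckets.map(n => s" Number of buckets: $n;").getOrElse("")
    val sortInfo = sortColumnNames.map(cols => s" sortBy: ${cols.mkString(", ")};").getOrElse("")
    throw new AnalysisException(
      s"'$operation' does not support bucketing right now.$bucketInfo$sortInfo")
  }
}
```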


---




[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21212#discussion_r186315362
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -267,28 +269,30 @@ final class ShuffleBlockFetcherIterator(
 // at most maxBytesInFlight in order to limit the amount of data in 
flight.
 val remoteRequests = new ArrayBuffer[FetchRequest]
 
-// Tracks total number of blocks (including zero sized blocks)
-var totalBlocks = 0
 for ((address, blockInfos) <- blocksByAddress) {
-  totalBlocks += blockInfos.size
   if (address.executorId == blockManager.blockManagerId.executorId) {
-// Filter out zero-sized blocks
-localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
+blockInfos.find(_._2 <= 0) match {
+  case Some((blockId, size)) if size < 0 =>
+throw new BlockException(blockId, "Negative block size " + 
size)
+  case Some((blockId, size)) if size == 0 =>
+throw new BlockException(blockId, "Zero-sized blocks should be 
excluded.")
--- End diff --

I think that failing with an exception here is a great idea, so thanks for 
adding these checks. In general, I'm in favor of adding explicit fail-fast 
checks for invariants like this because it can help to defend against silent 
corruption bugs. 
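
If a per-case `BlockException` ever feels heavy, the same fail-fast invariant can also be written as a single precondition. This is just an alternative sketch over the `blockInfos` sequence from the diff above, not what this PR does; it would surface as an `IllegalArgumentException` instead:

```scala
blockInfos.foreach { case (blockId, size) =>
  require(size > 0,
    s"Unexpected block $blockId with non-positive size $size; " +
      "zero-sized blocks should already have been filtered out by the map status")
}
```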


---




[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21070
  
For documentation, I think it's fine. It's trackable history in JIRA, although it's a bit hard to search. Also, there might be some differences we should check, since I haven't gone through the details of the JIRAs to confirm whether they still hold.


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21198
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2970/
Test PASSed.


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21198
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21198
  
**[Test build #90289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90289/testReport)**
 for PR 21198 at commit 
[`474baf9`](https://github.com/apache/spark/commit/474baf9bc61363ae2d06dbb638abb895005f55fc).


---




[GitHub] spark issue #21198: [SPARK-24126][pyspark] Use build-specific temp directory...

2018-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21198
  
retest this please


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21231
  
**[Test build #90288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90288/testReport)**
 for PR 21231 at commit 
[`dc9a070`](https://github.com/apache/spark/commit/dc9a0702584c9fd16a1d25c675aaf2b424cd7623).


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2969/
Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90287/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).


---




[GitHub] spark pull request #21231: [SPARK-24119][SQL]Add interpreted execution to So...

2018-05-06 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21231#discussion_r186307490
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SortOrderExpressionsSuite.scala
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import java.sql.{Date, Timestamp}
+import java.util.TimeZone
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.unsafe.sort.PrefixComparators._
+
+class SortOrderExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
+
+  test("SortPrefix") {
+// Explicitly choose a time zone, since Date objects can create 
different values depending on
+// local time zone of the machine on which the test is running
+TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
--- End diff --

@kiszk Maybe not a nit. I should fix that.


---




[GitHub] spark pull request #21231: [SPARK-24119][SQL]Add interpreted execution to So...

2018-05-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21231#discussion_r186307162
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SortOrderExpressionsSuite.scala
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import java.sql.{Date, Timestamp}
+import java.util.TimeZone
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.unsafe.sort.PrefixComparators._
+
+class SortOrderExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
+
+  test("SortPrefix") {
+// Explicitly choose a time zone, since Date objects can create 
different values depending on
+// local time zone of the machine on which the test is running
+TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
--- End diff --

nit: do we need to restore the original time zone?
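
One common pattern for that is to save and restore the default around the test body. A generic try/finally sketch, using the `java.util.TimeZone` import already present in the test file above (some of Spark's test utilities offer similar helpers):

```scala
val originalTimeZone = TimeZone.getDefault
try {
  TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
  // ... assertions that depend on the fixed time zone ...
} finally {
  // Always restore the JVM-wide default so later suites are unaffected.
  TimeZone.setDefault(originalTimeZone)
}
```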


---




[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-05-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21070
  
@HyukjinKwon Thanks for the explanation! I feel it's OK to keep this as it is for now. By the way, shouldn't we document the story somewhere?


---




[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

2018-05-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21251
  
cc @jiangxb1987 @vanzin


---




[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...

2018-05-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21106#discussion_r186305770
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodegenObjectFactory.scala
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.codehaus.commons.compiler.CompileException
+import org.codehaus.janino.InternalCompilerException
+
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Utils
+
+/**
+ * Catches compile error during code generation.
+ */
+object CodegenError {
+  def unapply(throwable: Throwable): Option[Exception] = throwable match {
+case e: InternalCompilerException => Some(e)
+case e: CompileException => Some(e)
+case _ => None
+  }
+}
+
+/**
+ * A factory which can be used to create objects that have both codegen 
and interpreted
+ * implementations. This tries to create codegen object first, if any 
compile error happens,
+ * it fallbacks to interpreted version.
+ */
+abstract class CodegenObjectFactory[IN, OUT] {
+
+  // Creates wanted object. First trying codegen implementation. If any 
compile error happens,
+  // fallbacks to interpreted version.
+  def createObject(in: IN): OUT = {
+val fallbackMode = SQLConf.get.getConf(SQLConf.CODEGEN_OBJECT_FALLBACK)
--- End diff --

We can only access SQLConf on the driver, so we can't read this config value inside `createObject`, which can be called on executors.

To solve this, `UnsafeProjection` can't stay an object as it is now; it would need to be a case class with a fallback-mode parameter, so that the factory is created on the driver. @cloud-fan @hvanhovell WDYT?
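
To make that proposal concrete, a rough sketch of reading the flag once on the driver and shipping only the captured value. The boolean is a stand-in for whatever the real `CODEGEN_OBJECT_FALLBACK` setting carries, and `CodegenError` is the extractor from the diff above; the exact shape is hypothetical:

```scala
// Hypothetical variant: the factory is constructed on the driver, where SQLConf
// is available, and only the captured `codegenEnabled` value travels to executors.
abstract class CodegenObjectFactory[IN, OUT](codegenEnabled: Boolean) extends Serializable {

  def createObject(in: IN): OUT = {
    if (!codegenEnabled) {
      createInterpretedObject(in)
    } else {
      try {
        createCodeGeneratedObject(in)
      } catch {
        case CodegenError(_) => createInterpretedObject(in)
      }
    }
  }

  protected def createCodeGeneratedObject(in: IN): OUT
  protected def createInterpretedObject(in: IN): OUT
}
```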


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-06 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/21231
  
@maropu @kiszk Hopefully I've addressed all comments. Please take a look.


---




[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-05-06 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21180
  
Hi @HyukjinKwon, could you kindly have a look at the new version? It is 
backwards compatible and has the same robustness guarantees as 
`collections.namedtuple` from CPython.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21182
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21182
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90286/
Test PASSed.


---




[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21182
  
**[Test build #90286 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90286/testReport)**
 for PR 21182 at commit 
[`6610826`](https://github.com/apache/spark/commit/6610826fbb229ca509c0ae9e023b36253852af33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21212
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90283/
Test PASSed.


---




[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21212
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21212
  
**[Test build #90283 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90283/testReport)**
 for PR 21212 at commit 
[`094d829`](https://github.com/apache/spark/commit/094d829ff5c36cdc358f4d6e635297ca4f4e400f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90285/
Test PASSed.


---




[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18447
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18447
  
**[Test build #90285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90285/testReport)**
 for PR 18447 at commit 
[`b964573`](https://github.com/apache/spark/commit/b96457399433279964575ff570f9364d628dd240).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



