date:20180401

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20850
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1886/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20850
  
**[Test build #88789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88789/testReport)**
 for PR 20850 at commit 
[`6caf11c`](https://github.com/apache/spark/commit/6caf11cc0244a43e1b01c3b76942eea67a7318f5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20955: [SPARK-23838][WEBUI] Running SQL query is display...

2018-04-01 Thread gengliangwang

GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/20955

[SPARK-23838][WEBUI] Running SQL query is displayed as "completed" in SQL 
tab

## What changes were proposed in this pull request?

A running SQL query would appear as completed in the Spark UI:

![image1](https://user-images.githubusercontent.com/1097932/38170733-3d7cb00c-35bf-11e8-994c-43f2d4fa285d.png)

We can see the query in "Completed queries", while in in the job page we 
see it's still running Job 132.

![image2](https://user-images.githubusercontent.com/1097932/38170735-48f2c714-35bf-11e8-8a41-6fae23543c46.png)

After some time in the query still appears in "Completed queries" (while 
it's still running), but the "Duration" gets increased.

![image3](https://user-images.githubusercontent.com/1097932/38170737-50f87ea4-35bf-11e8-8b60-000f6f918964.png)

To reproduce, we can run a query with multiple jobs. E.g. Run TPCDS q6.

The reason is that updates from executions are written into kvstore 
periodically, and the job start event may be missing.

## How was this patch tested?
Manually run the job again and check the SQL Tab. The fix is pretty simple.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark jobCompleted

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20955.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20955


commit b418feb6ac4bb7523c50ea27f74d3b02c310956a
Author: Wang Gengliang 
Date:   2018-04-01T07:08:06Z

Fix sql tab




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20909: [SPARK-23776][python][test] Check for needed components/...

2018-04-01 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20909
  
> I modeled this after pyspark/streaming/tests.py, which checks for prereqs 
and raises exceptions with a useful message so one can get past the error 
(although pyspark/streaming/tests.py only checks for its own prereqs, not those 
required by streaming docstring tests).

I actually have been thinking about skipping and proceeding the tests in 
`pyspark/streaming/tests.py` with an explicit message as well. Can we skip and 
continue the tests? I think we basically should just skip the tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20955
  
**[Test build #88790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88790/testReport)**
 for PR 20955 at commit 
[`b418feb`](https://github.com/apache/spark/commit/b418feb6ac4bb7523c50ea27f74d3b02c310956a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20909: [SPARK-23776][python][test] Check for needed comp...

2018-04-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20909#discussion_r178450100
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2977,6 +2977,20 @@ def test_create_dateframe_from_pandas_with_dst(self):
 
 class HiveSparkSubmitTests(SparkSubmitTests):
 
+@classmethod
+def setUpClass(cls):
--- End diff --

I think this way is more correct, as @felixcheung pointed out.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1887/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88791 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88791/testReport)**
 for PR 20894 at commit 
[`0904daf`](https://github.com/apache/spark/commit/0904daf52b1d94467f75bc6703d3d81747208e77).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88791 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88791/testReport)**
 for PR 20894 at commit 
[`0904daf`](https://github.com/apache/spark/commit/0904daf52b1d94467f75bc6703d3d81747208e77).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88791/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88792 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88792/testReport)**
 for PR 20894 at commit 
[`191b415`](https://github.com/apache/spark/commit/191b415e77bbc8fe662c19faae627e40787c5e23).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20953
  
**[Test build #88793 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88793/testReport)**
 for PR 20953 at commit 
[`c65a8a6`](https://github.com/apache/spark/commit/c65a8a6465e234debab8ba4e50645e56c22025e1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20955
  
**[Test build #88790 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88790/testReport)**
 for PR 20955 at commit 
[`b418feb`](https://github.com/apache/spark/commit/b418feb6ac4bb7523c50ea27f74d3b02c310956a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88790/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20850
  
**[Test build #88789 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88789/testReport)**
 for PR 20850 at commit 
[`6caf11c`](https://github.com/apache/spark/commit/6caf11cc0244a43e1b01c3b76942eea67a7318f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88789/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20945
  
Failed unit test: 
org.apache.spark.launcher.LauncherServerSuite.testAppHandleDisconnect
Will re-test this is not from this patch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20945
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1888/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20945
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20945
  
**[Test build #88794 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88794/testReport)**
 for PR 20945 at commit 
[`1060405`](https://github.com/apache/spark/commit/1060405b11cc4da88e1a65b5d694fb7f9e7e18b5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/20955
  
@vanzin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88792 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88792/testReport)**
 for PR 20894 at commit 
[`191b415`](https://github.com/apache/spark/commit/191b415e77bbc8fe662c19faae627e40787c5e23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88792/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20953
  
**[Test build #88793 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88793/testReport)**
 for PR 20953 at commit 
[`c65a8a6`](https://github.com/apache/spark/commit/c65a8a6465e234debab8ba4e50645e56c22025e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20953
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88793/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20955
  
**[Test build #88795 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88795/testReport)**
 for PR 20955 at commit 
[`2357f59`](https://github.com/apache/spark/commit/2357f59a8f0b1cbea2ac1431bb3669cb1a26f687).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1889/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1890/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88796 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88796/testReport)**
 for PR 20886 at commit 
[`f1de70c`](https://github.com/apache/spark/commit/f1de70c5284df3b3980d509e2ef7d7e77e020c1c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20954: [BACKPORT][SPARK-23040][CORE] Returns interruptible iter...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20954
  
thanks, merging to 2.3!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178460515
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -86,11 +82,23 @@ public void grow(int neededSize) {
 }
   }
 
-  public void reset() {
+  byte[] buffer() {
--- End diff --

nit: `getBuffer` is more java-style


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20945
  
**[Test build #88794 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88794/testReport)**
 for PR 20945 at commit 
[`1060405`](https://github.com/apache/spark/commit/1060405b11cc4da88e1a65b5d694fb7f9e7e18b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88794/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20945
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178460747
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,123 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+increaseCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
-  }
-
-  private long getElementOffset(int ordinal, int elementSize) {
+  private long getElementOffset(int ordinal) {
 return startingOffset + headerInBytes + ordinal * elementSize;
   }
 
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-assertIndexIsValid(ordinal);
-final long relativeOffset = currentCursor - startingOffset;
-final long offsetAndSize = (relativeOffset << 32) | (long)size;
-
-write(ordinal, offsetAndSize);
-  }
-
   private void setNullBit(int ordinal) {
 assertIndexIsValid(ordinal);
-BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal);
+BitSetMethods.set(buffer(), startingOffset + 8, ordinal);
   }
 
   public void setNull1Bytes(int ordinal) {
--- End diff --

seems now we only need a single `setNullAt` method.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec comp...

2018-04-01 Thread pwoody

Github user pwoody commented on a diff in the pull request:

https://github.com/apache/spark/pull/20935#discussion_r178460763
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -322,19 +324,76 @@ private[columnar] final class 
DecimalColumnStats(precision: Int, scale: Int) ext
 Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] final class ObjectColumnStats(dataType: DataType) 
extends ColumnStats {
-  val columnType = ColumnType(dataType)
+private abstract class OrderableSafeColumnStats[T](dataType: DataType) 
extends ColumnStats {
+  protected var upper: T = _
+  protected var lower: T = _
+
+  private val columnType = ColumnType(dataType)
+  private val ordering = dataType match {
+case x if RowOrdering.isOrderable(dataType) =>
+  Option(TypeUtils.getInterpretedOrdering(x))
+case _ => None
+  }
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
 if (!row.isNullAt(ordinal)) {
-  val size = columnType.actualSize(row, ordinal)
-  sizeInBytes += size
+  sizeInBytes += columnType.actualSize(row, ordinal)
   count += 1
+  ordering.foreach { order =>
+val value = getValue(row, ordinal)
+if (upper == null || order.gt(value, upper)) upper = copy(value)
+if (lower == null || order.lt(value, lower)) lower = copy(value)
+  }
 } else {
-  gatherNullStats
+  gatherNullStats()
 }
   }
 
+  def getValue(row: InternalRow, ordinal: Int): T
+
+  def copy(value: T): T
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
+}
+
+private[columnar] final class ArrayColumnStats(dataType: DataType)
+  extends OrderableSafeColumnStats[UnsafeArrayData](dataType) {
+  override def getValue(row: InternalRow, ordinal: Int): UnsafeArrayData =
+row.getArray(ordinal).asInstanceOf[UnsafeArrayData]
+
+  override def copy(value: UnsafeArrayData): UnsafeArrayData = value.copy()
+}
+
+private[columnar] final class StructColumnStats(dataType: DataType)
+  extends OrderableSafeColumnStats[UnsafeRow](dataType) {
+  private val numFields = dataType.asInstanceOf[StructType].fields.length
+
+  override def getValue(row: InternalRow, ordinal: Int): UnsafeRow =
+row.getStruct(ordinal, numFields).asInstanceOf[UnsafeRow]
+
+  override def copy(value: UnsafeRow): UnsafeRow = value.copy()
+}
+
+private[columnar] final class MapColumnStats(dataType: DataType) extends 
ColumnStats {
--- End diff --

Now that you mention it - we can just have it use it now since it will 
always fall through to the unorderable case. Everything will just work when we 
make it orderable w/o code change here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178460820
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -117,150 +138,81 @@ public long getFieldOffset(int ordinal) {
 return startingOffset + nullBitsSize + 8 * ordinal;
   }
 
-  public void setOffsetAndSize(int ordinal, int size) {
-setOffsetAndSize(ordinal, holder.cursor, size);
-  }
-
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-final long relativeOffset = currentCursor - startingOffset;
-final long fieldOffset = getFieldOffset(ordinal);
-final long offsetAndSize = (relativeOffset << 32) | (long) size;
-
-Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
-  }
-
   public void write(int ordinal, boolean value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putBoolean(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
--- End diff --

`writeLong(0)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178460833
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -117,150 +138,81 @@ public long getFieldOffset(int ordinal) {
 return startingOffset + nullBitsSize + 8 * ordinal;
   }
 
-  public void setOffsetAndSize(int ordinal, int size) {
-setOffsetAndSize(ordinal, holder.cursor, size);
-  }
-
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-final long relativeOffset = currentCursor - startingOffset;
-final long fieldOffset = getFieldOffset(ordinal);
-final long offsetAndSize = (relativeOffset << 32) | (long) size;
-
-Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
-  }
-
   public void write(int ordinal, boolean value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putBoolean(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
--- End diff --

`writeLong(offset, 0L)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178460868
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -117,150 +138,81 @@ public long getFieldOffset(int ordinal) {
 return startingOffset + nullBitsSize + 8 * ordinal;
   }
 
-  public void setOffsetAndSize(int ordinal, int size) {
-setOffsetAndSize(ordinal, holder.cursor, size);
-  }
-
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-final long relativeOffset = currentCursor - startingOffset;
-final long fieldOffset = getFieldOffset(ordinal);
-final long offsetAndSize = (relativeOffset << 32) | (long) size;
-
-Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
-  }
-
   public void write(int ordinal, boolean value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putBoolean(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeBoolean(offset, value);
   }
 
   public void write(int ordinal, byte value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putByte(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeByte(offset, value);
   }
 
   public void write(int ordinal, short value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putShort(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeShort(offset, value);
   }
 
   public void write(int ordinal, int value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putInt(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeInt(offset, value);
   }
 
   public void write(int ordinal, long value) {
-Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+writeLong(getFieldOffset(ordinal), value);
   }
 
   public void write(int ordinal, float value) {
-if (Float.isNaN(value)) {
-  value = Float.NaN;
-}
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putFloat(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeFloat(offset, value);
   }
 
   public void write(int ordinal, double value) {
-if (Double.isNaN(value)) {
-  value = Double.NaN;
-}
-Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+writeDouble(getFieldOffset(ordinal), value);
   }
 
   public void write(int ordinal, Decimal input, int precision, int scale) {
 if (precision <= Decimal.MAX_LONG_DIGITS()) {
   // make sure Decimal object has the same scale as DecimalType
   if (input.changePrecision(precision, scale)) {
--- End diff --

do we need `input != null` here like 
https://github.com/apache/spark/pull/20850/files#diff-85658ffc242280699a331c90530f54baR149


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex typ...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20935
  
**[Test build #88797 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88797/testReport)**
 for PR 20935 at commit 
[`6ea0919`](https://github.com/apache/spark/commit/6ea0919ec0a9dfc6b121c88790fac79aa072bc60).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178463388
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -117,150 +138,81 @@ public long getFieldOffset(int ordinal) {
 return startingOffset + nullBitsSize + 8 * ordinal;
   }
 
-  public void setOffsetAndSize(int ordinal, int size) {
-setOffsetAndSize(ordinal, holder.cursor, size);
-  }
-
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-final long relativeOffset = currentCursor - startingOffset;
-final long fieldOffset = getFieldOffset(ordinal);
-final long offsetAndSize = (relativeOffset << 32) | (long) size;
-
-Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
-  }
-
   public void write(int ordinal, boolean value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putBoolean(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeBoolean(offset, value);
   }
 
   public void write(int ordinal, byte value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putByte(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeByte(offset, value);
   }
 
   public void write(int ordinal, short value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putShort(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeShort(offset, value);
   }
 
   public void write(int ordinal, int value) {
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putInt(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeInt(offset, value);
   }
 
   public void write(int ordinal, long value) {
-Platform.putLong(holder.buffer, getFieldOffset(ordinal), value);
+writeLong(getFieldOffset(ordinal), value);
   }
 
   public void write(int ordinal, float value) {
-if (Float.isNaN(value)) {
-  value = Float.NaN;
-}
 final long offset = getFieldOffset(ordinal);
-Platform.putLong(holder.buffer, offset, 0L);
-Platform.putFloat(holder.buffer, offset, value);
+Platform.putLong(buffer(), offset, 0L);
+writeFloat(offset, value);
   }
 
   public void write(int ordinal, double value) {
-if (Double.isNaN(value)) {
-  value = Double.NaN;
-}
-Platform.putDouble(holder.buffer, getFieldOffset(ordinal), value);
+writeDouble(getFieldOffset(ordinal), value);
   }
 
   public void write(int ordinal, Decimal input, int precision, int scale) {
 if (precision <= Decimal.MAX_LONG_DIGITS()) {
   // make sure Decimal object has the same scale as DecimalType
   if (input.changePrecision(precision, scale)) {
--- End diff --

Good point. I will add `input != null`.
I am also curious about the differences between these two methods.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15614: [SPARK-18079] [SQL] CollectLimitExec.executeToIte...

2018-04-01 Thread pwoody

Github user pwoody closed the pull request at:

https://github.com/apache/spark/pull/15614


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20955
  
**[Test build #88795 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88795/testReport)**
 for PR 20955 at commit 
[`2357f59`](https://github.com/apache/spark/commit/2357f59a8f0b1cbea2ac1431bb3669cb1a26f687).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88795/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20955: [SPARK-23838][WEBUI] Running SQL query is displayed as "...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20955
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88796 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88796/testReport)**
 for PR 20886 at commit 
[`f1de70c`](https://github.com/apache/spark/commit/f1de70c5284df3b3980d509e2ef7d7e77e020c1c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88796/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex typ...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20935
  
**[Test build #88797 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88797/testReport)**
 for PR 20935 at commit 
[`6ea0919`](https://github.com/apache/spark/commit/6ea0919ec0a9dfc6b121c88790fac79aa072bc60).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex typ...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88797/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #88798 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88798/testReport)**
 for PR 20697 at commit 
[`84a7779`](https://github.com/apache/spark/commit/84a7779f50a0e4d4ca3ef01f5c3e430c681e569c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex typ...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20935
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20697
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1891/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20954: [BACKPORT][SPARK-23040][CORE] Returns interruptible iter...

2018-04-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20954
  
Hi, @cloud-fan and @jiangxb1987 .

It seems that the newly added test case starts to fail at `sbt-hadoop2.7`. 
Could you take a look at the failure?
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark/JobCancellationSuite/interruptible_iterator_of_shuffle_reader/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1867/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1867/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-04-01 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20640
  
@skonto, @squito , @kayousterhout ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes resources...

2018-04-01 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20722
  
so where are we on this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178466997
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,123 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+increaseCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
-  }
-
-  private long getElementOffset(int ordinal, int elementSize) {
+  private long getElementOffset(int ordinal) {
 return startingOffset + headerInBytes + ordinal * elementSize;
   }
 
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-assertIndexIsValid(ordinal);
-final long relativeOffset = currentCursor - startingOffset;
-final long offsetAndSize = (relativeOffset << 32) | (long)size;
-
-write(ordinal, offsetAndSize);
-  }
-
   private void setNullBit(int ordinal) {
 assertIndexIsValid(ordinal);
-BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal);
+BitSetMethods.set(buffer(), startingOffset + 8, ordinal);
   }
 
   public void setNull1Bytes(int ordinal) {
--- End diff --

IIUC, `UnsafeRowWriter` need a single `setNullAt` method for 8-byte width 
field.
On the other hand, `UnsafeArrayWriter` needs multiple `setNull?Bytes()` for 
different element size. Generated code also uses `setNull?Bytes`.

Could you elaborate your thought?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178469079
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,123 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+increaseCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
-  }
-
-  private long getElementOffset(int ordinal, int elementSize) {
+  private long getElementOffset(int ordinal) {
 return startingOffset + headerInBytes + ordinal * elementSize;
   }
 
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-assertIndexIsValid(ordinal);
-final long relativeOffset = currentCursor - startingOffset;
-final long offsetAndSize = (relativeOffset << 32) | (long)size;
-
-write(ordinal, offsetAndSize);
-  }
-
   private void setNullBit(int ordinal) {
 assertIndexIsValid(ordinal);
-BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal);
+BitSetMethods.set(buffer(), startingOffset + 8, ordinal);
   }
 
   public void setNull1Bytes(int ordinal) {
--- End diff --

We still need the various `setNull*` methods because of arrays.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-04-01 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
ping


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178469520
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
 ---
@@ -212,11 +210,11 @@ object GenerateColumnAccessor extends 
CodeGenerator[Seq[DataType], ColumnarItera
 
 public InternalRow next() {
   currentRow += 1;
-  bufferHolder.reset();
+  rowWriter.reset();
   rowWriter.zeroOutNullBytes();
   ${extractorCalls}
-  unsafeRow.setTotalSize(bufferHolder.totalSize());
-  return unsafeRow;
+  rowWriter.setTotalSize();
+  return rowWriter.getRow();
--- End diff --

Is there any place where we call `getRow()` without calling 
`setTotalSize()` before that? If there aren't then I'd combine the two.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r178470143
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -103,6 +103,60 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with 
SharedSQLContext with Befo
 }
   }
 
+  test("CTAS a managed table with the existing empty directory") {
+val tableLoc = new 
File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1")))
+try {
+  tableLoc.mkdir()
+  withTable("tab1") {
+sql("CREATE TABLE tab1 USING PARQUET AS SELECT 1, 'a'")
--- End diff --

Move them to `DDLSuite`? Change the format based on `isUsingHiveMetastore`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r178470268
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -298,15 +302,28 @@ class SessionCatalog(
 makeQualifiedPath(tableDefinition.storage.locationUri.get)
   tableDefinition.copy(
 storage = tableDefinition.storage.copy(locationUri = 
Some(qualifiedTableLocation)),
-identifier = TableIdentifier(table, Some(db)))
+identifier = tableIdentifier)
 } else {
-  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+  tableDefinition.copy(identifier = tableIdentifier)
 }
 
 requireDbExists(db)
 externalCatalog.createTable(newTableDefinition, ignoreIfExists)
   }
 
+  def validateTableLocation(table: CatalogTable, tableIdentifier: 
TableIdentifier): Unit = {
--- End diff --

`CatalogTable ` already contains `TableIdentifier `. What is the reason you 
do not use the one directly?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r178470264
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -298,15 +302,28 @@ class SessionCatalog(
 makeQualifiedPath(tableDefinition.storage.locationUri.get)
   tableDefinition.copy(
 storage = tableDefinition.storage.copy(locationUri = 
Some(qualifiedTableLocation)),
-identifier = TableIdentifier(table, Some(db)))
+identifier = tableIdentifier)
 } else {
-  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+  tableDefinition.copy(identifier = tableIdentifier)
 }
 
 requireDbExists(db)
 externalCatalog.createTable(newTableDefinition, ignoreIfExists)
   }
 
+  def validateTableLocation(table: CatalogTable, tableIdentifier: 
TableIdentifier): Unit = {
+// SPARK-19724: the default location of a managed table should be 
non-existent or empty.
+   if (table.tableType == CatalogTableType.MANAGED) {
+ val tableLocation = new Path(defaultTablePath(tableIdentifier))
--- End diff --

We always check `defaultTablePath `? Is that possible, the table location 
points to the different location?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20886
  
This is a behavior change, we need to make it configurable and also 
document it in the migration guide. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #88798 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88798/testReport)**
 for PR 20697 at commit 
[`84a7779`](https://github.com/apache/spark/commit/84a7779f50a0e4d4ca3ef01f5c3e430c681e569c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88798/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r178470893
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -794,6 +794,19 @@ private[spark] class TaskSetManager(
 fetchFailed.bmAddress.host, fetchFailed.bmAddress.executorId))
 }
 
+// Kill any other attempts for this FetchFailed task
+for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
+  logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task 
${attemptInfo.id} " +
+s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on 
${attemptInfo.host} " +
+s"as the attempt ${info.attemptNumber} failed because 
FetchFailed")
+  killedByOtherAttempt(index) = true
+  sched.backend.killTask(
--- End diff --

if this is async, we can't guarantee to not have task success events after 
marking staging as failed, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178470952
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,123 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+increaseCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
-  }
-
-  private long getElementOffset(int ordinal, int elementSize) {
+  private long getElementOffset(int ordinal) {
 return startingOffset + headerInBytes + ordinal * elementSize;
   }
 
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
-assertIndexIsValid(ordinal);
-final long relativeOffset = currentCursor - startingOffset;
-final long offsetAndSize = (relativeOffset << 32) | (long)size;
-
-write(ordinal, offsetAndSize);
-  }
-
   private void setNullBit(int ordinal) {
 assertIndexIsValid(ordinal);
-BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal);
+BitSetMethods.set(buffer(), startingOffset + 8, ordinal);
   }
 
   public void setNull1Bytes(int ordinal) {
--- End diff --

nvm, I thought we can pattern match the `elementSize`,  but that may hurt 
performance a lot.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r178471806
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1072,8 +1072,17 @@ class SessionCatalog(
   def functionExists(name: FunctionIdentifier): Boolean = {
 val db = 
formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 requireDbExists(db)
-functionRegistry.functionExists(name) ||
-  externalCatalog.functionExists(db, name.funcName)
+builtinFunctionExists(name) || externalFunctionExists(name)
+  }
+
+  def builtinFunctionExists(name: FunctionIdentifier): Boolean = {
--- End diff --

```Scala
  /**
   * Returns whether this function has been registered in the function 
registry of the current
   * session. If not existed, returns false.
   */
  def isRegisteredFunction(name: FunctionIdentifier): Boolean = {
functionRegistry.functionExists(name)
  }

  /**
   * Returns whether it is a persistent function. If not existed, returns 
false.
   */
  def isPersistentFunction(name: FunctionIdentifier): Boolean = {
val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
databaseExists(db) && externalCatalog.functionExists(db, name.funcName)
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r178471822
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1072,8 +1072,17 @@ class SessionCatalog(
   def functionExists(name: FunctionIdentifier): Boolean = {
 val db = 
formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 requireDbExists(db)
-functionRegistry.functionExists(name) ||
-  externalCatalog.functionExists(db, name.funcName)
+builtinFunctionExists(name) || externalFunctionExists(name)
+  }
+
+  def builtinFunctionExists(name: FunctionIdentifier): Boolean = {
--- End diff --

Please also add the unit test cases to `SessionCatalogSuite`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-04-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r178471847
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1072,8 +1072,17 @@ class SessionCatalog(
   def functionExists(name: FunctionIdentifier): Boolean = {
 val db = 
formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 requireDbExists(db)
-functionRegistry.functionExists(name) ||
-  externalCatalog.functionExists(db, name.funcName)
+builtinFunctionExists(name) || externalFunctionExists(name)
+  }
+
+  def builtinFunctionExists(name: FunctionIdentifier): Boolean = {
--- End diff --

Move them after `def isTemporaryFunction(name: FunctionIdentifier): Boolean 
`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...

2018-04-01 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20886#discussion_r178472405
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -298,15 +302,28 @@ class SessionCatalog(
 makeQualifiedPath(tableDefinition.storage.locationUri.get)
   tableDefinition.copy(
 storage = tableDefinition.storage.copy(locationUri = 
Some(qualifiedTableLocation)),
-identifier = TableIdentifier(table, Some(db)))
+identifier = tableIdentifier)
 } else {
-  tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+  tableDefinition.copy(identifier = tableIdentifier)
 }
 
 requireDbExists(db)
 externalCatalog.createTable(newTableDefinition, ignoreIfExists)
   }
 
+  def validateTableLocation(table: CatalogTable, tableIdentifier: 
TableIdentifier): Unit = {
+// SPARK-19724: the default location of a managed table should be 
non-existent or empty.
+   if (table.tableType == CatalogTableType.MANAGED) {
+ val tableLocation = new Path(defaultTablePath(tableIdentifier))
--- End diff --

I think managed tables are always created on default path. So the check 
here should be correct.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88799 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88799/testReport)**
 for PR 20886 at commit 
[`f26c08c`](https://github.com/apache/spark/commit/f26c08c1638e2e0caef6283c20cab8364109f0e0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1892/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20940
  
**[Test build #88800 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88800/testReport)**
 for PR 20940 at commit 
[`5e24fc6`](https://github.com/apache/spark/commit/5e24fc6d7e06d9b836fe759407c270ca004637fe).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178477937
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
 ---
@@ -212,11 +210,11 @@ object GenerateColumnAccessor extends 
CodeGenerator[Seq[DataType], ColumnarItera
 
 public InternalRow next() {
   currentRow += 1;
-  bufferHolder.reset();
+  rowWriter.reset();
   rowWriter.zeroOutNullBytes();
   ${extractorCalls}
-  unsafeRow.setTotalSize(bufferHolder.totalSize());
-  return unsafeRow;
+  rowWriter.setTotalSize();
+  return rowWriter.getRow();
--- End diff --

I think that 
[here](https://github.com/apache/spark/pull/20850/files#diff-90b107e5c61791e17d5b4b25021b89fdR349)
 is only a place where we call `getRow()` without calling `setTotalSize()` if 
`numVarLenFields == 0`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-04-01 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r178478113
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
 ---
@@ -212,11 +210,11 @@ object GenerateColumnAccessor extends 
CodeGenerator[Seq[DataType], ColumnarItera
 
 public InternalRow next() {
   currentRow += 1;
-  bufferHolder.reset();
+  rowWriter.reset();
   rowWriter.zeroOutNullBytes();
   ${extractorCalls}
-  unsafeRow.setTotalSize(bufferHolder.totalSize());
-  return unsafeRow;
+  rowWriter.setTotalSize();
+  return rowWriter.getRow();
--- End diff --

Should we call `reset()` and `setTotalSize()` as the interpreted version 
does?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88799 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88799/testReport)**
 for PR 20886 at commit 
[`f26c08c`](https://github.com/apache/spark/commit/f26c08c1638e2e0caef6283c20cab8364109f0e0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...

2018-04-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88799/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20937: [SPARK-23723][SPARK-23724][SQL] Support custom en...

2018-04-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r178476075
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/CreateJacksonParser.scala
 ---
@@ -39,11 +40,36 @@ private[sql] object CreateJacksonParser extends 
Serializable {
 jsonFactory.createParser(new InputStreamReader(bain, "UTF-8"))
   }
 
-  def text(jsonFactory: JsonFactory, record: Text): JsonParser = {
-jsonFactory.createParser(record.getBytes, 0, record.getLength)
+  def text(jsonFactory: JsonFactory, record: Text, encoding: 
Option[String] = None): JsonParser = {
+encoding match {
--- End diff --

Looks we create a partial function and then use it and therefore it's going 
to do the type dispatch for every record. Can we avoid this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20937: [SPARK-23723][SPARK-23724][SQL] Support custom en...

2018-04-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r178476931
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
 ---
@@ -92,32 +93,34 @@ object TextInputJsonDataSource extends JsonDataSource {
   sparkSession: SparkSession,
   inputPaths: Seq[FileStatus],
   parsedOptions: JSONOptions): StructType = {
-val json: Dataset[String] = createBaseDataset(
-  sparkSession, inputPaths, parsedOptions.lineSeparator)
+val json: Dataset[String] = createBaseDataset(sparkSession, 
inputPaths, parsedOptions)
+
 inferFromDataset(json, parsedOptions)
   }
 
   def inferFromDataset(json: Dataset[String], parsedOptions: JSONOptions): 
StructType = {
 val sampled: Dataset[String] = JsonUtils.sample(json, parsedOptions)
-val rdd: RDD[UTF8String] = 
sampled.queryExecution.toRdd.map(_.getUTF8String(0))
-JsonInferSchema.infer(rdd, parsedOptions, 
CreateJacksonParser.utf8String)
+val rdd: RDD[InternalRow] = sampled.queryExecution.toRdd
+
+JsonInferSchema.infer[InternalRow](
+  rdd,
+  parsedOptions,
+  CreateJacksonParser.internalRow(_, _, 0, parsedOptions.encoding)
+)
   }
 
   private def createBaseDataset(
   sparkSession: SparkSession,
   inputPaths: Seq[FileStatus],
-  lineSeparator: Option[String]): Dataset[String] = {
-val textOptions = lineSeparator.map { lineSep =>
-  Map(TextOptions.LINE_SEPARATOR -> lineSep)
-}.getOrElse(Map.empty[String, String])
-
+  parsedOptions: JSONOptions
+  ): Dataset[String] = {
--- End diff --

Let's move this line up too. I think it should be either


```
...
parsedOptions: JSONOptions)
  : Dataset[String] = {
```

or

```
parsedOptions: JSONOptions) : Dataset[String] = {
```
https://github.com/databricks/scala-style-guide#spacing-and-indentation

but somehow I believe we usually do the latter style.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20937: [SPARK-23723][SPARK-23724][SQL] Support custom en...

2018-04-01 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r178478519
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2070,9 +2070,9 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   // Read
   val data =
 s"""
-  |  {"f":
-  |"a", "f0": 1}$lineSep{"f":
-  |
+   |  {"f":
+   |"a", "f0": 1}$lineSep{"f":
+   |
--- End diff --

Seems a mistake ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 173 matches

Mail list logo