[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2018-01-02 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20114
  
Thanks @HyukjinKwon and @ueshin !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2018-01-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged to master.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85554/
Test PASSed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85554 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85554/testReport)**
 for PR 20114 at commit 
[`281ffdc`](https://github.com/apache/spark/commit/281ffdc9132829617af28dcb1668e2fa5eddc599).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85554 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85554/testReport)**
 for PR 20114 at commit 
[`281ffdc`](https://github.com/apache/spark/commit/281ffdc9132829617af28dcb1668e2fa5eddc599).


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20114
  
retest this please


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85552/
Test FAILed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85552 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85552/testReport)**
 for PR 20114 at commit 
[`281ffdc`](https://github.com/apache/spark/commit/281ffdc9132829617af28dcb1668e2fa5eddc599).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85552/testReport)**
 for PR 20114 at commit 
[`281ffdc`](https://github.com/apache/spark/commit/281ffdc9132829617af28dcb1668e2fa5eddc599).


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85533/
Test PASSed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85533/testReport)**
 for PR 20114 at commit 
[`25cf41c`](https://github.com/apache/spark/commit/25cf41c8ba804a7a6e8fbf9ebaf9498ce03fb063).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20114
  
The new workaround seems to work fine. I also added another test with null 
array values, so that case is now tested along with all non-null values.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85533/testReport)**
 for PR 20114 at commit 
[`25cf41c`](https://github.com/apache/spark/commit/25cf41c8ba804a7a6e8fbf9ebaf9498ce03fb063).


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20114
  
> How about simply returning false from ArrowVectorAccessor.isNullAt(int 
rowId) when accessor.getValueCount() > 0 && 
accessor.getValidityBuffer().capacity() == 0

Good idea @ueshin , I think this should be fine as we are only querying the 
validity buffer in the call to `isNullAt`.  I'll give it a try!


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85505/
Test PASSed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85505 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85505/testReport)**
 for PR 20114 at commit 
[`d2c5c2b`](https://github.com/apache/spark/commit/d2c5c2b4ea803ac8d1f08a5f79af1076f9e5bd2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85505 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85505/testReport)**
 for PR 20114 at commit 
[`d2c5c2b`](https://github.com/apache/spark/commit/d2c5c2b4ea803ac8d1f08a5f79af1076f9e5bd2b).


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20114
  
Jenkins, retest this please.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20114
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85499/
Test FAILed.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85499 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85499/testReport)**
 for PR 20114 at commit 
[`d2c5c2b`](https://github.com/apache/spark/commit/d2c5c2b4ea803ac8d1f08a5f79af1076f9e5bd2b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-28 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20114
  
How about simply returning `false` from `ArrowVectorAccessor.isNullAt(int 
rowId)` when `accessor.getValueCount() > 0 && 
accessor.getValidityBuffer().capacity() == 0` without modifying the buffer?
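
A minimal sketch of this suggestion, written in Python rather than against the real Java classes. The function name `is_null_at` and the use of plain `bytes` for the validity buffer are assumptions for illustration only; the one Arrow detail relied on is that validity bitmaps use least-significant-bit numbering.

```python
def is_null_at(validity: bytes, value_count: int, row_id: int) -> bool:
    """Report whether row_id is null without ever modifying the buffer.

    `validity` is a hypothetical stand-in for the vector's validity
    buffer: one bit per value, least-significant bit first.
    """
    # An empty validity buffer on a non-empty vector is read as
    # "every value is non-null", so short-circuit to False.
    if value_count > 0 and len(validity) == 0:
        return False
    # Otherwise consult the bitmap: a set bit means the value is valid.
    bit = (validity[row_id // 8] >> (row_id % 8)) & 1
    return bit == 0
```

Because nothing is allocated or written, the check stays valid no matter how often the underlying buffers are swapped out.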


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-28 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20114
  
ping @ueshin @HyukjinKwon

Unfortunately, there was a bug on the Java side of the Arrow 0.8.0 release 
(https://issues.apache.org/jira/browse/ARROW-1948) that caused a problem here.  I 
was able to find a workaround, but it required a change to the 
`ArrowVectorAccessor` class.  I'm not sure whether this is something you would be 
OK with putting in, or whether you would prefer to wait until the next minor 
release to add the ArrayType support.

The Arrow spec states that an empty validity buffer means all the values are 
non-null.  In Arrow 0.8.0, the C++/Python side started sending buffers this way, 
but the Arrow ListVector on the Java side did not handle it properly and instead 
assumed there were no valid values.

The workaround I added here checks whether the ListVector has a value count 
> 0 and an empty validity buffer.  If so, all the values are non-null, and the 
workaround allocates a new validity buffer with all bits set.
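
The buffer-filling step described above can be sketched as follows. This is an illustrative Python model, not the Java change in the PR: `ensure_validity` and `is_valid` are hypothetical names, and the validity buffer is modelled as plain `bytes` with Arrow's least-significant-bit numbering.

```python
def ensure_validity(validity: bytes, value_count: int) -> bytes:
    """Return a usable validity bitmap for a vector of value_count values.

    If the vector has values but its bitmap is empty, build a new bitmap
    with every bit set, meaning "all values are non-null".
    """
    if value_count > 0 and len(validity) == 0:
        n_bytes = (value_count + 7) // 8  # one bit per value, rounded up
        return bytes([0xFF]) * n_bytes
    return validity


def is_valid(validity: bytes, row_id: int) -> bool:
    """Read bit row_id of the bitmap (least-significant bit first)."""
    return (validity[row_id // 8] >> (row_id % 8)) & 1 == 1
```

In this sketch the padding bits past `value_count` are also set; they are never read as long as callers stay within the value count.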

For Arrow without udfs (toPandas and createDataFrame) this only needs to 
be done once, but for udfs each batch read loads new buffers into the Arrow 
VectorSchemaRoot, so the check has to run after each read.  The simplest 
place to put the workaround to cover these cases was to allow 
`ArrowVectorAccessor.isNullAt(int rowId)` to be overridden.  Let me know what 
you think, thanks!
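
To illustrate why the check is placed in the null lookup rather than run once up front, here is a small Python model of the udf case. `FakeListVector` and `ListAccessor` are hypothetical stand-ins, not Arrow or Spark classes; the only behaviour assumed is that each batch read replaces the vector's buffers.

```python
class FakeListVector:
    """Hypothetical stand-in for a list vector whose buffers are reloaded."""

    def __init__(self) -> None:
        self.validity = b""
        self.value_count = 0

    def load(self, validity: bytes, value_count: int) -> None:
        # A batch read replaces the buffers wholesale, discarding any
        # bitmap that was patched in for the previous batch.
        self.validity = validity
        self.value_count = value_count


class ListAccessor:
    """Hypothetical accessor that re-checks the bitmap on every lookup."""

    def __init__(self, vector: FakeListVector) -> None:
        self.vector = vector

    def is_null_at(self, row_id: int) -> bool:
        v = self.vector
        if v.value_count > 0 and len(v.validity) == 0:
            # Empty bitmap on a non-empty batch: fill in an all-set bitmap.
            v.validity = bytes([0xFF]) * ((v.value_count + 7) // 8)
        return (v.validity[row_id // 8] >> (row_id % 8)) & 1 == 0


# Usage: the same accessor stays correct across three successive batches.
vec = FakeListVector()
acc = ListAccessor(vec)
vec.load(b"", 3)             # batch 1: empty bitmap, all values non-null
first = [acc.is_null_at(i) for i in range(3)]
vec.load(bytes([0b101]), 3)  # batch 2: explicit bitmap, row 1 is null
second = [acc.is_null_at(i) for i in range(3)]
vec.load(b"", 9)             # batch 3: empty bitmap again, more values
third = [acc.is_null_at(i) for i in range(9)]
```

A one-time fix applied after the first load would be lost when the third batch arrives, which is the situation the per-lookup check is meant to cover.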


---




[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...

2017-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20114
  
**[Test build #85499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85499/testReport)**
 for PR 20114 at commit 
[`d2c5c2b`](https://github.com/apache/spark/commit/d2c5c2b4ea803ac8d1f08a5f79af1076f9e5bd2b).


---
