[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15821 In the future we should revert PRs that fail builds IMMEDIATELY. There is no way we should've let the build be broken for days. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 Sorry, I didn't know there was a PR fixing the issue, and I had already reverted it. Please cherry-pick this commit in the new PR and apply the pip fix. Sorry for the trouble.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 @cloud-fan could you wait until the latest test from #18443 finishes? It should be done soon, but it's failed twice so far due to unrelated errors. If it fails a third time, then I agree

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 I'm against #18439; I'd rather revert this and fix it later than install packages without SSL.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 FWIW, I am not against reverting. I just want to provide some context in case it was missed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 Should we maybe wait for https://github.com/apache/spark/pull/18443? Actually, I think there is an alternative for this - https://github.com/apache/spark/pull/18439 rather than reverting whole

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 Wait @cloud-fan! Just want to ask a question.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 Some PRs have been blocked by these failures for days. I'm reverting it; @BryanCutler, please reopen this PR after fixing the pip stuff, thanks!

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 the last build still failed, shall we update `dev/run-pip-tests` to use pip?

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 Great, thanks @shaneknapp . @BryanCutler I've got a webinar and if you don't have a chance to change the tests around until after I'm done teaching I'll do it, but if your flight lands first then

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 roger copy... "latest" is 0.4.1, which is what's currently on the jenkins workers.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 I recommend using the latest. The data format is forward/backward compatible so the JAR doesn't necessarily need to be 0.4.1 if you're using pyarrow 0.4.1 (0.4.1 fixed a Decimal regression in the Java

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 btw, do we want pyarrow-0.4.0 or -0.4.1? i'm assuming the latter based on https://github.com/apache/spark/pull/15821#issuecomment-310905209

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Thanks for doing this @shaneknapp and @holdenk! I'm about to hop on a plane but should be online later this afternoon. I can switch out the pyarrow tests then if it still needs to be

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 ``` (py3k)-bash-4.1$ pip install pyarrow Requirement already satisfied: pyarrow in /home/anaconda/envs/py3k/lib/python3.4/site-packages Requirement already satisfied: six>=1.0.0 in

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 done

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 anyways: installing pyarrow right now.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 yeah, i think you're right. however, upgrading to a new version of conda on a live environment does indeed scare me a little bit. :) w/the new jenkins, i'll have a staging server

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 I can do the test updating assuming that @BryanCutler is traveling. I've got a webinar this afternoon but I can do it after I'm done with that. Also I don't think it's the wild card issue

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 okie dokie. how about this: i install pyarrow in the py3k conda environment right now... once that's done, we can remove the pyarrow test from run-pip-tests and add it to the regular
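Moving the pyarrow test into the regular suite means that suite has to cope with environments where pyarrow may or may not be installed. A minimal sketch of the usual `unittest` pattern, assuming only the standard library plus an optional pyarrow import: detect the dependency once and skip, rather than fail, the tests that need it. The class, flag, and test names are hypothetical; this is not Spark's actual test code.

```python
import unittest

# Detect the optional dependency once, at import time.
try:
    import pyarrow  # noqa: F401
    HAVE_ARROW = True
except ImportError:
    HAVE_ARROW = False


class ArrowSmokeTest(unittest.TestCase):
    # Reported as skipped (not failed) when pyarrow is absent.
    @unittest.skipIf(not HAVE_ARROW, "pyarrow is not installed")
    def test_build_array(self):
        import pyarrow as pa
        arr = pa.array([1, 2, 3])
        self.assertEqual(len(arr), 3)


def run_suite():
    """Run the hypothetical Arrow tests and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArrowSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The same file then passes on a worker without pyarrow (one skip) and actually exercises Arrow on a worker that has it.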

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 @shaneknapp your understanding of what the run-pip-tests code does is pretty much correct. It's important to note that part of the test is installing the pyspark package itself to make sure we didn't break

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 i agree w/@MaheshIBM that we're looking at a bad CA cert. i think we're looking at a problem on continuum.io's side, not our side. however, i do not like the thought of ignoring certs

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread MaheshIBM
Github user MaheshIBM commented on the issue: https://github.com/apache/spark/pull/15821 That leads me to believe that the download request could be resolving to different hosts every time; can that happen if there is a CDN working in the background? Not all hosts are configured to

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 It's not looking like the SSL Verification Error is the issue, there are a handful of recent builds that have passed after getting that same error, see below. Maybe something else is

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread MaheshIBM
Github user MaheshIBM commented on the issue: https://github.com/apache/spark/pull/15821 This does not seem like a timeout issue; the certificate CN and what is used as the hostname do not match. So clearly the client downloads the certificate but is not able to verify (no
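The check MaheshIBM describes compares the hostname the client requested against the names listed on the server's certificate. A simplified sketch, assuming RFC 6125-style rules (case-insensitive labels, a wildcard allowed only as the whole left-most label, matching exactly one label); the function names are hypothetical, and real TLS libraries apply more rules than this. The sample names come from the error quoted in felixcheung's comment, whose list is truncated in this archive.

```python
def hostname_matches(pattern, hostname):
    """Return True if certificate name `pattern` covers `hostname`.

    Simplified illustration, not a full validator: labels compare
    case-insensitively, and "*" is honoured only as the entire
    left-most label, where it stands for exactly one label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard replaces exactly one label, so label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    # Everything right of the first label must match exactly.
    if p_labels[1:] != h_labels[1:]:
        return False
    return p_labels[0] == "*" or p_labels[0] == h_labels[0]


def verify(hostname, cert_names):
    """Return True if any name on the certificate covers `hostname`."""
    return any(hostname_matches(name, hostname) for name in cert_names)


# Names visible in the (truncated) SSL error from the conda download.
names = ["anaconda.com", "anacondacloud.com", "anacondacloud.org", "binstar.org"]
print(verify("conda.binstar.org", names))            # False
print(verify("conda.binstar.org", ["*.binstar.org"]))  # True
```

Under these rules `binstar.org` covers only the bare domain; a subdomain such as `conda.binstar.org` needs its own entry or a `*.binstar.org` wildcard, which is why the mismatch points at the certificate rather than at a timeout.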

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 @shaneknapp it might, assuming the Conda cache is shared it should avoid needing to fetch the package. I'm not super sure but I think we might have better luck updating conda on the jenkins

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 @holdenk yeah, another set of eyes would be great! i haven't actually touched the test infra code in a long time and i'm currently wrapping my brain around the order of operations that

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 @shaneknapp let me know if you want some help poking at Jenkins.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 ok, @JoshRosen and i will bang our respective heads against this in about an hour. we should be able to figure something out pretty quick.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 @BryanCutler -- i'm ok w/holding out to discuss this in more detail tomorrow. in the meantime, i'll look over this PR and build failures and get myself up to speed w/what's going on.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Sorry I'm out of town right now and not able to really look into this until tomorrow. Is it the run-pip-tests script that's causing the failures? If so maybe we can install pyarrow with

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/15821 hmm. currently thinking about this. thanks for the ping, shiv.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15821 cc @shaneknapp

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 If it's still failing builds we should revert, fix the issue, and re-merge once it's fixed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 @srowen @cloud-fan adding the steps from https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh that update conda to the latest version and increasing the SSL timeout should fix the

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 @JoshRosen do we have a jenkins setup script like https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L37 ? I think we need to make conda up to date and increase the

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15821 CC @cloud-fan @BryanCutler is there an easy fix or do we need to revert this temporarily? it's failing the builds

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 Is your conda up to date? It's a best practice to always update to the latest conda

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15821 it's because of this ``` .SSL verification error: hostname 'conda.binstar.org' doesn't match either of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org', 'binstar.org',

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15821 Seems it already asks to search in the conda-forge channel? https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-pip-tests#L86

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 To my knowledge, the pointer is in the right place, via ... https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-tests#L23 ->

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 I only see the package referenced here https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-pip-tests#L86 -- where is the packaging build that @srowen is referencing

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 Would you mind opening a PR for this? I guess updating it would probably be done in a follow-up, but this one sounds more like a semi-hotfix. If increasing the timeout and adding the channel are all we need, I

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 You must add the conda-forge channel; I also recommend increasing the timeout for conda which helps make builds more stable, see:

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15821 @cloud-fan @BryanCutler it seems like this is failing a number of the builds with errors like: ``` Running

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Thanks @cloud-fan and all others who helped out with this PR or reviewed!

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 Thanks all! Apache Arrow has advanced a great deal since November, so I expect we can make a number of follow up PRs to support more data types and optimize use of the streaming record batch machinery

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 Updating the setup seems like a good follow-up PR, yes. I think the test hack might make sense to keep until the Jenkins refactoring.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 Let's remove the test hack https://github.com/apache/spark/pull/15821/files#r111512686 in a follow-up and make Arrow a requirement in `setup.py`, any thoughts? @HyukjinKwon @holdenk

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 thanks, merging to master!

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 @cloud-fan I updated with your recent patch for #18378 and cleaned up the related Arrow test. Let me know if it looks ok now, thanks!

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78469/ Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78469/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78469/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 I will actually take it back. This could be checked and done in a follow-up (including a doc update). I see this PR is already quite big.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15821 BTW, do we need to update `setup.py` too?

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 we can update the test after merging https://github.com/apache/spark/pull/18378

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78383/ Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78383/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78383/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 retest this please

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread leifwalsh
Github user leifwalsh commented on the issue: https://github.com/apache/spark/pull/15821 @BryanCutler awesome, thanks. I'll test ASAP but I believe you, don't block merge on my account.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Thanks @cloud-fan. I commented above on the reason for the type differences, but basically without arrow `IntegerType` and `FloatType` were getting up-converted to `int64` and `float64`. Even
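A toy model of the type difference BryanCutler describes, assuming pandas-style inference in which Python `int` values become `int64` and `float` values become `float64`: a columnar (Arrow) path can keep the width that the Spark schema declares, while a row-based path only has Python objects left, and those carry no width. The function names and the mapping table are illustrative only, not PySpark internals.

```python
# Declared widths of some Spark SQL numeric types, as NumPy dtype names.
DECLARED_DTYPE = {
    "ByteType": "int8",
    "ShortType": "int16",
    "IntegerType": "int32",
    "LongType": "int64",
    "FloatType": "float32",
    "DoubleType": "float64",
}


def dtype_from_schema(spark_type):
    """Columnar path (hypothetical): keep the width the schema declares."""
    return DECLARED_DTYPE[spark_type]


def _is_number(v, kinds):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, kinds) and not isinstance(v, bool)


def dtype_from_python_values(values):
    """Row-based path (hypothetical): infer a dtype from Python objects.

    Python ints and floats have no fixed width, so inference falls back
    to the widest common defaults, int64 and float64.
    """
    if all(_is_number(v, int) for v in values):
        return "int64"
    if all(_is_number(v, (int, float)) for v in values):
        return "float64"
    return "object"


# An IntegerType column: the schema says int32, the Python rows say int64.
print(dtype_from_schema("IntegerType"), dtype_from_python_values([1, 2, 3]))
```

In this model the optimization does not change the values, only the dtype that ends up in the pandas DataFrame, which matches the kind of difference being discussed.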

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78331/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 LGTM, my last concern is https://github.com/apache/spark/pull/15821#discussion_r122925584 Ideally an optimization should never change the result, can you investigate why we have different

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78265/ Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78265/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #78265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78265/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 yea I think it's fine to keep `ArrowPayload`

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-15 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Thank you for the review and good questions @cloud-fan! Let me know if you're still opposed to keeping the `ArrowPayload` class as is, otherwise I'll push an update for the `VarCharVector`

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15821 mostly LGTM, thanks for working on it!

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-30 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Hi @rxin, this has been upgraded to Arrow 0.4 and all tests have passed. Scala unit tests have been changed to inline JSON data from your request. Please take another look when possible,

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77429/ Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77429/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77429/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test FAILed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77401/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77401/ Test FAILed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77401/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Jenkins retest this please

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77390/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-22 Thread wesm
Github user wesm commented on the issue: https://github.com/apache/spark/pull/15821 FYI for others: Arrow 0.3 and 0.4 are backward- and forward-compatible at the binary format level. The 0.4 release contains bug fixes and new features in the Python bindings. The release vote is closing today,

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-22 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 A quick update: I'm not sure why the pip tests failed; hopefully it was just a fluke with the worker. I'm waiting to retest until I can also update to Arrow 0.4, which includes a relevant bug fix

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77032/ Test FAILed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test FAILed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)** for PR 15821 at commit

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/15821 >@icexelloss, yes, Arrow supports it, but Spark stores timestamps in a different way, which caused some complication. After talking with Holden, we agreed it was better to keep this PR to simple

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 >@BryanCutler, are Timestamp and Date types supported now with Arrow 0.3? @icexelloss, yes, Arrow supports it, but Spark stores timestamps in a different way, which caused some

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 No problem @rxin, I will restructure the tests so that the JSON data is local to each test, and ping you when done.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15821 @BryanCutler even though the json is long, it is still so much clearer than reading a pile of code that generates json ...

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-15 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/15821 @BryanCutler, are Timestamp and Date types supported now with Arrow 0.3?

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 @rxin I have updated this to use Arrow 0.3 and addressed your other comments. Could you please give it another look when possible? Following up on a couple of issues: >Use SQLConf rather

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76754/ Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test PASSed.

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #76754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76754/testReport)** for PR 15821 at commit
