[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15821
  
Please move ArrowConverters.scala somewhere else that's not top level, e.g. 
execution.arrow


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler Are you going to update this for arrow 0.3?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-20 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
Pre-release builds of Arrow 0.3 are now on conda-forge, which should help 
with testing https://anaconda.org/conda-forge/pyarrow/files





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-20 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thank you for the review @rxin!  I will work on an update for the issues 
you brought up, and updating for Arrow 0.3 should clean up some things and 
offer more type support.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76210/testReport)**
 for PR 15821 at commit 
[`b6fe733`](https://github.com/apache/spark/commit/b6fe733955d6e153722b1945c09ed663d8ed9be2).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76210/testReport)**
 for PR 15821 at commit 
[`b6fe733`](https://github.com/apache/spark/commit/b6fe733955d6e153722b1945c09ed663d8ed9be2).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76210/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Updated to work with the latest Arrow to prepare for 0.3 release (tests 
should fail because that artifact is not yet available).  Also improved 
consistency of ArrowConverters and did some cleanup.  From @rxin 's comments:

> Move ArrowConverters.scala somewhere else that's not top level, e.g. 
execution.arrow

It is now in the o.a.s.sql.execution.arrow package

> Update this to arrow 0.3

Ready to do this, should just need to update the pom again

>Use SQLConf rather than a parameter for toPandas.

I removed this flag and used the conf "spark.sql.execution.arrow.enable" 
which defaults to "false"

>Handle failure gracefully if arrow is not installed (or somehow package it 
with Spark?)

It would be difficult to package with Spark, I think, because pyarrow also 
depends on the native Arrow cpp library.  I changed it to fail gracefully if 
pyarrow is not available.  The error message is:
```
ImportError: No module named pyarrow
note: pyarrow must be installed and available on calling Python process if 
using spark.sql.execution.arrow.enable=true
```
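
As a rough illustration of the fail-fast behaviour described above, here is a minimal sketch in plain Python. The helper name `require_module` is hypothetical and is not taken from the PR; only the conf name and the wording of the note come from this comment.

```python
import importlib

# Conf name as given in this thread.
ARROW_CONF = "spark.sql.execution.arrow.enable"


def require_module(name, conf_key=ARROW_CONF):
    """Import `name`, or stop with an error that names the conf requiring it.

    Hypothetical helper for illustration only; not the code in the PR.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Re-raise with the original message plus a note on how to fix it.
        raise ImportError(
            "%s\nnote: %s must be installed and available on calling Python "
            "process if using %s=true" % (exc, name, conf_key)
        ) from exc
```

The point of the sketch is that the caller gets an exception instead of a silent fallback to the non-Arrow path, so a missing dependency cannot go unnoticed.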

>How are the memory managed? Who allocates the memory for the arrow 
records, and who's responsible for releasing them?

The Java side of Arrow requires using a BufferAllocator class that manages 
the allocated memory.  An instance of this must be used each time an 
ArrowRecordBatch is created, and then the batch and allocator must be 
released/closed after they have been processed.  This is all handled in the 
`ArrowConverters` functions.  On the Python side, buffers are allocated from the 
Arrow cpp library and cleaned up when the objects' reference counts reach zero. 
 The end user does not have to worry about managing any memory.






[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-26 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
Note: we are shooting for an Arrow RC in the Monday time frame, so with luck 
we'll have a release cut next week





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-04-27 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/15821
  
>  An instance of this must be used each time a ArrowRecordBatch is created 
and then the batch and allocator must be released/closed after they have been 
processed

I think it would be useful to add tests that check for memory leaks in error 
cases, for instance:
* Have a dataframe that throws an exception after n rows. Invoke the arrow 
conversion function, and check allocator memory usage.
* Have a dataframe that is slow, invoke the arrow conversion function, 
cancel the task, and check allocator memory usage.
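
The first case can be sketched in plain Python as below. This only shows the shape of such a test: `TrackingAllocator`, `failing_rows` and `convert` are hypothetical stand-ins invented for this sketch, not Arrow's BufferAllocator or the conversion code in the PR.

```python
class TrackingAllocator:
    """Toy stand-in for an allocator: it only counts outstanding bytes."""

    def __init__(self):
        self.allocated = 0

    def allocate(self, nbytes):
        self.allocated += nbytes
        return nbytes  # the "buffer" is just its size in this sketch

    def release(self, nbytes):
        self.allocated -= nbytes


def failing_rows(n):
    """Yield n rows, then raise, mimicking a dataframe that fails mid-scan."""
    for i in range(n):
        yield i
    raise RuntimeError("row source failed after %d rows" % n)


def convert(rows, allocator, bytes_per_row=8):
    """Hypothetical conversion: allocate per row, always release in `finally`."""
    held = []
    try:
        for _ in rows:
            held.append(allocator.allocate(bytes_per_row))
        return len(held)
    finally:
        # Runs on success and on error, so no buffer outlives the call.
        for buf in held:
            allocator.release(buf)


allocator = TrackingAllocator()
try:
    convert(failing_rows(3), allocator)
except RuntimeError:
    pass
assert allocator.allocated == 0  # nothing leaked on the error path
```

A real test would presumably make the same assertion against the allocator's own accounting of outstanding memory after the failed conversion.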





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76708/testReport)**
 for PR 15821 at commit 
[`a4d6057`](https://github.com/apache/spark/commit/a4d6057642a922c4beb5b396591ba9f1b5e3f883).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76708/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76708 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76708/testReport)**
 for PR 15821 at commit 
[`a4d6057`](https://github.com/apache/spark/commit/a4d6057642a922c4beb5b396591ba9f1b5e3f883).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76710 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76710/testReport)**
 for PR 15821 at commit 
[`934c147`](https://github.com/apache/spark/commit/934c147cf41752d382ee6ae304ed18ca5bed73e4).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76710 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76710/testReport)**
 for PR 15821 at commit 
[`934c147`](https://github.com/apache/spark/commit/934c147cf41752d382ee6ae304ed18ca5bed73e4).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76710/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Jenkins retest this please





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76754/testReport)**
 for PR 15821 at commit 
[`934c147`](https://github.com/apache/spark/commit/934c147cf41752d382ee6ae304ed18ca5bed73e4).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #76754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76754/testReport)**
 for PR 15821 at commit 
[`934c147`](https://github.com/apache/spark/commit/934c147cf41752d382ee6ae304ed18ca5bed73e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76754/
Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-10 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
@rxin I have updated this to use Arrow 0.3 and addressed your other 
comments, could you please give it another look when possible?  Following up on 
a couple issues:

>Use SQLConf rather than a parameter for toPandas.

I removed this flag and used the conf "spark.sql.execution.arrow.enable" 
which defaults to "false", and also added 
"spark.sql.execution.arrow.maxRecordsPerBatch" to limit memory usage, still 
under discussion.
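
To illustrate why a per-batch record limit bounds memory, here is a minimal sketch of the general idea in plain Python. It is not the PR's implementation, and the function name `batches` and the requirement that the limit be positive are assumptions of this sketch.

```python
from itertools import islice


def batches(rows, max_records_per_batch):
    """Yield lists of at most `max_records_per_batch` rows from any iterable.

    Only one batch is materialized at a time, which is what caps memory use.
    """
    if max_records_per_batch <= 0:
        # Assumption of this sketch: the limit must be positive.
        raise ValueError("max_records_per_batch must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, max_records_per_batch))
        if not batch:
            return
        yield batch


print(list(batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```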

>rather than defining the json using objects and serialize them, can we 
just put the json as a string inline? that'd be much easier to inspect ...

Here is a sample of a simple JSON file the tests use.  It contains metadata 
and validity array in addition to the raw data, and ends up being a fairly 
large string which is why I opt for generating the file instead.

```
{
  "schema": {
    "fields": [
      {
        "name": "nullable_int",
        "type": {"name": "int", "isSigned": true, "bitWidth": 32},
        "nullable": true,
        "children": [],
        "typeLayout": {
          "vectors": [
            {"type": "VALIDITY", "typeBitWidth": 1},
            {"type": "DATA", "typeBitWidth": 32}
          ]
        }
      }
    ]
  },

  "batches": [
    {
      "count": 6,
      "columns": [
        {
          "name": "nullable_int",
          "count": 6,
          "VALIDITY": [1, 0, 0, 1, 0, 1],
          "DATA": [1, -1, 2, -2, 2147483647, -2147483648]
        }
      ]
    }
  ]
}
```
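
For readers unfamiliar with the layout, the following sketch shows how the validity array lines up with the raw data in the sample above. It is plain Python over the batch portion of that JSON; the helper name `column_values` is hypothetical, and treating a validity of 0 as null is this sketch's reading of the sample rather than code from the PR.

```python
import json

# The "batches" entry from the sample above, reproduced so the sketch runs alone.
BATCH_JSON = """
{
  "count": 6,
  "columns": [
    {
      "name": "nullable_int",
      "count": 6,
      "VALIDITY": [1, 0, 0, 1, 0, 1],
      "DATA": [1, -1, 2, -2, 2147483647, -2147483648]
    }
  ]
}
"""


def column_values(column):
    """Pair each DATA slot with its VALIDITY entry; 0 means the slot is null."""
    return [value if valid else None
            for valid, value in zip(column["VALIDITY"], column["DATA"])]


batch = json.loads(BATCH_JSON)
values = column_values(batch["columns"][0])
print(values)  # [1, None, None, -2, None, -2147483648]
```

Note that the DATA slots under a 0 validity entry still hold values (-1, 2, 2147483647 here); they are simply ignored when the column is read.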

>Handle failure gracefully if arrow is not installed (or somehow package it 
with Spark?)

I just want to make sure I took this the right way. It should stop 
execution and print out an error with a clear message, rather than log a 
message and then continue execution without using pyarrow, correct?






[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-15 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler , are the Timestamp and Date types supported now with Arrow 0.3?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler even though the json is long, it is still so much clearer than 
reading a pile of code that generates json ...






[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
No problem @rxin , I will restructure the tests so that the json data is 
local to each test, and ping you when done.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
>@BryanCutler , is Timestamp and Date type supported now with Arrow 0.3?

@icexelloss , yes, Arrow supports it, but Spark stores timestamps in a 
different way, which caused some complications.  After talking with Holden, we 
agreed it was better to keep this PR to simple data types only and extend type 
support in a follow-up PR.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-16 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/15821
  
>@icexelloss , yes, Arrow supports it, but Spark stores timestamps in a 
different way, which caused some complications. After talking with Holden, we 
agreed it was better to keep this PR to simple data types only and extend type 
support in a follow-up PR.

Got it. Can you share some details?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)**
 for PR 15821 at commit 
[`b4eebc2`](https://github.com/apache/spark/commit/b4eebc27e261eddb4d8b0b829245fa3c187dade1).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77032 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)**
 for PR 15821 at commit 
[`b4eebc2`](https://github.com/apache/spark/commit/b4eebc27e261eddb4d8b0b829245fa3c187dade1).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77032/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
A quick update - I'm not sure why the pip tests failed, hopefully just a 
fluke with the worker.  I'm waiting to retest until I can also update to Arrow 
0.4, which includes a relevant bug fix and should be released any day.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-22 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
FYI for others: Arrow 0.3 and 0.4 are backwards/forwards compatible at the 
binary format level. The 0.4 release contains bug fixes and new features in the 
Python bindings. The release vote is closing today; if it passes, we will 
try to get conda-forge and PyPI artifacts published in the next 24 hours.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77390/testReport)**
 for PR 15821 at commit 
[`d49a14d`](https://github.com/apache/spark/commit/d49a14daea3a5e92c2cfdf579373ca13b96c20e5).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Jenkins retest this please





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77401/testReport)**
 for PR 15821 at commit 
[`d49a14d`](https://github.com/apache/spark/commit/d49a14daea3a5e92c2cfdf579373ca13b96c20e5).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77401/testReport)**
 for PR 15821 at commit 
[`d49a14d`](https://github.com/apache/spark/commit/d49a14daea3a5e92c2cfdf579373ca13b96c20e5).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77401/





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77429/testReport)**
 for PR 15821 at commit 
[`a630bf0`](https://github.com/apache/spark/commit/a630bf0d867c31be10660f25ae0d9b185dfa00e2).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77429 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77429/testReport)**
 for PR 15821 at commit 
[`a630bf0`](https://github.com/apache/spark/commit/a630bf0d867c31be10660f25ae0d9b185dfa00e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77429/





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-30 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Hi @rxin, this has been upgraded to Arrow 0.4 and all tests have passed.  
Scala unit tests have been changed to use inline JSON data, per your request.  
Please take another look when possible, thanks!





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
mostly LGTM, thanks for working on it!





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-15 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thank you for the review and good questions @cloud-fan!  Let me know if 
you're still opposed to keeping the `ArrowPayload` class as is, otherwise I'll 
push an update for the `VarCharVector` string writer.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
yea I think it's fine to keep `ArrowPayload`





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78265/testReport)**
 for PR 15821 at commit 
[`8bff966`](https://github.com/apache/spark/commit/8bff966b637ee35a8c1cb051c7eb700f017e4d71).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78265/testReport)**
 for PR 15821 at commit 
[`8bff966`](https://github.com/apache/spark/commit/8bff966b637ee35a8c1cb051c7eb700f017e4d71).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78265/





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
LGTM, my last concern is 
https://github.com/apache/spark/pull/15821#discussion_r122925584

Ideally an optimization should never change the result. Can you investigate why 
we get different results for int and float?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78331/testReport)**
 for PR 15821 at commit 
[`f96f555`](https://github.com/apache/spark/commit/f96f555e1a3b8aabc7949d4b355f3af3b0e78b5a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thanks @cloud-fan.  I commented above on the reason for the type 
differences, but basically, without Arrow, `IntegerType` and `FloatType` were 
getting up-converted to `int64` and `float64`.  Even though this shouldn't 
change any data, maybe it would be good to document this change somewhere?

@leifwalsh I also added a check for `concat_tables()` in case all records 
are filtered out and tables are None.  It will then produce the same 
pandas.DataFrame as without using Arrow, which has columns defined but is empty.
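
Editorial illustration (not part of the original comment): the two behaviours described above can be reproduced with pandas and NumPy alone. This is a minimal sketch under stated assumptions. It does not use Spark or Arrow; the typed NumPy arrays stand in for a transfer that preserves 32-bit column types, and `concat_or_empty` is a hypothetical helper that uses `pd.concat` in place of the `concat_tables()` check mentioned in the comment.

```python
import numpy as np
import pandas as pd

# Rows as plain Python objects, similar to what a row-by-row collection
# would hand to pandas. pandas infers 64-bit dtypes from Python ints/floats.
rows = [(1, 1.5), (2, 2.5)]
from_objects = pd.DataFrame.from_records(rows, columns=["i", "f"])

# The same values in explicitly typed 32-bit columns (an assumption standing
# in for a schema-preserving columnar transfer). The dtypes stay 32-bit.
from_typed = pd.DataFrame({
    "i": np.array([1, 2], dtype=np.int32),
    "f": np.array([1.5, 2.5], dtype=np.float32),
})


def concat_or_empty(frames, columns):
    """Hypothetical helper: join per-batch frames, or return an empty frame
    that still carries the column names when there is nothing to join."""
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# With no batches at all, the result is empty but keeps its columns.
empty = concat_or_empty([], ["i", "f"])

print(from_objects.dtypes.to_dict())
print(from_typed.dtypes.to_dict())
print(list(empty.columns), len(empty))
```

Widening the 32-bit columns to 64-bit leaves the values in this sample unchanged, which matches the remark that the dtype difference should not change any data.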





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-20 Thread leifwalsh
Github user leifwalsh commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler awesome, thanks. I'll test ASAP but I believe you, don't block 
merge on my account. 





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
retest this please





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78383 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78383/testReport)**
 for PR 15821 at commit 
[`f96f555`](https://github.com/apache/spark/commit/f96f555e1a3b8aabc7949d4b355f3af3b0e78b5a).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78383/testReport)**
 for PR 15821 at commit 
[`f96f555`](https://github.com/apache/spark/commit/f96f555e1a3b8aabc7949d4b355f3af3b0e78b5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78383/





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
we can update the test after merging 
https://github.com/apache/spark/pull/18378





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15821
  
BTW, should we update `setup.py` too?





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15821
  
I will actually take it back. This could be checked and done in a follow-up 
(including the doc update). I see this PR is already quite big.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78469/testReport)**
 for PR 15821 at commit 
[`44d7a2a`](https://github.com/apache/spark/commit/44d7a2a3fedb4f4bec167d763d0df3d6448bbe49).





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #78469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78469/testReport)**
 for PR 15821 at commit 
[`44d7a2a`](https://github.com/apache/spark/commit/44d7a2a3fedb4f4bec167d763d0df3d6448bbe49).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78469/





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
@cloud-fan I updated with your recent patch for #18378 and cleaned up the 
related Arrow test.  Let me know if it looks ok now, thanks!





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
thanks, merging to master!





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
Let's remove the test hack 
https://github.com/apache/spark/pull/15821/files#r111512686 in a follow-up and 
make Arrow a requirement in `setup.py`. Any thoughts? @HyukjinKwon @holdenk 
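
For illustration only: a common way to keep Arrow optional in a test suite until it becomes a hard requirement is to skip Arrow tests when the package cannot be imported. This is a minimal sketch, not the hack at the linked line (which is not reproduced here); the test class name is hypothetical and `pyarrow` is assumed to be the import name.

```python
import importlib.util
import unittest

# Assumption: "pyarrow" is the import name of the Arrow Python package.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


@unittest.skipIf(not HAVE_PYARROW, "pyarrow is not installed")
class ArrowRoundTripTest(unittest.TestCase):  # hypothetical test case
    def test_round_trip(self):
        # Imported lazily so that collecting the test never fails on import.
        import pyarrow as pa

        arr = pa.array([1, 2, 3])
        self.assertEqual(arr.to_pylist(), [1, 2, 3])
```

Once Arrow is declared as a requirement, a guard like this can simply be deleted so that a missing package fails loudly instead of being skipped.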





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15821
  
Yes, updating the setup seems like a good follow-up PR. I think the test 
hack might make sense to keep until the Jenkins refactoring.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thanks all! Apache Arrow has advanced a great deal since November, so I 
expect we can make a number of follow up PRs to support more data types and 
optimize use of the streaming record batch machinery (if you haven't already!) 
for lower memory utilization and better overall throughput to Python users. 





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Thanks @cloud-fan and all others who helped out with this PR or reviewed!





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15821
  
@cloud-fan @BryanCutler it seems like this is failing a number of the 
builds with errors like:

```

Running PySpark packaging tests

Constucting virtual env for testing
Using conda virtual enviroments
Testing pip installation with python 3.5
Using /tmp/tmp.vAml0iYeCs for virtualenv
Fetching package metadata: 
Solving package specifications: .

Package plan for installation in environment /tmp/tmp.vAml0iYeCs/3.5:

The following NEW packages will be INSTALLED:

mkl:             2017.0.1-0         (soft-link)
numpy:           1.13.0-py35_0      (soft-link)
openssl:         1.0.2l-0           (soft-link)
pandas:          0.20.2-np113py35_0 (soft-link)
pip:             9.0.1-py35_1       (soft-link)
python:          3.5.3-1            (soft-link)
python-dateutil: 2.6.0-py35_0       (soft-link)
pytz:            2017.2-py35_0      (soft-link)
readline:        6.2-2              (soft-link)
setuptools:      27.2.0-py35_0      (soft-link)
six:             1.10.0-py35_0      (soft-link)
sqlite:          3.13.0-0           (soft-link)
tk:              8.5.18-0           (soft-link)
wheel:           0.29.0-py35_0      (soft-link)
xz:              5.2.2-1            (soft-link)
zlib:            1.2.8-3            (soft-link)

Linking packages ...
[]|  |   0%
[mkl-2017.0.1-0 /home/sparkivy/per]| |   0%
[openssl-1.0.2l-0 /home/sparkivy/per]|## |   6%
[readline]|##|  12%
[sqlite-3.13.0-0 /home/sparkivy/per]|##  |  18%
[tk  ]|  |  25%
[xz-5.2.2-1 /home/sparkivy/per]| |  31%
[zlib-1.2.8-3 /home/sparkivy/per]|## |  37%
[python-3.5.3-1 /home/sparkivy/per]| |  43%
[numpy-1.13.0-py35_0 /home/sparkivy/per]||  50%
[pytz-2017.2-py35_0 /home/sparkivy/per]|##   |  56%
[setuptools-27.2.0-py35_0 /home/sparkivy/per]|   |  62%
[six-1.10.0-py35_0 /home/sparkivy/per]|###   |  68%
[wheel-0.29.0-py35_0 /home/sparkivy/per]||  75%
[pip-9.0.1-py35_1 /home/sparkivy/per]|   |  81%
[python-dateutil-2.6.0-py35_0 /home/sparkivy/per]|   |  87%
[pandas-0.20.2-np113py35_0 /home/sparkivy/per]|  |  93%
[  COMPLETE  ]|##| 100%
#
# To activate this environment, use:
# $ source activate /tmp/tmp.vAml0iYeCs/3.5
#
# To deactivate this environment, use:
# $ source deactivate
#
discarding /home/anaconda/bin from PATH
prepending /tmp/tmp.vAml0iYeCs/3.5/bin to PATH
Fetching package metadata: ..SSL verification error: hostname 
'conda.binstar.org' doesn't match either of 'anaconda.com', 
'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 'wakari.io'
.SSL verification error: hostname 'conda.binstar.org' doesn't match either 
of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 
'wakari.io'
...
Solving package specifications: .
Error:  Package missing in current linux-64 channels: 
  - pyarrow 0.4|0.4.0*
```





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
You must add the conda-forge channel; I also recommend increasing the 
timeout for conda which helps make builds more stable, see:

https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L37

Also, I recommend updating to pyarrow 0.4.1 as soon as practical (only bug 
fixes)
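
The three steps suggested above (add the conda-forge channel, raise the connect timeout, update conda) can be sketched as a small command list. This is an illustration under assumptions: the option names should be checked against the conda version on the build machine, the timeout value `12` is only an example, and `run_conda_setup` is a hypothetical helper, not part of Spark's scripts.

```python
import subprocess

# Assumed conda CLI invocations; verify the option names against the
# conda version installed on the build machine before relying on them.
CONDA_SETUP_STEPS = [
    ["conda", "config", "--add", "channels", "conda-forge"],
    ["conda", "config", "--set", "remote_connect_timeout_secs", "12"],
    ["conda", "update", "--yes", "--quiet", "conda"],
]


def run_conda_setup(steps=CONDA_SETUP_STEPS, runner=subprocess.check_call):
    """Run each step in order; check_call raises on the first failure."""
    for argv in steps:
        runner(argv)
```

Passing a different `runner` (for example `print`) gives a dry run that shows the commands without touching the machine.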





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15821
  
Would you mind opening a PR for this? I guess the update would probably be 
done in a follow-up, but this one sounds more like a semi-hotfix. If the timeout 
and adding the channel are all we need, I can propose the change now instead if 
you happen to be too busy for this.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
I only see the package referenced here 
https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-pip-tests#L86
 -- where is the packaging build that @srowen  is referencing happening? 





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15821
  
As far as I know, that pointer is the right place, reached via ... 


https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-tests#L23
 ->

https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-tests.py#L609
 ->

https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-tests.py#L458
 ->

https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-pip-tests#L78-L87

judging from the logs provided above.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15821
  
It seems it already asks to search in the conda-forge channel?


https://github.com/apache/spark/blob/e44697606f429b01808c1a22cb44cb5b89585c5c/dev/run-pip-tests#L86





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15821
  
it's because of this
```
.SSL verification error: hostname 'conda.binstar.org' doesn't match either 
of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 
'wakari.io'
.SSL verification error: hostname 'conda.binstar.org' doesn't match either 
of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 
'wakari.io'
...
```





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-25 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
Is your conda up to date? It's a best practice to always update to the 
latest conda





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15821
  
CC @cloud-fan @BryanCutler is there an easy fix or do we need to revert 
this temporarily? it's failing the builds





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
@JoshRosen do we have a jenkins setup script like 
https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh#L37 ?

I think we need to bring conda up to date and increase the timeout.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/15821
  
@srowen @cloud-fan adding the steps from 
https://github.com/apache/arrow/blob/master/ci/travis_install_conda.sh that 
update conda to the latest version and increasing the SSL timeout should fix 
the problem. If it does not I can take a closer look





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15821
  
If it's still failing builds we should revert, fix the issue, and re-merge 
once it's fixed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15821
  
cc @shaneknapp 





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/15821
  
hmm.  currently thinking about this.  thanks for the ping, shiv.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
Sorry I'm out of town right now and not able to really look into this until
tomorrow.  Is it the run-pip-tests script that's causing the failures?  If
so maybe we can install pyarrow with pip instead of conda, otherwise if not
a simple fix then maybe best to revert and I can help sort it out tmrw.







[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/15821
  
@BryanCutler -- i'm ok w/holding out to discuss this in more detail 
tomorrow.  in the meantime, i'll look over this PR and build failures and get 
myself up to speed w/what's going on.







[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/15821
  
ok, @JoshRosen and i will bang our respective heads against this in about 
an hour.  we should be able to figure something out pretty quick.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15821
  
@shaneknapp let me know if you want some help poking at Jenkins.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/15821
  
@holdenk yeah, another set of eyes would be great!  i haven't actually 
touched the test infra code in a long time and i'm currently wrapping my brain 
around the order of operations that run-pip-tests goes through in conjunction 
w/everything else.

i have a feeling that the chain of scripts (run-tests-jenkins -> 
run-tests-jenkins.py -> run-tests -> run-pip-tests) besides being confusing for 
humans (ie: me), is also fragile WRT conda envs (aka munging PATH) in our 
environment.  
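
To illustrate the PATH concern: the build log shows one directory being discarded from PATH and the temporary environment's `bin` being prepended. A minimal sketch of that manipulation follows; `prepend_env_bin` is a hypothetical helper for illustration, not a function from the Spark scripts or from conda.

```python
import os


def prepend_env_bin(path, env_bin, discard=None):
    """Return PATH with env_bin first and any `discard` entry removed."""
    entries = [
        p for p in path.split(os.pathsep)
        if p and p != discard and p != env_bin  # drop empties and duplicates
    ]
    return os.pathsep.join([env_bin] + entries)
```

Because lookup takes the first match, every script later in the chain resolves `python`, `pip`, and `conda` from whatever directory the previous script left at the front, which is why a nested chain of scripts can be fragile.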

would installing pyarrow 0.4.0 in the py3k conda env fix things?  if so, i 
can bang that out in moments.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15821
  
@shaneknapp it might, assuming the Conda cache is shared it should avoid 
needing to fetch the package. I'm not super sure but I think we might have 
better luck updating conda on the jenkins machines (if people are ok with that) 
since it seems like this is probably from an out of date conda.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread MaheshIBM
Github user MaheshIBM commented on the issue:

https://github.com/apache/spark/pull/15821
  
This does not seem like a timeout issue; the certificate CN and what is 
used as the hostname do not match. So clearly the client downloads the 
certificate but is not able to verify it (no timeout). If anything, it may be 
possible to configure the code/command to ignore SSL cert errors. 
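
To make the mismatch concrete, here is a simplified sketch of hostname matching against the names listed in the error message. It is a hand-written approximation for illustration (exact match plus a single-label `*.` wildcard), not the verification code that conda or Python's `ssl` module actually runs.

```python
def hostname_matches(hostname, cert_names):
    """Simplified check: exact match, or a '*.' wildcard covering one label."""
    host = hostname.lower()
    for name in cert_names:
        name = name.lower()
        if name == host:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # e.g. '.binstar.org'
            prefix = host[: -len(suffix)]
            # The wildcard may stand in for exactly one non-empty label.
            if host.endswith(suffix) and prefix and "." not in prefix:
                return True
    return False


# Names copied from the SSL verification error in the build log.
NAMES_FROM_LOG = [
    "anaconda.com", "anacondacloud.com", "anacondacloud.org",
    "binstar.org", "wakari.io",
]
```

Under these rules `binstar.org` does not cover the subdomain `conda.binstar.org`; a name such as `*.binstar.org` would be needed, which is consistent with the error being a verification failure rather than a timeout.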





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821
  
It doesn't look like the SSL verification error is the issue; there are a
handful of recent builds that have passed after getting that same error,
see below.  Maybe something else is timing out?

From
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78669
```
prepending /tmp/tmp.87E7MDUu95/3.5/bin to PATH

Fetching package metadata: ..SSL verification error: hostname
'conda.binstar.org' doesn't match either of 'anaconda.com',
'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 'wakari.io'
.SSL verification error: hostname 'conda.binstar.org' doesn't match
either of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org',
'binstar.org', 'wakari.io'
...
Solving package specifications: .

Package plan for installation in environment /tmp/tmp.87E7MDUu95/3.5:

The following NEW packages will be INSTALLED:

arrow-cpp:   0.4.1-np112py35_2 (soft-link)
certifi:     2017.4.17-py35_0  (soft-link)
jemalloc:    5.0.0-1           (soft-link)
ncurses:     5.9-10            (soft-link)
parquet-cpp: 1.1.0-2           (soft-link)
pyarrow:     0.4.0-np112py35_0 (soft-link)

```









[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-26 Thread MaheshIBM
Github user MaheshIBM commented on the issue:

https://github.com/apache/spark/pull/15821
  
That leads me to believe that the download request could be resolving to 
different hosts every time. Can that happen if there is a CDN working in the 
background?  Not all hosts are configured to use the bad certificate, while one 
(or possibly more) is using a certificate with a DN of conda.binstar.org and 
responding to the domain name in the hostname of the URL from which the package 
download is attempted. 

If there is a way to configure pip to ignore SSL errors (only for the 
purpose of troubleshooting and finding the root cause of the problem here), then 
that is one possible direction to take. I am looking for ways to ignore SSL 
errors when using pip, and will update the comment if I find something. 






[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-27 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/15821
  
i agree w/@MaheshIBM that we're looking at a bad CA cert.  i think we're 
looking at a problem on continuum.io's side, not our side.  

however, i do not like the thought of ignoring certs (on principle).  :)

and finally, if i'm reading the run-pip-tests code correctly (and please 
correct me if i'm wrong @holdenk ), we're just creating a temp python 
environment in /tmp, installing some packages, running the tests, and then 
moving on.

some thoughts/suggestions:
* our conda environment is pretty stagnant and hasn't been explicitly 
upgraded since we deployed anaconda python over a year ago.
* the py3k environment that exists in the workers' conda installation is 
solely used by spark builds, so updating said environment w/the packages in the 
run-pip-tests will remove the need to download them, but at the same time, make 
the tests a NOOP.
* we can hope that continuum fixes their cert issue asap.  :\




