[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414912#comment-16414912
 ] 

ASF GitHub Bot commented on ARROW-2301:
---

kou commented on issue #1795: ARROW-2301: [Python] Build source distribution 
inside the manylinux1 docker
URL: https://github.com/apache/arrow/pull/1795#issuecomment-376374365
 
 
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We wish to start publishing source tarballs for Python on PyPI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2358) API for Writing to Multiple Feather Files

2018-03-26 Thread Dhruv Madeka (JIRA)
Dhruv Madeka created ARROW-2358:
---

 Summary: API for Writing to Multiple Feather Files
 Key: ARROW-2358
 URL: https://issues.apache.org/jira/browse/ARROW-2358
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C, C++, Python
Affects Versions: 0.9.0
Reporter: Dhruv Madeka
 Fix For: 0.10.0


It would be really great to have an API which can write a Table to a 
`FeatherDataset`. Essentially, given a file name, it would split the table 
into N equal parts (where N could be chosen by the user or by the code) and 
then write the data to N files with a suffix (`_part` by default, but 
user-configurable).
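A rough sketch of the partitioning step requested here is given below. It is not an existing pyarrow API: the helper name `plan_feather_parts`, its parameters, and the file-naming scheme are assumptions made purely for illustration. The helper only decides which row range goes into which file; each slice would then be written with whatever Feather writer is available.

```python
import os


def plan_feather_parts(num_rows, n_parts, path, suffix="_part"):
    """Return (filename, start, stop) triples covering num_rows rows.

    Hypothetical helper: rows are split into n_parts contiguous,
    near-equal ranges; the first ranges absorb any remainder.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    root, ext = os.path.splitext(path)
    base, extra = divmod(num_rows, n_parts)
    plan = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        plan.append(("{}{}{}{}".format(root, suffix, i, ext), start, stop))
        start = stop
    return plan
```

For a 10-row table and three parts this plans `data_part0.feather` with rows 0 to 4, `data_part1.feather` with rows 4 to 7, and `data_part2.feather` with rows 7 to 10 (stop index exclusive).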





[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-03-26 Thread Nicholas Schrock (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414792#comment-16414792
 ] 

Nicholas Schrock commented on ARROW-2355:
-

I'm also able to reproduce this issue. I installed pyarrow 0.8.0 as a 
workaround.

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Created] (ARROW-2357) Benchmark PandasObjectIsNull

2018-03-26 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2357:


 Summary: Benchmark PandasObjectIsNull
 Key: ARROW-2357
 URL: https://issues.apache.org/jira/browse/ARROW-2357
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.9.0
Reporter: Phillip Cloud
 Fix For: 0.10.0


This is a follow-up to ARROW-2354 ([C++] Make PyDecimal_Check() faster). We 
should benchmark {{PandasObjectIsNull}} as it gets called in many of our 
conversion routines in tight loops.
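As a rough idea of what such a micro-benchmark could look like, here is a sketch using the standard `timeit` module. The real `PandasObjectIsNull` is a C++ function whose exact checks are not shown in this thread; `object_is_null` below is only a pure-Python stand-in, and both function names are assumptions for illustration.

```python
import math
import timeit


def object_is_null(obj):
    """Pure-Python stand-in for a null check: None or a float NaN."""
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def benchmark_null_check(values, repeat=5, number=100):
    """Return the best wall-clock time (seconds) for `number` full scans."""
    def scan():
        # Mimics a conversion routine calling the check in a tight loop.
        for v in values:
            object_is_null(v)
    return min(timeit.repeat(scan, repeat=repeat, number=number))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes when timing a short loop.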





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414791#comment-16414791
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() 
faster
URL: https://github.com/apache/arrow/pull/1794#issuecomment-376349245
 
 
   @pitrou @wesm Follow-up JIRA'd: 
https://issues.apache.org/jira/browse/ARROW-2357




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414780#comment-16414780
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud closed pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() 
faster
URL: https://github.com/apache/arrow/pull/1794
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/python/helpers.cc b/cpp/src/arrow/python/helpers.cc
index 5719af6f3..63fee54b6 100644
--- a/cpp/src/arrow/python/helpers.cc
+++ b/cpp/src/arrow/python/helpers.cc
@@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) {
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType(&Decimal);
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
+    Status status = ImportDecimalType(&decimal_type);
+    DCHECK_OK(status);
+    DCHECK(PyType_Check(decimal_type.obj()));
+  }
+  // PyObject_IsInstance() is slower as it has to check for virtual subclasses
+  const int result =
+      PyType_IsSubtype(Py_TYPE(obj), reinterpret_cast<PyTypeObject*>(decimal_type.obj()));
+  DCHECK_NE(result, -1) << " error during PyType_IsSubtype check";
   return result == 1;
 }
 

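The two ideas in this patch, caching the imported type and testing the concrete type hierarchy instead of calling the general instance check, can be illustrated in Python. This is only an analogy to the C++ change, not code from Arrow: `is_decimal`, `Marker`, `Plain` and `Money` are hypothetical names, and membership in `type(obj).__mro__` is used here as the Python-level counterpart of a plain subtype test.

```python
import abc
import decimal

_decimal_type = None  # cached on first use, like the static OwnedRef in the patch


def is_decimal(obj):
    """True for Decimal and its real subclasses; ignores virtual subclasses."""
    global _decimal_type
    if _decimal_type is None:
        _decimal_type = decimal.Decimal  # looked up once, then reused
    return _decimal_type in type(obj).__mro__


class Money(decimal.Decimal):
    """A real (inheriting) subclass of Decimal."""


class Marker(abc.ABC):
    """ABC used only to demonstrate virtual subclassing."""


class Plain:
    pass


Marker.register(Plain)  # Plain is now a *virtual* subclass of Marker
```

`isinstance(Plain(), Marker)` is true because `isinstance` consults the ABC registry, while `Marker in type(Plain()).__mro__` is false. That extra registry lookup is the kind of work the patch comment refers to when it says the instance check is slower.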

 




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Resolved] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud resolved ARROW-2354.
--
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1794
[https://github.com/apache/arrow/pull/1794]

> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Updated] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer

2018-03-26 Thread Paul Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Taylor updated ARROW-2356:
---
Description: The JSON reader doesn't ingest the FixedSizeBinary data buffer 
correctly, and we haven't known about it because the JS integration test runner 
is accidentally exiting with code 0 on failures.  (was: The JSON reader doesn't 
ingest the FixedSizeBinary data buffer correctly, and we haven't known about it 
the JS integration test runner is accidentally exiting with code 0 on failures.)

> [JS] JSON reader fails on FixedSizeBinary data buffer
> -
>
> Key: ARROW-2356
> URL: https://issues.apache.org/jira/browse/ARROW-2356
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and 
> we haven't known about it because the JS integration test runner is 
> accidentally exiting with code 0 on failures.





[jira] [Commented] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414758#comment-16414758
 ] 

ASF GitHub Bot commented on ARROW-2356:
---

trxcllnt opened a new pull request #1796: ARROW-2356: [JS] Fix JSON Reader 
FixedSizeBinary Vectors
URL: https://github.com/apache/arrow/pull/1796
 
 
   Resolves https://issues.apache.org/jira/browse/ARROW-2356




> [JS] JSON reader fails on FixedSizeBinary data buffer
> -
>
> Key: ARROW-2356
> URL: https://issues.apache.org/jira/browse/ARROW-2356
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and 
> we haven't known about it because the JS integration test runner is 
> accidentally exiting with code 0 on failures.





[jira] [Updated] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2356:
--
Labels: pull-request-available  (was: )

> [JS] JSON reader fails on FixedSizeBinary data buffer
> -
>
> Key: ARROW-2356
> URL: https://issues.apache.org/jira/browse/ARROW-2356
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and 
> we haven't known about it because the JS integration test runner is 
> accidentally exiting with code 0 on failures.





[jira] [Created] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer

2018-03-26 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-2356:
--

 Summary: [JS] JSON reader fails on FixedSizeBinary data buffer
 Key: ARROW-2356
 URL: https://issues.apache.org/jira/browse/ARROW-2356
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Affects Versions: JS-0.3.1
Reporter: Paul Taylor
Assignee: Paul Taylor
 Fix For: JS-0.4.0


The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and 
we haven't known about it because the JS integration test runner is 
accidentally exiting with code 0 on failures.
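The second half of this report, a test runner that exits with code 0 even when checks fail, is a general pitfall. The sketch below shows the usual remedy in a language-neutral way using Python; it is not the Arrow JS integration runner, and `run_checks` and `exit_code` are hypothetical names.

```python
import sys


def run_checks(checks):
    """Run each (name, callable) pair and return the names that failed."""
    failures = []
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            # Report the failure but keep going so every check runs.
            print("FAIL {}: {}".format(name, exc), file=sys.stderr)
            failures.append(name)
    return failures


def exit_code(failures):
    """Map the failure list to a process exit status."""
    return 1 if failures else 0
```

A script would finish with `sys.exit(exit_code(run_checks(checks)))`, so that CI sees a non-zero status whenever at least one check raised.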





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414662#comment-16414662
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177253437
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) {
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType(&Decimal);
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
 
 Review comment:
   @wesm This is my only comment, which isn't necessary for merging.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414657#comment-16414657
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

wesm commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#issuecomment-376330717
 
 
   @cpcloud I'll wait for you to have a last look at this before merging (you 
have a comment unaddressed still)




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414649#comment-16414649
 ] 

ASF GitHub Bot commented on ARROW-2301:
---

wesm closed pull request #1795: ARROW-2301: [Python] Build source distribution 
inside the manylinux1 docker
URL: https://github.com/apache/arrow/pull/1795
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/dev/release/RELEASE_MANAGEMENT.md b/dev/release/RELEASE_MANAGEMENT.md
index 0069a2af8..06340ab11 100644
--- a/dev/release/RELEASE_MANAGEMENT.md
+++ b/dev/release/RELEASE_MANAGEMENT.md
@@ -154,7 +154,8 @@ The pip binary packages (called "wheels") are generated from the
 * Push arrow-dist updates to **both** apache/arrow-dist and your fork of
   arrow-dist.
 * Wait for builds to complete
-* Download all wheel files from the new BinTray package version ([example][4])
+* Download all wheel and tar.gz files from the new BinTray package version
+  ([example][4])
 
 Now, you can finally upload the wheels to PyPI using the `twine` CLI tool. You
 must be permissioned on PyPI to upload here; ask Wes McKinney or Uwe Korn if
diff --git a/python/manylinux1/build_arrow.sh b/python/manylinux1/build_arrow.sh
index 5df55a65c..6697733d0 100755
--- a/python/manylinux1/build_arrow.sh
+++ b/python/manylinux1/build_arrow.sh
@@ -70,6 +70,7 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
 echo "=== (${PYTHON}) Building wheel ==="
 PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py build_ext --inplace --with-parquet --bundle-arrow-cpp --bundle-boost --boost-namespace=arrow_boost
 PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py bdist_wheel
+PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py sdist
 
 echo "=== (${PYTHON}) Test the existence of optional modules ==="
 $PIP install -r requirements.txt
@@ -88,4 +89,5 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
 deactivate
 
 mv repaired_wheels/*.whl /io/dist
+mv dist/*.tar.gz /io/dist
 done

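With this change the build output directory holds both wheels and a source tarball, and the release notes ask for both kinds of file to be uploaded. A small sketch of gathering them before an upload is shown below. The helper names and the directory layout are assumptions for illustration; the only external command referenced is `twine upload`, and the sketch builds its argument list without running it.

```python
from pathlib import Path


def collect_artifacts(dist_dir):
    """Return sorted wheel and sdist paths found directly in dist_dir."""
    dist = Path(dist_dir)
    return sorted(list(dist.glob("*.whl")) + list(dist.glob("*.tar.gz")))


def twine_command(artifacts):
    """Build (but do not run) a `twine upload` argument list."""
    return ["twine", "upload"] + [str(p) for p in artifacts]
```

Printing the command first and running it by hand keeps the final upload step a deliberate action, which suits a release process that needs PyPI permissions.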

 




> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We wish to start publishing source tarballs for Python on PyPI





[jira] [Resolved] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-03-26 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2301.
-
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1795
[https://github.com/apache/arrow/pull/1795]

> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We wish to start publishing source tarballs for Python on PyPI





[jira] [Resolved] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison

2018-03-26 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-640.

Resolution: Fixed

Issue resolved by pull request 1765
[https://github.com/apache/arrow/pull/1765]

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Miki Tebeka
>Assignee: Alex Hagerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> 
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}





[jira] [Commented] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414620#comment-16414620
 ] 

ASF GitHub Bot commented on ARROW-640:
--

wesm closed pull request #1765: ARROW-640: [Python] Implement __hash__ and 
equality for Array scalar values Arrow scalar values
URL: https://github.com/apache/arrow/pull/1765
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/scalar.pxi b/python/pyarrow/scalar.pxi
index a801acd69..bbbefd834 100644
--- a/python/pyarrow/scalar.pxi
+++ b/python/pyarrow/scalar.pxi
@@ -73,6 +73,8 @@ cdef class ArrayValue(Scalar):
 raise NotImplementedError(
 "Cannot compare Arrow values that don't support as_py()")
 
+    def __hash__(self):
+        return hash(self.as_py())
 
 cdef class BooleanValue(ArrayValue):
 
diff --git a/python/pyarrow/tests/test_scalars.py b/python/pyarrow/tests/test_scalars.py
index 7061a0d3a..92db9b1e0 100644
--- a/python/pyarrow/tests/test_scalars.py
+++ b/python/pyarrow/tests/test_scalars.py
@@ -171,3 +171,30 @@ def test_dictionary(self):
categorical.categories)
 for i, c in enumerate(values):
 assert v[i].as_py() == c
+
+    def test_int_hash(self):
+        # ARROW-640
+        int_arr = pa.array([1, 1, 2, 1])
+        assert hash(int_arr[0]) == hash(1)
+
+    def test_float_hash(self):
+        # ARROW-640
+        float_arr = pa.array([1.4, 1.2, 2.5, 1.8])
+        assert hash(float_arr[0]) == hash(1.4)
+
+    def test_string_hash(self):
+        # ARROW-640
+        str_arr = pa.array(["foo", "bar"])
+        assert hash(str_arr[1]) == hash("bar")
+
+    def test_bytes_hash(self):
+        # ARROW-640
+        byte_arr = pa.array([b'foo', None, b'bar'])
+        assert hash(byte_arr[2]) == hash(b"bar")
+
+    def test_array_to_set(self):
+        # ARROW-640
+        arr = pa.array([1, 1, 2, 1])
+        set_from_array = set(arr)
+        assert isinstance(set_from_array, set)
+        assert set_from_array == {1, 2}

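The reason a `__hash__` that delegates to the wrapped Python value fixes `set(arr)` can be seen with a small stand-in class. `ScalarValue` below is hypothetical and only mirrors the pattern in the diff (hash and compare via `as_py()`); it is not the pyarrow class.

```python
class ScalarValue:
    """Hypothetical stand-in for an Arrow scalar wrapping a Python value."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value

    def __eq__(self, other):
        # Compare by underlying Python value, unwrapping other scalars.
        if isinstance(other, ScalarValue):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Equal values must hash equally for sets and dicts to deduplicate.
        return hash(self.as_py())
```

Because equal scalars now produce equal hashes, building a set from wrappers around `[1, 1, 2, 1]` keeps two elements, and that set compares equal to `{1, 2}`.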

 




> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Miki Tebeka
>Assignee: Alex Hagerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> 
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}





[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-03-26 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414566#comment-16414566
 ] 

Phillip Cloud commented on ARROW-2355:
--

[~xhochy] What is the status of the OS X wheels?

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-03-26 Thread Bradford W Littooy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bradford W Littooy updated ARROW-2355:
--
Summary: [Python] Unable to import pyarrow [0.9.0] OSX  (was: [Python] 
Unable to import pyarrow [0.9.0] OSX via pip)

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX via pip

2018-03-26 Thread Bradford W Littooy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bradford W Littooy updated ARROW-2355:
--
Description: 
I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
import pyarrow into a python3.6 interpreter, I get the following import error:

 

>>> import pyarrow

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
  Reason: image not found

>>>

I've installed pyarrow (0.9) on an EC2 instance with no issue. 

  was:
I just installed pyarrow to my mac os x (version 10.13.3). When I try to import 
pyarrow into a python3.6 interpreter, I get the following import error:

 

>>> import pyarrow

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
  Reason: image not found

>>>

I've installed pyarrow (0.9) on an EC2 instance with no issue. 

Summary: [Python] Unable to import pyarrow [0.9.0] OSX via pip  (was: 
[Python] Unable to import pyarrow [0.9.0] OSX)

> [Python] Unable to import pyarrow [0.9.0] OSX via pip
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Created] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0]

2018-03-26 Thread Bradford W Littooy (JIRA)
Bradford W Littooy created ARROW-2355:
-

 Summary: [Python] Unable to import pyarrow [0.9.0]
 Key: ARROW-2355
 URL: https://issues.apache.org/jira/browse/ARROW-2355
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Bradford W Littooy


I just installed pyarrow to my mac os x (version 10.13.3). When I try to import 
pyarrow into a python3.6 interpreter, I get the following import error:

 

>>> import pyarrow

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
  Reason: image not found

>>>

I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-03-26 Thread Bradford W Littooy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bradford W Littooy updated ARROW-2355:
--
Summary: [Python] Unable to import pyarrow [0.9.0] OSX  (was: [Python] 
Unable to import pyarrow [0.9.0])

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Priority: Major
>
> I just installed pyarrow on my Mac (OS X 10.13.3). When I try to import 
> pyarrow in a Python 3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414495#comment-16414495
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177230296
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   Great.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix

2018-03-26 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414302#comment-16414302
 ] 

Wes McKinney commented on ARROW-2352:
-

+1 for nightly cron

> [C++/Python] Test OSX packaging in Travis matrix
> 
>
> Key: ARROW-2352
> URL: https://issues.apache.org/jira/browse/ARROW-2352
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> Maybe we want to do this as a nightly cron. For a first draft, I will simply 
> add it to the matrix.





[jira] [Commented] (ARROW-1913) [Java] Fix Javadoc generation bugs with JDK8

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414300#comment-16414300
 ] 

ASF GitHub Bot commented on ARROW-1913:
---

wesm commented on issue #1788: ARROW-1913: [Java] Disable Javadoc doclint with 
Java 8
URL: https://github.com/apache/arrow/pull/1788#issuecomment-376268087
 
 
   https://github.com/apache/arrow/blob/master/.travis.yml#L101




> [Java] Fix Javadoc generation bugs with JDK8
> 
>
> Key: ARROW-1913
> URL: https://issues.apache.org/jira/browse/ARROW-1913
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Li Jin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> While trying to cut the release candidate, the source release script fails 
> due to various new Javadoc issues





[jira] [Commented] (ARROW-1913) [Java] Fix Javadoc generation bugs with JDK8

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414299#comment-16414299
 ] 

ASF GitHub Bot commented on ARROW-1913:
---

wesm commented on issue #1788: ARROW-1913: [Java] Disable Javadoc doclint with 
Java 8
URL: https://github.com/apache/arrow/pull/1788#issuecomment-376268076
 
 
   Should we change the Java part of the test matrix to use JDK8?




> [Java] Fix Javadoc generation bugs with JDK8
> 
>
> Key: ARROW-1913
> URL: https://issues.apache.org/jira/browse/ARROW-1913
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Li Jin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> While trying to cut the release candidate, the source release script fails 
> due to various new Javadoc issues





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414224#comment-16414224
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#issuecomment-376254213
 
 
   The benchmark already exists, see above :-)




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414222#comment-16414222
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

wesm commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#issuecomment-376253924
 
 
   Can you add an ASV benchmark to exercise the performance issue in 
https://github.com/apache/arrow/issues/1792?




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414184#comment-16414184
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177172524
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   If the first implementation fails for someone, they'll notice and tell us.
   If the second implementation fails for someone, it will produce mysterious 
slowdowns.
   So I think the first implementation should win, for now at least.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414182#comment-16414182
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177171726
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   I guess it's a question of which is more stable: the C implementation of the 
decimal module or the way Python names types?
   
   I personally don't feel strongly about either one. If I were forced to 
choose, I'd probably stick with the first implementation. It seems less brittle, 
because it doesn't depend on Python's naming convention, but that intuition 
could be misleading. Maybe that's a very stable part of Python.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414170#comment-16414170
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177169733
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   Here is an alternative implementation. Which one do you prefer?
   ```c++
   bool PyDecimal_Check(PyObject* obj) {
     OwnedRef decimal_type;
     PyTypeObject* obj_type = Py_TYPE(obj);
     // Fast fail for most types other than Decimal, for speed
     if (strcmp(obj_type->tp_name, "decimal.Decimal")) {
       return false;
     }
     Status status = ImportDecimalType(&decimal_type);
     DCHECK_OK(status);
     DCHECK(PyType_Check(decimal_type.obj()));
     // PyObject_IsInstance() is slower as it has to check for virtual subclasses
     const int result =
         PyType_IsSubtype(obj_type, reinterpret_cast<PyTypeObject*>(decimal_type.obj()));
     DCHECK_NE(result, -1) << " error during PyType_IsSubtype check";
     return result == 1;
   }
   ```
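
For readers following the thread, the control flow of this alternative can be sketched at the Python level. This is only an illustration under assumptions: `is_decimal_by_name` is a hypothetical helper name, and the `__module__`/`__qualname__` pair is used as a stand-in for the C-level `tp_name` field; it is not the code proposed in the pull request.

```python
import decimal


def is_decimal_by_name(obj):
    """Hypothetical Python analogue of the name-based fast path."""
    obj_type = type(obj)
    # Stand-in for comparing tp_name against "decimal.Decimal".
    type_name = f"{obj_type.__module__}.{obj_type.__qualname__}"
    if type_name != "decimal.Decimal":
        # Fast fail: most objects are rejected here, before any further check.
        return False
    # Confirm through the real inheritance chain (no virtual subclasses).
    return decimal.Decimal in obj_type.__mro__
```

In this sketch, a user-defined subclass of `Decimal` carries a different name and is therefore rejected by the name comparison, which is one way the name-based variant can differ from a plain type check.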




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414166#comment-16414166
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177168902
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   No, because depending on how the C `decimal` module is implemented, there 
may be a separate Decimal type per interpreter. This is currently not the case, 
though, so perhaps we shouldn't worry about it.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414153#comment-16414153
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177167278
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   Is that because the reference count will be shared across them?




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414151#comment-16414151
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177166321
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
+    Status status = ImportDecimalType(&decimal_type);
+    DCHECK_OK(status);
+    DCHECK(PyType_Check(decimal_type.obj()));
+  }
+  // PyObject_IsInstance() is slower as it has to check for virtual subclasses
 
 Review comment:
   Was this the cause of the performance regression, rather than importing 
over and over? Or was it both?
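
The "virtual subclasses" mentioned in the code comment can be illustrated at the Python level. The sketch below is an analogy, not the C++ helper: `is_instance_check` and `is_subtype_check` are hypothetical names, with `isinstance()` playing the role of `PyObject_IsInstance()` and an MRO lookup approximating `PyType_IsSubtype()`, which only considers actual subtypes.

```python
import decimal
import numbers


def is_instance_check(obj, cls):
    # isinstance() consults cls.__instancecheck__, so classes registered
    # with an ABC ("virtual" subclasses) are accepted as well.
    return isinstance(obj, cls)


def is_subtype_check(obj, cls):
    # Only the real inheritance chain (the MRO) of type(obj) is inspected.
    return cls in type(obj).__mro__


value = decimal.Decimal("1.5")
# Decimal is registered with numbers.Number but does not inherit from it,
# so the two checks disagree for that ABC and agree for Decimal itself.
agree_on_decimal = (is_instance_check(value, decimal.Decimal)
                    and is_subtype_check(value, decimal.Decimal))
disagree_on_number = (is_instance_check(value, numbers.Number)
                      and not is_subtype_check(value, numbers.Number))
```

The extra hook lookup is what makes the instance check the more general, and potentially slower, of the two.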




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414150#comment-16414150
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177166026
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
 
 Review comment:
   Maybe compare against `nullptr` here?




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414152#comment-16414152
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177167130
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
+    Status status = ImportDecimalType(&decimal_type);
+    DCHECK_OK(status);
+    DCHECK(PyType_Check(decimal_type.obj()));
+  }
+  // PyObject_IsInstance() is slower as it has to check for virtual subclasses
 
 Review comment:
   It was importing over and over.
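
A rough way to see why repeating the import on every call hurts is to compare it with a one-time lookup. The following is a Python-level analogy only, assuming hypothetical helper names (`check_reimport`, `check_cached`); the actual helper is C++ and the absolute timings will differ.

```python
import importlib
import timeit


def check_reimport(obj):
    # Resolves the module and the Decimal type again on every call.
    decimal_type = importlib.import_module("decimal").Decimal
    return isinstance(obj, decimal_type)


_DECIMAL_TYPE = None


def check_cached(obj):
    # Resolves the Decimal type once and reuses it afterwards.
    global _DECIMAL_TYPE
    if _DECIMAL_TYPE is None:
        _DECIMAL_TYPE = importlib.import_module("decimal").Decimal
    return isinstance(obj, _DECIMAL_TYPE)


if __name__ == "__main__":
    for fn in (check_reimport, check_cached):
        elapsed = timeit.timeit(lambda: fn(1.5), number=100_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return the same answers; only the per-call cost of locating the type differs.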




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414143#comment-16414143
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177166295
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType();
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
 
 Review comment:
   Note that this may not play well with multiple Python (sub)interpreters in 
the same process.




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414137#comment-16414137
 ] 

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make 
PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177165146
 
 

 ##
 File path: cpp/src/arrow/python/helpers.cc
 ##
 @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) 
{
 }
 
 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
 
 Review comment:
   Guess the answer is "yes" :)




> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414136#comment-16414136
 ] 

ASF GitHub Bot commented on ARROW-2301:
---

xhochy opened a new pull request #1795: ARROW-2301: [Python] Build source 
distribution inside the manylinux1 docker
URL: https://github.com/apache/arrow/pull/1795
 
 
   @kou Once this is merged, I will upload the source distribution for 0.9.0.




> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>
> We wish to start publishing source tarballs for Python on PyPI





[jira] [Updated] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2301:
--
Labels: pull-request-available  (was: )

> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>
> We wish to start publishing source tarballs for Python on PyPI





[jira] [Updated] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2354:
--
Labels: pull-request-available  (was: )

> [C++] PyDecimal_Check() is much too slow
> 
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792





[jira] [Created] (ARROW-2354) [C++] PyDecimal_Check() is much too slow

2018-03-26 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2354:
-

 Summary: [C++] PyDecimal_Check() is much too slow
 Key: ARROW-2354
 URL: https://issues.apache.org/jira/browse/ARROW-2354
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


See https://github.com/apache/arrow/issues/1792





[jira] [Commented] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414108#comment-16414108
 ] 

ASF GitHub Bot commented on ARROW-640:
--

AlexHagerman commented on issue #1765: ARROW-640: [Python] Implement __hash__ 
and equality for Array scalar values Arrow scalar values
URL: https://github.com/apache/arrow/pull/1765#issuecomment-376230898
 
 
   @pitrou any other feedback or comments on this?




> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Miki Tebeka
>Assignee: Alex Hagerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> 
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}
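
The session quoted above shows equal scalars comparing as unequal and failing to deduplicate in a set. As a minimal sketch of the behaviour the issue title asks for, the hypothetical ScalarValue class below (a stand-in written for illustration, not pyarrow's actual scalar type or the change made in the pull request) delegates both equality and hashing to the wrapped Python value, so that equal values hash identically:

```python
class ScalarValue:
    """Hypothetical stand-in for an Arrow scalar (illustration only)."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Return the wrapped plain Python value.
        return self._value

    def __eq__(self, other):
        # Compare by wrapped value, whether or not the other side is wrapped.
        if isinstance(other, ScalarValue):
            return self._value == other._value
        return self._value == other

    def __hash__(self):
        # Must agree with __eq__: equal scalars produce equal hashes.
        return hash(self._value)

    def __repr__(self):
        return f"ScalarValue({self._value!r})"


values = [ScalarValue(v) for v in [1, 1, 1, 2]]
unique = set(values)          # duplicates collapse once hashing is by value
same = values[0] == values[1]  # equality is by value, not identity
```

With these two methods defined together, the set holds two elements instead of four and the comparison is true. Wrapped values that are themselves unhashable, such as lists, would still raise TypeError when hashed under this sketch.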





[jira] [Commented] (ARROW-2353) Test correctness of built wheel on AppVeyor

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414090#comment-16414090
 ] 

ASF GitHub Bot commented on ARROW-2353:
---

pitrou commented on a change in pull request #1793: ARROW-2353: [CI] Check 
correctness of built wheel on AppVeyor
URL: https://github.com/apache/arrow/pull/1793#discussion_r177153858
 
 

 ##
 File path: ci/msvc-build.bat
 ##
 @@ -103,12 +103,12 @@ cmake -G "%GENERATOR%" ^
 cmake --build . --target install --config %CONFIGURATION%  || exit /B
 
 @rem Needed so python-test.exe works
-set OLD_PYTHONPATH=%PYTHONPATH%
-set PYTHONPATH=%CONDA_PREFIX%\Lib;%CONDA_PREFIX%\Lib\site-packages;%CONDA_PREFIX%\python35.zip;%CONDA_PREFIX%\DLLs;%CONDA_PREFIX%;%PYTHONPATH%
+set OLD_PYTHONHOME=%PYTHONHOME%
+set PYTHONHOME=%CONDA_PREFIX%
 
 Review comment:
   @wesm This can also be done in `python-test.exe` instead. Which one do you 
prefer?
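
For comparison, the batch script in the diff sets PYTHONHOME globally and saves the old value so it can be restored later. A third option, sketched here in Python purely for illustration, is to scope the variable to the child process only. The helper name child_env_with_python_home and the paths are hypothetical and are not part of the pull request:

```python
import os


def child_env_with_python_home(prefix, base_env=None):
    """Return a copy of the environment with PYTHONHOME set to prefix.

    Hypothetical helper: the caller's environment is left untouched,
    so no save-and-restore step is needed afterwards.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONHOME"] = prefix
    return env


# Example with an explicit base environment (made-up paths).
base = {"PATH": "/usr/bin", "PYTHONHOME": "/old/prefix"}
env = child_env_with_python_home("/opt/conda", base)
```

The resulting mapping could then be passed as the env argument of subprocess.run when launching a test executable such as python-test.exe, leaving the parent shell's PYTHONHOME unchanged.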




> Test correctness of built wheel on AppVeyor
> ---
>
> Key: ARROW-2353
> URL: https://issues.apache.org/jira/browse/ARROW-2353
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2332) [Python] Provide API for reading multiple Feather files

2018-03-26 Thread Dhruv Madeka (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413444#comment-16413444
 ] 

Dhruv Madeka commented on ARROW-2332:
-

[~wesmckinn] - let me know if that sounds like a good plan, and I'll try to 
make a PR.

> [Python] Provide API for reading multiple Feather files
> ---
>
> Key: ARROW-2332
> URL: https://issues.apache.org/jira/browse/ARROW-2332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> See discussion in 
> https://github.com/wesm/feather/issues/273#issuecomment-374093374


