[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation
[ https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414912#comment-16414912 ] ASF GitHub Bot commented on ARROW-2301: --- kou commented on issue #1795: ARROW-2301: [Python] Build source distribution inside the manylinux1 docker URL: https://github.com/apache/arrow/pull/1795#issuecomment-376374365 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Add source distribution publishing instructions to package / release > management documentation > -- > > Key: ARROW-2301 > URL: https://issues.apache.org/jira/browse/ARROW-2301 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > We wish to start publishing source tarballs for Python on PyPI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2358) API for Writing to Multiple Feather Files
Dhruv Madeka created ARROW-2358:
---

Summary: API for Writing to Multiple Feather Files
Key: ARROW-2358
URL: https://issues.apache.org/jira/browse/ARROW-2358
Project: Apache Arrow
Issue Type: New Feature
Components: C, C++, Python
Affects Versions: 0.9.0
Reporter: Dhruv Madeka
Fix For: 0.10.0

It would be really great to have an API that can write a Table to a `FeatherDataset`. Essentially, given a base file name, it would split the table into N equal parts (where N could be chosen by the user or by the code) and then write the data to N files with a suffix (`_part` by default, but user-specifiable).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
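As a rough illustration of the splitting described in this request, the sketch below only plans the parts: it computes a row range and a file name for each of the N pieces. It is not an existing pyarrow API. The helper name `plan_feather_parts`, its parameters, and the `<base><suffix><index>.feather` naming scheme are all assumptions made for this example, and the actual slicing and writing of the table is left out.

```python
def plan_feather_parts(num_rows, num_parts, base_name, suffix="_part"):
    """Return a list of (file_name, start_row, stop_row), one per part.

    Hypothetical helper: rows are split as evenly as possible, and the
    first `num_rows % num_parts` parts each receive one extra row.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    size, extra = divmod(num_rows, num_parts)
    plan = []
    start = 0
    for i in range(num_parts):
        # Parts with index below `extra` absorb the remainder rows.
        stop = start + size + (1 if i < extra else 0)
        plan.append((f"{base_name}{suffix}{i}.feather", start, stop))
        start = stop
    return plan


plan = plan_feather_parts(10, 3, "data")
```

Each `(start_row, stop_row)` range could then be sliced from the table and written to its file with whatever slicing and Feather-writing calls the installed pyarrow version provides.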
[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414792#comment-16414792 ]

Nicholas Schrock commented on ARROW-2355:
-

I'm also able to reproduce this issue. I installed pyarrow 0.8.0 as a workaround.

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Bradford W Littooy
> Priority: Major
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to
> import pyarrow into a python3.6 interpreter, I get the following import error:
>
> >>> import pyarrow
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
>
> I've installed pyarrow (0.9) on an EC2 instance with no issue.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2357) Benchmark PandasObjectIsNull
Phillip Cloud created ARROW-2357: Summary: Benchmark PandasObjectIsNull Key: ARROW-2357 URL: https://issues.apache.org/jira/browse/ARROW-2357 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.9.0 Reporter: Phillip Cloud Fix For: 0.10.0 This is a follow-up to ARROW-2354 ([C++] Make PyDecimal_Check() faster). We should benchmark {{PandasObjectIsNull}} as it gets called in many of our conversion routines in tight loops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414791#comment-16414791 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#issuecomment-376349245 @pitrou @wesm Follow-up JIRA'd: https://issues.apache.org/jira/browse/ARROW-2357 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414780#comment-16414780 ]

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud closed pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/python/helpers.cc b/cpp/src/arrow/python/helpers.cc
index 5719af6f3..63fee54b6 100644
--- a/cpp/src/arrow/python/helpers.cc
+++ b/cpp/src/arrow/python/helpers.cc
@@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) {
 }

 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType(&Decimal);
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {
+    Status status = ImportDecimalType(&decimal_type);
+    DCHECK_OK(status);
+    DCHECK(PyType_Check(decimal_type.obj()));
+  }
+  // PyObject_IsInstance() is slower as it has to check for virtual subclasses
+  const int result =
+      PyType_IsSubtype(Py_TYPE(obj), reinterpret_cast<PyTypeObject*>(decimal_type.obj()));
+  DCHECK_NE(result, -1) << " error during PyType_IsSubtype check";
   return result == 1;
 }

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [C++] PyDecimal_Check() is much too slow
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> See https://github.com/apache/arrow/issues/1792

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillip Cloud resolved ARROW-2354. -- Resolution: Fixed Fix Version/s: 0.10.0 Issue resolved by pull request 1794 [https://github.com/apache/arrow/pull/1794] > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414779#comment-16414779 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#issuecomment-376346711 I'll add the patch and bump the build number to `arrow-cpp`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer
[ https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Taylor updated ARROW-2356: --- Description: The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and we haven't known about it because the JS integration test runner is accidentally exiting with code 0 on failures. (was: The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and we haven't known about it the JS integration test runner is accidentally exiting with code 0 on failures.) > [JS] JSON reader fails on FixedSizeBinary data buffer > - > > Key: ARROW-2356 > URL: https://issues.apache.org/jira/browse/ARROW-2356 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > > The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and > we haven't known about it because the JS integration test runner is > accidentally exiting with code 0 on failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer
[ https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414758#comment-16414758 ] ASF GitHub Bot commented on ARROW-2356: --- trxcllnt opened a new pull request #1796: ARROW-2356: [JS] Fix JSON Reader FixedSizeBinary Vectors URL: https://github.com/apache/arrow/pull/1796 Resolves https://issues.apache.org/jira/browse/ARROW-2356 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JS] JSON reader fails on FixedSizeBinary data buffer > - > > Key: ARROW-2356 > URL: https://issues.apache.org/jira/browse/ARROW-2356 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > > The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and > we haven't known about it the JS integration test runner is accidentally > exiting with code 0 on failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer
[ https://issues.apache.org/jira/browse/ARROW-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2356: -- Labels: pull-request-available (was: ) > [JS] JSON reader fails on FixedSizeBinary data buffer > - > > Key: ARROW-2356 > URL: https://issues.apache.org/jira/browse/ARROW-2356 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > > The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and > we haven't known about it the JS integration test runner is accidentally > exiting with code 0 on failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2356) [JS] JSON reader fails on FixedSizeBinary data buffer
Paul Taylor created ARROW-2356: -- Summary: [JS] JSON reader fails on FixedSizeBinary data buffer Key: ARROW-2356 URL: https://issues.apache.org/jira/browse/ARROW-2356 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: JS-0.3.1 Reporter: Paul Taylor Assignee: Paul Taylor Fix For: JS-0.4.0 The JSON reader doesn't ingest the FixedSizeBinary data buffer correctly, and we haven't known about it the JS integration test runner is accidentally exiting with code 0 on failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414662#comment-16414662 ]

ASF GitHub Bot commented on ARROW-2354:
---

cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster
URL: https://github.com/apache/arrow/pull/1794#discussion_r177253437

## File path: cpp/src/arrow/python/helpers.cc ##
@@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) {
 }

 bool PyDecimal_Check(PyObject* obj) {
-  // TODO(phillipc): Is this expensive?
-  OwnedRef Decimal;
-  Status status = ImportDecimalType(&Decimal);
-  DCHECK_OK(status);
-  const int32_t result = PyObject_IsInstance(obj, Decimal.obj());
-  DCHECK_NE(result, -1) << " error during PyObject_IsInstance check";
+  static OwnedRef decimal_type;
+  if (!decimal_type.obj()) {

Review comment:
@wesm This is my only comment, which isn't necessary for merging.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [C++] PyDecimal_Check() is much too slow
>
> Key: ARROW-2354
> URL: https://issues.apache.org/jira/browse/ARROW-2354
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>
> See https://github.com/apache/arrow/issues/1792

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414657#comment-16414657 ] ASF GitHub Bot commented on ARROW-2354: --- wesm commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#issuecomment-376330717 @cpcloud I'll wait for you to have a last look at this before merging (you have a comment unaddressed still) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation
[ https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414649#comment-16414649 ]

ASF GitHub Bot commented on ARROW-2301:
---

wesm closed pull request #1795: ARROW-2301: [Python] Build source distribution inside the manylinux1 docker
URL: https://github.com/apache/arrow/pull/1795

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/dev/release/RELEASE_MANAGEMENT.md b/dev/release/RELEASE_MANAGEMENT.md
index 0069a2af8..06340ab11 100644
--- a/dev/release/RELEASE_MANAGEMENT.md
+++ b/dev/release/RELEASE_MANAGEMENT.md
@@ -154,7 +154,8 @@ The pip binary packages (called "wheels") are generated from the
 * Push arrow-dist updates to **both** apache/arrow-dist and your fork of arrow-dist.
 * Wait for builds to complete
-* Download all wheel files from the new BinTray package version ([example][4])
+* Download all wheel and tar.gz files from the new BinTray package version
+  ([example][4])

 Now, you can finally upload the wheels to PyPI using the `twine` CLI tool.
 You must be permissioned on PyPI to upload here; ask Wes McKinney or Uwe Korn if
diff --git a/python/manylinux1/build_arrow.sh b/python/manylinux1/build_arrow.sh
index 5df55a65c..6697733d0 100755
--- a/python/manylinux1/build_arrow.sh
+++ b/python/manylinux1/build_arrow.sh
@@ -70,6 +70,7 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
     echo "=== (${PYTHON}) Building wheel ==="
     PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py build_ext --inplace --with-parquet --bundle-arrow-cpp --bundle-boost --boost-namespace=arrow_boost
     PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py bdist_wheel
+    PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py sdist

     echo "=== (${PYTHON}) Test the existence of optional modules ==="
     $PIP install -r requirements.txt
@@ -88,4 +89,5 @@ for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
     deactivate

     mv repaired_wheels/*.whl /io/dist
+    mv dist/*.tar.gz /io/dist
 done

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] Add source distribution publishing instructions to package / release management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> We wish to start publishing source tarballs for Python on PyPI

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation
[ https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2301. - Resolution: Fixed Fix Version/s: 0.10.0 Issue resolved by pull request 1795 [https://github.com/apache/arrow/pull/1795] > [Python] Add source distribution publishing instructions to package / release > management documentation > -- > > Key: ARROW-2301 > URL: https://issues.apache.org/jira/browse/ARROW-2301 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > We wish to start publishing source tarballs for Python on PyPI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-640.

Resolution: Fixed

Issue resolved by pull request 1765
[https://github.com/apache/arrow/pull/1765]

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Miki Tebeka
> Assignee: Alex Hagerman
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414620#comment-16414620 ]

ASF GitHub Bot commented on ARROW-640:
--

wesm closed pull request #1765: ARROW-640: [Python] Implement __hash__ and equality for Array scalar values Arrow scalar values
URL: https://github.com/apache/arrow/pull/1765

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/scalar.pxi b/python/pyarrow/scalar.pxi
index a801acd69..bbbefd834 100644
--- a/python/pyarrow/scalar.pxi
+++ b/python/pyarrow/scalar.pxi
@@ -73,6 +73,8 @@ cdef class ArrayValue(Scalar):
             raise NotImplementedError(
                 "Cannot compare Arrow values that don't support as_py()")

+    def __hash__(self):
+        return hash(self.as_py())


 cdef class BooleanValue(ArrayValue):
diff --git a/python/pyarrow/tests/test_scalars.py b/python/pyarrow/tests/test_scalars.py
index 7061a0d3a..92db9b1e0 100644
--- a/python/pyarrow/tests/test_scalars.py
+++ b/python/pyarrow/tests/test_scalars.py
@@ -171,3 +171,30 @@ def test_dictionary(self):
                 categorical.categories)
         for i, c in enumerate(values):
             assert v[i].as_py() == c
+
+    def test_int_hash(self):
+        # ARROW-640
+        int_arr = pa.array([1, 1, 2, 1])
+        assert hash(int_arr[0]) == hash(1)
+
+    def test_float_hash(self):
+        # ARROW-640
+        float_arr = pa.array([1.4, 1.2, 2.5, 1.8])
+        assert hash(float_arr[0]) == hash(1.4)
+
+    def test_string_hash(self):
+        # ARROW-640
+        str_arr = pa.array(["foo", "bar"])
+        assert hash(str_arr[1]) == hash("bar")
+
+    def test_bytes_hash(self):
+        # ARROW-640
+        byte_arr = pa.array([b'foo', None, b'bar'])
+        assert hash(byte_arr[2]) == hash(b"bar")
+
+    def test_array_to_set(self):
+        # ARROW-640
+        arr = pa.array([1, 1, 2, 1])
+        set_from_array = set(arr)
+        assert isinstance(set_from_array, set)
+        assert set_from_array == {1, 2}

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Miki Tebeka
> Assignee: Alex Hagerman
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]:
>
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
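The effect of the one-line `__hash__` in this patch can be reproduced without pyarrow. The sketch below uses a toy stand-in class: `ToyValue` and its `__eq__` are assumptions written for this example, and only the `as_py()` name and the `hash(self.as_py())` idea are taken from the diff. Once hashing and equality both defer to the underlying Python value, duplicates collapse in a `set`, which is the behaviour the issue asked for.

```python
class ToyValue:
    """Toy stand-in for an Arrow scalar; not the real pyarrow class."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # Return the plain Python value wrapped by this scalar.
        return self._value

    def __eq__(self, other):
        # Compare by underlying value, against ToyValue or plain objects.
        if isinstance(other, ToyValue):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Same approach as the patch: hash the Python value, so equal
        # scalars land in the same hash bucket.
        return hash(self.as_py())


values = [ToyValue(v) for v in [1, 1, 2, 1]]
unique = set(values)
```

Because `hash(ToyValue(1)) == hash(1)` and equality falls back to the wrapped value, `unique` holds two elements and compares equal to `{1, 2}`, mirroring `test_array_to_set` in the diff.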
[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414566#comment-16414566 ] Phillip Cloud commented on ARROW-2355: -- [~xhochy] What is the status of the OS X wheels? > [Python] Unable to import pyarrow [0.9.0] OSX > - > > Key: ARROW-2355 > URL: https://issues.apache.org/jira/browse/ARROW-2355 > Project: Apache Arrow > Issue Type: Bug >Reporter: Bradford W Littooy >Priority: Major > > I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to > import pyarrow into a python3.6 interpreter, I get the following import error: > > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File > "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", > line 47, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: libarrow_boost_system.dylib > Referenced from: > /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib > Reason: image not found > >>> > I've installed pyarrow (0.9) on an EC2 instance with no issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bradford W Littooy updated ARROW-2355: -- Summary: [Python] Unable to import pyarrow [0.9.0] OSX (was: [Python] Unable to import pyarrow [0.9.0] OSX via pip) > [Python] Unable to import pyarrow [0.9.0] OSX > - > > Key: ARROW-2355 > URL: https://issues.apache.org/jira/browse/ARROW-2355 > Project: Apache Arrow > Issue Type: Bug >Reporter: Bradford W Littooy >Priority: Major > > I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to > import pyarrow into a python3.6 interpreter, I get the following import error: > > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File > "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", > line 47, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: libarrow_boost_system.dylib > Referenced from: > /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib > Reason: image not found > >>> > I've installed pyarrow (0.9) on an EC2 instance with no issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX via pip
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bradford W Littooy updated ARROW-2355: -- Description: I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to import pyarrow into a python3.6 interpreter, I get the following import error: >>> import pyarrow Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in from pyarrow.lib import cpu_count, set_cpu_count ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib Reason: image not found >>> I've installed pyarrow (0.9) on an EC2 instance with no issue. was: I just installed pyarrow to my mac os x (version 10.13.3). When I try to import pyarrow into a python3.6 interpreter, I get the following import error: >>> import pyarrow Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in from pyarrow.lib import cpu_count, set_cpu_count ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib Reason: image not found >>> I've installed pyarrow (0.9) on an EC2 instance with no issue. 
Summary: [Python] Unable to import pyarrow [0.9.0] OSX via pip (was: [Python] Unable to import pyarrow [0.9.0] OSX) > [Python] Unable to import pyarrow [0.9.0] OSX via pip > - > > Key: ARROW-2355 > URL: https://issues.apache.org/jira/browse/ARROW-2355 > Project: Apache Arrow > Issue Type: Bug >Reporter: Bradford W Littooy >Priority: Major > > I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to > import pyarrow into a python3.6 interpreter, I get the following import error: > > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File > "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", > line 47, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: libarrow_boost_system.dylib > Referenced from: > /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib > Reason: image not found > >>> > I've installed pyarrow (0.9) on an EC2 instance with no issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0]
Bradford W Littooy created ARROW-2355:
-

Summary: [Python] Unable to import pyarrow [0.9.0]
Key: ARROW-2355
URL: https://issues.apache.org/jira/browse/ARROW-2355
Project: Apache Arrow
Issue Type: Bug
Reporter: Bradford W Littooy

I just installed pyarrow to my mac os x (version 10.13.3). When I try to import pyarrow into a python3.6 interpreter, I get the following import error:

>>> import pyarrow
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
  Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
  Reason: image not found
>>>

I've installed pyarrow (0.9) on an EC2 instance with no issue.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bradford W Littooy updated ARROW-2355: -- Summary: [Python] Unable to import pyarrow [0.9.0] OSX (was: [Python] Unable to import pyarrow [0.9.0]) > [Python] Unable to import pyarrow [0.9.0] OSX > - > > Key: ARROW-2355 > URL: https://issues.apache.org/jira/browse/ARROW-2355 > Project: Apache Arrow > Issue Type: Bug >Reporter: Bradford W Littooy >Priority: Major > > I just installed pyarrow to my mac os x (version 10.13.3). When I try to > import pyarrow into a python3.6 interpreter, I get the following import error: > > >>> import pyarrow > Traceback (most recent call last): > File "", line 1, in > File > "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", > line 47, in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: libarrow_boost_system.dylib > Referenced from: > /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib > Reason: image not found > >>> > I've installed pyarrow (0.9) on an EC2 instance with no issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414495#comment-16414495 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177230296 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: Great. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix
[ https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414302#comment-16414302 ] Wes McKinney commented on ARROW-2352: - +1 for nightly cron > [C++/Python] Test OSX packaging in Travis matrix > > > Key: ARROW-2352 > URL: https://issues.apache.org/jira/browse/ARROW-2352 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Fix For: 0.10.0 > > > Maybe we want to do this as a nightly cron. For a first draft, I will simply > add it to the matrix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1913) [Java] Fix Javadoc generation bugs with JDK8
[ https://issues.apache.org/jira/browse/ARROW-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414300#comment-16414300 ] ASF GitHub Bot commented on ARROW-1913: --- wesm commented on issue #1788: ARROW-1913: [Java] Disable Javadoc doclint with Java 8 URL: https://github.com/apache/arrow/pull/1788#issuecomment-376268087 https://github.com/apache/arrow/blob/master/.travis.yml#L101 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Fix Javadoc generation bugs with JDK8 > > > Key: ARROW-1913 > URL: https://issues.apache.org/jira/browse/ARROW-1913 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Wes McKinney >Assignee: Li Jin >Priority: Minor > Labels: pull-request-available > Fix For: 0.10.0 > > > While trying to cut the release candidate, the source release script fails > due to various new Javadoc issues -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1913) [Java] Fix Javadoc generation bugs with JDK8
[ https://issues.apache.org/jira/browse/ARROW-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414299#comment-16414299 ] ASF GitHub Bot commented on ARROW-1913: --- wesm commented on issue #1788: ARROW-1913: [Java] Disable Javadoc doclint with Java 8 URL: https://github.com/apache/arrow/pull/1788#issuecomment-376268076 Should we change the Java part of the test matrix to use JDK8? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Java] Fix Javadoc generation bugs with JDK8 > > > Key: ARROW-1913 > URL: https://issues.apache.org/jira/browse/ARROW-1913 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java - Vectors >Reporter: Wes McKinney >Assignee: Li Jin >Priority: Minor > Labels: pull-request-available > Fix For: 0.10.0 > > > While trying to cut the release candidate, the source release script fails > due to various new Javadoc issues -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414224#comment-16414224 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#issuecomment-376254213 The benchmark already exists, see above :-) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414222#comment-16414222 ] ASF GitHub Bot commented on ARROW-2354: --- wesm commented on issue #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#issuecomment-376253924 Can you add an ASV benchmark to exercise the performance issue in https://github.com/apache/arrow/issues/1792? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414184#comment-16414184 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177172524 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: If the first implementation fails for someone, they'll notice and tell us. If the second implementation fails for someone, it will produce mysterious slowdowns. So I think the first implementation should win, for now at least. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414182#comment-16414182 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177171726 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: I guess it's a question of which is more stable: the C implementation of the decimal module or the way Python names types? I personally don't feel strongly about either one. If I was forced to choose, I probably stick with the first implementation. It seems less brittle, because it doesn't depend on Python's naming convention, but that intuition could be misleading. Maybe that's a very stable part of Python. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414170#comment-16414170 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177169733 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: Here is an alternative implementation. Which one do you prefer?
```c++
bool PyDecimal_Check(PyObject* obj) {
  OwnedRef decimal_type;
  PyTypeObject* obj_type = Py_TYPE(obj);
  // Fast fail for most types other than Decimal, for speed
  if (strcmp(obj_type->tp_name, "decimal.Decimal")) {
    return false;
  }
  Status status = ImportDecimalType(&decimal_type);
  DCHECK_OK(status);
  DCHECK(PyType_Check(decimal_type.obj()));
  // PyObject_IsInstance() is slower as it has to check for virtual subclasses
  const int result =
      PyType_IsSubtype(obj_type, reinterpret_cast<PyTypeObject*>(decimal_type.obj()));
  DCHECK_NE(result, -1) << " error during PyType_IsSubtype check";
  return result == 1;
}
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
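The "fast fail on the type name" idea in the comment above can be sketched in Python as a rough analogy. This is not the C++ code under review: `is_decimal_fast_fail` is a hypothetical name, and comparing `__module__` plus `__qualname__` is only an approximation of comparing `tp_name` in C.

```python
from decimal import Decimal


def is_decimal_fast_fail(obj):
    """Hypothetical helper: reject non-Decimal objects by type name first."""
    obj_type = type(obj)
    # Cheap rejection: compare the qualified type name as a string.
    qualified_name = f"{obj_type.__module__}.{obj_type.__qualname__}"
    if qualified_name != "decimal.Decimal":
        return False
    # Only objects that pass the name test reach the real subtype check.
    return issubclass(obj_type, Decimal)
```

In this sketch, a subclass of `Decimal` defined in another module has a different qualified name, so it is rejected by the name test before the subtype check runs. That is one way the approach depends on how types are named.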
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414166#comment-16414166 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177168902 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: No, because depending on how the C `decimal` module is implemented, there may be a separate Decimal type per interpreter. This is currently not the case, though, so perhaps we shouldn't worry about it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414153#comment-16414153 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177167278 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: Is that because the reference count will be shared across them? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414151#comment-16414151 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177166321 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; + if (!decimal_type.obj()) { +Status status = ImportDecimalType(&decimal_type); +DCHECK_OK(status); +DCHECK(PyType_Check(decimal_type.obj())); + } + // PyObject_IsInstance() is slower as it has to check for virtual subclasses Review comment: This was the cause of the performance regression, and not importing over and over? Or was it both? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414150#comment-16414150 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177166026 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; + if (!decimal_type.obj()) { Review comment: Maybe compare against `nullptr` here? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414152#comment-16414152 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177167130 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; + if (!decimal_type.obj()) { +Status status = ImportDecimalType(&decimal_type); +DCHECK_OK(status); +DCHECK(PyType_Check(decimal_type.obj())); + } + // PyObject_IsInstance() is slower as it has to check for virtual subclasses Review comment: It was importing over and over. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
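The cause named above (resolving the Decimal type again on every call) and the fix of keeping one long-lived reference can be illustrated with a rough Python analogy. This is not the C++ code from the pull request: `lookup_decimal_type`, `is_decimal_uncached`, `is_decimal_cached`, and the `stats` counter are hypothetical names, and the counter exists only to make the difference visible.

```python
import importlib

stats = {"lookups": 0}  # counts how often the type lookup runs


def lookup_decimal_type():
    """Resolve decimal.Decimal through the import machinery."""
    stats["lookups"] += 1
    return importlib.import_module("decimal").Decimal


def is_decimal_uncached(obj):
    # Mirrors the old behaviour: resolve the type on every single call.
    return isinstance(obj, lookup_decimal_type())


_decimal_type = None  # module-level cache, playing the role of the static reference


def is_decimal_cached(obj):
    # Mirrors the fix: resolve the type once, then reuse the reference.
    global _decimal_type
    if _decimal_type is None:
        _decimal_type = lookup_decimal_type()
    return isinstance(obj, _decimal_type)
```

Calling `is_decimal_uncached` N times runs the lookup N times, while `is_decimal_cached` runs it once for the lifetime of the module. The trade-off raised in the review still applies: the cached reference is shared by everything that uses this module.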
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414143#comment-16414143 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177166295 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? - OwnedRef Decimal; - Status status = ImportDecimalType(&Decimal); - DCHECK_OK(status); - const int32_t result = PyObject_IsInstance(obj, Decimal.obj()); - DCHECK_NE(result, -1) << " error during PyObject_IsInstance check"; + static OwnedRef decimal_type; Review comment: Note that this may not play well with multiple Python (sub)interpreters in the same process. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414137#comment-16414137 ] ASF GitHub Bot commented on ARROW-2354: --- cpcloud commented on a change in pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794#discussion_r177165146 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -227,12 +227,16 @@ Status UInt64FromPythonInt(PyObject* obj, uint64_t* out) { } bool PyDecimal_Check(PyObject* obj) { - // TODO(phillipc): Is this expensive? Review comment: Guess the answer is "yes" :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation
[ https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414136#comment-16414136 ] ASF GitHub Bot commented on ARROW-2301: --- xhochy opened a new pull request #1795: ARROW-2301: [Python] Build source distribution inside the manylinux1 docker URL: https://github.com/apache/arrow/pull/1795 @kou Once this is merged, I will upload the source distribution for 0.9.0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Add source distribution publishing instructions to package / release > management documentation > -- > > Key: ARROW-2301 > URL: https://issues.apache.org/jira/browse/ARROW-2301 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > > We wish to start publishing source tarballs for Python on PyPI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation
[ https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2301: -- Labels: pull-request-available (was: ) > [Python] Add source distribution publishing instructions to package / release > management documentation > -- > > Key: ARROW-2301 > URL: https://issues.apache.org/jira/browse/ARROW-2301 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > > We wish to start publishing source tarballs for Python on PyPI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2354: -- Labels: pull-request-available (was: ) > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
[ https://issues.apache.org/jira/browse/ARROW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414134#comment-16414134 ] ASF GitHub Bot commented on ARROW-2354: --- pitrou opened a new pull request #1794: ARROW-2354: [C++] Make PyDecimal_Check() faster URL: https://github.com/apache/arrow/pull/1794 This basically keeps an eternal reference to the decimal type. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] PyDecimal_Check() is much too slow > > > Key: ARROW-2354 > URL: https://issues.apache.org/jira/browse/ARROW-2354 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2354) [C++] PyDecimal_Check() is much too slow
Antoine Pitrou created ARROW-2354: - Summary: [C++] PyDecimal_Check() is much too slow Key: ARROW-2354 URL: https://issues.apache.org/jira/browse/ARROW-2354 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 0.9.0 Reporter: Antoine Pitrou Assignee: Antoine Pitrou See https://github.com/apache/arrow/issues/1792 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison
[ https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414108#comment-16414108 ] ASF GitHub Bot commented on ARROW-640: -- AlexHagerman commented on issue #1765: ARROW-640: [Python] Implement __hash__ and equality for Array scalar values Arrow scalar values URL: https://github.com/apache/arrow/pull/1765#issuecomment-376230898 @pitrou any other feedback or comments on this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Arrow scalar values should have a sensible __hash__ and comparison > --- > > Key: ARROW-640 > URL: https://issues.apache.org/jira/browse/ARROW-640 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Miki Tebeka >Assignee: Alex Hagerman >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > {noformat} > In [86]: arr = pa.from_pylist([1, 1, 1, 2]) > In [87]: set(arr) > Out[87]: {1, 2, 1, 1} > In [88]: arr[0] == arr[1] > Out[88]: False > In [89]: arr > Out[89]: > > [ > 1, > 1, > 1, > 2 > ] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
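The behaviour requested in this issue (equal scalars compare equal and collapse in a `set`) can be sketched in plain Python. `ScalarValue` is a hypothetical stand-in and not the pyarrow class or the implementation in the pull request; it only shows how `__eq__` and `__hash__` have to agree.

```python
class ScalarValue:
    """Hypothetical stand-in for an Arrow scalar wrapping one Python value."""

    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        # Compare by wrapped value; defer to the other operand otherwise.
        if isinstance(other, ScalarValue):
            return self._value == other._value
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally, so hash the wrapped value.
        return hash(self._value)

    def __repr__(self):
        return f"ScalarValue({self._value!r})"


values = [ScalarValue(v) for v in [1, 1, 1, 2]]
unique = set(values)  # duplicates collapse because hash and equality agree
```

Without both methods, Python falls back to identity-based hashing and comparison, which matches the symptom in the report: every element of the set is distinct and `arr[0] == arr[1]` is `False`.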
[jira] [Updated] (ARROW-2353) Test correctness of built wheel on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2353: -- Labels: pull-request-available (was: ) > Test correctness of built wheel on AppVeyor > --- > > Key: ARROW-2353 > URL: https://issues.apache.org/jira/browse/ARROW-2353 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2353) Test correctness of built wheel on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414088#comment-16414088 ] ASF GitHub Bot commented on ARROW-2353: --- pitrou opened a new pull request #1793: ARROW-2353: [CI] Check correctness of built wheel on AppVeyor URL: https://github.com/apache/arrow/pull/1793 And assorted fixes on Windows. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Test correctness of built wheel on AppVeyor > --- > > Key: ARROW-2353 > URL: https://issues.apache.org/jira/browse/ARROW-2353 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2353) Test correctness of built wheel on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414090#comment-16414090 ]

ASF GitHub Bot commented on ARROW-2353:
---

pitrou commented on a change in pull request #1793: ARROW-2353: [CI] Check correctness of built wheel on AppVeyor
URL: https://github.com/apache/arrow/pull/1793#discussion_r177153858

## File path: ci/msvc-build.bat ##
@@ -103,12 +103,12 @@ cmake -G "%GENERATOR%" ^
 cmake --build . --target install --config %CONFIGURATION% || exit /B

 @rem Needed so python-test.exe works
-set OLD_PYTHONPATH=%PYTHONPATH%
-set PYTHONPATH=%CONDA_PREFIX%\Lib;%CONDA_PREFIX%\Lib\site-packages;%CONDA_PREFIX%\python35.zip;%CONDA_PREFIX%\DLLs;%CONDA_PREFIX%;%PYTHONPATH%
+set OLD_PYTHONHOME=%PYTHONHOME%
+set PYTHONHOME=%CONDA_PREFIX%

Review comment:
@wesm This can also be done in `python-test.exe` instead. Which one do you prefer?

> Test correctness of built wheel on AppVeyor
> ---
>
> Key: ARROW-2353
> URL: https://issues.apache.org/jira/browse/ARROW-2353
> Project: Apache Arrow
> Issue Type: Task
> Components: Continuous Integration, Python
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
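The change under review replaces a hand-built `PYTHONPATH` with a single `PYTHONHOME` that points at the conda prefix. As a rough illustration of the same idea from Python, here is a minimal sketch that prepares an environment for a child process without touching the parent's variables. The helper name `env_with_pythonhome` and the example prefix path are hypothetical and do not come from the pull request.

```python
def env_with_pythonhome(base_env, prefix):
    """Return a copy of base_env with PYTHONHOME set to prefix.

    Hypothetical helper for illustration; the original mapping is left
    unmodified, so there is no old value to save and restore.
    """
    env = dict(base_env)
    env["PYTHONHOME"] = prefix
    return env


# Example values only; a real build would read these from os.environ.
base = {"CONDA_PREFIX": r"C:\Miniconda\envs\arrow", "PATH": r"C:\Windows"}
child_env = env_with_pythonhome(base, base["CONDA_PREFIX"])

print(child_env["PYTHONHOME"])
print("PYTHONHOME" in base)
```

The resulting mapping could be passed as the `env` argument of `subprocess.run` when launching a test executable. Copying the mapping plays the role that the `OLD_PYTHONHOME` variable plays in the batch file.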
[jira] [Created] (ARROW-2353) Test correctness of built wheel on AppVeyor
Antoine Pitrou created ARROW-2353:
-

Summary: Test correctness of built wheel on AppVeyor
Key: ARROW-2353
URL: https://issues.apache.org/jira/browse/ARROW-2353
Project: Apache Arrow
Issue Type: Task
Components: Continuous Integration, Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou