[ 
https://issues.apache.org/jira/browse/BEAM-14235?focusedWorklogId=752561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752561
 ]

ASF GitHub Bot logged work on BEAM-14235:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/22 23:37
            Start Date: 04/Apr/22 23:37
    Worklog Time Spent: 10m 
      Work Description: cozos opened a new pull request, #17275:
URL: https://github.com/apache/beam/pull/17275

   The parquetio module cannot parse every [PEP-440](https://peps.python.org/pep-0440/) 
compliant `pyarrow` version. For example, if the `pyarrow` version is 
`1.0.0+abc.7`, importing `apache_beam` fails with the following:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
       mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
     File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
       __import__(pkg_name)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
       from apache_beam import io
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
       from apache_beam.io.parquetio import *
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
       ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
   ValueError: invalid literal for int() with base 10: '0+abc.7'
   ```
   
   In practice, this fails when somebody uses their own fork of `pyarrow` 
(like me). This change uses setuptools' `pkg_resources.parse_version`, which has 
been PEP-440 compliant since setuptools 6.0.
   
   
https://peps.python.org/pep-0440/#summary-of-differences-from-pkg-resources-parse-version
   
   ```
   Note: this comparison is to pkg_resources.parse_version as it existed at the 
time the PEP was written. After the PEP was accepted, setuptools 6.0 and later 
versions adopted the behaviour described in this PEP.
   ```
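
A minimal sketch of the failure and of a more tolerant parse, using only the standard library. It is not the change in this pull request, which uses `pkg_resources.parse_version`. The names `naive_major_version` and `major_version` are hypothetical, and the regular expression is an assumption that reads only the leading PEP 440 release segment rather than validating the whole version string.

```python
import re

# Leading part of a PEP 440 version: optional "v", optional epoch ("N!"),
# then the dotted release segment. Only the first release component is captured.
# This is an illustrative pattern, not the full PEP 440 grammar.
_RELEASE_PREFIX = re.compile(r"\s*v?(?:\d+!)?(\d+)(?:\.\d+)*", re.IGNORECASE)


def naive_major_version(version):
    """Mirror the failing expression: every dot-separated piece must be an int."""
    major, _, _ = map(int, version.split("."))
    return major


def major_version(version):
    """Return the first release component, ignoring suffixes such as '+abc.7'."""
    match = _RELEASE_PREFIX.match(version)
    if match is None:
        raise ValueError(f"cannot find a release segment in {version!r}")
    return int(match.group(1))


if __name__ == "__main__":
    for candidate in ("7.0.0", "1.0.0+abc.7"):
        try:
            print(candidate, "naive:", naive_major_version(candidate))
        except ValueError as exc:
            print(candidate, "naive failed:", exc)
        print(candidate, "tolerant:", major_version(candidate))
```

The naive version raises `ValueError` on a local version label because a piece such as `0+abc` is not an integer, while the tolerant version stops reading at the end of the release segment.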




Issue Time Tracking
-------------------

            Worklog Id:     (was: 752561)
    Remaining Estimate: 0h
            Time Spent: 10m

> parquetio module does not parse PEP-440 compliant Pyarrow version
> -----------------------------------------------------------------
>
>                 Key: BEAM-14235
>                 URL: https://issues.apache.org/jira/browse/BEAM-14235
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-parquet
>    Affects Versions: 2.27.0
>            Reporter: Arwin S Tio
>            Priority: P3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In versions >= 2.27.0, as introduced by this PR: 
> [https://github.com/apache/beam/pull/13302/files#diff-33b0b6b112036df96f341aa83b88efba9215ec14dfabc9db9e9ffe66a23154a2R55]
> the parquetio module parses the pyarrow version like this:
> {code:java}
> ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.')) {code}
> (see 
> [https://github.com/apache/beam/blob/v2.27.0/sdks/python/apache_beam/io/parquetio.py#L55])
>  
> This does not support all PEP-440 compliant versions: 
> [https://peps.python.org/pep-0440/]
>  
> For example, if pyarrow were to have a version like this: *1.0.0+abc.7*, then 
> this module would fail:
> {code:java}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
>     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>   File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
>     __import__(pkg_name)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
>     from apache_beam import io
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
>     from apache_beam.io.parquetio import *
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
>     ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
> ValueError: invalid literal for int() with base 10: '0+abc.7'{code}
>  
> In practice, this would fail when somebody forks pyarrow, like yours truly.
>  
> We can fix this by using *pkg_resources.parse_version*, which is PEP-440 
> compliant starting with setuptools 6.0. 
>  
> If maintainers agree with this change I would be willing to submit a PR.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
