[ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189665
 ]

ASF GitHub Bot logged work on BEAM-6154:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jan/19 20:09
            Start Date: 24/Jan/19 20:09
    Worklog Time Spent: 10m 
      Work Description: markflyhigh commented on issue #7617: [BEAM-6154] 
Update google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457338679
 
 
   Run Python Dataflow ValidatesRunner
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 189665)
    Time Spent: 1.5h  (was: 1h 20m)

> Gcsio batch delete broken in Python 3
> -------------------------------------
>
>                 Key: BEAM-6154
>                 URL: https://issues.apache.org/jira/browse/BEAM-6154
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I'm running the Python SDK against GCP in Python 3.5 and got the following 
> gcsio error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in <genexpr>
>     window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
>     num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
>     return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
>     raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
>     FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
>     return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
>     copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
>     api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
>     batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
>     self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
>     mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into the related code in the apitools library, I found that the 
> response.content returned by the HTTP request to GCS is bytes, and apitools 
> does not handle this case. This can block any pipeline that depends on gcsio 
> and apparently blocks all Dataflow jobs in Python 3.
> This could be another reason to move off the apitools dependency, as tracked 
> in [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].
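
For illustration, the failure in the last traceback frame can be reproduced without apitools or GCS. The sketch below is a minimal, standalone example and not necessarily how apitools 0.5.26 addresses it. The helper name parse_mime_response, the utf-8 default, and the sample header and body are assumptions made up for this example. It shows that concatenating a str header with a bytes body raises TypeError in Python 3, and that decoding the body first lets email.parser.Parser.parsestr succeed.

```python
from email.parser import Parser


def parse_mime_response(header, content, encoding="utf-8"):
    """Hypothetical helper: parse a batch HTTP response body as MIME.

    `header` is a str. `content` may be str or bytes; in Python 3 an
    HTTP response body typically arrives as bytes.
    """
    if isinstance(content, bytes):
        # Decode before concatenating: str + bytes raises TypeError in Python 3.
        content = content.decode(encoding)
    return Parser().parsestr(header + content)


# Made-up sample data shaped like a one-part multipart batch response.
header = 'Content-Type: multipart/mixed; boundary="batch_1"\n\n'
body = (
    b"--batch_1\n"
    b"Content-Type: application/http\n"
    b"\n"
    b"HTTP/1.1 204 No Content\n"
    b"\n"
    b"--batch_1--\n"
)

# The unpatched pattern from the traceback: str + bytes.
try:
    header + body
except TypeError as exc:
    print("unpatched concatenation fails:", exc)

# The decoded variant parses into a multipart message.
message = parse_mime_response(header, body)
parts = message.get_payload()
print(message.is_multipart(), len(parts))
```

Decoding at the boundary keeps the rest of the parsing code working on str in both Python 2 and Python 3. The right encoding for a real response depends on its headers, so the utf-8 default here is only a placeholder.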



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
