[ https://issues.apache.org/jira/browse/BEAM-7137?focusedWorklogId=236104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-236104 ]
ASF GitHub Bot logged work on BEAM-7137:
----------------------------------------
Author: ASF GitHub Bot
Created on: 02/May/19 04:14
Start Date: 02/May/19 04:14
Worklog Time Spent: 10m
Work Description: tvalentyn commented on pull request #8452: [BEAM-7137] Writetotext header encode
URL: https://github.com/apache/beam/pull/8452#discussion_r280285297
##########
File path: sdks/python/apache_beam/io/textio.py
##########
@@ -390,7 +390,7 @@ def __init__(self,
   def open(self, temp_path):
     file_handle = super(_TextSink, self).open(temp_path)
     if self._header is not None:
-      file_handle.write(self._header)
+      file_handle.write(self.coder.encode(self._header))
Review comment:
This works when `coder` is left at its default value,
`coders.ToStringCoder()`. However, if the user passes a different coder to the
`WriteToText` PTransform, which instantiates `_TextSink`, that coder may not
encode text correctly.
The quickest fix would be to use:
`file_handle.write(coders.ToStringCoder().encode(self._header))`.
Alternatively, we could introduce a helper function, `as_bytes`, for this
purpose and put it somewhere convenient.
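
A minimal sketch of the helper option, for illustration only. The name
`as_bytes` comes from the comment above; its signature, the UTF-8 default, and
the `write_header` wrapper are assumptions and not existing Beam API. The point
is that the header is encoded independently of whatever coder the sink was
given:

```python
import io


def as_bytes(value, encoding='utf-8'):
  """Hypothetical helper: return `value` as bytes, encoding text if needed."""
  if isinstance(value, bytes):
    return value  # already bytes, pass through unchanged
  return value.encode(encoding)


def write_header(file_handle, header):
  """Illustrative stand-in for the header write in `_TextSink.open`."""
  if header is not None:
    file_handle.write(as_bytes(header))


# io.BytesIO stands in for the binary file handle opened by the sink.
handle = io.BytesIO()
write_header(handle, 'id,name\n')         # str header is encoded first
write_header(handle, b'already,bytes\n')  # bytes header is written as-is
```

Accepting both `str` and `bytes` would keep headers that were already passed
as bytes working unchanged.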
Issue Time Tracking
-------------------
Worklog Id: (was: 236104)
Time Spent: 40m (was: 0.5h)
> TypeError caused by using str variable as header argument in
> apache_beam.io.textio.WriteToText
> ----------------------------------------------------------------------------------------------
>
> Key: BEAM-7137
> URL: https://issues.apache.org/jira/browse/BEAM-7137
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Affects Versions: 2.11.0
> Environment: Python 3.5.6
> macOS Mojave 10.14.4
> Reporter: yoshiki obata
> Assignee: yoshiki obata
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Passing a str as the header argument to apache_beam.io.textio.WriteToText
> raises a TypeError on Python 3.5.6 (and possibly later versions), even though
> the docstring says header is a str.
> The error occurs because apache_beam.io.textio._TextSink.open writes the
> header to the file without encoding it to bytes.
>
> {code:java}
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 727, in apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 555, in apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 625, in apache_beam.runners.common.PerWindowInvoker._invoke_per_window
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1033, in process
>     self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
>     return fnc(self, *args, **kwargs)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 185, in open_writer
>     return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 389, in __init__
>     self.temp_handle = self.sink.open(temp_shard_path)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", line 393, in open
>     file_handle.write(self._header)
> TypeError: a bytes-like object is required, not 'str'
> {code}
>
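
The failure mode in the traceback above can be reproduced without Beam. The
following is a hedged sketch, assuming only that the sink's file handle accepts
bytes, as the TypeError indicates; `io.BytesIO` is a stand-in for that handle
and is not what Beam actually opens:

```python
import io

handle = io.BytesIO()  # stand-in for a bytes-only file handle

try:
  handle.write('header\n')  # str passed where bytes are required
  error = None
except TypeError as exc:
  error = exc  # same class of error as in the traceback

# Encoding the text first lets the write succeed on Python 3.
handle.write('header\n'.encode('utf-8'))
```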