[
https://issues.apache.org/jira/browse/BEAM-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791801#comment-16791801
]
Juta Staes commented on BEAM-6769:
----------------------------------
I looked into this issue and found:
* The current test on Python 2 fails when attempting to write
b'\xab\xac\xad' to BigQuery
* The expected way to write bytes to BigQuery is to pass base64-encoded
strings to the BigQuery client
I opened a PR that changes the BigQuery write path to use base64 encoding:
[https://github.com/apache/beam/pull/8047]
I have some questions about this:
* Do we expect users to handle the base64 encoding themselves (as they would
when using the BigQuery client directly), or should BigQuery IO do it for them?
* The current test reads bytes from BigQuery and then writes them back. Would
it be worthwhile to add a test that first writes and then reads the bytes, so
that writing bytes from Python is exercised directly rather than relying on
values read from BigQuery?
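For illustration, a minimal sketch of the base64 approach described above (this is not the PR's actual code; `encode_row` is a hypothetical helper, and it assumes rows are plain dicts and that the BigQuery client accepts BYTES values as base64 strings):

```python
import base64
import json

# On Python 3, json.dumps({'test': b'test'}) raises TypeError,
# so bytes values must be base64-encoded into ASCII strings first.
def encode_row(row):
    """Base64-encode any bytes values so the row is JSON-serializable."""
    return {
        key: base64.b64encode(value).decode('ascii')
        if isinstance(value, bytes) else value
        for key, value in row.items()
    }

row = {'test': b'\xab\xac\xad'}
payload = json.dumps(encode_row(row))

# The original bytes are recoverable by base64-decoding on read.
assert base64.b64decode(json.loads(payload)['test']) == b'\xab\xac\xad'
```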
> BigQuery IO does not support bytes in Python 3
> ----------------------------------------------
>
> Key: BEAM-6769
> URL: https://issues.apache.org/jira/browse/BEAM-6769
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Juta Staes
> Assignee: Juta Staes
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186]
> Python 3 does not support
> {noformat}
> json.dumps({'test': b'test'}){noformat}
> which is used to encode the data in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959]
>
> How should writing bytes to BigQuery be handled in Python 3?
> * Forbid writing bytes into BigQuery on Python 3
> * Guess the encoding (utf-8?)
> * Pass the encoding to BigQuery
> cc: [~tvalentyn]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)