[ https://issues.apache.org/jira/browse/BEAM-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835766#comment-16835766 ]
Valentyn Tymofieiev commented on BEAM-7173:
-------------------------------------------

cc: [~angoenka]

> Bigquery connector should not enable schema autodetection without a user explicitly instructing to do so.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7173
>                 URL: https://issues.apache.org/jira/browse/BEAM-7173
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Pablo Estrada
>            Priority: Major
>             Fix For: 2.13.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently, the BQ_FILE_LOADS insertion method enables schema autodetection:
> [https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]
> It may be more user-friendly to let users opt in to schema autodetection
> in their pipelines across all use cases for the BQ connector. Schema
> autodetection is an approximation and does not always work.
> For example, schema autodetection cannot infer whether string data is
> binary bytes or a textual string, and will always prefer the latter. If schema
> autodetection is enabled by default, users who need to write 'bytes' data
> will always have to specify a schema, even when writing to a table that
> already exists and has a schema. Otherwise, the load will use the autodetected
> schema and try to write a 'string' entry into a 'bytes' field, and the write
> will fail.
> Related discussion:
> [https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
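
To illustrate the bytes-versus-string ambiguity described in the issue, here is a minimal sketch in plain Python. It does not call Beam or BigQuery. The function `naive_autodetect` is a hypothetical stand-in for type inference over JSON values, not BigQuery's actual algorithm, and the field names `name` and `payload` are made up for the example. It assumes binary data is serialized as base64 text in the JSON load file, since JSON has no bytes type.

```python
import base64
import json


def naive_autodetect(value):
    """Hypothetical type inference over a decoded JSON value.

    Illustration only: this is not BigQuery's autodetection logic.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    raise TypeError("unsupported JSON value: %r" % (value,))


# Binary data has to be turned into text before it can go into a JSON row.
raw = b"\x00\xffbinary payload"
row = {"name": "example", "payload": base64.b64encode(raw).decode("ascii")}
line = json.dumps(row)

# Inference only sees JSON strings, so both fields look like text.
detected = {key: naive_autodetect(val) for key, val in json.loads(line).items()}

# Schema of the (assumed) pre-existing destination table.
table_schema = {"name": "STRING", "payload": "BYTES"}

# Fields where the inferred type disagrees with the table's type.
mismatches = {
    key: (detected[key], table_schema[key])
    for key in table_schema
    if detected[key] != table_schema[key]
}
print(mismatches)
```

In this sketch the `payload` field is inferred as a string although the destination column holds bytes, which is the mismatch the issue describes. Passing an explicit schema to the write (for example via the `schema` argument of `WriteToBigQuery`) avoids relying on inference.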