[
https://issues.apache.org/jira/browse/BEAM-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845349#comment-16845349
]
Pablo Estrada commented on BEAM-7382:
-------------------------------------
[~Juta] - I see that this is using the native sink, which does not support
schema autodetection. What we can do in this case is simply raise an error
when autodetection is requested on the native sink. (Juta, you've already
added logic to dataflow_runner.py to catch this case; we'd just have to error
out there.)
Thoughts? [~Juta] [~tvalentyn]
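To make the proposal concrete, here is a minimal sketch of the kind of check I
have in mind. It is not the actual code in dataflow_runner.py: the function
name check_native_sink_schema and the uses_native_sink flag are hypothetical,
and SCHEMA_AUTODETECT is a local stand-in for
beam.io.gcp.bigquery.SCHEMA_AUTODETECT so that the snippet runs without Beam
installed.

```python
# Local stand-in for beam.io.gcp.bigquery.SCHEMA_AUTODETECT (assumption:
# defined here only so the sketch runs without Beam installed).
SCHEMA_AUTODETECT = 'SCHEMA_AUTODETECT'


def check_native_sink_schema(schema, uses_native_sink):
  """Hypothetical guard: reject schema autodetection on the native sink."""
  if uses_native_sink and schema == SCHEMA_AUTODETECT:
    raise ValueError(
        'Schema autodetection is not supported by the native BigQuery sink. '
        'Specify an explicit schema instead.')
  return schema


# An explicit schema passes through unchanged.
explicit = check_native_sink_schema('number:INTEGER,str:STRING', True)

# Autodetection on the native sink is rejected before the job is submitted.
try:
  check_native_sink_schema(SCHEMA_AUTODETECT, True)
  rejected = False
except ValueError:
  rejected = True
```

With a check like this, the failure would surface at pipeline construction
time with a clear message, rather than as a failed BigQuery import job.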
> Bigquery IO: schema autodetection failing
> -----------------------------------------
>
> Key: BEAM-7382
> URL: https://issues.apache.org/jira/browse/BEAM-7382
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Juta Staes
> Assignee: Pablo Estrada
> Priority: Major
>
> I am working on writing integration tests for BigQuery IO on the DataflowRunner.
> When testing schema autodetection I get:
> {code:java}
> ERROR: test_big_query_write_schema_autodetect (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py", line 156, in test_big_query_write_schema_autodetect
>     write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
>     self.run().wait_until_finish()
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 64, in run_pipeline
>     self.result.wait_until_finish(duration=wait_duration)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1322, in wait_until_finish
>     (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
> Workflow failed. Causes: S01:create/Read+write/WriteToBigQuery/NativeWrite failed., BigQuery import job "dataflow_job_18059625072014532771-B" failed., BigQuery job "dataflow_job_18059625072014532771-B" in project "apache-beam-testing" finished with error(s): errorResult: No schema specified on job or table., error: No schema specified on job or table.
> {code}
> test code:
> {code:java}
> input_data = [
>     {'number': 1, 'str': 'abc'},
>     {'number': 2, 'str': 'def'},
> ]
> with beam.Pipeline(argv=args) as p:
>   (p | 'create' >> beam.Create(input_data)
>      | 'write' >> beam.io.WriteToBigQuery(
>          output_table,
>          schema=beam.io.gcp.bigquery.SCHEMA_AUTODETECT,
>          create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>          write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> {code}
> Is there something wrong with my test or is this a bug?
> link to pr: [https://github.com/apache/beam/pull/8621]
> cc: [~tvalentyn]