Glenn Ammons created BEAM-4486:
----------------------------------

             Summary: BigQuery: FILE_LOADS + CREATE_NEVER + field-based 
partitioning => missing schema exception
                 Key: BEAM-4486
                 URL: https://issues.apache.org/jira/browse/BEAM-4486
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
    Affects Versions: 2.4.0
            Reporter: Glenn Ammons
            Assignee: Chamikara Jayalath


Our pipeline gets this error from BigQuery when using 
BigQueryIO.Write.Method.FILE_LOADS, 
BigQueryIO.Write.CreateDisposition.CREATE_NEVER, and field-based time 
partitioning (full exception at the bottom of this note):

    Table with field based partitioning must have a schema.

We do supply a schema when we create the pipeline by calling 
BigQueryIO.Write.withSchema, but this schema is ignored because the 
processElement method here:

[https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java]

always passes a null schema to the load job when the create disposition is 
CREATE_NEVER.

I would expect Beam to use the provided schema no matter what setting we are 
using for the CreateDisposition.
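
For reference, a minimal sketch of the write configuration that triggers the failure. The project, dataset, table, and field names are illustrative, not from our actual pipeline, and `rows` stands for any `PCollection<TableRow>`:

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TimePartitioning;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

// A schema is supplied explicitly via withSchema ...
TableSchema schema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("event_ts").setType("TIMESTAMP"),
    new TableFieldSchema().setName("payload").setType("STRING")));

rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.my_table")  // hypothetical destination
    .withSchema(schema)  // ... but WriteTables drops it under CREATE_NEVER
    .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    // Field-based partitioning is what makes BigQuery demand a schema:
    .withTimePartitioning(
        new TimePartitioning().setType("DAY").setField("event_ts")));
```

With CREATE_IF_NEEDED, or with ingestion-time partitioning (no setField), the same pipeline completes.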

 

Full exception:

java.io.IOException: Unable to insert job: 078646f70a664daaa1ed96832b233036_19e873cd24cf1968559515e49b3d868d_00001_00000-0, aborting after 9 .
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:236)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:204)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startLoadJob(BigQueryServicesImpl.java:144)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.load(WriteTables.java:259)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.access$600(WriteTables.java:77)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:155)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request { "code" : 400, "errors" : [ { "domain" : "global", "message" : "Table with field based partitioning must have a schema.", "reason" : "invalid" } ], "message" : "Table with field based partitioning must have a schema." }
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1065)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:218)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:204)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startLoadJob(BigQueryServicesImpl.java:144)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.load(WriteTables.java:259)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables.access$600(WriteTables.java:77)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:155)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
	at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
	at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:138)
	at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.startBundle(StreamingSideInputDoFnRunner.java:60)
	at com.google.cloud.dataflow.worker.SimpleParDoFn.reallyStartBundle(SimpleParDoFn.java:300)
	at com.google.cloud.dataflow.worker.SimpleParDoFn.startBundle(SimpleParDoFn.java:226)
	at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.start(ParDoOperation.java:35)
	at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:67)
	at com.google.cloud.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1197)
	at com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:137)
	at com.google.cloud.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:940)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
