[ https://issues.apache.org/jira/browse/AIRFLOW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637345#comment-15637345 ]
Giovanni Briggs commented on AIRFLOW-611:
-----------------------------------------

Alright then. Wanted to make sure I wasn't creating more work than it was worth!

> BigQuery Hooks and Operators "source_format" error
> --------------------------------------------------
>
>                 Key: AIRFLOW-611
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-611
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>            Reporter: Giovanni Briggs
>            Priority: Minor
>
> Found an issue with the *source_format* parameter for the
> GoogleCloudStorageToBigQueryOperator.
> I was trying to upload a JSON file from GCS to BQ and was using the value
> *"JSON"* for *source_format*, assuming that this would work. The upload
> process started, but then came back with an error saying:
> {code:javascript}
> {'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
> {code}
> There is nothing wrong with the JSON format of the doc, so I went and looked
> at the job description on BigQuery and saw that there was no "Source Format"
> entry. When I've successfully uploaded CSV files, the "Source Format" entry
> is present and says "CSV".
> According to Google's docs for [source format|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat],
> acceptable values are: "CSV", "NEWLINE_DELIMITED_JSON", "AVRO" and
> "GOOGLE_SHEETS". However, BigQuery doesn't raise an error if you pass a
> format not represented in that list (such as "JSON"). Instead, it looks like
> BigQuery assumes you mean CSV and tries to parse the file as a CSV file, which
> results in a completely different error.
> Not sure what the appropriate fix is (or if there even is one). At least
> having some additional documentation for the BigQuery hook and operators that
> points to the list of available values would be helpful.
> Otherwise, BigQuery's error leads you to believe that there is something
> wrong with the format of your data which is different than having something
> wrong with the setup of the API call.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
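
One possible client-side guard for the problem described in the issue is to reject unknown *source_format* values before the load job is submitted, so that the error points at the API call and not at the data. The following is a minimal sketch only, not the fix adopted in Airflow: the helper name `validate_source_format` and the constant `VALID_SOURCE_FORMATS` are hypothetical, and the set of values is taken from the issue text (with the spelling of NEWLINE_DELIMITED_JSON corrected), so it may not match what the load API actually accepts.

```python
# Values as listed in the issue description; this set is an assumption and
# should be checked against the BigQuery load-job documentation.
VALID_SOURCE_FORMATS = frozenset(
    ["CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS"]
)


def validate_source_format(source_format):
    """Hypothetical helper: return source_format unchanged, or raise ValueError.

    The comparison is exact (case-sensitive), so "JSON" and "csv" are rejected.
    """
    if source_format not in VALID_SOURCE_FORMATS:
        raise ValueError(
            "Unsupported source_format {!r}; expected one of: {}".format(
                source_format, ", ".join(sorted(VALID_SOURCE_FORMATS))
            )
        )
    return source_format
```

With a check like this, passing "JSON" would fail immediately with a message naming the accepted values, instead of BigQuery silently falling back to CSV parsing.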