[ https://issues.apache.org/jira/browse/AIRFLOW-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349967#comment-16349967 ]

ASF subversion and git services commented on AIRFLOW-2053:
----------------------------------------------------------

Commit fd4360b9f0954b3dd4a960153178a06112f05a33 in incubator-airflow's branch refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=fd4360b ]

[AIRFLOW-2053] Fix quote character bug in BQ hook

Modified the condition to check if the quote_character is set. This will allow to set `quote_character` as empty string when the data doesn't contain quoted sections.

Closes #2996 from kaxil/bq_hook_quote_fix

> BigQuery Hook bug when data doesn't contain quoted values
> ---------------------------------------------------------
>
>                 Key: AIRFLOW-2053
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2053
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Kaxil Naik
>            Assignee: Kaxil Naik
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> The BigQuery API states [here|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.quote] that:
> {quote}The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ('"'). If your data does not contain quoted sections, set the property value to an empty string. {quote}
> But the [current implementation|https://github.com/apache/incubator-airflow/blob/6ee4bbd4b1bc4b3f275f7946e2bcdd123970e2dd/airflow/contrib/hooks/bigquery_hook.py#L802] of `run_load` in the BigQuery hook has an incorrect check for including `quote_character`.
> The code currently is:
> {code:python}
> if 'fieldDelimiter' not in src_fmt_configs:
>     src_fmt_configs['fieldDelimiter'] = field_delimiter
> if quote_character:
>     src_fmt_configs['quote'] = quote_character
> if allow_quoted_newlines:
>     src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
> {code}
> If my data doesn't have quote characters, then as per the BQ API docs I need to set `quote=''`, i.e. an empty string. The above condition `if quote_character:` evaluates to false for an empty string, so `quote` is never added to the configuration and the documented default (a double quote) applies. Hence, I get the following error:
> {code:json}
> {'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
> {code}
> So, the condition should be:
> {code:python}
> if quote_character is not None:
>     src_fmt_configs['quote'] = quote_character
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
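
The difference between the two conditions quoted in the issue can be reproduced in isolation. The following is a minimal sketch, not the hook's actual code: `build_quote_config` and its `fixed` flag are hypothetical names introduced only for illustration, and the function mirrors just the `quote` handling shown in the issue.

```python
def build_quote_config(quote_character, fixed=True):
    """Hypothetical helper: mirrors only the `quote` handling from the issue.

    fixed=False uses the original truthiness check (`if quote_character:`).
    fixed=True uses the corrected check (`if quote_character is not None:`).
    """
    src_fmt_configs = {}
    if fixed:
        include_quote = quote_character is not None
    else:
        # An empty string is falsy, so '' is silently dropped here.
        include_quote = bool(quote_character)
    if include_quote:
        src_fmt_configs['quote'] = quote_character
    return src_fmt_configs


# Empty string: dropped by the old check, kept by the new one.
old_result = build_quote_config('', fixed=False)
new_result = build_quote_config('', fixed=True)
# None still means "not set" under the corrected check.
none_result = build_quote_config(None, fixed=True)

print(old_result)
print(new_result)
print(none_result)
```

With the corrected check, only `None` means "leave the option out", so a caller can pass an empty string through explicitly.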