[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-06-04 Thread Danny McCormick (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550150#comment-17550150 ]

Danny McCormick commented on BEAM-14146:


This issue has been migrated to https://github.com/apache/beam/issues/21711

> Python Streaming job failing to drain with BigQueryIO write errors
> -------------------------------------------------------------------
>
>                 Key: BEAM-14146
>                 URL: https://issues.apache.org/jira/browse/BEAM-14146
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp, sdk-py-core
>    Affects Versions: 2.37.0
>            Reporter: Rahul Iyer
>            Assignee: Heejong Lee
>            Priority: P1
>             Fix For: 2.40.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We have a Python Streaming Dataflow job that writes to BigQuery using the
> {{FILE_LOADS}} method and {{auto_sharding}} enabled. When we try to drain
> the job, it fails with the following error:
> {code:python}
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1000, in perform_load_job
> ValueError: Either a non-empty list of fully-qualified source URIs must be provided via the source_uris parameter or an open file object must be provided via the source_stream parameter.
> {code}
> Our {{WriteToBigQuery}} configuration:
> {code:python}
> beam.io.WriteToBigQuery(
>     table=options.output_table,
>     schema=bq_schema,
>     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
>     insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR,
>     method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>     additional_bq_parameters={
>         "timePartitioning": {
>             "type": "HOUR",
>             "field": "bq_insert_timestamp",
>         },
>         "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION"],
>     },
>     triggering_frequency=120,
>     with_auto_sharding=True,
> )
> {code}
> We are also noticing that the job only fails to drain when there are actual
> schema updates. If there are no schema updates, the job drains without the
> above error.





[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-08 Thread Johan Brodin (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533645#comment-17533645 ]

Johan Brodin commented on BEAM-14146:

Hi! I am seeing similar issues in the Java SDK when draining a job that uses the
new BigQuery insert method. Is anyone else seeing the same thing? Could it have
been introduced at the same time? In my case the exception was thrown at drain
time rather than just being logged, as here.



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-06 Thread Heejong Lee (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533071#comment-17533071 ]

Heejong Lee commented on BEAM-14146:


I assume the issue is fixed. We can close this now and reopen it later if the
issue persists.



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-06 Thread Yichi Zhang (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533011#comment-17533011 ]

Yichi Zhang commented on BEAM-14146:


[~heejong] The PR is merged; is the issue fixed?



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-04 Thread Chamikara Madhusanka Jayalath (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531993#comment-17531993 ]

Chamikara Madhusanka Jayalath commented on BEAM-14146:

cc: [~cccyang]



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-04 Thread Chamikara Madhusanka Jayalath (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531992#comment-17531992 ]

Chamikara Madhusanka Jayalath commented on BEAM-14146:

Looks like the check was added
[here|https://github.com/apache/beam/pull/14113/files].

I'm OK with relaxing that check if it can be done safely. Also, it seems we
support providing an upload stream with the request. Is that only used for
testing?
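
For reference, the check in question enforces that exactly one input mode is
supplied. A minimal sketch of its shape, reconstructed from the error text (the
real method in {{bigquery_tools.py}} takes many more parameters):

{code:python}
def perform_load_job(source_uris=None, source_stream=None):
    # Reconstructed guard: a load job needs either a non-empty list of
    # fully-qualified GCS URIs or an open file object to stream from.
    if not source_uris and source_stream is None:
        raise ValueError(
            'Either a non-empty list of fully-qualified source URIs must be '
            'provided via the source_uris parameter or an open file object '
            'must be provided via the source_stream parameter.')
    # ... issue the load job from whichever input was given ...
{code}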




[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-02 Thread Heejong Lee (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530996#comment-17530996 ]

Heejong Lee commented on BEAM-14146:


[~chamikara] [~pabloem] Do we really need
[this|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1005]
check? It looks like {{perform_load_job}} can be called with an empty list of
files during pipeline draining. Could we just make it a no-op with a warning
message when it's called with an empty file list?
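
A minimal sketch of that relaxation, keeping the same simplified shape as the
sketch above (hypothetical; the real {{perform_load_job}} takes many more
parameters):

{code:python}
import logging

def perform_load_job(source_uris=None, source_stream=None):
    if not source_uris and source_stream is None:
        # Proposed relaxation: during a drain an empty file list can reach
        # this point, so warn and skip the load job instead of raising.
        logging.warning(
            'perform_load_job called with no source URIs or source stream; '
            'skipping load job.')
        return None
    # ... issue the load job as before ...
{code}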



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-04-29 Thread Chamikara Madhusanka Jayalath (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530248#comment-17530248 ]

Chamikara Madhusanka Jayalath commented on BEAM-14146:

Forwarding to Heejong.



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-04-19 Thread Chamikara Madhusanka Jayalath (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524645#comment-17524645 ]

Chamikara Madhusanka Jayalath commented on BEAM-14146:

I'm not sure whether "schemaUpdateOptions" can be correctly supported for
streaming pipelines, where we have to repeatedly trigger multiple load jobs.

Assigning to Pablo to look into this and either
(1) make sure it's supported correctly and add appropriate tests, or
(2) document that this is not supported for streaming and fail such pipelines
early (a sketch of this option follows below).
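
A minimal sketch of what option (2) could look like (hypothetical helper and
names; not actual Beam code):

{code:python}
def _validate_schema_update_options(additional_bq_parameters, is_streaming):
    # Hypothetical early check: fail at pipeline construction time rather
    # than at drain time if schemaUpdateOptions is set on a streaming write.
    if is_streaming and additional_bq_parameters.get('schemaUpdateOptions'):
        raise ValueError(
            'schemaUpdateOptions is not supported for streaming FILE_LOADS '
            'writes, which repeatedly trigger load jobs.')
{code}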



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-03-24 Thread Rahul Iyer (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512006#comment-17512006 ]

Rahul Iyer commented on BEAM-14146:

{quote}
Are you able to unblock by using the STREAMING_INSERTS method?
{quote}
No, we want to use
{code}
"schemaUpdateOptions": ["ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION"],
{code}
and I believe this is not supported while using {{STREAMING_INSERTS}}.

{quote}
Also, does this occur consistently?
{quote}
I believe so, yes.



[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-03-24 Thread Chamikara Madhusanka Jayalath (Jira)


[ https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512002#comment-17512002 ]

Chamikara Madhusanka Jayalath commented on BEAM-14146:

Thanks for reporting. Are you able to unblock by using the STREAMING_INSERTS
method?

Also, does this occur consistently?

cc: [~pabloem]
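
For reference, a minimal sketch of that switch, adapted from the reporter's
configuration above. Note that {{schemaUpdateOptions}} is a load-job option
with no streaming-insert equivalent, which is likely why this would not
unblock the reporter (see their reply above):

{code:python}
beam.io.WriteToBigQuery(
    table=options.output_table,
    schema=bq_schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR,
    method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
    # schemaUpdateOptions is omitted: it configures BigQuery load jobs and
    # has no equivalent in the streaming-insert API.
)
{code}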
