criptonyte opened a new issue, #35425:
URL: https://github.com/apache/beam/issues/35425

   ### What happened?
   
   We are reporting an issue where BigQueryIO failed because `maxRequestSize` was exceeded. This occurred when appending ProtoRows where the serialised size of a single row pushed the overall request beyond the BigQuery API's defined limit, despite the code's internal checks and a "TODO" comment suggesting this scenario is "nearly impossible".
   
   ### Problematic Code Location
   This issue originates around line [885](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java#L885) of `StorageApiWritesShardedRecords`.
   
   Specifically the code block:
   
   ```java
        // Handle the case where the request is too large.
        if (inserts.getSerializedSize() >= maxRequestSize) {
          if (inserts.getSerializedRowsCount() > 1) {
            // TODO(reuvenlax): Is it worth trying to handle this case by splitting the protoRows?
            // Given that we split the ProtoRows iterable at 2MB and the max request size is 10MB,
            // this scenario seems nearly impossible.
            LOG.error(
                "A request containing more than one row is over the request size limit of {}. "
                    + "This is unexpected. All rows in the request will be sent to the failed-rows PCollection.",
                maxRequestSize);
          }
   ```
   
   However, we found that the same problematic code also exists in `StorageApiWriteUnshardedRecords` at line [640](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java#L640).
   
   ### Context and Steps to Reproduce
   1. Pipeline: We utilise an Apache Beam pipeline (running on version 2.62.0) that processes and writes data to BigQuery using BigQueryIO via the Storage Write API.
   2. Data Trigger: The incident was triggered by a specific, albeit rare, data pattern within the input stream (a minimal sketch of this pattern follows the list below). A particular record generated an exceptionally large serialised payload when converted to ProtoRows. This large payload, when included in its micro-batch, caused the total size of the request to exceed the 10MB BigQuery API limit. Crucially, the micro-batch was constructed and submitted before any definitive check or pre-validation could prevent it from exceeding the BigQuery API's hard limit. This suggests a reliance on post-submission validation and a lack of proactive size management for dynamically growing ProtoRows payloads.
   3. Observed Behaviour: Despite the internal BigQueryIO logic and comments suggesting this would not occur, our production pipeline did encounter this exact issue. The described LOG.error message was triggered, leading to the failure of the affected micro-batch and downstream data synchronisation problems.
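   For reference, a self-contained sketch of the kind of pipeline and data pattern involved is below. The project, dataset, field name, payload size, and class name are placeholders rather than our production values, and the exact serialised size at which the limit is breached depends on the proto conversion.

   ```java
import java.util.Arrays;
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;

public class OversizedRowRepro {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Build a single row whose serialised payload is close to the 10MB AppendRows limit.
    char[] filler = new char[9_500_000];
    Arrays.fill(filler, 'x');
    TableRow bigRow = new TableRow().set("payload", new String(filler));

    p.apply(Create.of(bigRow).withCoder(TableRowJsonCoder.of()))
        .apply(
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder destination
                .withSchema(
                    new TableSchema()
                        .setFields(
                            Collections.singletonList(
                                new TableFieldSchema().setName("payload").setType("STRING"))))
                .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    p.run().waitUntilFinish();
  }
}
   ```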
   ### Expected Behaviour
   The logic should robustly handle scenarios where serialised ProtoRows within 
a micro-batch, or even a single very large row, approach or exceed the 
maxRequestSize (10MB). This should involve:
   1. More sophisticated pre-validation or dynamic splitting mechanisms to ensure that individual AppendRows requests always remain within BigQuery API limits (one possible splitting approach is sketched after this list).
   2. More resilient error handling that minimises the impact of an oversized 
record, ideally allowing the successful processing of other valid records in 
the batch.
   3. A re-evaluation and implementation of a solution for the "TODO" comment, 
acknowledging that this scenario is indeed possible in real-world production 
data.
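   On point 1, one possible direction is sketched below. This is a rough sketch only, not a claim about how BigQueryIO should implement it: the `ProtoRowsSplitter.split` helper is hypothetical, and the size accounting is approximate (it ignores per-row field tag and length overhead).

   ```java
import java.util.ArrayList;
import java.util.List;

import com.google.cloud.bigquery.storage.v1.ProtoRows;
import com.google.protobuf.ByteString;

public class ProtoRowsSplitter {

  /**
   * Splits an oversized ProtoRows batch into sub-batches whose serialised size stays under
   * maxRequestSize. A row that is individually larger than the limit ends up alone in its own
   * sub-batch, which the caller can then route to the failed-rows output instead of failing
   * the whole batch.
   */
  public static List<ProtoRows> split(ProtoRows oversized, long maxRequestSize) {
    List<ProtoRows> batches = new ArrayList<>();
    ProtoRows.Builder current = ProtoRows.newBuilder();
    for (ByteString row : oversized.getSerializedRowsList()) {
      // Start a new sub-batch if adding this row would push the current one over the limit.
      if (current.getSerializedRowsCount() > 0
          && current.build().getSerializedSize() + row.size() >= maxRequestSize) {
        batches.add(current.build());
        current = ProtoRows.newBuilder();
      }
      current.addSerializedRows(row);
    }
    if (current.getSerializedRowsCount() > 0) {
      batches.add(current.build());
    }
    return batches;
  }
}
   ```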
   
   ### Actual Behaviour
   The pipeline failed to write data to BigQuery, logging the LOG.error regarding the maxRequestSize being exceeded. This resulted in:
   1. Data sync failure for the affected micro-batch.
   2. Interruption of critical data flows.
   3. Operational effort due to manual intervention and remediation.
   
   ### Proposed Solution / Request
   1. Prioritise the implementation of a robust solution for the "TODO" within 
`StorageApiWritesShardedRecords` (and `StorageApiWriteUnshardedRecords`), as 
our experience confirms it is a real-world edge case with production impact.
   2. Implement more resilient splitting or batching logic to prevent single large records or small groups of records from causing the entire micro-batch to fail against the 10MB BigQuery limit (the interim, user-side mitigation we are evaluating is sketched below).
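   Until a fix lands, the interim mitigation we are evaluating on our side is sketched below. The class name, the 9MB threshold, and the JSON-based size estimate are assumptions; the estimate only approximates the eventual proto payload size and is not what BigQueryIO does internally.

   ```java
import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.TupleTag;

import com.google.api.services.bigquery.model.TableRow;

/**
 * Interim, user-side mitigation (sketch): route rows whose approximate encoded size already
 * exceeds a conservative threshold to a side output instead of handing them to BigQueryIO.
 * The 9MB threshold and the JSON-based size estimate are assumptions, not exact proto sizes.
 */
public class OversizedRowFilterFn extends DoFn<TableRow, TableRow> {
  public static final TupleTag<TableRow> VALID_ROWS = new TupleTag<TableRow>() {};
  public static final TupleTag<TableRow> OVERSIZED_ROWS = new TupleTag<TableRow>() {};

  private static final long SIZE_THRESHOLD_BYTES = 9L * 1024 * 1024;

  @ProcessElement
  public void processElement(ProcessContext c) {
    TableRow row = c.element();
    // Rough upper bound on the row's serialised size, using its JSON representation.
    long approxSize = row.toString().getBytes(StandardCharsets.UTF_8).length;
    c.output(approxSize >= SIZE_THRESHOLD_BYTES ? OVERSIZED_ROWS : VALID_ROWS, row);
  }
}
   ```

   The two outputs would be wired up with `ParDo.of(new OversizedRowFilterFn()).withOutputTags(OversizedRowFilterFn.VALID_ROWS, TupleTagList.of(OversizedRowFilterFn.OVERSIZED_ROWS))`, with the oversized output routed to a dead-letter sink.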
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [x] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [x] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [x] Component: Google Cloud Dataflow Runner

