bito-code-review[bot] commented on code in PR #38588:
URL: https://github.com/apache/superset/pull/38588#discussion_r2920889747


##########
superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts:
##########
@@ -77,6 +77,7 @@ export interface ChartDataResponseResult {
   // TODO(hainenber): define proper type for below attributes
   rejected_filters?: any[];
   applied_filters?: any[];
+  warning?: string | null;

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Backend schema missing warning field</b></div>
   <div id="fix">
   
   The added `warning` field matches the backend response data, but the backend
   Marshmallow schema lacks it, potentially causing validation issues. Add
   `warning = fields.String(allow_none=True)` to the backend
   `ChartDataResponseResult` schema.
   </div>
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #e5147f</i></small>
   </div>
   
   ---
   Should Bito avoid suggestions like this for future reviews? (<a href=https://alpha.bito.ai/home/ai-agents/review-rules>Manage Rules</a>)
   - [ ] Yes, avoid them



##########
superset/db_engine_specs/bigquery.py:
##########
@@ -304,12 +306,72 @@ def convert_dttm(
 
     @classmethod
     def fetch_data(cls, cursor: Any, limit: int | None = None) -> list[tuple[Any, ...]]:
-        data = super().fetch_data(cursor, limit)
-        # Support type BigQuery Row, introduced here PR #4071
-        # google.cloud.bigquery.table.Row
-        if data and type(data[0]).__name__ == "Row":
-            data = [r.values() for r in data]  # type: ignore
-        return data
+        """
+        Progressive fetch for BigQuery to prevent browser memory overload.
+
+        Samples a first batch to estimate row size, then extrapolates the
+        total number of rows that fit within ``BQ_FETCH_MAX_MB``.
+        Falls back to the parent implementation on any error.
+        """
+        max_mb: int = (
+            current_app.config.get("BQ_FETCH_MAX_MB", 200) if current_app else 200
+        )
+        max_bytes = max_mb * 1024 * 1024
+
+        try:
+            initial_batch_size = min(1000, limit) if limit else 1000
+            first_batch: list[Any] = cursor.fetchmany(initial_batch_size)
+
+            if not first_batch:
+                g.bq_memory_limited = False
+                g.bq_memory_limited_row_count = 0
+                return []
+
+            # Support BigQuery Row objects (PR #4071)
+            if type(first_batch[0]).__name__ == "Row":
+                first_batch = [r.values() for r in first_batch]
+
+            # Estimate how many rows fit in the memory budget
+            first_batch_bytes = sys.getsizeof(str(first_batch))
+            rows_fetched = len(first_batch)
+            avg_bytes_per_row = first_batch_bytes / rows_fetched
+            total_rows_for_target = int(max_bytes / avg_bytes_per_row)
+
+            if limit:
+                total_rows_for_target = min(limit, total_rows_for_target)
+
+            remaining_rows = total_rows_for_target - rows_fetched
+
+            # First batch already covers the budget or the result set
+            if rows_fetched < initial_batch_size or remaining_rows <= 0:
+                memory_limited = (
+                    remaining_rows <= 0 and rows_fetched == initial_batch_size
+                )
+                g.bq_memory_limited = memory_limited
+                g.bq_memory_limited_row_count = len(first_batch)
+                return first_batch
+
+            # Fetch the rest up to the budget
+            second_batch: list[Any] = cursor.fetchmany(remaining_rows) or []
+            if second_batch and type(second_batch[0]).__name__ == "Row":
+                second_batch = [r.values() for r in second_batch]
+
+            data = first_batch + second_batch
+
+            # If we received exactly what we asked for, more rows may exist
+            memory_limited = len(second_batch) == remaining_rows
+            g.bq_memory_limited = memory_limited
+            g.bq_memory_limited_row_count = len(data)
+            return data
+
+        except Exception:  # pylint: disable=broad-except

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Avoid blind exception catch</b></div>
   <div id="fix">
   
   Replace the broad `except Exception:` at line 367 with the specific
   exception types expected from cursor operations, e.g. the DB-API
   `DatabaseError` and `OperationalError`. This improves error-handling
   clarity, prevents masking unexpected errors, and makes the
   `# pylint: disable=broad-except` comment unnecessary.
   </div>
   
   
   <details>
   <summary>
   <b>Code suggestion</b>
   </summary>
   <blockquote>Check the AI-generated fix before applying</blockquote>
   <div id="code">
   
   
   ````suggestion
        except (DatabaseError, OperationalError):
   ````
   
   </div>
   </details>
   
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #e5147f</i></small>
   </div>
   
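For illustration, a minimal stdlib sketch of the narrower catch, using `sqlite3` as a stand-in DB-API driver (the `fetch_or_fallback` helper and the failing query are hypothetical; a BigQuery cursor would raise its own driver's `DatabaseError`/`OperationalError` subclasses):

```python
import sqlite3

def fetch_or_fallback(cursor):
    """Sketch of the budgeted fetch: degrade gracefully only on expected DB errors."""
    try:
        # Hypothetical failing query standing in for the budgeted BigQuery fetch.
        return cursor.execute("SELECT * FROM missing_table").fetchall()
    except (sqlite3.DatabaseError, sqlite3.OperationalError):
        # Expected cursor-level failure: fall back instead of crashing.
        return []

conn = sqlite3.connect(":memory:")
rows = fetch_or_fallback(conn.cursor())
print(rows)  # → []
```

Since `OperationalError` subclasses `DatabaseError` in DB-API drivers, listing both is redundant but harmless; the point is that a programming bug such as an `AttributeError` is no longer silently swallowed.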



##########
superset/db_engine_specs/bigquery.py:
##########
@@ -304,12 +306,72 @@ def convert_dttm(
 
     @classmethod
     def fetch_data(cls, cursor: Any, limit: int | None = None) -> list[tuple[Any, ...]]:
-        data = super().fetch_data(cursor, limit)
-        # Support type BigQuery Row, introduced here PR #4071
-        # google.cloud.bigquery.table.Row
-        if data and type(data[0]).__name__ == "Row":
-            data = [r.values() for r in data]  # type: ignore
-        return data
+        """
+        Progressive fetch for BigQuery to prevent browser memory overload.
+
+        Samples a first batch to estimate row size, then extrapolates the
+        total number of rows that fit within ``BQ_FETCH_MAX_MB``.
+        Falls back to the parent implementation on any error.
+        """
+        max_mb: int = (
+            current_app.config.get("BQ_FETCH_MAX_MB", 200) if current_app else 200
+        )
+        max_bytes = max_mb * 1024 * 1024
+
+        try:
+            initial_batch_size = min(1000, limit) if limit else 1000
+            first_batch: list[Any] = cursor.fetchmany(initial_batch_size)
+
+            if not first_batch:
+                g.bq_memory_limited = False
+                g.bq_memory_limited_row_count = 0
+                return []
+
+            # Support BigQuery Row objects (PR #4071)
+            if type(first_batch[0]).__name__ == "Row":
+                first_batch = [r.values() for r in first_batch]
+
+            # Estimate how many rows fit in the memory budget
+            first_batch_bytes = sys.getsizeof(str(first_batch))

Review Comment:
   <div>
   
   
   <div id="suggestion">
   <div id="issue"><b>Inaccurate memory estimation</b></div>
   <div id="fix">
   
   The memory size estimation uses `sys.getsizeof(str(first_batch))`, but the
   length of a string representation is only a loose proxy for in-memory size,
   so the row budget can be skewed in either direction. Note that a bare
   `sys.getsizeof(first_batch)` is not a fix either: for a list it counts only
   the internal pointer array, not the rows or their values, so it would
   drastically underestimate usage. Sum the sizes of the rows and their
   contained values instead.
   </div>
   
   
   <details>
   <summary>
   <b>Code suggestion</b>
   </summary>
   <blockquote>Check the AI-generated fix before applying</blockquote>
   <div id="code">
   
   
   ````suggestion
            first_batch_bytes = sum(
                sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row)
                for row in first_batch
            )
   ````
   
   </div>
   </details>
   
   
   
   </div>
   
   
   
   
   <small><i>Code Review Run #e5147f</i></small>
   </div>
   
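To see why a shallow `sys.getsizeof` call cannot serve as the memory budget here, a quick stdlib comparison (a sketch with made-up sample data; exact byte counts vary by Python version and platform):

```python
import sys

# 100 hypothetical result rows, each holding a ~1 KB string and an int.
rows = [("a" * 1000, i) for i in range(100)]

# Shallow size: only the list object itself and its pointer array.
shallow = sys.getsizeof(rows)

# Deep estimate: the row tuples plus every value they contain.
deep = sum(
    sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row)
    for row in rows
)

# The shallow number misses the ~100 KB of string payload entirely.
print(shallow < 10_000, deep > 100_000)  # → True True
```

Even this deep sum is an estimate (shared objects are counted repeatedly, and nested containers deeper than one level are not traversed), but unlike the shallow call it at least scales with the actual row payload.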



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

