Re: [PR] fix: use pd.to_numeric in df_metrics_to_num to handle string-encoded numerics from ClickHouse [superset]

via GitHub Sat, 16 May 2026 02:48:08 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40190:
URL: https://github.com/apache/superset/pull/40190#discussion_r3252625012



##########
superset/common/utils/dataframe_utils.py:
##########
@@ -57,7 +57,7 @@ def df_metrics_to_num(df: pd.DataFrame, query_object: 
QueryObject) -> None:
         if dtype.type == np.object_ and col in query_object.metric_names:
             # soft-convert a metric column to numeric
             # will stay as strings if conversion fails
-            df[col] = df[col].infer_objects()
+            df[col] = pd.to_numeric(df[col], errors="ignore")

Review Comment:
   **Suggestion:** Using `errors="ignore"` can leave the entire metric column 
as object/string when even one value is non-numeric, so downstream numeric 
aggregations can still fail with the same `Could not convert string ... to 
numeric` error. Use coercive conversion for metric columns (eg converting 
invalid entries to `NaN`) so valid numeric strings are still parsed and the 
column becomes numeric. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Chart data requests using QueryContextProcessor may crash.
   - ❌ Rolling/aggregate post-processing fails on ClickHouse metric strings.
   - ⚠️ Contribution/other numeric features skip or mis-handle metrics.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A chart request is executed through the chart data pipeline:
   `QueryContextProcessor.get_payload()` at
   `superset/common/query_context_processor.py:88-126` calls 
`get_query_results(...)`, which
   in turn invokes `_get_full()` at `superset/common/query_actions.py:153-172`, 
obtaining a
   `QueryContext` and calling `query_context.get_df_payload(query_obj, ...)` to 
retrieve a
   pandas DataFrame.
   
   2. `QueryContext.get_df_payload()` delegates to 
`QueryContextProcessor.get_df_payload()`
   at `superset/common/query_context_processor.py:80-60` (around lines 82-60 in 
that file),
   which calls `self.get_query_result(query_obj)`; this delegates to the 
datasource
   implementation `get_query_result()` in `superset/models/helpers.py:107-120`, 
which first
   normalizes the DataFrame via `normalize_df(df, query_object)` at
   `superset/models/helpers.py:32-105`.
   
   3. Inside `normalize_df`, after datetime normalization, metrics are 
normalized by calling
   `dataframe_utils.df_metrics_to_num(df, query_object)` at
   `superset/models/helpers.py:99-102`, which executes `df_metrics_to_num` 
defined in
   `superset/common/utils/dataframe_utils.py:15-21`. This function iterates 
over columns and,
   for metric columns with `dtype.type == np.object_`, runs `df[col] = 
pd.to_numeric(df[col],
   errors="ignore")` (line 60 in the PR). When the ClickHouse backend returns a 
metric column
   (e.g. `sum__metric`) as strings with at least one non-numeric value (for 
example `"655"`
   and `"total"`), `pd.to_numeric(..., errors="ignore")` leaves the entire 
Series unchanged
   as `object`, so the metric column remains a string-typed column instead of 
becoming
   numeric.
   
   4. The resulting `df` with metric columns still of dtype `object` is then 
used by
   downstream numeric code paths. For instance, post-processing operations such 
as rolling
   windows or aggregations use numpy/pandas numeric functions, e.g. the rolling 
operator in
   `superset/utils/pandas_postprocessing/rolling.py:31-88` builds `df_rolling = 
df.loc[:,
   columns.keys()]` and then calls `df_rolling = getattr(df_rolling,
   rolling_type)(**rolling_type_options)`. When `rolling_type` is a numeric 
operation like
   `"sum"` and the metric column is still an `object` Series with string 
values, pandas'
   internal `nanops` machinery (in `pandas/core/nanops.py`) attempts to convert 
those strings
   to numeric and raises `TypeError: Could not convert string '655' to 
numeric`, causing the
   chart request (triggered via `_get_full` / `get_query_results`) to fail. 
This behavior
   contrasts with the legacy `BaseViz.df_metrics_to_num` implementation in
   `superset/viz.py:16-23`, which uses `pd.to_numeric(..., errors="coerce")`, 
coercing
   invalid metric entries to `NaN` while ensuring the overall column becomes 
numeric.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommon%2Futils%2Fdataframe_utils.py%0A%2A%2ALine%3A%2A%2A%2060%3A60%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20Using%20%60errors%3D%22ignore%22%60%20can%20leave%20the%20entire%20metric%20column%20as%20object%2Fstring%20when%20even%20one%20value%20is%20non-numeric%2C%20so%20downstream%20numeric%20aggregations%20can%20still%20fail%20with%20the%20same%20%60Could%20not%20convert%20string%20...%20to%20numeric%60%20error.%20Use%20coercive%20conversion%20for%20metric%20columns%20%28eg%20converting%20invalid%20entries%20to%20%60NaN%60%29%20so%20valid%20numeric%20strings%20are%20still%20parsed%20and%20the%20column%20becomes%20numeric.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%
 
20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fcommon%2Futils%2Fdataframe_utils.py%0A%2A%2ALine%3A%2A%2A%2060%3A60%0A%2A%2AComment%3A%2A%2A%0A%09%2ALogic%20Error%3A%20Using%20%60errors%3D%22ignore%22%60%20can%20leave%20the%20entire%20metric%20column%20as%20object%2Fstring%20when%20even%20one%20value%20is%20non-numeric%2C%20so%20downstream%20numeric%20aggregations%20can%20still%20fail%20with%20the%20same%20%60Could%20not%20convert%20string%20...%20to%20numeric%60%20error.%20Use%20coercive%20conversion%
 
20for%20metric%20columns%20%28eg%20converting%20invalid%20entries%20to%20%60NaN%60%29%20so%20valid%20numeric%20strings%20are%20still%20parsed%20and%20the%20column%20becomes%20numeric.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/common/utils/dataframe_utils.py
   **Line:** 60:60
   **Comment:**
        *Logic Error: Using `errors="ignore"` can leave the entire metric 
column as object/string when even one value is non-numeric, so downstream 
numeric aggregations can still fail with the same `Could not convert string ... 
to numeric` error. Use coercive conversion for metric columns (eg converting 
invalid entries to `NaN`) so valid numeric strings are still parsed and the 
column becomes numeric.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40190&comment_hash=0dbafd11a269f136a721e1ef4f04c36d483b600bfabdf462252c2152f105cddc&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40190&comment_hash=0dbafd11a269f136a721e1ef4f04c36d483b600bfabdf462252c2152f105cddc&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: use pd.to_numeric in df_metrics_to_num to handle string-encoded numerics from ClickHouse [superset]

Reply via email to