codeant-ai-for-open-source[bot] commented on code in PR #37516:
URL: https://github.com/apache/superset/pull/37516#discussion_r2842835197
##########
superset/common/query_context_processor.py:
##########
@@ -189,6 +205,33 @@ def get_df_payload(
}
)
cache.df.columns = [unescape_separator(col) for col in cache.df.columns.values]
+ timing["result_processing_ms"] = round((time.perf_counter() - t) * 1000, 2)
+
+ timing["db_execution_ms"] = timing.get("db_execution_ms")
Review Comment:
**Suggestion:** On cache hits `timing["db_execution_ms"]` is set to `None`, but the slow-query logging later formats this field with a `%.0f` float placeholder via `timing.get("db_execution_ms", 0)`. Because `dict.get` falls back to its default only when the key is absent, not when its value is `None`, the placeholder receives `None` and raises a `TypeError` at runtime. Removing this assignment lets the logging call fall back to the numeric default (`0`) on cache hits, while preserving correct behavior on cache misses, where `db_execution_ms` is populated. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Slow-query logging fails for cached-but-slow chart requests.
- ⚠️ Logging errors clutter logs, hiding real slow-query details.
```
</details>
```suggestion
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a non-null slow-query threshold (e.g. `CHART_DATA_SLOW_QUERY_THRESHOLD_MS=500`) so the slow-query logging block in `QueryContextProcessor.get_df_payload()` (`superset/common/query_context_processor.py`, lines ~221-233) is active.
2. Execute any chart that populates the query cache: the first `/api/v1/chart/data` request causes a cache miss; `QueryCacheManager.get()` (`superset/common/utils/query_cache_manager.py:153-211`) returns a `QueryCacheManager` with `is_loaded=False`, leading to DB execution and `timing["db_execution_ms"]` being set.
3. Issue the same chart request again so it is served from cache: `QueryCacheManager.get()` now sets `cache.is_loaded=True` and `cache.is_cached=True`, so the `db_execution` block in `get_df_payload()` is skipped and `timing["db_execution_ms"]` is never initialized before line 210.
4. At line 210, `timing["db_execution_ms"] = timing.get("db_execution_ms")` creates the key with value `None`; if `timing["total_ms"]` still exceeds the threshold, the slow-query logger at lines 222-233 calls `logger.warning(..., timing.get("db_execution_ms", 0), ...)`, which passes `None` to a `"%.0fms"` placeholder, causing `logging` to emit a `TypeError` during message formatting and drop the intended slow-query log entry.
```
</details>
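The failure mode can be checked outside Superset. The sketch below (plain Python, not Superset code) shows the two ingredients: `dict.get`'s default applies only when the key is absent, so a key explicitly set to `None` defeats the `0` fallback, and `%`-formatting `None` with a float placeholder raises `TypeError`:

```python
# Minimal standalone sketch of the reported failure (not Superset code).
timing = {"db_execution_ms": None}  # cache hit: key exists, value is None

# dict.get's default only applies when the key is ABSENT, so this is None:
value = timing.get("db_execution_ms", 0)
print(value)  # None

try:
    # The same %-style interpolation that logging performs internally:
    message = "db_execution=%.0fms" % value
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Note that `logging` does not propagate this `TypeError` to the caller: the formatting failure is caught in the handler, `Handler.handleError` prints a traceback to stderr, and the slow-query record is dropped, which is why the intended log line silently disappears.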
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_context_processor.py
**Line:** 210:210
**Comment:**
*Type Error: On cache hits `timing["db_execution_ms"]` is set to `None`, but the slow-query logging later formats this field with a `%.0f` float placeholder via `timing.get("db_execution_ms", 0)`. Because `dict.get` falls back to its default only when the key is absent, not when its value is `None`, the placeholder receives `None` and raises a `TypeError` at runtime. Removing this assignment lets the logging call fall back to the numeric default (`0`) on cache hits, while preserving correct behavior on cache misses, where `db_execution_ms` is populated.
Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=29034f3bf53eaae4826be51c6301bd01a42e930403b1778f57aceb0583522d10&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=29034f3bf53eaae4826be51c6301bd01a42e930403b1778f57aceb0583522d10&reaction=dislike'>👎</a>
##########
superset/common/query_context_processor.py:
##########
@@ -189,6 +205,33 @@ def get_df_payload(
}
)
cache.df.columns = [unescape_separator(col) for col in cache.df.columns.values]
+ timing["result_processing_ms"] = round((time.perf_counter() - t) * 1000, 2)
+
+ timing["db_execution_ms"] = timing.get("db_execution_ms")
+ timing["total_ms"] = round((time.perf_counter() - t0) * 1000, 2)
+ timing["is_cached"] = cache.is_cached
+
+ # Emit per-phase metrics via STATS_LOGGER
+ stats_logger = current_app.config.get("STATS_LOGGER")
+ if stats_logger and hasattr(stats_logger, "timing"):
+ for phase, value in timing.items():
+ if isinstance(value, (int, float)) and not isinstance(value, bool):
+ stats_logger.timing(f"chart_data.{phase}", value)
+
+ # Slow query logging
+ threshold = current_app.config.get("CHART_DATA_SLOW_QUERY_THRESHOLD_MS")
+ if threshold is not None and timing["total_ms"] > threshold:
+ logger.warning(
+ "Slow chart query: total=%.0fms validate=%.0fms "
+ "cache_lookup=%.0fms db_execution=%.0fms "
+ "result_processing=%.0fms is_cached=%s",
+ timing["total_ms"],
+ timing.get("validate_ms", 0),
+ timing.get("cache_lookup_ms", 0),
+ timing.get("db_execution_ms", 0),
Review Comment:
**Suggestion:** When the result is served from cache, the timing dict sets the database execution time to `None`, but the slow-query logging formats this value with a `%.0f` placeholder; if the slow-query threshold is met on a cached response, logging will attempt to format `None` as a float and raise a `TypeError` at runtime. To avoid this, ensure the value passed for db execution time in the logging call is always numeric (e.g., coerce `None` to `0`) while keeping the API-facing timing field unchanged. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Cached slow queries fail to produce proper slow logs.
- ⚠️ Logging framework reports formatting errors in application logs.
```
</details>
```suggestion
timing.get("db_execution_ms") or 0,
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Enable slow-query logging by setting `CHART_DATA_SLOW_QUERY_THRESHOLD_MS` in configuration so the `logger.warning` block in `QueryContextProcessor.get_df_payload()` (`superset/common/query_context_processor.py`, lines ~222-233) executes for slow charts.
2. Warm the cache for a chart by requesting `/api/v1/chart/data` once; `QueryCacheManager.get()` (`superset/common/utils/query_cache_manager.py:153-211`) returns `is_loaded=False` the first time, causing DB execution and correctly populating the cache.
3. Request the same chart again so it is served from cache: `QueryCacheManager.get()` now sets `cache.is_loaded=True`, the DB execution phase is skipped, and `timing["db_execution_ms"]` remains unset until line 210 assigns `None` via `timing["db_execution_ms"] = timing.get("db_execution_ms")`.
4. When `timing["total_ms"]` exceeds the configured threshold, the slow-query logger formats `"db_execution=%.0fms"` with the argument `timing.get("db_execution_ms", 0)` (line ~231), which evaluates to `None` (the key exists but its value is `None`) and triggers a `TypeError` during logging, preventing a valid slow-query log line from being emitted.
```
</details>
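The suggested `or 0` coercion can also be verified in isolation. This sketch (not Superset code) shows that it normalizes both an explicit `None` and a missing key to a numeric value the `%.0f` placeholder accepts, while passing real durations through:

```python
# Standalone sketch of the suggested coercion (not Superset code).
cases = {
    "cache_hit": {"db_execution_ms": None},    # key present, value None
    "cache_miss": {"db_execution_ms": 412.7},  # real DB duration
    "key_absent": {},                          # key never created
}

for name, timing in cases.items():
    # `or 0` maps None (and a missing key) to 0; real durations pass through.
    value = timing.get("db_execution_ms") or 0
    print("%s: db_execution=%.0fms" % (name, value))
```

One caveat of `or 0`: a legitimate `0.0` duration is also coerced to `0`, which is harmless for a millisecond timing displayed with `%.0f` but worth keeping in mind.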
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_context_processor.py
**Line:** 231:231
**Comment:**
*Type Error: When the result is served from cache, the timing dict sets
the database execution time to None, but the slow-query logging formats this
value with a '%.0f' placeholder; if the slow-query threshold is met on a cached
response, logging will attempt to format None as a float and raise a TypeError
at runtime. To avoid this, ensure that the value passed for db execution time
in the logging call is always numeric (e.g., coerce None to 0) while keeping
the API-facing timing field unchanged.
Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=283d628c09dd9fc5d78f4073f6a1f1a233242e16d2946e66e8f5c7c283a4def4&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37516&comment_hash=283d628c09dd9fc5d78f4073f6a1f1a233242e16d2946e66e8f5c7c283a4def4&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]