codeant-ai-for-open-source[bot] commented on code in PR #37516:
URL: https://github.com/apache/superset/pull/37516#discussion_r2842835197


##########
superset/common/query_context_processor.py:
##########
@@ -189,6 +205,33 @@ def get_df_payload(
             }
         )
         cache.df.columns = [unescape_separator(col) for col in cache.df.columns.values]
+        timing["result_processing_ms"] = round((time.perf_counter() - t) * 1000, 2)
+
+        timing["db_execution_ms"] = timing.get("db_execution_ms")

Review Comment:
   **Suggestion:** On cache hits `timing["db_execution_ms"]` is set to `None`, but the slow-query logging later formats this field with a `%.0f` float placeholder using `timing.get("db_execution_ms", 0)`, which returns `None` when the key exists, causing a `TypeError` at runtime. Removing this assignment allows the logging call to fall back to the numeric default (`0`) when the key is absent, while preserving correct behavior on cache misses where `db_execution_ms` is populated. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Slow-query logging fails for cached-but-slow chart requests.
   - ⚠️ Logging errors clutter logs, hiding real slow-query details.
   ```
   </details>
   
   ```suggestion
   
   ```
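   To see why the `dict.get` default does not protect the logging call, here is a minimal standalone sketch (the `timing` dict is a stand-in mirroring the one in `get_df_payload()`; no Superset code is required):

   ```python
   # On a cache hit, the flagged assignment stores None under the key.
   timing = {"db_execution_ms": None}

   # The default 0 is ignored because the key exists -- get() returns None.
   value = timing.get("db_execution_ms", 0)
   assert value is None

   # logging's lazy %-formatting performs this same substitution internally,
   # so a None argument to a %.0f placeholder raises at log-emit time.
   try:
       "db_execution=%.0fms" % value
   except TypeError as exc:
       print(f"TypeError: {exc}")

   # If the assignment is removed, the key is absent and the default applies.
   timing.pop("db_execution_ms")
   assert timing.get("db_execution_ms", 0) == 0
   print("db_execution=%.0fms" % timing.get("db_execution_ms", 0))  # db_execution=0ms
   ```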
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a non-null slow-query threshold (e.g. `CHART_DATA_SLOW_QUERY_THRESHOLD_MS=500`) so the slow-query logging block in `QueryContextProcessor.get_df_payload()` (`superset/common/query_context_processor.py`, lines ~221-233) is active.
   
   2. Execute any chart that populates the query cache: the first `/api/v1/chart/data` request causes a cache miss; `QueryCacheManager.get()` (`superset/common/utils/query_cache_manager.py:153-211`) returns a `QueryCacheManager` with `is_loaded=False`, leading to DB execution and `timing["db_execution_ms"]` being set.
   
   3. Issue the same chart request again so it is served from cache: `QueryCacheManager.get()` now sets `cache.is_loaded=True` and `cache.is_cached=True`, so the `db_execution` block in `get_df_payload()` is skipped and `timing["db_execution_ms"]` is never initialized before line 210.
   
   4. At line 210, `timing["db_execution_ms"] = timing.get("db_execution_ms")` creates the key with value `None`; if `timing["total_ms"]` still exceeds the threshold, the slow-query logger at lines 222-233 calls `logger.warning(..., timing.get("db_execution_ms", 0), ...)`, which passes `None` to a `"%.0fms"` placeholder, causing `logging` to emit a `TypeError` during message formatting and drop the intended slow-query log entry.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/common/query_context_processor.py
   **Line:** 210:210
   **Comment:**
        *Type Error: On cache hits `timing["db_execution_ms"]` is set to `None`, but the slow-query logging later formats this field with a `%.0f` float placeholder using `timing.get("db_execution_ms", 0)`, which returns `None` when the key exists, causing a `TypeError` at runtime. Removing this assignment allows the logging call to fall back to the numeric default (`0`) when the key is absent, while preserving correct behavior on cache misses where `db_execution_ms` is populated.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
superset/common/query_context_processor.py:
##########
@@ -189,6 +205,33 @@ def get_df_payload(
             }
         )
         cache.df.columns = [unescape_separator(col) for col in cache.df.columns.values]
+        timing["result_processing_ms"] = round((time.perf_counter() - t) * 1000, 2)
+
+        timing["db_execution_ms"] = timing.get("db_execution_ms")
+        timing["total_ms"] = round((time.perf_counter() - t0) * 1000, 2)
+        timing["is_cached"] = cache.is_cached
+
+        # Emit per-phase metrics via STATS_LOGGER
+        stats_logger = current_app.config.get("STATS_LOGGER")
+        if stats_logger and hasattr(stats_logger, "timing"):
+            for phase, value in timing.items():
+                if isinstance(value, (int, float)) and not isinstance(value, bool):
+                    stats_logger.timing(f"chart_data.{phase}", value)
+
+        # Slow query logging
+        threshold = current_app.config.get("CHART_DATA_SLOW_QUERY_THRESHOLD_MS")
+        if threshold is not None and timing["total_ms"] > threshold:
+            logger.warning(
+                "Slow chart query: total=%.0fms validate=%.0fms "
+                "cache_lookup=%.0fms db_execution=%.0fms "
+                "result_processing=%.0fms is_cached=%s",
+                timing["total_ms"],
+                timing.get("validate_ms", 0),
+                timing.get("cache_lookup_ms", 0),
+                timing.get("db_execution_ms", 0),

Review Comment:
   **Suggestion:** When the result is served from cache, the timing dict sets the database execution time to `None`, but the slow-query logging formats this value with a `%.0f` placeholder; if the slow-query threshold is met on a cached response, logging will attempt to format `None` as a float and raise a `TypeError` at runtime. To avoid this, ensure that the value passed for db execution time in the logging call is always numeric (e.g., coerce `None` to `0`) while keeping the API-facing timing field unchanged. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Cached slow queries fail to produce proper slow logs.
   - ⚠️ Logging framework reports formatting errors in application logs.
   ```
   </details>
   
   ```suggestion
                   timing.get("db_execution_ms") or 0,
   ```
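   A quick sketch of the suggested coercion, using a stand-in `timing` dict with hypothetical values, showing that the log line formats cleanly while the API-facing field keeps its `None`:

   ```python
   timing = {"db_execution_ms": None, "total_ms": 1234.6, "is_cached": True}

   # Coerce only at the call site: `or 0` maps None (and any falsy value) to 0.
   message = "Slow chart query: total=%.0fms db_execution=%.0fms is_cached=%s" % (
       timing["total_ms"],
       timing.get("db_execution_ms") or 0,
       timing["is_cached"],
   )
   print(message)  # Slow chart query: total=1235ms db_execution=0ms is_cached=True

   # The timing dict itself is untouched, so the API payload still reports None.
   assert timing["db_execution_ms"] is None
   ```

   Note that `or 0` also rewrites a legitimate `0.0` to `0`, which is harmless for a `%.0f` placeholder.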
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Enable slow-query logging by setting `CHART_DATA_SLOW_QUERY_THRESHOLD_MS` in configuration so the `logger.warning` block in `QueryContextProcessor.get_df_payload()` (`superset/common/query_context_processor.py`, lines ~222-233) executes for slow charts.
   
   2. Warm the cache for a chart by requesting `/api/v1/chart/data` once; `QueryCacheManager.get()` (`superset/common/utils/query_cache_manager.py:153-211`) returns `is_loaded=False` the first time, causing DB execution and correctly populating the cache.
   
   3. Request the same chart again so it is served from cache: `QueryCacheManager.get()` now sets `cache.is_loaded=True`, the DB execution phase is skipped, and `timing["db_execution_ms"]` remains unset until line 210 assigns `None` via `timing["db_execution_ms"] = timing.get("db_execution_ms")`.
   
   4. When `timing["total_ms"]` exceeds the configured threshold, the slow-query logger formats `"db_execution=%.0fms"` with the argument `timing.get("db_execution_ms", 0)` (line ~231), which evaluates to `None` (the key exists but its value is `None`) and triggers a `TypeError` during logging, preventing a valid slow-query log line from being emitted.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/common/query_context_processor.py
   **Line:** 231:231
   **Comment:**
        *Type Error: When the result is served from cache, the timing dict sets the database execution time to None, but the slow-query logging formats this value with a '%.0f' placeholder; if the slow-query threshold is met on a cached response, logging will attempt to format None as a float and raise a TypeError at runtime. To avoid this, ensure that the value passed for db execution time in the logging call is always numeric (e.g., coerce None to 0) while keeping the API-facing timing field unchanged.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

