codeant-ai-for-open-source[bot] commented on code in PR #36076:
URL: https://github.com/apache/superset/pull/36076#discussion_r2616363906


##########
tests/unit_tests/charts/test_schemas.py:
##########
@@ -152,3 +155,120 @@ def test_time_grain_validation_with_config_addons(app_context: None) -> None:
     }
     result = schema.load(custom_data)
     assert result["time_grain"] == "PT10M"
+
+
+def test_chart_post_schema_query_context_validation(app_context: None) -> None:
+    """Test that ChartPostSchema validates query_context contains required metadata"""
+    schema = ChartPostSchema()
+
+    # Valid query_context with datasource and queries should pass
+    valid_query_context = json.dumps(
+        {
+            "datasource": {"type": "table", "id": 1},
+            "queries": [{"metrics": ["count"], "columns": []}],
+        }
+    )
+    valid_data = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": valid_query_context,
+    }
+    result = schema.load(valid_data)
+    assert result["query_context"] == valid_query_context
+
+    # None query_context should be allowed (allow_none=True)
+    none_data = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": None,
+    }
+    result = schema.load(none_data)
+    assert result["query_context"] is None
+
+    # Query context missing 'datasource' field should fail
+    missing_datasource = json.dumps(
+        {"queries": [{"metrics": ["count"], "columns": []}]}
+    )
+    invalid_data_1 = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": missing_datasource,
+    }
+    with pytest.raises(ValidationError) as exc_info:
+        schema.load(invalid_data_1)
+    assert "query_context" in exc_info.value.messages
+    assert "datasource" in str(exc_info.value.messages["query_context"])
+
+    # Query context missing 'queries' field should fail
+    missing_queries = json.dumps({"datasource": {"type": "table", "id": 1}})
+    invalid_data_2 = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": missing_queries,
+    }
+    with pytest.raises(ValidationError) as exc_info:
+        schema.load(invalid_data_2)
+    assert "query_context" in exc_info.value.messages
+    assert "queries" in str(exc_info.value.messages["query_context"])
+
+    # Query context missing both 'datasource' and 'queries' should fail
+    empty_query_context = json.dumps({})
+    invalid_data_3 = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": empty_query_context,
+    }
+    with pytest.raises(ValidationError) as exc_info:
+        schema.load(invalid_data_3)
+    assert "query_context" in exc_info.value.messages
+    assert "datasource" in str(exc_info.value.messages["query_context"])
+    assert "queries" in str(exc_info.value.messages["query_context"])
+
+    # Invalid JSON should fail
+    invalid_json = "not valid json"
+    invalid_data_4 = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": invalid_json,
+    }
+    with pytest.raises(ValidationError) as exc_info:
+        schema.load(invalid_data_4)
+    assert "query_context" in exc_info.value.messages
+
+
+def test_chart_put_schema_query_context_validation(app_context: None) -> None:
+    """Test that ChartPutSchema validates query_context contains required metadata"""
+    schema = ChartPutSchema()
+
+    # Valid query_context with datasource and queries should pass
+    valid_query_context = json.dumps(
+        {
+            "datasource": {"type": "table", "id": 1},
+            "queries": [{"metrics": ["count"], "columns": []}],
+        }
+    )
+    valid_data = {
+        "slice_name": "Updated Chart",
+        "query_context": valid_query_context,
+    }
+    result = schema.load(valid_data)
+    assert result["query_context"] == valid_query_context

Review Comment:
   **Suggestion:** The ChartPutSchema test also asserts exact string equality 
of JSON; like the POST test this is brittle if the schema normalizes or returns 
a dict. Parse and compare the JSON payloads with `json.loads` to avoid false 
negatives due to serialization differences. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
    assert json.loads(result["query_context"]) == json.loads(valid_query_context)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Same rationale as above for the PUT schema test: structural comparison via
   json.loads avoids brittle string equality and prevents false negatives if
   the schema normalizes or reorders keys.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/charts/test_schemas.py
   **Line:** 261:261
   **Comment:**
        *Logic Error: The ChartPutSchema test also asserts exact string 
equality of JSON; like the POST test this is brittle if the schema normalizes 
or returns a dict. Parse and compare the JSON payloads with `json.loads` to 
avoid false negatives due to serialization differences.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/unit_tests/charts/test_schemas.py:
##########
@@ -152,3 +155,120 @@ def test_time_grain_validation_with_config_addons(app_context: None) -> None:
     }
     result = schema.load(custom_data)
     assert result["time_grain"] == "PT10M"
+
+
+def test_chart_post_schema_query_context_validation(app_context: None) -> None:
+    """Test that ChartPostSchema validates query_context contains required metadata"""
+    schema = ChartPostSchema()
+
+    # Valid query_context with datasource and queries should pass
+    valid_query_context = json.dumps(
+        {
+            "datasource": {"type": "table", "id": 1},
+            "queries": [{"metrics": ["count"], "columns": []}],
+        }
+    )
+    valid_data = {
+        "slice_name": "Test Chart",
+        "datasource_id": 1,
+        "datasource_type": "table",
+        "query_context": valid_query_context,
+    }
+    result = schema.load(valid_data)
+    assert result["query_context"] == valid_query_context

Review Comment:
   **Suggestion:** The test compares the raw JSON string returned by the schema 
to the original JSON string; if the schema or validator normalizes or 
re-serializes the JSON (or returns a dict), the string comparison will fail. 
Compare parsed JSON structures instead by using `json.loads` on both sides to 
make the assertion robust to formatting/serialization differences. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
    assert json.loads(result["query_context"]) == json.loads(valid_query_context)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Comparing raw JSON strings is brittle — whitespace or key ordering 
differences
   (or if the validator re-serializes) could cause false negatives. Parsing and
   comparing structures with json.loads makes the assertion robust and reduces
   test flakiness. This directly improves test correctness.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/charts/test_schemas.py
   **Line:** 178:178
   **Comment:**
        *Logic Error: The test compares the raw JSON string returned by the 
schema to the original JSON string; if the schema or validator normalizes or 
re-serializes the JSON (or returns a dict), the string comparison will fail. 
Compare parsed JSON structures instead by using `json.loads` on both sides to 
make the assertion robust to formatting/serialization differences.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/utils/schema.py:
##########
@@ -50,4 +50,37 @@ def validate_json(value: Union[bytes, bytearray, str]) -> None:
     try:
         json.validate_json(value)
     except json.JSONDecodeError as ex:
-        raise ValidationError("JSON not valid") from ex
+        error_msg = "JSON not valid"
+        raise ValidationError(error_msg) from ex
+
+
+def validate_query_context_metadata(value: Union[bytes, bytearray, str, None]) -> None:
+    """
+    Validator for query_context field to ensure it contains required metadata.
+
+    Validates that the query_context JSON contains the required 'datasource' and
+    'queries' fields needed for chart data retrieval.
+
+    :raises ValidationError: if value is not valid JSON or missing required fields
+    :param value: a JSON string that should contain datasource and queries metadata
+    """
+    if value is None or value == "":
+        return  # Allow None values and empty strings
+
+    # Reuse existing JSON validation logic
+    validate_json(value)
+
+    # Parse and validate the structure
+    parsed_data = json.loads(value)
+
+    # Validate required fields exist in the query_context
+    if not isinstance(parsed_data, dict):
+        error_msg = "Query context must be a valid JSON object"
+        raise ValidationError(error_msg)
+
+    # When query_context is provided (not None), validate it has required fields
+    required_fields = {"datasource", "queries"}
+    missing_fields: set[str] = required_fields - parsed_data.keys()

Review Comment:
   **Suggestion:** The inline type annotation `missing_fields: set[str] = ...` is
   redundant — the element type is trivially inferred from
   `required_fields - parsed_data.keys()`. Note that function-local annotations
   are never evaluated at runtime (PEP 526), so this line could not raise even on
   Python <3.9; the concern is readability and consistency, not a runtime error.
   Drop the annotation. [style]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
       missing_fields = required_fields - parsed_data.keys()
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Dropping the redundant annotation keeps the code concise; any modern type
   checker infers `set[str]` from the set-difference expression. Because
   function-local annotations are never evaluated at runtime (PEP 526), the
   annotation could not actually crash an import even on interpreters that lack
   PEP 585 builtin generics — so this is a readability cleanup, not a bug fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/schema.py
   **Line:** 83:83
   **Comment:**
        *Style: The inline type annotation `missing_fields: set[str] = ...` is
        redundant — the type is trivially inferred from the set-difference
        expression. Function-local annotations are not evaluated at runtime, so
        this is a readability concern rather than an import-time error; drop the
        annotation.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to