codeant-ai-for-open-source[bot] commented on code in PR #38077:
URL: https://github.com/apache/superset/pull/38077#discussion_r2824273998


##########
tests/unit_tests/utils/json_tests.py:
##########
@@ -264,6 +266,119 @@ def test_json_int_dttm_ser():
         json.json_int_dttm_ser(np.datetime64())
 
 
[email protected](
+    "payload,path_values,expected_result",
+    [
+        (
+            {
+                "credentials_info": {
+                    "type": "service_account",
+                    "private_key": "XXXXXXXXXX",
+                },
+            },
+            {"$.credentials_info.private_key": "NEW_KEY"},
+            {
+                "credentials_info": {
+                    "type": "service_account",
+                    "private_key": "NEW_KEY",
+                },
+            },
+        ),
+        (
+            {
+                "auth_params": {
+                    "privatekey_body": "XXXXXXXXXX",
+                    "privatekey_pass": "XXXXXXXXXX",
+                },
+                "other": "value",
+            },
+            {
+                "$.auth_params.privatekey_body": "-----BEGIN PRIVATE KEY-----",
+                "$.auth_params.privatekey_pass": "passphrase",
+            },
+            {
+                "auth_params": {
+                    "privatekey_body": "-----BEGIN PRIVATE KEY-----",
+                    "privatekey_pass": "passphrase",
+                },
+                "other": "value",
+            },
+        ),
+        (
+            {"existing": "value"},
+            {"$.nonexistent.path": "new_value"},
+            {"existing": "value"},
+        ),
+    ],
+)
+def test_set_masked_fields(
+    payload: dict[str, Any],
+    path_values: dict[str, Any],
+    expected_result: dict[str, Any],
+) -> None:
+    """
+    Test setting a value at a JSONPath location.
+    """
+    result = json.set_masked_fields(payload, path_values)
+    assert result == expected_result
+
+
[email protected](
+    "payload,sensitive_fields,expected_result",
+    [
+        (
+            {
+                "credentials_info": {
+                    "type": "service_account",
+                    "private_key": PASSWORD_MASK,
+                },
+            },
+            {"$.credentials_info.private_key", "$.credentials_info.type"},
+            ["$.credentials_info.private_key"],
+        ),
+        (
+            {
+                "credentials_info": {
+                    "private_key": "ACTUAL_KEY",
+                },
+            },
+            {"$.credentials_info.private_key"},
+            [],
+        ),
+        (
+            {
+                "auth_params": {
+                    "privatekey_body": PASSWORD_MASK,
+                    "privatekey_pass": "actual_pass",
+                },
+                "oauth2_client_info": {
+                    "secret": PASSWORD_MASK,
+                },
+            },
+            {
+                "$.auth_params.privatekey_body",
+                "$.auth_params.privatekey_pass",
+                "$.oauth2_client_info.secret",
+            },
+            [
+                "$.auth_params.privatekey_body",
+                "$.oauth2_client_info.secret",
+            ],
+        ),
+    ],
+)
+def test_get_masked_fields(
+    payload: dict[str, Any],
+    sensitive_fields: set[str],
+    expected_result: dict[str, Any],
+) -> None:
+    """
+    Test that get_masked_fields returns paths where value equals PASSWORD_MASK.
+    """
+    masked = json.get_masked_fields(payload, sensitive_fields)
+    assert masked == sorted(expected_result)

Review Comment:
   **Suggestion:** The assertion in `test_get_masked_fields` compares the list 
returned by `get_masked_fields` directly against a sorted `expected_result`, 
but since `get_masked_fields` iterates over a `set` of sensitive fields, the 
order of items in `masked` is non-deterministic, which can cause this test to 
fail intermittently depending on set iteration order; both sides should be 
order-independent in the comparison. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Flaky unit test `test_get_masked_fields` in json_tests.
   - ⚠️ CI runs may intermittently fail on unchanged logic.
   - ⚠️ Reduces trust in tests around sensitive-field masking.
   ```
   </details>
   
   ```suggestion
       assert sorted(masked) == sorted(expected_result)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Note implementation of `get_masked_fields` in 
`superset/utils/json.py:307-325`, which
   iterates over `sensitive_fields: set[str]` and appends matching JSONPath 
strings to a
   Python list `masked` in the order yielded by the set.
   
   2. In `tests/unit_tests/utils/json_tests.py:326-379`, observe 
`test_get_masked_fields`
   parametrization where `sensitive_fields` is passed as a `set[str]` (see 
lines 336, 345,
   358-362) and `expected_result` is a list of JSONPath strings (lines 337, 
346, 363-366).
   
   3. At `tests/unit_tests/utils/json_tests.py:378-379` the test calls `masked =
   json.get_masked_fields(payload, sensitive_fields)` and asserts `masked ==
   sorted(expected_result)`, i.e. it compares an unsorted list produced in 
set-iteration
   order against a lexicographically sorted list.
   
   4. Run `pytest tests/unit_tests/utils/json_tests.py::test_get_masked_fields` 
multiple
   times with varying `PYTHONHASHSEED` (set by the test runner or environment); 
because
   Python set iteration order is not guaranteed, the order of items in `masked` 
can differ
   from `sorted(expected_result)` (for example `["$.oauth2_client_info.secret",
   "$.auth_params.privatekey_body"]` vs `["$.auth_params.privatekey_body",
   "$.oauth2_client_info.secret"]`), causing intermittent assertion failures 
even though the
   underlying `get_masked_fields` behavior is correct.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/utils/json_tests.py
   **Line:** 379:379
   **Comment:**
        *Logic Error: The assertion in `test_get_masked_fields` compares the 
list returned by `get_masked_fields` directly against a sorted 
`expected_result`, but since `get_masked_fields` iterates over a `set` of 
sensitive fields, the order of items in `masked` is non-deterministic, which 
can cause this test to fail intermittently depending on set iteration order; 
both sides should be order-independent in the comparison.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38077&comment_hash=e01ff3a92833de2458a2a2c1879b6377e112e696ffa80c9b57c296348ed23895&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38077&comment_hash=e01ff3a92833de2458a2a2c1879b6377e112e696ffa80c9b57c296348ed23895&reaction=dislike'>👎</a>



##########
superset/databases/api.py:
##########
@@ -1591,6 +1591,14 @@ def import_(self) -> Response:
                         the private_key should be provided in the following 
format:
                         `{"databases/MyDatabase.yaml": 
"my_private_key_password"}`.
                       type: string
+                    encrypted_extra_secrets:
+                      description: >-
+                        JSON map of sensitive values for 
masked_encrypted_extra fields.
+                        Keys are file paths (e.g., "databases/db.yaml") and 
values
+                        are JSON objects with the restricted fields
+                        (e.g., `{"databases/MyDatabase.yaml":
+                        {"credentials_info": {"private_key": "actual_key"}}}`).

Review Comment:
   **Suggestion:** The OpenAPI documentation for `encrypted_extra_secrets` 
describes the payload as a nested JSON object keyed by field names (e.g. 
`{"credentials_info": {"private_key": "actual_key"}}`), but the backend 
`set_masked_fields` helper expects a map of JSONPath strings to values (e.g. 
`"$.credentials_info.private_key": "actual_key"`); clients following the 
current docs will send a structure that doesn't match what the backend expects, 
so their secrets may not be applied correctly during import. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Database import endpoint `/import/` misdocuments secret payload 
structure.
   - ⚠️ Importers following docs fail to update masked_encrypted_extra secrets.
   - ⚠️ Imported database connections may use stale or missing credentials.
   ```
   </details>
   
   ```suggestion
                           Keys are file paths (e.g., 
"databases/MyDatabase.yaml") and values
                           are JSON objects mapping JSONPath expressions to 
secret values
                           (e.g., `{"databases/MyDatabase.yaml":
                           {"$.credentials_info.private_key": "actual_key"}}`).
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the database import endpoint handled by `DatabaseRestApi.import_` in
   `superset/databases/api.py:1535-1669` (exposed at `@expose("/import/",
   methods=("POST",))`) with a multipart/form-data request.
   
   2. In the form data, include a ZIP file under `formData` containing a 
database YAML config
   at `databases/MyDatabase.yaml` that has a `masked_encrypted_extra` field 
(this is later
   read and processed in `superset/commands/importers/v1/utils.load_configs` at 
lines
   196-211).
   
   3. Following the current OpenAPI docs in 
`superset/databases/api.py:1555-1601`, set the
   `encrypted_extra_secrets` form field to a JSON string like:
   
      `{"databases/MyDatabase.yaml": {"credentials_info": {"private_key": 
"actual_key"}}}`
      (i.e. nested by field names, not JSONPath).
   
   4. On the server, `DatabaseRestApi.import_` parses this via
   `json.loads(request.form["encrypted_extra_secrets"])` at
   `superset/databases/api.py:1653-1655` and passes it to 
`ImportDatabasesCommand(...,
   encrypted_extra_secrets=encrypted_extra_secrets)` at line 1659, which in 
turn calls
   `load_configs(..., encrypted_extra_secrets=...)` in
   `superset/commands/importers/v1/utils.py:105-116`.
   
   5. Inside `load_configs`, at `utils.py:196-207`, the code assumes
   `encrypted_extra_secrets[file_name]` is a mapping from JSONPath to scalar 
values (see the
   comment and example `{"$.oauth2_client_info.secret": "actual_value"}`), and 
passes it to
   `json.set_masked_fields` in `superset/utils/json.py:328-346`, which 
explicitly expects
   `path_values: dict[str, Any]` where keys are JSONPath expressions.
   
   6. Because the structure sent by the client (per current docs) is nested 
JSON keyed by
   field names (`"credentials_info": {"private_key": ...}`) instead of JSONPath 
strings
   (`"$.credentials_info.private_key": "..."`), `set_masked_fields` operates on 
unexpected
   keys/values, so the intended secret locations in `masked_encrypted_extra` 
are not updated
   correctly. The import completes, but the masked encrypted fields either 
remain masked or
   are misapplied, contrary to what the OpenAPI documentation promises.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/databases/api.py
   **Line:** 1597:1600
   **Comment:**
        *Logic Error: The OpenAPI documentation for `encrypted_extra_secrets` 
describes the payload as a nested JSON object keyed by field names (e.g. 
`{"credentials_info": {"private_key": "actual_key"}}`), but the backend 
`set_masked_fields` helper expects a map of JSONPath strings to values (e.g. 
`"$.credentials_info.private_key": "actual_key"`); clients following the 
current docs will send a structure that doesn't match what the backend expects, 
so their secrets may not be applied correctly during import.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38077&comment_hash=bbaf6db3fe187981d36ed1ecaf2168ae6856570382d9906415671008a0b9dd28&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38077&comment_hash=bbaf6db3fe187981d36ed1ecaf2168ae6856570382d9906415671008a0b9dd28&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to