codeant-ai-for-open-source[bot] commented on code in PR #40454:
URL: https://github.com/apache/superset/pull/40454#discussion_r3308546998
##########
tests/invariant_impala.test.py:
##########
@@ -0,0 +1,324 @@
+import { describe, test, expect, jest, beforeEach, afterEach } from
'@jest/globals';
+
+/**
+ * Security Property:
+ * The Impala engine spec must never make HTTP POST requests to URLs derived
from
+ * user-controllable connection parameters that point to internal/private
network
+ * addresses, SSRF targets, or otherwise unauthorized destinations.
+ *
+ * WHAT MUST ALWAYS BE TRUE:
+ * - URLs constructed from user-supplied connection parameters must be
validated
+ * against an allowlist of permitted hosts/schemes before any HTTP request
is made.
+ * - Internal network addresses (loopback, RFC1918, metadata services) must be
rejected.
+ * - Non-HTTP(S) schemes must be rejected.
+ * - Redirects to internal services must not be followed blindly.
+ */
+
+// Simulated URL validator that represents the security control that MUST exist
+// in the Impala engine spec before making any HTTP POST request.
+function validateImpalaKillQueryUrl(url: string): { valid: boolean; reason?:
string } {
+ let parsed: URL;
+
+ try {
+ parsed = new URL(url);
+ } catch {
+ return { valid: false, reason: 'Invalid URL format' };
+ }
+
+ // Only allow http and https schemes
+ if (!['http:', 'https:'].includes(parsed.protocol)) {
+ return { valid: false, reason: `Disallowed scheme: ${parsed.protocol}` };
+ }
+
+ const hostname = parsed.hostname.toLowerCase();
+
+ // Block loopback addresses
+ if (
+ hostname === 'localhost' ||
+ hostname === '127.0.0.1' ||
+ hostname === '::1' ||
+ hostname.startsWith('127.')
+ ) {
+ return { valid: false, reason: 'Loopback address not allowed' };
+ }
+
+ // Block RFC1918 private ranges
+ const privateRanges = [
+ /^10\.\d+\.\d+\.\d+$/,
+ /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
+ /^192\.168\.\d+\.\d+$/,
+ ];
+ for (const range of privateRanges) {
+ if (range.test(hostname)) {
+ return { valid: false, reason: 'Private network address not allowed' };
+ }
+ }
+
+ // Block cloud metadata services
+ const blockedHosts = [
+ '169.254.169.254', // AWS/GCP/Azure metadata
+ 'metadata.google.internal',
+ 'metadata.internal',
+ '100.100.100.200', // Alibaba Cloud metadata
+ '192.0.2.1', // TEST-NET
+ '0.0.0.0',
+ ];
+ if (blockedHosts.includes(hostname)) {
+ return { valid: false, reason: 'Blocked host (metadata service or
reserved)' };
+ }
+
+ // Block IPv6 link-local
+ if (hostname.startsWith('fe80') || hostname === '[::1]') {
+ return { valid: false, reason: 'IPv6 link-local or loopback not allowed' };
+ }
+
+ // Block URLs with credentials embedded
+ if (parsed.username || parsed.password) {
+ return { valid: false, reason: 'Credentials in URL not allowed' };
+ }
+
+ return { valid: true };
+}
+
+// Simulated function that represents what the Impala spec does when killing a
query
+function impalaKillQuery(connectionHost: string, connectionPort: number,
queryId: string): {
+ urlAttempted: string;
+ requestMade: boolean;
+ blocked: boolean;
+ reason?: string;
+} {
+ // Simulate URL construction from connection parameters (as in the
vulnerable code)
+ const url =
`http://${connectionHost}:${connectionPort}/cancel_query?query_id=${encodeURIComponent(queryId)}`;
+
+ const validation = validateImpalaKillQueryUrl(url);
+
+ if (!validation.valid) {
+ return {
+ urlAttempted: url,
+ requestMade: false,
+ blocked: true,
+ reason: validation.reason,
+ };
+ }
+
+ // Only if validation passes would we make the actual HTTP POST
+ return {
+ urlAttempted: url,
+ requestMade: true,
+ blocked: false,
+ };
+}
Review Comment:
**Suggestion:** The test validates a local simulation instead of invoking
the real Impala engine implementation, so it can pass even when production
cancellation logic remains vulnerable. Replace the mocked `impalaKillQuery`
flow with an integration/unit test that exercises
`ImpalaEngineSpec.cancel_query` (or its immediate call path) directly.
[incomplete implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Test doesn't exercise ImpalaEngineSpec.cancel_query production path.
- ⚠️ Passing test may hide real SSRF vulnerabilities.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `tests/invariant_impala.test.py` and locate the helper
`impalaKillQuery` defined
at lines 84-110: it constructs a URL with `const url =
\`http://${connectionHost}:${connectionPort}/cancel_query?query_id=${encodeURIComponent(queryId)}\`;`
(line 91), validates it with `validateImpalaKillQueryUrl(url)` (lines
93-100), and returns
a synthetic object indicating whether a request would be made.
2. Note that this file contains only local functions
`validateImpalaKillQueryUrl` and
`impalaKillQuery` plus Jest-style tests defined via `describe`/`test` blocks
(lines
112-250), and it does not import any Superset backend modules such as
`superset.db_engine_specs.impala` or `superset.sql_lab`, nor does it
reference
`ImpalaEngineSpec.cancel_query` anywhere.
3. Compare this to the actual production cancellation path:
`QueryDAO.stop_query` in
`superset/daos/query.py:1-23` calls `superset.sql_lab.cancel_query(query)`;
`superset/sql_lab.py:8-43` then calls
`query.database.db_engine_spec.cancel_query(cursor,
query, cancel_query_id)`; finally `ImpalaEngineSpec.cancel_query` in
`superset/db_engine_specs/impala.py:21-41` builds the real cancellation URL
and issues
`requests.post(url, timeout=3)`. None of these production functions are
invoked by the
JS/TS helper `impalaKillQuery`.
4. Even if this file were wired into an appropriate Jest test suite, all
assertions would
exercise only the simulated helper `impalaKillQuery` (lines 84-110) and its
pure-logic
validator, so the tests could pass while `ImpalaEngineSpec.cancel_query` or
the SQL Lab
cancellation flow remained vulnerable (for example continuing to POST to
internal/private
hosts), meaning the regression test does not actually cover the real
production
cancellation implementation.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=45f786f10b104c6aa8abe1b8bfb3120d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=45f786f10b104c6aa8abe1b8bfb3120d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/invariant_impala.test.py
**Line:** 84:110
**Comment:**
*Incomplete Implementation: The test validates a local simulation
instead of invoking the real Impala engine implementation, so it can pass even
when production cancellation logic remains vulnerable. Replace the mocked
`impalaKillQuery` flow with an integration/unit test that exercises
`ImpalaEngineSpec.cancel_query` (or its immediate call path) directly.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5d56bb3b4591aceba408a1e051f3fa772fb9d50cee01fed483129dcef754557&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5d56bb3b4591aceba408a1e051f3fa772fb9d50cee01fed483129dcef754557&reaction=dislike'>👎</a>
##########
superset/db_engine_specs/impala.py:
##########
@@ -209,7 +209,10 @@ def cancel_query(cls, cursor: Any, query: Query,
cancel_query_id: str) -> bool:
"""
try:
impala_host = query.database.url_object.host
- url =
f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}"
+ url = "http://{}:25000/cancel_query?query_id={}".format(
+ requests.utils.quote(impala_host, safe=""),
+ requests.utils.quote(cancel_query_id, safe=""),
+ )
Review Comment:
**Suggestion:** This change still builds the outbound cancellation URL
directly from `query.database.url_object.host` without any SSRF guardrails (no
allowlist, no private/loopback rejection, no DNS/IP validation). URL-encoding
the host does not enforce destination safety, so a user-controlled database
host can still target internal services. Add strict host/scheme validation
before issuing the POST. [ssrf]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ /api/v1/query/stop can POST to internal hosts.
- ❌ Impala cancel_query issues SSRF to arbitrary configured hosts.
- ⚠️ Attackers with DB creation pivot into internal network.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure an Impala database connection whose host points to an
internal/private
address (for example `127.0.0.1` or `10.0.0.1`) so that
`query.database.url_object.host`
resolves to that value when a query runs; `ImpalaEngineSpec.cancel_query` at
`superset/db_engine_specs/impala.py:22-33` uses
`query.database.url_object.host` (line 32)
as the `impala_host`.
2. Execute a SQL Lab query against this Impala database so a `Query` ORM row
is created
(model defined in `superset/models/sql_lab.py:27-55`) with its `database`
field pointing
at the Impala `Database` configuration that carries the attacker-controlled
host.
3. From a client (or via tests), invoke the stop-query API `POST
/api/v1/query/stop` which
is wired in integration test
`tests/integration_tests/queries/api_tests.py:19-37` and
handled by `QueryDAO.stop_query` at `superset/daos/query.py:1-23`; this
method looks up
the `Query` row and calls `sql_lab.cancel_query(query)` at
`superset/sql_lab.py:8-22`.
4. Inside `sql_lab.cancel_query` ( `superset/sql_lab.py:8-43` ), Superset
obtains a
SQLAlchemy engine and cursor and dispatches to
`query.database.db_engine_spec.cancel_query(cursor, query, cancel_query_id)`
at lines
35-43; in the Impala implementation, `ImpalaEngineSpec.cancel_query` builds
`url =
"http://{}:25000/cancel_query?query_id={}".format(requests.utils.quote(impala_host,
safe=""), ...)` at `superset/db_engine_specs/impala.py:31-36` and
immediately calls
`requests.post(url, timeout=3)` (line 37) with no
allowlist/loopback/private-range/metadata-service validation, so the
attacker-controlled
database host fully controls the HTTP destination (potential SSRF into
internal services).
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=11bbd91dfb9b4fe0aa631889af8a21d0&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=11bbd91dfb9b4fe0aa631889af8a21d0&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/impala.py
**Line:** 212:215
**Comment:**
*Ssrf: This change still builds the outbound cancellation URL directly
from `query.database.url_object.host` without any SSRF guardrails (no
allowlist, no private/loopback rejection, no DNS/IP validation). URL-encoding
the host does not enforce destination safety, so a user-controlled database
host can still target internal services. Add strict host/scheme validation
before issuing the POST.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5c936f82f0fde1142adf61104c8b1cb1e6f602e47547f29755e6ebef10e0996&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5c936f82f0fde1142adf61104c8b1cb1e6f602e47547f29755e6ebef10e0996&reaction=dislike'>👎</a>
##########
tests/invariant_impala.test.py:
##########
@@ -0,0 +1,324 @@
+import { describe, test, expect, jest, beforeEach, afterEach } from
'@jest/globals';
Review Comment:
**Suggestion:** This regression test will not be executed by the repo's
configured test runners: pytest only collects `*_test.py`/`test_*.py` style
Python tests, and Jest is configured for JS/TS files under `superset-frontend`
paths. As a result, this security check provides no CI protection. Move it to
the appropriate JS/TS test location (or convert to a Python test that matches
pytest patterns). [incomplete implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ SSRF security invariant test never runs in CI.
- ⚠️ Future Impala changes lack automated SSRF regression guard.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Inspect the added regression test file `tests/invariant_impala.test.py`
and note that
line 1 contains JavaScript/TypeScript syntax: `import { describe, test,
expect, jest,
beforeEach, afterEach } from '@jest/globals';` (shown in the PR hunk and in
the file at
`tests/invariant_impala.test.py:1`), which is not valid Python syntax and
therefore cannot
be imported by pytest as a backend test.
2. Run a code search for `invariant_impala` across the repository using
ripgrep; the only
match reported is in `graphify-out/graph.json`, and there are no references
in any Python
files or in other JS/TS files under `superset-frontend`, meaning no existing
test runner
or module import path references `tests/invariant_impala.test.py`.
3. Backend tests are discovered by pytest over `tests/*.py` (as evidenced by
extensive
`.py` tests under `tests/unit_tests` and `tests/integration_tests` that
import Superset
modules); attempting to import `tests/invariant_impala.test.py` under pytest
would hit a
`SyntaxError` at line 1 due to the `import { ... }` syntax, consistent with
the
static-analysis finding `syntax-error` on this file.
4. Because the file is invalid Python, not referenced by any Jest
configuration or JS/TS
test harness (no imports or matches for this filename/path in the JS tree),
and resides in
the backend `tests/` tree instead of the `superset-frontend` JS test
locations, the
intended security regression test is not executed by either the Python
pytest suite or the
frontend Jest suite, leaving the SSRF invariant un-enforced in CI.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=01cd39c8025c4100ae04a5ee42b3055f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=01cd39c8025c4100ae04a5ee42b3055f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/invariant_impala.test.py
**Line:** 1:1
**Comment:**
*Incomplete Implementation: This regression test will not be executed
by the repo's configured test runners: pytest only collects
`*_test.py`/`test_*.py` style Python tests, and Jest is configured for JS/TS
files under `superset-frontend` paths. As a result, this security check
provides no CI protection. Move it to the appropriate JS/TS test location (or
convert to a Python test that matches pytest patterns).
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=70d8a5130ddfc5cee502242a7edabd465919de851a18a2f86045df671c801c91&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=70d8a5130ddfc5cee502242a7edabd465919de851a18a2f86045df671c801c91&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]