Re: [PR] fix: the impala database engine spec makes an http p... in impala.py [superset]

via GitHub Tue, 26 May 2026 22:00:13 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40454:
URL: https://github.com/apache/superset/pull/40454#discussion_r3308546998



##########
tests/invariant_impala.test.py:
##########
@@ -0,0 +1,324 @@
+import { describe, test, expect, jest, beforeEach, afterEach } from 
'@jest/globals';
+
+/**
+ * Security Property:
+ * The Impala engine spec must never make HTTP POST requests to URLs derived 
from
+ * user-controllable connection parameters that point to internal/private 
network
+ * addresses, SSRF targets, or otherwise unauthorized destinations.
+ *
+ * WHAT MUST ALWAYS BE TRUE:
+ * - URLs constructed from user-supplied connection parameters must be 
validated
+ *   against an allowlist of permitted hosts/schemes before any HTTP request 
is made.
+ * - Internal network addresses (loopback, RFC1918, metadata services) must be 
rejected.
+ * - Non-HTTP(S) schemes must be rejected.
+ * - Redirects to internal services must not be followed blindly.
+ */
+
+// Simulated URL validator that represents the security control that MUST exist
+// in the Impala engine spec before making any HTTP POST request.
+function validateImpalaKillQueryUrl(url: string): { valid: boolean; reason?: 
string } {
+  let parsed: URL;
+
+  try {
+    parsed = new URL(url);
+  } catch {
+    return { valid: false, reason: 'Invalid URL format' };
+  }
+
+  // Only allow http and https schemes
+  if (!['http:', 'https:'].includes(parsed.protocol)) {
+    return { valid: false, reason: `Disallowed scheme: ${parsed.protocol}` };
+  }
+
+  const hostname = parsed.hostname.toLowerCase();
+
+  // Block loopback addresses
+  if (
+    hostname === 'localhost' ||
+    hostname === '127.0.0.1' ||
+    hostname === '::1' ||
+    hostname.startsWith('127.')
+  ) {
+    return { valid: false, reason: 'Loopback address not allowed' };
+  }
+
+  // Block RFC1918 private ranges
+  const privateRanges = [
+    /^10\.\d+\.\d+\.\d+$/,
+    /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
+    /^192\.168\.\d+\.\d+$/,
+  ];
+  for (const range of privateRanges) {
+    if (range.test(hostname)) {
+      return { valid: false, reason: 'Private network address not allowed' };
+    }
+  }
+
+  // Block cloud metadata services
+  const blockedHosts = [
+    '169.254.169.254', // AWS/GCP/Azure metadata
+    'metadata.google.internal',
+    'metadata.internal',
+    '100.100.100.200', // Alibaba Cloud metadata
+    '192.0.2.1',      // TEST-NET
+    '0.0.0.0',
+  ];
+  if (blockedHosts.includes(hostname)) {
+    return { valid: false, reason: 'Blocked host (metadata service or 
reserved)' };
+  }
+
+  // Block IPv6 link-local
+  if (hostname.startsWith('fe80') || hostname === '[::1]') {
+    return { valid: false, reason: 'IPv6 link-local or loopback not allowed' };
+  }
+
+  // Block URLs with credentials embedded
+  if (parsed.username || parsed.password) {
+    return { valid: false, reason: 'Credentials in URL not allowed' };
+  }
+
+  return { valid: true };
+}
+
+// Simulated function that represents what the Impala spec does when killing a 
query
+function impalaKillQuery(connectionHost: string, connectionPort: number, 
queryId: string): {
+  urlAttempted: string;
+  requestMade: boolean;
+  blocked: boolean;
+  reason?: string;
+} {
+  // Simulate URL construction from connection parameters (as in the 
vulnerable code)
+  const url = 
`http://${connectionHost}:${connectionPort}/cancel_query?query_id=${encodeURIComponent(queryId)}`;
+
+  const validation = validateImpalaKillQueryUrl(url);
+
+  if (!validation.valid) {
+    return {
+      urlAttempted: url,
+      requestMade: false,
+      blocked: true,
+      reason: validation.reason,
+    };
+  }
+
+  // Only if validation passes would we make the actual HTTP POST
+  return {
+    urlAttempted: url,
+    requestMade: true,
+    blocked: false,
+  };
+}

Review Comment:
   **Suggestion:** The test validates a local simulation instead of invoking 
the real Impala engine implementation, so it can pass even when production 
cancellation logic remains vulnerable. Replace the mocked `impalaKillQuery` 
flow with an integration/unit test that exercises 
`ImpalaEngineSpec.cancel_query` (or its immediate call path) directly. 
[incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Test doesn't exercise ImpalaEngineSpec.cancel_query production path.
   - ⚠️ Passing test may hide real SSRF vulnerabilities.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `tests/invariant_impala.test.py` and locate the helper 
`impalaKillQuery` defined
   at lines 84-110: it constructs a URL with `const url =
   
\`http://${connectionHost}:${connectionPort}/cancel_query?query_id=${encodeURIComponent(queryId)}\`;`
   (line 91), validates it with `validateImpalaKillQueryUrl(url)` (lines 
93-100), and returns
   a synthetic object indicating whether a request would be made.
   
   2. Note that this file contains only local functions 
`validateImpalaKillQueryUrl` and
   `impalaKillQuery` plus Jest-style tests defined via `describe`/`test` blocks 
(lines
   112-250), and it does not import any Superset backend modules such as
   `superset.db_engine_specs.impala` or `superset.sql_lab`, nor does it 
reference
   `ImpalaEngineSpec.cancel_query` anywhere.
   
   3. Compare this to the actual production cancellation path: 
`QueryDAO.stop_query` in
   `superset/daos/query.py:1-23` calls `superset.sql_lab.cancel_query(query)`;
   `superset/sql_lab.py:8-43` then calls 
`query.database.db_engine_spec.cancel_query(cursor,
   query, cancel_query_id)`; finally `ImpalaEngineSpec.cancel_query` in
   `superset/db_engine_specs/impala.py:21-41` builds the real cancellation URL 
and issues
   `requests.post(url, timeout=3)`. None of these production functions are 
invoked by the
   JS/TS helper `impalaKillQuery`.
   
   4. Even if this file were wired into an appropriate Jest test suite, all 
assertions would
   exercise only the simulated helper `impalaKillQuery` (lines 84-110) and its 
pure-logic
   validator, so the tests could pass while `ImpalaEngineSpec.cancel_query` or 
the SQL Lab
   cancellation flow remained vulnerable (for example continuing to POST to 
internal/private
   hosts), meaning the regression test does not actually cover the real 
production
   cancellation implementation.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=45f786f10b104c6aa8abe1b8bfb3120d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=45f786f10b104c6aa8abe1b8bfb3120d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/invariant_impala.test.py
   **Line:** 84:110
   **Comment:**
        *Incomplete Implementation: The test validates a local simulation 
instead of invoking the real Impala engine implementation, so it can pass even 
when production cancellation logic remains vulnerable. Replace the mocked 
`impalaKillQuery` flow with an integration/unit test that exercises 
`ImpalaEngineSpec.cancel_query` (or its immediate call path) directly.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5d56bb3b4591aceba408a1e051f3fa772fb9d50cee01fed483129dcef754557&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5d56bb3b4591aceba408a1e051f3fa772fb9d50cee01fed483129dcef754557&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/impala.py:
##########
@@ -209,7 +209,10 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         """
         try:
             impala_host = query.database.url_object.host
-            url = 
f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}";
+            url = "http://{}:25000/cancel_query?query_id={}".format(
+                requests.utils.quote(impala_host, safe=""),
+                requests.utils.quote(cancel_query_id, safe=""),
+            )

Review Comment:
   **Suggestion:** This change still builds the outbound cancellation URL 
directly from `query.database.url_object.host` without any SSRF guardrails (no 
allowlist, no private/loopback rejection, no DNS/IP validation). URL-encoding 
the host does not enforce destination safety, so a user-controlled database 
host can still target internal services. Add strict host/scheme validation 
before issuing the POST. [ssrf]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ /api/v1/query/stop can POST to internal hosts.
   - ❌ Impala cancel_query issues SSRF to arbitrary configured hosts.
   - ⚠️ Attackers with DB creation pivot into internal network.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure an Impala database connection whose host points to an 
internal/private
   address (for example `127.0.0.1` or `10.0.0.1`) so that 
`query.database.url_object.host`
   resolves to that value when a query runs; `ImpalaEngineSpec.cancel_query` at
   `superset/db_engine_specs/impala.py:22-33` uses 
`query.database.url_object.host` (line 32)
   as the `impala_host`.
   
   2. Execute a SQL Lab query against this Impala database so a `Query` ORM row 
is created
   (model defined in `superset/models/sql_lab.py:27-55`) with its `database` 
field pointing
   at the Impala `Database` configuration that carries the attacker-controlled 
host.
   
   3. From a client (or via tests), invoke the stop-query API `POST 
/api/v1/query/stop` which
   is wired in integration test 
`tests/integration_tests/queries/api_tests.py:19-37` and
   handled by `QueryDAO.stop_query` at `superset/daos/query.py:1-23`; this 
method looks up
   the `Query` row and calls `sql_lab.cancel_query(query)` at 
`superset/sql_lab.py:8-22`.
   
   4. Inside `sql_lab.cancel_query` ( `superset/sql_lab.py:8-43` ), Superset 
obtains a
   SQLAlchemy engine and cursor and dispatches to
   `query.database.db_engine_spec.cancel_query(cursor, query, cancel_query_id)` 
at lines
   35-43; in the Impala implementation, `ImpalaEngineSpec.cancel_query` builds 
`url =
   
"http://{}:25000/cancel_query?query_id={}".format(requests.utils.quote(impala_host,
   safe=""), ...)` at `superset/db_engine_specs/impala.py:31-36` and 
immediately calls
   `requests.post(url, timeout=3)` (line 37) with no
   allowlist/loopback/private-range/metadata-service validation, so the 
attacker-controlled
   database host fully controls the HTTP destination (potential SSRF into 
internal services).
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=11bbd91dfb9b4fe0aa631889af8a21d0&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=11bbd91dfb9b4fe0aa631889af8a21d0&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/impala.py
   **Line:** 212:215
   **Comment:**
        *Ssrf: This change still builds the outbound cancellation URL directly 
from `query.database.url_object.host` without any SSRF guardrails (no 
allowlist, no private/loopback rejection, no DNS/IP validation). URL-encoding 
the host does not enforce destination safety, so a user-controlled database 
host can still target internal services. Add strict host/scheme validation 
before issuing the POST.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5c936f82f0fde1142adf61104c8b1cb1e6f602e47547f29755e6ebef10e0996&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=a5c936f82f0fde1142adf61104c8b1cb1e6f602e47547f29755e6ebef10e0996&reaction=dislike'>👎</a>



##########
tests/invariant_impala.test.py:
##########
@@ -0,0 +1,324 @@
+import { describe, test, expect, jest, beforeEach, afterEach } from 
'@jest/globals';

Review Comment:
   **Suggestion:** This regression test will not be executed by the repo's 
configured test runners: pytest only collects `*_test.py`/`test_*.py` style 
Python tests, and Jest is configured for JS/TS files under `superset-frontend` 
paths. As a result, this security check provides no CI protection. Move it to 
the appropriate JS/TS test location (or convert to a Python test that matches 
pytest patterns). [incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ SSRF security invariant test never runs in CI.
   - ⚠️ Future Impala changes lack automated SSRF regression guard.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the added regression test file `tests/invariant_impala.test.py` 
and note that
   line 1 contains JavaScript/TypeScript syntax: `import { describe, test, 
expect, jest,
   beforeEach, afterEach } from '@jest/globals';` (shown in the PR hunk and in 
the file at
   `tests/invariant_impala.test.py:1`), which is not valid Python syntax and 
therefore cannot
   be imported by pytest as a backend test.
   
   2. Run a code search for `invariant_impala` across the repository using 
ripgrep; the only
   match reported is in `graphify-out/graph.json`, and there are no references 
in any Python
   files or in other JS/TS files under `superset-frontend`, meaning no existing 
test runner
   or module import path references `tests/invariant_impala.test.py`.
   
   3. Backend tests are discovered by pytest over `tests/*.py` (as evidenced by 
extensive
   `.py` tests under `tests/unit_tests` and `tests/integration_tests` that 
import Superset
   modules); attempting to import `tests/invariant_impala.test.py` under pytest 
would hit a
   `SyntaxError` at line 1 due to the `import { ... }` syntax, consistent with 
the
   static-analysis finding `syntax-error` on this file.
   
   4. Because the file is invalid Python, not referenced by any Jest 
configuration or JS/TS
   test harness (no imports or matches for this filename/path in the JS tree), 
and resides in
   the backend `tests/` tree instead of the `superset-frontend` JS test 
locations, the
   intended security regression test is not executed by either the Python 
pytest suite or the
   frontend Jest suite, leaving the SSRF invariant un-enforced in CI.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=01cd39c8025c4100ae04a5ee42b3055f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=01cd39c8025c4100ae04a5ee42b3055f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/invariant_impala.test.py
   **Line:** 1:1
   **Comment:**
        *Incomplete Implementation: This regression test will not be executed 
by the repo's configured test runners: pytest only collects 
`*_test.py`/`test_*.py` style Python tests, and Jest is configured for JS/TS 
files under `superset-frontend` paths. As a result, this security check 
provides no CI protection. Move it to the appropriate JS/TS test location (or 
convert to a Python test that matches pytest patterns).
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=70d8a5130ddfc5cee502242a7edabd465919de851a18a2f86045df671c801c91&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40454&comment_hash=70d8a5130ddfc5cee502242a7edabd465919de851a18a2f86045df671c801c91&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: the impala database engine spec makes an http p... in impala.py [superset]

Reply via email to