codeant-ai-for-open-source[bot] commented on code in PR #40454:
URL: https://github.com/apache/superset/pull/40454#discussion_r3308537517


##########
tests/invariant_impala.test.py:
##########
@@ -0,0 +1,324 @@
+import { describe, test, expect, jest, beforeEach, afterEach } from '@jest/globals';
+
+/**
+ * Security Property:
+ * The Impala engine spec must never make HTTP POST requests to URLs derived from
+ * user-controllable connection parameters that point to internal/private network
+ * addresses, SSRF targets, or otherwise unauthorized destinations.
+ *
+ * WHAT MUST ALWAYS BE TRUE:
+ * - URLs constructed from user-supplied connection parameters must be validated
+ *   against an allowlist of permitted hosts/schemes before any HTTP request is made.
+ * - Internal network addresses (loopback, RFC1918, metadata services) must be rejected.
+ * - Non-HTTP(S) schemes must be rejected.
+ * - Redirects to internal services must not be followed blindly.
+ */
+
+// Simulated URL validator that represents the security control that MUST exist
+// in the Impala engine spec before making any HTTP POST request.
+function validateImpalaKillQueryUrl(url: string): { valid: boolean; reason?: string } {
+  let parsed: URL;
+
+  try {
+    parsed = new URL(url);
+  } catch {
+    return { valid: false, reason: 'Invalid URL format' };
+  }
+
+  // Only allow http and https schemes
+  if (!['http:', 'https:'].includes(parsed.protocol)) {
+    return { valid: false, reason: `Disallowed scheme: ${parsed.protocol}` };
+  }
+
+  const hostname = parsed.hostname.toLowerCase();
+
+  // Block loopback addresses
+  if (
+    hostname === 'localhost' ||
+    hostname === '127.0.0.1' ||
+    hostname === '::1' ||
+    hostname.startsWith('127.')
+  ) {
+    return { valid: false, reason: 'Loopback address not allowed' };
+  }
+
+  // Block RFC1918 private ranges
+  const privateRanges = [
+    /^10\.\d+\.\d+\.\d+$/,
+    /^172\.(1[6-9]|2\d|3[01])\.\d+\.\d+$/,
+    /^192\.168\.\d+\.\d+$/,
+  ];
+  for (const range of privateRanges) {
+    if (range.test(hostname)) {
+      return { valid: false, reason: 'Private network address not allowed' };
+    }
+  }
+
+  // Block cloud metadata services
+  const blockedHosts = [
+    '169.254.169.254', // AWS/GCP/Azure metadata
+    'metadata.google.internal',
+    'metadata.internal',
+    '100.100.100.200', // Alibaba Cloud metadata
+    '192.0.2.1',      // TEST-NET
+    '0.0.0.0',
+  ];
+  if (blockedHosts.includes(hostname)) {
+    return { valid: false, reason: 'Blocked host (metadata service or reserved)' };
+  }
+
+  // Block IPv6 link-local
+  if (hostname.startsWith('fe80') || hostname === '[::1]') {
+    return { valid: false, reason: 'IPv6 link-local or loopback not allowed' };
+  }
+
+  // Block URLs with credentials embedded
+  if (parsed.username || parsed.password) {
+    return { valid: false, reason: 'Credentials in URL not allowed' };
+  }
+
+  return { valid: true };
+}
+
+// Simulated function that represents what the Impala spec does when killing a query
+function impalaKillQuery(connectionHost: string, connectionPort: number, queryId: string): {
+  urlAttempted: string;
+  requestMade: boolean;
+  blocked: boolean;
+  reason?: string;
+} {
+  // Simulate URL construction from connection parameters (as in the vulnerable code)
+  const url = `http://${connectionHost}:${connectionPort}/cancel_query?query_id=${encodeURIComponent(queryId)}`;
+
+  const validation = validateImpalaKillQueryUrl(url);
+
+  if (!validation.valid) {
+    return {
+      urlAttempted: url,
+      requestMade: false,
+      blocked: true,
+      reason: validation.reason,
+    };
+  }
+
+  // Only if validation passes would we make the actual HTTP POST
+  return {
+    urlAttempted: url,
+    requestMade: true,
+    blocked: false,
+  };
+}

Review Comment:
   **🟠 Architect Review — HIGH**
   
   The added "regression test" is Jest-style TypeScript/JavaScript in a .py file.
   It defines its own simulated functions
   (validateImpalaKillQueryUrl/impalaKillQuery) and never imports or invokes
   superset.db_engine_specs.impala.ImpalaEngineSpec.cancel_query, so it exercises
   none of the production code. In addition, its filename,
   invariant_impala.test.py, does not match the pytest python_files patterns
   (*_test.py, test_*.py, *_tests.py), so it is never collected or run, and the
   intended security invariant is not enforced in CI.
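
The collection claim can be checked with a few lines of Python. This is only a rough sketch: it assumes the three globs quoted in the comment are the relevant `python_files` patterns and that pytest matches them against the file's basename; `would_be_collected` is a hypothetical helper, not part of pytest or Superset.

```python
from fnmatch import fnmatch

# Patterns quoted in the review comment; assumed to mirror the repo's
# pytest `python_files` setting.
PYTHON_FILES = ["*_test.py", "test_*.py", "*_tests.py"]


def would_be_collected(filename: str) -> bool:
    """Return True if the basename matches any of the python_files globs."""
    return any(fnmatch(filename, pattern) for pattern in PYTHON_FILES)


print(would_be_collected("invariant_impala.test.py"))  # the file added in this PR
print(would_be_collected("test_impala.py"))            # the suggested location
```

The added file ends in `.test.py` rather than `_test.py`, which is why none of the globs match it.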
   
   **Suggestion:** Replace this with a real Python pytest that follows the
   existing naming conventions (e.g., in
   tests/unit_tests/db_engine_specs/test_impala.py or a sibling file). The test
   should import ImpalaEngineSpec, call cancel_query with requests.post mocked,
   and assert that loopback, private, metadata, and redirect targets are rejected
   according to the security invariant, so that it runs under the repo's pytest
   configuration.
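
A minimal sketch of the shape such a test might take. To stay self-contained it does not import Superset: `cancel_query` below is a hypothetical stand-in that takes the HTTP `post` callable as a parameter and only accepts IP literals. A real test would presumably call `ImpalaEngineSpec.cancel_query` and patch `requests.post` in `superset.db_engine_specs.impala` instead.

```python
import ipaddress
from unittest.mock import Mock


def cancel_query(host: str, query_id: str, post) -> bool:
    """Hypothetical stand-in for the code under test (not Superset's code)."""
    try:
        if not ipaddress.ip_address(host).is_global:
            return False  # fail closed on loopback/private/link-local targets
    except ValueError:
        return False  # this stand-in rejects anything that is not an IP literal
    response = post(
        f"http://{host}:25000/cancel_query?query_id={query_id}", timeout=3
    )
    return response.status_code == 200


def test_rejects_internal_targets():
    for host in ["127.0.0.1", "10.0.0.5", "192.168.1.10", "169.254.169.254", "localhost"]:
        post = Mock()
        assert cancel_query(host, "q1", post) is False
        post.assert_not_called()  # the key invariant: no request leaves the process


def test_allows_public_target():
    post = Mock(return_value=Mock(status_code=200))
    assert cancel_query("8.8.8.8", "q1", post) is True
    post.assert_called_once_with(
        "http://8.8.8.8:25000/cancel_query?query_id=q1", timeout=3
    )
```

Asserting that the mock was never called matters more than the return value, since a function could return `False` after already sending the request.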
   
   



##########
superset/db_engine_specs/impala.py:
##########
@@ -209,7 +209,10 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
         """
         try:
             impala_host = query.database.url_object.host
-            url = f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}"
+            url = "http://{}:25000/cancel_query?query_id={}".format(
+                requests.utils.quote(impala_host, safe=""),
+                requests.utils.quote(cancel_query_id, safe=""),
+            )
             response = requests.post(url, timeout=3)

Review Comment:
   **🔴 Architect Review — CRITICAL**
   
   ImpalaEngineSpec.cancel_query still issues an HTTP POST directly to
   query.database.url_object.host without any destination validation.
   URL-encoding the host and query ID only escapes characters; it does not
   restrict which hosts (including loopback, RFC1918, or metadata IPs) can be
   targeted, so the SSRF channel remains open under adversarial database
   connection settings.
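
To illustrate why percent-encoding alone does not close the channel, here is a small sketch. It assumes `requests.utils.quote` behaves like `urllib.parse.quote` (which `requests` appears to re-export); `build_cancel_url` is a hypothetical helper that mirrors the URL construction in the diff.

```python
from urllib.parse import quote


def build_cancel_url(host: str, query_id: str) -> str:
    """Mirror the patched URL construction: quote both parts with safe=''."""
    return "http://{}:25000/cancel_query?query_id={}".format(
        quote(host, safe=""),
        quote(query_id, safe=""),
    )


# Digits and dots are unreserved characters, so an internal IP passes through
# unchanged; only delimiters such as ':' or '/' are escaped.
print(build_cancel_url("169.254.169.254", "abc:123"))
print(build_cancel_url("127.0.0.1", "q1"))
```

Encoding prevents a hostile value from injecting extra path or query syntax, but a plain loopback or metadata address contains nothing to escape.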
   
   **Suggestion:** Before constructing and calling the cancel URL, validate the
   scheme and host against an explicit policy (e.g., allowlisted hosts/schemes,
   or a check that rejects loopback/private/link-local/metadata ranges) and fail
   closed when validation fails. Centralize this URL/network validation so that
   all outbound engine-spec HTTP calls share the same enforcement.
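
One way such a destination check could look, as a sketch only. `is_allowed_host` and its `resolve` parameter are hypothetical names, not existing Superset APIs, and the policy shown (accept only globally routable, non-multicast addresses) is an assumption rather than the project's chosen rule.

```python
import ipaddress
import socket


def _is_forbidden(ip) -> bool:
    # Anything not globally routable (loopback, RFC1918, link-local,
    # unspecified, reserved) is rejected, as is multicast.
    return (not ip.is_global) or ip.is_multicast


def is_allowed_host(host: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only if every address the host maps to is public."""
    if not host:
        return False
    try:
        candidates = [ipaddress.ip_address(host)]  # host is an IP literal
    except ValueError:
        try:
            infos = resolve(host, None)  # host is a name: check what it resolves to
        except OSError:
            return False  # fail closed when resolution fails
        # Drop any IPv6 scope id ("fe80::1%eth0") before parsing.
        candidates = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    return bool(candidates) and not any(_is_forbidden(ip) for ip in candidates)
```

A caller would return early from the cancel path when `is_allowed_host(impala_host)` is false. Note that resolving the name here and again inside the HTTP client leaves a DNS-rebinding window, so a stricter variant would connect to the already-validated address, and redirects would need the same check or to be disabled.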
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

