Re: [PR] Feat/hive support [superset]

via GitHub Tue, 23 Dec 2025 22:17:54 -0800


codeant-ai-for-open-source[bot] commented on code in PR #36821:
URL: https://github.com/apache/superset/pull/36821#discussion_r2644899051



##########
docker/docker-bootstrap.sh:
##########
@@ -42,7 +42,7 @@ if [ "$CYPRESS_CONFIG" == "true" ]; then
     PORT=8081
 fi
 # Skip postgres requirements installation for workers to avoid conflicts
-if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ 
"$1" != "worker" ] && [ "$1" != "beat" ]; then
+if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ 
"$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" 
]; then

Review Comment:
   **Suggestion:** Using `whoami` in a command substitution can fail on minimal 
images (or be unavailable); replace it with a numeric UID check (`id -u`) which 
is more robust and avoids depending on `whoami`. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1" 
!= "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Replacing whoami with id -u -eq 0 is more robust in minimal containers and 
POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't 
present. Be mindful to use the same approach consistently across the script.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docker/docker-bootstrap.sh
   **Line:** 45:45
   **Comment:**
        *Possible Bug: Using `whoami` in a command substitution can fail on 
minimal images (or be unavailable); replace it with a numeric UID check (`id 
-u`) which is more robust and avoids depending on `whoami`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
debug_hive.py:
##########
@@ -0,0 +1,44 @@
+import socket
+import sys
+
+host = 'hive-server'
+port = 10000
+
+print(f"Testing connectivity to {host}:{port}...")
+
+# Test 1: DNS Resolution
+try:
+    ip = socket.gethostbyname(host)
+    print(f"DNS Resolved: {host} -> {ip}")
+except Exception as e:
+    print(f"DNS Failed: {e}")
+    sys.exit(1)
+
+# Test 2: TCP Socket
+try:
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    s.settimeout(5)
+    s.connect((ip, port))
+    print(f"TCP Connection Successful to {ip}:{port}")
+    s.close()

Review Comment:
   **Suggestion:** Resource leak: the TCP socket `s` is only closed on the 
success path; if `s.connect()` raises an exception the socket remains open. Use 
a context manager or ensure `s.close()` runs in a finally block so the socket 
is always closed. [resource leak]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
       # create_connection handles DNS/IPv4/IPv6 resolution and returns a 
socket that supports context manager
       with socket.create_connection((ip, port), timeout=5) as s:
           print(f"TCP Connection Successful to {ip}:{port}")
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a valid resource-leak bug: if connect() raises, the socket object 
may be left open. The improved code uses socket.create_connection within a 
context manager which ensures the socket is closed on all paths and simplifies 
timeout handling. It's a straightforward correctness fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** debug_hive.py
   **Line:** 19:23
   **Comment:**
        *Resource Leak: Resource leak: the TCP socket `s` is only closed on the 
success path; if `s.connect()` raises an exception the socket remains open. Use 
a context manager or ensure `s.close()` runs in a finally block so the socket 
is always closed.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
docker/Dockerfile.hive:
##########
@@ -0,0 +1,16 @@
+FROM apache/superset:latest

Review Comment:
   **Suggestion:** Using the unpinned image tag `:latest` makes builds 
non-reproducible and can introduce breaking or insecure changes; expose a build 
argument so callers can pin the Superset base image/version. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   ARG SUPERSET_VERSION=latest
   FROM apache/superset:${SUPERSET_VERSION}
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Using :latest makes image builds non-reproducible and can silently pull 
breaking or insecure updates. Exposing a build ARG to allow pinning the 
Superset base image is a minimal, low-risk improvement that helps reproducible 
builds.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docker/Dockerfile.hive
   **Line:** 1:1
   **Comment:**
        *Possible Bug: Using the unpinned image tag `:latest` makes builds 
non-reproducible and can introduce breaking or insecure changes; expose a build 
argument so callers can pin the Superset base image/version.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
docker/Dockerfile.hive:
##########
@@ -0,0 +1,16 @@
+FROM apache/superset:latest
+
+USER root
+
+# Install system dependencies for thrift-sasl
+RUN apt-get update -q \
+    && apt-get install -yq --no-install-recommends \
+       libsasl2-dev \
+       build-essential \
+       libpq-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install python dependencies inside the virtual environment using uv
+RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl 
psycopg2-binary

Review Comment:
   **Suggestion:** The command `uv pip ...` is invalid/unknown in the base 
image and will cause the RUN step to fail; also installing into a virtualenv 
path that may not exist will break the build. Use the venv's pip if present, 
otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving 
pip cache in the layer. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   RUN if [ -x /app/.venv/bin/pip ]; then \
         /app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl 
psycopg2-binary; \
       else \
         python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl 
psycopg2-binary; \
       fi
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The RUN line in the PR uses "uv pip ..." which looks wrong for the 
apache/superset image — "uv" is not a standard wrapper and will likely make the 
build fail. The proposed fallback to use the venv pip if present or python3 -m 
pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in 
the layer. This directly fixes a build-time break.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docker/Dockerfile.hive
   **Line:** 14:14
   **Comment:**
        *Possible Bug: The command `uv pip ...` is invalid/unknown in the base 
image and will cause the RUN step to fail; also installing into a virtualenv 
path that may not exist will break the build. Use the venv's pip if present, 
otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving 
pip cache in the layer.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
docker/docker-init.sh:
##########
@@ -16,6 +16,8 @@
 # limitations under the License.
 #
 set -e
+exec > >(tee -a /app/docker/debug.log) 2>&1

Review Comment:
   **Suggestion:** Robustness issue: if /app/docker doesn't exist or isn't 
writable, the process-substitution tee can fail or exit and break 
logging/output; check that the directory/file are creatable and writable and 
fall back to console-only logging when not writable. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w 
/app/docker/debug.log ]; then
       exec > >(tee -a /app/docker/debug.log) 2>&1
   else
       echo "Warning: cannot write to /app/docker/debug.log, continuing with 
stdout/stderr only"
   fi
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The improved code safely checks that the directory/file exist and are 
writable and falls back to console logging if not — that's a practical 
robustness improvement.
   Note: in bash with set -e, commands inside the if condition won't abort the 
script, so this construct is safe; you may still want to combine this with 
permission setting or clear logging policy for mounted volumes.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docker/docker-init.sh
   **Line:** 19:19
   **Comment:**
        *Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't 
writable, the process-substitution tee can fail or exit and break 
logging/output; check that the directory/file are creatable and writable and 
fall back to console-only logging when not writable.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Feat/hive support [superset]

Reply via email to