Re: [PR] fix(databricks): handle boolean IN clauses with boolean literals [superset]

via GitHub Sat, 20 Dec 2025 11:43:37 -0800


codeant-ai-for-open-source[bot] commented on code in PR #36777:
URL: https://github.com/apache/superset/pull/36777#discussion_r2637295654



##########
superset/models/helpers.py:
##########
@@ -3079,19 +3079,45 @@ def get_sqla_query(  # pylint: 
disable=too-many-arguments,too-many-locals,too-ma
                         raise QueryObjectValidationError(
                             _("Filter value list cannot be empty")
                         )
-                    if len(eq) > len(
-                        eq_without_none := [x for x in eq if x is not None]
+                    # Handle boolean columns specially for databases that 
require
+                    # boolean literals in IN clauses (e.g., Databricks)
+                    if (
+                        target_generic_type == utils.GenericDataType.BOOLEAN
+                        and hasattr(db_engine_spec, "handle_boolean_in_clause")

Review Comment:
   **Suggestion:** Unsafe attribute check: the code uses 
`hasattr(db_engine_spec, "handle_boolean_in_clause")` but doesn't ensure the 
attribute is callable, which can raise a TypeError when invoked if the 
attribute exists but is not a function; use `callable(getattr(..., None))` to 
guard the call. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
                           and callable(getattr(db_engine_spec, 
"handle_boolean_in_clause", None))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The check is a small defensive improvement: switching to 
callable(getattr(..., None))
   prevents a possible TypeError if someone (somehow) defined a non-callable 
attribute
   named handle_boolean_in_clause on the engine spec. The codebase expects this 
to be
   a method, so this is low-risk, useful hardening.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 3086:3086
   **Comment:**
        *Possible Bug: Unsafe attribute check: the code uses 
`hasattr(db_engine_spec, "handle_boolean_in_clause")` but doesn't ensure the 
attribute is callable, which can raise a TypeError when invoked if the 
attribute exists but is not a function; use `callable(getattr(..., None))` to 
guard the call.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -242,6 +242,54 @@ def convert_dttm(
     def epoch_to_dttm(cls) -> str:
         return HiveEngineSpec.epoch_to_dttm()
 
+    @classmethod
+    def handle_boolean_in_clause(
+        cls, sqla_col: Any, values: list[Any]
+    ) -> Any:
+        """
+        Handle boolean values in IN clauses for Databricks.
+
+        Databricks requires boolean literals (True/False) in IN clauses,
+        not integers (0/1). This method uses OR conditions with equality
+        checks to ensure boolean literals are used instead of integers.
+
+        :param sqla_col: SQLAlchemy column element
+        :param values: List of values for the IN clause
+        :return: SQLAlchemy expression for the boolean IN clause
+        """
+        from sqlalchemy import or_
+
+        # Convert values to proper boolean types
+        boolean_values = []
+        for val in values:
+            if val is None:
+                continue
+            # Ensure boolean values are properly typed
+            if isinstance(val, bool):
+                boolean_values.append(val)
+            elif isinstance(val, (int, float)):
+                # Convert 0/1 to False/True
+                boolean_values.append(bool(val))
+            elif isinstance(val, str):
+                # Handle string representations
+                if val.lower() in ("true", "1"):
+                    boolean_values.append(True)
+                elif val.lower() in ("false", "0"):
+                    boolean_values.append(False)
+            else:
+                # For other types, try to convert to boolean
+                boolean_values.append(bool(val))
+
+        if not boolean_values:
+            # If all values are None, return a condition that's always false
+            from sqlalchemy import false
+            return false()
+
+        # Use OR conditions with equality checks for boolean values
+        # This ensures boolean literals (True/False) are used instead of 
integers
+        conditions = [sqla_col == val for val in boolean_values]

Review Comment:
   **Suggestion:** The code may produce duplicate OR conditions when the input 
contains duplicate boolean-like values; deduplicate boolean values before 
building conditions to reduce redundant SQL and potential planner confusion. 
[performance]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           # Deduplicate boolean values while preserving order to avoid 
redundant OR terms
           seen = set()
           deduped = []
           for b in boolean_values:
               if b not in seen:
                   seen.add(b)
                   deduped.append(b)
   
           # Use OR conditions with equality checks for boolean values
           # This ensures boolean literals (True/False) are used instead of 
integers
           conditions = [sqla_col == val for val in deduped]
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Removing duplicates (True/False) reduces redundant SQL (e.g., col == True OR 
col == True) and simplifies the generated expression. It's a harmless 
optimization that doesn't change semantics.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 288:290
   **Comment:**
        *Performance: The code may produce duplicate OR conditions when the 
input contains duplicate boolean-like values; deduplicate boolean values before 
building conditions to reduce redundant SQL and potential planner confusion.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/integration_tests/db_engine_specs/databricks_tests.py:
##########
@@ -59,3 +61,44 @@ def test_extras_with_ssl_custom(self):
         extras = DatabricksNativeEngineSpec.get_extra_params(database)
         connect_args = extras["engine_params"]["connect_args"]
         assert connect_args["ssl"] == "1"
+
+    def test_handle_boolean_in_clause(self):
+        """
+        Test that boolean IN clauses use boolean literals instead of integers.
+
+        Databricks requires boolean literals (True/False) in IN clauses,
+        not integers (0/1). This test verifies that handle_boolean_in_clause
+        converts values properly and uses OR conditions with equality checks.
+        """
+        # Create a mock table with a boolean column
+        test_table = Table(
+            "test_table",
+            mock.MagicMock(),
+            Column("is_test_user", Boolean),
+        )
+        sqla_col = test_table.c.is_test_user
+
+        # Test with boolean values
+        result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+            sqla_col, [False]
+        )
+        # Verify the result is an OR condition (not an IN clause)
+        assert result is not None
+
+        # Test with integer values (should be converted to booleans)
+        result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+            sqla_col, [0, 1]
+        )
+        assert result is not None
+
+        # Test with string values
+        result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+            sqla_col, ["false", "true"]
+        )
+        assert result is not None
+
+        # Test with empty list (should return false condition)
+        from sqlalchemy import false
+        result = DatabricksNativeEngineSpec.handle_boolean_in_clause(sqla_col, 
[])
+        # The result should be a false condition when all values are filtered 
out
+        assert result is not None

Review Comment:
   **Suggestion:** The test imports `false` but only checks `result is not 
None`; assert the actual expected false condition to validate behavior (compare 
SQLAlchemy expressions via their string form) so the test fails when the 
function doesn't return a false expression for an empty input list. [logic 
error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           assert str(result) == str(false())
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The handler returns sqlalchemy.false() when no boolean values remain; 
asserting only non-nullity is weak. Replacing the check with a comparison to 
false() (string form or a structural comparison) strengthens the test and will 
catch regressions. The suggested str(result) == str(false()) is an acceptable 
pragmatic assertion in tests.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/db_engine_specs/databricks_tests.py
   **Line:** 104:104
   **Comment:**
        *Logic Error: The test imports `false` but only checks `result is not 
None`; assert the actual expected false condition to validate behavior (compare 
SQLAlchemy expressions via their string form) so the test fails when the 
function doesn't return a false expression for an empty input list.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/integration_tests/db_engine_specs/databricks_tests.py:
##########
@@ -59,3 +61,44 @@ def test_extras_with_ssl_custom(self):
         extras = DatabricksNativeEngineSpec.get_extra_params(database)
         connect_args = extras["engine_params"]["connect_args"]
         assert connect_args["ssl"] == "1"
+
+    def test_handle_boolean_in_clause(self):
+        """
+        Test that boolean IN clauses use boolean literals instead of integers.
+
+        Databricks requires boolean literals (True/False) in IN clauses,
+        not integers (0/1). This test verifies that handle_boolean_in_clause
+        converts values properly and uses OR conditions with equality checks.
+        """
+        # Create a mock table with a boolean column
+        test_table = Table(
+            "test_table",
+            mock.MagicMock(),

Review Comment:
   **Suggestion:** Using a MagicMock as the Table metadata is fragile and can 
cause Table construction or column access to fail at runtime; use a real 
SQLAlchemy MetaData instance to construct the Table so 
`test_table.c.is_test_user` is a valid ColumnElement. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           from sqlalchemy import MetaData
           metadata = MetaData()
           test_table = Table(
               "test_table",
               metadata,
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current test uses mock.MagicMock() as the Table's metadata object which 
is fragile — SQLAlchemy expects a MetaData instance and using a real MetaData 
ensures Table.c[...] resolves correctly. The proposed change fixes a real 
brittleness in the test and is small and safe.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/db_engine_specs/databricks_tests.py
   **Line:** 74:76
   **Comment:**
        *Logic Error: Using a MagicMock as the Table metadata is fragile and 
can cause Table construction or column access to fail at runtime; use a real 
SQLAlchemy MetaData instance to construct the Table so 
`test_table.c.is_test_user` is a valid ColumnElement.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(databricks): handle boolean IN clauses with boolean literals [superset]

Reply via email to