codeant-ai-for-open-source[bot] commented on code in PR #36777:
URL: https://github.com/apache/superset/pull/36777#discussion_r2637295654
##########
superset/models/helpers.py:
##########
@@ -3079,19 +3079,45 @@ def get_sqla_query( # pylint:
disable=too-many-arguments,too-many-locals,too-ma
raise QueryObjectValidationError(
_("Filter value list cannot be empty")
)
- if len(eq) > len(
- eq_without_none := [x for x in eq if x is not None]
+ # Handle boolean columns specially for databases that
require
+ # boolean literals in IN clauses (e.g., Databricks)
+ if (
+ target_generic_type == utils.GenericDataType.BOOLEAN
+ and hasattr(db_engine_spec, "handle_boolean_in_clause")
Review Comment:
**Suggestion:** Unsafe attribute check: the code uses
`hasattr(db_engine_spec, "handle_boolean_in_clause")` but doesn't ensure the
attribute is callable, which can raise a TypeError when invoked if the
attribute exists but is not a function; use `callable(getattr(..., None))` to
guard the call. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
and callable(getattr(db_engine_spec,
"handle_boolean_in_clause", None))
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The check is a small defensive improvement: switching to
callable(getattr(..., None))
prevents a possible TypeError if someone (somehow) defined a non-callable
attribute
named handle_boolean_in_clause on the engine spec. The codebase expects this
to be
a method, so this is low-risk, useful hardening.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 3086:3086
**Comment:**
*Possible Bug: Unsafe attribute check: the code uses
`hasattr(db_engine_spec, "handle_boolean_in_clause")` but doesn't ensure the
attribute is callable, which can raise a TypeError when invoked if the
attribute exists but is not a function; use `callable(getattr(..., None))` to
guard the call.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/db_engine_specs/databricks.py:
##########
@@ -242,6 +242,54 @@ def convert_dttm(
def epoch_to_dttm(cls) -> str:
return HiveEngineSpec.epoch_to_dttm()
+ @classmethod
+ def handle_boolean_in_clause(
+ cls, sqla_col: Any, values: list[Any]
+ ) -> Any:
+ """
+ Handle boolean values in IN clauses for Databricks.
+
+ Databricks requires boolean literals (True/False) in IN clauses,
+ not integers (0/1). This method uses OR conditions with equality
+ checks to ensure boolean literals are used instead of integers.
+
+ :param sqla_col: SQLAlchemy column element
+ :param values: List of values for the IN clause
+ :return: SQLAlchemy expression for the boolean IN clause
+ """
+ from sqlalchemy import or_
+
+ # Convert values to proper boolean types
+ boolean_values = []
+ for val in values:
+ if val is None:
+ continue
+ # Ensure boolean values are properly typed
+ if isinstance(val, bool):
+ boolean_values.append(val)
+ elif isinstance(val, (int, float)):
+ # Convert 0/1 to False/True
+ boolean_values.append(bool(val))
+ elif isinstance(val, str):
+ # Handle string representations
+ if val.lower() in ("true", "1"):
+ boolean_values.append(True)
+ elif val.lower() in ("false", "0"):
+ boolean_values.append(False)
+ else:
+ # For other types, try to convert to boolean
+ boolean_values.append(bool(val))
+
+ if not boolean_values:
+ # If all values are None, return a condition that's always false
+ from sqlalchemy import false
+ return false()
+
+ # Use OR conditions with equality checks for boolean values
+ # This ensures boolean literals (True/False) are used instead of
integers
+ conditions = [sqla_col == val for val in boolean_values]
Review Comment:
**Suggestion:** The code may produce duplicate OR conditions when the input
contains duplicate boolean-like values; deduplicate boolean values before
building conditions to reduce redundant SQL and potential planner confusion.
[performance]
**Severity Level:** Minor ⚠️
```suggestion
# Deduplicate boolean values while preserving order to avoid
redundant OR terms
seen = set()
deduped = []
for b in boolean_values:
if b not in seen:
seen.add(b)
deduped.append(b)
# Use OR conditions with equality checks for boolean values
# This ensures boolean literals (True/False) are used instead of
integers
conditions = [sqla_col == val for val in deduped]
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Removing duplicates (True/False) reduces redundant SQL (e.g., col == True OR
col == True) and simplifies the generated expression. It's a harmless
optimization that doesn't change semantics.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 288:290
**Comment:**
*Performance: The code may produce duplicate OR conditions when the
input contains duplicate boolean-like values; deduplicate boolean values before
building conditions to reduce redundant SQL and potential planner confusion.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
tests/integration_tests/db_engine_specs/databricks_tests.py:
##########
@@ -59,3 +61,44 @@ def test_extras_with_ssl_custom(self):
extras = DatabricksNativeEngineSpec.get_extra_params(database)
connect_args = extras["engine_params"]["connect_args"]
assert connect_args["ssl"] == "1"
+
+ def test_handle_boolean_in_clause(self):
+ """
+ Test that boolean IN clauses use boolean literals instead of integers.
+
+ Databricks requires boolean literals (True/False) in IN clauses,
+ not integers (0/1). This test verifies that handle_boolean_in_clause
+ converts values properly and uses OR conditions with equality checks.
+ """
+ # Create a mock table with a boolean column
+ test_table = Table(
+ "test_table",
+ mock.MagicMock(),
+ Column("is_test_user", Boolean),
+ )
+ sqla_col = test_table.c.is_test_user
+
+ # Test with boolean values
+ result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+ sqla_col, [False]
+ )
+ # Verify the result is an OR condition (not an IN clause)
+ assert result is not None
+
+ # Test with integer values (should be converted to booleans)
+ result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+ sqla_col, [0, 1]
+ )
+ assert result is not None
+
+ # Test with string values
+ result = DatabricksNativeEngineSpec.handle_boolean_in_clause(
+ sqla_col, ["false", "true"]
+ )
+ assert result is not None
+
+ # Test with empty list (should return false condition)
+ from sqlalchemy import false
+ result = DatabricksNativeEngineSpec.handle_boolean_in_clause(sqla_col,
[])
+ # The result should be a false condition when all values are filtered
out
+ assert result is not None
Review Comment:
**Suggestion:** The test imports `false` but only checks `result is not
None`; assert the actual expected false condition to validate behavior (compare
SQLAlchemy expressions via their string form) so the test fails when the
function doesn't return a false expression for an empty input list. [logic
error]
**Severity Level:** Minor ⚠️
```suggestion
assert str(result) == str(false())
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The handler returns sqlalchemy.false() when no boolean values remain;
asserting only non-nullity is weak. Replacing the check with a comparison to
false() (string form or a structural comparison) strengthens the test and will
catch regressions. The suggested str(result) == str(false()) is an acceptable
pragmatic assertion in tests.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/db_engine_specs/databricks_tests.py
**Line:** 104:104
**Comment:**
*Logic Error: The test imports `false` but only checks `result is not
None`; assert the actual expected false condition to validate behavior (compare
SQLAlchemy expressions via their string form) so the test fails when the
function doesn't return a false expression for an empty input list.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
tests/integration_tests/db_engine_specs/databricks_tests.py:
##########
@@ -59,3 +61,44 @@ def test_extras_with_ssl_custom(self):
extras = DatabricksNativeEngineSpec.get_extra_params(database)
connect_args = extras["engine_params"]["connect_args"]
assert connect_args["ssl"] == "1"
+
+ def test_handle_boolean_in_clause(self):
+ """
+ Test that boolean IN clauses use boolean literals instead of integers.
+
+ Databricks requires boolean literals (True/False) in IN clauses,
+ not integers (0/1). This test verifies that handle_boolean_in_clause
+ converts values properly and uses OR conditions with equality checks.
+ """
+ # Create a mock table with a boolean column
+ test_table = Table(
+ "test_table",
+ mock.MagicMock(),
Review Comment:
**Suggestion:** Using a MagicMock as the Table metadata is fragile and can
cause Table construction or column access to fail at runtime; use a real
SQLAlchemy MetaData instance to construct the Table so
`test_table.c.is_test_user` is a valid ColumnElement. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
from sqlalchemy import MetaData
metadata = MetaData()
test_table = Table(
"test_table",
metadata,
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The current test uses mock.MagicMock() as the Table's metadata object which
is fragile — SQLAlchemy expects a MetaData instance and using a real MetaData
ensures Table.c[...] resolves correctly. The proposed change fixes a real
brittleness in the test and is small and safe.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/db_engine_specs/databricks_tests.py
**Line:** 74:76
**Comment:**
*Logic Error: Using a MagicMock as the Table metadata is fragile and
can cause Table construction or column access to fail at runtime; use a real
SQLAlchemy MetaData instance to construct the Table so
`test_table.c.is_test_user` is a valid ColumnElement.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]