Copilot commented on code in PR #64610:
URL: https://github.com/apache/airflow/pull/64610#discussion_r3030491326


##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -267,6 +269,65 @@ def depends_search(
     return depends_search
 
 
+class _RegexParam(BaseParam[str]):
+    """Filter using database-level regex matching (regexp_match).
+
+    SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator),
+    MySQL/MariaDB (``REGEXP``), and SQLite (via Python ``re.match``).
+    Note: SQLite uses ``re.match`` semantics (anchored at the start of the
+    string), while PostgreSQL uses ``re.search`` (matches anywhere).

Review Comment:
   The new regex parameter docs claim SQLite support via Python `re.match`, but 
the public OpenAPI spec generated for `/api/v2/assets/events` currently says 
the filter is “Not supported on SQLite”. Also, elsewhere in the repo SQLite is 
treated as lacking regex support. Please align these statements and ensure the 
actual runtime behavior is clearly defined (either document it as unsupported 
on SQLite, or explicitly provide/describe the SQLite REGEXP implementation and 
its anchored semantics).
   ```suggestion
       SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator)
       and MySQL/MariaDB (``REGEXP``). SQLite is not supported.
   ```



##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py:
##########
@@ -73,6 +74,22 @@ def _get_asset_events_through_sql_clauses(
     )
 
 
+def _validate_partition_key_pattern(partition_key_pattern: str | None) -> None:
+    """Validate that the partition_key_pattern is a syntactically correct 
regex."""
+    if partition_key_pattern is None:
+        return
+    try:
+        re.compile(partition_key_pattern)
+    except re.error as e:

Review Comment:
   This pre-validation uses Python `re.compile()`, but the regex is executed by 
the database regex engine, which is not fully compatible with Python regex 
syntax. That means some DB-valid patterns may be rejected (or DB-invalid 
patterns may slip through). Consider handling validation in a DB-aware way, or 
catching DB-side regex errors and translating them to a 400 response.



##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py:
##########
@@ -44,7 +45,7 @@ def _get_asset_events_through_sql_clauses(
 ) -> AssetEventsResponse:
     order_by_clause = AssetEvent.timestamp.asc() if ascending else 
AssetEvent.timestamp.desc()
     asset_events_query = 
select(AssetEvent).join(join_clause).where(where_clause).order_by(order_by_clause)
-    if limit:
+    if limit is not None:
         asset_events_query = asset_events_query.limit(limit)
     asset_events = session.scalars(asset_events_query)

Review Comment:
   This query joins `AssetEvent.asset`, but `AssetEvent.asset` is configured 
`lazy="select"`; accessing `event.asset` inside the response-building loop will 
trigger an extra SELECT per event (N+1). Since this endpoint is 
performance-sensitive (and now adds regex filtering), consider eager-loading 
the relationship (e.g., `contains_eager`/`joinedload`) to avoid per-row DB 
roundtrips.



##########
airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml:
##########
@@ -340,6 +340,18 @@ paths:
           - type: integer
           - type: 'null'
           title: Source Map Index
+      - name: partition_key_pattern
+        in: query
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          description: Regular expression pattern for filtering. Uses 
database-native
+            regex (PostgreSQL ~ operator, MySQL REGEXP). Not supported on 
SQLite.
+          title: Partition Key Pattern
+        description: Regular expression pattern for filtering. Uses 
database-native
+          regex (PostgreSQL ~ operator, MySQL REGEXP). Not supported on SQLite.

Review Comment:
   The generated OpenAPI description for `partition_key_pattern` says regex 
filtering is “Not supported on SQLite”, but the server-side parameter factory / 
execution API route descriptions mention SQLite `re.match` semantics. Please 
make these consistent so clients know what to expect on SQLite.
   ```suggestion
               regex (PostgreSQL ~ operator, MySQL REGEXP). On SQLite, uses 
Python
               re.match semantics.
             title: Partition Key Pattern
           description: Regular expression pattern for filtering. Uses 
database-native
             regex (PostgreSQL ~ operator, MySQL REGEXP). On SQLite, uses Python
             re.match semantics.
   ```



##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -267,6 +269,65 @@ def depends_search(
     return depends_search
 
 
+class _RegexParam(BaseParam[str]):
+    """Filter using database-level regex matching (regexp_match).
+
+    SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator),
+    MySQL/MariaDB (``REGEXP``), and SQLite (via Python ``re.match``).
+    Note: SQLite uses ``re.match`` semantics (anchored at the start of the
+    string), while PostgreSQL uses ``re.search`` (matches anywhere).
+    """
+
+    def __init__(self, attribute: ColumnElement, skip_none: bool = True) -> 
None:
+        super().__init__(skip_none=skip_none)
+        self.attribute: ColumnElement = attribute
+
+    def to_orm(self, select: Select) -> Select:
+        if self.value is None and self.skip_none:
+            return select
+        return select.where(self.attribute.regexp_match(self.value))
+
+    @classmethod
+    def depends(cls, *args: Any, **kwargs: Any) -> Self:
+        raise NotImplementedError("Use regex_param_factory instead, depends is 
not implemented.")
+
+
+def _validate_regex_pattern(value: str | None) -> str | None:
+    """Validate that the regex pattern is syntactically correct."""
+    if value is None:
+        return value
+    try:
+        re.compile(value)
+    except re.error as e:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail=f"Invalid regular expression: {e}",
+        )
+    return value

Review Comment:
   Regex validation here uses Python `re.compile()`, but the actual matching is 
executed by the database regex engine (PostgreSQL, MySQL/MariaDB, etc.), which 
has different syntax/feature support. This can incorrectly reject patterns that 
are valid for the DB (or accept patterns that will later fail at query time). 
Consider either (a) removing the pre-validation and surfacing DB errors as 
400s, or (b) making validation dialect-aware, or at least documenting that 
validation is Python-regex based and may differ from the DB.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to