Har1sh-k commented on code in PR #66751:
URL: https://github.com/apache/airflow/pull/66751#discussion_r3358177748
##########
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive_stats.py:
##########
@@ -29,6 +30,21 @@
if TYPE_CHECKING:
from airflow.providers.common.compat.sdk import Context
+# The table, the partition columns, and the metastore columns projected in the
Presto
+# stats SELECT are interpolated as identifiers, which cannot be bound as SQL
parameters.
+# Plain word identifiers are emitted unchanged; anything else is double-quoted
with
+# embedded quotes doubled (how Presto/Trino escape identifiers). Kept local
rather than
+# reusing common.sql's ``Dialect.escape_word``, which needs a live connection
and does
+# not double embedded quotes.
+_PLAIN_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
+
+
+def _quote_presto_identifier(identifier: str) -> str:
+ """Quote a Presto/Trino identifier unless it is already a plain word
identifier."""
+ if _PLAIN_IDENT_RE.fullmatch(identifier):
+ return identifier
+ return '"' + identifier.replace('"', '""') + '"'
Review Comment:
Good catch. Fixed.
`_quote_presto_identifier` now passes an already-quoted identifier through
unchanged, so `"test-123-quoted"` stays `"test-123-quoted"` instead of becoming
`"""test-123-quoted"""`. Malformed/unbalanced inputs (e.g. `"a"b"`) are still
escaped, and because the passthrough is a `fullmatch`, anything with trailing
SQL after the closing quote (e.g. `"x"; DROP TABLE users--`) is still escaped
to one inert identifier. Added a unit test covering these cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]