This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git
The following commit(s) were added to refs/heads/main by this push:
new 8adf2467 Support quoted column identifiers for scan `row_filter`
(#1863)
8adf2467 is described below
commit 8adf24673458d23cad4a88fa3474b015811f8f83
Author: Ethan Knox <[email protected]>
AuthorDate: Fri Apr 4 15:24:04 2025 -0400
Support quoted column identifiers for scan `row_filter` (#1863)
# Rationale for this change
Our data lake uses old-school Kimball-style quoted column names ("User
ID", "Customer Name", etc.). The string parser for `row_filter` was unable
to parse these. Now it can.
Example:
```python
# before
>>> parser.parse("\"User Name\" = 'ted'")
ParseException: Expected '"', found ' '
# after
>>> parser.parse("\"User Name\" = 'ted'")
EqualTo("User Name", "ted")
```
# Are these changes tested?
Yes, a new test was added.
>[!NOTE]
> The `quoted_column_with_dots` case previously errored with
`"Expected '"', found '.'"` _when using **double quotes only**_. It now
raises error text expecting an `'or'` value. I didn't toil over finding
where the exception is clobbered, because the error messages for single-
and double-quote failures are inconsistent and I didn't really consider
this a polished, first-class error message. If this change is an issue, I
can dig further to try and revert the wording change; IMO raising the
same exception type is more than reasonable to consider the change
non-breaking.
# Are there any user-facing changes?
Yes, quoted identifiers are now supported.
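
For readers who want to try the identifier rule in isolation, here is a minimal sketch that uses pyparsing directly (assuming pyparsing 3.x). It is not the PyIceberg parser itself: `reject_dots` and `parse_identifier` are hypothetical names used only for this illustration, and the grammar is cut down to a single identifier rather than a full `row_filter` expression.

```python
from pyparsing import (
    MatchFirst,
    ParseException,
    ParseResults,
    QuotedString,
    Word,
    alphanums,
    alphas,
)

# Bare identifiers: letters/underscore first, then alphanumerics, "_" or "$".
unquoted_identifier = Word(alphas + "_", alphanums + "_$")

# Double-quoted identifiers: anything between the quotes, backslash as escape.
quoted_identifier = QuotedString('"', escChar="\\", unquoteResults=True)


def reject_dots(result: ParseResults) -> str:
    """Hypothetical helper: refuse quoted identifiers that contain a dot."""
    if "." in result[0]:
        raise ParseException("Expected '\"', found '.'")
    return result[0]


quoted_identifier.set_parse_action(reject_dots)

identifier = MatchFirst([unquoted_identifier, quoted_identifier])


def parse_identifier(text: str) -> str:
    """Hypothetical helper: parse exactly one identifier from `text`."""
    return identifier.parse_string(text, parse_all=True)[0]


if __name__ == "__main__":
    print(parse_identifier('"User Name"'))  # quoted, with a space
    print(parse_identifier("user_id"))      # unquoted
    try:
        parse_identifier('"foo.bar"')       # dot inside quotes is rejected
    except ParseException as exc:
        print("rejected:", exc)
```

The dot check lives in a parse action because `QuotedString` accepts any character between the quotes, while dots are also used as the delimiter between the parts of a nested column reference.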
---
pyiceberg/expressions/parser.py | 13 ++++++++++++-
tests/expressions/test_parser.py | 6 ++++--
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/pyiceberg/expressions/parser.py b/pyiceberg/expressions/parser.py
index bad2df95..b9b6f9ab 100644
--- a/pyiceberg/expressions/parser.py
+++ b/pyiceberg/expressions/parser.py
@@ -22,8 +22,10 @@ from pyparsing import (
DelimitedList,
Group,
MatchFirst,
+ ParseException,
ParserElement,
ParseResults,
+ QuotedString,
Suppress,
Word,
alphanums,
@@ -79,7 +81,16 @@ NAN = CaselessKeyword("nan")
LIKE = CaselessKeyword("like")
unquoted_identifier = Word(alphas + "_", alphanums + "_$")
-quoted_identifier = Suppress('"') + unquoted_identifier + Suppress('"')
+quoted_identifier = QuotedString('"', escChar="\\", unquoteResults=True)
+
+
+@quoted_identifier.set_parse_action
+def validate_quoted_identifier(result: ParseResults) -> str:
+ if "." in result[0]:
+ raise ParseException("Expected '\"', found '.'")
+ return result[0]
+
+
identifier = MatchFirst([unquoted_identifier,
quoted_identifier]).set_results_name("identifier")
column = DelimitedList(identifier, delim=".",
combine=False).set_results_name("column")
diff --git a/tests/expressions/test_parser.py b/tests/expressions/test_parser.py
index 807aabeb..064fdb8f 100644
--- a/tests/expressions/test_parser.py
+++ b/tests/expressions/test_parser.py
@@ -230,9 +230,11 @@ def test_quoted_column_with_dots() -> None:
with pytest.raises(ParseException) as exc_info:
parser.parse("\"foo.bar\".baz = 'data'")
- assert "Expected '\"', found '.'" in str(exc_info.value)
-
with pytest.raises(ParseException) as exc_info:
parser.parse("'foo.bar'.baz = 'data'")
assert "Expected <= | <> | < | >= | > | == | = | !=, found '.'" in
str(exc_info.value)
+
+
+def test_quoted_column_with_spaces() -> None:
+ assert EqualTo("Foo Bar", "data") == parser.parse("\"Foo Bar\" = 'data'")