(iceberg-python) branch main updated: Support quoted column identifiers for scan `row_filter` (#1863)

fokko Fri, 04 Apr 2025 12:24:14 -0700

This is an automated email from the ASF dual-hosted git repository.

fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git



The following commit(s) were added to refs/heads/main by this push:
     new 8adf2467 Support quoted column identifiers for scan `row_filter` (#1863)
8adf2467 is described below

commit 8adf24673458d23cad4a88fa3474b015811f8f83
Author: Ethan Knox <[email protected]>
AuthorDate: Fri Apr 4 15:24:04 2025 -0400

    Support quoted column identifiers for scan `row_filter` (#1863)
    
    # Rationale for this change
    Our data lake uses old-school Kimball-style quoted column names ("User
    ID", "Customer Name", etc.). The string parser for `row_filter` could not
    parse these. Now it can.
    
    Example:
    ```python
    # before
    >>> parser.parse("\"User Name\" = 'ted'")
    ParseException: Expected '"', found ' '

    # after
    >>> parser.parse("\"User Name\" = 'ted'")
    EqualTo("User Name", "ted")
    ```
    
    # Are these changes tested?
    Yes, a new test was added.
    
    > [!NOTE]
    > The `quoted_column_with_dots` case previously errored with
    > `Expected '"', found '.'` _when using **double quotes only**_. It now
    > raises error text expecting an `'or'` value. I didn't dig into where the
    > exception is clobbered, because the error messages for single- and
    > double-quoted identifiers are already inconsistent and I don't consider
    > this a polished, first-class error message. If this change is an issue, I
    > can dig further and try to revert the wording change; IMO raising the
    > same exception type is enough to consider the change non-breaking.
    
    # Are there any user-facing changes?
    Yes, quoted identifiers are now supported.
---
 pyiceberg/expressions/parser.py  | 13 ++++++++++++-
 tests/expressions/test_parser.py |  6 ++++--
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/pyiceberg/expressions/parser.py b/pyiceberg/expressions/parser.py
index bad2df95..b9b6f9ab 100644
--- a/pyiceberg/expressions/parser.py
+++ b/pyiceberg/expressions/parser.py
@@ -22,8 +22,10 @@ from pyparsing import (
     DelimitedList,
     Group,
     MatchFirst,
+    ParseException,
     ParserElement,
     ParseResults,
+    QuotedString,
     Suppress,
     Word,
     alphanums,
@@ -79,7 +81,16 @@ NAN = CaselessKeyword("nan")
 LIKE = CaselessKeyword("like")
 
 unquoted_identifier = Word(alphas + "_", alphanums + "_$")
-quoted_identifier = Suppress('"') + unquoted_identifier + Suppress('"')
+quoted_identifier = QuotedString('"', escChar="\\", unquoteResults=True)
+
+
+@quoted_identifier.set_parse_action
+def validate_quoted_identifier(result: ParseResults) -> str:
+    if "." in result[0]:
+        raise ParseException("Expected '\"', found '.'")
+    return result[0]
+
+
 identifier = MatchFirst([unquoted_identifier, quoted_identifier]).set_results_name("identifier")
 column = DelimitedList(identifier, delim=".", combine=False).set_results_name("column")
 
diff --git a/tests/expressions/test_parser.py b/tests/expressions/test_parser.py
index 807aabeb..064fdb8f 100644
--- a/tests/expressions/test_parser.py
+++ b/tests/expressions/test_parser.py
@@ -230,9 +230,11 @@ def test_quoted_column_with_dots() -> None:
     with pytest.raises(ParseException) as exc_info:
         parser.parse("\"foo.bar\".baz = 'data'")
 
-    assert "Expected '\"', found '.'" in str(exc_info.value)
-
     with pytest.raises(ParseException) as exc_info:
         parser.parse("'foo.bar'.baz = 'data'")
 
     assert "Expected <= | <> | < | >= | > | == | = | !=, found '.'" in str(exc_info.value)
+
+
+def test_quoted_column_with_spaces() -> None:
+    assert EqualTo("Foo Bar", "data") == parser.parse("\"Foo Bar\" = 'data'")
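
The grammar change above can be illustrated outside pyiceberg with a small
stand-alone sketch. It is not the code from this commit: `reject_dots` and
`parse_identifier` are hypothetical names, the parse action is registered
with a plain `set_parse_action` call rather than as a decorator, and the
error text is made up for the example. It only assumes pyparsing 3.x.

```python
from pyparsing import ParseException, ParseResults, QuotedString

# A double-quoted identifier; backslash escapes a quote inside the name.
# unquote_results=True strips the surrounding quotes from the token.
quoted_identifier = QuotedString('"', esc_char="\\", unquote_results=True)


def reject_dots(s: str, loc: int, toks: ParseResults) -> str:
    """Hypothetical parse action: refuse quoted names that contain a dot."""
    name = toks[0]
    if "." in name:
        # The three-argument form passes the input, the location and the message.
        raise ParseException(s, loc, "quoted identifier must not contain '.'")
    return name


quoted_identifier.set_parse_action(reject_dots)


def parse_identifier(text: str) -> str:
    """Parse one quoted identifier and return it without the quotes."""
    return quoted_identifier.parse_string(text, parse_all=True)[0]


if __name__ == "__main__":
    print(parse_identifier('"User Name"'))
    try:
        parse_identifier('"foo.bar"')
    except ParseException as exc:
        print(f"rejected: {exc}")
```

In the real grammar the quoted identifier is one alternative inside
`MatchFirst`, so the exception that finally reaches the caller may come from
a different alternative, which is consistent with the wording change
described in the note in the commit message.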
