[GitHub] [iceberg] samredai commented on a diff in pull request #5258: More Python Expressions

GitBox Tue, 12 Jul 2022 17:29:26 -0700


samredai commented on code in PR #5258:
URL: https://github.com/apache/iceberg/pull/5258#discussion_r919444305



##########
python/pyiceberg/expressions/base.py:
##########
@@ -327,6 +347,54 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundIn[T]:
         return BoundIn(bound_ref, tuple(lit.to(bound_ref.field.field_type) for 
lit in self.literals))  # type: ignore
 
 
+@dataclass(frozen=True)
+class BoundEQ(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class EQ(UnboundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+    def bind(self, schema: Schema, case_sensitive: bool) -> BoundEQ[T]:
+        bound_ref = self.term.bind(schema, case_sensitive)
+        return BoundEQ(bound_ref, 
self.literals[0].to(bound_ref.field.field_type))  # type: ignore
+
+
+@dataclass(frozen=True)
+class BoundLT(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("LT expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class LT(UnboundPredicate[T]):

Review Comment:
   This isn't a huge deal and I know the Java enum used these abbreviations 
like `LT`, but I feel like expanding these might be better for the python class 
names, i.e. `LessThan`.
   
   In a larger expression, `LessThan(Reference("foo_num"), literal(10))` is 
easier to reason about when seen for the first time than 
`LT(Reference("foo_num"), literal(10))`.



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= 
(self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at 
least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)

Review Comment:
   Should this be `Optional[int]` since it has `default=1` for the field?



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= 
(self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at 
least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]

Review Comment:
   I think using a pipe character will work here for the typehint. ->   
`literals: Tuple[Literal[T], ...] | Literal[T]` 



##########
python/pyiceberg/expressions/base.py:
##########
@@ -327,6 +347,54 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundIn[T]:
         return BoundIn(bound_ref, tuple(lit.to(bound_ref.field.field_type) for 
lit in self.literals))  # type: ignore
 
 
+@dataclass(frozen=True)
+class BoundEQ(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class EQ(UnboundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")

Review Comment:
   Initially I was thinking we wouldn't need `__invert__` since we could just 
negate when visiting...but I'm realizing we need to capture these inversions as 
the user defines the expression. Since `~EQ(Reference("foo"), literal("bar"))` 
would raise `TypeError: bad operand type for unary ~: EQ`, we actually might 
need a `NotEQ` class that can be used directly or returned by `EQ.__invert__`. 
Does that sound right?



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= 
(self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at 
least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):

Review Comment:
   This makes sense to me but `if isinstance(self.literals, Literal):` is more 
explicit here and seems more intuitive.



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> 
BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= 
(self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at 
least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, 
compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= 
(self.literals_len_ub or float("inf"))):  # type: ignore

Review Comment:
   For the `literals_len_lb` and `literals_len_ub`, are there scenarios where 
we need this specificity or is it the case that there will be two kinds of 
predicates: those that take a single literal and those that take a set of 
literals of any size? If it's the latter, should we just handle that in the 
`__post_init__` methods for the concrete predicate classes?
   
   ```py
   @dataclass(frozen=True)
   class In(UnboundPredicate[T]):
   
       def __post_init__(self):
           if isinstance(self.literals, Literal):
               object.__setattr__(self, "literals", set(self.literals,))
   ```
   
   ...and do the reverse for predicates that require a single literal:
   
   ```py
   @dataclass(frozen=True)
   class EQ(UnboundPredicate[T]):
   
       def __post_init__(self):
           if isinstance(self.literals, Set):
               if len(self.literals) > 1:
                   raise TypeError("...")
               object.__setattr__(self, "literals", self.literals[0])
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] samredai commented on a diff in pull request #5258: More Python Expressions

Reply via email to