Hi Mike, Thanks for weighing in here. I think this discussion needs to be split in two parts. On one side there is the implementation discussion which Pieter and Ken have led which focuses on how to make this work for providers, should exceptions be used in control flow, etc. I think the second discussion needs to be about ensuring the semantics make sense for gremlin. My focus here is on the semantic discussion.
I agree with your points that ternary-boolean/three-valued logic well defined and pervasive in many database systems. After all the semantics outlined in our provider docs (https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics) are essentially the same as comparison and logic semantics in the SQL standard. Where I have some concerns with the semantics in gremlin has to do with the boundaries of the logic expression. In SQL, it is very clear what is, and is not a part of a single logic expression. If I have something such as: WHERE x > 5 OR (NOT y < 10); It is very well understood that everything following the WHERE should be considered as a single logic expression which is recursively composed of other logic expressions or comparisons. All of the ternary logic is isolated and contained within that composite expression, and it is well understood that WHERE will remove any results in which the composite expression does not return true. Gremlin on the other hand does not have a notion of a logic expression. We have “atomic” predicates like P.gt, and we have composite predicates such as P.and, but there is also logical FilterSteps such as AndStep and NotStep. These steps are what creates the fuzzy boundaries around a logic expression. As of right now, the equivalent of the above in gremlin would be something such as: Ex1: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)), select(“y”).is(P.not(P.lt(10))) ) ) The question I want to ask about this is what part of that traversal is the “logic expression”. It is tempting to define it as everything within the where() step. That definition makes sense in this example but is difficult to formally define for general cases. If define a logic expression as the contents of a where() step, then in the following example (Ex2), there is a hard break between the inner where() step and the larger expression. Ex2: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)), where(select(“y”).is(P.not(P.lt(10)))) ) ) In this example, the whole expression in the outer where()-step feels like a single composite logic expression and having the internal where()-step create a hard break and early reduction leads to an unexpected difference in semantics between Ex1 and Ex2. I have seen a few cases in the past in which the different error reduction points between Ex1 and Ex2 has created a lot of confusion for users. An additional bit of weird semantics we currently have due to implementation limitations is that Error states cannot get passed through any parent step which is not a FilterStep. This is because the logic for handling errors is implemented within FilterStep and it is not clear what it would mean to reduce an error state to false from outside of the context of a FilterStep. However, the issue that this creates is that a traversal such as Ex3, will be treated as a single composite logic expression, whereas the union()-step in Ex4 forces early reduction and breaks this into 2 independent logic expressions. Ex3: …not( is(eq(0)) ) Ex4: …not( union( is(eq(0)) ) ) To add one more layer on top of this, we currently have certain strategies which may reorder, modify, replace, or remove certain steps as optimizations. Many of these optimizations have not properly considered the ternary boolean semantics and produce incorrect results in error cases. I have spent a fair bit of time thinking about this and I have not yet come up with a clear set of rules to determine the right boundary for a logic expression which feels intuitive, or at the very least is teachable. Any system I’ve come up with which encompasses steps within logic expression always leads to certain edge cases where the semantics no longer make sense. For this reason, I’m wondering if the best solution might be to set the boundary at the predicate level. In this case ternary boolean semantics would work within a composite predicate such as P.lt(5).and(P.not(P.eq(3))), however when returning a value out to the step level, it would have to be either true or false. Under this model we would no longer view a tree of And/Or/Not steps as a single logic expression but instead as a series of discrete steps. I think this may be the right path forward although it likely requires leveling up the capabilities of predicates in order to be more useful. I would be curious to know if others have any thoughts on what the right rules are for determining when to reduce from ternary boolean logic to true and false. Regards, Cole From: Mike Personick <[email protected]> Date: Wednesday, September 20, 2023 at 9:41 AM To: [email protected] <[email protected]> Subject: Re: [DISCUSS] Ternary Boolean Handling in New Steps Early reduction is not possible because of negation. Please see the discussion of ternary boolean semantics here: https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics On Tue, Sep 19, 2023 at 12:25 PM Ken Hu <[email protected]> wrote: > Thanks for your input Pieter. > > I agree with a lot of what you said and I think your suggestion is > reasonable. From what I can tell, the logic in the FilterStep is there > because reduction points are needed for the ternary boolean system. One of > the ways to move logic out of this area would be to get rid of ternary > boolean by immediately reducing any ERROR state to FALSE. > > On Sat, Sep 16, 2023 at 1:06 AM pieter <[email protected]> wrote: > > > Hi, > > I have not really applied my mind to issue of the semantics, but what I > > do recall is being irritated with `GremlinValueComparator` throwing > > `GremlinTypeErrorException`. > > Sometimes its propogated and sometimes swallowed. Code smell!!! > > Using exceptions as process logic right there in the heart of > > TinkerPop's iterator logic seemed to me as a bad idea and breaks > > providers ability to override classes. > > A good example is the logic in FilterStep.processNextStart() where the > > exception is being swalloed. This logic should not be here and > > exceptions should not be used for control flow. > > Providers expect the base filter step to filter, not conditionally > > swallow exceptions based on a long if statement. > > My suggestion is let the comparator do what comparators do and return a > > int. The type issue should be handled higher up the stack. > > > > Regards > > Pieter > > On Fri, 2023-09-15 at 14:14 -0700, Ken Hu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Cole, "However, it makes sense for this short-term decision > > > > > > to align with our long-term direction regarding comparability > > > > > > semantics. I > > > > > > wouldn’t be opposed to your proposed implementation if the > > > > > > long-term plan is to move all steps towards this immediate > > > > > > reduction behaviour." This is sort of my thinking as well. As > > > > > > you demonstrated in your post, there is already an > > > > > > inconsistency with the way ternary > > > > > > boolean is reduced which leads to different results for > > > > > > equivalent queries. > > > > > > This is why I would prefer to just move ahead with an > > > > > > implementation that I > > > > > > believe is the most consistent with the expectations of users. > > > > > > However, you > > > > > > have valid concerns about adding even more inconsistencies to > > > > > > the > > > > > > language so if others voice their concern as well then I'll > > > > > > make it > > > > > > behave more like AND and OR. Regards, Ken On Mon, Sep 11, 2023 > > > > > > at 6:11 PM Cole Greer [email protected]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ken, Thanks for bringing this up, I believe topic > > > > > > > > > > warrants some > > > > > > > > > > further discussion. My understanding of the intent of > > > > > > > > > > the > > > > > > > > > > current system is that it aims to provide a consistent > > > > > > > > > > and predictable set of > > > > > > > > > > rules for comparisons between any datatypes. Prior to > > > > > > > > > > 3.6, in general > > > > > > > > > > comparisons between different types in gremlin produced > > > > > > > > > > undefined behaviour (in practice this usually meant an > > > > > > > > > > exception). The current system > > > > > > > > > > successfully resolved much of this issue although it > > > > > > > > > > has introduced certain > > > > > > > > > > semantic consistency issues (see > > > > > > > > > > https://issues.apache.org/jira/browse/TINKERPOP-2940). > > > > > > > > > > Further, > > > > > > > > > > while the docs ( > > > > > > > > > > https://tinkerpop.apache.org/docs/3.7.0/dev/provider/#_ > > > > > > > > > > ternary_boolean_logics) > > > > > > > > > > are quite clear regarding the propagation/reduction > > > > > > > > > > behaviour > > > > > > > > > > in many cases, as you probe the edges it becomes > > > > > > > > > > muddier. Considering the following example, the docs > > > > > > > > > > quite clearly > > > > > > > > > > define the expected behaviour of the first traversal, > > > > > > > > > > but the expected behaviour is not clear outside of > > > > > > > > > > basic combinations of AND, > > > > > > > > > > OR, and NOT: gremlin> g.inject(1).not(is(gt("one"))) // > > > > > > > > > > Produces no output > > > > > > > > > > gremlin> g.inject(1).not(union(is(gt("one")), > > > > > > > > > > is(eq("zero")))) > > > > > > > > > > ==>1 // Error is reduced to false prior to Union Step, > > > > > > > > > > and thus > > > > > > > > > > not propagated into the Not Step. This is a good > > > > > > > > > > example that we are currently in a bit of a > > > > > > > > > > weird place where some of the language semantics are > > > > > > > > > > formally defined > > > > > > > > > > in documentation, while the rest of the language > > > > > > > > > > semantics are > > > > > > > > > > defined by implementation. It currently cannot be > > > > > > > > > > determined if the above > > > > > > > > > > example is expected or a bug. I believe it is important > > > > > > > > > > that we find a resolution to > > > > > > > > > > this by expanding our formally defined semantics or > > > > > > > > > > changing the > > > > > > > > > > implementation (when a breaking change is permittable). > > > > > > > > > > As for the short-term question posed by ANY and ALL, my > > > > > > > > > > only concern with your suggestion is it would be > > > > > > > > > > subject to the > > > > > > > > > > following inconsistency although as shown above there > > > > > > > > > > is current > > > > > > > > > > precedent for this sort of thing. gremlin> > > > > > > > > > > g.inject(1).not(is(lt("one"))) // Produces no output > > > > > > > > > > gremlin> g.inject([1]).not(any(is(lt("one")))) ==>[1] > > > > > > > > > > In my opinion the most neutral direction would be for > > > > > > > > > > ANY to > > > > > > > > > > behave the same as a chain of OR’s and for ALL to act > > > > > > > > > > as a chain of > > > > > > > > > > ANDs. However, it makes sense for this short-term > > > > > > > > > > decision to align > > > > > > > > > > with our long-term direction regarding comparability > > > > > > > > > > semantics. I > > > > > > > > > > wouldn’t be opposed to your proposed implementation if > > > > > > > > > > the long-term plan is to > > > > > > > > > > move all steps towards this immediate reduction > > > > > > > > > > behaviour. Thanks, Cole Greer From: Ken Hu > > > > > > > > > > [email protected]: Monday, > > > > > > > > > > September 11, 2023 at 4:16 PM To: > > > > > > > > > > [email protected] [email protected] > > > > > > > > > > t: > > > > > > > > > > [DISCUSS] Ternary Boolean Handling in New Steps Hi All, > > > > > > > > > > Starting in version 3.6, the ternary boolean system was > > > > > > > > > > introduced to handle comparison/equality tests within > > > > > > > > > > Gremlin. Recently, > > > > > > > > > > I've been implementing some list functions from > > > > > > > > > > Proposal 3 which > > > > > > > > > > make heavy use of the GremlinValueComparator to > > > > > > > > > > determine if values > > > > > > > > > > satisfy a specific condition. However, I'm finding it a > > > > > > > > > > bit tricky to > > > > > > > > > > understand how I should handle the > > > > > > > > > > GremlinTypeErrorException. For any() and > > > > > > > > > > all(), it seems like it would make sense to immediately > > > > > > > > > > reduce any ERROR state > > > > > > > > > > to false as it's a filter step. In the case of all(), > > > > > > > > > > if a > > > > > > > > > > GremlinTypeErrorException is caught, it would mean > > > > > > > > > > there was a comparison error so the > > > > > > > > > > traverser should be removed from the stream. However, > > > > > > > > > > doing this > > > > > > > > > > seemingly clashes with the original intention of > > > > > > > > > > ternary boolean which is to > > > > > > > > > > allow a provider-specific response on how to handle an > > > > > > > > > > ERROR state. My current thoughts are that we should > > > > > > > > > > rework the ternary > > > > > > > > > > boolean system in the future to make it easier to > > > > > > > > > > incorporate it into > > > > > > > > > > new steps. One of the trickiest parts is that it uses > > > > > > > > > > unchecked exceptions as > > > > > > > > > > a means to implement the ERROR state which can get > > > > > > > > > > easily > > > > > > > > > > missed or accidentally leaked to the user (which has > > > > > > > > > > happened before). > > > > > > > > > > For now, I'm planning to go ahead and immediately > > > > > > > > > > reduce ERROR states as I > > > > > > > > > > think that is what makes the most sense for list > > > > > > > > > > functions. Does anyone have any thoughts about this? > > > > > > > > > > Thanks, Ken > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
