Hi Mike,

Thanks for weighing in here. I think this discussion needs to be split in two 
parts. On one side there is the implementation discussion which Pieter and Ken 
have led which focuses on how to make this work for providers, should 
exceptions be used in control flow, etc. I think the second discussion needs to 
be about ensuring the semantics make sense for gremlin. My focus here is on the 
semantic discussion.

I agree with your points that ternary-boolean/three-valued logic well defined 
and pervasive in many database systems. After all the semantics outlined in our 
provider docs 
(https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics) 
are essentially the same as comparison and logic semantics in the SQL standard. 
Where I have some concerns with the semantics in gremlin has to do with the 
boundaries of the logic expression. In SQL, it is very clear what is, and is 
not a part of a single logic expression. If I have something such as:

WHERE x > 5
OR
(NOT y < 10);

It is very well understood that everything following the WHERE should be 
considered as a single logic expression which is recursively composed of other 
logic expressions or comparisons. All of the ternary logic is isolated and 
contained within that composite expression, and it is well understood that 
WHERE will remove any results in which the composite expression does not return 
true.

Gremlin on the other hand does not have a notion of a logic expression. We have 
“atomic” predicates like P.gt, and we have composite predicates such as P.and, 
but there is also logical FilterSteps such as AndStep and NotStep. These steps 
are what creates the fuzzy boundaries around a logic expression. As of right 
now, the equivalent of the above in gremlin would be something such as:

Ex1: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)),  
select(“y”).is(P.not(P.lt(10))) ) )

The question I want to ask about this is what part of that traversal is the 
“logic expression”. It is tempting to define it as everything within the 
where() step. That definition makes sense in this example but is difficult to 
formally define for general cases. If define a logic expression as the contents 
of a where() step, then in the following example (Ex2), there is a hard break 
between the inner where() step and the larger expression.

Ex2: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)),  
where(select(“y”).is(P.not(P.lt(10)))) ) )

In this example, the whole expression in the outer where()-step feels like a 
single composite logic expression and having the internal where()-step create a 
hard break and early reduction leads to an unexpected difference in semantics 
between Ex1 and Ex2. I have seen a few cases in the past in which the different 
error reduction points between Ex1 and Ex2 has created a lot of confusion for 
users.

An additional bit of weird semantics we currently have due to implementation 
limitations is that Error states cannot get passed through any parent step 
which is not a FilterStep. This is because the logic for handling errors is 
implemented within FilterStep and it is not clear what it would mean to reduce 
an error state to false from outside of the context of a FilterStep. However, 
the issue that this creates is that a traversal such as Ex3, will be treated as 
a single composite logic expression, whereas the union()-step in Ex4 forces 
early reduction and breaks this into 2 independent logic expressions.

Ex3: …not( is(eq(0)) )
Ex4: …not( union( is(eq(0)) ) )

To add one more layer on top of this, we currently have certain strategies 
which may reorder, modify, replace, or remove certain steps as optimizations. 
Many of these optimizations have not properly considered the ternary boolean 
semantics and produce incorrect results in error cases.

I have spent a fair bit of time thinking about this and I have not yet come up 
with a clear set of rules to determine the right boundary for a logic 
expression which feels intuitive, or at the very least is teachable. Any system 
I’ve come up with which encompasses steps within logic expression always leads 
to certain edge cases where the semantics no longer make sense. For this 
reason, I’m wondering if the best solution might be to set the boundary at the 
predicate level. In this case ternary boolean semantics would work within a 
composite predicate such as P.lt(5).and(P.not(P.eq(3))), however when returning 
a value out to the step level, it would have to be either true or false. Under 
this model we would no longer view a tree of And/Or/Not steps as a single logic 
expression but instead as a series of discrete steps. I think this may be the 
right path forward although it likely requires leveling up the capabilities of 
predicates in order to be more useful.

I would be curious to know if others have any thoughts on what the right rules 
are for determining when to reduce from ternary boolean logic to true and false.

Regards,

Cole



From: Mike Personick <m...@dayzero.io>
Date: Wednesday, September 20, 2023 at 9:41 AM
To: dev@tinkerpop.apache.org <dev@tinkerpop.apache.org>
Subject: Re: [DISCUSS] Ternary Boolean Handling in New Steps
Early reduction is not possible because of negation. Please see the
discussion of ternary boolean semantics here:

https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics

On Tue, Sep 19, 2023 at 12:25 PM Ken Hu <k...@bitquilltech.com.invalid>
wrote:

> Thanks for your input Pieter.
>
> I agree with a lot of what you said and I think your suggestion is
> reasonable. From what I can tell, the logic in the FilterStep is there
> because reduction points are needed for the ternary boolean system. One of
> the ways to move logic out of this area would be to get rid of ternary
> boolean by immediately reducing any ERROR state to FALSE.
>
> On Sat, Sep 16, 2023 at 1:06 AM pieter <pieter.mar...@gmail.com> wrote:
>
> > Hi,
> > I have not really applied my mind to issue of the semantics, but what I
> > do recall is being irritated with `GremlinValueComparator` throwing
> > `GremlinTypeErrorException`.
> > Sometimes its propogated and sometimes swallowed. Code smell!!!
> > Using exceptions as process logic right there in the heart of
> > TinkerPop's iterator logic seemed to me as a bad idea and breaks
> > providers ability to override classes.
> > A good example is the logic in FilterStep.processNextStart() where the
> > exception is being swalloed. This logic should not be here and
> > exceptions should not be used for control flow.
> > Providers expect the base filter step to filter, not conditionally
> > swallow exceptions based on a long if statement.
> > My suggestion is let the comparator do what comparators do and return a
> > int. The type issue should be handled higher up the stack.
> >
> > Regards
> > Pieter
> > On Fri, 2023-09-15 at 14:14 -0700, Ken Hu wrote:
> > >
> > >
> > > >
> > >
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Hi Cole, "However, it makes sense for this short-term decision
> > > > > > to align with our long-term direction regarding comparability
> > > > > > semantics. I
> > > > > > wouldn’t be opposed to your proposed implementation if the
> > > > > > long-term plan is to move all steps towards this immediate
> > > > > > reduction behaviour." This is sort of my thinking as well. As
> > > > > > you demonstrated in your post, there is already an
> > > > > > inconsistency with the way ternary
> > > > > > boolean is reduced which leads to different results for
> > > > > > equivalent queries.
> > > > > > This is why I would prefer to just move ahead with an
> > > > > > implementation that I
> > > > > > believe is the most consistent with the expectations of users.
> > > > > > However, you
> > > > > > have valid concerns about adding even more inconsistencies to
> > > > > > the
> > > > > > language so if others voice their concern as well then I'll
> > > > > > make it
> > > > > > behave more like AND and OR. Regards, Ken On Mon, Sep 11, 2023
> > > > > > at 6:11 PM Cole Greer cole.gr...@improving.com.invalidwrote:
> > > > >
> > > >
> > >
> > >
> > > >
> > >
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Ken, Thanks for bringing this up, I believe topic
> > > > > > > > > > warrants some
> > > > > > > > > > further discussion. My understanding of the intent of
> > > > > > > > > > the
> > > > > > > > > > current system is that it aims to provide a consistent
> > > > > > > > > > and predictable set of
> > > > > > > > > > rules for comparisons between any datatypes. Prior to
> > > > > > > > > > 3.6, in general
> > > > > > > > > > comparisons between different types in gremlin produced
> > > > > > > > > > undefined behaviour (in practice this usually meant an
> > > > > > > > > > exception). The current system
> > > > > > > > > > successfully resolved much of this issue although it
> > > > > > > > > > has introduced certain
> > > > > > > > > > semantic consistency issues (see
> > > > > > > > > > https://issues.apache.org/jira/browse/TINKERPOP-2940).
> > > > > > > > > > Further,
> > > > > > > > > > while the docs (
> > > > > > > > > > https://tinkerpop.apache.org/docs/3.7.0/dev/provider/#_
> > > > > > > > > > ternary_boolean_logics)
> > > > > > > > > > are quite clear regarding the propagation/reduction
> > > > > > > > > > behaviour
> > > > > > > > > > in many cases, as you probe the edges it becomes
> > > > > > > > > > muddier. Considering the following example, the docs
> > > > > > > > > > quite clearly
> > > > > > > > > > define the expected behaviour of the first traversal,
> > > > > > > > > > but the expected behaviour is not clear outside of
> > > > > > > > > > basic combinations of AND,
> > > > > > > > > > OR, and NOT: gremlin> g.inject(1).not(is(gt("one"))) //
> > > > > > > > > > Produces no output
> > > > > > > > > > gremlin> g.inject(1).not(union(is(gt("one")),
> > > > > > > > > > is(eq("zero"))))
> > > > > > > > > > ==>1 // Error is reduced to false prior to Union Step,
> > > > > > > > > > and thus
> > > > > > > > > > not propagated into the Not Step. This is a good
> > > > > > > > > > example that we are currently in a bit of a
> > > > > > > > > > weird place where some of the language semantics are
> > > > > > > > > > formally defined
> > > > > > > > > > in documentation, while the rest of the language
> > > > > > > > > > semantics are
> > > > > > > > > > defined by implementation. It currently cannot be
> > > > > > > > > > determined if the above
> > > > > > > > > > example is expected or a bug. I believe it is important
> > > > > > > > > > that we find a resolution to
> > > > > > > > > > this by expanding our formally defined semantics or
> > > > > > > > > > changing the
> > > > > > > > > > implementation (when a breaking change is permittable).
> > > > > > > > > > As for the short-term question posed by ANY and ALL, my
> > > > > > > > > > only concern with your suggestion is it would be
> > > > > > > > > > subject to the
> > > > > > > > > > following inconsistency although as shown above there
> > > > > > > > > > is current
> > > > > > > > > > precedent for this sort of thing. gremlin>
> > > > > > > > > > g.inject(1).not(is(lt("one"))) // Produces no output
> > > > > > > > > > gremlin> g.inject([1]).not(any(is(lt("one")))) ==>[1]
> > > > > > > > > > In my opinion the most neutral direction would be for
> > > > > > > > > > ANY to
> > > > > > > > > > behave the same as a chain of OR’s and for ALL to act
> > > > > > > > > > as a chain of
> > > > > > > > > > ANDs. However, it makes sense for this short-term
> > > > > > > > > > decision to align
> > > > > > > > > > with our long-term direction regarding comparability
> > > > > > > > > > semantics. I
> > > > > > > > > > wouldn’t be opposed to your proposed implementation if
> > > > > > > > > > the long-term plan is to
> > > > > > > > > > move all steps towards this immediate reduction
> > > > > > > > > > behaviour. Thanks, Cole Greer From: Ken Hu
> > > > > > > > > > k...@bitquilltech.com.INVALIDDate: Monday,
> > > > > > > > > > September 11, 2023 at 4:16 PM To:
> > > > > > > > > > dev@tinkerpop.apache.org dev@tinkerpop.apache.orgSubjec
> > > > > > > > > > t:
> > > > > > > > > > [DISCUSS] Ternary Boolean Handling in New Steps Hi All,
> > > > > > > > > > Starting in version 3.6, the ternary boolean system was
> > > > > > > > > > introduced to handle comparison/equality tests within
> > > > > > > > > > Gremlin. Recently,
> > > > > > > > > > I've been implementing some list functions from
> > > > > > > > > > Proposal 3 which
> > > > > > > > > > make heavy use of the GremlinValueComparator to
> > > > > > > > > > determine if values
> > > > > > > > > > satisfy a specific condition. However, I'm finding it a
> > > > > > > > > > bit tricky to
> > > > > > > > > > understand how I should handle the
> > > > > > > > > > GremlinTypeErrorException. For any() and
> > > > > > > > > > all(), it seems like it would make sense to immediately
> > > > > > > > > > reduce any ERROR state
> > > > > > > > > > to false as it's a filter step. In the case of all(),
> > > > > > > > > > if a
> > > > > > > > > > GremlinTypeErrorException is caught, it would mean
> > > > > > > > > > there was a comparison error so the
> > > > > > > > > > traverser should be removed from the stream. However,
> > > > > > > > > > doing this
> > > > > > > > > > seemingly clashes with the original intention of
> > > > > > > > > > ternary boolean which is to
> > > > > > > > > > allow a provider-specific response on how to handle an
> > > > > > > > > > ERROR state. My current thoughts are that we should
> > > > > > > > > > rework the ternary
> > > > > > > > > > boolean system in the future to make it easier to
> > > > > > > > > > incorporate it into
> > > > > > > > > > new steps. One of the trickiest parts is that it uses
> > > > > > > > > > unchecked exceptions as
> > > > > > > > > > a means to implement the ERROR state which can get
> > > > > > > > > > easily
> > > > > > > > > > missed or accidentally leaked to the user (which has
> > > > > > > > > > happened before).
> > > > > > > > > > For now, I'm planning to go ahead and immediately
> > > > > > > > > > reduce ERROR states as I
> > > > > > > > > > think that is what makes the most sense for list
> > > > > > > > > > functions. Does anyone have any thoughts about this?
> > > > > > > > > > Thanks, Ken
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > >
> > >
> >
> >
>

Reply via email to