Re: [DISCUSS] Comparability and Logic Semantics

Cole Greer Fri, 04 Jul 2025 15:09:03 -0700

My apologies, I seem to have forgotten to link my my footnotes, and 
additionally,
my mail client seems to have converted P.lt to a hyperlink without my noticing.
Please disregard the links in the previous examples, and the intended hyperlinks
are included below:


[1] TBL Semantics docs: 
https://tinkerpop.apache.org/docs/3.7.3/dev/provider/#_ternary_boolean_logics
[2] Strategy Dependence of TBL: 
https://issues.apache.org/jira/browse/TINKERPOP-2940

Thanks,
Cole

On 2025/07/04 21:56:24 Cole Greer wrote:
> As of TinkerPop 3.6.0, gremlin has used a ternary boolean logic (TBL) system 
> to
> consistently resolve invalid comparisons (incomparable types, comparisons to
> NaN...). This is documented in our semantic docs[1]. Such a system is standard
> in many declarative query languages such as SQL and SPARQL, however in
> gremlin, it leads to a lot of subtle complexity and unintuitive consequences.
> 
> The primary challenges regarding TBL in gremlin has to do with the notion of
> “reducing” ERROR states to FALSE. Ternary boolean logic is typically used in a
> query language in expressions which filter results. If there is some 
> expression
> such as WHERE x.name is "Bob" AND x.age > 32 and x = {name: null, age: 21},
> then the expression evaluates to (ERROR AND FALSE) -> ERROR. The result of 
> this
> is that `x` should be filtered from the results as the expression did not 
> evaluate
> to TRUE. In essence, ERROR was reduced to FALSE after the logic expression was
> fully evaluated, right before the filter was applied. The main challenge in 
> gremlin
> is determining what is the boundary of a logic expression and when should an
> ERROR state be reduced. The “reduction point” in most declarative query
> languages is obvious as these filtering expressions are typically contained 
> in some
> syntactically well-defined where-clause. Gremlin’s functional syntax results 
> in
> complex filtering expressions being weaved throughout a query, with no clear
> boundaries.
> 
> The current gremlin semantics are defined such that if any FilterStep 
> receives a
> GremlinTypeErrorException (indicating an invalid comparison took place), it 
> will
> propagate the error to its parent step if the parent is also a FilterStep. 
> Otherwise,
> it will drop the current traverser (effectively reducing the ERROR to a 
> FALSE). The
> only exception is the and() and or() steps which will handle the ERROR 
> according
> to TBL semantics.
> 
> This reduction behaviour is very unintuitive for users and leads to many 
> seemingly
> equivalent queries producing different results, as well as certain 
> optimization
> strategies unexpectedly altering traversal semantics [2]. Some examples of 
> this are:
> 
> gremlin> 
> g.withoutStrategies(InlineFilterStrategy).inject(5).not(is(P.lt<http://p.lt/>(Double.NaN)))
> // No results
> gremlin> 
> g.withoutStrategies(InlineFilterStrategy).inject(5).not(where(is(P.lt<http://p.lt/>(Double.NaN))))
> ==>5
> 
> gremlin> g.inject(5).not(or(is(P.lt<http://p.lt/>(Double.NaN)), is(3)))
> // No results
> // Error propagates through or() to not(), after which it is reduced to false 
> and
> // results are filtered out
> gremlin> g.inject(5).not(coalesce(is(P.lt<http://p.lt/>(Double.NaN)), is(3)))
> ==>5
> // Coalesce is not a filter step, so it forces is() to reduce the ERROR to 
> false
> // early, instead of passing the ERROR all the way to not()
> 
> gremlin> 
> g.withoutStrategies(InlineFilterStrategy).inject(1).not(where(is(lt(NaN))))
> ==>1
> gremlin> g.inject(1).not(where(is(lt(NaN))))
> // No results
> // InlineFilterStrategy optimizes to simply not(is(lt(NaN))) which removes the
> // early reduction from the where() step
> 
> I believe these types of issues with TBL are inevitable in gremlin due to the 
> lack of a
> clearly defined where-clause which encapsulates all filtering. For this 
> reason, I
> propose that we replace the ternary boolean semantics with simpler binary 
> boolean
> semantics which do not face this reduction problem. If adopted, that would 
> mean
> that invalid comparisons such as 5>“foo” or 1.0<NaN would immediately
> return FALSE, and thus not(5>“foo”) would evaluate to TRUE. This is different 
> from
> our current semantics, but it is both consistent and communicable, which are 
> both
> are very challenging to achieve if we retain the ternary semantics.
> 
> Please let me know any thoughts you have on the proposed semantics change. If
> there are no objections, I intend to update the implementation and
> documentation to follow binary boolean semantics in 3.8-dev.
> 
> Thanks,
> Cole
>

Re: [DISCUSS] Comparability and Logic Semantics

Reply via email to