As of TinkerPop 3.6.0, gremlin has used a ternary boolean logic (TBL) system to
consistently resolve invalid comparisons (incomparable types, comparisons to
NaN...). This is documented in our semantic docs[1]. Such a system is standard
in many declarative query languages such as SQL and SPARQL, however in
gremlin, it leads to a lot of subtle complexity and unintuitive consequences.

The primary challenges regarding TBL in gremlin has to do with the notion of
“reducing” ERROR states to FALSE. Ternary boolean logic is typically used in a
query language in expressions which filter results. If there is some expression
such as WHERE x.name is "Bob" AND x.age > 32 and x = {name: null, age: 21},
then the expression evaluates to (ERROR AND FALSE) -> ERROR. The result of this
is that `x` should be filtered from the results as the expression did not 
evaluate
to TRUE. In essence, ERROR was reduced to FALSE after the logic expression was
fully evaluated, right before the filter was applied. The main challenge in 
gremlin
is determining what is the boundary of a logic expression and when should an
ERROR state be reduced. The “reduction point” in most declarative query
languages is obvious as these filtering expressions are typically contained in 
some
syntactically well-defined where-clause. Gremlin’s functional syntax results in
complex filtering expressions being weaved throughout a query, with no clear
boundaries.

The current gremlin semantics are defined such that if any FilterStep receives a
GremlinTypeErrorException (indicating an invalid comparison took place), it will
propagate the error to its parent step if the parent is also a FilterStep. 
Otherwise,
it will drop the current traverser (effectively reducing the ERROR to a FALSE). 
The
only exception is the and() and or() steps which will handle the ERROR according
to TBL semantics.

This reduction behaviour is very unintuitive for users and leads to many 
seemingly
equivalent queries producing different results, as well as certain optimization
strategies unexpectedly altering traversal semantics [2]. Some examples of this 
are:

gremlin> 
g.withoutStrategies(InlineFilterStrategy).inject(5).not(is(P.lt<http://p.lt/>(Double.NaN)))
// No results
gremlin> 
g.withoutStrategies(InlineFilterStrategy).inject(5).not(where(is(P.lt<http://p.lt/>(Double.NaN))))
==>5

gremlin> g.inject(5).not(or(is(P.lt<http://p.lt/>(Double.NaN)), is(3)))
// No results
// Error propagates through or() to not(), after which it is reduced to false 
and
// results are filtered out
gremlin> g.inject(5).not(coalesce(is(P.lt<http://p.lt/>(Double.NaN)), is(3)))
==>5
// Coalesce is not a filter step, so it forces is() to reduce the ERROR to false
// early, instead of passing the ERROR all the way to not()

gremlin> 
g.withoutStrategies(InlineFilterStrategy).inject(1).not(where(is(lt(NaN))))
==>1
gremlin> g.inject(1).not(where(is(lt(NaN))))
// No results
// InlineFilterStrategy optimizes to simply not(is(lt(NaN))) which removes the
// early reduction from the where() step

I believe these types of issues with TBL are inevitable in gremlin due to the 
lack of a
clearly defined where-clause which encapsulates all filtering. For this reason, 
I
propose that we replace the ternary boolean semantics with simpler binary 
boolean
semantics which do not face this reduction problem. If adopted, that would mean
that invalid comparisons such as 5>“foo” or 1.0<NaN would immediately
return FALSE, and thus not(5>“foo”) would evaluate to TRUE. This is different 
from
our current semantics, but it is both consistent and communicable, which are 
both
are very challenging to achieve if we retain the ternary semantics.

Please let me know any thoughts you have on the proposed semantics change. If
there are no objections, I intend to update the implementation and
documentation to follow binary boolean semantics in 3.8-dev.

Thanks,
Cole

Reply via email to