Re: [DISCUSS] Is null equal to null

2023-08-07 Thread David Bechberger
Hello Ken,

I don't know that I have a strong opinion on what NULL==NULL should
evaluate to, but I agree we should come up with a set of rules here for
consistency, both within Gremlin and with other database language
standards (e.g. GQL and SQL) so that Gremlin best matches customer
expectations.  Gremlin's divergence from user expectations when it comes to
null handling has been a constant headache for new users.  While I agree
with Josh that a type system would make this easier, we still need to be
consistent until we cross that bridge.
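For reference, here is a Java sketch of the two behaviors being compared. It is a simplified model, not TinkerPop code, and NullEquality and its methods are hypothetical names. Java-style equality treats null == null as true. SQL-style three-valued logic makes any comparison with null UNKNOWN (modeled here as a null Boolean), and a WHERE clause then treats UNKNOWN the same as false.

```java
import java.util.Objects;

// Hypothetical helper contrasting two null-equality rules; not TinkerPop API.
final class NullEquality {

    // Java-style equality: null == null is true.
    static boolean javaEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // SQL-style three-valued equality: any comparison involving null is
    // UNKNOWN, modeled here as a null Boolean rather than true or false.
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    // A SQL WHERE clause keeps a row only when the condition is TRUE,
    // so UNKNOWN filters the row out just like FALSE does.
    static boolean sqlWherePasses(Boolean condition) {
        return Boolean.TRUE.equals(condition);
    }
}
```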

For example, if you have a list, A, which is [1,2,null] and a list, B,
which is [1,null], should the result of an INTERSECT be [1,null] or [1]?

In Postgres, this would be [1,null], so that is probably what I would
recommend unless someone has a strong opinion in favor of something
different.
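To make the two candidate answers concrete, here is a small Java sketch that intersects A and B under two matching rules. The class and method names are hypothetical, not TinkerPop API. The null-safe rule treats null as matching null, as Java's Objects.equals and SQL set operations such as INTERSECT do; the strict rule follows SQL's `=`, where a comparison with null is never true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical sketch, not TinkerPop's intersect(): the result depends only on
// which rule decides whether two elements "match".
final class NullIntersect {

    // Null-safe rule (Java equals, SQL IS NOT DISTINCT FROM): null matches null.
    static boolean nullSafe(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // Strict rule (SQL "="): a comparison with null is never true, so null never matches.
    static boolean strict(Object a, Object b) {
        return a != null && b != null && a.equals(b);
    }

    // Keeps each element of a that matches some element of b, without duplicates,
    // in the order it first appears in a.
    static List<Object> intersect(List<?> a, List<?> b, BiPredicate<Object, Object> match) {
        List<Object> out = new ArrayList<>();
        for (Object x : a) {
            boolean inB = b.stream().anyMatch(y -> match.test(x, y));
            boolean seen = out.stream().anyMatch(y -> match.test(x, y));
            if (inB && !seen) {
                out.add(x);
            }
        }
        return out;
    }
}
```

Under these assumptions the null-safe rule gives [1, null], the Postgres result mentioned above, and the strict rule gives [1].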

Josh, I am familiar with the work you did on Dragon, and I am curious how
you see your work aligning with the recent SIGMOD paper [1] from the LDBC
working group on PG Schema?

Dave

[1] https://arxiv.org/abs/2211.10962


On Sat, Aug 5, 2023 at 7:02 AM Joshua Shinavier  wrote:

> Hi Ken,
>
> Yes indeed, there is that push. I am not saying that Gremlin shouldn't have
> a type system -- just that certain questions will have better answers once
> it does. While I am not drawing a lot of attention to it yet in connection
> with TinkerPop, there is a type system I am going to propose for TinkerPop.
> The formalism is called Lambda Graph, and it is closely related to the
> Algebraic Property Graphs [1] model which was implemented by Dragon [2]. I
> made a big deal about Dragon three years ago and then was unable to release
> it, so I'm waiting until Hydra [3] is completely ready before promoting it
> here. That said, it's not far from being ready. We are building property
> graph (not yet TinkerPop) applications with it at LinkedIn. I recently gave
> a presentation [4] on the data model which has excerpts from the Lambda
> Graph paper draft. In terms of property types, probably the first thing I
> will explore is integrating Hydra's "TinkerPop" model [5] with TinkerPop
> proper. In that model, property types are parameterized and unspecified, as
> are vertex and edge id types; different type systems for properties and ids
> can be plugged in here. For Hydra's core type system, see hydra/core.Type
> [6]. This type system behaves as I described above: there are no "nulls",
> but there are optionals, which are comparable to the extent that the base
> type is comparable.
>
> Josh
>
> [1] https://arxiv.org/abs/1909.04881
> [2] https://www.uber.com/blog/dragon-schema-integration-at-uber-scale/
> [3] https://github.com/CategoricalData/hydra
> [4] https://docs.google.com/presentation/d/1PF0K3KtopV0tMVa0sGBW2hDA7nw-cSwQm6h1AED1VSA
> [5] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/langs/tinkerpop/propertyGraph/package-summary.html
> [6] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/core/Type.html
>
>
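Josh's point that optionals are comparable to the extent that the base type is comparable can be sketched with java.util.Optional: given an ordering on T, derive one on Optional<T>. OptionalOrder is a hypothetical name, not Hydra or TinkerPop API, and placing empty before every present value is an assumption, since the thread does not say where empties should sort.

```java
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helper, not Hydra or TinkerPop API.
final class OptionalOrder {

    // Lifts an ordering on T to an ordering on Optional<T>. Two empties compare
    // equal; sorting empty before any present value is an assumption.
    static <T> Comparator<Optional<T>> lift(Comparator<? super T> base) {
        return (a, b) -> {
            if (!a.isPresent() && !b.isPresent()) {
                return 0;
            }
            if (!a.isPresent()) {
                return -1;
            }
            if (!b.isPresent()) {
                return 1;
            }
            return base.compare(a.get(), b.get());
        };
    }
}
```

If T has no ordering, there is no base comparator to pass in, which is the sense in which the optional is only as comparable as its base type.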
> On Fri, Aug 4, 2023 at 6:23 PM Ken Hu  wrote:
>
> > Hi Josh,
> >
> > Thanks for your input. There seems to be a push in the graph database
> > world towards having a schema. It's likely something like this would be
> > introduced in TinkerPop in the future. Let's assume that TinkerPop does
> > support schemas, and therefore would have a type system. Would this
> > change your opinion on the matter?
> >
> > Thanks again,
> > Ken
> >
> > On Wed, Aug 2, 2023 at 3:54 PM Joshua Shinavier  wrote:
> >
> > > For what it is worth, I think the question of whether null == null is
> > > only meaningful in the context of a specific type system, which Gremlin
> > > so far does not provide. My personal preference is to avoid SQL-style
> > > nulls and achieve optionality through union types (e.g. Java's Optional
> > > or Haskell's Maybe). In the case of two lists, if you can assume that
> > > the type of the list is list<optional<int>>, then you can safely treat
> > > null like Optional.empty(), and compare it with another null of the
> > > same logical type (int). If that is the interpretation of your two
> > > lists, then the intersection is [1, null].
> > >
> > > Josh
> > >
> > >
> > >
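Josh's suggestion to treat null like Optional.empty() can be sketched in Java as follows. OptionalIntersect is a hypothetical helper, and it assumes the input lists hold Integer values. Because Optional.equals reports two empty optionals as equal, the intersection needs no special null rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: read a nullable list as a list of optional ints.
final class OptionalIntersect {

    // Each null becomes Optional.empty(); every other value is wrapped.
    static List<Optional<Integer>> lift(List<Integer> raw) {
        List<Optional<Integer>> out = new ArrayList<>();
        for (Integer x : raw) {
            out.add(Optional.ofNullable(x));
        }
        return out;
    }

    // Plain equals-based intersection (order of a, no duplicates).
    // Optional.empty().equals(Optional.empty()) is true, so empties match.
    static List<Optional<Integer>> intersect(List<Optional<Integer>> a,
                                             List<Optional<Integer>> b) {
        List<Optional<Integer>> out = new ArrayList<>();
        for (Optional<Integer> x : a) {
            if (b.contains(x) && !out.contains(x)) {
                out.add(x);
            }
        }
        return out;
    }
}
```

Applied to [1,2,null] and [1,null], this yields [Optional[1], Optional.empty], i.e. [1, null].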
> > > On Tue, Aug 1, 2023 at 5:47 PM Ken Hu  wrote:
> > >
> > > > Hi All,
> > > >
> > > > As Gremlin evolves and gains more functionality, it is important that
> > > > we establish some fundamental rules to provide consistency in results.
> > > > One such question that we should come to agreement on is how null
> > > > values are compared. Currently, Gremlin seems to mostly follow the
> > > > comparison that is used in Java, where NULL == NULL returns TRUE.
> > > > However, in many other database systems, NULL == NULL would return
> > > > FALSE (or NULL).
> > > >
> > > > This question comes about as I'm starting to look a little deeper into
> > > > the proposed list functions. An example of where this is applicable is
> > > > the INTERSECT lis

Re: [DISCUSS] Local Scope As Default

2023-08-07 Thread Stephen Mallette
I think your argument makes sense but your example made me think of
something else:

g.V().hasLabel('person').values('age').fold().where(all(gt(18)))

the design of all() is to return true or false. if that's the case, then
using it in a where() will always be successful, since where() keeps a
traverser whenever its child traversal produces any result, and false is
still a result. returning true/false is really the job of a P so far and
this would introduce something new. this direction could make sense if you
wanted this:

g.V().group().by('classroom').by(values('age').fold().all(gt(18)))

where you were trying to map each "classroom" to true or false depending on
whether everyone in it is over age 18. i'm not sure that's what we want
all() to be doing exactly. i think its intention is more like the one you
supplied: a specialized filtering step, specialized in that it works only on
List sorts of types, as in the examples you used:

gremlin> g.V().hasLabel('person').values('age').fold().all(gt(18))
==>[29,27,32,35]
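The concern about where() can be made concrete with a simplified Java model. It assumes, as in Gremlin, that where(traversal) keeps a traverser whenever the child traversal emits at least one result; WhereSketch and its methods are hypothetical names. A map-style all() always emits one Boolean, so nothing is ever filtered, while a filter-style all() (like is()) emits the list only when every element passes.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of where(), not TinkerPop's implementation.
final class WhereSketch {

    // Keep the input if the child "traversal" produces any output at all.
    static <T> List<T> where(List<T> input, Function<T, Stream<?>> child) {
        return input.stream()
                .filter(t -> child.apply(t).findAny().isPresent())
                .collect(Collectors.toList());
    }

    // all() as a map step: always emits exactly one Boolean, true or false.
    static Function<List<Integer>, Stream<?>> allAsMap(Predicate<Integer> p) {
        return list -> Stream.of(list.stream().allMatch(p));
    }

    // all() as a filter step (like is()): emits the list only if every element passes.
    static Function<List<Integer>, Stream<?>> allAsFilter(Predicate<Integer> p) {
        return list -> list.stream().allMatch(p) ? Stream.<Object>of(list) : Stream.<Object>empty();
    }
}
```

With the ages above plus a hypothetical list [17, 40], the map-style version keeps both lists and the filter-style version keeps only [29, 27, 32, 35].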

In this way, all()/any()/some() is a bit like how is() behaves in that it
can apply a predicate to an item in the traversal stream. A good use case
might be for dealing with results like:

gremlin> g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold()
==>ripple=[]
==>peter=[0.2, 0.2, 0.2]
==>vadas=[]
==>josh=[1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 0.4]
==>lop=[]
==>marko=[0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 0.4, 0.5, 1.0, 0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]

where you don't even need to fold() the result. given the results above, i
think folks might want to "find all key/value pairs where all weight values
are gt(0.3)":

g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold().
  where(select(values).all(gt(0.3)))
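Here is a Java sketch of what that where(select(values).all(gt(0.3))) would compute over the grouped map. AllOverEntries is a hypothetical name. Note that Java's Stream.allMatch returns true for an empty stream, so whether empty lists such as ripple=[] pass is a design choice; the emptyPasses flag makes that choice explicit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

// Hypothetical sketch of filtering group() entries by "all values satisfy p".
final class AllOverEntries {

    static List<String> keysWhereAll(Map<String, List<Double>> groups,
                                     DoublePredicate p,
                                     boolean emptyPasses) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            List<Double> vs = e.getValue();
            // allMatch is vacuously true on an empty list.
            boolean all = vs.stream().mapToDouble(Double::doubleValue).allMatch(p);
            if (all && (emptyPasses || !vs.isEmpty())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}
```

With ripple=[], peter=[0.2, 0.2, 0.2], josh=[1.0, 0.4] and marko=[0.4, 0.5, 1.0] (abbreviated from the output above), emptyPasses=false keeps josh and marko, the same keys the filter() version returns, while emptyPasses=true also keeps ripple.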

i think the analogous form today without all() would maybe be something
like:

gremlin> g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold().
......1>   filter(select(values).
......2>          and(count(local).is(gt(0)),
......3>              unfold().choose(__.is(gt(0.3)), constant(1), constant(0)).
......4>              fold().
......5>              union(sum(local), count(local)).fold().as('x').
......6>              where('x',eq('x')).by(limit(local,1)).by(tail(local,1))))
==>josh=[1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 0.4]
==>marko=[0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 0.4, 0.5, 1.0, 0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
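The counting trick in that filter() can be mirrored in plain Java to check what it computes: a list passes when it is non-empty and the number of elements above the threshold (the sum of the 1/0 flags) equals its size. CountTrick is a hypothetical name, and the comments map each line to the Gremlin step it imitates.

```java
import java.util.List;

// Hypothetical plain-Java mirror of the filter() emulation of all(gt(threshold)).
final class CountTrick {

    static boolean allViaCounting(List<Double> weights, double threshold) {
        if (weights.isEmpty()) {
            return false;                                   // count(local).is(gt(0))
        }
        int passing = 0;
        for (double w : weights) {
            passing += (w > threshold) ? 1 : 0;             // choose(is(gt(..)), constant(1), constant(0))
        }
        return passing == weights.size();                   // sum(local) == count(local)
    }
}
```

The empty-list guard is why ripple, vadas and lop do not appear in the output above, even though "all elements pass" is vacuously true for an empty list.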

so, i think all() in this form does make good sense. it seems less like a
type of P, which would imply folding some sort of stream consumption into it
(i.e. has('weight',all(0.3))), and i think that would complicate other forms
that take P. making it more like an is() that is designed to work on a List
seems best. i also don't think all() should do too much magic; folks should
have to fold() themselves if they don't already have a List sort of type.
That said, I'd be curious how you define all() to behave when it doesn't get
that type or the List type is empty. I assume it would filter in those cases
since that fits the current design direction made in other steps going back
to mid-3.5.x releases.
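A sketch of all() as an is()-like filter on List-shaped input, under the assumption stated here: input that is not a collection is filtered, and so is an empty collection. AllStep is a hypothetical name; Optional.empty() stands in for "filtered" and Optional.of(input) for "passed through unchanged".

```java
import java.util.Collection;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical filter-style all(); the non-collection and empty-collection
// behavior is an assumption, not settled TinkerPop semantics.
final class AllStep {

    static Optional<Object> all(Object traverser, Predicate<Object> p) {
        if (!(traverser instanceof Collection)) {
            return Optional.empty();                 // not a List sort of type: filter
        }
        Collection<?> c = (Collection<?>) traverser;
        if (c.isEmpty()) {
            return Optional.empty();                 // empty list: filter
        }
        for (Object o : c) {
            if (!p.test(o)) {
                return Optional.empty();             // some element fails: filter
            }
        }
        return Optional.of(traverser);               // pass the list through unchanged
    }
}
```

Under these assumptions, [29, 27, 32, 35] with gt(18) passes through unchanged, while the bare value 29, an empty list, and [29, 17] are all filtered.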




On Fri, Aug 4, 2023 at 9:29 PM Ken Hu  wrote:

> Hi All,
>
> As I continue to take a further look into the list functions described in
> Proposal 3, I noticed that they don't take in a Scope. A Scope could have
> been added, because some of these functions make sense as
> ReducingBarriers. There are some instances, however, where the global scope
> makes no sense, so I would propose that we implement these as stated in
> Proposal 3. I just want to point out that this would likely be the first
> time local scope was used as a default for a Step (other than unfold), and
> I would like to give an opportunity for someone to voice their concerns
> about this.
>
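The difference between the two scopes can be sketched in Java with count(), modeling each traverser as a List<Integer>. ScopeSketch is a hypothetical name: global scope reduces over the whole stream, like count(), while local scope works inside each traverser's own list, like count(local).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of Scope, not TinkerPop's implementation.
final class ScopeSketch {

    // Global scope: one reduction over the whole stream of traversers.
    static long countGlobal(List<List<Integer>> traversers) {
        return traversers.size();
    }

    // Local scope: the function is applied inside each traverser's own list.
    static List<Integer> countLocal(List<List<Integer>> traversers) {
        return traversers.stream().map(List::size).collect(Collectors.toList());
    }
}
```

For traversers [1, 2, 3] and [4, 5], the global count is 2 and the local counts are [3, 2].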
> Let's take a look at one of the examples from Proposal 3.
>
> List Example 1 (LE1)
> Given a list of people, return the list of ages if everyone’s age > 18
> g.V().hasLabel('person').values('age').fold().where(all(gt(18)))
>
> Let's assume the proposal was meant to include this usage of all(), which
> takes a predicate as a parameter. Now suppose we remove the fold() from the
> above example so that it becomes:
>
> g.V().hasLabel('person').values('age').where(all(gt(18)))
>
> If all() were to behave like a global scope step here, then it would be
> pretty meaningless, as the incoming traverser is not a list type. In fact
> the three proposed Steps that return boolean (all, any, none) shouldn't be
> used unless the incoming traverser is an iterable type. In addition, the
> set operations (intersect, union, disjunct and difference) also require the
> incoming traverser to be a list/array type for them to have any sort of
> meaning. I think it's reasonable for the default behavior of all the
> proposed list functions to be the local scope versions. The concat() string
> function has already set a precedent of not taking Scope as a parameter as
>