Re: [DISCUSS] Is null equal to null
Hello Ken,

I don't know that I have a strong opinion on what NULL == NULL should evaluate to, but I agree we should come up with a set of rules here for consistency, both within Gremlin and with other database language standards (e.g. GQL and SQL), so that Gremlin best matches customer expectations. Gremlin's divergence from user expectations when it comes to null handling has been a constant headache for new users. While I agree with Josh that a type system would make this easier, we still need to be consistent until we cross that bridge.

For example, suppose you have a list A, which is [1,2,null], and a list B, which is [1,null]. Should the result of an INTERSECT be [1,null] or [1]? In Postgres, this would be [1,null], so that is probably what I would recommend unless someone has a stronger opinion in favor of something different.

Josh, I am familiar with the work you did on Dragon, and I am curious how you see your work aligning with the recent SIGMOD paper [1] from the LDBC working group on PG Schema?

Dave

[1] https://arxiv.org/abs/2211.10962

On Sat, Aug 5, 2023 at 7:02 AM Joshua Shinavier wrote:

> Hi Ken,
>
> Yes indeed, there is that push. I am not saying that Gremlin shouldn't have a type system -- just that certain questions will have better answers once it does. While I am not drawing a lot of attention to it yet in connection with TinkerPop, there is a type system I am going to propose for TinkerPop. The formalism is called Lambda Graph, and it is closely related to the Algebraic Property Graphs [1] model which was implemented by Dragon [2]. I made a big deal about Dragon three years ago and then was unable to release it, so I'm waiting until Hydra [3] is completely ready before promoting it here. That said, it's not far from being ready. We are building property graph (not yet TinkerPop) applications with it at LinkedIn. I recently gave a presentation [4] on the data model which has excerpts from the Lambda Graph paper draft.
>
> In terms of property types, probably the first thing I will explore is integrating Hydra's "TinkerPop" model [5] with TinkerPop proper. In that model, property types are parameterized and unspecified, as are vertex and edge id types; different type systems for properties and ids can be plugged in here. For Hydra's core type system, see hydra/core.Type [6]. This type system behaves as I described above: there are no "nulls", but there are optionals, which are comparable to the extent that the base type is comparable.
>
> Josh
>
> [1] https://arxiv.org/abs/1909.04881
> [2] https://www.uber.com/blog/dragon-schema-integration-at-uber-scale/
> [3] https://github.com/CategoricalData/hydra
> [4] https://docs.google.com/presentation/d/1PF0K3KtopV0tMVa0sGBW2hDA7nw-cSwQm6h1AED1VSA
> [5] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/langs/tinkerpop/propertyGraph/package-summary.html
> [6] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/core/Type.html
>
> On Fri, Aug 4, 2023 at 6:23 PM Ken Hu wrote:
>
> > Hi Josh,
> >
> > Thanks for your input. There seems to be a push in the graph database world towards having a schema. It's likely something like this would be introduced in TinkerPop in the future. Let's assume that TinkerPop does support schemas, and therefore would have a type system. Would this change your opinion on the matter?
> >
> > Thanks again,
> > Ken
> >
> > On Wed, Aug 2, 2023 at 3:54 PM Joshua Shinavier wrote:
> >
> > > For what it is worth, I think the question of whether null == null is only meaningful in the context of a specific type system, which Gremlin so far does not provide. My personal preference is to avoid SQL-style nulls and achieve optionality through union types (e.g. Java's Optional or Haskell's Maybe). In the case of two lists, if you can assume that the type of the list is list<optional<int>>, then you can safely treat null like Optional.empty(), and compare it with another null of the same logical type (int). If that is the interpretation of your two lists, then the intersection is [1, null].
> > >
> > > Josh
> > >
> > > On Tue, Aug 1, 2023 at 5:47 PM Ken Hu wrote:
> > >
> > > > Hi All,
> > > >
> > > > As Gremlin evolves and gains more functionality, it is important that we establish some fundamental rules to provide consistency in results. One such question that we should come to agreement on is how null values are compared. Currently, Gremlin seems to mostly follow the comparison that is used in Java where NULL == NULL returns TRUE. However, in many other database systems, NULL == NULL would return FALSE (or NULL).
> > > >
> > > > This question comes about as I'm starting to look a little deeper into the proposed list functions. An example of where this is applicable is the INTERSECT lis
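
For concreteness, the two candidate answers to the INTERSECT question in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not Gremlin's implementation and not a proposed API: the class name NullIntersect, the method intersect() and the two predicate constants are hypothetical names made up for this sketch. The first predicate treats null as equal to null (the Java-style reading, which also matches comparing Optional.empty() with Optional.empty()); the second lets no comparison involving null match, which is one way to model NULL == NULL evaluating to FALSE or NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical helper class for illustration only; not part of TinkerPop.
class NullIntersect {

    // Reading 1 (Java-style): Objects.equals(null, null) is true.
    static final BiPredicate<Integer, Integer> NULL_EQUALS_NULL = Objects::equals;

    // Reading 2: a comparison involving null never counts as a match.
    static final BiPredicate<Integer, Integer> NULL_NEVER_MATCHES =
            (x, y) -> x != null && y != null && x.equals(y);

    // Keeps each element of a that matches some element of b under eq,
    // without emitting the same match twice.
    static List<Integer> intersect(List<Integer> a, List<Integer> b,
                                   BiPredicate<Integer, Integer> eq) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : a) {
            boolean inB = b.stream().anyMatch(y -> eq.test(x, y));
            boolean alreadyKept = out.stream().anyMatch(y -> eq.test(x, y));
            if (inB && !alreadyKept) {
                out.add(x);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, null); // list A from the thread
        List<Integer> b = Arrays.asList(1, null);    // list B from the thread
        System.out.println(intersect(a, b, NULL_EQUALS_NULL));   // [1, null]
        System.out.println(intersect(a, b, NULL_NEVER_MATCHES)); // [1]
    }
}
```

Under the first reading the result is [1, null], which is the answer both Dave (citing Postgres) and Josh (under the optional interpretation) arrive at above. Under the second reading the null is dropped and the result is [1].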
Re: [DISCUSS] Local Scope As Default
I think your argument makes sense but your example made me think of something else:

g.V().hasLabel('person').values('age').fold().where(all(gt(18)))

the design of all() is to return true or false. if that's the case then using it in a where() will always be successful (where() passes a traverser whenever the child traversal emits any result, and false is still a result). returning true/false is really the job of a P so far and this would introduce something new. this direction could make sense if you wanted this:

g.V().group().by('classroom').by(values('age').fold().all(gt(18)))

where you were trying to set the value of true/false to each "classroom" being over age 18. i'm not sure that's what we want all() to be doing exactly. i think its intention is more like the one you supplied and i think it was meant more as a specialized filtering step, specialized in that it worked on List sorts of types only and more like the intention of the examples you used, as in:

gremlin> g.V().hasLabel('person').values('age').fold().all(gt(18))
==>[29,27,32,35]

In this way, all()/any()/some() is a bit like how is() behaves in that it can apply a predicate to an item in the traversal stream. A good use case might be for dealing with results like:

gremlin> g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold()
==>ripple=[]
==>peter=[0.2, 0.2, 0.2]
==>vadas=[]
==>josh=[1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 0.4]
==>lop=[]
==>marko=[0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 0.4, 0.5, 1.0, 0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]

where you don't even need to fold() the result. given the results above, i think folks might want to "find all key/value pairs where all weight values are gt(0.3)":

g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold().
  where(select(values).all(gt(0.3)))

i think the analogous form today without all() would maybe be something like:

gremlin> g.V().both().both().group().by('name').by(outE().values('weight').fold()).unfold().
..1>   filter(select(values).
..2>          and(count(local).is(gt(0)),
..3>              unfold().choose(__.is(gt(0.3)), constant(1), constant(0)).
..4>              fold().
..5>              union(sum(local), count(local)).fold().as('x').
..6>              where('x',eq('x')).by(limit(local,1)).by(tail(local,1))))
==>josh=[1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 1.0, 0.4]
==>marko=[0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 0.4, 0.5, 1.0, 0.4, 0.4, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0]

so, i think all() in this form does make good sense. it seems less like a type of P, which would imply building some sort of fold() (i.e. stream consumption, as in has('weight',all(0.3))) into it, which i think would complicate other forms that take P. making it more like an is() that is designed to work on a List seems best. i also don't think all() should do too much magic and force folks to fold() if they don't already have a List sort of type.

That said, I'd be curious how you define all() to behave when it doesn't get that type or the List type is empty. I assume it would filter in those cases since that fits the current design direction made in other steps going back to mid-3.5.x releases.

On Fri, Aug 4, 2023 at 9:29 PM Ken Hu wrote:

> Hi All,
>
> As I continue to take a further look into the list functions described in Proposal 3, I noticed that they don't take in a Scope. Yet, it could have been added because there are some functions that make sense as ReducingBarriers. There are some instances, however, where the global scope makes no sense so I would propose that we implement these as stated in Proposal 3. I just want to point out that this would likely be the first time local scope was used as a default for a Step (that isn't unfold) and would like to give an opportunity for someone to voice their concerns about this.
>
> Let's take a look at one of the examples from Proposal 3.
>
> List Example 1 (LE1)
> Given a list of people, return the list of ages if everyone’s age > 18
> g.V().hasLabel('person').values('age').fold().where(all(gt(18)))
>
> Let's assume the proposal should have included this usage for all() that takes a predicate as a parameter. If we remove the fold() from the above example, it becomes
>
> g.V().hasLabel('person').values('age').where(all(gt(18)))
>
> If all() were to behave like a global scope step here then it would be pretty meaningless as the incoming traverser is not a list type. In fact the three proposed Steps that return boolean (all, any, none) shouldn't be used unless the incoming traverser is an iterable type. In addition, the set operations (intersect, union, disjunct and difference) also require the incoming traverser to be a list/array type for them to have any sort of meaning. I think it's reasonable for the default behavior of all the proposed list functions to be the local scope versions. The concat() string function has already set a precedent of not taking Scope as a parameter as
>