To try to go back to the original discussion: > Identify types that are problematic and that we therefore don't want to support across all GLVs.
I made a proposal above. It's based on the type representation rationale, it's important to think about possible/common in-memory or disk representation of a type (ie: two compliment for signed integers, IEEE 754 for doubles, ...) when you decide whether you support it or not (ie: is it useful for a TinkerPop implementation?), I didn't want to bring storage in the mix as it's a responsibility of the vendor/implementor of the storage engine. We can go for a minimal set of types or a more broader set, but I agree with Florian that it should be defined and documented. Is anyone proposing anything else? Should we only support Core types on the GLV? 2018-05-29 23:30 GMT+02:00 Robert Dale <[email protected]>: > Florian wrote: > > I don't really see why we should drop them when they make the storage > more efficient and reduce the need for type castings in user code. > > Where is it more efficient? I think that statement a specialization that's > not generally applicable. > a. At the client? This is language dependent. Many high-level languages > actually have only 2 number types: integer, float. My guess is that they > occupy the same space for the most part. In any case, they would ignore any > size hints. > b. At GraphSON? The text size is dependent on actuall length of characters > and not the memory storage size. e.g. a '0' byte is the same size as a '0' > long. > c. In the TinkerPop Java implementation? In Java, all of the numbers are > stored as Objects. Byte, Short, Integer all occupy 16 bytes. Long is 24 > bytes. There's really not a substantial difference. > d. In the database? Yes, but I started out with Gremlin should be > schema-agnostic. And all modern databases I've worked with will handle the > conversion to storage size for you. Are there any that don't? Should we be > concerned about it? > > I've pulled some select phrases from Python's rationale to unify integers: > > "Having the machine word size exposed to the language hinders portability." > > "There is also the general desire to hide unnecessary details from the > [...] user when they are irrelevant for most applications. An example is > memory allocation, [...] giving us the convenience of unlimited sizes on > strings, lists, etc. It makes sense to extend this convenience to numbers." > > - https://www.python.org/dev/peps/pep-0237/ > > So, take the inverse problem of the client where in python 3, for example, > there is only one integer type. In the Gremlin client, it has undone what > python 3 did. That is, the Gremlin python client adds cast functions for > int, long, float. And so it goes for every client out there where you want > to expose memory size. This goes back to client/server symmetry. And why > does GraphSON arbitrarily stop at Int32,64? Why are 1, 8, 16 extended > types? Let's just do away with memory size. Besides, all the cool kids are > doing it. What more do you need than that? > > Florian wrote: > > For Date, you say that it's just a long. Sure, but how does the receiver > know that the long should be deserialized to a Date in this case? As a user > I want to work with a Date object and not just with a long. Also, we > nevertheless need a convention of what this long represents: Milliseconds > since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since December > 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is > basically just this convention which is why I would keep it. > > I think you're conflating types with intent. How does the receiver know > what anything is really? That's an application detail. Aside from that, > your argument falls apart exactly for the reasons given. g:Date is not > well-defined. You are implicitly using Java's definition of Date. If we > take your example , what if you were also interacting with a Gremlin Server > written C++ 7.0? Now all your g:Dates are off by 71 years. > > From a Gremlin perspective, currently there is no first-class support for > Date , it's just a pass-through type. In order to do anything with it, you > would have to use an other-language lambda. I'm not against temporal > types, just IMO, this is not sufficiently supported in Gremlin to be a > supported type. So if the long-term goal for Gremlin is to get away from > lambdas and be language agnostic, then it needs the date operators and > functions in order to support a date type. > > Again, provide the use cases. Demonstrate why it's required. Provide the > operators/functions. > > > Stephen wrote: > > I just wonder > exactly how graphs that don't ' have schemas like neo4j/tinkergraph will > deal with someone sending a "Number". What happens in that case? > > Neo4j actually supports only two number types, integer and float, just as > proposed here. > https://neo4j.com/docs/developer-manual/current/drivers/cypher-values/ > > > Robert Dale > > > On Tue, May 29, 2018 at 12:45 PM, Stephen Mallette <[email protected]> > wrote: > > > > Regarding serialization and deserialization asymmetry on GLVs (for > Core > > and Extended types), I think we should avoid it as it could lead to > > obscure error > > messages on the user side. > > > > In the past, I think TinkerPop (going back to 2.x) has been ok with it > and > > I'm not so sure that I recall any specific problems that were every > voiced > > by users on the subject. As it stands, I think we already have some > > asymmetry in gremlin-python so there is some precedent for it. There are > > also some cases where it logically makes some sense to safely have one > but > > not the other. A GLV likely doesn't need to support a Bytecode > deserializer > > because it doesn't receive bytecode from the server. It only needs to > send > > bytecode and thus only has a serializer - at least until we have GVMs > > instead of GLVs :) Does that change your thinking at all Jorge? > > > > > First would be: Gremlin should not concern itself with storage > > schemas..... > > > > I like all of Robert's first paragraph because it makes Jorge's binary > > format proposal that much easier to get right. JanusGraph, DSE Graph and > > others won't have any trouble with this approach because the backend will > > simply know that the particular property that this number is going into > > will be a float and will coerce it as such on storage. I just wonder > > exactly how graphs that don't ' have schemas like neo4j/tinkergraph will > > deal with someone sending a "Number". What happens in that case? > > > > On Mon, May 28, 2018 at 4:20 AM, Florian Hockmann < > [email protected]> > > wrote: > > > > > > these should be dropped: Class (unless this is used for something > > > important? Too many results on 'Class' > > > in the codebase. > > > > > > 'Class' is for example used for 'withoutStrategies' but I agree that > this > > > would probably better handled just as a string. 'Class' is > Java-specific > > > which doesn't make much sense when graph providers want to implement > > > TinkerPop in another language than Java. > > > > > > Apart from that, I'm not sure I get your reasoning behind dropping > types > > > like Date, Int32, and float. It's really trivial in most languages to > add > > > serializers for more numerical types so I don't really see why we > should > > > drop them when they make the storage more efficient and reduce the need > > for > > > type castings in user code. > > > For Date, you say that it's just a long. Sure, but how does the > receiver > > > know that the long should be deserialized to a Date in this case? As a > > user > > > I want to work with a Date object and not just with a long. Also, we > > > nevertheless need a convention of what this long represents: > Milliseconds > > > since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since > December > > > 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is > > > basically just this convention which is why I would keep it. > > > > > > > There should be a boolean (which seems to be completely missing??). > > > > > > Yeah, boolean and string are both just serialized without type > > information > > > right now. Maybe we want to change that if we ever introduce GraphSON > 4. > > > > > > > > > Jorge's suggestion to drop all extended types except for the five he > > > listed sounds like a good idea to me. I would only add dropping of > either > > > Timestamp or Date from Core and probably also Class, like Robert > > suggested. > > > > > > [1] https://en.wikipedia.org/wiki/Epoch_%28reference_date%29# > > > Notable_epoch_dates_in_computing > > > > > > -----Ursprüngliche Nachricht----- > > > Von: Robert Dale <[email protected]> > > > Gesendet: Freitag, 25. Mai 2018 15:43 > > > An: [email protected] > > > Betreff: Re: [DISCUSS] Handling of problematic GraphSON types > > > > > > There should be a guiding principle on this to make these decisions > > > clearer. First would be: Gremlin should not concern itself with > storage > > > schemas. As an extension of that, Gremlin should not concern itself > with > > > storage size. Next would be: Gremlin should not be Java-specific. > > Finally, > > > it should be hard to add a new type, i.e. it's demonstratively > difficult > > to > > > do a real world traversal without this type, how GLVs would map it, > what > > > functions on that type should be a part of Gremlin, and n>1 people > > > positively affirm this direction. > > > > > > Thus, there should be a minimal Core on which most else can be built. > > All > > > extended types should be dropped. From Core, these should be dropped: > > Class > > > (unless this is used for something important? Too many results on > 'Class' > > > in the codebase. Otherwise, it's just a string), Date (is a long), > > > Timestamp (is a long, what's the diff to Date anyway?). There should > be > > > one floating point type which is 64-bit. There should be one integer > type > > > which is 64-bit. There should be a boolean (which seems to be > completely > > > missing??). > > > > > > > > > Robert Dale > > > > > > On Fri, May 25, 2018 at 3:37 AM, Jorge Bay Gondra < > > > [email protected]> > > > wrote: > > > > > > > Thanks Florian for starting the discussion on this topic! > > > > > > > > I think its a good exercise to evaluate which types are necessary for > > > > a GLV to support. > > > > > > > > I went through a similar exercise when designing the binary > > > > serialization format. I'll go ahead and propose: > > > > All types that are considered "Core", "Graph Structure" and "Graph > > > Process" > > > > in GraphSON3 > > > > <http://tinkerpop.apache.org/docs/current/dev/io/#_core_2> > > > > plus the following from the "Extended" list: > > > > - Short > > > > - Byte > > > > - ByteBuffer > > > > - BigInteger > > > > - BigDecimal > > > > > > > > The rationale is to select types that *can't be represented and > > > > stored* using other types. > > > > For example: > > > > - Short can be stored using an int backing field, but it would take > > > > twice the space. > > > > - BigDecimal can be stored using a ByteBuffer but ordering on a > buffer > > > > doesn't align with decimal ordering. > > > > > > > > Regarding serialization and deserialization asymmetry on GLVs (for > > > > Core and Extended types), I think we should avoid it as it could lead > > > > to obscure error messages on the user side. > > > > > > > > I think we should provide a comprehensive type representation but it > > > > doesn't have to be contain any type imaginable. The Gremlin Server > and > > > > the GLVs provide extension mechanisms that vendors and users can use > > > > to support other types. > > > > > > > > 2018-05-24 14:31 GMT+02:00 Florian Hockmann <[email protected] > >: > > > > > > > > > As part of the discussion for the pull request by Daniel C. Weber > > > > > that > > > > adds > > > > > support for more extended GraphSON types to Gremlin.Net [1] we > > > > > identified several of those types to be problematic for non-Java > > > > > languages (or at least for .NET in this case) as they don't really > > > > > have counterparts in other languages and for some it was even > > > > > difficult to say where they differ > > > > from > > > > > each other. > > > > > > > > > > > > > > > > > > > > Now the question is basically what we want to do with those > > > > > problematic types. > > > > > > > > > > > > > > > > > > > > My suggestion would be an approach like this: > > > > > > > > > > 1. Identify types that are problematic and that we therefore > > don't > > > > > want > > > > > to support across all GLVs. > > > > > 2. Communicate to users somehow which types are problematic > > > > (something > > > > > like a deprecation) as we won't support them in all GLVs and maybe > > > > > even stop supporting them at all at some point in the future. > > > > > 3. Support the remaining types in all GLVs. > > > > > > > > > > > > > > > > > > > > Does that sound like a good plan? Are there any good ideas for the > > > > > deprecation of those problematic types? My first idea would be to > > > > > put > > > > them > > > > > in a different section in the I/O docs [2] that explains at the > > > > > beginning that and why they are deprecated, but maybe someone here > > > > > has a better > > > > idea. > > > > > > > > > > > > > > > > > > > > Another question that was brought up during the review of the > > > > > mentioned > > > > PR > > > > > by Jorge was whether types should only be supported symmetrically > or > > > > > whether GLVs should try to support types as good as they can. If > > > > > someone has good arguments or a strong opinion for either side then > > > > > it would of course > > > > also > > > > > be good to hear them. > > > > > > > > > > To give a concrete example of what is meant by symmetric support: > > > > > > > > > > In its current form the PR deserializes both GraphSON types > > > > > gx:Duration > > > > and > > > > > gx:Period to the .NET type TimeSpan and it serializes TimeSpan back > > > > > to gx:Duration. This means that gx:Duration is supported > > > > > symmetrically, but gx:Period is not as there exists no .NET > > > > > serializer that create a gx:Period. > > > > > > > > > > > > > > > > > > > > [1] https://github.com/apache/tinkerpop/pull/842 > > > > > > > > > > [2] http://tinkerpop.apache.org/docs/current/dev/io/#_extended_2 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
