Thanks Ken! Given consensus on this discussion, I've began implementing some of the changes in Java and Python to align the GraphBinary serializers with the spec, starting with removing old serializers and updating DateTime: https://github.com/apache/tinkerpop/pull/2776. Reviews and comments are welcome.
Cheers, Yang On 2024/08/28 04:01:51 Ken Hu wrote: > Here are some other updates that I think should be made for GraphBinaryV4. > > There are steps that can return a distinct ordering in Maps like > order(local). However, these aren't necessarily deserialized in GLVs as > Map-types that preserve this ordering. I think that we should add a > value_flag to the Map type in GraphBinary that signals that the > deserialized version of this Map should maintain the ordering. Similar to > the BulkSet change, we can use a value_flag with the value 0x02 to denote > ordering. The value itself would remain unchanged, but the order of the > items in that list should be the insertion/iteration order of the > deserialized Map. > > All values that the Date type can represent are covered by OffsetDateTime > so I think it makes sense to remove even the Date type. Date is also > outdated in Java and its use isn't recommended. > > The current GraphBinary standard states that properties for elements like > Vertex, Edge and VertexProperty should be set to null. The reasoning for > this was that properties were never returned in remote traversals. However, > this changed starting in 3.7, and properties can now be returned. > Properties should instead be a List of Properties and if there are no > properties then an empty list should be returned. > > Labels for elements are currently serialized as String for both GraphBinary > and GraphSON. However, we may want to support multi-labels or no-label > scenarios in the future. To prevent needing a breaking change if this were > to happen, I propose that "label" become a List of String. > > In the new HTTP API, a ReponseMessage is returned per HTTP chunk and the > HTTP response could end with a trailer. The GLVs may need some way to > distinguish between chunks and trailers which could be accomplished using a > one byte buffer in between the last ResponseMessage and the trailing > header. The GLVs may need information to recreate a chunk as some HTTP > clients transparently dechunk the body. A 4-byte header that indicates the > size of ResponseMessage could be added to help GLVs recognize chunk > boundaries which simplifies deserialization. > > On Thu, Aug 22, 2024 at 2:34 PM Ken Hu <kenhu...@gmail.com> wrote: > > > Hi All, > > > > I've noticed that there are some serializers/deserializers that exist for > > GraphBinary and GraphSON that may not serve a real purpose and I'm > > proposing that we remove them in V4. In general, I think we should move > > away from providing serializers for all types in the Java Standard Library, > > as there are many types that other GLVs simply won't support. The types in > > GraphSONv3/GraphBinaryV3 that I would remove include: > > > > Barrier, Cardinality, Column, DT, Merge, Operator, Order, Pick, Pop, > > Scope, T > > > > Because the GLVs will now use GremlinLang (instead of Bytecode), these > > values will be translated to Strings so they don't need their own specific > > serializers. T and Direction are the two exceptions because they are > > returned in the valueMap() and elementMap() steps, although it's likely > > they will be replaced in the future as well. For similar reasons, P and > > TextP should also be removed since they will also be Strings in GremlinLang. > > > > Period, Timestamp, Instant, ZonedDateTime, OffsetTime, LocalDateTime, > > LocalDate, LocalTime, MonthDay, YearMonth, Year, ZoneOffset > > > > These are from the java.time package but are all variants of > > OffsetDateTime and Date. I don't think it makes sense to provide > > serializers for these Java-specific types that are mostly unsupported by > > the other GLVs and can be represented by either an OffsetDateTime and/or > > Date, so I think we should only have OffsetDateTime and Date. > > > > BulkSet and Traverser are two optimizations that exist to increase > > performance but that the user shouldn't be able to interact with. I'm > > suggesting that we remove these serializers, however, to keep the > > performance optimizations, I would suggest that we add a flag to the List > > type so that it can achieve the same sort of bulking that those types > > provided. GraphSON would simply serialize these as arrays without a bulk so > > it would be the expanded BulkSet. GraphBinary would have an additional > > value_flag for List with the value 0x02 to denote bulking. This would mean > > the item_i is a fully qualified item followed by a Long value that is the > > bulk. > > > > Class and InetAddress are highly Java-specific and can't be used by the > > other GLVs. Binding and Bytecode have both been removed so it doesn't make > > sense to keep their serializers around. Metrics, TraversalMetrics can be > > represented as a list of Strings so they shouldn't require their own > > serializers. > > > > The last type that I think should be removed is Custom. The most common > > complaint that I've seen from end-users regarding custom types is that they > > lack serializers in their GLV of choice. Custom serializers require extra > > effort from providers as it is they that need to maintain and publish them. > > Users often don't realize that they can't just use the GLV without adding > > the custom serializer from their provider which can lead to a frustrating > > initial experience. So, instead, to provide a more seamless user experience > > I'm proposing we replace Custom with a Provider Defined Type (PDT). > > Actually, two PDTs, one for primitive types and one for composite types. > > These PDTs will also serve as a mechanism to allow reading of values whose > > serializer has been removed in V4. This would allow a provider that > > supports one of those Java-specific types to still return the value as a > > PDT. > > > > The PrimitivePdt will contain two fields: a String for the type name and a > > String for the actual value. The CompositePdt will contain two fields: a > > String for the type name and a Map that contains the fields of the > > composite type (most likely a class). PDTs will come with default > > serializers and deserializers. On the user-side, the values will be > > deserialized into either PrimitivePdt or CompositePdt. In order to write a > > PDT to the server, the user would have to look up the required fields for > > that specific type name. This information would likely need to exist in the > > provider's documentation. On the server, there will be an API (probably > > attached to the Graph interface) that will allow the provider to add types > > that should be able to be serialized/deserialized from PDT. In order to > > achieve this, there will need to be a TinkerPopCustomType interface that > > has two methods: serialize() and deserialize(). Then, any type that > > implements TinkerPopCustomType could automatically be > > serialized/deserialized with the provided PDT serializers for both > > GraphSONv4 and GraphBinaryV4. > > > > Does anyone have any comments or concerns about these proposed changes? > > > > Thanks, > > Ken > > >