GitHub user robertdale commented on the issue:
https://github.com/apache/tinkerpop/pull/351
So I've caught up on the discussion and I'll offer some more food for
thought since I haven't seen any other ideas. Embedding metadata is neither
easy nor fun (not for me anyway). For any serious integration work it's
always best to have a well-defined schema up front.
On types:
> @spmallette
> In fact we don't always know the types ahead of time (like Titan's
> GeoPoint), so using the java class name is pretty convenient
Convenient, yes, but that is not a reason to tie the format to Java types.
By "not using Java types", we mean:
- not using Java package names
- not using types specific to Java
- using primitives and other common types that are concise and portable
- including domain-specific types, e.g. Vertex, Edge, etc.
- optionally including other standards, e.g. GeoJSON
Specifications that define primitives and common types:
- http://swagger.io/specification/#dataTypeFormat
- http://bsonspec.org/spec.html
- http://geojson.org/geojson-spec.html
- http://ubjson.org/type-reference/
So if your Java implementation conveniently shares the same name as the
type, then that's wonderful. But if you are to be truly language-agnostic, then
at some point the types must be known ahead of time in order to be consumed.
For instance, how can my X parser know how to handle a Titan GeoPoint if it's
all dynamic? It can't. It must be able to handle this type ahead of time. And
I can't imagine anyone would want to read a GraphSON file by hand to discover
all the types that must be handled. Maybe I'm getting out of scope, as this goes
beyond being language-agnostic and into being database-agnostic. @newkek, please
correct me if I'm wrong, but it doesn't look like the code does any dynamic
serializing. It looks like all types are registered anyway. So I'll argue again:
if you know your types ahead of time, then you may as well have a schema.
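To make the "known ahead of time" point concrete, here is a minimal Python sketch of a reader that only accepts type names it has been told about. Everything in it (`DECODERS`, `register`, `decode`, `UnknownTypeError`) is a hypothetical name for illustration, not TinkerPop or Titan API.

```python
from uuid import UUID

# Hypothetical registry: portable type name -> function that builds the value.
# A consumer has to populate this before it can read a file.
DECODERS = {
    "int32": int,
    "int64": int,
    "uuid": UUID,
}


class UnknownTypeError(ValueError):
    """Raised when a document uses a type name the reader was never told about."""


def register(type_name, decoder):
    """Teach the reader a new type, e.g. a vendor-specific one."""
    DECODERS[type_name] = decoder


def decode(type_name, raw):
    """Convert a raw JSON value using the decoder registered for type_name."""
    try:
        decoder = DECODERS[type_name]
    except KeyError:
        raise UnknownTypeError(
            f"no decoder registered for type {type_name!r}"
        ) from None
    return decoder(raw)
```

With this, `decode("int64", 12345)` works out of the box, while `decode("GeoPoint", ...)` fails until someone calls `register("GeoPoint", ...)` with a decoder. The list of registered names is, in effect, the schema.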
But let's continue with embedded metadata...
In JSON, the only unambiguous types are:
- array (unless you want to distinguish it from a list, which may be perfectly valid)
- string
- boolean (true, false)
- null
To avoid confusion, all other types, including numbers, should be explicitly
typed. Thus they are objects (and not lists of things). The metadata can sit at
the same level as the object's own fields, which alleviates these concerns:
@newkek "a List in which the first element is a Map in which the first entry's
key" and @PommeVerte "can be a pain in systems that do not necessarily order
lists". Metadata can be differentiated from member fields by a prefix (e.g. '@').
Primitive types (or objects) having only a single value would have a "value"
key which maps to the actual value.
```json
[
  {
    "@type":"Vertex",
    "id":{
      "@type":"int64",
      "value":12345
    },
    "label":"person",
    "properties":{
      "@type":"VertexProperty",
      "skill":{
        "id":{
          "@type":"int64",
          "value":8723
        },
        "@type":"int32",
        "value":5
      },
      "secrets":[
        {
          "id":{
            "@type":"int64",
            "value":8723
          },
          "@type":"uuid",
          "value":"1de7bedf-f9ba-4e94-bde9-f1be28bef239"
        },
        {
          "id":{
            "@type":"int64",
            "value":8724
          },
          "@type":"uuid",
          "value":"34523adf-f9ba-4e94-bde9-f2345bcd3f45"
        }
      ]
    },
    "inE":[
      {
        "@type":"Edge",
        "label":"knows",
        "id":{
          "@type":"int64",
          "value":987234
        },
        "properties":{ },
        "outV":[ { } ]
      }
    ]
  }
]
```
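For what it's worth, consuming this layout does not depend on key order or list position. Below is a rough Python sketch, assuming the '@' prefix and "value" key described above; `SCALARS`, `split_meta`, and `unwrap` are hypothetical names, and the set of scalar types is just the three used in the example.

```python
from uuid import UUID

META_PREFIX = "@"  # metadata keys are told apart from member fields by this prefix

# Hypothetical converters for the single-value types used in the example.
SCALARS = {"int32": int, "int64": int, "uuid": UUID}


def split_meta(obj):
    """Split a JSON object into ('@'-prefixed metadata, ordinary member fields)."""
    meta = {k[len(META_PREFIX):]: v for k, v in obj.items()
            if k.startswith(META_PREFIX)}
    fields = {k: v for k, v in obj.items() if not k.startswith(META_PREFIX)}
    return meta, fields


def unwrap(node):
    """Recursively convert typed objects into native Python values."""
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    if not isinstance(node, dict):
        return node  # string, boolean, null: already unambiguous
    meta, fields = split_meta(node)
    type_name = meta.get("type")
    decoded = {k: unwrap(v) for k, v in fields.items()}
    if type_name in SCALARS and "value" in decoded:
        decoded["value"] = SCALARS[type_name](decoded["value"])
        if len(decoded) == 1:
            return decoded["value"]  # pure primitive: collapse to the bare value
    if type_name is not None:
        decoded["@type"] = type_name  # keep the type tag on composite objects
    return decoded
```

An object that carries nothing but "@type" and "value" collapses to the bare value, so the vertex "id" above becomes a plain integer. Objects with extra members (such as the property entries that also have an "id") keep their fields and their type tag, with "value" converted in place.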
I wouldn't concern myself with the additional payload size for metadata. I
wouldn't sacrifice clarity for size. One could always compress the file if
size is a concern. Also, the reader/writer could be easily enhanced to support
zip. I would take the pragmatic approach and address it when it's no longer
working for people.
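As a rough illustration of the compression point, here is a small Python sketch using the standard `gzip` and `json` modules. The helper names `dumps_gz` and `loads_gz` are made up for this example, and the actual savings will depend on the data.

```python
import gzip
import json


def dumps_gz(data):
    """Serialize to compact JSON and gzip the resulting bytes."""
    text = json.dumps(data, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def loads_gz(blob):
    """Reverse of dumps_gz: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Repeated keys such as "@type" and "value" are the kind of redundancy one would expect a general-purpose compressor to handle well.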
Anyway, maybe this is all GraphSON 3.0 stuffs. HTH.