[
https://issues.apache.org/jira/browse/TINKERPOP-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670284#comment-16670284
]
stephen mallette commented on TINKERPOP-1346:
---------------------------------------------
I wonder if we can stop thinking about a potential Gryo 4.0, given the
current discussions around TINKERPOP-1942.
> Gryo 4.0
> --------
>
> Key: TINKERPOP-1346
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1346
> Project: TinkerPop
> Issue Type: Improvement
> Components: io, structure
> Affects Versions: 3.2.0-incubating
> Reporter: Marko A. Rodriguez
> Priority: Major
> Labels: breaking
>
> *Reference*
> Right now, to send a {{ReferenceEdge}} message, we serialize the form as:
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassObject[Edge ID] +
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID] +
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID]
> {code}
> Assuming {{Long}} Element ids, the math says:
> {code:java}
> 48 bytes = 4 bytes + (4 bytes + 8 bytes [long]) + 4 bytes + (4 bytes + 8
> bytes [long]) + 4 bytes + (4 bytes + 8 bytes [long])
> {code}
> We could get this smaller by not relying on Kryo's {{FieldSerializer}}.
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassInteger[VertexIDClass] +
> KryoClassObject[Edge ID] + KryoObject[Vertex ID] + KryoObject[Vertex ID]
> {code}
> The math says:
> {code:java}
> 36 bytes = 4 bytes + 4 bytes + (4 bytes + 8 bytes [long]) + 8 bytes [long] +
> 8 bytes [long]
> {code}
> Similar techniques would apply to {{ReferenceVertexProperty}} and
> {{ReferenceProperty}}.
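
The byte arithmetic above can be checked with a small sketch. This is not Gryo or Kryo code: a plain java.nio.ByteBuffer stands in for the Kryo output, every class tag is modeled as a fixed 4-byte int (as the issue's math assumes; real Kryo encodings may be more compact), and the class name and tag constants are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration: models the two ReferenceEdge layouts from the
// issue with fixed-width fields. It does not use Kryo or TinkerPop classes.
class ReferenceEdgeWireSize {
    // Made-up class tags; real Kryo registration ids would differ.
    static final int REFERENCE_EDGE = 1;
    static final int REFERENCE_VERTEX = 2;
    static final int LONG_ID = 3;

    // Current layout: every element and every id carries its own class tag.
    static byte[] writeCurrent(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_ID).putLong(edgeId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_ID).putLong(outVertexId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_ID).putLong(inVertexId);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Proposed layout: the vertex id class is written once, so the two
    // vertex ids follow as bare 8-byte longs.
    static byte[] writeProposed(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_ID);                 // vertex id class, written once
        buf.putInt(LONG_ID).putLong(edgeId); // edge id still carries its class tag
        buf.putLong(outVertexId);
        buf.putLong(inVertexId);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        System.out.println(writeCurrent(1L, 2L, 3L).length);  // prints 48
        System.out.println(writeProposed(1L, 2L, 3L).length); // prints 36
    }
}
```

Under these assumptions the saving is 12 bytes per edge, which comes from dropping two element class tags and one id class tag.
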
> *StarGraph*
> Right now we serialize first the vertex, then its edges, then its properties.
> We should do vertex, properties, edges. Why? If we know that the vertex is to
> be filtered (which is an analysis of its label/id/properties), then we can
> skip over analyzing its edges. Right now, we may do all this work
> deserializing edges only to realize that the GraphFilter says that the vertex
> is filtered. Dah, pointless clock cycles – especially when edge sets can be
> massive.
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a
> vertex, its properties, its incident edges, and their properties. In essence,
> one "row of an adjacency list."
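
A minimal sketch of why the ordering matters, assuming a made-up layout in which everything the filter needs comes before the edges. The class name, the layout, and the use of the vertex label as the only filter input are all assumptions for illustration; this is not the Gryo format, and skipping works here only because the hypothetical edges are fixed-width.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

// Hypothetical layout: vertex id, label, edge count, then one 8-byte
// adjacent-vertex id per edge. Not the real StarGraph serialization.
class StarVertexOrderSketch {
    // Counts decoded edges so the saved work is observable.
    static int edgesDecoded = 0;

    static ByteBuffer write(long vertexId, String label, long[] otherVertexIds) {
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                8 + 4 + labelBytes.length + 4 + 8 * otherVertexIds.length);
        buf.putLong(vertexId);
        buf.putInt(labelBytes.length);
        buf.put(labelBytes);
        buf.putInt(otherVertexIds.length);
        for (long id : otherVertexIds) {
            buf.putLong(id);
        }
        buf.flip();
        return buf;
    }

    // Returns the adjacent vertex ids, or null when the vertex is filtered.
    static long[] read(ByteBuffer buf, Predicate<String> keepLabel) {
        buf.getLong(); // vertex id, unused in this sketch
        byte[] labelBytes = new byte[buf.getInt()];
        buf.get(labelBytes);
        String label = new String(labelBytes, StandardCharsets.UTF_8);
        int edgeCount = buf.getInt();
        if (!keepLabel.test(label)) {
            // The filter decision is already known, so the edge block is
            // skipped without decoding a single edge.
            buf.position(buf.position() + 8 * edgeCount);
            return null;
        }
        long[] others = new long[edgeCount];
        for (int i = 0; i < edgeCount; i++) {
            others[i] = buf.getLong();
            edgesDecoded++;
        }
        return others;
    }
}
```

With variable-width edges, a reader would need something like a length prefix on the edge block to skip it; that would be an additional design choice not stated in the issue.
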
> Here are some ideas on how to make the next version of the serialization
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}.
> This is bad because we have to write the class with each id. It would be
> better if the {{StarGraph}} had metadata like {{vertexIdClass}},
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Now for every vertex we are
> serializing three classes, but the benefit is that every id class is now known
> and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id,
> otherVertexId]*]*}} and {{[ propertyKey[ vertexProperty[
> id,propertyValue]*]*}}, respectively. This ensures we don't write so many
> strings as all edges/vertex properties are grouped by label. However, we do
> NOT do this for edge properties nor vertex property properties. We simply
> write out the {{Map<Object,Map<String,Object>>}} which is
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose
> between grouping by edgeId or by propertyKey, we should keep it as it is, but
> create a "meta map" that allows us to represent all property keys in a, e.g.,
> {{int}} space. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}
> where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized
> with the {{StarGraph}}.
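
The "meta map" from idea 2 could look roughly like the following. The class and method names are hypothetical and nothing here is existing TinkerPop API; the sketch only shows property keys being replaced by small ints, plus the int-to-key table that would have to be serialized alongside the StarGraph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assigns each distinct property key a small int id.
class PropertyKeyTable {
    private final Map<String, Integer> idsByKey = new HashMap<>();
    private final List<String> keysById = new ArrayList<>();

    // Returns the existing id for a key, or assigns the next free one.
    int intern(String key) {
        Integer existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int id = keysById.size();
        idsByKey.put(key, id);
        keysById.add(key);
        return id;
    }

    // Reverse lookup, i.e. the Map<PropertyKeyIntegerId,String> side.
    String keyOf(int id) {
        return keysById.get(id);
    }

    int size() {
        return keysById.size();
    }

    // Rewrites Map<EdgeId,Map<PropertyKey,PropertyValue>> into
    // Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>.
    Map<Object, Map<Integer, Object>> encode(Map<Object, Map<String, Object>> edgeProperties) {
        Map<Object, Map<Integer, Object>> encoded = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> properties = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                properties.put(intern(property.getKey()), property.getValue());
            }
            encoded.put(edge.getKey(), properties);
        }
        return encoded;
    }
}
```

Each key string is then written once in the table rather than once per edge, while the grouping by edge id stays as it is.
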
> StarGraph also has a Long identifier. This makes no sense, as then each
> StarGraph in the full Graph will have similar ids! Moreover, what is
> referencing what when the adjacent vertices are just arbitrary long ids?!! We
> should require that StarGraph be provided ids for vertices (and perhaps
> edges)... That way we ensure no inconsistencies and we save 64 bits.
>