[
https://issues.apache.org/jira/browse/TINKERPOP-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen Mallette closed TINKERPOP-1346.
---------------------------------------
Resolution: Won't Do
Adding a reference to this DISCUSS thread:
https://lists.apache.org/thread.html/rc68d0bf3d6530f14d328fc5f2d5ec141a7e50aac67b2920743612526%40%3Cdev.tinkerpop.apache.org%3E
which basically puts aside the idea of doing Gryo 4.0 as we no longer use it
for network serialization. I suppose that this issue is more about a different
type of usage, but to avoid confusion for now I'm going to close this issue.
> Gryo 4.0
> --------
>
> Key: TINKERPOP-1346
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1346
> Project: TinkerPop
> Issue Type: Improvement
> Components: io, structure
> Affects Versions: 3.2.0-incubating
> Reporter: Marko A. Rodriguez
> Priority: Major
> Labels: breaking
>
> *Reference*
> Right now, to send a {{ReferenceEdge}} message, we serialize the form as:
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassObject[Edge ID] +
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID] +
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID]
> {code}
> Assuming {{Long}} Element ids, the math says:
> {code:java}
> 48 bytes = 4 bytes + (4 bytes + 8 bytes [long]) + 4 bytes + (4 bytes + 8
> bytes [long]) + 4 bytes + (4 bytes + 8 bytes [long])
> {code}
> We could get this smaller by not relying on Kryo's {{FieldSerializer}}.
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassInteger[VertexIDClass] +
> KryoClassObject[Edge ID] + KryoObject[Vertex ID] + KryoObject[Vertex ID]
> {code}
> The math says:
> {code:java}
> 36 bytes = 4 bytes + 4 bytes + (4 bytes + 8 bytes [long]) + 8 bytes [long] +
> 8 bytes [long]
> {code}
> Similar techniques would apply to {{ReferenceVertexProperty}} and
> {{ReferenceProperty}}.
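The two {{ReferenceEdge}} layouts above can be sketched with plain byte buffers to check the arithmetic. This is only an illustration under the issue's own accounting (4 bytes per class id, 8 bytes per long id); Kryo's actual wire encoding may differ. The class name, the numeric class ids, and the method names are hypothetical and are not part of Gryo.

```java
import java.nio.ByteBuffer;

final class ReferenceEdgeLayoutSketch {
    // Hypothetical class registration ids; real values would come from the serializer's registry.
    static final int REFERENCE_EDGE = 1;
    static final int REFERENCE_VERTEX = 2;
    static final int LONG_CLASS = 3;

    /** Current layout: each vertex is wrapped in its own class id and every id carries its class. */
    static byte[] currentLayout(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS).putLong(edgeId);       // class + edge id
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(outVertexId);  // class + vertex id
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(inVertexId);   // class + vertex id
        return toBytes(buf);
    }

    /** Proposed layout: the vertex id class is written once, then the bare vertex ids follow. */
    static byte[] proposedLayout(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS);                       // vertex id class, shared by both vertices
        buf.putInt(LONG_CLASS).putLong(edgeId);       // class + edge id
        buf.putLong(outVertexId);                     // bare vertex id
        buf.putLong(inVertexId);                      // bare vertex id
        return toBytes(buf);
    }

    // Copies only the bytes that were actually written.
    private static byte[] toBytes(ByteBuffer buf) {
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```

Comparing the lengths of the two returned arrays for the same three ids reproduces the 48-byte and 36-byte totals worked out above.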
> *StarGraph*
> Right now we serialize first the vertex, then its edges, then its properties.
> We should do vertex, properties, edges. Why? If we know that the vertex is to
> be filtered (which is an analysis of its label/id/properties), then we can
> skip over analyzing its edges. Right now, we may do all this work
> deserializing edges only to realize that the GraphFilter says that the vertex
> is filtered. Dah, pointless clock cycles – especially when edge sets can be
> massive.
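A minimal sketch of why the vertex-first ordering helps, assuming a simplified record of id, label, then adjacent vertex ids. Vertex properties are left out for brevity and would sit between the label and the edges. The class, the record layout, and the label-only filter are hypothetical stand-ins for {{StarGraph}} and {{GraphFilter}}, not their real APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class VertexFirstLayoutSketch {
    /** Hypothetical decoded form of one adjacency-list row. */
    static final class Star {
        final long id;
        final String label;
        final List<Long> adjacentIds;

        Star(long id, String label, List<Long> adjacentIds) {
            this.id = id;
            this.label = label;
            this.adjacentIds = adjacentIds;
        }
    }

    // Counts how many edges were actually decoded, to make the saving visible.
    static int edgesDecoded = 0;

    /** Writes vertex data first and the (possibly large) edge list last. */
    static byte[] write(long id, String label, long[] adjacentIds) {
        byte[] labelBytes = label.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + labelBytes.length + 4 + 8 * adjacentIds.length);
        buf.putLong(id);
        buf.putInt(labelBytes.length);
        buf.put(labelBytes);
        buf.putInt(adjacentIds.length);
        for (long adjacent : adjacentIds) {
            buf.putLong(adjacent);
        }
        return buf.array();
    }

    /** Reads the vertex part, applies the filter, and only then touches the edges. */
    static Optional<Star> read(byte[] bytes, Predicate<String> labelFilter) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long id = buf.getLong();
        byte[] labelBytes = new byte[buf.getInt()];
        buf.get(labelBytes);
        String label = new String(labelBytes, StandardCharsets.UTF_8);
        if (!labelFilter.test(label)) {
            return Optional.empty(); // vertex is filtered: the edge list is never decoded
        }
        int edgeCount = buf.getInt();
        List<Long> adjacentIds = new ArrayList<>(edgeCount);
        for (int i = 0; i < edgeCount; i++) {
            adjacentIds.add(buf.getLong());
            edgesDecoded++;
        }
        return Optional.of(new Star(id, label, adjacentIds));
    }
}
```

With the edges written last, a rejected vertex costs only the id and label reads, however large its edge set is.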
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a
> vertex, its properties, its incident edges, and their properties. In essence,
> one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}.
> This is bad because we have to write the class with each id. It would be
> better if the {{StarGraph}} had metadata like {{vertexIdClass}},
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Now for every vertex we are
> serializing three classes, but the benefit is that every id class is now known
> and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id,
> otherVertexId]*]*}} and {{[ propertyKey[ vertexProperty[
> id,propertyValue]*]*}}, respectively. This ensures we don't write so many
> strings as all edges/vertex properties are grouped by label. However, we do
> NOT do this for edge properties nor vertex property properties. We simply
> write out the {{Map<Object,Map<String,Object>>}} which is
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose
> between grouping by edgeId or by propertyKey, we should keep it as it is, but
> create a "meta map" that allows us to represent all property keys in a, e.g.,
> {{int}} space. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}
> where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized
> with the {{StarGraph}}.
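The "meta map" from item 2 could look roughly like the following dictionary encoding. It is a sketch only: the class and method names are hypothetical, and it works on in-memory maps rather than on the actual Gryo serializers.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class PropertyKeyDictionarySketch {
    // Map<PropertyKeyIntegerId,String>: written once alongside the StarGraph.
    final Map<Integer, String> keyById = new LinkedHashMap<>();
    // Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>: still grouped by edge id.
    final Map<Object, Map<Integer, Object>> encoded = new LinkedHashMap<>();

    /** Replaces each property key string with a small integer id. */
    static PropertyKeyDictionarySketch encode(Map<Object, Map<String, Object>> edgeProperties) {
        PropertyKeyDictionarySketch result = new PropertyKeyDictionarySketch();
        Map<String, Integer> idByKey = new HashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> compact = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                Integer keyId = idByKey.get(property.getKey());
                if (keyId == null) {
                    // First time this key is seen: assign the next integer id.
                    keyId = idByKey.size();
                    idByKey.put(property.getKey(), keyId);
                    result.keyById.put(keyId, property.getKey());
                }
                compact.put(keyId, property.getValue());
            }
            result.encoded.put(edge.getKey(), compact);
        }
        return result;
    }

    /** Restores the original Map<EdgeId,Map<PropertyKey,PropertyValue>>. */
    Map<Object, Map<String, Object>> decode() {
        Map<Object, Map<String, Object>> out = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<Integer, Object>> edge : encoded.entrySet()) {
            Map<String, Object> properties = new LinkedHashMap<>();
            for (Map.Entry<Integer, Object> property : edge.getValue().entrySet()) {
                properties.put(keyById.get(property.getKey()), property.getValue());
            }
            out.put(edge.getKey(), properties);
        }
        return out;
    }
}
```

Each distinct key string is then stored once in {{keyById}}, while the per-edge maps carry only integers as keys.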
> StarGraph also has a Long identifier - This makes no sense as then each
> StarGraph in the full Graph will have similar ids! Moreover, what is
> referencing what when the adjacent vertices are just arbitrary long ids?!! We
> should require that StarGraph get provided ids for vertices (and perhaps
> edges)... We ensure no inconsistencies and we save 64-bits.
>