[ 
https://issues.apache.org/jira/browse/TINKERPOP-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670284#comment-16670284
 ] 

stephen mallette commented on TINKERPOP-1346:
---------------------------------------------

I wonder if we can stop thinking about the potential for a Gryo 4.0, given the 
current discussions around TINKERPOP-1942. 

> Gryo 4.0
> --------
>
>                 Key: TINKERPOP-1346
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1346
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io, structure
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>            Priority: Major
>              Labels: breaking
>
> *Reference*
> Right now, to send a {{ReferenceEdge}} message, we serialize the form as:
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassObject[Edge ID] + 
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID] + 
> KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID]
> {code}
> Assuming {{Long}} Element ids, the math says:
> {code:java}
> 48 bytes = 4 bytes + (4 bytes + 8 bytes [long])
>          + 4 bytes + (4 bytes + 8 bytes [long])
>          + 4 bytes + (4 bytes + 8 bytes [long])
> {code}
> We could get this smaller by not relying on Kryo's {{FieldSerializer}}.
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassInteger[VertexIDClass] + 
> KryoClassObject[Edge ID] + KryoObject[Vertex ID] + KryoObject[Vertex ID]
> {code}
> The math says:
> {code:java}
> 36 bytes = 4 bytes + 4 bytes + (4 bytes + 8 bytes [long])
>          + 8 bytes [long] + 8 bytes [long]
> {code}
> Similar techniques would apply to {{ReferenceVertexProperty}} and 
> {{ReferenceProperty}}.
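
The two byte counts above can be reproduced with a minimal sketch. It assumes, as the ticket's arithmetic does, that a class marker costs a fixed 4 bytes and a {{Long}} id costs 8 bytes. It uses a plain `ByteBuffer` rather than Kryo, so it mirrors the arithmetic and not Kryo's actual wire format; the class name and the three class-id constants are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ReferenceEdgeSizeSketch {
    // Hypothetical class ids; real Kryo registration ids would differ.
    static final int REFERENCE_EDGE = 1;
    static final int REFERENCE_VERTEX = 2;
    static final int LONG_CLASS = 3;

    // Current layout: every object, including each id, is preceded by its class.
    static byte[] currentLayout(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS).putLong(edgeId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(outVertexId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(inVertexId);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Proposed layout: the vertex id class is written once, then bare vertex ids.
    static byte[] proposedLayout(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS);                 // vertex id class, written once
        buf.putInt(LONG_CLASS).putLong(edgeId); // edge id still carries its class
        buf.putLong(outVertexId);
        buf.putLong(inVertexId);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        System.out.println("current:  " + currentLayout(10L, 1L, 2L).length + " bytes");
        System.out.println("proposed: " + proposedLayout(10L, 1L, 2L).length + " bytes");
    }
}
```

Under these assumptions the saving is 12 bytes per edge, which comes from dropping the two {{ReferenceVertex}} markers and one of the two per-vertex id class markers.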
> *StarGraph*
> Right now we serialize first the vertex, then its edges, then its properties. 
> We should do vertex, properties, edges. Why? If we know that the vertex is to 
> be filtered (which is an analysis of its label/id/properties), then we can 
> skip over analyzing its edges. Right now, we may do all the work of 
> deserializing the edges only to find that the GraphFilter filters the vertex 
> out. Those are pointless clock cycles, especially when edge sets can be 
> massive.
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a 
> vertex, its properties, its incident edges, and their properties. In essence, 
> one "row of an adjacency list."
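
The benefit of writing properties before edges can be illustrated with a toy record format. This is only a sketch: the layout (one vertex id, a single int property standing in for all vertex properties, then the edge block), the class name, and the predicate standing in for the GraphFilter are all assumptions, not the Gryo format.

```java
import java.nio.ByteBuffer;
import java.util.function.IntPredicate;

public class StarRecordOrderSketch {
    // Hypothetical record layout: vertexId (long), one int property ("age"),
    // edgeCount (int), then edgeCount pairs of (edgeId, otherVertexId).
    static ByteBuffer write(long vertexId, int age, long[][] edges) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + 4 + edges.length * 16);
        buf.putLong(vertexId).putInt(age).putInt(edges.length);
        for (long[] edge : edges) {
            buf.putLong(edge[0]).putLong(edge[1]);
        }
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Returns the number of edges decoded; 0 when the vertex is filtered out.
    static int read(ByteBuffer record, IntPredicate keepVertex) {
        long vertexId = record.getLong();
        int age = record.getInt();
        if (!keepVertex.test(age)) {
            return 0; // properties came first, so the edge block is never touched
        }
        int edgeCount = record.getInt();
        for (int i = 0; i < edgeCount; i++) {
            long edgeId = record.getLong();
            long otherVertexId = record.getLong();
            // A real reader would attach the edge to the star vertex here.
        }
        return edgeCount;
    }

    public static void main(String[] args) {
        long[][] edges = {{100L, 2L}, {101L, 3L}};
        System.out.println(read(write(1L, 17, edges), age -> age >= 18));
        System.out.println(read(write(1L, 30, edges), age -> age >= 18));
    }
}
```

With the edge block last, a filtered vertex costs only the header and property reads, however many edges follow.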
> Here are some ideas on how to make the next version of the serialization 
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. 
> This is wasteful because we have to write the class with each id. It would be 
> better if the {{StarGraph}} had metadata like {{vertexIdClass}}, 
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Every vertex would then 
> serialize three classes, but in exchange every id class is known up front 
> and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, 
> otherVertexId]*]*}} and {{[ propertyKey[ vertexProperty[ 
> id,propertyValue]*]*}}, respectively. This ensures we don't write so many 
> strings as all edges/vertex properties are grouped by label. However, we do 
> NOT do this for edge properties nor vertex property properties. We simply 
> write out the {{Map<Object,Map<String,Object>>}} which is 
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose 
> between grouping by edgeId or by propertyKey, we should keep it as it is, but 
> add a "meta map" that represents each property key in a compact space such as 
> {{int}}. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}, 
> together with a {{Map<PropertyKeyIntegerId,String>}} that is serialized 
> with the {{StarGraph}}.
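
The "meta map" idea in item 2 can be sketched with plain Java collections. The class and method names below are hypothetical, and the sketch only shows the re-keying step, not how the maps would be written with Kryo. It builds a key-to-int dictionary; the {{Map<PropertyKeyIntegerId,String>}} stored with the {{StarGraph}} would be its inverse.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyKeyDictionarySketch {
    // Assigns each distinct property key a small int id, in first-seen order.
    static Map<String, Integer> buildKeyIds(Map<Object, Map<String, Object>> edgeProperties) {
        Map<String, Integer> keyIds = new LinkedHashMap<>();
        for (Map<String, Object> properties : edgeProperties.values()) {
            for (String key : properties.keySet()) {
                keyIds.putIfAbsent(key, keyIds.size());
            }
        }
        return keyIds;
    }

    // Re-keys every edge's property map from String keys to int ids.
    static Map<Object, Map<Integer, Object>> encode(
            Map<Object, Map<String, Object>> edgeProperties, Map<String, Integer> keyIds) {
        Map<Object, Map<Integer, Object>> encoded = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> properties = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                properties.put(keyIds.get(property.getKey()), property.getValue());
            }
            encoded.put(edge.getKey(), properties);
        }
        return encoded;
    }

    public static void main(String[] args) {
        Map<Object, Map<String, Object>> edgeProperties = new LinkedHashMap<>();
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("weight", 0.5);
        first.put("since", 2010);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("weight", 1.0);
        edgeProperties.put(1L, first);
        edgeProperties.put(2L, second);

        Map<String, Integer> keyIds = buildKeyIds(edgeProperties);
        System.out.println(keyIds);
        System.out.println(encode(edgeProperties, keyIds));
    }
}
```

Each property key string is then written once per {{StarGraph}} instead of once per edge that carries it, at the cost of one extra small map.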
> StarGraph also has a Long identifier. This makes no sense, as each 
> StarGraph in the full Graph will then have similar ids! Moreover, what is 
> referencing what when the adjacent vertices are just arbitrary long ids? We 
> should require that StarGraph be provided ids for vertices (and perhaps 
> edges). That way we ensure no inconsistencies and we save 64 bits.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
