[jira] [Updated] (TINKERPOP-1343) A more efficient StarGraph serialization representation.

Marko A. Rodriguez (JIRA) Fri, 17 Jun 2016 13:39:38 -0700

     [ 
https://issues.apache.org/jira/browse/TINKERPOP-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marko A. Rodriguez updated TINKERPOP-1343:
------------------------------------------
    Description: 
{{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a vertex, 
its properties, its incident edges, and their properties. In essence, one "row 
of an adjacency list."

Here are some ideas on how to make the next version of the serialization format 
more efficient.

1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. This 
is bad because we have to write the class with each id. It would be better if 
the {{StarGraph}} had metadata like {{vertexIdClass}}, 
{{vertexPropertyIdClass}}, and {{edgeIdClass}}. Every vertex would then 
serialize three classes, but the benefit is that every id class is known up 
front and we can use {{kryo.readObject(..., xxxIdClass)}}.

2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, 
otherVertexId]\*]\*}} and {{[ propertyKey[ vertexProperty[ 
id,propertyValue]\*]\*}}, respectively. This ensures we don't write so many 
strings as all edges/vertex properties are grouped by label. However, we do NOT 
do this for edge properties nor vertex property properties. We simply write out 
the {{Map<Object,Map<String,Object>>}} which is 
{{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose between 
grouping by edgeId or by propertyKey, we should keep it as it is, but create a 
"meta map" that allows us to represent all property keys in a, e.g., {{int}} 
space. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}} where we 
also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized with the 
{{StarGraph}}.
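
A minimal sketch of both ideas, to make the intended savings concrete. It is 
not the actual {{StarGraph}} serializer and it does not use Kryo: plain 
{{DataOutputStream}} stands in for the wire format, and writing the full class 
name with each id only illustrates the per-id class overhead (Kryo's real 
encoding of class information differs). The class 
{{StarGraphFormatSketch}} and its methods are hypothetical names, and ids are 
assumed to be {{Long}} for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of TinkerPop.
class StarGraphFormatSketch {

    // Idea 1, status quo: class information travels with every single id.
    static byte[] writeIdsWithClassPerId(List<Long> ids) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeUTF(id.getClass().getName()); // repeated per id
                out.writeLong(id);
            }
        }
        return bytes.toByteArray();
    }

    // Idea 1, proposed: the id class is written once as metadata,
    // then every id is written without any class information.
    static byte[] writeIdsWithClassHeader(List<Long> ids) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(Long.class.getName()); // e.g. the "edgeIdClass"
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeLong(id);
            }
        }
        return bytes.toByteArray();
    }

    // Idea 2: keep the grouping by edge id, but replace each String property
    // key with an int. keyTable is the "meta map": index i holds the key
    // whose integer id is i. New keys are appended to it as they are seen.
    static Map<Object, Map<Integer, Object>> internKeys(
            Map<Object, Map<String, Object>> edgeProperties,
            List<String> keyTable) {
        // Reverse lookup, seeded from whatever keyTable already contains.
        Map<String, Integer> keyToId = new HashMap<>();
        for (int i = 0; i < keyTable.size(); i++) {
            keyToId.put(keyTable.get(i), i);
        }
        Map<Object, Map<Integer, Object>> result = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> compact = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                Integer keyId = keyToId.get(property.getKey());
                if (keyId == null) {
                    keyId = keyTable.size();
                    keyToId.put(property.getKey(), keyId);
                    keyTable.add(property.getKey());
                }
                compact.put(keyId, property.getValue());
            }
            result.put(edge.getKey(), compact);
        }
        return result;
    }
}
```

With this layout each distinct property key string would be written once in 
the key table, no matter how many edges carry that key, and the per-edge maps 
would only hold small integers as keys.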

There are a few other tickets around optimizing {{StarGraph}} here:

https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}} 
more efficient)

https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and 
{{StarGraph}} should never auto-generate IDs as the ID space is distributed).

https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage and 
clock cycles -- not serialization).


  was:
{{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a vertex, 
its properties, its incident edges, and their properties. In essence, one "row 
of an adjacency list."

Here are some ideas on how to make the next version of the serialization format 
more efficient.

1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. This 
is bad because we have to write the class with each id. It would be better if 
the {{StarGraph}} had metadata like {{vertexIdClass}}, 
{{vertexPropertyIdClass}}, and {{edgeIdClass}}. Every vertex would then 
serialize three classes, but the benefit is that every id class is known up 
front and we can use {{kryo.readObject(..., xxxIdClass)}}.

2. Edges and VertexProperties are written out as 
{{[label[edge[id,otherVertexId]*]*}} and 
{{[label[vertexProperty[id,value]*]*}}, respectively. This ensures we don't 
write so many strings as all edges/vertex properties are grouped by label. 
However, we do NOT do this for edge properties nor vertex property properties. 
We simply write out the {{Map<Object,Map<String,Object>>}} which is 
{{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose between 
grouping by edgeId or by propertyKey, we should keep it as it is, but create a 
"meta map" that allows us to represent all property keys in a, e.g., {{int}} 
space. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}} where we 
also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized with the 
{{StarGraph}}.

There are a few other tickets around optimizing {{StarGraph}} here:

https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}} 
more efficient)

https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and 
{{StarGraph}} should never auto-generate IDs as the ID space is distributed).

https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage and 
clock cycles -- not serialization).



> A more efficient StarGraph serialization representation.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1343
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1343
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a 
> vertex, its properties, its incident edges, and their properties. In essence, 
> one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization 
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. 
> This is bad because we have to write the class with each id. It would be 
> better if the {{StarGraph}} had metadata like {{vertexIdClass}}, 
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Every vertex would then 
> serialize three classes, but the benefit is that every id class is known up 
> front and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, 
> otherVertexId]\*]\*}} and {{[ propertyKey[ vertexProperty[ 
> id,propertyValue]\*]\*}}, respectively. This ensures we don't write so many 
> strings as all edges/vertex properties are grouped by label. However, we do 
> NOT do this for edge properties nor vertex property properties. We simply 
> write out the {{Map<Object,Map<String,Object>>}} which is 
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose 
> between grouping by edgeId or by propertyKey, we should keep it as it is, but 
> create a "meta map" that allows us to represent all property keys in a, e.g., 
> {{int}} space. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}} 
> where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized 
> with the {{StarGraph}}.
> There are a few other tickets around optimizing {{StarGraph}} here:
> https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}} 
> more efficient)
> https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and 
> {{StarGraph}} should never auto-generate IDs as the ID space is distributed).
> https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage 
> and clock cycles -- not serialization).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
