[
https://issues.apache.org/jira/browse/TINKERPOP-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marko A. Rodriguez closed TINKERPOP-1250.
-----------------------------------------
Resolution: Duplicate
Assignee: Marko A. Rodriguez
This is being done in the {{GraphActors}} work in
ttps://issues.apache.org/jira/browse/TINKERPOP-1564.
This will support the concept of "partitions" which are subgraphs that workers
are responsible for.
> Support Subgraph-Centric GraphComputer
> --------------------------------------
>
> Key: TINKERPOP-1250
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1250
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.2.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Labels: breaking
>
> Right now, {{GraphComputer}} and {{VertexPrograms}} are "vertex-centric."
> That is, the boundary of an atomic unit of computation is a single vertex,
> its properties, and its incident edges. We should support "subgraph-centric"
> computing where each worker's partition is loaded in memory (RAM) as a
> connected subgraph. For those vertices that are not in its partition, a
> special "reference vertex" is use to reference it. What this would mean is
> that when a {{Traverser}} is processing, it can continue to evaluate as long
> as its within the subgraph partition. The moment it references a vertex not
> in the partition (a "reference vertex"), it serializes itself as a message.
> This would greatly increase the speed of Gremlin OLAP at the cost of
> requiring a large amount of memory to store all the worker partition
> subgraphs in RAM. There might even be a way to have a hybrid-model where some
> of the partition is held in RAM and the other is (even though still in the
> same partition) is stored as "star vertices."
> How would this be added to {{GraphComputer}} in a backwards compatible way?
> {code}
> GraphComputer.supportsVertexCentricComputing)() // currently true for all
> implementations
> GraphComputer.supportsSubgraphCentricComputing()
> {code}
> A {{VertexProgram}} could then have the following method:
> {code}
> boolean VertexProgram.withinCentricity(final M message)
> {code}
> If the message is NOT within "centricity", then it is serialized and
> distributed, else it continues to execute.
> I haven't thought through all the API and implementation considerations.
> Though it would be good to make this backwards compatible and, moving
> forward, able to support "edge-centric computing" and thus, have a very
> memory limited OLAP system.
> How to think of the different models:
> 1. Vertex-centric: medium speed, medium expressivity, medium memory costs.
> 2. Subgraph-centric: high speed, high expressivity, high memory costs.
> 3. Edge-centric: high speed, low expressivity, low memory costs.
> What is "expressivity"? Well, subgraph-centric computations can have "local
> traversals" that move beyond the "star graph." Edge-centric would not support
> any `by()`-modulators as the computation is bound to a single edge. Thus,
> low expressivity. It really depends on how you represent the edge-centric
> edge list. Do you have the vertices on each side of an edge duplicated with
> all their properties? This wold still be low memory costs, but you could get
> more expressivity.
> Anywho, in general it would be nice if the underlying execution engine can
> handle the three common distributed graph computing paradigms.
> Subgraph-centric seems the easiest to support at this point in time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)