[ 
https://issues.apache.org/jira/browse/GIRAPH-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Presta updated GIRAPH-155:
-------------------------------------

    Attachment: GIRAPH-155.patch

This solution exploits the code we already have for vertex mutations.

We introduce the EdgeInputFormat class that produces edges from input splits.
For convenience, we also introduce the VertexValueInputFormat, a subclass of 
VertexInputFormat that doesn't produce edges.

A user can use an EdgeInputFormat in conjunction with a 
Vertex{Value}InputFormat, or only one of the two.

If only an EdgeInputFormat is used, the graph is built from the edges alone, 
and vertices are initialized with default values.
If both are used, their inputs are combined.
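
To make the combining behavior concrete, here is a rough standalone sketch in 
plain Java. It is not code from the patch and does not use the Giraph API: 
EdgeInputSketch, VertexData, DEFAULT_VALUE and build() are hypothetical names, 
and the default value of 0.0 is an assumption. Edges whose source vertex has no 
value from the vertex input get a vertex with the default value; target 
vertices are deliberately not created here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: none of these names come from the patch or from Giraph. */
final class EdgeInputSketch {
  /** A vertex value plus the ids of its out-edge targets. */
  static final class VertexData {
    final double value;
    final List<String> targets = new ArrayList<>();

    VertexData(double value) {
      this.value = value;
    }
  }

  /** Assumed default value for vertices that only appear in the edge input. */
  static final double DEFAULT_VALUE = 0.0;

  /**
   * Combines vertex values (may be empty) with (source, target) edge pairs.
   * A source vertex missing from vertexValues is created with DEFAULT_VALUE.
   */
  static Map<String, VertexData> build(
      Map<String, Double> vertexValues, List<String[]> edges) {
    Map<String, VertexData> graph = new LinkedHashMap<>();
    // Vertex input: one entry per vertex, no edges yet.
    for (Map.Entry<String, Double> entry : vertexValues.entrySet()) {
      graph.put(entry.getKey(), new VertexData(entry.getValue()));
    }
    // Edge input: attach each edge to its source, creating the source if needed.
    for (String[] edge : edges) {
      graph.computeIfAbsent(edge[0], id -> new VertexData(DEFAULT_VALUE))
          .targets.add(edge[1]);
    }
    return graph;
  }

  public static void main(String[] args) {
    Map<String, Double> values = new LinkedHashMap<>();
    values.put("a", 1.0);
    List<String[]> edges = new ArrayList<>();
    edges.add(new String[] {"a", "b"});
    edges.add(new String[] {"b", "c"});
    for (Map.Entry<String, VertexData> e : build(values, edges).entrySet()) {
      System.out.println(
          e.getKey() + " value=" + e.getValue().value + " targets=" + e.getValue().targets);
    }
  }
}
```

Passing an empty map as vertexValues corresponds to the edge-only case.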

Corresponding text-based input formats are included, and they are supported by 
InternalVertexRunner.

I had to add Giraph{File/Text}InputFormat in order to deal with multiple 
sources of input (vertex data and edges).

A few caveats:
- only works with mutable vertices for now; we can support immutable ones too 
by modifying VertexResolver to use setEdges() when needed
- not integrated into GiraphRunner yet
- I had to bypass a couple of Checkstyle violations
- there's more code duplication than I would like, but I saw no good way to 
extract a common base for vertex- and edge-related code
- the vertex mutation code is pretty old, so there may be room for 
performance improvements

Future work:
- add corresponding HCatalog input formats
- support immutable vertex classes
- integrate into GiraphRunner
- analyze performance of VertexResolver

Will post some perf results soon.
                
> Allow creation of graph by adding edges that span multiple workers
> ------------------------------------------------------------------
>
>                 Key: GIRAPH-155
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-155
>             Project: Giraph
>          Issue Type: New Feature
>          Components: graph, lib
>    Affects Versions: 0.1.0
>            Reporter: Dionysios Logothetis
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-155.patch
>
>
> Currently a graph is created only by adding vertices. The typical way is to 
> read input text files line-by-line with each line describing a vertex (its 
> value, its edges etc). The current API allows for the creation of a vertex 
> only if all the information for the vertex is available in a single line.
> However, it's common to have graphs described in the form of edges. Edges 
> might span multiple lines in an input file or even span multiple workers. The 
> current API doesn't allow this. In the input superstep, a vertex must be 
> created by a single worker.
> Instead, it should be possible for multiple workers to mutate the graph 
> during the input superstep.
> This has the following implications:
> 1) Instead of just instantiating a vertex, a vertex reader should be able to 
> do vertex addition and edge addition requests.
> 2) Multiple workers might try to create the same vertex. Any conflicts should 
> be handled with a VertexResolver. So the resolver has to be instantiated 
> before load time.
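
The conflict handling described in point 2 could look roughly like the sketch 
below. This is plain Java written for illustration, not the Giraph 
VertexResolver interface: InputResolverSketch, PendingChanges, ResolvedVertex 
and resolve() are hypothetical names, and the "first proposed value wins" 
policy is an assumption; the issue does not say how conflicts should be 
settled.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: these types are not part of Giraph. */
final class InputResolverSketch {
  /** Requests collected for one vertex id from all workers during input. */
  static final class PendingChanges {
    /** Values from vertex addition requests, in arrival order. */
    final List<Double> proposedValues = new ArrayList<>();
    /** Targets from edge addition requests, in arrival order. */
    final List<String> addedEdgeTargets = new ArrayList<>();
  }

  /** The single vertex that results from resolving the pending requests. */
  static final class ResolvedVertex {
    final String id;
    final double value;
    final List<String> targets;

    ResolvedVertex(String id, double value, List<String> targets) {
      this.id = id;
      this.value = value;
      this.targets = targets;
    }
  }

  /**
   * Assumed policy: the first vertex addition request wins; if no worker sent
   * one, the vertex is created with defaultValue. All requested edges are kept.
   */
  static ResolvedVertex resolve(String id, PendingChanges changes, double defaultValue) {
    double value = changes.proposedValues.isEmpty()
        ? defaultValue
        : changes.proposedValues.get(0);
    return new ResolvedVertex(id, value, new ArrayList<>(changes.addedEdgeTargets));
  }

  public static void main(String[] args) {
    PendingChanges changes = new PendingChanges();
    changes.proposedValues.add(1.0); // request from one worker
    changes.proposedValues.add(2.0); // conflicting request from another worker
    changes.addedEdgeTargets.add("b");
    changes.addedEdgeTargets.add("c");
    ResolvedVertex v = resolve("a", changes, 0.0);
    System.out.println(v.id + " value=" + v.value + " targets=" + v.targets);
  }
}
```

Other policies (last request wins, or merging values) would fit the same shape; 
only the body of resolve() would change.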

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
