[ https://issues.apache.org/jira/browse/GIRAPH-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257426#comment-13257426 ]
Benjamin Heitmann commented on GIRAPH-170:
------------------------------------------

In addition, I would like to say that Paolo's suggestion of providing some ready-made code for Pig, HBase and MapReduce for processing RDF sounds like a really great contribution. Please keep us updated on the progress of that!

> Workflow for loading RDF graph data into Giraph
> -----------------------------------------------
>
>                 Key: GIRAPH-170
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-170
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Dan Brickley
>            Priority: Minor
>
> W3C RDF provides a family of Web standards for exchanging graph-based data.
> RDF uses sets of simple binary relationships, labeling nodes and links with
> Web identifiers (URIs). Many public datasets are available as RDF, including
> the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many
> such datasets are listed at http://thedatahub.org/
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple
> line-oriented format is N-Triples. A format aligned with RDF's SPARQL query
> language is Turtle. Apache Jena and Any23 provide software to handle all
> these; http://incubator.apache.org/jena/ http://incubator.apache.org/any23/
> This JIRA leaves open the strategy for loading RDF data into Giraph. There
> are various possibilities, including exploitation of intermediate
> Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a
> more Giraph-friendly form, or writing custom loaders. Even a HOWTO document
> or implementor notes here would be an advance on the current state of the
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with
> various RDF datasources.
> Related topics: multigraphs https://issues.apache.org/jira/browse/GIRAPH-141
> touches on the issue (since we can't currently easily represent fully general
> RDF graphs since two nodes might be connected by more than one typed edge).
> Even without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in say the Movies +
> People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe
> VertexOutputFormat) would certainly [despite GIRAPH-141] still help"
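
As a rough illustration of the pre-processing route mentioned in the description, here is a minimal Python sketch that turns N-Triples lines into a per-vertex adjacency structure. It is not Giraph code and not the Pig/HBase/MapReduce tooling discussed in this thread. The function name `triples_to_adjacency`, the `keep_predicates` parameter, the `example.org` URIs and the output shape are all assumptions made for the example, and the regular expression is a deliberate simplification of N-Triples that only accepts URI subjects and objects (literals and blank nodes are skipped).

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> <object> .
# Assumption: only URI-to-URI triples become edges; anything else is skipped.
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Hypothetical sample data; the example.org URIs are made up for illustration.
SAMPLE = [
    '<http://example.org/person/1> <http://example.org/actedIn> <http://example.org/movie/1> .',
    '<http://example.org/person/1> <http://example.org/directed> <http://example.org/movie/1> .',
    '<http://example.org/movie/1> <http://example.org/title> "Example" .',
    '# a comment line',
]


def triples_to_adjacency(lines, keep_predicates=None):
    """Group URI-to-URI triples by subject.

    Returns {subject: {object: sorted list of predicates}}.
    If keep_predicates is given, triples with other predicates are dropped,
    which is one way to cut out a subset of a larger collection.
    """
    adjacency = defaultdict(lambda: defaultdict(set))
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # comment, blank line, literal object, or blank node
        subject, predicate, obj = match.groups()
        if keep_predicates is not None and predicate not in keep_predicates:
            continue
        # Several predicates may link the same pair of nodes, so the
        # edge labels for each (subject, object) pair are kept as a set.
        adjacency[subject][obj].add(predicate)
    return {
        subject: {obj: sorted(preds) for obj, preds in targets.items()}
        for subject, targets in adjacency.items()
    }
```

Calling `triples_to_adjacency(SAMPLE)` yields a single vertex for `person/1` whose edge to `movie/1` carries two labels, which is the multigraph case GIRAPH-141 is about. Passing `keep_predicates={"http://example.org/actedIn"}` leaves at most one label per edge, in the spirit of the "Movies + People subset" idea in the description.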