Re: Queries regarding RDFs with Flink

2015-04-14 Thread Flavio Pompermaier
Hi to all,
I made a simple RDF Gelly test and I shared it on my github repo at
https://github.com/fpompermaier/rdf-gelly-test.
I basically set up the Gelly stuff but I can't proceed and compute the
drafted TODOs.
Could someone help me implement them?
I think this could become a nice example of how Gelly could help in
handling RDF graphs :)

Best,
Flavio

On Mon, Mar 23, 2015 at 10:41 AM, Flavio Pompermaier pomperma...@okkam.it
wrote:

 Thanks Vasiliki,
 when I find the time I'll try to make a quick prototype using the
 pointers you suggested!

 Thanks for the support,
 Flavio

 On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri 
 vasilikikala...@gmail.com wrote:

 Hi Flavio,

 I'm not familiar with JSON-LD, but as far as I understand, you want to
 generate some trees from selected root nodes.

 Once you have created the Graph as Andra describes above, you can first
 filter out the edges that are of no interest to you, using filterOnEdges.
 There is a description of how edge filtering works in the Gelly docs [1].
 Then, you could use a vertex-centric iteration and propagate a message from
 the selected root node to the neighbors recursively, until you have the
 tree.

 In the vertex-centric model, you program from the perspective of a vertex
 in the graph. You basically need to define what each vertex does within
 each iteration (superstep). In Gelly this boils down to two things:
 (a) what messages this vertex will send to its neighbors and
 (b) how a vertex will update its value using the received messages.

 This is also described in the Gelly docs [2].
 Also, take a look at the Gelly library [3]. The library methods are
 implemented using this model and should give you an idea.

 In your case, you will probably need to simply propagate one message from
 the root node and gather the newly discovered neighbors in each superstep.

 I hope this helps! Let us know if you have further questions!

 -Vasia.

 [1]:

 http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations

 [2]:

 http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations

 [3]:

 http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library






Re: Queries regarding RDFs with Flink

2015-04-14 Thread Vasiliki Kalavri
Ok, so, exactly as I wrote a few e-mails back in this thread, you can do
this with a vertex-centric iteration :-)

All you need to do is call myGraph.runVertexCentricIteration(new
MyUpdateFunction(), new MyMessagingFunction(), maxIterations)
and define MyUpdateFunction and MyMessagingFunction.

The first function defines how a vertex updates its value based on the
received messages, while the second defines what messages a vertex sends in
each superstep.
Inside both functions, you have access to the vertex ID and value, so you
can check whether it's the vertex you're interested in.

In your case, in the first superstep, the Person vertex sends a message to
its neighbors.
You can do this with something like the following inside the
sendMessages() method:

for (Edge<K, V> edge : getOutgoingEdges()) {
  sendMessageTo(edge.getTarget(), msg);
}

The rest of the vertices don't need to do anything in the first superstep.
In the next supersteps, the vertices which have received a message
propagate it to their neighbors in the same way.

One thing you need to be careful about is detecting cycles, so that the
iteration terminates. One way to do this is to mark the vertices you visit,
e.g. by setting a flag in the vertex value, and to not propagate messages
from a visited vertex.
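
The propagation and the visited flag described above can be sketched in plain Java. This is only a simulation of the superstep idea and deliberately does not use the Gelly API: the class name SuperstepReachability, the method reachableFrom and the adjacency-map input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation of superstep-style message propagation (NOT the Gelly API).
class SuperstepReachability {

    // Returns every vertex reachable from root (root included), in discovery order.
    static Set<String> reachableFrom(String root,
                                     Map<String, List<String>> outEdges,
                                     int maxIterations) {
        Set<String> visited = new LinkedHashSet<>();  // plays the role of the "visited" flag
        List<String> receivers = new ArrayList<>();   // vertices holding a message this superstep
        receivers.add(root);                          // first superstep: only the root is active

        for (int superstep = 0;
             superstep < maxIterations && !receivers.isEmpty();
             superstep++) {
            List<String> next = new ArrayList<>();
            for (String vertex : receivers) {
                if (!visited.add(vertex)) {
                    continue;                         // already visited: do not propagate again
                }
                // Equivalent of sendMessages(): forward along every outgoing edge.
                next.addAll(outEdges.getOrDefault(vertex, Collections.emptyList()));
            }
            receivers = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("B", "C"));
        graph.put("B", Arrays.asList("A"));           // cycle A -> B -> A
        graph.put("C", Arrays.asList("D"));
        System.out.println(reachableFrom("A", graph, 10));
    }
}
```

Without the visited check, the A -> B -> A cycle would keep both vertices active until maxIterations is reached; with it, the loop stops as soon as no new vertex receives a message.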

If you are totally unfamiliar with the vertex-centric model, it might be a
good idea to first do some reading on this, in order to understand how it
works. For example, take a look at the Pregel paper [1].

Let us know how it goes!

Cheers,
-Vasia.

[1]: http://kowshik.github.io/JPregel/pregel_paper.pdf


On 14 April 2015 at 18:12, Flavio Pompermaier pomperma...@okkam.it wrote:

 Hi Vasia,
 by "compute subgraph for Person" I mean exactly all the vertices that
 can be reached starting from this node and following the graph edges.
 I drafted the graph as a set of vertices (where the id is the subject of
 the set of triples and the value is all of its triples)
 and a set of edges (properties connecting two vertices; this is only
 possible if the object is a URI).

 Thus, once the subgraph of a Person is computed, if I merge the values of all
 reachable vertices, I'll obtain all the triples of that subgraph.


Re: Queries regarding RDFs with Flink

2015-04-14 Thread Vasiliki Kalavri
Hi Flavio,

I'm not quite familiar with RDF or SPARQL, so not all of your code is clear
to me.

Your first TODO is "compute subgraph for Person". Is Person a vertex id
in your graph? A vertex value?
And by subgraph of Person, do you mean all the vertices that can be
reached starting from this node and following the graph edges?

-Vasia.

On 14 April 2015 at 10:37, Flavio Pompermaier pomperma...@okkam.it wrote:

 Hi to all,
 I made a simple RDF Gelly test and I shared it on my github repo at
 https://github.com/fpompermaier/rdf-gelly-test.
 I basically set up the Gelly stuff but I can't proceed and compute the
 drafted TODOs.
 Could someone help me implement them?
 I think this could become a nice example of how Gelly could help in
 handling RDF graphs :)

 Best,
 Flavio




Re: Queries regarding RDFs with Flink

2015-04-14 Thread Flavio Pompermaier
Hi Vasia,
by "compute subgraph for Person" I mean exactly all the vertices that
can be reached starting from this node and following the graph edges.
I drafted the graph as a set of vertices (where the id is the subject of
the set of triples and the value is all of its triples)
and a set of edges (properties connecting two vertices; this is only
possible if the object is a URI).

Thus, once the subgraph of a Person is computed, if I merge the values of all
reachable vertices, I'll obtain all the triples of that subgraph.
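
The modelling described above (one vertex per subject holding all of its triples, and one edge per triple whose object is a URI) can be sketched in plain Java. This is not the code from the linked repository and does not use Gelly: RdfGraphModel, addTriple and isUri are hypothetical names, and the URI test is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the vertex/edge modelling; not Gelly code.
class RdfGraphModel {

    // subject -> all triples with that subject (the "vertex value")
    final Map<String, List<String[]>> vertices = new LinkedHashMap<>();

    // triples {subject, predicate, object} whose object is a URI (the edges)
    final List<String[]> edges = new ArrayList<>();

    // Assumption: anything starting with http:// or https:// counts as a URI.
    static boolean isUri(String value) {
        return value.startsWith("http://") || value.startsWith("https://");
    }

    void addTriple(String subject, String predicate, String object) {
        String[] triple = {subject, predicate, object};
        // Every triple becomes part of its subject's vertex value.
        vertices.computeIfAbsent(subject, key -> new ArrayList<>()).add(triple);
        // Only triples pointing at another resource become edges.
        if (isUri(object)) {
            edges.add(triple);
        }
    }

    public static void main(String[] args) {
        RdfGraphModel model = new RdfGraphModel();
        model.addTriple("http://test/John", "type", "Person");
        model.addTriple("http://test/John", "knows", "http://test/Mary");
        model.addTriple("http://test/Mary", "name", "Mary");
        System.out.println(model.vertices.size() + " vertices, " + model.edges.size() + " edges");
    }
}
```

Literal-valued triples such as (John, type, Person) stay inside the vertex value only, which is why merging the values of all reachable vertices yields every triple of the subgraph.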

On Tue, Apr 14, 2015 at 4:55 PM, Vasiliki Kalavri vasilikikala...@gmail.com
 wrote:

 Hi Flavio,

 I'm not quite familiar with RDF or SPARQL, so not all of your code is clear
 to me.

 Your first TODO is "compute subgraph for Person". Is Person a vertex id
 in your graph? A vertex value?
 And by subgraph of Person, do you mean all the vertices that can be
 reached starting from this node and following the graph edges?

 -Vasia.




Re: Queries regarding RDFs with Flink

2015-03-23 Thread Flavio Pompermaier
Thanks Vasiliki,
when I find the time I'll try to make a quick prototype using the
pointers you suggested!

Thanks for the support,
Flavio

On Mon, Mar 23, 2015 at 10:31 AM, Vasiliki Kalavri 
vasilikikala...@gmail.com wrote:

 Hi Flavio,

 I'm not familiar with JSON-LD, but as far as I understand, you want to
 generate some trees from selected root nodes.

 Once you have created the Graph as Andra describes above, you can first
 filter out the edges that are of no interest to you, using filterOnEdges.
 There is a description of how edge filtering works in the Gelly docs [1].
 Then, you could use a vertex-centric iteration and propagate a message from
 the selected root node to the neighbors recursively, until you have the
 tree.

 In the vertex-centric model, you program from the perspective of a vertex
 in the graph. You basically need to define what each vertex does within
 each iteration (superstep). In Gelly this boils down to two things:
 (a) what messages this vertex will send to its neighbors and
 (b) how a vertex will update its value using the received messages.

 This is also described in the Gelly docs [2].
 Also, take a look at the Gelly library [3]. The library methods are
 implemented using this model and should give you an idea.

 In your case, you will probably need to simply propagate one message from
 the root node and gather the newly discovered neighbors in each superstep.

 I hope this helps! Let us know if you have further questions!

 -Vasia.

 [1]:

 http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#graph-transformations

 [2]:

 http://ci.apache.org/projects/flink/flink-docs-master/gelly_guide.html#vertex-centric-iterations

 [3]:

 http://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library



Re: Queries regarding RDFs with Flink

2015-03-22 Thread Flavio Pompermaier
Is there any example of RDF graph generation based on a skeleton
structure?
On Mar 22, 2015 12:28 PM, Fabian Hueske fhue...@gmail.com wrote:

 Hi Flavio,

 also, Gelly is a superset of Spargel. It provides the same features and
 much more.

 Since RDF is graph-structured, Gelly might be a good fit for your use case.

 Cheers, Fabian



Re: Queries regarding RDFs with Flink

2015-03-22 Thread Flavio Pompermaier
Thanks Andra for the help!
By graph generation I mean that I'd like to materialize subgraphs of my
overall knowledge starting from some root nodes whose RDF type is of
interest (something similar to what JSON-LD does). Is that clear?
On Mar 22, 2015 1:09 PM, Andra Lungu lungu.an...@gmail.com wrote:

 Hi Flavio,

 We don't have a specific example for generating RDF graphs using Gelly, but
 I will try to drop some lines of code here and hope you will find them
 useful.

 An RDF statement is formed of Subject - Predicate - Object triples. In Edge
 notation, the Subject and the Object will be the source and target vertices
 respectively, while the edge value will be the predicate.

 This being said, say you have an input plain text file that represents the
 edges.
 A line would look like this : http://test/Frank, marriedWith,
 http://test/Mary

 This is represented in Flink as a Tuple3. So, to read this edge file
 you should have something like this:

 private static DataSet<Edge<String, String>> getEdgesDataSet(ExecutionEnvironment env) {
    if (fileOutput) {
       return env.readCsvFile(edgesInputPath)
             .lineDelimiter("\n")
             // the subject, predicate, object
             .types(String.class, String.class, String.class)
             .map(new MapFunction<Tuple3<String, String, String>, Edge<String, String>>() {

                @Override
                public Edge<String, String> map(Tuple3<String, String, String> tuple3) throws Exception {
                   // source = subject, target = object, edge value = predicate
                   return new Edge<String, String>(tuple3.f0, tuple3.f2, tuple3.f1);
                }
             });
    } else {
       return getDefaultEdges(env);
    }
 }

 After you have this, in your main method, you just write:
 Graph<String, NullValue, String> rdfGraph = Graph.fromDataSet(edges, env);

 I picked up the conversation later on, not sure if that's what you meant by
 graph generation...

 Cheers,
 Andra

 



Re: Queries regarding RDFs with Flink

2015-03-22 Thread Fabian Hueske
Hi Flavio,

also, Gelly is a superset of Spargel. It provides the same features and
much more.

Since RDF is graph-structured, Gelly might be a good fit for your use case.

Cheers, Fabian


Re: Queries regarding RDFs with Flink

2015-03-22 Thread Andra Lungu
Hi Flavio,

We don't have a specific example for generating RDF graphs using Gelly, but
I will try to drop some lines of code here and hope you will find them
useful.

An RDF statement is formed of Subject - Predicate - Object triples. In Edge
notation, the Subject and the Object will be the source and target vertices
respectively, while the edge value will be the predicate.

This being said, say you have an input plain text file that represents the
edges.
A line would look like this : http://test/Frank, marriedWith,
http://test/Mary

This is represented in Flink as a Tuple3. So, to read this edge file
you should have something like this:

private static DataSet<Edge<String, String>> getEdgesDataSet(ExecutionEnvironment env) {
   if (fileOutput) {
      return env.readCsvFile(edgesInputPath)
            .lineDelimiter("\n")
            // the subject, predicate, object
            .types(String.class, String.class, String.class)
            .map(new MapFunction<Tuple3<String, String, String>, Edge<String, String>>() {

               @Override
               public Edge<String, String> map(Tuple3<String, String, String> tuple3) throws Exception {
                  // source = subject, target = object, edge value = predicate
                  return new Edge<String, String>(tuple3.f0, tuple3.f2, tuple3.f1);
               }
            });
   } else {
      return getDefaultEdges(env);
   }
}

After you have this, in your main method, you just write:
Graph<String, NullValue, String> rdfGraph = Graph.fromDataSet(edges, env);
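
The subject/predicate/object to source/value/target mapping can also be checked without a Flink cluster. The following is a minimal plain-Java sketch, assuming the comma-separated line format shown above; TripleEdge is a hypothetical stand-in and not Gelly's Edge class.

```java
// Hypothetical stand-in for an edge built from one RDF triple; not Gelly's Edge class.
class TripleEdge {
    final String source;  // RDF subject
    final String target;  // RDF object
    final String value;   // RDF predicate

    TripleEdge(String source, String target, String value) {
        this.source = source;
        this.target = target;
        this.value = value;
    }

    // Parses a line of the form "subject, predicate, object".
    static TripleEdge parse(String line) {
        // Limit 3 keeps any further commas inside the object part.
        String[] parts = line.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected subject, predicate, object: " + line);
        }
        // Same argument order as the map function above: (f0, f2, f1).
        return new TripleEdge(parts[0].trim(), parts[2].trim(), parts[1].trim());
    }

    public static void main(String[] args) {
        TripleEdge edge = parse("http://test/Frank, marriedWith, http://test/Mary");
        System.out.println(edge.source + " --" + edge.value + "--> " + edge.target);
    }
}
```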

I picked up the conversation later on, not sure if that's what you meant by
graph generation...

Cheers,
Andra

On Sun, Mar 22, 2015 at 12:42 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:

 Is there any example of RDF graph generation based on a skeleton
 structure?
 On Mar 22, 2015 12:28 PM, Fabian Hueske fhue...@gmail.com wrote:

  Hi Flavio,
 
  also, Gelly is a superset of Spargel. It provides the same features and
  much more.
 
  Since RDF is graph-structured, Gelly might be a good fit for your use
 case.
 
  Cheers, Fabian
 



Re: Queries regarding RDFs with Flink

2015-03-21 Thread Stephan Ewen


Re: Queries regarding RDFs with Flink

2015-03-19 Thread Flavio Pompermaier



Re: Queries regarding RDFs with Flink

2015-03-03 Thread Flavio Pompermaier
I have a nice case of RDF manipulation :)
Let's say I have the following RDF triples (Tuple3) in two files or tables:

TABLE A:
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary

TABLE B:
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

What is the best way to build up Person-rooted trees with all of each node's
data properties and some expanded paths like 'Person.knows.marriedWith'?
Is it better to use the Graph/Gelly APIs, Flink joins, multiple point lookups
(gets) against a key/value store, or something else?

The expected 4 trees should be:

tree 1 (root is John) --
http://test/John, type, Person
http://test/John, name, John
http://test/John, knows, http://test/Mary
http://test/John, knows, http://test/Jerry
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Mary, type, Person
http://test/Mary, name, Mary
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary

tree 2 (root is Jerry) --
http://test/Jerry, type, Person
http://test/Jerry, name, Jerry
http://test/Jerry, knows, http://test/Frank
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 3 (root is Mary) --
http://test/Mary, type, Person
http://test/Mary, name, Mary

tree 4 (root is Frank) --
http://test/Frank, type, Person
http://test/Frank, name, Frank
http://test/Frank, marriedWith, http://test/Mary
http://test/Mary, type, Person
http://test/Mary, name, Mary

Thanks in advance,
Flavio

On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen se...@apache.org wrote:

 Hey Santosh!

 RDF processing often involves either joins, or graph-query like operations
 (transitive). Flink is fairly good at both types of operations.

 I would look into the graph examples and the graph API for a start:

  - Graph examples:

 https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
  - Graph API:

 https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph

 If you have a more specific question, I can give you better pointers ;-)

 Stephan


 On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru sani...@gmail.com
 wrote:

  Hello,
 
  how can flink be useful for processing the data to RDFs and build the
  ontology?
 
  Regards,
  Santosh
 
 
 
 
 
 
 
 



Re: Queries regarding RDFs with Flink

2015-03-03 Thread Vasiliki Kalavri
Hi Flavio,

if you want to use Gelly to model your data as a graph, you can load your
Tuple3s as Edges.
This will result in http://test/John, Person, Frank, etc. being
vertices and type, name, knows being edge values.
In the first case, you can use filterOnEdges() to get the subgraph with the
relation edges.

Once you have the graph, you could probably use a vertex-centric iteration
to generate the trees.
It seems to me that you need something like a BFS from each vertex. Keep in
mind that this can be a very costly operation in terms of memory and
communication for large graphs.
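
As a small-scale illustration of the "BFS from each vertex" idea, here is a plain-Java sketch that runs on the sample triples from the question. It is not a Gelly vertex-centric iteration: PersonTrees, treeFor and sampleTriples are hypothetical names, and an object is followed only when it also appears as a subject.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory BFS over RDF triples; a sketch of the idea, not a distributed implementation.
class PersonTrees {

    // Collects the triples of every subject reachable from root, in BFS order.
    static List<String[]> treeFor(String root, List<String[]> triples) {
        // Group triples by subject: subject -> its triples (the "vertex value").
        Map<String, List<String[]>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], key -> new ArrayList<>()).add(t);
        }
        List<String[]> tree = new ArrayList<>();
        Set<String> visited = new HashSet<>();   // guards against cycles
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        visited.add(root);
        while (!queue.isEmpty()) {
            String subject = queue.remove();
            for (String[] t : bySubject.getOrDefault(subject, Collections.emptyList())) {
                tree.add(t);
                // Follow the triple only if its object is itself a known, unvisited subject.
                if (bySubject.containsKey(t[2]) && visited.add(t[2])) {
                    queue.add(t[2]);
                }
            }
        }
        return tree;
    }

    // The triples of TABLE A and TABLE B from the question.
    static List<String[]> sampleTriples() {
        String john = "http://test/John", jerry = "http://test/Jerry";
        String mary = "http://test/Mary", frank = "http://test/Frank";
        return Arrays.asList(
            new String[] {john, "type", "Person"}, new String[] {john, "name", "John"},
            new String[] {john, "knows", mary}, new String[] {john, "knows", jerry},
            new String[] {jerry, "type", "Person"}, new String[] {jerry, "name", "Jerry"},
            new String[] {jerry, "knows", frank},
            new String[] {mary, "type", "Person"}, new String[] {mary, "name", "Mary"},
            new String[] {frank, "type", "Person"}, new String[] {frank, "name", "Frank"},
            new String[] {frank, "marriedWith", mary});
    }

    public static void main(String[] args) {
        List<String[]> triples = sampleTriples();
        for (String root : Arrays.asList("http://test/John", "http://test/Jerry",
                                         "http://test/Mary", "http://test/Frank")) {
            System.out.println(root + " -> " + treeFor(root, triples).size() + " triples");
        }
    }
}
```

On the sample data the tree sizes are 12, 8, 2 and 5 triples for John, Jerry, Mary and Frank, which matches the four expected trees in the question. Each root triggers its own traversal, which is where the memory and communication cost mentioned above comes from on large graphs.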

Let me know if you have any questions!

Cheers,
V.

On 3 March 2015 at 09:13, Flavio Pompermaier pomperma...@okkam.it wrote:

 I have a nice case of RDF manipulation :)
 Let's say I have the following RDF triples (Tuple3) in two files or tables:

 TABLE A:
 http://test/John, type, Person
 http://test/John, name, John
 http://test/John, knows, http://test/Mary
 http://test/John, knows, http://test/Jerry
 http://test/Jerry, type, Person
 http://test/Jerry, name, Jerry
 http://test/Jerry, knows, http://test/Frank
 http://test/Mary, type, Person
 http://test/Mary, name, Mary

 TABLE B:
 http://test/Frank, type, Person
 http://test/Frank, name, Frank
 http://test/Frank, marriedWith, http://test/Mary

 What is the best way to build up Person-rooted trees with all of each node's
 data properties and some expanded paths like 'Person.knows.marriedWith'?
 Is it better to use the Graph/Gelly APIs, Flink joins, multiple point lookups
 (gets) against a key/value store, or something else?

 The expected 4 trees should be:

 tree 1 (root is John) --
 http://test/John, type, Person
 http://test/John, name, John
 http://test/John, knows, http://test/Mary
 http://test/John, knows, http://test/Jerry
 http://test/Jerry, type, Person
 http://test/Jerry, name, Jerry
 http://test/Jerry, knows, http://test/Frank
 http://test/Mary, type, Person
 http://test/Mary, name, Mary
 http://test/Frank, type, Person
 http://test/Frank, name, Frank
 http://test/Frank, marriedWith, http://test/Mary

 tree 2 (root is Jerry) --
 http://test/Jerry, type, Person
 http://test/Jerry, name, Jerry
 http://test/Jerry, knows, http://test/Frank
 http://test/Frank, type, Person
 http://test/Frank, name, Frank
 http://test/Frank, marriedWith, http://test/Mary
 http://test/Mary, type, Person
 http://test/Mary, name, Mary

 tree 3 (root is Mary) --
 http://test/Mary, type, Person
 http://test/Mary, name, Mary

 tree 4 (root is Frank) --
 http://test/Frank, type, Person
 http://test/Frank, name, Frank
 http://test/Frank, marriedWith, http://test/Mary
 http://test/Mary, type, Person
 http://test/Mary, name, Mary

 Thanks in advance,
 Flavio




Re: Queries regarding RDFs with Flink

2015-03-02 Thread Stephan Ewen
Hey Santosh!

RDF processing often involves either joins, or graph-query like operations
(transitive). Flink is fairly good at both types of operations.

I would look into the graph examples and the graph API for a start:

 - Graph examples:
https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
 - Graph API:
https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph

If you have a more specific question, I can give you better pointers ;-)

Stephan


On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru sani...@gmail.com wrote:

 Hello,

 how can flink be useful for processing the data to RDFs and build the
 ontology?

 Regards,
 Santosh










Re: Queries regarding RDFs with Flink

2015-03-01 Thread Robert Metzger
Hi Santosh,

I'm not aware of any existing tools in Flink to process RDFs. However,
Flink should be useful for processing such data.
You can probably use an existing RDF parser for Java to get the data into
the system.

Best,
Robert

On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru sani...@gmail.com wrote:

 Hello,

 how can flink be useful for processing the data to RDFs and build the
 ontology?

 Regards,
 Santosh







 --
 View this message in context:
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
 Sent from the Apache Flink (Incubator) Mailing List archive. mailing list
 archive at Nabble.com.