Re: Dynamic Graphs

2013-09-06 Thread Claudio Martella
Hi Mirko,

this is in general the kind of approach I was suggesting, but looked at from
a broader perspective. I'd tend to avoid calling other tools such as Hive
or Pig often to compute injections, as Giraph is still a batch-processing
system and this could really introduce latency and reduce throughput. I feel
that if the injection of vertices and edges really required such
complexity (such as computing them with M/R), then one could just create a
pipeline of jobs. But this is only my superficial analysis/speculation; I
can see your point on integration and your proposal is very interesting.


On Sun, Aug 25, 2013 at 8:55 AM, Mirko Kämpf mirko.kae...@cloudera.com wrote:

 Good morning Gentlemen,

 as far as I understand your thread, you are talking about the same topic I
 have been thinking about and working on for some time.
 I work on a research project focused on the evolution of networks and
 network dynamics in networks of networks.

 My understanding of Marco's question is that he needs to change node
 properties or even wants to add nodes to the graph while it is processed,
 right?

 With the WorkerContext we could construct a Connector to the outside
 world, not just for loading data from HDFS, which also requires a
 preprocessing step for the data that has to be loaded. I often think about
 HBase. All my nodes and edges live in HBase. From there it is quite easy to
 load new data based on a simple Scan. Or, if the WorkerContext triggers a
 Hive or Pig script, one can automatically reorganize or extract the relevant
 new links / nodes which have to be added to the graph.

 Such an approach means that after n supersteps of the Giraph layer an
 additional utility-step (triggered via WorkerContext, or any other better
 fitting class from Giraph; I am not sure yet where to start) is executed.
 Before such a step the state of the graph is persisted to allow fallback
 or resume. The utility-step can be a processing (MR, Mahout) or just a load
 (from HDFS, HBase) operation, and it allows a kind of clocked data flow
 directly into a running Giraph application. I think this is a very
 important feature in Complex Systems research, as we have interacting
 layers which change in parallel. In this picture the Giraph steps are the
 steps of layer A, let's say something that is going on on top of a network,
 and the utility-step expresses the changes in the underlying structure
 affecting the network itself, but based on the data / properties of the
 second subsystem, e.g. the agents operating on top of the network.

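 [Editor's sketch] The clocked flow described above could look roughly like
 the following. This is a plain Python simulation, not Giraph code (Giraph is
 Java), and every name in it (run_clocked, compute_step, utility_step, the
 dict used as a graph) is made up for illustration. The snapshot stands in
 for persisting the graph state before each utility-step.

```python
import copy


def run_clocked(graph, compute_step, utility_step, total_supersteps, n):
    """Run compute_step every superstep; after every n-th superstep,
    snapshot the graph and then let utility_step change its structure."""
    snapshots = []
    for superstep in range(1, total_supersteps + 1):
        compute_step(graph, superstep)
        if superstep % n == 0:
            # Persist the state first, so a failed utility-step can fall back.
            snapshots.append((superstep, copy.deepcopy(graph)))
            utility_step(graph, superstep)
    return snapshots


def compute_step(graph, superstep):
    """Toy layer-A computation: every vertex value grows by one."""
    for vertex_id in graph:
        graph[vertex_id] += 1


def utility_step(graph, superstep):
    """Toy structural change: one new vertex arrives from outside."""
    graph[f"n{superstep}"] = 0


graph = {"a": 0, "b": 0}
snapshots = run_clocked(graph, compute_step, utility_step, total_supersteps=6, n=3)
```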
 I created a tool, which worked like this - but not at scale - and it was
 at a time before Giraph. What do you think, is there a need for such a kind
 of extension in the Giraph world?

 Have a nice Sunday.

 Best wishes
 Mirko

 --
 --
 Mirko Kämpf

 *Trainer* @ Cloudera

 tel: +49 *176 20 63 51 99*
 skype: *kamir1604*
 mi...@cloudera.com



 On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella 
 claudio.marte...@gmail.com wrote:

 As I said, the injection of the new vertices/edges would have to be done
 manually, hence without any support of the infrastructure. I'd suggest
 you implement a WorkerContext class that supports the reading of a specific
 file with a specific format (under your control) from HDFS, and that is
 accessed by this particular special vertex (e.g. based on the vertex ID).

 Does this make sense?


 On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz 
 m.a.b.l...@stu12.qmul.ac.uk wrote:

  Dear Mr. Martella,

 Once the conditions for updating the vertex database are met, what is
 the best way for the Injector Vertex to call an input reader again?

 I am able to access all the HDFS data, but I guess the vertex would need
 to have access to the input splits and also the vertex input format that I
 designate. Am I correct? Or is there a way to just ask Zookeeper
 to create new splits and distribute them to the workers from a given path in
 DFS?

 Best Regards,
 Marco Lotz
  --
 *From:* Claudio Martella claudio.marte...@gmail.com
 *Sent:* 14 August 2013 15:25
 *To:* user@giraph.apache.org
 *Subject:* Re: Dynamic Graphs

  Hi Marco,

  Giraph currently does not support that. One way of doing this would be
 to have a specific (pseudo-)vertex act as the injector of the new
 vertices and edges. For example, it would read a file from HDFS and call the
 mutable API during the computation, superstep after superstep.

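 [Editor's sketch] The injector idea above can be illustrated with a small
 simulation. It is plain Python, not Giraph code, and all names (INJECTOR_ID,
 add_edge, run_supersteps) are hypothetical. The list of batches stands in
 for whatever the injector reads from HDFS in each superstep, and the sketch
 assumes that requested mutations only take effect at the barrier between
 supersteps.

```python
INJECTOR_ID = -1  # hypothetical reserved ID for the injector pseudo-vertex


def add_edge(graph, src, dst):
    """Add a directed edge, creating either endpoint if it is missing."""
    graph.setdefault(src, set()).add(dst)
    graph.setdefault(dst, set())


def run_supersteps(initial_edges, batches):
    """Simulate one superstep per entry in batches.

    batches[i] lists the (src, dst) edges that the injector finds in its
    input during superstep i.
    """
    graph = {INJECTOR_ID: set()}
    for src, dst in initial_edges:
        add_edge(graph, src, dst)

    for batch in batches:
        requests = []  # mutation requests issued during this superstep
        for vertex_id in list(graph):
            if vertex_id == INJECTOR_ID:
                # Only the injector looks at external input.
                requests.extend(batch)
            # Ordinary vertices would run their usual computation here.
        # Requests are applied at the barrier, so new vertices and edges
        # first take part in the following superstep.
        for src, dst in requests:
            add_edge(graph, src, dst)

    return {v: sorted(out) for v, out in graph.items() if v != INJECTOR_ID}


result = run_supersteps([(1, 2)], [[(2, 3)], [], [(3, 1), (4, 1)]])
```

 Keeping the external read inside one designated vertex means the ordinary
 vertices do not need to know that the graph is being extended.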

 On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz 
 m.a.b.l...@stu12.qmul.ac.uk wrote:

  Hello all,

 I would like to know if there is any way to use dynamic graphs with
 Giraph. By dynamic I mean graphs that may change while Giraph is
 computing/deliberating. The changes are in the input file and are not
 caused by the graph computation itself.

 Is there any way to analyse them using Giraph? If not, does anyone have any
 idea/suggestion on whether it is possible to modify the framework in order
 to process them?

 Best Regards,
 Marco Lotz

Re: Dynamic Graphs

2013-09-06 Thread Mirko Kämpf
?

-- 
-- 
Mirko Kämpf

*Trainer* @ Cloudera

tel: +49 *176 20 63 51 99*
skype: *kamir1604*
mi...@cloudera.com


RE: Dynamic Graphs

2013-09-05 Thread Marco Aurelio Barbosa Fagnani Lotz
Hello all,

Answering Mr. Kämpf's question: in my personal opinion this tool would indeed
be really useful, since many real-world graphs are dynamic.
I have just finished a report of my research on the subject. The report is
available at:

https://github.com/MarcoLotz/dynamicGraph/blob/master/LotzReport.pdf?raw=true

There is a first application that can do this injection; it is described in
section 2.7. I am working right now on the minor modifications that are
proposed in the document.
The previous sections just describe some experiences that I had with Giraph
and give an introduction to the scenario.

Best Regards,
Marco Lotz

RE: Dynamic Graphs

2013-08-21 Thread Marco Aurelio Barbosa Fagnani Lotz
Dear Mr. Martella,

Once the conditions for updating the vertex database are met, what is the
best way for the Injector Vertex to call an input reader again?

I am able to access all the HDFS data, but I guess the vertex would need to
have access to the input splits and also the vertex input format that I
designate. Am I correct? Or is there a way to just ask Zookeeper to
create new splits and distribute them to the workers from a given path in DFS?

Best Regards,
Marco Lotz


Re: Dynamic Graphs

2013-08-21 Thread Claudio Martella
As I said, the injection of the new vertices/edges would have to be done
manually, hence without any support of the infrastructure. I'd suggest
you implement a WorkerContext class that supports the reading of a specific
file with a specific format (under your control) from HDFS, and that is
accessed by this particular special vertex (e.g. based on the vertex ID).

Does this make sense?

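[Editor's sketch] One way to picture the suggestion above: a context object
parses a small text file in a format you define, and hands the parsed
mutations only to the special vertex, selected by its ID. This is plain
Python, not Giraph's WorkerContext, and everything here is an assumption made
for illustration: the class and method names, the injector ID, and the line
format ("V id value" for a vertex, "E src dst weight" for an edge, "#" for
comments). An in-memory stream stands in for the file on HDFS.

```python
import io

INJECTOR_ID = -1  # hypothetical ID of the special injector vertex


class InjectionContext:
    """Stand-in for a per-worker context object; not Giraph's WorkerContext."""

    def __init__(self, stream):
        self._stream = stream
        self._mutations = []

    def pre_superstep(self):
        """Parse every line currently available in the stream."""
        for line in self._stream:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and comments
            if fields[0] == "V" and len(fields) == 3:
                self._mutations.append(("vertex", int(fields[1]), float(fields[2])))
            elif fields[0] == "E" and len(fields) == 4:
                self._mutations.append(
                    ("edge", int(fields[1]), int(fields[2]), float(fields[3]))
                )
            else:
                raise ValueError(f"unrecognised line: {line!r}")

    def take_mutations(self, vertex_id):
        """Hand the parsed mutations to the injector vertex, and only to it."""
        if vertex_id != INJECTOR_ID:
            return []
        taken, self._mutations = self._mutations, []
        return taken


ctx = InjectionContext(io.StringIO("# new data\nV 7 0.5\nE 7 1 2.0\n\n"))
ctx.pre_superstep()
for_ordinary_vertex = ctx.take_mutations(1)
for_injector = ctx.take_mutations(INJECTOR_ID)
```

Handing the list over clears it, so the same mutation is not injected twice
if the injector asks again in a later superstep.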

-- 
   Claudio Martella
   claudio.marte...@gmail.com