RE: Dynamic Graphs

2013-09-05 Thread Marco Aurelio Barbosa Fagnani Lotz
Hello all,

In answer to Mr. Kämpf's question: in my opinion this tool would indeed be very 
useful, since many real-world graphs are dynamic.
I have just finished a report on my research on the subject. The report is 
available at:

https://github.com/MarcoLotz/dynamicGraph/blob/master/LotzReport.pdf?raw=true

There is a first application that can do this injection; it is described in 
section 2.7. I am currently working on the minor modifications proposed in the 
document.
The earlier sections just describe some of my experiences with Giraph 
and introduce the scenario.

Best Regards,
Marco Lotz

From: Mirko Kämpf 
Sent: 25 August 2013 07:55
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Good morning Gentlemen,

As far as I understand your thread, you are talking about the same topic I have 
been thinking about and working on for some time.
I work on a research project focused on the evolution of networks and on network 
dynamics in networks of networks.

My understanding of Marco's question is that he needs to change node 
properties, or even add nodes to the graph, while it is being processed. Is that right?

With the WorkerContext we could construct a "Connector" to the outside world, 
not just for loading data from HDFS, which also requires a preprocessing step for 
the data to be loaded. I often think of HBase here: all my nodes 
and edges live in HBase. From there it is quite easy to load new data with 
a simple "Scan". Alternatively, the WorkerContext could trigger a Hive or Pig 
script that automatically reorganizes or extracts the relevant new links and 
nodes to be added to the graph.

Such an approach means that after n supersteps of the Giraph layer an additional 
utility step is executed (triggered via WorkerContext or any better-fitting class 
from Giraph; I am not sure yet where to start). Before such a step, the 
state of the graph is persisted to allow fallback or resume. The utility step 
can be a processing operation (MR, Mahout) or just a load operation (from HDFS, 
HBase), and it allows a kind of clocked data flow directly into a running Giraph 
application. I think this is a very important feature for Complex Systems 
research, where we have interacting layers that change in parallel. In this 
picture the Giraph steps are the steps of layer A, say something that is 
going on on top of a network, and the utility step expresses the changes in the 
underlying structure. Those changes affect the network itself but are based on 
the data and properties of the second subsystem, e.g. the agents operating on 
top of the network.
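
The clocked flow described above can be sketched as a toy Python loop. This is 
not Giraph code: run_clocked, superstep, utility_step and the dictionary-based 
graph are all hypothetical names chosen for illustration, and the in-memory 
deep copy only stands in for persisting the graph state.

```python
import copy


def run_clocked(graph, superstep, utility_step, total_supersteps, every_n):
    """Run compute supersteps, with one utility step after every `every_n`."""
    checkpoints = []
    for step in range(1, total_supersteps + 1):
        superstep(graph, step)
        if step % every_n == 0:
            # Persist the state first so the run could fall back or resume.
            checkpoints.append((step, copy.deepcopy(graph)))
            utility_step(graph, step)
    return checkpoints


# Toy graph: vertex id -> {"value": float, "edges": set of neighbour ids}
graph = {
    1: {"value": 1.0, "edges": {2}},
    2: {"value": 1.0, "edges": {1}},
}


def superstep(g, step):
    # Layer A: each vertex value becomes its current degree.
    for vertex in g.values():
        vertex["value"] = float(len(vertex["edges"]))


def utility_step(g, step):
    # Layer B: stand-in for a load from HDFS or HBase.
    # Adds one new vertex and links it with vertex 1 in both directions.
    new_id = max(g) + 1
    g[new_id] = {"value": 0.0, "edges": {1}}
    g[1]["edges"].add(new_id)


checkpoints = run_clocked(graph, superstep, utility_step,
                          total_supersteps=4, every_n=2)
print(sorted(graph), [step for step, _ in checkpoints])
```

Taking the checkpoint before the utility step means a failed load can be 
retried from a state that contains only the results of the compute layer.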

I created a tool that worked like this, though not at scale, at a 
time before Giraph. What do you think: is there a need for this kind of 
extension in the Giraph world?

Have a nice Sunday.

Best wishes
Mirko

--
Mirko Kämpf

Trainer @ Cloudera

tel: +49 176 20 63 51 99
skype: kamir1604
mi...@cloudera.com



On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella 
<claudio.marte...@gmail.com> wrote:
As I said, the injection of the new vertices/edges would have to be done 
"manually", hence without any support of the infrastructure. I'd suggest you 
implement a WorkerContext class that supports the reading of a specific file 
with a specific format (under your control) from HDFS, and that is accessed by 
this particular "special" vertex (e.g. based on the vertex ID).

Does this make sense?
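
As a minimal sketch of the "specific file with a specific format" idea, here is 
a parser for one possible line-based mutation format. The format itself 
(ADD_VERTEX and ADD_EDGE records, # comments) and all names below are 
assumptions made up for illustration; nothing in this thread prescribes them, 
and the sketch is plain Python rather than a Giraph WorkerContext.

```python
from typing import List, NamedTuple, Union


class AddVertex(NamedTuple):
    vertex_id: int
    value: float


class AddEdge(NamedTuple):
    source: int
    target: int
    weight: float


Mutation = Union[AddVertex, AddEdge]


def parse_mutations(text: str) -> List[Mutation]:
    """Parse the hypothetical format: one mutation per line, '#' starts a comment."""
    mutations: List[Mutation] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment only
        fields = line.split()
        kind = fields[0]
        if kind == "ADD_VERTEX" and len(fields) == 3:
            mutations.append(AddVertex(int(fields[1]), float(fields[2])))
        elif kind == "ADD_EDGE" and len(fields) == 4:
            mutations.append(
                AddEdge(int(fields[1]), int(fields[2]), float(fields[3])))
        else:
            raise ValueError(f"line {line_no}: cannot parse {raw!r}")
    return mutations


sample = """\
# new data observed since the job started
ADD_VERTEX 5 0.0
ADD_EDGE 5 1 2.5
"""
mutations = parse_mutations(sample)
print(mutations)
```

Rejecting unknown records with the line number keeps a malformed input file 
from silently dropping part of an update.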


On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz 
<m.a.b.l...@stu12.qmul.ac.uk> wrote:
Dear Mr. Martella,

Once the conditions for updating the vertex database have been met, what is the 
best way for the Injector Vertex to call an input reader again?

I am able to access all the HDFS data, but I guess the vertex would need 
access to the input splits and also to the vertex input format that I 
designate. Am I correct? Or is there a way to simply ask ZooKeeper to 
create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

From: Claudio Martella <claudio.marte...@gmail.com>
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be to 
have a specific (pseudo-)vertex act as the "injector" of the new vertices 
and edges. For example, it would read a file from HDFS and call the mutable API 
during the computation, superstep after superstep.
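
The injector idea above can be illustrated with a toy Python model of a 
superstep loop. It does not use the Giraph API: INJECTOR_ID, run and the 
per-superstep batches dictionary (a stand-in for new data read from HDFS) are 
hypothetical, and the model assumes that requested mutations take effect at 
the barrier before the next superstep.

```python
INJECTOR_ID = -1  # hypothetical reserved id for the pseudo-vertex


def run(graph, batches, supersteps):
    """graph: id -> set of neighbour ids.
    batches: superstep number -> list of (source, target) edges to inject.
    Returns the sorted list of real vertex ids after each superstep."""
    history = []
    for step in range(supersteps):
        requests = []
        for vertex_id in list(graph):
            if vertex_id == INJECTOR_ID:
                # The injector does no graph computation; it only turns
                # newly available input into mutation requests.
                requests.extend(batches.get(step, []))
            # Ordinary vertices would run their normal compute logic here.
        # Barrier: apply the requested mutations before the next superstep.
        for source, target in requests:
            graph.setdefault(source, set()).add(target)
            graph.setdefault(target, set())
        history.append(sorted(v for v in graph if v != INJECTOR_ID))
    return history


graph = {INJECTOR_ID: set(), 1: {2}, 2: set()}
batches = {0: [(2, 3)], 2: [(3, 4), (4, 1)]}
history = run(graph, batches, supersteps=3)
print(history)
```

Keeping the injector as a vertex with a reserved id means the rest of the 
computation does not need to know that the graph is being fed from outside.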


On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz 
<m.a.b.l...@stu12.qmul.ac.uk> wrote:
Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By 
dynamic I mean graphs that may change while Giraph is 
computing. The changes are in the input file and are not caused by 
the graph computation itself.

Is there any way to analyse such a graph using Giraph?

MySQL Table

2013-09-05 Thread Bu Xiao
Hi Girapher,

   I am currently working on an algorithm that requires reading the
vertices from a MySQL table rather than from HDFS. I assumed there would be
a way of reading data from an SQL table, since Giraph is built on top of
Hadoop, but I cannot figure this part out. Do you have a class
similar to DBInputFormat in Hadoop? Thank you very much for your help.
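
No reply to this question appears in this archive. One possible workaround, 
sketched here under stated assumptions, is to export the table to a text file 
that an existing file-based vertex input format can read. The sketch uses 
Python's built-in sqlite3 as a stand-in for MySQL; the vertices and edges table 
layouts are hypothetical, and the one-JSON-array-per-line shape 
[id, value, [[target, weight], ...]] is an assumption about what 
JsonLongDoubleFloatDoubleVertexInputFormat expects.

```python
import json
import sqlite3


def export_vertices(conn):
    """Yield one JSON line per vertex: [id, value, [[target, weight], ...]]."""
    vertices = conn.execute(
        "SELECT id, value FROM vertices ORDER BY id").fetchall()
    for vertex_id, value in vertices:
        edges = conn.execute(
            "SELECT target, weight FROM edges WHERE source = ? ORDER BY target",
            (vertex_id,),
        ).fetchall()
        yield json.dumps([vertex_id, value, [list(edge) for edge in edges]])


# In-memory stand-in for the real database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (id INTEGER PRIMARY KEY, value REAL);
    CREATE TABLE edges (source INTEGER, target INTEGER, weight REAL);
    INSERT INTO vertices VALUES (0, 0.0), (1, 0.0);
    INSERT INTO edges VALUES (0, 1, 1.0), (1, 0, 1.0);
""")
lines = list(export_vertices(conn))
print("\n".join(lines))
```

The resulting lines would then be written to a file and copied to HDFS before 
the job starts, so this only helps when a one-off export is acceptable.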


RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-05 Thread Ken Williams
> great, I need to get a review soon to get the patch in the codebase.
If I can help then let me know.
Thanks again,
Ken

From: claudio.marte...@gmail.com
Date: Thu, 5 Sep 2013 16:50:43 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

great, i need to get a review soon to get the patch in the codebase.

On Thu, Sep 5, 2013 at 2:10 PM, Ken Williams  wrote:





Hi Claudio,

The patch worked !!  :-)

Just to be clear, I am running Giraph (1.0.0), not git cloned,
and hadoop 2.0.0-cdh4.1.1

I applied your patch and rebuilt the giraph source code with this command,
    mvn -Phadoop_2.0.0 clean compile package test install verify

This built correctly, with no exceptions and no tests failed.

I then ran the giraph example, which ran successfully with this command
[root@localhost giraph]# hadoop jar 
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar 
org.apache.giraph.GiraphRunner 
org.apache.giraph.examples.SimpleShortestPathsVertex  -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat  -vip 
/user/root/input/tiny_graph.txt   -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat   -op 
/user/root/output/shortestpaths -w 1

I then deleted the output
    hadoop fs -rm -R  /user/root/output/shortestpaths

I then restarted my HBase daemons, and ran the giraph example again, and it 
worked successfully again,
no errors, no exceptions, no tasks failed, and output produced correctly.

Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port 
22181.

Thank you very much for your help  :-)
Ken

-- 
Claudio Martella
claudio.marte...@gmail.com   
  

Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-05 Thread Claudio Martella
great, i need to get a review soon to get the patch in the codebase.


On Thu, Sep 5, 2013 at 2:10 PM, Ken Williams  wrote:

> Hi Claudio,
>
> The patch worked !!  :-)
>
> Just to be clear,
> I am running Giraph (1.0.0), not git cloned.
>  and hadoop 2.0.0-cdh4.1.1
>
> I applied your patch and rebuilt the giraph source code with
> this command,
>     mvn -Phadoop_2.0.0 clean compile package test install verify
>
> This built correctly, with no exceptions and no tests failed.
>
> I then ran the giraph example, which ran successfully with this command
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsVertex  -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>  -vip /user/root/input/tiny_graph.txt   -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat   -op
> /user/root/output/shortestpaths -w 1
>
> I then deleted the output
> hadoop fs -rm -R  /user/root/output/shortestpaths
>
> I then restarted my HBase daemons, and ran the giraph example again, and
> it worked successfully again,
> no errors, no exceptions, no tasks failed, and output produced correctly.
>
> Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on
> port 22181.
>
>  Thank you very much for your help  :-)
>
> Ken
>
>
> --
> From: claudio.marte...@gmail.com
> Date: Wed, 4 Sep 2013 19:21:37 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> Giraph is shipped with Zookeeper 3.3.3, and it is run, if an existing
> zookeeper is not used through the giraph.zkServerList parameter, with its
> own configuration listening on port 22181.
>
>
> On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams  wrote:
>
> H. Interesting.
>
> Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
>
> The only version of ZooKeeper I have installed is the one that came with
> HBase,
> and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies
> clientPort=2181
> This is the only zoo.cfg file on my machine.
>
>
> [root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
> 
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
> [root@localhost Downloads]#
>
>
>
> --
> From: claudio.marte...@gmail.com
> Date: Wed, 4 Sep 2013 12:13:50 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> That should in principle not be the case, as the zookeeper started by
> Giraph listens on a different port than the default. See
> parameter giraph.zkServerPort, which defaults to 22181.
>
>
> On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams  wrote:
>
> Hi Claudio,
>
> I think I have fixed the problem.
>
> HBase runs with its own copy of ZooKeeper which listens on port 2181.
> So, when I tried to start ZooKeeper for Giraph it also tried to listen
> on port 2181
> and found it was already in use, and then it terminated - which is why
> Giraph failed.
> If I stop the HBase daemons (including its copy of ZooKeeper) then
> Giraph runs fine.
>
> Essentially there is a conflict between running ZooKeeper for Giraph,
> if there is
> already ZooKeeper running for HBase.
>
> I will try the patch and get back to you.
>
> Thanks for all your help,
>
> Ken
>
> --
> From: claudio.marte...@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> try with the attached patch applied to trunk, without the mentioned -D
> giraph.zkManagerDirectory.
>
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:
>
> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail,
> still no output, and still an
> exception in the log files - FileNotFoundException: File
> /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
>   org.apache.giraph.GiraphRunner
>  -Dgiraph.zkManagerDirectory='/tmp/giraph/'
> org.apache.giraph.examples.SimpleShortestPathsVertex  -vif
> org.apache.giraph.io.for

RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-05 Thread Ken Williams
Hi Claudio,

The patch worked !!  :-)

Just to be clear, I am running Giraph (1.0.0), not git cloned,
and hadoop 2.0.0-cdh4.1.1

I applied your patch and rebuilt the giraph source code with this command,
    mvn -Phadoop_2.0.0 clean compile package test install verify

This built correctly, with no exceptions and no tests failed.

I then ran the giraph example, which ran successfully with this command
[root@localhost giraph]# hadoop jar 
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar 
org.apache.giraph.GiraphRunner 
org.apache.giraph.examples.SimpleShortestPathsVertex  -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat  -vip 
/user/root/input/tiny_graph.txt   -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat   -op 
/user/root/output/shortestpaths -w 1

I then deleted the output
    hadoop fs -rm -R  /user/root/output/shortestpaths

I then restarted my HBase daemons, and ran the giraph example again, and it 
worked successfully again,
no errors, no exceptions, no tasks failed, and output produced correctly.

Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port 
22181.

Thank you very much for your help  :-)
Ken

From: claudio.marte...@gmail.com
Date: Wed, 4 Sep 2013 19:21:37 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

Giraph is shipped with Zookeeper 3.3.3, and it is run, if an existing zookeeper 
is not used through the giraph.zkServerList parameter, with its own 
configuration listening on port 22181.
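
The port separation described above can be checked with a small probe, roughly 
what 'netstat -an | grep 22181' shows. This is a sketch using only the Python 
standard library; is_listening is a hypothetical helper name, and the two port 
numbers are the ones quoted in this thread (2181 for the ZooKeeper that came 
with HBase, 22181 for the one Giraph starts).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


def report(host: str = "127.0.0.1") -> None:
    """Print the state of the two ZooKeeper ports discussed in the thread."""
    ports = (("existing ZooKeeper (HBase)", 2181),
             ("Giraph-started ZooKeeper", 22181))
    for name, port in ports:
        state = "listening" if is_listening(host, port) else "not listening"
        print(f"{name} on port {port}: {state}")
```

Calling report() while a Giraph job is running should show whether both 
servers are up at the same time on their separate ports.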



On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams  wrote:





H. Interesting.
Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
The only version of ZooKeeper I have installed is the one that came with HBase,
and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies 
clientPort=2181
This is the only zoo.cfg file on my machine.

[root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
[root@localhost Downloads]#
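
To compare a zoo.cfg like the one above with the port Giraph's own ZooKeeper 
uses, the file can be read as simple key=value pairs. This is a rough sketch: 
parse_zoo_cfg is a hypothetical helper, the configuration text is copied from 
the listing above, and 22181 is the default quoted elsewhere in this thread.

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            settings[key.strip()] = value.strip()
    return settings


ZOO_CFG = """\
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
"""

GIRAPH_DEFAULT_ZK_PORT = 22181  # default quoted in this thread

settings = parse_zoo_cfg(ZOO_CFG)
clash = int(settings["clientPort"]) == GIRAPH_DEFAULT_ZK_PORT
print(settings["clientPort"], "clashes with Giraph default:", clash)
```

With clientPort=2181 the two servers use different ports, so on these numbers 
alone the existing ZooKeeper does not collide with the one Giraph starts.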


From: claudio.marte...@gmail.com
Date: Wed, 4 Sep 2013 12:13:50 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org



That should in principle not be the case, as the zookeeper started by Giraph 
listens on a different port than the default. See parameter 
giraph.zkServerPort, which defaults to 22181.



On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams  wrote:







Hi Claudio,
I think I have fixed the problem.
   HBase runs with its own copy of ZooKeeper which listens on port 2181.
   So, when I tried to start ZooKeeper for Giraph it also tried to listen on 
   port 2181 and found it was already in use, and then it terminated - which 
   is why Giraph failed.
   If I stop the HBase daemons (including its copy of ZooKeeper) 
   then Giraph runs fine.
   Essentially there is a conflict between running ZooKeeper for Giraph, if 
   there is already ZooKeeper running for HBase.
   I will try the patch and get back to you.
   Thanks for all your help,
Ken




From: claudio.marte...@gmail.com
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

try with the attached patch applied to trunk, without the mentioned -D 
giraph.zkManagerDirectory.





On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:





Hi Claudio,
I tried this but it made no difference. The map tasks still fail, still no 
output, and still an exception in the log files - FileNotFoundException: File 
/tmp/giraph/_zkServer does not exist.

[root@localhost giraph]# hadoop jar 
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
   org.apache.giraph.GiraphRunner  -Dgiraph.zkManagerDirectory='/tmp/giraph/' 
   org.apache.giraph.examples.SimpleShortestPathsVertex  -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
/user/root/input/tiny_graph.txt -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
/user/root/output/shortestpaths -w 1





13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format 
specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is n