RE: Dynamic Graphs
Hello all,

Answering Mr. Kämpf's question: in my personal opinion this tool would indeed be really useful, since many real-world graphs are dynamic. I have just finished a report on my research in the subject. The report is available at: https://github.com/MarcoLotz/dynamicGraph/blob/master/LotzReport.pdf?raw=true

There is a first application that can do this injection. I am working on the minor modifications proposed in the document right now; they are described in section 2.7. The previous sections just describe some experiences that I had with Giraph and an introduction to the scenario.

Best Regards,
Marco Lotz

From: Mirko Kämpf mirko.kae...@cloudera.com
Sent: 25 August 2013 07:55
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Good morning gentlemen,

As far as I understand your thread, you are talking about the same topic I have been thinking about and working on for some time. I work on a research project focused on the evolution of networks and network dynamics in networks of networks. My understanding of Marco's question is that he needs to change node properties, or even wants to add nodes to the graph while it is processed, right?

With the WorkerContext we could construct a connector to the outside world, not just for loading data from HDFS, which also requires a preprocessing step for the data that has to be loaded. I often think about HBase. All my nodes and edges live in HBase. From there it is quite easy to load new data based on a simple Scan; or, if the WorkerContext triggers a Hive or Pig script, one can automatically reorganize or extract the relevant new links/nodes which have to be added to the graph.

Such an approach means that after n supersteps of the Giraph layer, an additional utility-step (triggered via the WorkerContext, or any other better-fitting class from Giraph; I am not sure yet where to start) is executed. Before such a step, the state of the graph is persisted to allow fall-back or resume.
The utility-step can be a processing (MR, Mahout) or just a load (from HDFS, HBase) operation, and it allows a kind of clocked data flow directly into a running Giraph application. I think this is a very important feature in Complex Systems research, as we have interacting layers which change in parallel. In this picture the Giraph steps are the steps of layer A, say the dynamics going on on top of a network, while the utility-step expresses the changes in the underlying structure, affecting the network itself but based on the data/properties of the second subsystem, e.g. the agents operating on top of the network. I created a tool which worked like this (though not at scale), and that was before Giraph existed.

What do you think: is there a need for such a kind of extension in the Giraph world?

Have a nice Sunday.

Best wishes,
Mirko

--
Mirko Kämpf
Trainer @ Cloudera
tel: +49 176 20 63 51 99
skype: kamir1604
mi...@cloudera.com

On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella claudio.marte...@gmail.com wrote:

As I said, the injection of the new vertices/edges would have to be done manually, hence without any support from the infrastructure. I'd suggest you implement a WorkerContext class that supports the reading of a specific file with a specific format (under your control) from HDFS, and that is accessed by this particular special vertex (e.g. based on the vertex ID). Does this make sense?

On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Dear Mr. Martella,

Once the conditions for updating the vertex database are achieved, what is the best way for the Injector Vertex to call an input reader again? I am able to access all the HDFS data, but I guess the vertex would need to have access to the input splits and also to the vertex input format that I designate. Am I correct? Or is there a way to just ask ZooKeeper to create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

From: Claudio Martella claudio.marte...@gmail.com
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be by having a specific (pseudo-)vertex act as the injector of the new vertices and edges. For example, it would read a file from HDFS and call the mutable API during the computation, superstep after superstep.

On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By dynamic I mean graphs that may change while Giraph is computing. The changes
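Mirko's clocked utility-step scheme from the thread above can be sketched in plain Java, with no Giraph, HBase, or HDFS dependencies: the external source is just an in-memory queue, the superstep body is left empty, and every class and method name below is invented for illustration. The sketch only shows the persist-then-inject cadence he describes (every n supersteps: checkpoint the graph, then pull in new edges).

```java
import java.util.*;

// Schematic model of the "clocked utility-step" idea: after every n
// supersteps, persist the graph state (for fall-back/resume) and run a
// utility step that pulls externally arriving edges into the running job.
public class ClockedUtilityStepModel {
    // adjacency list: vertex id -> neighbour ids
    final Map<Long, Set<Long>> graph = new HashMap<>();
    // stands in for HBase/HDFS: edges arriving from outside the computation
    final Deque<long[]> externalEdges = new ArrayDeque<>();
    // snapshots taken before each utility step
    final List<Map<Long, Set<Long>>> snapshots = new ArrayList<>();

    void superstep() {
        // the real vertex computation would run here; omitted in this sketch
    }

    void utilityStep() {
        // drain the external source into the graph (the "load" operation)
        while (!externalEdges.isEmpty()) {
            long[] e = externalEdges.poll();
            graph.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            graph.computeIfAbsent(e[1], k -> new HashSet<>());
        }
    }

    void persist() {
        // deep-copy the adjacency structure as a checkpoint
        Map<Long, Set<Long>> copy = new HashMap<>();
        graph.forEach((k, v) -> copy.put(k, new HashSet<>(v)));
        snapshots.add(copy);
    }

    // run 'total' supersteps, with a persisted utility step every n of them
    public void run(int total, int n) {
        for (int s = 1; s <= total; s++) {
            superstep();
            if (s % n == 0) {
                persist();     // checkpoint before mutating, as proposed
                utilityStep(); // clocked data flow into the running job
            }
        }
    }
}
```

In a real deployment the drain loop would be replaced by an HBase Scan or an HDFS read triggered from the WorkerContext, which is exactly the open design question in the thread.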
RE: Dynamic Graphs
Dear Mr. Martella,

Once the conditions for updating the vertex database are achieved, what is the best way for the Injector Vertex to call an input reader again? I am able to access all the HDFS data, but I guess the vertex would need to have access to the input splits and also to the vertex input format that I designate. Am I correct? Or is there a way to just ask ZooKeeper to create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

From: Claudio Martella claudio.marte...@gmail.com
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be by having a specific (pseudo-)vertex act as the injector of the new vertices and edges. For example, it would read a file from HDFS and call the mutable API during the computation, superstep after superstep.

On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By dynamic I mean graphs that may change while Giraph is computing. The changes are in the input file and are not caused by the graph computation itself. Is there any way to analyse this using Giraph? If not, does anyone have an idea/suggestion as to whether it is possible to modify the framework in order to process it?

Best Regards,
Marco Lotz

--
Claudio Martella
claudio.marte...@gmail.com
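The injector (pseudo-)vertex that Claudio suggests might be modeled schematically as below. This is not Giraph code: the "HDFS file" is an in-memory list, the class and method names are invented, and only one behaviour is borrowed from the real framework, namely that mutation requests made during a superstep are buffered and applied between supersteps.

```java
import java.util.*;

// Toy model of an injector pseudo-vertex: one designated vertex reads
// mutation records (a list standing in for a user-controlled HDFS file)
// and requests new vertices via a mutation API, one batch per superstep.
public class InjectorVertexModel {
    // vertex id 0 plays the role of the special injector vertex
    final Set<Long> vertices = new HashSet<>(Arrays.asList(0L));
    final List<long[]> pendingEdgeRequests = new ArrayList<>();
    final Iterator<long[]> fileRecords; // the "file" of edges to inject

    InjectorVertexModel(List<long[]> records) { fileRecords = records.iterator(); }

    void computeInjector() {
        // superstep body of the injector: read next record, request a mutation
        if (fileRecords.hasNext()) pendingEdgeRequests.add(fileRecords.next());
    }

    void resolveMutations() {
        // applied after the superstep, mirroring how buffered graph
        // mutations take effect between supersteps in a Pregel-style system
        for (long[] e : pendingEdgeRequests) {
            vertices.add(e[0]);
            vertices.add(e[1]);
        }
        pendingEdgeRequests.clear();
    }

    public void runSuperstep() { computeInjector(); resolveMutations(); }
}
```

The point of the model is the timing: a vertex injected during superstep s is only visible from superstep s+1 onward, which is why the thread talks about injecting "superstep after superstep".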
RE: Workers input splits and MasterCompute communication
Hello all :)

I am having problems calling getContext().getInputSplit() inside the compute() method in the workers. It always returns as if it didn't get any split at all: inputSplit.getLocations() returns without the hosts that should have that split as local, and inputSplit.getLength() returns 0. Should there be any initialization of the workers' context so that I can get this information? Is there any way to access the JobContext from the workers or the master?

Best Regards,
Marco Lotz

From: Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
Sent: 17 August 2013 20:20
To: user@giraph.apache.org
Subject: Workers input splits and MasterCompute communication

Hello all :)

In what class do the workers actually get the input file splits from the file system? Is it possible for a MasterCompute class object to have access to/communication with the workers in that job? I thought about using aggregators, but then I assumed that aggregators actually work with the vertices' compute() (and related methods) and not with the worker itself. By workers I don't mean the vertices in each worker, but the object that runs compute() for all the vertices in that worker.

Best Regards,
Marco Lotz
New vertex allocation and messages
Hello all :)

I am programming an application that has to create and destroy a few vertices. I was wondering if there is any protection in Giraph to prevent a vertex from sending a message to another vertex that does not exist (i.e. providing a vertex id that is not yet associated with a vertex). Is there a way to test whether the destination vertex exists before sending the message to it? Also, when a vertex is created, is there any sort of load balancing, or is it always kept in the worker that created it?

Best Regards,
Marco Lotz
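On the first question: in the Pregel model that Giraph follows, sending a message to a nonexistent vertex id is, by default, not an error; if memory serves, Giraph's default vertex resolver creates the missing vertex between supersteps so it can receive the message (and this policy is pluggable). There is also no cheap existence check before sending, since the target vertex may live on another worker. A toy, stdlib-only model of that resolution step, with all names invented:

```java
import java.util.*;

// Toy model of Pregel-style message resolution: messages may be addressed
// to ids that have no vertex yet; between supersteps, a resolution pass
// creates any missing targets so the messages can be delivered.
public class MessageResolutionModel {
    final Set<Long> vertices = new HashSet<>();
    final Map<Long, List<String>> inbox = new HashMap<>();

    // sending never fails, even if the target does not exist yet
    public void sendMessage(long target, String msg) {
        inbox.computeIfAbsent(target, k -> new ArrayList<>()).add(msg);
    }

    // between supersteps: create a vertex for every message target
    // that does not exist yet (the "default resolver" behaviour)
    public void resolve() {
        vertices.addAll(inbox.keySet());
    }
}
```

If that default is not wanted (e.g. messages to deleted vertices should be dropped instead), the place to change it in Giraph would be a custom vertex resolver rather than a pre-send check.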
Workers input splits and MasterCompute communication
Hello all :)

In what class do the workers actually get the input file splits from the file system? Is it possible for a MasterCompute class object to have access to/communication with the workers in that job? I thought about using aggregators, but then I assumed that aggregators actually work with the vertices' compute() (and related methods) and not with the worker itself. By workers I don't mean the vertices in each worker, but the object that runs compute() for all the vertices in that worker.

Best Regards,
Marco Lotz
RE: Logger output
Thanks Ashish :)

I took a look in the directory HADOOP_BASE_PATH/logs/userlogs/job_number, but in the syslog there are no indications of these logs. Right now I am running Giraph in pseudo-distributed mode, so it should be on this machine. I even tried to change from LOG.debug() to LOG.info() to see if it appears in the logs, and it still didn't work. Am I missing something? Should I somehow initialize the LOG by a different method than just declaring it with private static final Logger LOG = Logger.getLogger(SimpleBFSComputation.class);? I am trying to log right now with: LOG.info("testing log");

Best Regards,
Marco Lotz

From: Ashish Jain ashish@gmail.com
Sent: 09 August 2013 18:48
To: user@giraph.apache.org
Subject: Re: Logger output

Hello Marco,

In my experiments, I have found the log output to be in the Hadoop log file of the application. When you run your application, note down the job number. The Hadoop log file is usually in HADOOP_BASE_PATH/logs/userlogs/job_number. In it you need to look at syslog; among the various interleaved lines will be the output of the log. If you run your program on a cluster, you might have to find out on which node the program was run. One way, if you use -op in your application, is to look at the log to see the cluster node name. The other way is to just check HADOOP_BASE_PATH/logs/userlogs/job_number on all the nodes of your cluster. You will find output from the MasterThread and from one or more worker threads. This is the approach I have used; there might be a better way to do this. Hope this helps.

Ashish

On Fri, Aug 9, 2013 at 4:43 AM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Hello there! :)

I am writing a Giraph application but I could not find the output place for the logs. Where is the default output path to see the logged info?
By log I mean the log that is inside a class that one creates:

private static final Logger LOG = Logger.getLogger(SimpleBFSComputation.class);

I call the following method to enable that log for debug:

LOG.setLevel(Level.DEBUG);

And then write some random content to it:

if (LOG.isDebugEnabled()) { LOG.debug("This is a logged line"); }

Just to clarify: if I called LOG.setLevel(Level.DEBUG), I am enabling the log for debug, and then the method isDebugEnabled() will return true, correct?

Best Regards,
Marco Lotz
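The pattern being discussed (set the level, guard with an is-enabled check, then log) can be shown with a runnable analogue built on java.util.logging, since log4j 1.x is not in the standard library. The correspondence assumed here is FINE for log4j's DEBUG and isLoggable(Level.FINE) for isDebugEnabled(); the class name is invented.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Demonstrates the guard-plus-level pattern: after setting the logger's
// level to FINE (the analogue of log4j DEBUG), the is-enabled check passes
// and the guarded debug statement runs.
public class LogLevelDemo {
    static final Logger LOG = Logger.getLogger(LogLevelDemo.class.getName());

    public static boolean debugEnabledAfterSetLevel() {
        LOG.setLevel(Level.FINE);          // ~ LOG.setLevel(Level.DEBUG) in log4j 1.x
        if (LOG.isLoggable(Level.FINE)) {  // ~ LOG.isDebugEnabled()
            LOG.fine("This is a logged line");
            return true;
        }
        return false;
    }
}
```

So yes, as far as the level check goes: setting the level to DEBUG does make the is-enabled guard return true. Whether the line then actually appears in syslog is a separate question of where the handlers/appenders write, which is what Ashish's answer addresses.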