Re: How to utilize combiners
Hi Kyle,

combiners are set by the user, as you recognized, and called automatically by the infrastructure at different points along the message path. Combined messages are passed transparently to the compute method (that is, a vertex receives fewer messages than it would have received without a combiner). Have a look at the PageRank examples and benchmark code.

Best,
Claudio

On Tue, Aug 20, 2013 at 8:51 PM, Kyle Orlando kyle.r.orla...@gmail.com wrote:

Hey all,

I was wondering if there was any example code I could look at that uses a combiner. Creating your own Combiner is easy enough, e.g. DoubleSumCombiner, but I am confused as to how and where I would use these classes in my code. For example, say I wanted to use the DoubleSumCombiner class to sum up all of the messages arriving at a particular vertex at the beginning of the superstep, and I wanted to do this for each vertex in the graph. Where should I instantiate a DoubleSumCombiner, and when should I call the combine() and createInitialMessage() methods in the compute() method?

What further confuses me is that the MasterCompute class has setCombiner() and getCombiner() methods, and that there is also a command-line option -c to specify a Combiner. I'm not really sure if these are even necessary, but if they are, I don't know how they come into play either.

Some clarification or direction towards an example would be nice!

Thanks,
--
Kyle Orlando
Computer Engineering Major
University of Maryland

--
Claudio Martella
claudio.marte...@gmail.com
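Because the combiner is registered rather than called from user code, it can help to see both halves side by side. The following plain-Java sketch has no Giraph dependency: `MessageCombinerSketch`, `DoubleMessage` and `CombiningMessageStore` are hypothetical stand-ins, loosely modelled on the `combine()`/`createInitialMessage()` pair Kyle mentions, and the store plays the role of the infrastructure that folds messages per target vertex before compute() runs. In a real job you would write only the combiner class and register it (for instance with the -c option), not invoke it yourself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {

    // Hypothetical combiner contract, loosely mirroring combine()/createInitialMessage().
    // This is NOT Giraph's actual interface.
    interface MessageCombinerSketch<I, M> {
        // Fold messageToCombine into originalMessage in place.
        void combine(I vertexId, M originalMessage, M messageToCombine);

        // Identity element for the fold (0.0 for a sum).
        M createInitialMessage();
    }

    // Mutable holder, because the combiner updates the running message in place.
    static final class DoubleMessage {
        double value;

        DoubleMessage(double value) {
            this.value = value;
        }
    }

    // User code: the only part you would write yourself.
    static final class DoubleSumCombiner implements MessageCombinerSketch<Long, DoubleMessage> {
        @Override
        public void combine(Long vertexId, DoubleMessage original, DoubleMessage toCombine) {
            original.value += toCombine.value;
        }

        @Override
        public DoubleMessage createInitialMessage() {
            return new DoubleMessage(0.0);
        }
    }

    // Hypothetical "infrastructure" side: keeps one combined message per target vertex
    // instead of a list, so compute() later sees at most one message per vertex.
    static final class CombiningMessageStore {
        private final MessageCombinerSketch<Long, DoubleMessage> combiner;
        private final Map<Long, DoubleMessage> combined = new HashMap<>();

        CombiningMessageStore(MessageCombinerSketch<Long, DoubleMessage> combiner) {
            this.combiner = combiner;
        }

        // Called whenever some vertex sends a message; the combiner runs here, not in compute().
        void sendMessage(long targetId, double value) {
            DoubleMessage current =
                combined.computeIfAbsent(targetId, id -> combiner.createInitialMessage());
            combiner.combine(targetId, current, new DoubleMessage(value));
        }

        // What the compute method of vertex `id` would receive.
        List<Double> messagesFor(long id) {
            DoubleMessage m = combined.get(id);
            if (m == null) {
                return List.of();
            }
            return List.of(m.value);
        }
    }

    public static void main(String[] args) {
        CombiningMessageStore store = new CombiningMessageStore(new DoubleSumCombiner());
        store.sendMessage(1L, 0.25);
        store.sendMessage(1L, 0.5);
        store.sendMessage(2L, 1.0);
        System.out.println(store.messagesFor(1L)); // [0.75]
        System.out.println(store.messagesFor(2L)); // [1.0]
        System.out.println(store.messagesFor(3L)); // []
    }
}
```

Combining as messages arrive means the per-vertex state is one running value rather than a list. This only works because a sum is commutative and associative, which is why a combiner should not depend on the order in which messages arrive.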
RE: Dynamic Graphs
Dear Mr. Martella,

Once the conditions for updating the vertex database are met, what is the best way for the Injector Vertex to call an input reader again? I am able to access all the HDFS data, but I guess the vertex would need access to the input splits and also to the vertex input format that I designate. Am I correct? Or is there a way to just ask ZooKeeper to create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

From: Claudio Martella claudio.marte...@gmail.com
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be to have a specific (pseudo-)vertex act as the injector of the new vertices and edges. For example, it would read a file from HDFS and call the mutable API during the computation, superstep after superstep.

On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By dynamic I mean graphs that may change while Giraph is computing/deliberating. The changes are in the input file and are not caused by the graph computation itself. Is there any way to analyse such graphs using Giraph? If not, does anyone have an idea or suggestion as to whether it is possible to modify the framework to process them?

Best Regards,
Marco Lotz

--
Claudio Martella
claudio.marte...@gmail.com
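The injector idea can be sketched in plain Java: on every superstep the compute logic checks whether it is running on the designated pseudo-vertex and, if so, issues the batch of changes scheduled for that superstep. `INJECTOR_ID`, `GraphMutator`, `EdgeUpdate` and the in-memory `updatesBySuperstep` map are all hypothetical; in a real job the calls would go to Giraph's graph-mutation API and the batches would come from a file on HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InjectorSketch {

    // Hypothetical ID reserved for the injector pseudo-vertex.
    static final long INJECTOR_ID = -1L;

    // Hypothetical stand-in for the graph-mutation calls available during compute().
    interface GraphMutator {
        void addEdge(long sourceId, long targetId);
    }

    // One pending change: an edge to add. A fuller version would also cover
    // vertex additions and removals.
    static final class EdgeUpdate {
        final long sourceId;
        final long targetId;

        EdgeUpdate(long sourceId, long targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    // Called once per vertex per superstep; only the injector issues mutations.
    // Returns the number of mutation requests issued.
    static int compute(long vertexId, long superstep,
                       Map<Long, List<EdgeUpdate>> updatesBySuperstep,
                       GraphMutator mutator) {
        if (vertexId != INJECTOR_ID) {
            return 0; // ordinary vertices would run the normal algorithm here
        }
        List<EdgeUpdate> batch = updatesBySuperstep.getOrDefault(superstep, List.of());
        for (EdgeUpdate u : batch) {
            mutator.addEdge(u.sourceId, u.targetId);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        Map<Long, List<EdgeUpdate>> updates = Map.of(
            1L, List.of(new EdgeUpdate(1, 2), new EdgeUpdate(2, 3)),
            3L, List.of(new EdgeUpdate(3, 1)));
        List<String> issued = new ArrayList<>();
        GraphMutator recorder = (s, t) -> issued.add(s + "->" + t);
        for (long superstep = 0; superstep < 4; superstep++) {
            compute(INJECTOR_ID, superstep, updates, recorder);
            compute(42L, superstep, updates, recorder); // no-op for a normal vertex
        }
        System.out.println(issued); // [1->2, 2->3, 3->1]
    }
}
```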
Re: Dynamic Graphs
As I said, the injection of the new vertices/edges would have to be done manually, hence without any support from the infrastructure. I'd suggest you implement a WorkerContext class that supports reading a specific file with a specific format (under your control) from HDFS, and that is accessed by this particular special vertex (e.g. based on the vertex ID).

Does this make sense?

On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Dear Mr. Martella,

Once the conditions for updating the vertex database are met, what is the best way for the Injector Vertex to call an input reader again? I am able to access all the HDFS data, but I guess the vertex would need access to the input splits and also to the vertex input format that I designate. Am I correct? Or is there a way to just ask ZooKeeper to create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

--
From: Claudio Martella claudio.marte...@gmail.com
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be to have a specific (pseudo-)vertex act as the injector of the new vertices and edges. For example, it would read a file from HDFS and call the mutable API during the computation, superstep after superstep.

On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk wrote:

Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By dynamic I mean graphs that may change while Giraph is computing/deliberating. The changes are in the input file and are not caused by the graph computation itself. Is there any way to analyse such graphs using Giraph? If not, does anyone have an idea or suggestion as to whether it is possible to modify the framework to process them?

Best Regards,
Marco Lotz

--
Claudio Martella
claudio.marte...@gmail.com

--
Claudio Martella
claudio.marte...@gmail.com
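For the WorkerContext approach, the part under your control is the file format and its parser. The sketch below assumes a hypothetical tab-separated format, `superstep<TAB>sourceId<TAB>targetId`, and groups lines by superstep so the special vertex can ask for the changes due now. A `Reader` stands in for the HDFS input stream (with Hadoop you would wrap the stream returned by `FileSystem.open(...)`), and `UpdateFeed` is an illustrative name, not a Giraph class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UpdateFeed {

    // Edge additions {sourceId, targetId}, keyed (and ordered) by the superstep
    // at which they should be injected.
    private final Map<Long, List<long[]>> bySuperstep = new TreeMap<>();

    // Parses "superstep<TAB>sourceId<TAB>targetId" lines; blank lines and '#' comments are skipped.
    public static UpdateFeed load(Reader in) throws IOException {
        UpdateFeed feed = new UpdateFeed();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] f = line.split("\t");
                if (f.length != 3) {
                    throw new IOException("line " + lineNo + ": expected 3 fields, got " + f.length);
                }
                long superstep = Long.parseLong(f[0]);
                long[] edge = {Long.parseLong(f[1]), Long.parseLong(f[2])};
                feed.bySuperstep.computeIfAbsent(superstep, k -> new ArrayList<>()).add(edge);
            }
        }
        return feed;
    }

    // Changes due at the given superstep (empty if none), as a read-only view.
    public List<long[]> updatesFor(long superstep) {
        return Collections.unmodifiableList(bySuperstep.getOrDefault(superstep, List.of()));
    }

    public static void main(String[] args) throws IOException {
        String file = "# superstep\tsource\ttarget\n"
                    + "1\t10\t20\n"
                    + "1\t20\t30\n"
                    + "4\t30\t10\n";
        UpdateFeed feed = UpdateFeed.load(new StringReader(file));
        System.out.println(feed.updatesFor(1).size());    // 2
        System.out.println(feed.updatesFor(2).size());    // 0
        System.out.println(feed.updatesFor(4).get(0)[1]); // 10
    }
}
```

Keeping the format this simple (one change per line, explicitly tagged with its superstep) makes the injection schedule deterministic and easy to validate before the job starts.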
Re: MultiVertexInputFormat
Hi Yasser,

You can do this through the Configuration parameters. You should call:

    description1.addParameter("myApplication.vertexInputPath", "file1.txt");

and

    description2.addParameter("myApplication.vertexInputPath", "file2.txt");

Then, from the code of your InputFormat class, you can read this parameter from the Configuration. If it doesn't already, make sure your InputFormat implements ImmutableClassesGiraphConfigurable, so that the configuration is set on it automatically. You can also take a look at HiveGiraphRunner, which uses multiple inputs and sets parameters the user passes from the command line.

Hope this helps,
Maja

From: Yasser Altowim yasser.alto...@ericsson.com
Reply-To: user@giraph.apache.org
Date: Monday, August 19, 2013 9:16 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Hi Guys,

Any help on this will be appreciated. I am repeating my question and my code below:

I am implementing an algorithm in Giraph that reads the vertex values from two input files, each with its own format. I am not using any EdgeInputFormatClass. I am now using VertexInputFormatDescription along with MultiVertexInputFormat, but still could not figure out how to set the vertex input path for each input format class. Can you please take a look at my code below and show me how to set the vertex input path? I have taken a look at HiveGiraphRunner but still no luck.
Thanks

    if (null == getConf()) {
        conf = new Configuration();
    }
    GiraphConfiguration gconf = new GiraphConfiguration(getConf());
    int workers = Integer.parseInt(arg0[2]);
    gconf.setWorkerConfiguration(workers, workers, 100.0f);
    List<VertexInputFormatDescription> vertexInputDescriptions = Lists.newArrayList();
    // Input one
    VertexInputFormatDescription description1 =
        new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
    // how to set the vertex input path? i.e. how to say that I want to read
    // file1.txt using this input format class
    vertexInputDescriptions.add(description1);
    // Input two
    VertexInputFormatDescription description2 =
        new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description2);
    GiraphConstants.VERTEX_INPUT_FORMAT_CLASS.set(gconf, MultiVertexInputFormat.class);
    VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
        InputFormatDescription.toJsonString(vertexInputDescriptions));
    gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
    gconf.setComputationClass(UseCase1Vertex.class);
    GiraphJob job = new GiraphJob(gconf, "Use Case 1");
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
    return job.run(true) ? 0 : -1;

Thanks in advance.

Best,
Yasser

From: Yasser Altowim [mailto:yasser.alto...@ericsson.com]
Sent: Friday, August 16, 2013 11:36 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Thanks a lot, Avery, for your response. I am now using VertexInputFormatDescription, but still could not figure out how to set the vertex input path. I just need to read the vertex values from two different files, each with its own format. I am not using any EdgeInputFormatClass. Can you please take a look at my code below and show me how to set the vertex input path?
Thanks

    if (null == getConf()) {
        conf = new Configuration();
    }
    GiraphConfiguration gconf = new GiraphConfiguration(getConf());
    int workers = Integer.parseInt(arg0[2]);
    gconf.setWorkerConfiguration(workers, workers, 100.0f);
    List<VertexInputFormatDescription> vertexInputDescriptions = Lists.newArrayList();
    // Input one
    VertexInputFormatDescription description1 =
        new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description1);
    // Input two
    VertexInputFormatDescription description2 =
        new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description2);
    VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
        InputFormatDescription.toJsonString(vertexInputDescriptions));
    gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
    gconf.setComputationClass(UseCase1Vertex.class);
    GiraphJob job = new GiraphJob(gconf, "Use Case 1");
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
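To see why Maja's suggestion gives each input format its own path, here is a plain-Java model of per-description parameters. Each description carries its own key/value pairs, and the format it describes is handed a copy of the job configuration with those pairs overlaid, so both formats can read the same key, `myApplication.vertexInputPath`, and still get different values. A `Map` stands in for Hadoop's `Configuration`; `Description`, `configurationFor` and `inputPath` are hypothetical names for illustration. This is an assumption about the mechanism, not Giraph's `InputFormatDescription` source.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerInputParametersSketch {

    // Key taken from Maja's example.
    static final String INPUT_PATH_KEY = "myApplication.vertexInputPath";

    // Hypothetical description: an input format name plus its own parameters.
    static final class Description {
        final String inputFormat;
        final Map<String, String> parameters = new LinkedHashMap<>();

        Description(String inputFormat) {
            this.inputFormat = inputFormat;
        }

        void addParameter(String name, String value) {
            parameters.put(name, value);
        }
    }

    // Copy of the shared job configuration, overlaid with one description's parameters.
    // The shared map itself is never modified, so the two inputs cannot clash.
    static Map<String, String> configurationFor(Map<String, String> jobConf, Description d) {
        Map<String, String> conf = new HashMap<>(jobConf);
        conf.putAll(d.parameters);
        return conf;
    }

    // What an input format would do: read its path from the configuration it was given.
    static String inputPath(Map<String, String> conf) {
        String path = conf.get(INPUT_PATH_KEY);
        if (path == null) {
            throw new IllegalStateException(INPUT_PATH_KEY + " is not set");
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical shared setting, visible to both inputs.
        Map<String, String> jobConf = Map.of("example.sharedSetting", "shared");
        Description d1 = new Description("UseCase1FirstVertexInputFormat");
        d1.addParameter(INPUT_PATH_KEY, "file1.txt");
        Description d2 = new Description("UseCase1SecondVertexInputFormat");
        d2.addParameter(INPUT_PATH_KEY, "file2.txt");
        for (Description d : List.of(d1, d2)) {
            System.out.println(d.inputFormat + " reads " + inputPath(configurationFor(jobConf, d)));
        }
    }
}
```

Inside a real VertexInputFormat, the equivalent of `inputPath(...)` would be a lookup on the Configuration the framework sets on your format, which is why Maja points out that the format must be configurable.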