Re: Running Giraph on YARN
Hi Devs,

I was able to get Giraph running on YARN by creating the YARN-specific configuration programmatically. I think it would be better to have some shell scripts specific to YARN (or maybe modifications to the existing shell scripts), so that we can easily deploy Giraph jobs on YARN clusters.

Please let me know if anyone is working on this. If no one is, I would like to work on it.

Thanks
Milinda

On Tue, Aug 13, 2013 at 2:33 PM, Milinda Pathirage mpath...@umail.iu.edu wrote:

Hi,

I'm trying to get Giraph running on YARN based on the TestYarnJob test case, but I'm having issues moving the required jars to the YARN environment. I'm using a single-node YARN setup. I can see the job in YARN, but with the following error:

    Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

While debugging, I found that FileSystem.get(giraphConf) returns LocalFS inside the resource copy methods.

Can someone please point me to a doc or write-up that describes how to properly configure GiraphYarnClient?

Thanks
Milinda

--
Milinda Pathirage
twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
RE: Workers input splits and MasterCompute communication
Hello all :)

I am having problems calling getContext().getInputSplit() inside the compute() method in the workers. It always returns as if it didn't get any split at all: inputSplit.getLocations() returns without the hosts that should have that split as local, and inputSplit.getLength() returns 0.

Should there be any initialization of the workers' context so that I can get this information? Is there any way to access the jobContext from the workers or the Master?

Best Regards,
Marco Lotz

From: Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
Sent: 17 August 2013 20:20
To: user@giraph.apache.org
Subject: Workers input splits and MasterCompute communication

Hello all :)

In which class do the workers actually get the input file splits from the file system?

Is it possible for a MasterCompute class object to have access to, or communicate with, the workers in that job? I thought about using aggregators, but then I assumed that aggregators actually work with the vertices' compute() (and related methods) and not with the worker itself.

When I say workers I don't mean the vertices in each worker, but the object that runs the compute for all the vertices in that worker.

Best Regards,
Marco Lotz
RE: MultiVertexInputFormat
Hi Guys,

Any help on this will be appreciated. I am repeating my question and my code below:

I am implementing an algorithm in Giraph that reads the vertex values from two input files, each with its own format. I am not using any EdgeInputFormatClass. I am now using VertexInputFormatDescription along with MultiVertexInputFormat, but still could not figure out how to set the vertex input path for each input format class. Can you please take a look at my code below and show me how to set the vertex input path? I have taken a look at HiveGiraphRunner but still no luck. Thanks

    if (null == getConf()) {
        conf = new Configuration();
    }

    GiraphConfiguration gconf = new GiraphConfiguration(getConf());
    int workers = Integer.parseInt(arg0[2]);
    gconf.setWorkerConfiguration(workers, workers, 100.0f);

    List<VertexInputFormatDescription> vertexInputDescriptions = Lists.newArrayList();

    // Input one
    VertexInputFormatDescription description1 =
        new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
    // how to set the vertex input path? i.e. how to say that I want to
    // read file1.txt using this input format class
    vertexInputDescriptions.add(description1);

    // Input two
    VertexInputFormatDescription description2 =
        new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description2);

    GiraphConstants.VERTEX_INPUT_FORMAT_CLASS.set(gconf, MultiVertexInputFormat.class);
    VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
        InputFormatDescription.toJsonString(vertexInputDescriptions));

    gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
    gconf.setComputationClass(UseCase1Vertex.class);

    GiraphJob job = new GiraphJob(gconf, "Use Case 1");
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
    return job.run(true) ? 0 : -1;

Thanks in advance.

Best,
Yasser

From: Yasser Altowim [mailto:yasser.alto...@ericsson.com]
Sent: Friday, August 16, 2013 11:36 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Thanks a lot Avery for your response. I am now using VertexInputFormatDescription, but still could not figure out how to set the vertex input path. I just need to read the vertex values from two different files, each with its own format. I am not using any EdgeInputFormatClass.

Can you please take a look at my code below and show me how to set the vertex input path? Thanks

    if (null == getConf()) {
        conf = new Configuration();
    }

    GiraphConfiguration gconf = new GiraphConfiguration(getConf());
    int workers = Integer.parseInt(arg0[2]);
    gconf.setWorkerConfiguration(workers, workers, 100.0f);

    List<VertexInputFormatDescription> vertexInputDescriptions = Lists.newArrayList();

    // Input one
    VertexInputFormatDescription description1 =
        new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description1);

    // Input two
    VertexInputFormatDescription description2 =
        new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
    // how to set the vertex input path?
    vertexInputDescriptions.add(description2);

    VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
        InputFormatDescription.toJsonString(vertexInputDescriptions));

    gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
    gconf.setComputationClass(UseCase1Vertex.class);

    GiraphJob job = new GiraphJob(gconf, "Use Case 1");
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
    return job.run(true) ? 0 : -1;

Best,
Yasser

From: Avery Ching [mailto:ach...@apache.org]
Sent: Friday, August 16, 2013 9:50 AM
To: user@giraph.apache.org
Subject: Re: MultiVertexInputFormat

This is doable in Giraph; you can use as many vertex or edge input formats as you like (via GIRAPH-639). You just need to choose MultiVertexInputFormat and/or MultiEdgeInputFormat.

See VertexInputFormatDescription for vertex input formats:

    /**
     * VertexInputFormats description - JSON array containing a JSON array for
     * each vertex input. Vertex input JSON arrays contain one or two elements -
     * first one is the name of vertex input class, and second one is JSON object
     * with all specific parameters for this vertex input. For example:
     * [["VIF1",{"p":"v1"}],["VIF2",{"p":"v2","q":"v"}]]
     */
    public static final StrConfOption VERTEX_INPUT_FORMAT_DESCRIPTIONS =
        new StrConfOption("giraph.multiVertexInput.descriptions", null,
            "VertexInputFormats description - JSON array containing
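
Editor's sketch, not part of the original thread: the Javadoc quoted above says each vertex input is described by a JSON array holding the input format class name and, optionally, a JSON object of parameters specific to that input. The plain-Java snippet below only illustrates that string layout, so that a per-input path could in principle travel as one of those parameters. It does not use the Giraph API. The class and method names (MultiInputDescriptionSketch, describe, describeAll) are hypothetical, and the parameter key "giraph.vertex.input.dir" is an assumption that should be checked against the Giraph version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiInputDescriptionSketch {
    // ASSUMPTION: key that file-based vertex input formats read their path from.
    // Verify against your Giraph version before relying on it.
    public static final String ASSUMED_VERTEX_INPUT_DIR_KEY = "giraph.vertex.input.dir";

    /** Quote a string as a JSON string literal (escapes backslash and double quote only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Build one entry: ["className",{"k":"v",...}], or ["className"] when there are no parameters. */
    public static String describe(String className, Map<String, String> params) {
        StringBuilder sb = new StringBuilder("[").append(quote(className));
        if (!params.isEmpty()) {
            List<String> pairs = new ArrayList<>();
            for (Map.Entry<String, String> e : params.entrySet()) {
                pairs.add(quote(e.getKey()) + ":" + quote(e.getValue()));
            }
            sb.append(",{").append(String.join(",", pairs)).append("}");
        }
        return sb.append("]").toString();
    }

    /** Wrap the per-input entries in the outer JSON array. */
    public static String describeAll(List<String> entries) {
        return "[" + String.join(",", entries) + "]";
    }

    public static void main(String[] args) {
        // Hypothetical paths; the class names are the ones from the question above.
        Map<String, String> first = new LinkedHashMap<>();
        first.put(ASSUMED_VERTEX_INPUT_DIR_KEY, "/input/file1.txt");
        Map<String, String> second = new LinkedHashMap<>();
        second.put(ASSUMED_VERTEX_INPUT_DIR_KEY, "/input/file2.txt");

        List<String> entries = new ArrayList<>();
        entries.add(describe("UseCase1FirstVertexInputFormat", first));
        entries.add(describe("UseCase1SecondVertexInputFormat", second));

        // This string has the shape described for giraph.multiVertexInput.descriptions.
        System.out.println(describeAll(entries));
    }
}
```

In a real job the description objects would normally produce this string themselves (the code above calls InputFormatDescription.toJsonString for that), so hand-building the JSON is only useful for understanding or debugging what ends up in the configuration.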
New vertex allocation and messages
Hello all :)

I am programming an application that has to create and destroy a few vertices. I was wondering if there is any protection in Giraph to prevent a vertex from sending a message to another vertex that does not exist (i.e. providing a vertex id that is not associated with a vertex yet).

Is there a way to test whether the destination vertex exists before sending the message to it?

Also, when a vertex is created, is there any form of load balancing, or is it always kept on the worker that created it?

Best Regards,
Marco Lotz
Re: Workers input splits and MasterCompute communication
That makes sense, since the Context doesn't have a real InputSplit (it's a Giraph one - see BspInputSplit). What information are you trying to get out of the input splits? Giraph workers can process an arbitrary number of input splits (0 or more), so I don't think this will be useful.

You can use Configuration if you need to set some information at runtime.

Avery

On 8/19/13 9:14 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am having problems calling getContext().getInputSplit() inside the compute() method in the workers. It always returns as if it didn't get any split at all: inputSplit.getLocations() returns without the hosts that should have that split as local, and inputSplit.getLength() returns 0.

Should there be any initialization of the workers' context so that I can get this information? Is there any way to access the jobContext from the workers or the Master?

Best Regards,
Marco Lotz

From: Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
Sent: 17 August 2013 20:20
To: user@giraph.apache.org
Subject: Workers input splits and MasterCompute communication

Hello all :)

In which class do the workers actually get the input file splits from the file system?

Is it possible for a MasterCompute class object to have access to, or communicate with, the workers in that job? I thought about using aggregators, but then I assumed that aggregators actually work with the vertices' compute() (and related methods) and not with the worker itself.

When I say workers I don't mean the vertices in each worker, but the object that runs the compute for all the vertices in that worker.

Best Regards,
Marco Lotz
Re: New vertex allocation and messages
Yes, you can control this behavior with the VertexResolver. It handles all mutations to the graph and resolves them in a user-defined way.

Avery

On 8/19/13 9:21 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am programming an application that has to create and destroy a few vertices. I was wondering if there is any protection in Giraph to prevent a vertex from sending a message to another vertex that does not exist (i.e. providing a vertex id that is not associated with a vertex yet).

Is there a way to test whether the destination vertex exists before sending the message to it?

Also, when a vertex is created, is there any form of load balancing, or is it always kept on the worker that created it?

Best Regards,
Marco Lotz
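
Editor's sketch, not part of the original thread: the reply above says a resolver decides, in a user-defined way, what happens when the graph is mutated, which includes the case of a message addressed to a vertex id that does not exist yet. The plain-Java model below illustrates that idea only. It is not the Giraph VertexResolver interface; every type and name in it (ResolverSketch, SimpleVertex, MissingVertexPolicy, deliver) is hypothetical. Whether Giraph's default resolver creates such vertices or drops the messages should be checked in the documentation for the version in use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResolverSketch {
    /** Hypothetical stand-in for a vertex: an id plus an integer value. */
    public static final class SimpleVertex {
        public final long id;
        public int value;

        public SimpleVertex(long id, int value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Hypothetical user-defined policy for messages sent to a missing vertex. */
    public interface MissingVertexPolicy {
        /** Return the vertex to create for targetId, or null to drop the messages. */
        SimpleVertex resolve(long targetId, List<Integer> messages);
    }

    /** Policy 1: create the missing vertex with an initial value of 0. */
    public static final MissingVertexPolicy CREATE_ON_MESSAGE =
        (targetId, messages) -> new SimpleVertex(targetId, 0);

    /** Policy 2: never create vertices; messages to unknown ids are dropped. */
    public static final MissingVertexPolicy DROP_MESSAGES =
        (targetId, messages) -> null;

    /**
     * Deliver messages to targetId, consulting the policy when the vertex is missing.
     * Delivery here simply adds each message to the vertex value.
     * Returns the number of messages actually delivered.
     */
    public static int deliver(Map<Long, SimpleVertex> graph, long targetId,
                              List<Integer> messages, MissingVertexPolicy policy) {
        SimpleVertex target = graph.get(targetId);
        if (target == null) {
            target = policy.resolve(targetId, messages);
            if (target == null) {
                return 0; // policy chose to drop the messages
            }
            graph.put(targetId, target);
        }
        for (int message : messages) {
            target.value += message;
        }
        return messages.size();
    }

    public static void main(String[] args) {
        Map<Long, SimpleVertex> graph = new HashMap<>();
        List<Integer> messages = Arrays.asList(1, 2);

        int dropped = deliver(graph, 7L, messages, DROP_MESSAGES);
        System.out.println("delivered with drop policy: " + dropped);

        int created = deliver(graph, 7L, messages, CREATE_ON_MESSAGE);
        System.out.println("delivered with create policy: " + created);
    }
}
```

The point of the model is that the sender does not test for existence before sending; the decision is made in one place when the messages are resolved against the graph.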