Re: Running Giraph on YARN

2013-08-19 Thread Milinda Pathirage
Hi Devs,

I was able to get Giraph running on YARN by creating the YARN-specific
configuration programmatically. I think it would be better to have
some shell scripts specific to YARN (or maybe modifications to the
existing shell scripts), so that we can easily deploy Giraph jobs on
YARN clusters. Please let me know if anyone is working on this. If no
one is, I would like to work on it.

Thanks
Milinda

On Tue, Aug 13, 2013 at 2:33 PM, Milinda Pathirage
mpath...@umail.iu.edu wrote:
 Hi,

 I'm trying to get Giraph running on YARN based on the TestYarnJob test
 case, but I'm having issues moving the required jars to the YARN
 environment. I'm using a single-node YARN setup. I can see the job in
 YARN, but it fails with the following error.

 Error: Could not find or load main class
 org.apache.giraph.yarn.GiraphApplicationMaster

 While debugging, I found that FileSystem.get(giraphConf) returns LocalFS
 inside the resource copy methods. Can someone please point me to a
 doc or write-up that describes how to properly configure
 GiraphYarnClient?

 Thanks
 Milinda

 --
 Milinda Pathirage

 twitter: milindalakmal
 skype: milinda.pathirage
 blog: http://milinda.pathirage.org



-- 
Milinda Pathirage

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org


RE: Workers input splits and MasterCompute communication

2013-08-19 Thread Marco Aurelio Barbosa Fagnani Lotz
Hello all :)

I am having problems calling getContext().getInputSplit() inside the compute()
method in the workers.

It always behaves as if it didn't get any split at all:
inputSplit.getLocations() returns none of the hosts that should have that split
as local, and inputSplit.getLength() returns 0.

Does the worker context need any initialization so that I can get
this information?
Is there any way to access the jobContext from the workers or the master?

Best Regards,
Marco Lotz


From: Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
Sent: 17 August 2013 20:20
To: user@giraph.apache.org
Subject: Workers input splits and MasterCompute communication

Hello all :)

In which class do the workers actually get the input file splits from the file
system?

Is it possible for a MasterCompute object to access or communicate with the
workers in that job? I thought about using aggregators, but then I assumed
that aggregators work with the vertices' compute() (and related methods) and
not with the worker itself.

By workers I don't mean the vertices in each worker, but the object
that runs the compute for all the vertices in that worker.

Best Regards,
Marco Lotz


RE: MultiVertexInputFormat

2013-08-19 Thread Yasser Altowim
Hi Guys,

 Any help on this will be appreciated. I am repeating my question and my 
code below:


I am implementing an algorithm in Giraph that reads the vertex values from two
input files, each with its own format. I am not using any EdgeInputFormatClass.
I am now using VertexInputFormatDescription along with MultiVertexInputFormat,
but still could not figure out how to set the vertex input path for each input
format class. Can you please take a look at my code below and show me how to
set the vertex input path? I have taken a look at HiveGiraphRunner but still no
luck. Thanks

if (null == getConf()) {
    conf = new Configuration();
}

GiraphConfiguration gconf = new GiraphConfiguration(getConf());
int workers = Integer.parseInt(arg0[2]);
gconf.setWorkerConfiguration(workers, workers, 100.0f);

List<VertexInputFormatDescription> vertexInputDescriptions =
    Lists.newArrayList();

// Input one
VertexInputFormatDescription description1 =
    new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
// how to set the vertex input path? i.e. how to say that I want to read
// file1.txt using this input format class
vertexInputDescriptions.add(description1);

// Input two
VertexInputFormatDescription description2 =
    new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
// how to set the vertex input path?
vertexInputDescriptions.add(description2);

GiraphConstants.VERTEX_INPUT_FORMAT_CLASS.set(gconf,
    MultiVertexInputFormat.class);

VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
    InputFormatDescription.toJsonString(vertexInputDescriptions));

gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
gconf.setComputationClass(UseCase1Vertex.class);
GiraphJob job = new GiraphJob(gconf, "Use Case 1");
FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
return job.run(true) ? 0 : -1;


Thanks in advance.

Best,
Yasser

From: Yasser Altowim [mailto:yasser.alto...@ericsson.com]
Sent: Friday, August 16, 2013 11:36 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Thanks a lot Avery for your response. I am now using
VertexInputFormatDescription, but still could not figure out how to set the
vertex input path. I just need to read the vertex values from two different
files, each with its own format. I am not using any EdgeInputFormatClass.

Can you please take a look at my code below and show me how to set the
vertex input path? Thanks


if (null == getConf()) {
    conf = new Configuration();
}

GiraphConfiguration gconf = new GiraphConfiguration(getConf());
int workers = Integer.parseInt(arg0[2]);
gconf.setWorkerConfiguration(workers, workers, 100.0f);

List<VertexInputFormatDescription> vertexInputDescriptions =
    Lists.newArrayList();

// Input one
VertexInputFormatDescription description1 =
    new VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
// how to set the vertex input path?
vertexInputDescriptions.add(description1);

// Input two
VertexInputFormatDescription description2 =
    new VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
// how to set the vertex input path?
vertexInputDescriptions.add(description2);

VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,
    InputFormatDescription.toJsonString(vertexInputDescriptions));

gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
gconf.setComputationClass(UseCase1Vertex.class);
GiraphJob job = new GiraphJob(gconf, "Use Case 1");
FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
return job.run(true) ? 0 : -1;



Best,
Yasser

From: Avery Ching [mailto:ach...@apache.org]
Sent: Friday, August 16, 2013 9:50 AM
To: user@giraph.apache.org
Subject: Re: MultiVertexInputFormat

This is doable in Giraph: you can use as many vertex or edge input formats as
you like (via GIRAPH-639). You just need to choose MultiVertexInputFormat
and/or MultiEdgeInputFormat.

See VertexInputFormatDescription for vertex input formats:

  /**
   * VertexInputFormats description - JSON array containing a JSON array for
   * each vertex input. Vertex input JSON arrays contain one or two elements -
   * first one is the name of vertex input class, and second one is JSON object
   * with all specific parameters for this vertex input. For example:
   * [["VIF1",{"p":"v1"}],["VIF2",{"p":"v2","q":"v"}]]
   */
  public static final StrConfOption VERTEX_INPUT_FORMAT_DESCRIPTIONS =
  new StrConfOption("giraph.multiVertexInput.descriptions", null,
  VertexInputFormats description - JSON array containing 
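
To make the shape of that descriptions string concrete, here is a minimal, self-contained sketch that builds the example from the comment above with plain Java. It only illustrates the format and is not Giraph code: the class DescriptionsJsonSketch and its helpers quote, entry and descriptions are hypothetical names, and it assumes class names, parameter keys and parameter values are all JSON strings. In a real job the string would presumably come from InputFormatDescription.toJsonString, as in the code earlier in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, for illustration only; not part of Giraph.
class DescriptionsJsonSketch {
    /** Wraps a string in JSON double quotes, escaping backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one per-input array: [className] or [className,{params}]. */
    static String entry(String inputFormatClass, Map<String, String> params) {
        if (params.isEmpty()) {
            return "[" + quote(inputFormatClass) + "]";
        }
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> p : params.entrySet()) {
            object.add(quote(p.getKey()) + ":" + quote(p.getValue()));
        }
        return "[" + quote(inputFormatClass) + "," + object + "]";
    }

    /** Joins the per-input arrays into the outer JSON array. */
    static String descriptions(String... entries) {
        return "[" + String.join(",", entries) + "]";
    }

    public static void main(String[] args) {
        // "p" and "q" are the placeholder parameter names from the comment.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("p", "v1");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("p", "v2");
        second.put("q", "v");
        System.out.println(
            descriptions(entry("VIF1", first), entry("VIF2", second)));
    }
}
```

Each inner array carries its own parameter object, so settings that differ per input format would travel in that object rather than in one job-wide option. Which parameter key a given input format actually reads is not stated in this thread.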

New vertex allocation and messages

2013-08-19 Thread Marco Aurelio Barbosa Fagnani Lotz
Hello all :)

I am programming an application that has to create and destroy a few vertices.
I was wondering if there is any protection in Giraph to prevent a vertex from
sending a message to another vertex that does not exist (i.e. providing a
vertex id that is not associated with a vertex yet).

Is there a way to test if the destination vertex exists before sending the
message to it?

Also, when a vertex is created, is there any form of load balancing, or is it
always kept in the worker that created it?

Best Regards,
Marco Lotz




Re: Workers input splits and MasterCompute communication

2013-08-19 Thread Avery Ching
That makes sense, since the Context doesn't have a real InputSplit (it's 
a Giraph one - see BspInputSplit).


What information are you trying to get out of the input splits? Giraph 
workers can process an arbitrary number of input splits (0 or more), so 
I don't think this will be useful.


You can use Configuration if you need to set some information at runtime.

Avery

On 8/19/13 9:14 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am having problems calling getContext().getInputSplit() inside the
compute() method in the workers.


It always behaves as if it didn't get any split at all:
inputSplit.getLocations() returns none of the hosts that should have
that split as local, and inputSplit.getLength() returns 0.


Does the worker context need any initialization so that I
can get this information?

Is there any way to access the jobContext from the workers or the master?

Best Regards,
Marco Lotz


*From:* Marco Aurelio Barbosa Fagnani Lotz m.a.b.l...@stu12.qmul.ac.uk
*Sent:* 17 August 2013 20:20
*To:* user@giraph.apache.org
*Subject:* Workers input splits and MasterCompute communication
Hello all :)

In which class do the workers actually get the input file splits from the
file system?


Is it possible for a MasterCompute object to access or communicate
with the workers in that job? I thought about using aggregators, but
then I assumed that aggregators work with the vertices' compute()
(and related methods) and not with the worker itself.


By workers I don't mean the vertices in each worker, but the
object that runs the compute for all the vertices in that worker.


Best Regards,
Marco Lotz




Re: New vertex allocation and messages

2013-08-19 Thread Avery Ching
Yes, you can control this behavior with the VertexResolver. It handles
all mutations to the graph and resolves them in a user-defined way.
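
As a rough illustration of what "resolving in a user-defined way" can mean for messages sent to a vertex id that does not exist yet, here is a small self-contained model in plain Java. It is only a sketch and does not use Giraph's VertexResolver interface: MissingVertexSketch, MissingVertexPolicy and resolve are hypothetical names, vertices are modelled as a map from id to an integer value, and what Giraph's default resolver actually does is not stated in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, for illustration only; not Giraph's VertexResolver.
class MissingVertexSketch {
    /** User-defined policy for ids that have messages but no vertex. */
    interface MissingVertexPolicy {
        /** Returns the initial value for a new vertex, or null to drop. */
        Integer onMessages(String vertexId, List<Integer> messages);
    }

    /** Returns the vertex map after applying the policy to every destination. */
    static Map<String, Integer> resolve(
            Map<String, Integer> vertices,
            Map<String, List<Integer>> pendingMessages,
            MissingVertexPolicy policy) {
        Map<String, Integer> resolved = new TreeMap<>(vertices);
        for (Map.Entry<String, List<Integer>> e : pendingMessages.entrySet()) {
            if (resolved.containsKey(e.getKey())) {
                continue; // destination already exists, nothing to resolve
            }
            Integer initialValue = policy.onMessages(e.getKey(), e.getValue());
            if (initialValue != null) {
                resolved.put(e.getKey(), initialValue); // create the vertex
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Integer> vertices = new TreeMap<>();
        vertices.put("a", 1);
        Map<String, List<Integer>> pending = new TreeMap<>();
        pending.put("a", Arrays.asList(2));
        pending.put("b", Arrays.asList(5)); // "b" does not exist yet

        // Policy 1: create missing destinations with value 0.
        System.out.println(resolve(vertices, pending, (id, msgs) -> 0).keySet());
        // Policy 2: drop messages addressed to missing destinations.
        System.out.println(resolve(vertices, pending, (id, msgs) -> null).keySet());
    }
}
```

The point of the sketch is that the sender does not need to test for existence: the decision to create the destination or drop the message is made in one place, after the messages have been collected.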


Avery

On 8/19/13 9:21 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:

Hello all :)

I am programming an application that has to create and destroy a few
vertices. I was wondering if there is any protection in Giraph to
prevent a vertex from sending a message to another vertex that does not
exist (i.e. providing a vertex id that is not associated with a vertex yet).


Is there a way to test if the destination vertex exists before sending
the message to it?


Also, when a vertex is created, is there any form of load balancing,
or is it always kept in the worker that created it?


Best Regards,
Marco Lotz