Re: Local-only aggregators

2015-03-25 Thread Claudio Martella
Hi,

I'm not sure aggregators necessarily require high traffic. Aggregator values
are combined locally on each worker before they are aggregated on the
(corresponding) master worker.
Anyway, assuming you want to proceed, my understanding is that you want
vertices on the same worker to share (aggregated) information. In that
case, I'd suggest just using a WorkerContext.
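A minimal sketch of the idea (untested; the class and method names below are
invented for illustration, and the exact set of WorkerContext hooks plus the
getWorkerContext()/setWorkerContextClass calls should be checked against your
Giraph version):

import org.apache.giraph.worker.WorkerContext;  // org.apache.giraph.graph.WorkerContext in older releases

public class LocalSumWorkerContext extends WorkerContext {
  // Shared by every vertex computed on this worker; it never leaves the worker.
  private double localSum;

  @Override public void preApplication() { }
  @Override public void postApplication() { }

  @Override public void preSuperstep() {
    localSum = 0;  // reset the worker-local value at the start of each superstep
  }

  @Override public void postSuperstep() { }

  // Synchronized because several compute threads may share one worker.
  public synchronized void add(double value) { localSum += value; }
  public synchronized double get() { return localSum; }
}

Inside compute() you would then do something like

  LocalSumWorkerContext ctx = (LocalSumWorkerContext) getWorkerContext();
  ctx.add(someValue);          // stays on this worker, no network traffic
  double localTotal = ctx.get();

and register the class through the worker-context option of GiraphConfiguration
(setWorkerContextClass, if I remember the setter's name correctly).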

Hope this helps.
Claudio

On Wed, Mar 25, 2015 at 12:47 AM Alessio Arleo ingar...@icloud.com wrote:

 Hello everybody

 I was wondering if it was possible to extend the concept of aggregator
 from a “global” to a “local-only” perspective.

 Normally, aggregators DO cause network traffic because of the cycle:
 Workers -> Aggregator Owner -> MasterAggregator -> Aggregator Owner -> Workers

 What if I’d like to fetch and aggregate values as I would normally do with
 aggregators but without causing this traffic? Let’s assume this situation:

 1 - Define a custom partitioning class and let it partition the graph.
 This is the partition used to assign vertices to workers.
 2 - In the computation class, every time the compute method is called on a
 vertex, the data needed for the computation is stored not only in the vertex's
 neighbours but also in non-neighbouring vertices (think of a force-directed
 layout algorithm, for example: to compute the forces, the distances between
 both neighbouring and non-neighbouring vertices are needed, applying
 different kinds of forces).
 — Given that the compute method is running on vertex X:
 a - I pick information from X's neighbours as I would normally do (iterating
 over its edges or the incoming messages)
 b - When it comes to non-neighbouring vertices, I would like to use data
 from X's worker only.

 The first thing I tried to understand before asking this question was:
 does this make any sense? I may be wrong, but I believe it does. If I
 partition my graph to maximize locality, what I am actually trying to do is
 to reduce the network traffic as much as possible.

 My concern is that if I use aggregators to achieve this, the network
 traffic would be heavy, probably losing the advantages of the initial
 partitioning. What if I could access and modify an aggregator-like local
 data structure in the same fashion (i.e. “getAggregatedValue”) but without
 broadcasting it (assuming that I do not need the aggregator to be
 accessible to every worker)? Or could it be possible to manually assign
 partition owners in order to minimise network traffic (if I need to
 aggregate values from vertices in partition 3 and partition 3 only, I assign
 the partition 3 aggregator owner to the worker holding partition 3)?

 I hope for your understanding, and I hope I somehow caught your attention,
 even if only for a brief moment. Ask me if something is not clear ;)

 Cheers!

 ~~~

 Ing. Alessio Arleo

 PhD candidate in Industrial and Information Engineering

 Master's degree in Computer and Automation Engineering
 Bachelor's degree in Computer and Electronic Engineering

 Linkedin: it.linkedin.com/in/IngArleo
 Skype: Ing. Alessio Arleo

 Tel: +39 075 5853920
 Cell: +39 349 0575782

 ~~~






Re: Custom assignment of partitions to workers

2015-03-25 Thread Lukas Nalezenec

Hi,

There are two interfaces:
WorkerGraphPartitioner - maps vertices to partitions
MasterGraphPartitioner - maps partitions to workers.
So you need a custom MasterGraphPartitioner.
You don't need any external preprocessing step.
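
The master-side partitioner is where partition ids get mapped to workers. A
rough, untested fragment of what the pinning could look like (class and method
names are taken from the 1.1.0 partition/worker packages as far as I recall,
java.util and org.apache.giraph imports are omitted, and the remaining
MasterGraphPartitioner methods still have to be implemented - treat names and
signatures as assumptions to verify against your Giraph version):

  // Inside a class implementing MasterGraphPartitioner<I, V, E>:

  // Hypothetical mapping computed beforehand: partition id -> index into the worker list.
  private Map<Integer, Integer> partitionToWorker;
  private Collection<PartitionOwner> owners;

  @Override
  public Collection<PartitionOwner> createInitialPartitionOwners(
      Collection<WorkerInfo> availableWorkers, int maxWorkers) {
    List<WorkerInfo> workers = new ArrayList<WorkerInfo>(availableWorkers);
    owners = new ArrayList<PartitionOwner>();
    for (Map.Entry<Integer, Integer> entry : partitionToWorker.entrySet()) {
      // BasicPartitionOwner(partitionId, workerInfo) pins that partition to that worker.
      owners.add(new BasicPartitionOwner(entry.getKey(),
          workers.get(entry.getValue())));
    }
    return owners;
  }

If the assignment should never change, the remaining methods can simply keep
returning the same owners collection (and a default PartitionStats).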

Lukas

On 25.3.2015 19:51, Arjun Sharma wrote:

Hi,

I understand we can override the GraphPartitionerFactory class in 
order to achieve custom partitioning of vertices over partitions. Is 
there a way to do the same to enable assigning partitions to workers 
in a custom way (e.g., partition n should be assigned to worker m)? 
The reason is that it is more beneficial to have partitions that are 
close to each other in terms of the graph structure placed close to 
each other on the same worker to minimize network traffic. Otherwise, 
benefits of overriding  GraphPartitionerFactory may be lost. Let us 
assume that there is an external preprocessing step that outputs the 
desired assignment, so can we materialize that assignment over Giraph 
workers?


Thanks!




Re: Giraph 1.1.0 not running on full cluster with Hadoop 2.6.0

2015-03-25 Thread Kenrick Fernandes
Hi Steve,

Thanks for the link - there's a different error I get now regarding not
finding some other classes,
but I've seen that before and should be able to find a fix.

Running PageRank, however, still gives me the *LocalJobRunner* error (above)
- did you get that to run
successfully?

Thanks,
Kenrick

On Tue, Mar 24, 2015 at 3:06 PM, Steven Harenberg sdhar...@ncsu.edu wrote:

 Hey Kenrick,

 For the issue with GiraphApplicationMaster, I followed what Phillip did
 here:
 http://mail-archives.apache.org/mod_mbox/giraph-user/201503.mbox/%3CCAO3ErG_obGV8mELzX1j%2Be%3DaL6C%3D6%3DtdiSOVRBia2gh0H9tYLZA%40mail.gmail.com%3E

 Basically you need the jar for giraph-examples to be in the directory
 where you are issuing the command. You can do this by creating a symbolic
 link. I have no idea why this worked and you can't use an absolute path,
 but that is how it was for me.

 Thanks,
 Steve

 On Mon, Mar 23, 2015 at 7:06 PM, Kenrick Fernandes kenrick@gmail.com
 wrote:

 Hi Phil,

 The build was successful - now running the *ShortestPaths* example gives
 me a different error,
  *GiraphApplicationMaster* not found. However, when I run the PageRank
 benchmark, I still
 get the same *LocalJobRunner* error:

 -
 *Command:*
 hadoop jar giraph-1.1.0-for-hadoop-2.6.0-jar-with-dependencies.jar
 org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 -w 30

 *Error:*
 Exception in thread main java.lang.IllegalArgumentException:
 checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only
 one worker since only 1 task at a time!
 at
 org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:162)
 at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
 at
 org.apache.giraph.benchmark.GiraphBenchmark.run(GiraphBenchmark.java:96)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at
 org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:158)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 -

 Did PageRank run fine for you ?

 Thanks,
 Kenrick

 On Mon, Mar 23, 2015 at 4:35 PM, Phillip Rhodes 
 motley.crue@gmail.com wrote:

 What I had to do to get this to work was:

 edit the pom.xml and change the hadoop_yarn profile to remove the one
 munge symbol that was something like _SASL_SOMETHING_OR_OTHER.

 Build using mvn -Phadoop_yarn -Dhadoop.version=2.5.2 (in my case)


 Phil

 This message optimized for indexing by NSA PRISM


 On Sun, Mar 22, 2015 at 4:28 PM, Kenrick Fernandes
 kenrick@gmail.com wrote:
  Hi,
 
  I am working with Giraph 1.1.0 and a YARN cluster with Hadoop 2.6.0.
  I build Giraph with
   mvn -Phadoop_2 -Dhadoop.version=2.6.0 clean package -DskipTests
 
  So far, when I run any of the benchmarks or Shortest path examples, I
 always
  get the LocalJobRunner error :
 
 
 
 ---
  Exception in thread main java.lang.IllegalArgumentException:
  checkLocalJobRunnerConfiguration: When using LocalJobRunner, you
 cannot run
  in split master / worker mode since there is only 1 task at a time!
  at
 
 org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
  at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
  at
 org.apache.giraph.benchmark.GiraphBenchmark.run(GiraphBenchmark.java:96)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
  at
 
 org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:158)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  ---
 
  I have tried some solutions from forums/StackOverflow/lists, but so far
  nothing has worked. As far as I can tell, Hadoop is configured right
 (other
  MR benchmarks run fine). I tried changing the above Giraph code file
 (making
  the check function return what I wanted), but that only starts the
 Job and
  gets it running on a single machine - it never uses more than 1
 machine.
 
  Any help or pointers in the right direction would be much appreciated.
 
  Thanks,
  Kenrick


Custom assignment of partitions to workers

2015-03-25 Thread Arjun Sharma
Hi,

I understand we can override the GraphPartitionerFactory class in order to
achieve custom partitioning of vertices over partitions. Is there a way to
do the same to enable assigning partitions to workers in a custom way
(e.g., partition n should be assigned to worker m)? The reason is that it
is more beneficial to have partitions that are close to each other in terms
of the graph structure placed on the same worker, to
minimize network traffic. Otherwise, benefits of
overriding  GraphPartitionerFactory may be lost. Let us assume that there
is an external preprocessing step that outputs the desired assignment, so
can we materialize that assignment over Giraph workers?

Thanks!


Re: Custom assignment of partitions to workers

2015-03-25 Thread Lukas Nalezenec

It's the same class.
GraphPartitionerFactory has two methods: one for the
WorkerGraphPartitioner and a second for the MasterGraphPartitioner.


public interface GraphPartitionerFactory ...  {
  MasterGraphPartitioner<I, V, E> createMasterGraphPartitioner();
  WorkerGraphPartitioner<I, V, E> createWorkerGraphPartitioner();
}
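
So a custom factory can return your own master-side partitioner together with
the stock worker-side one, roughly like this (untested sketch:
CustomMasterGraphPartitioner is the hypothetical class that pins partitions to
workers, HashWorkerPartitioner and the generic bounds are my assumptions based
on 1.1.0, and any extra members hidden behind the "..." above would need to be
handled too):

import org.apache.giraph.partition.*;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

public class CustomPartitionerFactory<I extends WritableComparable,
    V extends Writable, E extends Writable>
    implements GraphPartitionerFactory<I, V, E> {

  @Override
  public MasterGraphPartitioner<I, V, E> createMasterGraphPartitioner() {
    // Your class that maps partition ids to workers.
    return new CustomMasterGraphPartitioner<I, V, E>();
  }

  @Override
  public WorkerGraphPartitioner<I, V, E> createWorkerGraphPartitioner() {
    // Keep the default hash-based vertex-to-partition mapping on the workers.
    return new HashWorkerPartitioner<I, V, E>();
  }
}

Then point Giraph at it with
-ca giraph.graphPartitionerFactoryClass=your.package.CustomPartitionerFactory
(or the corresponding setter on GiraphConfiguration).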

Lukas

On 25.3.2015 20:57, Arjun Sharma wrote:
Thanks for your reply! I understand we can supply the 
GraphPartitionerFactory class to Giraph using the 
giraph.graphPartitionerFactoryClass option. Is there a way to supply 
the MasterGraphPartitioner class?


Thanks!

On Wed, Mar 25, 2015 at 12:54 PM, Lukas Nalezenec 
lukas.naleze...@firma.seznam.cz wrote:


 Hi,

 There are two interfaces:
 WorkerGraphPartitioner - Maps vertexes to partitions
 MasterGraphPartitioner - Maps partitions to workers.
 So you need custom MasterGraphPartitioner.
 You dont need any external preprocessing step.

 Lukas


 On 25.3.2015 19:51, Arjun Sharma wrote:

 Hi,

 I understand we can override the GraphPartitionerFactory class in 
order to achieve custom partitioning of vertices over partitions. Is 
there a way to do the same to enable assigning partitions to workers 
in a custom way (e.g., partition n should be assigned to worker m)? 
The reason is that it is more beneficial to have partitions that are 
close to each other in terms of the graph structure placed close to 
each other on the same worker to minimize network traffic. Otherwise, 
benefits of overriding  GraphPartitionerFactory may be lost. Let us 
assume that there is an external preprocessing step that outputs the 
desired assignment, so can we materialize that assignment over Giraph 
workers?


 Thanks!