Re: Local-only aggregators
Hi,

I'm not sure aggregators necessarily require high traffic. Aggregated values are combined locally on each worker before they are aggregated on the (corresponding) master worker. Anyway, assuming you want to proceed, my understanding is that you want vertices on the same worker to share (aggregated) information. In that case, I'd suggest just using a WorkerContext.

Hope this helps.
Claudio

On Wed, Mar 25, 2015 at 12:47 AM Alessio Arleo ingar...@icloud.com wrote:

Hello everybody,

I was wondering whether it is possible to extend the concept of an aggregator from a “global” to a “local-only” perspective. Normally, aggregators DO cause network traffic because of the cycle:

Workers - Aggregator Owner - MasterAggregator - Aggregator Owner - Workers

What if I'd like to fetch and aggregate values as I normally would with aggregators, but without causing this traffic? Let's assume this situation:

1 - Define a custom partitioning class and let it partition the graph. This is the partition used to assign vertices to workers.
2 - In the computation class, every time the compute method is called on a vertex, the data needed for the computation is stored in the vertex's neighbours but also in non-neighbouring vertices (think of a force-directed layout algorithm, for example: computing the forces requires the distances to both neighbouring and non-neighbouring vertices, with a different kind of force applied to each).

Given that the compute class is computing on vertex X:

a - I pick information from X's neighbours as I normally would (iterating over its edges or the incoming messages).
b - When it comes to non-neighbouring vertices, I would like to use data from X's worker only.

The first thing I tried to understand before asking this question was: does this make any sense? I may be wrong, but I think it does. If I partition my graph to maximize locality, what I am actually trying to do is reduce the network traffic as much as possible.
My concern is that if I use aggregators to achieve this, the network traffic would be heavy, probably losing the advantages of the initial partitioning. What if I could access and modify an aggregator-like local data structure in the same fashion (i.e. “getAggregatedValue”) but without broadcasting it (assuming that I do not need the aggregator to be accessible to every worker)? Or would it be possible to manually assign partition owners in order to minimise network traffic (if I need to aggregate all values from vertices in partition 3, and 3 only, I assign the partition 3 aggregator owner to the partition 3 worker)?

I hope for your understanding, and I hope I caught your attention, even if only for a brief moment. Ask me if something is not clear ;)

Cheers!

~~~
Ing. Alessio Arleo
PhD student in Industrial and Information Engineering
Master's degree in Computer and Automation Engineering
Bachelor's degree in Computer and Electronic Engineering
Linkedin: it.linkedin.com/in/IngArleo
Skype: Ing. Alessio Arleo
Tel: +39 075 5853920
Cell: +39 349 0575782
~~~
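The WorkerContext suggestion can be illustrated with a small, self-contained sketch. It is not Giraph code: `WorkerLocalSum` and its method names are hypothetical, and the class only models the kind of state one might keep in a per-worker context object. It assumes the usual aggregator convention that values contributed during one superstep become readable in the next, and that `endSuperstep()` would be driven from something like a per-superstep hook on the worker. Nothing here leaves the JVM, so no network traffic is involved.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical stand-in for worker-local shared state. NOT Giraph's
// WorkerContext API; it only models a "local-only aggregator".
final class WorkerLocalSum {
    // Accumulates contributions made during the current superstep.
    // DoubleAdder tolerates concurrent adds from several compute threads.
    private final DoubleAdder current = new DoubleAdder();

    // Total of the previous superstep, visible to all local vertices.
    private volatile double previous = 0.0;

    // Called from compute code on any vertex hosted by this worker.
    void aggregate(double value) {
        current.add(value);
    }

    // Returns the total gathered during the previous superstep.
    double getAggregatedValue() {
        return previous;
    }

    // Call once at the superstep boundary, while no compute thread runs.
    void endSuperstep() {
        previous = current.sumThenReset();
    }

    public static void main(String[] args) {
        WorkerLocalSum sum = new WorkerLocalSum();
        sum.aggregate(1.5);
        sum.aggregate(2.5);
        System.out.println(sum.getAggregatedValue()); // still the old total
        sum.endSuperstep();
        System.out.println(sum.getAggregatedValue()); // now the new total
    }
}
```

The double-buffering (a write side and a read-only snapshot) is a design choice of this sketch: it keeps reads stable within a superstep regardless of the order in which local vertices are computed.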
Re: Custom assignment of partitions to workers
Hi,

There are two interfaces:

WorkerGraphPartitioner - maps vertices to partitions
MasterGraphPartitioner - maps partitions to workers

So you need a custom MasterGraphPartitioner. You don't need any external preprocessing step.

Lukas

On 25.3.2015 19:51, Arjun Sharma wrote:

Hi,

I understand we can override the GraphPartitionerFactory class in order to achieve custom partitioning of vertices over partitions. Is there a way to do the same to enable assigning partitions to workers in a custom way (e.g., partition n should be assigned to worker m)? The reason is that it is more beneficial to have partitions that are close to each other in terms of the graph structure placed close to each other on the same worker to minimize network traffic. Otherwise, benefits of overriding GraphPartitionerFactory may be lost. Let us assume that there is an external preprocessing step that outputs the desired assignment, so can we materialize that assignment over Giraph workers?

Thanks!
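The two-level mapping described in this reply (vertex to partition, then partition to worker) can be sketched in plain Java. This is an illustration only: `PartitionPlacement`, `pin`, `partitionOf` and `workerOf` are hypothetical names, not Giraph classes or methods, and the modulo fallbacks are assumptions made for the example rather than Giraph's actual defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two mappings. It does not implement
// Giraph's WorkerGraphPartitioner or MasterGraphPartitioner.
final class PartitionPlacement {
    private final int partitionCount;
    private final int workerCount;
    // Explicit partition -> worker choices, e.g. loaded from a
    // precomputed assignment.
    private final Map<Integer, Integer> pinned = new HashMap<>();

    PartitionPlacement(int partitionCount, int workerCount) {
        if (partitionCount <= 0 || workerCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.partitionCount = partitionCount;
        this.workerCount = workerCount;
    }

    // Records that a given partition must live on a given worker.
    void pin(int partition, int worker) {
        if (partition < 0 || partition >= partitionCount
                || worker < 0 || worker >= workerCount) {
            throw new IllegalArgumentException("index out of range");
        }
        pinned.put(partition, worker);
    }

    // Role of the worker-side partitioner: vertex -> partition.
    int partitionOf(long vertexId) {
        return (int) Math.floorMod(vertexId, (long) partitionCount);
    }

    // Role of the master-side partitioner: partition -> worker.
    // Unpinned partitions fall back to a simple round-robin.
    int workerOf(int partition) {
        Integer worker = pinned.get(partition);
        return worker != null ? worker : partition % workerCount;
    }

    public static void main(String[] args) {
        PartitionPlacement placement = new PartitionPlacement(4, 2);
        placement.pin(3, 0); // keep partition 3 next to partition 0
        int partition = placement.partitionOf(7L);
        System.out.println(partition + " -> " + placement.workerOf(partition));
    }
}
```

Keeping the two functions separate mirrors the point of the reply: changing where partitions live does not require changing how vertices are grouped into partitions.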
Re: Giraph 1.1.0 not running on full cluster with Hadoop 2.6.0
Hi Steve,

Thanks for the link - there's a different error I get now regarding not finding some other classes, but I've seen that before and should be able to find a fix. Running PageRank, however, still gives me the *LocalJobRunner* error (above) - did you get that to run successfully?

Thanks,
Kenrick

On Tue, Mar 24, 2015 at 3:06 PM, Steven Harenberg sdhar...@ncsu.edu wrote:

Hey Kenrick,

For the issue with GiraphApplicationMaster, I followed what Phillip did here:
http://mail-archives.apache.org/mod_mbox/giraph-user/201503.mbox/%3CCAO3ErG_obGV8mELzX1j%2Be%3DaL6C%3D6%3DtdiSOVRBia2gh0H9tYLZA%40mail.gmail.com%3E

Basically you need the jar for giraph-examples to be in the directory where you are issuing the command. You can do this by creating a symbolic link. I have no idea why this worked and why you can't use an absolute path, but that is how it was for me.

Thanks,
Steve

On Mon, Mar 23, 2015 at 7:06 PM, Kenrick Fernandes kenrick@gmail.com wrote:

Hi Phil,

The build was successful - now running the *ShortestPaths* example gives me a different error, *GiraphApplicationMaster* not found. However, when I run the PageRank benchmark, I still get the same *LocalJobRunner* error:

*Command:*
hadoop jar giraph-1.1.0-for-hadoop-2.6.0-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 -w 30

*Error:*
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!
    at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:162)
    at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
    at org.apache.giraph.benchmark.GiraphBenchmark.run(GiraphBenchmark.java:96)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Did PageRank run fine for you?

Thanks,
Kenrick

On Mon, Mar 23, 2015 at 4:35 PM, Phillip Rhodes motley.crue@gmail.com wrote:

What I had to do to get this to work was: edit the pom.xml and change the hadoop_yarn profile to remove the one munge symbol that was something like _SASL_SOMETHING_OR_OTHER. Build using

mvn -Phadoop_yarn -Dhadoop.version=2.5.2

(in my case)

Phil

This message optimized for indexing by NSA PRISM

On Sun, Mar 22, 2015 at 4:28 PM, Kenrick Fernandes kenrick@gmail.com wrote:

Hi,

I am working with Giraph 1.1.0 and a YARN cluster with Hadoop 2.6.0. I build Giraph with

mvn -Phadoop_2 -Dhadoop.version=2.6.0 clean package -DskipTests

So far, when I run any of the benchmarks or Shortest path examples, I always get the LocalJobRunner error:

---
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!
    at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
    at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
    at org.apache.giraph.benchmark.GiraphBenchmark.run(GiraphBenchmark.java:96)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
---

I have tried some solutions from forums/StackOverflow/lists, but so far nothing has worked. As far as I can tell, Hadoop is configured correctly (other MR benchmarks run fine). I tried changing the above Giraph code file (making the check function return what I wanted), but that only starts the job and gets it running on a single machine - it never uses more than 1 machine.

Any help or pointers in the right direction would be much appreciated.

Thanks,
Kenrick
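The two error messages in this thread suggest what the failing check enforces. The sketch below is a simplified model reconstructed only from those messages and their line numbers (162 before 168); it is not Giraph's source. The class name, the method signature and the `usingLocalJobRunner` flag are assumptions, and how Giraph actually decides that the LocalJobRunner is in use is not modelled here.

```java
// Simplified model of the guard implied by the two exception messages.
// Hypothetical names throughout; this is not copied from GiraphJob.
final class LocalRunnerCheck {
    static void check(boolean usingLocalJobRunner, int workers,
                      boolean splitMasterWorker) {
        if (!usingLocalJobRunner) {
            return; // a real cluster run is not restricted by this guard
        }
        if (workers != 1) {
            throw new IllegalArgumentException(
                "When using LocalJobRunner, must have only one worker "
                + "since only 1 task at a time!");
        }
        if (splitMasterWorker) {
            throw new IllegalArgumentException(
                "When using LocalJobRunner, you cannot run in split "
                + "master / worker mode since there is only 1 task at a time!");
        }
    }

    public static void main(String[] args) {
        try {
            check(true, 30, false); // mirrors the "-w 30" benchmark run
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Read this way, both traces point at the same root condition: the job believes it is running under the LocalJobRunner. That would also explain why patching the check only produced a single-machine run, since bypassing the guard does not change which runner is used.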
Custom assignment of partitions to workers
Hi, I understand we can override the GraphPartitionerFactory class in order to achieve custom partitioning of vertices over partitions. Is there a way to do the same to enable assigning partitions to workers in a custom way (e.g., partition n should be assigned to worker m)? The reason is that it is more beneficial to have partitions that are close to each other in terms of the graph structure placed close to each other on the same worker to minimize network traffic. Otherwise, benefits of overriding GraphPartitionerFactory may be lost. Let us assume that there is an external preprocessing step that outputs the desired assignment, so can we materialize that assignment over Giraph workers? Thanks!
Re: Custom assignment of partitions to workers
It's the same. GraphPartitionerFactory has two methods: one for the WorkerGraphPartitioner and a second for the MasterGraphPartitioner.

    public interface GraphPartitionerFactory ... {
      MasterGraphPartitioner<I, V, E> createMasterGraphPartitioner();
      WorkerGraphPartitioner<I, V, E> createWorkerGraphPartitioner();
    }

Lukas

On 25.3.2015 20:57, Arjun Sharma wrote:

Thanks for your reply! I understand we can supply the GraphPartitionerFactory class to Giraph using the giraph.graphPartitionerFactoryClass option. Is there a way to supply the MasterGraphPartitioner class?

Thanks!

On Wed, Mar 25, 2015 at 12:54 PM, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote:

Hi,

There are two interfaces:

WorkerGraphPartitioner - maps vertices to partitions
MasterGraphPartitioner - maps partitions to workers

So you need a custom MasterGraphPartitioner. You don't need any external preprocessing step.

Lukas

On 25.3.2015 19:51, Arjun Sharma wrote:

Hi,

I understand we can override the GraphPartitionerFactory class in order to achieve custom partitioning of vertices over partitions. Is there a way to do the same to enable assigning partitions to workers in a custom way (e.g., partition n should be assigned to worker m)? The reason is that it is more beneficial to have partitions that are close to each other in terms of the graph structure placed close to each other on the same worker to minimize network traffic. Otherwise, benefits of overriding GraphPartitionerFactory may be lost. Let us assume that there is an external preprocessing step that outputs the desired assignment, so can we materialize that assignment over Giraph workers?

Thanks!
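The point of this reply, that one factory hands out both partitioners so a custom factory can replace only the master side, can be shown with simplified stand-in types. Everything below is hypothetical: `MasterSide`, `WorkerSide`, `PartitionerFactorySketch`, `HashFactory` and `PrecomputedFactory` are illustration names, and the real Giraph interfaces are generic in I, V, E and have different methods.

```java
// Simplified stand-ins for the factory and its two products.
// These are NOT the Giraph interfaces; they only mirror their shape.
interface MasterSide {
    int workerFor(int partition, int workerCount);
}

interface WorkerSide {
    int partitionFor(long vertexId, int partitionCount);
}

interface PartitionerFactorySketch {
    MasterSide createMasterSide();
    WorkerSide createWorkerSide();
}

// Baseline factory: modulo-based mappings on both levels (an assumption
// for this example, not a statement about Giraph's defaults).
class HashFactory implements PartitionerFactorySketch {
    @Override
    public MasterSide createMasterSide() {
        return (partition, workerCount) -> partition % workerCount;
    }

    @Override
    public WorkerSide createWorkerSide() {
        return (vertexId, partitionCount) ->
            (int) Math.floorMod(vertexId, (long) partitionCount);
    }
}

// Custom factory: reuses the vertex -> partition mapping and overrides
// only partition -> worker with a precomputed table.
final class PrecomputedFactory extends HashFactory {
    private final int[] workerOfPartition;

    PrecomputedFactory(int[] workerOfPartition) {
        this.workerOfPartition = workerOfPartition.clone();
    }

    @Override
    public MasterSide createMasterSide() {
        return (partition, workerCount) -> workerOfPartition[partition];
    }
}
```

In this model the externally computed assignment from the original question would be the `int[]` passed to `PrecomputedFactory`, indexed by partition number.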