Re: How to ensure that only one worker runs per node
As I understand it: 1) set that property to 1 as you say, and 2) set the number of workers to the number of nodes minus 1 (one task slot is needed for the master). When you run a job you can follow the 'map' link in the jobtracker UI to see all the workers plus the master. On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz wrote: > Hi everyone, > > Is there a good way (a configuration I'm guessing) to prevent more than one > worker from running per node? I saw in this thread to use > mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be working. > Thanks for the help. > > Best, > Matthew > -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
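For reference, mapred.tasktracker.map.tasks.maximum is a tasktracker daemon setting, not a per-job one, so passing it only on the job command line has no effect — a likely reason it "doesn't seem to be working". On MR1 it would go in mapred-site.xml on every node, followed by a tasktracker restart (a sketch; adjust for your distribution):

```xml
<!-- mapred-site.xml on each tasktracker node: allow at most one
     concurrent map task (i.e., one Giraph worker) per node. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```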
"Missing chosen worker" ERROR drills down to "end of stream exception" ("likely client has closed socket"). help!
Hi All, I have a Giraph 1.0.0 job that has failed, but I'm not able to get details as to what really happened. The master's log says:

> 2014-10-28 10:28:32,006 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=compute-0-0.wright, MRtaskID=1, port=30001) on superstep 4

OK, this seems to say compute-0-0 failed in some way, correct? The Ganglia pages show no noticeable OS differences between the failed node and another identical compute node. In the failed node's log I see two WARNs:

> 2014-10-28 10:28:19,560 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
> 2014-10-28 10:28:19,560 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Disconnected, event type None

OK, I guess there was a ZooKeeper issue. In the ZooKeeper log I find:

> 2014-10-28 10:28:14,917 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x149529c74de0a4d, likely client has closed socket
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>     at java.lang.Thread.run(Thread.java:745)

OK, so I guess the socket closure was the problem. But why did *that* happen? I could really use your help here! Thank you, matt -- Matthew Cornell | m...@matthewcornell.org
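One hedged interpretation: ZooKeeper's "likely client has closed socket" is usually the server-side symptom of the client JVM stalling (e.g., a long GC pause or memory pressure) until its session drops, rather than a ZooKeeper failure — which would also explain the worker's Disconnected events a few seconds later. If that's the case here, checking the worker's GC activity and memory around 10:28 is the first step; raising the session timeout can also buy headroom. Assuming Giraph 1.0 reads the giraph.zkSessionMsecTimeout property (verify the name against your build's GiraphConstants):

```shell
hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.zkSessionMsecTimeout=120000 \
  ...
```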
Re: YARN vs. MR1: is YARN a good idea ? Out-of-Core Graph ?
Following up, does anyone have thoughts re: MR1 vs YARN performance? Thank you. -- matt On Wed, Oct 22, 2014 at 8:34 AM, wrote: > Hello Tripti, > > I bumped into this mail, I am experiencing out-of-memory errors on my small > cluster, > and, as out-of-core graph does not seem to work on giraph 1.1-SNAPSHOT, I was > wondering if you had any jira / patch already posted to help solve this > issue ? > > > Thanks a lot > > regards > > > > Olivier Varène > Big Data Referent > Orange - DSI France/Digital Factory > olivier.var...@orange.com > +33 4 97 46 29 94 > > On 10 Oct. 2014, at 20:15, Tripti Singh wrote: > >> Hi Matthew, >> I would have been thrilled to give you numbers on this one but for me the >> Application is not scaling without the out-of-core option (which isn't >> working the way it was in previous version). >> I'm still figuring it out and can get back once it's resolved. I have >> patched a few things and will share them for people who might face similar >> issues. If you have a fix for scalability, do let me know. >> >> Thanks, >> Tripti >> >> Sent from my iPhone >> >>> On 06-Oct-2014, at 9:22 pm, "Matthew Cornell" >>> wrote: >>> >>> Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I >>> built Giraph 1.0.0 for our system. How much better is Giraph on YARN? >>> Thank you. >>> >>> -- >>> Matthew Cornell | m...@matthewcornell.org
> > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete > this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been > modified, changed or falsified. > Thank you. > -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
YARN vs. MR1: is YARN a good idea?
Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I built Giraph 1.0.0 for our system. How much better is Giraph on YARN? Thank you. -- Matthew Cornell | m...@matthewcornell.org
Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
I'm new, but in my meager experience, when it stops at map 100% it means there was an error somewhere. In Giraph I've often found it difficult to pin down what that error actually was (e.g., out of memory), but the logs are the first place to look. Just to clarify re: not finding outputs: Are you going to http://<jobtracker-host>:50030/jobtracker.jsp and clicking on the failed job id (e.g., job_201409251209_0029 -> http://<jobtracker-host>:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0)? From there, click the "map" link in the table to see its tasks. (Giraph runs entirely as map tasks, IIUC.) You should see tasks for the master plus your workers. If you click on one of them (e.g., task_201409251209_0029_m_00 -> http://<jobtracker-host>:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_00), you should see what machine it ran on plus a link to the Task Logs. Click on "All" and you should see three sections for stdout, stderr, and syslog, the last of which usually contains hints about what went wrong. You should check all the worker logs. Hope that helps.

On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis <ep.pan@gmail.com> wrote:
> Good morning,
>
> I have been having a problem the past few days which sadly I can't solve.
>
> First of all I set up a Hadoop 0.20.203.0 cluster of two nodes, a master and a slave. I followed this tutorial for the settings: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Then I set up Giraph, and I built it properly with maven. When I run the SimpleShortestPathVertex with number of workers = 2 it runs properly, and gives me results which I can view from any of the two nodes. Also the jobtracker at master:50030 and slave:50030 and everything else is working as expected.
>
> However, when I try to run my own algorithm it hangs at map 100% reduce 0% forever. I looked at SimpleShortestPathVertex for any configurations and it has none.
> And the weird part is: the jobs at the jobtracker have no logs at stdout or stderr. The only thing readable is the map task info:
>
> task_201409300940_0001_m_00 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1 finished out of 2 on superstep -1
> task_201409300940_0001_m_01 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
> task_201409300940_0001_m_02 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
>
> Is there anything I'm overlooking? I have Googled the obvious stack overflow solutions for two days now. Has anyone encountered anything similar?
>
> Regards,
> Panagiotis Eustratiadis.

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
How do I control which tasks run on which hosts?
Hi Folks, I have a small CDH4 cluster of five hosts (four compute nodes and a head node - call them 0-3 and 'w') where hosts 0-3 have 4 cores and 16GB RAM each, and 'w' has 32 cores and 64GB RAM. All five hosts are running mapreduce tasktracker services, and 'w' is also running the jobtracker. Resources are tight for my particular Giraph application (a kind of path-finding), and I've discovered that some configurations of selected hosts are better than others. My command specifies four workers:

hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.zkList=wright.cs.umass.edu:2181 \
  -libjars ${LIBJARS} \
  relpath.RelPathVertex \
  -wc relpath.RelPathWorkerContext \
  -mc relpath.RelPathMasterCompute \
  -vif relpath.JsonAdjacencyListVertexInputFormat \
  -vip $REL_PATH_INPUT \
  -of relpath.JsonAdjacencyListTextOutputFormat \
  -op $REL_PATH_OUTPUT \
  -ca RelPathVertex.path=$REL_PATH_PATH \
  -w 4

When Giraph (Zookeeper?) puts three or more of the Giraph map tasks on 'w' (e.g., 01www or 1), then that host maxes out ram, cpu, and swap, and the job hangs. However, when the system spreads the work out more evenly so that 'w' has only two or fewer tasks (e.g., 123ww or 0321w), then the job finishes fine. My questions are: 1) what program is deciding the task-to-host assignment, and 2) how do I control that? Thanks very much! -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward
On Mon, Sep 22, 2014 at 2:10 PM, Matthew Saltz wrote:
> In the logs for the workers, do you have a line that looks like:
> 2014-09-21 18:12:13,021 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 93 Memory (free/total/max) = 21951.08M / 36456.50M / 43691.00M
>
> Looking at the memory usage in the worker that fails at the end of the superstep before failure could give you a clue.

Yes, all four workers when I use "-w 4" have those lines:

Task Logs: 'attempt_201409191450_0016_m_01_0': compute-0-1:
2014-09-25 09:28:13,425 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep -1 Memory (free/total/max) = 242.41M / 438.06M / 1820.50M
2014-09-25 09:28:13,817 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 194.77M / 438.06M / 1820.50M
2014-09-25 09:28:14,936 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 1 Memory (free/total/max) = 383.74M / 600.38M / 1820.50M
2014-09-25 09:28:17,820 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 2 Memory (free/total/max) = 362.14M / 1007.50M / 1820.50M
2014-09-25 09:28:31,680 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 3 Memory (free/total/max) = 203.33M / 1661.50M / 1820.50M

Task Logs: 'attempt_201409191450_0016_m_02_0': compute-0-1:
2014-09-25 09:28:13,458 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep -1 Memory (free/total/max) = 887.74M / 964.50M / 1820.50M
2014-09-25 09:28:14,381 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 830.14M / 964.50M / 1820.50M
2014-09-25 09:28:15,337 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 1 Memory (free/total/max) = 785.66M / 1217.00M / 1820.50M
2014-09-25 09:28:18,114 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 2 Memory (free/total/max) = 661.72M / 1113.50M / 1820.50M
2014-09-25 09:28:52,451 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 3 Memory (free/total/max) = 285.90M / 1831.00M / 1831.00M

Task Logs: 'attempt_201409191450_0016_m_03_0': wright:
2014-09-25 09:28:13,456 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep -1 Memory (free/total/max) = 886.23M / 964.50M / 1820.50M
2014-09-25 09:28:14,399 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 826.36M / 964.50M / 1820.50M
2014-09-25 09:28:15,556 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 1 Memory (free/total/max) = 662.50M / 1217.00M / 1820.50M
2014-09-25 09:28:18,170 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 2 Memory (free/total/max) = 581.14M / 1115.00M / 1820.50M
2014-09-25 09:29:31,673 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 3 Memory (free/total/max) = 299.61M / 1834.00M / 1834.00M

Task Logs: 'attempt_201409191450_0016_m_04_0': wright:
2014-09-25 09:28:13,473 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep -1 Memory (free/total/max) = 887.10M / 964.50M / 1820.50M
2014-09-25 09:28:14,374 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 826.65M / 964.50M / 1820.50M
2014-09-25 09:28:15,755 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 1 Memory (free/total/max) = 980.33M / 1217.00M / 1820.50M
2014-09-25 09:28:18,254 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 2 Memory (free/total/max) = 517.13M / 1128.50M / 1820.50M
2014-09-25 09:29:34,392 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 3 Memory (free/total/max) = 271.52M / 1858.50M / 1858.50M

I'm still not clear on a couple of things:

1. Each compute node has 16GB of memory, but each task has a max of ~1820M (<2GB). In Cloudera's web UI, I set "MapReduce Child Java Maximum Heap Size" to 2GB (default is 1GB). I will try upping it to 8GB.

2. I still don't understand why only two of my five possible nodes are being used.

Thank you. -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
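On point 1: the ~1820.50M max heap in the logs is consistent with a ~2GB -Xmx on the child JVMs, so the Cloudera setting is taking effect. For anyone hand-editing MR1 configs instead of using Cloudera's UI, the equivalent property is mapred.child.java.opts (a sketch with the 8GB value mentioned above; verify against your distribution):

```xml
<!-- mapred-site.xml: heap for each MR1 child (map/reduce) JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx8g</value>
</property>
```

One caution: per-node memory use is roughly (map slots per node) x (child heap), so 8GB heaps with more than two map slots would oversubscribe a 16GB node.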
Re: receiving messages that I didn't send
Thanks for replying, Pavan. I figured out that my message Writable (an ArrayListWritable) needed to call clear() in readFields() before calling super():

@Override
public void readFields(DataInput in) throws IOException {
    clear();
    super.readFields(in);
}

This was an 'of course' moment when I realized it was, like other Writables, being reused. But what I don't understand is why doesn't ArrayListWritable#readFields() call clear()? Isn't this a nasty bug? ... Oh wait - sure enough:

ArrayListWritable object is not cleared in readFields()
https://issues.apache.org/jira/browse/GIRAPH-740

Thanks again, matt

On Tue, Sep 23, 2014 at 11:46 AM, Pavan Kumar A wrote:
> Can you give more context? What are the types of messages, a snippet of your compute method, etc.? You will not receive messages that are not sent, but one thing that can happen is this: a message can have multiple parameters. Suppose message objects have two parameters, a and b, and in m's write(out) you do not handle the case of b = null. If m1 sets b and m2 has b = null, then because of the incorrect write() code, m2 can show b = m1.b. That is because message objects are re-used when receiving. This is a Giraph gotcha, because of object reuse in most iterators.
>
> Thanks
>
>> From: m...@matthewcornell.org
>> Date: Tue, 23 Sep 2014 10:10:48 -0400
>> Subject: receiving messages that I didn't send
>> To: user@giraph.apache.org
>>
>> Hi Folks. I am refactoring my compute() to use a set of ids as its message type, and in my tests it is receiving a message that it absolutely did not send. I've debugged it and am at a loss. Interestingly, I encountered this once before and solved it by creating a copy of a Writable instead of re-using it, but I haven't been able to solve it this time. In general, does this anomalous behavior indicate a Giraph/Hadoop gotcha? It's really confounding!
>> Thank very much -- matt >> >> -- >> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 >> Dickinson Street, Amherst MA 01002 | matthewcornell.org -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
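Since others may hit the same symptom, here is a self-contained demonstration of the gotcha (plain JDK; LongListMessage is a made-up stand-in, not Giraph's ArrayListWritable):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;

// Minimal stand-in for a list-valued message type. Giraph reuses one message
// object while iterating over received messages, calling readFields() on it
// repeatedly; without the clear() below, elements from the previous message
// survive into the next one, which looks like receiving messages never sent.
class LongListMessage extends ArrayList<Long> {
    public void write(DataOutput out) throws IOException {
        out.writeInt(size());
        for (long id : this) {
            out.writeLong(id);
        }
    }

    public void readFields(DataInput in) throws IOException {
        clear();  // essential when the instance is reused across messages
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            add(in.readLong());
        }
    }

    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        write(new DataOutputStream(bos));
        return bos.toByteArray();
    }
}
```

Without the clear(), deserializing [1, 2] and then [3] into the same reused instance would leave it holding [1, 2, 3] — exactly the "message I didn't send" symptom.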
Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward
I cannot thank you enough, Matthew. You've given me a lot to experiment with. -- matt On Mon, Sep 22, 2014 at 2:10 PM, Matthew Saltz wrote: > Hi Matthew, > > I answered a few of your questions in-line (unfortunately they might not > help the larger problem, but hopefully it'll help a bit). > > Best, > Matthew > > > On Mon, Sep 22, 2014 at 5:50 PM, Matthew Cornell > wrote: >> >> Hi Folks, >> >> I've spent the last two months learning, installing, coding, and >> analyzing the performance of our Giraph app, and I'm able to run on >> small inputs on our tiny cluster (yay!) I am now stuck trying to >> figure out why larger inputs fail, why only some compute nodes are >> being used, and generally how to make sure I've configured hadoop and >> giraph to use all available CPUs and RAM. I feel that I'm "this >> close," and I could really use some pointers. >> >> Below I share our app, configuration, results and log messages, some >> questions, and counter output for the successful run. My post here is >> long (I've broken it into sections delimited with ''), but I hope >> I've provided good enough information to get help on. I'm happy to add >> to it. >> >> Thanks! >> >> >> application >> >> Our application is a kind of path search where all nodes have a type >> and source database ID (e.g., "movie 99"), and searches are expressed >> as type paths, such as "movie, acted_in, actor", which would start >> with movies and then find all actors in each movie, for all movies in >> the database. The program does a kind of filtering by keeping track of >> previously-processed initial IDs. >> >> Our database is a small movie one with 2K movies, 6K users (people who >> rate movies), and 80K ratings of movies by users. 
Though small, we've >> found this kind of search can result in a massive explosion of >> messages, as was well put by Rob Vesse ( >> >> http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3ccec4a409.2d7ad%25rve...@dotnetrdf.org%3E >> ): >> >> > even with this relatively small graph you get a massive explosion of >> > messages by the later super steps which exhausts memory (in my graph the >> > theoretical maximum messages by the last super step was ~3 billion) >> >> >> job failure and error messages >> >> Currently I have a four-step path that completes in ~20 seconds >> ("rates, movie, rates, user" - counter output shown at bottom) but a >> five-step one ("rates, movie, rates, user, rates") fails after a few >> minutes. I've looked carefully at the task logs, but I find it a >> little difficult to discern what the actual failure was. However, >> looking at system information (e.g., top and ganglia) during the run >> indicates hosts are running out of memory. There are no >> OutOfMemoryErrors in the logs, and only this one stands out: >> >> > ERROR org.apache.giraph.master.BspServiceMaster: >> > superstepChosenWorkerAlive: Missing chosen worker >> > Worker(hostname=compute-0-3.wright, MRtaskID=1, port=30001) on superstep 4 >> >> NB: So far I've been ignoring these other types of messages: >> >> > FATAL org.apache.giraph.master.BspServiceMaster: getLastGoodCheckpoint: >> > No last good checkpoints can be found, killing the job. >> >> > java.io.FileNotFoundException: File >> > _bsp/_checkpoints/job_201409191450_0003 does not exist. 
>> >> > WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed >> > event >> > (path=/_hadoopBsp/job_201409191450_0003/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished, >> > type=NodeDeleted, state=SyncConnected) >> >> > ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got >> > failure, unregistering health on >> > /_hadoopBsp/job_201409191450_0003/_applicationAttemptsDir/0/_superstepDir/4/_workerHealthyDir/compute-0-3.wright_1 >> > on superstep 4 >> >> The counter statistics are minimal after the run fails, but during it >> I see something like this when refreshing the Job Tracker Web UI: >> >> > Counters > Map-Reduce Framework > Physical memory (bytes) snapshot: >> > ~28GB >> > Counters > Map-Reduce Framework > Virtual memory (bytes) snapshot: ~27GB >> > Counters > Giraph Stats > Sent messages: ~181M >> >> >> hadoop/giraph command >> >> h
receiving messages that I didn't send
Hi Folks. I am refactoring my compute() to use a set of ids as its message type, and in my tests it is receiving a message that it absolutely did not send. I've debugged it and am at a loss. Interestingly, I encountered this once before and solved it by creating a copy of a Writable instead of re-using it, but I haven't been able to solve it this time. In general, does this anomalous behavior indicate a Giraph/Hadoop gotcha? It's really confounding! Thanks very much -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward
GB
> I/O Sort Memory Buffer (io.sort.mb): 512 MB
> Client Java Heap Size in Bytes: 256 MB
> Java Heap Size of Jobtracker in Bytes: 1 GB
> Java Heap Size of TaskTracker in Bytes: 1 GB

Cluster summary from the Job Tracker Web UI:
> Heap Size is 972.69 MB/989.88 MB
> Map Task Capacity: 48
> Reduce Task Capacity: 24
> Avg. Tasks/Node: 14.40

Giraph: Compiled as "giraph-1.0.0-for-hadoop-2.0.0-alpha", CHANGELOG: Release 1.0.0 - 2013-04-15

questions

o How can I verify that the failure is actually one of memory? I've looked fairly carefully at the logs.

o I noticed that not all hosts are being used. I did three runs, two with 8 workers and one with 12, and I pulled the following from the task logs ('h' = head node, 0-3 = compute nodes):
> run #1: 0, 2, 3, h, h, h, h, h, h
> run #2: 2, 1, 3, h, h, h, h, h, h
> run #3: 3, 3, h, h, h, h, h, h, h, h, h, 1, 1
Note that there's at least one compute node that isn't listed for each run.

o What's a good # of workers to use?

o What Hadoop parameters should I tweak?
> mapred.job.map.memory.mb=xx
> mapred.map.child.java.opts=xx
> mapred.{map|reduce}.child.ulimit
> mapred.task.profile
> # map slots for each TaskTracker
> number of partitions you keep in memory

o What Giraph parameters should I tweak? I'm currently using defaults for all, but I found these possibilities:
> giraph.maxPartitionsInMemory
> giraph.useOutOfCoreGraph=true
> giraph.maxPartitionsInMemory=N (default: 10)
> giraph.isStaticGraph=true
> giraph.useOutOfCoreMessages=true (default: disabled)
> giraph.maxMessagesInMemory=N (default: 100)

o How can I get a feel for how much more processing and memory might be needed to finish the job, beyond that it's on the last superstep? For example, of the ~181M sent messages I see during the run, how many more might be left?

o Why is the Heap Size from the Cluster summary above (972.69 MB/989.88 MB) so low?

Thanks again! 
counters from successful four-step run

INFO mapred.JobClient: Job complete: job_201409191450_0001
INFO mapred.JobClient: Counters: 39
INFO mapred.JobClient: File System Counters
INFO mapred.JobClient:   FILE: Number of bytes read=0
INFO mapred.JobClient:   FILE: Number of bytes written=1694975
INFO mapred.JobClient:   FILE: Number of read operations=0
INFO mapred.JobClient:   FILE: Number of large read operations=0
INFO mapred.JobClient:   FILE: Number of write operations=0
INFO mapred.JobClient:   HDFS: Number of bytes read=10016293
INFO mapred.JobClient:   HDFS: Number of bytes written=113612773
INFO mapred.JobClient:   HDFS: Number of read operations=12
INFO mapred.JobClient:   HDFS: Number of large read operations=0
INFO mapred.JobClient:   HDFS: Number of write operations=9
INFO mapred.JobClient: Job Counters
INFO mapred.JobClient:   Launched map tasks=9
INFO mapred.JobClient:   Total time spent by all maps in occupied slots (ms)=206659
INFO mapred.JobClient:   Total time spent by all reduces in occupied slots (ms)=0
INFO mapred.JobClient:   Total time spent by all maps waiting after reserving slots (ms)=0
INFO mapred.JobClient:   Total time spent by all reduces waiting after reserving slots (ms)=0
INFO mapred.JobClient: Map-Reduce Framework
INFO mapred.JobClient:   Map input records=9
INFO mapred.JobClient:   Map output records=0
INFO mapred.JobClient:   Input split bytes=396
INFO mapred.JobClient:   Spilled Records=0
INFO mapred.JobClient:   CPU time spent (ms)=243280
INFO mapred.JobClient:   Physical memory (bytes) snapshot=9947144192
INFO mapred.JobClient:   Virtual memory (bytes) snapshot=25884065792
INFO mapred.JobClient:   Total committed heap usage (bytes)=10392305664
INFO mapred.JobClient: Giraph Stats
INFO mapred.JobClient:   Aggregate edges=402428
INFO mapred.JobClient:   Aggregate finished vertices=119141
INFO mapred.JobClient:   Aggregate vertices=119141
INFO mapred.JobClient:   Current master task partition=0
INFO mapred.JobClient:   Current workers=8
INFO mapred.JobClient:   Last checkpointed superstep=0
INFO mapred.JobClient:   Sent messages=0
INFO mapred.JobClient:   Superstep=4
INFO mapred.JobClient: Giraph Timers
INFO mapred.JobClient:   Input superstep (milliseconds)=1689
INFO mapred.JobClient:   Setup (milliseconds)=3977
INFO mapred.JobClient:   Shutdown (milliseconds)=1177
INFO mapred.JobClient:   Superstep 0 (milliseconds)=834
INFO mapred.JobClient:   Superstep 1 (milliseconds)=1836
INFO mapred.JobClient:   Superstep 2 (milliseconds)=2524
INFO mapred.JobClient:   Superstep 3 (milliseconds)=8284
INFO mapred.JobClient:   Total (milliseconds)=20322

EOF -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Re: how do I maintain a cached List across supersteps?
Thanks to Claudio and Matthew, I went with the WorkerContext solution. Note that I wrote a MasterCompute.validate() to verify the correct WorkerContext class was set. Otherwise I was worried my cast would fail. -- matt On Wed, Sep 17, 2014 at 11:49 AM, Claudio Martella < claudio.marte...@gmail.com> wrote: > I would use a workercontext, it is shared and persistent during > computation by all vertices in a worker. If it's readonly, you won't have > to manage concurrency. > > On Tue, Sep 16, 2014 at 9:42 PM, Matthew Cornell > wrote: > >> Hi Folks. I have a custom argument that's passed into my Giraph job that >> needs parsing. The parsed value is accessed by my Vertex#compute. To avoid >> excessive GC I'd like to cache the parsing results. What's a good way to do >> so? I looked at using the ImmutableClassesGiraphConfiguration returned by >> getConf(), but it supports only String properties. I looked at using my >> custom MasterCompute to manage it, but I couldn't find how to access the >> master compute instance from the vertex. My last idea is to use (abuse?) an >> aggregator to do this. I'd appreciate your thoughts! -- matt >> >> -- >> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson >> Street, Amherst MA 01002 | matthewcornell.org >> > > > > -- >Claudio Martella > > -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
how do I maintain a cached List across supersteps?
Hi Folks. I have a custom argument that's passed into my Giraph job that needs parsing. The parsed value is accessed by my Vertex#compute. To avoid excessive GC I'd like to cache the parsing results. What's a good way to do so? I looked at using the ImmutableClassesGiraphConfiguration returned by getConf(), but it supports only String properties. I looked at using my custom MasterCompute to manage it, but I couldn't find how to access the master compute instance from the vertex. My last idea is to use (abuse?) an aggregator to do this. I'd appreciate your thoughts! -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Re: How do I validate customArguments?
Sorry for the long delay, Matthew. That's really helpful. Right now I'm stuck on apparently running out of memory on our little cluster, but the log messages are confusing. I'm putting together a question, but in the meantime I'll try one of the simpler examples such as degree count to see if /anything/ will run against my graph, which is very small (100K nodes and edges). -- matt

On Thu, Aug 28, 2014 at 2:26 PM, Matthew Saltz wrote:
> Matt,
>
> I'm not sure if you've resolved this problem already or not, but if you haven't: The initialize() method isn't limited to registering aggregators, and in fact, in my project I use it to do exactly what you're describing to check and load custom configuration parameters. Inside the initialize() method, I do this:
>
> String numPreprocessingStepsConf = getConf().get(NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT);
> numPreprocessingSteps = (numPreprocessingStepsConf != null) ?
>     Integer.parseInt(numPreprocessingStepsConf.trim()) :
>     DEFAULT_NUMBER_OF_PREPROCESSING_STEPS;
> System.out.println("Number of preprocessing steps: " + numPreprocessingSteps);
>
> where at the class level I declare:
>
> public static final String NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT = "wcc.numPreprocessingSteps";
> public static final int DEFAULT_NUMBER_OF_PREPROCESSING_STEPS = 1;
> public static int numPreprocessingSteps;
>
> To set the property, I use the option "-ca wcc.numPreprocessingSteps=". If you need to check that it's properly formatted and not store them, this is a fine place to do it as well, given that it's run before the input superstep (see the giraph code in BspServiceMaster, line 1617 in the stable 1.1.0 release). What happens is that on the master, the MasterThread calls coordinateSuperstep() on a BspServiceMaster object, which checks if it's the input superstep, and if so, calls initialize() on the MasterCompute object (created in the becomeMaster() method of BspServiceMaster). 
> > Hope this helps, > Matthew > > > > On Tue, Aug 26, 2014 at 4:36 PM, Matthew Cornell > wrote: > >> Hi again. My application needs to pass in a String argument to the >> computation which each Vertex needs access to. (The argument is a list of >> the form "[item1, item2, ...]".) I found --customArguments (which I set in >> my tests via conf.set(, )) but I need to check that it's >> properly formatted. Where do I do that? The only thing I thought of is to >> specify a DefaultMasterCompute subclass whose initialize() does the check, >> but all the initialize() examples do is register aggregators; none of them >> check args or do anything else. Thanks in advance! -- matt >> >> -- >> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson >> Street, Amherst MA 01002 | matthewcornell.org >> > > -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Which is better: sending many small messages or fewer large ones?
Hi Everyone, I have an app whose messaging granularity could be written two ways - sending many small messages vs. (possibly far) fewer larger ones. Conceptually what moves around is a set of 'alive' vertex IDs that might get filtered at each superstep based on a processed list (vertex value) that vertexes manage. The ones that survive to the end are the lucky winners. compute() calculates a set of 'new-to-me' incoming IDs that are perfect for the outgoing message, but I could easily send each ID one at a time. My guess is that sending fewer messages is more important, but each set might contain thousands of IDs. Thanks! P.S. A side question: The few custom message type examples I've found are relatively simple objects with a few primitive instance variables, rather than collections. Is it nutty to send around a collection of IDs as a message? -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
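A rough way to see why batching usually wins (a back-of-the-envelope sketch, not Giraph's actual wire format — the 16-byte per-message overhead below is an assumption standing in for framing, object headers, and destination ids):

```java
// Approximate serialized cost of sending n 8-byte vertex ids either as
// n individual messages or as one batched collection message.
// OVERHEAD is an assumed fixed per-message cost, not a measured Giraph value.
class MessageCost {
    static final long OVERHEAD = 16;

    static long individual(long n) { return n * (OVERHEAD + 8); }

    static long batched(long n) { return OVERHEAD + n * 8; }
}
```

The fixed cost is paid once instead of n times, and the receiver handles one object per sender rather than thousands; the tradeoff is peak memory, since a batched message is deserialized whole. On the P.S.: a collection-valued message is not nutty — Giraph itself ships ArrayListWritable for roughly this purpose.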
Which is better to use to manage Vertex state: POJO instance variables or Giraph values?
Hi there. I'm confused about when it's OK to use Vertex instance variables to maintain state rather than proper Giraph values a la getValue(). An interesting example I found in the source demonstrates both: SimpleTriangleClosingVertex, which has both an instance variable (closeMap) and a custom vertex value (IntArrayListWritable). I'm a little surprised that using an instance variable is legit, since it seems like it could screw up serialization(?). My question: Is either valid? If so, how do I choose one over the other? Thanks very much. -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
How do I validate customArguments?
Hi again. My application needs to pass in a String argument to the computation which each Vertex needs access to. (The argument is a list of the form "[item1, item2, ...]".) I found --customArguments (which I set in my tests via conf.set(, )) but I need to check that it's properly formatted. Where do I do that? The only thing I thought of is to specify a DefaultMasterCompute subclass whose initialize() does the check, but all the initialize() examples do is register aggregators; none of them check args or do anything else. Thanks in advance! -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
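One possible shape for the check (a self-contained sketch — CustomArgParser and its messages are made-up names; in practice the call could live in a DefaultMasterCompute.initialize() override so a bad argument fails the job before the input superstep):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical validator for a custom argument of the form "[item1, item2, ...]".
// Throws IllegalArgumentException on malformed input so the job fails fast.
class CustomArgParser {
    static List<String> parse(String raw) {
        if (raw == null || !raw.startsWith("[") || !raw.endsWith("]")) {
            throw new IllegalArgumentException(
                "expected \"[item1, item2, ...]\", got: " + raw);
        }
        String body = raw.substring(1, raw.length() - 1).trim();
        if (body.isEmpty()) {
            throw new IllegalArgumentException("empty list: " + raw);
        }
        List<String> items = Arrays.asList(body.split("\\s*,\\s*"));
        for (String item : items) {
            if (item.isEmpty()) {
                throw new IllegalArgumentException("blank item in: " + raw);
            }
        }
        return items;
    }
}
```

Parsing once up front and stashing the result also avoids re-parsing the string in every compute() call.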
How do I look up a Vertex using its ID?
Hi Folks. I have a graph computation that passes 'visited' Vertex IDs around, and I need to output information from those in the output phase. How do I look up a Vertex from its ID? I found Partition.getVertex(), but IIUC there is no guarantee that an arbitrary Vertex will be in a particular partition. Thanks in advance. -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
How do I output only a subset of a graph?
Hi Folks. I have a graph computation that starts with a subset of vertices of a certain type and propagates information through the graph to a set of target vertices, which are also a subset of the graph. I want to output only information from those particular vertices, but I don't see a way to do this in the various VertexOutputFormat subclasses, which all seem oriented to outputting something for every vertex in the graph. How do I do this? E.g., are there hooks for the output phase where I can filter output? Or am I supposed to write a VertexOutputFormat implementation that generates no output for the vertices that have no data? Thanks in advance. -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org