Re: How to ensure that only one worker runs per node
As I understand it: 1) set the variable to 1 as you say, and 2) set the number of workers to the number of nodes minus 1 (one slot is taken by the master). When you run a job you can follow the 'map' link on the TaskTracker UI to see all the workers plus the master.

On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz sal...@gmail.com wrote:

Hi everyone, Is there a good way (a configuration, I'm guessing) to prevent more than one worker from running per node? I saw in this thread to use mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be working. Thanks for the help. Best, Matthew

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
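For reference, mapred.tasktracker.map.tasks.maximum is, as I understand it, a TaskTracker daemon setting in MR1: it is read when the TaskTracker starts, not per job, which would explain it appearing "not to work" when set on the job submission. A sketch of the entry (goes in mapred-site.xml on every slave node, followed by a TaskTracker restart):

```xml
<!-- mapred-site.xml on each TaskTracker node; requires restarting the
     TaskTracker daemon to take effect. Limits that node to one map slot,
     and therefore to one Giraph worker (or the master). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```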
Missing chosen worker ERROR drills down to end of stream exception (likely client has closed socket). help!
Hi All, I have a Giraph 1.0.0 job that has failed, but I'm not able to get detail as to what really happened. The master's log says:

2014-10-28 10:28:32,006 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=compute-0-0.wright, MRtaskID=1, port=30001) on superstep 4

OK, this seems to say compute-0-0 failed in some way, correct? The Ganglia pages show no noticeable OS differences between the failed node and another identical compute node. In the failed node's log I see two WARNs:

2014-10-28 10:28:19,560 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2014-10-28 10:28:19,560 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Disconnected, event type None

OK, I guess there was a ZooKeeper issue. In the ZooKeeper log I find:

2014-10-28 10:28:14,917 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x149529c74de0a4d, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)

OK, so I guess the socket closure was the problem. But why did *that* happen? I could really use your help here!

Thank you, matt -- Matthew Cornell | m...@matthewcornell.org
Re: YARN vs. MR1: is YARN a good idea ? Out-of-Core Graph ?
Following up, does anyone have thoughts re: MR1 vs. YARN performance? Thank you. -- matt

On Wed, Oct 22, 2014 at 8:34 AM, olivier.var...@orange.com wrote:

Hello Tripti, I bumped into this mail. I am experiencing out-of-memory errors on my small cluster, and, as out-of-core graph does not seem to work on Giraph 1.1-SNAPSHOT, I was wondering if you had any jira / patches already posted to help solve this issue? Thanks a lot. Regards, Olivier Varène | Big Data Referent | Orange - DSI France/Digital Factory | olivier.var...@orange.com | +33 4 97 46 29 94

On 10 Oct 2014, at 20:15, Tripti Singh tri...@yahoo-inc.com wrote:

Hi Matthew, I would have been thrilled to give you numbers on this one, but for me the application is not scaling without the out-of-core option (which isn't working the way it was in the previous version). I'm still figuring it out and can get back once it's resolved. I have patched a few things and will share them for people who might face a similar issue. If you have a fix for scalability, do let me know. Thanks, Tripti

Sent from my iPhone

On 06-Oct-2014, at 9:22 pm, Matthew Cornell m...@matthewcornell.org wrote:

Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I built Giraph 1.0.0 for our system. How much better is Giraph on YARN? Thank you. -- Matthew Cornell | m...@matthewcornell.org

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
YARN vs. MR1: is YARN a good idea?
Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I built Giraph 1.0.0 for our system. How much better is Giraph on YARN? Thank you. -- Matthew Cornell | m...@matthewcornell.org
Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
I'm new, but in my meager experience, when it stops at map 100% it means there was an error somewhere. In Giraph I've often found it difficult to pin down what that error actually was (e.g., out of memory), but the logs are the first place to look.

Just to clarify re: not finding outputs: Are you going to http://your_host.com:50030/jobtracker.jsp and clicking on the failed job id (e.g., job_201409251209_0029 - http://your_host.com:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0 )? From there, click the map link in the table to see its tasks. (Giraph runs entirely as map tasks, IIUC.) You should see tasks for the master plus your workers. If you click on one of them (e.g., task_201409251209_0029_m_00 - http://your_host.com:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_00 ) you should see which machine it ran on plus a link to the Task Logs. Click on "All" and you should see three sections for stdout, stderr, and syslog, the last of which usually contains hints about what went wrong. You should check all the worker logs. Hope that helps.

On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis ep.pan@gmail.com wrote:

Good morning, I have been having a problem the past few days which sadly I can't solve. First of all, I set up a Hadoop 0.20.203.0 cluster of two nodes, a master and a slave, following this tutorial for the settings: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Then I set up Giraph and built it properly with Maven. When I run SimpleShortestPathVertex with number of workers = 2, it runs properly and gives me results which I can view from either of the two nodes. The jobtracker at master:50030 and slave:50030 and everything else is working as expected. However, when I try to run my own algorithm it hangs at map 100% reduce 0% forever. I looked at SimpleShortestPathVertex for any special configuration and it has none.

And the weird part is: the jobs at the jobtracker have no logs at stdout or stderr. The only thing readable is the map task info:

task_201409300940_0001_m_00 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1 finished out of 2 on superstep -1
task_201409300940_0001_m_01 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
task_201409300940_0001_m_02 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1

Is there anything I'm overlooking? I have Googled the obvious Stack Overflow solutions for two days now. Has anyone encountered anything similar? Regards, Panagiotis Eustratiadis.

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Re: receiving messages that I didn't send
Thanks for replying, Pavan. I figured out that my message Writable (an ArrayListWritable) needed to call clear() in readFields() before calling super():

    @Override
    public void readFields(DataInput in) throws IOException {
        clear();
        super.readFields(in);
    }

This was an 'of course' moment when I realized it was, like other Writables, being reused. But what I don't understand is: why doesn't ArrayListWritable#readFields() call clear()? Isn't this a nasty bug? ... Oh wait - sure enough: "ArrayListWritable object is not cleared in readFields()" https://issues.apache.org/jira/browse/GIRAPH-740 Thanks again, matt

On Tue, Sep 23, 2014 at 11:46 AM, Pavan Kumar A pava...@outlook.com wrote:

Can you give more context? What are the types of messages, a patch of your compute method, etc.? You will not receive messages that were not sent, but one thing that can happen is this: a message can have multiple parameters. Suppose a message object m has two parameters, a and b, and m's write(out) does not handle the case of b = null. If m1 sets b and m2 has b = null, then because of the incorrect write() code, m2 can show b = m1.b. That is because message objects are reused when receiving. This is a Giraph gotcha, because of object reuse in most iterators. Thanks

From: m...@matthewcornell.org
Date: Tue, 23 Sep 2014 10:10:48 -0400
Subject: receiving messages that I didn't send
To: user@giraph.apache.org

Hi Folks. I am refactoring my compute() to use a set of ids as its message type, and in my tests it is receiving a message that it absolutely did not send. I've debugged it and am at a loss. Interestingly, I encountered this once before and solved it by creating a copy of a Writable instead of re-using it, but I haven't been able to solve it this time. In general, does this anomalous behavior indicate a Giraph/Hadoop gotcha? It's really confounding!

Thanks very much -- matt

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
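For the archives, the reuse gotcha above can be demonstrated without Giraph or Hadoop at all. The IntListWritable class below is hypothetical, a stand-in for ArrayListWritable: because the framework deserializes message after message into one reused instance, a readFields() that only appends silently accumulates elements from earlier messages.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-valued message type, standing in for ArrayListWritable.
class IntListWritable {
    final List<Integer> items = new ArrayList<>();

    void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        for (int i : items) out.writeInt(i);
    }

    // Buggy version: appends without clearing, so a reused instance
    // "receives" elements from messages it never got.
    void readFields(DataInput in) throws IOException {
        // items.clear();  // <-- the one-line fix from GIRAPH-740
        int n = in.readInt();
        for (int i = 0; i < n; i++) items.add(in.readInt());
    }

    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        write(new DataOutputStream(bos));
        return bos.toByteArray();
    }
}

public class ReuseDemo {
    public static void main(String[] args) throws IOException {
        IntListWritable m1 = new IntListWritable();
        m1.items.add(7);
        IntListWritable m2 = new IntListWritable();
        m2.items.add(8);

        // Hadoop-style reuse: one instance deserializes message after message.
        IntListWritable reused = new IntListWritable();
        reused.readFields(new DataInputStream(new ByteArrayInputStream(m1.toBytes())));
        reused.readFields(new DataInputStream(new ByteArrayInputStream(m2.toBytes())));

        // Without clear(), the second "received" message is [7, 8], not [8].
        System.out.println(reused.items);
    }
}
```

Uncommenting the clear() line makes the second deserialization yield [8] only, which is the fix Matt describes.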
receiving messages that I didn't send
Hi Folks. I am refactoring my compute() to use a set of ids as its message type, and in my tests it is receiving a message that it absolutely did not send. I've debugged it and am at a loss. Interestingly, I encountered this once before and solved it by creating a copy of a Writable instead of re-using it, but I haven't been able to solve it this time. In general, does this anomalous behavior indicate a Giraph/Hadoop gotcha? It's really confounding! Thanks very much -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward
MB
Map Task Capacity: 48
Reduce Task Capacity: 24
Avg. Tasks/Node: 14.40
Giraph: Compiled as giraph-1.0.0-for-hadoop-2.0.0-alpha, CHANGELOG: Release 1.0.0 - 2013-04-15

questions

o How can I verify that the failure is actually one of memory? I've looked fairly carefully at the logs.

o I noticed that not all hosts are being used. I did three runs, two with 8 workers and one with 12, and I pulled the following from the task logs ('h' = head node, 0-3 = compute nodes):

  run #1: 0, 2, 3, h, h, h, h, h, h
  run #2: 2, 1, 3, h, h, h, h, h, h
  run #3: 3, 3, h, h, h, h, h, h, h, h, h, 1, 1

  Note that there's at least one compute node that isn't listed for each run.

o What's a good # of workers to use?

o What Hadoop parameters should I tweak?

  mapred.job.map.memory.mb=xx
  mapred.map.child.java.opts=xx
  mapred.{map|reduce}.child.ulimit
  mapred.task.profile
  # map slots for each TaskTracker
  number of partitions you keep in memory

o What Giraph parameters should I tweak? I'm currently using defaults for all, but I found these possibilities:

  giraph.useOutOfCoreGraph=true
  giraph.maxPartitionsInMemory=N (default: 10)
  giraph.isStaticGraph=true
  giraph.useOutOfCoreMessages=true (default: disabled)
  giraph.maxMessagesInMemory=N (default: 100)

o How can I get a feel for how much more processing and memory might be needed to finish the job, beyond that it's on the last superstep? For example, of the ~181M sent messages I see during the run, how many more might be left?

o Why is the Heap Size from the Cluster summary above (972.69 MB/989.88 MB) so low?

Thanks again!
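For anyone trying these knobs: the giraph.* options are passed as custom arguments on the command line. A sketch of a GiraphRunner invocation combining them (jar name, class names, and HDFS paths are placeholders, not from the thread):

```shell
# Sketch only: placeholder jar/class/path names; -ca passes a custom
# argument into the Giraph job configuration, -w sets the worker count.
hadoop jar my-giraph-app.jar org.apache.giraph.GiraphRunner \
  -Dmapred.map.child.java.opts=-Xmx4g \
  my.app.MyComputation \
  -vif my.app.MyVertexInputFormat -vip /user/matt/input \
  -of  my.app.MyVertexOutputFormat -op /user/matt/output \
  -w 8 \
  -ca giraph.useOutOfCoreGraph=true \
  -ca giraph.maxPartitionsInMemory=10 \
  -ca giraph.useOutOfCoreMessages=true \
  -ca giraph.maxMessagesInMemory=100
```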
counters from successful four-step run

INFO mapred.JobClient: Job complete: job_201409191450_0001
INFO mapred.JobClient: Counters: 39
INFO mapred.JobClient:   File System Counters
INFO mapred.JobClient:     FILE: Number of bytes read=0
INFO mapred.JobClient:     FILE: Number of bytes written=1694975
INFO mapred.JobClient:     FILE: Number of read operations=0
INFO mapred.JobClient:     FILE: Number of large read operations=0
INFO mapred.JobClient:     FILE: Number of write operations=0
INFO mapred.JobClient:     HDFS: Number of bytes read=10016293
INFO mapred.JobClient:     HDFS: Number of bytes written=113612773
INFO mapred.JobClient:     HDFS: Number of read operations=12
INFO mapred.JobClient:     HDFS: Number of large read operations=0
INFO mapred.JobClient:     HDFS: Number of write operations=9
INFO mapred.JobClient:   Job Counters
INFO mapred.JobClient:     Launched map tasks=9
INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=206659
INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
INFO mapred.JobClient:   Map-Reduce Framework
INFO mapred.JobClient:     Map input records=9
INFO mapred.JobClient:     Map output records=0
INFO mapred.JobClient:     Input split bytes=396
INFO mapred.JobClient:     Spilled Records=0
INFO mapred.JobClient:     CPU time spent (ms)=243280
INFO mapred.JobClient:     Physical memory (bytes) snapshot=9947144192
INFO mapred.JobClient:     Virtual memory (bytes) snapshot=25884065792
INFO mapred.JobClient:     Total committed heap usage (bytes)=10392305664
INFO mapred.JobClient:   Giraph Stats
INFO mapred.JobClient:     Aggregate edges=402428
INFO mapred.JobClient:     Aggregate finished vertices=119141
INFO mapred.JobClient:     Aggregate vertices=119141
INFO mapred.JobClient:     Current master task partition=0
INFO mapred.JobClient:     Current workers=8
INFO mapred.JobClient:     Last checkpointed superstep=0
INFO mapred.JobClient:     Sent messages=0
INFO mapred.JobClient:     Superstep=4
INFO mapred.JobClient:   Giraph Timers
INFO mapred.JobClient:     Input superstep (milliseconds)=1689
INFO mapred.JobClient:     Setup (milliseconds)=3977
INFO mapred.JobClient:     Shutdown (milliseconds)=1177
INFO mapred.JobClient:     Superstep 0 (milliseconds)=834
INFO mapred.JobClient:     Superstep 1 (milliseconds)=1836
INFO mapred.JobClient:     Superstep 2 (milliseconds)=2524
INFO mapred.JobClient:     Superstep 3 (milliseconds)=8284
INFO mapred.JobClient:     Total (milliseconds)=20322

EOF

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
Re: how do I maintain a cached List across supersteps?
Thanks to Claudio and Matthew. I went with the WorkerContext solution. Note that I wrote a MasterCompute.validate() to verify that the correct WorkerContext class was set; otherwise I was worried my cast would fail. -- matt

On Wed, Sep 17, 2014 at 11:49 AM, Claudio Martella claudio.marte...@gmail.com wrote:

I would use a WorkerContext; it is shared and persistent during the computation by all vertices in a worker. If it's read-only, you won't have to manage concurrency.

On Tue, Sep 16, 2014 at 9:42 PM, Matthew Cornell m...@matthewcornell.org wrote:

Hi Folks. I have a custom argument that's passed into my Giraph job that needs parsing. The parsed value is accessed by my Vertex#compute. To avoid excessive GC I'd like to cache the parsing results. What's a good way to do so? I looked at using the ImmutableClassesGiraphConfiguration returned by getConf(), but it supports only String properties. I looked at using my custom MasterCompute to manage it, but I couldn't find how to access the master compute instance from the vertex. My last idea is to use (abuse?) an aggregator to do this. I'd appreciate your thoughts! -- matt

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org

-- Claudio Martella

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
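The parse-once-per-worker pattern discussed above can be reduced to a tiny Giraph-free sketch. The class and method names here are made up; in a real job the cached field would live on a WorkerContext subclass, the parse would typically run in preApplication(), and the raw string would come from the job configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a WorkerContext that parses a custom
// argument of the form "[item1, item2, item3]" once and caches it,
// so Vertex#compute never re-parses (and never re-allocates).
public class CachedArgContext {
    private List<String> parsed;  // parsed once per worker, then reused

    // Read-only after the first call, so concurrent readers are safe.
    public synchronized List<String> getItems(String raw) {
        if (parsed == null) {
            List<String> items = new ArrayList<>();
            String body = raw.substring(1, raw.length() - 1);  // strip [ ]
            for (String s : body.split(",")) {
                String t = s.trim();
                if (!t.isEmpty()) items.add(t);
            }
            parsed = Collections.unmodifiableList(items);
        }
        return parsed;
    }

    public static void main(String[] args) {
        CachedArgContext ctx = new CachedArgContext();
        List<String> a = ctx.getItems("[item1, item2, item3]");
        List<String> b = ctx.getItems("[ignored on second call]");
        System.out.println(a);
        System.out.println(a == b);  // same cached instance both times
    }
}
```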
how do I maintain a cached List across supersteps?
Hi Folks. I have a custom argument that's passed into my Giraph job that needs parsing. The parsed value is accessed by my Vertex#compute. To avoid excessive GC I'd like to cache the parsing results. What's a good way to do so? I looked at using the ImmutableClassesGiraphConfiguration returned by getConf(), but it supports only String properties. I looked at using my custom MasterCompute to manage it, but I couldn't find how to access the master compute instance from the vertex. My last idea is to use (abuse?) an aggregator to do this. I'd appreciate your thoughts! -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
How do I validate customArguments?
Hi again. My application needs to pass in a String argument to the computation which each Vertex needs access to. (The argument is a list of the form [item1, item2, ...].) I found --customArguments (which I set in my tests via conf.set(arg_name, arg_val)) but I need to check that it's properly formatted. Where do I do that? The only thing I thought of is to specify a DefaultMasterCompute subclass whose initialize() does the check, but all the initialize() examples do is register aggregators; none of them check args or do anything else. Thanks in advance! -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
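One way to fail fast, per the MasterCompute.initialize() idea above, is a small format check on the raw string before the computation starts. A sketch under assumptions (class and method names are invented; in initialize() you might call this on the value read from the configuration and throw IllegalStateException on false):

```java
// Hypothetical validator for a custom argument of the form "[item1, item2, ...]".
public class CustomArgValidator {
    public static boolean isValidListArg(String raw) {
        if (raw == null) return false;
        String s = raw.trim();
        // Must be bracketed: "[...]"
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            return false;
        }
        String body = s.substring(1, s.length() - 1).trim();
        if (body.isEmpty()) return true;  // treat "[]" as a valid empty list
        for (String item : body.split(",", -1)) {
            if (item.trim().isEmpty()) return false;  // "[a,,b]" or "[a,]"
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidListArg("[item1, item2]"));  // true
        System.out.println(isValidListArg("item1, item2"));    // false: no brackets
        System.out.println(isValidListArg("[a,,b]"));          // false: empty item
    }
}
```

Failing in initialize() surfaces the bad argument once, at job start, instead of as a confusing per-vertex error in superstep 0.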
Which is better to use to manage Vertex state: POJO instance variables or Giraph values?
Hi there. I'm confused about when it's OK to use Vertex instance variables to maintain state rather than proper Giraph values à la getValue(). An interesting example I found in the source demonstrates both: SimpleTriangleClosingVertex, which has both an instance variable (closeMap) and a custom vertex value (IntArrayListWritable). I'm a little surprised that using an instance variable is legit, since it could possibly screw up serialization (?). My question: Is either valid? If so, how do I choose one over the other? Thanks very much. -- matt -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
How do I look up a Vertex using its ID?
Hi Folks. I have a graph computation that passes 'visited' Vertex IDs around, and I need to output information from those in the output phase. How do I look up a Vertex from its ID? I found Partition.getVertex(), but IIUC there is no guarantee that an arbitrary Vertex will be in a particular partition. Thanks in advance. -- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org