Re: How to ensure that only one worker runs per node

2014-10-30 Thread Matthew Cornell
As I understand it: 1) set that variable to 1, as you say, and 2) set
the number of workers to the number of nodes - 1 (one slot is taken by
the master). When you run a job you can follow the 'map' link in the
jobtracker UI to see all the workers plus the master.
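
A minimal sketch of how that could look when configuring the job from code --
the Giraph 1.0-era class and method names below are from memory (verify
against your build), and the vertex class and node count are just
placeholders:

import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.examples.SimpleShortestPathsVertex;
import org.apache.giraph.job.GiraphJob;

public class OneWorkerPerNodeJob {
  public static void main(String[] args) throws Exception {
    // Assumes mapred.tasktracker.map.tasks.maximum=1 is already set in
    // mapred-site.xml on every TaskTracker (as far as I know it is a
    // daemon-side setting, not a per-job one), so each node offers one map slot.
    GiraphConfiguration conf = new GiraphConfiguration();
    conf.setVertexClass(SimpleShortestPathsVertex.class);  // placeholder computation

    int numNodes = 5;               // hypothetical cluster size
    int numWorkers = numNodes - 1;  // one slot is consumed by the master task
    conf.setWorkerConfiguration(numWorkers, numWorkers, 100.0f);

    // Vertex input/output formats and paths omitted from this sketch.
    GiraphJob job = new GiraphJob(conf, "one-worker-per-node");
    System.exit(job.run(true) ? 0 : -1);
  }
}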

On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz sal...@gmail.com wrote:
 Hi everyone,

 Is there a good way (a configuration I'm guessing) to prevent more than one
 worker from running per node? I saw in this thread to use
 mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be working.
 Thanks for the help.

 Best,
 Matthew




-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


Missing chosen worker ERROR drills down to end of stream exception (likely client has closed socket). help!

2014-10-28 Thread Matthew Cornell
Hi All,

I have a Giraph 1.0.0 job that has failed, but I'm not able to get much
detail about what really happened. The master's log says:

 2014-10-28 10:28:32,006 ERROR org.apache.giraph.master.BspServiceMaster: 
 superstepChosenWorkerAlive: Missing chosen worker 
 Worker(hostname=compute-0-0.wright, MRtaskID=1, port=30001) on superstep 4

OK, this seems to say compute-0-0 failed in some way, correct? The
Ganglia pages show no noticeable OS differences between the failed
node and another identical compute node. In the failed node's log I
see two WARNs:

 2014-10-28 10:28:19,560 WARN org.apache.giraph.bsp.BspService: process: 
 Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
 state:Disconnected type:None path:null
 2014-10-28 10:28:19,560 WARN org.apache.giraph.worker.InputSplitsHandler: 
 process: Problem with zookeeper, got event with path null, state 
 Disconnected, event type None

OK, I guess there was a ZooKeeper issue. In the ZooKeeper log I find:

 2014-10-28 10:28:14,917 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 caught end of stream exception
 EndOfStreamException: Unable to read additional data from client sessionid 
 0x149529c74de0a4d, likely client has closed socket
 at 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
 at 
 org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
 at java.lang.Thread.run(Thread.java:745)

OK, so I guess the socket closure was the problem. But why did *that* happen?
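
My only guess so far: a long GC pause (or some other stall) on the worker
that outlasts the ZooKeeper session timeout would make ZooKeeper expire the
session and close the socket, which would match both warnings above. If that
turns out to be it, a hedged sketch of raising the timeout -- the property
name is the one I recall from the Giraph 1.0 constants, so please verify:

// in the job-setup code (GiraphConfiguration is org.apache.giraph.conf.GiraphConfiguration)
GiraphConfiguration conf = new GiraphConfiguration();
conf.setInt("giraph.zkSessionMsecTimeout", 120 * 1000);  // default is 60000 ms, I believe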

I could really use your help here!

Thank you,

matt


-- 
Matthew Cornell | m...@matthewcornell.org


Re: YARN vs. MR1: is YARN a good idea? Out-of-Core Graph?

2014-10-22 Thread Matthew Cornell
Following up, does anyone have thoughts re: MR1 vs YARN performance?
Thank you. -- matt

On Wed, Oct 22, 2014 at 8:34 AM,  olivier.var...@orange.com wrote:
 Hello Tripti,

 I bumped into this mail. I am experiencing out-of-memory errors on my small
 cluster, and, as the out-of-core graph option does not seem to work on Giraph
 1.1-SNAPSHOT, I was wondering if you had already posted any JIRA or patch to
 help solve this issue?


 Thanks a lot

 regards


 
 Olivier Varène
 Big Data Referent
 Orange - DSI France/Digital Factory
 olivier.var...@orange.com
 +33 4 97 46 29 94







 On 10 Oct 2014, at 20:15, Tripti Singh tri...@yahoo-inc.com wrote:

 Hi Matthew,
 I would have been thrilled to give you numbers on this one, but for me the
 application is not scaling without the out-of-core option (which isn't
 working the way it did in the previous version).
 I'm still figuring it out and can get back once it's resolved. I have
 patched a few things and will share them for people who might face a similar
 issue. If you have a fix for scalability, do let me know.

 Thanks,
 Tripti

 Sent from my iPhone

 On 06-Oct-2014, at 9:22 pm, Matthew Cornell m...@matthewcornell.org 
 wrote:

 Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
 built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
 Thank you.

 --
 Matthew Cornell | m...@matthewcornell.org


 This message and its attachments may contain confidential or privileged 
 information that may be protected by law;
 they should not be distributed, used or copied without authorisation.
 If you have received this email in error, please notify the sender and delete 
 this message and its attachments.
 As emails may be altered, Orange is not liable for messages that have been 
 modified, changed or falsified.
 Thank you.




-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


YARN vs. MR1: is YARN a good idea?

2014-10-06 Thread Matthew Cornell
Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
Thank you.

-- 
Matthew Cornell | m...@matthewcornell.org


Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster

2014-09-30 Thread Matthew Cornell
I'm new, but in my meager experience when it stops at map 100% it means
there was an error somewhere. In Giraph I've often found it difficult to
pin down what that error actually was (e.g., out of memory), but the logs
are the first place to look. Just to clarify re: not finding outputs: Are
you going to http://your_host.com:50030/jobtracker.jsp and clicking on
the failed job id (e.g., job_201409251209_0029 -
http://your_host.com:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0
)? From there, click the map link in the table to see its tasks. (Giraph
runs entirely as a map task, IIUC.) You should see tasks for the master
plus your workers. If you click on one of them (e.g.,
task_201409251209_0029_m_00 -
http://your_host.com:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_00
) you should see what machine it ran on plus a link to the Task Logs. Click
on All and you should see three sections for stdout, stderr, and syslog,
the latter of which usually contains hints about what went wrong. You
should check all the worker logs.

Hope that helps.


On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis 
ep.pan@gmail.com wrote:

 Good morning,

 I have been having a problem for the past few days which, sadly, I can't solve.

 First of all I set up a Hadoop 0.20.203.0 cluster of two nodes, a master
 and a slave. I followed this tutorial for the settings:
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

 Then I set up Giraph and built it properly with Maven. When I run
 SimpleShortestPathVertex with the number of workers = 2 it runs properly and
 gives me results which I can view from either of the two nodes. The
 jobtracker at master:50030 and slave:50030 and everything else is also
 working as expected.

 However, when I try to run my own algorithm it hangs at map 100% / reduce 0%
 forever. I looked at SimpleShortestPathVertex for any special configuration
 and it has none. And the weird part is: the jobs at the jobtracker have no
 logs in stdout or stderr. The only thing readable is the map task info:

 task_201409300940_0001_m_00 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1
 finished out of 2 on superstep -1
 task_201409300940_0001_m_01 | 100.00% | startSuperstep: WORKER_ONLY -
 Attempt=0, Superstep=-1
 task_201409300940_0001_m_02 | 100.00% | startSuperstep: WORKER_ONLY -
 Attempt=0, Superstep=-1

 Is there anything I'm overlooking? I have Googled the obvious Stack
 Overflow solutions for two days now. Has anyone encountered anything
 similar?

 Regards,
 Panagiotis Eustratiadis.




-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Re: receiving messages that I didn't send

2014-09-25 Thread Matthew Cornell
Thanks for replying, Pavan. I figured out that my message Writable (an
ArrayListWritable) needed to call clear() in readFields() before
calling super.readFields():

@Override
public void readFields(DataInput in) throws IOException {
    clear();                // drop any elements left over from the reused instance
    super.readFields(in);   // then deserialize the incoming list
}

This was an 'of course' moment when I realized it was, like other
Writables, being reused. But what I don't understand is why
ArrayListWritable#readFields() doesn't call clear(). Isn't this a nasty bug? ...
Oh wait - sure enough:

ArrayListWritable object is not cleared in readFields()
https://issues.apache.org/jira/browse/GIRAPH-740
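
For the related gotcha Pavan describes in his reply below -- an optional
message field whose write() does not encode absence -- here is a minimal
hypothetical sketch of a defensive write()/readFields() pair (the PairMessage
class and its fields are made up for illustration, not from my code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class PairMessage implements Writable {
  private long a;
  private Text b;  // optional field; may be null

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(a);
    out.writeBoolean(b != null);  // explicitly encode whether b is present
    if (b != null) {
      b.write(out);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    a = in.readLong();
    if (in.readBoolean()) {
      if (b == null) {
        b = new Text();
      }
      b.readFields(in);
    } else {
      b = null;  // reset: this instance may be a reused object
    }
  }
}

Without the presence flag and the reset, a reused instance can show the
previous message's b, which is exactly the symptom described below.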

Thanks again,

matt


On Tue, Sep 23, 2014 at 11:46 AM, Pavan Kumar A pava...@outlook.com wrote:
 Can you give more context?
 What are the types of messages, patch of your compute method, etc.
 You will not receive messages that were never sent, but one thing that can
 happen is this: a message can have multiple fields. Suppose a message class m
 has two fields, a and b, and in m's write(out) you do not handle the case of
 b = null. If m1 sets b and m2 has b = null, then because of the incorrect
 write() code, m2 can show up with b = m1.b. That is because message objects
 are re-used when receiving; this is a Giraph gotcha caused by object reuse in
 most iterators.

 Thanks

 From: m...@matthewcornell.org
 Date: Tue, 23 Sep 2014 10:10:48 -0400
 Subject: receiving messages that I didn't send
 To: user@giraph.apache.org


 Hi Folks. I am refactoring my compute() to use a set of ids as its
 message type, and in my tests it is receiving a message that it
 absolutely did not send. I've debugged it and am at a loss.
 Interestingly, I encountered this once before and solved it by
 creating a copy of a Writable instead of re-using it, but I haven't
 been able to solve it this time. In general, does this anomalous
 behavior indicate a Giraph/Hadoop gotcha? It's really confounding!
 Thanks very much -- matt

 --
 Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
 Dickinson Street, Amherst MA 01002 | matthewcornell.org



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


receiving messages that I didn't send

2014-09-23 Thread Matthew Cornell
Hi Folks. I am refactoring my compute() to use a set of ids as its
message type, and in my tests it is receiving a message that it
absolutely did not send. I've debugged it and am at a loss.
Interestingly, I encountered this once before and solved it by
creating a copy of a Writable instead of re-using it, but I haven't
been able to solve it this time. In general, does this anomalous
behavior indicate a Giraph/Hadoop gotcha? It's really confounding!
Thanks very much -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


understanding my failing job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-22 Thread Matthew Cornell
 [cluster summary, truncated]
 Map Task Capacity: 48
 Reduce Task Capacity: 24
 Avg. Tasks/Node: 14.40

Giraph: Compiled as giraph-1.0.0-for-hadoop-2.0.0-alpha, CHANGELOG:
Release 1.0.0 - 2013-04-15


 questions 

o How can I verify that the failure is actually one of memory? I've
looked fairly carefully at the logs.

o I noticed that not all hosts are being used. I did three runs, two
with 8 workers and one with 12, and I pulled the following from the
task logs ('h' = head node, 0-3 = compute nodes):

 run #1: 0, 2, 3, h, h, h, h, h, h
 run #2: 2, 1, 3, h, h, h, h, h, h
 run #3: 3, 3, h, h, h, h, h, h, h, h, h, 1, 1

Note that there's at least one compute node that isn't listed for each run.

o What's a good # of workers to use?

o What Hadoop parameters should I tweak?
 mapred.job.map.memory.mb=xx
 mapred.map.child.java.opts=xx
 mapred.{map|reduce}.child.ulimit
 mapred.task.profile
 # map slots for each TaskTracker
 number of partitions you keep in memory


o What Giraph parameters should I tweak? I'm currently using defaults
for all, but I found these possibilities:
 giraph.maxPartitionsInMemory
 giraph.useOutOfCoreGraph=true
 giraph.maxPartitionsInMemory=N (default: 10)
 giraph.isStaticGraph=true
 giraph.useOutOfCoreMessages=true (default: disabled)
 giraph.maxMessagesInMemory=N (default: 100)
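
For concreteness, a sketch of how I would expect to set the memory and
out-of-core knobs listed above from job-setup code (the values are
placeholders, not tuned for my cluster, and the property names are copied
from the lists above rather than verified against the source):

// in the job-setup code (GiraphConfiguration is org.apache.giraph.conf.GiraphConfiguration)
GiraphConfiguration conf = new GiraphConfiguration();
conf.set("mapred.map.child.java.opts", "-Xmx4g");   // heap for each map-task (worker) JVM
conf.setInt("mapred.job.map.memory.mb", 5120);      // if the scheduler enforces per-slot memory
conf.setBoolean("giraph.useOutOfCoreGraph", true);
conf.setInt("giraph.maxPartitionsInMemory", 10);
conf.setBoolean("giraph.useOutOfCoreMessages", true);
conf.setInt("giraph.maxMessagesInMemory", 1000000);
conf.setBoolean("giraph.isStaticGraph", true);      // only if the graph topology never changes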

o How can I get a feel for how much more processing and memory might
be needed to finish the job, beyond knowing that it's on the last superstep?
For example, of the ~181M sent messages I see during the run, how many
more might be left?

o Why is the Heap Size from the Cluster summary above (972.69
MB/989.88 MB) so low?

Thanks again!


 counters from successful four-step run 

INFO mapred.JobClient: Job complete: job_201409191450_0001
INFO mapred.JobClient: Counters: 39
INFO mapred.JobClient:   File System Counters
INFO mapred.JobClient: FILE: Number of bytes read=0
INFO mapred.JobClient: FILE: Number of bytes written=1694975
INFO mapred.JobClient: FILE: Number of read operations=0
INFO mapred.JobClient: FILE: Number of large read operations=0
INFO mapred.JobClient: FILE: Number of write operations=0
INFO mapred.JobClient: HDFS: Number of bytes read=10016293
INFO mapred.JobClient: HDFS: Number of bytes written=113612773
INFO mapred.JobClient: HDFS: Number of read operations=12
INFO mapred.JobClient: HDFS: Number of large read operations=0
INFO mapred.JobClient: HDFS: Number of write operations=9
INFO mapred.JobClient:   Job Counters
INFO mapred.JobClient: Launched map tasks=9
INFO mapred.JobClient: Total time spent by all maps in occupied
slots (ms)=206659
INFO mapred.JobClient: Total time spent by all reduces in occupied
slots (ms)=0
INFO mapred.JobClient: Total time spent by all maps waiting after
reserving slots (ms)=0
INFO mapred.JobClient: Total time spent by all reduces waiting
after reserving slots (ms)=0
INFO mapred.JobClient:   Map-Reduce Framework
INFO mapred.JobClient: Map input records=9
INFO mapred.JobClient: Map output records=0
INFO mapred.JobClient: Input split bytes=396
INFO mapred.JobClient: Spilled Records=0
INFO mapred.JobClient: CPU time spent (ms)=243280
INFO mapred.JobClient: Physical memory (bytes) snapshot=9947144192
INFO mapred.JobClient: Virtual memory (bytes) snapshot=25884065792
INFO mapred.JobClient: Total committed heap usage (bytes)=10392305664
INFO mapred.JobClient:   Giraph Stats
INFO mapred.JobClient: Aggregate edges=402428
INFO mapred.JobClient: Aggregate finished vertices=119141
INFO mapred.JobClient: Aggregate vertices=119141
INFO mapred.JobClient: Current master task partition=0
INFO mapred.JobClient: Current workers=8
INFO mapred.JobClient: Last checkpointed superstep=0
INFO mapred.JobClient: Sent messages=0
INFO mapred.JobClient: Superstep=4
INFO mapred.JobClient:   Giraph Timers
INFO mapred.JobClient: Input superstep (milliseconds)=1689
INFO mapred.JobClient: Setup (milliseconds)=3977
INFO mapred.JobClient: Shutdown (milliseconds)=1177
INFO mapred.JobClient: Superstep 0 (milliseconds)=834
INFO mapred.JobClient: Superstep 1 (milliseconds)=1836
INFO mapred.JobClient: Superstep 2 (milliseconds)=2524
INFO mapred.JobClient: Superstep 3 (milliseconds)=8284
INFO mapred.JobClient: Total (milliseconds)=20322

 EOF 


-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


Re: how do I maintain a cached List across supersteps?

2014-09-17 Thread Matthew Cornell
Thanks to Claudio and Matthew, I went with the WorkerContext solution. Note
that I wrote a MasterCompute.validate() to verify the correct WorkerContext
class was set. Otherwise I was worried my cast would fail. -- matt
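
For anyone who finds this later, a minimal sketch of the approach, assuming
the Giraph 1.0-era WorkerContext API (the package, method names, and the
"myapp.items" property are my own guesses/placeholders, so verify before use):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.giraph.worker.WorkerContext;

public class CachedListWorkerContext extends WorkerContext {
  private List<String> items;  // parsed once per worker; read-only afterwards

  @Override
  public void preApplication() throws InstantiationException, IllegalAccessException {
    // "myapp.items" is a made-up name for the custom argument.
    String raw = getContext().getConfiguration().get("myapp.items", "");
    items = Collections.unmodifiableList(Arrays.asList(raw.split(",")));
  }

  public List<String> getItems() {
    return items;
  }

  @Override public void postApplication() { }
  @Override public void preSuperstep() { }
  @Override public void postSuperstep() { }
}

Inside compute(), something like
((CachedListWorkerContext) getWorkerContext()).getItems() then reads the
cached list without re-parsing.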

On Wed, Sep 17, 2014 at 11:49 AM, Claudio Martella 
claudio.marte...@gmail.com wrote:

 I would use a WorkerContext; it is shared by all vertices in a worker and
 persists across the computation. If it's read-only, you won't have
 to manage concurrency.

 On Tue, Sep 16, 2014 at 9:42 PM, Matthew Cornell m...@matthewcornell.org
 wrote:

 Hi Folks. I have a custom argument that's passed into my Giraph job that
 needs parsing. The parsed value is accessed by my Vertex#compute. To avoid
 excessive GC I'd like to cache the parsing results. What's a good way to do
 so? I looked at using the ImmutableClassesGiraphConfiguration returned by
 getConf(), but it supports only String properties. I looked at using my
 custom MasterCompute to manage it, but I couldn't find how to access the
 master compute instance from the vertex. My last idea is to use (abuse?) an
 aggregator to do this. I'd appreciate your thoughts! -- matt

 --
 Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
 Street, Amherst MA 01002 | matthewcornell.org




 --
Claudio Martella





-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


how do I maintain a cached List across supersteps?

2014-09-16 Thread Matthew Cornell
Hi Folks. I have a custom argument that's passed into my Giraph job that
needs parsing. The parsed value is accessed by my Vertex#compute. To avoid
excessive GC I'd like to cache the parsing results. What's a good way to do
so? I looked at using the ImmutableClassesGiraphConfiguration returned by
getConf(), but it supports only String properties. I looked at using my
custom MasterCompute to manage it, but I couldn't find how to access the
master compute instance from the vertex. My last idea is to use (abuse?) an
aggregator to do this. I'd appreciate your thoughts! -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I validate customArguments?

2014-08-26 Thread Matthew Cornell
Hi again. My application needs to pass in a String argument to the
computation which each Vertex needs access to. (The argument is a list of
the form [item1, item2, ...].) I found --customArguments (which I set in
my tests via conf.set(arg_name, arg_val)) but I need to check that it's
properly formatted. Where do I do that? The only thing I thought of is to
specify a DefaultMasterCompute subclass whose initialize() does the check,
but all the initialize() examples do is register aggregators; none of them
check args or do anything else. Thanks in advance! -- matt
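
Something like the following is what I have in mind -- a hedged sketch of a
DefaultMasterCompute subclass whose initialize() does the check ("myapp.items"
and the validation are made up for illustration; I'm assuming getConf() is
available on MasterCompute in this version):

import org.apache.giraph.master.DefaultMasterCompute;

public class ValidatingMasterCompute extends DefaultMasterCompute {
  @Override
  public void initialize() throws InstantiationException, IllegalAccessException {
    // The custom argument is set via --customArguments or conf.set() in tests.
    String raw = getConf().get("myapp.items");
    if (raw == null || !raw.startsWith("[") || !raw.endsWith("]")) {
      throw new IllegalStateException(
          "myapp.items must look like [item1, item2, ...] but was: " + raw);
    }
  }
}

Since initialize() runs once before superstep 0, a malformed argument would
fail the job early instead of surfacing inside every vertex's compute().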

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Which is better to use to manage Vertex state: POJO instance variables or Giraph values?

2014-08-26 Thread Matthew Cornell
Hi there. I'm confused about when it's OK to use Vertex instance variables
to maintain state rather than proper Giraph values a la getValue(). An
interesting example I found in the source demonstrates both:
SimpleTriangleClosingVertex, which has both an instance variable (closeMap)
and a custom vertex value (IntArrayListWritable). I'm a little surprised
that using an instance variable is legit, since it seems like it could break
serialization (?). My question: Is either valid? If so, how do I choose one
over the other? Thanks very much. -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I look up a Vertex using its ID?

2014-08-25 Thread Matthew Cornell
Hi Folks. I have a graph computation that passes 'visited' Vertex IDs
around, and I need to output information from those in the output phase.
How do I look up a Vertex from its ID? I found Partition.getVertex(), but
IIUC there is no guarantee that an arbitrary Vertex will be in a particular
partition. Thanks in advance.

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org