Re: How to ensure that only one worker runs per node

2014-10-30 Thread Matthew Cornell
As I understand it: 1) set that property to 1 as you say, and 2) set
the number of workers to the number of nodes minus one (reserving one
task for the master). When you run a job you can look at the 'map' link
in the JobTracker UI to see all the workers plus the master.
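For reference, a sketch of where that setting would typically live in MR1
(mapred-site.xml on each TaskTracker node, with a tasktracker restart
afterwards; verify placement against your distribution's documentation):

```xml
<!-- mapred-site.xml (MR1): limit each TaskTracker to a single map slot,
     so at most one Giraph map task (worker or master) lands per node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```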

On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz  wrote:
> Hi everyone,
>
> Is there a good way (a configuration I'm guessing) to prevent more than one
> worker from running per node? I saw in this thread to use
> mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be working.
> Thanks for the help.
>
> Best,
> Matthew
>



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


"Missing chosen worker" ERROR drills down to "end of stream exception" ("likely client has closed socket"). help!

2014-10-28 Thread Matthew Cornell
Hi All,

I have a Giraph 1.0.0 job that has failed, but I'm not able to get
detail as to what really happened. The master's log says:

> 2014-10-28 10:28:32,006 ERROR org.apache.giraph.master.BspServiceMaster: 
> superstepChosenWorkerAlive: Missing chosen worker 
> Worker(hostname=compute-0-0.wright, MRtaskID=1, port=30001) on superstep 4

OK, this seems to say compute-0-0 failed in some way, correct? The
Ganglia pages show no noticeable OS differences between the failed
node and another identical compute node. In the failed node's log I
see two WARNs:

> 2014-10-28 10:28:19,560 WARN org.apache.giraph.bsp.BspService: process: 
> Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
> state:Disconnected type:None path:null
> 2014-10-28 10:28:19,560 WARN org.apache.giraph.worker.InputSplitsHandler: 
> process: Problem with zookeeper, got event with path null, state 
> Disconnected, event type None

OK, I guess there was a zookeeper issue. In the Zookeeper log I find:

> 2014-10-28 10:28:14,917 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x149529c74de0a4d, likely client has closed socket
> at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)

OK, so I guess the socket closure was the problem. But why did *that* happen?
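One common suspect (an assumption to verify, not a confirmed diagnosis) is a
long GC pause on the worker exceeding the ZooKeeper session timeout, so the
ZooKeeper server sees the client socket drop. A hedged sketch of raising the
timeout via a -D option; the option name giraph.zkSessionMsecTimeout is
assumed here and should be checked against your Giraph version's
GiraphConstants:

```
# Sketch: raise the ZooKeeper session timeout so long GC pauses are less
# likely to drop a worker's session. Option name is an assumption; verify
# against your Giraph version before use.
hadoop jar giraph-ex.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.zkSessionMsecTimeout=120000 \
  ...   # remaining job arguments unchanged
```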

I could really use your help here!

Thank you,

matt


-- 
Matthew Cornell | m...@matthewcornell.org


Re: YARN vs. MR1: is YARN a good idea ? Out-of-Core Graph ?

2014-10-22 Thread Matthew Cornell
Following up, does anyone have thoughts re: MR1 vs YARN performance?
Thank you. -- matt

On Wed, Oct 22, 2014 at 8:34 AM,   wrote:
> Hello Tripti,
>
> I bumped into this mail, I am experiencing out-of-memory errors on my small 
> cluster,
> and, as out-of-core graph does not seem to work on giraph 1.1-SNAPSHOT, I was 
> wondering if you had any jira / patched already posted  to help solve this 
> issue ?
>
>
> Thanks a lot
>
> regards
>
>
> 
> Olivier Varène
> Big Data Referent
> Orange - DSI France/Digital Factory
> olivier.var...@orange.com
> +33 4 97 46 29 94
>
>
>
>
>
>
>
> On 10 Oct 2014, at 20:15, Tripti Singh  wrote:
>
>> Hi Matthew,
>> I would have been thrilled to give you numbers on this one, but for me the
>> application is not scaling without the out-of-core option (which isn't
>> working the way it did in the previous version).
>> I'm still figuring it out and can get back once it's resolved. I have
>> patched a few things and will share them for people who might face a
>> similar issue. If you have a fix for scalability, do let me know.
>>
>> Thanks,
>> Tripti
>>
>> Sent from my iPhone
>>
>>> On 06-Oct-2014, at 9:22 pm, "Matthew Cornell"  
>>> wrote:
>>>
>>> Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
>>> built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
>>> Thank you.
>>>
>>> --
>>> Matthew Cornell | m...@matthewcornell.org
>
>
> _
>
>
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been 
> modified, changed or falsified.
> Thank you.
>



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


YARN vs. MR1: is YARN a good idea?

2014-10-06 Thread Matthew Cornell
Hi Folks. I don't think I paid enough attention to YARN vs. MR1 when I
built Giraph 1.0.0 for our system. How much better is Giraph on YARN?
Thank you.

-- 
Matthew Cornell | m...@matthewcornell.org


Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster

2014-09-30 Thread Matthew Cornell
I'm new, but in my meager experience when it stops at map 100% it means
there was an error somewhere. In Giraph I've often found it difficult to
pin down what that error actually was (e.g., out of memory), but the logs
are the first place to look. Just to clarify re: not finding outputs: Are
you going to http://:50030/jobtracker.jsp and clicking on
the failed job id (e.g., job_201409251209_0029 ->
http://:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0
)? From there, click the "map" link in the table to see its tasks. (Giraph
runs entirely as a map task, IIUC.) You should see tasks for the master
plus your workers. If you click on one of them (e.g.,
task_201409251209_0029_m_00 ->
http://:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_00
) you should see what machine it ran on plus a link to the Task Logs. Click
on "All" and you should see three sections for stdout, stderr, and syslog,
the latter of which usually contains hints about what went wrong. You
should check all the worker logs.

Hope that helps.


On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis <
ep.pan@gmail.com> wrote:

> Good morning,
>
> I have been having a problem the past few days which sadly I can't solve.
>
> First of all, I set up a Hadoop 0.20.203.0 cluster of two nodes, a master
> and a slave. I followed this tutorial for the settings:
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Then I set up Giraph, and I built it properly with maven. When I run the
> SimpleShortestPathVertex with number of workers = 2 it runs properly, and
> gives me results which I can view from any of the two nodes. Also the
> jobtracker at master:50030 and slave:50030 and everything else is working
> as expected.
>
> However, when I try to run my own algorithm it hangs at map 100% reduce 0%
> forever. I looked at SimpleShortestPathVertex for any configurations and it
> has none. And the weird part is: the jobs at the jobtracker have no logs at
> stdout or stderr. The only thing readable is the map task info:
>
> task_201409300940_0001_m_00 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1
> finished out of 2 on superstep -1
> task_201409300940_0001_m_01 | 100.00% | startSuperstep: WORKER_ONLY -
> Attempt=0, Superstep=-1
> task_201409300940_0001_m_02 | 100.00% | startSuperstep: WORKER_ONLY -
> Attempt=0, Superstep=-1
>
> Is there anything I'm overlooking? I have Googled the obvious stack
> overflow solutions for two days now. Has anyone encountered anything
> similar?
>
> Regards,
> Panagiotis Eustratiadis.
>



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I control which tasks run on which hosts?

2014-09-29 Thread Matthew Cornell
Hi Folks,

I have a small CDH4 cluster of five hosts (four compute nodes and a head
node - call them 0-3 and 'w') where hosts 0-3 have 4 cores and 16GB RAM
each, and 'w' has 32 cores and 64GB RAM. All five hosts are running
mapreduce tasktracker services, and 'w' is also running the jobtracker.
Resources are tight for my particular Giraph application (a kind of
path-finding), and I've discovered that some configurations of selected
hosts are better than others. My command specifies four workers:

hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=wright.cs.umass.edu:2181 \
-libjars ${LIBJARS} \
relpath.RelPathVertex \
-wc relpath.RelPathWorkerContext \
-mc relpath.RelPathMasterCompute \
-vif relpath.JsonAdjacencyListVertexInputFormat \
-vip $REL_PATH_INPUT \
-of relpath.JsonAdjacencyListTextOutputFormat \
-op $REL_PATH_OUTPUT \
-ca RelPathVertex.path=$REL_PATH_PATH \
-w 4

When Giraph (Zookeeper?) puts three or more of the Giraph map tasks on 'w'
(e.g., 01www or 1), then that host maxes out ram, cpu, and swap, and
the job hangs. However, when the system spreads the work out more evenly so
that 'w' has only two or fewer tasks (e.g., 123ww or 0321w), then the job
finishes fine.

My question is 1) what program is deciding the task-to-host assigment, and
2) how do I control that? Thanks very much!

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-25 Thread Matthew Cornell
On Mon, Sep 22, 2014 at 2:10 PM, Matthew Saltz  wrote:
> In the logs for the workers, do you have a line that looks like:
> 2014-09-21 18:12:13,021 INFO org.apache.giraph.worker.BspServiceWorker:
> finishSuperstep: Waiting on all requests, superstep 93 Memory
> (free/total/max) = 21951.08M / 36456.50M / 43691.00M
>
> Looking at the memory usage in the worker that fails, at the end of the
> superstep before failure, could give you a clue.

Yes, all four workers when I use "-w 4" have those lines:

Task Logs: 'attempt_201409191450_0016_m_01_0': compute-0-1:
2014-09-25 09:28:13,425 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 242.41M / 438.06M / 1820.50M
2014-09-25 09:28:13,817 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 194.77M / 438.06M / 1820.50M
2014-09-25 09:28:14,936 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 383.74M / 600.38M / 1820.50M
2014-09-25 09:28:17,820 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 362.14M / 1007.50M / 1820.50M
2014-09-25 09:28:31,680 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 203.33M / 1661.50M / 1820.50M

Task Logs: 'attempt_201409191450_0016_m_02_0': compute-0-1:
2014-09-25 09:28:13,458 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 887.74M / 964.50M / 1820.50M
2014-09-25 09:28:14,381 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 830.14M / 964.50M / 1820.50M
2014-09-25 09:28:15,337 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 785.66M / 1217.00M / 1820.50M
2014-09-25 09:28:18,114 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 661.72M / 1113.50M / 1820.50M
2014-09-25 09:28:52,451 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 285.90M / 1831.00M / 1831.00M

Task Logs: 'attempt_201409191450_0016_m_03_0': wright:
2014-09-25 09:28:13,456 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 886.23M / 964.50M / 1820.50M
2014-09-25 09:28:14,399 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 826.36M / 964.50M / 1820.50M
2014-09-25 09:28:15,556 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 662.50M / 1217.00M / 1820.50M
2014-09-25 09:28:18,170 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 581.14M / 1115.00M / 1820.50M
2014-09-25 09:29:31,673 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 299.61M / 1834.00M / 1834.00M

Task Logs: 'attempt_201409191450_0016_m_04_0': wright:
2014-09-25 09:28:13,473 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 887.10M / 964.50M / 1820.50M
2014-09-25 09:28:14,374 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 826.65M / 964.50M / 1820.50M
2014-09-25 09:28:15,755 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 980.33M / 1217.00M / 1820.50M
2014-09-25 09:28:18,254 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 517.13M / 1128.50M / 1820.50M
2014-09-25 09:29:34,392 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 271.52M / 1858.50M / 1858.50M


I'm still not clear on a couple of things:

   1. Each compute node has 16GB of memory, but each task has a max of
   ~1820M (<2GB). In Cloudera's web UI, I set "MapReduce Child Java Maximum
   Heap Size" to 2GB (default is 1GB). I will try upping it to 8GB.
   2. I still don't understand why only two of my five possible nodes are
   being used.

Thank you.



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Re: receiving messages that I didn't send

2014-09-25 Thread Matthew Cornell
Thanks for replying, Pavan. I figured out that my message Writable (an
ArrayListWritable) needed to call clear() in readFields() before
calling super.readFields():

@Override
public void readFields(DataInput in) throws IOException {
    clear();
    super.readFields(in);
}
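The pitfall can be shown without any Giraph or Hadoop dependency. The sketch
below uses a plain Java stand-in for a Writable list (class and method names
are illustrative, not Giraph's): when one instance is reused for a second
deserialization, forgetting clear() leaks the old elements into the new
message.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// A Writable-style list message. Giraph-style iterators reuse a single
// instance across many received messages, so readFields() must clear
// previous state before reading.
class ReusableListMessage {
    final List<Long> ids = new ArrayList<>();

    void write(DataOutput out) throws IOException {
        out.writeInt(ids.size());
        for (long id : ids) out.writeLong(id);
    }

    void readFields(DataInput in) throws IOException {
        ids.clear(); // without this line, old ids leak into new messages
        int n = in.readInt();
        for (int i = 0; i < n; i++) ids.add(in.readLong());
    }
}

public class ReuseDemo {
    static byte[] serialize(long... values) throws IOException {
        ReusableListMessage m = new ReusableListMessage();
        for (long v : values) m.ids.add(v);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        m.write(new DataOutputStream(bos));
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ReusableListMessage reused = new ReusableListMessage();
        reused.readFields(new DataInputStream(
                new ByteArrayInputStream(serialize(1L, 2L, 3L))));
        reused.readFields(new DataInputStream(
                new ByteArrayInputStream(serialize(4L))));
        // With clear() the second read yields [4]; without it, [1, 2, 3, 4].
        System.out.println(reused.ids);
    }
}
```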

This was an 'of course' moment when I realized it was, like other
Writables, being reused. But what I don't understand is why doesn't
ArrayListWritable#readFields() call clear? Isn't this a nasty bug? ...
Oh wait - sure enough:

ArrayListWritable object is not cleared in readFields()
https://issues.apache.org/jira/browse/GIRAPH-740

Thanks again,

matt


On Tue, Sep 23, 2014 at 11:46 AM, Pavan Kumar A  wrote:
> Can you give more context?
> What are the message types, a paste of your compute() method, etc.?
> You will not receive messages that were not sent, but one thing that can
> happen is this: a message can have multiple fields. Suppose message
> objects have two fields, a and b, and m's write(out) does not handle the
> case b = null. If m1 sets b and m2 has b = null, then because of the
> incorrect write() code, m2 can appear to have b = m1.b. That is because
> message objects are re-used when receiving; this is a Giraph gotcha,
> due to object reuse in most iterators.
>
> Thanks
>
>> From: m...@matthewcornell.org
>> Date: Tue, 23 Sep 2014 10:10:48 -0400
>> Subject: receiving messages that I didn't send
>> To: user@giraph.apache.org
>
>>
>> Hi Folks. I am refactoring my compute() to use a set of ids as its
>> message type, and in my tests it is receiving a message that it
>> absolutely did not send. I've debugged it and am at a loss.
>> Interestingly, I encountered this once before and solved it by
>> creating a copy of a Writable instead of re-using it, but I haven't
>> been able to solve it this time. In general, does this anomalous
>> behavior indicate a Giraph/Hadoop gotcha? It's really confounding!
>> Thanks very much -- matt
>>
>> --
>> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
>> Dickinson Street, Amherst MA 01002 | matthewcornell.org



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-24 Thread Matthew Cornell
I cannot thank you enough, Matthew. You've given me a lot to
experiment with. -- matt

On Mon, Sep 22, 2014 at 2:10 PM, Matthew Saltz  wrote:
> Hi Matthew,
>
> I answered a few of your questions in-line (unfortunately they might not
> help the larger problem, but hopefully it'll help a bit).
>
> Best,
> Matthew
>
>
> On Mon, Sep 22, 2014 at 5:50 PM, Matthew Cornell 
> wrote:
>>
>> Hi Folks,
>>
>> I've spent the last two months learning, installing, coding, and
>> analyzing the performance of our Giraph app, and I'm able to run on
>> small inputs on our tiny cluster (yay!) I am now stuck trying to
>> figure out why larger inputs fail, why only some compute nodes are
>> being used, and generally how to make sure I've configured hadoop and
>> giraph to use all available CPUs and RAM. I feel that I'm "this
>> close," and I could really use some pointers.
>>
>> Below I share our app, configuration, results and log messages, some
>> questions, and counter output for the successful run. My post here is
>> long (I've broken it into sections delimited with ''), but I hope
>> I've provided good enough information to get help on. I'm happy to add
>> to it.
>>
>> Thanks!
>>
>>
>>  application 
>>
>> Our application is a kind of path search where all nodes have a type
>> and source database ID (e.g., "movie 99"), and searches are expressed
>> as type paths, such as "movie, acted_in, actor", which would start
>> with movies and then find all actors in each movie, for all movies in
>> the database. The program does a kind of filtering by keeping track of
>> previously-processed initial IDs.
>>
>> Our database is a small movie one with 2K movies, 6K users (people who
>> rate movies), and 80K ratings of movies by users. Though small, we've
>> found this kind of search can result in a massive explosion of
>> messages, as was well put by Rob Vesse (
>>
>> http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3ccec4a409.2d7ad%25rve...@dotnetrdf.org%3E
>> ):
>>
>> > even with this relatively small graph you get a massive explosion of
>> > messages by the later super steps which exhausts memory (in my graph the
>> > theoretical maximum messages by the last super step was ~3 billion)
>>
>>
>>  job failure and error messages 
>>
>> Currently I have a four-step path that completes in ~20 seconds
>> ("rates, movie, rates, user" - counter output shown at bottom) but a
>> five-step one ("rates, movie, rates, user, rates") fails after a few
>> minutes. I've looked carefully at the task logs, but I find it a
>> little difficult to discern what the actual failure was. However,
>> looking at system information (e.g., top and ganglia) during the run
>> indicates hosts are running out of memory. There are no
>> OutOfMemoryErrors in the logs, and only this one stands out:
>>
>> > ERROR org.apache.giraph.master.BspServiceMaster:
>> > superstepChosenWorkerAlive: Missing chosen worker
>> > Worker(hostname=compute-0-3.wright, MRtaskID=1, port=30001) on superstep 4
>>
>> NB: So far I've been ignoring these other types of messages:
>>
>> > FATAL org.apache.giraph.master.BspServiceMaster: getLastGoodCheckpoint:
>> > No last good checkpoints can be found, killing the job.
>>
>> > java.io.FileNotFoundException: File
>> > _bsp/_checkpoints/job_201409191450_0003 does not exist.
>>
>> > WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed
>> > event
>> > (path=/_hadoopBsp/job_201409191450_0003/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished,
>> > type=NodeDeleted, state=SyncConnected)
>>
>> > ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got
>> > failure, unregistering health on
>> > /_hadoopBsp/job_201409191450_0003/_applicationAttemptsDir/0/_superstepDir/4/_workerHealthyDir/compute-0-3.wright_1
>> > on superstep 4
>>
>> The counter statistics are minimal after the run fails, but during it
>> I see something like this when refreshing the Job Tracker Web UI:
>>
>> > Counters > Map-Reduce Framework > Physical memory (bytes) snapshot:
>> > ~28GB
>> > Counters > Map-Reduce Framework > Virtual memory (bytes) snapshot: ~27GB
>> > Counters > Giraph Stats > Sent messages: ~181M
>>
>>
>>  hadoop/giraph command 
>>
>> h

receiving messages that I didn't send

2014-09-23 Thread Matthew Cornell
Hi Folks. I am refactoring my compute() to use a set of ids as its
message type, and in my tests it is receiving a message that it
absolutely did not send. I've debugged it and am at a loss.
Interestingly, I encountered this once before and solved it by
creating a copy of a Writable instead of re-using it, but I haven't
been able to solve it this time. In general, does this anomalous
behavior indicate a Giraph/Hadoop gotcha? It's really confounding!
Thanks very much -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-22 Thread Matthew Cornell
GB
> I/O Sort Memory Buffer (io.sort.mb): 512 MB
> Client Java Heap Size in Bytes: 256 MB
> Java Heap Size of Jobtracker in Bytes 1 GB
> Java Heap Size of TaskTracker in Bytes: 1 GB

Cluster summary from the Job Tracker Web UI:

> Heap Size is 972.69 MB/989.88 MB
> Map Task Capacity: 48
> Reduce Task Capacity: 24
> Avg. Tasks/Node: 14.40

Giraph: Compiled as "giraph-1.0.0-for-hadoop-2.0.0-alpha", CHANGELOG:
Release 1.0.0 - 2013-04-15


 questions 

o How can I verify that the failure is actually one of memory? I've
looked fairly carefully at the logs.

o I noticed that not all hosts are being used. I did three runs, two
with 8 workers and one with 12, and I pulled the following from the
task logs ('h' = head node, 0-3 = compute nodes):

> run #1: 0, 2, 3, h, h, h, h, h, h
> run #2: 2, 1, 3, h, h, h, h, h, h
> run #3: 3, 3, h, h, h, h, h, h, h, h, h, 1, 1

Note that there's at least one compute node that isn't listed for each run.

o What's a good # of workers to use?

o What Hadoop parameters should I tweak?
> mapred.job.map.memory.mb=xx
> mapred.map.child.java.opts=xx
> mapred.{map|reduce}.child.ulimit
> mapred.task.profile
> # map slots for each TaskTracker
> number of partitions you keep in memory


o What Giraph parameters should I tweak? I'm currently using defaults
for all, but I found these possibilities:
> giraph.maxPartitionsInMemory
> giraph.useOutOfCoreGraph=true
> giraph.maxPartitionsInMemory=N (default: 10)
> giraph.isStaticGraph=true
> giraph.useOutOfCoreMessages=true (default: disabled)
> giraph.maxMessagesInMemory=N (default: 100)
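For what it's worth, options like those listed above would be passed as -D
flags on the job command line. A sketch only; the option names are taken
from the list above and the values are illustrative, so verify both against
your Giraph version before relying on them:

```
# Sketch: enabling out-of-core graph and messages via -D options.
hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.maxPartitionsInMemory=10 \
  -Dgiraph.useOutOfCoreMessages=true \
  ...   # remaining job arguments as in the original command
```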

o How can I get a feel for how much more processing and memory might
be needed to finish the job, beyond that it's on the last superstep?
For example, of the ~181M sent messages I see during the run, how many
more might be left?

o Why is the Heap Size from the Cluster summary above (972.69
MB/989.88 MB) so low?

Thanks again!


 counters from successful four-step run 

INFO mapred.JobClient: Job complete: job_201409191450_0001
INFO mapred.JobClient: Counters: 39
INFO mapred.JobClient:   File System Counters
INFO mapred.JobClient: FILE: Number of bytes read=0
INFO mapred.JobClient: FILE: Number of bytes written=1694975
INFO mapred.JobClient: FILE: Number of read operations=0
INFO mapred.JobClient: FILE: Number of large read operations=0
INFO mapred.JobClient: FILE: Number of write operations=0
INFO mapred.JobClient: HDFS: Number of bytes read=10016293
INFO mapred.JobClient: HDFS: Number of bytes written=113612773
INFO mapred.JobClient: HDFS: Number of read operations=12
INFO mapred.JobClient: HDFS: Number of large read operations=0
INFO mapred.JobClient: HDFS: Number of write operations=9
INFO mapred.JobClient:   Job Counters
INFO mapred.JobClient: Launched map tasks=9
INFO mapred.JobClient: Total time spent by all maps in occupied
slots (ms)=206659
INFO mapred.JobClient: Total time spent by all reduces in occupied
slots (ms)=0
INFO mapred.JobClient: Total time spent by all maps waiting after
reserving slots (ms)=0
INFO mapred.JobClient: Total time spent by all reduces waiting
after reserving slots (ms)=0
INFO mapred.JobClient:   Map-Reduce Framework
INFO mapred.JobClient: Map input records=9
INFO mapred.JobClient: Map output records=0
INFO mapred.JobClient: Input split bytes=396
INFO mapred.JobClient: Spilled Records=0
INFO mapred.JobClient: CPU time spent (ms)=243280
INFO mapred.JobClient: Physical memory (bytes) snapshot=9947144192
INFO mapred.JobClient: Virtual memory (bytes) snapshot=25884065792
INFO mapred.JobClient: Total committed heap usage (bytes)=10392305664
INFO mapred.JobClient:   Giraph Stats
INFO mapred.JobClient: Aggregate edges=402428
INFO mapred.JobClient: Aggregate finished vertices=119141
INFO mapred.JobClient: Aggregate vertices=119141
INFO mapred.JobClient: Current master task partition=0
INFO mapred.JobClient: Current workers=8
INFO mapred.JobClient: Last checkpointed superstep=0
INFO mapred.JobClient: Sent messages=0
INFO mapred.JobClient: Superstep=4
INFO mapred.JobClient:   Giraph Timers
INFO mapred.JobClient: Input superstep (milliseconds)=1689
INFO mapred.JobClient: Setup (milliseconds)=3977
INFO mapred.JobClient: Shutdown (milliseconds)=1177
INFO mapred.JobClient: Superstep 0 (milliseconds)=834
INFO mapred.JobClient: Superstep 1 (milliseconds)=1836
INFO mapred.JobClient: Superstep 2 (milliseconds)=2524
INFO mapred.JobClient: Superstep 3 (milliseconds)=8284
INFO mapred.JobClient: Total (milliseconds)=20322

 EOF 


-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34
Dickinson Street, Amherst MA 01002 | matthewcornell.org


Re: how do I maintain a cached List across supersteps?

2014-09-17 Thread Matthew Cornell
Thanks to Claudio and Matthew, I went with the WorkerContext solution. Note
that I wrote a MasterCompute.validate() to verify the correct WorkerContext
class was set. Otherwise I was worried my cast would fail. -- matt
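The "parse once, read many" logic can be sketched independently of Giraph
(in the real job it would live in a WorkerContext subclass; the class and
method names below are illustrative assumptions, not Giraph API):

```java
import java.util.Arrays;
import java.util.List;

// Parse a custom argument of the form "[item1, item2, ...]" once per
// worker and cache the result, so Vertex#compute reads the parsed value
// instead of re-parsing (and re-allocating) on every call.
public class PathArgCache {
    private static volatile List<String> cachedPath;

    static List<String> parse(String raw) {
        String trimmed = raw.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("expected [a, b, ...]: " + raw);
        }
        String body = trimmed.substring(1, trimmed.length() - 1);
        return Arrays.asList(body.split("\\s*,\\s*"));
    }

    // In the real job this would be initialized from
    // WorkerContext#preApplication and then read concurrently by vertices;
    // double-checked locking keeps the one-time parse thread-safe here.
    static List<String> get(String raw) {
        List<String> local = cachedPath;
        if (local == null) {
            synchronized (PathArgCache.class) {
                if (cachedPath == null) cachedPath = parse(raw);
                local = cachedPath;
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(get("[movie, acted_in, actor]"));
    }
}
```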

On Wed, Sep 17, 2014 at 11:49 AM, Claudio Martella <
claudio.marte...@gmail.com> wrote:

> I would use a workercontext, it is shared and persistent during
> computation by all vertices in a worker. If it's readonly, you won't have
> to manage concurrency.
>
> On Tue, Sep 16, 2014 at 9:42 PM, Matthew Cornell 
> wrote:
>
>> Hi Folks. I have a custom argument that's passed into my Giraph job that
>> needs parsing. The parsed value is accessed by my Vertex#compute. To avoid
>> excessive GC I'd like to cache the parsing results. What's a good way to do
>> so? I looked at using the ImmutableClassesGiraphConfiguration returned by
>> getConf(), but it supports only String properties. I looked at using my
>> custom MasterCompute to manage it, but I couldn't find how to access the
>> master compute instance from the vertex. My last idea is to use (abuse?) an
>> aggregator to do this. I'd appreciate your thoughts! -- matt
>>
>> --
>> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
>> Street, Amherst MA 01002 | matthewcornell.org
>>
>
>
>
> --
>Claudio Martella
>
>



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


how do I maintain a cached List across supersteps?

2014-09-16 Thread Matthew Cornell
Hi Folks. I have a custom argument that's passed into my Giraph job that
needs parsing. The parsed value is accessed by my Vertex#compute. To avoid
excessive GC I'd like to cache the parsing results. What's a good way to do
so? I looked at using the ImmutableClassesGiraphConfiguration returned by
getConf(), but it supports only String properties. I looked at using my
custom MasterCompute to manage it, but I couldn't find how to access the
master compute instance from the vertex. My last idea is to use (abuse?) an
aggregator to do this. I'd appreciate your thoughts! -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Re: How do I validate customArguments?

2014-09-10 Thread Matthew Cornell
Sorry for the long delay, Matthew. That's really helpful. Right now I'm
stuck on apparently running out of memory on our little cluster, but the
log messages are confusing. I'm putting together a question, but in the
meantime I'll try one of the simpler examples such as degree count to see
if /anything/ will run against my graph, which is very small (100K nodes
and edges). -- matt

On Thu, Aug 28, 2014 at 2:26 PM, Matthew Saltz  wrote:

> Matt,
>
> I'm not sure if you've resolved this problem already or not, but if you
> haven't: The initialize() method isn't limited to registering aggregators,
> and in fact, in my project I use it to do exactly what you're describing to
> check and load custom configuration parameters. Inside the initialize()
> method, I do this:
>
> String numPreprocessingStepsConf =
>     getConf().get(NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT);
> numPreprocessingSteps = (numPreprocessingStepsConf != null) ?
>     Integer.parseInt(numPreprocessingStepsConf.trim()) :
>     DEFAULT_NUMBER_OF_PREPROCESSING_STEPS;
> System.out.println("Number of preprocessing steps: " +
>     numPreprocessingSteps);
>
> where at the class level I declare:
>
>   public static final String NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT =
> "wcc.numPreprocessingSteps";
>   public static final int DEFAULT_NUMBER_OF_PREPROCESSING_STEPS = 1;
>   public static int numPreprocessingSteps;
>
> To set the property, I use the option "-ca
> wcc.numPreprocessingSteps=". If you need to check
> that it's properly formatted and not store them, this is a fine place to do
> it as well, given that it's run before the input superstep (see the giraph
> code in BspServiceMaster, line 1617 in the stable 1.1.0 release). What
> happens is that on the master, the MasterThread calls coordinateSuperstep()
> on a BspServiceMaster object, which checks if it's the input superstep, and
> if so, calls initialize() on the MasterCompute object (created in the
> becomeMaster() method of BspServiceMaster).
>
> Hope this helps,
> Matthew
>
>
>
> On Tue, Aug 26, 2014 at 4:36 PM, Matthew Cornell 
> wrote:
>
>> Hi again. My application needs to pass in a String argument to the
>> computation which each Vertex needs access to. (The argument is a list of
>> the form "[item1, item2, ...]".) I found --customArguments (which I set in
>> my tests via conf.set(, )) but I need to check that it's
>> properly formatted. Where do I do that? The only thing I thought of is to
>> specify a DefaultMasterCompute subclass whose initialize() does the check,
>> but all the initialize() examples do is register aggregators; none of them
>> check args or do anything else. Thanks in advance! -- matt
>>
>> --
>> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
>> Street, Amherst MA 01002 | matthewcornell.org
>>
>
>


-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org
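
The check described in the reply above can be factored into a plain helper
that a DefaultMasterCompute subclass's initialize() would call after reading
the raw string via getConf().get(...). Below is a minimal, self-contained
sketch of validating a "[item1, item2, ...]" style custom argument (as in the
original question); the class and method names are made up for illustration
and are not Giraph API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for validating a bracketed-list custom argument.
 * In a real job, a DefaultMasterCompute.initialize() would call
 * parseListArg(getConf().get(...)) and let the exception fail the job fast.
 */
public class CustomArgValidator {

    /** Parses a "[item1, item2, ...]" argument into its items,
     *  throwing IllegalArgumentException on a malformed value. */
    public static List<String> parseListArg(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("missing custom argument");
        }
        String trimmed = raw.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException(
                "expected a [item1, item2, ...] list, got: " + raw);
        }
        List<String> items = new ArrayList<>();
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return items;  // "[]" is a valid empty list
        }
        for (String item : body.split(",")) {
            String s = item.trim();
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty item in: " + raw);
            }
            items.add(s);
        }
        return items;
    }
}
```

Failing inside initialize() is attractive precisely because, as noted above,
it runs before the input superstep, so a malformed argument aborts the job
before any graph loading happens.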


Which is better: sending many small messages or fewer large ones?

2014-09-04 Thread Matthew Cornell
Hi Everyone,

I have an app whose messaging granularity could be written two ways -
sending many small messages vs. (possibly far) fewer larger ones.
Conceptually what moves around is a set of 'alive' vertex IDs that might
get filtered at each superstep based on a processed list (vertex value)
that vertices manage. The ones that survive to the end are the lucky
winners. compute() calculates a set of 'new-to-me' incoming IDs that are
perfect for the outgoing message, but I could easily send each ID one at a
time. My guess is that sending fewer messages is more important, but each
set might contain thousands of IDs.

Thanks!

P.S. A side question: The few custom message type examples I've found are
relatively simple objects with a few primitive instance variables, rather
than collections. Is it nutty to send around a collection of IDs as a
message?
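
A message wrapping a collection of IDs is not nutty at all; in Giraph it just
has to implement org.apache.hadoop.io.Writable. Here is a self-contained
sketch using only java.io (whose DataInput/DataOutput are the same interfaces
Writable's methods take), so it runs on its own; the class name is invented
for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a message type carrying a whole collection of vertex IDs.
 * In Giraph this class would declare "implements Writable"; the two
 * methods below already have Writable's signatures.
 */
public class IdSetMessage {
    private List<Long> ids = new ArrayList<>();

    public List<Long> getIds() { return ids; }

    /** Writable-style serialization: a count followed by the IDs. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(ids.size());
        for (long id : ids) {
            out.writeLong(id);
        }
    }

    /** Writable-style deserialization; clears existing state first,
     *  since Giraph may reuse message objects. */
    public void readFields(DataInput in) throws IOException {
        ids.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            ids.add(in.readLong());
        }
    }

    /** Round-trips a message through the wire format (test helper). */
    public static IdSetMessage roundTrip(IdSetMessage msg) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            msg.write(new DataOutputStream(bytes));
            IdSetMessage copy = new IdSetMessage();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One batched message like this amortizes the per-message overhead (framing,
queueing, a combiner pass) over thousands of IDs, which is why fewer, larger
messages usually win.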

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


Which is better to use to manage Vertex state: POJO instance variables or Giraph values?

2014-08-26 Thread Matthew Cornell
Hi there. I'm confused about when it's OK to use Vertex instance variables
to maintain state rather than proper Giraph values ala getValue(). An
interesting example I found in the source demonstrates both:
SimpleTriangleClosingVertex, which has both an instance variable (closeMap)
and a custom vertex value (IntArrayListWritable). I'm a little surprised
that using an instance variable is legit due to possibly screwing up
serialization (?) My question: Is either valid? If so, how do I choose one
over the other? Thanks very much. -- matt
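
The serialization worry is the crux: only the vertex value goes through the
Writable machinery, so plain instance fields silently reset to their defaults
whenever a vertex is checkpointed or spilled out of core. Below is a toy
model (deliberately not Giraph API) that demonstrates the difference with a
manual serialize/deserialize round-trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Toy model of why per-vertex state is safer in the vertex value than
 * in instance fields: only what write() emits survives a round-trip.
 */
public class VertexStateDemo {
    /** Stands in for the vertex value: explicitly serialized. */
    int value;
    /** Plain instance field: never written, so it is lost on round-trip. */
    int scratch;

    void write(DataOutput out) throws IOException {
        out.writeInt(value);       // only the value is persisted
    }

    void readFields(DataInput in) throws IOException {
        value = in.readInt();      // scratch stays at its default, 0
    }

    /** Simulates a checkpoint/out-of-core spill and reload. */
    static VertexStateDemo roundTrip(VertexStateDemo v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            VertexStateDemo copy = new VertexStateDemo();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

So an instance variable like SimpleTriangleClosingVertex's closeMap is valid
only as scratch space that can be rebuilt; anything that must survive
supersteps, checkpointing, or failure recovery belongs in the vertex value.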

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I validate customArguments?

2014-08-26 Thread Matthew Cornell
Hi again. My application needs to pass in a String argument to the
computation which each Vertex needs access to. (The argument is a list of
the form "[item1, item2, ...]".) I found --customArguments (which I set in
my tests via conf.set(, )) but I need to check that it's
properly formatted. Where do I do that? The only thing I thought of is to
specify a DefaultMasterCompute subclass whose initialize() does the check,
but all the initialize() examples do is register aggregators; none of them
check args or do anything else. Thanks in advance! -- matt

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I look up a Vertex using its ID?

2014-08-25 Thread Matthew Cornell
Hi Folks. I have a graph computation that passes 'visited' Vertex IDs
around, and I need to output information from those in the output phase.
How do I look up a Vertex from its ID? I found Partition.getVertex(), but
IIUC there is no guarantee that an arbitrary Vertex will be in a particular
partition. Thanks in advance.

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


How do I output only a subset of a graph?

2014-08-25 Thread Matthew Cornell
Hi Folks. I have a graph computation that starts with a subset of vertices
of a certain type and propagates information through the graph to a set of
target vertices, which are also a subset of the graph. I want to output only
information from those particular vertices, but I don't see a way to do
this in the various VertexOutputFormat subclasses, which all seem oriented
to outputting something for every vertex in the graph. How do I do this?
E.g., are there hooks for the output phase where I can filter output? Or am
I supposed to write a VertexOutputFormat implementation that generates no
output for the vertices that have no data? Thanks in advance.
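
The last idea in the question works: a custom writer can simply return early
for vertices with nothing to report. Below is a self-contained sketch of that
filtering logic over plain (id, value) pairs; in Giraph the same early-skip
would sit inside writeVertex() of a custom TextVertexOutputFormat's writer.
The class and method names here are illustrative, not Giraph API.

```java
import java.util.Map;

/**
 * Sketch of "emit nothing for empty vertices": format one tab-separated
 * line per vertex, skipping any vertex whose value carries no data.
 */
public class FilteringOutput {
    public static String writeVertices(Map<String, String> idToValue) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : idToValue.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue;  // the early return a filtering writeVertex() would do
            }
            out.append(e.getKey()).append('\t').append(value).append('\n');
        }
        return out.toString();
    }
}
```

The output format is invoked once per vertex either way, so the cost of
visiting non-target vertices remains; the filter only keeps them out of the
output files.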

-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org