Hmm. Thanks, Miles. I'm going to mull that over for the evening.
As an aside, I took another look at the job history file for the second
phase, and I noticed something interesting. Even though I set the number of
map tasks to 8 using JobConf, and set the maximum number of map tasks to 8
in my hadoop-site.xml file, I see the following:

Task Summary
============================
Kind    Total  Successful  Failed  Killed  StartTime             FinishTime
Map     787    787         0       0       18-Sep-2008 18:55:14  19-Sep-2008 04:13:46 (9hrs, 18mins, 32sec)
Reduce  4      4           0       0       18-Sep-2008 18:55:19  19-Sep-2008 07:31:33 (12hrs, 36mins, 14sec)
============================

This could be another reason for the lack of speedup: 787 maps are being
spawned. But how can this be? I specifically called setNumMapTasks(8) in my
code, and also set mapred.tasktracker.tasks.maximum to 8 in hadoop-site.xml;
how is it possible for more than 8 maps to show up?

Thanks again,

-SM

On Fri, Sep 19, 2008 at 4:04 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> for the word count example, you would have each mapper work out the
> counts of all words, but in memory (e.g. have a hash table and
> increment a word's count each time you see it). the merge phase would
> then simply add together the word-count pairs which share the same
> word. more concretely, the mapper would emit the 2-perm count pairs
> and the reducer would simply sum them up.
>
> continuing with word counting, the basic MR way of doing it maximises
> disk space usage and minimises memory usage. what i imagine is that
> you can have stages in between -- use more memory than the pure MR
> stateless approach, but gain in speed.
>
> returning to your example, you may not need to read in that 48 GB of
> data if your counts are fairly dense (you see many instances of a
> given 2-perm entry). if they are sparse, then you are probably doomed.
>
> Miles
>
> 2008/9/19 Sandy <[EMAIL PROTECTED]>:
> > Miles,
> >
> > Thanks for your response.
> >
> > I think I understand: basically, I'm adding a combiner class that
> > computes the partial results in phase 2, correct (just like in the
> > word count example)?
> >
> > However, even if I do that, I don't think it gets rid of the overhead
> > of reading 48 GB from disk back into memory, which I think may be a
> > large part of the problem. Would you agree that that may be the
> > largest source of overhead?
> >
> > Is there something in Hadoop that will allow me to prevent the output
> > of the first reduce phase from being written to disk, and instead
> > just place it directly as input for the second reduce?
> >
> > Thanks,
> >
> > -SM
> >
> > On Fri, Sep 19, 2008 at 3:13 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> >
> >> if each mapper only sees a relatively small chunk of the data, then
> >> why not have each one compute the counts of 2-perms in memory.
> >>
> >> you would then get the reducer to merge these partial results together.
> >>
> >> (details are left to the reader ...)
> >>
> >> Miles
> >>
> >> 2008/9/19 Sandy <[EMAIL PROTECTED]>:
> >> > Hi,
> >> >
> >> > I have a serious problem that I'm not sure how to fix. I have two
> >> > M/R phases that calculate a matrix in parallel. It works... but
> >> > it's slower than the serial version (by about 100 times).
> >> >
> >> > Here is a toy program that works similarly to my application. In
> >> > this example I'm having different random numbers generated per
> >> > line of input, and then creating an n x n matrix that counts how
> >> > many of these random numbers were shared.
> >> >
> >> > -------------------
> >> > first map phase() {
> >> >   input: key = offset, value = line of text (with embedded line number ln)
> >> >   generate k random numbers, k1 .. kn
> >> >   emit: <ki, ln>
> >> > }
> >> >
> >> > first reduce phase() {
> >> >   input: key = ki, value = list(ln)
> >> >   if list size is greater than one:
> >> >     for every 2-permutation p:
> >> >       emit: <p, 1>
> >> >   // example: list = 1 2 3
> >> >   // emit: <(1,2), 1>
> >> >   // emit: <(2,3), 1>
> >> >   // emit: <(1,3), 1>
> >> > }
> >> >
> >> > second map phase() {
> >> >   input: key = offset, value = (i, j) 1
> >> >   // dummy function: acts as a transition to the reduce
> >> >   parse value into two tokens, [(i,j)] and [1]
> >> >   emit: <(i,j), 1>
> >> > }
> >> >
> >> > second reduce() {
> >> >   input: key = (i,j), value = list(1)
> >> >   // wordcount
> >> >   sum up the list of ones
> >> >   emit: <(i,j), sum(list(1))>
> >> > }
> >> > ------------------
> >> >
> >> > Now here's the problem:
> >> >
> >> > Let's suppose the file is 27 MB.
> >> > The time it takes for the first map phase is about 3 minutes.
> >> > The time it takes for the first reduce phase is about 1 hour.
> >> > The size of the intermediate files produced by this first M/R
> >> > phase is 48 GB.
> >> >
> >> > The time it takes for the second map phase is 9 hours (and this
> >> > function is just a dummy function!!).
> >> > The time it takes for the second reduce phase is 12 hours.
> >> >
> >> > I have been trying to change the number of map and reduce tasks,
> >> > but that doesn't seem to really chip away at the massive number of
> >> > 2-permutations that need to be taken care of in the second M/R
> >> > phase. At least not on my current machine.
> >> >
> >> > Has anyone implemented a matrix in parallel using MapReduce? If
> >> > so, is this normal or expected behavior? I do realize that I have
> >> > a small input file, and that this may impact speedup. The most
> >> > powerful machine I have to run this M/R implementation is a Mac
> >> > Pro that has two processors, each with four cores, and 4 hard
> >> > disks of 1 TB each.
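(As a side note on the toy program above: its counting logic can be checked outside Hadoop with a short plain-Java simulation. This is only a sketch; the class and method names are invented, fixed value lists stand in for the random numbers, and all Hadoop plumbing is omitted.)

```java
import java.util.*;

// Plain-Java sketch of the first-reduce / second-reduce counting logic
// from the toy program. Names are invented for illustration.
public class PairCountSketch {

    // first reduce: for each random number seen on more than one line,
    // emit every unordered 2-permutation of its line numbers once.
    static List<String> firstReduce(Collection<List<Integer>> groups) {
        List<String> pairs = new ArrayList<String>();
        for (List<Integer> lns : groups) {
            if (lns.size() < 2) continue; // nothing shared on this key
            for (int a = 0; a < lns.size(); a++)
                for (int b = a + 1; b < lns.size(); b++)
                    pairs.add("(" + lns.get(a) + "," + lns.get(b) + ")");
        }
        return pairs;
    }

    // second reduce: plain word-count over the emitted pairs.
    static Map<String, Integer> secondReduce(List<String> pairs) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String p : pairs) {
            Integer c = counts.get(p);
            counts.put(p, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // toy input: k1 was seen on lines 1,2,3 and k2 on lines 1,2
        List<List<Integer>> groups = new ArrayList<List<Integer>>();
        groups.add(Arrays.asList(1, 2, 3));
        groups.add(Arrays.asList(1, 2));
        // pair (1,2) is emitted under both keys
        System.out.println(secondReduce(firstReduce(groups)).get("(1,2)")); // prints 2
    }
}
```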
> >> > Does anyone have any suggestions on what I can change (either
> >> > with the approach or the cluster setup -- do I need more
> >> > machines?) in order to make this faster? I am currently running 8
> >> > map tasks and 4 reduce tasks. I am going to change it to 10 map
> >> > tasks and 9 reduce tasks and see if that helps any, but I'm
> >> > seriously wondering whether this will give me much of a change,
> >> > since I only have one machine.
> >> >
> >> > Any insight is greatly appreciated.
> >> >
> >> > Thanks!
> >> >
> >> > -SM
> >> >
> >>
> >> --
> >> The University of Edinburgh is a charitable body, registered in
> >> Scotland, with registration number SC005336.
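The in-memory counting Miles describes in the thread (a hash table in each mapper, incremented per record, flushed as partial sums) might look roughly like the sketch below. All names are invented and the Hadoop wiring is omitted; this is an illustration of the idea, not the poster's code.

```java
import java.util.*;

// Sketch of in-mapper combining: keep a hash table of partial counts
// and emit <pair, partialSum> once at the end of the map task, so the
// reducer only sums a few partial counts per pair instead of reading
// huge numbers of individual <pair, 1> records from disk.
public class InMapperCounts {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // called once per record, instead of emitting <pairKey, 1> immediately
    public void map(String pairKey) {
        Integer c = counts.get(pairKey);
        counts.put(pairKey, c == null ? 1 : c + 1);
    }

    // called once when the map task finishes: the partial sums are
    // what would actually be emitted and shuffled to the reducers
    public Map<String, Integer> flush() {
        return counts;
    }
}
```

In real 0.18-era Hadoop code this would sit inside a Mapper implementation, with the flush happening in the mapper's close() hook via an OutputCollector saved during map(); a separate combiner class set with setCombinerClass() achieves a similar reduction. The trade-off is the one Miles mentions: more memory per mapper in exchange for far less intermediate data on disk.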