Isabel, thanks for your answer.
For the 4th question: maybe we can still gain some speedup on
multi-machine clusters. But I suspect that we should also explicitly
account for the communication cost, which is non-trivial in such a
setting. What do you think?
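To make my concern concrete, here is a toy back-of-the-envelope model (all constants below are made-up assumptions, purely for illustration, not measurements from the paper): if each of P machines does 1/P of the compute and the reduce pays a fixed network cost per combining round, the achievable speedup drops as that per-round cost grows.

```python
import math


def estimated_speedup(t_serial, p, comm_per_round):
    """Toy speedup model: parallel time = compute slice plus
    ceil(log2(p)) communication rounds. All inputs are hypothetical."""
    t_parallel = t_serial / p + math.ceil(math.log2(p)) * comm_per_round
    return t_serial / t_parallel


# Shared memory: combining partial results is nearly free.
print(estimated_speedup(100.0, 16, 0.001))  # close to 16
# Cluster: every combining round pays inter-machine latency.
print(estimated_speedup(100.0, 16, 2.0))    # markedly lower
```

So the same algorithm that scales almost linearly on one 16-core box could scale much worse across 16 machines, purely because of the communication term.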

On 3/26/08, Isabel Drost <[EMAIL PROTECTED]> wrote:
> On Tuesday 25 March 2008, Hao Zheng wrote:
>  > 1. Sect. 4.1 Algorithm Time Complexity Analysis.
>  > the paper assumes m >> n, i.e. that there are many more training
>  > instances than features. Its datasets do have very few features, but
>  > this may not be true for many tasks, e.g. text classification, where
>  > feature dimensions reach 10^4-10^5. Will the analysis still hold
>  > then?
>
>
> What I could directly read from the paper in the very same section: the
>  analysis will not hold in this case for those algorithms that require
>  matrix inversions or eigendecompositions, as long as these operations are
>  not executed in parallel. The authors did not implement parallel versions
>  of these operations - the reason they state is that in their datasets
>  m >> n.
>
>  The authors state themselves that there is extensive research on
>  parallelising eigendecomposition and matrix inversion as well - so if we
>  assume that we do have a matrix package that can do these operations in a
>  distributed way, IMHO the analysis in the paper should still hold even for
>  algorithms that require these steps.
>
>
>
>  > 2. Sect. 4.1, too.
>  > "reduce phase can minimize communication by combining data as it's
>  > passed back; this accounts for the logP factor". Could you help me
>  > figure out how the log P factor is derived?
>
>
> Anyone else who can help out here?
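My own guess, in case it helps: if the reduce combines the P mappers' partial results pairwise in a binary tree (which is what "combining data as it's passed back" suggests), everything is merged after ceil(log2 P) communication rounds rather than P - 1 sequential ones - hence the log P factor. A small sketch (the partial sums are made-up numbers, and this is only my reading, not the authors' implementation):

```python
def tree_reduce(values, combine):
    """Merge partial results pairwise, as a combiner-style reduce might.

    Returns the combined value and the number of communication rounds,
    which is ceil(log2(P)) for P partial results.
    """
    rounds = 0
    while len(values) > 1:
        # Each round, pairs of partial results are combined in parallel;
        # an unpaired leftover value is carried into the next round.
        values = [combine(values[i], values[i + 1]) if i + 1 < len(values)
                  else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds


# 16 mappers each hold a partial sum (made-up numbers).
partials = list(range(16))
total, rounds = tree_reduce(partials, lambda a, b: a + b)
print(total, rounds)  # 120 after 4 rounds, i.e. log2(16) = 4
```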
>
>
>
>  > 3. Sect 5.4 Results and Discussion
>  > "SVM gets about 13.6% speed up on average over 16 cores" - is it 13.6%
>  > or a factor of 13.6? From Figure 2, it seems it should be 13.6.
>
>
> The axes on the graphs do not have clear titles, but I would agree that it
>  should be a factor of 13.6 as well - 13.6 on 16 cores amounts to roughly
>  85% parallel efficiency, whereas a 13.6% speedup would hardly be worth
>  reporting.
>
>
>
>  > 4. Sect 5.4, too.
>  > "Finally, the above are runs on multiprocessor machines." Whether
>  > multiprocessor or multicore, it runs on a single machine with shared
>  > memory.
>
>
> The main motivation for the paper was the rise of multi-core machines,
>  which call for parallel algorithms even when one does not have a cluster
>  available.
>
>
>
>  > But actually, M/R is for multi-machine settings, which involve much
>  > more inter-machine communication cost. So the results of the paper may
>  > be questionable?
>
>
> I think you should not expect the exact same speedups on multi-machine
>  clusters. Still, I think one can expect faster computation for large
>  datasets even in this setting. What do others think?
>
>
>
>  Isabel
>
>
>
>  --
>  There is no TRUTH.  There is no REALITY.  There is no CONSISTENCY. There are
>  no ABSOLUTE STATEMENTS.   I'm very probably wrong.
>   |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
>   /,`.-'`'    -.  ;-;;,_
>   |,4-  ) )-,_..;\ (  `'-'
>  '---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>
>
>
