Yes, but it is just a test, and I am trying to extrapolate the results I
see to bigger volumes, sort of, to get some taste of the programming
model's performance.

I do get CPU-bound behavior and I hit the Spark cache 100% of the time. So
in theory, since I am not having spills and I am not doing sorts, it should
be fairly fast.

I have two algorithms. One just sends elementwise messages to the vertex
representing the row each element should end up in. The other uses the same
set of initial messages but also uses Bagel combiners which, the way I
understand it, combine elements into partial vectors before shipping them
off to the remote vertex partition. The reasoning, apparently, is that since
elements are combined, there's less I/O. Well, perhaps not so much in this
case, since we are not really doing any information aggregation. On a
single-node Spark setup I of course don't have actual network I/O, so it
should approach the speed of an in-core copy-by-serialization.
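
Roughly, the combiner variant looks like this; a minimal sketch from
memory, assuming Bagel's Combiner trait (spark.bagel.Combiner in the
version I'm on) and a simplified, hypothetical ElementMsg carrying one
matrix element; my actual message class and cardinality handling differ:

import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
import spark.bagel.Combiner

// Hypothetical elementwise message: one non-zero, addressed to the
// vertex that owns its target row.
case class ElementMsg(targetRow: Int, index: Int, value: Double)

// Folds element messages bound for the same vertex into one partial
// sparse vector, so a single vector (rather than N element messages)
// is serialized and shipped per target vertex per superstep.
class PartialVectorCombiner(cardinality: Int)
    extends Combiner[ElementMsg, Vector] with Serializable {

  def createCombiner(msg: ElementMsg): Vector = {
    val v = new RandomAccessSparseVector(cardinality)
    v.setQuick(msg.index, msg.value)
    v
  }

  def mergeMsg(combiner: Vector, msg: ElementMsg): Vector = {
    combiner.setQuick(msg.index, msg.value)
    combiner
  }

  // Element positions are disjoint in a transpose, so a plain
  // elementwise sum merges two partials correctly.
  def mergeCombiners(a: Vector, b: Vector): Vector = a.plus(b)
}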

What I am seeing is that elementwise messages run almost two times faster
in CPU-bound behavior than the version with combiners. It would seem the
culprit is that serializing and then deserializing the VectorWritable
fragments is considerably slower than serializing elementwise messages
containing only primitive types (target row, index, value), even though the
latter amount to a significantly larger number of objects as well as more
data.
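
For what it's worth, a quick way to sanity-check that intuition is to push
both payload shapes through their writers and compare (hypothetical helper,
not from my actual code):

import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.mahout.math.{RandomAccessSparseVector, VectorWritable}

object PayloadCheck {
  // One elementwise message worth of primitives: 4 + 4 + 8 = 16 bytes,
  // and (de)serialization is just three primitive writes/reads.
  def elementMsgBytes(row: Int, idx: Int, value: Double): Int = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeInt(row)
    out.writeInt(idx)
    out.writeDouble(value)
    out.flush()
    bos.size()
  }

  // A partial vector with nnz non-zeros through the Writable path:
  // encoding work and per-element iteration on top of the raw bytes.
  def partialVectorBytes(cardinality: Int, nnz: Int): Int = {
    val v = new RandomAccessSparseVector(cardinality)
    for (i <- 0 until nnz) v.setQuick(i, 1.0)
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    new VectorWritable(v).write(out)
    out.flush()
    bos.size()
  }
}

Bytes on the wire aren't the whole story, of course; my suspicion is that
the per-element object churn on the Writable path is what actually burns
the CPU.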

Still, I am trying to convince myself that even using combiners should be
OK compared to shuffle-and-sort overhead. But in reality it still looks a
bit slower than I expected. Well, I guess I should not be lazy and should
benchmark it against Mahout's MR-based transpose as well as Spark's version
of RDD shuffle-and-sort.

Anyway, map-only tasks on Spark distributed matrices are lightning fast,
but Bagel's serialize/deserialize scatter/gather seems to be much slower
than plain map-only processing. Perhaps I am doing it wrong somehow.


On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Transpose of that small a matrix should happen in memory.
>
> Sent from my iPhone
>
> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Does anybody know how good (or bad) our performance on matrix transpose
> > is? How long would it take to transpose a matrix with 10M non-zeros with
> > Mahout (if I wanted to set up a fully distributed but single-node MR
> > cluster)?
> >
> > Trying to figure out whether the numbers I see with Bagel-based Mahout
> > matrix transposition are any good.
>
