Yes, but it is just a test, and I am trying to extrapolate the results I see to bigger volumes, sort of, to get some taste of the programming model's performance.
I do get CPU-bound behavior and I hit the Spark cache 100% of the time, so in theory, since I am not spilling and not doing sorts, it should be fairly fast.

I have two algorithms. One just sends elementwise messages to the vertex representing the row the element should land in. The other uses the same set of initial messages but also uses Bagel combiners, which, the way I understand it, combine elements into partial vectors before shipping them off to the remote vertex partition. The reasoning, apparently, is that since elements are combined, there is less i/o. Perhaps not so much in this case, though, since we are not really doing any information aggregation. On a single-node Spark setup I of course have no actual i/o, so it should approach the speed of an in-core copy-by-serialization.

What I am seeing is that elementwise messages run almost two times faster, in CPU-bound behavior, than the version with combiners. The culprit seems to be that VectorWritable serialization and then deserialization of the vectorized fragments is considerably slower than serialization of elementwise messages containing only primitive types (target row, index, value), even though the latter is a significantly larger number of objects as well as more data.

Still, I am trying to convince myself that even the combiner version should be fine compared to shuffle-and-sort overhead, but in reality it looks a bit slower than I expected. I guess I should not be lazy and benchmark it against Mahout's MR-based transpose as well as Spark's own RDD shuffle-and-sort.

Anyway, map-only tasks on Spark-distributed matrices are lightning fast, but Bagel serialize/deserialize scatter/gather seems to be much slower than just map-only processing. Perhaps I am doing it wrong somehow. (A rough sketch of the two message schemes is at the bottom of this mail, below the quote.)

On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Transpose of that small a matrix should happen in memory.
>
> Sent from my iPhone
>
> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Anybody knows how good (or bad) our performance on matrix transpose? how
> > long will it take to transpose a 10M non-zeros with Mahout (if i wanted to
> > setup fully distributed but single node MR cluster?)
> >
> > Trying to figure if the numbers i see with Bagel-based Mahout matrix
> > transposition are any good.
>
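P.S. In case it helps to see what I mean, here is a rough sketch of the two message shapes I am comparing. This is illustration only, not the actual job: the names are made up, and in the real code the partial rows are Mahout sparse vectors serialized through VectorWritable rather than a plain Scala Map.

    // Scheme 1: one elementwise message per non-zero, addressed to the vertex
    // that owns the target (transposed) row. Only primitives travel on the wire.
    case class ElementMsg(targetRow: Int, index: Int, value: Double)

    // Scheme 2: a Bagel-style combiner that folds elementwise messages into a
    // partial row vector before it is shipped to the remote vertex partition,
    // mirroring the createCombiner / mergeMsg / mergeCombiners contract.
    object RowCombiner {
      // stand-in for a sparse Mahout vector wrapped in VectorWritable
      type PartialRow = Map[Int, Double]

      def createCombiner(m: ElementMsg): PartialRow =
        Map(m.index -> m.value)

      def mergeMsg(c: PartialRow, m: ElementMsg): PartialRow =
        c + (m.index -> m.value)

      // each (row, index) pair occurs once in a transpose, so merging
      // two partial rows never has to resolve a collision
      def mergeCombiners(a: PartialRow, b: PartialRow): PartialRow =
        a ++ b
    }

The point is just that scheme 1 ships many small primitive-typed messages, while scheme 2 ships fewer but heavier vector fragments, and in my runs the former serializes noticeably faster.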