Yes, that's my working hypothesis: serializing and combining RandomAccessSparseVectors is slower than elementwise messages.
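
Roughly, the two message shapes being compared look like this (a sketch only, with illustrative names, not the actual test code):

    // Sketch of the two Bagel message shapes under comparison; names are
    // illustrative, not the actual benchmark code.
    import org.apache.mahout.math.RandomAccessSparseVector

    // Elementwise variant: one message per non-zero, primitives only, so
    // (de)serialization is just three fixed-width fields.
    case class ElementMsg(targetRow: Int, index: Int, value: Double)

    // Combiner variant: elements destined for the same row are merged into
    // a partial sparse vector before shipping, so every message pays the
    // RandomAccessSparseVector / VectorWritable (de)serialization cost.
    def combine(msgs: Seq[ElementMsg],
                cardinality: Int): RandomAccessSparseVector = {
      val partial = new RandomAccessSparseVector(cardinality)
      msgs.foreach(m => partial.setQuick(m.index, m.value))
      partial
    }
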
On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning <[email protected]> wrote:

> It is common for double serialization to creep into these systems as well.
> My guess, however, is that the primitive serialization is just much faster
> than the vector serialization.
>
> Sent from my iPhone
>
> On Jul 8, 2013, at 22:55, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Yes, but it is just a test, and I am trying to extrapolate the results I
> > see to bigger volumes, sort of, to get some taste of the programming
> > model's performance.
> >
> > I do get CPU-bound behavior, and I hit the Spark cache 100% of the time,
> > so in theory, since I am not having spills and I am not doing sorts, it
> > should be fairly fast.
> >
> > I have two algorithms. One just sends elementwise messages to the vertex
> > representing the row they should end up in. The other uses the same set
> > of initial messages but also uses Bagel combiners which, the way I
> > understand it, combine elements into partial vectors before shipping
> > them off to the remote vertex partition. The reasoning, apparently, is
> > that since elements are combined, there is less I/O. Well, perhaps not
> > so much in this case, since we are not really doing any information
> > aggregation. On a single-node Spark setup I of course don't have actual
> > I/O, so it should approach the speed of an in-core
> > copy-by-serialization.
> >
> > What I am seeing is that elementwise messages work almost two times
> > faster in CPU-bound behavior than the version with combiners. It would
> > seem the culprit is that VectorWritable serialization and then
> > deserialization of vectorized fragments is considerably slower than
> > serialization of elementwise messages containing only primitive types
> > (target row, index, value), even though the latter amount to a
> > significantly larger number of objects as well as more data.
> >
> > Still, I am trying to convince myself that even using combiners should
> > be OK compared to shuffle-and-sort overhead. But I think in reality it
> > still looks a bit slower than I expected. Well, I guess I should not be
> > lazy and should benchmark it against the Mahout MR-based transpose as
> > well as Spark's version of an RDD shuffle-and-sort.
> >
> > Anyway, map-only tasks on Spark distributed matrices are lightning
> > fast, but the Bagel serialize/deserialize scatter/gather seems to be
> > much slower than just map-only processing. Perhaps I am doing it wrong
> > somehow.
> >
> >
> > On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <[email protected]> wrote:
> >
> >> Transposing that small a matrix should happen in memory.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <[email protected]> wrote:
> >>
> >>> Does anybody know how good (or bad) our performance on matrix
> >>> transpose is? How long would it take to transpose a matrix with 10M
> >>> non-zeros with Mahout (if I were to set up a fully distributed but
> >>> single-node MR cluster)?
> >>>
> >>> Trying to figure out whether the numbers I see with the Bagel-based
> >>> Mahout matrix transposition are any good.
> >>
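
For reference, the plain RDD shuffle-and-sort baseline mentioned above might look roughly like the following (a sketch under assumed types; `drm` as an RDD of (row index, Mahout Vector) pairs and the names are mine, not project code):

    // Rough sketch of a transpose via a plain RDD shuffle, the baseline
    // mentioned above for comparison; types and names are assumptions.
    import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
    import org.apache.spark.rdd.RDD
    import scala.collection.JavaConverters._

    def transpose(drm: RDD[(Int, Vector)], nrows: Int): RDD[(Int, Vector)] =
      drm.flatMap { case (row, vec) =>
          // emit (column, (row, value)) for every non-zero element
          vec.iterateNonZero().asScala.map(el => (el.index, (row, el.get)))
        }
        .groupByKey() // the shuffle: gather all elements of one output row
        .map { case (col, elems) =>
          val v = new RandomAccessSparseVector(nrows)
          elems.foreach { case (r, x) => v.setQuick(r, x) }
          (col, v: Vector)
        }

This pays a single shuffle for the whole transpose instead of per-superstep message serialization, which is exactly the trade-off worth benchmarking.
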
