That has occurred to me too. We are not really inferring any aggregations here. It may turn out that its use is beneficial with bigger volumes and real I/O, though. Hard to tell. Anyway, I will probably keep both as an option.
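To be concrete about the two message shapes I am comparing, here is roughly what they boil down to (illustration only, not the actual benchmark code; RandomAccessSparseVector and VectorWritable are the real Mahout classes, everything else is made up for the example):

import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.mahout.math.{RandomAccessSparseVector, VectorWritable}

object MessageShapes {

  // Variant 1: one primitive-typed message per non-zero element
  // (target row, column index, value) -- each message is tiny and cheap
  // to serialize, but there are many of them.
  case class ElementMsg(targetRow: Int, index: Int, value: Double)

  def writeElementMsg(out: DataOutputStream, m: ElementMsg): Unit = {
    out.writeInt(m.targetRow)
    out.writeInt(m.index)
    out.writeDouble(m.value)
  }

  // Variant 2: combine elements bound for the same target row into a
  // partial sparse vector and ship it as a VectorWritable (what the
  // Bagel-combiner variant effectively does).
  def writeCombined(out: DataOutputStream, cardinality: Int,
                    elems: Seq[(Int, Double)]): Unit = {
    val v = new RandomAccessSparseVector(cardinality)
    elems.foreach { case (i, x) => v.setQuick(i, x) }
    new VectorWritable(v).write(out)
  }

  def main(args: Array[String]): Unit = {
    val elems = Seq((0, 1.0), (5, 2.0), (9, 3.0))

    val bos1 = new ByteArrayOutputStream()
    val out1 = new DataOutputStream(bos1)
    elems.foreach(e => writeElementMsg(out1, ElementMsg(7, e._1, e._2)))

    val bos2 = new ByteArrayOutputStream()
    val out2 = new DataOutputStream(bos2)
    writeCombined(out2, 10, elems)

    println(s"elementwise bytes: ${bos1.size()}, combined vector bytes: ${bos2.size()}")
  }
}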
On Tue, Jul 9, 2013 at 7:51 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Also, it is likely that the combiner has little effect. This means that
> you are essentially using a vector to serialize single elements.
>
> On Jul 8, 2013, at 23:13, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> Yes, that's my working hypothesis. Serializing and combining
>> RandomAccessSparseVectors is slower than elementwise messages.
>>
>> On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> It is common for double serialization to creep into these systems as
>>> well. My guess, however, is that the primitive serialization is just
>>> much faster than the vector serialization.
>>>
>>> On Jul 8, 2013, at 22:55, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> Yes, but it is just a test, and I am trying to extrapolate the results
>>>> I see to bigger volumes, sort of, to get some taste of the programming
>>>> model's performance.
>>>>
>>>> I do get CPU-bound behavior and I hit the Spark cache 100% of the time.
>>>> So in theory, since I am not having spills and I am not doing sorts, it
>>>> should be fairly fast.
>>>>
>>>> I have two algorithms. One just sends elementwise messages to the
>>>> vertex representing the row they should end up in. The other uses the
>>>> same set of initial messages but also uses Bagel combiners which, the
>>>> way I understand it, combine elements into partial vectors before
>>>> shipping them off to the remote vertex partition. The reasoning,
>>>> apparently, is that since elements are combined, there is less I/O.
>>>> Well, perhaps not so much in this case, since we are not really doing
>>>> any sort of information aggregation. On a single-node Spark setup I of
>>>> course don't have actual I/O, so it should approach the speed of an
>>>> in-core copy-by-serialization.
>>>>
>>>> What I am seeing is that elementwise messages work almost two times
>>>> faster in CPU-bound behavior than the version with combiners. It would
>>>> seem the culprit is that VectorWritable serialization and then
>>>> deserialization of the vectorized fragments is considerably slower than
>>>> serialization of elementwise messages containing only primitive types
>>>> (target row, index, value), even though the latter amount to a
>>>> significantly larger number of objects as well as data.
>>>>
>>>> Still, I am trying to convince myself that even using combiners should
>>>> be OK compared to shuffle-and-sort overhead. But I think in reality it
>>>> still looks a bit slower than I expected. Well, I guess I should not be
>>>> lazy and benchmark it against the Mahout MR-based transpose as well as
>>>> Spark's version of an RDD shuffle-and-sort.
>>>>
>>>> Anyway, map-only tasks on Spark distributed matrices are lightning
>>>> fast, but the Bagel serialize/deserialize scatter/gather seems to be
>>>> much slower than just map-only processing. Perhaps I am doing it wrong
>>>> somehow.
>>>>
>>>> On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>> Transpose of that small a matrix should happen in memory.
>>>>>
>>>>> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>> Does anybody know how good (or bad) our performance on matrix
>>>>>> transpose is? How long would it take to transpose a matrix with 10M
>>>>>> non-zeros with Mahout (if I wanted to set up a fully distributed but
>>>>>> single-node MR cluster)?
>>>>>>
>>>>>> Trying to figure out whether the numbers I see with the Bagel-based
>>>>>> Mahout matrix transposition are any good.
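P.S. For reference, the plain RDD shuffle-and-sort transpose I want to benchmark against would look roughly like the sketch below. Just a sketch, not tested: it assumes the matrix lives as an RDD of (row index, Mahout Vector) pairs, and that a serializer for Mahout vectors (e.g. Kryo) is configured so they can go through a Spark shuffle; the object and method names are made up.

import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

object RddTranspose {

  // Transpose an RDD of (rowIndex, sparse row vector) pairs via a plain
  // groupByKey shuffle, no Bagel involved.
  def transpose(drm: RDD[(Int, Vector)], nrow: Int): RDD[(Int, Vector)] = {
    drm
      // emit one (columnIndex, (rowIndex, value)) triple per non-zero element
      .flatMap { case (row, vec) =>
        vec.iterateNonZero().asScala.map(e => (e.index(), (row, e.get())))
      }
      // shuffle all elements of the same column to one place
      .groupByKey()
      // assemble each group into one row of the transposed matrix
      .map { case (col, cells) =>
        val v = new RandomAccessSparseVector(nrow)
        cells.foreach { case (row, x) => v.setQuick(row, x) }
        (col, v: Vector)
      }
  }
}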