That has occurred to me too; we are not really inferring any aggregations
here. It may turn out that the combiner is beneficial with bigger volumes and
real I/O, though. Hard to tell. Anyway, I will probably keep both as options
(a rough sketch of the two message shapes follows below, and a serialization
comparison sketch is at the end of this message).
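
For concreteness, here is a minimal sketch of the two message shapes being
compared. The names are hypothetical (this is not the actual job code), and
the combiner class only mirrors the shape of Bagel's Combiner trait
(createCombiner / mergeMsg / mergeCombiners):

import org.apache.mahout.math.RandomAccessSparseVector

// Option 1: elementwise -- one message per non-zero, primitives only
// (target row in the transpose, column index, value).
case class ElementMsg(targetRow: Int, index: Int, value: Double)

// Option 2: a Bagel-style combiner that assembles partial row vectors on
// the sending side before they are shipped to the remote vertex partition.
class PartialRowCombiner(cardinality: Int) {

  def createCombiner(m: ElementMsg): RandomAccessSparseVector = {
    val v = new RandomAccessSparseVector(cardinality)
    v.setQuick(m.index, m.value)
    v
  }

  def mergeMsg(c: RandomAccessSparseVector, m: ElementMsg): RandomAccessSparseVector = {
    c.setQuick(m.index, m.value)
    c
  }

  def mergeCombiners(a: RandomAccessSparseVector, b: RandomAccessSparseVector): RandomAccessSparseVector = {
    // Fold the non-zeros of b into a. For a transpose there are no
    // collisions, so this is pure assembly, not real aggregation.
    val it = b.iterateNonZero()
    while (it.hasNext) {
      val e = it.next()
      a.setQuick(e.index(), e.get())
    }
    a
  }
}

The elementwise option serializes three primitives per non-zero; the combiner
option ships fewer, larger messages but pays for building and (de)serializing
RandomAccessSparseVectors on both ends.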


On Tue, Jul 9, 2013 at 7:51 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Also, it is likely that the combiner has little effect.  This means that
> you are essentially using a vector to serialize single elements.
>
> Sent from my iPhone
>
> On Jul 8, 2013, at 23:13, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Yes, that's my working hypothesis: serializing and combining
> > RandomAccessSparseVectors is slower than elementwise messages.
> >
> >
> > On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> It is common for double serialization to creep into these systems as
> >> well. My guess, however, is that primitive serialization is just much
> >> faster than vector serialization.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 8, 2013, at 22:55, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>
> >>> Yes, but it is just a test, and I am trying to extrapolate the results I
> >>> see to bigger volumes, sort of, to get some taste of the programming
> >>> model's performance.
> >>>
> >>> I do get CPU-bound behavior, and I hit the Spark cache 100% of the time.
> >>> So in theory, since I am not having spills and I am not doing sorts, it
> >>> should be fairly fast.
> >>>
> >>> I have two algorithms. One just sends elementwise messages to the vertex
> >>> representing the row each element should end up in. The other uses the
> >>> same set of initial messages but also uses Bagel combiners which, the way
> >>> I understand it, combine elements into partial vectors before shipping
> >>> them off to the remote vertex partition. The reasoning, apparently, is
> >>> that since elements are combined, there is less I/O. Well, perhaps not so
> >>> much in this case, since we are not really doing any sort of information
> >>> aggregation. On a single-node Spark setup I of course don't have actual
> >>> I/O, so it should approach the speed of in-core copy-by-serialization.
> >>>
> >>> What I am seeing is that elementwise messages work almost two times
> >>> faster in CPU-bound behavior than the version with combiners. It would
> >>> seem the culprit is that VectorWritable serialization, and then
> >>> deserialization of the vectorized fragments, is considerably slower than
> >>> serialization of elementwise messages containing only primitive types
> >>> (target row, index, value), even though the latter amount to a
> >>> significantly larger number of objects as well as data.
> >>>
> >>> Still, though, I am trying to convince myself that even using combiners
> >>> should be OK compared to shuffle-and-sort overhead. But I think in
> >>> reality it still looks a bit slower than I expected. Well, I guess I
> >>> should not be lazy and should benchmark it against the Mahout MR-based
> >>> transpose as well as Spark's version of RDD shuffle-and-sort.
> >>>
> >>> Anyway, map-only tasks on Spark distributed matrices are lightning
> >>> fast, but Bagel's serialize/deserialize scatter/gather seems to be much
> >>> slower than just map-only processing. Perhaps I am doing it wrong
> >>> somehow.
> >>>
> >>>
> >>> On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>
> >>>> Transpose of that small a matrix should happen in memory.
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>>>
> >>>>> Does anybody know how good (or bad) our performance on matrix
> >>>>> transpose is? How long would it take to transpose a matrix with 10M
> >>>>> non-zeros with Mahout (if I wanted to set up a fully distributed but
> >>>>> single-node MR cluster)?
> >>>>>
> >>>>> Trying to figure out if the numbers I see with the Bagel-based Mahout
> >>>>> matrix transposition are any good.
> >>
>
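
P.S. For the serialization question above, here is a toy, self-contained
comparison of the two paths (primitive element triples vs. partial vectors
via VectorWritable). All names and sizes are made up for illustration, this
is not the actual job code, and a micro-measurement like this is only
indicative:

import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.mahout.math.{RandomAccessSparseVector, VectorWritable}

object SerializationSketch {
  def main(args: Array[String]): Unit = {
    val rows = 1000
    val colsPerRow = 1000 // 1M non-zeros total
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)

    // Path 1: elementwise messages -- three primitives per non-zero
    // (target row in the transpose, column index, value).
    var t0 = System.nanoTime()
    var r = 0
    while (r < rows) {
      var c = 0
      while (c < colsPerRow) {
        out.writeInt(r); out.writeInt(c); out.writeDouble(1.0)
        c += 1
      }
      r += 1
    }
    printf("elementwise: %.1f ms, %d bytes%n", (System.nanoTime() - t0) / 1e6, bos.size())

    // Path 2: the same non-zeros combined into one partial vector per
    // target row, then serialized with VectorWritable.
    bos.reset()
    t0 = System.nanoTime()
    r = 0
    while (r < rows) {
      val v = new RandomAccessSparseVector(colsPerRow)
      var c = 0
      while (c < colsPerRow) { v.setQuick(c, 1.0); c += 1 }
      new VectorWritable(v).write(out)
      r += 1
    }
    printf("combined:    %.1f ms, %d bytes%n", (System.nanoTime() - t0) / 1e6, bos.size())
  }
}

If the VectorWritable path comes out slower even with no network involved,
that would support the hypothesis that the cost is in the vector
(de)serialization itself rather than in Bagel's message shipping.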
