Yes.

Parallelism in Spark makes all the difference.

Since scatter-type exchange in Spark increases I/O as the number of splits
grows, strong scaling is not achievable. If you just keep increasing
parallelism, there is a point where per-task CPU load decreases but the
cumulative I/O cancels out any gains from the added parallelism. So it is
important to carefully pre-split algorithm inputs using the par() operator.
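
For reference, a minimal sketch of what I mean, in the Samsara Scala DSL
(drmA/drmB and the split count are placeholders, not recommendations):

  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._

  // Decide the split count up front instead of inheriting it from however
  // the input files happened to be written.
  val A = drmA.par(min = 30) // at least 30 splits
  val B = drmB.par(min = 30)

  // The pre-split inputs then flow into the expensive A'B exchange.
  val C = (A.t %*% B).checkpoint()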

But assuming the same parallelization strategy before and after, a release
change probably should not affect that either.

-d

On Thu, Apr 28, 2016 at 6:02 AM, Nikaash Puri <[email protected]> wrote:

> Hi,
>
> Ok, so interestingly enough, when I repartition my input data across
> indicators on the user IDs, I get a significant speedup. This is probably
> because shuffle goes down, since rows with the same user IDs are more
> likely to be located on the same nodes. What’s even more interesting is
> the behaviour as a function of the number of partitions.
>
> Concretely, in my case I was using around 20 cores. Setting the number of
> partitions to 200 or more leads to greater shuffle and poorer performance.
> Setting the number of partitions to slightly more than the number of
> cores (30 in my case) gives significant speedups in the AtB calculations.
> Again, my guess is that shuffle is the reason.
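>
> Roughly, the repartitioning looks like this (a sketch; the interactions
> RDD and its (userID, itemID) pair layout stand in for my actual data):
>
>   import org.apache.spark.HashPartitioner
>
>   // interactions: RDD[(String, String)] keyed by user ID.
>   // 30 partitions: slightly more than the ~20 cores available.
>   val byUser = interactions.partitionBy(new HashPartitioner(30))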
>
> I’ll keep experimenting and share more results.
>
> All of these tests are with Spark 1.2.0 and Mahout 0.10.
>
> Thank you,
> Nikaash Puri
> > On 28-Apr-2016, at 2:50 AM, Pat Ferrel <[email protected]> wrote:
> >
> > I have been using the same function through all those versions of
> > Mahout. I’m running on newer versions of Spark, 1.4-1.6.2. Using my
> > datasets there has been no slowdown. I assume that you are only changing
> > the Mahout version, leaving data, Spark, HDFS, and all config the same.
> > In which case I wonder if you are somehow running into limits of your
> > machine, like memory? Have you allocated a fixed executor memory limit?
> >
> > There has been almost no code change to item similarity. Dmitriy, do
> > you know if the underlying AtB has changed? I seem to recall the
> > partitioning was set to “auto” around 0.11. We were having problems with
> > large numbers of small part files from Spark Streaming causing
> > partitioning headaches, as I recall. In some unexpected way the input
> > structure was trickling down into partitioning decisions made in Spark.
> >
> > The first thing I’d try is giving the job more executor memory; the
> > second is to upgrade Spark. A 3x slowdown is a pretty big deal if it
> > isn’t helped by these easy fixes, so can you share your data?
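> >
> > For example, something along these lines (the sizes here are purely
> > illustrative, not a recommendation for your cluster):
> >
> >   spark-submit --executor-memory 8g --driver-memory 4g <job jar and args>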
> >
> > On Apr 27, 2016, at 8:37 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > 0.11 targets 1.3+.
> >
> > I don't quite have anything off the top of my head affecting A'B
> > specifically, but I think there were some changes affecting in-memory
> > multiplication (which is of course used in distributed A'B).
> >
> > I am not particularly familiar with row similarity, nor do I remember
> > its details off the top of my head; I really wish the original
> > contributor would comment on that. Trying to see if I can come up with
> > anything useful, though.
> >
> > What behavior do you see in this job -- CPU-bound or I/O-bound?
> >
> > There are a few pointers to look at:
> >
> > (1) I/O many times exceeds the input size, so spills are inevitable.
> > Tuning memory sizes and looking at the Spark spill locations to make
> > sure the disks there are not slow is critical. Also, I think Spark 1.6
> > added a lot of flexibility in managing task/cache/shuffle memory sizes;
> > it may help in some unexpected way.
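> >
> > A sketch of the knobs I mean (Spark 1.6's unified memory manager; the
> > path and fractions are illustrative placeholders):
> >
> >   import org.apache.spark.SparkConf
> >
> >   val conf = new SparkConf()
> >     // put shuffle/spill files on a fast local disk
> >     .set("spark.local.dir", "/fast-disk/spark-tmp")
> >     // fraction of heap shared by execution and storage (1.6 default 0.75)
> >     .set("spark.memory.fraction", "0.75")
> >     // share of that region protected for cached blocks (default 0.5)
> >     .set("spark.memory.storageFraction", "0.5")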
> >
> > (2) Sufficient cache: many pipelines commit reused matrices into cache
> > (MEMORY_ONLY), which is the default Mahout algebra behavior, assuming
> > there is enough cache memory for only good things to happen. If there
> > is not, however, it will cause recomputation of results that were
> > evicted (not saying it is a known case for row similarity in
> > particular). Make sure this is not the case. For scatter-type exchanges
> > eviction is especially bad.
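> >
> > If eviction does turn out to be the problem, one hedge (a sketch in the
> > Mahout DSL; drmA is a placeholder) is to checkpoint with a cache hint
> > that spills to disk instead of silently recomputing:
> >
> >   import org.apache.mahout.math.drm.CacheHint
> >
> >   // evicted blocks go to disk rather than being recomputed
> >   val cached = drmA.checkpoint(CacheHint.MEMORY_AND_DISK)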
> >
> > (3) A'B -- try to hack and play with the implementation in the AtB
> > (Spark-side) class. See if you can come up with a better arrangement.
> >
> > (4) In-memory computations (the MMul class), if that is the bottleneck,
> > can in practice be quick-hacked with multithreaded multiplication and a
> > bridge to native solvers (netlib-java), at least for dense cases. This
> > has been found to improve the performance of distributed
> > multiplications a bit. It works best if you get 2 threads in the
> > backend and all threads in the front end.
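> >
> > To illustrate the netlib-java bridge idea for the dense case (a sketch
> > of the technique, not Mahout's actual MMul code):
> >
> >   import com.github.fommil.netlib.BLAS
> >
> >   // C = A * B for column-major dense arrays, delegated to native BLAS.
> >   // A is m x k, B is k x n, result C is m x n.
> >   def denseMul(a: Array[Double], b: Array[Double],
> >                m: Int, k: Int, n: Int): Array[Double] = {
> >     val c = new Array[Double](m * n)
> >     BLAS.getInstance.dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)
> >     c
> >   }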
> >
> > There are other known things that can improve the multiplication speed
> > of the public Mahout version; I hope Mahout will improve on those in
> > the future.
> >
> > -d
> >
> > On Wed, Apr 27, 2016 at 6:14 AM, Nikaash Puri <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I’ve been working with LLR in Mahout for a while now, mostly using the
> >> SimilarityAnalysis.cooccurrencesIDSs function. I recently upgraded the
> >> Mahout libraries to 0.11, and subsequently also tried 0.12, and the
> >> same program is running significantly slower (at least 3x based on
> >> initial analysis).
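> >>
> >> For context, my usage is essentially the following (a sketch; the
> >> IndexedDataset names are placeholders, and the parameter names are as
> >> I recall them from the 0.10/0.11 API):
> >>
> >>   import org.apache.mahout.math.cf.SimilarityAnalysis
> >>
> >>   // primary interaction matrix first, secondary indicators after it
> >>   val llrModels = SimilarityAnalysis.cooccurrencesIDSs(
> >>     Array(primaryIDS, secondaryIDS),
> >>     maxInterestingItemsPerThing = 50,
> >>     maxNumInteractions = 500)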
> >>
> >> Looking into the tasks more carefully, comparing 0.10 and 0.11 shows
> >> that the amount of shuffle being done in 0.11 is significantly higher,
> >> especially in the AtB step. This could be a reason for the reduction
> >> in performance.
> >>
> >> I am working on Spark 1.2.0, though, so it’s possible that this could
> >> be causing the problem. It works fine with Mahout 0.10.
> >>
> >> Any ideas why this might be happening?
> >>
> >> Thank you,
> >> Nikaash Puri
> >
>
>
