Hmm, can’t get images through the Apache mail servers.

The image is here: 
https://drive.google.com/file/d/0B4cAk1SMC1ChWFZiRG9DSEpkdzg/view?usp=sharing

 
On Apr 28, 2016, at 11:55 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

Actually, on your advice, Dmitriy, I think these changes went in around 0.11.
Before 0.11, par was not called. Any clue here?

This was in relation to the issue of reading a huge number of part files
created by Spark Streaming, which probably trickled down and caused too much
parallelization. Setting auto=true fixed this issue for me, but did it have
other effects?
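
For context, what I mean by auto=true is roughly this (a minimal sketch only;
drmInput stands in for our actual input and is not code lifted from the job):

  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._

  // Input assembled from thousands of tiny Spark Streaming part files arrives
  // with far too many partitions; auto re-splitting collapses that to something
  // reasonable before any algebra runs.
  val drmA = drmInput.par(auto = true)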
 

[attached screenshot; see the Google Drive link at the top of this thread]


On Apr 28, 2016, at 10:12 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

Yes.

Parallelism in Spark makes all the difference.

Since the scatter-type exchange in Spark increases I/O as the number of
splits increases, strong scaling is not achievable. If you just keep
increasing parallelism, there's a point where individual CPU load decreases
but the cumulative I/O cancels out any gains from the parallelism increase.
So it is important to carefully pre-split algorithm inputs using the par()
operator.

But assuming the same parallelization strategy before and after, the release
change probably should not affect that.
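
To make that concrete, here is a minimal sketch of what I mean by
pre-splitting (drmA, drmB and the core count are made up for illustration):

  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._

  val cores = 20                       // total executor cores in the job

  // Deliberately pre-split both operands instead of inheriting whatever
  // split count the inputs happened to have.
  val a = drmA.par(min = cores)
  val b = drmB.par(min = cores)

  val atb = a.t %*% b                  // the scatter-type exchange happens here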

-d

On Thu, Apr 28, 2016 at 6:02 AM, Nikaash Puri <nikaashp...@gmail.com> wrote:

> Hi,
> 
> Ok, so interestingly enough, when I repartition my input data across
> indicators on the user IDs, I get a significant speedup. This is probably
> because shuffle goes down, since RDDs with the same user IDs are more likely
> to be located on the same nodes. What’s even more interesting is the
> behaviour as a function of the number of partitions.
> 
> Concretely, in my case I was using around 20 cores. Setting the number of
> partitions to 200 or more leads to greater shuffle and poorer performance.
> Setting the number of partitions to slightly more than the number of cores,
> 30 in my case, gives significant speedups in the AtB calculations. Again, my
> guess is that shuffle is the reason.
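> 
> Roughly what the repartitioning looks like (a simplified sketch, not the
> full job; the RDD names are mine):
> 
>   import org.apache.spark.HashPartitioner
> 
>   val numPartitions = 30  // slightly more than my ~20 cores
> 
>   // interaction RDDs keyed by user ID, one per indicator
>   val primary   = primaryInteractions.partitionBy(new HashPartitioner(numPartitions))
>   val secondary = secondaryInteractions.partitionBy(new HashPartitioner(numPartitions))
>   // with both sides co-partitioned on user ID, the AtB step shuffles much less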
> 
> I’ll keep experimenting and share more results.
> 
> All of these tests are with Spark 1.2.0 and Mahout 0.10.
> 
> Thank you,
> Nikaash Puri
>> On 28-Apr-2016, at 2:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> 
>> I have been using the same function through all those versions of Mahout.
>> I’m running on newer versions of Spark, 1.4-1.6.2, and using my datasets
>> there has been no slowdown. I assume that you are only changing the Mahout
>> version, leaving data, Spark, HDFS, and all config the same; in which case I
>> wonder if you are somehow running into limits of your machine, like memory.
>> Have you allocated a fixed executor memory limit?
>> 
>> There has been almost no code change to item similarity. Dmitriy, do you
>> know if the underlying AtB has changed? I seem to recall the partitioning
>> was set to “auto” around 0.11. We were having problems with large numbers of
>> small part files from Spark Streaming causing partitioning headaches, as I
>> recall; in some unexpected way the input structure was trickling down into
>> partitioning decisions made in Spark.
>> 
>> The first thing I’d try is giving the job more executor memory; the second
>> is to upgrade Spark. A 3x increase in execution time is a pretty big deal if
>> it isn’t helped by these easy fixes, so can you share your data?
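>> 
>> For the memory part, I mean something along these lines (a sketch; the
>> right numbers depend entirely on your cluster):
>> 
>>   val conf = new org.apache.spark.SparkConf()
>>     .set("spark.executor.memory", "8g")  // fixed executor heap instead of the default
>>     .set("spark.executor.cores", "4")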
>> 
>> On Apr 27, 2016, at 8:37 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> 
>> Mahout 0.11 targets Spark 1.3+.
>> 
>> I don't have anything off the top of my head affecting A'B specifically,
>> but I think there were some changes affecting in-memory multiplication
>> (which is of course used in distributed A'B).
>> 
>> I am not particularly familiar with row similarity, nor do I remember its
>> details off the top of my head; I really wish the original contributor would
>> comment on that. Trying to see if I can come up with anything useful, though.
>> 
>> What behavior do you see in this job -- CPU-bound or I/O-bound?
>> 
>> There are a few pointers to look at:
>> 
>> (1) I/O many times exceeds the input size, so spills are inevitable.
>> Tuning memory sizes and looking at the Spark spill locations to make sure
>> the disks there are not slow is critical. Also, I think Spark 1.6 added a
>> lot of flexibility in managing task/cache/shuffle memory sizes; it may help
>> in some unexpected way.
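>> 
>> e.g. something like this (a sketch; the right values are workload-dependent):
>> 
>>   val conf = new org.apache.spark.SparkConf()
>>     .set("spark.local.dir", "/fast/disk1,/fast/disk2") // where shuffle/spill files go
>>     // Spark 1.6+ unified memory manager knobs:
>>     .set("spark.memory.fraction", "0.6")        // execution + storage share of the heap
>>     .set("spark.memory.storageFraction", "0.5") // storage region within that share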
>> 
>> (2) Sufficient cache: many pipelines commit reused matrices into cache
>> (MEMORY_ONLY), which is the default Mahout algebra behavior, assuming there
>> is enough cache memory for only good things to happen. If there is not,
>> however, it will cause recomputation of results that were evicted (not
>> saying it is a known case for row similarity in particular). Make sure this
>> is not the case. For cases of scatter-type exchanges it is especially bad.
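>> 
>> One cheap way to rule this out (a sketch, assuming the reused matrix is
>> checkpointed explicitly):
>> 
>>   import org.apache.mahout.math.drm._
>> 
>>   // The default cache hint is MEMORY_ONLY; if executors run short of cache
>>   // memory, evicted blocks are recomputed. Allowing spill to disk avoids that.
>>   val drmAcp = drmA.checkpoint(CacheHint.MEMORY_AND_DISK)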
>> 
>> (3) A'B -- try to hack and play with the implementation in the AtB
>> (Spark-side) class. See if you can come up with a better arrangement.
>> 
>> (4) In-memory computations (the MMul class), if that's the bottleneck, can
>> in practice be quick-hacked with multithreaded multiplication and a bridge
>> to native solvers (netlib-java), at least for dense cases. This has been
>> found to improve the performance of distributed multiplications a bit. It
>> works best if you get 2 threads in the backend and all threads in the
>> front end.
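>> 
>> To illustrate the kind of quick hack I mean for the dense case (a sketch
>> only, not what MMul does today; it assumes netlib-java's BLAS class and
>> column-major flat arrays):
>> 
>>   import com.github.fommil.netlib.BLAS
>> 
>>   // C = A (m x k) times B (k x n); a, b, c are column-major double arrays
>>   def denseGemm(m: Int, n: Int, k: Int,
>>                 a: Array[Double], b: Array[Double]): Array[Double] = {
>>     val c = new Array[Double](m * n)
>>     BLAS.getInstance.dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)
>>     c
>>   }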
>> 
>> There are other known things that can improve multiplication speed in the
>> public Mahout version; I hope Mahout will improve on those in the future.
>> 
>> -d
>> 
>> On Wed, Apr 27, 2016 at 6:14 AM, Nikaash Puri <nikaashp...@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I’ve been working with LLR in Mahout for a while now, mostly using the
>>> SimilarityAnalysis.cooccurenceIDss function. I recently upgraded the Mahout
>>> libraries to 0.11, and subsequently also tried 0.12, and the same program
>>> is running orders of magnitude slower (at least 3x based on initial
>>> analysis).
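>>> 
>>> For reference, the call is essentially this (a simplified sketch; the
>>> dataset names are mine and the argument defaults may differ between
>>> versions):
>>> 
>>>   import org.apache.mahout.math.cf.SimilarityAnalysis
>>> 
>>>   // primaryIDS, secondaryIDS: IndexedDataset with rows = users, columns = items
>>>   val indicators = SimilarityAnalysis.cooccurrencesIDSs(
>>>     Array(primaryIDS, secondaryIDS), // first dataset is the primary action
>>>     1234,                            // random seed used for downsampling
>>>     50,                              // max similar items kept per item
>>>     500)                             // max interactions kept per row before downsampling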
>>> 
>>> Looking into the tasks more carefully, comparing 0.10 and 0.11 shows that
>>> the amount of shuffle being done in 0.11 is significantly higher,
>>> especially in the AtB step. This could possibly be a reason for the
>>> reduction in performance.
>>> 
>>> I am working on Spark 1.2.0, though, so it’s possible that this could be
>>> causing the problem. It works fine with Mahout 0.10.
>>> 
>>> Any ideas why this might be happening?
>>> 
>>> Thank you,
>>> Nikaash Puri
>> 
> 
> 

