Yes. Parallelism in Spark makes all the difference.
Since a scatter-type exchange in Spark increases I/O as the number of
splits grows, strong scaling is not achievable: if you just keep
increasing parallelism, there is a point where per-CPU load decreases
but the cumulative I/O cancels out any gains from the added
parallelism. So it is important to carefully pre-split the algorithm's
inputs using the par() operator (see the sketch below). But assuming
the same parallelization strategy before and after, a release change
probably should not affect that either.

-d
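To make the pre-splitting advice concrete, here is a minimal sketch in
the Samsara DSL. The input paths, the exact partition count, and the
context setup are illustrative assumptions, not details from the
thread.

  // A sketch assuming Mahout's Spark bindings (Samsara DSL); the paths,
  // the partition count, and the context setup are made-up placeholders.
  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._
  import org.apache.mahout.sparkbindings._

  implicit val ctx = mahoutSparkContext(masterUrl = "local[*]", appName = "atb")

  val drmA = drmDfsRead("/data/A")  // hypothetical input paths
  val drmB = drmDfsRead("/data/B")

  // Pre-split to slightly more partitions than cores (~30 for ~20 cores,
  // the numbers reported later in this thread) before the scatter-heavy A'B.
  val atb = (drmA.par(exact = 30).t %*% drmB.par(exact = 30)).checkpoint()

With the same par() settings before and after an upgrade, the
partitioning seen by A'B should stay comparable across releases.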
On Thu, Apr 28, 2016 at 6:02 AM, Nikaash Puri <[email protected]> wrote:

> Hi,
>
> OK, so interestingly enough, when I repartition my input data across
> indicators on the user IDs, I get a significant speedup. This is
> probably because shuffle goes down, since RDDs with the same user IDs
> are more likely to be located on the same nodes. What's even more
> interesting is the behaviour as a function of the number of
> partitions.
>
> Concretely, in my case I was using around 20 cores. Setting the number
> of partitions to 200 or more leads to greater shuffle and poorer
> performance. Setting the number of partitions to slightly more than
> the number of cores, 30 in my case, gives significant speedups in the
> AtB calculation. Again, my guess is that shuffle is the reason.
>
> I'll keep experimenting and share more results.
>
> All of these tests are with Spark 1.2.0 and Mahout 0.10.
>
> Thank you,
> Nikaash Puri
>
> On 28-Apr-2016, at 2:50 AM, Pat Ferrel <[email protected]> wrote:
>
> > I have been using the same function through all those versions of
> > Mahout. I'm running on newer versions of Spark, 1.4-1.6.2. Using my
> > datasets there has been no slowdown. I assume that you are only
> > changing the Mahout version (leaving data, Spark, HDFS, and all
> > config the same), in which case I wonder if you are somehow running
> > into limits of your machine, like memory. Have you allocated a fixed
> > executor memory limit?
> >
> > There has been almost no code change to item similarity. Dmitriy, do
> > you know if the underlying AtB has changed? I seem to recall the
> > partitioning was set to "auto" around 0.11. We were having problems
> > with large numbers of small part files from Spark Streaming causing
> > partitioning headaches, as I recall. In some unexpected way the input
> > structure was trickling down into partitioning decisions made in
> > Spark.
> >
> > The first thing I'd try is giving the job more executor memory; the
> > second is to upgrade Spark. A 3x slowdown is a pretty big deal if it
> > isn't helped by these easy fixes, so can you share your data?
> >
> > On Apr 27, 2016, at 8:37 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > 0.11 targets Spark 1.3+.
> >
> > I don't have anything off the top of my head affecting A'B
> > specifically, but I think there were some changes affecting in-memory
> > multiplication (which is of course used in distributed A'B).
> >
> > I am not particularly familiar with row similarity, nor do I remember
> > its details off the top of my head; I really wish the original
> > contributor would comment on that. Trying to see if I can come up
> > with anything useful, though.
> >
> > What behavior do you see in this job -- CPU-bound or I/O-bound?
> >
> > There are a few pointers to look at:
> >
> > (1) I/O many times exceeds the input size, so spills are inevitable.
> > Tuning memory sizes, and checking Spark's spill locations to make
> > sure the disks there are not slow, is therefore critical. Also, I
> > think Spark 1.6 added a lot of flexibility in managing
> > task/cache/shuffle memory sizes; it may help in some unexpected way.
> >
> > (2) Sufficient cache: many pipelines commit reused matrices into
> > cache (MEMORY_ONLY), which is the default Mahout algebra behavior,
> > assuming there is enough cache memory for only good things to happen.
> > If there is not, however, results that get evicted will be recomputed
> > (not saying it is a known case for row similarity in particular);
> > make sure this is not happening. For scatter-type exchanges it is
> > especially bad.
> >
> > (3) A'B -- try to hack and play with the implementation in the AtB
> > (Spark-side) class. See if you can come up with a better arrangement.
> >
> > (4) If in-memory computation (the MMul class) is the bottleneck, it
> > can in practice be quick-hacked with multithreaded multiplication and
> > a bridge to native solvers (netlib-java), at least for dense cases.
> > This has been found to improve the performance of distributed
> > multiplications a bit. It works best if you get 2 threads in the
> > backend and all threads in the front end.
> >
> > There are other known things that can improve the multiplication
> > speed of the public Mahout version; I hope Mahout will improve on
> > those in the future.
> >
> > -d
> >
> > On Wed, Apr 27, 2016 at 6:14 AM, Nikaash Puri <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I've been working with LLR in Mahout for a while now, mostly using
> >> the SimilarityAnalysis.cooccurrencesIDSs function. I recently
> >> upgraded the Mahout libraries to 0.11, and subsequently also tried
> >> 0.12, and the same program runs markedly slower (at least 3x, based
> >> on initial analysis).
> >>
> >> Looking into the tasks more carefully, comparing 0.10 and 0.11 shows
> >> that the amount of shuffle being done in 0.11 is significantly
> >> higher, especially in the AtB step. This could possibly be a reason
> >> for the reduction in performance.
> >>
> >> I am working on Spark 1.2.0, though, so it's possible that this
> >> could be causing the problem. It works fine with Mahout 0.10.
> >>
> >> Any ideas why this might be happening?
> >>
> >> Thank you,
> >> Nikaash Puri
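To illustrate pointer (2) above: reused matrices can be committed to
cache explicitly rather than relying on the MEMORY_ONLY default. The
MEMORY_AND_DISK hint below is an assumed example, chosen only to show
where the knob sits; it trades spilling for recomputation-on-eviction.

  // A sketch assuming the Samsara DSL: pin reused matrices explicitly so
  // eviction cannot silently trigger recomputation. MEMORY_AND_DISK is an
  // illustrative choice, not a recommendation from this thread.
  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._

  def atbPinned(drmA: DrmLike[Int], drmB: DrmLike[Int]): DrmLike[Int] = {
    val a = drmA.checkpoint(CacheHint.MEMORY_AND_DISK)  // pin before reuse
    val b = drmB.checkpoint(CacheHint.MEMORY_AND_DISK)
    (a.t %*% b).checkpoint(CacheHint.MEMORY_AND_DISK)
  }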

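For the memory-side suggestions in the thread (a fixed executor memory
limit, watching spill locations, Spark 1.6's more flexible memory
management), this sketch shows where those settings live in the Spark
configuration; the values are placeholders, not tuned recommendations.

  // A sketch of the relevant Spark knobs; all values are placeholders.
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("rowSimilarity")
    .set("spark.executor.memory", "8g")               // fixed executor limit
    .set("spark.local.dir", "/fast1/tmp,/fast2/tmp")  // where spills land
    .set("spark.memory.fraction", "0.75")             // Spark 1.6+ unified pool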