Hmm, can’t get images through the Apache mail servers. The image is here: https://drive.google.com/file/d/0B4cAk1SMC1ChWFZiRG9DSEpkdzg/view?usp=sharing
On Apr 28, 2016, at 11:55 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

Actually, on your advice Dmitriy, I think these changes went in about 0.11. Before 0.11, par was not called. Any clue here? This was in relation to that issue when reading a huge number of part files created by Spark Streaming, which probably trickled down to cause too much parallelization. The auto = true fixed this issue for me, but did it have other effects?

(image: PastedGraphic-3.tiff; see the Drive link above)

On Apr 28, 2016, at 10:12 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

Yes. Parallelism in Spark makes all the difference. Since a scatter-type exchange in Spark increases I/O as the number of splits increases, strong scaling is not achievable: if you just keep increasing parallelism, there is a point where individual CPU load decreases but cumulative I/O cancels out any gains from the parallelism increase. So it is important to carefully pre-split algorithm inputs using the par() operator. But assuming the same parallelization strategy before and after, a release change also probably should not affect that.

-d
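For concreteness, the pre-split Dmitriy describes is done with the Samsara par() operator before the multiplication runs. A minimal sketch, assuming the Mahout Spark bindings are on the classpath and two DRMs are already loaded (drmA and drmB are illustrative names, not from this thread):

    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    // Let Mahout pick a split count from the dataset geometry; this is
    // the auto = true behavior Pat mentions above.
    val drmAPar = drmA.par(auto = true)

    // Or pin the split count explicitly, e.g. slightly above the core
    // count, per Nikaash's observation below (20 cores -> ~30 splits).
    val drmBPar = drmB.par(exact = 30)

    // The A'B step discussed in this thread, on the pre-split inputs.
    val drmAtB = drmAPar.t %*% drmBPar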
On Thu, Apr 28, 2016 at 6:02 AM, Nikaash Puri <nikaashp...@gmail.com> wrote:

> Hi,
>
> OK, so interestingly enough, when I repartition my input data across
> indicators on the user IDs, I get a significant speedup. This is
> probably because shuffle goes down, since RDDs with the same user IDs
> are more likely to be located on the same nodes. What's even more
> interesting is the behaviour as a function of the number of partitions.
>
> Concretely, in my case I was using around 20 cores. Setting the number
> of partitions to 200 or more leads to greater shuffle and poorer
> performance. Setting the number of partitions to slightly more than the
> number of cores, 30 in my case, gives significant speedups in the AtB
> calculation. Again, my guess is that shuffle is the reason.
>
> I'll keep experimenting and share more results.
>
> All of these tests are with Spark 1.2.0 and Mahout 0.10.
>
> Thank you,
> Nikaash Puri
>
>> On 28-Apr-2016, at 2:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> I have been using the same function through all those versions of
>> Mahout. I'm running on newer versions of Spark, 1.4-1.6.2, and using
>> my datasets there has been no slowdown. I assume that you are only
>> changing the Mahout version, leaving data, Spark, HDFS, and all config
>> the same. In which case I wonder if you are somehow running into
>> limits of your machine, like memory. Have you allocated a fixed
>> executor memory limit?
>>
>> There has been almost no code change to item similarity. Dmitriy, do
>> you know if the underlying AtB has changed? I seem to recall the
>> partitioning was set to "auto" about 0.11. We were having problems
>> with large numbers of small part files from Spark Streaming causing
>> partitioning headaches, as I recall. In some unexpected way the input
>> structure was trickling down into partitioning decisions made in
>> Spark.
>>
>> The first thing I'd try is giving the job more executor memory; the
>> second is to upgrade Spark. A 3x slowdown is a pretty big deal if it
>> isn't helped by these easy fixes, so can you share your data?
>>
>> On Apr 27, 2016, at 8:37 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>> 0.11 targets Spark 1.3+.
>>
>> I don't quite have anything off the top of my head affecting A'B
>> specifically, but I think there were some changes affecting in-memory
>> multiplication (which is of course used in distributed A'B).
>>
>> I am not particularly familiar with, nor remember the details of, row
>> similarity off the top of my head; I really wish the original
>> contributor would comment on that. I am trying to see if I can come up
>> with anything useful, though.
>>
>> What behavior do you see in this job -- CPU-bound or I/O-bound?
>>
>> There are a few pointers to look at:
>>
>> (1) I/O many times exceeds the input size, so spills are inevitable.
>> Tuning memory sizes, and checking the Spark spill locations to make
>> sure the disks there are not slow, is critical. Also, I think Spark
>> 1.6 added a lot of flexibility in managing task/cache/shuffle memory
>> sizes; it may help in some unexpected way.
>>
>> (2) Sufficient cache: many pipelines commit reused matrices into cache
>> (MEMORY_ONLY), which is the default Mahout algebra behavior, assuming
>> there is enough cache memory for only good things to happen. If there
>> is not, however, it will cause recomputation of results that were
>> evicted (not saying it is a known case for row similarity in
>> particular). Make sure this is not the case. For scatter-type
>> exchanges it is especially bad.
>>
>> (3) A'B -- try to hack and play with the implementation in the AtB
>> (Spark-side) class. See if you can come up with a better arrangement.
>>
>> (4) In-memory computations (the MMul class), if that is the
>> bottleneck, can in practice be quick-hacked with multithreaded
>> multiplication and a bridge to native solvers (netlib-java), at least
>> for dense cases. This has been found to improve the performance of
>> distributed multiplications a bit. It works best if you get 2 threads
>> in the backend and all threads in the front end.
>>
>> There are other known things that can improve multiplication speed
>> over the public Mahout version; I hope Mahout will improve on those in
>> the future.
>>
>> -d
>>
>> On Wed, Apr 27, 2016 at 6:14 AM, Nikaash Puri <nikaashp...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've been working with LLR in Mahout for a while now, mostly using
>>> the SimilarityAnalysis.cooccurrencesIDSs function. I recently
>>> upgraded the Mahout libraries to 0.11, and subsequently also tried
>>> 0.12, and the same program runs orders of magnitude slower (at least
>>> 3x based on initial analysis).
>>>
>>> Looking into the tasks more carefully, comparing 0.10 and 0.11 shows
>>> that the amount of shuffle being done in 0.11 is significantly
>>> higher, especially in the AtB step. This could possibly be a reason
>>> for the reduction in performance.
>>>
>>> I am working on Spark 1.2.0, though, so it's possible that this could
>>> be causing the problem. It works fine with Mahout 0.10.
>>>
>>> Any ideas why this might be happening?
>>>
>>> Thank you,
>>> Nikaash Puri
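As a sketch of the repartitioning approach Nikaash describes above (illustrative names and toy data; assumes a live SparkContext sc and the IndexedDatasetSpark(rdd)(sc) factory from the Mahout Spark bindings, whose exact signature may vary by version):

    import org.apache.spark.HashPartitioner
    import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
    import org.apache.mahout.math.cf.SimilarityAnalysis

    // One (userID, itemID) pair RDD per indicator; toy data for illustration.
    val purchases = sc.parallelize(Seq(("u1", "iphone"), ("u1", "ipad"), ("u2", "iphone")))
    val views = sc.parallelize(Seq(("u1", "ipad"), ("u2", "nexus"), ("u2", "iphone")))

    // Hash-partition every indicator on user ID with the same partitioner,
    // using slightly more partitions than cores (20 cores -> 30 here), so a
    // user's rows co-locate across indicators and the AtB shuffle shrinks.
    val byUser = new HashPartitioner(30)
    val purchasesPart = purchases.partitionBy(byUser)
    val viewsPart = views.partitionBy(byUser)

    // Note: for cross-cooccurrence the user dictionaries of the indicators
    // must align; how the row dictionary is shared varies by Mahout version.
    val purchaseIDS = IndexedDatasetSpark(purchasesPart)(sc)
    val viewIDS = IndexedDatasetSpark(viewsPart)(sc)

    // LLR cooccurrence and cross-cooccurrence, primary indicator first.
    val indicatorMatrices = SimilarityAnalysis.cooccurrencesIDSs(Array(purchaseIDS, viewIDS))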