Re: Welcome New Committer Nikolay Sakharnykh
Welcome Nikolay, and thank you for all your efforts for Mahout so far!!

Ellen Friedman

On Mon, May 1, 2017 at 5:34 PM, Dmitriy Lyubimov wrote:
> Welcome!!
>
> On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh <nsakharn...@nvidia.com> wrote:
> > Hello everyone,
> >
> > I’m sorry for some delay with my introduction, have been swamped with
> > other projects recently ☺
> >
> > Having worked at NVIDIA for around 8 years I have seen GPUs evolve from
> > specialized graphics processors to general purpose computing machines that
> > can tackle any problem in the world (as long as you can extract enough
> > parallelism ☺). My area of expertise as an engineer changed as well, from
> > games and visual effects to high-performance computing and graph analytics.
> >
> > I must say that I’m relatively new to machine learning, but it is a very
> > exciting and quickly evolving field and I’d like to share my knowledge and
> > skills with the community. I’m honored and very happy to be part of this
> > group and looking forward to making Apache Mahout work efficiently on GPUs!
> >
> > Nikolay.
> >
> > From: Peng Zhang [mailto:pzhang.x...@gmail.com]
> > Sent: Saturday, April 22, 2017 4:31 AM
> > To: Nikolai Sakharnykh; d...@mahout.apache.org; user@mahout.apache.org
> > Subject: Re: Welcome New Committer Nikolay Sakharnykh
> >
> > Welcome Nikolay.
> >
> > On Sat, 22 Apr 2017 at 12:17 Andrew Musselman <...apache.org> wrote:
> > The Apache Mahout PMC is pleased to announce that we have asked Nikolay
> > Sakharnykh to become a committer and he has accepted. His contribution of
> > an initial set of CUDA bindings into the project is good progress toward
> > our goal of simplifying matrix math at scale.
> >
> > Being a committer allows you to contribute more easily to the project,
> > since in addition to posting pull requests and patches you're also granted
> > write access to the code repository; which in turn means you can review and
> > accept community contributions, and help others pitch in.
> >
> > Nikolay, we're looking forward to working with you in the future; welcome!
> > It is customary for new committers to introduce themselves with a few words :)
> >
> > Best
> > Andrew
> >
> > ---
> > This email message is for the sole use of the intended recipient(s) and
> > may contain confidential information. Any unauthorized review, use,
> > disclosure or distribution is prohibited. If you are not the intended
> > recipient, please contact the sender by reply email and destroy all
> > copies of the original message.
> > ---
Re: Welcome New Committer Nikolay Sakharnykh
Welcome!!

On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh wrote:
> Hello everyone,
>
> I’m sorry for some delay with my introduction, have been swamped with
> other projects recently ☺
>
> Having worked at NVIDIA for around 8 years I have seen GPUs evolve from
> specialized graphics processors to general purpose computing machines that
> can tackle any problem in the world (as long as you can extract enough
> parallelism ☺). My area of expertise as an engineer changed as well, from
> games and visual effects to high-performance computing and graph analytics.
>
> I must say that I’m relatively new to machine learning, but it is a very
> exciting and quickly evolving field and I’d like to share my knowledge and
> skills with the community. I’m honored and very happy to be part of this
> group and looking forward to making Apache Mahout work efficiently on GPUs!
>
> Nikolay.
>
> From: Peng Zhang [mailto:pzhang.x...@gmail.com]
> Sent: Saturday, April 22, 2017 4:31 AM
> To: Nikolai Sakharnykh; d...@mahout.apache.org; user@mahout.apache.org
> Subject: Re: Welcome New Committer Nikolay Sakharnykh
>
> Welcome Nikolay.
>
> On Sat, 22 Apr 2017 at 12:17 Andrew Musselman <...apache.org> wrote:
> The Apache Mahout PMC is pleased to announce that we have asked Nikolay
> Sakharnykh to become a committer and he has accepted. His contribution of
> an initial set of CUDA bindings into the project is good progress toward
> our goal of simplifying matrix math at scale.
>
> Being a committer allows you to contribute more easily to the project,
> since in addition to posting pull requests and patches you're also granted
> write access to the code repository; which in turn means you can review and
> accept community contributions, and help others pitch in.
>
> Nikolay, we're looking forward to working with you in the future; welcome!
> It is customary for new committers to introduce themselves with a few words :)
>
> Best
> Andrew
>
> ---
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information. Any unauthorized review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all
> copies of the original message.
> ---
Re: New logo
Just seeing this now, so maybe too late for my vote to count, but here goes.

On process: Pat, thanks for organizing. +1 to continue to work on the logo. Something without the blue man or elephant is a good idea. I'd prefer a logo that isn't all blue.

On designs: My two favorites are from the second batch (both are blue & yellow).

BEST for me is the one with interlocking thin squares and the word "Mahout" - MY FAVORITE
[image: Inline image 1]

2nd best for me is the one with the word "Mahout" in black on an interlocking solid yellow/blue background
[image: Inline image 2]

3rd is the simple letter M as a wireframe
[image: Inline image 3]
but I'd prefer the diagram be in yellow.

I don't care for the loopy curved logos (sorry Andrew!)

Good luck!!
Ellen Friedman

On Thu, Apr 27, 2017 at 12:56 PM, Pat Ferrel wrote:
> We can treat this like a release vote; if anyone hates all these and
> doesn’t want to continue with shortlisted designers for 3 more days (the
> next step), vote -1 and say if your vote is binding (you are a PMC member).
>
> Otherwise all are welcome to rate everything on the polls below.
>
> In this case you have 24 hours to vote.
>
> Here’s my +1 to continue refining.
>
> On Apr 27, 2017, at 11:41 AM, Pat Ferrel wrote:
>
> Here is a second group, hopefully picked to be unique.
> https://99designs.com/contests/poll/vl7xed
>
> We got a lot of responses; these 2 polls contain the best afaict.
>
> On Apr 27, 2017, at 11:25 AM, Pat Ferrel wrote:
>
> Vote: https://99designs.com/contests/poll/rqcgif
>
> We asked for something “mathy” and asked for no elephant and rider. We
> have the rest of the week to tweak, so leave comments about what you like or
> would like to change.
>
> We don’t have to pick one of these, so if you hate them all, make that
> known too.
Re: Scaling up Spark item similarity on big data sets
I just ran into the opposite case Sebastian mentions, where a very large % of users have only one interaction. They come from social media or search, see only one thing, and leave. Processing this data turned into a huge job but led to virtually no change in the model, since users with very few interactions also have minimal effect on the math. I removed any user with only 1 interaction and sped up the model calc by 10x.

The moral of the story is that data prep can really help. I’ve a mind to put min AND max interactions into the algorithm and save people the trouble of doing it themselves. Seems like setting min = 2 should be the default, at least for the primary/conversion event. You could override to any number.

On Jun 23, 2016, at 7:01 AM, Sebastian wrote:

Hi,

Pairwise similarity is a quadratic problem and it's very easy to run into a problem size that does not scale anymore, especially with so many items. Our code downsamples the input data to help with this.

One thing you can do is decrease the argument maxNumInteractions to a lower number to increase the amount of downsampling. Another thing you can do is to remove the items with the highest number of interactions from the dataset, as they are usually not very interesting (everybody knows the topsellers already) and heavily impact the computation.

Best,
Sebastian

On 23.06.2016 15:47, jelmer wrote:
> Hi,
>
> I am trying to build a simple recommendation engine using spark item
> similarity (e.g. with
> org.apache.mahout.math.cf.SimilarityAnalysis.cooccurrencesIDSs)
>
> Things work fine on a comparatively small dataset but I am having difficulty
> scaling it up.
>
> The input I am using is CSV data containing 19.988.422 view item events
> produced by 1.384.107 users, looking at 5.135.845 distinct products.
>
> The csv data is stored on hdfs and is split up over 15 files; consequently
> the resultant RDD will have 15 partitions.
>
> After tweaking some parameters I did manage to get the job to run without
> going out of memory, but the job takes a very, very long time to run.
>
> After running for 15 hours it is still stuck on
>
> org.apache.spark.rdd.RDD.flatMap(RDD.scala:332)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a_nongraph_mmul(AtA.scala:254)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:61)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:325)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:339)
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:123)
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:41)
> org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:95)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:145)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:143)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> scala.collection.AbstractIterator.to(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
> scala.collection.AbstractIterator.toList(Iterator.scala:1157)
>
> I am using spark on yarn and containers cannot use more than 16gb.
>
> I figured I would be able to speed things up by throwing a larger number of
> executors at the problem, but so far that is not working out very well.
>
> I tried assigning 500 executors and repartitioning the input data to 500
> partitions, and even changing spark.yarn.driver.memoryOverhead to crazy
> values (half of the heap) did not resolve this.
>
> Could someone offer any guidance on how to best speed up item similarity
> jobs?
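[Editor's note] The data prep Pat describes - dropping users below a minimum interaction count and capping heavy users, before handing events to SimilarityAnalysis - can be sketched as below. This is a minimal sketch in plain Scala collections (in a real job the same filter would be expressed on the Spark RDD/DataFrame before calling cooccurrencesIDSs); the `InteractionFilter` object and its parameter defaults are hypothetical, though `maxNumInteractions` mirrors the downsampling argument Sebastian mentions.

```scala
// Hypothetical helper: keep users with >= minInteractions events, and
// downsample any user's events to at most maxNumInteractions, per the
// advice in this thread. Events are (userId, itemId) pairs.
object InteractionFilter {
  def filter(events: Seq[(String, String)],
             minInteractions: Int = 2,
             maxNumInteractions: Int = 500): Seq[(String, String)] = {
    events
      .groupBy(_._1)                                        // bucket events per user
      .filter { case (_, es) => es.size >= minInteractions } // drop one-off users
      .values
      .flatMap(_.take(maxNumInteractions))                  // cap heavy users
      .toSeq
  }
}
```

With min = 2 and max = 2, a user with a single view is removed entirely and a user with three views keeps only two, which is exactly the trade-off discussed above: negligible effect on the model, large effect on the quadratic cost.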
Re: New logo
Thanks Scott,

You are correct - in fact we're going even further now, in that you can do native optimization regardless of the architecture with native solvers.

Do you, or anyone more familiar with the history of the website, know anything about the origins/uses of this:
https://mahout.apache.org/images/Mahout-logo-245x300.png
It seems to be a green mahout logo.

Also Scott, or anyone lurking who may be able to help: as part of the website reboot I've included a "history" page and would really appreciate some help capturing that from first-person sources if possible. I've put in some headers but those are only directional:
https://github.com/rawkintrevo/mahout/blob/website/website/front/community/history.md

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Mon, May 1, 2017 at 11:18 AM, scott cote wrote:
> Trevor et al:
>
> Some ideas to spur you on (and related points):
>
> Mahout is no longer a grab bag of algorithms and routines, but a math
> language, right? You don’t care about the under-the-cover implementation.
> Today it's Spark, with alternative implementations in Flink, etc.
>
> Don’t know if that is the long term goal still - haven’t kept up - but it
> seems like you are insulating yourself from the underlying technology.
>
> Math is a universal language. Right?
>
> Tower of Babel is coming to mind ….
>
> SCott
>
> > On Apr 27, 2017, at 10:27 PM, Trevor Grant wrote:
> >
> > It also bugs me when I can't suggest any alternatives, yet don't like the
> > ones in front of me...
> >
> > I became aware of a symbol a week or so ago, and it keeps coming back to me.
> >
> > The Enso.
> > https://en.wikipedia.org/wiki/Ens%C5%8D
> >
> > Things I like about it:
> > (all from Wikipedia, since the only thing I knew about this symbol prior
> > is that someone I met had a tattoo of it)
> >
> > It represents (among a few other things) enlightenment.
> > ^^ This resonated with the 'alternate definition of mahout' from Hebrew -
> > which may be something akin to essence or truth.
> >
> > It is a circle - which plays to the Samsara theme.
> >
> > It is very expressive: a simple one or two brush stroke circle which
> > symbolizes several large concepts and things about the creator, expressive
> > like our DSL (I feel gross comparing such a symbol to a Scala DSL, but I'm
> > spitballing here, please forgive me - I am not so expressive).
> >
> > "Once the *ensō* is drawn, one does not change it. It evidences the
> > character of its creator and the context of its creation in a brief,
> > contiguous period of time." Which reminds me of the DRMs.
> >
> > In closed form it represents something akin to Plato's perfection - which a
> > little more wiki surfing tells me is the idea that no one can create a
> > perfect circle, because a circle is a collection of infinite points and how
> > could one ever be sure that you have arranged each one properly; yet such
> > things must exist, or what blueprint would a creator of circles be striving
> > for? This, by the by, reminds me of stochastic approaches to solving
> > problems, and really statistics / "machine learning" in general, in that we
> > can't find perfect solutions, yet we believe solutions exist and serve as
> > our blueprint.
> >
> > Finally, I like that it is simple.
> >
> > Things I don't like about it:
> > Lucent Technologies used it back in the 90s; however, they used a very
> > specific red one, and this isn't a deal breaker for me.
> >
> > Other thoughts:
> > Based on the tattoo I saw, one could make an Enso using the old Mahout color
> > palette if one were to dab their brush in the appropriate colors. This
> > could also be represented in any single color. (Not sure what that does to
> > our TM; is it ok if we just keep slapping TMs on the side of it? If that
> > is the case, is there any reason we must have a single Enso?)
> >
> > So there is something to throw in the pot that is a little more grown up
> > than my runner-up favorites (honey badger, blueman riding bomb waving
> > cowboy hat, blueman riding lightning bolt into a squirrel covered in water,
> > etc).
> >
> > Again, I only know what wiki has told me, so if anyone is more familiar with
> > this symbol (like was it used as a logo by some horrible dictator which
> > carried out terrible atrocities?) or just general comments.
> > tg
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> > On Thu, Apr 27, 2017 at 5:50 PM, Ted Dunning wrote:
> >
> >> I don't have any constructive input at all. None of the proposals showed
> >> any spark (to me).
> >>
> >> I
Re: New logo
Trevor et al:

Some ideas to spur you on (and related points):

Mahout is no longer a grab bag of algorithms and routines, but a math language, right? You don’t care about the under-the-cover implementation. Today it's Spark, with alternative implementations in Flink, etc.

Don’t know if that is the long term goal still - haven’t kept up - but it seems like you are insulating yourself from the underlying technology.

Math is a universal language. Right?

Tower of Babel is coming to mind ….

SCott

> On Apr 27, 2017, at 10:27 PM, Trevor Grant wrote:
>
> It also bugs me when I can't suggest any alternatives, yet don't like the
> ones in front of me...
>
> I became aware of a symbol a week or so ago, and it keeps coming back to me.
>
> The Enso.
> https://en.wikipedia.org/wiki/Ens%C5%8D
>
> Things I like about it:
> (all from Wikipedia, since the only thing I knew about this symbol prior
> is that someone I met had a tattoo of it)
>
> It represents (among a few other things) enlightenment.
> ^^ This resonated with the 'alternate definition of mahout' from Hebrew -
> which may be something akin to essence or truth.
>
> It is a circle - which plays to the Samsara theme.
>
> It is very expressive: a simple one or two brush stroke circle which
> symbolizes several large concepts and things about the creator, expressive
> like our DSL (I feel gross comparing such a symbol to a Scala DSL, but I'm
> spitballing here, please forgive me - I am not so expressive).
>
> "Once the *ensō* is drawn, one does not change it. It evidences the
> character of its creator and the context of its creation in a brief,
> contiguous period of time." Which reminds me of the DRMs.
>
> In closed form it represents something akin to Plato's perfection - which a
> little more wiki surfing tells me is the idea that no one can create a
> perfect circle, because a circle is a collection of infinite points and how
> could one ever be sure that you have arranged each one properly; yet such
> things must exist, or what blueprint would a creator of circles be striving
> for? This, by the by, reminds me of stochastic approaches to solving
> problems, and really statistics / "machine learning" in general, in that we
> can't find perfect solutions, yet we believe solutions exist and serve as
> our blueprint.
>
> Finally, I like that it is simple.
>
> Things I don't like about it:
> Lucent Technologies used it back in the 90s; however, they used a very
> specific red one, and this isn't a deal breaker for me.
>
> Other thoughts:
> Based on the tattoo I saw, one could make an Enso using the old Mahout color
> palette if one were to dab their brush in the appropriate colors. This
> could also be represented in any single color. (Not sure what that does to
> our TM; is it ok if we just keep slapping TMs on the side of it? If that is
> the case, is there any reason we must have a single Enso?)
>
> So there is something to throw in the pot that is a little more grown up
> than my runner-up favorites (honey badger, blueman riding bomb waving
> cowboy hat, blueman riding lightning bolt into a squirrel covered in water,
> etc).
>
> Again, I only know what wiki has told me, so if anyone is more familiar with
> this symbol (like was it used as a logo by some horrible dictator which
> carried out terrible atrocities?) or just general comments.
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
> On Thu, Apr 27, 2017 at 5:50 PM, Ted Dunning wrote:
>
>> I don't have any constructive input at all. None of the proposals showed
>> any spark (to me).
>>
>> I hate it when I can't suggest a better path and I hate negative feedback.
>> But there it is.
>>
>> On Thu, Apr 27, 2017 at 3:48 PM, Pat Ferrel wrote:
>>
>>> Do you have constructive input (guidance or opinion is welcome input) or
>>> would you like to discontinue the contest? If the latter, -1 now.
>>>
>>> On Apr 27, 2017, at 3:42 PM, Ted Dunning wrote:
>>>
>>> I thought that none of the proposals were worth continuing with.
>>>
>>> On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel wrote:
>>>
>>> Yes, -1 means you hate them all or think the designers are not worth
>>> paying. We have to pay to continue; I’ll foot the bill (donations
>>> appreciated) but don’t want to unless people think it will lead to
>>> something. For me there are a couple I wouldn’t mind seeing on the web
>>> site or swag, and yes we do have time to try something completely
>>> different, and the designers will be more willing since there is a
>>> guaranteed payout.
>>>
>>> On Apr 27, 2017, at 3:30 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>>
>>> I thought