Re: Welcome New Committer Nikolay Sakharnykh

2017-05-01 Thread Ellen Friedman
Welcome Nikolay, and thank you for all your efforts for Mahout so far!!

Ellen Friedman


On Mon, May 1, 2017 at 5:34 PM, Dmitriy Lyubimov  wrote:

> Welcome!!
>
> On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh <
> nsakharn...@nvidia.com>
> wrote:
>
> > Hello everyone,
> >
> > I’m sorry for some delay with my introduction, have been swamped with
> > other projects recently ☺
> >
> > Having worked at NVIDIA for around 8 years I have seen GPUs evolve
> from
> > specialized graphics processors to general purpose computing machines
> that
> > can tackle any problem in the world (as long as you can extract enough
> > parallelism ☺). My area of expertise as an engineer changed as well from
> > games and visual effects to high-performance computing and graph
> analytics.
> >
> > I must say that I’m relatively new to machine learning but it is a very
> > exciting and quickly evolving field and I’d like to share my knowledge
> and
> > skills with the community. I’m honored and very happy to be part of this
> > group and looking forward to making Apache Mahout work efficiently on
> GPUs!
> >
> > Nikolay.
> >
> > From: Peng Zhang [mailto:pzhang.x...@gmail.com]
> > Sent: Saturday, April 22, 2017 4:31 AM
> > To: Nikolai Sakharnykh ; d...@mahout.apache.org;
> > user@mahout.apache.org
> > Subject: Re: Welcome New Committer Nikolay Sakharnykh
> >
> > Welcome Nikolay.
> >
> >
> > On Sat, 22 Apr 2017 at 12:17 Andrew Musselman wrote:
> > The Apache Mahout PMC is pleased to announce that we have asked Nikolay
> > Sakharnykh to become a committer and he has accepted. His contribution of
> > an initial set of CUDA bindings into the project is good progress toward
> > our goal of simplifying matrix math at scale.
> >
> > Being a committer allows you to contribute more easily to the project,
> > since in addition to posting pull requests and patches you're also
> granted
> > write access to the code repository; which in turn means you can review
> and
> > accept community contributions, and help others pitch in.
> >
> > Nikolay, we're looking forward to working with you in the future;
> welcome!
> > It is customary for new committers to introduce themselves with a few
> words
> > :)
> >
> > Best
> > Andrew
> >
> >
>


Re: Welcome New Committer Nikolay Sakharnykh

2017-05-01 Thread Dmitriy Lyubimov
Welcome!!

On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh 
wrote:

> Hello everyone,
>
> I’m sorry for some delay with my introduction, have been swamped with
> other projects recently ☺
>
> Having worked at NVIDIA for around 8 years I have seen GPUs evolve from
> specialized graphics processors to general purpose computing machines that
> can tackle any problem in the world (as long as you can extract enough
> parallelism ☺). My area of expertise as an engineer changed as well from
> games and visual effects to high-performance computing and graph analytics.
>
> I must say that I’m relatively new to machine learning but it is a very
> exciting and quickly evolving field and I’d like to share my knowledge and
> skills with the community. I’m honored and very happy to be part of this
> group and looking forward to making Apache Mahout work efficiently on GPUs!
>
> Nikolay.
>
> From: Peng Zhang [mailto:pzhang.x...@gmail.com]
> Sent: Saturday, April 22, 2017 4:31 AM
> To: Nikolai Sakharnykh ; d...@mahout.apache.org;
> user@mahout.apache.org
> Subject: Re: Welcome New Committer Nikolay Sakharnykh
>
> Welcome Nikolay.
>
>
> On Sat, 22 Apr 2017 at 12:17 Andrew Musselman wrote:
> The Apache Mahout PMC is pleased to announce that we have asked Nikolay
> Sakharnykh to become a committer and he has accepted. His contribution of
> an initial set of CUDA bindings into the project is good progress toward
> our goal of simplifying matrix math at scale.
>
> Being a committer allows you to contribute more easily to the project,
> since in addition to posting pull requests and patches you're also granted
> write access to the code repository; which in turn means you can review and
> accept community contributions, and help others pitch in.
>
> Nikolay, we're looking forward to working with you in the future; welcome!
> It is customary for new committers to introduce themselves with a few words
> :)
>
> Best
> Andrew
>
>


Re: New logo

2017-05-01 Thread Ellen Friedman
Just seeing this now, so maybe too late for my vote to count, but here
goes.

On Process: Pat thanks for organizing.

+1 to continue to work on the logo. Something without the blue man or elephant
is a good idea. I'd prefer a logo that is not all blue.

On designs:

My two favorites from the second batch (both are blue & yellow):

BEST for me is the one with interlocking thin squares and the word "Mahout" - MY
FAVORITE [image: Inline image 1]

2nd best for me is the one with the word "Mahout" in black on an interlocking solid
yellow/blue background
[image: Inline image 2]

3rd is the simple letter M as a wireframe
[image: Inline image 3] but I'd prefer the diagram be in yellow.

I don't care for the loopy curved logos (sorry Andrew!)

Good luck!!

Ellen Friedman

On Thu, Apr 27, 2017 at 12:56 PM, Pat Ferrel  wrote:

> We can treat this like a release vote, if anyone hates all these and
> doesn’t want to continue with shortlisted designers for 3 more days (the
> next step) vote -1 and say if your vote is binding (you are a PMC member)
>
> Otherwise all are welcome to rate everything on the polls below.
>
> In this case you have 24 hours to vote
>
> Here’s my +1 to continue refining.
>
>
> On Apr 27, 2017, at 11:41 AM, Pat Ferrel  wrote:
>
> Here is a second group, hopefully picked to be unique.
> https://99designs.com/contests/poll/vl7xed
>
> We got a lot of responses, these 2 polls contain the best afaict.
>
>
> On Apr 27, 2017, at 11:25 AM, Pat Ferrel  wrote:
>
> Vote: https://99designs.com/contests/poll/rqcgif
>
> We asked for something “mathy” and asked for no elephant and rider. We
> have the rest of the week to tweak so leave comments about what you like or
> would like to change.
>
> We don’t have to pick one of these, so if you hate them all, make that
> known too.
>
>
>


Re: Scaling up spark item similarity on big data sets

2017-05-01 Thread Pat Ferrel
I just ran into the opposite of the case Sebastian mentions, where a very large % of
users have only one interaction. They come from social media or search, see
only one thing, and leave. Processing this data turned into a huge job but led to
virtually no change in the model, since users with very few interactions also
have minimal effect on the math. I removed any user with only 1 interaction and
sped up the model calc by 10x. The moral of the story is that data prep can
really help.

I’ve a mind to put min AND max interactions into the algorithm and save people 
the trouble of doing it themselves.

Seems like setting the min = 2 should be the default, at least for the 
primary/conversion event. You could override to any number.
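
As a rough illustration of the data prep described above, here is a minimal Python sketch. It is not Mahout code: the function name `filter_users_by_min_interactions` and the `(user_id, item_id)` tuple layout are assumptions made only for this example. It drops every user with fewer than `min_interactions` events before the similarity job would run, with 2 as the default that can be overridden.

```python
from collections import Counter


def filter_users_by_min_interactions(events, min_interactions=2):
    """Keep only events from users with at least `min_interactions` events.

    `events` is an iterable of (user_id, item_id) pairs. This layout is a
    hypothetical one chosen for the sketch, not a Mahout input format.
    """
    events = list(events)
    # Count how many interactions each user produced.
    counts = Counter(user for user, _ in events)
    # Preserve the original event order for the users that survive.
    return [(user, item) for user, item in events
            if counts[user] >= min_interactions]


# Toy data: u2 has a single interaction and is removed by the default min = 2.
events = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"),
    ("u3", "b"), ("u3", "c"), ("u3", "d"),
]
filtered = filter_users_by_min_interactions(events)
stricter = filter_users_by_min_interactions(events, min_interactions=3)
```

In a real pipeline the same counting and filtering would be done on the distributed dataset rather than on an in-memory list.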


On Jun 23, 2016, at 7:01 AM, Sebastian  wrote:

Hi,

Pairwise similarity is a quadratic problem and it's very easy to run into a
problem size that does not scale anymore, especially with so many items. Our code
downsamples the input data to help with this.

One thing you can do is decrease the argument maxNumInteractions to a lower 
number to increase the amount of downsampling. Another thing you can do is to 
remove the items with the highest amount of interactions from the dataset as 
they are not very interesting usually (everybody knows the topsellers already) 
and heavily impact the computation.
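
The two suggestions above can be sketched in plain Python. This is an illustrative sketch only, not Mahout's actual sampling logic: the helper names `pairs_per_user`, `drop_top_items` and `downsample_per_user`, the `(user_id, item_id)` tuple layout, and the fixed random seed are all assumptions made for the example. The first helper shows why the cap matters: a user with n interactions contributes n(n-1)/2 item pairs.

```python
import random
from collections import Counter, defaultdict


def pairs_per_user(n):
    """Number of distinct item pairs produced by a user with n interactions."""
    return n * (n - 1) // 2


def drop_top_items(events, top_n):
    """Remove all events that touch the `top_n` most frequently seen items."""
    counts = Counter(item for _, item in events)
    top = {item for item, _ in counts.most_common(top_n)}
    return [(user, item) for user, item in events if item not in top]


def downsample_per_user(events, max_num_interactions, seed=0):
    """Randomly keep at most `max_num_interactions` events per user."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_user = defaultdict(list)
    for user, item in events:
        by_user[user].append(item)
    sampled = []
    for user, items in by_user.items():
        if len(items) > max_num_interactions:
            items = rng.sample(items, max_num_interactions)
        sampled.extend((user, item) for item in items)
    return sampled


# Toy data: "hit" is the top seller that everybody has viewed.
events = [
    ("u1", "hit"), ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "hit"), ("u2", "a"),
    ("u3", "hit"),
    ("u4", "hit"),
]
pruned = drop_top_items(events, top_n=1)
sampled = downsample_per_user(pruned, max_num_interactions=2)
```

Lowering the cap from 500 to 100 interactions per user, for instance, cuts the worst-case pair count per user from `pairs_per_user(500)` to `pairs_per_user(100)`, which is roughly a 25x reduction.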

Best,
Sebastian


On 23.06.2016 15:47, jelmer wrote:
> Hi,
> 
> I am trying to build a simple recommendation engine using spark item
> similarity (eg with
> org.apache.mahout.math.cf.SimilarityAnalysis.cooccurrencesIDSs)
> 
> Things work fine on comparatively small dataset but I am having difficulty
> scaling it up
> 
> The input I am using is CSV data containing 19.988.422 view item events
> produced by 1.384.107 users. Looking at 5.135.845 distinct products
> 
> The csv data is stored on hdfs and is split up over 15 files, consequently
> the resultant RDD will have 15 partitions.
> 
> After tweaking some parameters I did manage to get the job to run without
> going out of memory but the job takes a very very long time to run
> 
> After running for 15 hours it still is stuck on
> 
> org.apache.spark.rdd.RDD.flatMap(RDD.scala:332)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a_nongraph_mmul(AtA.scala:254)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:61)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:325)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:339)
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:123)
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:41)
> org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:95)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:145)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:143)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> scala.collection.AbstractIterator.to(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
> scala.collection.AbstractIterator.toList(Iterator.scala:1157)
> 
> 
> I am using spark on yarn and containers cannot use more than 16gb
> 
> I figured I would be able to speed things up by throwing a larger number of
> executors at the problem, but so far that is not working out very well
> 
> I tried assigning 500 executors and repartitioning the input data to 500
> partitions and even changing the spark.yarn.driver.memoryOverhead to crazy
> values (half of the heap) did not resolve this.
> 
> Could someone offer any guidance on how to best speed up item similarity
> jobs ?
> 



Re: New logo

2017-05-01 Thread Trevor Grant
Thanks Scott,

You are correct; in fact we're going even further now, in that you can do
native optimization regardless of the architecture with native solvers.

Do you or anyone more familiar with the history of the website know
anything about the origins/uses of this:
https://mahout.apache.org/images/Mahout-logo-245x300.png
It seems to be a green mahout logo.

Also Scott, or anyone lurking who may be able to help.  As part of the
website reboot I've included a "history" page and would really appreciate
some help capturing that from first person sources if possible. I've put in
some headers but those are only directional:

https://github.com/rawkintrevo/mahout/blob/website/website/front/community/history.md



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, May 1, 2017 at 11:18 AM, scott cote  wrote:

> Trevor et al:
>
> Some ideas to spur you on (and related points):
>
> Mahout is no longer a grab bag of algorithms and routines, but a math
> language right?  You don’t care about the under the cover implementation.
> Today it's Spark with alternative implementations in Flink, etc ….
>
> Don’t know if that is the long term goal still  - haven’t kept up - but it
> seems like you are insulating yourself from the underlying technology.
>
> Math is a universal language.  Right?
>
> Tower of Babel is coming to mind ….
>
> SCott
>
> > On Apr 27, 2017, at 10:27 PM, Trevor Grant 
> wrote:
> >
> > It also bugs me when I can't suggest any alternatives, yet don't like the
> > ones in front of me...
> >
> > I became aware of a symbol a week or so ago, and it keeps coming back to
> > me.
> >
> > The Enso.
> > https://en.wikipedia.org/wiki/Ens%C5%8D
> >
> > Things I like about it:
> > (all from wikipedia, since the only thing I knew about this symbol prior
> is
> > that someone I met had a tattoo of it).
> > It represents (among a few other things) enlightenment.
> > ^^ This resonated with the 'alternate definition of mahout' from Hebrew-
> > which may be something akin to essence or truth.
> >
> > It is a circle- which plays to the Samsara theme.
> >
> > It is very expressive, a simple one or two brush stroke circle which
> > symbolizes several large concepts and things about the creator,
> expressive
> > like our DSL (I feel gross comparing such a symbol to a Scala DSL, but
> I'm
> > spit balling here, please forgive me- I am not so expressive).
> >
> > "Once the *ensō* is drawn, one does not change it. It evidences the
> > character of its creator and the context of its creation in a brief,
> > contiguous period of time." Which reminds me of the DRMs
> >
> > In closed form it represents something akin to Plato's perfection- which
> a
> > little more wiki surfing tells me is the idea that no one can create a
> > perfect circle because a circle is a collection of infinite points and
> how
> > could one ever be sure that you have arranged each one properly, yet such
> > things must exist, or what blueprint would a creator of circles be
> striving
> > for.  This, by-the-by reminds me of stochastic approaches to solving
> > problems, and really statistics / "machine-learning" in general, in that
> we
> > can't find perfect solutions, yet we believe solutions exist and serve as
> > our blueprint.
> >
> > Finally, I like that it is simple.
> >
> > Things I don't like about it:
> > Lucent Technologies used it back in the 90s, however they used a very
> > specific red one, and this isn't a deal breaker for me.
> >
> > Other thoughts:
> > Based on the tattoo I saw- one could make an Enso using old mahout color
> > palette if one were to dab their brush in the appropriate colors. This
> > could also be represented in any single color. (Not sure what that does
> to
> > our TM, is it ok if we just keep slapping TMs on the side of it? If that
> is
> > the case is there any reason we must have a single Enso?)
> >
> > So there is something to throw in the pot that is a little more grown up
> > than my runner up favorites (honey badger, blueman riding bomb waving
> > cowboy hat, blueman riding lightning bolt into a squirrel covered in
> water,
> > etc).
> >
> > Again, only know what wiki has told me, so if anyone is more familiar
> with
> > this symbol (like was it used as a logo by some horrible dictator which
> > carried out terrible atrocities?) or just general comments.
> > tg
> >
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
> >
> > On Thu, Apr 27, 2017 at 5:50 PM, Ted Dunning 
> wrote:
> >
> >> I don't have any constructive input at all. None of the proposals showed
> >> any spark (to me).
> >>
> >> I 

Re: New logo

2017-05-01 Thread scott cote
Trevor et al:

Some ideas to spur you on (and related points):

Mahout is no longer a grab bag of algorithms and routines, but a math language 
right?  You don’t care about the under the cover implementation.  Today it's
Spark with alternative implementations in Flink, etc ….

Don’t know if that is the long term goal still  - haven’t kept up - but it 
seems like you are insulating yourself from the underlying technology.

Math is a universal language.  Right?

Tower of Babel is coming to mind ….

SCott

> On Apr 27, 2017, at 10:27 PM, Trevor Grant  wrote:
> 
> It also bugs me when I can't suggest any alternatives, yet don't like the
> ones in front of me...
> 
> I became aware of a symbol a week or so ago, and it keeps coming back to
> me.
> 
> The Enso.
> https://en.wikipedia.org/wiki/Ens%C5%8D
> 
> Things I like about it:
> (all from wikipedia, since the only thing I knew about this symbol prior is
> that someone I met had a tattoo of it).
> It represents (among a few other things) enlightenment.
> ^^ This resonated with the 'alternate definition of mahout' from Hebrew-
> which may be something akin to essence or truth.
> 
> It is a circle- which plays to the Samsara theme.
> 
> It is very expressive, a simple one or two brush stroke circle which
> symbolizes several large concepts and things about the creator, expressive
> like our DSL (I feel gross comparing such a symbol to a Scala DSL, but I'm
> spit balling here, please forgive me- I am not so expressive).
> 
> "Once the *ensō* is drawn, one does not change it. It evidences the
> character of its creator and the context of its creation in a brief,
> contiguous period of time." Which reminds me of the DRMs
> 
> In closed form it represents something akin to Plato's perfection- which a
> little more wiki surfing tells me is the idea that no one can create a
> perfect circle because a circle is a collection of infinite points and how
> could one ever be sure that you have arranged each one properly, yet such
> things must exist, or what blueprint would a creator of circles be striving
> for.  This, by-the-by reminds me of stochastic approaches to solving
> problems, and really statistics / "machine-learning" in general, in that we
> can't find perfect solutions, yet we believe solutions exist and serve as
> our blueprint.
> 
> Finally, I like that it is simple.
> 
> Things I don't like about it:
> Lucent Technologies used it back in the 90s, however they used a very
> specific red one, and this isn't a deal breaker for me.
> 
> Other thoughts:
> Based on the tattoo I saw- one could make an Enso using old mahout color
> palette if one were to dab their brush in the appropriate colors. This
> could also be represented in any single color. (Not sure what that does to
> our TM, is it ok if we just keep slapping TMs on the side of it? If that is
> the case is there any reason we must have a single Enso?)
> 
> So there is something to throw in the pot that is a little more grown up
> than my runner up favorites (honey badger, blueman riding bomb waving
> cowboy hat, blueman riding lightning bolt into a squirrel covered in water,
> etc).
> 
> Again, only know what wiki has told me, so if anyone is more familiar with
> this symbol (like was it used as a logo by some horrible dictator which
> carried out terrible atrocities?) or just general comments.
> tg
> 
> 
> 
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
> 
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> 
> 
> On Thu, Apr 27, 2017 at 5:50 PM, Ted Dunning  wrote:
> 
>> I don't have any constructive input at all. None of the proposals showed
>> any spark (to me).
>> 
>> I hate it when I can't suggest a better path and I hate negative feedback.
>> But there it is.
>> 
>> 
>> 
>> On Thu, Apr 27, 2017 at 3:48 PM, Pat Ferrel  wrote:
>> 
>>> Do you have constructive input (guidance or opinion is welcome input) or
>>> would you like to discontinue the contest? If the latter, -1 now.
>>> 
>>> 
>>> On Apr 27, 2017, at 3:42 PM, Ted Dunning  wrote:
>>> 
>>> I thought that none of the proposals were worth continuing with.
>>> 
>>> 
>>> 
>>> On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel 
>> wrote:
>>> 
> >>>> Yes, -1 means you hate them all or think the designers are not worth
> >>>> paying. We have to pay to continue, I’ll foot the bill (donations
> >>>> appreciated) but don’t want to unless people think it will lead to
> >>>> something. For me there are a couple I wouldn’t mind seeing on the web
> >>>> site or swag and yes we do have time to try something completely different,
> >>>> and the designers will be more willing since there is a guaranteed payout.
> >>>>
> >>>>
> >>>> On Apr 27, 2017, at 3:30 PM, Andrew Musselman <andrew.mussel...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> I thought