Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Tiramisu Ling
Hi Dmitriy,

Thank you for your reply! I'm a postgraduate student in computer science
and my research area is deep learning. My research focuses on using DBNs
for link prediction (between network nodes), which is the main reason I
want to get involved in Mahout and contribute to it. Most of my programming
experience is in Python and Matlab and, honestly, I have only basic Java
programming skills. But I believe I can learn more Java by reading the
Mahout codebase, trust me ;).

Best Regards,
MikeLing



Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
PS: another way to approach it, which in fact seems to be the most common
motivator here, is to start with a practical problem one already has at
hand. Abstract tinkering rarely produces strategically useful
contributions, it seems.

On Wed, Sep 21, 2016 at 3:09 PM, Dmitriy Lyubimov  wrote:

> If you can tell us a little about your background, perhaps we could come
> up with some ideas. Frankly, we have a pretty sprawling roadmap, or at
> least a set of ideas. It's more than we can realistically do, so yes, we
> can use help.


Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
If you can tell us a little about your background, perhaps we could come
up with some ideas. Frankly, we have a pretty sprawling roadmap, or at
least a set of ideas. It's more than we can realistically do, so yes, we
can use help.

On Sat, Sep 17, 2016 at 8:52 AM, Tiramisu Ling  wrote:

> Hey everyone, I'm new to Mahout and I would like to contribute to it. I
> have read the how-to-contribute page [1] and cloned the repo from GitHub.
> So what should I do next? Are there any issues like a 'good first bug' to
> work on? Thank you very much! :)
>
> [1] http://mahout.apache.org/developers/how-to-contribute.html
>
> Best Regards,
> MikeLing
>


Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
There's a great post on that on the RichRelevance engineering blog [1]...
But I have a vague feeling, based on what you are saying, that it may all
be old news to you...

[1] http://engineering.richrelevance.com/bandits-recommendation-systems/
and there's more in the series.

On Sat, Sep 17, 2016 at 3:10 PM, Pat Ferrel  wrote:

> I’ve been thinking about how one would implement an application that only
> shows recommendations. This is partly because people want to build such
> things.
>
> There are many problems with this, including cold start and overfit.
> However, these problems also face MABs and are solved with sampling
> schemes. So imagine that you have several models from which to draw
> recommendations: 1) a CF-based recommender, 2) random recommendations,
> 3) popular recs (by some measure). We can look at each individual as
> facing an MAB, with a sampling algo trained by them to pull recs from the
> 3 (or more) arms. This implies an MAB per user.
>
> The very first visit to the application would randomly draw from the
> choices. Since there is no user data yet, the recs engine would have to be
> able to respond anyway (perhaps with random recs); the same would have to
> be true of the popular model (returning random), and random is always
> happy. The problem with this is that none of the arms are completely
> independent and the model driving each arm will change over time.
>
> The first time a user visits will result in a new MAB for them and will
> randomly draw from all arms but may get better responses from popular (with
> no user specific data yet in the system for cf). So the sampling will start
> to favor popular but will still explore other methods. When enough data is
> accumulated to start making good recs, the recommender will start to
> outperform popular and will get more of the user’s reinforcement.
>
> This seems to work with several unanswered questions and one problem to
> avoid—overfit. We would need a sampling method that would never fully
> converge or the user would never get a chance to show their
> expanding/changing preferences. The cf recommender will also overfit if
> non-cf items are not mixed in. Of the sampling methods I’ve seen for MABs,
> Greedy will not work, but even with some form of Bayesian/Thompson
> sampling the question is how to parameterize the sampling. With too little
> convergence we get sub-optimal exploitation, but we get the same with too
> much convergence, which will also overfit the cf recs.
>
> I imagine we could train a meta-model on the mature explore amount by
> trying different parameterization and finding if there is one answer for
> all or we could resort to heuristic rules—even business rules.
>
> If anyone has read this far, any ideas or comments?
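
As a rough illustration of the per-user MAB idea quoted above, here is a
minimal Python sketch of Thompson sampling over the three example arms
(cf, popular, random), with one bandit per user. It is only a sketch under
stated assumptions, not anything from Mahout or from the post itself: the
reward is assumed to be binary (the user did or did not respond to a rec),
and the class name, the `discount` parameter and its default value are
hypothetical. The discount decays old evidence on every update, which is
one possible way to keep the sampler from ever fully converging.

```python
import random
from collections import defaultdict

# The three example models named in the post; labels are illustrative.
ARMS = ("cf", "popular", "random")


class DiscountedThompsonBandit:
    """Beta-Bernoulli Thompson sampling with exponential forgetting.

    Assumption: rewards are binary (responded / did not respond).
    """

    def __init__(self, arms=ARMS, discount=0.95, rng=None):
        self.arms = tuple(arms)
        # Hypothetical knob: 1.0 means no forgetting (classic Thompson
        # sampling); smaller values forget faster and explore more.
        self.discount = discount
        self.rng = rng or random.Random()
        self.successes = {a: 0.0 for a in self.arms}
        self.failures = {a: 0.0 for a in self.arms}

    def choose(self):
        # Sample each arm's Beta(1 + successes, 1 + failures) posterior
        # and play the arm with the largest sampled value.
        draws = {
            a: self.rng.betavariate(1.0 + self.successes[a],
                                    1.0 + self.failures[a])
            for a in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Decay every count first, so old evidence fades and the total
        # evidence stays below 1 / (1 - discount).
        for a in self.arms:
            self.successes[a] *= self.discount
            self.failures[a] *= self.discount
        if reward:
            self.successes[arm] += 1.0
        else:
            self.failures[arm] += 1.0

    def evidence(self, arm):
        # Effective number of (decayed) observations for this arm.
        return self.successes[arm] + self.failures[arm]


# One bandit per user, created lazily on the user's first visit.
bandits = defaultdict(DiscountedThompsonBandit)

if __name__ == "__main__":
    user_bandit = bandits["user-42"]      # hypothetical user id
    arm = user_bandit.choose()            # which model serves this visit
    user_bandit.update(arm, reward=True)  # the user responded to the rec
```

Because the total decayed evidence is capped, the posteriors never become
arbitrarily sharp, so every arm keeps some chance of being played. How to
choose the discount is exactly the open parameterization question raised
in the post; the sketch does not answer it.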


Re: Machine Learning algorithm implementation

2016-09-21 Thread Dmitriy Lyubimov
We now think primarily in a platform-independent, R-like way.
http://mahout.apache.org/users/sparkbindings/home.html

We hope this is good news for algebraic algorithm implementers like you.

As it stands, Samsara is mapped onto Spark, Flink and H2O (no MapReduce,
you are correct about that).

We recognize that the existing set of optimized algebra operators may not
always be enough, so we expect that some parts of an algorithm may have to
be written for a particular backend. We usually hope these are abstracted
enough that the non-Samsara parts can then be ported to other backends if
need be.

-d


On Tue, Sep 20, 2016 at 8:05 PM, María José Basgall 
wrote:

> Hi all,
> I am a doctoral student in Computer Science and we are developing a
> Self-Organizing Map (SOM) algorithm on MapReduce. I would like to know
> which ML algorithm implementations are missing, because we want to
> contribute to this project.
> We checked this page:
> https://mahout.apache.org/users/basics/algorithms.html
> and found that most of the algorithms in the MapReduce column are
> deprecated. What is the reason for that? Do we need to think in terms of
> Spark instead of MapReduce implementations?
>
> Thanks,
> MJ
>