Sorry to interrupt, everyone, but I wanted to let you know that
I am also interested in contributing to this idea. I am planning to
participate in the ASF-ICFOSS mentorship
programme<https://cwiki.apache.org/confluence/display/COMDEV/ASF-ICFOSS+Pilot+Mentoring+Programme>
(which is very similar to GSoC).

I have a strong grounding in machine learning concepts (I have completed the
ML course by Andrew Ng on Coursera), and I am a good programmer (2.5 years of
work experience). I am not yet sure how to approach this problem, but I have
a strong interest in working on it, and hence would like to pair up on this.
I am currently working as a research intern at the Indian Institute of
Science (IISc), Bangalore, India, and can put in 15-20 hours per week.

Please let me know your thoughts on whether I can be a part of this.

Thanks & Regards,
Abhishek Sharma
http://www.linkedin.com/in/abhi21
https://github.com/abhi21


On Wed, Jul 17, 2013 at 3:11 AM, Gokhan Capan <[email protected]> wrote:

> Peng,
>
> This is the reason I separated out the DataModel, and only put the learner
> stuff there. The learner I mentioned yesterday just stores the
> parameters, (noOfUsers+noOfItems)*noOfLatentFactors, and does not care
> where preferences are stored.
>
> I kind of agree with the multi-level DataModel approach:
> one for iterating over "all" preferences, and one for when someone wants to
> deploy a recommender and perform a lot of top-N recommendation tasks.
>
> (Or one DataModel with a strategy that might reduce the existing memory
> consumption while still providing fast access; I am not sure. Let me try a
> matrix-backed DataModel approach.)
>
> Gokhan
>
>
> On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <[email protected]> wrote:
>
> > I completely agree. Netflix is less than one gigabyte in a smart
> > representation; 12x more memory is a no-go. The techniques used in
> > FactorizablePreferences allow a much more memory-efficient representation,
> > tested on the KDD Music dataset, which is approx. 2.5 times Netflix and
> > fits into 3GB with that approach.
> >
> >
> > 2013/7/16 Ted Dunning <[email protected]>
> >
> > > Netflix is a small dataset.  12G for that seems quite excessive.
> > >
> > > Note also that this is before you have done any work.
> > >
> > > Ideally, 100 million observations should take << 1GB.
> > >
> > > On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]> wrote:
> > >
> > > > The second idea is indeed splendid: we should separate the
> > > > time-complexity-first and space-complexity-first implementations. What
> > > > I'm not quite sure about is whether we really need to create two
> > > > interfaces instead of one. Personally, I think 12G of heap space is
> > > > not that high, right? Most new laptops can already handle that
> > > > (emphasis on laptop). And if we replace the hash map (the culprit of
> > > > the high memory consumption) with a list/linkedList, it would simply
> > > > degrade the time complexity of a lookup to a linear search, O(n),
> > > > which is not too bad either. The current DataModel is the result of
> > > > careful thought and has undergone extensive testing; it is easier to
> > > > expand on top of it than to subvert it.
> > >
> >
>
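
For reference, here is a rough back-of-the-envelope sketch of the memory
figures discussed in the quoted thread. It is not Mahout code. The dataset
counts are the published Netflix Prize sizes (they are not stated in the
thread), and the byte widths, the 20 latent factors, and both function names
are assumptions made only for illustration.

```python
# Rough memory estimates; all sizes below are illustrative assumptions.

# Published Netflix Prize dataset counts (not taken from the thread).
NUM_USERS = 480_189
NUM_ITEMS = 17_770
NUM_RATINGS = 100_480_507

GIB = 1024 ** 3


def factor_parameters(num_users, num_items, num_factors):
    """Number of parameters a factorization learner stores:
    (noOfUsers + noOfItems) * noOfLatentFactors."""
    return (num_users + num_items) * num_factors


def compact_preference_bytes(num_users, num_ratings,
                             item_id_bytes=4, rating_bytes=1):
    """Bytes for a hypothetical row-compressed layout: one 8-byte offset
    per user (plus one end offset), then an item id and a rating per
    observation. Ratings of 1..5 are assumed to fit in a single byte."""
    offsets = (num_users + 1) * 8
    return offsets + num_ratings * (item_id_bytes + rating_bytes)


if __name__ == "__main__":
    params = factor_parameters(NUM_USERS, NUM_ITEMS, num_factors=20)
    param_bytes = params * 8  # assuming 8-byte doubles
    pref_bytes = compact_preference_bytes(NUM_USERS, NUM_RATINGS)
    print(f"learner parameters: {params} ({param_bytes / GIB:.3f} GiB)")
    print(f"compact preferences: {pref_bytes / GIB:.3f} GiB")
```

Under these assumptions the preferences stay well under 1 GiB, and the
learner parameters are small by comparison.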



-- 
Abhishek Sharma
ThoughtWorks
