Awesome! Your reinforcements are highly appreciated.

On 13-07-17 01:29 AM, Abhishek Sharma wrote:
Sorry to interrupt, guys, but I just wanted to bring to your notice that
I am also interested in contributing to this idea. I am planning to
participate in the ASF-ICFOSS mentorship programme
(https://cwiki.apache.org/confluence/display/COMDEV/ASF-ICFOSS+Pilot+Mentoring+Programme),
which is very similar to GSoC.

I have strong concepts in machine learning (I have done the ML course by
Andrew Ng on Coursera), and I am good at programming (2.5 years of work
experience). I am not really sure how I can approach this problem (but I
do have a strong interest in working on it), hence I would like to pair
up on this. I am currently working as a research intern at the Indian
Institute of Science (IISc), Bangalore, India, and can put in 15-20 hours
per week.

Please let me know your thoughts on whether I can be a part of this.

Thanks & Regards,
Abhishek Sharma
http://www.linkedin.com/in/abhi21
https://github.com/abhi21


On Wed, Jul 17, 2013 at 3:11 AM, Gokhan Capan <[email protected]> wrote:

Peng,

This is the reason I separated out the DataModel and only put the learner
stuff there. The learner I mentioned yesterday just stores the parameters,
(noOfUsers + noOfItems) * noOfLatentFactors of them, and does not care
where the preferences are stored.
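To illustrate (a minimal sketch with hypothetical names, not the actual
Mahout classes): such a learner's entire state is just the two factor
matrices, and prediction is a dot product over the latent factors.

    // Hypothetical sketch; class and field names are mine, not Mahout API.
    public class FactorizationLearner {

      private final double[][] userFactors;  // noOfUsers x noOfLatentFactors
      private final double[][] itemFactors;  // noOfItems x noOfLatentFactors

      public FactorizationLearner(int noOfUsers, int noOfItems, int noOfLatentFactors) {
        this.userFactors = new double[noOfUsers][noOfLatentFactors];
        this.itemFactors = new double[noOfItems][noOfLatentFactors];
      }

      /** Predicted preference is the dot product of the two factor vectors. */
      public double estimatePreference(int userIndex, int itemIndex) {
        double[] u = userFactors[userIndex];
        double[] v = itemFactors[itemIndex];
        double sum = 0.0;
        for (int f = 0; f < u.length; f++) {
          sum += u[f] * v[f];
        }
        return sum;
      }
    }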

I kind of agree with the multi-level DataModel approach:
one for iterating over "all" preferences, and one for when you want to
deploy a recommender and perform a lot of top-N recommendation tasks.
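For concreteness, a sketch of those two levels (interface names are mine,
purely illustrative; Preference is Mahout's existing
org.apache.mahout.cf.taste.model.Preference):

    import org.apache.mahout.cf.taste.model.Preference;

    interface SequentialAccessPreferences {
      /** One sequential pass over every (user, item, value) triple, e.g. for training. */
      Iterable<Preference> allPreferences();
    }

    interface RandomAccessPreferences {
      /** Fast single-preference lookup, as needed when scoring candidates for top-N. */
      float getPreference(long userID, long itemID);
      /** All items a user has expressed a preference for. */
      long[] itemIDsOfUser(long userID);
    }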

(Or one DataModel with a strategy that might reduce the existing memory
consumption while still providing fast access; I am not sure. Let me try a
matrix-backed DataModel approach.)

Gokhan


On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <[email protected]>
wrote:

I completely agree. Netflix is less than one gigabyte in a smart
representation; 12x more memory is a no-go. The techniques used in
FactorizablePreferences allow a much more memory-efficient representation,
tested on the KDD Music dataset, which is approx 2.5 times Netflix and fits
into 3GB with that approach.
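As a rough illustration of this kind of compact layout (my own sketch
under assumed structure, not the actual FactorizablePreferences code):
sorting preferences by user and keeping them in parallel primitive arrays
avoids per-entry object and hash-bucket overhead.

    // CSR-like layout, assumed for illustration: preferences sorted by
    // user and stored in parallel primitive arrays.
    public class CompactPreferences {

      private final int[] userOffsets;  // start of user i's block; length numUsers + 1
      private final int[] itemIndices;  // item index of each preference, grouped by user
      private final float[] values;     // preference values, aligned with itemIndices

      public CompactPreferences(int[] userOffsets, int[] itemIndices, float[] values) {
        this.userOffsets = userOffsets;
        this.itemIndices = itemIndices;
        this.values = values;
      }

      /** Sequential scan over one user's preferences, no boxing, no hash lookups. */
      public void forEachPreferenceOfUser(int user, PreferenceVisitor visitor) {
        for (int p = userOffsets[user]; p < userOffsets[user + 1]; p++) {
          visitor.visit(itemIndices[p], values[p]);
        }
      }

      public interface PreferenceVisitor {
        void visit(int itemIndex, float value);
      }
    }

At roughly 8 bytes per preference (an int plus a float), 100 million
Netflix ratings come to about 800 MB plus the offset array, consistent
with the sub-gigabyte figure above.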


2013/7/16 Ted Dunning <[email protected]>

Netflix is a small dataset.  12G for that seems quite excessive.

Note also that this is before you have done any work.

Ideally, 100 million observations should take << 1GB.
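As a rough back-of-the-envelope (all numbers assumed): an observation
stored as a 4-byte item index plus a 1-byte quantized rating, grouped by
user, is 5 bytes, so 100 million observations come to about 500 MB;
delta-encoding the item indices within each user's block would push that
well under the gigabyte mark.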

On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]>
wrote:
The second idea is indeed splendid; we should separate time-complexity-first
and space-complexity-first implementations. What I'm not quite sure of is
whether we really need to create two interfaces instead of one. Personally,
I think 12G of heap space is not that high, right? Most new laptops can
already handle that (emphasis on laptop). And if we replace the hash map
(the culprit of the high memory consumption) with a list/linked list, it
would simply degrade the time complexity of a lookup to O(n) for a linear
search, which is not too bad either. The current DataModel is the result of
careful thought and has undergone extensive testing; it is easier to expand
on top of it than to subvert it.
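For concreteness, a sketch of that list-backed alternative (my own
illustration, assumed layout, not the current Mahout code): lookup degrades
to a linear O(n) scan, but the per-entry footprint drops to two primitive
slots with no hash-bucket or boxing overhead.

    import java.util.Arrays;

    // One user's preferences in growable parallel primitive arrays
    // instead of a hash map keyed by item ID. Illustration only.
    public class ListBackedUserPreferences {

      private long[] itemIDs = new long[8];
      private float[] values = new float[8];
      private int size = 0;

      public void add(long itemID, float value) {
        if (size == itemIDs.length) {  // grow both arrays in step
          itemIDs = Arrays.copyOf(itemIDs, size * 2);
          values = Arrays.copyOf(values, size * 2);
        }
        itemIDs[size] = itemID;
        values[size] = value;
        size++;
      }

      /** O(n) linear scan; a hash map would be O(1) but at a much higher memory cost. */
      public float getPreference(long itemID) {
        for (int i = 0; i < size; i++) {
          if (itemIDs[i] == itemID) {
            return values[i];
          }
        }
        return Float.NaN; // no preference recorded
      }
    }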



