Hi Abhishek,

Great to hear that you're willing to put some work into this! Have you
ever worked with Mahout's recommenders before? If not, then a good first
step would be to get familiar with them and code up a few examples.
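For instance, the core idea behind a user-based recommender can be sketched in a few lines of plain Java. This is only a toy illustration of the technique (hypothetical names, not Mahout's actual API): it scores an unseen item for a user by a similarity-weighted average of other users' ratings.

```java
// Toy user-based collaborative filtering (illustration only, not Mahout code).
public class ToyUserCF {

    // Cosine similarity between two rating vectors (0 means "unrated").
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Predicted rating of `item` for `user`: similarity-weighted average
    // over all other users who rated that item.
    static double predict(double[][] ratings, int user, int item) {
        double num = 0, den = 0;
        for (int u = 0; u < ratings.length; u++) {
            if (u == user || ratings[u][item] == 0) continue;
            double sim = cosine(ratings[user], ratings[u]);
            num += sim * ratings[u][item];
            den += Math.abs(sim);
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 3, 0},   // user 0 has not rated item 2 yet
            {5, 3, 4},
            {1, 1, 5},
        };
        // Similarity-weighted estimate for user 0, item 2.
        System.out.println(predict(ratings, 0, 2));
    }
}
```

Mahout's real recommenders factor these pieces apart (DataModel, UserSimilarity, UserNeighborhood, Recommender), but the underlying computation is this simple.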

Best,
Sebastian

On 17.07.2013 07:29, Abhishek Sharma wrote:
> Sorry to interrupt, guys, but I just wanted to bring it to your notice
> that I am also interested in contributing to this idea. I am planning to
> participate in the ASF-ICFOSS mentorship programme
> <https://cwiki.apache.org/confluence/display/COMDEV/ASF-ICFOSS+Pilot+Mentoring+Programme>
> (this is very similar to GSoC).
> 
> I do have a strong grounding in machine learning (I have done Andrew
> Ng's ML course on Coursera), and I am good at programming (I have 2.5
> years of work experience). I am not really sure how I can approach this
> problem (but I do have a strong interest in working on it), hence I
> would like to pair up on this. I am currently working as a research
> intern at the Indian Institute of Science (IISc), Bangalore, India, and
> can put in 15-20 hrs per week.
> 
> Please let me know your thoughts on whether I can be a part of this.
> 
> Thanks & Regards,
> Abhishek Sharma
> http://www.linkedin.com/in/abhi21
> https://github.com/abhi21
> 
> 
> On Wed, Jul 17, 2013 at 3:11 AM, Gokhan Capan <[email protected]> wrote:
> 
>> Peng,
>>
>> This is the reason I separated out the DataModel and put only the
>> learner stuff there. The learner I mentioned yesterday just stores the
>> (noOfUsers+noOfItems)*noOfLatentFactors parameters, and does not care
>> where the preferences are stored.
>>
>> I kind of agree with the multi-level DataModel approach:
>> one for iterating over "all" preferences, and one for when you want to
>> deploy a recommender and perform a lot of top-N recommendation tasks.
>>
>> (Or a single DataModel with a strategy that might reduce the existing
>> memory consumption while still providing fast access; I am not sure.
>> Let me try a matrix-backed DataModel approach.)
>>
>> Gokhan
>>
>>
>> On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <[email protected]>
>> wrote:
>>
>>> I completely agree: Netflix is less than one gigabyte in a smart
>>> representation, so 12x more memory is a no-go. The techniques used in
>>> FactorizablePreferences allow a much more memory-efficient
>>> representation, tested on the KDD Music dataset, which is approx. 2.5
>>> times the size of Netflix and fits into 3GB with that approach.
>>>
>>>
>>> 2013/7/16 Ted Dunning <[email protected]>
>>>
>>>> Netflix is a small dataset.  12G for that seems quite excessive.
>>>>
>>>> Note also that this is before you have done any work.
>>>>
>>>> Ideally, 100million observations should take << 1GB.
>>>>
>>>> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]>
>>> wrote:
>>>>
>>>>> The second idea is indeed splendid: we should separate the
>>>>> time-complexity-first and space-complexity-first implementations.
>>>>> What I'm not quite sure about is whether we really need to create
>>>>> two interfaces instead of one. Personally, I think 12G of heap space
>>>>> is not that high, right? Most new laptops can already handle that
>>>>> (emphasis on laptop). And if we replaced the hash map (the culprit
>>>>> of the high memory consumption) with a list/linked list, lookups
>>>>> would simply degrade to a linear search, O(n), which is not too bad
>>>>> either. The current DataModel is the result of careful thought and
>>>>> has undergone extensive testing; it is easier to expand on top of it
>>>>> than to subvert it.
>>>>
>>>
>>
> 
> 
> 
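To make the memory discussion above concrete, here is a minimal sketch (hypothetical names, not Mahout's actual FactorizablePreferences) of a preference store built on parallel primitive arrays: item ids and values grouped per user with an offset index, instead of one boxed hash-map entry per preference. It also shows the back-of-envelope arithmetic behind "Netflix in under a gigabyte", and note that lookups inside a user's slice can use binary search, so the trade-off is O(log n) per user rather than the O(n) linear scan mentioned for lists.

```java
import java.util.Arrays;

// Hypothetical compact preference store (illustration, not Mahout code):
// CSR-style layout with one offset per user and two parallel arrays.
public class CompactPreferences {
    private final int[] userOffsets; // length numUsers+1; user u's slice is
                                     // [userOffsets[u], userOffsets[u+1])
    private final int[] itemIds;     // must be sorted within each user's slice
    private final float[] values;

    CompactPreferences(int[] userOffsets, int[] itemIds, float[] values) {
        this.userOffsets = userOffsets;
        this.itemIds = itemIds;
        this.values = values;
    }

    // O(log n_u) lookup via binary search inside the user's slice;
    // NaN means "no preference recorded".
    float getPreference(int user, int item) {
        int pos = Arrays.binarySearch(itemIds, userOffsets[user], userOffsets[user + 1], item);
        return pos >= 0 ? values[pos] : Float.NaN;
    }

    // Rough heap footprint: 4 bytes per item id + 4 per value + 4 per offset.
    static long approxBytes(long numPrefs, long numUsers) {
        return numPrefs * 8L + (numUsers + 1) * 4L;
    }

    public static void main(String[] args) {
        // Netflix-scale back-of-envelope: ~100M ratings, ~480K users
        // comes out at roughly 0.8 GB, consistent with "less than one
        // gigabyte in a smart representation".
        System.out.println(approxBytes(100_000_000L, 480_000L) + " bytes");
    }
}
```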
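A minimal sketch of the learner-side state Gokhan describes above: the model holds only the (noOfUsers + noOfItems) * noOfLatentFactors parameters and does not care where the preferences themselves live (hypothetical names, not Mahout's API).

```java
// Hypothetical factorization model state (illustration, not Mahout code):
// just two factor matrices; preference storage is someone else's problem.
public class FactorModel {
    final float[][] userFactors; // noOfUsers x k
    final float[][] itemFactors; // noOfItems x k

    FactorModel(int noOfUsers, int noOfItems, int k) {
        userFactors = new float[noOfUsers][k];
        itemFactors = new float[noOfItems][k];
    }

    long parameterCount() {
        return (long) (userFactors.length + itemFactors.length) * userFactors[0].length;
    }

    // Estimated preference = dot product of the two factor vectors.
    float estimate(int user, int item) {
        float sum = 0f;
        for (int f = 0; f < userFactors[user].length; f++) {
            sum += userFactors[user][f] * itemFactors[item][f];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Netflix-ish dimensions with an assumed k=20: about 10M floats,
        // i.e. roughly 40MB -- tiny next to the raw preference data.
        FactorModel m = new FactorModel(480_000, 17_770, 20);
        System.out.println(m.parameterCount() + " parameters");
    }
}
```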
