Thanks for the encouragement, guys, and my apologies for deviating from the
discussion again.

As suggested by Sebastian, I looked into Mahout's recommenders and
implemented a user-based recommender as well. IMHO, I now have a fair
understanding of the recommenders.
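For readers less familiar with the term, the core of a user-based recommender fits in a few lines of plain Java. This is a minimal, hypothetical sketch and not Mahout's API (Mahout assembles the same idea from classes such as GenericUserBasedRecommender). It assumes ratings held in a dense double[][] where 0 means "not rated", cosine similarity over co-rated items, and a prediction equal to the similarity-weighted average of every positively similar user's rating (no neighbourhood size limit).

```java
/** Hypothetical sketch of user-based collaborative filtering (not Mahout's API). */
final class UserBasedSketch {

    /** Cosine similarity over the items both users have rated (0 = unrated). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0 || b[i] == 0) continue; // only co-rated items count
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Similarity-weighted average of other users' ratings for one item; NaN if none. */
    static double predict(double[][] ratings, int user, int item) {
        double weighted = 0, totalSim = 0;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user || ratings[other][item] == 0) continue;
            double sim = cosine(ratings[user], ratings[other]);
            if (sim <= 0) continue; // ignore users with no positive similarity
            weighted += sim * ratings[other][item];
            totalSim += sim;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }
}
```

Mahout's real recommenders add neighbourhood selection, other similarity measures and a DataModel abstraction on top of this basic idea.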

In order to be a part of the ASF-ICFOSS mentorship programme
<https://cwiki.apache.org/confluence/display/COMDEV/ASF-ICFOSS+Pilot+Mentoring+Programme>,
I am required to submit a proposal describing my plan for working on this
idea. My understanding, based on the discussion on this thread so far, is
that we are still finalizing an approach and have yet to start working on
the idea.

I need your help with the following:
1) Could you please suggest the sub-features that I could contribute to, so
that I can list them under expected deliverables?
2) Also, I would be really grateful if one of you could be my mentor
<http://community.apache.org/mentoringprogramme.html> as part of this
programme (as a mentee, I would only need some pointers/directions to
proceed further). Please let me know.

I am really excited about working on this feature and look forward to a
positive response.

Thanks & Regards,
Abhishek Sharma
http://www.linkedin.com/in/abhi21
https://github.com/abhi21

On Thu, Jul 18, 2013 at 2:39 AM, Peng Cheng <[email protected]> wrote:

> Awesome! Your reinforcements are highly appreciated.
>
>
> On 13-07-17 01:29 AM, Abhishek Sharma wrote:
>
>> Sorry to interrupt, guys, but I just wanted to bring to your notice that
>> I am also interested in contributing to this idea. I am planning to
>> participate in the ASF-ICFOSS mentorship programme
>> <https://cwiki.apache.org/confluence/display/COMDEV/ASF-ICFOSS+Pilot+Mentoring+Programme>.
>>
>> (This is very similar to GSoC.)
>>
>> I have strong fundamentals in machine learning (I completed Andrew Ng's ML
>> course on Coursera), and I am a good programmer (2.5 years of work
>> experience). I am not really sure how to approach this problem (though I
>> have a strong interest in working on it), so I would like to pair up on
>> this. I am currently working as a research intern at the Indian Institute
>> of Science (IISc), Bangalore, India, and can put in 15-20 hours per week.
>>
>> Please let me know whether I can be a part of this.
>>
>> Thanks & Regards,
>> Abhishek Sharma
>> http://www.linkedin.com/in/abhi21
>> https://github.com/abhi21
>>
>>
>> On Wed, Jul 17, 2013 at 3:11 AM, Gokhan Capan <[email protected]> wrote:
>>
>>> Peng,
>>>
>>> This is the reason I separated out the DataModel and only put the learner
>>> stuff there. The learner I mentioned yesterday just stores the
>>> (noOfUsers + noOfItems) * noOfLatentFactors parameters and does not care
>>> where preferences are stored.
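To make that parameter count concrete: a factorization learner of this kind needs one latent vector per user and one per item, i.e. (noOfUsers + noOfItems) * noOfLatentFactors numbers, regardless of where the preferences live. Below is a minimal sketch, assuming a single flat float[] (user vectors first, then item vectors) and a plain dot-product prediction; the class and method names are hypothetical and not Mahout's.

```java
/** Hypothetical parameter store for a latent factor model (not a Mahout class). */
final class FactorParameters {
    private final int numUsers;
    private final int numFactors;
    // Flat layout: user u occupies [u * k, (u + 1) * k), item i follows all users.
    // Assumes (numUsers + numItems) * numFactors fits in an int.
    private final float[] params;

    FactorParameters(int numUsers, int numItems, int numFactors) {
        this.numUsers = numUsers;
        this.numFactors = numFactors;
        this.params = new float[(numUsers + numItems) * numFactors];
    }

    int parameterCount() {
        return params.length;
    }

    private int userOffset(int user) { return user * numFactors; }

    private int itemOffset(int item) { return (numUsers + item) * numFactors; }

    void setUserFactor(int user, int k, float value) { params[userOffset(user) + k] = value; }

    void setItemFactor(int item, int k, float value) { params[itemOffset(item) + k] = value; }

    /** Predicted preference = dot product of the user and item latent vectors. */
    float predict(int user, int item) {
        int u = userOffset(user), i = itemOffset(item);
        float sum = 0f;
        for (int k = 0; k < numFactors; k++) {
            sum += params[u + k] * params[i + k];
        }
        return sum;
    }
}
```

For scale: with Netflix's 480,189 users and 17,770 movies and an assumed 20 latent factors, this is 9,959,180 floats, roughly 40 MB, which is small next to the preference data itself.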
>>>
>>> I kind of agree with the multi-level DataModel approach: one for
>>> iterating over "all" preferences, and one for when you want to deploy a
>>> recommender and perform a lot of top-N recommendation tasks.
>>>
>>> (Or one DataModel with a strategy that might reduce the existing memory
>>> consumption while still providing fast access; I am not sure. Let me try
>>> a matrix-backed DataModel approach.)
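One way to picture a matrix-backed store that serves both access patterns is a compressed sparse row (CSR) layout: all preferences sit in primitive arrays sorted by user, plus one offset per user. Iterating over "all" preferences is a sequential scan, and fetching one user's preferences for top-N serving is a slice lookup, with no per-entry objects or hash-map overhead. This is a hypothetical sketch, assuming dense int user/item indices and input already sorted by user; it is not the FactorizablePreferences implementation or any existing Mahout DataModel.

```java
/** Hypothetical CSR-style preference store (not a Mahout class). */
final class CsrPreferences {
    /** Callback used for a sequential pass over every preference. */
    interface PreferenceVisitor {
        void visit(int user, int item, float rating);
    }

    private final int[] rowStart;  // user u's entries live at [rowStart[u], rowStart[u + 1])
    private final int[] itemIds;   // item index of each preference
    private final float[] values;  // rating of each preference

    /** users, items and ratings are parallel arrays, sorted by user index. */
    CsrPreferences(int numUsers, int[] users, int[] items, float[] ratings) {
        if (users.length != items.length || users.length != ratings.length) {
            throw new IllegalArgumentException("arrays must be parallel");
        }
        rowStart = new int[numUsers + 1];
        for (int p = 0; p < users.length; p++) {
            if (users[p] < 0 || users[p] >= numUsers || (p > 0 && users[p] < users[p - 1])) {
                throw new IllegalArgumentException("users must be in range and sorted");
            }
            rowStart[users[p] + 1]++;           // count entries per user
        }
        for (int u = 0; u < numUsers; u++) {
            rowStart[u + 1] += rowStart[u];     // prefix sums turn counts into offsets
        }
        itemIds = items.clone();
        values = ratings.clone();
    }

    int numPreferences() { return values.length; }

    /** Number of preferences of one user, in O(1). */
    int numPreferencesOf(int user) { return rowStart[user + 1] - rowStart[user]; }

    /** j-th item and rating of one user: the per-user access top-N serving needs. */
    int itemAt(int user, int j) { return itemIds[rowStart[user] + j]; }

    float ratingAt(int user, int j) { return values[rowStart[user] + j]; }

    /** Sequential pass over "all" preferences, e.g. one training epoch. */
    void forEach(PreferenceVisitor visitor) {
        for (int u = 0; u + 1 < rowStart.length; u++) {
            for (int p = rowStart[u]; p < rowStart[u + 1]; p++) {
                visitor.visit(u, itemIds[p], values[p]);
            }
        }
    }
}
```

The price of this layout is that it is effectively read-only: inserting a preference means rebuilding the arrays, which is acceptable for batch training but not for a store that must absorb live updates.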
>>>
>>> Gokhan
>>>
>>>
>>> On Tue, Jul 16, 2013 at 9:51 PM, Sebastian Schelter <[email protected]> wrote:
>>>
>>>> I completely agree: Netflix is less than one gigabyte in a smart
>>>> representation, so 12x more memory is a no-go. The techniques used in
>>>> FactorizablePreferences allow a much more memory-efficient
>>>> representation; it was tested on the KDD Music dataset, which is
>>>> approximately 2.5 times the size of Netflix and fits into 3GB with that
>>>> approach.
>>>>
>>>>
>>>> 2013/7/16 Ted Dunning <[email protected]>
>>>>
>>>>> Netflix is a small dataset. 12G for that seems quite excessive.
>>>>>
>>>>> Note also that this is before you have done any work.
>>>>>
>>>>> Ideally, 100 million observations should take << 1GB.
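The back-of-the-envelope arithmetic behind these figures can be spelled out. A minimal sketch, assuming preferences packed into primitive arrays: an item index per observation (4 bytes as an int, or 2 bytes as a short), a 1-byte rating (Netflix ratings are integers from 1 to 5), and one int offset per user. The 480,189-user count is the Netflix Prize dataset's; the byte layouts are assumptions, not measurements from this thread.

```java
/** Back-of-the-envelope memory estimates (assumed layouts, not measurements). */
final class MemoryEstimate {
    static final long OBSERVATIONS = 100_000_000L; // "100 million observations"
    static final long USERS = 480_189L;            // Netflix Prize user count

    /** Bytes per observation implied by a given total heap size. */
    static double bytesPerObservation(long totalBytes) {
        return (double) totalBytes / OBSERVATIONS;
    }

    /** Packed layout: item index + 1-byte rating per observation, int offset per user. */
    static long packedBytes(int itemIndexBytes) {
        long perObservation = itemIndexBytes + Byte.BYTES;
        return OBSERVATIONS * perObservation + (USERS + 1) * Integer.BYTES;
    }
}
```

A 12G heap works out to about 129 bytes per observation. The packed layout needs about 5 bytes per observation (about 0.5 GB in total) with int item indices, or about 3 bytes (about 0.3 GB) with short indices, which Netflix's 17,770 movies allow.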
>>>>>
>>>>> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <[email protected]> wrote:
>>>>>
>>>>>> The second idea is indeed splendid; we should separate
>>>>>> time-complexity-first and space-complexity-first implementations. What
>>>>>> I'm not quite sure of is whether we really need to create two
>>>>>> interfaces instead of one. Personally, I think 12G of heap space is not
>>>>>> that high, right? Most new laptops can already handle that (emphasis on
>>>>>> laptop). And if we replace the hash map (the culprit of the high memory
>>>>>> consumption) with a list/linkedList, lookups would simply degrade to a
>>>>>> linear search, O(n), which is not too bad either. The current DataModel
>>>>>> is the result of careful thought and has undergone extensive testing;
>>>>>> it is easier to expand on top of it than to subvert it.


-- 
Abhishek Sharma
ThoughtWorks
