Re: factorization machines as new project

Gokhan Capan Thu, 11 Apr 2013 16:28:36 -0700

So if I understood correctly, the algorithm still runs on a single matrix,
and a client can still pass a group of matrices.


Once again, it comes down to data preparation :)

I will refactor the implementation to run on a single matrix, but provide
tools for turning the client's raw data into the actual input to the
algorithm.
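
A minimal sketch of what such a data-preparation tool could look like,
assuming each feature group (user, item, time, ...) arrives as a sparse map
from a group-local id to a value. It is not code from the implementation
discussed here; the names group_offsets and fold_groups are hypothetical.
Each group gets its own block of global feature ids, so all groups fold into
one sparse vector.

```python
from typing import Dict, List, Sequence


def group_offsets(group_sizes: Sequence[int]) -> List[int]:
    """Return the first global feature id assigned to each group."""
    offsets, total = [], 0
    for size in group_sizes:
        offsets.append(total)
        total += size
    return offsets


def fold_groups(groups: Sequence[Dict[int, float]],
                group_sizes: Sequence[int]) -> Dict[int, float]:
    """Merge per-group sparse features into one sparse vector.

    groups[k] maps a group-local feature id to its value, and
    group_sizes[k] is the number of distinct ids in group k.
    (Hypothetical helper, for illustration only.)
    """
    if len(groups) != len(group_sizes):
        raise ValueError("need exactly one size per group")
    folded: Dict[int, float] = {}
    offsets = group_offsets(group_sizes)
    for features, offset, size in zip(groups, offsets, group_sizes):
        for local_id, value in features.items():
            if not 0 <= local_id < size:
                raise ValueError(f"feature id {local_id} out of range")
            # Shift the local id into this group's block of global ids.
            folded[offset + local_id] = value
    return folded


# One training row: user 1 of 3, item 2 of 4, and a single time feature.
row = fold_groups([{1: 1.0}, {2: 1.0}, {0: 5.0}], [3, 4, 1])
```

With sizes 3, 4 and 1, the groups start at global ids 0, 3 and 7, so the
learner only ever sees one vector per row.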


On Apr 12, 2013, at 1:13, Ted Dunning <ted.dunn...@gmail.com> wrote:

One easy thing to do is to build an adjoined matrix type that does the
concatenation on the fly.
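
As a rough illustration only (this is not Mahout's Matrix API, and the class
name AdjoinedMatrix and its methods are hypothetical), such a view could keep
the blocks as they are and translate column indices on lookup, assuming
dense row-major blocks that all have the same number of rows:

```python
from bisect import bisect_right
from typing import List, Sequence

Block = Sequence[Sequence[float]]


class AdjoinedMatrix:
    """Read-only view of several blocks placed side by side.

    No data is copied; a lookup finds the block that owns the requested
    column and reads from it directly.
    """

    def __init__(self, blocks: Sequence[Block]):
        if not blocks or len(blocks[0]) == 0:
            raise ValueError("need at least one block with at least one row")
        self.num_rows = len(blocks[0])
        if any(len(block) != self.num_rows for block in blocks):
            raise ValueError("all blocks must have the same number of rows")
        self._blocks = list(blocks)
        self._starts: List[int] = []  # first global column of each block
        total = 0
        for block in self._blocks:
            self._starts.append(total)
            total += len(block[0])
        self.num_cols = total

    def get(self, row: int, col: int) -> float:
        if not (0 <= row < self.num_rows and 0 <= col < self.num_cols):
            raise IndexError("index out of range")
        # Rightmost block whose first column is <= col.
        k = bisect_right(self._starts, col) - 1
        return self._blocks[k][row][col - self._starts[k]]

    def row(self, row: int) -> List[float]:
        """Materialize one logical row of the concatenated matrix."""
        return [self.get(row, col) for col in range(self.num_cols)]


users = [[1.0, 0.0], [0.0, 1.0]]            # 2 rows x 2 user indicators
items = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # 2 rows x 3 item indicators
x = AdjoinedMatrix([users, items])
```

The client keeps passing separate groups, while the algorithm sees a single
2 x 5 matrix.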




On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gkhn...@gmail.com> wrote:

> Yeah, it is simpler indeed.
>
> I am going to think about alternative ways to make concatenation easier
> for clients.
>
> Thanks for your review
>
>
> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> I would have folded them all as different feature ids in a single vector;
>> that makes things a lot simpler and faster.
>>
>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>
>>
>> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>
>>> Hi Robin,
>>>
>>> If you are asking why they are arrays, it is to save clients from
>>> concatenating multiple matrices to create the input.
>>>
>>> I am quoting from the libFM paper
>>> <http://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf>:
>>> "For easier interpretation,
>>> the features are grouped into indicators for the active user (blue),
>>> active item (red), other movies rated
>>> by the same user (orange), the time in months (green), and the last
>>> movie rated (brown)."
>>>
>>> I thought a client would create multiple groups of matrices and could
>>> just pass them all to the algorithm.
>>>
>>> Then wModel holds the w parameters; it is still an array of vectors so
>>> that I can keep the indexing consistent, and vModel holds the V parameters.
>>>
>>> Was that what you were asking?
>>>
>>>
>>> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>
>>>> Comments away. I was a bit confused by the use of Vector[] for w1 and
>>>> Matrix[] for inputs.
>>>>
>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>>
>>>>
>>>> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>>>
>>>>> Ted,
>>>>> Robin,
>>>>>
>>>>> Although I have not tested it on a dataset yet, I have recently been
>>>>> implementing Factorization Machines with SGD optimization.
>>>>>
>>>>> The initial implementation is at
>>>>> https://github.com/gcapan/mahout/tree/fm
>>>>>
>>>>> Would you consider taking a look so I can improve it and get it
>>>>> running?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nkechi.nn...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm a long-time lurker. I would be interested in implementing these. I
>>>>>> thought I would get my feet wet by contributing tutorials to the wiki,
>>>>>> since I have used Mahout for recommendation and clustering in my
>>>>>> dissertation. I have never contributed code before, and I would love to
>>>>>> start now.
>>>>>>
>>>>>> -Nkechi
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>>>>
>>>>>> > FMs work really well for a whole range of things. Having implemented
>>>>>> > them myself, I can extend my services as a reviewer if anyone is
>>>>>> > willing to start on it.
>>>>>> >
>>>>>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>>>> >
>>>>>> >
>>>>>> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>> >
>>>>>> > > Relative to Dan's recent mention of SOM as a possible new project, here
>>>>>> > > are slides from KDD Cup 2012 in which Steffen Rendle describes how he
>>>>>> > > did using a very straightforward implementation of Factorization
>>>>>> > > Machines [1,2].
>>>>>> > >
>>>>>> > >
>>>>>> > > FMs are interesting in the context of Mahout because they can be used in
>>>>>> > > a wide variety of settings, including recommendation and targeting, and
>>>>>> > > because they have very good performance on a number of tasks.
>>>>>> > >
>>>>>> > > I should mention that Robin was the one who first mentioned FMs to me.
>>>>>> > >
>>>>>> > > The KDD 2012 competition [3] is of interest in any case because it
>>>>>> > > provides a large amount of realistic data for commercially important
>>>>>> > > problems.
>>>>>> > >
>>>>>> > > [1] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>>>>>> > >
>>>>>> > > [2] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>>>>>> > >
>>>>>> > > [3] http://www.kddcup2012.org/
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Gokhan
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Gokhan
>>>
>>
>>
>
>
> --
> Gokhan
>
