Re: factorization machines as new project

Gokhan Capan Sun, 14 Apr 2013 12:00:18 -0700

Thanks for the quick response. My responses are inline.


On Sun, Apr 14, 2013 at 7:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

>
> This is a good start.  I think that there are some things that bug me in
> the implementation.
>
> - assignColumn should work the same way that viewColumn does.
>
Done.

>
> - the machinery that finds the component matrix for a particular column
> should be separated out as a private method.
>
Done.
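A minimal sketch of such a private lookup, assuming the view keeps cumulative column offsets for its components. The `ComponentLocator` class and `columnOffsets` array are hypothetical names, not taken from the patch, and plain Java stands in for Mahout types so the example compiles on its own. It finds the owning component with a binary search:

```java
import java.util.Arrays;

// Hypothetical helper: maps a global column index of a side-by-side
// concatenated view to (component index, local column index).
final class ComponentLocator {
  // columnOffsets[k] is the first global column of component k;
  // the last entry is the total column count.
  // Assumes every component has at least one column (no duplicate offsets).
  private final int[] columnOffsets;

  ComponentLocator(int[] columnOffsets) {
    this.columnOffsets = columnOffsets.clone();
  }

  // Returns the index of the component that owns the given global column.
  int componentFor(int column) {
    int total = columnOffsets[columnOffsets.length - 1];
    if (column < 0 || column >= total) {
      throw new IndexOutOfBoundsException("column " + column);
    }
    // Search only the start offsets, excluding the trailing total.
    int pos = Arrays.binarySearch(columnOffsets, 0, columnOffsets.length - 1, column);
    // Exact hit: the column starts component pos. Otherwise binarySearch
    // returns -(insertionPoint) - 1 and the owner is insertionPoint - 1.
    return pos >= 0 ? pos : -pos - 2;
  }

  // Converts a global column index to a column index within its component.
  int localColumn(int column) {
    return column - columnOffsets[componentFor(column)];
  }
}
```

For components of widths 3, 2 and 4 the offsets are {0, 3, 5, 9}, so global column 4 maps to component 1, local column 1.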

>
> - I think that the ColumnSizeCalculator class should go away.  You don't
> need an extra object there, just a method.
>
Done (with a private static method).
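One way the column bookkeeping could look as a single static method, as a sketch only: the method name and the use of plain row/column counts (standing in for the components' sizes) are assumptions. It builds the cumulative column offsets and rejects components whose row counts disagree. It is package-private here only so the example can be called directly; inside the view it would be `private static`.

```java
// Hypothetical sketch: the work of a separate ColumnSizeCalculator object
// done by one static method instead.
final class ConcatenationSizes {
  private ConcatenationSizes() {}

  // Returns cumulative column offsets for components laid side by side.
  // rowCounts[k] and columnCounts[k] describe component k (an assumed
  // stand-in for querying each component matrix for its size).
  static int[] columnOffsets(int[] rowCounts, int[] columnCounts) {
    if (rowCounts.length != columnCounts.length || rowCounts.length == 0) {
      throw new IllegalArgumentException("need one row and column count per component");
    }
    int[] offsets = new int[columnCounts.length + 1];
    for (int k = 0; k < columnCounts.length; k++) {
      // Side-by-side concatenation requires the same number of rows everywhere.
      if (rowCounts[k] != rowCounts[0]) {
        throw new IllegalArgumentException(
            "component " + k + " has " + rowCounts[k] + " rows, expected " + rowCounts[0]);
      }
      offsets[k + 1] = offsets[k] + columnCounts[k];
    }
    return offsets;
  }
}
```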

>
> - I strongly suspect that you don't need to implement VectorSuperView.
>  Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
> may be an issue, but all speed questions should be decided by measurements.
>
I added it because iterateNonZero didn't work without it, and this view is
intended to work mostly on sparse matrices. I think (but I'm not sure yet)
that making ConcatenatedMatrix a direct subclass of SparseRowMatrix would
solve this problem, so that may be an option. (I personally needed this
multi-vector view anyway, so I implemented it.)
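To illustrate why non-zero iteration needs care in a concatenated view: each piece of a row has to be walked in turn, with its local indices shifted by the piece's starting offset. A minimal sketch, assuming each piece is a sorted map from local index to non-zero value (a hypothetical stand-in for a Mahout sparse vector, not its API):

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;

// Hypothetical sketch: non-zero iteration over a row formed by concatenating
// several sparse pieces, without copying them into one vector.
// Assumes each map holds only non-zeros, with keys in [0, width).
final class ConcatenatedSparseRow implements Iterable<Map.Entry<Integer, Double>> {
  private final List<SortedMap<Integer, Double>> pieces;
  private final int[] offsets; // offsets[k] = global index of piece k's first slot

  ConcatenatedSparseRow(List<SortedMap<Integer, Double>> pieces, int[] widths) {
    if (pieces.size() != widths.length) {
      throw new IllegalArgumentException("one width per piece is required");
    }
    this.pieces = pieces;
    this.offsets = new int[widths.length];
    for (int k = 1; k < widths.length; k++) {
      offsets[k] = offsets[k - 1] + widths[k - 1];
    }
  }

  // Lazily walks the pieces in order, shifting local indices by each piece's offset.
  @Override
  public Iterator<Map.Entry<Integer, Double>> iterator() {
    return new Iterator<Map.Entry<Integer, Double>>() {
      private int piece = 0;
      private Iterator<Map.Entry<Integer, Double>> current =
          pieces.isEmpty() ? null : pieces.get(0).entrySet().iterator();

      @Override
      public boolean hasNext() {
        // Move past exhausted or empty pieces.
        while (current != null && !current.hasNext()) {
          piece++;
          current = piece < pieces.size() ? pieces.get(piece).entrySet().iterator() : null;
        }
        return current != null;
      }

      @Override
      public Map.Entry<Integer, Double> next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Map.Entry<Integer, Double> e = current.next();
        return new AbstractMap.SimpleImmutableEntry<Integer, Double>(
            offsets[piece] + e.getKey(), e.getValue());
      }
    };
  }
}
```

With widths {3, 2, 4} and non-zeros at local index 1 of the first piece and local indices 0 and 2 of the third, the iterator yields global indices 1, 5 and 7; the empty middle piece is simply skipped.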

>
> - viewPart and like() are important.
>
I intentionally left those unsupported because I wasn't sure what they
should return. The original AbstractMatrix#viewPart would again cause
problems when iterating over fetched rows (subclassing SparseRowMatrix would
again solve this; I'm going to think about it). And I wasn't sure what
like(rows, columns) should return. A single matrix?

>
> - set and get should not be implemented on top of viewRow.  That will kill
> performance.
>
Fixed.
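A sketch of get/set that avoid viewRow, assuming the components are plain `double[][]` blocks placed side by side (the class name `ConcatenatedMatrixSketch` is hypothetical). Each call finds the owning block and indexes it directly, so no intermediate row view is created, and since nothing is copied, writes land in the original blocks. Whether this is actually faster in Mahout should, as Ted says, be settled by measurement.

```java
// Hypothetical sketch: a matrix view that concatenates double[][] blocks
// side by side. get/set index the owning block directly instead of
// building a row view on every call; writes reach the underlying blocks.
final class ConcatenatedMatrixSketch {
  private final double[][][] blocks;  // blocks[k][row][col], assumed rectangular
  private final int[] columnOffsets;  // columnOffsets[k] = first global column of block k
  private final int rows;

  ConcatenatedMatrixSketch(double[][]... blocks) {
    if (blocks.length == 0 || blocks[0].length == 0) {
      throw new IllegalArgumentException("need at least one block with at least one row");
    }
    this.blocks = blocks;
    this.rows = blocks[0].length;
    this.columnOffsets = new int[blocks.length + 1];
    for (int k = 0; k < blocks.length; k++) {
      if (blocks[k].length != rows) {
        throw new IllegalArgumentException("block " + k + " has a different row count");
      }
      columnOffsets[k + 1] = columnOffsets[k] + blocks[k][0].length;
    }
  }

  int numRows() {
    return rows;
  }

  int numCols() {
    return columnOffsets[blocks.length];
  }

  // Linear scan for brevity; with many blocks, a binary search over
  // columnOffsets would be the obvious next step.
  private int blockFor(int column) {
    if (column < 0 || column >= numCols()) {
      throw new IndexOutOfBoundsException("column " + column);
    }
    int k = 0;
    while (columnOffsets[k + 1] <= column) {
      k++;
    }
    return k;
  }

  double get(int row, int column) {
    int k = blockFor(column);
    return blocks[k][row][column - columnOffsets[k]];
  }

  void set(int row, int column, double value) {
    int k = blockFor(column);
    blocks[k][row][column - columnOffsets[k]] = value;
  }
}
```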

>
> - the public MatrixSuperView(int rowSize, int columnSize, Matrix[] matrices)
> constructor makes no sense to me as something to expose to users.  It
> should be inlined and go away.
>
Gone away.

>
> - the coding style in terms of white space is erratic.  Your IDE should
> fix this.
>
Done.

>
>
>
> On Sun, Apr 14, 2013 at 4:35 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
>> Ted,
>>
>> I wrote one yesterday. Basically it is a view implementing Matrix that
>> allows viewing and iterating over rows as if they were concatenated, via
>> VectorSuperView.
>>
>> Class naming can definitely change though.
>>
>> I'll change the LuceneMatrix code to return a single matrix for multiple
>> fields (using this view), too.
>>
>> Could you have a look at this (only the matrix and vector views) so that I
>> can submit a diff (after handling labels), refactor and resubmit the
>> LuceneMatrix patch, and then continue working on Factorization Machines so
>> that it can operate on a single matrix?
>>
>> The code is here (I'm listing the exact location of each related new class
>> because I made a somewhat bad commit from the top directory):
>>
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/MatrixSuperView.java
>>
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/VectorSuperView.java
>>
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/MatrixSuperViewTest.java
>>
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/VectorSuperViewTest.java
>>
>>
>> On Sat, Apr 13, 2013 at 10:05 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> What would this MatrixSuperView do?  Would ConcatenatedMatrix be a
>>> better name?
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 12, 2013, at 1:26, Gokhan Capan <gkhn...@gmail.com> wrote:
>>>
>>> > Ted,
>>> >
>>> > How about a MatrixSuperView that implements Matrix? (A MatrixView-like
>>> > implementation)
>>> >
>>> >
>>> > On Fri, Apr 12, 2013 at 2:28 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> > So if I understood correctly, the algorithm still runs on a single
>>> > matrix, and a client can still pass a group of matrices.
>>> >
>>> > Again, it came down to data preparation :)
>>> >
>>> > I will refactor the implementation to run on a single matrix, but provide
>>> > tools for turning client data in its obvious form into actual input to
>>> > the algorithm.
>>> >
>>> > Sent from my iPhone
>>> >
>>> > On Apr 12, 2013, at 1:13, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> >
>>> >> One easy thing to do is to build an adjoined matrix type that does the
>>> >> concatenation on the fly.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Yeah, it is simpler indeed.
>>> >>
>>> >> I am going to think about alternative ways to make concatenation easier
>>> >> for clients.
>>> >>
>>> >> Thanks for your review.
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >> I would have folded them all into a single vector as different feature
>>> >> ids; that makes things a lot simpler and faster.
>>> >>
>>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >>
>>> >>
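Robin's suggestion amounts to giving each feature group a disjoint range of ids in one feature space, so one sparse vector (one matrix row) can carry all groups. A minimal sketch, assuming group sizes are known up front. `FeatureSpace` and its methods are hypothetical names; the group names used with it only echo the libFM grouping quoted in this thread, and the sizes are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch: folds several feature groups into one id space.
final class FeatureSpace {
  private final Map<String, Integer> offsets = new LinkedHashMap<>();
  private final Map<String, Integer> sizes = new LinkedHashMap<>();
  private int dimension = 0;

  // Registers a group with a fixed number of features and reserves its id range.
  FeatureSpace addGroup(String name, int size) {
    if (offsets.containsKey(name)) {
      throw new IllegalArgumentException("duplicate group " + name);
    }
    offsets.put(name, dimension);
    sizes.put(name, size);
    dimension += size;
    return this;
  }

  int dimension() {
    return dimension;
  }

  // Maps a group-local feature index to its global feature id.
  int globalId(String group, int localIndex) {
    Integer offset = offsets.get(group);
    if (offset == null) {
      throw new IllegalArgumentException("unknown group " + group);
    }
    if (localIndex < 0 || localIndex >= sizes.get(group)) {
      throw new IndexOutOfBoundsException(group + "[" + localIndex + "]");
    }
    return offset + localIndex;
  }

  // Writes a value into a sparse row keyed by global feature id.
  void put(SortedMap<Integer, Double> row, String group, int localIndex, double value) {
    row.put(globalId(group, localIndex), value);
  }
}
```

Registering "user" (3 ids), "item" (4) and "time" (1) gives a dimension of 8, and user 2, item 0 and time 0 land on ids 2, 3 and 7 of the same row.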
>>> >> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Hi Robin,
>>> >>
>>> >> If you are asking why they are arrays, it is to save clients from having
>>> >> to concatenate multiple matrices to create the input.
>>> >>
>>> >> I am quoting from the libFM paper: "For easier interpretation, the
>>> >> features are grouped into indicators for the active user (blue), active
>>> >> item (red), other movies rated by the same user (orange), the time in
>>> >> months (green), and the last movie rated (brown)."
>>> >>
>>> >> I thought a client would create multiple groups of matrices and could
>>> >> just pass them all to the algorithm.
>>> >>
>>> >> As for the rest, wModel holds the w parameters (still an array of
>>> >> vectors, to keep the indexing consistent) and vModel holds the V
>>> >> parameters.
>>> >>
>>> >> Was that what you were asking?
>>> >>
>>> >>
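For reference, the w and V parameters correspond to the second-order FM model as defined in Rendle's FM paper: w_0 is the global bias, w_i the weight of feature i, and v_i (row i of V, with k factors) the latent vector of feature i. The model equation, and the standard rewriting of its pairwise term, are:

```latex
\[
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}
\]

\[
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f} \, x_i \Big)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
\]
```

The second form is what keeps prediction linear in the number of non-zero features (O(k) work per non-zero) instead of quadratic.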
>>> >> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >> Comments away. I was a bit confused by the use of Vector[] for w1 and
>>> >> Matrix[] for inputs.
>>> >>
>>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Ted,
>>> >> Robin,
>>> >>
>>> >> Although I haven't tested it on a dataset yet, I've recently been
>>> >> implementing Factorization Machines with SGD optimization.
>>> >>
>>> >> The initial implementation is at https://github.com/gcapan/mahout/tree/fm
>>> >>
>>> >> Would you consider taking a look so I can improve it and get it running?
>>> >>
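For the SGD part, the partial derivatives of the FM prediction are 1 for w_0, x_i for w_i, and x_i * sum_j(v_{j,f} x_j) - v_{i,f} x_i^2 for v_{i,f}, as given in Rendle's FM paper. A minimal sketch of one update step follows. The squared loss, the L2 regularization scheme, the Gaussian initialization and all names (`FmSgd`, `learningRate`, `lambda`, `initStdDev`) are assumptions for illustration, not what the fm branch does:

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: SGD for a second-order FM with squared loss and L2
// regularization. Each update() call processes one example in place.
final class FmSgd {
  private double w0;                 // global bias
  private final double[] w;          // w[i]: weight of feature i
  private final double[][] v;        // v[i][f]: f-th latent factor of feature i
  private final int k;               // number of latent factors
  private final double learningRate;
  private final double lambda;       // L2 strength, applied to w and v but not w0

  FmSgd(int numFeatures, int numFactors, double learningRate, double lambda,
        double initStdDev, long seed) {
    this.w = new double[numFeatures];
    this.v = new double[numFeatures][numFactors];
    this.k = numFactors;
    this.learningRate = learningRate;
    this.lambda = lambda;
    // Small random factors; with v = 0 the factor gradient would stay 0.
    Random random = new Random(seed);
    for (double[] row : v) {
      for (int f = 0; f < k; f++) {
        row[f] = initStdDev * random.nextGaussian();
      }
    }
  }

  // x maps feature id -> non-zero value.
  double predict(Map<Integer, Double> x) {
    return predict(x, new double[k]);
  }

  // Computes y(x) in O(k * nnz(x)) and fills sum[f] = sum_j v[j][f] * x_j.
  private double predict(Map<Integer, Double> x, double[] sum) {
    double result = w0;
    double pairwise = 0.0;
    for (Map.Entry<Integer, Double> e : x.entrySet()) {
      int i = e.getKey();
      double xi = e.getValue();
      result += w[i] * xi;
      for (int f = 0; f < k; f++) {
        double t = v[i][f] * xi;
        sum[f] += t;
        pairwise -= t * t;
      }
    }
    for (int f = 0; f < k; f++) {
      pairwise += sum[f] * sum[f];
    }
    return result + 0.5 * pairwise;
  }

  // One SGD step on (x, y) for the loss 0.5 * (y(x) - y)^2.
  void update(Map<Integer, Double> x, double y) {
    double[] sum = new double[k];
    double err = predict(x, sum) - y;
    w0 -= learningRate * err;
    for (Map.Entry<Integer, Double> e : x.entrySet()) {
      int i = e.getKey();
      double xi = e.getValue();
      w[i] -= learningRate * (err * xi + lambda * w[i]);
      for (int f = 0; f < k; f++) {
        // d y(x) / d v[i][f] = x_i * sum[f] - v[i][f] * x_i^2 (sum from before this step)
        double grad = xi * sum[f] - v[i][f] * xi * xi;
        v[i][f] -= learningRate * (err * grad + lambda * v[i][f]);
      }
    }
  }
}
```

Reusing `sum` from the prediction is what keeps the factor gradients cheap: each one costs O(1) instead of another pass over x.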
>>> >>
>>> >>
>>> >> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nkechi.nn...@gmail.com> wrote:
>>> >> Hello,
>>> >>
>>> >> I'm a long-time lurker.  I would be interested in implementing these.  I
>>> >> thought I would get my feet wet by contributing tutorials to the wiki,
>>> >> since I have used Mahout for recommendation and clustering in my
>>> >> dissertation.  I have never contributed code before and I would love to
>>> >> start now.
>>> >>
>>> >> -Nkechi
>>> >>
>>> >>
>>> >> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >>
>>> >> > FMs work really well for a whole range of things. Having implemented
>>> >> > them myself, I can extend my services as a reviewer if anyone is
>>> >> > willing to start on it.
>>> >> >
>>> >> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >> >
>>> >> >
>>> >> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> >> >
>>> >> > > Relative to Dan's recent mention of SOM as a possible new project,
>>> >> > > here are slides from KDD Cup 2012 in which Steffen Rendle describes
>>> >> > > how well he did using a very straightforward implementation of
>>> >> > > Factorization Machines [1,2].
>>> >> > >
>>> >> > >
>>> >> > > FMs are interesting in the context of Mahout because they can be used
>>> >> > > in a wide variety of settings, including recommendation and targeting,
>>> >> > > and because they have very good performance on a number of tasks.
>>> >> > >
>>> >> > > I should mention that Robin was the one who first mentioned FMs to me.
>>> >> > >
>>> >> > > The KDD Cup 2012 competition [3] is of interest in any case because it
>>> >> > > provides a large amount of realistic data for commercially important
>>> >> > > problems.
>>> >> > >
>>> >> > > [1] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>>> >> > >
>>> >> > > [2] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>>> >> > >
>>> >> > > [3] http://www.kddcup2012.org/
>>> >> > >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Gokhan
>>>
>>
>>
>>
>> --
>> Gokhan
>>
>
>


-- 
Gokhan
