Thanks for the quick response. My replies are inline.

On Sun, Apr 14, 2013 at 7:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> This is a good start.  I think that there are some things that bug me in
> the implementation.
>
> - assignColumn should work the same way that viewColumn does.
>

Done.

>
> - the machinery that finds the component matrix for a particular column
> should be separated out as a private method.
>

Done.

>
> - I think that the ColumnSizeCalculator class should go away.  You don't
> need an extra object there, just a method.
>

Done (it is now a private static method).

>
> - I strongly suspect that you don't need to implement VectorSuperView.
> Won't the normal handling of viewRow in AbstractMatrix work here?  Speed
> may be an issue, but all speed questions should be decided by measurements.
>

I implemented it because iterateNonZero didn't work otherwise, and this is
intended to work on mostly sparse matrices. I think (but I'm not sure yet)
that making ConcatenatedMatrix a direct subclass of SparseRowMatrix would
solve the problem, so that may be an option. (I personally needed these
multi-vectors anyway, so I implemented it.)

>
> - viewPart and like() are important.
>

I intentionally left those unsupported because I wasn't sure what they
should return. The original AbstractMatrix#viewPart would again cause
problems when iterating over the fetched rows (subclassing SparseRowMatrix
would have solved this as well; I'll think about it). And I wasn't sure what
like(rows, columns) should return. A single matrix?

>
> - set and get should not be implemented on top of viewRow.  That will kill
> performance.
>

Fixed.

>
> - the public MatrixSuperView(int rowSize, int columnSize, Matrix[]
> matrices){
> constructor makes no sense to me to expose to users.  It should be inlined
> and go away.
>

Removed.

>
> - the coding style in terms of white space is erratic.  Your IDE should
> fix this.
>

Done.

>
>
> On Sun, Apr 14, 2013 at 4:35 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
>> Ted,
>>
>> I wrote one yesterday.
>> Basically it is a view implementing matrix, which
>> allows viewing and iterating on rows as if they are concatenated, via
>> VectorSuperView.
>>
>> Class naming can definitely change though.
>>
>> I'll change the LuceneMatrix code to return single matrix for multiple
>> fields (using this view), too.
>>
>> Could you have a look at this (only the matrix and vector views) so I
>> submit a diff (after handling labels), refactor and resubmit LuceneMatrix
>> patch, and then continue to work on Factorization Machines so it can
>> operate on a single matrix?
>>
>> The code is here (Adding exact locations for each related new class
>> because I did a kind of bad commit, from the top directory)
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/MatrixSuperView.java
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/VectorSuperView.java
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/MatrixSuperViewTest.java
>>
>> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/VectorSuperViewTest.java
>>
>>
>> On Sat, Apr 13, 2013 at 10:05 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> What would this MatrixSuperView do?  Would ConcatenatedMatrix be a
>>> better name?
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 12, 2013, at 1:26, Gokhan Capan <gkhn...@gmail.com> wrote:
>>>
>>> > Ted,
>>> >
>>> > How about a MatrixSuperView implements Matrix? (A MatrixView like
>>> > implementation)
>>> >
>>> >
>>> > On Fri, Apr 12, 2013 at 2:28 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> > So if I understood correctly, the algorithm still runs on matrix, and
>>> > a client still can pass a group of matrices.
>>> >
>>> > Again it came to data preparation:)
>>> >
>>> > I will refactor the implementation to run on single matrix, but
>>> > provide tools for turning the obvious client data into actual input to the
>>> > algorithm.
>>> >
>>> > Sent from my iPhone
>>> >
>>> > On Apr 12, 2013, at 1:13, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> >
>>> >> One easy thing to do is to build an adjoined matrix type that does
>>> >> the concatenation on the fly.
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Yeah, it is simpler indeed.
>>> >>
>>> >> I am going to think about alternative ways to make concatenation
>>> >> easier for clients.
>>> >>
>>> >> Thanks for your review
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >> I would have folded them all as different feature ids in a single
>>> >> vector, makes things a lot simpler and faster.
>>> >>
>>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Hi Robin,
>>> >>
>>> >> If you are asking why they are arrays, it is because to save clients
>>> >> from concatenating multiple matrices to create the input.
>>> >>
>>> >> I am quoting from libFM paper: "For easier interpretation,
>>> >> the features are grouped into indicators for the active user (blue),
>>> >> active item (red), other movies rated
>>> >> by the same user (orange), the time in months (green), and the last
>>> >> movie rated (brown)."
>>> >>
>>> >> I thought a client would create multiple group of matrices, and he
>>> >> can just pass them all to the algorithm.
>>> >>
>>> >> Then the wModel is w parameters, it is still array of vectors for me
>>> >> to keep the indexing consistent, and vModel is the V parameters.
>>> >>
>>> >> Was that what you were asking?
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >> Comments away. I was a bit confused by the use of Vector[] for w1 and
>>> >> Matrix[] for inputs.
>>> >>
>>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >>
>>> >>
>>> >> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>> >> Ted,
>>> >> Robin,
>>> >>
>>> >> Although I did not test on a dataset yet, recently I've been
>>> >> implementing Factorization Machines with SGD optimization.
>>> >>
>>> >> The initial implementation is at
>>> >> https://github.com/gcapan/mahout/tree/fm
>>> >>
>>> >> Would you guys consider to take a look so I can make it better and
>>> >> running?
>>> >>
>>> >>
>>> >> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nkechi.nn...@gmail.com> wrote:
>>> >> Hello,
>>> >>
>>> >> I'm long time lurker. I would be interested in implementing these. I
>>> >> thought I would get my feet wet with contributing to wiki with tutorials
>>> >> since I have used Mahout for recommendation and clustering in my
>>> >> dissertation. I have never contributed code before and I would love to
>>> >> start now.
>>> >>
>>> >> -Nkechi
>>> >>
>>> >>
>>> >> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>> >>
>>> >> > FMs work really well for a whole range of things. Having implemented them
>>> >> > myself, I can extend my services as a reviewer if anyone is willing to
>>> >> > start on it.
>>> >> >
>>> >> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>> >> >
>>> >> >
>>> >> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> > > Relative to Dan's recent mention of SOM as possible new project, here are
>>> >> > > slides from KDD Cup 2012 in which Stephen Rendle describes how he did using
>>> >> > > a very straightforward implementation of Factorization Machines [1,2].
>>> >> > >
>>> >> > >
>>> >> > > FMs are interesting in the context of Mahout because they can be used in a
>>> >> > > wide variety of settings including recommendation and targeting and because
>>> >> > > they have very good performance on a number of tasks.
>>> >> > >
>>> >> > > I should mention that Robin was the one who first mentioned FMs to me.
>>> >> > >
>>> >> > > The KDD 2012 competition [3] is of interest in any case because it provides
>>> >> > > a large amount of realistic data for commercially important problems.
>>> >> > >
>>> >> > > [1]
>>> >> > > https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>>> >> > >
>>> >> > > [2]
>>> >> > > https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>>> >> > >
>>> >> > > [3] http://www.kddcup2012.org/
>>> >> > >
>>> >> >
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >>
>>> >> --
>>> >> Gokhan
>>> >>
>>> >
>>> >
>>> > --
>>> > Gokhan
>>>
>>
>>
>> --
>> Gokhan
>>
>
>
--
Gokhan
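
To make the points above about the column lookup and about get/set concrete, here is a minimal sketch in plain Java. It is not the code in the fm branch and it does not use Mahout's Matrix or Vector types: ConcatenatedView, blockFor and columnOffsets are hypothetical names, and rectangular double[][] arrays stand in for the component matrices. It only illustrates one possible shape of the idea: column offsets computed once by a private static method, a private method that maps a global column to its component, and get/set that go straight to the component instead of through a row view.

```java
/**
 * Hypothetical sketch of a column-wise concatenated view.
 * Assumption: each block is a rectangular double[][] with the same row count.
 */
final class ConcatenatedView {
    private final double[][][] blocks; // blocks[k][row][localColumn]
    private final int[] offsets;       // offsets[k] = first global column of block k
    private final int rows;

    ConcatenatedView(double[][]... blocks) {
        if (blocks.length == 0) {
            throw new IllegalArgumentException("need at least one block");
        }
        this.rows = blocks[0].length;
        for (double[][] block : blocks) {
            if (block.length != rows) {
                throw new IllegalArgumentException("all blocks must have the same row count");
            }
        }
        this.blocks = blocks;
        this.offsets = columnOffsets(blocks);
    }

    /** Cumulative column counts; the last entry is the total number of columns. */
    private static int[] columnOffsets(double[][][] blocks) {
        int[] offsets = new int[blocks.length + 1];
        for (int k = 0; k < blocks.length; k++) {
            int width = blocks[k].length == 0 ? 0 : blocks[k][0].length;
            offsets[k + 1] = offsets[k] + width;
        }
        return offsets;
    }

    /** Index of the block that holds the given global column (binary search). */
    private int blockFor(int column) {
        if (column < 0 || column >= columnSize()) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        int lo = 0;
        int hi = blocks.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (offsets[mid] <= column) {
                lo = mid;      // block mid starts at or before the column
            } else {
                hi = mid - 1;  // block mid starts after the column
            }
        }
        return lo;
    }

    int rowSize() {
        return rows;
    }

    int columnSize() {
        return offsets[blocks.length];
    }

    /** Reads directly from the component block; no row view is created. */
    double get(int row, int column) {
        int k = blockFor(column);
        return blocks[k][row][column - offsets[k]];
    }

    /** Writes through to the component block, so the view stays a view. */
    void set(int row, int column, double value) {
        int k = blockFor(column);
        blocks[k][row][column - offsets[k]] = value;
    }

    public static void main(String[] args) {
        double[][] users = {{1, 0}, {0, 1}};
        double[][] items = {{0, 1, 0}, {1, 0, 0}};
        ConcatenatedView x = new ConcatenatedView(users, items);
        System.out.println(x.rowSize() + " x " + x.columnSize());
        System.out.println(x.get(1, 2));
    }
}
```

Whether a binary search over the offsets actually beats a linear scan for a handful of component matrices is a speed question, so it should be decided by measurement.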