This is very interesting! I would be very happy to work with Gautam to bring 
this to MADlib. 

In fact, we could even implement a more general version of factorization
machines with an additional prior on the model, via l1 or l2 regularization.
This could be the first step towards a general optimization/training engine
framework inside MADlib that is more powerful than the existing SGD variants
because it also exploits curvature information of the objective to speed up
learning. Such a framework could be designed to accept a C++/Python function
that computes the gradient at a single data point and then automatically
optimize the whole training objective over distributed data sets with l1/l2
regularization. With such a framework in place, adding a factorization
machine module with squared or logistic loss becomes much simpler: it is just
a matter of supplying two small C++/Python functions that evaluate the
squared/logistic loss at a given data point.
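
To make the idea concrete, here is a minimal, purely illustrative Python
sketch of what such a plug-in interface might look like (the names below are
hypothetical, not an existing MADlib API): the engine only needs a per-point
gradient callback, while regularization and the distributed aggregation are
handled generically.

    import numpy as np

    def squared_loss_grad(w, x, y):
        # Hypothetical per-point callback: gradient of 0.5*(w.x - y)^2 w.r.t. w.
        return (np.dot(w, x) - y) * x

    def train(grad_fn, X, Y, l2=0.0, lr=0.01, n_iter=10):
        # Generic engine: only the per-point callback is model-specific.
        # In MADlib the inner sum would be a distributed aggregate over
        # segments, and a second-order method would rescale the step using
        # curvature information instead of a fixed learning rate.
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = sum(grad_fn(w, x, y) for x, y in zip(X, Y)) / len(Y)
            grad += l2 * w            # l2 prior; l1 would use a proximal step
            w -= lr * grad
        return w

Swapping in a factorization-machine gradient as the callback above would then
be the only FM-specific code needed.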

For those who want to dive deeper, here is some relevant published work of mine:
http://link.springer.com/article/10.1007/s10107-016-0997-3
and the free version on arXiv:
http://arxiv.org/abs/1311.6547
where the method is shown to be faster than the specialized sparse logistic
regression solver in the LIBLINEAR package from the LIBSVM group
(https://www.csie.ntu.edu.tw/~cjlin/liblinear/).

Look forward to hearing more from the community. 

Xiaocheng

--
Xiaocheng Tang, Ph.D,
MADlib Committer, 
Pivotal Software, Inc.
xiaochen...@gmail.com


> On Apr 16, 2016, at 11:57 AM, Frank McQuillan <fmcquil...@pivotal.io> wrote:
> 
> Thanks Gautam for your email regarding factorization machines.  This is a
> really interesting approach to address some of the issues that SVM has with
> sparse tuple interactions.  For folks interested in building recommender
> systems in Greenplum or HDB/HAWQ, FMs could be a key building block.
> 
> So I think this would be a great addition to MADlib, and I would be happy
> to help out in any way to get it into the MADlib core project, e.g.,
> contribute Pivotal resources for scale testing on very large data sets.
> 
> I know Xiaocheng Tang has experience in this domain, so perhaps he would
> like to help marshal this along?
> 
> Frank
> 
> On Thu, Apr 7, 2016 at 10:45 AM, Gautam Muralidhar <gmuralid...@pivotal.io>
> wrote:
> 
>> Hi MADlib Developers,
>> 
>> I have implemented Steffen Rendle's Factorization Machines
>> <http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf> on Python and
>> MPP. You can find a pure Python implementation and its MPP variant for
>> Greenplum and HAWQ here:
>> 
>> 1. Factorization Machines - Pure Python:
>> https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines.ipynb
>>
>> 2. Factorization Machines - MPP:
>> https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines_mpp.ipynb
>> 
>> 
>> *A quick summary of factorization machines:*
>> 
>> Typically, when building a recommender system, you not only have tuple
>> interactions (user-item, user-offer, user- TV show, user-purchase, etc.),
>> but also side information in the form of user and item features.
>> Historically, algorithms such as SVM and dense kernel methods have not been
>> good at estimating parameters for recommenders owing to the sparsity of the
>> tuple interactions. Bayesian collaborative filtering approaches have had
>> success, but incorporation of side information entails fairly complex
>> hierarchical modeling resulting in MCMC or Gibbs sampling for inference of
>> model parameters.
>> 
>> Rendle's work is an extremely interesting and novel way of thinking about
>> this problem and seamlessly blending tuple interactions with side
>> information. Since tuple interactions are modeled via a low rank matrix
>> decomposition, the quality of parameter estimation will not suffer from
>> sparsity, unlike SVMs. I found the approach really elegant, and further
>> reading plus a few great posts on Quora (including one by a leading name in
>> the field, Xavier Amatriain:
>> https://www.quora.com/What-are-some-of-the-best-recommendation-algorithms)
>> convinced me of the merit of this work.
>> 
>> 
>> A couple of key points about factorization machines:
>> 
>> 1. They can be seen as a generalization of SVMs and SVD++
>> 
>> 2. *They can be applied to any real-valued feature vector for regression or
>> classification problems (think of them as a drop-in replacement for
>> linear/logistic regression, SVMs, etc.)*
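>>
>> To make the model concrete, here is a rough, pure-Python sketch of the
>> second-order FM prediction (names and shapes are illustrative; see the
>> notebooks above for the actual implementation). It uses Rendle's
>> reformulation of the pairwise term, which brings the cost of one
>> prediction down to O(KN):
>>
>>     import numpy as np
>>
>>     def fm_predict(w0, w, V, x):
>>         # w0: bias, w: (N,) linear weights, V: (N, K) latent factors,
>>         # x: (N,) feature vector.
>>         linear = w0 + np.dot(w, x)
>>         # Rendle's O(KN) form of sum_{i<j} <v_i, v_j> x_i x_j
>>         s = V.T.dot(x)                  # (K,): sum_i v_{i,f} x_i
>>         s2 = (V ** 2).T.dot(x ** 2)     # (K,): sum_i v_{i,f}^2 x_i^2
>>         return linear + 0.5 * np.sum(s * s - s2)
>>
>> For regression the prediction is used directly with squared loss; for
>> classification it is passed through a sigmoid (log loss) or used with a
>> hinge loss.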
>> 
>> *A quick note on my MPP implementation:*
>> 
>> Function interface:
>> 
>> fm_fit (
>>     src_table    varchar,
>>     id_col_name  varchar,
>>     x_col_name   varchar,
>>     y_col_name   varchar,
>>     parameters   varchar,
>>     out_table    varchar
>> )
>> 
>> Parameters of the fm_fit function:
>>
>> src_table - table containing the feature vectors and the dependent variable
>> as two separate columns
>>
>> id_col_name - id column identifying each row
>>
>> x_col_name - column holding the feature vector
>>
>> y_col_name - column holding the dependent variable
>>
>> parameters - learning parameters: regularizers (lambda0, lambda1, lambda2),
>> number of iterations (n_iter), learning rate (learning_rate), factorization
>> hyper-parameter (k), random seed (an integer, used by batch-sgd), and the
>> optimizer (optim, either batch or batch-sgd)
>>
>> out_table - name of the table where the model will be stored
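>>
>> A hedged example of how a call might look from Python (the connection
>> details and the exact format of the parameters string are illustrative
>> assumptions; please consult the MPP notebook above for the real
>> invocation):
>>
>>     import psycopg2
>>
>>     conn = psycopg2.connect("dbname=gpadmin host=localhost")
>>     cur = conn.cursor()
>>     cur.execute("""
>>         SELECT fm_fit(
>>             'ratings_train',   -- src_table
>>             'row_id',          -- id_col_name
>>             'features',        -- x_col_name
>>             'rating',          -- y_col_name
>>             'lambda0=0.01, lambda1=0.01, lambda2=0.01, n_iter=20,
>>              learning_rate=0.001, k=8, seed=42, optim=batch-sgd',
>>             'fm_model_out'     -- out_table
>>         )
>>     """)
>>     conn.commit()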
>> 
>> 1. The MPP implementation currently covers the regression problem (squared
>> loss). I plan to implement the classification problem (hinge/log loss) soon.
>> 
>> 2. The implementation assumes your training data is randomly distributed.
>> Two gradient descent variants are provided: mini-batch SGD and regular
>> batch gradient descent. The mini-batch SGD implementation is very closely
>> based on this paper by Zinkevich, Weimer, Li, and Smola (a rough sketch of
>> the idea follows this list):
>> <http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf>
>> 
>> 3. Lastly, the complexity of factorization machines is roughly O(TKN),
>> where T is the number of SGD iterations, K is the number of latent
>> factors, and N is the feature dimension. The implementation involves a
>> couple of loops for estimating the parameters of the tuple interactions
>> that I could not vectorize efficiently (vectorizing them meant building
>> 3-D matrices, which was even slower). Consequently, an implementation of
>> the SGD in C or C++ (as in MADlib) will be much faster than the PL/Python
>> approach.
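>>
>> For reference, here is a minimal, illustrative sketch of the
>> parameter-averaging idea from that paper (the gradient callback and the
>> data partitioning are placeholders, not the actual MPP code): each
>> segment runs SGD over its local shard of the data, and the per-segment
>> models are then averaged.
>>
>>     import numpy as np
>>
>>     def local_sgd(w, shard, grad_fn, lr=0.01):
>>         # One SGD pass over a single segment's local shard of the data.
>>         w = w.copy()
>>         for x, y in shard:
>>             w -= lr * grad_fn(w, x, y)
>>         return w
>>
>>     def parallel_sgd(w0, shards, grad_fn, lr=0.01, n_passes=1):
>>         # Simplified parameter averaging: train independently per shard,
>>         # then average; repeating the pass is a common iterative variant.
>>         w = w0
>>         for _ in range(n_passes):
>>             models = [local_sgd(w, s, grad_fn, lr) for s in shards]
>>             w = np.mean(models, axis=0)
>>         return w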
>> 
>> Many thanks to Xiaocheng Tang (MADlib committer) for discussions on
>> parallel SGD algorithms.
>> 
>> Does the community feel this would be a good addition to MADlib?
>> 
>> Regards,
>> 
>> Gautam
>> 
>> --
>> ------------------------------------------------------------------
>> Gautam S. Muralidhar, Ph.D.
>> Sr. Data Scientist, Pivotal Software, Inc.
>> ------------------------------------------------------------------
>> 
