This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemml.git
The following commit(s) were added to refs/heads/master by this push:
     new ac5036b  [SYSTEMML-1437][DOC] Document Factorization machines
ac5036b is described below

commit ac5036b1338bd078cff1bb02f1ae75ec12e3b98d
Author: Janardhan Pulivarthi <j...@protonmail.com>
AuthorDate: Thu Mar 28 01:25:34 2019 +0530

    [SYSTEMML-1437][DOC] Document Factorization machines

    - This patch documents the technical details of the factorization model
    - Also, a binary classification and a regression script with suitable examples
---
 docs/algorithms-factorization-machines.md | 226 ++++++++++++++++++++++++++++++
 docs/algorithms-reference.md              |   3 +
 2 files changed, 229 insertions(+)

diff --git a/docs/algorithms-factorization-machines.md b/docs/algorithms-factorization-machines.md
new file mode 100644
index 0000000..3a380d3
--- /dev/null
+++ b/docs/algorithms-factorization-machines.md
@@ -0,0 +1,226 @@
---
layout: global
title: SystemML Algorithms Reference - Factorization Machines
displayTitle: <a href="algorithms-reference.html">SystemML Algorithms Reference</a>
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

# 7. Factorization Machines
### Description

The Factorization Machine (FM) is a general predictor, like SVMs, but one that is also
able to estimate reliable parameters under very high sparsity. The factorization machine
models all nested variable interactions (comparable to a polynomial kernel in SVM), but
uses a factorized parameterization instead of a dense parameterization as in SVMs.

## Core Model

### 1. Model Equation:

$$ \hat{y}(x) =
w_0 +
\sum_{i=1}^{n} w_i x_i +
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \left <v_i, v_j \right > x_i x_j
$$

where the model parameters that have to be estimated are:

$$
w_0 \in R, \quad
W \in R^n, \quad
V \in R^{n \times k}
$$

and $$ \left <\cdot, \cdot \right > $$ is the dot product of two vectors of size $$k$$:

$$
\left <v_i, v_j \right > = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}
$$

A row $$v_i$$ within $$V$$ describes the $$i$$th variable with $$k \in N_0^+$$ factors, where $$k$$ is a hyperparameter that
defines the dimensionality of the factorization.

 * $$ w_0 $$ : the global bias
 * $$ w_i $$ : models the strength of the $$i$$th variable
 * $$ w_{i,j} = \left <v_i, v_j \right> $$ : models the interaction between the $$i$$th and $$j$$th variables.

Instead of using its own model parameter $$ w_{i,j} \in R $$ for each interaction, the FM
models the interaction by factorizing it.

### 2. Expressiveness:

It is well known that for any positive definite matrix $$W$$, there exists a matrix $$V$$ such that
$$W = V \cdot V^t$$, provided that $$k$$ is sufficiently large. This shows that an FM can express any
interaction matrix $$W$$ if $$k$$ is chosen large enough.

### 3. Parameter Estimation Under Sparsity:

In sparse settings, there is usually not enough data to estimate the interactions between variables
directly and independently. FMs can estimate interactions well even in these settings, because
they break the independence of the interaction parameters by factorizing them.

### 4. Computation:
Due to the factorization of pairwise interactions, there is no model parameter that directly depends
on two variables (e.g., a parameter with an index $$(i,j)$$). So the pairwise interactions can be
reformulated as shown below.

$$
\sum_{i=1}^n \sum_{j=i+1}^n \left <v_i, v_j \right > x_i x_j
$$

$$
= {1 \over 2} \sum_{i=1}^n \sum_{j=1}^n \left <v_i, v_j \right > x_i x_j - {1 \over 2} \sum_{i=1}^n \left <v_i, v_i \right > x_i x_i
$$

$$
= {1 \over 2} \left ( \sum_{i=1}^n \sum_{j=1}^n \sum_{f=1}^k v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^n \sum_{f=1}^k v_{i,f} v_{i,f} x_i x_i \right )
$$

$$
= {1 \over 2} \sum_{f=1}^k \left ( \left (\sum_{i=1}^n v_{i,f} x_i \right ) \left (\sum_{j=1}^n v_{j,f} x_j \right ) - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right )
$$

$$
= {1 \over 2} \sum_{f=1}^k \left ( \left ( \sum_{i=1}^n v_{i,f} x_i \right )^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right )
$$

This reformulation can be evaluated in $$O(kn)$$ time instead of $$O(kn^2)$$.

### 5. Learning Factorization Machines

The gradient of the model with respect to each parameter is

$$
{ \partial \over \partial \theta } \hat{y}(x) =
\begin{cases}
1 & \text{if } \theta \text{ is } w_0 \\
x_i & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f} \cdot x_j - v_{i,f} \cdot x_i^2 & \text{if } \theta \text{ is } v_{i,f}
\end{cases}
$$

### 6. Factorization Models as Predictors:

### 6.1. Regression:

$$\hat{y}(x)$$ can be used directly as the predictor, and the optimization criterion is the minimal
least square error on $$D$$.

### Usage:

The `train()` function in the [fm-regression.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-regression.dml) script takes in the input variable matrix and the corresponding target vector, with some input held out for validation during training.

```
train = function(matrix[double] X, matrix[double] y, matrix[double] X_val, matrix[double] y_val)
    return (matrix[double] w0, matrix[double] W, matrix[double] V) {
  /*
   * Trains the FM model.
   *
   * Inputs:
   *  - X     : n examples with d features, of shape (n, d)
   *  - y     : Target matrix, of shape (n, 1)
   *  - X_val : Input validation data matrix, of shape (n, d)
   *  - y_val : Target validation matrix, of shape (n, 1)
   *
   * Outputs:
   *  - w0, W, V : updated model parameters.
   *
   * Network Architecture:
   *
   * X --> [model] --> out --> l2_loss::backward(out, y) --> dout
   */

  ...
  # 7. Call adam::update for all parameters
  [w0, mw0, vw0] = adam::update(w0, dw0, lr, beta1, beta2, epsilon, t, mw0, vw0);
  [W,  mW,  vW ] = adam::update(W,  dW,  lr, beta1, beta2, epsilon, t, mW,  vW );
  [V,  mV,  vV ] = adam::update(V,  dV,  lr, beta1, beta2, epsilon, t, mV,  vV );
}
```

Once the `train` function returns the weights for the `fm` model, these are passed to the `predict` function.

```
predict = function(matrix[double] X, matrix[double] w0, matrix[double] W, matrix[double] V)
    return (matrix[double] out) {
  /*
   * Computes the predictions for the given inputs.
   *
   * Inputs:
   *  - X        : n examples with d features, of shape (n, d).
   *  - w0, W, V : trained model parameters.
   *
   * Outputs:
   *  - out : target vector, y.
   */

  out = fm::forward(X, w0, W, V);
}
```

#### Running with dummy data:

The [fm-regression-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-regression-dummy-data.dml) file can serve as a template to extend.

<div class="codetabs">
<div data-lang="Hadoop" markdown="1">
    hadoop jar SystemML.jar -f ./scripts/nn/examples/fm-regression-dummy-data.dml

</div>
<div data-lang="Spark" markdown="1">
    $SPARK_HOME/bin/spark-submit --master yarn
    --deploy-mode cluster
    --conf spark.driver.maxResultSize=0
    SystemML.jar
    -f ./scripts/nn/examples/fm-regression-dummy-data.dml
    -config SystemML-config.xml
    -exec hybrid_spark
</div>
</div>

### 6.2. Binary Classification:

The sign of $$\hat{y}(x)$$ is used, and the parameters are optimized for the hinge loss or logit loss.
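As an illustration only (this is not part of the SystemML scripts), the model equation with the $$O(kn)$$ reformulation from Section 4, and the sign-based classification rule, can be sketched in NumPy; the function names `fm_predict` and `fm_predict_naive` are hypothetical.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """FM model equation using the O(kn) pairwise-interaction trick.

    X: (n_samples, d), w0: scalar, w: (d,), V: (d, k).
    """
    linear = X @ w                    # sum_i w_i x_i per example
    Xv = X @ V                        # column f holds sum_i v_{i,f} x_i
    X2V2 = (X ** 2) @ (V ** 2)        # column f holds sum_i v_{i,f}^2 x_i^2
    pairwise = 0.5 * np.sum(Xv ** 2 - X2V2, axis=1)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Direct double sum over variable pairs, to check the reformulation."""
    d = len(x)
    y = w0 + w @ x
    for i in range(d):
        for j in range(i + 1, d):
            y += (V[i] @ V[j]) * x[i] * x[j]   # <v_i, v_j> x_i x_j
    return y

rng = np.random.default_rng(0)
d, k = 6, 3
x = rng.normal(size=d)
w0, w, V = 0.5, rng.normal(size=d), rng.normal(size=(d, k))

fast = fm_predict(x[None, :], w0, w, V)[0]
slow = fm_predict_naive(x, w0, w, V)
assert np.isclose(fast, slow)   # the O(kn) form matches the O(kn^2) form

# Binary classification (Section 6.2): the sign of y_hat is the predicted class.
label = np.sign(fast)
```

The reformulation check is exactly the algebraic identity derived in Section 4: the double sum over pairs equals half of the full square minus the diagonal terms.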

### Usage:

The `train()` function in the [fm-binclass.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-binclass.dml) script takes in the input variable matrix and the corresponding target vector, with some input held out for validation during training. This script also contains `train()` and `predict()` functions, as in the regression case.

#### Running with dummy data:

The [fm-binclass-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-binclass-dummy-data.dml) file can serve as a template to extend.

<div class="codetabs">
<div data-lang="Hadoop" markdown="1">
    hadoop jar SystemML.jar -f ./scripts/nn/examples/fm-binclass-dummy-data.dml

</div>
<div data-lang="Spark" markdown="1">
    $SPARK_HOME/bin/spark-submit --master yarn
    --deploy-mode cluster
    --conf spark.driver.maxResultSize=0
    SystemML.jar
    -f ./scripts/nn/examples/fm-binclass-dummy-data.dml
    -config SystemML-config.xml
    -exec hybrid_spark
</div>
</div>

### 6.3. Ranking:

The vectors are ordered by the score of $$\hat{y}(x)$$, and the optimization is done over pairs of instance
vectors $$ (x^{(a)}, x^{(b)}) \in D $$ with a pairwise classification loss.

Regularization terms such as $$L2$$ are usually applied to the optimization objective to prevent overfitting.

diff --git a/docs/algorithms-reference.md b/docs/algorithms-reference.md
index 26c2141..9319093 100644
--- a/docs/algorithms-reference.md
+++ b/docs/algorithms-reference.md
@@ -55,5 +55,8 @@ limitations under the License.
 * [Kaplan-Meier Survival Analysis](algorithms-survival-analysis.html#kaplan-meier-survival-analysis)
 * [Cox Proportional Hazard Regression Model](algorithms-survival-analysis.html#cox-proportional-hazard-regression-model)
+* [Factorization Machines](algorithms-factorization-machines.html)
+  * [Factorization Machine](algorithms-factorization-machines.html#core-model)
+
 * [Bibliography](algorithms-bibliography.html)
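The gradient expressions in Section 5 of the new page can also be verified numerically. Below is an illustrative NumPy sketch (not part of the patch; `fm_forward` is a hypothetical name) that compares the analytic gradient for an entry of $$V$$ against a central finite difference.

```python
import numpy as np

def fm_forward(x, w0, w, V):
    # O(kn) FM model equation for a single example x of shape (d,)
    xv = x @ V  # shape (k,): entry f is sum_i v_{i,f} x_i
    return w0 + w @ x + 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))

rng = np.random.default_rng(1)
d, k = 5, 2
x = rng.normal(size=d)
w0, w, V = 0.3, rng.normal(size=d), rng.normal(size=(d, k))

# Analytic gradients from Section 5
g_w0 = 1.0
g_w = x.copy()                                    # d y_hat / d w_i = x_i
g_V = np.outer(x, x @ V) - V * (x ** 2)[:, None]  # x_i sum_j v_{j,f} x_j - v_{i,f} x_i^2

# Finite-difference check for one entry of V
eps = 1e-6
i, f = 2, 1
Vp = V.copy(); Vp[i, f] += eps
Vm = V.copy(); Vm[i, f] -= eps
num = (fm_forward(x, w0, w, Vp) - fm_forward(x, w0, w, Vm)) / (2 * eps)
assert np.isclose(num, g_V[i, f], atol=1e-5)
```

Note that the inner sum $$\sum_j v_{j,f} x_j$$ is independent of $$i$$, which is what makes the gradient computable in $$O(kn)$$ as well.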