[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255122#comment-15255122 ]

Ben McCann commented on SPARK-7008:
---

I've found a number of implementations:
https://github.com/zhengruifeng/spark-libFM
https://github.com/skrusche63/spark-fm
https://github.com/blebreton/spark-FM-parallelSGD
https://github.com/witgo/zen/tree/master/ml/src/main/scala/com/github/cloudml/zen/ml/recommendation

> An implementation of Factorization Machine (LibFM)
> --
>
> Key: SPARK-7008
> URL: https://issues.apache.org/jira/browse/SPARK-7008
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: zhengruifeng
> Labels: features
> Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png
>
>
> An implementation of Factorization Machines based on Scala and Spark MLlib.
> FM is a machine learning algorithm for multi-linear regression and is widely used for recommendation.
> FM has performed well in recommendation competitions in recent years.
> Ref:
> http://libfm.org/
> http://doi.acm.org/10.1145/2168752.2168771
> http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
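The issue description characterizes FM only in prose. For reference, the degree-2 model in the cited Rendle (2010) paper is $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, and the pairwise term can be rewritten as $\frac{1}{2} \sum_{f=1}^{k} \big[ (\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2 \big]$, which makes prediction linear in the number of non-zero features. The following is a minimal plain-Scala sketch of that prediction (no Spark dependency). The `FMModel` class and its method names are hypothetical and are not the API of any implementation linked above; sparse inputs are assumed to be arrays of `(index, value)` pairs with distinct indices.

```scala
// Minimal sketch of degree-2 Factorization Machine prediction (Rendle, 2010).
// FMModel and its methods are hypothetical names used only for illustration.
final case class FMModel(
    w0: Double,             // global bias
    w: Array[Double],       // linear weights, one per feature
    v: Array[Array[Double]] // v(i)(f): latent factor f of feature i
) {
  // Number of latent factors k (0 if the model has no features).
  val k: Int = if (v.isEmpty) 0 else v(0).length

  // Linear part: w0 + sum_i w_i * x_i over the non-zero entries.
  private def linear(x: Array[(Int, Double)]): Double =
    x.foldLeft(w0) { case (acc, (i, xi)) => acc + w(i) * xi }

  // Reference O(k * nnz^2) version: explicit sum over all pairs i < j.
  def predictNaive(x: Array[(Int, Double)]): Double = {
    var y = linear(x)
    for (a <- x.indices; b <- (a + 1) until x.length) {
      val (i, xi) = x(a)
      val (j, xj) = x(b)
      var dot = 0.0
      for (f <- 0 until k) dot += v(i)(f) * v(j)(f)
      y += dot * xi * xj
    }
    y
  }

  // O(k * nnz) version using the sum-of-squares identity.
  def predict(x: Array[(Int, Double)]): Double = {
    var y = linear(x)
    for (f <- 0 until k) {
      var sum = 0.0   // sum_i v(i)(f) * x_i
      var sumSq = 0.0 // sum_i (v(i)(f) * x_i)^2
      for ((i, xi) <- x) {
        val t = v(i)(f) * xi
        sum += t
        sumSq += t * t
      }
      y += 0.5 * (sum * sum - sumSq)
    }
    y
  }
}
```

Both methods should agree up to floating-point rounding. The naive version is only meant as a correctness check, since the identity-based `predict` is what keeps per-example cost proportional to k times the number of non-zero features.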
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970870#comment-14970870 ]

Nick Pentreath commented on SPARK-7008:
---

Is this now going into 1.6 (as per SPARK-10324)? If so, is there a PR? I cannot find a related one.
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658621#comment-14658621 ]

Xiangrui Meng commented on SPARK-7008:
--

I left the JIRA open but removed the target version. I like the algorithm, but I think we want to hear more success stories of FM before we add it to MLlib.
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621830#comment-14621830 ]

zhengruifeng commented on SPARK-7008:
-

Yes, LBFGS provides a faster convergence rate.
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573820#comment-14573820 ]

DB Tsai commented on SPARK-7008:

Do you see a better convergence rate when LBFGS is used?
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513780#comment-14513780 ]

zhengruifeng commented on SPARK-7008:
-

AdaGrad works pretty well in practice, but I think there should be a separate issue to add it to MLlib as a new general-purpose Optimizer. In my humble opinion, it may also be better not to tie new algorithms to a specific Optimizer.
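AdaGrad, raised in this comment and in Guoqiang Li's comment further down, adapts the step size per coordinate by dividing each gradient component by the square root of that coordinate's accumulated squared gradients. In line with the suggestion to keep the optimizer separate from the model, the following minimal plain-Scala sketch implements only the update rule and knows nothing about FM. The `AdaGrad` class and its `eta` and `eps` parameters are hypothetical names; this is not MLlib's `Updater` API.

```scala
// Minimal sketch of the AdaGrad update rule, independent of any model.
// AdaGrad, eta and eps are illustrative names, not an MLlib API.
final class AdaGrad(dim: Int, eta: Double, eps: Double = 1e-8) {
  // Running sum of squared gradients, one entry per coordinate.
  private val sumSqGrad = Array.fill(dim)(0.0)

  // Updates `weights` in place, given the gradient at the current weights:
  //   G_i += g_i^2
  //   w_i -= eta * g_i / (sqrt(G_i) + eps)
  def step(weights: Array[Double], grad: Array[Double]): Unit = {
    require(weights.length == dim && grad.length == dim, "dimension mismatch")
    var i = 0
    while (i < dim) {
      sumSqGrad(i) += grad(i) * grad(i)
      weights(i) -= eta * grad(i) / (math.sqrt(sumSqGrad(i)) + eps)
      i += 1
    }
  }
}
```

Any model, FM included, would only need to supply `grad` at each step, for example the gradient of $(w_0 - 3)^2 + (w_1 + 1)^2$ in a toy quadratic. Whether this beats LBFGS for FM in practice is problem-dependent; the sketch shows only the update itself.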
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513856#comment-14513856 ]

Guoqiang Li commented on SPARK-7008:

[~mengxr], what's your view on what [~podongfeng] said?
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512110#comment-14512110 ]

zhengruifeng commented on SPARK-7008:
-

The convergence curves for binary classification are plotted in the attached FM_CR.xlsx. The dataset http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/url_combined.bz2 was used, and both SGD and LBFGS were tested.
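The comment does not say which loss produced the binary-classification curves. A common choice, and the one assumed here, is the logistic loss on labels $y \in \{-1, +1\}$ applied to the raw FM score $\hat{y}$: $L = \log(1 + e^{-y\hat{y}})$ with $\partial L / \partial \hat{y} = -y\,\sigma(-y\hat{y})$. The following plain-Scala sketch computes both in a numerically stable way; `LogisticLoss` and its methods are hypothetical names.

```scala
// Sketch: logistic loss for binary classification with labels y in {-1, +1},
// applied to a raw FM score yhat. Names are illustrative, not an MLlib API.
object LogisticLoss {
  // Numerically stable log(1 + exp(-y * yhat)).
  def loss(yhat: Double, y: Double): Double = {
    val m = y * yhat
    if (m > 0) math.log1p(math.exp(-m)) else -m + math.log1p(math.exp(m))
  }

  // dL/dyhat = -y * sigmoid(-y * yhat), computed without overflow.
  def dLossDYhat(yhat: Double, y: Double): Double = {
    val m = y * yhat
    val sigNegM =
      if (m > 0) { val e = math.exp(-m); e / (1.0 + e) }
      else 1.0 / (1.0 + math.exp(m))
    -y * sigNegM
  }
}
```

In an SGD update for the FM parameters, this derivative multiplies $\partial \hat{y} / \partial \theta$, just as the residual $(\hat{y} - y)$ does for squared loss, so the rest of the update is unchanged.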
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512238#comment-14512238 ]

Guoqiang Li commented on SPARK-7008:

In practice, {{SGD + AdaGrad}} converges faster and better than {{LBFGS}}.
[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504596#comment-14504596 ]

zhengruifeng commented on SPARK-7008:
-

I had not considered the size of the model, because the problems I usually encounter have a dimensionality below 10 million. For higher dimensionality, I think feature hashing may help to limit the number of features (not sure).

LibFM implements four training algorithms: SGD, AdaptiveSGD, ALS and MCMC. I have only implemented SGD for regression, and I am about to implement SGD for binary classification. In my opinion, SGD is sensitive to the learning rate: large values cause divergence, while small ones make training take a long time.

When coding, I strictly followed LibFM. There are only two differences: LibFM uses plain (per-example) SGD, whereas I use the mini-batch SGD provided by MLlib; and LibFM uses a constant learning rate, whereas I make it decrease with the square root of the iteration counter. So I expect its convergence to be similar to that of LibFM's SGD. I'm testing the library and will post the results in several days. Thanks.
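The two differences described in the last comment can be made concrete. The following sketch performs per-example SGD on the squared loss $\frac{1}{2}(\hat{y} - y)^2$ (the regression case) with step size $\eta_0 / \sqrt{t}$, where $t$ counts updates. The gradients follow from the FM model in the cited Rendle paper: $\partial \hat{y} / \partial w_0 = 1$, $\partial \hat{y} / \partial w_i = x_i$, and $\partial \hat{y} / \partial v_{i,f} = x_i \sum_j v_{j,f} x_j - v_{i,f} x_i^2$. This is plain Scala with per-example updates, not MLlib's mini-batch `GradientDescent`; the class `FMSgd`, `eta0`, the 0.01 initialization scale and the seed are hypothetical choices. A mini-batch variant would average these gradients over a sampled batch before each update.

```scala
import scala.util.Random

// Minimal sketch: per-example SGD for a degree-2 FM on squared loss,
// with learning rate eta0 / sqrt(t). FMSgd is an illustrative name, not an MLlib API.
final class FMSgd(numFeatures: Int, k: Int, eta0: Double, seed: Long = 42L) {
  private val rnd = new Random(seed)
  var w0: Double = 0.0
  val w: Array[Double] = Array.fill(numFeatures)(0.0)
  // Small random initialization of the latent factors v(i)(f).
  val v: Array[Array[Double]] = Array.fill(numFeatures, k)(rnd.nextGaussian() * 0.01)

  // Per-factor sums s_f = sum_j v(j)(f) * x_j over the non-zero entries.
  private def factorSums(x: Array[(Int, Double)]): Array[Double] =
    Array.tabulate(k)(f => x.map { case (j, xj) => v(j)(f) * xj }.sum)

  // Prediction in O(k * nnz) via the sum-of-squares identity.
  def predict(x: Array[(Int, Double)]): Double = {
    val s = factorSums(x)
    var y = w0
    for ((i, xi) <- x) y += w(i) * xi
    for (f <- 0 until k) {
      val sumSq = x.map { case (i, xi) => val t = v(i)(f) * xi; t * t }.sum
      y += 0.5 * (s(f) * s(f) - sumSq)
    }
    y
  }

  // One update on example (x, y); t >= 1 is the global update counter.
  // Assumes the indices in x are distinct.
  def sgdStep(x: Array[(Int, Double)], y: Double, t: Int): Unit = {
    val eta = eta0 / math.sqrt(t.toDouble) // decaying learning rate
    val err = predict(x) - y               // d/dyhat of 0.5 * (yhat - y)^2
    val s = factorSums(x)                  // uses the factors before this update
    w0 -= eta * err
    for ((i, xi) <- x) {
      w(i) -= eta * err * xi
      for (f <- 0 until k) {
        val grad = xi * s(f) - v(i)(f) * xi * xi // d yhat / d v(i)(f)
        v(i)(f) -= eta * err * grad
      }
    }
  }
}
```

To train, call `sgdStep(x, y, t)` over the examples for several epochs, incrementing `t` after every update. Dropping the `/ math.sqrt(t)` factor gives the constant learning rate that the comment attributes to LibFM, which makes the choice of `eta0` correspondingly more delicate.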