[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504596#comment-14504596 ]
zhengruifeng commented on SPARK-7008:
-------------------------------------

I had not considered the size of the model, because the problems I usually encounter have fewer than 10 million dimensions. For higher dimensionality, I think feature hashing may help to limit the number of features (not sure).

LibFM implements four training algorithms: SGD, adaptive SGD, ALS and MCMC. So far I have only implemented SGD for regression, and I am going to implement SGD for binary classification next. In my experience, SGD is sensitive to the learning rate: large values cause divergence, while small values make training slow.

When coding, I followed LibFM closely. There are only two differences:
1. LibFM uses strict (per-example) SGD; I use the mini-batch SGD provided by MLlib.
2. LibFM uses a constant learning rate; I decrease it with the square root of the iteration counter.

So I expect its convergence to be similar to that of LibFM's SGD. I am testing the library now and will post the results in a few days.

Thanks.

> An implementation of Factorization Machine (LibFM)
> --------------------------------------------------
>
>                 Key: SPARK-7008
>                 URL: https://issues.apache.org/jira/browse/SPARK-7008
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2
>            Reporter: zhengruifeng
>              Labels: features, patch
>         Attachments: FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png
>
>
> An implementation of Factorization Machines based on Scala and Spark MLlib.
> Factorization Machine is a kind of machine learning algorithm for
> multi-linear regression, and is widely used for recommendation.
> Factorization Machines have performed well in recent recommendation
> competitions.
> Ref:
> http://libfm.org/
> http://doi.acm.org/10.1145/2168752.2168771
> http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
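
For readers following the thread, the two deviations from LibFM mentioned in the comment (mini-batch updates, and a learning rate that shrinks with the square root of the iteration counter) can be sketched roughly as below. This is not the code attached to the issue: it is a plain-Python illustration with no Spark dependency, and it assumes the schedule is `eta0 / sqrt(t)`, a squared-error loss, and a second-order factorization machine. The names `step_size`, `fm_predict`, `train_fm` and `mse` are hypothetical.

```python
import math
import random


def step_size(eta0, t):
    """Assumed schedule: initial rate eta0 divided by sqrt of the 1-based iteration t."""
    return eta0 / math.sqrt(t)


def fm_predict(x, w0, w, v):
    """Second-order FM score for a sparse example x = {feature_index: value}."""
    k = len(v[0])
    score = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(k):
        # Pairwise term in linear time: 0.5 * ((sum v*x)^2 - sum (v*x)^2)
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        score += 0.5 * (s * s - s_sq)
    return score


def train_fm(data, n_features, k=2, eta0=0.1, iterations=200, batch_size=4, seed=0):
    """Mini-batch SGD on squared error with a 1/sqrt(t) step size."""
    rng = random.Random(seed)
    w0 = 0.0
    w = [0.0] * n_features
    # Small random factors; all-zero factors would never receive a gradient.
    v = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n_features)]
    for t in range(1, iterations + 1):
        batch = rng.sample(data, min(batch_size, len(data)))
        g0 = 0.0
        gw = [0.0] * n_features
        gv = [[0.0] * k for _ in range(n_features)]
        for x, y in batch:
            err = fm_predict(x, w0, w, v) - y
            sums = [sum(v[i][f] * xi for i, xi in x.items()) for f in range(k)]
            g0 += err
            for i, xi in x.items():
                gw[i] += err * xi
                for f in range(k):
                    gv[i][f] += err * xi * (sums[f] - v[i][f] * xi)
        # Average the batch gradient, then apply the decayed step.
        eta = step_size(eta0, t) / len(batch)
        w0 -= eta * g0
        for i in range(n_features):
            w[i] -= eta * gw[i]
            for f in range(k):
                v[i][f] -= eta * gv[i][f]
    return w0, w, v


def mse(data, w0, w, v):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((fm_predict(x, w0, w, v) - y) ** 2 for x, y in data) / len(data)
```

Calling `train_fm(data, n_features)` on a list of `(x, y)` pairs, with each `x` a dict from feature index to value, returns the fitted `(w0, w, v)`. Setting `batch_size=1` and replacing `step_size` with a constant gives the strict SGD variant that the comment attributes to LibFM, which makes it easy to compare the two on a small dataset.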