Re: Is spark streaming +MlLib for online learning?
Hi,

What is the general consensus/roadmap for implementing additional online / streamed trainable models? Apache Spark 1.2.1 currently supports streaming linear regression and clustering, and other streaming linear methods are planned according to the issue tracker. However, I cannot find any details on the issue tracker about online training of a collaborative filter. Judging from another mailing list discussion (http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%3ce07aa61e-eeb9-4ded-be3e-3f04003e4...@storefront.be%3E), incremental training should be possible for ALS. Any plans for the future?

Regards,
mucaho

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-spark-streaming-MlLib-for-online-learning-tp19701p21698.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Is spark streaming +MlLib for online learning?
This feature request is already being tracked: https://issues.apache.org/jira/browse/SPARK-4981
Aiming for 1.4.

Best,
Reza

On Wed, Feb 18, 2015 at 2:40 AM, mucaho muc...@yahoo.com wrote:
> However, I can not find any details on the issue tracker about online training of a collaborative filter. Judging from another mailing list discussion http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%3ce07aa61e-eeb9-4ded-be3e-3f04003e4...@storefront.be%3E incremental training should be possible for ALS. Any plans for the future?
Re: Is spark streaming +MlLib for online learning?
In 1.2, we added streaming k-means: https://github.com/apache/spark/pull/2942

-Xiangrui
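For readers new to the idea, here is a minimal plain-Python sketch of a decayed mini-batch k-means update, the general kind of rule a streaming k-means can use to fold each new batch into existing cluster centers. It does not use Spark and is not taken from the pull request above; the function names, the `decay` parameter, and the weighting scheme are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_centers(
    centers: List[List[float]],
    weights: List[float],
    batch: List[Sequence[float]],
    decay: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """Fold one batch of points into the current centers (hypothetical helper).

    weights[j] is the (possibly discounted) number of points seen so far by
    center j. decay=1.0 weighs all history equally; decay=0.0 forgets it.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assign every point in the batch to its nearest current center.
    for point in batch:
        j = min(range(k), key=lambda i: squared_distance(centers[i], point))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]

    new_centers, new_weights = [], []
    for j in range(k):
        old_weight = weights[j] * decay
        if counts[j] == 0:
            # No new points for this center: keep it, only discount its weight.
            new_centers.append(list(centers[j]))
            new_weights.append(old_weight)
            continue
        total = old_weight + counts[j]
        # Weighted average of the old center and the sum of the new points.
        new_centers.append(
            [(centers[j][d] * old_weight + sums[j][d]) / total for d in range(dim)]
        )
        new_weights.append(total)
    return new_centers, new_weights


# Two 1-D centers, each backed by one earlier point, updated with one batch.
new_centers, new_weights = update_centers(
    [[0.0], [10.0]], [1.0, 1.0], [[2.0], [12.0]]
)
print(new_centers, new_weights)  # [[1.0], [11.0]] [2.0, 2.0]
```

In this sketch only the centers and their weights are carried from batch to batch, so earlier data is never replayed; the `decay` value controls how quickly old batches lose influence.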
Is spark streaming +MlLib for online learning?
Hi Gurus,

Sorry for my naive question; I am new.

I seem to have read somewhere that Spark itself still does batch learning, but that Spark Streaming could allow online learning. I cannot find this on the website now: http://spark.apache.org/docs/latest/streaming-programming-guide.html

I know MLlib uses incremental or iterative algorithms, and I wonder whether this also holds between batches of Spark Streaming.

So the question is: when I call MLlib linear regression, does training use one batch of data as the training data? If yes, is the model update between batches already taken care of? That is, will the model eventually use all the data that has arrived from the beginning until the time of scoring as training data, or will it only use data from a limited number of recent batches?

Many thanks!
J
Re: Is spark streaming +MlLib for online learning?
Hi,

On Tue, Nov 25, 2014 at 9:40 AM, Joanne Contact joannenetw...@gmail.com wrote:
> I seemed to read somewhere that spark is still batch learning, but spark streaming could allow online learning.

Spark doesn't do machine learning itself, but MLlib does. As far as I know, MLlib can currently do online learning only for linear regression: https://spark.apache.org/docs/1.1.0/mllib-linear-methods.html#streaming-linear-regression

Tobias
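To make "online learning for linear regression" concrete, here is a minimal plain-Python sketch of the general idea: each incoming batch is used for a few gradient-descent steps that start from the weights left by the previous batch. It does not use Spark or MLlib; `train_on_batch`, `step_size`, `num_iterations`, and the toy data are all assumptions chosen for illustration, not MLlib's actual implementation.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, label)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear prediction: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_on_batch(
    weights: Sequence[float],
    batch: List[Example],
    step_size: float = 0.5,
    num_iterations: int = 50,
) -> List[float]:
    """Run gradient descent on one batch, starting from the given weights."""
    w = list(weights)
    n = len(batch)
    if n == 0:
        return w  # Empty batch: the model is left unchanged.
    for _ in range(num_iterations):
        # Gradient of the mean squared error (up to a constant factor).
        grad = [0.0] * len(w)
        for features, label in batch:
            error = predict(w, features) - label
            for i, x in enumerate(features):
                grad[i] += error * x / n
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w


# Toy stream generated from y = 1 + 2 * x; the first feature is a constant 1.0
# that plays the role of the intercept.
batches = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, -1.0], -1.0)],
    [([1.0, 0.5], 2.0), ([1.0, 1.5], 4.0)],
]

weights = [0.0, 0.0]
for batch in batches:
    # Only the weights are carried over; earlier batches are not stored.
    weights = train_on_batch(weights, batch)

print(weights)  # close to [1.0, 2.0]
```

In a scheme like this, older batches influence the model only through the weights they left behind: nothing is retrained on the full history, and nothing is explicitly restricted to a fixed window of recent batches.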
Re: Is spark streaming +MlLib for online learning?
Thank you Tobias!

On Mon, Nov 24, 2014 at 5:13 PM, Tobias Pfeiffer t...@preferred.jp wrote:
> MLlib currently can do online learning only for linear regression https://spark.apache.org/docs/1.1.0/mllib-linear-methods.html#streaming-linear-regression, as far as I know.