Hi Gurus,

Sorry for the naive question; I am new to Spark.

I seem to recall reading somewhere that Spark itself still does batch
learning, but that Spark Streaming could allow online learning.

I can no longer find this on the website, for example in the streaming guide:

http://spark.apache.org/docs/latest/streaming-programming-guide.html

I know MLlib uses incremental or iterative algorithms; I wonder whether this
also holds across batches in Spark Streaming.

So the question is: when I call MLlib linear regression on streaming data,
does training use one batch of data as the training set? If so, is the model
update between batches already taken care of? That is, will the model
eventually have been trained on all data that arrived from the beginning up
to the time of scoring, or does it only use data from a limited number of
recent batches?
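
To make the two cases I am asking about concrete, here is a toy sketch in plain Python. It does not use Spark, the function names are made up for illustration, and I am not claiming that either one is what MLlib actually does. It only shows the difference between carrying the weights over from batch to batch and refitting on a fixed number of recent batches.

```python
from collections import deque

def sgd_pass(weights, batch, lr=0.05):
    """One SGD pass over a batch of (x, y) pairs for the model y = w*x + b."""
    w, b = weights
    for x, y in batch:
        err = (w * x + b) - y      # prediction error on this point
        w -= lr * err * x          # gradient step for the slope
        b -= lr * err              # gradient step for the intercept
    return (w, b)

def train_cumulative(batches, lr=0.05):
    """Case A (hypothetical): weights carry over between batches,
    so every batch seen so far has influenced the current model."""
    weights = (0.0, 0.0)
    for batch in batches:
        weights = sgd_pass(weights, batch, lr)
    return weights

def train_windowed(batches, window=2, lr=0.05, epochs=50):
    """Case B (hypothetical): on each new batch, refit from scratch
    using only the last `window` batches; older data is forgotten."""
    recent = deque(maxlen=window)
    weights = (0.0, 0.0)
    for batch in batches:
        recent.append(batch)
        weights = (0.0, 0.0)       # discard the previous model
        for _ in range(epochs):
            for past in recent:
                weights = sgd_pass(weights, past, lr)
    return weights
```

In case A, prepending extra old batches changes the final weights; in case B with window=1, it does not, because only the newest batch is ever used for the final fit. My question is which of these (if either) matches what happens between batches in Spark Streaming.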


Many thanks!

J
