Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-18 Thread diplomatic Guru
Hi Jorge, Thanks for the example. I managed to get the job to run, but the results are appalling. The best I could get is:

Test Mean Squared Error: 684.3709679595169
Learned regression tree model: DecisionTreeModel regressor of depth 30 with 6905 nodes

I tried tweaking maxDepth and maxBins but I
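One rough way to judge an MSE of this size is to take its square root (RMSE, which is in the same units as the page-view counts) and to compare the model against a naive baseline that always predicts the mean. The sketch below is plain Python, not MLlib code; `labels` and `predictions` are made-up placeholder values, not data from this thread.

```python
import math


def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def baseline_mse(labels):
    """MSE of a naive model that always predicts the mean label."""
    mean = sum(labels) / len(labels)
    return mean_squared_error(labels, [mean] * len(labels))


# The MSE reported in the message above; RMSE is in page-view units.
reported_mse = 684.3709679595169
reported_rmse = math.sqrt(reported_mse)

# Hypothetical labels and predictions, for illustration only.
labels = [1.0, 999.0, 200.0, 50.0]
predictions = [10.0, 950.0, 220.0, 40.0]

model_mse = mean_squared_error(labels, predictions)
naive_mse = baseline_mse(labels)

print("reported RMSE:", reported_rmse)
print("model MSE:", model_mse, "baseline MSE:", naive_mse)
```

If the model's MSE is not clearly below the baseline's on held-out data, the features probably carry little signal, whatever the tree depth.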

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-02 Thread diplomatic Guru
Hi Jorge, Unfortunately, I couldn't transform the data as you suggested. This is what I get:

+---+---------+-------------+
| id|pageIndex|      pageVec|
+---+---------+-------------+
|0.0|      3.0|    (3,[],[])|
|1.0|      0.0|(3,[0],[1.0])|
|2.0|      2.0|(3,[2],[1.0])|
|3.0|
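The empty vector `(3,[],[])` for pageIndex 3.0 is consistent with Spark's OneHotEncoder dropping the last category by default (its `dropLast` parameter), in which case the highest index is encoded as an all-zero vector rather than being lost. The plain-Python sketch below mimics the `(size, indices, values)` sparse notation to show the effect. `one_hot_sparse` is a hypothetical helper, not a Spark API, and the count of four categories is an assumption read off the output above.

```python
def one_hot_sparse(index, num_categories, drop_last=True):
    """Return a (size, indices, values) triple in sparse-vector notation.

    With drop_last=True the last category has no active entry, so the
    vector has num_categories - 1 slots and the last index maps to all zeros.
    """
    i = int(index)
    if not 0 <= i < num_categories:
        raise ValueError("index out of range: %r" % (index,))
    size = num_categories - 1 if drop_last else num_categories
    if i < size:
        return (size, [i], [1.0])
    return (size, [], [])


# Assumed: four distinct page indices (0.0 to 3.0), as in the output above.
for page_index in (3.0, 0.0, 2.0):
    print(page_index, one_hot_sparse(page_index, 4))
```

With `drop_last=False` the same index would get its own slot, at the cost of one extra, linearly dependent column.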

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-01 Thread Jorge Machado
Hi Guru, First, transform your page names with OneHotEncoder ( https://spark.apache.org/docs/latest/ml-features.html#onehotencoder ), then do the same for the months. You will end up with something like: (first
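As a rough illustration of the idea (encode the page, encode the month, and combine both into one feature vector per row), here is a plain-Python sketch. It is not the Spark pipeline from this message: `one_hot` and `build_features` are hypothetical helpers, the page list is made up, and it keeps every category as its own slot for readability.

```python
def one_hot(value, categories):
    """Dense one-hot list: 1.0 in the slot of `value`, 0.0 elsewhere."""
    if value not in categories:
        raise ValueError("unknown category: %r" % (value,))
    return [1.0 if value == c else 0.0 for c in categories]


def build_features(page, month, pages, months):
    """Concatenate the page encoding and the month encoding."""
    return one_hot(page, pages) + one_hot(month, months)


# Hypothetical category lists, for illustration only.
pages = ["pageA", "pageB", "pageZ"]
months = list(range(1, 13))  # 1 = January ... 12 = December

row = build_features("pageB", 2, pages, months)
print(len(row), row)
```

Each row would then be paired with that month's view count as the regression label.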

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-01 Thread diplomatic Guru
Any suggestions please?

On 29 January 2016 at 22:31, diplomatic Guru wrote:
> Hello guys,
>
> I'm trying to understand how I could predict the next month's page views based
> on the previous access pattern.
>
> For example, I've collected statistics on page views:
>
> e.g.

[MLlib] What is the best way to forecast the next month page visit?

2016-01-29 Thread diplomatic Guru
Hello guys, I'm trying to understand how I could predict the next month's page views based on the previous access pattern. For example, I've collected statistics on page views, e.g.:

Page,UniqueView
---------------
pageA, 1
pageB, 999
...
pageZ,200

I aggregate the statistics monthly.
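The monthly aggregation step could look roughly like the plain-Python sketch below. The record layout (an ISO `YYYY-MM-DD` date string, a page name and a view count) and the sample values are assumptions made for illustration; the original message does not say how the raw statistics are stored.

```python
from collections import defaultdict


def aggregate_monthly(records):
    """Sum view counts per (month, page), where month is 'YYYY-MM'."""
    totals = defaultdict(int)
    for date, page, views in records:
        month = date[:7]  # assumes ISO dates such as '2016-01-29'
        totals[(month, page)] += views
    return dict(totals)


# Hypothetical raw records, for illustration only.
records = [
    ("2016-01-03", "pageA", 1),
    ("2016-01-17", "pageB", 400),
    ("2016-01-29", "pageB", 599),
    ("2016-02-01", "pageB", 10),
]

monthly = aggregate_monthly(records)
print(monthly)
```

The resulting (month, page) totals are the rows a regression model would be trained on.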