Hi Xiangrui,

Changing the step size from the default to 0.01 made a huge difference. The results now make sense when I use A + B + C + D: MSE is ~0.07 and the outcome matches domain knowledge. Is there any documentation on these parameters and on when and how to vary them?
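One plausible reason the step size mattered so much: for gradient descent on squared error, the largest stable step shrinks as the feature vectors get larger, and using all four classes (roughly 22K features) makes each row's norm bigger than any subset does. The following is a minimal pure-Python sketch of that effect under stated assumptions, not MLlib's implementation. It uses full-batch gradient descent on half the mean squared error with a step that decays as `step_size / sqrt(t)` (an assumption, modeled on our understanding of MLlib's SGD schedule), on synthetic, hypothetical data with 50 dense features. The helpers `gradient_descent` and `mse` are hypothetical names for illustration. A step of 1.0 (which we believe is the MLlib default) blows up, while 0.01 steadily reduces the error. In MLlib 1.x itself, the corresponding knob is, as far as we know, the `stepSize` argument of `LinearRegressionWithSGD.train`.

```python
import math
import random


def gradient_descent(rows, labels, step_size, num_iterations):
    """Full-batch gradient descent on half the mean squared error.

    Hypothetical helper for illustration only. The decaying schedule
    step_size / sqrt(t) is an assumption, not a copy of MLlib's code.
    """
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for t in range(1, num_iterations + 1):
        # Gradient of (1 / 2n) * sum((w . x - y)^2) is (1 / n) * sum((w . x - y) * x).
        grad = [0.0] * d
        for x, y in zip(rows, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(d):
                grad[j] += err * x[j]
        lr = step_size / math.sqrt(t)  # assumed decaying step
        for j in range(d):
            w[j] -= lr * grad[j] / n
    return w


def mse(rows, labels, w):
    """Mean squared error of the linear model w on (rows, labels)."""
    total = 0.0
    for x, y in zip(rows, labels):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        total += err * err
    return total / len(rows)


# Synthetic, hypothetical data: 100 rows, 50 dense positive features,
# labels generated without noise from true weights that are all 1.0.
rng = random.Random(0)
num_rows, num_features = 100, 50
rows = [[rng.uniform(0.5, 1.5) for _ in range(num_features)] for _ in range(num_rows)]
labels = [sum(x) for x in rows]

initial_mse = mse(rows, labels, [0.0] * num_features)
mse_large = mse(rows, labels, gradient_descent(rows, labels, 1.0, 20))
mse_small = mse(rows, labels, gradient_descent(rows, labels, 0.01, 20))

print(f"initial MSE:     {initial_mse:.3g}")
print(f"step 1.0,  MSE:  {mse_large:.3g}")  # grows far above the initial MSE
print(f"step 0.01, MSE:  {mse_small:.3g}")  # falls below the initial MSE
```

While the step stays above the stable threshold, extra iterations only amplify the error, which is consistent with the observation in this thread that going from 100 to 400 iterations did not help.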
> Date: Tue, 7 Oct 2014 15:11:39 -0700
> Subject: Re: MLLib Linear regression
> From: men...@gmail.com
> To: ssti...@live.com
> CC: user@spark.apache.org
>
> Did you test different regularization parameters and step sizes? In
> the combination that works, I don't see "A + D". Did you test that
> combination? Is there any linear dependency between A's columns and
> D's columns?
>
> -Xiangrui
>
> On Tue, Oct 7, 2014 at 1:56 PM, Sameer Tilak <ssti...@live.com> wrote:
> > BTW, one detail:
> >
> > When the number of iterations is 100, all weights are zero or below and
> > the indices are only from set A.
> >
> > When the number of iterations is 150, I see 30+ non-zero weights (when
> > sorted by weight) and the indices are distributed across all sets.
> > However, MSE is high (5.xxx) and the result does not match the domain
> > knowledge.
> >
> > When the number of iterations is 400, I see 30+ non-zero weights (when
> > sorted by weight) and the indices are distributed across all sets.
> > However, MSE is high (6.xxx) and the result does not match the domain
> > knowledge.
> >
> > Any help will be highly appreciated.
> >
> > ________________________________
> > From: ssti...@live.com
> > To: user@spark.apache.org
> > Subject: MLLib Linear regression
> > Date: Tue, 7 Oct 2014 13:41:03 -0700
> >
> > Hi All,
> > I have the following classes of features:
> >
> > class A: 15000 features
> > class B: 170 features
> > class C: 900 features
> > class D: 6000 features
> >
> > I use linear regression (over sparse data). I get excellent results with
> > low RMSE (~0.06) for the following combinations of classes:
> > 1. A + B + C
> > 2. B + C + D
> > 3. A + B
> > 4. A + C
> > 5. B + D
> > 6. C + D
> > 7. D
> >
> > Unfortunately, when I use A + B + C + D (all the features), I get results
> > that don't make any sense: all weights are zero or below and the indices
> > are only from set A. I also get a high MSE. I changed the number of
> > iterations from 100 to 150, 250, or even 400, and I still get an MSE of
> > around 5 to 6.
> > Are there any other parameters that I can play with? Any insight into
> > what could be wrong? Could it somehow be unable to scale up to 22K
> > features? (I highly doubt that.)
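Xiangrui's question about linear dependency between A's and D's columns can be checked by comparing ranks: if rank([A | D]) is smaller than rank(A) + rank(D), then some nonzero combination of D's columns equals a combination of A's columns, and the least-squares weights are no longer unique. The following is a minimal dense, pure-Python sketch on hypothetical toy blocks `A`, `D_dep`, and `D_indep`, using Gaussian elimination with a tolerance; the helpers `matrix_rank`, `hstack`, and `has_cross_dependency` are hypothetical names. It is not meant for the real 15,000 + 6,000 sparse columns, which would need a sparse or distributed method.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    if not m:
        return 0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the row (at or below the current rank) with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column depends on earlier pivot columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def hstack(left, right):
    """Place two matrices with the same number of rows side by side."""
    return [a + b for a, b in zip(left, right)]


def has_cross_dependency(a_block, d_block):
    """True if some nonzero combination of d_block's columns lies in a_block's column span."""
    return matrix_rank(hstack(a_block, d_block)) < matrix_rank(a_block) + matrix_rank(d_block)


# Hypothetical toy feature blocks (rows are examples, columns are features).
A = [[1, 0], [0, 1], [1, 1], [2, 0]]
# First column of D_dep equals A's column 0 plus A's column 1.
D_dep = [[1, 3], [1, 0], [2, 1], [2, 5]]
# D_indep shares no direction with A's column span.
D_indep = [[0, 3], [0, 0], [1, 1], [0, 5]]

print(has_cross_dependency(A, D_dep))    # True
print(has_cross_dependency(A, D_indep))  # False
```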