Hi Xiangrui,
Changing the step size from its default to 0.01 made a huge difference. The
results now make sense when I use A + B + C + D: MSE is ~0.07 and the outcome
matches the domain knowledge.
Is there any documentation on the parameters and on when and how to vary
them?
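
For anyone who finds this thread later, here is a rough sketch of why the
step size mattered so much. It is plain Python, not MLlib: it runs batch
gradient descent with a constant step on a tiny made-up dataset, and I
believe MLlib's SGD schedules its step differently, so treat it only as an
illustration. The data, the function names and the 50-iteration count are
all my own assumptions.

```python
# Illustrative only: constant-step batch gradient descent for least squares.
# The dataset and helper names are hypothetical, not taken from MLlib.

def predict(w, row):
    """Dot product of the weight vector and one feature row."""
    return sum(wj * xj for wj, xj in zip(w, row))

def mse(X, y, w):
    """Mean squared error of weights w on (X, y)."""
    errors = [predict(w, row) - target for row, target in zip(X, y)]
    return sum(e * e for e in errors) / len(y)

def gradient_descent(X, y, step, iterations):
    """Minimise the MSE starting from all-zero weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        errors = [predict(w, row) - target for row, target in zip(X, y)]
        # Gradient of the MSE with respect to each weight.
        grad = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
                for j in range(d)]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w

# Made-up data generated exactly by y = 2*x1 - 1*x2 (no noise).
X = [[3, 6], [6, 3], [9, 9], [0, 3], [6, 0], [3, 9]]
y = [0, 9, 9, -3, 12, -3]

w_large = gradient_descent(X, y, step=1.0, iterations=50)
w_small = gradient_descent(X, y, step=0.01, iterations=50)
mse_large = mse(X, y, w_large)
mse_small = mse(X, y, w_small)

print("step=1.0  weights:", w_large, "MSE:", mse_large)
print("step=0.01 weights:", w_small, "MSE:", mse_small)
```

With the large step every update overshoots the minimum and the weights
blow up, while the small step settles close to the true weights (2, -1).
More iterations cannot rescue a step that is too large, which is consistent
with what I saw when going from 100 to 400 iterations.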

> Date: Tue, 7 Oct 2014 15:11:39 -0700
> Subject: Re: MLLib Linear regression
> From: men...@gmail.com
> To: ssti...@live.com
> CC: user@spark.apache.org
> 
> Did you test different regularization parameters and step sizes? In
> the combination that works, I don't see "A + D". Did you test that
> combination? Is there any linear dependency between A's columns and
> D's columns? -Xiangrui
> 
> On Tue, Oct 7, 2014 at 1:56 PM, Sameer Tilak <ssti...@live.com> wrote:
> > BTW, one detail:
> >
> > When the number of iterations is 100, all weights are zero or below and
> > the indices are only from set A.
> >
> > When the number of iterations is 150, I see 30+ non-zero weights (when
> > sorted by weight) and the indices are distributed across all sets. However,
> > MSE is high (5.xxx) and the result does not match the domain knowledge.
> >
> > When the number of iterations is 400, I see 30+ non-zero weights (when
> > sorted by weight) and the indices are distributed across all sets. However,
> > MSE is high (6.xxx) and the result does not match the domain knowledge.
> >
> > Any help will be highly appreciated.
> >
> >
> > ________________________________
> > From: ssti...@live.com
> > To: user@spark.apache.org
> > Subject: MLLib Linear regression
> > Date: Tue, 7 Oct 2014 13:41:03 -0700
> >
> >
> > Hi All,
> > I have the following classes of features:
> >
> > class A: 15000 features
> > class B: 170 features
> > class C: 900 features
> > class D: 6000 features
> >
> > I use linear regression (over sparse data). I get excellent results with low
> > RMSE (~0.06) for the following combinations of classes:
> > 1. A + B + C
> > 2. B + C + D
> > 3. A + B
> > 4. A + C
> > 5. B + D
> > 6. C + D
> > 7. D
> >
> > Unfortunately, when I use A + B + C + D (all the features) I get results
> > that don't make any sense -- all weights are zero or below and the indices
> > are only from set A. I also get a high MSE. I changed the number of
> > iterations from 100 to 150, 250, or even 400, and I still get an MSE of
> > about 5 or 6. Are there any other parameters that I can play with? Any
> > insight on what could be wrong? Is it somehow unable to scale up to 22K
> > features? (I highly doubt that.)
> >
> >
> >
> 
                                          
