BTW, one detail:
When the number of iterations is 100, all weights are zero or below and the
indices are only from set A.
When the number of iterations is 150, I see 30+ non-zero weights (when sorted
by weight) and the indices are distributed across all sets. However, the MSE
is high (5.xxx) and the result does not match the domain knowledge.
When the number of iterations is 400, I see 30+ non-zero weights (when sorted
by weight) and the indices are distributed across all sets. However, the MSE
is high (6.xxx) and the result does not match the domain knowledge.
Any help will be highly appreciated.
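
For anyone trying to reproduce the check described above (which feature sets the non-zero weights fall into), here is a minimal sketch. It assumes the four classes are concatenated in the order A, B, C, D into one feature vector with 0-based indices. The original message does not state the layout, so that ordering is an assumption, and the helper names class_of and nonzero_by_class are hypothetical, not part of MLlib.

```python
from bisect import bisect_right
from collections import Counter

# Feature-class sizes as given in the original message.
CLASS_SIZES = [("A", 15000), ("B", 170), ("C", 900), ("D", 6000)]

# ASSUMPTION: the classes are concatenated in the order A, B, C, D,
# with 0-based indices. Adjust if the real layout differs.
STARTS = []
_offset = 0
for _name, _size in CLASS_SIZES:
    STARTS.append(_offset)
    _offset += _size
TOTAL = _offset  # total number of features across all classes


def class_of(index):
    """Return the class name ("A".."D") that a feature index belongs to."""
    if not 0 <= index < TOTAL:
        raise ValueError(f"index {index} outside 0..{TOTAL - 1}")
    return CLASS_SIZES[bisect_right(STARTS, index) - 1][0]


def nonzero_by_class(weights, tol=0.0):
    """Count weights with |w| > tol per class.

    `weights` is an iterable of (index, weight) pairs, for example
    enumerate(...) over the model's weight array.
    """
    counts = Counter()
    for index, w in weights:
        if abs(w) > tol:
            counts[class_of(index)] += 1
    return dict(counts)
```

Passing enumerate(...) over the trained weight array to nonzero_by_class gives the per-class counts, which makes it easier to compare runs with different iteration counts.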

From: ssti...@live.com
To: user@spark.apache.org
Subject: MLLib Linear regression
Date: Tue, 7 Oct 2014 13:41:03 -0700




Hi All,

I have the following classes of features:

class A: 15000 features
class B: 170 features
class C: 900 features
class D: 6000 features

I use linear regression (over sparse data). I get excellent results with low
RMSE (~0.06) for the following combinations of classes:

1. A + B + C
2. B + C + D
3. A + B
4. A + C
5. B + D
6. C + D
7. D

Unfortunately, when I use A + B + C + D (all the features) I get results that
don't make any sense: all weights are zero or below and the indices are only
from set A. I also get a high MSE. I changed the number of iterations from 100
to 150, 250, or even 400, and I still get an MSE of about 5 to 6. Are there any
other parameters that I can play with? Any insight on what could be wrong? Is
it somehow unable to scale up to 22K features? (I highly doubt that.)
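
One note on comparing the numbers above: the good runs are quoted as RMSE and the bad run as MSE. Since RMSE is the square root of MSE, the two can be put on the same scale with a couple of lines. This is only a sketch of the arithmetic, assuming both figures were computed on the same target variable; the function names are illustrative.

```python
import math


def rmse_from_mse(mse):
    """RMSE is the square root of MSE."""
    return math.sqrt(mse)


def mse_from_rmse(rmse):
    """MSE is the square of RMSE."""
    return rmse ** 2


# Figures quoted in the message: MSE of roughly 5 to 6 for A + B + C + D,
# RMSE of roughly 0.06 for the combinations that work.
for mse in (5.0, 6.0):
    print(f"MSE {mse:.1f} -> RMSE {rmse_from_mse(mse):.2f}")
print(f"RMSE 0.06 -> MSE {mse_from_rmse(0.06):.4f}")
```

On the same scale, an MSE of 5 to 6 corresponds to an RMSE of roughly 2.2 to 2.4, against about 0.06 for the working combinations.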


                                                                                
  
