Hi All,
I am able to run LinearRegressionWithSGD on a small sample dataset (a ~60 MB
LIBSVM file of sparse data) with 6700 features.
val model = LinearRegressionWithSGD.train(examples, numIterations)
At the end I get a model whose weight vector has 6699 entries:
model.weights.size
res6: Int = 6699
I am assuming each entry in the
sc.parallelize(model.weights.toArray, blocks).top(k) will get that right?
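One thing to watch with that call: top(k) on the parallelized weights returns the k largest weight values themselves, so the information about which feature each value belongs to is lost. Below is a minimal sketch in plain Scala (no Spark), assuming the weights come from model.weights.toArray so that array position i is feature i. If the data was loaded with MLUtils.loadLibSVMFile, which converts the file's 1-based indices to 0-based, position i should correspond to LIBSVM index i + 1. The names TopWeights and topKByWeight are hypothetical.

```scala
object TopWeights {
  // Return the k (featureIndex, weight) pairs with the largest signed weights.
  // `weights` stands in for model.weights.toArray: position i is feature i.
  def topKByWeight(weights: Array[Double], k: Int): Array[(Int, Double)] =
    weights.zipWithIndex                 // pair each weight with its position
      .sortBy { case (w, _) => -w }      // largest weight first
      .take(k)
      .map { case (w, i) => (i, w) }     // report as (featureIndex, weight)

  def main(args: Array[String]): Unit = {
    val weights = Array(0.5, -2.0, 1.5, 0.1) // hypothetical toy weights
    topKByWeight(weights, 2).foreach { case (i, w) => println(s"feature $i: $w") }
  }
}
```

With only a few thousand weights, the vector fits comfortably on the driver, so distributing it with sc.parallelize is probably unnecessary for this step.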
For logistic regression you might want both positive and negative features, so
just map the weights through abs and then pick top(k).
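The ranking by absolute value suggested above could look like the following minimal sketch, again in plain Scala and assuming `weights` is model.weights.toArray. It sorts by |w| but keeps the signed weight in the result, so a strongly negative feature is still reported as negative. The names TopAbsWeights and topKByAbsWeight are hypothetical.

```scala
object TopAbsWeights {
  // Rank features by |weight| so that large negative weights are kept too.
  // Returns (featureIndex, signedWeight) pairs, largest magnitude first.
  def topKByAbsWeight(weights: Array[Double], k: Int): Array[(Int, Double)] =
    weights.zipWithIndex
      .sortBy { case (w, _) => -math.abs(w) } // largest magnitude first
      .take(k)
      .map { case (w, i) => (i, w) }          // keep the sign for interpretation

  def main(args: Array[String]): Unit = {
    val weights = Array(0.5, -2.0, 1.5, 0.1) // hypothetical toy weights
    topKByAbsWeight(weights, 2).foreach { case (i, w) => println(s"feature $i: $w") }
  }
}
```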
On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak ssti...@live.com wrote:
Hi All,
I am able to
The importance should be based on some statistics, for example, the
standard deviation of the feature column and the magnitude of the
weight. If the columns are scaled to unit standard deviation (using
StandardScaler), you can tell the importance by the absolute value of
the weight. But there are
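One way to combine the two statistics mentioned above, without retraining on scaled data, is to score feature j as |w_j| * sigma_j. For an unregularized linear model, dividing column j by sigma_j and multiplying w_j by sigma_j leaves the predictions unchanged, so this score approximates the absolute weight a model trained on unit-standard-deviation columns would assign. With regularization or SGD the two can differ somewhat. The following is a minimal sketch on a small dense matrix in plain Scala, assuming at least two rows and using the sample (n - 1) standard deviation. The names ScaledImportance, columnStdDevs and importance are hypothetical.

```scala
object ScaledImportance {
  // Sample (n - 1) standard deviation of each column of a dense row-major matrix.
  // Assumes rows.length >= 2 and all rows have the same length.
  def columnStdDevs(rows: Array[Array[Double]]): Array[Double] = {
    val n = rows.length
    val d = rows.head.length
    val means = Array.tabulate(d)(j => rows.map(_(j)).sum / n)
    Array.tabulate(d) { j =>
      val sumSq = rows.map(r => math.pow(r(j) - means(j), 2)).sum
      math.sqrt(sumSq / (n - 1))
    }
  }

  // importance_j = |w_j| * sigma_j, i.e. the |weight| feature j would roughly
  // get if its column were rescaled to unit standard deviation.
  def importance(weights: Array[Double], stdDevs: Array[Double]): Array[Double] =
    weights.zip(stdDevs).map { case (w, s) => math.abs(w) * s }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: feature 0 has a small spread, feature 1 a large one.
    val rows = Array(Array(1.0, 100.0), Array(2.0, 300.0), Array(3.0, 200.0))
    val weights = Array(2.0, 0.05)
    val scores = importance(weights, columnStdDevs(rows))
    scores.zipWithIndex.foreach { case (s, j) => println(s"feature $j: $s") }
  }
}
```

In this toy example feature 0 has the larger raw weight, but feature 1 scores higher once its much larger spread is taken into account. On a real RDD of LabeledPoint, the per-column variance from MLlib's Statistics.colStats summary could stand in for columnStdDevs.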