I guess you're not actually using that many features (e.g. 10M); it's just
that hashing the index makes it look that way. Is that correct?
If so, a simple dictionary that maps each feature's hashed index to its
rank can be broadcast and used everywhere, so you can pass MLlib just the
feature's rank as its index.
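A minimal sketch of that remapping (all names here are hypothetical, and it assumes the distinct hashed indices actually present in the data fit in a driver-side map; in Spark the resulting map would be broadcast with `sc.broadcast` and applied in a `map` over the RDD):

```scala
import scala.util.hashing.MurmurHash3

object FeatureRanks {
  // Hypothetical hashing-trick index: hash a feature name into a large space.
  def hashedIndex(feature: String, space: Int = 1 << 24): Int = {
    val h = MurmurHash3.stringHash(feature)
    ((h % space) + space) % space // keep the index non-negative
  }

  // Build the index -> rank dictionary from the distinct hashed indices
  // actually observed. Ranks are compact: 0 until numDistinct.
  def rankMap(hashedIndices: Seq[Int]): Map[Int, Int] =
    hashedIndices.distinct.sorted.zipWithIndex.toMap

  def main(args: Array[String]): Unit = {
    val features = Seq("user=42", "country=US", "ad=7")
    val idx = features.map(hashedIndex(_))
    val ranks = rankMap(idx)
    // Each feature now gets a compact index in [0, numDistinct), so the
    // dense weight vector only needs numDistinct entries, not the full
    // hashing space.
    idx.foreach(i => println(s"$i -> ${ranks(i)}"))
  }
}
```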
Reza
Hi,
Currently in GradientDescent.scala, weights is constructed as a dense
vector:
initialWeights = Vectors.dense(new Array[Double](numFeatures))
And numFeatures is determined in loadLibSVMFile as the maximum feature
index.
But in the case of using hash function to compute feature
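To illustrate why sizing the dense vector by the maximum index is costly under hashing (the 2^24 hashing space below is an assumed figure, not one from the thread):

```scala
object HashedSize {
  def main(args: Array[String]): Unit = {
    // Assumed hashing-trick space of 2^24 possible indices.
    val numFeatures = 1 << 24
    // Memory for Vectors.dense(new Array[Double](numFeatures)):
    // 8 bytes per Double, regardless of how few features are active.
    val bytes = numFeatures.toLong * 8
    println(s"dense weights: ${bytes / (1 << 20)} MiB") // 128 MiB
  }
}
```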