Hi,
I see ALS is still using Array[Int], but for the other MLlib algorithms we
moved to Vector[Double] so that they can support both dense and sparse
formats. ALS could stay with Array[Int], since the Netflix format for input
datasets is well defined, but it would help to move ALS to Vector[Double] as
well. That way all the algorithms would be consistent.
The second issue is that toString on SparseVector does not write libsvm
format but something less generic. Can we change SparseVector.toString to
write libsvm output? I am dumping a sample of a dataset to see how the MLlib
GLM compares with the glmnet R package for QoR.
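To make the request concrete, here is a rough sketch of the kind of output I mean. The object and method names are hypothetical, not part of MLlib, and it assumes the indices are 0-based and sorted (as in SparseVector) while libsvm lines use 1-based indices.

```scala
// Hypothetical helper for illustration only; not part of MLlib.
object LibSvmFormat {
  // Formats one labeled sparse row as "label index:value index:value ...".
  def toLibSvm(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must have the same length")
    // Assumption: input indices are 0-based and ascending; libsvm expects 1-based indices.
    val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: features).mkString(" ")
  }
}
```

A row with label 1.0 and non-zeros at positions 0 and 3 would then be written as a single space-separated line that libsvm-style readers can parse.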
Thanks.
Deb
On Mon, May 5, 2014 at 4:05 PM, David Hall d...@cs.berkeley.edu wrote:
On Mon, May 5, 2014 at 3:40 PM, DB Tsai dbt...@stanford.edu wrote:
David,
Could we use Int, Long, and Float for the data feature spaces, and Double
for the optimizer?
Yes. Breeze doesn't allow operations on mixed types, so you'd need to
convert the Double vectors to Float if you wanted to, e.g., take a dot
product with the weights vector.
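As a minimal sketch of that conversion step, using plain Scala arrays rather than Breeze's own types (the object and method names here are made up for illustration): the optimizer keeps its weights as Double, and a Float copy is made before combining them with Float features.

```scala
// Illustration with plain arrays; this is not Breeze's API.
object MixedPrecisionDot {
  // Dot product where both operands share one element type (Float).
  def dot(x: Array[Float], w: Array[Float]): Float = {
    require(x.length == w.length, "vectors must have the same length")
    var sum = 0.0f
    var i = 0
    while (i < x.length) {
      sum += x(i) * w(i)
      i += 1
    }
    sum
  }

  // Converts optimizer weights (Double) to a Float copy; precision is reduced.
  def toFloats(w: Array[Double]): Array[Float] = w.map(_.toFloat)
}
```

The conversion allocates a new array on every call, so in practice it would probably be done once per iteration rather than once per example.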
You might also be interested in FeatureVector, which is just a wrapper
around Array[Int] that emulates an indicator vector. It supports dot
products, axpy, etc.
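Roughly, the idea looks like the sketch below. This is a hypothetical stand-in written for this message, not Breeze's actual FeatureVector code: every active index is treated as an entry equal to 1.0.

```scala
// Hypothetical stand-in for illustration; not Breeze's FeatureVector implementation.
final class IndicatorVector(val activeIndices: Array[Int]) {
  // Dot product with dense weights: each active entry is 1.0,
  // so the result is the sum of the weights at the active indices.
  def dot(weights: Array[Double]): Double = {
    var sum = 0.0
    var k = 0
    while (k < activeIndices.length) {
      sum += weights(activeIndices(k))
      k += 1
    }
    sum
  }

  // axpy: y += a * x, where x is this indicator vector; y is updated in place.
  def axpyInto(a: Double, y: Array[Double]): Unit = {
    var k = 0
    while (k < activeIndices.length) {
      y(activeIndices(k)) += a
      k += 1
    }
  }
}
```

Because only the indices are stored, no values array is needed at all, which is where the memory saving comes from for binary features.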
-- David
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, May 5, 2014 at 3:06 PM, David Hall d...@cs.berkeley.edu wrote:
L-BFGS and the other optimizers would not work immediately, as they require
vector spaces over Double. Otherwise it should work.
On May 5, 2014 3:03 PM, DB Tsai dbt...@stanford.edu wrote:
Breeze can take any type (Int, Long, Double, or Float) in the matrix
template.
Sincerely,
DB Tsai
On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com wrote:
Is this a Breeze issue, or can Breeze take templates on Float / Double?
If Breeze can take templates, then it is a minor fix for Vectors.scala,
right?
Thanks.
Deb
On Mon, May 5, 2014 at 2:45 PM, DB Tsai dbt...@stanford.edu wrote:
+1. It would be nice if we could use different types in Vector.
Sincerely,
DB Tsai
On Mon, May 5, 2014 at 2:41 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi,
Why does the MLlib Vector use Double by default?
/**
 * Represents a numeric vector, whose index type is Int and value type is Double.
 */
trait Vector extends Serializable {

  /**
   * Size of the vector.
   */
  def size: Int

  /**
   * Converts the instance to a double array.
   */
  def toArray: Array[Double]
  // ...
}
Don't we need a template on Float/Double? That would give us memory
savings.
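Something along these lines is what I have in mind. It is only a sketch with hypothetical names, not a proposal for the actual MLlib API, and the size estimate counts the raw element payload only (it ignores array headers and any sparse index storage).

```scala
// Hypothetical generic variant of the trait above; not MLlib's actual API.
trait TypedVector[V] extends Serializable {
  def size: Int
  def toArray: Array[V]
}

// Dense implementation backed by a primitive array when V is Float or Double.
final class TypedDenseVector[V](values: Array[V]) extends TypedVector[V] {
  def size: Int = values.length
  def toArray: Array[V] = values
}

object PayloadSize {
  // Raw payload in bytes for n elements: 4 bytes per Float, 8 bytes per Double.
  def floatBytes(n: Int): Long = n.toLong * java.lang.Float.BYTES
  def doubleBytes(n: Int): Long = n.toLong * java.lang.Double.BYTES
}
```

For dense data the Float version would need about half the payload of the Double version, at the cost of reduced precision.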
Thanks.
Deb