Re: Spark on Scala 2.11

2014-05-11 Thread Matei Zaharia
We do want to support it eventually, possibly as early as Spark 1.1 (which we’d 
cross-build on Scala 2.10 and 2.11). If someone wants to look at it before then, 
feel free to do so! Scala 2.11 is very close to 2.10, so I think things will 
mostly work, except possibly for the REPL (which will require porting over code 
from the Scala REPL in each version).

Matei
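For reference, a cross-build like the one described above is usually configured in sbt along these lines (a sketch only, not Spark's actual build definition; version numbers are illustrative):

```scala
// build.sbt — sketch of cross-building one project against 2.10 and 2.11.
scalaVersion := "2.10.4"

crossScalaVersions := Seq("2.10.4", "2.11.1")

// `sbt +compile` / `sbt +publish` then run the task once per listed
// Scala version, producing separately versioned artifacts.
```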

On May 8, 2014, at 6:33 PM, Anand Avati av...@gluster.org wrote:

 Is there an ongoing effort (or intent) to support Spark on Scala 2.11?
 Approximate timeline?
 
 Thanks



Re: Spark on Scala 2.11

2014-05-11 Thread Koert Kuipers
I believe Matei has said before that he would like to cross-build for 2.10
and 2.11, given that the difference is not as big as between 2.9 and 2.10.
But I don't know when this would happen...


On Sat, May 10, 2014 at 11:02 PM, Gary Malouf malouf.g...@gmail.com wrote:

 Considering the team just bumped to 2.10 in 0.9, I would be surprised if
 this is a near term priority.


 On Thu, May 8, 2014 at 9:33 PM, Anand Avati av...@gluster.org wrote:

  Is there an ongoing effort (or intent) to support Spark on Scala 2.11?
  Approximate timeline?
 
  Thanks
 



Re: mllib vector templates

2014-05-11 Thread Debasish Das
Hi,

I see ALS is still using Array[Int], but for the other mllib algorithms we moved
to Vector[Double] so that they can support either dense or sparse formats...

ALS can stay on Array[Int] because of the Netflix format for input datasets,
which is well defined, but it would help if we moved ALS to Vector[Double] as
well... that way all algorithms will be consistent...
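For context, the input format in question is a (user, item, rating) triple per observation. A minimal self-contained sketch of that shape (field names here are illustrative, not necessarily MLlib's exact API):

```scala
// Sketch of the (user, item, rating) triples ALS consumes: user and
// item ids stay integral, only the rating value is floating point —
// which is why ALS can keep Array[Int] indices internally.
case class Rating(user: Int, product: Int, rating: Double)

val ratings = Seq(Rating(0, 10, 4.0), Rating(0, 11, 2.5), Rating(1, 10, 3.0))

println(ratings.count(_.rating >= 3.0))  // 2
```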

The second issue is that toString on SparseVector does not write libsvm
format but something not very generic... can we change
SparseVector.toString to write libsvm output? I am dumping a sample of a
dataset to see how mllib glm compares with the glmnet-R package for QoR...
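For illustration, libsvm's line format is `label index:value ...` with 1-based feature indices. A hedged sketch of such a formatter (a hypothetical helper, not MLlib's actual toString):

```scala
// Hypothetical helper: renders a labeled sparse vector as a libsvm line.
// libsvm uses 1-based feature indices: "label i1:v1 i2:v2 ..."
def toLibSvmLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
  require(indices.length == values.length, "indices and values must align")
  val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
  (label.toString +: features).mkString(" ")
}

println(toLibSvmLine(1.0, Array(0, 3), Array(0.5, 2.0)))  // 1.0 1:0.5 4:2.0
```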

Thanks.
Deb

On Mon, May 5, 2014 at 4:05 PM, David Hall d...@cs.berkeley.edu wrote:

 On Mon, May 5, 2014 at 3:40 PM, DB Tsai dbt...@stanford.edu wrote:

  David,
 
  Could we use Int, Long, Float as the data feature spaces, and Double for
  optimizer?
 

 Yes. Breeze doesn't allow operations on mixed types, so you'd need to
 convert the double vectors to Floats if you wanted, e.g. dot product with
 the weights vector.

 You might also be interested in FeatureVector, which is just a wrapper
 around Array[Int] that emulates an indicator vector. It supports dot
 products, axpy, etc.
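 The idea behind such an indicator-vector wrapper can be sketched in a few lines of plain Scala (an illustration of the concept, not Breeze's actual FeatureVector):

```scala
// An indicator vector stored as its active indices; every stored entry
// is implicitly 1.0, so a dot product just sums the matching weights.
final case class IndicatorVector(indices: Array[Int]) {
  def dot(weights: Array[Double]): Double = indices.map(weights(_)).sum

  // axpy: weights += a * this, touching only the active indices.
  def axpy(a: Double, weights: Array[Double]): Unit =
    indices.foreach(i => weights(i) += a)
}

val w = Array(0.1, 0.2, 0.3, 0.4)
println(IndicatorVector(Array(0, 2)).dot(w))  // 0.1 + 0.3
```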

 -- David


 
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai
 
 
  On Mon, May 5, 2014 at 3:06 PM, David Hall d...@cs.berkeley.edu
 wrote:
 
    LBFGS and other optimizers would not work immediately, as they require
    vector spaces over Double. Otherwise it should work.
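    As a sketch of what type-parameterized vector math can look like in plain Scala (using the standard-library Fractional type class, not MLlib's or Breeze's actual API):

```scala
// Generic dot product over Float or Double via Fractional — illustrative
// only; the same code works for either element type.
def dot[T](a: Array[T], b: Array[T])(implicit f: Fractional[T]): T = {
  import f._  // brings the *, + operators for T into scope
  a.iterator.zip(b.iterator).foldLeft(f.zero) { case (acc, (x, y)) => acc + x * y }
}

println(dot(Array(1.0f, 2.0f), Array(3.0f, 4.0f)))  // 11.0 (Float)
println(dot(Array(1.0, 2.0), Array(3.0, 4.0)))      // 11.0 (Double)
```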
   On May 5, 2014 3:03 PM, DB Tsai dbt...@stanford.edu wrote:
  
 Breeze could take any type (Int, Long, Double, and Float) in the
 matrix template.
   
   
Sincerely,
   
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
   
   
On Mon, May 5, 2014 at 2:56 PM, Debasish Das 
 debasish.da...@gmail.com
wrote:
   
 Is this a breeze issue, or can breeze take templates on float /
 double?

 If breeze can take templates then it is a minor fix for
 Vectors.scala, right?

 Thanks.
 Deb


 On Mon, May 5, 2014 at 2:45 PM, DB Tsai dbt...@stanford.edu
 wrote:

  +1. Would be nice if we could use different types in Vector.
 
 
  Sincerely,
 
  DB Tsai
  ---
  My Blog: https://www.dbtsai.com
  LinkedIn: https://www.linkedin.com/in/dbtsai
 
 
  On Mon, May 5, 2014 at 2:41 PM, Debasish Das 
   debasish.da...@gmail.com
  wrote:
 
   Hi,
  
   Why is the mllib vector using Double as the default?
  
    /**
     * Represents a numeric vector, whose index type is Int and value type
     * is Double.
     */
    trait Vector extends Serializable {

      /**
       * Size of the vector.
       */
      def size: Int

      /**
       * Converts the instance to a double array.
       */
      def toArray: Array[Double]
  
   Don't we need a template on float/double ? This will give us memory
   savings...
  
   Thanks.
  
   Deb