Feed KMeans algorithm with a row major matrix

2014-03-18 Thread Jaonary Rabarisoa
Dear All,

I'm trying to cluster data from native library code with Spark KMeans||. In
my native library the data are represented as a matrix (rows = number of
data points, cols = dimension). For efficiency reasons, the matrix is copied
row-major into a one-dimensional Scala Array, so after the computation I
have an RDD[Array[Double]], but each array holds a whole set of data points
rather than a single point. I need to transform these arrays into
Array[Array[Double]] before running the KMeans|| algorithm.

How can I do this efficiently?



Best regards,


Re: Feed KMeans algorithm with a row major matrix

2014-03-18 Thread Xiangrui Meng
Hi Jaonary,

With the current implementation, you need to call Array.slice to make
each row an Array[Double] and cache the resulting RDD. There is a plan
to support block-wise input data, and I will keep you informed.
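As a minimal sketch of the slicing step: the helper below splits one row-major flat array into per-point arrays of a known dimension. The object name `SplitRows`, the function `splitRows`, and the example dimension are hypothetical, not part of Spark's API; in a Spark job you would apply it per partition, e.g. `data.flatMap(a => SplitRows.splitRows(a, dim)).cache()`, before passing the result to KMeans||.

```scala
object SplitRows {
  // Split a row-major flat array into rows of length `dim`,
  // using Array.slice as suggested above. `dim` is assumed to be
  // the known dimension of each data point.
  def splitRows(flat: Array[Double], dim: Int): Array[Array[Double]] = {
    require(flat.length % dim == 0, "flat length must be a multiple of dim")
    Array.tabulate(flat.length / dim) { i =>
      flat.slice(i * dim, (i + 1) * dim) // one data point per slice
    }
  }

  def main(args: Array[String]): Unit = {
    // Two points of dimension 3, packed row-major.
    val flat = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    val rows = splitRows(flat, 3)
    println(rows.map(_.mkString(",")).mkString(";"))
  }
}
```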

Best,
Xiangrui

On Tue, Mar 18, 2014 at 2:46 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
 [quoted text of the original message elided]