Hi Jaonary,
With the current implementation, you need to call Array.slice to make
each row its own Array[Double] and cache the resulting RDD. There is a
plan to support block-wise input data; I will keep you informed.
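
For reference, here is a minimal sketch of that reshaping. The names
explodeRows, packedRdd, dim, k, and maxIterations are made up for
illustration, and the commented KMeans.train call assumes the pre-1.0
MLlib API that accepts an RDD[Array[Double]]:

import org.apache.spark.rdd.RDD

// Hypothetical input: each element packs several rows of a `dim`-dimensional
// matrix into one flat, row-major Array[Double].
def explodeRows(packed: RDD[Array[Double]], dim: Int): RDD[Array[Double]] =
  packed.flatMap { flat =>
    // Cut one slice of length `dim` per row; assumes flat.length % dim == 0.
    (0 until flat.length / dim).map(i => flat.slice(i * dim, (i + 1) * dim))
  }

// Usage sketch:
// val rows = explodeRows(packedRdd, dim).cache()
// val model = KMeans.train(rows, k, maxIterations)
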
Best,
Xiangrui
On Tue, Mar 18, 2014 at 2:46 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Dear All,
I'm trying to cluster data coming from native library code with Spark KMeans||. In
my native library the data are represented as a matrix (rows = number of data points,
columns = dimension). For efficiency reasons, they are copied row-major into a
one-dimensional Scala Array, so after the computation I have an RDD[Array[Double]],
but each array holds a whole set of data points rather than a single point. I need to
transform these arrays into Array[Array[Double]] before running the KMeans|| algorithm.
How can I do this efficiently?
Best regards,