Re: Considering Spark for large data elements

2015-02-26 Thread Jeffrey Jedele
Hi Rob, I fear your questions will be hard to answer without additional information about what kind of simulations you plan to do. Does int[r][c] basically mean you have a matrix of integers? You could, for example, map this to a row-oriented RDD of integer arrays or to a column-oriented RDD of integer
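
To make the row-oriented versus column-oriented idea concrete, here is a minimal sketch in plain Python. It is not taken from the thread: the helper names `to_row_records` and `to_column_records` and the tiny sample matrix are hypothetical, and the code only builds the per-row and per-column records locally. In Spark, a list like these could then be handed to `SparkContext.parallelize` to obtain an RDD, which is left out here so the sketch runs without a cluster.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Record = Tuple[int, List[int]]


def to_row_records(matrix: Matrix) -> List[Record]:
    """Row-oriented layout: one (row_index, row_values) record per row."""
    return [(r, list(row)) for r, row in enumerate(matrix)]


def to_column_records(matrix: Matrix) -> List[Record]:
    """Column-oriented layout: one (col_index, col_values) record per column.

    Assumes a rectangular matrix (every row has the same length).
    """
    if not matrix:
        return []
    return [(c, list(col)) for c, col in enumerate(zip(*matrix))]


# Hypothetical 2 x 3 stand-in for the int[r][c] data mentioned above.
matrix = [[1, 2, 3], [4, 5, 6]]

row_records = to_row_records(matrix)        # one element per row
column_records = to_column_records(matrix)  # one element per column
```

Which layout fits better probably depends on whether a simulation reads whole rows or whole columns at a time, since each record would be the unit that gets distributed.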

Considering Spark for large data elements

2015-02-25 Thread Rob Sargent
I have an application which might benefit from Spark's distribution/analysis, but I'm worried about the size and structure of my data set. I need to perform several thousand simulations on a rather large data set and I need access to all the generated simulations. The data element is largely