experimental solution to nesting RDDs

2014-09-24 Thread ldmtwo
I want to share and brainstorm on an experiment before I try it all the way. I hope that Spark contributors can comment. To be clear, I do not intend to use MLlib, which gives me only partial control over the work being done and which I have not yet seen scale well enough. I have fundamental questions about

Re: Spark Akka/actor failures.

2014-08-14 Thread ldmtwo
The reason we are not using MLlib and Breeze is the lack of control over the data and performance. After computing the covariance matrix, there isn't much more we can do with it. Many of the methods are private. For now, we need the max value and the corresponding pair of columns. Later, we may

Spark Akka/actor failures.

2014-08-13 Thread ldmtwo
I need help getting around these errors. I have a program that runs fine on smaller input sizes. As the input gets larger, Spark has increasing difficulty staying efficient and running without errors. We have about 46GB free on each node. The workers and executors are configured to use this up

Re: Is There Any Benchmarks Comparing C++ MPI with Spark

2014-06-19 Thread ldmtwo
Here is a partial comparison: http://dspace.mit.edu/bitstream/handle/1721.1/82517/MIT-CSAIL-TR-2013-028.pdf?sequence=2 SciDB uses MPI with Intel HW and libraries. Amazing performance at the cost of more work. In case the link stops working: A Complex Analytics Genomics Benchmark, Rebecca Taft,

How do you run your spark app?

2014-06-19 Thread ldmtwo
I want to ask this, not because I can't read endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep having issues. How do you run /your/ Spark app? I had it working when I was only using yarn+hadoop1 (Cloudera), then I had to get Spark and