the count().
Nick
On Tue, Jun 24, 2014 at 8:47 PM, Alex Boisvert alex.boisv...@gmail.com
wrote:
For the skeptics :), here's a version you can easily reproduce at home:
val rdd1 = sc.parallelize(1 to 1000, 100) // force with 100 partitions
val rdd2 = rdd1.coalesce(100)
val rdd3 = rdd2 map { ... }
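To see what actually happens, the snippet above can be extended to print the partition count and the lineage (a sketch; assumes a live SparkContext `sc`, e.g. in spark-shell):

```scala
val rdd1 = sc.parallelize(1 to 1000, 100)  // force 100 partitions
val rdd2 = rdd1.coalesce(100)
val rdd3 = rdd2.map(identity)

println(rdd3.partitions.length)  // partition count after the map
println(rdd3.toDebugString)      // shows whether coalesce pipelined or introduced a stage boundary
```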
With the following pseudo-code,
val rdd1 = sc.sequenceFile(...) // has 100 partitions
val rdd2 = rdd1.coalesce(100)
val rdd3 = rdd2 map { ... }
val rdd4 = rdd3.coalesce(2)
val rdd5 = rdd4.saveAsTextFile(...) // want only two output files
I would expect the parallelism of the map() operation to be 100, matching the number of input partitions, rather than being collapsed to 2 by the trailing coalesce.
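One way to keep the map at 100-way parallelism and only collapse to two partitions for the final write (a sketch of the same pipeline, using the standard RDD API) is to pass shuffle = true to the final coalesce, which inserts a stage boundary instead of pulling the upstream stages down to two tasks:

```scala
val rdd1 = sc.sequenceFile(...)      // 100 partitions
val rdd3 = rdd1.map { ... }          // runs as 100 tasks
// shuffle = true forces a stage boundary here, so the map above keeps
// its 100-way parallelism and only the write runs as 2 tasks
rdd3.coalesce(2, shuffle = true).saveAsTextFile(...)
```

repartition(2) is equivalent to coalesce(2, shuffle = true), at the cost of shuffling the data.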
You might want to try the built-in RDD.cartesian() method.
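For item-based CF, a common pattern with cartesian (a sketch, not from the thread; `itemVectors` and `cosine` are illustrative placeholders) is to pair every item vector with every other and score each pair:

```scala
// (itemId, feature vector) pairs; placeholder, not from the thread
val itemVectors: RDD[(String, Array[Double])] = ...

val similarities = itemVectors.cartesian(itemVectors)
  .filter { case ((id1, _), (id2, _)) => id1 < id2 }  // keep each unordered pair once
  .map { case ((id1, v1), (id2, v2)) => ((id1, id2), cosine(v1, v2)) }
```

Note that cartesian produces n^2 pairs, so for a large item catalog you would normally prune candidate pairs first.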
On Thu, Apr 24, 2014 at 9:05 PM, Qin Wei wei@dewmobile.net wrote:
Hi All,
I have a problem with the Item-Based Collaborative Filtering recommendation algorithm in Spark.
The basic flow is as below:
I'll provide answers from our own experience at Bizo. We've been using Spark for over a year now and have found it generally better than our previous approaches (mostly Hadoop + Hive).
On Thu, Apr 10, 2014 at 7:11 AM, Andras Nemeth
andras.nem...@lynxanalytics.com wrote:
I. Is it too much magic? Lots