So I tried this:
.mapPartitions(itr = {
itr.grouped(300).flatMap(items = {
myFunction(items)
})
})
and I tried this:
.mapPartitions(itr = {
itr.grouped(300).flatMap(myFunction)
})
I tried making myFunction a method, a function val, and even moving it
into a singleton
The more I'm thinking about this- I may try this instead:
val myChunkedRDD: RDD[List[Event]] = inputRDD.mapPartitions(_
.grouped(300).toList)
I wonder if this would work. I'll try it when I get back to work tomorrow.
Yuyhao, I tried your approach too but it seems to be somehow moving all the
No, only each group should need to fit.
On Wed, Feb 11, 2015 at 2:56 PM, Corey Nolet cjno...@gmail.com wrote:
Doesn't iter still need to fit entirely into memory?
On Wed, Feb 11, 2015 at 5:55 PM, Mark Hamstra m...@clearstorydata.com
wrote:
rdd.mapPartitions { iter =
val grouped =
rdd.mapPartitions { iter =
val grouped = iter.grouped(batchSize)
for (group - grouped) { ... }
}
On Wed, Feb 11, 2015 at 2:44 PM, Corey Nolet cjno...@gmail.com wrote:
I think the word partition here is a tad different than the term
partition that we use in Spark. Basically, I want
Doesn't iter still need to fit entirely into memory?
On Wed, Feb 11, 2015 at 5:55 PM, Mark Hamstra m...@clearstorydata.com
wrote:
rdd.mapPartitions { iter =
val grouped = iter.grouped(batchSize)
for (group - grouped) { ... }
}
On Wed, Feb 11, 2015 at 2:44 PM, Corey Nolet
Check spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala
It can be used through sliding(windowSize: Int) in
spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala
Yuhao
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Thursday, February 12, 2015