I am trying to do the same thing and also wondering what the best strategy is.

Thanks

________________________________________
From: ll <duy.huynh....@gmail.com>
Sent: Wednesday, December 3, 2014 10:28 AM
To: u...@spark.incubator.apache.org
Subject: what is the best way to implement mini batches?

hi.  what is the best way to pass through a large dataset in small,
sequential mini batches?

for example, with 1,000,000 data points and a mini batch size of 10, we
would need to do some computation on each mini batch: (0..9), (10..19),
(20..29), ..., (N-10..N-1)

would RDD.repartition(N/10).mapPartitions() work?

thanks!
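[Editor's note: one common approach, sketched here as an assumption rather than a confirmed answer from the thread, is to avoid repartitioning into N/10 partitions (100,000 partitions in this example) and instead chunk each partition's iterator into fixed-size batches inside mapPartitions. The chunking helper is plain Python and shown below; the rdd.mapPartitions(mini_batches) call mentioned in the comment is the standard PySpark API.]

```python
from itertools import islice

def mini_batches(iterator, batch_size=10):
    """Yield successive lists of up to batch_size items from an iterator.

    In PySpark this function would be passed to rdd.mapPartitions(mini_batches),
    so each partition is streamed through as a sequence of small batches
    without materializing the whole partition in memory.
    """
    it = iter(iterator)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# plain-Python demonstration of the chunking behaviour:
batches = list(mini_batches(range(25), batch_size=10))
# batches == [[0, ..., 9], [10, ..., 19], [20, ..., 24]]
```

Note that batches are sequential within each partition; ordering across partitions is not guaranteed unless the RDD was sorted first.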



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

