I am also trying to understand how files are named when writing to Hadoop. For example, how does a "saveAs" method ensure that each executor generates unique files?
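As far as I understand it, uniqueness is not tied to the executor at all. Spark's `saveAsTextFile` writes one file per partition, and the name comes from the partition index (`part-00000`, `part-00001`, ...). Each task attempt first writes into a temporary directory under the output path, and the Hadoop output committer moves only the successful attempt's file into place. This means a retried or speculative attempt for the same partition cannot collide with the first one. The sketch below is a simplified, pure-Python simulation of that idea. The function names and the flat `_temporary/attempt_<partition>_<attempt>` layout are hypothetical; the real `FileOutputCommitter` directory layout is more nested.

```python
# Simplified simulation (hypothetical helpers) of Hadoop-style output naming:
# file names derive from the partition index, and each task attempt writes
# to its own temporary directory before being committed.
import os
import shutil
import tempfile


def part_file_name(partition_id: int) -> str:
    # Old-API Hadoop convention: "part-" plus a zero-padded 5-digit index.
    return f"part-{partition_id:05d}"


def write_task_attempt(output_dir: str, partition_id: int, attempt_id: int, lines) -> str:
    # Every attempt gets a distinct temp dir, so retries/speculation never clash.
    attempt_dir = os.path.join(
        output_dir, "_temporary", f"attempt_{partition_id}_{attempt_id}")
    os.makedirs(attempt_dir, exist_ok=True)
    with open(os.path.join(attempt_dir, part_file_name(partition_id)), "w") as f:
        f.write("".join(line + "\n" for line in lines))
    return attempt_dir


def commit_task(output_dir: str, attempt_dir: str, partition_id: int) -> None:
    # Only the committed attempt's file is moved to its final name.
    name = part_file_name(partition_id)
    os.replace(os.path.join(attempt_dir, name), os.path.join(output_dir, name))


def save_partitions(output_dir: str, partitions) -> list:
    # One task per partition; each writes and commits its own part file.
    os.makedirs(output_dir, exist_ok=True)
    for pid, lines in enumerate(partitions):
        attempt_dir = write_task_attempt(output_dir, pid, 0, lines)
        commit_task(output_dir, attempt_dir, pid)
    # Job commit: drop temp files and write the empty _SUCCESS marker.
    shutil.rmtree(os.path.join(output_dir, "_temporary"))
    open(os.path.join(output_dir, "_SUCCESS"), "w").close()
    return sorted(os.listdir(output_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save_partitions(os.path.join(tmp, "out"), [["a"], ["b", "c"], []]))
        # prints ['_SUCCESS', 'part-00000', 'part-00001', 'part-00002']
```

Note that an empty partition still produces a part file, which matches what you typically see in a Spark output directory.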
On Tue, Aug 11, 2015 at 4:21 PM, ayan guha <guha.a...@gmail.com> wrote:
> Partitioning, by itself, is a property of an RDD, so it is essentially no
> different in streaming, where each batch is one RDD. You can call
> partitionBy on the RDD and pass your custom partitioner function to it.
>
> One thing you should consider is how balanced your partitions are, i.e.
> your partitioning scheme should not skew too much data into one partition.
>
> Best
> Ayan
>
> On Wed, Aug 12, 2015 at 9:06 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
>> How does partitioning in Spark work when it comes to streaming? What's
>> the best way to partition time-series data grouped by a certain tag, such
>> as product categories like video, music, etc.?
>
> --
> Best Regards,
> Ayan Guha
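To make the partitionBy suggestion concrete, here is a pure-Python sketch that simulates a custom partition function for category-tagged time-series records. It mirrors PySpark's `rdd.partitionBy(numPartitions, partitionFunc)` semantics, where a record with key `k` lands in partition `partitionFunc(k) % numPartitions`. The category list, the `(category, timestamp)` key shape, and the helper names are all hypothetical. The skew check illustrates the balance concern raised above.

```python
# Pure-Python simulation of a custom partitioner for keyed records,
# following PySpark's partitionBy rule: partition = func(key) % num_partitions.
from collections import Counter

# Hypothetical category list, for illustration only.
CATEGORIES = ["video", "music", "books"]


def category_partitioner(key) -> int:
    # key is assumed to be a (category, timestamp) tuple; route by category.
    # Deterministic on purpose: Python's hash() of str varies per process,
    # which would break routing across workers.
    category = key[0]
    if category in CATEGORIES:
        return CATEGORIES.index(category)
    return len(CATEGORIES)  # all unknown categories share one "other" bucket


def partition_by(records, num_partitions: int, partition_func):
    # Bucket (key, value) pairs the same way partitionBy would.
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_func(key) % num_partitions].append((key, value))
    return buckets


def skew_ratio(buckets) -> float:
    # Largest partition size divided by the mean size; 1.0 is perfectly even.
    sizes = [len(b) for b in buckets]
    total = sum(sizes)
    if total == 0:
        return 1.0
    return max(sizes) / (total / len(sizes))


if __name__ == "__main__":
    records = [(("video", 1), "a"), (("music", 2), "b"),
               (("video", 3), "c"), (("podcast", 4), "d")]
    buckets = partition_by(records, 4, category_partitioner)
    print([len(b) for b in buckets], skew_ratio(buckets))  # prints [2, 1, 0, 1] 2.0
    print(Counter(k[0] for b in buckets for k, _ in b))
```

In an actual streaming job, the same function could be applied per batch with something like `dstream.transform(lambda rdd: rdd.partitionBy(4, category_partitioner))`. In Scala, the equivalent is a class extending `org.apache.spark.Partitioner` that implements `numPartitions` and `getPartition`. Routing purely by category is simple, but if one category (say, video) dominates the volume, that partition will be much larger than the others. A skew ratio well above 1.0 is the signal to add a secondary component, such as a time bucket, to the key.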