Re: Partitioning in spark streaming

Mohit Anchlia Tue, 11 Aug 2015 17:36:06 -0700

I am also trying to understand how files are named when writing to Hadoop.
For example, how does the "saveAs" method ensure that each executor
generates unique files?


On Tue, Aug 11, 2015 at 4:21 PM, ayan guha <guha.a...@gmail.com> wrote:

> Partitioning, by itself, is a property of an RDD, so it is essentially no
> different in streaming, where each batch is one RDD. You can call
> partitionBy on a key-value RDD and pass your custom partitioner to it.
>
> One thing you should consider is how balanced your partitions are, i.e.
> your partitioning scheme should not skew too much data into one partition.
>
> Best
> Ayan
>
> On Wed, Aug 12, 2015 at 9:06 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
>> How does partitioning in Spark work when it comes to streaming? What's
>> the best way to partition time series data grouped by a certain tag, such
>> as a product category (video, music, etc.)?
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
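
To make the partitionBy suggestion and the skew concern quoted above more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not anyone's actual job: the tag-to-partition mapping, the partition count, and the helper names (tag_partitioner, partition_sizes, skew_ratio) are all hypothetical. Only the tags "video" and "music" come from the thread. The partition function has the shape PySpark's RDD.partitionBy(numPartitions, partitionFunc) expects (key in, integer out), but the Spark calls appear only in comments and were not run here.

```python
import zlib
from collections import Counter

# Hypothetical setup: the tags come from the question in this thread,
# the partition indices and the partition count are made up.
TAG_TO_PARTITION = {"video": 0, "music": 1}
NUM_PARTITIONS = 4


def tag_partitioner(tag):
    """Map a tag (the record key) to a partition index in [0, NUM_PARTITIONS)."""
    if tag in TAG_TO_PARTITION:
        return TAG_TO_PARTITION[tag]
    # Unknown tags fall back to a deterministic hash. Python's built-in
    # hash() of a str can differ between processes, so it is avoided here.
    return zlib.crc32(tag.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(records, partitioner):
    """Count how many (tag, value) records each partition would receive."""
    return Counter(partitioner(tag) for tag, _ in records)


def skew_ratio(sizes, num_partitions):
    """Largest partition size divided by the ideal even share (1.0 = balanced)."""
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    return max(sizes.values()) / (total / num_partitions)


# A tiny made-up batch of (tag, value) pairs, dominated by one tag.
sample = [("video", 1), ("video", 2), ("video", 3), ("music", 4)]
sizes = partition_sizes(sample, tag_partitioner)
ratio = skew_ratio(sizes, NUM_PARTITIONS)
print(dict(sizes), ratio)

# With PySpark, the same function could be applied to a key-value RDD:
#   pairs.partitionBy(NUM_PARTITIONS, tag_partitioner)
# and to each batch of a stream of key-value pairs:
#   stream.transform(lambda rdd: rdd.partitionBy(NUM_PARTITIONS, tag_partitioner))
```

A skew ratio well above 1.0 on a representative sample suggests that partitioning purely by tag will overload one partition. In that case a composite key (for example tag plus a time bucket) may spread the load better, at the cost of no longer keeping each tag in a single partition.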
