When I write the RDDs to an external system, I would like their elements
to be written in a randomly shuffled order, so that even the order is
random. As far as I understand, the RDDs do have a kind of order: for
Spark Streaming RDDs it would be the order in which Spark Streaming read
the tuples from the source (i.e. roughly the order in which the producer
sent the tuples, plus any latency).
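
To make the request concrete, here is a minimal sketch of what "randomly shuffled" could mean for a single batch. It is plain Python rather than Spark: each record is tagged with a random key and the batch is sorted by that key alone. The names `shuffle_batch` and `seed` are hypothetical and only for illustration. Applying the same idea to each streaming RDD (for example, attaching a random key to every element and sorting by it inside `DStream.transform`) is an assumption on my part, not something confirmed in this thread.

```python
import random


def shuffle_batch(records, seed=None):
    """Return the records of one batch in a random order.

    Hypothetical helper: it models a single micro-batch as a Python list.
    """
    rng = random.Random(seed)
    # Pair every record with an independent random key.
    keyed = [(rng.random(), rec) for rec in records]
    # Sort on the key only, so the records themselves need not be comparable.
    keyed.sort(key=lambda pair: pair[0])
    # Drop the keys again and keep the records in their new order.
    return [rec for _, rec in keyed]


batch = ["t1", "t2", "t3", "t4", "t5"]  # arrival order from the source
shuffled = shuffle_batch(batch, seed=42)
print(shuffled)
```

The output contains exactly the same records as the input, only in a different order. Passing a fixed `seed` makes the order reproducible; leaving it as `None` gives a fresh order on each call.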

On Mon, Nov 3, 2014 at 8:48 AM, Sean Owen <so...@cloudera.com> wrote:

> I think the answer will be the same in streaming as in the core. You
> want a random permutation of an RDD? In general RDDs don't have
> ordering at all -- excepting when you sort for example -- so a
> permutation doesn't make sense. Do you just want a well-defined but
> random ordering of the data? Do you just want to (re-)assign elements
> randomly to partitions?
>
> On Mon, Nov 3, 2014 at 4:33 PM, Josh J <joshjd...@gmail.com> wrote:
> > Hi,
> >
> > Is there a nice or optimal method to randomly shuffle spark streaming RDDs?
> >
> > Thanks,
> > Josh
>
