Re: Sorted partition ranges without overlap

2017-03-13 Thread Yong Zhang

You can implement your own partitioner based on your own logic.
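A custom partitioner of this kind can be sketched in plain Python (not Spark code). The idea is to choose n - 1 boundary keys from the sorted keys and then send each key to the partition whose range contains it. The names `compute_range_bounds`, `get_partition` and `range_partition` are hypothetical and used only for illustration. This is a minimal sketch that assumes all keys fit in memory. In Spark itself, a custom partitioner extends `org.apache.spark.Partitioner` and overrides `numPartitions` and `getPartition`. Spark's built-in `RangePartitioner`, which `sortByKey` uses, follows the same boundary idea but estimates the bounds from a sample of the keys.

```python
from bisect import bisect_left
from typing import List, Sequence


def compute_range_bounds(keys: Sequence[bytes], num_partitions: int) -> List[bytes]:
    """Return num_partitions - 1 upper bounds taken from the sorted keys.

    Simplification: this looks at every key, whereas Spark's RangePartitioner
    estimates the bounds from a sample.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(keys)  # bytes compare lexicographically as unsigned values
    if not ordered:
        return []
    bounds = []
    for i in range(1, num_partitions):
        # Index of the last key that should land in partition i - 1.
        idx = max(len(ordered) * i // num_partitions - 1, 0)
        bounds.append(ordered[idx])
    return bounds


def get_partition(key: bytes, bounds: List[bytes]) -> int:
    """Partition 0 holds keys <= bounds[0], partition 1 keys in (bounds[0], bounds[1]], ..."""
    return bisect_left(bounds, key)


def range_partition(keys: Sequence[bytes], num_partitions: int) -> List[List[bytes]]:
    """Hypothetical helper: split keys into sorted, non-overlapping ranges."""
    bounds = compute_range_bounds(keys, num_partitions)
    partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[get_partition(key, bounds)].append(key)
    # Sort inside each partition; the bounds already order the partitions.
    return [sorted(p) for p in partitions]
```

If the keys 1 to 6 from the question are encoded as single bytes, `range_partition(keys, 2)` gives (1,2,3),(4,5,6), and `range_partition(keys, 3)` gives (1,2),(3,4),(5,6). On the JVM side, Java's `byte` is signed. A byte[] comparator that matches this unsigned lexicographic order therefore has to compare unsigned values, for example with `Arrays.compareUnsigned` (Java 9+).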

Yong

From: Kristoffer Sjögren <sto...@gmail.com>
Sent: Monday, March 13, 2017 9:34 AM
To: user
Subject: Sorted partition ranges without overlap

Hi

I have an RDD<byte[]> that needs to be sorted lexicographically and
then processed by partition. The partitions should be split into
ranged blocks where sorted order is maintained, with each partition
containing sequential, non-overlapping keys.

Given keys (1,2,3,4,5,6)

1. Correct
  - 2 partitions = (1,2,3),(4,5,6)
  - 3 partitions = (1,2),(3,4),(5,6)

2. Incorrect: the ranges overlap even though each partition is sorted.
  - 2 partitions = (1,3,5),(2,4,6)
  - 3 partitions = (1,3),(2,5),(4,6)
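One rule separates the correct layouts from the incorrect ones. Every partition must be sorted, and the smallest key of each non-empty partition must be strictly greater than the largest key of the previous non-empty partition. A minimal Python check of that rule follows. The function name `is_range_partitioned` is hypothetical. The check also assumes that equal keys in neighbouring partitions count as overlap and that empty partitions are allowed.

```python
from typing import Any, Sequence


def is_range_partitioned(partitions: Sequence[Sequence[Any]]) -> bool:
    """True if each partition is sorted and the partitions' key ranges do not overlap."""
    previous_max = None
    for part in partitions:
        if not part:
            continue  # empty partitions do not break the ordering
        # Each partition must itself be in ascending order.
        if any(a > b for a, b in zip(part, part[1:])):
            return False
        # Its smallest key must come strictly after the previous partition's largest key.
        if previous_max is not None and part[0] <= previous_max:
            return False
        previous_max = part[-1]
    return True
```

Both "correct" layouts above pass this check. Both "incorrect" layouts fail it: in (1,3,5),(2,4,6), the second partition starts at 2, which is not after 5.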

Is this possible with Spark?

Cheers,
-Kristoffer


