Hi Reynold, are you suggesting that we remove RoundRobinPartitioning from the
implementation of the repartition(numPartitions: Int) API? If that's the
direction we're considering, should we advise users to avoid the
repartition(numPartitions: Int) API until a new implementation is available?
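
To make the concern concrete, here is a minimal sketch in plain Python (not
Spark code) of how I understand the failure mode. It assumes a simplified
model in which round-robin partitioning assigns each row by its position in
the input, so a recomputed task that yields the same rows in a different order
sends them to different partitions. All function names are hypothetical and
used only for illustration.

```python
def round_robin(rows, num_partitions):
    """Assign each row by its position in the input (simplified model)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def by_key(rows, num_partitions):
    """Assign each row by a function of its own value, independent of order."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row % num_partitions].append(row)
    return parts


def mix_after_retry(partition_fn, first_run, retry_run, num_partitions):
    """Model a partial retry: partition 0 was already consumed from the first
    run, while the remaining partitions are read from the recomputed output."""
    first = partition_fn(first_run, num_partitions)
    second = partition_fn(retry_run, num_partitions)
    merged = list(first[0])
    for p in range(1, num_partitions):
        merged.extend(second[p])
    return sorted(merged)


rows = list(range(8))
retry = list(reversed(rows))  # same rows, different order on recomputation

rr_result = mix_after_retry(round_robin, rows, retry, 3)
key_result = mix_after_retry(by_key, rows, retry, 3)

print("round robin:", rr_result)   # some rows duplicated, others missing
print("by key:     ", key_result)  # every row appears exactly once
```

In this toy model the position-based assignment both loses and duplicates rows
after a partial retry, while the value-based assignment does not, which is why
partitioning by a column expression looks like a safer interim suggestion.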

On Sat, Mar 12, 2022 at 1:47 PM Reynold Xin <r...@databricks.com> wrote:

> This is why RoundRobinPartitioning shouldn't be used ...
>
>
> On Sat, Mar 12, 2022 at 12:08 PM, Jason Xu <jasonxu.sp...@gmail.com>
> wrote:
>
>> Hi Spark community,
>>
>> I reported a data correctness issue in
>> https://issues.apache.org/jira/browse/SPARK-38388. In short,
>> non-deterministic data + Repartition + FetchFailure can result in
>> incorrect data. We run into this issue in production pipelines, and the
>> ticket includes an example that reproduces the bug.
>>
>> I'm raising it here to bring more attention to it. Could you help confirm
>> that it's a bug and worth the effort to investigate and fix? Thank you in
>> advance for your help!
>>
>> Thanks,
>> Jason Xu
>>
>
>
