Hi Reynold, are you suggesting that we remove RoundRobinPartitioning from the implementation of the repartition(numPartitions: Int) API? If that's the direction we're considering, should we advise users to avoid repartition(numPartitions: Int) until a new implementation is available?

On Sat, Mar 12, 2022 at 1:47 PM Reynold Xin <r...@databricks.com> wrote:

> This is why RoundRobinPartitioning shouldn't be used ...
>
> On Sat, Mar 12, 2022 at 12:08 PM, Jason Xu <jasonxu.sp...@gmail.com> wrote:
>
>> Hi Spark community,
>>
>> I reported a data correctness issue in
>> https://issues.apache.org/jira/browse/SPARK-38388. In short,
>> non-deterministic data + Repartition + FetchFailure could result in
>> incorrect data, this is an issue we run into in production pipelines, I
>> have an example to reproduce the bug in the ticket.
>>
>> I report here to bring more attention, could you help confirm it's a bug
>> and worth effort to further investigate and fix, thank you in advance for
>> help!
>>
>> Thanks,
>> Jason Xu