Great, thanks! I'm also looking into the design doc for SPARK-23889. It
seems to go beyond the initial proposal in SPARK-23889 and is non-trivial
for me to understand, but I'm still working through it.

Is this a widely known issue? It's probably worth having a guide on the
workaround until we have a fix in Spark and Iceberg incorporates the change.
(By the way, how would we deal with Spark 3.0 vs. 3.1, assuming the change
can get into Spark 3.1?)
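To show why the manual sort is needed, here is a simplified model. It assumes a writer that keeps only one data file open per task and closes it when it sees a new partition value, which is roughly how I understand Iceberg's partitioned writer to behave. `ClusteredWriter` and `write_task` are hypothetical names, not Iceberg or Spark APIs. In Spark the commonly cited workaround is to sort each Spark partition by the table's partition columns (e.g. with `DataFrame.sortWithinPartitions`) before writing; below, Python's `sorted` stands in for that step.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical row shape: (partition value, payload).
Row = Tuple[str, int]


class ClusteredWriter:
    """Hypothetical model of a writer with one open file per task.

    When the partition value changes, the previous file is closed for good,
    so seeing an already-closed partition again is an error.
    """

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.closed: Set[str] = set()
        self.files: Dict[str, List[int]] = {}

    def write(self, row: Row) -> None:
        partition, payload = row
        if partition != self.current:
            if partition in self.closed:
                # Simplified stand-in for the failure on unclustered input.
                raise RuntimeError(f"Already closed file for partition: {partition}")
            if self.current is not None:
                self.closed.add(self.current)
            self.current = partition
            self.files[partition] = []
        self.files[partition].append(payload)


def write_task(rows: Iterable[Row]) -> Dict[str, List[int]]:
    """Write one task's rows and return the payloads per 'file'."""
    writer = ClusteredWriter()
    for row in rows:
        writer.write(row)
    return writer.files
```

With `rows = [("2020-09-16", 1), ("2020-09-19", 2), ("2020-09-16", 3)]`, `write_task(rows)` raises because `2020-09-16` reappears after its file was closed, while `write_task(sorted(rows, key=lambda r: r[0]))` succeeds. Clustering the input by partition is what the writer needs; a full sort is simply the easiest way to get it.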

On Sat, Sep 19, 2020 at 8:26 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Jungtaek,
>
> I agree with you that we'd ideally get the fix for that Spark issue
> upstream as soon as we can. I'm currently porting it to our 2.4 build so we can test it
> out with the new sort orders that were added to Iceberg. Once that's done
> and I understand the patch a bit better, I'll work on the review to get it
> into Spark. If you want to help review, that would be great, too!
>
> rb
>
> On Wed, Sep 16, 2020 at 4:27 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Recently I played around with a partitioned Iceberg table in Spark and
>> realized that writes to it require a manual sort. I had to google to find
>> the workaround; I guess it isn't documented, unless I'm missing something.
>>
>> While I encountered this with the DataFrame writer, I suspect there are
>> more limitations, as the root cause is the missing SPARK-23889 [1]. My
>> suspicion is that any write would be affected, including CTAS-style writes
>> (e.g. copying from another table with a different partitioning), since a
>> DSv2 writer has no way to require a clustering or ordering of its input
>> based on partitioning.
>>
>> Do I understand this correctly? I feel we may need to put effort into
>> pushing SPARK-23889 forward for Iceberg (or consider falling back to the
>> DSv1 writer), as I think the workaround is unacceptable for many end users.
>>
>> We probably also need to document the impact and the workaround until the
>> issue is fixed.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-23889
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
