Got it. Thanks a lot for the reply.

Best regards,
Saisai

On Fri, Aug 9, 2019 at 6:36 AM Ryan Blue <rb...@netflix.com> wrote:

> We've actually been doing all of our API work in upstream Spark instead of
> adding APIs to Iceberg for row-level data manipulation. That's why I'm
> involved in the DataSourceV2 work.
>
> I think for Delta, this is probably an effort to get some features out
> earlier. I think that's easier for Delta because it deeply integrates with
> Spark and adds new plans -- last I checked, some of the project had to be
> located in Spark packages because they use internal classes.
>
> I think that this API will probably be contributed to Spark itself when
> Spark supports update and merge operations. That's probably a good time for
> Iceberg to pick it up because Iceberg still needs to update the format for
> those.
>
> Otherwise, Spark supports the latest features available in DataSourceV2,
> and will continue to. In fact, we're adding features to DSv2 based on what
> we've built internally at Netflix to support Iceberg.
>
> On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> Thanks a lot Ryan, that would be very helpful!
>>
>> Delta Lake recently added support for such operations at the API level (
>> https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
>> I was thinking that at the API level the goal of Iceberg is similar, so maybe
>> we could take that as a reference.
>>
>> Besides, directly using the Iceberg API to manipulate data is not so
>> straightforward, so it would be great if we could also have DF API/SQL
>> support later on.
>>
>> Best regards
>> Saisai
>>
>> On Thu, Aug 8, 2019 at 1:22 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Hi Saisai,
>>>
>>> We are working on adding row-level delete support to Iceberg, where the
>>> deletes are applied when data is read. We’ve had a few good design
>>> discussions and have come up with a good way to integrate these into the
>>> format. Erik has written a good document on it:
>>> https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
>>>
>>> I’ve also started a milestone to track this work:
>>> https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22
>>>
>>> That’s assuming that you’re talking about row-level deletes. Iceberg
>>> already supports file-level delete, overwrite, etc.
>>>
>>> Iceberg also already supports a vacuum operation using ExpireSnapshots
>>> <http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
>>> But, Spark (and other engines) don’t have a way to call this yet. Same for
>>> MERGE INTO, open source Spark doesn’t support the operation yet. We’re also
>>> working on building support into Spark as we go.
>>>
>>> I hope that helps!
>>>
>>> On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> Hi team,
>>>>
>>>> The Delta Lake project recently announced version 0.3.0, which added
>>>> several new features at the API level, like update, delete, merge, vacuum, etc.
>>>> May I ask whether there is any plan to add such features to Iceberg?
>>>>
>>>> Thanks
>>>> Saisai
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
