Re: Any plan to support update, delete and others

2019-08-08 Thread Saisai Shao
Got it. Thanks a lot for the reply.

Best regards,
Saisai


Re: Any plan to support update, delete and others

2019-08-08 Thread Ryan Blue
We've actually been doing all of our API work in upstream Spark instead of
adding APIs to Iceberg for row-level data manipulation. That's why I'm
involved in the DataSourceV2 work.

I think for Delta, this is probably an effort to get some features out
earlier. I think that's easier for Delta because it deeply integrates with
Spark and adds new plans -- last I checked, some of the project had to be
located in Spark packages because they use internal classes.

I think that this API will probably be contributed to Spark itself when
Spark supports update and merge operations. That's probably a good time for
Iceberg to pick it up because Iceberg still needs to update the format for
those.

Otherwise, Iceberg's Spark integration supports the latest features available
in DataSourceV2, and will continue to. In fact, we're adding features to DSv2
based on what we've built internally at Netflix to support Iceberg.

-- 
Ryan Blue
Software Engineer
Netflix


Re: Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Thanks a lot Ryan, that would be very helpful!

Delta Lake recently added support for such operations at the API level (
https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
I was thinking that at the API level the goal of Iceberg is similar, so maybe
we could take that as a reference.
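The row-level operations discussed in this thread (update, delete, and merge/upsert) can be illustrated over an in-memory table. This is a minimal Python sketch, assuming rows are plain dicts with a unique `id` key; the function names are hypothetical, and this is neither Delta Lake's `DeltaTable` API nor anything Iceberg provides.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def update(rows: List[Row], cond: Callable[[Row], bool], changes: Dict[str, object]) -> List[Row]:
    """Return a new row list where rows matching `cond` have `changes` applied."""
    return [{**r, **changes} if cond(r) else r for r in rows]


def delete(rows: List[Row], cond: Callable[[Row], bool]) -> List[Row]:
    """Return a new row list without the rows matching `cond`."""
    return [r for r in rows if not cond(r)]


def merge(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Upsert: source rows replace target rows with the same key; the rest are inserted."""
    incoming = {r[key]: r for r in source}
    merged = [incoming.pop(r[key], r) for r in target]  # matched keys: take the source row
    merged.extend(incoming.values())  # unmatched source rows: insert at the end
    return merged


table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
table = update(table, lambda r: r["id"] == 1, {"v": "A"})
table = delete(table, lambda r: r["id"] == 2)
table = merge(table, [{"id": 1, "v": "x"}, {"id": 3, "v": "c"}], key="id")
```

Each function returns a new list instead of mutating in place, loosely mirroring how table formats commit a new table version rather than editing data where it sits.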

Besides, directly using the Iceberg API to manipulate data is not so
straightforward, so it would be great if we could also have DataFrame API/SQL
support later on.

Best regards
Saisai


Re: Any plan to support update, delete and others

2019-08-07 Thread Ryan Blue
Hi Saisai,

We are working on adding row-level delete support to Iceberg, where the
deletes are applied when data is read. We’ve had a few good design
discussions and have come up with a good way to integrate these into the
format. Erik has written a good document on it:
https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
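Applying deletes at read time means a delete only records which rows are gone, and readers filter those rows out while scanning, so no data file has to be rewritten. A minimal Python sketch of that idea follows, assuming deletes are tracked as (data file path, row position) pairs; the layout and names are hypothetical and are not taken from the design document linked above.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical layout: each data file is a list of rows addressed by position.
DataFiles = Dict[str, List[dict]]
# A delete is recorded as (data file path, row position) instead of rewriting the file.
PositionDeletes = Set[Tuple[str, int]]


def read_with_deletes(files: DataFiles, deletes: PositionDeletes) -> Iterator[dict]:
    """Yield live rows, skipping every (path, position) marked as deleted."""
    for path, rows in files.items():
        for pos, row in enumerate(rows):
            if (path, pos) not in deletes:
                yield row


files = {
    "data/a.parquet": [{"id": 1}, {"id": 2}],
    "data/b.parquet": [{"id": 3}],
}
# Deleting id=2 only adds a marker; the rows in data/a.parquet are left untouched.
deletes = {("data/a.parquet", 1)}
live = list(read_with_deletes(files, deletes))
```

The trade-off is that writes become cheap while every read pays for the extra filtering step.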

I’ve also started a milestone to track this work:
https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22

That’s assuming that you’re talking about row-level deletes. Iceberg
already supports file-level delete, overwrite, etc.
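A file-level delete drops whole data files from the next snapshot without rewriting any rows. Iceberg's Java API exposes this through `Table.newDelete()`; the Python sketch below only models the idea, assuming each file records min/max values for a hypothetical `day` column, and it rejects a delete that would match only part of a file.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    path: str
    min_day: int  # lower bound of the hypothetical `day` column in this file
    max_day: int  # upper bound of the hypothetical `day` column in this file


def delete_day(snapshot: List[DataFile], day: int) -> List[DataFile]:
    """Return the next snapshot's file list with every file for `day` removed.

    Only whole files can be dropped: a file whose range holds `day` together
    with other values would need a row-level rewrite, so it is rejected.
    """
    kept = []
    for f in snapshot:
        if f.min_day == f.max_day == day:
            continue  # every row matches: drop the whole file
        if f.min_day <= day <= f.max_day:
            raise ValueError(f"{f.path} only partially matches day={day}")
        kept.append(f)  # no row can match: keep the file unchanged
    return kept


current = [DataFile("d1", 1, 1), DataFile("d2", 2, 2), DataFile("d3", 3, 5)]
next_snapshot = delete_day(current, 2)
```

The partial-match case is exactly where row-level deletes become necessary.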

Iceberg also already supports a vacuum operation using ExpireSnapshots.
But Spark (and other engines) don’t have a way to call this yet. The same goes
for MERGE INTO: open source Spark doesn’t support the operation yet. We’re also
working on building support into Spark as we go.
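In the Java API, ExpireSnapshots is reached through `Table.expireSnapshots()`, e.g. `table.expireSnapshots().expireOlderThan(timestampMillis).commit()`. The Python sketch below models only the underlying idea, assuming the newest snapshot is the current one: snapshots older than a cutoff are dropped, and a data file becomes deletable only once no retained snapshot still references it. The `Snapshot` type and the function are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: FrozenSet[str]  # data files reachable from this snapshot


def expire_older_than(snapshots: List[Snapshot], cutoff_ms: int) -> Tuple[List[Snapshot], Set[str]]:
    """Return (retained snapshots, data files that are now safe to delete)."""
    current = max(snapshots, key=lambda s: s.timestamp_ms)
    retained, expired = [], []
    for s in snapshots:
        # Keep recent snapshots, and never expire the current one.
        keep = s.timestamp_ms >= cutoff_ms or s.snapshot_id == current.snapshot_id
        (retained if keep else expired).append(s)
    live_files = set().union(*(s.files for s in retained))
    deletable = set().union(*(s.files for s in expired)) - live_files
    return retained, deletable


history = [
    Snapshot(1, 1000, frozenset({"a", "b"})),
    Snapshot(2, 2000, frozenset({"b", "c"})),  # "a" was removed in this commit
    Snapshot(3, 3000, frozenset({"b", "c", "d"})),
]
kept, to_delete = expire_older_than(history, cutoff_ms=2000)
```

File "b" survives even though snapshot 1 expires, because retained snapshots still reference it.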

I hope that helps!

On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao  wrote:

> Hi team,
>
> The Delta Lake project recently announced version 0.3.0, which added several
> new features at the API level, like update, delete, merge, vacuum, etc. May I
> ask whether there is any plan to add such features to Iceberg?
>
> Thanks
> Saisai
>


-- 
Ryan Blue
Software Engineer
Netflix