Got it. Thanks a lot for the reply.

Best regards,
Saisai

On Fri, Aug 9, 2019 at 6:36 AM Ryan Blue <rb...@netflix.com> wrote:

> We've actually been doing all of our API work in upstream Spark instead of
> adding APIs to Iceberg for row-level data manipulation. That's why I'm
> involved in the DataSourceV2 work.
>
> I think for Delta, this is probably an effort to get some features out
> earlier. I think that's easier for Delta because it deeply integrates with
> Spark and adds new plans -- last I checked, some of the project had to be
> located in Spark packages because they use internal classes.
>
> I think that this API will probably be contributed to Spark itself when
> Spark supports update and merge operations. That's probably a good time for
> Iceberg to pick it up because Iceberg still needs to update the format for
> those.
>
> Otherwise, Spark supports the latest features available in DataSourceV2,
> and will continue to. In fact, we're adding features to DSv2 based on what
> we've built internally at Netflix to support Iceberg.
>
> On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> Thanks a lot Ryan, that would be very helpful!
>>
>> Delta Lake recently added support for such operations at the API level
>> (https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
>> I was thinking that at the API level the goal of Iceberg is similar, so
>> maybe we could take that as a reference.
>>
>> Besides, directly using the Iceberg API to manipulate data is not so
>> straightforward, so it would be great if we could also have DataFrame
>> API/SQL support later on.
>>
>> Best regards
>> Saisai
>>
>> On Thu, Aug 8, 2019 at 1:22 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Hi Saisai,
>>>
>>> We are working on adding row-level delete support to Iceberg, where the
>>> deletes are applied when data is read. We’ve had a few good design
>>> discussions and have come up with a good way to integrate these into the
>>> format. Erik has written a good document on it:
>>> https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
>>>
>>> I’ve also started a milestone to track this work:
>>> https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22
>>>
>>> That’s assuming that you’re talking about row-level deletes. Iceberg
>>> already supports file-level delete, overwrite, etc.
>>>
>>> Iceberg also already supports a vacuum operation using ExpireSnapshots
>>> <http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
>>> But Spark (and other engines) don’t have a way to call this yet. Same for
>>> MERGE INTO: open source Spark doesn’t support the operation yet. We’re also
>>> working on building support into Spark as we go.
>>>
>>> I hope that helps!
>>>
>>> On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> Hi team,
>>>>
>>>> The Delta Lake project recently announced version 0.3.0, which added
>>>> several new features at the API level, like update, delete, merge,
>>>> vacuum, etc. May I ask whether there is any plan to add such features
>>>> to Iceberg?
>>>>
>>>> Thanks
>>>> Saisai
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
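
Ryan describes row-level deletes that are "applied when data is read". That idea can be illustrated with a toy in-memory model. This is a hypothetical sketch only: `ToyTable`, `delete_where`, `scan`, and the (data file path, row position) delete layout are illustrative assumptions. They are not Iceberg's API or file format, which was still being designed in the document linked in the thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical row shape for the toy model: (id, value).
Row = Tuple[int, str]


@dataclass
class ToyTable:
    # Data files are immutable lists of rows, keyed by a made-up path.
    data_files: Dict[str, List[Row]] = field(default_factory=dict)
    # Deleted row positions per data file (hypothetical layout, not Iceberg's).
    deletes: Dict[str, Set[int]] = field(default_factory=dict)

    def append(self, path: str, rows: List[Row]) -> None:
        self.data_files[path] = list(rows)

    def delete_where(self, predicate: Callable[[Row], bool]) -> int:
        """Record deletes for matching rows; the data files are never rewritten."""
        count = 0
        for path, rows in self.data_files.items():
            deleted = self.deletes.setdefault(path, set())
            for pos, row in enumerate(rows):
                if pos not in deleted and predicate(row):
                    deleted.add(pos)
                    count += 1
        return count

    def scan(self) -> Iterator[Row]:
        """Apply the recorded deletes at read time by skipping deleted positions."""
        for path, rows in self.data_files.items():
            deleted = self.deletes.get(path, set())
            for pos, row in enumerate(rows):
                if pos not in deleted:
                    yield row


table = ToyTable()
table.append("file-a", [(1, "a"), (2, "b"), (3, "c")])
table.append("file-b", [(4, "d"), (5, "e")])
removed = table.delete_where(lambda row: row[0] % 2 == 0)
print(removed, list(table.scan()))
```

The trade-off this illustrates: a delete only records which rows are gone, so writes stay cheap and the data files are untouched. In exchange, every reader pays the cost of filtering until the data is rewritten.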