Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Ryan Blue
Sorry, I was thinking about CI integration between Iceberg Java and Iceberg Spark, I just didn't mention it and I see how that's a big thing to leave out! I would definitely want to test the projects together. One thing we could do is have a nightly build like Russell suggests. I'm also wondering

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Ryan Blue
That sounds great, thanks for taking that on Jack! On Wed, Sep 15, 2021 at 3:51 PM Jack Ye wrote: > For external Trino and PrestoDB tasks, I am thinking about creating one > Github project for Trino and another one for PrestoDB to manage all tasks > under them, adding links of issues and PRs in

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Jack Ye
For external Trino and PrestoDB tasks, I am thinking about creating one GitHub project for Trino and another one for PrestoDB to manage all tasks under them, adding links to issues and PRs in the other communities to track progress. This is mostly to improve visibility so that people who are

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Russell Spitzer
I agree that Option 2 is considerably more difficult for development when core API changes need to be picked up by the external Spark module. I also think a monthly release would probably still be prohibitive to actually implementing new features that appear in the API. I would hope we have a

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Wing Yew Poon
IIUC, Option 2 is to move the Spark support for Iceberg into a separate repo (subproject of Iceberg). Would we have branches such as 0.13-2.4, 0.13-3.0, 0.13-3.1, and 0.13-3.2? For features that can be supported in all versions or all Spark 3 versions, we would need to commit the changes to

Re: [DISCUSS] Iceberg roadmap

2021-09-15 Thread Ryan Blue
Gidon, I think that the v3 part of encryption is actually documenting how it works and adding it to the spec. Right now we have hooks for building some encryption around it, but almost no requirements in the spec for how to use it across implementations. This is fine while we're working on

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Ryan Blue
Thanks for bringing this up, Anton. I’m glad that we have the set of potential solutions well defined. Looks like the next step is to decide whether we want to require people to update Spark versions to pick up newer versions of Iceberg. If we choose to make people upgrade, then option 1 is

Re: [DISCUSS] UUID type

2021-09-15 Thread Ryan Blue
I don't think we necessarily reached consensus, but I think the general trend toward the end was to keep support for UUID. Should we start a vote to validate consensus? On Wed, Sep 15, 2021 at 1:15 PM Joshua Howard wrote: > Just following up on Piotr's message here. > > Have we converged? I

Re: [DISCUSS] UUID type

2021-09-15 Thread Joshua Howard
Just following up on Piotr's message here. Have we converged? I think most people would assume that silence is a vote for the status-quo. On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen wrote: > Hi, > > It seems we converged here that UUID should remain included. > I read this as a consensus

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Aman Rawat
Thanks a lot Anton. This is really helpful. We will read through this and check out the current PRs to get up to speed here. Best Regards, Aman. On Wed, Sep 15, 2021 at 9:53 PM Anton Okolnychyi wrote: > Here is the action PR I mentioned: > https://github.com/apache/iceberg/pull/2841 >

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Anton Okolnychyi
Here is the action PR I mentioned: https://github.com/apache/iceberg/pull/2841 - Anton > On 15 Sep 2021, at 09:14, Anton Okolnychyi > wrote: > > I have a working end-to-end solution locally using some of the upcoming > features in Spark 3.2 and

Re: Spark3 Row Level Delete Support

2021-09-15 Thread Anton Okolnychyi
I have a working end-to-end solution locally using some of the upcoming features in Spark 3.2 and a more elaborate version of what is available in this Spark PR: https://github.com/apache/spark/pull/33008 More details on the proposed Spark APIs are

Spark3 Row Level Delete Support

2021-09-15 Thread Aman Rawat
Hey team, We are trying to implement Spark support for row-level deletes for Iceberg. Could you please shed some light on where this work stream stands and how we can help? Regards,

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread OpenInx
Thanks for bringing this up, Anton. Everyone has great pros/cons to support their preferences. Before giving my preference, let me raise one question: what's the top priority for the Apache Iceberg project at this point in time? This question will help us to answer the following

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Saisai Shao
From a developer's point of view, it is less of a burden to always support the latest version of Spark (for example). But from a user's point of view, especially for us who maintain Spark internally, it is not easy to upgrade the Spark version for the first time (since we have many customizations internally), and we're still

Re: Join the python iceberg project

2021-09-15 Thread Jun H.
Hi Mordechai, Thanks for your interest! In addition to what Jack mentioned, we also have a Slack channel, #python, in the apache-iceberg Slack workspace for the Iceberg Python library. As the Iceberg Python library is an implementation of the Iceberg spec, it would be great to get familiar with the spec