Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-23 Thread John Zhuge
Holden has graciously agreed to shepherd the SPIP. Thanks! On Thu, Feb 10, 2022 at 9:19 AM John Zhuge wrote: > The vote is now closed and the vote passes. Thank you to everyone who took > the time to review and vote on this SPIP. I’m looking forward to adding > this feature to the next Spark

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-10 Thread John Zhuge
The vote is now closed and the vote passes. Thank you to everyone who took the time to review and vote on this SPIP. I’m looking forward to adding this feature to the next Spark release. The tracking JIRA is https://issues.apache.org/jira/browse/SPARK-31357. The tally is: +1s: Walaa Eldin

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-07 Thread Wenchen Fan
+1 (binding) On Sun, Feb 6, 2022 at 10:27 AM Jacky Lee wrote: > +1 (non-binding). Thanks John! > It's great to see ViewCatalog moving on, it's a nice feature. > > On Sat, Feb 5, 2022 at 11:57, Terry Kim wrote: > >> +1 (non-binding). Thanks John! >> >> Terry >> >> On Fri, Feb 4, 2022 at 4:13 PM Yufei Gu

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-05 Thread Jacky Lee
+1 (non-binding). Thanks John! It's great to see ViewCatalog moving on, it's a nice feature. On Sat, Feb 5, 2022 at 11:57, Terry Kim wrote: > +1 (non-binding). Thanks John! > > Terry > > On Fri, Feb 4, 2022 at 4:13 PM Yufei Gu wrote: > >> +1 (non-binding) >> Best, >> >> Yufei >> >> `This is not a

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-04 Thread Terry Kim
+1 (non-binding). Thanks John! Terry On Fri, Feb 4, 2022 at 4:13 PM Yufei Gu wrote: > +1 (non-binding) > Best, > > Yufei > > `This is not a contribution` > > > On Fri, Feb 4, 2022 at 11:54 AM huaxin gao wrote: > >> +1 (non-binding) >> >> On Fri, Feb 4, 2022 at 11:40 AM L. C. Hsieh wrote: >>

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-04 Thread Yufei Gu
+1 (non-binding) Best, Yufei `This is not a contribution` On Fri, Feb 4, 2022 at 11:54 AM huaxin gao wrote: > +1 (non-binding) > > On Fri, Feb 4, 2022 at 11:40 AM L. C. Hsieh wrote: > >> +1 >> >> On Thu, Feb 3, 2022 at 7:25 PM Chao Sun wrote: >> > >> > +1 (non-binding). Looking forward to

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-04 Thread huaxin gao
+1 (non-binding) On Fri, Feb 4, 2022 at 11:40 AM L. C. Hsieh wrote: > +1 > > On Thu, Feb 3, 2022 at 7:25 PM Chao Sun wrote: > > > > +1 (non-binding). Looking forward to this feature! > > > > On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue wrote: > >> > >> +1 for the SPIP. I think it's well designed

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-04 Thread L. C. Hsieh
+1 On Thu, Feb 3, 2022 at 7:25 PM Chao Sun wrote: > > +1 (non-binding). Looking forward to this feature! > > On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue wrote: >> >> +1 for the SPIP. I think it's well designed and it has worked quite well at >> Netflix for a long time. >> >> On Thu, Feb 3, 2022

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Chao Sun
+1 (non-binding). Looking forward to this feature! On Thu, Feb 3, 2022 at 2:32 PM Ryan Blue wrote: > +1 for the SPIP. I think it's well designed and it has worked quite well > at Netflix for a long time. > > On Thu, Feb 3, 2022 at 2:04 PM John Zhuge wrote: > >> Hi Spark community, >> >> I’d

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Ryan Blue
+1 for the SPIP. I think it's well designed and it has worked quite well at Netflix for a long time. On Thu, Feb 3, 2022 at 2:04 PM John Zhuge wrote: > Hi Spark community, > > I’d like to restart the vote for the ViewCatalog design proposal (SPIP). > > The proposal is to add a ViewCatalog

[VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread John Zhuge
Hi Spark community, I’d like to restart the vote for the ViewCatalog design proposal (SPIP). The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views in DataSourceV2. Please vote on the SPIP until Feb. 9th (Wednesday). [ ] +1: Accept the proposal

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread John Zhuge
Sure, Xiao. Happy Lunar New Year! On Thu, Feb 3, 2022 at 1:57 PM Xiao Li wrote: > Can we extend the voting window to next Wednesday? This week is a holiday > week for the lunar new year. AFAIK, many members in Asia are taking the > whole week off. They might not regularly check the emails. > >

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Xiao Li
Can we extend the voting window to next Wednesday? This week is a holiday week for the lunar new year. AFAIK, many members in Asia are taking the whole week off. They might not regularly check the emails. Also, how about starting a separate email thread whose subject starts with [VOTE]? Happy Lunar New

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Holden Karau
+1 (binding) On Thu, Feb 3, 2022 at 2:26 PM Erik Krogen wrote: > +1 (non-binding) > > Really looking forward to having this natively supported by Spark, so that > we can get rid of our own hacks to tie in a custom view catalog > implementation. I appreciate the care John has put into various

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Erik Krogen
+1 (non-binding) Really looking forward to having this natively supported by Spark, so that we can get rid of our own hacks to tie in a custom view catalog implementation. I appreciate the care John has put into various parts of the design and believe this will provide a robust and flexible

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Walaa Eldin Moustafa
+1 On Thu, Feb 3, 2022 at 11:19 AM John Zhuge wrote: > Hi Spark community, > > I’d like to restart the vote for the ViewCatalog design proposal (SPIP). > > The proposal is to add a

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread John Zhuge
Hi Spark community, I’d like to restart the vote for the ViewCatalog design proposal (SPIP). The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views

Re: [VOTE] SPIP: Catalog API for view metadata

2021-06-04 Thread Walaa Eldin Moustafa
Considering the API aspect, the ViewCatalog API sounds like a good idea. A view catalog will enable us to integrate Coral (our view SQL translation and management layer) very cleanly to Spark. Currently we can only do it by maintaining our special

Re: [VOTE] SPIP: Catalog API for view metadata

2021-05-26 Thread John Zhuge
Looks like we are running in circles. Should we have an online meeting to get this sorted out? Thanks, John On Wed, May 26, 2021 at 12:01 AM Wenchen Fan wrote: > OK, then I'd vote for TableViewCatalog, because > 1. This is how Hive catalog works, and we need to migrate Hive catalog to > the v2

Re: [VOTE] SPIP: Catalog API for view metadata

2021-05-26 Thread Wenchen Fan
OK, then I'd vote for TableViewCatalog, because 1. This is how Hive catalog works, and we need to migrate Hive catalog to the v2 API sooner or later. 2. Because of 1, TableViewCatalog is easy to support in the current table/view resolution framework. 3. It's better to avoid name conflicts between

Re: [VOTE] SPIP: Catalog API for view metadata

2021-05-24 Thread Ryan Blue
I don't think that it makes sense to discuss a different approach in the PR rather than in the vote. Let's discuss this now since that's the purpose of an SPIP. On Mon, May 24, 2021 at 11:22 AM John Zhuge wrote: > Hi everyone, I’d like to start a vote for the ViewCatalog design proposal >

[VOTE] SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Hi everyone, I’d like to start a vote for the ViewCatalog design proposal (SPIP). The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views in DataSourceV2. The full SPIP doc is here:

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Great! I will start a vote thread. On Mon, May 24, 2021 at 10:54 AM Wenchen Fan wrote: > Yea let's move forward first. We can discuss the caching approach > and TableViewCatalog approach during the PR review. > > On Tue, May 25, 2021 at 1:48 AM John Zhuge wrote: > >> Hi everyone, >> >> Is

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread Wenchen Fan
Yea let's move forward first. We can discuss the caching approach and TableViewCatalog approach during the PR review. On Tue, May 25, 2021 at 1:48 AM John Zhuge wrote: > Hi everyone, > > Is there any more discussion before we start a vote on ViewCatalog? With > FunctionCatalog merged, I hope

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Hi everyone, Is there any more discussion before we start a vote on ViewCatalog? With FunctionCatalog merged, I hope this feature can complete the offerings of catalog plugins in 3.2. Once approved, I will refresh the WIP PR. Implementation details can be ironed out during review. Thanks, On

Re: SPIP: Catalog API for view metadata

2020-11-10 Thread Ryan Blue
An extra RPC call is a concern for the catalog implementation. It is simple to cache the result of a call to avoid a second one if the catalog chooses. I don't think that an extra RPC that can be easily avoided is a reasonable justification to add caches in Spark. For one thing, it doesn't solve
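A minimal sketch of the catalog-side caching Ryan describes, assuming a catalog that stores tables and views in one metastore. When a table lookup finds a view, the catalog keeps the fetched entry so that the caller's follow-up view lookup is served without a second round trip. The class `CachingTableViewCatalog`, its string encoding of metastore entries, and the `remoteCalls` counter are all hypothetical illustrations, not part of the SPIP or of Spark.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical catalog that avoids a second metastore RPC when a table lookup hits a view.
final class CachingTableViewCatalog {
    // Simulated remote metastore: identifier -> "TABLE" or "VIEW:<sql>".
    private final Map<String, String> metastore;
    // Entries fetched by a failed loadTable, kept for the next call on the same identifier.
    private final Map<String, String> lastFetched = new HashMap<>();
    int remoteCalls = 0; // number of simulated metastore round trips

    CachingTableViewCatalog(Map<String, String> metastore) {
        this.metastore = metastore;
    }

    private String fetch(String ident) {
        String cached = lastFetched.remove(ident); // single use: consumed by the follow-up call
        if (cached != null) return cached;
        remoteCalls++;
        String entry = metastore.get(ident);
        if (entry == null) throw new NoSuchElementException("Not found: " + ident);
        return entry;
    }

    String loadTable(String ident) {
        String entry = fetch(ident);
        if (entry.startsWith("VIEW:")) {
            lastFetched.put(ident, entry); // remember it so loadView needs no RPC
            throw new NoSuchElementException("Not a table: " + ident);
        }
        return ident;
    }

    String loadView(String ident) {
        String entry = fetch(ident);
        if (!entry.startsWith("VIEW:")) throw new NoSuchElementException("Not a view: " + ident);
        return entry.substring("VIEW:".length()); // the stored view SQL
    }
}
```

The design point is that the cache lives inside the catalog, so Spark itself needs no cache and each catalog decides whether the optimization is worth it.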

Re: SPIP: Catalog API for view metadata

2020-11-09 Thread Wenchen Fan
Moving the discussion back to this thread. The current question is how to avoid extra RPC calls for catalogs supporting both tables and views. There are several options: 1. ignore it, as extra RPC calls are cheap compared to the query execution 2. have a per-session cache for loaded tables/views 3.
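Option 2 (a per-session cache for loaded tables/views) could look roughly like the sketch below, assuming the underlying catalog call is modeled as a plain function. `SessionRelationCache`, `loaderCalls`, and `invalidate` are hypothetical names; the sketch also caches negative lookups, which is one of the design choices such a cache would have to settle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical per-session cache in front of a catalog lookup (an RPC in a real catalog).
final class SessionRelationCache<R> {
    private final Function<String, Optional<R>> loader; // the underlying catalog call
    private final Map<String, Optional<R>> cache = new HashMap<>();
    int loaderCalls = 0; // how many times the underlying catalog was actually called

    SessionRelationCache(Function<String, Optional<R>> loader) {
        this.loader = loader;
    }

    // Returns the cached result, calling the catalog only on the first lookup of an identifier.
    // Empty results ("not found") are cached too.
    Optional<R> load(String ident) {
        return cache.computeIfAbsent(ident, id -> {
            loaderCalls++;
            return loader.apply(id);
        });
    }

    // DDL in the same session (e.g. CREATE OR REPLACE VIEW) must drop the entry,
    // otherwise later queries would see stale metadata.
    void invalidate(String ident) {
        cache.remove(ident);
    }
}
```

The invalidation hook hints at the cost of this option: every DDL path in the session has to remember to call it, and changes made by other sessions remain invisible until the entry is dropped.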

Re: SPIP: Catalog API for view metadata

2020-09-04 Thread John Zhuge
SPIP has been updated. Please review. On Thu, Sep 3, 2020 at 9:22 AM John Zhuge wrote: > Wenchen, sorry for the delay, I will post an update shortly. > > On Thu, Sep 3, 2020 at 2:00 AM Wenchen Fan

Re: SPIP: Catalog API for view metadata

2020-09-03 Thread John Zhuge
Wenchen, sorry for the delay, I will post an update shortly. On Thu, Sep 3, 2020 at 2:00 AM Wenchen Fan wrote: > Any updates here? I agree that a new View API is better, but we need a > solution to avoid performance regression. We need to elaborate on the cache > idea. > > On Thu, Aug 20, 2020

Re: SPIP: Catalog API for view metadata

2020-09-03 Thread Wenchen Fan
Any updates here? I agree that a new View API is better, but we need a solution to avoid performance regression. We need to elaborate on the cache idea. On Thu, Aug 20, 2020 at 7:43 AM Ryan Blue wrote: > I think it is a good idea to keep tables and views separate. > > The main two arguments

Re: SPIP: Catalog API for view metadata

2020-08-19 Thread Ryan Blue
I think it is a good idea to keep tables and views separate. The main two arguments I’ve heard for combining lookup into a single function are the ones brought up in this thread. First, an identifier in a catalog must be either a view or a table and should not collide. Second, a single lookup is

Re: SPIP: Catalog API for view metadata

2020-08-18 Thread John Zhuge
> > AFAIK view schema is only used by DESCRIBE. > > Correction: Spark adds a new Project at the top of the parsed plan from > view, based on the stored schema, to make sure the view schema doesn't > change. > Thanks Wenchen! I thought I forgot something :) Yes it is the validation done in

Re: SPIP: Catalog API for view metadata

2020-08-18 Thread John Zhuge
Thanks Wenchen. Will do. On Tue, Aug 18, 2020 at 6:38 AM Wenchen Fan wrote: > > AFAIK view schema is only used by DESCRIBE. > > Correction: Spark adds a new Project at the top of the parsed plan from > view, based on the stored schema, to make sure the view schema doesn't > change. > > Can you

Re: SPIP: Catalog API for view metadata

2020-08-18 Thread Wenchen Fan
> AFAIK view schema is only used by DESCRIBE. Correction: Spark adds a new Project at the top of the parsed plan from view, based on the stored schema, to make sure the view schema doesn't change. Can you update your doc to incorporate the cache idea? Let's make sure we don't have perf issues if
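To illustrate what pinning a view to its stored schema means, here is a toy version of the check behind that top-level Project, assuming columns are matched by name and types must match exactly. This is a simplification: the real analyzer rule may insert casts or match differently, and `Column`, `ViewSchemaGuard`, and `pinToStoredSchema` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column: a name plus a type name such as "int" or "string".
record Column(String name, String type) {}

final class ViewSchemaGuard {
    // Returns the columns of a projection that keeps the view's output equal to its stored schema.
    // Simplified: match by name, require identical types, drop columns not in the stored schema.
    static List<Column> pinToStoredSchema(List<Column> stored, List<Column> currentOutput) {
        Map<String, String> current = new HashMap<>();
        for (Column c : currentOutput) current.put(c.name(), c.type());

        List<Column> projection = new ArrayList<>();
        for (Column s : stored) {
            String t = current.get(s.name());
            if (t == null)
                throw new IllegalStateException("Column missing from view query: " + s.name());
            if (!t.equals(s.type()))
                throw new IllegalStateException(
                    "Type of " + s.name() + " changed from " + s.type() + " to " + t);
            projection.add(s);
        }
        // Extra columns, e.g. ones picked up by SELECT * after the base table evolved, are dropped.
        return projection;
    }
}
```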

Re: SPIP: Catalog API for view metadata

2020-08-18 Thread John Zhuge
Thanks Burak and Walaa for the feedback! Here are my perspectives: > We shouldn't be persisting things like the schema for a view. This is not related to which option we choose, because the existing code persists the schema as well. When resolving the view, the analyzer always parses the view SQL text, it

Re: SPIP: Catalog API for view metadata

2020-08-14 Thread Walaa Eldin Moustafa
Wenchen, agreed with what you said. I was referring to situations where the underlying table schema evolves (say by introducing a nested field in a Struct), and also what you mentioned in cases of SELECT *. The Hive metastore handling of those does not automatically update view schema (even though

Re: SPIP: Catalog API for view metadata

2020-08-14 Thread Wenchen Fan
A view should have a fixed schema like a table. It should either be inferred from the query when creating the view, or be specified by the user manually, like `CREATE VIEW v(a, b) AS SELECT ...`. Users can still alter the view schema manually. Basically a view is just a named SQL query, which mostly has
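The two ways of fixing a view's column names described above (inferred from the query, or given explicitly as in `CREATE VIEW v(a, b) AS SELECT ...`) can be sketched as follows, assuming the explicit list renames the query's output columns by position and must have the same length. `ViewColumns.resolve` is a hypothetical helper, not a Spark method.

```java
import java.util.List;

final class ViewColumns {
    // userColumns: the list from CREATE VIEW v(a, b) ..., or an empty list if none was given.
    // queryOutput: output column names of the view's SELECT query.
    static List<String> resolve(List<String> userColumns, List<String> queryOutput) {
        if (userColumns.isEmpty()) {
            return List.copyOf(queryOutput); // schema inferred from the query
        }
        if (userColumns.size() != queryOutput.size()) {
            throw new IllegalArgumentException(
                "View column list has " + userColumns.size()
                    + " names but the query produces " + queryOutput.size() + " columns");
        }
        return List.copyOf(userColumns); // user-specified names replace query names by position
    }
}
```

Either way, the result is stored with the view, which is what makes the schema fixed until the user alters it.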

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Walaa Eldin Moustafa
+1 to making views special forms of tables. Sometimes a table can be converted to a view to hide some of the implementation details while not impacting readers (provided that the write path is controlled). Also, views can be defined on top of either other views or base tables, so the less

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Burak Yavuz
My high-level comment here is that, as a naive person, I would expect a View to be a special form of Table that SupportsRead but not SupportsWrite. loadTable in the TableCatalog API should load both tables and views. This way you avoid multiple RPCs to a catalog or data source or metastore, and
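A rough sketch of the "view is a read-only table" model, assuming one `loadTable` call returns either kind of entry. The interfaces `SketchTable`, `CanRead`, and `CanWrite` are hypothetical, simplified stand-ins that mirror the idea of Spark's `SupportsRead`/`SupportsWrite` mix-ins; they are not the real Spark interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified mix-ins; not Spark's SupportsRead / SupportsWrite.
interface SketchTable { String name(); }
interface CanRead extends SketchTable { String scanDescription(); }
interface CanWrite extends SketchTable { void append(String row); }

// A regular table can be read and written.
record DataTable(String name) implements CanRead, CanWrite {
    public String scanDescription() { return "scan data of " + name; }
    public void append(String row) { /* write path omitted in this sketch */ }
}

// A view is readable (by expanding its SQL) but deliberately does not implement CanWrite.
record ViewTable(String name, String sql) implements CanRead {
    public String scanDescription() { return "expand view SQL: " + sql; }
}

final class SingleLookupCatalog {
    private final Map<String, SketchTable> entries = new HashMap<>();
    int rpcCount = 0; // simulated catalog round trips

    void register(SketchTable t) { entries.put(t.name(), t); }

    // One call resolves either a table or a view, so no second lookup is needed.
    SketchTable loadTable(String ident) {
        rpcCount++;
        return entries.get(ident);
    }
}
```

With this model, a write to a view fails on a capability check (the entry is not `CanWrite`) rather than on a separate "is this a view?" lookup.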

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread John Zhuge
Thanks Ryan. ViewCatalog API mimics TableCatalog API including how shared namespace is handled: - The doc for createView states "it will throw ViewAlreadyExistsException when a view or table
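The shared-namespace rule quoted above (creating a view fails when either a view or a table already has that identifier) can be sketched as below. `SharedNamespaceCatalog` and the local `ViewAlreadyExistsException` class are hypothetical stand-ins written for this example, not Spark's classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the exception named in the SPIP doc.
class ViewAlreadyExistsException extends RuntimeException {
    ViewAlreadyExistsException(String ident) {
        super("A view or table already exists: " + ident);
    }
}

// Hypothetical catalog holding tables and views in one shared namespace.
final class SharedNamespaceCatalog {
    private final Set<String> tables = new HashSet<>();
    private final Map<String, String> views = new HashMap<>(); // identifier -> view SQL

    void createTable(String ident) {
        // Fail if a view uses the name, or if the table already exists (add returns false).
        if (views.containsKey(ident) || !tables.add(ident)) {
            throw new IllegalStateException("A table or view already exists: " + ident);
        }
    }

    void createView(String ident, String sql) {
        // Reject the identifier if either a table or a view already uses it.
        if (tables.contains(ident) || views.containsKey(ident)) {
            throw new ViewAlreadyExistsException(ident);
        }
        views.put(ident, sql);
    }
}
```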

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Ryan Blue
I agree with Wenchen that we need to be clear about resolution and behavior. For example, I think that we would agree that CREATE VIEW catalog.schema.name should fail when there is a table named catalog.schema.name. We’ve already included this behavior in the documentation for the TableCatalog API

Re: SPIP: Catalog API for view metadata

2020-08-12 Thread John Zhuge
Hi Wenchen, Thanks for the feedback! 1. Add a new View API. How to avoid name conflicts between table and view? > When resolving relation, shall we lookup table catalog first or view > catalog? See clarification in SPIP section "Proposed Changes - Namespace": - The proposed new view
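For the question of which catalog to consult first, here is a sketch under one assumed order (table catalog first, then view catalog); the order the SPIP actually specifies is not visible in this excerpt. `TableFirstResolver`, `ResolvedTable`, and `ResolvedView` are hypothetical names. The `lookups` counter shows why performance came up later in the thread: with separate APIs, resolving a view costs two lookups.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolved relation types; not Spark classes.
interface ResolvedRelation {}
record ResolvedTable(String ident) implements ResolvedRelation {}
record ResolvedView(String ident, String sql) implements ResolvedRelation {}

final class TableFirstResolver {
    private final Set<String> tables;        // identifiers known to the table catalog
    private final Map<String, String> views; // identifier -> view SQL in the view catalog
    int lookups = 0;                         // simulated catalog round trips

    TableFirstResolver(Set<String> tables, Map<String, String> views) {
        this.tables = tables;
        this.views = views;
    }

    // Assumed order: table catalog first, then view catalog.
    Optional<ResolvedRelation> resolve(String ident) {
        lookups++; // loadTable
        if (tables.contains(ident)) return Optional.of(new ResolvedTable(ident));
        lookups++; // loadView
        String sql = views.get(ident);
        return sql == null ? Optional.empty() : Optional.of(new ResolvedView(ident, sql));
    }
}
```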

Re: SPIP: Catalog API for view metadata

2020-08-12 Thread Wenchen Fan
Hi John, Thanks for working on this! View support is very important to the catalog plugin API. After reading your doc, I have one high-level question: should a view be a separate API, or is it just a special type of table? AFAIK in most databases, tables and views share the same namespace. You

Re: SPIP: Catalog API for view metadata

2020-08-11 Thread John Zhuge
Hi Spark devs, I'd like to bring more attention to this SPIP. As Dongjoon indicated in the email "Apache Spark 3.1 Feature Expectation (Dec. 2020)", this feature can be considered for 3.2 or even 3.1. View catalog builds on top of the catalog plugin system introduced in DataSourceV2. It adds the

SPIP: Catalog API for view metadata

2020-04-22 Thread John Zhuge
Hi everyone, In order to disassociate view metadata from Hive Metastore and support different storage backends, I am proposing a new view catalog API to load, create, alter, and drop views. Document: https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing
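To make the proposal concrete, here is a minimal sketch of a view catalog exposing the four operations named above (load, create, alter, drop), backed by an in-memory map standing in for Hive Metastore or another storage backend. The names `ViewDefinition`, `ViewCatalog`, and `InMemoryViewCatalog`, the fields of the view metadata, and the exception types are all assumptions for illustration; they are not the interface defined in the SPIP document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical, simplified view metadata; the fields are assumptions, not the SPIP's definition.
record ViewDefinition(String name, String sql, List<String> columnNames, Map<String, String> properties) {}

// Hypothetical catalog interface covering the four operations named in the proposal.
interface ViewCatalog {
    ViewDefinition loadView(String ident);          // throws NoSuchElementException if absent
    ViewDefinition createView(ViewDefinition view); // throws IllegalStateException if it exists
    ViewDefinition alterView(String ident, String newSql);
    boolean dropView(String ident);                 // true if a view was dropped
}

// In-memory backend standing in for any pluggable storage backend.
final class InMemoryViewCatalog implements ViewCatalog {
    private final Map<String, ViewDefinition> views = new HashMap<>();

    @Override
    public ViewDefinition loadView(String ident) {
        ViewDefinition v = views.get(ident);
        if (v == null) throw new NoSuchElementException("View not found: " + ident);
        return v;
    }

    @Override
    public ViewDefinition createView(ViewDefinition view) {
        if (views.putIfAbsent(view.name(), view) != null) {
            throw new IllegalStateException("View already exists: " + view.name());
        }
        return view;
    }

    @Override
    public ViewDefinition alterView(String ident, String newSql) {
        ViewDefinition old = loadView(ident); // fails if the view does not exist
        ViewDefinition updated =
            new ViewDefinition(old.name(), newSql, old.columnNames(), old.properties());
        views.put(ident, updated);
        return updated;
    }

    @Override
    public boolean dropView(String ident) {
        return views.remove(ident) != null;
    }
}
```

Because Spark would only see the `ViewCatalog` interface, swapping the in-memory map for a different backend would not change how views are resolved.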