Re: DataSourceV2 community sync #3

Ryan Blue Thu, 29 Nov 2018 14:32:36 -0800

Hi everyone,

Here are my notes from last night’s sync. Some attendees who joined during
the discussion may be missing, since I made the list while we were waiting
for people to join.


If you have topic suggestions for the next sync, please start sending them
to me. Thank you!

*Attendees:*

Ryan Blue
John Zhuge
Jamison Bennett
Yuanjian Li
Xiao Li
stczwd
Matt Cheah
Wenchen Fan
Genglian Wang
Kevin Yu
Maryann Xue
Cody Koeninger
Bruce Robbins
Rohit Karlupia

*Agenda:*

   - Follow-up issues or discussion on Wenchen’s PR #23086
   - TableCatalog proposal
   - CatalogTableIdentifier

*Notes:*

   - Discussion about PR #23086
      - Where should the catalog API live since it needs to be accessible
      to catalyst rules, but the catalyst module is private?
      - Wenchen suggested creating a sql-api module for v2 API interfaces,
      making catalyst depend on it
      - Consensus was to use Wenchen’s suggestion
   - In discussion about #23086, Xiao asked how adding catalog to a table
   identifier will work
      - Background from Ryan: existing code paths use TableIdentifier and
      don’t expect a catalog portion. If an identifier with a catalog were
      passed to existing code, that code may use the default catalog not
      knowing that a different one was requested, which would be incorrect
      behavior.
      - Ryan: The proposal for CatalogTableIdentifier addresses this
      problem. TableIdentifier is used for identifiers that have no catalog
      set. By enforcing that requirement, passing a TableIdentifier to old
      code ensures that no catalogs leak into that code. This is also used
      when the catalog is set from context. For example, the TableCatalog
      API accepts only TableIdentifier because the catalog is already
      determined.
   - Xiao asked whether FunctionIdentifier needs to be updated in the same
   way as CatalogTableIdentifier.
      - Ryan: Yes, when a FunctionCatalog API is added
   - The remaining time was spent discussing whether the plan to
   incrementally replace the current catalog API will work. [Not great notes
   here, feel free to add your take in a reply]
      - Xiao suggested that there are restrictions for how tables and
      functions interact. Because of this, he doesn’t think that separate
      TableCatalog and FunctionCatalog APIs are feasible.
      - Wenchen and Ryan think that functions should be orthogonal to data
      sources
      - Matt and Ryan think that catalog design can be done incrementally
      as new interfaces (e.g. FunctionCatalog) are added and that the proposed
      TableCatalog does not preclude designing for Xiao’s concerns later
      - [I forget who] pointed out that there are restrictions in some
      databases for views from different sources
      - There was some discussion about when functions or views cannot be
      orthogonal. For example, where the code runs is important. Functions
      pushed to sources cannot necessarily be run on other sources and Spark
      functions cannot necessarily be pushed down to sources.
      - Xiao would like a full catalog replacement design, including views,
      databases, and functions and how they interact, before moving forward
      with the proposed TableCatalog API
      - Ryan [and Matt, I think] think that TableCatalog is compatible with
      future decisions and the best path forward is to build incrementally. An
      exhaustive design process blocks progress on v2.
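
To make the identifier discussion above concrete, here is a rough sketch of
the separation between CatalogTableIdentifier and TableIdentifier. It is not
the code from the PR or the proposal: the field names, asTableIdentifier, the
one-method TableCatalog interface, and CatalogResolver are all hypothetical
and exist only to illustrate the idea that a catalog cannot leak into code
that only knows about TableIdentifier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in: an identifier that by construction has no catalog part.
final class TableIdentifier {
    final Optional<String> database;
    final String table;

    TableIdentifier(Optional<String> database, String table) {
        this.database = database;
        this.table = table;
    }
}

// Hypothetical stand-in: an identifier that may also name a catalog.
final class CatalogTableIdentifier {
    final Optional<String> catalog;
    final TableIdentifier ident;

    CatalogTableIdentifier(Optional<String> catalog, TableIdentifier ident) {
        this.catalog = catalog;
        this.ident = ident;
    }

    // Conversion for old code paths: refuses to silently drop a requested catalog.
    TableIdentifier asTableIdentifier() {
        if (catalog.isPresent()) {
            throw new IllegalStateException(
                "Catalog '" + catalog.get() + "' cannot be passed to catalog-unaware code");
        }
        return ident;
    }
}

// Hypothetical, simplified catalog interface. It accepts only TableIdentifier
// because the catalog has already been chosen by the time it is called.
interface TableCatalog {
    String loadTable(TableIdentifier ident);
}

// Hypothetical resolver: picks the catalog first, then hands over the
// catalog-free part of the identifier.
final class CatalogResolver {
    private final Map<String, TableCatalog> catalogs = new HashMap<>();
    private final String defaultCatalog;

    CatalogResolver(String defaultCatalog) {
        this.defaultCatalog = defaultCatalog;
    }

    void register(String name, TableCatalog catalog) {
        catalogs.put(name, catalog);
    }

    String load(CatalogTableIdentifier id) {
        String name = id.catalog.orElse(defaultCatalog);
        TableCatalog catalog = catalogs.get(name);
        if (catalog == null) {
            throw new IllegalArgumentException("Unknown catalog: " + name);
        }
        return catalog.loadTable(id.ident);
    }
}
```

In this sketch, old code would only ever receive the result of
asTableIdentifier(), so an identifier that names a non-default catalog fails
loudly instead of being resolved against the default catalog.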


On Mon, Nov 26, 2018 at 2:54 PM Ryan Blue <[email protected]> wrote:

> Hi everyone,
>
> I just sent out an invite for the next DSv2 community sync for Wednesday,
> 28 Nov at 5PM PST.
>
> We have a few topics left over from last time to cover. A few people
> wanted to cover catalog APIs, so I put two items on the agenda:
>
>    - The TableCatalog proposal (and other catalog APIs)
>    - Using CatalogTableIdentifier to separate v1 and v2 code paths and
>    avoid unintended behavior changes
>
> As I noted in the summary last time, please send topics ahead of time so
> we can get started more quickly.
>
> If you would like to be added to the google hangout invite, please let me
> know and I’ll add you. Thanks!
>
> rb
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix
