Reviews are wrapping up, this will probably merge Monday if I don't hear from anyone else. One more TableProvider API change after review feedback: getTables now returns Map<String, Table> instead of Set<Table>.
Andrew On Thu, May 3, 2018 at 10:41 AM Andrew Pilloud <[email protected]> wrote: > Ok, I've finished with this change. Didn't get reviews on the early > cleanup PRs, so I've pushed all these changes into the first cleanup PR: > https://github.com/apache/beam/pull/5224 > > Andrew > > On Tue, May 1, 2018 at 10:35 AM Andrew Pilloud <[email protected]> > wrote: > >> I'm just starting to move forward on this. Looking at my team's short >> term needs for SQL, option one would be good enough, however I agree with >> Kenn that we want something like option two eventually. I also don't want >> to break existing users and it sounds like there is at least one custom >> MetaStore not in beam. So my plan is to go with option two and simplify the >> interface where functionality loss will not result. >> >> There is a common set of operations between the MetaStore and the >> TableProvider. I'd like to make MetaStore inherit the interface of >> TableProvider. Most operations we need (createTable, dropTable, listTables) >> are already identical between the two, and so this will have no impact on >> custom implementations. The buildBeamSqlTable operation does differ: the >> MetaStore takes a table name, the TableProvider takes a table object. >> However everything calling this API already has the full table object, so I >> would like to simplify this interface by passing the table object in both >> cases. Objections? >> >> Andrew >> >> On Tue, Apr 24, 2018 at 9:27 AM James <[email protected]> wrote: >> >>> Kenn: yes, MetaStore is user-facing, Users can choose to implement their >>> own MetaStore, currently only an InMemory implementation in Beam CodeBase. >>> >>> Andrew: I like the second option, since it "retains the ability for DDL >>> operations to be processed by a custom MetaStore.", IMO we should have the >>> DDL ability as a fully functional SQL. >>> >>> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles <[email protected]> wrote: >>> >>>> Can you say more about how the metastore is used? I presume it is or >>>> will be user-facing, so are Beam SQL users already providing their own? >>>> >>>> I'm sure we want something like that eventually to support things like >>>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when using >>>> Beam SQL to create a derived data set. But I don't think we should build >>>> out those code paths until we have at least one non-in-memory >>>> implementation. >>>> >>>> Just a really high level $0.02. >>>> >>>> Kenn >>>> >>>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud <[email protected]> >>>> wrote: >>>> >>>>> I'm working on updating our Beam DDL code to use the DDL execution >>>>> functionality that recently merged into core calcite. This enables us to >>>>> take advantage of Calcite JDBC as a way to use Beam SQL. As part of that I >>>>> need to reconcile the Beam SQL Environments with the Calcite Schema (which >>>>> is calcite's environment). We currently have copies of our tables in the >>>>> Beam meta/store, Calcite Schema, BeamSqlEnv, and BeamQueryPlanner. I have >>>>> a >>>>> pending PR which merges the later two to just use the Calcite Schema copy. >>>>> Merging the Beam MetaStore and Calcite Schema isn't as simple. I have >>>>> two options I'm looking for feedback on: >>>>> >>>>> 1. Make Calcite Schema authoritative and demote MetaStore to be >>>>> something more like a Calcite TableFactory. Calcite Schema already >>>>> implements the semantics of our InMemoryMetaStore. If the Store interface >>>>> is just over built, this approach would result in a significant reduction >>>>> in code. This would however eliminate the CRUD part of the interface >>>>> leaving just the buildBeamSqlTable function. >>>>> >>>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class >>>>> translating to Calcite Schema (like we do already with tables). Instead of >>>>> copying tables into the Calcite Schema we would pass in Beam meta/store as >>>>> the source of truth and Calcite would manipulate tables directly in the >>>>> Beam meta/store. This is a bit more complicated but retains the ability >>>>> for >>>>> DDL operations to be processed by a custom MetaStore. >>>>> >>>>> Thoughts? >>>>> >>>>> Andrew >>>>> >>>>
