Ok, I've finished with this change. Didn't get reviews on the early cleanup PRs, so I've pushed all these changes into the first cleanup PR: https://github.com/apache/beam/pull/5224
Andrew On Tue, May 1, 2018 at 10:35 AM Andrew Pilloud <[email protected]> wrote: > I'm just starting to move forward on this. Looking at my team's short term > needs for SQL, option one would be good enough, however I agree with Kenn > that we want something like option two eventually. I also don't want to > break existing users and it sounds like there is at least one custom > MetaStore not in beam. So my plan is to go with option two and simplify the > interface where functionality loss will not result. > > There is a common set of operations between the MetaStore and the > TableProvider. I'd like to make MetaStore inherit the interface of > TableProvider. Most operations we need (createTable, dropTable, listTables) > are already identical between the two, and so this will have no impact on > custom implementations. The buildBeamSqlTable operation does differ: the > MetaStore takes a table name, the TableProvider takes a table object. > However everything calling this API already has the full table object, so I > would like to simplify this interface by passing the table object in both > cases. Objections? > > Andrew > > On Tue, Apr 24, 2018 at 9:27 AM James <[email protected]> wrote: > >> Kenn: yes, MetaStore is user-facing, Users can choose to implement their >> own MetaStore, currently only an InMemory implementation in Beam CodeBase. >> >> Andrew: I like the second option, since it "retains the ability for DDL >> operations to be processed by a custom MetaStore.", IMO we should have the >> DDL ability as a fully functional SQL. >> >> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles <[email protected]> wrote: >> >>> Can you say more about how the metastore is used? I presume it is or >>> will be user-facing, so are Beam SQL users already providing their own? >>> >>> I'm sure we want something like that eventually to support things like >>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when using >>> Beam SQL to create a derived data set. But I don't think we should build >>> out those code paths until we have at least one non-in-memory >>> implementation. >>> >>> Just a really high level $0.02. >>> >>> Kenn >>> >>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud <[email protected]> >>> wrote: >>> >>>> I'm working on updating our Beam DDL code to use the DDL execution >>>> functionality that recently merged into core calcite. This enables us to >>>> take advantage of Calcite JDBC as a way to use Beam SQL. As part of that I >>>> need to reconcile the Beam SQL Environments with the Calcite Schema (which >>>> is calcite's environment). We currently have copies of our tables in the >>>> Beam meta/store, Calcite Schema, BeamSqlEnv, and BeamQueryPlanner. I have a >>>> pending PR which merges the later two to just use the Calcite Schema copy. >>>> Merging the Beam MetaStore and Calcite Schema isn't as simple. I have >>>> two options I'm looking for feedback on: >>>> >>>> 1. Make Calcite Schema authoritative and demote MetaStore to be >>>> something more like a Calcite TableFactory. Calcite Schema already >>>> implements the semantics of our InMemoryMetaStore. If the Store interface >>>> is just over built, this approach would result in a significant reduction >>>> in code. This would however eliminate the CRUD part of the interface >>>> leaving just the buildBeamSqlTable function. >>>> >>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class >>>> translating to Calcite Schema (like we do already with tables). Instead of >>>> copying tables into the Calcite Schema we would pass in Beam meta/store as >>>> the source of truth and Calcite would manipulate tables directly in the >>>> Beam meta/store. This is a bit more complicated but retains the ability for >>>> DDL operations to be processed by a custom MetaStore. >>>> >>>> Thoughts? >>>> >>>> Andrew >>>> >>>
