Google's latest paper about F1[1] claims to support any data sources by using an extension api called TVF see section 6.3. Also need to declare column datatype before the query.
[1] http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf On Fri, Sep 7, 2018 at 9:47 AM Paul Rogers <[email protected]> wrote: > Hi All, > > We've discussed quite a few times whether Drill should or should not > support or require schemas, and if so, how the user might express the > schema. > > I came across a paper [1] that suggests a simple, elegant SQL extension: > > EXTRACT <column>[:<type>] {,<column>[:<type>]} > FROM <stream_name> > > Paraphrasing into Drill's SQL: > > SELECT <column>[:<type>][AS <alias>] {,<column>[:<type>][AS <alias>]} > FROM <table_name> > > Have a collection of JSON files in which string column `foo` appears in > only half the files? Don't want to get schema conflicts with VARCHAR and > nullable INT? Just do: > > SELECT name:VARCHAR, age:INT, foo:VARCHAR > FROM `my-dir` ... > > Not only can the syntax be used to specify the "natural" type for a > column, it might also specify a preferred type. For example. "age:INT" says > that "age" is an INT, even though JSON would normally parse it as a BIGINT. > Similarly, using this syntax is a easy way to tell Drill how to convert CSV > columns from strings to DATE, INT, FLOAT, etc. without the need for CAST > functions. (CAST functions read the data in one format, then convert it to > another in a Project operator. Using a column type might let the reader do > the conversion -- something that is easy to implement if using the "result > set loader" mechanism.) > > Plus, the syntax fits nicely into the existing view file structure. If the > types appear in views, then client tools can continue to use standard SQL > without the type information. > > When this idea came up in the past, someone mentioned the issue of > nullable vs. non-nullable. (Let's also include arrays, since Drill supports > that. Maybe add a suffix to the the name: > > SELECT req:VARCHAR NOT NULL, opt:INT NULL, arr:FLOAT[] FROM ... > > Not pretty, but works with the existing SQL syntax rules. > > Obviously, Drill has much on its plate, so not suggestion that Drill > should do this soon. Just passing it along as yet another option to > consider. > > Thanks, > - Paul > > [1] http://www.cs.columbia.edu/~jrzhou/pub/Scope-VLDBJ.pdf
