As you point out, naming is a separate issue. I believe we inherited names from some other system. But it is an issue that we use “good” names for implicit columns. If we add more names (“createDate”, “modificationDate”, “owner”, or whatever), we end up breaking the queries of anyone whose tables have columns with those names.
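
To make the collision concrete, here is a minimal sketch in Python. It is a toy model, not Drill code: `resolve`, `IMPLICIT_V1`, and `IMPLICIT_V2` are hypothetical names, and the only rule it assumes is the one described in the quoted message below, that an implicit column takes priority over a table column of the same name.

```python
# Toy model of name resolution; none of these names exist in Drill.
IMPLICIT_V1 = {"filename", "suffix", "fqn", "filepath"}
IMPLICIT_V2 = IMPLICIT_V1 | {"owner"}  # hypothetical future implicit column


def resolve(column, table_columns, implicit):
    """Say which source supplies `column` when implicit names win."""
    if column in implicit:
        return "implicit"
    if column in table_columns:
        return "table"
    return "missing"


table = {"id", "owner"}  # a user table that happens to have an "owner" column

print(resolve("owner", table, IMPLICIT_V1))  # table
print(resolve("owner", table, IMPLICIT_V2))  # implicit
```

The same query silently changes meaning once “owner” joins the implicit set, which is the breakage described above.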
Would be good to have a prefix, as you suggest, to separate Drill names from user names. As it turns out, Drill supports maps (AKA structs) and arrays, so perhaps we could have:

$meta$
  |- filename
  |- fqn
  |- …
  |- dir[]

Where “dir” is an array, rather than the current separate scalar dir0, dir1, etc.

The above map lets us add any number of metadata columns without potentially breaking existing queries.

Note that we already have a problem where we can hit a hard schema change because one reader sees “/a/b/c/foo.csv” while another sees “a/b/bar.csv”, resulting in different numbers of “dirx” columns from the two readers.

Yet another issue is that, in wildcard queries (e.g. “SELECT *”), we add all implicit columns, then remove them later. We should optimize this case.

But, even if we keep the original names, and defer the other issues, the question about map semantics still stands…

- Paul

> On Oct 9, 2017, at 12:06 PM, Boaz Ben-Zvi <[email protected]> wrote:
>
> How about changing all those “implicit” columns to have some
> “unconventional” prefix, like an underscore (or two _ _ ); e.g. _suffix,
> _dir0, etc.
>
> With such a change we may need to handle the transition of existing users’
> code; e.g., maybe change the priority (mentioned below) so that an existing
> “suffix” column takes precedence over the implicit one.
> Or just go “cold turkey” and force the users to change.
>
> Just an idea,
>
> Boaz
>
> On 10/9/17, 10:45 AM, "Paul Rogers" <[email protected]> wrote:
>
> Hi All,
>
> Drill provides a set of “implicit” columns to describe files: filename,
> suffix, fqn and filepath. Drill also provides an open-ended set of partition
> columns: dir0, dir1, dir2, etc.
>
> Not all readers support the above: some do and some don’t.
>
> Drill semantics seem to treat these as semi-reserved words when a reader
> supports implicit columns. If a table has a “suffix” column, then Drill will
> treat “suffix” as an implicit column, ignoring the table column. If the user
> wants that table column, they can use a session option to temporarily rename
> the implicit column. A bit odd, perhaps, but it is our solution.
>
> What is our desired behavior, however, if the user asks for a column that
> includes an implicit column as a prefix: “suffix.a”? Clearly, here, “suffix”
> is a map (i.e. structure) and “a” is a field within that map. Since the
> implicit “suffix” is never a map, should we:
>
> 1) Assume that, here, “suffix” is a map column projected from the table?
> 2) Issue an error?
> 3) Ignore the “.a” part and just return “suffix” as an implicit column?
> 4) Something else?
>
> The code is murky on this point because JSON is implemented far
> differently than text files and so on. Each has its own rules. Do we need
> consistency of behavior, or is reader-specific behavior the expected design?
>
> Thanks,
>
> - Paul
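
As a footnote to the reply at the top of this thread, the sketch below contrasts the current flat dir0, dir1, … columns with the proposed $meta$ map. It is only an illustration in Python under loud assumptions: `flat_dir_columns` and `meta_map` are hypothetical helpers, every path component before the file name is treated as one directory level, and the map layout is just the one drawn in the reply, not an agreed design.

```python
def flat_dir_columns(path):
    """Current style: one scalar column per directory level."""
    dirs = [p for p in path.split("/") if p][:-1]  # drop the file name
    return {f"dir{i}": d for i, d in enumerate(dirs)}


def meta_map(path):
    """Proposed style (hypothetical layout): one map holding a `dir` array."""
    parts = [p for p in path.split("/") if p]
    return {"$meta$": {"filename": parts[-1], "dir": parts[:-1]}}


paths = ["/a/b/c/foo.csv", "a/b/bar.csv"]

# Flat columns: the two readers disagree on the set of columns.
print([sorted(flat_dir_columns(p)) for p in paths])
# [['dir0', 'dir1', 'dir2'], ['dir0', 'dir1']]

# Map with an array: same columns from both readers, only the array length differs.
print([sorted(meta_map(p)["$meta$"]) for p in paths])
# [['dir', 'filename'], ['dir', 'filename']]
```

With the array, a deeper directory shows up as a longer value rather than as an extra column, so the two readers would agree on the schema.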
