Hi Bohdan, Thanks for your explanation. My question comes from a little project I'm working on to handle projection in EVF. Queries go through two major steps: planing and execution. At the planning stage we use SQL syntax for the project list. For example:
explain plan for SELECT a, e.`map`.`member`, `dict`['key'], `array`[10] FROM cp.`employee.json` e The planner sends an execution plan to operators. The project list appears in JSON. For the above: "columns" : [ "`a`", "`map`.`member`", "`dict`.`key`", "`array`[10]" ], We see that the JSON works as you described: * The SQL map "map.member" syntax is converted to "`map`.`member`" in the JSON plan. * The SQL DICT "`dict`['key']" syntax is converted to a form identical to maps: "`dict`.`key`". * The SQL DICT/array "`array`[10]" syntax is converted to "`array`[10]" in JSON. That is, on the execution side, we can't tell the difference between a MAP and a DICT request. We also can't tell the difference between an Array and DICT request. Apparently, because of this, the Schema Path parser does not recognize DICT syntax. Given the way projection works, "a.b" and "a['b']" are identical: either works for both a map or a DICT with VARCHAR keys. I was confused because the "ProjectionType" and "RequestedColumn" classes were extended with a DICT projection type. But, as we just saw, it is impossible to ever use that projection type (the dict['key'] syntax) in the execution engine. Shall I just remove special support for DICT projection, and just say that map and array projection are both compatible with a DICT column? One other related question. As I recall, a DICT allows any scalar type as a key. We saw that VARCHAR keys are converted to map references, INT keys are converted to array references. But, what about DOUBLE keys (recognizing that such keys are a bad idea): explain plan for SELECT `dict2`[123.4] FROM cp.`employee.json` VALIDATION ERROR: From line 1, column 25 to line 1, column 38: Cannot apply 'ITEM' to arguments of type 'ITEM(<ANY>, <DECIMAL(4, 1)>)' So. We only support INT and VARCHAR keys in DICT when used with literals. Is this intentional? Obviously, to change this behavior, we'd have to change how columns are stored in JSON and we'd have to change the schema path parser. Doing so would impact all code that uses schema paths (including the projection stuff I'm working on.) Thanks, - Paul On Monday, January 20, 2020, 12:02:29 AM PST, Bohdan Kazydub <bohdan.kazy...@gmail.com> wrote: Hi Paul, `SELECT myMap.x ...` and `SELECT myMap['x'] ...` is treated the same in Drill - schema path parser recognizes it as `myMap.x` in both cases. The same is true for DICT - both `myDict['key1']` and myDict.key1 allows Python-like syntax for projecting DICT values, but in schema path it is also stored as in the case for MAP - myDIct.key1 - as you can see, there is no distinction between MAP and DICT based on schema path alone. (Note, that one can't project a `key` in DICT - `SELECT myDict.key ...` will be treated as if `value` identified by `key` with value 'key' is projected, as in Java's `Map<String, Object> map = ...; Object value = map.get("key");`). In case when a key is an integer, schema path is the same as in case of array. Is this what you meant by "schema path parser does not recognize the syntax" or do you get an error? On Mon, Jan 20, 2020 at 5:16 AM Paul Rogers <par0...@yahoo.com.invalid> wrote: > Hi All, > > What did we decide to do about projecting DICT values? Drill allows us to > project specific MAP members: > > SELECT myMap.x ... > > And, Drill allows projecting array members: > > SELECT myArray[3] ... > > I thought there was discussion of allowing Python-like syntax for > projecting DICT values: > > SELECT myDict['key1'] ... > > I tried this with no quotes, single-quotes and back-tick quotes. Seems > that the schema path parser does not recognize the syntax. > > Is there some trick that I'm missing? > > Thanks, > - Paul > >