[ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086580#comment-16086580
 ] 

Paul Rogers commented on DRILL-4264:
------------------------------------

Putting on my user hat, I don't think users of Drill (or even non-planner 
developers like me) will understand that column names behave differently than 
table/schema names.

For example, from the description, I cannot predict what happens here:

{code}
Data: dfs.ds.foo.json, contents: { "a" : { "b.c": 10 } }

SELECT `dfs`.`ds`.`foo.json`.`a`.`b.c` FROM `dfs`.`ds`.`foo.json` (1)
SELECT `dfs.ds.foo.json.a`.`b.c` FROM `dfs.ds`.`foo.json` (2)
SELECT `dfs.ds.foo.json.a.b.c` FROM `dfs.ds.foo.json` (3)
{code}

Case (1) should work, right? Each component of the path name is enclosed in 
back-ticks, and the separating dots are unquoted. {{`b.c`}} is quoted, so it is 
a complete field name. Similarly, the file name {{`foo.json`}} is quoted, so 
clearly {{.json}} is part of the file name.

By this logic, case (2) should not work. The quotes enclose parts of a path, 
so the dots in those names should not be treated as delimiters, but rather as 
part of the name. That is, the SELECT list and FROM clause should parse as:

Table: {{`dfs.ds.foo.json.a`}}
Column: {{`b.c`}}
Schema: {{`dfs.ds`}}
Table: {{`foo.json`}}

Since the two tables do not agree, the query should fail in the planner. Even 
if it didn't, it should return null because no table column matches {{`b.c`}}. 
And yet your explanation suggests that some parts of the quoted name will be 
treated as separate components.

If so, then Drill is magic: it knows when dots are part of a name (such as a 
file name), and so (3) should also work. But it won't, for the reasons you state.

Ideally, we'd enforce form (2): dots inside quotes are part of the name; they 
are not separators. But it seems that if we do that, we might break existing 
queries. Or, can this actually work?
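
The rule that dots inside quotes are part of the name can be sketched in a few 
lines. The Python below is an illustration only, not Drill's parser: 
{{split_identifier}} is a hypothetical helper that splits a path on unquoted 
dots and leaves quoted dots alone, applied to the three SELECT-list forms above.

```python
def split_identifier(path: str) -> list:
    """Split a back-tick-quoted path on unquoted dots only.

    Hypothetical helper for illustration; this is not Drill's parser.
    """
    parts = []
    current = []
    in_quotes = False
    for ch in path:
        if ch == "`":
            # Toggle quoting; the back-tick itself is not part of the name.
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot ends the current component.
            parts.append("".join(current))
            current = []
        else:
            # Any other character, including a quoted dot, belongs to the name.
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated back-tick quote")
    parts.append("".join(current))
    return parts


# The three SELECT-list forms from the example above.
case1 = split_identifier("`dfs`.`ds`.`foo.json`.`a`.`b.c`")
case2 = split_identifier("`dfs.ds.foo.json.a`.`b.c`")
case3 = split_identifier("`dfs.ds.foo.json.a.b.c`")

print(case1)  # ['dfs', 'ds', 'foo.json', 'a', 'b.c']
print(case2)  # ['dfs.ds.foo.json.a', 'b.c']
print(case3)  # ['dfs.ds.foo.json.a.b.c']
```

Under this rule only form (1) yields separate schema, workspace, file, and 
column components; forms (2) and (3) each collapse into one or two long names.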

Let me throw in two more complications:

* The directory containing foo.json has a dot: "my.files"
* The workspace name itself contains a dot: "my.ws"

Can I do the above? If not, why not? What SQL syntax would I use? Maybe:

{code}
SELECT `my.ws`.`/my.files/foo.json`.`a`.`b.c` FROM `dfs`.`ds`.`foo.json` (4)
{code}

Seems we've rather gotten ourselves into a muddle by allowing separator dots in 
names.

About here I guess we should ask, what do Hive and Parquet do? They must have 
solved this issue.

> Dots in identifier are not escaped correctly
> --------------------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>
> If you have some json data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows since their identifiers contain 
> dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed since there are many json data files containing dots in 
> some of the keys (e.g. when specifying version numbers etc)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
