[ https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-6168:
------------------------------------
    Fix Version/s: 1.18.0

> Table functions do not "inherit" default configuration
> ------------------------------------------------------
>
>                 Key: DRILL-6168
>                 URL: https://issues.apache.org/jira/browse/DRILL-6168
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.18.0
>
>
> See DRILL-6167, which describes an attempt to use a table function with a regex
> format plugin.
> Consider the plugin configuration:
> {code}
>     RegexFormatConfig sampleConfig = new RegexFormatConfig();
>     sampleConfig.extension = "log1";
>     sampleConfig.regex = DATE_ONLY_PATTERN;
>     sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than in the usual JSON in the
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
>       String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>           "(type => 'regex', regex => '(\\\\d\\\\d\\\\d\\\\d)-(\\\\d\\\\d)-(\\\\d\\\\d) .*'))";
>       client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the 
> format plugin config defined above. A query without the table function does, 
> in fact, work using the defined config. But with a table function, we get
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has fewer names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning comes from the custom plugin, not from Drill.) This is the plugin
> saying, "Hey! You didn't provide column names!" But in the format definition,
> we did provide names. If we run the query without a table function, we do see
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0<VARCHAR(OPTIONAL)>,Column$1<VARCHAR(OPTIONAL)>,Column$2<VARCHAR(OPTIONAL)>
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3.  Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values,
> leaving them blank, including the column names.
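
The Column$0..Column$2 headers in the result follow from the fallback that the warning describes. The following is a minimal Java sketch of that fallback. It assumes (hypothetically) that the reader pairs configured field names with regex groups by position. `ColumnNameFallback` and `resolveColumnNames` are illustrative names, not the custom plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the naming fallback implied by the warning
 * "Column list has fewer names than the pattern has groups, filling
 * extras with Column$n." Not the plugin's real implementation.
 */
public class ColumnNameFallback {

  /** Returns one name per regex group: configured names first, Column$n for the rest. */
  static List<String> resolveColumnNames(List<String> configuredFields, int groupCount) {
    // A config whose fields were discarded behaves like an empty list here.
    List<String> fields = (configuredFields != null) ? configuredFields : List.of();
    if (fields.size() < groupCount) {
      // Stand-in for the reader's logger.warn(...) call.
      System.err.println("Column list has fewer names than the pattern has groups, "
          + "filling extras with Column$n.");
    }
    List<String> names = new ArrayList<>();
    for (int i = 0; i < groupCount; i++) {
      // Use the configured name if one exists at this position, else Column$i.
      names.add(i < fields.size() ? fields.get(i) : "Column$" + i);
    }
    return names;
  }

  public static void main(String[] args) {
    // Config as defined in the test: the configured names are used.
    System.out.println(resolveColumnNames(List.of("year", "month", "day"), 3)); // [year, month, day]
    // Config rebuilt by the table function with no fields: every group gets Column$n.
    System.out.println(resolveColumnNames(null, 3)); // [Column$0, Column$1, Column$2]
  }
}
```

With the date pattern's three groups, an empty field list yields exactly the three Column$n headers shown in the result.
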
> The expected behavior is that all properties defined in the config should
> remain unchanged _except_ for those set in the table function. Why? In order to
> know which format plugin to use, the code has to map from the suffix (".log2" 
> here) to a format plugin _config_. (The config is the only thing that 
> specifies a suffix.) Since we mapped to a config (not the unconfigured 
> plugin), we'd expect the config properties to be used.
> It is highly surprising that only the suffix is used while all other
> attributes are ignored. This seems very much in the "bug" category and not at
> all in the "feature" category.
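
The expected behavior described above amounts to "copy, then override": start from the config selected by the file suffix and replace only the properties named in the table function. The following sketch contrasts that with the reported behavior. `TableFunctionConfigMerge`, `RegexConfig`, and the string parameter map are hypothetical stand-ins, not Drill's API. The sketch assumes the suffix lookup has already returned the matching config, and it handles only the `extension` and `regex` properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; names do not correspond to Drill classes. */
public class TableFunctionConfigMerge {

  /** Simplified stand-in for a regex format config such as RegexFormatConfig. */
  static class RegexConfig {
    String extension;
    String regex;
    List<String> fields = new ArrayList<>();

    /** Copies every property so that overrides never mutate the stored config. */
    RegexConfig copy() {
      RegexConfig c = new RegexConfig();
      c.extension = extension;
      c.regex = regex;
      c.fields = new ArrayList<>(fields);
      return c;
    }
  }

  /** Applies table-function parameters; properties not named are left untouched. */
  static void applyParams(RegexConfig config, Map<String, String> params) {
    for (Map.Entry<String, String> e : params.entrySet()) {
      switch (e.getKey()) {
        case "extension": config.extension = e.getValue(); break;
        case "regex": config.regex = e.getValue(); break;
        default: break; // "type" selects the plugin; other names are ignored in this sketch
      }
    }
  }

  /** Reported behavior: the matched config is ignored and a blank one is filled in. */
  static RegexConfig reportedBehavior(RegexConfig matched, Map<String, String> params) {
    RegexConfig config = new RegexConfig();
    applyParams(config, params);
    return config;
  }

  /** Expected behavior: start from the suffix-matched config, override only given params. */
  static RegexConfig expectedBehavior(RegexConfig matched, Map<String, String> params) {
    RegexConfig config = matched.copy();
    applyParams(config, params);
    return config;
  }

  public static void main(String[] args) {
    // Mirrors the sampleConfig defined in the test above.
    RegexConfig matched = new RegexConfig();
    matched.extension = "log1";
    matched.regex = "(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*";
    matched.fields = new ArrayList<>(List.of("year", "month", "day"));

    Map<String, String> params = new HashMap<>();
    params.put("type", "regex");
    params.put("regex", "(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*");

    System.out.println(reportedBehavior(matched, params).fields); // []
    System.out.println(expectedBehavior(matched, params).fields); // [year, month, day]
  }
}
```

Copying before overriding keeps the stored plugin config unchanged for queries that do not use a table function.
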



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
