[ https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva updated DRILL-6168:
------------------------------------
    Fix Version/s: 1.18.0

> Table functions do not "inherit" default configuration
> ------------------------------------------------------
>
>                 Key: DRILL-6168
>                 URL: https://issues.apache.org/jira/browse/DRILL-6168
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.18.0
>
>
> See DRILL-6167, which describes an attempt to use a table function with a
> regex format plugin.
> Consider the plugin configuration:
> {code}
> RegexFormatConfig sampleConfig = new RegexFormatConfig();
> sampleConfig.extension = "log1";
> sampleConfig.regex = DATE_ONLY_PATTERN;
> sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than the usual JSON in the
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
> String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
>     "(type => 'regex', regex => '(\\\\d\\\\d\\\\d\\\\d)-(\\\\d\\\\d)-(\\\\d\\\\d) .*'))";
> client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the
> format plugin config defined above. A query without the table function does,
> in fact, work using the defined config. But with a table function, we get
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has fewer names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning is in the custom plugin, not Drill.) This is the plugin saying,
> "hey! you didn't provide column names!". But in the format definition, we
> did provide names. If we run the query without a table function, we do see
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0<VARCHAR(OPTIONAL)>,Column$1<VARCHAR(OPTIONAL)>,Column$2<VARCHAR(OPTIONAL)>
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3. Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values,
> filling in blanks, including for the column names.
> The expected behavior is that all properties defined in the config should
> remain unchanged _except_ for those in the table function. Why? In order to
> know which format plugin to use, the code has to map from the suffix (".log2"
> here) to a format plugin _config_. (The config is the only thing that
> specifies a suffix.) Since we mapped to a config (not the unconfigured
> plugin), we'd expect the config properties to be used.
> It is highly surprising that the suffix is the only part of the config that
> is used, while all other attributes are ignored. This seems very much in the
> "bug" category and not at all in the "feature" category.
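
The expected behavior described in the report (keep every property of the matched format config, and override only the properties named in the table function) can be illustrated with a small standalone sketch. This is not Drill's implementation and uses no Drill classes: the class `TableFunctionDefaults`, the method `merge`, the map-based representation of a config, and the placeholder value standing in for `DATE_ONLY_PATTERN` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Drill.
class TableFunctionDefaults {

  // Start from a copy of the stored format config, then let the table
  // function parameters replace only the keys they explicitly name.
  static Map<String, Object> merge(Map<String, Object> storedConfig,
                                   Map<String, Object> tableFunctionParams) {
    Map<String, Object> merged = new LinkedHashMap<>(storedConfig);
    merged.putAll(tableFunctionParams);
    return merged;
  }

  public static void main(String[] args) {
    // Stands in for the config that the file suffix mapped to.
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("type", "regex");
    stored.put("regex", "<DATE_ONLY_PATTERN placeholder>"); // value not given in the report
    stored.put("fields", Arrays.asList("year", "month", "day"));

    // Stands in for the parameters passed in the table function call.
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("type", "regex");
    params.put("regex", "(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*");

    Map<String, Object> merged = merge(stored, params);
    // "regex" comes from the table function; "fields" is inherited.
    System.out.println(merged);
  }
}
```

Under this sketch the column names survive because `fields` is never mentioned in the table function call, which is the outcome the report argues for. The behavior actually observed corresponds to building the config from `params` alone.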