Paul Rogers created DRILL-7298:
----------------------------------

             Summary: Revise log regex plugin to work with table functions
                 Key: DRILL-7298
                 URL: https://issues.apache.org/jira/browse/DRILL-7298
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.16.0
            Reporter: Paul Rogers


See the discussion of table properties in the 
[PR for DRILL-7293|https://github.com/apache/drill/pull/1807]. The logRegex 
plugin config contains a list of {{LogFormatField}} objects:

{code:java}
  private List<LogFormatField> schema;
{code}

As it turns out, such a list cannot be used with table properties. This ticket 
asks to find a solution, perhaps using the suggestions from the PR.

The log format plugin allows users to read any text file that can be described 
with a regex. The plugin lets the user provide the regex and a list of fields 
that match the groups within the regex. These fields are described with the 
{{schema}} list. Each schema entry defines a name, type and parse pattern.
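
To illustrate the group-to-field mapping described above, here is a minimal, standalone sketch. It is not the plugin's code: the class {{LogRegexSketch}}, the method {{parseLine}}, the sample regex and the field names are all made up for illustration, and only the JDK's {{java.util.regex}} classes are used.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Drill.
public class LogRegexSketch {

  /**
   * Maps each capture group of the regex to the field name in the same
   * position. Returns null when the line does not match the pattern.
   */
  public static Map<String, String> parseLine(Pattern pattern,
      List<String> fieldNames, String line) {
    Matcher m = pattern.matcher(line);
    if (!m.matches()) {
      return null;
    }
    Map<String, String> row = new LinkedHashMap<>();
    // Regex groups are 1-based; stop at whichever list runs out first.
    int count = Math.min(m.groupCount(), fieldNames.size());
    for (int i = 0; i < count; i++) {
      row.put(fieldNames.get(i), m.group(i + 1));
    }
    return row;
  }

  public static void main(String[] args) {
    // Assumed sample format: date, log level, free-form message.
    Pattern pattern = Pattern.compile("(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)");
    List<String> fields = Arrays.asList("eventDate", "level", "message");
    System.out.println(
        parseLine(pattern, fields, "2019-06-17 ERROR Disk full"));
  }
}
```

Every value comes out as a string here; in the real plugin the type and parse pattern of each schema entry would drive the conversion.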

Because of the versatility of logRegex, it would be great to be able to specify 
the regex and fields in a table function so that users do not have to create a 
new plugin config each time they want to query a new kind of file. DRILL-7293 
allows the user to specify the regex and schema using the recently added schema 
provisioning system. Still, it would be handy to use table functions.

The required change is to use only types that table functions can handle, which 
limits the choices to strings and numbers. For ad-hoc query use, it might be 
fine to just list field names. Or, perhaps, if no field names are provided, use 
the {{columns}} array as in CSV. For ad-hoc use, type conversions can be 
expressed as casts rather than as types in the table functions.
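
One possible shape for this, offered only as a sketch: pass the field names as a single comma-separated string, since a string is a type table functions can handle, and fall back to a {{columns}} array when the string is empty. The class {{TableFunctionFieldsSketch}} and the methods {{parseFieldNames}} and {{useColumnsArray}} are hypothetical names for illustration, not existing Drill APIs, and the comma-separated encoding is an assumption rather than something decided in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Drill.
public class TableFunctionFieldsSketch {

  /**
   * Splits a comma-separated list of field names, as it might arrive
   * through a string-valued table function property. Blank entries are
   * dropped. Returns an empty list when no names are given.
   */
  public static List<String> parseFieldNames(String fieldNames) {
    List<String> names = new ArrayList<>();
    if (fieldNames == null) {
      return names;
    }
    for (String name : fieldNames.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  /** True when no names were supplied, so a columns array should be used. */
  public static boolean useColumnsArray(String fieldNames) {
    return parseFieldNames(fieldNames).isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(parseFieldNames("eventDate, level ,message"));
    System.out.println(useColumnsArray(""));
  }
}
```

With this approach all fields would be read as strings, and the query itself would apply casts where other types are needed.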

h4. Backward Compatibility

Care must be taken when changing the config structure of an existing plugin. In 
the past, Drill would refuse to start if the JSON configs stored in ZK did not 
match the schema that Jackson expects based on the config class. Any fix for 
this problem *must* ensure that existing configs do not cause Drill startup to 
fail. Ideally, configs would be automatically upgraded so that users don't have 
to take any manual steps when upgrading Drill with the features requested here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
