[
https://issues.apache.org/jira/browse/DRILL-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229723#comment-17229723
]
ASF GitHub Bot commented on DRILL-7534:
---------------------------------------
cgivre opened a new pull request #2112:
URL: https://github.com/apache/drill/pull/2112
# [DRILL-7534](https://issues.apache.org/jira/browse/DRILL-7534): Convert HTTPD Format Plugin to EVF
## Description
This PR updates the HTTPD format plugin to use the Enhanced Vector Framework
(EVF). In theory, a user should notice only a few changes:
1. A new configuration option `maxErrors` has been added which will allow a
user to tune how fault tolerant they want Drill to be when reading log files.
2. Two new implicit fields have been added, `_raw` and `_matched`. They
are described in the docs below.
3. The plugin now includes a limit pushdown which significantly improves
query times for queries with limits.
4. The plugin code is now in the `contrib` folder.
In addition, this PR updates the associated User Agent parsing functions
with the latest version of the underlying libraries.
## Documentation
# Web Server Log Format Plugin (HTTPD)
This plugin enables Drill to read and query httpd (Apache Web Server) and
nginx logs natively. This plugin uses the work by [Niels
Basjes](https://github.com/nielsbasjes) which is available here:
https://github.com/nielsbasjes/logparser.
## Configuration
There are four fields which you will need to configure in order for Drill
to read web server logs:
* **`logFormat`**: The log format string is the format string found in your
web server configuration.
* **`timestampFormat`**: The format of time stamps in your log files.
* **`extensions`**: The file extension of your web server logs.
* **`maxErrors`**: Sets the plugin error tolerance. When set to any value
less than `0`, Drill will ignore all errors.
```json
"httpd" : {
"type" : "httpd",
"logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
"timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
"maxErrors": 0
}
```
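With a configuration like the one above in place, a log file can be queried directly. The following is a minimal sketch, not a tested example: the `dfs.tmp` workspace and the file name `access_log.httpd` are hypothetical, and it assumes the file's extension is listed under `extensions` for this format. A query with a `LIMIT` like this one is the kind that should benefit from the limit pushdown.

```sql
-- Hypothetical workspace and file name; adjust to your own storage configuration.
SELECT *
FROM dfs.tmp.`access_log.httpd`
LIMIT 10
```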
### Implicit Columns
Data queried by this plugin will return two implicit columns:
* **`_raw`**: This returns the raw, unparsed log line
* **`_matched`**: Returns `true` or `false` depending on whether the line
matched the config string.
Thus, if you wanted to see which lines in your log file were not matching
the config, you could use the following query:
```sql
SELECT _raw
FROM <data>
WHERE _matched = false
```
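As a rough way to decide on a value for `maxErrors`, it may help to count how many lines match the configured format and how many do not. This is only a sketch: it assumes the implicit `_matched` column can be used in `GROUP BY` like a regular column, and `<data>` is the same placeholder as in the query above.

```sql
-- Counts matched vs. unmatched lines; line_count is an illustrative alias.
SELECT _matched, COUNT(*) AS line_count
FROM <data>
GROUP BY _matched
```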
## Testing
Added additional unit tests for this plugin. Ran all unit tests for the
`parse_user_agent()` UDF as well.
> Convert HTTPD Format Plugin to EVF
> ----------------------------------
>
> Key: DRILL-7534
> URL: https://issues.apache.org/jira/browse/DRILL-7534
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Charles Givre
> Priority: Major
>