[
https://issues.apache.org/jira/browse/DRILL-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245552#comment-17245552
]
ASF GitHub Bot commented on DRILL-7817:
---------------------------------------
nielsbasjes commented on a change in pull request #2122:
URL: https://github.com/apache/drill/pull/2122#discussion_r537874370
##########
File path: contrib/format-httpd/README.md
##########
@@ -10,15 +10,16 @@ There are five fields which you can to configure in order for Drill to read web
 * **`extensions`**: The file extension of your web server logs. Defaults to `httpd`.
 * **`maxErrors`**: Sets the plugin error tolerance. When set to any value less than `0`, Drill will ignore all errors. If unspecified then maxErrors is 0 which will cause the query to fail on the first error.
 * **`flattenWildcards`**: There are a few variables which Drill extracts into maps. Defaults to `false`.
-
+* **`enableUserAgentParser`**: This enables the additional Yauaa useragent parser. If unspecified this is disabled because there is a noticeable startup overhead of this plugin, even if not used.
Review comment:
It might work.
The underlying logparser is built around types rather than names. The useragent parser
hooks into the "HTTP.USERAGENT" type and can dissect any field of that type,
regardless of its name.
Practically speaking:
The logparser code only maps the request header "user-agent" to the field
"request.user-agent" with type "HTTP.USERAGENT".
So unless someone uses type remapping to extract the useragent from a
different field, this is the only place where it occurs naturally.
If you do it that way, you are relying on an assumption about the name of
the column instead of on the internal types.
You then also have to take into account that these do not need the extra step:
request_user-agent
request_user-agent_last
and these do:
request_user-agent_operating__system__version__major
request_user-agent_last_operating__system__version__major
So I think you can activate it automatically if someone selects `*` or asks
for one of the fields above that need the extra step.
I do think, however, that if any type remapping is done, it should also be
activated when something is mapped to the input type of the UserAgentDissector.
If there are cases where you activate the plugin when it is not needed, then
the only downside is extra startup time and extra memory usage.
The actual parsing speed is not affected because the logparser will simply
not run any of the Yauaa code if it is not needed.
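
As a rough illustration of the name-based activation described above, a sketch in Java could look like the following. The class and method names are hypothetical and are not part of Drill or logparser. It only inspects column names, so it deliberately ignores type remapping and rests on the column-name assumption cautioned against above.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Drill or logparser. */
final class UserAgentActivation {
  // Columns that hold the raw useragent string and need no extra dissection step.
  private static final List<String> RAW_COLUMNS =
      List.of("request_user-agent", "request_user-agent_last");

  // Any other column under this prefix is assumed to be a dissected useragent field.
  private static final String USER_AGENT_PREFIX = "request_user-agent_";

  private UserAgentActivation() {}

  /** Returns true when the requested columns suggest the Yauaa parser is needed. */
  static boolean needsUserAgentParser(Collection<String> requestedColumns) {
    for (String column : requestedColumns) {
      if (column.equals("*")) {
        return true; // a wildcard may expand to dissected useragent fields
      }
      if (RAW_COLUMNS.contains(column)) {
        continue; // raw useragent value, no extra step required
      }
      if (column.startsWith(USER_AGENT_PREFIX)) {
        return true; // e.g. request_user-agent_operating__system__version__major
      }
    }
    return false;
  }
}
```

A caller would pass the projected column names of the query. A complete solution would additionally have to check whether any type remapping targets the input type of the UserAgentDissector, which this sketch does not attempt.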
> Add direct Yauaa support for HTTPD Format Plugin.
> -------------------------------------------------
>
> Key: DRILL-7817
> URL: https://issues.apache.org/jira/browse/DRILL-7817
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Niels Basjes
> Assignee: Niels Basjes
> Priority: Minor
>
> Enhancement of having the Yauaa useragent parser immediately integrated with
> the HTTPD logparser.