[ 
https://issues.apache.org/jira/browse/DRILL-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245552#comment-17245552
 ] 

ASF GitHub Bot commented on DRILL-7817:
---------------------------------------

nielsbasjes commented on a change in pull request #2122:
URL: https://github.com/apache/drill/pull/2122#discussion_r537874370



##########
File path: contrib/format-httpd/README.md
##########
@@ -10,15 +10,16 @@ There are five fields which you can to configure in order 
for Drill to read web
 * **`extensions`**:  The file extension of your web server logs.  Defaults to 
`httpd`.
 * **`maxErrors`**:  Sets the plugin error tolerance. When set to any value 
less than `0`, Drill will ignore all errors. If unspecified then maxErrors is 0 
which will cause the query to fail on the first error.
 * **`flattenWildcards`**: There are a few variables which Drill extracts into 
maps.  Defaults to `false`.
-
+* **`enableUserAgentParser`**:  This enables the additional Yauaa useragent 
parser. If unspecified this is disabled because there is a noticeable startup 
overhead of this plugin, even if not used.
 

Review comment:
       It might work.
   
   The underlying logparser works on the idea of a type. The useragent parser 
hooks to the "HTTP.USERAGENT" type and allows dissecting anything of that type, 
regardless of the name. 
   
   Practically speaking:
   The logparser code only maps the request header "user-agent" to the field 
"request.user-agent" with type "HTTP.USERAGENT".
   So unless someone uses type remapping to extract the useragent from a 
different field, this is the only naturally occurring place.
   
   If you do it that way you are working under an assumption about the name of 
the column instead of the internal types.
   
   You then also have to take into account that these do not need the extra 
step:
   
       request_user-agent
       request_user-agent_last
   
   and these do:
   
       request_user-agent_operating__system__version__major
       request_user-agent_last_operating__system__version__major
   
   So I think you can activate it automatically if someone selects `*` or 
asks for one of the columns mentioned above.
   However, I do think that if any type remapping is done, it should also be 
activated when something is mapped to the input type of the UserAgentDissector.
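   
   A minimal sketch of the name-based check described above, assuming the 
column names follow the examples in this comment. The class and method names 
are hypothetical and are not part of Drill or logparser. The sketch 
deliberately ignores type remapping, so a real implementation would also have 
to check whether anything is remapped to the input type of the 
UserAgentDissector.

```java
import java.util.List;

// Hypothetical helper, for illustration only; not Drill or logparser code.
class UserAgentActivationSketch {

  // Assumed prefix, taken from the column names listed in this comment.
  static final String PREFIX = "request_user-agent";

  // True when a single projected column would need the Yauaa dissector.
  static boolean needsUserAgentParser(String column) {
    if ("*".equals(column)) {
      return true; // a wildcard may expose any useragent sub-field
    }
    if (!column.startsWith(PREFIX)) {
      return false;
    }
    String rest = column.substring(PREFIX.length());
    // The raw useragent string and its "_last" variant need no extra step.
    if (rest.isEmpty() || rest.equals("_last")) {
      return false;
    }
    // Skip the "_last" marker so both variants are handled the same way.
    if (rest.startsWith("_last_")) {
      rest = rest.substring("_last".length());
    }
    // Anything after a further "_" is a dissected sub-field.
    return rest.startsWith("_");
  }

  // True when any of the projected columns needs the Yauaa dissector.
  static boolean anyNeedsUserAgentParser(List<String> columns) {
    return columns.stream().anyMatch(UserAgentActivationSketch::needsUserAgentParser);
  }
}
```

   Because this relies only on column names, it carries the assumption about 
naming mentioned above; it is a convenience check, not a replacement for 
looking at the internal types.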
   
   If there are cases where you activate the plugin when it is not needed then 
the only downside of this is that you'll have extra startup time and extra 
memory usage.
   
   The actual parsing speed is not affected because the logparser will simply 
not run any of the Yauaa code if it is not needed.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Add direct Yauaa support for HTTPD Format Plugin.
> -------------------------------------------------
>
>                 Key: DRILL-7817
>                 URL: https://issues.apache.org/jira/browse/DRILL-7817
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>            Priority: Minor
>
> Enhancement of having the Yauaa useragent parser immediately integrated with 
> the HTTPD logparser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
