[ 
https://issues.apache.org/jira/browse/DRILL-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246654#comment-17246654
 ] 

ASF GitHub Bot commented on DRILL-7817:
---------------------------------------

nielsbasjes commented on a change in pull request #2122:
URL: https://github.com/apache/drill/pull/2122#discussion_r539454963



##########
File path: contrib/format-httpd/README.md
##########
@@ -41,24 +45,67 @@ FROM dfs.test.`logfile.httpd` AS mylogs
 
 ```
 In this example, we assign an alias of `mylogs` to the table, the column name is `request_firstline_uri_query_$` and then the individual field within that mapping is `username
-`.  This particular example enables you to analyze items in query strings.  
+`.  This particular example enables you to analyze items in query strings.
 
 ### Flattening Maps
-In the event that you have a map field that you would like broken into columns rather than getting the nested fields, you can set the `flattenWildcards` option to `true` and 
-Drill will create columns for these fields.  For example if you have a URI Query option called `username`.  If you selected the `flattedWildcards` option, Drill will create a 
-field called `request_firstline_uri_query_username`.  
+In the event that you have a map field that you would like broken into columns rather than getting the nested fields, you can set the `flattenWildcards` option to `true` and
+Drill will create columns for these fields.  For example, if you have a URI query parameter called `username` and you enable the `flattenWildcards` option, Drill will create a
+field called `request_firstline_uri_query_username`.
 
-** Note that underscores in the field name are replaced with double underscores ** 
- 
- ## Useful Functions
+**Note:** underscores in the field name are replaced with double underscores.
+
+## Useful Functions
  If you are using Drill to analyze web access logs, there are a few other useful functions which you should know about:
- 
+
  * `parse_url(<url>)`: This function accepts a URL as an argument and returns a map of the URL's protocol, authority, host, and path.
  * `parse_query(<query_string>)`: This function accepts a query string and returns a key/value pairing of the variables submitted in the request.
  * `parse_user_agent(<user agent>)`, `parse_user_agent( <useragent field>, <desired field> )`: The function parse_user_agent() takes a user agent string as an argument and
-  returns a map of the available fields. Note that not every field will be present in every user agent string. 
+  returns a map of the available fields. Note that not every field will be present in every user agent string.
   [Complete Docs Here](https://github.com/apache/drill/tree/master/contrib/udfs#user-agent-functions)
- 
+
+## LogParser type remapping
+**Advanced feature**
+The underlying [logparser](https://github.com/nielsbasjes/logparser) supports type remapping.
+In short, an extracted value that would normally be treated as an unparsable STRING can be 'cast' to a type
+that additional Dissectors can cut into relevant pieces.
+
+The parameter string is a `;`-separated list of mappings.
+Each mapping is a `:`-separated list of:
+- the name of the underlying logparser field (which is different from the Drill column name),
+- the underlying `type`, which determines which additional Dissectors can be applied,
+- optionally, the `cast` (one of `STRING`, `LONG`, `DOUBLE`), which may affect the type of the Drill column.
+
+Examples:
+- If you have a URL query parameter called `ua` that actually contains the UserAgent string and you would like to parse it, you can add
+`request.firstline.uri.query.ua:HTTP.USERAGENT`
+- If you have a URL query parameter called `timestamp` that actually contains a numerical timestamp (epoch milliseconds), you can add
+`request.firstline.uri.query.timestamp:TIME.EPOCH:LONG`
+The additional `LONG` causes the returned value to be a long, which tells Drill that the `TIME.EPOCH` value is to be interpreted as a `TIMESTAMP` column.
+
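The remapping parameter format described in the diff above (mappings separated by `;`, the parts of each mapping separated by `:`) can be illustrated with a small Java sketch. The class `RemappingSketch`, the record `Remapping` and the validation rules are hypothetical and are not the actual Drill or logparser code; the sketch only assumes the format as the README states it, with the optional cast limited to `STRING`, `LONG` or `DOUBLE`. It needs Java 16 or later for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RemappingSketch {

    // Hypothetical holder for one mapping: logparser field, logparser type, optional cast (null if absent).
    record Remapping(String field, String type, String cast) {}

    // The casts listed in the README.
    private static final Set<String> ALLOWED_CASTS = Set.of("STRING", "LONG", "DOUBLE");

    // Splits "field:type[:cast];field:type[:cast];..." into Remapping entries.
    static List<Remapping> parse(String spec) {
        List<Remapping> result = new ArrayList<>();
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries, e.g. from a trailing ';'
            }
            String[] parts = trimmed.split(":");
            if (parts.length < 2 || parts.length > 3
                    || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "Expected field:type[:cast] but got: " + trimmed);
            }
            String cast = null;
            if (parts.length == 3) {
                cast = parts[2].trim();
                if (!ALLOWED_CASTS.contains(cast)) {
                    throw new IllegalArgumentException("Unsupported cast: " + cast);
                }
            }
            result.add(new Remapping(parts[0].trim(), parts[1].trim(), cast));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two mappings from the README examples, combined into one parameter string.
        String spec = "request.firstline.uri.query.ua:HTTP.USERAGENT;"
                + "request.firstline.uri.query.timestamp:TIME.EPOCH:LONG";
        for (Remapping r : parse(spec)) {
            System.out.println(r);
        }
    }
}
```

Parsing the two README examples this way yields two entries: the `ua` mapping with no cast, and the `timestamp` mapping with the cast `LONG`.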

Review comment:
       Yes, I was hoping you would like this feature to be available in Drill.
   Note that the name of the parameter does not matter. If it is called `foo` you can do `request.firstline.uri.query.foo:TIME.EPOCH:LONG` and it will work as well.
   
   This is especially useful if you are parsing web analytics `beacon` logs (where the real data is all in query parameters).
   If you have a query parameter that is really the URL of your site, simply remap it to `HTTP.URI` and go from there.
   `request.firstline.uri.query.g:HTTP.URI`
   
   Another nice example of this is what I do here with the `MOD_UNIQUE_ID` field, for which a built-in Dissector is available:
   
   
https://github.com/apache/drill/blob/29edfec469c7de4cdd621fd4914dfb143d9a2784/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReaderUserAgent.java#L194
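The point that the parameter name is free can be sketched as a tiny Java helper that assembles mappings for arbitrary query parameters and joins them with `;`. `QueryParamRemaps`, `remap` and `join` are hypothetical names used only for illustration; the sketch assumes nothing beyond the `request.firstline.uri.query.` prefix and the `field:type[:cast]` format quoted in this thread. How the resulting string is handed to the format plugin is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryParamRemaps {

    // Logparser field prefix for URL query parameters, taken from the examples in this thread.
    static final String QUERY_PREFIX = "request.firstline.uri.query.";

    // Builds one "field:type[:cast]" mapping; pass null as cast to omit it.
    static String remap(String param, String type, String cast) {
        String mapping = QUERY_PREFIX + param + ":" + type;
        return cast == null ? mapping : mapping + ":" + cast;
    }

    // Joins several mappings into the ';'-separated parameter string.
    static String join(List<String> mappings) {
        return String.join(";", mappings);
    }

    public static void main(String[] args) {
        List<String> mappings = new ArrayList<>();
        mappings.add(remap("foo", "TIME.EPOCH", "LONG")); // any parameter name works
        mappings.add(remap("g", "HTTP.URI", null));        // parameter that holds a URL of the site
        System.out.println(join(mappings));
    }
}
```

Running `main` prints `request.firstline.uri.query.foo:TIME.EPOCH:LONG;request.firstline.uri.query.g:HTTP.URI`, which combines the two examples from the comment into a single parameter string.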
   
   






> Add direct Yauaa support for HTTPD Format Plugin.
> -------------------------------------------------
>
>                 Key: DRILL-7817
>                 URL: https://issues.apache.org/jira/browse/DRILL-7817
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>            Priority: Minor
>
> Enhancement of having the Yauaa useragent parser immediately integrated with 
> the HTTPD logparser.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
