[
https://issues.apache.org/jira/browse/DRILL-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246613#comment-17246613
]
ASF GitHub Bot commented on DRILL-7817:
---------------------------------------
cgivre commented on a change in pull request #2122:
URL: https://github.com/apache/drill/pull/2122#discussion_r539412478
##########
File path: contrib/format-httpd/README.md
##########
@@ -41,24 +45,67 @@ FROM dfs.test.`logfile.httpd` AS mylogs
```
In this example, we assign an alias of `mylogs` to the table, the column name
is `request_firstline_uri_query_$` and then the individual field within that
mapping is `username
-`. This particular example enables you to analyze items in query strings.
+`. This particular example enables you to analyze items in query strings.
### Flattening Maps
-In the event that you have a map field that you would like broken into columns
rather than getting the nested fields, you can set the `flattenWildcards`
option to `true` and
-Drill will create columns for these fields. For example if you have a URI
Query option called `username`. If you selected the `flattedWildcards` option,
Drill will create a
-field called `request_firstline_uri_query_username`.
+In the event that you have a map field that you would like broken into columns
rather than getting the nested fields, you can set the `flattenWildcards`
option to `true` and
+Drill will create columns for these fields. For example if you have a URI
Query option called `username`. If you selected the `flattedWildcards` option,
Drill will create a
+field called `request_firstline_uri_query_username`.
-** Note that underscores in the field name are replaced with double
underscores **
-
- ## Useful Functions
+** Note that underscores in the field name are replaced with double
underscores **
+
+## Useful Functions
If you are using Drill to analyze web access logs, there are a few other
useful functions which you should know about:
-
+
* `parse_url(<url>)`: This function accepts a URL as an argument and returns
a map of the URL's protocol, authority, host, and path.
* `parse_query(<query_string>)`: This function accepts a query string and
returns a key/value pairing of the variables submitted in the request.
* `parse_user_agent(<user agent>)`, `parse_user_agent( <useragent field>,
<desired field> )`: The function parse_user_agent() takes a user agent string
as an argument and
- returns a map of the available fields. Note that not every field will be
present in every user agent string.
+ returns a map of the available fields. Note that not every field will be
present in every user agent string.
[Complete Docs
Here](https://github.com/apache/drill/tree/master/contrib/udfs#user-agent-functions)
-
+
+## LogParser type remapping
+**Advanced feature**
+The underlying [logparser](https://github.com/nielsbasjes/logparser) supports
something called type remapping.
+Essentially it means that an extracted value which would normally be treated
as an unparsable STRING can now be 'cast' to something
+that can be further cut into relevant pieces.
+
+The parameter string is a `;` separated list of mappings.
+Each mapping is a `:` separated list of
+- the name of the underlying logparser field (which is different from the Drill
column name),
+- the underlying `type`, which is used to determine which additional Dissectors
can be applied,
+- optionally the `cast` (one of `STRING`, `LONG`, `DOUBLE`), which may impact
the type of the Drill column.
+
+Examples:
+- If you have a query parameter in the URL called `ua` which is really the
UserAgent string and you would like to parse this you can add
+`request.firstline.uri.query.ua:HTTP.USERAGENT`
+- If you have a query parameter in the URL called `timestamp` which is really
the numerical timestamp (epoch milliseconds), you can add
+`request.firstline.uri.query.timestamp:TIME.EPOCH:LONG`
+The additional `LONG` causes the returned value to be a long, which tells
Drill that the `TIME.EPOCH` is to be interpreted as a `TIMESTAMP` column.
+
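Editorial illustration (not part of the pull request): the remapping parameter format described in the README change above can be sketched as a small parser. This is a minimal Python sketch of the `;` / `:` layout only, not the plugin's or logparser's actual code. The names `Remapping` and `parse_type_remapping` are hypothetical, and the handling of surrounding whitespace and empty entries is an assumption.

```python
from typing import List, NamedTuple, Optional

# Cast names listed in the README text; everything else here is illustrative.
ALLOWED_CASTS = {"STRING", "LONG", "DOUBLE"}


class Remapping(NamedTuple):
    field: str            # underlying logparser field name (not the Drill column name)
    type_name: str        # logparser type that selects additional Dissectors
    cast: Optional[str]   # optional cast: STRING, LONG or DOUBLE


def parse_type_remapping(spec: str) -> List[Remapping]:
    """Split a ';'-separated list of 'field:type[:cast]' mappings."""
    mappings = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            # Assumption: empty entries (e.g. a trailing ';') are ignored.
            continue
        parts = [part.strip() for part in entry.split(":")]
        if len(parts) not in (2, 3):
            raise ValueError(f"expected field:type[:cast], got {entry!r}")
        cast = parts[2] if len(parts) == 3 else None
        if cast is not None and cast not in ALLOWED_CASTS:
            raise ValueError(f"unknown cast {cast!r} in {entry!r}")
        mappings.append(Remapping(parts[0], parts[1], cast))
    return mappings


if __name__ == "__main__":
    spec = (
        "request.firstline.uri.query.ua:HTTP.USERAGENT;"
        "request.firstline.uri.query.timestamp:TIME.EPOCH:LONG"
    )
    for mapping in parse_type_remapping(spec):
        print(mapping)
```

The two mappings in the usage section are the examples from the README text, joined with `;` as the format describes.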
Review comment:
This is slick! 🥇
> Add direct Yauaa support for HTTPD Format Plugin.
> -------------------------------------------------
>
> Key: DRILL-7817
> URL: https://issues.apache.org/jira/browse/DRILL-7817
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Niels Basjes
> Assignee: Niels Basjes
> Priority: Minor
>
> Enhancement of having the Yauaa useragent parser immediately integrated with
> the HTTPD logparser.