[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009557#comment-15009557
 ] 

Jacques Nadeau commented on DRILL-3423:
---------------------------------------

Here is my alternative proposal: 

With the log format above: 
"%h %t \"%r\" %>s %b \"%{Referer}i\""

I propose a user gets the following fields (in order):

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

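To make the field list concrete, here is a rough sketch (in Python, not Drill code) of how one line in that log format could map onto the seven fields. The regular expression, the `parse_line` name, and the sample line are my own assumptions for illustration; the real plugin would rely on the httpdlog-parser library rather than a hand-written regex.

```python
import re
from datetime import datetime

# Hypothetical regex for: %h %t "%r" %>s %b "%{Referer}i"
# Assumes the default English month abbreviations in %t.
LINE_RE = re.compile(
    r'(?P<remote_host>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<request_uri>\S+)(?: \S+)?" '
    r'(?P<response_status>\d{3}) '
    r'(?P<response_bytes>\d+|-) '
    r'"(?P<header_referer>[^"]*)"'
)

def parse_line(line):
    """Return the proposed fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    raw_bytes = m["response_bytes"]
    return {
        "remote_host": m["remote_host"],
        # One timestamp value instead of many separate date-part fields.
        "request_receive_time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "request_method": m["request_method"],
        "request_uri": m["request_uri"],
        "response_status": int(m["response_status"]),
        # %b logs "-" when no bytes were sent.
        "response_bytes": None if raw_bytes == "-" else int(raw_bytes),
        "header_referer": m["header_referer"],
    }

sample = ('127.0.0.1 [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html"')
row = parse_line(sample)
```
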
Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: ...,
  path: ...,
  query: ...,
  fragment: ...
}
{code}
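As an illustration of the map shape only, the following Python sketch produces the same eight keys using the standard library's `urllib.parse.urlsplit`. It is not the proposed Drill implementation; returning `None` for missing parts is an assumption on my side.

```python
from urllib.parse import urlsplit

def parse_url(url):
    """Split a URL into the eight keys listed above (illustrative sketch)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme or None,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }

example = parse_url("http://bob:secret@example.com:8080/a/b?x=1&y=2#top")
```

The `query` value is left as a raw string, so it can be handed to parse_url_query(...) separately.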

parse_url_query(...) would return an array of key/value maps:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
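
A minimal Python sketch of the same behavior, to show how the two delimiter arguments would be used. This is an illustration under my own assumptions: empty pairs are skipped, a key without a delimiter gets an empty value, and no percent-decoding is done.

```python
def parse_url_query(querystring, pair_delimiter, key_value_delimiter):
    """Split a query string into a list of {key, value} maps (illustrative sketch)."""
    result = []
    for pair in querystring.split(pair_delimiter):
        if not pair:
            continue  # skip empty segments such as a trailing delimiter
        # Split on the first key/value delimiter only, so values may contain it.
        key, _, value = pair.partition(key_value_delimiter)
        result.append({"key": key, "value": value})
    return result

pairs = parse_url_query("a=1&b=2&c", "&", "=")
```

Keeping the result as an ordered array rather than a single map preserves repeated keys and their order.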

In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functions to extract parts of a 
date. I also don't think it makes sense to prefix a field name with its data 
type; we don't do that anywhere else in Drill. We should also expose parsing as 
an optional behavior in Drill. Note also that my proposal substantially reduces 
the number of fields exposed to the user. I think it has much better usability 
in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the URL parsing 
functions above).





> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  You 
> can find it here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
