[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011170#comment-15011170
 ] 

Jim Scott commented on DRILL-3423:
----------------------------------

Jacques,

Given everything you have said here, I can see value in making some changes. 
To move in that direction, however, a considerable number of details still need 
to be covered, and I have tried to capture them all below. I agree with the 
idea of the functions and have listed the ones you suggested here, along with 
others that would be needed. The issues below must be resolved before we can 
move in this direction.

h3. Considerations

h4. User must specify a name that Drill understands and that can be mapped into a name the parser understands

*_Option_* -- Every available format string needs a mapping to a variable name 
so that the user can query that field (see the table of mappings below; the 
user will reference names with underscores instead of dots).

|| Format String || Variable Name || Type ||
| %a | connection.client.ip | IP |
| %\{c}a | connection.client.peerip | IP |
| %A | connection.server.ip | IP |
| %B | response.body.bytes | BYTES |
| %b | response.body.bytesclf | BYTES |
| %\{Foobar}C | request.cookies.* | HTTP.COOKIE |
| %D | server.process.time | MICROSECONDS |
| %\{Foobar}e | server.environment.* | VARIABLE |
| %f | server.filename | FILENAME |
| %h | connection.client.host | IP |
| %H | request.protocol | PROTOCOL |
| %\{Foobar}i | request.header.* | HTTP.HEADER |
| %k | connection.keepalivecount | NUMBER |
| %l | connection.client.logname | NUMBER |
| %L | request.errorlogid | STRING |
| %m | request.method | HTTP.METHOD |
| %\{Foobar}n | server.module_note.* | STRING |
| %\{Foobar}o | response.header.* | HTTP.HEADER |
| %p | request.server.port.canonical | PORT |
| %\{canonical}p | connection.server.port.canonical | PORT |
| %\{local}p | connection.server.port | PORT |
| %\{remote}p | connection.client.port | PORT |
| %P | connection.server.child.processid | NUMBER |
| %\{pid}P | connection.server.child.processid | NUMBER |
| %\{tid}P | connection.server.child.threadid | NUMBER |
| %\{hextid}P | connection.server.child.hexthreadid | NUMBER |
| %q | request.querystring | HTTP.QUERYSTRING |
| %r | request.firstline | HTTP.FIRSTLINE |
| %R | request.handler | STRING |
| %s | request.status.original | STRING |
| %>s | request.status.last | STRING |
| %t | request.receive.time | TIME.STAMP |
| %\{msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{begin:msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{end:msec}t | request.receive.time.end.msec | TIME.EPOCH |
| %\{usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{begin:usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{end:usec}t | request.receive.time.end.usec | TIME.EPOCH.USEC |
| %\{msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{begin:msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{end:msec_frac}t | request.receive.time.end.msec_frac | TIME.EPOCH |
| %\{usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{begin:usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{end:usec_frac}t | request.receive.time.end.usec_frac | TIME.EPOCH.USEC_FRAC |
| %T | response.server.processing.time | SECONDS |
| %u | connection.client.user | STRING |
| %U | request.urlpath | URI |
| %v | connection.server.name.canonical | STRING |
| %V | connection.server.name | STRING |
| %X | response.connection.status | HTTP.CONNECTSTATUS |
| %I | request.bytes | BYTES |
| %O | response.bytes | BYTES |
| %\{cookie}i | request.cookies  | HTTP.COOKIES |
| %\{set-cookie}o | response.cookies | HTTP.SETCOOKIES |
| %\{user-agent}i | request.user-agent | HTTP.USERAGENT |
| %\{referer}i | request.referer | HTTP.URI |
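
A minimal sketch of the dot-to-underscore mapping described above, written in Python for brevity. The helper names ({{to_column_name}}, {{build_lookup}}) are hypothetical and are not part of Drill or the logparser. The reverse direction uses a lookup table rather than replacing underscores with dots, because some variable names (for example {{msec_frac}}) already contain an underscore.

```python
# Hypothetical helpers for illustration only; not Drill or logparser APIs.

# A small sample of non-wildcard variable names taken from the table above.
KNOWN_VARIABLES = [
    "connection.client.ip",
    "request.firstline",
    "request.receive.time.begin.msec_frac",
]


def to_column_name(variable_name):
    """Turn a dotted parser variable name into the name the user queries."""
    return variable_name.replace(".", "_")


def build_lookup(variable_names):
    """Map each user-facing column name back to its parser variable name."""
    lookup = {}
    for name in variable_names:
        column = to_column_name(name)
        # Two different variables must never collapse into the same column.
        if column in lookup and lookup[column] != name:
            raise ValueError("column name collision: " + column)
        lookup[column] = name
    return lookup


COLUMN_TO_VARIABLE = build_lookup(KNOWN_VARIABLES)
```

With this table, a query that selects {{request_receive_time_begin_msec_frac}} resolves to {{request.receive.time.begin.msec_frac}}, which a plain underscore-to-dot replacement would get wrong.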

h4. There are fields which could be parsed and selected by the user that are complex (URL, URI, query string)

*_Option_* -- Provide a function to parse URLs into a map
{code}
{ 
  protocol: "...", 
  user: "...", 
  password: "...", 
  host: "...", 
  port: "...", 
  path: "...", 
  query: "...", 
  fragment: "..."
}
{code}
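
As a rough illustration of the map shape above, here is a sketch in Python using the standard library's {{urllib.parse.urlsplit}}. The function name {{parse_url}} is hypothetical, and an actual Drill function would be written in Java; this only shows which pieces land in which keys.

```python
from urllib.parse import urlsplit


def parse_url(url):
    """Split a URL into the keys proposed above; absent parts become None."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme or None,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,  # urlsplit lowercases the host name
        "port": parts.port,      # an int, or None when no port is given
        "path": parts.path or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }
```

Whether missing parts should be null or simply omitted from the map is an open design choice; this sketch uses null so every row has the same keys.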

*_Option_* -- Provide a function to parse a query string into a map (users can 
use kvgen on this if they need to)
{code}
{
  "fieldName1": "fieldValue1", 
  "fieldName2": "fieldValue2", 
  ... 
}
{code}
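
A sketch of that query string function, again in Python with the standard library's {{urllib.parse.parse_qsl}}. The name {{parse_query_string}} is hypothetical. It strips the leading {{?}} that httpd prepends to {{%q}}, and it assumes that the last value wins when a field name repeats, which is only one of several reasonable choices.

```python
from urllib.parse import parse_qsl


def parse_query_string(query):
    """Parse 'a=1&b=2' (optionally prefixed with '?') into a flat map."""
    result = {}
    pairs = parse_qsl(query.lstrip("?&"), keep_blank_values=True)
    for name, value in pairs:
        # Assumption: a repeated field name keeps only its last value.
        result[name] = value
    return result
```

Percent-encoded and {{+}}-encoded characters are decoded by {{parse_qsl}}, and fields with an empty value are kept rather than dropped.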

h4. There are fields which could be parsed and selected by the user that are arbitrary (cookies, headers, etc.)
*_Option_* -- Cookies are named and contain (domain, expires, path, value)
{code}
[ 
  name: {
    domain: "...", 
    expires: "...", 
    path: "...", 
    value: "..."
  }, 
  ... 
]
{code}
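
To make the cookie structure concrete, here is a small sketch in Python that turns one {{Set-Cookie}} header value into a named entry with the four fields above. The function name {{parse_set_cookie}} is hypothetical, the parsing is deliberately simple (split on {{;}}), and other attributes such as {{HttpOnly}} are ignored.

```python
def parse_set_cookie(header_value):
    """Parse one Set-Cookie value into {name: {domain, expires, path, value}}."""
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    if not parts:
        return {}
    # The first part is always the cookie's own name=value pair.
    name, _, value = parts[0].partition("=")
    cookie = {"domain": None, "expires": None, "path": None, "value": value}
    for attribute in parts[1:]:
        key, _, attr_value = attribute.partition("=")
        key = key.strip().lower()  # attribute names are case-insensitive
        if key in ("domain", "expires", "path"):
            cookie[key] = attr_value.strip()
    return {name.strip(): cookie}
```

The expires value is left as the raw string here; converting it to a timestamp would be a separate step.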

*_Issue to Address_*
Some format strings carry an arbitrary name, shown as Foobar in the table (e.g. 
header names). These names cannot necessarily be identified beforehand. They 
must be accounted for, or the parser won't be completely effective and the 
user will not be able to query the headers, cookies, etc. that exist in the log.
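
One possible way to narrow this, offered only as a sketch: the arbitrary names do appear in the configured log format string, so they could be collected from it when the format is read. The Python below is illustrative and is not how the logparser does it. The names {{WILDCARD_DIRECTIVE}}, {{PREFIX}} and {{named_fields}} are hypothetical, lowercasing the names is an assumption, and special cases from the table such as {{%\{referer}i}} mapping to {{request.referer}} are not handled.

```python
import re

# Matches %{NAME}X where X is a directive letter that takes an arbitrary name.
# Limitation: status-code conditions such as %400,501{User-agent}i are ignored.
WILDCARD_DIRECTIVE = re.compile(r"%[<>]?\{([^}]+)\}([Ceino])")

# Variable name prefixes for the wildcard rows of the mapping table.
PREFIX = {
    "C": "request.cookies.",
    "e": "server.environment.",
    "i": "request.header.",
    "n": "server.module_note.",
    "o": "response.header.",
}


def named_fields(log_format):
    """List the variable names implied by the wildcard directives in a format."""
    fields = []
    for name, letter in WILDCARD_DIRECTIVE.findall(log_format):
        # Assumption: names are case-insensitive and stored in lowercase.
        fields.append(PREFIX[letter] + name.lower())
    return fields
```

For a format containing {{%\{X-Cache}o}}, this would yield {{response.header.x-cache}}, which could then be exposed to the user with underscores like any other field.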

h4. Other Possible Issues
Who is going to write the functions to expose the functionality for all Drill 
queries?


> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> <dependency>
>     <groupId>nl.basjes.parse.httpdlog</groupId>
>     <artifactId>httpdlog-parser</artifactId>
>     <version>2.0</version>
> </dependency>
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
