[ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011170#comment-15011170 ]
Jim Scott commented on DRILL-3423:
----------------------------------

Jacques,

Given everything you have said here, I can see value in making some changes. To move in that direction, however, a considerable number of details still need to be covered. I have tried to capture them all below. I agree with the idea of the functions and have listed those you suggested here, along with others that would need to be covered. That said, these issues must be resolved before we can move in this direction.

h3. Considerations

h4. User must specify a name that Drill understands, and that can be mapped into a name the parser understands

*_Option_* -- Every available format string needs a mapping to a variable name so that the user can query that field (see the table of mappings below; the user will reference names with underscores, not dots).

|| Format String || Variable Name || Type ||
| %a | connection.client.ip | IP |
| %\{c}a | connection.client.peerip | IP |
| %A | connection.server.ip | IP |
| %B | response.body.bytes | BYTES |
| %b | response.body.bytesclf | BYTES |
| %\{Foobar}C | request.cookies.* | HTTP.COOKIE |
| %D | server.process.time | MICROSECONDS |
| %\{Foobar}e | server.environment.* | VARIABLE |
| %f | server.filename | FILENAME |
| %h | connection.client.host | IP |
| %H | request.protocol | PROTOCOL |
| %\{Foobar}i | request.header.* | HTTP.HEADER |
| %k | connection.keepalivecount | NUMBER |
| %l | connection.client.logname | NUMBER |
| %L | request.errorlogid | STRING |
| %m | request.method | HTTP.METHOD |
| %\{Foobar}n | server.module_note.* | STRING |
| %\{Foobar}o | response.header.* | HTTP.HEADER |
| %p | request.server.port.canonical | PORT |
| %\{canonical}p | connection.server.port.canonical | PORT |
| %\{local}p | connection.server.port | PORT |
| %\{remote}p | connection.client.port | PORT |
| %P | connection.server.child.processid | NUMBER |
| %\{pid}P | connection.server.child.processid | NUMBER |
| %\{tid}P | connection.server.child.threadid | NUMBER |
| %\{hextid}P | connection.server.child.hexthreadid | NUMBER |
| %q | request.querystring | HTTP.QUERYSTRING |
| %r | request.firstline | HTTP.FIRSTLINE |
| %R | request.handler | STRING |
| %s | request.status.original | STRING |
| %>s | request.status.last | STRING |
| %t | request.receive.time | TIME.STAMP |
| %\{msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{begin:msec}t | request.receive.time.begin.msec | TIME.EPOCH |
| %\{end:msec}t | request.receive.time.end.msec | TIME.EPOCH |
| %\{usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{begin:usec}t | request.receive.time.begin.usec | TIME.EPOCH.USEC |
| %\{end:usec}t | request.receive.time.end.usec | TIME.EPOCH.USEC |
| %\{msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{begin:msec_frac}t | request.receive.time.begin.msec_frac | TIME.EPOCH |
| %\{end:msec_frac}t | request.receive.time.end.msec_frac | TIME.EPOCH |
| %\{usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{begin:usec_frac}t | request.receive.time.begin.usec_frac | TIME.EPOCH.USEC_FRAC |
| %\{end:usec_frac}t | request.receive.time.end.usec_frac | TIME.EPOCH.USEC_FRAC |
| %T | response.server.processing.time | SECONDS |
| %u | connection.client.user | STRING |
| %U | request.urlpath | URI |
| %v | connection.server.name.canonical | STRING |
| %V | connection.server.name | STRING |
| %X | response.connection.status | HTTP.CONNECTSTATUS |
| %I | request.bytes | BYTES |
| %O | response.bytes | BYTES |
| %\{cookie}i | request.cookies | HTTP.COOKIES |
| %\{set-cookie}o | response.cookies | HTTP.SETCOOKIES |
| %\{user-agent}i | request.user-agent | HTTP.USERAGENT |
| %\{referer}i | request.referer | HTTP.URI |

h4. There are fields which could be parsed and selected by the user that are complex (URL, URI, query string)

*_Option_* -- Provide a function to parse URLs into a map
{code}
{
  protocol: "...",
  user: "...",
  password: "...",
  host: "...",
  port: "...",
  path: "...",
  query: "...",
  fragment: "..."
}
{code}

*_Option_* -- Provide a function to parse a query string into a map (users can use kvgen on this if they need to)
{code}
{
  "fieldName1": "fieldValue1",
  "fieldName2": "fieldValue2",
  ...
}
{code}

h4. There are fields which could be parsed and selected by the user that are arbitrary (cookies, headers, etc.)

*_Option_* -- Cookies are named and contain (domain, expires, path, value)
{code}
[
  name: {
    domain: "...",
    expires: "...",
    path: "...",
    value: "..."
  },
  ...
]
{code}

*_Issue to Address_* -- Some parts of the format string, represented here by Foobar (e.g. header names), cannot necessarily be identified beforehand. They must be accounted for, or else the parser won't be completely effective and the user will not be able to query headers, etc. that exist in the log.

h4. Other Possible Issues

Who is going to write the functions to expose the functionality for all Drill queries?

> Add New HTTPD format plugin
> ---------------------------
>
>                 Key: DRILL-3423
>                 URL: https://issues.apache.org/jira/browse/DRILL-3423
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>            Assignee: Jim Scott
>             Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin. The author has been kind enough
> to move the logparser project to be released under the Apache License. Can
> find it here:
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
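
As a rough illustration of three of the options in the comment above (underscore names in place of dotted names, a URL-to-map function, and a query-string-to-map function), here is a minimal sketch. It uses Python as a stand-in because the Drill functions discussed do not exist yet. The helper names `to_column_name`, `parse_url`, and `parse_query` are hypothetical and are not part of Drill or of the httpdlog-parser library. Only the map keys and the sample variable name come from the comment.

```python
from urllib.parse import parse_qsl, urlsplit


def to_column_name(variable_name: str) -> str:
    """Hypothetical helper: make a dotted parser variable name queryable
    by replacing dots with underscores, as suggested in the comment."""
    return variable_name.replace(".", "_")


def parse_url(url: str) -> dict:
    """Hypothetical helper: split a URL into the map proposed in the comment.
    Components that are absent come back as None or an empty string."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def parse_query(query: str) -> dict:
    """Hypothetical helper: turn a query string into a field-name/value map.
    If a field name repeats, the last value wins."""
    return dict(parse_qsl(query, keep_blank_values=True))


if __name__ == "__main__":
    print(to_column_name("connection.client.ip"))
    print(parse_url("http://bob:secret@example.com:8080/a/b?x=1&y=2#top"))
    print(parse_query("x=1&y=2"))
```

Two caveats with this sketch: replacing dots is lossy for names that already contain underscores (such as `server.module_note.*`), and it leaves hyphens (as in `request.user-agent`) untouched, so a real mapping would likely need an explicit lookup table rather than a string substitution.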