[
https://issues.apache.org/jira/browse/DRILL-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609078#comment-17609078
]
Charles Givre commented on DRILL-8318:
--------------------------------------
[~nielsbasjes], could you take a look?
> httpd format parser throws exception on log item with malformed query string
> ----------------------------------------------------------------------------
>
> Key: DRILL-8318
> URL: https://issues.apache.org/jira/browse/DRILL-8318
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.19.0
> Environment: drill-embedded
> openjdk version "1.8.0_342"
> OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
> OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)
> Ubuntu 20.04.4 LTS (Focal Fossa)
> Running under WSL on Windows 11
> Reporter: Richard Downer
> Priority: Major
> Attachments: testcase
>
>
> I am running Apache Drill over my httpd-style access logs. These are
> collecting data from requests on the open Internet, which sometimes means
> questionable requests made by remote Internet users (sometimes with hostile
> intent).
> One such style of request looks like this:
> {{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" "curl/7.54.0"}}
> I have put this request into a new log file containing only this line, as a
> test case. I initiate a query:
> {{select request_receive_time, request_status_last, request_firstline_method,
> request_firstline_uri from
> table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd',
> logFormat=>'combined')) where request_status_last = 404;}}
> This produces this error:
> {{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}
> {{Error occurred during setter call: null caused by
> "java.lang.StringIndexOutOfBoundsException: String index out of range: -1"
> when calling "public void
> org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)"
> for key = "STRING:request.firstline.uri.query.*" name =
> "STRING:request.firstline.uri.query" value = "Value\{filled=STRING,
> s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo =
> "[STRING]"}}
> {{Format plugin: httpd}}
> {{Format plugin: HttpdLogFormatPlugin}}
> {{Plugin config name: null}}
> {{Fragment: 0:0}}
> While I appreciate that the query string part of the request is probably
> malformed according to a strict interpretation, this is a real request seen
> "in the wild" and I would prefer that Drill is robust enough to deal with the
> type of garbage requests frequently seen on real web servers.
> Thank you for your assistance - if I can provide any more information that
> would help please let me know!
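
For illustration, the query string in the test case ({{?=PHPE9568F36-D428-11d2-A769-00AA001ACF42}}) has a value but an empty parameter name, and the reported key {{request.firstline.uri.query.*}} with no name suffix appears consistent with that. The root cause inside {{setWildcard}} is not confirmed here. The following is only a minimal sketch of lenient query-string handling, assuming the empty name is what triggers the exception. The class {{LenientQueryParser}} and the placeholder name {{_unnamed}} are hypothetical and are not part of Drill or the underlying log parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Drill's implementation.
public class LenientQueryParser {

    // Placeholder used when a parameter has a value but no name (assumption).
    static final String UNNAMED = "_unnamed";

    // Splits a raw query string into name/value pairs without throwing
    // on malformed input such as "?=value" or "?flag".
    public static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        // Drop a leading '?' or '&' if present.
        String q = query;
        if (q.charAt(0) == '?' || q.charAt(0) == '&') {
            q = q.substring(1);
        }
        for (String pair : q.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            // Guard against indexOf returning -1 before calling substring.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (name.isEmpty()) {
                name = UNNAMED; // keep the value rather than failing
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Prints {_unnamed=PHPE9568F36-D428-11d2-A769-00AA001ACF42}
        System.out.println(parse("?=PHPE9568F36-D428-11d2-A769-00AA001ACF42"));
    }
}
```

The design choice in this sketch is to keep the value under a placeholder name instead of dropping the row, so a single garbage request does not abort the whole scan.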