[ https://issues.apache.org/jira/browse/DRILL-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609078#comment-17609078 ]

Charles Givre commented on DRILL-8318:
--------------------------------------

[~nielsbasjes], could you take a look?

> httpd format parser throws exception on log item with malformed query string
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-8318
>                 URL: https://issues.apache.org/jira/browse/DRILL-8318
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.19.0
>         Environment: drill-embedded
> openjdk version "1.8.0_342"
> OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
> OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)
> Ubuntu 20.04.4 LTS (Focal Fossa)
> Running under WSL on Windows 11
>            Reporter: Richard Downer
>            Priority: Major
>         Attachments: testcase
>
>
> I am running Apache Drill over my httpd-style access logs. These are
> collecting data from requests on the open Internet, which sometimes means
> questionable requests made by remote Internet users (sometimes with hostile
> intent).
> One such style of request looks like this:
> {{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET
> /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-"
> "curl/7.54.0"}}
> I have put this request into a new log file containing only this line, as a
> test case.
> I initiate a query:
> {{select request_receive_time, request_status_last, request_firstline_method,
> request_firstline_uri from
> table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd',
> logFormat=>'combined')) where request_status_last = 404;}}
> This produces this error:
> {{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}
> {{Error occurred during setter call: null caused by
> "java.lang.StringIndexOutOfBoundsException: String index out of range: -1"
> when calling "public void
> org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)"
> for key = "STRING:request.firstline.uri.query.*" name =
> "STRING:request.firstline.uri.query" value = "Value\{filled=STRING,
> s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo =
> "[STRING]"}}
> {{Format plugin: httpd}}
> {{Format plugin: HttpdLogFormatPlugin}}
> {{Plugin config name: null}}
> {{Fragment: 0:0}}
> While I appreciate that the query string part of the request is probably
> malformed according to a strict interpretation, this is a real request seen
> "in the wild" and I would prefer that Drill be robust enough to deal with the
> type of garbage requests frequently seen on real web servers.
> Thank you for your assistance - if I can provide any more information that
> would help please let me know!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
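
For reference, the request in the quoted report carries the query string `=PHPE9568F36-D428-11d2-A769-00AA001ACF42`, that is, a value whose parameter name is empty. The report does not show where inside Drill or the underlying log parser the `StringIndexOutOfBoundsException` originates, so the following is only a minimal sketch of what tolerant handling of such input could look like. The class `LenientQueryParser` and its `parse` method are hypothetical names made up for illustration; they are not part of Drill's `HttpdLogRecord` or of any existing library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Drill.
final class LenientQueryParser {

    // Splits a raw query string (without the leading '?') into name/value pairs.
    // An empty parameter name, as in "=value", is stored under the empty-string
    // key rather than triggering an index calculation on a missing name.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate doubled separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            // No '=' at all: treat the whole token as a name with an empty value.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string taken from the request line in the report.
        Map<String, String> params = parse("=PHPE9568F36-D428-11d2-A769-00AA001ACF42");
        System.out.println(params);
    }
}
```

The design choice here is to keep the malformed pair rather than drop it, so the row can still be returned and the odd parameter remains visible to whoever queries the log. Skipping such pairs entirely would be an equally defensible policy.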