Richard Downer created DRILL-8318:
-------------------------------------

             Summary: httpd format parser throws exception on log item with malformed query string
                 Key: DRILL-8318
                 URL: https://issues.apache.org/jira/browse/DRILL-8318
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.19.0
         Environment: drill-embedded

openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)

Ubuntu 20.04.4 LTS (Focal Fossa)

Running under WSL on Windows 11
            Reporter: Richard Downer
         Attachments: testcase

I am running Apache Drill over my httpd-style access logs. These logs record 
requests arriving from the open Internet, which sometimes include questionable 
requests from remote users (some with hostile intent).

One such style of request looks like this:

{{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" "curl/7.54.0"}}

As a test case, I have put this request into a new log file containing only 
this line. I then run this query:

{{select request_receive_time, request_status_last, request_firstline_method, 
request_firstline_uri from 
table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd', 
logFormat=>'combined')) where request_status_last = 404;}}

This produces this error:

{{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}

{{Error occurred during setter call: null caused by 
"java.lang.StringIndexOutOfBoundsException: String index out of range: -1" when 
calling "public void 
org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)"
 for  key = "STRING:request.firstline.uri.query.*"  name = 
"STRING:request.firstline.uri.query"  value = "Value\{filled=STRING, 
s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo = 
"[STRING]"}}
{{Format plugin: httpd}}
{{Format plugin: HttpdLogFormatPlugin}}
{{Plugin config name: null}}
{{Fragment: 0:0}}

While I appreciate that the query string part of the request is probably 
malformed according to a strict interpretation, this is a real request seen "in 
the wild", and I would prefer that Drill be robust enough to deal with the kind 
of garbage requests frequently seen on real web servers.
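
To illustrate the kind of tolerance I mean, here is a minimal sketch of a lenient query-string splitter. It is not Drill's code and I have not confirmed where the exception actually originates; the class name {{LenientQuery}}, the method {{parse}}, and the {{(empty)}} placeholder key are all hypothetical names chosen for this example. The idea is only that a pair with no key, such as the {{=PHPE...}} above, or a pair with no {{=}} at all, should yield a usable entry rather than a negative string index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Drill: splits a query string while
// tolerating empty keys, missing '=' signs, and empty pairs.
final class LenientQuery {

    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        // Accept input with or without the leading '?'.
        if (query.charAt(0) == '?') {
            query = query.substring(1);
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty pairs such as the middle of "a=1&&b=2"
            }
            // indexOf returns -1 when there is no '='; check before using it
            // as a substring bound.
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (key.isEmpty()) {
                key = "(empty)"; // assumed placeholder for a missing key
            }
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // The query string from the log line in this report.
        System.out.println(parse("?=PHPE9568F36-D428-11d2-A769-00AA001ACF42"));
    }
}
```

Whether a missing key should map to a placeholder, an empty column name, or be dropped entirely is a design choice for the format plugin; the point is only that the row can still be read.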

Thank you for your assistance - if I can provide any more information that 
would help please let me know!



