[ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2760:
-----------------------------------
    Labels: patch-available  (was: )

> protocol-okhttp: properly record HTTP version in request message header
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-2760
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2760
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Priority: Minor
>              Labels: patch-available
>             Fix For: 1.17
>
>
> The HTTP version in the request message recorded by the plugin protocol-okhttp
> ({{store.http.request=true}}) is not the version sent in the request but the
> one received in the response.
> Note that the HTTP version sent in the request may differ from the one sent
> back in the response. One example (traced using wget):
> {noformat}
> > wget -d https://www.kp.ru/daily/27061/4129507/
> ...
> ---request begin---
> GET /daily/27061/4129507/ HTTP/1.1
> User-Agent: Wget/1.20.3 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: www.kp.ru
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.0 200 OK
> ...
> {noformat}
> protocol-okhttp uses the response version ("HTTP/1.0") also for the request:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>     -Dplugin.includes='protocol-okhttp|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.0
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
> The plugin protocol-http records both versions correctly:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>     -Dplugin.includes='protocol-http|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.1
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
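
The mismatch described in the report can be checked mechanically by comparing the version token of the recorded request line with that of the status line. The following is only a minimal sketch and not part of Nutch: the function names are hypothetical, and it assumes the two lines have already been extracted from the {{_request_}} and {{_response.headers_}} values shown above.

```python
import re

# HTTP-version token as it appears in request and status lines,
# for example "HTTP/1.0", "HTTP/1.1" or "HTTP/2".
_VERSION = re.compile(r"HTTP/\d(?:\.\d)?")


def _checked(version: str) -> str:
    """Return the token unchanged if it looks like an HTTP version."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"not an HTTP version: {version!r}")
    return version


def request_version(request_line: str) -> str:
    """Version from a request line: METHOD SP target SP HTTP-version."""
    return _checked(request_line.strip().rsplit(" ", 1)[-1])


def response_version(status_line: str) -> str:
    """Version from a status line: HTTP-version SP status-code SP reason."""
    return _checked(status_line.strip().split(" ", 1)[0])


def versions_differ(request_line: str, status_line: str) -> bool:
    """True if the request and the response carry different HTTP versions."""
    return request_version(request_line) != response_version(status_line)
```

Applied to the lines traced by wget (request {{HTTP/1.1}}, response {{HTTP/1.0}}), {{versions_differ}} reports a difference. Applied to the lines recorded by protocol-okhttp in 1.16, it reports none, because the response version was copied into the request line; identical versions in that output are therefore not evidence of what was actually sent.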