[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated NUTCH-3001:
-------------------------------
    Description:
It looks like the selenium protocol requires a Content-Type header.

The current logic seems to be: if the content type is HTML or XHTML, use Selenium; otherwise just grab the bytes.

If the content type is null, nothing is pulled.

My guess is that the logic should be: if the content type is not null and is HTML or XHTML, use Selenium; otherwise grab the bytes.

Right?

{noformat}
String contentType = getHeader(Response.CONTENT_TYPE);
// handle with Selenium only if content type in HTML or XHTML
if (contentType != null) {
{noformat}

> protocol-selenium requires Content-Type header
> -----------------------------------------------
>
>                 Key: NUTCH-3001
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3001
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Major
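
The behaviour proposed in the description could be sketched roughly as follows. This is only an illustration of the suggested decision, not the actual protocol-selenium code: the class name `ContentTypeRouting`, the method `shouldUseSelenium`, and the exact MIME substrings checked are assumptions made for the example.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a response
// should be rendered with Selenium or fetched as raw bytes.
final class ContentTypeRouting {

    // Returns true only when a Content-Type value is present and names
    // HTML or XHTML. A null value returns false, so the caller falls
    // back to grabbing the bytes instead of pulling nothing.
    static boolean shouldUseSelenium(String contentType) {
        if (contentType == null) {
            return false;
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        // Assumed MIME substrings for HTML and XHTML.
        return ct.contains("text/html") || ct.contains("application/xhtml");
    }

    public static void main(String[] args) {
        String[] samples = { null, "text/html; charset=UTF-8",
                             "application/xhtml+xml", "application/pdf" };
        for (String s : samples) {
            String route = shouldUseSelenium(s) ? "selenium" : "raw bytes";
            System.out.println(s + " -> " + route);
        }
    }
}
```

With a single predicate like this, the "otherwise" branch covers both a non-HTML content type and a missing header, which is the case the issue reports as currently returning no content.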