[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466741#comment-13466741 ]
Jasper van Veghel commented on NUTCH-827:
-----------------------------------------

Looks like a pretty sloppy mistake in the patch .. ;-) As written, the else branch logs an error whenever trace logging is disabled, even when the status code is 200:

+    if (code == 200 && Http.LOG.isTraceEnabled()) {
+      Http.LOG.trace("url: " + url +
+        "; status code: " + code +
+        "; cookies received: " + Http.getClient().getState().getCookies().length);
+    } else {
+      Http.LOG.error("Unable to retrieve login page; code = " + code);
+    }

Change that to something like ..

+    if (code == 200 && Http.LOG.isTraceEnabled()) {
+      Http.LOG.trace("url: " + url +
+        "; status code: " + code +
+        "; cookies received: " + Http.getClient().getState().getCookies().length);
+    } else if (code != 200) {
+      Http.LOG.error("Unable to retrieve login page; code = " + code);
+    }

And also change this ..

+    LOG.error("Cookie-based authentication failed; cookies will not be present for this request but an attempt to retrieve them will be made for the next one.");

To something like this ..

+    LOG.error("Cookie-based authentication failed; cookies will not be present for this request but an attempt to retrieve them will be made for the next one.", e);

That way the stack trace is logged and you can see where the Exception is coming from. All the code does after that LOG.error() is release the connection, so it shouldn't be throwing an Exception itself.

> HTTP POST Authentication
> ------------------------
>
>                 Key: NUTCH-827
>                 URL: https://issues.apache.org/jira/browse/NUTCH-827
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 1.1, nutchgora
>            Reporter: Jasper van Veghel
>            Priority: Minor
>              Labels: authentication
>             Fix For: 1.6
>
>         Attachments: nutch-http-cookies.patch
>
>
> I've created a patch against the trunk which adds very rudimentary
> POST-based authentication support. It takes a link from nutch-site.xml
> with a site to POST to and its respective parameters (username, password,
> etc.). It then checks upon every request whether any cookies have been
> initialized, and if none have, it fetches them from the given link.
> This isn't perfect but Works For Me (TM) as I generally only need to retrieve
> results from a single domain and so have no cookie overlap (i.e. if the
> domain cookies expire, all cookies disappear from the HttpClient and I can
> simply re-fetch them). A natural improvement would be to be able to specify
> one particular cookie to check the expiration date against. If anyone is
> interested in this besides me I'd be glad to put some more effort into making
> this more universally applicable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
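
As a postscript to the comment: a minimal sketch, not part of the patch, of how the two versions of the logging branch behave for each combination of status code and trace setting. The class `LoginLogSketch`, the `Level` enum, and the method names `original` and `corrected` are hypothetical and exist only for illustration; only the two conditions are taken from the patch, and no Nutch or HttpClient classes are involved.

```java
// Hypothetical stand-in for the logging branch discussed in the comment.
// It models only the two conditions (status code, trace enabled) and
// reports which log call each version of the branch would make.
final class LoginLogSketch {

    enum Level { TRACE, ERROR, NONE }

    // Branch as originally written in the patch: anything that is not
    // "status 200 with trace enabled" falls through to the error call.
    static Level original(int code, boolean traceEnabled) {
        if (code == 200 && traceEnabled) {
            return Level.TRACE;
        } else {
            return Level.ERROR;
        }
    }

    // Branch as suggested in the comment: only a non-200 status is an error.
    static Level corrected(int code, boolean traceEnabled) {
        if (code == 200 && traceEnabled) {
            return Level.TRACE;
        } else if (code != 200) {
            return Level.ERROR;
        }
        return Level.NONE;
    }

    public static void main(String[] args) {
        int[] codes = {200, 403};
        boolean[] traceFlags = {true, false};
        for (int code : codes) {
            for (boolean trace : traceFlags) {
                System.out.println("code=" + code + " trace=" + trace
                        + " original=" + original(code, trace)
                        + " corrected=" + corrected(code, trace));
            }
        }
    }
}
```

The two versions differ only for a 200 response with trace logging disabled, where the original branch reports an error for a successful login page fetch.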