I've never used it, but it appears to be an example of HTTP Basic/Digest authentication as defined in RFC 2617: http://www.ietf.org/rfc/rfc2617.txt. Someone please correct me if I'm wrong.
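To make the Basic scheme from RFC 2617 concrete, here is a minimal sketch of how the Authorization header value is built: the user name and password are joined with a colon and Base64-encoded. The class and method names are hypothetical, and the UTF-8 encoding is an assumption (RFC 2617 does not fix a charset). It uses only the JDK, not HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the value of the Authorization header for the Basic scheme:
    // "Basic " + base64(user + ":" + password).
    // UTF-8 is an assumption here; RFC 2617 does not mandate a charset.
    static String encode(String user, String password) {
        String pair = user + ":" + password;
        byte[] raw = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Credentials taken from the example in RFC 2617, section 2.
        System.out.println("Authorization: " + encode("Aladdin", "open sesame"));
        // prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

Note that this is only an encoding, not encryption, so Basic authentication is normally used over HTTPS.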
If your crawler hits a site that requires user authentication, it won't be able to scrape anything but the entity body sent along with the 401 response, which usually isn't very meaningful. You must know the username/password credentials for every protected site your crawler visits in order to get the actual content.

If you want to give it a try, you can set up basic HTTP authentication with PHP. Here are a couple of links:

http://stackoverflow.com/questions/4150507/how-can-i-use-basic-http-authentication-in-php
http://php.net/manual/en/features.http-auth.php

Regards,
- AJ

Sent: Tuesday, November 25, 2014 at 15:27
From: "Avi Hayun" <[email protected]>
To: "HttpClient User Discussion" <[email protected]>
Subject: When should I use the ClientAuthentication?

I am maintaining a Web Crawler.

I want to integrate crawling of sites which have username/password zones.

I successfully integrated FORM based authentication.

I also want to integrate the ClientAuthentication I can see here:
https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientAuthentication.java

But in order to check it out I need a scenario: a site with a zone protected by this type of authentication.

Can anybody supply me with an example where I can use this ClientAuthentication in order to crawl?
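As an alternative to PHP for the test scenario asked about above, the following is a rough sketch of a local zone protected by Basic authentication, using the HTTP server bundled with the JDK (com.sun.net.httpserver). The class name, the /protected path, the realm name, and the credentials are all made up for illustration. The small status helper uses HttpURLConnection rather than HttpClient, just to show the 401 versus 200 behaviour; a crawler would point its own client at the same URL.

```java
import com.sun.net.httpserver.BasicAuthenticator;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthTestZone {

    // Starts a server on a free local port; /protected requires Basic credentials.
    static HttpServer start(String user, String password) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            HttpContext ctx = server.createContext("/protected", exchange -> {
                byte[] body = "secret content".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            // The authenticator answers 401 with a WWW-Authenticate challenge
            // unless the request carries matching credentials.
            ctx.setAuthenticator(new BasicAuthenticator("test-realm") {
                @Override
                public boolean checkCredentials(String u, String p) {
                    return user.equals(u) && password.equals(p);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Requests /protected and returns the HTTP status; pass user == null to send no credentials.
    static int status(int port, String user, String password) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/protected").toURL().openConnection();
            if (user != null) {
                String token = Base64.getEncoder()
                        .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
                conn.setRequestProperty("Authorization", "Basic " + token);
            }
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start("crawler", "s3cret");
        int port = server.getAddress().getPort();
        System.out.println("without credentials: " + status(port, null, null));
        System.out.println("with credentials:    " + status(port, "crawler", "s3cret"));
        server.stop(0);
    }
}
```

Without credentials the request should come back as 401, and with the matching user name and password as 200. While the server is running, the ClientAuthentication example could be pointed at http://127.0.0.1:<port>/protected with the same credentials.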
