I've never used it, but it seems to be an implementation of HTTP Basic and
Digest Access Authentication, as defined in RFC 2617:
http://www.ietf.org/rfc/rfc2617.txt. Someone please correct me if I'm wrong.
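For reference, here is how the two RFC 2617 schemes build the client's credentials. Basic sends base64(user ":" password). Digest sends an MD5 hash computed over the credentials, the server's nonce and the request. This is a minimal Java sketch, not HttpClient's own code. It covers only algorithm=MD5 with qop=auth. The class and method names are hypothetical. UTF-8 is an assumption, because RFC 2617 does not name a charset. The sample values are taken from the RFC's own examples.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Rfc2617Credentials {

    // Basic scheme (RFC 2617, section 2): "Basic " + base64(userid ":" password).
    // UTF-8 is an assumption; RFC 2617 itself does not specify a charset.
    public static String basicAuthorization(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Lower-case hex MD5, the hash function H() used by the Digest scheme.
    public static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Digest scheme "response" value for algorithm=MD5, qop=auth (RFC 2617, 3.2.2.1):
    // KD(H(A1), nonce ":" nc ":" cnonce ":" qop ":" H(A2))
    public static String digestResponse(String user, String realm, String password,
                                        String method, String uri,
                                        String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password); // H(A1)
        String ha2 = md5Hex(method + ":" + uri);                  // H(A2)
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        // Sample values from the examples in RFC 2617, sections 2 and 3.5.
        System.out.println(basicAuthorization("Aladdin", "open sesame"));
        System.out.println(digestResponse("Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b"));
    }
}
```

Basic credentials are only encoded, not encrypted. Anyone who can see the traffic can read them unless the connection uses HTTPS.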

If your crawler hits a site that requires user authentication, it won't be able
to scrape anything except the entity body sent along with the 401 (Unauthorized)
response, which usually isn't very meaningful. To get the actual content, you
need the username/password credentials for every protected site your crawler visits.
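A crawler could keep those credentials in a per-host lookup that is consulted only when a server answers with 401. The sketch below uses the JDK's own java.net.Authenticator in place of HttpClient's credentials provider, so it is a swapped-in technique. The class name CrawlerAuthenticator and its addCredentials method are hypothetical.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-host credential store for a crawler (JDK Authenticator, not HttpClient).
public class CrawlerAuthenticator extends Authenticator {
    private final Map<String, PasswordAuthentication> byHost = new HashMap<>();

    // Register the credentials a crawler was given for one protected host.
    public void addCredentials(String host, String user, String password) {
        byHost.put(host.toLowerCase(Locale.ROOT),
                new PasswordAuthentication(user, password.toCharArray()));
    }

    // Called by the JDK when a server or proxy issues an authentication challenge.
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Only answer origin-server challenges (401), never proxy challenges (407).
        if (getRequestorType() != RequestorType.SERVER) {
            return null;
        }
        String host = getRequestingHost();
        // null means "no credentials": the client gives up and returns the 401.
        return host == null ? null : byHost.get(host.toLowerCase(Locale.ROOT));
    }
}
```

In a current JDK, the authenticator could be attached with `java.net.http.HttpClient.newBuilder().authenticator(auth).build()` or installed for HttpURLConnection with `Authenticator.setDefault(auth)`. For an unknown host it returns null, so the crawler gets the plain 401 response back and can log the URL and skip it.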

If you want to give it a try, you can set up basic HTTP authentication with 
PHP. Here are a couple of links:
http://stackoverflow.com/questions/4150507/how-can-i-use-basic-http-authentication-in-php
http://php.net/manual/en/features.http-auth.php
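You can also get a local test scenario without PHP, because the JDK itself can serve a page protected by Basic authentication. This minimal sketch assumes the com.sun.net.httpserver package that ships with the JDK (module jdk.httpserver). The path /protected, the realm test-realm and the crawler/secret credentials are made up for testing.

```java
import com.sun.net.httpserver.BasicAuthenticator;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class BasicAuthTestServer {

    // Starts a server on an ephemeral localhost port; /protected requires Basic auth.
    public static HttpServer start(String user, String password) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        HttpContext ctx = server.createContext("/protected", exchange -> {
            // Reached only after the authenticator below accepted the credentials.
            byte[] body = ("Hello, " + exchange.getPrincipal().getUsername())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Requests without valid credentials get a 401 with a Basic challenge.
        ctx.setAuthenticator(new BasicAuthenticator("test-realm") {
            @Override
            public boolean checkCredentials(String u, String p) {
                return user.equals(u) && password.equals(p);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start("crawler", "secret");
        System.out.println("Listening on http://127.0.0.1:"
                + server.getAddress().getPort() + "/protected");
    }
}
```

Requesting the printed URL without credentials should return 401 with a Basic challenge. With the matching credentials, it should return 200. The host, port and credentials in the client example would have to be changed to match.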
 
Regards,

- AJ
 
 

Sent: Tuesday, November 25, 2014 at 15:27
From: "Avi Hayun" <[email protected]>
To: "HttpClient User Discussion" <[email protected]>
Subject: When should I use the ClientAuthentication ?
I maintain a web crawler.


I want to integrate crawling of sites which have username/password zones.


I successfully integrated FORM based authentication.


I would also like to integrate the ClientAuthentication example shown here:
https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientAuthentication.java


But to test it, I need a scenario: a site with a zone protected by this
type of authentication.


Can anybody point me to an example site where I can use
ClientAuthentication to crawl?
