We are trying to use DefaultHttpAsyncClient in our new crawler. We need to handle a few hundred connections per thread asynchronously, and it seems like the right candidate.
Over the last few weeks we experimented with many DefaultHttpClient instances on a few thousand threads, and it worked well (actually, we found a couple of bugs, as our crawls are very wide and run into every kind of server configuration error). Since we continuously crawl URLs from different sites, we need to change the cookie store at each request, which we do by managing the store directly.

After digging through the (sparse) documentation, I really couldn't figure out how to manage cookies with HttpAsyncClient. Any suggestion or code snippet would be really welcome. What we need to do, basically, is:

- keep a few hundred GET requests open in parallel;
- use an AsyncByteConsumer for each request to accumulate the content in a buffer, and the headers, cookies, etc. in some data structure;
- on completion, schedule the received data for analysis.

All of this, however, requires managing cookies, and I could not understand how to modify the cookie store for each async request, or how to get at the cookie store in onResponseReceived().

Any help appreciated!

seba
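P.S. To make the requirements above concrete, here is a rough sketch of the pattern we are after. It uses only JDK classes (java.net.CookieStore, java.net.HttpCookie) as stand-ins and does no network I/O. PerRequestCookies, RequestState, storeFor, STORES and ANALYSIS are hypothetical names made up for illustration, not HttpAsyncClient API. The callback names merely mirror the AsyncByteConsumer methods mentioned above, and their signatures are simplified. What I cannot figure out is how to wire the equivalent of RequestState.cookies into each async request.

```java
import java.io.ByteArrayOutputStream;
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: JDK stand-ins only, no HttpAsyncClient calls.
class PerRequestCookies {

    // One cookie store per site, keyed by host name (an assumption about
    // how the crawler partitions cookies).
    static final Map<String, CookieStore> STORES = new ConcurrentHashMap<>();

    // Completed requests waiting to be analysed by other threads.
    static final BlockingQueue<RequestState> ANALYSIS = new LinkedBlockingQueue<>();

    static CookieStore storeFor(URI uri) {
        return STORES.computeIfAbsent(uri.getHost(),
                host -> new CookieManager().getCookieStore());
    }

    // State carried by ONE in-flight request: its own cookie store
    // reference plus the buffer that accumulates the response body.
    static final class RequestState {
        final URI uri;
        final CookieStore cookies;
        final ByteArrayOutputStream body = new ByteArrayOutputStream();

        RequestState(URI uri) {
            this.uri = uri;
            this.cookies = storeFor(uri);
        }

        // Value to send in the Cookie request header for this request.
        String cookieHeader() {
            StringBuilder sb = new StringBuilder();
            for (HttpCookie c : cookies.get(uri)) {
                if (sb.length() > 0) sb.append("; ");
                sb.append(c.getName()).append('=').append(c.getValue());
            }
            return sb.toString();
        }

        // Response head arrived: record Set-Cookie values in this
        // request's store (simplified signature, see note above).
        void onResponseReceived(List<String> setCookieValues) {
            for (String value : setCookieValues) {
                for (HttpCookie c : HttpCookie.parse(value)) {
                    cookies.add(uri, c);
                }
            }
        }

        // A chunk of content arrived: append it to the buffer.
        void onByteReceived(ByteBuffer buf) {
            while (buf.hasRemaining()) {
                body.write(buf.get());
            }
        }

        // Request finished: hand the collected data over for analysis.
        void onCompleted() {
            ANALYSIS.add(this);
        }
    }

    public static void main(String[] args) {
        RequestState first = new RequestState(URI.create("http://example.com/a"));
        first.onResponseReceived(List.of("sid=abc; Path=/"));
        first.onByteReceived(ByteBuffer.wrap("<html/>".getBytes(StandardCharsets.UTF_8)));
        first.onCompleted();

        // A later request to the same site picks up the same store.
        RequestState next = new RequestState(URI.create("http://example.com/b"));
        System.out.println("Cookie: " + next.cookieHeader());
    }
}
```

The point of the sketch is that the cookie store travels with the request state instead of living in the client, so a few hundred requests to different sites can be in flight at once without sharing cookies.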
