Hi, I updated the cleanup branch with a rewritten crawler module. Basically I removed all the protocol stuff and added a simple HttpClient Fetcher for retrieving web pages.
Since HttpClient 4.2 there is a new SystemDefaultHttpClient, which uses a PoolingClientConnectionManager and is a lot faster than the old DroidsHttpConnectionManager. I compared the configuration of the DroidsHttpClient with that of the SystemDefaultHttpClient and found no notable differences, so I removed the classes from the org.apache.droids.protocol.http package.

Please check out, test and discuss the cleanup branch:
https://svn.apache.org/repos/asf/incubator/droids/branches/0.2.x-cleanup/

Next steps:
- updating the tika module
- adding more javadocs
- adding more test cases

Tobias
