On Mon, Dec 13, 2010 at 08:06:42PM +0100, Juliusz Chroboczek wrote:
> > The old wwwoffle proxy server allowed to index the cache contents with
> > external tools, like ht://dig or mnogosearch.
>
> There are two reasons why this is not completely trivial with Polipo.
>
> The first is that Polipo saves pages in the exact format that was
> provided by the server; this means that some pages will be compressed,
> depending what content-encoding was negotiated between the client and
> the server. You can potentially work around that by censoring the
> Accept-Encoding header, at the cost of not compressing any pages.
>
> The second is that Polipo stores pages under a hash of the URL, and
> recovering the URL requires parsing the on-disk page for the
> X-Polipo-Location header. This should not be difficult to work around
> with some scripting.

Actually, it is not much easier with wwwoffle: there, too, you need some
work to figure out page names, and wwwoffle may itself compress pages once
they reach a certain age, if configured to do so. Both problems have been
solved and would not be a significant obstacle in Polipo either.

However, I doubt it is worth it. I was not at all happy with the search
engines that I tried. One nuisance is that all of those I know store their
own copy of every page they index. That is a must for Google, but a
considerable waste and a performance hit when the page is already stored
locally. Some of the engines I tried scaled very badly once confronted
with more than a toy dataset or did not allow partial/incremental updates,
and ht://dig segfaulted quite often.

So I would be quite curious whether anyone is using such a search engine
on a dataset of over 20 Gbytes, and if so, which one works.

Richard

---
Name and OpenPGP keys available from pgp key servers
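
As a rough illustration of the scripting mentioned in the quoted message,
here is a minimal Python sketch that recovers the URL from a cached object
and undoes the content-encoding before handing the page to an indexer.
It is a sketch under assumptions, not a description of Polipo's actual
on-disk layout: it assumes the file starts with a status line and
HTTP-style headers (including X-Polipo-Location), and that the body
follows directly after the first blank line. The function name
parse_cached_object is made up for this example.

```python
import gzip
import zlib


def parse_cached_object(raw: bytes):
    """Return (url, headers, body) for one cached object.

    Assumed layout (not verified against Polipo's source): a status line,
    HTTP-style headers, an empty line, then the body exactly as the
    server sent it.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")

    lines = head.decode("latin-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()

    # The original URL, since the file name is only a hash of it.
    url = headers.get("x-polipo-location")

    # Undo whatever encoding was negotiated between client and server.
    encoding = headers.get("content-encoding", "identity").lower()
    if encoding in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    elif encoding == "deflate":
        body = zlib.decompress(body)

    return url, headers, body
```

A script could walk the cache directory, call this on each file's
contents, and feed (url, body) to whichever indexer is being tried.
Partially cached or truncated objects would make the decompression step
raise, so a real script would want to catch that and skip the file.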
