On Mon, Dec 13, 2010 at 08:06:42PM +0100, Juliusz Chroboczek wrote:
> > The old wwwoffle proxy server allowed to index the cache contents with
> > external tools, like ht://dig or mnogosearch.
> 
> There are two reasons why this is not completely trivial with Polipo.
> 
> The first is that Polipo saves pages in the exact format that was
> provided by the server; this means that some pages will be compressed,
> depending what content-encoding was negotiated between the client and
> the server.  You can potentially work around that by censoring the
> Accept-Encoding header, at the cost of not compressing any pages.
> 
> The second is that Polipo stores pages under a hash of the URL, and
> recovering the URL requires parsing the on-disk page for the
> X-Polipo-Location header.  This should not be difficult to work around
> with some scripting.
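
For illustration, the scripting mentioned above could look roughly like the
sketch below. It is only a sketch under assumptions: that a cache file is the
HTTP status line and headers followed by the body, and that an optional
X-Polipo-Body-Offset header gives the byte offset of the body when present.
Only X-Polipo-Location is confirmed in the quoted text; the function name
parse_cached_entry is made up for this example.

```python
import gzip


def parse_cached_entry(raw: bytes):
    """Return (url, headers, body) for one on-disk cache entry.

    Assumed layout (not verified against the Polipo sources): status line,
    header lines, an empty line, then the body.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")

    headers = {}
    for line in head.decode("latin-1").split("\r\n")[1:]:  # skip status line
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()

    # Assumption: if this header exists, the body starts at that byte offset
    # in the file; otherwise it starts right after the empty line.
    offset = headers.get("x-polipo-body-offset")
    body = raw[int(offset):] if offset else rest

    # Undo the content-encoding that was negotiated with the server.
    if headers.get("content-encoding", "").lower() in ("gzip", "x-gzip"):
        body = gzip.decompress(body)

    return headers.get("x-polipo-location"), headers, body
```

An indexer would call this on the bytes of each file in the cache directory
and hand the returned URL and decoded body to the search engine. Entries that
were only partially downloaded may fail to decompress and would need to be
skipped.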

Actually it is not much easier with wwwoffle, where you also need some work
to figure out page names; in fact wwwoffle may itself compress pages once
they reach a certain age, when configured to do so.

Both problems have been solved and would not be a significant obstacle in
Polipo either. However, I doubt it is worth the effort: I was not at all
happy with the search engines that I tried.

One nuisance is that all of those I know store their own copy of every page
they index. That is a must for Google, but a considerable waste and a
performance degradation if the page is stored locally anyway.

Some of the engines I tried scaled very badly once confronted with more than
a toy dataset or did not allow partial/incremental updates, and ht://dig
segfaulted quite often.

So I would be quite curious whether anyone is using such a search engine on
a dataset of over 20 GB, and which one works.

Richard

---
Name and OpenPGP keys available from pgp key servers


_______________________________________________
Polipo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/polipo-users
