Re: squid-prefetching status

2005-05-14 Thread Nick Lewycky
Jon Kay wrote:
> Nick Lewycky wrote:
> 
>>Finally, does anyone have suggestions for how to test for performance
>>improvement due to prefetching?
> 
> A good way to test how your algorithms are working is to get a nice, long
> actual Squid workload (e.g., the URLs fetched) and compare how long it takes
> to execute the whole thing with and without prefetching.

That's a very good plan. Does anyone have recent logs publicly
available? I have some IRCache logs for the day of May 31, 2004 -- but
when I tried the first 5,000 entries, I found that 87% of the prefetches
weren't fetched later in the log. I think this is mostly because the
pages changed after that date and also because of filtering effects from
client caching.

What I'd really like to have is a way to look at the page load times
instead of running through individual URLs.

> Note that you generally have to prefetch a LOT of stuff to get much
> improvement, because web cache fetch popularity follows Zipf's law and
> decays slowly.

I hadn't heard of Zipf's law. It's interesting, thank you for
introducing me to it! Just to make certain I understand what you're
saying: I need a lot of log data to test with, because most fetches go to
popular objects already in the working set, where prefetching won't help,
so I need a long trace to see a useful number of cache misses?
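
(Writing it down to check my understanding: Zipf-like popularity means the
chance that a request goes to the r-th most popular object is roughly

    P(r) = C / r^alpha

with alpha typically reported somewhere around 0.6-0.8 for web traces.
Because that tail decays so slowly, the unpopular objects keep a large share
of the requests, and the misses prefetching could help with are spread
thinly over very many objects -- hence the need for a long trace.)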

> Good luck with your work.

Thank you!

Nick Lewycky


Re: cvs commit: squid3/include Range.h

2005-05-14 Thread Henrik Nordstrom
On Sat, 14 May 2005, Serassio Guido wrote:
> With an empty "port" acl, Squid crashes when dumping configuration in
> cachemgr:
>
> and non-empty "port" acls are not working; this is the dump output of the
> default squid.conf:

Fixed. Got a condition the wrong way around again (ListIterator eof was
negated).
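
In other words, roughly this kind of slip (illustrative only, not the
actual diff):

while (iter.eof())          // buggy: the body only runs once the iterator
    dumpEntry(*iter++);     // is already past the end, so the dump walks
                            // off the list (or dumps nothing at all)

while (!iter.eof())         // intended
    dumpEntry(*iter++);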

Regards
Henrik


Re: squid-prefetching status

2005-05-14 Thread Jon Kay
Nick Lewycky wrote:

> Hi. I've been working to add prefetching to squid3. It works by
> analyzing HTML and looking for various tags that a graphical browser can
> be expected to request.
>
> So far, it seems to just barely work. What works is checking the
> content-type of the document, avoiding encoded (gzip'ed) documents,
> analyzing the HTML using libxml2 in "tag soup" mode, resolving the full
> URL from relative references, and fetching the files into the cache. (I
> would, of course, appreciate code reviews of the branch before I diverge
> too far!)
>
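
For anyone following along, the tag-soup pass you describe would look very
roughly like the standalone sketch below. None of this is the code in your
branch -- the function names are made up, only the libxml2 calls are real:

#include <libxml/HTMLparser.h>
#include <libxml/uri.h>
#include <libxml/xmlstring.h>
#include <string>
#include <vector>

// Walk the parsed tree and collect absolute URLs of likely prefetchables.
static void collectLinks(xmlNodePtr node, const char *base,
                         std::vector<std::string> &out)
{
    for (; node; node = node->next) {
        if (node->type == XML_ELEMENT_NODE) {
            const char *attr = 0;
            if (!xmlStrcasecmp(node->name, BAD_CAST "img") ||
                !xmlStrcasecmp(node->name, BAD_CAST "script") ||
                !xmlStrcasecmp(node->name, BAD_CAST "frame"))
                attr = "src";
            else if (!xmlStrcasecmp(node->name, BAD_CAST "link"))
                attr = "href";      // stylesheets and the like

            if (attr) {
                xmlChar *val = xmlGetProp(node, BAD_CAST attr);
                if (val) {
                    // resolve relative references against the page's URL
                    xmlChar *abs = xmlBuildURI(val, BAD_CAST base);
                    if (abs) {
                        out.push_back(std::string((const char *) abs));
                        xmlFree(abs);
                    }
                    xmlFree(val);
                }
            }
        }
        collectLinks(node->children, base, out);
    }
}

std::vector<std::string> findPrefetchables(const char *html, int len,
                                           const char *pageUrl)
{
    std::vector<std::string> urls;
    // RECOVER == "tag soup" mode: keep parsing through broken markup
    htmlDocPtr doc = htmlReadMemory(html, len, pageUrl, NULL,
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                    HTML_PARSE_NOWARNING);
    if (!doc)
        return urls;
    collectLinks(xmlDocGetRootElement(doc), pageUrl, urls);
    xmlFreeDoc(doc);
    return urls;
}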
> However, I've run into a few problems.
>
> To prefetch a page, we call clientBeginRequest. I've already had to
> extend the richness of this interface a little. The main problem is that
> it will open up a new socket for each call. On a page with 100
> prefetchables, it will open 100 TCP connections to the remote server.
> That's not nice. I need a way to re-use a connection for multiple
> requests. How should I do this? I'd like clientBeginRequest to be smart
> enough to handle this behind the scenes.
>
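
On the connection reuse question: purely as a sketch of the idea (none of
these names exist in Squid, this is just the shape of it), the prefetcher
could park idle server sockets in a pool keyed by host:port and ask the
pool before opening a fresh connection:

#include <deque>
#include <map>
#include <sstream>
#include <string>

class IdleConnPool {
public:
    // Hand back an idle descriptor for host:port, or -1 if none is pooled.
    int pop(const std::string &host, int port) {
        std::deque<int> &q = pool_[key(host, port)];
        if (q.empty())
            return -1;
        int fd = q.front();
        q.pop_front();
        return fd;
    }

    // Called when a prefetch finishes and the server kept the connection
    // open (HTTP/1.1 persistent connection): park the socket for reuse.
    void push(const std::string &host, int port, int fd) {
        pool_[key(host, port)].push_back(fd);
    }

private:
    static std::string key(const std::string &host, int port) {
        std::ostringstream k;
        k << host << ':' << port;
        return k.str();
    }

    std::map<std::string, std::deque<int> > pool_;
};

Real reuse also has to cope with the server closing an idle socket, and
with HTTP/1.0 servers that do not keep connections alive, so
clientBeginRequest would still need the open-a-new-socket path as a
fallback.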
> Occasionally I see duplicate prefetches. I think what's going on here is
> that the object is uncacheable. The only way I can think of solving this
> is by adding an "uncacheable" entry type to the store -- but that just
> seems wrong, conceptually. On a related note, maybe we could terminate a
> prefetch as soon as we receive the headers and notice that it's
> uncacheable. Currently, we download the whole thing and just discard it
> (after analyzing it for more prefetchables if it's HTML).
>
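
On the early-abort idea: once the status line and headers of a prefetch
have arrived, a check along these lines (purely illustrative, and a much
simplified view of the real cacheability rules) could decide whether the
body is worth pulling at all:

#include <string>

// httpStatus is the reply status code; cacheControl is the raw
// Cache-Control header value, or an empty string if the header is absent.
bool prefetchWorthContinuing(int httpStatus, const std::string &cacheControl)
{
    if (httpStatus != 200)
        return false;   // only plain 200 replies are interesting to prefetch
    if (cacheControl.find("no-store") != std::string::npos)
        return false;   // must not be stored at all
    if (cacheControl.find("private") != std::string::npos)
        return false;   // not storable by a shared cache
    if (cacheControl.find("no-cache") != std::string::npos)
        return false;   // stored copy couldn't be used without revalidation
    return true;
}

The duplicate-prefetch side might be handled separately by remembering
recently issued prefetch URLs for a short window, rather than by teaching
the store about uncacheable entries.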
> Finally, does anyone have suggestions for how to test for performance
> improvement due to prefetching?

A good way to test how your algorithms are working is to get a nice, long
actual Squid workload (e.g., the URLs fetched) and compare how long it takes
to execute the whole thing with and without prefetching.
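
A crude harness for that comparison could look like the sketch below. It
assumes a flat file of URLs, a Squid listening on 127.0.0.1:3128, and
libcurl (build with -lcurl); run it once with prefetching off and once with
it on, and compare the totals:

#include <curl/curl.h>
#include <cstdio>
#include <cstring>

// Discard the body; we only care about the elapsed time.
static size_t sink(char *, size_t size, size_t nmemb, void *)
{
    return size * nmemb;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s urls.txt\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "r");
    if (!f)
        return 1;

    curl_global_init(CURL_GLOBAL_ALL);
    CURL *h = curl_easy_init();
    if (!h)
        return 1;
    curl_easy_setopt(h, CURLOPT_PROXY, "127.0.0.1:3128");
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, sink);

    double total = 0;
    char url[8192];
    while (fgets(url, sizeof(url), f)) {
        url[strcspn(url, "\r\n")] = '\0';   // trim the newline
        if (!url[0])
            continue;
        curl_easy_setopt(h, CURLOPT_URL, url);
        if (curl_easy_perform(h) == CURLE_OK) {
            double t = 0;
            curl_easy_getinfo(h, CURLINFO_TOTAL_TIME, &t);
            total += t;
        }
    }
    printf("total fetch time: %.2f s\n", total);

    curl_easy_cleanup(h);
    curl_global_cleanup();
    fclose(f);
    return 0;
}

It only measures sequential per-URL fetch time, so it understates what a
browser doing parallel fetches would see, but it is enough to show whether
prefetching moves the total at all.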

Note that you generally have to prefetch a LOT of stuff to get much
improvement, because web cache fetch popularity follows Zipf's law and
decays slowly.

Good luck with your work.

Jon




Re: cvs commit: squid3/include Range.h

2005-05-14 Thread Serassio Guido
Hi Henrik,
At 01.28 09/05/2005, [EMAIL PROTECTED] wrote:
hno 2005/05/08 17:28:06 MDT
  Modified files:
include  Range.h
  Log:
  const correctness
  Revision  Changes  Path
  1.6   +2 -2  squid3/include/Range.h
With an empty "port" acl, Squid crashes when dumping configuration in 
cachemgr:
2005/05/14 18:59:10| Warning: empty ACL: acl bad_port port
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 31519)]
0x08056a37 in Range::size (this=0x4) at Range.h:80
80  return end > start ? end - start : 0;
(gdb) backtrace
#0  0x08056a37 in Range::size (this=0x4) at Range.h:80
#1  0x08059d94 in ACLStrategised::dump (this=0x4) at ACLStrategised.h:166
#2  0x08051357 in ACL::dumpGeneric (this=0x0) at acl.cc:563
#3  0x08065bce in dump_acl (entry=0x4068c8c0, name=0x816b2b6 "acl", ae=0x4) 
at cache_cf.cc:811
#4  0x0806cfd5 in dump_config (entry=0x4068c8c0) at cf_parser.h:1620
#5  0x0807138d in cachemgrStart (fd=140053524, request=0x85940d8, 
entry=0x4068c8c0)
at cache_manager.cc:332

and non-empty "port" acls are not working; this is the dump output of the
default squid.conf:

acl to_localhost dst 127.0.0.0/255.0.0.0
acl SSL_ports port
acl Safe_ports port
acl CONNECT method CONNECT
Regards
Guido

-

Guido Serassio
Acme Consulting S.r.l. - Microsoft Certified Partner
Via Lucia Savarino, 1   10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135  Fax. : +39.011.9781115
Email: [EMAIL PROTECTED]
WWW: http://www.acmeconsulting.it/


Re: squid 2.5 with icap (fwd)

2005-05-14 Thread Tsantilas Christos
Henrik Nordstrom wrote:
Hello Henrik,
I don't know who is responsible for icap client development in Squid; if
it is not you, please forward it.

We have been using Squid with ICAP support, and we found the following
problem in the Squid ICAP client:

When an HTTP server sends a response to Squid without an HTTP header (as
allowed by HTTP/0.9), Squid builds a malformed ICAP request, so the ICAP
server cannot parse it.

The HTTP part of Squid handles such responses correctly, but the ICAP
client does not. Unfortunately, there are still HTTP servers that use this
old protocol.
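
For context, a RESPMOD request tells the ICAP server where each
encapsulated HTTP section starts, along these lines (illustrative offsets):

RESPMOD icap://icap.example.net/respmod ICAP/1.0
Host: icap.example.net
Encapsulated: req-hdr=0, res-hdr=137, res-body=296

[ICAP body: encapsulated HTTP request headers, HTTP reply headers, then the
chunked reply body]

With an HTTP/0.9 reply there are no reply headers at all, so presumably the
client ends up announcing a res-hdr section that does not match what it
actually sends.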

Yes, the problem exists.
Sorry, the private information in the ICAP request is omitted.
We made a patch to fix this problem and would like somebody to add it to
the Squid ICAP development branch.

I do not want to apply it as-is; there are things that I do not like in
this patch. But I am going to make my own solution (maybe based on their
patch).
--
Christos