Re: Possible memory leak.

2014-07-10 Thread Eliezer Croitoru
I think I can build a squid rpm with valgrind support for this purpose 
and others.


Well now we have a bug report for the issue and it's good that we can 
test it.
Note that this server has a 99% hit ratio since it is serving fewer than 
30 files, about half of which should be served as TCP_MEM_HIT.
StoreID strips any query parameters from the URL, redirecting Squid to 
fetch the same object for all the requests that abuse the URL query 
part to avoid caching.
(Why would anyone testing his bandwidth against a specific server want 
to fetch from spinning disks instead of RAM?)
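For readers unfamiliar with it, a Store-ID helper of the kind described can be as small as the sketch below (the helper actually deployed is not shown in this thread; the script and its policy are illustrative). It collapses every query-string variant of a URL into one cache key:

```python
#!/usr/bin/env python
import sys
from urllib.parse import urlsplit, urlunsplit

def store_id(url):
    """Map URLs differing only in query string (or fragment) to one cache key."""
    parts = urlsplit(url)
    # Drop query and fragment so cache-busting parameters collapse into
    # a single stored object.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def main():
    # Squid's line protocol (with concurrency): "channel-ID URL extras..." in,
    # "channel-ID OK store-id=..." out.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 2:
            continue
        channel, url = fields[0], fields[1]
        sys.stdout.write("%s OK store-id=%s\n" % (channel, store_id(url)))
        sys.stdout.flush()  # Squid needs unbuffered replies

if __name__ == "__main__":
    main()
```

Such a script would be wired up via the store_id_program and store_id_children directives in squid.conf.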


Eliezer

On 07/11/2014 06:32 AM, Amos Jeffries wrote:

This may be what Martin Sperl is reporting in the squid-users thread
"squid: Memory utilization higher than expected since moving from 3.3 to
3.4 and Vary: working"

What I'm trying to get from him there is a series of mgr:mem reports
over time to see if any particular object type is growing unusually. And
mgr:filedescriptors in case it's a side effect of the hung connections 
Christos identified recently.

If we are lucky enough that the squid was built with valgrind support
there should be a valgrind leak trace available in one of the info and
mem reports. This will only catch real leaks though, not ref-counting
holding things active.

Amos





Re: [PATCH] Support PROXY protocol

2014-07-10 Thread Alex Rousskov
On 06/25/2014 01:41 PM, Alex Rousskov wrote:
> On 06/21/2014 11:15 PM, Amos Jeffries wrote:
>> > Support receiving PROXY protocol version 1 and 2.

> sounds like nothing-on-top-of-nothing to me in Squid context? The
> terrible name for the PROXY protocol itself is clearly not your fault


Per Amos request on IRC, here are some suggestions for renaming the new
protocol itself (in case the protocol author, Willy Tarreau, might be
open to suggestions):

  * On Behalf Of Protocol

  * True Client Information Disclosure Protocol
  * TCP Client Information Disclosure Protocol

  * The Right To Be Remembered Protocol
  * Remember The Right Client Protocol

  * From and To Protocol
  * Actually From and To Protocol

  * Letterhead Protocol
  * True Client Letterhead Protocol
  * TCP Client Letterhead Protocol

  * Top Line Protocol
  * Top Header Protocol
  * Top Shelf Protocol


Most can be creatively "acronymized" for brevity, of course.

I have not validated any of these suggestions against prior art and
cultural clashes (although some puns were intentional).


The protocol renaming can be combined with renaming the PROXY magic word
in the top line as well, but they do not have to be.
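For reference, the v1 top line carrying that magic word looks like "PROXY TCP4 192.0.2.1 198.51.100.1 56324 443" followed by CRLF. A minimal illustrative parser (not the patch's code):

```python
def parse_proxy_v1(line):
    """Parse a PROXY protocol v1 header line (illustrative sketch).

    Example input: "PROXY TCP4 192.0.2.1 198.51.100.1 56324 443" + CRLF.
    Returns (family, src_ip, dst_ip, src_port, dst_port).
    """
    if not line.endswith("\r\n"):
        raise ValueError("v1 header must end with CRLF")
    fields = line[:-2].split(" ")
    if not fields or fields[0] != "PROXY":
        raise ValueError("missing PROXY magic word")
    if fields[1] == "UNKNOWN":
        # Sender cannot describe the client; address fields may be absent.
        return ("UNKNOWN", None, None, None, None)
    family, src, dst, sport, dport = fields[1:6]
    if family not in ("TCP4", "TCP6"):
        raise ValueError("unsupported protocol family: " + family)
    return (family, src, dst, int(sport), int(dport))
```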


HTH,

Alex.



Re: [RFC] post-cache REQMOD

2014-07-10 Thread Alex Rousskov
On 07/10/2014 09:12 PM, Amos Jeffries wrote:
> On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
>> I propose adding support for a third adaptation vectoring point:
>> post-cache REQMOD. Services at this new point receive cache miss
>> requests and may adapt them as usual. If a service satisfies the
>> request, the service response may get cached by Squid. As you know,
>> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
> 
> Just to clarify, you mean this to be the vectoring point which receives
> MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?

+ pre-cache REQMOD receives HIT+MISS request traffic.
+ pre-cache RESPMOD receives MISS response traffic.
* post-cache REQMOD receives MISS request traffic.
- post-cache RESPMOD receives HIT+MISS response traffic.

All four vectoring points are documented in RFC 3507. Squid currently
supports the first two. I propose adding support for the third. Each of
the four points (and probably others!) is useful for some use cases.

Besides getting a different HIT/MISS traffic mixture, there are other
differences between these vectoring points. For example, pre-cache
REQMOD responses go straight to the HTTP client while post-cache REQMOD
responses may be cached by Squid (this example is about request
satisfaction mode of REQMOD).
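For concreteness, the two currently supported points are wired up in squid.conf along these lines (service names and the ICAP URI are illustrative; a post-cache REQMOD service would need a new vectoring-point keyword, which does not exist yet):

```
# Pre-cache REQMOD: sees HIT+MISS requests before the cache lookup
icap_service svcReq reqmod_precache icap://127.0.0.1:1344/request
adaptation_access svcReq allow all

# Pre-cache RESPMOD: sees MISS responses before they are cached
icap_service svcResp respmod_precache icap://127.0.0.1:1344/response
adaptation_access svcResp allow all
```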


>> We have received many requests for post-cache adaptation support
>> throughout the years, and I personally resisted the temptation of adding
>> another layer of complexity (albeit an optional one) because it is a lot
>> of work and because many use cases could be addressed without post-cache
>> adaptation support.
>>
>> The last straw (and the motivation for this RFC) was PageSpeed[1]
>> integration. With PageSpeed, one can generate various variants of
>> "optimized" content. For example, mobile users may receive smaller
>> images. Apache and Nginx support PageSpeed modules. It is possible to
>> integrate Squid with PageSpeed (and similar services) today, but it is
>> not possible for Squid to _cache_ those generated variants unless one is
>> willing to pay for another round trip to the origin server to get
>> exactly the same unoptimized content.
> 
> Can you show how they are violating standard HTTP variant caching? The
> HTTPbis working group should probably be informed of the problem.
> If it is actually within the standard then it would seem to be a missing
> feature of Squid to cache them properly. We could do better by
> fixing Squid to cache more compliant traffic.

This is unrelated to caching rules. HTTP does not have a notion of
creating multiple responses from a single response, and that is exactly
what PageSpeed and similar adaptations do: For example, they convert a
large origin server image or page into several variants, one for each
class of clients.


Does this clarify?


Thank you,

Alex.



Re: Possible memory leak.

2014-07-10 Thread Amos Jeffries
On 11/07/2014 6:10 a.m., Eliezer Croitoru wrote:
> OK so I started this reverse proxy for a bandwidth testing site and it
> seems odd that it is using more than 400MB when the only difference in the
> config is maximum_object_size_in_memory to 150MB and StoreID
> 
> I have extracted mgr:mem and mgr:info at these urls:
> http://www1.ngtech.co.il/paste/1169/raw/
> http://www1.ngtech.co.il/paste/1168/raw/
> 
> A top snapshot:
> http://www1.ngtech.co.il/paste/1170/raw/
> 
> The default settings are 256MB for ram cache and this instance is ram
> only..
> squid.conf at:
> http://www1.ngtech.co.il/paste/1171/raw/
> 
> I started the machine 29 days ago while squid has been up for less than that.
> 
> Any direction is welcomed but test cannot be done on this machine
> directly for now.
> 
> Eliezer

This may be what Martin Sperl is reporting in the squid-users thread
"squid: Memory utilization higher than expected since moving from 3.3 to
3.4 and Vary: working"

What I'm trying to get from him there is a series of mgr:mem reports
over time to see if any particular object type is growing unusually. And
mgr:filedescriptors in case it's a side effect of the hung connections
Christos identified recently.

If we are lucky enough that the squid was built with valgrind support
there should be a valgrind leak trace available in one of the info and
mem reports. This will only catch real leaks though, not ref-counting
holding things active.

Amos



Re: [RFC] post-cache REQMOD

2014-07-10 Thread Amos Jeffries
On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
> Hello,
> 
> I propose adding support for a third adaptation vectoring point:
> post-cache REQMOD. Services at this new point receive cache miss
> requests and may adapt them as usual. If a service satisfies the
> request, the service response may get cached by Squid. As you know,
> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.

Just to clarify, you mean this to be the vectoring point which receives
MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?


> We have received many requests for post-cache adaptation support
> throughout the years, and I personally resisted the temptation of adding
> another layer of complexity (albeit an optional one) because it is a lot
> of work and because many use cases could be addressed without post-cache
> adaptation support.
> 
> The last straw (and the motivation for this RFC) was PageSpeed[1]
> integration. With PageSpeed, one can generate various variants of
> "optimized" content. For example, mobile users may receive smaller
> images. Apache and Nginx support PageSpeed modules. It is possible to
> integrate Squid with PageSpeed (and similar services) today, but it is
> not possible for Squid to _cache_ those generated variants unless one is
> willing to pay for another round trip to the origin server to get
> exactly the same unoptimized content.

Can you show how they are violating standard HTTP variant caching? The
HTTPbis working group should probably be informed of the problem.
If it is actually within the standard then it would seem to be a missing
feature of Squid to cache them properly. We could do better by
fixing Squid to cache more compliant traffic.

> 
> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.
> 
> 
> As far as internals are concerned, I am currently thinking of launching
> adaptation job for this vectoring point from FwdState::Start(). This
> way, its impact on the rest of Squid would be minimal and some adapters
> might even affect FwdState routing decisions. The initial code name for
> the new class is MissReqFilter, but that may change.
> 

Given that FwdState is the global selector to determine where MISS
content comes from this sounds reasonable.

I think after the miss_access tests is the best position. We need to split
the miss_access lookup off into an async step so it can be a slow lookup anyway.


> 
> The other candidate location for plugging in the new vectoring point is
> the Server class. However, that class is already complex. It handles
> communication with the next hop (with child classes doing
> protocol-specific work and confusing things further) as well as
> pre-cache RESPMOD vectoring point with caching initiation on top of
> that. The Server code already has trouble distinguishing various content
> streams it has to juggle. I am worried that adding another vectoring
> point there would make that complexity significantly worse.

Agreed. Bad idea.

> 
> It is possible that we would be able to refactor/encapsulate some of the
> code so that it can be reused in both the existing Server and the new
> MissReqFilter classes. I will look out for such opportunities while
> trying to keep the overall complexity in check.
> 
> 
> Any objections to adding post-cache REQMOD or better implementation ideas?

Just the above details about variant caching.

Amos



[RFC] post-cache REQMOD

2014-07-10 Thread Alex Rousskov
Hello,

I propose adding support for a third adaptation vectoring point:
post-cache REQMOD. Services at this new point receive cache miss
requests and may adapt them as usual. If a service satisfies the
request, the service response may get cached by Squid. As you know,
Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.


We have received many requests for post-cache adaptation support
throughout the years, and I personally resisted the temptation of adding
another layer of complexity (albeit an optional one) because it is a lot
of work and because many use cases could be addressed without post-cache
adaptation support.

The last straw (and the motivation for this RFC) was PageSpeed[1]
integration. With PageSpeed, one can generate various variants of
"optimized" content. For example, mobile users may receive smaller
images. Apache and Nginx support PageSpeed modules. It is possible to
integrate Squid with PageSpeed (and similar services) today, but it is
not possible for Squid to _cache_ those generated variants unless one is
willing to pay for another round trip to the origin server to get
exactly the same unoptimized content.

The only way to support Squid caching of PageSpeed variants without
repeated round trips to the origin server is using two Squids. The
parent Squid would cache origin server responses while the child Squid
would adapt parent's responses and cache adapted content. Needless to
say, running two Squids (each with its own cache) instead of one adds
significant performance/administrative overheads and complexity.


As far as internals are concerned, I am currently thinking of launching
the adaptation job for this vectoring point from FwdState::Start(). This
way, its impact on the rest of Squid would be minimal and some adapters
might even affect FwdState routing decisions. The initial code name for
the new class is MissReqFilter, but that may change.



The other candidate location for plugging in the new vectoring point is
the Server class. However, that class is already complex. It handles
communication with the next hop (with child classes doing
protocol-specific work and confusing things further) as well as
pre-cache RESPMOD vectoring point with caching initiation on top of
that. The Server code already has trouble distinguishing various content
streams it has to juggle. I am worried that adding another vectoring
point there would make that complexity significantly worse.

It is possible that we would be able to refactor/encapsulate some of the
code so that it can be reused in both the existing Server and the new
MissReqFilter classes. I will look out for such opportunities while
trying to keep the overall complexity in check.


Any objections to adding post-cache REQMOD or better implementation ideas?


Thank you,

Alex.
[1] https://developers.google.com/speed/pagespeed/


Possible memory leak.

2014-07-10 Thread Eliezer Croitoru
OK so I started this reverse proxy for a bandwidth testing site and it 
seems odd that it is using more than 400MB when the only difference in the 
config is maximum_object_size_in_memory set to 150MB and StoreID.


I have extracted mgr:mem and mgr:info at these urls:
http://www1.ngtech.co.il/paste/1169/raw/
http://www1.ngtech.co.il/paste/1168/raw/

A top snapshot:
http://www1.ngtech.co.il/paste/1170/raw/

The default setting is 256MB for the RAM cache and this instance is RAM-only.
squid.conf at:
http://www1.ngtech.co.il/paste/1171/raw/
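Roughly, the relevant parts of such a memory-only setup look like this (an illustrative reconstruction of the settings described, not the pasted file):

```
cache_mem 256 MB                       # the default memory cache size
maximum_object_size_in_memory 150 MB   # raised so the large test files fit in RAM
# no cache_dir directive: memory-only instance
```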

I started the machine 29 days ago while squid has been up for less than that.

Any direction is welcomed but test cannot be done on this machine 
directly for now.


Eliezer


Re: [PATCH] SSL Server connect I/O timeout

2014-07-10 Thread Amos Jeffries
On 28/06/2014 3:38 a.m., Tsantilas Christos wrote:
> Hi all,
> 
> Currently FwdState::negotiateSSL() operates on a TCP connection without
> a timeout. If, for example, the server never responds to Squid SSL
> Hello, the connection gets stuck forever. This happens in the real world when,
> for example, a client is trying to establish an SSL connection through
> bumping Squid to an HTTP server that does not speak SSL and does not
> detect initial request garbage (from HTTP point of view)
> 
> Moreover, if the client closes the connection while Squid is fruitlessly
> waiting for server SSL negotiation, the client connection will get into
> the CLOSE_WAIT state with a 1 day client_lifetime timeout.  This patch
> does not address that CLOSE_WAIT problem directly.
> 
> This patch adds an SSL negotiation timeout for the server SSL connection
> and tries not to exceed forward_timeout or peer_timeout while connecting
> to an SSL server.
> 
> Some notes:
>  - In this patch the timeouts used for Ssl::PeerConnector are still not
> accurate; they may be 5 secs more than the forward timeout or 1 second
> more than the peer_connect timeout, but I think they are reasonable enough.
> 
>  - Please check and comment on the new
> Comm::Connection::startTime()/::noteStart() mechanism.
> Comm::Connection::startTime_ is now computed in the Comm::Connection
> constructor and reset in Comm::ConnOpener::start() and
> Comm::TcpAcceptor::start()
> 
> 
> This is a Measurement Factory project.


+1. Please apply ASAP.

Amos


Re: [PATCH] SSL Server connect I/O timeout

2014-07-10 Thread Tsantilas Christos

If there are no objections I will apply this patch to trunk.

Regards,
   Christos


On 06/27/2014 06:38 PM, Tsantilas Christos wrote:

Hi all,

Currently FwdState::negotiateSSL() operates on a TCP connection 
without a timeout. If, for example, the server never responds to the Squid 
SSL Hello, the connection gets stuck forever. This happens in the real world 
when, for example, a client is trying to establish an SSL connection 
through a bumping Squid to an HTTP server that does not speak SSL and 
does not detect the initial request garbage (from an HTTP point of view).


Moreover, if the client closes the connection while Squid is 
fruitlessly waiting for server SSL negotiation, the client connection 
will get into the CLOSE_WAIT state with a 1 day client_lifetime 
timeout.  This patch does not address that CLOSE_WAIT problem directly.


This patch adds an SSL negotiation timeout for the server SSL 
connection and tries not to exceed forward_timeout or peer_timeout while 
connecting to an SSL server.


Some notes:
 - In this patch the timeouts used for Ssl::PeerConnector are still 
not accurate; they may be 5 secs more than the forward timeout or 1 
second more than the peer_connect timeout, but I think they are reasonable enough.


 - Please check and comment on the new 
Comm::Connection::startTime()/::noteStart() mechanism.
Comm::Connection::startTime_ is now computed in the Comm::Connection 
constructor and reset in Comm::ConnOpener::start() and 
Comm::TcpAcceptor::start()
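Sketched in code, the timeout selection described in the notes above behaves roughly as follows (the function name and the exact clamping policy are an illustrative simplification, not the patch code):

```python
def negotiate_ssl_timeout(forward_timeout, conn_start, now,
                          peer_connect_timeout=None):
    """Pick an SSL-negotiation timeout for a server connection.

    Tries not to exceed the overall forwarding budget (forward_timeout,
    counted from when the connection started) nor, for cache_peer
    connections, peer_connect_timeout. Illustrative simplification of
    the behaviour described above.
    """
    remaining = forward_timeout - (now - conn_start)  # forwarding budget left
    timeout = max(1, remaining)  # never schedule a zero or negative timeout
    if peer_connect_timeout is not None:
        timeout = min(timeout, peer_connect_timeout)
    return timeout
```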



This is a Measurement Factory project.




Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-07-10 Thread Tsantilas Christos

If there are no objections I will apply the latest patch to trunk.

Regards,
   Christos

On 06/26/2014 05:45 PM, Tsantilas Christos wrote:

A new patch.
Changes:
  - clt_conn_id renamed to clt_conn_tag
  - Amos requested fixes.

I hope it is OK.

Regards,
   Christos


On 06/22/2014 08:43 PM, Tsantilas Christos wrote:

Hi all,

The attached patch replaces existing annotation values with the new ones
received from helpers.

Just one question. We are documenting key-value pairs in cf.data.pre
only for url-rewriter helpers, but annotations work for all Squid 
helpers.

Should we move the related url-rewriter section to a global section? If
yes, where?

For example something like the following in a global section should be
enough:

The interface for all helpers has been extended to support arbitrary
lists of key=value pairs, with the syntax key=value. Some keys have
special meaning to Squid, as documented here.

Currently Squid understands the following optional key=value pairs
received from URL rewriters:
  clt_conn_id=ID
 Associates the received ID with the client TCP connection.
 The clt_conn_id=ID pair is treated as a regular annotation but
 it persists across all transactions on the client connection
 rather than disappearing after the current request. A helper
 may update the client connection ID value during subsequent
 requests by returning a new ID value. To send the connection
 ID to the URL rewriter, use url_rewrite_extras:
 url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...
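A hypothetical helper-side sketch of that round trip, using the clt_conn_id name from this draft (the script and its ID-assignment policy are invented for illustration; they are not part of the patch):

```python
#!/usr/bin/env python
import sys
import uuid

def handle(channel, url, extras):
    """Build a rewriter reply that tags the client connection.

    Reuses a clt_conn_id received via url_rewrite_extras, otherwise
    assigns a fresh one. "ERR" here means "do not rewrite the URL";
    the kv-pair still annotates the connection.
    """
    conn_id = None
    for token in extras:
        if token.startswith("clt_conn_id="):
            conn_id = token.split("=", 1)[1]
    if conn_id in (None, "", "-"):  # "-" is the usual "no value" placeholder
        conn_id = uuid.uuid4().hex[:8]
    return "%s ERR clt_conn_id=%s" % (channel, conn_id)

if __name__ == "__main__":
    # Line protocol: "channel-ID URL extras..." in, one reply line out.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) >= 2:
            print(handle(fields[0], fields[1], fields[2:]), flush=True)
```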



On 06/19/2014 09:07 PM, Tsantilas Christos wrote:

On 06/16/2014 06:36 PM, Alex Rousskov wrote:

On 06/15/2014 12:07 AM, Amos Jeffries wrote:

On 15/06/2014 4:58 a.m., Alex Rousskov wrote:

On 06/11/2014 08:52 AM, Tsantilas Christos wrote:


I must also note that this patch adds an inconsistency. All annotation
key=value pairs received from helpers are accumulated onto the existing
values for the same key. The clt_conn_id=ID pair is always unique and
replaces the existing clt_conn_id=ID annotation pair.
We may want to make all annotations unique, or maybe implement a
configuration mechanism to define which annotations overwrite their
previous values and which append new values.


I suggest making all annotations unique (i.e., new values overwrite old
ones) because helpers that want to accumulate annotation values can do
that by returning old values along with new ones:

   received by helper: name=v1
   returned by helper: name=v1,v2

Please note that the opposite does not work: If annotation values are
always accumulated, a helper cannot overwrite/remove the old value.



Doing that would mean passing all existing annotations to every helper
lookup.


Why would that mean the above?

AFAICT, the helper should get only the annotations it needs. That need
is helper-specific and, hence, is configurable via the various _extras
and equivalent directives. That is already supported and does not need
to change.

Here is the overall sketch for supporting "unique annotations":

1. Send the helper the annotations it is configured to get
(no changes here).

2. For each unique annotation key received from the helper,
remove any old annotation(s) with the same key.

3. Store annotations received from the helper
(no changes here).

To support explicit annotation deletion, we can adjust #3 to skip
key-value pairs with the value equal to '-'.
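Sketched in code, steps 2-3 plus the '-' deletion rule look like this (an illustrative sketch, not the patch code):

```python
def merge_annotations(existing, received):
    """Merge helper-returned annotations into the existing ones.

    `existing` is a dict of current annotations; `received` is a list of
    (key, value) pairs in helper-response order. A new value overwrites
    any old value for the same key; the special value '-' deletes the key.
    """
    merged = dict(existing)
    for key, value in received:
        if value == "-":
            merged.pop(key, None)  # explicit deletion (adjusted step 3)
        else:
            merged[key] = value    # overwrite, never accumulate (step 2)
    return merged
```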


If there are no objections I will implement this scenario.

This approach looks best and covers more cases than accumulated Notes
values.
If someone needs to accumulate note values, he can configure squid to
send the old note value to the helper and have the helper include it in
its response. This is simple.

If required in the future we can implement a configuration parameter
which configures one or more notes as always accumulated. For example:
   AccumulateHelperNotes status message







HTH,

Alex.