[squid-dev] Squid's mailman

2024-02-09 Thread Marcus Kool

Hi all,

Today I posted a message to squid-us...@lists.squid-cache.org and received 2 DMARC failure reports (more may follow in the next 24 hours) indicating that the copy that mailman forwards to list members is
rejected by their mail servers - and there could be many more rejections at servers that block without sending a DMARC report.
It seems that the mailman software that lists.squid-cache.org uses does not follow current best practices and fails to deliver a list message to all subscribers: where a subscriber's mail server
does SPF/DMARC checks, it notices that the domain in the From header does not align with the IP address of Squid's mailman server and rejects or quarantines the message.

It could be that many messages (not just mine) do not reach the mailboxes of 
list subscribers.
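
For what it is worth: if lists.squid-cache.org runs Mailman 2.1.16 or newer
(an assumption on my side), there are list settings that exist specifically to
mitigate this, by rewriting the From header so that it aligns with the list
server, e.g.:

   from_is_list: Munge From                # or: Wrap Message
   dmarc_moderation_action: Munge From     # applied only to posters whose domain
                                           # publishes a strict DMARC policy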

Marcus

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
https://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] squid website certificate error

2022-08-29 Thread Marcus Kool

I typed the name of the website without https in the address bar.  I am not 
sure how it was redirected to the https address (could be my browser history or 
web server).

In Firefox and Vivaldi I get the correct site.  When I type 'www.squid-cache.org' in the address bar of Chrome it goes very wrong showing the contents of https://grass.osgeo.org/.  Maybe Chrome tries 
https first and then http.


Marcus

On 29/08/2022 18:52, Francesco Chemolli wrote:

The squid website is not supposed to be over https, because it’s served by 
multiple mirrors not necessarily under the project’s control.
We have some ideas on how to change this but need the developer time to do it.
Help is welcome :)

On Mon, 29 Aug 2022 at 15:25, Marcus Kool  wrote:

Has anybody already complained that the certificates for squid-cache.org 
<http://squid-cache.org> and www.squid-cache.org <http://www.squid-cache.org> 
are messed up?

Marcus

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev

--
@mobile___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] squid website certificate error

2022-08-29 Thread Marcus Kool

Has anybody already complained that the certificates for squid-cache.org and 
www.squid-cache.org are messed up?

Marcus

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] TLS 1.3 0rtt

2018-11-15 Thread Marcus Kool
After reading https://www.privateinternetaccess.com/blog/2018/11/supercookey-a-supercookie-built-into-tls-1-2-and-1-3/ I am wondering whether the TLS 1.3 implementation in Squid will have an option to
disable the 0-RTT feature so that user tracking is reduced.
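
For reference, OpenSSL 1.1.1 already rejects early data unless the server
explicitly enables it; a Squid option would essentially boil down to forcing
that limit to zero on the server-side SSL_CTX.  A minimal sketch (not Squid
code, just the underlying OpenSSL call):

   #include <openssl/ssl.h>

   static void disable0rtt(SSL_CTX *ctx)
   {
       // refuse all TLS 1.3 early data, so clients cannot use 0-RTT resumption
       SSL_CTX_set_max_early_data(ctx, 0);
   }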


Marcus

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Block users dynamically

2018-05-29 Thread Marcus Kool


On 28/05/18 15:10, dean wrote:
I am implementing modifications to Squid 3.5.27 for a thesis project. At some point in the code, I need to block a user. What I'm doing is writing to an external file that is used in the configuration,
like Squish does. But it does not block the user; however, when I reconfigure Squid, it does block it. Is there something I do not know? When I change the file, should I reconfigure Squid? Is there
another way to block users dynamically from the Squid code?


You can use ufdbGuard for this purpose.  ufdbGuard is a free URL redirector for
Squid which can be configured to re-read lists of usernames or lists of IP
addresses every X minutes (the default for X is 15).
So if you maintain a blacklist with usernames and write the name of the user to
the defined file, ufdbguardd will block these users.
If the user must be blocked immediately you need to reload ufdbguardd; otherwise you wait until the configured interval for re-reading the user list expires, so after a few minutes the user gets
blocked.


Note that reloading ufdbguardd does not interfere with Squid and all activity 
by browsers and squid continues normally.
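
For comparison, staying within Squid itself, a rough squid.conf sketch of the
same idea is shown below.  The helper path and name are hypothetical; the
helper reads one username per request from stdin, checks it against the
blocklist file and prints OK (blocked) or ERR (not blocked).  The ttl values
make changes to the file take effect within a minute, without a reconfigure:

   # use %SRC instead of %LOGIN to block by IP address
   external_acl_type blockcheck ttl=60 negative_ttl=60 children-max=2 %LOGIN /usr/local/bin/check_blocked_user
   acl blocked_users external blockcheck
   http_access deny blocked_users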

Marcus

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] wiki.squid-cache.org has an expired certificate

2017-11-08 Thread Marcus Kool


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Bug 4662 adding --with-libressl build option

2017-02-01 Thread Marcus Kool



Do you think we can compromise and call it USE_OPENSSL_OR_LIBRESSL ?


or call it USE_OPENSSL_API

and then the code will eventually have no, or only a few, occurrences of
USE_OPENSSL and USE_LIBRESSL to deal with OpenSSL- and LibreSSL-specific code.
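
Roughly, the way the macros would then be used (just a sketch of the intent):

   #if USE_OPENSSL_API
       // code that is valid for both OpenSSL and LibreSSL
   #endif

   #if USE_LIBRESSL
       // the rare LibreSSL-specific workaround
   #endif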

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] g++ 4.8.x and std::regex problems

2016-11-28 Thread Marcus Kool



On 11/28/2016 07:46 PM, Alex Rousskov wrote:

Please undo that commit and let's discuss whether switching from
libregex to std::regex now is a good idea.


Thank you,

Alex.


Has anybody considered using RE2?
It is a fast, high-quality C++ regex library with a liberal (BSD-style) license,
and it works with older compilers.
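
For a quick impression of the API, a trivial standalone example (not Squid
code):

   #include <re2/re2.h>
   #include <string>

   // FullMatch anchors the pattern at both ends and fills the capture groups;
   // it returns false when the input does not match.
   static bool parseHostPort(const std::string &in, std::string &host, int &port)
   {
       return RE2::FullMatch(in, "([\\w.-]+):(\\d+)", &host, &port);
   }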

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [RFC] simplifying ssl_bump complexity

2016-11-28 Thread Marcus Kool



On 11/27/2016 11:20 PM, Alex Rousskov wrote:

On 11/19/2016 07:06 PM, Amos Jeffries wrote:

On 20/11/2016 12:08 p.m., Marcus Kool wrote:

The current ssl bump steps allow problematic configs where Squid
bumps or stares in one step and splices in another step;
this can be resolved (made impossible) in a new configuration syntax.


It would be nice to prohibit truly impossible actions at the syntax
level, but I suspect that the only way to make that possible is to focus
on final actions [instead of steps] and require at *most* one ssl_bump
rule for each of the supported final actions:

  ssl_bump splice...rules that define when to splice...
  ssl_bump bump  ...rules that define when to bump...
  ssl_bump terminate ...rules that define when to terminate...
  # no other ssl_bump lines allowed!

The current intermediate actions (peek and stare) would have to go into
the ACLs. There will be no ssl_bump rules for them at all. In other
words, the admin would be required to _always_ write an equivalent of

  if (a1() && a2() && ...)
  then
  splice
  elsif (b1() && b2() && ...)
  then
  bump
  elsif (c1() && c2() && ...)
  then
  terminate
  else
  splice or bump, depending on state
  (or some other default; this decision is secondary)
  endif

where a1(), b2(), and other functions/ACLs may peek or stare as needed
to get the required information.


The above if-then-else tree is clear.
I like your suggestion to drop steps in the configuration and make
Squid more intelligent to take decisions at the appropriate
moments (steps).

You mentioned admins being surprised that Squid bumps in order to deliver
an error notification; one way to improve that is to replace
'terminate' with 'terminate_with_error' (with bumping) and 'quick_terminate'
(no bumping, just close the fd).  quick_terminate, if used, is also
faster, which is an added benefit.



I am not sure such a change is desirable, but it is worth considering, I
guess.

Please note that I am ignoring the directives/actions naming issue for now.


AFAICT, the syntax proposed by Amos (i.e., making stepN mandatory) does
not solve this particular problem at all:

  # Syntactically valid nonsense:
  ssl_bump_step1 splice all
  ssl_bump_step2 bump all


and neither is yours:

  # Syntactically valid nonsense?
  tls_server_hello passthrough all
  tls_client_hello terminate all



Correct.  It would be nice to have a better configuration syntax
where impossible rules are easier to avoid and/or Squid has
intelligence to detect nonsense rules and produce an error.


Below is a new proposal to attempt to make the configuration
more intuitive and less prone to admin misunderstandings.

First the admin must define if there is any bumping at all.
This could be done with
https_decryption on|off
This is similar to tls_new_connection peek|splice but much
more intuitive.


I do not see why

  https_decryption off

is more intuitive (or more precise) than

  ssl_bump splice all


For me this is because of the used terminology and because
with 'https_decryption off' one does not write anything
that has 'bump' in it, so the admin does not even have to
read the documentation to learn that 'ssl_bump splice all'
means 'no decryption'.


especially after you consider the order of directives.

Again, I am ignoring the naming issue for now. You may assume any name
you want for any directive or ACL.


Alright for now.
The only comment that I want to make without starting a
new thread is that I think that conceptual terms are better
than technical terms (hence my preference for 'passthrough'
instead of 'splice').  But let's save this discussion for later.


Iff https_decryption is on:

1) the "connection" step:
When a browser uses "CONNECT " Squid does not need to make
peek or splice decisions.
When Squid intercepts a connection to "port 443 of " no peek
or splice decision is made here any more.
This step becomes obsolete in the proposed configuration.



I am still hoping that will happen. But also still getting pushback that
people want to terminate or splice without even looking at the
clear-text hello details.


I suspect that not looking at some SSL Hellos will always be needed
because some of those Hellos are not Hellos at all and it takes too much
time/resources for the SSL Hellos parser to detect some non-SSL Hellos.
Besides that, it is always nice to be able to selectively bypass the
complex Hello parser code in emergencies.


Perhaps fewer resources are used if there is a two-stage parser:
1) a quick scan of the input for the data layout of a ClientHello without
   semantically parsing the content, e.g. look at the CipherSuite field and verify
   that the whole field contains legal bytes without verifying
   that it is a valid list of ciphers.
2) do the complex parsing.

Stage 1 should be fast and can separate an SSL ClientHello from other
protocols.
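
A stage-1 scan could be as cheap as a handful of byte comparisons on the TLS
record and handshake headers.  A rough sketch (it ignores SSLv2-style hellos
and record fragmentation):

   #include <cstddef>

   // Does this buffer start like a TLS record that carries a ClientHello?
   static bool looksLikeClientHello(const unsigned char *buf, size_t len)
   {
       if (len < 6)
           return false;                      // not enough data yet
       if (buf[0] != 0x16)
           return false;                      // content type 22 = handshake
       if (buf[1] != 0x03)
           return false;                      // record version major must be 3
       const size_t recordLen = (buf[3] << 8) | buf[4];
       if (recordLen == 0 || recordLen > 16384)
           return false;                      // illegal plaintext record length
       return buf[5] == 0x01;                 // handshake type 1 = ClientHello
   }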


3) the "TLS server

Re: [squid-dev] [RFC] simplifying ssl_bump complexity

2016-11-24 Thread Marcus Kool

Hi Amos,

Can you share your thoughts ?

Thanks
Marcus


On 11/20/2016 10:55 AM, Marcus Kool wrote:



On 11/20/2016 12:06 AM, Amos Jeffries wrote:

On 20/11/2016 12:08 p.m., Marcus Kool wrote:



[snip]



I like the intent of the proposal and the new directives tls_*.
What currently makes configuration in Squid 3/4 difficult is
the logic of 'define in step x what to do in the next step' and
IMO this logic is the main cause of misunderstandings and
incorrect configurations.  Also the terms 'bump' and 'splice'
do not help ease of understanding.  Since Squid evolved and
bumping changed from 3.3 - 3.4 - 3.5 to 4.x, and likely will
change again in 5.x, there is an opportunity to improve
things more than is proposed.
There is also a difference in dealing with transparent intercepted
connections and direct connections (browsers doing a CONNECT)
which also causes some misunderstandings.
The current ssl bump steps allow problematic configs where Squid
bumps or stares in one step and splices in another step;
this can be resolved (made impossible) in a new configuration syntax.

I propose to use a new logic for the configuration directives
where 'define in step x what to do in the next step' is replaced
with a new logic 'define in step x what to do _now_'.


From reading the below I think you are mistaking what "now" means to
Squid. Input access control directives in squid.conf make a decision
about what action to do based on some state that just arrived.


Maybe it is necessary to redefine 'now' but my point remains that
'define in step x what to do in the next step' is the cause of
most misunderstandings.


For example:
 HTTP message just finished parsing -> check http_access what to do with it.
 HTTP reply message just arrived -> check http_reply_access what to do
with it.

Thus my proposal was along the lines of:
  client hello received -> check tls_client_hello what to do with it.
  server hello received -> check tls_server_hello what to do with it.


For both hello messages: is the decision moment the moment where it
has been peeked at?



Below is a new proposal to attempt to make the configuration
more intuitive and less prone to admin misunderstandings.

First the admin must define if there is any bumping at all.
This could be done with
https_decryption on|off
This is similar to tls_new_connection peek|splice but much
more intuitive.

Iff https_decryption is on:

1) the "connection" step:
When a browser uses "CONNECT " Squid does not need to make
peek or splice decisions.
When Squid intercepts a connection to "port 443 of " no peek
or splice decision is made here any more.
This step becomes obsolete in the proposed configuration.


I am still hoping that will happen. But also still getting pushback that
people want to terminate or splice without even looking at the
clear-text hello details.


We must know the reasons behind this pushback.  Only then sane decisions
can be made.



2) the "TLS client hello" step:
When a browser uses CONNECT, Squid has a FQDN and does not need
peeking a TLS client hello message. It can use the tls_client_hello
directives given below.


Sadly this is not correct. Squid still needs to get the client hello
details at this point. They are needed to perform bump before the server
hello is received, and to "terminate with an error message" without
contacting a server.


yes, correct.  Squid must do this.  But does it have to be configured?


When Squid intercepts a connection, Squid always peeks to retrieve
the SNI which is the equivalent of the FQDN used by a CONNECT.
In this step admins may want to define what Squid must do, e.g.
tls_client_hello passthrough aclfoo
Note that the acl 'aclfoo' can use tls::client_servername and
tls::client_servername should always have a FQDN if the connection
is https.  tls::client_servername expands to the IP address if
the SNI of an intercepted connection could not be retrieved.


What if the SNI contradicts the CONNECT message FQDN ?
What if a raw-IP in the CONNECT message (or TCP SYN) does not belong to
the server named in SNI ?


:-)  I left this out on purpose to not make the post even larger than it was.
There is of course a lot of error checking.  The question is if
we have to configure it.  If yes, can we get away with one directive based
on an acl that uses tls::handshake_failure ?


Squid would now be diverting the client transparently to a server other
than the one it expects and caching under that FQDN. But the server cert
would still authenticate as being the SNI host, so TLS cannot detect the
diversion.

The fake CONNECTs are a bit messy but IMHO we can only get rid of the
first one done for intercepted connections. Although that alone would
make both cases handle the same way.


I do not know anything about the code that generates the fake CONNECT
of a transparent interception connection, but logically there should
not be a fake CONNECT for true

Re: [squid-dev] [RFC] simplifying ssl_bump complexity

2016-11-20 Thread Marcus Kool



On 11/20/2016 12:06 AM, Amos Jeffries wrote:

On 20/11/2016 12:08 p.m., Marcus Kool wrote:



[snip]



I like the intent of the proposal and the new directives tls_*.
What currently makes configuration in Squid 3/4 difficult is
the logic of 'define in step x what to do in the next step' and
IMO this logic is the main cause of misunderstandings and
incorrect configurations.  Also the terms 'bump' and 'splice'
do not help ease of understanding.  Since Squid evolved and
bumping changed from 3.3 - 3.4 - 3.5 to 4.x, and likely will
change again in 5.x, there is an opportunity to improve
things more than is proposed.
There is also a difference in dealing with transparent intercepted
connections and direct connections (browsers doing a CONNECT)
which also causes some misunderstandings.
The current ssl bump steps allow problematic configs where Squid
bumps or stares in one step and splices in another step;
this can be resolved (made impossible) in a new configuration syntax.

I propose to use a new logic for the configuration directives
where 'define in step x what to do in the next step' is replaced
with a new logic 'define in step x what to do _now_'.


From reading the below I think you are mistaking what "now" means to
Squid. Input access control directives in squid.conf make a decision
about what action to do based on some state that just arrived.


Maybe it is necessary to redefine 'now' but my point remains that
'define in step x what to do in the next step' is the cause of
most misunderstandings.


For example:
 HTTP message just finished parsing -> check http_access what to do with it.
 HTTP reply message just arrived -> check http_reply_access what to do
with it.

Thus my proposal was along the lines of:
  client hello received -> check tls_client_hello what to do with it.
  server hello received -> check tls_server_hello what to do with it.


For both hello messages: is the decision moment the moment where it
has been peeked at?



Below is a new proposal to attempt to make the configuration
more intuitive and less prone to admin misunderstandings.

First the admin must define if there is any bumping at all.
This could be done with
https_decryption on|off
This is similar to tls_new_connection peek|splice but much
more intuitive.

Iff https_decryption is on:

1) the "connection" step:
When a browser uses "CONNECT " Squid does not need to make
peek or splice decisions.
When Squid intercepts a connection to "port 443 of " no peek
or splice decision is made here any more.
This step becomes obsolete in the proposed configuration.


I am still hoping that will happen. But also still getting pushback that
people want to terminate or splice without even looking at the
clear-text hello details.


We must know the reasons behind this pushback.  Only then sane decisions
can be made.



2) the "TLS client hello" step:
When a browser uses CONNECT, Squid has a FQDN and does not need
peeking a TLS client hello message. It can use the tls_client_hello
directives given below.


Sadly this is not correct. Squid still needs to get the client hello
details at this point. They are needed to perform bump before the server
hello is received, and to "terminate with an error message" without
contacting a server.


yes, correct.  Squid must do this.  But does it have to be configured?


When Squid intercepts a connection, Squid always peeks to retrieve
the SNI which is the equivalent of the FQDN used by a CONNECT.
In this step admins may want to define what Squid must do, e.g.
tls_client_hello passthrough aclfoo
Note that the acl 'aclfoo' can use tls::client_servername and
tls::client_servername should always have a FQDN if the connection
is https.  tls::client_servername expands to the IP address if
the SNI of an intercepted connection could not be retrieved.


What if the SNI contradicts the CONNECT message FQDN ?
What if a raw-IP in the CONNECT message (or TCP SYN) does not belong to
the server named in SNI ?


:-)  I left this out on purpose to not make the post even larger than it was.
There is of course a lot of error checking.  The question is if
we have to configure it.  If yes, can we get away with one directive based
on an acl that uses tls::handshake_failure ?


Squid would now be diverting the client transparently to a server other
than the one it expects and caching under that FQDN. But the server cert
would still authenticate as being the SNI host, so TLS cannot detect the
diversion.

The fake CONNECTs are a bit messy but IMHO we can only get rid of the
first one done for intercepted connections. Although that alone would
make both cases handle the same way.


I do not know anything about the code that generates the fake CONNECT
of a transparent interception connection, but logically there should
not be a fake CONNECT for true HTTPS (TLS+HTTP) since a browser does
not do a CONNECT, so why fake one?  Was the fake CONNECT introduce

Re: [squid-dev] [RFC] simplifying ssl_bump complexity

2016-11-19 Thread Marcus Kool



On 11/19/2016 08:07 AM, Amos Jeffries wrote:

Since ssl_bump directive went in my original opinion of it as being too
complicated and confusing has pretty much been demonstrated as correct
by the vast amount of misconfigurations and failed attempts of people to
use it without direct assistance from those of us involved with its design.

Since we are also transitioning to a world where 'SSL' does not exist
any longer I think v5 is a good time to rename and redesign the
directive a bit.

I propose going back to the older config style where each step has its
own directive name which self-documents what it does. That will reduce
the confusion about what is going on at each 'step', and allow us a
chance to have clearly documented default actions for each step.

For example:
 tls_new_connection
  - default: peek all
  - or run ssl_bump check if that directive exists

 tls_client_hello
  - default: splice all
  - or run ssl_bump check if that directive exists

 tls_server_hello
  - default: terminate all
  - or run ssl_bump check if that directive exists


I like the intent of the proposal and the new directives tls_*.
What currently makes configuration in Squid 3/4 difficult is
the logic of 'define in step x what to do in the next step' and
IMO this logic is the main cause of misunderstandings and
incorrect configurations.  Also the terms 'bump' and 'splice'
do not help ease of understanding.  Since Squid evolved and
bumping changed from 3.3 - 3.4 - 3.5 to 4.x, and likely will
change again in 5.x, there is an opportunity to improve
things more than is proposed.
There is also a difference in dealing with transparent intercepted
connections and direct connections (browsers doing a CONNECT)
which also causes some misunderstandings.
The current ssl bump steps allow problematic configs where Squid
bumps or stares in one step and splices in another step;
this can be resolved (made impossible) in a new configuration syntax.

I propose to use a new logic for the configuration directives
where 'define in step x what to do in the next step' is replaced
with a new logic 'define in step x what to do _now_'.

Below is a new proposal to attempt to make the configuration
more intuitive and less prone to admin misunderstandings.

First the admin must define if there is any bumping at all.
This could be done with
https_decryption on|off
This is similar to tls_new_connection peek|splice but much
more intuitive.

Iff https_decryption is on:

1) the "connection" step:
When a browser uses "CONNECT " Squid does not need to make
peek or splice decisions.
When Squid intercepts a connection to "port 443 of " no peek
or splice decision is made here any more.
This step becomes obsolete in the proposed configuration.

2) the "TLS client hello" step:
When a browser uses CONNECT, Squid has a FQDN and does not need
peeking a TLS client hello message. It can use the tls_client_hello
directives given below.
When Squid intercepts a connection, Squid always peeks to retrieve
the SNI which is the equivalent of the FQDN used by a CONNECT.
In this step admins may want to define what Squid must do, e.g.
tls_client_hello passthrough aclfoo
Note that the acl 'aclfoo' can use tls::client_servername and
tls::client_servername should always have a FQDN if the connection
is https.  tls::client_servername expands to the IP address if
the SNI of an intercepted connection could not be retrieved.

For https connections with a client hello without the SNI extension:
tls_client_hello passthrough|terminate aclbar
where aclbar can contain tls::client_hello_missing_sni

For connections that do not use TLS (i.e. no valid
TLS client hello message was seen):
tls_client_hello passthrough|terminate aclbar2
where aclbar2 may contain tls::handshake_failure

To define that the TLS handshake continues, the config can contain
tls_client_hello continue
This is basically a no-op and not required but enhances readability
of a configuration.

3) the "TLS server hello" step:
Usually no directives are needed since rarely actions are taken
based on the server hello message, so the default is
tls_server_hello continue
The tls_server_hello can be used to terminate specific connections.
In this step many types of certificate errors can be detected
and in the Squid configuration there must be a way to define
what to do for specific errors and optionally for which FQDN.
E.g. allow to define that connections with self-signed certificates
are terminates but the self-signed cert for domain foo.example.com
is allowed.  See also the example config below and the use of
tls::server_servername.

What is left is a configuration directive for connections
that use TLS as an encryption wrapper but do not use HTTP
inside the TLS wrapper:
tls_no_http passthrough|terminate   # similar to on_unsupported_protocol

An example configuration looks like this:
https_decryption on
acl banks tls::client_servername .bank1.example.org
acl no_sni tls::client_hello_missing_sni
acl no_handshake 

Re: [squid-dev] [PATCH] Support tunneling of bumped non-HTTP traffic. Other SslBump fixes.

2016-10-14 Thread Marcus Kool
I started testing this patch and observed one unwanted side effect of  
this patch:

When a client connects to mtalk.google.com,
Squid sends the following line to the URL rewriter:
(unknown)://173.194.76.188:443 / - NONE

Marcus

Quoting Christos Tsantilas :

Use case: Skype groups appear to use TLS-encrypted MSNP protocol  
instead of HTTPS. This change allows Squid admins using SslBump to  
tunnel Skype groups and similar non-HTTP traffic bytes via  
"on_unsupported_protocol tunnel all". Previously, the combination  
resulted in encrypted HTTP 400 (Bad Request) messages sent to the  
client (that does not speak HTTP).


Also this patch:
 * fixes bug 4529: !EBIT_TEST(entry->flags, ENTRY_FWD_HDR_WAIT)  
assertion in FwdState.cc.


 * when splicing transparent connections during SslBump step1, avoid  
access-logging an extra record and log %ssl::bump_mode as the  
expected "splice" not "none".


 * handles an XXX comment inside clientTunnelOnError for possible  
memory leak of client streams related objects


 * fixes TunnelStateData logging in the case of splicing after peek.

This is a Measurement Factory project.



___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Benchmarking Performance with reuseport

2016-08-13 Thread Marcus Kool

This article better explains the benefits of SO_REUSEPORT:
https://lwn.net/Articles/542629/

A key paragraph is this:
The problem with this technique, as Tom pointed out, is that when
multiple threads are waiting in the accept() call, wake-ups are not
fair, so that, under high load, incoming connections may be
distributed across threads in a very unbalanced fashion. At Google,
they have seen a factor-of-three difference between the thread
accepting the most connections and the thread accepting the
fewest connections; that sort of imbalance can lead to
underutilization of CPU cores. By contrast, the SO_REUSEPORT
implementation distributes connections evenly across all of the
threads (or processes) that are blocked in accept() on the same port.

So using SO_REUSEPORT seems very beneficial for SMP-based Squid.
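
For reference, each worker would simply set the option on its own listening
socket before bind(); the kernel then distributes accepted connections across
all sockets bound to the same address and port.  A minimal sketch (Linux 3.9+,
error handling omitted):

   #include <arpa/inet.h>
   #include <netinet/in.h>
   #include <sys/socket.h>

   static int openWorkerListener(unsigned short port)
   {
       const int fd = socket(AF_INET, SOCK_STREAM, 0);
       const int on = 1;
       setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));

       struct sockaddr_in addr = {};
       addr.sin_family = AF_INET;
       addr.sin_addr.s_addr = htonl(INADDR_ANY);
       addr.sin_port = htons(port);
       bind(fd, (struct sockaddr *) &addr, sizeof(addr));
       listen(fd, 128);
       return fd;
   }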

Marcus


On 08/09/2016 09:19 PM, Henrik Nordström wrote:

tor 2016-08-04 klockan 23:12 +1200 skrev Amos Jeffries:



I imagine that Nginx are seeing latency reduction due to no longer
needing a central worker that receives the connection then spawns a
whole new process to handle it. The behaviour sort of makes sense for
a
web server (which Nginx is at heart still, a copy of Apache) spawning
CGI processes to handle each request. But kind of daft in these
HTTP/1.1
multiplexed performance-centric days.


No, it's only about accepting new connections on existing workers.

Many high load sites still run with non-persistent connections to keep
worker count down, and these benefit a lot from this change.

Sites using persistent connections only benefit marginally. But the
larger the worker count the higher the benefit as the load from new
connections gets distributed by the kernel instead of by a stampeding herd
of workers.

Regards
Henrik

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] Benchmarking Performance with reuseport

2016-08-03 Thread Marcus Kool

https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/
is an interesting short article about using the SO_REUSEPORT socket
option, which increased the performance of nginx and gave better balancing
of connections across the workers' sockets.
Since Squid has the issue that load is not very well balanced between
workers, I thought it would be interesting to look at.

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] HTTP meetup in Stockholm

2016-07-12 Thread Marcus Kool


On 07/12/2016 06:53 AM, Henrik Nordström wrote:

tis 2016-07-12 klockan 18:34 +1200 skrev Amos Jeffries:

I'm much more in favour of binary formats. The HTTP/2 HPACK design
lends
itself very easily to binary header values (i.e. sending integers as
integer-encoded values). Following PHK's lead on those.


json is very ambiguous with no defined schema or type restrictions.
It's up to the receiver to guess type information from format while
parsing, which in itself is a mess from security point of view.

The beauty of json is that it is trivially extensible with new data,
and has all the basic data constructs you need for arbitrary data (name
tagging, strings, integers, floats, booleans, arrays, dictionaries and
maybe something more). But for the same reason it's also unsuitable for
HTTP header information, which should be concise, terse and unambiguous
with little room for syntax errors.

Regards
Henrik


Extensible json headers seem to lend themselves to putting a lot of application-specific
stuff in headers instead of in the payload. The headers should be used for the
protocol only.
protocol only.

Squid has had many issues in the past with non-conformity to standards.
The Squid developers obviously want to stick with the standards and are
forced by non-conformant apps and servers to support non-conformity.
Can this workshop be used to address this?

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [RFC] on_crash

2015-12-09 Thread Marcus Kool



On 12/09/2015 09:20 PM, Alex Rousskov wrote:

On 12/09/2015 02:28 PM, Amos Jeffries wrote:

The above
considerations are all good reasons for us not to be bundling by default
IMO.


I agree.

Alex.


I did not get what the script does, does it call gdb ?

A script/executable that calls gdb and produces a readable stack trace of all
squid processes is a powerful tool which makes debugging an issue much easier
for many admins.
So I suggest releasing the binaries and scripts that you have, installing them by default in a new subdirectory, e.g. .../debugbin or .../sbin/debug, and _not_ configuring them in the default squid.conf,
to prevent them from being used accidentally.
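
The core of such a script can be a single gdb invocation per squid process,
something like (assuming gdb is installed; <pid> is the PID of a squid worker):

   gdb -batch -ex 'thread apply all bt full' -p <pid> > /tmp/squid-backtrace.<pid>.txt 2>&1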


If you do not want to bundle, then what is the alternative?
Make a download area on squid-cache.org for the binaries and scripts ?

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Fake CONNECT requests during SSL Bump

2015-09-24 Thread Marcus Kool


On 09/24/2015 02:13 AM, Eliezer Croitoru wrote:

On 23/09/2015 04:52, Amos Jeffries wrote:

Exactly. They are processing steps. Not messages to be adapted.

Amos


+1 For that.


[...]


In any case the bottom line for me is that, for now, ICAP and eCAP are called
ADAPTATION services and not ACL services.
They can be extended to do so - it is not part of the RFCs or definitions and
it might be the right way to do things - but it will require libraries simple
enough to let most admins (if not all)
implement their ACL logic using these protocols/implementations.

Eliezer


ICAP is an adaptation protocol that almost everybody uses for access control.

The ICAP server must be able to see all traffic going through Squid so that it 
can do what it was designed for and block (parts) of websites and other data 
streams.
Other data streams may not be HTTP(S)-based and hence are not bumped, but for 
the ICAP server to be able to do its thing, it still needs a (fake) CONNECT.

Going back to Steve's original message, I think that it is not necessary to 
generate a (fake) CONNECT for each bump step,
but to send exactly one CONNECT at the moment that Squid makes a decision.  
I.e. when Squid decides to bump or splice.

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] download squid 3.5.8 fails

2015-09-02 Thread Marcus Kool


The download of the 3.5.8 sources fails :-(

wget -vvv http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.8.tar.gz
--2015-09-02 17:16:43--  
http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.8.tar.gz
Resolving www.squid-cache.org (www.squid-cache.org)... 92.223.231.190, 
209.169.10.131
Connecting to www.squid-cache.org (www.squid-cache.org)|92.223.231.190|:80... 
connected.
HTTP request sent, awaiting response... 404 Not Found
2015-09-02 17:16:43 ERROR 404: Not Found.

Best regards,

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] download squid 3.5.8 fails

2015-09-02 Thread Marcus Kool

The normal download URL now works as well.

Thanks

Marcus

On 09/02/2015 04:14 PM, Amos Jeffries wrote:

On 3/09/2015 3:23 a.m., Marcus Kool wrote:


The download of the 3.5.8 sources fails :-(

wget -vvv http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.8.tar.gz
--2015-09-02 17:16:43--
http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.8.tar.gz
Resolving www.squid-cache.org (www.squid-cache.org)... 92.223.231.190,
209.169.10.131
Connecting to www.squid-cache.org
(www.squid-cache.org)|92.223.231.190|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2015-09-02 17:16:43 ERROR 404: Not Found.



Yeah. I'm having trouble with one of the mirrors too.

Try west.squid-cache.org as the domain. That one I know works.


Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] bug 4303

2015-08-18 Thread Marcus Kool



On 08/18/2015 12:36 PM, Amos Jeffries wrote:

On 19/08/2015 12:56 a.m., Marcus Kool wrote:

Amos, Christos,

Christos' patch seems not to work for plain 3.5.7 sources.
What do you suggest to try ?   Will there be a snapshot release that is
suitable for testing ?


Christos now has it in trunk, but the last snapshot refused to build due
to a compiler issue in the build farm, which is now resolved. Tomorrow's
trunk snapshot should be r14229 or later with it in.

Next round of backports to 3.5 should include it there in 2-3 days as
well unless something goes wrong in the portage.


Thanks, I will wait for the 3.5 backport.  Will the patch be announced on the 
list?

marcus


Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] bug 4303

2015-08-12 Thread Marcus Kool

Amos,
I tried the patch but several hunks failed.
It seems that the patch is not compatible with the 3.5.7 release code or I am 
doing something wrong (see below).
Marcus

[root@srv018 squid-3.5.7]# patch -b -p0 --dry-run < ../squid-sslbump-patch
checking file src/acl/Acl.h
Hunk #1 succeeded at 150 (offset 1 line).
checking file src/acl/BoolOps.cc
checking file src/acl/BoolOps.h
Hunk #1 FAILED at 45.
1 out of 1 hunk FAILED
checking file src/acl/Checklist.cc
checking file src/acl/Checklist.h
checking file src/acl/Tree.cc
Hunk #2 FAILED at 69.
1 out of 2 hunks FAILED
checking file src/acl/Tree.h
Hunk #1 FAILED at 23.
1 out of 1 hunk FAILED
checking file src/client_side.cc
Hunk #1 FAILED at 4181.
Hunk #2 FAILED at 4247.
2 out of 2 hunks FAILED
checking file src/ssl/PeerConnector.cc
Hunk #1 FAILED at 214.
1 out of 1 hunk FAILED


On 08/12/2015 10:25 AM, Amos Jeffries wrote:

On 13/08/2015 12:48 a.m., Marcus Kool wrote:

yesterday I filed bug 4303 - assertion failed in PeerConnector:743 squid
3.5.7
I am not sure if it is a duplicate of bug 4259 since that bug
description has almost no info to compare against.

I enclosed a small fragment of cache.log in the bug report but the debug
setting was ALL,1 93,3 61,9 so cache.log is very large.
In case that you need a larger fragment of cache.log, I can provide it.



Thanks Marcus.

I was about to reply to the bug report, but this is better.

I suspect this is a case of Squid going the wrong way in ssl_bump
interpretation. Specifically the peek action at stage 3.

Would you be able to try Christos' patch at the end of the mail here:
http://lists.squid-cache.org/pipermail/squid-dev/2015-August/002981.html


Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] bug 4303

2015-08-12 Thread Marcus Kool

yesterday I filed bug 4303 - assertion failed in PeerConnector:743 squid 3.5.7
I am not sure if it is a duplicate of bug 4259 since that bug description has 
almost no info to compare against.

I enclosed a small fragment of cache.log in the bug report but the debug 
setting was ALL,1 93,3 61,9 so cache.log is very large.
In case that you need a larger fragment of cache.log, I can provide it.

Best regards,

Marcus
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Temporary fix to restore compatibility with Amazon

2015-06-24 Thread Marcus Kool



On 06/24/2015 05:24 PM, Kinkie wrote:

My 2c: I vote for reality; possibly with a shaming announce message; I
wouldn't even recommend logging the violation: there is nothing the
average admin can do about it. I would consider adding a shaming
comment in the release notes.


more cents:

correct.

A standard can be considered a strong guideline, but if important sites
violate the standard (i.e. users/admins complain) then Squid
should be able to cope with it, or it risks being abandoned because
it cannot cope with traffic of sites that otherwise work without Squid.

For an admin it is irrelevant whether the problem is caused by Squid or by
a website.  And the admin who dares to tell their users "only visit
sites that comply with the standards" probably gets fired.



On Wed, Jun 24, 2015 at 10:12 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:

On 06/24/2015 05:26 AM, Amos Jeffries wrote:


On 24/06/2015 5:55 p.m., Alex Rousskov wrote:

 This temporary trunk fix adds support for request URIs containing
'|' characters. Such URIs are used by popular Amazon product (and
probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

Without this fix, all requests for affected URIs timeout while Squid
waits for the end of request headers it has already received(*).




This is not right. Squid should be identifying the message as
non-HTTP/1.x (which it isn't due to the URI syntax violation) and
treating it as such.


I agree that Amazon violates URI syntax. On the other hand, the message
can be interpreted as HTTP/1.x for all practical purposes AFAICT. If you
want to implement a different fix, please do so. Meanwhile, folks
suffering from this serious regression can try the temporary fix I posted.



The proper long-term fix is to allow any character in URI as long as we
can reliably parse the request line (and, later, URI components). There
is no point in hurting users by rejecting requests while slowly
accumulating the list of benign characters used by web sites but
prohibited by some RFC.



The *proper* long term fix is to obey the standards in regard to message
syntax so applications stop using these invalid (when un-encoded)
characters and claiming HTTP/1.1 support.


We had standards vs reality and policing traffic discussions several
times in the past, with no signs of convergence towards a single
approach, so I am not going to revisit that discussion now. We continue
to disagree [while Squid users continue to suffer].


Thank you,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev





___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Death of SSLv3

2015-05-07 Thread Marcus Kool



On 05/07/2015 07:03 AM, Amos Jeffries wrote:

It's done. SSLv3 is now a MUST NOT use protocol from RFC 7525
(http://tools.ietf.org/html/rfc7525)


good decision.


It's time for us to start ripping out from trunk all features and hacks
supporting its use. Over the coming days I will be submitting patches to
remove the squid.conf settings, similar to SSLv2 removal earlier.

The exceptions which may remain are SSLv3 features which are used by the
still-supported TLS versions. Such as session resume, and the SSLv3
format of Hello message (though not the SSLv3 protocol IDs).


are you sure you want to do this _now_ ?

It is predictable that users will complain with
"I know this provider is stupid and uses SSLv3 but I _need_ to access that site for
our business"
and use this as a reason not to upgrade, or blame Squid.

It may not be that much extra work to have a new option use_sslv3 with the
default setting OFF
and not rip out the SSLv3 code yet.  Also, if you do not rip out SSLv3, Squid can
detect that a site uses
SSLv3 and give a useful error message like "this site insists on using the unsafe
SSLv3 protocol"
instead of a confusing "unknown protocol".

Marcus



Christos, if you can keep this in mind for all current / pending, and
future SSL work.

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Non-HTTP bypass

2015-01-02 Thread Marcus Kool



On 12/31/2014 02:31 PM, Alex Rousskov wrote:

On 12/31/2014 03:33 AM, Marcus Kool wrote:

On 12/31/2014 05:54 AM, Alex Rousskov wrote:

What would help is to decide whether we want to focus on

A) multiple conditions for establishing a TCP tunnel;
B) multiple ways to handle an unrecognized protocol error; OR
C) multiple ways to handle multiple errors.

IMO, we want (B) or perhaps (C) while leaving (A) as a separate
out-of-scope feature.

The proposed patch implements (B). To implement (C), the patch needs to
add an ACL type to distinguish an unrecognized protocol error from
other errors.




 From an administrators point of view, the admins that want Squid to
filter internet access, definitely want (B).  They want (B) to block
audio, video, SSH tunnels, VPNs, chat, file sharing, webdisks and all
sorts of applications (but not all!) that use port 443.


Agreed, except this is not limited to port 443. The scope includes
intercepted port 80 connections and even CONNECT tunnels.


If CONNECT tunnels are in scope, then so are all the applications that use it,
including webdisk, audio, video, SSH etc.

I think it was Amos who said that application builders should use application-specific
ports, but the reality is that all firewalls block those ports by default.
ports, but reality is that all firewalls block those ports by default.
Skype was one of the first applications that worked everywhere, even behind
a corporate firewall and it was done using CONNECT to the web proxy.
And from a security point of view I think that administrators prefer that
applications use CONNECT to the web proxy to have more control and logging
about what traffic is going from a LAN to the internet.


Basically
this means that admins desire a more fine-grained control about what to
do with each tunnel.


There are two different needs here, actually:

1. A choice of actions (i.e., what to do) when dealing with an
unsupported protocol. Currently, there is only one action: Send an HTTP
error response. The proposed feature adds another action (tunnel) and,
more importantly, adds a configuration interface to support more actions
later.


Sending an HTTP error to an application that does not speak HTTP is not
very useful.  Skype, SSH, videoplayers etc. only get confused at best.
Simply closing the tunnel may be better and may result in an end user message
'cannot connect to ...' instead of 'server sends garbage' or 'undefined 
protocol'.

Marcus


2. A way to further classify an unsupported protocol (i.e.,
fine-grained control). I started a new thread on this topic as it is
not about the proposed bypass feature.


Cheers,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] Non-HTTP bypass

2014-12-31 Thread Marcus Kool



On 12/31/2014 05:54 AM, Alex Rousskov wrote:
[...]


What would help is to decide whether we want to focus on

   A) multiple conditions for establishing a TCP tunnel;
   B) multiple ways to handle an unrecognized protocol error; OR
   C) multiple ways to handle multiple errors.

IMO, we want (B) or perhaps (C) while leaving (A) as a separate
out-of-scope feature.

The proposed patch implements (B). To implement (C), the patch needs to
add an ACL type to distinguish an unrecognized protocol error from
other errors.


From an administrators point of view, the admins that want Squid to
filter internet access, definitely want (B).  They want (B) to block
audio, video, SSH tunnels, VPNs, chat, file sharing, webdisks and all sorts
of applications (but not all!) that use port 443.  Basically
this means that admins desire a more fine-grained control about what to
do with each tunnel.

The current functionality of filtering is divided between Squid itself and
3rd party software (ICAP daemons and URL redirectors).
I plea for an interface where an external helper can decide what to do
with an unknown protocol inside a tunnel because it is much more flexible
than using ACLs and extending Squid with detection of (many) protocols.

A while back when we discussed the older sslBump not being able to cope
with Skype I suggested to use ICAP so that the ICAP daemon receives a
REQMOD/RESPMOD message with CONNECT and intercepted content, which also is
a valid option for me.

I wish you all a blessed and happy New Year!
Marcus




[...]



Thank you,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] unsupported protocol classification

2014-12-31 Thread Marcus Kool



On 12/31/2014 02:23 PM, Alex Rousskov wrote:

[ I am changing the Subject line for this sub-thread because this new
discussion is not really relevant to the unsupported protocol bypass
feature, even though that bypass feature will be used by those who need
to classify unsupported protocols. ]


On 12/31/2014 03:33 AM, Marcus Kool wrote:


The current functionality of filtering is divided between Squid itself and
3rd party software (ICAP daemons and URL redirectors).


... as well as external ACLs and eCAP adapters.



I plea for an interface where an external helper can decide what to do
with an unknown protocol inside a tunnel because it is much more flexible
than using ACLs and extending Squid with detection of (many) protocols.


I doubt pleading will be enough, unfortunately, because a considerable
amount of coding and design expertise is required to fulfill your dream.
IMO, a quality implementation would involve:


It is clear to me that this functionality will not be implemented next week,
but for me it is not a dream.  It is a reality that filtering becomes more
important; just wait until a headline like "secret document stolen via a web
tunnel" comes along in the news and everybody wants it.
The risk is real, and it is so simple to abuse CONNECT on port 443 for anything
that it is extremely likely that it is already being used for illegal actions
and will continue to be.

There is also not much point in having a web proxy that can filter 50% or 99%
of what you want to filter.  If you cannot filter everything and especially 
cannot
filter known security risks, the filter solution is very weak.
That is why ufdbGuard currently sends probes to sites that an application 
CONNECTs to.
The probes tell ufdbGuard what type of traffic is to be expected but
are also not 100% reliable since a probe is not the same as an inspection
of the real traffic.


1. Encoding the tunnel information (including traffic) in [small]
HTTP-like messages to be passed to ICAP/eCAP services. It is important
to get this API design right while anticipating complications like
servers that speak first, agents that do not send Hellos until they hear
the other agent Hello, and fragmented Hellos. Most likely, the design
will involve two tightly linked but concurrent streams of adaptation
messages: user-Squid-origin and origin-Squid-user. Let's call that
TUNMOD, as opposed to the existing REQMOD and RESPMOD.


Getting the design right is definitely important.  Therefore I like
to bring up this issue once in a while, so that, with the design decisions
made today for related parts, it will be easier to implement TUNMOD
in the future.


2. Writing adaptation hooks to pass tunnel information (using TUNMOD
design above) to adaptation services. The primary difficulty here is
handling incremental give me more and give them more decisions while
shoveling tunneled bytes. The current tunneling code does not do any
adaptation at all so the developers would be starting from scratch
(albeit with good examples available from non-tunneling code dealing
with HTTP/FTP requests and HTTP/FTP responses).


It can be simpler.  TUNMOD replies can be limited to
DONTKNOW - continue with what is happening and keep the TUNMOD server informed
ACCEPT - continue and do not inform the TUNMOD server any more about this tunnel
BLOCK - close the tunnel

I think there is no need for adaptation since one accepts a webdisk, voice
chat, VPN or whatever, or one does not accept it. So adaptation as is
used for HTTP, is not an important feature.

Sending an HTTP error on a tunnel is only useful if the tunnel uses
SSL-encapsulated HTTP.


3. Implementing more actions than the already implemented "start a blind
tunnel" and "respond with an error". The "shovel this to the other side
and then come back to me with the newly received bytes" action would be
essential in many production cases, for example.

The above is a large project. I do not recall any projects of that size
and complexity implemented without sponsors in recent years but YMMV.


We will see.  Maybe there will be a sponsor to do this.

It is 15:38 local time and my last post of the year.
Happy New Year to all.

Marcus


Please note that modern Squid already has an API that lets 3rd party
software pick one of the supported actions. It is called annotations:
External software sends Squid an annotation and the admin configures
Squid to do X when annotation Y is received in context Z.



A while back when we discussed the older sslBump not being able to cope
with Skype I suggested to use ICAP so that the ICAP daemon receives a
REQMOD/RESPMOD message with CONNECT and intercepted content, which also is
a valid option for me.


Yes, ICAP/eCAP is the right direction here IMO, but there are several
challenges on that road. I tried to detail them above.


HTH,

Alex.



___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org

Re: Possible memory leak.

2014-07-20 Thread Marcus Kool

Eliezer,

It is important to know what implementation of malloc is used.
So it is important to know which OS/distro is used and which version of 
glibc/malloc.

malloc on 64bit CentOS 6.x uses memory-mapped memory for allocations of 128 KB 
or larger
and uses multiple (can't find how many) 64MB segments and many more when 
threads are used.

I also suggest collecting the total memory size _and_ the resident memory size.
The resident memory size is usually significantly smaller than the total memory
size, which can be explained by the 64MB segments being only partially used.
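
On Linux both can be sampled with, for example:

   ps -o pid,vsz,rss,cmd -C squid    # virtual (total) and resident sizes in KB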

If you use CentOS, I recommend to
   export MALLOC_ARENA_MAX=1# should work well
and/or
   export MALLOC_MMAP_THRESHOLD_=4100100100 # no experience if this works
and run the test again.

Marcus


On 07/20/2014 12:27 PM, Eliezer Croitoru wrote:

I want to verify the issue I have seen:
Now the server is at about 286 MB of resident memory.
The issue is that the server memory usage was more than 800MB, with these things
in mind:
1 - The whole web server is 600 MB
2 - 150MB is the maximum object size in memory (there is no disk cache)
3 - the cache memory of the server is the default of 256MB.

I cannot think of an option that would lead this server to consume more than
400MB, even if one 10-byte file is being fetched with a query term that has a
different parameter every time.

If the sum of all the requests to the proxy is 30k, I do not see how it would
still lead to 900MB of RAM used by squid.

If I am mistaken (which could very simply be the case) then I want to understand what
to look for in the mgr interface to see whether there is reasonable usage of
memory or not.
(I know it's a lot to ask but still)

Thanks,
Eliezer

On 07/10/2014 09:10 PM, Eliezer Croitoru wrote:

OK so I started this reverse proxy for a bandwidth testing site and it
seems odd that it using more then 400MB when the only difference in the
config is maximum_object_size_in_memory to 150MB and StoreID

SNIP

Eliezer






Squid 3.4.5 warning about MGR_INDEX

2014-06-08 Thread Marcus Kool


Using Squid 3.4.5 I observed in cache.log the following warning:

2014/06/08 09:50:42.804 kid1| disk.cc(92) file_open: file_open: error opening 
file /local/squid34/share/errors/templates/MGR_INDEX: (2) No such file or 
directory
2014/06/08 09:50:42.805 kid1| errorpage.cc(307) loadDefault: WARNING: failed to 
find or read error text file MGR_INDEX

For other error template files all is well.

Marcus



Re: issue with ICAP message for redirecting HTTPS/CONNECT

2014-06-08 Thread Marcus Kool

Thanks Nathan, that helped.
Sometimes it is frustrating to just not see the small error...

Marcus

On 06/08/2014 01:02 PM, Nathan Hoad wrote:

Hi Marcus,

There's a bug in your ICAP server with how it's handling the
Encapsulated header that it sends back to Squid. This is what your
server sent back to Squid for a REQMOD request:

Encapsulated: res-hdr=0, null-body=1930d
X-Next-Services: 0d
0d
CONNECT blockedhttps.urlfilterdb.com:443 HTTP/1.00d
   -- NOTE: also fails: CONNECT https://blockedhttps.urlfilterdb.com
HTTP/1.00d

snipped for brevity

The Encapsulated header says that the HTTP object that has been sent
back contains HTTP response headers, and no body. This leads Squid to
believe it should be parsing an HTTP response, which expects the first
token of the first line to begin with HTTP/, which is failing because
the server has actually sent back an HTTP request. This explains the
error in the logs, and why it's working for your GET and POST
responses, which do indeed contain HTTP response objects.

So for this particular example, the correct Encapsulated header value
would be 'req-hdr=0, null-body=193'.
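
In other words, for this CONNECT example the response should have looked
roughly like the following (only the Encapsulated header changes; 193 is still
the length of the encapsulated request header block):

   ICAP/1.0 200 OK
   Server: ufdbICAPd/1.0
   ISTag: 5394572c-4567
   Connection: keep-alive
   Encapsulated: req-hdr=0, null-body=193

   CONNECT blockedhttps.urlfilterdb.com:443 HTTP/1.0
   Host: blockedhttps.urlfilterdb.com
   ...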

I hope that helps,

Nathan.

--
Nathan Hoad
Software Developer
www.getoffmalawn.com


On 9 June 2014 00:22, Marcus Kool marcus.k...@urlfilterdb.com wrote:

I ran into an issue with the ICAP interface.
The issue is that a GET/HTTP-based URL can be successfully rewritten but a
CONNECT/HTTPS-based URL cannot.  I used debug_options ALL,9 to find out what
is going wrong
but I fail to understand Squid.

GET/HTTP to http://googleads.g.doubleclick.net works:

Squid writes:
REQMOD icap://127.0.0.1:1344/reqmod_icapd_squid34 ICAP/1.00d
Host: 127.0.0.1:13440d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
Encapsulated: req-hdr=0, null-body=1350d
Preview: 00d
Allow: 2040d
X-Client-IP: 127.0.0.10d
0d
GET http://googleads.g.doubleclick.net/ HTTP/1.00d
User-Agent: Wget/1.12 (linux-gnu)0d
Accept: */*0d
Host: googleads.g.doubleclick.net0d
0d

ICAP daemon responds:
ICAP/1.0 200 OK0d
Server: ufdbICAPd/1.00d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
ISTag: 5394572c-45670d
Connection: keep-alive0d
Encapsulated: res-hdr=0, null-body=2330d
X-Next-Services: 0d
0d
HTTP/1.0 200 OK0d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
Server: ufdbICAPd/1.00d
Last-Modified: Sun, 08 Jun 2014 13:54:09 GMT0d
ETag: 498a-0001-5394572c-45670d
Cache-Control: max-age=100d
Content-Length: 00d
Content-Type: text/html0d
0d
00d
0d


CONNECT/HTTPS does not work:

Squid writes:
REQMOD icap://127.0.0.1:1344/reqmod_icapd_squid34 ICAP/1.00d
Host: 127.0.0.1:13440d
Date: Sun, 08 Jun 2014 12:29:32 GMT0d
Encapsulated: req-hdr=0, null-body=870d
Preview: 00d
Allow: 2040d
X-Client-IP: 127.0.0.10d
0d
CONNECT googleads.g.doubleclick.net:443 HTTP/1.00d
User-Agent: Wget/1.12 (linux-gnu)0d
0d

ICAP daemon responds:
ICAP/1.0 200 OK0d
Server: ufdbICAPd/1.00d
Date: Sun, 08 Jun 2014 12:29:32 GMT0d
ISTag: 5394572c-45670d
Connection: keep-alive0d
Encapsulated: res-hdr=0, null-body=1930d
X-Next-Services: 0d
0d
CONNECT blockedhttps.urlfilterdb.com:443 HTTP/1.00d--
NOTE: also fails: CONNECT https://blockedhttps.urlfilterdb.com HTTP/1.00d
Host: blockedhttps.urlfilterdb.com0d
User-Agent: Wget/1.12 (linux-gnu)0d
X-blocked-URL: googleads.g.doubleclick.net0d
X-blocked-category: ads0d
0d
00d
0d

and Squid in the end responds to wget:
HTTP/1.1 500 Internal Server Error
Server: squid/3.4.5
Mime-Version: 1.0
Date: Sun, 08 Jun 2014 13:59:27 GMT
Content-Type: text/html
Content-Length: 2804
X-Squid-Error: ERR_ICAP_FAILURE 0
Vary: Accept-Language
Content-Language: en
X-Cache: MISS from XXX
X-Cache-Lookup: NONE from XXX:3128
Via: 1.1 XXX (squid/3.4.5)
Connection: close

A fragment of cache.log is below.
I think that the line
HttpReply.cc(460) sanityCheckStartLine: HttpReply::sanityCheckStartLine:
missing protocol prefix (HTTP/) in 'CONNECT blockedhttps.urlfilterdb.com:443
HTTP/1.00d
indicates where the problem is.

Questions:
The ICAP reply has a HTTP/ protocol prefix so does Squid have a problem
parsing the reply?

What is the issue with the reply of the ICAP daemon?

Not directly related, but interesting: why does Squid send
CONNECT googleads.g.doubleclick.net:443 HTTP/1.0
to the ICAP daemon instead of
CONNECT https://googleads.g.doubleclick.net HTTP/1.0

Thanks
Marcus

 cache.log:
-
2014/06/08 09:29:32.224 kid1| Xaction.cc(413) noteCommRead: read 384 bytes
2014/06/08 09:29:32.224 kid1| Xaction.cc(73) disableRetries:
Adaptation::Icap::ModXact from now on cannot be retried  [FD 12;rG/RwP(ieof)
job9]
2014/06/08 09:29:32.224 kid1| ModXact.cc(646) parseMore: have 384 bytes to
parse [FD 12;rG/RwP(ieof) job9]
2014/06/08 09:29:32.224 kid1| ModXact.cc(647) parseMore:
ICAP/1.0 200 OK0d
Server: ufdbICAPd/1.00d
Date: Sun, 08 Jun 2014 12:29:32 GMT0d
ISTag: 5394572c-45670d
Connection: keep-alive0d
Encapsulated: res-hdr=0, null-body=1930d
X-Next-Services: 0d
0d
CONNECT blockedhttps.urlfilterdb.com:443 HTTP/1.00d
Host

Re: issue with ICAP message for redirecting HTTPS/CONNECT

2014-06-08 Thread Marcus Kool

no, no sslbump is used

On 06/08/2014 11:57 AM, Eliezer Croitoru wrote:

Are you using SSL-BUMP?

Eliezer

On 06/08/2014 05:22 PM, Marcus Kool wrote:

I ran into an issue with the ICAP interface.
The issue is that a GET/HTTP-based URL can be successfully rewritten but a
CONNECT/HTTPS-based URL cannot.  I used debug_options ALL,9 to find out
what is going wrong
but I fail to understand Squid.






Re: issue with ICAP message for redirecting HTTPS/CONNECT

2014-06-08 Thread Marcus Kool



On 06/08/2014 04:20 PM, Alex Rousskov wrote:

On 06/08/2014 10:02 AM, Nathan Hoad wrote:


There's a bug in your ICAP server with how it's handling the
Encapsulated header that it sends back to Squid.

...

The Encapsulated header says that the HTTP object that has been sent
back contains HTTP response headers, and no body. This leads Squid to
believe it should be parsing a HTTP response



Hello Marcus,

 In addition to the Encapsulated header wrongly promising an HTTP
response, the ICAP response also contains an encapsulated HTTP body
chunk (of zero size) when the Encapsulated header promised no body at
all. That ICAP server bug is present in both GET and CONNECT adaptation
transactions (but the correct behavior would be different in each of
those two cases).


Thanks for pointing that out.


If you are writing yet another ICAP server, please note that free and
commercial ICAP servers are available. Are you sure you want to go
through the pains of writing yet another broken one? And that you
actually need ICAP?


For this project I indeed need ICAP.
I was not satisfied with the free ICAP servers and will
make the ICAP server public domain so a commercial one is not an option.


Finally, please note that rewriting and even satisfying CONNECT requests
is difficult because the browser has certain expectations about the
origin server and the browser's security model prevent many CONNECT
request and response manipulations.


Yes, I am aware of all the troubles with certificates and how browsers deal with
them.
ICAP was designed for HTTP, not HTTPS, but ICAP is all we have for content
filtering.

I am aware that eCAP exists, but because eCAP sits inside the Squid process
and has no support for multithreading, which is a must-have for this project,
eCAP is not suitable for technical reasons.

Thanks
Marcus


Cheers,

Alex.




On 9 June 2014 00:22, Marcus Kool marcus.k...@urlfilterdb.com wrote:

I ran into an issue with the ICAP interface.
The issue is that a GET/HTTP-based URL can be successfully rewritten but a
CONNECT/HTTPS-based URL cannot.  I used debug_options ALL,9 to find out what
is going wrong
but I fail to understand Squid.

GET/HTTP to http://googleads.g.doubleclick.net works:

Squid writes:
REQMOD icap://127.0.0.1:1344/reqmod_icapd_squid34 ICAP/1.00d
Host: 127.0.0.1:13440d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
Encapsulated: req-hdr=0, null-body=1350d
Preview: 00d
Allow: 2040d
X-Client-IP: 127.0.0.10d
0d
GET http://googleads.g.doubleclick.net/ HTTP/1.00d
User-Agent: Wget/1.12 (linux-gnu)0d
Accept: */*0d
Host: googleads.g.doubleclick.net0d
0d

ICAP daemon responds:
ICAP/1.0 200 OK0d
Server: ufdbICAPd/1.00d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
ISTag: 5394572c-45670d
Connection: keep-alive0d
Encapsulated: res-hdr=0, null-body=2330d
X-Next-Services: 0d
0d
HTTP/1.0 200 OK0d
Date: Sun, 08 Jun 2014 13:54:09 GMT0d
Server: ufdbICAPd/1.00d
Last-Modified: Sun, 08 Jun 2014 13:54:09 GMT0d
ETag: 498a-0001-5394572c-45670d
Cache-Control: max-age=100d
Content-Length: 00d
Content-Type: text/html0d
0d
00d
0d


CONNECT/HTTPS does not work:

Squid writes:
REQMOD icap://127.0.0.1:1344/reqmod_icapd_squid34 ICAP/1.00d
Host: 127.0.0.1:13440d
Date: Sun, 08 Jun 2014 12:29:32 GMT0d
Encapsulated: req-hdr=0, null-body=870d
Preview: 00d
Allow: 2040d
X-Client-IP: 127.0.0.10d
0d
CONNECT googleads.g.doubleclick.net:443 HTTP/1.00d
User-Agent: Wget/1.12 (linux-gnu)0d
0d

ICAP daemon responds:
ICAP/1.0 200 OK0d
Server: ufdbICAPd/1.00d
Date: Sun, 08 Jun 2014 12:29:32 GMT0d
ISTag: 5394572c-45670d
Connection: keep-alive0d
Encapsulated: res-hdr=0, null-body=1930d
X-Next-Services: 0d
0d
CONNECT blockedhttps.urlfilterdb.com:443 HTTP/1.00d--
NOTE: also fails: CONNECT https://blockedhttps.urlfilterdb.com HTTP/1.00d
Host: blockedhttps.urlfilterdb.com0d
User-Agent: Wget/1.12 (linux-gnu)0d
X-blocked-URL: googleads.g.doubleclick.net0d
X-blocked-category: ads0d
0d
00d
0d

and Squid in the end responds to wget:
HTTP/1.1 500 Internal Server Error
Server: squid/3.4.5
Mime-Version: 1.0
Date: Sun, 08 Jun 2014 13:59:27 GMT
Content-Type: text/html
Content-Length: 2804
X-Squid-Error: ERR_ICAP_FAILURE 0
Vary: Accept-Language
Content-Language: en
X-Cache: MISS from XXX
X-Cache-Lookup: NONE from XXX:3128
Via: 1.1 XXX (squid/3.4.5)
Connection: close

A fragment of cache.log is below.
I think that the line
HttpReply.cc(460) sanityCheckStartLine: HttpReply::sanityCheckStartLine:
missing protocol prefix (HTTP/) in 'CONNECT blockedhttps.urlfilterdb.com:443
HTTP/1.00d
indicates where the problem is.






Re: How long is a domain or url can be?

2014-05-01 Thread Marcus Kool



On 05/01/2014 12:50 AM, Eliezer Croitoru wrote:

On 05/01/2014 02:52 AM, Marcus Kool wrote:

Eliezer,

It is not clear what you want to achieve...  If you just want to use a
URL filter, I suggest using ufdbGuard. I am the author and give support, there are
regular updates, it is multithreaded and holds only one copy in memory, and it has
a documented proprietary database format which is 3-4 times faster than squidGuard.

Marcus

Thanks Marcus,

I am looking at a couple of things:
I want to understand how SquidGuard filters data and does policy stuff
(since I am not able to figure it out alone).
I will try to look at ufdbGuard, but now I know I can ask you rather than the
SquidGuard team.

Is it possible with ufdbGuard to update the DB without the need to reload or do 
anything?


No, but ufdbguard reloads very fast and it has configuration options on how to 
behave during reload:
- block all traffic
- allow all traffic
- allow and slow down all traffic (to reduce the number of unfiltered URLs)


(is it ok to ask you in private?)


Sure, if questions are not related to squid it is better not to use the squid 
list.

Marcus


Thanks All,
Eliezer




Re: atomic ops on i386

2014-04-14 Thread Marcus Kool

gcc defines the symbols __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 and 
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
and I use code that looks like this:

#if defined(__GNUC__) && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 && __SIZEOF_LONG_LONG__ == 4
   (void) __sync_add_and_fetch( &longLongVar, 1 );
#elif defined(__GNUC__) && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 && __SIZEOF_LONG_LONG__ == 8
   (void) __sync_add_and_fetch( &longLongVar, 1 );
#else
   pthread_mutex_lock( &counterMutex );    // or other mutex lock not based on pthread
   longLongVar++;
   pthread_mutex_unlock( &counterMutex );
#endif

I think that the root cause of the problem is in src/ipc/AtomicWord.h where it 
is assumed
that if HAVE_ATOMIC_OPS is defined, atomic ops are defined for all types (and 
all sizes)
which is an incorrect assumption.

Changing the configure script to detect atomic ops for long long instead of int
is a workaround, but this prevents the use of atomic ops for 4-byte types on 
systems that support it.

Marcus


On 04/14/2014 11:45 AM, Alex Rousskov wrote:

On 04/13/2014 03:19 PM, Stuart Henderson wrote:

On 2014-04-13, Alex Rousskov rouss...@measurement-factory.com wrote:

On 04/13/2014 06:36 AM, Stuart Henderson wrote:


I'm just trying to build 3.5-HEAD on OpenBSD/i386 (i.e. 32-bit mode) for
the first time. It fails due to use of 64-bit atomic ops:

MemStore.o(.text+0xc90): In function `MemStore::anchorEntry(StoreEntry&, int, Ipc::StoreMapAnchor const&)':
: undefined reference to `__sync_fetch_and_add_8'
MemStore.o(.text+0x3aa3): In function `MemStore::copyFromShm(StoreEntry&, int, Ipc::StoreMapAnchor const&)':
: undefined reference to `__sync_fetch_and_add_8'
MemStore.o(.text+0x3cce): In function `MemStore::copyFromShm(StoreEntry&, int, Ipc::StoreMapAnchor const&)':
: undefined reference to `__sync_fetch_and_add_8'
MemStore.o(.text+0x4040): In function `MemStore::copyFromShm(StoreEntry&, int, Ipc::StoreMapAnchor const&)':
: undefined reference to `__sync_fetch_and_add_8'
MemStore.o(.text+0x435f): In function `MemStore::copyFromShm(StoreEntry&, int, Ipc::StoreMapAnchor const&)':
: undefined reference to `__sync_fetch_and_add_8'
MemStore.o(.text+0x473d): more undefined references to `__sync_fetch_and_add_8' follow
collect2: error: ld returned 1 exit status


I am not an expert on this, but googling suggests building with
-march=i586 or a similar GCC option may solve your problem. More
possibly relevant details at


That does fix the problem building, but I need this for package builds
which are supposed to still work on 486, so I can't rely on users having
586 (cmpxchg8b).




   http://www.squid-cache.org/mail-archive/squid-dev/201308/0103.html


specifically because swap_file_sz that they need to keep in sync
across Squid kids is 64 bits - so I think fixing the autoconf check is
probably what's needed then.


Probably, assuming your users do not care about SMP-shared caching
(memory or disk).



Should the autoconf test be changed to check for working 64-bit ops, or
is something more involved wanted?


Filing a bug report may be a good idea, especially if you cannot make
this work.


I suppose the simplest fix would be something like this,

--- configure.ac.orig   Fri Apr  4 21:31:38 2014
+++ configure.acSun Apr 13 15:12:37 2014
@@ -416,7 +416,7 @@ dnl Check for atomic operations support in the compile
  dnl
  AC_MSG_CHECKING([for GNU atomic operations support])
  AC_RUN_IFELSE([AC_LANG_PROGRAM([[
-int n = 0;
+long long n = 0;
  ]],[[
  __sync_add_and_fetch(&n, 10); // n becomes 10
  __sync_fetch_and_add(&n, 20); // n becomes 30


Nitpick: s/long long/long long int/

but I think it would be safer to test both 32- and 64-bit sizes (using
two different variables instead of a single n). Also, ideally, we should
use int32_t and uint64_t types if possible, but that probably requires
#inclusion of other headers and may become difficult unless there are
already working ./configure test cases using those types that you can
copy from.
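For illustration only (a sketch of that suggestion, not the patch that was
eventually committed), a test body exercising both a 32-bit and a 64-bit
variable could look like:

    // Hypothetical ./configure test program: require working __sync atomics
    // for both a 4-byte and an 8-byte operand, so that linking cannot later
    // fail with an unresolved __sync_fetch_and_add_8.
    #include <stdint.h>

    int main()
    {
        int32_t n32 = 0;
        uint64_t n64 = 0;
        __sync_add_and_fetch(&n32, 10);   // n32 becomes 10
        __sync_fetch_and_add(&n64, 20);   // n64 becomes 20
        return (n32 == 10 && n64 == 20) ? 0 : 1;
    }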



Happy to open a bug report if that's preferred, I thought I'd ask here
first to work out the best direction.


Again, I am not an expert on this, but I think you are on the right
track. If you can test your patch and post here, somebody will probably
commit it, especially if you can polish it based on the above suggestions.

A bug report just makes it easier to remember to do it (and to point
others to the fix).


Thank you,

Alex.





Re: c++0x and RHEL5.X

2013-12-30 Thread Marcus Kool



On 12/30/2013 10:16 AM, Amos Jeffries wrote:

On 30/12/2013 11:21 p.m., Kinkie wrote:

Hi all,
we have been talking about mandating c++11 some time in the next few months.
Today I was trying to rely on a c++0x feature, and I realized that
RHEL5.X ships gcc 4.1.2, which doesn't support c++0x. RHEL6 ships g++
4.4.7, which supports c++0x but not c++11.

Now, what I need to do here is mostly convenience, I can work around it.
However I am annoyed; we will need to make a decision and this fact
complicates things.



RHEL cannot be a blocker for us. They will be stuck with that
half-working GCC version until around 2020 unless they bump it up in a
service pack release.

CentOS has followed that, but is a bit more flexible with compiler
packages. And those or Fedora compiler packages are usually okay for
RHEL as well.



I am running CentOS 5.x, which has a package called 'gcc44'
that installs gcc 4.4.7 next to gcc 4.1.2.
Does RHEL have the gcc44 package?


I was intending to start the serious decision talk late next year,
probably after 3.5 has gone beta or stable. So that we take a good look
at it for the 3.6 timeframe. That will give us at least half of the
major distros fully on the preferred GCC versions and some like CentOS
etc only a short few years away from EOL on the non-working versions
(probably with packages available for the preferred compiler versions).

PPS. Anything we do in the C++11 direction before 2015 will probably
still require macros and wrappings. So look carefully at the features in
regards to whether there is a non-C++11 equivalent and how messy the
wrappers would make the code.

Amos




Re: SLES build error, what to do?

2013-12-23 Thread Marcus Kool



On 12/23/2013 05:30 PM, Eliezer Croitoru wrote:

Thanks Amos,

On 23/12/13 05:33, Amos Jeffries wrote:



Inquirer.cc:90: error: 'auto_ptr' is deprecated (declared at
/usr/include/c++/4.3/backward/auto_ptr.h:91)

This is a GCC bug. For a couple of releases the STL library required
more advanced C++11 support than the compiler provided.

It can only be worked around by upgrading GCC.

OK, so SLES has that specific 4.3 version and they will probably not
upgrade for who knows how long...


https://www.suse.com/releasenotes/x86_64/SUSE-SLES/11-SP3/
has the information that you need: SLES 11 SP3 has an optional SDK with gcc 
4.7.2

Marcus


The suggestion is to compile and use it only as one process??
What I mean is: if there is a pointer error, how far-reaching can the
runtime error be?

Thanks again,
Eliezer

SNIP

But after using the mentioned option the results seems like:
# tail -f  /usr/local/squid/var/logs/access.log
1387753634.690305 192.168.10.100 TCP_REFRESH_UNMODIFIED/304 306 GET
http://docs.fedoraproject.org/en-US/index.html  -
HIER_DIRECT/80.239.156.215 -

and it seems like it works but will maybe have some problems at runtime?

There will be pointer problems in SMP mode.

Amos






Re: [PATCH] Re: URL redirection with Squid 3.4

2013-12-16 Thread Marcus Kool



On 12/16/2013 01:46 PM, Alex Rousskov wrote:

On 12/14/2013 06:28 AM, Amos Jeffries wrote:

On 14/12/2013 6:59 a.m., Marcus Kool wrote:

all,

as discussed in a previous thread, the URL rewriter protocol of Squid
3.4 is different from the one in previous versions of Squid.
Despite Amos' belief, I found out yesterday that there is no backward
compatibility since a typical redirection URL is

www.example.com/foo.cgi?category=adult&url=http://www.example.com/foo/bar
and Squid 3.4 has a parser that splits tokens at '=' and then complains
that it does not understand the answer of the URL redirector.



Ouch. Thank you for finding this one.

The fix appears to be limiting the character set we accept for key names
such that it does not match any valid URL. I have now applied a patch to
trunk as rev.13181 which limits characters in kv-pair key name to
alphanumeric, hyphen and underscore.


Based on icap_service experience that had a similar chain of
developments/bugs/fixes, the best fix may be to require uri=... or a
similar key=value pair for communicating URLs.

There is an issue of backward compatibility which might be addressed by
prohibiting bare URLs when newer key=value support is enabled (and
honoring them otherwise). The presence of a uri=... key=value pair can
be used to distinguish the two cases more-or-less reliably.

In other words:

* Want to use the newer key=value format? Use uri=...


Does this mean that a response like
   http://www.example.com/cgi?cat=adult&uri=foo
will be parsed correctly? (note the uri= at the end)


* Otherwise, you may continue to use bare URIs.


What would work great for the URL redirector is:
if the response starts with OK, ERR or BH, it can be parsed as the new 3.4 
protocol with kv-pairs,
and if not, it is the old pre-3.4 protocol and should be parsed as such.

I wonder if the same logic works for the other interfaces.
disclaimer: I am not very familiar with all interface changes, only the changes 
for the URL redirector.
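Just to illustrate the heuristic I have in mind (hypothetical code, not Squid's
actual parser):

    #include <cstring>

    // Hypothetical sketch of the suggested heuristic: a helper reply that
    // starts with one of the 3.4 result codes is parsed as the new kv-pair
    // protocol; anything else is treated as a pre-3.4 bare-URL reply.
    static bool looksLikeNewHelperProtocol(const char *reply)
    {
        return std::strncmp(reply, "OK", 2) == 0 ||
               std::strncmp(reply, "ERR", 3) == 0 ||
               std::strncmp(reply, "BH", 2) == 0;
    }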

Marcus



Please do _not_ interpret the above as a vote against restricting key
name characters. However, we should probably restrict them the same way
for key names _everywhere_ (in all key=value pairs) and not just in
helper responses.


Cheers,

Alex.





Re: url_rewrite_program in Squid 3.4

2013-11-12 Thread Marcus Kool



and must return
OK [status] url=newurl
for a URL that needs to be redirected.

One would expect that ERR is used for an error, not for something
that is the opposite of an error.


The error is that the re-writer could not or would not re-write the URL.


You can return OK without the url=, status= or rewrite-url= keys.

url= is only required *if* the URL is being redirected.
rewrite-url= is only required *if* the URL is being rewritten.


Thanks for the explanation.  This means that the information on
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
is not correct, since there url= and rewrite-url= are not optional.
I suggest updating this page to state that the result
  OK
on its own means no URL modification / PASS.
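For example (these are my own illustrative replies, using only the kv-pair
names discussed above):
   ERR                                        URL is fine, no change
   OK                                         no change / PASS
   OK status=302 url=http://example.com/x     redirect the client
   OK rewrite-url=http://example.com/y        rewrite the URL transparently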

Thanks
Marcus


Re: url_rewrite_program in Squid 3.4

2013-11-12 Thread Marcus Kool



On 11/12/2013 09:41 AM, Amos Jeffries wrote:


You can return OK without the url=, status= or rewrite-url= keys.

url= is only required *if* the URL is being redirected.
rewrite-url= is only required *if* the URL is being rewritten.


Thanks for the explanation.  This means that the information on
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
is not correct since url= and rewrite-url are not optional.
I suggest to update this page to include that the result
   OK
is meant for no URL modification / PASS.



Ah, them docs. I keep looking at the wiki docs for helper protocol:
http://wiki.squid-cache.org/Features/AddonHelpers

Okay, config manual updated.

Amos


I hit refresh in the browser, but did not see an update for
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
Is there a delay in the update?

Reading http://wiki.squid-cache.org/Features/AddonHelpers
I observed another inconsistency between
http://wiki.squid-cache.org/Features/AddonHelpers
and
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
which is that in the spec of AddonHelpers states to use a bare URL on a rewrite
while 3.4/cfgman states to use a kv url=URL.
Since you looked at the wiki I assume that the base URL is correct.
Can you confirm this?

Thanks
Marcus


Re: url_rewrite_program in Squid 3.4

2013-11-12 Thread Marcus Kool



On 11/12/2013 06:19 PM, Amos Jeffries wrote:

On 2013-11-13 01:02, Marcus Kool wrote:

On 11/12/2013 09:41 AM, Amos Jeffries wrote:


You can return OK without the url=, status= or rewrite-url= keys.

url= is only required *if* the URL is being redirected.
rewrite-url= is only required *if* the URL is being rewritten.


Thanks for the explanation.  This means that the information on
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
is not correct since url= and rewrite-url are not optional.
I suggest to update this page to include that the result
   OK
is meant for no URL modification / PASS.



Ah, them docs. I keep looking at the wiki docs for helper protocol:
http://wiki.squid-cache.org/Features/AddonHelpers

Okay, config manual updated.

Amos




I hit refresh in the browser, but did not see an update for
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
Is there a delay in the update?


Yes, those docs are generated from the release code. So when 3.4.0.3 comes out 
the site will change.



Reading http://wiki.squid-cache.org/Features/AddonHelpers
I observed another inconsistency between
http://wiki.squid-cache.org/Features/AddonHelpers
and
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
which is that in the spec of AddonHelpers states to use a bare URL on a rewrite
while 3.4/cfgman states to use a kv url=URL.
Since you looked at the wiki I assume that the base URL is correct.
Can you confirm this?



I'm not sure exactly what you mean by this. I assume its the [URL] entry in the 
AddonHelper response syntax?

The old response syntax is still supported. That had a bare URL for rewrite and a 
status:URL pair for redirect. That has been changed to either a status=N 
url=X pair for redirect or a
rewrite-url=X for rewrite in the new syntax.
  AddonHelpers mentions the two syntaxes since it covers all supported 
versions. The config manual only mentions the preferred syntax for the latest 
version unless you drill down to older release
series manuals.

Amos


OK, I understand the syntax now.
The only thing is that the spec in cfgman/3.4 gives a large spec for 'result' but does
not include a simple OK without anything else.

Marcus



url_rewrite_program in Squid 3.4

2013-11-09 Thread Marcus Kool

Hi,

I noticed that 3.4.0.2 uses a new protocol for the url_rewrite_program
that is incompatible with previous versions of Squid.
I am updating ufdbGuard, a URL redirector for Squid, for Squid 3.4
to support the new protocol of Squid version 3.4.

I read
http://www.squid-cache.org/Versions/v3/3.4/cfgman/url_rewrite_program.html
and was utterly surprised to read that a URL redirector must return
   ERR
to indicate that the URL is fine and does not need to be redirected,
and must return
   OK [status] url=newurl
for a URL that needs to be redirected.

One would expect that ERR is used for an error, not for something
that is the opposite of an error.
Is there a chance that the protocol reply ERR can be changed into
something logical like PASS or UNCHANGED ?

Furthermore, I suggest that the BH status code gets a parameter,
a quoted string explaining what is happening with the URL redirector.

Thanks
Marcus



Re: [RFC] Peek and Splice

2013-02-03 Thread Marcus Kool



On 02/01/2013 03:00 PM, Alex Rousskov wrote:

I agree with the general everything we proxy should be available for
analysis principle. Getting to that point would be difficult because
protocols and APIs such as ICAP, eCAP, external ACL helper, and
url_rewriter were not designed to deal with everything. They need to
be tweaked or extended to work with non-HTTP traffic. We already do that
in some cases (e.g., FTP) but more is needed to handle everything.


And that is exactly why I try to encourage you to implement it now,
since doing this together with the planned change is less work than
moving it to a future project.
As a bonus it will make Squid one of the very few proxies that
take virus scanning and content filtering really seriously.

Marcus


Re: [RFC] Peek and Splice

2013-02-01 Thread Marcus Kool



On 02/01/2013 02:17 AM, Alex Rousskov wrote:

Hello,

 Many SslBump deployments try to minimize potential damage by _not_
bumping sites unless the local policy demands it. Unfortunately, this
decision must currently be made based on very limited information: A
typical HTTP CONNECT request does not contain many details and
intercepted TCP connections are even worse.

We would like to give admins a way to make bumping decision later in the
process, when the SSL server certificate is available (or when it
becomes clear that we are not dealing with an SSL connection at all!).
The project is called Peek and Splice.

The idea is to peek at the SSL client Hello message (if any), send a
similar (to the extent possible) Hello message to the SSL server, peek
at the SSL server Hello message, and then decide whether to bump. If the
decision is _not_ to bump, the server Hello message is forwarded to the
client and the two TCP connections are spliced at TCP level, with  Squid
shoveling TCP bytes back and forth without any decryption.

If we succeed, the project will also pave the way for SSL SNI support
because Squid will be able to send client SNI info to the SSL server,
something that cannot be done today without modifying OpenSSL.

I will not bore you with low-level details, but we think there is a good
chance that Peek and Splice is possible to implement without OpenSSL
modifications. In short, we plan using OpenSSL BIO level to prevent
OpenSSL from prematurely negotiating secure connections on behalf of
Squid (before Squid decides whether to bump or splice). We have started
writing BIO code, and basic pieces appear to work, but the major
challenges are still ahead of us so the whole effort might still fail.


There are a few high-level things in this project that are not clear to
me. I hope you can help find the best solutions:

1. Should other bumping modes switch to using SSL BIO that is required
for Peek and Splice? Pros: Supporting one low-level SSL I/O model keeps
code simpler. Cons: Compared to OpenSSL native implementation, our BIO
code will probably add overheads (not to mention bugs). Is overall code
simplification worth adding those overheads and dangers?


2. How to configure two ssl_bump decisions per transaction?

When Peek and Splice is known to cause problems, the admin should be
able to disable peeking using CONNECT/TCP level info alone. Thus, we
probably have to keep the current ssl_bump option. We can add a peek
action that will tell Squid to enable Peek and Slice: Peek at the
certificates without immediately bumping the client or server connection
(the current code does bump one or the other immediately).

However, many (most?) bumping decisions should be done when server
certificate is known -- the whole point behind Peek and Splice. We can
add ssl_bump2 or ssl_bump_peeked that will be applied to peeked
transactions only:

 ssl_bump peek safeToPeek
 ssl_bump none all

 ssl_bump_peeked server-first safeToBump
 ssl_bump_peeked splice all


Is that the best configuration approach, or am I missing a more elegant
solution?


If there are any other Peek and Splice suggestions or concerns, please
let me know.


Thank you,

Alex.


This Peek and Splice feature will make ssl_bump a useful feature since
without Peek and Splice ssl_bump aborts all non-SSL CONNECTs from Skype
and other applications, so the user community will certainly welcome this.

Currently Squid only sends to the ICAP server a
   REQMOD CONNECT www.example.com:443 (without content)
and there is never a RESPMOD.
I, as author of ufdbGuard and the (yet unpublished) new ICAP content filter,
would very much welcome it if the data of the peeks (client and server)
were encapsulated into ICAP requests for the obvious purpose of
content filtering.

Thanks

Marcus




Re: [RFC] Peek and Splice

2013-02-01 Thread Marcus Kool



On 02/01/2013 01:48 PM, Alex Rousskov wrote:

On 02/01/2013 06:47 AM, Marcus Kool wrote:


This PeekSplice feature will make ssl_bump a useful feature since
without PeekSplice ssl_bump aborts all non-SSL CONNECTS from Skype
and other applications, so the user community will certainly welcome this.


Well, SslBump is already useful in environments where non-SSL CONNECTs
are either prohibited or can be detected and bypassed using CONNECT or
TCP-level information. Peek and Splice will allow bypass of non-SSL
tunnels without building complicated white lists.

While not in this project scope, Peek and Splice would probably make it
possible (with some additional work) to allow Squid to detect and block
non-SSL tunnels without bumping SSL tunnels. That could be useful in
environments where HTTPS is allowed (and does not need to be bumped) but
other tunnels are prohibited.


Yes, I think it is useful to have an option
   allowed_protocols_for_connect: any|ssl




Currently Squid only sends to the ICAP server a
REQMOD CONNECT www.example.com:443 (without content)
and there is never a RESPMOD.
I, as author of ufdbGuard and the (yet unpublished) new ICAP content
filter,
would welcome very much if the data of the peeks (client and server)
is encapsulated into ICAP requests for the obvious purpose of
content filtering.


Squid already sends bumped (i.e., decrypted) HTTP messages to ICAP and
eCAP. If that does not happen in your SslBump tests, it is a bug or
misconfiguration. Squid cannot send encrypted HTTP messages to ICAP or
eCAP -- you must use SslBump if you want to filter encrypted traffic.
There is no way around that.


Yes, correct.  I mixed up the behaviour of Squid with sslbump (decrypted
messages go to the ICAP server) and Squid without sslbump (the ICAP server
only receives a REQMOD).


Or are you thinking about sending SSL Hello messages to ICAP and eCAP
services? If Peek and Splice succeeds, that will be technically possible
as well, but will require more work and would be a separate project.


I was thinking about this: when Squid peeks at the data and finds that it
is non-SSL, send it to the ICAP server to ask its opinion.
This is obviously more work, but also extremely useful, since a
content filter is only useful if it is able to inspect _all_ content,
and consequently the feature of Squid to connect to content filters
is only useful if Squid sends _all_ data to the content filter for analysis.

Perhaps needless to say: viruses like to communicate in non-standard
ways, so Squid would be considered much more secure if it sent _all_ data
to an ICAP server for analysis.

Marcus



Re: Spaces in ACL values

2012-09-13 Thread Marcus Kool



On 09/13/2012 07:16 PM, Alex Rousskov wrote:

2) Add squid.conf directives to turn the new parsing behavior on and off
for a section of the configuration file. This is also 100% backward
compatible but difficult to introduce gradually -- admins will expect
everything inside a quoted strings section to support quoted strings,
and I am not 100% sure we can easily support that because different
options use different token parsers.

# start new quoting support section
configuration_value_parser quoted_strings
# now just use the new quoting support
acl badOne1 user_cert CN Bad Guy
acl badOne2 ext_user Bad Guy

# restore backward-compatible mode
configuration_value_parser bare_tokens
acl oldOne user_cert CN One Two and Four


2b) Add a squid.conf directive _at the beginning_ of the conf file
to specify the parser behavior.  So do not toggle; instead force the admin
to be aware of quoted strings and to check the whole config file himself.
The default value of config_used_quoted_strings is off.
This is still 100% backwards compatible without doing lots (?) of effort
to please everybody and every situation.

Marcus


Re: processing of ICAP Transfer-Ignore options

2012-04-16 Thread Marcus Kool


On 04/16/2012 04:34 AM, Henrik Nordström wrote:

sön 2012-04-15 klockan 22:07 -0300 skrev Marcus Kool:


Are you saying that you want to use the Content-Type header as the
main guide for determining the file extension ?


Yes, when there is a usable content-type.


The idea itself is good.  The problem is that it is very different
from what the ICAP RFC states. I think that the negation of filtering
based on Content-Type should use a new parameter, e.g. Ignore-Content-Type.

And let's not forget that Transfer-Ignore based on a part of the URL
can be used for REQMOD and RESPMOD while Ignore-Content-Type can
only be used for RESPMOD.
This has a small performance impact: there will be more ICAP traffic
since less can be ignored.


Anyway, clarity is the most important thing here and I suggest to
move this discussion to the ICAP discussion forum.


Clarity in a pile of mud...


Hahaha. Do you propose to make an ICAP2 standard that is not backwards
compatible?


Regards

Marcus


Re: processing of ICAP Transfer-Ignore options

2012-04-16 Thread Marcus Kool



I have a number of clients dealing with this most common case among the popular 
CMS systems today...


GET http://example.com/index.php?some/file.jpg HTTP/1.1
...
HTTP/1.1 200 Okay
Content-Type: text/xml
...
GET http://example.com/imagews/1861245634-230is86 HTTP/1.1
...
HTTP/1.1 200 Okay
Content-Type: image/jpeg
...


If one is lucky the CMS *may* put .jpg on the second URI name.

Amos


Do the referred clients use a reverse proxy or a forward proxy?

Yeah, the more we talk about this issue, the more I think the existing Transfer-Ignore
and a new Ignore-Content-Type are bogus since there are still too many web servers
and CMSs that do things wrong. Both the file extension and the Content-Type are too
often wrong.

For RESPMOD, icapd uses content sniffing, since when it blocks
an object it insists on sending new content with the correct Content-Type.
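(As an aside, content sniffing of this kind typically just checks magic bytes.
The following is a hypothetical sketch, not the actual icapd code:)

    #include <cstddef>
    #include <cstring>

    // Hypothetical magic-byte sniffer: guess a Content-Type from the first
    // bytes of the object body instead of trusting the URL suffix or the
    // Content-Type header sent by the origin server.
    static const char *sniffContentType(const unsigned char *data, std::size_t len)
    {
        if (len >= 3 && std::memcmp(data, "\xFF\xD8\xFF", 3) == 0)
            return "image/jpeg";
        if (len >= 8 && std::memcmp(data, "\x89PNG\r\n\x1a\n", 8) == 0)
            return "image/png";
        if (len >= 6 && (std::memcmp(data, "GIF87a", 6) == 0 ||
                         std::memcmp(data, "GIF89a", 6) == 0))
            return "image/gif";
        if (len >= 5 && std::memcmp(data, "%PDF-", 5) == 0)
            return "application/pdf";
        return "application/octet-stream";   // unknown: fall back to a safe default
    }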

I came up with the original question because I am trying to optimize performance and
ignore irrelevant content by not sending it to the ICAP server.
I think we are stuck with processing all traffic without ignoring any content.

Marcus


Re: processing of ICAP Transfer-Ignore options

2012-04-16 Thread Marcus Kool



On 04/16/2012 11:58 AM, Henrik Nordström wrote:

mån 2012-04-16 klockan 09:40 -0300 skrev Marcus Kool:


The idea itself is good.  The problem is that it is very different
than what the ICAP RFC states.


Is it?

   A list of file extensions that ...

It says file extensions. What is a file?

In my mind the closest to file is what you get on your harddrive when
you download something, and there is no direct map url -  file
extension. The file extension is derived from a combination of
content-type, content-disposition and URL.


I think we agree that file extension is an inappropriate term in
this context.
I agree that it would be more suitable to ignore transfers to the
ICAP server based on Content-Type.

However, looking at the RFC where the example uses asp, bat, exe, com, ole
it seems that the authors of the RFC were thinking of a URL-based suffix,
not content-type.


  I think that the negation of filtering
based on Content-Type should use a new parameter, e.g. Ignore-Content-Type.


That's a useful replacement.

Regards

Marcus


Re: processing of ICAP Transfer-Ignore options

2012-04-15 Thread Marcus Kool



On 04/15/2012 02:33 PM, Henrik Nordström wrote:

lör 2012-04-14 klockan 19:11 -0600 skrev Alex Rousskov:


Sure, I am just trying to find a way to improve compatibility of ICAP
agents, even though the ICAP protocol itself is using wrong concepts
when defining what was meant as a pretty useful feature.


I'd propose the following algorithm:

1. Look up content-type in the mime table and deduce file extension from
there unless the content-type is application/octet-stream. Limited to
mime table expressions on the form \.ext$ where ext do not contain any
special regex patterns.


Are you saying that you want to use the Content-Type header as the
main guide for determining the file extension ?

I think that any change should stay close to the vague definitions
of the ICAP RFC. The text explaining the Transfer-Complete gives an
example of bat which is probable the old Windows .BAT command file
which probably has a Content-Type of text/plain.
IMO using the Content-Type will not have the desired behavior.

At the time that the ICAP RFC was written there were hardly any CGI scripts
and I believe that the intention was that the suffix of the URL was
the file extension. Today, with the CGI parameters one could argue
that they should be stripped before determining the file extension.

Anyway, clarity is the most important thing here and I suggest to
move this discussion to the ICAP discussion forum.

Marcus


2. For application/octet-stream or when the file extension is otherwise
uncertain, identify the filename and derive file extension from there,
in priority order

  a) Content-Disposition filename parameter

  b) URL-path

  c) Last part of query parameters

With some handwaving and juggling to determine priority of b  c...

Regards
Henrik





Re: processing of ICAP Transfer-Ignore options

2012-04-14 Thread Marcus Kool

Yes, the file extension is vague, hence my original question.

However, Squid 3.1 thinks that the file extension is the last bit
of the URL after the last dot (as if the URL had a filename suffix).

It seems logical to strip the CGI parameters before evaluating the
file extension.
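Something along these lines (a hypothetical sketch, not the Squid or icapd
implementation):

    #include <string>

    // Hypothetical: derive the "file extension" of a URL by first stripping
    // the CGI query string ('?...') and any fragment ('#...'), then taking
    // whatever follows the last '.' of the last path segment.
    static std::string urlExtension(const std::string &url)
    {
        const std::string path = url.substr(0, url.find_first_of("?#"));
        const std::string::size_type slash = path.find_last_of('/');
        const std::string::size_type dot = path.find_last_of('.');
        if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
            return "";                   // no suffix in the last path segment
        return path.substr(dot + 1);     // e.g. "mp4" for .../1409303.mp4?p1=...
    }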

I studied a lot of URLs, the file extension and Content-Type
and it turns out that the file extension is far more reliable
as an indicator of the content type than the Content-Type itself.

Best regards,
Marcus

On 04/13/2012 06:42 PM, Henrik Nordström wrote:

fre 2012-04-13 klockan 13:21 -0600 skrev Alex Rousskov:


Yes, but primarily because the extension is not clearly defined. This
is something we can address in ICAP Errata, I guess: Provide a
definition of what should be considered a file extension, with a
disclaimer that not all agents will use the definition provided. It
would not solve all the problems but would be better than doing nothing.


ICAP was designed for HTTP. HTTP does not have file name extensions,
HTTP have content types.

Regards
Henrik





processing of ICAP Transfer-Ignore options

2012-04-13 Thread Marcus Kool


I am testing the ICAP interface of Squid 3.1.18 and noticed the following:

The OPTIONS for RESPMOD is this:
ICAP/1.0 200 OK0d
Methods: RESPMOD0d
Preview: 81920d
Transfer-Preview: *0d
Transfer-Ignore: 
bmp,ico,gif,jpg,jpe,jpeg,png,tiff,crl,avi,divx,flv,h264,mp4,mpg,mpeg,swf,wmv,mp3,wav,ttf,pdf,rar,tar,zip,gz,bz2,jar,js,json,htm,html,dhtml,shtml,css,rss,xml0d
Service: ICAPD 0.9.1 ICAP server by URLfilterDB0d
Service-ID: URLfilterDB0d
ISTag: 4f883424-d44b0d
Connection: keep-alive0d
Encapsulated: null-body=00d
Max-Connections: 5000d
Options-TTL: 6000d
Allow: 2040d
Allow: 2060d
X-Include: X-Client-IP, X-Server-IP, X-Forwarded-For, X-Subscriber-ID, 
X-Client-Username, X-Authenticated-Groups0d

and the Transfer-Ignore processing works as expected for .gif etc. (e.g. the
ICAP server does not receive the previews) _except_ for
http://zzz.com/1409303.mp4?p1=2012-xxx
where the ICAP server unexpectedly receives the preview.

There is no formal definition in the RFC of what a file extension
is. So the question is: is the file extension of
http://zzz.com/1409303.mp4?p1=2012-xxx
mp4 ?

If yes, I will file a bug report.

Marcus



Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-20 Thread Marcus Kool

Well, herd of cats is a term I've seen recently to describe FOSS project dev 
teams. Pretty accurate. You yourself are already part of the team simply by dint of your 
contribution pushing this
discussion far enough forward to get a work plan out of it.

With the work plan it should be easy to make up quotes and try to get 
sponsorship for all or parts of it. Some parts can be crossed between projects 
and prioritized by those of us interested in
general code cleanups or proposed to a wider audience of sponsors than would 
support the feature you are asking for.

Amos


Amos,
I am not a native speaker and do not get the hint 'herd of cats'.

I can contribute in all areas except modifying the code of Squid.

Best regards,
Marcus






Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-19 Thread Marcus Kool



On 03/17/2012 09:06 PM, Henrik Nordström wrote:

lör 2012-03-17 klockan 11:10 -0600 skrev Alex Rousskov:


No, it will not by default. One would have to maintain a white list of
destinations that should not be bumped.


Which you can't for thinks like Skype as they connect pretty much
anywhere (peer-to-peer network).


This is just one example. There is a growing list of services
that use CONNECT: Citrixonline, videoconferencing, and other chat
applications.


Regards
Henrik


Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-19 Thread Marcus Kool



On 03/19/2012 01:48 PM, Henrik Nordström wrote:

mån 2012-03-19 klockan 11:35 -0300 skrev Marcus Kool:

An unfiltered CONNECT (default for Squid) allows (SSH) tunnels.


Squid standard configuration only allows port 443, which restricts this
to those who intentioanlly want to pierce any network usage policy.


I foresee a change. I foresee an increasing desire to be able to
filter everything because of the need to remove the existing holes
in security.


There is undoubtly such environments.

The question is if Squid is the right tool for this, or if it's in the
target for Squid.


This is an important point.

It is the development team who decides which features
will be implemented.  Surely there is some common idea about
which direction Squid will go in, but it is not clear to me.
I read the roadmap but it is sort of a wishlist, and therefore I
started this discussion.
As Alex stated, there is no use in starting work on the filter side of a
pipe filter if there is no Squid developer interested
in doing the work on Squid.

I am not in the position to actively support pipe filtering,
so the only thing that I can do is ask for it.

Best regards
Marcus


Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-17 Thread Marcus Kool


Alex Rousskov wrote:

On 03/16/2012 03:05 PM, Marcus Kool wrote:


How do we go on from here?


I recommend splitting this big problem into several smaller areas:

Tunnel classification: As Henrik noted, Squid should wait for client (or
server!) handshake before starting the SSL handshake with the server.
Waiting for one of the sides to speak first (i.e., before Squid) allows
us to categorize the tunnel intent: SSL, HTTP, Other. This step is
critical for other projects below.


Indeed this step is critical. Squid must not guess (wrongly) and
unintentionally cause problems for applications that use CONNECT.


HTTP tunnel: Either go to tunnel.cc or process almost as a regular
request stream. Make the choice configurable.


Not sure what you mean by HTTP tunnel (see below).


SSL tunnel: Use bump-server-first. Add SNI forwarding support. If SSL
handshake with the server fails (there are many broken and weird servers
out there!), bump-server-first returns a secure error to the client. In
some cases, it may be better to re-tunnel the server end (without
bumping) or just close the client connection immediately. The former
requires serious coding effort; the latter does not, but both are pretty
straightforward.  And make the choice configurable.


Only after detecting SSL and after a successful SSL handshake can Squid
detect what happens inside the SSL-wrapped data stream.
Again Squid needs to monitor the server and the client and detect what
is inside the SSL-wrapped data stream: 'regular HTTP' or 'something else'.
When you refer to an HTTP tunnel, do you mean SSL-wrapped HTTP?
Squid should switch to tunnel mode for an SSL-wrapped non-HTTP stream.


Other tunnel: When a non-HTTP traffic is encountered at the beginning of
a tunnel, switch to the tunneling mode or terminate both connections.
Make the choice configurable.


There are too many applications, and of course a Squid admin wants to block
some and allow others. One switch for all applications seems not very
useful.  Currently, only filters detect the various applications and can do
the selective blocking. Since not all Squid installations have an (ICAP) filter,
it is probably a good thing to have the switch anyway.


Filterable Other tunnels (bumped or not!): Define a protocol and/or API
to adapt tunnel.cc (or similar) I/O. Learn from ICAP mistakes. Implement
the client/hosting side of that protocol/API in Squid. 3rd parties will
implement the service/adapter sides.


Also a must have to satisfy the idea that all data must be filterable.


Did I miss any big cases?

As you can see, all of the above are pretty much independent projects.
Are _you_ interested in all or just some of them?


My point of view is filter based and I think that you could already
read between the lines that I think that Squid should have it all to
make all data filterable.  Filtering is done for security; to block
Skype but allow other safer chat and VOIP applications. To block
HTTPS proxies, to prevent (accidental) leaks of documents to public
document sharing sites. And of course to block viruses.  Filtering
is also done to force employees to pay more attention to their work
and less to the sports comments on the internet, but filtering for
security is more important. As the web changes and more servers
use HTTPS and more applications use CONNECT, I think Squid should
have it all to remain a fully featured and safe web proxy.


Will you do any work on Squid itself or are you looking for a volunteer
on our end? If it is the former, would you like to create dedicated wiki
pages for those projects you are interested in and start nailing down
the details?


I know very little of the internals of Squid and do not have the time
to get to know the code well enough to make these types of changes.

I am willing to write feature pages and assist in writing a detailed document
for the new pipe filter protocol. ufdbGuard is GPLv2 and the
new ICAP filter will also be GPLv2.  The pipe filter module for the server
will also be GPLv2 so that others (I am thinking of antivirus) can benefit
from it.


If you are looking for volunteers to work on the Squid side, then I
would not recommend doing much on your end until you secure at least one
such person. Otherwise, you may end up with a filter that you cannot
attach to Squid.


I do not want to appear to try to push the Squid team to do things.
I know that you all are busy and that new features will have priorities
and will be queued. I can only hope that the development team shares
the same view that to remain a safe proxy, Squid needs the ability to
filter *all* data.

At this moment my #1 priority wish is that Squid 3.1.x or 3.2.x can
be used with the sslBump feature turned on which does not break
Skype and other applications using CONNECT.  Will the new
bump-server-first do this?


Thank you,

Alex.




Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-16 Thread Marcus Kool

There were 4 threads about 'filtering HTTPS' and I will try to
summarise here.

Current situation with Squid 3.1.19:
What happens inside a CONNECT is practically not filterable because
1) sslBump is not used, or
2) sslBump is used and SSL+HTTP can be filtered, but it breaks the
   other data streams for Skype et al.  Using the unsafe options
   'sslproxy_cert_error allow all' and 'sslproxy_flags DONT_VERIFY_PEER'
   to circumvent the latter problem is far from desirable.

The wiki features pages say that Alex Rousskov is working on BumpSslServerFirst
and MimicSslServerCert but unfortunately Alex has not (yet) participated in the
discussion.

What I consider as the desired situation:
*all* traffic will be filterable, since if there is an exception for
one category of data, one can write an application that makes a tunnel
using this particular category of data and hence is able to circumvent
all efforts to filter traffic.

To filter HTTP is trivial. To filter HTTPS there are two options:
1) to filter without sslBump, and then the filter only receives
   CONNECT endpoint:443 on which it has to make a decision to block
   or not.  This cripples the filter since it does not have access to the
   content and in many cases cannot detect which application sends
   what (type of) data.
   An additional drawback is that the connection can be blocked but an
   understandable error message cannot be presented to the end user.
2) use sslBump. The filter will receive CONNECT endpoint:443 as well as
   https://endpoint/path (and content for RESPMOD) for SSL+HTTP based
   connections, so this is optimal for filtering SSL+HTTP connections.
   The discussion was much around what to do with data streams that are not
   SSL+HTTP.  This can be any protocol encapsulated by SSL or simply any
   protocol.

To be able to filter all data, Squid needs a modification to present raw data
about the non-SSL+HTTP data streams to a filter (URL redirector or ICAP).
To keep the discussion focussed on one type of filter I will assume that
an ICAP server is used as the filter.

The ICAP protocol has a considerable overhead (CPU processing) and extending
the ICAP protocol for data stream filtering is not the first choice.
Amos and Henrik were optimistic about implementing a new pipe filter.

The data streams for a bidirectional pipe have a different behavior than
HTTP and SSL+HTTP. Both client and server can send data at any time. And
for some, the server initiates the protocol and for others, the client
initiates.  OpenVPN is a chameleon and can pretend to be an SSL+HTTP server
but is also a VPN server.

In all cases where Squid sends a request to a filter, it would be
a *big* plus if it informed the filter of what it already knows about the
CONNECT endpoint, e.g. whether it uses SSL/TLS or not.

Since sslBump is being rewritten for 3.3 it is a good opportunity
to make Squid suitable for filtering *all* data streams.

The new sslBump flow could be something like this:

A) open socket to server. If error, close socket to client.
B) do the logic for ICAP REQMOD CONNECT endpoint:443
C) start SSL handshake to server and take care of all certificate issues.
   If the SSL handshake fails with a PROTOCOL error, the socket must be closed,
   a new socket must be opened, and Squid will assume that the endpoint
   uses another protocol than SSL. Squid goes into tunnel mode and all
   filtering will be done by the new pipe filter.
   Squid may get a new option to define its behaviour in case the SSL handshake
   fails. The option could be called sslBumpForNoneSSL, with values
   prohibitNoneSSL (terminate connection), passNoneSSL (always allow),
   filterNoneSSL (default value - let new pipe filter decide).
D) Squid now knows that the connection has an SSL/TLS wrapper but does not yet
   know whether HTTP is used inside the wrapper.
   Squid monitors what the client *and* the server send on the pipe. If the
   client sends first and sends a valid HTTP command, Squid assumes that the
   connection has SSL+HTTP.
   If there is no SSL+HTTP Squid goes into tunnel mode and all filtering will be
   done with the new pipe filter.
E) do the normal processing and ICAP REQMOD/RESPMOD for https://endpoint/path

The total work of Squid+filter can be reduced if B) is done after C) since
Squid can inform the filter about the SSL handshake and the filter does
not have to do its own probe.

There was a suggestion for a connection cache which allows it to skip checks
and make assumptions about a new CONNECT to an endpoint that was CONNECTed 
before.

The new pipe filter requires a new protocol yet to be defined.
Squid initially tells the filter what it already knows about the endpoint.
I.e. uses SSL or not, time to CONNECT, endpoint address, cached information.
The Squid pipe sends copies of all data to the filter and the filter can reply
with one of the following: OK (proceed with this data), REPLACE-CONTENT (content
and a flag to optionally also terminate the connection), TERMINATE 

Re: filtering HTTPS/CONNECT (summary and continuation of discussion)

2012-03-16 Thread Marcus Kool


Alex Rousskov wrote:

On 03/16/2012 03:05 PM, Marcus Kool wrote:

There were 4 threads about 'filtering HTTPS' and I will try to
summarise here.

Current situation with Squid 3.1.19:
What happens inside a CONNECT is practically not filterable because
1) sslBump is not used, or
2) sslBump is used and SSL+HTTP can be filtered, but it breaks the
   other data streams for Skype et al.  Using the unsafe options
   'sslproxy_cert_error allow all' and 'sslproxy_flags DONT_VERIFY_PEER'
   to circumvent the latter problem are far from desirable.

The wiki features pages say that Alex Rousskov is working on
BumpSslServerFirst
and MimicSslServerCert but unfortunately Alex has not (yet) participated
in the discussion.


Sorry, I was on a business trip when the discussion started and could
not respond until now (I tried!).


ok, no need to apologise.




To filter HTTP is trivial. To filter HTTPS there are two options:
1) to filter without sslBump and then the filter only receives
   CONNECT endpoint:443 on which it has to make a decision to block
   or not.  This cripples the filter since it does not has access to the
   content and in many cases can not detect which application sends
   what (type of) data.
   An additional drawback is that connection can be blocked but an
   understandable error message cannot be presented to the end user.


I believe this is already supported.


Yes. Technically it works, but the issue of not being able to give
the end user a different error than 'cannot connect to server'
is annoying to users.




2) use sslBump. The filter will receive CONNECT endpoint:443 as well as
   https://endpoint/path; (and content for RESPMOD) for SSL+HTTP based
   connections so this is optimal for filtering SSL+HTTP connections.
   The discussion was much around what to do with data streams that are not
   SSL+HTTP.  This can be any protocol encapsulated by SSL or simply any
   protocol.

To be able to filter all data, Squid needs a modification to present raw
data
about the non-SSL+HTTP data streams to a filter (URL redirector or ICAP).


or eCAP.


I read about eCAP, but when I decided to make a new URL filter
(I already wrote ufdbGuard, a URL redirector), I decided on ICAP
since it is more widespread and eCAP had not yet matured.

My new ICAP server (no better name yet than ufdbicapd) is multithreaded,
loads a 200 MB URL database in memory, and is not that straightforward
to put inside Squid as a loadable module.
I do not want to judge eCAP since I know little about it, also
because there is not that much documentation.
I think I will look at it again to see if a hybrid solution is
feasible.


To keep the discussion focussed on one type of filter I will assume that
an ICAP server is used as the filter.

The ICAP protocol has a considerable overhead (CPU processing) and
extending
the ICAP protocol for data stream filtering is not the first choice.
Amos and Henrik were optimistic about implementing a new pipe filter.

The data streams for a bidirectional pipe have a different behavior than
HTTP and SSL+HTTP. Both client and server can send data at any time. And
for some, the server initiates the protocol and for others, the client
initiates.  OpenVPN is a chameleon and can pretend to be an SSL+HTTP server
but is also a VPN server.

In all cases where Squid sends a request to a filter, it would be
a *big* plus if it informs the filter what it already knows about the
CONNECT endpoint, e.g. whether it has SSL/TLS or not.

Since sslBump is being rewritten for 3.3 it is a good opportunity
to make Squid suitable for filtering *all* data streams.


Sure, although please keep in mind that the bump-server-first and
certificate mimicking code is pretty much complete. We are going through
beta testing and code polishing cycles now. I hope I would not have to
rewrite a lot of stuff that already works!


well, always good to hear that a project is almost done.




The new sslBump flow could be something like this:

A) open socket to server. If error, close socket to client.


If there is an error, bump-ssl-server-first returns an error to the
client, after establishing a secure connection with it. Closing the
connection can sometimes be a good option as well, of course.


Yeah, this depends on the error. When Squid cannot make a connection
to the server, it could simply close the socket to the client.
Just an idea. But doing a full handshake with a client and giving
a user-friendly error message is very nice.



B) do the logic for ICAP REQMOD CONNECT endpoint:443


Bump-ssl-server-first does not change the order of ICAP processing and
server connection establishment. And it would be wrong to change it,
IMO. In other words, your (B) should come before (A) because (B) may
change where we are connecting or even prohibit the CONNECT request
(among other things):

  1. Receive CONNECT.
  2. Authenticate/etc.
  3. Adapt/redirect/etc.
  4. Bump.


You are right. I totally forgot about the REQMOD post-cache vectoring point
and what I

Re: filtering HTTPS

2012-03-14 Thread Marcus Kool



Henrik Nordström wrote:

Tue 2012-03-13 at 19:27 -0300, Marcus Kool wrote:

Squid is not the tool for filtering non-http(s) traffic beyond requested
hostname.

I agree. Squid is not. This task is for the URL rewriters and ICAP servers.
One way or another, Squid should offer all data that passes through it (1)
to a filter.  I like ICAP, but ICAP is designed for HTTP and not HTTPS
and certainly not for non-HTTP, non-HTTPS data streams.


non-HTTP traffic does not fit URLs or ICAP either. How would you map an
SSH session?


Sorry, I know virtually nothing about the internals of Squid, so how
to map it... I don't know.

The only thing that I can say at this moment is that Squid should give a
filter the opportunity to inspect the content. If for whatever reason
Squid cannot provide the content of a data stream, it should at least
signal the filter that it does a CONNECT to a non-SSL+HTTP address
so that the filter can probe/analyse/decide what to do.




A filter pipe is interesting. A question is how to implement it.
ICAP has no support for it and in my opinion ICAP should be extended
to support this. I know it is a long way to extend existing protocols
but maybe it works by just doing it and making it a de facto standard.


ICAP is designed for HTTP and is very message-at-a-time oriented,
separating request and response and seeing them as separate entities.

For HTTP(S) it does support piped operation where the request and response
are being filtered as they are being forwarded. It's only a matter of the
ICAP server starting its response before the whole request/response
has been seen.

But I do not think ICAP is suitable for general data stream
filtering/adaptation. The protocol is simply not designed for it. In a
data stream filter/adaptation you want to operate on the bidirectional
datastream as a whole.

Regards
Henrik





Re: filtering HTTPS

2012-03-14 Thread Marcus Kool



Tsantilas Christos wrote:

On 03/13/2012 05:12 PM, Marcus Kool wrote:


Henrik Nordström wrote:

And if both sides are monitored for traffic then detection does not need
to rely on a timeout. If any message is seen from the server, or if something
that does not look like an SSL hello is seen from the client, then enter tunnel
mode.

There is one 'but' still: non-HTTP protocols over SSL/TLS, not just
CONNECT but actual SSL/TLS. Those need an SSL/TLS tunnel mode where the
application protocol is tunneled between the client and server SSL
connections. And maybe a dynamic ssl-bump blacklist.

Where does the filtering get involved? Also NoneSSL sites (aka
tunnel mode) need to be filtered/blocked and/or scanned for viruses.


Is it a good idea to try filtering any(?) protocol (e.g. Skype, streaming
servers, etc.) using HTTP proxies and the ICAP protocol, which was implemented to
filter HTTP content?


Yes.  Skype is not just a simple chat application. It does file transfers and
remote desktop viewing.  There are lots of sites that block Skype and
allow ebuddy. Others only allow Yahoo IM and block all other chats.
It is not up to us to decide what can be blocked.  That is up to the
administrator of Squid and the filters.

If Squid filters 95% but intentionally does not filter some type
of data, you will have in no time a new application that uses this
unfiltered type of data to build a tunnel circumventing all filters.


An sslbump whitelist is probably desired as well, skipping SSL/TLS
verification if it's already known that the server is an HTTPS server.

A whitelist has a security issue: www.mybank.com can be safe today and
hacked tomorrow.


I agree with Henrik here. The whitelist is a list saying that
sslBump cannot be used for some sites.


There was some confusion about what is meant by 'whitelist'. Another thread
clarified this.
I agree with a cache for already verified endpoints but be careful:
OpenVPN uses a trick to divert HTTPS traffic to a webserver and
the other data streams are used for the VPN.


Skipping certificate verification is unsafe. One should be extremely
careful about skipping it.
A certificate cache seems better: one caches the certificates of
www.mybank.com and on the next CONNECT (the SSL handshake has to be done
anyway) Squid can bypass the certificate checking rules if the sent
certificates were used in previous CONNECTs.


This is a security issue. The server certificate may change for many
reasons, e.g. being considered unsafe because of a bad private/public key. You
should always check the server certificate.


One does not need to re-check if a new connection receives the same
certificates.
I see, for example, thousands of CONNECTs to http://plus.google.com in a short time.
One user for one webpage can have several CONNECTs.
I think it is safe to use a time-limited cache.
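
To make the idea concrete: a minimal sketch of such a time-limited
verification cache is shown below. It is illustrative only and not Squid
code; the endpoint key ("host:port") and the TTL handling are assumptions.

  // illustrative sketch only: remember when an endpoint's certificate chain
  // last verified OK and skip a full re-check while that result is fresh
  #include <ctime>
  #include <map>
  #include <string>

  class CertVerifyCache {
  public:
      explicit CertVerifyCache(time_t ttlSeconds) : ttl(ttlSeconds) {}

      // record a successful verification for this endpoint ("host:port")
      void remember(const std::string &endpoint) {
          verifiedAt[endpoint] = std::time(0);
      }

      // true if the endpoint verified recently enough to skip re-checking
      bool stillFresh(const std::string &endpoint) const {
          std::map<std::string, time_t>::const_iterator it = verifiedAt.find(endpoint);
          return it != verifiedAt.end() && std::time(0) - it->second < ttl;
      }

  private:
      time_t ttl;
      std::map<std::string, time_t> verifiedAt;
  };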


And maybe also a CONNECT cache: so that Squid remembers to go into
tunnel mode directly without trying to do an SSL handshake for every Skype
connection.








Re: filtering HTTPS

2012-03-14 Thread Marcus Kool



Tsantilas Christos wrote:


Issue 1: one cannot block a CONNECT in an elegant way. I.e. a CONNECT
to an undesired site cannot be redirected or anything, since the
application (possibly a browser) wants to do an SSL handshake and if it fails
it displays the vague error 'cannot connect to site www.example.com',
which is indeed vague for an end user who usually only understands
messages like 'you are not authorised to go to www.example.com'.

For true SSL+HTTP (https) sites, issue 1 can be resolved by *not* blocking
the CONNECT and waiting for the next 'GET https://www.example.com/index.html'
and blocking/redirecting this object. Let's call this a 'postponed SSL+HTTP
block'.
But for sites which do not use SSL+HTTP there is no good solution since
Squid and the URL redirector only see a CONNECT and never see a
GET/HEAD/POST.


I think you are describing the Bump-Server-First feature which is
currently under development:
  http://wiki.squid-cache.org/Features/BumpSslServerFirst


I read about this feature before posting. The feature description
only talks about a CONNECT to an SSL+HTTP endpoint while I would like
to extend the scope to a CONNECT to any endpoint.
The number of data protocols that Squid can encounter is infinite:
- SSL+HTTP
- SSL+ANYTHING
- ANYTHING




Issue 2: Skype does not work any more with sslBump. SSH tunnels, VPNs
and other
chat applications also stop working with sslBump since the sslBump feature
does its SSL certificate checking and if this fails, the CONNECT fails.
Using the options 'sslproxy_cert_error allow all' and
'sslproxy_flags DONT_VERIFY_PEER' is not considered useful since they are
truly very unsafe and I recommend never to use them.


A way is to use ACLs to select sites and maybe applications which can
be sslBumped or not.
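
(As an illustration only, a configuration along those lines might look like
the sketch below; it uses the ssl_bump allow/deny form of that era, and the
addresses are placeholders, not a real list of Skype servers.)

  acl skype_login dst 203.0.113.0/24      # placeholder list of known Skype servers
  acl SSL_ports port 443
  ssl_bump deny skype_login               # leave Skype traffic as an opaque tunnel
  ssl_bump allow SSL_ports                # bump (and filter) other CONNECTs to 443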


Eh, yeah. Not so easy. I do like very much that Squid administrators
make these lists. Not all admins have sufficient knowledge or complete
understanding of all pitfalls and risks.
ufdbGuard probes CONNECT endpoints and caches the result.
It has a table of known Skype login servers but there are many
Skype nodes (Skype users) that can only be detected dynamically.
For a CONNECT to a Skype endpoint, ufdbGuard has the knowledge to signal
to Squid whether to sslBump or not. And if the configuration says to block Skype,
ufdbGuard can signal to Squid not to bother with an SSL handshake
and to terminate the connection.
But let's not focus on just Skype; there is a growing number of applications
that use CONNECT. And they may use any protocol.




More background information:
The URL redirector ufdbGuard has a feature to probe HTTPS connections.
It does an SSL handshake; if this works it is followed by 'GET / HTTP/1.0'.
If the SSL handshake does not work it probes for SSH, Skype and other
chat protocols to find out what the application CONNECTs to.
ufdbGuard can block CONNECTs to IP addresses but make exceptions
for the CONNECTs which are used by allowed chat protocols.
SSH and VPNs are blocked by ufdbGuard if the administrator has configured
it to block proxies.
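
A rough manual equivalent of such a probe, using stock OpenSSL rather than
ufdbGuard's own code (the host is a placeholder):

  printf 'GET / HTTP/1.0\r\n\r\n' | openssl s_client -quiet -connect www.example.com:443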

HTTPS is used more and more. Even Google uses it for its search engine.
It is necessary to have an HTTPS proxy and content filtering that work in an
absolutely safe and efficient way.

Proposal:
To have a good combination of web proxy and content filtering
I propose the following:
A) Squid's behaviour is modified for sslBump: after an unsuccessful SSL
handshake, the CONNECT does not fail any more by default.  This is to ensure
that Skype et al. remain functional.


Correct.
This behaviour is already partially implemented under the
Bump-Server-First feature.

I am saying partially because it currently works only with the HTTPS protocol
(it requires that both the client-to-squid and squid-to-server connections
support SSL) but it can easily be extended to support other protocols.
It is easy to extend Bump-Server-First to not initiate an SSL connection
with the client if the server is not an SSL server.

But again, why not use ACLs to avoid applying sslBump to applications
like Skype?


What ACL do you have in mind?




B) Squid gets a new option to define its behaviour in case the SSL handshake
fails. The option could be called sslBumpForNoneSSL with values
prohibitNoneSSL (terminate connection), passNoneSSL (always allow),
filterNoneSSL (default value - let ICAP or the URL rewriter decide).


Yep, looks like you are right here, something like that is required...



C) Squid notifies the URL rewriter and ICAP server about the result of
the SSL handshake. This is to optimise the filters and not do things twice.
Web servers do not like probes and may temporarily block sites that use Squid
if they receive too many probes, so the fewer probes the better.
I.e. the line sent to the URL redirector is extended with a new flag
like SSLhandshake=(verified|noSSL). This should not break existing URL
redirectors since the line already has the variable-length urlgroup and most URL
redirectors will consider the new flag part of the urlgroup.
Probably a few URL 

Re: filtering HTTPS

2012-03-14 Thread Marcus Kool



On 03/14/2012 01:33 AM, Amos Jeffries wrote:

It does. http://www.squid-cache.org/Doc/config/icap_206_enable/

The 206 responses are similar to 204 responses (inside or outside
preview) but also allow modifying the headers or the head of the data.

Data streams come in parts.
Maybe a filter wants to see the first data chunk of the client, followed
by the first data chunk from the server and followed by the second data
chunk
of the client to finally decide: block (close sockets) or say I am not
interested anymore.  So the filter receives all data chunks of the data
stream
until it signals the proxy about its decision. For all chunks, when there
is not yet a decision, the filter needs to respond with something like
Continue.


The ICAP protocol is not able to handle such cases. Just extending the
ICAP protocol is not enough.
Also, my opinion is that an HTTP proxy is not the correct tool to handle this
type of filtering...
Maybe it can be implemented in Squid but it requires a completely new
interface/module to handle this. You cannot just extend the ICAP/eCAP
filtering subsystems.


Yes, I understood from Henrik's reply that his thoughts go to a
new type of data stream filter.

There is no industry standard for filtering data streams.
So there is an important decision to make: extend an existing standard
or make a new protocol that only works between Squid on one hand and
ufdbGuard, my new ICAP server and possibly a few other tools on the other.


Re: filtering HTTPS

2012-03-13 Thread Marcus Kool
 sslBumpForNoneSSL with
values prohibitNoneSSL (terminate connection), passNoneSSL (always
allow), filterNoneSSL (default value - let ICAP or the URL rewriter decide).

C) Squid notifies the URL rewriter and ICAP server about the
result of the SSL handshake. This is to optimise the filters and not
do things twice. Web servers do not like probes and may temporarily
block sites that use Squid if they receive too many probes, so the
fewer probes the better. I.e. the line sent to the URL
redirector is extended with a new flag like
SSLhandshake=(verified|noSSL). This should not break existing URL
redirectors since the line already has the variable-length urlgroup and most
URL redirectors will consider the new flag part of the urlgroup.
Probably a few URL redirectors need a minor modification. For ICAP,
Squid could send a new header called X-Squid-SSLhandshakeResult.

D) squid.conf.documented, the wiki and other documentation are updated to state
that 'sslproxy_flags DONT_VERIFY_PEER' and 'sslproxy_cert_error allow all'
are unsafe and not recommended.

E) the option 'squid-uses-ssl-bump' is
introduced to ufdbGuard. If set to 'yes' it will not verify the use of
proper SSL certificates. If Squid can send the new flag SSLhandshake
(URL redirector) or X-Squid-SSLhandshakeResult (ICAP server), the URL
redirector and ICAP servers can be optimised further.
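
Purely as an illustration of proposal (C), a rewritten helper input line and
an ICAP request could look roughly like this; the exact field layout is an
assumption, since the interface was never specified beyond the flag and
header names above:

  # url_rewrite helper input with the proposed flag appended (hypothetical)
  https://www.example.com/login 192.0.2.10/- jdoe GET myurlgroup SSLhandshake=verified

  # ICAP REQMOD request with the proposed header added (hypothetical)
  REQMOD icap://127.0.0.1:1344/reqmod ICAP/1.0
  Host: 127.0.0.1:1344
  X-Squid-SSLhandshakeResult: verified
  Encapsulated: req-hdr=0, null-body=170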

Marcus Kool






Re: filtering HTTPS

2012-03-13 Thread Marcus Kool


Henrik Nordström wrote:

Tue 2012-03-13 at 12:12 -0300, Marcus Kool wrote:


Where does the filtering get involved? Also NoneSSL sites (aka tunnel mode)
need to be filtered/blocked and/or scanned for viruses.


Squid is not the tool for filtering non-http(s) traffic beyond requested
hostname.


I agree. Squid is not. This task is for the URL rewriters and ICAP servers.
One way or another, Squid should offer all data that passes through it (1)
to a filter.  I like ICAP, but ICAP is designed for HTTP and not HTTPS
and certainly not for non-HTTP, non-HTTPS data streams.

(1) a virus scanner and a URL filter do not need the *whole* data stream.
The first max-64K upload and the first max-64K download are most likely
sufficient to determine what to do, pass or block. The protocol should
have a feature that the filter is able to tell Squid 'Continue with
this data stream, but I am not interested in it any more'.


But it would be trivial to extend tunnel mode with a filter pipe, both
in normal tunnel mode and SSL relay mode (decrypted and encrypted,
tunneling between two SSL connections).


A filter pipe is interesting. A question is how to implement it.
ICAP has no support for it and in my opinion ICAP should be extended
to support this. I know it is a long way to extend existing protocols
but maybe it works by just doing it and making it a de facto standard.

The question is what works best:
A) use extended ICAP for regular HTTP(S) and data streams
B) use ICAP for regular HTTP(S) and a new data stream protocol for data streams


Regards
Henrik


Fwd: subscription

2011-07-05 Thread Marcus Kool

Hi, I would like to subscribe to squid-dev.
I tried to subscribe on June 27 but got no response.

Thanks

Marcus Kool


 Original Message 
From: - Mon Jun 27 16:29:03 2011
Message-ID: 4e08d9fc.5060...@urlfilterdb.com
Date: Mon, 27 Jun 2011 16:29:00 -0300
From: Marcus Kool marcus.k...@urlfilterdb.com
User-Agent: Thunderbird 2.0.0.24 (X11/20110404)
MIME-Version: 1.0
To: squid-dev@squid-cache.org
Subject: subscription


I am Marcus Kool, author of ufdbguardd, a URL filter for Squid,
and author of a recent patch for Squid for Regular Expression Optimisation.
I am also writing a new URL filter based on ICAP and use Squid
for testing.

I would like to be part of squid-dev.  The main reasons for wanting to join
squid-dev are to monitor issues with url_rewriter and ICAP and
to talk directly to the developers of Squid for questions regarding
the ICAP module.  I do not intend to write much code for the Squid
project.

Thanks

Marcus



[PATCH] regular expression optimisation patch for squid 3.1.14

2011-07-05 Thread Marcus Kool

Attached is a patch for optimisation of REs.
This is the second submission of the patch and the comments from
Amos' review are addressed.

This patch is inspired by the work that I did for ufdbGuard and a few emails 
with Amos.

The new code optimises lists of regular expressions.
The optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* -i ... -i options are optimised: the second one is ignored, same for +i

The only modified file is src/acl/RegexData.cc
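
For readers unfamiliar with the transformation, a rough sketch of the joining
step is shown below; it is illustrative only and not the actual RegexData.cc
code (which also handles the -i/+i case-sensitivity switches):

  // illustrative sketch: strip a leading ".*" and join a list of POSIX
  // extended REs into a single alternation that is compiled once
  #include <string>
  #include <vector>

  static std::string
  joinRegexes(const std::vector<std::string> &patterns)
  {
      std::string joined;
      for (size_t i = 0; i < patterns.size(); ++i) {
          std::string re = patterns[i];
          if (re.compare(0, 2, ".*") == 0)   // a leading ".*" adds nothing to an unanchored search
              re.erase(0, 2);
          if (!joined.empty())
              joined += '|';
          joined += '(' + re + ')';          // builds (RE-1)|(RE-2)|...|(RE-n)
      }
      return joined;                         // compile once with regcomp() instead of once per line
  }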

attached are the patch (RegexData.cc.patch) and files for a unit test:
squidtest.conf
re.4lines - used in squidtest.conf; contains REs
re.200lines - used in squidtest.conf; contains REs
unittest_re_optim_wget - script with wget commands to trigger squid to evaluate 
REs

unittest_re_optim_wget contains instructions on how to set up and perform a
unit test

I tried to become a member of the squid-dev mailing list but am not yet one,
so comments should also go to my email address directly.

Marcus Kool



Marcus Kool wrote:


Amos Jeffries wrote:

  Amos Jeffries wrote:
  Hi Marcus,
  Did my audit feedback on this make it to you? I've just noticed my
  mailer has not marked the thread as responded.
 

On 01/07/11 00:52, Marcus Kool wrote:

No, it did not.


Okay. My mailer seems to have screwed up badly. There were a few 
little minor bits.


 * the patch being reversed. Just order the files the other way around 
on next patch.


compileOptimisedREs/compileUnoptimisedREs have duplicate code checking 
for (RElen  BUFSIZ+1) case on the wordlist key. They are already 
checked for that criteria by aclParseRegexList before adding.


debugs() WARNING to the user should be DBG_IMPORTANT in the second 
parameter.


The major problem: debugs() needs DBG_CRITICAL in parameter #2 and 
ERROR: instead of the function name.


The 100 messages only need to be shown when checking the config for 
problems. ie.

  debugs(28, (opt_parse_cfg_only?DBG_IMPORTANT:2), 


Thanks for the feedback, I will make a new patch.  I was not able to
do it in time to be included in the next releases, but it will be done soon.



No one else has mentioned anything, so with these style tweaks it can go 
in. The next releases are planned to happen tomorrow. If you want to 
submit a new patch in the next 12hrs I'll use that.




I tried to subscribe to the squid-dev mailing list the other day
but got no reply yet. But in the list archives I did not see any
response/feedback either.


I saw that arrive. So whoever was moderating this week appears to have 
okayed you for posting. If you went through the regular ezmail 
subscription process (mail to squid-dev-subscr...@squid-cache.org) you 
should have been receiving list mail for a few days?


I have not yet received emails from squid-dev.  Should I resend
the application?


Amos


Marcus



patch-RE-optimisation-squid-3-1-14.tar.gz
Description: GNU Zip compressed data


subscription

2011-06-27 Thread Marcus Kool

I am Marcus Kool, author of ufdbguardd, a URL filter for Squid,
and author of a recent patch for Squid for Regular Expression Optimisation.
I am also writing a new URL filter based on ICAP and use Squid
for testing.

I would like to be part of squid-dev.  The main reasons for wanting to join
squid-dev are to monitor issues with url_rewriter and ICAP and
to talk directly to the developers of Squid for questions regarding
the ICAP module.  I do not intend to write much code for the Squid
project.

Thanks

Marcus


[PATCH] regular expression optimisation patch for squid 3.1.12

2011-06-02 Thread Marcus Kool

This patch is inspired by the work that I did for ufdbGuard and a few emails 
with Amos.

Attached is a patch for squid 3.1.12 to optimise lists of regular expressions.
The optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* -i ... -i options are optimised: the second one is ignored, same for +i

The only modified file is src/acl/RegexData.cc

attached are the patch (RegexData.cc.patch) and files for a unit test:
squidtest.conf
re.4lines   - used in squidtest.conf; contains REs
re.200lines - used in squidtest.conf; contains REs
unittest_re_optim_wget - script with wget commands to trigger squid to evaluate 
REs

unittest_re_optim_wget contains instructions on how to set up and perform a
unit test

I am not subscribed to the squid-dev mailing list.
Please reply to my email address also.

Marcus Kool
marcus.k...@urlfilterdb.com

Amos Jeffries wrote:

On 01/06/11 09:18, Marcus Kool wrote:

Hi,

after some emails with Amos I agreed to make a patch for
squid to optimise lists of regular expressions. The
optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: 
(RE-1)|(RE-2)|...|(RE-n)

* -i ... -i options are optimised: the second one is ignored, same for +i

The only modified file is src/acl/RegexData.cc

My question for submitting the patch:
how do you want the patch? Is the output of the following command OK?
LC_ALL=C TZ=UTC0 diff -Naur src/acl/RegexData.cc 
src/acl/RegexData.cc.orig


That should be fine.



I used a test set: a squid.conf, two files with regular expressions
and a file with wget commands to test URLs.
Do you want/need these?


That would be helpful for unit-tests. So yes, thank you.



How to post the patch?


As an attachment please, with a [PATCH] subject prefix and a description 
suitable for a commit message. From an email address you are happy adding 
permanently to the credits records.




I am not subscribed to the squid-dev mailing list. Please reply
to my email address also.

Thanks

Marcus Kool


Amos
abc.com
urlfilterdb.com/secret
xs4all.nl/verysecret
cnn.com/public
-i
abc.example.com/scripts/cgi-bin/40example.cgi
-i
foo\.example\.com/html/index\.php
-i
foo\.example\.com/html/asfsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
01john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/01example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/skdfhsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
02john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/02example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/234second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
03john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/03example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfsaassecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
04john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/04example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
05john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/05example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asfkdhsadsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
06john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/06example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/2345234nnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
07john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/07example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asd0second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
08john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/08example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdgw1second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
09john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/09example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/safn2nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
10john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/10example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345n2second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
11john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin

regular expression optimisation patch

2011-05-31 Thread Marcus Kool

Hi,

after some emails with Amos I agreed to make a patch for
squid to optimise lists of regular expressions.  The
optimisations are:
*  initial .* is stripped
*  RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
*  -i ... -i options are optimised: the second one is ignored, same for +i

The only modified file is src/acl/RegexData.cc

My question for submitting the patch:
how do you want the patch? Is the output of the following command OK?
LC_ALL=C TZ=UTC0 diff -Naur src/acl/RegexData.cc src/acl/RegexData.cc.orig

I used a test set: a squid.conf, two files with regular expressions
and a file with wget commands to test URLs.
Do you want/need these?

How to post the patch?

I am not subscribed to the squid-dev mailing list.  Please reply
to my email address also.

Thanks

Marcus Kool


debugging Squid ICAP interface

2010-10-12 Thread Marcus Kool

Hello,

My name is Marcus Kool, author of ufdbGuard - a URL redirector for Squid,
and I have started development of an ICAP-based URL filter.

As with all new developments, the code of the ICAP server undoubtedly
has some bugs that need to be investigated and fixed.
I have also seen Squid behaving unexpectedly (2-minute timeouts
where it seems not to handle any request from a browser, and an assertion failure).

I have various observations and questions about the Squid ICAP interface
and would like to discuss these with the persons who wrote or know much about
the ICAP client part of Squid.
I would like to know with whom I can discuss this and which mailing list to use.

Thanks,

Marcus