[squid-users] squid 3.1 and HTTPS (and probably ipv6)

2012-03-13 Thread Eugene M. Zheganin

Hi.

I'm using squid 3.1.x on FreeBSD. Squid is built from ports.

Recently I was hit by a weird issue: my users cannot open HTTPS pages. 
This is not something constant - if they hit the F5 button in browser, 
the pages are loading, sometimes showing the message like 'Unable to 
connect. Firefox can't establish a connection to the server at 
access.ripe.net.' (for example. most of them are using FF). In the same 
time plain HTTP pages are working fine.


I did some investigation and it appears like squid really thinks it 
cannot connect to HTTPS-enabled web server:


===Cut===
2012/03/13 14:08:39.661| ACL::ChecklistMatches: result for 'all' is 1
2012/03/13 14:08:39.661| ACLList::matches: result is true
2012/03/13 14:08:39.661| aclmatchAclList: 0x285e4810 returning true (AND 
list satisfied)
2012/03/13 14:08:39.661| ACLChecklist::markFinished: 0x285e4810 
checklist processing finished
2012/03/13 14:08:39.661| ACLChecklist::check: 0x285e4810 match found, 
calling back with 1

2012/03/13 14:08:39.661| ACLChecklist::checkCallback: 0x285e4810 answer=1
2012/03/13 14:08:39.661| peerCheckAlwaysDirectDone: 1
2012/03/13 14:08:39.661| peerSelectFoo: 'CONNECT access.ripe.net'
2012/03/13 14:08:39.661| peerSelectFoo: direct = DIRECT_YES
2012/03/13 14:08:39.661| The AsyncCall SomeCommConnectHandler 
constructed, this=0x286e6740 [call1916]
2012/03/13 14:08:39.661| commConnectStart: FD 14, cb 0x286e6740*1, 
access.ripe.net:443
2012/03/13 14:08:39.661| The AsyncCall SomeCloseHandler constructed, 
this=0x2956c2c0 [call1917]

2012/03/13 14:08:39.661| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.661| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| ipcacheMarkBadAddr: access.ripe.net 
[2001:67c:2e8:22::c100:685]:443
2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at 
193.0.6.133 (2 of 2)

2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : family=28
2012/03/13 14:08:39.662| FilledChecklist.cc(168) ~ACLFilledChecklist: 
ACLFilledChecklist destroyed 0x285e4810

2012/03/13 14:08:39.662| ACLChecklist::~ACLChecklist: destroyed 0x285e4810
2012/03/13 14:08:39.662| FilledChecklist.cc(168) ~ACLFilledChecklist: 
ACLFilledChecklist destroyed 0x285e4910

2012/03/13 14:08:39.662| ACLChecklist::~ACLChecklist: destroyed 0x285e4910
2012/03/13 14:08:39.662| The AsyncCall SomeCommReadHandler constructed, 
this=0x28ce9100 [call1918]
2012/03/13 14:08:39.662| leaving SomeCommReadHandler(FD 150, 
data=0x286b6710, size=4, buf=0x28d1e000)

2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : family=28
2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at 
193.0.6.133 (2 of 2)

2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| ipcacheMarkAllGood: Changing ALL 
access.ripe.net addrs to OK (1/2 bad)

2012/03/13 14:08:39.662| commConnectCallback: FD 14
2012/03/13 14:08:39.662| comm.cc(1195) commSetTimeout: FD 14 timeout -1
2012/03/13 14:08:39.662| comm.cc(1206) commSetTimeout: FD 14 timeout -1
2012/03/13 14:08:39.662| comm.cc(934) will call 
SomeCommConnectHandler(FD 14, errno=22, flag=-8, data=0x28f6bdd0, ) 
[call1916]

2012/03/13 14:08:39.662| commConnectFree: FD 14
2012/03/13 14:08:39.662| entering SomeCommConnectHandler(FD 14, 
errno=22, flag=-8, data=0x28f6bdd0, )
2012/03/13 14:08:39.662| AsyncCall.cc(32) make: make call 
SomeCommConnectHandler [call1916]

2012/03/13 14:08:39.662| errorSend: FD 12, err=0x28f995d0
2012/03/13 14:08:39.662| errorpage.cc(1051) BuildContent: No existing 
error page language negotiated for ERR_CONNECT_FAIL. Using default error 
file.

===Cut==

But why ? I did some telnetting from this server to the 
access.ripe.net:443, and it succeeded like 10 from 10 times (squid error 
rate is far more frequent). The only thing that bothers me is that 
telnet also first tries ipv6 too, but then switches to the ipv4, and 
connects.


Now some suggestions (probably a shot in the dark). This only happens on 
an ipv6-enabled machines, but without actual ipv6 connectivity (no ipv6 
default route or no public ipv6 address, for example I have unique-local 
addresses for the testing purposes). In the same time this issue can be 
easily solved by restoring the ipv6 connectivity to the outer world. So, 
can it be some dual-stack behavior bug ? Or is it 'by design' ? Do I 
need to report it ?


Thanks.
Eugene.


Re: [squid-users] squid 3.1 and HTTPS (and probably ipv6)

2012-03-13 Thread Amos Jeffries

On 13.03.2012 22:10, Eugene M. Zheganin wrote:

Hi.

I'm using squid 3.1.x on FreeBSD. Squid is built from ports.

Recently I was hit by a weird issue: my users cannot open HTTPS
pages. This is not something constant - if they hit the F5 button in
browser, the pages are loading, sometimes showing the message like
'Unable to connect. Firefox can't establish a connection to the 
server

at access.ripe.net.' (for example. most of them are using FF). In the
same time plain HTTP pages are working fine.

I did some investigation and it appears like squid really thinks it
cannot connect to HTTPS-enabled web server:




As you guessed this does seem to be a stack issue. Dual-stack systems 
can be configured to operate as hybrid stacks or as split stacks (two 
distinct socket handling paths). Recently there has been a trend away 
from the simpler hybrid stacks towards split stacks.


Squid-3.1 was written for hybrid stacks with v4-mapping ability. When 
run on stack without mapping (split) it cannot reset the FD protocol to 
switch stack types. Workaround/Support for split stacks has been added 
incrementally across the 3.1 series, with some of the deeper changes 
only in 3.2.




===Cut===

snip
2012/03/13 14:08:39.661| ipcache_nbgethostbyname: HIT for 
'access.ripe.net'


Found the site IPs.

assuming: connect to the first one (IPv6).


2012/03/13 14:08:39.662| ipcacheMarkBadAddr: access.ripe.net
[2001:67c:2e8:22::c100:685]:443


 Didn't work. Mark it bad.


2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at
193.0.6.133 (2 of 2)
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : 
family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : 
family=28


Reset the socket FD to convert for the new IP (v4).

assuming: socket still IPv6 and trying to use for IPv4?
assuming: connect to this IP also failed.

2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 
'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 
'access.ripe.net'
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : 
family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14-16 : 
family=28


 Reset the socket FD (again).. Bit strange that this is still 14-16.


2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at
193.0.6.133 (2 of 2)
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 
'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 
'access.ripe.net'

2012/03/13 14:08:39.662| ipcacheMarkAllGood: Changing ALL
access.ripe.net addrs to OK (1/2 bad)


Failed a second time. tries  number of IPs (huh? 3 or 2 tries?)

Instead of retrying yet again, cycle the IPs ...


2012/03/13 14:08:39.662| errorpage.cc(1051) BuildContent: No existing
error page language negotiated for ERR_CONNECT_FAIL. Using default
error file.


... and respond with error.


===Cut==

But why ? I did some telnetting from this server to the
access.ripe.net:443, and it succeeded like 10 from 10 times (squid
error rate is far more frequent). The only thing that bothers me is
that telnet also first tries ipv6 too, but then switches to the ipv4,
and connects.

Now some suggestions (probably a shot in the dark). This only happens
on an ipv6-enabled machines, but without actual ipv6 connectivity (no
ipv6 default route or no public ipv6 address, for example I have
unique-local addresses for the testing purposes). In the same time
this issue can be easily solved by restoring the ipv6 connectivity to
the outer world. So, can it be some dual-stack behavior bug ? Or is 
it

'by design' ? Do I need to report it ?



Squid opens an IPv6 socket by default, attempts the IPv6 destination 
(route down, IPv6 socket). Fails. Then attempts to reset the socket 
protocol family and contact the IPv4 destination (route fine, IPv6 
socket [oops!]).
You can avoid this in 3.1 by enabling v4-mapping capability in your 
kernel or using tcp_outgoing_address 0.0.0.0 to force the sockets to 
be IPv4-only from the start. 3.2 series has better split-stack support 
so should have this behaviour problem fixed now.



Amos