Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Steven


On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> Sat 2006-07-29 at 23:16 +0800, Steven wrote:
> 
> > I was seeing the msgrcv() calls while running strace, but it wasn't in
> > the same loop as reported in the bug.  Looks like I just found another bug
> > while trying to reproduce this one :)
> 
> I was not aware there were msgrcv() calls in pthreads.
> 
> We don't have a backtrace in the bug, so it could be the same and I was
> chasing down the wrong path...
> 
> Guess we have to wait for Ralf to answer about the details of his setup.


I have just found another bug.  I have ended up in a situation where
squid has not started the diskd processes (because the binary is called
diskd-daemon, not diskd_daemon).  Squid is not serving any web pages
and will not stop because of this, but it is stuck in a loop doing the
following:

epoll_wait(3, {}, 256, 10)  = 0
gettimeofday({1154236711, 528040}, NULL) = 0
msgrcv(1409029, {0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...}, 32, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
epoll_wait(3, {}, 256, 10)  = 0


What would happen if the diskd process stopped too early on a busy
system?  Is it possible that the diskd_daemon processes can stop without
processing all requests (which would leave the main squid process in an
endless loop)?
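
To make the question concrete, here is a minimal sketch of that situation, assuming a System V message queue that nobody writes to (as when the helper never started or exited early). The ReplyMsg layout, the `away` counter argument and both function names are hypothetical and are not Squid's actual diskd code; the poll limit exists only so the sketch terminates.

```cpp
#include <cerrno>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

// Hypothetical reply layout; Squid's real diskd message struct differs.
struct ReplyMsg {
    long mtype;
    char payload[32];
};

// Poll the queue without blocking until no replies are outstanding or
// maxPolls attempts have been made. Returns the replies still outstanding.
int drainReplies(int msqid, int away, int maxPolls)
{
    ReplyMsg msg;
    for (int i = 0; i < maxPolls && away > 0; ++i) {
        ssize_t n = msgrcv(msqid, &msg, sizeof(msg.payload), 0, IPC_NOWAIT);
        if (n >= 0) {
            --away;          // a reply arrived
        } else if (errno != ENOMSG) {
            break;           // real error (for example, queue removed)
        }
        // ENOMSG: nothing queued; an event loop would do other work here
    }
    return away;
}

// Create a private queue that no helper ever writes to and try to drain it.
// Returns -1 if the queue cannot be created.
int outstandingWithoutHelper(int away, int maxPolls)
{
    int msqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (msqid < 0)
        return -1;
    int left = drainReplies(msqid, away, maxPolls);
    msgctl(msqid, IPC_RMID, nullptr);   // remove the queue again
    return left;
}
```

With no writer on the queue, every msgrcv() call fails with ENOMSG and the outstanding count never drops, which matches the strace output above. Without the poll limit the loop would not end.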

Steven



Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Steven


On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> Sat 2006-07-29 at 23:16 +0800, Steven wrote:
> 
> > I was seeing the msgrcv() calls while running strace, but it wasn't in
> > the same loop as reported in the bug.  Looks like I just found another bug
> > while trying to reproduce this one :)
> 
> I was not aware there were msgrcv() calls in pthreads.

I had a COSS + diskd setup.  The msgrcv() syscalls were coming from the
diskd cache_dirs.  They were happening every 10ms, but there was a call to
epoll_wait() in between each msgrcv(), so it's not the same bug.

> We don't have a backtrace in the bug, so it could be the same and I was
> chasing down the wrong path...
> 
> Guess we have to wait for Ralf to answer about the details of his setup.

I'm going to compile the same version of squid, set it up on Debian 
with only diskd cache_dirs, and see if I can reproduce.  There are 2
possibilities that I can think of.  One is that diskdinfo->away is not
being decremented every time (or is being incorrectly incremented), so
squid is waiting for replies to messages that have not actually been 
sent.  The other is that the diskd processes have stopped
due to the reconfigure signal, but squid is still waiting for them to send a
message.

Either way, it may only happen on a loaded system (which may make it
harder to reproduce on a test system).  I'll find this out shortly.

Steven



How test HTCP ?

2006-07-29 Thread Guido Serassio

Hi,

I want to test the forward port of HTCP changes from 2.6 to 3.0, but 
I don't know how to check all the HTCP functionality.


Any suggestions?

Regards

Guido



-

Guido Serassio
Acme Consulting S.r.l. - Microsoft Certified Partner
Via Lucia Savarino, 1   10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135  Fax. : +39.011.9781115
Email: [EMAIL PROTECTED]
WWW: http://www.acmeconsulting.it/



Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 23:16 +0800, Steven wrote:

> I was seeing the msgrcv() calls while running strace, but it wasn't in
> the same loop as reported in the bug.  Looks like I just found another bug
> while trying to reproduce this one :)

I was not aware there were msgrcv() calls in pthreads.

We don't have a backtrace in the bug, so it could be the same and I was
chasing down the wrong path...

Guess we have to wait for Ralf to answer about the details of his setup.

Regards
Henrik


signature.asc
Description: This is a digitally signed message part


Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Steven


On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> Sat 2006-07-29 at 18:05 +0800, Steven wrote:
> 
> > I could reproduce the bug if I had a COSS cache_dir enabled without any
> > aufs cache_dirs.  I've updated the bug with a patch to fix this scenario.
> 
> I think the COSS issue is separate. Based on your patch that problem
> should be seen immediately on startup, and not after a "squid -k
> rotate".
> 
> Also are you sure the symptoms are really the same? In bug #1703 Squid
> seems to be stuck on calling msgrcv() repeatedly.
> 

Hmm, you're right.  I am hitting a similar bug.  Without the patch I
attached to the bug, the COSS code submits an aio request, but because no
aio threads have started the request never finishes.  This does not cause
a 100% CPU load condition on startup. When squid tries to shut down
nicely, or rotate logs, it gets stuck in the following code

squidaio_shutdown():

    /* This is the same as in squidaio_sync */
    do {
        squidaio_poll_queues();
    } while (request_queue_len > 0);


This is because no threads will ever complete the request.
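
A simplified model of that drain loop, as a sketch only: AioQueue, pollQueues and drainWithLimit are hypothetical stand-ins and not the real squidaio structures. The pass limit is added purely so the no-worker case can be observed instead of spinning forever.

```cpp
#include <cstddef>

// Simplified stand-in for the async I/O queue; not Squid's data structures.
struct AioQueue {
    std::size_t request_queue_len; // requests submitted but not yet completed
    std::size_t worker_threads;    // threads available to complete them
};

// One polling pass: each worker thread completes at most one request.
void pollQueues(AioQueue &q)
{
    std::size_t done = q.worker_threads < q.request_queue_len
                           ? q.worker_threads
                           : q.request_queue_len;
    q.request_queue_len -= done;
}

// Same shape as the do/while loop above, but bounded by maxPasses.
// Returns true if the queue drained completely.
bool drainWithLimit(AioQueue &q, int maxPasses)
{
    int passes = 0;
    do {
        pollQueues(q);
        ++passes;
    } while (q.request_queue_len > 0 && passes < maxPasses);
    return q.request_queue_len == 0;
}
```

With at least one worker the queue drains in a few passes. With zero workers the queue length never changes, so the unbounded version of the loop cannot exit.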



I was seeing the msgrcv() calls while running strace, but it wasn't in
the same loop as reported in the bug.  Looks like I just found another bug
while trying to reproduce this one :)

I'll try again and see if I can reproduce this bug here tomorrow.

Steven



Re: [squid-users] tproxy2 patch for squid3

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 17:05 +0200, Jan Engelhardt wrote:

> >The relevant parts of the code to fix this are in FwdState::pconnPush and
> >FwdState::connectStart fwdPconnPool->pop().
> 
> What would I have to add?

You would need to extend the key used in these functions with at least
the source IP of the client (== source IP of connection) when tproxied,
and 0.0.0.0 otherwise.
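
As a rough sketch of such a key, assuming the pool is looked up by a text key: pconnKey, its parameters and the key format are all hypothetical and do not reflect the actual pconn code in Squid.

```cpp
#include <string>

// Hypothetical helper: builds the lookup key for the idle-connection pool.
// clientIp is the original client's source address in dotted-quad form.
// Connections that are not tproxied all share the 0.0.0.0 placeholder.
std::string pconnKey(const std::string &host, unsigned short port,
                     bool tproxied, const std::string &clientIp)
{
    const std::string src = tproxied ? clientIp : std::string("0.0.0.0");
    return host + ":" + std::to_string(port) + "/" + src;
}
```

Two tproxied clients talking to the same server then get different keys, so neither can pick up the other's idle connection, while non-tproxied traffic keeps sharing connections as before.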

Regards
Henrik




Re: [squid-users] tproxy2 patch for squid3

2006-07-29 Thread Jan Engelhardt

>> Regular client-side transparent proxying is easily accomplished by 
>> redirecting network traffic using -j DNAT, -j REDIRECT, or -j TPROXY (I do 
>> not know why this seems needed). However, server-side transparency requires 
>> a little more kick.
>> https://lists.balabit.hu/pipermail/tproxy/2006-July/000273.html
>> 
>> This patch actually brewed in my homemade version of squid3 and worked long 
>> before tproxy even hit the squid2.6 scene. CAP_NET_ADMIN must be accounted 
>> for by the user, and in my case, is easily done through the MultiAdmin linux 
>> kernel module.
>
>Comments:
>
>Your patch does not handle persistent connections.

I am not very familiar with the squid code. I was happy to find the function 
where connection setup is done at all, since squid is asynchronous.

>If there are multiple
>clients talking to the same server their requests may get intermixed, no
>longer keeping the source IP binding.

If multiple client-proxy streams are multiplexed into one proxy-webserver 
stream, we anyhow have a hard time binding to all addresses at once.
Does this explain why some client-proxy connections time out after 30 
seconds?

>Why the commConnectStart2 function instead of extending
>commConnectStart?

Other places where commConnectStart may be called from would be 
falsely tproxified, that was the idea.


Yes, it's more of a hack than a clean implementation. But it did 
what I needed and, except for some connections timing out every now and then 
(about 1% of all connections), it works.



Jan Engelhardt
-- 


Re: [squid-users] tproxy2 patch for squid3

2006-07-29 Thread Jan Engelhardt
>> If multiple client-proxy streams are multiplexed into one proxy-webserver 
>> stream, we anyhow have a hard time binding to all addresses at once.
>
>The problem I mentioned is more like this:
>
>1. Client A makes a request, connection to web server is created and
>tproxied as A. It receives response from server. The proxy->webserver
>connection is then released into the idle persistent connections pool.
>
>2. Client B makes a request to the same server. Squid finds the idle
>persistent connection and reuses it, avoiding having to set up a new
>connection. This will therefore use the address of A.

Yes, that is what I meant by multiplexed: multiple clients use the same 
webserver connection.

>And to make the situation slightly more complex further illustrating the
>problem:
>
>3. [...]
>4. [...]

>The relevant parts of the code to fix this are in FwdState::pconnPush and
>FwdState::connectStart fwdPconnPool->pop().

What would I have to add?



Jan Engelhardt
-- 


Re: [squid-users] tproxy2 patch for squid3

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 15:47 +0200, Jan Engelhardt wrote:

> If multiple client-proxy streams are multiplexed into one proxy-webserver 
> stream, we anyhow have a hard time binding to all addresses at once.

???

The binding is only done on connection setup, not per request. It's TCP
so it can't be done in any other way.

> Does this explain why some client-proxy connections time out after 30 
> seconds?

No. That's something else entirely.

The problem I mentioned is more like this:

1. Client A makes a request, connection to web server is created and
tproxied as A. It receives response from server. The proxy->webserver
connection is then released into the idle persistent connections pool.

2. Client B makes a request to the same server. Squid finds the idle
persistent connection and reuses it, avoiding having to set up a new
connection. This will therefore use the address of A.


And to make the situation slightly more complex further illustrating the
problem:


3. Client B sends another request while waiting for the response on the
first. This gets a new connection (no idle available) and is tproxied as
B. Response received.

4. Client A now sends another request before B has finished in 2. Squid
finds the idle connection set up in 3 and A's request is now sent as if
it came from B.


The relevant parts of the code to fix this are in FwdState::pconnPush and
FwdState::connectStart fwdPconnPool->pop().
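
Steps 1 and 2 can be replayed with a toy pool, sketched here under the assumption that idle connections are stored in a map keyed by server name. IdlePool and sourceSeenForB are illustrative names only and do not correspond to Squid's FwdState or pconn classes.

```cpp
#include <map>
#include <string>

// Toy idle-connection pool. Each idle connection remembers which client
// address it was tproxied as.
class IdlePool {
public:
    explicit IdlePool(bool keyByClient) : keyByClient_(keyByClient) {}

    // Return a finished server connection to the pool.
    void push(const std::string &server, const std::string &boundAs)
    {
        idle_.emplace(key(server, boundAs), boundAs);
    }

    // Fetch a connection for `client` and return the source address the
    // request will appear to come from. Opens a new one if none is idle.
    std::string pop(const std::string &server, const std::string &client)
    {
        auto it = idle_.find(key(server, client));
        if (it == idle_.end())
            return client;          // new connection, tproxied as client
        std::string boundAs = it->second;
        idle_.erase(it);
        return boundAs;             // reused connection keeps its old binding
    }

private:
    std::string key(const std::string &server, const std::string &client) const
    {
        return keyByClient_ ? server + "/" + client : server;
    }

    bool keyByClient_;
    std::multimap<std::string, std::string> idle_;
};

// Replays steps 1 and 2: A's connection goes idle, then B asks for the
// same server. Returns the source address the server sees for B.
std::string sourceSeenForB(bool keyByClient)
{
    IdlePool pool(keyByClient);
    pool.push("www.example.com:80", pool.pop("www.example.com:80", "A"));
    return pool.pop("www.example.com:80", "B");
}
```

Keyed by server only, B's request goes out on the connection bound as A. Once the client address is part of the key, B misses the idle connection and gets a new one bound as B.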

> Other places where commConnectStart may be called from would be 
> falsely tproxified, that was the idea.

Easily handled, as the src argument then defaults to NULL in your
prototype.

Regards
Henrik




Re: [squid-users] tproxy2 patch for squid3

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 09:44 +0200, Jan Engelhardt wrote:
> Hello,
> 
> 
> Regular client-side transparent proxying is easily accomplished by 
> redirecting network traffic using -j DNAT, -j REDIRECT, or -j TPROXY (I do 
> not know why this seems needed). However, server-side transparency requires 
> a little more kick.
> https://lists.balabit.hu/pipermail/tproxy/2006-July/000273.html
> 
> This patch actually brewed in my homemade version of squid3 and worked long 
> before tproxy even hit the squid2.6 scene. CAP_NET_ADMIN must be accounted for
> by the user, and in my case, is easily done through the MultiAdmin linux 
> kernel module.

Comments:

Your patch does not handle persistent connections. If there are multiple
clients talking to the same server their requests may get intermixed, no
longer keeping the source IP binding.

Why the commConnectStart2 function instead of extending
commConnectStart?


Regards
Henrik

> 
> diff --fast -Ndpru squid-3.0.PRE4-20060727~/src/cf.data.pre squid-3.0.PRE4-20060727/src/cf.data.pre
> --- squid-3.0.PRE4-20060727~/src/cf.data.pre  2006-07-02 18:53:46.0 +0200
> +++ squid-3.0.PRE4-20060727/src/cf.data.pre   2006-07-28 15:56:59.629577000 +0200
> @@ -2852,6 +2852,16 @@ DOC_START
>   the correct result.
>  DOC_END
>  
> +NAME: tproxy
> +TYPE: onoff
> +DEFAULT: off
> +LOC: Config.onoff.tproxy
> +DOC_START
> + If you have Linux with iptables and TPROXY2 support, you can enable
> + this option to have SQUID make outgoing connections using the original
> + IP address of the client.
> +DOC_END
> +
>  NAME: tcp_outgoing_tos tcp_outgoing_ds tcp_outgoing_dscp
>  TYPE: acl_tos
>  DEFAULT: none
> diff --fast -Ndpru squid-3.0.PRE4-20060727~/src/comm.cc squid-3.0.PRE4-20060727/src/comm.cc
> --- squid-3.0.PRE4-20060727~/src/comm.cc  2006-05-30 23:15:58.0 +0200
> +++ squid-3.0.PRE4-20060727/src/comm.cc   2006-07-28 15:57:02.299577000 +0200
> @@ -39,8 +39,10 @@
>  #include "StoreIOBuffer.h"
>  #include "comm.h"
>  #include "fde.h"
> +#include "forward.h"
>  #include "CommIO.h"
>  #include "ConnectionDetail.h"
> +#include "HttpRequest.h"
>  #include "MemBuf.h"
>  #include "pconn.h"
>  #include "SquidTime.h"
> @@ -52,6 +54,7 @@
>  #include 
>  #endif
>  
> +#include "ip_tproxy.h"
>  
>  class ConnectStateData
>  {
> @@ -66,7 +69,7 @@ public:
>  char *host;
>  u_short port;
>  
> -struct sockaddr_in S;
> +struct sockaddr_in S, src_addr;
>  CallBack callback;
>  
>  struct IN_ADDR in_addr;
> @@ -1150,6 +1153,26 @@ ConnectStateData::operator delete (void 
>  cbdataFree(address);
>  }
>  
> +void commConnectStart2(int fd, const char *host, u_short port, CNCB *callback,
> + FwdState *fs)
> +{
> +ConnectStateData *cs;
> +
> +cs = new ConnectStateData;
> +cs->fd = fd;
> +cs->host = xstrdup(host);
> +cs->port = port;
> +cs->callback = CallBack(callback, fs);
> +if(fs->request != NULL) {
> +cs->src_addr.sin_addr = fs->request->client_addr;
> +cs->src_addr.sin_port = htons(fs->request->client_port);
> +} else {
> +memset(&cs->src_addr, 0, sizeof(cs->src_addr));
> +}
> +comm_add_close_handler(fd, commConnectFree, cs);
> +ipcache_nbgethostbyname(host, commConnectDnsHandle, cs);
> +}
> +
>  void
>  commConnectStart(int fd, const char *host, u_short port, CNCB * callback, void *data)
>  {
> @@ -1353,7 +1376,7 @@ ConnectStateData::connect()
>  if (S.sin_addr.s_addr == 0)
>  defaults();
>  
> -switch (comm_connect_addr(fd, &S)) {
> +switch (comm_connect_addr(fd, &S, &src_addr)) {
>  
>  case COMM_INPROGRESS:
>  debug(5, 5) ("ConnectStateData::connect: FD %d: COMM_INPROGRESS\n", fd);
> @@ -1406,9 +1429,45 @@ commSetTimeout(int fd, int timeout, PF *
>  return F->timeout;
>  }
>  
> -int
> +static void do_tproxy(int sock, const struct sockaddr_in *src,
> + const struct sockaddr_in *dest)
> +{
> +struct in_tproxy itp;
> +int ret;
>  
> -comm_connect_addr(int sock, const struct sockaddr_in *address)
> +memset(&itp, 0, sizeof(itp));
> +itp.v.addr.faddr = src->sin_addr; // fix endianness
> +itp.v.addr.fport = 0; //src->sin_port;
> +itp.op = TPROXY_ASSIGN;
> +
> +if((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +debug(5, 3) ("setsockopt IP_TPROXY/TPROXY_ASSIGN failed\n");
> +return;
> +}
> +
> +memset(&itp, 0, sizeof(itp));
> +itp.v.addr.faddr = dest->sin_addr;
> +itp.v.addr.fport = dest->sin_port;
> +itp.op = TPROXY_CONNECT;
> +if((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +debug(5, 3) ("setsockopt IP_TPROXY/TPROXY_CONNECT failed\n");
> +return;
> +}
> +
> +memset(&itp, 0, sizeof(itp));
> +itp.v.flags = ITP_CONNECT;
> +itp.op = TPROXY_FLAGS;
> +if((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +debug(5, 3) ("setsockopt IP_TPROX

Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 18:05 +0800, Steven wrote:

> I could reproduce the bug if I had a COSS cache_dir enabled without any
> aufs cache_dirs.  I've updated the bug with a patch to fix this scenario.

I think the COSS issue is separate. Based on your patch that problem
should be seen immediately on startup, and not after a "squid -k
rotate".

Also are you sure the symptoms are really the same? In bug #1703 Squid
seems to be stuck on calling msgrcv() repeatedly.

Regards
Henrik




Re: Occasional DNS error in Squid 2.6

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 12:47 +0200, Henrik Nordstrom wrote:
> Sat 2006-07-29 at 09:35 +0200, Guido Serassio wrote:
> > Hi,
> > 
> > I have found an occasional DNS resolution error when browsing 
> > www.microsoft.com.
> > 
> > I have seen the error only a few times, fewer than 10, but the odd thing 
> > is that it always happens only with www.microsoft.com, and sometimes 
> > it also happened on 2.5:
> > 
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963
> > 
> > A page reload is always successful.
> 
> Could be Bug #1602. www.microsoft.com is dangerously close to the UDP
> DNS size limit of 512 octets.

Could also be a broken bind version. Some versions of bind have had
problems with CNAME chains and TTL expiry.

An ethereal trace of port 53 traffic will tell what the problem is.
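
When reading such a trace, two things are worth checking on each reply: its size against the 512-octet limit for classic (non-EDNS) DNS over UDP, and the TC (truncated) flag in the header. A small sketch of both checks follows; the function names are made up for illustration and nothing here is taken from Squid's DNS code.

```cpp
#include <cstddef>
#include <cstdint>

// Classic (non-EDNS) DNS-over-UDP message size limit from RFC 1035.
constexpr std::size_t kDnsUdpLimit = 512;

// True if the message has a full 12-byte header with the TC (truncated)
// flag set. TC is bit 0x02 of the third header byte.
bool dnsTruncated(const std::uint8_t *msg, std::size_t len)
{
    return len >= 12 && (msg[2] & 0x02) != 0;
}

// True if a reply of this size cannot fit in a classic UDP DNS message.
bool exceedsUdpLimit(std::size_t replyLen)
{
    return replyLen > kDnsUdpLimit;
}
```

A reply with TC set means the server could not fit the full answer into the UDP message, which is consistent with a name whose answer set sits close to the limit.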

Regards
Henrik




Re: Occasional DNS error in Squid 2.6

2006-07-29 Thread Henrik Nordstrom
Sat 2006-07-29 at 09:35 +0200, Guido Serassio wrote:
> Hi,
> 
> I have found an occasional DNS resolution error when browsing 
> www.microsoft.com.
> 
> I have seen the error only a few times, fewer than 10, but the odd thing 
> is that it always happens only with www.microsoft.com, and sometimes 
> it also happened on 2.5:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963
> 
> A page reload is always successful.

Could be Bug #1602. www.microsoft.com is dangerously close to the UDP
DNS size limit of 512 octets.

Regards
Henrik




Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)

2006-07-29 Thread Steven


On Fri, 28 Jul 2006, Henrik Nordstrom wrote:

> Looked at it briefly, but ran out of ideas.
> 
> http://www.squid-cache.org/bugs/show_bug.cgi?id=1703
> 
> Regards
> Henrik

I could reproduce the bug if I had a COSS cache_dir enabled without any
aufs cache_dirs.  I've updated the bug with a patch to fix this scenario.

Steven



Occasional DNS error in Squid 2.6

2006-07-29 Thread Guido Serassio

Hi,

I have found an occasional DNS resolution error when browsing 
www.microsoft.com.


I have seen the error only a few times, fewer than 10, but the odd thing 
is that it always happens only with www.microsoft.com, and sometimes 
it also happened on 2.5:


http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963

A page reload is always successful.

Regards

Guido


