Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> lör 2006-07-29 klockan 23:16 +0800 skrev Steven:
> > I was seeing the msgrecv() calls while running strace, but it wasn't in
> > the same loop as reported in the bug. Looks like I just found another bug
> > while trying to reproduce this one :)
>
> Was not aware there were msgrcv() calls in pthreads.
>
> We don't have a backtrace in the bug, so it could be the same and I was
> chasing down the wrong path...
>
> Guess we have to wait for Ralf to answer about the details of his setup.

I have just found another bug. I ended up in a situation where squid had not started the diskd processes (because the binary is called diskd-daemon, not diskd_daemon). Squid is not serving any web pages and will not stop because of this, but it is stuck in a loop doing the following:

    epoll_wait(3, {}, 256, 10) = 0
    gettimeofday({1154236711, 528040}, NULL) = 0
    msgrcv(1409029, {0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...}, 32, 0, IPC_NOWAIT) = -1 ENOMSG (No message of desired type)
    epoll_wait(3, {}, 256, 10) = 0

What would happen if a diskd process stopped too early on a busy system? Is it possible for the diskd_daemon processes to stop without processing all requests (which would leave the main squid process in an endless loop)?

Steven
Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> lör 2006-07-29 klockan 23:16 +0800 skrev Steven:
> > I was seeing the msgrecv() calls while running strace, but it wasn't in
> > the same loop as reported in the bug. Looks like I just found another bug
> > while trying to reproduce this one :)
>
> Was not aware there were msgrcv() calls in pthreads.

I had a COSS + diskd setup. The msgrecv() syscalls were coming from the diskd cache_dirs. They were happening every 10ms, but there was a call to epoll() in between each msgrecv(), so it's not the same bug.

> We don't have a backtrace in the bug, so it could be the same and I was
> chasing down the wrong path...
>
> Guess we have to wait for Ralf to answer about the details of his setup.

I'm going to compile the same version of squid, set it up on Debian with only diskd cache_dirs, and see if I can reproduce it.

There are two possibilities that I can think of. diskdinfo->away may not be decremented every time (or is being incorrectly incremented), and squid is waiting for replies to messages that have never actually been sent. The other possible issue is that the diskd processes have stopped due to the reconfigure signal, but squid is still waiting for them to send a message. Either way, it may only happen on a loaded system (which may make it harder to reproduce on a test system). I'll find out shortly.

Steven
How to test HTCP?
Hi,

I want to test the forward port of the HTCP changes from 2.6 to 3.0, but I don't know how to check all the HTCP functionality. Any suggestions?

Regards
Guido

-
Guido Serassio
Acme Consulting S.r.l. - Microsoft Certified Partner
Via Lucia Savarino, 1
10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135
Fax. : +39.011.9781115
Email: [EMAIL PROTECTED]
WWW: http://www.acmeconsulting.it/
Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
lör 2006-07-29 klockan 23:16 +0800 skrev Steven:

> I was seeing the msgrecv() calls while running strace, but it wasn't in
> the same loop as reported in the bug. Looks like I just found another bug
> while trying to reproduce this one :)

Was not aware there were msgrcv() calls in pthreads.

We don't have a backtrace in the bug, so it could be the same and I was chasing down the wrong path...

Guess we have to wait for Ralf to answer about the details of his setup.

Regards
Henrik
Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
On Sat, 29 Jul 2006, Henrik Nordstrom wrote:

> lör 2006-07-29 klockan 18:05 +0800 skrev Steven:
> > I could reproduce the bug if I had a COSS cache_dir enabled without any
> > aufs cache_dirs. I've updated the bug with a patch to fix this scenario.
>
> I think the COSS issue is separate. Based on your patch that problem
> should be seen immediately on startup, and not after a "squid -k
> rotate".
>
> Also are you sure the symptoms are really the same? In bug #1703 Squid
> seems to be stuck calling msgrecv() repeatedly.

Hmm, you're right. I am hitting a similar bug. Without the patch I attached to the bug, the COSS code submits an aio request, but because no aio threads have started, the request never finishes. This does not cause a 100% CPU load condition on startup. When squid tries to shut down nicely, or rotate logs, it gets stuck in the following code in squidaio_shutdown():

    /* This is the same as in squidaio_sync */
    do {
        squidaio_poll_queues();
    } while (request_queue_len > 0);

This is because no threads will ever complete the request.

I was seeing the msgrecv() calls while running strace, but it wasn't in the same loop as reported in the bug. Looks like I just found another bug while trying to reproduce this one :)

I'll try again and see if I can reproduce this bug here tomorrow.

Steven
Re: [squid-users] tproxy2 patch for squid3
lör 2006-07-29 klockan 17:05 +0200 skrev Jan Engelhardt:

> > The relevant parts of the code to fix this is in FwdState::pconnPush and
> > FwdState::connectStart fwdPconnPool->pop().
>
> What would I have to add?

You would need to extend the key used in these functions with at least the source IP of the client (== source IP of the connection) when tproxied, and 0.0.0.0 otherwise.

Regards
Henrik
Re: [squid-users] tproxy2 patch for squid3
> > Regular client-side transparent proxying is easily accomplished by
> > redirecting network traffic using -j DNAT, -j REDIRECT, or -j TPROXY (I do
> > not know why this seems needed). However, server-side transparency requires
> > a little more kick.
> > https://lists.balabit.hu/pipermail/tproxy/2006-July/000273.html
> >
> > This patch actually brewed in my homemade version of squid3 and worked long
> > before tproxy even hit the squid2.6 scene. CAP_NET_ADMIN must be accounted
> > for by the user, and in my case, is easily done through the MultiAdmin
> > linux kernel module.
>
> Comments:
>
> Your patch does not handle persistent connections.

I am not too familiar with the squid code. I was happy to have found the function where connection setup is done at all, since squid is asynchronous.

> If there are multiple clients talking to the same server their requests
> may get intermixed, no longer keeping the source IP binding.

If multiple client-proxy streams are multiplexed into one proxy-webserver stream, we have a hard time binding to all addresses at once anyway. Does this explain why some client-proxy connections time out after 30 seconds?

> Why the commConnectStart2 function instead of extending commConnectStart?

Other places where commConnectStart may be called from would be falsely tproxified; that was the idea. Yes, it's more of a hack than a clean implementation. But it did what I needed, and, except for some connections timing out every now and then (about 1% of all connections), it worked.

Jan Engelhardt
--
Re: [squid-users] tproxy2 patch for squid3
> > If multiple client-proxy streams are multiplexed into one proxy-webserver
> > stream, we anyhow have a hard time to bind to all addresses at once.
>
> The problem I mentioned is more of:
>
> 1. Client A makes a request, connection to web server is created and
> tproxied as A. It receives response from server. The proxy->webserver
> connection is then released into the idle persistent connections pool.
>
> 2. Client B makes a request to the same server. Squid finds the idle
> persistent connection and reuses it, avoiding having to set up a new
> connection. This will therefore use the address of A.

Yes, that is what I meant by multiplexed: multiple clients use the same webserver connection.

> And to make the situation slightly more complex, further illustrating the
> problem:
>
> 3. [...]
> 4. [...]
>
> The relevant parts of the code to fix this is in FwdState::pconnPush and
> FwdState::connectStart fwdPconnPool->pop().

What would I have to add?

Jan Engelhardt
--
Re: [squid-users] tproxy2 patch for squid3
lör 2006-07-29 klockan 15:47 +0200 skrev Jan Engelhardt:

> If multiple client-proxy streams are multiplexed into one proxy-webserver
> stream, we anyhow have a hard time to bind to all addresses at once.

??? The binding is only done on connection setup, not per request. It's TCP, so it can't be done in any other way.

> Does this explain why some client-proxy connections time out after 30
> seconds?

No. That's something else entirely.

The problem I mentioned is more of:

1. Client A makes a request; a connection to the web server is created and tproxied as A. It receives the response from the server. The proxy->webserver connection is then released into the idle persistent connections pool.

2. Client B makes a request to the same server. Squid finds the idle persistent connection and reuses it, avoiding having to set up a new connection. This will therefore use the address of A.

And to make the situation slightly more complex, further illustrating the problem:

3. Client B sends another request while waiting for the response to the first. This gets a new connection (no idle one available) and is tproxied as B. Response received.

4. Client A now sends another request before B has finished in 2. Squid finds the idle connection set up in 3, and A's request is now sent as if it came from B.

The relevant parts of the code to fix this are in FwdState::pconnPush and FwdState::connectStart fwdPconnPool->pop().

> Other places where commConnectStart may be called from would be
> falsely tproxified, that was the idea.

Easily handled, as the src argument then defaults to NULL in your prototype...

Regards
Henrik
Re: [squid-users] tproxy2 patch for squid3
lör 2006-07-29 klockan 09:44 +0200 skrev Jan Engelhardt:

> Hello,
>
> Regular client-side transparent proxying is easily accomplished by
> redirecting network traffic using -j DNAT, -j REDIRECT, or -j TPROXY (I do
> not know why this seems needed). However, server-side transparency requires
> a little more kick.
> https://lists.balabit.hu/pipermail/tproxy/2006-July/000273.html
>
> This patch actually brewed in my homemade version of squid3 and worked long
> before tproxy even hit the squid2.6 scene. CAP_NET_ADMIN must be accounted
> for by the user, and in my case, is easily done through the MultiAdmin
> linux kernel module.

Comments:

Your patch does not handle persistent connections. If there are multiple clients talking to the same server their requests may get intermixed, no longer keeping the source IP binding.

Why the commConnectStart2 function instead of extending commConnectStart?

Regards
Henrik

> diff --fast -Ndpru squid-3.0.PRE4-20060727~/src/cf.data.pre squid-3.0.PRE4-20060727/src/cf.data.pre
> --- squid-3.0.PRE4-20060727~/src/cf.data.pre  2006-07-02 18:53:46.0 +0200
> +++ squid-3.0.PRE4-20060727/src/cf.data.pre   2006-07-28 15:56:59.629577000 +0200
> @@ -2852,6 +2852,16 @@ DOC_START
>      the correct result.
>  DOC_END
>
> +NAME: tproxy
> +TYPE: onoff
> +DEFAULT: off
> +LOC: Config.onoff.tproxy
> +DOC_START
> +    If you have Linux with iptables and TPROXY2 support, you can enable
> +    this option to have SQUID make outgoing connections using the original
> +    IP address of the client.
> +DOC_END
> +
>  NAME: tcp_outgoing_tos tcp_outgoing_ds tcp_outgoing_dscp
>  TYPE: acl_tos
>  DEFAULT: none
> diff --fast -Ndpru squid-3.0.PRE4-20060727~/src/comm.cc squid-3.0.PRE4-20060727/src/comm.cc
> --- squid-3.0.PRE4-20060727~/src/comm.cc  2006-05-30 23:15:58.0 +0200
> +++ squid-3.0.PRE4-20060727/src/comm.cc   2006-07-28 15:57:02.299577000 +0200
> @@ -39,8 +39,10 @@
>  #include "StoreIOBuffer.h"
>  #include "comm.h"
>  #include "fde.h"
> +#include "forward.h"
>  #include "CommIO.h"
>  #include "ConnectionDetail.h"
> +#include "HttpRequest.h"
>  #include "MemBuf.h"
>  #include "pconn.h"
>  #include "SquidTime.h"
> @@ -52,6 +54,7 @@
>  #include
>  #endif
>
> +#include "ip_tproxy.h"
>
>  class ConnectStateData
>  {
> @@ -66,7 +69,7 @@ public:
>      char *host;
>      u_short port;
>
> -    struct sockaddr_in S;
> +    struct sockaddr_in S, src_addr;
>      CallBack callback;
>
>      struct IN_ADDR in_addr;
> @@ -1150,6 +1153,26 @@ ConnectStateData::operator delete (void
>      cbdataFree(address);
>  }
>
> +void commConnectStart2(int fd, const char *host, u_short port, CNCB *callback,
> +                       FwdState *fs)
> +{
> +    ConnectStateData *cs;
> +
> +    cs = new ConnectStateData;
> +    cs->fd = fd;
> +    cs->host = xstrdup(host);
> +    cs->port = port;
> +    cs->callback = CallBack(callback, fs);
> +    if (fs->request != NULL) {
> +        cs->src_addr.sin_addr = fs->request->client_addr;
> +        cs->src_addr.sin_port = htons(fs->request->client_port);
> +    } else {
> +        memset(&cs->src_addr, 0, sizeof(cs->src_addr));
> +    }
> +    comm_add_close_handler(fd, commConnectFree, cs);
> +    ipcache_nbgethostbyname(host, commConnectDnsHandle, cs);
> +}
> +
>  void
>  commConnectStart(int fd, const char *host, u_short port, CNCB * callback, void *data)
>  {
> @@ -1353,7 +1376,7 @@ ConnectStateData::connect()
>      if (S.sin_addr.s_addr == 0)
>          defaults();
>
> -    switch (comm_connect_addr(fd, &S)) {
> +    switch (comm_connect_addr(fd, &S, &src_addr)) {
>
>      case COMM_INPROGRESS:
>          debug(5, 5) ("ConnectStateData::connect: FD %d: COMM_INPROGRESS\n", fd);
> @@ -1406,9 +1429,45 @@ commSetTimeout(int fd, int timeout, PF *
>      return F->timeout;
>  }
>
> -int
> +static void do_tproxy(int sock, const struct sockaddr_in *src,
> +                      const struct sockaddr_in *dest)
> +{
> +    struct in_tproxy itp;
> +    int ret;
>
> -comm_connect_addr(int sock, const struct sockaddr_in *address)
> +    memset(&itp, 0, sizeof(itp));
> +    itp.v.addr.faddr = src->sin_addr; // fix endianness
> +    itp.v.addr.fport = 0; //src->sin_port;
> +    itp.op = TPROXY_ASSIGN;
> +
> +    if ((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +        debug(5, 3) ("setsockopt IP_TPROXY/TPROXY_ASSIGN failed\n");
> +        return;
> +    }
> +
> +    memset(&itp, 0, sizeof(itp));
> +    itp.v.addr.faddr = dest->sin_addr;
> +    itp.v.addr.fport = dest->sin_port;
> +    itp.op = TPROXY_CONNECT;
> +    if ((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +        debug(5, 3) ("setsockopt IP_TPROXY/TPROXY_CONNECT failed\n");
> +        return;
> +    }
> +
> +    memset(&itp, 0, sizeof(itp));
> +    itp.v.flags = ITP_CONNECT;
> +    itp.op = TPROXY_FLAGS;
> +    if ((ret = setsockopt(sock, SOL_IP, IP_TPROXY, &itp, sizeof(itp))) != 0) {
> +        debug(5, 3) ("setsockopt IP_TPROX
Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
lör 2006-07-29 klockan 18:05 +0800 skrev Steven:

> I could reproduce the bug if I had a COSS cache_dir enabled without any
> aufs cache_dirs. I've updated the bug with a patch to fix this scenario.

I think the COSS issue is separate. Based on your patch that problem should be seen immediately on startup, and not after a "squid -k rotate".

Also, are you sure the symptoms are really the same? In bug #1703 Squid seems to be stuck calling msgrecv() repeatedly.

Regards
Henrik
Re: Occasional DNS error in Squid 2.6
lör 2006-07-29 klockan 12:47 +0200 skrev Henrik Nordstrom:

> lör 2006-07-29 klockan 09:35 +0200 skrev Guido Serassio:
> > Hi,
> >
> > I have found an occasional DNS resolution error when browsing
> > www.microsoft.com.
> >
> > I have seen the error only a few times, less than 10, but the odd thing
> > is that it always happens only with www.microsoft.com, and sometimes
> > it also happened on 2.5:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963
> >
> > A page reload is always successful.
>
> Could be Bug #1602. www.microsoft.com is dangerously close to the UDP
> DNS size limit of 512 octets..

Could also be a broken bind version... some versions of bind have had problems with CNAME chains and TTL expiry. An ethereal trace of port 53 traffic will tell what the problem is.

Regards
Henrik
Re: Occasional DNS error in Squid 2.6
lör 2006-07-29 klockan 09:35 +0200 skrev Guido Serassio:

> Hi,
>
> I have found an occasional DNS resolution error when browsing
> www.microsoft.com.
>
> I have seen the error only a few times, less than 10, but the odd thing
> is that it always happens only with www.microsoft.com, and sometimes
> it also happened on 2.5:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963
>
> A page reload is always successful.

Could be Bug #1602. www.microsoft.com is dangerously close to the UDP DNS size limit of 512 octets...

Regards
Henrik
Re: Any takers on Bug #1703? (diskd stuck at 100% CPU)
On Fri, 28 Jul 2006, Henrik Nordstrom wrote:

> Looked at it briefly, but ran out of ideas.
>
> http://www.squid-cache.org/bugs/show_bug.cgi?id=1703
>
> Regards
> Henrik

I could reproduce the bug if I had a COSS cache_dir enabled without any aufs cache_dirs. I've updated the bug with a patch to fix this scenario.

Steven
Occasional DNS error in Squid 2.6
Hi,

I have found an occasional DNS resolution error when browsing www.microsoft.com.

I have seen the error only a few times, less than 10, but the odd thing is that it always happens only with www.microsoft.com, and sometimes it also happened on 2.5:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=367963

A page reload is always successful.

Regards
Guido

-
Guido Serassio
Acme Consulting S.r.l. - Microsoft Certified Partner
Via Lucia Savarino, 1
10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135
Fax. : +39.011.9781115
Email: [EMAIL PROTECTED]
WWW: http://www.acmeconsulting.it/