Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-04-03 Thread Henrik Nordstrom
fre 2010-04-02 klockan 19:00 -0300 skrev Marcus Kool:
> 

> I use this Squid at home and it lightly used so maybe it does have
> the same FDs allocated after a restart...

Seems not. So we have to wait until you see the problem again to match
up with cachemgr filedescriptor page.

Regards
Henrik



Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-04-02 Thread Marcus Kool



Henrik Nordström wrote:

fre 2010-04-02 klockan 15:41 -0300 skrev Marcus Kool:


strange indeed, but this is strace output with which I am not very familiar.
Strace should print the whole array that it uses as argument to
epoll_wait or just prints the first element ?  (and the 2nd argument
could be FD 27)


For epoll_wait it prints the valid entries in the result array, which
has length 1 in this case (see return value last on the line).


Can you please use cachemgr to view the current filedescriptors page to
note what fd 23 & fd 27 is being used for?

I restarted Squid and now it does not loop...
The lsof output in the original email is of little help:


A little. Both are pipes. And it's not unlikely these pipes have the
same fd assignments now after restart. What does cachemgr say now?


I use this Squid at home and it lightly used so maybe it does have
the same FDs allocated after a restart...
Cachemgr gives this (after the restart):

Active file descriptors:
File Type   Tout Nread  * Nwrite * Remote AddressDescription
 --    - 
--
   5 Log   0   00
/local/squid/logs/cache.log
   7 Socket01275  241  .0DNS Socket
   8 Pipe  0  63 9934ufdbgclient #1
   9 Pipe  0   0*   0ufdbgclient #2
  10 Pipe  0   0*   0ufdbgclient #3
  11 Pipe  0   0*   0ufdbgclient #4
  12 Pipe  0   0*   0ufdbgclient #5
  13 Pipe  0   0*   0ufdbgclient #6
  14 Pipe  0   0*   0ufdbgclient #7
  15 Pipe  0   0*   0ufdbgclient #8
  16 Pipe  0   0*   0ufdbgclient #9
  17 Pipe  0   0*   0ufdbgclient #10
  18 Pipe  0   0*   0ufdbgclient #11
  19 Pipe  0   0*   0ufdbgclient #12
  20 File  0   0 8242
/local/squid/logs/access.log
  21 Pipe  0   00unlinkd -> squid
  22 File  0   0  312
/local/squid/cache/swap.state
  23 Socket 1440 848*   0  10.2.2.9.56505 
http://srv004.home01.urlfilterdb.com/Squid/cgi-bin/cachemgr.cgi

  24 Pipe  0   00squid -> unlinkd
  25 Socket0   00  .0HTTP Socket
  26 Socket2   0*1794  10.2.2.9.80 
http://srv004.home01.urlfilterdb.com/Squid/cgi-bin/cachemgr.cgi
  27 Socket 1440 170*   0* 127.0.0.1.35576 
cache_object://localhost/filedescriptors

  28 Pipe  0   0*   0async-io completetion 
event: main
  29 Pipe  0   00async-io completetion 
event: threads


squid   13665 squid   20w   REG9,3  1478687 4718702 
/local/squid/logs/access.log
squid   13665 squid   21r  FIFO0,6  1723502 pipe
squid   13665 squid   22w   REG9,3 30400812 4718715 
/local/squid/cache/swap.state

squid   13665 squid   23r  FIFO0,6  1723506 pipe
squid   13665 squid   24w  FIFO0,6  1723503 pipe
squid   13665 squid   25u  IPv41723505  TCP *:squid 
(LISTEN)
squid   13665 squid   26w  FIFO0,6  1723506 pipe
squid   13665 squid   27r  FIFO0,6  1723507 pipe
FD 27 *
squid   13665 squid   28w  FIFO0,6  1723507 pipe
squid   13665 squid   29u  IPv41771231  TCP srv004:squid->xxx 
(ESTABLISHED)


It seems that FD 27 and 28 point to the same object (as FD 23 and FD 26)


Hmm..


The strace output lists the parameters at the call time and
we do not see the return values (only the function return value).


Should be return values. strace knows how epoll_wait works.


Is it possible that FD 27 has an EPOLLERR condition instead of a
EPOLLIN condition and the loop is caused by trying to do read() on FD 27 ?
(I do not see any check for a return of EPOLLERR in the sources).


read should not return EAGAIN if there is an pending error condition.

>

Regards
Henrik





Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-04-02 Thread Henrik Nordström
fre 2010-04-02 klockan 15:41 -0300 skrev Marcus Kool:

> strange indeed, but this is strace output with which I am not very familiar.
> Strace should print the whole array that it uses as argument to
> epoll_wait or just prints the first element ?  (and the 2nd argument
> could be FD 27)

For epoll_wait it prints the valid entries in the result array, which
has length 1 in this case (see return value last on the line).

> > Can you please use cachemgr to view the current filedescriptors page to
> > note what fd 23 & fd 27 is being used for?
> 
> I restarted Squid and now it does not loop...
> The lsof output in the original email is of little help:

A little. Both are pipes. And it's not unlikely these pipes have the
same fd assignments now after restart. What does cachemgr say now?

> squid   13665 squid   20w   REG9,3  1478687 4718702 
> /local/squid/logs/access.log
> squid   13665 squid   21r  FIFO0,6  1723502 pipe
> squid   13665 squid   22w   REG9,3 30400812 4718715 
> /local/squid/cache/swap.state
> squid   13665 squid   23r  FIFO0,6  1723506 pipe
> squid   13665 squid   24w  FIFO0,6  1723503 pipe
> squid   13665 squid   25u  IPv41723505  TCP *:squid 
> (LISTEN)
> squid   13665 squid   26w  FIFO0,6  1723506 pipe
> squid   13665 squid   27r  FIFO0,6  1723507 pipe  
>   FD 27 *
> squid   13665 squid   28w  FIFO0,6  1723507 pipe
> squid   13665 squid   29u  IPv41771231  TCP 
> srv004:squid->xxx 
> (ESTABLISHED)
> 
> It seems that FD 27 and 28 point to the same object (as FD 23 and FD 26)

Hmm..

> The strace output lists the parameters at the call time and
> we do not see the return values (only the function return value).

Should be return values. strace knows how epoll_wait works.

> Is it possible that FD 27 has an EPOLLERR condition instead of a
> EPOLLIN condition and the loop is caused by trying to do read() on FD 27 ?
> (I do not see any check for a return of EPOLLERR in the sources).

read should not return EAGAIN if there is an pending error condition.

Regards
Henrik



Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-04-02 Thread Marcus Kool



Henrik Nordström wrote:

mån 2010-03-29 klockan 13:58 -0300 skrev Marcus Kool:


  0.33 epoll_wait(6, {{EPOLLIN, {u32=23, u64=8800387989527}}}, 2400, 
10) = 1
  0.32 gettimeofday({1269878848, 223083}, NULL) = 0
  0.31 read(27, 0xffd3de98, 256) = -1 EAGAIN (Resource temporarily 
unavailable)


This is odd.. epoll_wait says fd 23 is ready for reading, but then Squid
reads on fd 27.


strange indeed, but this is strace output with which I am not very familiar.
Strace should print the whole array that it uses as argument to
epoll_wait or just prints the first element ?  (and the 2nd argument
could be FD 27)


Can you please use cachemgr to view the current filedescriptors page to
note what fd 23 & fd 27 is being used for?


I restarted Squid and now it does not loop...
The lsof output in the original email is of little help:
squid   13665 squid   20w   REG9,3  1478687 4718702 
/local/squid/logs/access.log
squid   13665 squid   21r  FIFO0,6  1723502 pipe
squid   13665 squid   22w   REG9,3 30400812 4718715 
/local/squid/cache/swap.state

squid   13665 squid   23r  FIFO0,6  1723506 pipe
squid   13665 squid   24w  FIFO0,6  1723503 pipe
squid   13665 squid   25u  IPv41723505  TCP *:squid 
(LISTEN)
squid   13665 squid   26w  FIFO0,6  1723506 pipe
squid   13665 squid   27r  FIFO0,6  1723507 pipe
FD 27 *
squid   13665 squid   28w  FIFO0,6  1723507 pipe
squid   13665 squid   29u  IPv41771231  TCP srv004:squid->xxx 
(ESTABLISHED)


It seems that FD 27 and 28 point to the same object (as FD 23 and FD 26)


Note that FD 27 and FD 28 have the same NODE.
This pipe is used for what ???


Are you using aufs? If so then it's an internal notification channel to
wake up the main thread when disk I/O have completed.


Yes I use aufs.


The EAGAIN return code to read is strange.  It suggest that the
read() could return data soon, but Squid is looping now for over 4 hours.


Not strange if fd 27 is the aufs notification pipe.


Well, the strange thing is that it loops and uses 100% CPU.


The strange thing is why it does not act on fd 23 as returned by
epoll_wait.


The strace output lists the parameters at the call time and
we do not see the return values (only the function return value).

Is it possible that FD 27 has an EPOLLERR condition instead of a
EPOLLIN condition and the loop is caused by trying to do read() on FD 27 ?
(I do not see any check for a return of EPOLLERR in the sources).

-Marcus


Regards
Henrik





Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-04-02 Thread Henrik Nordström
mån 2010-03-29 klockan 13:58 -0300 skrev Marcus Kool:

>   0.33 epoll_wait(6, {{EPOLLIN, {u32=23, u64=8800387989527}}}, 2400, 
> 10) = 1
>   0.32 gettimeofday({1269878848, 223083}, NULL) = 0
>   0.31 read(27, 0xffd3de98, 256) = -1 EAGAIN (Resource temporarily 
> unavailable)

This is odd.. epoll_wait says fd 23 is ready for reading, but then Squid
reads on fd 27.

Can you please use cachemgr to view the current filedescriptors page to
note what fd 23 & fd 27 is being used for?

> Note that FD 27 and FD 28 have the same NODE.
> This pipe is used for what ???

Are you using aufs? If so then it's an internal notification channel to
wake up the main thread when disk I/O have completed.

> The EAGAIN return code to read is strange.  It suggest that the
> read() could return data soon, but Squid is looping now for over 4 hours.

Not strange if fd 27 is the aufs notification pipe.

The strange thing is why it does not act on fd 23 as returned by
epoll_wait.

Regards
Henrik



Re: [squid-users] Squid loops on epoll/read/gettimeofday

2010-03-29 Thread Amos Jeffries
On Mon, 29 Mar 2010 13:58:58 -0300, Marcus Kool
 wrote:
> I use squid
> Squid Cache: Version 3.0.STABLE20
> configure options:  '--prefix=/local/squid' '--with-default-user=squid' 
> '--with-filedescriptors=2400' '--enable-icap-client'
> '--enable-storeio=aufs,ufs,null' 
> '--with-pthreads' '--enable-async-io=8' '--enable-removal-policies=lru' 
> '--enable-default-err-language=English' '--enable-err-languages=Dutch
> English Portuguese' 
> '--enable-ssl' '--enable-cachemgr-hostname=localhost'
> '--enable-cache-digests' 
> '--enable-follow-x-forwarded-for' '--enable-forw-via-db'
> '--enable-xmalloc-statistics' 
> '--disable-hostname-checks' '--enable-epoll' '--enable-useragent-log'
> '--enable-referer-log' 
> '--enable-underscores' 'CC=gcc' 'CFLAGS=-O2 -m32' 'CXXFLAGS=-O2 -m32'
>

> 
> Note that FD 27 and FD 28 have the same NODE.
> This pipe is used for what ???

Something detailed in your cache.log when running at debug level ALL,1
(notices).

> 
> The EAGAIN return code to read is strange.  It suggest that the
> read() could return data soon, but Squid is looping now for over 4
hours.
> 
> Hopefully one of the developers knows what the pipe is used for and
> can do a guess what is causing the EAGAIN return code.

One of the helpers. Can't tell from the trace.

Please also see if this is reproduced with the 3.1 release.

Amos