On Thursday 11 of April 2013, Arkadiusz Miśkiewicz wrote:

> Another strace including children
> http://ixion.pld-linux.org/~arekm/apache1.txt
> 
> Graceful restart and first OPTIONS write at
> 
> 12594 15:10:01.397356 write(7, "OPTIONS * HTTP/1.0\r\nUser-Agent:
> Apache/2.4.4 (Unix) (internal dummy connection)\r\n\r\n", 83 <unfinished
> ...>

Another observation.

When this happens (when dummy_connection() is called, my " XXX dummy 2 
processing X" are printed) then in that time all children are already 
zombie/defunct, so they cannot answer/do anything. The only living process is 
master process.

> 
> but first read is long after write
> 
> 13279 15:10:52.636685 <... read resumed> "OPTIONS * HTTP/1.0\r\nUser-Agent:
> Apache/2.4.4 (Unix) (internal dummy connection)\r\n\r\n", 8000) = 83
> <0.000014>

And that's read by new generation I think

Old generation childs exit in:

        else if (retained->my_generation !=
                 ap_scoreboard_image->global->running_generation) { /* 
restart? */
            /* yeah, this could be non-graceful restart, in which case the
             * parent will kill us soon enough, but why bother checking?
             */
            die_now = 1;
        }

So looks like master process is trying to send data (in dummy_connection()) to 
childrens that already exit()ed leading to whole slowdown. This is how it 
looks http://ixion.pld-linux.org/~arekm/apache3.txt

Opinions?


13:38:03.383798 connect(7, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("192.168.1.121")}, 16) = -1 EINPROGRESS (Operation now in 
progress) <0.000024>
13:38:03.383841 poll([{fd=7, events=POLLOUT}], 1, 3000) = 1 ([{fd=7, 
revents=POLLOUT}]) <0.000006>
13:38:03.383872 getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000006>
13:38:03.383897 write(7, "OPTIONS * HTTP/1.0\r\nUser-Agent: Apache/2.4.4 
(Unix) (internal dummy connection)\r\n\r\n", 83) = 83 <0.000022>

4 in SO_ERROR is a "Interrupted system call", so could support my suspicion

> 
> Huh?
> 
> That's even after resuming operations mesage which was a bit earlier:
> 
> 12594 15:10:52.606253 write(2, "[Thu Apr 11 15:10:52.606236 2013]
> [mpm_prefork:notice] [pid 12594] AH00163: Apache/2.4.4 (Unix) configured --
> resuming normal operations\n", 137) = 137 <0.000009>
> 
> Compare that to case where ab wasn't running:
> http://ixion.pld-linux.org/~arekm/apache-no-ab.txt
> write OPTIONS in main process and read in childrens are interleaving as
> expected.
> 
> Now why these aren't interleaving in apache1.txt where ab was running?
> 
> > As the accept call
> > on httpd side only returns when the first data is send on the socket,
> > IMHO the time the poll takes does take place in the kernel and not in
> > the httpd children code.
> 
> Well, I think kernel is not the one to blame.
> 
> > Have you checked your messages file if the kernel reports
> > something when this happens? How does your run queue and CPU load look
> > like when this happens (top)?
> 
> No kernel messages, no cpu spikes - looks normal.
> 
> > Regards
> > Rüdiger


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Reply via email to