On Fri, 7 Jun 2002, Cliff Woolley wrote:

> On Fri, 7 Jun 2002, Brian Pane wrote:
> > IMHO, that's a design flaw.  Regardless of whether PHP is doing
> > buffering, it shouldn't break up blocks of static content into
> > small pieces--especially not as small as 400 bytes.  While it's
> > certainly valid for PHP to insert a flush bucket right before a
> > block of embedded code (in case that code takes a long time to
> > run), breaking static text into 400-byte chunks will usually mean
> > that it takes *longer* for the content to reach the client, which
> > probably defeats PHP's motivation for doing the nonbuffered output.
> > There's code downstream, in the httpd's core_output_filter and
> > the OS's TCP driver, that can make much better decisions about
> > when to buffer and when not to buffer.
>
> FWIW, I totally agree here.  One of the biggest problems with the way PHP
> handles buckets (as I'm sure has been discussed before) is that
> static content cannot remain in its native form as it goes through PHP, or
> at least not in very big chunks.  Take as a counterexample the way
> mod_include deals with FILE buckets.  It reads the FILE bucket (which
> causes the file to be MMAPed if allowed), and from there it just scans
> through the mmaped region, and if it finds nothing, it hands it on to the
> next filter still in the single-MMAP-bucket form.  PHP/Zend, on the other
> hand, takes the file descriptor out of the file bucket, runs it through a
> lexical analyzer which tokenizes it up to 400 bytes at a time, runs it
> through the yacc-generated grammar as necessary, and handles it from
> there.  Far more optimal would be to take the input, do a search through
> it for a starting tag just as mod_include does, and if none is found (or
> up until one is found), just tell the SAPI module to "go ahead and send up
> to THIS point out to the client unmodified".
>
> So basically the difference between this and what we have now is that the
> lexer should not take each 400 byte buffer and say "here is (up to) 400
> bytes of inline HTML, send it to the client as-is"; instead, it should be
> able to do something along the lines of taking the input 400 bytes at a
> time, and as soon as it starts seeing inline HTML, keep track of the
> starting offset (in bytes), and keep scanning through those 400 byte
> buffers in a tight loop until it finds something that's NOT inline HTML,
> and set the ending offset.  Then it can notify PHP in one call "send bytes
> 375-10136 to the client as-is, it's inline html".
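
For reference, the scan described above could be sketched roughly as
follows. This is only an illustration, not code from PHP/Zend or httpd:
the function name inline_html_end() is made up, and it assumes "<?" is
the only opening tag (the real lexer recognizes more forms). It walks the
input one fixed-size chunk at a time, as the lexer does, but reports a
single end offset for the whole run of inline HTML instead of one piece
per chunk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the offset of the first "<?" at or after
 * `start`, or `len` if there is none. The input is consumed `chunk`
 * bytes at a time to mimic a lexer with a fixed-size buffer.
 */
size_t inline_html_end(const char *buf, size_t len, size_t start, size_t chunk)
{
    size_t pos = start;
    size_t i, n;
    int prev_lt = 0;    /* was the previous byte '<'?  Carried across chunks. */

    assert(chunk > 0);

    while (pos < len) {
        /* Size of the current chunk; the last one may be short. */
        n = (len - pos < chunk) ? (len - pos) : chunk;

        for (i = 0; i < n; i++) {
            char c = buf[pos + i];

            if (prev_lt && c == '?') {
                /* The tag starts at the '<', one byte back. */
                return pos + i - 1;
            }
            prev_lt = (c == '<');
        }
        pos += n;
    }

    /* No opening tag found: everything from `start` on is inline HTML. */
    return len;
}
```

With a 400-byte chunk, a call such as inline_html_end(buf, len, 375, 400)
on a file whose first tag after offset 375 sits at offset 10136 returns
10136, so the caller can hand the range 375-10136 downstream in one call.
The prev_lt flag is what lets a "<?" that straddles a chunk boundary
still be found.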

I don't believe that this is what's slowing PHP down on Apache 2. Don't
forget that after the first piece of PHP code we can't optimize inline
HTML any more, because it can be conditional: whether it is sent at all
may depend on the code around it.
I think that the non-mutexed per-thread memory pool would improve
performance much more significantly.


