On Fri, 7 Jun 2002, Brian Pane wrote:
> IMHO, that's a design flaw. Regardless of whether PHP is doing
> buffering, it shouldn't break up blocks of static content into
> small pieces--especially not as small as 400 bytes. While it's
> certainly valid for PHP to insert a flush bucket right before a
> block of embedded code (in case that code takes a long time to
> run), breaking static text into 400-byte chunks will usually mean
> that it takes *longer* for the content to reach the client, which
> probably defeats PHP's motivation for doing the nonbuffered output.
> There's code downstream, in the httpd's core_output_filter and
> the OS's TCP driver, that can make much better decisions about
> when to buffer and when not to buffer.

FWIW, I totally agree here. One of the biggest problems with the way PHP handles buckets (as I'm sure has been discussed before) is that static content cannot remain in its native form as it goes through PHP, or at least not in very big chunks.

Take as a counterexample the way mod_include deals with FILE buckets. It reads the FILE bucket (which causes the file to be MMAPed if allowed), and from there it just scans through the mmaped region; if it finds nothing, it hands it on to the next filter still in the single-MMAP-bucket form. PHP/Zend, on the other hand, takes the file descriptor out of the file bucket, runs it through a lexical analyzer which tokenizes it up to 400 bytes at a time, runs it through the yacc-generated grammar as necessary, and handles it from there. Far more optimal would be to take the input, do a search through it for a starting tag just as mod_include does, and if none is found (or up until one is found), just tell the SAPI module to "go ahead and send up to THIS point out to the client unmodified".
So basically the difference between this and what we have now is that the lexer should not take each 400-byte buffer and say "here is (up to) 400 bytes of inline HTML, send it to the client as-is". Instead, it should do something along these lines: take the input 400 bytes at a time, and as soon as it starts seeing inline HTML, record the starting offset (in bytes), then keep scanning through those 400-byte buffers in a tight loop until it finds something that's NOT inline HTML, and set the ending offset. Then it can notify PHP in one call: "send bytes 375-10136 to the client as-is, it's inline HTML".

Another important thing the lexical analyzer needs to support is that the user of Zend (PHP in this case) should be able to specify the YY_INPUT function rather than being *forced* to give Zend a filename or file descriptor. That's absolutely critical for the filtering design under Apache 2.0 to work right. What we have now is a total kludge.

I realize I'm calling into question some fundamental design decisions of which I was not a part, so of course there is the possibility that I'm missing some important detail. But I think it would be relatively easy to insert an optimization here that could make a huge difference without breaking too many assumptions in the code. I think.

Thanks,
--Cliff

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php