 ID:               45533
 User updated by:  signe at cothlamadh dot net
 Reported By:      signe at cothlamadh dot net
 Status:           Open
 Bug Type:         cURL related
 Operating System: FreeBSD 7.0
 PHP Version:      5.2.6
 New Comment:

Of course, after posting the reproduction, the server that was causing the issue changed something, and the problem no longer reproduces against it. This was the original output from a request to that server:

telnet www.crn.com 80
Trying 66.77.24.10...
Connected to crn.com.
Escape character is '^]'.
GET /rss/cisco/index.xml HTTP/1.1
Host: www.crn.com

HTTP/1.1 302 Found
Date: Wed, 16 Jul 2008 21:52:30 GMT
Server: Apache
Location: http://feeds.pheedo.com/rss/cisco
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Vary: Accept-Encoding, User-Agent

119
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://feeds.pheedo.com/rss/cisco">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.29 Server at www.crn.com Port 80</ADDRESS>
</BODY></HTML>

0

Connection closed by foreign host.


Previous Comments:
------------------------------------------------------------------------

[2008-07-16 21:30:14] signe at cothlamadh dot net

Description:
------------
When retrieving a URL that answers with a 302 redirect and also sends a viewable error document in the redirect body, the error document is prepended to the REAL content that is retrieved after following the redirect.

This issue is compounded when CURLOPT_HEADER is enabled, because the error-document content is not counted in any of the getinfo data.

Reproduce code:
---------------
http://www.cothlamadh.net/~signe/.outgoing/curl_location.phps

Tested with curl 7.18.0 on FreeBSD 7 and 7.16.4-2ubuntu1 on Ubuntu Gutsy.

Expected result:
----------------
Non-header data from redirects should not be included in the returned content.

Actual result:
--------------
Without headers enabled, the content returned looks like this:

"""
RedirectErrorDocumentContent
ActualDocument
"""

There is no whitespace between the two documents.

With headers enabled, it's much worse.

""" RedirectHeader RedirectErrorDocumentContent ActualDocumentHeader ActualDocument """ There is whitespace between each set of headers and its respective content, but not between the first content and the second batch of headers. To make matters worse, curl_getinfo($cUrl, CURLINFO_HEADER_SIZE) returns the combined length of both header sections, as is expected, and curl_getinfo($cUrl, CURLINFO_CONTENT_LENGTH_DOWNLOAD) returns the length of the ActualDocument, also as expected. The result of this is that RedirectErrorDocumentContent gets tossed in the middle invisibly. This makes it impossible to cleanly split the document into header and content sections. ------------------------------------------------------------------------ -- Edit this bug report at http://bugs.php.net/?id=45533&edit=1