[PHP] Finding out which file is retrieved over HTTP

2003-03-23 Thread Jens Lehmann
The following short script retrieves a file over HTTP:

$url = 'http://www.example.com/';
implode('',file($url)); // or file_get_contents()
Now I'd like to find out which file was really retrieved, for instance 
http://www.example.com/index.html. Is this possible and how?

Background:

I need to write a small link checker (intranet) which reads all the 
links in a file, checks whether they are broken and collects some 
information. Unfortunately I didn't find a simple, free link checker, 
which is why I'm writing my own. It would be good to find out the 
"complete" URL, because I want to collect the file extensions (.php, 
.html, ...).

Another thing is that my script is recursive, so I need a function 
absolute_link() which takes a (possibly relative) path and a URL and 
works out which page to visit next.

Example:

$url = 'http://www.example.com/foo/bar/';
Somewhere in the source code:
...  ...
My script reads in $path='../articles/page.html'. The function 
absolute_link($url, $path) should return 
'http://www.example.com/foo/articles/page.html'. However $url could be 
http://www.example.com/foo/bar (bar can be file or dir here imho) or 
http://www.example.com/foo/bar/index.php and in any case absolute_link() 
should return the same. Of course this function is easier to implement 
if I always have something like 
http://www.example.com/foo/bar/index.php. Maybe there's already a useful 
function besides parse_url() I can use here.
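
A minimal sketch of what such an absolute_link() could look like, loosely following the reference-resolution rules of RFC 3986. It assumes PHP 7 or later (not the 4.2.3 used in this thread) and an absolute http(s) base URL; the body is an illustration, not a tested library routine. References consisting only of a query or fragment are not handled specially.

```php
<?php
// Hypothetical helper: resolve $path against the absolute URL $base.
function absolute_link(string $base, string $path): string
{
    // A reference that already carries a scheme is absolute.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $path)) {
        return $path;
    }
    if ($path === '') {
        return $base;
    }

    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    // Scheme-relative reference such as //host/path.
    if (substr($path, 0, 2) === '//') {
        return $b['scheme'] . ':' . $path;
    }

    if ($path[0] === '/') {
        $full = $path;                      // host-relative reference
    } else {
        $basePath = $b['path'] ?? '/';
        // Keep everything up to the last slash: "bar" in /foo/bar counts as a file.
        $full = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }

    // Set query/fragment aside so that dots inside them stay untouched.
    $tail = '';
    if (preg_match('/[?#]/', $full, $m, PREG_OFFSET_CAPTURE)) {
        $tail = substr($full, $m[0][1]);
        $full = substr($full, 0, $m[0][1]);
    }

    // Collapse "." and ".." segments; $out[0] is the empty segment before the leading slash.
    $segments = explode('/', $full);
    $last = end($segments);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '.') {
            continue;
        }
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $seg;
    }
    if ($last === '.' || $last === '..') {
        $out[] = '';                        // keep the trailing slash
    }
    return $root . implode('/', $out) . $tail;
}
```

With http://www.example.com/foo/bar/ or http://www.example.com/foo/bar/index.php as the base this gives the URL wanted above. With http://www.example.com/foo/bar (no trailing slash) "bar" is treated as a file, so the result is http://www.example.com/articles/page.html. That difference is the reason the URL that was really retrieved matters here.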

Jens

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: [PHP] Finding out which file is retrieved over HTTP

2003-03-23 Thread David Otton
On Sun, 23 Mar 2003 21:21:39 +0100, you wrote:

>The following short script retrieves a file over HTTP:
>
>$url = 'http://www.example.com/';
>implode('',file($url)); // or file_get_contents()
>
>Now I'd like to find out which file was really retrieved, for instance 
>http://www.example.com/index.html. Is this possible and how?

Difficult - you made a request, and the webserver returned a response.
Whether or not the webserver maps your request to a specific file - or
if it even has any concept of a file - is its own internal matter.

Having said that, you could try the Content-Location header, and the 3xx
status codes.
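
A rough sketch of how those headers could be inspected. It assumes a PHP version whose http:// wrapper fills the local variable $http_response_header after file_get_contents(), and that the array holds the header lines of every redirect hop; effective_url() is a hypothetical name. If the server sends neither Location nor Content-Location, the requested URL is all there is to go on.

```php
<?php
// Hypothetical helper: given the requested URL and the raw header lines
// (in the format of $http_response_header), guess the URL finally served.
function effective_url(string $url, array $headers): string
{
    $current = $url;
    foreach ($headers as $line) {
        // "Location" marks a redirect hop; "Content-Location" is an optional
        // hint naming the resource the server actually picked.
        // Either value may be relative and would then need resolving.
        if (preg_match('/^(Content-)?Location:\s*(\S+)/i', $line, $m)) {
            $current = $m[2];
        }
    }
    return $current;
}

// Usage (needs network access and allow_url_fopen, so it is commented out):
// $body = file_get_contents('http://www.example.com/');
// echo effective_url('http://www.example.com/', $http_response_header);
```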

>I need to write a small link-checker (Intranet), which reads in all 
>links within a file and then looks if they're broken and collects some 
>information. Unfortunately I didn't find a simple, free link-checker 
>that's why I write my own. It would be good to find out the "complete" 
>url, because I want to collect the file-endings (.php,.html, ...).

I really think this already exists. You should probably search a bit
harder.





Re: [PHP] Finding out which file is retrieved over HTTP

2003-03-25 Thread Jens Lehmann
David Otton wrote:
> On Sun, 23 Mar 2003 21:21:39 +0100, you wrote:
>
>> The following short script retrieves a file over HTTP:
>>
>> $url = 'http://www.example.com/';
>> implode('',file($url)); // or file_get_contents()
>>
>> Now I'd like to find out which file was really retrieved, for instance 
>> http://www.example.com/index.html. Is this possible and how?
>
> [...]
>
>> I need to write a small link-checker (Intranet), which reads in all 
>> links within a file and then looks if they're broken and collects some 
>> information. Unfortunately I didn't find a simple, free link-checker 
>> that's why I write my own. It would be good to find out the "complete" 
>> url, because I want to collect the file-endings (.php,.html, ...).
>
> I really think this already exists. You should probably search a bit
> harder.

Maybe there are good standalone-link-checkers, but I need to integrate 
it in an application and my customer has some special wishes. Anyways I 
finished writing the link-checker.

One thing that confused me: if I open a non-existing website 
(fopen('http://www.amazon.de/nonsensestuff')) I always get the error 
message "Success". The same happens with file(). The manual says that 
fopen() returns false if the website could not be opened, so I don't 
understand why this particular message appears. Besides, "Success" is 
surely a very bad error message. I use PHP 4.2.3.

Jens





Re: [PHP] Finding out which file is retrieved over HTTP

2003-03-25 Thread Ernest E Vogelsinger
At 13:08 25.03.2003, Jens Lehmann spoke out and said:
[snip]
>A thing which seemed a bit confusing to me is that if I open a 
>non-existing website (fopen('http://www.amazon.de/nonsensestuff')) I 
>always get an error message "Success". It's the same thing if I use 
>file(). The manual explains that fopen() returns false if the website 
>could not be opened, that's why I don't know why this error message 
>appears? Besides this "Success"-message is surely a very bad error 
>message. I use PHP 4.2.3.
[snip] 

If the website doesn't exist, but the web SERVER does, it will return
something, usually a 404-message. fopen() doesn't check for HTTP result
codes, it just opens the stream and returns whatever gets sent (the only
exception being that it strips the headers, without inspecting them).

So in fact an fopen() to a non-existing page will be seen as successful by
fopen(). This is by design and afaik documented somewhere.

To actually check on the HTTP status codes you need to run your own, either
using cURL, or by doing your own stuff using fsockopen().
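
A minimal sketch of the cURL variant, assuming the curl extension is loaded and PHP 7 or later. check_link() and is_broken_status() are hypothetical names, and counting every status from 400 upwards as broken is an assumption, not something stated in this thread. The effective URL reported by cURL is the last URL requested after redirects, which is as close to "the file really retrieved" as the client can see.

```php
<?php
// Hypothetical helper: decide whether a status code means "broken".
// 0 stands for "no HTTP response at all" (DNS failure, timeout, ...).
function is_broken_status(int $status): bool
{
    return $status === 0 || $status >= 400;
}

// Hypothetical helper: request $url and report status and final URL.
function check_link(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo any output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 3xx redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // seconds
    curl_exec($ch);

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $final  = (string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    return [
        'status' => $status,
        'url'    => $final,
        'broken' => is_broken_status($status),
    ];
}

// Usage (needs network access, so it is commented out):
// print_r(check_link('http://www.example.com/'));
```

Some servers answer HEAD requests differently from GET, so a checker may want to repeat a failed check with a normal GET before reporting the link as broken.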


-- 
   >O Ernest E. Vogelsinger 
   (\) ICQ #13394035 
^ http://www.vogelsinger.at/





Re: [PHP] Finding out which file is retrieved over HTTP

2003-03-25 Thread Ernest E Vogelsinger
At 20:48 25.03.2003, Jens Lehmann spoke out and said:
[snip]
>> To actually check on the HTTP status codes you need to run your own, either
>> using cURL, or by doing your own stuff using fsockopen().
>
>I tried using fsockopen(), but still experience a problem, I want to use 
>the following script to retrieve both, the headers and the actual file:

You need to transmit the "Host:" header so the web server has a chance to
know which virtual host you want to reach. Ideally you would also tell the
server what type of content you can handle (see below). Also note that the
"correct" line ending for MIME headers is always CRLF ("\r\n"):

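$hostname = 'www.example.com';
$file = '/index.php';

$ip = gethostbyname($hostname);
// $errno and $errstr are filled in by fsockopen() on failure; no "&" is
// needed (or allowed in current PHP versions) at the call site.
$fp = fsockopen($ip, 80, $errno, $errstr);
if ($fp)
{
    // Send the request; every header line ends with CRLF and an empty
    // line terminates the header block.
    fputs($fp, "GET ".$file." HTTP/1.0\r\n" .
               "Host: $hostname\r\n" .
               "Accept: text/html\r\n" .
               "Accept: text/plain\r\n\r\n");

    // Read the complete response until the server closes the connection.
    $src = '';
    while (!feof($fp))
        $src .= fgets($fp, 4096);
    fclose($fp);
}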

This will get you the whole page, including all headers.
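
Since $src then holds headers and body in one string, a link checker still has to take it apart. A small sketch of that step, assuming PHP 7 or later and a response that uses CRLF line endings; split_response() is a hypothetical name:

```php
<?php
// Hypothetical helper: split a raw HTTP response, as collected in $src,
// into status code, header lines and body.
function split_response(string $src): array
{
    // The header block ends at the first empty line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $src, 2);
    $lines = explode("\r\n", $parts[0]);
    $body  = $parts[1] ?? '';

    // The status line looks like "HTTP/1.1 404 Not Found".
    $status = 0;
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    return [
        'status'  => $status,
        'headers' => array_slice($lines, 1),
        'body'    => $body,
    ];
}
```

A status of 0 here means the first line was not a recognisable status line.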


-- 
   >O Ernest E. Vogelsinger 
   (\) ICQ #13394035 
^ http://www.vogelsinger.at/





Re: [PHP] Finding out which file is retrieved over HTTP

2003-03-25 Thread Jens Lehmann
Ernest E Vogelsinger wrote:
> At 20:48 25.03.2003, Jens Lehmann spoke out and said:
> [snip]
>>> To actually check on the HTTP status codes you need to run your own, either
>>> using cURL, or by doing your own stuff using fsockopen().
>>
>> I tried using fsockopen(), but still experience a problem, I want to use 
>> the following script to retrieve both, the headers and the actual file:
>
> You need to transmit the "Host:" header so the web server has a chance to
> know which virtual host you want to reach. Ideally you would also tell the
> server what type of content you can handle (see below). Also note that the
> "correct" line ending for MIME headers is always CRLF ("\r\n"):
>
> $hostname = 'www.example.com';
> $file = '/index.php';
> $ip = gethostbyname($hostname);
> $fp = fsockopen($ip, 80, $errno, $errstr);
> if ($fp)
> {
>     fputs($fp, "GET ".$file." HTTP/1.0\r\n" .
>                "Host: $hostname\r\n" .
>                "Accept: text/html\r\n" .
>                "Accept: text/plain\r\n\r\n");
>     $src = '';
>     while (!feof($fp))
>         $src .= fgets($fp, 4096);
>     fclose($fp);
> }
>
> This will get you the whole page, including all headers.

Thank you! This works fine. I can use this for reading the HTTP-Status.

Jens
