On 25 November 2010 11:32, Deva <devendra...@gmail.com> wrote:
> Use curl
> http://php.net/manual/en/book.curl.php
>
>
> On Thu, Nov 25, 2010 at 4:41 PM, Shreyas Agasthya <shreya...@gmail.com> wrote:
>
>> I feel you should make more use of the 4th method here, as you are not
>> trying to read the file but the header-level (layer 7) information of the
>> HTTP protocol.
>>
>> http://php.net/manual/en/function.file-get-contents.php
>>
>>
>> --Shreyas
>>
>> On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <ron.pigg...@actsministries.org> wrote:
>>
>> > Will the header be sent when using file_get_contents, or should I be
>> > using another command, and if so, which one?  Ron
>> >
>> > <?php
>> >
>> >     header('User Agent: RonBot (http://www.example.com)');
>> >     $url = "http://www.example.com";
>> >
>> >         $input = file_get_contents($url);
>> >
>> >
>> >
>> > The Verse of the Day
>> > “Encouragement from God’s Word”
>> > http://www.TheVerseOfTheDay.info
>> >
>> >  *From:* Shreyas Agasthya <shreya...@gmail.com>
>> > *Sent:* Thursday, November 25, 2010 4:21 AM
>> > *To:* Ron Piggott <ron.pigg...@actsministries.org>
>> > *Cc:* php-general@lists.php.net ; a...@ashleysheridan.co.uk
>> > *Subject:* Re: [PHP] Fw: Spoofing user_agent
>> >
>> > The standard HTTP request header is User-Agent (with a hyphen, not an
>> > underscore).
>> >
>> > --Shreyas
>> >
>> > On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <ron.pigg...@actsministries.org> wrote:
>> >
>> >>
>> >> Is this what you are telling me to do:
>> >>
>> >> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> Ron
>> >>
>> >> From: a...@ashleysheridan.co.uk
>> >> Sent: Thursday, November 25, 2010 3:34 AM
>> >> To: Ron Piggott ; php-general@lists.php.net
>> >> Subject: Re: [PHP] Fw: Spoofing user_agent
>> >>
>> >> You need to set it in the headers of the request you make. Putting it
>> >> in the script you're using as a spider with ini_set won't do anything,
>> >> because the target site doesn't know anything about it.
>> >>
>> >> Thanks,
>> >> Ash
>> >> http://www.ashleysheridan.co.uk
>> >>
>> >> ----- Reply message -----
>> >> From: "Ron Piggott" <ron.pigg...@actsministries.org>
>> >> Date: Thu, Nov 25, 2010 08:25
>> >> Subject: [PHP] Fw: Spoofing user_agent
>> >> To: <php-general@lists.php.net>
>> >>
>> >> I have written a script to generate a sitemap of my web site. It crawls
>> >> all of the site's web pages (about 30,000).
>> >>
>> >> I need help to spoof the user_agent variable so the stats program
>> >> running in the background (“AWSTATS”) will treat the crawl as a bot,
>> >> not as browsing usage.
>> >>
>> >> The sitemap generator is a cron job. I tried the syntax:
>> >> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> This didn’t work.  The browsing was attributed to the dedicated IP
>> >> address.
>> >>
>> >> How do I get AWSTATS to list this crawl with the other entries under
>> >> the “Robots/Spiders visitors” heading, such as:
>> >> Unknown robot (identified by 'bot*')
>> >>
>> >> I don’t mean any ill will by changing this setting. Thanks for the help.
>> >>
>> >> Ron
>> >>
>> >
>> >
>> > --
>> > Regards,
>> > Shreyas Agasthya
>> >
>>
>>
>
>
>
> --
> :DJ
>

There is no point using header(). It sets a header on the response your
script sends to its own client (the browser), not on the request that
file_get_contents() sends to the remote server.

I use stream contexts instead:

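// Placeholder: the page being fetched.
$s_URL = 'http://www.example.com';

// The third argument to file_get_contents() is a stream context. Its
// 'http' options describe the request sent to the remote server.
$s_Contents = file_get_contents(
  $s_URL,
  False,
  stream_context_create(
    array(
      'http' => array(
        'method' => 'GET',
        // Each header line ends with \r\n.
        'header' => "User-Agent: RonBot (http://www.example.com)\r\n"
      ),
    )
  )
);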

You can supply cookies, or any other header, with the request. End each
header line with \r\n and concatenate them into a single string.
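As a sketch of sending more than one header, assuming a hypothetical helper build_header_string() and a placeholder Cookie value (neither comes from this thread), the context can be built and then inspected with stream_context_get_options() without making any request:

```php
<?php
// Hypothetical helper: join name => value pairs into one 'header' string,
// ending every line with "\r\n" as the http stream wrapper expects.
function build_header_string(array $headers)
{
    $s_Header = '';
    foreach ($headers as $name => $value) {
        $s_Header .= $name . ': ' . $value . "\r\n";
    }
    return $s_Header;
}

// Placeholder values for illustration only; the cookie name and value are
// assumptions, not something the target site requires.
$s_Header = build_header_string(array(
    'User-Agent' => 'RonBot (http://www.example.com)',
    'Cookie'     => 'session=abc123',
));

$r_Context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => $s_Header,
    ),
));

// Read back the options stored in the context (no request is made here).
$a_Options = stream_context_get_options($r_Context);
echo $a_Options['http']['header'];
```

To send the request, pass the context as the third argument: file_get_contents($s_URL, False, $r_Context).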

If you are doing this in a loop, then I'd recommend setting a default
stream context once; each request is then just:

$s_Contents = file_get_contents($s_URL);

The default stream context is applied automatically.
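A minimal sketch of the loop version, assuming a hypothetical crawl() function and placeholder URLs: stream_context_set_default() is called once, and file_get_contents() is then called with no context argument.

```php
<?php
// Set the default stream context once, before the crawl loop. Later
// file_get_contents() calls on http:// URLs that do not pass their own
// context use these options.
stream_context_set_default(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: RonBot (http://www.example.com)\r\n",
    ),
));

// Hypothetical list of pages; in the sitemap generator these would come
// from the crawler itself.
$a_URLs = array(
    'http://www.example.com/',
    'http://www.example.com/about',
);

// Hypothetical crawl loop: no context argument, so the default applies.
function crawl(array $a_URLs)
{
    $a_Pages = array();
    foreach ($a_URLs as $s_URL) {
        $a_Pages[$s_URL] = file_get_contents($s_URL);
    }
    return $a_Pages;
}

// Confirm the default context now carries the User-Agent header.
$a_Default = stream_context_get_options(stream_context_get_default());
echo $a_Default['http']['header'];
```

Calling crawl($a_URLs) would then fetch each page with the RonBot User-Agent, which is what AWStats inspects when deciding whether a visitor is a robot.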

I had to use a default stream context to route all http requests
through an NTLM authentication proxy server because PHP doesn't deal
with NTLM authentication.
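Routing requests through a proxy uses the same mechanism. This is a sketch that assumes a hypothetical proxy listening at tcp://127.0.0.1:3128 which handles the NTLM authentication itself. 'proxy' and 'request_fulluri' are standard http context options, but the address is only a placeholder:

```php
<?php
// Hypothetical proxy address: replace with the proxy that deals with the
// NTLM authentication, since PHP's http wrapper does not do NTLM itself.
$s_Proxy = 'tcp://127.0.0.1:3128';

stream_context_set_default(array(
    'http' => array(
        // Send every http:// request through the proxy...
        'proxy'           => $s_Proxy,
        // ...with the full URI in the request line, which some proxies require.
        'request_fulluri' => true,
        'header'          => "User-Agent: RonBot (http://www.example.com)\r\n",
    ),
));

// Read back the default options to confirm the proxy settings are in place.
$a_Default = stream_context_get_options(stream_context_get_default());
echo $a_Default['http']['proxy'], "\n";
```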

See my user notes on
http://docs.php.net/manual/en/function.stream-context-get-default.php.
Don't bother with the link at the bottom of the user note; it's not
live.

Richard.

-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
PHP General Mailing List (http://www.php.net/)