Hi,

You can use this approach: parse the CSV and generate a Request to crawl each game. In the response of each request, check whether the agecheck form is present:
=> If it IS present, return a FormRequest with all the form fields filled in.
=> If it is NOT present, parse the response and extract the game information.

Regards.

Merci.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan - Twitter: @baazzilhassan <http://twitter.com/baazzilhassan>
Blog: http://blog.jbinfo.io/
Donate - PayPal - <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>

2014-07-16 0:14 GMT+00:00 Chetan Motamarri <[email protected]>:

> Hi Lhassan,
>
> Is it possible to take the input games from a CSV instead of crawling the top 500 games from steamcharts.com?
>
> I have a CSV file (attached) with the list of games whose data is to be scraped. Some of the games in the CSV also require the agecheck. Could you please help with this?
>
> I just want to scrape game data from Steam URLs like http://store.steampowered.com/app/570/ for all the games in the attached CSV.
>
> On Wednesday, June 25, 2014 2:09:55 AM UTC-7, Lhassan Baazzi wrote:
>
>> Yes, you are right. I pushed a fix for this to the repository, along with the CSV output: https://github.com/jbinfo/scrapy_store.steampowered.com/blob/master/output.csv
>> Only 476 games are exported; the missing games are due to the website itself. For example, if you open the link for the game ranked 257, http://store.steampowered.com/app/202480/, you are redirected to the homepage!
>>
>> Regards.
>>
>> Merci.
>>
>> 2014-06-25 2:16 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi dude,
>>>
>>> The code works very well, but only 450 games out of 500 are retrieved. I think it is not scraping two of the 20 pages on steamcharts.com. One page it is ignoring is http://steamcharts.com/top/p.1; none of the games from that page appear in the output CSV.
>>>
>>> I tried editing the regex allow=r'(top\/p\.)([1-9]|1[0-9]|20)$' but could not resolve it. Could you please help me?
>>>
>>> On Tuesday, June 24, 2014 4:35:10 AM UTC-7, Lhassan Baazzi wrote:
>>>
>>>> Hi,
>>>>
>>>> Complex task. Look at the GitHub repository: I pushed the script for the top 500 games; see also output.csv.
>>>>
>>>> There are some minor bugs to fix.
>>>>
>>>> Regards.
>>>>
>>>> Merci.
>>>>
>>>> 2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> The game links I want to scrape are the top 500 games extracted from http://steamcharts.com/top.
>>>>>
>>>>> Some example URLs are http://store.steampowered.com/app/570/, http://store.steampowered.com/app/730/, and http://store.steampowered.com/app/440/. These are the top 3 games on steamcharts.com.
>>>>>
>>>>> These top-500 game URLs don't have any unique structure; they look like all the other game URLs. So how do I scrape only these URLs?
>>>>>
>>>>> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I need the structure of the links you want to scrape. If you look at my code, you can see that I limit the links with this:
>>>>>>
>>>>>>     rules = (
>>>>>>         Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>>>>>         Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
>>>>>>     )
>>>>>>
>>>>>> Only links matching app/\d+ and genre/ are followed; anything else is rejected. So, give me an example of these specific links.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> Merci.
>>>>>>
>>>>>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>>
>>>>>>> Hi Lhassan,
>>>>>>>
>>>>>>> Thanks for your response. Your code was amazing and I got what I was looking for. But I want to crawl only a specific set of URLs, i.e. I don't want to crawl all games. So I specified those URLs in start_urls, but I learned that we can't use both def start_requests(self) and start_urls.
>>>>>>>
>>>>>>> So do you have any idea about this?
>>>>>>> I just want to scrape a specific set of URLs (some 500 URLs), not all of them.
>>>>>>>
>>>>>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I created a GitHub project containing a Scrapy project that scrapes data from this website; see the repository: https://github.com/jbinfo/scrapy_store.steampowered.com
>>>>>>>> Look at it, clone the project locally, and fix any bugs.
>>>>>>>>
>>>>>>>> If you like it, you can make a donation; see my email signature.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I am new to Scrapy and have an issue I don't know how to solve.
>>>>>>>>>
>>>>>>>>> I need to scrape game info from http://store.steampowered.com/agecheck/app/252490/, but it requires an agecheck before the game data can be scraped. The site stores this info in cookies (I guess), as it does not ask for the agecheck again for subsequent games, i.e. we only need to enter the age for the first game.
>>>>>>>>>
>>>>>>>>> So my problem is how to automatically submit the drop-down values in Scrapy, store them as cookies, and use those cookies for the subsequent start URLs.
>>>>>>>>>
>>>>>>>>> Please help me, friends. Thanks in advance.
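The agecheck handling the thread converges on (detect the age-gate page, submit the form, otherwise parse the game page) can be sketched in plain Python. This is a minimal sketch, not the repository's actual code: the `/agecheck/` URL marker, the `agecheck_form` id, and the `ageDay`/`ageMonth`/`ageYear` field names are assumptions about Steam's age gate, so verify them against a live response before relying on them.

```python
def is_agecheck(url, body):
    """Heuristic: does this response look like Steam's age-gate page?

    Both checks are assumptions about the age gate's markup/URL.
    """
    return "/agecheck/" in url or 'id="agecheck_form"' in body


def age_form_data():
    """Form fields to submit to pass the age gate (assumed field names)."""
    return {"ageDay": "1", "ageMonth": "January", "ageYear": "1980"}


# In a Scrapy spider, the callback would then look roughly like:
#
#   def parse(self, response):
#       if is_agecheck(response.url, response.text):
#           # Re-enter parse() once the age gate is passed; Scrapy's
#           # cookie middleware keeps the resulting cookie for later URLs.
#           yield FormRequest.from_response(
#               response, formdata=age_form_data(), callback=self.parse)
#       else:
#           yield self.parse_game(response)
```

Because Scrapy persists cookies per session by default, passing the gate once is usually enough for subsequent game pages, which matches the behaviour described in the original question.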
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
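The CSV-driven crawl suggested in the final reply (read the game URLs from the attached CSV, generate one request per game) can be sketched as follows. The column name "url" is an assumption; the attached CSV's actual layout may differ.

```python
import csv
import io


def urls_from_csv(csv_text):
    """Extract the game URLs from CSV text, one per row.

    In a Scrapy spider this loop would sit inside start_requests(),
    yielding scrapy.Request(url, callback=self.parse) for each row,
    which avoids the start_urls/start_requests conflict raised earlier
    in the thread.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"].strip() for row in reader if row.get("url")]


# Hypothetical two-row CSV matching the example URLs from the thread:
sample = (
    "name,url\n"
    "Dota 2,http://store.steampowered.com/app/570/\n"
    "CS:GO,http://store.steampowered.com/app/730/\n"
)
print(urls_from_csv(sample))
```

Each URL returned here would then go through the same agecheck-aware callback discussed earlier in the thread.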
