Hi

Complex task. Look at the GitHub repository; I pushed the script for the
top 500 games. See also the output.csv.
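
Roughly, the idea is to feed the 500 store URLs in directly as start
requests instead of crawling the whole store. A minimal sketch, assuming
the URLs are saved one per line in a file named top500_urls.txt (the file
name is illustrative; the actual script in the repository may differ):

    import scrapy

    class Top500Spider(scrapy.Spider):
        name = "steam_top500"

        def start_requests(self):
            # Request each of the 500 collected store URLs directly;
            # nothing else gets crawled.
            with open("top500_urls.txt") as f:
                for line in f:
                    url = line.strip()
                    if url:
                        yield scrapy.Request(url, callback=self.parse_item)

        def parse_item(self, response):
            # Per-game extraction goes here.
            pass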

There are some minor bugs to fix.

Regards.

Thanks.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan
Blog: http://blog.jbinfo.io/
Donate - PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN


2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:

> Hi,
>
> The game links I want to scrape are those of the top 500 games listed at
> http://steamcharts.com/top.
>
> Some example URLs are "http://store.steampowered.com/app/570/",
> "http://store.steampowered.com/app/730/", and
> "http://store.steampowered.com/app/440/". These are the top 3 games on
> steamcharts.com.
>
> These top 500 game URLs don't have any unique structure; they look like
> all the other game URLs. So how can I scrape only these URLs?
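>
> For reference, one way to collect exactly these URLs might be to crawl
> the steamcharts.com ranking first and turn each row into a store request.
> A rough sketch only: the pagination pattern (/top/p.N, 25 rows per page)
> and the table selector are assumptions to verify against the live site:
>
>     import scrapy
>
>     class TopGamesSpider(scrapy.Spider):
>         name = "steamcharts_top"
>         # Assumption: 25 rows per page, so pages 1-20 cover the top 500.
>         start_urls = ["http://steamcharts.com/top/p.%d" % n
>                       for n in range(1, 21)]
>
>         def parse(self, response):
>             # Every /app/<id> link in the ranking table becomes a
>             # request for the matching store page.
>             for href in response.xpath("//table//a/@href").extract():
>                 if "/app/" in href:
>                     app_id = href.rstrip("/").split("/")[-1]
>                     yield scrapy.Request(
>                         "http://store.steampowered.com/app/%s/" % app_id,
>                         callback=self.parse_game)
>
>         def parse_game(self, response):
>             # Per-game extraction goes here.
>             pass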
>
> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>
>> Hi,
>>
>> I need the structure of the links that you want to scrape; if you look
>> at my code you can see that I limit the links with this:
>>
>> rules = (
>>     # Follow genre/category listing pages to discover game links
>>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>     # Send each game page (URLs containing app/<id>) to parse_item
>>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
>> )
>>
>> Only links matching app/\d+ and genre/ are followed; anything else is
>> rejected.
>> So, give me an example of these specific links.
>>
>> Regards.
>>
>>
>> Thanks.
>> ---------
>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>> Blog: http://blog.jbinfo.io/
>> Donate - PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN
>>
>>
>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi Lhassan,
>>>
>>> Thanks for your response. Your code was amazing and I got what I was
>>> looking for.
>>> But I want to crawl only a specific set of URLs, i.e. I don't want to
>>> crawl all games. So I specified those URLs in start_urls. But I came to
>>> know that we can't use both "def start_requests(self)" and "start_urls".
>>>
>>> So do you have any idea about this? I just want to scrape a specific
>>> set of URLs (some 500 URLs), not all of them.
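>>>
>>> The reason the two can't be combined: start_urls is only consumed by
>>> the default start_requests(), so overriding that method means the list
>>> is never read. Roughly, the base Spider does this when start_requests()
>>> is not overridden:
>>>
>>>     def start_requests(self):
>>>         # Default behaviour: just turn each start_urls entry
>>>         # into a request.
>>>         for url in self.start_urls:
>>>             yield self.make_requests_from_url(url)
>>>
>>> So for a fixed set of 500 URLs, drop start_urls and yield the requests
>>> yourself inside start_requests().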
>>>
>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>
>>>> Hi,
>>>>
>>>> I created a GitHub project containing a Scrapy project that scrapes
>>>> data from this website; see the repository:
>>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>>> Look at it, clone the project locally, and correct the bugs.
>>>>
>>>> If you like it, you can make a donation; see my email signature.
>>>>
>>>> Regards.
>>>> ---------
>>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>>>> Blog: http://blog.jbinfo.io/
>>>> Donate - PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN
>>>>
>>>>
>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I am new to Scrapy and I have an issue which I don't know how to solve.
>>>>>
>>>>> I need to scrape game info from the URL
>>>>> "http://store.steampowered.com/agecheck/app/252490/", but it requires
>>>>> an age check before it will show the game data, so I need to fill this
>>>>> in once. The website stores the info as cookies (I guess), since it
>>>>> does not ask for the age check again for subsequent games. I.e. we
>>>>> only need to enter the age for the first game; after that it is stored
>>>>> automatically.
>>>>>
>>>>> So my problem is how to automatically submit the drop-down values in
>>>>> Scrapy, store the resulting cookies, and reuse those cookies for the
>>>>> subsequent start URLs.
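>>>>>
>>>>> One commonly suggested shortcut: skip the form entirely by sending
>>>>> the age-verification cookie on the very first request; Scrapy's
>>>>> cookie middleware then keeps it for the rest of the crawl. A sketch,
>>>>> assuming the store reads a "birthtime" cookie (verify the cookie name
>>>>> and value format against the live page):
>>>>>
>>>>>     import scrapy
>>>>>
>>>>>     class SteamGameSpider(scrapy.Spider):
>>>>>         name = "steam_game"
>>>>>
>>>>>         def start_requests(self):
>>>>>             urls = ["http://store.steampowered.com/app/252490/"]
>>>>>             for url in urls:
>>>>>                 # Pre-setting the cookie makes the store skip the
>>>>>                 # age gate; it persists for later requests.
>>>>>                 yield scrapy.Request(
>>>>>                     url,
>>>>>                     cookies={"birthtime": "568022401"},
>>>>>                     callback=self.parse_game)
>>>>>
>>>>>         def parse_game(self, response):
>>>>>             # Normal game-page parsing, no age form in the way.
>>>>>             pass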
>>>>>
>>>>> Please help me, friends. Thanks in advance.
>>>>>
>>>>>
>>>>
>>>
>>
>
