Yeah, you are right. I pushed a correction for this to the repository, along with the CSV output: https://github.com/jbinfo/scrapy_store.steampowered.com/blob/master/output.csv

Only 476 games are exported, though; the missing games are due to the website itself. For example, the game ranked 257, http://store.steampowered.com/app/202480/, redirects you to the homepage if you open that link!
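If you want to see which other games are lost this way, the redirect chain followed by RedirectMiddleware is available in response.meta['redirect_urls'], so the spider can detect and log them. A rough, untested sketch of such a guard at the top of parse_item:

    def parse_item(self, response):
        # If the request was redirected and we landed on the store
        # homepage, the game page no longer exists -- skip it.
        if response.meta.get('redirect_urls') \
                and response.url.rstrip('/') == 'http://store.steampowered.com':
            self.log('Redirected to homepage, skipping %s'
                     % response.meta['redirect_urls'][0])
            return
        # ... normal extraction of the game fields continues here ...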
Regards.

Thanks.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan - Twitter: @baazzilhassan <http://twitter.com/baazzilhassan>
Blog: http://blog.jbinfo.io/
Donate - PayPal - <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>

2014-06-25 2:16 GMT+01:00 Chetan Motamarri <[email protected]>:

> Hi dude,
>
> The code works very well, but only 450 games out of 500 are retrieved. I
> think it is not scraping two of the 20 pages on steamcharts.com. I found
> out that one page it is ignoring is "http://steamcharts.com/top/p.1";
> there are no games from that page in the output csv.
>
> I tried editing the regex, allow=r'(top\/p\.)([1-9]|1[0-9]|20)$', but was
> not able to resolve it. Could you please help me?
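Note that your regex does match /top/p.1; the more likely cause is that page 1 of the ranking is really the bare /top URL (and /top/p.1 may simply redirect there, in which case the duplicate filter drops it because /top was already crawled as the start URL). Making the p.N suffix optional lets the extractor accept the bare page as well. The correction I pushed is roughly along these lines (a sketch, not the exact commit):

    rules = (
        # Paging links: the bare /top page plus /top/p.1 .. /top/p.20.
        Rule(SgmlLinkExtractor(allow=r'top(/p\.([1-9]|1[0-9]|20))?$'),
             follow=True),
        # Individual game pages.
        Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
    )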
> On Tuesday, June 24, 2014 4:35:10 AM UTC-7, Lhassan Baazzi wrote:
>
>> Hi,
>>
>> Complex task! Look at the github repository: I pushed the script for
>> the top 500 games; see also the output.csv.
>>
>> There are still some minor bugs to fix.
>>
>> Regards.
>> Lhassan
>>
>> 2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi,
>>>
>>> The game links which I want to scrape are the top 500 games extracted
>>> from http://steamcharts.com/top.
>>>
>>> Some example urls are "http://store.steampowered.com/app/570/",
>>> "http://store.steampowered.com/app/730/" and
>>> "http://store.steampowered.com/app/440/". These are the top 3 games
>>> on steamcharts.com.
>>>
>>> These top 500 game urls don't have any unique structure; they look
>>> like all the other game urls. So how do I scrape only these urls?
>>>
>>> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>>>
>>>> Hi,
>>>>
>>>> I need the structure of the links that you want to scrape. If you
>>>> look at my code, you can see that I limit the links with this:
>>>>
>>>> rules = (
>>>>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>>>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
>>>> )
>>>>
>>>> Only links matching genre/ are followed and only links matching
>>>> app/\d+ are parsed; anything else is rejected. So, give me an
>>>> example of these specific links.
>>>>
>>>> Regards.
>>>> Lhassan
>>>>
>>>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>
>>>>> Hi Lhassan,
>>>>>
>>>>> Thanks for your response. Your code is amazing and I got what I was
>>>>> looking for. But I want to crawl only a specific set of urls, i.e.
>>>>> I don't want to crawl all games, so I specified those urls in
>>>>> start_urls[]. But I came to know that we can't use both
>>>>> "def start_requests(self)" and "start_urls[]".
>>>>>
>>>>> So do you have any idea about this? I just want to scrape a
>>>>> specific set of urls (some 500 urls), not all urls.
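start_urls is only consumed by the default start_requests(), so once you override start_requests() you yield the requests for your 500 urls yourself and you don't need the crawl rules at all. A minimal untested sketch (the class name and the url list are placeholders):

    from scrapy.http import Request
    from scrapy.spider import Spider

    class SteamTopGamesSpider(Spider):
        name = 'steam_top_games'

        def start_requests(self):
            # The 500 urls could be read from a file, or scraped from
            # steamcharts.com/top beforehand.
            urls = ['http://store.steampowered.com/app/570/',
                    'http://store.steampowered.com/app/730/',
                    'http://store.steampowered.com/app/440/']
            for url in urls:
                # parse_item is the same extraction callback as in the
                # CrawlSpider version.
                yield Request(url, callback=self.parse_item)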
>>>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I created a github project containing a scrapy project that
>>>>>> scrapes data from this website; see the repository:
>>>>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>>>>> Look at it, clone the project locally, and fix the bugs.
>>>>>>
>>>>>> If you like it, you can make a donation; see my email signature.
>>>>>>
>>>>>> Regards.
>>>>>> Lhassan
>>>>>>
>>>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I am new to scrapy and I have an issue which I don't know how to
>>>>>>> solve.
>>>>>>>
>>>>>>> I need to scrape game info from the url
>>>>>>> "http://store.steampowered.com/agecheck/app/252490/", but it
>>>>>>> requires an age check before it will show the game data, so I
>>>>>>> need to fill this in once. The website stores the info in cookies
>>>>>>> (I guess), as it does not ask for the age check again for
>>>>>>> subsequent games; i.e. we only need to enter the age for the
>>>>>>> first game, then it is stored automatically.
>>>>>>>
>>>>>>> So my problem is: how do I automatically send the drop-down
>>>>>>> values in scrapy, store them as cookies, and use those cookies
>>>>>>> for the subsequent start urls?
>>>>>>>
>>>>>>> Please help me, friends. Thanks in advance.
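One common way to handle the age gate is to send the agecheck page to a dedicated callback that submits the form with FormRequest.from_response; Scrapy's built-in cookie middleware then keeps the resulting cookie for every later request in the same crawl, so the gate only appears once. A minimal untested sketch (the form field names are guesses -- check the actual <form> in the page source):

    from scrapy.http import FormRequest

    def parse_agecheck(self, response):
        # Submit the age form once; the cookie middleware reuses the
        # session cookie for all subsequent requests automatically.
        return FormRequest.from_response(
            response,
            formdata={'ageDay': '1', 'ageMonth': 'January', 'ageYear': '1980'},
            callback=self.parse_item)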
