Hi,

I need the structure of the links you want to scrape. If you look at my
code, you can see that I restrict the links like this:

from scrapy.contrib.spiders import Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

rules = (
    Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),            # follow genre pages
    Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),  # parse app pages
)

Only links that match app/\d+ (sent to parse_item) or genre/ (followed)
are crawled; everything else is rejected.
So, please give me examples of these specific links.

Regards.


Thanks.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan
Blog: http://blog.jbinfo.io/
Donate via PayPal:
https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN


2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:

> Hi Lhassan,
>
> Thanks for your response. Your code was amazing and I got what I was
> looking for.
> But I want to crawl only a specific set of URLs, i.e. I don't want to
> crawl all games. So I specified those URLs in start_urls[]. But I came to
> know that we can't use both "def start_requests(self)" and "start_urls[]".
>
> So do you have any idea about this? I just want to scrape a specific set
> of URLs (around 500), not all of them.
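
One way to handle this: once you override start_requests, Scrapy ignores
start_urls entirely, so you can drop start_urls and yield a Request for
each URL yourself. A minimal sketch, where urls.txt is a hypothetical file
with one URL per line and parse_item is your extraction callback:

    from scrapy.http import Request

    def start_requests(self):
        # Replaces the default behaviour of iterating start_urls:
        # only the URLs yielded here are requested.
        with open('urls.txt') as f:  # hypothetical file, one URL per line
            for line in f:
                yield Request(line.strip(), callback=self.parse_item)

Pointing the callback straight at parse_item also keeps the crawl rules
from expanding the crawl beyond those ~500 pages.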
>
> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>
>> Hi,
>>
>> I created a GitHub project containing a Scrapy project that scrapes data
>> from this website; see the repository:
>> https://github.com/jbinfo/scrapy_store.steampowered.com
>> Take a look, clone it locally, and correct any bugs.
>>
>> If you like it, you can make a donation; see my email signature.
>>
>> Regards.
>> ---------
>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>> Blog: http://blog.jbinfo.io/
>> Donate via PayPal:
>> https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN
>>
>>
>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi folks,
>>>
>>> I am new to Scrapy, and I have an issue which I don't know how to solve.
>>>
>>> I need to scrape game info from the URL
>>> http://store.steampowered.com/agecheck/app/252490/, but it requires an
>>> age check before I can go further and scrape the game data, so I need
>>> to fill it in once. The website stores this info in cookies (I guess),
>>> as it does not ask for the age check on subsequent games; i.e., we
>>> enter the age for the first game only, and it is stored automatically.
>>>
>>> So my problem is how to automatically submit the drop-down values in
>>> Scrapy, store the resulting cookies, and use those cookies for the
>>> subsequent start URLs.
>>>
>>> Please help me, friends. Thanks in advance.
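
One common way to handle this: submit the age form with
FormRequest.from_response; Scrapy's cookie middleware then stores the
cookie that comes back and sends it automatically on every subsequent
request. A rough sketch for a plain Spider; the form field names and the
parse_game callback are guesses that need checking against the real page:

    from scrapy.http import FormRequest

    def parse(self, response):
        # If we landed on the age gate, submit the form; the session
        # cookie in the reply is kept by Scrapy's cookie middleware.
        if 'agecheck' in response.url:
            return FormRequest.from_response(
                response,
                # Field names are guesses -- inspect the actual form.
                formdata={'ageDay': '1', 'ageMonth': 'January',
                          'ageYear': '1980'},
                callback=self.parse_game,
            )
        return self.parse_game(response)

    def parse_game(self, response):
        # Normal game-page extraction goes here.
        pass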

