Hi,

For the curious, and those not following the Stack Overflow question: I posted an alternative solution to Steven's, using a custom LinkExtractor for XML files: http://stackoverflow.com/a/36130453
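Roughly, the core of the idea is to extract the <loc> URLs from the XML sitemap body yourself. This is a minimal stdlib sketch of that extraction step; the class and method names here are illustrative, not Scrapy's actual LinkExtractor API (see the answer above for the real version):

```python
from xml.etree import ElementTree

class XmlSitemapLinkExtractor:
    """Illustrative extractor that pulls <loc> URLs out of an
    XML sitemap body. Names are hypothetical, not Scrapy API."""

    def extract_links_from_body(self, body):
        root = ElementTree.fromstring(body)
        # Sitemap elements are namespaced, e.g. "{http://...}loc";
        # splitting on "}" matches the local tag name regardless
        # of the namespace used.
        return [
            el.text.strip()
            for el in root.iter()
            if el.tag.split('}')[-1] == 'loc' and el.text
        ]

sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/category/page1</loc></url>
  <url><loc>http://example.com/category/page2</loc></url>
</urlset>"""

links = XmlSitemapLinkExtractor().extract_links_from_body(sitemap)
```

The extracted links can then be turned into Requests and handed back to the CrawlSpider's rules for deeper crawling.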
It requires overriding CrawlSpider's _requests_to_follow to allow non-HTML responses (the stock implementation only follows links from HtmlResponse objects). I'll open an issue to make this easier with CrawlSpider.

Hope this helps.
Paul.

On Sunday, March 20, 2016 at 6:00:07 PM UTC+1, Arif Sait Birincioglu wrote:
>
> Repost of:
> http://stackoverflow.com/questions/36100199/crawlspider-to-parse-and-add-links-from-xml-pages-on-the-way
>
> I have created a CrawlSpider for my needs and it works perfectly. However,
> there are XML sitemaps in some categories (not all) on the site I am
> crawling. So I would like to parse the .xml sitemap in these categories,
> get the links, and then leave it to the CrawlSpider to go deeper into
> those links.
>
> I am aware that there are SitemapSpider and XMLFeedSpider; however, I need
> the functionality of CrawlSpider combined with XMLFeedSpider, or
> vice versa.
>
> PS: I have tried Mr. Steven Almeroth
> <http://stackoverflow.com/users/395737/steven-almeroth>'s solution, which
> isn't working for a CrawlSpider. CrawlSpider doesn't parse .xml files; the
> return is null.
>
> Any help would be appreciated.
