Hi,

For the curious, and those not following the Stack Overflow question: I posted an alternative solution to Steven's, using a custom LinkExtractor for XML files: http://stackoverflow.com/a/36130453
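Roughly, the core of the idea is to extract the <loc> URLs from the XML sitemap body yourself. This is a minimal stdlib sketch of that extraction step; the class and method names here are illustrative, not Scrapy's actual LinkExtractor API (see the answer above for the real version):

```python
from xml.etree import ElementTree

class XmlSitemapLinkExtractor:
    """Illustrative extractor that pulls <loc> URLs out of an
    XML sitemap body. Names are hypothetical, not Scrapy API."""

    def extract_links_from_body(self, body):
        root = ElementTree.fromstring(body)
        # Sitemap elements are namespaced, e.g. "{http://...}loc";
        # splitting on "}" matches the local tag name regardless
        # of the namespace used.
        return [
            el.text.strip()
            for el in root.iter()
            if el.tag.split('}')[-1] == 'loc' and el.text
        ]

sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/category/page1</loc></url>
  <url><loc>http://example.com/category/page2</loc></url>
</urlset>"""

links = XmlSitemapLinkExtractor().extract_links_from_body(sitemap)
```

The extracted links can then be turned into Requests and handed back to the CrawlSpider's rules for deeper crawling.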
It requires overriding CrawlSpider's _requests_to_follow to allow non-HTML responses (the stock implementation only follows links from HtmlResponse objects). I'll open an issue to make this easier with CrawlSpider.

Hope this helps.
Paul.

On Sunday, March 20, 2016 at 6:00:07 PM UTC+1, Arif Sait Birincioglu wrote:
>
> Repost of:
> http://stackoverflow.com/questions/36100199/crawlspider-to-parse-and-add-links-from-xml-pages-on-the-way
>
> I have created a CrawlSpider for my needs and it works perfectly. However,
> there are XML sitemaps in some categories (not all) on the site I am
> crawling. So I would like to parse the .xml sitemap in these categories,
> get the links, and then leave it to the CrawlSpider to go deeper into
> those links.
>
> I am aware that there are SitemapSpider and XMLFeedSpider; however, I need
> the functionality of CrawlSpider combined with XMLFeedSpider, or
> vice versa.
>
> PS: I have tried Mr. Steven Almeroth
> <http://stackoverflow.com/users/395737/steven-almeroth>'s solution, which
> isn't working for a CrawlSpider. CrawlSpider doesn't parse .xml files; the
> return is null.
>
> Any help would be appreciated.
