Hi, Scrapy downloads the raw HTML of a page; it does not run JavaScript. If you want to extract data that isn't in that initial response (e.g. Ajax data), you can replay the Ajax request yourself with the correct headers and cookies set. I would crawl the page with Scrapy first, then make the Ajax calls.
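To illustrate "replay the Ajax request with the correct headers and cookies": a minimal standard-library sketch below builds such a request by hand. The URL, header values, and cookie are placeholders — copy the real ones from your browser's network tab while the page loads. In a Scrapy spider you would pass the same `headers`/`cookies` to `scrapy.Request` instead.

```python
import urllib.request

# Hypothetical Ajax endpoint -- replace with the URL you see the page
# request in your browser's developer tools.
AJAX_URL = "http://example.com/ajax/data"

# Build the request with the headers/cookies the endpoint expects.
req = urllib.request.Request(
    AJAX_URL,
    headers={
        "X-Requested-With": "XMLHttpRequest",   # many Ajax endpoints check this
        "Referer": "http://example.com/page",   # some also check the referer
        "Cookie": "sessionid=PASTE_YOUR_COOKIE_HERE",  # copy from the browser
    },
)

# urllib.request.urlopen(req) would actually perform the request;
# here we only construct it, since the endpoint above is made up.
```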
On Thursday, April 24, 2014 12:58:03 PM UTC+3, mitch wrote:
> Hi everyone,
>
> I'm just a hack when it comes to this stuff, so this solution is by no
> means elegant.
>
> I have some dynamic content I want to scrape. I have a small number of
> actual pages (< 50), but I want to parse many different page elements.
> Because of this, I thought I'd just manually visit the pages, download the
> HTML source after the JS does its work, then put the files on my own private
> web server and do a quick crawl so that I can have the parsing benefits of
> Scrapy...
>
> The problem I'm running into is that, even after the page has been saved
> as an HTML file, much of the information I want is still hidden inside
> these "hidden_elem" tags and surrounded by comment characters ("<!--"),
> making it invisible to Scrapy. However, the information IS in the code; I
> can open the file and see it plain as day. How can I make Scrapy give it
> to me?
>
> Thanks so much!
-- You received this message because you are subscribed to the Google Groups "scrapy-users" group.
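For the problem mitch describes — data that is present in the source but wrapped in HTML comments, which selectors skip — one workaround is to strip the comment delimiters and parse what's inside as ordinary markup. A minimal stdlib sketch (the sample markup, class names, and price value are made up for illustration; in a spider you would feed `uncommented` back into a `Selector`):

```python
import re

# Sample of the kind of markup described above: the data sits inside a
# comment within a "hidden_elem" tag, so normal selectors never see it.
html = """
<div class="hidden_elem"><!-- <span class="price">$9.99</span> --></div>
"""

# Pull the contents of every HTML comment, then treat the result as
# ordinary markup.
comments = re.findall(r"<!--(.*?)-->", html, re.DOTALL)
uncommented = "\n".join(comments)

# Now the previously hidden element can be matched like any other.
price = re.search(r'class="price">([^<]+)<', uncommented).group(1)
print(price)  # -> $9.99
```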
