Hi everyone, I'm just a hack when it comes to this stuff, so this solution is by no means elegant.
I have some dynamic content I want to scrape. There's only a small number of actual pages (< 50), but I want to parse many different page elements. Because of this, I thought I'd just manually visit the pages, download the HTML source after the JS has done its work, put the files on my own private webserver, and do a quick crawl so that I can have the parsing benefits of scrapy.

The problem I'm running into is that, even after the page has been saved as an HTML file, much of the information I want is still hidden inside these "hidden_elem" tags and wrapped in comment markers (<!-- ... -->), making it invisible to scrapy. However, the information IS in the source; I can open the file and see it plain as day. How can I make scrapy give it to me? Thanks so much!
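(For anyone hitting the same wall: one common workaround is to strip the comment delimiters before feeding the HTML to scrapy, so the commented-out markup becomes real markup again. Below is a minimal, stdlib-only sketch of that idea; the sample HTML and the "hidden_elem" class are placeholders for whatever the real page uses, and in scrapy itself you could alternatively select the comment nodes directly with an XPath like //comment() and re-parse their text.)

```python
# Sketch: un-hide markup that the page has wrapped in HTML comments,
# so a normal parser (or scrapy's selectors) can see it.
# The HTML below is a made-up stand-in for the real page.
import re

html = """
<div class="hidden_elem">
  <!-- <span class="price">$19.99</span> -->
</div>
"""

# Replace every comment with its own contents, removing the <!-- -->
# delimiters but keeping the markup that was hidden inside them.
uncommented = re.sub(r'<!--(.*?)-->', r'\1', html, flags=re.DOTALL)

# The span is now ordinary, visible markup.
print('<span class="price">' in uncommented)
```

After this preprocessing step, the saved files can be parsed with scrapy's selectors as usual, since the previously commented-out elements are now part of the regular DOM.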
