Hi, Scrapy downloads the raw HTML of a page; it does not run JavaScript. If you want to extract data that isn't in that initial response (e.g. Ajax data), you can replay the Ajax request yourself with the correct headers and cookies set. I would crawl the page with Scrapy first, then make the Ajax calls.
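To illustrate "replay the Ajax request with the correct headers and cookies": a minimal standard-library sketch below builds such a request by hand. The URL, header values, and cookie are placeholders — copy the real ones from your browser's network tab while the page loads. In a Scrapy spider you would pass the same `headers`/`cookies` to `scrapy.Request` instead.

```python
import urllib.request

# Hypothetical Ajax endpoint -- replace with the URL you see the page
# request in your browser's developer tools.
AJAX_URL = "http://example.com/ajax/data"

# Build the request with the headers/cookies the endpoint expects.
req = urllib.request.Request(
    AJAX_URL,
    headers={
        "X-Requested-With": "XMLHttpRequest",   # many Ajax endpoints check this
        "Referer": "http://example.com/page",   # some also check the referer
        "Cookie": "sessionid=PASTE_YOUR_COOKIE_HERE",  # copy from the browser
    },
)

# urllib.request.urlopen(req) would actually perform the request;
# here we only construct it, since the endpoint above is made up.
```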
On Thursday, April 24, 2014 12:58:03 PM UTC+3, mitch wrote:
> Hi everyone,
>
> I'm just a hack when it comes to this stuff, so this solution is by no
> means elegant.
>
> I have some dynamic content I want to scrape. I have a small number of
> actual pages (< 50), but I want to parse many different page elements.
> Because of this, I thought I'd just manually visit the pages, download the
> HTML source after the JS does its work, then put the files on my own private
> web server and do a quick crawl so that I can have the parsing benefits of
> Scrapy...
>
> The problem I'm running into is that, even after the page has been saved
> as an HTML file, much of the information I want is still hidden inside
> these "hidden_elem" tags and surrounded by comment characters ("<!--"),
> making it invisible to Scrapy. However, the information IS in the code; I
> can open the file and see it plain as day. How can I make Scrapy give it
> to me?
>
> Thanks so much!
-- You received this message because you are subscribed to the Google Groups "scrapy-users" group.
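For the problem mitch describes — data that is present in the source but wrapped in HTML comments, which selectors skip — one workaround is to strip the comment delimiters and parse what's inside as ordinary markup. A minimal stdlib sketch (the sample markup, class names, and price value are made up for illustration; in a spider you would feed `uncommented` back into a `Selector`):

```python
import re

# Sample of the kind of markup described above: the data sits inside a
# comment within a "hidden_elem" tag, so normal selectors never see it.
html = """
<div class="hidden_elem"><!-- <span class="price">$9.99</span> --></div>
"""

# Pull the contents of every HTML comment, then treat the result as
# ordinary markup.
comments = re.findall(r"<!--(.*?)-->", html, re.DOTALL)
uncommented = "\n".join(comments)

# Now the previously hidden element can be matched like any other.
price = re.search(r'class="price">([^<]+)<', uncommented).group(1)
print(price)  # -> $9.99
```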
