Hey Bill.

I found what I think are the articles discussing the NASA image/Scrapy example.
Yeah, it's not really running a headless browser at all. It's
"simulating" the piece of what the JavaScript returns for that given
page. But for a complex dynamic site, it's still not a "real"
headless browser.

thanks
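
Bill's other suggestion further down the thread, pulling the data out of the raw page source with a regex, also avoids a browser when the page ships its data inline in a script tag. A minimal sketch, assuming a hypothetical page that embeds a `pageData` JSON object; the variable name and page layout are made up for illustration:

```python
import json
import re

# Hypothetical page source: the data we want is embedded as a JSON object
# in an inline <script>, which is common on JavaScript-heavy pages.
PAGE = """
<html><head>
<script>
  var pageData = {"title": "Widget", "price": 9.99};
  render(pageData);
</script>
</head><body><div id="app"></div></body></html>
"""

# Match `var pageData = {...};` and capture the object literal.
# The non-greedy {.*?} only works because this object has no nested
# braces; real pages may need a more careful pattern.
PAGE_DATA_RE = re.compile(r"var\s+pageData\s*=\s*(\{.*?\});", re.DOTALL)


def extract_page_data(html):
    """Return the embedded pageData dict, or None if it is absent."""
    match = PAGE_DATA_RE.search(html)
    if match is None:
        return None
    # Only works when the embedded literal is valid JSON
    # (double-quoted keys, no trailing commas, no JS expressions).
    return json.loads(match.group(1))


print(extract_page_data(PAGE))  # {'title': 'Widget', 'price': 9.99}
```

Inside a Scrapy callback the same function could be applied to `response.text`. Because it is regex-based it is brittle: nested braces, single-quoted keys or computed values will break it.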


On Mon, Apr 28, 2014 at 1:30 PM, bruce <[email protected]> wrote:
> Bill...
>
> Not sure that's the same... i.e., I don't think Scrapy has a way to
> "wait" for an element to show up on a given page, based on the
> underlying AJAX functions...
>
> I had talked to Pablo about this a while ago and he was saying Scrapy
> couldn't handle this. Are you saying it now can?
>
> This would be cool if it really can.
>
>
> On Mon, Apr 28, 2014 at 1:13 PM, Bill Ebeling <[email protected]> wrote:
>> Scrapy sends a request to the AJAX address just like it does for the normal
>> webpage. You carry data from one request to the next with the meta dict.
>>
>> There was a tutorial on it a while back about scraping the NASA website for
>> its pic of the day. Can't seem to find it now, though. If you take a look
>> at the link in my earlier message, you can read all about it.
>>
>>
>> On Mon, Apr 28, 2014 at 1:01 PM, bruce <[email protected]> wrote:
>>>
>>> I didn't think Scrapy had the ability to run remote AJAX, similar to
>>> CasperJS/PhantomJS/Node.js...
>>>
>>> Does Scrapy run a headless browser process to accomplish this?
>>>
>>> thanks
>>>
>>>
>>> On Mon, Apr 28, 2014 at 10:17 AM, Bill Ebeling <[email protected]> wrote:
>>> > Hey Mitch,
>>> >
>>> > At the risk of stating the obvious, Scrapy handles dynamic content quite
>>> > well. The general approach is to scrape the page, submit requests for the
>>> > AJAX endpoints, stitch the item together, and submit it to the pipeline.
>>> >
>>> > That said, it's not complicated, but it's not trivial, either.
>>> >
>>> > To your specific point, the solution is either to regex it out or to start
>>> > fiddling with the underlying HTML. I would not personally download someone
>>> > else's page and then put it on a server, since the JS is still going to be
>>> > running and logging things and all that.
>>> >
>>> > If you want to look into writing a crawler that gets the dynamic content,
>>> > start here: http://doc.scrapy.org/en/latest/topics/request-response.html
>>> > and pay special attention to the meta dict.
>>> >
>>> > If you want more help with the specific site, provide a link so we can
>>> > see it.
>>> >
>>> > Hope that helps.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups
>>> > "scrapy-users" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send
>>> > an
>>> > email to [email protected].
>>> > To post to this group, send email to [email protected].
>>> > Visit this group at http://groups.google.com/group/scrapy-users.
>>> > For more options, visit https://groups.google.com/d/optout.