In my case PhantomJS had the following issues:

- Sometimes a site uses broken HTML, and jQuery gives a different result in 
PhantomJS than in Chrome.
- There were cases where I couldn't trigger a 'click' event from PhantomJS, 
when the site used some strange way to register its onclick function.
- PhantomJS is incompatible with Node.js. There are some non-standard 
bindings; I tried three of them, but for me all of them were very 
unstable, so in the end I just gave up.

import.io is an interesting idea and seems like a useful service. Sadly, 
our case was a little more complicated: there are lots of complex 
interactions (click here, wait until something appears there; if it 
appears, go here next; if it doesn't, go there; and so on). I doubt you can 
program such behavior using a GUI or some sort of DSL.
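
To make that kind of "click, wait, then branch" interaction concrete, here is 
a minimal sketch in plain JavaScript. It is not our actual crawler code. The 
`page` object with `click` and `exists`, the selectors, and the `fakePage` 
stand-in are all hypothetical and do not correspond to any particular 
library's API.

```javascript
// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Resolves with the truthy value, or with null on timeout.
async function waitUntil(check, timeoutMs = 5000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) return null;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical flow: click a button, wait for a panel, then branch.
// `page` is an assumed minimal interface: click(selector), exists(selector).
async function openDetails(page, timeoutMs = 5000) {
  await page.click('#show-details');
  const appeared = await waitUntil(() => page.exists('#details'), timeoutMs);
  if (appeared) {
    // The panel showed up: continue inside it.
    await page.click('#details .next');
    return 'details';
  }
  // The panel never showed up: take the alternative route.
  await page.click('#fallback');
  return 'fallback';
}

// In-memory stand-in for a browser page, only to demonstrate the flow.
// panelDelayMs = null means the panel never appears.
function fakePage(panelDelayMs) {
  const visible = new Set();
  const clicks = [];
  return {
    clicks,
    async click(selector) {
      clicks.push(selector);
      if (selector === '#show-details' && panelDelayMs !== null) {
        setTimeout(() => visible.add('#details'), panelDelayMs);
      }
    },
    async exists(selector) {
      return visible.has(selector);
    },
  };
}
```

With a real browser driver, `click` and `exists` would wrap the driver's own 
calls. The point is only that each step is ordinary code with a timeout and 
an if/else, which is hard to express in a point-and-click tool.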

Also, we use our crawler heavily and it consumes a huge amount of resources 
(99% of that goes to Selenium plus the browser emulators). It is costly even 
if you pay only for the physical servers. If, on the other hand, you use a 
service provided by another company, you pay twice, for the servers and for 
their service, so it would cost us even more. In our case it was cheaper to 
spend one month developing such a service ourselves.

On Saturday, 26 April 2014 17:29:56 UTC+4, Duy Nguyen wrote:
>
> I did a scraper with phantomjs before, it works great but I think you 
> should take a look at https://import.io/
>
>
>
>
> On Sat, Apr 26, 2014 at 7:42 AM, Alexey Petrushin 
> <alexey.p...@gmail.com> wrote:
>
>> I recently finished such a project: a crawler for JavaScript sites, with 
>> a browser emulator (Selenium). 
>>
>> It's a private project, but I wrote up some details about it and how it 
>> works; maybe it will be interesting to someone.
>>
>> http://alex-craft.com/blog/2014/crawling-javascript-sites
>>
>> On Thursday, 16 January 2014 06:09:48 UTC+4, Victor Hooi wrote:
>>
>>> Hi,
>>>
>>> I'm wondering if anybody knows of any web-scraping frameworks in Node.JS?
>>>
>>> Previously, there was node.io (https://github.com/chriso/node.io), 
>>> however, the project was recently discontinued.
>>>
>>> Googling for Node.JS and web scraping, most of the guides online just 
>>> talk about using requests and cheerio - it works, but you need to handle a 
>>> whole bunch of things yourself (throttling, distributing jobs, 
>>> configuration, managing jobs etc.).
>>>
>>> On the Python side, I know of Scrapy (https://github.com/scrapy/scrapy), 
>>> which is using Twisted for asynchronicity
>>>
>>> On the Ruby side, Nokogiri (http://nokogiri.org/) is meant to be good, 
>>> although I haven't dived into it much.
>>>
>>> Is there anything equivalent in the Node world? Or what approaches are 
>>> people using to tackle this problem?
>>>
>>> Cheers,
>>> Victor
>>>
>>  -- 
>> -- 
>> Job Board: http://jobs.nodejs.org/
>> Posting guidelines: 
>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>> You received this message because you are subscribed to the Google
>> Groups "nodejs" group.
>> To post to this group, send email to nod...@googlegroups.com
>> To unsubscribe from this group, send email to
>> nodejs+un...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/nodejs?hl=en?hl=en
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "nodejs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to nodejs+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Nguyen Hai Duy
> Mobile : 0914 72 1900
> Skype: nguyenhd2107
> Yahoo: nguyenhd_lucky
>  
