Probably not a good idea, because you want to run the crawler
multi-threaded across a lot of network connections, and doing that inside
the database would hurt Neo4j's performance (also in terms of GC).

It's probably easier to use a message queue to send crawled pages to a
Neo4j server extension and then let the extension run the graph algorithms
you want, to integrate the crawling results into your graph.
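Something along these lines could work as a starting point (just a rough
sketch against the Neo4j 2.x unmanaged-extension API; the /crawler path,
the Page label, the LINKS_TO relationship and the "explored" flag are all
made up, adjust them to your model):

import java.util.HashMap;
import java.util.Map;

import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.graphdb.GraphDatabaseService;

// Unmanaged extension that your message-queue consumer (or the crawler
// itself) can POST crawled links to. Register it via
// org.neo4j.server.thirdparty_jaxrs_classes in neo4j-server.properties.
@Path("/crawler")
public class CrawlerResource {

    private final ExecutionEngine engine;

    public CrawlerResource(@Context GraphDatabaseService db) {
        // One engine per request is fine for a sketch; cache it in
        // production to reuse the query cache.
        this.engine = new ExecutionEngine(db);
    }

    // Record one crawled link: the source page and one outgoing link.
    // MERGE keeps pages unique by URL, so re-crawls are idempotent.
    @POST
    @Path("/link")
    public Response addLink(@QueryParam("from") String from,
                            @QueryParam("to") String to) {
        Map<String, Object> params = new HashMap<>();
        params.put("from", from);
        params.put("to", to);
        engine.execute(
            "MERGE (p:Page {url: {from}}) SET p.explored = true " +
            "MERGE (t:Page {url: {to}}) " +
            "MERGE (p)-[:LINKS_TO]->(t)", params);
        return Response.ok().build();
    }

    // The "frontier": pages we know about but haven't crawled yet,
    // i.e. the "already explored pages" concept inverted.
    @GET
    @Path("/frontier")
    public Response frontier() {
        String urls = engine.execute(
            "MATCH (p:Page) WHERE NOT has(p.explored) " +
            "RETURN p.url AS url LIMIT 100").dumpToString();
        return Response.ok(urls).build();
    }
}

Your queue consumer then just calls POST /crawler/link?from=...&to=...
for every link it extracts, and the crawler threads with all their network
I/O stay completely outside the database process.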

HTH Michael

On Wed, Nov 19, 2014 at 6:45 PM, Pedro Montoto García <[email protected]>
wrote:

> While considering how to implement a domain-specific web crawler I've
> come across a number of technologies, but I had the idea of implementing
> it as a server extension in Neo4j.
>
> The idea would be to use the graph database to implement the concepts of
> "already explored pages" and the "frontier" as server-side algorithms and
> use them to feed the crawling algorithm, but, as you can see, you could
> go a step further and implement the crawling on the server side too.
> Could this be a bad idea? If so, why?
