Re: [Neo4j] Implement crawler in neo4j

Benoit Simard Thu, 20 Nov 2014 12:47:22 -0800

Hi,

There is an Apache project for that: Nutch (https://nutch.apache.org/)

You can write a plugin for Gora (https://gora.apache.org/) that saves data into Neo4j.


Cheers

On 20/11/2014 01:46, Michael Hunger wrote:

Probably not so good because you want to run the crawler multi-threaded across a lot of network connections and this would affect Neo4j's performance (also in terms of GC).
Probably easier to use a message queue to send crawled pages to a Neo4j extension and then let the extension run the graph algorithms you want to use to integrate the crawling results best into your graph.
HTH Michael
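
A minimal sketch of the decoupling suggested above, under loud assumptions: Python's in-process `queue.Queue` stands in for a real message queue, a plain dict stands in for the Neo4j extension's graph store, and `FAKE_WEB`, `crawl_worker`, `graph_consumer` and `run_pipeline` are hypothetical names invented for illustration. It only shows the shape of the idea: many crawler threads publish results, and a single consumer is the only code that writes to the graph.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> outgoing links (replaces real HTTP fetches).
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}


def crawl_worker(urls, out_queue):
    """Crawler side: fetch pages and publish results; never touches the graph."""
    for url in urls:
        links = FAKE_WEB.get(url, [])
        out_queue.put((url, links))


def graph_consumer(in_queue, graph):
    """Database side: the only code that writes to the graph store."""
    while True:
        message = in_queue.get()
        if message is None:  # sentinel: no more crawl results will arrive
            break
        url, links = message
        graph.setdefault(url, set()).update(links)
        for link in links:
            graph.setdefault(link, set())  # make sure link targets exist as nodes


def run_pipeline():
    messages = queue.Queue()
    graph = {}  # stand-in for Neo4j: node -> set of nodes it links to
    consumer = threading.Thread(target=graph_consumer, args=(messages, graph))
    consumer.start()
    workers = [
        threading.Thread(target=crawl_worker, args=(chunk, messages))
        for chunk in (["a"], ["b", "c"])
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    messages.put(None)  # all crawlers are done, tell the consumer to stop
    consumer.join()
    return graph
```

Calling `run_pipeline()` returns the adjacency dict built by the consumer. In a real setup the crawler threads would live in a separate process, so their network and GC load stays off the database JVM.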
On Wed, Nov 19, 2014 at 6:45 PM, Pedro Montoto García <[email protected]> wrote:
    Considering the situation of implementing a domain-specific web
    crawler I've come across a number of technologies, but I had an
    idea to implement it as a server extension in neo4j.

    The idea would be to use the graph database to implement the
    concepts of "already explored pages" and "frontier" as server-side
    algorithms and use them to feed the crawling algorithm but, as you
    see, you can go a step further and implement the crawling on the
    server side too. Could this be a bad idea? If so, why?
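
The "already explored pages" and "frontier" concepts mentioned above can be sketched as a breadth-first traversal. This is only an illustration under assumptions: `crawl`, `fetch_links` and `max_pages` are hypothetical names, the sets live in memory rather than in a graph database, and `fetch_links` is any caller-supplied function that returns the outgoing links of a URL.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl that tracks explored pages and the frontier."""
    explored = set()          # pages already fetched
    frontier = deque([seed])  # pages discovered but not yet fetched
    queued = {seed}           # everything ever put on the frontier
    while frontier and len(explored) < max_pages:
        url = frontier.popleft()
        explored.add(url)
        for link in fetch_links(url):
            if link not in queued:  # skip pages that are explored or waiting
                queued.add(link)
                frontier.append(link)
    return explored, list(frontier)


# Usage with a hypothetical link table instead of real HTTP requests.
LINKS = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": []}
explored_all, frontier_all = crawl("a", lambda url: LINKS.get(url, []))
explored_two, frontier_two = crawl("a", lambda url: LINKS.get(url, []), max_pages=2)
```

With `max_pages=2` the crawl stops early and the remaining frontier holds the pages that were discovered but never fetched.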
    --
    You received this message because you are subscribed to the Google
    Groups "Neo4j" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:[email protected]>.
    For more options, visit https://groups.google.com/d/optout.


