I initially rejected Nutch because of its nonexistent documentation. Let's see what the Apache guys say about adding the module.
On Thursday, November 20, 2014 at 21:47:16 UTC+1, Benoît Simard wrote:
>
> Hi,
>
> There is an Apache project for that: Nutch (https://nutch.apache.org/).
> You can write a plugin for Gora (https://gora.apache.org/) that saves data
> into neo4j.
>
> Cheers
>
> On 20/11/2014 01:46, Michael Hunger wrote:
>
> Probably not so good because you want to run the crawler multi-threaded
> across a lot of network connections and this would affect Neo4j's
> performance (also in terms of GC).
>
> Probably easier to use a message queue to send crawled pages to a neo4j
> extension and then let the extension run the graph algorithms you want to
> use to integrate the crawling results best into your graph.
>
> HTH Michael
>
> On Wed, Nov 19, 2014 at 6:45 PM, Pedro Montoto García <[email protected]> wrote:
>
>> Considering the situation of implementing a domain-specific web crawler
>> I've come across a number of technologies, but I had an idea to implement
>> it as a server extension in neo4j.
>>
>> The idea would be to use the graph database to implement the concepts
>> of "already explored pages" and "frontier" as server-side algorithms and
>> use them to feed the crawling algorithm but, as you see, you can go a step
>> further and implement the crawling in the server side too. Could this be a
>> bad idea? If so, why?
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
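
For reference, here is a rough sketch of the split Michael describes in the quoted thread: crawler threads only push results onto a queue, and a single consumer integrates them into the graph and computes the "already explored" set and the "frontier". Everything in it is an assumption for illustration: `LinkGraph`, `fetch` and `crawl` are made-up names, the graph is a plain in-memory dict standing in for a Neo4j extension, the "web" is a dict of fake pages, and Python's standard `queue.Queue` stands in for a real message queue.

```python
import queue
import threading
from collections import defaultdict


class LinkGraph:
    """Hypothetical in-memory stand-in for the graph store (not Neo4j)."""

    def __init__(self):
        self.out_links = defaultdict(set)  # url -> urls it links to
        self.explored = set()              # urls that were already crawled

    def integrate(self, url, links):
        # Record one crawled page and its outgoing links.
        self.explored.add(url)
        self.out_links[url].update(links)

    def frontier(self):
        # Pages that are linked to but have not been crawled yet.
        linked = set().union(*self.out_links.values())
        return linked - self.explored


def fetch(pages, url, results):
    # Fake fetch: look the URL up in a dict instead of hitting the network,
    # then hand the result to the queue rather than touching the graph.
    results.put((url, pages.get(url, [])))


def crawl(pages, seeds, max_rounds=10):
    graph = LinkGraph()
    results = queue.Queue()
    todo = set(seeds)
    for _ in range(max_rounds):
        if not todo:
            break
        # Crawl the current frontier with one thread per URL.
        workers = [
            threading.Thread(target=fetch, args=(pages, url, results))
            for url in todo
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Single consumer: only this loop ever modifies the graph.
        while not results.empty():
            url, links = results.get()
            graph.integrate(url, links)
        todo = graph.frontier()
    return graph
```

Calling `crawl({"a": ["b", "c"], "b": ["c"], "c": ["a", "d"]}, ["a"])` walks the fake pages round by round until the frontier is empty. The point of the layout is that the multi-threaded, network-bound part never touches the graph directly, which is the concern raised about running the crawler inside the server.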
