I initially rejected Nutch because its documentation was nonexistent. Let's see
what the Apache folks say about adding the module.

On Thursday, 20 November 2014 21:47:16 UTC+1, Benoît Simard wrote:
>
>  Hi,
>
> There is an Apache project for that: Nutch (https://nutch.apache.org/).
> You can write a plugin for Gora (https://gora.apache.org/) that saves data
> into Neo4j.
>
> Cheers
>
> On 20/11/2014 01:46, Michael Hunger wrote:
>  
> Probably not a good idea: you would want to run the crawler multi-threaded
> across a lot of network connections, and that would hurt Neo4j's
> performance (also in terms of GC).
>
>  It is probably easier to use a message queue to send crawled pages to a
> Neo4j extension, and then let the extension run the graph algorithms you
> want to use to best integrate the crawling results into your graph.
>
>  HTH Michael
>  
> On Wed, Nov 19, 2014 at 6:45 PM, Pedro Montoto García <[email protected]> wrote:
>
>> While looking into implementing a domain-specific web crawler, I've come
>> across a number of technologies, but I had the idea of implementing it as
>> a server extension in Neo4j.
>>
>>  The idea would be to use the graph database to implement the concepts
>> of "already explored pages" and "frontier" as server-side algorithms and
>> use them to feed the crawling algorithm. But, as you can see, you could go
>> a step further and implement the crawling on the server side too. Could
>> this be a bad idea? If so, why?
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
