Hi guys, Nutch has its own data formats for the CrawlDB and LinkDB, which are difficult to manage and share among applications. Are there any web crawlers based on a relational database? I can see that Nutch is trying to use HBase for storage, but why not use a relational database instead? We could use partitioning to solve the scalability problem.
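To make the partitioning idea concrete, here is a rough sketch of what I have in mind, not anything Nutch actually does. It assumes crawl records are spread over a fixed number of tables chosen by hashing the URL's host; the shard count and the `crawl_db_NN` table names are made up for illustration.

```python
import hashlib
from urllib.parse import urlsplit

NUM_PARTITIONS = 16  # hypothetical shard count, chosen only for this example


def partition_for(url: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a URL to a partition by hashing its host name.

    Hashing the host (rather than the full URL) keeps all pages of one
    site in the same partition.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def table_for(url: str) -> str:
    """Return the (hypothetical) table name that would hold this URL's record."""
    return f"crawl_db_{partition_for(url):02d}"
```

With this scheme, each partition could live on a different database server, and a fetcher would only need to talk to the server that owns the hosts it is crawling.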
Thanks! Xiao