Hi Uncharted, I read the post you mentioned, and I noticed that several of the issues it reports have since been fixed or mitigated. I also think their use of MongoDB was different and much heavier: they were storing and querying scraped items that could number in the millions. In this project, MongoDB is used only to queue/dequeue the tasks to be run by scrapyd (which was previously handled with SQLite).
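To make the queue/dequeue use concrete, here is a rough sketch of the kind of SQLite-backed priority queue the project used before (table and column names here are my assumptions, not the project's actual schema). The point is that each pop takes a database-wide write lock in SQLite, whereas MongoDB can do the same pop as a single atomic operation per document (e.g. `find_one_and_delete` in pymongo):

```python
import sqlite3
import json

# In-memory stand-in for the on-disk SQLite file scrapyd would use.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, priority REAL, message TEXT)"
)

def enqueue(msg, priority=0.0):
    # Tasks are stored as JSON blobs with a priority for ordering.
    conn.execute(
        "INSERT INTO queue (priority, message) VALUES (?, ?)",
        (priority, json.dumps(msg)),
    )
    conn.commit()

def dequeue():
    # Pop the highest-priority task. The SELECT + DELETE pair needs the
    # database write lock, which is where contention appears when several
    # processes poll the same queue; a MongoDB backend replaces this with
    # one atomic find-and-remove on a single document.
    row = conn.execute(
        "SELECT id, message FROM queue ORDER BY priority DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.commit()
    return json.loads(row[1])

enqueue({"spider": "example"}, priority=1.0)
enqueue({"spider": "other"})
print(dequeue())  # highest priority first: {'spider': 'example'}
```

This is only an illustration of the pattern, not the library's actual code.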
But it was good to learn that they are using HBase. I'll take a look and try to add an interface to the library for those who prefer HBase. Thank you for the advice!

On Tuesday, May 3, 2016 at 10:24:01 AM UTC-3, Uncharted wrote:
>
> Hi
>
> I'm currently starting to work on the same kind of use case.
> I found this article, which does not recommend MongoDB:
> https://blog.scrapinghub.com/2013/05/13/mongo-bad-for-scraped-data/
>
> They say that you'll have the same lock contention with MongoDB; the
> article was written in 2013, so maybe that's no longer the case.
>
> They migrated to HBase, which seems to be the right backend. It is also
> used in the Apache Nutch project.
