I am building a search engine that will scrape a number of websites (not the whole internet).
I am planning to write the website in Django and use Scrapy for crawling/scraping. I am thinking of using Elasticsearch with Hadoop, as it seems straightforward to integrate and there is plenty of documentation and other resources around.

I also have user-inputted data: reviews, ratings, content moderation and so on. I am wondering whether this could be handled by Hadoop, since this data would require joins and complex queries. I am also planning to build analytics off the Hadoop data. So I am thinking of using another database, an RDBMS, for this. Do you think Hadoop alone is enough?

I am open to any ideas on how to go about this. Or maybe share what you are currently using if you're building something similar.

Thanks!

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
