I am building a search engine that will scrape a number of websites (not the 
whole internet).

I am planning to write the website in Django and use Scrapy for 
crawling/scraping.

I am thinking of using Elasticsearch with Hadoop, as it seems straightforward to 
integrate and there is a lot of documentation and other resources around.
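
To show how I picture scraped items reaching the index, here is a rough sketch 
of a Scrapy-style item pipeline that buffers items and hands them off in 
batches. The class name BulkIndexPipeline and the index_bulk callable are 
hypothetical; index_bulk stands in for whatever Elasticsearch client call ends 
up doing the bulk indexing. The constructor takes arguments, so hooking it into 
Scrapy would still need extra wiring that I have left out.

```python
from typing import Callable, Dict, List


class BulkIndexPipeline:
    """Buffers scraped items and flushes them in batches to an indexer.

    `index_bulk` is a hypothetical stand-in for a real bulk-indexing call.
    """

    def __init__(self, index_bulk: Callable[[List[Dict]], None],
                 batch_size: int = 100):
        self.index_bulk = index_bulk
        self.batch_size = batch_size
        self.buffer: List[Dict] = []

    def process_item(self, item, spider):
        # Called once per scraped item; copy it into a plain dict.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self._flush()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Send whatever is left when the crawl ends.
        self._flush()

    def _flush(self):
        if self.buffer:
            self.index_bulk(self.buffer)
            self.buffer = []
```

Batching is only one option; indexing item by item would be simpler but would 
mean one request per scraped page.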

I also have user-submitted data: reviews, ratings, content moderation, and so 
on. I am wondering whether this could be handled by Hadoop, since this data 
would require joins and complex queries. I am also planning to build analytics 
on top of the Hadoop data. So I am thinking of using a separate database, an 
RDBMS, for the user data. Do you think Hadoop is enough?
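
To make the kind of query I have in mind concrete, here is a rough sketch using 
SQLite as a stand-in for the RDBMS. The table names, column names, and sample 
rows are all hypothetical; the point is only the shape of the join, with 
moderation applied as a filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listing (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE review (
    id INTEGER PRIMARY KEY,
    listing_id INTEGER NOT NULL REFERENCES listing(id),
    rating INTEGER NOT NULL,
    approved INTEGER NOT NULL DEFAULT 0  -- set to 1 once moderated
);
""")

# Hypothetical sample data: two scraped listings and four user reviews.
conn.executemany("INSERT INTO listing (id, title) VALUES (?, ?)",
                 [(1, "Alpha"), (2, "Beta")])
conn.executemany(
    "INSERT INTO review (listing_id, rating, approved) VALUES (?, ?, ?)",
    [(1, 5, 1), (1, 3, 1), (1, 1, 0), (2, 4, 1)],
)


def average_ratings(conn):
    """Average rating and review count per listing, approved reviews only."""
    return conn.execute("""
        SELECT l.title, AVG(r.rating), COUNT(r.id)
        FROM listing AS l
        JOIN review AS r ON r.listing_id = l.id
        WHERE r.approved = 1
        GROUP BY l.id, l.title
        ORDER BY l.title
    """).fetchall()


for title, avg_rating, n_reviews in average_ratings(conn):
    print(title, avg_rating, n_reviews)
```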

I am open to any ideas on how to go about this, or feel free to share what you 
are currently using if you're building something similar. Thanks!
