On Mon, Apr 5, 2010 at 11:56 PM, Jonathan Gray <[email protected]> wrote:
> Imran,
>
> It's impossible to give good advice on cluster size and hardware
> configuration without some idea of the requirements.
>

Sorry, my mistake; I should have elaborated a little more. Please find some requirements inline below.

> How much data?

To start with, we will not have much, but down the road it will be a lot of data, plus a lot of user-created content. Initially there will be a lot of log-like entries, plus transactions...

> How will the data be queried?

We are designing the system to look up data by key only. All searching will go through Solr: Solr handles the search, and the matching data is then looked up in HBase. In addition, we will have both application-layer caching in Ehcache and a web accelerator (Varnish).

> What kind of load do you expect?

Hard to estimate, but we are planning for a moderate installation, so that we can expand if we get a good response from the market. That's one of our 2 primary reasons for choosing HBase: we will be able to scale it on the fly.

> You are going to be doing offline batch/MapReduce, online random access, as
> well as search all from the same nodes? This can be dangerous.

Yes, the offline batch work and the HBase lookups will be on the same machines. But not search as a whole... Solr will use HDFS only to store the index and read it back from there; no processing related to search will be done there. It will be on a separate box altogether. But your following statement is tempting me to use RAID+DRBD for Solr-based searching.

Another thing to note is that the offline batch work would be to build summary tables. One example from our system would be generating the daily balance sheet of ledgers, profit and loss statement, etc. for 100+ Journals in a PoS SaaS.

> I would strongly recommend against putting Hadoop+HBase on the same nodes as
> something like Solr, unless you have dedicated disks for each. Also, don't
> forget about ZooKeeper which you definitely will need separate nodes/disks
> for if you will be co-locating so many other things.

Hmm.. Based on what I understand from this discussion and Patrick's point on ZK,
I would go for:

- 4 separate DNs (each with its own dedicated disk, though maybe not its own physical server) for Solr only. As a side note, initially we will have 2 Solr search boxes.
- ZK needs a separate disk for performance, so it would get dedicated disks too.

But what I am confused about is how to spread out ZK, multi-master, RS, DN and TT for HBase. Insight, comments and suggestions on this would be most welcome.

Another note on our perspective: we want to scale horizontally by adding more machines, not vertically (if we had wanted that, or could afford it, we would probably have chosen an RDBMS). Being able to scale horizontally as our user base, load and revenue increase was, and is, essential to us.

Waiting eagerly for some insight, comments and/or suggestions.

Thank you.

Imran

>
> JG
>
>> -----Original Message-----
>> From: Imran M Yousuf [mailto:[email protected]]
>> Sent: Monday, April 05, 2010 9:52 AM
>> To: [email protected]
>> Subject: About test/production server configuration
>>
>> Hi,
>>
>> We are a startup who have decided to use HBase purely because we want
>> to take advantage of HDFS based reliability, redundancy, MapReduce and
>> BigTable. For that we are thinking to go for a test environment with 5
>> servers and production environment with 10 servers in both case the
>> Hadoop cluster will be used for HBase + MapReduce + Solr Index.
>>
>> Firstly, I would like some opinion on whether 10 servers is a good
>> number for all 3 purposes or not. Secondly what kind of test
>> environment is currently in use in different organizations. Thirdly, I
>> would like to learn some server configuration and purchase price (with
>> purchase location if possible).
>>
>> Waiting eagerly for some feedback.
>>
>> Thank you,
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: [email protected]
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: [email protected]
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
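
P.S. To make the read path described in the reply concrete (the search tier returns keys, and the records are then fetched by key through an application-level cache), here is a minimal sketch. It is written in Python for brevity and uses only in-memory stand-ins: `LruCache`, `search_then_lookup`, `search_index` and `row_store` are hypothetical names for illustration, not Solr, HBase or Ehcache APIs, and a real deployment would differ in many details.

```python
from collections import OrderedDict


class LruCache:
    """Tiny in-memory LRU cache; a stand-in for an application-layer cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def search_then_lookup(query, search_index, row_store, cache):
    """Return the records matching `query`.

    search_index: dict mapping a search term to a list of row keys
                  (hypothetical stand-in for the search tier).
    row_store:    dict mapping a row key to a record
                  (hypothetical stand-in for the key-value store).
    """
    records = []
    for key in search_index.get(query, []):
        record = cache.get(key)
        if record is None:
            record = row_store.get(key)  # key-only lookup, no scan
            if record is None:
                continue  # index entry is stale; skip it
            cache.put(key, record)
        records.append(record)
    return records


if __name__ == "__main__":
    index = {"ledger": ["row-1", "row-2"]}
    store = {
        "row-1": {"type": "ledger", "balance": 100},
        "row-2": {"type": "ledger", "balance": 250},
    }
    cache = LruCache(capacity=100)
    print(search_then_lookup("ledger", index, store, cache))
    # The second call is served from the cache without touching the store.
    print(search_then_lookup("ledger", index, store, cache))
```

The point of the sketch is only that the store is never scanned: every access to it is a single-key get, and repeated reads are absorbed by the cache in front of it.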
