On 06/09/10 09:32, 褚 鵬兵 wrote:
Hi, my Hadoop friends: I have three questions about Hadoop. 1. The speed between the datanodes. With terabytes of data on one datanode, the data transfers from one datanode to another datanode. If that speed is bad, Hadoop will be slow, I think. I have heard of the gNet architecture in Greenplum; what about Hadoop? SAS storage + Gigabit Ethernet is the best answer, isn't it?
If your code has locality, gigabit Ethernet is fine, and it saves the hassle of getting faster stuff to work. Have you ever tried to debug InfiniBand cluster problems?
2. The GUI tool. There is a Hive web tool in Hadoop, but it is too simple for our business work. If Hadoop + Hive is designed into a DWH, how do users work with it? Through a CGI tool (commands)? Through a newly developed web GUI tool?
The community welcomes new contributions. I'd look at Cascading, Datameer's stuff, and other things. Hive is designed for people who know SQL, like PHP developers.
3. A 5-computer Hadoop cluster versus 1 computer running SQL Server 2000.
Hadoop: 5 computers, each Celeron 2.66 GHz, 1 GB memory, Ethernet; namenode + secondarynamenode + 3 datanodes.
SQL Server 2000: 1 computer, Celeron 2.66 GHz, 1 GB memory.
I then ran the same select operation on the same 100 MB of data.
The 5-computer Hadoop cluster took 2 min 30 s.
The 1-computer SQL Server 2000 took 2 min 25 s.
So the result for the 5-computer Hadoop cluster is not good. Why? Can anyone give me some advice? Thanks in advance.
Indexes give RDBMSs speed, but limit their scale. If your dataset fits onto a single MS SQL Server or MySQL machine and you can afford the index costs, stay with that, keeping the data in a RAID array. Hadoop isn't trying to compete in that space, though things like CouchDB are trying to.
However, before you dismiss Hadoop, get in touch with your SQL Server or Oracle account team and say "we are planning on working with 15 petabytes of storage with data coming in at 1-2 PB/month" and see what they say back and how big their quote is. The search terms "MapReduce a Major Step Backwards" show some of the debate going on.
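To make the index-versus-scan point a bit more concrete, here is a minimal Python sketch. It is a toy illustration only, not how SQL Server or Hadoop are implemented: the table, the `indexed_lookup` and `full_scan` helpers, and the three-way partitioning are all hypothetical names and choices made up for this example. A sorted key list stands in for a database index, and a scan over several partitions stands in for a job that reads every row.

```python
import bisect

# Hypothetical toy table: (key, value) rows kept sorted by key, so the
# sorted key list can act as a stand-in for a database index.
rows = [(k, f"value-{k}") for k in range(0, 1000, 2)]
keys = [k for k, _ in rows]


def indexed_lookup(key):
    """Find one row by binary search over the sorted keys (O(log n) probes)."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i]
    return None


def full_scan(partitions, predicate):
    """Read every row of every partition and keep the ones that match.

    Each partition could in principle be scanned by a different machine.
    """
    return [row for part in partitions for row in part if predicate(row)]


# Split the table into three partitions, loosely mirroring three datanodes.
partitions = [rows[i::3] for i in range(3)]

hit = indexed_lookup(42)
scan_hits = full_scan(partitions, lambda row: row[0] == 42)
```

Both calls find the same row, but the lookup touches a handful of keys while the scan reads all of them. On a dataset that fits on one machine the index wins easily; the scan only starts to pay off when the partitions are too large for one machine and can be read in parallel.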