Hi Annie,

2010/1/5 qin.wang <qin.w...@i-soft.com.cn>:
> Hi team,
>
> When I try to do some research on Hadoop, I have several high-level
> questions; any comments from you would be a great help:
>
> 1. Hadoop assumes the files are big files, but take Google as an
> example: the Google results shown to users seem to be small files, so
> how should I understand the "big files"? And what is the file content,
> for example?

I think "big files" means very large files (larger than 64 MB, the
default HDFS block size). Hadoop uses HDFS as its distributed
filesystem; user logs, web logs, and so on are stored in HDFS, and
engineers can use Hadoop to run analysis on those logs. That said, I
don't know whether Google puts its web pages in a distributed
filesystem like this.

> 2. Why are the files write-once and read-many times?

As mentioned above, the logs are stored in HDFS; these logs are written
once and then read many times by engineers for analysis.

> 3. How do I install other software on Hadoop? Are there any special
> requirements for the software? Does it need to support the Map/Reduce
> model before it can be installed?

I'm not sure exactly what you mean. If you want to add extra jars that
your application uses, the "distributed cache" in Hadoop will help you
(for example, via the -libjars generic option or
DistributedCache.addFileToClassPath).

Good luck!

> It would be very much appreciated.
>
> 王琴 Annie Wang
>
> 6F, Building 7, 418 Guilin Road, Xuhui District, Shanghai
> Zip code: 200233
> Tel: +86 21 5497 8666-8004
> Fax: +86 21 5497 7986
> Mobile: +86 137 6108 8369

--
http://anqiang1900.blog.163.com/
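
P.S. To make the "big files" point concrete, here is a small sketch in plain Java (not Hadoop API code) of how many blocks a file occupies in HDFS, assuming the default 64 MB block size (this is configurable per cluster). A file smaller than one block still costs the namenode one block's worth of metadata, which is why many tiny files are a poor fit for HDFS:

```java
// Illustration only: counts how many HDFS blocks a file of a given
// size would occupy, assuming the default 64 MB block size.
public class BlockCount {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB

    static long blocksFor(long fileBytes) {
        if (fileBytes == 0) {
            return 0;
        }
        // Ceiling division: a partially filled last block still counts.
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blocksFor(1024L * 1024 * 1024)); // 1 GB -> 16 blocks
        System.out.println(blocksFor(1));                   // 1 byte -> 1 block
    }
}
```

So a 1 GB log file is 16 blocks, while 16 separate 1-byte files would also be 16 blocks of namenode metadata for almost no data.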