Hi William,

I've put a few comments inline...

On 28 March 2010 04:06, William Kang <weliam.cl...@gmail.com> wrote:
>
> Hi,
> I am quite confused about the distributions of data in a HBase system.
> For instance, if I store 10 videos in 10 HTable rows' cell, I assume that
> these 10 videos will be stored in different data nodes (regionservers) in
> HBase.

The distribution of the data depends on the size of the videos. Assuming
the videos are 10MB each, all ten (about 100MB in total) will be contained
within a single region and served by a single region server. Once a region
contains more than 256MB of data (the default), it is split in two. The two
regions will then (probably) be served by two different region servers, and
so on.

You may also be getting the terminology a little mixed up. I'd suggest
having a read of the excellent HBase Architecture 101 article that Lars
George wrote:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

> Now, if I wrote a program that do some processes for these 10 videos
> parallel, what' going to happen?
> Since I only deployed the program in a jar to the master server in HBase,
> will all videos in the HBase system have to be transfered into the master
> server to get processed?

If you run the program from a single machine (don't use the HMaster for
this), then yes, the data would have to be transferred to that machine over
the network.

> 1. Or do I have another option to assign where the computing should happen
> so I do not have to transfer the data over the network and use the region
> server's cpu to calculate the process?
> 2. Or should I deploy the program jar to each region server so the region
> server can use local cpu on the local data? Will HBase system do that
> automatically?
> 3. Or I need plug M/R into HBase in order to use the local data and
> parallelization in processes?
> Many thanks.

HBase uses HDFS to store its files. The data that a region server is serving
does not necessarily reside on the same machine as the region server. As a
result, options 1 and 2 don't really make sense. As Tim Robertson suggests,
you are left with option 3 to consider.

>
>
> William

I hope that helps a little. I'd really strongly recommend that you have a
read of the HBase Architecture 101 article.

Cheers,
Dan
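
P.S. In case the region-split arithmetic above is easier to follow as code, here is a toy Python sketch. It is not HBase code and calls no HBase API: `simulate_regions` and `MB` are made-up names, and the model (rows arrive in key order, and a region is cut into two halves by row count once it grows past the limit) is a simplifying assumption. Real HBase splitting is more involved, so treat this only as an illustration of the 256MB threshold.

```python
MB = 1024 * 1024
# The 256MB default region size mentioned in the reply above.
DEFAULT_MAX_REGION_BYTES = 256 * MB


def simulate_regions(row_sizes, max_region_bytes=DEFAULT_MAX_REGION_BYTES):
    """Toy model of region splitting (an assumption, not HBase's algorithm).

    Rows are appended in key order to the last region. When that region
    holds more than max_region_bytes, it is split into two halves.
    Returns a list of regions, each a list of row sizes in bytes.
    """
    regions = [[]]
    for size in row_sizes:
        last = regions[-1]
        last.append(size)
        # Split only when the limit is exceeded and there is something to divide.
        if sum(last) > max_region_bytes and len(last) > 1:
            mid = len(last) // 2
            regions[-1:] = [last[:mid], last[mid:]]
    return regions


if __name__ == "__main__":
    # Ten 10MB videos total 100MB, which stays under the limit.
    print(len(simulate_regions([10 * MB] * 10)))  # prints 1
    # Thirty 10MB videos pass 256MB at the 26th row, forcing one split.
    print(len(simulate_regions([10 * MB] * 30)))  # prints 2
```

With ten 10MB rows, everything stays in one region, so a single region server would serve all of the videos, which is the situation described above.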