Thanks for thinking about ways to optimize this kind of workload. You can start with the following when setting up your cluster:
http://hbase.apache.org/book.html#configuration
For transactions, HBase differs from PostgreSQL. See:
http://hbase.apache.org/book.html#acid

Cheers

On Sat, Apr 27, 2013 at 1:20 PM, Atri Sharma <atri.j...@gmail.com> wrote:

> Hi all,
>
> I have been discussing with Priyank sir the following style of
> workload and whether we can improve HBase's performance in this area.
> The use case is as follows:
>
> 1) Bulk load data.
> 2) Query the data multiple times (read access mostly, and no real-time
> writes).
>
> This is a common workload, and I am pretty interested in benchmarking
> HBase's performance in this area, as well as improving it further.
>
> Please advise me on how I can proceed with benchmarking. Specifically,
> how will I need to set up an HBase cluster, and will there be any specific
> requirements of the cluster for this type of testing?
>
>
> I worked on a patch to improve performance for a similar use case in
> PostgreSQL. The case is pretty similar: bulk load of data, a large
> number of mostly read-only queries, and then deletion of the data.
>
> The optimization I targeted was the cost of writes to disk.
> Specifically, setting the flags (hint bits) that track the commit
> status of the inserting/deleting transaction was causing a write overhead.
> I tried to mitigate this by adding a cache which holds the transaction
> id for the above-mentioned workload, hence reducing the cost
> of writes.
>
> I will start benchmarking once I have the system set up and then start
> thinking of tests. Once I have an outline in my mind, I shall post it
> on the list.
>
> I will require the community's guidance in this a lot.
>
> Thoughts/Comments/Advice please?
>
> Regards,
>
> Atri
>
> --
> Regards,
>
> Atri
> l'apprenant
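
To make the hint-bit idea quoted above a little more concrete, here is a toy Python model of it. This is only a sketch under loud assumptions: it is not PostgreSQL code and not the patch itself, and every name in it (Row, Page, CommitLog, scan_with_hint_bits, scan_with_xid_cache) is made up for illustration. It only shows why setting a per-row flag on first read dirties the page, and how remembering committed transaction ids in memory can avoid that for a bulk-loaded, read-mostly data set.

```python
from dataclasses import dataclass


@dataclass
class Row:
    xmin: int                     # id of the inserting transaction
    hint_committed: bool = False  # toy stand-in for a "committed" hint bit


@dataclass
class Page:
    rows: list
    dirty: bool = False           # a dirty page must eventually be written back


class CommitLog:
    """Toy commit-status store; counts lookups, which model the slow path."""

    def __init__(self, committed):
        self.committed = set(committed)
        self.lookups = 0

    def is_committed(self, xid):
        self.lookups += 1
        return xid in self.committed


def scan_with_hint_bits(page, clog):
    """Count visible rows, setting the hint flag on first sight of each row."""
    visible = 0
    for row in page.rows:
        if not row.hint_committed and clog.is_committed(row.xmin):
            row.hint_committed = True
            page.dirty = True     # setting the flag dirties the page
        visible += row.hint_committed
    return visible


def scan_with_xid_cache(page, clog, cache):
    """Count visible rows, remembering committed ids in memory instead."""
    visible = 0
    for row in page.rows:
        if row.hint_committed or row.xmin in cache:
            visible += 1
        elif clog.is_committed(row.xmin):
            cache.add(row.xmin)   # the page itself is left untouched
            visible += 1
    return visible


def demo():
    clog = CommitLog({7})         # transaction 7 did the bulk load
    a = Page([Row(7) for _ in range(1000)])
    b = Page([Row(7) for _ in range(1000)])
    n_a = scan_with_hint_bits(a, clog)
    lookups_a, clog.lookups = clog.lookups, 0
    n_b = scan_with_xid_cache(b, clog, set())
    return n_a, a.dirty, lookups_a, n_b, b.dirty, clog.lookups
```

In this toy model both scans see the same rows, but only the first one dirties the page. Whether anything comparable applies to HBase is a separate question, since its write path and transaction model are different (see the ACID link above).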