Re: Benchmarking and improvement of HBase's performance for a common bulk data workload

Ted Yu Sat, 27 Apr 2013 02:14:39 -0700

Thanks for thinking about ways to optimize this kind of workload.

You can start with the following when setting up your cluster:
http://hbase.apache.org/book.html#configuration


For transactions, HBase's guarantees differ from PostgreSQL's. See:
http://hbase.apache.org/book.html#acid

Cheers

On Sat, Apr 27, 2013 at 1:20 PM, Atri Sharma <[email protected]> wrote:

> Hi all,
>
> I have been discussing the following style of workload with Priyank
> sir, and whether we can improve HBase's performance in this area.
> The use case is as follows:
>
> 1) Bulk load data.
> 2) Query the data multiple times (read access mostly, and no real-time
> writes).
>
> This is a common workload, and I am interested in benchmarking
> HBase's performance in this area, as well as in improving it further.
>
> Please advise me on how I can proceed with benchmarking. Specifically,
> how should I set up an HBase cluster, and are there any specific
> requirements on the cluster for this type of testing?
>
>
> I worked on a patch to improve performance for a similar use case in
> PostgreSQL. The case is pretty similar: bulk load of data, a large
> number of mostly read-only queries, and then deletion of the data.
>
> The optimization I targeted was the cost of writes to disk.
> Specifically, setting flags (hint bits) to track the commit status
> of the inserting/deleting transaction was causing write overhead.
> I tried to mitigate this with a cache that holds the transaction
> ID for the above-mentioned workload, reducing the cost of those
> writes.
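
To make the hint-bit idea above concrete, here is a rough toy model in
Python. It is only a sketch under stated assumptions: it is neither
PostgreSQL's real visibility code nor the patch mentioned above, and every
name in it (Row, Page, CommitLog, XidCache and the helper functions) is
hypothetical. It shows why a read-only scan can dirty pages when hint bits
are set, and how a small in-memory cache of transaction status can avoid
that.

```python
from dataclasses import dataclass

COMMITTED = "committed"
ABORTED = "aborted"


@dataclass
class Row:
    xmin: int                     # id of the inserting transaction
    hint_committed: bool = False  # on-page flag caching "xmin committed"


class Page:
    def __init__(self, rows):
        self.rows = rows
        self.dirty = False  # a dirty page must eventually be written back


class CommitLog:
    """Stand-in for the authoritative (assumed slower) commit-status lookup."""

    def __init__(self, statuses):
        self.statuses = statuses
        self.lookups = 0

    def status(self, xid):
        self.lookups += 1
        return self.statuses[xid]


def visible_with_hint_bits(page, row, clog):
    """Check visibility, caching the answer on the page itself."""
    if row.hint_committed:
        return True
    if clog.status(row.xmin) == COMMITTED:
        row.hint_committed = True
        page.dirty = True  # the read has just turned into a write
        return True
    return False


class XidCache:
    """In-memory cache of transaction status; never touches the page."""

    def __init__(self, clog):
        self.clog = clog
        self.known = {}

    def is_committed(self, xid):
        if xid not in self.known:
            self.known[xid] = self.clog.status(xid) == COMMITTED
        return self.known[xid]


def visible_with_cache(row, cache):
    """Check visibility without modifying the row or its page."""
    return row.hint_committed or cache.is_committed(row.xmin)


def make_pages(n_pages, rows_per_page, xid):
    """Simulate a bulk load: every row is inserted by one transaction."""
    return [Page([Row(xmin=xid) for _ in range(rows_per_page)])
            for _ in range(n_pages)]


def count_dirty(pages):
    return sum(page.dirty for page in pages)


if __name__ == "__main__":
    clog = CommitLog({7: COMMITTED})
    pages = make_pages(10, 100, xid=7)
    for page in pages:
        for row in page.rows:
            visible_with_hint_bits(page, row, clog)
    print("hint bits: dirty pages =", count_dirty(pages),
          "commit-log lookups =", clog.lookups)

    clog = CommitLog({7: COMMITTED})
    cache = XidCache(clog)
    pages = make_pages(10, 100, xid=7)
    for page in pages:
        for row in page.rows:
            visible_with_cache(row, cache)
    print("xid cache: dirty pages =", count_dirty(pages),
          "commit-log lookups =", clog.lookups)
```

In this model the first scan after a bulk load dirties every page it reads,
while the cached variant leaves all pages clean and consults the commit log
once per transaction. Whether anything comparable applies to HBase would
need to be checked against its own write path.
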
>
> I will start benchmarking once I have the system set up and then start
> thinking of tests. Once I have an outline in my mind, I shall post it
> on the list.
>
> I will need a lot of guidance from the community on this.
>
> Thoughts/Comments/Advice please?
>
> Regards,
>
> Atri
>
> --
> Regards,
>
> Atri
> l'apprenant
>
