I did some experiments using coprocessors and compared the results with a
vanilla scan, and in one case with MapReduce. I wrote up a blog post about
these experiments, as it was getting difficult to explain them over mail
without figures etc. Please refer to
http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html

The results seem to suggest that coprocessor endpoints are a useful feature
when one needs to access a large number of rows (I can't quantify it as of
now) and generate sparse results. The main advantage is that the processing
is done in parallel (at region-level granularity), and it can be extended to
provide a parallel scanner functionality.
Interestingly, the single-result coprocessor endpoint (i.e. the existing
one) fails when I increase the table size; I tried to do a row count on
100M rows. I need to dig more into it, but I have mentioned my initial
thoughts in the blog.
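
For context, a minimal sketch of invoking that single-result row-count
endpoint could look like the following (assuming HBase's AggregationClient
with AggregateImplementation loaded on the table; the table and column
family names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountEndpointSketch {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    // Assumes AggregateImplementation is registered on the region servers
    // (e.g. via hbase.coprocessor.region.classes).
    AggregationClient aggClient = new AggregationClient(conf);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));   // placeholder column family

    // Each region computes its count server-side; the client folds the
    // per-region results into a single long.
    long rowCount = aggClient.rowCount(
        Bytes.toBytes("test_table"),       // placeholder table name
        new LongColumnInterpreter(),
        scan);
    System.out.println("row count = " + rowCount);
  }
}

The per-region parallelism comes from the endpoint running inside each
region server, so only the small per-region results are shipped back to the
client.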

I want to test them more rigorously and would really appreciate your
feedback on the experiments. I have been at it for a while now, so I need a
fresh pair of eyes to review it.

Thanks a lot for your time.

Cheers,
Himanshu
