I did some experiments using coprocessors and compared the results with a vanilla scan, and in one case with MapReduce. I wrote up a blog post about these experiments, as it was getting difficult for me to explain them over mail without figures etc. Please refer to http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
The results seem to suggest that coprocessor endpoints are a useful feature when one needs to access a large number of rows (I can't quantify the threshold as of now) and generate sparse results. The main advantage is that the processing is done in parallel (at region-level granularity), and it can be extended into a parallel scanner functionality.

Interestingly, the single-result coprocessor endpoint (i.e., the existing one) fails when I increase the table size: I tried to do a row count on 100M rows and it did not complete. I need to dig more into it, but I have mentioned my initial thoughts in the blog post.

I want to test this more rigorously and would really appreciate your feedback on the experiments. I have been at it for a while now, so a fresh pair of eyes doing some review would help. Thanks a lot for your time.

Cheers,
Himanshu
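P.S. In case the pattern I keep referring to is unclear: the idea behind the endpoint approach is that each region computes a small partial result locally and the client only merges those partials, rather than streaming every row back through a scan. Below is a minimal, self-contained sketch of that merge pattern using plain Java threads standing in for regions; the names and the simulated data are illustrative only, not the HBase coprocessor API.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the region-parallel aggregation pattern behind coprocessor
// endpoints: each "region" computes a partial result locally, and the
// client merges the small partials instead of pulling every row back.
// Everything here (region count, row counts) is simulated for illustration.
public class RegionParallelCount {
    public static void main(String[] args) throws Exception {
        // Simulate 4 regions, each holding a slice of the table's rows.
        int rowsPerRegion = 250;
        List<int[]> regions = new ArrayList<>();
        for (int r = 0; r < 4; r++) {
            int[] rows = new int[rowsPerRegion];
            Arrays.fill(rows, 1); // one marker per row
            regions.add(rows);
        }

        // One task per region plays the role of the endpoint code that
        // would run server-side inside that region.
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        List<Future<Long>> partials = new ArrayList<>();
        for (int[] region : regions) {
            partials.add(pool.submit(() -> {
                long count = 0;
                for (int row : region) count += row;
                return count; // sparse result: a single long per region
            }));
        }

        // Client-side merge of the per-region partials.
        long total = 0;
        for (Future<Long> partial : partials) total += partial.get();
        pool.shutdown();

        System.out.println(total);
    }
}
```

The merge step is cheap because each region returns only one number; the failure mode I hit with the single-result endpoint on 100M rows presumably sits on the server side of this picture, not in the merge.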