Yahoo is not a heavy user of Pig and HBase together yet, so my response is theoretical rather than based on my own need. But if your work produces a significant improvement, I would definitely say it is worth contributing. Even if it does not get checked in because we migrate the trunk to the latest HBase (which may already include this work), it is still worthwhile to have the patch in the JIRA so that those using Pig with an older HBase can apply it to their code and get the benefits.

This functionality should definitely be configurable, since it has correctness implications.
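
To illustrate what "configurable" could look like, here is a minimal sketch, not the actual patch. It assumes the store function receives its options as strings and that the write-ahead log stays on unless the caller opts out. The `-noWAL` flag name, the `WalOptionSketch` class, and the nested `Put` class are hypothetical stand-ins; the stand-in only mirrors the `writeToWAL(false)` call quoted from the wiki below, so check the real method name against the HBase version in use.

```java
import java.util.Arrays;

public class WalOptionSketch {

    /** Hypothetical stand-in for HBase's Put; only the WAL flag is modelled. */
    static class Put {
        private final byte[] row;
        private boolean writeToWAL = true; // safe default: WAL stays on

        Put(byte[] row) { this.row = row; }

        void writeToWAL(boolean write) { this.writeToWAL = write; }

        boolean getWriteToWAL() { return writeToWAL; }

        byte[] getRow() { return row; }
    }

    /** Returns true only if the caller explicitly passed the (hypothetical) -noWAL flag. */
    static boolean parseNoWal(String[] options) {
        return Arrays.asList(options).contains("-noWAL");
    }

    /** Builds a Put for the row, bypassing the WAL only when the caller opted out. */
    static Put newPut(byte[] row, boolean noWal) {
        Put put = new Put(row);
        if (noWal) {
            put.writeToWAL(false);
        }
        return put;
    }

    public static void main(String[] args) {
        boolean noWal = parseNoWal(args);
        Put put = newPut("row1".getBytes(), noWal);
        System.out.println("writeToWAL=" + put.getWriteToWAL());
    }
}
```

Keeping the WAL enabled by default means a job only gives up durability when the caller asks for it explicitly.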

Alan.

On Jan 24, 2011, at 1:22 PM, Corbin Hoenes wrote:

We've got a patch we've made to HBaseStorage which allows a caller to turn
off the write-ahead log (WAL) while doing bulk loads into HBase.

From the performance tuning wiki page:
http://wiki.apache.org/hadoop/PerformanceTuning
"To speed up the inserts in a non critical job (like an import job), you can
use Put.writeToWAL(false) to bypass writing to the write ahead log."

We've tested this on HBase 0.20.6 and it helps dramatically. It sounds like
future versions of HBase support a feature like this by default, so maybe
this problem goes away when we start using 0.90?

Is this something valuable to contribute back?
