Yahoo is not a huge user of Pig and HBase together yet, so my response
to this is theoretical rather than based on my own need. But if your
work produces a significant improvement, I would definitely say it is
worth contributing. Even if it does not get checked in because we
migrate the trunk to work with the latest HBase (which may already
include this work), it's still worthwhile to have the patch in the JIRA
so that those who are using Pig with older HBase can apply it to their
code and get the benefits.
This functionality should definitely be configurable, since it has
correctness implications.
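
As a rough sketch of what "configurable" could look like: the helper
below keeps the write-ahead log on unless the caller explicitly asks to
skip it. The WalOption class and the "-noWAL" flag name are made up for
illustration; they are not part of HBaseStorage, and the actual patch
may expose the switch differently.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: decides whether puts should go through the
 * write-ahead log, based on an option string handed to the storage
 * function. The "-noWAL" flag name is an assumption for illustration.
 */
final class WalOption {
    static final String NO_WAL_FLAG = "-noWAL"; // hypothetical flag name

    private WalOption() {}

    /** Returns true (WAL on, the safe default) unless the flag is present. */
    static boolean writeToWal(String optionString) {
        if (optionString == null || optionString.trim().isEmpty()) {
            return true; // no options given: keep the WAL enabled
        }
        // Split on whitespace and look for an exact token match only.
        List<String> tokens = Arrays.asList(optionString.trim().split("\\s+"));
        return !tokens.contains(NO_WAL_FLAG);
    }
}
```

The storage function would then apply the result to each Put before
writing it (in 0.20.x I believe the setter is Put.setWriteToWAL(boolean)).
Defaulting to true matters because with the WAL off, any edits still
sitting in a region server's memstore are lost if that server dies
before they are flushed.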
Alan.

On Jan 24, 2011, at 1:22 PM, Corbin Hoenes wrote:
We've got a patch to HBaseStorage that lets a caller turn off the
WriteAheadLog feature while doing bulk loads into HBase.

From the performance tuning wiki page:
http://wiki.apache.org/hadoop/PerformanceTuning

"To speed up the inserts in a non critical job (like an import job),
you can use Put.writeToWAL(false) to bypass writing to the write ahead
log."

We've tested this on HBase 0.20.6 and it helps dramatically. It sounds
like future versions of HBase support a feature like this by default,
so maybe this problem goes away when we start using 0.90?

Is this something valuable to contribute back?