Hey hcat gurus -
Before Pig got full boolean support, a common workaround was to treat
booleans as integers*. I'd like to provide boolean-to-int conversion in
HCatalog, enabled with a property, so the following two cases work:
(a) Pig versions that predate boolean support can read tables with boolean
columns
(b) Pig scripts written in the pre-boolean days can continue working, even
after upgrading Pig.
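To make the idea concrete, here is a rough sketch of the conversion I have in mind. The class name and the property key are made up for illustration, and java.util.Properties stands in for the Hadoop Configuration so the snippet compiles on its own:

```java
import java.util.Properties;

// Hypothetical helper, not existing HCatalog code.
final class BooleanToIntSketch {
    // Assumed property name, invented for this example.
    static final String CONVERT_KEY = "hcat.data.convert.boolean.to.integer";

    // Returns 1/0 for Boolean values when the property is "true";
    // every other value, and every value when the property is unset
    // or false, is passed through unchanged.
    static Object convert(Object value, Properties conf) {
        boolean enabled =
            Boolean.parseBoolean(conf.getProperty(CONVERT_KEY, "false"));
        if (enabled && value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0;
        }
        return value;
    }
}
```

The default is off, so existing users see no change in behavior unless they opt in.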
Most schema conversion happens in static methods, which makes sense but
complicates configuration, since there is no instance to hand a conf to.
Any objection to creating a global static class for things like passing
configs around? This would be similar to what Pig and Hive already have:
UDFContext.getUDFContext().getJobConf();
Hive.get().getConf();
HCatContext.get().getConf(); <-- proposed new class
We would set the conf very early on (HCatLoader, HCatInputFormat) and it
could be used to simplify configuration inside HCat. With such a class,
adding this conversion would be easy and maintainable, whereas now it
would be a very invasive change.
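For discussion, a minimal sketch of what such a class could look like. Nothing here exists yet: the class name, setConf(), and the choice to fail when no conf has been set are all assumptions, and java.util.Properties again stands in for the Hadoop Configuration:

```java
import java.util.Properties;

// Hypothetical sketch of the proposed context class.
final class HCatContextSketch {
    private static final HCatContextSketch INSTANCE = new HCatContextSketch();

    // volatile so a conf set by one thread is visible to others.
    private volatile Properties conf;

    private HCatContextSketch() {}

    static HCatContextSketch get() {
        return INSTANCE;
    }

    // Intended to be called once, early (e.g. from the loader or
    // input format), before any static conversion code runs.
    void setConf(Properties conf) {
        this.conf = conf;
    }

    // Fails loudly rather than returning null if nobody set a conf.
    Properties getConf() {
        Properties current = conf;
        if (current == null) {
            throw new IllegalStateException("conf has not been set");
        }
        return current;
    }
}
```

Static conversion methods could then call HCatContextSketch.get().getConf() instead of having a conf threaded through every signature. One thing to keep in mind: static state is per-JVM, so the conf would need to be set wherever the conversion code actually runs, not only where the job is launched.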
Thoughts?
--travis
* https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/ThriftToPig.java#L99