For Pig, my initial thought is to use the actual job conf passed to
methods like:
LoadFunc.setLocation(String location, Job job)
LoadMetadata.getSchema(String location, Job job)
StoreFunc.setStoreLocation(String location, Job job)
since they are passed a configuration and called before any HCat code.
For MR I'm less sure exactly how this would work, but I could look
into it.
Can y'all think of any cases in Pig where HCatContext would be needed
but not yet initialized, if it were initialized in the above methods?
If this sounds like a worthwhile path to explore I can put together a
proof-of-concept patch.
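
Roughly what I have in mind, as a minimal sketch rather than a real patch.
Everything here is hypothetical: java.util.Properties stands in for the
Hadoop conf so it compiles on its own, and LoaderSketch, SchemaSketch and
the property name are made up for illustration.

```java
import java.util.Properties;

// Hypothetical global holder for the job configuration (sketch only).
// Properties is a stand-in for Hadoop's Configuration/JobConf.
final class HCatContext {
    private static final HCatContext INSTANCE = new HCatContext();

    // volatile so a conf set on one thread is visible to others
    private volatile Properties conf;

    private HCatContext() {}

    static HCatContext get() {
        return INSTANCE;
    }

    // Intended to be called from the earliest entry points.
    void setConf(Properties newConf) {
        if (newConf == null) {
            throw new IllegalArgumentException("conf must not be null");
        }
        conf = newConf;
    }

    boolean isInitialized() {
        return conf != null;
    }

    // Fails loudly if some code path runs before the conf was set.
    Properties getConf() {
        Properties current = conf;
        if (current == null) {
            throw new IllegalStateException("HCatContext used before setConf()");
        }
        return current;
    }
}

// Stand-in for a LoadFunc: initializes the context from the conf it is handed.
final class LoaderSketch {
    void setLocation(String location, Properties jobConf) {
        HCatContext.get().setConf(jobConf);
        // ... the rest of the loader setup would follow here
    }
}

// Stand-in for static schema conversion code that needs a setting.
final class SchemaSketch {
    // Hypothetical property name, not an existing HCatalog key.
    static final String BOOLEAN_TO_INT = "hcat.example.convert.boolean.to.int";

    private SchemaSketch() {}

    static boolean booleanToIntEnabled() {
        String value = HCatContext.get().getConf().getProperty(BOOLEAN_TO_INT, "false");
        return Boolean.parseBoolean(value);
    }
}
```

The static conversion code just asks the context for the setting, and the
IllegalStateException would flag any path where the context is needed
before one of the methods above has run.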
--travis
On Thu, Aug 2, 2012 at 7:55 AM, Alan Gates wrote:
>
> +1 to having a global config class. Would HCatContext.get().getConf() return
> the actual JobConf? Or would it return an object that HCat would promise
> would end up in the JobConf? If the former then it's hard to use on the
> front-end in Pig since early on the JobConf doesn't exist yet. If the latter
> you have to do a lot of game playing to make sure your info ends up in the
> actual JobConf properly.
>
> I like the idea, but I need to understand the next level of design and how
> this would interact with Pig's use of the JobConf that is already in place.
>
> Alan.
>
> On Aug 1, 2012, at 6:54 PM, Travis Crawford wrote:
>
> > Hey hcat gurus -
> >
> > Before Pig got full boolean support a common thing was treating them as
> > integers*. I'd like to provide boolean-to-int conversion in HCatalog,
> > enabled with a property, so the following two cases work:
> >
> > (a) Pre-boolean support pig versions can read tables with boolean columns
> > (b) Pig scripts written in the pre-boolean days can continue working, even
> > after updating pig.
> >
> > Most schema conversion stuff happens with static methods, which makes
> > sense, but complicates configuration. Any objection to creating a global
> > static class for stuff like passing configs around? This would be similar
> > to what Pig and Hive already have:
> >
> > UDFContext.getUDFContext().getJobConf();
> > Hive.get().getConf();
> > HCatContext.get().getConf(); <-- proposed new class
> >
> > We would set the conf very early on (HCatLoader, HCatInputFormat) and it
> > could be used to simplify configuration inside HCat. With such a class
> > adding this conversion would be super easy + maintainable, whereas now it
> > would be a very invasive change.
> >
> > Thoughts?
> >
> > --travis
> >
> >
> > *
> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/ThriftToPig.java#L99
>