Hi, I’m part of a team that’s working on adding HBase support to HCatalog. We’re just getting our feet wet with the source code and have some questions. Any help getting things going would be appreciated.
As part of writing the storage drivers for HBase, we need to add a few more configuration parameters (e.g., the range of versions to read, the version number to use when writing, etc.). Since setInput/setOutput take HCatTableInfo as a parameter, it would seem this is the right place to put them? However, it wouldn’t be good design to put implementation-specific parameters into HCatTableInfo. So would it be better to subclass this class, or to add a Properties field to store such information?

JobInfo seems to be used only for input, since OutputJobInfo is used for output. Shouldn’t we rename the class to InputJobInfo? Also, JobInfo doesn’t have a reference to HCatTableInfo while OutputJobInfo does. Given that this is the HCat context used by the storage drivers, shouldn’t it be there?

As for the roles of the classes, it seems to me that it would make much more sense to pass *JobInfo as the parameter to setInput/setOutput. It looks to me as though HCatTableInfo should contain the state of things as persisted in the metastore, while the *JobInfo classes should contain the job-specific information. Is that the intent? We could have a factory method that creates a *JobInfo object as well as its referenced HCatTableInfo object.

Also, *StorageDriver.initialize() is not passed the *JobInfo. I know it’s possible to deserialize the object from the Context object, but wouldn’t it be cleaner to just pass it in?

Let me know what you think. Feel free to point out any misinterpretations I’ve made; it’ll help us better understand how things work together.

-Francis
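P.S. To make the proposal concrete, here is a rough Java sketch of the split described above. It is only a sketch under stated assumptions: every class, method, and property key in it (InputJobInfo.create, OutputJobInfo.create, InputStorageDriverSketch, HBaseInputDriverSketch, the key "hbase.read.max.versions") is a hypothetical stand-in, not the existing HCatalog API, and the HCatTableInfo shown is a simplified placeholder for the real class. HCatTableInfo holds only metastore state, the *JobInfo classes hold job-specific settings plus a Properties bag for driver-specific options, a factory builds both objects together, and the driver's initialize() receives the job info directly.

```java
import java.util.Properties;

// Hypothetical sketch of the proposed split. These are NOT the real
// HCatalog classes, only simplified stand-ins to illustrate the design.

/** Table state as persisted in the metastore (simplified placeholder). */
final class HCatTableInfo {
    private final String dbName;
    private final String tableName;

    HCatTableInfo(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
    }

    String getDbName() { return dbName; }
    String getTableName() { return tableName; }
}

/** Job-specific read settings, holding a reference to the table being read. */
final class InputJobInfo {
    private final HCatTableInfo tableInfo;
    private final String filter;
    // Free-form, driver-specific settings (e.g. an HBase version range),
    // so HCatTableInfo stays free of storage-specific fields.
    private final Properties properties = new Properties();

    private InputJobInfo(HCatTableInfo tableInfo, String filter) {
        this.tableInfo = tableInfo;
        this.filter = filter;
    }

    /** Hypothetical factory that creates the job info together with its
     *  HCatTableInfo. A real version would populate the table info from
     *  the metastore; here it is just constructed directly. */
    static InputJobInfo create(String dbName, String tableName, String filter) {
        return new InputJobInfo(new HCatTableInfo(dbName, tableName), filter);
    }

    HCatTableInfo getTableInfo() { return tableInfo; }
    String getFilter() { return filter; }
    Properties getProperties() { return properties; }
}

/** Output counterpart: also references its HCatTableInfo. */
final class OutputJobInfo {
    private final HCatTableInfo tableInfo;
    private final Properties properties = new Properties();

    private OutputJobInfo(HCatTableInfo tableInfo) {
        this.tableInfo = tableInfo;
    }

    static OutputJobInfo create(String dbName, String tableName) {
        return new OutputJobInfo(new HCatTableInfo(dbName, tableName));
    }

    HCatTableInfo getTableInfo() { return tableInfo; }
    Properties getProperties() { return properties; }
}

/** Drivers receive the job info directly instead of deserializing it
 *  from the Hadoop context (the real method would also take the context). */
interface InputStorageDriverSketch {
    void initialize(InputJobInfo jobInfo);
}

/** Example HBase-flavoured driver reading a hypothetical property key. */
final class HBaseInputDriverSketch implements InputStorageDriverSketch {
    static final String MAX_VERSIONS_KEY = "hbase.read.max.versions"; // hypothetical key
    private int maxVersions = 1;

    @Override
    public void initialize(InputJobInfo jobInfo) {
        // Fall back to a single version when the job does not set the key.
        String value = jobInfo.getProperties().getProperty(MAX_VERSIONS_KEY, "1");
        maxVersions = Integer.parseInt(value);
    }

    int getMaxVersions() { return maxVersions; }
}
```

For example, a job that wants three versions per cell would call info.getProperties().setProperty(HBaseInputDriverSketch.MAX_VERSIONS_KEY, "3") before the driver's initialize(info), after which getMaxVersions() returns 3. Keeping such keys in a Properties field avoids subclassing HCatTableInfo for every storage backend.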
