Hi Francis, Apologies for not following up earlier, I've been a bit busy off late, and while I saw your questions and had a look at the classes, I couldn't follow up at greater length, especially considering that HCATALOG-42, which has been waiting on committing changes a bunch of those classes. And on that end, I've not looked at your patches yet, but will soon.
Regarding your comments: a) For configuration parameters - HCatTableInfo by itself is not supposed to contain any parameters specific to any storage drivers. The reason for this is that HCatTableInfo is how the M/R programmer passes on info to HCatInputFormat, and thus, should not contain anything specific to any storage driver implementation as you mention. So, there is already a place for that, and that is stored in the table(and partition metadata), in the Table and Partition objects, as Table.getStorageDescriptor.getParameters() and Partition.getStorageDescriptor.getParameters(). This is read by HCatInputFormat/HCatOutputFormat and passed on to the respective ISD OSD as part of the initialize() call and also in the getInputFormat() and getOutputFormat() calls, and all properties have a hcat.* keyname. Have a look at PigStorageInputDriver as an example - it reads a delim parameter. Or the RCFileInput/OutputDriver. b) JobInfo/OutputJobInfo - Agreed completely, and this is historical, and in need of cleaning up. We used to have two different JobInfos, and we tried to refactor it to be just one, and called it JobInfo. As time progressed, someone needed something extra, and introduced OutputJobInfo. This does need a proper refactor, and I'm glad to take suggestions. c) StorageDriver.initialize() is called at several portions of the code - I have to check to see if passing it separately makes sense, especially if it's already wrapped inside the context, as it currently is. I'll get back to you on this after some checking. Thanks again for the interest, and I hope we can refactor this to be cleaner and easier. :) Thanks, -Sushanth On Wed, Jul 20, 2011 at 10:24 AM, Francis Christopher Liu <[email protected]> wrote: > Hi, > > I've submitted a patch for my concern about the three classes: > > HCATALOG-64 > > I've also submitted a small patch which is needed to support tables which > are not stored as files: > > HCATALOG-60 > > Let me know what you guys think. > > -Francis > > > On 7/15/11 3:09 PM, "Francis Christopher Liu" <[email protected]> wrote: > >> Hi, >> >> I¹m part of a team that¹s working on adding Hbase support to hcatalog. We¹re >> just getting our feet wet with the source code. And have some questions. Any >> help would be appreciated to get things going. >> >> As part of writing the storage drivers for Hbase we need to add a few more >> configuration parameters (ie range of versions to read, version number to use >> when writing, etc). Since setInput/setOutput takes in HCatTableInfo as a >> parameter. It would seem this is the right place to put it? Also when adding >> parameters it wouldn¹t be good design to put implementation specific >> parameters into HCatTableInfo. So would it be better to subclass this class >> or >> add a Properties field to store such information? >> >> JobInfo seems to only be used for Input as OutputJobInfo is for output. >> Shouldn¹t we rename the class to InputJobInfo? Also JobInfo doesn¹t have a >> reference to HCatTableInfo while OutJobInfo does info does. Given this is the >> Hcat context used by the storage drivers shouldn¹t it be there? >> >> As for the role of the classes it seems to me that it would make much more >> sense to have *JobInfo passed as the parameter for setInput/setOutput. Looks >> to me, HCatTableInfo should contain the state of things as persisted in the >> metastore while *JobInfo classes should contain the job-specific information? >> We could have a factory method which creates *JobInfo object as well as it¹s >> referenced HCaTableInfo object. >> >> Also *StorageDriver.initialize() is not passed the *JobInfo. I know it¹s >> possible to deserialize the object from Context object but wouldn¹t it be >> cleaner to just pass it? >> >> Let me know what you guys think. Feel free to point out misinterpretations I >> have made this¹ll help us understand better how things work together. >> >> -Francis >> >> > >
