[
https://issues.apache.org/jira/browse/HCATALOG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084237#comment-13084237
]
Francis Liu commented on HCATALOG-64:
-------------------------------------
I will upload the design soon.
The design was largely spot on, so there was little need to change it, though I
was not aware that it had been vetted by the HCat team. Thinking about it, these
are the major changes we made:
* Update the design with this refactor patch in mind.
* TableReader, which is a facility for ad hoc reading within an MR job, has been
omitted for now. We are focusing on the storage drivers first.
* Replace the setScan(Scan scan) API exposed to the user with something more
generic: getProperties(). Given that this design was vetted, I don't see why
there is so much resistance to exposing features of the underlying
implementation through an API like this. setSnapshot() is replaced by
getProperties() as well.
* Two implementations of OutputStorageDriver, one using random Puts and the
other using bulk load. Apart from keeping the design incremental, we'd like to
compare the performance of the two.
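
To make the getProperties() point concrete, here is a minimal sketch of how a
generic property bag could stand in for typed setters such as setScan() and
setSnapshot(). Everything in it is an assumption for illustration: the class
names SketchJobInfo and SketchHBaseDriver and the two property keys are
invented here and are not taken from the patch or from the HCatalog API.

```java
import java.util.Properties;

// Hypothetical stand-in for a job-info object; not the real HCatalog class.
final class SketchJobInfo {
    private final Properties properties = new Properties();

    // Generic hook: callers put storage-specific settings here instead of
    // calling typed setters such as setScan(Scan) or setSnapshot(...).
    Properties getProperties() {
        return properties;
    }
}

// Hypothetical HBase-flavored driver that interprets the generic properties.
final class SketchHBaseDriver {
    // Property names are invented for this sketch.
    static final String SCAN_COLUMNS = "sketch.hbase.scan.columns";
    static final String SNAPSHOT = "sketch.hbase.snapshot";

    private String[] columns = new String[0];
    private String snapshot; // null means no snapshot was requested

    // Reads only the keys this driver understands; other keys are ignored,
    // so other drivers can share the same property bag.
    void initialize(SketchJobInfo jobInfo) {
        Properties p = jobInfo.getProperties();
        String cols = p.getProperty(SCAN_COLUMNS, "");
        columns = cols.isEmpty() ? new String[0] : cols.split(",");
        snapshot = p.getProperty(SNAPSHOT);
    }

    String[] getColumns() {
        return columns.clone();
    }

    String getSnapshot() {
        return snapshot;
    }
}
```

With this shape, the user-facing API stays the same no matter which storage
driver is plugged in. The trade-off is that settings become untyped strings
that each driver has to validate itself.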
> Refactor HCatTableInfo, JobInfo and OutputJobInfo
> -------------------------------------------------
>
> Key: HCATALOG-64
> URL: https://issues.apache.org/jira/browse/HCATALOG-64
> Project: HCatalog
> Issue Type: Improvement
> Affects Versions: 0.1, 0.2
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.2
>
> Attachments: HCatTableInfo_JobInfo_OutputJobInfo_3.patch
>
>
> These classes and their roles have become convoluted. HCatTableInfo should be
> an HCat abstraction of a table and thus should not have any job-specific
> information or contain different information depending on usage. *JobInfo
> classes should contain job-specific information (user provided, derived from
> metastore info, etc). Since *JobInfo contains such information it should be
> the object which is passed to HCatInputFormat.setInput and
> HCatOutputFormat.setOutput. Also, JobInfo should be renamed to InputJobInfo
> for consistency and clarity. Finally, there needs to be a way to pass
> implementation-specific configuration information down to the actual storage
> driver.
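
The separation described in the issue could look roughly like the sketch
below: a table abstraction with no job state, a job-info object that wraps it
together with user-provided settings, and a setInput-style entry point that
copies everything into the job configuration. All names ending in "Sketch",
the "sketch.*" configuration keys, and the use of a plain Map as the
configuration are assumptions for illustration, not the classes or signatures
in the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Table abstraction only: holds no job-specific state (hypothetical shape).
final class TableInfoSketch {
    private final String database;
    private final String table;

    TableInfoSketch(String database, String table) {
        this.database = database;
        this.table = table;
    }

    String getDatabase() { return database; }
    String getTable() { return table; }
}

// Job-specific input settings; this is the object handed to setInput.
final class InputJobInfoSketch {
    private final TableInfoSketch tableInfo;
    private final String filter; // user-provided partition filter, may be null
    private final Properties properties = new Properties();

    InputJobInfoSketch(TableInfoSketch tableInfo, String filter) {
        this.tableInfo = tableInfo;
        this.filter = filter;
    }

    TableInfoSketch getTableInfo() { return tableInfo; }
    String getFilter() { return filter; }

    // Implementation-specific settings destined for the storage driver.
    Properties getProperties() { return properties; }
}

final class InputFormatSketch {
    // A plain Map stands in for the job configuration in this sketch.
    static void setInput(Map<String, String> conf, InputJobInfoSketch info) {
        TableInfoSketch t = info.getTableInfo();
        conf.put("sketch.input.table", t.getDatabase() + "." + t.getTable());
        if (info.getFilter() != null) {
            conf.put("sketch.input.filter", info.getFilter());
        }
        // Driver-specific properties are passed through under a prefix so
        // the storage driver can pick them up later.
        for (String name : info.getProperties().stringPropertyNames()) {
            conf.put("sketch.driver." + name,
                     info.getProperties().getProperty(name));
        }
    }
}
```

Because TableInfoSketch carries no job state, the same instance can be reused
for input and output jobs. Anything that varies per job lives on the job-info
object instead.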
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira