[ 
https://issues.apache.org/jira/browse/HCATALOG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084237#comment-13084237
 ] 

Francis Liu commented on HCATALOG-64:
-------------------------------------

I will upload the design soon.

The design was largely spot on, so there was little need to change it, though 
I was not aware it had been vetted by the HCat team. Thinking it over, these 
are the major changes we made:

* Update the design with this refactor patch in mind. 
* TableReader, a facility for ad hoc reading within an MR job, has been 
omitted for now. We are focusing on the storage drivers first. 
* Replace the setScan(Scan scan) API exposed to the user with something more 
generic, getProperties(). Given that this design was vetted, I don't see why 
there is so much resistance to exposing features of the underlying 
implementation through an API like this. setSnapshot() is replaced by 
getProperties() as well.
* Two implementations of OutputStorageDriver, one using random Puts and the 
other using bulk load. Besides keeping the design incremental, we'd like to 
compare the performance of the two. 
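
To illustrate the getProperties() idea above, here is a minimal sketch of a 
job-level object carrying a generic property bag that a storage driver reads, 
instead of typed setters such as setScan() or setSnapshot(). Everything in it 
is an assumption made for illustration: JobInfoSketch, StorageDriverSketch and 
the "sketch.*" property keys are hypothetical names, not the classes or keys 
in the patch or the design document.

```java
import java.util.Properties;

// Hypothetical sketch: none of these types are the real HCatalog classes.
public class PropertiesSketch {

    // Stand-in for a job-level info object that carries driver-specific settings.
    static final class JobInfoSketch {
        private final String tableName;
        private final Properties properties = new Properties();

        JobInfoSketch(String tableName) {
            this.tableName = tableName;
        }

        String getTableName() {
            return tableName;
        }

        // Generic property bag in place of typed setters like setScan()/setSnapshot().
        Properties getProperties() {
            return properties;
        }
    }

    // Stand-in for a storage driver that reads only the keys it understands.
    static final class StorageDriverSketch {
        static final String SNAPSHOT_KEY = "sketch.hbase.snapshot";       // assumed key name
        static final String BULK_KEY = "sketch.hbase.bulk.output";        // assumed key name

        String describe(JobInfoSketch info) {
            Properties p = info.getProperties();
            // Fall back to defaults when the user did not set a property.
            String snapshot = p.getProperty(SNAPSHOT_KEY, "latest");
            boolean bulk = Boolean.parseBoolean(p.getProperty(BULK_KEY, "false"));
            return info.getTableName()
                    + " snapshot=" + snapshot
                    + " write=" + (bulk ? "bulk-load" : "puts");
        }
    }

    public static void main(String[] args) {
        JobInfoSketch info = new JobInfoSketch("web_logs");
        info.getProperties().setProperty(StorageDriverSketch.SNAPSHOT_KEY, "42");
        System.out.println(new StorageDriverSketch().describe(info));
    }
}
```

The trade-off in this sketch is that the user-facing object stays free of 
HBase types, while the driver has to document and validate its own keys.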



> Refactor HCatTableInfo, JobInfo and OutputJobInfo
> -------------------------------------------------
>
>                 Key: HCATALOG-64
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-64
>             Project: HCatalog
>          Issue Type: Improvement
>    Affects Versions: 0.1, 0.2
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>             Fix For: 0.2
>
>         Attachments: HCatTableInfo_JobInfo_OutputJobInfo_3.patch
>
>
> These classes and their roles have become convoluted. HCatTableInfo should be 
> an HCat abstraction of a table and thus should not hold any job-specific 
> information or contain different information depending on usage. *JobInfo 
> classes should contain job specific information (user provided, derived from 
> metastore info, etc). Since *JobInfo contains such information, it should be 
> the object that is passed to HCatInputFormat.setInput and 
> HCatOutputFormat.setOutput. JobInfo should also be renamed to InputJobInfo 
> for consistency and clarity. There also needs to be a way to pass 
> implementation-specific configuration information down to the actual storage 
> driver.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
