[ 
https://issues.apache.org/jira/browse/HCATALOG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083838#comment-13083838
 ] 

Sushanth Sowmyan commented on HCATALOG-64:
------------------------------------------


Hi Francis,

We (Alan, Ashutosh and I) had a quick discussion about this, and here's a 
summary of what I think - please add to it, and maybe we can come up with 
a reasonable way of going forward on this:

a) First and foremost, I would like the Storage Driver to be implemented in 
such a way that even if the M/R program reading the table knows nothing about 
the underlying storage, it still works consistently without passing any 
parameters, i.e. the parameters, if allowed, should be treated as hints rather 
than required parameters. I don't know how I'd enforce this, though.

b) I understand that the HBase storage driver will never be in a position where 
it can contain partitions of another table - either the entire table is an 
HBase table, or it's not. This is good, because it does simplify the scenario 
by addressing a couple of concerns I had earlier. More to the point, I now see 
a potential for something like an HBaseTableInfo which extends HCatTableInfo 
as something that the user can pass in, which has the additional fields as top 
level parameters, and which the storage driver can then use if it detects that 
the object it was given is of the appropriate type.  Also, per 
https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+HBase+Integration+Design
 which you linked on hcatalog-dev, there's mention of a HBaseTableReader member 
of HCatTableInfo as a first stab of integrating as a top level member rather 
than as a configuration parameter.
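
As a rough illustration of the subclass idea in (b), here is a minimal sketch. 
All class names below are hypothetical stand-ins written for this example, not 
the actual HCatalog classes, and the single "version" field is only an assumed 
example of an HBase-specific setting. The point is that a driver can look for 
the more specific type and fall back to its defaults when it is absent:

```java
import java.util.OptionalLong;

// Hypothetical stand-in for the generic table description (not the real HCatTableInfo).
class TableInfoSketch {
    private final String dbName;
    private final String tableName;

    TableInfoSketch(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
    }

    String getDbName() { return dbName; }
    String getTableName() { return tableName; }
}

// Hypothetical HBase-specific subclass that carries the extra setting as a typed field.
class HBaseTableInfoSketch extends TableInfoSketch {
    private final long version;

    HBaseTableInfoSketch(String dbName, String tableName, long version) {
        super(dbName, tableName);
        this.version = version;
    }

    long getVersion() { return version; }
}

// Hypothetical driver-side helper: uses the extra field only when the caller
// passed the HBase-specific type, otherwise reports "nothing requested".
class HBaseDriverSketch {
    static OptionalLong requestedVersion(TableInfoSketch info) {
        if (info instanceof HBaseTableInfoSketch) {
            return OptionalLong.of(((HBaseTableInfoSketch) info).getVersion());
        }
        return OptionalLong.empty(); // plain table info: driver picks its own default
    }
}
```

A caller that knows nothing about HBase would keep passing the plain type and 
still get a working read, which is the behaviour asked for in (a).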

c) It might be possible that we have another look at the filter specification 
to see if we can work "version" into that, and if it can be more than just a 
string. What other parameters do you have that you need to pass in?

That said, I'm ok to continue with this - but if we do, we want to revisit 
HCatTableInfo at some point and design it in a more generic way that can 
encompass the notion of a table that works across HBase tables and 
HDFS-IF/OF-based tables. I'm definitely more enthusiastic about/comfortable 
with a top level field for something like getVersion rather than a parameter, 
where all other StorageDrivers would simply ignore that directive.
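
To make the "top level field that other drivers ignore" alternative concrete, 
here is a small sketch. Again, every name in it is hypothetical and invented 
for this example (it is not the HCatalog storage driver API); it only assumes 
that the generic table info carries an optional version:

```java
import java.util.OptionalLong;

// Hypothetical generic table info with the version as an optional top level field.
class GenericTableInfoSketch {
    private final String tableName;
    private final OptionalLong version;

    GenericTableInfoSketch(String tableName, OptionalLong version) {
        this.tableName = tableName;
        this.version = version;
    }

    String getTableName() { return tableName; }
    OptionalLong getVersion() { return version; }
}

// Hypothetical driver contract, reduced to one method for the example.
interface StorageDriverSketch {
    String describeRead(GenericTableInfoSketch info);
}

// A driver for HDFS-backed tables simply never looks at the version.
class HdfsDriverSketch implements StorageDriverSketch {
    public String describeRead(GenericTableInfoSketch info) {
        return "hdfs:" + info.getTableName();
    }
}

// A driver for versioned storage honours the field when it is set.
class VersionedDriverSketch implements StorageDriverSketch {
    public String describeRead(GenericTableInfoSketch info) {
        OptionalLong v = info.getVersion();
        String suffix = v.isPresent() ? "@v" + v.getAsLong() : "@latest";
        return "hbase:" + info.getTableName() + suffix;
    }
}
```

The trade-off compared with a subclass is that the field is visible on every 
table, including ones where it has no meaning.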



> Refactor HCatTableInfo, JobInfo and OutputJobInfo
> -------------------------------------------------
>
>                 Key: HCATALOG-64
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-64
>             Project: HCatalog
>          Issue Type: Improvement
>    Affects Versions: 0.1, 0.2
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>             Fix For: 0.2
>
>         Attachments: HCatTableInfo_JobInfo_OutputJobInfo_3.patch
>
>
> These classes and their roles have become convoluted. HCatTableInfo should be 
> an HCat abstraction of table and thus not have any job specific information 
> and should not contain different information depending on usage. *JobInfo 
> classes should contain job specific information (user provided, derived from 
> metastore info, etc). Since *JobInfo contains such information it should be 
> the object which is passed to HCatInputFormat.setInput and 
> HCatOutputFormat.setOutput. Also JobInfo should be renamed to InputJobInfo for 
> consistency and clarity. Also there needs to be a way to pass implementation 
> specific configuration information down to the actual storage driver.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
