[
https://issues.apache.org/jira/browse/HCATALOG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083856#comment-13083856
]
Francis Liu edited comment on HCATALOG-64 at 8/12/11 1:42 AM:
--------------------------------------------------------------
Thanks for addressing my concerns.
a) So users can specify implementation-specific hints but not parameters? If
your concern is abstraction, this will be exposing the implementation anyway,
except the user is "not sure". Most users won't supply hints unless they know
the actual underlying storage system.
b) An HBaseTableInfo sounds like a good idea but it won't solve our use case,
since it will be populated with metastore information and not job-specific
information (at least not in this patch :-)). Probably you meant an HBaseInputJobInfo class,
which was one of my suggestions in hbase-dev. I didn't go this route because it
was too invasive and I felt we needed to do some experimentation with the first
few drops to really figure things out. BTW the twiki is a bit outdated because
of the refactoring and some redesign. I'll try to migrate the internal one into
confluence.
c) For input we need support for passing a specific version and a range of
versions, per column family. We need this to support the repeatable read
feature we are trying to develop. For output we need just a single version.
It seems your main concern now is not using a properties field for
implementation-specific parameters. We can definitely explore that route, but
why don't we experiment with this simpler solution in the meantime, while we
iron out a better one?
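For illustration only, the properties-based approach in (c) might look roughly like the sketch below. Everything named here (the StorageHints class, the key prefix, the "min:max" encoding) is a hypothetical placeholder and not part of the attached patch or of any HCatalog/HBase API; it only shows how a per-column-family version or version range could travel through a plain properties field.

```java
import java.util.Properties;

// Hypothetical helper: encodes per-column-family version constraints
// into a generic Properties object handed down to a storage driver.
public class StorageHints {
    // Assumed key layout; not an actual HCatalog or HBase constant.
    static final String PREFIX = "hbase.input.versions.";

    /** Record an inclusive version range [min, max] for one column family. */
    public static void setVersionRange(Properties props, String family, long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        props.setProperty(PREFIX + family, min + ":" + max);
    }

    /** A single version is stored as a range whose bounds are equal. */
    public static void setVersion(Properties props, String family, long version) {
        setVersionRange(props, family, version, version);
    }

    /** Returns {min, max}, or null when no constraint was set for the family. */
    public static long[] getVersionRange(Properties props, String family) {
        String raw = props.getProperty(PREFIX + family);
        if (raw == null) {
            return null;
        }
        int sep = raw.indexOf(':');
        long min = Long.parseLong(raw.substring(0, sep));
        long max = Long.parseLong(raw.substring(sep + 1));
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        setVersionRange(props, "cf1", 5L, 9L); // input: a range of versions
        setVersion(props, "cf2", 7L);          // input or output: one version
        long[] range = getVersionRange(props, "cf1");
        System.out.println("cf1 versions " + range[0] + " to " + range[1]);
    }
}
```

The trade-off is the one discussed above: string keys are easy to experiment with, but they are untyped and expose the underlying storage system to whoever sets them.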
was (Author: toffer):
Thanks for addressing my concerns.
a) So users can specify implementation-specific hints but not parameters? If
your concern is abstraction, this will be exposing the implementation anyway,
except the user is "not sure". Most users won't supply hints unless they know
the actual underlying storage system.
b) An HBaseTableInfo sounds like a good idea but it won't solve our use case
since it will be populated with metastore information and not job specific ones
(or at least in this patch :-)). Probably you meant an HBaseInputJobInfo class
which was one of my suggestions in hbase-dev. I didn't go this route because it
was too invasive and I felt we needed to do some experimentation with the first
few drops to really figure things out. BTW the twiki is a bit outdated because
of the refactoring and some redesign. I'll try to migrate the internal one into
confluence.
c) For input we need support for passing a specific version and a range of
versions, possibly per column family. We need this to support the repeatable
read feature we are trying to develop. For output we need just a single version.
It seems your main concern now is not using a properties field for
implementation-specific parameters. We can definitely explore that route, but
why don't we experiment with this simpler solution in the meantime, while we
iron out a better one?
> Refactor HCatTableInfo, JobInfo and OutputJobInfo
> -------------------------------------------------
>
> Key: HCATALOG-64
> URL: https://issues.apache.org/jira/browse/HCATALOG-64
> Project: HCatalog
> Issue Type: Improvement
> Affects Versions: 0.1, 0.2
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.2
>
> Attachments: HCatTableInfo_JobInfo_OutputJobInfo_3.patch
>
>
> These classes and their roles have become convoluted. HCatTableInfo should be
> an HCat abstraction of table and thus not have any job specific information
> and should not contain different information depending on usage. *JobInfo
> classes should contain job specific information (user provided, derived from
> metastore info, etc). Since *JobInfo contains such information it should be
> the object which is passed to HCatInputFormat.setInput and
> HCatOutputFormat.setOutput. Also JobInfo should be renamed to InputJobInfo for
> consistency and clarity. Also there needs to be a way to pass implementation
> specific configuration information down to the actual storage driver.