[
https://issues.apache.org/jira/browse/HCATALOG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083787#comment-13083787
]
Sushanth Sowmyan commented on HCATALOG-64:
------------------------------------------
@Alan: Quick reply for a couple of your points, the rest hold:
#3 : We want to keep these separate because we strip/add partition
columns as needed - these columns are not actually stored in the data
itself and are not part of the Table/Partition schema, so we add them in when
we read and strip them when we write. So it's a useful separation.
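
To make the read/write asymmetry concrete, here is a minimal sketch, not HCatalog code. The class and method names are hypothetical, and it assumes partition columns are simply appended after the stored columns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an HCatalog class.
final class PartitionColumnSketch {
    private PartitionColumnSketch() {}

    // Read path: the stored record lacks partition columns, so append the
    // partition values (known from metadata) after the stored columns.
    // Assumption: partition columns come last, in the map's iteration order.
    static List<Object> addPartitionColumns(List<Object> storedRecord,
                                            Map<String, String> partitionValues) {
        List<Object> full = new ArrayList<>(storedRecord);
        full.addAll(partitionValues.values());
        return full;
    }

    // Write path: drop the trailing partition columns so that only the
    // columns actually kept in the data get written out.
    static List<Object> stripPartitionColumns(List<Object> fullRecord,
                                              int partitionColumnCount) {
        if (partitionColumnCount < 0 || partitionColumnCount > fullRecord.size()) {
            throw new IllegalArgumentException("bad partition column count: "
                    + partitionColumnCount);
        }
        int storedCount = fullRecord.size() - partitionColumnCount;
        return new ArrayList<>(fullRecord.subList(0, storedCount));
    }
}
```

Keeping the two column sets apart means both directions only need the stored columns plus the partition values, which is the separation argued for above.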
#7 : Was discussed on hcatalog-dev mailing list as not belonging here:
The relevant bit from my response there:
--
a) For configuration parameters - HCatTableInfo by itself is not
supposed to contain any parameters specific to any storage drivers.
The reason for this is that HCatTableInfo is how the M/R programmer
passes on info to HCatInputFormat, and thus, should not contain
anything specific to any storage driver implementation as you mention.
So, there is already a place for that: it is stored in the
table (and partition) metadata, in the Table and Partition objects, as
Table.getStorageDescriptor.getParameters() and
Partition.getStorageDescriptor.getParameters(). This is read by
HCatInputFormat/HCatOutputFormat and passed on to the respective ISD/OSD
as part of the initialize() call and also in the getInputFormat()
and getOutputFormat() calls, and all properties have an hcat.* key name.
Have a look at PigStorageInputDriver as an example - it reads a delim
parameter. Or the RCFileInput/OutputDriver.
--
There is already a place for the implementation specific configuration
information in metadata, which is where it'd be necessary to store it for any
manner of persistence of this information.
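
As a rough sketch of that flow (not the actual HCatalog implementation): the hcat.*-prefixed entries are picked out of the storage-descriptor parameter map and handed to a driver's initialize(). The class names, the stand-in driver, and the key "hcat.example.delim" are all hypothetical, chosen only for illustration:

```java
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these types are HCatalog classes.
final class DriverParameterSketch {
    static final String HCAT_PREFIX = "hcat.";

    private DriverParameterSketch() {}

    // Copy only the hcat.*-prefixed entries from the storage-descriptor
    // parameters into the Properties object handed to a storage driver.
    static Properties driverProperties(Map<String, String> sdParameters) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : sdParameters.entrySet()) {
            if (e.getKey().startsWith(HCAT_PREFIX) && e.getValue() != null) {
                props.setProperty(e.getKey(), e.getValue());
            }
        }
        return props;
    }

    // Stand-in for a storage driver that reads a delimiter parameter.
    static final class DelimitedDriver {
        // "hcat.example.delim" is a made-up key name, not a real HCatalog key.
        static final String DELIM_KEY = "hcat.example.delim";
        private String delimiter = "\t";

        void initialize(Properties props) {
            delimiter = props.getProperty(DELIM_KEY, delimiter);
        }

        String[] split(String line) {
            // Quote the delimiter so it is matched literally, not as a regex.
            return line.split(Pattern.quote(delimiter), -1);
        }
    }
}
```

Nothing driver-specific has to travel through HCatTableInfo in this arrangement; the driver sees only what was persisted in the metadata.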
> Refactor HCatTableInfo, JobInfo and OutputJobInfo
> -------------------------------------------------
>
> Key: HCATALOG-64
> URL: https://issues.apache.org/jira/browse/HCATALOG-64
> Project: HCatalog
> Issue Type: Improvement
> Affects Versions: 0.1, 0.2
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.2
>
> Attachments: HCatTableInfo_JobInfo_OutputJobInfo_3.patch
>
>
> These classes and their roles have become convoluted. HCatTableInfo should be
> an HCat abstraction of table and thus not have any job specific information
> and should not contain different information depending on usage. *JobInfo
> classes should contain job specific information (user provided, derived from
> metastore info, etc). Since *JobInfo contains such information it should be
> the object which is passed to HCatInputFormat.setInput and
> HCatOutputFormat.setOutput. Also JobInfo should be renamed to InputJobInfo for
> consistency and clarity. Also there needs to be a way to pass implementation
> specific configuration information down to the actual storage driver.