[
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760294#action_12760294
]
Schubert Zhang commented on HIVE-493:
-------------------------------------
Thanks Prasad,
1) Do you mean we should run 'alter table <tbl> add partition <partition
spec>' at the end of each MapReduce run?
For example:
The first run of the above MapReduce job creates the directories for
partition-1 and partition-2, each with some files under it. Then we should
run 'alter table <tbl> add partition partition-1' and 'alter table <tbl>
add partition partition-2'.
The second run of the job generates some more files under partition-2 and
creates a new partition directory, partition-3. Then we should run 'alter
table <tbl> add partition partition-2' and 'alter table <tbl> add partition
partition-3'.
Is that right?
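A minimal sketch of the statements in question, assuming a hypothetical
table partitioned by a single column 'ds' (the names partition-1 etc. stand
in for concrete values such as ds='2009-09-28'):

  -- After the first run, which created the partition-1 and partition-2
  -- directories under the table's HDFS location:
  ALTER TABLE my_table ADD PARTITION (ds='partition-1');
  ALTER TABLE my_table ADD PARTITION (ds='partition-2');

  -- After the second run, which added files under the existing partition-2
  -- directory and created a new partition-3 directory:
  ALTER TABLE my_table ADD PARTITION (ds='partition-3');
  -- (whether partition-2 must be re-added is exactly the question above)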
2) After the first run in 1)'s example, is the data in partition-1 and
partition-2 available?
And do you mean that after the second run in 1)'s example, the newly added
data in partition-2 becomes available alongside the old existing data?
3) Yes, coordinating reads and writes is a serious issue, since Hive is not
a strict data server but a loosely coupled solution.
In fact, we are thinking about this issue in our project. Are there any
good practices?
> automatically infer existing partitions of table from HDFS files.
> -----------------------------------------------------------------
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore, Query Processor
> Affects Versions: 0.3.0, 0.3.1, 0.4.0
> Reporter: Prasad Chakka
>
> Initially, the partition list for a table was inferred from the HDFS
> directory structure instead of looking into the metastore (partitions are
> created using 'alter table ... add partition'). But this automatic
> inference was removed in favor of the latter approach when the metastore
> checker feature was checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to
> create the HDFS directory and let Hive infer the partition rather than
> explicitly add it. But doing that raises the following issues:
> 1) External partitions -- we would have to mix both approaches, so the
> partition list becomes a merged list of inferred and registered partitions,
> and duplicates have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema do we choose
> for an inferred partition: the table schema at the time the inferred
> partition was created, or the latest table schema? And how would we know
> the table schema at the time an inferred partition was created?
> 3) If partitions have to be registered, a partition can be disabled without
> actually deleting its data. This feature is not supported yet and may not
> be that useful, but nevertheless it can't be supported with inferred
> partitions.
> 4) Indexes are being added. So if partitions are not registered, indexes
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users
> of Hive. If inferred partitions are preferred, can we live with the
> restricted functionality that this imposes?
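For reference, a minimal sketch contrasting the two approaches discussed in
the issue description (table name, partition column, and warehouse path are
hypothetical):

  -- Registered partition: the job or user creates the data directory, then
  -- records the partition explicitly in the metastore:
  ALTER TABLE page_views ADD PARTITION (ds='2009-09-28');

  -- Inferred partition: the job simply creates the matching directory,
  --   /user/hive/warehouse/page_views/ds=2009-09-28/
  -- and Hive would discover it by listing the table's HDFS location; no
  -- metastore entry exists, which is what raises issues 1) through 4) above.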