Re: feature request: allow external tables with different partition directory structure

Larry Ogrodnek Tue, 13 Oct 2009 16:13:01 -0700

Prasad,

  Thanks for the response.


  For our use, it would be fine for all tables to share the same
partitioning scheme.

  Although storing the partitioning scheme per table does seem more
flexible, I have a hard time thinking of a case where it would come up
in practice.  Presumably any organization would have its own consistent
naming scheme.  I suppose it might be a problem if you were using data
from multiple sources -- some internal, some external -- and you weren't
able to transform the directory names easily, or couldn't come up with
a class that could handle multiple schemes itself.

  Anyway, for us it would work and would be very helpful.
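
  To make that concrete, here is a rough sketch of what such a class
might look like for our YYYYMMDD[HH] layout.  The class name and the
partition keys "ds" and "hr" are made up for illustration, and it throws
IllegalArgumentException rather than Hive's MetaException so that it
compiles on its own.  It is not existing Hive code.

```java
import java.util.LinkedHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, not part of Hive: maps a YYYYMMDD[HH] directory
// name to a partition spec and back. Key names "ds"/"hr" are assumptions.
class DateDirPartitionScheme {
    // Eight digits for the day, optionally followed by two for the hour.
    private static final Pattern DIR = Pattern.compile("(\\d{8})(\\d{2})?");

    // "2009101316" -> {ds=20091013, hr=16}; "20091013" -> {ds=20091013}
    static LinkedHashMap<String, String> makeSpecFromName(String name) {
        Matcher m = DIR.matcher(name);
        if (!m.matches()) {
            // A real implementation would presumably throw MetaException.
            throw new IllegalArgumentException("Not a YYYYMMDD[HH] directory: " + name);
        }
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", m.group(1));
        if (m.group(2) != null) {
            spec.put("hr", m.group(2));
        }
        return spec;
    }

    // {ds=20091013, hr=16} -> "2009101316"; the hour is optional.
    static String makePartName(LinkedHashMap<String, String> spec) {
        String ds = spec.get("ds");
        if (ds == null) {
            throw new IllegalArgumentException("Partition spec has no 'ds' key: " + spec);
        }
        String hr = spec.get("hr");
        String name = (hr == null) ? ds : ds + hr;
        if (!DIR.matcher(name).matches()) {
            throw new IllegalArgumentException("Spec does not form a YYYYMMDD[HH] name: " + spec);
        }
        return name;
    }
}
```

  The two methods are inverses of each other, which I assume is what
Hive would need in order to go from a directory name to a partspec and
back again.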

thanks,
larry

On Mon, Oct 12, 2009 at 11:43 AM, Prasad Chakka <[email protected]> wrote:
> Larry,
>
> This can be done if the user can supply the following functions of
> Warehouse.java (there may be other functions but those can be converted to
> call these)
>
> LinkedHashMap<String, String> makeSpecFromName(String name) throws
> MetaException;
> String makePartName(LinkedHashMap<String, String> spec) throws MetaException;
>
> These tell Hive how to convert a partspec (partition key, val map) into an
> HDFS relative path and vice-versa. Technically this should be sufficient for
> Hive to determine how to store metadata and how to prune partitions from an
> HDFS name. So, we could create a configurable class and let the user specify
> the class name through hive-site.xml. But this also means that all partitioned
> tables have to share the same partitioning scheme. Is that desirable? If not,
> then the user should be able to specify the partitioning scheme per table and
> store this in the table’s metadata.
>
> Thanks,
> Prasad
> ________________________________
> From: Larry Ogrodnek <[email protected]>
> Reply-To: <[email protected]>
> Date: Mon, 12 Oct 2009 10:32:21 -0700
> To: <[email protected]>
> Subject: feature request: allow external tables with different partition
>  directory structure
>
> This was actually a previous feature request by someone else:
> https://issues.apache.org/jira/browse/HIVE-91
>
> The implemented solution was to allow the location to be specified in
> the "alter table .. add partition" command.
>
> This has worked pretty well for us, however with the recent addition
> of hive support in amazon's elastic map-reduce, it would be really
> convenient if we could set it up so that our data in s3 could be
> auto-discovered without having to add partitions... (it's of the
> format bucket/data-source/YYYYMMDD[HH]).
>
> Amazon has support for simple key/val substitution in scripts, but
> since there's no kind of iteration/looping available, it's not really
> useful if you need to add a couple of weeks worth of partitions as
> part of your script....
>
> If we could somehow define the partition directory layout, it would
> really simplify our hive scripting with elastic mapreduce...
>
> thanks,
> larry
>
>
