Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

Sandeep Nayak Mon, 18 Mar 2019 20:50:46 -0700

To Xabriel's point, it would be good to have a Store abstraction so that
one could plug-in an implementation be it HMS or something else.



On Mon, Mar 18, 2019 at 3:20 PM Xabriel Collazo Mojica
<[email protected]> wrote:

> +1 for having a tool/API to migrate tables from HMS into Iceberg.
>
>
>
> We do not use HMS in my current project, but since HMS is the de facto
> catalog in most companies doing Hadoop, I think such a tool would be vital
> for incentivizing Iceberg adoption and/or PoCs.
>
>
>
> *Xabriel J Collazo Mojica*  |  Senior Software Engineer  |  Adobe  |
> [email protected]
>
>
>
> *From: *<[email protected]> on behalf of Anton Okolnychyi
> <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, March 18, 2019 at 2:22 PM
> *To: *"[email protected]" <[email protected]>, Ryan Blue <
> [email protected]>
> *Subject: *Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive
> Metastore
>
>
>
> I definitely support this idea. Having a clean and reliable API to migrate
> existing Spark tables to Iceberg will be helpful.
>
> I propose to collect all requirements for the new API in this thread. Then
> I can come up with a doc that we will discuss within the community.
>
>
>
> From the feature perspective, I think it would be important to support
> tables that persist partition information in HMS as well as tables that
> derive partition information from the folder structure. Also, migrating
> just a partition of a table would be useful.
>
>
>
>
>
> On 18 Mar 2019, at 18:28, Ryan Blue <[email protected]> wrote:
>
>
>
> I think that would be fine, but I want to throw out a quick warning:
> SparkTableUtil was initially written as a few handy helpers, so it wasn't
> well designed as an API. It's really useful, so I can understand wanting to
> extend it. But should we come up with a real API for these conversion tasks
> instead of updating the hacks?
>
>
>
> On Mon, Mar 18, 2019 at 11:11 AM Anton Okolnychyi <
> [email protected]> wrote:
>
> Hi,
>
> SparkTableUtil can be helpful for migrating existing Spark tables into
> Iceberg. Right now, SparkTableUtil assumes that the partition information
> is always tracked in Hive metastore.
>
> What about extending SparkTableUtil to handle Spark tables that don’t rely
> on Hive metastore? I have a local prototype that makes use of Spark
> InMemoryFileIndex to infer the partitioning info.
>
> Thanks,
> Anton
>
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>
>

Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

Reply via email to