Re: Catalog API for Partition

Wenchen Fan Fri, 17 Jul 2020 07:53:24 -0700

In Hive, a partition does two things:
1. It acts as an index to speed up data scans.
2. It acts as a way to manage the data. People can add/drop partitions.
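
To make the two roles concrete, here is a minimal sketch of a toy in-memory table. Every name in it (PartitionedTable, add_partition, drop_partition, scan) is hypothetical and chosen only for illustration; it is not Hive's or Spark's API, and not the API proposed in the PR.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    """Toy in-memory table; all names here are hypothetical."""

    # Maps a partition value (e.g. "2020-07-17") to the rows stored under it.
    partitions: dict = field(default_factory=dict)

    # Role 2: partitions as a unit of data management.
    def add_partition(self, value):
        self.partitions.setdefault(value, [])

    def drop_partition(self, value):
        # Dropping a partition discards every row stored under it.
        self.partitions.pop(value, None)

    def insert(self, value, row):
        self.add_partition(value)
        self.partitions[value].append(row)

    # Role 1: partitions as a coarse index for scans.
    def scan(self, predicate=None):
        """Return (rows, partitions_read); predicate filters partition values."""
        selected = [v for v in self.partitions
                    if predicate is None or predicate(v)]
        rows = [r for v in selected for r in self.partitions[v]]
        return rows, len(selected)


table = PartitionedTable()
table.insert("2020-07-16", {"id": 1})
table.insert("2020-07-17", {"id": 2})

# Index role: only the matching partition is read.
rows, read = table.scan(lambda day: day == "2020-07-17")

# Management role: remove a whole partition in one call.
table.drop_partition("2020-07-16")
```

In this sketch the same partition key serves both roles: scan uses it to skip data, while add_partition and drop_partition use it to manage data in bulk.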

How do you unify these two things in your API design?

On Fri, Jul 17, 2020 at 12:03 AM JackyLee <[email protected]> wrote:

> Hi devs,
>
> In order to support partition commands for DataSource V2 and Lakehouse, I'm
> trying to add a Partition API for multiple catalogs.
>
> These APIs are widely used in MySQL, Hive, and other data sources, and we can
> use them to manage partition metadata in Lakehouse.
>
> JIRA: https://issues.apache.org/jira/browse/SPARK-31694
> PR: https://github.com/apache/spark/pull/28617
>
> We have already used these APIs to support Lakehouse on Delta Lake and Hive
> on DataSource V2, and they do solve partition support on DataSource V2.
> Could anyone review it?
>
> Thanks,
> Jacky Lee
