I think the safer option is to extend the visitor (option 1). I'm not sure how much of the transform API we will want to expose, but I'll look into it more soon, because we are working on a change that adds an operation for updating a table's partition spec.
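To make option 1 concrete, here is a minimal, self-contained Java sketch of a visitor whose callbacks also receive the partition field id. It does not use Iceberg at all: `FieldIdVisitor`, `Field`, `Kind`, `visit` and `paramsByFieldId` are hypothetical names invented for illustration, and the real `PartitionSpecVisitor` signatures may differ between Iceberg versions. The point it shows is that keying the collected parameters by partition field id keeps a bucket and a truncate on the same source column apart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "option 1": visitor callbacks also receive the partition
// field id. This is NOT the real Iceberg PartitionSpecVisitor API.
public class FieldIdVisitorSketch {

  interface FieldIdVisitor<T> {
    T identity(int fieldId, String sourceName, int sourceId);
    T bucket(int fieldId, String sourceName, int sourceId, int numBuckets);
    T truncate(int fieldId, String sourceName, int sourceId, int width);
  }

  enum Kind { IDENTITY, BUCKET, TRUNCATE }

  // One partition field: its own id, its source column, and the transform parameter
  // (numBuckets for BUCKET, width for TRUNCATE, ignored for IDENTITY).
  static final class Field {
    final int fieldId;
    final String sourceName;
    final int sourceId;
    final Kind kind;
    final int param;

    Field(int fieldId, String sourceName, int sourceId, Kind kind, int param) {
      this.fieldId = fieldId;
      this.sourceName = sourceName;
      this.sourceId = sourceId;
      this.kind = kind;
      this.param = param;
    }
  }

  // Dispatches every field to the matching callback, in spec order.
  static <T> List<T> visit(List<Field> spec, FieldIdVisitor<T> visitor) {
    List<T> results = new ArrayList<>();
    for (Field f : spec) {
      switch (f.kind) {
        case BUCKET:
          results.add(visitor.bucket(f.fieldId, f.sourceName, f.sourceId, f.param));
          break;
        case TRUNCATE:
          results.add(visitor.truncate(f.fieldId, f.sourceName, f.sourceId, f.param));
          break;
        default:
          results.add(visitor.identity(f.fieldId, f.sourceName, f.sourceId));
      }
    }
    return results;
  }

  // Keyed by partition field id, so two transforms on one source column stay distinct.
  static Map<Integer, Integer> paramsByFieldId(List<Field> spec) {
    Map<Integer, Integer> params = new LinkedHashMap<>();
    visit(spec, new FieldIdVisitor<Void>() {
      public Void identity(int fieldId, String sourceName, int sourceId) {
        return null; // identity has no parameter to record
      }

      public Void bucket(int fieldId, String sourceName, int sourceId, int numBuckets) {
        params.put(fieldId, numBuckets);
        return null;
      }

      public Void truncate(int fieldId, String sourceName, int sourceId, int width) {
        params.put(fieldId, width);
        return null;
      }
    });
    return params;
  }

  public static void main(String[] args) {
    // Example values: source column "id" feeds both a bucket and a truncate field.
    List<Field> spec = List.of(
        new Field(1000, "id", 1, Kind.BUCKET, 16),
        new Field(1001, "id", 1, Kind.TRUNCATE, 10),
        new Field(1002, "region", 2, Kind.IDENTITY, 0));
    // Prints {1000=16, 1001=10}
    System.out.println(paramsByFieldId(spec));
  }
}
```

Extending the visitor this way leaves the transform classes package-private, which fits the earlier preference for delegating to the visitor rather than exposing Bucket and Truncate directly (option 2).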
On Fri, Oct 2, 2020 at 7:41 AM Gabor Kaszab wrote:

> Hey Ryan,
>
> Thanks for the help!
> I managed to make this PartitionSpecVisitor approach work to get the
> 'numBuckets' and 'width' parameters for BUCKET and TRUNCATE transforms;
> however, I found some shortcomings. E.g., even if I want to know one
> specific Transform's parameter, I have to get them for all the partitions
> in the given PartitionSpec. But what is more concerning is that I can only
> retrieve a (sourceId/sourceName -> transform param) mapping, which works only
> as long as each column is used for a single partition transform. Even in this
> case there might be a solution, e.g. returning a (sourceName + <transform
> type prefix> -> transform param) mapping, but this is getting way more
> complicated than it should be just to get a single param from an object.
>
> I see two ways to make life easier (when querying transform params):
> 1) Modify the PartitionSpecVisitor functions to accept a fieldId as well,
> along with sourceName and sourceId.
> 2) Modify the accessibility of the Bucket and Truncate classes so that the
> numBuckets() and width() functions can be accessed from outside the
> package.
>
> What do you think?
>
> Gabor
>
> On Tue, Sep 22, 2020 at 8:55 PM Ryan Blue wrote:
>
>> Hi Gabor,
>>
>> Right now, I think the only way to get those parameters is to implement a
>> `PartitionSpecVisitor`, which will be passed the parameters. We can
>> definitely improve the API here where we need to. Initially, I wanted to
>> avoid having code that would special-case transforms instead of delegating
>> to the Transform API. That's why it is so locked down.
>>
>> rb
>>
>> On Tue, Sep 22, 2020 at 7:33 AM Gabor Kaszab wrote:
>>
>>> Hey,
>>>
>>> I'm working on integrating the Apache Iceberg project into Apache
>>> Impala. Currently, I'm investigating how to implement partition transforms
>>> that have parameters (Bucket and Truncate), and I haven't found a way to
>>> retrieve their parameters (numBuckets and width) from table metadata
>>> through the Iceberg API.
>>>
>>> I see that there are functions for this purpose (numBuckets()
>>> <https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/transforms/Bucket.java#L75>
>>> and width()
>>> <https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/transforms/Truncate.java#L56>),
>>> but I found that the classes Bucket and Truncate are only accessible within
>>> their package, so I'm not able to import them into the Impala project. I can
>>> import the base class Transforms, but that doesn't provide an interface for
>>> my needs.
>>>
>>> Without this support I won't be able to implement a few things, e.g.
>>> SHOW CREATE TABLE, just to name one.
>>>
>>> Am I missing something? Is there a way to get the parameters of a
>>> partition transform through the API?
>>>
>>> Cheers,
>>> Gabor
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Ryan Blue
Software Engineer
Netflix
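For reference, the visitor approach suggested in the Sep 22 reply, and the limitation reported on Oct 2, can be sketched with a simplified model. This is a hypothetical, self-contained stand-in and not Iceberg code: `SpecVisitor`, `Field`, `Kind`, `visit` and `paramsBySourceName` are invented names, and the callback shapes only mirror the parameters mentioned in the thread (source name, source id, numBuckets or width). Because the callbacks carry no partition field id, collecting parameters per source column lets a second transform on the same column overwrite the first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the visitor discussed in the thread; NOT the real
// org.apache.iceberg.transforms.PartitionSpecVisitor.
public class SourceKeyedParamsSketch {

  // Callbacks receive the source column and any transform parameter, but no field id.
  interface SpecVisitor<T> {
    T identity(String sourceName, int sourceId);
    T bucket(String sourceName, int sourceId, int numBuckets);
    T truncate(String sourceName, int sourceId, int width);
  }

  enum Kind { IDENTITY, BUCKET, TRUNCATE }

  // One partition field; 'param' is numBuckets for BUCKET, width for TRUNCATE.
  static final class Field {
    final String sourceName;
    final int sourceId;
    final Kind kind;
    final int param;

    Field(String sourceName, int sourceId, Kind kind, int param) {
      this.sourceName = sourceName;
      this.sourceId = sourceId;
      this.kind = kind;
      this.param = param;
    }
  }

  // Visits every field in the spec; there is no way to ask about a single field.
  static <T> List<T> visit(List<Field> spec, SpecVisitor<T> visitor) {
    List<T> results = new ArrayList<>();
    for (Field f : spec) {
      switch (f.kind) {
        case BUCKET:
          results.add(visitor.bucket(f.sourceName, f.sourceId, f.param));
          break;
        case TRUNCATE:
          results.add(visitor.truncate(f.sourceName, f.sourceId, f.param));
          break;
        default:
          results.add(visitor.identity(f.sourceName, f.sourceId));
      }
    }
    return results;
  }

  // Keyed by source column name: a later transform on the same column silently
  // overwrites an earlier one.
  static Map<String, Integer> paramsBySourceName(List<Field> spec) {
    Map<String, Integer> params = new LinkedHashMap<>();
    visit(spec, new SpecVisitor<Void>() {
      public Void identity(String sourceName, int sourceId) {
        return null; // identity has no parameter to record
      }

      public Void bucket(String sourceName, int sourceId, int numBuckets) {
        params.put(sourceName, numBuckets);
        return null;
      }

      public Void truncate(String sourceName, int sourceId, int width) {
        params.put(sourceName, width);
        return null;
      }
    });
    return params;
  }

  public static void main(String[] args) {
    List<Field> spec = List.of(
        new Field("id", 1, Kind.BUCKET, 16),
        new Field("id", 1, Kind.TRUNCATE, 10),
        new Field("name", 2, Kind.TRUNCATE, 4));
    // The bucket count 16 for "id" is lost; prints {id=10, name=4}
    System.out.println(paramsBySourceName(spec));
  }
}
```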
