Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-23 Thread Wenchen Fan
I don't think we want to add a lot of flexibility to the PARTITION BY expressions. It's usually just columns or nested fields, or some common functions like year, month, etc. If you look at the parser, we create DS V2 Expression directly. The partition-specific expressions are for

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-22 Thread Hyukjin Kwon
There's another PR open to expose this more publicly on the Python side (https://github.com/apache/spark/pull/27331). To sum up, I would like to make sure we know the following: - Is this expression only for partitioning, or do we plan to expose this to replace other existing expressions as some kind of

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
Thanks for giving me some context and clarification, Ryan. I think I was rather trying to propose to revert because I don't see the explicit plan here and it was just left half-done for a long while. From reading the PR description and code, I could not guess in which way we should fix this API

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Ryan Blue
Hi everyone, Let me recap some of the discussions that got us to where we are with this today. Hopefully that will provide some clarity. The purpose of partition transforms is to allow source implementations to internally handle partitioning. Right now, users are responsible for this. For

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
I think the problem here is whether there is an explicit plan or not. The PR was merged one year ago and not many changes have been made to this API to address the main concerns mentioned. Also, the requested follow-up JIRA still seems to be open: https://issues.apache.org/jira/browse/SPARK-27386 I heard this

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Wenchen Fan
The DS v2 project is still evolving so half-baked is inevitable sometimes. This feature is definitely in the right direction to allow more flexible partition implementations, but there are a few problems we can discuss. About expression duplication. This is an existing design choice. We don't

[DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
Hi all, I would like to suggest taking one step back on https://github.com/apache/spark/pull/24117 and rethinking it. I am writing this email because I raised the issue a few times but did not get enough responses promptly, and the code freeze is close. In particular, please refer to the