amaliujia commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1299371686
@cloud-fan This is a good example of an API that can be implemented either with or without a dedicated plan. If we don't add a new plan to the proto, clients can still implement `toDF(columnNames)` with two RPC calls:

1. The client asks the server to resolve the child input proto plan.
2. The client does the necessary checks (e.g. that the number of column names matches) based on the result of step 1.
3. The client constructs a Project over the unresolved child input proto plan again, wrapping each input column name in a column alias.

Actually, I'm not sure step 3 will work. We need to know the schema/output of the child input plan, take the column names from it, and wrap those names in aliases. Then the analyzer has to understand that those names come from the child input and apply the aliases (i.e. replace the names with the right attributes from the input plan).

Even if step 3 works, this alternative pushes implementation load onto the client (and in most cases there is more than one client). Because of that, I am proposing to add a dedicated plan to the proto for this purpose. Finally, the proto still mirrors the DataFrame API, which is stable, so this does not hurt the long-term maintainability of the Connect proto.
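To make the trade-off concrete, here is a minimal Python sketch that puts the two-RPC, client-side approach next to the dedicated-relation alternative. Everything in it is hypothetical: `FakeServer`, `analyze`, `execute`, and the dict-shaped "plans" (`read`, `project`, `alias`, `unresolved_attribute`, `to_df`) are illustrative stand-ins, not the real Spark Connect proto messages or client API. It only models the message flow and the number of round trips. It does not model analysis, which is exactly the part step 3 is unsure about.

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for a Spark Connect proto plan.
# Plain dicts are used only to illustrate the message flow.
Plan = Dict[str, Any]


class FakeServer:
    """Hypothetical server: resolves a plan's output names and accepts plans."""

    def __init__(self, schemas: Dict[str, List[str]]) -> None:
        # Maps a table name to its resolved output column names.
        self._schemas = schemas
        self.rpc_count = 0

    def analyze(self, plan: Plan) -> List[str]:
        # RPC 1: resolve the plan and return its output column names.
        self.rpc_count += 1
        return list(self._schemas[plan["read"]["table"]])

    def execute(self, plan: Plan) -> Plan:
        # RPC 2: submit the final plan (execution itself is not modeled).
        self.rpc_count += 1
        return plan


def to_df_client_side(server: FakeServer, child: Plan, names: List[str]) -> Plan:
    """Emulate toDF(*names) on the client, without a dedicated proto relation."""
    # Step 1: have the server resolve the child plan.
    input_names = server.analyze(child)
    # Step 2: client-side validation, e.g. matching number of column names.
    if len(input_names) != len(names):
        raise ValueError(
            f"expected {len(input_names)} column names, got {len(names)}"
        )
    # Step 3: Project over the *unresolved* child, aliasing each input column
    # by name. The server's analyzer must bind these names back to the child's
    # output attributes; this name-based binding is what may not work.
    project = {
        "project": {
            "input": child,
            "expressions": [
                {"alias": {"expr": {"unresolved_attribute": old}, "name": new}}
                for old, new in zip(input_names, names)
            ],
        }
    }
    return server.execute(project)


def to_df_with_relation(server: FakeServer, child: Plan, names: List[str]) -> Plan:
    """Proposed alternative: a hypothetical ToDF relation sent in one RPC."""
    # Validation and name binding would happen once, on the server,
    # instead of being re-implemented in every client.
    return server.execute({"to_df": {"input": child, "column_names": list(names)}})
```

For a child reading a table with columns `name` and `age`, `to_df_client_side(server, child, ["n", "a"])` costs two round trips and leaves the server to re-bind `name` and `age` by name. `to_df_with_relation` sends a single message and keeps the validation logic in one place, on the server.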