[GitHub] [spark] amaliujia commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

GitBox Tue, 01 Nov 2022 16:58:36 -0700


amaliujia commented on PR #38475:
URL: https://github.com/apache/spark/pull/38475#issuecomment-1299371686


   @cloud-fan 
   
   This is a good example of an API that can be implemented with or without a 
dedicated plan.
   
   Basically, if we don't add a new plan to the proto, clients can still 
implement `toDF(columnNames)` using two RPC calls:
   1. The client gets the child input proto plan resolved by the server.
   2. The client does the necessary checks (e.g. that the number of new column 
names equals the number of input columns) based on the result of step 1. 
   3. The client constructs a Project with all the input column names wrapped 
in column aliases, on top of the unresolved child input proto plan again.
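
A minimal sketch of the client-side steps above, in plain Python. It does not use the real Spark Connect client or proto classes: `Alias` and `to_df_projection` are hypothetical names, and the `resolved_columns` argument stands in for whatever the server returns in step 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for an aliased, unresolved column reference."""
    source: str  # column name taken from the child's resolved schema (step 1)
    name: str    # new name requested through toDF(columnNames)


def to_df_projection(resolved_columns: List[str], new_names: List[str]) -> List[Alias]:
    """Build the alias list a client would put into a Project (hypothetical helper)."""
    # Step 2: the length check runs on the client, using the schema from step 1.
    if len(resolved_columns) != len(new_names):
        raise ValueError(
            f"expected {len(resolved_columns)} column names, got {len(new_names)}"
        )
    # Step 3: one alias per input column, kept in schema order.
    return [Alias(src, new) for src, new in zip(resolved_columns, new_names)]
```

Every client language would need its own copy of this logic, plus the extra round trip that produces `resolved_columns`.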
   
   Actually, I don't know whether step 3 will work. We literally need to know 
the schema/output of the child input plan, get the names from there, and wrap 
those names in aliases. Lastly, we need the analyzer to understand that those 
names come from the child input and then apply the aliases (replacing those 
names with the right attributes from the input plan).
   
   Even if step 3 works, this alternative pushes the implementation load to 
the clients (and in most cases there is more than one client). 
Because of that, I am proposing to just have a plan in the proto for this purpose.
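
For contrast, a rough sketch of the plan-based approach, again in plain Python rather than the actual proto. `ToDFPlan`, `build_to_df_plan`, and `resolve_on_server` are hypothetical names, and the child plan is an opaque placeholder string. The client only packages the names; the check and the renaming happen once, on the server.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToDFPlan:
    """Hypothetical plan message: an unresolved child plus the new column names."""
    child: str                     # opaque placeholder for the child proto plan
    column_names: Tuple[str, ...]  # names passed to toDF(columnNames)


def build_to_df_plan(child: str, column_names: List[str]) -> ToDFPlan:
    # Client side: no schema lookup and no validation, just packaging.
    return ToDFPlan(child=child, column_names=tuple(column_names))


def resolve_on_server(plan: ToDFPlan, child_output: List[str]) -> List[Tuple[str, str]]:
    # Server side: the child's output is already known after analysis,
    # so the length check and the renaming happen in one place.
    if len(child_output) != len(plan.column_names):
        raise ValueError(
            f"expected {len(child_output)} column names, got {len(plan.column_names)}"
        )
    return list(zip(child_output, plan.column_names))
```

With this shape, each client implements only the equivalent of `build_to_df_plan`.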
   
   
   Lastly, we are still matching the DataFrame API in the proto, and the 
DataFrame API is stable, so this does not hurt the longer-term maintainability 
of the Connect proto. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

