[ 
https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655345#comment-17655345
 ] 

Martin Grund commented on SPARK-41918:
--------------------------------------

Renaming fields is wire compatible, since the protobuf binary encoding 
identifies fields by number rather than by name. Wire compatibility is most 
likely going to be the preferred compatibility guarantee for the protos.
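
To illustrate the wire-compatibility point, here is a minimal Python sketch 
that hand-encodes a single string field the way the protobuf binary format 
does. It deliberately avoids the protobuf library. The helper names and the 
renamed field `column` are hypothetical and chosen only for illustration; the 
field `col_name = 1` is taken from the UnresolvedRegex message quoted below.

```python
# Hand-rolled protobuf wire encoding for one string field. The point is that
# the field *name* never reaches the wire; only the field number does.


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Encode a length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_field(data, schema):
    """Decode one string field; the name comes from the reader's schema."""
    tag, pos = decode_varint(data, 0)
    field_number, wire_type = tag >> 3, tag & 0x7
    if wire_type != 2:
        raise ValueError("expected a length-delimited field")
    length, pos = decode_varint(data, pos)
    value = data[pos:pos + length].decode("utf-8")
    return {schema[field_number]: value}


# Reader-side schemas before and after a rename (the new name is hypothetical):
#   string col_name = 1;
#   string column   = 1;
SCHEMA_BEFORE = {1: "col_name"}
SCHEMA_AFTER = {1: "column"}

# A writer using either schema produces the same bytes for field number 1.
wire = encode_string_field(1, "a.*")

old_view = decode_string_field(wire, SCHEMA_BEFORE)
new_view = decode_string_field(wire, SCHEMA_AFTER)
```

Both readers decode the same bytes and differ only in the name they attach to 
field number 1. The same does not hold for the JSON or text encodings, or for 
code that calls generated accessors, since those depend on field names.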

> Refine the naming in proto messages
> -----------------------------------
>
>                 Key: SPARK-41918
>                 URL: https://issues.apache.org/jira/browse/SPARK-41918
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or 
> DataFrame API, but the names are not consistent across the protos. For 
> example, the column name:
> {code:java}
>   message UnresolvedRegex {
>     // (Required) The column name used to extract column with regex.
>     string col_name = 1;
>   }
> {code}
> {code:java}
>   message Alias {
>     // (Required) The expression that alias will be added on.
>     Expression expr = 1;
>     // (Required) a list of name parts for the alias.
>     //
>     // Scalar columns only has one name that presents.
>     repeated string name = 2;
>     // (Optional) Alias metadata expressed as a JSON map.
>     optional string metadata = 3;
>   }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]] which have duplicate rows removed, could
> // consider either only the subset of columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field does not co-use with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field does not co-use with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count,
> // mean, stddev, min, and max. If no columns are given, this function computes
> // statistics for all numerical or string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> We should probably unify the naming:
> single column -> `column`
> multiple columns -> `columns`


