[ https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655545#comment-17655545 ]
Rui Wang commented on SPARK-41918:
----------------------------------

[~grundprinzip-db] I am a bit confused about the renaming and what compatibility it offers:

```
message Foo {
  int32 a = 1;
}
```

On the receiver side, the code accesses `a`:

    val t = foo.a + 1

Wouldn't any renaming break the receiver side's code? Or do I misunderstand `WIRE compatibility`, which I take to mean that the receiver should still be able to read the output after it has gone over the wire?

> Refine the naming in proto messages
> -----------------------------------
>
>                 Key: SPARK-41918
>                 URL: https://issues.apache.org/jira/browse/SPARK-41918
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or
> DataFrame API, but they are not consistent in the protos. For example, the
> column name:
> {code:java}
> message UnresolvedRegex {
>   // (Required) The column name used to extract column with regex.
>   string col_name = 1;
> }
> {code}
> {code:java}
> message Alias {
>   // (Required) The expression that alias will be added on.
>   Expression expr = 1;
>   // (Required) a list of name parts for the alias.
>   //
>   // Scalar columns only has one name that presents.
>   repeated string name = 2;
>   // (Optional) Alias metadata expressed as a JSON map.
>   optional string metadata = 3;
> }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
> // the subset of columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field does not co-use with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field does not co-use with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
> // and max. If no columns are given, this function computes statistics for all numerical or
> // string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> We should probably unify the naming:
> single column -> `column`
> multiple columns -> `columns`
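
On the `WIRE compatibility` question raised in the comment: in the protobuf binary encoding, a field is identified on the wire by its number and wire type, not by its name. The sketch below illustrates that under stated assumptions. It is not Spark Connect code and does not use the protobuf library: the encoder and decoder are hand-rolled, they handle only varint fields, and the schemas (`OLD_SCHEMA`, `NEW_SCHEMA`, and the new name `value`) are hypothetical stand-ins for the `Foo` message before and after a rename.

```python
# Minimal sketch: protobuf-style binary encoding of varint fields only.
# All names here are hypothetical; this is not generated protobuf code.

def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(schema, message):
    """Serialize a message; schema maps field number -> field name."""
    out = bytearray()
    for number, name in sorted(schema.items()):
        if name in message:
            out += encode_varint(number << 3)  # key: number, wire type 0
            out += encode_varint(message[name])
    return bytes(out)


def decode(schema, data):
    """Parse bytes back into a dict keyed by the reader's field names."""
    message = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)  # assumes wire type 0 only
        number = key >> 3
        if number in schema:
            message[schema[number]] = value
    return message


OLD_SCHEMA = {1: "a"}      # message Foo { int32 a = 1; }
NEW_SCHEMA = {1: "value"}  # same field number, hypothetical new name

wire = encode(OLD_SCHEMA, {"a": 41})  # sender still uses the old name
decoded = decode(NEW_SCHEMA, wire)    # receiver uses the renamed schema
t = decoded["value"] + 1              # receiver code must use the new name
```

In this sketch the bytes produced by the old schema parse cleanly under the renamed one, because the name `a` never appears in `wire`. What does break is source compatibility: receiver code written against the old accessor (`foo.a` in the comment, `decoded["a"]` here) has to be updated once it is rebuilt against the renamed definition. Formats that carry field names, such as protobuf's JSON mapping, would presumably not share this property.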