Even if we put the ordering of the fields aside, the equality semantics of the data types (StructField in particular) are likely to result in an implementation that is either confusing or has limited applicability.
Additionally, the Scala StructType is already a Seq[StructField] and as such provides set-like operations (contains, diff, intersect, union) as well as implementations of ++ / :+ / +:, so we cannot do much here without breaking the existing API.
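
To make the equality concern concrete, here is a minimal sketch in plain Python. It does not use the PySpark classes: Field is a hypothetical stand-in for StructField, assumed to compare every attribute (name, type, nullability) for equality, and union_by_equality / union_by_name are illustrative helpers that do not exist in Spark. The sketch only shows that two reasonable readings of a set-like "union" give different schemas.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for StructField: equality compares every attribute."""
    name: str
    data_type: str
    nullable: bool = True


def union_by_equality(a: List[Field], b: List[Field]) -> List[Field]:
    """Set-like union using full-field equality, preserving first-seen order."""
    merged = list(a)
    for field in b:
        if field not in merged:
            merged.append(field)
    return merged


def union_by_name(a: List[Field], b: List[Field]) -> List[Field]:
    """Set-like union keyed on the field name only; the first definition wins."""
    seen = {field.name for field in a}
    merged = list(a)
    for field in b:
        if field.name not in seen:
            seen.add(field.name)
            merged.append(field)
    return merged


a = [Field("id", "string"), Field("value", "int")]
# Same name as a's "id", but different nullability, so the fields are not equal.
b = [Field("id", "string", nullable=False), Field("extra", "int")]

by_equality = union_by_equality(a, b)
by_name = union_by_name(a, b)

print([f.name for f in by_equality])  # ['id', 'value', 'id', 'extra']
print([f.name for f in by_name])      # ['id', 'value', 'extra']
```

With full-field equality the result carries two fields named "id", which is probably not what most users expect from a union; keying on the name alone avoids that, but silently drops one of the two conflicting definitions. Neither choice is obviously right, which is the ambiguity described above.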
On 8/14/22 11:30, Alexandros Biratsis wrote:
Hello Rui and Tim,

Indeed, this sounds like a good idea and quite useful. To make it more formal, the field list of a StructType could be treated as a Scala/Python set by providing (inheriting?) the common set functionality, i.e. add, remove, concat, intersect, diff, etc. The set-like functionality could be part of the StructType class for both languages.

The Scala set collection: https://www.scala-lang.org/api/2.13.x/scala/collection/immutable/Set.html

Best,
Alex

On Wed, Aug 10, 2022, 08:14 Rui Wang <amaliu...@apache.org> wrote:

Thanks for the idea!

I am thinking that the usage of "combined = StructType(a.fields + b.fields)" is still good because
1) it is not horrible to merge a and b in this way;
2) it clarifies the intention by itself, which is to merge two structs' fields to construct a new struct;
3) you also have room to apply more complicated operations when merging fields, for example removing duplicate fields with the same name, or using a.fields but removing some fields if they are in b.

Overloading "+" could be problematic:
1. It's ambiguous what this plus is doing.
2. If you define + as a concatenation of the fields, then it's limited to only doing the concatenation. How about other operations, like extracting fields from a based on b? Maybe overloading "-"? In this case the item list will grow.

-Rui

On Tue, Aug 9, 2022 at 1:10 PM Tim <bosse...@posteo.de> wrote:

Hi all,

this is my first message to the Spark mailing list, so please bear with me if I don't fully meet your communication standards. I just wanted to discuss one aspect that I've stumbled across several times over the past few weeks. When working with Spark, I often run into the problem of having to merge two (or more) existing StructTypes into a new one to define a schema.

Usually this looks similar (in Python) to the following simplified example:

    a = StructType([StructField("field_a", StringType())])
    b = StructType([StructField("field_b", IntegerType())])
    combined = StructType(a.fields + b.fields)

My idea, which I would like to discuss, is to shorten the above example in Python as follows by supporting Python's add operator for StructTypes:

    combined = a + b

What do you think of this idea? Are there any reasons why this is not yet part of StructType's functionality? If you support this idea, I could create a first PR for further and deeper discussion.

Best
Tim

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC