[ https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler resolved SPARK-29748.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26496
[https://github.com/apache/spark/pull/26496]

> Remove sorting of fields in PySpark SQL Row creation
> ----------------------------------------------------
>
>                 Key: SPARK-29748
>                 URL: https://issues.apache.org/jira/browse/SPARK-29748
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, when a PySpark Row is created with keyword arguments, the fields
> are sorted alphabetically. This has caused a lot of confusion among users
> because it is not obvious (although it is stated in the pydocs) that they
> will be sorted alphabetically, and an error can then occur later when
> applying a schema if the field order does not match.
>
> The original reason for sorting fields is that kwargs in Python < 3.6 are
> not guaranteed to be in the same order that they were entered. Sorting
> alphabetically would ensure a consistent order. Matters are further
> complicated by the flag {{__from_dict__}}, which allows the {{Row}} fields
> to be referenced by name when made by kwargs; this flag is not serialized
> with the Row, which leads to inconsistent behavior.
>
> This JIRA proposes that any sorting of the fields be removed. Users with
> Python 3.6+ creating Rows with kwargs can continue to do so, since Python will
> ensure the order is the same as entered. Users with Python < 3.6 will have to
> create Rows with an OrderedDict or by using the Row class as a factory
> (explained in the pydoc). If kwargs are used, an error will be raised or,
> based on a conf setting, it can fall back to a LegacyRow that will sort the
> fields as before. This LegacyRow will be immediately deprecated and removed
> once support for Python < 3.6 is dropped.
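
The ordering problem described in the issue can be illustrated with a minimal sketch in plain Python. It does not import PySpark, and it is not the code from the pull request: legacy_fields, ordered_fields and row_factory are hypothetical stand-ins for the old sorted behavior, the proposed insertion-order behavior, and the "Row class as a factory" style. It assumes Python 3.6+, where keyword arguments keep the order in which they were written.

```python
from collections import OrderedDict


def legacy_fields(**kwargs):
    """Hypothetical stand-in for the old behavior: names sorted alphabetically."""
    names = sorted(kwargs)
    return names, tuple(kwargs[name] for name in names)


def ordered_fields(**kwargs):
    """Hypothetical stand-in for the proposed behavior: keep the order as entered."""
    names = list(kwargs)  # kwargs preserve insertion order on Python 3.6+
    return names, tuple(kwargs.values())


def row_factory(*names):
    """Factory style: fix the field names once, then pass values positionally."""
    def make(*values):
        if len(values) != len(names):
            raise ValueError("expected %d values, got %d" % (len(names), len(values)))
        return OrderedDict(zip(names, values))
    return make


# The schema the user has in mind: name first, then age.
schema_names = ["name", "age"]

legacy_names, legacy_values = legacy_fields(name="Alice", age=11)
ordered_names, ordered_values = ordered_fields(name="Alice", age=11)

# Sorting moves "age" ahead of "name", so the values no longer line up
# with the schema; keeping insertion order avoids the mismatch.
print(legacy_names == schema_names)   # False
print(ordered_names == schema_names)  # True

# The factory style does not depend on kwargs ordering at all.
Person = row_factory("name", "age")
person = Person("Alice", 11)
print(list(person.items()))
```

The factory form is the one that stays order-safe on Python < 3.6, because the field names are given positionally and never pass through kwargs.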