Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20280#discussion_r161964785

    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
             self.assertEqual(df.schema.simpleString(), "struct<key:string,value:string>")
             self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for i in range(100)])

    -        # field names can differ.
    -        df = rdd.toDF(" a: int, b: string ")
    --- End diff --

    Yeah, I think you're right. The schema should be able to override names no matter what. I was thinking the test was flawed because it relied on the Row fields being sorted internally, so just changing the kwarg names (not the order) could cause it to fail. That seems a little strange, but maybe it is the intent?
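To make the concern concrete, here is a minimal pure-Python sketch, not PySpark code. `make_row` and `apply_schema` are hypothetical stand-ins, and the sketch assumes the two behaviours described above: a Row built from kwargs orders its fields by name, and the schema string's names and types are applied by position.

```python
def make_row(**kwargs):
    # Hypothetical stand-in for Row(**kwargs): fields end up ordered by
    # name, not by the order they were passed in.
    return [(name, kwargs[name]) for name in sorted(kwargs)]


def apply_schema(row, schema):
    # Hypothetical stand-in for toDF(schema): names and types are applied
    # by position, so the original field names are simply overridden.
    if len(row) != len(schema):
        raise ValueError("schema and row have different lengths")
    result = []
    for (_, value), (new_name, expected_type) in zip(row, schema):
        if not isinstance(value, expected_type):
            raise TypeError(
                "field %s: expected %s, got %r"
                % (new_name, expected_type.__name__, value)
            )
        result.append((new_name, value))
    return result


# Roughly "a: int, b: string" from the removed test.
schema = [("a", int), ("b", str)]

# key < value alphabetically, so the sorted order matches the schema order.
ok = apply_schema(make_row(key=1, value="1"), schema)

# Same values and same call order, but the new names sort the other way
# round (y < z), so the string lands in the int column.
try:
    apply_schema(make_row(z=1, y="1"), schema)
    renamed_failed = False
except TypeError:
    renamed_failed = True
```

In this sketch the override works for `key`/`value` only because those names already sort into the schema's order. Renaming the kwargs, without touching their order or values, is enough to break it.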