Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21427

I've been thinking about this and came to the same conclusion as @icexelloss here https://github.com/apache/spark/pull/21427#issuecomment-392070950: we could support both names and position, and fix this without changing behavior.

When the user defines a grouped map UDF, the StructType has field names, so if the returned DataFrame has column names, they should match. If the user returns a DataFrame with positional columns only, pandas will label the columns with an integer index (actual integers, not integer strings). We could change the logic to do the following:

```
Assign columns by name, catching a KeyError exception
If the column names are all integers, then fall back to assign by position
Else raise the KeyError (most likely the user has a typo in the column name)
```

I think that will solve this issue without changing the behavior, but I would need to check that this holds for different pandas versions. How does that sound?
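To make the proposal concrete, here is a minimal sketch of the name-then-position fallback. It is not the code in the PR: `resolve_column_order` is a hypothetical helper that works on plain lists of labels, where `returned_labels` stands in for `list(df.columns)` of the DataFrame the UDF returned and `field_names` for the names in the declared StructType. It uses `numbers.Integral` on the assumption that pandas integer labels, including NumPy integer types, pass that check.

```python
from numbers import Integral


def resolve_column_order(returned_labels, field_names):
    """Return the labels to select, in the order of the schema's fields.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    labels = list(returned_labels)
    by_label = {label: label for label in labels}
    try:
        # First attempt: assign columns by name.
        return [by_label[name] for name in field_names]
    except KeyError:
        # All-integer labels suggest the DataFrame was built positionally,
        # so fall back to the order in which the columns were returned.
        if labels and all(isinstance(label, Integral) for label in labels):
            return labels
        # Otherwise the user most likely has a typo in a column name.
        raise
```

With this sketch, labels `["v", "id"]` against fields `["id", "v"]` are reordered by name, labels `[0, 1]` are taken by position, and labels `["id", "vv"]` re-raise the `KeyError`. Checking that the number of returned columns matches the schema is left out here.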