Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21427

At first glance, I thought this issue was slightly different from https://issues.apache.org/jira/browse/SPARK-23929, but yeah, it seems to be the same. After reading through that discussion, I guess we need to be careful about any changes. I'm not used to creating DataFrames by position, but it is possible to do so with a list of tuples, like the example from the doctest:

```
>>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
... def mean_udf(key, pdf):
...     # key is a tuple of one numpy.int64, which is the value
...     # of 'id' for the current group
...     return pd.DataFrame([key + (pdf.v.mean(),)])
```

So changing how result columns are assigned would be a breaking change for code like this... maybe it would be best to add better documentation for now, like @HyukjinKwon mentioned in SPARK-23929, and target a change for Spark 3.0?
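
To make the positional case concrete, here is a minimal sketch using only pandas and numpy, with no Spark involved. The `key` and `pdf` values are made-up stand-ins for what a grouped map UDF would receive, and the name check at the end is only an illustration of name-based matching, not Spark's actual logic:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one group's key and data.
key = (np.int64(1),)
pdf = pd.DataFrame({"id": [1, 1], "v": [1.0, 2.0]})

# Same construction as in the doctest: a list holding a single tuple.
result = pd.DataFrame([key + (pdf.v.mean(),)])

# No column names were given, so pandas falls back to integer labels.
positional_labels = list(result.columns)

# Illustration only: looking up the schema's field names among those
# labels finds nothing, so this frame can only be matched by position.
schema_names = ["id", "v"]
matched_by_name = [name for name in schema_names if name in result.columns]

# Passing columns= explicitly gives a frame that works either way.
named = pd.DataFrame([key + (pdf.v.mean(),)], columns=schema_names)

print(positional_labels)
print(matched_by_name)
print(list(named.columns))
```

A frame built this way carries no `id` or `v` labels at all, which is why existing UDFs that return one would stop working if columns were matched strictly by name.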