Github user AlexanderKoryagin commented on the issue:

    https://github.com/apache/spark/pull/22568
  
    @HyukjinKwon 
    Just noticed some strange behavior that can be reproduced with the test below, placed under `python.pyspark.sql.tests.GroupedMapPandasUDFTests`.
    Is this a bug or intended behavior?
    ```python
    def test_supported_types_array(self):
        from pyspark.sql.functions import pandas_udf, PandasUDFType
        from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType
    
        schema = StructType([
            StructField('id', IntegerType()),
            StructField('array', ArrayType(IntegerType()))
        ])
        df = self.spark.createDataFrame(
            [[1, [1, 2, 3]]], schema=schema
        )
    
        udf1 = pandas_udf(
            lambda pdf: pdf.assign(array=pdf.array * 2),
            schema,
            PandasUDFType.GROUPED_MAP
        )
    
        result1 = df.groupby('id').apply(udf1).sort('id').toPandas()
        expected1 = df.toPandas().groupby('id').apply(udf1.func).reset_index(drop=True)
        self.assertPandasEqual(expected1, result1)
    ```
    Here is the output:
    ```
    python/pyspark/sql/tests.py:244: in assertPandasEqual
        self.assertTrue(expected.equals(result), msg=msg)
    E   AssertionError: DataFrames are not equal:
    E
    E   Expected:
    E      id               array
    E   0   1  [1, 2, 3, 1, 2, 3]
    E   id        int32
    E   array    object
    E   dtype: object
    E
    E   Result:
    E      id      array
    E   0   1  [2, 4, 6]
    E   id        int32
    E   array    object
    E   dtype: object
    ```
    You can see that the behavior of `array=pdf.array * 2` differs between `result1` and `expected1`:
    ```
    result1 = df.groupby('id').apply(udf1).sort('id').toPandas()
        >> [2, 4, 6]
    expected1 = df.toPandas().groupby('id').apply(udf1.func).reset_index(drop=True)
        >> [1, 2, 3, 1, 2, 3]
    ```
    The default Python behavior for list multiplication is:
    ```
    [1, 2, 3] * 2
        >> [1, 2, 3, 1, 2, 3]
    ```
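    Not sure this is the root cause, but one plausible explanation is the difference between `* 2` on a NumPy array vs. a Python list. A minimal sketch (not the Spark code path — assuming the Arrow conversion hands the GROUPED_MAP UDF a NumPy-backed cell, while `df.toPandas()` keeps the cell as a plain Python list):

```python
import numpy as np

# Plain Python list: `* 2` means sequence repetition.
plain_list = [1, 2, 3]
print(plain_list * 2)  # [1, 2, 3, 1, 2, 3]

# NumPy array: `* 2` means element-wise multiplication.
np_array = np.array([1, 2, 3])
print(np_array * 2)    # [2 4 6]
```

    If that hypothesis holds, both code paths run the same lambda; they just receive differently-typed cells for the array column.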
    Thanks.

