[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449584#comment-17449584 ]
Willi Raschkowski commented on SPARK-37465:
-------------------------------------------

I also noticed that {{CategoricalOpsTest}} fails on pandas 0.25.3 (the latest 0.x release) and passes on 1.x:
{code:java}
$ conda list | grep pandas
pandas 0.25.3 py36he6710b0_0
$ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_categorical_ops CategoricalOpsTest'
...
Running tests...
----------------------------------------------------------------------
/home/circleci/project/python/pyspark/context.py:238: FutureWarning: Python 3.6 support is deprecated in Spark 3.2.
  FutureWarning
test_abs (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (2.353s)
test_add (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (1.382s)
test_and (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.265s)
ok (6.569s) alOpsTest) ...
test_eq (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (1.514s)
test_floordiv (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.910s)
test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.143s)
test_ge (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (0.795s)
test_gt (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (0.891s)
test_invert (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.044s)
test_isnull (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.097s)
test_le (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (0.863s)
test_lt (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (0.844s)
test_mod (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.897s)
test_mul (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.860s)
test_ne (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... FAIL (1.405s)
test_neg (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.044s)
test_or (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.160s)
test_pow (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.821s)
test_radd (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.081s)
test_rand (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.100s)
test_rfloordiv (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.083s)
test_rmod (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.050s)
test_rmul (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.079s)
test_ror (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.095s)
test_rpow (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.078s)
test_rsub (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.078s)
test_rtruediv (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.079s)
test_sub (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.818s)
test_truediv (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest) ... ok (0.832s)

======================================================================
FAIL [1.611s]: test_eq (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/circleci/project/python/pyspark/testing/pandasutils.py", line 122, in assertPandasEqual
    **kwargs
  File "/home/circleci/.pyenv/versions/our-miniconda/envs/python3/lib/python3.6/site-packages/pandas/util/testing.py", line 1248, in assert_series_equal
    assert_attr_equal('name', left, right, obj=obj)
  File "/home/circleci/.pyenv/versions/our-miniconda/envs/python3/lib/python3.6/site-packages/pandas/util/testing.py", line 941, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
AssertionError: Series are different

Attribute "name" are different
[left]:  that_numeric_cat
[right]: None

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py", line 268, in test_eq
    psdf["this_numeric_cat"] == psdf["that_numeric_cat"],
  File "/home/circleci/project/python/pyspark/testing/pandasutils.py", line 223, in assert_eq
    self.assertPandasEqual(lobj, robj, check_exact=check_exact)
  File "/home/circleci/project/python/pyspark/testing/pandasutils.py", line 130, in assertPandasEqual
    raise AssertionError(msg) from e
AssertionError: Series are different

Attribute "name" are different
[left]:  that_numeric_cat
[right]: None

Left:
Name: that_numeric_cat, dtype: bool
bool

Right:
dtype: bool
bool
...
{code}
Upgrading pandas to 1.x fixes it:
{code}
$ conda list | grep pandas
pandas 1.0.0 py36h0573a6f_0
$ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_categorical_ops CategoricalOpsTest'
Running PySpark tests. Output is in /home/circleci/project/python/unit-tests.log
Will test against the following Python executables: ['python3.6']
Will test the following Python tests: ['pyspark.pandas.tests.data_type_ops.test_categorical_ops CategoricalOpsTest']
python3.6 python_implementation is CPython
python3.6 version is: Python 3.6.12 :: Anaconda, Inc.
Starting test(python3.6): pyspark.pandas.tests.data_type_ops.test_categorical_ops CategoricalOpsTest
Finished test(python3.6): pyspark.pandas.tests.data_type_ops.test_categorical_ops CategoricalOpsTest (34s)
Tests passed in 34 seconds
{code}

> PySpark tests failing on Pandas 0.23
> ------------------------------------
>
>                 Key: SPARK-37465
>                 URL: https://issues.apache.org/jira/browse/SPARK-37465
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Willi Raschkowski
>            Priority: Major
>
> I was running Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114].
> Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] in Pandas.
> {code:java}
> $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv'
> ...
> ======================================================================
> ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv
>     self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int))
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper
>     result = safe_na_op(lvalues, rvalues)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op
>     return na_op(lvalues, rvalues)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op
>     result = missing.fill_zeros(result, x, y, op_name, fill_zeros)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros
>     signs = np.sign(y if name.startswith(('r', '__r')) else x)
> TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool')
> {code}
> These are my relevant package versions:
> {code:java}
> $ conda list | grep -e numpy -e pyarrow -e pandas -e python
> # packages in environment at /home/circleci/miniconda/envs/python3:
> numpy            1.16.6  py36h0a8e133_3
> numpy-base       1.16.6  py36h41b4c56_3
> pandas           0.23.4  py36h04863e7_0
> pyarrow          1.0.1   py36h6200943_36_cpu  conda-forge
> python           3.6.12  hcff3b4d_2           anaconda
> python-dateutil  2.8.1   py_0                 anaconda
> python_abi       3.6     1_cp36m              conda-forg
> {code}
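
The version boundaries reported in this thread can be written down as a small check. This is only a sketch under stated assumptions: `parse_version` and `meets_minimum` are hypothetical helpers, not part of Spark or pandas, and they only handle plain dotted numeric versions with the same number of components (no pre-release or local tags). The version pairs in the list are the ones observed above, not a requirement declared by Spark.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.23.4' into (0, 23, 4).

    Hypothetical helper: only handles purely numeric components.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (hypothetical helper)."""
    return parse_version(installed) >= parse_version(minimum)


# (installed pandas, version that was reported to fix the failure) pairs
# taken from the observations in this thread.
observations = [
    ("0.23.4", "0.24.0"),  # BooleanOpsTest.test_floordiv errored on 0.23.4
    ("0.25.3", "1.0.0"),   # CategoricalOpsTest comparisons failed on 0.25.3
    ("1.0.0", "1.0.0"),    # CategoricalOpsTest passed on 1.0.0
]

for installed, fixed_in in observations:
    status = "at or above" if meets_minimum(installed, fixed_in) else "below"
    print(f"pandas {installed} is {status} {fixed_in}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "0.9" sorting after "0.23". A real project would more likely rely on an existing version-parsing library than on a hand-rolled parser like this one.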