Weichen Xu created SPARK-34463:
----------------------------------

Summary: toPandas failed with error: buffer source array is read-only
Key: SPARK-34463
URL: https://issues.apache.org/jira/browse/SPARK-34463
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.0.2
Reporter: Weichen Xu
Environment: apache/spark master, pandas version > 1.0.5

Code to reproduce:

{code:python}
# Run in the pyspark shell, where `spark` and `sc` are already defined.
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

This fails with an error like:

{{
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
}}
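
The last frames of the traceback show a Cython memoryview (`View.MemoryView.memoryview.__cinit__`) rejecting a buffer whose source is marked read-only. As a rough illustration of that mechanism only, the sketch below uses the standard library instead of pandas or Arrow: a read-only buffer refuses writes, and a copy of the same bytes is writable again. The helper name `writable_copy` is hypothetical. Whether copying the affected column (for example with `Series.copy()`) works around this particular failure is an assumption and has not been verified here.

```python
import array


def writable_copy(buf):
    """Return a writable memoryview over a copy of buf's bytes (hypothetical helper)."""
    return memoryview(bytearray(buf))


# An int64 column shaped like 'label' in the report: (id+1) % 2 for id in 0..12.
labels = array.array("q", [(i + 1) % 2 for i in range(13)])

# A bytes object exports a read-only buffer; it stands in for the read-only array.
read_only = memoryview(labels.tobytes())
print("read-only:", read_only.readonly)

try:
    read_only[0] = 1  # writing through a read-only view is refused
except TypeError as exc:
    print("write refused:", exc)

# Copying the bytes into a bytearray yields a buffer that accepts writes.
copy = writable_copy(read_only)
print("read-only after copy:", copy.readonly)
copy[0] = 1
```

The copy costs extra memory, which works against the purpose of the selfDestruct option, so this is at most a stopgap on the caller's side and not a fix for the underlying issue.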