[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled

2021-02-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34463:
-
Affects Version/s: (was: 3.0.2)
   3.2.0

> toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled
> -
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/spark master 
>  pandas version > 1.0.5
> Reproduce code:
> {code:java}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {quote}Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}
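The last frames of the traceback show pandas' Cython hashtable code (`value_count_int64`) failing to get a writable view of the column's NumPy buffer. The sketch below illustrates that condition using NumPy only. It does not follow Spark's or Arrow's code path.

It rests on an assumption this report does not confirm: that with self-destruct enabled, the column reaches pandas as zero-copy, read-only memory. `np.frombuffer` over immutable `bytes` stands in for that memory. `readonly_view`, `make_writable` and `try_write` are hypothetical helpers.

NumPy reports a direct write with a different message from the one pandas surfaces above.

```python
import numpy as np


def readonly_view(data: bytes, dtype=np.int64) -> np.ndarray:
    """Wrap immutable bytes without copying; the resulting array is read-only.

    Stand-in (assumption) for zero-copy memory handed over by Arrow.
    """
    return np.frombuffer(data, dtype=dtype)


def make_writable(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return arr if it is writable, else a writable copy."""
    return arr if arr.flags.writeable else arr.copy()


def try_write(arr: np.ndarray) -> str:
    """Attempt an in-place write and report whether NumPy allowed it."""
    try:
        arr[0] = 42
    except ValueError as exc:
        return f"failed: {exc}"
    return "ok"


# Same 0/1 pattern as the report's 'label' column: (id + 1) % 2 for id in range(13).
labels = readonly_view(np.array([(i + 1) % 2 for i in range(13)], dtype=np.int64).tobytes())
print(labels.flags.writeable)  # False
print(try_write(labels))       # failed: assignment destination is read-only

fixed = make_writable(labels)
print(fixed.flags.writeable)   # True
print(try_write(fixed))        # ok
```

Copying restores writability, but it costs a second allocation. That works against the memory saving self-destruct is intended to provide, so a copy is probably best confined to the columns that need it.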



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled

2021-02-20 Thread Hyukjin Kwon (Jira)



Hyukjin Kwon updated SPARK-34463:
-
Summary: toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled  (was: toPandas failed with error: buffer source array is read-only)

> toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled
> -
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.2
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/spark master 
>  pandas version > 1.0.5
> Reproduce code:
> {code:java}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {quote}Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}






[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only

2021-02-20 Thread Hyukjin Kwon (Jira)



Hyukjin Kwon updated SPARK-34463:
-
Description: 
Environment:

apache/spark master 
 pandas version > 1.0.5

Reproduce code:
{code:java}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}
Got an error like:
{quote}Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
{quote}

  was:
Environment:

apache/spark master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{quote}Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
{quote}


> toPandas failed with error: buffer source array is read-only
> 
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.2
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/spark master 
>  pandas version > 1.0.5
> Reproduce code:
> {code:java}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {quote}Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}




[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only

2021-02-18 Thread Weichen Xu (Jira)



Weichen Xu updated SPARK-34463:
---
Description: 
Environment:

apache/spark master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{quote}Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
{quote}

  was:
Environment:

apache/park master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{quote}Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
{quote}


> toPandas failed with error: buffer source array is read-only
> 
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.2
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/spark master 
> pandas version > 1.0.5
> Reproduce code:
> {code}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {quote}Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}




[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only

2021-02-18 Thread Weichen Xu (Jira)



Weichen Xu updated SPARK-34463:
---
Description: 
Environment:

apache/park master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{quote}Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
{quote}

  was:
Environment:

apache/park master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{{Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
}}


> toPandas failed with error: buffer source array is read-only
> 
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.2
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/park master 
> pandas version > 1.0.5
> Reproduce code:
> {code}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {quote}Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}




[jira] [Updated] (SPARK-34463) toPandas failed with error: buffer source array is read-only

2021-02-18 Thread Weichen Xu (Jira)



Weichen Xu updated SPARK-34463:
---
Description: 
Environment:

apache/park master 
pandas version > 1.0.5

Reproduce code:
{code}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{{Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
}}

  was:
Environment:

apache/park master 
pandas version > 1.0.5

Reproduce code:
{code: python}
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
{code}

Got an error like:

{{Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
    dropna=dropna,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
    keys, counts = value_counts_arraylike(values, dropna)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
}}


> toPandas failed with error: buffer source array is read-only
> 
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.2
>Reporter: Weichen Xu
>Priority: Major
>
> Environment:
> apache/park master 
> pandas version > 1.0.5
> Reproduce code:
> {code}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Got an error like:
> {{Traceback (most recent call last):
>   File "", line 1, in
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
>     dropna=dropna,
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
>     keys, counts = value_counts_arraylike(values, dropna)
>   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
>     keys, counts = f(values, dropna)
>   File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
>   File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
>   File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> }}


