[jira] [Commented] (SPARK-24644) Pyarrow exception while running pandas_udf on pyspark 2.3.1
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547031#comment-16547031 ]

Hichame El Khalfi commented on SPARK-24644:
---

Indeed, we were using an old version of pandas. After updating it to 0.19.2, there is no crash or error to report. Thank you [~bryanc] and [~hyukjin.kwon] for your valuable help and input.

> Pyarrow exception while running pandas_udf on pyspark 2.3.1
> ---
>
>                 Key: SPARK-24644
>                 URL: https://issues.apache.org/jira/browse/SPARK-24644
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>         Environment: os: centos
> pyspark 2.3.1
> spark 2.3.1
> pyarrow >= 0.8.0
>            Reporter: Hichame El Khalfi
>            Priority: Major
>
> Hello,
> When I try to run a `pandas_udf` on my Spark dataframe, I get this error:
>
> {code:java}
>   File "/mnt/ephemeral3/yarn/nm/usercache/user/appcache/application_1524574803975_205774/container_e280_1524574803975_205774_01_44/pyspark.zip/pyspark/serializers.py", line 280, in load_stream
>     pdf = batch.to_pandas()
>   File "pyarrow/table.pxi", line 677, in pyarrow.lib.RecordBatch.to_pandas (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:43226)
>     return Table.from_batches([self]).to_pandas(nthreads=nthreads)
>   File "pyarrow/table.pxi", line 1043, in pyarrow.lib.Table.to_pandas (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:46331)
>     mgr = pdcompat.table_to_blockmanager(options, self, memory_pool,
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 528, in table_to_blockmanager
>     blocks = _table_to_blocks(options, block_table, nthreads, memory_pool)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 622, in _table_to_blocks
>     return [_reconstruct_block(item) for item in result]
>   File "/usr/lib64/python2.7/site-packages/pyarrow/pandas_compat.py", line 446, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
> TypeError: make_block() takes at least 3 arguments (2 given)
> {code}
>
> More than happy to provide any additional information.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
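For context, the failure in the quoted description surfaces on the executor when Arrow converts a record batch back to pandas. A hypothetical minimal sketch of the kind of job that exercises this path (the UDF, column names, and app name are illustrative, not taken from the reporter's job):

```python
def add_one(x):
    # Column transformation applied inside the pandas_udf. When Spark calls
    # it, x is a pandas Series and the addition is vectorized, but it also
    # works on plain ints, which keeps it checkable without a cluster.
    return x + 1


def run_on_spark():
    # Requires pyspark >= 2.3 and pyarrow >= 0.8.0; per this thread, pandas
    # must also be reasonably recent (>= 0.19.2). On the affected environment
    # (pandas 0.13.0), materializing the result raised:
    #   TypeError: make_block() takes at least 3 arguments (2 given)
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.appName("pandas-udf-repro").getOrCreate()
    df = spark.range(10)  # a single bigint column named "id"
    add_one_udf = pandas_udf(add_one, returnType=LongType())
    df.select(add_one_udf(df["id"]).alias("id_plus_one")).show()
    spark.stop()
```

Factoring the pandas logic out of the UDF wrapper keeps it testable on its own, independently of the Arrow serialization path where the error occurred.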
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545589#comment-16545589 ]

Bryan Cutler commented on SPARK-24644:
---

[~helkhalfi], the error in the stack trace is coming from pandas internals, and it looks like you are using a fairly old version, so my guess is that you need to upgrade pandas to solve this. For Spark, we currently test pyarrow with pandas 0.19.2, and I would recommend that version or higher.
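One way to act on this advice is to fail fast on the driver before any UDF runs. A minimal sketch of such a guard (the helper names and version-parsing rules are illustrative, not part of Spark or pandas):

```python
MIN_PANDAS = (0, 19, 2)  # minimum version recommended in this thread


def parse_version(version_string):
    # Turn a dotted version such as "0.19.2" into a comparable tuple of
    # ints, stopping at the first component that is not purely numeric
    # (so "0.19.2rc1" parses as (0, 19)).
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pandas_is_supported(installed_version, minimum=MIN_PANDAS):
    # Tuple comparison handles multi-digit components correctly, unlike a
    # plain string comparison (as strings, "0.9.0" sorts above "0.19.2").
    return parse_version(installed_version) >= minimum


def require_pandas():
    # Raise early if the installed pandas is older than the minimum,
    # instead of failing later inside the Arrow-to-pandas conversion.
    import pandas
    if not pandas_is_supported(pandas.__version__):
        raise ImportError(
            "pandas >= %s is required for pandas_udf, found %s"
            % (".".join(str(n) for n in MIN_PANDAS), pandas.__version__))
```

With this guard, the reporter's environment (pandas 0.13.0) would be rejected up front rather than crashing inside `pyarrow.pandas_compat` on the executors.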
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539626#comment-16539626 ]

Hyukjin Kwon commented on SPARK-24644:
---

Thanks, [~helkhalfi]. Mind if I ask you to post the code you ran, so that I or someone else can reproduce and investigate further?
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532178#comment-16532178 ]

Hichame El Khalfi commented on SPARK-24644:
---

Hello [~hyukjin.kwon],

Thanks for taking the time on this ticket. Regarding the environment, we are using:
* CentOS 7
* JDK 1.8.0_101-b13
* CPython interpreter 2.7
* Spark 2.3.1 in distributed mode
* pandas 0.13.0
* pyarrow 0.9.0
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523399#comment-16523399 ]

Hyukjin Kwon commented on SPARK-24644:
---

Can you clarify the environment, in particular the PyArrow and Pandas versions?