[jira] [Commented] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts

Wes McKinney (Jira) Tue, 10 Mar 2020 18:36:23 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056573#comment-17056573
 ]


Wes McKinney commented on ARROW-7907:
-------------------------------------

This looks like it was fixed in
https://github.com/apache/arrow/commit/6ff156972ac426ef88b1e6674b975a6c61ef852d.
I'll add a unit test to exercise the 0-length slice path.

> [Python] Conversion to pandas of empty table with timestamp type aborts
> -----------------------------------------------------------------------
>
>                 Key: ARROW-7907
>                 URL: https://issues.apache.org/jira/browse/ARROW-7907
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 0.17.0
>
>
> Creating an empty table:
> {code}
> In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
> In [2]: table['a']
> Out[2]: 
> <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
> [
>   []
> ]
> In [3]: table.to_pandas()
> Out[3]: 
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
> The above works, but the ChunkedArray still has one empty chunk. When
> filtering data, you can end up with no chunks at all, and then the
> conversion fails:
> {code}
> In [4]: table2 = table.slice(0, 0)
> In [5]: table2['a']
> Out[5]: 
> <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
> [
> ]
> In [6]: table2.to_pandas()
> ../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> ...
> Aborted (core dumped)
> {code}
> This seems to happen specifically for the timestamp type, and specifically
> with a non-ns unit (e.g. us as above, which is the default in Arrow).
> I noticed this when reading a Parquet file of the taxi dataset, where the
> filter I used resulted in an empty batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
