[ https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-7907:
-----------------------------------

    Assignee: Wes McKinney

> [Python] Conversion to pandas of empty table with timestamp type aborts
> -----------------------------------------------------------------------
>
>                 Key: ARROW-7907
>                 URL: https://issues.apache.org/jira/browse/ARROW-7907
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Creating an empty table:
> {code}
> In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
> In [2]: table['a']
> Out[2]: 
> <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
> [
>   []
> ]
> In [3]: table.to_pandas()
> Out[3]: 
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
> The above works, but the ChunkedArray still has one empty chunk. When filtering 
> data, you can end up with no chunks at all, and then the conversion fails:
> {code}
> In [4]: table2 = table.slice(0, 0)
> In [5]: table2['a']
> Out[5]: 
> <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
> [
> ]
> In [6]: table2.to_pandas()
> ../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> ...
> Aborted (core dumped)
> {code}
> This seems to happen specifically for the timestamp type, and specifically 
> with a non-ns unit (e.g. us as above, which is the default in Arrow).
> I noticed this when reading a Parquet file of the taxi dataset, where the 
> filter I used resulted in an empty batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
