Joris Van den Bossche created ARROW-7907:
--------------------------------------------

             Summary: [Python] Conversion to pandas of empty table with timestamp type aborts
                 Key: ARROW-7907
                 URL: https://issues.apache.org/jira/browse/ARROW-7907
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche
             Fix For: 0.16.1


Creating an empty table:

{code}
In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})

In [2]: table['a']
Out[2]: 
<pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
[
  []
]

In [3]: table.to_pandas()
Out[3]: 
Empty DataFrame
Columns: [a]
Index: []
{code}

The above works, but the ChunkedArray still has one (empty) chunk. When 
filtering data, you can end up with no chunks at all, and then the conversion 
fails:


{code}
In [4]: table2 = table.slice(0, 0)

In [5]: table2['a']
Out[5]: 
<pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
[

]

In [6]: table2.to_pandas()
../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
...
Aborted (core dumped)
{code}

This seems to happen only for the timestamp type, and only with a non-ns 
unit (e.g. with us as above, which is the default in Arrow).
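
To tell the one-empty-chunk case apart from the zero-chunk case before calling 
to_pandas(), one option is to look at the chunk count of each column. The 
following is only a minimal sketch: columns_without_chunks is a hypothetical 
helper name (not part of pyarrow), and it assumes the table exposes 
column_names and that table[name] returns a ChunkedArray with a num_chunks 
attribute, as in the session above.

```python
def columns_without_chunks(table):
    """Return the names of columns whose ChunkedArray holds zero chunks.

    Hypothetical helper, not part of pyarrow. `table` is assumed to behave
    like a pyarrow.Table: it has `column_names`, and `table[name]` returns
    an object with a `num_chunks` attribute (a ChunkedArray).
    """
    # A column with one empty chunk has num_chunks == 1 and is not reported;
    # only columns with no chunks at all are returned.
    return [name for name in table.column_names
            if table[name].num_chunks == 0]
```

Going by the Out[2] and Out[5] displays above, this would be expected to 
return an empty list for table and ['a'] for table2, which is the case that 
aborts.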

I noticed this when reading a parquet file of the taxi dataset, where the 
filter I used resulted in an empty batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
