[jira] [Created] (ARROW-8613) [C++][Dataset] Raise error for unparsable partition value

Joris Van den Bossche (Jira) Tue, 28 Apr 2020 03:41:55 -0700

Joris Van den Bossche created ARROW-8613:
--------------------------------------------


             Summary: [C++][Dataset] Raise error for unparsable partition value
                 Key: ARROW-8613
                 URL: https://issues.apache.org/jira/browse/ARROW-8613
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche
             Fix For: 1.0.0


Currently, when you specify a partitioning schema but one of the partition field 
values cannot be parsed according to the specified type, you silently get null 
values for that partition field.

Python example:
{code:python}
import pathlib
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

path = pathlib.Path(".") / "dataset_partition_schema_errors"
path.mkdir(exist_ok=True)

# The "part" values are strings that cannot be parsed as integers
table = pa.table({"part": ["1_2", "1_2", "3_4", "3_4"], "values": range(4)})
pq.write_to_dataset(table, str(path), partition_cols=["part"])
{code}
{code:python}
In [17]: ds.dataset(path, partitioning="hive").to_table().to_pandas() 
Out[17]: 
   values part
0       0  1_2
1       1  1_2
2       2  3_4
3       3  3_4

In [18]: partitioning = ds.partitioning(pa.schema([("part", pa.int64())]), flavor="hive")

In [19]: ds.dataset(path, partitioning=partitioning).to_table().to_pandas()   
Out[19]: 
   values  part
0       0   NaN
1       1   NaN
2       2   NaN
3       3   NaN
{code}

Silently ignoring such a parse error doesn't seem like the best default to me 
(since partition keys are quite essential). I think raising an error might be 
better?
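
To make the proposed behaviour concrete, here is a minimal pure-Python sketch of strict parsing for hive-style path segments. It is not how Arrow's C++ partitioning is implemented; the names `strict_int` and `parse_hive_partitions`, and the dict-based "schema", are hypothetical and only serve the illustration.

```python
import re
from pathlib import PurePosixPath


def strict_int(raw):
    """Hypothetical converter: accept only plain decimal integers.

    Python's built-in int() would accept "1_2" (as 12), so the format
    is checked explicitly before converting.
    """
    if not re.fullmatch(r"[+-]?[0-9]+", raw):
        raise ValueError(f"invalid integer literal: {raw!r}")
    return int(raw)


def parse_hive_partitions(path, schema):
    """Parse "key=value" path segments using the converters in `schema`.

    `schema` maps field names to converter callables (an assumption of
    this sketch). A value that cannot be converted raises ValueError
    instead of silently becoming null.
    """
    parsed = {}
    for segment in PurePosixPath(path).parts:
        key, sep, raw = segment.partition("=")
        # Skip segments that are not "key=value" or not in the schema.
        if not sep or key not in schema:
            continue
        converter = schema[key]
        try:
            parsed[key] = converter(raw)
        except ValueError as exc:
            raise ValueError(
                f"Could not parse partition value {raw!r} for field "
                f"{key!r} using {converter.__name__}"
            ) from exc
    return parsed
```

With this sketch, a path such as "part=1_2/file.parquet" parsed against `{"part": strict_int}` raises a `ValueError` naming the field and the offending value, rather than producing a null partition key.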



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
