[ https://issues.apache.org/jira/browse/ARROW-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970562#comment-15970562 ]
Wes McKinney commented on ARROW-601:
------------------------------------
Now we have
{code}
In [5]: table = pq.read_table('/home/wesm/Downloads/t.parquet')
In [6]: table
Out[6]:
pyarrow.Table
an-int: int64
another-int: int64
a-double: double
a-boolean: bool
a-group: struct<bool: bool, another: int64>
a-string: string
a-date: date64[ms]
time: time32[ms]
In [7]: table.to_pandas()
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-7-cad6e023c888> in <module>()
----> 1 table.to_pandas()
/home/wesm/code/arrow/python/pyarrow/_table.pyx in pyarrow._table.Table.to_pandas (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/_table.cxx:9726)()
745 nthreads = pyarrow._config.cpu_count()
746
--> 747 mgr = table_to_blockmanager(self.sp_table, nthreads)
748 return _pandas().DataFrame(mgr)
749
/home/wesm/code/arrow/python/pyarrow/_table.pyx in pyarrow._table.table_to_blockmanager (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/_table.cxx:7763)()
539
540 with nogil:
--> 541 check_status(pyarrow.ConvertTableToPandas(table, nthreads,
542 &result_obj))
543
/home/wesm/code/arrow/python/pyarrow/_error.pyx in pyarrow._error.check_status (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/_error.cxx:1542)()
64 raise ArrowKeyError(message)
65 elif status.IsNotImplemented():
---> 66 raise ArrowNotImplementedError(message)
67 elif status.IsTypeError():
68 raise ArrowTypeError(message)
ArrowNotImplementedError: NotImplemented: struct<bool: bool, another: int64>
{code}
The attached file has schema
{code}
message root {
optional int64 an-int;
optional int64 another-int;
optional double a-double;
optional boolean a-boolean;
optional group a-group {
optional boolean bool;
optional int64 another;
}
optional binary a-string (UTF8);
optional int32 a-date (DATE);
optional int32 time (TIME_MILLIS);
}
{code}
So the Arrow schema looks OK (we should switch to date32 now), but the struct
column should be skipped on read because converting it to pandas isn't
supported yet. [~xhochy] can you take a look?
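As a rough illustration of what "skipping the struct" could look like from the caller's side, here is a minimal sketch. The helper `convertible_columns` is hypothetical and not part of pyarrow; it works on the type strings printed in the table repr above and keeps only the fields that are not structs.

```python
# Hypothetical helper, not part of pyarrow: pick the columns that are not
# nested structs, based on the type strings shown in the Table repr.

def convertible_columns(schema):
    """Return the names of fields whose type string is not a struct.

    `schema` is an iterable of (name, type_string) pairs, where the type
    string is what the Table repr prints, e.g. "struct<bool: bool, ...>".
    """
    return [name for name, type_str in schema
            if not type_str.startswith("struct<")]


# Fields copied from the pyarrow.Table repr in this comment.
schema = [
    ("an-int", "int64"),
    ("another-int", "int64"),
    ("a-double", "double"),
    ("a-boolean", "bool"),
    ("a-group", "struct<bool: bool, another: int64>"),
    ("a-string", "string"),
    ("a-date", "date64[ms]"),
    ("time", "time32[ms]"),
]

columns = convertible_columns(schema)
print(columns)
```

The resulting list could then be handed to the `columns` argument of `pq.read_table`, which appears in the signature shown in the traceback quoted below. Whether that argument accepts these names as-is in this version is an assumption, not something confirmed here.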
> Some logical types not supported when loading Parquet
> -----------------------------------------------------
>
> Key: ARROW-601
> URL: https://issues.apache.org/jira/browse/ARROW-601
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.2.0
> Reporter: Saul Shanabrook
> Assignee: Miki Tebeka
> Labels: parquet
> Fix For: 0.3.0
>
> Attachments: t.parquet
>
>
> When I try to read a parquet file with some logical types in it, pyarrow says
> they are not supported:
> {code}
> table = pq.read_table('t.parquet')
> ---------------------------------------------------------------------------
> ArrowException Traceback (most recent call last)
> <ipython-input-14-b7190e66bcb5> in <module>()
> ----> 1 table = pq.read_table('parquet/t')
> /opt/conda/lib/python3.5/site-packages/pyarrow/parquet.py in read_table(source, columns, nthreads, metadata)
> 113
> 114 pf = ParquetFile(source, metadata=metadata)
> --> 115 return pf.read(columns=columns, nthreads=nthreads)
> 116
> 117
> /opt/conda/lib/python3.5/site-packages/pyarrow/parquet.py in read(self, nrows, columns, nthreads)
> 78
> 79 return self.reader.read(column_indices=column_indices,
> ---> 80 nthreads=nthreads)
> 81
> 82
> /opt/conda/lib/python3.5/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.read (/feedstock_root/build_artefacts/pyarrow_1488133203047/work/arrow-f6924ad83bc95741f003830892ad4815ca3b70fd/python/build/temp.linux-x86_64-3.5/_parquet.cxx:7706)()
> /opt/conda/lib/python3.5/site-packages/pyarrow/error.pyx in pyarrow.error.check_status (/feedstock_root/build_artefacts/pyarrow_1488133203047/work/arrow-f6924ad83bc95741f003830892ad4815ca3b70fd/python/build/temp.linux-x86_64-3.5/error.cxx:1197)()
> ArrowException: NotImplemented: Unhandled logical type for int32
> {code}
> This is the schema of the parquet file (see attached):
> {code}
> optional group root {
> optional int64 instant (TIMESTAMP_MILLIS);
> optional int32 time (TIME_MILLIS);
> optional double a-double;
> optional int64 another-int;
> optional binary a-string (UTF8);
> optional group list (LIST) {
> repeated group list {
> optional int64 element;
> }
> }
> optional boolean a-boolean;
> optional group a-group {
> optional boolean bool;
> optional int64 another;
> }
> optional int64 an-int;
> optional int32 a-date (DATE);
> }
> {code}
> I assume this is because pyarrow doesn't support loading all the parquet
> logical types yet. Is there someplace I can look (even if it's not
> documented, just in the codebase), where I can find what types are supported
> currently and which are not?
>