[ https://issues.apache.org/jira/browse/ARROW-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571888#comment-16571888 ]
Jim Crist commented on ARROW-3009:
----------------------------------
It did when I did the initial writeup as well. The issue is that the writer
doesn't support nearly as many features as the reader. In particular, it can
only write the Hive 0.11 format, which is rather old (2013!) and doesn't support
decimals with precision/scale. It would be better than nothing, though.
Personally, I think using the datasets from the ORC repo, or generating our own
with an external tool, would be an easier way to move forward immediately.
> Python ORC failing on 0.10.0
> ----------------------------
>
> Key: ARROW-3009
> URL: https://issues.apache.org/jira/browse/ARROW-3009
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Jim Crist
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This is probably because there are no tests for the Python ORC reader (sorry,
> I never got around to that).
> The error message is
> TypeError: Do not call RecordBatch's constructor directly, use one of the
> `RecordBatch.from_*` functions instead.
> and comes from these lines in the ORC Cython code:
> https://github.com/apache/arrow/blob/master/python/pyarrow/_orc.pyx#L96-L97
> I'm not sure what they should be replaced with in the new API.