Hi,

Currently I’m working on ARROW-11297 
(https://github.com/mathyingzhou/arrow/tree/ARROW-11297), which will be filed 
as soon as the current PR is merged. 

I managed to reimplement orc::WriterOptions in Arrow (with naming conventions 
Arrow-ized) as arrow::adapters::orc::WriterOptions (which is necessary since we 
do not allow third party headers to be included in our public headers) and 
finished the C++ part of the work. Now I’m trying to expose WriterOptions in 
Python. I do wonder how this is supposed to be done in general. After reading 
the code in array.pxi I think maybe this is the way I want to do it:

1. The end user will see individual ORC writer options (e.g. CompressionKind, 
that is, whether we use ZLIB, LZO, some other form of compression, or none at 
all) as keyword arguments.
2. These keyword arguments will be processed in _orc.pyx first as a dictionary 
and then using an adapter they will be converted into an 
arrow::adapters::orc::WriterOptions. 

Is this the right way?
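
To make the two steps concrete, here is a rough pure-Python sketch of what I 
have in mind. Everything in it is hypothetical: WriterOptionsSketch is only a 
stand-in for arrow::adapters::orc::WriterOptions, the field names and defaults 
are illustrative assumptions, and write_table is a placeholder for whatever the 
real entry point ends up being.

```python
from dataclasses import dataclass, fields


# Hypothetical pure-Python stand-in for arrow::adapters::orc::WriterOptions;
# the field names and defaults are illustrative assumptions only.
@dataclass
class WriterOptionsSketch:
    compression: str = "zlib"
    stripe_size: int = 64 * 1024 * 1024
    batch_size: int = 1024


def make_writer_options(**kwargs):
    """Step 2: turn the collected keyword arguments into an options object."""
    allowed = {f.name for f in fields(WriterOptionsSketch)}
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        # Reject misspelled or unsupported options early.
        raise TypeError(f"Unexpected ORC writer option(s): {unknown}")
    return WriterOptionsSketch(**kwargs)


def write_table(table, where, **kwargs):
    """Step 1: the end user passes individual options as keyword arguments."""
    options = make_writer_options(**kwargs)
    # A real implementation would hand `options` to the C++ writer here;
    # this sketch only returns it so the conversion can be inspected.
    return options
```

In the real _orc.pyx the adapter would build the C++ struct rather than a 
dataclass, but the flow (kwargs, then a dict, then one options object) would 
be the same.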

Moreover, I do wonder how we should convert the enums. Should I use a series of 
if/elif branches or a mapping dict, so that users must pass one of the valid 
strings or else get a ValueError?

e.g.

# There are other options; this is just an example
compression_kind_mapping = {
    'snappy': CompressionKind._CompressionKind_SNAPPY,
    'lzo': CompressionKind._CompressionKind_LZO,
}
if compression_kind not in compression_kind_mapping:
    raise ValueError("Unknown compression_kind")
c_compression_kind = compression_kind_mapping[compression_kind]
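
A slightly fuller variant of the mapping-dict option, as a pure-Python sketch: 
the lookup is case-insensitive and the error message lists the accepted 
strings. The CompressionKind enum below is only a stand-in I made up for the 
Cython-level enum (its members and values are assumptions), and 
to_compression_kind is a hypothetical helper name.

```python
from enum import Enum


# Stand-in for the Cython-level enum; the real members would come from the
# C++ declarations, so the names and values here are assumptions.
class CompressionKind(Enum):
    NONE = 0
    ZLIB = 1
    SNAPPY = 2
    LZO = 3


# Build the string -> enum mapping once, keyed by lowercase member name.
_COMPRESSION_KIND_MAPPING = {kind.name.lower(): kind for kind in CompressionKind}


def to_compression_kind(name):
    """Convert a user-supplied string to a CompressionKind or raise ValueError."""
    try:
        return _COMPRESSION_KIND_MAPPING[name.lower()]
    except (KeyError, AttributeError):
        # KeyError: unknown string; AttributeError: not a string at all.
        valid = ", ".join(sorted(_COMPRESSION_KIND_MAPPING))
        raise ValueError(
            f"Unknown compression_kind {name!r}; expected one of: {valid}"
        ) from None
```

Compared with an if/elif chain, the dict keeps the list of valid strings in one 
place, so the error message cannot drift out of sync with what is accepted.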

Ying
