joosthooz commented on code in PR #13709:
URL: https://github.com/apache/arrow/pull/13709#discussion_r939931651
##########
python/pyarrow/dataset.py:
##########
@@ -433,6 +433,10 @@ def _filesystem_dataset(source, schema=None,
filesystem=None,
FileSystemDataset
"""
format = _ensure_format(format or 'parquet')
+ if isinstance(format, CsvFileFormat):
Review Comment:
I can move this down to where the `CsvFragmentScanOptions` is created and
just set the wrapper function in the init function. That also avoids exposing
a publicly accessible function, which we don't want.
##########
python/pyarrow/io.pxi:
##########
@@ -1547,6 +1547,33 @@ class Transcoder:
return self._encoder.encode(self._decoder.decode(buf, final), final)
+cdef shared_ptr[function[StreamWrapFunc]] make_streamwrap_func(
+ src_encoding, dest_encoding) except *:
+ """
+ Create a function that will add a transcoding transformation to a stream.
+ Data from that stream will be decoded according to ``src_encoding`` and
+ then re-encoded according to ``dest_encoding``.
+ The created function can be used to wrap streams.
+
+ Parameters
+ ----------
+ src_encoding : str
+ The codec to use when reading data.
+ dest_encoding : str
+ The codec to use for emitted data.
+ """
+ cdef:
+ shared_ptr[function[StreamWrapFunc]] empty_func
+ CTransformInputStreamVTable vtable
+
+ vtable.transform = _cb_transform
+ src_codec = codecs.lookup(src_encoding)
+ dest_codec = codecs.lookup(dest_encoding)
Review Comment:
Yes, also because returning an empty function pointer from here does not
seem to work!
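
For reference, the transcoding step that the docstring describes (decode with ``src_encoding``, then re-encode with ``dest_encoding``) can be sketched in pure Python roughly as follows. This is only an illustration and not the code in this PR: `make_transcoder` is a hypothetical helper name, and the `Transcoder` class is a simplified stand-in modelled on the one-line `__call__` body visible in the hunk above. The Cython function additionally wires the transform into a stream wrapper through the vtable, which is not shown here.

```python
import codecs


class Transcoder:
    """Simplified stand-in: decode each chunk, then re-encode it."""

    def __init__(self, decoder, encoder):
        self._decoder = decoder
        self._encoder = encoder

    def __call__(self, buf, final=False):
        # Incremental codecs keep state between calls, so a multi-byte
        # character split across two chunks is still decoded correctly.
        return self._encoder.encode(self._decoder.decode(buf, final), final)


def make_transcoder(src_encoding, dest_encoding):
    """Hypothetical helper: build a Transcoder from two codec names."""
    src_codec = codecs.lookup(src_encoding)
    dest_codec = codecs.lookup(dest_encoding)
    return Transcoder(src_codec.incrementaldecoder(),
                      dest_codec.incrementalencoder())


# Usage: Latin-1 input re-encoded as UTF-8 in a single call.
to_utf8 = make_transcoder("latin-1", "utf-8")
print(to_utf8(b"caf\xe9", final=True))  # b'caf\xc3\xa9'
```

Using incremental decoders and encoders rather than `bytes.decode` and `str.encode` is what makes chunked input safe, since stream reads can end in the middle of a character.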