emkornfield commented on a change in pull request #4021:
URL: https://github.com/apache/iceberg/pull/4021#discussion_r800221115
##########
File path: python/src/iceberg/io/base.py
##########
@@ -24,7 +24,40 @@
"""
from abc import ABC, abstractmethod
-from typing import Union
+from typing import Protocol, Union, runtime_checkable
+
+
+@runtime_checkable
+class InputStream(Protocol):
+ def read(self, n: int) -> bytes:
+ ...
Review comment:
TL;DR: My best guess at the required members (for streams that take bytes):
- `read(...)`
- `write(...)`
- `seek(...)`
- `tell(...)`
- `writable()`
- `close()`
- `closed` (an attribute on IOBase, not a method)
I don't know if other libraries (e.g. json) might require more.
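
As a quick sanity check of the list above, here is a minimal sketch that reports which of those members an object lacks. The helper `missing_members` and the `ReadOnly` class are hypothetical names used only for illustration, and `closed` is treated as an attribute rather than a method:

```python
import io

# Names taken from the list above; `closed` is checked as an attribute.
CANDIDATE_MEMBERS = ("read", "write", "seek", "tell", "writable", "close", "closed")


def missing_members(obj, names=CANDIDATE_MEMBERS):
    """Return the names from `names` that `obj` does not expose."""
    return [name for name in names if not hasattr(obj, name)]


class ReadOnly:
    """Hypothetical stream that only implements read(), like the diff above."""

    def read(self, n: int) -> bytes:
        return b""


print(missing_members(io.BytesIO()))  # → []
print(missing_members(ReadOnly()))
```

This only checks that the names exist, not that the signatures or return types match.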
I think the safe thing is to use the IOBase methods as the protocol. The code
linked above mostly seems to require only `writable()`. The C++ code that
does the adaptation is
[here](https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/python/io.cc);
it looks like it accesses most IOBase methods plus the `read(...)`/`write(...)`
methods (these, I think, should take their signatures from
[RawIOBase](https://docs.python.org/3/library/io.html#io.RawIOBase)). The
methods that don't look necessary are `isatty` and `fileno`.
Some of the other methods are straight pass-throughs to the object
(`truncate`, `readline`, `readlines`), but even if Arrow won't blow up, I think
the question is which other libraries we expect to interact with and whether
those require the methods.
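
To make the suggestion concrete, here is a rough sketch of what a protocol modeled on the IOBase methods could look like. The name `FileLike` and the exact member set are my assumptions, not something taken from the PR or from Arrow, and the signatures follow the Python `io` documentation as I understand it:

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class FileLike(Protocol):
    """Hypothetical protocol covering IOBase-style members plus read/write."""

    def read(self, size: int = -1) -> bytes: ...
    def write(self, b: bytes) -> int: ...
    def seek(self, offset: int, whence: int = 0) -> int: ...
    def tell(self) -> int: ...
    def readable(self) -> bool: ...
    def writable(self) -> bool: ...
    def seekable(self) -> bool: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...

    @property
    def closed(self) -> bool: ...


class OnlyRead:
    """Hypothetical object with just read(), as in the current diff."""

    def read(self, size: int = -1) -> bytes:
        return b""


print(isinstance(io.BytesIO(), FileLike))  # → True
print(isinstance(OnlyRead(), FileLike))    # → False
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the members are present; it does not validate signatures, so a static type checker is still needed for that.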