samredai commented on a change in pull request #4021:
URL: https://github.com/apache/iceberg/pull/4021#discussion_r800232101



##########
File path: python/src/iceberg/io/base.py
##########
@@ -24,7 +24,40 @@
 """
 
 from abc import ABC, abstractmethod
-from typing import Union
+from typing import Protocol, Union, runtime_checkable
+
+
+@runtime_checkable
+class InputStream(Protocol):
+    def read(self, n: int) -> bytes:
+        ...

Review comment:
       > I think for that we just need to try it out until we have a minimal set of functions.
   
   Great idea @rdblue, I tried this out. @emkornfield it looks like your guess was right, although `writable` was the only method that could be removed without causing an error. I found that interesting, since the `PythonFile` source code you linked to seems to need it to [infer the handler mode](https://github.com/apache/arrow/blob/e9e16c9da7a76718640f2b3f23200a3755790011/python/pyarrow/io.pxi#L703). I might be missing where the mode is inferred some other way, maybe through duck typing.
   
   Read requires:
   - read(...)
   - seek(...)
   - tell(...)
   - closed (attribute)
   - close()
   
   Write requires:
   - write(...)
   - closed (attribute)
   - close()
   
   This was the code I used to test this out:
   ```py
   from pyarrow import parquet as pq
   
   
   class InputFileImpl:
       def __init__(self, filepath):
           self._file = open(filepath, "rb")
   
           # Required methods for use in pq.read_table(...)
           self.read = self._file.read
           self.seek = self._file.seek
           self.tell = self._file.tell
           self.close = self._file.close
   
       @property
       def closed(self):
           # Delegate so the value tracks the file's current state
           # instead of a snapshot taken at construction time
           return self._file.closed
   
   
   class OutputFileImpl:
       def __init__(self, filepath):
           self._file = open(filepath, "wb")
   
           # Required methods for use in pq.write_table(...)
           self.write = self._file.write
           self.close = self._file.close
   
       @property
       def closed(self):
           # Delegate so the value tracks the file's current state
           return self._file.closed
   
   
   input_file = InputFileImpl("example.parquet")
   table = pq.read_table(input_file)
   
   output_file = OutputFileImpl("example_output.parquet")
   pq.write_table(table, output_file)
   output_file.close()
   table_reread = pq.read_table("example_output.parquet")
   
   table.equals(table_reread)  # returns True
   ```
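
As a rough sketch, the two minimal method sets listed above could be written down as `runtime_checkable` protocols in the style of the `InputStream` in this diff. The `OutputStream` name, the exact signatures, and the `WriteOnly` helper are assumptions for illustration and are not part of the PR. Note that `isinstance` against a runtime-checkable protocol only checks that the members exist, not that their signatures match.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class InputStream(Protocol):
    # Members that pq.read_table(...) needed in the experiment above
    def read(self, n: int = -1) -> bytes: ...
    def seek(self, offset: int, whence: int = 0) -> int: ...
    def tell(self) -> int: ...
    def close(self) -> None: ...

    @property
    def closed(self) -> bool: ...


@runtime_checkable
class OutputStream(Protocol):  # hypothetical name, assumed for this sketch
    # Members that pq.write_table(...) needed in the experiment above
    def write(self, b: bytes) -> int: ...
    def close(self) -> None: ...

    @property
    def closed(self) -> bool: ...


class WriteOnly:  # hypothetical helper exposing only the write-side members
    def __init__(self):
        self._buffer = io.BytesIO()
        self.write = self._buffer.write
        self.close = self._buffer.close

    @property
    def closed(self) -> bool:
        return self._buffer.closed


# io.BytesIO has every read-side member, so it satisfies InputStream
buffer_ok = isinstance(io.BytesIO(b"data"), InputStream)

# WriteOnly has no read/seek/tell, so it only satisfies OutputStream
write_only_is_output = isinstance(WriteOnly(), OutputStream)
write_only_is_input = isinstance(WriteOnly(), InputStream)
```

Keeping the read and write sides as separate protocols mirrors the finding that each direction needs a different, small set of members.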




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


