Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


HonahX merged PR #543:
URL: https://github.com/apache/iceberg-python/pull/543


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


HonahX commented on PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#issuecomment-2041611486

   Merged, thanks!





Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1554993449


##
tests/io/test_fsspec.py:
##
@@ -586,6 +597,25 @@ def test_writing_avro_file_gcs(generated_manifest_entry_file: str, fsspec_fileio
     fsspec_fileio_gcs.delete(f"gs://warehouse/{filename}")
 
 
+@pytest.mark.gcs
+def test_fsspec_pickle_roundtrip_gcs(fsspec_fileio_gcs: FsspecFileIO) -> None:
+    _test_fsspec_pickle_round_trip(fsspec_fileio_gcs, "gs://warehouse/foo.txt")
+
+
+def _test_fsspec_pickle_round_trip(fsspec_fileio: FsspecFileIO, location: str) -> None:
+    serialized_file_io = pickle.dumps(fsspec_fileio)
+    deserialized_file_io = pickle.loads(serialized_file_io)
+    output_file = deserialized_file_io.new_output(location)
+    with output_file.create() as f:
+        f.write(b"foo")
+
+    input_file = deserialized_file_io.new_input(location)
+    with input_file.open() as f:
+        data = f.read()
+        assert data == b"foo"
+        assert len(input_file) == 3
+

Review Comment:
   Good idea. Yes, tests in general should be re-runnable, and to do that we
   should clean up the resource at the end!






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1554993337


##
tests/io/test_fsspec.py:
##
@@ -61,7 +62,7 @@ def test_fsspec_new_input_file(fsspec_fileio: FsspecFileIO) -> None:
     assert input_file.location == f"s3://warehouse/{filename}"
 
 
-@pytest.mark.s3
+@pytest.mark.s3fsspec_file_io

Review Comment:
   Ah, good catch. I think this was a copy/paste bug (I somehow pasted
   `fsspec_file_io` onto this line by mistake).






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1554993080


##
tests/io/test_fsspec.py:
##
@@ -586,6 +597,25 @@ def test_writing_avro_file_gcs(generated_manifest_entry_file: str, fsspec_fileio
     fsspec_fileio_gcs.delete(f"gs://warehouse/{filename}")
 
 
+@pytest.mark.gcs
+def test_fsspec_pickle_roundtrip_gcs(fsspec_fileio_gcs: FsspecFileIO) -> None:
+    _test_fsspec_pickle_round_trip(fsspec_fileio_gcs, "gs://warehouse/foo.txt")
+
+
+def _test_fsspec_pickle_round_trip(fsspec_fileio: FsspecFileIO, location: str) -> None:
+    serialized_file_io = pickle.dumps(fsspec_fileio)

Review Comment:
   Agreed! I can take up renaming in a separate PR so it's easier to review.






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-07 Thread via GitHub


HonahX commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1554859960


##
tests/io/test_fsspec.py:
##
@@ -586,6 +597,25 @@ def test_writing_avro_file_gcs(generated_manifest_entry_file: str, fsspec_fileio
     fsspec_fileio_gcs.delete(f"gs://warehouse/{filename}")
 
 
+@pytest.mark.gcs
+def test_fsspec_pickle_roundtrip_gcs(fsspec_fileio_gcs: FsspecFileIO) -> None:
+    _test_fsspec_pickle_round_trip(fsspec_fileio_gcs, "gs://warehouse/foo.txt")
+
+
+def _test_fsspec_pickle_round_trip(fsspec_fileio: FsspecFileIO, location: str) -> None:
+    serialized_file_io = pickle.dumps(fsspec_fileio)
+    deserialized_file_io = pickle.loads(serialized_file_io)
+    output_file = deserialized_file_io.new_output(location)
+    with output_file.create() as f:
+        f.write(b"foo")
+
+    input_file = deserialized_file_io.new_input(location)
+    with input_file.open() as f:
+        data = f.read()
+        assert data == b"foo"
+        assert len(input_file) == 3
+

Review Comment:
   ```suggestion
   fsspec_fileio.delete(location)
   ```
   How about deleting the file at the end to make these tests re-runnable?
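
A minimal sketch of the cleanup idea, not the code from this PR. `LocalFileIO` is a hypothetical local-disk stand-in for `FsspecFileIO`, and its `create` method is assumed to refuse to overwrite an existing file, so a leftover file from a previous run would make the next run fail. Putting the delete in a `finally` block means it also runs when an assertion fails.

```python
import os
import pickle
import tempfile


class LocalFileIO:
    """Hypothetical, minimal stand-in for FsspecFileIO (local files only)."""

    def create(self, location: str, data: bytes) -> None:
        # "xb" raises FileExistsError if the file is already there,
        # which is the assumption that makes cleanup matter.
        with open(location, "xb") as f:
            f.write(data)

    def read(self, location: str) -> bytes:
        with open(location, "rb") as f:
            return f.read()

    def delete(self, location: str) -> None:
        os.remove(location)


def pickle_round_trip(file_io: LocalFileIO, location: str) -> None:
    deserialized = pickle.loads(pickle.dumps(file_io))
    try:
        deserialized.create(location, b"foo")
        assert deserialized.read(location) == b"foo"
    finally:
        # Runs even if an assertion above fails, so a rerun starts clean.
        if os.path.exists(location):
            file_io.delete(location)


location = os.path.join(tempfile.mkdtemp(), "foo.txt")
pickle_round_trip(LocalFileIO(), location)
# A second run only succeeds because the first one removed the file.
pickle_round_trip(LocalFileIO(), location)
```

With pytest, a fixture that yields the location and deletes it afterwards would give the same guarantee without repeating the `try`/`finally` in each test.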



##
tests/io/test_fsspec.py:
##
@@ -61,7 +62,7 @@ def test_fsspec_new_input_file(fsspec_fileio: FsspecFileIO) -> None:
     assert input_file.location == f"s3://warehouse/{filename}"
 
 
-@pytest.mark.s3
+@pytest.mark.s3fsspec_file_io

Review Comment:
   This seems to be an unrelated change.



##
tests/io/test_fsspec.py:
##
@@ -586,6 +597,25 @@ def test_writing_avro_file_gcs(generated_manifest_entry_file: str, fsspec_fileio
     fsspec_fileio_gcs.delete(f"gs://warehouse/{filename}")
 
 
+@pytest.mark.gcs
+def test_fsspec_pickle_roundtrip_gcs(fsspec_fileio_gcs: FsspecFileIO) -> None:
+    _test_fsspec_pickle_round_trip(fsspec_fileio_gcs, "gs://warehouse/foo.txt")
+
+
+def _test_fsspec_pickle_round_trip(fsspec_fileio: FsspecFileIO, location: str) -> None:
+    serialized_file_io = pickle.dumps(fsspec_fileio)

Review Comment:
   I just realized that we use both `fileio` and `file_io` in the codebase
   (e.g. `fsspec_fileio`, `load_file_io`). It would be good if we could
   consistently use one of them. This may be done in a separate PR.






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-04-06 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1554762228


##
tests/io/test_pyarrow.py:
##
@@ -256,6 +257,14 @@ def test_raise_on_opening_a_local_file_not_found() -> None:
 assert "[Errno 2] Failed to open local file" in str(exc_info.value)
 
 
+def test_pickle_pyarrow_file_io() -> None:

Review Comment:
   Sorry for the delay on this; I got busy with other work. Updated!






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-03-25 Thread via GitHub


Fokko commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1537486747


##
tests/io/test_pyarrow.py:
##
@@ -256,6 +257,14 @@ def test_raise_on_opening_a_local_file_not_found() -> None:
 assert "[Errno 2] Failed to open local file" in str(exc_info.value)
 
 
+def test_pickle_pyarrow_file_io() -> None:

Review Comment:
   Yes, that would be great. You can just reuse an integration test.






Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-03-23 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1536736087


##
tests/io/test_pyarrow.py:
##
@@ -256,6 +257,14 @@ def test_raise_on_opening_a_local_file_not_found() -> None:
 assert "[Errno 2] Failed to open local file" in str(exc_info.value)
 
 
+def test_pickle_pyarrow_file_io() -> None:

Review Comment:
   Let me add a test for fsspec as well. Also, we probably want a stronger
   round-trip test of pickling: it is worth asserting on a few fields or, even
   better, actually attempting to use the deserialized FileIO for reading/writing.







Re: [PR] Implement __getstate__ and __setstate__ on PyArrowFileIO and FsSpecFileIO so that they can be pickled [iceberg-python]

2024-03-23 Thread via GitHub


amogh-jahagirdar commented on code in PR #543:
URL: https://github.com/apache/iceberg-python/pull/543#discussion_r1536736087


##
tests/io/test_pyarrow.py:
##
@@ -256,6 +257,14 @@ def test_raise_on_opening_a_local_file_not_found() -> None:
 assert "[Errno 2] Failed to open local file" in str(exc_info.value)
 
 
+def test_pickle_pyarrow_file_io() -> None:

Review Comment:
   Let me add a test for fsspec as well


