kou commented on code in PR #13677:
URL: https://github.com/apache/arrow/pull/13677#discussion_r956571595
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4192,27 +4192,27 @@ def test_write_table_multiple_fragments(tempdir):
# Table with multiple batches written as single Fragment by default
base_dir = tempdir / 'single'
ds.write_dataset(table, base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Same for single-element list of Table
base_dir = tempdir / 'single-list'
ds.write_dataset([table], base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Provide list of batches to write multiple fragments
base_dir = tempdir / 'multiple'
ds.write_dataset(table.to_batches(), base_dir, format="feather")
assert set(base_dir.rglob("*")) == set(
- [base_dir / "part-0.feather"])
+ [base_dir / "part-0.arrow"])
Review Comment:
I don't think the implementation I suggested is that complex (if it
works).
I think that specifying `format="feather"` explicitly means the user wants
to call the format "Feather V2" rather than "Apache Arrow IPC file format",
because the user could have specified `format="ipc"` or `format="arrow"`
instead of `format="feather"`. I don't see why we need to force the user to
also specify `basename_template` in this case.
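To illustrate the behavior I have in mind, here is a minimal sketch (not the actual pyarrow implementation). The helper name `default_basename_template` and the mapping table are hypothetical; the only assumption taken from the dataset writer is that templates use the `{i}` placeholder.

```python
# Hypothetical helper, not part of pyarrow: choose the default basename
# template from the format name the user actually passed.
_DEFAULT_EXTENSIONS = {
    "feather": "feather",  # the user explicitly asked for "Feather"
    "ipc": "arrow",
    "arrow": "arrow",
}


def default_basename_template(format_name, basename_template=None):
    """Return the basename template to use for written fragments."""
    if basename_template is not None:
        # An explicitly provided template always wins.
        return basename_template
    try:
        extension = _DEFAULT_EXTENSIONS[format_name]
    except KeyError:
        raise ValueError(f"unsupported format name: {format_name!r}")
    return "part-{i}." + extension
```

With this, `format="feather"` would keep producing `part-0.feather`, while `format="ipc"` and `format="arrow"` would produce `part-0.arrow`, and nobody would need to pass `basename_template` just to keep the old extension.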
> I want to emphasize that it is hard to understand the relationship between
> IPC files and Feather files anyway.
We may need to deprecate Feather V2. Could you start a discussion on the
`[email protected]` mailing list?
> For example, in Julia, if we want to read an IPC file, I need Arrow.jl,
> but if I want to read a Feather V1 file, we need the Feather.jl library.
We can resolve that problem by implementing an auto-detection feature, as we
did in Apache Arrow C++:
https://github.com/apache/arrow/blob/1b9c57e20802fb061c90837c39e99d8fa69cc212/cpp/src/arrow/ipc/feather.cc#L785-L797
I'm not sure where the auto-detection feature should be implemented
(Arrow.jl, Feather.jl, or a new library?), but how about opening an issue at
https://github.com/apache/arrow-julia ?
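For reference, the idea behind the auto-detection can be sketched in a few lines. This is a rough Python illustration, not a port of the C++ code; it assumes the check only needs the leading magic bytes (`FEA1` for Feather V1, `ARROW1` for the Arrow IPC file format). The function name `detect_file_format` and the returned labels are made up for this example.

```python
import io

FEATHER_V1_MAGIC = b"FEA1"
ARROW_IPC_MAGIC = b"ARROW1"


def detect_file_format(source):
    """Classify a binary file-like object by its leading magic bytes."""
    # Read enough bytes to cover the longer of the two magic strings.
    head = source.read(max(len(FEATHER_V1_MAGIC), len(ARROW_IPC_MAGIC)))
    if head.startswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    if head.startswith(ARROW_IPC_MAGIC):
        return "arrow-ipc"
    raise ValueError("not a Feather V1 or Arrow IPC file")


# Usage with in-memory stand-ins for real files:
print(detect_file_format(io.BytesIO(b"FEA1" + b"\x00" * 8)))
print(detect_file_format(io.BytesIO(b"ARROW1\x00\x00" + b"\x00" * 8)))
```

A reader in Julia could do the same check once and then dispatch to Arrow.jl or Feather.jl, wherever that dispatch ends up living.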
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]