[ https://issues.apache.org/jira/browse/ARROW-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones updated ARROW-17045:
-------------------------------
    Summary: [C++] Reject trailing slashes on file path  (was: [C++] GCS doesn't drop ending slash for files)

> [C++] Reject trailing slashes on file path
> ------------------------------------------
>
>                 Key: ARROW-17045
>                 URL: https://issues.apache.org/jira/browse/ARROW-17045
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We had several different behaviors when passing in file paths with trailing
> slashes: LocalFileSystem would return IOError, S3 would trim off the trailing
> slash, and GCS would keep the trailing slash as part of the file name (later
> creating confusion as the file would be labelled a "directory" in list
> calls). This PR moves them all to the behavior of LocalFileSystem: return
> IOError.
> The R filesystem bindings relied on the behavior provided by S3, so they are
> now modified to trim the trailing slash before passing down to C++.
> Here is an example of the differences in behavior between S3 and GCS:
> {code:python}
> import pyarrow.fs
> from pyarrow.fs import FileSelector
> from datetime import timedelta
>
> gcs = pyarrow.fs.GcsFileSystem(
>     endpoint_override="localhost:9001",
>     scheme="http",
>     anonymous=True,
>     retry_time_limit=timedelta(seconds=1),
> )
> gcs.create_dir("py_test")
> # Writing to test.txt with and without slash produces a file and a directory!?
> with gcs.open_output_stream("py_test/test.txt") as out_stream:
>     out_stream.write(b"Hello world!")
> with gcs.open_output_stream("py_test/test.txt/") as out_stream:
>     out_stream.write(b"Hello world!")
> gcs.get_file_info(FileSelector("py_test"))
> # [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]
>
> s3 = pyarrow.fs.S3FileSystem(
>     access_key="minioadmin",
>     secret_key="minioadmin",
>     scheme="http",
>     endpoint_override="localhost:9000",
>     allow_bucket_creation=True,
>     allow_bucket_deletion=True,
> )
> s3.create_dir("py-test")
> # Writing to test.txt with and without slash writes to same file
> with s3.open_output_stream("py-test/test.txt") as out_stream:
>     out_stream.write(b"Hello world!")
> with s3.open_output_stream("py-test/test.txt/") as out_stream:
>     out_stream.write(b"Hello world!")
> s3.get_file_info(FileSelector("py-test"))
> # [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
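
Once every filesystem returns IOError for a file path with a trailing slash, callers that relied on the old S3 behavior have to normalize paths themselves, as the description says the R bindings now do. A minimal Python sketch of that idea follows. The helper name `as_file_path` is hypothetical and not part of pyarrow, and stripping every trailing slash (rather than exactly one) is an assumption made for illustration.

```python
def as_file_path(path: str) -> str:
    """Return `path` without trailing slashes so it names a file.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    trimmed = path.rstrip("/")
    if not trimmed:
        # Input was empty or consisted only of slashes: nothing names a file.
        raise ValueError(f"not a file path: {path!r}")
    return trimmed


# Intended use with the filesystems from the example in the description
# (needs a running GCS or S3 emulator, so it is left as a comment):
#   with gcs.open_output_stream(as_file_path("py_test/test.txt/")) as out_stream:
#       out_stream.write(b"Hello world!")
```

Normalizing at the call site keeps the C++ layer strict while letting higher-level bindings stay lenient about user input.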