jorisvandenbossche commented on PR #14764: URL: https://github.com/apache/arrow/pull/14764#issuecomment-1331933506
OK, so the problem is that we use `_resolve_filesystem_and_path` to preprocess the path. For handling relative paths, this helper function assumes that the file already exists (it was written for when _reading_ data): https://github.com/apache/arrow/blob/54e17920eee65e4227eba889aadbdfeb66c114cd/python/pyarrow/fs.py#L169-L185

So if the file doesn't exist as a local (potentially relative) path, we fall back to `from_uri`. In this case that then fails, because a multi-byte character is indeed invalid in a URI.

It actually also fails for "foo.parquet", because we don't support plain paths as URIs, but there the error is different:

```
In [12]: LocalFileSystem.from_uri("foo.parquet")
...
ArrowInvalid: URI has empty scheme: 'foo.parquet'
```

and that "empty scheme" error is specifically checked for and ignored in this case: https://github.com/apache/arrow/blob/54e17920eee65e4227eba889aadbdfeb66c114cd/python/pyarrow/fs.py#L184-L191

So I think the gist is that `_resolve_filesystem_and_path` wasn't initially written with _writing_ in mind (where files don't, and don't need to, exist yet), and we should update that logic.
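To make the resolution order concrete, here is a minimal pure-Python sketch of the flow described in this comment. It does not use pyarrow: `resolve_filesystem_and_path`, `from_uri` and `InvalidURI` are hypothetical stand-ins for `_resolve_filesystem_and_path`, `FileSystem.from_uri` and `ArrowInvalid`. The stand-in URI parser is deliberately simplified so that it rejects non-ASCII input, and the `for_writing` flag is only one assumed way the logic could be updated, not the actual fix.

```python
import os
from urllib.parse import urlsplit


class InvalidURI(ValueError):
    """Hypothetical stand-in for ArrowInvalid (which subclasses ValueError)."""


def from_uri(uri):
    """Hypothetical, simplified stand-in for FileSystem.from_uri.

    Returns (scheme, path); rejects non-ASCII input and URIs without a scheme.
    """
    if not uri.isascii():
        # Assumption: a multi-byte character makes the URI unparseable.
        raise InvalidURI(f"Cannot parse URI: '{uri}'")
    parts = urlsplit(uri)
    if not parts.scheme:
        raise InvalidURI(f"URI has empty scheme: '{uri}'")
    return parts.scheme, parts.netloc + parts.path


def resolve_filesystem_and_path(path, for_writing=False):
    """Sketch of the resolution order: local path first, then URI."""
    # 1. A path that already exists locally (absolute or relative) is local.
    if os.path.exists(path):
        return "local", path
    # Hypothetical write-aware short-circuit (an assumption): a target that
    # does not exist yet and has no "scheme://" prefix is treated as local.
    if for_writing and "://" not in path:
        return "local", path
    # 2. Otherwise fall back to parsing the path as a URI.
    try:
        return from_uri(path)
    except InvalidURI as exc:
        # 3. Only the "empty scheme" error is swallowed; any other URI
        #    error (e.g. from a non-ASCII name) propagates to the caller.
        if "empty scheme" in str(exc):
            return "local", path
        raise
```

With a non-existent `données.parquet` (an example name) in the working directory, the default call re-raises the URI parsing error, which is the reported failure, while `for_writing=True` resolves it to a local path. `foo.parquet` resolves locally either way, because its "empty scheme" error is swallowed.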