jorisvandenbossche commented on PR #14764:
URL: https://github.com/apache/arrow/pull/14764#issuecomment-1331933506

   OK, so the problem is that we use `_resolve_filesystem_and_path` to 
preprocess the path. To handle relative paths, this helper function assumes 
that the file already exists (it was written for _reading_ data):
   
   
https://github.com/apache/arrow/blob/54e17920eee65e4227eba889aadbdfeb66c114cd/python/pyarrow/fs.py#L169-L185
   
   So if the file doesn't exist as a local (potentially relative) path, we 
fall back to `from_uri`. In this case that fails, because the multi-byte 
character is indeed invalid in a URI. 
   It also fails for "foo.parquet", because we don't support plain paths as 
URIs, but that gives a different error:
   
   ```
   In [12]: LocalFileSystem.from_uri("foo.parquet")
   ...
   ArrowInvalid: URI has empty scheme: 'foo.parquet'
   ```
   
   That "empty scheme" error is specifically checked for, so that it is 
ignored in this case:
   
   
https://github.com/apache/arrow/blob/54e17920eee65e4227eba889aadbdfeb66c114cd/python/pyarrow/fs.py#L184-L191
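
To make the control flow described above concrete, here is a minimal stdlib-only model of it. This is not the pyarrow code: `stand_in_from_uri` and `resolve_like_described` are hypothetical names, the returned tuples are placeholders for real filesystem objects, and the "Cannot parse URI" message is invented for illustration. Only the "URI has empty scheme" message comes from the output shown above.

```python
import os


def stand_in_from_uri(uri):
    """Hypothetical stand-in for `from_uri`; NOT the pyarrow implementation."""
    # Assumption: model "multi-byte character is invalid in a URI" as
    # "any non-ASCII character makes parsing fail". Message is a placeholder.
    if not uri.isascii():
        raise ValueError(f"Cannot parse URI: {uri!r}")
    # Plain paths have no scheme; this message mirrors the one quoted above.
    if "://" not in uri:
        raise ValueError(f"URI has empty scheme: {uri!r}")
    scheme, _, rest = uri.partition("://")
    return scheme, rest


def resolve_like_described(path):
    """Simplified model of the flow described in this comment."""
    # Step 1: an existing local (possibly relative) path is treated as local.
    if os.path.exists(path):
        return "local", path
    # Step 2: otherwise fall back to URI parsing.
    try:
        return stand_in_from_uri(path)
    except ValueError as exc:
        # Step 3: only the "empty scheme" error is tolerated; the path is
        # then treated as a local path that does not exist yet.
        if "empty scheme" in str(exc):
            return "local", path
        # Any other parse error (e.g. the non-ASCII case) propagates.
        raise
```

In this model a not-yet-existing "foo.parquet" still resolves to a local path, while a not-yet-existing path containing a non-ASCII character raises, which is the asymmetry described above.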
   
   So I think the gist is that `_resolve_filesystem_and_path` wasn't written 
initially with _writing_ in mind (when files don't (need to) exist yet), and so 
we should update that logic. 
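
One possible direction, sketched under stated assumptions and not the change adopted in this PR: decide between "URI" and "local path" from the shape of the string instead of from whether the file exists. `resolve_for_writing` is a hypothetical helper, the returned tuples stand in for real filesystem objects, and the scheme check is deliberately simplified (it only recognises `scheme://...`).

```python
import re

# Assumption: treat a string as a URI only if it starts with "<scheme>://".
# Forms such as "file:/path" are not handled by this simplified check.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def resolve_for_writing(path):
    """Hypothetical resolution that never requires the target to exist."""
    if _SCHEME_RE.match(path):
        scheme, _, rest = path.partition("://")
        return scheme.lower(), rest
    # Anything without an explicit scheme is a local path, whether or not
    # it exists yet and whether or not it contains non-ASCII characters.
    return "local", path
```

With this approach a new file name containing multi-byte characters is never passed to URI parsing, so it cannot hit the invalid-URI error.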


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
