AlgoDeveloper400 opened a new pull request, #3161:
URL: https://github.com/apache/iceberg-python/pull/3161
# fix: handle Windows drive letters in `parse_location`
## Rationale for this change
When a Windows user passes a local file path such as `C:\Users\file.avro` to `PyArrowFileIO`,
Python's `urlparse` treats the drive letter `C` as a URL scheme (like `s3` or `http`)
and returns it lowercased as `c`.
PyIceberg then fails with:
```
Unrecognized filesystem type in URI: 'c'
```
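The misparse can be reproduced with the standard library alone, without PyIceberg installed. A minimal sketch (the variable name `windows_path` is only for illustration):

```python
from urllib.parse import urlparse

# A raw string keeps the backslashes literal.
windows_path = r"C:\Users\file.avro"
uri = urlparse(windows_path)

# Everything before the first ":" is taken as the scheme and lowercased.
print(repr(uri.scheme))  # 'c'
print(repr(uri.netloc))  # ''
print(repr(uri.path))    # '\\Users\\file.avro'
```

The lowercasing explains why the error message reports `'c'` rather than `'C'`.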
---
## The Fix
**Before ❌ (Original Code):**
```python
uri = urlparse(location)
if not uri.scheme:
    default_scheme = properties.get("DEFAULT_SCHEME", "file")
    default_netloc = properties.get("DEFAULT_NETLOC", "")
    return default_scheme, default_netloc, os.path.abspath(location)
```
**After ✅ (Fixed Code):**
```python
uri = urlparse(location)
if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
    # len == 1 and isalpha() catches Windows drive letters like C:\ D:\
    default_scheme = properties.get("DEFAULT_SCHEME", "file")
    default_netloc = properties.get("DEFAULT_NETLOC", "")
    return default_scheme, default_netloc, os.path.abspath(location)
```
**The only change:**
```python
# Before ❌
if not uri.scheme:
# After ✅
if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
```
The added condition checks whether the parsed scheme is a **single alphabetic
character** (e.g. `c`, `d`, `e`, since `urlparse` lowercases it)
and, if so, treats the location as a local Windows path instead of a URL.
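To see which locations the new condition sends down the local-file branch, here is a small standalone sketch. The helper name `is_schemeless_or_drive_letter` is hypothetical and not part of PyIceberg; it only mirrors the condition from the diff above.

```python
from urllib.parse import urlparse


def is_schemeless_or_drive_letter(location: str) -> bool:
    """Hypothetical helper mirroring the patched condition in parse_location."""
    scheme = urlparse(location).scheme
    # No scheme at all, or a single letter that is most likely a drive letter.
    return not scheme or (len(scheme) == 1 and scheme.isalpha())


locations = [
    r"C:\Users\file.avro",    # Windows path with backslashes
    "d:/data/file.parquet",   # Windows path with forward slashes
    "/tmp/file.avro",         # POSIX path, no scheme
    "s3://bucket/file.avro",  # real two-character scheme
    "file:///tmp/file.avro",  # explicit file scheme
]

for location in locations:
    print(location, is_schemeless_or_drive_letter(location))
```

Only the first three locations satisfy the condition. A side effect is that any genuine one-letter URL scheme would also be treated as a local path.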
---
## Example
```python
from pyiceberg.io.pyarrow import PyArrowFileIO
io = PyArrowFileIO()
# Before fix - crashed with: Unrecognized filesystem type in URI: 'c'
# After fix - works correctly
scheme, netloc, path = io.parse_location("C:\\Users\\test\\file.avro")
print(scheme)  # file
print(netloc)  # (empty string)
print(path)    # C:\Users\test\file.avro
```
---
## Impact
This fix affects all local file operations on Windows including:
- Reading local Iceberg tables
- Writing local Iceberg tables
- Any local Avro/Parquet file operations
---
## Are these changes tested?
Yes - existing tests now pass on Windows.
**`tests/test_avro_sanitization.py`**
```
python -m pytest tests/test_avro_sanitization.py -v
```
```
tests/test_avro_sanitization.py::test_comprehensive_field_name_sanitization PASSED
tests/test_avro_sanitization.py::test_comprehensive_avro_compatibility PASSED
tests/test_avro_sanitization.py::test_emoji_field_name_sanitization PASSED
```
**`tests/io/test_pyarrow.py`**
```
python -m pytest tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path -v
```
```
tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path PASSED
```
---
## Are there any user-facing changes?
Yes - fixes local file access on Windows for all PyIceberg users.