ronvohra opened a new issue, #6373:
URL: https://github.com/apache/opendal/issues/6373
### Describe the bug
Hello!
I'm having trouble using the operator (Python bindings) to write a pyarrow
table to S3. I open the destination with `operator.open` (which returns an
`opendal.File`) as a context manager and pass that object to
`pyarrow.parquet.ParquetWriter` to persist my table to S3. When the unit test
calls `write_table` on the writer, the call hangs indefinitely without
raising an error. Reading a table from S3 works as expected; only writing
is affected.
### Steps to Reproduce
```python
from pathlib import Path

import boto3
import pyarrow as pa
import pyarrow.parquet as pq  # the parquet submodule must be imported explicitly
from moto import mock_aws
from moto.moto_server.threaded_moto_server import ThreadedMotoServer
from opendal import Operator


def test_s3_operator_write() -> None:
    with mock_aws():
        # Start a local moto server on a free port to act as the S3 endpoint.
        server = ThreadedMotoServer(port=0)
        server.start()
        _, port = server.get_host_and_port()
        endpoint_url = f"http://localhost:{port}"

        conn = boto3.resource("s3", endpoint_url=endpoint_url)
        conn.create_bucket(Bucket="test-bucket")

        handler = Operator(
            "s3",
            bucket="test-bucket",
            region="auto",
            endpoint=endpoint_url,  # point the operator at the mock server
        )

        table = pa.Table.from_pydict({"A": [2]})
        dest = Path("prefix/table.parquet")
        with handler.open(dest, mode="wb") as f:
            # ParquetWriter requires the schema as its second argument.
            with pq.ParquetWriter(f, table.schema) as writer:
                writer.write_table(table)  # hangs here

        server.stop()
```
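While debugging, it may help to turn the hang into a test failure instead of a stuck test run. The following is only a sketch using the standard library: `run_with_timeout` is a hypothetical helper, not part of opendal or pyarrow, and the timeout value is arbitrary.

```python
import threading
from typing import Any, Callable


def run_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome: dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The worker keeps running; as a daemon it will not block interpreter exit.
        raise TimeoutError(f"call did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

In the test above, the `with handler.open(...)` block could be moved into a small function and passed as `fn`, so that a stuck write surfaces as a `TimeoutError`.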
### Expected Behavior
The test should complete the write to S3, after which I would verify the
result by reading the object back through the operator.
### Additional Context
I tried wrapping the operator in `pyarrow.PythonFile` and passing it to
`ParquetWriter`, but to no avail.
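One way to narrow this down might be to record which file-like methods get called before the hang. The sketch below is an assumption-laden illustration: `TracingFile` is a hypothetical name, and `io.BytesIO` stands in for the `opendal.File`, so it says nothing about how opendal itself behaves.

```python
import io
from typing import Any


class TracingFile:
    """Proxy that records the name of every method called on the wrapped object."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self.calls: list[str] = []

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not defined on the proxy itself.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def traced(*args: Any, **kwargs: Any) -> Any:
            self.calls.append(name)
            return attr(*args, **kwargs)

        return traced


# io.BytesIO is a stand-in for the opendal.File (assumption for illustration).
f = TracingFile(io.BytesIO())
f.write(b"abc")
f.flush()
print(f.calls)  # ['write', 'flush']
```

Wrapping the real file object the same way and inspecting `calls` after a timeout would show the last method reached before the write stalled.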
I wonder if I'm doing something incorrectly, or if this is even expected
behaviour given [this comment](https://github.com/apache/opendal/issues/4363)
about file-like operations being supported only for `fs` mode?
Would appreciate your help, thanks!
### Are you willing to submit a PR to fix this bug?
- [ ] Yes, I would like to submit a PR.