Al-Moatasem opened a new issue, #974:
URL: https://github.com/apache/iceberg-python/issues/974
### Apache Iceberg version
0.6.1 (latest release)
### Please describe the bug 🐞
Hi,
I am trying to use the **rest** catalog and write the data into **MinIO**. The script I am using can communicate with MinIO (it creates the `metadata.json` file under the `metadata` directory); however, it raises `OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-f27b7921-a6d7-4c7e-b034-2d12221e5054.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request`.
This is the Docker Compose file I use:
```yaml
version: '3'
services:
  rest:
    image: tabulario/iceberg-rest:1.5.0
    container_name: iceberg-rest
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
    networks:
      iceberg-rest:
  minio:
    image: minio/minio:RELEASE.2024-05-10T01-41-38Z
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    ports:
      - 9001:9001
      - 9000:9000
    command: [ "server", "/data", "--console-address", ":9001" ]
    networks:
      iceberg-rest:
        aliases:
          - warehouse.minio
  mc:
    depends_on:
      - minio
    image: minio/mc:RELEASE.2024-05-09T17-04-24Z
    container_name: mc
    entrypoint: |
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password)
      do
        echo '...waiting...' && sleep 1;
      done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    networks:
      iceberg-rest:
networks:
  iceberg-rest:
```
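
The compose file publishes MinIO on port 9000 and the REST catalog on port 8181 to the host, but the hostname `minio` (used by `CATALOG_S3_ENDPOINT` and by the `mc` alias) is a Docker network name and normally does not resolve from the Windows host. A minimal sketch for checking reachability from the machine that runs the PyIceberg script, assuming the default port mappings above; `is_reachable` is a hypothetical helper, not part of PyIceberg or PyArrow:

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; DNS failures
        # (socket.gaierror), refused connections and timeouts are all
        # subclasses of OSError, so one except clause covers them.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, compare `is_reachable("localhost", 9000)` with `is_reachable("minio", 9000)` on the host: if only the first succeeds, MinIO is reachable from the host only through `localhost`.
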
And this is the script:
```py
import pyarrow as pa
from pyiceberg.catalog import load_rest
from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError

catalog = load_rest(
    name="rest",
    conf={
        "uri": "http://localhost:8181/",
    },
)

namespace = "poc_new"
try:
    catalog.create_namespace(namespace)
except NamespaceAlreadyExistsError:
    pass

df = pa.Table.from_pylist(
    [
        {"lat": 52.371807, "long": 4.896029},
        {"lat": 52.387386, "long": 4.646219},
        {"lat": 52.078663, "long": 4.288788},
    ],
)
schema = df.schema

table_name = "coordinates"
table_identifier = f"{namespace}.{table_name}"
try:
    table = catalog.create_table(
        identifier=table_identifier,
        schema=schema,
    )
except TableAlreadyExistsError:
    pass

table = catalog.load_table(table_identifier)
table.append(df)
```
The traceback:
```
Traceback (most recent call last):
  File "d:\flink_iceberg\poc_01_iceberg_rest.py", line 40, in <module>
    table.append(df)
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 1068, in append
    for data_file in data_files:
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 2423, in _dataframe_to_data_files
    yield from write_file(table, iter([WriteTask(write_uuid, next(counter), df)]))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 1726, in write_file
    with fo.create(overwrite=True) as fos:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 299, in create
    output_file = self._filesystem.open_output_stream(self._path, buffer_size=self._buffer_size)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\_fs.pyx", line 868, in pyarrow._fs.FileSystem.open_output_stream
  File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 115, in pyarrow.lib.check_status
OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-efc0be57-453d-442d-af13-2e0b2382a53d.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request
```
In MinIO, the `metadata` directory is created and contains the `metadata.json` file, but there is no `data` directory.
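
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `metadata.json` is written by the REST catalog server, whose S3FileIO is configured through `CATALOG_S3_ENDPOINT=http://minio:9000`, while `table.append` writes the Parquet data files from the PyIceberg process on the Windows host. The `conf` passed to `load_rest` sets only `uri`, so unless the catalog hands S3 settings to the client, PyIceberg has no MinIO endpoint or credentials of its own. The sketch below builds a catalog configuration that also sets PyIceberg's `s3.*` FileIO properties; the values assume the compose file above (MinIO published on `localhost:9000`, `admin`/`password`, `us-east-1`), and `build_rest_conf` is a hypothetical helper.

```python
def build_rest_conf(
    catalog_uri: str = "http://localhost:8181/",
    s3_endpoint: str = "http://localhost:9000",
    access_key_id: str = "admin",
    secret_access_key: str = "password",
    region: str = "us-east-1",
) -> dict[str, str]:
    """Catalog properties for load_rest(), including client-side S3 FileIO settings.

    Defaults are assumptions taken from the docker compose file in this issue.
    """
    return {
        "uri": catalog_uri,
        # PyIceberg writes data files itself, so its FileIO needs a MinIO
        # endpoint reachable from the host (the published port), which is
        # not necessarily the http://minio:9000 address the REST server uses.
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
        "s3.region": region,
    }
```

It would be used as `catalog = load_rest(name="rest", conf=build_rest_conf())` in place of the `conf` dict in the script. Whether this resolves the `NETWORK_CONNECTION` error is not verified here.
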

Also, this is the `requirements.txt` file:
```
annotated-types==0.7.0
apache-beam==2.48.0
apache-flink==1.19.1
apache-flink-libraries==1.19.1
avro-python3==1.10.2
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
confluent-kafka==2.5.0
crcmod==1.7
dill==0.3.1.1
dnspython==2.6.1
docopt==0.6.2
duckdb==0.9.2
duckdb_engine==0.13.0
Faker==26.0.0
fastavro==1.9.5
fasteners==0.19
fsspec==2023.12.2
greenlet==3.0.3
grpcio==1.65.1
hdfs==2.7.3
httplib2==0.22.0
idna==3.7
kafka-python==2.0.2
markdown-it-py==3.0.0
mdurl==0.1.2
mmhash3==3.0.1
numpy==1.24.4
objsize==0.6.1
orjson==3.10.6
packaging==24.1
pandas==2.2.2
polars==1.2.1
proto-plus==1.24.0
protobuf==4.23.4
py4j==0.10.9.7
pyarrow==11.0.0
pydantic==2.8.2
pydantic-settings==2.3.4
pydantic_core==2.20.1
pydot==1.4.2
Pygments==2.18.0
pyiceberg==0.6.1
pymongo==4.8.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
regex==2024.7.24
requests==2.32.3
rich==13.7.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
six==1.16.0
sortedcontainers==2.4.0
SQLAlchemy==2.0.31
strictyaml==1.7.3
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
zstandard==0.23.0
```
I checked [this Slack thread](https://apache-iceberg.slack.com/archives/C029EE6HQ5D/p1707633685716559), which describes the same issue, but it doesn't contain a fix that works for my case.
OS: Windows 10
Environment variables containing `aws` in the three containers:
`iceberg-rest` container:
```
iceberg@ce79d3f11b5f:/usr/lib/iceberg-rest$ env | grep -i aws
AWS_REGION=us-east-1
CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
AWS_SECRET_ACCESS_KEY=password
AWS_ACCESS_KEY_ID=admin
```
`minio` container: no environment variables containing `aws`.
`mc` container:
```
AWS_REGION=us-east-1
AWS_SECRET_ACCESS_KEY=password
AWS_ACCESS_KEY_ID=admin
```
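
The `env | grep -i aws` check above runs inside the containers; the PyIceberg script itself runs on the Windows host, where `grep` is usually not available. A minimal Python equivalent, assuming the same case-insensitive match over whole `KEY=VALUE` lines is wanted; `grep_env` is a hypothetical helper. Because the match covers values as well as keys, `CATALOG_IO__IMPL` is included (its value mentions `aws`) and `CATALOG_S3_ENDPOINT=http://minio:9000` is not, which is why that variable is missing from the `iceberg-rest` output above even though the compose file sets it.

```python
import os
from collections.abc import Mapping


def grep_env(pattern: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return sorted KEY=VALUE lines that contain `pattern`, case-insensitively.

    Mirrors `env | grep -i PATTERN`: the match is on the whole line, so a
    variable is included if either its name or its value contains the pattern.
    """
    needle = pattern.lower()
    lines = (f"{key}={value}" for key, value in env.items())
    return sorted(line for line in lines if needle in line.lower())
```

On the host, `print("\n".join(grep_env("aws")))` lists the matching variables; note that it prints secret values too.
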