This is an automated email from the ASF dual-hosted git repository.

kevinjqliu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git


The following commit(s) were added to refs/heads/main by this push:
     new 382a15be AWS profile support to glue and fsspec s3 fileio (#2948)
382a15be is described below

commit 382a15be769a8b51aac347a56668fd12653b3ac2
Author: Stats <[email protected]>
AuthorDate: Wed Jan 28 02:01:12 2026 +0900

    AWS profile support to glue and fsspec s3 fileio (#2948)
    
    <!--
    Thanks for opening a pull request!
    -->
    
    <!-- In the case this PR will resolve an issue, please replace
    ${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
    Closes #2841
    
    # Rationale for this change
    This PR adds explicit AWS profile support for both the Glue catalog
    client and
    fsspec-based S3 FileIO.
    
    While `GlueCatalog` already supports profile configuration, fsspec-based
    S3
    operations did not propagate profile selection to the underlying
    `S3FileSystem` or async AWS session. As a result, users had to rely on
    environment
    variables or the default AWS profile, which makes it difficult to work
    with
    multiple AWS configurations in parallel.
    
    This change introduces two configuration properties:
    - `client.profile-name`: a unified AWS profile for the catalog client
    and FileIO
    - `s3.profile-name`: an AWS profile specifically for S3 FileIO
    
    Profile resolution follows this precedence:
    1. `s3.profile-name`
    2. `client.profile-name`
    
    This ensures consistent and explicit credential selection across catalog
    and
    FileIO layers when using the fsspec backend.
    
    ## Are these changes tested?
    Yes. New unit tests were added to validate the profile propagation
    behavior.
    
    - **Glue Catalog**
    - Verifies that `boto3.Session(profile_name=...)` is created when
    initializing
        `GlueCatalog` with `client.profile-name`.
    
    - **S3 FileIO (fsspec)**
    - Verifies that `client.profile-name` or `s3.profile-name` results in
    the
    creation of an async AWS session with the correct profile, which is then
        passed to `S3FileSystem`.
    
    The tests were run locally with:
    ```bash
    pytest tests/catalog/test_glue_profile.py tests/io/test_fsspec_profile.py
    ```
    
    Output would be:
    ```
    ==================== test session starts =====================
    platform darwin -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
    rootdir: ${ROOTDIR}/iceberg-python
    configfile: pyproject.toml
    plugins: anyio-4.2.0, lazy-fixture-0.6.3, requests-mock-1.12.1
    collected 3 items
    tests/catalog/test_glue_profile.py .                   [ 33%]
    tests/io/test_fsspec_profile.py ..                     [100%]
    ===================== 3 passed in 1.02s ======================
    ```
    
    ## Are there any user-facing changes?
    Yes, this adds new configuration properties that users can set:
    - `client.profile-name`: Sets the AWS profile for both the catalog
    client and FileIO (unified configuration).
    - `s3.profile-name`: Sets the AWS profile specifically for S3 FileIO.
    
    **Example Usage:**
    ```python
    catalog = GlueCatalog(
        "my_catalog",
        **{
            "client.profile-name": "my-aws-profile",
            # ... other config
        }
    )
---
 mkdocs/docs/configuration.md       |   4 +-
 pyiceberg/catalog/glue.py          |   4 +-
 pyiceberg/io/__init__.py           |   2 +
 pyiceberg/io/fsspec.py             |  13 ++++-
 tests/catalog/test_glue_profile.py |  67 +++++++++++++++++++++++
 tests/io/test_fsspec_profile.py    | 106 +++++++++++++++++++++++++++++++++++++
 6 files changed, 192 insertions(+), 4 deletions(-)

diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md
index e42ea1da..efe6ddee 100644
--- a/mkdocs/docs/configuration.md
+++ b/mkdocs/docs/configuration.md
@@ -115,6 +115,7 @@ For the FileIO there are several configuration options 
available:
 | s3.access-key-id            | admin                      | Configure the 
static access key id used to access the FileIO.                                 
                                                                                
                                                                              |
 | s3.secret-access-key        | password                   | Configure the 
static secret access key used to access the FileIO.                             
                                                                                
                                                                              |
 | s3.session-token            | AQoDYXdzEJr...             | Configure the 
static session token used to access the FileIO.                                 
                                                                                
                                                                              |
+| s3.profile-name             | default                    | Configure the AWS 
profile used to access the S3 FileIO.                                           
                                                                                
                                                                         |
 | s3.role-session-name        | session                    | An optional 
identifier for the assumed role session.                                        
                                                                                
                                                                                
|
 | s3.role-arn                 | arn:aws:...                | AWS Role ARN. If 
provided instead of access_key and secret_key, temporary credentials will be 
fetched by assuming this role.                                                  
                                                                              |
 | s3.signer                   | bearer                     | Configure the 
signature version of the FileIO.                                                
                                                                                
                                                                              |
@@ -720,7 +721,7 @@ catalog:
 | glue.id                | 111111111111                           | Configure 
the 12-digit ID of the Glue Catalog                                   |
 | glue.skip-archive      | true                                   | Configure 
whether to skip the archival of older table versions. Default to true |
 | glue.endpoint          | <https://glue.us-east-1.amazonaws.com> | Configure 
an alternative endpoint of the Glue service for GlueCatalog to access |
-| glue.profile-name      | default                                | Configure 
the static profile used to access the Glue Catalog                    |
+| glue.profile-name      | default                                | Configure 
the AWS profile used to access the Glue Catalog                       |
 | glue.region            | us-east-1                              | Set the 
region of the Glue Catalog                                              |
 | glue.access-key-id     | admin                                  | Configure 
the static access key id used to access the Glue Catalog              |
 | glue.secret-access-key | password                               | Configure 
the static secret access key used to access the Glue Catalog          |
@@ -826,6 +827,7 @@ configures the AWS credentials for both Glue Catalog and S3 
FileIO.
 | client.access-key-id     | admin          | Configure the static access key 
id used to access both the Glue/DynamoDB Catalog and the S3 FileIO     |
 | client.secret-access-key | password       | Configure the static secret 
access key used to access both the Glue/DynamoDB Catalog and the S3 FileIO |
 | client.session-token     | AQoDYXdzEJr... | Configure the static session 
token used to access both the Glue/DynamoDB Catalog and the S3 FileIO     |
+| client.profile-name      | default        | Configure the AWS profile used 
to access both the Glue/DynamoDB Catalog and the S3 FileIO           |
 | client.role-session-name      | session                    | An optional 
identifier for the assumed role session.                                        
                                                                                
                                                                              |
 | client.role-arn          | arn:aws:...                | AWS Role ARN. If 
provided instead of access_key and secret_key, temporary credentials will be 
fetched by assuming this role.                                                  
                                                                            |
 
diff --git a/pyiceberg/catalog/glue.py b/pyiceberg/catalog/glue.py
index e55257af..5c09cdbd 100644
--- a/pyiceberg/catalog/glue.py
+++ b/pyiceberg/catalog/glue.py
@@ -48,7 +48,7 @@ from pyiceberg.exceptions import (
     NoSuchTableError,
     TableAlreadyExistsError,
 )
-from pyiceberg.io import AWS_ACCESS_KEY_ID, AWS_REGION, AWS_SECRET_ACCESS_KEY, 
AWS_SESSION_TOKEN
+from pyiceberg.io import AWS_ACCESS_KEY_ID, AWS_PROFILE_NAME, AWS_REGION, 
AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
 from pyiceberg.partitioning import UNPARTITIONED_PARTITION_SPEC, PartitionSpec
 from pyiceberg.schema import Schema, SchemaVisitor, visit
 from pyiceberg.serializers import FromInputFile
@@ -329,7 +329,7 @@ class GlueCatalog(MetastoreCatalog):
             retry_mode_prop_value = get_first_property_value(properties, 
GLUE_RETRY_MODE)
 
             session = boto3.Session(
-                profile_name=properties.get(GLUE_PROFILE_NAME),
+                profile_name=get_first_property_value(properties, 
GLUE_PROFILE_NAME, AWS_PROFILE_NAME),
                 region_name=get_first_property_value(properties, GLUE_REGION, 
AWS_REGION),
                 botocore_session=properties.get(BOTOCORE_SESSION),
                 aws_access_key_id=get_first_property_value(properties, 
GLUE_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID),
diff --git a/pyiceberg/io/__init__.py b/pyiceberg/io/__init__.py
index 85bd402d..71f763bf 100644
--- a/pyiceberg/io/__init__.py
+++ b/pyiceberg/io/__init__.py
@@ -41,12 +41,14 @@ from pyiceberg.typedef import EMPTY_DICT, Properties
 
 logger = logging.getLogger(__name__)
 
+AWS_PROFILE_NAME = "client.profile-name"
 AWS_REGION = "client.region"
 AWS_ACCESS_KEY_ID = "client.access-key-id"
 AWS_SECRET_ACCESS_KEY = "client.secret-access-key"
 AWS_SESSION_TOKEN = "client.session-token"
 AWS_ROLE_ARN = "client.role-arn"
 AWS_ROLE_SESSION_NAME = "client.role-session-name"
+S3_PROFILE_NAME = "s3.profile-name"
 S3_ANONYMOUS = "s3.anonymous"
 S3_ENDPOINT = "s3.endpoint"
 S3_ACCESS_KEY_ID = "s3.access-key-id"
diff --git a/pyiceberg/io/fsspec.py b/pyiceberg/io/fsspec.py
index eb5342c9..6f44501e 100644
--- a/pyiceberg/io/fsspec.py
+++ b/pyiceberg/io/fsspec.py
@@ -51,6 +51,7 @@ from pyiceberg.io import (
     ADLS_TENANT_ID,
     ADLS_TOKEN,
     AWS_ACCESS_KEY_ID,
+    AWS_PROFILE_NAME,
     AWS_REGION,
     AWS_SECRET_ACCESS_KEY,
     AWS_SESSION_TOKEN,
@@ -71,6 +72,7 @@ from pyiceberg.io import (
     S3_CONNECT_TIMEOUT,
     S3_ENDPOINT,
     S3_FORCE_VIRTUAL_ADDRESSING,
+    S3_PROFILE_NAME,
     S3_PROXY_URI,
     S3_REGION,
     S3_REQUEST_TIMEOUT,
@@ -205,7 +207,16 @@ def _s3(properties: Properties) -> AbstractFileSystem:
     else:
         anon = False
 
-    fs = S3FileSystem(anon=anon, client_kwargs=client_kwargs, 
config_kwargs=config_kwargs)
+    s3_fs_kwargs = {
+        "anon": anon,
+        "client_kwargs": client_kwargs,
+        "config_kwargs": config_kwargs,
+    }
+
+    if profile_name := get_first_property_value(properties, S3_PROFILE_NAME, 
AWS_PROFILE_NAME):
+        s3_fs_kwargs["profile"] = profile_name
+
+    fs = S3FileSystem(**s3_fs_kwargs)
 
     for event_name, event_function in register_events.items():
         fs.s3.meta.events.unregister(event_name, unique_id=1925)
diff --git a/tests/catalog/test_glue_profile.py 
b/tests/catalog/test_glue_profile.py
new file mode 100644
index 00000000..3d9ee92a
--- /dev/null
+++ b/tests/catalog/test_glue_profile.py
@@ -0,0 +1,67 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from unittest import mock
+
+from moto import mock_aws
+
+from pyiceberg.catalog.glue import GlueCatalog
+from pyiceberg.typedef import Properties
+from tests.conftest import UNIFIED_AWS_SESSION_PROPERTIES
+
+
+@mock_aws
+def test_passing_client_profile_name_properties_to_glue() -> None:
+    session_properties: Properties = {
+        "client.profile-name": "profile_name",
+        **UNIFIED_AWS_SESSION_PROPERTIES,
+    }
+
+    with mock.patch("boto3.Session") as mock_session:
+        test_catalog = GlueCatalog("glue", **session_properties)
+
+    mock_session.assert_called_with(
+        aws_access_key_id="client.access-key-id",
+        aws_secret_access_key="client.secret-access-key",
+        aws_session_token="client.session-token",
+        region_name="client.region",
+        profile_name="profile_name",
+        botocore_session=None,
+    )
+    assert test_catalog.glue is mock_session().client()
+
+
+@mock_aws
+def test_glue_profile_precedence() -> None:
+    session_properties: Properties = {
+        "glue.profile-name": "glue-profile",
+        "client.profile-name": "client-profile",
+        **UNIFIED_AWS_SESSION_PROPERTIES,
+    }
+
+    with mock.patch("boto3.Session") as mock_session:
+        test_catalog = GlueCatalog("glue", **session_properties)
+
+    mock_session.assert_called_with(
+        aws_access_key_id="client.access-key-id",
+        aws_secret_access_key="client.secret-access-key",
+        aws_session_token="client.session-token",
+        region_name="client.region",
+        profile_name="glue-profile",
+        botocore_session=None,
+    )
+    assert test_catalog.glue is mock_session().client()
diff --git a/tests/io/test_fsspec_profile.py b/tests/io/test_fsspec_profile.py
new file mode 100644
index 00000000..5f4a63f6
--- /dev/null
+++ b/tests/io/test_fsspec_profile.py
@@ -0,0 +1,106 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+import uuid
+from unittest import mock
+
+from pyiceberg.io.fsspec import FsspecFileIO
+from pyiceberg.typedef import Properties
+from tests.conftest import UNIFIED_AWS_SESSION_PROPERTIES
+
+
+def test_fsspec_s3_session_properties_with_profile() -> None:
+    session_properties: Properties = {
+        "s3.profile-name": "test-profile",
+        "s3.endpoint": "http://localhost:9000";,
+        **UNIFIED_AWS_SESSION_PROPERTIES,
+    }
+
+    with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
+        s3_fileio = FsspecFileIO(properties=session_properties)
+        filename = str(uuid.uuid4())
+
+        s3_fileio.new_input(location=f"s3://warehouse/{filename}")
+
+        mock_s3fs.assert_called_with(
+            anon=False,
+            client_kwargs={
+                "endpoint_url": "http://localhost:9000";,
+                "aws_access_key_id": "client.access-key-id",
+                "aws_secret_access_key": "client.secret-access-key",
+                "region_name": "client.region",
+                "aws_session_token": "client.session-token",
+            },
+            config_kwargs={},
+            profile="test-profile",
+        )
+
+
+def test_fsspec_s3_session_properties_with_client_profile() -> None:
+    session_properties: Properties = {
+        "client.profile-name": "test-profile",
+        "s3.endpoint": "http://localhost:9000";,
+        **UNIFIED_AWS_SESSION_PROPERTIES,
+    }
+
+    with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
+        s3_fileio = FsspecFileIO(properties=session_properties)
+        filename = str(uuid.uuid4())
+
+        s3_fileio.new_input(location=f"s3://warehouse/{filename}")
+
+        mock_s3fs.assert_called_with(
+            anon=False,
+            client_kwargs={
+                "endpoint_url": "http://localhost:9000";,
+                "aws_access_key_id": "client.access-key-id",
+                "aws_secret_access_key": "client.secret-access-key",
+                "region_name": "client.region",
+                "aws_session_token": "client.session-token",
+            },
+            config_kwargs={},
+            profile="test-profile",
+        )
+
+
+def test_fsspec_s3_session_properties_with_s3_and_client_profile() -> None:
+    session_properties: Properties = {
+        "s3.profile-name": "s3-profile",
+        "client.profile-name": "client-profile",
+        "s3.endpoint": "http://localhost:9000";,
+        **UNIFIED_AWS_SESSION_PROPERTIES,
+    }
+
+    with mock.patch("s3fs.S3FileSystem") as mock_s3fs:
+        s3_fileio = FsspecFileIO(properties=session_properties)
+        filename = str(uuid.uuid4())
+
+        s3_fileio.new_input(location=f"s3://warehouse/{filename}")
+
+        mock_s3fs.assert_called_with(
+            anon=False,
+            client_kwargs={
+                "endpoint_url": "http://localhost:9000";,
+                "aws_access_key_id": "client.access-key-id",
+                "aws_secret_access_key": "client.secret-access-key",
+                "region_name": "client.region",
+                "aws_session_token": "client.session-token",
+            },
+            config_kwargs={},
+            profile="s3-profile",
+        )

Reply via email to