This is an automated email from the ASF dual-hosted git repository.

mengw15 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git


The following commit(s) were added to refs/heads/main by this push:
     new 9422def3a8 fix: preserve original error in IcebergIterator._seek_to_usable_file (#5092)
9422def3a8 is described below

commit 9422def3a8c9ef5b8ee128afc4095d07104f3e2f
Author: Meng Wang <[email protected]>
AuthorDate: Mon May 18 14:40:56 2026 -0700

    fix: preserve original error in IcebergIterator._seek_to_usable_file (#5092)
    
    ### What changes were proposed in this PR?
    
    `IcebergIterator._seek_to_usable_file` previously swallowed every error
    during file-scan setup:
    
    ```python
    except Exception:
        print("Could not read iceberg table:\n")
        raise Exception
    ```
    
    The `raise Exception` statement (no args, no `from`) constructs a fresh
    `Exception` with an empty `str()` and no `__cause__`. Callers that do
    `except Exception as e: log.error(str(e))` log only an empty string:
    the original error type and message are hidden, and the original
    traceback survives only through the implicit `__context__` chain. The
    `print` also bypasses the project logger.
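    
    A minimal sketch of the difference, assuming a hypothetical `load()`
    helper that stands in for the failing file-scan setup (it is not code
    from this repository):
    
    ```python
    def load():
        # Hypothetical stand-in for the failing file-scan setup.
        raise RuntimeError("Catalog auth failure: token expired")


    def old_style():
        try:
            load()
        except Exception:
            raise Exception  # fresh exception: no message, no explicit cause


    def new_style():
        try:
            load()
        except Exception:
            raise  # re-raises the original RuntimeError unchanged


    def observe(fn):
        # Mimic a caller that only inspects the type name and str(e).
        try:
            fn()
        except Exception as e:
            return type(e).__name__, str(e), e.__cause__


    old_result = observe(old_style)
    new_result = observe(new_style)
    print(old_result)  # ('Exception', '', None)
    print(new_result)  # ('RuntimeError', 'Catalog auth failure: token expired', None)
    ```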
    
    This PR replaces `raise Exception` with a bare `raise`, which re-raises
    the original exception, and routes the diagnostic message through
    `loguru`, matching the existing
    `except Exception as err: logger.exception(err)` pattern used in
    `data_processor.py:125`, `main_loop.py:422`, and
    `input_port_materialization_reader_runnable.py:169`:
    
    ```python
    except Exception as err:
        logger.exception(err)
        raise
    ```
    
    Callers now see the actual underlying exception (catalog auth failure,
    S3 IO error, manifest corruption, etc.) with its full class name,
    message, and traceback.
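    
    The log-and-re-raise behavior can be sketched in isolation. This sketch
    assumes the standard-library `logging` module as a stand-in for `loguru`
    so that it runs without extra dependencies; the `seek()` function and
    the `iceberg_sketch` logger name are hypothetical:
    
    ```python
    import io
    import logging

    # Capture log output in memory so it can be inspected afterwards.
    buffer = io.StringIO()
    logger = logging.getLogger("iceberg_sketch")  # hypothetical logger name
    logger.addHandler(logging.StreamHandler(buffer))
    logger.propagate = False


    def seek():
        # Hypothetical stand-in for the file-scan setup.
        try:
            raise OSError("S3 IO error")
        except Exception as err:
            logger.exception(err)  # logs the message plus the traceback
            raise  # the caller still receives the original OSError


    try:
        seek()
    except Exception as e:
        caught = e

    logged = buffer.getvalue()
    ```
    
    After this runs, `caught` is the original `OSError`, and `logged` holds
    both the message and the traceback text.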
    
    ### Any related issues, documentation, discussions?
    
    Closes #5091.
    
    ### How was this PR tested?
    
    Added
    `amber/src/test/python/core/storage/iceberg/test_iceberg_iterator_error_paths.py`,
    a slim mocked regression test: it patches `load_table_metadata` to
    return a `Mock` whose `refresh()` raises `RuntimeError("Catalog auth
    failure: token expired")`, drives `next(IcebergIterator(...))`, and
    asserts the caller observes the original `RuntimeError` (via
    `pytest.raises(RuntimeError, match=...)`). This locks in the contract
    that the except clause must not swallow the underlying exception's
    type or message.
    
    Run locally:
    
    ```
    python -m pytest amber/src/test/python/core/storage/iceberg/test_iceberg_iterator_error_paths.py -v
    ```
    
    Result: `1 passed`.
    
    ### Was this PR authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (claude-opus-4-7)
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 .../core/storage/iceberg/iceberg_document.py       |  7 ++--
 .../iceberg/test_iceberg_iterator_error_paths.py   | 37 ++++++++++++++++++++++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/amber/src/main/python/core/storage/iceberg/iceberg_document.py b/amber/src/main/python/core/storage/iceberg/iceberg_document.py
index 997ab9b5b7..7a5beda916 100644
--- a/amber/src/main/python/core/storage/iceberg/iceberg_document.py
+++ b/amber/src/main/python/core/storage/iceberg/iceberg_document.py
@@ -17,6 +17,7 @@
 
 import pyarrow as pa
 from itertools import islice
+from loguru import logger
 from pyiceberg.catalog import Catalog
 from pyiceberg.schema import Schema
 from pyiceberg.table import Table, FileScanTask
@@ -211,9 +212,9 @@ class IcebergIterator(Iterator[T]):
                             self.num_of_skipped_records += record_count
                             continue
                         yield task
-                except Exception:
-                    print("Could not read iceberg table:\n")
-                    raise Exception
+                except Exception as err:
+                    logger.exception(err)
+                    raise
             else:
                 return iter([])
 
diff --git a/amber/src/test/python/core/storage/iceberg/test_iceberg_iterator_error_paths.py b/amber/src/test/python/core/storage/iceberg/test_iceberg_iterator_error_paths.py
new file mode 100644
index 0000000000..d724ac31d5
--- /dev/null
+++ b/amber/src/test/python/core/storage/iceberg/test_iceberg_iterator_error_paths.py
@@ -0,0 +1,37 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from unittest.mock import Mock, patch
+
+import pytest
+
+from core.storage.iceberg import iceberg_document
+from core.storage.iceberg.iceberg_document import IcebergIterator
+
+
+def test_seek_to_usable_file_preserves_original_error():
+    failing_table = Mock()
+    failing_table.refresh.side_effect = RuntimeError(
+        "Catalog auth failure: token expired"
+    )
+
+    with patch.object(
+        iceberg_document, "load_table_metadata", return_value=failing_table
+    ):
+        it = IcebergIterator(0, None, None, "ns", "tbl", None, None)
+        with pytest.raises(RuntimeError, match="Catalog auth failure"):
+            next(it)
