This is an automated email from the ASF dual-hosted git repository.

liurenjie1024 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-rust.git


The following commit(s) were added to refs/heads/main by this push:
     new 052feaf3b rfc: Modularize `iceberg` Implementations (#1854)
052feaf3b is described below

commit 052feaf3b6c6cd0e3310b1b8211ab3fed7d0e520
Author: Xuanwo <[email protected]>
AuthorDate: Tue Dec 2 18:01:47 2025 +0800

    rfc: Modularize `iceberg` Implementations (#1854)
    
    ## Which issue does this PR close?
    
    
    - Part of https://github.com/apache/iceberg-rust/issues/1819
    
    ## What changes are included in this PR?
    
    Add RFC for iceberg-kernel
    
    ## Are these changes tested?
    
    
    ---------
    
    Signed-off-by: Xuanwo <[email protected]>
    Co-authored-by: Kevin Liu <[email protected]>
    Co-authored-by: Andrew Lamb <[email protected]>
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---
 .../0001_modularize_iceberg_implementations.md     | 120 +++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/docs/rfcs/0001_modularize_iceberg_implementations.md b/docs/rfcs/0001_modularize_iceberg_implementations.md
new file mode 100644
index 000000000..14bd47827
--- /dev/null
+++ b/docs/rfcs/0001_modularize_iceberg_implementations.md
@@ -0,0 +1,120 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# RFC: Modularize `iceberg` Implementations
+
+## Background
+
+Issue #1819 highlighted that the current `iceberg` crate mixes the Iceberg 
protocol abstractions (catalog/table/plan/transaction) with concrete runtime, 
storage, and execution code (Tokio runtime wrappers, opendal-based `FileIO`, 
Arrow helpers, DataFusion glue, etc.). This coupling makes the crate heavy and 
blocks users from composing their own storage or execution stacks.
+
+Two principles have been agreed on:
+1. The `iceberg` crate remains the single source of truth for all protocol 
traits and data structures. We will not create a separate “kernel” crate or 
facade layer.
+2. Concrete integrations (Tokio runtime, opendal `FileIO`, Arrow/DataFusion 
glue, catalog adapters, etc.) move out into dedicated companion crates. Users 
needing a ready path can depend on those crates (e.g., `iceberg-datafusion` or 
`integrations/local`), while custom stacks depend only on `iceberg`.
+
+This RFC focuses on modularizing implementations; detailed trait signatures 
(e.g., `FileIO`, `Runtime`) will be handled in separate RFCs.
+
+## Goals and Scope
+
+- Keep `iceberg` as the protocol crate (traits + metadata + planning), without 
bundling runtimes, storage adapters, or execution glue.
+- Relocate concrete code into companion crates under `crates/fileio/*`, 
`crates/runtime/*`, and `crates/integrations/*`.
+- Provide a staged plan for extracting Arrow-dependent APIs to avoid 
destabilizing file-format code.
+- Minimize breaking surfaces: traits stay in `iceberg`; downstream crates 
mainly adjust dependencies.
+
+Out of scope: changes to the Iceberg table specification or to the external behavior of catalog adapters; detailed trait method design (covered by follow-up RFCs).
+
+## Architecture Overview
+
+### Workspace Layout (target)
+
+```
+crates/
+  iceberg/                # core traits, metadata, planning, transactions
+  fileio/
+    opendal/             # e.g. `iceberg-fileio-opendal`
+    fs/                  # other FileIO implementations
+  runtime/
+    tokio/               # e.g. `iceberg-runtime-tokio`
+    smol/
+  catalog/*              # catalog adapters (REST, HMS, Glue, etc.)
+  integrations/
+    local/               # simple local/arrow-based helper crate
+    datafusion/          # combines core + implementations for DF
+    cache-moka/
+    playground/
+```
+
+- `crates/iceberg` drops direct deps on opendal, Tokio, Arrow, and DataFusion.
+- Implementation crates depend on `iceberg` to implement the traits.
+- Higher-level crates (`integrations/local`, `iceberg-datafusion`) assemble 
the pieces for ready-to-use scenarios.
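
The layering described in these bullets can be sketched in a single file, with modules standing in for crates. This is only an illustration under stated assumptions: the `FileIO` methods, `MemoryFileIO`, `Stack`, and `roundtrip` are hypothetical names invented here, since the RFC defers real trait signatures to follow-up RFCs.

```rust
/// Stand-in for the protocol crate (`iceberg`): traits only, no concrete I/O.
mod protocol {
    /// Hypothetical, simplified storage trait (not the real `FileIO` API).
    pub trait FileIO: Send + Sync {
        fn read(&self, path: &str) -> Result<Vec<u8>, String>;
        fn write(&self, path: &str, data: Vec<u8>) -> Result<(), String>;
    }
}

/// Stand-in for a companion crate under `crates/fileio/*`.
/// It depends on the protocol module only to implement its trait.
mod fileio_memory {
    use super::protocol::FileIO;
    use std::collections::HashMap;
    use std::sync::Mutex;

    /// Hypothetical in-memory backend used purely for illustration.
    #[derive(Default)]
    pub struct MemoryFileIO {
        files: Mutex<HashMap<String, Vec<u8>>>,
    }

    impl FileIO for MemoryFileIO {
        fn read(&self, path: &str) -> Result<Vec<u8>, String> {
            self.files
                .lock()
                .unwrap()
                .get(path)
                .cloned()
                .ok_or_else(|| format!("not found: {path}"))
        }

        fn write(&self, path: &str, data: Vec<u8>) -> Result<(), String> {
            self.files.lock().unwrap().insert(path.to_string(), data);
            Ok(())
        }
    }
}

/// Stand-in for a higher-level crate that assembles the pieces.
mod integration {
    use super::protocol::FileIO;
    use std::sync::Arc;

    /// The stack holds a trait object, so any backend can be plugged in.
    pub struct Stack {
        pub file_io: Arc<dyn FileIO>,
    }

    /// Writes `data` to `path` and reads it back through the trait.
    pub fn roundtrip(stack: &Stack, path: &str, data: &[u8]) -> Result<Vec<u8>, String> {
        stack.file_io.write(path, data.to_vec())?;
        stack.file_io.read(path)
    }
}
```

A caller would build `integration::Stack { file_io: Arc::new(fileio_memory::MemoryFileIO::default()) }` and could swap in any other `FileIO` implementation without touching the protocol module.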
+
+### Core Trait Surfaces
+
+`FileIO`, `Runtime`, `Catalog`, `Table`, `Transaction`, `TableScan` (plan 
descriptors) all remain hosted in `iceberg`. Precise method signatures are 
deferred to dedicated RFCs to avoid locking details prematurely.
+
+### Usage Modes
+
+- **Custom stacks**: depend on `iceberg` and provide your own implementations.
+- **Pre-built stacks**: depend on `integrations/local` or 
`iceberg-datafusion`, which bundle `iceberg` with selected runtime/FileIO/Arrow 
helpers.
+- `iceberg` does not re-export companion crates; users compose explicitly.
+
+## Migration Plan (staged, with Arrow extraction phased)
+
+1. **Phase 1 – Confirm trait hosting, defer details**
+   - Keep all protocol traits in `iceberg`; move detailed API design (FileIO, 
Runtime, etc.) to separate RFCs.
+   - Add temporary shims/deprecations only when traits are finalized.
+
+2. **Phase 2 – First Arrow step: move `to_arrow()` out**
+   - Relocate the public `to_arrow()` API to `integrations/local` (or another 
higher-level crate). Core no longer exposes Arrow entry points.
+   - Keep internal Arrow-dependent helpers (e.g., `ArrowFileReader`) 
temporarily in `iceberg` to avoid breaking file-format flows.
+
+3. **Phase 3 – Gradual Arrow dependency removal**
+   - Incrementally migrate/replace Arrow-dependent internals 
(`ArrowFileReader`, format-specific readers) into `integrations/local` or other 
helper crates.
+   - Adjust file-format APIs as needed; expect this to be multi-release work.
+
+4. **Phase 4 – Dependency cleanup**
+   - Ensure catalog and integration crates depend only on `iceberg` plus the 
specific runtime/FileIO/helper crates they need.
+   - Verify build/test pipelines against the new dependency graph.
+
+5. **Phase 5 – Docs & release**
+   - Publish migration guides: where `to_arrow()` moved, how to assemble 
local/DataFusion stacks.
+   - Schedule deprecation windows for remaining Arrow helpers; target a 
breaking release once Arrow is fully removed from `iceberg`.
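
One way Phase 2 might look in practice is an extension trait: the companion crate attaches `to_arrow()` to the core scan type, so the core itself exposes no Arrow entry point. The sketch below is an assumption, not the planned design; `TableScanArrowExt`, the `data_files` field, and the placeholder `RecordBatch` are hypothetical, and real code would use the `arrow` crate's types.

```rust
/// Stand-in for `iceberg`: the scan is only a plan descriptor here.
mod protocol {
    pub struct TableScan {
        /// Hypothetical field: paths of the data files selected by planning.
        pub data_files: Vec<String>,
    }
}

/// Stand-in for a companion crate such as `integrations/local`.
mod local {
    use super::protocol::TableScan;

    /// Placeholder for an Arrow record batch (not the real Arrow type).
    #[derive(Debug, PartialEq)]
    pub struct RecordBatch {
        pub source: String,
    }

    /// Extension trait that rehosts `to_arrow()` outside the core module.
    pub trait TableScanArrowExt {
        fn to_arrow(&self) -> Vec<RecordBatch>;
    }

    impl TableScanArrowExt for TableScan {
        /// Produces one placeholder batch per planned data file.
        fn to_arrow(&self) -> Vec<RecordBatch> {
            self.data_files
                .iter()
                .map(|file| RecordBatch { source: file.clone() })
                .collect()
        }
    }
}
```

With this shape, existing call sites keep the `scan.to_arrow()` spelling and only need to add `use local::TableScanArrowExt;`, which is the kind of dependency-only adjustment the RFC aims for.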
+
+## Compatibility
+
+- Short term: users of `Table::scan().to_arrow()` must switch to 
`integrations/local` (or another crate that rehosts that API). Other Arrow 
types stay temporarily but will migrate in later phases.
+- Long term: `iceberg` will be Arrow-free; companion crates provide 
Arrow-based helpers.
+- Tests/examples move alongside the implementations they exercise.
+
+## Risks and Mitigations
+
+| Risk | Description | Mitigation |
+| ---- | ----------- | ---------- |
+| Arrow dependency unwinding is complex | File-format readers may rely on Arrow types | Phase the work; move `to_arrow()` first, then refactor readers; document interim state |
+| Discoverability | Users may not know where Arrow helpers went | Clear docs pointing to `integrations/local` and `iceberg-datafusion`; migration guide |
+| Trait churn | Future trait RFCs may break early adopters | Use deprecation shims and communicate timelines |
+| Duplicate impls | Multiple helper crates could overlap | Provide recommended combinations and feature guidance |
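
The "deprecation shims" mitigation could take the following shape, using Rust's built-in `#[deprecated]` attribute. This is a minimal sketch: the module names, the `row_count` helper, and the note text are hypothetical and not taken from the codebase.

```rust
/// Stand-in for the helper's new home in a companion crate.
pub mod local {
    /// Counts rows across batches; each inner `Vec` models one batch.
    pub fn row_count(batches: &[Vec<i64>]) -> usize {
        batches.iter().map(Vec::len).sum()
    }
}

/// Stand-in for the old location inside `iceberg`.
pub mod legacy {
    /// Shim kept for one deprecation window; it forwards to the new location.
    #[deprecated(note = "moved to `local::row_count`")]
    pub fn row_count(batches: &[Vec<i64>]) -> usize {
        super::local::row_count(batches)
    }
}

/// Downstream code that has not migrated yet can silence the lint locally.
#[allow(deprecated)]
pub fn count_via_legacy(batches: &[Vec<i64>]) -> usize {
    legacy::row_count(batches)
}
```

Callers of the old path get a compiler warning that names the new location, while behavior stays identical until the shim is removed.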
+
+## Open Questions
+
+1. Versioning: should companion crate versions align with `iceberg`, or may they be versioned independently with a published compatibility matrix?
+2. Deprecation schedule: how long do we keep interim Arrow helpers before full 
removal from `iceberg`?
+
+## Conclusion
+
+We will keep `iceberg` as the protocol crate while modularizing concrete 
implementations. Arrow removal will be phased: first relocating `to_arrow()` to 
`integrations/local`, then gradually moving Arrow-dependent readers and 
helpers. This keeps the core lean, lets users compose their preferred 
runtime/FileIO stacks, and still offers ready-to-use combinations via companion 
crates.
