This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 4ae5a7f921c [iceberg] add pyiceberg doc for other branch (#1585)
4ae5a7f921c is described below
commit 4ae5a7f921c356ee2a1549735c001d6705732839
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Tue Dec 24 21:04:28 2024 +0800
[iceberg] add pyiceberg doc for other branch (#1585)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [x] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../tutorials/building-lakehouse/doris-iceberg.md | 2 +
.../tutorials/building-lakehouse/doris-iceberg.md | 167 +++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 167 +++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 167 +++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 167 +++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 166 ++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 166 ++++++++++++++++++++
.../tutorials/building-lakehouse/doris-iceberg.md | 166 ++++++++++++++++++++
8 files changed, 1168 insertions(+)
diff --git a/docs/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/docs/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index af851851f8b..c4f3d3438fd 100644
--- a/docs/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/docs/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -305,6 +305,8 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
### 07 Interacting with PyIceberg
+> Please use Doris 2.1.8/3.0.4 or above.
+
Load an iceberg table:
```python
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3cc43ab17e4..16fc1aa20ec 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -304,3 +304,170 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 与 PyIceberg 交互
+
+> 请使用 Doris 2.1.8/3.0.4 以上版本。
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+读取为 Arrow Table:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+读取为 Pandas DataFrame:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+读取为 Polars DataFrame:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> 通过 pyiceberg 写入 iceberg 数据,请参阅[步骤](#通过-pyiceberg-写入数据)
+
+### 08 附录
+
+#### 通过 PyIceberg 写入数据
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Arrow Table 写入 Iceberg:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
+
+Pandas DataFrame 写入 Iceberg:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+ {
+ "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+ "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+ "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+ "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+ "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+ "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+ }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Polars DataFrame 写入 Iceberg:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+ {
+ "vendor_id": [1, 2, 2, 1],
+ "trip_id": [1000371, 1000372, 1000373, 1000374],
+ "trip_distance": [1.8, 2.5, 0.9, 8.4],
+ "fare_amount": [15.32, 22.15, 9.01, 42.13],
+ "store_and_fwd_flag": ["N", "N", "N", "Y"],
+ "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ },
+ {
+ "vendor_id": pl.Int64,
+ "trip_id": pl.Int64,
+ "trip_distance": pl.Float32,
+ "fare_amount": pl.Float64,
+ "store_and_fwd_flag": pl.String,
+ "ts": pl.String,
+ },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3cc43ab17e4..16fc1aa20ec 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -304,3 +304,170 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 与 PyIceberg 交互
+
+> 请使用 Doris 2.1.8/3.0.4 以上版本。
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+读取为 Arrow Table:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+读取为 Pandas DataFrame:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+读取为 Polars DataFrame:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> 通过 pyiceberg 写入 iceberg 数据,请参阅[步骤](#通过-pyiceberg-写入数据)
+
+### 08 附录
+
+#### 通过 PyIceberg 写入数据
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Arrow Table 写入 Iceberg:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
+
+Pandas DataFrame 写入 Iceberg:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+ {
+ "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+ "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+ "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+ "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+ "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+ "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+ }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Polars DataFrame 写入 Iceberg:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+ {
+ "vendor_id": [1, 2, 2, 1],
+ "trip_id": [1000371, 1000372, 1000373, 1000374],
+ "trip_distance": [1.8, 2.5, 0.9, 8.4],
+ "fare_amount": [15.32, 22.15, 9.01, 42.13],
+ "store_and_fwd_flag": ["N", "N", "N", "Y"],
+ "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ },
+ {
+ "vendor_id": pl.Int64,
+ "trip_id": pl.Int64,
+ "trip_distance": pl.Float32,
+ "fare_amount": pl.Float64,
+ "store_and_fwd_flag": pl.String,
+ "ts": pl.String,
+ },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3cc43ab17e4..16fc1aa20ec 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -304,3 +304,170 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 与 PyIceberg 交互
+
+> 请使用 Doris 2.1.8/3.0.4 以上版本。
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+读取为 Arrow Table:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+读取为 Pandas DataFrame:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+读取为 Polars DataFrame:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> 通过 pyiceberg 写入 iceberg 数据,请参阅[步骤](#通过-pyiceberg-写入数据)
+
+### 08 附录
+
+#### 通过 PyIceberg 写入数据
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Arrow Table 写入 Iceberg:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
+
+Pandas DataFrame 写入 Iceberg:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+ {
+ "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+ "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+ "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+ "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+ "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+ "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+ }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Polars DataFrame 写入 Iceberg:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+ {
+ "vendor_id": [1, 2, 2, 1],
+ "trip_id": [1000371, 1000372, 1000373, 1000374],
+ "trip_distance": [1.8, 2.5, 0.9, 8.4],
+ "fare_amount": [15.32, 22.15, 9.01, 42.13],
+ "store_and_fwd_flag": ["N", "N", "N", "Y"],
+ "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ },
+ {
+ "vendor_id": pl.Int64,
+ "trip_id": pl.Int64,
+ "trip_distance": pl.Float32,
+ "fare_amount": pl.Float64,
+ "store_and_fwd_flag": pl.String,
+ "ts": pl.String,
+ },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3cc43ab17e4..16fc1aa20ec 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -304,3 +304,170 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 与 PyIceberg 交互
+
+> 请使用 Doris 2.1.8/3.0.4 以上版本。
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+读取为 Arrow Table:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+读取为 Pandas DataFrame:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+读取为 Polars DataFrame:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> 通过 pyiceberg 写入 iceberg 数据,请参阅[步骤](#通过-pyiceberg-写入数据)
+
+### 08 附录
+
+#### 通过 PyIceberg 写入数据
+
+加载 Iceberg 表:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Arrow Table 写入 Iceberg:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
+
+Pandas DataFrame 写入 Iceberg:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+ {
+ "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+ "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+ "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+ "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+ "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+ "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+ }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Polars DataFrame 写入 Iceberg:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+ {
+ "vendor_id": [1, 2, 2, 1],
+ "trip_id": [1000371, 1000372, 1000373, 1000374],
+ "trip_distance": [1.8, 2.5, 0.9, 8.4],
+ "fare_amount": [15.32, 22.15, 9.01, 42.13],
+ "store_and_fwd_flag": ["N", "N", "N", "Y"],
+ "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ },
+ {
+ "vendor_id": pl.Int64,
+ "trip_id": pl.Int64,
+ "trip_distance": pl.Float32,
+ "fare_amount": pl.Float64,
+ "store_and_fwd_flag": pl.String,
+ "ts": pl.String,
+ },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
+
diff --git a/versioned_docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/versioned_docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3a6159407bc..c4f3d3438fd 100644
--- a/versioned_docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/versioned_docs/version-2.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -302,3 +302,169 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 Interacting with PyIceberg
+
+> Please use Doris 2.1.8/3.0.4 or above.
+
+Load an Iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
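
The catalog properties above are passed as a dictionary unpacked with `**`. As a minimal sketch of that mechanism (not the PyIceberg implementation), the snippet below uses a hypothetical stand-in function, `collect_properties`, to show that a Python dict literal needs `key: value` pairs and that keys containing dots or hyphens still arrive intact when unpacked into keyword arguments. The property values are the ones from the tutorial; everything else is an assumption made for illustration.

```python
# Hypothetical stand-in that accepts catalog properties as keyword
# arguments. It is NOT part of PyIceberg; it only echoes what it receives.
def collect_properties(name, **properties):
    return name, properties


# Same property values as the tutorial's load_catalog call.
# A dict literal separates keys and values with ":"; "=" is a syntax error.
properties = {
    "warehouse": "warehouse",
    "uri": "http://rest:8181",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
    "s3.endpoint": "http://minio:9000",
}

# "**" unpacks the dict; keys such as "s3.access-key-id" are not valid
# identifiers, so they can only be passed this way, never as plain kwargs.
name, received = collect_properties("iceberg", **properties)
print(name, sorted(received))
```

Building the dictionary first also makes it easy to reuse the same properties for the read and write examples.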
+
+Read table as `Arrow Table`:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+Read table as `Pandas DataFrame`:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+Read table as `Polars DataFrame`:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> To write data to an Iceberg table with PyIceberg, see [Write iceberg table by PyIceberg](#write-iceberg-table-by-pyiceberg).
+
+### 08 Appendix
+
+#### Write iceberg table by PyIceberg
+
+Load an Iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Write to the table with an `Arrow Table`:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
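
The `ts` strings in the write examples use hours without zero padding (`9:15:23`), while the read output shows them padded (`09:15:23`). As a rough, standard-library-only sketch of how the format string `"%Y-%m-%d %H:%M:%S"` maps one onto the other, the snippet below uses Python's `datetime.strptime`. This is an illustrative substitute, not the PyArrow call from the tutorial, and it assumes nothing about how PyArrow or Iceberg parse the values internally.

```python
from datetime import datetime

# Same format string and raw values as the tutorial's write examples.
FORMAT = "%Y-%m-%d %H:%M:%S"
raw = [
    "2024-01-01 9:15:23",
    "2024-01-02 12:10:11",
    "2024-01-01 3:25:15",
    "2024-01-03 7:12:33",
]

# Python's %H accepts one- or two-digit hours when parsing.
parsed = [datetime.strptime(value, FORMAT) for value in raw]

# Formatting with the same pattern always emits a zero-padded hour.
normalized = [ts.strftime(FORMAT) for ts in parsed]
print(normalized)
```

Checking the values this way before appending can catch malformed timestamps early, since `strptime` raises `ValueError` on input that does not match the format.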
+
+Write to the table with a `Pandas DataFrame`:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+ {
+ "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+ "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+ "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+ "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+ "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+ "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+ }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Write to the table with a `Polars DataFrame`:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+ {
+ "vendor_id": [1, 2, 2, 1],
+ "trip_id": [1000371, 1000372, 1000373, 1000374],
+ "trip_distance": [1.8, 2.5, 0.9, 8.4],
+ "fare_amount": [15.32, 22.15, 9.01, 42.13],
+ "store_and_fwd_flag": ["N", "N", "N", "Y"],
+ "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ },
+ {
+ "vendor_id": pl.Int64,
+ "trip_id": pl.Int64,
+ "trip_distance": pl.Float32,
+ "fare_amount": pl.Float64,
+ "store_and_fwd_flag": pl.String,
+ "ts": pl.String,
+ },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
diff --git a/versioned_docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/versioned_docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3a6159407bc..c4f3d3438fd 100644
--- a/versioned_docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/versioned_docs/version-2.1/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -302,3 +302,169 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 Interacting with PyIceberg
+
+> Please use Doris 2.1.8/3.0.4 or above.
+
+Load an Iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Read table as `Arrow Table`:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+Read table as `Pandas DataFrame`:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+Read table as `Polars DataFrame`:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> To write data to an Iceberg table with PyIceberg, see [Write iceberg table by PyIceberg](#write-iceberg-table-by-pyiceberg).
+
+### 08 Appendix
+
+#### Write iceberg table by PyIceberg
+
+Load an Iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+ "iceberg",
+ **{
+ "warehouse": "warehouse",
+ "uri": "http://rest:8181",
+ "s3.access-key-id": "admin",
+ "s3.secret-access-key": "password",
+ "s3.endpoint": "http://minio:9000",
+ },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Write to the table with an `Arrow Table`:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+ {
+ "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+ "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+ "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+ "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+ "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+ "ts": pa.compute.strptime(
+ ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+ "%Y-%m-%d %H:%M:%S",
+ "us",
+ ),
+ }
+)
+table.append(df)
+```
+
+Write table with `Pandas DataFrame`:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+    {
+        "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+        "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+        "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+        "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+        "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+        "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+    }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Write table with `Polars DataFrame`:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+    {
+        "vendor_id": [1, 2, 2, 1],
+        "trip_id": [1000371, 1000372, 1000373, 1000374],
+        "trip_distance": [1.8, 2.5, 0.9, 8.4],
+        "fare_amount": [15.32, 22.15, 9.01, 42.13],
+        "store_and_fwd_flag": ["N", "N", "N", "Y"],
+        "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+    },
+    {
+        "vendor_id": pl.Int64,
+        "trip_id": pl.Int64,
+        "trip_distance": pl.Float32,
+        "fare_amount": pl.Float64,
+        "store_and_fwd_flag": pl.String,
+        "ts": pl.String,
+    },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```
diff --git a/versioned_docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/versioned_docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index 3a6159407bc..c4f3d3438fd 100644
--- a/versioned_docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/versioned_docs/version-3.0/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -302,3 +302,169 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:40:22";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (0.05 sec)
```
+
+### 07 Interacting with PyIceberg
+
+> Please use Doris 2.1.8/3.0.4 or above.
+
+Load an iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+    "iceberg",
+    **{
+        "warehouse": "warehouse",
+        "uri": "http://rest:8181",
+        "s3.access-key-id": "admin",
+        "s3.secret-access-key": "password",
+        "s3.endpoint": "http://minio:9000",
+    },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Read table as `Arrow Table`:
+
+```python
+print(table.scan().to_arrow())
+
+pyarrow.Table
+vendor_id: int64
+trip_id: int64
+trip_distance: float
+fare_amount: double
+store_and_fwd_flag: large_string
+ts: timestamp[us]
+----
+vendor_id: [[1],[1],[2],[2]]
+trip_id: [[1000371],[1000374],[1000373],[1000372]]
+trip_distance: [[1.8],[8.4],[0.9],[2.5]]
+fare_amount: [[15.32],[42.13],[9.01],[22.15]]
+store_and_fwd_flag: [["N"],["Y"],["N"],["N"]]
+ts: [[2024-01-01 09:15:23.000000],[2024-01-03 07:12:33.000000],[2024-01-01 03:25:15.000000],[2024-01-02 12:10:11.000000]]
+```
+
+Read table as `Pandas DataFrame`:
+
+```python
+print(table.scan().to_pandas())
+
+   vendor_id  trip_id  trip_distance  fare_amount  store_and_fwd_flag                   ts
+0          1  1000371            1.8        15.32                   N  2024-01-01 09:15:23
+1          1  1000374            8.4        42.13                   Y  2024-01-03 07:12:33
+2          2  1000373            0.9         9.01                   N  2024-01-01 03:25:15
+3          2  1000372            2.5        22.15                   N  2024-01-02 12:10:11
+```
+
+Read table as `Polars DataFrame`:
+
+```python
+import polars as pl
+
+print(pl.scan_iceberg(table).collect())
+
+shape: (4, 6)
+┌───────────┬─────────┬───────────────┬─────────────┬────────────────────┬─────────────────────┐
+│ vendor_id ┆ trip_id ┆ trip_distance ┆ fare_amount ┆ store_and_fwd_flag ┆ ts                  │
+│ ---       ┆ ---     ┆ ---           ┆ ---         ┆ ---                ┆ ---                 │
+│ i64       ┆ i64     ┆ f32           ┆ f64         ┆ str                ┆ datetime[μs]        │
+╞═══════════╪═════════╪═══════════════╪═════════════╪════════════════════╪═════════════════════╡
+│ 1         ┆ 1000371 ┆ 1.8           ┆ 15.32       ┆ N                  ┆ 2024-01-01 09:15:23 │
+│ 1         ┆ 1000374 ┆ 8.4           ┆ 42.13       ┆ Y                  ┆ 2024-01-03 07:12:33 │
+│ 2         ┆ 1000373 ┆ 0.9           ┆ 9.01        ┆ N                  ┆ 2024-01-01 03:25:15 │
+│ 2         ┆ 1000372 ┆ 2.5           ┆ 22.15       ┆ N                  ┆ 2024-01-02 12:10:11 │
+└───────────┴─────────┴───────────────┴─────────────┴────────────────────┴─────────────────────┘
+```
+
+> To write to an Iceberg table with PyIceberg, see [Write iceberg table by PyIceberg](#write-iceberg-table-by-pyiceberg).
+
+### 08 Appendix
+
+#### Write iceberg table by PyIceberg
+
+Load an iceberg table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog(
+    "iceberg",
+    **{
+        "warehouse": "warehouse",
+        "uri": "http://rest:8181",
+        "s3.access-key-id": "admin",
+        "s3.secret-access-key": "password",
+        "s3.endpoint": "http://minio:9000",
+    },
+)
+table = catalog.load_table("nyc.taxis")
+```
+
+Write table with `Arrow Table`:
+
+```python
+import pyarrow as pa
+
+df = pa.Table.from_pydict(
+    {
+        "vendor_id": pa.array([1, 2, 2, 1], pa.int64()),
+        "trip_id": pa.array([1000371, 1000372, 1000373, 1000374], pa.int64()),
+        "trip_distance": pa.array([1.8, 2.5, 0.9, 8.4], pa.float32()),
+        "fare_amount": pa.array([15.32, 22.15, 9.01, 42.13], pa.float64()),
+        "store_and_fwd_flag": pa.array(["N", "N", "N", "Y"], pa.string()),
+        "ts": pa.compute.strptime(
+            ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+            "%Y-%m-%d %H:%M:%S",
+            "us",
+        ),
+    }
+)
+table.append(df)
+```
+
+Write table with `Pandas DataFrame`:
+
+```python
+import pyarrow as pa
+import pandas as pd
+
+df = pd.DataFrame(
+    {
+        "vendor_id": pd.Series([1, 2, 2, 1]).astype("int64[pyarrow]"),
+        "trip_id": pd.Series([1000371, 1000372, 1000373, 1000374]).astype("int64[pyarrow]"),
+        "trip_distance": pd.Series([1.8, 2.5, 0.9, 8.4]).astype("float32[pyarrow]"),
+        "fare_amount": pd.Series([15.32, 22.15, 9.01, 42.13]).astype("float64[pyarrow]"),
+        "store_and_fwd_flag": pd.Series(["N", "N", "N", "Y"]).astype("string[pyarrow]"),
+        "ts": pd.Series(["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"]).astype("timestamp[us][pyarrow]"),
+    }
+)
+table.append(pa.Table.from_pandas(df))
+```
+
+Write table with `Polars DataFrame`:
+
+```python
+import polars as pl
+
+df = pl.DataFrame(
+    {
+        "vendor_id": [1, 2, 2, 1],
+        "trip_id": [1000371, 1000372, 1000373, 1000374],
+        "trip_distance": [1.8, 2.5, 0.9, 8.4],
+        "fare_amount": [15.32, 22.15, 9.01, 42.13],
+        "store_and_fwd_flag": ["N", "N", "N", "Y"],
+        "ts": ["2024-01-01 9:15:23", "2024-01-02 12:10:11", "2024-01-01 3:25:15", "2024-01-03 7:12:33"],
+    },
+    {
+        "vendor_id": pl.Int64,
+        "trip_id": pl.Int64,
+        "trip_distance": pl.Float32,
+        "fare_amount": pl.Float64,
+        "store_and_fwd_flag": pl.String,
+        "ts": pl.String,
+    },
+).with_columns(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
+table.append(df.to_arrow())
+```