This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 0cee9ba66c [doc] Refactor names in python-api
0cee9ba66c is described below
commit 0cee9ba66ceb7ff5c584c8373504d6fb2df647e4
Author: JingsongLi <[email protected]>
AuthorDate: Sun Oct 26 13:57:22 2025 +0800
[doc] Refactor names in python-api
---
docs/content/program-api/python-api.md | 149 ++++++++-------------------------
1 file changed, 34 insertions(+), 115 deletions(-)
diff --git a/docs/content/program-api/python-api.md b/docs/content/program-api/python-api.md
index 1f276a0b48..2a01dab5bc 100644
--- a/docs/content/program-api/python-api.md
+++ b/docs/content/program-api/python-api.md
@@ -32,8 +32,6 @@ implementation of the brand new PyPaimon does not require JDK installation.
## Environment Settings
-### SDK Installing
-
The SDK is published at [pypaimon](https://pypi.org/project/pypaimon/). You can install it with:
```shell
@@ -44,6 +42,8 @@ pip install pypaimon
Before working with a Table, you need to create a Catalog.
+{{< tabs "create-catalog" >}}
+{{< tab "filesystem" >}}
```python
from pypaimon import CatalogFactory
@@ -53,14 +53,33 @@ catalog_options = {
}
catalog = CatalogFactory.create(catalog_options)
```
+{{< /tab >}}
+{{< tab "rest catalog" >}}
+The sample code is as follows. The detailed meaning of each option can be found in [DLF Token](../concepts/rest/dlf.md).
-Currently, PyPaimon only support filesystem catalog and rest catalog. See [Catalog]({{< ref "concepts/catalog" >}}).
+```python
+from pypaimon import CatalogFactory
+
+# Note that keys and values are all strings
+catalog_options = {
+ 'metastore': 'rest',
+ 'warehouse': 'xxx',
+ 'uri': 'xxx',
+ 'dlf.region': 'xxx',
+ 'token.provider': 'xxx',
+ 'dlf.access-key-id': 'xxx',
+ 'dlf.access-key-secret': 'xxx'
+}
+catalog = CatalogFactory.create(catalog_options)
+```
+{{< /tab >}}
+{{< /tabs >}}
-## Create Database & Table
+Currently, PyPaimon only supports filesystem catalog and rest catalog. See [Catalog]({{< ref "concepts/catalog" >}}).
You can use the catalog to create a table for writing data.
-### Create Database (optional)
+## Create Database
A table is located in a database. If you want to create a table in a new database, you should create the database first.
@@ -72,7 +91,7 @@ catalog.create_database(
)
```
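The hunk elides the call's arguments; by analogy with the `create_table` call shown later in this diff, a minimal sketch (argument names are assumed):

```python
# Sketch only: argument names assumed by analogy with create_table below.
catalog.create_database(
    name='database_name',
    ignore_if_exists=True  # set False to raise an error if the database exists
)
```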
-### Create Schema
+## Create Table
Table schema contains field definitions, partition keys, primary keys, table options and comment.
The field definitions are described by `pyarrow.Schema`. All arguments except the field definitions are optional.
@@ -131,8 +150,6 @@ schema = Schema.from_pyarrow_schema(
)
```
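Most of the schema example falls outside this hunk; a minimal sketch of the flow the prose above describes (field names and option values are illustrative, and the keyword arguments follow the prose but should be treated as assumptions):

```python
import pyarrow as pa

from pypaimon import Schema  # assumed import path

# Field definitions come from pyarrow; everything else is optional.
pa_schema = pa.schema([
    ('dt', pa.string()),
    ('user_id', pa.int64()),
    ('behavior', pa.string()),
])

schema = Schema.from_pyarrow_schema(
    pa_schema,
    partition_keys=['dt'],
    primary_keys=['dt', 'user_id'],
    options={'bucket': '2'},      # keys and values are all strings
    comment='illustrative table'
)
```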
-### Create Table
-
After building the table schema, you can create the corresponding table:
```python
@@ -142,13 +159,8 @@ catalog.create_table(
schema=schema,
ignore_if_exists=True # To raise error if the table exists, set False
)
-```
-
-## Get Table
-The Table interface provides tools to read and write table.
-
-```python
+# Get Table
table = catalog.get_table('database_name.table_name')
```
@@ -203,7 +215,7 @@ write_builder = table.new_batch_write_builder().overwrite({'dt': '2024-01-01'})
## Batch Read
-### Get ReadBuilder and Perform pushdown
+### Predicate pushdown
A `ReadBuilder` is used to build reading utilities and perform filter and projection pushdown.
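The hunk below covers only the projection half of that sentence; for filter pushdown, a sketch assuming a `PredicateBuilder` obtained from the `ReadBuilder` (method names are assumptions, not confirmed by this diff):

```python
read_builder = table.new_read_builder()

# Assumed API: build a predicate such as f0 < 3 and push it into the read.
predicate_builder = read_builder.new_predicate_builder()
predicate = predicate_builder.less_than('f0', 3)
read_builder = read_builder.with_filter(predicate)
```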
@@ -238,7 +250,7 @@ You can also pushdown projection by `ReadBuilder`:
read_builder = read_builder.with_projection(['f3', 'f2'])
```
-### Scan Plan
+### Generate Splits
Then you can step into the Scan Plan stage to get `splits`:
@@ -247,11 +259,9 @@ table_scan = read_builder.new_scan()
splits = table_scan.plan().splits()
```
-### Read Splits
-
Finally, you can read data from the `splits` into various data formats.
-#### Apache Arrow
+### Read Apache Arrow
This requires `pyarrow` to be installed.
@@ -285,7 +295,7 @@ for batch in table_read.to_arrow_batch_reader(splits):
# f1: ["a","b","c"]
```
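The loop above consumes record batches; the removed REST API example further down in this diff also reads everything into a single `pyarrow.Table` via `to_arrow`, which works the same way here:

```python
table_read = read_builder.new_read()
pa_table = table_read.to_arrow(splits)  # one pyarrow.Table holding all splits
print(pa_table)
```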
-#### Python Iterator
+### Read Python Iterator
You can read the data row by row into a native Python iterator.
This is convenient for custom row-based processing logic.
@@ -299,7 +309,7 @@ for row in table_read.to_iterator(splits):
# ["a","b","c"]
```
-#### Pandas
+### Read Pandas
This requires `pandas` to be installed.
@@ -318,7 +328,7 @@ print(df)
# ...
```
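The conversion call itself is elided by the hunk; a sketch, assuming a `to_pandas` counterpart to `to_arrow`:

```python
table_read = read_builder.new_read()
df = table_read.to_pandas(splits)  # assumed API, mirroring to_arrow above
print(df)
```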
-#### DuckDB
+### Read DuckDB
This requires `duckdb` to be installed.
@@ -341,7 +351,7 @@ print(duckdb_con.query("SELECT * FROM duckdb_table WHERE f0 = 1").fetchdf())
# 0 1 a
```
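Only the query appears in the hunk; a sketch of the registration step, assuming a `to_duckdb` converter that exposes the splits as an in-memory DuckDB table:

```python
table_read = read_builder.new_read()
# Assumed API: register the splits under the given table name.
duckdb_con = table_read.to_duckdb(splits, 'duckdb_table')
print(duckdb_con.query("SELECT * FROM duckdb_table WHERE f0 = 1").fetchdf())
```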
-#### Ray
+### Read Ray
This requires `ray` to be installed.
@@ -366,7 +376,7 @@ print(ray_dataset.to_pandas())
# ...
```
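As above, the converter call is elided; a sketch assuming a `to_ray` counterpart:

```python
table_read = read_builder.new_read()
ray_dataset = table_read.to_ray(splits)  # assumed API, mirroring the converters above
print(ray_dataset.to_pandas())
```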
-### Incremental Read Between Timestamps
+### Incremental Read
This API allows reading data committed between two snapshot timestamps. The steps are as follows.
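The steps themselves fall outside this hunk. As a hypothetical sketch: the window is a pair of millisecond timestamps carried by Paimon's `incremental-between-timestamp` option; how the option is attached to the table in PyPaimon is assumed here:

```python
# Hypothetical sketch. 'incremental-between-timestamp' is the Paimon option key;
# the copy-with-options call is an assumption, mirroring the Java Table API.
table = catalog.get_table('database_name.table_name')
table = table.copy({'incremental-between-timestamp': '1714000000000,1714003600000'})

read_builder = table.new_read_builder()
splits = read_builder.new_scan().plan().splits()
result = read_builder.new_read().to_arrow(splits)
```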
@@ -519,97 +529,6 @@ Key points about shard read:
- **Parallel Processing**: Each shard can be processed independently for better performance
- **Consistency**: Combining all shards should produce the complete table data
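A sketch of how such shards might be produced and recombined; the `with_shard(index, count)` hook is an assumption suggested by the key points above, not confirmed by this diff:

```python
import pyarrow as pa

shard_count = 3
shard_tables = []
for i in range(shard_count):
    # Assumed API: restrict the scan to shard i of shard_count.
    splits = read_builder.new_scan().with_shard(i, shard_count).plan().splits()
    shard_tables.append(read_builder.new_read().to_arrow(splits))

# Combining all shards should reproduce the complete table data.
full_table = pa.concat_tables(shard_tables)
```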
-## REST API
-
-### Create Catalog
-
-The sample code is as follows. The detailed meaning of option can be found in [DLF Token](../concepts/rest/dlf.md).
-
-```python
-from pypaimon import CatalogFactory
-
-# Note that keys and values are all string
-catalog_options = {
- 'metastore': 'rest',
- 'warehouse': 'xxx',
- 'uri': 'xxx',
- 'dlf.region': 'xxx',
- 'token.provider': 'xxx',
- 'dlf.access-key-id': 'xxx',
- 'dlf.access-key-secret': 'xxx'
-}
-catalog = CatalogFactory.create(catalog_options)
-```
-
-### Write And Read
-
-Write and read operations with RESTCatalog is exactly the same as that of FileSystemCatalog.
-
-```python
-import pyarrow as pa
-from pypaimon.api.options import Options
-from pypaimon.catalog.catalog_context import CatalogContext
-from pypaimon.catalog.rest.rest_catalog import RESTCatalog
-from pypaimon.schema.schema import Schema
-
-
-def write_test_table(table):
- write_builder = table.new_batch_write_builder()
-
- # first write
- table_write = write_builder.new_write()
- table_commit = write_builder.new_commit()
- data1 = {
- 'user_id': [1, 2, 3, 4],
- 'item_id': [1001, 1002, 1003, 1004],
- 'behavior': ['a', 'b', 'c', 'd'],
- 'dt': ['12', '34', '56', '78'],
- }
- pa_table = pa.Table.from_pydict(data1, schema=pa_schema)
- table_write.write_arrow(pa_table)
- table_commit.commit(table_write.prepare_commit())
- table_write.close()
- table_commit.close()
-
-
-def read_test_table(read_builder):
- table_read = read_builder.new_read()
- splits = read_builder.new_scan().plan().splits()
- return table_read.to_arrow(splits)
-
-
-options = {
- 'metastore': 'rest',
- 'warehouse': 'xxx',
- 'uri': 'xxx',
- 'dlf.region': 'xxx',
- 'token.provider': 'xxx',
- 'dlf.access-key-id': 'xxx',
- 'dlf.access-key-secret': 'xxx'
-}
-
-rest_catalog = RESTCatalog(CatalogContext.create_from_options(Options(options)))
-print("rest catalog create success")
-pa_schema = pa.schema([
- ('user_id', pa.int32()),
- ('item_id', pa.int64()),
- ('behavior', pa.string()),
- ('dt', pa.string()),
-])
-
-# test parquet append only read
-schema = Schema.from_pyarrow_schema(pa_schema, partition_keys=['dt'])
-rest_catalog.create_table('default.test_t', schema, True)
-table = rest_catalog.get_table('default.test_t')
-write_test_table(table)
-print("write success")
-
-read_builder = table.new_read_builder()
-actual = read_test_table(read_builder)
-print("read data:")
-print(actual)
-```
-
## Data Types
| Python Native Type | PyArrow Type | Paimon Type |