Copilot commented on code in PR #3341:
URL: https://github.com/apache/doris-website/pull/3341#discussion_r2771621682
##########
docs/migration/other-olap-to-doris.md:
##########
@@ -0,0 +1,457 @@
+---
+{
+ "title": "Other OLAP Systems to Doris",
+ "language": "en",
+    "description": "Guide to migrating data from ClickHouse, Greenplum, Hive, Iceberg, Hudi and other OLAP systems to Apache Doris"
+}
+---
+
+This guide covers migrating data from various OLAP systems to Apache Doris, including ClickHouse, Greenplum, and data lake technologies like Hive, Iceberg, and Hudi.
+
+## Migration Methods Overview
+
+| Source System | Recommended Method | Notes |
+|---------------|-------------------|-------|
+| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
Review Comment:
The term "Convertor" should be spelled "Converter" for correct English
spelling. "Converter" is the standard spelling for a tool that converts
something from one form to another.
```suggestion
| ClickHouse | JDBC Catalog + SQL Converter | Schema and SQL syntax conversion needed |
```
##########
docs/migration/elasticsearch-to-doris.md:
##########
@@ -0,0 +1,450 @@
+---
+{
+ "title": "Elasticsearch to Doris",
+ "language": "en",
+    "description": "Comprehensive guide to migrating data from Elasticsearch to Apache Doris"
+}
+---
+
+This guide covers migrating data from Elasticsearch to Apache Doris. Doris can serve as a powerful alternative to Elasticsearch for log analytics, full-text search, and general OLAP workloads, often with better performance and lower operational complexity.
+
+## Why Migrate from Elasticsearch to Doris?
+
+| Aspect | Elasticsearch | Apache Doris |
+|--------|---------------|--------------|
+| Query Language | DSL (JSON-based) | Standard SQL |
+| JOINs | Limited | Full SQL JOINs |
+| Storage Efficiency | Higher storage usage | Columnar compression |
+| Operational Complexity | Complex cluster management | Simpler operations |
+| Full-text Search | Native inverted index | Inverted index support |
+| Real-time Analytics | Good | Excellent |
+
+## Considerations
+
+1. **Full-text Search**: Doris supports [Inverted Index](../table-design/index/inverted-index/overview.md) for full-text search capabilities similar to Elasticsearch; a query sketch follows this list.
+
+2. **Index to Table Mapping**: Each Elasticsearch index typically maps to a Doris table.
+
+3. **Nested Documents**: Elasticsearch nested types map to the Doris [VARIANT](../data-operate/import/complex-types/variant.md) type for flexible schema handling.
+
+4. **Array Handling**: Elasticsearch arrays require explicit configuration in Doris.
+
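+A minimal sketch of the query side, assuming a table like the `logs` example later in this guide with an inverted index on `message`:
+
+```sql
+-- MATCH_ANY hits rows whose indexed column contains any of the terms;
+-- MATCH_ALL requires all of them. Both are served by the inverted index.
+SELECT * FROM logs WHERE message MATCH_ANY 'timeout error';
+SELECT * FROM logs WHERE message MATCH_ALL 'connection refused';
+```
+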
+## Data Type Mapping
+
+| Elasticsearch Type | Doris Type | Notes |
+|--------------------|------------|-------|
+| null | NULL | |
+| boolean | BOOLEAN | |
+| byte | TINYINT | |
+| short | SMALLINT | |
+| integer | INT | |
+| long | BIGINT | |
+| unsigned_long | LARGEINT | |
+| float | FLOAT | |
+| half_float | FLOAT | |
+| double | DOUBLE | |
+| scaled_float | DOUBLE | |
+| keyword | STRING | |
+| text | STRING | Consider inverted index in Doris |
+| date | DATE or DATETIME | See [Date Handling](#handling-date-types) |
+| ip | STRING | |
+| nested | VARIANT | See [VARIANT type](../data-operate/import/complex-types/variant.md) for flexible schema |
+| object | JSON | |
+| flattened | JSON | Supported since Doris 3.1.4, 4.0.3 |
+| geo_point | STRING | Store as "lat,lon" string |
+| geo_shape | STRING | Store as GeoJSON string |
+
+## Migration Options
+
+### Option 1: ES Catalog (Recommended)
+
+The ES Catalog provides direct access to Elasticsearch data from Doris, enabling both querying and migration.
+
+#### Prerequisites
+
+- Elasticsearch 5.x or higher
+- Network connectivity between Doris FE/BE nodes and Elasticsearch
+
+#### Step 1: Create ES Catalog
+
+```sql
+CREATE CATALOG es_catalog PROPERTIES (
+ 'type' = 'es',
+ 'hosts' = 'http://es-node1:9200,http://es-node2:9200',
+ 'user' = 'elastic',
+ 'password' = 'password'
+);
+```
+
+With additional options:
+
+```sql
+CREATE CATALOG es_catalog PROPERTIES (
+ 'type' = 'es',
+ 'hosts' = 'http://es-node1:9200',
+ 'user' = 'elastic',
+ 'password' = 'password',
+    'doc_value_scan' = 'true',   -- read field values from columnar doc_values where available
+    'keyword_sniff' = 'true',    -- probe text fields for keyword sub-fields to allow exact matching
+    'nodes_discovery' = 'true',  -- discover additional ES data nodes automatically
+    'ssl' = 'false',             -- set to 'true' for HTTPS endpoints
+    'mapping_es_id' = 'true'     -- expose the ES _id field as a queryable column
+);
+```
+
+#### Step 2: Explore Elasticsearch Data
+
+```sql
+-- Switch to ES catalog
+SWITCH es_catalog;
+
+-- ES creates a default_db database
+USE default_db;
+
+-- List indices as tables
+SHOW TABLES;
+
+-- Preview data
+SELECT * FROM logs_index LIMIT 10;
+
+-- Check field mappings
+DESC logs_index;
+```
+
+#### Step 3: Design Doris Table
+
+Based on your Elasticsearch index, design an appropriate Doris table:
+
+```sql
+-- Example: Log data table
+SWITCH internal;
+
+CREATE TABLE logs (
+ `@timestamp` DATETIME NOT NULL,
+ log_id VARCHAR(64),
+ level VARCHAR(16),
+ message TEXT,
+ host VARCHAR(128),
+ service VARCHAR(64),
+ trace_id VARCHAR(64),
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode", "support_phrase" = "true"),
+ INDEX idx_level (level) USING INVERTED,
+ INDEX idx_service (service) USING INVERTED
+)
+DUPLICATE KEY(`@timestamp`, log_id)
+PARTITION BY RANGE(`@timestamp`) ()
+DISTRIBUTED BY HASH(log_id) BUCKETS 16
+PROPERTIES (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "DAY",
+ "dynamic_partition.start" = "-30",
+ "dynamic_partition.end" = "3",
+ "dynamic_partition.prefix" = "p",
+ "replication_num" = "3"
+);
+```
+
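+With the table in place, the copy itself is a single INSERT ... SELECT. A minimal sketch, assuming the catalog and index names from the steps above:
+
+```sql
+-- Pull documents from the ES index into the Doris table.
+-- The column list must match the target schema; adjust as needed.
+INSERT INTO logs
+SELECT `@timestamp`, log_id, level, message, host, service, trace_id
+FROM es_catalog.default_db.logs_index;
+```
+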
+`★ Insight ─────────────────────────────────────`
+1. **DUPLICATE KEY model** is best for log data where append-only writes are common
+2. **Inverted indexes** enable full-text search similar to Elasticsearch
+3. **Dynamic partitioning** automatically manages time-based data lifecycle
+`─────────────────────────────────────────────────`
Review Comment:
This "Insight" section uses an unusual formatting style with inline code
blocks (backticks) rather than a proper markdown callout or info box. This
formatting is inconsistent with standard Docusaurus documentation practices and
may not render as intended. Consider using a standard markdown blockquote (>)
or a Docusaurus admonition (:::info, :::tip, etc.) instead for better visual
consistency and rendering.
```suggestion
:::info Insight
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
:::
```
##########
docs/migration/postgresql-to-doris.md:
##########
@@ -0,0 +1,433 @@
+---
+{
+ "title": "PostgreSQL to Doris",
+ "language": "en",
+    "description": "Comprehensive guide to migrating data from PostgreSQL to Apache Doris"
+}
+---
+
+This guide covers migrating data from PostgreSQL to Apache Doris. You can choose from several migration methods depending on your requirements for real-time sync, data volume, and operational complexity.
+
+## Considerations
+
+1. **Schema Design**: Before migration, select an appropriate Doris [Data Model](../table-design/data-model/overview.md) and plan your [Partitioning](../table-design/data-partitioning/data-distribution.md) and [Bucketing](../table-design/data-partitioning/data-bucketing.md) strategies.
+
+2. **Data Types**: Review the type mapping table below. Some PostgreSQL types require special handling (arrays, timestamps with timezone, JSON).
+
+3. **Primary Keys**: PostgreSQL's serial/identity columns map to Doris INT/BIGINT types. For unique constraints, use Doris's UNIQUE KEY model.
+
+## Data Type Mapping
+
+| PostgreSQL Type | Doris Type | Notes |
+|-----------------|------------|-------|
+| boolean | BOOLEAN | |
+| smallint / int2 | SMALLINT | |
+| integer / int4 | INT | |
+| bigint / int8 | BIGINT | |
+| decimal / numeric | DECIMAL(P,S) | Numeric without precision maps to STRING |
+| real / float4 | FLOAT | |
+| double precision | DOUBLE | |
+| smallserial | SMALLINT | |
+| serial | INT | |
+| bigserial | BIGINT | |
+| char(n) | CHAR(N) | |
+| varchar / text | STRING | |
+| timestamp | DATETIME | |
+| timestamptz | DATETIME | Converted to local timezone; see [Timezone Issues](#handling-timezone-issues) |
+| date | DATE | |
+| time | STRING | Doris does not support TIME type |
+| interval | STRING | |
+| json / jsonb | JSON or STRING | Use STRING for better query performance |
+| uuid | STRING | |
+| bytea | STRING | |
+| array | ARRAY | See [Handling Arrays](#handling-arrays) |
+| inet / cidr / macaddr | STRING | |
+| point / line / polygon | STRING | Geometric types stored as strings |
+
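+Because timestamptz values are converted to the session time zone, pin that zone before migrating so results are deterministic. A small sketch (the zone name is illustrative):
+
+```sql
+-- Fix the session time zone so migrated DATETIME values land in a known zone.
+SET time_zone = 'America/New_York';
+SELECT @@time_zone;
+```
+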
+## Migration Options
+
+### Option 1: JDBC Catalog (Recommended)
+
+The JDBC Catalog provides direct access to PostgreSQL data from Doris. This is the simplest approach for both querying and migrating data.
+
+#### Prerequisites
+
+- PostgreSQL 11.x or higher
+- [PostgreSQL JDBC Driver](https://jdbc.postgresql.org/) version 42.5.x or above
+- Network connectivity between Doris FE/BE nodes and PostgreSQL (port 5432)
+
+#### Step 1: Download and Deploy JDBC Driver
+
+```bash
+# Download the driver
+wget https://jdbc.postgresql.org/download/postgresql-42.5.6.jar
+
+# Copy to Doris FE and BE jdbc_drivers directories
+cp postgresql-42.5.6.jar $DORIS_HOME/fe/jdbc_drivers/
+cp postgresql-42.5.6.jar $DORIS_HOME/be/jdbc_drivers/
+```
+
+#### Step 2: Create the PostgreSQL Catalog
+
+```sql
+CREATE CATALOG pg_catalog PROPERTIES (
+ 'type' = 'jdbc',
+ 'user' = 'postgres_user',
+ 'password' = 'postgres_password',
+ 'jdbc_url' = 'jdbc:postgresql://pg-host:5432/database_name',
+ 'driver_url' = 'postgresql-42.5.6.jar',
+ 'driver_class' = 'org.postgresql.Driver'
+);
+```
+
+For SSL connections:
+
+```sql
+CREATE CATALOG pg_catalog PROPERTIES (
+ 'type' = 'jdbc',
+ 'user' = 'postgres_user',
+ 'password' = 'postgres_password',
+    'jdbc_url' = 'jdbc:postgresql://pg-host:5432/database_name?ssl=true&sslmode=require',
+ 'driver_url' = 'postgresql-42.5.6.jar',
+ 'driver_class' = 'org.postgresql.Driver'
+);
+```
+
+#### Step 3: Explore Source Data
+
+```sql
+-- Switch to the catalog
+SWITCH pg_catalog;
+
+-- List available schemas (databases in Doris)
+SHOW DATABASES;
+
+-- Use a schema
+USE public;
+
+-- List tables
+SHOW TABLES;
+
+-- Preview data
+SELECT * FROM source_table LIMIT 10;
+
+-- Check row count
+SELECT COUNT(*) FROM source_table;
+```
+
+#### Step 4: Create Doris Target Table
+
+```sql
+-- Switch back to internal catalog
+SWITCH internal;
+USE target_db;
+
+-- Create table based on source schema
+CREATE TABLE orders (
+ order_id INT,
+ customer_id INT,
+ order_date DATE NOT NULL,
+ total_amount DECIMAL(10, 2),
+ status VARCHAR(32),
+ created_at DATETIME
+)
+UNIQUE KEY(order_id, order_date)
+PARTITION BY RANGE(order_date) ()
+DISTRIBUTED BY HASH(order_id) BUCKETS 16
+PROPERTIES (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "MONTH",
+ "dynamic_partition.start" = "-12",
+ "dynamic_partition.end" = "3",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "16",
+ "replication_num" = "3"
+);
+```
+
+#### Step 5: Migrate Data
+
+For small to medium tables:
+
+```sql
+INSERT INTO internal.target_db.orders
+SELECT
+ order_id,
+ customer_id,
+ order_date,
+ total_amount,
+ status,
+ created_at
+FROM pg_catalog.public.orders;
+```
+
+For large tables, migrate in batches:
+
+```sql
+-- Batch by date range
+INSERT INTO internal.target_db.orders
+SELECT * FROM pg_catalog.public.orders
+WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01';
+
+INSERT INTO internal.target_db.orders
+SELECT * FROM pg_catalog.public.orders
+WHERE order_date >= '2024-04-01' AND order_date < '2024-07-01';
+```
+
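+With many batches, the same pattern can be scripted. A hedged sketch using the MySQL client against the FE query port (the host, credentials, and GNU date are assumptions to adapt):
+
+```bash
+# One INSERT per quarter; each batch is an independent load,
+# so a failed batch can be re-run without touching the others.
+for start in 2024-01-01 2024-04-01 2024-07-01 2024-10-01; do
+  end=$(date -d "$start +3 months" +%F)
+  mysql -h doris-fe -P 9030 -u doris_user -p'doris_password' -e "
+    INSERT INTO internal.target_db.orders
+    SELECT * FROM pg_catalog.public.orders
+    WHERE order_date >= '$start' AND order_date < '$end';"
+done
+```
+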
+#### Step 6: Validate Migration
+
+```sql
+-- Compare row counts
+SELECT 'doris' as source, COUNT(*) as cnt FROM internal.target_db.orders
+UNION ALL
+SELECT 'postgres' as source, COUNT(*) as cnt FROM pg_catalog.public.orders;
+
+-- Spot check specific records
+SELECT * FROM internal.target_db.orders WHERE order_id = 12345;
+SELECT * FROM pg_catalog.public.orders WHERE order_id = 12345;
+```
+
+### Option 2: Flink CDC (Real-time Sync)
+
+Flink CDC captures changes from the PostgreSQL WAL (Write-Ahead Log) and streams them to Doris in real time. This is ideal for continuous synchronization.
+
+#### Prerequisites
+
+- PostgreSQL with logical replication enabled (`wal_level = logical`)
+- Flink 1.15+ with Flink CDC and Flink Doris Connector
+- A replication slot in PostgreSQL
+
+#### Step 1: Configure PostgreSQL
+
+Ensure these settings in `postgresql.conf`:
+
+```properties
+wal_level = logical
+max_replication_slots = 10
+max_wal_senders = 10
+```
+
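+After a restart you can confirm the settings took effect with plain PostgreSQL commands (run via psql; nothing Doris-specific):
+
+```sql
+-- Each of these reflects the running server configuration.
+SHOW wal_level;
+SHOW max_replication_slots;
+SELECT slot_name, active FROM pg_replication_slots;
+```
+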
+Create a replication user and grant permissions:
+
+```sql
+-- Create user with replication privilege
+CREATE USER flink_cdc WITH REPLICATION PASSWORD 'password';
+
+-- Grant access to tables
+GRANT SELECT ON ALL TABLES IN SCHEMA public TO flink_cdc;
+GRANT USAGE ON SCHEMA public TO flink_cdc;
+```
+
+#### Step 2: Create Flink CDC Job
+
+Using Flink SQL:
+
+```sql
+-- Source: PostgreSQL CDC
+CREATE TABLE pg_orders (
+ order_id INT,
+ customer_id INT,
+ order_date DATE,
+ total_amount DECIMAL(10, 2),
+ status STRING,
+ created_at TIMESTAMP(3),
+ PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+ 'connector' = 'postgres-cdc',
+ 'hostname' = 'pg-host',
+ 'port' = '5432',
+ 'username' = 'flink_cdc',
+ 'password' = 'password',
+ 'database-name' = 'source_db',
+ 'schema-name' = 'public',
+ 'table-name' = 'orders',
+ 'slot.name' = 'flink_slot',
+ 'decoding.plugin.name' = 'pgoutput'
+);
+
+-- Sink: Doris
+CREATE TABLE doris_orders (
+ order_id INT,
+ customer_id INT,
+ order_date DATE,
+ total_amount DECIMAL(10, 2),
+ status STRING,
+ created_at DATETIME
+) WITH (
+ 'connector' = 'doris',
+ 'fenodes' = 'doris-fe:8030',
+ 'table.identifier' = 'target_db.orders',
+ 'username' = 'doris_user',
+ 'password' = 'doris_password',
+ 'sink.enable-2pc' = 'true',
+ 'sink.label-prefix' = 'pg_orders_sync'
+);
+
+-- Start sync
+INSERT INTO doris_orders SELECT * FROM pg_orders;
+```
+
+#### Step 3: Full Database Sync
+
+For synchronizing multiple tables or entire schemas:
+
+```sql
+-- Use Flink Doris Connector's database sync feature
+CREATE DATABASE IF NOT EXISTS sync_db;
+
+-- FlinkCDC whole database sync configuration
+-- See Flink Doris Connector documentation for complete setup
Review Comment:
The PostgreSQL full database sync section (Step 3) provides only a
placeholder comment directing users to see the Flink Doris Connector
documentation, while the MySQL migration guide provides a complete working
example with the CdcTools command. For consistency and completeness, consider
adding a similar concrete example here showing the postgres-sync-database
command, similar to the mysql-sync-database example in mysql-to-doris.md (lines
132-149).
````suggestion
```bash
# Use the Flink Doris Connector's CdcTools entrypoint to sync a whole
# PostgreSQL schema into Doris, mirroring the mysql-sync-database example
<FLINK_HOME>/bin/flink run \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-*.jar \
    postgres-sync-database \
    --database target_db \
    --postgres-conf hostname=pg-host \
    --postgres-conf port=5432 \
    --postgres-conf username=flink_cdc \
    --postgres-conf password=password \
    --postgres-conf database-name=source_db \
    --postgres-conf schema-name=public \
    --postgres-conf slot.name=flink_slot \
    --postgres-conf decoding.plugin.name=pgoutput \
    --including-tables "orders|customers" \
    --sink-conf fenodes=doris-fe:8030 \
    --sink-conf username=doris_user \
    --sink-conf password=doris_password \
    --sink-conf jdbc-url=jdbc:mysql://doris-fe:9030 \
    --sink-conf sink.label-prefix=pg_full_sync \
    --table-conf replication_num=3
```
````
##########
docs/migration/elasticsearch-to-doris.md:
##########
@@ -0,0 +1,450 @@
+---
+{
+ "title": "Elasticsearch to Doris",
+ "language": "en",
+    "description": "Comprehensive guide to migrating data from Elasticsearch to Apache Doris"
+}
+---
+
+This guide covers migrating data from Elasticsearch to Apache Doris. Doris can serve as a powerful alternative to Elasticsearch for log analytics, full-text search, and general OLAP workloads, often with better performance and lower operational complexity.
+
+## Why Migrate from Elasticsearch to Doris?
+
+| Aspect | Elasticsearch | Apache Doris |
+|--------|---------------|--------------|
+| Query Language | DSL (JSON-based) | Standard SQL |
+| JOINs | Limited | Full SQL JOINs |
+| Storage Efficiency | Higher storage usage | Columnar compression |
+| Operational Complexity | Complex cluster management | Simpler operations |
+| Full-text Search | Native inverted index | Inverted index support |
+| Real-time Analytics | Good | Excellent |
+
+## Considerations
+
+1. **Full-text Search**: Doris supports [Inverted Index](../table-design/index/inverted-index/overview.md) for full-text search capabilities similar to Elasticsearch; a query sketch follows this list.
+
+2. **Index to Table Mapping**: Each Elasticsearch index typically maps to a Doris table.
+
+3. **Nested Documents**: Elasticsearch nested types map to the Doris [VARIANT](../data-operate/import/complex-types/variant.md) type for flexible schema handling.
+
+4. **Array Handling**: Elasticsearch arrays require explicit configuration in Doris.
+
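+A minimal sketch of the query side, assuming a table like the `logs` example later in this guide with an inverted index on `message`:
+
+```sql
+-- MATCH_ANY hits rows whose indexed column contains any of the terms;
+-- MATCH_ALL requires all of them. Both are served by the inverted index.
+SELECT * FROM logs WHERE message MATCH_ANY 'timeout error';
+SELECT * FROM logs WHERE message MATCH_ALL 'connection refused';
+```
+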
+## Data Type Mapping
+
+| Elasticsearch Type | Doris Type | Notes |
+|--------------------|------------|-------|
+| null | NULL | |
+| boolean | BOOLEAN | |
+| byte | TINYINT | |
+| short | SMALLINT | |
+| integer | INT | |
+| long | BIGINT | |
+| unsigned_long | LARGEINT | |
+| float | FLOAT | |
+| half_float | FLOAT | |
+| double | DOUBLE | |
+| scaled_float | DOUBLE | |
+| keyword | STRING | |
+| text | STRING | Consider inverted index in Doris |
+| date | DATE or DATETIME | See [Date Handling](#handling-date-types) |
+| ip | STRING | |
+| nested | VARIANT | See [VARIANT type](../data-operate/import/complex-types/variant.md) for flexible schema |
+| object | JSON | |
+| flattened | JSON | Supported since Doris 3.1.4, 4.0.3 |
+| geo_point | STRING | Store as "lat,lon" string |
+| geo_shape | STRING | Store as GeoJSON string |
+
+## Migration Options
+
+### Option 1: ES Catalog (Recommended)
+
+The ES Catalog provides direct access to Elasticsearch data from Doris, enabling both querying and migration.
+
+#### Prerequisites
+
+- Elasticsearch 5.x or higher
+- Network connectivity between Doris FE/BE nodes and Elasticsearch
+
+#### Step 1: Create ES Catalog
+
+```sql
+CREATE CATALOG es_catalog PROPERTIES (
+ 'type' = 'es',
+ 'hosts' = 'http://es-node1:9200,http://es-node2:9200',
+ 'user' = 'elastic',
+ 'password' = 'password'
+);
+```
+
+With additional options:
+
+```sql
+CREATE CATALOG es_catalog PROPERTIES (
+ 'type' = 'es',
+ 'hosts' = 'http://es-node1:9200',
+ 'user' = 'elastic',
+ 'password' = 'password',
+    'doc_value_scan' = 'true',   -- read field values from columnar doc_values where available
+    'keyword_sniff' = 'true',    -- probe text fields for keyword sub-fields to allow exact matching
+    'nodes_discovery' = 'true',  -- discover additional ES data nodes automatically
+    'ssl' = 'false',             -- set to 'true' for HTTPS endpoints
+    'mapping_es_id' = 'true'     -- expose the ES _id field as a queryable column
+);
+```
+
+#### Step 2: Explore Elasticsearch Data
+
+```sql
+-- Switch to ES catalog
+SWITCH es_catalog;
+
+-- ES creates a default_db database
+USE default_db;
+
+-- List indices as tables
+SHOW TABLES;
+
+-- Preview data
+SELECT * FROM logs_index LIMIT 10;
+
+-- Check field mappings
+DESC logs_index;
+```
+
+#### Step 3: Design Doris Table
+
+Based on your Elasticsearch index, design an appropriate Doris table:
+
+```sql
+-- Example: Log data table
+SWITCH internal;
+
+CREATE TABLE logs (
+ `@timestamp` DATETIME NOT NULL,
+ log_id VARCHAR(64),
+ level VARCHAR(16),
+ message TEXT,
+ host VARCHAR(128),
+ service VARCHAR(64),
+ trace_id VARCHAR(64),
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode", "support_phrase" = "true"),
+ INDEX idx_level (level) USING INVERTED,
+ INDEX idx_service (service) USING INVERTED
+)
+DUPLICATE KEY(`@timestamp`, log_id)
+PARTITION BY RANGE(`@timestamp`) ()
+DISTRIBUTED BY HASH(log_id) BUCKETS 16
+PROPERTIES (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "DAY",
+ "dynamic_partition.start" = "-30",
+ "dynamic_partition.end" = "3",
+ "dynamic_partition.prefix" = "p",
+ "replication_num" = "3"
+);
+```
+
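+With the table in place, the copy itself is a single INSERT ... SELECT. A minimal sketch, assuming the catalog and index names from the steps above:
+
+```sql
+-- Pull documents from the ES index into the Doris table.
+-- The column list must match the target schema; adjust as needed.
+INSERT INTO logs
+SELECT `@timestamp`, log_id, level, message, host, service, trace_id
+FROM es_catalog.default_db.logs_index;
+```
+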
+`★ Insight ─────────────────────────────────────`
+1. **DUPLICATE KEY model** is best for log data where append-only writes are common
+2. **Inverted indexes** enable full-text search similar to Elasticsearch
+3. **Dynamic partitioning** automatically manages time-based data lifecycle
+`─────────────────────────────────────────────────`
+
Review Comment:
The "Insight" section present in the English version (lines 147-151) is
missing from the Chinese translation. This creates an inconsistency between the
English and Chinese documentation. Either add a corresponding section to the
Chinese version or remove it from the English version for consistency. If
keeping it, consider using standard Docusaurus admonition syntax for better
rendering.
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]