This is an automated email from the ASF dual-hosted git repository.
kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 4b60cf18e64 Update 3.1 EN release (#2909)
4b60cf18e64 is described below
commit 4b60cf18e64118856044c3ffdd105fcbe5439fef
Author: KassieZ <[email protected]>
AuthorDate: Tue Sep 23 19:08:32 2025 +0800
Update 3.1 EN release (#2909)
## Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../trouble-shooting/memory-management/overview.md | 2 +-
docs/releasenotes/all-release.md | 5 +
docs/releasenotes/v3.1/release-3.1.0.md | 673 ++++++++++++++++++++-
.../trouble-shooting/memory-management/overview.md | 2 +-
.../trouble-shooting/memory-management/overview.md | 2 +-
.../trouble-shooting/memory-management/overview.md | 2 +-
src/pages/index.tsx | 4 +-
static/images/memory-used-by-subsystem.png | Bin 0 -> 507390 bytes
static/images/release-3.1/custom-tokenization.png | Bin 0 -> 9804 bytes
.../trouble-shooting/memory-management/overview.md | 2 +-
.../trouble-shooting/memory-management/overview.md | 2 +-
.../version-3.0/releasenotes/all-release.md | 5 +
.../version-3.0/releasenotes/v3.1/release-3.1.0.md | 673 ++++++++++++++++++++-
13 files changed, 1362 insertions(+), 10 deletions(-)
diff --git a/docs/admin-manual/trouble-shooting/memory-management/overview.md
b/docs/admin-manual/trouble-shooting/memory-management/overview.md
index fa9c7148fcd..c9c105db6fc 100644
--- a/docs/admin-manual/trouble-shooting/memory-management/overview.md
+++ b/docs/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE uses Memory Tracker to record process memory usage,
supports Web page v
Real-time memory statistics can be viewed through Doris BE's Web page
`http://{be_host}:{be_web_server_port}/mem_tracker`, which displays the current
memory size and peak memory size tracked by Memory Tracker of `type=overview`,
including Query/Load/Compaction/Global, etc. `be_web_server_port` defaults to
8040.
-
+
Memory Tracker is divided into different types. Among the Memory Tracker of
type=overview, except for `process resident memory`, `process virtual memory`,
and `sum of all trackers`, the details of other Memory Trackers of
type=overview can be viewed through
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Lable`.
diff --git a/docs/releasenotes/all-release.md b/docs/releasenotes/all-release.md
index 12b9b2f6206..2f0ccd959c4 100644
--- a/docs/releasenotes/all-release.md
+++ b/docs/releasenotes/all-release.md
@@ -8,6 +8,9 @@
This document presents a summary of Apache Doris versions released within one
year, listed in reverse chronological order.
:::tip Latest Release
+🎉 Version 3.1.0 is released. Check out the 🔗[Release
Notes](../releasenotes/v3.1/release-3.1.0) here. Doris 3.1 introduces a sparse
column and schema template for the VARIANT data type, making it more efficient
to store and query large datasets with dynamic fields, such as logs and JSON
data. For lakehouse capabilities, it enhances asynchronous materialized views
and expands support for Iceberg and Paimon to build a stronger bridge between
data lakes and data warehouses.
+
+<br />
🎉 Version 3.0.8 released now. Check out the 🔗[Release
Notes](../releasenotes/v3.0/release-3.0.8) here. Starting from version 3.X,
Apache Doris supports a compute-storage decoupled mode in addition to the
compute-storage coupled mode for cluster deployment. With the cloud-native
architecture that decouples the computation and storage layers, users can
achieve physical isolation between query loads across multiple compute
clusters, as well as isolation between read and write loads.
@@ -22,6 +25,8 @@ This document presents a summary of Apache Doris versions
released within one ye
- [2025-09-19, Apache Doris 3.0.8 is
released](../releasenotes/v3.0/release-3.0.8.md)
+- [2025-09-04, Apache Doris 3.1.0 is
released](../releasenotes/v3.1/release-3.1.0.md)
+
- [2025-08-25, Apache Doris 3.0.7 is
released](../releasenotes/v3.0/release-3.0.7.md)
- [2025-08-15, Apache Doris 2.1.11 is
released](../releasenotes/v2.1/release-2.1.11.md)
diff --git a/docs/releasenotes/v3.1/release-3.1.0.md
b/docs/releasenotes/v3.1/release-3.1.0.md
index 6d8e7ae5860..af7cbe5ac9b 100644
--- a/docs/releasenotes/v3.1/release-3.1.0.md
+++ b/docs/releasenotes/v3.1/release-3.1.0.md
@@ -5,4 +5,675 @@
}
---
-Coming Soon
\ No newline at end of file
+We are excited to announce the official release of **Apache Doris 3.1**, a
milestone version that makes semi-structured data and lakehouse analytics more
powerful and practical.
+
+Doris 3.1 introduces **sparse columns and schema template for the VARIANT data
type**, allowing users to efficiently store, index, and query datasets with
tens of thousands of dynamic fields, ideal for logs, events, and JSON-heavy
workloads.
+
+For lakehouse capabilities, Doris 3.1 brings improved **asynchronous
materialized views** to the lakehouse, building a stronger bridge between data
lakes and data warehouses. It also expands support for **Iceberg and Paimon**,
making complex lakehouse workloads easier and more efficient to manage.
+
+More than **90 contributors** submitted over **1,000 improvements and fixes**
during the development of Apache Doris 3.1. We'd like to send a big heartfelt
thank you to everyone who contributed through development, testing, and
feedback.
+
+**[Download Apache Doris 3.1 on
GitHub](https://github.com/apache/doris/releases)**
+
+**[Download from the official website](https://doris.apache.org/download)**
+
+### Key Highlights in Apache Doris 3.1
+
+- **Semi-Structured Data Analytics**
+ - **Sparse columns** for the VARIANT type, supporting tens of thousands of
subcolumns.
+ - **Schema template** for VARIANT, delivering faster queries, more stable
indexing, and controllable costs without losing flexibility.
+ - **Inverted Indexes Storage Format** upgrade from V2 to V3, reducing
storage usage by up to **20%**.
+ - Three new tokenizers: **ICU Tokenizer**, **IK Tokenizer**, and **Basic
Tokenizer**. We also added support for **custom tokenizers**, greatly
improving search recall across diverse scenarios.
+- **Lakehouse Upgrades**
+ - **Enhanced asynchronous materialized views** now extend to data lakes,
strengthening the bridge between data lakes and data warehouses.
+ - Broader support for **Iceberg** and **Paimon**.
+ - **Dynamic partition pruning** and **batch split scheduling** improve
certain query workloads by up to **40%** and reduce FE (frontend) memory
consumption.
+ - Doris 3.1 refactors the **connection properties for external data
sources**, providing a clearer way to integrate with various metadata services
and storage systems, offering more flexible connectivity options.
+- **Storage Engine Improvements**
+ - **New flexible column updates**, building on partial column updates, now
allow different columns to be updated for each row within a single import.
+ - In storage-compute separation scenarios, Doris 3.1 optimizes the lock
mechanism for **MOW tables**, improving the experience of high-concurrency data
ingestion.
+- **Performance Optimizations**
+ - Doris 3.1 enhanced **partition pruning and query planning**, delivering
faster queries and lower resource usage with large partitions (tens of
thousands) and complex filtering expressions.
+ - The optimizer also introduces **data-aware optimization techniques**,
achieving up to **10× performance gains** in specific workloads.
+
+## 1. VARIANT Semi-Structured Data Analytics Gets a Major Upgrade
+
+### Storage Capabilities: Sparse columns and Subcolumns Vertical Compaction
+
+Traditional OLAP systems often struggle with "super-wide tables" containing
thousands to tens of thousands of columns, facing metadata bloat, compaction
amplification, and degraded query performance. **Doris 3.1** addresses this
with **sparse subcolumns** and **subcolumn-level vertical compaction** for the
VARIANT type, pushing the practical column limit to tens of thousands.
+
+**With thorough optimizations in the storage layer, VARIANT brings the
following benefits:**
+
+- **Stable support for subcolumns (thousands to tens of thousands)** in
columnar storage, delivering smoother queries and reducing compaction latency.
+- **Controllable metadata and indexing**, preventing exponential growth.
+- **Subcolumnarization of over 10,000 subcolumns** (columnar storage) is
achievable in real-world tests, with smooth and efficient compaction
performance.
+
+**Key use cases for ultra-wide tables:**
+
+- **Connected Vehicles / IoT Telemetry:** Support various device models and
dynamically changing sensor dimensions.
+- **Marketing Automation / CRM:** Continuously expanding events and user
attributes (e.g., custom events/properties).
+- **Advertising / Event Tracking:** Massive optional properties with sparse
and continuously evolving fields.
+- **Security Auditing / Logs:** Diverse log sources have varying fields that
need to be aggregated and searched by schema.
+- **E-commerce Product Attributes:** Wide category range with highly variable
product attributes.
+
+**How it works:**
+
+- **Frequent JSON keys become real columns**
+
+ We automatically keep the most common JSON keys as true columnar
subcolumns. This keeps queries fast and metadata small. Rare keys stay in a
compact "sparse" area, so the table doesn’t bloat.
+
+- **Smart, low‑memory merges for subcolumns**
+
+ We merge VARIANT subcolumns in small vertical groups, which uses much
less memory and keeps background maintenance smooth even with thousands of
fields. Hot keys are detected and promoted to subcolumns automatically.
+
+- **Faster handling of missing values**
+
+ We batch‑fill default values to cut per‑row overhead during reads and
merges.
+
+- **Lean metadata via LRU caching**
+
+ Column metadata is cached with an LRU policy, reducing memory usage and
speeding up repeated access. In short: you get columnar speed for the fields
that matter, without paying a heavy cost for the long tail, and background
operations stay smooth and predictable.
+
+**How to Enable and Use:**
+
+Sparse subcolumns are controlled by a new column-level VARIANT property:
+
+`variant_max_subcolumns_count`: Default is `0`, which means sparse subcolumn
support is disabled. When set to a specific value, the system extracts the
Top-N most frequent JSON keys for columnar storage, while the remaining keys
are stored sparsely.
+
+```SQL
+-- Enable sparse subcolumns and cap hot subcolumn count
+CREATE TABLE IF NOT EXISTS tbl (
+ k BIGINT,
+ v VARIANT<
+ properties("variant_max_subcolumns_count" = "2048") -- pick top-2048 hot
keys
+ >
+) DUPLICATE KEY(k);
+```
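The Top-N selection described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Doris's implementation: the function name `split_hot_and_sparse`, the sample rows, and the alphabetical tie-breaking rule are all hypothetical, and the special meaning of `0` (feature disabled) is not modeled.

```python
from collections import Counter

def split_hot_and_sparse(rows, max_subcolumns):
    """Pick the max_subcolumns most frequent keys as 'hot'; the rest are sparse."""
    key_counts = Counter(key for row in rows for key in row)
    # Sort by descending frequency, then by name so ties are deterministic.
    ranked = sorted(key_counts, key=lambda k: (-key_counts[k], k))
    hot = set(ranked[:max_subcolumns])
    sparse = set(ranked[max_subcolumns:])
    return hot, sparse

# Hypothetical JSON documents with a long tail of rare keys.
rows = [
    {"user_id": 1, "event": "click", "campaign": "a"},
    {"user_id": 2, "event": "view"},
    {"user_id": 3, "event": "click", "debug_flag": True},
]
hot, sparse = split_hot_and_sparse(rows, max_subcolumns=2)
print(sorted(hot))     # ['event', 'user_id']
print(sorted(sparse))  # ['campaign', 'debug_flag']
```

Keys that appear in every document end up columnar, while the rare ones fall into the sparse set, which mirrors the intent of `variant_max_subcolumns_count`.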
+
+### Schema Templates
+
+Schema templates make constantly changing JSON predictable along critical
paths, so queries run faster, indexes stay stable, and costs remain under
control, without losing flexibility. Benefits include:
+
+- **Type stability**: Lock types for key JSON subpaths in DDL to prevent type
drift that causes query errors, index invalidation, and implicit cast overhead.
+- **Faster, more accurate retrieval:** Tailor inverted index strategies per
subpath (tokenized/non-tokenized, parsers, phrase search) to cut latency and
improve hit rates.
+- **Controlled indexing and costs**: Tune indexes per subpath instead of
indexing the whole column, significantly reducing index count, write
amplification, and storage usage.
+- **Maintainability and collaboration**: Acts as a "data contract" for JSON,
so semantics stay consistent across teams. Types and index states are easier
to observe and debug.
+- **Easier to evolve:** Template and optionally index hot paths, while keeping
long‑tail fields flexible, so you scale without surprises.
+
+**How to enable and use:**
+
+- Explicitly declare common subpaths and types (including wildcards) inside
`VARIANT<...>`, for example: `VARIANT<'a': INT, 'c.d': TEXT, 'array_int_*':
ARRAY<INT>>`.
+- Configure per‑subpath indexes with differentiated strategies
(`field_pattern`, `parsers`, `tokenization`, `phrase` search), and use
wildcards to cover families of fields at once.
+- New column property: `variant_enable_typed_paths_to_sparse`
+ - Default: `false` (typed, predefined paths are not stored sparsely).
+ - Set to `true` to also treat typed paths as sparse candidates, which is
useful when many paths match and you want to avoid column sprawl.
+
+In short: template what matters, index it precisely, and keep everything else
flexible.
+
+**Example 1:** Schema Definition with Multiple Indexes on a Single Column
+
+```SQL
+-- Common properties: field_pattern (target subpath), analyzer, parser,
support_phrase, etc.
+CREATE TABLE IF NOT EXISTS tbl (
+ k BIGINT,
+ v VARIANT<'content' : STRING>, -- specify concrete type for subcolumn
'content'
+ INDEX idx_tokenized(v) USING INVERTED PROPERTIES("parser" = "english",
"field_pattern" = "content"), -- tokenized inverted index for 'content' with
english parser
+ INDEX idx_v(v) USING INVERTED PROPERTIES("field_pattern" = "content") --
non-tokenized inverted index for 'content'
+);
+
+-- v.content will have both a tokenized (english) inverted index and a
non-tokenized inverted index
+
+-- Use tokenized index
+SELECT * FROM tbl WHERE v['content'] MATCH 'Doris';
+
+-- Use non-tokenized index
+SELECT * FROM tbl WHERE v['content'] = 'Doris';
+```
+
+**Example 2:** Batch Processing Columns Matching a Wildcard Pattern
+
+```SQL
+-- Use wildcard-typed subpaths with per-pattern indexes
+CREATE TABLE IF NOT EXISTS tbl2 (
+ k BIGINT,
+ v VARIANT<
+ 'pattern1_*' : STRING, -- batch-typing: all subpaths matching pattern1_*
are STRING
+ 'pattern2_*' : BIGINT, -- batch-typing: all subpaths matching pattern2_*
are BIGINT
+ properties("variant_max_subcolumns_count" = "2048") -- enable sparse
subcolumns; keep top-2048 hot keys
+ >,
+ INDEX idx_p1 (v) USING INVERTED
+ PROPERTIES("field_pattern"="pattern1_*", "parser" = "english"), --
tokenized inverted index for pattern1_* with english parser
+ INDEX idx_p2 (v) USING INVERTED
+ PROPERTIES("field_pattern"="pattern2_*") -- non-tokenized inverted index
for pattern2_*
+) DUPLICATE KEY(k);
+```
+
+**Example 3:** Allowing Predefined Typed Columns to be Stored Sparsely
+
+```SQL
+-- Allow predefined typed paths to participate in sparse extraction
+CREATE TABLE IF NOT EXISTS tbl3 (
+ k BIGINT,
+ v VARIANT<
+ 'message*' : STRING, -- batch-typing: all subpaths matching prefix
'message*' are STRING
+ properties(
+ "variant_max_subcolumns_count" = "2048", -- enable sparse
subcolumns; keep top-2048 hot keys
+ "variant_enable_typed_paths_to_sparse" = "true" -- include typed
(predefined) paths as sparse candidates (default: false)
+ )
+ >
+) DUPLICATE KEY(k);
+```
+
+## 2. Inverted Index Storage Format V3
+
+Compared to V2, Inverted Index V3 further optimizes storage:
+
+- Smaller index files reduce disk usage and I/O overhead.
+- Testing with the httplogs and logsbench datasets shows storage space
savings of roughly 22% to 24% with V3, which suits large-scale text data and
log analysis scenarios.
+
+| Test dataset | Data size before import | inverted_index_storage_format = v2 | inverted_index_storage_format = v3 | Space saving: V3 vs. V2 |
+| :----------- | :---------------------- | :--------------------------------- | :--------------------------------- | :---------------------- |
+| httplogs | 30.89 GB | 4.472 GB | 3.479 GB | 22.2% |
+| logsbench | 1479.31 GB | 182.180 GB | 138.008 GB | 24.2% |
+
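The "Space saving" column follows directly from the two index sizes in the table. As a quick check, the short Python sketch below recomputes it; the helper name `space_saving_percent` is ours, and the figures are copied from the table above.

```python
def space_saving_percent(v2_gb, v3_gb):
    """Relative reduction of the V3 index size versus V2, in percent."""
    return (v2_gb - v3_gb) / v2_gb * 100

# (dataset, V2 index size in GB, V3 index size in GB), taken from the table.
measurements = [
    ("httplogs", 4.472, 3.479),
    ("logsbench", 182.180, 138.008),
]
for name, v2_gb, v3_gb in measurements:
    print(f"{name}: {space_saving_percent(v2_gb, v3_gb):.1f}%")
# httplogs: 22.2%
# logsbench: 24.2%
```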
+**Key Improvements:**
+
+- **Introduced ZSTD dictionary compression** for inverted index term
dictionaries, enabled via `dict_compression` in index properties.
+- **Added compression for positional information** associated with each term
in the inverted index, further reducing storage footprint.
+
+**How to Use:**
+
+```SQL
+-- Enable V3 format when creating the table
+CREATE TABLE example_table (
+ content TEXT,
+ INDEX content_idx (content) USING INVERTED
+ PROPERTIES("parser" = "english", "dict_compression" = "true")
+) ENGINE=OLAP
+PROPERTIES ("inverted_index_storage_format" = "V3");
+```
+
+### Full-Text Search Tokenizers
+
+We added new commonly used tokenizers in Doris 3.1 to better meet diverse
tokenization needs:
+
+1. **ICU Tokenizer:**
+
+- **Implementation:** Based on International Components for Unicode (ICU).
+- **Use Cases:** Ideal for internationalized text with complex writing systems
and multilingual documents.
+- **Example:**
+
+```SQL
+SELECT TOKENIZE('مرحبا بالعالم Hello 世界', '"parser"="icu"');
+-- Results: ["مرحبا", "بالعالم", "Hello", "世界"]
+
+SELECT TOKENIZE('มนไมเปนไปตามความตองการ', '"parser"="icu"');
+-- Results: ["มน", "ไมเปน", "ไป", "ตาม", "ความ", "ตองการ"]
+```
+
+2. **Basic Tokenizer**
+
+- **Implementation**: A simple rule-based custom tokenizer, using basic
character type recognition.
+- **Use cases**: Simple scenarios or cases with very high performance
requirements.
+- **Rules**:
+ - Continuous alphanumeric characters are treated as one token.
+ - Each Chinese character is a separate token.
+ - Punctuation, spaces, and special characters are ignored.
+- **Examples**:
+
+```SQL
+-- English text tokenization
+SELECT TOKENIZE('Hello World! This is a test.', '"parser"="basic"');
+-- Results: ["hello", "world", "this", "is", "a", "test"]
+
+-- Chinese text tokenization
+SELECT TOKENIZE('你好世界', '"parser"="basic"');
+-- Results: ["你", "好", "世", "界"]
+
+-- Mixed-language tokenization
+SELECT TOKENIZE('Hello你好World世界', '"parser"="basic"');
+-- Results: ["hello", "你", "好", "world", "世", "界"]
+
+-- Handling numbers and special characters
+SELECT TOKENIZE('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"');
+-- Results: ["get", "images", "hm", "bg", "jpg", "http", "1", "0"]
+
+-- Processing long numeric sequences
+SELECT TOKENIZE('12345678901234567890', '"parser"="basic"');
+-- Results: ["12345678901234567890"]
+```
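The three rules listed above are simple enough to mimic outside Doris. The Python sketch below is an approximation for illustration only: it assumes that "alphanumeric" means ASCII letters and digits, that "Chinese character" means the CJK Unified Ideographs block, and that tokens are lowercased (as the sample results suggest). The real tokenizer may differ on other scripts.

```python
import re

# ASCII letters/digits form one token; each CJK ideograph is its own token.
# Everything else (punctuation, spaces, symbols) is skipped.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")

def basic_tokenize(text):
    """Approximate the documented Basic Tokenizer rules."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]

print(basic_tokenize("Hello你好World世界"))
# ['hello', '你', '好', 'world', '世', '界']
print(basic_tokenize("GET /images/hm_bg.jpg HTTP/1.0"))
# ['get', 'images', 'hm', 'bg', 'jpg', 'http', '1', '0']
```

Under these assumptions the sketch reproduces the results shown in the SQL examples above.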
+
+**Custom Tokenization**
+
+A new **custom tokenization** feature lets users design tokenization logic
according to their needs, improving recall for text search.
+
+By combining character filters, tokenizers, and token filters, users can go
beyond the limitations of built-in analyzers and define exactly how text should
be split into searchable terms, directly influencing both search relevance and
analytical accuracy.
+
+
+
+**Example Use Case**
+
+- **Problem**: With the default Unicode tokenizer, a phone number like
"13891972631" is treated as a single token, making prefix search (e.g., "138")
impossible.
+- **Solution**:
+ - Create a tokenizer using the **Edge N-gram** custom tokenizer:
+ - ```SQL
+ CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS edge_ngram_phone_tokenizer
+ PROPERTIES
+ (
+ "type" = "edge_ngram",
+ "min_gram" = "3",
+ "max_gram" = "10",
+ "token_chars" = "digit"
+ );
+ ```
+
+ - Create an analyzer:
+ - ```SQL
+ CREATE INVERTED INDEX ANALYZER IF NOT EXISTS phone_prefix_analyzer
+ PROPERTIES
+ (
+ "tokenizer" = "edge_ngram_phone_tokenizer"
+ );
+ ```
+
+ - Specify an analyzer when creating the table
+
+ ```SQL
+ CREATE TABLE customer_contacts (
+ id bigint NOT NULL AUTO_INCREMENT(1),
+ phone text NULL,
+ INDEX idx_phone (phone) USING INVERTED PROPERTIES(
+ "analyzer" = "phone_prefix_analyzer"
+ )
+ ) ENGINE=OLAP
+ DUPLICATE KEY(id)
+ DISTRIBUTED BY RANDOM BUCKETS 1
+ PROPERTIES ("replication_allocation" = "tag.location.default: 1");
+ ```
+
+ - Check tokenization results:
+
+ ```SQL
+ SELECT tokenize('13891972631', '"analyzer"="phone_prefix_analyzer"');
+ +---------------------------------------------------------------+
+ | tokenize('13891972631', '"analyzer"="phone_prefix_analyzer"') |
+ +---------------------------------------------------------------+
+ | [{
+ "token": "138"
+ }, {
+ "token": "1389"
+ }, {
+ "token": "13891"
+ }, {
+ "token": "138919"
+ }, {
+ "token": "1389197"
+ }, {
+ "token": "13891972"
+ }, {
+ "token": "138919726"
+ }, {
+ "token": "1389197263"
+ }] |
+ +---------------------------------------------------------------+
+ ```
+
+ - Text search results
+
+ ```SQL
+ SELECT * FROM customer_contacts WHERE phone MATCH '138';
+ +------+-------------+
+ | id | phone |
+ +------+-------------+
+ | 1 | 13891972631 |
+ | 2 | 13812345678 |
+ +------+-------------+
+ SELECT * FROM customer_contacts WHERE phone MATCH '1389';
+ +------+-------------+
+ | id | phone |
+ +------+-------------+
+ | 1 | 13891972631 |
+ | 2 | 13812345678 |
+ +------+-------------+
+ 2 rows in set (0.043 sec)
+ ```
+
+ - Using the Edge N-gram tokenizer, a phone number is broken into multiple
prefix tokens, enabling flexible prefix-matching search.
+
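To see why the Edge N-gram settings above yield exactly those eight tokens, here is a minimal Python sketch of prefix generation. It is a simplified model, not the Doris tokenizer: the function name `edge_ngrams` is ours, the input is assumed to consist of digits only (so `token_chars` is not modeled), and MATCH query semantics are out of scope.

```python
def edge_ngrams(text, min_gram=3, max_gram=10):
    """Return the prefixes of text with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]

tokens = edge_ngrams("13891972631")
print(tokens)
# ['138', '1389', '13891', '138919', '1389197', '13891972', '138919726', '1389197263']

# The prefix "138" is now an indexed term, so a prefix lookup can hit it.
print("138" in tokens)  # True
```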
+## 3. Lakehouse Capability Upgrades
+
+### Asynchronous Materialized Views Now Fully Support Data Lakes
+
+In version 3.1, asynchronous materialized views have been significantly
enhanced, now providing comprehensive support for partition-level incremental
maintenance and partition-level transparent query rewriting. This functionality
is available for data lake formats like Iceberg, Paimon, and Hudi.
+
+Doris introduced asynchronous materialized views in version 2.1 and has kept
improving them since. Current capabilities include:
+
+| Feature | Description |
+| :-------------------- | :----------------------------------------------------------- |
+| Refresh | Full and partition-level incremental maintenance for internal tables and Hive tables |
+| Trigger Methods | Scheduled and manual refresh for all table types; refresh on commit for internal tables only |
+| Partitioning | Same partitioning as base tables, with features like partition rolling, multi-level partitions, and retaining certain "hot" partitions of the base table |
+| Transparent Rewriting | Supports conditional compensation, join rewriting and derivation, aggregation rewriting, and partition compensation for internal tables |
+| Maintenance Tools | Provides `tasks()`, `jobs()`, and `mv_infos()` to query materialized view runtime information |
+
+Version 3.1 focuses on strengthening our lakehouse capabilities by providing
comprehensive support for partition-level incremental maintenance and
partition-level compensation for external tables during query rewriting. This
enhancement targets mainstream data lake table formats like Iceberg, Paimon,
and Hudi, effectively creating a high-speed data bridge between the data lake
and the data warehouse. The detailed scope of support is outlined in the table
below.
+
+**See specific support details below:**
+
+| External Source | Partition Refresh | Transparent Partition Rewriting |
+| :-------------- | :----------------------------------------------------------- | :------------------------------ |
+| Hive | Supported | Supported |
+| Iceberg | Supported | Supported |
+| Paimon | Supported | Supported |
+| Hudi | Cannot detect partition sync; suitable for manually refreshing specific partitions | Supported |
+
+### Better Support for Iceberg and Paimon
+
+#### 1. Iceberg
+
+Doris 3.1 introduces a wide range of optimizations and enhancements for
Iceberg table format, offering compatibility with the latest Iceberg
capabilities.
+
+**Branch / Tag Lifecycle Management**
+
+Starting in 3.1, Doris natively supports creating, deleting, reading, and
writing Iceberg Branches and Tags. Much like Git, this allows users to manage
Iceberg table data versions seamlessly. With this capability, users can perform
parallel version management, canary releases, and environment isolation,
without relying on external engines or custom logic.
+
+```SQL
+-- Create branches
+ALTER TABLE iceberg_tbl CREATE BRANCH b1;
+-- Write data to a specific branch
+INSERT INTO iceberg_tbl@branch(b1) values(1, 2);
+-- Query data from a specific branch
+SELECT * FROM iceberg_tbl@branch(b1);
+```
+
+**Comprehensive System Tables Support**
+
+Doris 3.1 adds support for Iceberg system tables such as `$entries`,
`$files`, `$history`, `$manifests`, `$refs`, and `$snapshots`. Users can query
them directly with statements like `SELECT * FROM iceberg_table$history` or
`…$refs` to inspect metadata, snapshot histories, branch/tag mappings, and file
organization. This greatly improves metadata observability, making issue
diagnosis, performance tuning, and governance more transparent.
+
+**Example:** counting data files and delete files by content type via system
tables.
+
+```SQL
+SELECT
+ CASE
+ WHEN content = 0 THEN 'DataFile'
+ WHEN content = 1 THEN 'PositionDeleteFile'
+ WHEN content = 2 THEN 'EqualityDeleteFile'
+ ELSE 'Unknown'
+ END AS ContentType,
+ COUNT(*) AS FileNum,
+ SUM(file_size_in_bytes) AS SizeInBytes,
+ SUM(record_count) AS Records
+FROM
+ iceberg_table$files
+GROUP BY
+ ContentType;
+
++--------------------+---------+-------------+---------+
+| ContentType | FileNum | SizeInBytes | Records |
++--------------------+---------+-------------+---------+
+| EqualityDeleteFile | 2787 | 1432518 | 27870 |
+| DataFile | 2787 | 4062416 | 38760 |
+| PositionDeleteFile | 11 | 36608 | 10890 |
++--------------------+---------+-------------+---------+
+```
+
+**Support for Querying Iceberg Views**
+
+Doris 3.1 added support for accessing and querying Iceberg logical views,
further enhancing Doris's Iceberg capabilities. In future 3.x releases, we plan
to add SQL dialect translation support for Iceberg Views.
+
+**Schema Evolution with ALTER Statements**
+
+Starting in 3.1, Doris supports adding, deleting, renaming, and reordering
columns in Iceberg tables through the ALTER TABLE statement. This further
empowers Doris to manage Iceberg tables without needing third-party engines
like Spark.
+
+```SQL
+ALTER TABLE iceberg_table
+ADD COLUMN new_col int;
+```
+
+Additionally, in version 3.1, the Iceberg dependency has been upgraded to
version 1.9.2 to better support new Iceberg features. Future 3.1.x releases
will further improve Iceberg table management, including data compaction and
branch evolution.
+
+Documentation: https://doris.apache.org/docs/lakehouse/catalogs/iceberg-catalog
+
+#### 2. Paimon
+
+Doris 3.1 brings several updates and enhancements to the Paimon table format
based on real use cases.
+
+**Support for Paimon Batch Incremental Queries**
+
+Doris 3.1 allows reading incremental data between two specified snapshots in a
Paimon table. This allows users to better access Paimon incremental data,
especially enabling incremental aggregation materialized views on Paimon tables.
+
+```SQL
+SELECT * FROM paimon_tbl@incr('startSnapshotId'='2', 'endSnapshotId'='5');
+```
+
+**Branch and Tag Reading**
+
+From 3.1 onward, Doris supports reading Paimon table data from branches
and tags, offering more flexible multi-version data access.
+
+```SQL
+SELECT * FROM paimon_tbl@branch(branch1);
+SELECT * FROM paimon_tbl@tag(tag1);
+```
+
+**Comprehensive System Tables Support**
+
+Similar to Iceberg, 3.1 adds support for Paimon system tables like `$files`,
`$partitions`, `$manifests`, `$tags`, `$snapshots`. Users can query underlying
metadata directly with statements like `SELECT * FROM partition_table$files`,
making Paimon tables easier to explore, debug, and optimize.
+
+For example, we can use system tables to count new data files added per
partition.
+
+```SQL
+SELECT
+ partition,
+ COUNT(*) AS new_file_count,
+ SUM(file_size_in_bytes)/1024/1024 AS new_total_size_mb
+FROM my_table$files
+WHERE creation_time >= DATE_SUB(NOW(), INTERVAL 3 DAY)
+GROUP BY partition
+ORDER BY new_total_size_mb DESC;
+```
+
+In 3.1, Paimon's dependency is upgraded to version 1.1.1 to better support new
features.
+
+Documentation: https://doris.apache.org/docs/lakehouse/catalogs/paimon-catalog
+
+### Enhanced Data Lake Query Performance
+
+Doris 3.1 adds thorough optimizations for data lake table formats, aiming to
provide more stable and efficient data lake analytics in real-world production
environments.
+
+1. **Dynamic Partition Pruning**
+
+In multi-table join queries, this feature generates partition predicates from
the right-hand table at runtime and applies them to the left-hand table.
Pruning unnecessary partitions on the fly reduces data I/O and improves query
performance. Doris 3.0 introduced dynamic partition pruning for Hive tables.
Now with 3.1, we have expanded this capability to Iceberg, Paimon, and Hudi
tables. In test scenarios, queries with high selectivity showed a **30%–40%
performance improvement**.
+
+2. **Batch Split Scheduling**
+
+When a lakehouse table contains a large number of splits, the Frontend (FE)
traditionally collects all split metadata at once and sends it to the Backend
(BE). This can cause high FE memory consumption and long planning times,
particularly for queries on large datasets.
+
+Batch split scheduling solves this by generating split metadata in batches and
executing each batch as it is produced. This reduces FE memory pressure and
lets planning and execution run in parallel, improving overall efficiency.
Doris 3.0 added support for this feature on Hive tables, and 3.1 extends it
to Iceberg tables. In large-scale test scenarios, it significantly reduced FE
memory usage and query planning time.
+
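The batching idea can be sketched in a few lines of Python. This is a conceptual illustration only, not FE or BE code: `generate_split_batches`, `run_query`, the file names, and the batch size are all hypothetical, and the "dispatch" step is reduced to collecting the batch.

```python
def generate_split_batches(splits, batch_size):
    """Yield splits in fixed-size batches instead of materializing them all."""
    batch = []
    for split in splits:
        batch.append(split)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch

def run_query(splits, batch_size):
    """Consume batches as they are produced and record what was dispatched."""
    processed = []
    for batch in generate_split_batches(splits, batch_size):
        # In a real system the batch would be sent to a backend here,
        # while the next batch is still being planned.
        processed.append(list(batch))
    return processed

# A lazy generator stands in for split enumeration over a large table.
all_splits = (f"file-{i}.parquet" for i in range(7))
print(run_query(all_splits, batch_size=3))
# [['file-0.parquet', 'file-1.parquet', 'file-2.parquet'], ['file-3.parquet', 'file-4.parquet', 'file-5.parquet'], ['file-6.parquet']]
```

Because the planner only ever holds one batch in memory, peak memory depends on the batch size rather than on the total number of splits.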
+### Federated Analytics: More Flexible, More Powerful Connectors
+
+In 3.1, Doris restructured its connector properties for external data sources.
This makes it easier to integrate with different metadata services and storage
systems, while also expanding the range of supported capabilities.
+
+#### 1. More Flexible Data Lake REST Catalog Support
+
+- **Iceberg REST Catalog**
+
+ We have improved support for Iceberg REST Catalog. Doris 3.1 now works
with multiple backend implementations, including Unity, Polaris, Gravitino, and
Glue. We also added support for vended credentials, enabling more secure and
flexible credential management. AWS is supported, with GCP and Azure support
planned in upcoming releases.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/metastores/iceberg-rest)
+
+- **Paimon REST Catalog**
+
+ Doris 3.1 introduces support for Paimon REST Catalog via Alibaba Cloud
DLF, allowing direct access to Paimon tables managed by the latest DLF versions.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/best-practices/doris-dlf-paimon)
+
+#### 2. More Powerful Hadoop Ecosystem Support
+
+- **Multi-Kerberos Environment Support**
+
+ Doris 3.1 allows access to multiple Kerberos authentication environments
within the same cluster. Different environments can use separate KDC services,
Principals, and Keytabs, and each Catalog can now be configured with its own
Kerberos settings independently. This makes it much easier for users with
multiple Kerberos environments to manage them all through Doris with unified
access control.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/storages/hdfs#kerberos-authentication)
+
+- **Multi-Hadoop Environment Support**
+
+ Previously, Doris only allowed a single Hadoop configuration (e.g.,
hive-site.xml, hdfs-site.xml) to be placed under the conf directory, so
multiple Hadoop environments and configurations were not supported. With Doris
3.1, users can now assign different Hadoop configuration files to different
Catalogs, making it easier to manage external data sources flexibly.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/storages/hdfs#kerberos-authentication)
+
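+As an illustration of per-Catalog Kerberos settings, the sketch below defines two Hive catalogs that authenticate against different realms. The catalog names, hosts, principals, and keytab paths are all placeholders, and only a subset of the properties a real deployment needs is shown; see the linked documentation for the full list.
+
+```SQL
+-- Catalog bound to the first Kerberos environment (placeholder values)
+CREATE CATALOG hive_krb_a PROPERTIES (
+    "type" = "hms",
+    "hive.metastore.uris" = "thrift://hms-a:9083",
+    "hadoop.security.authentication" = "kerberos",
+    "hadoop.kerberos.principal" = "doris@REALM-A.COM",
+    "hadoop.kerberos.keytab" = "/path/to/realm_a.keytab"
+);
+
+-- Catalog bound to a second, independent Kerberos environment
+CREATE CATALOG hive_krb_b PROPERTIES (
+    "type" = "hms",
+    "hive.metastore.uris" = "thrift://hms-b:9083",
+    "hadoop.security.authentication" = "kerberos",
+    "hadoop.kerberos.principal" = "doris@REALM-B.COM",
+    "hadoop.kerberos.keytab" = "/path/to/realm_b.keytab"
+);
+```
+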
+## 4. Storage Engine Improvements
+
+In Doris 3.1, we've continued to refine the storage layer for better
performance and stability.
+
+### Flexible Column Updates: A New Data Update Experience
+
+Previously, the **Partial Column Update** feature in Doris required each row
in a single import to update the same set of columns. However, in many
scenarios, source systems only provide records containing the primary key and
the specific columns being updated, with different rows updating different
columns. To address this, Doris introduced the **Flexible Column Update**
feature, which greatly reduces the work of preparing per-column data and
improves write performance.
+
+**How to Use**
+
+- Enable when creating a Merge-on-Write Unique table:
+ - `"enable_unique_key_skip_bitmap_column" = "true"`
+- Specify the import mode:
+ - `unique_key_update_mode: UPDATE_FLEXIBLE_COLUMNS`
+- Doris will automatically handle flexible column updates and fill in missing
data.
+
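+A minimal sketch of the table-side setup, assuming a hypothetical table `t_flex` with one key column and three value columns. The replica and bucket counts are illustrative only:
+
+```SQL
+-- Merge-on-Write Unique table with the skip-bitmap column enabled
+CREATE TABLE IF NOT EXISTS t_flex (
+    k  INT NOT NULL,
+    v1 BIGINT NULL,
+    v2 BIGINT NULL DEFAULT "9876",
+    v3 BIGINT NOT NULL DEFAULT "0"
+) UNIQUE KEY(k)
+DISTRIBUTED BY HASH(k) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1",
+    "enable_unique_key_merge_on_write" = "true",
+    "enable_unique_key_skip_bitmap_column" = "true"
+);
+```
+
+The import job then sets `unique_key_update_mode` to `UPDATE_FLEXIBLE_COLUMNS`, as listed above.
+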
+**Example**
+
+Flexible column updates allow different rows to update different columns in
the same import:
+
+- Delete a row (`__DORIS_DELETE_SIGN__`)
+- Update specific columns (e.g., `v1`, `v2`, `v5`)
+- Insert new rows with only the primary key and updated columns; other columns
use default values or historical data.
+
+**Performance**
+
+- Test environment: 1 FE + 4 BE, 16C 64GB, 300M rows × 101 columns, 3 replicas
+ - Import 20,000 rows updating just 1 column (99 columns filled automatically)
+ - Single concurrency import performance: 10.4k rows/sec
+ - Resource usage per node: CPU ~60%, memory ~30GB, read IOPS ~7.5k/s, write
IOPS ~5k/s
+
+### Storage-Compute Separation: MOW Lock Optimization
+
+In storage-compute separation scenarios, updating the Delete Bitmap in MOW
tables requires acquiring a distributed lock, `delete_bitmap_update_lock`.
Previously, import, compaction, and schema change operations competed for this
lock, causing long waits or failures under high-concurrency imports.
+
+We have added several optimizations to make large-scale, concurrent data
ingestion more stable and efficient. **Reducing compaction lock time** cut p99
commit latency from 1.68 minutes to 49.4 seconds in high-concurrency
multi-tablet import tests. We also **reduced long-tail import latency** by
allowing transactions to force-lock after a wait threshold is exceeded.
+
+## 5. Query Performance Boosts
+
+### Enhanced Partition Pruning Performance and Coverage
+
+Doris organizes data into partitions that can be stored, queried, and managed
independently. Partitioning improves query performance, optimizes data
management, and reduces resource consumption. During queries, applying filters
to skip irrelevant partitions (known as partition pruning) can significantly
boost performance while lowering system resource usage.
+
+In use cases like log analytics or risk control systems, a single table can
have tens of thousands of partitions, or even hundreds of thousands, but most
queries only touch a few hundred. Efficient partition pruning is therefore
critical for performance.
+
+In 3.1, Doris introduces several optimizations that significantly enhance both
the performance and applicability of partition pruning:
+
+- **Binary search for partition pruning:** For partitions based on a time
column, the partition pruning process has been optimized by sorting partitions
according to the column values. This changes the pruning calculation from a
linear scan to a binary search. In a scenario with 136,000 partitions using a
DATETIME partition field, this optimization reduced pruning time from 724
milliseconds to 43 milliseconds, achieving over a 16-fold speedup.
+- **Add support for a large number of monotonic functions in partition
pruning:** In real-world use cases, filter conditions on time-partitioned
columns are often not simple logical comparisons but complex expressions
involving time-related functions on the partition columns. Examples include
expressions like `to_date(time_stamp) >
'2022-12-22'` and `date_format(timestamp,'%Y-%m-%d %H:%i:%s') > '2022-12-22
11:00:00'`. In Doris 3.1, when a function is monotonic, Doris can determine
whether an [...]
+- **Full-path code optimizations:** In Doris 3.1, the entire partition pruning
code path has also been optimized to eliminate unnecessary overhead.
+
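+To make the monotonic-function case concrete, here is a small sketch with a hypothetical table `access_log` partitioned by day on a DATETIME column. The table, column, and partition names are made up for illustration, and the filter mirrors the `to_date(...)` expression quoted above.
+
+```SQL
+CREATE TABLE IF NOT EXISTS access_log (
+    time_stamp DATETIME NOT NULL,
+    msg        STRING
+) DUPLICATE KEY(time_stamp)
+PARTITION BY RANGE(time_stamp) (
+    PARTITION p20221221 VALUES [("2022-12-21 00:00:00"), ("2022-12-22 00:00:00")),
+    PARTITION p20221222 VALUES [("2022-12-22 00:00:00"), ("2022-12-23 00:00:00")),
+    PARTITION p20221223 VALUES [("2022-12-23 00:00:00"), ("2022-12-24 00:00:00"))
+)
+DISTRIBUTED BY HASH(time_stamp) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- The predicate wraps the partition column in a monotonic function.
+-- Inspect the plan to see which partitions the scan node selects.
+EXPLAIN
+SELECT count(*) FROM access_log
+WHERE to_date(time_stamp) > '2022-12-22';
+```
+
+Only `p20221223` can contain rows whose date is later than 2022-12-22, so that is the single partition one would expect the scan to read once pruning applies.
+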
+### Data Traits: Up to 10× Performance Boost
+
+In Doris 3.1, the optimizer can make smarter use of data traits to enhance
query performance. It analyzes each node in the query plan to collect data
traits such as UNIQUE, UNIFORM, and EQUAL SET, and infers functional
dependencies between columns. When data at a node meets certain traits, Doris
can eliminate unnecessary joins, aggregations, or sorts, significantly boosting
performance.
+
+In test cases designed to exercise these optimizations, data traits delivered
more than **10× performance improvement**. See the table below:
+
+| **Optimization** | **Optimized** | **Not Optimized** | **Performance Boost** |
+| :--------------------------------------------------- | :------------ | :---------------- | :-------------------- |
+| Eliminate joins based on unique join keys | 50 ms | 100 ms | 100% |
+| Remove redundant aggregation keys using uniqueness | 80 ms | 960 ms | 1100% |
+| Remove aggregation keys with functional dependencies | 1410 ms | 2110 ms | 50% |
+| Remove aggregation keys on uniform columns | 110 ms | 150 ms | 36% |
+| Eliminate unnecessary sorting | 130 ms | 370 ms | 185% |
+
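+The "redundant aggregation keys" case can be sketched as follows, using a hypothetical Unique Key table `orders` whose key is `order_id`. Because `order_id` is unique, it already determines `customer_id`, so the extra grouping key is redundant. The second query is shown only to illustrate the idea and is not the optimizer's literal output.
+
+```SQL
+-- Hypothetical schema: order_id is the unique key of `orders`
+SELECT order_id, customer_id, sum(amount)
+FROM orders
+GROUP BY order_id, customer_id;
+
+-- Conceptually equivalent once the uniqueness of order_id is known
+SELECT order_id, any_value(customer_id), sum(amount)
+FROM orders
+GROUP BY order_id;
+```
+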
+## 6. Feature Improvements
+
+### Semi-structured Data
+
+**VARIANT**
+
+- Added `variant_type(x)` function: returns the current actual type of a
Variant subfield.
+- Added ComputeSignature/Helper to improve parameter and return type inference
for functions.
+
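+A short usage sketch for `variant_type`, assuming a hypothetical table `tbl` with a VARIANT column `v` that contains a key `a`:
+
+```SQL
+-- Returns the current actual type of the subfield v['a']
+SELECT variant_type(v['a']) FROM tbl;
+```
+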
+**STRUCT**
+
+- Schema Change now supports adding subcolumns to STRUCT types.
+
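+As a sketch, assuming a hypothetical table `t_struct` whose column `s` was created as `STRUCT<a:INT>`, a new subcolumn could be appended by restating the full type. The exact statement form is an assumption on our part; consult the schema change documentation for the supported syntax and restrictions.
+
+```SQL
+-- Append subcolumn `b` to the existing STRUCT column (hypothetical names)
+ALTER TABLE t_struct MODIFY COLUMN s STRUCT<a:INT, b:STRING>;
+```
+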
+### Lakehouse
+
+- Doris now supports setting metadata cache policies at the Catalog level (e.g.,
cache expiration time), giving users more flexibility to balance data freshness
and metadata access performance.
+
+Refer to [documentation](https://doris.apache.org/docs/lakehouse/meta-cache)
+
+- Added support for the `FILE()` Table Valued Function, which unifies the
existing `S3()`, `HDFS()`, and `LOCAL()` functions into a single interface for
easier use and understanding.
+
+Refer to
[documentation](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-valued-functions/file)
+
+### Enhanced Aggregate Operator Capabilities
+
+In Doris 3.1, the optimizer has focused on improving the aggregate operators,
adding support for two widely used features.
+
+**Non-Standard GROUP BY Support**
+
+The SQL standard does not permit queries in which the select list, `HAVING`
condition, or `ORDER BY` list refers to nonaggregated columns that are not named
in the `GROUP BY` clause. However, in MySQL, if the SQL_MODE does not include
`ONLY_FULL_GROUP_BY`, this restriction does not apply. See the MySQL
documentation for more details:
https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html#sqlmode_only_full_group_by
+
+The effect of disabling `ONLY_FULL_GROUP_BY` is equivalent to using
`ANY_VALUE` on non-aggregated columns. For example:
+
+```SQL
+-- Non-standard GROUP BY
+SELECT c1, c2 FROM t GROUP BY c1
+-- Equal to
+SELECT c1, any_value(c2) FROM t GROUP BY c1
+```
+
+In 3.1, Doris enables `ONLY_FULL_GROUP_BY` by default, consistent with previous
behavior. To use non-standard GROUP BY, remove it from `sql_mode` with the
following statement:
+
+```SQL
+set sql_mode = replace(@@sql_mode, 'ONLY_FULL_GROUP_BY', '');
+```
+
+**Support for Multiple DISTINCT Aggregates**
+
+In previous versions, Doris could not execute an aggregation query that
contained multiple DISTINCT aggregate functions with different parameters when
their DISTINCT semantics differed from their non-DISTINCT semantics and they
were not one of the following:
+
+- single-parameter COUNT
+- SUM
+- AVG
+- GROUP_CONCAT
+
+Doris 3.1 significantly enhances this area. Now, queries involving multiple
distinct aggregates can be executed correctly and return results as expected,
for example:
+
+```SQL
+SELECT count(DISTINCT c1,c2), count(DISTINCT c2,c3), count(DISTINCT c3) FROM t;
+```
+
+## 7. Behavioral Changes
+
+**VARIANT**
+
+- `variant_max_subcolumns_count` constraint
+ - In a single table, all Variant columns must have the same
`variant_max_subcolumns_count` value: either all 0 or all greater than 0. Mixing
values will cause errors during table creation or schema changes.
+- New Variant read/write/serde and compaction paths are backward compatible
with old data. Upgrading from older Variant versions may produce slight
differences in query output (e.g., extra spaces or additional levels created by
`.` separators).
+- Creating an inverted index on Variant data type will generate empty index
files if none of the data fields meet the index conditions; this is expected
behavior.
+
+**Permissions**
+
+- The permission required for SHOW TRANSACTION has changed: it now requires
LOAD_PRIV on the target database instead of ADMIN_PRIV.
+- SHOW FRONTENDS / BACKENDS and the NODE RESTful API now use the same
permission. These interfaces now require SELECT_PRIV in the information_schema
database.
+
+**Get Started with Apache Doris 3.1**
+
+Even before the official release of version 3.1, several features in
semi-structured data and data lakes were validated in real-world
production scenarios and showed the expected performance improvements. We
encourage users with relevant needs to try the new version:
+
+**[Download Apache Doris 3.1 on
GitHub](https://github.com/apache/doris/releases)**
+
+**[Download from the official website](https://doris.apache.org/download)**
+
+**Acknowledgements**
+
+We sincerely thank everyone who contributed to this release through
development, testing, and feedback:
+
+@924060929 @airborne12 @amorynan @BePPPower @BiteTheDDDDt @bobhan1 @CalvinKirs
@cambyzju @cjj2010 @csun5285 @DarvenDuan @dataroaring @deardeng @dtkavin
@dwdwqfwe @eldenmoon @englefly @feifeifeimoon @feiniaofeiafei @felixwluo
@freemandealer @Gabriel39 @gavinchou @ghkang98 @gnehil @gohalo @HappenLee
@heguanhui @hello-stephen @HonestManXin @htyoung @hubgeter @hust-hhb @jacktengg
@jeffreys-cat @Jibing-Li @JNSimba @kaijchen @kaka11chen @KeeProMise @koarz
@liaoxin01 @liujiwen-up @liutang123 @l [...]
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md
index b4a8a70dfc9..699087e7b4d 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE 使用内存跟踪器(Memory Tracker)记录进程内存使用,支
实时的内存统计结果通过 Doris BE 的 Web 页面查看
`http://{be_host}:{be_web_server_port}/mem_tracker`,展示 `type=overview` 的 Memory
Tracker 当前跟踪的内存大小和峰值内存大小,包括 Query/Load/Compaction/Global 等,`be_web_server_port`
默认 8040。
-
+
Memory Tracker 分为不同的类型,其中 `type=overview` 的 Memory Tracker 中除 `process
resident memory`、`process virtual memory`、`sum of all trackers` 外,其他
`type=overview` 的 Memory Tracker 都可以通过
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Lable` 查看详情。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
index e23909d8ff4..49d3b2a706f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE 使用内存跟踪器(Memory Tracker)记录进程内存使用,支
实时的内存统计结果通过 Doris BE 的 Web 页面查看
`http://{be_host}:{be_web_server_port}/mem_tracker`,展示 `type=overview` 的 Memory
Tracker 当前跟踪的内存大小和峰值内存大小,包括 Query/Load/Compaction/Global 等,`be_web_server_port`
默认 8040。
-
+
Memory Tracker 分为不同的类型,其中 `type=overview` 的 Memory Tracker 中除 `process
resident memory`、`process virtual memory`、`sum of all trackers` 外,其他
`type=overview` 的 Memory Tracker 都可以通过
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Label` 查看详情。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
index e23909d8ff4..49d3b2a706f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE 使用内存跟踪器(Memory Tracker)记录进程内存使用,支
实时的内存统计结果通过 Doris BE 的 Web 页面查看
`http://{be_host}:{be_web_server_port}/mem_tracker`,展示 `type=overview` 的 Memory
Tracker 当前跟踪的内存大小和峰值内存大小,包括 Query/Load/Compaction/Global 等,`be_web_server_port`
默认 8040。
-
+
Memory Tracker 分为不同的类型,其中 `type=overview` 的 Memory Tracker 中除 `process
resident memory`、`process virtual memory`、`sum of all trackers` 外,其他
`type=overview` 的 Memory Tracker 都可以通过
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Label` 查看详情。
diff --git a/src/pages/index.tsx b/src/pages/index.tsx
index 06141c0ed93..3b6f9df2b14 100644
--- a/src/pages/index.tsx
+++ b/src/pages/index.tsx
@@ -71,7 +71,7 @@ export default function Home(): JSX.Element {
),
event: (
<Link
- to={'https://www.velodb.io/events/doris-webinar-20250923'}
+ to={'https://apache-doris-summit.org/'}
style={{ background: 'linear-gradient(0deg, #F7F9FE 0%,
#F7F9FE 100%), #FFF', textDecoration: 'none' }}
onMouseEnter={() => {
document.getElementById('event-star-icon').firstChild.style.fill = '#444FD9';
@@ -88,7 +88,7 @@ export default function Home(): JSX.Element {
</span>
</div>
<p className="lg:ml-[0.75rem] group-hover:text-[#444FD9]
text-[1rem]/[1rem] text-[#000]">
- Bridging Data Lake and Database — Apache Doris Webinar |
September 23
+ Doris Summit 2025: Powering Real-Time Analytics & Search
Database in AI Era
</p>
</Link>
),
diff --git a/static/images/memory-used-by-subsystem.png
b/static/images/memory-used-by-subsystem.png
new file mode 100644
index 00000000000..abddebd9851
Binary files /dev/null and b/static/images/memory-used-by-subsystem.png differ
diff --git a/static/images/release-3.1/custom-tokenization.png
b/static/images/release-3.1/custom-tokenization.png
new file mode 100644
index 00000000000..12e0c59132c
Binary files /dev/null and b/static/images/release-3.1/custom-tokenization.png
differ
diff --git
a/versioned_docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
b/versioned_docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
index dae11014c64..d4ff6fb75dc 100644
---
a/versioned_docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
+++
b/versioned_docs/version-2.1/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE uses Memory Tracker to record process memory usage,
supports Web page v
Real-time memory statistics can be viewed through Doris BE's Web page
`http://{be_host}:{be_web_server_port}/mem_tracker`, which displays the current
memory size and peak memory size tracked by Memory Tracker of `type=overview`,
including Query/Load/Compaction/Global, etc. `be_web_server_port` defaults to
8040.
-
+
Memory Tracker is divided into different types. Among the Memory Tracker of
type=overview, except for `process resident memory`, `process virtual memory`,
and `sum of all trackers`, the details of other Memory Trackers of
type=overview can be viewed through
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Label`.
diff --git
a/versioned_docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
b/versioned_docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
index dae11014c64..d4ff6fb75dc 100644
---
a/versioned_docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
+++
b/versioned_docs/version-3.0/admin-manual/trouble-shooting/memory-management/overview.md
@@ -59,7 +59,7 @@ Doris BE uses Memory Tracker to record process memory usage,
supports Web page v
Real-time memory statistics can be viewed through Doris BE's Web page
`http://{be_host}:{be_web_server_port}/mem_tracker`, which displays the current
memory size and peak memory size tracked by Memory Tracker of `type=overview`,
including Query/Load/Compaction/Global, etc. `be_web_server_port` defaults to
8040.
-
+
Memory Tracker is divided into different types. Among the Memory Tracker of
type=overview, except for `process resident memory`, `process virtual memory`,
and `sum of all trackers`, the details of other Memory Trackers of
type=overview can be viewed through
`http://{be_host}:{be_web_server_port}/mem_tracker?type=Label`.
diff --git a/versioned_docs/version-3.0/releasenotes/all-release.md
b/versioned_docs/version-3.0/releasenotes/all-release.md
index c3160067d2e..bbb64db2c1c 100644
--- a/versioned_docs/version-3.0/releasenotes/all-release.md
+++ b/versioned_docs/version-3.0/releasenotes/all-release.md
@@ -8,6 +8,9 @@
This document presents a summary of Apache Doris versions released within one
year, listed in reverse chronological order.
:::tip Latest Release
+🎉 Version 3.1.0 is released. Check out the 🔗[Release
Notes](../releasenotes/v3.1/release-3.1.0) here. Doris 3.1 introduces a sparse
column and schema template for the VARIANT data type, making it more efficient
to store and query large datasets with dynamic fields, such as logs and JSON
data. For lakehouse capabilities, it enhances asynchronous materialized views
and expands support for Iceberg and Paimon to build a stronger bridge between
data lakes and data warehouses.
+
+<br />
🎉 Version 3.0.8 released now. Check out the 🔗[Release
Notes](../releasenotes/v3.0/release-3.0.8) here. Starting from version 3.X,
Apache Doris supports a compute-storage decoupled mode in addition to the
compute-storage coupled mode for cluster deployment. With the cloud-native
architecture that decouples the computation and storage layers, users can
achieve physical isolation between query loads across multiple compute
clusters, as well as isolation between read and write loads.
@@ -22,6 +25,8 @@ This document presents a summary of Apache Doris versions
released within one ye
- [2025-09-19, Apache Doris 3.0.8 is
released](../releasenotes/v3.0/release-3.0.8.md)
+- [2025-09-04, Apache Doris 3.1.0 is
released](../releasenotes/v3.1/release-3.1.0.md)
+
- [2025-08-25, Apache Doris 3.0.7 is
released](../releasenotes/v3.0/release-3.0.7.md)
- [2025-08-15, Apache Doris 2.1.11 is
released](../releasenotes/v2.1/release-2.1.11.md)
diff --git a/versioned_docs/version-3.0/releasenotes/v3.1/release-3.1.0.md
b/versioned_docs/version-3.0/releasenotes/v3.1/release-3.1.0.md
index 6d8e7ae5860..d2222014772 100644
--- a/versioned_docs/version-3.0/releasenotes/v3.1/release-3.1.0.md
+++ b/versioned_docs/version-3.0/releasenotes/v3.1/release-3.1.0.md
@@ -5,4 +5,675 @@
}
---
-Coming Soon
\ No newline at end of file
+We are excited to announce the official release of **Apache Doris 3.1**, a
milestone version that makes semi-structured data and lakehouse analytics more
powerful and practical.
+
+Doris 3.1 introduces **sparse columns and schema template for the VARIANT data
type**, allowing users to efficiently store, index, and query datasets with
tens of thousands of dynamic fields, ideal for logs, events, and JSON-heavy
workloads.
+
+For lakehouse capabilities, Doris 3.1 brings improved **asynchronous
materialized views** to the lakehouse, building a stronger bridge between data
lakes and data warehouses. It also expands support for **Iceberg and Paimon**,
making complex lakehouse workloads easier and more efficient to manage.
+
+More than **90 contributors** submitted over **1,000 improvements and fixes**
during the development of Apache Doris 3.1. We'd like to send a big heartfelt
thank you to everyone who contributed through development, testing, and
feedback.
+
+**[Download Apache Doris 3.1 on
GitHub](https://github.com/apache/doris/releases)**
+
+**[Download from the official website](https://doris.apache.org/download)**
+
+### Key Highlights in Apache Doris 3.1
+
+- **Semi-Structured Data Analytics**
+ - **Sparse columns** for the VARIANT type, supporting tens of thousands of
subcolumns.
+ - **Schema template** for VARIANT, delivering faster queries, more stable
indexing, and controllable costs without losing flexibility.
+ - **Inverted Indexes Storage Format** upgrade from V2 to V3, reducing
storage usage by up to **20%**.
+ - Three new tokenizers: **ICU Tokenizer**, **IK Tokenizer**, and
**Basic Tokenizer**. We also added support for **custom tokenizers**, greatly
improving search recall across diverse scenarios.
+- **Lakehouse Upgrades**
+ - **Improved materialized view features** for data lakes,
strengthening the bridge between data lakes and data warehouses.
+ - Broader support for **Iceberg** and **Paimon**.
+ - **Dynamic partition pruning** and **batch split scheduling** improve
certain query workloads by up to **40%** and reduce FE (frontend) memory
consumption.
+ - Doris 3.1 refactors the **connection properties for external data
sources**, providing a clearer way to integrate with various metadata services
and storage systems, offering more flexible connectivity options.
+- **Storage Engine Improvements**
+ - **New flexible column updates**, building on partial column updates, now
allow different columns to be updated for each row within a single import.
+ - In storage-compute separation scenarios, Doris 3.1 optimizes the lock
mechanism for **MOW tables**, improving the experience of high-concurrency data
ingestion.
+- **Performance Optimizations**
+ - Doris 3.1 enhanced **partition pruning and query planning**, delivering
faster queries and lower resource usage with large partitions (tens of
thousands) and complex filtering expressions.
+ - The optimizer also introduces **data-aware optimization techniques**,
achieving up to **10× performance gains** in specific workloads.
+
+## 1. VARIANT Semi-Structured Data Analytics Get a Major Upgrade
+
+### Storage Capabilities: Sparse columns and Subcolumns Vertical Compaction
+
+Traditional OLAP systems often struggle with "super-wide tables" containing
thousands to tens of thousands of columns, facing metadata bloat, compaction
amplification, and degraded query performance. **Doris 3.1** addresses this
with **sparse subcolumns** and **subcolumn-level vertical compaction** for the
VARIANT type, pushing the practical column limit to tens of thousands.
+
+**With thorough optimizations in the storage layer, VARIANT brings the
following benefits:**
+
+- **Stable support for subcolumns (thousands to tens of thousands)** in
columnar storage, delivering smoother queries and reducing compaction latency.
+- **Controllable metadata and indexing**, preventing exponential growth.
+- **Columnar extraction of more than 10,000 subcolumns** is achievable in
real-world tests, with smooth and efficient compaction performance.
+
+**Key use cases for ultra-wide tables:**
+
+- **Connected Vehicles / IoT Telemetry:** Support various device models and
dynamically changing sensor dimensions.
+- **Marketing Automation / CRM:** Continuously expanding events and user
attributes (e.g., custom events/properties).
+- **Advertising / Event Tracking:** Massive optional properties with sparse
and continuously evolving fields.
+- **Security Auditing / Logs:** Diverse log sources have varying fields that
need to be aggregated and searched by schema.
+- **E-commerce Product Attributes:** Wide category range with highly variable
product attributes.
+
+**How it works:**
+
+- **Frequent JSON keys become real columns**
+
+ We automatically keep the most common JSON keys as true columnar
subcolumns. This keeps queries fast and metadata small. Rare keys stay in a
compact "sparse" area, so the table doesn’t bloat.
+
+- **Smart, low‑memory merges for subcolumns**
+
+ We merge VARIANT subcolumns in small vertical groups, which uses much
less memory and keeps background maintenance smooth even with thousands of
fields. Hot keys are detected and promoted to subcolumns automatically.
+
+- **Faster handling of missing values**
+
+ We batch‑fill default values to cut per‑row overhead during reads and
merges.
+
+- **Lean metadata via LRU caching**
+
+ Column metadata is cached with an LRU policy, reducing memory usage and
speeding up repeated access.
+
+In short: you get columnar speed for the fields that matter, without paying a
heavy cost for the long tail, and background operations stay smooth and
predictable.
+
+**How to Enable and Use:**
+
+A new column-level VARIANT property controls this behavior:
+
+`variant_max_subcolumns_count`: Default is `0`, which means sparse subcolumn
support is disabled. When set to a specific value, the system extracts the
Top-N most frequent JSON keys for columnar storage, while the remaining keys
are stored sparsely.
+
+```SQL
+-- Enable sparse subcolumns and cap hot subcolumn count
+CREATE TABLE IF NOT EXISTS tbl (
+ k BIGINT,
+ v VARIANT<
+ properties("variant_max_subcolumns_count" = "2048") -- pick top-2048 hot keys
+ >
+) DUPLICATE KEY(k);
+```
+
+### Schema Templates
+
+Schema templates make constantly changing JSON predictable along critical
paths, so queries run faster, indexes stay stable, and costs remain under
control, without losing flexibility. The benefits are:
+
+- **Type stability**: Lock types for key JSON subpaths in DDL to prevent type
drift that causes query errors, index invalidation, and implicit cast overhead.
+- **Faster, more accurate retrieval:** Tailor inverted index strategies per
subpath (tokenized/non-tokenized, parsers, phrase search) to cut latency and
improve hit rates.
+- **Controlled indexing and costs**: Tune indexes per subpath instead of
indexing the whole column, significantly reducing index count, write
amplification, and storage usage.
+- **Maintainability and collaboration**: Acts as a "data contract" for JSON,
so semantics stay consistent across teams. Types and index states are easier to
observe and debug.
+- **Easier to evolve:** Template and optionally index hot paths, while keeping
long‑tail fields flexible, so you scale without surprises.
+
+**How to enable and use:**
+
+- Explicitly declare common subpaths and types (including wildcards) inside
`VARIANT<...>`, for example: `VARIANT<'a': INT, 'c.d': TEXT, 'array_int_*':
ARRAY<INT>>`.
+- Configure per‑subpath indexes with differentiated strategies
(`field_pattern`, `parsers`, `tokenization`, `phrase` search), and use
wildcards to cover families of fields at once.
+- New column property: `variant_enable_typed_paths_to_sparse`
+ - Default: `false` (typed, predefined paths are not stored sparsely).
+ - Set to `true` to also treat typed paths as sparse candidates, useful when
many paths match and you want to avoid column sprawl.
+
+In short: template what matters, index it precisely, and keep everything else
flexible.
+
+**Example 1:** Schema Definition with Multiple Indexes on a Single Column
+
+```SQL
+-- Common properties: field_pattern (target subpath), analyzer, parser, support_phrase, etc.
+CREATE TABLE IF NOT EXISTS tbl (
+    k BIGINT,
+    -- specify concrete type for subcolumn 'content'
+    v VARIANT<'content' : STRING>,
+    -- tokenized inverted index for 'content' with english parser
+    INDEX idx_tokenized(v) USING INVERTED
+        PROPERTIES("parser" = "english", "field_pattern" = "content"),
+    -- non-tokenized inverted index for 'content'
+    INDEX idx_v(v) USING INVERTED PROPERTIES("field_pattern" = "content")
+);
+
+-- v.content will have both a tokenized (english) inverted index
+-- and a non-tokenized inverted index
+
+-- Use tokenized index
+SELECT * FROM tbl WHERE v['content'] MATCH 'Doris';
+
+-- Use non-tokenized index
+SELECT * FROM tbl WHERE v['content'] = 'Doris';
+```
+
+**Example 2:** Batch Processing Columns Matching a Wildcard Pattern
+
+```SQL
+-- Use wildcard-typed subpaths with per-pattern indexes
+CREATE TABLE IF NOT EXISTS tbl2 (
+    k BIGINT,
+    v VARIANT<
+        -- batch-typing: all subpaths matching pattern1_* are STRING
+        'pattern1_*' : STRING,
+        -- batch-typing: all subpaths matching pattern2_* are BIGINT
+        'pattern2_*' : BIGINT,
+        -- enable sparse subcolumns; keep top-2048 hot keys
+        properties("variant_max_subcolumns_count" = "2048")
+    >,
+    -- tokenized inverted index for pattern1_* with english parser
+    INDEX idx_p1 (v) USING INVERTED
+        PROPERTIES("field_pattern"="pattern1_*", "parser" = "english"),
+    -- non-tokenized inverted index for pattern2_*
+    INDEX idx_p2 (v) USING INVERTED
+        PROPERTIES("field_pattern"="pattern2_*")
+) DUPLICATE KEY(k);
+```
+
+**Example 3:** Allowing Predefined Typed Columns to be Stored Sparsely
+
+```SQL
+-- Allow predefined typed paths to participate in sparse extraction
+CREATE TABLE IF NOT EXISTS tbl3 (
+    k BIGINT,
+    v VARIANT<
+        -- batch-typing: all subpaths matching prefix 'message*' are STRING
+        'message*' : STRING,
+        properties(
+            -- enable sparse subcolumns; keep top-2048 hot keys
+            "variant_max_subcolumns_count" = "2048",
+            -- include typed (predefined) paths as sparse candidates (default: false)
+            "variant_enable_typed_paths_to_sparse" = "true"
+        )
+    >
+) DUPLICATE KEY(k);
+```
+
+## 2. Inverted Index Storage Format V3
+
+Compared to V2, Inverted Index V3 further optimizes storage:
+
+- Smaller index files reduce disk usage and I/O overhead.
+- Testing with the httplogs and logsbench datasets shows storage space savings
of more than 20% with V3, which suits large-scale text data and log analysis
scenarios.
+
+| Test dataset | Data size before import | inverted_index_storage_format = v2 | inverted_index_storage_format = v3 | Space saving: V3 vs. V2 |
+| :----------- | :---------------------- | :--------------------------------- | :--------------------------------- | :---------------------- |
+| httplogs | 30.89 GB | 4.472 GB | 3.479 GB | 22.2% |
+| logsbench | 1479.31 GB | 182.180 GB | 138.008 GB | 24.2% |
+
+**Key Improvements:**
+
+- **Introduced ZSTD dictionary compression** for inverted index term
dictionaries, enabled via `dict_compression` in index properties.
+- **Added compression for positional information** associated with each term
in the inverted index, further reducing storage footprint.
+
+**How to Use:**
+
+```SQL
+-- Enable V3 format when creating the table
+CREATE TABLE example_table (
+ content TEXT,
+ INDEX content_idx (content) USING INVERTED
+ PROPERTIES("parser" = "english", "dict_compression" = "true")
+) ENGINE=OLAP
+PROPERTIES ("inverted_index_storage_format" = "V3");
+```
+
+### Full-Text Search Tokenizers
+
+We added new commonly used tokenizers in Doris 3.1 to better meet diverse
tokenization needs:
+
+1. **ICU Tokenizer:**
+
+- **Implementation:** Based on International Components for Unicode (ICU).
+- **Use Cases:** Ideal for internationalized text with complex writing systems
and multilingual documents.
+- **Example:**
+
+```SQL
+SELECT TOKENIZE('مرحبا بالعالم Hello 世界', '"parser"="icu"');
+-- Results: ["مرحبا", "بالعالم", "Hello", "世界"]
+
+SELECT TOKENIZE('มันไม่เป็นไปตามความต้องการ', '"parser"="icu"');
+-- Results: ["มัน", "ไม่เป็น", "ไป", "ตาม", "ความ", "ต้องการ"]
+```
+
+2. **Basic Tokenizer**
+
+- **Implementation**: A simple rule-based custom tokenizer, using basic
character type recognition.
+- **Use cases**: Simple scenarios or cases with very high performance
requirements.
+- **Rules**:
+ - Continuous alphanumeric characters are treated as one token.
+ - Each Chinese character is a separate token.
+ - Punctuation, spaces, and special characters are ignored.
+- **Examples**:
+
+```SQL
+-- English text tokenization
+SELECT TOKENIZE('Hello World! This is a test.', '"parser"="basic"');
+-- Results: ["hello", "world", "this", "is", "a", "test"]
+
+-- Chinese text tokenization
+SELECT TOKENIZE('你好世界', '"parser"="basic"');
+-- Results: ["你", "好", "世", "界"]
+
+-- Mixed-language tokenization
+SELECT TOKENIZE('Hello你好World世界', '"parser"="basic"');
+-- Results: ["hello", "你", "好", "world", "世", "界"]
+
+-- Handling numbers and special characters
+SELECT TOKENIZE('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"');
+-- Results: ["get", "images", "hm", "bg", "jpg", "http", "1", "0"]
+
+-- Processing long numeric sequences
+SELECT TOKENIZE('12345678901234567890', '"parser"="basic"');
+-- Results: ["12345678901234567890"]
+```
+
+**Custom Tokenization**
+
+A new **custom tokenization** feature lets users design tokenization logic
according to their needs, improving recall for text search.
+
+By combining character filters, tokenizers, and token filters, users can go
beyond the limitations of built-in analyzers and define exactly how text should
be split into searchable terms, directly influencing both search relevance and
analytical accuracy.
+
+
+
+**Example Use Case**
+
+- **Problem**: With the default Unicode tokenizer, a phone number like
"13891972631" is treated as a single token, making prefix search (e.g., "138")
impossible.
+- **Solution**:
+ - Create a tokenizer using the **Edge N-gram** custom tokenizer:
+ - ```SQL
+ CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS edge_ngram_phone_tokenizer
+ PROPERTIES
+ (
+ "type" = "edge_ngram",
+ "min_gram" = "3",
+ "max_gram" = "10",
+ "token_chars" = "digit"
+ );
+ ```
+
+ - Create an analyzer:
+ - ```SQL
+ CREATE INVERTED INDEX ANALYZER IF NOT EXISTS phone_prefix_analyzer
+ PROPERTIES
+ (
+ "tokenizer" = "edge_ngram_phone_tokenizer"
+ );
+ ```
+
+ - Specify an analyzer when creating the table
+
+ ```SQL
+ CREATE TABLE customer_contacts (
+ id bigint NOT NULL AUTO_INCREMENT(1),
+ phone text NULL,
+ INDEX idx_phone (phone) USING INVERTED PROPERTIES(
+ "analyzer" = "phone_prefix_analyzer"
+ )
+ ) ENGINE=OLAP
+ DUPLICATE KEY(id)
+ DISTRIBUTED BY RANDOM BUCKETS 1
+ PROPERTIES ("replication_allocation" = "tag.location.default: 1");
+ ```
+
+ - Check tokenization results:
+
+ ```SQL
+ SELECT tokenize('13891972631', '"analyzer"="phone_prefix_analyzer"');
+ +---------------------------------------------------------------+
+ | tokenize('13891972631', '"analyzer"="phone_prefix_analyzer"') |
+ +---------------------------------------------------------------+
+ | [{
+ "token": "138"
+ }, {
+ "token": "1389"
+ }, {
+ "token": "13891"
+ }, {
+ "token": "138919"
+ }, {
+ "token": "1389197"
+ }, {
+ "token": "13891972"
+ }, {
+ "token": "138919726"
+ }, {
+ "token": "1389197263"
+ }] |
+ +---------------------------------------------------------------+
+ ```
+
+ - Text search results
+
+ ```SQL
+ SELECT * FROM customer_contacts WHERE phone MATCH '138';
+ +------+-------------+
+ | id | phone |
+ +------+-------------+
+ | 1 | 13891972631 |
+ | 2 | 13812345678 |
+ +------+-------------+
+ SELECT * FROM customer_contacts WHERE phone MATCH '1389';
+ +------+-------------+
+ | id | phone |
+ +------+-------------+
+ | 1 | 13891972631 |
+ | 2 | 13812345678 |
+ +------+-------------+
+ 2 rows in set (0.043 sec)
+ ```
+
+ - Using the Edge N-gram tokenizer, a phone number is broken into multiple
prefix tokens, enabling flexible prefix-matching search.
+
+## 3. Lakehouse Capability Upgrades
+
+### Asynchronous Materialized Views Now Fully Support Data Lakes
+
+In version 3.1, asynchronous materialized views have been significantly
enhanced, now providing comprehensive support for partition-level incremental
maintenance and partition-level transparent query rewriting. This functionality
is available for data lake formats like Iceberg, Paimon, and Hudi.
+
+Doris introduced asynchronous materialized views in version 2.1 and has kept
improving them since. The capabilities built up so far include:
+
+| Feature | Description |
+| :-------------------- | :----------------------------------------------------------- |
+| Refresh | Full and partition-level incremental maintenance for internal tables and Hive tables |
+| Trigger Methods | Scheduled refresh and manual refresh for all table types; refresh on commit for internal tables only |
+| Partitioning | Same partitioning as base tables, with features like partition rolling, multi-level partitions, and retaining certain "hot" partitions of the base table |
+| Transparent Rewriting | Supports conditional compensation, join rewriting and derivation, aggregation rewriting, and partition compensation for internal tables |
+| Maintenance Tools | Provides tasks(), jobs(), and mv_infos() to query materialized view runtime information |
+
+Version 3.1 extends partition-level incremental maintenance, and
partition-level compensation during query rewriting, to external tables. This
enhancement targets mainstream data lake table formats like Iceberg, Paimon,
and Hudi, effectively creating a high-speed data bridge between the data lake
and the data warehouse. The scope of support is outlined in the table below:
+
+| External Source | Partition Refresh | Transparent Partition Rewriting |
+| :-------------- | :----------------------------------------------------------- | :------------------------------ |
+| Hive | Supported | Supported |
+| Iceberg | Supported | Supported |
+| Paimon | Supported | Supported |
+| Hudi | Cannot detect partition sync; suitable for manually refreshing specific partitions | Supported |
+
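+As a rough sketch of what this enables, the statements below build a
partitioned asynchronous materialized view on an Iceberg table and refresh it.
The catalog, database, table, and column names (`iceberg_catalog`, `sales_db`,
`orders`, `dt`, `region`, `amount`, `demo_db`) are hypothetical, the base table
is assumed to be partitioned by `dt`, and the refresh and bucket settings are
illustrative rather than recommended values.
+
+```SQL
+-- Partitioned async materialized view over a (hypothetical) Iceberg table
+CREATE MATERIALIZED VIEW orders_daily_mv
+BUILD DEFERRED REFRESH AUTO ON MANUAL
+PARTITION BY (dt)
+DISTRIBUTED BY RANDOM BUCKETS 2
+PROPERTIES ("replication_num" = "1")
+AS
+SELECT dt, region, SUM(amount) AS total_amount
+FROM iceberg_catalog.sales_db.orders
+GROUP BY dt, region;
+
+-- AUTO asks Doris to refresh only partitions whose base data changed
+REFRESH MATERIALIZED VIEW orders_daily_mv AUTO;
+
+-- Inspect the state of materialized views in the (hypothetical) database
+SELECT * FROM mv_infos("database" = "demo_db");
+```
+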
+### Better Support for Iceberg and Paimon
+
+#### 1. Iceberg
+
+Doris 3.1 introduces a wide range of optimizations and enhancements for the
Iceberg table format, offering compatibility with the latest Iceberg
capabilities.
+
+**Branch / Tag Lifecycle Management**
+
+Starting in 3.1, Doris natively supports creating, deleting, reading, and
writing Iceberg Branches and Tags. Much like Git, this allows users to manage
Iceberg table data versions seamlessly. With this capability, users can perform
parallel version management, canary releases, and environment isolation,
without relying on external engines or custom logic.
+
+```SQL
+-- Create branches
+ALTER TABLE iceberg_tbl CREATE BRANCH b1;
+-- Write data to a specific branch
+INSERT INTO iceberg_tbl@branch(b1) values(1, 2);
+-- Query data from a specific branch
+SELECT * FROM iceberg_tbl@branch(b1);
+```
+
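+Tags follow the same pattern as branches. The sketch below assumes the tag
clauses mirror the branch syntax shown above; `t1` is a hypothetical tag name,
and any extra options (such as snapshot selection or retention) should be
checked against the Iceberg catalog documentation.
+
+```SQL
+-- Create a tag on the current snapshot
+ALTER TABLE iceberg_tbl CREATE TAG t1;
+-- Read the data as of that tag
+SELECT * FROM iceberg_tbl@tag(t1);
+-- Remove the tag and the branch when they are no longer needed
+ALTER TABLE iceberg_tbl DROP TAG t1;
+ALTER TABLE iceberg_tbl DROP BRANCH b1;
+```
+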
+**Comprehensive System Tables Support**
+
+Doris 3.1 adds support for Iceberg system tables such as `$entries`,
`$files`, `$history`, `$manifests`, `$refs`, and `$snapshots`. Users can query
them directly with statements like `SELECT * FROM iceberg_table$history` or
`…$refs` to inspect the underlying metadata, snapshot history, branch and tag
mappings, and the organization of data files. This greatly improves the
observability of Iceberg metadata, making troubleshooting, optimization
analysis, and governance decisions easier and more transparent.
+
+**Example:** counting data files and delete files by type via the `$files` system table.
+
+```SQL
+SELECT
+ CASE
+ WHEN content = 0 THEN 'DataFile'
+ WHEN content = 1 THEN 'PositionDeleteFile'
+ WHEN content = 2 THEN 'EqualityDeleteFile'
+ ELSE 'Unknown'
+ END AS ContentType,
+ COUNT(*) AS FileNum,
+ SUM(file_size_in_bytes) AS SizeInBytes,
+ SUM(record_count) AS Records
+FROM
+ iceberg_table$files
+GROUP BY
+ ContentType;
+
++--------------------+---------+-------------+---------+
+| ContentType | FileNum | SizeInBytes | Records |
++--------------------+---------+-------------+---------+
+| EqualityDeleteFile | 2787 | 1432518 | 27870 |
+| DataFile | 2787 | 4062416 | 38760 |
+| PositionDeleteFile | 11 | 36608 | 10890 |
++--------------------+---------+-------------+---------+
+```
+
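+The same approach works for the other system tables. As a sketch, the query
below lists the most recent snapshots of the table; the column names follow
the Iceberg snapshots metadata table and are assumed to be exposed unchanged
by Doris.
+
+```SQL
+-- Five most recent snapshots and the operation that produced each one
+SELECT snapshot_id, parent_id, committed_at, operation
+FROM iceberg_table$snapshots
+ORDER BY committed_at DESC
+LIMIT 5;
+```
+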
+**Support for Querying Iceberg Views**
+
+Doris 3.1 added support for accessing and querying Iceberg logical views,
further enhancing Doris's Iceberg capabilities. In future 3.x releases, we plan
to add SQL dialect translation support for Iceberg Views.
+
+**Schema Evolution with ALTER Statements**
+
+Starting in 3.1, Doris supports adding, deleting, renaming, and reordering
columns in Iceberg tables through the ALTER TABLE statement. This further
empowers Doris to manage Iceberg tables without needing third-party engines
like Spark.
+
+```SQL
+ALTER TABLE iceberg_table
+ADD COLUMN new_col int;
+```
+
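+Renaming and dropping columns would look similar. This is a sketch that
assumes the usual Doris `ALTER TABLE` clauses apply to Iceberg tables;
`renamed_col` is a hypothetical name.
+
+```SQL
+-- Rename the column added above, then drop it again
+ALTER TABLE iceberg_table RENAME COLUMN new_col TO renamed_col;
+ALTER TABLE iceberg_table DROP COLUMN renamed_col;
+```
+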
+Additionally, in version 3.1, the Iceberg dependency has been upgraded to
version 1.9.2 to better support new Iceberg features. Future 3.1.x releases
will further improve Iceberg table management, including data compaction and
branch evolution.
+
+Documentation: https://doris.apache.org/docs/lakehouse/catalogs/iceberg-catalog
+
+#### 2. Paimon
+
+Doris 3.1 brings several updates and enhancements to the Paimon table format
based on real use cases.
+
+**Support for Paimon Batch Incremental Queries**
+
+Doris 3.1 allows reading incremental data between two specified snapshots in a
Paimon table. This gives users better access to Paimon incremental data and, in
particular, enables incremental aggregation materialized views on Paimon tables.
+
+```SQL
+SELECT * FROM paimon_tbl@incr('startSnapshotId'='2', 'endSnapshotId'='5');
+```
+
+**Branch and Tag Reading**
+
+Starting in 3.1, Doris supports reading Paimon table data from branches
and tags, offering more flexible multi-version data access.
+
+```SQL
+SELECT * FROM paimon_tbl@branch(branch1);
+SELECT * FROM paimon_tbl@tag(tag1);
+```
+
+**Comprehensive System Tables Support**
+
+Similar to Iceberg, 3.1 adds support for Paimon system tables like `$files`,
`$partitions`, `$manifests`, `$tags`, `$snapshots`. Users can query underlying
metadata directly with statements like `SELECT * FROM partition_table$files`,
making Paimon tables easier to explore, debug, and optimize.
+
+For example, we can use system tables to count new data files added per
partition.
+
+```SQL
+SELECT
+ partition,
+ COUNT(*) AS new_file_count,
+ SUM(file_size_in_bytes)/1024/1024 AS new_total_size_mb
+FROM my_table$files
+WHERE creation_time >= DATE_SUB(NOW(), INTERVAL 3 DAY)
+GROUP BY partition
+ORDER BY new_total_size_mb DESC;
+```
+
+In 3.1, Paimon's dependency is upgraded to version 1.1.1 to better support new
features.
+
+Documentation: https://doris.apache.org/docs/lakehouse/catalogs/paimon-catalog
+
+### Enhanced Data Lake Query Performance
+
+Doris 3.1 adds extensive optimizations for data lake table formats to
provide more stable and efficient data lake analytics in real-world production
environments.
+
+1. **Dynamic Partition Pruning**
+
+In multi-table join queries, this feature generates partition predicates from
the right-hand table at runtime and applies them to the left-hand table.
Pruning unnecessary partitions on the fly reduces data I/O and improves query
performance. Doris 3.0 introduced dynamic partition pruning for Hive tables.
Now with 3.1, we have expanded this capability to Iceberg, Paimon, and Hudi
tables. In test scenarios, queries with high selectivity showed a **30%–40%
performance improvement**.
+
+2. **Batch Split Scheduling**
+
+When a lakehouse table contains a large number of splits, the Frontend (FE)
traditionally collects all split metadata at once and sends it to the Backend
(BE). This can cause high FE memory consumption and long planning times,
particularly for queries on large datasets.
+
+Batch split scheduling solves this by generating split metadata in batches and
executing each batch as it is produced. This reduces FE memory pressure and
lets planning and execution run in parallel, improving overall efficiency.
Doris 3.0 added support for this feature on Hive tables, and 3.1 extends it
to Iceberg tables. In large-scale test scenarios, it significantly reduced FE
memory usage and query planning time.
+
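+The dynamic partition pruning case described above can be pictured with a
query like the following. All catalog, table, and column names are
hypothetical: `fact_orders` is assumed to be a lake table partitioned by `dt`,
and `promo_days` a small dimension table.
+
+```SQL
+-- The set of matching d.dt values is only known at runtime
+SELECT f.order_id, f.amount
+FROM iceberg_catalog.sales_db.fact_orders AS f
+JOIN internal.dim_db.promo_days AS d
+  ON f.dt = d.dt
+WHERE d.campaign = 'spring_sale';
+```
+
+Here the `dt` values that survive the `campaign` filter become a runtime
predicate on `f.dt`, so partitions of `fact_orders` for other dates do not
need to be read.
+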
+### Federated Analytics: More Flexible, More Powerful Connectors
+
+In 3.1, Doris restructured its connector properties for external data sources.
This makes it easier to integrate with different metadata services and storage
systems, while also expanding the range of supported capabilities.
+
+#### 1. More Flexible Data Lake REST Catalog Support
+
+- **Iceberg REST Catalog**
+
+ We have improved support for Iceberg REST Catalog. Doris 3.1 now works
with multiple backend implementations, including Unity, Polaris, Gravitino, and
Glue. We also added support for vended credentials, enabling more secure and
flexible credential management. AWS is supported, with GCP and Azure support
planned in upcoming releases.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/metastores/iceberg-rest)
+
+- **Paimon REST Catalog**
+
+ Doris 3.1 introduces support for Paimon REST Catalog via Alibaba Cloud
DLF, allowing direct access to Paimon tables managed by the latest DLF versions.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/best-practices/doris-dlf-paimon)
+
+#### 2. More Powerful Hadoop Ecosystem Support
+
+- **Multi-Kerberos Environment Support**
+
+ Doris 3.1 allows access to multiple Kerberos authentication environments
within the same cluster. Different environments can use separate KDC services,
Principals, and Keytabs, and each Catalog can be configured with its own
Kerberos settings independently. This makes it much easier for users with
multiple Kerberos environments to manage them all through Doris with unified
access control.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/storages/hdfs#kerberos-authentication)
+
+- **Multi-Hadoop Environment Support**
+
+ Previously, Doris only allowed a single Hadoop configuration (e.g.,
hive-site.xml, hdfs-site.xml) to be placed under the conf directory, so
multiple Hadoop environments and configurations were not supported. With Doris
3.1, users can assign different Hadoop configuration files to different
Catalogs, making it easier to manage external data sources flexibly.
+
+Refer to
[documentation](https://doris.apache.org/docs/lakehouse/storages/hdfs#kerberos-authentication)
+
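+As a sketch of what per-Catalog Kerberos configuration might look like, the
two Hive Catalogs below each carry their own principal and keytab. The host
names, realms, and file paths are placeholders, and the property keys are the
Hadoop-style keys commonly used for Doris Hive Catalogs; treat them as
assumptions, since a real deployment may need further metastore settings, and
confirm the exact keys in the linked documentation.
+
+```SQL
+-- Catalog for the (hypothetical) production Kerberos realm
+CREATE CATALOG hive_prod PROPERTIES (
+    "type" = "hms",
+    "hive.metastore.uris" = "thrift://hms-prod.example.com:9083",
+    "hadoop.security.authentication" = "kerberos",
+    "hadoop.kerberos.principal" = "doris@PROD.EXAMPLE.COM",
+    "hadoop.kerberos.keytab" = "/etc/security/keytabs/doris_prod.keytab"
+);
+
+-- Catalog for a separate (hypothetical) test realm with its own KDC
+CREATE CATALOG hive_test PROPERTIES (
+    "type" = "hms",
+    "hive.metastore.uris" = "thrift://hms-test.example.com:9083",
+    "hadoop.security.authentication" = "kerberos",
+    "hadoop.kerberos.principal" = "doris@TEST.EXAMPLE.COM",
+    "hadoop.kerberos.keytab" = "/etc/security/keytabs/doris_test.keytab"
+);
+```
+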
+## 4. Storage Engine Improvements
+
+In Doris 3.1, we've continued to refine the storage layer for better
performance and stability.
+
+### Flexible Column Updates: A New Data Update Experience
+
+Previously, the **Partial Column Update** feature in Doris required each row
in a single import to update the same set of columns. However, in many
scenarios, source systems only provide records containing the primary key and
the specific columns being updated, with different rows updating different
columns. To address this, Doris introduced the **Flexible Column Update**
feature, which spares users from preparing per-column data and improves write
performance.
+
+**How to Use**
+
+- Enable when creating a Merge-on-Write Unique table:
+ - `"enable_unique_key_skip_bitmap_column" = "true"`
+- Specify the import mode:
+ - `unique_key_update_mode: UPDATE_FLEXIBLE_COLUMNS`
+- Doris will automatically handle flexible column updates and fill in missing
data.
+
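+A minimal sketch of the table-side setup, assuming a hypothetical table
`user_profile`; the column names, bucket count, and replica count are
illustrative. The load itself is then submitted with the
`unique_key_update_mode` setting listed above.
+
+```SQL
+-- Merge-on-Write Unique table with the skip bitmap column enabled
+CREATE TABLE user_profile (
+    k  INT NOT NULL,
+    v1 INT NULL,
+    v2 INT NULL,
+    v3 INT NULL
+) UNIQUE KEY(k)
+DISTRIBUTED BY HASH(k) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1",
+    "enable_unique_key_merge_on_write" = "true",
+    "enable_unique_key_skip_bitmap_column" = "true"
+);
+```
+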
+**Example**
+
+Flexible column updates allow different rows to update different columns in
the same import:
+
+- Delete a row (`__DORIS_DELETE_SIGN__`)
+- Update specific columns (e.g., `v1`, `v2`, `v5`)
+- Insert new rows with only the primary key and updated columns; other columns
use default values or historical data.
+
+**Performance**
+
+- Test environment: 1 FE + 4 BE, 16C 64GB, 300M rows × 101 columns, 3 replicas
+ - Import 20,000 rows updating just 1 column (99 columns filled automatically)
+ - Single concurrency import performance: 10.4k rows/sec
+ - Resource usage per node: CPU ~60%, memory ~30GB, read IOPS ~7.5k/s, write
IOPS ~5k/s
+
+### Storage-Compute Separation: MOW Lock Optimization
+
+In storage-compute separation scenarios, updating the Delete Bitmap in MOW
tables requires acquiring a distributed lock `delete_bitmap_update_lock`.
Previously, import, compaction, and schema change operations competed for this
lock, causing long waits or failures under high-concurrency imports.
+
+We have added several optimizations to make large-scale, concurrent data
ingestion more stable and efficient. **Reducing compaction lock times** cut p99
commit latency from 1.68 minutes to 49.4 seconds in high-concurrency
multi-tablet import tests. We also **reduced long-tail import latency** by
allowing transactions to force-lock after a wait threshold is exceeded.
+
+## 5. Query Performance Boosts
+
+### Enhanced Partition Pruning Performance and Coverage
+
+Doris organizes data into partitions that can be stored, queried, and managed
independently. Partitioning improves query performance, optimizes data
management, and reduces resource consumption. During queries, applying filters
to skip irrelevant partitions (known as partition pruning) can significantly
boost performance while lowering system resource usage.
+
+In use cases like log analytics or risk control systems, a single table can
have tens of thousands of partitions, or even hundreds of thousands, but most
queries only touch a few hundred. Efficient partition pruning is therefore
critical for performance.
+
+In 3.1, Doris introduces several optimizations that significantly enhance both
the performance and applicability of partition pruning:
+
+- **Binary search for partition pruning:** For partitions based on a time
column, the partition pruning process has been optimized by sorting partitions
according to the column values. This changes the pruning calculation from a
linear scan to a binary search. In a scenario with 136,000 partitions using a
DATETIME partition field, this optimization reduced pruning time from 724
milliseconds to 43 milliseconds, achieving over a 16-fold speedup.
+- **Support for many monotonic functions in partition pruning:** In
real-world use cases, filter conditions on time-partitioned columns are often
not simple comparisons but complex expressions involving time-related functions
on the partition columns, such as `to_date(time_stamp) > '2022-12-22'` or
`date_format(timestamp,'%Y-%m-%d %H:%i:%s') > '2022-12-22 11:00:00'`. In Doris
3.1, when a function is monotonic, Doris can determine whether an [...]
+- **Full-path code optimizations:** In Doris 3.1, the entire partition
pruning code path has also been optimized to eliminate unnecessary overhead.
+
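+One way to observe pruning with a monotonic function is to look at the
partitions reported for the scan node in `EXPLAIN`. The table below is
hypothetical; with day-level auto partitions on `ts`, a filter on `to_date(ts)`
should let the planner skip partitions outside the requested range.
+
+```SQL
+-- Hypothetical log table with one partition per day
+CREATE TABLE access_log (
+    ts  DATETIME NOT NULL,
+    msg STRING
+) DUPLICATE KEY(ts)
+AUTO PARTITION BY RANGE (date_trunc(ts, 'day')) ()
+DISTRIBUTED BY HASH(ts) BUCKETS 4
+PROPERTIES ("replication_num" = "1");
+
+-- The scan node in the plan shows which partitions remain after pruning
+EXPLAIN
+SELECT COUNT(*) FROM access_log
+WHERE to_date(ts) > '2022-12-22';
+```
+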
+### Data Traits: Up to 10× Performance Boost
+
+In Doris 3.1, the optimizer can make smarter use of data traits to enhance
query performance. It analyzes each node in the query plan to collect data
traits such as UNIQUE, UNIFORM, and EQUAL SET, and infers functional
dependencies between columns. When data at a node meets certain traits, Doris
can eliminate unnecessary joins, aggregations, or sorts, significantly boosting
performance.
+
+In test cases designed to exercise these optimizations, data traits delivered
more than a **10× performance improvement**. See the table below:
+
+| **Optimization** | **Optimized** | **Not Optimized** | **Performance Boost** |
+| :--------------------------------------------------- | :------------ | :---------------- | :-------------------- |
+| Eliminate joins based on unique join keys | 50 ms | 100 ms | 100% |
+| Remove redundant aggregation keys using uniqueness | 80 ms | 960 ms | 1100% |
+| Remove aggregation keys with functional dependencies | 1410 ms | 2110 ms | 50% |
+| Remove aggregation keys on uniform columns | 110 ms | 150 ms | 36% |
+| Eliminate unnecessary sorting | 130 ms | 370 ms | 185% |
+
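+To make "redundant aggregation keys" concrete: if `id` is known to be unique,
grouping by `id, name` produces the same groups as grouping by `id` alone, so
the extra key adds work without changing the result. The sketch below shows
the two logically equivalent forms on a hypothetical table `users` with a
unique `id`. It illustrates the idea only; it is not taken from the Doris test
cases, and the exact conditions under which the optimizer applies such a
rewrite are not described here.
+
+```SQL
+-- As written by the user
+SELECT id, name, SUM(score) AS total FROM users GROUP BY id, name;
+-- Equivalent when id is unique, because name is determined by id
+SELECT id, ANY_VALUE(name) AS name, SUM(score) AS total FROM users GROUP BY id;
+```
+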
+## 6. Feature Improvements
+
+### Semi-structured Data
+
+**VARIANT**
+
+- Added `variant_type(x)` function: returns the current actual type of a
Variant subfield.
+- Added ComputeSignature/Helper to improve parameter and return type
inference for functions.
+
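+A small sketch of how `variant_type` might be used. The table `events` and its
contents are hypothetical, and the subfield access form `payload['status']` is
assumed to be accepted as the function argument.
+
+```SQL
+CREATE TABLE events (
+    id      BIGINT,
+    payload VARIANT
+) DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO events VALUES (1, '{"status": 200, "path": "/index.html"}');
+
+-- Report the actual type stored for each subfield
+SELECT variant_type(payload['status']), variant_type(payload['path'])
+FROM events;
+```
+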
+**STRUCT**
+
+- Schema Change now supports adding subcolumns to STRUCT types.
+
+### Lakehouse
+
+- Added support for setting metadata cache policies at the Catalog level (e.g.,
cache expiration time), giving users more flexibility to balance data freshness
and metadata access performance.
+
+Refer to [documentation](https://doris.apache.org/docs/lakehouse/meta-cache)
+
+- Added support for the `FILE()` Table Valued Function, which unifies the
existing `S3()`, `HDFS()`, and `LOCAL()` functions into a single interface for
easier use and understanding.
+
+Refer to
[documentation](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-valued-functions/file)
+
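+A sketch of reading a Parquet file from object storage through `FILE()`. It
assumes the function accepts the same style of properties as the existing
`S3()` function; the bucket, endpoint, region, and credentials are
placeholders, so check the linked documentation for the exact property names.
+
+```SQL
+-- Preview a (hypothetical) Parquet file stored on S3
+SELECT *
+FROM FILE(
+    "uri" = "s3://example-bucket/path/to/data.parquet",
+    "format" = "parquet",
+    "s3.endpoint" = "s3.us-east-1.amazonaws.com",
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "<your_access_key>",
+    "s3.secret_key" = "<your_secret_key>"
+)
+LIMIT 10;
+```
+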
+### Enhanced Aggregate Operator Capabilities
+
+In Doris 3.1, the optimizer has focused on improving the aggregate operators,
adding support for two widely used features.
+
+**Non-Standard GROUP BY Support**
+
+The SQL standard does not permit queries in which the select list, `HAVING`
condition, or `ORDER BY` list refers to nonaggregated columns that are not named
in the `GROUP BY` clause. However, in MySQL, if the SQL_MODE does not include
"ONLY_FULL_GROUP_BY", this restriction does not apply. See the MySQL
documentation for more details:
https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html#sqlmode_only_full_group_by
+
+The effect of disabling `ONLY_FULL_GROUP_BY` is equivalent to using
`ANY_VALUE` on non-aggregated columns. For example:
+
+```SQL
+-- Non-standard GROUP BY
+SELECT c1, c2 FROM t GROUP BY c1;
+-- Equivalent to
+SELECT c1, any_value(c2) FROM t GROUP BY c1;
+```
+
+In 3.1, Doris enables "ONLY_FULL_GROUP_BY" by default, consistent with
previous behavior. To use non-standard GROUP BY, remove it from `sql_mode` as
shown below:
+
+```SQL
+set sql_mode = replace(@@sql_mode, 'ONLY_FULL_GROUP_BY', '');
+```
+
+**Support for Multiple DISTINCT Aggregates**
+
+In previous versions, Doris could not execute an aggregation query that
contained multiple DISTINCT aggregate functions with different parameters,
unless each function either had the same semantics with and without DISTINCT or
was one of the following:
+
+- single-parameter COUNT
+- SUM
+- AVG
+- GROUP_CONCAT
+
+Doris 3.1 significantly enhances this area. Now, queries involving multiple
distinct aggregates can be executed correctly and return results as expected,
for example:
+
+```SQL
+SELECT count(DISTINCT c1,c2), count(DISTINCT c2,c3), count(DISTINCT c3) FROM t;
+```
+
+## 7. Behavioral Changes
+
+**VARIANT**
+
+- `variant_max_subcolumns_count` constraint
+ - In a single table, all Variant columns must have the same
`variant_max_subcolumns_count` value: either all 0 or all greater than 0. Mixing
values will cause errors during table creation or schema changes.
+- New Variant read/write/serde and compaction paths are backward compatible
with old data. Upgrading from older Variant versions may produce slight
differences in query output (e.g., extra spaces or additional levels created by
`.` separators).
+- Creating an inverted index on Variant data type will generate empty index
files if none of the data fields meet the index conditions; this is expected
behavior.
+
+**Permissions**
+
+- The permission required for SHOW TRANSACTION has changed: it now requires
LOAD_PRIV on the target database instead of ADMIN_PRIV.
+- SHOW FRONTENDS / BACKENDS and the NODE RESTful API now use the same
permission. These interfaces now require SELECT_PRIV in the information_schema
database.
+
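+The grants below sketch how a non-admin account could be given the privileges
named above. The user names and `db1` are hypothetical, and `internal` refers
to the built-in catalog.
+
+```SQL
+-- Let etl_user run SHOW TRANSACTION against db1
+GRANT LOAD_PRIV ON internal.db1.* TO 'etl_user'@'%';
+-- Let ops_user run SHOW FRONTENDS / SHOW BACKENDS
+GRANT SELECT_PRIV ON internal.information_schema.* TO 'ops_user'@'%';
+```
+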
+**Get Started with Apache Doris 3.1**
+
+Even before the official release of version 3.1, several features for
semi-structured data and data lakes had been validated in real-world
production scenarios and showed the expected performance improvements. We
encourage users with relevant needs to try the new version:
+
+**[Download Apache Doris 3.1 on
GitHub](https://github.com/apache/doris/releases)**
+
+**[Download from the official website](https://doris.apache.org/download)**
+
+**Acknowledgements**
+
+We sincerely thank everyone who contributed to the development and testing of
this release and who provided feedback:
+
+@924060929 @airborne12 @amorynan @BePPPower @BiteTheDDDDt @bobhan1 @CalvinKirs
@cambyzju @cjj2010 @csun5285 @DarvenDuan @dataroaring @deardeng @dtkavin
@dwdwqfwe @eldenmoon @englefly @feifeifeimoon @feiniaofeiafei @felixwluo
@freemandealer @Gabriel39 @gavinchou @ghkang98 @gnehil @gohalo @HappenLee
@heguanhui @hello-stephen @HonestManXin @htyoung @hubgeter @hust-hhb @jacktengg
@jeffreys-cat @Jibing-Li @JNSimba @kaijchen @kaka11chen @KeeProMise @koarz
@liaoxin01 @liujiwen-up @liutang123 @l [...]
\ No newline at end of file