This is an automated email from the ASF dual-hosted git repository.
kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new f97c83201ec [update] Update alt description of images for seo (#2244)
f97c83201ec is described below
commit f97c83201ec0b0fcc63672eeec89cefeaafa91a9
Author: KassieZ <[email protected]>
AuthorDate: Mon Mar 31 15:33:14 2025 +0800
[update] Update alt description of images for seo (#2244)
## Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/admin-manual/auth/authorization/ranger.md | 14 +++++++-------
docs/admin-manual/auth/ranger.md | 18 +++++++++---------
docs/admin-manual/maint-monitor/monitor-alert.md | 8 ++++----
.../trouble-shooting/memory-management/overview.md | 4 ++--
docs/admin-manual/workload-management/compute-group.md | 2 +-
.../admin-manual/workload-management/resource-group.md | 2 +-
docs/admin-manual/workload-management/spill-disk.md | 5 +++--
.../admin-manual/workload-management/workload-group.md | 2 +-
docs/benchmark/tpcds.md | 2 +-
docs/benchmark/tpch.md | 2 +-
docs/compute-storage-decoupled/overview.md | 6 +++---
docs/data-operate/import/group-commit-manual.md | 4 ++--
.../import/import-way/stream-load-manual.md | 2 +-
docs/db-connect/arrow-flight-sql-connect.md | 2 +-
docs/db-connect/database-connect.md | 8 ++++----
docs/ecosystem/flink-doris-connector.md | 2 +-
docs/gettingStarted/what-is-apache-doris.md | 12 ++++++------
.../integrated-storage-compute-deploy-manually.md | 2 +-
docs/lakehouse/lakehouse-overview.md | 8 ++++----
docs/log-storage-analysis.md | 4 ++--
docs/releasenotes/v2.0/release-2.0.0.md | 6 +++---
.../version-2.1/gettingStarted/what-is-apache-doris.md | 12 ++++++------
.../version-2.1/table-design/index/bloomfilter.md | 2 +-
.../version-3.0/compute-storage-decoupled/overview.md | 6 +++---
.../import/import-way/stream-load-manual.md | 2 +-
.../version-3.0/gettingStarted/what-is-apache-doris.md | 10 +++++-----
.../lakehouse/lakehouse-best-practices/doris-paimon.md | 2 +-
versioned_docs/version-3.0/log-storage-analysis.md | 2 +-
.../version-3.0/table-design/index/bloomfilter.md | 2 +-
29 files changed, 77 insertions(+), 76 deletions(-)
diff --git a/docs/admin-manual/auth/authorization/ranger.md
b/docs/admin-manual/auth/authorization/ranger.md
index 7820e2f9265..2dd1f4eeac2 100644
--- a/docs/admin-manual/auth/authorization/ranger.md
+++ b/docs/admin-manual/auth/authorization/ranger.md
@@ -80,41 +80,41 @@ Equivalent to the internal Doris authorization statement
`grant select_priv on *
- The global option can be found in the dropdown box at the same level as the
catalog.
- Only `*` can be entered in the input box.
- 
+ 
#### Catalog Permissions
Equivalent to the internal Doris authorization statement `grant select_priv on
hive.*.* to user1`;
-
+
#### Database Permissions
Equivalent to the internal Doris authorization statement `grant select_priv on
hive.db1.* to user1`;
-
+
#### Table Permissions
> Here, the term "table" generally refers to tables, views, and asynchronous
> materialized views.
Equivalent to the internal Doris authorization statement `grant select_priv on
hive.db1.tbl1 to user1`;
-
+
#### Column Permissions
Equivalent to the internal Doris authorization statement `grant
select_priv(col1,col2) on hive.db1.tbl1 to user1`;
-
+
#### Resource Permissions
Equivalent to the internal Doris authorization statement `grant usage_priv on
resource 'resource1' to user1`;
- The resource option can be found in the dropdown box at the same level as
the catalog.
-
+
#### Workload Group Permissions
Equivalent to the internal Doris authorization statement `grant usage_priv on
workload group 'group1' to user1`;
- The workload group option can be found in the dropdown box at the same level
as the catalog.
-
+
### Row-Level Permissions Example
diff --git a/docs/admin-manual/auth/ranger.md b/docs/admin-manual/auth/ranger.md
index 2a761efc2a3..f6bc3989ca1 100644
--- a/docs/admin-manual/auth/ranger.md
+++ b/docs/admin-manual/auth/ranger.md
@@ -114,11 +114,11 @@ In version 2.1.0, Doris supports unified permission
management by integrating Ap
After the installation is complete, open the Ranger WebUI and you can see the
Apache Doris plug-in in the Service Manager interface:
-
+
Click the `+` button next to the plugin to add a Doris service:
-
+
The meaning of some parameters of Config Properties is as follows:
@@ -248,39 +248,39 @@ Equivalent to Doris' internal authorization statement
`grant select_priv on *.*.
- The global option can be found in the dropdown menu of the same level in the
catalog
- Only `*` can be entered in the input box
- 
+ 
#### Catalog Privileges
Equivalent to Doris' internal authorization statement `grant select_priv on
hive.*.* to user1`;
-
+
#### Database Privileges
Equivalent to Doris' internal authorization statement `grant select_priv on
hive.tpch.* to user1`;
-
+
#### Table Privileges
Equivalent to Doris' internal authorization statement `grant select_priv on
hive.tpch.user to user1`;
-
+
#### Column Privileges
Equivalent to Doris' internal authorization statement `grant
select_priv(name,age) on hive.tpch.user to user1`;
-
+
#### Resource Privileges
Equivalent to Doris' internal authorization statement `grant usage_priv on
resource 'resource1' to user1`;
- The resource option can be found in the dropdown menu of the same level in
the catalog
-
+
#### Workload Group Privileges
Equivalent to Doris' internal authorization statement `grant usage_priv on
workload group 'group1' to user1`;
- The workload group option can be found in the dropdown menu of the same
level in the catalog
-
+
### Row Policy Example
diff --git a/docs/admin-manual/maint-monitor/monitor-alert.md
b/docs/admin-manual/maint-monitor/monitor-alert.md
index 6bde0166c1e..3f713d42d77 100644
--- a/docs/admin-manual/maint-monitor/monitor-alert.md
+++ b/docs/admin-manual/maint-monitor/monitor-alert.md
@@ -42,7 +42,7 @@ Welcome to provide better dashboard.
Doris uses [Prometheus](https://prometheus.io/) and
[Grafana](https://grafana.com/) to collect and display input monitoring items.
-
+
1. Prometheus
@@ -263,7 +263,7 @@ Here we briefly introduce Doris Dashboard. The content of
Dashboard may change w
1. Top Bar
- 
+ 
* The upper left corner is the name of Dashboard.
* The upper right corner shows the current monitoring time range. You
can choose a different time range from the dropdown. You can also specify a
regular page refresh interval.
@@ -275,7 +275,7 @@ Here we briefly introduce Doris Dashboard. The content of
Dashboard may change w
2. Row.
- 
+ 
In Grafana, the concept of Row is a set of graphs. As shown in the
figure above, Overview and Cluster Overview are two different Rows. Row can be
folded by clicking Row. Currently Dashboard has the following Rows (in
continuous updates):
@@ -288,7 +288,7 @@ Here we briefly introduce Doris Dashboard. The content of
Dashboard may change w
3. Charts
- 
+ 
A typical chart is divided into the following parts:
diff --git a/docs/admin-manual/trouble-shooting/memory-management/overview.md
b/docs/admin-manual/trouble-shooting/memory-management/overview.md
index 7a65b95c6f0..8a47194c28a 100644
--- a/docs/admin-manual/trouble-shooting/memory-management/overview.md
+++ b/docs/admin-manual/trouble-shooting/memory-management/overview.md
@@ -32,7 +32,7 @@ When facing complex calculations and large-scale operations
with huge memory res
## Doris BE memory structure
-
+
```
Server physical memory: The physical memory used by all processes on the
server, MemTotal seen by `cat /proc/meminfo` or `free -h`.
@@ -97,7 +97,7 @@ For more information about Memory Tracker, refer to [Memory
Tracker](./memory-fe
Historical memory statistics can be viewed through Doris BE's Bvar page
`http://{be_host}:{brpc_port}/vars/*memory_*`. Use the real-time memory
statistics page `http://{be_host}:{be_web_server_port}/mem_tracker` to search
for the Bvar page under the Memory Tracker Label to get the memory size change
trend tracked by the corresponding Memory Tracker. `brpc_port` defaults to 8060.
-
+
When the error process memory exceeds the limit or the available memory is
insufficient, you can find the `Memory Tracker Summary` in the `be/log/be.INFO`
log, which contains all the Memory Trackers of `Type=overview` and
`Type=global`, to help users analyze the memory status at that time. For
details, please refer to [Memory Log
Analysis](./memory-analysis/memory-log-analysis.md)
diff --git a/docs/admin-manual/workload-management/compute-group.md
b/docs/admin-manual/workload-management/compute-group.md
index e31257384c8..24a246d5087 100644
--- a/docs/admin-manual/workload-management/compute-group.md
+++ b/docs/admin-manual/workload-management/compute-group.md
@@ -26,7 +26,7 @@ under the License.
Compute Group is a mechanism for physical isolation between different
workloads in a storage-compute separation architecture. The basic principle of
Compute Group is illustrated in the diagram below:
-
+
- One or more BE nodes can form a Compute Group.
diff --git a/docs/admin-manual/workload-management/resource-group.md
b/docs/admin-manual/workload-management/resource-group.md
index 2991a25a1ed..c656ce9db21 100644
--- a/docs/admin-manual/workload-management/resource-group.md
+++ b/docs/admin-manual/workload-management/resource-group.md
@@ -26,7 +26,7 @@ under the License.
Resource Group is a mechanism under the compute-storage integration
architecture that achieves physical isolation between different workloads. Its
basic principle is illustrated in this diagram:
-
+
- By using tags, BEs are divided into different groups, each identified by the
tag's name. For example, in the diagram above, host1, host2, and host3 are all
set to group a, while host4 and host5 are set to group b.
diff --git a/docs/admin-manual/workload-management/spill-disk.md
b/docs/admin-manual/workload-management/spill-disk.md
index 809a7e8b950..3a5f5dfec0e 100644
--- a/docs/admin-manual/workload-management/spill-disk.md
+++ b/docs/admin-manual/workload-management/spill-disk.md
@@ -43,9 +43,10 @@ Currently, the operators that support spilling include:
When a query triggers spilling, additional disk read/write operations may
significantly increase query time. It is recommended to increase the FE Session
variable query_timeout. Additionally, spilling can generate significant disk
I/O, so it is advisable to configure a separate disk directory or use SSD disks
to reduce the impact of query spilling on normal data ingestion or queries. The
query spilling feature is currently disabled by default.
-##Memory Management Mechanism
+## Memory Management Mechanism
Doris's memory management is divided into three levels: process level,
Workload Group level, and Query level.
-
+
+
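The three-level hierarchy described here can be sketched as a chain of nested limits checked top-down. This is an illustrative Python model only, not Doris's implementation; the class, names, and limit values are made up for the example:

```python
# Illustrative sketch (not Doris source): an allocation must fit at the query
# level, the Workload Group level, and the BE process level before it succeeds.
class MemLimit:
    def __init__(self, name, limit_bytes, parent=None):
        self.name, self.limit, self.used, self.parent = name, limit_bytes, 0, parent

    def try_alloc(self, n):
        # Walk up the chain: reject if any level would exceed its limit.
        node = self
        while node:
            if node.used + n > node.limit:
                return False  # here Doris would cancel the query or spill
            node = node.parent
        node = self
        while node:  # commit the allocation at every level
            node.used += n
            node = node.parent
        return True

process = MemLimit("be_process", 8 << 30)              # mem_limit in be.conf
group = MemLimit("workload_group_a", 4 << 30, process)
query = MemLimit("query_1", 2 << 30, group)

print(query.try_alloc(1 << 30))  # True: fits at all three levels
print(query.try_alloc(2 << 30))  # False: exceeds the 2 GiB query limit
```

The key property the sketch shows is that a query can be rejected by any ancestor limit, not only its own.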
### BE Process Memory Configuration
The memory of the entire BE process is controlled by the mem_limit parameter
in be.conf. Once Doris's memory usage exceeds this threshold, Doris cancels the
current query that is requesting memory. Additionally, a background task
asynchronously kills some queries to release memory or cache. Therefore,
Doris's internal management operations (such as spilling to disk, flushing
memtable, etc.) need to run when approaching this threshold to avoid reaching
it. Once the threshold is reached, t [...]
diff --git a/docs/admin-manual/workload-management/workload-group.md
b/docs/admin-manual/workload-management/workload-group.md
index 1d24e813e77..0d52f52e00b 100644
--- a/docs/admin-manual/workload-management/workload-group.md
+++ b/docs/admin-manual/workload-management/workload-group.md
@@ -29,7 +29,7 @@ Workload Group is an in-process mechanism for isolating
workloads.
It achieves resource isolation by finely partitioning or limiting resources
(CPU, IO, Memory) within the BE process.
Its principle is illustrated in the diagram below:
-
+
The currently supported isolation capabilities include:
diff --git a/docs/benchmark/tpcds.md b/docs/benchmark/tpcds.md
index 149421addcf..47f5d0d83d8 100644
--- a/docs/benchmark/tpcds.md
+++ b/docs/benchmark/tpcds.md
@@ -35,7 +35,7 @@ This document mainly introduces the performance of Doris on
the TPC-DS 1000G tes
Using the 99 queries of the TPC-DS standard test data set, we ran a comparison
test between Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1.
-
+
## 1. Hardware Environment
diff --git a/docs/benchmark/tpch.md b/docs/benchmark/tpch.md
index 6d5e2229d06..9d3a6e9b118 100644
--- a/docs/benchmark/tpch.md
+++ b/docs/benchmark/tpch.md
@@ -32,7 +32,7 @@ This document mainly introduces the performance of Doris on
the TPC-H 1000G test
Using the 22 queries of the TPC-H standard test data set, we ran a comparison
test between Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1.
-
+
## 1. Hardware Environment
diff --git a/docs/compute-storage-decoupled/overview.md
b/docs/compute-storage-decoupled/overview.md
index 18b6ee343a3..54aa05bba2b 100644
--- a/docs/compute-storage-decoupled/overview.md
+++ b/docs/compute-storage-decoupled/overview.md
@@ -32,17 +32,17 @@ The following sections will describe in detail how to
deploy and use Apache Dori
The overall architecture of Doris consists of two types of processes: Frontend
(FE) and Backend (BE). The FE is primarily responsible for user request access,
query parsing and planning, metadata management, and node management. The BE is
responsible for data storage and query plan execution. ([More
information](../gettingStarted/what-is-apache-doris))
-### **Compute-storage coupled**
+### Compute-storage coupled
In the compute-storage coupled mode, the BE nodes perform both data storage
and computation, and multiple BE nodes form a massively parallel processing
(MPP) distributed computing architecture.
-
+
### **Compute-storage decoupled**
The BE nodes no longer store the primary data. Instead, the shared storage
layer serves as the unified primary data storage. Additionally, to overcome the
performance loss caused by the limitations of the underlying object storage
system and the overhead of network transmission, Doris introduces a high-speed
cache on the local compute nodes.
-
+
**Meta data layer:**
diff --git a/docs/data-operate/import/group-commit-manual.md
b/docs/data-operate/import/group-commit-manual.md
index d87ec074069..315b10359e3 100644
--- a/docs/data-operate/import/group-commit-manual.md
+++ b/docs/data-operate/import/group-commit-manual.md
@@ -609,8 +609,8 @@ PROPERTIES (
JMeter Parameter Settings as Shown in the Images
-
-
+
+
1. Set the Init Statement Before Testing:
diff --git a/docs/data-operate/import/import-way/stream-load-manual.md
b/docs/data-operate/import/import-way/stream-load-manual.md
index 9763a166814..e986b081936 100644
--- a/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/data-operate/import/import-way/stream-load-manual.md
@@ -54,7 +54,7 @@ When using Stream Load, it is necessary to initiate an import
job through the HT
The following figure shows the main flow of Stream Load, omitting some import
details.
-
+
1. The client submits a Stream Load import job request to the FE (Frontend).
2. The FE selects a BE (Backend) as the Coordinator node in a round-robin
manner, which is responsible for scheduling the import job, and then returns an
HTTP redirect to the client.
diff --git a/docs/db-connect/arrow-flight-sql-connect.md
b/docs/db-connect/arrow-flight-sql-connect.md
index 949f4a70631..d2ba18e81ce 100644
--- a/docs/db-connect/arrow-flight-sql-connect.md
+++ b/docs/db-connect/arrow-flight-sql-connect.md
@@ -30,7 +30,7 @@ Since Doris 2.1, a high-speed data link based on the Arrow
Flight SQL protocol h
In Doris, query results are organized in columnar format as Blocks. In
versions prior to 2.1, data could be transferred to the target client via MySQL
Client or JDBC/ODBC drivers, but this required deserializing row-based Bytes
into columnar format. By building a high-speed data transfer link based on
Arrow Flight SQL, if the target client also supports Arrow columnar format, the
entire transfer process avoids serialization and deserialization operations,
completely eliminating the time [...]
-
+
To install Apache Arrow, you can find detailed installation instructions in
the official documentation [Apache Arrow](https://arrow.apache.org/install/).
For more information on how Doris implements the Arrow Flight protocol, you can
refer to [Doris support Arrow Flight SQL
protocol](https://github.com/apache/doris/issues/25514).
diff --git a/docs/db-connect/database-connect.md
b/docs/db-connect/database-connect.md
index 7032e6ac787..54bd17cd120 100644
--- a/docs/db-connect/database-connect.md
+++ b/docs/db-connect/database-connect.md
@@ -83,11 +83,11 @@
jdbc:mysql://FE_IP:FE_PORT/demo?sessionVariables=key1=val1,key2=val2
Create a MySQL connection to Apache Doris:
-
+
Query in DBeaver:
-
+
## Built-in Web UI of Doris
@@ -97,7 +97,7 @@ To access the Web UI, simply enter the URL in a web browser:
http://fe_ip:fe_por
The built-in Web console is primarily intended for use by the root account of
the cluster. By default, the root account password is empty after installation.
-
+
For example, you can execute the following command in the Playground to add a
BE node.
@@ -105,7 +105,7 @@ For example, you can execute the following command in the
Playground to add a BE
ALTER SYSTEM ADD BACKEND "be_host_ip:heartbeat_service_port";
```
-
+
:::tip
For successful execution of statements that are not related to specific
databases/tables in the Playground, it is necessary to first select an
arbitrary database from the left-hand database panel. This limitation will be
removed later.
diff --git a/docs/ecosystem/flink-doris-connector.md
b/docs/ecosystem/flink-doris-connector.md
index f12a93a2f6b..c8ef9ed37bf 100644
--- a/docs/ecosystem/flink-doris-connector.md
+++ b/docs/ecosystem/flink-doris-connector.md
@@ -76,7 +76,7 @@ To use it with Maven, simply add the following dependency to
your Pom file:
### Reading Data from Doris
-
+
When reading data, Flink Doris Connector offers higher performance compared to
Flink JDBC Connector and is recommended for use:
diff --git a/docs/gettingStarted/what-is-apache-doris.md
b/docs/gettingStarted/what-is-apache-doris.md
index 112e86d8463..924f97a154d 100644
--- a/docs/gettingStarted/what-is-apache-doris.md
+++ b/docs/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in
production environments o
As shown in the figure below, after various data integrations and processing,
data sources are typically ingested into the real-time data warehouse Doris and
offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in
OLAP analysis scenarios.
-
+
Apache Doris is widely used in the following scenarios:
@@ -74,7 +74,7 @@ The storage-compute integrated architecture of Apache Doris
is streamlined and e
- **Backend (BE):** Primarily responsible for data storage and query
execution. Data is partitioned into shards and stored with multiple replicas
across BE nodes.
-
+
In a production environment, multiple FE nodes can be deployed for disaster
recovery. Each FE node maintains a full copy of the metadata. The FE nodes are
divided into three roles:
@@ -96,7 +96,7 @@ Starting from version 3.0, a compute-storage decoupled
deployment architecture c
- **Storage Layer**: The storage layer can use shared storage solutions such
as S3, HDFS, OSS, COS, OBS, Minio, and Ceph to store Doris's data files,
including Segment files and inverted index files.
-
+
## Core Features of Apache Doris
@@ -146,15 +146,15 @@ Apache Doris also supports strongly consistent
single-table materialized views a
Apache Doris has an MPP-based query engine for parallel execution between and
within nodes. It supports distributed shuffle join for large tables to better
handle complicated queries.
-
+
The query engine of Apache Doris is fully vectorized, with all memory
structures laid out in a columnar format. This largely reduces virtual
function calls, increases cache hit rates, and makes efficient use of SIMD
instructions. Apache Doris delivers 5 to 10 times higher performance in wide
table aggregation scenarios than non-vectorized engines.
-
+
Apache Doris uses adaptive query execution technology to dynamically adjust
the execution plan based on runtime statistics. For example, it can generate a
runtime filter and push it to the probe side. Specifically, it pushes the
filters to the lowest-level scan node on the probe side, which largely reduces
the data amount to be processed and increases join performance. The runtime
filter of Apache Doris supports In/Min/Max/Bloom Filter.
-
+
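The runtime-filter idea above can be sketched in a few lines: the join's build side finishes first, its keys become an In-type filter, and the filter is applied at the probe side's scan so non-matching rows are dropped before the join. This is a hedged toy model, not Doris internals; the function and data are invented for illustration:

```python
# Toy In-filter runtime-filter sketch (not the Doris implementation).
def hash_join_with_runtime_filter(build_rows, probe_rows, key):
    build_keys = {key(r) for r in build_rows}   # 1. build side completes first
    in_filter = build_keys                      # 2. key set becomes the filter
    # 3. filter pushed to the probe-side scan: drop rows before the join
    scanned = [r for r in probe_rows if key(r) in in_filter]
    # 4. the actual join now sees far fewer probe rows
    return [(b, p) for p in scanned for b in build_rows if key(b) == key(p)]

small = [("a", 1), ("b", 2)]                        # build (small) side
big = [("a", 10), ("c", 30), ("b", 20), ("d", 40)]  # probe (large) side
print(hash_join_with_runtime_filter(small, big, key=lambda r: r[0]))
# → [(('a', 1), ('a', 10)), (('b', 2), ('b', 20))]
```

Min/Max and Bloom-filter variants follow the same pattern; only the membership test at step 3 changes.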
Apache Doris uses a Pipeline execution engine that breaks down queries into
multiple sub-tasks for parallel execution, fully leveraging multi-core CPU
capabilities. It simultaneously addresses the thread explosion problem by
limiting the number of query threads. The Pipeline execution engine reduces
data copying and sharing, optimizes sorting and aggregation operations, thereby
significantly improving query efficiency and throughput.
diff --git
a/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
b/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
index 9c0ebc341aa..be80f0b9b65 100644
--- a/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
+++ b/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
@@ -27,7 +27,7 @@ After completing the preliminary checks and planning, such as
environment checks
The integrated storage-compute architecture is shown below, and the deployment
of the integrated storage-compute cluster involves four steps:
-[integrated-storage-compute-architecture](/images/getting-started/apache-doris-technical-overview.png)
+[MPP-based integrated storage compute
architecture](/images/getting-started/apache-doris-technical-overview.png)
1. **Deploy FE Master Node**: Deploy the first FE node as the Master node;
diff --git a/docs/lakehouse/lakehouse-overview.md
b/docs/lakehouse/lakehouse-overview.md
index 0dbc631b480..9f5cb882118 100644
--- a/docs/lakehouse/lakehouse-overview.md
+++ b/docs/lakehouse/lakehouse-overview.md
@@ -30,7 +30,7 @@ under the License.
Doris provides an excellent lakehouse solution for users through an extensible
connector framework, a compute-storage decoupled architecture, a
high-performance data processing engine, and data ecosystem openness.
-
+
### Flexible Data Access
@@ -137,7 +137,7 @@ In the lakehouse solution, Doris is mainly used for
**lakehouse query accelerati
In this scenario, Doris acts as a **compute engine**, accelerating query
analysis on lakehouse data.
-
+
#### Cache Acceleration
@@ -153,7 +153,7 @@ This feature can significantly improve query performance by
reducing runtime com
Doris can act as a **unified SQL query engine**, connecting different data
sources for federated analysis, solving data silos.
-
+
Users can dynamically create multiple catalogs in Doris to connect different
data sources. They can use SQL statements to perform arbitrary join queries on
data from different data sources. For details, refer to the [Catalog
Overview](catalog-overview.md).
@@ -161,7 +161,7 @@ Users can dynamically create multiple catalogs in Doris to
connect different dat
In this scenario, **Doris acts as a data processing engine**, processing
lakehouse data.
-
+
#### Task Scheduling
diff --git a/docs/log-storage-analysis.md b/docs/log-storage-analysis.md
index 068786df212..cdf261395fb 100644
--- a/docs/log-storage-analysis.md
+++ b/docs/log-storage-analysis.md
@@ -40,7 +40,7 @@ Focused on this solution, this chapter contains the following
3 sections:
The following figure illustrates the architecture of the log storage and
analysis platform built on Apache Doris:
-
+
The architecture contains the following 3 parts:
@@ -577,7 +577,7 @@ ORDER BY ts DESC LIMIT 10;
Some third-party vendors offer visual log analysis development platforms based
on Apache Doris, which include a log search and analysis interface similar to
Kibana Discover. These platforms provide an intuitive, user-friendly
interactive log exploration experience.
-
+
- Support for full-text search and SQL modes
diff --git a/docs/releasenotes/v2.0/release-2.0.0.md
b/docs/releasenotes/v2.0/release-2.0.0.md
index 85d0ea43dab..782eaf7fbe5 100644
--- a/docs/releasenotes/v2.0/release-2.0.0.md
+++ b/docs/releasenotes/v2.0/release-2.0.0.md
@@ -44,7 +44,7 @@ This new version highlights:
In SSB-Flat and TPC-H benchmarking, Apache Doris 2.0.0 delivered **over
10-time faster query performance** compared to an early version of Apache Doris.
-
+
This is realized by the introduction of a smarter query optimizer, inverted
index, a parallel execution model, and a series of new functionalities to
support high-concurrency point queries.
@@ -54,7 +54,7 @@ The brand new query optimizer, Nereids, has a richer
statistical base and adopts
TPC-H tests showed that Nereids, with no human intervention, outperformed the
old query optimizer by a wide margin. Over 100 users have tried Apache Doris
2.0.0 in their production environment and the vast majority of them reported
huge speedups in query execution.
-
+
**Doc**: https://doris.apache.org/docs/dev/query-acceleration/nereids/
@@ -66,7 +66,7 @@ In Apache Doris 2.0.0, we introduced inverted index to better
support fuzzy keyw
A smartphone manufacturer tested Apache Doris 2.0.0 in their user behavior
analysis scenarios. With inverted index enabled, v2.0.0 was able to finish the
queries within milliseconds and maintain stable performance as the query
concurrency level went up. In this case, it is 5 to 90 times faster than its
old version.
-
+
### 20 times higher concurrency capability
diff --git a/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
b/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
index 4f3356dc923..bab0c6add59 100644
--- a/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
+++ b/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in
production environments o
As shown in the figure below, after various data integrations and processing,
data sources are typically ingested into the real-time data warehouse Doris and
offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in
OLAP analysis scenarios.
-
+
Apache Doris is widely used in the following scenarios:
@@ -73,7 +73,7 @@ The storage-compute integrated architecture of Apache Doris
is streamlined and e
- **Backend (BE):** Primarily responsible for data storage and query
execution. Data is partitioned into shards and stored with multiple replicas
across BE nodes.
-
+
In a production environment, multiple FE nodes can be deployed for disaster
recovery. Each FE node maintains a full copy of the metadata. The FE nodes are
divided into three roles:
@@ -133,15 +133,15 @@ Apache Doris also supports strongly consistent
single-table materialized views a
Apache Doris has an MPP-based query engine for parallel execution between and
within nodes. It supports distributed shuffle join for large tables to better
handle complicated queries.
-
+
The query engine of Apache Doris is fully vectorized, with all memory
structures laid out in a columnar format. This largely reduces virtual
function calls, increases cache hit rates, and makes efficient use of SIMD
instructions. Apache Doris delivers 5 to 10 times higher performance in wide
table aggregation scenarios than non-vectorized engines.
-
-
+
+
Apache Doris uses adaptive query execution technology to dynamically adjust
the execution plan based on runtime statistics. For example, it can generate a
runtime filter and push it to the probe side. Specifically, it pushes the
filters to the lowest-level scan node on the probe side, which largely reduces
the data amount to be processed and increases join performance. The runtime
filter of Apache Doris supports In/Min/Max/Bloom Filter.
-
+
Apache Doris uses a Pipeline execution engine that breaks down queries into
multiple sub-tasks for parallel execution, fully leveraging multi-core CPU
capabilities. It simultaneously addresses the thread explosion problem by
limiting the number of query threads. The Pipeline execution engine reduces
data copying and sharing, optimizes sorting and aggregation operations, thereby
significantly improving query efficiency and throughput.
diff --git a/versioned_docs/version-2.1/table-design/index/bloomfilter.md
b/versioned_docs/version-2.1/table-design/index/bloomfilter.md
index 49094e3b853..63552039260 100644
--- a/versioned_docs/version-2.1/table-design/index/bloomfilter.md
+++ b/versioned_docs/version-2.1/table-design/index/bloomfilter.md
@@ -38,7 +38,7 @@ A BloomFilter consists of a very long binary bit array and a
series of hash func
The figure below shows an example of a BloomFilter with m=18 and k=3 (where m
is the size of the bit array and k is the number of hash functions). Elements
x, y, and z in the set are hashed by 3 different hash functions into the bit
array. When querying element w, if any bit calculated by the hash functions is
0, then w is not in the set. Conversely, if all bits are 1, it only indicates
that w may be in the set, but not definitely, due to possible hash collisions.
-
+
Thus, if all bits at the calculated positions are 1, it only indicates that
the element may be in the set, not definitely, due to possible hash collisions.
This is the "false positive" nature of BloomFilter. Therefore, a
BloomFilter-based index can only skip data that does not meet the conditions
but cannot precisely locate data that does.
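The m=18, k=3 example above can be sketched directly. This is a toy model for illustration, not the BloomFilter Doris actually uses; the hashing scheme is an arbitrary choice:

```python
import hashlib

# Toy BloomFilter matching the text's example: an m-bit array and k hash
# functions (defaults m=18, k=3). Illustrative only, not Doris's implementation.
class BloomFilter:
    def __init__(self, m=18, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True only means "possibly present"
        # (hash collisions can produce false positives, never false negatives).
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for e in ("x", "y", "z"):
    bf.add(e)
print(all(bf.might_contain(e) for e in ("x", "y", "z")))  # True: no false negatives
```

Querying an element like "w" may return True even if it was never added; that false-positive behavior is exactly why a BloomFilter index can only skip data, not precisely locate it.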
diff --git a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
index 7f0b5cc1f75..25c71e65337 100644
--- a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
+++ b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
@@ -28,17 +28,17 @@ This article introduces the differences, advantages, and
applicable scenarios of
The following sections will describe in detail how to deploy and use Apache
Doris in the compute-storage decoupled mode. For information on deployment in
compute-storage coupled mode, please refer to the [Cluster
Deployment](../../../docs/install/deploy-manually/integrated-storage-compute-deploy-manually)
section.
-## **Compute-storage coupled VS decoupled**
+## Compute-storage coupled VS decoupled
The overall architecture of Doris consists of two types of processes: Frontend
(FE) and Backend (BE). The FE is primarily responsible for user request access,
query parsing and planning, metadata management, and node management. The BE is
responsible for data storage and query plan execution. ([More
information](../gettingStarted/what-is-apache-doris))
-### **Compute-storage coupled**
+### Compute-storage coupled
In the compute-storage coupled mode, the BE nodes perform both data storage
and computation, and multiple BE nodes form a massively parallel processing
(MPP) distributed computing architecture.

-### **Compute-storage decoupled**
+### Compute-storage decoupled
The BE nodes no longer store the primary data. Instead, the shared storage
layer serves as the unified primary data storage. Additionally, to overcome the
performance loss caused by the limitations of the underlying object storage
system and the overhead of network transmission, Doris introduces a high-speed
cache on the local compute nodes.
diff --git
a/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
b/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
index 9763a166814..988a8a38ba4 100644
---
a/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
+++
b/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
@@ -54,7 +54,7 @@ When using Stream Load, it is necessary to initiate an import job through the HT
The following figure shows the main flow of Stream Load, omitting some import details.
-
+
1. The client submits a Stream Load import job request to the FE (Frontend).
2. The FE selects a BE (Backend) as the Coordinator node in a round-robin manner, which is responsible for scheduling the import job, and then returns an HTTP redirect to the client.
diff --git a/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md b/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
index 15847a98acd..76b0afe75f6 100644
--- a/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
+++ b/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in production environments o
As shown in the figure below, after various data integrations and processing, data sources are typically ingested into the real-time data warehouse Doris and offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in OLAP analysis scenarios.
-
+
Apache Doris is widely used in the following scenarios:
@@ -74,7 +74,7 @@ The storage-compute integrated architecture of Apache Doris is streamlined and e
- **Backend (BE):** Primarily responsible for data storage and query execution. Data is partitioned into shards and stored with multiple replicas across BE nodes.
-
+
In a production environment, multiple FE nodes can be deployed for disaster recovery. Each FE node maintains a full copy of the metadata. The FE nodes are divided into three roles:
@@ -146,15 +146,15 @@ Apache Doris also supports strongly consistent single-table materialized views a
Apache Doris has an MPP-based query engine for parallel execution between and within nodes. It supports distributed shuffle join for large tables to better handle complicated queries.
-
+
The query engine of Apache Doris is fully vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual function calls, increase cache hit rates, and make efficient use of SIMD instructions. Apache Doris delivers 5~10 times higher performance in wide table aggregation scenarios than non-vectorized engines.
-
+
Apache Doris uses adaptive query execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the data amount to be processed and increases join performance. The runtime filter of Apache Doris supports In/Min/Max/Bloom Filter.
-
+
Apache Doris uses a Pipeline execution engine that breaks down queries into multiple sub-tasks for parallel execution, fully leveraging multi-core CPU capabilities. It also addresses the thread explosion problem by limiting the number of query threads. The Pipeline execution engine reduces data copying and sharing and optimizes sorting and aggregation operations, thereby significantly improving query efficiency and throughput.
diff --git a/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md b/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
index 8a7a5ac79f1..21756f9b7d8 100644
--- a/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
+++ b/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
@@ -226,7 +226,7 @@ mysql> select * from customer where c_nationkey=1 limit 2;
We conducted a simple test on the TPCDS 1000 dataset using Paimon 0.8, Apache Doris 2.1.5, and Trino 422, with the Primary Key Table Read Optimized feature enabled in both systems.
-
+
From the test results, it can be seen that Doris' average query performance on the standard static test set is 3-5 times that of Trino. In the future, we will optimize the Deletion Vector to further improve query efficiency in real business scenarios.
diff --git a/versioned_docs/version-3.0/log-storage-analysis.md b/versioned_docs/version-3.0/log-storage-analysis.md
index 74e10aae94d..3562be7f0ec 100644
--- a/versioned_docs/version-3.0/log-storage-analysis.md
+++ b/versioned_docs/version-3.0/log-storage-analysis.md
@@ -40,7 +40,7 @@ Focused on this solution, this chapter contains the following 3 sections:
The following figure illustrates the architecture of the log storage and analysis platform built on Apache Doris:
-
+
The architecture contains the following 3 parts:
diff --git a/versioned_docs/version-3.0/table-design/index/bloomfilter.md b/versioned_docs/version-3.0/table-design/index/bloomfilter.md
index 49094e3b853..245f446458c 100644
--- a/versioned_docs/version-3.0/table-design/index/bloomfilter.md
+++ b/versioned_docs/version-3.0/table-design/index/bloomfilter.md
@@ -38,7 +38,7 @@ A BloomFilter consists of a very long binary bit array and a series of hash func
The figure below shows an example of a BloomFilter with m=18 and k=3 (where m is the size of the bit array and k is the number of hash functions). Elements x, y, and z in the set are hashed by 3 different hash functions into the bit array. When querying element w, if any bit calculated by the hash functions is 0, then w is not in the set. Conversely, if all bits are 1, it only indicates that w may be in the set, but not definitely, due to possible hash collisions.
-
+
Thus, if all bits at the calculated positions are 1, it only indicates that the element may be in the set, not definitely, due to possible hash collisions. This is the "false positive" nature of BloomFilter. Therefore, a BloomFilter-based index can only skip data that does not meet the conditions but cannot precisely locate data that does.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]