EasonFeng5870 commented on a change in pull request #6193:
URL: https://github.com/apache/shardingsphere/pull/6193#discussion_r448301275



##########
File path: docs/blog/content/material/database.en.md
##########
@@ -4,4 +4,296 @@ weight = 6
 chapter = true
 +++
 
-## TODO
+## How we build a distributed database
+
+Author | Liang Zhang
+
+  In the past few decades, relational database continuously occupy an absolute 
dominant position in the field of databases. The advantages of relational 
database, like stability, safety, ease of use that are the cornerstone of a 
modern system. With the rapid development of the Internet, databases built on 
stand-alone systems are no longer able to meet higher and higher concurrent 
requests and larger and larger data storage requirements. Therefore, 
distributed databases are more and more widely adopted.
+
+  All along, the database field has been dominated by Western technology 
companies and communities. Nowadays, more and more domestic database solutions 
take distributed as the fulcrum, and gradually make achievements in this field. 
Apache ShardingSphere is one of the distributed database solutions and the only 
database middleware in the Apache Software Foundation so far.
+
+### 1 Background
+
+  Fully compatible with SQL and transactions for traditional relational 
databases, and naturally friendly to distribution, is the design goal of 
distributed database solutions. Its core functions are mainly concentrated in 
the following points:
+
+- Distributed storage: Data storage is not limited by the disk capacity of a 
single machine, and the storage capacity can be improved by increasing the 
number of data servers;
+    
+- Separation of computing and storage: Computing nodes are stateless and can 
increase computing power through horizontal expansion. Storage nodes and 
computing nodes can be optimized hierarchically;
+  
+- Distributed transaction: A high-performance, distributed transaction 
processing engine that fully supports the original meaning of local 
transactions ACID;
+    
+- Elastic scaling: You can dynamically expand and shrink data storage nodes 
anytime, anywhere without affecting existing applications;
+    
+- Multiple data copies: Automatically copy the data to multiple copies across 
datacenters in a strong and consistent manner to ensure the absolute security 
of the data;
+    
+- HTAP: The same set of products is used to mix transactional operations of 
OLTP and analytical operations of OLAP.
+    
+
+The implementation solutions of distributed database can be divided into 
aggressive and stable. The aggressive implementation solution refers to the 
development of a new architecture of NewSQL. Such products are in pursuit of 
higher performance in exchange for the lack of stability and the lack of 
experience in operation and maintenance; the stable implementation solution 
refers to the middleware that provides incremental capabilities based on the 
existing database. Such products sacrifice some performance to ensure the 
stability of the database and reuse of operation and maintenance experience.
+
+### 2 Architecture
+
+Apache ShardingSphere is an ecosystem of open source distributed database 
middleware solutions, It consists of three independent products, Sharding-JDBC, 
Sharding-Proxy, and Sharding-Sidecar (planned). They all provide functions such 
as standardized data sharding, distributed transactions, and distributed 
governance, and can be applied to various diverse application scenarios such as 
Java isomorphism, heterogeneous languages, and cloud native. With the 
continuous exploration of Apache ShardingSphere in query optimizer and 
distributed transaction engine, it has gradually broken the product boundary of 
the implementation solution and evolved into an aggressive and stable 
platform-level solution.
+
+**Sharding-JDBC**
+
+Positioned as a lightweight Java framework, additional services provided in 
Java's JDBC layer. It uses the client to connect directly to the database and 
provides services in the form of jar packages without additional deployment and 
dependencies. It can be understood as an enhanced version of the JDBC driver 
and is fully compatible with JDBC and various ORM frameworks.
+
+![](https://shardingsphere.apache.org/blog/img/database1.jpg)
+
+**Sharding-Proxy**
+
+Positioned as a transparent database agent, it provides a server-side version 
that encapsulates the database binary protocol to complete support for 
heterogeneous languages. Currently, MySQL and PostgreSQL versions are 
available. It can use any access client compatible with MySQL and PostgreSQL 
protocol (such as: MySQL Command Client, MySQL Workbench, Navicat, etc.) to 
operate data, which is more friendly to DBA.
+
+![](https://shardingsphere.apache.org/blog/img/database2.jpg)
+
+**Sharding-Sidecar(Planned)**
+
+Positioned as a cloud-native database agent for Kubernetes, it will proxy all 
access to the database in the form of Sidecar. Through a non-central, 
zero-intrusion solution, it provides the meshing layer that interacts with the 
database, called as Database Mesh, which can also be called a database grid.
+
+![](https://shardingsphere.apache.org/blog/img/database3.jpg)
+
+**Hybrid architecture with separate computing and storage**
+
+Sharding-JDBC adopts a decentralized architecture, suitable for 
high-performance lightweight OLTP applications developed in Java; 
Sharding-Proxy provides static entry and heterogeneous language support, 
suitable for OLAP applications and management and operation and maintenance of 
sharded databases scene.
+
+Each architecture solution has its own advantages and disadvantages. The 
following table compares the advantages and disadvantages of various 
architecture models in different scenarios:
+
+![](https://shardingsphere.apache.org/blog/img/database4.jpg)
+
+Apache ShardingSphere is an ecosystem composed of multiple access points. By 
mixing Sharding-JDBC and Sharding-Proxy, and using the same configuration 
center to configure the sharding strategy uniformly, it is possible to flexibly 
build application systems suitable for various scenarios, allowing architects 
to freely adjust the best system suitable for the current business Architecture.
+
+![](https://shardingsphere.apache.org/blog/img/database5.jpg)
+
+Apache ShardingSphere adopts Share Nothing architecture, and its JDBC and 
Proxy access endpoints both adopt a stateless design. As a computing node, 
Apache ShardingSphere is responsible for the final calculation and summary of 
the acquired data. Since it does not store data itself, Apache ShardingSphere 
can push the calculation down to the data node to take full advantage of the 
database's own computing power. Apache ShardingSphere can increase the 
computing power by increasing the number of deployed nodes; increase the 
storage capacity by increasing the number of database nodes.
+
+### 3 Core functions
+
+Data sharding, distributed transactions, elastic scaling, and distributed 
governance are the four core functions of Apache ShardingSphere at the current 
stage.
+
+#### Data sharding
+
+Divide and conquer is the solution used by Apache ShardingSphere to process 
massive data. Apache ShardingSphere enables distributed storage capabilities in 
databases through data sharding solutions.
+
+It can automatically route SQL to the corresponding data node according to the 
user's preset sharding algorithm to achieve the purpose of operating multiple 
databases. Users can use multiple databases managed by Apache ShardingSphere 
like a stand-alone database. Currently supports MySQL, PostgreSQL, Oracle, 
SQLServer and any database that supports SQL92 standard and JDBC standard 
protocol. The core flow of data sharding is shown in the figure below:
+
+![](https://shardingsphere.apache.org/blog/img/database6.jpg)
+
+The main process is as follows:
+
+1. Obtain the SQL and parameters input by the user by parsing the database 
protocol package or JDBC driver;
+    
+2. Parse SQL into AST (Abstract Syntax Tree) according to lexical analyzer and 
grammar analyzer, and extract the information required for fragmentation;
+    
+3. Match the shard key according to the user preset algorithm and calculate 
the routing path;

Review comment:
       releated also changed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to