This is an automated email from the ASF dual-hosted git repository.

benjobs pushed a commit to branch prefix
in repository https://gitbox.apache.org/repos/asf/incubator-streampark-website.git
commit 2d4b4e05271d056d64b025ab9b34e25c4401ea12
Author: benjobs <benj...@gmail.com>
AuthorDate: Mon Apr 29 23:25:27 2024 +0800

    [Improve] Apache projects add the prefix "Apache"
---
 docs/connector/2-jdbc.md                        |  2 +-
 docs/connector/3-clickhouse.md                  |  2 +-
 docs/connector/6-hbase.md                       | 36 ++++++++++++-------------
 docs/connector/7-http.md                        |  4 +--
 docs/flink-k8s/3-hadoop-resource-integration.md | 10 +++----
 docs/intro.md                                   |  4 +--
 docs/user-guide/1-deployment.md                 |  2 +-
 docs/user-guide/7-Variable.md                   |  2 +-
 8 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/docs/connector/2-jdbc.md b/docs/connector/2-jdbc.md
index 6326633..e6e0843 100755
--- a/docs/connector/2-jdbc.md
+++ b/docs/connector/2-jdbc.md
@@ -42,7 +42,7 @@ The parameter `semantic` is the semantics when writing in `JdbcSink`, only effec
 
 #### EXACTLY_ONCE
 
-If `JdbcSink` is configured with `EXACTLY_ONCE` semantics, the underlying two-phase commit implementation is used to complete the write, at this time to flink with `Checkpointing` to take effect, how to open checkpoint please refer to Chapter 2 on [checkpoint](/docs/model/conf) configuration section
+If `JdbcSink` is configured with `EXACTLY_ONCE` semantics, the underlying two-phase commit implementation is used to complete the write, at this time to Apache Flink with `Checkpointing` to take effect, how to open checkpoint please refer to Chapter 2 on [checkpoint](/docs/model/conf) configuration section
 
 #### AT_LEAST_ONCE && NONE
 
diff --git a/docs/connector/3-clickhouse.md b/docs/connector/3-clickhouse.md
index 56b8c58..5e1da68 100755
--- a/docs/connector/3-clickhouse.md
+++ b/docs/connector/3-clickhouse.md
@@ -147,7 +147,7 @@ $ echo 'INSERT INTO t VALUES (1),(2),(3)' | curl 'http://localhost:8123/' --data
 
 The operation of the above method is relatively simple. Sure java could also be used for writing. StreamPark adds many functions to the http post writing method, including encapsulation enhancement, adding cache, asynchronous writing, failure retry, and data backup after reaching the retry threshold,
-To external components (kafka, mysql, hdfs, hbase), etc., the above functions only need to define the configuration file in the prescribed format,
+To external components (Apache Kafka, MySQL, HDFS, Apache HBase), etc., the above functions only need to define the configuration file in the prescribed format,
 and write the code.
 
 ### Write to ClickHouse
 
diff --git a/docs/connector/6-hbase.md b/docs/connector/6-hbase.md
index 32f87b5..c18d35f 100755
--- a/docs/connector/6-hbase.md
+++ b/docs/connector/6-hbase.md
@@ -7,9 +7,9 @@ sidebar_position: 6
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-[Apache HBase](https://hbase.apache.org/book.html) is a highly reliable, high-performance, column-oriented, and scalable distributed storage system. Using HBase technology, large-scale structured storage clusters can be built on cheap PC Servers. Unlike general relational databases, HBase is a database suitable for unstructured data storage because HBase storage is based on a column rather than a row-based schema.
+[Apache HBase](https://hbase.apache.org/book.html) is a highly reliable, high-performance, column-oriented, and scalable distributed storage system. Using HBase technology, large-scale structured storage clusters can be built on cheap PC Servers. Unlike general relational databases, Apache HBase is a database suitable for unstructured data storage because HBase storage is based on a column rather than a row-based schema.
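As a side note on the `EXACTLY_ONCE` wording in the jdbc.md hunk above: the two-phase commit only takes effect once Flink checkpointing is enabled. A minimal sketch with the plain Apache Flink Scala API follows; the interval, object name, and placeholder pipeline are illustrative assumptions, not the StreamPark configuration-file approach the doc refers to.

```scala
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // EXACTLY_ONCE for a two-phase-commit sink only works with checkpointing enabled,
    // because commits are driven by completed checkpoints.
    env.enableCheckpointing(10000L, CheckpointingMode.EXACTLY_ONCE)
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(5000L)

    // Placeholder pipeline; a real job would end in a transactional sink such as a JDBC sink.
    env.fromElements("a", "b", "c").print()
    env.execute("checkpointing-sketch")
  }
}
```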
 
-Apache Flink does not officially provide a connector for HBase DataStream. Apache StreamPark encapsulates HBaseSource and HBaseSink based on `HBase-client`. It supports automatic connection creation based on configuration and simplifies development. StreamPark reading HBase can record the latest status of the read data when the checkpoint is enabled,
+Apache Flink does not officially provide a connector for HBase DataStream. Apache StreamPark encapsulates HBaseSource and HBaseSink based on `HBase-client`. It supports automatic connection creation based on configuration and simplifies development. StreamPark reading Apache HBase can record the latest status of the read data when the checkpoint is enabled,
 and the offset corresponding to the source can be restored through the data itself. Implement source-side AT_LEAST_ONCE.
 HBaseSource implements Flink Async I/O to improve streaming throughput. The sink side supports AT_LEAST_ONCE by default.
@@ -17,14 +17,14 @@ EXACTLY_ONCE is supported when checkpointing is enabled.
 :::tip hint
-StreamPark reading HBASE can record the latest state of the read data when checkpoint is enabled.
+StreamPark reading Apache HBase can record the latest state of the read data when checkpoint is enabled.
 Whether the previous state can be restored after the job is resumed depends entirely on whether the data itself has an offset identifier, which needs to be manually specified in the code.
 The recovery logic needs to be specified in the func parameter of the getDataStream method of HBaseSource.
 :::
 
-## Dependency of HBase writing
+## Dependency of Apache HBase writing
 
-HBase Maven Dependency:
+Apache HBase Maven Dependency:
 
 ```xml
 <dependency>
@@ -39,7 +39,7 @@ HBase Maven Dependency:
 </dependency>
 ```
 
-## Regular way to write and read HBase
+## Regular way to write and read Apache HBase
 ### 1.Create database and table
 create 'Student', {NAME => 'Stulnfo', VERSIONS => 3}, {NAME =>'Grades', BLOCKCACHE => true}
 ### 2.Write demo and Read demo
@@ -144,7 +144,7 @@ import org.slf4j.LoggerFactory;
 import java.io.IOException;
 
 /**
- * Desc: Read stream data, then write to HBase
+ * Desc: Read stream data, then write to Apache HBase
 */
 @Slf4j
 public class HBaseStreamWriteMain {
@@ -177,10 +177,10 @@ public class HBaseStreamWriteMain {
 }
 
 /**
-Write to HBase
+Write to Apache HBase
 Inherit RichSinkFunction to override the parent class method
 <p>
- When writing to hbase, 500 items are flushed once, inserted in batches, using writeBufferSize
+ When writing to Apache HBase, 500 items are flushed once, inserted in batches, using writeBufferSize
 */
 class HBaseWriter extends RichSinkFunction<String> {
     private static final Logger logger = LoggerFactory.getLogger(HBaseWriter.class);
@@ -236,16 +236,16 @@ class HBaseWriter extends RichSinkFunction<String> {
 </Tabs>
 
-Reading and writing HBase in this way is cumbersome and inconvenient. `StreamPark` follows the concept of convention over configuration and automatic configuration.
-Users only need to configure HBase connection parameters and Flink operating parameters. StreamPark will automatically assemble source and sink,
+Reading and writing Apache HBase in this way is cumbersome and inconvenient. `StreamPark` follows the concept of convention over configuration and automatic configuration.
+Users only need to configure Apache HBase connection parameters and Flink operating parameters. StreamPark will automatically assemble source and sink,
 which greatly simplifies development logic and improves development efficiency and maintainability.
 
-## write and read HBase with Apache StreamPark™
+## write and read Apache HBase with Apache StreamPark™
 
 ### 1. Configure policies and connection information
 
 ```yaml
-# hbase
+# apache hbase
 hbase:
   zookeeper.quorum: test1,test2,test6
   zookeeper.property.clientPort: 2181
@@ -255,12 +255,12 @@ hbase:
 ```
 
-### 2. Read and write HBase
+### 2. Read and write Apache HBase
 
-Writing to HBase with StreamPark is very simple, the code is as follows:
+Writing to Apache HBase with StreamPark is very simple, the code is as follows:
 
 <Tabs>
-<TabItem value="read HBase">
+<TabItem value="read Apache HBase">
 
 ```scala
@@ -317,7 +317,7 @@ object HBaseSourceApp extends FlinkStreaming {
 ```
 </TabItem>
-<TabItem value="write HBase">
+<TabItem value="write Apache HBase">
 
 ```scala
 import org.apache.streampark.flink.core.scala.FlinkStreaming
@@ -360,7 +360,7 @@ object HBaseSinkApp extends FlinkStreaming {
 </TabItem>
 </Tabs>
 
-When StreamPark writes to HBase, you need to create the method of HBaseQuery,
+When StreamPark writes to Apache HBase, you need to create the method of HBaseQuery,
 specify the method to convert the query result into the required object, identify whether it is running, and pass in the running parameters. details as follows
 
 ```scala
diff --git a/docs/connector/7-http.md b/docs/connector/7-http.md
index ef180a5..97e0822 100755
--- a/docs/connector/7-http.md
+++ b/docs/connector/7-http.md
@@ -9,11 +9,11 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
 Some background services receive data through HTTP requests. In this scenario, Apache Flink can write result data through HTTP
-requests. Currently, Flink officially does not provide a connector for writing data through HTTP requests. Apache StreamPark
+requests. Currently, Apache Flink officially does not provide a connector for writing data through HTTP requests. Apache StreamPark
 encapsulates HttpSink to write data asynchronously in real-time based on asynchttpclient.
 
 `HttpSink` writes do not support transactions, writing data to the target service provides AT_LEAST_ONCE semantics. Data
-that fails to be retried multiple times will be written to external components (kafka, mysql, hdfs, hbase), and the data
+that fails to be retried multiple times will be written to external components (Apache Kafka, MySQL, HDFS, Apache HBase), and the data
 will be restored manually to achieve final data consistency.
 
 ## http asynchronous write
diff --git a/docs/flink-k8s/3-hadoop-resource-integration.md b/docs/flink-k8s/3-hadoop-resource-integration.md
index c03274d..116e7b7 100644
--- a/docs/flink-k8s/3-hadoop-resource-integration.md
+++ b/docs/flink-k8s/3-hadoop-resource-integration.md
@@ -116,11 +116,11 @@ public static String getHadoopConfConfigMapName(String clusterId) {
 #### 2、Hive
 
- To sink data to hive, or use hive metastore as flink's metadata, it is necessary to open the path from flink to hive, which also needs to go through the following two steps:
+ To sink data to Apache Hive, or use hive metastore as flink's metadata, it is necessary to open the path from Apache Flink to Apache Hive, which also needs to go through the following two steps:
 
-##### i、Add hive related jars
+##### i、Add Apache Hive related jars
 
- As mentioned above, the default flink image does not include hive-related jars. The following three hive-related jars need to be placed in the lib directory of flink. Here, hive version 2.3.6 is used as an example:
+ As mentioned above, the default flink image does not include hive-related jars. The following three hive-related jars need to be placed in the lib directory of flink. Here, Apache Hive version 2.3.6 is used as an example:
 
 a、`hive-exec`:https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
@@ -130,7 +130,7 @@ public static String getHadoopConfConfigMapName(String clusterId) {
 Similarly, the above-mentioned hive-related jars can also be dependently configured in the `Dependency` in the task configuration of StreamPark in a dependent manner, which will not be repeated here.
 
-##### ii、Add hive configuration file (hive-site.xml)
+##### ii、Add Apache Hive configuration file (hive-site.xml)
 
 The difference from hdfs is that there is no default loading method for the hive configuration file in the flink source code, so developers need to manually add the hive configuration file. There are three main methods here:
@@ -165,7 +165,7 @@ spec:
 
 #### Conclusion
 
- Through the above method, flink can be connected with hadoop and hive. This method can be extended to general, that is, flink and external systems such as redis, mongo, etc., generally require the following two steps:
+ Through the above method, Apache Flink can be connected with Apache Hadoop and Hive. This method can be extended to general, that is, flink and external systems such as redis, mongo, etc., generally require the following two steps:
 
 i. Load the connector jar of the specified external service
diff --git a/docs/intro.md b/docs/intro.md
index 8cac6dd..3ce0fb7 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -18,12 +18,12 @@ StreamPark also provides a professional task management module including task de
 Apache Flink and Apache Spark are widely used as the next generation of big data streaming computing engines. Based on a foundation of excellent experiences combined with best practices, we extracted the task deployment and runtime parameters into the configuration files. In this way, an easy-to-use `RuntimeContext` with out-of-the-box connectors can bring an easier and more efficient task development experience. It reduces the learning cost and development barriers, so developers can fo [...]
 
-On the other hand, It can be challenge for enterprises to use Flink & Spark if there is no professional management platform for Flink & Spark tasks during the deployment phase. StreamPark provides such a professional task management platform as described above.
+On the other hand, It can be challenge for enterprises to use Apache Flink & Apache Spark if there is no professional management platform for Flink & Spark tasks during the deployment phase. StreamPark provides such a professional task management platform as described above.
 
 ## 🎉 Features
 
 * Apache Flink & Apache Spark application development scaffold
-* Supports multiple versions of Flink & Spark
+* Supports multiple versions of Apache Flink & Apache Spark
 * Wide range of out-of-the-box connectors
 * One-stop stream processing operation platform
 * Supports catalog, OLAP, streaming warehouse, etc.
diff --git a/docs/user-guide/1-deployment.md b/docs/user-guide/1-deployment.md
index 6d2cbaa..735af1f 100755
--- a/docs/user-guide/1-deployment.md
+++ b/docs/user-guide/1-deployment.md
@@ -8,7 +8,7 @@ import { DeploymentEnvs } from '../components/TableData.jsx';
 
 The overall component stack structure of StreamPark consists of two major parts: streampark-core and streampark-console.
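Circling back to the Hive steps in the 3-hadoop-resource-integration.md hunk above (Hive jars on Flink's classpath plus a reachable hive-site.xml), a minimal sketch of pointing Flink at the Hive metastore is shown below; the catalog name, default database, and conf directory are assumptions, and `flink-connector-hive` plus `hive-exec` are assumed to be on the classpath.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
import org.apache.flink.table.catalog.hive.HiveCatalog

object HiveCatalogSketch {
  def main(args: Array[String]): Unit = {
    val tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode())

    // "/opt/flink/conf" is assumed to contain hive-site.xml, e.g. baked into the image
    // or mounted into the pod as described in the doc above.
    val hiveCatalog = new HiveCatalog("myhive", "default", "/opt/flink/conf")
    tableEnv.registerCatalog("myhive", hiveCatalog)
    tableEnv.useCatalog("myhive")

    // Tables registered in the Hive metastore are now visible to Flink SQL.
    tableEnv.executeSql("SHOW TABLES").print()
  }
}
```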
-streampark-console is positioned as an **integrated, real-time data platform**, **streaming data warehouse Platform**, **Low Code**, **Flink & Spark task hosting platform**. It can manage Flink tasks better, and integrate project compilation, publishing, parameter configuration, startup, savepoint, flame graph, Flink SQL, monitoring and many other functions, which greatly simplifies the daily operation and maintenance of Flink tasks and integrates many best practices.
+streampark-console is positioned as an integrated, real-time data platform, streaming data warehouse Platform, Low Code, Apache Flink & Apache Spark task hosting platform. It can manage Flink tasks better, and integrate project compilation, publishing, parameter configuration, startup, savepoint, flame graph, Flink SQL, monitoring and many other functions, which greatly simplifies the daily operation and maintenance of Flink tasks and integrates many best practices.
 
 The goal is to create a one-stop big data solution that integrates real-time data warehouses and batches.
diff --git a/docs/user-guide/7-Variable.md b/docs/user-guide/7-Variable.md
index 26500c6..c2e774a 100644
--- a/docs/user-guide/7-Variable.md
+++ b/docs/user-guide/7-Variable.md
@@ -6,7 +6,7 @@ sidebar_position: 7
 
 ## Background Introduction
 
-In the actual production environment, Flink jobs are generally complex, and usually require multiple external components. For example, Flink jobs consume data from Kafka, then connect external components such as HBase or Redis to obtain additional business information, and then write it to the downstream external components. There are the following problems.
+In the actual production environment, Flink jobs are generally complex, and usually require multiple external components. For example, Flink jobs consume data from Kafka, then connect external components such as Apache HBase or Redis to obtain additional business information, and then write it to the downstream external components. There are the following problems.
 
 - The connection information of external components, such as IP, port and user password, needs to be configured in the application args and transferred to the Flink job, so that the connection information of external components is distributed in multiple applications. Once the connection information of external components changes,