yufan022 commented on a change in pull request #9696:
URL: https://github.com/apache/pulsar/pull/9696#discussion_r610519180
##########
File path: site2/docs/tiered-storage-aliyun.md
##########
@@ -0,0 +1,242 @@
+---
+id: tiered-storage-aliyun
+title: Use Aliyun OSS offloader with Pulsar
+sidebar_label: Aliyun OSS offloader
+---
+
+This chapter guides you through every step of installing and configuring the Aliyun Object Storage Service (OSS) offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the Aliyun OSS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.8.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.8.0.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz)
+
+   * Download from the Pulsar [downloads page](https://pulsar.apache.org/download)
+
+   * Use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package.
+
+   ```bash
+   wget https://downloads.apache.org/pulsar/pulsar-2.8.0/apache-pulsar-offloaders-2.8.0-bin.tar.gz
+   tar xvfz apache-pulsar-offloaders-2.8.0-bin.tar.gz
+   ```
+
+3. Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
+
+   ```
+   mv apache-pulsar-offloaders-2.8.0/offloaders apache-pulsar-2.8.0/offloaders
+
+   ls offloaders
+   ```
+
+   **Output**
+
+   As shown from the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support [AWS S3](https://aws.amazon.com/s3/), [GCS](https://cloud.google.com/storage/), [Azure](https://portal.azure.com/#home), and [Aliyun OSS](https://www.aliyun.com/product/oss) for long-term storage.
+
+
+   ```
+   tiered-storage-file-system-2.8.0.nar
+   tiered-storage-jcloud-2.8.0.nar
+   ```
+
+   > #### Note
+   >
+   > * If you are running Pulsar in a bare-metal cluster, make sure that `offloaders` tarball is unzipped in every broker's Pulsar directory.
+   >
+   > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. `apachepulsar/pulsar-all` image has already bundled tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+>
+> Before offloading data from BookKeeper to Aliyun OSS, you need to configure some properties of the Aliyun OSS offload driver.
+
+Besides, you can also configure the Aliyun OSS offloader to run it automatically or trigger it manually.
+
+### Configure Aliyun OSS offloader driver
+
+You can configure the Aliyun OSS offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+
+  Required configuration | Description | Example value
+  |---|---|---
+  `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. | aliyun-oss
+  `offloadersDirectory` | Offloader directory | offloaders
+  `managedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
+  `managedLedgerOffloadServiceEndpoint` | Endpoint | http://oss-cn-hongkong.aliyuncs.com
+
+- **Optional** configurations are as below.
+
+  Optional | Description | Example value
+  |---|---|---
+  `managedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+  `managedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+  `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|2
+  `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in Aliyun OSS must be contained in a bucket.
You can use a bucket to organize your data and control access to your data, but unlike directory and folder, you cannot nest a bucket.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+managedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Endpoint (required)
+
+The endpoint is the region where a bucket is located.
+
+> #### Tip
+>
+> For more information about Aliyun OSS regions and endpoints, see [here](https://help.aliyun.com/document_detail/31837.html).
+
+##### Example
+
+This example sets the endpoint as _oss-us-west-1-internal_.
+
+```
+managedLedgerOffloadServiceEndpoint=http://oss-us-west-1-internal.aliyuncs.com
+```
+
+#### Authentication (required)
+
+To be able to access Aliyun OSS, you need to authenticate with Aliyun OSS.
+
+* Set the environment variables `ALIYUN_OSS_ACCESS_KEY_ID` and `ALIYUN_OSS_ACCESS_KEY_SECRET` in `conf/pulsar_env.sh`.
+
+  "export" is important so that the variables are made available in the environment of spawned processes.
+
+  ```bash
+  export ALIYUN_OSS_ACCESS_KEY_ID=ABC123456789
+  export ALIYUN_OSS_ACCESS_KEY_SECRET=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+  ```
+
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from Aliyun OSS in the configuration file `broker.conf` or `standalone.conf`.
+
+Configuration|Description|Default value
+|---|---|---
+`managedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from Aliyun OSS.|1 MB
+`managedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to Aliyun OSS. It **cannot** be smaller than 5 MB. |64 MB
+
+### Configure Aliyun OSS offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster.
Once the topic reaches the threshold, an offloading operation is triggered automatically.
+
+Threshold value|Action
+|---|---

Review comment:
fix

##########
File path: site2/docs/tiered-storage-aliyun.md
##########
@@ -0,0 +1,242 @@
+---
+id: tiered-storage-aliyun
+title: Use Aliyun OSS offloader with Pulsar
+sidebar_label: Aliyun OSS offloader
+---
+
+This chapter guides you through every step of installing and configuring the Aliyun Object Storage Service (OSS) offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the Aliyun OSS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.8.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.8.0.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz)
+
+   * Download from the Pulsar [downloads page](https://pulsar.apache.org/download)
+
+   * Use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package.
+
+   ```bash
+   wget https://downloads.apache.org/pulsar/pulsar-2.8.0/apache-pulsar-offloaders-2.8.0-bin.tar.gz
+   tar xvfz apache-pulsar-offloaders-2.8.0-bin.tar.gz
+   ```
+
+3. Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
+
+   ```
+   mv apache-pulsar-offloaders-2.8.0/offloaders apache-pulsar-2.8.0/offloaders
+
+   ls offloaders
+   ```
+
+   **Output**
+
+   As shown from the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support [AWS S3](https://aws.amazon.com/s3/), [GCS](https://cloud.google.com/storage/), [Azure](https://portal.azure.com/#home), and [Aliyun OSS](https://www.aliyun.com/product/oss) for long-term storage.
+
+
+   ```
+   tiered-storage-file-system-2.8.0.nar
+   tiered-storage-jcloud-2.8.0.nar
+   ```
+
+   > #### Note
+   >
+   > * If you are running Pulsar in a bare-metal cluster, make sure that `offloaders` tarball is unzipped in every broker's Pulsar directory.
+   >
+   > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. `apachepulsar/pulsar-all` image has already bundled tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+>
+> Before offloading data from BookKeeper to Aliyun OSS, you need to configure some properties of the Aliyun OSS offload driver.
+
+Besides, you can also configure the Aliyun OSS offloader to run it automatically or trigger it manually.
+
+### Configure Aliyun OSS offloader driver
+
+You can configure the Aliyun OSS offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+
+  Required configuration | Description | Example value
+  |---|---|---
+  `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. | aliyun-oss
+  `offloadersDirectory` | Offloader directory | offloaders
+  `managedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
+  `managedLedgerOffloadServiceEndpoint` | Endpoint | http://oss-cn-hongkong.aliyuncs.com
+
+- **Optional** configurations are as below.
+
+  Optional | Description | Example value
+  |---|---|---
+  `managedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+  `managedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+  `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|2
+  `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in Aliyun OSS must be contained in a bucket. You can use a bucket to organize your data and control access to your data, but unlike directory and folder, you cannot nest a bucket.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+managedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Endpoint (required)
+
+The endpoint is the region where a bucket is located.
+
+> #### Tip
+>
+> For more information about Aliyun OSS regions and endpoints, see [here](https://help.aliyun.com/document_detail/31837.html).
+
+##### Example
+
+This example sets the endpoint as _oss-us-west-1-internal_.
+
+```
+managedLedgerOffloadServiceEndpoint=http://oss-us-west-1-internal.aliyuncs.com
+```
+
+#### Authentication (required)
+
+To be able to access Aliyun OSS, you need to authenticate with Aliyun OSS.
+
+* Set the environment variables `ALIYUN_OSS_ACCESS_KEY_ID` and `ALIYUN_OSS_ACCESS_KEY_SECRET` in `conf/pulsar_env.sh`.

Review comment:
fix

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org