This is an automated email from the ASF dual-hosted git repository.
yuxia pushed a commit to branch release-0.7
in repository https://gitbox.apache.org/repos/asf/fluss.git
The following commit(s) were added to refs/heads/release-0.7 by this push:
new f77ee9edd [docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1518)
f77ee9edd is described below
commit f77ee9edda77f47ad5a93ba31906261b31b9f64b
Author: CaoZhen <[email protected]>
AuthorDate: Sun Aug 10 22:47:21 2025 -0700
[docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1518)
---------
Co-authored-by: luoyuxia <[email protected]>
---
website/docs/maintenance/filesystems/hdfs.md | 50 ++++++++++++++++++++++------
1 file changed, 39 insertions(+), 11 deletions(-)
diff --git a/website/docs/maintenance/filesystems/hdfs.md b/website/docs/maintenance/filesystems/hdfs.md
index 0222a16fb..dc850eef1 100644
--- a/website/docs/maintenance/filesystems/hdfs.md
+++ b/website/docs/maintenance/filesystems/hdfs.md
@@ -25,19 +25,47 @@ supports HDFS as a remote storage.
## Configurations setup
-
To enable HDFS as remote storage, you need to define the HDFS path as remote storage in Fluss' `server.yaml`:
-
-```yaml
+```yaml title="conf/server.yaml"
# The dir used as the remote storage of Fluss
remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
```
-To allow for easy adoption, you can use the same configuration keys in Fluss' server.yaml as in Hadoop's `core-site.xml`.
-You can see the configuration keys in Hadoop's [`core-site.xml`](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml).
-
-
-
-
-
-
+### Configure Hadoop-related configurations
+
+Sometimes you may want to configure how Fluss accesses your Hadoop filesystem. Fluss supports three methods for loading Hadoop configuration, listed in order of priority (highest to lowest):
+
+1. **Fluss Configuration with `fluss.hadoop.*` Prefix.** Any configuration key prefixed with `fluss.hadoop.` in your `server.yaml` will be passed directly to the Hadoop configuration, with the prefix stripped.
+2. **Environment Variables.** The system automatically searches for Hadoop configuration files in these locations:
+ - `$HADOOP_CONF_DIR` (if set)
+ - `$HADOOP_HOME/conf` (if HADOOP_HOME is set)
+ - `$HADOOP_HOME/etc/hadoop` (if HADOOP_HOME is set)
+3. **Classpath Loading.** Configuration files (`core-site.xml`, `hdfs-site.xml`) found in the classpath are loaded automatically.
+
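The directory search order in method 2 above can be sketched as follows. This is a minimal illustrative sketch, not Fluss's actual implementation; `candidate_hadoop_conf_dirs` is a hypothetical name.

```python
import os

def candidate_hadoop_conf_dirs(env):
    """Return the directories searched for core-site.xml / hdfs-site.xml,
    in the order listed above (hypothetical helper, not Fluss's API)."""
    dirs = []
    # $HADOOP_CONF_DIR takes precedence if set
    if env.get("HADOOP_CONF_DIR"):
        dirs.append(env["HADOOP_CONF_DIR"])
    # Fall back to well-known locations under $HADOOP_HOME
    if env.get("HADOOP_HOME"):
        dirs.append(os.path.join(env["HADOOP_HOME"], "conf"))
        dirs.append(os.path.join(env["HADOOP_HOME"], "etc", "hadoop"))
    return dirs
```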
+#### Configuration Examples
+Here's an example of setting up the Hadoop configuration in `server.yaml`:
+
+```yaml title="conf/server.yaml"
+# All of the following Hadoop-related configurations are only a demonstration of
+# how to configure Hadoop in `server.yaml`; you may not need to configure them
+
+# Basic HA Hadoop configuration using fluss.hadoop.* prefix
+fluss.hadoop.fs.defaultFS: hdfs://mycluster
+fluss.hadoop.dfs.nameservices: mycluster
+fluss.hadoop.dfs.ha.namenodes.mycluster: nn1,nn2
+fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn1: namenode1:9000
+fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn2: namenode2:9000
+fluss.hadoop.dfs.namenode.http-address.mycluster.nn1: namenode1:9870
+fluss.hadoop.dfs.namenode.http-address.mycluster.nn2: namenode2:9870
+fluss.hadoop.dfs.ha.automatic-failover.enabled: true
+fluss.hadoop.dfs.client.failover.proxy.provider.mycluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
+
+# Optional: Kerberos authentication, if required
+fluss.hadoop.hadoop.security.authentication: kerberos
+fluss.hadoop.hadoop.security.authorization: true
+fluss.hadoop.dfs.namenode.kerberos.principal: hdfs/[email protected]
+fluss.hadoop.dfs.datanode.kerberos.principal: hdfs/[email protected]
+fluss.hadoop.dfs.web.authentication.kerberos.principal: HTTP/[email protected]
+# Client Kerberos credentials, e.g. ticket cache (adjust path as needed)
+fluss.hadoop.hadoop.security.kerberos.ticket.cache.path: /tmp/krb5cc_1000
+```
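The `fluss.hadoop.*` prefix stripping described in method 1 can be sketched as follows. This is a hypothetical illustration of the behavior, not Fluss's actual code; `strip_fluss_hadoop_prefix` is an invented name.

```python
def strip_fluss_hadoop_prefix(server_yaml):
    """Map keys like 'fluss.hadoop.fs.defaultFS' to 'fs.defaultFS'.
    Keys without the prefix are ignored here: they are Fluss options,
    not Hadoop ones (hypothetical helper, not Fluss's API)."""
    prefix = "fluss.hadoop."
    return {key[len(prefix):]: value
            for key, value in server_yaml.items()
            if key.startswith(prefix)}
```

For example, `fluss.hadoop.fs.defaultFS: hdfs://mycluster` in `server.yaml` would reach Hadoop as `fs.defaultFS=hdfs://mycluster`, while `remote.data.dir` stays a Fluss-only option.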