This is an automated email from the ASF dual-hosted git repository.
yuxia pushed a commit to branch release-0.7
in repository https://gitbox.apache.org/repos/asf/fluss.git
The following commit(s) were added to refs/heads/release-0.7 by this push:
new f77ee9edd [docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1518)
f77ee9edd is described below
commit f77ee9edda77f47ad5a93ba31906261b31b9f64b
Author: CaoZhen <[email protected]>
AuthorDate: Sun Aug 10 22:47:21 2025 -0700
[docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1518)
---------
Co-authored-by: luoyuxia <[email protected]>
---
website/docs/maintenance/filesystems/hdfs.md | 50 ++++++++++++++++++++++------
1 file changed, 39 insertions(+), 11 deletions(-)
diff --git a/website/docs/maintenance/filesystems/hdfs.md b/website/docs/maintenance/filesystems/hdfs.md
index 0222a16fb..dc850eef1 100644
--- a/website/docs/maintenance/filesystems/hdfs.md
+++ b/website/docs/maintenance/filesystems/hdfs.md
@@ -25,19 +25,47 @@ supports HDFS as a remote storage.
## Configurations setup
-
To enable HDFS as remote storage, you need to define the HDFS path as remote storage in Fluss' `server.yaml`:
-
-```yaml
+```yaml title="conf/server.yaml"
# The dir used as the remote storage of Fluss
remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
```
-To allow for easy adoption, you can use the same configuration keys in Fluss' server.yaml as in Hadoop's `core-site.xml`.
-You can see the configuration keys in Hadoop's [`core-site.xml`](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml).
-
-
-
-
-
-
+### Configure Hadoop-related configurations
+
+Sometimes you may want to configure how Fluss accesses your Hadoop filesystem. Fluss supports three methods for loading Hadoop configuration, listed in order of priority (highest to lowest):
+
+1. **Fluss Configuration with `fluss.hadoop.*` Prefix.** Any configuration key prefixed with `fluss.hadoop.` in your `server.yaml` will be passed directly to the Hadoop configuration, with the prefix stripped.
+2. **Environment Variables.** The system automatically searches for Hadoop configuration files in these locations:
+ - `$HADOOP_CONF_DIR` (if set)
+ - `$HADOOP_HOME/conf` (if HADOOP_HOME is set)
+ - `$HADOOP_HOME/etc/hadoop` (if HADOOP_HOME is set)
+3. **Classpath Loading.** Configuration files (`core-site.xml`, `hdfs-site.xml`) found in the classpath are loaded automatically.
+
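The directory search order in method 2 above can be sketched as follows. This is a minimal illustrative sketch, not Fluss's actual implementation; `candidate_hadoop_conf_dirs` is a hypothetical name.

```python
import os

def candidate_hadoop_conf_dirs(env):
    """Return the directories searched for core-site.xml / hdfs-site.xml,
    in the order listed above (hypothetical helper, not Fluss's API)."""
    dirs = []
    # $HADOOP_CONF_DIR takes precedence if set
    if env.get("HADOOP_CONF_DIR"):
        dirs.append(env["HADOOP_CONF_DIR"])
    # Fall back to well-known locations under $HADOOP_HOME
    if env.get("HADOOP_HOME"):
        dirs.append(os.path.join(env["HADOOP_HOME"], "conf"))
        dirs.append(os.path.join(env["HADOOP_HOME"], "etc", "hadoop"))
    return dirs
```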
+#### Configuration Examples
+Here's an example of setting up the Hadoop configuration in `server.yaml`:
+
+```yaml title="conf/server.yaml"
+# All of the following Hadoop-related configurations are only a demonstration of
+# how to configure Hadoop in `server.yaml`; you may not need to configure them
+
+# Basic HA Hadoop configuration using fluss.hadoop.* prefix
+fluss.hadoop.fs.defaultFS: hdfs://mycluster
+fluss.hadoop.dfs.nameservices: mycluster
+fluss.hadoop.dfs.ha.namenodes.mycluster: nn1,nn2
+fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn1: namenode1:9000
+fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn2: namenode2:9000
+fluss.hadoop.dfs.namenode.http-address.mycluster.nn1: namenode1:9870
+fluss.hadoop.dfs.namenode.http-address.mycluster.nn2: namenode2:9870
+fluss.hadoop.dfs.ha.automatic-failover.enabled: true
+fluss.hadoop.dfs.client.failover.proxy.provider.mycluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
+
+# Optional: Kerberos authentication, if required
+fluss.hadoop.hadoop.security.authentication: kerberos
+fluss.hadoop.hadoop.security.authorization: true
+fluss.hadoop.dfs.namenode.kerberos.principal: hdfs/[email protected]
+fluss.hadoop.dfs.datanode.kerberos.principal: hdfs/[email protected]
+fluss.hadoop.dfs.web.authentication.kerberos.principal: HTTP/[email protected]
+# Client Kerberos credentials, e.g. ticket cache (adjust path as needed)
+fluss.hadoop.hadoop.security.kerberos.ticket.cache.path: /tmp/krb5cc_1000
+```
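The `fluss.hadoop.*` prefix stripping described in method 1 can be sketched as follows. This is a hypothetical illustration of the behavior, not Fluss's actual code; `strip_fluss_hadoop_prefix` is an invented name.

```python
def strip_fluss_hadoop_prefix(server_yaml):
    """Map keys like 'fluss.hadoop.fs.defaultFS' to 'fs.defaultFS'.
    Keys without the prefix are ignored here: they are Fluss options,
    not Hadoop ones (hypothetical helper, not Fluss's API)."""
    prefix = "fluss.hadoop."
    return {key[len(prefix):]: value
            for key, value in server_yaml.items()
            if key.startswith(prefix)}
```

For example, `fluss.hadoop.fs.defaultFS: hdfs://mycluster` in `server.yaml` would reach Hadoop as `fs.defaultFS=hdfs://mycluster`, while `remote.data.dir` stays a Fluss-only option.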