This is an automated email from the ASF dual-hosted git repository.
brile pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 8d9632fc0fe docs: add deprecation for hadoop and java11 (#18286)
8d9632fc0fe is described below
commit 8d9632fc0feaaf83173b3ea7c94f2e3ac50efcf1
Author: 317brian <[email protected]>
AuthorDate: Fri Aug 8 12:37:16 2025 -0700
docs: add deprecation for hadoop and java11 (#18286)
Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: Lucas Capistrant <[email protected]>
---
docs/ingestion/data-formats.md | 10 ++++++++--
docs/ingestion/faq.md | 4 ----
docs/ingestion/hadoop.md | 9 +++++++++
docs/ingestion/index.md | 5 ++---
docs/operations/java.md | 8 ++------
docs/operations/other-hadoop.md | 8 ++++++++
docs/tutorials/cluster.md | 4 ++--
docs/tutorials/index.md | 4 ++--
docs/tutorials/tutorial-batch-hadoop.md | 7 +++++++
docs/tutorials/tutorial-kerberos-hadoop.md | 8 ++++++++
docs/tutorials/tutorial-query.md | 1 -
11 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md
index 1bb2de9918f..a595b1c6ded 100644
--- a/docs/ingestion/data-formats.md
+++ b/docs/ingestion/data-formats.md
@@ -962,11 +962,17 @@ Each line can be further parsed using [`parseSpec`](#parsespec).
### Avro Hadoop Parser
-:::info
- You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.
+:::caution[Deprecated]
+
+Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
+
+:::
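As a concrete illustration of the opt-in described in the admonition above, the property would be set like this (a sketch based only on the property name and file given in the text; value shown is the explicit opt-in):

```properties
# common.runtime.properties
# Explicitly opt in to the deprecated index_hadoop task type.
druid.indexer.task.allowHadoopTaskExecution=true
```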
:::info
+You need to include [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.
+
See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid
:::
diff --git a/docs/ingestion/faq.md b/docs/ingestion/faq.md
index 24e119585ab..3fab83f0ea9 100644
--- a/docs/ingestion/faq.md
+++ b/docs/ingestion/faq.md
@@ -49,10 +49,6 @@ Other common reasons that hand-off fails are as follows:
4) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the Coordinator logs have no errors.
-## How do I get HDFS to work?
-
-Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../design/deep-storage.md) .
-
## How do I know when I can make query to Druid after submitting batch ingestion task?
You can verify if segments created by a recent ingestion task are loaded onto historicals and available for querying using the following workflow.
diff --git a/docs/ingestion/hadoop.md b/docs/ingestion/hadoop.md
index db665f9769d..3dd738f7891 100644
--- a/docs/ingestion/hadoop.md
+++ b/docs/ingestion/hadoop.md
@@ -23,6 +23,15 @@ sidebar_label: "Hadoop-based"
~ under the License.
-->
+:::caution[Deprecated]
+
+Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
+
+:::
+
+
Apache Hadoop-based batch ingestion in Apache Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hadoop-based vs. native batch comparison table](index.md#batch) for comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion.
diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md
index b2c9002df92..de90051fca0 100644
--- a/docs/ingestion/index.md
+++ b/docs/ingestion/index.md
@@ -28,8 +28,7 @@ your source system and stores it in data files called [_segments_](../design/seg
In general, segment files contain a few million rows each.
For most ingestion methods, the Druid [Middle Manager](../design/middlemanager.md) processes or the
-[Indexer](../design/indexer.md) processes load your source data. The sole exception is Hadoop-based ingestion, which
-uses a Hadoop MapReduce job on YARN.
+[Indexer](../design/indexer.md) processes load your source data.
During ingestion, Druid creates segments and stores them in [deep storage](../design/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. For more information, see [Storage overview](../design/storage.md).
@@ -66,7 +65,7 @@ supervisor.
There are three available options for batch ingestion. Batch ingestion jobs are associated with a controller task that runs for the duration of the job.
-| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based](hadoop.md) |
+| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based (deprecated)](hadoop.md) |
|---|-----|--------------|------------|
| **Controller task type** | `index_parallel` | `query_controller` | `index_hadoop` |
| **How you submit it** | Send an `index_parallel` spec to the [Tasks API](../api-reference/tasks-api.md). | Send an [INSERT](../multi-stage-query/concepts.md#load-data-with-insert) or [REPLACE](../multi-stage-query/concepts.md#overwrite-data-with-replace) statement to the [SQL task API](../api-reference/sql-ingestion-api.md#submit-a-query). | Send an `index_hadoop` spec to the [Tasks API](../api-reference/tasks-api.md). |
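To make the table concrete, submitting an `index_hadoop` spec to the Tasks API could be sketched as follows. This is illustrative only: the host, port, and spec contents are placeholders, and the full task spec fields come from the Hadoop-based ingestion docs rather than this table.

```shell
# Write a skeletal index_hadoop task spec (placeholder contents; a real
# spec needs dataSchema, ioConfig, and tuningConfig sections).
cat > /tmp/hadoop-index-spec.json <<'EOF'
{
  "type": "index_hadoop",
  "spec": {}
}
EOF

# POST the spec to the Overlord's Tasks API (host and port are placeholders;
# commented out here since no cluster is assumed to be running):
# curl -X POST -H 'Content-Type: application/json' \
#   -d @/tmp/hadoop-index-spec.json \
#   http://OVERLORD_HOST:8090/druid/indexer/v1/task
```

Note that as of this change the task will be rejected unless `druid.indexer.task.allowHadoopTaskExecution` is set to `true`.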
diff --git a/docs/operations/java.md b/docs/operations/java.md
index d16f78c8abe..0b8a474ac95 100644
--- a/docs/operations/java.md
+++ b/docs/operations/java.md
@@ -27,11 +27,7 @@ a Java runtime for Druid.
## Selecting a Java runtime
-Druid fully supports Java 11 and Java 17. The project team recommends Java 17.
-
-:::info
-Note: Starting with Apache Druid 32.0.0, support for Java 8 has been removed.
-:::
+The project team recommends Java 17. Although you can use Java 11, support for it is deprecated.
The project team recommends using an OpenJDK-based Java distribution. There are many free and actively-supported distributions available, including
@@ -74,7 +70,7 @@ Exception in thread "main" java.lang.ExceptionInInitializerError
```
Druid's out-of-box configuration adds these parameters transparently when you use the bundled `bin/start-druid` or
-similar commands. In this case, there is nothing special you need to do to run successfully on Java 11 or 17. However,
+similar commands. In this case, there is nothing special you need to do to run successfully. However,
if you have customized your Druid service launching system, you will need to ensure the required Java parameters are added. There are many ways of doing this. Choose the one that works best for you.
diff --git a/docs/operations/other-hadoop.md b/docs/operations/other-hadoop.md
index ba19a832643..a82b331de4b 100644
--- a/docs/operations/other-hadoop.md
+++ b/docs/operations/other-hadoop.md
@@ -23,6 +23,14 @@ title: "Working with different versions of Apache Hadoop"
-->
+:::caution[Deprecated]
+
+Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
+
+:::
+
Apache Druid can interact with Hadoop in two ways:
1. [Use HDFS for deep storage](../development/extensions-core/hdfs.md) using the druid-hdfs-storage extension.
diff --git a/docs/tutorials/cluster.md b/docs/tutorials/cluster.md
index f2128489216..cd435c5e1ce 100644
--- a/docs/tutorials/cluster.md
+++ b/docs/tutorials/cluster.md
@@ -133,8 +133,8 @@ The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has info
We recommend running your favorite Linux distribution. You will also need
-* [Java 11 or 17](../operations/java.md)
-* Python 2 or Python 3
+* [Java 17](../operations/java.md)
+* Python 3
:::info
If needed, you can specify where to find Java using the environment variables
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index 187f3cb952b..f55b26e60f6 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -40,8 +40,8 @@ You can follow these steps on a relatively modest machine, such as a workstation
The software requirements for the installation machine are:
* Linux, Mac OS X, or other Unix-like OS. (Windows is not supported)
-* [Java 11 or 17](../operations/java.md)
-* Python 3 (preferred) or Python 2
+* [Java 17](../operations/java.md)
+* Python 3
* Perl 5
Java must be available. Either it is on your path, or set one of the `JAVA_HOME` or `DRUID_JAVA_HOME` environment variables.
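The environment-variable option mentioned above could be sketched like this (the JDK install path is an illustrative placeholder, not a path from the docs):

```shell
# Point Druid at a specific Java 17 installation when `java` is not on PATH.
# Set one of these before running bin/start-druid; /opt/jdk-17 is a placeholder.
export DRUID_JAVA_HOME=/opt/jdk-17
# Alternatively, the general-purpose variable:
export JAVA_HOME=/opt/jdk-17
```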
diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md
index a71823544af..c75fc7d35e8 100644
--- a/docs/tutorials/tutorial-batch-hadoop.md
+++ b/docs/tutorials/tutorial-batch-hadoop.md
@@ -23,6 +23,13 @@ sidebar_label: Load from Apache Hadoop
~ under the License.
-->
+:::caution[Deprecated]
+
+Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
+
+:::
This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster.
diff --git a/docs/tutorials/tutorial-kerberos-hadoop.md b/docs/tutorials/tutorial-kerberos-hadoop.md
index 24fc290b6a6..cace9b8794f 100644
--- a/docs/tutorials/tutorial-kerberos-hadoop.md
+++ b/docs/tutorials/tutorial-kerberos-hadoop.md
@@ -23,6 +23,14 @@ sidebar_label: Kerberized HDFS deep storage
~ under the License.
-->
+:::caution[Deprecated]
+
+Hadoop-based ingestion is deprecated. We recommend one of Druid's other supported ingestion methods, such as [SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+You must now explicitly opt in to using the deprecated `index_hadoop` task type. To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your `common.runtime.properties` file. For more information, see [#18239](https://github.com/apache/druid/pull/18239).
+
+:::
+
## Hadoop Setup
diff --git a/docs/tutorials/tutorial-query.md b/docs/tutorials/tutorial-query.md
index 54563513a48..66b38974db2 100644
--- a/docs/tutorials/tutorial-query.md
+++ b/docs/tutorials/tutorial-query.md
@@ -32,7 +32,6 @@ by following one of them:
* [Load a file](../tutorials/tutorial-batch.md)
* [Load stream data from Kafka](../tutorials/tutorial-kafka.md)
-* [Load a file using Hadoop](../tutorials/tutorial-batch-hadoop.md)
There are various ways to run Druid SQL queries: from the web console, using a command line utility and by posting the query by HTTP. We'll look at each of these.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]