FrankChen021 opened a new issue, #18641:
URL: https://github.com/apache/druid/issues/18641
Multi-Kubernetes Cluster Task Runner Support
## 1. Background
### Current Problem
Druid clusters deployed across multiple IDCs (within the same Availability
Zone) face a critical issue with task management when using Kubernetes-based
task runners:
**Scenario:**
- Druid cluster spans multiple IDCs (AZ1, AZ2, etc.)
- Each IDC has its own Kubernetes cluster
- Overlord leader runs in AZ1 and submits tasks to K8s cluster in AZ1
- When leader failover occurs to an Overlord in AZ2:
  - The new leader cannot restore running task information from the K8s cluster in AZ1
  - The new leader starts fresh tasks in the K8s cluster in AZ2
- **Result: duplicate tasks running simultaneously in both clusters**
<img width="850" height="156" alt="Image"
src="https://github.com/user-attachments/assets/1ecaab78-f95c-4148-a39f-e89e87b500fd"
/>
However, there is no such problem with MiddleManager-based deployments, because
Overlords in any AZ are able to communicate with MiddleManagers in every AZ.
So this appears to be a missing feature of the Kubernetes-based task deployment.
### Root Cause
The current `KubernetesTaskRunner` implementation has a fundamental
limitation:
- **Single-cluster assumption**: Only connects to one Kubernetes cluster
- **Local restoration**: Only restores tasks from the local K8s cluster
during startup
- **No cross-cluster visibility**: Cannot discover or manage tasks running
in remote clusters
### Impact
- **Data inconsistency**: Duplicate indexing tasks process the same data
- **Resource waste**: Unnecessary compute resources consumed
- **Data corruption risk**: Conflicting writes to the same segments
- **Operational complexity**: Manual intervention required to resolve
conflicts
## 2. Solution
### High-Level Architecture
#### 2.1 Multi-Cluster Client Management
- **Cluster Registry**: Maintain a registry of all available Kubernetes
clusters across IDCs
- **Client Pool**: Create and manage Kubernetes client connections to
multiple clusters
- **Health Monitoring**: Continuously monitor cluster availability and
connectivity
- **Failover Support**: Gracefully handle cluster unavailability
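To make the registry/health-monitoring idea above concrete, here is a minimal, dependency-free Java sketch. `ClusterRegistry`, the kubeconfig-path field, and the probe callback are all hypothetical names invented for illustration; a real implementation would hold actual Kubernetes client objects and probe each cluster's API server on a schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical cluster registry with pluggable health checks.
class ClusterRegistry
{
  // clusterName -> kubeconfig path (stand-in for a real client handle)
  private final Map<String, String> clusters = new ConcurrentHashMap<>();
  private final Map<String, Boolean> healthy = new ConcurrentHashMap<>();

  void register(String name, String kubeconfigPath)
  {
    clusters.put(name, kubeconfigPath);
    healthy.put(name, true);
  }

  // Invoked periodically by a scheduled health monitor; in a real
  // implementation the probe would hit the cluster's API server.
  void refreshHealth(Predicate<String> probe)
  {
    for (String name : clusters.keySet()) {
      healthy.put(name, probe.test(name));
    }
  }

  // Failover support: callers only ever see clusters that passed the last probe.
  List<String> liveClusters()
  {
    List<String> live = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : healthy.entrySet()) {
      if (e.getValue()) {
        live.add(e.getKey());
      }
    }
    return live;
  }

  public static void main(String[] args)
  {
    ClusterRegistry registry = new ClusterRegistry();
    registry.register("az1", "/etc/kube/az1.yaml");
    registry.register("az2", "/etc/kube/az2.yaml");
    registry.refreshHealth(name -> !name.equals("az1")); // simulate an AZ1 outage
    System.out.println(registry.liveClusters());
  }
}
```

A cluster that fails its probe silently drops out of `liveClusters()`, which is what lets submission code degrade gracefully rather than error out.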
#### 2.2 Cross-Cluster Task Discovery
- **Unified Task View**: Aggregate running tasks from all registered clusters
- **Cluster Tagging**: Tag each task with its originating cluster information
- **State Synchronization**: Maintain consistent view of task states across
clusters
- **Conflict Detection**: Identify and prevent duplicate task submissions
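The unified view / conflict-detection bullets could be sketched roughly as follows; `TaskDiscovery` and its method names are hypothetical, and the per-cluster task lists would come from polling each cluster's task pods or jobs.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical cross-cluster task discovery with conflict detection.
class TaskDiscovery
{
  // Unified task view: clusterName -> task ids currently observed there
  private final Map<String, Set<String>> tasksByCluster = new HashMap<>();

  // Called after polling one cluster's task jobs/pods
  void recordTasks(String cluster, Collection<String> taskIds)
  {
    tasksByCluster.put(cluster, new HashSet<>(taskIds));
  }

  // Cluster tagging: which cluster (if any) already runs this task
  Optional<String> clusterOf(String taskId)
  {
    for (Map.Entry<String, Set<String>> e : tasksByCluster.entrySet()) {
      if (e.getValue().contains(taskId)) {
        return Optional.of(e.getKey());
      }
    }
    return Optional.empty();
  }

  // Conflict detection: refuse to submit a task that already runs anywhere
  boolean canSubmit(String taskId)
  {
    return clusterOf(taskId).isEmpty();
  }
}
```

This is exactly the check the failover scenario in the Background section is missing: the new leader in AZ2 would first record tasks from AZ1 and then refuse to launch duplicates.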
#### 2.3 Intelligent Task Placement
- **Cluster Selection Strategy**: Implement algorithms to select optimal
cluster for new tasks
- **Load Balancing**: Distribute tasks across clusters based on capacity and
load
- **Affinity Rules**: Support cluster-specific constraints and preferences
- **Resource Awareness**: Consider cluster resources and availability
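One possible selection strategy for the placement bullets above is "most free capacity wins". The sketch below is a hedged illustration, not a proposed final algorithm; the free-slot counts are assumed to come from the health monitor described earlier.

```java
import java.util.Map;

// Hypothetical capacity-aware cluster selection: among live clusters,
// pick the one with the most free task slots.
class ClusterSelector
{
  static String select(Map<String, Integer> freeSlotsByLiveCluster)
  {
    return freeSlotsByLiveCluster.entrySet().stream()
        .filter(e -> e.getValue() > 0)      // skip clusters that are full
        .max(Map.Entry.comparingByValue())  // load balancing: most headroom wins
        .map(Map.Entry::getKey)
        .orElseThrow(() -> new IllegalStateException("no cluster has free capacity"));
  }

  public static void main(String[] args)
  {
    // AZ2 has more headroom, so the next task lands there
    System.out.println(select(Map.of("az1", 2, "az2", 7)));
  }
}
```

Affinity rules would slot in naturally as additional `filter` steps before the `max` comparison.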
#### 2.4 Enhanced Task Lifecycle Management
- **Cross-Cluster Monitoring**: Monitor task status across all clusters
- **Unified Control Plane**: Provide single interface for task management
regardless of cluster
- **Graceful Migration**: Support task migration between clusters when needed
- **Cleanup Coordination**: Ensure proper cleanup of tasks across all
clusters
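The cleanup-coordination bullet could look something like the following sketch. `CleanupCoordinator` and `deleteJob` are hypothetical stand-ins; the real delete would be a per-cluster Kubernetes job deletion, and the task-to-cluster map would be built during restore/submit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical cleanup coordination: when a task finishes, delete its job from
// the one cluster that actually hosts it, and make repeated calls idempotent.
class CleanupCoordinator
{
  private final Map<String, String> taskToCluster;

  CleanupCoordinator(Map<String, String> taskToCluster)
  {
    this.taskToCluster = new HashMap<>(taskToCluster);
  }

  // deleteJob stands in for the per-cluster Kubernetes delete call;
  // returns true if a delete was actually issued.
  boolean cleanup(String taskId, BiConsumer<String, String> deleteJob)
  {
    String cluster = taskToCluster.remove(taskId);
    if (cluster == null) {
      return false; // already cleaned up or never known: no blind cross-cluster deletes
    }
    deleteJob.accept(cluster, taskId);
    return true;
  }
}
```

Targeting only the hosting cluster avoids issuing blanket delete calls against every registered cluster on each task completion.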
### Key Components
#### 2.5 Configuration Management
- **Multi-Cluster Config**: Extend configuration to support multiple cluster
definitions
- **Context Management**: Handle different authentication contexts for each
cluster
- **Namespace Strategy**: Define namespace usage across clusters
- **Security Policies**: Implement cluster-specific security and access
controls
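As a rough illustration of what the multi-cluster configuration might look like, here is a sketch in Druid's `runtime.properties` style. Every property name below is purely illustrative and does not exist in Druid today:

```properties
# Hypothetical multi-cluster runner configuration (illustrative names only)
druid.indexer.runner.type=multiK8s
druid.indexer.runner.clusters=az1,az2

# Per-cluster authentication context and namespace strategy
druid.indexer.runner.cluster.az1.kubeconfig=/etc/druid/kubeconfig-az1.yaml
druid.indexer.runner.cluster.az1.namespace=druid-tasks
druid.indexer.runner.cluster.az2.kubeconfig=/etc/druid/kubeconfig-az2.yaml
druid.indexer.runner.cluster.az2.namespace=druid-tasks
```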
#### 2.6 Monitoring and Observability
- **Cross-Cluster Metrics**: Aggregate metrics from all clusters
- **Unified Logging**: Provide consolidated view of task execution across
clusters
- **Alerting**: Alert on cluster-specific or cross-cluster issues
- **Dashboard Integration**: Extend existing monitoring to show
multi-cluster view
#### 2.7 Fault Tolerance
- **Cluster Isolation**: Ensure failure in one cluster doesn't affect others
- **Task Recovery**: Implement recovery mechanisms for tasks in failed
clusters
- **Leader Election**: Ensure only one Overlord manages tasks at a time
- **Consistency Guarantees**: Maintain data consistency across cluster
boundaries
### Benefits
#### 2.8 Operational Benefits
- **High Availability**: Tasks continue running even if individual clusters
fail
- **Geographic Distribution**: Leverage multiple IDCs for better performance
- **Resource Optimization**: Better utilization of compute resources across
clusters
- **Disaster Recovery**: Natural disaster recovery through geographic
distribution
#### 2.9 Technical Benefits
- **Scalability**: Scale beyond single cluster limitations
- **Flexibility**: Choose optimal cluster for each task based on requirements
- **Resilience**: Improved fault tolerance and recovery capabilities
- **Efficiency**: Better resource utilization and load distribution
### Implementation Considerations
Implement a new Kubernetes task runner, `MultiKubernetesClusterTaskRunner`, which
extends the existing `KubernetesTaskRunner` and overrides the task restore/submit
behaviour.
This task runner initializes a separate Kubernetes client API for each target
cluster from its provided kubeconfig, and restores running tasks from all of
these clients.
When a new task is submitted, it picks a live Kubernetes cluster and uses the
corresponding client API to create the task on that cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]