FrankChen021 opened a new issue, #18641:
URL: https://github.com/apache/druid/issues/18641
Multi-Kubernetes Cluster Task Runner Support
## 1. Background
### Current Problem
Druid clusters deployed across multiple IDCs (within the same Availability
Zone) face a critical issue with task management when using Kubernetes-based
task runners:
**Scenario:**
- Druid cluster spans multiple IDCs (AZ1, AZ2, etc.)
- Each IDC has its own Kubernetes cluster
- Overlord leader runs in AZ1 and submits tasks to K8s cluster in AZ1
- When leader failover occurs to an Overlord in AZ2:
  - The new leader cannot restore running task information from the K8s cluster in AZ1
  - The new leader starts fresh tasks in the K8s cluster in AZ2
- **Result: duplicate tasks running simultaneously in both clusters**
<img width="850" height="156" alt="Image"
src="https://github.com/user-attachments/assets/1ecaab78-f95c-4148-a39f-e89e87b500fd"
/>
However, there is no such problem with MiddleManager-based deployments, because
Overlords in any AZ are able to communicate with MiddleManagers in every AZ.
So this appears to be a missing feature of the Kubernetes-based task deployment.
### Root Cause
The current `KubernetesTaskRunner` implementation has a fundamental
limitation:
- **Single-cluster assumption**: Only connects to one Kubernetes cluster
- **Local restoration**: Only restores tasks from the local K8s cluster
during startup
- **No cross-cluster visibility**: Cannot discover or manage tasks running
in remote clusters
### Impact
- **Data inconsistency**: Duplicate indexing tasks process the same data
- **Resource waste**: Unnecessary compute resources consumed
- **Data corruption risk**: Conflicting writes to the same segments
- **Operational complexity**: Manual intervention required to resolve
conflicts
## 2. Solution
### High-Level Architecture
#### 2.1 Multi-Cluster Client Management
- **Cluster Registry**: Maintain a registry of all available Kubernetes
clusters across IDCs
- **Client Pool**: Create and manage Kubernetes client connections to
multiple clusters
- **Health Monitoring**: Continuously monitor cluster availability and
connectivity
- **Failover Support**: Gracefully handle cluster unavailability
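To make the registry/health-monitoring idea above concrete, here is a minimal, dependency-free Java sketch. `ClusterRegistry`, the kubeconfig-path field, and the probe callback are all hypothetical names invented for illustration; a real implementation would hold actual Kubernetes client objects and probe each cluster's API server on a schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical cluster registry with pluggable health checks.
class ClusterRegistry
{
  // clusterName -> kubeconfig path (stand-in for a real client handle)
  private final Map<String, String> clusters = new ConcurrentHashMap<>();
  private final Map<String, Boolean> healthy = new ConcurrentHashMap<>();

  void register(String name, String kubeconfigPath)
  {
    clusters.put(name, kubeconfigPath);
    healthy.put(name, true);
  }

  // Invoked periodically by a scheduled health monitor; in a real
  // implementation the probe would hit the cluster's API server.
  void refreshHealth(Predicate<String> probe)
  {
    for (String name : clusters.keySet()) {
      healthy.put(name, probe.test(name));
    }
  }

  // Failover support: callers only ever see clusters that passed the last probe.
  List<String> liveClusters()
  {
    List<String> live = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : healthy.entrySet()) {
      if (e.getValue()) {
        live.add(e.getKey());
      }
    }
    return live;
  }

  public static void main(String[] args)
  {
    ClusterRegistry registry = new ClusterRegistry();
    registry.register("az1", "/etc/kube/az1.yaml");
    registry.register("az2", "/etc/kube/az2.yaml");
    registry.refreshHealth(name -> !name.equals("az1")); // simulate an AZ1 outage
    System.out.println(registry.liveClusters());
  }
}
```

A cluster that fails its probe silently drops out of `liveClusters()`, which is what lets submission code degrade gracefully rather than error out.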
#### 2.2 Cross-Cluster Task Discovery
- **Unified Task View**: Aggregate running tasks from all registered clusters
- **Cluster Tagging**: Tag each task with its originating cluster information
- **State Synchronization**: Maintain consistent view of task states across
clusters
- **Conflict Detection**: Identify and prevent duplicate task submissions
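The unified view / conflict-detection bullets could be sketched roughly as follows; `TaskDiscovery` and its method names are hypothetical, and the per-cluster task lists would come from polling each cluster's task pods or jobs.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical cross-cluster task discovery with conflict detection.
class TaskDiscovery
{
  // Unified task view: clusterName -> task ids currently observed there
  private final Map<String, Set<String>> tasksByCluster = new HashMap<>();

  // Called after polling one cluster's task jobs/pods
  void recordTasks(String cluster, Collection<String> taskIds)
  {
    tasksByCluster.put(cluster, new HashSet<>(taskIds));
  }

  // Cluster tagging: which cluster (if any) already runs this task
  Optional<String> clusterOf(String taskId)
  {
    for (Map.Entry<String, Set<String>> e : tasksByCluster.entrySet()) {
      if (e.getValue().contains(taskId)) {
        return Optional.of(e.getKey());
      }
    }
    return Optional.empty();
  }

  // Conflict detection: refuse to submit a task that already runs anywhere
  boolean canSubmit(String taskId)
  {
    return clusterOf(taskId).isEmpty();
  }
}
```

This is exactly the check the failover scenario in the Background section is missing: the new leader in AZ2 would first record tasks from AZ1 and then refuse to launch duplicates.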
#### 2.3 Intelligent Task Placement
- **Cluster Selection Strategy**: Implement algorithms to select optimal
cluster for new tasks
- **Load Balancing**: Distribute tasks across clusters based on capacity and
load
- **Affinity Rules**: Support cluster-specific constraints and preferences
- **Resource Awareness**: Consider cluster resources and availability
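One possible selection strategy for the placement bullets above is "most free capacity wins". The sketch below is a hedged illustration, not a proposed final algorithm; the free-slot counts are assumed to come from the health monitor described earlier.

```java
import java.util.Map;

// Hypothetical capacity-aware cluster selection: among live clusters,
// pick the one with the most free task slots.
class ClusterSelector
{
  static String select(Map<String, Integer> freeSlotsByLiveCluster)
  {
    return freeSlotsByLiveCluster.entrySet().stream()
        .filter(e -> e.getValue() > 0)      // skip clusters that are full
        .max(Map.Entry.comparingByValue())  // load balancing: most headroom wins
        .map(Map.Entry::getKey)
        .orElseThrow(() -> new IllegalStateException("no cluster has free capacity"));
  }

  public static void main(String[] args)
  {
    // AZ2 has more headroom, so the next task lands there
    System.out.println(select(Map.of("az1", 2, "az2", 7)));
  }
}
```

Affinity rules would slot in naturally as additional `filter` steps before the `max` comparison.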
#### 2.4 Enhanced Task Lifecycle Management
- **Cross-Cluster Monitoring**: Monitor task status across all clusters
- **Unified Control Plane**: Provide single interface for task management
regardless of cluster
- **Graceful Migration**: Support task migration between clusters when needed
- **Cleanup Coordination**: Ensure proper cleanup of tasks across all
clusters
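The cleanup-coordination bullet could look something like the following sketch. `CleanupCoordinator` and `deleteJob` are hypothetical stand-ins; the real delete would be a per-cluster Kubernetes job deletion, and the task-to-cluster map would be built during restore/submit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical cleanup coordination: when a task finishes, delete its job from
// the one cluster that actually hosts it, and make repeated calls idempotent.
class CleanupCoordinator
{
  private final Map<String, String> taskToCluster;

  CleanupCoordinator(Map<String, String> taskToCluster)
  {
    this.taskToCluster = new HashMap<>(taskToCluster);
  }

  // deleteJob stands in for the per-cluster Kubernetes delete call;
  // returns true if a delete was actually issued.
  boolean cleanup(String taskId, BiConsumer<String, String> deleteJob)
  {
    String cluster = taskToCluster.remove(taskId);
    if (cluster == null) {
      return false; // already cleaned up or never known: no blind cross-cluster deletes
    }
    deleteJob.accept(cluster, taskId);
    return true;
  }
}
```

Targeting only the hosting cluster avoids issuing blanket delete calls against every registered cluster on each task completion.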
### Key Components
#### 2.5 Configuration Management
- **Multi-Cluster Config**: Extend configuration to support multiple cluster
definitions
- **Context Management**: Handle different authentication contexts for each
cluster
- **Namespace Strategy**: Define namespace usage across clusters
- **Security Policies**: Implement cluster-specific security and access
controls
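As a rough illustration of what the multi-cluster configuration might look like, here is a sketch in Druid's `runtime.properties` style. Every property name below is purely illustrative and does not exist in Druid today:

```properties
# Hypothetical multi-cluster runner configuration (illustrative names only)
druid.indexer.runner.type=multiK8s
druid.indexer.runner.clusters=az1,az2

# Per-cluster authentication context and namespace strategy
druid.indexer.runner.cluster.az1.kubeconfig=/etc/druid/kubeconfig-az1.yaml
druid.indexer.runner.cluster.az1.namespace=druid-tasks
druid.indexer.runner.cluster.az2.kubeconfig=/etc/druid/kubeconfig-az2.yaml
druid.indexer.runner.cluster.az2.namespace=druid-tasks
```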
#### 2.6 Monitoring and Observability
- **Cross-Cluster Metrics**: Aggregate metrics from all clusters
- **Unified Logging**: Provide consolidated view of task execution across
clusters
- **Alerting**: Alert on cluster-specific or cross-cluster issues
- **Dashboard Integration**: Extend existing monitoring to show
multi-cluster view
#### 2.7 Fault Tolerance
- **Cluster Isolation**: Ensure failure in one cluster doesn't affect others
- **Task Recovery**: Implement recovery mechanisms for tasks in failed
clusters
- **Leader Election**: Ensure only one Overlord manages tasks at a time
- **Consistency Guarantees**: Maintain data consistency across cluster
boundaries
### Benefits
#### 2.8 Operational Benefits
- **High Availability**: Tasks continue running even if individual clusters
fail
- **Geographic Distribution**: Leverage multiple IDCs for better performance
- **Resource Optimization**: Better utilization of compute resources across
clusters
- **Disaster Recovery**: Natural disaster recovery through geographic
distribution
#### 2.9 Technical Benefits
- **Scalability**: Scale beyond single cluster limitations
- **Flexibility**: Choose optimal cluster for each task based on requirements
- **Resilience**: Improved fault tolerance and recovery capabilities
- **Efficiency**: Better resource utilization and load distribution
### Implementation Considerations
Implement a new Kubernetes task runner, `MultiKubernetesClusterTaskRunner`, which
extends the existing `KubernetesTaskRunner` and overrides the task restore/submit
behaviour.
This task runner initializes a separate Kubernetes client API for each target
cluster from its provided kubeconfig, and restores running tasks from all of
these clients.
When a new task is submitted, it picks a live Kubernetes cluster and uses the
corresponding client API to create the task on that cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]