This is an automated email from the ASF dual-hosted git repository.

DImuthuUpe pushed a commit to branch allocation-dm
in repository https://gitbox.apache.org/repos/asf/airavata-custos.git
commit 087075dca149e5038bee0bf07fa69cb960a43fd0
Author: DImuthuUpe <[email protected]>
AuthorDate: Sat May 16 09:02:37 2026 -0400

    Allocation data models and documentation
---
 docs/Allocation-Data-Models.md | 109 +++++++++++++++++++++++++++++++++++++++++
 docs/allocation-dm.png         | Bin 0 -> 2740751 bytes
 pkg/models/allocation.go       |  99 +++++++++++++++++++++++++++++++++++++
 pkg/models/project.go          |  27 ++++++++++
 4 files changed, 235 insertions(+)

diff --git a/docs/Allocation-Data-Models.md b/docs/Allocation-Data-Models.md
new file mode 100644
index 000000000..b15f61c3f
--- /dev/null
+++ b/docs/Allocation-Data-Models.md
@@ -0,0 +1,109 @@
+# Compute Allocation Management — Data Models
+
+## Overview
+
+This document describes the data models that power a compute allocation management system. The system manages how projects receive, track, and consume compute resources (GPUs, CPUs, etc.) across clusters, with full auditability and fine-grained access control.
+
+The central abstraction is the **Service Unit (SU)** — a common currency that normalizes heterogeneous resources (GPU hours, CPU hours, etc.) into a single comparable unit.
+
+---
+
+## Core Concepts
+
+### Users, Organizations, and Projects
+
+A **User** belongs to an **Organization** and is identified by name and email. Users interact with the system in two roles:
+
+- **Project PI (Principal Investigator):** A user referenced by `Project.ProjectPIID` who owns and is responsible for a project.
+- **Allocation Member:** A user added to a specific compute allocation via `ComputeAllocationMembership`, granting them permission to submit jobs against that allocation's SU budget.
+
+A **Project** groups one or more compute allocations (and, in the future, storage allocations) under a single umbrella. Projects carry an `Origination` field indicating the source system (ACCESS, NAIRR, XRASS, etc.) and a corresponding `OriginatedID` for cross-referencing.
+
+### Compute Clusters and Allocations
+
+A **ComputeCluster** represents a physical or logical cluster where resources are provisioned.
+
+A **ComputeAllocation** is the primary record linking a project to a cluster. It captures:
+
+- The cluster where resources live (`ComputeClusterID`).
+- An initial SU budget (`InitialSUAmount`) that covers all resource types within the allocation.
+- A validity window (`StartTime` / `EndTime`).
+- A lifecycle status (`ACTIVE`, `INACTIVE`, `DELETED`).
+
+A single project can bundle multiple compute allocations — for example, one allocation on Cluster A with GPU resources and another on Cluster B with CPU resources.
+
+### Resources and Rate Conversion
+
+**ComputeAllocationResource** represents a specific type of computing unit available within a cluster — for example, "GPU B200", "GPU RTX6000", or "CPU". Each resource records its type and the quantity allocated.
+
+Resources are linked to allocations through **ComputeAllocationResourceMapping**, which is a many-to-many join: a single allocation can include multiple resource types, and the same resource definition can appear across allocations.
+
+**ComputeAllocationResourceRate** defines the SU conversion rate for a given resource over a time window. This is the mechanism that normalizes raw resource consumption into the common SU currency. For example:
+
+| Resource     | Rate          | Meaning                        |
+|--------------|---------------|--------------------------------|
+| GPU H200     | 10.0          | 10 GPU-hours = 1 SU            |
+| CPU          | 100.0         | 100 CPU-hours = 1 SU           |
+| GPU RTX6000  | 20.0          | 20 GPU-hours = 1 SU            |
+
+Rates are time-bounded (`StartTime` / `EndTime`), allowing rates to change over time without losing historical accuracy.
+
+### Tracking Changes — Diffs and Change Requests
+
+All modifications to a compute allocation are captured as **ComputeAllocationDiff** records, providing a complete audit trail. A diff records what changed (SU amount, status, etc.), when it changed, and why.
+
+Diffs are created through two paths:
+
+1. **User-initiated changes:** A user submits a **ComputeAllocationChangeRequest** (e.g., requesting additional SUs or a status change). A resource provider admin reviews the request, and upon approval, a corresponding `ComputeAllocationDiff` is generated for the target allocation. The request carries a lifecycle of its own (`PENDING` → `APPROVED` / `REJECTED`), tracked through **ComputeAllocationChangeRequestEvent** records.
+
+2. **Automated workflows:** Systems such as ACCESS AIME can create `ComputeAllocationDiff` records directly, bypassing the change request flow. This supports programmatic adjustments like periodic SU top-ups or automatic deactivation.
+
+### Usage Recording
+
+**ComputeAllocationUsage** tracks resource consumption at the most granular level — per job, per user, per resource type. Each record captures both the raw amount consumed (e.g., 20 GPU-hours) and the equivalent SU cost (calculated using the effective rate at `CalculatedTime`).
+
+Aggregating all `ComputeAllocationUsage` records for a given allocation yields the total SU consumption, which can be compared against the allocation's SU budget to determine remaining balance.
+
+### Membership and Per-User SU Limits
+
+**ComputeAllocationMembership** controls which users can submit jobs against an allocation. Each membership has its own validity window and status, independent of the parent allocation.
+
+By default, members of an allocation inherit access to the full SU pool. However, administrators can enforce per-user caps by setting the `AllocationAmount` field on a membership record. This partitions a large allocation across members — for example, giving one researcher 500 SUs and another 300 SUs out of a 1,000 SU allocation — preventing any single user from exhausting the shared budget.
+
+### Multi-Level Status Control
+
+Allocation state can be controlled from three independent levels:
+
+| Level | Controlled By | Effect |
+|------------------|--------------------------------------------------|-----------------------------------------------------|
+| **Project** | Project status / PI actions | Disabling a project disables all its allocations. |
+| **Allocation** | `ComputeAllocation.Status` | An individual allocation can be deactivated independently. |
+| **User** | `ComputeAllocationMembership.MembershipStatus` | A specific user's access can be revoked without affecting the allocation or other members. |
+
+This layered approach provides flexibility: an admin can freeze an entire project, pause a single allocation, or remove one user's access — each without disturbing the other levels.
+
+---
+
+## Entity Relationship Summary
+
+
+
+---
+
+## Model Reference
+
+| Model | Purpose |
+|----------------------------------------|----------------------------------------------------------------|
+| `Organization` | Groups users under an institution. |
+| `User` | A person who can be a PI or allocation member. |
+| `Project` | Bundles allocations; linked to an origination system. |
+| `ComputeCluster` | A cluster where resources are provisioned. |
+| `ComputeAllocation` | SU budget for a project on a specific cluster. |
+| `ComputeAllocationResource` | A specific resource type (GPU model, CPU, etc.). |
+| `ComputeAllocationResourceMapping` | Links resources to allocations (many-to-many). |
+| `ComputeAllocationResourceRate` | SU conversion rate for a resource, time-bounded. |
+| `ComputeAllocationDiff` | Audit record of any change to an allocation. |
+| `ComputeAllocationChangeRequest` | User-submitted request to modify an allocation. |
+| `ComputeAllocationChangeRequestEvent` | Lifecycle events on a change request. |
+| `ComputeAllocationUsage` | Per-job, per-user resource consumption record. |
+| `ComputeAllocationMembership` | User access to an allocation, with optional SU cap. |
\ No newline at end of file
diff --git a/docs/allocation-dm.png b/docs/allocation-dm.png
new file mode 100644
index 000000000..c05d2a929
Binary files /dev/null and b/docs/allocation-dm.png differ
diff --git a/pkg/models/allocation.go b/pkg/models/allocation.go
new file mode 100644
index 000000000..f024dbdf2
--- /dev/null
+++ b/pkg/models/allocation.go
@@ -0,0 +1,99 @@
+package models
+
+import "time"
+
+type AllocationStatus string
+
+const (
+	ACTIVE AllocationStatus = "ACTIVE"
+	INACTIVE AllocationStatus = "INACTIVE"
+	DELETED AllocationStatus = "DELETED"
+)
+
+type ComputeCluster struct {
+	ID string `json:"id"`
+	Name string `json:"name"` // A human-readable name for the compute cluster, e.g., "Cluster A", "Cluster B", etc.
+}
+
+type ComputeAllocation struct {
+	ID string `json:"id"`
+	ProjectID string `json:"project_id"`
+	Name string `json:"name"`
+	Status AllocationStatus `json:"status"` // ACTIVE, INACTIVE, DELETED, etc.
+	ComputeClusterID string `json:"compute_cluster_id"` // The ID of the compute cluster where the allocation is provisioned.
+	InitialSUAmount int64 `json:"initial_su_amount"` // SUs allocated at the time of allocation creation.
+	StartTime time.Time `json:"start_time"`
+	EndTime time.Time `json:"end_time"`
+}
+
+type ComputeAllocationResource struct {
+	ID string `json:"id"`
+	Name string `json:"name"` // A human-readable name for the resource, e.g., "GPU B200", "CPU", "GPU RTX6000", etc.
+	ResourceType string `json:"resource_type"` // CPU, GPU, etc.
+	ResourceAmount int64 `json:"resource_amount"` // Number of CPUs, GPUs, etc. allocated.
+}
+
+type ComputeAllocationResourceMapping struct {
+	ID string `json:"id"`
+	ComputeAllocationID string `json:"compute_allocation_id"`
+	ComputeAllocationResourceID string `json:"compute_allocation_resource_id"`
+}
+
+type ComputeAllocationResourceRate struct {
+	ID string `json:"id"`
+	ComputeAllocationResourceID string `json:"compute_allocation_resource_id"`
+	Rate float64 `json:"rate"` // The rate for the resource in SUs per unit, e.g., 0.5 SU per CPU hour, 2 SU per GPU hour, etc.
+	StartTime time.Time `json:"start_time"` // The time when this rate becomes effective.
+	EndTime time.Time `json:"end_time"` // The time when this rate expires.
+}
+
+type ComputeAllocationDiff struct { // A diff is created either through a change request or by an automated workflow like ACCESS AIME.
+	ID string `json:"id"`
+	ComputeAllocationID string `json:"compute_allocation_id"`
+	DiffType string `json:"diff_type"` // "USAGE_UPDATE", "ALLOCATION_STATUS_CHANGE", etc.
+	NewSUAmount int64 `json:"new_su_amount"` // New allocation amount in SUs, e.g., 900 SUs, etc.
+	Status AllocationStatus `json:"status"` // ACTIVE, INACTIVE, DELETED, etc.
+	Timestamp time.Time `json:"timestamp"` // The time when the diff was generated.
+	Description string `json:"description,omitempty"` // Optional description of the diff, e.g., "SU usage updated based on job completion", "Allocation marked as INACTIVE due to end time reached", etc.
+}
+
+type ComputeAllocationChangeRequest struct { // Represents a request from users or admins to change the allocation, e.g., requesting more SUs or a reduction in SUs.
+	ID string `json:"id"`
+	ComputeAllocationID string `json:"compute_allocation_id"`
+	RequestedSUAmount int64 `json:"requested_su_amount"` // The requested allocation amount in SUs, e.g., 1200 SUs, etc.
+	RequestedStatus AllocationStatus `json:"requested_status"` // ACTIVE, INACTIVE, DELETED, etc.
+	Reason string `json:"reason"` // The reason for the change request, e.g., "Need more SUs for upcoming jobs", "Requesting reduction in SUs due to project completion", etc.
+	ChangeStatus string `json:"change_status"` // "PENDING", "APPROVED", "REJECTED", etc.
+	RequesterID string `json:"requester_id"` // The ID of the user who made the change request.
+	ApproverID string `json:"approver_id,omitempty"` // The ID of the user who approved/rejected the change request, if applicable.
+	Timestamp time.Time `json:"timestamp"` // The time when the change request was made.
+}
+
+type ComputeAllocationChangeRequestEvent struct {
+	ID string `json:"id"`
+	ComputeAllocationChangeRequestID string `json:"compute_allocation_change_request_id"`
+	EventType string `json:"event_type"` // "CREATED", "APPROVED", "REJECTED", etc.
+	Description string `json:"description,omitempty"` // Optional description of the event, e.g., "Change request created by user", "Change request approved by admin", etc.
+	Timestamp time.Time `json:"timestamp"` // The time when the event occurred.
+}
+
+type ComputeAllocationUsage struct { // Represents the usage of a compute allocation, e.g., when a job consumes some of the allocated SUs, etc.
+	ID string `json:"id"`
+	ComputeAllocationID string `json:"compute_allocation_id"`
+	UsedRawAmount int64 `json:"used_raw_amount"` // The raw amount of resource used, e.g., 20 CPU hours, 10 GPU hours, etc.
+	UsedSUAmount int64 `json:"used_su_amount"` // SUs used by the allocation, e.g., 200 SUs, etc.
+	CalculatedTime time.Time `json:"last_updated"` // The last time the usage was updated. SU should be calculated up to this point in time and charge rates should be applied based on the rates effective at this time.
+	UserID string `json:"user_id"` // The ID of the user who used the allocation.
+	JobID string `json:"job_id"` // The ID of the job that consumed the allocation.
+	ComputeAllocationResourceID string `json:"compute_allocation_resource_id"` // The ID of the specific resource consumed, e.g., the resource named "GPU B200" or "CPU".
+}
+
+type ComputeAllocationMembership struct {
+	ID string `json:"id"`
+	ComputeAllocationID string `json:"compute_allocation_id"`
+	UserID string `json:"user_id"`
+	AllocationAmount int64 `json:"allocation_amount"` // Optional per-user cap in SUs, e.g., 500 SUs out of a 1,000 SU allocation.
+	StartTime time.Time `json:"start_time"`
+	EndTime time.Time `json:"end_time"`
+	MembershipStatus AllocationStatus `json:"membership_status"` // ACTIVE, INACTIVE, etc.
+}
diff --git a/pkg/models/project.go b/pkg/models/project.go
new file mode 100644
index 000000000..99841b6f6
--- /dev/null
+++ b/pkg/models/project.go
@@ -0,0 +1,27 @@
+package models
+
+import "time"
+
+type Project struct {
+	ID string `json:"id"`
+	OriginatedID string `json:"originated_id"` // The ID of the project in origination. For example: ACCESS Record ID.
+	Title string `json:"title"`
+	Origination string `json:"origination"` // ACCESS, NAIRR, XRASS, etc.
+	ProjectPIID string `json:"project_pi_id"`
+	CreatedTime time.Time `json:"created_time"`
+}
+
+type Organization struct {
+	ID string `json:"id"`
+	OriginatedID string `json:"originated_id"` // The ID of the organization in origination. For example: ACCESS Record ID.
+	Name string `json:"name"`
+}
+
+type User struct {
+	ID string `json:"id"`
+	OrganizationID string `json:"organization_id"`
+	FirstName string `json:"first_name"`
+	LastName string `json:"last_name"`
+	MiddleName string `json:"middle_name,omitempty"`
+	Email string `json:"email"`
+}
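
The rate conversion and remaining-balance calculation described in docs/Allocation-Data-Models.md could look roughly like the sketch below. It is a hypothetical illustration and not part of this commit: the types `ResourceRate` and `Usage` and the functions `EffectiveRate`, `SUFromRaw`, and `RemainingSU` are invented for the example and mirror only a few fields of `ComputeAllocationResourceRate` and `ComputeAllocationUsage`. The sketch assumes the rate means raw resource-hours per SU, as in the documentation table (10 GPU-hours = 1 SU). The comment on `Rate` in allocation.go describes the inverse (SUs per unit), in which case the division would become a multiplication. Half-open rate windows and truncation to whole SUs are further assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// ResourceRate is a hypothetical, trimmed-down stand-in for
// ComputeAllocationResourceRate.
type ResourceRate struct {
	ResourceID string
	Rate       float64 // assumption: raw resource-hours per 1 SU
	StartTime  time.Time
	EndTime    time.Time
}

// Usage is a hypothetical, trimmed-down stand-in for ComputeAllocationUsage.
type Usage struct {
	ResourceID     string
	UsedRawAmount  int64
	UsedSUAmount   int64
	CalculatedTime time.Time
}

// EffectiveRate returns the rate for resourceID whose validity window
// contains at. Assumption: windows are half-open, [StartTime, EndTime).
func EffectiveRate(rates []ResourceRate, resourceID string, at time.Time) (float64, bool) {
	for _, r := range rates {
		if r.ResourceID == resourceID && !at.Before(r.StartTime) && at.Before(r.EndTime) {
			return r.Rate, true
		}
	}
	return 0, false
}

// SUFromRaw converts raw resource-hours into SUs by dividing by the rate.
// The result is truncated to whole SUs (an assumption); a non-positive
// rate yields 0 rather than dividing by zero.
func SUFromRaw(rawHours int64, rate float64) int64 {
	if rate <= 0 {
		return 0
	}
	return int64(float64(rawHours) / rate)
}

// RemainingSU subtracts the summed UsedSUAmount of all usage records
// from the allocation's SU budget.
func RemainingSU(budgetSU int64, usages []Usage) int64 {
	remaining := budgetSU
	for _, u := range usages {
		remaining -= u.UsedSUAmount
	}
	return remaining
}

func main() {
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	rates := []ResourceRate{
		{ResourceID: "gpu-h200", Rate: 10.0, StartTime: start, EndTime: start.AddDate(1, 0, 0)},
	}

	// A job that consumed 20 GPU-hours three months into the rate window.
	at := start.AddDate(0, 3, 0)
	rate, ok := EffectiveRate(rates, "gpu-h200", at)
	if !ok {
		fmt.Println("no effective rate for this resource and time")
		return
	}
	u := Usage{ResourceID: "gpu-h200", UsedRawAmount: 20, CalculatedTime: at}
	u.UsedSUAmount = SUFromRaw(u.UsedRawAmount, rate)

	fmt.Println("SUs charged:", u.UsedSUAmount)
	fmt.Println("SUs remaining:", RemainingSU(1000, []Usage{u}))
}
```

Looking up the rate at `CalculatedTime` keeps older usage records priced at the rate in force when they were computed, which is what the time-bounded rate windows are meant to allow.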
