Hi Apache Airavata community,

My name is Boyang Gong, and I am participating in Google Summer of Code
2026 with Apache Airavata. I wanted to send an initial update to introduce
myself, share my current project direction, and start a public thread
where I can post progress updates for the community.

My GSoC project page is available here:
https://summerofcode.withgoogle.com/programs/2026/projects/13nE0cqE

At this stage, my focus is shifting toward researching checkpoint/restart
support for both CPU and GPU workloads. The goal is to understand how
running applications, especially NVIDIA GPU (CUDA) workloads, can be
checkpointed and later restored, so that long-running jobs can continue
from a saved state.

As a starting point for this literature and technical study, I will be
reviewing the following resources:
- NVIDIA blog on checkpointing CUDA applications with CRIU:
https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/
- CRIUgpu paper: https://arxiv.org/html/2502.16631v1

My current understanding is that CRIU (Checkpoint/Restore In Userspace)
can checkpoint the CPU-side process state of a running application, while
NVIDIA's CUDA checkpointing support (the cuda-checkpoint utility) handles
the GPU-side CUDA state. I will first focus on understanding the basic
mechanism and workflow for checkpointing and restoring CUDA applications.
After that, I plan to explore how this capability could fit into the
broader Airavata ecosystem, including possible future integration with
Linkspan.

This initial plan is based on recent discussions with my mentors, and I
expect the details to evolve as I learn more and receive further guidance.
I am looking forward to contributing to Apache Airavata and learning from
the community.

Best regards,
Boyang Gong