On 3/14/26 21:36, Vatsal Darji wrote:

> Hello mentors,
>
> I am writing to share my progress on Experiment-0. I am happy to 
> report that all four phases have been completed successfully. Please 
> find the detailed report attached.
>
> Here is a brief summary of the results:
>
> Phase 1: OpenMP
> Cloned GCC from the devel/omp/gcc-15 branch, built libgomp from 
> source, and compiled the UA benchmark (Class A) from NPB v3.4.4 
> against the custom libgomp. The benchmark ran successfully with 
> Verification = SUCCESSFUL in 27.45 seconds using 4 OpenMP threads.
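>
> The wiring for such a run can be sketched as follows (the install
> prefix /opt/gcc-omp/lib64 and the binary path ./ua.A.x below are
> placeholders, not the paths from the report): point the dynamic
> loader at the custom libgomp and pin the OpenMP thread count before
> launching the benchmark.

```python
import os
import shlex

# Placeholder paths -- substitute the actual libgomp install prefix
# and the NPB build output.
CUSTOM_LIBGOMP_DIR = "/opt/gcc-omp/lib64"
BENCHMARK = "./ua.A.x"

def build_run_env(threads: int = 4) -> dict:
    """Environment that makes the benchmark pick up the custom libgomp."""
    env = dict(os.environ)
    # Prepend so the freshly built libgomp.so shadows the system copy.
    env["LD_LIBRARY_PATH"] = CUSTOM_LIBGOMP_DIR + ":" + env.get("LD_LIBRARY_PATH", "")
    env["OMP_NUM_THREADS"] = str(threads)
    return env

env = build_run_env(threads=4)
# Shown as a command line for readability; an actual launch would be
# subprocess.run([BENCHMARK], env=env, check=True).
print(shlex.join(["env",
                  "LD_LIBRARY_PATH=" + env["LD_LIBRARY_PATH"],
                  "OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"],
                  BENCHMARK]))
```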
>
> Phase 2: Virtualization
> Booted two QEMU/KVM VMs on the same host with a 1:2 pCPU:vCPU 
> oversubscription ratio. Wrote a host-side script that simultaneously 
> launched the UA benchmark on VM-1 and a stress-ng CPU load on VM-2 
> over SSH. Under contention, the benchmark's runtime increased from 
> 27.45 s to 121.53 s, a ~4.4x slowdown, which directly demonstrates 
> the cost of vCPU oversubscription for OpenMP execution.
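>
> The slowdown figure can be recomputed directly from the two
> wall-clock times:

```python
baseline_s = 27.45    # UA Class A, uncontended
contended_s = 121.53  # UA Class A, with stress-ng running on VM-2

slowdown = contended_s / baseline_s
print(f"slowdown under contention: {slowdown:.2f}x")  # ~4.43x
```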
>
> Phase 3: Tracing
> Installed trace-cmd on the host and both VMs. Captured sched_switch 
> events across all three machines simultaneously during Experiment-0. 
> Analyzed the resulting traces in KernelShark, where the preemption of 
> ua.A.x threads is clearly visible as gaps in the timeline, correlating 
> with high scheduling activity from the competing VM-2 workload.
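>
> The gap analysis itself can be illustrated with a toy example (the
> timestamps below are made up, not taken from the captured traces):
> whenever the interval between consecutive sched-in events of a
> ua.A.x thread far exceeds its usual spacing, the thread was sitting
> preempted.

```python
# Made-up sched-in timestamps (seconds) for one ua.A.x thread,
# standing in for what trace-cmd / KernelShark would show.
sched_in = [0.000, 0.004, 0.008, 0.150, 0.154, 0.158, 0.310, 0.314]

GAP_THRESHOLD_S = 0.050  # treat >50 ms off-CPU as a preemption gap

gaps = [(a, b) for a, b in zip(sched_in, sched_in[1:])
        if b - a > GAP_THRESHOLD_S]
for start, end in gaps:
    print(f"preempted ~{(end - start) * 1e3:.0f} ms "
          f"(t={start:.3f}s -> t={end:.3f}s)")
```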
>
> Phase 4: Scheduler / eBPF
> Cloned the stable Linux kernel source and located the sched_switch 
> and sched_stat_runtime tracepoints in it. Wrote an eBPF program in C 
> that attaches to both tracepoints and computes the average number of 
> context switches per scheduler tick. Checked for GCC's BPF back-end 
> availability via gcc -print-targets, and compiled the final eBPF 
> object with Clang's BPF back-end (gcc -target bpf failed, since 
> -target is Clang-specific syntax; details are in the report). Re-ran 
> Experiment-0 with the eBPF program loaded and observed:
>
>     Total context switches : 304,852
>     Total scheduler ticks  : 865,909
>     Avg context switches / tick : 0.3521
>
> The 0.3521 ratio confirms measurable scheduler pressure caused by 
> oversubscription. In an uncontended system this figure would be close 
> to zero; these ~0.35 switches per tick represent overhead to which 
> libgomp's barrier synchronization is currently blind.
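>
> The headline figure follows directly from the two counters:

```python
total_switches = 304_852  # sched_switch events seen by the eBPF program
total_ticks = 865_909     # scheduler ticks over the same window

avg_per_tick = total_switches / total_ticks
print(f"avg context switches per tick: {avg_per_tick:.4f}")  # 0.3521
```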
>
> Additional Study
> Alongside Experiment-0, I have been studying the libgomp functions you 
> recommended: gomp_dynamic_max_threads(), gomp_team_start(), 
> gomp_thread_start(), and the barrier synchronization implementation in 
> wait.h, bar.c, and bar.h. I have also been reading your thesis 
> (placeholder version) which has been extremely helpful in connecting 
> the codebase with the problem space. I will continue studying both as 
> I begin working on my GSoC application.
>
> The attached report contains the full step-by-step documentation, all 
> commands, outputs, and analysis for each phase, along with the 
> screenshots and proof of each result.
>
> Please let me know if you have any questions or if there is anything 
> you would like me to revisit or explore further.
>
> Best regards,
> Vatsal Darji

Hi Vatsal,

Thank you for the very detailed and well-organized report. I think at 
this point you have enough familiarity with the problem space, so I 
suggest that you start working on your GSoC proposal. I am interested 
in taking a look at your usage of the BPF map in the Experiment-0 task. 
It is okay to share the eBPF code and your GSoC proposal drafts with 
the mentors only, as other students are also working on these competing 
tasks.

Thanks for reporting the failure you hit when using GCC to compile your 
eBPF code. Please note that we have a strong preference for using the 
GCC eBPF back-end for Phantom Tracker, so in your proposal I look 
forward to reading your take on troubleshooting the failure you 
encountered. Please also note that you are more than welcome to ask 
questions and discuss design choices with the mentors before finalizing 
your proposal.

Good luck,
Himadri


