jprotze wrote:

> The nested `target_data_op` can actually cause issues, as demonstrated with 
> our dumping tool:
> 
> ```
> [0][callback_target_data_op_emi] endpoint = begin | target_task_data->value = 
> 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = 
> 888000003 (0x76723fc13798) | optype = memset | src_addr = (nil) | 
> src_device_num = -1 | dest_addr = 0x767234200000 | dest_device_num = 0 | 
> bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a
> [0][callback_target_data_op_emi] endpoint = begin | target_task_data->value = 
> 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = 
> 888000004 (0x76723fc13798) | optype = transfer_to_device | src_addr = 
> 0x5fc5e51cb5f0 | src_device_num = -1 | dest_addr = 0x767234200000 | 
> dest_device_num = 0 | bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a
> [0][callback_target_data_op_emi] endpoint = end | target_task_data->value = 
> 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = 
> 888000004 (0x76723fc13798) | optype = transfer_to_device | src_addr = 
> 0x5fc5e51cb5f0 | src_device_num = -1 | dest_addr = 0x767234200000 | 
> dest_device_num = 0 | bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a
> [0][callback_target_data_op_emi] endpoint = end | target_task_data->value = 
> 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = 
> 888000004 (0x76723fc13798) | optype = memset | src_addr = (nil) | 
> src_device_num = -1 | dest_addr = 0x767234200000 | dest_device_num = 0 | 
> bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a
> ```
> 
> The `host_op_id` when exiting `memset` corresponds to the value set by 
> `transfer_to_device` (`888[...]4`). This is incorrect and should be 
> `888[...]3` instead. By quickly glancing at the code, I'm not sure if the 
> runtime is able to handle such nested events at all currently, almost 
> certainly not for `HostOpId`.

Depending on the tool workflow, this wrong information might lead to memory 
leaks and double-frees. To handle this nested sequence of data-op callbacks, we 
would need a stack for the `HostOpId`. If we know, that it will never be more 
than two levels, we could cheat and make it a static-sized thread-local stack.

https://github.com/llvm/llvm-project/pull/194168
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to