From: Junyan He <junyan...@linux.intel.com>

This patch set enables profiling support. The profiling information looks like this:

  ------------------------- Log 0 --------------------------
  | fix functions id: 7   simd: 16   kernel id: 0          |
  | thread id: 0   EU id: 1   half slice id: 0             |
  | dispatch Mask: 1   prolog: 197   epilog: 6699          |
  | globalX: 4~ 4   globalY: 0~ 0   globalZ: 0~ 0          |
  | ts0 : 64         | ts1 : 0          | ts2 : 930        |
  | ts3 : 0          | ts4 : 1046       | ts5 : 1170       |
  | ts6 : 0          | ts7 : 0          | ts8 : 0          |
  | ts9 : 1624       | ts10: 1838       | ts11: 0          |
  | ts12: 2032       | ts13: 0          | ts14: 2312       |
  | ts15: 2560       | ts16: 0          | ts17: 0          |
  | ts18: 0          | ts19: 2972       |                  |
Each hardware thread creates one such log item. Prolog is the timestamp when the thread enters the kernel, and epilog is the timestamp when it finishes and leaves it. ts0~ts19 record the time offsets from the prolog, with the prolog taken as base 0. For now we only record the timestamps of the first 20 blocks. Once SourceToBinary is fully supported, we will be able to set a profiling point at any location.

V2:
1. Fix wrong GLOBAL XYZ values. Some curbe registers, such as lid0 and lid1, may already have expired by the time we reach the bottom block, which produces wrong global values.
2. Fix the wrong device id in the profiling info.
3. Fix the pointer size problem on BDW. Pointers are 8-byte values and dri_bo_emit_reloc writes 8 bytes, but the buffer pointers for printf and profiling were declared as 4 bytes, so the value next to each pointer in the curbe was overwritten, producing wrong results.
4. Place the prolog and epilog logic in the head and tail blocks. The old version placed the prolog at the beginning of the first block and the epilog in the second-to-last block, just before the return block. This could put the prolog and epilog under predication, but they must be executed unconditionally.
5. Improve the sub and add functions used for timestamp calculation. Starting with BDW the native long type is supported, so use it to make the calculation more efficient.

Known issues:
On BDW, some logs look like this:

  ------------------------- Log 5 --------------------------
  | fix functions id: 7   simd: 16   kernel id: 0          |
  | thread id: 0   EU id: 8   sub slice id: 1   slice id 0 |
  | dispatch Mask: 1   prolog: 28578   epilog: 15445       |
  | globalX: 4~ 4   globalY: 0~ 0   globalZ: 0~ 0          |
  | ts0 : 186        | ts1 : 0          | ts2 : 1504       |
  | ts3 : 0          | ts4 : 4294946425 | ts5 : 4294946637 |
  | ts6 : 0          | ts7 : 0          | ts8 : 0          |
  | ts9 : 4294947235 | ts10: 4294947491 | ts11: 0          |
  | ts12: 4294947645 | ts13: 0          | ts14: 4294947819 |
  | ts15: 4294947999 | ts16: 0          | ts17: 0          |
  | ts18: 0          | ts19: 4294948561 |                  |

These huge timestamps are really strange and clearly invalid.
The problem only shows up when many test cases are run together; when we run a single case on its own, we have never been able to reproduce it. It may be related to the hardware and does not cause any regressions, so I have chosen to fix it later.

Signed-off-by: Junyan He <junyan...@linux.intel.com>
---
 backend/src/CMakeLists.txt                      |   3 +
 backend/src/backend/gen8_context.cpp            |  24 +
 backend/src/backend/gen8_context.hpp            |   2 +
 backend/src/backend/gen_context.cpp             | 481 ++++++++++++++++++++
 backend/src/backend/gen_context.hpp             |   9 +
 .../src/backend/gen_insn_gen7_schedule_info.hxx |   2 +
 backend/src/backend/gen_insn_scheduling.cpp     |   4 +-
 backend/src/backend/gen_insn_selection.cpp      | 140 ++++++
 backend/src/backend/gen_insn_selection.hpp      |   8 +
 backend/src/backend/gen_insn_selection.hxx      |   2 +
 backend/src/backend/gen_program.cpp             |   9 +-
 backend/src/backend/gen_program.hpp             |   2 +-
 backend/src/backend/gen_register.hpp            |   9 +
 backend/src/backend/program.cpp                 |  35 +-
 backend/src/backend/program.h                   |  17 +
 backend/src/backend/program.hpp                 |  25 +-
 backend/src/gbe_bin_interpreter.cpp             |   4 +
 backend/src/ir/instruction.cpp                  |  96 +++-
 backend/src/ir/instruction.hpp                  |  26 ++
 backend/src/ir/instruction.hxx                  |   2 +
 backend/src/ir/lowering.cpp                     |   7 +
 backend/src/ir/profile.cpp                      |  19 +-
 backend/src/ir/profile.hpp                      |   8 +-
 backend/src/ir/profiling.cpp                    |  70 +++
 backend/src/ir/profiling.hpp                    | 132 ++++++
 backend/src/ir/unit.cpp                         |   6 +-
 backend/src/ir/unit.hpp                         |  10 +
 backend/src/llvm/llvm_gen_backend.cpp           |  49 +-
 backend/src/llvm/llvm_gen_backend.hpp           |   3 +
 backend/src/llvm/llvm_gen_ocl_function.hxx      |   5 +
 backend/src/llvm/llvm_profiling.cpp             | 210 +++++++++
 backend/src/llvm/llvm_to_gen.cpp                |   7 +-
 backend/src/llvm/llvm_to_gen.hpp                |   3 +-
 src/cl_command_queue.c                          |   8 +
 src/cl_command_queue_gen7.c                     |  37 ++
 src/cl_driver.h                                 |  16 +
 src/cl_driver_defs.c                            |   5 +
 src/cl_gbe_loader.cpp                           |  15 +
 src/cl_gbe_loader.h                             |   3 +
 src/intel/intel_gpgpu.c                         |  58 +++
 src/intel/intel_gpgpu.h                         |   3 +-
 41 files changed, 1553 insertions(+), 21 deletions(-)

-- 
1.7.9.5