On Thu, Sep 21, 2017 at 8:40 PM, Greg Clayton <clayb...@gmail.com> wrote:
>
>> On Sep 21, 2017, at 5:15 AM, Ramana <ramana.venka...@gmail.com> wrote:
>>
>> Sorry, I could not respond yesterday as I was out of the office.
>>
>>> Interesting. There are two ways to accomplish this:
>>> 1 - Treat the CPU as one target and the GPU as another.
>>> 2 - Treat the CPU and GPU as one target
>>>
>>> The tricky thing with solution #1 is how to manage switching the targets
>>> between the CPU and GPU when events happen (CPU stops, or GPU stops while
>>> the other is running or already stopped). We don't have any formal
>>> "cooperative targets" yet, but we know they will exist in the future
>>> (client/server, vm code/vm debug of vm code, etc) so we will be happy to
>>> assist with questions if and when you get there.
>>
>> I was going along with option #1. I will definitely post here with more
>> questions as I progress, thank you. Fortunately, the way OpenVX APIs
>> work is, after off-loading the tasks to GPU, they will wait for the
>> GPU to complete those tasks before continuing further. And in our
>> case, both CPU and GPU can be controlled separately. Given that, do
>> you think I still need to bother much about "cooperative targets"?
>
> If you just want to make two targets that know nothing about each other, then
> that is very easy. Is that what you were asking?
Probably I am not getting the significance of "cooperative targets" for our
setup at this point. I will get back after I dig a little deeper.
>
>>
>>> GPU debugging is tricky since they usually don't have a kernel or anything
>>> running on the hardware. Many examples I have seen so far will set a
>>> breakpoint in the program at some point by compiling the code with a
>>> breakpoint inserted, run to that breakpoint, and then if the user wants to
>>> continue, you recompile with breakpoints set at a later place and re-run the
>>> entire program again. Is your GPU any different?
>>
>>> We also discussed how to single step in a GPU program. Since multiple cores
>>> on the GPU are concurrently running the same program, there was discussion
>>> on how single stepping would work. If you are stepping and run into an
>>> if/then statement, do you walk through the if and the else at all times? One
>>> GPU professional was saying this is how GPU folks would want to see single
>>> stepping happen. So I think there is a lot of stuff we need to think about
>>> when debugging GPUs in general.
>>
>> Thanks for sharing that. Yeah, ours is a little different. Basically,
>> from the top level, the affinity in our case is per core of the GPU. I
>> am not there yet to discuss more on this.
>
> ok, let me know when you are ready to ask more questions.
>
>>
>>> So we currently have no cooperative targets in LLDB. This will be the first.
>>> We will need to discuss how hand off between the targets will occur and many
>>> other aspects. We will be sure to comment when and if you get to this point.
>>
>> Thank you. Will post more when I get there.
>
> Sounds good.
>>
>> Regards,
>> Ramana
>>
>> On Tue, Sep 19, 2017 at 8:56 PM, Greg Clayton <clayb...@gmail.com> wrote:
>>>
>>> On Sep 19, 2017, at 3:32 AM, Ramana <ramana.venka...@gmail.com> wrote:
>>>
>>> Thank you so much Greg for your comments.
>>>
>>> What architecture and os are you looking to support?
>>>
>>>
>>> The OS is Linux and the primary use scenario is remote debugging.
>>> Basically http://lists.llvm.org/pipermail/lldb-dev/2017-June/012445.html
>>> is what I am trying to achieve and unfortunately that query did not
>>> get much attention from the members.
>>>
>>>
>>> Sorry about missing that. I will attempt to address this now:
>>>
>>> I have to implement a debugger for our HW which comprises CPU+GPU where
>>> the GPU is coded in OpenCL and is accelerated through OpenVX API in C++
>>> application which runs on CPU. Our requirement is we should be able to
>>> debug the code running on both CPU and GPU simultaneously within the same
>>> LLDB debug session.
>>>
>>>
>>> Interesting. There are two ways to accomplish this:
>>> 1 - Treat the CPU as one target and the GPU as another.
>>> 2 - Treat the CPU and GPU as one target
>>>
>>> There are tricky areas for both, but for sanity I would suggest option #1.
>>>
>>> The tricky thing with solution #1 is how to manage switching the targets
>>> between the CPU and GPU when events happen (CPU stops, or GPU stops while
>>> the other is running or already stopped). We don't have any formal
>>> "cooperative targets" yet, but we know they will exist in the future
>>> (client/server, vm code/vm debug of vm code, etc) so we will be happy to
>>> assist with questions if and when you get there.
>>>
>>> Option #2 would be tricky as this would be the first target that has
>>> multiple architectures within one process. If the CPU and GPU can be
>>> controlled separately, then I would go with option #1 as LLDB currently
>>> always stops all threads in a process when any thread stops. You would also
>>> need to implement different register contexts for each thread within such a
>>> target. It hasn't been done yet, other than through the OS plug-ins that can
>>> provide extra threads to show in case you are doing some sort of user space
>>> threading.
>>>
>>> GPU debugging is tricky since they usually don't have a kernel or anything
>>> running on the hardware. Many examples I have seen so far will set a
>>> breakpoint in the program at some point by compiling the code with a
>>> breakpoint inserted, run to that breakpoint, and then if the user wants to
>>> continue, you recompile with breakpoints set at a later place and re-run the
>>> entire program again. Is your GPU any different? Since they will be used in
>>> an OpenCL context maybe your solution is better? We also had discussions on
>>> how to represent the various "waves" or sets of cores running the same
>>> program on the GPU. The easiest solution is to make one thread per distinct
>>> core on the GPU. The harder way would be to treat a thread as a collection
>>> of multiple cores and each variable value now can have one value per core.
>>>
>>> We also discussed how to single step in a GPU program. Since multiple cores
>>> on the GPU are concurrently running the same program, there was discussion
>>> on how single stepping would work. If you are stepping and run into an
>>> if/then statement, do you walk through the if and the else at all times? One
>>> GPU professional was saying this is how GPU folks would want to see single
>>> stepping happen. So I think there is a lot of stuff we need to think about
>>> when debugging GPUs in general.
>>>
>>> Looking at the mailing list archive I see that there were discussions about
>>> this feature in LLDB here
>>> http://lists.llvm.org/pipermail/lldb-dev/2014-August/005074.html.
>>>
>>> What is the present status of simultaneous multiple target debugging
>>> support in LLDB, i.e. what works today and what is to be improved? Were the
>>> changes contributed to LLDB mainstream?
>>>
>>>
>>> So we currently have no cooperative targets in LLDB. This will be the first.
>>> We will need to discuss how hand off between the targets will occur and many
>>> other aspects.
>>> We will be sure to comment when and if you get to this point.
>>>
>>> How can I access the material for http://llvm.org/devmtg/2014-10/#bof5
>>> (Future directions and features for LLDB)
>>>
>>> Over the years we have talked about this, but it never really got into any
>>> real amount of detail and I don't think the BoF notes will help you much.
>>>
>>> Appreciate any help/guidance provided on the same.
>>>
>>> I do believe approach #1 will work the best. The easiest thing you can do is
>>> to insulate LLDB from the GPU by putting it behind a GDB server boundary.
>>> Then we need to really figure out how we want to do GPU debugging.
>>>
>>> Hopefully this filled in your missing answers. Let me know what questions
>>> you have.
>>>
>>> Greg
>>>
>>> Thanks,
>>> Ramana
>>>
>>> On Mon, Sep 18, 2017 at 8:46 PM, Greg Clayton <clayb...@gmail.com> wrote:
>>>
>>> When supporting a new architecture, our preferred route is to modify
>>> lldb-server (a GDB server binary that supports native debugging) to support
>>> your architecture. Why? Because this gets you remote debugging for free. If
>>> you go this route, then you will subclass a
>>> lldb_private::NativeRegisterContext and that will get used by lldb-server
>>> (along with lldb_private::NativeProcessProtocol and
>>> lldb_private::NativeThreadProtocol). If you are adding a new architecture to
>>> Linux, then you will likely just need to subclass NativeRegisterContext.
>>>
>>> The other way to go is to subclass lldb_private::Process,
>>> lldb_private::Thread and lldb_private::RegisterContext.
>>>
>>> The nice thing about the lldb_private::Native* subclasses is that you only
>>> need to worry about native support.
>>> You can use #ifdef and use system header
>>> files, whereas with the non-native route, those classes need to be able to
>>> debug remotely and you can't rely on system headers (lldb_private::Process,
>>> lldb_private::Thread and lldb_private::RegisterContext) since they can be
>>> compiled on any system for possibly local debugging (if current
>>> arch/vendor/os matches the current system) and remote (if you use
>>> lldb-server or another form of RPC).
>>>
>>> I would highly suggest going the lldb-server route as then you can
>>> use system header files that contain the definitions of the registers and
>>> you only need to worry about the native architecture. Linux uses ptrace and
>>> has much of the common code factored out into the correct classes (posix
>>> ptrace, linux specifics, and more).
>>>
>>> What architecture and os are you looking to support?
>>>
>>> Greg Clayton
>>>
>>> On Sep 16, 2017, at 6:28 AM, Ramana <ramana.venka...@gmail.com> wrote:
>>>
>>> Thank you Greg for the detailed response.
>>>
>>> Can you please also shed some light on the NativeRegisterContext. When
>>> do we need to subclass NativeRegisterContext and (how) are they
>>> related to RegisterContext<OS>_<Arc
>>> It appears that not all architectures having
>>> RegisterContext<OS>_<Arch> have subclassed NativeRegisterContext.
>>>
>>> Regards,
>>> Ramana
>>>
>>> On Thu, Sep 14, 2017 at 9:02 PM, Greg Clayton <clayb...@gmail.com> wrote:
>>>
>>> Seems like this class was added for testing. RegisterInfoInterface is a
>>> class that creates a common API for getting lldb_private::RegisterInfo
>>> structures.
>>>
>>> A RegisterContext<OS>_<Arch> class uses one of these to be able to create a
>>> buffer large enough to store all registers defined in the
>>> RegisterInfoInterface and will actually read/write these registers to/from
>>> the debugged process. RegisterContext also caches register values so they
>>> don't get read multiple times when the process hasn't resumed.
>>> A RegisterContext subclass is needed for each architecture so we can
>>> dynamically tell LLDB what the registers look like for a given architecture.
>>> It also provides abstractions by letting each register define its register
>>> numbers for compilers, DWARF, and generic register numbers like PC, SP, FP,
>>> return address, and flags registers. This allows the generic part of LLDB to
>>> say "I need you to give me the PC register for this thread" and we don't
>>> need to know that the register is "eip" on x86, "rip" on x86_64, or "r15" on
>>> ARM. RegisterContext classes can also determine how registers are
>>> read/written: one at a time, or "get all general purpose regs" and "get all
>>> FPU regs". So if someone asks a RegisterContext to read the PC, it might go
>>> read all GPR regs and then mark them all as valid in the register context
>>> buffer cache, so if someone subsequently asks for SP, it will be already
>>> cached.
>>>
>>> So RegisterInfoInterface defines a common way that many RegisterContext
>>> classes can inherit from in order to give out the lldb_private::RegisterInfo
>>> (which is required by all subclasses of RegisterContext) info for a register
>>> context, and RegisterContext is the one that actually will interface with
>>> the debugged process in order to read/write and cache those registers as
>>> efficiently as possible for the current program being debugged.
>>>
>>> On Sep 12, 2017, at 10:59 PM, Ramana via lldb-dev <lldb-dev@lists.llvm.org>
>>> wrote:
>>>
>>> Hi,
>>>
>>> When deriving RegisterContext<OS>_<Arch>, why are some platforms (Arch+OS)
>>> deriving it from lldb_private::RegisterContext while others are
>>> deriving from lldb_private::RegisterInfoInterface? In other words,
>>> how do I decide on the base class to derive from between those two, and
>>> what are the implications?
>>>
>>> Thanks,
>>> Ramana
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev@lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
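
For readers of this thread, here is a minimal sketch of the caching behavior
Greg describes for RegisterContext: a request for a generic register such as PC
triggers one bulk "get all general purpose regs" read, and a later request for
SP is served from the cache until the process resumes. This is not LLDB code.
ToyRegisterContext, FakeTarget, GenericReg and the four-register layout are all
hypothetical names and assumptions made up for illustration, and real
lldb_private::RegisterContext subclasses have a much larger interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Generic register kinds that architecture-independent code asks for.
// (Hypothetical enum, not the LLDB one.)
enum class GenericReg { PC, SP, FP };

// Stand-in for the debugged process. It counts how many expensive
// "read all GPRs" round trips were made.
struct FakeTarget {
  std::array<std::uint64_t, 4> gprs{{0x1000, 0x7ff0, 0x7ff8, 42}};
  int bulk_reads = 0;

  std::array<std::uint64_t, 4> ReadAllGPRs() {
    ++bulk_reads;
    return gprs;
  }
};

// Toy register context for an imaginary architecture with four GPRs.
class ToyRegisterContext {
public:
  explicit ToyRegisterContext(FakeTarget &target) : m_target(target) {}

  // Map a generic kind to this architecture's register index, so callers
  // never need to know the architecture-specific register name.
  static std::size_t IndexFor(GenericReg kind) {
    switch (kind) {
    case GenericReg::PC: return 0;
    case GenericReg::SP: return 1;
    case GenericReg::FP: return 2;
    }
    assert(false && "unknown generic register");
    return 0;
  }

  // Read one register. On a cache miss, fetch the whole GPR set at once
  // and mark every GPR valid, so later reads need no further round trip.
  std::uint64_t Read(GenericReg kind) {
    if (!m_valid) {
      m_cache = m_target.ReadAllGPRs();
      m_valid = true;
    }
    return m_cache[IndexFor(kind)];
  }

  // Call when the process resumes: cached values can no longer be trusted.
  void InvalidateAll() { m_valid = false; }

private:
  FakeTarget &m_target;
  std::array<std::uint64_t, 4> m_cache{};
  bool m_valid = false;
};

// Usage: read PC, read SP, resume, read PC again, and report how many
// bulk reads that sequence cost.
int DemoBulkReadCount() {
  FakeTarget target;
  ToyRegisterContext ctx(target);
  ctx.Read(GenericReg::PC); // cache miss: one bulk read
  ctx.Read(GenericReg::SP); // served from the cache
  ctx.InvalidateAll();      // the process resumed and stopped again
  ctx.Read(GenericReg::PC); // cache miss: second bulk read
  return target.bulk_reads;
}
```

The sketch keeps a single validity flag for the whole GPR set because the bulk
read fills every entry at once. A context that reads registers one at a time
would presumably track validity per register instead.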