On Thu, Nov 12, 2015 at 12:19:50PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 05:58:56PM +0100, Martin Jambor wrote:
> > > But I don't see any way to disable it on the command line? (no switch?)
> >
> > No, the switch is -foffload, which has missing documentation (PR
> > 67300) and is only described at https://gcc.gnu.org/wiki/Offloading
> > Nevertheless, the option allows the user to specify compiler option
> > -foffload=disable and no offloading should happen, not even HSA. The
> > user can also enumerate just the offload targets they want (and pass
> > them special command line stuff).
> >
> > It seems I have misplaced a hunk in the patch series. Nevertheless,
> > in the first patch (with configuration stuff), there is a change to
> > opts.c which scans the -foffload= contents and sets the flag variable
> > if hsa is not present.
> >
> > Whenever the compiler has to decide whether HSA is enabled for the
> > given compilation or not, it has to look at this variable (if
> > configured for HSA).
>
> But what is the difference between
> -foffload=disable
> or
> -foffload={list not including hsa}
> and the new param? If you don't gridify, you don't emit any kernels...
>

We do. When a kernel cannot be gridified, we try to handle it via
dynamic parallelism (i.e. launching a kernel from a kernel). Even
though we have not been able to get any good performance with it and
there are several limitations and open problems, I still include this
option in my plans because it is unlikely we will be able to handle
complex scenarios without it (and I hope that as HSA evolves, it will
become a viable, even though most probably always a bit slower,
option).

Apart from the performance degradation, the biggest problem is that
currently HSA dynamic parallelism does not allow you to wait for the
completion of the child kernel in a straightforward way. There is a
hack that allowed us to do it, but by its nature it only allows depth
three dispatch (i.e. kernel->kernel->kernel). We are limiting
ourselves to depth two, at the moment.

Dynamic parallelism also requires non-trivial preparation at the CPU
side, and quite a few HSA characteristics of the to-be dispatched
kernels have to be known and passed to the GPU when the first kernel
is invoked. In our current scheme, we have to know the "dependencies"
of each kernel at compile-time, which is sometimes not possible, for
example if the second kernel is invoked from a function that is in a
different compilation unit than the first kernel.

As I said, I hope that with time we will be able to overcome all of
this, but at the moment, dynamic parallelism is clearly just an
experimental feature (that is why I suggested warning when not
gridifying).

I hope this answers the question and explains the situation a bit,

Martin
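
For illustration only, the -foffload= check described in the quoted
text (scan the option's contents and record whether hsa is present)
might look roughly like the sketch below. This is not the code from
the patch series or from GCC's opts.c. The function name is
hypothetical, and the argument syntax is an assumption: either the
word "disable", or a comma-separated target list optionally followed
by "=" and options for those targets. Other forms the real option may
accept are not modelled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, NOT GCC's actual implementation.  Returns true
   if the argument of -foffload= leaves HSA offloading enabled.  */
bool
offload_arg_enables_hsa (const char *arg)
{
  /* -foffload=disable turns off all offloading, HSA included.  */
  if (strcmp (arg, "disable") == 0)
    return false;

  /* Assumption: everything after the first '=' is options passed to
     the listed targets, so only the part before it names targets.  */
  const char *p = arg;
  const char *end = arg + strcspn (arg, "=");

  /* Walk the comma-separated target names looking for exactly "hsa".  */
  while (p < end)
    {
      const char *comma = memchr (p, ',', (size_t) (end - p));
      size_t len = comma ? (size_t) (comma - p) : (size_t) (end - p);

      if (len == 3 && memcmp (p, "hsa", 3) == 0)
        return true;
      if (comma == NULL)
        break;
      p = comma + 1;
    }

  /* A target list that does not mention hsa disables it.  */
  return false;
}
```

Under these assumptions, "disable" and "nvptx-none" both report HSA as
off, while "hsa" and "nvptx-none,hsa" report it as on. The compiler
would then consult the resulting flag wherever it has to decide
whether HSA is enabled for the compilation.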