On Thu, Nov 12, 2015 at 12:19:50PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 05:58:56PM +0100, Martin Jambor wrote:
> > > But I don't see any way to disable it on the command line?  (no switch?)
> > 
> > No, the switch is -foffload, which has missing documentation (PR
> > 67300) and is only described at https://gcc.gnu.org/wiki/Offloading .
> > Nevertheless, the option allows the user to specify the compiler
> > option -foffload=disable and no offloading should happen, not even
> > HSA.  The user can also enumerate just the offload targets they want
> > (and pass them special command line stuff).
> > 
> > It seems I have misplaced a hunk in the patch series.  Nevertheless,
> > in the first patch (with configuration stuff), there is a change to
> > opts.c which scans the -foffload= contents and sets the flag variable
> > if hsa is not present.
> > 
> > Whenever the compiler has to decide whether HSA is enabled for the
> > given compilation or not, it has to look at this variable (if
> > configured for HSA).
> 
> But what is the difference between
> -foffload=disable
> or
> -foffload={list not including hsa}
> and the new param?  If you don't gridify, you don't emit any kernels...
> 

We do.  When a kernel cannot be gridified, we try to handle it via
dynamic parallelism (i.e. launching a kernel from a kernel).  We have
not been able to get any good performance with it, and there are
several limitations and open problems.  I still include this option in
my plans because it is unlikely we will be able to handle complex
scenarios without it, and I hope that as HSA evolves, it will become a
viable option, even though most probably always a bit slower.

Apart from the performance degradation, the biggest problem is that
currently HSA dynamic parallelism does not allow you to wait for the
completion of the child kernel in a straightforward way.  There is a
hack that allowed us to do it, but by its nature it only allows
dispatch up to depth three (i.e. kernel->kernel->kernel).  We are
limiting ourselves to depth two at the moment.
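
To make the decision described above concrete, here is a minimal
sketch in C.  All names are hypothetical and none of this is taken
from the patch; it only assumes that "depth" counts the kernels in a
dispatch chain (kernel->kernel is depth two).  What happens when
neither path applies is not covered in this mail, so the sketch just
reports that case.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is an illustration, not patch code.  */
enum launch_strategy
{
  LAUNCH_GRIDIFIED,     /* the construct maps onto a single kernel grid */
  LAUNCH_DYNAMIC,       /* experimental: a kernel dispatches a child kernel */
  LAUNCH_UNSUPPORTED    /* neither path applies; handling is left open here */
};

/* Depth limit we currently impose on ourselves (kernel->kernel).  */
#define MAX_DISPATCH_DEPTH 2

/* CHAIN_DEPTH is the number of kernels in the dispatch chain, counting
   the kernel that is about to be dispatched (assumption of this sketch).  */
static enum launch_strategy
choose_strategy (bool can_gridify, int chain_depth)
{
  if (can_gridify)
    return LAUNCH_GRIDIFIED;
  if (chain_depth <= MAX_DISPATCH_DEPTH)
    return LAUNCH_DYNAMIC;
  return LAUNCH_UNSUPPORTED;
}
```

With this, choose_strategy (false, 2) selects dynamic parallelism,
while choose_strategy (false, 3) is rejected even though the hack
itself could go one level deeper.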

Dynamic parallelism also requires non-trivial preparation on the CPU
side, and quite a few HSA characteristics of the to-be-dispatched
kernels have to be known and passed to the GPU when the first kernel
is invoked.  In our current scheme, we have to know the "dependencies"
of each kernel at compile time, which is sometimes not possible, for
example if the second kernel is invoked from a function that is in a
different compilation unit than the first kernel.

As I said, I hope that with time we will be able to overcome all of
this, but at the moment, dynamic parallelism is clearly just an
experimental feature (that is why I suggested warning when not
gridifying).
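
For completeness, the -foffload= scan mentioned in the quoted text
could look roughly like the sketch below.  This is an illustration
under stated assumptions, not the hunk from the patch: the function
name is hypothetical, the argument is assumed to be a comma-separated
list of target names, and per-target options are not handled.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the LEN-character token at TOKEN is exactly NAME.  */
static bool
token_is (const char *token, size_t len, const char *name)
{
  return strlen (name) == len && strncmp (token, name, len) == 0;
}

/* Hypothetical helper: ARG is the text after "-foffload=".  Return true
   when HSA should be treated as disabled, i.e. when "disable" is given
   or when "hsa" is not among the listed targets.  Assumes a
   comma-separated list; per-target options are not parsed.  */
static bool
hsa_disabled_by_offload_arg (const char *arg)
{
  bool disabled = true;         /* HSA stays off unless "hsa" is listed */
  const char *p = arg;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");   /* length of the current token */

      if (token_is (p, len, "disable"))
        return true;                   /* no offloading at all */
      if (token_is (p, len, "hsa"))
        disabled = false;

      p += len;
      if (*p == ',')
        p++;                           /* skip the separator */
    }
  return disabled;
}
```

So an argument of "disable", or a list that does not contain "hsa",
leaves the flag set, and only a list that names "hsa" clears it.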

I hope this answers the question and explains the situation a bit,

Martin
