[PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE

2014-07-14 Gabbay, Oded
On Sat, 2014-07-12 at 07:42 +1000, Dave Airlie wrote:
> >  +/* The 64-bit ABI is the authoritative version. */
> >  +#pragma pack(push, 8)
> >  +
>  
> Don't do this, pad and align things explicitly in structs.
>  
> >  +struct kfd_ioctl_create_queue_args {
> >  +   uint64_t ring_base_address;     /* to KFD */
> >  +   uint32_t ring_size;             /* to KFD */
> >  +   uint32_t gpu_id;                /* to KFD */
> >  +   uint32_t queue_type;            /* to KFD */
> >  +   uint32_t queue_percentage;      /* to KFD */
> >  +   uint32_t queue_priority;        /* to KFD */
> >  +   uint64_t write_pointer_address; /* to KFD */
> >  +   uint64_t read_pointer_address;  /* to KFD */
> >  +
> >  +   uint64_t doorbell_address;      /* from KFD */
> >  +   uint32_t queue_id;              /* from KFD */
> >  +};
> >  +
>  
> maybe put all the uint64_t at the start, or add explicit padding.
>  
> Dave.
Thanks, will be fixed.
Oded
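
For readers following the review: in the struct as posted, write_pointer_address
follows one uint64_t and five uint32_t fields, so it starts at offset 28 on an
ABI that aligns uint64_t to 4 bytes (32-bit x86) and at offset 32 where
uint64_t is 8-byte aligned (x86-64). The sketch below is one way to apply
Dave's first suggestion, not necessarily the layout the driver ended up with.
It keeps the posted field names and only reorders them so that all 64-bit
members come first. With six 32-bit members the total is already a multiple of
8, so no pad field is needed. The compile-time checks are an addition for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: same fields as the posted struct, reordered so that every
 * uint64_t precedes every uint32_t. No implicit padding remains, so the
 * layout does not depend on how the ABI aligns 64-bit integers.
 */
struct kfd_ioctl_create_queue_args {
	uint64_t ring_base_address;	/* to KFD */
	uint64_t write_pointer_address;	/* to KFD */
	uint64_t read_pointer_address;	/* to KFD */
	uint64_t doorbell_address;	/* from KFD */

	uint32_t ring_size;		/* to KFD */
	uint32_t gpu_id;		/* to KFD */
	uint32_t queue_type;		/* to KFD */
	uint32_t queue_percentage;	/* to KFD */
	uint32_t queue_priority;	/* to KFD */
	uint32_t queue_id;		/* from KFD */
};

/* Four 8-byte members followed by six 4-byte members: 32 + 24 = 56 bytes. */
static_assert(sizeof(struct kfd_ioctl_create_queue_args) == 56,
	      "unexpected padding in kfd_ioctl_create_queue_args");
static_assert(offsetof(struct kfd_ioctl_create_queue_args, ring_size) == 32,
	      "32-bit members must start right after the 64-bit ones");
```

If a later revision adds a seventh uint32_t, an explicit uint32_t pad member
would be needed to keep the size a multiple of 8.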


[PATCH 00/83] AMD HSA kernel driver

2014-07-12 Gabbay, Oded
On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > >   This patch set implements a Heterogeneous System Architecture (HSA) driver
> > > >   for radeon-family GPUs.
> > >  This is just quick comments on few things. Given size of this, people
> > >  will need to have time to review things.
> > > >   HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share
> > > >   system resources more effectively via HW features including shared pageable
> > > >   memory, userspace-accessible work queues, and platform-level atomics. In
> > > >   addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea
> > > >   Islands family of GPUs also performs HW-level validation of commands passed
> > > >   in through the queues (aka rings).
> > > >   The code in this patch set is intended to serve both as a sample driver for
> > > >   other HSA-compatible hardware devices and as a production driver for
> > > >   radeon-family processors. The code is architected to support multiple CPUs
> > > >   each with connected GPUs, although the current implementation focuses on a
> > > >   single Kaveri/Berlin APU, and works alongside the existing radeon kernel
> > > >   graphics driver (kgd).
> > > >   AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware
> > > >   functionality between HSA compute and regular gfx/compute (memory,
> > > >   interrupts, registers), while other functionality has been added
> > > >   specifically for HSA compute (hw scheduler for virtualized compute rings).
> > > >   All shared hardware is owned by the radeon graphics driver, and an interface
> > > >   between kfd and kgd allows the kfd to make use of those shared resources,
> > > >   while HSA-specific functionality is managed directly by kfd by submitting
> > > >   packets into an HSA-specific command queue (the "HIQ").
> > > >   During kfd module initialization a char device node (/dev/kfd) is created
> > > >   (surviving until module exit), with ioctls for queue creation & management,
> > > >   and data structures are initialized for managing HSA device topology.
> > > >   The rest of the initialization is driven by calls from the radeon kgd at
> > > >   the following points :
> > > >   - radeon_init (kfd_init)
> > > >   - radeon_exit (kfd_fini)
> > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > >   - radeon_driver_unload_kms (kfd_device_fini)
> > > >   During the probe and init processing per-device data structures are
> > > >   established which connect to the associated graphics kernel driver. This
> > > >   information is exposed to userspace via sysfs, along with a version number
> > > >   allowing userspace to determine if a topology change has occurred while it
> > > >   was reading from sysfs.
> > > >   The interface between kfd and kgd also allows the kfd to request buffer
> > > >   management services from kgd, and allows kgd to route interrupt requests to
> > > >   kfd code since the interrupt block is shared between regular
> > > >   graphics/compute and HSA compute subsystems in the GPU.
> > > >   The kfd code works with an open source usermode library ("libhsakmt") which
> > > >   is in the final stages of IP review and should be published in a separate
> >

[PATCH 00/83] AMD HSA kernel driver

2014-07-10 Gabbay, Oded
On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> >  This patch set implements a Heterogeneous System Architecture (HSA) driver
> >  for radeon-family GPUs.
>  
> This is just quick comments on few things. Given size of this, people
> will need to have time to review things.
>  
> >  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share
> >  system resources more effectively via HW features including shared pageable
> >  memory, userspace-accessible work queues, and platform-level atomics. In
> >  addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea
> >  Islands family of GPUs also performs HW-level validation of commands passed
> >  in through the queues (aka rings).
> >  The code in this patch set is intended to serve both as a sample driver for
> >  other HSA-compatible hardware devices and as a production driver for
> >  radeon-family processors. The code is architected to support multiple CPUs
> >  each with connected GPUs, although the current implementation focuses on a
> >  single Kaveri/Berlin APU, and works alongside the existing radeon kernel
> >  graphics driver (kgd).
> >  AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware
> >  functionality between HSA compute and regular gfx/compute (memory,
> >  interrupts, registers), while other functionality has been added
> >  specifically for HSA compute (hw scheduler for virtualized compute rings).
> >  All shared hardware is owned by the radeon graphics driver, and an interface
> >  between kfd and kgd allows the kfd to make use of those shared resources,
> >  while HSA-specific functionality is managed directly by kfd by submitting
> >  packets into an HSA-specific command queue (the "HIQ").
> >  During kfd module initialization a char device node (/dev/kfd) is created
> >  (surviving until module exit), with ioctls for queue creation & management,
> >  and data structures are initialized for managing HSA device topology.
> >  The rest of the initialization is driven by calls from the radeon kgd at
> >  the following points :
> >  - radeon_init (kfd_init)
> >  - radeon_exit (kfd_fini)
> >  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> >  - radeon_driver_unload_kms (kfd_device_fini)
> >  During the probe and init processing per-device data structures are
> >  established which connect to the associated graphics kernel driver. This
> >  information is exposed to userspace via sysfs, along with a version number
> >  allowing userspace to determine if a topology change has occurred while it
> >  was reading from sysfs.
> >  The interface between kfd and kgd also allows the kfd to request buffer
> >  management services from kgd, and allows kgd to route interrupt requests to
> >  kfd code since the interrupt block is shared between regular
> >  graphics/compute and HSA compute subsystems in the GPU.
> >  The kfd code works with an open source usermode library ("libhsakmt") which
> >  is in the final stages of IP review and should be published in a separate
> >  repo over the next few days.
> >  The code operates in one of three modes, selectable via the sched_policy
> >  module parameter :
> >  - sched_policy=0 uses a hardware scheduler running in the MEC block within
> >  CP, and allows oversubscription (more queues than HW slots)
> >  - sched_policy=1 also uses HW scheduling but does not allow
> >  oversubscription, so create_queue requests fail when we run out of HW slots
> >  - sched_policy=2 does not use HW scheduling, so the driver manually assigns
> >  queues to HW slots by programming registers
> >  The "no HW scheduling" option is for debug & new hardware bringup only, so
> >  has less test coverage than the other options. Default in the current code
> >  is "HW scheduling without oversubscription" since that is where we have the
> >  most test coverage but we expect to change the default to "HW scheduling
> >  with oversubscription" after further testing. This effectively removes the
> >  HW limit on the number of work queues available to applications.
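
(Reader's note on the three sched_policy modes quoted above.) A minimal sketch
of how the policy value could gate queue creation follows. The enum names and
the can_create_queue() helper are hypothetical and are not taken from the patch
set. Only the numeric values 0, 1 and 2 and the oversubscription behaviour come
from the description. Treating sched_policy=2 as limited by HW slots is an
assumption, and other limits such as memory are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the values follow the sched_policy description. */
enum sched_policy {
	SCHED_POLICY_HWS = 0,			/* HW scheduler, oversubscription allowed */
	SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION = 1,	/* HW scheduler, bounded by HW slots */
	SCHED_POLICY_NO_HWS = 2,		/* driver programs HW slots itself */
};

/*
 * Return true if one more queue may be created under the given policy.
 * Assumption: without HW scheduling, every queue needs its own HW slot.
 */
static bool can_create_queue(enum sched_policy policy,
			     unsigned int active_queues,
			     unsigned int hw_slots)
{
	if (policy == SCHED_POLICY_HWS)
		return true;	/* more queues than HW slots is permitted */

	return active_queues < hw_slots;
}
```

With 8 HW slots and 8 active queues, this helper accepts a ninth queue only
under sched_policy=0.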
> >  Programs running on the GPU are associated with an address space through the
> >  VMID field, which is translated to a unique PASID at access time via a set
> >  of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16)
> >  are partitioned (under control of the radeon kgd) between current
> >  gfx/compute and HSA compute, with each getting 8 in the current code. The
> >  VMID-to-PASID mapping registers are updated by the HW scheduler when used,
> >  and by driver code if HW scheduling is not being used.
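
(Reader's note on the VMID partitioning quoted above.) The sketch below models
the 16 VMID-to-PASID mapping registers as a plain array and binds a PASID to a
free VMID in the HSA half. It is an illustration of the bookkeeping only. The
struct and function names are hypothetical, using 0 for "unmapped" is an
assumption, and so is the choice that HSA owns VMIDs 8 to 15, since the cover
letter says only that each side gets 8.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VMIDS	16	/* from the cover letter */
#define FIRST_HSA_VMID	8	/* assumption: HSA compute owns VMIDs 8..15 */

/* Software model of the mapping registers; 0 means "unmapped" (assumption). */
struct vmid_pasid_map {
	uint16_t pasid[NUM_VMIDS];
};

/* Bind a PASID to the first free HSA VMID. Returns the VMID, or -1 if full. */
static int hsa_vmid_bind(struct vmid_pasid_map *map, uint16_t pasid)
{
	for (int vmid = FIRST_HSA_VMID; vmid < NUM_VMIDS; vmid++) {
		if (map->pasid[vmid] == 0) {
			map->pasid[vmid] = pasid;
			return vmid;
		}
	}
	return -1;
}

/* Translate a VMID to its PASID, as the hardware would at access time. */
static uint16_t vmid_to_pasid(const struct vmid_pasid_map *map,
			      unsigned int vmid)
{
	return vmid < NUM_VMIDS ? map->pasid[vmid] : 0;
}
```

In the driver the table lives in hardware registers and, per the cover letter,
is written either by the HW scheduler or by driver code, depending on
sched_policy.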
> >  The Sea Islands compute queues use a new "doorbell" mechanism instead of the
> >  earlier kernel-managed