Re: [RFC PATCH v2 00/12] System device hot-plug framework

2013-01-17 Thread Toshi Kani
On Thu, 2013-01-17 at 01:50 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 04:40:18 PM Toshi Kani wrote:
> > This patchset is a prototype of proposed system device hot-plug framework
> > for design review.  Unlike other hot-plug environments, such as USB and
> > PCI, there is no common framework for system device hot-plug [1].
> > Therefore, this patchset is designed to provide a common framework for
> > hot-plugging and online/offline operations of system devices, such as CPU,
> > Memory and Node.  While this patchset only supports ACPI-based hot-plug
> > operations, the framework itself is designed to be platform-neutral and
> > can support other FW architectures as necessary.
 :
> At this point I'd like to clearly understand how the code is supposed to work.

Thanks for reviewing!

> From what I can say at the moment it all boils down to having two (ordered)
> lists of notifiers (shp_add_list, shp_del_list) that can be added to or removed
> from with shp_register_handler() and shp_unregister_handler(), respectively

Yes.

> (BTW, the abbreviation "hdr" makes me think about a "header" rather than a
> "handler", but maybe that's just me :-)), 

Well, it makes me think that way as well. :)  How about "hdlr"?

> and a workqueue for requests (why do
> we need a separate workqueue for that?).

This workqueue needs to be platform-neutral with max_active set to 1, and
preferably dedicated to hotplug operations.  kacpi_hotplug_wq is
close, but is ACPI-specific.  So, I decided to create a new workqueue
for this framework.
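
To illustrate what a max_active of 1 buys us, here is a minimal user-space
model, not the patchset code.  All demo_* names are hypothetical.  It shows
that work items run strictly one at a time in submission order, and that
work submitted from inside a running item is deferred rather than nested.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_DEPTH 16

typedef void (*demo_work_fn)(int arg);

struct demo_work {
	demo_work_fn fn;
	int arg;
};

/* Hypothetical single-consumer FIFO standing in for a workqueue. */
struct demo_wq {
	struct demo_work ring[DEMO_QUEUE_DEPTH];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
	int active;	/* items currently executing */
};

/* Queue one work item; returns 0 on success, -1 if the queue is full. */
int demo_queue_work(struct demo_wq *wq, demo_work_fn fn, int arg)
{
	if (wq->tail - wq->head >= DEMO_QUEUE_DEPTH)
		return -1;
	wq->ring[wq->tail % DEMO_QUEUE_DEPTH] = (struct demo_work){ fn, arg };
	wq->tail++;
	return 0;
}

/* Run queued items one at a time, in submission order. */
void demo_drain(struct demo_wq *wq)
{
	while (wq->head != wq->tail) {
		struct demo_work w = wq->ring[wq->head % DEMO_QUEUE_DEPTH];

		wq->head++;
		wq->active++;
		assert(wq->active == 1);	/* never more than one in flight */
		w.fn(w.arg);
		wq->active--;
	}
}

/* Demo state: records the order in which work items actually ran. */
struct demo_wq demo_wq_instance;
int demo_order[8];
size_t demo_order_len;

void demo_record(int arg)
{
	if (demo_order_len < 8)
		demo_order[demo_order_len++] = arg;
}

/* A work item that submits a follow-up request while it is running. */
void demo_resubmit(int arg)
{
	demo_record(arg);
	(void)demo_queue_work(&demo_wq_instance, demo_record, arg + 100);
}
```

Queuing demo_resubmit(1) and then demo_record(2) before draining runs the
items as 1, 2, 101: the follow-up waits its turn behind the request that was
already queued.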

> Whoever needs to carry out a hotplug operation is supposed to prepare a request
> and then put it into the workqueue with shp_submit_request().  The framework
> will then execute all of the notifier callbacks from the appropriate notifier
> list (depending on whether the operation is a hot-add or a hot-remove).  If any
> of those callbacks returns an error code and it is not too late (the order of
> the failing notifier is not too high), the already executed notifier callbacks
> will be run again with the "rollback" argument set to 1 (why not to use bool?)

Agreed.  I will change the rollback to bool.

> to indicate that they are supposed to bring things back to the initial state.
> Error codes returned in that stage only cause messages to be printed.
>
> Is the description above correct?

Yes.  It's a very good summary!
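
The flow summarized above can be sketched in user space as follows.  This is
a rough model under stated assumptions, not the patchset's implementation:
every demo_* name and the rollback_limit parameter are hypothetical, and
running the rollback pass newest-first is my assumption, since the summary
does not state the rollback order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_HANDLERS 8

/* A handler returns 0 on success or a negative error code. */
typedef int (*demo_handler_fn)(void *req, bool rollback);

struct demo_handler {
	int order;		/* lower order runs first */
	demo_handler_fn fn;
};

struct demo_list {
	struct demo_handler h[DEMO_MAX_HANDLERS];
	size_t count;
};

/* Register a handler, keeping the list sorted by ascending order. */
int demo_register(struct demo_list *l, int order, demo_handler_fn fn)
{
	size_t i;

	assert(l && fn);
	if (l->count >= DEMO_MAX_HANDLERS)
		return -1;
	for (i = l->count; i > 0 && l->h[i - 1].order > order; i--)
		l->h[i] = l->h[i - 1];
	l->h[i].order = order;
	l->h[i].fn = fn;
	l->count++;
	return 0;
}

/*
 * Run all handlers in order.  If one fails and its order is not above
 * rollback_limit, re-run the handlers that already completed with
 * rollback set to true (newest first: an assumption of this sketch).
 * Errors returned during rollback are ignored here.
 */
int demo_execute(struct demo_list *l, void *req, int rollback_limit)
{
	size_t i;
	int err = 0;

	for (i = 0; i < l->count; i++) {
		err = l->h[i].fn(req, false);
		if (err)
			break;
	}
	if (!err)
		return 0;
	if (l->h[i].order > rollback_limit)
		return err;	/* too late to roll back */
	while (i > 0) {
		i--;
		(void)l->h[i].fn(req, true);
	}
	return err;
}

/* Demo handlers: upper case marks a forward call, lower case a rollback. */
char demo_trace[32];
size_t demo_trace_len;

void demo_note(char c)
{
	if (demo_trace_len < sizeof(demo_trace) - 1)
		demo_trace[demo_trace_len++] = c;
}

int demo_a(void *req, bool rb) { (void)req; demo_note(rb ? 'a' : 'A'); return 0; }
int demo_b(void *req, bool rb) { (void)req; demo_note(rb ? 'b' : 'B'); return 0; }
int demo_c(void *req, bool rb) { (void)req; demo_note(rb ? 'c' : 'C'); return -1; }
```

With demo_a, demo_b and demo_c registered at orders 10, 20 and 30, a call to
demo_execute() with a rollback_limit of 100 runs A, B, C, sees C fail, and
then rolls back b and a.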

> If so, it looks like subsystems are supposed to register notifiers (handlers)
> for hotplug/hot-remove operations of the devices they handle.  They are
> supposed to use predefined order values to indicate what kinds of devices
> those are.  Then, hopefully, if they do everything correctly, and the
> initiator of a hotplug/hot-remove operation prepares the request correctly,
> the callbacks will be executed in the right order, they will find their
> devices in the list attached to the request object and they will do what's
> necessary with them.
> 
> Am I still on the right track?

Yes.

> If that's the case, I have a few questions.

Well, there are more than a few :), but they all are excellent
questions!

> (1) Why is this limited to system devices?

It could be extended to other devices, but it is specifically designed for
system devices for the following reasons.  So, I think it is best to keep it
that way.

a) Work with multiple subsystems without bus dependency.  Other hot-plug
frameworks are designed and implemented for a particular bus and a
subsystem.  Therefore, they work best for their targeted environment as
well.

b) Sequence with pre-defined order.  This allows hot-add operation and
the boot sequence to be consistent.  Other non-system devices are
initialized within a subsystem, and do not depend on the boot-up
sequence.

> (2) What's the guarantee that the ordering of hot-removal (for example) of CPU
> cores with respect to memory and host bridges will always be the same?
> What if the CPU cores themselves need to be hot-removed in a specific
> order?

When devices are added in the order of A->B->C, their dependency model
is:
 - B may depend on A (but A may not depend on B)
 - C may depend on A and B (but A and B may not depend on C)

Therefore, they can be deleted in the order of C->B->A.

The boot sequence defines the order for add.  So, it is important to
make sure that we hot-add devices in the same order as the boot
sequence.  Of course, if there is an issue in the order, we need to fix
it.  But the point is that the add order should be consistent between
the boot sequence and hot-add.

In your example, the boot sequence adds them in the order of
memory->CPU->host bridge.  I think this makes sense because a CPU may need
its local memory, and a host bridge may need its local memory and local
CPU for interrupts.  So, hot-add needs to do the same for node hot-add,
and hot-delete should be able to delete them in the reverse order per
their dependency model.
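
As a small worked example of this dependency model, the sketch below derives
the delete order by reversing the add order and checks that no device is
deleted while something added after it (a possible dependent) is still
present.  The demo_* names and the enum are hypothetical and only model the
memory->CPU->host bridge example; they are not taken from the patchset.

```c
#include <assert.h>
#include <stddef.h>

enum demo_dev { DEMO_MEMORY, DEMO_CPU, DEMO_HOST_BRIDGE, DEMO_NDEV };

/* Add order from the boot sequence described above. */
const enum demo_dev demo_add_order[DEMO_NDEV] = {
	DEMO_MEMORY, DEMO_CPU, DEMO_HOST_BRIDGE
};

/* Fill out[] with the delete order: the add order reversed. */
void demo_del_order(const enum demo_dev *add, size_t n, enum demo_dev *out)
{
	size_t i;

	assert(add && out);
	for (i = 0; i < n; i++)
		out[i] = add[n - 1 - i];
}

/* Position of device d in arr[], or n if it is not present. */
static size_t demo_pos(const enum demo_dev *arr, size_t n, enum demo_dev d)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (arr[i] == d)
			return i;
	return n;
}

/*
 * Return 1 if, for every pair where y was added after x (so y may depend
 * on x), y is deleted before x.  Return 0 otherwise.
 */
int demo_order_is_safe(const enum demo_dev *add, const enum demo_dev *del,
		       size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (demo_pos(del, n, add[j]) >= demo_pos(del, n, add[i]))
				return 0;
	return 1;
}
```

Reversing the add order yields host bridge->CPU->memory, which passes the
check; reusing the add order for deletion fails it because memory would go
away before the CPU that may depend on it.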

> (3) What's the guarantee that the ordering of shp_add_list and shp_del_list
>

Re: [RFC PATCH v2 00/12] System device hot-plug framework

2013-01-16 Thread Rafael J. Wysocki
On Thursday, January 10, 2013 04:40:18 PM Toshi Kani wrote:
> This patchset is a prototype of proposed system device hot-plug framework
> for design review.  Unlike other hot-plug environments, such as USB and
> PCI, there is no common framework for system device hot-plug [1].
> Therefore, this patchset is designed to provide a common framework for
> hot-plugging and online/offline operations of system devices, such as CPU,
> Memory and Node.  While this patchset only supports ACPI-based hot-plug
> operations, the framework itself is designed to be platform-neutral and
> can support other FW architectures as necessary.
> 
> This patchset is based on Linus's tree (3.8-rc3).
> 
> I have seen a few stability issues with 3.8-rc3 in my testing and will
> look into their solutions.
> 
> [1] System device hot-plug frameworks for ppc and s390 are implemented
> for specific platforms and products.
> 
> 
> Background: System Device Initialization
> 
> System devices, such as CPU and memory, must be initialized during the early
> boot sequence as they are the essential components that provide low-level
> services, e.g. scheduling, memory allocation and interrupts, which are
> the foundations of the kernel services.  start_kernel() and kernel_init()
> manage the boot-up sequence to initialize system devices and low-level
> services in a pre-defined order as shown below.
> 
>   start_kernel()
>     boot_cpu_init()                      // init cpu0
>     setup_arch()
>       efi_init()                         // init EFI memory map
>       initmem_init()                     // init NUMA
>       x86_init.paging.pagetable_init()   // init page table
>       acpi_boot_init()                   // parse ACPI MADT table
>         :
>   kernel_init()
>     kernel_init_freeable()
>       smp_init()                         // init other CPUs
>         :
>       do_basic_setup()
>         driver_init()
>           cpu_dev_init()                 // build system/cpu tree
>           memory_dev_init()              // build system/memory tree
>         do_initcalls()
>           acpi_init()                    // build ACPI device tree
> 
> Note that drivers are initialized at the end of the boot sequence as they
> depend on the kernel services from system devices.  Hence, while system
> devices may be exposed to sysfs with their pseudo drivers, their
> initialization may not be fully integrated into the driver structures.  
> 
> Overview of the System Device Hot-plug Framework
> 
> Similar to the boot-up sequence, the system device hot-plug framework
> provides a sequencer that calls all registered handlers in pre-defined
> order for hot-add and hot-delete of system devices.  It allows any modules
> initializing system devices in the boot-up sequence to participate in
> the hot-plug operations as well.  At a high level, there are two types of
> handlers, 1) FW-dependent (e.g. ACPI) handlers that enumerate or eject
> system devices, and 2) system device (e.g. CPU, Memory) management handlers
> that online or offline the enumerated system devices.  Online/offline
> operations are a subset of hot-add/delete operations.  The ordering of the
> handlers is symmetric between hot-add (online) and hot-delete (offline)
> operations.
> 
>            hot-add    online
>               |    ^    :    ^
>   HW Enum/    |    |    :    :
>    Eject      |    |    :    :
>               |    |    :    :
>   Online/     |    |    |    |
>   Offline     |    |    |    |
>               V    |    V    |
>            hot-del    offline
> 
> The handlers may not call other handlers directly to exceed their role.
> Therefore, the role of the handlers in their modules remains consistent
> with their role at the boot-up sequence.  For instance, the ACPI module
> may not perform online or offline of system devices.
> 
> System Device Hot-plug Operation
> 
> 
> Serialized Startup
> ------------------
> The framework provides an interface (hp_submit_req) to request a hot-plug
> operation.  All requests are queued to and run on a single work queue.
> The framework ensures that there is only a single hot-plug or online/
> offline operation running at a time.  A single request may, however, target
> multiple devices.  This makes the execution context of handlers
> consistent with the boot-up sequence and enables code sharing.
> 
> Phased Execution
> 
> The framework performs hot-plug and online/offline operations in the
> following three phases.  The modules can register their handlers to each
> phase.  The framework also initiates a roll-back operation if any handler
> fails in the validate or execute phase.
> 
> 1) Validate Phase - Handlers validate if they support a given request
> without making any changes to target device(s).  They check any known
> restrictions and/or prerequisite conditions to their modules, and fail
> an unsupported request before making any changes.  For instance, the
> memory module may check if a hot-remove request is targ