Tyrel Datwyler <tyr...@linux.vnet.ibm.com> writes: > On 06/06/2019 10:04 PM, Nathan Lynch wrote: >> During post-migration device tree updates, we can oops in >> pseries_update_drconf_memory if the source device tree has an >> ibm,dynamic-memory-v2 property and the destination has a >> ibm,dynamic_memory (v1) property. The notifier processes an "update" >> for the ibm,dynamic-memory property but it's really an add in this >> scenario. So make sure the old property object is there before >> dereferencing it. >> >> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com> >> --- > > Yes, this patch solves the oops, but I worry it just papers over some short > comings in our code. > > After some poking around unless I'm mistaken our memory notifier only handles > v1 > versions of the ibm,dynamic-memory property. So, on newer firmware we aren't > doing any memory fixups if v2 (ibm,dynamic-memory-v2) of the property > is updated.
It's not clear to me what the notifier is meant to accomplish: for (i = 0; i < entries; i++) { if ((be32_to_cpu(old_drmem[i].flags) & DRCONF_MEM_ASSIGNED) && (!(be32_to_cpu(new_drmem[i].flags) & DRCONF_MEM_ASSIGNED))) { rc = pseries_remove_memblock( be64_to_cpu(old_drmem[i].base_addr), memblock_size); break; } else if ((!(be32_to_cpu(old_drmem[i].flags) & DRCONF_MEM_ASSIGNED)) && (be32_to_cpu(new_drmem[i].flags) & DRCONF_MEM_ASSIGNED)) { rc = memblock_add(be64_to_cpu(old_drmem[i].base_addr), memblock_size); rc = (rc < 0) ? -EINVAL : 0; break; } } This compares the 'assigned' flag for each LMB in the old vs new properties and adds or removes the block accordingly. However: - Migration and PRRNs are specified only to change LMBs' NUMA affinity information. This notifier should be a no-op for those scenarios since the assigned flags should not change. - The memory hotplug/DLPAR path has a hack which inhibits the execution of the notifier: dlpar_memory() ... rtas_hp_event = true; drmem_update_dt() of_update_property() pseries_memory_notifier() pseries_update_drconf_memory() if (rtas_hp_event) return; So what's it for? My best guess is that it's a holdover from when more of the DLPAR work was done in user space. I don't see a purpose for it now. > For older PFW if we have source and target that only support v1 we will update > the memory in response to any update to ibm,dynamic-memory. It also appears to > be the case if we start with v1 and migrate to a target with newer PFW that > supports both v1 and v2 that the PFW will continue with v1 on the target and > as > a result we update memory in accordance to a property update to > ibm,dynamic-memory. > > Now, if we have source and targets that both support v2 after a migration we > will do no update in response to ibm,dynamic-memory-v2 changes. And then there > is the case of a source with v2 support migrating to a target with only v1 > support where we observe this oops. The oops is a result of ibm,dynamic-memory > being a new property that is added and there for no old property date exists. > However, simply returning in response also has the side effect that we do not > update memory in response to a device tree update of dynamic memory. > > Maybe we are ok with this behavior as I haven't dug deep enough into the > memory > subsystem here to really understand what the memory code is updating, but it > is > concerning that we are doing it in some cases, but not all. I hope I've made a good case above that the notifier does not do any useful work, and a counterpart for the v2 format isn't needed. Do you agree? If so, I'll send a patch later to remove the notifier altogether. In the near term I would still like this minimal fix to go in.