On 14.05.19 11:25, Christian Borntraeger wrote:
>
>
> On 14.05.19 11:23, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 11:20, David Hildenbrand wrote:
>>> On 14.05.19 11:10, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 14.05.19 10:59, David Hildenbrand wrote:
>>>>> On 14.05.19 10:49, Cornelia Huck wrote:
>>>>>> On Tue, 14 May 2019 10:37:32 +0200
>>>>>> Christian Borntraeger <borntrae...@de.ibm.com> wrote:
>>>>>>
>>>>>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>>>>>>>>>> But that can be tested using the runability information if I am not
>>>>>>>>>>> wrong.
>>>>>>>>>>
>>>>>>>>>> You mean the cpu level information, right?
>>>>>>>>
>>>>>>>> Yes, query-cpu-definitions includes for each model runability
>>>>>>>> information via "unavailable-features" (valid under the started QEMU
>>>>>>>> machine).
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> and others that we have today.
>>>>>>>>>>>>
>>>>>>>>>>>> So yes, I think this would be acceptable.
>>>>>>>>>>>
>>>>>>>>>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>>>>>>>>>> production either way. But you never know.
>>>>>>>>>>
>>>>>>>>>> I think that using that many cpus is a more uncommon setup, but I
>>>>>>>>>> still think that having to wait for actual failure
>>>>>>>>>
>>>>>>>>> That can happen all the time today. You can easily say z14 in the xml
>>>>>>>>> when on a zEC12. Only at startup you get the error. The question is
>>>>>>>>> really:
>>>>>>>>
>>>>>>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>>>>>>> will work. Actually, even "-smp 248" will no longer work on affected
>>>>>>>> machines.
>>>>>>>>
>>>>>>>> That is why I wonder if it is better to disable the feature and print a
>>>>>>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>>>>>>> possible in the current environment (huge pages).
>>>>>>>>
>>>>>>>> "Diag318 will not be enabled because it is not compatible with more than
>>>>>>>> 240 CPUs".
>>>>>>>>
>>>>>>>> However, I still think that implementing support for more than one SCLP
>>>>>>>> response page is the best solution. Guests will need adaptions for > 240
>>>>>>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>>>>>>
>>>>>>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>>>>>>> just works from the QEMU perspective.
>>>>>>>>
>>>>>>>> Is implementing this realistic?
>>>>>>>
>>>>>>> Yes it is but it will take time. I will try to get this rolling. To make
>>>>>>> progress on the diag318 thing, can we error on startup now and simply
>>>>>>> remove that check when we have implemented a larger sccb? If we would
>>>>>>> now do all kinds of "change the max number" games, it would be harder to
>>>>>>> "fix".
>>>>>>
>>>>>> So, the idea right now is:
>>>>>>
>>>>>> - fail to start if you try to specify a diag318 device and more than
>>>>>>   240 cpus (do we need a knob to turn off the device?)
>>>>>> - in the future, support more than one SCLP response page
>>>>>>
>>>>>> I'm getting a bit lost in the discussion; but the above sounds
>>>>>> reasonable to me.
>>>>>>
>>>>>
>>>>> We can
>>>>>
>>>>> 1. Fail to start with #cpus > 240 when diag318=on
>>>>> 2. Remove the error once we support more than one SCLP response page
>>>>>
>>>>> Or
>>>>>
>>>>> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>>>>>    240 CPUs via SCLP
>>>>> 2. Print a warning
>>>>> 3. Remove the restriction and the warning once we support more than one
>>>>>    SCLP response page
>>>>>
>>>>> While I prefer the second approach (similar to defining zPCI devices
>>>>> without zpci=on), I could also live with the first approach.
>>>>
>>>> I prefer approach 1.
>>>>
>>>
>>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
>>> CPUs), but with an additional warning? I'm confused.
>>
>> Different numbering interpretation. I was talking about 1 = "Allow to start
>> with #cpus > 240 when diag318=on, but indicate only 240 CPUs via SCLP"
>
> So yes, variant 2 when I use your numbering. The only question is: do we need
> a warning?

It probably does not hurt.
After all, we are talking about 1 VCPU that the guest can only use by
indirect probing ... I leave that up to Collin :)

--

Thanks,

David / dhildenb
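
For illustration only, the two variants discussed in this thread could be sketched roughly as below. This is not QEMU code: the function names, the enum and the constant are hypothetical, and the limit of 240 is simply the figure used in the messages above (247 is also mentioned), so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: number of CPU entries that fit into a single SCLP
 * response page once diag318 is enabled (figure taken from the thread). */
#define DIAG318_MAX_SCLP_CPUS 240u

enum start_result { START_OK, START_FAIL };

/* Variant "fail to start": refuse the configuration outright.
 * The check can be dropped once multiple response pages are supported. */
static enum start_result check_cpus_strict(unsigned int ncpus, bool diag318)
{
    assert(ncpus > 0);
    if (diag318 && ncpus > DIAG318_MAX_SCLP_CPUS) {
        fprintf(stderr, "error: diag318=on is not compatible with more than "
                "%u CPUs\n", DIAG318_MAX_SCLP_CPUS);
        return START_FAIL;
    }
    return START_OK;
}

/* Variant "allow but limit": the machine starts, but SCLP indicates
 * at most DIAG318_MAX_SCLP_CPUS entries and a warning is printed. */
static unsigned int sclp_indicated_cpus(unsigned int ncpus, bool diag318)
{
    assert(ncpus > 0);
    if (diag318 && ncpus > DIAG318_MAX_SCLP_CPUS) {
        fprintf(stderr, "warning: diag318=on, indicating only %u of %u CPUs "
                "via SCLP\n", DIAG318_MAX_SCLP_CPUS, ncpus);
        return DIAG318_MAX_SCLP_CPUS;
    }
    return ncpus;
}
```

With these hypothetical helpers, a 248-CPU guest with diag318=on is rejected by the strict check, while the limiting variant lets it start and indicates 240 CPUs; without diag318 both leave the configuration untouched.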