Re: [Qemu-devel] [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration support

David Hildenbrand Tue, 14 May 2019 02:28:51 -0700

On 14.05.19 11:25, Christian Borntraeger wrote:
> 
> 
> On 14.05.19 11:23, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 11:20, David Hildenbrand wrote:
>>> On 14.05.19 11:10, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 14.05.19 10:59, David Hildenbrand wrote:
>>>>> On 14.05.19 10:49, Cornelia Huck wrote:
>>>>>> On Tue, 14 May 2019 10:37:32 +0200
>>>>>> Christian Borntraeger <borntrae...@de.ibm.com> wrote:
>>>>>>
>>>>>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>>>>>>>>>> But that can be tested using the runability information if I am not wrong.
>>>>>>>>>>
>>>>>>>>>> You mean the cpu level information, right?  
>>>>>>>>
>>>>>>>> Yes, query-cpu-definitions includes runability information for each
>>>>>>>> model via "unavailable-features" (valid under the started QEMU machine).
>>>>>>>>   
>>>>>>>>>>  
>>>>>>>>>>>  
>>>>>>>>>>>> and others that we have today.
>>>>>>>>>>>>
>>>>>>>>>>>> So yes, I think this would be acceptable.    
>>>>>>>>>>>
>>>>>>>>>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>>>>>>>>>> production either way. But you never know.  
>>>>>>>>>>
>>>>>>>>>> I think that using that many cpus is a more uncommon setup, but I still
>>>>>>>>>> think that having to wait for actual failure
>>>>>>>>>
>>>>>>>>> That can happen all the time today. You can easily say z14 in the xml when
>>>>>>>>> on a zEC12. Only at startup you get the error. The question is really:
>>>>>>>>
>>>>>>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>>>>>>> will work. Actually, even "-smp 248" will no longer work on affected
>>>>>>>> machines.
>>>>>>>>
>>>>>>>> That is why I wonder if it is better to disable the feature and print a
>>>>>>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>>>>>>> possible in the current environment (huge pages).
>>>>>>>>
>>>>>>>> "Diag318 will not be enabled because it is not compatible with more than
>>>>>>>> 240 CPUs".
>>>>>>>>
>>>>>>>> However, I still think that implementing support for more than one SCLP
>>>>>>>> response page is the best solution. Guests will need adaptions for > 240
>>>>>>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>>>>>>
>>>>>>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>>>>>>> just works from the QEMU perspective.
>>>>>>>>
>>>>>>>> Is implementing this realistic?  
>>>>>>>
>>>>>>> Yes it is, but it will take time. I will try to get this rolling. To make
>>>>>>> progress on the diag318 thing, can we error on startup now and simply
>>>>>>> remove that check when we have implemented a larger sccb? If we would
>>>>>>> now do all kinds of "change the max number" games, that would be harder
>>>>>>> to "fix".
>>>>>>
>>>>>> So, the idea right now is:
>>>>>>
>>>>>> - fail to start if you try to specify a diag318 device and more than
>>>>>>   240 cpus (do we need a knob to turn off the device?)
>>>>>> - in the future, support more than one SCLP response page
>>>>>>
>>>>>> I'm getting a bit lost in the discussion; but the above sounds
>>>>>> reasonable to me.
>>>>>>
>>>>>
>>>>> We can
>>>>>
>>>>> 1. Fail to start with #cpus > 240 when diag318=on
>>>>> 2. Remove the error once we support more than one SCLP response page
>>>>>
>>>>> Or
>>>>>
>>>>> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>>>>>    240 CPUs via SCLP
>>>>> 2. Print a warning
>>>>> 3. Remove the restriction and the warning once we support more than one
>>>>>    SCLP response page
>>>>>
>>>>> While I prefer the second approach (similar to defining zPCI devices
>>>>> without zpci=on), I could also live with the first approach.
>>>>
>>>> I prefer approach 1.
>>>>
>>>
>>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
>>> CPUs), but with an additional warning? I'm confused.
>>
>> Different numbering interpretation. I was talking about 1 = "Allow to start
>> with #cpus > 240 when diag318=on, but indicate only 240 CPUs via SCLP"
> 
> So yes, variant 2 when I use your numbering. The only question is: do we need
> a warning? It probably does not hurt.


After all, we are talking about 1 VCPU that the guest can only use by
indirect probing ... I leave that up to Collin :)
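
For illustration, the two variants discussed above could be sketched roughly
as follows. This is a minimal sketch and not QEMU code: the function
check_diag318_cpus(), both enums and all parameter names are hypothetical.
The thread mentions both 240 and 247 as the cut-off, so the limit is passed
in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result of the start-up check; not a QEMU type. */
typedef enum {
    START_OK,       /* configuration accepted unchanged */
    START_LIMITED,  /* started, but SCLP indicates fewer CPUs */
    START_FAILED    /* refused to start */
} StartResult;

/* Hypothetical switch between the two variants from the thread. */
typedef enum {
    POLICY_FAIL,           /* variant 1: error out at start-up */
    POLICY_LIMIT_AND_WARN  /* variant 2: cap the SCLP CPU count, warn */
} Diag318Policy;

/*
 * cpus:           number of CPUs requested (e.g. via -smp)
 * diag318:        whether the diag318 feature is enabled
 * sclp_cpu_limit: assumed number of CPU entries that fit into a single
 *                 SCLP response page when diag318 is on
 * sclp_cpus:      out parameter, number of CPUs to indicate via SCLP
 */
static StartResult check_diag318_cpus(unsigned int cpus, bool diag318,
                                      unsigned int sclp_cpu_limit,
                                      Diag318Policy policy,
                                      unsigned int *sclp_cpus)
{
    assert(sclp_cpus != NULL);
    *sclp_cpus = cpus;

    /* Nothing to do without diag318 or while everything fits. */
    if (!diag318 || cpus <= sclp_cpu_limit) {
        return START_OK;
    }

    if (policy == POLICY_FAIL) {
        fprintf(stderr, "error: diag318 is not compatible with more than "
                "%u CPUs\n", sclp_cpu_limit);
        *sclp_cpus = 0;
        return START_FAILED;
    }

    /* Variant 2: keep running, but indicate only the CPUs that fit. */
    fprintf(stderr, "warning: only %u of %u CPUs are indicated via SCLP "
            "while diag318 is enabled\n", sclp_cpu_limit, cpus);
    *sclp_cpus = sclp_cpu_limit;
    return START_LIMITED;
}
```

With POLICY_LIMIT_AND_WARN, a request for 248 CPUs and a limit of 247 leaves
exactly one CPU that is not indicated via SCLP. Once more than one SCLP
response page is supported, either branch could simply be dropped.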


-- 

Thanks,

David / dhildenb
