[Xen-devel] [seabios test] 118275: regressions - FAIL

2018-01-22 Thread osstest service owner
flight 118275 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118275/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop   fail REGR. vs. 115539

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 115539
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 seabios  14d91c353e19b7085fdbb7b2dcc43f3355665670
baseline version:
 seabios  0ca6d6277dfafc671a5b3718cbeb5c78e2a888ea

Last test of basis   115539  2017-11-03 20:48:58 Z   80 days
Failing since        115733  2017-11-10 17:19:59 Z   73 days   87 attempts
Testing same since   118140  2018-01-17 05:09:48 Z    6 days    8 attempts


People who touched revisions under test:
  Kevin O'Connor 
  Marcel Apfelbaum 
  Michael S. Tsirkin 
  Paul Menzel 
  Stefan Berger 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops                                    pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-qemuu-nested-amd                    fail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-win10-i386 fail
 test-amd64-i386-xl-qemuu-win10-i386  fail
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 14d91c353e19b7085fdbb7b2dcc43f3355665670
Author: Marcel Apfelbaum 
Date:   Thu Jan 11 22:15:12 2018 +0200

pci: fix 'io hints' capability for RedHat PCI bridges

Commit ec6cb17f (pci: enable RedHat PCI bridges to reserve additional
resources on PCI init) added a new vendor-specific PCI capability for
RedHat PCI bridges, allowing them to reserve additional buses and/or
IO/MEM space.

When adding the IO hints PCI capability to the pcie-root-port
without specifying a value for bus reservation, the subordinate bus
computation is wrong and the guest kernel gets messed up.

Fix it by falling back to the previous code path if the value for bus
reservation is not set.

Also removed a wrong debug print "PCI: invalid QEMU resource reserve
cap offset", which appeared whenever the 'IO hints' capability was not present.

Acked-by: Michael S. Tsirkin 

Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 23/01/18 07:34, Juergen Gross wrote:
> On 22/01/18 19:39, Andrew Cooper wrote:
>> On 22/01/18 16:51, Jan Beulich wrote:
>> On 22.01.18 at 16:00,  wrote:
 On 22/01/18 15:48, Jan Beulich wrote:
 On 22.01.18 at 15:38,  wrote:
>> On 22/01/18 15:22, Jan Beulich wrote:
>> On 22.01.18 at 15:18,  wrote:
 On 22/01/18 13:50, Jan Beulich wrote:
 On 22.01.18 at 13:32,  wrote:
>> As a preparation for doing page table isolation in the Xen hypervisor
>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>> 64 bit PV domains mapped to the per-domain virtual area.
>>
>> The per-vcpu stacks are used for early interrupt handling only. After
>> saving the domain's registers, stacks are switched back to the normal
>> per physical cpu ones in order to be able to address on-stack data
>> from other cpus e.g. while handling IPIs.
>>
>> Adding %cr3 switching between saving of the registers and switching
>> the stacks will enable the possibility to run guest code without any
>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>> able to access other domains' data.
>>
>> Without any further measures it will still be possible for e.g. a
>> guest's user program to read stack data of another vcpu of the same
>> domain, but this can be easily avoided by a little PV-ABI 
>> modification
>> introducing per-cpu user address spaces.
>>
>> This series is meant as a replacement for Andrew's patch series:
>> "x86: Prerequisite work for a Xen KAISER solution".
> Considering in particular the two reverts, what I'm missing here
> is a clear description of the meaningful additional protection this
> approach provides over the band-aid. For context see also
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>  
 My approach supports mapping only the following data while the guest is
 running (apart from the guest's own data, of course):

 - the per-vcpu entry stacks of the domain which will contain only the
   guest's registers saved when an interrupt occurs
 - the per-vcpu GDTs and TSSs of the domain
 - the IDT
 - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)

 All other hypervisor data and code can be completely hidden from the
 guests.
>>> I understand that. What I'm not clear about is: Which parts of
>>> the additionally hidden data are actually necessary (or at least
>>> very desirable) to hide?
>> Necessary:
>> - other guests' memory (e.g. physical memory 1:1 mapping)
>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>   code emulator buffers
>> - other guests' register values e.g. in vcpu structure
> All of this is already being made invisible by the band-aid (with the
> exception of leftovers on the hypervisor stacks across context
> switches, which we've already said could be taken care of by
> memset()ing that area). I'm asking about the _additional_ benefits
> of your approach.
 I'm quite sure the performance will be much better as it doesn't require
 per physical cpu L4 page tables, but just a shadow L4 table for each
 guest L4 table, similar to the Linux kernel KPTI approach.
>>> But isn't that model having the same synchronization issues upon
>>> guest L4 updates which Andrew was fighting with?
>>
>> (Condensing a lot of threads down into one)
>>
>> All the methods have L4 synchronisation update issues, until we have a
>> PV ABI which guarantees that L4's don't get reused.  Any improvements to
>> the shadowing/synchronisation algorithm will benefit all approaches.
>>
>> Juergen: you're now adding a LTR into the context switch path which
>> tends to be very slow.  I.e. As currently presented, this series
>> necessarily has a higher runtime overhead than Jan's XPTI.
> 
> Sure? How slow is LTR compared to a copy of nearly 4kB of data?

I just added some measurement code to ltr(). On my system ltr takes
about 320 cycles, so a little bit more than 100ns (2.9 GHz).

With 10,000 context switches per second and 2 ltr instructions per
context switch, this would add up to about 0.2% performance loss.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 22/01/18 22:45, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 22, 2018 at 01:32:44PM +0100, Juergen Gross wrote:
>> As a preparation for doing page table isolation in the Xen hypervisor
>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>> 64 bit PV domains mapped to the per-domain virtual area.
>>
>> The per-vcpu stacks are used for early interrupt handling only. After
>> saving the domain's registers, stacks are switched back to the normal
>> per physical cpu ones in order to be able to address on-stack data
>> from other cpus e.g. while handling IPIs.
>>
>> Adding %cr3 switching between saving of the registers and switching
>> the stacks will enable the possibility to run guest code without any
>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>> able to access other domains' data.
>>
>> Without any further measures it will still be possible for e.g. a
>> guest's user program to read stack data of another vcpu of the same
>> domain, but this can be easily avoided by a little PV-ABI modification
>> introducing per-cpu user address spaces.
>>
>> This series is meant as a replacement for Andrew's patch series:
>> "x86: Prerequisite work for a Xen KAISER solution".
>>
>> What needs to be done:
>> - verify livepatching is still working
> 
> Is there an git repo for this?

https://github.com/jgross1/xen.git xpti


Juergen


Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 22/01/18 19:39, Andrew Cooper wrote:
> On 22/01/18 16:51, Jan Beulich wrote:
> On 22.01.18 at 16:00,  wrote:
>>> On 22/01/18 15:48, Jan Beulich wrote:
>>> On 22.01.18 at 15:38,  wrote:
> On 22/01/18 15:22, Jan Beulich wrote:
> On 22.01.18 at 15:18,  wrote:
>>> On 22/01/18 13:50, Jan Beulich wrote:
>>> On 22.01.18 at 13:32,  wrote:
> As a preparation for doing page table isolation in the Xen hypervisor
> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
> 64 bit PV domains mapped to the per-domain virtual area.
>
> The per-vcpu stacks are used for early interrupt handling only. After
> saving the domain's registers, stacks are switched back to the normal
> per physical cpu ones in order to be able to address on-stack data
> from other cpus e.g. while handling IPIs.
>
> Adding %cr3 switching between saving of the registers and switching
> the stacks will enable the possibility to run guest code without any
> per physical cpu mapping, i.e. avoiding the threat of a guest being
> able to access other domains' data.
>
> Without any further measures it will still be possible for e.g. a
> guest's user program to read stack data of another vcpu of the same
> domain, but this can be easily avoided by a little PV-ABI modification
> introducing per-cpu user address spaces.
>
> This series is meant as a replacement for Andrew's patch series:
> "x86: Prerequisite work for a Xen KAISER solution".
 Considering in particular the two reverts, what I'm missing here
 is a clear description of the meaningful additional protection this
 approach provides over the band-aid. For context see also
 https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
  
>>> My approach supports mapping only the following data while the guest is
>>> running (apart from the guest's own data, of course):
>>>
>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>   guest's registers saved when an interrupt occurs
>>> - the per-vcpu GDTs and TSSs of the domain
>>> - the IDT
>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>
>>> All other hypervisor data and code can be completely hidden from the
>>> guests.
>> I understand that. What I'm not clear about is: Which parts of
>> the additionally hidden data are actually necessary (or at least
>> very desirable) to hide?
> Necessary:
> - other guests' memory (e.g. physical memory 1:1 mapping)
> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>   code emulator buffers
> - other guests' register values e.g. in vcpu structure
 All of this is already being made invisible by the band-aid (with the
 exception of leftovers on the hypervisor stacks across context
 switches, which we've already said could be taken care of by
 memset()ing that area). I'm asking about the _additional_ benefits
 of your approach.
>>> I'm quite sure the performance will be much better as it doesn't require
>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>> guest L4 table, similar to the Linux kernel KPTI approach.
>> But isn't that model having the same synchronization issues upon
>> guest L4 updates which Andrew was fighting with?
> 
> (Condensing a lot of threads down into one)
> 
> All the methods have L4 synchronisation update issues, until we have a
> PV ABI which guarantees that L4's don't get reused.  Any improvements to
> the shadowing/synchronisation algorithm will benefit all approaches.
> 
> Juergen: you're now adding a LTR into the context switch path which
> tends to be very slow.  I.e. As currently presented, this series
> necessarily has a higher runtime overhead than Jan's XPTI.

Sure? How slow is LTR compared to a copy of nearly 4kB of data?

> One of my concerns is that this patch series moves further away from the
> secondary goal of my KAISER series, which was to have the IDT and GDT
> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
> leak which CPU you're currently scheduled on into PV guests and b) the
> context switch code can drop a load of its slow instructions like LGDT
> and the VMWRITEs to update the VMCS.

The GDT address of a PV vcpu depends on the vcpu_id only. I don't
see why the IDT can't be mapped to the same address on each cpu with
my approach.


Juergen


Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 22/01/18 17:51, Jan Beulich wrote:
 On 22.01.18 at 16:00,  wrote:
>> On 22/01/18 15:48, Jan Beulich wrote:
>> On 22.01.18 at 15:38,  wrote:
 On 22/01/18 15:22, Jan Beulich wrote:
 On 22.01.18 at 15:18,  wrote:
>> On 22/01/18 13:50, Jan Beulich wrote:
>> On 22.01.18 at 13:32,  wrote:
 As a preparation for doing page table isolation in the Xen hypervisor
 in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
 64 bit PV domains mapped to the per-domain virtual area.

 The per-vcpu stacks are used for early interrupt handling only. After
 saving the domain's registers, stacks are switched back to the normal
 per physical cpu ones in order to be able to address on-stack data
 from other cpus e.g. while handling IPIs.

 Adding %cr3 switching between saving of the registers and switching
 the stacks will enable the possibility to run guest code without any
 per physical cpu mapping, i.e. avoiding the threat of a guest being
 able to access other domains' data.

 Without any further measures it will still be possible for e.g. a
 guest's user program to read stack data of another vcpu of the same
 domain, but this can be easily avoided by a little PV-ABI modification
 introducing per-cpu user address spaces.

 This series is meant as a replacement for Andrew's patch series:
 "x86: Prerequisite work for a Xen KAISER solution".
>>>
>>> Considering in particular the two reverts, what I'm missing here
>>> is a clear description of the meaningful additional protection this
>>> approach provides over the band-aid. For context see also
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>  
>>
>> My approach supports mapping only the following data while the guest is
>> running (apart from the guest's own data, of course):
>>
>> - the per-vcpu entry stacks of the domain which will contain only the
>>   guest's registers saved when an interrupt occurs
>> - the per-vcpu GDTs and TSSs of the domain
>> - the IDT
>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>
>> All other hypervisor data and code can be completely hidden from the
>> guests.
>
> I understand that. What I'm not clear about is: Which parts of
> the additionally hidden data are actually necessary (or at least
> very desirable) to hide?

 Necessary:
 - other guests' memory (e.g. physical memory 1:1 mapping)
 - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
   code emulator buffers
 - other guests' register values e.g. in vcpu structure
>>>
>>> All of this is already being made invisible by the band-aid (with the
>>> exception of leftovers on the hypervisor stacks across context
>>> switches, which we've already said could be taken care of by
>>> memset()ing that area). I'm asking about the _additional_ benefits
>>> of your approach.
>>
>> I'm quite sure the performance will be much better as it doesn't require
>> per physical cpu L4 page tables, but just a shadow L4 table for each
>> guest L4 table, similar to the Linux kernel KPTI approach.
> 
> But isn't that model having the same synchronization issues upon
> guest L4 updates which Andrew was fighting with?

I don't think so, as the number of shadows will always be at most 1
with my approach.

Juergen


Re: [Xen-devel] PVH backports to 4.9 and 4.8

2018-01-22 Thread Simon Gaiser
George Dunlap:
> Part of our solution to XSA-254 SP3 (aka "Meltdown") is to backport
> the PVH mode from 4.10 to 4.9 and 4.8.  This will first allow people
> able to run PVH kernels to switch their PV guests directly to PVH
> guests; and second, eventually enable the backport of patches which
> will enable transparent changing of PV guests into PVH guests.
> 
> All of the hypervisor support seems to have existed already in 4.8, so
> the only backports involve toolstack patches.
> 
> I've put up two trees for a first-cut backport of the PVH
> functionality, to 4.9 and 4.8 here:
> 
> git://xenbits.xen.org/people/gdunlap/xen.git
> 
> Branches out/pvh-backport/4.8/v1 and out/pvh-backport/4.9/v1
> 
> Below are the patches backported from 4.10 to 4.9 (23 patches total):
[...]

So future 4.8 releases will include the backports, right? Asking because
AFAICS the 4.8.3-pre-shim-comet branch includes them but staging-4.8
does not.

Simon




[Xen-devel] [linux-next test] 118267: regressions - trouble: broken/fail/pass

2018-01-22 Thread osstest service owner
flight 118267 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118267/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-pair broken
 test-amd64-i386-libvirt-xsm  broken
 test-amd64-i386-pair broken
 test-amd64-amd64-xl  broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm broken
 test-amd64-i386-freebsd10-amd64 broken
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 118215
 test-amd64-amd64-xl-credit2   7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-xsm   7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-qemuu-win10-i386  7 xen-boot  fail REGR. vs. 118215
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 118215
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 118215
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 118215
 test-amd64-amd64-pygrub   7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-ws16-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-win7-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-win10-i386  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-qemuu-nested-intel  7 xen-boot  fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-rumprun-i386  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-libvirt-xsm  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-qemut-win10-i386  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-multivcpu  7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-pvhv2-amd  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-ws16-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-qemuu-nested-amd  7 xen-bootfail REGR. vs. 118215
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 118215
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 118215
 test-amd64-amd64-libvirt  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-pvhv2-intel  7 xen-boot  fail REGR. vs. 118215
 test-amd64-amd64-libvirt-vhd  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 118215
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-win10-i386  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-amd64-pvgrub  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-rumprun-amd64  7 xen-boot   fail REGR. vs. 118215
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 118215
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 118215
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 118215
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 118215
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-xl-qemuu-win7-amd64  7 xen-boot fail REGR. vs. 118215
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 118215
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 118215
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 118215
 

[Xen-devel] [qemu-mainline test] 118270: tolerable FAIL - PUSHED

2018-01-22 Thread osstest service owner
flight 118270 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118270/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 118253
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 118253
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 118253
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 118253
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail  like 118253
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 118253
 test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass
 test-amd64-amd64-xl-pvhv2-amd 12 guest-start  fail  never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 qemuuf7c6b96c3e639e871bb929038a1b82ded7f39437
baseline version:
 qemuub384cd95eb9c6f73ad84ed1bb0717a26e29cc78f

Last test of basis   118253  2018-01-21 09:20:22 Z    1 days
Testing same since   118270  2018-01-22 11:44:53 Z    0 days    1 attempts


People who touched revisions under test:
  Anton Nefedov 
  John Snow 
  Peter Maydell 
  Thomas Huth 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386  

Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

2018-01-22 Thread Boris Ostrovsky



On 01/22/2018 07:17 PM, Andrew Cooper wrote:

On 22/01/2018 22:27, Boris Ostrovsky wrote:

On 01/19/2018 08:36 AM, Andrew Cooper wrote:

On 19/01/18 11:43, Jan Beulich wrote:


@@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
  .Lvmx_vmentry_fail:
  sti
  SAVE_ALL
+
+SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */

I think the use of the PV variant here requires a comment.

Oh.  It used to have one...  I'll try to find it.

I, in fact, meant to ask about this for a long time and always forgot.
Perhaps your comment will say more than just why a PV variant is used
here but in case it won't --- why do we have *any* mitigation here? We
are never returning to the guest, are we?


We never return to *this* guest, but we are still open to abuse from a
separate hyperthread, so still need to set SPEC_CTRL.IBRS if we are
using IBRS for safety.  (If we are using lfence+jmp or retpoline then we
don't need this change, but it's not a hotpath so doesn't warrant yet
another variant of SPEC_CTRL_ENTRY_FROM_*.)



We wrote IBRS during VMEXIT. I thought this serves as a barrier for all
preceding predictions (both threads) from lower protection mode.


Is the concern here that SPEC_CTRL_EXIT_TO_GUEST (before VMENTER) may 
set IBRS to 0 and *that* will open the hypervisor to the other thread's mischief?


-boris


[Xen-devel] [linux-linus test] 118268: trouble: blocked/broken/fail/pass

2018-01-22 Thread osstest service owner
flight 118268 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118268/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-raw   broken
 test-amd64-amd64-xl-pvhv2-amd broken
 test-amd64-amd64-rumprun-amd64 broken
 test-amd64-amd64-xl-xsm  broken
 test-amd64-i386-qemut-rhel6hvm-amd broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm broken
 test-amd64-amd64-libvirt-vhd broken
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm   broken
 build-armhf  broken
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   broken
 test-amd64-amd64-xl-xsm   4 host-install(4) broken REGR. vs. 118250
 test-amd64-i386-xl-raw    4 host-install(4) broken REGR. vs. 118250
 test-amd64-i386-qemut-rhel6hvm-amd  4 host-install(4) broken REGR. vs. 118250
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 4 host-install(4) broken REGR. vs. 118250
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 4 host-install(4) broken REGR. vs. 118250
 test-amd64-amd64-xl-pvhv2-amd  4 host-install(4) broken REGR. vs. 118250
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 4 host-install(4) broken REGR. vs. 118250
 test-amd64-amd64-libvirt-vhd  4 host-install(4) broken REGR. vs. 118250
 test-amd64-amd64-rumprun-amd64  4 host-install(4) broken REGR. vs. 118250
 test-amd64-i386-examine   5 host-install broken REGR. vs. 118250
 build-armhf   4 host-install(4) broken REGR. vs. 118250

Tests which did not succeed, but are not blocking:
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 118250
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 118250
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 118250
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 118250
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 118250
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 118250
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 linux

Re: [Xen-devel] Xen fails to boot inside QEMU on x86, no VMX

2018-01-22 Thread Andrew Cooper
On 23/01/2018 00:38, Stefano Stabellini wrote:
> On Tue, 23 Jan 2018, Andrew Cooper wrote:
>> On 22/01/2018 23:48, Stefano Stabellini wrote:
>>> Hi all,
>>>
>>> Running Xen inside QEMU x86 without KVM acceleration and without VMX
>>> emulation leads to the failure appended below.
>>>
>>> This trivial workaround "fixes" the problem:
>>>
>>> diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
>>> index 72f30d9..a67d6c1 100644
>>> --- a/xen/arch/x86/extable.c
>>> +++ b/xen/arch/x86/extable.c
>>> @@ -168,7 +168,6 @@ static int __init stub_selftest(void)
>>> _ASM_EXTABLE(.Lret%=, .Lfix%=)
>>> : [exn] "+m" (res)
>>> : [stb] "r" (addr), "a" (tests[i].rax));
>>> -ASSERT(res == tests[i].res.raw);
>>>  }
>>>  
>>>  return 0;
>>>
>>>
>>> Any suggestions?
>> Which i failed?  This will probably be an emulation bug in Qemu.
> i=2 is the culprit

Qemu doesn't emulate %rsp-based memory accesses properly.  It should
raise #SS[0], and is presumably raising #GP[0] instead.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen fails to boot inside QEMU on x86, no VMX

2018-01-22 Thread Stefano Stabellini
On Tue, 23 Jan 2018, Andrew Cooper wrote:
> On 22/01/2018 23:48, Stefano Stabellini wrote:
> > Hi all,
> >
> > Running Xen inside QEMU x86 without KVM acceleration and without VMX
> > emulation leads to the failure appended below.
> >
> > This trivial workaround "fixes" the problem:
> >
> > diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
> > index 72f30d9..a67d6c1 100644
> > --- a/xen/arch/x86/extable.c
> > +++ b/xen/arch/x86/extable.c
> > @@ -168,7 +168,6 @@ static int __init stub_selftest(void)
> > _ASM_EXTABLE(.Lret%=, .Lfix%=)
> > : [exn] "+m" (res)
> > : [stb] "r" (addr), "a" (tests[i].rax));
> > -ASSERT(res == tests[i].res.raw);
> >  }
> >  
> >  return 0;
> >
> >
> > Any suggestions?
> 
> Which i failed?  This will probably be an emulation bug in Qemu.

i=2 is the culprit

[Xen-devel] [PATCHv3] xen: Add EFI_LOAD_OPTION support

2018-01-22 Thread Tamas K Lengyel
When booting Xen via UEFI the Xen config file can contain multiple sections,
each describing different boot options. It is currently only possible to choose
which section to boot when the buffer contains a plain string. UEFI provides a
different standard for passing optional arguments to an application, and in this
patch we make Xen properly parse that buffer, thus making it possible to have
separate EFI boot options present for the different config sections.

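For reference, the buffer layout being parsed here can be sketched as follows. This is an illustrative Python model of the EFI_LOAD_OPTION layout and of the patch's sanity checks, not the patch's C code; the sample description, device path, and optional data below are invented:

```python
import struct

LOAD_OPTION_ACTIVE = 0x0001

def parse_load_option(buf: bytes):
    """Parse an EFI_LOAD_OPTION buffer: UINT32 Attributes,
    UINT16 FilePathListLength, NUL-terminated UTF-16LE Description,
    FilePathList, then OptionalData (the command line Xen wants)."""
    attributes, fpl_len = struct.unpack_from("<IH", buf, 0)
    if not attributes & LOAD_OPTION_ACTIVE:
        raise ValueError("load option not active")
    # Description starts right after the 6-byte fixed header.
    off = 6
    end = off
    while end + 2 <= len(buf) and buf[end:end + 2] != b"\x00\x00":
        end += 2
    description = buf[off:end].decode("utf-16-le")
    # Skip the CHAR16 NUL terminator and the device-path list; this
    # mirrors the patch's size_check + desc_length arithmetic.
    data_off = end + 2 + fpl_len
    if data_off > len(buf):
        raise ValueError("buffer too small")
    return description, buf[data_off:]
```

A buffer whose last CHAR16 is NUL is treated by the patch as a plain command-line string and never reaches this code path; the parse above only applies to the structured EFI_LOAD_OPTION case.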
Signed-off-by: Tamas K Lengyel 
---
Cc: Jan Beulich 
Cc: ope...@googlegroups.com

v3: simplify sanity checking logic
v2: move EFI_LOAD_OPTION definition into file that uses it
add more sanity checks to validate the buffer
---
 xen/common/efi/boot.c | 49 +++--
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 469bf980cc..3537fe9588 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -88,6 +88,16 @@ typedef struct _EFI_APPLE_PROPERTIES {
 EFI_APPLE_PROPERTIES_GETALL GetAll;
 } EFI_APPLE_PROPERTIES;
 
+typedef struct _EFI_LOAD_OPTION {
+UINT32 Attributes;
+UINT16 FilePathListLength;
+CHAR16 Description[];
+} EFI_LOAD_OPTION;
+
+#define LOAD_OPTION_ACTIVE  0x0001
+#define LOAD_OPTION_FORCE_RECONNECT 0x0002
+#define LOAD_OPTION_HIDDEN  0x0008
+
 union string {
 CHAR16 *w;
 char *s;
@@ -375,12 +385,39 @@ static void __init PrintErrMesg(const CHAR16 *mesg, 
EFI_STATUS ErrCode)
 
 static unsigned int __init get_argv(unsigned int argc, CHAR16 **argv,
 CHAR16 *cmdline, UINTN cmdsize,
-CHAR16 **options)
+CHAR16 **options, bool *elo_active)
 {
 CHAR16 *ptr = (CHAR16 *)(argv + argc + 1), *prev = NULL;
 bool prev_sep = true;
 
-for ( ; cmdsize > sizeof(*cmdline) && *cmdline;
+if ( cmdsize > sizeof(EFI_LOAD_OPTION) &&
+ *(CHAR16 *)((void *)cmdline + cmdsize - sizeof(*cmdline)) != L'\0' )
+{
+const EFI_LOAD_OPTION *elo = (const EFI_LOAD_OPTION *)cmdline;
+
+/* The absolute minimum size the buffer needs to be */
+size_t size_check = offsetof(EFI_LOAD_OPTION, Description[1]) +
+elo->FilePathListLength;
+
+if ( (elo->Attributes & LOAD_OPTION_ACTIVE) && size_check < cmdsize )
+{
+const CHAR16 *desc = elo->Description;
+size_t desc_length = 0;
+
+/* Find Description string length in its possible space */
+while ( desc_length < cmdsize - size_check && *desc++ != L'\0')
+desc_length += sizeof(*desc);
+
+if ( size_check + desc_length < cmdsize )
+{
+*elo_active = true;
+cmdline = (void *)cmdline + size_check + desc_length;
+cmdsize = cmdsize - size_check - desc_length;
+}
+}
+}
+
+for ( ; cmdsize >= sizeof(*cmdline) && *cmdline;
 cmdsize -= sizeof(*cmdline), ++cmdline )
 {
 bool cur_sep = *cmdline == L' ' || *cmdline == L'\t';
@@ -1071,7 +1108,7 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
*SystemTable)
 EFI_SHIM_LOCK_PROTOCOL *shim_lock;
 EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL;
 union string section = { NULL }, name;
-bool base_video = false;
+bool base_video = false, elo_active = false;
 char *option_str;
 bool use_cfg_file;
 
@@ -1096,17 +1133,17 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
*SystemTable)
 if ( use_cfg_file )
 {
 argc = get_argv(0, NULL, loaded_image->LoadOptions,
-loaded_image->LoadOptionsSize, NULL);
+loaded_image->LoadOptionsSize, NULL, &elo_active);
 if ( argc > 0 &&
  efi_bs->AllocatePool(EfiLoaderData,
   (argc + 1) * sizeof(*argv) +
   loaded_image->LoadOptionsSize,
   (void **)&argv) == EFI_SUCCESS )
 get_argv(argc, argv, loaded_image->LoadOptions,
- loaded_image->LoadOptionsSize, &options);
+ loaded_image->LoadOptionsSize, &options, &elo_active);
 else
 argc = 0;
-for ( i = 1; i < argc; ++i )
+for ( i = !elo_active; i < argc; ++i )
 {
 CHAR16 *ptr = argv[i];
 
-- 
2.11.0



Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

2018-01-22 Thread Andrew Cooper
On 22/01/2018 22:27, Boris Ostrovsky wrote:
> On 01/19/2018 08:36 AM, Andrew Cooper wrote:
>> On 19/01/18 11:43, Jan Beulich wrote:
>>
 @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
  .Lvmx_vmentry_fail:
  sti
  SAVE_ALL
 +
 +SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>>> I think the use of the PV variant here requires a comment.
>> Oh.  It used to have one...  I'll try to find it.
> I, in fact, meant to ask about this for a long time and always forgot.
> Perhaps your comment will say more than just why a PV variant is used
> here but in case it won't --- why do we have *any* mitigation here? We
> are never returning to the guest, do we?

We never return to *this* guest, but we are still open to abuse from a
separate hyperthread, so still need to set SPEC_CTRL.IBRS if we are
using IBRS for safety.  (If we are using lfence+jmp or retpoline then we
don't need this change, but it's not a hotpath so doesn't warrant yet
another variant of SPEC_CTRL_ENTRY_FROM_*.)

~Andrew


Re: [Xen-devel] Xen fails to boot inside QEMU on x86, no VMX

2018-01-22 Thread Andrew Cooper
On 22/01/2018 23:48, Stefano Stabellini wrote:
> Hi all,
>
> Running Xen inside QEMU x86 without KVM acceleration and without VMX
> emulation leads to the failure appended below.
>
> This trivial workaround "fixes" the problem:
>
> diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
> index 72f30d9..a67d6c1 100644
> --- a/xen/arch/x86/extable.c
> +++ b/xen/arch/x86/extable.c
> @@ -168,7 +168,6 @@ static int __init stub_selftest(void)
> _ASM_EXTABLE(.Lret%=, .Lfix%=)
> : [exn] "+m" (res)
> : [stb] "r" (addr), "a" (tests[i].rax));
> -ASSERT(res == tests[i].res.raw);
>  }
>  
>  return 0;
>
>
> Any suggestions?

Which i failed?  This will probably be an emulation bug in Qemu.

~Andrew


[Xen-devel] Xen fails to boot inside QEMU on x86, no VMX

2018-01-22 Thread Stefano Stabellini
Hi all,

Running Xen inside QEMU x86 without KVM acceleration and without VMX
emulation leads to the failure appended below.

This trivial workaround "fixes" the problem:

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 72f30d9..a67d6c1 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -168,7 +168,6 @@ static int __init stub_selftest(void)
_ASM_EXTABLE(.Lret%=, .Lfix%=)
: [exn] "+m" (res)
: [stb] "r" (addr), "a" (tests[i].rax));
-ASSERT(res == tests[i].res.raw);
 }
 
 return 0;


Any suggestions?

Cheers,

Stefano

---

(XEN) traps.c:1550: GPF (): 82d0b041 [82d0b041] -> 82d0803654b2
(XEN) traps.c:1550: GPF (): 82d0b040 [82d0b040] -> 82d0803654b2
(XEN) Assertion 'res == tests[i].res.raw' failed at extable.c:171
(XEN) [ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]
(XEN) CPU:0
(XEN) RIP:e008:[] extable.c#stub_selftest+0xc8/0xee
(XEN) RFLAGS: 0287   CONTEXT: hypervisor
(XEN) rax: 000d   rbx:    rcx: 0040
(XEN) rdx: 8300   rsi: 0007c7ff   rdi: 83013de1b040
(XEN) rbp: 82d08046fda8   rsp: 82d08046fd58   r8:  83013de24000
(XEN) r9:  00f3   r10: 0004   r11: 0002
(XEN) r12: 82d0804148b0   r13: 82d0805a8028   r14: 82d0b040
(XEN) r15: 82d08046   cr0: 8005003b   cr4: 06e0
(XEN) cr3: bd66   cr2: 
(XEN) fsb:    gsb:    gss: 
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen code around  (extable.c#stub_selftest+0xc8/0xee):
(XEN)  c8 49 39 44 24 10 74 02 <0f> 0b 49 83 c4 18 48 8d 05 5e 97 01 00 49 39 c4
(XEN) Xen stack trace from rsp=82d08046fd58:
(XEN)82d0805a7428 0040 82d08046fd88 000d
(XEN)82d08046fd98 82d08041ae38 82d08041af98 0002
(XEN)82d080452820 0001 82d08046fdc8 82d0803e01e0
(XEN)0002 83013de35fe0 82d08046fef8 82d080404537
(XEN) 003a8180 0167 01ff
(XEN)0002 0002 0002 0001
(XEN)0001 0001 0001 
(XEN)01db 01eb 82d080440d68 0015
(XEN)0192b000 0014 00013de48000 
(XEN)8309ef70 0001 8309efa0 8309efb0
(XEN)  0008 0001006e
(XEN)0003 02f8  
(XEN)0048   bd5d922e
(XEN)bbf24fe0 82d0802000f3  
(XEN)   
(XEN)   
(XEN)   
(XEN)   
(XEN)   
(XEN)   
(XEN) Xen call trace:
(XEN)[] extable.c#stub_selftest+0xc8/0xee
(XEN)[] do_initcalls+0x22/0x31
(XEN)[] __start_xen+0x20a0/0x24ee
(XEN)[] __high_start+0x53/0x60


[Xen-devel] [seabios test] 118269: regressions - FAIL

2018-01-22 Thread osstest service owner
flight 118269 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118269/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-amd broken in 118264
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 broken in 118264
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail in 118264 REGR. vs. 115539

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 4 host-install(4) broken in 118264 pass in 118269
 test-amd64-i386-qemuu-rhel6hvm-amd 4 host-install(4) broken in 118264 pass in 118269
 test-amd64-amd64-xl-qemuu-ws16-amd64 16 guest-localmigrate/x10 fail pass in 118264

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 115539
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 seabios  14d91c353e19b7085fdbb7b2dcc43f3355665670
baseline version:
 seabios  0ca6d6277dfafc671a5b3718cbeb5c78e2a888ea

Last test of basis   115539  2017-11-03 20:48:58 Z   80 days
Failing since115733  2017-11-10 17:19:59 Z   73 days   86 attempts
Testing same since   118140  2018-01-17 05:09:48 Z5 days7 attempts


People who touched revisions under test:
  Kevin O'Connor 
  Marcel Apfelbaum 
  Michael S. Tsirkin 
  Paul Menzel 
  Stefan Berger 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-win10-i386 fail
 test-amd64-i386-xl-qemuu-win10-i386  fail
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-amd64-i386-qemuu-rhel6hvm-amd broken
broken-job test-amd64-amd64-xl-qemuu-debianhvm-amd64 broken

Not pushing.


commit 14d91c353e19b7085fdbb7b2dcc43f3355665670
Author: Marcel Apfelbaum 
Date:   Thu Jan 11 22:15:12 2018 +0200

pci: fix 'io hints' capability for RedHat PCI bridges

    Commit ec6cb17f (pci: enable RedHat PCI bridges to reserve additional
    resources on PCI init) added a new vendor specific PCI capability for RedHat PCI 

Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

2018-01-22 Thread Boris Ostrovsky
On 01/19/2018 08:36 AM, Andrew Cooper wrote:
> On 19/01/18 11:43, Jan Beulich wrote:
>
>>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>>  .Lvmx_vmentry_fail:
>>>  sti
>>>  SAVE_ALL
>>> +
>>> +SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
>> I think the use of the PV variant here requires a comment.
> Oh.  It used to have one...  I'll try to find it.

I, in fact, meant to ask about this for a long time and always forgot.
Perhaps your comment will say more than just why a PV variant is used
here but in case it won't --- why do we have *any* mitigation here? We
are never returning to the guest, do we?

-boris


Re: [Xen-devel] [Makefile] asm: handle comments when creating header file

2018-01-22 Thread Tautschnig, Michael
Jan, All,

> On 10 Jan 2018, at 16:02, Jan Beulich  wrote:
> 
 On 10.01.18 at 16:14,  wrote:
>> In the early steps of compilation, the asm header files are created, such
>> as include/asm-$(TARGET_ARCH)/asm-offsets.h. These files depend on the
>> assembly file arch/$(TARGET_ARCH)/asm-offsets.s, which is generated
>> before. Depending on the used assembler, there might be comments in the
>> assembly files.
>> 
>> This commit adds handling comments in the assembler during the creation of
>> the asm header files.
> 
> I have a hard time seeing how ...
> 
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -189,7 +189,7 @@ include/asm-$(TARGET_ARCH)/asm-offsets.h: 
>> arch/$(TARGET_ARCH)/asm-offsets.s
>>echo "#ifndef __ASM_OFFSETS_H__"; \
>>echo "#define __ASM_OFFSETS_H__"; \
>>echo ""; \
>> -  sed -rne "/==>/{s:.*==>(.*)<==.*:\1:; s: [\$$#]: :; p;}"; \
> 
> ... this pattern could match any comment that we currently have.
> Would you mind clarifying what it is that is actually broken (and
> hence wants/needs fixing)?
> 

Re-adding the new line:

+ sed -rne "/^[^#].*==>/{s:.*==>(.*)<==.*:\1:; s: [\$$#]: :; p;}"; \

The change "handles comments" by not printing those to the generated header
file, where they would confuse the preprocessor (as "#" starts comments at
the assembler level). It seems perfectly ok if this matches any existing
comments as none such should get printed to the header file.
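As a rough illustration of what the patched sed expression does, here is a Python equivalent (a sketch only, not the actual Makefile rule; the sample asm-offsets.s lines below are invented):

```python
import re

def extract_offsets(asm_lines):
    # Mimic: sed -rne "/^[^#].*==>/{s:.*==>(.*)<==.*:\1:; s: [$#]: :; p;}"
    out = []
    for line in asm_lines:
        if line.startswith("#"):        # assembler-level comment: skip it
            continue
        m = re.search(r"==>(.*)<==", line)
        if m:
            # Drop the assembler's immediate prefix (" $" or " #") once.
            out.append(re.sub(r" [$#]", " ", m.group(1), count=1))
    return out
```

With the old pattern, a goto-gcc comment line containing `==>...<==` would have leaked into the generated header; with the leading-`#` check it is filtered out.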

The problems showed up when using goto-gcc (a compiler for subsequent use of
static analysis tooling), which embeds additional information in the comments.

Hope this helps,
Michael








[Xen-devel] [xen-unstable-smoke test] 118274: tolerable all pass - PUSHED

2018-01-22 Thread osstest service owner
flight 118274 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118274/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  a5e7ce9560b408dbdc2f7fb8a58f6209601cc054
baseline version:
 xen  4dcfd7d1436c77ee92081a36cf63f569dc4ef725

Last test of basis   118271  2018-01-22 16:14:38 Z0 days
Testing same since   118274  2018-01-22 19:01:16 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Julien Grall 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   4dcfd7d143..a5e7ce9560  a5e7ce9560b408dbdc2f7fb8a58f6209601cc054 -> smoke


Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Andrew Cooper
On 22/01/18 18:48, George Dunlap wrote:
> On 01/22/2018 06:39 PM, Andrew Cooper wrote:
>> On 22/01/18 16:51, Jan Beulich wrote:
>> On 22.01.18 at 16:00,  wrote:
 On 22/01/18 15:48, Jan Beulich wrote:
 On 22.01.18 at 15:38,  wrote:
>> On 22/01/18 15:22, Jan Beulich wrote:
>> On 22.01.18 at 15:18,  wrote:
 On 22/01/18 13:50, Jan Beulich wrote:
 On 22.01.18 at 13:32,  wrote:
>> As a preparation for doing page table isolation in the Xen hypervisor
>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>> 64 bit PV domains mapped to the per-domain virtual area.
>>
>> The per-vcpu stacks are used for early interrupt handling only. After
>> saving the domain's registers stacks are switched back to the normal
>> per physical cpu ones in order to be able to address on-stack data
>> from other cpus e.g. while handling IPIs.
>>
>> Adding %cr3 switching between saving of the registers and switching
>> the stacks will enable the possibility to run guest code without any
>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>> able to access other domains data.
>>
>> Without any further measures it will still be possible for e.g. a
>> guest's user program to read stack data of another vcpu of the same
>> domain, but this can be easily avoided by a little PV-ABI 
>> modification
>> introducing per-cpu user address spaces.
>>
>> This series is meant as a replacement for Andrew's patch series:
>> "x86: Prerequisite work for a Xen KAISER solution".
> Considering in particular the two reverts, what I'm missing here
> is a clear description of the meaningful additional protection this
> approach provides over the band-aid. For context see also
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>  
 My approach supports mapping only the following data while the guest is
 running (apart from the guest's own data, of course):

 - the per-vcpu entry stacks of the domain which will contain only the
   guest's registers saved when an interrupt occurs
 - the per-vcpu GDTs and TSSs of the domain
 - the IDT
 - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)

 All other hypervisor data and code can be completely hidden from the
 guests.
>>> I understand that. What I'm not clear about is: Which parts of
>>> the additionally hidden data are actually necessary (or at least
>>> very desirable) to hide?
>> Necessary:
>> - other guests' memory (e.g. physical memory 1:1 mapping)
 - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>   code emulator buffers
>> - other guests' register values e.g. in vcpu structure
> All of this is already being made invisible by the band-aid (with the
> exception of leftovers on the hypervisor stacks across context
> switches, which we've already said could be taken care of by
> memset()ing that area). I'm asking about the _additional_ benefits
> of your approach.
 I'm quite sure the performance will be much better as it doesn't require
 per physical cpu L4 page tables, but just a shadow L4 table for each
 guest L4 table, similar to the Linux kernel KPTI approach.
>>> But isn't that model having the same synchronization issues upon
>>> guest L4 updates which Andrew was fighting with?
>> (Condensing a lot of threads down into one)
>>
>> All the methods have L4 synchronisation update issues, until we have a
>> PV ABI which guarantees that L4's don't get reused.  Any improvements to
>> the shadowing/synchronisation algorithm will benefit all approaches.
>>
>> Juergen: you're now adding a LTR into the context switch path which
>> tends to be very slow.  I.e. As currently presented, this series
>> necessarily has a higher runtime overhead than Jan's XPTI.
>>
>> One of my concerns is that this patch series moves further away from the
>> secondary goal of my KAISER series, which was to have the IDT and GDT
>> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
>> leak which CPU you're currently scheduled on into PV guests and b) the
>> context switch code can drop a load of its slow instructions like LGDT
>> and the VMWRITEs to update the VMCS.
>>
>> Jan: As to the things not covered by the current XPTI, hiding most of
>> the .text section is important to prevent fingerprinting or ROP
>> scanning.  This is a defence-in-depth argument, but a guest being easily
>> able to identify whether certain XSAs are fixed or not is quite bad. 
> I'm afraid we have a fairly different opinion of what is "quite bad".

I suggest you try 

Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread George Dunlap
On 01/22/2018 06:39 PM, Andrew Cooper wrote:
> On 22/01/18 16:51, Jan Beulich wrote:
> On 22.01.18 at 16:00,  wrote:
>>> On 22/01/18 15:48, Jan Beulich wrote:
>>> On 22.01.18 at 15:38,  wrote:
> On 22/01/18 15:22, Jan Beulich wrote:
> On 22.01.18 at 15:18,  wrote:
>>> On 22/01/18 13:50, Jan Beulich wrote:
>>> On 22.01.18 at 13:32,  wrote:
> As a preparation for doing page table isolation in the Xen hypervisor
> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
> 64 bit PV domains mapped to the per-domain virtual area.
>
> The per-vcpu stacks are used for early interrupt handling only. After
> saving the domain's registers stacks are switched back to the normal
> per physical cpu ones in order to be able to address on-stack data
> from other cpus e.g. while handling IPIs.
>
> Adding %cr3 switching between saving of the registers and switching
> the stacks will enable the possibility to run guest code without any
> per physical cpu mapping, i.e. avoiding the threat of a guest being
> able to access other domains data.
>
> Without any further measures it will still be possible for e.g. a
> guest's user program to read stack data of another vcpu of the same
> domain, but this can be easily avoided by a little PV-ABI modification
> introducing per-cpu user address spaces.
>
> This series is meant as a replacement for Andrew's patch series:
> "x86: Prerequisite work for a Xen KAISER solution".
 Considering in particular the two reverts, what I'm missing here
 is a clear description of the meaningful additional protection this
 approach provides over the band-aid. For context see also
 https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
  
>>> My approach supports mapping only the following data while the guest is
>>> running (apart from the guest's own data, of course):
>>>
>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>   guest's registers saved when an interrupt occurs
>>> - the per-vcpu GDTs and TSSs of the domain
>>> - the IDT
>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>
>>> All other hypervisor data and code can be completely hidden from the
>>> guests.
>> I understand that. What I'm not clear about is: Which parts of
>> the additionally hidden data are actually necessary (or at least
>> very desirable) to hide?
> Necessary:
> - other guests' memory (e.g. physical memory 1:1 mapping)
> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>   code emulator buffers
> - other guests' register values e.g. in vcpu structure
 All of this is already being made invisible by the band-aid (with the
 exception of leftovers on the hypervisor stacks across context
 switches, which we've already said could be taken care of by
 memset()ing that area). I'm asking about the _additional_ benefits
 of your approach.
>>> I'm quite sure the performance will be much better as it doesn't require
>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>> guest L4 table, similar to the Linux kernel KPTI approach.
>> But isn't that model having the same synchronization issues upon
>> guest L4 updates which Andrew was fighting with?
> 
> (Condensing a lot of threads down into one)
> 
> All the methods have L4 synchronisation update issues, until we have a
> PV ABI which guarantees that L4's don't get reused.  Any improvements to
> the shadowing/synchronisation algorithm will benefit all approaches.
> 
> Juergen: you're now adding a LTR into the context switch path which
> tends to be very slow.  I.e. As currently presented, this series
> necessarily has a higher runtime overhead than Jan's XPTI.
> 
> One of my concerns is that this patch series moves further away from the
> secondary goal of my KAISER series, which was to have the IDT and GDT
> mapped at the same linear addresses on every CPU so a) SIDT/SGDT don't
> leak which CPU you're currently scheduled on into PV guests and b) the
> context switch code can drop a load of its slow instructions like LGDT
> and the VMWRITEs to update the VMCS.
> 
> Jan: As to the things not covered by the current XPTI, hiding most of
> the .text section is important to prevent fingerprinting or ROP
> scanning.  This is a defence-in-depth argument, but a guest being easily
> able to identify whether certain XSAs are fixed or not is quite bad. 

I'm afraid we have a fairly different opinion of what is "quite bad".
Suppose we handed users a knob and said, "If you flip this switch,
attackers won't be able to tell if you've fixed XSAs or not without
trying them; but it 

Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Andrew Cooper
On 22/01/18 16:51, Jan Beulich wrote:
 On 22.01.18 at 16:00,  wrote:
>> On 22/01/18 15:48, Jan Beulich wrote:
>> On 22.01.18 at 15:38,  wrote:
 On 22/01/18 15:22, Jan Beulich wrote:
 On 22.01.18 at 15:18,  wrote:
>> On 22/01/18 13:50, Jan Beulich wrote:
>> On 22.01.18 at 13:32,  wrote:
 As a preparation for doing page table isolation in the Xen hypervisor
 in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
 64 bit PV domains mapped to the per-domain virtual area.

 The per-vcpu stacks are used for early interrupt handling only. After
 saving the domain's registers stacks are switched back to the normal
 per physical cpu ones in order to be able to address on-stack data
 from other cpus e.g. while handling IPIs.

 Adding %cr3 switching between saving of the registers and switching
 the stacks will enable the possibility to run guest code without any
 per physical cpu mapping, i.e. avoiding the threat of a guest being
 able to access other domains data.

 Without any further measures it will still be possible for e.g. a
 guest's user program to read stack data of another vcpu of the same
 domain, but this can be easily avoided by a little PV-ABI modification
 introducing per-cpu user address spaces.

 This series is meant as a replacement for Andrew's patch series:
 "x86: Prerequisite work for a Xen KAISER solution".
>>> Considering in particular the two reverts, what I'm missing here
>>> is a clear description of the meaningful additional protection this
>>> approach provides over the band-aid. For context see also
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>>  
>> My approach supports mapping only the following data while the guest is
>> running (apart from the guest's own data, of course):
>>
>> - the per-vcpu entry stacks of the domain which will contain only the
>>   guest's registers saved when an interrupt occurs
>> - the per-vcpu GDTs and TSSs of the domain
>> - the IDT
>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>
>> All other hypervisor data and code can be completely hidden from the
>> guests.
> I understand that. What I'm not clear about is: Which parts of
> the additionally hidden data are actually necessary (or at least
> very desirable) to hide?
 Necessary:
 - other guests' memory (e.g. physical memory 1:1 mapping)
 - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
   code emulator buffers
 - other guests' register values e.g. in vcpu structure
>>> All of this is already being made invisible by the band-aid (with the
>>> exception of leftovers on the hypervisor stacks across context
>>> switches, which we've already said could be taken care of by
>>> memset()ing that area). I'm asking about the _additional_ benefits
>>> of your approach.
>> I'm quite sure the performance will be much better as it doesn't require
>> per physical cpu L4 page tables, but just a shadow L4 table for each
>> guest L4 table, similar to the Linux kernel KPTI approach.
> But isn't that model having the same synchronization issues upon
> guest L4 updates which Andrew was fighting with?

(Condensing a lot of threads down into one)

All the methods have L4 synchronisation update issues, until we have a
PV ABI which guarantees that L4s don't get reused.  Any improvements to
the shadowing/synchronisation algorithm will benefit all approaches.

Juergen: you're now adding an LTR into the context switch path, which
tends to be very slow.  I.e. as currently presented, this series
necessarily has a higher runtime overhead than Jan's XPTI.

One of my concerns is that this patch series moves further away from the
secondary goal of my KAISER series, which was to have the IDT and GDT
mapped at the same linear addresses on every CPU so that a) SIDT/SGDT
don't leak to PV guests which CPU they're currently scheduled on, and b)
the context switch code can drop a load of its slow instructions like
LGDT and the VMWRITEs to update the VMCS.

Jan: As to the things not covered by the current XPTI, hiding most of
the .text section is important to prevent fingerprinting or ROP
scanning.  This is a defence-in-depth argument, but a guest being easily
able to identify whether certain XSAs are fixed or not is quite bad. 
Also, a load of CPU 0's data structures, including the stack, are
visible in .data.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 06:19:43PM +, Andrew Cooper wrote:
> On 22/01/18 18:17, Wei Liu wrote:
> > On Mon, Jan 22, 2018 at 06:09:14PM +, Andrew Cooper wrote:
> >> On 22/01/18 16:13, Wei Liu wrote:
> >>> Modify early boot code to relocate pvh info as well, so that we can be
> >>> sure __va in __start_xen works.
> >>>
> >>> Signed-off-by: Wei Liu 
> >>> ---
> >>> Cc: Jan Beulich 
> >>> Cc: Andrew Cooper 
> >>> Cc: Roger Pau Monné 
> >>> Cc: Doug Goldstein 
> >>>
> >>> v4: include autoconf.h directly. The code itself is unchanged.
> >>> ---
> >>>  xen/arch/x86/boot/Makefile |  4 +++
> >>>  xen/arch/x86/boot/defs.h   |  3 +++
> >>>  xen/arch/x86/boot/head.S   | 25 ++
> >>>  xen/arch/x86/boot/reloc.c  | 64 
> >>> +-
> >>>  4 files changed, 78 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> >>> index c6246c85d2..1b3f121a2f 100644
> >>> --- a/xen/arch/x86/boot/Makefile
> >>> +++ b/xen/arch/x86/boot/Makefile
> >>> @@ -7,6 +7,10 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
> >>>  RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
> >>>$(BASEDIR)/include/xen/multiboot2.h
> >> + autoconf.h
> >>
> >> However, it would be much better to take xen/kconfig.h ...
> >>
> > This is fine by me.
> >
> >>>  
> >>> +ifeq ($(CONFIG_PVH_GUEST),y)
> >>> +RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
> >>> +endif
> > [...]
> >>> diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
> >>> index b992678b5e..1fe19294ad 100644
> >>> --- a/xen/arch/x86/boot/reloc.c
> >>> +++ b/xen/arch/x86/boot/reloc.c
> >>> @@ -14,8 +14,8 @@
> >>>  
> >>>  /*
> >>>   * This entry point is entered from xen/arch/x86/boot/head.S with:
> >>> - *   - 0x4(%esp) = MULTIBOOT_MAGIC,
> >>> - *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS,
> >>> + *   - 0x4(%esp) = MAGIC,
> >>> + *   - 0x8(%esp) = INFORMATION_ADDRESS,
> >>>   *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
> >>>   */
> >>>  asm (
> >>> @@ -29,6 +29,8 @@ asm (
> >>>  #include "../../../include/xen/multiboot.h"
> >>>  #include "../../../include/xen/multiboot2.h"
> >>>  
> >>> +#include "../../../include/generated/autoconf.h"
> >>> +
> >>>  #define get_mb2_data(tag, type, member)   (((multiboot2_tag_##type##_t 
> >>> *)(tag))->member)
> >>>  #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, 
> >>> member))
> >>>  
> >>> @@ -71,6 +73,41 @@ static u32 copy_string(u32 src)
> >>>  return copy_mem(src, p - src + 1);
> >>>  }
> >>>  
> >>> +#ifdef CONFIG_PVH_GUEST
> >> ... drop this ifdef and ...
> >>
> > So you want reloc.o to contain pvh_info_reloc unconditionally?
> >
> > Fundamentally I don't think I care enough about all the bikeshedding so
> > if Jan and you agree on this I will just make the change.
> 
> It won't.  The function will be dropped due to DCE, but we'll spot build
> breakages far more easily.  (The important bit is that the function call
> is guarded by the IS_ENABLED())

reloc.o will still have that function in a non-PVH build on my machine.
And that's with the following diff applied.

Again. I don't care to argue one way or the other. I have both versions.
You and Jan need to decide which version you like.

---8<---
From 44300edc0e85d841ea2fd1404758d37a46ab0524 Mon Sep 17 00:00:00 2001
From: Wei Liu 
Date: Mon, 22 Jan 2018 18:30:16 +
Subject: [PATCH] xxx

---
 xen/arch/x86/boot/Makefile |  7 ++-
 xen/arch/x86/boot/head.S   |  6 +++---
 xen/arch/x86/boot/reloc.c  | 14 +-
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
index 1b3f121a2f..e10388282f 100644
--- a/xen/arch/x86/boot/Makefile
+++ b/xen/arch/x86/boot/Makefile
@@ -5,11 +5,8 @@ DEFS_H_DEPS = defs.h $(BASEDIR)/include/xen/stdbool.h
 CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
 
 RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
-$(BASEDIR)/include/xen/multiboot2.h
-
-ifeq ($(CONFIG_PVH_GUEST),y)
-RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
-endif
+$(BASEDIR)/include/xen/multiboot2.h \
+$(BASEDIR)/include/public/arch-x86/hvm/start_info.h
 
 head.o: cmdline.S reloc.S
 
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 9219067231..3cb66fc06b 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -585,13 +585,13 @@ trampoline_setup:
 push%eax/* Magic number. */
 callreloc
 #ifdef CONFIG_PVH_GUEST
-cmp $0,sym_fs(pvh_boot)
+cmp $0, sym_fs(pvh_boot)
 je  1f
-mov %eax,sym_fs(pvh_start_info_pa)
+mov %eax, sym_fs(pvh_start_info_pa)
 jmp 2f
 #endif
 1:
-mov %eax,sym_fs(multiboot_ptr)
+   

Re: [Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Andrew Cooper
On 22/01/18 18:17, Wei Liu wrote:
> On Mon, Jan 22, 2018 at 06:09:14PM +, Andrew Cooper wrote:
>> On 22/01/18 16:13, Wei Liu wrote:
>>> Modify early boot code to relocate pvh info as well, so that we can be
>>> sure __va in __start_xen works.
>>>
>>> Signed-off-by: Wei Liu 
>>> ---
>>> Cc: Jan Beulich 
>>> Cc: Andrew Cooper 
>>> Cc: Roger Pau Monné 
>>> Cc: Doug Goldstein 
>>>
> >>> v4: include autoconf.h directly. The code itself is unchanged.
>>> ---
>>>  xen/arch/x86/boot/Makefile |  4 +++
>>>  xen/arch/x86/boot/defs.h   |  3 +++
>>>  xen/arch/x86/boot/head.S   | 25 ++
>>>  xen/arch/x86/boot/reloc.c  | 64 
>>> +-
>>>  4 files changed, 78 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
>>> index c6246c85d2..1b3f121a2f 100644
>>> --- a/xen/arch/x86/boot/Makefile
>>> +++ b/xen/arch/x86/boot/Makefile
>>> @@ -7,6 +7,10 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
>>>  RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
>>>  $(BASEDIR)/include/xen/multiboot2.h
>> + autoconf.h
>>
>> However, it would be much better to take xen/kconfig.h ...
>>
> This is fine by me.
>
>>>  
>>> +ifeq ($(CONFIG_PVH_GUEST),y)
>>> +RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
>>> +endif
> [...]
>>> diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
>>> index b992678b5e..1fe19294ad 100644
>>> --- a/xen/arch/x86/boot/reloc.c
>>> +++ b/xen/arch/x86/boot/reloc.c
>>> @@ -14,8 +14,8 @@
>>>  
>>>  /*
>>>   * This entry point is entered from xen/arch/x86/boot/head.S with:
>>> - *   - 0x4(%esp) = MULTIBOOT_MAGIC,
>>> - *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS,
>>> + *   - 0x4(%esp) = MAGIC,
>>> + *   - 0x8(%esp) = INFORMATION_ADDRESS,
>>>   *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
>>>   */
>>>  asm (
>>> @@ -29,6 +29,8 @@ asm (
>>>  #include "../../../include/xen/multiboot.h"
>>>  #include "../../../include/xen/multiboot2.h"
>>>  
>>> +#include "../../../include/generated/autoconf.h"
>>> +
>>>  #define get_mb2_data(tag, type, member)   (((multiboot2_tag_##type##_t 
>>> *)(tag))->member)
>>>  #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, 
>>> member))
>>>  
>>> @@ -71,6 +73,41 @@ static u32 copy_string(u32 src)
>>>  return copy_mem(src, p - src + 1);
>>>  }
>>>  
>>> +#ifdef CONFIG_PVH_GUEST
>> ... drop this ifdef and ...
>>
> So you want reloc.o to contain pvh_info_reloc unconditionally?
>
> Fundamentally I don't think I care enough about all the bikeshedding so
> if Jan and you agree on this I will just make the change.

It won't.  The function will be dropped due to DCE, but we'll spot build
breakages far more easily.  (The important bit is that the function call
is guarded by the IS_ENABLED())

~Andrew


[Xen-devel] [PATCH 2/2] xen/arm: GICv3: Only initialize ITS when the distributor supports LPIs.

2018-01-22 Thread Julien Grall
There are firmware tables out there describing an ITS even though the
distributor does not support LPIs. This will result in a data abort when
trying to initialize the ITS.

While this could be considered a bug in the Device-Tree, the same
configuration boots on Linux. So gate the ITS initialization on the
distributor supporting LPIs.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/gic-v3.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 9f9cf59f82..730450e34b 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1637,6 +1637,11 @@ static unsigned long 
gicv3_get_hwdom_extra_madt_size(const struct domain *d)
 }
 #endif
 
+static bool gic_dist_supports_lpis(void)
+{
+return (readl_relaxed(GICD + GICD_TYPER) & GICD_TYPE_LPIS);
+}
+
 /* Set up the GIC */
 static int __init gicv3_init(void)
 {
@@ -1699,9 +1704,12 @@ static int __init gicv3_init(void)
 
 gicv3_dist_init();
 
-res = gicv3_its_init();
-if ( res )
-panic("GICv3: ITS: initialization failed: %d\n", res);
+if ( gic_dist_supports_lpis() )
+{
+res = gicv3_its_init();
+if ( res )
+panic("GICv3: ITS: initialization failed: %d\n", res);
+}
 
 res = gicv3_cpu_init();
 if ( res )
-- 
2.11.0



[Xen-devel] [PATCH 0/2] xen/arm: GICv3: Only initialize ITS when LPIs are available

2018-01-22 Thread Julien Grall
Hi all,

This small patch series fixes an issue I discovered when using the
Foundation model and the DT provided by Linux upstream.

Indeed, the Device-Tree exposes an ITS but LPIs are not available,
resulting in an early crash in Xen.

Whilst this looks like a DT issue, Linux is able to cope with it. So I
think Xen should cope with such a DT as well.

Cheers,

Julien Grall (2):
  xen/arm: GICv3: Parse ITS information from the firmware tables later
on
  xen/arm: GICv3: Only initialize ITS when the distributor supports
LPIs.

 xen/arch/arm/gic-v3-its.c| 47 +---
 xen/arch/arm/gic-v3.c| 19 +---
 xen/include/asm-arm/gic_v3_its.h | 12 --
 3 files changed, 41 insertions(+), 37 deletions(-)

-- 
2.11.0



Re: [Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Andrew Cooper
On 22/01/18 16:13, Wei Liu wrote:
> Modify early boot code to relocate pvh info as well, so that we can be
> sure __va in __start_xen works.
>
> Signed-off-by: Wei Liu 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Roger Pau Monné 
> Cc: Doug Goldstein 
>
> v4: include autoconf.h directly. The code itself is unchanged.
> ---
>  xen/arch/x86/boot/Makefile |  4 +++
>  xen/arch/x86/boot/defs.h   |  3 +++
>  xen/arch/x86/boot/head.S   | 25 ++
>  xen/arch/x86/boot/reloc.c  | 64 
> +-
>  4 files changed, 78 insertions(+), 18 deletions(-)
>
> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> index c6246c85d2..1b3f121a2f 100644
> --- a/xen/arch/x86/boot/Makefile
> +++ b/xen/arch/x86/boot/Makefile
> @@ -7,6 +7,10 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
>  RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
>$(BASEDIR)/include/xen/multiboot2.h

+ autoconf.h

However, it would be much better to take xen/kconfig.h ...

>  
> +ifeq ($(CONFIG_PVH_GUEST),y)
> +RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
> +endif

and this unconditionally, and ...

> +
>  head.o: cmdline.S reloc.S
>  
>  cmdline.S: cmdline.c $(CMDLINE_DEPS)
> diff --git a/xen/arch/x86/boot/defs.h b/xen/arch/x86/boot/defs.h
> index 6abdc15446..05921a64a3 100644
> --- a/xen/arch/x86/boot/defs.h
> +++ b/xen/arch/x86/boot/defs.h
> @@ -51,6 +51,9 @@ typedef unsigned short u16;
>  typedef unsigned int u32;
>  typedef unsigned long long u64;
>  typedef unsigned int size_t;
> +typedef u8 uint8_t;
> +typedef u32 uint32_t;
> +typedef u64 uint64_t;
>  
>  #define U16_MAX  ((u16)(~0U))
>  #define UINT_MAX (~0U)
> diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> index 0f652cea11..aa2e2a93c8 100644
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -414,6 +414,7 @@ __pvh_start:
>  
>  /* Set trampoline_phys to use mfn 1 to avoid having a mapping at VA 
> 0 */
>  movw$0x1000, sym_esi(trampoline_phys)
> +movl(%ebx), %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax */
>  jmp trampoline_setup
>  
>  #endif /* CONFIG_PVH_GUEST */
> @@ -578,18 +579,20 @@ trampoline_setup:
>  /* Get bottom-most low-memory stack address. */
>  add $TRAMPOLINE_SPACE,%ecx
>  
> -#ifdef CONFIG_PVH_GUEST
> -cmpb$0, sym_fs(pvh_boot)
> -jne 1f
> -#endif
> -
> -/* Save the Multiboot info struct (after relocation) for later use. 
> */
> +/* Save Multiboot / PVH info struct (after relocation) for later 
> use. */
>  push%ecx/* Bottom-most low-memory stack address. 
> */
> -push%ebx/* Multiboot information address. */
> -push%eax/* Multiboot magic. */
> +push%ebx/* Multiboot / PVH information address. 
> */
> +push%eax/* Magic number. */
>  callreloc
> -mov %eax,sym_fs(multiboot_ptr)
> +#ifdef CONFIG_PVH_GUEST
> +cmp $0,sym_fs(pvh_boot)
> +je  1f
> +mov %eax,sym_fs(pvh_start_info_pa)
> +jmp 2f
> +#endif
>  1:
> +mov %eax,sym_fs(multiboot_ptr)
> +2:

For new code, please have spaces after commas for readability.

>  
>  /*
>   * Now trampoline_phys points to the following structure (lowest 
> address
> @@ -598,12 +601,12 @@ trampoline_setup:
>   * ++
>   * | TRAMPOLINE_STACK_SPACE |
>   * ++
> - * |mbi data|
> + * | Data (MBI / PVH)   |
>   * +- - - - - - - - - - - - +
>   * |TRAMPOLINE_SPACE|
>   * ++
>   *
> - * mbi data grows downwards from the highest address of 
> TRAMPOLINE_SPACE
> + * Data grows downwards from the highest address of TRAMPOLINE_SPACE
>   * region to the end of the trampoline. The rest of TRAMPOLINE_SPACE 
> is
>   * reserved for trampoline code and data.
>   */
> diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
> index b992678b5e..1fe19294ad 100644
> --- a/xen/arch/x86/boot/reloc.c
> +++ b/xen/arch/x86/boot/reloc.c
> @@ -14,8 +14,8 @@
>  
>  /*
>   * This entry point is entered from xen/arch/x86/boot/head.S with:
> - *   - 0x4(%esp) = MULTIBOOT_MAGIC,
> - *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS,
> + *   - 0x4(%esp) = MAGIC,
> + *   - 0x8(%esp) = INFORMATION_ADDRESS,
>   *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
>   */
>  asm (
> @@ -29,6 +29,8 @@ asm (
>  #include "../../../include/xen/multiboot.h"
>  #include "../../../include/xen/multiboot2.h"
>  
> +#include "../../../include/generated/autoconf.h"
> +
>  #define 

Re: [Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 06:09:14PM +, Andrew Cooper wrote:
> On 22/01/18 16:13, Wei Liu wrote:
> > Modify early boot code to relocate pvh info as well, so that we can be
> > sure __va in __start_xen works.
> >
> > Signed-off-by: Wei Liu 
> > ---
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Roger Pau Monné 
> > Cc: Doug Goldstein 
> >
> > v4: include autoconf.h directly. The code itself is unchanged.
> > ---
> >  xen/arch/x86/boot/Makefile |  4 +++
> >  xen/arch/x86/boot/defs.h   |  3 +++
> >  xen/arch/x86/boot/head.S   | 25 ++
> >  xen/arch/x86/boot/reloc.c  | 64 
> > +-
> >  4 files changed, 78 insertions(+), 18 deletions(-)
> >
> > diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> > index c6246c85d2..1b3f121a2f 100644
> > --- a/xen/arch/x86/boot/Makefile
> > +++ b/xen/arch/x86/boot/Makefile
> > @@ -7,6 +7,10 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
> >  RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
> >  $(BASEDIR)/include/xen/multiboot2.h
> 
> + autoconf.h
> 
> However, it would be much better to take xen/kconfig.h ...
> 

This is fine by me.

> >  
> > +ifeq ($(CONFIG_PVH_GUEST),y)
> > +RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
> > +endif
> 
[...]
> > diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
> > index b992678b5e..1fe19294ad 100644
> > --- a/xen/arch/x86/boot/reloc.c
> > +++ b/xen/arch/x86/boot/reloc.c
> > @@ -14,8 +14,8 @@
> >  
> >  /*
> >   * This entry point is entered from xen/arch/x86/boot/head.S with:
> > - *   - 0x4(%esp) = MULTIBOOT_MAGIC,
> > - *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS,
> > + *   - 0x4(%esp) = MAGIC,
> > + *   - 0x8(%esp) = INFORMATION_ADDRESS,
> >   *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
> >   */
> >  asm (
> > @@ -29,6 +29,8 @@ asm (
> >  #include "../../../include/xen/multiboot.h"
> >  #include "../../../include/xen/multiboot2.h"
> >  
> > +#include "../../../include/generated/autoconf.h"
> > +
> >  #define get_mb2_data(tag, type, member)   (((multiboot2_tag_##type##_t 
> > *)(tag))->member)
> >  #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, 
> > member))
> >  
> > @@ -71,6 +73,41 @@ static u32 copy_string(u32 src)
> >  return copy_mem(src, p - src + 1);
> >  }
> >  
> > +#ifdef CONFIG_PVH_GUEST
> 
> ... drop this ifdef and ...
> 

So you want reloc.o to contain pvh_info_reloc unconditionally?

Fundamentally I don't think I care enough about all the bikeshedding so
if Jan and you agree on this I will just make the change.

Wei.


[Xen-devel] [xen-unstable-smoke test] 118271: tolerable all pass - PUSHED

2018-01-22 Thread osstest service owner
flight 118271 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118271/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  4dcfd7d1436c77ee92081a36cf63f569dc4ef725
baseline version:
 xen  3fa1b35d785eb80103d185a59d50f238515d2427

Last test of basis   118235  2018-01-19 19:02:00 Z2 days
Testing same since   118271  2018-01-22 16:14:38 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Ian Jackson 
  Jan Beulich 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3fa1b35d78..4dcfd7d143  4dcfd7d1436c77ee92081a36cf63f569dc4ef725 -> smoke


Re: [Xen-devel] I only see one CPU core on Xen when booted via grub

2018-01-22 Thread Roger Pau Monné
On Mon, Jan 22, 2018 at 06:28:17PM +0100, msd+xen-de...@msd.im wrote:
> Hi,
> 
> I only see 1 CPU core on Xen 4.9 when booted via grub instead of 8.
> 
> It may be related to:
> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=820807
> - https://xenproject.atlassian.net/browse/XEN-42
> 
> It is the first server on which I have this problem.
> 
> I can confirm that :
> - if I boot Debian through grub, I see the 8 cores
> - if I boot Xen through grub, I only see _one_ core
> - if I boot Xen directly through EFI (using `efibootmgr`), I see the 8 cores
> 
> 1. Do you know what happens?
> 2. Do you need some logs?

You should provide the output of `xl dmesg` from Xen, and ideally boot
a hypervisor that's been built with debug support. dmesg from Linux
might also be interesting (both when booted as Dom0 and on bare metal).

Thanks, Roger.


Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts

2018-01-22 Thread Matt Wilson
On Fri, Jan 19, 2018 at 03:43:26PM +, George Dunlap wrote:
[...] 

> But there will surely be more attacks like this (in fact, there may
> already be some in the works[2]).

[...]
 
>  -George
> 
> [1] https://lwn.net/SubscriberLink/744287/02dd9bc503409ca3/
> [2] skyfallattack.com

In case anyone missed it, [2] is an under-informed hoax.

--msw


[Xen-devel] I only see one CPU core on Xen when booted via grub

2018-01-22 Thread msd+xen-de...@msd.im

Hi,

I only see 1 CPU core on Xen 4.9 when booted via grub instead of 8.

It may be related to:
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=820807
- https://xenproject.atlassian.net/browse/XEN-42

It is the first server on which I have this problem.

I can confirm that :
- if I boot Debian through grub, I see the 8 cores
- if I boot Xen through grub, I only see _one_ core
- if I boot Xen directly through EFI (using `efibootmgr`), I see the 8 cores

1. Do you know what happens?
2. Do you need some logs?

Regards,


Guillaume


# cat /proc/cpuinfo
model name  : Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz
microcode   : 0x5e
flags   : fpu de tsc msr pae mce cx8 apic sep mca cmov pat 
clflush acpi mmx fxsr sse sse2 ht syscall nx lm constant_tsc 
arch_perfmon rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor 
est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand 
hypervisor lahf_lm abm 3dnowprefetch epb fsgsbase bmi1 hle avx2 bmi2 
erms rtm rdseed adx clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat 
pln pts hwp hwp_notify hwp_act_window hwp_epp


# xl info
release: 4.9.0-5-amd64
version: #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
xen_version: 4.8.3-pre

# cat /etc/debian_version
9.3


Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts

2018-01-22 Thread George Dunlap
On 01/22/2018 05:04 PM, Jan Beulich wrote:
 On 22.01.18 at 16:15,  wrote:
>> On 01/22/2018 01:30 PM, Jan Beulich wrote:
>> On 22.01.18 at 13:33,  wrote:
 What I'm proposing is something like this:

 * We have a "global" region of Xen memory that is mapped by all
 processors.  This will contain everything we consider not sensitive;
 including Xen text segments, and most domain and vcpu data.  But it will
 *not* map all of host memory, nor have access to sensitive data, such as
 vcpu register state.

 * We have per-cpu "local" regions.  In this region we will map,
 on-demand, guest memory which is needed to perform current operations.
 (We can consider how strictly we need to unmap memory after using it.)
 We will also map the current vcpu's registers.

 * On entry to a 64-bit PV guest, we don't change the mapping at all.

 Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
 can only access its own RAM and registers.  There's no extra overhead to
 context switching into or out of the hypervisor.
>>>
>>> And we would open back up the SP3 variant of guest user mode
>>> attacking its own kernel by going through the Xen mappings. I
>>> can't exclude that variants of SP1 (less likely SP2) allowing indirect
>>> guest-user -> guest-kernel attacks could be found.
>>
>> How?  Xen doesn't have the guest kernel memory mapped when it's not
>> using it.
> 
> Oh, so you mean to do away with the direct map altogether?

Yes. :-)  The direct map is *the* core reason why the SP*
vulnerabilities are so dangerous.  If the *only* thing we did was get
rid of the direct map, without doing *anything* else, we would almost
entirely mitigate the effect of all of the attacks.

 -George


Re: [Xen-devel] [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen

2018-01-22 Thread Andrew Cooper
On 22/01/18 16:49, Jan Beulich wrote:
 On 22.01.18 at 16:51,  wrote:
>> On 19/01/18 15:02, Jan Beulich wrote:
>> On 19.01.18 at 15:24,  wrote:
 On 19/01/18 12:47, Jan Beulich wrote:
 On 18.01.18 at 16:46,  wrote:
>> + * %rsp is preserved by using an extra GPR because a) we've got plenty 
>> spare,
>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) 
>> can be
>> + * optimised with mov-elimination in modern cores.
>> + */
>> +mov $16, %ecx   /* 16 iterations, two calls per loop */
>> +mov %rsp, %rax  /* Store the current %rsp */
>> +
>> +.L\@_fill_rsb_loop:
>> +
>> +.rept 2 /* Unrolled twice. */
>> +call 2f /* Create an RSB entry. */
>> +1:  pause
>> +jmp 1b  /* Capture rogue speculation. */
>> +2:
> I won't further insist on changing away from numeric labels here, but
> I'd still like to point out an example of a high-risk use of such labels 
> in
> mainline code: There's a "jz 1b" soon after
> exception_with_ints_disabled, leading across _two_ other labels and
> quite a few insns and macro invocations. May I at the very least
> suggest that you don't use 1 and 2 here?
 I spent ages trying to get .L labels working here, but they don't
 function inside a rept, as you end up with duplicate local symbols.

 Even using irp to inject a unique number into the loop doesn't appear to
 work, because the \ escape gets interpreted as a token separator. 
 AFAICT, \@ is special by virtue of the fact that it doesn't count as a
 token separator.

 If you've got a better suggestion then I'm all ears.

 Alternatively, I could manually unroll the loop, or pick some arbitrary
 other numbers to use.
>>> Since the unroll number is just 2, this is what I would have
>>> suggested primarily. .rept of course won't work, as it's not a
>>> macro invocation, and hence doesn't increment the internal
>>> counter. With .irp I can get things to work:
>>>
>>> .macro m
>>> .irp n, 1, 2
>>> .Lxyz_\@_\n:mov $\@, %eax
>>> .endr
>>> .endm
>> This appears to only work when \n is at the end of the label.  None of:
>>
>> .Lxyz_\@_\n_:mov$\@, %eax
>> .Lxyz_\@_\n\()_:mov$\@, %eax
>> .Lxyz_\n\@:mov$\@, %eax
>>
>> work.
> .Lxyz_\n\(\)()_\(\)@: mov $\@, %ecx
>
> (There are two rounds of expansion, so \-s you want expanded
> in the second round need escaping for the first one.)

Eww.  That's too much like Perl IMO.

I'll see if I can arrange things to work neatly with \n at the end of
the label.  If not, I'll manually unroll the loop; it is only twice
after all.

~Andrew


Re: [Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 17:13,  wrote:
> Modify early boot code to relocate pvh info as well, so that we can be
> sure __va in __start_xen works.
> 
> Signed-off-by: Wei Liu 

As before
Reviewed-by: Jan Beulich 
with the caveat that this shouldn't go in without Andrew
withdrawing his general objection.

Jan



Re: [Xen-devel] [PATCH v3] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 17:33,  wrote:
> On Mon, Jan 22, 2018 at 04:28:30PM +, Wei Liu wrote:
>> It used to be the case that we placed the RSDP under 1MB and let Xen
>> search for it. We moved the placement to under 4GB in 4a5733771, so the
>> search no longer works.
>> 
>> Introduce rsdp_hint to ACPI code and set that variable in
>> convert_pvh_info.
>> 
>> Signed-off-by: Wei Liu 
> 
> LGTM:
> 
> Reviewed-by: Roger Pau Monné 

Acked-by: Jan Beulich 



Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 16:00,  wrote:
> On 22/01/18 15:48, Jan Beulich wrote:
> On 22.01.18 at 15:38,  wrote:
>>> On 22/01/18 15:22, Jan Beulich wrote:
>>> On 22.01.18 at 15:18,  wrote:
> On 22/01/18 13:50, Jan Beulich wrote:
> On 22.01.18 at 13:32,  wrote:
>>> As a preparation for doing page table isolation in the Xen hypervisor
>>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>>> 64 bit PV domains mapped to the per-domain virtual area.
>>>
>>> The per-vcpu stacks are used for early interrupt handling only. After
>>> saving the domain's registers stacks are switched back to the normal
>>> per physical cpu ones in order to be able to address on-stack data
>>> from other cpus e.g. while handling IPIs.
>>>
>>> Adding %cr3 switching between saving of the registers and switching
>>> the stacks will enable the possibility to run guest code without any
>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>> able to access other domains data.
>>>
>>> Without any further measures it will still be possible for e.g. a
>>> guest's user program to read stack data of another vcpu of the same
>>> domain, but this can be easily avoided by a little PV-ABI modification
>>> introducing per-cpu user address spaces.
>>>
>>> This series is meant as a replacement for Andrew's patch series:
>>> "x86: Prerequisite work for a Xen KAISER solution".
>>
>> Considering in particular the two reverts, what I'm missing here
>> is a clear description of the meaningful additional protection this
>> approach provides over the band-aid. For context see also
>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>>  
>
> My approach supports mapping only the following data while the guest is
> running (apart from the guest's own data, of course):
>
> - the per-vcpu entry stacks of the domain which will contain only the
>   guest's registers saved when an interrupt occurs
> - the per-vcpu GDTs and TSSs of the domain
> - the IDT
> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>
> All other hypervisor data and code can be completely hidden from the
> guests.

 I understand that. What I'm not clear about is: Which parts of
 the additionally hidden data are actually necessary (or at least
 very desirable) to hide?
>>>
>>> Necessary:
>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>>   code emulator buffers
>>> - other guests' register values e.g. in vcpu structure
>> 
>> All of this is already being made invisible by the band-aid (with the
>> exception of leftovers on the hypervisor stacks across context
>> switches, which we've already said could be taken care of by
>> memset()ing that area). I'm asking about the _additional_ benefits
>> of your approach.
> 
> I'm quite sure the performance will be much better as it doesn't require
> per physical cpu L4 page tables, but just a shadow L4 table for each
> guest L4 table, similar to the Linux kernel KPTI approach.

But doesn't that model have the same synchronization issues upon
guest L4 updates that Andrew was fighting with?

Jan



Re: [Xen-devel] [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 16:51,  wrote:
> On 19/01/18 15:02, Jan Beulich wrote:
> On 19.01.18 at 15:24,  wrote:
>>> On 19/01/18 12:47, Jan Beulich wrote:
>>> On 18.01.18 at 16:46,  wrote:
> + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
> + * optimised with mov-elimination in modern cores.
> + */
> +mov $16, %ecx   /* 16 iterations, two calls per loop */
> +mov %rsp, %rax  /* Store the current %rsp */
> +
> +.L\@_fill_rsb_loop:
> +
> +.rept 2 /* Unrolled twice. */
> +call 2f /* Create an RSB entry. */
> +1:  pause
> +jmp 1b  /* Capture rogue speculation. */
> +2:
 I won't further insist on changing away from numeric labels here, but
 I'd still like to point out an example of a high risk use of such labels in
 mainline code: There's a "jz 1b" soon after
 exception_with_ints_disabled, leading across _two_ other labels and
 quite a few insns and macro invocations. May I at the very least
 suggest that you don't use 1 and 2 here?
>>> I spent ages trying to get .L labels working here, but they don't
>>> function inside a rept, as you end up with duplicate local symbols.
>>>
>>> Even using irp to inject a unique number into the loop doesn't appear to
>>> work, because the \ escape gets interpreted as a token separator. 
>>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>>> token separator.
>>>
>>> If you've got a better suggestion then I'm all ears.
>>>
>>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>>> other numbers to use.
>> Since the unroll number is just 2, this is what I would have
>> suggested primarily. .rept of course won't work, as it's not a
>> macro invocation, and hence doesn't increment the internal
>> counter. With .irp I can get things to work:
>>
>>  .macro m
>>  .irp n, 1, 2
>> .Lxyz_\@_\n: mov $\@, %eax
>>  .endr
>>  .endm
> 
> This appears to only work when \n is at the end of the label.  None of:
> 
> .Lxyz_\@_\n_:mov$\@, %eax
> .Lxyz_\@_\n\()_:mov$\@, %eax
> .Lxyz_\n\@:mov$\@, %eax
> 
> work.

.Lxyz_\n\(\)()_\(\)@: mov   $\@, %ecx

(There are two rounds of expansion, so \-s you want expanded
in the second round need escaping for the first one.)

> Given this appears to be a corner case to begin with, how likely do you
> think it is to work with older assemblers?

The really old ones where macro handling was a mess will be a
problem irrespective of the above, I'm afraid. From the point on
where macros were made to work sensibly I think not much has
changed. But yes, there's a risk.

Jan



Re: [Xen-devel] [PATCH v3] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Roger Pau Monné
On Mon, Jan 22, 2018 at 04:28:30PM +, Wei Liu wrote:
> It used to be the case that we placed RSDP under 1MB and let Xen search
> for it. We moved the placement to under 4GB in 4a5733771, so the
> search no longer works.
> 
> Introduce rsdp_hint to ACPI code and set that variable in
> convert_pvh_info.
> 
> Signed-off-by: Wei Liu 

LGTM:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.


Re: [Xen-devel] [xen-unstable test] 118266: regressions - trouble: broken/fail/pass

2018-01-22 Thread Ian Jackson
osstest service owner writes ("[xen-unstable test] 118266: regressions - trouble: broken/fail/pass"):
> flight 118266 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/118266/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl   4 host-install(4)  broken pass in 118261

rimava1 is broken.  I'm investigating.

Ian.


[Xen-devel] [PATCH v3] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Wei Liu
It used to be the case that we placed RSDP under 1MB and let Xen search
for it. We moved the placement to under 4GB in 4a5733771, so the
search no longer works.

Introduce rsdp_hint to ACPI code and set that variable in
convert_pvh_info.

Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 

v3: Add BUG_ON. Use __initdata. Move declaration within CONFIG_ACPI.
---
 xen/arch/x86/guest/pvh-boot.c | 3 +++
 xen/drivers/acpi/osl.c| 5 +
 xen/include/xen/acpi.h| 2 ++
 3 files changed, 10 insertions(+)

diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
index be3122b16c..0e9e5bfdf6 100644
--- a/xen/arch/x86/guest/pvh-boot.c
+++ b/xen/arch/x86/guest/pvh-boot.c
@@ -69,6 +69,9 @@ static void __init convert_pvh_info(void)
 mod[i].mod_end   = entry[i].paddr + entry[i].size;
 mod[i].string= entry[i].cmdline_paddr;
 }
+
+BUG_ON(!pvh_info->rsdp_paddr);
+rsdp_hint = pvh_info->rsdp_paddr;
 }
 
 static void __init get_memory_map(void)
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 52c9b4ba9a..4c8bb7839e 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -62,8 +62,13 @@ void __init acpi_os_vprintf(const char *fmt, va_list args)
printk("%s", buffer);
 }
 
+acpi_physical_address __initdata rsdp_hint;
+
 acpi_physical_address __init acpi_os_get_root_pointer(void)
 {
+   if (rsdp_hint)
+   return rsdp_hint;
+
if (efi_enabled(EFI_BOOT)) {
if (efi.acpi20 != EFI_INVALID_TABLE_ADDR)
return efi.acpi20;
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..fd5b5fb919 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -51,6 +51,8 @@
 
 #ifdef CONFIG_ACPI
 
+extern acpi_physical_address rsdp_hint;
+
 enum acpi_interrupt_id {
ACPI_INTERRUPT_PMI  = 1,
ACPI_INTERRUPT_INIT,
-- 
2.11.0
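As a side note for readers following the thread: the lookup order this patch establishes in acpi_os_get_root_pointer() - boot-time hint first, then the EFI tables, then the legacy fallback - can be sketched as a toy Python model. All names below are illustrative stand-ins for the real Xen globals, not the actual symbols:

```python
# Toy model of the RSDP lookup order after this patch:
#   1. the hint stashed by the PVH boot path (rsdp_hint),
#   2. the EFI ACPI 2.0 table pointer,
#   3. the legacy low-memory scan.
EFI_INVALID_TABLE_ADDR = ~0  # sentinel, as in the EFI code quoted above

def acpi_get_root_pointer(rsdp_hint, efi_boot, efi_acpi20, legacy_scan):
    if rsdp_hint:                       # new: trust the boot-time hint first
        return rsdp_hint
    if efi_boot and efi_acpi20 != EFI_INVALID_TABLE_ADDR:
        return efi_acpi20               # EFI-provided ACPI 2.0 pointer
    return legacy_scan()                # fall back to scanning low memory

# A PVH guest with the hint set never reaches the EFI/legacy paths.
print(hex(acpi_get_root_pointer(0xFEEDF000, False,
                                EFI_INVALID_TABLE_ADDR,
                                lambda: 0)))  # -> 0xfeedf000
```

This also shows why the BUG_ON in convert_pvh_info matters: a zero rsdp_paddr would silently fall through to scans that, per the commit message, no longer find anything.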



[Xen-devel] [PATCH v4] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
Modify early boot code to relocate pvh info as well, so that we can be
sure __va in __start_xen works.

Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 
Cc: Doug Goldstein 

v4: include autoconf.h directly. The code itself is unchanged.
---
 xen/arch/x86/boot/Makefile |  4 +++
 xen/arch/x86/boot/defs.h   |  3 +++
 xen/arch/x86/boot/head.S   | 25 ++
 xen/arch/x86/boot/reloc.c  | 64 +-
 4 files changed, 78 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
index c6246c85d2..1b3f121a2f 100644
--- a/xen/arch/x86/boot/Makefile
+++ b/xen/arch/x86/boot/Makefile
@@ -7,6 +7,10 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
 RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
 $(BASEDIR)/include/xen/multiboot2.h
 
+ifeq ($(CONFIG_PVH_GUEST),y)
+RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
+endif
+
 head.o: cmdline.S reloc.S
 
 cmdline.S: cmdline.c $(CMDLINE_DEPS)
diff --git a/xen/arch/x86/boot/defs.h b/xen/arch/x86/boot/defs.h
index 6abdc15446..05921a64a3 100644
--- a/xen/arch/x86/boot/defs.h
+++ b/xen/arch/x86/boot/defs.h
@@ -51,6 +51,9 @@ typedef unsigned short u16;
 typedef unsigned int u32;
 typedef unsigned long long u64;
 typedef unsigned int size_t;
+typedef u8 uint8_t;
+typedef u32 uint32_t;
+typedef u64 uint64_t;
 
 #define U16_MAX((u16)(~0U))
 #define UINT_MAX   (~0U)
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 0f652cea11..aa2e2a93c8 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -414,6 +414,7 @@ __pvh_start:
 
 /* Set trampoline_phys to use mfn 1 to avoid having a mapping at VA 0 */
 movw$0x1000, sym_esi(trampoline_phys)
+movl(%ebx), %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax */
 jmp trampoline_setup
 
 #endif /* CONFIG_PVH_GUEST */
@@ -578,18 +579,20 @@ trampoline_setup:
 /* Get bottom-most low-memory stack address. */
 add $TRAMPOLINE_SPACE,%ecx
 
-#ifdef CONFIG_PVH_GUEST
-cmpb$0, sym_fs(pvh_boot)
-jne 1f
-#endif
-
-/* Save the Multiboot info struct (after relocation) for later use. */
+/* Save Multiboot / PVH info struct (after relocation) for later use. */
 push%ecx/* Bottom-most low-memory stack address. */
-push%ebx/* Multiboot information address. */
-push%eax/* Multiboot magic. */
+push%ebx/* Multiboot / PVH information address. */
+push%eax/* Magic number. */
 callreloc
-mov %eax,sym_fs(multiboot_ptr)
+#ifdef CONFIG_PVH_GUEST
+cmp $0,sym_fs(pvh_boot)
+je  1f
+mov %eax,sym_fs(pvh_start_info_pa)
+jmp 2f
+#endif
 1:
+mov %eax,sym_fs(multiboot_ptr)
+2:
 
 /*
 * Now trampoline_phys points to the following structure (lowest address
@@ -598,12 +601,12 @@ trampoline_setup:
  * ++
  * | TRAMPOLINE_STACK_SPACE |
  * ++
- * |mbi data|
+ * | Data (MBI / PVH)   |
  * +- - - - - - - - - - - - +
  * |TRAMPOLINE_SPACE|
  * ++
  *
- * mbi data grows downwards from the highest address of TRAMPOLINE_SPACE
+ * Data grows downwards from the highest address of TRAMPOLINE_SPACE
  * region to the end of the trampoline. The rest of TRAMPOLINE_SPACE is
  * reserved for trampoline code and data.
  */
diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
index b992678b5e..1fe19294ad 100644
--- a/xen/arch/x86/boot/reloc.c
+++ b/xen/arch/x86/boot/reloc.c
@@ -14,8 +14,8 @@
 
 /*
  * This entry point is entered from xen/arch/x86/boot/head.S with:
- *   - 0x4(%esp) = MULTIBOOT_MAGIC,
- *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS,
+ *   - 0x4(%esp) = MAGIC,
+ *   - 0x8(%esp) = INFORMATION_ADDRESS,
  *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
  */
 asm (
@@ -29,6 +29,8 @@ asm (
 #include "../../../include/xen/multiboot.h"
 #include "../../../include/xen/multiboot2.h"
 
+#include "../../../include/generated/autoconf.h"
+
 #define get_mb2_data(tag, type, member)   (((multiboot2_tag_##type##_t *)(tag))->member)
 #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, member))
 
@@ -71,6 +73,41 @@ static u32 copy_string(u32 src)
 return copy_mem(src, p - src + 1);
 }
 
+#ifdef CONFIG_PVH_GUEST
+
+#include 
+
+static struct hvm_start_info *pvh_info_reloc(u32 in)
+{
+struct hvm_start_info *out;
+
+out = _p(copy_mem(in, sizeof(*out)));
+
+if ( out->cmdline_paddr )
+  

Re: [Xen-devel] [PATCH v2] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 16:34,  wrote:
> On Mon, Jan 22, 2018 at 03:02:52PM +, Wei Liu wrote:
>> --- a/xen/drivers/acpi/osl.c
>> +++ b/xen/drivers/acpi/osl.c
>> @@ -62,8 +62,13 @@ void __init acpi_os_vprintf(const char *fmt, va_list args)
>>  printk("%s", buffer);
>>  }
>>  
>> +acpi_physical_address rsdp_hint;
> 
> Since this is only used by acpi_os_get_root_pointer it should be
> __initdata. I also prefer to place global variables at the top of the
> file after the includes, but that's just a matter of taste I guess.

I think keeping such limited use variable declarations /
definitions close to their use site is quite okay, if not
preferable.

Jan



Re: [Xen-devel] [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen

2018-01-22 Thread Andrew Cooper
On 19/01/18 15:02, Jan Beulich wrote:
 On 19.01.18 at 15:24,  wrote:
>> On 19/01/18 12:47, Jan Beulich wrote:
>> On 18.01.18 at 16:46,  wrote:
 @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be used to force or
  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
  functionality is still set up so IBRS can be virtualised for guests.
  
 +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
 +RSB gets overwritten.  There are individual controls for an entry from HVM
 +context, and an entry from a native (PV or Xen) context.
>>> Would you mind adding a sentence or two to the description making
>>> clear what use this fine grained control is? I can't really figure why I
>>> might need to be concerned about one of the two cases, but not the
>>> other.
>> I though I'd covered that in the commit message, but I'm not sure this
>> is a suitable place to discuss the details.  PV and HVM guests have
>> different reasoning for why we need to overwrite the RSB.
>>
>> In the past, there used to be a default interaction of rsb_native and
>> SMEP, but that proved to be insufficient and rsb_native is now
>> unconditionally enabled.  In principle however, it should fall within
>> CONFIG_PV.
> Thanks for the explanation, but I'm afraid I'm none the wiser as
> to why the two separate options are needed (or even just wanted).
>
 --- a/xen/include/asm-x86/spec_ctrl_asm.h
 +++ b/xen/include/asm-x86/spec_ctrl_asm.h
 @@ -73,6 +73,40 @@
   *  - SPEC_CTRL_EXIT_TO_GUEST
   */
  
 +.macro DO_OVERWRITE_RSB
 +/*
 + * Requires nothing
 + * Clobbers %rax, %rcx
 + *
 + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
 + * Google's performance numbers, the loop is unrolled to 16 iterations 
 and two
 + * calls per iteration.
 + *
 + * The call filling the RSB needs a nonzero displacement, but we use "1:
 + * pause, jmp 1b" to safely contain any ret-based speculation, even if the
 + * loop is speculatively executed prematurely.
>>> I'm struggling to understand why you use "but" here. Maybe just a
>>> lack of English skills on my part?
>> "displacement.  A nop would do, but" ?
>>
>> It is a justification for why we are putting more than a single byte in
>> the middle.
> Oh, I see, but only with the addition you suggest.
>
 + * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
 + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
 + * optimised with mov-elimination in modern cores.
 + */
 +mov $16, %ecx   /* 16 iterations, two calls per loop */
 +mov %rsp, %rax  /* Store the current %rsp */
 +
 +.L\@_fill_rsb_loop:
 +
 +.rept 2 /* Unrolled twice. */
 +call 2f /* Create an RSB entry. */
 +1:  pause
 +jmp 1b  /* Capture rogue speculation. */
 +2:
>>> I won't further insist on changing away from numeric labels here, but
>>> I'd still like to point out an example of a high risk use of such labels in
>>> mainline code: There's a "jz 1b" soon after
>>> exception_with_ints_disabled, leading across _two_ other labels and
>>> quite a few insns and macro invocations. May I at the very least
>>> suggest that you don't use 1 and 2 here?
>> I spent ages trying to get .L labels working here, but they don't
>> function inside a rept, as you end up with duplicate local symbols.
>>
>> Even using irp to inject a unique number into the loop doesn't appear to
>> work, because the \ escape gets interpreted as a token separator. 
>> AFAICT, \@ is special by virtue of the fact that it doesn't count as a
>> token separator.
>>
>> If you've got a better suggestion then I'm all ears.
>>
>> Alternatively, I could manually unroll the loop, or pick some arbitrary
>> other numbers to use.
> Since the unroll number is just 2, this is what I would have
> suggested primarily. .rept of course won't work, as it's not a
> macro invocation, and hence doesn't increment the internal
> counter. With .irp I can get things to work:
>
>   .macro m
>   .irp n, 1, 2
> .Lxyz_\@_\n:  mov $\@, %eax
>   .endr
>   .endm

This appears to only work when \n is at the end of the label.  None of:

.Lxyz_\@_\n_:    mov    $\@, %eax
.Lxyz_\@_\n\()_:    mov    $\@, %eax
.Lxyz_\n\@:    mov    $\@, %eax

work.

Given this appears to be a corner case to begin with, how likely do you
think it is to work with older assemblers?

~Andrew


Re: [Xen-devel] [PATCH v2] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Roger Pau Monné
On Mon, Jan 22, 2018 at 03:02:52PM +, Wei Liu wrote:
> diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
> index be3122b16c..2903b392bc 100644
> --- a/xen/arch/x86/guest/pvh-boot.c
> +++ b/xen/arch/x86/guest/pvh-boot.c
> @@ -69,6 +69,8 @@ static void __init convert_pvh_info(void)
>  mod[i].mod_end   = entry[i].paddr + entry[i].size;
>  mod[i].string= entry[i].cmdline_paddr;
>  }
> +
> +rsdp_hint = pvh_info->rsdp_paddr;

BUG_ON(!rsdp_hint);

I know it's not ideal, but given the other BUGs I think this should be
here also.

>  }
>  
>  static void __init get_memory_map(void)
> diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
> index 52c9b4ba9a..5d8af6f290 100644
> --- a/xen/drivers/acpi/osl.c
> +++ b/xen/drivers/acpi/osl.c
> @@ -62,8 +62,13 @@ void __init acpi_os_vprintf(const char *fmt, va_list args)
>   printk("%s", buffer);
>  }
>  
> +acpi_physical_address rsdp_hint;

Since this is only used by acpi_os_get_root_pointer it should be
__initdata. I also prefer to place global variables at the top of the
file after the includes, but that's just a matter of taste I guess.

>  acpi_physical_address __init acpi_os_get_root_pointer(void)
>  {
> + if (rsdp_hint)
> + return rsdp_hint;
> +
>   if (efi_enabled(EFI_BOOT)) {
>   if (efi.acpi20 != EFI_INVALID_TABLE_ADDR)
>   return efi.acpi20;
> diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
> index 9409350f05..e83182cb14 100644
> --- a/xen/include/xen/acpi.h
> +++ b/xen/include/xen/acpi.h
> @@ -33,6 +33,8 @@
>  #include 
>  #include 
>  
> +extern acpi_physical_address rsdp_hint;
> +
>  #define ACPI_MADT_GET_(fld, x) (((x) & ACPI_MADT_##fld##_MASK) / \
>   (ACPI_MADT_##fld##_MASK & -ACPI_MADT_##fld##_MASK))

I think this needs to go after the #ifdef CONFIG_ACPI guard?

Thanks, Roger.


Re: [Xen-devel] [PATCH v2 4/7] x86/shim: use credit scheduler

2018-01-22 Thread Ian Jackson
Roger Pau Monné writes ("Re: [Xen-devel] [PATCH v2 4/7] x86/shim: use credit scheduler"):
> On Fri, Jan 19, 2018 at 03:34:55PM +, Wei Liu wrote:
> > Remove sched=null from shim cmdline and doc
> > 
> > We use the default scheduler (credit1 as of writing). The NULL
> > scheduler still has bugs to fix.
> > 
> > Update shim.config.
> > 
> > Signed-off-by: Wei Liu 
> 
> Reviewed-by: Roger Pau Monné 

Acked-by: Ian Jackson 


Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts

2018-01-22 Thread George Dunlap
On 01/22/2018 01:30 PM, Jan Beulich wrote:
 On 22.01.18 at 13:33,  wrote:
>> On 01/22/2018 09:25 AM, Jan Beulich wrote:
>> On 19.01.18 at 18:00,  wrote:
 On 01/19/2018 04:36 PM, Jan Beulich wrote:
 On 19.01.18 at 16:43,  wrote:
>> So what if instead of trying to close the "windows", we made it so that
>> there was nothing through the windows to see?  If no matter what the
>> hypervisor speculatively executed, nothing sensitive was visibile except
>> what a vcpu was already allowed to see,
>
> I think you didn't finish your sentence here, but I also think I
> can guess the missing part. There's a price to pay for such an
> approach though - iterating over domains, or vCPU-s of a
> domain (just as an example) wouldn't be simple list walks
> anymore. There are certainly other things. IOW - yes, and
> approach like this seems possible, but with all the lost
> performance I think we shouldn't go overboard with further
> hiding.

 Right, so the next question: what information *from other guests* are
 sensitive?

 Obviously the guest registers are sensitive.  But how much of the
 information in vcpu struct that we actually need to have "to hand" is
 actually sensitive information that we need to hide from other VMs?
>>>
>>> None, I think. But that's not the main aspect here. struct vcpu
>>> instances come and go, which would mean we'd have to
>>> permanently update what is or is not being exposed in the page
>>> tables used. This, while solvable, is going to be a significant
>>> burden in terms of synchronizing page tables (if we continue to
>>> use per-CPU ones) and/or TLB shootdown. Whereas if only the
>>> running vCPU's structure (and it's struct domain) are exposed,
>>> no such synchronization is needed (things would simply be
>>> updated during context switch).
>>
>> I'm not sure we're actually communicating.
>>
>> Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
>> under Xen still have access to all of host memory.  To protect against
>> SP3, we remove almost all Xen memory from the address space before
>> switching to the guest.
>>
>> What I'm proposing is something like this:
>>
>> * We have a "global" region of Xen memory that is mapped by all
>> processors.  This will contain everything we consider not sensitive;
>> including Xen text segments, and most domain and vcpu data.  But it will
>> *not* map all of host memory, nor have access to sensitive data, such as
>> vcpu register state.
>>
>> * We have per-cpu "local" regions.  In this region we will map,
>> on-demand, guest memory which is needed to perform current operations.
>> (We can consider how strictly we need to unmap memory after using it.)
>> We will also map the current vcpu's registers.
>>
>> * On entry to a 64-bit PV guest, we don't change the mapping at all.
>>
>> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
>> can only access its own RAM and registers.  There's no extra overhead to
>> context switching into or out of the hypervisor.
> 
> And we would open back up the SP3 variant of guest user mode
> attacking its own kernel by going through the Xen mappings. I
> can't exclude that variants of SP1 (less likely SP2) allowing indirect
> guest-user -> guest-kernel attacks could be found.

How?  Xen doesn't have the guest kernel memory mapped when it's not
using it.

>> Given that, I don't understand what the following comments mean:
>>
>> "There's a price to pay for such an approach though - iterating over
>> domains, or vCPU-s of a domain (just as an example) wouldn't be simple
>> list walks anymore."
>>
>> If we remove sensitive information from the domain and vcpu structs,
>> then any bit of hypervisor code can iterate over domain and vcpu structs
>> at will; only if they actually need to read or write sensitive data will
>> they have to perform an expensive map/unmap operation.  But in general,
>> to read another vcpu's registers you already need to do a vcpu_pause() /
>> vcpu_unpause(), which involves at least two IPIs (with one
>> spin-and-wait), so it doesn't seem like that should add a lot of extra
>> overhead.
> 
> Reading another vCPU-s register can't be compared with e.g.
> wanting to deliver an interrupt to other than the currently running
> vCPU.

I'm not sure what this has to do with what I said.  Your original claim
was that "iterating over domains wouldn't be simple list walks anymore",
and I said it would be.

If you want to make some other claim about the cost of delivering an
interrupt to another vcpu then please actually make a claim and justify it.

>> "struct vcpu instances come and go, which would mean we'd have to
>> permanently update what is or is not being exposed in the page tables
>> used. This, while solvable, is going to be a significant burden in terms
>> of synchronizing page tables 

[Xen-devel] [PATCH v2] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Wei Liu
It used to be the case that we placed RSDP under 1MB and let Xen search
for it. We moved the placement to under 4GB in 4a5733771, so the
search no longer works.

Introduce rsdp_hint to ACPI code and set that variable in
convert_pvh_info.

Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 
---
 xen/arch/x86/guest/pvh-boot.c | 2 ++
 xen/drivers/acpi/osl.c| 5 +
 xen/include/xen/acpi.h| 2 ++
 3 files changed, 9 insertions(+)

diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
index be3122b16c..2903b392bc 100644
--- a/xen/arch/x86/guest/pvh-boot.c
+++ b/xen/arch/x86/guest/pvh-boot.c
@@ -69,6 +69,8 @@ static void __init convert_pvh_info(void)
 mod[i].mod_end   = entry[i].paddr + entry[i].size;
 mod[i].string= entry[i].cmdline_paddr;
 }
+
+rsdp_hint = pvh_info->rsdp_paddr;
 }
 
 static void __init get_memory_map(void)
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 52c9b4ba9a..5d8af6f290 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -62,8 +62,13 @@ void __init acpi_os_vprintf(const char *fmt, va_list args)
printk("%s", buffer);
 }
 
+acpi_physical_address rsdp_hint;
+
 acpi_physical_address __init acpi_os_get_root_pointer(void)
 {
+   if (rsdp_hint)
+   return rsdp_hint;
+
if (efi_enabled(EFI_BOOT)) {
if (efi.acpi20 != EFI_INVALID_TABLE_ADDR)
return efi.acpi20;
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 9409350f05..e83182cb14 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -33,6 +33,8 @@
 #include 
 #include 
 
+extern acpi_physical_address rsdp_hint;
+
 #define ACPI_MADT_GET_(fld, x) (((x) & ACPI_MADT_##fld##_MASK) / \
(ACPI_MADT_##fld##_MASK & -ACPI_MADT_##fld##_MASK))
 
-- 
2.11.0



Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 22/01/18 15:48, Jan Beulich wrote:
 On 22.01.18 at 15:38,  wrote:
>> On 22/01/18 15:22, Jan Beulich wrote:
>> On 22.01.18 at 15:18,  wrote:
 On 22/01/18 13:50, Jan Beulich wrote:
 On 22.01.18 at 13:32,  wrote:
>> As a preparation for doing page table isolation in the Xen hypervisor
>> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
>> 64 bit PV domains mapped to the per-domain virtual area.
>>
>> The per-vcpu stacks are used for early interrupt handling only. After
>> saving the domain's registers stacks are switched back to the normal
>> per physical cpu ones in order to be able to address on-stack data
>> from other cpus e.g. while handling IPIs.
>>
>> Adding %cr3 switching between saving of the registers and switching
>> the stacks will enable the possibility to run guest code without any
>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>> able to access other domains data.
>>
>> Without any further measures it will still be possible for e.g. a
>> guest's user program to read stack data of another vcpu of the same
>> domain, but this can be easily avoided by a little PV-ABI modification
>> introducing per-cpu user address spaces.
>>
>> This series is meant as a replacement for Andrew's patch series:
>> "x86: Prerequisite work for a Xen KAISER solution".
>
> Considering in particular the two reverts, what I'm missing here
> is a clear description of the meaningful additional protection this
> approach provides over the band-aid. For context see also
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html
>  

 My approach supports mapping only the following data while the guest is
 running (apart from the guest's own data, of course):

 - the per-vcpu entry stacks of the domain which will contain only the
   guest's registers saved when an interrupt occurs
 - the per-vcpu GDTs and TSSs of the domain
 - the IDT
 - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)

 All other hypervisor data and code can be completely hidden from the
 guests.
>>>
>>> I understand that. What I'm not clear about is: Which parts of
>>> the additionally hidden data are actually necessary (or at least
>>> very desirable) to hide?
>>
>> Necessary:
>> - other guests' memory (e.g. physical memory 1:1 mapping)
>> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>>   code emulator buffers
>> - other guests' register values e.g. in vcpu structure
> 
> All of this is already being made invisible by the band-aid (with the
> exception of leftovers on the hypervisor stacks across context
> switches, which we've already said could be taken care of by
> memset()ing that area). I'm asking about the _additional_ benefits
> of your approach.

I'm quite sure the performance will be much better as it doesn't require
per physical cpu L4 page tables, but just a shadow L4 table for each
guest L4 table, similar to the Linux kernel KPTI approach.

> 
>> Desirable: as much as possible. For instance I don't buy your reasoning
>> regarding the Xen binary: how would you do this e.g. in a public cloud?
>> How do you know which Xen binary (possibly with livepatches) is being
>> used there? And today we don't have something like KASLR in Xen, but
>> not hiding the text and RO data will make the introduction of that quite
>> useless.
> 
> I'm aware that there are people thinking that .text and .rodata
> should be hidden; what I'm not really aware of is the reasoning
> behind that.

In case an attacker knows of some vulnerability it is just harder to use
that knowledge without knowing where specific data structures or code
live. It's like switching the lights off when you know somebody is
aiming a gun at you. The odds are much better if the killer can't
see you.


Juergen
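[Editorial aside: the shadow-L4 scheme Juergen refers to - and the synchronization question Jan raises about it - can be illustrated with a toy Python model. Real page tables are arrays of hardware PTEs, not dicts, and the slot numbers below are purely illustrative: each guest L4 gets a shadow copy whose guest-controlled slots mirror the guest's table while the Xen-reserved slots carry restricted mappings, so every guest L4 write must be propagated to the shadow.]

```python
XEN_SLOTS = range(256, 272)   # illustrative: L4 slots reserved for Xen

def make_shadow(guest_l4, restricted_xen_entries):
    """Build a shadow L4: guest slots copied, Xen slots restricted."""
    shadow = dict(guest_l4)
    for slot in XEN_SLOTS:
        shadow[slot] = restricted_xen_entries.get(slot, 0)
    return shadow

def guest_update_l4(guest_l4, shadow, slot, value):
    """Every guest write must also hit the shadow - the sync cost Jan notes."""
    guest_l4[slot] = value
    if slot not in XEN_SLOTS:      # the guest may not redefine Xen's slots
        shadow[slot] = value

guest = {0: 0xAAA, 1: 0xBBB}
shadow = make_shadow(guest, {256: 0x111})
guest_update_l4(guest, shadow, 2, 0xCCC)
print(hex(shadow[2]))   # -> 0xccc, mirrored from the guest table
```

The per-physical-CPU L4 alternative avoids this mirroring but pays on every context switch instead, which is the performance trade-off being debated.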


Re: [Xen-devel] [PATCH v9 03/11] x86/msr: Emulation of MSR_{SPEC_CTRL, PRED_CMD} for guests

2018-01-22 Thread Andrew Cooper
On 19/01/18 10:45, Jan Beulich wrote:
 On 18.01.18 at 16:46,  wrote:
>> @@ -153,14 +168,44 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> val)
>>  {
>>  const struct vcpu *curr = current;
>>  struct domain *d = v->domain;
>> +const struct cpuid_policy *cp = d->arch.cpuid;
>>  struct msr_domain_policy *dp = d->arch.msr;
>>  struct msr_vcpu_policy *vp = v->arch.msr;
>>  
>>  switch ( msr )
>>  {
>>  case MSR_INTEL_PLATFORM_INFO:
>> +case MSR_ARCH_CAPABILITIES:
>> +/* Read-only */
>>  goto gp_fault;
>>  
>> +case MSR_SPEC_CTRL:
>> +if ( !cp->feat.ibrsb )
>> +goto gp_fault; /* MSR available? */
>> +
>> +/*
>> + * Note: SPEC_CTRL_STIBP is specified as safe to use (i.e. ignored)
>> + * when STIBP isn't enumerated in hardware.
>> + */
>> +
>> +if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
>> +goto gp_fault; /* Rsvd bit set? */
>> +
>> +vp->spec_ctrl.raw = val;
>> +break;
> Did you check (or inquire) whether reading back the value on a
> system which ignores the write to 1 actually produces the
> written value? I'd sort of expect zero to come back instead.

Tom Lendacky has confirmed on the LKML that AMD will implement these
bits as "read as written" rather than "read as zero".

https://lkml.org/lkml/2018/1/22/499

Still no comment from Intel.
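The "read as written" semantics being confirmed here can be sketched as a
simplified model (not the actual Xen code; constants follow the documented
bit positions, the function names are made up for illustration):

```c
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)

struct spec_ctrl_state { uint64_t raw; };

/* Simplified WRMSR emulation: reject reserved bits, otherwise store
 * the value exactly as the guest wrote it.  Returns 0 on success,
 * -1 to signal a #GP fault. */
int wrmsr_spec_ctrl(struct spec_ctrl_state *s, uint64_t val)
{
    if ( val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP) )
        return -1;
    s->raw = val;
    return 0;
}

/* "Read as written": even a STIBP bit that the hardware ignores reads
 * back as the guest wrote it, matching AMD's stated behaviour above. */
uint64_t rdmsr_spec_ctrl(const struct spec_ctrl_state *s)
{
    return s->raw;
}
```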

~Andrew


Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 15:38,  wrote:
> On 22/01/18 15:22, Jan Beulich wrote:
> On 22.01.18 at 15:18,  wrote:
>>> On 22/01/18 13:50, Jan Beulich wrote:
>>> On 22.01.18 at 13:32,  wrote:
> As a preparation for doing page table isolation in the Xen hypervisor
> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
> 64 bit PV domains mapped to the per-domain virtual area.
>
> The per-vcpu stacks are used for early interrupt handling only. After
> saving the domain's registers stacks are switched back to the normal
> per physical cpu ones in order to be able to address on-stack data
> from other cpus e.g. while handling IPIs.
>
> Adding %cr3 switching between saving of the registers and switching
> the stacks will enable the possibility to run guest code without any
> per physical cpu mapping, i.e. avoiding the threat of a guest being
> able to access other domains data.
>
> Without any further measures it will still be possible for e.g. a
> guest's user program to read stack data of another vcpu of the same
> domain, but this can be easily avoided by a little PV-ABI modification
> introducing per-cpu user address spaces.
>
> This series is meant as a replacement for Andrew's patch series:
> "x86: Prerequisite work for a Xen KAISER solution".

 Considering in particular the two reverts, what I'm missing here
 is a clear description of the meaningful additional protection this
 approach provides over the band-aid. For context see also
 https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html 
>>>
>>> My approach supports mapping only the following data while the guest is
>>> running (apart from the guest's own data, of course):
>>>
>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>   guest's registers saved when an interrupt occurs
>>> - the per-vcpu GDTs and TSSs of the domain
>>> - the IDT
>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>
>>> All other hypervisor data and code can be completely hidden from the
>>> guests.
>> 
>> I understand that. What I'm not clear about is: Which parts of
>> the additionally hidden data are actually necessary (or at least
>> very desirable) to hide?
> 
> Necessary:
> - other guests' memory (e.g. physical memory 1:1 mapping)
> - data from other guests e.g. in stack pages, debug buffers, I/O buffers,
>   code emulator buffers
> - other guests' register values e.g. in vcpu structure

All of this is already being made invisible by the band-aid (with the
exception of leftovers on the hypervisor stacks across context
switches, which we've already said could be taken care of by
memset()ing that area). I'm asking about the _additional_ benefits
of your approach.
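The memset() option mentioned above amounts to something like the following
sketch (illustrative only; the function and parameter names are hypothetical,
not taken from the actual series):

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch of the memset() idea: at context switch, wipe the
 * region of the per-CPU hypervisor stack that the previous context may
 * have used, so no leftovers remain visible while the next guest runs.
 * Stacks grow downwards, so the used bytes sit at the top of the
 * allocation.
 */
void scrub_stack_leftovers(uint8_t *stack_base, size_t stack_size,
                           size_t bytes_used)
{
    memset(stack_base + stack_size - bytes_used, 0, bytes_used);
}
```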

> Desirable: as much as possible. For instance I don't buy your reasoning
> regarding the Xen binary: how would you do this e.g. in a public cloud?
> How do you know which Xen binary (possibly with livepatches) is being
> used there? And today we don't have something like KASLR in Xen, but
> not hiding the text and RO data will make the introduction of that quite
> useless.

I'm aware that there are people thinking that .text and .rodata
should be hidden; what I'm not really aware of is the reasoning
behind that.

Jan



Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
On 22/01/18 15:22, Jan Beulich wrote:
 On 22.01.18 at 15:18,  wrote:
>> On 22/01/18 13:50, Jan Beulich wrote:
>> On 22.01.18 at 13:32,  wrote:
 As a preparation for doing page table isolation in the Xen hypervisor
 in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
 64 bit PV domains mapped to the per-domain virtual area.

 The per-vcpu stacks are used for early interrupt handling only. After
 saving the domain's registers stacks are switched back to the normal
 per physical cpu ones in order to be able to address on-stack data
 from other cpus e.g. while handling IPIs.

 Adding %cr3 switching between saving of the registers and switching
 the stacks will enable the possibility to run guest code without any
 per physical cpu mapping, i.e. avoiding the threat of a guest being
 able to access other domains data.

 Without any further measures it will still be possible for e.g. a
 guest's user program to read stack data of another vcpu of the same
 domain, but this can be easily avoided by a little PV-ABI modification
 introducing per-cpu user address spaces.

 This series is meant as a replacement for Andrew's patch series:
 "x86: Prerequisite work for a Xen KAISER solution".
>>>
>>> Considering in particular the two reverts, what I'm missing here
>>> is a clear description of the meaningful additional protection this
>>> approach provides over the band-aid. For context see also
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html 
>>
>> My approach supports mapping only the following data while the guest is
>> running (apart from the guest's own data, of course):
>>
>> - the per-vcpu entry stacks of the domain which will contain only the
>>   guest's registers saved when an interrupt occurs
>> - the per-vcpu GDTs and TSSs of the domain
>> - the IDT
>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>
>> All other hypervisor data and code can be completely hidden from the
>> guests.
> 
> I understand that. What I'm not clear about is: Which parts of
> the additionally hidden data are actually necessary (or at least
> very desirable) to hide?

Necessary:
- other guests' memory (e.g. physical memory 1:1 mapping)
- data from other guests e.g. in stack pages, debug buffers, I/O buffers,
  code emulator buffers
- other guests' register values e.g. in vcpu structure

Desirable: as much as possible. For instance I don't buy your reasoning
regarding the Xen binary: how would you do this e.g. in a public cloud?
How do you know which Xen binary (possibly with livepatches) is being
used there? And today we don't have something like KASLR in Xen, but
not hiding the text and RO data will make the introduction of that quite
useless.


Juergen


[Xen-devel] [PATCH] xen/arm: cpuerrata: Remove percpu.h include

2018-01-22 Thread Julien Grall
The include percpu.h was added by mistake in cpuerrata.h (see commit
4c4fddc166 "xen/arm64: Add skeleton to harden the branch aliasing
attacks"). So remove it.

Signed-off-by: Julien Grall 
---
 xen/include/asm-arm/cpuerrata.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/include/asm-arm/cpuerrata.h b/xen/include/asm-arm/cpuerrata.h
index 23ebf367ea..7de68361ff 100644
--- a/xen/include/asm-arm/cpuerrata.h
+++ b/xen/include/asm-arm/cpuerrata.h
@@ -1,7 +1,6 @@
 #ifndef __ARM_CPUERRATA_H__
 #define __ARM_CPUERRATA_H__
 
-#include 
 #include 
 #include 
 
-- 
2.11.0



Re: [Xen-devel] [PATCH RFC v2 01/12] x86: cleanup processor.h

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 15:25,  wrote:
> On 22/01/18 14:10, Juergen Gross wrote:
>> On 22/01/18 13:52, Jan Beulich wrote:
>> On 22.01.18 at 13:32,  wrote:
 Remove NSC/Cyrix CPU macros and current_text_addr() which are used
 nowhere.
>>> I agree doing the former, but I have a vague recollection that we've
>>> left the latter in place despite there not being any callers at present.
>> It isn't as if current_text_addr() would be rocket science. I'm quite
>> sure that, in case it is needed, there will be enough brain power
>> available to build it again from scratch or to find it in git.
>>
>> In case you really like it to stay I won't object, of course.
> 
> FWIW, I've disliked all the recent patches which have tried to use
> current_text_addr(), and I don't see it as a useful debugging utility
> either.
> 
> I would prefer to see it gone than to stay.

Well, okay then. The patch is independent of the other, actual
RFC stuff, so could go in right away.

Jan



Re: [Xen-devel] [PATCH RFC v2 01/12] x86: cleanup processor.h

2018-01-22 Thread Andrew Cooper
On 22/01/18 14:10, Juergen Gross wrote:
> On 22/01/18 13:52, Jan Beulich wrote:
> On 22.01.18 at 13:32,  wrote:
>>> Remove NSC/Cyrix CPU macros and current_text_addr() which are used
>>> nowhere.
>> I agree doing the former, but I have a vague recollection that we've
>> left the latter in place despite there not being any callers at present.
> It isn't as if current_text_addr() would be rocket science. I'm quite
> sure that, in case it is needed, there will be enough brain power
> available to build it again from scratch or to find it in git.
>
> In case you really like it to stay I won't object, of course.

FWIW, I've disliked all the recent patches which have tried to use
current_text_addr(), and I don't see it as a useful debugging utility
either.

I would prefer to see it gone than to stay.

~Andrew


Re: [Xen-devel] [PATCH] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 06:35:14AM -0700, Jan Beulich wrote:
> >>> On 22.01.18 at 14:21,  wrote:
> > On Mon, Jan 22, 2018 at 01:03:14PM +, Roger Pau Monné wrote:
> >> On Mon, Jan 22, 2018 at 12:47:10PM +, Wei Liu wrote:
> >> > --- a/xen/drivers/acpi/osl.c
> >> > +++ b/xen/drivers/acpi/osl.c
> >> > @@ -38,6 +38,10 @@
> >> >  #include 
> >> >  #include 
> >> >  
> >> > +#ifdef CONFIG_PVH_GUEST
> >> > +#include 
> >> > +#endif
> >> > +
> >> >  #define _COMPONENT  ACPI_OS_SERVICES
> >> >  ACPI_MODULE_NAME("osl")
> >> >  
> >> > @@ -74,6 +78,11 @@ acpi_physical_address __init 
> >> > acpi_os_get_root_pointer(void)
> >> > "System description tables not found\n");
> >> >  return 0;
> >> >  }
> >> > +#ifdef CONFIG_PVH_GUEST
> >> > +} else if (pvh_boot) {
> >> > +ASSERT(pvh_rsdp_pa);
> >> > +return pvh_rsdp_pa;
> >> > +#endif
> >> >  } else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) {
> >> >  acpi_physical_address pa = 0;
> >> 
> >> Can this be done in a non-PVH specific way?
> >> 
> >> Can we have a global rsdp_hint variable or similar that would be used
> >> here if set?
> > 
> > Who will be the anticipated user(s) other than PVH?
> 
> That's not so much the question here, imo. Instead, the issue I
> see is that the way you've coded it is really a layering violation.
> Similar hackery was also rejected in Linux recently, iirc.
> 

OK. I buy this argument.

Let me invent a rsdp_hint instead.
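A generic hint variable along those lines might look like this (a sketch of
the idea only; the names are hypothetical and not the eventual patch):

```c
#include <stdint.h>

static uint64_t rsdp_hint;  /* 0 = nothing stashed by boot code */

/* Boot code (PVH or anything else) records where it found the RSDP. */
void acpi_set_rsdp_hint(uint64_t pa)
{
    rsdp_hint = pa;
}

/*
 * Simplified lookup order: a stashed address wins over whatever the
 * legacy low-memory scan would have produced, with no PVH-specific
 * check at this layer (which addresses the layering concern above).
 */
uint64_t acpi_root_pointer(uint64_t legacy_scan_result)
{
    return rsdp_hint ? rsdp_hint : legacy_scan_result;
}
```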

Wei.


Re: [Xen-devel] [RFC 02/11] acpi: arm: API to query estimated size of hardware domain's IORT

2018-01-22 Thread Julien Grall

Hi,

On 19/01/18 06:10, Manish Jaggi wrote:



On 01/17/2018 12:22 AM, Julien Grall wrote:
  IORT for hardware domain is generated using the requesterId and 
deviceId map.


  Signed-off-by: Manish Jaggi 
---
  xen/arch/arm/domain_build.c |  12 -
  xen/drivers/acpi/arm/Makefile   |   1 +
  xen/drivers/acpi/arm/gen-iort.c | 101 


  xen/include/acpi/gen-iort.h |   6 +++
  4 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index c74f4dd69d..f5d5e3d271 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1799,7 +1800,7 @@ static int acpi_create_fadt(struct domain *d, 
struct membank tbl_add[])
    static int estimate_acpi_efi_size(struct domain *d, struct 
kernel_info *kinfo)

  {
-    size_t efi_size, acpi_size, madt_size;
+    size_t efi_size, acpi_size, madt_size, iort_size;


Rather than introduce a variable for 10 instructions, you can rename 
madt_size so it can be re-used. I would be ok for this to be in the 
same patch (providing a proper commit message).

Why would you want to replace iort_size with madt_size? What is the harm
if adding a variable makes the code more verbose? I am not able to
appreciate your point here.


I didn't ask to replace iort_size with madt_size. But rename madt_size 
to table_size or some other name that could be reused for both.


This is very similar to when you store the error return of a function. 
You are not going to name ret_foo, ret_bar, ret_fish... You are just 
going to use one variable and re-use it.


Anyway, I am not going to fight with that and just send a patch to clean 
that up once it has been merged.


[...]

  diff --git a/xen/drivers/acpi/arm/Makefile 
b/xen/drivers/acpi/arm/Makefile

index 046fad5e3d..13f1a9159f 100644
--- a/xen/drivers/acpi/arm/Makefile
+++ b/xen/drivers/acpi/arm/Makefile
@@ -1 +1,2 @@
  obj-y = ridmap.o
+obj-y += gen-iort.o
diff --git a/xen/drivers/acpi/arm/gen-iort.c 
b/xen/drivers/acpi/arm/gen-iort.c

new file mode 100644
index 00..3fc32959c6
--- /dev/null
+++ b/xen/drivers/acpi/arm/gen-iort.c
@@ -0,0 +1,101 @@
+/*
+ * xen/drivers/acpi/arm/gen-iort.c
+ *
+ * Code to generate IORT for hardware domain using the requesterId
+ * and deviceId map.
+ *
+ * Manish Jaggi 
+ * Copyright (c) 2018 Linaro.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.


The license is wrong (see patch #1).

Please see my comment in patch #1.
This license is used from an existing file in xen.
So there are a lot of wrong licenses in xen code.


Well yes. But does it mean you have to add more wrong code? ;)

Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH 2/2] xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 14:29,  wrote:
> On 22/01/18 13:01, Jan Beulich wrote:
> On 22.01.18 at 13:52,  wrote:
>>> On 22/01/18 12:41, Jan Beulich wrote:
>>> On 19.01.18 at 20:19,  wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1117,7 +1117,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_pausedomain3
>  #define XEN_DOMCTL_unpausedomain  4
>  #define XEN_DOMCTL_getdomaininfo  5
> -#define XEN_DOMCTL_getmemlist 6
> +/* #define XEN_DOMCTL_getmemlist  6 Obsolete */
>  /* #define XEN_DOMCTL_getpageframeinfo7 Obsolete - use 
>>> getpageframeinfo3 */
>  /* #define XEN_DOMCTL_getpageframeinfo2   8 Obsolete - use 
>>> getpageframeinfo3 */
>  #define XEN_DOMCTL_setvcpuaffinity9
 Just like mentioned upon someone else's recent submission to
 remove a domctl sub-op: You want to bump the interface version
 (remember that the bump done for the shim doesn't count as long
 as there is a possible plan to make that other recent commit part
 of a 4.10.x stable release).
>>> There has already been a version bump for 4.11.
>> I know, hence the longer explanation, which I had given also
>> when the shim series was first posted: If that domctl change is
>> to be backported to 4.10, interface version 0xf will be burnt
>> for _just that change_. That other bump is sufficient only when
>> there is no plan whatsoever to backport the earlier change.
> 
> If that change is backported to 4.10, that is the time to burn another
> interface version.  Not in this patch.

Not if the backport happens only after 4.11 has shipped. And
even if the backport happened earlier, we're liable to forget if
we don't do it now. If there was just a remote chance of that
backport to happen, I probably wouldn't insist, but aiui there's
a pretty determined plan to do so.

I also find it strange that you didn't respond back when I had
first outlined this extra requirement.

> Also, this demonstrates the inherent problems with the interface
> version.  This trick can only ever be played on the most recently
> released branch.  It is a dire trainwreck in terms of versioning, and
> serves only to make it almost impossible to make changes to an installed
> system.

It's not optimal, but I have yet to see a proposal of a mechanism
that's more flexible than this one, but provides at least the same
minimal protection against mismatches.

As to changes to an installed system - the domctl interface should
be in sufficiently usable a shape that such won't be necessary. Or
in the worst case new sub-ops could always be added.

 Plus I again question whether
 "Obsolete" is an appropriate description for something that's no
 longer part of the interface (rather than just being suggested to
 no longer be used). Is there any point in keeping the old sub-op
 as a comment in the first place?
>>> To avoid the number being reused.  It also serves as a marker to locate
>>> the change which removed the hypercall if anyone is doing archaeology in
>>> the future.
>> The number getting re-used with a higher interface version is no
>> problem at all, afaics.
> 
> Yes it is.  do_domctl() (which inserts the domctl version) is remote
> from the choice of op to use, so reusing numbers means that the language
> subs around libxc can issue completely erroneous hypercalls without
> suffering a build or version failure.  (Again, see trainwreck of a
> versioning scheme.)

do_domctl() itself shouldn't be available for use outside of libxc.
And the actual libxc wrapper for a removed sub-op would be
unavailable in the shared object matching the underlying Xen.

Jan



Re: [Xen-devel] [PATCH] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 14:21,  wrote:
> On Mon, Jan 22, 2018 at 01:03:14PM +, Roger Pau Monné wrote:
>> On Mon, Jan 22, 2018 at 12:47:10PM +, Wei Liu wrote:
>> > --- a/xen/drivers/acpi/osl.c
>> > +++ b/xen/drivers/acpi/osl.c
>> > @@ -38,6 +38,10 @@
>> >  #include 
>> >  #include 
>> >  
>> > +#ifdef CONFIG_PVH_GUEST
>> > +#include 
>> > +#endif
>> > +
>> >  #define _COMPONENT ACPI_OS_SERVICES
>> >  ACPI_MODULE_NAME("osl")
>> >  
>> > @@ -74,6 +78,11 @@ acpi_physical_address __init 
>> > acpi_os_get_root_pointer(void)
>> >   "System description tables not found\n");
>> >return 0;
>> >}
>> > +#ifdef CONFIG_PVH_GUEST
>> > +  } else if (pvh_boot) {
>> > +  ASSERT(pvh_rsdp_pa);
>> > +  return pvh_rsdp_pa;
>> > +#endif
>> >} else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) {
>> >acpi_physical_address pa = 0;
>> 
>> Can this be done in a non-PVH specific way?
>> 
>> Can we have a global rsdp_hint variable or similar that would be used
>> here if set?
> 
> Who will be the anticipated user(s) other than PVH?

That's not so much the question here, imo. Instead, the issue I
see is that the way you've coded it is really a layering violation.
Similar hackery was also rejected in Linux recently, iirc.

Jan


Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:33,  wrote:
> On 01/22/2018 09:25 AM, Jan Beulich wrote:
> On 19.01.18 at 18:00,  wrote:
>>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>> On 19.01.18 at 16:43,  wrote:
> So what if instead of trying to close the "windows", we made it so that
> there was nothing through the windows to see?  If no matter what the
> hypervisor speculatively executed, nothing sensitive was visible except
> what a vcpu was already allowed to see,

 I think you didn't finish your sentence here, but I also think I
 can guess the missing part. There's a price to pay for such an
 approach though - iterating over domains, or vCPU-s of a
 domain (just as an example) wouldn't be simple list walks
 anymore. There are certainly other things. IOW - yes, and
 approach like this seems possible, but with all the lost
 performance I think we shouldn't go overboard with further
 hiding.
>>>
>>> Right, so the next question: what information *from other guests* are
>>> sensitive?
>>>
>>> Obviously the guest registers are sensitive.  But how much of the
>>> information in vcpu struct that we actually need to have "to hand" is
>>> actually sensitive information that we need to hide from other VMs?
>> 
>> None, I think. But that's not the main aspect here. struct vcpu
>> instances come and go, which would mean we'd have to
>> permanently update what is or is not being exposed in the page
>> tables used. This, while solvable, is going to be a significant
>> burden in terms of synchronizing page tables (if we continue to
>> use per-CPU ones) and/or TLB shootdown. Whereas if only the
>> running vCPU's structure (and it's struct domain) are exposed,
>> no such synchronization is needed (things would simply be
>> updated during context switch).
> 
> I'm not sure we're actually communicating.
> 
> Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
> under Xen still have access to all of host memory.  To protect against
> SP3, we remove almost all Xen memory from the address space before
> switching to the guest.
> 
> What I'm proposing is something like this:
> 
> * We have a "global" region of Xen memory that is mapped by all
> processors.  This will contain everything we consider not sensitive;
> including Xen text segments, and most domain and vcpu data.  But it will
> *not* map all of host memory, nor have access to sensitive data, such as
> vcpu register state.
> 
> * We have per-cpu "local" regions.  In this region we will map,
> on-demand, guest memory which is needed to perform current operations.
> (We can consider how strictly we need to unmap memory after using it.)
> We will also map the current vcpu's registers.
> 
> * On entry to a 64-bit PV guest, we don't change the mapping at all.
> 
> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
> can only access its own RAM and registers.  There's no extra overhead to
> context switching into or out of the hypervisor.

And we would open back up the SP3 variant of guest user mode
attacking its own kernel by going through the Xen mappings. I
can't exclude that variants of SP1 (less likely SP2) allowing indirect
guest-user -> guest-kernel attacks could be found.

> Given that, I don't understand what the following comments mean:
> 
> "There's a price to pay for such an approach though - iterating over
> domains, or vCPU-s of a domain (just as an example) wouldn't be simple
> list walks anymore."
> 
> If we remove sensitive information from the domain and vcpu structs,
> then any bit of hypervisor code can iterate over domain and vcpu structs
> at will; only if they actually need to read or write sensitive data will
> they have to perform an expensive map/unmap operation.  But in general,
> to read another vcpu's registers you already need to do a vcpu_pause() /
> vcpu_unpause(), which involves at least two IPIs (with one
> spin-and-wait), so it doesn't seem like that should add a lot of extra
> overhead.

Reading another vCPU-s register can't be compared with e.g.
wanting to deliver an interrupt to other than the currently running
vCPU.

> "struct vcpu instances come and go, which would mean we'd have to
> permanently update what is or is not being exposed in the page tables
> used. This, while solvable, is going to be a significant burden in terms
> of synchronizing page tables (if we continue to use per-CPU ones) and/or
> TLB shootdown."
> 
> I don't understand what this is referring to in my proposed plan above.

I had specifically said these were just examples (ones coming to
mind immediately). Of course splitting such structures in two parts
is an option, but I'm not sure it's a reasonable one (which perhaps
depends on details on how you would envision the implementation).
If the split off piece(s) was/were being referred to by pointers out
of the main structure, there would be a 

Re: [Xen-devel] [PATCH 2/2] xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

2018-01-22 Thread Andrew Cooper
On 22/01/18 13:01, Jan Beulich wrote:
 On 22.01.18 at 13:52,  wrote:
>> On 22/01/18 12:41, Jan Beulich wrote:
>> On 19.01.18 at 20:19,  wrote:
 --- a/xen/include/public/domctl.h
 +++ b/xen/include/public/domctl.h
 @@ -1117,7 +1117,7 @@ struct xen_domctl {
  #define XEN_DOMCTL_pausedomain3
  #define XEN_DOMCTL_unpausedomain  4
  #define XEN_DOMCTL_getdomaininfo  5
 -#define XEN_DOMCTL_getmemlist 6
 +/* #define XEN_DOMCTL_getmemlist  6 Obsolete */
  /* #define XEN_DOMCTL_getpageframeinfo7 Obsolete - use 
>> getpageframeinfo3 */
  /* #define XEN_DOMCTL_getpageframeinfo2   8 Obsolete - use 
>> getpageframeinfo3 */
  #define XEN_DOMCTL_setvcpuaffinity9
>>> Just like mentioned upon someone else's recent submission to
>>> remove a domctl sub-op: You want to bump the interface version
>>> (remember that the bump done for the shim doesn't count as long
>>> as there is a possible plan to make that other recent commit part
>>> of a 4.10.x stable release).
>> There has already been a version bump for 4.11.
> I know, hence the longer explanation, which I had given also
> when the shim series was first posted: If that domctl change is
> to be backported to 4.10, interface version 0xf will be burnt
> for _just that change_. That other bump is sufficient only when
> there is no plan whatsoever to backport the earlier change.

If that change is backported to 4.10, that is the time to burn another
interface version.  Not in this patch.

Also, this demonstrates the inherent problems with the interface
version.  This trick can only ever be played on the most recently
released branch.  It is a dire trainwreck in terms of versioning, and
serves only to make it almost impossible to make changes to an installed
system.

>
>>> Plus I again question whether
>>> "Obsolete" is an appropriate description for something that's no
>>> longer part of the interface (rather than just being suggested to
>>> no longer be used). Is there any point in keeping the old sub-op
>>> as a comment in the first place?
>> To avoid the number being reused.  It also serves as a marker to locate
>> the change which removed the hypercall if anyone is doing archaeology in
>> the future.
> The number getting re-used with a higher interface version is no
> problem at all, afaics.

Yes it is.  do_domctl() (which inserts the domctl version) is remote
from the choice of op to use, so reusing numbers means that the language
subs around libxc can issue completely erroneous hypercalls without
suffering a build or version failure.  (Again, see trainwreck of a
versioning scheme.)

~Andrew


Re: [Xen-devel] [PATCH] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 01:03:14PM +, Roger Pau Monné wrote:
> On Mon, Jan 22, 2018 at 12:47:10PM +, Wei Liu wrote:
> > It used to be the case that we placed RSDP under 1MB and let Xen search
> > for it. We moved the placement to under 4GB in 4a5733771, so the
> > search wouldn't work.
> > 
> > Stash the RSDP address to solve this problem.
> > 
> > Suggested-by: Roger Pau Monné 
> > Signed-off-by: Wei Liu 
> > ---
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Roger Pau Monné 
> > 
> > What about PVH + EFI?
> 
> PVH guests using EFI firmware will get the RSDP address from the EFI
> tables. Ie: EFI firmware will use the PVH entry point, thus fetching
> the RSDP from start_info, but the kernel loaded from EFI should be
> using the EFI entry point, and thus fetching the RSDP pointer from the
> EFI tables. Or at least that was my thinking.

Good. That means no addition is needed to the EFI path in
acpi_os_get_root_pointer.

> 
> > ---
> >  xen/arch/x86/guest/pvh-boot.c| 4 
> >  xen/drivers/acpi/osl.c   | 9 +
> >  xen/include/asm-x86/guest/pvh-boot.h | 1 +
> >  3 files changed, 14 insertions(+)
> > 
> > diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
> > index be3122b16c..427f9ea6b1 100644
> > --- a/xen/arch/x86/guest/pvh-boot.c
> > +++ b/xen/arch/x86/guest/pvh-boot.c
> > @@ -30,6 +30,7 @@
> >  /* Initialised in head.S, before .bss is zeroed. */
> >  bool __initdata pvh_boot;
> >  uint32_t __initdata pvh_start_info_pa;
> > +unsigned long __initdata pvh_rsdp_pa;
> 
> uint64_t maybe to use the same type as start_info.h.
> 
> >  
> >  static multiboot_info_t __initdata pvh_mbi;
> >  static module_t __initdata pvh_mbi_mods[8];
> > @@ -69,6 +70,9 @@ static void __init convert_pvh_info(void)
> >  mod[i].mod_end   = entry[i].paddr + entry[i].size;
> >  mod[i].string= entry[i].cmdline_paddr;
> >  }
> > +
> > +/* Stash RSDP pointer so ACPI driver can get it */
> > +pvh_rsdp_pa = pvh_info->rsdp_paddr;;
> 
> Double ';'.
> 
> Is this too early to panic? IMHO we should add:
> 
> if ( !pvh_info->rsdp_paddr )
> panic("Unable to boot in PVH mode without ACPI tables");
> 
> Preferably here or at acpi_os_get_root_pointer.

It is too early to panic here. Even those BUG_ONs are problematic (yes
they do stop booting but no useful message is printed). But I couldn't
come up with a better way when I wrote the patch.

It is better to do that later in acpi_os_get_root_pointer.

> 
> >  }
> >  
> >  static void __init get_memory_map(void)
> > diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
> > index 52c9b4ba9a..6a81de1707 100644
> > --- a/xen/drivers/acpi/osl.c
> > +++ b/xen/drivers/acpi/osl.c
> > @@ -38,6 +38,10 @@
> >  #include 
> >  #include 
> >  
> > +#ifdef CONFIG_PVH_GUEST
> > +#include 
> > +#endif
> > +
> >  #define _COMPONENT ACPI_OS_SERVICES
> >  ACPI_MODULE_NAME("osl")
> >  
> > @@ -74,6 +78,11 @@ acpi_physical_address __init 
> > acpi_os_get_root_pointer(void)
> >"System description tables not found\n");
> > return 0;
> > }
> > +#ifdef CONFIG_PVH_GUEST
> > +   } else if (pvh_boot) {
> > +   ASSERT(pvh_rsdp_pa);
> > +   return pvh_rsdp_pa;
> > +#endif
> > } else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) {
> > acpi_physical_address pa = 0;
> 
> Can this be done in a non-PVH specific way?
> 
> Can we have a global rsdp_hint variable or similar that would be used
> here if set?

Who will be the anticipated user(s) other than PVH?

Wei.


Re: [Xen-devel] [PATCH] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Roger Pau Monné
On Mon, Jan 22, 2018 at 12:47:10PM +, Wei Liu wrote:
> It used to be the case that we placed RSDP under 1MB and let Xen search
> for it. We moved the placement to under 4GB in 4a5733771, so the
> search wouldn't work.
> 
> Stash the RSDP address to solve this problem.
> 
> Suggested-by: Roger Pau Monné 
> Signed-off-by: Wei Liu 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Roger Pau Monné 
> 
> What about PVH + EFI?

PVH guests using EFI firmware will get the RSDP address from the EFI
tables. Ie: EFI firmware will use the PVH entry point, thus fetching
the RSDP from start_info, but the kernel loaded from EFI should be
using the EFI entry point, and thus fetching the RSDP pointer from the
EFI tables. Or at least that was my thinking.

> ---
>  xen/arch/x86/guest/pvh-boot.c| 4 
>  xen/drivers/acpi/osl.c   | 9 +
>  xen/include/asm-x86/guest/pvh-boot.h | 1 +
>  3 files changed, 14 insertions(+)
> 
> diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
> index be3122b16c..427f9ea6b1 100644
> --- a/xen/arch/x86/guest/pvh-boot.c
> +++ b/xen/arch/x86/guest/pvh-boot.c
> @@ -30,6 +30,7 @@
>  /* Initialised in head.S, before .bss is zeroed. */
>  bool __initdata pvh_boot;
>  uint32_t __initdata pvh_start_info_pa;
> +unsigned long __initdata pvh_rsdp_pa;

uint64_t maybe to use the same type as start_info.h.

>  
>  static multiboot_info_t __initdata pvh_mbi;
>  static module_t __initdata pvh_mbi_mods[8];
> @@ -69,6 +70,9 @@ static void __init convert_pvh_info(void)
>  mod[i].mod_end   = entry[i].paddr + entry[i].size;
>  mod[i].string= entry[i].cmdline_paddr;
>  }
> +
> +/* Stash RSDP pointer so ACPI driver can get it */
> +pvh_rsdp_pa = pvh_info->rsdp_paddr;;

Double ';'.

Is this too early to panic? IMHO we should add:

if ( !pvh_info->rsdp_paddr )
panic("Unable to boot in PVH mode without ACPI tables");

Preferably here or at acpi_os_get_root_pointer.

>  }
>  
>  static void __init get_memory_map(void)
> diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
> index 52c9b4ba9a..6a81de1707 100644
> --- a/xen/drivers/acpi/osl.c
> +++ b/xen/drivers/acpi/osl.c
> @@ -38,6 +38,10 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_PVH_GUEST
> +#include 
> +#endif
> +
>  #define _COMPONENT   ACPI_OS_SERVICES
>  ACPI_MODULE_NAME("osl")
>  
> @@ -74,6 +78,11 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
>  "System description tables not found\n");
>   return 0;
>   }
> +#ifdef CONFIG_PVH_GUEST
> + } else if (pvh_boot) {
> + ASSERT(pvh_rsdp_pa);
> + return pvh_rsdp_pa;
> +#endif
>   } else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) {
>   acpi_physical_address pa = 0;

Can this be done in a non-PVH specific way?

Can we have a global rsdp_hint variable or similar that would be used
here if set?

Thanks, Roger.


Re: [Xen-devel] [PATCH 2/2] xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:52,  wrote:
> On 22/01/18 12:41, Jan Beulich wrote:
> On 19.01.18 at 20:19,  wrote:
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -1117,7 +1117,7 @@ struct xen_domctl {
>>>  #define XEN_DOMCTL_pausedomain3
>>>  #define XEN_DOMCTL_unpausedomain  4
>>>  #define XEN_DOMCTL_getdomaininfo  5
>>> -#define XEN_DOMCTL_getmemlist 6
>>> +/* #define XEN_DOMCTL_getmemlist  6 Obsolete */
>>>  /* #define XEN_DOMCTL_getpageframeinfo    7 Obsolete - use getpageframeinfo3 */
>>>  /* #define XEN_DOMCTL_getpageframeinfo2   8 Obsolete - use getpageframeinfo3 */
>>>  #define XEN_DOMCTL_setvcpuaffinity9
>> Just like mentioned upon someone else's recent submission to
>> remove a domctl sub-op: You want to bump the interface version
>> (remember that the bump done for the shim doesn't count as long
>> as there is a possible plan to make that other recent commit part
>> of a 4.10.x stable release).
> 
> There has already been a version bump for 4.11.

I know, hence the longer explanation, which I had given also
when the shim series was first posted: If that domctl change is
to be backported to 4.10, interface version 0xf will be burnt
for _just that change_. That other bump is sufficient only when
there is no plan whatsoever to backport the earlier change.

>> Plus I again question whether
>> "Obsolete" is an appropriate description for something that's no
>> longer part of the interface (rather than just being suggested to
>> no longer be used). Is there any point in keeping the old sub-op
>> as a comment in the first place?
> 
> To avoid the number being reused.  It also serves as a marker to locate
> the change which removed the hypercall if anyone is doing archaeology in
> the future.

The number getting re-used with a higher interface version is no
problem at all, afaics.

> How about removed instead of obsolete?

That would be fine with me.
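For illustration, the agreed-on marker might read as follows (hypothetical rendering, mirroring the neighbouring Obsolete comments in domctl.h):

```c
 #define XEN_DOMCTL_getdomaininfo  5
/* #define XEN_DOMCTL_getmemlist  6 Removed */
/* #define XEN_DOMCTL_getpageframeinfo    7 Obsolete - use getpageframeinfo3 */
```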

Jan



Re: [Xen-devel] [PATCH v7 for-next 10/12] vpci/msi: add MSI handlers

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:48,  wrote:
> I think the proper way to solve this is to reset the mask bits to
> masked when the vector is unbound, so that at bind time the state of
> the mask is consistent regardless of whether the vector has been
> previously bound or not. The following patch should fix this:
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 8f16e6c0a5..bab3aa349a 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -645,7 +645,22 @@ int pt_irq_destroy_bind(
>  }
>  break;
>  case PT_IRQ_TYPE_MSI:
> +{
> +unsigned long flags;
> +struct irq_desc *desc = domain_spin_lock_irq_desc(d, machine_gsi,
> +  &flags);
> +
> +if ( !desc )
> +return -EINVAL;
> +/*
> + * Leave the MSI masked, so that the state when calling
> + * pt_irq_create_bind is consistent across bind/unbinds.
> + */
> +guest_mask_msi_irq(desc, true);
> +spin_unlock_irqrestore(&desc->lock, flags);
>  break;
> +}
> +
>  default:
>  return -EOPNOTSUPP;
>  }
> 
> I think this should be sent as a separate patch of this series, since
> it's a fix for pt_irq_destroy_bind.

Looks plausible, but I'll defer my ack until I've also seen the
description for it, because if the above is really necessary I'd
sort of expect there to be an actual issue without any of your
series applied.

Jan



Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:35,  wrote:
> To avoid spamming the list with all the other acked patches, here is the
> updated patch.

Feel free to re-instate my R-b, but please commit only if Andrew
withdraws his earlier voiced more general objection.

Jan



Re: [Xen-devel] [PATCH RFC v2 01/12] x86: cleanup processor.h

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:32,  wrote:
> Remove NSC/Cyrix CPU macros and current_text_addr() which are used
> nowhere.

I agree doing the former, but I have a vague recollection that we've
left the latter in place despite there not being any callers at present.

Jan



Re: [Xen-devel] [PATCH 2/2] xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

2018-01-22 Thread Andrew Cooper
On 22/01/18 12:41, Jan Beulich wrote:
 On 19.01.18 at 20:19,  wrote:
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -1117,7 +1117,7 @@ struct xen_domctl {
>>  #define XEN_DOMCTL_pausedomain3
>>  #define XEN_DOMCTL_unpausedomain  4
>>  #define XEN_DOMCTL_getdomaininfo  5
>> -#define XEN_DOMCTL_getmemlist 6
>> +/* #define XEN_DOMCTL_getmemlist  6 Obsolete */
>>  /* #define XEN_DOMCTL_getpageframeinfo    7 Obsolete - use getpageframeinfo3 */
>>  /* #define XEN_DOMCTL_getpageframeinfo2   8 Obsolete - use getpageframeinfo3 */
>>  #define XEN_DOMCTL_setvcpuaffinity9
> Just like mentioned upon someone else's recent submission to
> remove a domctl sub-op: You want to bump the interface version
> (remember that the bump done for the shim doesn't count as long
> as there is a possible plan to make that other recent commit part
> of a 4.10.x stable release).

There has already been a version bump for 4.11.

> Plus I again question whether
> "Obsolete" is an appropriate description for something that's no
> longer part of the interface (rather than just being suggested to
> no longer be used). Is there any point in keeping the old sub-op
> as a comment in the first place?

To avoid the number being reused.  It also serves as a marker to locate
the change which removed the hypercall if anyone is doing archaeology in
the future.

How about removed instead of obsolete?

~Andrew


Re: [Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 13:32,  wrote:
> As a preparation for doing page table isolation in the Xen hypervisor
> in order to mitigate "Meltdown" use dedicated stacks, GDT and TSS for
> 64 bit PV domains mapped to the per-domain virtual area.
> 
> The per-vcpu stacks are used for early interrupt handling only. After
> saving the domain's registers stacks are switched back to the normal
> per physical cpu ones in order to be able to address on-stack data
> from other cpus e.g. while handling IPIs.
> 
> Adding %cr3 switching between saving of the registers and switching
> the stacks will enable the possibility to run guest code without any
> per physical cpu mapping, i.e. avoiding the threat of a guest being
> able to access other domains data.
> 
> Without any further measures it will still be possible for e.g. a
> guest's user program to read stack data of another vcpu of the same
> domain, but this can be easily avoided by a little PV-ABI modification
> introducing per-cpu user address spaces.
> 
> This series is meant as a replacement for Andrew's patch series:
> "x86: Prerequisite work for a Xen KAISER solution".

Considering in particular the two reverts, what I'm missing here
is a clear description of the meaningful additional protection this
approach provides over the band-aid. For context see also
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html

Jan



Re: [Xen-devel] [PATCH v7 for-next 10/12] vpci/msi: add MSI handlers

2018-01-22 Thread Roger Pau Monné
On Fri, Dec 15, 2017 at 05:07:06AM -0700, Jan Beulich wrote:
> >>> On 18.10.17 at 13:40,  wrote:
> > +static void control_write(const struct pci_dev *pdev, unsigned int reg,
> > +  uint32_t val, void *data)
> > +{
> > +struct vpci_msi *msi = data;
> > +unsigned int vectors = min_t(uint8_t,
> > + 1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
> > + msi->max_vectors);
> > +bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
> > +
> > +/*
> > + * No change if the enable field and the number of vectors is
> > + * the same or the device is not enabled, in which case the
> > + * vectors field can be updated directly.
> > + */
> > +if ( new_enabled == msi->enabled &&
> > + (vectors == msi->vectors || !msi->enabled) )
> > +{
> > +msi->vectors = vectors;
> > +return;
> > +}
> > +
> > +if ( new_enabled )
> > +{
> > +unsigned int i;
> > +
> > +/*
> > + * If the device is already enabled it means the number of
> > + * enabled messages has changed. Disable and re-enable the
> > + * device in order to apply the change.
> > + */
> > +if ( msi->enabled )
> > +{
> > +vpci_msi_arch_disable(msi, pdev);
> > +msi->enabled = false;
> > +}
> > +
> > +if ( vpci_msi_arch_enable(msi, pdev, vectors) )
> > +return;
> > +
> > +for ( i = 0; msi->masking && i < vectors; i++ )
> > +vpci_msi_arch_mask(msi, pdev, i, (msi->mask >> i) & 1);
> 
> The ordering looks wrong at the first (and second) glance: It gives
> the impression that you enable the vectors and only then mask
> them. I _assume_ the ordering is the way it is because
> vpci_msi_arch_enable() leaves the vectors masked

I've taken another look at this, and I think what's done here is still
not fully correct.

vpci_msi_arch_enable (which calls allocate_and_map_msi_pirq and
pt_irq_create_bind) will leave the masking bits as they were. There's
no explicit masking done there. It just happens that Xen sets the mask
to ~0 when adding the PCI device (see msi_capability_init), and thus
all vectors are masked by default when the device first enables MSI.

So given the following flow:

 - Guest enables MSI with 8 vectors enabled and unmasked.
 - Guest disables MSI.
 - Guest masks vector 4.
 - Guest re-enables MSI.

There's going to be a window where vector 4 won't be masked in the
code above (between the call to vpci_msi_arch_enable and the call to
vpci_msi_arch_mask). It's quite likely that the QEMU side is also
missing this, but AFAICT it's not something that an OS would usually
do.

I think the proper way to solve this is to reset the mask bits to
masked when the vector is unbound, so that at bind time the state of
the mask is consistent regardless of whether the vector has been
previously bound or not. The following patch should fix this:

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 8f16e6c0a5..bab3aa349a 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -645,7 +645,22 @@ int pt_irq_destroy_bind(
 }
 break;
 case PT_IRQ_TYPE_MSI:
+{
+unsigned long flags;
+struct irq_desc *desc = domain_spin_lock_irq_desc(d, machine_gsi,
+  &flags);
+
+if ( !desc )
+return -EINVAL;
+/*
+ * Leave the MSI masked, so that the state when calling
+ * pt_irq_create_bind is consistent across bind/unbinds.
+ */
+guest_mask_msi_irq(desc, true);
+spin_unlock_irqrestore(&desc->lock, flags);
 break;
+}
+
 default:
 return -EOPNOTSUPP;
 }

I think this should be sent as a separate patch of this series, since
it's a fix for pt_irq_destroy_bind.

> (albeit that's
> sort of contradicting the msi->masking part of the loop condition),
> and if so this should be explained in a comment. If, however, this
> assumption of mine is wrong, then the order needs changing.

I will add a comment once this is sorted.

Thanks, Roger.


[Xen-devel] [PATCH] xen/shim: stash RSDP address for ACPI driver

2018-01-22 Thread Wei Liu
It used to be the case that we placed RSDP under 1MB and let Xen search
for it. We moved the placement to under 4GB in 4a5733771, so the
search wouldn't work.

Stash the RSDP address to solve this problem.

Suggested-by: Roger Pau Monné 
Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 

What about PVH + EFI?
---
 xen/arch/x86/guest/pvh-boot.c| 4 
 xen/drivers/acpi/osl.c   | 9 +
 xen/include/asm-x86/guest/pvh-boot.h | 1 +
 3 files changed, 14 insertions(+)

diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
index be3122b16c..427f9ea6b1 100644
--- a/xen/arch/x86/guest/pvh-boot.c
+++ b/xen/arch/x86/guest/pvh-boot.c
@@ -30,6 +30,7 @@
 /* Initialised in head.S, before .bss is zeroed. */
 bool __initdata pvh_boot;
 uint32_t __initdata pvh_start_info_pa;
+unsigned long __initdata pvh_rsdp_pa;
 
 static multiboot_info_t __initdata pvh_mbi;
 static module_t __initdata pvh_mbi_mods[8];
@@ -69,6 +70,9 @@ static void __init convert_pvh_info(void)
 mod[i].mod_end   = entry[i].paddr + entry[i].size;
 mod[i].string= entry[i].cmdline_paddr;
 }
+
+/* Stash RSDP pointer so ACPI driver can get it */
+pvh_rsdp_pa = pvh_info->rsdp_paddr;;
 }
 
 static void __init get_memory_map(void)
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 52c9b4ba9a..6a81de1707 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -38,6 +38,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_PVH_GUEST
+#include 
+#endif
+
 #define _COMPONENT ACPI_OS_SERVICES
 ACPI_MODULE_NAME("osl")
 
@@ -74,6 +78,11 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
   "System description tables not found\n");
return 0;
}
+#ifdef CONFIG_PVH_GUEST
+   } else if (pvh_boot) {
+   ASSERT(pvh_rsdp_pa);
+   return pvh_rsdp_pa;
+#endif
} else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) {
acpi_physical_address pa = 0;
 
diff --git a/xen/include/asm-x86/guest/pvh-boot.h b/xen/include/asm-x86/guest/pvh-boot.h
index 1b429f9401..995500e4da 100644
--- a/xen/include/asm-x86/guest/pvh-boot.h
+++ b/xen/include/asm-x86/guest/pvh-boot.h
@@ -24,6 +24,7 @@
 #ifdef CONFIG_PVH_GUEST
 
 extern bool pvh_boot;
+extern unsigned long pvh_rsdp_pa;
 
 multiboot_info_t *pvh_init(void);
 void pvh_print_info(void);
-- 
2.11.0



Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 12:44:41PM +, Roger Pau Monné wrote:
> On Mon, Jan 22, 2018 at 12:35:21PM +, Wei Liu wrote:
> > To avoid spamming the list with all the other acked patches, here is the
> > updated patch.
> > 
> > ---8<---
> > From 1ac0afbbc0ecd620c5fba3a03bb084bc4dafc78e Mon Sep 17 00:00:00 2001
> > From: Wei Liu 
> > Date: Wed, 17 Jan 2018 18:38:02 +
> > Subject: [PATCH] x86: relocate pvh_info
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > Modify early boot code to relocate pvh info as well, so that we can be
> > sure __va in __start_xen works.
> > 
> > Signed-off-by: Wei Liu 
> 
> Reviewed-by: Roger Pau Monné 
> 
> With one question below.
> 
> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> > index 0f652cea11..aa2e2a93c8 100644
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -414,6 +414,7 @@ __pvh_start:
> >  
> >  /* Set trampoline_phys to use mfn 1 to avoid having a mapping at VA 0 */
> >  movw    $0x1000, sym_esi(trampoline_phys)
> > +movl    (%ebx), %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax */
> 
> Do you really need the l suffix here?

I guess no. I copied it from your previous reply. ;-)

Wei.


Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Roger Pau Monné
On Mon, Jan 22, 2018 at 12:35:21PM +, Wei Liu wrote:
> To avoid spamming the list with all the other acked patches, here is the
> updated patch.
> 
> ---8<---
> From 1ac0afbbc0ecd620c5fba3a03bb084bc4dafc78e Mon Sep 17 00:00:00 2001
> From: Wei Liu 
> Date: Wed, 17 Jan 2018 18:38:02 +
> Subject: [PATCH] x86: relocate pvh_info
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Modify early boot code to relocate pvh info as well, so that we can be
> sure __va in __start_xen works.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Roger Pau Monné 

With one question below.

> diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> index 0f652cea11..aa2e2a93c8 100644
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -414,6 +414,7 @@ __pvh_start:
>  
>  /* Set trampoline_phys to use mfn 1 to avoid having a mapping at VA 0 */
>  movw    $0x1000, sym_esi(trampoline_phys)
> +movl    (%ebx), %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax */

Do you really need the l suffix here?

Thanks, Roger.


Re: [Xen-devel] [PATCH 2/2] xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

2018-01-22 Thread Jan Beulich
>>> On 19.01.18 at 20:19,  wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1117,7 +1117,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_pausedomain3
>  #define XEN_DOMCTL_unpausedomain  4
>  #define XEN_DOMCTL_getdomaininfo  5
> -#define XEN_DOMCTL_getmemlist 6
> +/* #define XEN_DOMCTL_getmemlist  6 Obsolete */
>  /* #define XEN_DOMCTL_getpageframeinfo    7 Obsolete - use getpageframeinfo3 */
>  /* #define XEN_DOMCTL_getpageframeinfo2   8 Obsolete - use getpageframeinfo3 */
>  #define XEN_DOMCTL_setvcpuaffinity9

Just like mentioned upon someone else's recent submission to
remove a domctl sub-op: You want to bump the interface version
(remember that the bump done for the shim doesn't count as long
as there is a possible plan to make that other recent commit part
of a 4.10.x stable release). Plus I again question whether
"Obsolete" is an appropriate description for something that's no
longer part of the interface (rather than just being suggested to
no longer be used). Is there any point in keeping the old sub-op
as a comment in the first place?

With this suitably addressed, the hypervisor side is
Acked-by: Jan Beulich 

Jan



[Xen-devel] [PATCH RFC v2 06/12] x86: add a xpti command line parameter

2018-01-22 Thread Juergen Gross
Add a command line parameter for controlling Xen page table isolation
(XPTI): per default it is on for non-AMD systems in 64 bit pv domains.

Possible settings are:
- true: switched on even on AMD systems
- false: switched off for all
- nodom0: switched off for dom0

Signed-off-by: Juergen Gross 
---
 docs/misc/xen-command-line.markdown | 18 
 xen/arch/x86/pv/domain.c| 55 +
 xen/include/asm-x86/domain.h|  2 ++
 3 files changed, 75 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index f5214defbb..90202a5cc9 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1911,6 +1911,24 @@ In the case that x2apic is in use, this option switches between physical and
 clustered mode.  The default, given no hint from the **FADT**, is cluster
 mode.
 
+### xpti
+> `= nodom0 | default | `
+
+> Default: `false` on AMD hardware, `true` everywhere else.
+
+> Can be modified at runtime
+
+Override default selection of whether to isolate 64-bit PV guest page
+tables.
+
+`true` activates page table isolation even on AMD hardware.
+
+`false` deactivates page table isolation on all systems.
+
+`nodom0` deactivates page table isolation for dom0.
+
+`default` switch to default settings.
+
 ### xsave
 > `= `
 
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 74e9e667d2..7d50f9bc19 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -17,6 +18,40 @@
 #undef page_to_mfn
 #define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
 
+static __read_mostly enum {
+XPTI_DEFAULT,
+XPTI_ON,
+XPTI_OFF,
+XPTI_NODOM0
+} opt_xpti = XPTI_DEFAULT;
+
+static int parse_xpti(const char *s)
+{
+int rc = 0;
+
+switch ( parse_bool(s, NULL) )
+{
+case 0:
+opt_xpti = XPTI_OFF;
+break;
+case 1:
+opt_xpti = XPTI_ON;
+break;
+default:
+if ( !strcmp(s, "default") )
+opt_xpti = XPTI_DEFAULT;
+else if ( !strcmp(s, "nodom0") )
+opt_xpti = XPTI_NODOM0;
+else
+rc = -EINVAL;
+break;
+}
+
+return rc;
+}
+
+custom_runtime_param("xpti", parse_xpti);
+
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
 check_wakeup_from_wait();
@@ -76,6 +111,8 @@ int switch_compat(struct domain *d)
 goto undo_and_fail;
 }
 
+d->arch.pv_domain.xpti = false;
+
 domain_set_alloc_bitsize(d);
 recalculate_cpuid_policy(d);
 
@@ -212,6 +249,24 @@ int pv_domain_initialise(struct domain *d, unsigned int 
domcr_flags,
 /* 64-bit PV guest by default. */
 d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
+switch (opt_xpti)
+{
+case XPTI_OFF:
+d->arch.pv_domain.xpti = false;
+break;
+case XPTI_ON:
+d->arch.pv_domain.xpti = true;
+break;
+case XPTI_NODOM0:
+d->arch.pv_domain.xpti = boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
+ d->domain_id != 0 &&
+ d->domain_id != hardware_domid;
+break;
+case XPTI_DEFAULT:
+d->arch.pv_domain.xpti = boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
+break;
+}
+
 return 0;
 
   fail:
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4679d5477d..f1230ac621 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -257,6 +257,8 @@ struct pv_domain
 struct mapcache_domain mapcache;
 
 struct cpuidmasks *cpuidmasks;
+
+bool xpti;
 };
 
 struct monitor_write_data {
-- 
2.13.6



[Xen-devel] [PATCH 1/7] xen/arm32: entry: Consolidate DEFINE_TRAP_ENTRY_* macros

2018-01-22 Thread Julien Grall
The only difference between all the DEFINE_TRAP_ENTRY_* macros is the set of
interrupts (Asynchronous Abort, IRQ, FIQ) unmasked.

Rather than duplicating the code, introduce __DEFINE_TRAP_ENTRY macro
that will take the list of interrupts to unmask.

This is part of XSA-254.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/arm32/entry.S | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
index 120922e64e..c6490d2847 100644
--- a/xen/arch/arm/arm32/entry.S
+++ b/xen/arch/arm/arm32/entry.S
@@ -111,39 +111,29 @@ abort_guest_exit_end:
 skip_check:
 mov pc, lr
 
-#define DEFINE_TRAP_ENTRY(trap) \
+/*
+ * Macro to define trap entry. The iflags corresponds to the list of
+ * interrupts (Asynchronous Abort, IRQ, FIQ) to unmask.
+ */
+#define __DEFINE_TRAP_ENTRY(trap, iflags)   \
 ALIGN;  \
 trap_##trap:\
 SAVE_ALL;   \
-cpsie i;/* local_irq_enable */  \
-cpsie a;/* asynchronous abort enable */ \
+cpsie iflags;   \
 adr lr, return_from_trap;   \
 mov r0, sp; \
 mov r11, sp;\
 bic sp, #7; /* Align the stack pointer (noop on guest trap) */  \
 b do_trap_##trap
 
-#define DEFINE_TRAP_ENTRY_NOIRQ(trap)   \
-ALIGN;  \
-trap_##trap:\
-SAVE_ALL;   \
-cpsie a;/* asynchronous abort enable */ \
-adr lr, return_from_trap;   \
-mov r0, sp; \
-mov r11, sp;\
-bic sp, #7; /* Align the stack pointer (noop on guest trap) */  \
-b do_trap_##trap
+/* Trap handler which unmask IRQ/Abort, keep FIQ masked */
+#define DEFINE_TRAP_ENTRY(trap) __DEFINE_TRAP_ENTRY(trap, ai)
 
-#define DEFINE_TRAP_ENTRY_NOABORT(trap) \
-ALIGN;  \
-trap_##trap:\
-SAVE_ALL;   \
-cpsie i;/* local_irq_enable */  \
-adr lr, return_from_trap;   \
-mov r0, sp; \
-mov r11, sp;\
-bic sp, #7; /* Align the stack pointer (noop on guest trap) */  \
-b do_trap_##trap
+/* Trap handler which unmask Abort, keep IRQ/FIQ masked */
+#define DEFINE_TRAP_ENTRY_NOIRQ(trap) __DEFINE_TRAP_ENTRY(trap, a)
+
+/* Trap handler which unmask IRQ, keep Abort/FIQ masked */
+#define DEFINE_TRAP_ENTRY_NOABORT(trap) __DEFINE_TRAP_ENTRY(trap, i)
 
 .align 5
 GLOBAL(hyp_traps_vector)
-- 
2.11.0



[Xen-devel] [PATCH RFC v2 10/12] x86: allocate per-vcpu stacks for interrupt entries

2018-01-22 Thread Juergen Gross
In case of XPTI being active for a pv-domain allocate and initialize
per-vcpu stacks. The stacks are added to the per-domain mappings of
the pv-domain.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/pv/domain.c  | 72 +++
 xen/include/asm-x86/config.h  | 13 +++-
 xen/include/asm-x86/current.h | 39 ---
 xen/include/asm-x86/domain.h  |  3 ++
 4 files changed, 121 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 7d50f9bc19..834be96ed8 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -156,6 +156,75 @@ void pv_vcpu_destroy(struct vcpu *v)
 pv_destroy_gdt_ldt_l1tab(v);
 xfree(v->arch.pv_vcpu.trap_ctxt);
 v->arch.pv_vcpu.trap_ctxt = NULL;
+
+if ( v->domain->arch.pv_domain.xpti )
+{
+free_xenheap_page(v->arch.pv_vcpu.stack_regs);
+v->arch.pv_vcpu.stack_regs = NULL;
+destroy_perdomain_mapping(v->domain, XPTI_START(v), STACK_PAGES);
+}
+}
+
+static int pv_vcpu_init_xpti(struct vcpu *v)
+{
+struct domain *d = v->domain;
+struct page_info *pg;
+void *ptr;
+struct cpu_info *info;
+unsigned long stack_bottom;
+int rc;
+
+/* Populate page tables. */
+rc = create_perdomain_mapping(d, XPTI_START(v), STACK_PAGES,
+  NIL(l1_pgentry_t *), NULL);
+if ( rc )
+goto done;
+
+/* Map stacks. */
+rc = create_perdomain_mapping(d, XPTI_START(v), IST_MAX,
+  NULL, NIL(struct page_info *));
+if ( rc )
+goto done;
+
+ptr = alloc_xenheap_page();
+if ( !ptr )
+{
+rc = -ENOMEM;
+goto done;
+}
+clear_page(ptr);
+addmfn_to_perdomain_mapping(d, XPTI_START(v) + STACK_SIZE - PAGE_SIZE,
+_mfn(virt_to_mfn(ptr)));
+info = (struct cpu_info *)((unsigned long)ptr + PAGE_SIZE) - 1;
+info->flags = ON_VCPUSTACK;
+v->arch.pv_vcpu.stack_regs = &info->guest_cpu_user_regs;
+
+/* Map TSS. */
+rc = create_perdomain_mapping(d, XPTI_TSS(v), 1, NULL, &pg);
+if ( rc )
+goto done;
+info = (struct cpu_info *)(XPTI_START(v) + STACK_SIZE) - 1;
+stack_bottom = (unsigned long)&info->guest_cpu_user_regs.es;
+ptr = __map_domain_page(pg);
+tss_init(ptr, stack_bottom);
+unmap_domain_page(ptr);
+
+/* Map stub trampolines. */
+rc = create_perdomain_mapping(d, XPTI_TRAMPOLINE(v), 1, NULL, &pg);
+if ( rc )
+goto done;
+ptr = __map_domain_page(pg);
+write_stub_trampoline((unsigned char *)ptr, XPTI_TRAMPOLINE(v),
+  stack_bottom, (unsigned long)lstar_enter);
+write_stub_trampoline((unsigned char *)ptr + STUB_TRAMPOLINE_SIZE_PERVCPU,
+  XPTI_TRAMPOLINE(v) + STUB_TRAMPOLINE_SIZE_PERVCPU,
+  stack_bottom, (unsigned long)cstar_enter);
+unmap_domain_page(ptr);
+flipflags_perdomain_mapping(d, XPTI_TRAMPOLINE(v),
+_PAGE_NX | _PAGE_RW | _PAGE_DIRTY);
+
+ done:
+return rc;
 }
 
 int pv_vcpu_initialise(struct vcpu *v)
@@ -195,6 +264,9 @@ int pv_vcpu_initialise(struct vcpu *v)
 goto done;
 }
 
+if ( d->arch.pv_domain.xpti )
+rc = pv_vcpu_init_xpti(v);
+
  done:
 if ( rc )
 pv_vcpu_destroy(v);
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 9ef9d03ca7..cb107255af 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -66,6 +66,7 @@
 #endif
 
 #define STACK_ORDER 3
+#define STACK_PAGES (1 << STACK_ORDER)
 #define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)
 
 #define TRAMPOLINE_STACK_SPACE  PAGE_SIZE
@@ -202,7 +203,7 @@ extern unsigned char boot_edid_info[128];
 /* Slot 260: per-domain mappings (including map cache). */
 #define PERDOMAIN_VIRT_START(PML4_ADDR(260))
 #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS 3
+#define PERDOMAIN_SLOTS 4
 #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
  (PERDOMAIN_SLOT_MBYTES << 20))
 /* Slot 261: machine-to-phys conversion table (256GB). */
@@ -310,6 +311,16 @@ extern unsigned long xen_phys_start;
 #define ARG_XLAT_START(v)\
 (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
 
+/* Per-vcpu XPTI pages. The fourth per-domain-mapping sub-area. */
+#define XPTI_VIRT_START  PERDOMAIN_VIRT_SLOT(3)
+#define XPTI_VA_SHIFT(PAGE_SHIFT + STACK_ORDER)
+#define XPTI_TRAMPOLINE_OFF  (IST_MAX << PAGE_SHIFT)
+#define XPTI_TSS_OFF ((IST_MAX + 2) << PAGE_SHIFT)
+#define XPTI_START(v)(XPTI_VIRT_START + \
+  ((v)->vcpu_id << XPTI_VA_SHIFT))
+#define XPTI_TRAMPOLINE(v)   (XPTI_START(v) + XPTI_TRAMPOLINE_OFF)
+#define XPTI_TSS(v)  (XPTI_START(v) + XPTI_TSS_OFF)
+
 

Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
On Mon, Jan 22, 2018 at 03:31:22AM -0700, Jan Beulich wrote:
> >>> On 19.01.18 at 17:39,  wrote:
> > On Fri, Jan 19, 2018 at 04:29:31PM +, Roger Pau Monné wrote:
> >> On Fri, Jan 19, 2018 at 03:34:56PM +, Wei Liu wrote:
> >> > diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot/build32.mk
> >> > index 48c7407c00..028ac19b96 100644
> >> > --- a/xen/arch/x86/boot/build32.mk
> >> > +++ b/xen/arch/x86/boot/build32.mk
> >> > @@ -36,5 +36,8 @@ CFLAGS := $(filter-out -flto,$(CFLAGS))
> >> >  cmdline.o: cmdline.c $(CMDLINE_DEPS)
> >> >  
> >> >  reloc.o: reloc.c $(RELOC_DEPS)
> >> > +ifeq ($(CONFIG_PVH_GUEST),y)
> >> > +reloc.o: CFLAGS += -DCONFIG_PVH_GUEST
> >> > +endif
> >> 
> >> I would maybe do this above, where the rest of the CFLAGS are set.
> >> Certainly setting -DCONFIG_PVH_GUEST shouldn't cause issues elsewhere.
> >> 
> >> CFLAGS-$(CONFIG_PVH_GUEST) += -DCONFIG_PVH_GUEST
> >> CFLAGS += $(CFLAGS-y)
> >> 
> >> >  .PRECIOUS: %.bin %.lnk
> >> > diff --git a/xen/arch/x86/boot/defs.h b/xen/arch/x86/boot/defs.h
> >> > index 6abdc15446..05921a64a3 100644
> >> > --- a/xen/arch/x86/boot/defs.h
> >> > +++ b/xen/arch/x86/boot/defs.h
> >> > @@ -51,6 +51,9 @@ typedef unsigned short u16;
> >> >  typedef unsigned int u32;
> >> >  typedef unsigned long long u64;
> >> >  typedef unsigned int size_t;
> >> > +typedef u8 uint8_t;
> >> > +typedef u32 uint32_t;
> >> > +typedef u64 uint64_t;
> >> 
> >> Since this list seems to be always expanding, maybe it's better to simply replace
> >> the stdbool.h include above with types.h?
> >> 
> > 
> > I'm in two minds here. My impression is that this wants to be minimal and
> > standalone. The content in types.h is a lot more than we need here.
> 
> Please keep it the (minimal) way you have it.
> 
> >> >  #define U16_MAX ((u16)(~0U))
> >> >  #define UINT_MAX(~0U)
> >> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> >> > index 0f652cea11..614e53081e 100644
> >> > --- a/xen/arch/x86/boot/head.S
> >> > +++ b/xen/arch/x86/boot/head.S
> >> > @@ -414,6 +414,7 @@ __pvh_start:
> >> >  
> >> >  /* Set trampoline_phys to use mfn 1 to avoid having a mapping at VA 0 */
> >> >  movw    $0x1000, sym_esi(trampoline_phys)
> >> > +movl    $0x336ec578, %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax */
> >> 
> >> Hm, if XEN_HVM_START_MAGIC_VALUE cannot be used I would rather prefer
> >> to use (%ebx).
> > 
> > The same reason I didn't include types.h + hvm_start_info.h here.
> > 
> > We can include both to make $XEN_HVM_START_MAGIC_VALUE work. But I think
> > using (%ebx) is better in here.
> 
> I agree (%ebx) is preferable.
> 

To avoid spamming the list with all the other acked patches, here is the
updated patch.

---8<---
From 1ac0afbbc0ecd620c5fba3a03bb084bc4dafc78e Mon Sep 17 00:00:00 2001
From: Wei Liu 
Date: Wed, 17 Jan 2018 18:38:02 +
Subject: [PATCH] x86: relocate pvh_info
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Modify early boot code to relocate pvh info as well, so that we can be
sure __va in __start_xen works.

Signed-off-by: Wei Liu 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 

v2: use XEN_HVM_START_MAGIC_VALUE and switch statement in reloc.
Move header inclusion.

v3: Use (%ebx). Add blank lines.
---
 xen/arch/x86/boot/Makefile   |  7 -
 xen/arch/x86/boot/build32.mk |  3 +++
 xen/arch/x86/boot/defs.h |  3 +++
 xen/arch/x86/boot/head.S | 25 ++
 xen/arch/x86/boot/reloc.c| 62 +++-
 5 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
index c6246c85d2..9fe5b309c5 100644
--- a/xen/arch/x86/boot/Makefile
+++ b/xen/arch/x86/boot/Makefile
@@ -7,10 +7,15 @@ CMDLINE_DEPS = $(DEFS_H_DEPS) video.h
 RELOC_DEPS = $(DEFS_H_DEPS) $(BASEDIR)/include/xen/multiboot.h \
 $(BASEDIR)/include/xen/multiboot2.h
 
+ifeq ($(CONFIG_PVH_GUEST),y)
+RELOC_DEPS += $(BASEDIR)/include/public/arch-x86/hvm/start_info.h
+RELOC_EXTRA = CONFIG_PVH_GUEST=y
+endif
+
 head.o: cmdline.S reloc.S
 
 cmdline.S: cmdline.c $(CMDLINE_DEPS)
$(MAKE) -f build32.mk $@ CMDLINE_DEPS="$(CMDLINE_DEPS)"
 
 reloc.S: reloc.c $(RELOC_DEPS)
-   $(MAKE) -f build32.mk $@ RELOC_DEPS="$(RELOC_DEPS)"
+   $(MAKE) -f build32.mk $@ RELOC_DEPS="$(RELOC_DEPS)" $(RELOC_EXTRA)
diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot/build32.mk
index 48c7407c00..028ac19b96 100644
--- a/xen/arch/x86/boot/build32.mk
+++ b/xen/arch/x86/boot/build32.mk
@@ -36,5 +36,8 @@ CFLAGS := $(filter-out -flto,$(CFLAGS))
 cmdline.o: cmdline.c $(CMDLINE_DEPS)
 
 reloc.o: reloc.c $(RELOC_DEPS)
+ifeq ($(CONFIG_PVH_GUEST),y)
+reloc.o: CFLAGS += -DCONFIG_PVH_GUEST
+endif
 
 .PRECIOUS: %.bin %.lnk
diff --git a/xen/arch/x86/boot/defs.h 

[Xen-devel] [PATCH RFC v2 11/12] x86: modify interrupt handlers to support stack switching

2018-01-22 Thread Juergen Gross
Modify the interrupt handlers to switch stacks on interrupt entry in
case they are running on a per-vcpu stack. The same applies to
returning to the guest: if the context to be loaded is located on a
per-vcpu stack, switch to that stack before returning to the guest.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/x86_64/asm-offsets.c  |  4 
 xen/arch/x86/x86_64/compat/entry.S |  5 -
 xen/arch/x86/x86_64/entry.S| 15 +--
 xen/common/wait.c  |  8 
 xen/include/asm-x86/asm_defns.h| 19 +++
 xen/include/asm-x86/current.h  | 10 +-
 6 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index e136af6b99..0da756e7af 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -137,6 +137,10 @@ void __dummy__(void)
 OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
 OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
 OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
+OFFSET(CPUINFO_stack_bottom_cpu, struct cpu_info, stack_bottom_cpu);
+OFFSET(CPUINFO_flags, struct cpu_info, flags);
+DEFINE(ASM_ON_VCPUSTACK, ON_VCPUSTACK);
+DEFINE(ASM_VCPUSTACK_ACTIVE, VCPUSTACK_ACTIVE);
 DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
 BLANK();
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index abf3fcae48..b8d74e83db 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -19,6 +19,7 @@ ENTRY(entry_int82)
 movl  $HYPERCALL_VECTOR, 4(%rsp)
 SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 mov   %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
 CR4_PV32_RESTORE
 
 GET_CURRENT(bx)
@@ -109,6 +110,7 @@ compat_process_trap:
 /* %rbx: struct vcpu, interrupts disabled */
 ENTRY(compat_restore_all_guest)
 ASSERT_INTERRUPTS_DISABLED
+SWITCH_TO_VCPU_STACK
 mov   $~(X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_VM),%r11d
 and   UREGS_eflags(%rsp),%r11d
 .Lcr4_orig:
@@ -195,7 +197,6 @@ ENTRY(compat_post_handle_exception)
 
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
-sti
 CR4_PV32_RESTORE
 movq  8(%rsp),%rax /* Restore %rax. */
 movq  $FLAT_KERNEL_SS,8(%rsp)
@@ -206,6 +207,8 @@ ENTRY(cstar_enter)
 movl  $TRAP_syscall, 4(%rsp)
 SAVE_ALL
 movq  %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
+sti
 GET_CURRENT(bx)
 movq  VCPU_domain(%rbx),%rcx
 cmpb  $0,DOMAIN_is_32bit_pv(%rcx)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index f7412b87c2..991a8799a9 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -37,6 +37,7 @@ ENTRY(switch_to_kernel)
 /* %rbx: struct vcpu, interrupts disabled */
 restore_all_guest:
 ASSERT_INTERRUPTS_DISABLED
+SWITCH_TO_VCPU_STACK
 RESTORE_ALL
 testw $TRAP_syscall,4(%rsp)
  jz    iret_exit_to_guest
@@ -71,6 +72,7 @@ iret_exit_to_guest:
 ALIGN
 /* No special register assumptions. */
 restore_all_xen:
+SWITCH_TO_VCPU_STACK
 RESTORE_ALL adj=8
 iretq
 
@@ -91,7 +93,6 @@ restore_all_xen:
  * %ss must be saved into the space left by the trampoline.
  */
 ENTRY(lstar_enter)
-sti
 movq  8(%rsp),%rax /* Restore %rax. */
 movq  $FLAT_KERNEL_SS,8(%rsp)
 pushq %r11
@@ -101,6 +102,8 @@ ENTRY(lstar_enter)
 movl  $TRAP_syscall, 4(%rsp)
 SAVE_ALL
 mov   %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
+sti
 GET_CURRENT(bx)
 testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
  jz    switch_to_kernel
@@ -189,7 +192,6 @@ process_trap:
 jmp  test_all_events
 
 ENTRY(sysenter_entry)
-sti
 pushq $FLAT_USER_SS
 pushq $0
 pushfq
@@ -201,6 +203,8 @@ GLOBAL(sysenter_eflags_saved)
 movl  $TRAP_syscall, 4(%rsp)
 SAVE_ALL
 movq  %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
+sti
 GET_CURRENT(bx)
 cmpb  $0,VCPU_sysenter_disables_events(%rbx)
 movq  VCPU_sysenter_addr(%rbx),%rax
@@ -237,6 +241,7 @@ ENTRY(int80_direct_trap)
 movl  $0x80, 4(%rsp)
 SAVE_ALL
 mov   %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
 
 cmpb  $0,untrusted_msi(%rip)
 UNLIKELY_START(ne, msi_check)
@@ -408,6 +413,7 @@ ENTRY(dom_crash_sync_extable)
 ENTRY(common_interrupt)
 SAVE_ALL CLAC
 movq %rsp,%rdi
+SWITCH_FROM_VCPU_STACK
 CR4_PV32_RESTORE
 pushq %rdi
 callq do_IRQ
@@ -430,6 +436,7 @@ ENTRY(page_fault)
 GLOBAL(handle_exception)
 SAVE_ALL CLAC
 movq  %rsp, %rdi
+SWITCH_FROM_VCPU_STACK
 handle_exception_saved:
 GET_CURRENT(bx)
 testb 

[Xen-devel] [PATCH RFC v2 04/12] x86: revert 5784de3e2067ed73efc2fe42e62831e8ae7f46c4

2018-01-22 Thread Juergen Gross
Revert patch "x86: Meltdown band-aid against malicious 64-bit PV
guests" in order to prepare for a final Meltdown mitigation.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/domain.c  |   5 -
 xen/arch/x86/mm.c  |  21 
 xen/arch/x86/smpboot.c | 200 -
 xen/arch/x86/x86_64/asm-offsets.c  |   2 -
 xen/arch/x86/x86_64/compat/entry.S |  11 --
 xen/arch/x86/x86_64/entry.S| 149 +--
 xen/include/asm-x86/asm_defns.h|  30 --
 xen/include/asm-x86/current.h  |  12 ---
 xen/include/asm-x86/processor.h|   1 -
 xen/include/asm-x86/x86_64/page.h  |   5 +-
 10 files changed, 6 insertions(+), 430 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8589d856be..da1bf1a97b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1511,9 +1511,6 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
 {
 unsigned long cr4;
 
-this_cpu(root_pgt)[root_table_offset(PERDOMAIN_VIRT_START)] =
-l4e_from_page(v->domain->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
-
 cr4 = pv_guest_cr4_to_real_cr4(v);
 if ( unlikely(cr4 != read_cr4()) )
 write_cr4(cr4);
@@ -1685,8 +1682,6 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 ASSERT(local_irq_is_enabled());
 
-get_cpu_info()->xen_cr3 = 0;
-
 cpumask_copy(&dirty_mask, next->vcpu_dirty_cpumask);
 /* Allow at most one CPU at a time to be dirty. */
 ASSERT(cpumask_weight(&dirty_mask) <= 1);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c83f5224c1..74cdb6e14d 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3489,7 +3489,6 @@ long do_mmu_update(
 struct vcpu *curr = current, *v = curr;
 struct domain *d = v->domain, *pt_owner = d, *pg_owner;
 mfn_t map_mfn = INVALID_MFN;
-bool sync_guest = false;
 uint32_t xsm_needed = 0;
 uint32_t xsm_checked = 0;
 int rc = put_old_guest_table(curr);
@@ -3653,8 +3652,6 @@ long do_mmu_update(
 break;
 rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-if ( !rc )
-sync_guest = true;
 break;
 
 case PGT_writable_page:
@@ -3759,24 +3756,6 @@ long do_mmu_update(
 if ( va )
 unmap_domain_page(va);
 
-if ( sync_guest )
-{
-/*
- * Force other vCPU-s of the affected guest to pick up L4 entry
- * changes (if any). Issue a flush IPI with empty operation mask to
- * facilitate this (including ourselves waiting for the IPI to
- * actually have arrived). Utilize the fact that FLUSH_VA_VALID is
- * meaningless without FLUSH_CACHE, but will allow to pass the no-op
- * check in flush_area_mask().
- */
-unsigned int cpu = smp_processor_id();
-cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
-
-cpumask_andnot(mask, pt_owner->domain_dirty_cpumask, cpumask_of(cpu));
-if ( !cpumask_empty(mask) )
-flush_area_mask(mask, ZERO_BLOCK_PTR, FLUSH_VA_VALID);
-}
-
 perfc_add(num_page_updates, i);
 
  out:
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 37a7e59760..eebc4e8528 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -328,9 +328,6 @@ void start_secondary(void *unused)
  */
 spin_debug_disable();
 
-get_cpu_info()->xen_cr3 = 0;
-get_cpu_info()->pv_cr3 = __pa(this_cpu(root_pgt));
-
 load_system_tables();
 
 /* Full exception support from here on in. */
@@ -640,187 +637,6 @@ void cpu_exit_clear(unsigned int cpu)
 set_cpu_state(CPU_STATE_DEAD);
 }
 
-static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
-{
-unsigned long linear = (unsigned long)ptr, pfn;
-unsigned int flags;
-l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
- l3_table_offset(linear);
-l2_pgentry_t *pl2e;
-l1_pgentry_t *pl1e;
-
-if ( linear < DIRECTMAP_VIRT_START )
-return 0;
-
-flags = l3e_get_flags(*pl3e);
-ASSERT(flags & _PAGE_PRESENT);
-if ( flags & _PAGE_PSE )
-{
-pfn = (l3e_get_pfn(*pl3e) & ~((1UL << (2 * PAGETABLE_ORDER)) - 1)) |
-  (PFN_DOWN(linear) & ((1UL << (2 * PAGETABLE_ORDER)) - 1));
-flags &= ~_PAGE_PSE;
-}
-else
-{
-pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(linear);
-flags = l2e_get_flags(*pl2e);
-ASSERT(flags & _PAGE_PRESENT);
-if ( flags & _PAGE_PSE )
-{
-pfn = (l2e_get_pfn(*pl2e) & ~((1UL << PAGETABLE_ORDER) - 1)) |
-  (PFN_DOWN(linear) & ((1UL << PAGETABLE_ORDER) - 1));
-flags &= ~_PAGE_PSE;
-}
-else
-{
-pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
-flags 

[Xen-devel] [PATCH RFC v2 05/12] x86: don't access saved user regs via rsp in trap handlers

2018-01-22 Thread Juergen Gross
In order to support switching stacks when entering the hypervisor for
page table isolation, don't use %rsp for accessing the saved user
registers, but do that via %rdi instead.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/x86_64/compat/entry.S |  82 +--
 xen/arch/x86/x86_64/entry.S| 129 +++--
 xen/include/asm-x86/current.h  |  10 ++-
 3 files changed, 134 insertions(+), 87 deletions(-)

diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 3fea54ee9d..abf3fcae48 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,14 +18,14 @@ ENTRY(entry_int82)
 pushq $0
 movl  $HYPERCALL_VECTOR, 4(%rsp)
 SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
+mov   %rsp, %rdi
 CR4_PV32_RESTORE
 
 GET_CURRENT(bx)
 
-mov   %rsp, %rdi
 call  do_entry_int82
 
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %rdi: user_regs */
 ENTRY(compat_test_all_events)
 ASSERT_NOT_IN_ATOMIC
 cli # tests must not race interrupts
@@ -58,20 +58,24 @@ compat_test_guest_events:
 jmp   compat_test_all_events
 
 ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %rdi: user_regs */
 compat_process_softirqs:
 sti
+pushq %rdi
 call  do_softirq
+popq  %rdi
 jmp   compat_test_all_events
 
ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %rdi: user_regs */
 compat_process_mce:
 testb $1 << VCPU_TRAP_MCE,VCPU_async_exception_mask(%rbx)
 jnz   .Lcompat_test_guest_nmi
 sti
 movb $0,VCPU_mce_pending(%rbx)
+pushq %rdi
 call set_guest_machinecheck_trapbounce
+popq  %rdi
 testl %eax,%eax
  jz    compat_test_all_events
 movzbl VCPU_async_exception_mask(%rbx),%edx # save mask for the
@@ -81,13 +85,15 @@ compat_process_mce:
 jmp   compat_process_trap
 
ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %rdi: user_regs */
 compat_process_nmi:
 testb $1 << VCPU_TRAP_NMI,VCPU_async_exception_mask(%rbx)
 jnz  compat_test_guest_events
 sti
 movb  $0,VCPU_nmi_pending(%rbx)
+pushq %rdi
 call  set_guest_nmi_trapbounce
+popq  %rdi
 testl %eax,%eax
  jz    compat_test_all_events
 movzbl VCPU_async_exception_mask(%rbx),%edx # save mask for the
@@ -178,7 +184,7 @@ ENTRY(cr4_pv32_restore)
 xor   %eax, %eax
 ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
+/* %rdx: trap_bounce, %rbx: struct vcpu, %rdi: user_regs */
 ENTRY(compat_post_handle_exception)
 testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
  jz    compat_test_all_events
@@ -199,6 +205,7 @@ ENTRY(cstar_enter)
 pushq $0
 movl  $TRAP_syscall, 4(%rsp)
 SAVE_ALL
+movq  %rsp, %rdi
 GET_CURRENT(bx)
 movq  VCPU_domain(%rbx),%rcx
 cmpb  $0,DOMAIN_is_32bit_pv(%rcx)
@@ -211,13 +218,15 @@ ENTRY(cstar_enter)
 testl $~3,%esi
 leal  (,%rcx,TBF_INTERRUPT),%ecx
 UNLIKELY_START(z, compat_syscall_gpf)
-movq  VCPU_trap_ctxt(%rbx),%rdi
-movl  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
-subl  $2,UREGS_rip(%rsp)
+pushq %rcx
+movq  VCPU_trap_ctxt(%rbx),%rcx
+movl  $TRAP_gp_fault,UREGS_entry_vector(%rdi)
+subl  $2,UREGS_rip(%rdi)
 movl  $0,TRAPBOUNCE_error_code(%rdx)
-movl  TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_eip(%rdi),%eax
-movzwl TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_cs(%rdi),%esi
-testb $4,TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_flags(%rdi)
+movl  TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_eip(%rcx),%eax
+movzwl TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_cs(%rcx),%esi
+testb $4,TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_flags(%rcx)
+popq  %rcx
 setnz %cl
 leal  TBF_EXCEPTION|TBF_EXCEPTION_ERRCODE(,%rcx,TBF_INTERRUPT),%ecx
 UNLIKELY_END(compat_syscall_gpf)
@@ -229,12 +238,12 @@ UNLIKELY_END(compat_syscall_gpf)
 ENTRY(compat_sysenter)
 CR4_PV32_RESTORE
 movq  VCPU_trap_ctxt(%rbx),%rcx
-cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rsp)
+cmpb  $TRAP_gp_fault,UREGS_entry_vector(%rdi)
 movzwl VCPU_sysenter_sel(%rbx),%eax
 movzwl TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_cs(%rcx),%ecx
 cmovel %ecx,%eax
 testl $~3,%eax
-movl  $FLAT_COMPAT_USER_SS,UREGS_ss(%rsp)
+movl  $FLAT_COMPAT_USER_SS,UREGS_ss(%rdi)
 cmovzl %ecx,%eax
 movw  %ax,TRAPBOUNCE_cs(%rdx)
 call  compat_create_bounce_frame
@@ -247,26 +256,27 @@ ENTRY(compat_int80_direct_trap)
 
 /* CREATE A BASIC EXCEPTION FRAME ON GUEST OS (RING-1) STACK:*/
 /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}  

Re: [Xen-devel] further post-Meltdown-bad-aid performance thoughts

2018-01-22 Thread George Dunlap
On 01/22/2018 09:25 AM, Jan Beulich wrote:
 On 19.01.18 at 18:00,  wrote:
>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>> On 19.01.18 at 16:43,  wrote:
 So what if instead of trying to close the "windows", we made it so that
 there was nothing through the windows to see?  If no matter what the
 hypervisor speculatively executed, nothing sensitive was visibile except
 what a vcpu was already allowed to see,
>>>
>>> I think you didn't finish your sentence here, but I also think I
>>> can guess the missing part. There's a price to pay for such an
>>> approach though - iterating over domains, or vCPU-s of a
>>> domain (just as an example) wouldn't be simple list walks
>>> anymore. There are certainly other things. IOW - yes, and
>>> approach like this seems possible, but with all the lost
>>> performance I think we shouldn't go overboard with further
>>> hiding.
>>
>> Right, so the next question: what information *from other guests* are
>> sensitive?
>>
>> Obviously the guest registers are sensitive.  But how much of the
>> information in vcpu struct that we actually need to have "to hand" is
>> actually sensitive information that we need to hide from other VMs?
> 
> None, I think. But that's not the main aspect here. struct vcpu
> instances come and go, which would mean we'd have to
> permanently update what is or is not being exposed in the page
> tables used. This, while solvable, is going to be a significant
> burden in terms of synchronizing page tables (if we continue to
> use per-CPU ones) and/or TLB shootdown. Whereas if only the
> running vCPU's structure (and it's struct domain) are exposed,
> no such synchronization is needed (things would simply be
> updated during context switch).

I'm not sure we're actually communicating.

Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
under Xen still have access to all of host memory.  To protect against
SP3, we remove almost all Xen memory from the address space before
switching to the guest.

What I'm proposing is something like this:

* We have a "global" region of Xen memory that is mapped by all
processors.  This will contain everything we consider not sensitive;
including Xen text segments, and most domain and vcpu data.  But it will
*not* map all of host memory, nor have access to sensitive data, such as
vcpu register state.

* We have per-cpu "local" regions.  In this region we will map,
on-demand, guest memory which is needed to perform current operations.
(We can consider how strictly we need to unmap memory after using it.)
We will also map the current vcpu's registers.

* On entry to a 64-bit PV guest, we don't change the mapping at all.

Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
can only access its own RAM and registers.  There's no extra overhead to
context switching into or out of the hypervisor.

Given that, I don't understand what the following comments mean:

"There's a price to pay for such an approach though - iterating over
domains, or vCPU-s of a domain (just as an example) wouldn't be simple
list walks anymore."

If we remove sensitive information from the domain and vcpu structs,
then any bit of hypervisor code can iterate over domain and vcpu structs
at will; only if they actually need to read or write sensitive data will
they have to perform an expensive map/unmap operation.  But in general,
to read another vcpu's registers you already need to do a vcpu_pause() /
vcpu_unpause(), which involves at least two IPIs (with one
spin-and-wait), so it doesn't seem like that should add a lot of extra
overhead.

"struct vcpu instances come and go, which would mean we'd have to
permanently update what is or is not being exposed in the page tables
used. This, while solvable, is going to be a significant burden in terms
of synchronizing page tables (if we continue to use per-CPU ones) and/or
TLB shootdown."

I don't understand what this is referring to in my proposed plan above.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH RFC v2 12/12] x86: activate per-vcpu stacks in case of xpti

2018-01-22 Thread Juergen Gross
When scheduling a vcpu subject to XPTI, activate the per-vcpu stacks
by loading the vcpu-specific GDT and TSS. When de-scheduling such a
vcpu, switch back to the per-physical-cpu GDT and TSS.

Accessing the user registers on the stack is done via helpers, as
depending on whether XPTI is active the registers are located either
on the per-vcpu stack or on the default stack.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/domain.c  | 76 +++---
 xen/arch/x86/pv/domain.c   | 34 +++--
 xen/include/asm-x86/desc.h |  5 +++
 xen/include/asm-x86/regs.h |  2 +
 4 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da1bf1a97b..d75234ca35 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1585,9 +1585,28 @@ static inline bool need_full_gdt(const struct domain *d)
 return is_pv_domain(d) && !is_idle_domain(d);
 }
 
+static void copy_user_regs_from_stack(struct vcpu *v)
+{
+struct cpu_user_regs *stack_regs;
+
+stack_regs = (is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti)
+ ? v->arch.pv_vcpu.stack_regs
+ : &get_cpu_info()->guest_cpu_user_regs;
+memcpy(>arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
+}
+
+static void copy_user_regs_to_stack(struct vcpu *v)
+{
+struct cpu_user_regs *stack_regs;
+
+stack_regs = (is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti)
+ ? v->arch.pv_vcpu.stack_regs
+ : &get_cpu_info()->guest_cpu_user_regs;
+memcpy(stack_regs, >arch.user_regs, CTXT_SWITCH_STACK_BYTES);
+}
+
 static void __context_switch(void)
 {
-struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
 unsigned int  cpu = smp_processor_id();
 struct vcpu  *p = per_cpu(curr_vcpu, cpu);
 struct vcpu  *n = current;
@@ -1600,7 +1619,7 @@ static void __context_switch(void)
 
 if ( !is_idle_domain(pd) )
 {
-memcpy(>arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
+copy_user_regs_from_stack(p);
 vcpu_save_fpu(p);
 pd->arch.ctxt_switch->from(p);
 }
@@ -1616,7 +1635,7 @@ static void __context_switch(void)
 
 if ( !is_idle_domain(nd) )
 {
-memcpy(stack_regs, >arch.user_regs, CTXT_SWITCH_STACK_BYTES);
+copy_user_regs_to_stack(n);
 if ( cpu_has_xsave )
 {
 u64 xcr0 = n->arch.xcr0 ?: XSTATE_FP_SSE;
@@ -1635,7 +1654,7 @@ static void __context_switch(void)
 
 gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) :
 per_cpu(compat_gdt_table, cpu);
-if ( need_full_gdt(nd) )
+if ( need_full_gdt(nd) && !nd->arch.pv_domain.xpti )
 {
 unsigned long mfn = virt_to_mfn(gdt);
 l1_pgentry_t *pl1e = pv_gdt_ptes(n);
@@ -1647,23 +1666,68 @@ static void __context_switch(void)
 }
 
 if ( need_full_gdt(pd) &&
- ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
+ ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd) ||
+  pd->arch.pv_domain.xpti) )
 {
 gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
 gdt_desc.base  = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY);
 
+if ( pd->arch.pv_domain.xpti )
+_set_tssldt_type(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
+ SYS_DESC_tss_avail);
+
 lgdt(&gdt_desc);
+
+if ( pd->arch.pv_domain.xpti )
+{
+unsigned long stub_va = this_cpu(stubs.addr);
+
+ltr(TSS_ENTRY << 3);
+get_cpu_info()->flags &= ~VCPUSTACK_ACTIVE;
+wrmsrl(MSR_LSTAR, stub_va);
+wrmsrl(MSR_CSTAR, stub_va + STUB_TRAMPOLINE_SIZE_PERCPU);
+if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
+wrmsrl(MSR_IA32_SYSENTER_ESP,
+   (unsigned long)_cpu_info()->guest_cpu_user_regs.es);
+}
 }
 
 write_ptbase(n);
 
 if ( need_full_gdt(nd) &&
- ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
+ ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd) ||
+  nd->arch.pv_domain.xpti) )
 {
 gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
 gdt_desc.base = GDT_VIRT_START(n);
 
+if ( nd->arch.pv_domain.xpti )
+{
+struct cpu_info *info;
+
+gdt = (struct desc_struct *)GDT_VIRT_START(n);
+gdt[PER_CPU_GDT_ENTRY].a = cpu;
+_set_tssldt_type(gdt + TSS_ENTRY, SYS_DESC_tss_avail);
+info = (struct cpu_info *)(XPTI_START(n) + STACK_SIZE) - 1;
+info->stack_bottom_cpu = (unsigned long)guest_cpu_user_regs();
+}
+
 lgdt(&gdt_desc);
+
+if ( nd->arch.pv_domain.xpti )
+{
+unsigned long stub_va = XPTI_TRAMPOLINE(n);
+
+ltr(TSS_ENTRY << 3);
+

[Xen-devel] [PATCH RFC v2 01/12] x86: cleanup processor.h

2018-01-22 Thread Juergen Gross
Remove NSC/Cyrix CPU macros and current_text_addr() which are used
nowhere.

Signed-off-by: Juergen Gross 
---
 xen/include/asm-x86/processor.h | 41 -
 1 file changed, 41 deletions(-)

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 9dd29bb04c..e8c2f02e99 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -102,16 +102,6 @@
 struct domain;
 struct vcpu;
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({  \
-void *pc;   \
-asm ( "leaq 1f(%%rip),%0\n1:" : "=r" (pc) );\
-pc; \
-})
-
 struct x86_cpu_id {
 uint16_t vendor;
 uint16_t family;
@@ -375,37 +365,6 @@ static inline bool_t read_pkru_wd(uint32_t pkru, unsigned 
int pkey)
 return (pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1;
 }
 
-/*
- *  NSC/Cyrix CPU configuration register indexes
- */
-
-#define CX86_PCR0 0x20
-#define CX86_GCR  0xb8
-#define CX86_CCR0 0xc0
-#define CX86_CCR1 0xc1
-#define CX86_CCR2 0xc2
-#define CX86_CCR3 0xc3
-#define CX86_CCR4 0xe8
-#define CX86_CCR5 0xe9
-#define CX86_CCR6 0xea
-#define CX86_CCR7 0xeb
-#define CX86_PCR1 0xf0
-#define CX86_DIR0 0xfe
-#define CX86_DIR1 0xff
-#define CX86_ARR_BASE 0xc4
-#define CX86_RCR_BASE 0xdc
-
-/*
- *  NSC/Cyrix CPU indexed register access macros
- */
-
-#define getCx86(reg) ({ outb((reg), 0x22); inb(0x23); })
-
-#define setCx86(reg, data) do { \
-outb((reg), 0x22); \
-outb((data), 0x23); \
-} while (0)
-
 static always_inline void __monitor(const void *eax, unsigned long ecx,
 unsigned long edx)
 {
-- 
2.13.6



[Xen-devel] [PATCH RFC v2 02/12] x86: don't use hypervisor stack size for dumping guest stacks

2018-01-22 Thread Juergen Gross
show_guest_stack() and compat_show_guest_stack() stop dumping the
guest's stack whenever its virtual address crosses an alignment
boundary of the size used for the hypervisor stacks.

Remove this arbitrary limit and try to dump a fixed number of lines
instead.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/traps.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a3e8f0c9b9..1115b69050 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -191,7 +191,8 @@ static void compat_show_guest_stack(struct vcpu *v,
 const struct cpu_user_regs *regs,
 int debug_stack_lines)
 {
-unsigned int i, *stack, addr, mask = STACK_SIZE;
+unsigned int i, *stack, addr;
+unsigned long last_addr = -1L;
 
 stack = (unsigned int *)(unsigned long)regs->esp;
 printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
@@ -220,13 +221,13 @@ static void compat_show_guest_stack(struct vcpu *v,
 printk("Inaccessible guest memory.\n");
 return;
 }
-mask = PAGE_SIZE;
+last_addr = round_pgup((unsigned long)stack);
 }
 }
 
 for ( i = 0; i < debug_stack_lines * 8; i++ )
 {
-if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+if ( (unsigned long)stack >= last_addr )
 break;
 if ( __get_user(addr, stack) )
 {
@@ -241,11 +242,9 @@ static void compat_show_guest_stack(struct vcpu *v,
 printk(" %08x", addr);
 stack++;
 }
-if ( mask == PAGE_SIZE )
-{
-BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+if ( last_addr != -1L )
 unmap_domain_page(stack);
-}
+
 if ( i == 0 )
 printk("Stack empty.");
 printk("\n");
@@ -254,8 +253,7 @@ static void compat_show_guest_stack(struct vcpu *v,
 static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 {
 int i;
-unsigned long *stack, addr;
-unsigned long mask = STACK_SIZE;
+unsigned long *stack, addr, last_addr = -1L;
 
 /* Avoid HVM as we don't know what the stack looks like. */
 if ( is_hvm_vcpu(v) )
@@ -290,13 +288,13 @@ void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 printk("Inaccessible guest memory.\n");
 return;
 }
-mask = PAGE_SIZE;
+last_addr = round_pgup((unsigned long)stack);
 }
 }
 
 for ( i = 0; i < (debug_stack_lines*stack_words_per_line); i++ )
 {
-if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+if ( (unsigned long)stack >= last_addr )
 break;
 if ( __get_user(addr, stack) )
 {
@@ -311,11 +309,9 @@ void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 printk(" %p", _p(addr));
 stack++;
 }
-if ( mask == PAGE_SIZE )
-{
-BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+if ( last_addr != -1L )
 unmap_domain_page(stack);
-}
+
 if ( i == 0 )
 printk("Stack empty.");
 printk("\n");
-- 
2.13.6



[Xen-devel] [PATCH RFC v2 07/12] x86: allow per-domain mappings without NX bit or with specific mfn

2018-01-22 Thread Juergen Gross
For support of per-vcpu stacks we need per-vcpu trampolines. To be
able to put those into the per-domain mappings, the upper-level page
tables of the per-domain mappings must not have NX set.

In order to be able to reset the NX bit for a per-domain mapping add
a helper flipflags_perdomain_mapping() for flipping page table flags
of a specific mapped page.

To be able to use a page from xen heap for the last per-vcpu stack
page add a helper to map an arbitrary mfn in the perdomain area.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/mm.c| 81 ++--
 xen/include/asm-x86/mm.h |  3 ++
 2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 74cdb6e14d..ab990cc667 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1568,7 +1568,7 @@ void init_xen_l4_slots(l4_pgentry_t *l4t, mfn_t l4mfn,
 
 /* Slot 260: Per-domain mappings (if applicable). */
 l4t[l4_table_offset(PERDOMAIN_VIRT_START)] =
-d ? l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW)
+d ? l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR)
   : l4e_empty();
 
 /* Slot 261-: text/data/bss, RW M2P, vmap, frametable, directmap. */
@@ -5269,7 +5269,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
 }
 l2tab = __map_domain_page(pg);
 clear_page(l2tab);
-l3tab[l3_table_offset(va)] = l3e_from_page(pg, __PAGE_HYPERVISOR_RW);
+l3tab[l3_table_offset(va)] = l3e_from_page(pg, __PAGE_HYPERVISOR);
 }
 else
 l2tab = map_l2t_from_l3e(l3tab[l3_table_offset(va)]);
@@ -5311,7 +5311,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
 l1tab = __map_domain_page(pg);
 }
 clear_page(l1tab);
-*pl2e = l2e_from_page(pg, __PAGE_HYPERVISOR_RW);
+*pl2e = l2e_from_page(pg, __PAGE_HYPERVISOR);
 }
 else if ( !l1tab )
 l1tab = map_l1t_from_l2e(*pl2e);
@@ -5401,6 +5401,81 @@ void destroy_perdomain_mapping(struct domain *d, unsigned long va,
 unmap_domain_page(l3tab);
 }
 
+void flipflags_perdomain_mapping(struct domain *d, unsigned long va,
+ unsigned int flags)
+{
+const l3_pgentry_t *l3tab, *pl3e;
+
+ASSERT(va >= PERDOMAIN_VIRT_START &&
+   va < PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS));
+
+if ( !d->arch.perdomain_l3_pg )
+return;
+
+l3tab = __map_domain_page(d->arch.perdomain_l3_pg);
+pl3e = l3tab + l3_table_offset(va);
+
+if ( l3e_get_flags(*pl3e) & _PAGE_PRESENT )
+{
+const l2_pgentry_t *l2tab = map_l2t_from_l3e(*pl3e);
+const l2_pgentry_t *pl2e = l2tab + l2_table_offset(va);
+
+if ( l2e_get_flags(*pl2e) & _PAGE_PRESENT )
+{
+l1_pgentry_t *l1tab = map_l1t_from_l2e(*pl2e);
+unsigned int off = l1_table_offset(va);
+
+if ( (l1e_get_flags(l1tab[off]) & (_PAGE_PRESENT | _PAGE_AVAIL0)) ==
+ (_PAGE_PRESENT | _PAGE_AVAIL0) )
+l1e_flip_flags(l1tab[off], flags);
+
+unmap_domain_page(l1tab);
+}
+
+unmap_domain_page(l2tab);
+}
+
+unmap_domain_page(l3tab);
+}
+
+void addmfn_to_perdomain_mapping(struct domain *d, unsigned long va, mfn_t mfn)
+{
+const l3_pgentry_t *l3tab, *pl3e;
+
+ASSERT(va >= PERDOMAIN_VIRT_START &&
+   va < PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS));
+
+if ( !d->arch.perdomain_l3_pg )
+return;
+
+l3tab = __map_domain_page(d->arch.perdomain_l3_pg);
+pl3e = l3tab + l3_table_offset(va);
+
+if ( l3e_get_flags(*pl3e) & _PAGE_PRESENT )
+{
+const l2_pgentry_t *l2tab = map_l2t_from_l3e(*pl3e);
+const l2_pgentry_t *pl2e = l2tab + l2_table_offset(va);
+
+if ( l2e_get_flags(*pl2e) & _PAGE_PRESENT )
+{
+l1_pgentry_t *l1tab = map_l1t_from_l2e(*pl2e);
+unsigned int off = l1_table_offset(va);
+
+if ( (l1e_get_flags(l1tab[off]) & (_PAGE_PRESENT | _PAGE_AVAIL0)) ==
+ (_PAGE_PRESENT | _PAGE_AVAIL0) )
+free_domheap_page(l1e_get_page(l1tab[off]));
+
+l1tab[off] = l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW);
+
+unmap_domain_page(l1tab);
+}
+
+unmap_domain_page(l2tab);
+}
+
+unmap_domain_page(l3tab);
+}
+
 void free_perdomain_mappings(struct domain *d)
 {
 l3_pgentry_t *l3tab;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3013c266fe..fa158bd96a 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -582,6 +582,9 @@ int create_perdomain_mapping(struct domain *, unsigned long va,
  struct page_info **);
 void destroy_perdomain_mapping(struct domain *, unsigned long va,
unsigned int nr);
+void flipflags_perdomain_mapping(struct domain *d, 

[Xen-devel] [PATCH RFC v2 08/12] xen/x86: use dedicated function for tss initialization

2018-01-22 Thread Juergen Gross
Carve out the TSS initialization from load_system_tables().

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/cpu/common.c| 56 
 xen/include/asm-x86/system.h |  1 +
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4306e59650..f9ec05c3ee 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -702,6 +702,35 @@ void __init early_cpu_init(void)
early_cpu_detect();
 }
 
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom)
+{
+   unsigned long stack_top = stack_bottom & ~(STACK_SIZE - 1);
+
+   *tss = (struct tss_struct){
+   /* Main stack for interrupts/exceptions. */
+   .rsp0 = stack_bottom,
+
+   /* Ring 1 and 2 stacks poisoned. */
+   .rsp1 = 0x8600ul,
+   .rsp2 = 0x8600ul,
+
+   /*
+* MCE, NMI and Double Fault handlers get their own stacks.
+* All others poisoned.
+*/
+   .ist = {
+   [IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
+   [IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
+   [IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+
+   [IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
+   0x8600ul,
+   },
+
+   .bitmap = IOBMP_INVALID_OFFSET,
+   };
+}
+
 /*
  * Sets up system tables and descriptors.
  *
@@ -713,8 +742,7 @@ void __init early_cpu_init(void)
 void load_system_tables(void)
 {
unsigned int cpu = smp_processor_id();
-   unsigned long stack_bottom = get_stack_bottom(),
-   stack_top = stack_bottom & ~(STACK_SIZE - 1);
+   unsigned long stack_bottom = get_stack_bottom();
 
	struct tss_struct *tss = &this_cpu(init_tss);
struct desc_struct *gdt =
@@ -731,29 +759,7 @@ void load_system_tables(void)
.limit = (IDT_ENTRIES * sizeof(idt_entry_t)) - 1,
};
 
-   *tss = (struct tss_struct){
-   /* Main stack for interrupts/exceptions. */
-   .rsp0 = stack_bottom,
-
-   /* Ring 1 and 2 stacks poisoned. */
-   .rsp1 = 0x8600ul,
-   .rsp2 = 0x8600ul,
-
-   /*
-* MCE, NMI and Double Fault handlers get their own stacks.
-* All others poisoned.
-*/
-   .ist = {
-   [IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
-   [IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
-   [IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
-
-   [IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
-   0x8600ul,
-   },
-
-   .bitmap = IOBMP_INVALID_OFFSET,
-   };
+   tss_init(tss, stack_bottom);
 
_set_tssldt_desc(
gdt + TSS_ENTRY,
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 8ac170371b..2cf50d1d49 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -230,6 +230,7 @@ static inline int local_irq_is_enabled(void)
 
 void trap_init(void);
 void init_idt_traps(void);
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom);
 void load_system_tables(void);
 void percpu_traps_init(void);
 void subarch_percpu_traps_init(void);
-- 
2.13.6



[Xen-devel] [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains

2018-01-22 Thread Juergen Gross
As a preparation for doing page table isolation in the Xen hypervisor
in order to mitigate "Meltdown", use dedicated stacks, GDT and TSS for
64-bit PV domains, mapped to the per-domain virtual area.

The per-vcpu stacks are used for early interrupt handling only. After
saving the domain's registers, stacks are switched back to the normal
per-physical-cpu ones in order to be able to address on-stack data
from other cpus, e.g. while handling IPIs.

Adding %cr3 switching between saving of the registers and switching
the stacks will make it possible to run guest code without any
per-physical-cpu mapping, i.e. avoiding the threat of a guest being
able to access other domains' data.

Without any further measures it will still be possible for e.g. a
guest's user program to read stack data of another vcpu of the same
domain, but this can easily be avoided by a small PV-ABI modification
introducing per-cpu user address spaces.

This series is meant as a replacement for Andrew's patch series:
"x86: Prerequisite work for a Xen KAISER solution".

What needs to be done:
- verify livepatching is still working
- performance evaluation (Dario is working on it)
- the real page table switching


Changes since RFC V1:
- switch back to per physical cpu stacks in interrupt handling
- complete rework of series
- rebase to current staging
- adding reverts of Jan's band-aid patches
- adding two minor cleanups at the begin of the series
- done much more testing, including NMIs

Juergen Gross (12):
  x86: cleanup processor.h
  x86: don't use hypervisor stack size for dumping guest stacks
  x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873
  x86: revert 5784de3e2067ed73efc2fe42e62831e8ae7f46c4
  x86: don't access saved user regs via rsp in trap handlers
  x86: add a xpti command line parameter
  x86: allow per-domain mappings without NX bit or with specific mfn
  xen/x86: use dedicated function for tss initialization
  x86: enhance syscall stub to work in per-domain mapping
  x86: allocate per-vcpu stacks for interrupt entries
  x86: modify interrupt handlers to support stack switching
  x86: activate per-vcpu stacks in case of xpti

 docs/misc/xen-command-line.markdown |  16 +-
 xen/arch/x86/cpu/common.c   |  56 ---
 xen/arch/x86/domain.c   |  84 --
 xen/arch/x86/mm.c   | 102 ++---
 xen/arch/x86/pv/domain.c| 161 +++-
 xen/arch/x86/smpboot.c  | 211 --
 xen/arch/x86/traps.c|  26 ++--
 xen/arch/x86/x86_64/asm-offsets.c   |   6 +-
 xen/arch/x86/x86_64/compat/entry.S  |  98 ++--
 xen/arch/x86/x86_64/entry.S | 295 
 xen/arch/x86/x86_64/traps.c |  47 +++---
 xen/common/wait.c   |   8 +-
 xen/include/asm-x86/asm_defns.h |  49 +++---
 xen/include/asm-x86/config.h|  13 +-
 xen/include/asm-x86/current.h   |  71 ++---
 xen/include/asm-x86/desc.h  |   5 +
 xen/include/asm-x86/domain.h|   5 +
 xen/include/asm-x86/mm.h|   3 +
 xen/include/asm-x86/processor.h |  42 -
 xen/include/asm-x86/regs.h  |   2 +
 xen/include/asm-x86/system.h|   8 +
 xen/include/asm-x86/x86_64/page.h   |   5 +-
 22 files changed, 647 insertions(+), 666 deletions(-)

-- 
2.13.6



[Xen-devel] [PATCH RFC v2 09/12] x86: enhance syscall stub to work in per-domain mapping

2018-01-22 Thread Juergen Gross
Use an indirect jump via a register in case the target address isn't
reachable via a 32-bit relative jump.

Add macros for the stub size and use those, instead of returning the size
when writing the stub trampoline, in order to support easy switching
between different-sized stubs.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/x86_64/traps.c  | 47 +---
 xen/include/asm-x86/system.h |  7 +++
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 3652f5ff21..b4836f623c 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -260,10 +260,11 @@ void do_double_fault(struct cpu_user_regs *regs)
 panic("DOUBLE FAULT -- system shutdown");
 }
 
-static unsigned int write_stub_trampoline(
-unsigned char *stub, unsigned long stub_va,
-unsigned long stack_bottom, unsigned long target_va)
+void write_stub_trampoline(unsigned char *stub, unsigned long stub_va,
+   unsigned long stack_bottom, unsigned long target_va)
 {
+long target_diff;
+
 /* movabsq %rax, stack_bottom - 8 */
 stub[0] = 0x48;
 stub[1] = 0xa3;
@@ -282,24 +283,32 @@ static unsigned int write_stub_trampoline(
 /* pushq %rax */
 stub[23] = 0x50;
 
-/* jmp target_va */
-stub[24] = 0xe9;
-*(int32_t *)&stub[25] = target_va - (stub_va + 29);
-
-/* Round up to a multiple of 16 bytes. */
-return 32;
+target_diff = target_va - (stub_va + 29);
+if ( target_diff >> 31 == target_diff >> 63 )
+{
+/* jmp target_va */
+stub[24] = 0xe9;
+*(int32_t *)&stub[25] = target_diff;
+}
+else
+{
+/* movabs target_va, %rax */
+stub[24] = 0x48;
+stub[25] = 0xb8;
+*(uint64_t *)&stub[26] = target_va;
+/* jmpq *%rax */
+stub[34] = 0xff;
+stub[35] = 0xe0;
+}
 }
 
 DEFINE_PER_CPU(struct stubs, stubs);
-void lstar_enter(void);
-void cstar_enter(void);
 
 void subarch_percpu_traps_init(void)
 {
 unsigned long stack_bottom = get_stack_bottom();
 unsigned long stub_va = this_cpu(stubs.addr);
 unsigned char *stub_page;
-unsigned int offset;
 
 /* IST_MAX IST pages + 1 syscall page + 1 guard page + primary stack. */
 BUILD_BUG_ON((IST_MAX + 2) * PAGE_SIZE + PRIMARY_STACK_SIZE > STACK_SIZE);
@@ -312,10 +321,9 @@ void subarch_percpu_traps_init(void)
  * start of the stubs.
  */
 wrmsrl(MSR_LSTAR, stub_va);
-offset = write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
-   stub_va, stack_bottom,
-   (unsigned long)lstar_enter);
-stub_va += offset;
+write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK), stub_va,
+  stack_bottom, (unsigned long)lstar_enter);
+stub_va += STUB_TRAMPOLINE_SIZE_PERCPU;
 
 if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
  boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
@@ -328,12 +336,11 @@ void subarch_percpu_traps_init(void)
 
 /* Trampoline for SYSCALL entry from compatibility mode. */
 wrmsrl(MSR_CSTAR, stub_va);
-offset += write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
-stub_va, stack_bottom,
-(unsigned long)cstar_enter);
+write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK), stub_va,
+  stack_bottom, (unsigned long)cstar_enter);
 
 /* Don't consume more than half of the stub space here. */
-ASSERT(offset <= STUB_BUF_SIZE / 2);
+ASSERT(2 * STUB_TRAMPOLINE_SIZE_PERCPU <= STUB_BUF_SIZE / 2);
 
 unmap_domain_page(stub_page);
 
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 2cf50d1d49..c5baf7c991 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -231,6 +231,13 @@ static inline int local_irq_is_enabled(void)
 void trap_init(void);
 void init_idt_traps(void);
 void tss_init(struct tss_struct *tss, unsigned long stack_bottom);
+void write_stub_trampoline(unsigned char *stub, unsigned long stub_va,
+   unsigned long stack_bottom,
+   unsigned long target_va);
+#define STUB_TRAMPOLINE_SIZE_PERCPU   32
+#define STUB_TRAMPOLINE_SIZE_PERVCPU  64
+void lstar_enter(void);
+void cstar_enter(void);
 void load_system_tables(void);
 void percpu_traps_init(void);
 void subarch_percpu_traps_init(void);
-- 
2.13.6



Re: [Xen-devel] [PATCH] x86/shutdown: Use ACPI reboot method for Dell PowerEdge R740

2018-01-22 Thread Jan Beulich
>>> On 19.01.18 at 17:57,  wrote:
> --- a/xen/arch/x86/shutdown.c
> +++ b/xen/arch/x86/shutdown.c
> @@ -511,6 +511,15 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = {
>  DMI_MATCH(DMI_PRODUCT_NAME, "Latitude E6520"),
>  },
>  },
> +{/* Handle problems with rebooting on Dell PowerEdge R740. */
> +.callback = override_reboot,
> +.driver_data = (void *)(long)BOOT_ACPI,
> +.ident = "Dell PowerEdge R740",
> +.matches = {
> +DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
> +DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R740"),
> +},
> +},

Judging from the description you don't really want or need to
override the reboot method if not running under EFI, or if there
was an override on the command line already. override_reboot(),
however, overrides everything and under all circumstances. I
therefore think you may want to introduce a new callback
function.

As an aside - how come the page at address zero is actually
mapped at the time of the reboot attempt?

Jan



Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Wei Liu
On Fri, Jan 19, 2018 at 04:29:31PM +, Roger Pau Monné wrote:
> On Fri, Jan 19, 2018 at 03:34:56PM +, Wei Liu wrote:
> > diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot/build32.mk
> > index 48c7407c00..028ac19b96 100644
> > --- a/xen/arch/x86/boot/build32.mk
> > +++ b/xen/arch/x86/boot/build32.mk
> > @@ -36,5 +36,8 @@ CFLAGS := $(filter-out -flto,$(CFLAGS))
> >  cmdline.o: cmdline.c $(CMDLINE_DEPS)
> >  
> >  reloc.o: reloc.c $(RELOC_DEPS)
> > +ifeq ($(CONFIG_PVH_GUEST),y)
> > +reloc.o: CFLAGS += -DCONFIG_PVH_GUEST
> > +endif
> 
> I would maybe do this above, where the rest of the CFLAGS are set.
> Certainly setting -DCONFIG_PVH_GUEST shouldn't cause issues elsewhere.
> 
> CFLAGS-$(CONFIG_PVH_GUEST) += -DCONFIG_PVH_GUEST
> CFLAGS += $(CFLAGS-y)
> 

Missed this one.

I would rather only have -DCONFIG_PVH_GUEST for the file that needs it.
Let me know if you feel strongly about this.

Wei.


Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

2018-01-22 Thread Jan Beulich
>>> On 22.01.18 at 12:42,  wrote:
> On 19/01/18 13:51, Jan Beulich wrote:
> On 19.01.18 at 14:36,  wrote:
>>> On 19/01/18 11:43, Jan Beulich wrote:
>>> On 18.01.18 at 16:46,  wrote:
> @@ -729,6 +760,9 @@ ENTRY(nmi)
>  handle_ist_exception:
>  SAVE_ALL CLAC
>  
> +SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
> +/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 Following my considerations towards alternative patching to
 eliminate as much overhead as possible from the Meltdown
 band-aid in case it is being disabled, I'm rather hesitant to see any
 patchable code being introduced into the NMI/#MC entry paths
 without the patching logic first being made safe in this regard.
 Exceptions coming here aren't very frequent (except perhaps on
 hardware about to die), so the path isn't performance critical.
 Therefore I think we should try to avoid any patching here, and
 just conditionals instead. This in fact is one of the reasons why I
 didn't want to macro-ize the assembly additions done in the
 Meltdown band-aid.

 I do realize that this then also affects the exit-to-Xen path,
 which I agree is less desirable to use conditionals on.
>>> While I agree that our lack of IST-safe patching is a problem, these
>>> alternative points are already present on the NMI and MCE paths, and
>>> need to be.  As a result, the DF handler is in no worse of a position. 
>>> As a perfect example, observe the CLAC in context.
>> Oh, indeed. We should change that.
>>
>>> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
>>> which doesn't use alternatives (but IMO this is pointless in the
>>> presence of CLAC), but still don't think it is reasonable to treat DF
>>> differently to NMI/MCE.
>> #DF is debatable: On one hand I can see that if things go wrong,
>> it can equally be raised at any time. Otoh #MC and even more so
>> NMI can be raised _without_ things going (fatally) wrong, i.e. the
>> patching may break a boot which would otherwise have succeeded
>> (whereas the #DF would make the boot fail anyway).
> 
> I don't see a conclusion here, or a reason for treating #DF differently
> to NMI or #MC.

Odd - I thought my reply was pretty clear in this regard. I have
no good idea how to word it differently. Furthermore the goal
of the reply was not to settle on how to treat #DF, but to try
to convince you to avoid adding more patch points to the NMI /
#MC path (if you want #DF treated similarly, I wouldn't
object patching to be avoided there too).

> There is currently a very very slim race on boot where an NMI or #MC
> hitting the main application of alternatives may cause Xen to explode. 
> This has been the case since alternatives were introduced, and this
> patch doesn't make the problem meaningfully worse.

SMAP patching affects 3 bytes (and I'm intending to put together a
patch removing that patching from the NMI / #MC path), while you
add patching of quite a few more bytes, increasing the risk
accordingly.

If you really don't want to switch away from the patching approach,
I won't refuse to ack the patch. But it'll mean subsequent changes
will be more intrusive, to get this converted to conditionals instead
(unless someone has _immediate_ plans to deal with the issues in
the patching logic itself).

Jan



Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

2018-01-22 Thread Andrew Cooper
On 19/01/18 13:51, Jan Beulich wrote:
 On 19.01.18 at 14:36,  wrote:
>> On 19/01/18 11:43, Jan Beulich wrote:
>> On 18.01.18 at 16:46,  wrote:
 @@ -729,6 +760,9 @@ ENTRY(nmi)
  handle_ist_exception:
  SAVE_ALL CLAC
  
 +SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
 +/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>> Following my considerations towards alternative patching to
>>> eliminate as much overhead as possible from the Meltdown
>>> band-aid in case it is being disabled, I'm rather hesitant to see any
>>> patchable code being introduced into the NMI/#MC entry paths
>>> without the patching logic first being made safe in this regard.
>>> Exceptions coming here aren't very frequent (except perhaps on
>>> hardware about to die), so the path isn't performance critical.
>>> Therefore I think we should try to avoid any patching here, and
>>> just conditionals instead. This in fact is one of the reasons why I
>>> didn't want to macro-ize the assembly additions done in the
>>> Meltdown band-aid.
>>>
>>> I do realize that this then also affects the exit-to-Xen path,
>>> which I agree is less desirable to use conditionals on.
>> While I agree that our lack of IST-safe patching is a problem, these
>> alternative points are already present on the NMI and MCE paths, and
>> need to be.  As a result, the DF handler is in no worse of a position. 
>> As a perfect example, observe the CLAC in context.
> Oh, indeed. We should change that.
>
>> I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
>> which doesn't use alternatives (but IMO this is pointless in the
>> presence of CLAC), but still don't think it is reasonable to treat DF
>> differently to NMI/MCE.
> #DF is debatable: On one hand I can see that if things go wrong,
> it can equally be raised at any time. Otoh #MC and even more so
> NMI can be raised _without_ things going (fatally) wrong, i.e. the
> patching may break a boot which would otherwise have succeeded
> (whereas the #DF would make the boot fail anyway).

I don't see a conclusion here, or a reason for treating #DF differently
to NMI or #MC.

There is currently a very very slim race on boot where an NMI or #MC
hitting the main application of alternatives may cause Xen to explode. 
This has been the case since alternatives were introduced, and this
patch doesn't make the problem meaningfully worse.

~Andrew


[Xen-devel] [PATCH 3/7] xen/arm32: entry: Add missing trap_reset entry

2018-01-22 Thread Julien Grall
At the moment, the reset vector is defined as .word 0 (i.e. andeq r0, r0,
r0).

This is rather unintuitive and will result in executing the undefined
instruction trap. Instead, introduce a trap helper for reset which will
generate an error message in the unlikely case that reset is called.

This is part of XSA-254.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/arm32/entry.S | 1 +
 xen/arch/arm/arm32/traps.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
index c6490d2847..c2fad5fe9b 100644
--- a/xen/arch/arm/arm32/entry.S
+++ b/xen/arch/arm/arm32/entry.S
@@ -146,6 +146,7 @@ GLOBAL(hyp_traps_vector)
 b trap_irq  /* 0x18 - IRQ */
 b trap_fiq  /* 0x1c - FIQ */
 
+DEFINE_TRAP_ENTRY(reset)
 DEFINE_TRAP_ENTRY(undefined_instruction)
 DEFINE_TRAP_ENTRY(hypervisor_call)
 DEFINE_TRAP_ENTRY(prefetch_abort)
diff --git a/xen/arch/arm/arm32/traps.c b/xen/arch/arm/arm32/traps.c
index 705255883e..4f27543dec 100644
--- a/xen/arch/arm/arm32/traps.c
+++ b/xen/arch/arm/arm32/traps.c
@@ -23,6 +23,11 @@
 
 #include 
 
+void do_trap_reset(struct cpu_user_regs *regs)
+{
+do_unexpected_trap("Reset", regs);
+}
+
 void do_trap_undefined_instruction(struct cpu_user_regs *regs)
 {
 uint32_t pc = regs->pc;
-- 
2.11.0



Re: [Xen-devel] [RFC] ARM PCI Passthrough design document

2018-01-22 Thread Manish Jaggi



On 05/26/2017 10:44 PM, Julien Grall wrote:

Hi all,

Hi Julien,

General consolidated comments first:

Review Comments:

a. The document talks about the high-level design and does not go into
implementation details or detailed code flows. So this is missing, if
adding such detail is intended.

b. The document only covers PCI device assignment from the POV of the
hardware domain. It does not talk about the high-level flow of
PHYSDEVOP_pci_device_add.

c. In the mail chain there was a discussion about Xen only touching the
config space. Can you add that discussion, and the config space emulation,
here?

d. Please resolve the sections marked as XXX in the document. We can
revisit this review after that.

e. Please provide separate flow descriptions for DT and ACPI; it will
help in understanding.

f. Please give a general picture of how guest domain device assignment
would work at a high level. As you are covering it in phase 2, you can add
more detail later. This would really help in completing the understanding
of the design.


Apart from that the document looks ok.

WBR
-Manish


The document below is an RFC version of a design proposal for PCI
Passthrough in Xen on ARM. It aims to describe, from a high-level perspective,
the interaction with the different subsystems and how guests will be able
to discover and access PCI.

Currently on ARM, Xen does not have any knowledge about PCI devices. This
means that IOMMUs and interrupt controllers (such as the ITS) requiring
specific configuration will not work with PCI even with DOM0.

The PCI Passthrough work could be divided in 2 phases:
 * Phase 1: Register all PCI devices in Xen => will allow
to use ITS and SMMU with PCI in Xen
 * Phase 2: Assign devices to guests

This document aims to describe the 2 phases, but for now only phase
1 is fully described.


I think I was able to gather all of the feedback and come up with a solution
that will satisfy all the parties. The design document has changed quite a lot
compared to the early draft sent a few months ago. The major changes are:
* Provide more details how PCI works on ARM and the interactions with
MSI controller and IOMMU
* Provide details on the existing host bridge implementations
* Give more explanation and justifications on the approach chosen
* Describing the hypercalls used and how they should be called

Feedback is welcome.

Cheers,



% PCI pass-through support on ARM
% Julien Grall 
% Draft B

# Preface

This document aims to describe the components required to enable the PCI
pass-through on ARM.

This is an early draft and some questions are still unanswered. When this is
the case, the text will contain XXX.

# Introduction

PCI pass-through allows the guest to receive full control of physical PCI
devices. This means the guest will have full and direct access to the PCI
device.

ARM supports a kind of guest that exploits hardware virtualization support
as much as possible. The guest relies on PV drivers only for I/O (e.g. block,
network), and interrupts come through the virtualized interrupt controller;
therefore there are no big changes required within the kernel.

As a consequence, it would be possible to replace PV drivers by assigning real
devices to the guest for I/O access. Xen on ARM would therefore be able to
run unmodified operating systems.

To achieve this goal, it looks more sensible to go towards emulating the
host bridge (there will be more details later). A guest would be able to take
advantage of the firmware tables, obviating the need for a specific driver
for Xen.

Thus, in this document we follow the emulated host bridge approach.

# PCI terminologies

Each PCI device under a host bridge is uniquely identified by its Requester ID
(AKA RID). A Requester ID is a triplet of Bus number, Device number, and
Function.

When the platform has multiple host bridges, the software can add a fourth
number called Segment (sometimes called Domain) to differentiate host bridges.
A PCI device is then uniquely identified by segment:bus:device:function (AKA SBDF).

So given a specific SBDF, it would be possible to find the host bridge and the
RID associated to a PCI device. The pair (host bridge, RID) will often be used
to find the relevant information for configuring the different subsystems (e.g
IOMMU, MSI controller). For convenience, the rest of the document will use
SBDF to refer to the pair (host bridge, RID).

# PCI host bridge

A PCI host bridge enables data transfer between a host processor and PCI-bus
based devices. The bridge is used to access the configuration space of each
PCI device and, on some platforms, may also act as an MSI controller.

## Initialization of the PCI host bridge

Whilst it would be expected that the bootloader takes care of initializing
the PCI host bridge, on some platforms it is done in the Operating 

[Xen-devel] [seabios test] 118264: regressions - trouble: broken/fail/pass

2018-01-22 Thread osstest service owner
flight 118264 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/118264/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64   broken
 test-amd64-i386-qemuu-rhel6hvm-amd broken
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop   fail REGR. vs. 115539

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 4 host-install(4) broken pass in 
118256
 test-amd64-i386-qemuu-rhel6hvm-amd  4 host-install(4)broken pass in 118256

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 115539
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 115539
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 seabios  14d91c353e19b7085fdbb7b2dcc43f3355665670
baseline version:
 seabios  0ca6d6277dfafc671a5b3718cbeb5c78e2a888ea

Last test of basis   115539  2017-11-03 20:48:58 Z   79 days
Failing since115733  2017-11-10 17:19:59 Z   72 days   85 attempts
Testing same since   118140  2018-01-17 05:09:48 Z5 days6 attempts


People who touched revisions under test:
  Kevin O'Connor 
  Marcel Apfelbaum 
  Michael S. Tsirkin 
  Paul Menzel 
  Stefan Berger 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   broken  
 test-amd64-amd64-xl-qemuu-debianhvm-amd64broken  
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64  fail
 test-amd64-amd64-xl-qemuu-win10-i386 fail
 test-amd64-i386-xl-qemuu-win10-i386  fail
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-amd64-amd64-xl-qemuu-debianhvm-amd64 broken
broken-job test-amd64-i386-qemuu-rhel6hvm-amd broken
broken-step test-amd64-amd64-xl-qemuu-debianhvm-amd64 host-install(4)
broken-step test-amd64-i386-qemuu-rhel6hvm-amd host-install(4)

Not pushing.


commit 14d91c353e19b7085fdbb7b2dcc43f3355665670
Author: Marcel Apfelbaum 
Date:   Thu Jan 11 22:15:12 2018 +0200

pci: fix 'io hints' capability for RedHat PCI bridges

Commit ec6cb17f (pci: enable RedHat PCI bridges to reserve additional
resources on PCI init) added a new vendor specific PCI 

Re: [Xen-devel] [PATCH v2 5/7] x86: relocate pvh_info

2018-01-22 Thread Jan Beulich
>>> On 19.01.18 at 17:39,  wrote:
> On Fri, Jan 19, 2018 at 04:29:31PM +, Roger Pau Monné wrote:
>> On Fri, Jan 19, 2018 at 03:34:56PM +, Wei Liu wrote:
>> > diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot/build32.mk
>> > index 48c7407c00..028ac19b96 100644
>> > --- a/xen/arch/x86/boot/build32.mk
>> > +++ b/xen/arch/x86/boot/build32.mk
>> > @@ -36,5 +36,8 @@ CFLAGS := $(filter-out -flto,$(CFLAGS))
>> >  cmdline.o: cmdline.c $(CMDLINE_DEPS)
>> >  
>> >  reloc.o: reloc.c $(RELOC_DEPS)
>> > +ifeq ($(CONFIG_PVH_GUEST),y)
>> > +reloc.o: CFLAGS += -DCONFIG_PVH_GUEST
>> > +endif
>> 
>> I would maybe do this above, where the rest of the CFLAGS are set.
>> Certainly setting -DCONFIG_PVH_GUEST shouldn't cause issues elsewhere.
>> 
>> CFLAGS-$(CONFIG_PVH_GUEST) += -DCONFIG_PVH_GUEST
>> CFLAGS += $(CFLAGS-y)
>> 
>> >  .PRECIOUS: %.bin %.lnk
>> > diff --git a/xen/arch/x86/boot/defs.h b/xen/arch/x86/boot/defs.h
>> > index 6abdc15446..05921a64a3 100644
>> > --- a/xen/arch/x86/boot/defs.h
>> > +++ b/xen/arch/x86/boot/defs.h
>> > @@ -51,6 +51,9 @@ typedef unsigned short u16;
>> >  typedef unsigned int u32;
>> >  typedef unsigned long long u64;
>> >  typedef unsigned int size_t;
>> > +typedef u8 uint8_t;
>> > +typedef u32 uint32_t;
>> > +typedef u64 uint64_t;
>> 
>> This seems to be always expanding, so maybe better to simply replace
>> the stdbool.h include above with types.h?
>> 
> 
> I'm two minded here. My impression is that this wants to be minimal and
> standalone. The content in types.h is a lot more than we need here.

Please keep it the (minimal) way you have it.

>> >  #define U16_MAX   ((u16)(~0U))
>> >  #define UINT_MAX  (~0U)
>> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
>> > index 0f652cea11..614e53081e 100644
>> > --- a/xen/arch/x86/boot/head.S
>> > +++ b/xen/arch/x86/boot/head.S
>> > @@ -414,6 +414,7 @@ __pvh_start:
>> >  
>> >  /* Set trampoline_phys to use mfn 1 to avoid having a mapping at 
>> > VA 0 */
>> >  movw$0x1000, sym_esi(trampoline_phys)
>> > +movl$0x336ec578, %eax /* mov $XEN_HVM_START_MAGIC_VALUE, %eax 
>> > */
>> 
>> Hm, if XEN_HVM_START_MAGIC_VALUE cannot be used I would rather prefer
>> to use (%ebx).
> 
> The same reason I didn't include types.h + hvm_start_info.h here.
> 
> We can include both to make $XEN_HVM_START_MAGIC_VALUE work. But I think
> using (%ebx) is better in here.

I agree (%ebx) is preferable.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6.5 08/26] x86/entry: Erase guest GPR state on entry to Xen

2018-01-22 Thread David Woodhouse
On Mon, 2018-01-22 at 10:18 +, Andrew Cooper wrote:
> On 22/01/2018 10:04, David Woodhouse wrote:
> > 
> > On Thu, 2018-01-04 at 00:15 +, Andrew Cooper wrote:
> > > 
> > > --- a/xen/include/asm-x86/asm_defns.h
> > > +++ b/xen/include/asm-x86/asm_defns.h
> > > @@ -217,22 +217,34 @@ static always_inline void stac(void)
> > >  addq  $-(UREGS_error_code-UREGS_r15), %rsp
> > >  cld
> > >  movq  %rdi,UREGS_rdi(%rsp)
> > > +    xor   %edi, %edi
> > >  movq  %rsi,UREGS_rsi(%rsp)
> > > +    xor   %esi, %esi
> > >  movq  %rdx,UREGS_rdx(%rsp)
> > > +    xor   %edx, %edx
> > >  movq  %rcx,UREGS_rcx(%rsp)
> > > +    xor   %ecx, %ecx
> > >  movq  %rax,UREGS_rax(%rsp)
> > > +    xor   %eax, %eax
> > You didn't want to erase all 64 bits?
>
> This does erase all 64 bits.  (We're in long mode, so the upper 32 bits
> are implicitly zeroed, without an added rex prefix.)

Eww. In the grand scheme of things, I'd rather the assembler knew that
(and happily omitted the rex prefix all by itself to use the more
efficient encoding of the instruction), and not me.


Re: [Xen-devel] [PATCH v6.5 08/26] x86/entry: Erase guest GPR state on entry to Xen

2018-01-22 Thread Andrew Cooper
On 22/01/2018 10:04, David Woodhouse wrote:
> On Thu, 2018-01-04 at 00:15 +, Andrew Cooper wrote:
>> --- a/xen/include/asm-x86/asm_defns.h
>> +++ b/xen/include/asm-x86/asm_defns.h
>> @@ -217,22 +217,34 @@ static always_inline void stac(void)
>>  addq  $-(UREGS_error_code-UREGS_r15), %rsp
>>  cld
>>  movq  %rdi,UREGS_rdi(%rsp)
>> +    xor   %edi, %edi
>>  movq  %rsi,UREGS_rsi(%rsp)
>> +    xor   %esi, %esi
>>  movq  %rdx,UREGS_rdx(%rsp)
>> +    xor   %edx, %edx
>>  movq  %rcx,UREGS_rcx(%rsp)
>> +    xor   %ecx, %ecx
>>  movq  %rax,UREGS_rax(%rsp)
>> +    xor   %eax, %eax
> You didn't want to erase all 64 bits?

This does erase all 64 bits.  (We're in long mode, so the upper 32 bits
are implicitly zeroed, without an added rex prefix.)

~Andrew

