Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
2017-05-13 1:25 GMT+08:00, James Morse: > Hi gengdongjiu, > > On 10/05/17 09:44, gengdongjiu wrote: >> On 2017/5/9 1:28, James Morse wrote: > (hwpoison for KVM is a corner case as Qemu's memory effectively has two > users, > Qemu and KVM. This isn't the example of how user-space gets > signalled.) >>> >>> KVM creates guests as if they were additional users of Qemu's memory. The >>> code >>> in mm/memory-failure.c may find that Qemu didn't have the affected page >>> mapped >>> to user-space - but it may have been in use by stage2. >>> >>> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal >>> when the >>> guest touches the hwpoison page as if Qemu had touched the page itself. >>> >>> Signals from KVM is a corner case, for firmware-first decisions should >>> happen in >>> the APEI code based on CPER records. > If so, how the KVM handle the SEA type other than hwpoison? > >>> To deliver to a guest? It shouldn't have to know, user space should use a >>> KVM >>> API to drive this. >>> >>> When received from hardware? It shouldn't have to care, these things >>> should be >>> passed into the APEI code for handling. KVM just needs to put the host >>> registers >>> back. > >> Recently I confirmed with the hardware team. they said almost all the SEA >> errors have the >> Poison flag, so may be there is no need to consider other SEA errors other >> than hwPoison. >> only consider SEA hwpoison errors can be enough. > > We should be careful here, by hwpoison I meant the Linux feature. > From Documentation/vm/hwpoison.txt: >> Upcoming Intel CPUs have support for recovering from some memory errors >> (``MCA recovery''). This requires the OS to declare a page "poisoned", >> kill the processes associated with it and avoid using it in the future. > > We were talking about KVM's reaction to 'the OS declaring a page poisoned'. > Lets try to call this one memory-failure, as that is its Kconfig name. (now > I > understand why we've been confusing each other!) > > Your hwpoison looks like something the CPU reports in the ERRSTATUS > registers > (4.6.10 of DDI0587). This is something firmware should read, then describe > to > the OS via CPER records. Depending on these CPER records linux may invoke > its > memory-failure code. yes > > injection a SEA is no more than setting some registers: elr_el1, PC, PSTATE, SPSR_el1, far_el1, esr_el1 I seen this KVM API do the same thing as Qemu. do you found call this API will have issue and necessary to choose another ESR value? >>> >>> Should we let user-space pick the ESR to deliver to the guest? Yes, >>> letting >>> user-space specify the ESR gives the most flexibility to do something >>> clever in >>> the future. An obvious choice for SEA is between the external-abort and >>> 'parity >>> or ECC error' codes. If we tell user-space which of these happened (I >>> don't >>> think Linux does today) then Qemu can relay that information to the >>> guest. > >> may be the ESR is delivered by the KVM. >> (1) guest OS EL0 happen SEA due to hwpoison >> (2) CPU traps to EL3 firmware, and update the ESR_EL3 >> (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2 >> (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the >> SEA. >> >> May be the esr_el2 can provide the accurate error information. >> or do you think user-space specify the ESR instead of esr_el2 is better? > > I think the severity needs to be considered as the notification is handled > by > each exception level. There are cases where it will need to be upgraded > from > 'contained' to 'uncontained'. (more discussion on another part of the > thread). understand it. > > > Thanks, > > James >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
2017-05-13 1:25 GMT+08:00, James Morse : > Hi gengdongjiu, > > On 10/05/17 09:44, gengdongjiu wrote: >> On 2017/5/9 1:28, James Morse wrote: > (hwpoison for KVM is a corner case as Qemu's memory effectively has two > users, > Qemu and KVM. This isn't the example of how user-space gets > signalled.) >>> >>> KVM creates guests as if they were additional users of Qemu's memory. The >>> code >>> in mm/memory-failure.c may find that Qemu didn't have the affected page >>> mapped >>> to user-space - but it may have been in use by stage2. >>> >>> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal >>> when the >>> guest touches the hwpoison page as if Qemu had touched the page itself. >>> >>> Signals from KVM is a corner case, for firmware-first decisions should >>> happen in >>> the APEI code based on CPER records. > If so, how the KVM handle the SEA type other than hwpoison? > >>> To deliver to a guest? It shouldn't have to know, user space should use a >>> KVM >>> API to drive this. >>> >>> When received from hardware? It shouldn't have to care, these things >>> should be >>> passed into the APEI code for handling. KVM just needs to put the host >>> registers >>> back. > >> Recently I confirmed with the hardware team. they said almost all the SEA >> errors have the >> Poison flag, so may be there is no need to consider other SEA errors other >> than hwPoison. >> only consider SEA hwpoison errors can be enough. > > We should be careful here, by hwpoison I meant the Linux feature. > From Documentation/vm/hwpoison.txt: >> Upcoming Intel CPUs have support for recovering from some memory errors >> (``MCA recovery''). This requires the OS to declare a page "poisoned", >> kill the processes associated with it and avoid using it in the future. > > We were talking about KVM's reaction to 'the OS declaring a page poisoned'. > Lets try to call this one memory-failure, as that is its Kconfig name. (now > I > understand why we've been confusing each other!) > > Your hwpoison looks like something the CPU reports in the ERRSTATUS > registers > (4.6.10 of DDI0587). This is something firmware should read, then describe > to > the OS via CPER records. Depending on these CPER records linux may invoke > its > memory-failure code. yes > > injection a SEA is no more than setting some registers: elr_el1, PC, PSTATE, SPSR_el1, far_el1, esr_el1 I seen this KVM API do the same thing as Qemu. do you found call this API will have issue and necessary to choose another ESR value? >>> >>> Should we let user-space pick the ESR to deliver to the guest? Yes, >>> letting >>> user-space specify the ESR gives the most flexibility to do something >>> clever in >>> the future. An obvious choice for SEA is between the external-abort and >>> 'parity >>> or ECC error' codes. If we tell user-space which of these happened (I >>> don't >>> think Linux does today) then Qemu can relay that information to the >>> guest. > >> may be the ESR is delivered by the KVM. >> (1) guest OS EL0 happen SEA due to hwpoison >> (2) CPU traps to EL3 firmware, and update the ESR_EL3 >> (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2 >> (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the >> SEA. >> >> May be the esr_el2 can provide the accurate error information. >> or do you think user-space specify the ESR instead of esr_el2 is better? > > I think the severity needs to be considered as the notification is handled > by > each exception level. There are cases where it will need to be upgraded > from > 'contained' to 'uncontained'. (more discussion on another part of the > thread). understand it. > > > Thanks, > > James >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi James, sorry for the late response due to recently verify and debug the RAS solution. 2017-05-13 1:24 GMT+08:00, James Morse: > Hi gengdongjiu, > > On 05/05/17 13:31, gengdongjiu wrote: >> when guest OS happen an SEA, My current solution is shown below: >> >> (1) host EL3 firmware firstly handle the SEA error and generate the CPER >> record. >> (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3, >> far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2. > > Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm > sure > you exclude values from EL3 or the secure-world, we should never hand those > to > the normal world. it is sure that needs to exclude the EL3 Error and secure-world. > > >> (3) then jump the EL2 hypervisor > >> so the EL2 hypervisor uses the ESR that come from esr_el3, here the >> ESR(esr_el3) value may be different with the exist KVM API's ESR. > > The ESR may be different between EL3 and EL2. The ESR contains the severity > of > the event, the CPU will choose this when it takes the SError to EL3. If it > had > taken the SError to EL2, the CPU may have classified the error differently. > > Firmware may need to generate a more severe ESR if it receives an error > that > would be propagated by delivering SEI to a lower exception level, for > example if > an EL2 system register is 'infected'. > > This is the same for Qemu/kvmtool. A contained error at EL2 may be an > uncontained error if we hand it to guest EL1. Linux's RAS code will decide > this > with its choice of signal to send, (and possibly which code to set). > Qemu/kvmtool need to choose an appropriate APEI notification, which may > involve > generating a relevant ESR. > > Also relevant is the problem we discussed earlier with trying to deliver > fake > Physical-SError from software at EL3: If the SError is routed to EL2, and > EL2 > has PSTATE.A masked, EL3 has to wait and try again later. This is another > case > where firmware may have to upgrade the classification of an error to > uncontainable. it makes sense. thanks to James. > > > Thanks, > > James >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi James, sorry for the late response due to recently verify and debug the RAS solution. 2017-05-13 1:24 GMT+08:00, James Morse : > Hi gengdongjiu, > > On 05/05/17 13:31, gengdongjiu wrote: >> when guest OS happen an SEA, My current solution is shown below: >> >> (1) host EL3 firmware firstly handle the SEA error and generate the CPER >> record. >> (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3, >> far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2. > > Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm > sure > you exclude values from EL3 or the secure-world, we should never hand those > to > the normal world. it is sure that needs to exclude the EL3 Error and secure-world. > > >> (3) then jump the EL2 hypervisor > >> so the EL2 hypervisor uses the ESR that come from esr_el3, here the >> ESR(esr_el3) value may be different with the exist KVM API's ESR. > > The ESR may be different between EL3 and EL2. The ESR contains the severity > of > the event, the CPU will choose this when it takes the SError to EL3. If it > had > taken the SError to EL2, the CPU may have classified the error differently. > > Firmware may need to generate a more severe ESR if it receives an error > that > would be propagated by delivering SEI to a lower exception level, for > example if > an EL2 system register is 'infected'. > > This is the same for Qemu/kvmtool. A contained error at EL2 may be an > uncontained error if we hand it to guest EL1. Linux's RAS code will decide > this > with its choice of signal to send, (and possibly which code to set). > Qemu/kvmtool need to choose an appropriate APEI notification, which may > involve > generating a relevant ESR. > > Also relevant is the problem we discussed earlier with trying to deliver > fake > Physical-SError from software at EL3: If the SError is routed to EL2, and > EL2 > has PSTATE.A masked, EL3 has to wait and try again later. This is another > case > where firmware may have to upgrade the classification of an error to > uncontainable. it makes sense. thanks to James. > > > Thanks, > > James >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 05/05/17 13:31, gengdongjiu wrote: > when guest OS happen an SEA, My current solution is shown below: > > (1) host EL3 firmware firstly handle the SEA error and generate the CPER > record. > (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3, > far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2. Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm sure you exclude values from EL3 or the secure-world, we should never hand those to the normal world. > (3) then jump the EL2 hypervisor > so the EL2 hypervisor uses the ESR that come from esr_el3, here the > ESR(esr_el3) value may be different with the exist KVM API's ESR. The ESR may be different between EL3 and EL2. The ESR contains the severity of the event, the CPU will choose this when it takes the SError to EL3. If it had taken the SError to EL2, the CPU may have classified the error differently. Firmware may need to generate a more severe ESR if it receives an error that would be propagated by delivering SEI to a lower exception level, for example if an EL2 system register is 'infected'. This is the same for Qemu/kvmtool. A contained error at EL2 may be an uncontained error if we hand it to guest EL1. Linux's RAS code will decide this with its choice of signal to send, (and possibly which code to set). Qemu/kvmtool need to choose an appropriate APEI notification, which may involve generating a relevant ESR. Also relevant is the problem we discussed earlier with trying to deliver fake Physical-SError from software at EL3: If the SError is routed to EL2, and EL2 has PSTATE.A masked, EL3 has to wait and try again later. This is another case where firmware may have to upgrade the classification of an error to uncontainable. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 10/05/17 09:44, gengdongjiu wrote: > On 2017/5/9 1:28, James Morse wrote: (hwpoison for KVM is a corner case as Qemu's memory effectively has two users, Qemu and KVM. This isn't the example of how user-space gets signalled.) >> >> KVM creates guests as if they were additional users of Qemu's memory. The >> code >> in mm/memory-failure.c may find that Qemu didn't have the affected page >> mapped >> to user-space - but it may have been in use by stage2. >> >> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when >> the >> guest touches the hwpoison page as if Qemu had touched the page itself. >> >> Signals from KVM is a corner case, for firmware-first decisions should >> happen in >> the APEI code based on CPER records. >>> If so, how the KVM handle the SEA type other than hwpoison? >> To deliver to a guest? It shouldn't have to know, user space should use a KVM >> API to drive this. >> >> When received from hardware? It shouldn't have to care, these things should >> be >> passed into the APEI code for handling. KVM just needs to put the host >> registers >> back. > Recently I confirmed with the hardware team. they said almost all the SEA > errors have the > Poison flag, so may be there is no need to consider other SEA errors other > than hwPoison. > only consider SEA hwpoison errors can be enough. We should be careful here, by hwpoison I meant the Linux feature. >From Documentation/vm/hwpoison.txt: > Upcoming Intel CPUs have support for recovering from some memory errors > (``MCA recovery''). This requires the OS to declare a page "poisoned", > kill the processes associated with it and avoid using it in the future. We were talking about KVM's reaction to 'the OS declaring a page poisoned'. Lets try to call this one memory-failure, as that is its Kconfig name. (now I understand why we've been confusing each other!) Your hwpoison looks like something the CPU reports in the ERRSTATUS registers (4.6.10 of DDI0587). This is something firmware should read, then describe to the OS via CPER records. Depending on these CPER records linux may invoke its memory-failure code. >>> injection a SEA is no more than setting some registers: elr_el1, PC, >>> PSTATE, SPSR_el1, far_el1, esr_el1 >>> I seen this KVM API do the same thing as Qemu. do you found call this >>> API will have issue and necessary to choose another ESR value? >> >> Should we let user-space pick the ESR to deliver to the guest? Yes, letting >> user-space specify the ESR gives the most flexibility to do something clever >> in >> the future. An obvious choice for SEA is between the external-abort and >> 'parity >> or ECC error' codes. If we tell user-space which of these happened (I don't >> think Linux does today) then Qemu can relay that information to the guest. > may be the ESR is delivered by the KVM. > (1) guest OS EL0 happen SEA due to hwpoison > (2) CPU traps to EL3 firmware, and update the ESR_EL3 > (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2 > (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the > SEA. > > May be the esr_el2 can provide the accurate error information. > or do you think user-space specify the ESR instead of esr_el2 is better? I think the severity needs to be considered as the notification is handled by each exception level. There are cases where it will need to be upgraded from 'contained' to 'uncontained'. (more discussion on another part of the thread). Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 05/05/17 13:31, gengdongjiu wrote: > when guest OS happen an SEA, My current solution is shown below: > > (1) host EL3 firmware firstly handle the SEA error and generate the CPER > record. > (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3, > far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2. Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm sure you exclude values from EL3 or the secure-world, we should never hand those to the normal world. > (3) then jump the EL2 hypervisor > so the EL2 hypervisor uses the ESR that come from esr_el3, here the > ESR(esr_el3) value may be different with the exist KVM API's ESR. The ESR may be different between EL3 and EL2. The ESR contains the severity of the event, the CPU will choose this when it takes the SError to EL3. If it had taken the SError to EL2, the CPU may have classified the error differently. Firmware may need to generate a more severe ESR if it receives an error that would be propagated by delivering SEI to a lower exception level, for example if an EL2 system register is 'infected'. This is the same for Qemu/kvmtool. A contained error at EL2 may be an uncontained error if we hand it to guest EL1. Linux's RAS code will decide this with its choice of signal to send, (and possibly which code to set). Qemu/kvmtool need to choose an appropriate APEI notification, which may involve generating a relevant ESR. Also relevant is the problem we discussed earlier with trying to deliver fake Physical-SError from software at EL3: If the SError is routed to EL2, and EL2 has PSTATE.A masked, EL3 has to wait and try again later. This is another case where firmware may have to upgrade the classification of an error to uncontainable. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 10/05/17 09:44, gengdongjiu wrote: > On 2017/5/9 1:28, James Morse wrote: (hwpoison for KVM is a corner case as Qemu's memory effectively has two users, Qemu and KVM. This isn't the example of how user-space gets signalled.) >> >> KVM creates guests as if they were additional users of Qemu's memory. The >> code >> in mm/memory-failure.c may find that Qemu didn't have the affected page >> mapped >> to user-space - but it may have been in use by stage2. >> >> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when >> the >> guest touches the hwpoison page as if Qemu had touched the page itself. >> >> Signals from KVM is a corner case, for firmware-first decisions should >> happen in >> the APEI code based on CPER records. >>> If so, how the KVM handle the SEA type other than hwpoison? >> To deliver to a guest? It shouldn't have to know, user space should use a KVM >> API to drive this. >> >> When received from hardware? It shouldn't have to care, these things should >> be >> passed into the APEI code for handling. KVM just needs to put the host >> registers >> back. > Recently I confirmed with the hardware team. they said almost all the SEA > errors have the > Poison flag, so may be there is no need to consider other SEA errors other > than hwPoison. > only consider SEA hwpoison errors can be enough. We should be careful here, by hwpoison I meant the Linux feature. >From Documentation/vm/hwpoison.txt: > Upcoming Intel CPUs have support for recovering from some memory errors > (``MCA recovery''). This requires the OS to declare a page "poisoned", > kill the processes associated with it and avoid using it in the future. We were talking about KVM's reaction to 'the OS declaring a page poisoned'. Lets try to call this one memory-failure, as that is its Kconfig name. (now I understand why we've been confusing each other!) Your hwpoison looks like something the CPU reports in the ERRSTATUS registers (4.6.10 of DDI0587). This is something firmware should read, then describe to the OS via CPER records. Depending on these CPER records linux may invoke its memory-failure code. >>> injection a SEA is no more than setting some registers: elr_el1, PC, >>> PSTATE, SPSR_el1, far_el1, esr_el1 >>> I seen this KVM API do the same thing as Qemu. do you found call this >>> API will have issue and necessary to choose another ESR value? >> >> Should we let user-space pick the ESR to deliver to the guest? Yes, letting >> user-space specify the ESR gives the most flexibility to do something clever >> in >> the future. An obvious choice for SEA is between the external-abort and >> 'parity >> or ECC error' codes. If we tell user-space which of these happened (I don't >> think Linux does today) then Qemu can relay that information to the guest. > may be the ESR is delivered by the KVM. > (1) guest OS EL0 happen SEA due to hwpoison > (2) CPU traps to EL3 firmware, and update the ESR_EL3 > (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2 > (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the > SEA. > > May be the esr_el2 can provide the accurate error information. > or do you think user-space specify the ESR instead of esr_el2 is better? I think the severity needs to be considered as the notification is handled by each exception level. There are cases where it will need to be upgraded from 'contained' to 'uncontained'. (more discussion on another part of the thread). Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi Christoffer, On 2017/5/10 20:20, Christoffer Dall wrote: > On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote: >> Thanks James's explanation. >> >> Hi Christoffer, >> >> On 2017/5/9 22:28, James Morse wrote: >>> Hi Christoffer, >>> >>> On 08/05/17 18:54, Christoffer Dall wrote: On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: I must admit I am losing track of exactly what this proposed API was supposed to do. >>> >>> There are two, and we keep jumping between them! >>> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. >>> >>> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject >>> these >>> today using the KVM_GET/SET_ONE_REG API whenever it wants to. >>> >>> SEI uses SError, is asynchronous and can be masked. In addition these need >>> to be >>> consumed/synchronised by the ESB instruction, even when executed by a guest. >>> Hardware has the necessary bits to drive all this, we need to expose an API >>> to >>> drive it. >>> >>> (I try to spell them out each time so I don't confuse SEI with something >>> synchronous!) >>> >>> >>> This patch was about SEA. I think you've answered our question: >> >> we are talking about the SEA(synchronous data abort) injection two methods: >> >> (1)change vcpu registers in the Qemu/kvmtools and using the >> KVM_GET/SET_ONE_REG API to set. > > Yes, if this is possible, why would you want something more? we will use this method. > >> (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL >> command from Qemu. >> > > I'm not really going to consider this, because "use internal API from > userspace" doesn't work. > > So this should be: > > (2) Introduce a new API to do X. you can ignore the second method, now we will not use it. > > I still think you know what my preference is; use the existing API if at > all possible. > > Thanks, > -Christoffer > > . >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi Christoffer, On 2017/5/10 20:20, Christoffer Dall wrote: > On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote: >> Thanks James's explanation. >> >> Hi Christoffer, >> >> On 2017/5/9 22:28, James Morse wrote: >>> Hi Christoffer, >>> >>> On 08/05/17 18:54, Christoffer Dall wrote: On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: I must admit I am losing track of exactly what this proposed API was supposed to do. >>> >>> There are two, and we keep jumping between them! >>> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. >>> >>> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject >>> these >>> today using the KVM_GET/SET_ONE_REG API whenever it wants to. >>> >>> SEI uses SError, is asynchronous and can be masked. In addition these need >>> to be >>> consumed/synchronised by the ESB instruction, even when executed by a guest. >>> Hardware has the necessary bits to drive all this, we need to expose an API >>> to >>> drive it. >>> >>> (I try to spell them out each time so I don't confuse SEI with something >>> synchronous!) >>> >>> >>> This patch was about SEA. I think you've answered our question: >> >> we are talking about the SEA(synchronous data abort) injection two methods: >> >> (1)change vcpu registers in the Qemu/kvmtools and using the >> KVM_GET/SET_ONE_REG API to set. > > Yes, if this is possible, why would you want something more? we will use this method. > >> (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL >> command from Qemu. >> > > I'm not really going to consider this, because "use internal API from > userspace" doesn't work. > > So this should be: > > (2) Introduce a new API to do X. you can ignore the second method, now we will not use it. > > I still think you know what my preference is; use the existing API if at > all possible. > > Thanks, > -Christoffer > > . >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote: > Thanks James's explanation. > > Hi Christoffer, > > On 2017/5/9 22:28, James Morse wrote: > > Hi Christoffer, > > > > On 08/05/17 18:54, Christoffer Dall wrote: > >> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: > >> I must admit I am losing track of exactly what this proposed API was > >> supposed to do. > > > > There are two, and we keep jumping between them! > > This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. > > > > SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject > > these > > today using the KVM_GET/SET_ONE_REG API whenever it wants to. > > > > SEI uses SError, is asynchronous and can be masked. In addition these need > > to be > > consumed/synchronised by the ESB instruction, even when executed by a guest. > > Hardware has the necessary bits to drive all this, we need to expose an API > > to > > drive it. > > > > (I try to spell them out each time so I don't confuse SEI with something > > synchronous!) > > > > > > This patch was about SEA. I think you've answered our question: > > we are talking about the SEA(synchronous data abort) injection two methods: > > (1)change vcpu registers in the Qemu/kvmtools and using the > KVM_GET/SET_ONE_REG API to set. Yes, if this is possible, why would you want something more? > (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL > command from Qemu. > I'm not really going to consider this, because "use internal API from userspace" doesn't work. So this should be: (2) Introduce a new API to do X. I still think you know what my preference is; use the existing API if at all possible. Thanks, -Christoffer
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote: > Thanks James's explanation. > > Hi Christoffer, > > On 2017/5/9 22:28, James Morse wrote: > > Hi Christoffer, > > > > On 08/05/17 18:54, Christoffer Dall wrote: > >> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: > >> I must admit I am losing track of exactly what this proposed API was > >> supposed to do. > > > > There are two, and we keep jumping between them! > > This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. > > > > SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject > > these > > today using the KVM_GET/SET_ONE_REG API whenever it wants to. > > > > SEI uses SError, is asynchronous and can be masked. In addition these need > > to be > > consumed/synchronised by the ESB instruction, even when executed by a guest. > > Hardware has the necessary bits to drive all this, we need to expose an API > > to > > drive it. > > > > (I try to spell them out each time so I don't confuse SEI with something > > synchronous!) > > > > > > This patch was about SEA. I think you've answered our question: > > we are talking about the SEA(synchronous data abort) injection two methods: > > (1)change vcpu registers in the Qemu/kvmtools and using the > KVM_GET/SET_ONE_REG API to set. Yes, if this is possible, why would you want something more? > (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL > command from Qemu. > I'm not really going to consider this, because "use internal API from userspace" doesn't work. So this should be: (2) Introduce a new API to do X. I still think you know what my preference is; use the existing API if at all possible. Thanks, -Christoffer
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Thanks James's explanation. Hi Christoffer, On 2017/5/9 22:28, James Morse wrote: > Hi Christoffer, > > On 08/05/17 18:54, Christoffer Dall wrote: >> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: >> I must admit I am losing track of exactly what this proposed API was >> supposed to do. > > There are two, and we keep jumping between them! > This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. > > SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these > today using the KVM_GET/SET_ONE_REG API whenever it wants to. > > SEI uses SError, is asynchronous and can be masked. In addition these need to > be > consumed/synchronised by the ESB instruction, even when executed by a guest. > Hardware has the necessary bits to drive all this, we need to expose an API to > drive it. > > (I try to spell them out each time so I don't confuse SEI with something > synchronous!) > > > This patch was about SEA. I think you've answered our question: we are talking about the SEA(synchronous data abort) injection two methods: (1)change vcpu registers in the Qemu/kvmtools and using the KVM_GET/SET_ONE_REG API to set. (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL command from Qemu. > >> However, if it's a question about setting up VCPU registers to a certain >> state and potentially modifying memory, then I think experience has >> shown us (psci) that emulating something in the kernel that userspace >> can have fine-grained control over is a bad idea, and should be left to >> userspace using as generic APIs as possible. >> >> Furthermore, if I understand what injecting a SEA requires, it is very >> similar to resetting the CPU and loading data into guest memory, which >> QEMU already does today, and there is no reason to introduce additional >> APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. > > > Thanks, > > James > > . >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Thanks James's explanation. Hi Christoffer, On 2017/5/9 22:28, James Morse wrote: > Hi Christoffer, > > On 08/05/17 18:54, Christoffer Dall wrote: >> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: >> I must admit I am losing track of exactly what this proposed API was >> supposed to do. > > There are two, and we keep jumping between them! > This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. > > SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these > today using the KVM_GET/SET_ONE_REG API whenever it wants to. > > SEI uses SError, is asynchronous and can be masked. In addition these need to > be > consumed/synchronised by the ESB instruction, even when executed by a guest. > Hardware has the necessary bits to drive all this, we need to expose an API to > drive it. > > (I try to spell them out each time so I don't confuse SEI with something > synchronous!) > > > This patch was about SEA. I think you've answered our question: we are talking about the SEA(synchronous data abort) injection two methods: (1)change vcpu registers in the Qemu/kvmtools and using the KVM_GET/SET_ONE_REG API to set. (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL command from Qemu. > >> However, if it's a question about setting up VCPU registers to a certain >> state and potentially modifying memory, then I think experience has >> shown us (psci) that emulating something in the kernel that userspace >> can have fine-grained control over is a bad idea, and should be left to >> userspace using as generic APIs as possible. >> >> Furthermore, if I understand what injecting a SEA requires, it is very >> similar to resetting the CPU and loading data into guest memory, which >> QEMU already does today, and there is no reason to introduce additional >> APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. > > > Thanks, > > James > > . >
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi James, thanks a lot for your answer. On 2017/5/9 1:28, James Morse wrote: > Hi gengdongjiu, > > On 04/05/17 17:52, gengdongjiu wrote: >> 2017-05-04 23:42 GMT+08:00 gengdongjiu: >>> On 30/04/17 06:37, Dongjiu Geng wrote: diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 105b6ab..a96594f 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c +static void kvm_handle_bad_page(unsigned long address, + bool hugetlb, bool hwpoison) +{ + /* handle both hwpoison and other synchronous external Abort */ + if (hwpoison) + kvm_send_signal(address, hugetlb, true); + else + kvm_send_signal(address, hugetlb, false); +} >>> >>> Why the extra level of indirection? We only want to signal userspace like >>> this >>> from KVM for hwpoison. Signals for RAS related reasons should come from the >>> bits >>> of the kernel that decoded the error. >> >> For the SEA, the are maily two types: >> 0b01 Synchronous External Abort on memory access. >> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] >> encode the level. > > (KVM shouldn't have to make decisions about this) > > >> hwpoison should belong to the "Synchronous External Abort on memory access" >> if the SEA type is not hwpoison, such as page table walk, do you mean >> KVM do not deliver the SIGBUS? > > > The flow of events should be SEI/SEA from firmware to the hosts's APEI code. > KVM > should only be involved to get us back to the host if we were running a guest. > The APEI/hwpoison code may cause a set of processes to be sent signals. The > code > in mm/memory-failure.c does this by walking the process rmaps using the > physical > addresses in the CPER records. > > We want user space to be sent signals as this can (and should) work in exactly > the same way on arm64 as it does on x86 or any other architecture. If a > web-browser can handle SIGBUS notifications for memory-corruption, it > shouldn't > have to care what architecture it is running on. Ok, James, understand. > > So what is that KVM+SIGBUS patch about?... > >>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >>> users, >>> Qemu and KVM. This isn't the example of how user-space gets signalled.) > > KVM creates guests as if they were additional users of Qemu's memory. The code > in mm/memory-failure.c may find that Qemu didn't have the affected page mapped > to user-space - but it may have been in use by stage2. > > The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when > the > guest touches the hwpoison page as if Qemu had touched the page itself. > > Signals from KVM is a corner case, for firmware-first decisions should happen > in > the APEI code based on CPER records. > > >> If so, how the KVM handle the SEA type other than hwpoison? > > To deliver to a guest? It shouldn't have to know, user space should use a KVM > API to drive this. > > When received from hardware? It shouldn't have to care, these things should be > passed into the APEI code for handling. KVM just needs to put the host > registers > back. Recently I confirmed with the hardware team. they said almost all the SEA errors have the Poison flag, so may be there is no need to consider other SEA errors other than hwPoison. only consider SEA hwpoison errors can be enough. > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index bb02909..1d2e2e7 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state) /* Available with KVM_CAP_X86_SMM */ #define KVM_SMI _IO(KVMIO, 0xb7) +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >>> Why do we need a userspace API for SEA? It can also be done by using >>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >>> this >>> way is you can choose which ESR value to use. >>> >>> Adding a new API call to do something you could do with an old one doesn't >>> look >>> right. >> >> James, I considered your suggestion before that use the >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does >> not have difference to use the alread existed KVM API. > > (Only that is an in-kernel helper, not a published API) yes, the kvm_inject_dabt is an in-kernel API. > > >> so may be >> changing the vcpu registers in qemu will duplicate with the KVM APIs. > > That is true, but the alternative is a new API that doesn't do anything new, > its > just more convenient. > > Marc and Christoffer are the people to convince. > I argue the existing API is sufficient. > > >> injection a SEA is no more than setting some registers: elr_el1, PC, >>
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi James, thanks a lot for your answer. On 2017/5/9 1:28, James Morse wrote: > Hi gengdongjiu, > > On 04/05/17 17:52, gengdongjiu wrote: >> 2017-05-04 23:42 GMT+08:00 gengdongjiu : >>> On 30/04/17 06:37, Dongjiu Geng wrote: diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 105b6ab..a96594f 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c +static void kvm_handle_bad_page(unsigned long address, + bool hugetlb, bool hwpoison) +{ + /* handle both hwpoison and other synchronous external Abort */ + if (hwpoison) + kvm_send_signal(address, hugetlb, true); + else + kvm_send_signal(address, hugetlb, false); +} >>> >>> Why the extra level of indirection? We only want to signal userspace like >>> this >>> from KVM for hwpoison. Signals for RAS related reasons should come from the >>> bits >>> of the kernel that decoded the error. >> >> For the SEA, the are maily two types: >> 0b01 Synchronous External Abort on memory access. >> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] >> encode the level. > > (KVM shouldn't have to make decisions about this) > > >> hwpoison should belong to the "Synchronous External Abort on memory access" >> if the SEA type is not hwpoison, such as page table walk, do you mean >> KVM do not deliver the SIGBUS? > > > The flow of events should be SEI/SEA from firmware to the hosts's APEI code. > KVM > should only be involved to get us back to the host if we were running a guest. > The APEI/hwpoison code may cause a set of processes to be sent signals. The > code > in mm/memory-failure.c does this by walking the process rmaps using the > physical > addresses in the CPER records. > > We want user space to be sent signals as this can (and should) work in exactly > the same way on arm64 as it does on x86 or any other architecture. If a > web-browser can handle SIGBUS notifications for memory-corruption, it > shouldn't > have to care what architecture it is running on. Ok, James, understand. > > So what is that KVM+SIGBUS patch about?... > >>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >>> users, >>> Qemu and KVM. This isn't the example of how user-space gets signalled.) > > KVM creates guests as if they were additional users of Qemu's memory. The code > in mm/memory-failure.c may find that Qemu didn't have the affected page mapped > to user-space - but it may have been in use by stage2. > > The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when > the > guest touches the hwpoison page as if Qemu had touched the page itself. > > Signals from KVM is a corner case, for firmware-first decisions should happen > in > the APEI code based on CPER records. > > >> If so, how the KVM handle the SEA type other than hwpoison? > > To deliver to a guest? It shouldn't have to know, user space should use a KVM > API to drive this. > > When received from hardware? It shouldn't have to care, these things should be > passed into the APEI code for handling. KVM just needs to put the host > registers > back. Recently I confirmed with the hardware team. they said almost all the SEA errors have the Poison flag, so may be there is no need to consider other SEA errors other than hwPoison. only consider SEA hwpoison errors can be enough. > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index bb02909..1d2e2e7 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state) /* Available with KVM_CAP_X86_SMM */ #define KVM_SMI _IO(KVMIO, 0xb7) +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >>> Why do we need a userspace API for SEA? It can also be done by using >>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >>> this >>> way is you can choose which ESR value to use. >>> >>> Adding a new API call to do something you could do with an old one doesn't >>> look >>> right. >> >> James, I considered your suggestion before that use the >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does >> not have difference to use the alread existed KVM API. > > (Only that is an in-kernel helper, not a published API) yes, the kvm_inject_dabt is an in-kernel API. > > >> so may be >> changing the vcpu registers in qemu will duplicate with the KVM APIs. > > That is true, but the alternative is a new API that doesn't do anything new, > its > just more convenient. > > Marc and Christoffer are the people to convince. > I argue the existing API is sufficient. > > >> injection a SEA is no more than setting some registers: elr_el1, PC, >> PSTATE, SPSR_el1, far_el1,
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi Christoffer, On 08/05/17 18:54, Christoffer Dall wrote: > On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: > I must admit I am losing track of exactly what this proposed API was > supposed to do. There are two, and we keep jumping between them! This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these today using the KVM_GET/SET_ONE_REG API whenever it wants to. SEI uses SError, is asynchronous and can be masked. In addition these need to be consumed/synchronised by the ESB instruction, even when executed by a guest. Hardware has the necessary bits to drive all this, we need to expose an API to drive it. (I try to spell them out each time so I don't confuse SEI with something synchronous!) This patch was about SEA. I think you've answered our question: > However, if it's a question about setting up VCPU registers to a certain > state and potentially modifying memory, then I think experience has > shown us (psci) that emulating something in the kernel that userspace > can have fine-grained control over is a bad idea, and should be left to > userspace using as generic APIs as possible. > > Furthermore, if I understand what injecting a SEA requires, it is very > similar to resetting the CPU and loading data into guest memory, which > QEMU already does today, and there is no reason to introduce additional > APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi Christoffer, On 08/05/17 18:54, Christoffer Dall wrote: > On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: > I must admit I am losing track of exactly what this proposed API was > supposed to do. There are two, and we keep jumping between them! This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'. SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these today using the KVM_GET/SET_ONE_REG API whenever it wants to. SEI uses SError, is asynchronous and can be masked. In addition these need to be consumed/synchronised by the ESB instruction, even when executed by a guest. Hardware has the necessary bits to drive all this, we need to expose an API to drive it. (I try to spell them out each time so I don't confuse SEI with something synchronous!) This patch was about SEA. I think you've answered our question: > However, if it's a question about setting up VCPU registers to a certain > state and potentially modifying memory, then I think experience has > shown us (psci) that emulating something in the kernel that userspace > can have fine-grained control over is a bad idea, and should be left to > userspace using as generic APIs as possible. > > Furthermore, if I understand what injecting a SEA requires, it is very > similar to resetting the CPU and loading data into guest memory, which > QEMU already does today, and there is no reason to introduce additional > APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: [...] > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >>> index bb02909..1d2e2e7 100644 > >>> --- a/include/uapi/linux/kvm.h > >>> +++ b/include/uapi/linux/kvm.h > >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { > >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct > >>> kvm_s390_irq_state) > >>> /* Available with KVM_CAP_X86_SMM */ > >>> #define KVM_SMI _IO(KVMIO, 0xb7) > >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) > >>> > >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) > >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) > >>> > >> > >> Why do we need a userspace API for SEA? It can also be done by using > >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing > >> it this > >> way is you can choose which ESR value to use. > >> > >> Adding a new API call to do something you could do with an old one doesn't > >> look > >> right. > > > > James, I considered your suggestion before that use the > > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > > not have difference to use the alread existed KVM API. > > (Only that is an in-kernel helper, not a published API) > > > > so may be > > changing the vcpu registers in qemu will duplicate with the KVM APIs. > > That is true, but the alternative is a new API that doesn't do anything new, > its > just more convenient. > > Marc and Christoffer are the people to convince. > I argue the existing API is sufficient. > I must admit I am losing track of exactly what this proposed API was supposed to do. However, if it's a question about setting up VCPU registers to a certain state and potentially modifying memory, then I think experience has shown us (psci) that emulating something in the kernel that userspace can have fine-grained control over is a bad idea, and should be left to userspace using as generic APIs as possible. Furthermore, if I understand what injecting a SEA requires, it is very similar to resetting the CPU and loading data into guest memory, which QEMU already does today, and there is no reason to introduce additional APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. Thanks, -Christoffer
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote: [...] > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >>> index bb02909..1d2e2e7 100644 > >>> --- a/include/uapi/linux/kvm.h > >>> +++ b/include/uapi/linux/kvm.h > >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { > >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct > >>> kvm_s390_irq_state) > >>> /* Available with KVM_CAP_X86_SMM */ > >>> #define KVM_SMI _IO(KVMIO, 0xb7) > >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) > >>> > >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) > >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) > >>> > >> > >> Why do we need a userspace API for SEA? It can also be done by using > >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing > >> it this > >> way is you can choose which ESR value to use. > >> > >> Adding a new API call to do something you could do with an old one doesn't > >> look > >> right. > > > > James, I considered your suggestion before that use the > > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > > not have difference to use the alread existed KVM API. > > (Only that is an in-kernel helper, not a published API) > > > > so may be > > changing the vcpu registers in qemu will duplicate with the KVM APIs. > > That is true, but the alternative is a new API that doesn't do anything new, > its > just more convenient. > > Marc and Christoffer are the people to convince. > I argue the existing API is sufficient. > I must admit I am losing track of exactly what this proposed API was supposed to do. However, if it's a question about setting up VCPU registers to a certain state and potentially modifying memory, then I think experience has shown us (psci) that emulating something in the kernel that userspace can have fine-grained control over is a bad idea, and should be left to userspace using as generic APIs as possible. Furthermore, if I understand what injecting a SEA requires, it is very similar to resetting the CPU and loading data into guest memory, which QEMU already does today, and there is no reason to introduce additional APIs if it can be done using KVM_GET/SET_ONE_REG ioctls. Thanks, -Christoffer
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 04/05/17 17:52, gengdongjiu wrote: > 2017-05-04 23:42 GMT+08:00 gengdongjiu: >> On 30/04/17 06:37, Dongjiu Geng wrote: >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 105b6ab..a96594f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> +static void kvm_handle_bad_page(unsigned long address, >>> + bool hugetlb, bool hwpoison) >>> +{ >>> + /* handle both hwpoison and other synchronous external Abort */ >>> + if (hwpoison) >>> + kvm_send_signal(address, hugetlb, true); >>> + else >>> + kvm_send_signal(address, hugetlb, false); >>> +} >> >> Why the extra level of indirection? We only want to signal userspace like >> this >> from KVM for hwpoison. Signals for RAS related reasons should come from the >> bits >> of the kernel that decoded the error. > > For the SEA, the are maily two types: > 0b01 Synchronous External Abort on memory access. > 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] > encode the level. (KVM shouldn't have to make decisions about this) > hwpoison should belong to the "Synchronous External Abort on memory access" > if the SEA type is not hwpoison, such as page table walk, do you mean > KVM do not deliver the SIGBUS? The flow of events should be SEI/SEA from firmware to the hosts's APEI code. KVM should only be involved to get us back to the host if we were running a guest. The APEI/hwpoison code may cause a set of processes to be sent signals. The code in mm/memory-failure.c does this by walking the process rmaps using the physical addresses in the CPER records. We want user space to be sent signals as this can (and should) work in exactly the same way on arm64 as it does on x86 or any other architecture. If a web-browser can handle SIGBUS notifications for memory-corruption, it shouldn't have to care what architecture it is running on. So what is that KVM+SIGBUS patch about?... >> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >> users, >> Qemu and KVM. This isn't the example of how user-space gets signalled.) KVM creates guests as if they were additional users of Qemu's memory. The code in mm/memory-failure.c may find that Qemu didn't have the affected page mapped to user-space - but it may have been in use by stage2. The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when the guest touches the hwpoison page as if Qemu had touched the page itself. Signals from KVM is a corner case, for firmware-first decisions should happen in the APEI code based on CPER records. > If so, how the KVM handle the SEA type other than hwpoison? To deliver to a guest? It shouldn't have to know, user space should use a KVM API to drive this. When received from hardware? It shouldn't have to care, these things should be passed into the APEI code for handling. KVM just needs to put the host registers back. >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>> index bb02909..1d2e2e7 100644 >>> --- a/include/uapi/linux/kvm.h >>> +++ b/include/uapi/linux/kvm.h >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct >>> kvm_s390_irq_state) >>> /* Available with KVM_CAP_X86_SMM */ >>> #define KVM_SMI _IO(KVMIO, 0xb7) >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >>> >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >> >> Why do we need a userspace API for SEA? It can also be done by using >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >> this >> way is you can choose which ESR value to use. >> >> Adding a new API call to do something you could do with an old one doesn't >> look >> right. > > James, I considered your suggestion before that use the > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > not have difference to use the alread existed KVM API. (Only that is an in-kernel helper, not a published API) > so may be > changing the vcpu registers in qemu will duplicate with the KVM APIs. That is true, but the alternative is a new API that doesn't do anything new, its just more convenient. Marc and Christoffer are the people to convince. I argue the existing API is sufficient. > injection a SEA is no more than setting some registers: elr_el1, PC, > PSTATE, SPSR_el1, far_el1, esr_el1 > I seen this KVM API do the same thing as Qemu. do you found call this > API will have issue and necessary to choose another ESR value? Should we let user-space pick the ESR to deliver to the guest? Yes, letting user-space specify the ESR gives the most flexibility to do something clever in the future. An obvious choice for SEA is between the external-abort and 'parity or ECC error' codes. If we tell user-space which of these happened (I don't think Linux does today) then Qemu can relay that information to the guest. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Hi gengdongjiu, On 04/05/17 17:52, gengdongjiu wrote: > 2017-05-04 23:42 GMT+08:00 gengdongjiu : >> On 30/04/17 06:37, Dongjiu Geng wrote: >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 105b6ab..a96594f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> +static void kvm_handle_bad_page(unsigned long address, >>> + bool hugetlb, bool hwpoison) >>> +{ >>> + /* handle both hwpoison and other synchronous external Abort */ >>> + if (hwpoison) >>> + kvm_send_signal(address, hugetlb, true); >>> + else >>> + kvm_send_signal(address, hugetlb, false); >>> +} >> >> Why the extra level of indirection? We only want to signal userspace like >> this >> from KVM for hwpoison. Signals for RAS related reasons should come from the >> bits >> of the kernel that decoded the error. > > For the SEA, the are maily two types: > 0b01 Synchronous External Abort on memory access. > 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] > encode the level. (KVM shouldn't have to make decisions about this) > hwpoison should belong to the "Synchronous External Abort on memory access" > if the SEA type is not hwpoison, such as page table walk, do you mean > KVM do not deliver the SIGBUS? The flow of events should be SEI/SEA from firmware to the hosts's APEI code. KVM should only be involved to get us back to the host if we were running a guest. The APEI/hwpoison code may cause a set of processes to be sent signals. The code in mm/memory-failure.c does this by walking the process rmaps using the physical addresses in the CPER records. We want user space to be sent signals as this can (and should) work in exactly the same way on arm64 as it does on x86 or any other architecture. If a web-browser can handle SIGBUS notifications for memory-corruption, it shouldn't have to care what architecture it is running on. So what is that KVM+SIGBUS patch about?... >> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >> users, >> Qemu and KVM. This isn't the example of how user-space gets signalled.) KVM creates guests as if they were additional users of Qemu's memory. The code in mm/memory-failure.c may find that Qemu didn't have the affected page mapped to user-space - but it may have been in use by stage2. The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when the guest touches the hwpoison page as if Qemu had touched the page itself. Signals from KVM is a corner case, for firmware-first decisions should happen in the APEI code based on CPER records. > If so, how the KVM handle the SEA type other than hwpoison? To deliver to a guest? It shouldn't have to know, user space should use a KVM API to drive this. When received from hardware? It shouldn't have to care, these things should be passed into the APEI code for handling. KVM just needs to put the host registers back. >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>> index bb02909..1d2e2e7 100644 >>> --- a/include/uapi/linux/kvm.h >>> +++ b/include/uapi/linux/kvm.h >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct >>> kvm_s390_irq_state) >>> /* Available with KVM_CAP_X86_SMM */ >>> #define KVM_SMI _IO(KVMIO, 0xb7) >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >>> >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >> >> Why do we need a userspace API for SEA? It can also be done by using >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >> this >> way is you can choose which ESR value to use. >> >> Adding a new API call to do something you could do with an old one doesn't >> look >> right. > > James, I considered your suggestion before that use the > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > not have difference to use the alread existed KVM API. (Only that is an in-kernel helper, not a published API) > so may be > changing the vcpu registers in qemu will duplicate with the KVM APIs. That is true, but the alternative is a new API that doesn't do anything new, its just more convenient. Marc and Christoffer are the people to convince. I argue the existing API is sufficient. > injection a SEA is no more than setting some registers: elr_el1, PC, > PSTATE, SPSR_el1, far_el1, esr_el1 > I seen this KVM API do the same thing as Qemu. do you found call this > API will have issue and necessary to choose another ESR value? Should we let user-space pick the ESR to deliver to the guest? Yes, letting user-space specify the ESR gives the most flexibility to do something clever in the future. An obvious choice for SEA is between the external-abort and 'parity or ECC error' codes. If we tell user-space which of these happened (I don't think Linux does today) then Qemu can relay that information to the guest. Thanks, James
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
HI James, 2017-05-05 0:52 GMT+08:00 gengdongjiu: > Dear James, >Thanks a lot for your review and comments. I am very sorry for the > late response. > > > 2017-05-04 23:42 GMT+08:00 gengdongjiu : >> Hi Dongjiu Geng, >> >> On 30/04/17 06:37, Dongjiu Geng wrote: >>> when happen SEA, deliver signal bus and handle the ioctl that >>> inject SEA abort to guest, so that guest can handle the SEA error. >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 105b6ab..a96594f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> @@ -20,8 +20,10 @@ >>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >>> __coherent_cache_guest_page(vcpu, pfn, size); >>> } >>> >>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool >>> hwpoison) >>> +{ >>> + siginfo_t info; >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + if (hwpoison) >>> + info.si_code= BUS_MCEERR_AR; >>> + else >>> + info.si_code= 0; >>> + >>> + info.si_addr= (void __user *)address; >>> + if (hugetlb) >>> + info.si_addr_lsb = PMD_SHIFT; >>> + else >>> + info.si_addr_lsb = PAGE_SHIFT; >>> + >>> + send_sig_info(SIGBUS, , current); >>> +} >>> + >> « [hide part of quote] >> >> Punit reviewed the other version of this patch, this PMD_SHIFT is not the >> right >> thing to do, it needs a more accurate set of calls and shifts as there may be >> hugetlbfs pages other than PMD_SIZE. >> >> https://www.spinics.net/lists/arm-kernel/msg568919.html >> >> I haven't posted a new version of that patch because I was still hunting a >> bug >> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT >> returned to userspace instead of this hwpoison code being invoked. > > Ok, got it, thanks for your information. >> >> Please avoid duplicating functionality between patches, it wastes reviewers >> time, especially when we know there are problems with this approach. >> >> >>> +static void kvm_handle_bad_page(unsigned long address, >>> + bool hugetlb, bool hwpoison) >>> +{ >>> + /* handle both hwpoison and other synchronous external Abort */ >>> + if (hwpoison) >>> + kvm_send_signal(address, hugetlb, true); >>> + else >>> + kvm_send_signal(address, hugetlb, false); >>> +} >> >> Why the extra level of indirection? We only want to signal userspace like >> this >> from KVM for hwpoison. Signals for RAS related reasons should come from the >> bits >> of the kernel that decoded the error. > > For the SEA, the are maily two types: > 0b01 Synchronous External Abort on memory access. > 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] > encode the level. > > hwpoison should belong to the "Synchronous External Abort on memory access" > if the SEA type is not hwpoison, such as page table walk, do you mean > KVM do not deliver the SIGBUS? > If so, how the KVM handle the SEA type other than hwpoison? > >> >> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >> users, >> Qemu and KVM. This isn't the example of how user-space gets signalled.) >> >> >>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c >>> index b37446a..780e3c4 100644 >>> --- a/arch/arm64/kvm/guest.c >>> +++ b/arch/arm64/kvm/guest.c >>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu >>> *vcpu, >>> return -EINVAL; >>> } >>> >>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu) >>> +{ >>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >>> + >>> + return 0; >>> +} >> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>> index bb02909..1d2e2e7 100644 >>> --- a/include/uapi/linux/kvm.h >>> +++ b/include/uapi/linux/kvm.h >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct >>> kvm_s390_irq_state) >>> /* Available with KVM_CAP_X86_SMM */ >>> #define KVM_SMI _IO(KVMIO, 0xb7) >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >>> >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >> >> Why do we need a userspace API for SEA? It can also be done by using >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >> this >> way is you can choose which ESR value to use. >> >> Adding a new API call to do something you could do with an old one doesn't >> look >> right. > > James, I considered your suggestion before that use the > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > not have difference to use the alread existed KVM API. so may be > changing the vcpu registers in qemu will duplicate with the KVM APIs. > > injection a SEA is no more than setting some registers: elr_el1, PC, > PSTATE, SPSR_el1, far_el1, esr_el1 > I seen this KVM API do the same thing as Qemu. do you found call this > API will have issue and necessary to choose another ESR
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
HI James, 2017-05-05 0:52 GMT+08:00 gengdongjiu : > Dear James, >Thanks a lot for your review and comments. I am very sorry for the > late response. > > > 2017-05-04 23:42 GMT+08:00 gengdongjiu : >> Hi Dongjiu Geng, >> >> On 30/04/17 06:37, Dongjiu Geng wrote: >>> when happen SEA, deliver signal bus and handle the ioctl that >>> inject SEA abort to guest, so that guest can handle the SEA error. >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 105b6ab..a96594f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> @@ -20,8 +20,10 @@ >>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >>> __coherent_cache_guest_page(vcpu, pfn, size); >>> } >>> >>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool >>> hwpoison) >>> +{ >>> + siginfo_t info; >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + if (hwpoison) >>> + info.si_code= BUS_MCEERR_AR; >>> + else >>> + info.si_code= 0; >>> + >>> + info.si_addr= (void __user *)address; >>> + if (hugetlb) >>> + info.si_addr_lsb = PMD_SHIFT; >>> + else >>> + info.si_addr_lsb = PAGE_SHIFT; >>> + >>> + send_sig_info(SIGBUS, , current); >>> +} >>> + >> « [hide part of quote] >> >> Punit reviewed the other version of this patch, this PMD_SHIFT is not the >> right >> thing to do, it needs a more accurate set of calls and shifts as there may be >> hugetlbfs pages other than PMD_SIZE. >> >> https://www.spinics.net/lists/arm-kernel/msg568919.html >> >> I haven't posted a new version of that patch because I was still hunting a >> bug >> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT >> returned to userspace instead of this hwpoison code being invoked. > > Ok, got it, thanks for your information. >> >> Please avoid duplicating functionality between patches, it wastes reviewers >> time, especially when we know there are problems with this approach. >> >> >>> +static void kvm_handle_bad_page(unsigned long address, >>> + bool hugetlb, bool hwpoison) >>> +{ >>> + /* handle both hwpoison and other synchronous external Abort */ >>> + if (hwpoison) >>> + kvm_send_signal(address, hugetlb, true); >>> + else >>> + kvm_send_signal(address, hugetlb, false); >>> +} >> >> Why the extra level of indirection? We only want to signal userspace like >> this >> from KVM for hwpoison. Signals for RAS related reasons should come from the >> bits >> of the kernel that decoded the error. > > For the SEA, the are maily two types: > 0b01 Synchronous External Abort on memory access. > 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] > encode the level. > > hwpoison should belong to the "Synchronous External Abort on memory access" > if the SEA type is not hwpoison, such as page table walk, do you mean > KVM do not deliver the SIGBUS? > If so, how the KVM handle the SEA type other than hwpoison? > >> >> (hwpoison for KVM is a corner case as Qemu's memory effectively has two >> users, >> Qemu and KVM. This isn't the example of how user-space gets signalled.) >> >> >>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c >>> index b37446a..780e3c4 100644 >>> --- a/arch/arm64/kvm/guest.c >>> +++ b/arch/arm64/kvm/guest.c >>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu >>> *vcpu, >>> return -EINVAL; >>> } >>> >>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu) >>> +{ >>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >>> + >>> + return 0; >>> +} >> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>> index bb02909..1d2e2e7 100644 >>> --- a/include/uapi/linux/kvm.h >>> +++ b/include/uapi/linux/kvm.h >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct >>> kvm_s390_irq_state) >>> /* Available with KVM_CAP_X86_SMM */ >>> #define KVM_SMI _IO(KVMIO, 0xb7) >>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >>> >>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>> >> >> Why do we need a userspace API for SEA? It can also be done by using >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it >> this >> way is you can choose which ESR value to use. >> >> Adding a new API call to do something you could do with an old one doesn't >> look >> right. > > James, I considered your suggestion before that use the > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does > not have difference to use the alread existed KVM API. so may be > changing the vcpu registers in qemu will duplicate with the KVM APIs. > > injection a SEA is no more than setting some registers: elr_el1, PC, > PSTATE, SPSR_el1, far_el1, esr_el1 > I seen this KVM API do the same thing as Qemu. do you found call this > API will have issue and necessary to choose another ESR value? I consider again, you are right.
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Dear James, Thanks a lot for your review and comments. I am very sorry for the late response. 2017-05-04 23:42 GMT+08:00 gengdongjiu: > Hi Dongjiu Geng, > > On 30/04/17 06:37, Dongjiu Geng wrote: >> when happen SEA, deliver signal bus and handle the ioctl that >> inject SEA abort to guest, so that guest can handle the SEA error. > >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 105b6ab..a96594f 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -20,8 +20,10 @@ >> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu >> *vcpu, kvm_pfn_t pfn, >> __coherent_cache_guest_page(vcpu, pfn, size); >> } >> >> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool >> hwpoison) >> +{ >> + siginfo_t info; >> + >> + info.si_signo = SIGBUS; >> + info.si_errno = 0; >> + if (hwpoison) >> + info.si_code= BUS_MCEERR_AR; >> + else >> + info.si_code= 0; >> + >> + info.si_addr= (void __user *)address; >> + if (hugetlb) >> + info.si_addr_lsb = PMD_SHIFT; >> + else >> + info.si_addr_lsb = PAGE_SHIFT; >> + >> + send_sig_info(SIGBUS, , current); >> +} >> + > « [hide part of quote] > > Punit reviewed the other version of this patch, this PMD_SHIFT is not the > right > thing to do, it needs a more accurate set of calls and shifts as there may be > hugetlbfs pages other than PMD_SIZE. > > https://www.spinics.net/lists/arm-kernel/msg568919.html > > I haven't posted a new version of that patch because I was still hunting a bug > in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT > returned to userspace instead of this hwpoison code being invoked. Ok, got it, thanks for your information. > > Please avoid duplicating functionality between patches, it wastes reviewers > time, especially when we know there are problems with this approach. > > >> +static void kvm_handle_bad_page(unsigned long address, >> + bool hugetlb, bool hwpoison) >> +{ >> + /* handle both hwpoison and other synchronous external Abort */ >> + if (hwpoison) >> + kvm_send_signal(address, hugetlb, true); >> + else >> + kvm_send_signal(address, hugetlb, false); >> +} > > Why the extra level of indirection? We only want to signal userspace like this > from KVM for hwpoison. Signals for RAS related reasons should come from the > bits > of the kernel that decoded the error. For the SEA, the are maily two types: 0b01 Synchronous External Abort on memory access. 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] encode the level. hwpoison should belong to the "Synchronous External Abort on memory access" if the SEA type is not hwpoison, such as page table walk, do you mean KVM do not deliver the SIGBUS? If so, how the KVM handle the SEA type other than hwpoison? > > (hwpoison for KVM is a corner case as Qemu's memory effectively has two users, > Qemu and KVM. This isn't the example of how user-space gets signalled.) > > >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c >> index b37446a..780e3c4 100644 >> --- a/arch/arm64/kvm/guest.c >> +++ b/arch/arm64/kvm/guest.c >> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, >> return -EINVAL; >> } >> >> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu) >> +{ >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >> + >> + return 0; >> +} > >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> index bb02909..1d2e2e7 100644 >> --- a/include/uapi/linux/kvm.h >> +++ b/include/uapi/linux/kvm.h >> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state) >> /* Available with KVM_CAP_X86_SMM */ >> #define KVM_SMI _IO(KVMIO, 0xb7) >> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >> >> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >> > > Why do we need a userspace API for SEA? It can also be done by using > KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it > this > way is you can choose which ESR value to use. > > Adding a new API call to do something you could do with an old one doesn't > look > right. James, I considered your suggestion before that use the KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does not have difference to use the alread existed KVM API. so may be changing the vcpu registers in qemu will duplicate with the KVM APIs. injection a SEA is no more than setting some registers: elr_el1, PC, PSTATE, SPSR_el1, far_el1, esr_el1 I seen this KVM API do the same thing as Qemu. do you found call this API will have issue and necessary to choose another ESR value? I pasted the alread existed KVM API code: static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr) { unsigned long cpsr = *vcpu_cpsr(vcpu); bool is_aarch32 = vcpu_mode_is_32bit(vcpu); u32 esr = 0; *vcpu_elr_el1(vcpu)
Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
Dear James, Thanks a lot for your review and comments. I am very sorry for the late response. 2017-05-04 23:42 GMT+08:00 gengdongjiu : > Hi Dongjiu Geng, > > On 30/04/17 06:37, Dongjiu Geng wrote: >> when happen SEA, deliver signal bus and handle the ioctl that >> inject SEA abort to guest, so that guest can handle the SEA error. > >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 105b6ab..a96594f 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -20,8 +20,10 @@ >> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu >> *vcpu, kvm_pfn_t pfn, >> __coherent_cache_guest_page(vcpu, pfn, size); >> } >> >> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool >> hwpoison) >> +{ >> + siginfo_t info; >> + >> + info.si_signo = SIGBUS; >> + info.si_errno = 0; >> + if (hwpoison) >> + info.si_code= BUS_MCEERR_AR; >> + else >> + info.si_code= 0; >> + >> + info.si_addr= (void __user *)address; >> + if (hugetlb) >> + info.si_addr_lsb = PMD_SHIFT; >> + else >> + info.si_addr_lsb = PAGE_SHIFT; >> + >> + send_sig_info(SIGBUS, , current); >> +} >> + > « [hide part of quote] > > Punit reviewed the other version of this patch, this PMD_SHIFT is not the > right > thing to do, it needs a more accurate set of calls and shifts as there may be > hugetlbfs pages other than PMD_SIZE. > > https://www.spinics.net/lists/arm-kernel/msg568919.html > > I haven't posted a new version of that patch because I was still hunting a bug > in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT > returned to userspace instead of this hwpoison code being invoked. Ok, got it, thanks for your information. > > Please avoid duplicating functionality between patches, it wastes reviewers > time, especially when we know there are problems with this approach. > > >> +static void kvm_handle_bad_page(unsigned long address, >> + bool hugetlb, bool hwpoison) >> +{ >> + /* handle both hwpoison and other synchronous external Abort */ >> + if (hwpoison) >> + kvm_send_signal(address, hugetlb, true); >> + else >> + kvm_send_signal(address, hugetlb, false); >> +} > > Why the extra level of indirection? We only want to signal userspace like this > from KVM for hwpoison. Signals for RAS related reasons should come from the > bits > of the kernel that decoded the error. For the SEA, the are maily two types: 0b01 Synchronous External Abort on memory access. 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0] encode the level. hwpoison should belong to the "Synchronous External Abort on memory access" if the SEA type is not hwpoison, such as page table walk, do you mean KVM do not deliver the SIGBUS? If so, how the KVM handle the SEA type other than hwpoison? > > (hwpoison for KVM is a corner case as Qemu's memory effectively has two users, > Qemu and KVM. This isn't the example of how user-space gets signalled.) > > >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c >> index b37446a..780e3c4 100644 >> --- a/arch/arm64/kvm/guest.c >> +++ b/arch/arm64/kvm/guest.c >> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, >> return -EINVAL; >> } >> >> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu) >> +{ >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >> + >> + return 0; >> +} > >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> index bb02909..1d2e2e7 100644 >> --- a/include/uapi/linux/kvm.h >> +++ b/include/uapi/linux/kvm.h >> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping { >> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state) >> /* Available with KVM_CAP_X86_SMM */ >> #define KVM_SMI _IO(KVMIO, 0xb7) >> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8) >> >> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >> > > Why do we need a userspace API for SEA? It can also be done by using > KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it > this > way is you can choose which ESR value to use. > > Adding a new API call to do something you could do with an old one doesn't > look > right. James, I considered your suggestion before that use the KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does not have difference to use the alread existed KVM API. so may be changing the vcpu registers in qemu will duplicate with the KVM APIs. injection a SEA is no more than setting some registers: elr_el1, PC, PSTATE, SPSR_el1, far_el1, esr_el1 I seen this KVM API do the same thing as Qemu. do you found call this API will have issue and necessary to choose another ESR value? I pasted the alread existed KVM API code: static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr) { unsigned long cpsr = *vcpu_cpsr(vcpu); bool is_aarch32 = vcpu_mode_is_32bit(vcpu); u32 esr = 0; *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);