On Sat, Jul 9, 2016 at 4:38 AM, Richard Henderson <r...@twiddle.net> wrote:
> From: Samuel Damashek <samuel.damas...@invincea.com>
>
> As it currently stands, QEMU does not properly handle self-modifying code
> when the write is unaligned and crosses a page boundary. The procedure
> for handling a write to the current translation block is to write-protect
> the current translation block, catch the write, split the translation
> block into the current instruction (which remains write-protected so that
> it is not modified) and the remaining instructions, and then restore the
> CPU state to before the write occurred so the write will be retried and
> successfully executed. However, since unaligned writes across pages are
> split into one-byte writes for simplicity, writes to the second page
> (which is not the current TB) may succeed before a write to the current
> TB is attempted. Since these writes are not invalidated before the CPU
> state is restored after splitting the TB, they are performed a second
> time, corrupting the second page. Credit goes to Patrick Hulin for
> discovering this.
>
> In recent 64-bit versions of Windows running in emulated mode, this
> results in the guest either being very unstable (a BSOD after a couple of
> minutes of uptime) or being entirely unable to boot. Windows performs
> one or more
> 8-byte unaligned self-modifying writes (xors) which intersect the end
> of the current TB and the beginning of the next TB, which runs into the
> aforementioned issue. This commit fixes that issue by making the
> unaligned write loop perform the writes in forwards order, instead of
> reverse order. This way, QEMU immediately tries to write to the current
> TB, and splits the TB before any write to the second page is executed.
> The write then proceeds as intended. With this patch applied, I am able
> to boot and use Windows 7 64-bit and Windows 10 64-bit in QEMU without
> KVM.
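
To make the failure mode concrete, here is a small stand-alone program (not
QEMU code: tiny fake pages, a made-up 4-byte xor read-modify-write, and a
simulated fault and restart, purely for illustration). With the reverse-order
loop the second page ends up xored twice after the restart; with the
forward-order loop the first byte faults before anything reaches the second
page:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 16                    /* tiny "pages" keep the example readable */

static uint8_t mem[2 * PAGE_SIZE];      /* two adjacent fake pages */

/* Byte-wise little-endian store, in the given direction.  Returns -1 when a
   byte would land on the protected first page, simulating the fault QEMU
   takes on the write-protected page holding the current TB. */
static int store4(size_t addr, uint32_t val, int forward, int first_page_protected)
{
    for (int n = 0; n < 4; n++) {
        int i = forward ? n : 3 - n;
        if (first_page_protected && addr + i < PAGE_SIZE) {
            return -1;                  /* fault before this byte is written */
        }
        mem[addr + i] = (uint8_t)(val >> (i * 8));
    }
    return 0;
}

/* One attempt at a guest "xor dword [addr], -1": read, modify, store. */
static int exec_xor(size_t addr, int forward, int first_page_protected)
{
    uint32_t old;
    memcpy(&old, &mem[addr], 4);        /* assumes a little-endian host */
    return store4(addr, old ^ 0xffffffffu, forward, first_page_protected);
}

static void run(int forward, const char *label)
{
    size_t addr = PAGE_SIZE - 2;        /* store crosses the page boundary */

    memset(mem, 0, sizeof(mem));
    if (exec_xor(addr, forward, 1) < 0) {
        /* The TB has been split and the page un-protected; the whole
           instruction restarts and re-reads whatever is in memory now. */
        exec_xor(addr, forward, 0);
    }
    printf("%s: %02x %02x | %02x %02x\n", label,
           mem[addr], mem[addr + 1], mem[addr + 2], mem[addr + 3]);
}

int main(void)
{
    run(0, "reverse");                  /* ff ff | 00 00: second page xored twice */
    run(1, "forward");                  /* ff ff | ff ff: correct result */
    return 0;
}
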
>
> Per Richard Henderson's input, this patch also ensures the second page
> is in the TLB before executing the write loop, so that the second page
> is mapped.
>
> The original discussion of the issue is located at
> http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg02161.html.
>
> Signed-off-by: Samuel Damashek <samuel.damas...@invincea.com>
> Message-Id: <20160706182652.16190-1-samuel.damas...@invincea.com>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  softmmu_template.h | 44 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index aeab016..284ab2c 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -370,12 +370,25 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> +        int i, index2;
> +        target_ulong page2, tlb_addr2;
>      do_unaligned_access:
> -        /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache.  */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> +        /* Ensure the second page is in the TLB.  Note that the first page
> +           is already guaranteed to be filled, and that the second page
> +           cannot evict the first.  */
> +        page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;

Should this be (addr + DATA_SIZE - 1)?
A minor optimization: we could first compare the second page with the first
and skip the TLB probe entirely when they are the same.
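Something along these lines, perhaps (untested, just a sketch of both points,
reusing the variables the patch already declares):

        page2 = (addr + DATA_SIZE - 1) & TARGET_PAGE_MASK;
        if (page2 != (addr & TARGET_PAGE_MASK)) {
            index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
            tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
            if (page2 != (tlb_addr2 & (TARGET_PAGE_MASK | TLB_INVALID_MASK))
                && !VICTIM_TLB_HIT(addr_write, page2)) {
                tlb_fill(ENV_GET_CPU(env), page2, MMU_DATA_STORE,
                         mmu_idx, retaddr);
            }
        }

If I read the earlier goto correctly, we can also land here from the unaligned
I/O case where the store does not actually cross a page, in which case the
comparison also avoids a tlb_fill for a page the store never touches.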

> +        index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +        tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
> +        if (page2 != (tlb_addr2 & (TARGET_PAGE_MASK | TLB_INVALID_MASK))
> +            && !VICTIM_TLB_HIT(addr_write, page2)) {
> +            tlb_fill(ENV_GET_CPU(env), page2, MMU_DATA_STORE,
> +                     mmu_idx, retaddr);
> +        }
> +
> +        /* XXX: not efficient, but simple.  */
> +        /* This loop must go in the forward direction to avoid issues
> +           with self-modifying code in Windows 64-bit.  */
> +        for (i = 0; i < DATA_SIZE; ++i) {
>              /* Little-endian extract.  */
>              uint8_t val8 = val >> (i * 8);
>              /* Note the adjustment at the beginning of the function.
> @@ -440,12 +453,25 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> +        int i, index2;
> +        target_ulong page2, tlb_addr2;
>      do_unaligned_access:
> +        /* Ensure the second page is in the TLB.  Note that the first page
> +           is already guaranteed to be filled, and that the second page
> +           cannot evict the first.  */
> +        page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;

The same as above.

> +        index2 = (page2 >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +        tlb_addr2 = env->tlb_table[mmu_idx][index2].addr_write;
> +        if (page2 != (tlb_addr2 & (TARGET_PAGE_MASK | TLB_INVALID_MASK))
> +            && !VICTIM_TLB_HIT(addr_write, page2)) {
> +            tlb_fill(ENV_GET_CPU(env), page2, MMU_DATA_STORE,
> +                     mmu_idx, retaddr);
> +        }
> +
>          /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache.  */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> +        /* This loop must go in the forward direction to avoid issues
> +           with self-modifying code.  */
> +        for (i = 0; i < DATA_SIZE; ++i) {
>              /* Big-endian extract.  */
>              uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
>              /* Note the adjustment at the beginning of the function.
> --
> 2.7.4
>
>
