Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-12 Thread Richard Henderson

On 10/12/2016 06:18 AM, Artyom Tarasenko wrote:
>> What I would most like to see, for QEMU, is an artificial sun4v compatible
>> machine that implements a "hardware" page table walk.  I.e. no use of
>> SparcTLBEntry, but walking the page tables directly.
>>
>> Because QEMU can then satisfy a page lookup internally, without having to
>> longjmp out of a memory reference in progress in order to restart the cpu
>> for the software TLB miss handler, the emulation runs about 30-50% faster.
>> At least that has been my experience emulating Alpha vs MIPS.
>>
>> It would require custom roms, but those should be fairly easy to modify from
>> the existing source.
>
> Maybe it's even possible without the modifications. For instance,
> implement the table walk compatible with the current hypervisor, and
> then just add possibility to overlay hypervisor call using some CPU
> feature flag.

Maybe so.  What we lack is being given direct access to the page table base.
But we know that the CPU structure is in the hypervisor shadow register 0, and
that offset CPU_ROOT is the page table base.

As long as we're willing to hard-code these two facts concerning any rom we
care to load, we could in fact implement the tlb miss success path inside QEMU.
We would let the rom re-do the work for the tlb miss failure path, on the way
to raising the exception with the supervisor.



r~
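
A minimal sketch of the success/failure split described above, not QEMU code. It assumes a toy word-addressed guest memory, a flat one-level table of TTEs indexed by virtual page number, and made-up names throughout (DemoMachine, hscratch0, DEMO_CPU_ROOT, demo_walk). The real hypervisor layout and TTE format are not reproduced here; only the two hard-coded facts (per-CPU struct address in a shadow register, table base at a fixed offset in it) are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MEM_WORDS  64            /* size of the toy guest RAM */
#define DEMO_CPU_ROOT   2             /* hypothetical offset, in 8-byte words */
#define DEMO_PAGE_SHIFT 13            /* 8 KiB pages */
#define DEMO_PT_ENTRIES 8             /* entries in the toy page table */
#define DEMO_TTE_VALID  (1ULL << 63)

typedef struct {
    uint64_t mem[DEMO_MEM_WORDS];     /* toy guest RAM, word-addressed */
    uint64_t hscratch0;               /* word index of the per-CPU struct */
} DemoMachine;

/*
 * Success path: resolve vaddr entirely inside the emulator.
 * Returns false on any miss, in which case the caller would raise the
 * usual trap and let the rom's software handler redo the work.
 */
static bool demo_walk(const DemoMachine *m, uint64_t vaddr, uint64_t *paddr)
{
    assert(m != NULL && paddr != NULL);

    /* Fact 1: the shadow register points at the per-CPU structure. */
    if (m->hscratch0 + DEMO_CPU_ROOT >= DEMO_MEM_WORDS) {
        return false;
    }
    /* Fact 2: a fixed offset in that structure holds the table base. */
    uint64_t root = m->mem[m->hscratch0 + DEMO_CPU_ROOT];

    uint64_t vpn = vaddr >> DEMO_PAGE_SHIFT;
    if (vpn >= DEMO_PT_ENTRIES || root >= DEMO_MEM_WORDS - DEMO_PT_ENTRIES) {
        return false;
    }

    uint64_t tte = m->mem[root + vpn];
    if (!(tte & DEMO_TTE_VALID)) {
        return false;                 /* failure path: leave it to the rom */
    }

    uint64_t page_mask = (1ULL << DEMO_PAGE_SHIFT) - 1;
    *paddr = (tte & ~DEMO_TTE_VALID & ~page_mask) | (vaddr & page_mask);
    return true;
}
```

The point of the shape is that a hit returns directly to the memory access in progress, and only a miss takes the slow route through the guest's own handler.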



Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-12 Thread Artyom Tarasenko
On Tue, Oct 11, 2016 at 5:08 PM, Richard Henderson wrote:
> On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
>> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson wrote:
>>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>>> Hmm.  Would it make more sense to reorg these as
>>>>>
>>>>>   TTE_US1_*
>>>>>   TTE_UA2005_*
>>>>>
>>>>> with some duplication for the bits that are shared?
>>>>> As is, it's pretty hard to tell which actually change...
>>>>
>>>> All of them :-)
>>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>>> read access.
>>>>
>>>> On the other hand, it's not used in T2. And then again we don't have the
>>>> T2 emulation yet.
>>>
>>>
>>>
>>> Oh my.  Different on T2 as well?
>>
>>
>> T2 has more used bits, and can not use the US1 format, I think.
>>
>>> I wonder if it would make sense to have different functions with which to
>>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as
>>> necessary) for the major entry points.
>>>
>>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked,
>>> so that the choice of how to handle the tlb miss is chosen at startup
>>> time, and not during each fault.  One can arrange subroutines as necessary
>>> to share code between the alternate routines, such as when T1 needs to use
>>> parts of US1.
>>
>>
>> Yes, I plan to do it once I get to T2 emulation.
>
>
> Ok.
>
>>> Similarly for out-of-line ASI handling, which is already beyond messy,
>>> with handling for all cpus thrown in the same switch statement.
>>
>>
>> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
>> ones, call cpu-specific handlers first and standard handler
>> afterwards.
>> But not in this series.
>
>
> Fair enough.
>
> What I would most like to see, for QEMU, is an artificial sun4v compatible
> machine that implements a "hardware" page table walk.  I.e. no use of
> SparcTLBEntry, but walking the page tables directly.
>
> Because QEMU can then satisfy a page lookup internally, without having to
> longjmp out of a memory reference in progress in order to restart the cpu
> for the software TLB miss handler, the emulation runs about 30-50% faster.
> At least that has been my experience emulating Alpha vs MIPS.
>
> It would require custom roms, but those should be fairly easy to modify from
> the existing source.
>

Maybe it's even possible without the modifications. For instance,
implement the table walk compatible with the current hypervisor, and
then just add possibility to overlay hypervisor call using some CPU
feature flag.


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu



Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-11 Thread Richard Henderson

On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson wrote:
>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>> Hmm.  Would it make more sense to reorg these as
>>>>
>>>>   TTE_US1_*
>>>>   TTE_UA2005_*
>>>>
>>>> with some duplication for the bits that are shared?
>>>> As is, it's pretty hard to tell which actually change...
>>>
>>> All of them :-)
>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>> read access.
>>>
>>> On the other hand, it's not used in T2. And then again we don't have the
>>> T2 emulation yet.
>>
>> Oh my.  Different on T2 as well?
>
> T2 has more used bits, and can not use the US1 format, I think.
>
>> I wonder if it would make sense to have different functions with which to
>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
>> for the major entry points.
>>
>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
>> that the choice of how to handle the tlb miss is chosen at startup time, and
>> not during each fault.  One can arrange subroutines as necessary to share
>> code between the alternate routines, such as when T1 needs to use parts of
>> US1.
>
> Yes, I plan to do it once I get to T2 emulation.

Ok.

>> Similarly for out-of-line ASI handling, which is already beyond messy, with
>> handling for all cpus thrown in the same switch statement.
>
> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
> ones, call cpu-specific handlers first and standard handler
> afterwards.
> But not in this series.

Fair enough.

What I would most like to see, for QEMU, is an artificial sun4v compatible
machine that implements a "hardware" page table walk.  I.e. no use of
SparcTLBEntry, but walking the page tables directly.

Because QEMU can then satisfy a page lookup internally, without having to
longjmp out of a memory reference in progress in order to restart the cpu for
the software TLB miss handler, the emulation runs about 30-50% faster.  At
least that has been my experience emulating Alpha vs MIPS.

It would require custom roms, but those should be fairly easy to modify from
the existing source.



r~



Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-11 Thread Artyom Tarasenko
On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson wrote:
> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>
>>> Hmm.  Would it make more sense to reorg these as
>>>
>>>   TTE_US1_*
>>>   TTE_UA2005_*
>>>
>>> with some duplication for the bits that are shared?
>>> As is, it's pretty hard to tell which actually change...
>>
>>
>> All of them :-)
>> I'm not sure about renaming: the US1 format is still used in T1 on the
>> read access.
>>
>> On the other hand, it's not used in T2. And then again we don't have the
>> T2 emulation yet.
>
>
> Oh my.  Different on T2 as well?

T2 has more used bits, and can not use the US1 format, I think.

> I wonder if it would make sense to have different functions with which to
> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
> for the major entry points.
>
> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
> that the choice of how to handle the tlb miss is chosen at startup time, and
> not during each fault.  One can arrange subroutines as necessary to share
> code between the alternate routines, such as when T1 needs to use parts of
> US1.

Yes, I plan to do it once I get to T2 emulation.

> Similarly for out-of-line ASI handling, which is already beyond messy, with
> handling for all cpus thrown in the same switch statement.

Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
ones, call cpu-specific handlers first and standard handler
afterwards.
But not in this series.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
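
A small sketch of the ordering proposed above for ASI handling (CPU-specific handler first, standard handler afterwards). Everything here is hypothetical: the ASI numbers, the handler signature, and the names demo_ld_asi, demo_asi_load_cpu and demo_asi_load_standard are made up for illustration and do not correspond to QEMU's helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASI numbers, for the demo only. */
enum { DEMO_ASI_STANDARD = 0x10, DEMO_ASI_CPU_ONLY = 0x20 };

/* A handler returns true if it recognised the ASI and filled in *val. */
typedef bool (*DemoAsiLoad)(int asi, uint64_t addr, uint64_t *val);

/* Stand-in for the common SPARCv9 ASIs shared by every CPU model. */
static bool demo_asi_load_standard(int asi, uint64_t addr, uint64_t *val)
{
    (void)addr;
    if (asi == DEMO_ASI_STANDARD) {
        *val = 1;
        return true;
    }
    return false;
}

/* Stand-in for one CPU model's private ASIs. */
static bool demo_asi_load_cpu(int asi, uint64_t addr, uint64_t *val)
{
    (void)addr;
    if (asi == DEMO_ASI_CPU_ONLY) {
        *val = 2;
        return true;
    }
    return false;
}

/*
 * Dispatch: try the CPU-specific handler first, then fall back to the
 * standard one. A false result means neither knew the ASI, and the
 * caller would raise the appropriate exception.
 */
static bool demo_ld_asi(DemoAsiLoad cpu_specific, int asi,
                        uint64_t addr, uint64_t *val)
{
    if (cpu_specific != NULL && cpu_specific(asi, addr, val)) {
        return true;
    }
    return demo_asi_load_standard(asi, addr, val);
}
```

Calling the CPU-specific handler first lets a model override a standard ASI when it needs to, while models with no private ASIs can pass NULL.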



Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-10 Thread Richard Henderson

On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>> Hmm.  Would it make more sense to reorg these as
>>
>>   TTE_US1_*
>>   TTE_UA2005_*
>>
>> with some duplication for the bits that are shared?
>> As is, it's pretty hard to tell which actually change...
>
> All of them :-)
> I'm not sure about renaming: the US1 format is still used in T1 on the read
> access.
>
> On the other hand, it's not used in T2. And then again we don't have the T2
> emulation yet.

Oh my.  Different on T2 as well?

I wonder if it would make sense to have different functions with which to fill
in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary) for the
major entry points.

E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
that the choice of how to handle the tlb miss is chosen at startup time, and
not during each fault.  One can arrange subroutines as necessary to share code
between the alternate routines, such as when T1 needs to use parts of US1.

Similarly for out-of-line ASI handling, which is already beyond messy, with
handling for all cpus thrown in the same switch statement.



r~
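
A rough sketch of choosing the MMU routine once at startup instead of testing the CPU model on every fault. The types and names (DemoCPUClass, DemoMMUModel, demo_cpu_class_init) are invented for illustration and are not QEMU's CPUClass API. The bit positions are the TTE_W_OK_BIT values from the patch under discussion, and the shared valid-bit helper assumes bit 63 applies to both formats, since the patch adds no UA2005 variant for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model selector and class struct; not QEMU's real types. */
typedef enum { DEMO_MMU_US1, DEMO_MMU_UA2005 } DemoMMUModel;

typedef struct {
    /* Hook filled in once at startup: does this TTE permit a store? */
    bool (*tte_is_writable)(uint64_t tte);
} DemoCPUClass;

/* Shared subroutine used by both variants (assumed common valid bit). */
static bool demo_tte_valid(uint64_t tte)
{
    return (tte >> 63) & 1;
}

static bool demo_us1_writable(uint64_t tte)
{
    return demo_tte_valid(tte) && (tte & (1ULL << 1));  /* TTE_W_OK_BIT */
}

static bool demo_ua2005_writable(uint64_t tte)
{
    return demo_tte_valid(tte) && (tte & (1ULL << 6));  /* TTE_W_OK_BIT_UA2005 */
}

/* Startup: pick the implementation once, based on the CPU model. */
static void demo_cpu_class_init(DemoCPUClass *cc, DemoMMUModel model)
{
    assert(cc != NULL);
    cc->tte_is_writable = (model == DEMO_MMU_UA2005) ? demo_ua2005_writable
                                                     : demo_us1_writable;
}

/* Fault path: no per-fault model check, just an indirect call. */
static bool demo_store_allowed(const DemoCPUClass *cc, uint64_t tte)
{
    return cc->tte_is_writable(tte);
}
```

Both variants call demo_tte_valid, which is the kind of sharing meant by arranging subroutines between the alternate routines.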



Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-10 Thread Artyom Tarasenko
On 10 Oct 2016 at 23:22, "Richard Henderson" wrote:
>
> On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:
>>
>>  #define TTE_VALID_BIT       (1ULL << 63)
>>  #define TTE_NFO_BIT         (1ULL << 60)
>> +#define TTE_NFO_BIT_UA2005  (1ULL << 62)
>>  #define TTE_USED_BIT        (1ULL << 41)
>> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
>>  #define TTE_LOCKED_BIT      (1ULL <<  6)
>> +#define TTE_LOCKED_BIT_UA2005 (1ULL <<  61)
>>  #define TTE_SIDEEFFECT_BIT  (1ULL <<  3)
>> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL <<  11)
>>  #define TTE_PRIV_BIT        (1ULL <<  2)
>> +#define TTE_PRIV_BIT_UA2005 (1ULL <<  8)
>>  #define TTE_W_OK_BIT        (1ULL <<  1)
>> +#define TTE_W_OK_BIT_UA2005 (1ULL <<  6)
>>  #define TTE_GLOBAL_BIT      (1ULL <<  0)
>
>
> Hmm.  Would it make more sense to reorg these as
>
>   TTE_US1_*
>   TTE_UA2005_*
>
> with some duplication for the bits that are shared?
> As is, it's pretty hard to tell which actually change...

All of them :-)
I'm not sure about renaming: the US1 format is still used in T1 on the read
access.

On the other hand, it's not used in T2. And then again we don't have the T2
emulation yet.

Artyom


Re: [Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-10 Thread Richard Henderson

On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:

>  #define TTE_VALID_BIT       (1ULL << 63)
>  #define TTE_NFO_BIT         (1ULL << 60)
> +#define TTE_NFO_BIT_UA2005  (1ULL << 62)
>  #define TTE_USED_BIT        (1ULL << 41)
> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
>  #define TTE_LOCKED_BIT      (1ULL <<  6)
> +#define TTE_LOCKED_BIT_UA2005 (1ULL <<  61)
>  #define TTE_SIDEEFFECT_BIT  (1ULL <<  3)
> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL <<  11)
>  #define TTE_PRIV_BIT        (1ULL <<  2)
> +#define TTE_PRIV_BIT_UA2005 (1ULL <<  8)
>  #define TTE_W_OK_BIT        (1ULL <<  1)
> +#define TTE_W_OK_BIT_UA2005 (1ULL <<  6)
>  #define TTE_GLOBAL_BIT      (1ULL <<  0)


Hmm.  Would it make more sense to reorg these as

  TTE_US1_*
  TTE_UA2005_*

with some duplication for the bits that are shared?
As is, it's pretty hard to tell which actually change...


r~
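
One way the suggested TTE_US1_* / TTE_UA2005_* split could look, as a sketch only. The bit positions are copied from the quoted patch; the prefixed macro names and the inline helpers (tte_us1_is_locked and friends) are hypothetical and not part of the series. TTE_VALID_BIT is left unprefixed because the patch adds no UA2005 variant for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* No UA2005 variant of this bit appears in the patch. */
#define TTE_VALID_BIT              (1ULL << 63)

/* UltraSPARC I (US1) layout: the existing, unsuffixed values. */
#define TTE_US1_NFO_BIT            (1ULL << 60)
#define TTE_US1_USED_BIT           (1ULL << 41)
#define TTE_US1_LOCKED_BIT         (1ULL <<  6)
#define TTE_US1_SIDEEFFECT_BIT     (1ULL <<  3)
#define TTE_US1_PRIV_BIT           (1ULL <<  2)
#define TTE_US1_W_OK_BIT           (1ULL <<  1)

/* UA2005 layout: the values the patch adds with a _UA2005 suffix. */
#define TTE_UA2005_NFO_BIT         (1ULL << 62)
#define TTE_UA2005_USED_BIT        (1ULL << 47)
#define TTE_UA2005_LOCKED_BIT      (1ULL << 61)
#define TTE_UA2005_SIDEEFFECT_BIT  (1ULL << 11)
#define TTE_UA2005_PRIV_BIT        (1ULL <<  8)
#define TTE_UA2005_W_OK_BIT        (1ULL <<  6)

/* Hypothetical typed accessors; each names its format explicitly. */
static inline bool tte_us1_is_locked(uint64_t tte)
{
    return (tte & TTE_US1_LOCKED_BIT) != 0;
}

static inline bool tte_us1_is_priv(uint64_t tte)
{
    return (tte & TTE_US1_PRIV_BIT) != 0;
}

static inline bool tte_ua2005_is_w_ok(uint64_t tte)
{
    return (tte & TTE_UA2005_W_OK_BIT) != 0;
}

static inline bool tte_ua2005_is_priv(uint64_t tte)
{
    return (tte & TTE_UA2005_PRIV_BIT) != 0;
}
```

With the two groups side by side it is easy to see, for example, that bit 6 means "locked" in the US1 layout and "write OK" in the UA2005 layout.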



[Qemu-devel] [PATCH 03/29] target-sparc: add UA2005 TTE bit #defines

2016-10-01 Thread Artyom Tarasenko
Signed-off-by: Artyom Tarasenko 
---
 target-sparc/cpu.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 238ebf2..2c169e1 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -290,11 +290,17 @@ enum {
 
 #define TTE_VALID_BIT       (1ULL << 63)
 #define TTE_NFO_BIT         (1ULL << 60)
+#define TTE_NFO_BIT_UA2005  (1ULL << 62)
 #define TTE_USED_BIT        (1ULL << 41)
+#define TTE_USED_BIT_UA2005 (1ULL << 47)
 #define TTE_LOCKED_BIT      (1ULL <<  6)
+#define TTE_LOCKED_BIT_UA2005 (1ULL <<  61)
 #define TTE_SIDEEFFECT_BIT  (1ULL <<  3)
+#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL <<  11)
 #define TTE_PRIV_BIT        (1ULL <<  2)
+#define TTE_PRIV_BIT_UA2005 (1ULL <<  8)
 #define TTE_W_OK_BIT        (1ULL <<  1)
+#define TTE_W_OK_BIT_UA2005 (1ULL <<  6)
 #define TTE_GLOBAL_BIT      (1ULL <<  0)
 
 #define TTE_IS_VALID(tte)   ((tte) & TTE_VALID_BIT)
@@ -302,14 +308,24 @@ enum {
 #define TTE_IS_USED(tte)    ((tte) & TTE_USED_BIT)
 #define TTE_IS_LOCKED(tte)  ((tte) & TTE_LOCKED_BIT)
 #define TTE_IS_SIDEEFFECT(tte) ((tte) & TTE_SIDEEFFECT_BIT)
+#define TTE_IS_SIDEEFFECT_UA2005(tte) ((tte) & TTE_SIDEEFFECT_BIT_UA2005)
 #define TTE_IS_PRIV(tte)    ((tte) & TTE_PRIV_BIT)
 #define TTE_IS_W_OK(tte)    ((tte) & TTE_W_OK_BIT)
+
+#define TTE_IS_NFO_UA2005(tte) ((tte) & TTE_NFO_BIT_UA2005)
+#define TTE_IS_USED_UA2005(tte) ((tte) & TTE_USED_BIT_UA2005)
+#define TTE_IS_LOCKED_UA2005(tte) ((tte) & TTE_LOCKED_BIT_UA2005)
+#define TTE_IS_SIDEEFFECT_UA2005(tte) ((tte) & TTE_SIDEEFFECT_BIT_UA2005)
+#define TTE_IS_PRIV_UA2005(tte) ((tte) & TTE_PRIV_BIT_UA2005)
+#define TTE_IS_W_OK_UA2005(tte) ((tte) & TTE_W_OK_BIT_UA2005)
+
 #define TTE_IS_GLOBAL(tte)  ((tte) & TTE_GLOBAL_BIT)
 
 #define TTE_SET_USED(tte)   ((tte) |= TTE_USED_BIT)
 #define TTE_SET_UNUSED(tte) ((tte) &= ~TTE_USED_BIT)
 
 #define TTE_PGSIZE(tte) (((tte) >> 61) & 3ULL)
+#define TTE_PGSIZE_UA2005(tte) ((tte) & 7ULL)
 #define TTE_PA(tte) ((tte) & 0x1ffe000ULL)
 
 #define SFSR_NF_BIT (1ULL << 24)   /* JPS1 NoFault */
-- 
2.7.2
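
As an illustration of why the two page-size macros cannot be mixed, the following sketch decodes the same 64-bit value both ways. The two macros are copied from the patch; the wrapper demo_tte_pgsize and its ua2005 flag are hypothetical. It returns the raw size code only and does not translate it to a byte count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the patch. */
#define TTE_PGSIZE(tte)         (((tte) >> 61) & 3ULL)
#define TTE_PGSIZE_UA2005(tte)  ((tte) & 7ULL)

/*
 * Hypothetical wrapper: return the raw page-size code of a TTE,
 * decoded according to the format the caller says it is in.
 */
static uint64_t demo_tte_pgsize(uint64_t tte, bool ua2005)
{
    return ua2005 ? TTE_PGSIZE_UA2005(tte) : TTE_PGSIZE(tte);
}
```

For a value with bit 61 set and 3 in the low three bits, the UA2005 decode yields size code 3, while the US1 decode yields 1, because bit 61 is TTE_LOCKED_BIT_UA2005 in one layout and part of the size field in the other.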