Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-25 Thread Paul Brook
> I'd prefer a generic address type since this makes it easier to
> share code: for example virtio devices can use different busses,

This is exactly why virtio devices should not be accessing memory. They should 
be doing everything via a virtqueue, which can interface with the host 
bindings appropriately.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-25 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 06:07:35PM -0500, Anthony Liguori wrote:
> On 03/24/2010 06:05 PM, Paul Brook wrote:
>>> On 03/24/2010 05:33 PM, Paul Brook wrote:
>>>  
> But now there is a bigger problem, how to pass the property to the
> device. It's not fair to require the user to remember to set it.
>  
 It should not be a property of the device. All devices have a native
 endianness (for PCI this is little-endian), and the intermediate
 busses/interconnects should determine whether byteswapping occurs.

>>> Right, the byte swapping needs to happen at the bus level which requires
>>> that the PCI regions use a different callback mechanism (and don't
>>> register directly with the cpu).
>>>  
>> Not necessarily a different callback mechanism, but probably a different
>> registration mechanism.
>>
>
> Problem with the current scheme is that it's tied to target_phys_addr_t.  
> It's not a target_phys_addr_t and cannot be used with functions that take 
> target_phys_addr_ts.
>
> We can either introduce a generic address type or we can introduce bus  
> specific addresses and have different callbacks.  The later has better  
> type safety and since this isn't an obvious issue to most developers,  
> that's probably an important feature.
>
> Regards,
>
> Anthony Liguori
>
>> Paul
>>


I'd prefer a generic address type since this makes it easier to
share code: for example virtio devices can use different busses,
it's common for pci host bridges to have common code for i/o
and memory, etc.

-- 
MST




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Jamie Lokier
Anthony Liguori wrote:
> It's a statement of correctness really.  Devices should never deal with 
> target_phys_addr_t's.

Shouldn't they?  On real hardware, 64-bit PCI devices switch to being
32-bit PCI when plugged into a 32-bit motherboard slot.

> The question is, should a pci_addr_t or a sysbus_addr_t be 64 bit or 
> should it be 32-bit on 32-bit platforms.

Depends what you want to emulate.  It's not an accurate emulation if
all the old PCI devices provide 64-bit BARs; that could conceivably
bite some old OS, which expects NE2000-PCI to be a 32-bit device, for example.

And it's not a repeatable emulation if switching beteen 32-bit and
64-bit hosts means the guest sees a change in the PCI devices.  It
should be possible to change hosts with qemu willy nilly with zero
change seen be the guest.

So perhaps the width of pci_addr_t should be a per-device property,
not a host property?

> Honestly, I am extremely sceptical that there would be any
> measurable performance difference.

You may be right, but surely the way to find out is to have a way to
build either, and then compare them.  Not have it dictated by the build system.

64-bit ops on 32-bit hosts, especially x86 due to register pressure,
are noticably more expensive than 32-bit ops.  The question is whether
they are sparse enough among all the other logic that it doesn't matter.

With a bit of make cleverness it should be quite easy to support a mix
of build-once files and build-few-times files according to the minimal
compile time variations those files depend on - and express those
dependencies is a simple, one-liner way.  GNU Make is good for that
sort of thing.

- Jamie




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Jamie Lokier
Juan Quintela wrote:
> Blue Swirl  wrote:
> > Hi,
> >
> > Here's some planning for getting most files compiled as few times as
> > possible. Comments and suggestions are welcome.
> 
> I took some thought about this at some point.  Problems here start from
> "Recursive Makefile condered Harmful (tm)".

Anyone who remembers the old Linux kernel recursive make compared
with the current "dancing makefile" magic will appreciate the big leap
forward in usability.

-- Jamie




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Anthony Liguori

On 03/24/2010 06:05 PM, Paul Brook wrote:

On 03/24/2010 05:33 PM, Paul Brook wrote:
 

But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.
 

It should not be a property of the device. All devices have a native
endianness (for PCI this is little-endian), and the intermediate
busses/interconnects should determine whether byteswapping occurs.
   

Right, the byte swapping needs to happen at the bus level which requires
that the PCI regions use a different callback mechanism (and don't
register directly with the cpu).
 

Not necessarily a different callback mechanism, but probably a different
registration mechanism.
   


Problem with the current scheme is that it's tied to 
target_phys_addr_t.  It's not a target_phys_addr_t and cannot be used 
with functions that take target_phys_addr_ts.


We can either introduce a generic address type or we can introduce bus 
specific addresses and have different callbacks.  The later has better 
type safety and since this isn't an obvious issue to most developers, 
that's probably an important feature.


Regards,

Anthony Liguori


Paul
   






Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
> On 03/24/2010 05:33 PM, Paul Brook wrote:
> >> But now there is a bigger problem, how to pass the property to the
> >> device. It's not fair to require the user to remember to set it.
> >
> > It should not be a property of the device. All devices have a native
> > endianness (for PCI this is little-endian), and the intermediate
> > busses/interconnects should determine whether byteswapping occurs.
> 
> Right, the byte swapping needs to happen at the bus level which requires
> that the PCI regions use a different callback mechanism (and don't
> register directly with the cpu).

Not necessarily a different callback mechanism, but probably a different 
registration mechanism.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Anthony Liguori

On 03/24/2010 05:47 PM, Paul Brook wrote:

Actually, Anthony suggested at some point to just use 64 bits for
TARGET_PHYS_ADDR_BITS and remove the need for hw32/64.

I think that people emulationg 32bits on 32bits would suffer, but have
no clue how much.  Anthony, what was the idea?
 

Sacrificing runtime performance to avoid rebuilding a few files is not
acceptable. I consider the fact that TARGET_PHYS_ADDR_BITS is always 64 on 64-
bit hosts is a bug. It's just hard to fix, and probably even less of a
performance hit, so I haven't bothered yet.
   


It's a statement of correctness really.  Devices should never deal with 
target_phys_addr_t's.


The question is, should a pci_addr_t or a sysbus_addr_t be 64 bit or 
should it be 32-bit on 32-bit platforms.  Honestly, I am extremely 
sceptical that there would be any measurable performance difference.


Regards,

Anthony Liguori


Paul


   






Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Anthony Liguori

On 03/24/2010 05:33 PM, Paul Brook wrote:

But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.
 

It should not be a property of the device. All devices have a native
endianness (for PCI this is little-endian), and the intermediate
busses/interconnects should determine whether byteswapping occurs.
   


Right, the byte swapping needs to happen at the bus level which requires 
that the PCI regions use a different callback mechanism (and don't 
register directly with the cpu).


Regards,

Anthony Liguori


Paul


   






Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
> Actually, Anthony suggested at some point to just use 64 bits for
> TARGET_PHYS_ADDR_BITS and remove the need for hw32/64.
> 
> I think that people emulationg 32bits on 32bits would suffer, but have
> no clue how much.  Anthony, what was the idea?

Sacrificing runtime performance to avoid rebuilding a few files is not 
acceptable. I consider the fact that TARGET_PHYS_ADDR_BITS is always 64 on 64-
bit hosts is a bug. It's just hard to fix, and probably even less of a 
performance hit, so I haven't bothered yet.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
> But now there is a bigger problem, how to pass the property to the
> device. It's not fair to require the user to remember to set it.

It should not be a property of the device. All devices have a native 
endianness (for PCI this is little-endian), and the intermediate 
busses/interconnects should determine whether byteswapping occurs.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Aurelien Jarno
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
> On 3/24/10, Michael S. Tsirkin  wrote:
> > On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
> >  > On 3/24/10, Michael S. Tsirkin  wrote:
> >  > > On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
> >  > >  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> > cpu_physical_memory_read/write.
> >  > >
> >  > >
> >  > > I don't see how it would help. These still get target_phys_addr_t which
> >  > >  is per-target. Further, a ton of devices do
> >  > >  cpu_register_physical_memory/qemu_register_coalesced_mmio.
> >  > >  These are also per target.
> >  >
> >  > I don't know what I was eating yesterday: there are no references to
> >  > ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
> >  > for the device itself, just add a property "be". The attached patch
> >  > performs this part.
> >  >
> >  > But now there is a bigger problem, how to pass the property to the
> >  > device. It's not fair to require the user to remember to set it.
> >
> >
> > I still don't fully understand how come e1000 cares about
> >  target endianness.
> 
> It shouldn't. Maybe the real fix is to remove the byte swapping.
> 

The real fix is actually to add a layer handling bus byte swapping
depending on how bus are connected.

Currently it only works because all big endian boards QEMU emulates
need to byteswap bus access, and none of the little endian boards 
need to do that.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin  wrote:
> On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
>  > On 3/24/10, Michael S. Tsirkin  wrote:
>  > > On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
>  > >  > On 3/24/10, Michael S. Tsirkin  wrote:
>  > >  > > On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
>  > >  > >  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> cpu_physical_memory_read/write.
>  > >  > >
>  > >  > >
>  > >  > > I don't see how it would help. These still get target_phys_addr_t 
> which
>  > >  > >  is per-target. Further, a ton of devices do
>  > >  > >  cpu_register_physical_memory/qemu_register_coalesced_mmio.
>  > >  > >  These are also per target.
>  > >  >
>  > >  > I don't know what I was eating yesterday: there are no references to
>  > >  > ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
>  > >  > for the device itself, just add a property "be". The attached patch
>  > >  > performs this part.
>  > >  >
>  > >  > But now there is a bigger problem, how to pass the property to the
>  > >  > device. It's not fair to require the user to remember to set it.
>  > >
>  > >
>  > > I still don't fully understand how come e1000 cares about
>  > >  target endianness.
>  >
>  > It shouldn't. Maybe the real fix is to remove the byte swapping.
>
>
> Presumably it's there for a reason?
>
>
>  > >  > >  A simple solution would be to change all of cpu_XX functions to
>  > >  > >  get a 64 bit address. This is a lot of churn, if we do this
>  > >  > >  anyway we should also pass length to callbacks, this way
>  > >  > >  rwhandler will get very small or go away completely.
>  > >  >
>  > >  > It's not too much effort to keep the target_phys_addr_t type.
>  > >
>  > >
>  > > I don't understand - target_phys_addr_t is different for different
>  > >  targets to we will need to recompile the code for each target.
>  > >  What am I missing?
>  >
>  > target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
>  > size will be either 64 or 32 bits depending on the target. So the
>  > files are compiled once on 64 bit host, twice on 32 bit host if both
>  > 32 and 64 bits targets are selected.
>
>
> How about just making it 64 bit unconditionally?
>  How much do we save by using a 32 bit address value?

On a 32 bit host, probably a lot because of register pressure. And
it's not too much effort to keep the target_phys_addr_t type logic.




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Anthony Liguori

On 03/24/2010 03:24 PM, Blue Swirl wrote:

On 3/24/10, Michael S. Tsirkin  wrote:
   

On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
  >  On 3/24/10, Michael S. Tsirkin  wrote:
  >  >  On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
  >  >   >  rtl8139.c, e1000.c: need to convert ldl/stl to 
cpu_physical_memory_read/write.
  >  >
  >  >
  >  >  I don't see how it would help. These still get target_phys_addr_t which
  >  >   is per-target. Further, a ton of devices do
  >  >   cpu_register_physical_memory/qemu_register_coalesced_mmio.
  >  >   These are also per target.
  >
  >  I don't know what I was eating yesterday: there are no references to
  >  ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
  >  for the device itself, just add a property "be". The attached patch
  >  performs this part.
  >
  >  But now there is a bigger problem, how to pass the property to the
  >  device. It's not fair to require the user to remember to set it.


I still don't fully understand how come e1000 cares about
  target endianness.
 

It shouldn't. Maybe the real fix is to remove the byte swapping.
   


My previous pci memory functions patches removed the byte swapping.

The problem is that PCI devices are going to operate in little endian 
mode (usually) whereas the CPU may be acting in big endian mode.  We 
need to do a byte swap somewhere but the better place to do it is in the 
PCI bus layer.


Regards,

Anthony Liguori




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
> On 3/24/10, Michael S. Tsirkin  wrote:
> > On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
> >  > On 3/24/10, Michael S. Tsirkin  wrote:
> >  > > On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
> >  > >  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> > cpu_physical_memory_read/write.
> >  > >
> >  > >
> >  > > I don't see how it would help. These still get target_phys_addr_t which
> >  > >  is per-target. Further, a ton of devices do
> >  > >  cpu_register_physical_memory/qemu_register_coalesced_mmio.
> >  > >  These are also per target.
> >  >
> >  > I don't know what I was eating yesterday: there are no references to
> >  > ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
> >  > for the device itself, just add a property "be". The attached patch
> >  > performs this part.
> >  >
> >  > But now there is a bigger problem, how to pass the property to the
> >  > device. It's not fair to require the user to remember to set it.
> >
> >
> > I still don't fully understand how come e1000 cares about
> >  target endianness.
> 
> It shouldn't. Maybe the real fix is to remove the byte swapping.

Presumably it's there for a reason?

> >  > >  A simple solution would be to change all of cpu_XX functions to
> >  > >  get a 64 bit address. This is a lot of churn, if we do this
> >  > >  anyway we should also pass length to callbacks, this way
> >  > >  rwhandler will get very small or go away completely.
> >  >
> >  > It's not too much effort to keep the target_phys_addr_t type.
> >
> >
> > I don't understand - target_phys_addr_t is different for different
> >  targets to we will need to recompile the code for each target.
> >  What am I missing?
> 
> target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
> size will be either 64 or 32 bits depending on the target. So the
> files are compiled once on 64 bit host, twice on 32 bit host if both
> 32 and 64 bits targets are selected.

How about just making it 64 bit unconditionally?
How much do we save by using a 32 bit address value?

-- 
MST




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin  wrote:
> On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
>  > On 3/24/10, Michael S. Tsirkin  wrote:
>  > > On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
>  > >  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> cpu_physical_memory_read/write.
>  > >
>  > >
>  > > I don't see how it would help. These still get target_phys_addr_t which
>  > >  is per-target. Further, a ton of devices do
>  > >  cpu_register_physical_memory/qemu_register_coalesced_mmio.
>  > >  These are also per target.
>  >
>  > I don't know what I was eating yesterday: there are no references to
>  > ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
>  > for the device itself, just add a property "be". The attached patch
>  > performs this part.
>  >
>  > But now there is a bigger problem, how to pass the property to the
>  > device. It's not fair to require the user to remember to set it.
>
>
> I still don't fully understand how come e1000 cares about
>  target endianness.

It shouldn't. Maybe the real fix is to remove the byte swapping.

>  > >  A simple solution would be to change all of cpu_XX functions to
>  > >  get a 64 bit address. This is a lot of churn, if we do this
>  > >  anyway we should also pass length to callbacks, this way
>  > >  rwhandler will get very small or go away completely.
>  >
>  > It's not too much effort to keep the target_phys_addr_t type.
>
>
> I don't understand - target_phys_addr_t is different for different
>  targets to we will need to recompile the code for each target.
>  What am I missing?

target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
size will be either 64 or 32 bits depending on the target. So the
files are compiled once on 64 bit host, twice on 32 bit host if both
32 and 64 bits targets are selected.

>  > From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
>  > From: Blue Swirl 
>  > Date: Wed, 24 Mar 2010 19:54:05 +
>  > Subject: [PATCH] Compile rtl8139 and e1000 only once
>  >
>  > WIP
>  >
>  > Signed-off-by: Blue Swirl 
>  > ---
>  >  Makefile.objs   |2 +
>  >  Makefile.target |4 --
>  >  hw/e1000.c  |  108 
> ++
>  >  hw/rtl8139.c|   82 +++---
>  >  4 files changed, 147 insertions(+), 49 deletions(-)
>  >
>  > diff --git a/Makefile.objs b/Makefile.objs
>  > index 281f7a6..54895f8 100644
>  > --- a/Makefile.objs
>  > +++ b/Makefile.objs
>  > @@ -155,6 +155,8 @@ hw-obj-y += msix.o
>  >  hw-obj-y += ne2000.o
>  >  hw-obj-y += eepro100.o
>  >  hw-obj-y += pcnet.o
>  > +hw-obj-y += rtl8139.o
>  > +hw-obj-y += e1000.o
>  >
>  >  hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
>  >  hw-obj-$(CONFIG_LAN9118) += lan9118.o
>  > diff --git a/Makefile.target b/Makefile.target
>  > index eb4d010..1a86fc4 100644
>  > --- a/Makefile.target
>  > +++ b/Makefile.target
>  > @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>  >  # xen backend driver support
>  >  obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
>  >
>  > -# PCI network cards
>  > -obj-y += rtl8139.o
>  > -obj-y += e1000.o
>  > -
>  >  # Hardware support
>  >  obj-i386-y = ide/core.o
>  >  obj-i386-y += pckbd.o dma.o
>  > diff --git a/hw/e1000.c b/hw/e1000.c
>  > index fd3059a..0f72db8 100644
>  > --- a/hw/e1000.c
>  > +++ b/hw/e1000.c
>  > @@ -121,6 +121,7 @@ typedef struct E1000State_st {
>  >  uint16_t reading;
>  >  uint32_t old_eecd;
>  >  } eecd_state;
>  > +uint32_t be;
>  >  } E1000State;
>  >
>  >  #define  defreg(x)   x = (E1000_##x>>2)
>  > @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, 
> uint32_t) = {
>  >  enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
>  >
>  >  static void
>  > -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>  > +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
>  >  {
>  >  E1000State *s = opaque;
>  >  unsigned int index = (addr & 0x1) >> 2;
>  >
>  > -#ifdef TARGET_WORDS_BIGENDIAN
>  > -val = bswap32(val);
>  > -#endif
>  >  if (index < NWRITEOPS && macreg_writeops[index])
>  >  macreg_writeops[index](s, index, val);
>  >  else if (index < NREADOPS && macreg_readops[index])
>  > @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t 
> addr, uint32_t val)
>  >  DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08x\n",
>  > index<<2, val);
>  >  }
>  > +static void
>  > +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
>  > +{
>  > +val = bswap32(val);
>  > +e1000_mmio_writel_le(opaque, addr, val);
>  > +}
>  >
>  >  static void
>  > -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
>  > +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
>  >  {
>  >  // emulate hw without byte enables: no RMW
>  > -e10

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
> On 3/24/10, Michael S. Tsirkin  wrote:
> > On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
> >  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> > cpu_physical_memory_read/write.
> >
> >
> > I don't see how it would help. These still get target_phys_addr_t which
> >  is per-target. Further, a ton of devices do
> >  cpu_register_physical_memory/qemu_register_coalesced_mmio.
> >  These are also per target.
> 
> I don't know what I was eating yesterday: there are no references to
> ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
> for the device itself, just add a property "be". The attached patch
> performs this part.
> 
> But now there is a bigger problem, how to pass the property to the
> device. It's not fair to require the user to remember to set it.

I still don't fully understand how come e1000 cares about
target endianness.

> >  A simple solution would be to change all of cpu_XX functions to
> >  get a 64 bit address. This is a lot of churn, if we do this
> >  anyway we should also pass length to callbacks, this way
> >  rwhandler will get very small or go away completely.
> 
> It's not too much effort to keep the target_phys_addr_t type.

I don't understand - target_phys_addr_t is different for different
targets to we will need to recompile the code for each target.
What am I missing?


> From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
> From: Blue Swirl 
> Date: Wed, 24 Mar 2010 19:54:05 +
> Subject: [PATCH] Compile rtl8139 and e1000 only once
> 
> WIP
> 
> Signed-off-by: Blue Swirl 
> ---
>  Makefile.objs   |2 +
>  Makefile.target |4 --
>  hw/e1000.c  |  108 ++
>  hw/rtl8139.c|   82 +++---
>  4 files changed, 147 insertions(+), 49 deletions(-)
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 281f7a6..54895f8 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -155,6 +155,8 @@ hw-obj-y += msix.o
>  hw-obj-y += ne2000.o
>  hw-obj-y += eepro100.o
>  hw-obj-y += pcnet.o
> +hw-obj-y += rtl8139.o
> +hw-obj-y += e1000.o
>  
>  hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
>  hw-obj-$(CONFIG_LAN9118) += lan9118.o
> diff --git a/Makefile.target b/Makefile.target
> index eb4d010..1a86fc4 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>  # xen backend driver support
>  obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
>  
> -# PCI network cards
> -obj-y += rtl8139.o
> -obj-y += e1000.o
> -
>  # Hardware support
>  obj-i386-y = ide/core.o
>  obj-i386-y += pckbd.o dma.o
> diff --git a/hw/e1000.c b/hw/e1000.c
> index fd3059a..0f72db8 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -121,6 +121,7 @@ typedef struct E1000State_st {
>  uint16_t reading;
>  uint32_t old_eecd;
>  } eecd_state;
> +uint32_t be;
>  } E1000State;
>  
>  #define  defreg(x)   x = (E1000_##x>>2)
> @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, 
> uint32_t) = {
>  enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
>  
>  static void
> -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
> +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
>  {
>  E1000State *s = opaque;
>  unsigned int index = (addr & 0x1) >> 2;
>  
> -#ifdef TARGET_WORDS_BIGENDIAN
> -val = bswap32(val);
> -#endif
>  if (index < NWRITEOPS && macreg_writeops[index])
>  macreg_writeops[index](s, index, val);
>  else if (index < NREADOPS && macreg_readops[index])
> @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t 
> addr, uint32_t val)
>  DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08x\n",
> index<<2, val);
>  }
> +static void
> +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
> +{
> +val = bswap32(val);
> +e1000_mmio_writel_le(opaque, addr, val);
> +}
>  
>  static void
> -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
> +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
>  {
>  // emulate hw without byte enables: no RMW
> -e1000_mmio_writel(opaque, addr & ~3,
> -  (val & 0x) << (8*(addr & 3)));
> +e1000_mmio_writel_le(opaque, addr & ~3,
> + (val & 0x) << (8*(addr & 3)));
>  }
>  
>  static void
> -e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
> +e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val)
>  {
>  // emulate hw without byte enables: no RMW
> -e1000_mmio_writel(opaque, addr & ~3,
> -  (val & 0xff) << (8*(addr & 3)));
> +e1000_mmio_writel_be(opaque, addr & ~3,
> + (val & 0x) << (8*(addr & 3)));
> +}
> +
> +static void
> +e1000_mmio

Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson
On 03/24/2010 10:07 AM, Richard Henderson wrote:
> struct CPUSmallCommonState
> {
> // most of the stuff from CPU_COMMON.
> // sorted for some thought of padding elimination.  ;-)
> };
> 
> struct CPULargeCommonState
> {
> CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
> target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
> struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
> jmp_buf jmp_env;
> };
...
> Now.  If you're compiling a file for which cpu-specific code is ok:
> 
> register CPUXYZLargeState *env __asm__(AREG0);
> #define ENV_SMALL_COMMON_STATE(&env->s.common_s)
> #define ENV_LARGE_COMMON_STATE(&env->common_l)
> 
> If you're compiling a file which is supposed to be independant of cpu:
> 
> register CPUSmallCommonState *env __asm__(AREG0);
> #define ENV_SMALL_COMMON_STATE(env)
> #define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + 
> cpu_large_state_offset))

On 03/24/2010 11:00 AM, Blue Swirl wrote:
> One trick is to define a fixed offset (about half CPUState size) and
> make env point to CPUState + offset. Then the negative part of the
> offset space can be used efficiently. But this just doubles the space
> that can be accessed fast, so it's not a big win.

A good idea.  We can eliminate the cpu_large_state_offset from above via:

struct CPUSmallCommonState
{
// most of the stuff from CPU_COMMON.
} __attribute__((aligned));

struct CPUXYZPrivateState
{
// all the cpu-specific stuff
};

struct CPUXYZSmallState
{
CPUXYZPrivateState p;
CPUSmallCommonState s;
};

struct CPUXYZAllState
{
CPUXYZSmallState s;
CPULargeCommonState l;  // ARG0 register points here.
};

register void *biased_env __asm__(AREG0);

static inline CPUXYZPrivateState *get_env_cpu_private(void)
{
return &((CPUXYZSmallState *)biased_env - 1)->p;
}

static inline CPUSmallCommonState *get_env_common_small(void)
{
return (CPUSmallCommonState *)biased_env - 1;
}

static inline CPULargeCommonState *get_env_common_large(void)
{
return (CPULargeCommonState *)biased_env;
}


r~




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin  wrote:
> On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
>  > rtl8139.c, e1000.c: need to convert ldl/stl to 
> cpu_physical_memory_read/write.
>
>
> I don't see how it would help. These still get target_phys_addr_t which
>  is per-target. Further, a ton of devices do
>  cpu_register_physical_memory/qemu_register_coalesced_mmio.
>  These are also per target.

I don't know what I was eating yesterday: there are no references to
ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
for the device itself, just add a property "be". The attached patch
performs this part.

But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.

>  A simple solution would be to change all of cpu_XX functions to
>  get a 64 bit address. This is a lot of churn, if we do this
>  anyway we should also pass length to callbacks, this way
>  rwhandler will get very small or go away completely.

It's not too much effort to keep the target_phys_addr_t type.
From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
From: Blue Swirl 
Date: Wed, 24 Mar 2010 19:54:05 +
Subject: [PATCH] Compile rtl8139 and e1000 only once

WIP

Signed-off-by: Blue Swirl 
---
 Makefile.objs   |2 +
 Makefile.target |4 --
 hw/e1000.c  |  108 ++
 hw/rtl8139.c|   82 +++---
 4 files changed, 147 insertions(+), 49 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..54895f8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -155,6 +155,8 @@ hw-obj-y += msix.o
 hw-obj-y += ne2000.o
 hw-obj-y += eepro100.o
 hw-obj-y += pcnet.o
+hw-obj-y += rtl8139.o
+hw-obj-y += e1000.o
 
 hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
 hw-obj-$(CONFIG_LAN9118) += lan9118.o
diff --git a/Makefile.target b/Makefile.target
index eb4d010..1a86fc4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 # xen backend driver support
 obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
 
-# PCI network cards
-obj-y += rtl8139.o
-obj-y += e1000.o
-
 # Hardware support
 obj-i386-y = ide/core.o
 obj-i386-y += pckbd.o dma.o
diff --git a/hw/e1000.c b/hw/e1000.c
index fd3059a..0f72db8 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -121,6 +121,7 @@ typedef struct E1000State_st {
 uint16_t reading;
 uint32_t old_eecd;
 } eecd_state;
+uint32_t be;
 } E1000State;
 
 #define	defreg(x)	x = (E1000_##x>>2)
@@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = {
 enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
 
 static void
-e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 E1000State *s = opaque;
 unsigned int index = (addr & 0x1) >> 2;
 
-#ifdef TARGET_WORDS_BIGENDIAN
-val = bswap32(val);
-#endif
 if (index < NWRITEOPS && macreg_writeops[index])
 macreg_writeops[index](s, index, val);
 else if (index < NREADOPS && macreg_readops[index])
@@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08x\n",
index<<2, val);
 }
+static void
+e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+val = bswap32(val);
+e1000_mmio_writel_le(opaque, addr, val);
+}
 
 static void
-e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 // emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr & ~3,
-  (val & 0x) << (8*(addr & 3)));
+e1000_mmio_writel_le(opaque, addr & ~3,
+ (val & 0x) << (8*(addr & 3)));
 }
 
 static void
-e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 // emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr & ~3,
-  (val & 0xff) << (8*(addr & 3)));
+e1000_mmio_writel_be(opaque, addr & ~3,
+ (val & 0x) << (8*(addr & 3)));
+}
+
+static void
+e1000_mmio_writeb_be(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+// emulate hw without byte enables: no RMW
+e1000_mmio_writel_be(opaque, addr & ~3,
+ (val & 0xff) << (8*(addr & 3)));
+}
+
+static void
+e1000_mmio_writeb_le(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+// emulate hw without byte enables: no RMW
+e1000_mmio_writel_le(opaque, addr & ~3,
+ (val & 0xff) << (8*(addr & 3)));
 }
 
 static uint32_t
-e1000_mmio_readl(void *opaque, target_phys_addr_t addr)
+e1000_mmio_readl_le(

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Michael S. Tsirkin
On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
> rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write.

I don't see how it would help. These still get target_phys_addr_t which
is per-target. Further, a ton of devices do
cpu_register_physical_memory/qemu_register_coalesced_mmio.
These are also per target.

A simple solution would be to change all of cpu_XX functions to
get a 64 bit address. This is a lot of churn, if we do this
anyway we should also pass length to callbacks, this way
rwhandler will get very small or go away completely.

-- 
MST




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Juan Quintela
Blue Swirl  wrote:
> On 3/24/10, Juan Quintela  wrote:
>> Blue Swirl  wrote:
>>  > Hi,
>>  >
>>  > Here's some planning for getting most files compiled as few times as
>>  > possible. Comments and suggestions are welcome.
>>
>>
>> I took some thought about this at some point.  Problems here start from
>>  "Recursive Makefile condered Harmful (tm)".
>>
>>  Look at how we jump through hops to be able to compile things in
>>  one/other side.
>>
>>  We have:
>>  Makefile
>>  Makefile.target (really lots of them, one for target)
>>  Makefile.hw
>>  Makefile.user
>>
>>  If we had only a single Makefile, things in this department would be
>>  much, much easier. And no, convert to a single Makefile is not trivial
>>  either, but it would make things easier.
>>
>>  Why do we have several Makefiles?  Because we want to compile each file
>>  with different options.
>>
>>  Why do we need to abuse so much VPATH?  Because we need to bring files
>>  randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).
>
> Would it help if the Makefiles were scattered to each directory, for
> example instead of Makefile.hw we had hw/Makefile?

The interesting one is Makefile.target, hw/Makefile could help, but that
is far away from where the action is.

If you look at Makefile.target from far enough, you will see that it
basically has:

libobj-$(CONFIG_FOO) = ...

ifdef CONFIG_LINUX_USER

endif

ifdef CONFIG_DARWIN_USER
...
endif

ifdef CONFIG_BSD_USER
...
endif

ifdef CONFIG_SOFTMMU
...
endif


The shared bits are very small (out of the libobj-y stuff).
Spliting the others in different Makefiles (or whatever is easy).  How
to get this ones compiled only once per BASE_ARCH/whatever should put us
near the goal of a single Makefile (and compiling each thing just the
number of times required).  Some thought is needed to know how to work here.

Actually, Anthony suggested at some point to just use 64 bits for
TARGET_PHYS_ADDR_BITS and remove the need for hw32/64.

I think that people emulationg 32bits on 32bits would suffer, but have
no clue how much.  Anthony, what was the idea?

>>  Problem with this proposal is that it is not trivial to do in little
>>  steps, and the real big advantages appear when you switch to a single
>>  Makefile at the end.
>
> I may have missed something, but the compile process doesn't care
> about boards, because all boards for some architecture (and therefore
> all devices used by all boards) are linked to a single
> per-architecture executable. So why introduce the boards concept to
> Makefiles?

I missunderstood this bit of your previous message:

> The target dependent cases should be next. On full build, each MIPS
> device file gets compiled four times, PPC files three times and x86
> twice. The devices for architectures that are compiled only once (ARM,
> Cris, Sparc32 etc.) do not need to be touched.

I was refering to this ones, but somehow got confused with boards :(

>
>>  > vl.c: a lot of work. Maybe the CPUState stuff should be separated to a 
>> new file.
>>
>>
>> That should just be a rule in Documentation.  You can't but anything
>>  else in vl.c.  If you move anything out of vl.c (see timers work from
>>  bonzini for example), you get a wild card for free commit bypassing
>>  maintainers or some similar price :)
>
> Cleaning up vl.c would be great, but just for purpose of single
> compilation, it's enough to put the CPUState pieces to a target
> dependent file (cpu-common.c?) and compile the rest once by making
> TARGET_xxx conditional code unconditional. This may still be doable.

I haven't looked at detail at this :(

>>  
>>
>>  I haven't really looked at them at depth.
>>
>>  I looked when I cleaned up the build system, I thought how to do the
>>  next step (outlined before), but got sidetracked by other more urgent
>>  things.
>
> Thanks for the comments.

You are welcome.

Later, Juan.




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Richard Henderson  wrote:
> On 03/24/2010 02:47 AM, Paolo Bonzini wrote:
>
> > 1) make CPUState define only common fields. Include CPUState at the
> > beginning of each per-target CPUXYZState.
> >
>
>  Irritatingly, the common fields contain quite big TLBs.  And the
>  offsets from the start of env affect the compactness of the code
>  generated from TCG.  We really really want the general registers
>  to come first to make sure that those offsets fit the host's
>  reg+offset addressing mode.

One trick is to define a fixed offset (about half CPUState size) and
make env point to CPUState + offset. Then the negative part of the
offset space can be used efficiently. But this just doubles the space
that can be accessed fast, so it's not a big win.




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Juan Quintela  wrote:
> Blue Swirl  wrote:
>  > Hi,
>  >
>  > Here's some planning for getting most files compiled as few times as
>  > possible. Comments and suggestions are welcome.
>
>
> I took some thought about this at some point.  Problems here start from
>  "Recursive Makefile condered Harmful (tm)".
>
>  Look at how we jump through hops to be able to compile things in
>  one/other side.
>
>  We have:
>  Makefile
>  Makefile.target (really lots of them, one for target)
>  Makefile.hw
>  Makefile.user
>
>  If we had only a single Makefile, things in this department would be
>  much, much easier. And no, convert to a single Makefile is not trivial
>  either, but it would make things easier.
>
>  Why do we have several Makefiles?  Because we want to compile each file
>  with different options.
>
>  Why do we need to abuse so much VPATH?  Because we need to bring files
>  randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).

Would it help if the Makefiles were scattered to each directory, for
example instead of Makefile.hw we had hw/Makefile?

>  Problem here, there isn't a simple way to compile files for several
>  target just once (no way to put them).
>
>  Our main copmile rule is:
>
>  $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
> $(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))
>
>
>  (notice that things compiled in Makefile are trivial, they are already
>  compiled just once by definition, problems are for all the qemu's we
>  compile).
>
>  We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like:
>
>  OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/
>
>  (I forgot the subst Makefile syntax), and have the:
>
>  %.$(TARGET_BASE_ARCH).o: %.c
>gcc $(TARGET_BASE_ARCH options)
>
>  From there, as you suggested, we need some files that are not compiled
>  by architecture, they need to be compiled by board, well, we need to add
>  yet another level obj-$(TARGET_BOARD) or whatever.
>
>  Notice that this is a lot of work, but you are needing the audit to be
>  able to compile only once.  Problem just now is that there is not a
>  simple way to describe that information,  with my proposal it gets
>  trivial to express:
>
>  obj-$(CONFIG_FOO) += foo.o  # You need this for everything
>  obj-mips-$(CONFIG_FOO) += foo.o  # You need this for all mips boards
>  obj-malta-$(CONFIG_FOO) += foo.o  # You need this for all mips malta board
>
>  You still need to do some different magic from hw-32/64 but it could be
>  done this way.  Once you did it this way, you now where the files are
>  (hw or target) and you can drop the VPATH tricks.
>
>  Problem with this proposal is that it is not trivial to do in little
>  steps, and the real big advantages appear when you switch to a single
>  Makefile at the end.

I may have missed something, but the compile process doesn't care
about boards, because all boards for some architecture (and therefore
all devices used by all boards) are linked to a single
per-architecture executable. So why introduce the boards concept to
Makefiles?

>  > vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new 
> file.
>
>
> That should just be a rule in Documentation.  You can't but anything
>  else in vl.c.  If you move anything out of vl.c (see timers work from
>  bonzini for example), you get a wild card for free commit bypassing
>  maintainers or some similar price :)

Cleaning up vl.c would be great, but just for purpose of single
compilation, it's enough to put the CPUState pieces to a target
dependent file (cpu-common.c?) and compile the rest once by making
TARGET_xxx conditional code unconditional. This may still be doable.

>  
>
>  I haven't really looked at them at depth.
>
>  I looked when I cleaned up the build system, I thought how to do the
>  next step (outlined before), but got sidetracked by other more urgent
>  things.

Thanks for the comments.




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
>  1) make CPUState define only common fields. Include CPUState at the
>  beginning of each per-target CPUXYZState.
> >>>
> >>> Irritatingly, the common fields contain quite big TLBs. And the
> >>> offsets from the start of env affect the compactness of the code
> >>> generated from TCG. We really really want the general registers
> >>> to come first to make sure that those offsets fit the host's
> >>> reg+offset addressing mode.
> >>
> >> What about adding a 512-bytes (or more) block or something like that at
> >> the beginning of CPUState with a union, so you can put the per-target
> >> stuff there?
> >
> > Is it really worth the hassle? Anything touching CPUState is probably
> > going to be CPU specific anyway.
> 
> qemu-timer.c, hw/dma.c is not and these are the first two files I looked
> at.  translate-all.c is the third, and it is except for a trivial cleanup.

The use in hw/dma.c is incorrect.  See previous discussion about how 
qemu_bh_schedule_idle needs to go away.

I'm also unconvinced by your numbers. My i386-softmmu/ directory contains only 
43 object files, most of are device emulation and don't touch CPU state at 
all. arm-softmmu/ contains a good number more, but that's mostly board init 
(which needs to know which CPU it's creating), and devices that are only used 
by one board so noone's bothered to move them into libhw.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson
On 03/24/2010 07:45 AM, Paolo Bonzini wrote:
> On 03/24/2010 12:19 PM, Richard Henderson wrote:
>> On 03/24/2010 02:47 AM, Paolo Bonzini wrote:
>>> 1) make CPUState define only common fields. Include CPUState at the
>>> beginning of each per-target CPUXYZState.
>>
>> Irritatingly, the common fields contain quite big TLBs. And the
>> offsets from the start of env affect the compactness of the code
>> generated from TCG. We really really want the general registers
>> to come first to make sure that those offsets fit the host's
>> reg+offset addressing mode.
> 
> What about adding a 512-bytes (or more) block or something like that at
> the beginning of CPUState with a union, so you can put the per-target
> stuff there?

I think that would be confusing.

What might be just as good (although possibly just as confusing)
is to move the big members into a different structure.  E.g.

struct CPUSmallCommonState
{
// most of the stuff from CPU_COMMON.
// sorted for some thought of padding elimination.  ;-)
};

struct CPULargeCommonState
{
CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
jmp_buf jmp_env;
};

struct CPUXYZSmallState
{
CPUSmallCommonState common_s;
// the rest of the cpu-specific stuff.
};

struct CPUXYZLargeState
{
CPUXYZSmallState s;
CPUBigCommonState common_l;
};

extern int cpu_large_state_offset = offsetof(CPUXYZLargeState, common_l);

Now.  If you're compiling a file for which cpu-specific code is ok:

register CPUXYZLargeState *env __asm__(AREG0);
#define ENV_SMALL_COMMON_STATE(&env->s.common_s)
#define ENV_LARGE_COMMON_STATE(&env->common_l)

If you're compiling a file which is supposed to be independant of cpu:

register CPUSmallCommonState *env __asm__(AREG0);
#define ENV_SMALL_COMMON_STATE(env)
#define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + 
cpu_large_state_offset))

For the gcc-compiled code, the addition of the cpu_large_state_offset
is probably more or less on par in efficiency with indirection.  But
for TCG generated code, the variable read happens at code generation
time, which means we *still* have a constant in the generated code.


r~




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini

On 03/24/2010 03:56 PM, Paul Brook wrote:

On 03/24/2010 12:19 PM, Richard Henderson wrote:

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs. And the
offsets from the start of env affect the compactness of the code
generated from TCG. We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


What about adding a 512-bytes (or more) block or something like that at
the beginning of CPUState with a union, so you can put the per-target
stuff there?


Is it really worth the hassle? Anything touching CPUState is probably going to
be CPU specific anyway.


qemu-timer.c, hw/dma.c is not and these are the first two files I looked 
at.  translate-all.c is the third, and it is except for a trivial cleanup.


Big parts of vl.c are independent too.

As a quick check:

$ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | wc -l
96
$ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | \
 xargs grep -l '#if.*TARGET_' | wc -l
36

The ones that remain are pretty much what would you expect, besides 
translate-all.c and some in hw/ which I snipped:


bsd-user/main.c
darwin-user/main.c darwin-user/qemu.h darwin-user/signal.c
linux-user/elfload.c linux-user/main.c linux-user/qemu.h
linux-user/signal.c linux-user/syscall.c

cpu-all.h cpu-defs.h cpu-exec.c
def-helper.h
disas.c
exec-all.h exec.c
gdbstub.c
monitor.c
translate-all.c
vl.c

Of course this doesn't mean that 60 files are target-independent, but 
30-ish probably are or can be made so.


It would also help code clarity to use CPUXYZState more, to understand 
which hw/ files are specific to one model.  For hw/s390-virtio.c that's 
obvious, but slightly less so for hw/sun4m.c and even less so for 
hw/syborg.c.  This is an independent cleanup.


Paolo




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
> On 03/24/2010 12:19 PM, Richard Henderson wrote:
> > On 03/24/2010 02:47 AM, Paolo Bonzini wrote:
> >> 1) make CPUState define only common fields. Include CPUState at the
> >> beginning of each per-target CPUXYZState.
> >
> > Irritatingly, the common fields contain quite big TLBs. And the
> > offsets from the start of env affect the compactness of the code
> > generated from TCG. We really really want the general registers
> > to come first to make sure that those offsets fit the host's
> > reg+offset addressing mode.
> 
> What about adding a 512-bytes (or more) block or something like that at
> the beginning of CPUState with a union, so you can put the per-target
> stuff there?

Is it really worth the hassle? Anything touching CPUState is probably going to 
be CPU specific anyway.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini

On 03/24/2010 12:19 PM, Richard Henderson wrote:

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs. And the
offsets from the start of env affect the compactness of the code
generated from TCG. We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


What about adding a 512-bytes (or more) block or something like that at 
the beginning of CPUState with a union, so you can put the per-target 
stuff there?


Paolo




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs.  And the
offsets from the start of env affect the compactness of the code
generated from TCG.  We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


r~




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini



The harder cases are those where the device code depends somehow on
the architecture. Some thoughts follow.

vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file.

dma.c: DMA_schedule needs access to CPUState.


Most users of CPUState (e.g. qemu-timer.c and hw/dma.c) either need it 
as an opaque pointer, or only need access to target-independent stuff. 
So you could:


1) make CPUState define only common fields.  Include CPUState at the 
beginning of each per-target CPUXYZState.


2) Do s/CPUState/CPUXYZState/ on target-*/*.

3) Make it compile, possibly by undoing parts of 2) and changing parts 
of it to DO_UPCAST.


Paolo




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Juan Quintela
Blue Swirl  wrote:
> Hi,
>
> Here's some planning for getting most files compiled as few times as
> possible. Comments and suggestions are welcome.

I took some thought about this at some point.  Problems here start from
"Recursive Makefile condered Harmful (tm)".

Look at how we jump through hops to be able to compile things in
one/other side.

We have:
Makefile
Makefile.target (really lots of them, one for target)
Makefile.hw
Makefile.user

If we had only a single Makefile, things in this department would be
much, much easier. And no, convert to a single Makefile is not trivial
either, but it would make things easier.

Why do we have several Makefiles?  Because we want to compile each file
with different options.

Why do we need to abuse so much VPATH?  Because we need to bring files
randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).

Problem here, there isn't a simple way to compile files for several
target just once (no way to put them).

Our main copmile rule is:

$(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
$(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))


(notice that things compiled in Makefile are trivial, they are already
compiled just once by definition, problems are for all the qemu's we
compile).

We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like:

OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/

(I forgot the subst Makefile syntax), and have the:

%.$(TARGET_BASE_ARCH).o: %.c
   gcc $(TARGET_BASE_ARCH options)

>From there, as you suggested, we need some files that are not compiled
by architecture, they need to be compiled by board, well, we need to add
yet another level obj-$(TARGET_BOARD) or whatever.

Notice that this is a lot of work, but you are needing the audit to be
able to compile only once.  Problem just now is that there is not a
simple way to describe that information,  with my proposal it gets
trivial to express:

obj-$(CONFIG_FOO) += foo.o  # You need this for everything
obj-mips-$(CONFIG_FOO) += foo.o  # You need this for all mips boards
obj-malta-$(CONFIG_FOO) += foo.o  # You need this for all mips malta board

You still need to do some different magic from hw-32/64 but it could be
done this way.  Once you did it this way, you now where the files are
(hw or target) and you can drop the VPATH tricks.

Problem with this proposal is that it is not trivial to do in little
steps, and the real big advantages appear when you switch to a single
Makefile at the end.

> vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new 
> file.

That should just be a rule in Documentation.  You can't but anything
else in vl.c.  If you move anything out of vl.c (see timers work from
bonzini for example), you get a wild card for free commit bypassing
maintainers or some similar price :)



I haven't really looked at them at depth.

I looked when I cleaned up the build system, I thought how to do the
next step (outlined before), but got sidetracked by other more urgent
things.

Later, Juan.