[Qemu-devel] Re: KVM call agenda for September 7

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:00 PM, Juan Quintela wrote:

Please send in any agenda items you are interested in covering.



0.13?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




[Qemu-devel] Fwd: Guest hangs when I do general operation.

2010-09-06 Thread Amos Kong
-- Forwarded message --
From: Amos Kong 
Date: Tue, Sep 7, 2010 at 7:49 AM
Subject: Guest hangs when I do general operation.
To: 王箫 


kvm upstream: 43e413f7db1a4a90671dda0b1d6c1f8cb30673ed KVM: Whitespace
changes to remove differences wrt kvm-updates/2.6.37
qemu upstream: cb93bbdd7db92e50ff5e60a346b23df68acae46b Fix OpenBSD
linker warning

# ./x86_64-softmmu/qemu-system-x86_64 ~/win7-32.qcow2 -m 1024 -vnc :0
-usbdevice tablet -cpu qemu64 -snapshot -enable-kvm -bios pc-bios/bios.bin

Guest hangs when I do general operation.
It's fine when using upstream qemu-kvm.  this problem only occurred
one time when I debug by gdb, after execute 'continue', guest runs
normally.

debug msg of (qemu + kvm)
(gdb) c
Continuing.

Program received signal SIGUSR2, User defined signal 2.
0x7fc7d4d7bfb3 in select () at ../sysdeps/unix/syscall-template.S:82
82      ../sysdeps/unix/syscall-template.S: No such file or directory.
       in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x7fc7d4d7bfb3 in select () at ../sysdeps/unix/syscall-template.S:82
#1  0x004270ea in qemu_aio_wait () at aio.c:193
#2  0x00426475 in bdrv_read_em (bs=0x186a340,
sector_num=6343320, buf=0x7fc7c5a9b010 "RCRD(", nb_sectors=104) at
block.c:2432
#3  0x0043c437 in qcow_read (bs=0x1838680, start_sect=, cluster_offset=, n_start=,
           n_end=) at block/qcow2-cluster.c:368
#4  copy_sectors (bs=0x1838680, start_sect=,
cluster_offset=, n_start=,
               n_end=) at block/qcow2-cluster.c:406
#5  0x0043c69b in qcow2_alloc_cluster_link_l2 (bs=0x1838680,
m=0x1d5d798) at block/qcow2-cluster.c:689
#6  0x004378d5 in qcow_aio_write_cb (opaque=0x1d5d700, ret=0)
at block/qcow2.c:566
#7  0x00429c5d in posix_aio_process_queue (opaque=) at posix-aio-compat.c:459
#8  0x00429d0c in posix_aio_read (opaque=0x183a250) at
posix-aio-compat.c:489
#9  0x0051fec6 in main_loop_wait (nonblocking=) at /home/devel/qemu/vl.c:1281
#10 0x005209bd in main_loop (argc=0, argv=, envp=) at /home/devel/qemu/vl.c:1332
#11 main (argc=0, argv=, envp=) at /home/devel/qemu/vl.c:2995
---
kvm statistics

 efer_reload                  0       0
 exits                  8714404       0
 fpu_reload              115538       0
 halt_exits               66926       0
 halt_wakeup                  0       0
 host_state_reload      2366344       0
 hypercalls                   0       0
 insn_emulation         1848818       0
 insn_emulation_fail          0       0
 invlpg                  662261       0
 io_exits               1293800       0
 irq_exits               531478       0
 irq_injections          109588       0
 irq_window              114236       0
 largepages                   0       0
 mmio_exits              705388       0
 mmu_cache_miss          355201       0
 mmu_flooded             298554       0
 mmu_pde_zapped           25705       0
 mmu_pte_updated         241815       0
 mmu_pte_write         15701676       0
 mmu_recycled               546       0
 mmu_shadow_zapped       527220       0
 mmu_unsync                4203       0
 nmi_injections               0       0
 nmi_window                   0       0
 pf_fixed               3107522       0
 pf_guest                631148       0
 remote_tlb_flush         31032       0
 request_irq                  0       0
 signal_exits            310597       0
 tlb_flush              2164428       0


Re: [Qemu-devel] [PATCH] raw-posix: improve detection of scsi-generic devices

2010-09-06 Thread Christoph Hellwig
On Mon, Sep 06, 2010 at 05:39:00PM +0200, Alexander Graf wrote:
> 
> On 06.09.2010, at 17:06, Bernhard Kohl wrote:
> 
> > From: Bernhard Kohl 
> > 
> > Allow symbolic links which point to /dev/sgX devices.
> 
> Couldn't you send an SG_IO test ioctl over and see if it works? I really 
> dislike the whole file name magic matching.

You could, but the result would not be what you expect, given that every
/dev/sd* device and more in Linux support it.

What we really need is to stop shoe-horning scsi pass through support
into the block layer.  Once we finally get our generic thread offload
facilily there is no need for it anymore at all, scsi-generic can
simplify offload the SG_IO ioctl and be done with it.  I even have an
old prototype for this, just waiting for the generic thread offload
to get merged before resurrecting it.




Re: [Qemu-devel] Re: [PATCH 01/14] RESEND apb: fix typo.

2010-09-06 Thread Isaku Yamahata
On Mon, Sep 06, 2010 at 10:55:19PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 06, 2010 at 05:54:25PM +, Blue Swirl wrote:
> > On Mon, Sep 6, 2010 at 9:46 AM, Michael S. Tsirkin  wrote:
> > > On Mon, Sep 06, 2010 at 04:46:15PM +0900, Isaku Yamahata wrote:
> > >> fix typo.
> > >>
> > >> Signed-off-by: Isaku Yamahata 
> > >
> > > This is separate from the express patches, right?
> > > I'll appply this. Thanks!
> > 
> > Since the patch that introduces the build breakage (pci_brdige_reset)
> > has not been committed to master yet, can you instead drop that commit
> > and introduce a fixed version? This will avoid problems with git
> > bisect.
> 
> Sure, thanks for pointing this out. Isaku, could
> you just repost a fixed up patchset? Working git bisect is nice to have.

I should have included it in the commit log.

12bc74dfc1fe8cc57060a9eb94d0e9884c5c445e
Fortunately it's the head of pci branch.

thanks,

> 
> > >
> > >> ---
> > >> ?hw/apb_pci.c | ? ?6 +++---
> > >> ?1 files changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> > >> index 10a5baa..c619112 100644
> > >> --- a/hw/apb_pci.c
> > >> +++ b/hw/apb_pci.c
> > >> @@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
> > >> ? ? ?/* APB secondary busses */
> > >> ? ? ?pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
> > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? "pbm-bridge");
> > >> - ? ?br = DO_UPCAST(PCIBridge, dev, dev);
> > >> + ? ?br = DO_UPCAST(PCIBridge, dev, pci_dev);
> > >> ? ? ?pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
> > >> ? ? ? ? ? ? ? ? ? ? ? ? pci_apb_map_irq);
> > >> ? ? ?qdev_init_nofail(&pci_dev->qdev);
> > >> @@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
> > >>
> > >> ? ? ?pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
> > >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? "pbm-bridge");
> > >> - ? ?br = DO_UPCAST(PCIBridge, dev, dev);
> > >> + ? ?br = DO_UPCAST(PCIBridge, dev, pci_dev);
> > >> ? ? ?pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
> > >> ? ? ? ? ? ? ? ? ? ? ? ? pci_apb_map_irq);
> > >> ? ? ?qdev_init_nofail(&pci_dev->qdev);
> > >> @@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
> > >> ? ? ?.qdev.name = "pbm-bridge",
> > >> ? ? ?.qdev.size = sizeof(PCIBridge),
> > >> ? ? ?.qdev.vmsd = &vmstate_pci_device,
> > >> - ? ?.qdev.reset = pci_brdige_reset,
> > >> + ? ?.qdev.reset = pci_bridge_reset,
> > >> ? ? ?.init = apb_pci_bridge_initfn,
> > >> ? ? ?.exit = pci_bridge_exitfn,
> > >> ? ? ?.config_write = pci_bridge_write_config,
> > >> --
> > >> 1.7.1.1
> > >
> > >
> 

-- 
yamahata



Re: [Qemu-devel] Unmaintained QEMU builds

2010-09-06 Thread Andreas Färber

Am 05.09.2010 um 17:57 schrieb Anthony Liguori:


On 09/05/2010 10:10 AM, Avi Kivity wrote:
As a baby step, is there any chance of publishing an automatic  
nightly Windows (cross-)build as a .zip file on qemu.org? That  
might give more users a chance of detecting runtime faults during  
the development cycle.


That's doable and useful, yes.


I doubt it's useful.

[...] We don't have myriads of users demanding better Windows  
support.  Search the list, there's almost no one asking questions  
about Windows [...]


Right. There's some more recent posts on that QEMU forum, although  
buried beneath a pile of spam.


Anyway, it's a chicken-and-egg problem. There's not much Win32 (nor  
Win64) contributions, there's a guesstimated minor number of users.  
Many of us including myself don't really care about the platform. When  
some Windows user does show up and reports an error against QEMU  
Manager v7.0, we have no clue what QEMU exactly that is based on, how  
it was compiled and whether there may be downstream patches involved,  
so we're not of much help. Nobody that I remember ever came up with  
git-bisect results to pinpoint where/how their regression was  
introduced.


There's three ways out:

1) We drop Windows support. No Windows user has so far participated in  
the discussion. When they cry, it'll be too late, cf. kqemu.


2) They make a step towards us.
* Posting questions, thankyous, patches here (and not just about the  
forum being down, like right now;), showing their existence and interest

* Basing patches on Git master and not some oldish release tarball
* Separating unrelated changes and explaining why changes were necessary
* Expecting and responding to review feedback
* ...

3) We make a step towards them. Some trivial step might (or might not)  
raise QEMU's attractiveness and get us one or another mid-term  
contribution.

* Build && publish Win32 binaries with a cron job
* Suggest a Win32 GSoC project?
* Move some of the nice feature documentation from the Contribute Wiki  
section to the About section for better visibility
* Publish a roadmap with estimated release dates to aid downstreams,  
improve downstream/upstream communication beyond KVM
* Add more guiding Wiki contents or link & update/extend MAINTAINERS  
file: define the defacto processes (rebase against Git tree URL, cc  
, link to git-format-patch/git-send-email man page) and add  
info on missing subsystems/files (e.g. linux-user, Cocoa, SDL, VNC,  
the various TCG targets) and update maintainers (many Fabrices and  
question marks)

* Spawn topic lists to reduce qemu-devel traffic? (bugs? block? sparc?)
* Embrace feature contributions of sub-perfect quality? (i.e., picking  
good stuff from bad patches - thinking of an Austrian WLAN+Win32  
patch, an Italian ppcemb patch w/debug leftovers, the Haiku/x86 host  
patch (will ping him again), ...)

* ...

Most of the latter thoughts turn out not to be Windows-specific at all.

Regards,
Andreas



[Qemu-devel] Re: Irregular response times

2010-09-06 Thread Pascal J. Bourguignon
p...@informatimago.com (Pascal J. Bourguignon) writes:

> Mulyadi Santosa  writes:
>
>> On Sun, Jul 4, 2010 at 06:42, Pascal J. Bourguignon
>>  wrote:
>>> p...@informatimago.com (Pascal J. Bourguignon) writes:
>>>
 I've got various Qemu running linux (gentoo or debian, all running the
 same kernel).  On one of them,  I get very irregular response time.

 During a few seconds, it works normally.  Eg. thru a ssh connection, I
 can type commands at bash, and I get the character echoed right away.
 And then, for a few tens of seconds, it hangs, with no output and no
 way to interrupt it or whatever.   The qemu process is in S state,
 waiting in poll_s WCHAN.
> exec qemu $kvm \
> -cpu ${cpu}   -smp ${ncpus},cores=${ncores}  -m ${mem}  ${vid...@]} \
> -boot menu=on \
> -net nic,model=${nic_model} -net tap \
> -hda /domU/images/${vm}.boot.disk \
> -hdb /domU/images/${vm}.ext3 \
> -hdc /domU/images/${vm}.swap \


I eventually found the problem:  all qemu instances will use the same
MAC address for their NIC.   When I set different MACs to the
different VMs, everything works well: 

mac=52:54:0:0:0:1 # or 2, 3, ...
...
-net nic,macaddr=${mac},model=${nic_model} -net tap \


-- 
__Pascal Bourguignon__ http://www.informatimago.com/




[Qemu-devel] KVM call agenda for September 7

2010-09-06 Thread Juan Quintela

Please send in any agenda items you are interested in covering.

thanks,
Juan.



Re: [Qemu-devel] Re: [PATCH 01/14] RESEND apb: fix typo.

2010-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2010 at 05:54:25PM +, Blue Swirl wrote:
> On Mon, Sep 6, 2010 at 9:46 AM, Michael S. Tsirkin  wrote:
> > On Mon, Sep 06, 2010 at 04:46:15PM +0900, Isaku Yamahata wrote:
> >> fix typo.
> >>
> >> Signed-off-by: Isaku Yamahata 
> >
> > This is separate from the express patches, right?
> > I'll appply this. Thanks!
> 
> Since the patch that introduces the build breakage (pci_brdige_reset)
> has not been committed to master yet, can you instead drop that commit
> and introduce a fixed version? This will avoid problems with git
> bisect.

Sure, thanks for pointing this out. Isaku, could
you just repost a fixed up patchset? Working git bisect is nice to have.

> >
> >> ---
> >>  hw/apb_pci.c |    6 +++---
> >>  1 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> >> index 10a5baa..c619112 100644
> >> --- a/hw/apb_pci.c
> >> +++ b/hw/apb_pci.c
> >> @@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
> >>      /* APB secondary busses */
> >>      pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
> >>                                     "pbm-bridge");
> >> -    br = DO_UPCAST(PCIBridge, dev, dev);
> >> +    br = DO_UPCAST(PCIBridge, dev, pci_dev);
> >>      pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
> >>                         pci_apb_map_irq);
> >>      qdev_init_nofail(&pci_dev->qdev);
> >> @@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
> >>
> >>      pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
> >>                                     "pbm-bridge");
> >> -    br = DO_UPCAST(PCIBridge, dev, dev);
> >> +    br = DO_UPCAST(PCIBridge, dev, pci_dev);
> >>      pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
> >>                         pci_apb_map_irq);
> >>      qdev_init_nofail(&pci_dev->qdev);
> >> @@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
> >>      .qdev.name = "pbm-bridge",
> >>      .qdev.size = sizeof(PCIBridge),
> >>      .qdev.vmsd = &vmstate_pci_device,
> >> -    .qdev.reset = pci_brdige_reset,
> >> +    .qdev.reset = pci_bridge_reset,
> >>      .init = apb_pci_bridge_initfn,
> >>      .exit = pci_bridge_exitfn,
> >>      .config_write = pci_bridge_write_config,
> >> --
> >> 1.7.1.1
> >
> >



Re: [Qemu-devel] [PATCH 1/5] Suppress some gcc warnings with -Wtype-limits

2010-09-06 Thread Blue Swirl
On Mon, Sep 6, 2010 at 1:04 PM, Kevin Wolf  wrote:
> Am 04.09.2010 23:07, schrieb Blue Swirl:
>> On Sat, Sep 4, 2010 at 8:30 PM, andrzej zaborowski  wrote:
>>> Hi,
>>>
>>> On 4 September 2010 21:45, Blue Swirl  wrote:
 On Sat, Sep 4, 2010 at 5:57 PM, andrzej zaborowski  
 wrote:
>>  -    if (event < 0 || event >= BLKDBG_EVENT_MAX) {
>>  +    if ((int)event < 0 || event >= BLKDBG_EVENT_MAX) {
> Is the behaviour incorrect for some values, or is it not correct C?
> As far as I know it is correct in c99 because of type promotions
> between enum, int and unsigned int.

 The problem is that the type of an enum may be signed or unsigned,
 which affects the comparison. For example, the following program
 produces different results when it's compiled with -DUNSIGNED and
 without:
 $ cat enum.c
 #include 

 int main(int argc, const char **argv)
 {
    enum {
 #ifdef UNSIGNED
        A = 0,
 #else
        A = -1,
 #endif
    } a;

    a = atoi(argv[1]);
    if (a < 0) {
        puts("1: smaller");
    } else {
        puts("1: not smaller");
    }

    if ((int)a < 0) {
        puts("2: smaller");
    } else {
        puts("2: not smaller");
    }

    return 0;
 }
 $ gcc -DUNSIGNED enum.c -o enum-unsigned
 $ gcc enum.c -o enum-signed
 $ ./enum-signed 0
 1: not smaller
 2: not smaller
 $ ./enum-signed -1
 1: smaller
 2: smaller
 $ ./enum-unsigned 0
 1: not smaller
 2: not smaller
 $ ./enum-unsigned -1
 1: not smaller
 2: smaller
>>>
>>> Ah, that's a good example, however..
>>>

 This is only how GCC uses enums, other compilers have other rules. So
 it's better to avoid any assumptions about signedness of enums.

 In this specific case, because the enum does not have any negative
 items, it is unsigned if compiled with GCC. Unsigned items cannot be
 negative, thus the check is useless.
>>>
>>> I agree it's useless, but saying that it is wrong is a bit of a
>>> stretch in my opinion.  It actually depends on what the intent was --
>>> if the intent was to be able to use the value as an array index, then
>>> I think the check does the job independent of the signedness of the
>>> operands.
>>
>> No.
>>
>> If BLKDBG_EVENT_MAX is < 0x8000, it does not matter if there is
>> the check or not. Because of unsigned arithmetic, only one comparison
>> is enough. With a non-GCC compiler (or if there were negative enum
>> items), the check may have the desired effect, just like with int
>> cast.
>> If BLKDBG_EVENT_MAX is >= 0x8000, the first check is wrong,
>> because you want to allow array access beyond 0x8000, which will
>> be blocked by the first check. A non-GCC compiler may actually reject
>> an enum bigger than 0x8000. Then the checks should be rewritten.
>>
>> The version with int cast is correct in more cases than the version
>> which relies on enum signedness.
>
> If the value isn't explicitly specified, it's defined to start at 0 and
> increment by 1 for each enum constant. So it's very unlikely to hit an
> interesting case - we would have to add some billions of events first.

The discussion about BLKDBG_EVENT_MAX being absurdly high value was
theoretical. In reality, even the check for < 0 is useless at the
moment, since those values can't be produced. It may be useful to
catch internal inconsistencies later if the code changes.

> And isn't it the int cast that would lead to (event < 0) == true if
> BLKDBG_EVENT_MAX was >= 0x800 and falsely return an error? The old
> version should do this right.

No, because if you want to allow event >= 0x8000, you shouldn't
also check for event < 0 when using 32 bit integers on modern
hardware.

> Anyway, even though this specific code shouldn't misbehave today, I'm
> all for silencing warnings and enabling -Wtype-limits. We regularly have
> real bugs of this type.

Thank you!



[Qemu-devel] Re: [PATCH 1/5] Suppress some gcc warnings with -Wtype-limits

2010-09-06 Thread Blue Swirl
On Sun, Sep 5, 2010 at 10:33 PM, andrzej zaborowski  wrote:
> On 5 September 2010 23:44, Blue Swirl  wrote:
>> The problem case is when BLKDBG_EVENT_MAX > 0x8000 and the type of
>> enum is unsigned. Then the first check is ignored by the compiler and
>> the second does not catch values which are between 0x8000 and
>> BLKDBG_EVENT_MAX. This may not be what was desired by the check,
>> though.
>>
>> Those values will be caught with the int cast, or if the compiler
>> still happens to make the enum signed (for example, because
>> BLKDBG_EVENT_MAX was changed to a #define in order to support
>> compilers which don't allow too large enum values).
>
> So you're actually talking about INT_MAX + 1, not 0x8000, the
> number depends on the abi.
>
> Quite clearly BLKDBG_EVENT_MAX is the max value in the enum so that
> the values can be used as indices of an array of a known size.  I
> think it's safe to say it is < INT_MAX.
>
>>
>> I think I explained the problem
>> at detail. There is a bug. I have a fix for the bug. The fix is not a
>> workaround, except maybe for committee-induced stupidity which created
>> the enum signedness ambiguity in the first place.
>>
>> > I agree. But it seems to indicate a bigger problem.
>> >
>> > If we are trying to pass in a negative value, which is not one
>> > of enum values, using BlkDebugEvent as type is just confusing,
>> > we should just pass int instead.
>>
>> AFAICT it's only possible to use the values listed in event_names in
>> blkdebug.c, other values are rejected. So the check should actually 
>> be
>> an assert() or it could even be removed.
>
> Sounds good.
>
>> >> >> How about adding assert(OMAP_EMIFS_BASE == 0) and commenting 
>> >> >> out the
>> >> >> check? Then if the value changes, the need to add the 
>> >> >> comparison back
>> >> >> will be obvious.
>> >> >
>> >> > This would work but it's weird.  The thing is it's currently a 
>> >> > correct
>> >> > code and the check may be useless but it's the optimiser's task 
>> >> > to
>> >> > remove it, not ours.  The compiler is not able to tell whether 
>> >> > the
>> >> > check makes sense or nott, because the compiler only has access 
>> >> > to
>> >> > preprocessed code.  So why should you let the compiler have 
>> >> > anything
>> >> > to say on it.
>> >>
>> >> Good point. I'll try to invent something better.
>> >
>> > Use #pragma to supress the warning? Maybe we could wrap this in a 
>> > macro ..
>>
>> Those lines may also desynch silently with changes to 
>> OMAP_EMIFS_BASE.
>>
>> I think the assertion is still the best way, it ensures that 
>> something
>> will happen if OMAP_EMIFS_BASE changes. We could for example remove
>> OMAP_EMIFS_BASE entirely (it's only used for the check), but someone
>> adding a new define could still forget to adjust the check anyway.
>
> We could replace it with a macro
> #define OMAP_EMIFS_VALID(addr) ((target_phys_addr_t)addr < 
> OMAP_EMIFF_BASE)
> but all this does look artificial. And of course using type casts
> is always scary ...
>
> Would it help to have some inline functions that do the range 
> checking correctly?
> We have a couple of range helpers in pci.h, these could be moved out
> to range.h and we could add some more. As there act on u64 this will 
> get
> the type limits mostly automatically right.

 That seems to be the best solution, I get no warnings with this:
>>>
>>> While the resulting code is clean (just as the current code), I think
>>> it really shows that this warning should not be enabled.  At this
>>> point you find yourself working around your compiler and potentially
>>> forcing other write some really strange code to work around the
>>> problem caused by this.
>>
>> The warnings generated by -Wtype-limits are very useful, because with
>> it I have found several bugs in the code.
>
> Is that an argument for enabling a warning *by default*?  Looking at
> any specific part of the code you'll find bugs. If you enable some
> warning, it'll hint on a given subset of the places in the code, some
> of which are bugs and some are false-positives.  Enable a different
> warning and you get a different subset.  Grep for any given keyword or
> constant and you get a different subset.

 Right, so when we enable *by default* the warning, buggy code (and
 unfortunately the false positives, if any) will not be committed.
>>>
>>> Ok, so "malloc causes memory leeks, let's forbid dynamic allocation", 

Re: [Qemu-devel] Re: [PATCH 01/14] RESEND apb: fix typo.

2010-09-06 Thread Blue Swirl
On Mon, Sep 6, 2010 at 9:46 AM, Michael S. Tsirkin  wrote:
> On Mon, Sep 06, 2010 at 04:46:15PM +0900, Isaku Yamahata wrote:
>> fix typo.
>>
>> Signed-off-by: Isaku Yamahata 
>
> This is separate from the express patches, right?
> I'll appply this. Thanks!

Since the patch that introduces the build breakage (pci_brdige_reset)
has not been committed to master yet, can you instead drop that commit
and introduce a fixed version? This will avoid problems with git
bisect.

>
>> ---
>>  hw/apb_pci.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
>> index 10a5baa..c619112 100644
>> --- a/hw/apb_pci.c
>> +++ b/hw/apb_pci.c
>> @@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>>      /* APB secondary busses */
>>      pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
>>                                     "pbm-bridge");
>> -    br = DO_UPCAST(PCIBridge, dev, dev);
>> +    br = DO_UPCAST(PCIBridge, dev, pci_dev);
>>      pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
>>                         pci_apb_map_irq);
>>      qdev_init_nofail(&pci_dev->qdev);
>> @@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>>
>>      pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
>>                                     "pbm-bridge");
>> -    br = DO_UPCAST(PCIBridge, dev, dev);
>> +    br = DO_UPCAST(PCIBridge, dev, pci_dev);
>>      pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
>>                         pci_apb_map_irq);
>>      qdev_init_nofail(&pci_dev->qdev);
>> @@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
>>      .qdev.name = "pbm-bridge",
>>      .qdev.size = sizeof(PCIBridge),
>>      .qdev.vmsd = &vmstate_pci_device,
>> -    .qdev.reset = pci_brdige_reset,
>> +    .qdev.reset = pci_bridge_reset,
>>      .init = apb_pci_bridge_initfn,
>>      .exit = pci_bridge_exitfn,
>>      .config_write = pci_bridge_write_config,
>> --
>> 1.7.1.1
>
>



Re: [Qemu-devel] Guest cannot handle a PCI BAR > 1GB

2010-09-06 Thread Cam Macdonell
On Mon, Sep 6, 2010 at 10:37 AM, Cam Macdonell  wrote:
>
>
> On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity  wrote:
>>
>>  On 09/04/2010 01:22 AM, Cam Macdonell wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and
>>> I get an error in the guest that it is able to find a mem resource for
>>> a BAR larger than 1GB.  I'm using 64-bit BARs.
>>>
>>> when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a
>>> resource (and searches beyond 32-bit values to find it).  Here is a
>>> log from printfs added to the loop that searches the resources from
>>> find_resource() in kernel/resource.c:363.
>>>
>>
>> This is a kernel question, not a qemu issue.  Copying lkml.
>>
>>> trying 'tmp.start' 1000 to
>>>         'tmp.end' fff
>>> trying 'tmp.start' 9f400 to
>>>         'tmp.end' 9f3ff
>>> trying 'tmp.start' a to
>>>         'tmp.end' e
>>> trying 'tmp.start' 10 to
>>>         'tmp.end' f
>>> trying 'tmp.start' dfffd000 to
>>>         'tmp.end' dfffcfff
>>> trying 'tmp.start' e000 to
>>>         'tmp.end' efff
>>> trying 'tmp.start' f200 to
>>>         'tmp.end' f1ff
>>> trying 'tmp.start' f2001000 to
>>>         'tmp.end' f200
>>> trying 'tmp.start' f202 to
>>>         'tmp.end' f201
>>> trying 'tmp.start' f2021000 to
>>>         'tmp.end' f202
>>> trying 'tmp.start' f204 to
>>>         'tmp.end' f203
>>> trying 'tmp.start' f2040100 to
>>>         'tmp.end' febf
>>> trying 'tmp.start' fec00400 to
>>>         'tmp.end' fffb
>>> trying 'tmp.start' 1 to
>>>         'tmp.end' 
>>> trying 'tmp.start' 1a000 to
>>>         'tmp.end' 
>>> pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit]
>>> pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit]
>>> (PCI address [0x1c000-0x1]
>>>
>>> and you can see the BAR is successfully assigned.
>>>
>>> However, with a 2GB BAR (below), the search fails, but it also never
>>> searches beyong 32-bits.  Again, all that's changed is the size of the
>>> ivshmem region.
>>>
>>> trying 'tmp.start' 1000 to
>>>         'tmp.end' fff
>>> trying 'tmp.start' 9f400 to
>>>         'tmp.end' 9f3ff
>>> trying 'tmp.start' a to
>>>         'tmp.end' e
>>> trying 'tmp.start' 10 to
>>>         'tmp.end' f
>>> trying 'tmp.start' dfffd000 to
>>>         'tmp.end' dfffcfff
>>> pci :00:04.0: BAR 2: can't assign mem (size 0x8000)
>>>
>>> Is there a limit to PCI BAR sizes or resources?  Any pointers or
>>> further debugging tips are greatly appreciated.
>>>
>>
>> What kernel version are you looking at?
>
> latest kvm git, 2.6.36-rc2+
>
>>
>> Please add printks to the loop so we can see this->start and this->end.  It 
>> smells like a truncation issue.
>
> Success with a 1GB BAR
> this->start 1000, this->end 9f3ff
> this->start 9f400, this->end 9
> this->start f, this->end f
> this->start 10, this->end dfffcfff
> this->start dfffd000, this->end dfff
> this->start f000, this->end f1ff
> this->start f200, this->end f2000fff
> this->start f201, this->end f201
> this->start f202, this->end f2020fff
> this->start f203, this->end f203
> this->start f204, this->end f20400ff
> this->start fec0, this->end fec003ff
> this->start fffc, this->end 
> this->start 1, this->end 11fff
> tmp.start 12000, tmp.end 
> pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit]
> and when it fails with a 2GB BAR, the following is printed
> this->start 1000, this->end 9f3ff
> this->start 9f400, this->end 9
> this->start f, this->end f
> this->start 10, this->end dfffcfff
> this->start dfffd000, this->end dfff
> pci :00:04.0: BAR 2: can't assign mem (size 0x8000)
> I added a few more debug statements and found that in the failure case, the 
> function returns that it found a region (the last one printed before the 
> error).  I've added printfs for the two tests in the if that determine when a 
> region is found:
>        if (tmp.start < tmp.end && tmp.end - tmp.start >= size - 1) {
>             new->start = tmp.start;
>             new->end = tmp.start + size - 1;
>             printk(KERN_INFO "returning 0\n");
>             return 0;
>         }
> this->start 1000, this->end 9f3ff
> tmp.start 8000, tmp.end fff
>     true: 8fff >= 7fff
> this->start 9f400, this->end 9
> tmp.start 8000, tmp.end 9f3ff
>     true: 8009f3ff >= 7fff
> this->start f, this->end f
> tmp.start 8000, tmp.end e
>     true: 800e >= 7fff
> this->start 10, this->end dfffcfff
> tmp.start 8000, tmp.end f
>     true: 800f >= 7fff
> this->start dfffd000, this->end dfff
> tmp.start 10, tmp.end dfffcfff
>     true: 10 < dfffcfff
>     true: dfefcfff >= 7fff
> returning 0
> pci :00:04.0: BAR 2: can'

Re: [Qemu-devel] [PATCH v3 00/14] trace: Add static tracing to QEMU

2010-09-06 Thread Anthony Liguori

On 09/06/2010 11:51 AM, Daniel P. Berrange wrote:

On Mon, Sep 06, 2010 at 04:13:57PM +0100, Stefan Hajnoczi wrote:
   

This patch series adds static tracing to QEMU.  It can be used to instrument
QEMU code by means of lightweight logging called trace events.

Prerna and I are now posting the entire patch series with a serious eye towards
checking we meet users' and developers' tracing needs and with the goal of
getting this functionality merged into qemu.git.
 

The main question would be why create a tracing framework and probe
markup macros specific to QEMU ? It looks like quite a few major
open source projects (PostgreSQL, Python, TCL, OpenJDK) are using
DTrace static probe markers for code instrumentation. IIUC this
is accessible on Solaris, (Free/Net?)-BSD, OS-X and also Linux via
SystemTAP's DTrace compat layer. Is this QEMU specific probe markup
flexible enough to make it possible to also support DTrace/SystemTAP
without having to add a second set of source code markers to every
probe point ?
   


Yes, there's a simple generator which converts are marker format to any 
type of backend.  It can be  LTTng, dtrace, or something simpler.


If you look at some of the earlier threads, the basic problem is that no 
single trace point infrastructure seems to be sufficiently mature today 
so using an intermediary to delay the decision of which backend should 
be used seemed like the prudent thing to do.


Regards,

Anthony Liguori

Regards,
Daniel
   





[Qemu-devel] Re: [PATCH 00/14] pcie port switch emulators

2010-09-06 Thread Wei Xu
Isaku,

Thanks! Last week I did some merging to qemu-kvm and found similar issues
(qdev reset) and also ioapic/apic needs many changes. I may ask your help
this week...

Wei


On 9/6/10 12:46 AM, "Isaku Yamahata"  wrote:

> This patch series implements pcie port switch emulators
> which is basic part for pcie/q35 support.
> This is for mst/pci tree.
> 
> some random comments
> - pci bus reset
>   As Anthony is cleaning up qdev reset stuff, so pci bus reset code
>   is commented out. Once the qdev clean up is done, the patch that
>   enabled pci bus reset will be sent.
> 
> - vid/did
>   there are arbitrariness for pcie port switch. I just choose
>   Intel and IT one to model them.
> 
> thanks,
> 
> Isaku Yamahata (14):
>   RESEND apb: fix typo.
>   pci: consolidate pci_add_capability_at_offset() into
> pci_add_capability().
>   pci bridge: add helper function for ssvid capability.
>   pci: call hotplug callback even when not hotplug case for later use.
>   pci: make pci_parse_devfn() aware of func.
>   pci_ids.h: add vendor id of Texus Intesruments.
>   msi: implemented msi.
>   pcie: helper functions for pcie extended capability.
>   pcie port: define struct PCIEPort/PCIESlot and helper functions
>   pcie root port: implement pcie root port.
>   pcie upstream port: pci express switch upstream port.
>   pcie downstream port: pci express switch downstream port.
>   pcie/hotplug: glue pushing attention button command. pcie_abp
>   pcie/aer: glue aer error injection into qemu monitor.
> 
>  Makefile.objs|6 +-
>  hw/acpi_piix4.c  |3 +
>  hw/apb_pci.c |6 +-
>  hw/eepro100.c|4 +-
>  hw/msi.c |  362 +++
>  hw/msi.h |   41 ++
>  hw/msix.c|3 +-
>  hw/pci.c |   70 ++-
>  hw/pci.h |   36 +-
>  hw/pci_bridge.c  |   19 +
>  hw/pci_bridge.h  |3 +
>  hw/pci_ids.h |2 +
>  hw/pcie.c| 1753
> ++
>  hw/pcie.h|  186 ++
>  hw/pcie_downstream.c |  224 +++
>  hw/pcie_downstream.h |   33 +
>  hw/pcie_port.c   |  188 ++
>  hw/pcie_port.h   |   51 ++
>  hw/pcie_root.c   |  247 +++
>  hw/pcie_root.h   |   32 +
>  hw/pcie_upstream.c   |  206 ++
>  hw/pcie_upstream.h   |   32 +
>  qemu-common.h|3 +
>  qemu-monitor.hx  |   36 +
>  sysemu.h |9 +
>  25 files changed, 3520 insertions(+), 35 deletions(-)
>  create mode 100644 hw/msi.c
>  create mode 100644 hw/msi.h
>  create mode 100644 hw/pcie.c
>  create mode 100644 hw/pcie.h
>  create mode 100644 hw/pcie_downstream.c
>  create mode 100644 hw/pcie_downstream.h
>  create mode 100644 hw/pcie_port.c
>  create mode 100644 hw/pcie_port.h
>  create mode 100644 hw/pcie_root.c
>  create mode 100644 hw/pcie_root.h
>  create mode 100644 hw/pcie_upstream.c
>  create mode 100644 hw/pcie_upstream.h
> 




Re: [Qemu-devel] [PATCH v3 00/14] trace: Add static tracing to QEMU

2010-09-06 Thread Daniel P. Berrange
On Mon, Sep 06, 2010 at 04:13:57PM +0100, Stefan Hajnoczi wrote:
> This patch series adds static tracing to QEMU.  It can be used to instrument
> QEMU code by means of lightweight logging called trace events.
> 
> Prerna and I are now posting the entire patch series with a serious eye 
> towards
> checking we meet users' and developers' tracing needs and with the goal of
> getting this functionality merged into qemu.git.

The main question would be why create a tracing framework and probe
markup macros specific to QEMU ? It looks like quite a few major
open source projects (PostgreSQL, Python, TCL, OpenJDK) are using
DTrace static probe markers for code instrumentation. IIUC this
is accessible on Solaris, (Free/Net?)-BSD, OS-X and also Linux via
SystemTAP's DTrace compat layer. Is this QEMU specific probe markup
flexible enough to make it possible to also support DTrace/SystemTAP
without having to add a second set of source code markers to every
probe point ?

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 09:10 AM, Kevin Wolf wrote:



If you implement a subset of functionality for an existing on-disk
format, I think you damage user's expectations.
 

I don't really buy that implementing compression/encryption wouldn't
have been possible if it was the only problem. Of course, if you don't
implement it, you can't use an on-disk format that supports them.
   


The trouble with compression is that you don't have fixed size clusters 
any more.  In order to support writes, you either have to write 
uncompressed data to the EOF leaking the compressed version or write 
compressed data and attempt to use a free list to avoid leaking 
clusters.  Since cluster size isn't fixed, the free list is of variable 
size which means you'd have to do something sophisticated like a buddy 
algorithm to allocate from the free list.


It's just not worth it since there's no easy way to do it correctly.

Encryption is straight forward.

Lack of features is a killer though.  The only thing you could really do 
is the same type of trickery we did with qcow2 where we detect whether 
there's room between the header and the L1.  Of course, there's nothing 
in qcow that really says if the L1 doesn't start at sizeof(old_header) 
then you have new_header so this is not technically backwards compatible.


But even assuming it is, the new features introduced in new_header are 
undiscoverable to older version of QEMU.  So if you do something that 
makes the image unreadable to older QEMUs (like adding a new encryption 
algorithm), instead of getting a nice error, you get silent corruption.  
qcow has had more than the QEMU implementation too so we're not the only 
ones that have been creating qcow images so we can't just rely on our 
historic behavior.


IMHO, this alone justifies a new format.

Regards,

Anthony Liguori


If we claim to support qcow images, then given any old qcow image I have
laying around for 5 years ago, I should be able to run it without qemu
throwing an error.

There's some really ugly stuff in qcow.  Nothing is actually aligned.
This makes implementing things like O_DIRECT very challenging since you
basically have to handle bouncing any possible buffer.  Since the L1
table occurs immediately after the header, there's really no room to
play any kind of tricks to add features.
 

That's a good point actually. I didn't remember that.

Kevin
   





Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 08:35 AM, Daniel P. Berrange wrote:

On Mon, Sep 06, 2010 at 07:52:41AM -0500, Anthony Liguori wrote:
   

On 09/06/2010 06:18 AM, Daniel P. Berrange wrote:
 

I agree with ditching compression, but encryption is an important
capability which cannot be satisfactorily added at other layers
in the stack. While block devices / local filesystems can layer
in dm-crypt in the host, this is not possible with network/cluster
filesystems which account for a non-trivial target audience.
   

ecryptfs should work with NFS these days.  If it still doesn't, it will
in the not too distant future.
 

Assuming it does work with NFS, IIUC, that still requires the user to
have root privileges to setup ecryptfs for the NFS mount in question.
So it takes care of the use case where the host admin doesn't trust
the network/remote fs admin, but doesn't work for the case of local
unprivileged users with NFS home dirs&  a host admin who doesnt help.
   


There's talk of moving ecryptfs from a stackable file system to a VFS 
feature.  Among other things, this would make it usable by 
non-privileged users since there's really no reason for it to not be.


Let's take a step back though as I'd like to point out two things.  The 
first has feature support which means that if it's just a matter of 
adding something to the header and encrypting blocks, then it's super 
easy to add.  Furthermore, you get graceful detection of failure when 
using an encrypted image with a version of QEMU that doesn't support 
encryption in QED.  When creating new images that aren't encrypted with 
the new QEMU, the images still work with old QEMUs.


So really, there's little rush to add encryption (or any feature) to 
QED.  The main focus ATM is making we achieve good performance and good 
reliability.


But encryption is never simple.  If you want anymore more than a toy, 
you really need to integrate into a key ring system, make use of a 
crypto API to leverage cryptographic accelerators, etc.  This is why 
relying on the a filesystem (or VFS feature) makes so much sense.



As currently implemented the string refers to a QEMU block driver
which is perhaps not the best choice for a general purpose file
format, if we want this applicable to other non-QEMU apps. Perhaps
it would be better if we explicitly declared backing format as an
enumerated int that represents specific file formats, thus decoupling
it from a specific driver.
   


That's one of the reasons I made this an optional feature.  I think 
we're going to have to revisit the backing format in the future to be 
something more meaningful.


For the purposes of the spec, I was going to say that backing_fmt was a 
suggestion to an implementation on how to interpret backing_file and 
leave it at that.


It terms of making something that's strictly enforced, I would suggest 
not specifying the format but rather having something like 
is_backing_raw.  IOW, a boolean that would be set if the backing file 
was raw (and not probe-able).  Otherwise, the backing format can be 
safely probed.


I would then say that backing file cannot be raw unless that bit is set 
or something like that.



Another related idea is perhaps to specify that if backing_fmt is
omitted in the metadata, the backing file must be treated as a QED
format file, rather than probed.


!raw would be a better way of specifying it but yeah, I think it's a 
reasonable idea.


Regards,

Anthony Liguori


More importantly, humans to create image formats by hand.  Instead, they
use tools like qemu-img.  If you think we should for the specification
of a backing file format in qemu-img, that's the place we should do it.
 

Certainly qemu-img can always add a format, even if the specification
declared it optional, but I think its worth considering declaring it
it compulsory in the spec, to take that variable out of the equation
for apps using the images.

Regards,
Daniel
   





Re: [Qemu-devel] Guest cannot handle a PCI BAR > 1GB

2010-09-06 Thread Cam Macdonell
On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity  wrote:

>  On 09/04/2010 01:22 AM, Cam Macdonell wrote:
>
>> Hi,
>>
>> I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and
>> I get an error in the guest that it is able to find a mem resource for
>> a BAR larger than 1GB.  I'm using 64-bit BARs.
>>
>> when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a
>> resource (and searches beyond 32-bit values to find it).  Here is a
>> log from printfs added to the loop that searches the resources from
>> find_resource() in kernel/resource.c:363.
>>
>>
> This is a kernel question, not a qemu issue.  Copying lkml.
>
>
>  trying 'tmp.start' 1000 to
>> 'tmp.end' fff
>> trying 'tmp.start' 9f400 to
>> 'tmp.end' 9f3ff
>> trying 'tmp.start' a to
>> 'tmp.end' e
>> trying 'tmp.start' 10 to
>> 'tmp.end' f
>> trying 'tmp.start' dfffd000 to
>> 'tmp.end' dfffcfff
>> trying 'tmp.start' e000 to
>> 'tmp.end' efff
>> trying 'tmp.start' f200 to
>> 'tmp.end' f1ff
>> trying 'tmp.start' f2001000 to
>> 'tmp.end' f200
>> trying 'tmp.start' f202 to
>> 'tmp.end' f201
>> trying 'tmp.start' f2021000 to
>> 'tmp.end' f202
>> trying 'tmp.start' f204 to
>> 'tmp.end' f203
>> trying 'tmp.start' f2040100 to
>> 'tmp.end' febf
>> trying 'tmp.start' fec00400 to
>> 'tmp.end' fffb
>> trying 'tmp.start' 1 to
>> 'tmp.end' 
>> trying 'tmp.start' 1a000 to
>> 'tmp.end' 
>> pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit]
>> pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit]
>> (PCI address [0x1c000-0x1]
>>
>> and you can see the BAR is successfully assigned.
>>
>> However, with a 2GB BAR (below), the search fails, but it also never
>> searches beyong 32-bits.  Again, all that's changed is the size of the
>> ivshmem region.
>>
>> trying 'tmp.start' 1000 to
>> 'tmp.end' fff
>> trying 'tmp.start' 9f400 to
>> 'tmp.end' 9f3ff
>> trying 'tmp.start' a to
>> 'tmp.end' e
>> trying 'tmp.start' 10 to
>> 'tmp.end' f
>> trying 'tmp.start' dfffd000 to
>> 'tmp.end' dfffcfff
>> pci :00:04.0: BAR 2: can't assign mem (size 0x8000)
>>
>> Is there a limit to PCI BAR sizes or resources?  Any pointers or
>> further debugging tips are greatly appreciated.
>>
>>
> What kernel version are you looking at?
>

latest kvm git, 2.6.36-rc2+


>
> Please add printks to the loop so we can see this->start and this->end.  It
> smells like a truncation issue.


Success with a 1GB BAR

this->start 1000, this->end 9f3ff
this->start 9f400, this->end 9
this->start f, this->end f
this->start 10, this->end dfffcfff
this->start dfffd000, this->end dfff
this->start f000, this->end f1ff
this->start f200, this->end f2000fff
this->start f201, this->end f201
this->start f202, this->end f2020fff
this->start f203, this->end f203
this->start f204, this->end f20400ff
this->start fec0, this->end fec003ff
this->start fffc, this->end 
this->start 1, this->end 11fff
tmp.start 12000, tmp.end 
pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit]

and when it fails with a 2GB BAR, the following is printed

this->start 1000, this->end 9f3ff
this->start 9f400, this->end 9
this->start f, this->end f
this->start 10, this->end dfffcfff
this->start dfffd000, this->end dfff
pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

I added a few more debug statements and found that in the failure case, the
function returns that it found a region (the last one printed before the
error).  I've added printfs for the two tests in the if that determine when
a region is found:

   if (tmp.start < tmp.end && tmp.end - tmp.start >= size - 1) {
new->start = tmp.start;
new->end = tmp.start + size - 1;
printk(KERN_INFO "returning 0\n");
return 0;
}

this->start 1000, this->end 9f3ff
tmp.start 8000, tmp.end fff
true: 8fff >= 7fff
this->start 9f400, this->end 9
tmp.start 8000, tmp.end 9f3ff
true: 8009f3ff >= 7fff
this->start f, this->end f
tmp.start 8000, tmp.end e
true: 800e >= 7fff
this->start 10, this->end dfffcfff
tmp.start 8000, tmp.end f
true: 800f >= 7fff
this->start dfffd000, this->end dfff
tmp.start 10, tmp.end dfffcfff
true: 10 < dfffcfff
true: dfefcfff >= 7fff
returning 0
pci :00:04.0: BAR 2: can't assign mem (size 0x8000)





>
>
> --
> error compiling committee.c: too many arguments to function
>
>


Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 09:24 AM, Alexander Graf wrote:


Oh - I missed the static there. Yeah, it's even worse. This is racy.
   


It's easy to refactor away so I'll just do that but it's not actually racy.

It's just not re-entrant and the lifetime of the returned result is only 
until the next call.


Regards,

Anthony Liguori


Alex


   





Re: [Qemu-devel] [PATCH] raw-posix: improve detection of scsi-generic devices

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 17:06, Bernhard Kohl wrote:

> From: Bernhard Kohl 
> 
> Allow symbolic links which point to /dev/sgX devices.

Couldn't you send an SG_IO test ioctl over and see if it works? I really 
dislike the whole file name magic matching.


Alex




[Qemu-devel] [PATCH 02/14] trace: Add simple built-in tracing backend

2010-09-06 Thread Stefan Hajnoczi
This patch adds a simple tracer which produces binary trace files.  To
try out the simple backend:

$ ./configure --trace-backend=simple
$ make

After running QEMU you can pretty-print the trace:

$ ./simpletrace.py trace-events trace.log

The output of simpletrace.py looks like this:

  qemu_realloc 0.699 ptr=0x24363f0 size=0x3 newptr=0x24363f0
  qemu_free 0.768 ptr=0x24363f0
  ^   ^ timestamp delta (us)
  | trace event name

Signed-off-by: Stefan Hajnoczi 

trace: Make trace record fields 64-bit

Explicitly use 64-bit fields in trace records so that timestamps and
magic numbers work for 32-bit host builds.

Includes fixes from Prerna Saxena .

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 .gitignore |1 +
 Makefile   |2 +
 Makefile.objs  |3 +
 configure  |2 +-
 simpletrace.c  |  159 
 simpletrace.h  |   26 +
 simpletrace.py |   93 +
 tracetool  |   78 ++-
 8 files changed, 360 insertions(+), 4 deletions(-)
 create mode 100644 simpletrace.c
 create mode 100644 simpletrace.h
 create mode 100755 simpletrace.py

diff --git a/.gitignore b/.gitignore
index f3f2bb7..72bff98 100644
--- a/.gitignore
+++ b/.gitignore
@@ -42,6 +42,7 @@ QMP/qmp-commands.txt
 *.log
 *.pdf
 *.pg
+*.pyc
 *.toc
 *.tp
 *.vr
diff --git a/Makefile b/Makefile
index 3c5e6a0..ab91d42 100644
--- a/Makefile
+++ b/Makefile
@@ -112,6 +112,8 @@ trace.c: $(SRC_PATH)/trace-events config-host.mak
 
 trace.o: trace.c $(GENERATED_HEADERS)
 
+simpletrace.o: simpletrace.c $(GENERATED_HEADERS)
+
 ##
 
 qemu-img.o: qemu-img-cmds.h
diff --git a/Makefile.objs b/Makefile.objs
index c61332d..3ef6d80 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -269,6 +269,9 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 # trace
 
 trace-obj-y = trace.o
+ifeq ($(TRACE_BACKEND),simple)
+trace-obj-y += simpletrace.o
+endif
 
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
diff --git a/configure b/configure
index 38613c3..6729dbe 100755
--- a/configure
+++ b/configure
@@ -900,7 +900,7 @@ echo "  --enable-docsenable documentation build"
 echo "  --disable-docs   disable documentation build"
 echo "  --disable-vhost-net  disable vhost-net acceleration support"
 echo "  --enable-vhost-net   enable vhost-net acceleration support"
-echo "  --trace-backend=BTrace backend nop"
+echo "  --trace-backend=BTrace backend nop simple"
 echo ""
 echo "NOTE: The object files are built at the place where configure is 
launched"
 exit 1
diff --git a/simpletrace.c b/simpletrace.c
new file mode 100644
index 000..4777308
--- /dev/null
+++ b/simpletrace.c
@@ -0,0 +1,159 @@
+/*
+ * Simple trace backend
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "trace.h"
+
+/** Trace file header event ID */
+#define HEADER_EVENT_ID (~(uint64_t)0) /* avoids conflicting with 
TraceEventIDs */
+
+/** Trace file magic number */
+#define HEADER_MAGIC 0xf2b177cb0aa429b4ULL
+
+/** Trace file version number, bump if format changes */
+#define HEADER_VERSION 0
+
+/** Trace buffer entry */
+typedef struct {
+uint64_t event;
+uint64_t timestamp_ns;
+uint64_t x1;
+uint64_t x2;
+uint64_t x3;
+uint64_t x4;
+uint64_t x5;
+uint64_t x6;
+} TraceRecord;
+
+enum {
+TRACE_BUF_LEN = 64 * 1024 / sizeof(TraceRecord),
+};
+
+static TraceRecord trace_buf[TRACE_BUF_LEN];
+static unsigned int trace_idx;
+static FILE *trace_fp;
+
+static bool write_header(FILE *fp)
+{
+static const TraceRecord header = {
+.event = HEADER_EVENT_ID,
+.timestamp_ns = HEADER_MAGIC,
+.x1 = HEADER_VERSION,
+};
+
+return fwrite(&header, sizeof header, 1, fp) == 1;
+}
+
+static void flush_trace_buffer(void)
+{
+if (!trace_fp) {
+trace_fp = fopen("trace.log", "w");
+if (trace_fp) {
+write_header(trace_fp);
+}
+}
+if (trace_fp) {
+size_t unused; /* for when fwrite(3) is declared warn_unused_result */
+unused = fwrite(trace_buf, trace_idx * sizeof(trace_buf[0]), 1, 
trace_fp);
+}
+
+/* Discard written trace records */
+trace_idx = 0;
+}
+
+void st_set_trace_file_enabled(bool enable)
+{
+if (enable == trace_file_enabled) {
+return; /* no change */
+}
+
+/* Flush/discard trace buffer */
+st_flush_trace_buffer();
+
+/* To disable, close trace file */
+if (!enable) {
+fclose(trace_fp);
+trace_fp = NULL;
+}
+
+trace_file_enabled = enable;
+}
+
+static void trace(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3,
+  uint64_t x4, uint64_t x5, uint64_t x6)
+{
+TraceRecord *rec = &tra

[Qemu-devel] [PATCH 01/14] trace: Add trace-events file for declaring trace events

2010-09-06 Thread Stefan Hajnoczi
This patch introduces the trace-events file where trace events can be
declared like so:

qemu_malloc(size_t size) "size %zu"
qemu_free(void *ptr) "ptr %p"

These trace event declarations are processed by a new tool called
tracetool to generate code for the trace events.  Trace event
declarations are independent of the backend tracing system (LTTng User
Space Tracing, ftrace markers, DTrace).

The default "nop" backend generates empty trace event functions.
Therefore trace events are disabled by default.

The trace-events file serves two purposes:

1. Adding trace events is easy.  It is not necessary to understand the
   details of a backend tracing system.  The trace-events file is a
   single location where trace events can be declared without code
   duplication.

2. QEMU is not tightly coupled to one particular backend tracing system.
   In order to support tracing across QEMU host platforms and to
   anticipate new backend tracing systems that are currently maturing,
   it is important to be flexible and not tied to one system.

This commit includes fixes from Prerna Saxena
 and Blue Swirl .

Signed-off-by: Stefan Hajnoczi 
---
 .gitignore  |2 +
 Makefile|   17 -
 Makefile.objs   |5 ++
 Makefile.target |1 +
 configure   |   18 ++
 trace-events|   24 
 tracetool   |  175 +++
 7 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100644 trace-events
 create mode 100755 tracetool

diff --git a/.gitignore b/.gitignore
index ec6f89f..f3f2bb7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@ config-devices.*
 config-all-devices.*
 config-host.*
 config-target.*
+trace.h
+trace.c
 *-softmmu
 *-darwin-user
 *-linux-user
diff --git a/Makefile b/Makefile
index f95cc2f..3c5e6a0 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 # Makefile for QEMU.
 
-GENERATED_HEADERS = config-host.h
+GENERATED_HEADERS = config-host.h trace.h
 
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
@@ -104,16 +104,24 @@ ui/vnc.o: QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 
 bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
+trace.h: $(SRC_PATH)/trace-events config-host.mak
+   $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < 
$< > $@,"  GEN   $@")
+
+trace.c: $(SRC_PATH)/trace-events config-host.mak
+   $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -c < 
$< > $@,"  GEN   $@")
+
+trace.o: trace.c $(GENERATED_HEADERS)
+
 ##
 
 qemu-img.o: qemu-img-cmds.h
 qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS)
 
-qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(block-obj-y) 
$(qobject-obj-y)
+qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(trace-obj-y) 
$(block-obj-y) $(qobject-obj-y)
 
-qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(block-obj-y) 
$(qobject-obj-y)
+qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(trace-obj-y) 
$(block-obj-y) $(qobject-obj-y)
 
-qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(block-obj-y) 
$(qobject-obj-y)
+qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(trace-obj-y) 
$(block-obj-y) $(qobject-obj-y)
 
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $@")
@@ -133,6 +141,7 @@ clean:
rm -f *.o *.d *.a $(TOOLS) TAGS cscope.* *.pod *~ */*~
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d
rm -f qemu-img-cmds.h
+   rm -f trace.c trace.h
$(MAKE) -C tests clean
for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
diff --git a/Makefile.objs b/Makefile.objs
index 4a1eaa1..c61332d 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -265,6 +265,11 @@ libdis-$(CONFIG_S390_DIS) += s390-dis.o
 libdis-$(CONFIG_SH4_DIS) += sh4-dis.o
 libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 
+##
+# trace
+
+trace-obj-y = trace.o
+
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
 vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS)
diff --git a/Makefile.target b/Makefile.target
index 18826bb..a4e80b1 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -310,6 +310,7 @@ obj-y += $(addprefix $(HWDIR)/, $(hw-obj-y))
 
 endif # CONFIG_SOFTMMU
 
+obj-y += $(addprefix ../, $(trace-obj-y))
 obj-$(CONFIG_GDBSTUB_XML) += gdbstub-xml.o
 
 $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
diff --git a/configure b/configure
index 146dac0..38613c3 100755
--- a/configure
+++ b/configure
@@ -317,6 +317,7 @@ pkgversion=""
 check_utests="no"
 user_pie="no"
 zero_malloc=""
+trace_backend="nop"
 
 # OS specific
 if check_define __linux__ ; then
@@ -519,6 +520,8 @@ for opt do
   ;;
   --target-list=*) t

[Qemu-devel] [PATCH 04/14] trace: Support disabled events in trace-events

2010-09-06 Thread Stefan Hajnoczi
Sometimes it is useful to disable a trace event.  Removing the event
from trace-events is not enough since source code will call the
trace_*() function for the event.

This patch makes it easy to build without specific trace events by
marking them disabled in trace-events:

disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"

This builds without the multiwrite_cb trace event.

Signed-off-by: Stefan Hajnoczi 

trace: Allow bulk enabling/disabling of trace events at compile time

For 'simple' trace backend, allow bulk enabling/disabling of trace
events at compile time.  Trace events that are preceded by 'disable'
keyword are compiled in, but turned off by default. These can
individually be turned on using the monitor.  All other trace events are
enabled by default.

TODO :
This could be enhanced when the trace-event namespace is partitioned into a
group and an ID within that group. In such a case, marking a group as enabled
would automatically enable all trace-events listed under it.

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 trace-events |7 ++-
 tracetool|   44 +++-
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/trace-events b/trace-events
index a37d3cc..2a986ec 100644
--- a/trace-events
+++ b/trace-events
@@ -12,10 +12,15 @@
 #
 # Format of a trace event:
 #
-# ( [,  ] ...) ""
+# [disable] ( [,  ] ...) ""
 #
 # Example: qemu_malloc(size_t size) "size %zu"
 #
+# The "disable" keyword will build without the trace event.
+# In case of 'simple' trace backend, it will allow the trace event to be
+# compiled, but this would be turned off by default. It can be toggled on via
+# the monitor.
+#
 # The  must be a valid as a C function name.
 #
 # Types should be standard C types.  Use void * for pointers because the trace
diff --git a/tracetool b/tracetool
index 2e2e37d..8aeac48 100755
--- a/tracetool
+++ b/tracetool
@@ -87,6 +87,20 @@ get_fmt()
 echo "$fmt"
 }
 
+# Get the state of a trace event
+get_state()
+{
+local str disable state
+str=$(get_name "$1")
+disable=${str##disable }
+if [ "$disable" = "$str" ] ; then
+state=1
+else
+state=0
+fi
+echo "$state"
+}
+
 linetoh_begin_nop()
 {
 return
@@ -146,10 +160,14 @@ cast_args_to_uint64_t()
 
 linetoh_simple()
 {
-local name args argc trace_args
+local name args argc trace_args state
 name=$(get_name "$1")
 args=$(get_args "$1")
 argc=$(get_argc "$1")
+state=$(get_state "$1")
+if [ "$state" = "0" ]; then
+name=${name##disable }
+fi
 
 trace_args="$simple_event_num"
 if [ "$argc" -gt 0 ]
@@ -188,10 +206,14 @@ EOF
 
 linetoc_simple()
 {
-local name
+local name state
 name=$(get_name "$1")
+state=$(get_state "$1")
+if [ "$state" = "0" ] ; then
+name=${name##disable }
+fi
 cat <

[Qemu-devel] [PATCH 07/14] trace: Add trace file name command-line option

2010-09-06 Thread Stefan Hajnoczi
From: Prerna Saxena 

This patch adds an optional command line switch '-trace' to specify the
filename to write traces to, when qemu starts.
Eg, If compiled with the 'simple' trace backend,
[t...@system]$ qemu -trace FILENAME IMAGE
Allows the binary traces to be written to FILENAME instead of the option
set at config-time.

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-config.c   |   18 ++
 qemu-options.hx |   11 +++
 vl.c|   21 +
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 3abe655..394ac2d 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -288,6 +288,21 @@ static QemuOptsList qemu_mon_opts = {
 },
 };
 
+#ifdef CONFIG_SIMPLE_TRACE
+static QemuOptsList qemu_trace_opts = {
+.name = "trace",
+.implied_opt_name = "trace",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_trace_opts.head),
+.desc = {
+{
+.name = "file",
+.type = QEMU_OPT_STRING,
+},
+{ /* end if list */ }
+},
+};
+#endif
+
 static QemuOptsList qemu_cpudef_opts = {
 .name = "cpudef",
 .head = QTAILQ_HEAD_INITIALIZER(qemu_cpudef_opts.head),
@@ -346,6 +361,9 @@ static QemuOptsList *vm_config_groups[32] = {
 &qemu_global_opts,
 &qemu_mon_opts,
 &qemu_cpudef_opts,
+#ifdef CONFIG_SIMPLE_TRACE
+&qemu_trace_opts,
+#endif
 NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 453f129..0589906 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2230,6 +2230,17 @@ Normally QEMU loads a configuration file from 
@var{sysconfdir}/qemu.conf and
 @var{sysconfdir}/targ...@var{arch}.conf on startup.  The @code{-nodefconfig}
 option will prevent QEMU from loading these configuration files at startup.
 ETEXI
+#ifdef CONFIG_SIMPLE_TRACE
+DEF("trace", HAS_ARG, QEMU_OPTION_trace,
+"-trace\n"
+"Specify a trace file to log traces to\n",
+QEMU_ARCH_ALL)
+STEXI
+...@item -trace
+...@findex -trace
+Specify a trace file to log output traces to.
+ETEXI
+#endif
 
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
diff --git a/vl.c b/vl.c
index bd05e39..8df784a 100644
--- a/vl.c
+++ b/vl.c
@@ -47,6 +47,10 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_SIMPLE_TRACE
+#include "trace.h"
+#endif
+
 #ifdef CONFIG_BSD
 #include 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || 
defined(__DragonFly__)
@@ -1823,6 +1827,9 @@ int main(int argc, char **argv, char **envp)
 int show_vnc_port = 0;
 int defconfig = 1;
 
+#ifdef CONFIG_SIMPLE_TRACE
+const char *trace_file = NULL;
+#endif
 atexit(qemu_run_exit_notifiers);
 error_set_progname(argv[0]);
 
@@ -2595,6 +2602,14 @@ int main(int argc, char **argv, char **envp)
 }
 xen_mode = XEN_ATTACH;
 break;
+#ifdef CONFIG_SIMPLE_TRACE
+case QEMU_OPTION_trace:
+opts = qemu_opts_parse(qemu_find_opts("trace"), optarg, 0);
+if (opts) {
+trace_file = qemu_opt_get(opts, "file");
+}
+break;
+#endif
 case QEMU_OPTION_readconfig:
 {
 int ret = qemu_read_config_file(optarg);
@@ -2638,6 +2653,12 @@ int main(int argc, char **argv, char **envp)
 data_dir = CONFIG_QEMU_DATADIR;
 }
 
+#ifdef CONFIG_SIMPLE_TRACE
+/*
+ * Set the trace file name, if specified.
+ */
+st_set_trace_file(trace_file);
+#endif
 /*
  * Default to max_cpus = smp_cpus, in case the user doesn't
  * specify a max_cpus value.
-- 
1.7.1




[Qemu-devel] [PATCH v3 00/14] trace: Add static tracing to QEMU

2010-09-06 Thread Stefan Hajnoczi
This patch series adds static tracing to QEMU.  It can be used to instrument
QEMU code by means of lightweight logging called trace events.

Prerna and I are now posting the entire patch series with a serious eye towards
checking we meet users' and developers' tracing needs and with the goal of
getting this functionality merged into qemu.git.

Tracing infrastructure allows debugging, performance analysis, and observation
to be performed.  Right now there are ad-hoc logging calls in some source files
which require rebuilding QEMU with certain #defines and perform poorly.  This
patch series introduces a single tracing infrastructure which is easy to use
and can replace ad-hoc techniques.

Two key points:

1. The trace-events file contains the set of defined trace events which can be
   called from a source file.  For example, trace-events has:

 qemu_malloc(size_t size, void *ptr) "size %zu ptr %p"

   and qemu-malloc.c uses this trace event like this:

 #include "trace.h"  /* needed for trace event prototype */

 void *qemu_malloc(size_t size)
 {
 void *ptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
 ptr = oom_check(malloc(size ? size : 1));
 trace_qemu_malloc(size, ptr);  /* <-- trace event */
 return ptr;
 }

2. The built-in 'simple' trace backend writes binary traces to
   trace-.  They can be pretty-printed like this:

 ./simpletrace.py trace-events trace-*

   Although you can also try LTTng Userspace Tracer ('ust' trace backend), the
   'simple' trace backend provides commands within the QEMU monitor to
   enable/disable trace events at runtime.  It is easy to use and should serve
   as a good default trace backend.

   The 'simple' trace backend's limitation is that it isn't thread-safe and can
   therefore only trace correctly when the QEMU global mutex is held.

For full documentation, see:
http://repo.or.cz/w/qemu/stefanha.git/blob_plain/refs/heads/tracing_v3:/docs/tracing.txt

The git branch is here:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing_v3

Changes in v3:
 * write trace files to current working directory by default
 * avoid adding -lust to LIBS twice

Changes in v2:
 * add 6th argument to trace record to avoid changing file format later
 * add type usage documentation for trace event declarations
 * switch to QemuOptsList for -trace command-line option
 * use atexit() once and only once to flush the trace buffer
 * keep disabled event ids in sync between tracetool and simpletrace.py
 * change echo -n to printf for portability
 * move #include statements to correct spot in the patch series
 * fix style issues from code review




[Qemu-devel] [PATCH 11/14] trace: Trace virtio-blk, multiwrite, and paio_submit

2010-09-06 Thread Stefan Hajnoczi
This patch adds trace events that make it possible to observe
virtio-blk.

Signed-off-by: Stefan Hajnoczi 
---
 block.c|8 
 hw/virtio-blk.c|7 +++
 posix-aio-compat.c |2 ++
 trace-events   |   14 ++
 4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index da70f29..603dd1f 100644
--- a/block.c
+++ b/block.c
@@ -23,6 +23,7 @@
  */
 #include "config-host.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "monitor.h"
 #include "block_int.h"
 #include "module.h"
@@ -2062,6 +2063,8 @@ static void multiwrite_cb(void *opaque, int ret)
 {
 MultiwriteCB *mcb = opaque;
 
+trace_multiwrite_cb(mcb, ret);
+
 if (ret < 0 && !mcb->error) {
 mcb->error = ret;
 }
@@ -2202,6 +2205,8 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // Check for mergable requests
 num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
 
+trace_bdrv_aio_multiwrite(mcb, mcb->num_callbacks, num_reqs);
+
 /*
  * Run the aio requests. As soon as one request can't be submitted
  * successfully, fail all requests that are not yet submitted (we must
@@ -2223,6 +2228,7 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
  */
 mcb->num_requests = 1;
 
+// Run the aio requests
 for (i = 0; i < num_reqs; i++) {
 mcb->num_requests++;
 acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
@@ -2233,8 +2239,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // submitted yet. Otherwise we'll wait for the submitted AIOs to
 // complete and report the error in the callback.
 if (i == 0) {
+trace_bdrv_aio_multiwrite_earlyfail(mcb);
 goto fail;
 } else {
+trace_bdrv_aio_multiwrite_latefail(mcb, i);
 multiwrite_cb(mcb, -EIO);
 break;
 }
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index c3a7343..e4c4427 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -13,6 +13,7 @@
 
 #include 
 #include "qemu-error.h"
+#include "trace.h"
 #include "blockdev.h"
 #include "virtio-blk.h"
 #ifdef __linux__
@@ -52,6 +53,8 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int 
status)
 {
 VirtIOBlock *s = req->dev;
 
+trace_virtio_blk_req_complete(req, status);
+
 req->in->status = status;
 virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
 virtio_notify(&s->vdev, s->vq);
@@ -88,6 +91,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 {
 VirtIOBlockReq *req = opaque;
 
+trace_virtio_blk_rw_complete(req, ret);
+
 if (ret) {
 int is_read = !(req->out->type & VIRTIO_BLK_T_OUT);
 if (virtio_blk_handle_rw_error(req, -ret, is_read))
@@ -270,6 +275,8 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 {
 BlockRequest *blkreq;
 
+trace_virtio_blk_handle_write(req, req->out->sector, req->qiov.size / 512);
+
 if (req->out->sector & req->dev->sector_mask) {
 virtio_blk_rw_complete(req, -EIO);
 return;
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index a67ffe3..2e13184 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -25,6 +25,7 @@
 #include "qemu-queue.h"
 #include "osdep.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "block_int.h"
 
 #include "block/raw-posix-aio.h"
@@ -583,6 +584,7 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 acb->next = posix_aio_state->first_aio;
 posix_aio_state->first_aio = acb;
 
+trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 qemu_paio_submit(acb);
 return &acb->common;
 }
diff --git a/trace-events b/trace-events
index d2f2bbc..de6479e 100644
--- a/trace-events
+++ b/trace-events
@@ -37,3 +37,17 @@ disable qemu_free(void *ptr) "ptr %p"
 disable qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu 
size %zu ptr %p"
 disable qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 disable qemu_vfree(void *ptr) "ptr %p"
+
+# block.c
+disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
+disable bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb 
%p num_callbacks %d num_reqs %d"
+disable bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p"
+disable bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d"
+
+# hw/virtio-blk.c
+disable virtio_blk_req_complete(void *req, int status) "req %p status %d"
+disable virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
+disable virtio_blk_handle_write(void *req, unsigned long sector, unsigned long 
nsectors) "req %p sector %lu nsectors %lu"
+
+# posix-aio-compat.c
+disable paio_submit(void *acb, void *opaque, unsigned long sector_num, 
unsigned long nb_sectors, unsigned long type) "acb %p opaque %p sector_num %lu 
nb_sectors %lu type %lu"

[Qemu-devel] [PATCH 06/14] trace: Add trace-file command to open/close/flush trace file

2010-09-06 Thread Stefan Hajnoczi
This patch adds the trace-file command:

  trace-file [on|off|flush]

  Open, close, or flush the trace file.  If no argument is given,
  the status of the trace file is displayed.

The trace file is turned on by default but is only written out when the
trace buffer becomes full.  The flush operation can be used to force
write out at any time.

Turning off the trace file does not change the state of trace events;
tracing will continue to the trace buffer.  When the trace file is off,
use "info trace" to display the contents of the trace buffer in memory.

Signed-off-by: Stefan Hajnoczi 

This commit also contains the trace-file sub-command from the following
commit:

commit 5ce8d1a957afae2c52ad748944ce72848ccf57bd
Author: Prerna Saxena 
Date:   Wed Aug 4 16:23:54 2010 +0530

trace: Add options to specify trace file name at startup and runtime

This patch adds an optional command line switch '-trace' to specify the
filename to write traces to, when qemu starts.
Eg, If compiled with the 'simple' trace backend,
[t...@system]$ qemu -trace FILENAME IMAGE
Allows the binary traces to be written to FILENAME instead of the option
set at config-time.

Also, this adds monitor sub-command 'set' to trace-file commands to
dynamically change trace log file at runtime.
Eg,
(qemu)trace-file set FILENAME
This allows one to set trace outputs to FILENAME from the default
specified at startup.

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 monitor.c   |   23 
 qemu-monitor.hx |   14 
 simpletrace.c   |   62 ++
 simpletrace.h   |4 +++
 4 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/monitor.c b/monitor.c
index 0e69bc8..e602480 100644
--- a/monitor.c
+++ b/monitor.c
@@ -549,6 +549,29 @@ static void do_change_trace_event_state(Monitor *mon, 
const QDict *qdict)
 bool new_state = qdict_get_bool(qdict, "option");
 st_change_trace_event_state(tp_name, new_state);
 }
+
+static void do_trace_file(Monitor *mon, const QDict *qdict)
+{
+const char *op = qdict_get_try_str(qdict, "op");
+const char *arg = qdict_get_try_str(qdict, "arg");
+
+if (!op) {
+st_print_trace_file_status((FILE *)mon, &monitor_fprintf);
+} else if (!strcmp(op, "on")) {
+st_set_trace_file_enabled(true);
+} else if (!strcmp(op, "off")) {
+st_set_trace_file_enabled(false);
+} else if (!strcmp(op, "flush")) {
+st_flush_trace_buffer();
+} else if (!strcmp(op, "set")) {
+if (arg) {
+st_set_trace_file(arg);
+}
+} else {
+monitor_printf(mon, "unexpected argument \"%s\"\n", op);
+help_cmd(mon, "trace-file");
+}
+}
 #endif
 
 static void user_monitor_complete(void *opaque, QObject *ret_data)
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index c264c7d..49bcd8d 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -295,6 +295,20 @@ STEXI
 @findex trace-event
 changes status of a trace event
 ETEXI
+
+{
+.name   = "trace-file",
+.args_type  = "op:s?,arg:F?",
+.params = "on|off|flush|set [arg]",
+.help   = "open, close, or flush trace file, or set a new file 
name",
+.mhandler.cmd = do_trace_file,
+},
+
+STEXI
+...@item trace-file on|off|flush
+...@findex trace-file
+Open, close, or flush the trace file.  If no argument is given, the status of 
the trace file is displayed.
+ETEXI
 #endif
 
 {
diff --git a/simpletrace.c b/simpletrace.c
index 97045a6..f849e42 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -42,6 +42,14 @@ enum {
 static TraceRecord trace_buf[TRACE_BUF_LEN];
 static unsigned int trace_idx;
 static FILE *trace_fp;
+static char *trace_file_name = NULL;
+static bool trace_file_enabled = false;
+
+void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE 
*stream, const char *fmt, ...))
+{
+stream_printf(stream, "Trace file \"%s\" %s.\n",
+  trace_file_name, trace_file_enabled ? "on" : "off");
+}
 
 static bool write_header(FILE *fp)
 {
@@ -54,31 +62,57 @@ static bool write_header(FILE *fp)
 return fwrite(&header, sizeof header, 1, fp) == 1;
 }
 
-static bool open_trace_file(void)
+/**
+ * set_trace_file : To set the name of a trace file.
+ * @file : pointer to the name to be set.
+ * If NULL, set to the default name- set at config time.
+ */
+bool st_set_trace_file(const char *file)
 {
-char *filename;
+st_set_trace_file_enabled(false);
 
-if (asprintf(&filename, CONFIG_TRACE_FILE, getpid()) < 0) {
-return false;
-}
+free(trace_file_name);
 
-trace_fp = fopen(filename, "w");
-free(filename);
-if (!trace_fp) {
-return false;
+if (!file) {
+if (asprintf(&trace_file_name, CONFIG_TRACE_FILE, getpid()) < 0) {
+trace_file_name = NULL;
+return false;
+}
+} else {
+  

[Qemu-devel] [PATCH 09/14] trace: Add user documentation

2010-09-06 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi 
---
 docs/tracing.txt |  177 ++
 1 files changed, 177 insertions(+), 0 deletions(-)
 create mode 100644 docs/tracing.txt

diff --git a/docs/tracing.txt b/docs/tracing.txt
new file mode 100644
index 000..ae01ff1
--- /dev/null
+++ b/docs/tracing.txt
@@ -0,0 +1,177 @@
+= Tracing =
+
+== Introduction ==
+
+This document describes the tracing infrastructure in QEMU and how to use it
+for debugging, profiling, and observing execution.
+
+== Quickstart ==
+
+1. Build with the 'simple' trace backend:
+
+./configure --trace-backend=simple
+make
+
+2. Enable trace events you are interested in:
+
+$EDITOR trace-events  # remove "disable" from events you want
+
+3. Run the virtual machine to produce a trace file:
+
+qemu ... # your normal QEMU invocation
+
+4. Pretty-print the binary trace file:
+
+./simpletrace.py trace-events trace-*
+
+== Trace events ==
+
+There is a set of static trace events declared in the trace-events source
+file.  Each trace event declaration names the event, its arguments, and the
+format string which can be used for pretty-printing:
+
+qemu_malloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_free(void *ptr) "ptr %p"
+
+The trace-events file is processed by the tracetool script during build to
+generate code for the trace events.  Trace events are invoked directly from
+source code like this:
+
+#include "trace.h"  /* needed for trace event prototype */
+
+void *qemu_malloc(size_t size)
+{
+void *ptr;
+if (!size && !allow_zero_malloc()) {
+abort();
+}
+ptr = oom_check(malloc(size ? size : 1));
+trace_qemu_malloc(size, ptr);  /* <-- trace event */
+return ptr;
+}
+
+=== Declaring trace events ===
+
+The tracetool script produces the trace.h header file which is included by
+every source file that uses trace events.  Since many source files include
+trace.h, it uses a minimum of types and other header files included to keep
+the namespace clean and compile times and dependencies down.
+
+Trace events should use types as follows:
+
+ * Use stdint.h types for fixed-size types.  Most offsets and guest memory
+   addresses are best represented with uint32_t or uint64_t.  Use fixed-size
+   types over primitive types whose size may change depending on the host
+   (32-bit versus 64-bit) so trace events don't truncate values or break
+   the build.
+
+ * Use void * for pointers to structs or for arrays.  The trace.h header
+   cannot include all user-defined struct declarations and it is therefore
+   necessary to use void * for pointers to structs.
+
+ * For everything else, use primitive scalar types (char, int, long) with the
+   appropriate signedness.
+
+=== Hints for adding new trace events ===
+
+1. Trace state changes in the code.  Interesting points in the code usually
+   involve a state change like starting, stopping, allocating, freeing.  State
+   changes are good trace events because they can be used to understand the
+   execution of the system.
+
+2. Trace guest operations.  Guest I/O accesses like reading device registers
+   are good trace events because they can be used to understand guest
+   interactions.
+
+3. Use correlator fields so the context of an individual line of trace output
+   can be understood.  For example, trace the pointer returned by malloc and
+   used as an argument to free.  This way mallocs and frees can be matched up.
+   Trace events with no context are not very useful.
+
+4. Name trace events after their function.  If there are multiple trace events
+   in one function, append a unique distinguisher at the end of the name.
+
+5. Declare trace events with the "disable" keyword.  Some trace events can
+   produce a lot of output and users are typically only interested in a subset
+   of trace events.  Marking trace events disabled by default saves the user
+   from having to manually disable noisy trace events.
+
+== Trace backends ==
+
+The tracetool script automates tedious trace event code generation and also
+keeps the trace event declarations independent of the trace backend.  The trace
+events are not tightly coupled to a specific trace backend, such as LTTng or
+SystemTap.  Support for trace backends can be added by extending the tracetool
+script.
+
+The trace backend is chosen at configure time and only one trace backend can
+be built into the binary:
+
+./configure --trace-backend=simple
+
+For a list of supported trace backends, try ./configure --help or see below.
+
+The following subsections describe the supported trace backends.
+
+=== Nop ===
+
+The "nop" backend generates empty trace event functions so that the compiler
+can optimize out trace events completely.  This is the default and imposes no
+performance penalty.
+
+=== Simpletrace ===
+
+The "simple" backend supports common use cases and comes as part of the QEMU
+source tree.  It may not

[Qemu-devel] [PATCH 12/14] trace: Trace virtqueue operations

2010-09-06 Thread Stefan Hajnoczi
This patch adds trace events for virtqueue operations including
adding/removing buffers, notifying the guest, and receiving a notify
from the guest.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio.c  |8 
 trace-events |8 
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 4475bb3..a5741ae 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -13,6 +13,7 @@
 
 #include 
 
+#include "trace.h"
 #include "virtio.h"
 #include "sysemu.h"
 
@@ -205,6 +206,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 unsigned int offset;
 int i;
 
+trace_virtqueue_fill(vq, elem, len, idx);
+
 offset = 0;
 for (i = 0; i < elem->in_num; i++) {
 size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
@@ -232,6 +235,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 /* Make sure buffer is written before we update index. */
 wmb();
+trace_virtqueue_flush(vq, count);
 vring_used_idx_increment(vq, count);
 vq->inuse -= count;
 }
@@ -422,6 +426,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 
 vq->inuse++;
 
+trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 return elem->in_num + elem->out_num;
 }
 
@@ -560,6 +565,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n)
 void virtio_queue_notify(VirtIODevice *vdev, int n)
 {
 if (n < VIRTIO_PCI_QUEUE_MAX && vdev->vq[n].vring.desc) {
+trace_virtio_queue_notify(vdev, n, &vdev->vq[n]);
 vdev->vq[n].handle_output(vdev, &vdev->vq[n]);
 }
 }
@@ -597,6 +603,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 
 void virtio_irq(VirtQueue *vq)
 {
+trace_virtio_irq(vq);
 vq->vdev->isr |= 0x01;
 virtio_notify_vector(vq->vdev, vq->vector);
 }
@@ -609,6 +616,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
  (vq->inuse || vring_avail_idx(vq) != vq->last_avail_idx)))
 return;
 
+trace_virtio_notify(vdev, vq);
 vdev->isr |= 0x01;
 virtio_notify_vector(vdev, vq->vector);
 }
diff --git a/trace-events b/trace-events
index de6479e..29580a5 100644
--- a/trace-events
+++ b/trace-events
@@ -38,6 +38,14 @@ disable qemu_memalign(size_t alignment, size_t size, void 
*ptr) "alignment %zu s
 disable qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 disable qemu_vfree(void *ptr) "ptr %p"
 
+# hw/virtio.c
+disable virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned 
int idx) "vq %p elem %p len %u idx %u"
+disable virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
+disable virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int 
out_num) "vq %p elem %p in_num %u out_num %u"
+disable virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
+disable virtio_irq(void *vq) "vq %p"
+disable virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
+
 # block.c
 disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
 disable bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb 
%p num_callbacks %d num_reqs %d"
-- 
1.7.1




[Qemu-devel] [PATCH 13/14] trace: Trace port IO

2010-09-06 Thread Stefan Hajnoczi
From: Prerna Saxena 

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 ioport.c |7 +++
 trace-events |4 
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index 53dd87a..ec3dc65 100644
--- a/ioport.c
+++ b/ioport.c
@@ -26,6 +26,7 @@
  */
 
 #include "ioport.h"
+#include "trace.h"
 
 /***/
 /* IO Port */
@@ -195,18 +196,21 @@ void isa_unassign_ioport(pio_addr_t start, int length)
 void cpu_outb(pio_addr_t addr, uint8_t val)
 {
 LOG_IOPORT("outb: %04"FMT_pioaddr" %02"PRIx8"\n", addr, val);
+trace_cpu_out(addr, val);
 ioport_write(0, addr, val);
 }
 
 void cpu_outw(pio_addr_t addr, uint16_t val)
 {
 LOG_IOPORT("outw: %04"FMT_pioaddr" %04"PRIx16"\n", addr, val);
+trace_cpu_out(addr, val);
 ioport_write(1, addr, val);
 }
 
 void cpu_outl(pio_addr_t addr, uint32_t val)
 {
 LOG_IOPORT("outl: %04"FMT_pioaddr" %08"PRIx32"\n", addr, val);
+trace_cpu_out(addr, val);
 ioport_write(2, addr, val);
 }
 
@@ -214,6 +218,7 @@ uint8_t cpu_inb(pio_addr_t addr)
 {
 uint8_t val;
 val = ioport_read(0, addr);
+trace_cpu_in(addr, val);
 LOG_IOPORT("inb : %04"FMT_pioaddr" %02"PRIx8"\n", addr, val);
 return val;
 }
@@ -222,6 +227,7 @@ uint16_t cpu_inw(pio_addr_t addr)
 {
 uint16_t val;
 val = ioport_read(1, addr);
+trace_cpu_in(addr, val);
 LOG_IOPORT("inw : %04"FMT_pioaddr" %04"PRIx16"\n", addr, val);
 return val;
 }
@@ -230,6 +236,7 @@ uint32_t cpu_inl(pio_addr_t addr)
 {
 uint32_t val;
 val = ioport_read(2, addr);
+trace_cpu_in(addr, val);
 LOG_IOPORT("inl : %04"FMT_pioaddr" %08"PRIx32"\n", addr, val);
 return val;
 }
diff --git a/trace-events b/trace-events
index 29580a5..b2c7f10 100644
--- a/trace-events
+++ b/trace-events
@@ -59,3 +59,7 @@ disable virtio_blk_handle_write(void *req, unsigned long 
sector, unsigned long n
 
 # posix-aio-compat.c
 disable paio_submit(void *acb, void *opaque, unsigned long sector_num, 
unsigned long nb_sectors, unsigned long type) "acb %p opaque %p sector_num %lu 
nb_sectors %lu type %lu"
+
+# ioport.c
+disable cpu_in(unsigned int addr, unsigned int val) "addr %#x value %u"
+disable cpu_out(unsigned int addr, unsigned int val) "addr %#x value %u"
-- 
1.7.1




[Qemu-devel] [PATCH 10/14] trace: Trace qemu_malloc() and qemu_vmalloc()

2010-09-06 Thread Stefan Hajnoczi
It is often useful to instrument memory management functions in order to
find leaks or performance problems.  This patch adds trace events for
the memory allocation primitives.

Signed-off-by: Stefan Hajnoczi 
---
 osdep.c   |   24 ++--
 qemu-malloc.c |   12 ++--
 trace-events  |   10 ++
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/osdep.c b/osdep.c
index dbf872a..30426ff 100644
--- a/osdep.c
+++ b/osdep.c
@@ -50,6 +50,7 @@
 #endif
 
 #include "qemu-common.h"
+#include "trace.h"
 #include "sysemu.h"
 #include "qemu_socket.h"
 
@@ -71,25 +72,34 @@ static void *oom_check(void *ptr)
 #if defined(_WIN32)
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
+
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 void *qemu_vmalloc(size_t size)
 {
+void *ptr;
+
 /* FIXME: this is not exactly optimal solution since VirtualAlloc
has 64Kb granularity, but at least it guarantees us that the
memory is page aligned. */
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_vmalloc(size, ptr);
+return ptr;
 }
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 VirtualFree(ptr, 0, MEM_RELEASE);
 }
 
@@ -97,21 +107,22 @@ void qemu_vfree(void *ptr)
 
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
 #if defined(_POSIX_C_SOURCE) && !defined(__sun__)
 int ret;
-void *ptr;
 ret = posix_memalign(&ptr, alignment, size);
 if (ret != 0) {
 fprintf(stderr, "Failed to allocate %zu B: %s\n",
 size, strerror(ret));
 abort();
 }
-return ptr;
 #elif defined(CONFIG_BSD)
-return oom_check(valloc(size));
+ptr = oom_check(valloc(size));
 #else
-return oom_check(memalign(alignment, size));
+ptr = oom_check(memalign(alignment, size));
 #endif
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 /* alloc shared memory pages */
@@ -122,6 +133,7 @@ void *qemu_vmalloc(size_t size)
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 free(ptr);
 }
 
diff --git a/qemu-malloc.c b/qemu-malloc.c
index 36b0b36..ecffb67 100644
--- a/qemu-malloc.c
+++ b/qemu-malloc.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+#include "trace.h"
 #include 
 
 static void *oom_check(void *ptr)
@@ -34,6 +35,7 @@ static void *oom_check(void *ptr)
 
 void qemu_free(void *ptr)
 {
+trace_qemu_free(ptr);
 free(ptr);
 }
 
@@ -48,18 +50,24 @@ static int allow_zero_malloc(void)
 
 void *qemu_malloc(size_t size)
 {
+void *ptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(malloc(size ? size : 1));
+ptr = oom_check(malloc(size ? size : 1));
+trace_qemu_malloc(size, ptr);
+return ptr;
 }
 
 void *qemu_realloc(void *ptr, size_t size)
 {
+void *newptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(realloc(ptr, size ? size : 1));
+newptr = oom_check(realloc(ptr, size ? size : 1));
+trace_qemu_realloc(ptr, size, newptr);
+return newptr;
 }
 
 void *qemu_mallocz(size_t size)
diff --git a/trace-events b/trace-events
index 2a986ec..d2f2bbc 100644
--- a/trace-events
+++ b/trace-events
@@ -27,3 +27,13 @@
 # system may not have the necessary headers included.
 #
 # The  should be a sprintf()-compatible format string.
+
+# qemu-malloc.c
+disable qemu_malloc(size_t size, void *ptr) "size %zu ptr %p"
+disable qemu_realloc(void *ptr, size_t size, void *newptr) "ptr %p size %zu 
newptr %p"
+disable qemu_free(void *ptr) "ptr %p"
+
+# osdep.c
+disable qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu 
size %zu ptr %p"
+disable qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
+disable qemu_vfree(void *ptr) "ptr %p"
-- 
1.7.1




[Qemu-devel] [PATCH 03/14] trace: Support for dynamically enabling/disabling trace events

2010-09-06 Thread Stefan Hajnoczi
From: Prerna Saxena 

This patch adds support for dynamically enabling/disabling of trace events.
This is done by internally maintaining each trace event's state, and
permitting logging of data from a trace event only if it is in an
'active' state.

Monitor commands added :
1) info trace-events: to view all available trace events and
  their state.
2) trace-event NAME on|off  : to enable/disable data logging from a
  given trace event.
  Eg, trace-event paio_submit off
disables logging of data when
paio_submit is hit.

By default, all trace-events are disabled. One can enable desired trace-events
via the monitor.

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 

trace: Monitor command 'info trace'

Monitor command 'info trace' to display contents of trace buffer

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 

trace: Remove monitor.h dependency from simpletrace

User-mode targets don't have a monitor so the simple trace backend
currently does not build on those targets.  This patch abstracts the
monitor printing interface so there is no direct coupling between
simpletrace and the monitor.

Signed-off-by: Stefan Hajnoczi 
---
 configure   |3 +++
 monitor.c   |   40 
 qemu-monitor.hx |   25 +
 simpletrace.c   |   51 +++
 simpletrace.h   |   10 ++
 tracetool   |   24 
 6 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 6729dbe..5afb3b5 100755
--- a/configure
+++ b/configure
@@ -2468,6 +2468,9 @@ bsd)
 esac
 
 echo "TRACE_BACKEND=$trace_backend" >> $config_host_mak
+if test "$trace_backend" = "simple"; then
+  echo "CONFIG_SIMPLE_TRACE=y" >> $config_host_mak
+fi
 echo "TOOLS=$tools" >> $config_host_mak
 echo "ROMS=$roms" >> $config_host_mak
 echo "MAKE=$make" >> $config_host_mak
diff --git a/monitor.c b/monitor.c
index e27f8d8..0e69bc8 100644
--- a/monitor.c
+++ b/monitor.c
@@ -56,6 +56,9 @@
 #include "json-parser.h"
 #include "osdep.h"
 #include "exec-all.h"
+#ifdef CONFIG_SIMPLE_TRACE
+#include "trace.h"
+#endif
 
 //#define DEBUG
 //#define DEBUG_COMPLETION
@@ -539,6 +542,15 @@ static void do_help_cmd(Monitor *mon, const QDict *qdict)
 help_cmd(mon, qdict_get_try_str(qdict, "name"));
 }
 
+#ifdef CONFIG_SIMPLE_TRACE
+static void do_change_trace_event_state(Monitor *mon, const QDict *qdict)
+{
+const char *tp_name = qdict_get_str(qdict, "name");
+bool new_state = qdict_get_bool(qdict, "option");
+st_change_trace_event_state(tp_name, new_state);
+}
+#endif
+
 static void user_monitor_complete(void *opaque, QObject *ret_data)
 {
 MonitorCompletionData *data = (MonitorCompletionData *)opaque; 
@@ -938,6 +950,18 @@ static void do_info_cpu_stats(Monitor *mon)
 }
 #endif
 
+#if defined(CONFIG_SIMPLE_TRACE)
+static void do_info_trace(Monitor *mon)
+{
+st_print_trace((FILE *)mon, &monitor_fprintf);
+}
+
+static void do_info_trace_events(Monitor *mon)
+{
+st_print_trace_events((FILE *)mon, &monitor_fprintf);
+}
+#endif
+
 /**
  * do_quit(): Quit QEMU execution
  */
@@ -2594,6 +2618,22 @@ static const mon_cmd_t info_cmds[] = {
 .help   = "show roms",
 .mhandler.info = do_info_roms,
 },
+#if defined(CONFIG_SIMPLE_TRACE)
+{
+.name   = "trace",
+.args_type  = "",
+.params = "",
+.help   = "show current contents of trace buffer",
+.mhandler.info = do_info_trace,
+},
+{
+.name   = "trace-events",
+.args_type  = "",
+.params = "",
+.help   = "show available trace-events & their state",
+.mhandler.info = do_info_trace_events,
+},
+#endif
 {
 .name   = NULL,
 },
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 5c1da33..c264c7d 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -281,6 +281,22 @@ STEXI
 Output logs to @var{filename}.
 ETEXI
 
+#ifdef CONFIG_SIMPLE_TRACE
+{
+.name   = "trace-event",
+.args_type  = "name:s,option:b",
+.params = "name on|off",
+.help   = "changes status of a specific trace event",
+.mhandler.cmd = do_change_trace_event_state,
+},
+
+STEXI
+...@item trace-event
+...@findex trace-event
+changes status of a trace event
+ETEXI
+#endif
+
 {
 .name   = "log",
 .args_type  = "items:s",
@@ -2529,6 +2545,15 @@ show roms
 @end table
 ETEXI
 
+#ifdef CONFIG_SIMPLE_TRACE
+STEXI
+...@item info trace
+show contents of trace buffer
+...@item info trace-events
+show available trace events and their state
+ETEXI
+#endif
+
 HXCOMM DO NOT add new commands after 'info', move your addition before it!
 
 STEXI
diff --git a/sim

[Qemu-devel] [PATCH 08/14] trace: Add LTTng Userspace Tracer backend

2010-09-06 Thread Stefan Hajnoczi
This patch adds LTTng Userspace Tracer (UST) backend support.  The UST
system requires no kernel support but libust and liburcu must be
installed.

$ ./configure --trace-backend ust
$ make

Start the UST daemon:
$ ustd &

List available tracepoints and enable some:
$ ustctl --list-markers $(pgrep qemu)
[...]
{PID: 5458, channel/marker: ust/paio_submit, state: 0, fmt: "acb %p
opaque %p sector_num %lu nb_sectors %lu type %lu" 0x4b32ba}
$ ustctl --enable-marker "ust/paio_submit" $(pgrep qemu)

Run the trace:
$ ustctl --create-trace $(pgrep qemu)
$ ustctl --start-trace $(pgrep qemu)
[...]
$ ustctl --stop-trace $(pgrep qemu)
$ ustctl --destroy-trace $(pgrep qemu)

Trace results can be viewed using lttv-gui.

More information about UST:
http://lttng.org/ust

Signed-off-by: Stefan Hajnoczi 

trace: Check for LTTng Userspace Tracer headers

When using the 'ust' backend, check if the relevant headers are
available at host.

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 configure |   20 +-
 tracetool |   85 +++-
 2 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 234e75b..4061cb7 100755
--- a/configure
+++ b/configure
@@ -903,7 +903,7 @@ echo "  --enable-docsenable documentation build"
 echo "  --disable-docs   disable documentation build"
 echo "  --disable-vhost-net  disable vhost-net acceleration support"
 echo "  --enable-vhost-net   enable vhost-net acceleration support"
-echo "  --trace-backend=BTrace backend nop simple"
+echo "  --trace-backend=BTrace backend nop simple ust"
 echo "  --trace-file=NAMEFull PATH,NAME of file to store traces"
 echo "   Default:trace-"
 echo ""
@@ -2080,6 +2080,24 @@ if test "$?" -ne 0 ; then
   exit 1
 fi
 
+##
+# For 'ust' backend, test if ust headers are present
+if test "$trace_backend" = "ust"; then
+  cat > $TMPC << EOF
+#include 
+#include 
+int main(void) { return 0; }
+EOF
+  if compile_prog "" "" ; then
+LIBS="-lust $LIBS"
+  else
+echo
+echo "Error: Trace backend 'ust' missing libust header files"
+echo
+exit 1
+  fi
+fi
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
diff --git a/tracetool b/tracetool
index 8aeac48..b74b630 100755
--- a/tracetool
+++ b/tracetool
@@ -13,12 +13,13 @@ set -f
 usage()
 {
 cat >&2 <"
+ust_clean_namespace
+}
+
+linetoh_ust()
+{
+local name args argnames
+name=$(get_name "$1")
+args=$(get_args "$1")
+argnames=$(get_argnames "$1")
+
+cat <
+$(ust_clean_namespace)
+#include "trace.h"
+EOF
+}
+
+linetoc_ust()
+{
+local name args argnames fmt
+name=$(get_name "$1")
+args=$(get_args "$1")
+argnames=$(get_argnames "$1")
+fmt=$(get_fmt "$1")
+
+cat <

[Qemu-devel] [PATCH 05/14] trace: Specify trace file name

2010-09-06 Thread Stefan Hajnoczi
From: Prerna Saxena 

Allow users to specify a file for trace-outputs at configuration.
Also, allow trace files to be annotated by  so each qemu instance has
unique traces.

The trace file name can be passed as a config option:
--trace-file=/path/to/file
(Default: trace )
At runtime, the pid of the qemu process is appended to the filename so
that mutiple qemu instances do not have overlapping logs.

Eg : trace-1234 for qemu launched with pid 1234.

I have yet to test this on windows. getpid() is used at many places
in code(including vnc.c), so I'm hoping this would be okay too.

Edited-by: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
---
 configure |   12 
 simpletrace.c |   21 +
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 5afb3b5..234e75b 100755
--- a/configure
+++ b/configure
@@ -318,6 +318,7 @@ check_utests="no"
 user_pie="no"
 zero_malloc=""
 trace_backend="nop"
+trace_file="trace"
 
 # OS specific
 if check_define __linux__ ; then
@@ -522,6 +523,8 @@ for opt do
   ;;
   --trace-backend=*) trace_backend="$optarg"
   ;;
+  --trace-file=*) trace_file="$optarg"
+  ;;
   --enable-gprof) gprof="yes"
   ;;
   --static)
@@ -901,6 +904,8 @@ echo "  --disable-docs   disable documentation 
build"
 echo "  --disable-vhost-net  disable vhost-net acceleration support"
 echo "  --enable-vhost-net   enable vhost-net acceleration support"
 echo "  --trace-backend=BTrace backend nop simple"
+echo "  --trace-file=NAMEFull PATH,NAME of file to store traces"
+echo "   Default:trace-"
 echo ""
 echo "NOTE: The object files are built at the place where configure is 
launched"
 exit 1
@@ -2206,6 +2211,7 @@ echo "fdatasync $fdatasync"
 echo "uuid support  $uuid"
 echo "vhost-net support $vhost_net"
 echo "Trace backend $trace_backend"
+echo "Trace output file $trace_file-"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2471,6 +2477,12 @@ echo "TRACE_BACKEND=$trace_backend" >> $config_host_mak
 if test "$trace_backend" = "simple"; then
   echo "CONFIG_SIMPLE_TRACE=y" >> $config_host_mak
 fi
+# Set the appropriate trace file.
+if test "$trace_backend" = "simple"; then
+  trace_file="\"$trace_file-%u\""
+fi
+echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
+
 echo "TOOLS=$tools" >> $config_host_mak
 echo "ROMS=$roms" >> $config_host_mak
 echo "MAKE=$make" >> $config_host_mak
diff --git a/simpletrace.c b/simpletrace.c
index 688959a..97045a6 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -54,13 +54,26 @@ static bool write_header(FILE *fp)
 return fwrite(&header, sizeof header, 1, fp) == 1;
 }
 
+static bool open_trace_file(void)
+{
+char *filename;
+
+if (asprintf(&filename, CONFIG_TRACE_FILE, getpid()) < 0) {
+return false;
+}
+
+trace_fp = fopen(filename, "w");
+free(filename);
+if (!trace_fp) {
+return false;
+}
+return write_header(trace_fp);
+}
+
 static void flush_trace_buffer(void)
 {
 if (!trace_fp) {
-trace_fp = fopen("trace.log", "w");
-if (trace_fp) {
-write_header(trace_fp);
-}
+open_trace_file();
 }
 if (trace_fp) {
 size_t unused; /* for when fwrite(3) is declared warn_unused_result */
-- 
1.7.1




[Qemu-devel] [PATCH 14/14] trace: Trace entry point of balloon request handler

2010-09-06 Thread Stefan Hajnoczi
From: Prerna Saxena 

Signed-off-by: Prerna Saxena 
Signed-off-by: Stefan Hajnoczi 
---
 balloon.c|2 ++
 trace-events |4 
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/balloon.c b/balloon.c
index 8e0b7f1..0021fef 100644
--- a/balloon.c
+++ b/balloon.c
@@ -29,6 +29,7 @@
 #include "cpu-common.h"
 #include "kvm.h"
 #include "balloon.h"
+#include "trace.h"
 
 
 static QEMUBalloonEvent *qemu_balloon_event;
@@ -43,6 +44,7 @@ void qemu_add_balloon_handler(QEMUBalloonEvent *func, void 
*opaque)
 int qemu_balloon(ram_addr_t target, MonitorCompletion cb, void *opaque)
 {
 if (qemu_balloon_event) {
+trace_balloon_event(qemu_balloon_event_opaque, target);
 qemu_balloon_event(qemu_balloon_event_opaque, target, cb, opaque);
 return 1;
 } else {
diff --git a/trace-events b/trace-events
index b2c7f10..c5fa0aa 100644
--- a/trace-events
+++ b/trace-events
@@ -63,3 +63,7 @@ disable paio_submit(void *acb, void *opaque, unsigned long 
sector_num, unsigned
 # ioport.c
 disable cpu_in(unsigned int addr, unsigned int val) "addr %#x value %u"
 disable cpu_out(unsigned int addr, unsigned int val) "addr %#x value %u"
+
+# balloon.c
+# Since requests are raised via monitor, not many tracepoints are needed.
+disable balloon_event(void *opaque, unsigned long addr) "opaque %p addr %lu"
-- 
1.7.1




Re: [Qemu-devel] [PULL 00/10] Block patches

2010-09-06 Thread Kevin Wolf
Am 30.08.2010 18:32, schrieb Kevin Wolf:
> The following changes since commit 02a89b219039621c940863aa5a9da4fec81a1546:
> 
>   isapc: fix segfault. (2010-08-28 08:50:40 +)
> 
> are available in the git repository at:
>   git://repo.or.cz/qemu/kevin.git for-anthony
> 
> Andrew de Quincey (1):
>   posix-aio-compat: Fix async_conmtext for ioctl
> 
> Izumi Tsutsui (1):
>   sheepdog: remove unnecessary includes
> 
> Kevin Wolf (4):
>   virtio: Factor virtqueue_map_sg out
>   virtio-blk: Fix migration of queued requests
>   block: Fix image re-open in bdrv_commit
>   qemu-img rebase: Open new backing file read-only
> 
> Laurent Vivier (1):
>   nbd: Introduce NBD named exports.
> 
> Loïc Minier (1):
>   vvfat: fat_chksum(): fix access above array bounds
> 
> Miguel Di Ciurcio Filho (2):
>   monitor: make 'info snapshots' show only fully available snapshots
>   savevm: Generate a name when run without one

Anthony, I think you already pulled the patches for stable-0.13 that I
sent on the same day, but these patches for master still seem to be missing.

Kevin



[Qemu-devel] [PATCH] raw-posix: improve detection of scsi-generic devices

2010-09-06 Thread Bernhard Kohl
From: Bernhard Kohl 

Allow symbolic links which point to /dev/sgX devices.

Signed-off-by: Bernhard Kohl 
---
 block/raw-posix.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 72fb8ce..4544a57 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -48,6 +48,7 @@
 #endif
 #ifdef __linux__
 #include 
+#include 
 #include 
 #include 
 #endif
@@ -868,8 +869,13 @@ static int hdev_open(BlockDriverState *bs, const char 
*filename, int flags)
 
 s->type = FTYPE_FILE;
 #if defined(__linux__)
-if (strstart(filename, "/dev/sg", NULL)) {
-bs->sg = 1;
+{
+char resolved_path[ MAXPATHLEN ], *temp;
+
+temp = realpath(filename, resolved_path);
+if (temp && strstart(temp, "/dev/sg", NULL)) {
+bs->sg = 1;
+}
 }
 #endif
 
-- 
1.7.2.2




[Qemu-devel] [PATCH] scsi_bus: fix length and xfer_mode for RESERVE and RELEASE commands

2010-09-06 Thread Bernhard Kohl
For the RESERVE and RELEASE commands the length must be zero
and xfer_mode must be SCSI_XFER_NONE.

Signed-off-by: Bernhard Kohl 
---
 hw/scsi-bus.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 7aa0bcd..5a3fd4b 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -208,6 +208,8 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
 case SEEK_6:
 case WRITE_FILEMARKS:
 case SPACE:
+case RESERVE:
+case RELEASE:
 case ERASE:
 case ALLOW_MEDIUM_REMOVAL:
 case VERIFY:
@@ -319,7 +321,6 @@ static void scsi_req_xfer_mode(SCSIRequest *req)
 case WRITE_BUFFER:
 case FORMAT_UNIT:
 case REASSIGN_BLOCKS:
-case RESERVE:
 case SEARCH_EQUAL:
 case SEARCH_HIGH:
 case SEARCH_LOW:
-- 
1.7.2.2




Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 16:21, Luca Tettamanti wrote:

> On Mon, Sep 6, 2010 at 12:25 PM, Alexander Graf  wrote:
>> On 06.09.2010, at 12:04, Stefan Hajnoczi wrote:
>>> +
>>> +const char *bytes_to_str(uint64_t size)
>>> +{
>>> +static char buffer[64];
>>> +
>>> +if (size < (1ULL << 10)) {
>>> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " byte(s)", size);
>>> +} else if (size < (1ULL << 20)) {
>>> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " KB(s)", size >> 10);
>>> +} else if (size < (1ULL << 30)) {
>>> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " MB(s)", size >> 20);
>>> +} else if (size < (1ULL << 40)) {
>>> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " GB(s)", size >> 30);
>>> +} else {
>>> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " TB(s)", size >> 40);
>>> +}
>>> +
>>> +return buffer;
>> 
>> This returns a variable from the stack! Please make the target buffer caller 
>> defined.
> 
> It's static, so it's formally correct. But probably not a good idea :)

Oh - I missed the static there. Yeah, it's even worse. This is racy.

Alex




[Qemu-devel] [PATCH] lsi53c895a: add support for ABORT messages

2010-09-06 Thread Bernhard Kohl
If these messages are not handled correctly the guest driver may hang.

Always mandatory:
- ABORT
- BUS DEVICE RESET

Mandatory if tagged queuing is implemented (which disks usually do):
- ABORT TAG
- CLEAR QUEUE

Signed-off-by: Bernhard Kohl 
---
 hw/lsi53c895a.c |   57 +++
 1 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 5eaf69e..40f2d10 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -846,6 +846,18 @@ static void lsi_do_msgout(LSIState *s)
 {
 uint8_t msg;
 int len;
+uint32_t current_tag;
+SCSIDevice *current_dev;
+lsi_request *p, *p_next;
+int id;
+
+if (s->current) {
+current_tag = s->current->tag;
+} else {
+current_tag = s->select_tag;
+}
+id = (current_tag >> 8) & 0xf;
+current_dev = s->bus.devs[id];
 
 DPRINTF("MSG out len=%d\n", s->dbc);
 while (s->dbc) {
@@ -890,6 +902,51 @@ static void lsi_do_msgout(LSIState *s)
 BADF("ORDERED queue not implemented\n");
 s->select_tag |= lsi_get_msgbyte(s) | LSI_TAG_VALID;
 break;
+case 0x0d:
+/* The ABORT TAG message clears the current I/O process only. */
+DPRINTF("MSG: ABORT TAG tag=0x%x\n", current_tag);
+current_dev->info->cancel_io(current_dev, current_tag);
+lsi_disconnect(s);
+break;
+case 0x06:
+case 0x0e:
+case 0x0c:
+/* The ABORT message clears all I/O processes for the selecting
+   initiator on the specified logical unit of the target. */
+if (msg == 0x06) {
+DPRINTF("MSG: ABORT tag=0x%x\n", current_tag);
+}
+/* The CLEAR QUEUE message clears all I/O processes for all
+   initiators on the specified logical unit of the target. */
+if (msg == 0x0e) {
+DPRINTF("MSG: CLEAR QUEUE tag=0x%x\n", current_tag);
+}
+/* The BUS DEVICE RESET message clears all I/O processes for all
+   initiators on all logical units of the target. */
+if (msg == 0x0c) {
+DPRINTF("MSG: BUS DEVICE RESET tag=0x%x\n", current_tag);
+}
+
+/* clear the current I/O process */
+current_dev->info->cancel_io(current_dev, current_tag);
+
+/* As the current implemented devices scsi_disk and scsi_generic
+   only support one LUN, we don't need to keep track of LUNs.
+   Clearing I/O processes for other initiators could be possible
+   for scsi_generic by sending a SG_SCSI_RESET to the /dev/sgX
+   device, but this is currently not implemented (and seems not
+   to be really necessary). So let's simply clear all queued
+   commands for the current device: */
+id = current_tag & 0xff00;
+QTAILQ_FOREACH_SAFE(p, &s->queue, next, p_next) {
+if ((p->tag & 0xff00) == id) {
+current_dev->info->cancel_io(current_dev, p->tag);
+QTAILQ_REMOVE(&s->queue, p, next);
+}
+}
+
+lsi_disconnect(s);
+break;
 default:
 if ((msg & 0x80) == 0) {
 goto bad;
-- 
1.7.2.2




Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Luca Tettamanti
On Mon, Sep 6, 2010 at 12:25 PM, Alexander Graf  wrote:
> On 06.09.2010, at 12:04, Stefan Hajnoczi wrote:
>> +
>> +const char *bytes_to_str(uint64_t size)
>> +{
>> +    static char buffer[64];
>> +
>> +    if (size < (1ULL << 10)) {
>> +        snprintf(buffer, sizeof(buffer), "%" PRIu64 " byte(s)", size);
>> +    } else if (size < (1ULL << 20)) {
>> +        snprintf(buffer, sizeof(buffer), "%" PRIu64 " KB(s)", size >> 10);
>> +    } else if (size < (1ULL << 30)) {
>> +        snprintf(buffer, sizeof(buffer), "%" PRIu64 " MB(s)", size >> 20);
>> +    } else if (size < (1ULL << 40)) {
>> +        snprintf(buffer, sizeof(buffer), "%" PRIu64 " GB(s)", size >> 30);
>> +    } else {
>> +        snprintf(buffer, sizeof(buffer), "%" PRIu64 " TB(s)", size >> 40);
>> +    }
>> +
>> +    return buffer;
>
> This returns a variable from the stack! Please make the target buffer caller 
> defined.

It's static, so it's formally correct. But probably not a good idea :)

Luca



[Qemu-devel] [rfc 3/3] arm : make initrd load address dynamic

2010-09-06 Thread Daniel Lezcano
Instead of hardcoding a default value for initrd, let's compute
dynamically from the kernel load address and its size.
We go one page after the end of the kernel.

Signed-off-by: Daniel Lezcano 
---
 hw/arm-misc.h |1 +
 hw/arm_boot.c |   19 ---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/hw/arm-misc.h b/hw/arm-misc.h
index 010acb4..e72f87e 100644
--- a/hw/arm-misc.h
+++ b/hw/arm-misc.h
@@ -34,6 +34,7 @@ struct arm_boot_info {
 int (*atag_board)(struct arm_boot_info *info, void *p);
 /* Used internally by arm_boot.c */
 int is_linux;
+target_phys_addr_t initrd_load_addr;
 target_phys_addr_t initrd_size;
 target_phys_addr_t entry;
 };
diff --git a/hw/arm_boot.c b/hw/arm_boot.c
index 16a33af..638ef62 100644
--- a/hw/arm_boot.c
+++ b/hw/arm_boot.c
@@ -15,7 +15,6 @@
 
 #define KERNEL_ARGS_ADDR 0x100
 #define KERNEL_LOAD_ADDR 0x0001
-#define INITRD_LOAD_ADDR 0x0080
 
 /* The worlds second smallest bootloader.  Set r0-r2, then jump to kernel.  */
 static uint32_t bootloader[] = {
@@ -71,7 +70,7 @@ static void set_kernel_args(struct arm_boot_info *info)
 /* ATAG_INITRD2 */
 WRITE_WORD(p, 4);
 WRITE_WORD(p, 0x54420005);
-WRITE_WORD(p, info->loader_start + INITRD_LOAD_ADDR);
+WRITE_WORD(p, info->loader_start + info->initrd_load_addr);
 WRITE_WORD(p, initrd_size);
 }
 if (info->kernel_cmdline && *info->kernel_cmdline) {
@@ -147,7 +146,7 @@ static void set_kernel_args_old(struct arm_boot_info *info)
 WRITE_WORD(p, 0);
 /* initrd_start */
 if (initrd_size)
-WRITE_WORD(p, info->loader_start + INITRD_LOAD_ADDR);
+WRITE_WORD(p, info->loader_start + info->initrd_load_addr);
 else
 WRITE_WORD(p, 0);
 /* initrd_size */
@@ -201,6 +200,7 @@ void arm_load_kernel(CPUState *env, struct arm_boot_info 
*info)
 int is_linux = 0;
 uint64_t elf_entry;
 target_phys_addr_t entry;
+target_phys_addr_t initrd_load_addr = 0x0;
 int big_endian;
 
 /* Load the kernel.  */
@@ -242,16 +242,13 @@ void arm_load_kernel(CPUState *env, struct arm_boot_info 
*info)
 if (is_linux) {
 if (info->initrd_filename) {
 
-if (KERNEL_LOAD_ADDR + kernel_size >= INITRD_LOAD_ADDR) {
-   fprintf(stderr, "qemu: kernel is too big: %d Bytes\n",
-   kernel_size);
-   exit(1);
-   }
-
+   initrd_load_addr = KERNEL_LOAD_ADDR + kernel_size +
+   TARGET_PAGE_SIZE;
+   initrd_load_addr = TARGET_PAGE_ALIGN(initrd_load_addr);
 initrd_size = load_image_targphys(info->initrd_filename,
   info->loader_start
-  + INITRD_LOAD_ADDR,
-  ram_size - INITRD_LOAD_ADDR);
+  + initrd_load_addr,
+  ram_size - initrd_load_addr);
 if (initrd_size < 0) {
 fprintf(stderr, "qemu: could not load initrd '%s'\n",
 info->initrd_filename);
-- 
1.7.0.4






[Qemu-devel] [rfc 2/3] arm : factor out set_kernel_args[_old]

2010-09-06 Thread Daniel Lezcano
'initrd_size' and 'base' are already present in the 'info' structure,
passing them as parameter is redundant with the first parameter.

Signed-off-by: Daniel Lezcano 
---
 hw/arm_boot.c |   26 ++
 1 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/arm_boot.c b/hw/arm_boot.c
index 50ec717..16a33af 100644
--- a/hw/arm_boot.c
+++ b/hw/arm_boot.c
@@ -49,12 +49,12 @@ static uint32_t smpboot[] = {
 p += 4;   \
 } while (0)
 
-static void set_kernel_args(struct arm_boot_info *info,
-int initrd_size, target_phys_addr_t base)
+static void set_kernel_args(struct arm_boot_info *info)
 {
-target_phys_addr_t p;
+target_phys_addr_t base = info->loader_start;
+target_phys_addr_t p = base + KERNEL_ARGS_ADDR;
+int initrd_size = info->initrd_size;
 
-p = base + KERNEL_ARGS_ADDR;
 /* ATAG_CORE */
 WRITE_WORD(p, 5);
 WRITE_WORD(p, 0x54410001);
@@ -102,15 +102,14 @@ static void set_kernel_args(struct arm_boot_info *info,
 WRITE_WORD(p, 0);
 }
 
-static void set_kernel_args_old(struct arm_boot_info *info,
-int initrd_size, target_phys_addr_t base)
+static void set_kernel_args_old(struct arm_boot_info *info)
 {
-target_phys_addr_t p;
+target_phys_addr_t base = info->loader_start;
+/* see linux/include/asm-arm/setup.h */
+target_phys_addr_t p = base + KERNEL_ARGS_ADDR;
+int initrd_size = info->initrd_size;
 const char *s;
 
-
-/* see linux/include/asm-arm/setup.h */
-p = base + KERNEL_ARGS_ADDR;
 /* page_size */
 WRITE_WORD(p, 4096);
 /* nr_pages */
@@ -188,12 +187,7 @@ static void main_cpu_reset(void *opaque)
 env->thumb = info->entry & 1;
 } else {
 env->regs[15] = info->loader_start;
-if (old_param) {
-set_kernel_args_old(info, info->initrd_size,
-info->loader_start);
-} else {
-set_kernel_args(info, info->initrd_size, info->loader_start);
-}
+old_param ? set_kernel_args_old(info): set_kernel_args(info);
 }
 }
 /* TODO:  Reset secondary CPUs.  */
-- 
1.7.0.4






[Qemu-devel] [rfc 0/3] arm : dynamically choose initrd load address

2010-09-06 Thread Daniel Lezcano
Hi all,

after compiling my kernel on the arm architecture I was not able to
start it because qemu was segfaulting or going to an infinite loop.

After google'ing I found on launchpad the bug:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/524893

Following the indications, I rebuilt qemu with an higher initrd load
address and the kernel booted correctly.

I am trying to make the things easier and/or to fail gracefully with
patchset but I am not familiar with the ARM architecture neither qemu
internals, so may be I am totally wrong :)

The first patch raise an error if there is an overlapping error.
But the two next patches makes to compute automatically an address
for initrd to loaded.

Daniel Lezcano (3):
  arm : raise an error if the kernel size will overlap the initrd
  arm : factor out set_kernel_args[_old]
  arm : make initrd load address dynamic

 hw/arm-misc.h |1 +
 hw/arm_boot.c |   40 +++-
 2 files changed, 20 insertions(+), 21 deletions(-)






[Qemu-devel] [rfc 1/3] arm : raise an error if the kernel size will overlap the initrd

2010-09-06 Thread Daniel Lezcano
If the kernel size is too big, it overwrite the initrd image in memory
without detecting the problem. Let't detect this error and exit gracefully.

Signed-off-by: Daniel Lezcano 
---
 hw/arm_boot.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/arm_boot.c b/hw/arm_boot.c
index 620550b..50ec717 100644
--- a/hw/arm_boot.c
+++ b/hw/arm_boot.c
@@ -247,6 +247,13 @@ void arm_load_kernel(CPUState *env, struct arm_boot_info 
*info)
 info->entry = entry;
 if (is_linux) {
 if (info->initrd_filename) {
+
+if (KERNEL_LOAD_ADDR + kernel_size >= INITRD_LOAD_ADDR) {
+   fprintf(stderr, "qemu: kernel is too big: %d Bytes\n",
+   kernel_size);
+   exit(1);
+   }
+
 initrd_size = load_image_targphys(info->initrd_filename,
   info->loader_start
   + INITRD_LOAD_ADDR,
-- 
1.7.0.4






Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Kevin Wolf
Am 06.09.2010 14:57, schrieb Anthony Liguori:
> On 09/06/2010 07:40 AM, Stefan Hajnoczi wrote:
>> On Mon, Sep 6, 2010 at 11:27 AM, Kevin Wolf  wrote:
>>
>>> Am 06.09.2010 12:04, schrieb Stefan Hajnoczi:
>>>  
 QEMU Enhanced Disk format is a disk image format that forgoes features
 found in qcow2 in favor of better levels of performance and data
 integrity.  Due to its simpler on-disk layout, it is possible to safely
 perform metadata updates more efficiently.

 Installations, suspend-to-disk, and other allocation-heavy I/O workloads
 will see increased performance due to fewer I/Os and syncs.  Workloads
 that do not cause new clusters to be allocated will perform similar to
 raw images due to in-memory metadata caching.

 The format supports sparse disk images.  It does not rely on the host
 filesystem holes feature, making it a good choice for sparse disk images
 that need to be transferred over channels where holes are not supported.

 Backing files are supported so only deltas against a base image can be
 stored.

 The file format is extensible so that additional features can be added
 later with graceful compatibility handling.

 Internal snapshots are not supported.  This eliminates the need for
 additional metadata to track copy-on-write clusters.

 Compression and encryption are not supported.  They add complexity and
 can be implemented at other layers in the stack (i.e. inside the guest
 or on the host).

 The format is currently functional with the following features missing:
   * Resizing the disk image.  The capability has been designed in but the
 code has not been written yet.
   * Resetting the image after backing file commit completes.
   * Changing the backing filename.
   * Consistency check (fsck).  This is simple due to the on-disk layout.

 Signed-off-by: Anthony Liguori
 Signed-off-by: Stefan Hajnoczi

>>> Okay, so before I actually look at the patch longer than a couple of
>>> seconds let me just ask the obvious question...
>>>
>>> Before inventing yet another image format, you certainly have checked
>>> the existing ones. Except for not implementing compression and
>>> encryption this looks a lot like qcow1 to me. I see that you even
>>> retained the two-level cluster tables.
>>>
>>> So if we ignore the implementation for a moment and just compare the
>>> formats, what's the crucial difference between qcow1 and qed that I'm
>>> missing? And if it's not qcow1, why not improving our support for
>>> another existing format like VHD?
>>>  
>> Is this a subset of existing on-disk formats?  Yes.  The motivation is
>> to have an image format that performs well and is safe, with backing
>> image support.  Currently no image format in QEMU meets these
>> requirements.
>>
>> Perhaps it is appropriate to use an existing on-disk format.
> 
> If you implement a subset of functionality for an existing on-disk 
> format, I think you damage user's expectations.

I don't really buy that implementing compression/encryption wouldn't
have been possible if it was the only problem. Of course, if you don't
implement it, you can't use an on-disk format that supports them.

> If we claim to support qcow images, then given any old qcow image I have 
> laying around for 5 years ago, I should be able to run it without qemu 
> throwing an error.
> 
> There's some really ugly stuff in qcow.  Nothing is actually aligned.  
> This makes implementing things like O_DIRECT very challenging since you 
> basically have to handle bouncing any possible buffer.  Since the L1 
> table occurs immediately after the header, there's really no room to 
> play any kind of tricks to add features.

That's a good point actually. I didn't remember that.

Kevin



[Qemu-devel] [PATCH] scsi-generic: add missing reset handler

2010-09-06 Thread Bernhard Kohl
Ensure that pending requests of a SCSI generic device are purged on
system reset. This also avoids calling a NULL function in lsi53c895a.
The lsi code was recently changed to call the .qdev.reset function.

Signed-off-by: Bernhard Kohl 
---
 hw/scsi-generic.c |   21 +++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 9538027..7212091 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -455,15 +455,31 @@ static int get_stream_blocksize(BlockDriverState *bdrv)
 return (buf[9] << 16) | (buf[10] << 8) | buf[11];
 }
 
-static void scsi_destroy(SCSIDevice *d)
+static void scsi_generic_purge_requests(SCSIGenericState *s)
 {
-SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, d);
 SCSIGenericReq *r;
 
 while (!QTAILQ_EMPTY(&s->qdev.requests)) {
 r = DO_UPCAST(SCSIGenericReq, req, QTAILQ_FIRST(&s->qdev.requests));
+if (r->req.aiocb) {
+bdrv_aio_cancel(r->req.aiocb);
+}
 scsi_remove_request(r);
 }
+}
+
+static void scsi_generic_reset(DeviceState *dev)
+{
+SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev.qdev, dev);
+
+scsi_generic_purge_requests(s);
+}
+
+static void scsi_destroy(SCSIDevice *d)
+{
+SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, d);
+
+scsi_generic_purge_requests(s);
 blockdev_mark_auto_del(s->qdev.conf.bs);
 }
 
@@ -537,6 +553,7 @@ static SCSIDeviceInfo scsi_generic_info = {
 .qdev.name= "scsi-generic",
 .qdev.desc= "pass through generic scsi device (/dev/sg*)",
 .qdev.size= sizeof(SCSIGenericState),
+.qdev.reset   = scsi_generic_reset,
 .init = scsi_generic_initfn,
 .destroy  = scsi_destroy,
 .send_command = scsi_send_command,
-- 
1.7.2.2




Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Daniel P. Berrange
On Mon, Sep 06, 2010 at 07:52:41AM -0500, Anthony Liguori wrote:
> On 09/06/2010 06:18 AM, Daniel P. Berrange wrote:
> >I agree with ditching compression, but encryption is an important
> >capability which cannot be satisfactorily added at other layers
> >in the stack. While block devices / local filesystems can layer
> >in dm-crypt in the host, this is not possible with network/cluster
> >filesystems which account for a non-trivial target audience.
> 
> ecryptfs should work with NFS these days.  If it still doesn't, it will 
> in the not too distant future.

Assuming it does work with NFS, IIUC, that still requires the user to
have root privileges to setup ecryptfs for the NFS mount in question.
So it takes care of the use case where the host admin doesn't trust
the network/remote fs admin, but doesn't work for the case of local
unprivileged users with NFS home dirs & a host admin who doesnt help.

> >  Adding
> >encryption inside the guest is sub-optimal because you cannot do
> >secure automation of guest startup. Either you require manaual
> >intervention to start every guest to enter the key, or if you
> >hardcode the key, then anyone who can access the guest disk image
> >can start the guest.
> 
> I think this belongs in the VFS level but from a format perspective, an 
> encryption feature would be easy to add.
> 
> >>+
> >>+if ((s->header.compat_features&  QED_CF_BACKING_FORMAT)) {
> >>+ret = qed_read_string(bs->file, s->header.backing_fmt_offset,
> >>+  s->header.backing_fmt_size,
> >>+  bs->backing_format,
> >>+  sizeof(bs->backing_format));
> >>+if (ret<  0) {
> >>+return ret;
> >>+}
> >>+}
> >> 
> >IMHO we should make the backing format compulsory with use of
> >the backing file. The only time probing is required is when
> >initially creating the child image, thereafter there's no
> >benefit to probing again.
> >   
> 
> Stefan originally made it mandatory but I asked to make it optional.
> 
> From a format specification perspective, backing_fmt introduces some 
> problems.  What does a backing_fmt of 'vmdk' mean outside of qemu?

As currently implemented the string refers to a QEMU block driver
which is perhaps not the best choice for a general purpose file
format, if we want this applicable to other non-QEMU apps. Perhaps
it would be better if we explicitly declared backing format as an
enumerated int that represents specific file formats, thus decoupling
it from a specific driver.

Another related idea is perhaps to specify that if backing_fmt is
omitted in the metadata, the backing file must be treated as a QED
format file, rather than probed. Arguably qemu's VMDK driver should
be treating all VMDK backing files as VMDK format rather than probing
since I'm not VMware has no idea of a backing file in qcow or any
other format.

> More importantly, humans to create image formats by hand.  Instead, they 
> use tools like qemu-img.  If you think we should for the specification 
> of a backing file format in qemu-img, that's the place we should do it.

Certainly qemu-img can always add a format, even if the specification
declared it optional, but I think its worth considering declaring it
it compulsory in the spec, to take that variable out of the equation
for apps using the images.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [Qemu-devel] Re: [PATCH 00/14] trace: Add static tracing to QEMU

2010-09-06 Thread Stefan Hajnoczi
On Sun, Sep 5, 2010 at 8:30 AM, Michael S. Tsirkin  wrote:
> On Thu, Aug 12, 2010 at 11:36:21AM +0100, Stefan Hajnoczi wrote:
>> 2. The built-in 'simple' trace backend writes binary traces to
>>    /tmp/trace-.
>
> Saving files with predictable names in /tmp is usually not a good idea,
> see e.g. http://en.wikipedia.org/wiki/Symlink_race.
> Put them in $HOME or something like that.

Let's default to the current working directory (which it already does
on Windows).  If QEMU is run as a daemon or in another environment
where this doesn't make sense, then it needs to be configured
appropriately:
$ ./configure --trace-file=/var/qemu/trace

Stefan



Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 05:04 AM, Stefan Hajnoczi wrote:

QEMU Enhanced Disk format is a disk image format that forgoes features
found in qcow2 in favor of better levels of performance and data
integrity.  Due to its simpler on-disk layout, it is possible to safely
perform metadata updates more efficiently.

Installations, suspend-to-disk, and other allocation-heavy I/O workloads
will see increased performance due to fewer I/Os and syncs.  Workloads
that do not cause new clusters to be allocated will perform similar to
raw images due to in-memory metadata caching.

The format supports sparse disk images.  It does not rely on the host
filesystem holes feature, making it a good choice for sparse disk images
that need to be transferred over channels where holes are not supported.

Backing files are supported so only deltas against a base image can be
stored.

The file format is extensible so that additional features can be added
later with graceful compatibility handling.

Internal snapshots are not supported.  This eliminates the need for
additional metadata to track copy-on-write clusters.

Compression and encryption are not supported.  They add complexity and
can be implemented at other layers in the stack (i.e. inside the guest
or on the host).

The format is currently functional with the following features missing:
  * Resizing the disk image.  The capability has been designed in but the
code has not been written yet.
  * Resetting the image after backing file commit completes.
  * Changing the backing filename.
  * Consistency check (fsck).  This is simple due to the on-disk layout.

Signed-off-by: Anthony Liguori
Signed-off-by: Stefan Hajnoczi
   


Another point worth mentioning is that our intention is to have a formal 
specification of the format before merging.  A start of that is located 
at http://wiki.qemu.org/Features/QED


Regards,

Anthony Liguori



Re: [Qemu-devel] [PATCH 1/5] Suppress some gcc warnings with -Wtype-limits

2010-09-06 Thread Kevin Wolf
Am 04.09.2010 23:07, schrieb Blue Swirl:
> On Sat, Sep 4, 2010 at 8:30 PM, andrzej zaborowski  wrote:
>> Hi,
>>
>> On 4 September 2010 21:45, Blue Swirl  wrote:
>>> On Sat, Sep 4, 2010 at 5:57 PM, andrzej zaborowski  
>>> wrote:
>  -if (event < 0 || event >= BLKDBG_EVENT_MAX) {
>  +if ((int)event < 0 || event >= BLKDBG_EVENT_MAX) {
 Is the behaviour incorrect for some values, or is it not correct C?
 As far as I know it is correct in c99 because of type promotions
 between enum, int and unsigned int.
>>>
>>> The problem is that the type of an enum may be signed or unsigned,
>>> which affects the comparison. For example, the following program
>>> produces different results when it's compiled with -DUNSIGNED and
>>> without:
>>> $ cat enum.c
>>> #include 
>>>
>>> int main(int argc, const char **argv)
>>> {
>>>enum {
>>> #ifdef UNSIGNED
>>>A = 0,
>>> #else
>>>A = -1,
>>> #endif
>>>} a;
>>>
>>>a = atoi(argv[1]);
>>>if (a < 0) {
>>>puts("1: smaller");
>>>} else {
>>>puts("1: not smaller");
>>>}
>>>
>>>if ((int)a < 0) {
>>>puts("2: smaller");
>>>} else {
>>>puts("2: not smaller");
>>>}
>>>
>>>return 0;
>>> }
>>> $ gcc -DUNSIGNED enum.c -o enum-unsigned
>>> $ gcc enum.c -o enum-signed
>>> $ ./enum-signed 0
>>> 1: not smaller
>>> 2: not smaller
>>> $ ./enum-signed -1
>>> 1: smaller
>>> 2: smaller
>>> $ ./enum-unsigned 0
>>> 1: not smaller
>>> 2: not smaller
>>> $ ./enum-unsigned -1
>>> 1: not smaller
>>> 2: smaller
>>
>> Ah, that's a good example, however..
>>
>>>
>>> This is only how GCC uses enums, other compilers have other rules. So
>>> it's better to avoid any assumptions about signedness of enums.
>>>
>>> In this specific case, because the enum does not have any negative
>>> items, it is unsigned if compiled with GCC. Unsigned items cannot be
>>> negative, thus the check is useless.
>>
>> I agree it's useless, but saying that it is wrong is a bit of a
>> stretch in my opinion.  It actually depends on what the intent was --
>> if the intent was to be able to use the value as an array index, then
>> I think the check does the job independent of the signedness of the
>> operands.
> 
> No.
> 
> If BLKDBG_EVENT_MAX is < 0x8000, it does not matter if there is
> the check or not. Because of unsigned arithmetic, only one comparison
> is enough. With a non-GCC compiler (or if there were negative enum
> items), the check may have the desired effect, just like with int
> cast.
> If BLKDBG_EVENT_MAX is >= 0x8000, the first check is wrong,
> because you want to allow array access beyond 0x8000, which will
> be blocked by the first check. A non-GCC compiler may actually reject
> an enum bigger than 0x8000. Then the checks should be rewritten.
> 
> The version with int cast is correct in more cases than the version
> which relies on enum signedness.

If the value isn't explicitly specified, it's defined to start at 0 and
increment by 1 for each enum constant. So it's very unlikely to hit an
interesting case - we would have to add some billions of events first.

And isn't it the int cast that would lead to (event < 0) == true if
BLKDBG_EVENT_MAX was >= 0x800 and falsely return an error? The old
version should do this right.

Anyway, even though this specific code shouldn't misbehave today, I'm
all for silencing warnings and enabling -Wtype-limits. We regularly have
real bugs of this type.

Kevin



Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Stefan Hajnoczi
On Mon, Sep 6, 2010 at 1:57 PM, Anthony Liguori  wrote:
> On 09/06/2010 07:40 AM, Stefan Hajnoczi wrote:
>>
>> On Mon, Sep 6, 2010 at 11:27 AM, Kevin Wolf  wrote:
>>
>>>
>>> Am 06.09.2010 12:04, schrieb Stefan Hajnoczi:
>>>

 QEMU Enhanced Disk format is a disk image format that forgoes features
 found in qcow2 in favor of better levels of performance and data
 integrity.  Due to its simpler on-disk layout, it is possible to safely
 perform metadata updates more efficiently.

 Installations, suspend-to-disk, and other allocation-heavy I/O workloads
 will see increased performance due to fewer I/Os and syncs.  Workloads
 that do not cause new clusters to be allocated will perform similar to
 raw images due to in-memory metadata caching.

 The format supports sparse disk images.  It does not rely on the host
 filesystem holes feature, making it a good choice for sparse disk images
 that need to be transferred over channels where holes are not supported.

 Backing files are supported so only deltas against a base image can be
 stored.

 The file format is extensible so that additional features can be added
 later with graceful compatibility handling.

 Internal snapshots are not supported.  This eliminates the need for
 additional metadata to track copy-on-write clusters.

 Compression and encryption are not supported.  They add complexity and
 can be implemented at other layers in the stack (i.e. inside the guest
 or on the host).

 The format is currently functional with the following features missing:
  * Resizing the disk image.  The capability has been designed in but the
    code has not been written yet.
  * Resetting the image after backing file commit completes.
  * Changing the backing filename.
  * Consistency check (fsck).  This is simple due to the on-disk layout.

 Signed-off-by: Anthony Liguori
 Signed-off-by: Stefan Hajnoczi

>>>
>>> Okay, so before I actually look at the patch longer than a couple of
>>> seconds let me just ask the obvious question...
>>>
>>> Before inventing yet another image format, you certainly have checked
>>> the existing ones. Except for not implementing compression and
>>> encryption this looks a lot like qcow1 to me. I see that you even
>>> retained the two-level cluster tables.
>>>
>>> So if we ignore the implementation for a moment and just compare the
>>> formats, what's the crucial difference between qcow1 and qed that I'm
>>> missing? And if it's not qcow1, why not improving our support for
>>> another existing format like VHD?
>>>
>>
>> Is this a subset of existing on-disk formats?  Yes.  The motivation is
>> to have an image format that performs well and is safe, with backing
>> image support.  Currently no image format in QEMU meets these
>> requirements.
>>
>> Perhaps it is appropriate to use an existing on-disk format.
>
> If you implement a subset of functionality for an existing on-disk format, I
> think you damage user's expectations.
>
> If we claim to support qcow images, then given any old qcow image I have
> laying around for 5 years ago, I should be able to run it without qemu
> throwing an error.
>
> There's some really ugly stuff in qcow.  Nothing is actually aligned.  This
> makes implementing things like O_DIRECT very challenging since you basically
> have to handle bouncing any possible buffer.  Since the L1 table occurs
> immediately after the header, there's really no room to play any kind of
> tricks to add features.

These are the details that are baggage.  Ultimately it may be hard to
deal with them without just bumping the qcow version number and
thereby having a new format anyway.

Stefan



Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 07:40 AM, Stefan Hajnoczi wrote:

On Mon, Sep 6, 2010 at 11:27 AM, Kevin Wolf  wrote:
   

Am 06.09.2010 12:04, schrieb Stefan Hajnoczi:
 

QEMU Enhanced Disk format is a disk image format that forgoes features
found in qcow2 in favor of better levels of performance and data
integrity.  Due to its simpler on-disk layout, it is possible to safely
perform metadata updates more efficiently.

Installations, suspend-to-disk, and other allocation-heavy I/O workloads
will see increased performance due to fewer I/Os and syncs.  Workloads
that do not cause new clusters to be allocated will perform similar to
raw images due to in-memory metadata caching.

The format supports sparse disk images.  It does not rely on the host
filesystem holes feature, making it a good choice for sparse disk images
that need to be transferred over channels where holes are not supported.

Backing files are supported so only deltas against a base image can be
stored.

The file format is extensible so that additional features can be added
later with graceful compatibility handling.

Internal snapshots are not supported.  This eliminates the need for
additional metadata to track copy-on-write clusters.

Compression and encryption are not supported.  They add complexity and
can be implemented at other layers in the stack (i.e. inside the guest
or on the host).

The format is currently functional with the following features missing:
  * Resizing the disk image.  The capability has been designed in but the
code has not been written yet.
  * Resetting the image after backing file commit completes.
  * Changing the backing filename.
  * Consistency check (fsck).  This is simple due to the on-disk layout.

Signed-off-by: Anthony Liguori
Signed-off-by: Stefan Hajnoczi
   

Okay, so before I actually look at the patch longer than a couple of
seconds let me just ask the obvious question...

Before inventing yet another image format, you certainly have checked
the existing ones. Except for not implementing compression and
encryption this looks a lot like qcow1 to me. I see that you even
retained the two-level cluster tables.

So if we ignore the implementation for a moment and just compare the
formats, what's the crucial difference between qcow1 and qed that I'm
missing? And if it's not qcow1, why not improving our support for
another existing format like VHD?
 

Is this a subset of existing on-disk formats?  Yes.  The motivation is
to have an image format that performs well and is safe, with backing
image support.  Currently no image format in QEMU meets these
requirements.

Perhaps it is appropriate to use an existing on-disk format.


If you implement a subset of functionality for an existing on-disk 
format, I think you damage user's expectations.


If we claim to support qcow images, then given any old qcow image I have 
laying around for 5 years ago, I should be able to run it without qemu 
throwing an error.


There's some really ugly stuff in qcow.  Nothing is actually aligned.  
This makes implementing things like O_DIRECT very challenging since you 
basically have to handle bouncing any possible buffer.  Since the L1 
table occurs immediately after the header, there's really no room to 
play any kind of tricks to add features.


Regards,

Anthony Liguori


   I
actually considered in-place migration (compatibility) with qcow2 to
make life easier for users and avoid a new format.  However, there is
baggage to doing this and the focus should be on building a solid
image format instead of fitting into a legacy format that qemu-img
convert can take care of.

Stefan

   





Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 06:18 AM, Daniel P. Berrange wrote:

I agree with ditching compression, but encryption is an important
capability which cannot be satisfactorily added at other layers
in the stack. While block devices / local filesystems can layer
in dm-crypt in the host, this is not possible with network/cluster
filesystems which account for a non-trivial target audience.


ecryptfs should work with NFS these days.  If it still doesn't, it will 
in the not too distant future.



  Adding
encryption inside the guest is sub-optimal because you cannot do
secure automation of guest startup. Either you require manaual
intervention to start every guest to enter the key, or if you
hardcode the key, then anyone who can access the guest disk image
can start the guest.


I think this belongs in the VFS level but from a format perspective, an 
encryption feature would be easy to add.



+
+if ((s->header.compat_features&  QED_CF_BACKING_FORMAT)) {
+ret = qed_read_string(bs->file, s->header.backing_fmt_offset,
+  s->header.backing_fmt_size,
+  bs->backing_format,
+  sizeof(bs->backing_format));
+if (ret<  0) {
+return ret;
+}
+}
 

IMHO we should make the backing format compulsory with use of
the backing file. The only time probing is required is when
initially creating the child image, thereafter there's no
benefit to probing again.
   


Stefan originally made it mandatory but I asked to make it optional.

From a format specification perspective, backing_fmt introduces some 
problems.  What does a backing_fmt of 'vmdk' mean outside of qemu?


More importantly, humans to create image formats by hand.  Instead, they 
use tools like qemu-img.  If you think we should for the specification 
of a backing file format in qemu-img, that's the place we should do it.


Regards,

Anthony Liguori


Regards,
Daniel
   





Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Anthony Liguori

On 09/06/2010 05:27 AM, Kevin Wolf wrote:

Okay, so before I actually look at the patch longer than a couple of
seconds let me just ask the obvious question...

Before inventing yet another image format, you certainly have checked
the existing ones.


Obviously, yes.

Here are the issues:

cow.c: it's cow of an otherwise sparse file.  An important reason for 
implementing a format is the ability to copy (or scp) an image without 
special tools.


qcow2.c: the refcount, cow cluster, and compression make an 
implementation seeking integrity and performance challenging.


vmdk.c: we feel it's important for qemu to have a block format with a 
gpl friendly specification that we have a say in


vhd/vpc.c: same as vmdk with the addition that the OSP is known to not 
be gpl friendly


vdi.c: uses a bitmap instead of a two level table.  An advantage of a 
two level table is that it allows image resize without much fuss.


qcow.c: it lacks extensibility and compression means that there's no 
guarantee that blocks are a fixed size.  This makes it very difficult to 
implement a high performance block format without having two separate 
code paths.



  Except for not implementing compression and
encryption this looks a lot like qcow1 to me. I see that you even
retained the two-level cluster tables.

So if we ignore the implementation for a moment and just compare the
formats, what's the crucial difference between qcow1 and qed that I'm
missing? And if it's not qcow1, why not improving our support for
another existing format like VHD?
   


Block formats are easy to get wrong.  QED is an existence proof that 
given the right constraints, we can build a full asynchronous, high 
performance image format with proper data integrity.


You could get to QED by incrementally improving qcow but you'd have to 
break the format to make it extensible and disable support for 
compression.  But at that point, why not just make a new format since 
you're breaking compatibility.


You would have to fully rewrite the code so what's the point of keeping 
the format?


Regards,

Anthony Liguori


Kevin

   





Re: [Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Stefan Hajnoczi
On Mon, Sep 6, 2010 at 11:27 AM, Kevin Wolf  wrote:
> Am 06.09.2010 12:04, schrieb Stefan Hajnoczi:
>> QEMU Enhanced Disk format is a disk image format that forgoes features
>> found in qcow2 in favor of better levels of performance and data
>> integrity.  Due to its simpler on-disk layout, it is possible to safely
>> perform metadata updates more efficiently.
>>
>> Installations, suspend-to-disk, and other allocation-heavy I/O workloads
>> will see increased performance due to fewer I/Os and syncs.  Workloads
>> that do not cause new clusters to be allocated will perform similar to
>> raw images due to in-memory metadata caching.
>>
>> The format supports sparse disk images.  It does not rely on the host
>> filesystem holes feature, making it a good choice for sparse disk images
>> that need to be transferred over channels where holes are not supported.
>>
>> Backing files are supported so only deltas against a base image can be
>> stored.
>>
>> The file format is extensible so that additional features can be added
>> later with graceful compatibility handling.
>>
>> Internal snapshots are not supported.  This eliminates the need for
>> additional metadata to track copy-on-write clusters.
>>
>> Compression and encryption are not supported.  They add complexity and
>> can be implemented at other layers in the stack (i.e. inside the guest
>> or on the host).
>>
>> The format is currently functional with the following features missing:
>>  * Resizing the disk image.  The capability has been designed in but the
>>    code has not been written yet.
>>  * Resetting the image after backing file commit completes.
>>  * Changing the backing filename.
>>  * Consistency check (fsck).  This is simple due to the on-disk layout.
>>
>> Signed-off-by: Anthony Liguori 
>> Signed-off-by: Stefan Hajnoczi 
>
> Okay, so before I actually look at the patch longer than a couple of
> seconds let me just ask the obvious question...
>
> Before inventing yet another image format, you certainly have checked
> the existing ones. Except for not implementing compression and
> encryption this looks a lot like qcow1 to me. I see that you even
> retained the two-level cluster tables.
>
> So if we ignore the implementation for a moment and just compare the
> formats, what's the crucial difference between qcow1 and qed that I'm
> missing? And if it's not qcow1, why not improving our support for
> another existing format like VHD?

Is this a subset of existing on-disk formats?  Yes.  The motivation is
to have an image format that performs well and is safe, with backing
image support.  Currently no image format in QEMU meets these
requirements.

Perhaps it is appropriate to use an existing on-disk format.  I
actually considered in-place migration (compatibility) with qcow2 to
make life easier for users and avoid a new format.  However, there is
baggage to doing this and the focus should be on building a solid
image format instead of fitting into a legacy format that qemu-img
convert can take care of.

Stefan



[Qemu-devel] [PATCHv2] qemu: e1000 fix TOR math

2010-09-06 Thread Michael S. Tsirkin
Patch b0b900070c7cb29bbefb732ec00397abe5de6d73 made
TOR valuer incorrect: the spec says it should always
include the CRC field.
No one seems to use this field, but better to stick to spec.

Signed-off-by: Michael S. Tsirkin 
---

Changes from v1:
minor refactoring to avoid += within if statement.

 hw/e1000.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 80b78bc..7d7d140 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -345,7 +345,7 @@ is_vlan_txd(uint32_t txd_lower)
 
 /* FCS aka Ethernet CRC-32. We don't get it from backends and can't
  * fill it in, just pad descriptor length by 4 bytes unless guest
- * told us to trip it off the packet. */
+ * told us to strip it off the packet. */
 static inline int
 fcs_len(E1000State *s)
 {
@@ -690,9 +690,14 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, 
size_t size)
 
 s->mac_reg[GPRC]++;
 s->mac_reg[TPR]++;
-n = s->mac_reg[TORL];
-if ((s->mac_reg[TORL] += size) < n)
+/* TOR - Total Octets Received:
+ * This register includes bytes received in a packet from the  field through the  field, inclusively.
+ */
+n = s->mac_reg[TORL] + size + /* Always include FCS length. */ 4;
+if (n < s->mac_reg[TORL])
 s->mac_reg[TORH]++;
+s->mac_reg[TORL] = n;
 
 n = E1000_ICS_RXT0;
 if ((rdt = s->mac_reg[RDT]) < s->mac_reg[RDH])
-- 
1.7.2.rc0.14.g41c1c



[Qemu-devel] [PATCH] hpet: check no_hpet when adding fw_cfg entry.

2010-09-06 Thread Gerd Hoffmann
We should only pass the hpet config entry in case we actually
create a hpet device.

Signed-off-by: Gerd Hoffmann 
---
 hw/pc.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 0c31db1..c96f1fc 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -517,8 +517,10 @@ static void *bochs_bios_init(void)
 fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE, (uint8_t *)&e820_table,
  sizeof(struct e820_table));
 
-fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
- sizeof(struct hpet_fw_config));
+if (!no_hpet) {
+fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
+ sizeof(struct hpet_fw_config));
+}
 /* allocate memory for the NUMA channel: one (64bit) word for the number
  * of nodes, one word for each VCPU->node and one word for each node to
  * hold the amount of memory.
-- 
1.7.1




[Qemu-devel] Re: [PATCH] qemu: e1000 fix TOR math

2010-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2010 at 01:29:27PM +0200, Juan Quintela wrote:
> "Michael S. Tsirkin"  wrote:
> > Patch b0b900070c7cb29bbefb732ec00397abe5de6d73 made
> > TOR valuer incorrect: the spec says it should always
> > include the CRC field, while size does not include CRC now.
> > No one seems to use TOR field (which is likely why
> > current code works fine), but better to stick to spec.
> >
> > Lightly tested with a linux guest.
> >
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/e1000.c |8 ++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/e1000.c b/hw/e1000.c
> > index 80b78bc..eb9faf2 100644
> > --- a/hw/e1000.c
> > +++ b/hw/e1000.c
> > @@ -345,7 +345,7 @@ is_vlan_txd(uint32_t txd_lower)
> >  
> >  /* FCS aka Ethernet CRC-32. We don't get it from backends and can't
> >   * fill it in, just pad descriptor length by 4 bytes unless guest
> > - * told us to trip it off the packet. */
> > + * told us to strip it off the packet. */
> >  static inline int
> >  fcs_len(E1000State *s)
> >  {
> > @@ -690,8 +690,12 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, 
> > size_t size)
> >  
> >  s->mac_reg[GPRC]++;
> >  s->mac_reg[TPR]++;
> > +/* TOR - Total Octets Received:
> > + * This register includes bytes received in a packet from the 
> >  > + * Address> field through the  field, inclusively.
> > + */
> >  n = s->mac_reg[TORL];
> > -if ((s->mac_reg[TORL] += size) < n)
> > +if ((s->mac_reg[TORL] += size + 4 /* Always include FCS
> >  length. */) < n)
> 
> once changing this, can we move the assignation out of the if?
> It was already complex enough, but adding a comment inside an if
> condition looks "too much" to me.
> 
> Later, Juan.

Sure, I'll do that.

> >  s->mac_reg[TORH]++;
> >  
> >  n = E1000_ICS_RXT0;



Re: [Qemu-devel] Re: [PATCH] vl.c: set NULL upon deleting handlers in qemu_set_fd_handler2()

2010-09-06 Thread Yoshiaki Tamura
2010/8/23 Corentin Chary :
> On Mon, Aug 23, 2010 at 2:55 AM, Yoshiaki Tamura
>  wrote:
>> Currently qemu_set_fd_handler2() is only setting ioh->deleted upon
>> deleting.  This may cause a crash when a read handler calls
>> qemu_set_fd_handler2() to delete handlers, but a write handler is
>> still invoked from main_loop_wait().  Because main_loop_wait() checks
>> handlers before calling, setting NULL upon deleting will protect
>> handlers being called if already deleted.
>>
>> One example is the new threaded vnc server.  When an error occurs in
>> the context of a read handler, it'll releases resources and deletes
>> handlers.  However, because the write handler still exists, it'll be
>> called, and then crashes because of lack of resources.  This patch
>> fixes it.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> ---
>>  vl.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index ccc8d57..7ae69ab 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -966,6 +966,8 @@ int qemu_set_fd_handler2(int fd,
>>         QLIST_FOREACH(ioh, &io_handlers, next) {
>>             if (ioh->fd == fd) {
>>                 ioh->deleted = 1;
>> +                ioh->fd_read = NULL;
>> +                ioh->fd_write = NULL;
>>                 break;
>>             }
>>         }
>> --
>> 1.7.1.1
>>
>>
>
> Good catch, thanks,
>
> Reviewed-by: Corentin Chary 

Ping?

>
> --
> Corentin Chary
> http://xf.iksaif.net
>
>



[Qemu-devel] Re: [PATCH] qemu: e1000 fix TOR math

2010-09-06 Thread Juan Quintela
"Michael S. Tsirkin"  wrote:
> Patch b0b900070c7cb29bbefb732ec00397abe5de6d73 made
> TOR valuer incorrect: the spec says it should always
> include the CRC field, while size does not include CRC now.
> No one seems to use TOR field (which is likely why
> current code works fine), but better to stick to spec.
>
> Lightly tested with a linux guest.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/e1000.c |8 ++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 80b78bc..eb9faf2 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -345,7 +345,7 @@ is_vlan_txd(uint32_t txd_lower)
>  
>  /* FCS aka Ethernet CRC-32. We don't get it from backends and can't
>   * fill it in, just pad descriptor length by 4 bytes unless guest
> - * told us to trip it off the packet. */
> + * told us to strip it off the packet. */
>  static inline int
>  fcs_len(E1000State *s)
>  {
> @@ -690,8 +690,12 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, 
> size_t size)
>  
>  s->mac_reg[GPRC]++;
>  s->mac_reg[TPR]++;
> +/* TOR - Total Octets Received:
> + * This register includes bytes received in a packet from the 
>  + * Address> field through the  field, inclusively.
> + */
>  n = s->mac_reg[TORL];
> -if ((s->mac_reg[TORL] += size) < n)
> +if ((s->mac_reg[TORL] += size + 4 /* Always include FCS
>  length. */) < n)

once changing this, can we move the assignation out of the if?
It was already complex enough, but adding a comment inside an if
condition looks "too much" to me.

Later, Juan.

>  s->mac_reg[TORH]++;
>  
>  n = E1000_ICS_RXT0;



Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Daniel P. Berrange
On Mon, Sep 06, 2010 at 11:04:38AM +0100, Stefan Hajnoczi wrote:
> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity.  Due to its simpler on-disk layout, it is possible to safely
> perform metadata updates more efficiently.
> 
> Installations, suspend-to-disk, and other allocation-heavy I/O workloads
> will see increased performance due to fewer I/Os and syncs.  Workloads
> that do not cause new clusters to be allocated will perform similar to
> raw images due to in-memory metadata caching.
> 
> The format supports sparse disk images.  It does not rely on the host
> filesystem holes feature, making it a good choice for sparse disk images
> that need to be transferred over channels where holes are not supported.
> 
> Backing files are supported so only deltas against a base image can be
> stored.
> 
> The file format is extensible so that additional features can be added
> later with graceful compatibility handling.
> 
> Internal snapshots are not supported.  This eliminates the need for
> additional metadata to track copy-on-write clusters.
> 
> Compression and encryption are not supported.  They add complexity and
> can be implemented at other layers in the stack (i.e. inside the guest
> or on the host).

I agree with ditching compression, but encryption is an important
capability which cannot be satisfactorily added at other layers
in the stack. While block devices / local filesystems can layer
in dm-crypt in the host, this is not possible with network/cluster
filesystems which account for a non-trivial target audience. Adding
encryption inside the guest is sub-optimal because you cannot do
secure automation of guest startup. Either you require manaual
intervention to start every guest to enter the key, or if you
hardcode the key, then anyone who can access the guest disk image
can start the guest. The qcow2 encryption is the perfect solution
for this problem, guaranteeing the data security even when the
storage system / network transport offers no security, and allowing
for secure control over guest startup. Further, adding encryptiuon
does not add any serious complexity to the on disk format - just
1 extra header field, nor to the implmenetation - just pass the
data block through a encrypt/decrypt filter, with no extra I/O
paths.

> diff --git a/block/qed-cluster.c b/block/qed-cluster.c
> new file mode 100644
> index 000..6deea27
> --- /dev/null
> +++ b/block/qed-cluster.c
> @@ -0,0 +1,136 @@
> +/*
> + * QEMU Enhanced Disk Format Cluster functions
> + *
> + * Copyright IBM, Corp. 2010
> + *
> + * Authors:
> + *  Stefan Hajnoczi   
> + *  Anthony Liguori   
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qed.h"
> +
> +/**
> + * Count the number of contiguous data clusters
> + *
> + * @s:  QED state
> + * @table:  L2 table
> + * @index:  First cluster index
> + * @n:  Maximum number of clusters
> + * @offset: Set to first cluster offset
> + *
> + * This function scans tables for contiguous allocated or free clusters.
> + */
> +static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
> +  QEDTable *table,
> +  unsigned int index,
> +  unsigned int n,
> +  uint64_t *offset)
> +{
> +unsigned int end = MIN(index + n, s->table_nelems);
> +uint64_t last = table->offsets[index];
> +unsigned int i;
> +
> +*offset = last;
> +
> +for (i = index + 1; i < end; i++) {
> +if (last == 0) {
> +/* Counting free clusters */
> +if (table->offsets[i] != 0) {
> +break;
> +}
> +} else {
> +/* Counting allocated clusters */
> +if (table->offsets[i] != last + s->header.cluster_size) {
> +break;
> +}
> +last = table->offsets[i];
> +}
> +}
> +return i - index;
> +}
> +
> +typedef struct {
> +BDRVQEDState *s;
> +uint64_t pos;
> +size_t len;
> +
> +QEDRequest *request;
> +
> +/* User callback */
> +QEDFindClusterFunc *cb;
> +void *opaque;
> +} QEDFindClusterCB;
> +
> +static void qed_find_cluster_cb(void *opaque, int ret)
> +{
> +QEDFindClusterCB *find_cluster_cb = opaque;
> +BDRVQEDState *s = find_cluster_cb->s;
> +QEDRequest *request = find_cluster_cb->request;
> +uint64_t offset = 0;
> +size_t len = 0;
> +unsigned int index;
> +unsigned int n;
> +
> +if (ret) {
> +ret = QED_CLUSTER_ERROR;
> +goto out;
> +}
> +
> +index = qed_l2_index(s, find_cluster_cb->pos);
> +n = qed_bytes_to_clusters(s,
> +   

Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Stefan Hajnoczi
On Mon, Sep 6, 2010 at 11:25 AM, Alexander Graf  wrote:
> Should be an extra patch - it doesn't hurt to send an RFC patch set. This 
> thing is so big that it's no fun to review :).

I'll start consolidating commits so the next round will be easier to review.

Stefan



[Qemu-devel] Re: [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Kevin Wolf
Am 06.09.2010 12:04, schrieb Stefan Hajnoczi:
> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity.  Due to its simpler on-disk layout, it is possible to safely
> perform metadata updates more efficiently.
> 
> Installations, suspend-to-disk, and other allocation-heavy I/O workloads
> will see increased performance due to fewer I/Os and syncs.  Workloads
> that do not cause new clusters to be allocated will perform similar to
> raw images due to in-memory metadata caching.
> 
> The format supports sparse disk images.  It does not rely on the host
> filesystem holes feature, making it a good choice for sparse disk images
> that need to be transferred over channels where holes are not supported.
> 
> Backing files are supported so only deltas against a base image can be
> stored.
> 
> The file format is extensible so that additional features can be added
> later with graceful compatibility handling.
> 
> Internal snapshots are not supported.  This eliminates the need for
> additional metadata to track copy-on-write clusters.
> 
> Compression and encryption are not supported.  They add complexity and
> can be implemented at other layers in the stack (i.e. inside the guest
> or on the host).
> 
> The format is currently functional with the following features missing:
>  * Resizing the disk image.  The capability has been designed in but the
>code has not been written yet.
>  * Resetting the image after backing file commit completes.
>  * Changing the backing filename.
>  * Consistency check (fsck).  This is simple due to the on-disk layout.
> 
> Signed-off-by: Anthony Liguori 
> Signed-off-by: Stefan Hajnoczi 

Okay, so before I actually look at the patch longer than a couple of
seconds let me just ask the obvious question...

Before inventing yet another image format, you certainly have checked
the existing ones. Except for not implementing compression and
encryption this looks a lot like qcow1 to me. I see that you even
retained the two-level cluster tables.

So if we ignore the implementation for a moment and just compare the
formats, what's the crucial difference between qcow1 and qed that I'm
missing? And if it's not qcow1, why not improving our support for
another existing format like VHD?

Kevin



Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 12:04, Stefan Hajnoczi wrote:

> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity.  Due to its simpler on-disk layout, it is possible to safely
> perform metadata updates more efficiently.
> 
> Installations, suspend-to-disk, and other allocation-heavy I/O workloads
> will see increased performance due to fewer I/Os and syncs.  Workloads
> that do not cause new clusters to be allocated will perform similar to
> raw images due to in-memory metadata caching.
> 
> The format supports sparse disk images.  It does not rely on the host
> filesystem holes feature, making it a good choice for sparse disk images
> that need to be transferred over channels where holes are not supported.
> 
> Backing files are supported so only deltas against a base image can be
> stored.
> 
> The file format is extensible so that additional features can be added
> later with graceful compatibility handling.
> 
> Internal snapshots are not supported.  This eliminates the need for
> additional metadata to track copy-on-write clusters.
> 
> Compression and encryption are not supported.  They add complexity and
> can be implemented at other layers in the stack (i.e. inside the guest
> or on the host).
> 
> The format is currently functional with the following features missing:
> * Resizing the disk image.  The capability has been designed in but the
>   code has not been written yet.
> * Resetting the image after backing file commit completes.
> * Changing the backing filename.
> * Consistency check (fsck).  This is simple due to the on-disk layout.

Yippie - yet another disk format :). Let's hope this one survives.

> 
> Signed-off-by: Anthony Liguori 
> Signed-off-by: Stefan Hajnoczi 
> ---
> This code is also available from git (for development and testing the tracing
> and blkverify features are pulled in, whereas this single squashed patch
> applies to mainline qemu.git):
> 
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/qed
> 

just looked at it and stumbled over two simple nits.

[snip]

> +/**
> + * Get the number of bits for a power of 2
> + *
> + * The following is true for powers of 2:
> + *   n == 1 << get_bits_from_size(n)
> + */
> +int get_bits_from_size(size_t size)
> +{
> +int res = 0;
> +
> +if (size == 0) {
> +return -1;
> +}
> +
> +while (size != 1) {
> +/* Not a power of two */
> +if (size & 1) {
> +return -1;
> +}
> +
> +size >>= 1;
> +res++;
> +}
> +
> +return res;
> +}

Should be an extra patch - it doesn't hurt to send an RFC patch set. This thing 
is so big that it's no fun to review :).

> +
> +const char *bytes_to_str(uint64_t size)
> +{
> +static char buffer[64];
> +
> +if (size < (1ULL << 10)) {
> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " byte(s)", size);
> +} else if (size < (1ULL << 20)) {
> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " KB(s)", size >> 10);
> +} else if (size < (1ULL << 30)) {
> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " MB(s)", size >> 20);
> +} else if (size < (1ULL << 40)) {
> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " GB(s)", size >> 30);
> +} else {
> +snprintf(buffer, sizeof(buffer), "%" PRIu64 " TB(s)", size >> 40);
> +}
> +
> +return buffer;

This returns a variable from the stack! Please make the target buffer caller 
defined.

> +}
> diff --git a/qemu-common.h b/qemu-common.h
> index dfd3dc0..754b107 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -137,6 +137,8 @@ time_t mktimegm(struct tm *tm);
> int qemu_fls(int i);
> int qemu_fdatasync(int fd);
> int fcntl_setfl(int fd, int flag);
> +int get_bits_from_size(size_t size);
> +const char *bytes_to_str(uint64_t size);
> 
> /* path.c */
> void init_paths(const char *prefix);
> @@ -283,6 +285,7 @@ void qemu_iovec_destroy(QEMUIOVector *qiov);
> void qemu_iovec_reset(QEMUIOVector *qiov);
> void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf);
> void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t 
> count);
> +void qemu_iovec_zero(QEMUIOVector *qiov);

separate patch please.


Alex




[Qemu-devel] Re: [PATCH v3] scsi-disk: add some optional scsi commands

2010-09-06 Thread Kevin Wolf
Am 06.09.2010 11:50, schrieb Bernhard Kohl:
> I use a legacy OS which depends on some optional SCSI commands.
> In fact this implementation does nothing special, but provides minimum
> support for the following commands:
> 
> REZERO UNIT
> WRITE AND VERIFY(10)
> WRITE AND VERIFY(12)
> WRITE AND VERIFY(16)
> MODE SELECT(6)
> MODE SELECT(10)
> SEEK(6)
> SEEK(10)
> 
> Signed-off-by: Bernhard Kohl 

Thanks, this one looks good. Applied to the block branch.

Kevin



[Qemu-devel] Re: [PATCH v2] scsi-disk: add some optional scsi commands

2010-09-06 Thread Kevin Wolf
Am 06.09.2010 11:41, schrieb Bernhard Kohl:
> Am 06.09.2010 11:33, schrieb ext Kevin Wolf:
>> Sorry to request another version, but this patch is corrupted by line
>> wraps. I think it was right when you sent the first version, did you
>> change anything?
>>
> 
> No I didn't change anything. I did 'git commit -s --amend' and
> 'git format-patch' and then I pasted it into Thunderbird. What
> email program are you using?
> 
> The first version I sent with 'git send-email'. I'll try that again.

git send-email is what I use for patches, too. It hasn't broken my
patches so far and it's also quite convenient because it works from the
shell where I just ran git format-patch.

Kevin



[Qemu-devel] Re: [PATCH 01/14] RESEND apb: fix typo.

2010-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2010 at 04:46:15PM +0900, Isaku Yamahata wrote:
> fix typo.
> 
> Signed-off-by: Isaku Yamahata 

This is separate from the express patches, right?
I'll appply this. Thanks!

> ---
>  hw/apb_pci.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> index 10a5baa..c619112 100644
> --- a/hw/apb_pci.c
> +++ b/hw/apb_pci.c
> @@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>  /* APB secondary busses */
>  pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
> "pbm-bridge");
> -br = DO_UPCAST(PCIBridge, dev, dev);
> +br = DO_UPCAST(PCIBridge, dev, pci_dev);
>  pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
> pci_apb_map_irq);
>  qdev_init_nofail(&pci_dev->qdev);
> @@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>  
>  pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
> "pbm-bridge");
> -br = DO_UPCAST(PCIBridge, dev, dev);
> +br = DO_UPCAST(PCIBridge, dev, pci_dev);
>  pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
> pci_apb_map_irq);
>  qdev_init_nofail(&pci_dev->qdev);
> @@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
>  .qdev.name = "pbm-bridge",
>  .qdev.size = sizeof(PCIBridge),
>  .qdev.vmsd = &vmstate_pci_device,
> -.qdev.reset = pci_brdige_reset,
> +.qdev.reset = pci_bridge_reset,
>  .init = apb_pci_bridge_initfn,
>  .exit = pci_bridge_exitfn,
>  .config_write = pci_bridge_write_config,
> -- 
> 1.7.1.1



[Qemu-devel] [PATCH v3] scsi-disk: add some optional scsi commands

2010-09-06 Thread Bernhard Kohl
I use a legacy OS which depends on some optional SCSI commands.
In fact this implementation does nothing special, but provides minimum
support for the following commands:

REZERO UNIT
WRITE AND VERIFY(10)
WRITE AND VERIFY(12)
WRITE AND VERIFY(16)
MODE SELECT(6)
MODE SELECT(10)
SEEK(6)
SEEK(10)

Signed-off-by: Bernhard Kohl 
---
 hw/scsi-disk.c |   36 +++-
 1 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index b80c9bd..1446ca6 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -892,6 +892,12 @@ static int scsi_disk_emulate_command(SCSIRequest *req, 
uint8_t *outbuf)
 break;
 case VERIFY:
 break;
+case REZERO_UNIT:
+DPRINTF("Rezero Unit\n");
+if (!bdrv_is_inserted(s->bs)) {
+goto not_ready;
+}
+break;
 default:
 goto illegal_request;
 }
@@ -1011,6 +1017,7 @@ static int32_t scsi_send_command(SCSIDevice *d, uint32_t 
tag,
 case SERVICE_ACTION_IN:
 case REPORT_LUNS:
 case VERIFY:
+case REZERO_UNIT:
 rc = scsi_disk_emulate_command(&r->req, outbuf);
 if (rc > 0) {
 r->iov.iov_len = rc;
@@ -1034,13 +1041,40 @@ static int32_t scsi_send_command(SCSIDevice *d, 
uint32_t tag,
 case WRITE_10:
 case WRITE_12:
 case WRITE_16:
-DPRINTF("Write (sector %" PRId64 ", count %d)\n", lba, len);
+case WRITE_VERIFY:
+case WRITE_VERIFY_12:
+case WRITE_VERIFY_16:
+DPRINTF("Write %s(sector %" PRId64 ", count %d)\n",
+(command & 0xe) == 0xe ? "And Verify " : "", lba, len);
 if (lba > s->max_lba)
 goto illegal_lba;
 r->sector = lba * s->cluster_size;
 r->sector_count = len * s->cluster_size;
 is_write = 1;
 break;
+case MODE_SELECT:
+DPRINTF("Mode Select(6) (len %d)\n", len);
+/* We don't support mode parameter changes.
+   Allow the mode parameter header + block descriptors only. */
+if (len > 12) {
+goto fail;
+}
+break;
+case MODE_SELECT_10:
+DPRINTF("Mode Select(10) (len %d)\n", len);
+/* We don't support mode parameter changes.
+   Allow the mode parameter header + block descriptors only. */
+if (len > 16) {
+goto fail;
+}
+break;
+case SEEK_6:
+case SEEK_10:
+DPRINTF("Seek(%d) (sector %" PRId64 ")\n", command == SEEK_6 ? 6 : 10, 
lba);
+if (lba > s->max_lba) {
+goto illegal_lba;
+}
+break;
 default:
DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
 fail:
-- 
1.7.2.2




[Qemu-devel] Re: [PATCH 07/14] msi: implemented msi.

2010-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2010 at 04:46:21PM +0900, Isaku Yamahata wrote:
> implemented msi support functions.
> 
> Signed-off-by: Isaku Yamahata 


Good stuff. Below I suggest a couple of minor correctness issues,
and a couple of API changes.  Below are also some suggestions to improve
readability a bit: mostly there are too many one-liners used
only once or twice, with names not informative enough to guess what they
do, so I suggest we opencode them as one ends up reading the source
anyway.

Thanks!

> ---
>  Makefile.objs |2 +-
>  hw/msi.c  |  362 
> +
>  hw/msi.h  |   41 +++
>  hw/pci.h  |5 +
>  4 files changed, 409 insertions(+), 1 deletions(-)
>  create mode 100644 hw/msi.c
>  create mode 100644 hw/msi.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 594894b..5f5a4c5 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
>  # PCI watchdog devices
>  hw-obj-y += wdt_i6300esb.o
>  
> -hw-obj-y += msix.o
> +hw-obj-y += msix.o msi.o
>  
>  # PCI network cards
>  hw-obj-y += ne2000.o
> diff --git a/hw/msi.c b/hw/msi.c
> new file mode 100644
> index 000..dbe89ee
> --- /dev/null
> +++ b/hw/msi.c
> @@ -0,0 +1,362 @@
> +/*
> + * msi.c
> + *
> + * Copyright (c) 2010 Isaku Yamahata 
> + *VA Linux Systems Japan K.K.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "msi.h"
> +
> +#define PCI_MSI_PENDING_32  0x10
> +#define PCI_MSI_PENDING_64  0x14
> +

Longer term, pls try to add this to linux, btw.
If you manage to we'll then move this to pci_regs.h

> +/* PCI_MSI_FLAGS */
> +#define PCI_MSI_FLAGS_QSIZE_SHIFT   4
> +#define PCI_MSI_FLAGS_QMASK_SHIFT   1

Use ffs on macros from pci_regs.h :
PCI_MSI_FLAGS_QSIZE_SHIFT ->
(ffs(PCI_MSI_FLAGS_QSIZE) - 1)
as there's a single user, we don't need a macro then.

> +
> +/* PCI_MSI_ADDRESS_LO */
> +#define PCI_MSI_ADDRESS_LO_RESERVED_MASK0x3

Better to have the valid bits than invalid bits:
#define PCI_MSI_ADDRESS_LO_MASK(~0x3)
And then use directly.
again, we can try adding this to linux long term.


> +
> +#define PCI_MSI_32_SIZEOF   0x0a
> +#define PCI_MSI_64_SIZEOF   0x0e
> +#define PCI_MSI_32M_SIZEOF  0x14
> +#define PCI_MSI_64M_SIZEOF  0x18

Note to self: if we get rid of cap allocator, we won't need the above.

> +
> +#define MSI_DEBUG
> +#ifdef MSI_DEBUG
> +# define MSI_DPRINTF(fmt, ...)  \
> +fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
> +#else
> +# define MSI_DPRINTF(fmt, ...)  do { } while (0)
> +#endif
> +#define MSI_DEV_PRINTF(dev, fmt, ...)   \
> +MSI_DPRINTF("%s:%x " fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
> +
> +static inline bool msi_test_bit(uint32_t bitmap, uint8_t bit)
> +{
> +return bitmap & (1 << bit);
> +}
> +
> +static inline void msi_set_bit(uint32_t *bitmap, uint8_t bit)
> +{
> +*bitmap |= (1 << bit);
> +}
> +
> +static inline void msi_clear_bit(uint32_t *bitmap, uint8_t bit)
> +{
> +*bitmap &= ~(1 << bit);
> +}


The above 3 are much better open-coded - all they do is simple math.
Also, I think we should use 1U << - the above is signed and so it
will cause integer overflow when bit is 31.

> +
> +

Two empty lines -> one

> +/* multiple vector only suport power of 2 and up to 32 */
> +static inline uint8_t ilog2(uint8_t nr_vector)
> +{
> +switch (nr_vector) {
> +case 1:
> +return 0;
> +case 2:
> +return 1;
> +case 4:
> +return 2;
> +case 8:
> +return 3;
> +case 16:
> +return 4;
> +case 32:
> +return 5;
> +default:
> +abort();

Not intuitive to abort in a mathematical function IMO.

> +break;
> +}
> +return 0;
> +}
> +

Please just use ffs opencoded, there's a single user.
ilog -> ffs(n) - 1;
and do this after range-checking as suggested below.

> +static inline bool is_mask_bit_support(uint16_t control)
> +{
> +return control & PCI_MSI_FLAGS_MASKBIT;
> +}
> +
> +static inline bool is_64bit_address(uint16_t control)
> +{
> +return control & PCI_MSI_FLAGS_64BIT;
> +}
> +

These 2 function names are especially confusing.
The above 2 are better open-coded.

> +static inline uint

[Qemu-devel] Re: [PATCH v2] scsi-disk: add some optional scsi commands

2010-09-06 Thread Bernhard Kohl

Am 06.09.2010 11:33, schrieb ext Kevin Wolf:

Sorry to request another version, but this patch is corrupted by line
wraps. I think it was right when you sent the first version, did you
change anything?
   


No I didn't change anything. I did 'git commit -s --amend' and
'git format-patch' and then I pasted it into Thunderbird. What
email program are you using?

The first version I sent with 'git send-email'. I'll try that again.

Bernhard




Re: [Qemu-devel] Re: Unmaintained QEMU builds

2010-09-06 Thread Corentin Chary
On Mon, Sep 6, 2010 at 10:59 AM, Paolo Bonzini  wrote:

> On 09/05/2010 06:05 PM, Avi Kivity wrote:
>
>> I'm perfectly fine with dropping it. btw, there are other features in
>> qemu that seem to be academic exercises - *-user for example. What is it
>> useful for? Most open source stuff is multiplatform, and serious
>> commercial work needs something faster than tcg.
>>
>
> *-user is useful to GCC developers (and used by those who work on the ARM
> backend to run the GCC testsuite).
>
> Paolo
>
>
>
Isn't that also used by scratchbox ?

-- 
Corentin Chary
http://xf.iksaif.net


[Qemu-devel] Re: [PATCH v2] scsi-disk: add some optional scsi commands

2010-09-06 Thread Kevin Wolf
Am 06.09.2010 11:25, schrieb Bernhard Kohl:
> I use a legacy OS which depends on some optional SCSI commands.
> In fact this implementation does nothing special, but provides minimum
> support for the following commands:
> 
> REZERO UNIT
> WRITE AND VERIFY(10)
> WRITE AND VERIFY(12)
> WRITE AND VERIFY(16)
> MODE SELECT(6)
> MODE SELECT(10)
> SEEK(6)
> SEEK(10)
> 
> Signed-off-by: Bernhard Kohl 
> ---
>   hw/scsi-disk.c |   36 +++-
>   1 files changed, 35 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
> index b80c9bd..1446ca6 100644
> --- a/hw/scsi-disk.c
> +++ b/hw/scsi-disk.c
> @@ -892,6 +892,12 @@ static int scsi_disk_emulate_command(SCSIRequest 
> *req, uint8_t *outbuf)

Sorry to request another version, but this patch is corrupted by line
wraps. I think it was right when you sent the first version, did you
change anything?

Kevin



Re: [Qemu-devel] [PATCH] lsi53c895a: avoid calling a NULL function

2010-09-06 Thread Bernhard Kohl

Am 02.09.2010 16:45, schrieb ext Kevin Wolf:

I suspect the real fix is to add a reset handler to scsi-generic. It
should probably remove the requests like it's currently done in
scsi_destroy. The same change was made for scsi-disk in e9447f35.
   


Yes, I agree. Let's skip this patch. I'll send another one.

Bernhard




Re: [Qemu-devel] PCIe Qemu changes for Q35

2010-09-06 Thread Isaku Yamahata
On Sun, Sep 05, 2010 at 08:09:18PM -0700, Adhyas Avasthi wrote:
> Hi Isaku
> 
> I believe you were working on the Q35 chipset and PCIe emulation for
> the same, and planned to check your code in to the main git. I can use
> some of that work for something I am working on.
> I have not been too active on the mailing list, and might have missed
> your reviews if you have sent already. In that case, please disregard
> my email (and let me know privately).
> 
> Otherwise, If you can share your plan of sending the review and
> checking the code, it would give me some idea of when I can expect the
> PCIe changes in the git. I am mostly interested in your PCIe code (and
> not Q35) for now.

The following slide would help.(Maybe you've already check it though.)
http://www.linux-kvm.org/wiki/images/2/2e/2010-forum-Pci-express-in-qemu.pdf

My plan is to get the patches merged for 0.14 release.
As I sent out the patches of pcie port switch today,
the major modification to the pci layer has been sent out.
They will be merged into the main tree soon, I hope.
(Or mst's pci tree at worst)

It depends on other's review how smoothly the patches will be
merged. So If you want to accelerate the merge process,
please review the patches. It will be greatly appreciated.

Once pcie stuff is done, I'll start to post q35 chipset patches.
Right now my public repo for q35 is a bit old because I'd like
to avoid unnecessary rebases.
Once the pcie stuff are merged(i.e. pci layer is stabilized),
I'll rebase it and publish it.

thanks,

> My apologies if I am sounding pushy, I just want to see if your plan
> falls within my requirement schedule and I can use your code.
> 
> Awaiting your response.
> 
> Thanks,
> Adhyas
> 
> 
> Two types have compatible type if their types are the same.
> ? ? ? ANSI C Standard, 3.1.2.6.
> 
> 

-- 
yamahata



[Qemu-devel] [PATCH v2] scsi-disk: add some optional scsi commands

2010-09-06 Thread Bernhard Kohl

I use a legacy OS which depends on some optional SCSI commands.
In fact this implementation does nothing special, but provides minimum
support for the following commands:

REZERO UNIT
WRITE AND VERIFY(10)
WRITE AND VERIFY(12)
WRITE AND VERIFY(16)
MODE SELECT(6)
MODE SELECT(10)
SEEK(6)
SEEK(10)

Signed-off-by: Bernhard Kohl 
---
 hw/scsi-disk.c |   36 +++-
 1 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index b80c9bd..1446ca6 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -892,6 +892,12 @@ static int scsi_disk_emulate_command(SCSIRequest 
*req, uint8_t *outbuf)

 break;
 case VERIFY:
 break;
+case REZERO_UNIT:
+DPRINTF("Rezero Unit\n");
+if (!bdrv_is_inserted(s->bs)) {
+goto not_ready;
+}
+break;
 default:
 goto illegal_request;
 }
@@ -1011,6 +1017,7 @@ static int32_t scsi_send_command(SCSIDevice *d, 
uint32_t tag,

 case SERVICE_ACTION_IN:
 case REPORT_LUNS:
 case VERIFY:
+case REZERO_UNIT:
 rc = scsi_disk_emulate_command(&r->req, outbuf);
 if (rc > 0) {
 r->iov.iov_len = rc;
@@ -1034,13 +1041,40 @@ static int32_t scsi_send_command(SCSIDevice *d, 
uint32_t tag,

 case WRITE_10:
 case WRITE_12:
 case WRITE_16:
-DPRINTF("Write (sector %" PRId64 ", count %d)\n", lba, len);
+case WRITE_VERIFY:
+case WRITE_VERIFY_12:
+case WRITE_VERIFY_16:
+DPRINTF("Write %s(sector %" PRId64 ", count %d)\n",
+(command & 0xe) == 0xe ? "And Verify " : "", lba, len);
 if (lba > s->max_lba)
 goto illegal_lba;
 r->sector = lba * s->cluster_size;
 r->sector_count = len * s->cluster_size;
 is_write = 1;
 break;
+case MODE_SELECT:
+DPRINTF("Mode Select(6) (len %d)\n", len);
+/* We don't support mode parameter changes.
+   Allow the mode parameter header + block descriptors only. */
+if (len > 12) {
+goto fail;
+}
+break;
+case MODE_SELECT_10:
+DPRINTF("Mode Select(10) (len %d)\n", len);
+/* We don't support mode parameter changes.
+   Allow the mode parameter header + block descriptors only. */
+if (len > 16) {
+goto fail;
+}
+break;
+case SEEK_6:
+case SEEK_10:
+DPRINTF("Seek(%d) (sector %" PRId64 ")\n", command == SEEK_6 ? 
6 : 10, lba);

+if (lba > s->max_lba) {
+goto illegal_lba;
+}
+break;
 default:
 DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
 fail:
--
1.7.2.2




[Qemu-devel] Re: [PATCH] scsi-disk: add some optional scsi commands

2010-09-06 Thread Bernhard Kohl

Am 03.09.2010 17:41, schrieb ext Kevin Wolf:

Am 02.09.2010 15:11, schrieb Bernhard Kohl:
   

I use a legacy OS which depends on some optional SCSI commands.
In fact this implementation does nothing special, but provides minimum
support for the following commands:

REZERO UNIT
WRITE AND VERIFY(10)
WRITE AND VERIFY(12)
WRITE AND VERIFY(16)
MODE SELECT(6)
MODE SELECT(10)
SEEK(6)
SEEK(10)
 

You forgot the SoB line on this one. Looks good otherwise.

Kevin
   


Sorry for that. I'll resend the patch.

Bernhard





[Qemu-devel] Re: Unmaintained QEMU builds

2010-09-06 Thread Paolo Bonzini

On 09/05/2010 06:05 PM, Avi Kivity wrote:

I'm perfectly fine with dropping it. btw, there are other features in
qemu that seem to be academic exercises - *-user for example. What is it
useful for? Most open source stuff is multiplatform, and serious
commercial work needs something faster than tcg.


*-user is useful to GCC developers (and used by those who work on the 
ARM backend to run the GCC testsuite).


Paolo



[Qemu-devel] Re: [PATCH 02/14] pci: consolidate pci_add_capability_at_offset() into pci_add_capability().

2010-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2010 at 04:46:16PM +0900, Isaku Yamahata wrote:
> By making pci_add_capability() the special case of
> pci_add_capability_at_offset() of offset = 0,
> consolidate pci_add_capability_at_offset() into pci_add_capability().
> 
> Cc: Stefan Weil 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Isaku Yamahata 

Good stuff.

Here's an idea: Long term I think we should just give up on the ability
to allocate capabilities dynamically, and always pass a valid offset.
In particular, this would help us get rid of the used mask, and we won't
have to pass in capability size.

virtio would then have
#define VIRTIO_PCI_MSIX_CAP_OFF 0x40
and msix.h would be changed to get the offset.

The reason is that we need to preserve the offsets for migration
to work, anyway.

Have said all this, this is just an idea, it is not a must
and can be a separate patch, or I might do this change myself.

> ---
>  hw/eepro100.c |4 ++--
>  hw/msix.c |3 ++-
>  hw/pci.c  |   33 ++---
>  hw/pci.h  |5 ++---
>  4 files changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/eepro100.c b/hw/eepro100.c
> index 2b75c8f..8cbc3aa 100644
> --- a/hw/eepro100.c
> +++ b/hw/eepro100.c
> @@ -539,8 +539,8 @@ static void e100_pci_reset(EEPRO100State * s, 
> E100PCIDeviceInfo *e100_device)
>  if (e100_device->power_management) {
>  /* Power Management Capabilities */
>  int cfg_offset = 0xdc;
> -int r = pci_add_capability_at_offset(&s->dev, PCI_CAP_ID_PM,
> - cfg_offset, PCI_PM_SIZEOF);
> +int r = pci_add_capability(&s->dev, PCI_CAP_ID_PM,
> +   cfg_offset, PCI_PM_SIZEOF);
>  assert(r >= 0);
>  pci_set_word(pci_conf + cfg_offset + PCI_PM_PMC, 0x7e21);
>  #if 0 /* TODO: replace dummy code for power management emulation. */
> diff --git a/hw/msix.c b/hw/msix.c
> index d99403a..7ce63eb 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -73,7 +73,8 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
> short nentries,
>  }
>  
>  pdev->msix_bar_size = new_size;
> -config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 
> MSIX_CAP_LENGTH);
> +config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> +   0, MSIX_CAP_LENGTH);
>  if (config_offset < 0)
>  return config_offset;
>  config = pdev->config + config_offset;
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..754ffb3 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1682,11 +1682,25 @@ static void pci_del_option_rom(PCIDevice *pdev)
>  pdev->rom_offset = 0;
>  }
>  
> -/* Reserve space and add capability to the linked list in pci config space */
> -int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
> - uint8_t offset, uint8_t size)
> +/*
> + * if !offset
> + * Reserve space and add capability to the linked list in pci config space
> + *
> + * if offset = 0,
> + * Find and reserve space and add capability to the linked list
> + * in pci config space */
> +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> +   uint8_t offset, uint8_t size)
>  {
> -uint8_t *config = pdev->config + offset;
> +uint8_t *config;
> +if (!offset) {
> +offset = pci_find_space(pdev, size);
> +if (!offset) {
> +return -ENOSPC;
> +}
> +}
> +
> +config = pdev->config + offset;
>  config[PCI_CAP_LIST_ID] = cap_id;
>  config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>  pdev->config[PCI_CAPABILITY_LIST] = offset;
> @@ -1699,17 +1713,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
> uint8_t cap_id,
>  return offset;
>  }
>  
> -/* Find and reserve space and add capability to the linked list
> - * in pci config space */
> -int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> -{
> -uint8_t offset = pci_find_space(pdev, size);
> -if (!offset) {
> -return -ENOSPC;
> -}
> -return pci_add_capability_at_offset(pdev, cap_id, offset, size);
> -}
> -
>  /* Unlink capability from the pci config space. */
>  void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
>  {
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..2ddba59 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -183,9 +183,8 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
>  pcibus_t size, int type,
>  PCIMapIORegionFunc *map_func);
>  
> -int pci_add_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
> -int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id,
> - uint8_t cap_offset, uint8_t cap_size);
> +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> +   uint8_t offset, uint8_t size);
>  
>  void pci_del_capability(PCIDevice *pci_dev, uint

[Qemu-devel] [PATCH 09/14] pcie port: define struct PCIEPort/PCIESlot and helper functions

2010-09-06 Thread Isaku Yamahata
define struct PCIEPort which represents common part
of pci express port.(root, upstream and downstream.)
add a helper function for pcie port which can be used commonly by
root/upstream/downstream port.
define struct PCIESlot which represents common part of
pcie slot.(root and downstream.) and helper functions for it.
helper functions for chassis, slot -> PCIESlot conversion.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs  |2 +-
 hw/pcie_port.c |  106 
 hw/pcie_port.h |   51 +++
 qemu-common.h  |2 +
 4 files changed, 160 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h

diff --git a/Makefile.objs b/Makefile.objs
index eeb5134..c73d12b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o
+hw-obj-y += pcie.o pcie_port.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pcie_port.c b/hw/pcie_port.c
new file mode 100644
index 000..d2b1f38
--- /dev/null
+++ b/hw/pcie_port.c
@@ -0,0 +1,106 @@
+/*
+ * pcie_port.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "pcie_port.h"
+
+void pcie_port_init_reg(PCIDevice *d)
+{
+/* Unlike pci bridge,
+   66MHz and fast back to back don't apply to pci express port. */
+pci_set_word(d->config + PCI_STATUS, 0);
+pci_set_word(d->config + PCI_SEC_STATUS, 0);
+}
+
+/**
+ * (chassis number, pcie physical slot number) -> pcie slot conversion
+ */
+struct PCIEChassis {
+uint8_t number;
+
+QLIST_HEAD(, PCIESlot) slots;
+QLIST_ENTRY(PCIEChassis) next;
+};
+
+QLIST_HEAD(, PCIEChassis) chassis = QLIST_HEAD_INITIALIZER(chassis);
+
+static struct PCIEChassis *pcie_chassis_find(uint8_t chassis_number)
+{
+struct PCIEChassis *c;
+QLIST_FOREACH(c, &chassis, next) {
+if (c->number == chassis_number) {
+break;
+}
+}
+return c;
+}
+
+void pcie_chassis_create(uint8_t chassis_number)
+{
+struct PCIEChassis *c;
+c = pcie_chassis_find(chassis_number);
+if (c) {
+return;
+}
+c = qemu_mallocz(sizeof(*c));
+c->number = chassis_number;
+QLIST_INIT(&c->slots);
+QLIST_INSERT_HEAD(&chassis, c, next);
+}
+
+static PCIESlot *pcie_chassis_find_slot_with_chassis(struct PCIEChassis *c,
+ uint8_t slot)
+{
+PCIESlot *s;
+QLIST_FOREACH(s, &c->slots, next) {
+if (s->slot == slot) {
+break;
+}
+}
+return s;
+}
+
+PCIESlot *pcie_chassis_find_slot(uint8_t chassis_number, uint16_t slot)
+{
+struct PCIEChassis *c;
+c = pcie_chassis_find(chassis_number);
+if (!c) {
+return NULL;
+}
+return pcie_chassis_find_slot_with_chassis(c, slot);
+}
+
+int pcie_chassis_add_slot(struct PCIESlot *slot)
+{
+struct PCIEChassis *c;
+c = pcie_chassis_find(slot->chassis);
+if (!c) {
+return -ENODEV;
+}
+if (pcie_chassis_find_slot_with_chassis(c, slot->slot)) {
+return -EBUSY;
+}
+QLIST_INSERT_HEAD(&c->slots, slot, next);
+return 0;
+}
+
+void pcie_chassis_del_slot(PCIESlot *s)
+{
+QLIST_REMOVE(s, next);
+}
diff --git a/hw/pcie_port.h b/hw/pcie_port.h
new file mode 100644
index 000..3709583
--- /dev/null
+++ b/hw/pcie_port.h
@@ -0,0 +1,51 @@
+/*
+ * pcie_port.h
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see 

[Qemu-devel] [PATCH 12/14] pcie downstream port: pci express switch downstream port.

2010-09-06 Thread Isaku Yamahata
pcie switch downstream port.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs|1 +
 hw/pcie_downstream.c |  224 ++
 hw/pcie_downstream.h |   33 
 3 files changed, 258 insertions(+), 0 deletions(-)
 create mode 100644 hw/pcie_downstream.c
 create mode 100644 hw/pcie_downstream.h

diff --git a/Makefile.objs b/Makefile.objs
index e7a08dc..6904b02 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -140,6 +140,7 @@ hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
+hw-obj-y += pcie_downstream.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
diff --git a/hw/pcie_downstream.c b/hw/pcie_downstream.c
new file mode 100644
index 000..a04efc9
--- /dev/null
+++ b/hw/pcie_downstream.c
@@ -0,0 +1,224 @@
+/*
+ * pcie_downstream.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_downstream.h"
+
+/* For now, TI XIO3130 is borrowed. need to get its own id? */
+#define PCI_DEVICE_ID_TI_XIO3130D   0x8233  /* downstream port */
+#define XIO3130_REVISION0x1
+#define XIO3130_MSI_OFFSET  0x70
+#define XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT
+#define XIO3130_MSI_NR_VECTOR   1
+#define XIO3130_SSVID_OFFSET0x80
+#define XIO3130_SSVID_SVID  0
+#define XIO3130_SSVID_SSID  0
+#define XIO3130_EXP_OFFSET  0x90
+#define XIO3130_AER_OFFSET  0x100
+
+#define PCIE_DOWNSTREAM_VID PCI_VENDOR_ID_TI
+#define PCIE_DOWNSTREAM_DID PCI_DEVICE_ID_TI_XIO3130D
+#define PCIE_DOWNSTREAM_REVISIONXIO3130_REVISION
+#define PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS XIO3130_MSI_SUPPORTED_FLAGS
+#define PCIE_DOWNSTREAM_MSI_NR_VECTOR   XIO3130_MSI_NR_VECTOR
+#define PCIE_DOWNSTREAM_MSI_OFFSET  XIO3130_MSI_OFFSET
+#define PCIE_DOWNSTREAM_SSVID_OFFSETXIO3130_SSVID_OFFSET
+#define PCIE_DOWNSTREAM_SVIDXIO3130_SSVID_SVID
+#define PCIE_DOWNSTREAM_SSIDXIO3130_SSVID_SSID
+#define PCIE_DOWNSTREAM_EXP_OFFSET  XIO3130_EXP_OFFSET
+#define PCIE_DOWNSTREAM_AER_OFFSET  XIO3130_AER_OFFSET
+
+static void pcie_downstream_write_config(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+uint16_t sltctl =
+pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_cap_deverr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, address, val, len, sltctl);
+msi_write_config(d, address, val, len);
+pcie_aer_write_config_vbridge(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+}
+
+static void pcie_downstream_reset(DeviceState *qdev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+msi_reset(d);
+pcie_cap_deverr_reset(d);
+pcie_cap_slot_reset(d);
+pcie_cap_ari_reset(d);
+pci_bridge_reset(qdev);
+}
+
+static void pcie_downstream_flr(PCIDevice *d)
+{
+/* TODO: not enabled until qdev reset clean up
+   waiting for Anthony's qdev cealn up */
+#if 0
+/* So far, sticky bit registers or register which must be preserved
+   over FLR aren't emulated. So just reset this device. */
+pci_device_reset(d);
+#endif
+}
+
+static int pcie_downstream_initfn(PCIDevice *d)
+{
+PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
+int rc;
+
+rc = pci_bridge_initfn(d);
+if (rc < 0) {
+return rc;
+}
+
+pcie_port_init_reg(d);
+pci_config_set_vendor_id(d->config, PCIE_DOWNSTREAM_VID);
+pci_config_set_device_id(d->config, PCIE_DOWNSTREAM_DID);
+d->config[PCI_REVISION_ID] = PCIE_DOWNSTREAM_REVISION;
+
+rc = msi_init(d, PCIE_DOWNSTREAM_MSI_OFFSET, PCIE_DOWNSTREAM_MSI_NR_VECTOR,
+  PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS);
+if (rc < 0) {
+return rc;
+}
+rc = pci_bridge_ssvid_init(d, PCIE_DOWNSTREAM_SS

[Qemu-devel] [PATCH 13/14] pcie/hotplug: glue pushing attention button command. pcie_abp

2010-09-06 Thread Isaku Yamahata
glue to pcie_abp monitor command.

Signed-off-by: Isaku Yamahata 
---
 hw/pcie_port.c  |   82 +++
 qemu-monitor.hx |   14 +
 sysemu.h|4 +++
 3 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/hw/pcie_port.c b/hw/pcie_port.c
index d2b1f38..0aa8c53 100644
--- a/hw/pcie_port.c
+++ b/hw/pcie_port.c
@@ -18,6 +18,10 @@
  * with this program; if not, see .
  */
 
+#include "qemu-objects.h"
+#include "sysemu.h"
+#include "monitor.h"
+#include "pcie.h"
 #include "pcie_port.h"
 
 void pcie_port_init_reg(PCIDevice *d)
@@ -104,3 +108,81 @@ void pcie_chassis_del_slot(PCIESlot *s)
 {
 QLIST_REMOVE(s, next);
 }
+
+/**
+ * glue for qemu monitor
+ */
+
+/* Parse [.], return -1 on error */
+static int pcie_parse_slot_addr(const char* slot_addr,
+uint8_t *chassisp, uint16_t *slotp)
+{
+const char *p;
+char *e;
+unsigned long val;
+unsigned long chassis = 0;
+unsigned long slot;
+
+p = slot_addr;
+val = strtoul(p, &e, 0);
+if (e == p) {
+return -1;
+}
+if (*e == '.') {
+chassis = val;
+p = e + 1;
+val = strtoul(p, &e, 0);
+if (e == p) {
+return -1;
+}
+}
+slot = val;
+
+if (*e) {
+return -1;
+}
+
+if (chassis > 0xff || slot > 0x) {
+return -1;
+}
+
+*chassisp = chassis;
+*slotp = slot;
+return 0;
+}
+
+void pcie_attention_button_push_print(Monitor *mon, const QObject *data)
+{
+QDict *qdict;
+
+assert(qobject_type(data) == QTYPE_QDICT);
+qdict = qobject_to_qdict(data);
+
+monitor_printf(mon, "OK chassis %d, slot %d\n",
+   (int) qdict_get_int(qdict, "chassis"),
+   (int) qdict_get_int(qdict, "slot"));
+}
+
+int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
+   QObject **ret_data)
+{
+const char* pcie_slot = qdict_get_str(qdict, "pcie_slot");
+uint8_t chassis;
+uint16_t slot;
+PCIESlot *s;
+
+if (pcie_parse_slot_addr(pcie_slot, &chassis, &slot) < 0) {
+monitor_printf(mon, "invalid pcie slot address %s\n", pcie_slot);
+return -1;
+}
+s = pcie_chassis_find_slot(chassis, slot);
+if (!s) {
+monitor_printf(mon, "slot is not found. %s\n", pcie_slot);
+return -1;
+}
+pcie_cap_slot_push_attention_button(&s->port.br.dev);
+*ret_data = qobject_from_jsonf("{ 'chassis': %d, 'slot': %d}",
+   chassis, slot);
+assert(*ret_data);
+return 0;
+}
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 2af3de6..02fbda1 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -1154,6 +1154,20 @@ Hot remove PCI device.
 ETEXI
 
 {
+.name   = "pcie_abp",
+.args_type  = "pcie_slot:s",
+.params = "[.]",
+.help   = "push pci express attention button",
+.user_print  = pcie_attention_button_push_print,
+.mhandler.cmd_new = pcie_attention_button_push,
+},
+
+STEXI
+...@item pcie_abp
+Push PCI express attention button
+ETEXI
+
+{
 .name   = "host_net_add",
 .args_type  = "device:s,opts:s?",
 .params = "tap|user|socket|vde|dump [options]",
diff --git a/sysemu.h b/sysemu.h
index 9c988bb..cca411d 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -150,6 +150,10 @@ extern unsigned int nb_prom_envs;
 void pci_device_hot_add(Monitor *mon, const QDict *qdict);
 void drive_hot_add(Monitor *mon, const QDict *qdict);
 void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict);
+/* pcie hotplug */
+void pcie_attention_button_push_print(Monitor *mon, const QObject *data);
+int pcie_attention_button_push(Monitor *mon, const QDict *qdict,
+   QObject **ret_data);
 
 /* serial ports */
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 07/14] msi: implemented msi.

2010-09-06 Thread Isaku Yamahata
implemented msi support functions.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs |2 +-
 hw/msi.c  |  362 +
 hw/msi.h  |   41 +++
 hw/pci.h  |5 +
 4 files changed, 409 insertions(+), 1 deletions(-)
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h

diff --git a/Makefile.objs b/Makefile.objs
index 594894b..5f5a4c5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += msix.o
+hw-obj-y += msix.o msi.o
 
 # PCI network cards
 hw-obj-y += ne2000.o
diff --git a/hw/msi.c b/hw/msi.c
new file mode 100644
index 000..dbe89ee
--- /dev/null
+++ b/hw/msi.c
@@ -0,0 +1,362 @@
+/*
+ * msi.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "msi.h"
+
+#define PCI_MSI_PENDING_32  0x10
+#define PCI_MSI_PENDING_64  0x14
+
+/* PCI_MSI_FLAGS */
+#define PCI_MSI_FLAGS_QSIZE_SHIFT   4
+#define PCI_MSI_FLAGS_QMASK_SHIFT   1
+
+/* PCI_MSI_ADDRESS_LO */
+#define PCI_MSI_ADDRESS_LO_RESERVED_MASK0x3
+
+#define PCI_MSI_32_SIZEOF   0x0a
+#define PCI_MSI_64_SIZEOF   0x0e
+#define PCI_MSI_32M_SIZEOF  0x14
+#define PCI_MSI_64M_SIZEOF  0x18
+
+#define MSI_DEBUG
+#ifdef MSI_DEBUG
+# define MSI_DPRINTF(fmt, ...)  \
+fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ## __VA_ARGS__)
+#else
+# define MSI_DPRINTF(fmt, ...)  do { } while (0)
+#endif
+#define MSI_DEV_PRINTF(dev, fmt, ...)   \
+MSI_DPRINTF("%s:%x " fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
+
+static inline bool msi_test_bit(uint32_t bitmap, uint8_t bit)
+{
+return bitmap & (1 << bit);
+}
+
+static inline void msi_set_bit(uint32_t *bitmap, uint8_t bit)
+{
+*bitmap |= (1 << bit);
+}
+
+static inline void msi_clear_bit(uint32_t *bitmap, uint8_t bit)
+{
+*bitmap &= ~(1 << bit);
+}
+
+
+/* multiple vector only suport power of 2 and up to 32 */
+static inline uint8_t ilog2(uint8_t nr_vector)
+{
+switch (nr_vector) {
+case 1:
+return 0;
+case 2:
+return 1;
+case 4:
+return 2;
+case 8:
+return 3;
+case 16:
+return 4;
+case 32:
+return 5;
+default:
+abort();
+break;
+}
+return 0;
+}
+
+static inline bool is_mask_bit_support(uint16_t control)
+{
+return control & PCI_MSI_FLAGS_MASKBIT;
+}
+
+static inline bool is_64bit_address(uint16_t control)
+{
+return control & PCI_MSI_FLAGS_64BIT;
+}
+
+static inline uint8_t msi_vector_order(uint16_t control)
+{
+return (control & PCI_MSI_FLAGS_QSIZE) >> PCI_MSI_FLAGS_QSIZE_SHIFT;
+}
+
+static inline uint8_t msi_nr_vector(uint16_t control)
+{
+return 1 << msi_vector_order(control);
+}
+
+static inline bool msi_control_enabled(uint16_t control)
+{
+return control & PCI_MSI_FLAGS_ENABLE;
+}
+
+static inline uint8_t msi_cap_sizeof(uint16_t control)
+{
+switch (control & (PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT)) {
+case PCI_MSI_FLAGS_MASKBIT | PCI_MSI_FLAGS_64BIT:
+return PCI_MSI_64M_SIZEOF;
+case PCI_MSI_FLAGS_64BIT:
+return PCI_MSI_64_SIZEOF;
+case PCI_MSI_FLAGS_MASKBIT:
+return PCI_MSI_32M_SIZEOF;
+case 0:
+return PCI_MSI_32_SIZEOF;
+default:
+abort();
+break;
+}
+return 0;
+}
+
+static inline uint16_t msi_control_reg(const PCIDevice* dev)
+{
+return dev->msi_cap + PCI_MSI_FLAGS;
+}
+
+static inline uint32_t msi_lower_address_reg(const PCIDevice* dev)
+{
+return dev->msi_cap + PCI_MSI_ADDRESS_LO;
+}
+
+static inline uint32_t msi_upper_address_reg(const PCIDevice* dev)
+{
+return dev->msi_cap + PCI_MSI_ADDRESS_HI;
+}
+
+static inline uint16_t msi_data_reg(const PCIDevice* dev, bool is64bit)
+{
+return dev->msi_cap + (is64bit ?  PCI_MSI_DATA_64 : PCI_MSI_DATA_32);
+}
+
+static inline uint32_t msi_mask_reg(const PCIDevice* dev, bool is64bit)
+{
+return dev->msi_cap + (is64bit ? PCI_MSI_MASK_64 : PCI_MSI_MASK_32);
+}
+
+static inline uint32_t msi_pending_reg(const PCIDevice* dev, bool is64bit)
+{
+return dev->msi_cap + (is64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
+}
+
+bool msi_

[Qemu-devel] [PATCH 11/14] pcie upstream port: pci express switch upstream port.

2010-09-06 Thread Isaku Yamahata
pci express switch upstream port.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs  |2 +-
 hw/pcie_upstream.c |  206 
 hw/pcie_upstream.h |   32 
 3 files changed, 239 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_upstream.c
 create mode 100644 hw/pcie_upstream.h

diff --git a/Makefile.objs b/Makefile.objs
index 658c399..e7a08dc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -139,7 +139,7 @@ user-obj-y += cutils.o cache-utils.o
 hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
-hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o
+hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o pci_bridge.o pcie_upstream.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
diff --git a/hw/pcie_upstream.c b/hw/pcie_upstream.c
new file mode 100644
index 000..d1f0920
--- /dev/null
+++ b/hw/pcie_upstream.c
@@ -0,0 +1,206 @@
+/*
+ * pcie_upstream.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_upstream.h"
+
+/* For now, TI XIO3130 is borrowed. need to get its own id? */
+#define PCI_DEVICE_ID_TI_XIO3130U   0x8232  /* upstream port */
+#define XIO3130_REVISION0x2
+#define XIO3130_MSI_OFFSET  0x70
+#define XIO3130_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT
+#define XIO3130_MSI_NR_VECTOR   1
+#define XIO3130_SSVID_OFFSET0x80
+#define XIO3130_SSVID_SVID  0
+#define XIO3130_SSVID_SSID  0
+#define XIO3130_EXP_OFFSET  0x90
+#define XIO3130_AER_OFFSET  0x100
+
+#define PCIE_UPSTREAM_VID   PCI_VENDOR_ID_TI
+#define PCIE_UPSTREAM_DID   PCI_DEVICE_ID_TI_XIO3130U
+#define PCIE_UPSTREAM_REVISION  XIO3130_REVISION
+#define PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS   XIO3130_MSI_SUPPORTED_FLAGS
+#define PCIE_UPSTREAM_MSI_NR_VECTOR XIO3130_MSI_NR_VECTOR
+#define PCIE_UPSTREAM_MSI_OFFSETXIO3130_MSI_OFFSET
+#define PCIE_UPSTREAM_SSVID_OFFSET  XIO3130_SSVID_OFFSET
+#define PCIE_UPSTREAM_SVID  XIO3130_SSVID_SVID
+#define PCIE_UPSTREAM_SSID  XIO3130_SSVID_SSID
+#define PCIE_UPSTREAM_EXP_OFFSETXIO3130_EXP_OFFSET
+#define PCIE_UPSTREAM_AER_OFFSETXIO3130_AER_OFFSET
+
+static void pcie_upstream_write_config(PCIDevice *d,
+   uint32_t address, uint32_t val, int len)
+{
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_deverr_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+msi_write_config(d, address, val, len);
+pcie_aer_write_config_vbridge(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+}
+
+static void pcie_upstream_reset(DeviceState *qdev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+msi_reset(d);
+pci_bridge_reset(qdev);
+pcie_cap_deverr_reset(d);
+}
+
+static void pcie_upstream_flr(PCIDevice *d)
+{
+/* TODO: not enabled until qdev reset clean up
+   waiting for Anthony's qdev cealn up */
+#if 0
+/* So far, sticky bit registers or register which must be preserved
+   over FLR aren't emulated. So just reset this device. */
+pci_device_reset(d);
+#endif
+}
+
+static int pcie_upstream_initfn(PCIDevice *d)
+{
+PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+int rc;
+
+rc = pci_bridge_initfn(d);
+if (rc < 0) {
+return rc;
+}
+
+pcie_port_init_reg(d);
+pci_config_set_vendor_id(d->config, PCIE_UPSTREAM_VID);
+pci_config_set_device_id(d->config, PCIE_UPSTREAM_DID);
+d->config[PCI_REVISION_ID] = PCIE_UPSTREAM_REVISION;
+
+rc = msi_init(d, PCIE_UPSTREAM_MSI_OFFSET, PCIE_UPSTREAM_MSI_NR_VECTOR,
+  PCIE_UPSTREAM_MSI_SUPPORTED_FLAGS);
+if (rc < 0) {
+return rc;
+}
+rc = pci_bridge_ssvid_init(d, PCIE_UPSTREAM_SSVID_OFFSET,
+   PCIE_UPSTREAM_SVID, PCIE_UPSTREAM_SSID);
+if (rc < 0) {
+return rc;
+}
+rc = pci_pcie_cap_init(d, PCIE_UPSTREAM_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
+

[Qemu-devel] [PATCH 05/14] pci: make pci_parse_devfn() aware of func.

2010-09-06 Thread Isaku Yamahata
make pci_parse_devfn() aware of func. With func = NULL it behave as before.
This will be used later.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |   34 ++
 hw/pci.h |2 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index bb9ddea..f03b83e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -424,15 +424,18 @@ static void pci_set_default_subsystem_id(PCIDevice 
*pci_dev)
 }
 
 /*
- * Parse [[:]:], return -1 on error
+ * Parse [[:]:], return -1 on error if funcp == NULL
+ *   [[:]:]., return -1 on error
  */
-static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned 
*slotp)
+int pci_parse_devaddr(const char *addr, int *domp, int *busp,
+  unsigned int *slotp, unsigned int *funcp)
 {
 const char *p;
 char *e;
 unsigned long val;
 unsigned long dom = 0, bus = 0;
-unsigned slot = 0;
+unsigned int slot = 0;
+unsigned int func = 0;
 
 p = addr;
 val = strtoul(p, &e, 16);
@@ -454,11 +457,24 @@ static int pci_parse_devaddr(const char *addr, int *domp, 
int *busp, unsigned *s
}
 }
 
-if (dom > 0x || bus > 0xff || val > 0x1f)
-   return -1;
-
 slot = val;
 
+if (funcp != NULL) {
+if (*e != '.')
+return -1;
+
+p = e + 1;
+val = strtoul(p, &e, 16);
+if (e == p)
+return -1;
+
+func = val;
+}
+
+/* if funcp == NULL func is 0 */
+if (dom > 0x || bus > 0xff || slot > 0x1f || func > 7)
+   return -1;
+
 if (*e)
return -1;
 
@@ -469,6 +485,8 @@ static int pci_parse_devaddr(const char *addr, int *domp, 
int *busp, unsigned *s
 *domp = dom;
 *busp = bus;
 *slotp = slot;
+if (funcp != NULL)
+*funcp = func;
 return 0;
 }
 
@@ -479,7 +497,7 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int 
*domp, int *busp,
 if (!strncmp(addr, "pci_addr=", 9)) {
 addr += 9;
 }
-if (pci_parse_devaddr(addr, domp, busp, slotp)) {
+if (pci_parse_devaddr(addr, domp, busp, slotp, NULL)) {
 monitor_printf(mon, "Invalid pci address\n");
 return -1;
 }
@@ -496,7 +514,7 @@ PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr)
 return pci_find_bus(pci_find_root_bus(0), 0);
 }
 
-if (pci_parse_devaddr(devaddr, &dom, &bus, &slot) < 0) {
+if (pci_parse_devaddr(devaddr, &dom, &bus, &slot, NULL) < 0) {
 return NULL;
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 2ddba59..2b4c318 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -227,6 +227,8 @@ PCIBus *pci_find_bus(PCIBus *bus, int bus_num);
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, int slot, int function);
 PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr);
 
+int pci_parse_devaddr(const char *addr, int *domp, int *busp,
+  unsigned int *slotp, unsigned int *funcp);
 int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
  unsigned *slotp);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 03/14] pci bridge: add helper function for ssvid capability.

2010-09-06 Thread Isaku Yamahata
helper function to add ssvid capability.

Signed-off-by: Isaku Yamahata 
---
 hw/pci_bridge.c |   19 +++
 hw/pci_bridge.h |3 +++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 198c3c7..638e3b3 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -32,6 +32,25 @@
 #include "pci_bridge.h"
 #include "pci_internals.h"
 
+/* PCI bridge subsystem vendor ID helper functions */
+#define PCI_SSVID_SIZEOF8
+#define PCI_SSVID_SVID  4
+#define PCI_SSVID_SSID  6
+
+int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
+  uint16_t svid, uint16_t ssid)
+{
+int pos;
+pos = pci_add_capability(dev, PCI_CAP_ID_SSVID, offset, PCI_SSVID_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+
+pci_set_word(dev->config + pos + PCI_SSVID_SVID, svid);
+pci_set_word(dev->config + pos + PCI_SSVID_SSID, ssid);
+return pos;
+}
+
 /* Accessor function to get parent bridge device from pci bus. */
 PCIDevice *pci_bridge_get_device(PCIBus *bus)
 {
diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h
index 63ada19..f6fade0 100644
--- a/hw/pci_bridge.h
+++ b/hw/pci_bridge.h
@@ -28,6 +28,9 @@
 
 #include "pci.h"
 
+int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
+  uint16_t svid, uint16_t ssid);
+
 PCIDevice *pci_bridge_get_device(PCIBus *bus);
 PCIBus *pci_bridge_get_sec_bus(PCIBridge *br);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 02/14] pci: consolidate pci_add_capability_at_offset() into pci_add_capability().

2010-09-06 Thread Isaku Yamahata
By making pci_add_capability() the special case of
pci_add_capability_at_offset() of offset = 0,
consolidate pci_add_capability_at_offset() into pci_add_capability().

Cc: Stefan Weil 
Cc: Michael S. Tsirkin 
Signed-off-by: Isaku Yamahata 
---
 hw/eepro100.c |4 ++--
 hw/msix.c |3 ++-
 hw/pci.c  |   33 ++---
 hw/pci.h  |5 ++---
 4 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 2b75c8f..8cbc3aa 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -539,8 +539,8 @@ static void e100_pci_reset(EEPRO100State * s, 
E100PCIDeviceInfo *e100_device)
 if (e100_device->power_management) {
 /* Power Management Capabilities */
 int cfg_offset = 0xdc;
-int r = pci_add_capability_at_offset(&s->dev, PCI_CAP_ID_PM,
- cfg_offset, PCI_PM_SIZEOF);
+int r = pci_add_capability(&s->dev, PCI_CAP_ID_PM,
+   cfg_offset, PCI_PM_SIZEOF);
 assert(r >= 0);
 pci_set_word(pci_conf + cfg_offset + PCI_PM_PMC, 0x7e21);
 #if 0 /* TODO: replace dummy code for power management emulation. */
diff --git a/hw/msix.c b/hw/msix.c
index d99403a..7ce63eb 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -73,7 +73,8 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
short nentries,
 }
 
 pdev->msix_bar_size = new_size;
-config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
+config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
+   0, MSIX_CAP_LENGTH);
 if (config_offset < 0)
 return config_offset;
 config = pdev->config + config_offset;
diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..754ffb3 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1682,11 +1682,25 @@ static void pci_del_option_rom(PCIDevice *pdev)
 pdev->rom_offset = 0;
 }
 
-/* Reserve space and add capability to the linked list in pci config space */
-int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
- uint8_t offset, uint8_t size)
+/*
+ * if !offset
+ * Reserve space and add capability to the linked list in pci config space
+ *
+ * if offset = 0,
+ * Find and reserve space and add capability to the linked list
+ * in pci config space */
+int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
+   uint8_t offset, uint8_t size)
 {
-uint8_t *config = pdev->config + offset;
+uint8_t *config;
+if (!offset) {
+offset = pci_find_space(pdev, size);
+if (!offset) {
+return -ENOSPC;
+}
+}
+
+config = pdev->config + offset;
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
 pdev->config[PCI_CAPABILITY_LIST] = offset;
@@ -1699,17 +1713,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
uint8_t cap_id,
 return offset;
 }
 
-/* Find and reserve space and add capability to the linked list
- * in pci config space */
-int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
-{
-uint8_t offset = pci_find_space(pdev, size);
-if (!offset) {
-return -ENOSPC;
-}
-return pci_add_capability_at_offset(pdev, cap_id, offset, size);
-}
-
 /* Unlink capability from the pci config space. */
 void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
 {
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..2ddba59 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -183,9 +183,8 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 pcibus_t size, int type,
 PCIMapIORegionFunc *map_func);
 
-int pci_add_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
-int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id,
- uint8_t cap_offset, uint8_t cap_size);
+int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
+   uint8_t offset, uint8_t size);
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 04/14] pci: call hotplug callback even when not hotplug case for later use.

2010-09-06 Thread Isaku Yamahata
call hotplug callback even when not hotplug case for later use.
And move hotplug check into hotplug callback.
PCIE slot needs this for card presence detection.

Signed-off-by: Isaku Yamahata 
---
 hw/acpi_piix4.c |3 +++
 hw/pci.c|3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index bfa1d9a..24dfcf2 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -611,6 +611,9 @@ static int piix4_device_hotplug(DeviceState *qdev, 
PCIDevice *dev, int state)
 PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
 DO_UPCAST(PCIDevice, qdev, qdev));
 
+if (!dev->qdev.hotplugged)
+return 0;
+
 s->pci0_status.up = 0;
 s->pci0_status.down = 0;
 if (state) {
diff --git a/hw/pci.c b/hw/pci.c
index 754ffb3..bb9ddea 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1514,7 +1514,8 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo 
*base)
 pci_dev->romfile = qemu_strdup(info->romfile);
 pci_add_option_rom(pci_dev);
 
-if (qdev->hotplugged) {
+if (bus->hotplug) {
+/* lower layer must check qdev->hotplugged */
 rc = bus->hotplug(bus->hotplug_qdev, pci_dev, 1);
 if (rc != 0) {
 int r = pci_unregister_device(&pci_dev->qdev);
-- 
1.7.1.1




[Qemu-devel] [PATCH 10/14] pcie root port: implement pcie root port.

2010-09-06 Thread Isaku Yamahata
pcie root port.

Signed-off-by: Isaku Yamahata 
---
 Makefile.objs  |2 +-
 hw/pcie_root.c |  247 
 hw/pcie_root.h |   32 +++
 3 files changed, 280 insertions(+), 1 deletions(-)
 create mode 100644 hw/pcie_root.c
 create mode 100644 hw/pcie_root.h

diff --git a/Makefile.objs b/Makefile.objs
index c73d12b..658c399 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -186,7 +186,7 @@ hw-obj-$(CONFIG_PIIX4) += piix4.o
 # PCI watchdog devices
 hw-obj-y += wdt_i6300esb.o
 
-hw-obj-y += pcie.o pcie_port.o
+hw-obj-y += pcie.o pcie_port.o pcie_root.o
 hw-obj-y += msix.o msi.o
 
 # PCI network cards
diff --git a/hw/pcie_root.c b/hw/pcie_root.c
new file mode 100644
index 000..cecfc6e
--- /dev/null
+++ b/hw/pcie_root.c
@@ -0,0 +1,247 @@
+/*
+ * pcie_root.c
+ *
+ * Copyright (c) 2010 Isaku Yamahata 
+ *VA Linux Systems Japan K.K.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "pci_ids.h"
+#include "msi.h"
+#include "pcie.h"
+#include "pcie_root.h"
+
+/* For now, Intel X58 IOH corporate deice exporess* root port.
+   need to get its own id? */
+#define PCI_DEVICE_ID_IOH_EPORT 0x3420  /* D0:F0 express mode */
+#define PCI_DEVICE_ID_IOH_REV   0x2
+#define IOH_EP_SSVID_OFFSET 0x40
+#define IOH_EP_SSVID_SVID   PCI_VENDOR_ID_INTEL
+#define IOH_EP_SSVID_SSID   0
+#define IOH_EP_MSI_OFFSET   0x60
+#define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
+#define IOH_EP_MSI_NR_VECTOR2
+#define IOH_EP_EXP_OFFSET   0x90
+#define IOH_EP_AER_OFFSET   0x100
+
+#define PCIE_ROOT_VID   PCI_VENDOR_ID_INTEL
+#define PCIE_ROOT_DID   PCI_DEVICE_ID_IOH_EPORT
+#define PCIE_ROOT_REV   PCI_DEVICE_ID_IOH_REV
+#define PCIE_ROOT_SSVID_OFFSET  IOH_EP_SSVID_OFFSET
+#define PCIE_ROOT_SVID  IOH_EP_SSVID_SVID
+#define PCIE_ROOT_SSID  IOH_EP_SSVID_SSID
+#define PCIE_ROOT_MSI_SUPPORTED_FLAGS   IOH_EP_MSI_SUPPORTED_FLAGS
+#define PCIE_ROOT_MSI_NR_VECTOR IOH_EP_MSI_NR_VECTOR
+#define PCIE_ROOT_MSI_OFFSETIOH_EP_MSI_OFFSET
+#define PCIE_ROOT_EXP_OFFSETIOH_EP_EXP_OFFSET
+#define PCIE_ROOT_AER_OFFSETIOH_EP_AER_OFFSET
+
+/*
+ * If two MSI vector are allocated, Advanced Error Interrupt Message Number
+ * is 1. otherwise 0.
+ * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
+ */
+static uint8_t pcie_root_aer_vector(const PCIDevice *d)
+{
+switch (msi_nr_allocated_vector(d)) {
+case 1:
+return 0;
+case 2:
+return 1;
+case 4:
+case 8:
+case 16:
+case 32:
+default:
+break;
+}
+abort();
+return 0;
+}
+
+static void pcie_root_aer_vector_update(PCIDevice *d)
+{
+pcie_aer_root_set_vector(d, pcie_root_aer_vector(d));
+}
+
+static void pcie_root_write_config(PCIDevice *d,
+   uint32_t address, uint32_t val, int len)
+{
+uint16_t sltctl =
+pci_get_word(d->config + pci_pcie_cap(d) + PCI_EXP_SLTCTL);
+uint32_t root_cmd =
+pci_get_long(d->config + pcie_aer_cap(d) + PCI_ERR_ROOT_COMMAND);
+
+pci_bridge_write_config(d, address, val, len);
+msi_write_config(d, address, val, len);
+pcie_root_aer_vector_update(d);
+pcie_cap_deverr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, address, val, len, sltctl);
+pcie_aer_write_config_vbridge(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+pcie_aer_root_write_config(d, address, val, len, root_cmd);
+}
+
+static void pcie_root_reset(DeviceState *qdev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
+msi_reset(d);
+pcie_root_aer_vector_update(d);
+pcie_cap_root_reset(d);
+pcie_cap_deverr_reset(d);
+pcie_cap_slot_reset(d);
+pcie_aer_root_reset(d);
+pci_bridge_reset(qdev);
+}
+
+static int pcie_root_initfn(PCIDevice *d)
+{
+PCIBridge* br = DO_UPCAST(PCIBridge, dev, d);
+PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
+PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
+int rc;
+
+rc = pci_bridge_initfn(d);
+if (rc < 0) {
+return rc;
+}
+
+d->config[PCI_REVISION_ID] = PCIE_ROOT_REV;
+pcie_port_init_reg(d);
+
+pc

[Qemu-devel] [PATCH 14/14] pcie/aer: glue aer error injection into qemu monitor.

2010-09-06 Thread Isaku Yamahata
glue aer error injection into qemu monitor.

Signed-off-by: Isaku Yamahata 

Conflicts:

hw/pcie.c
---
 hw/pcie.c   |   85 +++
 qemu-monitor.hx |   22 ++
 sysemu.h|5 +++
 3 files changed, 112 insertions(+), 0 deletions(-)

diff --git a/hw/pcie.c b/hw/pcie.c
index 1f24c2a..f02a868 100644
--- a/hw/pcie.c
+++ b/hw/pcie.c
@@ -19,6 +19,8 @@
  */
 
 #include "sysemu.h"
+#include "qemu-objects.h"
+#include "monitor.h"
 #include "pci_bridge.h"
 #include "pcie.h"
 #include "msix.h"
@@ -1666,3 +1668,86 @@ const VMStateDescription vmstate_pcie_aer_log = {
 VMSTATE_END_OF_LIST()
 }
 };
+
+void pcie_aer_inject_error_print(Monitor *mon, const QObject *data)
+{
+QDict *qdict;
+int devfn;
+assert(qobject_type(data) == QTYPE_QDICT);
+qdict = qobject_to_qdict(data);
+
+devfn = (int)qdict_get_int(qdict, "devfn");
+monitor_printf(mon, "OK domain: %x, bus: %x devfn: %x.%x\n",
+   (int) qdict_get_int(qdict, "domain"),
+   (int) qdict_get_int(qdict, "bus"),
+   PCI_SLOT(devfn), PCI_FUNC(devfn));
+}
+
+int do_pcie_aer_inejct_error(Monitor *mon,
+ const QDict *qdict, QObject **ret_data)
+{
+const char *pci_addr = qdict_get_str(qdict, "pci_addr");
+int dom;
+int bus;
+unsigned int slot;
+unsigned int func;
+PCIDevice *dev;
+struct pcie_aer_err err;
+
+/* Ideally qdev device path should be used.
+ * However at the moment there is no reliable way to determine
+ * wheher a given qdev is pci device or not.
+ * so pci_addr is used.
+ */
+if (pci_parse_devaddr(pci_addr, &dom, &bus, &slot, &func)) {
+monitor_printf(mon, "invalid pci address %s\n", pci_addr);
+return -1;
+}
+dev = pci_find_device(pci_find_root_bus(dom), bus, slot, func);
+if (!dev) {
+monitor_printf(mon, "device is not found. 0x%x:0x%x.0x%x\n",
+   bus, slot, func);
+return -1;
+}
+if (!pci_is_express(dev)) {
+monitor_printf(mon, "the device doesn't support pci express. "
+   "0x%x:0x%x.0x%x\n",
+   bus, slot, func);
+return -1;
+}
+
+err.status = qdict_get_int(qdict, "error_status");
+err.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn;
+
+err.flags = 0;
+if (qdict_get_int(qdict, "is_correctable")) {
+err.flags |= PCIE_AER_ERR_IS_CORRECTABLE;
+}
+if (qdict_get_int(qdict, "advisory_non_fatal")) {
+err.flags |= PCIE_AER_ERR_MAYBE_ADVISORY;
+}
+if (qdict_haskey(qdict, "tlph0")) {
+err.flags |= PCIE_AER_ERR_HEADER_VALID;
+}
+if (qdict_haskey(qdict, "hpfx0")) {
+err.flags |= PCIE_AER_ERR_TLP_PRESENT;
+}
+
+err.header[0] = qdict_get_try_int(qdict, "tlph0", 0);
+err.header[1] = qdict_get_try_int(qdict, "tlph1", 0);
+err.header[2] = qdict_get_try_int(qdict, "tlph2", 0);
+err.header[3] = qdict_get_try_int(qdict, "tlph3", 0);
+
+err.prefix[0] = qdict_get_try_int(qdict, "hpfx0", 0);
+err.prefix[1] = qdict_get_try_int(qdict, "hpfx1", 0);
+err.prefix[2] = qdict_get_try_int(qdict, "hpfx2", 0);
+err.prefix[3] = qdict_get_try_int(qdict, "hpfx3", 0);
+
+pcie_aer_inject_error(dev, &err);
+*ret_data = qobject_from_jsonf("{ 'domain': %d, 'bus': %d, 'devfn': %d }",
+   pci_find_domain(dev->bus),
+   pci_bus_num(dev->bus), dev->devfn);
+assert(*ret_data);
+
+return 0;
+}
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 02fbda1..080c90e 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -1168,6 +1168,28 @@ Push PCI express attention button
 ETEXI
 
 {
+.name   = "pcie_aer_inject_error",
+.args_type  = "advisory_non_fatal:-a,is_correctable:-c,"
+ "pci_addr:s,error_status:i,"
+ "tlph0:i?,tlph1:i?,tlph2:i?,tlph3:i?,"
+ "hpfx0:i?,hpfx1:i?,hpfx2:i?,hpfx3:i?",
+.params = "[-a] [-c] [[:]:]. "
+ " "
+ "[] "
+ "[]",
+.help   = "inject pcie aer error "
+  "(use -a for advisory non fatal error) "
+  "(use -c for correctrable error)",
+.user_print  = pcie_aer_inject_error_print,
+.mhandler.cmd_new = do_pcie_aer_inejct_error,
+},
+
+STEXI
+...@item pcie_abp
+Push PCI express attention button
+ETEXI
+
+{
 .name   = "host_net_add",
 .args_type  = "device:s,opts:s?",
 .params = "tap|user|socket|vde|dump [options]",
diff --git a/sysemu.h b/sysemu.h
index cca411d..2f7157c 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -155,6 +155,11 @@ void pcie_attention_button_push_print(Monitor *mon, const 
QObject *data);
 int pcie_attention_button_push(Monitor *mon, const QDict *qdict,

[Qemu-devel] [PATCH 06/14] pci_ids.h: add vendor id of Texus Intesruments.

2010-09-06 Thread Isaku Yamahata
add vendor id of Texus Intesruments.

Signed-off-by: Isaku Yamahata 
---
 hw/pci_ids.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..82cba7e 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -57,6 +57,8 @@
 #define PCI_VENDOR_ID_AMD0x1022
 #define PCI_DEVICE_ID_AMD_LANCE  0x2000
 
+#define PCI_VENDOR_ID_TI 0x104c
+
 #define PCI_VENDOR_ID_MOTOROLA   0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC1060x0002
 #define PCI_DEVICE_ID_MOTOROLA_RAVEN 0x4801
-- 
1.7.1.1




[Qemu-devel] [PATCH 01/14] RESEND apb: fix typo.

2010-09-06 Thread Isaku Yamahata
fix typo.

Signed-off-by: Isaku Yamahata 
---
 hw/apb_pci.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 10a5baa..c619112 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -362,7 +362,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
 /* APB secondary busses */
 pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 0), true,
"pbm-bridge");
-br = DO_UPCAST(PCIBridge, dev, dev);
+br = DO_UPCAST(PCIBridge, dev, pci_dev);
 pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 1",
pci_apb_map_irq);
 qdev_init_nofail(&pci_dev->qdev);
@@ -370,7 +370,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
 
 pci_dev = pci_create_multifunction(d->bus, PCI_DEVFN(1, 1), true,
"pbm-bridge");
-br = DO_UPCAST(PCIBridge, dev, dev);
+br = DO_UPCAST(PCIBridge, dev, pci_dev);
 pci_bridge_map_irq(br, "Advanced PCI Bus secondary bridge 2",
pci_apb_map_irq);
 qdev_init_nofail(&pci_dev->qdev);
@@ -462,7 +462,7 @@ static PCIDeviceInfo pbm_pci_bridge_info = {
 .qdev.name = "pbm-bridge",
 .qdev.size = sizeof(PCIBridge),
 .qdev.vmsd = &vmstate_pci_device,
-.qdev.reset = pci_brdige_reset,
+.qdev.reset = pci_bridge_reset,
 .init = apb_pci_bridge_initfn,
 .exit = pci_bridge_exitfn,
 .config_write = pci_bridge_write_config,
-- 
1.7.1.1




[Qemu-devel] [PATCH 00/14] pcie port switch emulators

2010-09-06 Thread Isaku Yamahata
This patch series implements pcie port switch emulators
which is basic part for pcie/q35 support.
This is for mst/pci tree.

some random comments
- pci bus reset
  As Anthony is cleaning up qdev reset stuff, so pci bus reset code
  is commented out. Once the qdev clean up is done, the patch that
  enabled pci bus reset will be sent.

- vid/did
  there are arbitrariness for pcie port switch. I just choose
  Intel and IT one to model them.

thanks,

Isaku Yamahata (14):
  RESEND apb: fix typo.
  pci: consolidate pci_add_capability_at_offset() into
pci_add_capability().
  pci bridge: add helper function for ssvid capability.
  pci: call hotplug callback even when not hotplug case for later use.
  pci: make pci_parse_devfn() aware of func.
  pci_ids.h: add vendor id of Texus Intesruments.
  msi: implemented msi.
  pcie: helper functions for pcie extended capability.
  pcie port: define struct PCIEPort/PCIESlot and helper functions
  pcie root port: implement pcie root port.
  pcie upstream port: pci express switch upstream port.
  pcie downstream port: pci express switch downstream port.
  pcie/hotplug: glue pushing attention button command. pcie_abp
  pcie/aer: glue aer error injection into qemu monitor.

 Makefile.objs|6 +-
 hw/acpi_piix4.c  |3 +
 hw/apb_pci.c |6 +-
 hw/eepro100.c|4 +-
 hw/msi.c |  362 +++
 hw/msi.h |   41 ++
 hw/msix.c|3 +-
 hw/pci.c |   70 ++-
 hw/pci.h |   36 +-
 hw/pci_bridge.c  |   19 +
 hw/pci_bridge.h  |3 +
 hw/pci_ids.h |2 +
 hw/pcie.c| 1753 ++
 hw/pcie.h|  186 ++
 hw/pcie_downstream.c |  224 +++
 hw/pcie_downstream.h |   33 +
 hw/pcie_port.c   |  188 ++
 hw/pcie_port.h   |   51 ++
 hw/pcie_root.c   |  247 +++
 hw/pcie_root.h   |   32 +
 hw/pcie_upstream.c   |  206 ++
 hw/pcie_upstream.h   |   32 +
 qemu-common.h|3 +
 qemu-monitor.hx  |   36 +
 sysemu.h |9 +
 25 files changed, 3520 insertions(+), 35 deletions(-)
 create mode 100644 hw/msi.c
 create mode 100644 hw/msi.h
 create mode 100644 hw/pcie.c
 create mode 100644 hw/pcie.h
 create mode 100644 hw/pcie_downstream.c
 create mode 100644 hw/pcie_downstream.h
 create mode 100644 hw/pcie_port.c
 create mode 100644 hw/pcie_port.h
 create mode 100644 hw/pcie_root.c
 create mode 100644 hw/pcie_root.h
 create mode 100644 hw/pcie_upstream.c
 create mode 100644 hw/pcie_upstream.h