Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
> +static struct file *restrictedmem_file_create(struct file *memfd)
> +{
> +	struct restrictedmem_data *data;
> +	struct address_space *mapping;
> +	struct inode *inode;
> +	struct file *file;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	data->memfd = memfd;
> +	mutex_init(&data->lock);
> +	INIT_LIST_HEAD(&data->notifiers);
> +
> +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> +	if (IS_ERR(inode)) {
> +		kfree(data);
> +		return ERR_CAST(inode);
> +	}

alloc_anon_inode() uses new_inode_pseudo() to get the inode. As per the
comment, the new inode is not added to the superblock's s_inodes list:

/**
 * new_inode_pseudo - obtain an inode
 * @sb: superblock
 *
 * Allocates a new inode for given superblock.
 * Inode wont be chained in superblock s_inodes list
 * This means :
 * - fs can't be unmount
 * - quotas, fsnotify, writeback can't work
 */

So restrictedmem_error_page() will not find the inode, as it was never
added to the s_inodes list. We might need to add the inode after
allocating it:

	inode_sb_list_add(inode);

> +void restrictedmem_error_page(struct page *page, struct address_space *mapping)
> +{
> +	struct super_block *sb = restrictedmem_mnt->mnt_sb;
> +	struct inode *inode, *next;
> +
> +	if (!shmem_mapping(mapping))
> +		return;
> +
> +	spin_lock(&sb->s_inode_list_lock);
> +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> +		struct restrictedmem_data *data = inode->i_mapping->private_data;
> +		struct file *memfd = data->memfd;
> +
> +		if (memfd->f_mapping == mapping) {
> +			pgoff_t start, end;
> +
> +			spin_unlock(&sb->s_inode_list_lock);
> +
> +			start = page->index;
> +			end = start + thp_nr_pages(page);
> +			restrictedmem_notifier_error(data, start, end);
> +			return;
> +		}
> +	}
> +	spin_unlock(&sb->s_inode_list_lock);
> +}

Regards
Nikunj
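A sketch of where the suggested call would land, based only on the
suggestion above (not a tested patch; the context lines are from the
quoted restrictedmem_file_create()):

```diff
 	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
 	if (IS_ERR(inode)) {
 		kfree(data);
 		return ERR_CAST(inode);
 	}
+	/* Chain the inode on sb->s_inodes so that restrictedmem_error_page()
+	 * can find it when it walks the superblock's inode list. */
+	inode_sb_list_add(inode);
```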
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 15/08/22 18:34, Chao Peng wrote:
> On Fri, Aug 12, 2022 at 02:18:43PM +0530, Nikunj A. Dadhania wrote:
>> On 12/08/22 12:48, Gupta, Pankaj wrote:
>>>
>>>>>>>> However, fallocate() preallocates full guest memory before
>>>>>>>> starting the guest. With this behaviour guest memory is *not*
>>>>>>>> demand pinned. Is there a way to prevent fallocate() from
>>>>>>>> reserving full guest memory?
>>>>>>>
>>>>>>> Isn't the pinning being handled by the corresponding host memory
>>>>>>> backend with mmu notifier and architecture support while doing the
>>>>>>> memory operations, e.g. page migration and swapping/reclaim (not
>>>>>>> supported currently AFAIU). But yes, we need to allocate entire
>>>>>>> guest memory with the new flags MEMFILE_F_{UNMOVABLE,
>>>>>>> UNRECLAIMABLE etc}.
>>>>>>
>>>>>> That is correct, but the question is when is the memory allocated;
>>>>>> as these flags are set, memory is neither moved nor reclaimed. In
>>>>>> the current scenario, if I start a 32GB guest, all 32GB is
>>>>>> allocated.
>>>>>
>>>>> I guess so if guest memory is private by default.
>>>>>
>>>>> Other option would be to allocate memory as shared by default and
>>>>> handle on-demand allocation and RMPUPDATE with the page state change
>>>>> event. But still that would be done at guest boot time, IIUC.
>>>>
>>>> Sorry! Don't want to hijack the other thread so replying here.
>>>>
>>>> I thought the question was for SEV SNP. For SEV, maybe the hypercall
>>>> with the page state information can be used to allocate memory as we
>>>> use it, or something like quota-based memory allocation (just
>>>> thinking).
>>>
>>> But all this would have considerable performance overhead (if by
>>> default memory is shared) and is used mostly at boot time.
>>> So, preallocating memory (default memory private) seems a better
>>> approach for both SEV & SEV SNP, with later page management (pinning,
>>> reclaim) taken care of by the host memory backend & architecture
>>> together.
>>
>> I am not sure how pre-allocating memory will help; even if the guest
>> would not use full memory, it will be pre-allocated. Which, if I
>> understand correctly, is not expected.
>
> Actually the current version allows you to delay the allocation to a
> later time (e.g. page fault time) if you don't call fallocate() on the
> private fd. fallocate() was necessary in previous versions because we
> treated existence in the fd as 'private', but in this version we track
> private/shared info in KVM, so we don't rely on that fact from memory
> backstores.

Thanks for confirming, Chao. In that case we can drop fallocate() from
QEMU in both cases:
* once while creating the memfd private object, and
* during ram_block_convert_range() for shared->private and vice versa.

> Definitely the page will still be pinned once it's allocated; there is
> no way to swap it out, for example, just with the current code.

Agree, at present once the page is brought in, the page will remain till
VM shutdown.

> That kind of support, if desirable, can be extended through the MOVABLE
> flag and some other callbacks to let feature-specific code be involved.

Sure, that could be future work.

Regards
Nikunj
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 12/08/22 12:48, Gupta, Pankaj wrote:
>
>> However, fallocate() preallocates full guest memory before starting the
>> guest. With this behaviour guest memory is *not* demand pinned. Is
>> there a way to prevent fallocate() from reserving full guest memory?
>
> Isn't the pinning being handled by the corresponding host memory backend
> with mmu notifier and architecture support while doing the memory
> operations, e.g. page migration and swapping/reclaim (not supported
> currently AFAIU). But yes, we need to allocate entire guest memory with
> the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}.

That is correct, but the question is when is the memory allocated; as
these flags are set, memory is neither moved nor reclaimed. In the
current scenario, if I start a 32GB guest, all 32GB is allocated.

>>> I guess so if guest memory is private by default.
>>>
>>> Other option would be to allocate memory as shared by default and
>>> handle on-demand allocation and RMPUPDATE with the page state change
>>> event. But still that would be done at guest boot time, IIUC.
>>
>> Sorry! Don't want to hijack the other thread so replying here.
>>
>> I thought the question was for SEV SNP. For SEV, maybe the hypercall
>> with the page state information can be used to allocate memory as we
>> use it, or something like quota-based memory allocation (just
>> thinking).
>
> But all this would have considerable performance overhead (if by default
> memory is shared) and is used mostly at boot time.
> So, preallocating memory (default memory private) seems a better
> approach for both SEV & SEV SNP, with later page management (pinning,
> reclaim) taken care of by the host memory backend & architecture
> together.

I am not sure how pre-allocating memory will help; even if the guest
would not use full memory, it will be pre-allocated. Which, if I
understand correctly, is not expected.

Regards
Nikunj
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 11/08/22 19:02, Chao Peng wrote:
> On Thu, Aug 11, 2022 at 01:30:06PM +0200, Gupta, Pankaj wrote:
>>>
>>> While debugging an issue with SEV+UPM, found that fallocate() returns
>>> an error in QEMU which is not handled (EINTR). With the below handling
>>> of EINTR subsequent fallocate() succeeds:
>
> The QEMU code has not been well-tested, so it's not strange that you met
> a problem. But from the man page, EINTR means a signal was caught; do
> you know the signal number?
>
> Thanks for your patch, but before we change it in QEMU I want to make
> sure it's indeed a QEMU issue (e.g. not a kernel issue).
>
>>> diff --git a/backends/hostmem-memfd-private.c b/backends/hostmem-memfd-private.c
>>> index af8fb0c957..e8597ed28d 100644
>>> --- a/backends/hostmem-memfd-private.c
>>> +++ b/backends/hostmem-memfd-private.c
>>> @@ -39,7 +39,7 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>      MachineState *machine = MACHINE(qdev_get_machine());
>>>      uint32_t ram_flags;
>>>      char *name;
>>> -    int fd, priv_fd;
>>> +    int fd, priv_fd, ret;
>>>
>>>      if (!backend->size) {
>>>          error_setg(errp, "can't create backend with size 0");
>>> @@ -65,7 +65,15 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>>                             backend->size, ram_flags, fd, 0, errp);
>>>      g_free(name);
>>>
>>> -    fallocate(priv_fd, 0, 0, backend->size);
>>> +again:
>>> +    ret = fallocate(priv_fd, 0, 0, backend->size);
>>> +    if (ret) {
>>> +        perror("Fallocate failed: \n");
>>> +        if (errno == EINTR)
>>> +            goto again;
>>> +        else
>>> +            exit(1);
>>> +    }
>>>
>>> However, fallocate() preallocates full guest memory before starting
>>> the guest. With this behaviour guest memory is *not* demand pinned.

This is with reference to the SEV demand pinning patches that I was
working on. The understanding was that UPM will not reserve memory for a
SEV/TDX guest in the beginning, similar to a normal guest. Here is the
relevant quote from the discussion with Sean[1]:

"I think we should abandon this approach in favor of committing all our
resources to fd-based private memory[*], which (if done right) will
provide on-demand pinning for "free"."

>>> Is there a way to prevent fallocate() from reserving full guest
>>> memory?

Regards
Nikunj

[1] https://lore.kernel.org/kvm/ykih8zm7xfhsf...@google.com/
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 11/08/22 19:02, Chao Peng wrote:
> On Thu, Aug 11, 2022 at 01:30:06PM +0200, Gupta, Pankaj wrote:
>> Test
>>
>> To test the new functionalities of this patch the TDX patchset is
>> needed. Since the TDX patchset has not been merged, I did two kinds of
>> test:
>>
>> - Regression test on kvm/queue (this patchset)
>>   Most new code is not covered. Code also in below repo:
>>   https://github.com/chao-p/linux/tree/privmem-v7
>>
>> - New functional test on latest TDX code
>>   The patch is rebased to the latest TDX code and the new
>>   functionalities were tested. See below repos:
>>   Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
>>   QEMU: https://github.com/chao-p/qemu/tree/privmem-v7
>>>
>>> While debugging an issue with SEV+UPM, found that fallocate() returns
>>> an error in QEMU which is not handled (EINTR). With the below handling
>>> of EINTR subsequent fallocate() succeeds:
>
> The QEMU code has not been well-tested, so it's not strange that you met
> a problem. But from the man page, EINTR means a signal was caught; do
> you know the signal number?

I haven't checked that, but it should be fairly straightforward to get.
I presume that you are referring to signal_pending() in
shmem_fallocate().

> Thanks for your patch, but before we change it in QEMU I want to make
> sure it's indeed a QEMU issue (e.g. not a kernel issue).

As per the manual, fallocate() can return EINTR, and this should be
handled by user space.

Regards
Nikunj
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 11/08/22 17:00, Gupta, Pankaj wrote:
>
>>> This is the v7 of this series which tries to implement the fd-based
>>> KVM guest private memory. The patches are based on latest kvm/queue
>>> branch commit:
>>>
>>>   b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
>>>   split_desc_cache only by default capacity
>>>
>>> Introduction
>>>
>>> In general this patch series introduces a fd-based memslot which
>>> provides guest memory through a memory file descriptor
>>> fd[offset,size] instead of hva/size. The fd can be created from a
>>> supported memory filesystem like tmpfs/hugetlbfs etc., which we refer
>>> to as the memory backing store. KVM and the memory backing store
>>> exchange callbacks when such a memslot gets created. At runtime KVM
>>> will call into callbacks provided by the backing store to get the pfn
>>> with the fd+offset. The memory backing store will also call into KVM
>>> callbacks when userspace punches a hole in the fd, to notify KVM to
>>> unmap secondary MMU page table entries.
>>>
>>> Compared to the existing hva-based memslot, this new type of memslot
>>> allows guest memory to be unmapped from host userspace like QEMU and
>>> even the kernel itself, therefore reducing the attack surface and
>>> preventing bugs.
>>>
>>> Based on this fd-based memslot, we can build guest private memory
>>> that is going to be used in confidential computing environments such
>>> as Intel TDX and AMD SEV. When supported, the memory backing store
>>> can provide more enforcement on the fd and KVM can use a single
>>> memslot to hold both the private and shared parts of the guest
>>> memory.
>>>
>>> mm extension
>>> ------------
>>> Introduces a new MFD_INACCESSIBLE flag for memfd_create(); the file
>>> created with this flag cannot be read(), write() or mmap()'d etc. via
>>> normal MMU operations. The file content can only be used with the
>>> newly introduced memfile_notifier extension.
>>>
>>> The memfile_notifier extension provides two sets of callbacks for KVM
>>> to interact with the memory backing store:
>>>   - memfile_notifier_ops: callbacks for the memory backing store to
>>>     notify KVM when memory gets invalidated.
>>>   - backing store callbacks: callbacks for KVM to call into the
>>>     memory backing store to request memory pages for guest private
>>>     memory.
>>>
>>> The memfile_notifier extension also provides APIs for the memory
>>> backing store to register/unregister itself and to trigger the
>>> notifier when the bookmarked memory gets invalidated.
>>>
>>> The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
>>> prevent double allocation caused by an unintentional guest when we
>>> only have a single side of the shared/private memfds effective.
>>>
>>> memslot extension
>>> -----------------
>>> Add the private fd and the fd offset to the existing 'shared' memslot
>>> so that both private/shared guest memory can live in one single
>>> memslot. A page in the memslot is either private or shared. Whether a
>>> guest page is private or shared is maintained by reusing the existing
>>> SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.
>>>
>>> Test
>>> ----
>>> To test the new functionalities of this patch the TDX patchset is
>>> needed. Since the TDX patchset has not been merged, I did two kinds
>>> of test:
>>>
>>> - Regression test on kvm/queue (this patchset)
>>>   Most new code is not covered. Code also in below repo:
>>>   https://github.com/chao-p/linux/tree/privmem-v7
>>>
>>> - New functional test on latest TDX code
>>>   The patch is rebased to the latest TDX code and the new
>>>   functionalities were tested. See below repos:
>>>   Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
>>>   QEMU: https://github.com/chao-p/qemu/tree/privmem-v7
>>
>> While debugging an issue with SEV+UPM, found that fallocate() returns
>> an error in QEMU which is not handled (EINTR). With the below handling
>> of EINTR subsequent fallocate() succeeds:
>>
>> diff --git a/backends/hostmem-memfd-private.c b/backends/hostmem-memfd-private.c
>> index af8fb0c957..e8597ed28d 100644
>> --- a/backends/hostmem-memfd-private.c
>> +++ b/backends/hostmem-memfd-private.c
>> @@ -39,7 +39,7 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>      MachineState *machine = MACHINE(qdev_get_machine());
>>      uint32_t ram_flags;
>>      char *name;
>> -    int fd, priv_fd;
>> +    int fd, priv_fd, ret;
>>
>>      if (!backend->size) {
>>          error_setg(errp, "can't create backend with size 0");
>> @@ -65,7 +65,15 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>                             backend->size, ram_flags, fd, 0, errp);
>>      g_free(name);
>>
>> -    fallocate(priv_fd, 0, 0, backend->size);
>> +again:
>> +    ret = fallocate(priv_fd, 0, 0, backend->size);
>> +    if (ret) {
>> +        perror("Fallocate failed: \n");
>> +        if (errno == EINTR)
>> +            goto again;
>> +        else
>> +            exit(1);
>> +
Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
On 06/07/22 13:50, Chao Peng wrote:
> This is the v7 of this series which tries to implement the fd-based KVM
> guest private memory. The patches are based on latest kvm/queue branch
> commit:
>
>   b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
>   split_desc_cache only by default capacity
>
> Introduction
>
> In general this patch series introduces a fd-based memslot which
> provides guest memory through a memory file descriptor fd[offset,size]
> instead of hva/size. The fd can be created from a supported memory
> filesystem like tmpfs/hugetlbfs etc., which we refer to as the memory
> backing store. KVM and the memory backing store exchange callbacks when
> such a memslot gets created. At runtime KVM will call into callbacks
> provided by the backing store to get the pfn with the fd+offset. The
> memory backing store will also call into KVM callbacks when userspace
> punches a hole in the fd, to notify KVM to unmap secondary MMU page
> table entries.
>
> Compared to the existing hva-based memslot, this new type of memslot
> allows guest memory to be unmapped from host userspace like QEMU and
> even the kernel itself, therefore reducing the attack surface and
> preventing bugs.
>
> Based on this fd-based memslot, we can build guest private memory that
> is going to be used in confidential computing environments such as
> Intel TDX and AMD SEV. When supported, the memory backing store can
> provide more enforcement on the fd and KVM can use a single memslot to
> hold both the private and shared parts of the guest memory.
>
> mm extension
> ------------
> Introduces a new MFD_INACCESSIBLE flag for memfd_create(); the file
> created with this flag cannot be read(), write() or mmap()'d etc. via
> normal MMU operations. The file content can only be used with the newly
> introduced memfile_notifier extension.
>
> The memfile_notifier extension provides two sets of callbacks for KVM
> to interact with the memory backing store:
>   - memfile_notifier_ops: callbacks for the memory backing store to
>     notify KVM when memory gets invalidated.
>   - backing store callbacks: callbacks for KVM to call into the memory
>     backing store to request memory pages for guest private memory.
>
> The memfile_notifier extension also provides APIs for the memory
> backing store to register/unregister itself and to trigger the notifier
> when the bookmarked memory gets invalidated.
>
> The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
> prevent double allocation caused by an unintentional guest when we only
> have a single side of the shared/private memfds effective.
>
> memslot extension
> -----------------
> Add the private fd and the fd offset to the existing 'shared' memslot
> so that both private/shared guest memory can live in one single
> memslot. A page in the memslot is either private or shared. Whether a
> guest page is private or shared is maintained by reusing the existing
> SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.
>
> Test
> ----
> To test the new functionalities of this patch the TDX patchset is
> needed. Since the TDX patchset has not been merged, I did two kinds of
> test:
>
> - Regression test on kvm/queue (this patchset)
>   Most new code is not covered. Code also in below repo:
>   https://github.com/chao-p/linux/tree/privmem-v7
>
> - New functional test on latest TDX code
>   The patch is rebased to the latest TDX code and the new
>   functionalities were tested. See below repos:
>   Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
>   QEMU: https://github.com/chao-p/qemu/tree/privmem-v7

While debugging an issue with SEV+UPM, found that fallocate() returns an
error in QEMU which is not handled (EINTR). With the below handling of
EINTR subsequent fallocate() succeeds:

diff --git a/backends/hostmem-memfd-private.c b/backends/hostmem-memfd-private.c
index af8fb0c957..e8597ed28d 100644
--- a/backends/hostmem-memfd-private.c
+++ b/backends/hostmem-memfd-private.c
@@ -39,7 +39,7 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     MachineState *machine = MACHINE(qdev_get_machine());
     uint32_t ram_flags;
     char *name;
-    int fd, priv_fd;
+    int fd, priv_fd, ret;

     if (!backend->size) {
         error_setg(errp, "can't create backend with size 0");
@@ -65,7 +65,15 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
                            backend->size, ram_flags, fd, 0, errp);
     g_free(name);

-    fallocate(priv_fd, 0, 0, backend->size);
+again:
+    ret = fallocate(priv_fd, 0, 0, backend->size);
+    if (ret) {
+        perror("Fallocate failed: \n");
+        if (errno == EINTR)
+            goto again;
+        else
+            exit(1);
+    }

However, fallocate() preallocates full guest memory before starting the
guest. With this behaviour guest memory is *not* demand pinned. Is there
a way to prevent fallocate() from reserving full guest memory?

> An example QEMU command line for TDX t
Re: [PATCH] x86: cpu: Error out if memory exceeds addressable range
On 7/18/2022 7:15 PM, Joao Martins wrote:
> On 7/18/22 14:10, Nikunj A. Dadhania wrote:
>> On 7/18/2022 6:12 PM, Igor Mammedov wrote:
>>> On Mon, 18 Jul 2022 13:47:34 +0530 Nikunj A Dadhania wrote:
>>>
>>>> Currently it is possible to start a guest with memory that is beyond
>>>> the addressable range of the CPU, and QEMU does not even warn about
>>>> it. The default phys_bits is 40 and can address 1TB. However, it
>>>> allows starting a guest with more than 1TB of memory.
>>>>
>>>> Prevent this by erroring out in such a scenario.
>>>>
>>>> Reported-by: Shaju Abraham
>>>> Signed-off-by: Nikunj A Dadhania
>>>
>>> The following shall take care of your issue:
>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg900136.html
>>
>> Thanks, I tried out the patch series. I could start a guest up to 978G
>> (not sure why this magic number yet), and after that I start getting
>> errors:
>
> It's expected. The point of the series is to avoid attempting DMA
> mapping over the HyperTransport region. Before, it would just fail to
> either hotplug/boot with VFIO devices on kernels >= 5.4 (even if older
> kernels or other configs let you go through, you might still see IOMMU
> errors at some point). So what we essentially do is to have the region
> above 4G instead start at 1T, thus requiring one more phys-bit in cases
> like this where the max gpa hits the HyperTransport reserved region.
>
> The cover letter and this patch
> (https://lore.kernel.org/qemu-devel/20220715171628.21437-11-joao.m.mart...@oracle.com/)
> should clarify the logic.

Thanks, looks good!

> The check you're adding here is essentially patch 9 of the series.

Yes, saw that change.

Regards
Nikunj
Re: [PATCH] x86: cpu: Error out if memory exceeds addressable range
On 7/18/2022 6:12 PM, Igor Mammedov wrote:
> On Mon, 18 Jul 2022 13:47:34 +0530 Nikunj A Dadhania wrote:
>
>> Currently it is possible to start a guest with memory that is beyond
>> the addressable range of the CPU, and QEMU does not even warn about
>> it. The default phys_bits is 40 and can address 1TB. However, it
>> allows starting a guest with more than 1TB of memory.
>>
>> Prevent this by erroring out in such a scenario.
>>
>> Reported-by: Shaju Abraham
>> Signed-off-by: Nikunj A Dadhania
>
> The following shall take care of your issue:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg900136.html

Thanks, I tried out the patch series. I could start a guest up to 978G
(not sure why this magic number yet), and after that I start getting
errors:

$ ./build/qemu-system-x86_64 -enable-kvm -machine q35 -m 979G \
    -kernel bzImage -initrd initramfs.cpio -vga none -nographic \
    -append "console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200 debug=1 " \
    -nodefaults -serial stdio
qemu-system-x86_64: Address space limit 0xff < 0x1fc3fff phys-bits too low (40)

Regards
Nikunj
[PATCH] x86: cpu: Error out if memory exceeds addressable range
Currently it is possible to start a guest with memory that is beyond the
addressable range of the CPU, and QEMU does not even warn about it. The
default phys_bits is 40 and can address 1TB. However, it allows starting
a guest with more than 1TB of memory.

Prevent this by erroring out in such a scenario.

Reported-by: Shaju Abraham
Signed-off-by: Nikunj A Dadhania
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6a57ef13af..1afbdbac7d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6376,6 +6376,7 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
 
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
+    MachineState *machine = MACHINE(qdev_get_machine());
     CPUState *cs = CPU(dev);
     X86CPU *cpu = X86_CPU(dev);
     X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
@@ -6541,6 +6542,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
 
+    if (BIT_ULL(cpu->phys_bits) < machine->maxram_size) {
+        error_setg(&local_err, "cannot setup guest memory: "
+                   "%s memory(%lu MiB) exceeds addressable limit(%llu MiB)",
+                   machine->maxram_size == machine->ram_size ? "" : "max",
+                   machine->maxram_size / MiB,
+                   BIT_ULL(cpu->phys_bits) / MiB);
+        goto out;
+    }
+
     /* Cache information initialization */
     if (!cpu->legacy_cache) {
         if (!xcc->model || !xcc->model->cpudef->cache_info) {
-- 
2.32.0
Re: [PATCH v6 6/8] KVM: Handle page fault for private memory
On 5/19/2022 9:07 PM, Chao Peng wrote:
> A page fault can carry the information of whether the access is private
> or not for a KVM_MEM_PRIVATE memslot; this can be filled by
> architecture code (like TDX code). To handle a page fault for such
> access, KVM maps the page only when this private property matches the
> host's view on this page, which can be decided by checking whether the
> corresponding page is populated in the private fd or not. A page is
> considered private when the page is populated in the private fd,
> otherwise it's shared.
>
> For a successful match, the private pfn is obtained with the
> memfile_notifier callbacks from the private fd, and the shared pfn is
> obtained with the existing get_user_pages().
>
> For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
> userspace. Userspace then can convert memory between private/shared
> from the host's view and retry the access.
>
> Co-developed-by: Yu Zhang
> Signed-off-by: Yu Zhang
> Signed-off-by: Chao Peng
> ---
>  arch/x86/kvm/mmu.h              |  1 +
>  arch/x86/kvm/mmu/mmu.c          | 70 +++--
>  arch/x86/kvm/mmu/mmu_internal.h | 17
>  arch/x86/kvm/mmu/mmutrace.h     |  1 +
>  arch/x86/kvm/mmu/paging_tmpl.h  |  5 ++-
>  include/linux/kvm_host.h        | 22 +++
>  6 files changed, 112 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 7e258cc94152..c84835762249 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -176,6 +176,7 @@ struct kvm_page_fault {
>  
>  	/* Derived from mmu and global state. */
>  	const bool is_tdp;
> +	const bool is_private;
>  	const bool nx_huge_page_workaround_enabled;
>  
>  	/*
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index afe18d70ece7..e18460e0d743 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2899,6 +2899,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  	if (max_level == PG_LEVEL_4K)
>  		return PG_LEVEL_4K;
>  
> +	if (kvm_slot_is_private(slot))
> +		return max_level;

Can you explain the rationale behind the above change? AFAIU, this
overrides the transparent_hugepage=never setting for both shared and
private mappings.

>  	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
>  	return min(host_level, max_level);
>  }

Regards
Nikunj
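For reference, the matching rule from the commit message can be
summarized as follows (pseudocode only, paraphrasing the description in
the commit message rather than the actual mmu.c changes):

```
handle_page_fault(fault):
    fault_is_private  = arch_fault_is_private(fault)       # e.g. filled by TDX code
    host_view_private = populated_in_private_fd(fault.gfn) # host's view

    if fault_is_private != host_view_private:
        # Mismatch: exit so userspace can convert the page and retry.
        exit_to_userspace(KVM_EXIT_MEMORY_FAULT)
    elif fault_is_private:
        pfn = private_fd_get_pfn(fault.gfn)    # memfile_notifier callback
    else:
        pfn = get_user_pages(hva_for(fault.gfn))  # existing shared path
    map(fault.gfn, pfn)
```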
Re: [Qemu-devel] [PATCH v2] target/ppc/kvm: set vcpu as online/offline
David Gibson writes:
> On Tue, Sep 04, 2018 at 11:00:39AM +0530, Nikunj A Dadhania wrote:
>> Set the newly added register (KVM_REG_PPC_ONLINE) to indicate if the
>> vcpu is online (1) or offline (0).
>>
>> KVM will use this information to set the RWMR register, which controls
>> the PURR and SPURR accumulation.
>>
>> CC: pau...@samba.org
>> Signed-off-by: Nikunj A Dadhania
>> ---
>>  hw/ppc/spapr_cpu_core.c | 1 +
>>  hw/ppc/spapr_rtas.c     | 3 +++
>>  target/ppc/kvm.c        | 9 +
>>  target/ppc/kvm_ppc.h    | 6 ++
>>  4 files changed, 19 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index 876f0b3d9d..9863221756 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -90,6 +90,7 @@ void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r
>>  
>>      env->nip = nip;
>>      env->gpr[3] = r3;
>> +    kvmppc_set_reg_ppc_online(cpu, 1);
>>      CPU(cpu)->halted = 0;
>>      /* Enable Power-saving mode Exit Cause exceptions */
>>      ppc_store_lpcr(cpu, env->spr[SPR_LPCR] | pcc->lpcr_pm);
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 4ac96bc94b..ce8072689b 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -33,6 +33,7 @@
>>  #include "sysemu/device_tree.h"
>>  #include "sysemu/cpus.h"
>>  #include "sysemu/hw_accel.h"
>> +#include "kvm_ppc.h"
>>  
>>  #include "hw/ppc/spapr.h"
>>  #include "hw/ppc/spapr_vio.h"
>> @@ -187,6 +188,7 @@ static void rtas_start_cpu(PowerPCCPU *callcpu, sPAPRMachineState *spapr,
>>      newcpu->env.tb_env->tb_offset = callcpu->env.tb_env->tb_offset;
>>  
>>      spapr_cpu_set_entry_state(newcpu, start, r3);
>> +    kvmppc_set_reg_ppc_online(newcpu, 1);
>
> You're setting online in spapr_cpu_set_entry_state() called
> immediately above, so you don't need it here as well.

Ah right, I missed it. Will send a respin.

Regards
Nikunj
[Qemu-devel] [PATCH v2] target/ppc/kvm: set vcpu as online/offline
Set the newly added register (KVM_REG_PPC_ONLINE) to indicate if the
vcpu is online (1) or offline (0).

KVM will use this information to set the RWMR register, which controls
the PURR and SPURR accumulation.

CC: pau...@samba.org
Signed-off-by: Nikunj A Dadhania
---
 hw/ppc/spapr_cpu_core.c | 1 +
 hw/ppc/spapr_rtas.c     | 3 +++
 target/ppc/kvm.c        | 9 +
 target/ppc/kvm_ppc.h    | 6 ++
 4 files changed, 19 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 876f0b3d9d..9863221756 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -90,6 +90,7 @@ void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r
 
     env->nip = nip;
     env->gpr[3] = r3;
+    kvmppc_set_reg_ppc_online(cpu, 1);
     CPU(cpu)->halted = 0;
     /* Enable Power-saving mode Exit Cause exceptions */
     ppc_store_lpcr(cpu, env->spr[SPR_LPCR] | pcc->lpcr_pm);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 4ac96bc94b..ce8072689b 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -33,6 +33,7 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/cpus.h"
 #include "sysemu/hw_accel.h"
+#include "kvm_ppc.h"
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
@@ -187,6 +188,7 @@ static void rtas_start_cpu(PowerPCCPU *callcpu, sPAPRMachineState *spapr,
     newcpu->env.tb_env->tb_offset = callcpu->env.tb_env->tb_offset;
 
     spapr_cpu_set_entry_state(newcpu, start, r3);
+    kvmppc_set_reg_ppc_online(newcpu, 1);
 
     qemu_cpu_kick(CPU(newcpu));
 
@@ -207,6 +209,7 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * guest */
     ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
     cs->halted = 1;
+    kvmppc_set_reg_ppc_online(cpu, 0);
     qemu_cpu_kick(cs);
 }
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9211ee2ee1..7be8081968 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2785,3 +2785,12 @@ bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu)
 
     return !kvmppc_is_pr(cs->kvm_state);
 }
+
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (kvm_enabled()) {
+        kvm_set_one_reg(cs, KVM_REG_PPC_ONLINE, &online);
+    }
+}
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 657582bb32..d635032bd9 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -72,6 +72,7 @@ bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu);
 bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
 #else
 
@@ -187,6 +188,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
     return 0;
 }
 
+static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online)
+{
+    return;
+}
+
 #ifndef CONFIG_USER_ONLY
 static inline bool kvmppc_spapr_use_multitce(void)
 {
-- 
2.14.4
[Qemu-devel] [PATCH v3] target/ppc/kvm: set vcpu as online/offline
Set the newly added register (KVM_REG_PPC_ONLINE) to indicate if the vcpu is
online (1) or offline (0).

KVM will use this information to set the RWMR register, which controls the
PURR and SPURR accumulation.

CC: pau...@samba.org
Signed-off-by: Nikunj A Dadhania
---
 hw/ppc/spapr_cpu_core.c | 1 +
 hw/ppc/spapr_rtas.c     | 2 ++
 target/ppc/kvm.c        | 9 +++++++++
 target/ppc/kvm_ppc.h    | 7 +++++++
 4 files changed, 19 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 876f0b3d9d..9863221756 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -90,6 +90,7 @@ void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r
     env->nip = nip;
     env->gpr[3] = r3;
+    kvmppc_set_reg_ppc_online(cpu, 1);
     CPU(cpu)->halted = 0;
     /* Enable Power-saving mode Exit Cause exceptions */
     ppc_store_lpcr(cpu, env->spr[SPR_LPCR] | pcc->lpcr_pm);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 4ac96bc94b..d6a0952154 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -33,6 +33,7 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/cpus.h"
 #include "sysemu/hw_accel.h"
+#include "kvm_ppc.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
@@ -207,6 +208,7 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * guest */
     ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
     cs->halted = 1;
+    kvmppc_set_reg_ppc_online(cpu, 0);
     qemu_cpu_kick(cs);
 }
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9211ee2ee1..7be8081968 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2785,3 +2785,12 @@ bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu)
 
     return !kvmppc_is_pr(cs->kvm_state);
 }
+
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (kvm_enabled()) {
+        kvm_set_one_reg(cs, KVM_REG_PPC_ONLINE, &online);
+    }
+}
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 657582bb32..f696c6e498 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -72,6 +72,7 @@ bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu);
 bool kvmppc_hpt_needs_host_contiguous_pages(void);
 void kvm_check_mmu(PowerPCCPU *cpu, Error **errp);
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
 #else
@@ -187,6 +188,12 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
     return 0;
 }
 
+static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu,
+                                             unsigned int online)
+{
+    return;
+}
+
 #ifndef CONFIG_USER_ONLY
 static inline bool kvmppc_spapr_use_multitce(void)
 {
-- 
2.14.4
Re: [Qemu-devel] [PATCH] target/ppc/kvm: set vcpu as online/offline
David Gibson writes:

> On Mon, Apr 23, 2018 at 11:43:02AM +0530, Nikunj A Dadhania wrote:
>> Set the newly added register(KVM_REG_PPC_ONLINE) to indicate if the vcpu is
>> online(1) or offline(0)
>>
>> KVM will use this information to set the RWMR register, which controls the PURR
>> and SPURR accumulation.
>>
>> CC: pau...@samba.org
>> Signed-off-by: Nikunj A Dadhania
>>
>> ---
>>
>> http://patchwork.ozlabs.org/patch/901897/
>> ---
>>  hw/ppc/spapr.c | 1 +
>>  hw/ppc/spapr_rtas.c | 4
>>  linux-headers/asm-powerpc/kvm.h | 1 +
>
> The usual convention is to send updates to the imported headers, as a
> general update to a specific kernel SHA, rather than folding just the
> necessary parts into the patches that use them.

Sure, we need to wait until Paul's patch is pulled in.

> This will need to wait for 2.13.

Yes, that is fine.

Regards
Nikunj
[Qemu-devel] [PATCH] target/ppc/kvm: set vcpu as online/offline
Set the newly added register (KVM_REG_PPC_ONLINE) to indicate if the vcpu is
online (1) or offline (0).

KVM will use this information to set the RWMR register, which controls the
PURR and SPURR accumulation.

CC: pau...@samba.org
Signed-off-by: Nikunj A Dadhania

---

http://patchwork.ozlabs.org/patch/901897/
---
 hw/ppc/spapr.c                  | 1 +
 hw/ppc/spapr_rtas.c             | 4 ++++
 linux-headers/asm-powerpc/kvm.h | 1 +
 target/ppc/kvm.c                | 9 +++++++++
 target/ppc/kvm_ppc.h            | 6 ++++++
 5 files changed, 21 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a81570e7c8..48bafe3765 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1549,6 +1549,7 @@ static void spapr_machine_reset(void)
     first_ppc_cpu->env.gpr[5] = 0;
     first_cpu->halted = 0;
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
+    kvmppc_set_reg_ppc_online(first_ppc_cpu, 1);
 
     spapr->cas_reboot = false;
 }
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 0ec5fa4cfe..0874a746dd 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -33,6 +33,7 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
+#include "kvm_ppc.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
@@ -184,6 +185,8 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         spapr_cpu_set_endianness(cpu);
         spapr_cpu_update_tb_offset(cpu);
+        kvmppc_set_reg_ppc_online(cpu, 1);
+
         qemu_cpu_kick(cs);
 
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
@@ -204,6 +207,7 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 
     cs->halted = 1;
+    kvmppc_set_reg_ppc_online(cpu, 0);
     qemu_cpu_kick(cs);
 
     /* Disable Power-saving mode Exit Cause exceptions for the CPU.
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 833ed9a16a..1b32b56a03 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -633,6 +633,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_PSSCR      (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd)
 #define KVM_REG_PPC_DEC_EXPIRY (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbe)
+#define KVM_REG_PPC_ONLINE     (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 79a436a384..32a146f838 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2877,3 +2877,12 @@ bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu)
 
     return !kvmppc_is_pr(cs->kvm_state);
 }
+
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online)
+{
+    CPUState *cs = CPU(cpu);
+
+    if (kvm_enabled()) {
+        kvm_set_one_reg(cs, KVM_REG_PPC_ONLINE, &online);
+    }
+}
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 4d2789eef6..7a915fe422 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -72,6 +72,7 @@ int kvmppc_resize_hpt_commit(PowerPCCPU *cpu, target_ulong flags, int shift);
 bool kvmppc_pvr_workaround_required(PowerPCCPU *cpu);
 bool kvmppc_is_mem_backend_page_size_ok(const char *obj_path);
+void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online);
 
 #else
@@ -187,6 +188,11 @@ static inline target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
     return 0;
 }
 
+static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu, unsigned int online)
+{
+    return;
+}
+
 #ifndef CONFIG_USER_ONLY
 static inline off_t kvmppc_alloc_rma(void **rma)
 {
-- 
2.14.3
Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
Richard Henderson writes:

> On 03/16/2018 12:08 PM, Nikunj A Dadhania wrote:
>> @@ -1078,8 +1079,8 @@ struct CPUPPCState {
>>      /* Altivec registers */
>>      ppc_avr_t avr[32];
>>      uint32_t vscr;
>> -    /* VSX registers */
>> -    uint64_t vsr[32];
>> +    /* 32 (128bit)- VSX registers */
>> +    ppc_avr_t vsr[32];
>
> Another thing that needs to happen is to make ppc_avr_t to be 16-byte aligned
> (this is documented in tcg-gvec-op.h, I believe).
>
> This is easily accomplished by adding QEMU_ALIGNED(16) to the first union
> member.  And then you'd like to put vsr adjacent to avr so that you're not
> adding another alignment hole.

Sure, will do that.

Regards
Nikunj
Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
Richard Henderson writes:

> On 03/07/2018 06:03 PM, Nikunj A Dadhania wrote:
>> Hi Richard,
>>
>> I was working to get TCG vector support for PowerPC[1]. Started with
>> converting logical operations like vector AND/OR/XOR and compare
>> instructions. Found some inconsistency during my testing on x86 laptop
>> emulating PowerPC:
>
> Great.
>
> Well, the problem is that you cannot use TCG generic vectors and TCG global
> variables to access the same memory.

Interesting, wasn't aware of this.

> Thus your first step must be to remove all references to cpu_avrh and cpu_avrl.
> These can be replaced by translator helpers that perform an explicit
> tcg_gen_ld_i64 or tcg_gen_st_i64 to the proper memory locations.
>
> Only after that's done can you begin converting other references to use the
> host vectors.  Otherwise, the tcg optimizer will do Bad and Unpredictable
> Things, which may well have produced the incorrect results that you saw.
>
> I'll note that it's probably worth rearranging all of {fpr,vsr,avr} to the more
> logical configuration presented by Power7 (?) such that it's one array of 64 x
> 128-bit registers.

I have the following for taking care of making the VSRs contiguous 128 bits.
This has touched a lot of code, even outside the tcg directory. So I currently
have 32 AVRs (128 bits) and 32 VSRs (128 bits).

@@ -1026,8 +1027,8 @@ struct CPUPPCState {
     /* Floating point execution context */
     float_status fp_status;
-    /* floating point registers */
-    float64 fpr[32];
+    /* floating point registers multiplexed with vsr */
+
     /* floating point status and control register */
     target_ulong fpscr;

@@ -1078,8 +1079,8 @@ struct CPUPPCState {
     /* Altivec registers */
     ppc_avr_t avr[32];
     uint32_t vscr;
-    /* VSX registers */
-    uint64_t vsr[32];
+    /* 32 (128bit)- VSX registers */
+    ppc_avr_t vsr[32];
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;

Regards,
Nikunj
[Qemu-devel] Using new TCG Vector infrastructure in PowerPC
Hi Richard,

I was working to get TCG vector support for PowerPC[1]. Started with
converting logical operations like vector AND/OR/XOR and compare
instructions. Found some inconsistency during my testing on x86 laptop
emulating PowerPC:

zero =
max  =

1) tcg_gen_andc_vec - vandc in PPC

New API result:
    andc(zero, max)  - (zero & ~max)  =
    andc(max, zero)  - (max & ~zero)  =
    andc(max, max)   - (max & ~max)   =   --> WRONG
    andc(zero, zero) - (zero & ~zero) =

Expected result:
    andc(zero, max)  (zero & ~max)  =
    andc(max, zero)  (max & ~zero)  =
    andc(max, max)   (max & ~max)   =
    andc(zero, zero) (zero & ~zero) =

2) tcg_gen_or_vec - vor in PPC

New API result:
    (zero | max)  =   --> WRONG
    (max | max)   =
    (zero | zero) =

Expected result:
    (zero | max)  =
    (max | max)   =
    (zero | zero) =

3) tcg_gen_cmp_vec(TCG_COND_EQ) - vcmpequ* in PPC

New API result (all incorrect):
    vcmpequb (zero == zero) =
    vcmpequh (zero == zero) =
    vcmpequw (zero == zero) =
    vcmpequd (zero == zero) =

Expected result:
    vcmpequb (zero == zero) =
    vcmpequh (zero == zero) =
    vcmpequw (zero == zero) =
    vcmpequd (zero == zero) =

Do you see something that I am missing here ?

Regards,
Nikunj

1. PowerPC TCG vector infrastructure implementation
   https://github.com/nikunjad/qemu/tree/ppc_vec_0
[Qemu-devel] [PATCH] hw/ppc/spapr, e500: Use new property "stdout-path" for boot console
Linux kernel commit 2a9d832cc9aae21ea827520fef635b6c49a06c6d (of: Add
bindings for chosen node, stdout-path) deprecated chosen property
"linux,stdout-path" and "stdout". Introduce the new property "stdout-path"
and continue supporting the older property to remain compatible with
existing/older firmware. This older property can be deprecated after 5 years.

Signed-off-by: Nikunj A Dadhania
---
 hw/ppc/e500.c  | 7 +++++++
 hw/ppc/spapr.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index a40d3ec3e3..a325a95015 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -119,7 +119,14 @@ static void dt_serial_create(void *fdt, unsigned long long offset,
     qemu_fdt_setprop_string(fdt, "/aliases", alias, ser);
 
     if (defcon) {
+        /*
+         * "linux,stdout-path" and "stdout" properties are deprecated by linux
+         * kernel. New platforms should only use the "stdout-path" property. Set
+         * the new property and continue using older property to remain
+         * compatible with the existing firmware.
+         */
         qemu_fdt_setprop_string(fdt, "/chosen", "linux,stdout-path", ser);
+        qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", ser);
     }
 }
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 83c9d66dd5..58a44edc4a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1062,7 +1062,14 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
     }
 
     if (!spapr->has_graphics && stdout_path) {
+        /*
+         * "linux,stdout-path" and "stdout" properties are deprecated by linux
+         * kernel. New platforms should only use the "stdout-path" property. Set
+         * the new property and continue using older property to remain
+         * compatible with the existing firmware.
+         */
         _FDT(fdt_setprop_string(fdt, chosen, "linux,stdout-path", stdout_path));
+        _FDT(fdt_setprop_string(fdt, chosen, "stdout-path", stdout_path));
     }
 
     spapr_dt_ov5_platform_support(fdt, chosen);
-- 
2.14.3
Re: [Qemu-devel] [PATCH v2 3/4] spapr/rtas: fix reboot of a SMP TCG guest
Cédric Le Goater writes:

> Just like for hot unplugged CPUs, when a guest is rebooted, the
> secondary CPUs can be awakened by the decrementer and start entering
> SLOF at the same time the boot CPU is.
>
> To be safe, let's disable the decrementer interrupt in the LPCR for
> the secondaries.
>
> Based on previous work from Nikunj A Dadhania
>
> Signed-off-by: Cédric Le Goater

Reviewed-by: Nikunj A Dadhania

> ---
>  hw/ppc/spapr_cpu_core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 37beb56e8b18..112868dc39d5 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -20,6 +20,7 @@
>  #include "sysemu/numa.h"
>  #include "sysemu/hw_accel.h"
>  #include "qemu/error-report.h"
> +#include "target/ppc/cpu-models.h"
>
>  void spapr_cpu_parse_features(sPAPRMachineState *spapr)
>  {
> @@ -86,6 +87,17 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>
>      env->spr[SPR_HIOR] = 0;
> +
> +    /* Don't let the decrementer wake up CPUs other than the boot
> +     * CPUs. This can cause issues when rebooting the guest */
> +    if (cs != first_cpu) {
> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
> --
> 2.13.6
Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Benjamin Herrenschmidt writes:

> On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
>> Cédric Le Goater writes:
>>
>> > Hello,
>> >
>> > When a CPU is stopped with the 'stop-self' RTAS call, its state
>> > 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> > account anymore in the cpu_has_work() routine. Only the pending
>> > hardware interrupts are checked with their LPCR:PECE* enablement bit.
>> >
>> > If the DECR timer fires after 'stop-self' is called and before the CPU
>> > 'stop' state is reached, the nearly-dead CPU will have some work to do
>> > and the guest will crash. This case happens very frequently with the
>> > not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> > occasionally fired but after 'stop' state, so no work is to be done
>> > and the guest survives.
>> >
>> > I suspect there is a race between the QEMU mainloop triggering the
>> > timers and the TCG CPU thread but I could not quite identify the root
>> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> > when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
>> reboot case, as reboot path does not call stop-cpu rtas call.
>
> SHouldn't we do it in set_papr too and only turn it on for the boot CPU
> and in start-cpu RTAS call ? Same with the other PECEs in fact...

Yes, +1 for that

Regards
Nikunj
Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Cédric Le Goater writes:

> Hello,
>
> When a CPU is stopped with the 'stop-self' RTAS call, its state
> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> account anymore in the cpu_has_work() routine. Only the pending
> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>
> If the DECR timer fires after 'stop-self' is called and before the CPU
> 'stop' state is reached, the nearly-dead CPU will have some work to do
> and the guest will crash. This case happens very frequently with the
> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> occasionally fired but after 'stop' state, so no work is to be done
> and the guest survives.
>
> I suspect there is a race between the QEMU mainloop triggering the
> timers and the TCG CPU thread but I could not quite identify the root
> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> when the CPU is halted and reenable it when the CPU is restarted.

Moreover, disabling the DECR in the reset path solves the TCG multi cpu
reboot case, as reboot path does not call stop-cpu rtas call.

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 3e20b1d886..c5150ee590 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
     cs->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
 }
 
 static void spapr_cpu_destroy(PowerPCCPU *cpu)

Regards
Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
Cédric Le Goater writes:

> On 09/22/2017 08:00 AM, Nikunj A Dadhania wrote:
>> David Gibson writes:
>>
>>>>>>
>>>>>> As smp_thread defaults to 1 in vl.c, similarly smp_cores also has the
>>>>>> default value of 1 in vl.c. In powernv, we were setting nr-cores like
>>>>>> this:
>>>>>>
>>>>>>     object_property_set_int(chip, smp_cores, "nr-cores",
>>>>>>                             &error_fatal);
>>>>>>
>>>>>> Even when there were multiple cpus (-smp 4), when the guest boots up, we
>>>>>> just get one core (i.e. smp_cores was 1) with single thread(smp_threads
>>>>>> was 1), which is wrong as per the command-line that was provided.
>>>>>
>>>>> Right, so, -smp 4 defaults to 4 sockets, each with 1 core of 1
>>>>> thread.  If you can't supply 4 sockets you should error, but you
>>>>> shouldn't go and change the number of cores per socket.
>>>>
>>>> OK, that makes sense now. And I do see that smp_cpus is 4 in the above
>>>> case. Now looking more into it, i see that powernv has something called
>>>> "num_chips", isnt this same as sockets ? Do we need num_chips separately?
>>>
>>> Ah, yes, I see.  It's probably still reasonable to keep num_chips as
>>> an internal variable, rather than using (smp_cpus / smp_cores /
>>> smp_threads) everywhere.  But we shouldn't have it as a direct
>>> user-settable property, instead setting it from the -smp command line
>>> option.
>>
>> Something like the below works till num_chips=2, after that guest does
>> not boot up. This might be some limitation within the OS, Cedric might
>> have some clue. Otherwise, I see that multiple chips are created with
>> single core having single thread.
>>
>>     ppc/pnv: Use num_chips for multiple sockets
>>
>>     When the user does not provide the cpu topology, e.g. "-smp 4", machine fails to
>>     initialize 4 cpus. QEMU assumes smp_threads and smp_cores both as 1. Make sure
>>     that we initialize multiple chips for this.
>
> -smp 4 would give a machine with 4 sockets with 1 core
> -smp 4,cores=4 would give a machine with 1 socket with 4 cores
>
> correct ?

Yes

Regards
Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

>> >>
>> >> As smp_thread defaults to 1 in vl.c, similarly smp_cores also has the
>> >> default value of 1 in vl.c. In powernv, we were setting nr-cores like
>> >> this:
>> >>
>> >>     object_property_set_int(chip, smp_cores, "nr-cores",
>> >>                             &error_fatal);
>> >>
>> >> Even when there were multiple cpus (-smp 4), when the guest boots up, we
>> >> just get one core (i.e. smp_cores was 1) with single thread(smp_threads
>> >> was 1), which is wrong as per the command-line that was provided.
>> >
>> > Right, so, -smp 4 defaults to 4 sockets, each with 1 core of 1
>> > thread.  If you can't supply 4 sockets you should error, but you
>> > shouldn't go and change the number of cores per socket.
>>
>> OK, that makes sense now. And I do see that smp_cpus is 4 in the above
>> case. Now looking more into it, i see that powernv has something called
>> "num_chips", isnt this same as sockets ? Do we need num_chips separately?
>
> Ah, yes, I see.  It's probably still reasonable to keep num_chips as
> an internal variable, rather than using (smp_cpus / smp_cores /
> smp_threads) everywhere.  But we shouldn't have it as a direct
> user-settable property, instead setting it from the -smp command line
> option.

Something like the below works till num_chips=2, after that guest does
not boot up. This might be some limitation within the OS, Cedric might
have some clue. Otherwise, I see that multiple chips are created with
single core having single thread.

    ppc/pnv: Use num_chips for multiple sockets

    When the user does not provide the cpu topology, e.g. "-smp 4", machine
    fails to initialize 4 cpus. QEMU assumes smp_threads and smp_cores both
    as 1. Make sure that we initialize multiple chips for this.

    Remove the user-settable property num_chips from machine command line
    option.

    Signed-off-by: Nikunj A Dadhania

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 9724719..fa501f9 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -709,7 +709,17 @@ static void ppc_powernv_init(MachineState *machine)
         exit(1);
     }
 
+    pnv->num_chips = smp_cpus / (smp_cores * smp_threads);
     pnv->chips = g_new0(PnvChip *, pnv->num_chips);
+
+    if (smp_cpus != (smp_threads * pnv->num_chips * smp_cores)) {
+        error_report("cpu topology not balanced: "
+                     "chips (%u) * cores (%u) * threads (%u) != "
+                     "number of cpus (%u)",
+                     pnv->num_chips, smp_cores, smp_threads, smp_cpus);
+        exit(1);
+    }
+
     for (i = 0; i < pnv->num_chips; i++) {
         char chip_name[32];
         Object *chip = object_new(chip_typename);
@@ -1255,53 +1265,12 @@ static void pnv_pic_print_info(InterruptStatsProvider *obj,
     }
 }
 
-static void pnv_get_num_chips(Object *obj, Visitor *v, const char *name,
-                              void *opaque, Error **errp)
-{
-    visit_type_uint32(v, name, &POWERNV_MACHINE(obj)->num_chips, errp);
-}
-
-static void pnv_set_num_chips(Object *obj, Visitor *v, const char *name,
-                              void *opaque, Error **errp)
-{
-    PnvMachineState *pnv = POWERNV_MACHINE(obj);
-    uint32_t num_chips;
-    Error *local_err = NULL;
-
-    visit_type_uint32(v, name, &num_chips, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    /*
-     * TODO: should we decide on how many chips we can create based
-     * on #cores and Venice vs. Murano vs. Naples chip type etc...,
-     */
-    if (!is_power_of_2(num_chips) || num_chips > 4) {
-        error_setg(errp, "invalid number of chips: '%d'", num_chips);
-        return;
-    }
-
-    pnv->num_chips = num_chips;
-}
-
 static void powernv_machine_initfn(Object *obj)
 {
     PnvMachineState *pnv = POWERNV_MACHINE(obj);
     pnv->num_chips = 1;
 }
 
-static void powernv_machine_class_props_init(ObjectClass *oc)
-{
-    object_class_property_add(oc, "num-chips", "uint32",
-                              pnv_get_num_chips, pnv_set_num_chips,
-                              NULL, NULL, NULL);
-    object_class_property_set_description(oc, "num-chips",
-                                          "Specifies the number of processor chips",
-                                          NULL);
-}
-
 static void powernv_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1321,8 +1290,6 @@ static void powernv_machine_class_init(ObjectClass *oc, void *data)
     xic->ics_get = pnv_ics_get;
     xic->ics_resend = pnv_ics_resend;
     ispc->print_info = pnv_pic_print_info;
-
-    powernv_machine_class_props_init(oc);
 }
 
 static const TypeInfo powernv_machine_info = {
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

> On Wed, Sep 20, 2017 at 12:48:55PM +0530, Nikunj A Dadhania wrote:
>> David Gibson writes:
>>
>> > On Wed, Sep 20, 2017 at 12:10:48PM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson writes:
>> >>
>> >> > On Wed, Sep 20, 2017 at 10:43:19AM +0530, Nikunj A Dadhania wrote:
>> >> >> David Gibson writes:
>> >> >>
>> >> >> > On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote:
>> >> >> >> David Gibson writes:
>> >> >> >>
>> >> >> >> > On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote:
>> >> >> >> >> David Gibson writes:
>> >> >> >> >>
>> >> >> >> >> > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote:
>> >> >> >> >> >> David Gibson writes:
>> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of online cores
>> >> >> >> >> >> >> is equal to initial online vcpus / threads per core
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>    int boot_cores_nr = smp_cpus / smp_threads;
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Only difference that I see in PowerNV is that we have multiple chips
>> >> >> >> >> >> >> (max 2, at the moment)
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> cores_per_chip = smp_cpus / (smp_threads *
>> >> >> >> >> >> >> pnv->num_chips);
>> >> >> >> >> >> >
>> >> >> >> >> >> > This doesn't make sense to me.  Cores per chip should *always* equal
>> >> >> >> >> >> > smp_cores, you shouldn't need another calculation for it.
>> >> >> >> >> >> >
>> >> >> >> >> >> >> And in case user has provided sane smp_cores, we use it.
>> >> >> >> >> >> >
>> >> >> >> >> >> > If smp_cores isn't sane, you should simply reject it, not try to fix
>> >> >> >> >> >> > it.  That's just asking for confusion.
>> >> >> >> >> >>
>> >> >> >> >> >> This is the case where the user does not provide a topology(which is a
>> >> >> >> >> >> valid scenario), not sure we should reject it. So qemu defaults
>> >> >> >> >> >> smp_cores/smt_threads to 1. I think it makes sense to over-ride.
>> >> >> >> >> >
>> >> >> >> >> > If you can find a way to override it by altering smp_cores when it's
>> >> >> >> >> > not explicitly specified, then ok.
>> >> >> >> >>
>> >> >> >> >> Should I change the global smp_cores here as well ?
>> >> >> >> >
>> >> >> >> > I'm pretty uneasy with that option.
>> >> >> >>
>> >> >> >> Me too.
>> >> >> >>
>> >> >> >> > It would take a fair bit of checking to ensure that changing smp_cores
>> >> >> >> > is safe here.  An easier to verify option would be to make the generic
>> >> >> >> > logic which splits up an unspecified -smp N into cores and sockets
>> >> >> >> > more flexible, possibly based on machine
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

> On Wed, Sep 20, 2017 at 12:10:48PM +0530, Nikunj A Dadhania wrote:
>> David Gibson writes:
>>
>> > On Wed, Sep 20, 2017 at 10:43:19AM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson writes:
>> >>
>> >> > On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote:
>> >> >> David Gibson writes:
>> >> >>
>> >> >> > On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote:
>> >> >> >> David Gibson writes:
>> >> >> >>
>> >> >> >> > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote:
>> >> >> >> >> David Gibson writes:
>> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of online cores
>> >> >> >> >> >> is equal to initial online vcpus / threads per core
>> >> >> >> >> >>
>> >> >> >> >> >>    int boot_cores_nr = smp_cpus / smp_threads;
>> >> >> >> >> >>
>> >> >> >> >> >> Only difference that I see in PowerNV is that we have multiple chips
>> >> >> >> >> >> (max 2, at the moment)
>> >> >> >> >> >>
>> >> >> >> >> >> cores_per_chip = smp_cpus / (smp_threads *
>> >> >> >> >> >> pnv->num_chips);
>> >> >> >> >> >
>> >> >> >> >> > This doesn't make sense to me.  Cores per chip should *always* equal
>> >> >> >> >> > smp_cores, you shouldn't need another calculation for it.
>> >> >> >> >> >
>> >> >> >> >> >> And in case user has provided sane smp_cores, we use it.
>> >> >> >> >> >
>> >> >> >> >> > If smp_cores isn't sane, you should simply reject it, not try to fix
>> >> >> >> >> > it.  That's just asking for confusion.
>> >> >> >> >>
>> >> >> >> >> This is the case where the user does not provide a topology(which is a
>> >> >> >> >> valid scenario), not sure we should reject it. So qemu defaults
>> >> >> >> >> smp_cores/smt_threads to 1. I think it makes sense to over-ride.
>> >> >> >> >
>> >> >> >> > If you can find a way to override it by altering smp_cores when it's
>> >> >> >> > not explicitly specified, then ok.
>> >> >> >>
>> >> >> >> Should I change the global smp_cores here as well ?
>> >> >> >
>> >> >> > I'm pretty uneasy with that option.
>> >> >>
>> >> >> Me too.
>> >> >>
>> >> >> > It would take a fair bit of checking to ensure that changing smp_cores
>> >> >> > is safe here.  An easier to verify option would be to make the generic
>> >> >> > logic which splits up an unspecified -smp N into cores and sockets
>> >> >> > more flexible, possibly based on machine options for max values.
>> >> >> >
>> >> >> > That might still be more trouble than its worth.
>> >> >>
>> >> >> I think the current approach is the simplest and less intrusive, as we
>> >> >> are handling a case where user has not bothered to provide a detailed
>> >> >> topology, the best we can do is create single threaded cores equal to
>> >> >> number of cores.
>> >> >
>> >> > No, sorry.  Having smp_cores not correspond to the number of cores per
>> >> > chip in all cases is just not ok.  Add an error message if the
>> >> > topology isn't workable for powernv by all means.  But users having to
>> >> > use a longer command line is better than breaking basic assumptions
>> >> > about what numbers reflect what topology.
>> >>
>> >> Sorry to ask again, as I am still not convinced, we do similar
>> >> adjustment in spapr where the user did not provide the number of cores,
>> >> but qemu assumes them as single threaded cores and created
>> >> cores(boot_cores_nr) that were not same as smp_cores ?
>> >
>> > What?  boot_cores_nr has absolutely nothing to do with adjusting the
>> > topology, and it certainly doesn't assume they're single threaded.
>>
>> When we start a TCG guest and user provides following commandline, e.g.
>> "-smp 4", smt_threads is set to 1 by default in vl.c. So the guest boots
>> with 4 cores, each having 1 thread.
>
> Ok.. and what's the problem with that behaviour on powernv?

As smp_threads defaults to 1 in vl.c, similarly smp_cores also has the
default value of 1 in vl.c. In powernv, we were setting nr-cores like
this:

    object_property_set_int(chip, smp_cores, "nr-cores", &error_fatal);

Even when there were multiple cpus (-smp 4), when the guest boots up, we
just get one core (i.e. smp_cores was 1) with a single thread (smp_threads
was 1), which is wrong as per the command-line that was provided.

Regards
Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
Nikunj A Dadhania writes:

>>
>>
>>> >> I think the current approach is the simplest and less intrusive, as we
>>> >> are handling a case where user has not bothered to provide a detailed
>>> >> topology, the best we can do is create single threaded cores equal to
>>> >> number of cores.
>>> >
>>> > No, sorry.  Having smp_cores not correspond to the number of cores per
>>> > chip in all cases is just not ok.  Add an error message if the
>>> > topology isn't workable for powernv by all means.  But users having to
>>> > use a longer command line is better than breaking basic assumptions
>>> > about what numbers reflect what topology.
>>>
>>> Sorry to ask again, as I am still not convinced, we do similar
>>> adjustment in spapr where the user did not provide the number of cores,
>>> but qemu assumes them as single threaded cores and created
>>> cores(boot_cores_nr) that were not same as smp_cores ?
>>
>> What?  boot_cores_nr has absolutely nothing to do with adjusting the
>> topology, and it certainly doesn't assume they're single threaded.
>
> When we start a TCG guest and user provides following commandline,
> e.g. "-smp 4", smt_threads is set to 1 by default in vl.c.

I meant smp_threads here.

> So the guest boots with 4 cores, each having 1 thread.
>
> Regards
> Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes: > On Wed, Sep 20, 2017 at 10:43:19AM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote: >> >> David Gibson writes: >> >> >> >> > On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote: >> >> >> David Gibson writes: >> >> >> >> >> >> > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote: >> >> >> >> David Gibson writes: >> >> >> >> >> >> >> >> >> >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of >> >> >> >> >> online cores >> >> >> >> >> is equal to initial online vcpus / threads per core >> >> >> >> >> >> >> >> >> >>int boot_cores_nr = smp_cpus / smp_threads; >> >> >> >> >> >> >> >> >> >> Only difference that I see in PowerNV is that we have multiple >> >> >> >> >> chips >> >> >> >> >> (max 2, at the moment) >> >> >> >> >> >> >> >> >> >> cores_per_chip = smp_cpus / (smp_threads * >> >> >> >> >> pnv->num_chips); >> >> >> >> > >> >> >> >> > This doesn't make sense to me. Cores per chip should *always* >> >> >> >> > equal >> >> >> >> > smp_cores, you shouldn't need another calculation for it. >> >> >> >> > >> >> >> >> >> And in case user has provided sane smp_cores, we use it. >> >> >> >> > >> >> >> >> > If smp_cores isn't sane, you should simply reject it, not try to >> >> >> >> > fix >> >> >> >> > it. That's just asking for confusion. >> >> >> >> >> >> >> >> This is the case where the user does not provide a topology(which >> >> >> >> is a >> >> >> >> valid scenario), not sure we should reject it. So qemu defaults >> >> >> >> smp_cores/smt_threads to 1. I think it makes sense to over-ride. >> >> >> > >> >> >> > If you can find a way to override it by altering smp_cores when it's >> >> >> > not explicitly specified, then ok. >> >> >> >> >> >> Should I change the global smp_cores here as well ? >> >> > >> >> > I'm pretty uneasy with that option. >> >> >> >> Me too. 
>> >> >> >> > It would take a fair bit of checking to ensure that changing smp_cores >> >> > is safe here. An easier to verify option would be to make the generic >> >> > logic which splits up an unspecified -smp N into cores and sockets >> >> > more flexible, possibly based on machine options for max values. >> >> > >> >> > That might still be more trouble than its worth. >> >> >> >> I think the current approach is the simplest and less intrusive, as we >> >> are handling a case where user has not bothered to provide a detailed >> >> topology, the best we can do is create single threaded cores equal to >> >> number of cores. >> > >> > No, sorry. Having smp_cores not correspond to the number of cores per >> > chip in all cases is just not ok. Add an error message if the >> > topology isn't workable for powernv by all means. But users having to >> > use a longer command line is better than breaking basic assumptions >> > about what numbers reflect what topology. >> >> Sorry to ask again, as I am still not convinced, we do similar >> adjustment in spapr where the user did not provide the number of cores, >> but qemu assumes them as single threaded cores and created >> cores(boot_cores_nr) that were not same as smp_cores ? > > What? boot_cores_nr has absolutely nothing to do with adjusting the > topology, and it certainly doesn't assume they're single threaded. When we start a TCG guest and user provides following commandline, e.g. "-smp 4", smt_threads is set to 1 by default in vl.c. So the guest boots with 4 cores, each having 1 thread. Regards Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes: > On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote: >> >> David Gibson writes: >> >> >> >> > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote: >> >> >> David Gibson writes: >> >> >> >> >> >> >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of online >> >> >> >> cores >> >> >> >> is equal to initial online vcpus / threads per core >> >> >> >> >> >> >> >>int boot_cores_nr = smp_cpus / smp_threads; >> >> >> >> >> >> >> >> Only difference that I see in PowerNV is that we have multiple chips >> >> >> >> (max 2, at the moment) >> >> >> >> >> >> >> >> cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips); >> >> >> > >> >> >> > This doesn't make sense to me. Cores per chip should *always* equal >> >> >> > smp_cores, you shouldn't need another calculation for it. >> >> >> > >> >> >> >> And in case user has provided sane smp_cores, we use it. >> >> >> > >> >> >> > If smp_cores isn't sane, you should simply reject it, not try to fix >> >> >> > it. That's just asking for confusion. >> >> >> >> >> >> This is the case where the user does not provide a topology(which is a >> >> >> valid scenario), not sure we should reject it. So qemu defaults >> >> >> smp_cores/smt_threads to 1. I think it makes sense to over-ride. >> >> > >> >> > If you can find a way to override it by altering smp_cores when it's >> >> > not explicitly specified, then ok. >> >> >> >> Should I change the global smp_cores here as well ? >> > >> > I'm pretty uneasy with that option. >> >> Me too. >> >> > It would take a fair bit of checking to ensure that changing smp_cores >> > is safe here. An easier to verify option would be to make the generic >> > logic which splits up an unspecified -smp N into cores and sockets >> > more flexible, possibly based on machine options for max values. 
>> > >> > That might still be more trouble than its worth. >> >> I think the current approach is the simplest and less intrusive, as we >> are handling a case where user has not bothered to provide a detailed >> topology, the best we can do is create single threaded cores equal to >> number of cores. > > No, sorry. Having smp_cores not correspond to the number of cores per > chip in all cases is just not ok. Add an error message if the > topology isn't workable for powernv by all means. But users having to > use a longer command line is better than breaking basic assumptions > about what numbers reflect what topology. Sorry to ask again, as I am still not convinced, we do similar adjustment in spapr where the user did not provide the number of cores, but qemu assumes them as single threaded cores and created cores(boot_cores_nr) that were not same as smp_cores ? Regards Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes: > On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote: >> >> David Gibson writes: >> >> >> >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of online cores >> >> >> is equal to initial online vcpus / threads per core >> >> >> >> >> >>int boot_cores_nr = smp_cpus / smp_threads; >> >> >> >> >> >> Only difference that I see in PowerNV is that we have multiple chips >> >> >> (max 2, at the moment) >> >> >> >> >> >> cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips); >> >> > >> >> > This doesn't make sense to me. Cores per chip should *always* equal >> >> > smp_cores, you shouldn't need another calculation for it. >> >> > >> >> >> And in case user has provided sane smp_cores, we use it. >> >> > >> >> > If smp_cores isn't sane, you should simply reject it, not try to fix >> >> > it. That's just asking for confusion. >> >> >> >> This is the case where the user does not provide a topology(which is a >> >> valid scenario), not sure we should reject it. So qemu defaults >> >> smp_cores/smt_threads to 1. I think it makes sense to over-ride. >> > >> > If you can find a way to override it by altering smp_cores when it's >> > not explicitly specified, then ok. >> >> Should I change the global smp_cores here as well ? > > I'm pretty uneasy with that option. Me too. > It would take a fair bit of checking to ensure that changing smp_cores > is safe here. An easier to verify option would be to make the generic > logic which splits up an unspecified -smp N into cores and sockets > more flexible, possibly based on machine options for max values. > > That might still be more trouble than its worth. I think the current approach is the simplest and less intrusive, as we are handling a case where user has not bothered to provide a detailed topology, the best we can do is create single threaded cores equal to number of cores. 
Regards Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes: > On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> >> >> >> I thought, I am doing the same here for PowerNV, number of online cores >> >> is equal to initial online vcpus / threads per core >> >> >> >>int boot_cores_nr = smp_cpus / smp_threads; >> >> >> >> Only difference that I see in PowerNV is that we have multiple chips >> >> (max 2, at the moment) >> >> >> >> cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips); >> > >> > This doesn't make sense to me. Cores per chip should *always* equal >> > smp_cores, you shouldn't need another calculation for it. >> > >> >> And in case user has provided sane smp_cores, we use it. >> > >> > If smp_cores isn't sane, you should simply reject it, not try to fix >> > it. That's just asking for confusion. >> >> This is the case where the user does not provide a topology(which is a >> valid scenario), not sure we should reject it. So qemu defaults >> smp_cores/smt_threads to 1. I think it makes sense to over-ride. > > If you can find a way to override it by altering smp_cores when it's > not explicitly specified, then ok. Should I change the global smp_cores here as well ? > But overriding smp_cores with a different variable that's the "real" > number of cores is not acceptable. If that means the user has to > specify cores explicitly, so be it. Right, we would error out in case there is mismatch. > Slight awkwardness in command line is preferable to breaking the > assumption that smp_cores == (# of cores per next level up cpu object) > which is used all over the place. Regards Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

>>
>> I thought, I am doing the same here for PowerNV, number of online cores
>> is equal to initial online vcpus / threads per core
>>
>>    int boot_cores_nr = smp_cpus / smp_threads;
>>
>> Only difference that I see in PowerNV is that we have multiple chips
>> (max 2, at the moment)
>>
>>    cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips);
>
> This doesn't make sense to me. Cores per chip should *always* equal
> smp_cores, you shouldn't need another calculation for it.
>
>> And in case user has provided sane smp_cores, we use it.
>
> If smp_cores isn't sane, you should simply reject it, not try to fix
> it. That's just asking for confusion.

This is the case where the user does not provide a topology (which is a
valid scenario); not sure we should reject it. So qemu defaults
smp_cores/smp_threads to 1. I think it makes sense to override.

Regards
Nikunj
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
David Gibson writes: > On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote: >> >> David Gibson writes: >> >> >> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote: >> >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. >> >> >> >> >> >> When reset happens, all the CPUs are in halted state. First CPU is >> >> >> brought out >> >> >> of reset and secondary CPUs would be initialized by the guest kernel >> >> >> using a >> >> >> rtas call start-cpu. >> >> >> >> >> >> However, in case of TCG, decrementer interrupts keep on coming and >> >> >> waking the >> >> >> secondary CPUs up. >> >> >> >> >> >> These secondary CPUs would see the decrementer interrupt pending, >> >> >> which makes >> >> >> cpu::has_work() to bring them out of wait loop and start executing >> >> >> tcg_exec_cpu(). >> >> >> >> >> >> The problem with this is all the CPUs wake up and start booting SLOF >> >> >> image, >> >> >> causing the following exception(4 CPUs TCG VM): >> >> > >> >> > Ok, I'm still trying to understand why the behaviour on reboot is >> >> > different from the first boot. >> >> >> >> During first boot, the cpu is in the stopped state, so >> >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state >> >> until rtas start-cpu. Therefore, we never check the cpu_has_work() >> >> >> >> In case of reboot, all CPUs are resumed after reboot. So we check the >> >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a >> >> DECR interrupt and remove the CPU from halted state as the CPU has >> >> work. >> > >> > Ok, so it sounds like we should set stopped on all the secondary CPUs >> > on reset as well. What's causing them to be resumed after the reset >> > at the moment? >> >> That is part of the main loop in vl.c, when reset is requested. 
>> All the vcpus are paused (stopped == true), then system reset is issued,
>> and all cpus are resumed (stopped == false). Which is correct.
>
> Is it? Seems we have a different value of 'stopped' on the first boot
> compared to reboots, which doesn't seem right.

I have checked this with more debugging prints (patch at the end); as you
said, first boot and reboot do not have different values of cpu::stopped.

    cpu_ppc_decr_excp-cpu1: stop 0 stopped 1 halted 0 SPR_LPCR 0
    spapr_cpu_reset-cpu1:   stop 0 stopped 1 halted 1 SPR_LPCR 2000
    spapr_cpu_reset-cpu1:   stop 0 stopped 1 halted 1 SPR_LPCR 2000
    resume_all_vcpus-cpu0:  stop 0 stopped 0 halted 0
    resume_all_vcpus-cpu1:  stop 0 stopped 0 halted 1

    SLOF **
    QEMU Starting

    [ boot fine and then we reboot ]

    cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
    cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
    cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
    cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 1 SPR_LPCR 2000
    [   54.044388] reboot: Restarting system
    spapr_cpu_reset-cpu1:   stop 0 stopped 1 halted 1 SPR_LPCR 2000
    resume_all_vcpus-cpu0:  stop 0 stopped 0 halted 0
    resume_all_vcpus-cpu1:  stop 0 stopped 0 halted 1
    cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 1 SPR_LPCR 2000
    SSLLOOFF ***[0m
    ** QEMU Starting
     Build Date = Aug 16 2017 12:38:57
     FW Version = git-685af54d8a47a42d
    *** QEMU Starting
     Build Date = Aug 16 2017 12:38:57
     FW Version = git-685af54d8a47a42d
    ERROR: Flatten device tree not available!
    SPRG2 = RSPRG3 = 0 0

One difference I see here is that the decr exception is delivered before
reset in case of first boot for the secondary cpu, and then I do not see
any decr exception until rtas start-cpu. In case of a reboot, we are
getting a decr exception at some interval; then there is spapr_cpu_reset,
after that the cpus are resumed, and the interrupt gets delivered just
after that, which brings CPU-1 out of the halted state.

Other thing is, could it be a
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

> On Mon, Sep 11, 2017 at 10:40:10AM +0530, Nikunj A Dadhania wrote:
>> David Gibson writes:
>>
>> > On Wed, Sep 06, 2017 at 01:57:48PM +0530, Nikunj A Dadhania wrote:
>> >> When the user does not provide the cpu topology, e.g. "-smp 4", machine
>> >> fails to initialize 4 cpus. Compute the chip per cores depending on the
>> >> number of chips and smt threads.
>> >>
>> >> Signed-off-by: Nikunj A Dadhania
>> >
>> > I don't understand why simply treating smp_cores as cores per chip is
>> > wrong.
>>
>> We do not have SMT support and when "-smp 4" is passed, smp_cores is
>> always set to 1. So only once core with one thread finally show up in
>> the guest. Moreover, I see spapr too doing similar thing in
>> spapr_init_cpus() with boot_cores_nr.
>
> I'm ok with adding an error if smp_threads > 1, since that won't
> work. Breaking the identity that smp_cores is the number of cores per
> socket (which should correspond to one chip) is not ok, though.
>
> I think you're misinterpreting the boot_cores_nr stuff - that's just
> about translating the number of initially online vcpus into a number
> of initially online cores.

I thought I am doing the same here for PowerNV: the number of online cores
is equal to initial online vcpus / threads per core:

    int boot_cores_nr = smp_cpus / smp_threads;

The only difference that I see in PowerNV is that we have multiple chips
(max 2, at the moment):

    cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips);

And in case the user has provided a sane smp_cores, we use it.

Regards,
Nikunj
Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
David Gibson writes:

> On Wed, Sep 06, 2017 at 01:57:48PM +0530, Nikunj A Dadhania wrote:
>> When the user does not provide the cpu topology, e.g. "-smp 4", machine
>> fails to initialize 4 cpus. Compute the chip per cores depending on the
>> number of chips and smt threads.
>>
>> Signed-off-by: Nikunj A Dadhania
>
> I don't understand why simply treating smp_cores as cores per chip is wrong.

We do not have SMT support, and when "-smp 4" is passed, smp_cores is
always set to 1. So only one core with one thread finally shows up in
the guest. Moreover, I see spapr doing a similar thing in
spapr_init_cpus() with boot_cores_nr.

Regards
Nikunj
[Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus
When the user does not provide the cpu topology, e.g. "-smp 4", the machine
fails to initialize 4 cpus. Compute the cores per chip depending on the
number of chips and smt threads.

Signed-off-by: Nikunj A Dadhania
---
 hw/ppc/pnv.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 9724719..3fbaafb 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -642,7 +642,7 @@ static void ppc_powernv_init(MachineState *machine)
     MemoryRegion *ram;
     char *fw_filename;
     long fw_size;
-    int i;
+    int i, cores_per_chip;
     char *chip_typename;
     PCIBus *pbus;
     bool has_gfx = false;
@@ -710,6 +710,22 @@ static void ppc_powernv_init(MachineState *machine)
     }
 
     pnv->chips = g_new0(PnvChip *, pnv->num_chips);
+
+    /* If user has specified number of cores, use it. Otherwise, compute it. */
+    if (smp_cores != 1) {
+        cores_per_chip = smp_cores;
+    } else {
+        cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips);
+    }
+
+    if (smp_cpus != (smp_threads * pnv->num_chips * cores_per_chip)) {
+        error_report("cpu topology not balanced: "
+                     "chips (%u) * cores (%u) * threads (%u) != "
+                     "number of cpus (%u)",
+                     pnv->num_chips, cores_per_chip, smp_threads, smp_cpus);
+        exit(1);
+    }
+
     for (i = 0; i < pnv->num_chips; i++) {
         char chip_name[32];
         Object *chip = object_new(chip_typename);
@@ -728,7 +744,7 @@ static void ppc_powernv_init(MachineState *machine)
         object_property_add_child(OBJECT(pnv), chip_name, chip, &error_fatal);
         object_property_set_int(chip, PNV_CHIP_HWID(i), "chip-id",
                                 &error_fatal);
-        object_property_set_int(chip, smp_cores, "nr-cores", &error_fatal);
+        object_property_set_int(chip, cores_per_chip, "nr-cores", &error_fatal);
         object_property_set_int(chip, 1, "num-phbs", &error_fatal);
         object_property_set_bool(chip, true, "realized", &error_fatal);
     }
-- 
2.9.3
Re: [Qemu-devel] [PATCH v3 1/5] ppc: spapr: Register and handle HCALL to receive updated RTAS region
David Gibson writes: > On Wed, Aug 16, 2017 at 02:42:13PM +0530, Aravinda Prasad wrote: >> Receive updates from SLOF about the updated rtas-base. >> A separate patch for SLOF [1] adds functionality to invoke >> a private HCALL whenever OS issues instantiate-rtas with >> a new rtas-base. >> >> This is required as QEMU needs to know the updated rtas-base >> as it allocates error reporting structure in RTAS space upon >> a machine check exception. >> >> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120386.html >> >> Signed-off-by: Aravinda Prasad >> Reviewed-by: David Gibson > > Actually, I take back this R-b, see below. > > In any case I'm not willing to apply the patches which depend on this > until the corresponding SLOF update is merged as well. We already have this sitting around in SLOF for a while https://git.qemu.org/gitweb.cgi?p=SLOF.git;a=commit;f=lib/libhvcall/libhvcall.h;h=f9a60de30492863811c2cdf6f28988c9e8a2c3d9 Regards Nikunj
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
David Gibson writes: > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote: >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. >> >> >> >> When reset happens, all the CPUs are in halted state. First CPU is >> >> brought out >> >> of reset and secondary CPUs would be initialized by the guest kernel >> >> using a >> >> rtas call start-cpu. >> >> >> >> However, in case of TCG, decrementer interrupts keep on coming and waking >> >> the >> >> secondary CPUs up. >> >> >> >> These secondary CPUs would see the decrementer interrupt pending, which >> >> makes >> >> cpu::has_work() to bring them out of wait loop and start executing >> >> tcg_exec_cpu(). >> >> >> >> The problem with this is all the CPUs wake up and start booting SLOF >> >> image, >> >> causing the following exception(4 CPUs TCG VM): >> > >> > Ok, I'm still trying to understand why the behaviour on reboot is >> > different from the first boot. >> >> During first boot, the cpu is in the stopped state, so >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state >> until rtas start-cpu. Therefore, we never check the cpu_has_work() >> >> In case of reboot, all CPUs are resumed after reboot. So we check the >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a >> DECR interrupt and remove the CPU from halted state as the CPU has >> work. > > Ok, so it sounds like we should set stopped on all the secondary CPUs > on reset as well. What's causing them to be resumed after the reset > at the moment? That is part of the main loop in vl.c, when reset is requested. All the vcpus are paused (stopped == true) then system reset is issued, and all cpus are resumed (stopped == false). Which is correct. static bool main_loop_should_exit(void) { [...] 
    request = qemu_reset_requested();
    if (request) {
        pause_all_vcpus();
        qemu_system_reset(request);
        resume_all_vcpus();
        if (!runstate_check(RUN_STATE_RUNNING) &&
                !runstate_check(RUN_STATE_INMIGRATE)) {
            runstate_set(RUN_STATE_PRELAUNCH);
        }
    }
    [...]
}

The CPUs are in halted state, i.e. cpu::halted = 1. We have set
cpu::halted = 0 for the primary CPU, aka first_cpu, in
spapr.c::ppc_spapr_reset().

In case of TCG, we have a decr callback (per CPU) scheduled once the
machine is started, which keeps on running and interrupting irrespective
of the state of the machine:

    tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);

This keeps on queueing the interrupt to the CPUs. All the other CPUs are
in a tight loop checking cpu_thread_is_idle(), which returns false when it
sees the decrementer interrupt, and the cpu starts executing:

    cpu_exec() -> cpu_handle_halt() -> sees cpu_has_work() and sets cpu->halted = 0

And the execution resumes, when it shouldn't have. Ideally, for secondary
CPUs, the cpu::halted flag is cleared in the rtas start-cpu call.

Initially, I thought we should not have interrupts during reset state.
That was the reason for ignoring the decr interrupt when msr_ee was
disabled in my previous patch. BenH suggested that it is wrong from a HW
perspective.

Regards,
Nikunj
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
David Gibson writes: > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote: >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. >> >> When reset happens, all the CPUs are in halted state. First CPU is brought >> out >> of reset and secondary CPUs would be initialized by the guest kernel using a >> rtas call start-cpu. >> >> However, in case of TCG, decrementer interrupts keep on coming and waking the >> secondary CPUs up. >> >> These secondary CPUs would see the decrementer interrupt pending, which makes >> cpu::has_work() to bring them out of wait loop and start executing >> tcg_exec_cpu(). >> >> The problem with this is all the CPUs wake up and start booting SLOF image, >> causing the following exception(4 CPUs TCG VM): > > Ok, I'm still trying to understand why the behaviour on reboot is > different from the first boot. During first boot, the cpu is in the stopped state, so cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state until rtas start-cpu. Therefore, we never check the cpu_has_work() In case of reboot, all CPUs are resumed after reboot. So we check the next condition cpu_has_work() in cpu_thread_is_idle(), where we see a DECR interrupt and remove the CPU from halted state as the CPU has work. > AFAICT on initial boot, the LPCR will > have DEE / PECE3 enabled. So why aren't we getting the same problem > then? Regards Nikunj
[Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. When reset happens, all the CPUs are in halted state. First CPU is brought out of reset and secondary CPUs would be initialized by the guest kernel using a rtas call start-cpu. However, in case of TCG, decrementer interrupts keep on coming and waking the secondary CPUs up. These secondary CPUs would see the decrementer interrupt pending, which makes cpu::has_work() to bring them out of wait loop and start executing tcg_exec_cpu(). The problem with this is all the CPUs wake up and start booting SLOF image, causing the following exception(4 CPUs TCG VM): [ 81.440850] reboot: Restarting system SLOF S SLOF SLOFLOF[0[0m ** QEMU Starting Build Date = Mar 3 2017 13:29:19 FW Version = git-66d250ef0fd06bb8 [0m ** QEMU Starting Build Date = Mar 3 2017 13:29:19 FW Version = git-66d250ef0fd06bb8 [0m *m**[?25l ** QEMU Starting Build Date = Mar 3 2017 13:29:19 FW Version = git-66d250ef0fd06bb8 *** QEMU Starting Build Date = Mar 3 2017 13:29:19 FW Version = git-66d250ef0fd06bb8 ERROR: Flatten device tree not available! exception 300 SRR0 = 60e4 SRR1 = 80008000 SPRG2 = 0040 SPRG3 = 4bd8 ERROR: Flatten device tree not available! exception 300 SRR0 = 60e4 SRR1 = 80008000 SPRG2 = 0040 SPRG3 = 4bd8 During reset, disable decrementer interrupt for secondary CPUs and enable them when the secondary CPUs are brought online by rtas start-cpu call. 
Reported-by: Bharata B Rao
Signed-off-by: Nikunj A Dadhania
---
 hw/ppc/spapr_cpu_core.c | 9 +++++++++
 hw/ppc/spapr_rtas.c     | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ea278ce..bbfe8c2 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
 
     env->spr[SPR_HIOR] = 0;
 
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
     /*
      * This is a hack for the benefit of KVM PR - it abuses the SDR1
      * slot in kvm_sregs to communicate the userspace address of the
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 94a2799..4623d1d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         kvm_cpu_synchronize_state(cs);
 
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
+
+        /* Enable DECR interrupt */
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] |= LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
+        }
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
-- 
2.9.3
[Qemu-devel] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled
Rebooting an SMP TCG guest is broken for both single/multi threaded TCG. When
reset happens, all the CPUs are in halted state. The first CPU is brought out
of reset, and the secondary CPUs would be initialized by the guest kernel
using the rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up. These secondary CPUs would see the decrementer interrupt
pending, which makes cpu::has_work() bring them out of the wait loop and
start executing tcg_exec_cpu().

The problem with this is that all the CPUs wake up and start booting the SLOF
image, causing the following exception (4-CPU TCG VM):

[ 81.440850] reboot: Restarting system
SLOF S SLOF SLOFLOF[0[0m
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *m**[?25l
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
*** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!
exception 300
SRR0 = 60e4 SRR1 = 80008000
SPRG2 = 0040 SPRG3 = 4bd8
ERROR: Flatten device tree not available!
exception 300
SRR0 = 60e4 SRR1 = 80008000
SPRG2 = 0040 SPRG3 = 4bd8

Reported-by: Bharata B Rao
Tested-by: Cédric Le Goater
Signed-off-by: Nikunj A Dadhania
---
 target/ppc/translate_init.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 783bf98..8b98076 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-        return true;
+        /* Return true only when MSR_EE is enabled */
+        return msr_ee;
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
         (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
@@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs)
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-        return true;
+        /* Return true only when MSR_EE is enabled */
+        return msr_ee;
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
         (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
@@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs)
     /* Decrementer Exception */
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_DEE)) {
-        return true;
+        /* Return true only when MSR_EE is enabled */
+        return msr_ee;
     }
     /* Machine Check or Hypervisor Maintenance Exception */
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |
-- 
2.9.3
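The gating this patch adds can be modelled in isolation. Below is a toy sketch, not QEMU code — the struct, function names, and bit values are all illustrative — contrasting the halted-CPU has_work behaviour before and after requiring MSR_EE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the real interrupt/LPCR bits. */
#define TOY_INTERRUPT_DECR  (1u << 3)
#define TOY_LPCR_PECE_DECR  (1u << 10)

struct toy_cpu {
    bool halted;
    bool msr_ee;                 /* MSR[EE]: external interrupts enabled */
    uint32_t pending_interrupts;
    uint32_t lpcr;
};

/* Before the fix: a pending DECR with the PECE wakeup bit set always
 * reports work for a halted CPU, even with MSR[EE] == 0. */
static bool has_work_broken(const struct toy_cpu *cs)
{
    if (cs->halted) {
        return (cs->pending_interrupts & TOY_INTERRUPT_DECR) &&
               (cs->lpcr & TOY_LPCR_PECE_DECR);
    }
    return cs->msr_ee && cs->pending_interrupts;
}

/* After the fix: the halted path additionally requires MSR[EE],
 * so secondary CPUs sitting in their post-reset wait loop stay idle. */
static bool has_work_fixed(const struct toy_cpu *cs)
{
    if (cs->halted) {
        if ((cs->pending_interrupts & TOY_INTERRUPT_DECR) &&
            (cs->lpcr & TOY_LPCR_PECE_DECR)) {
            return cs->msr_ee;
        }
        return false;
    }
    return cs->msr_ee && cs->pending_interrupts;
}
```

A halted secondary CPU with a pending decrementer interrupt and MSR[EE] clear is reported runnable by the broken variant but stays asleep with the fixed one.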
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] spapr: ignore interrupts during reset state
Cédric Le Goater writes: > On 07/13/2017 11:10 AM, Nikunj A Dadhania wrote: >> Cédric Le Goater writes: >> >>> On 07/13/2017 09:55 AM, Nikunj A Dadhania wrote: >>>> Cédric Le Goater writes: >>>> >>>>>>> How about the following, we do not report work until MSR_EE is disabled: >>>>>> >>>>>> With this fix, I could test the XIVE<->XICS transitions at reboot >>>>>> under TCG. However, the second boot is very slow for some reason. >>>>> >>>>> hmm, I am not sure this is related but I just got : >>>> >>>> Havent seen in my setup after around 10 reboot cycles, I was using 2 >>>> cores pseries setup. Lets give it some more testing. When did this >>>> happen, during boot ? >>> >>> yes. >>> >>> I could not reproduce either :/ but I am keeping the patch. qemu runs >>> with : >>> >>> -m 2G -M pseries -accel tcg,thread=multi -cpu POWER9 -smp cores=4,maxcpus=8 >>> -realtime mlock=off -kernel ./vmlinux-4.12.0+ -initrd ./initrd.img-4.12.0+ >>> -append 'console=hvc0 dyndbg="file arch/powerpc/sysdev/xive/* +p"' >>> -nographic -nodefaults -serial mon:stdio -snapshot -d guest_errors,unimp >>> -no-shutdown >>> >> >> With 4 cores I am seeing hangs occasionally, although I havent seen a >> crash. But seems to be similar problem that you had seen. > > The results are good with 4 and 8 cores and so you can add my Tested-by: Did you try my last patch? Regards Nikunj
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] spapr: ignore interrupts during reset state
Nikunj A Dadhania writes:

> Cédric Le Goater writes:
>
>> On 07/13/2017 09:55 AM, Nikunj A Dadhania wrote:
>>> Cédric Le Goater writes:
>>>
>>>>>> How about the following, we do not report work until MSR_EE is disabled:
>>>>>
>>>>> With this fix, I could test the XIVE<->XICS transitions at reboot
>>>>> under TCG. However, the second boot is very slow for some reason.
>>>>
>>>> hmm, I am not sure this is related but I just got :
>>>
>>> Havent seen in my setup after around 10 reboot cycles, I was using 2
>>> cores pseries setup. Lets give it some more testing. When did this
>>> happen, during boot ?
>>
>> yes.
>>
>> I could not reproduce either :/ but I am keeping the patch. qemu runs
>> with :
>>
>> -m 2G -M pseries -accel tcg,thread=multi -cpu POWER9 -smp cores=4,maxcpus=8
>> -realtime mlock=off -kernel ./vmlinux-4.12.0+ -initrd ./initrd.img-4.12.0+
>> -append 'console=hvc0 dyndbg="file arch/powerpc/sysdev/xive/* +p"'
>> -nographic -nodefaults -serial mon:stdio -snapshot -d guest_errors,unimp
>> -no-shutdown
>
> With 4 cores I am seeing hangs occasionally, although I havent seen a
> crash. But seems to be similar problem that you had seen.

Can you try this one, a localized patch, only taking care of the DECR
interrupt. I am not seeing the hangs with this one.

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 783bf98..07e405f 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8536,7 +8536,7 @@ static bool cpu_has_work_POWER7(CPUState *cs)
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-        return true;
+        return msr_ee ? true : false;
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
         (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
@@ -8693,7 +8693,7 @@ static bool cpu_has_work_POWER8(CPUState *cs)
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-        return true;
+        return msr_ee ? true : false;
     }
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
         (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
@@ -8876,7 +8876,7 @@ static bool cpu_has_work_POWER9(CPUState *cs)
     /* Decrementer Exception */
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
         (env->spr[SPR_LPCR] & LPCR_DEE)) {
-        return true;
+        return msr_ee ? true : false;
     }
     /* Machine Check or Hypervisor Maintenance Exception */
     if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] spapr: ignore interrupts during reset state
Cédric Le Goater writes:

> On 07/13/2017 09:55 AM, Nikunj A Dadhania wrote:
>> Cédric Le Goater writes:
>>
>>>>> How about the following, we do not report work until MSR_EE is disabled:
>>>>
>>>> With this fix, I could test the XIVE<->XICS transitions at reboot
>>>> under TCG. However, the second boot is very slow for some reason.
>>>
>>> hmm, I am not sure this is related but I just got :
>>
>> Havent seen in my setup after around 10 reboot cycles, I was using 2
>> cores pseries setup. Lets give it some more testing. When did this
>> happen, during boot ?
>
> yes.
>
> I could not reproduce either :/ but I am keeping the patch. qemu runs
> with :
>
> -m 2G -M pseries -accel tcg,thread=multi -cpu POWER9 -smp cores=4,maxcpus=8
> -realtime mlock=off -kernel ./vmlinux-4.12.0+ -initrd ./initrd.img-4.12.0+
> -append 'console=hvc0 dyndbg="file arch/powerpc/sysdev/xive/* +p"' -nographic
> -nodefaults -serial mon:stdio -snapshot -d guest_errors,unimp -no-shutdown
>

With 4 cores I am seeing hangs occasionally, although I haven't seen a
crash. But it seems to be a similar problem to the one you had seen.

Regards,
Nikunj
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] spapr: ignore interrupts during reset state
Cédric Le Goater writes:

>>> How about the following, we do not report work until MSR_EE is disabled:
>>
>> With this fix, I could test the XIVE<->XICS transitions at reboot
>> under TCG. However, the second boot is very slow for some reason.
>
> hmm, I am not sure this is related but I just got :

Haven't seen it in my setup after around 10 reboot cycles; I was using a
2-core pseries setup. Let's give it some more testing. When did this
happen, during boot?

> [ 28.311559] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> [migration/0:10]
> [ 28.311856] Modules linked in:
> [ 28.312058] CPU: 0 PID: 10 Comm: migration/0 Not tainted 4.12.0+ #10
> [ 28.312165] task: c0007a842c00 task.stack: c0007a12c000
> [ 28.312214] NIP: c01bf6b0 LR: c01bf788 CTR: c01bf5b0
> [ 28.312253] REGS: c0007a12f9d0 TRAP: 0901 Not tainted (4.12.0+)
> [ 28.312284] MSR: 82009033

EE is enabled, so there is no chance of interrupts getting ignored.
Moreover, the code will trigger only when cs->halted is true.

> [ 28.312399] CR: 20004202 XER: 2004
> [ 28.312457] CFAR: c01bf6c4 SOFTE: 1
> [ 28.312457] GPR00: c01bf9c8 c0007a12fc50 c147f000
> [ 28.312457] GPR04:
> [ 28.312457] GPR08: 0001 0001 002b
> [ 28.312457] GPR12: cfdc
> [ 28.313029] NIP [c01bf6b0] multi_cpu_stop+0x100/0x1f0
> [ 28.313074] LR [c01bf788] multi_cpu_stop+0x1d8/0x1f0
> [ 28.313136] Call Trace:
> [ 28.313334] [c0007a12fc50] [c0007a12fd30] 0xc0007a12fd30 (unreliable)
> [ 28.313428] [c0007a12fca0] [c01bf9c8] cpu_stopper_thread+0xd8/0x220
> [ 28.313480] [c0007a12fd60] [c0113c10] smpboot_thread_fn+0x290/0x2a0
> [ 28.313571] [c0007a12fdc0] [c010dc04] kthread+0x164/0x1b0
> [ 28.313640] [c0007a12fe30] [c000b268] ret_from_kernel_thread+0x5c/0x74
> [ 28.313742] Instruction dump:
> [ 28.313924] 2fa9 409e001c 813d0020 815d0010 39290001 915e 7c2004ac 913d0020
> [ 28.314001] 2b9f0004 419e003c 7fe9fb78 7c210b78 <7c421378> 83fd0020 7f89f840 409eff94
>
> with 4 cores under mttcg.

Regards
Nikunj
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC] spapr: ignore interrupts during reset state
Cédric Le Goater writes:

> On 07/13/2017 06:38 AM, Nikunj A Dadhania wrote:
>> David Gibson writes:
>>
>>> Ok, but we definitely should be able to fix this without new
>>> variables. If we can quiesce the secondary CPUs for the first boot,
>>> we should be able to duplicate that for subsequent boots.
>>
>> How about the following, we do not report work until MSR_EE is disabled:
>
> With this fix, I could test the XIVE<->XICS transitions at reboot
> under TCG.
> However, the second boot is very slow for some reason.

This is not related to the current patch. It's slow otherwise as well.

> Tested-by: Cédric Le Goater

Regards
Nikunj
Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state
David Gibson writes: > On Fri, Jun 09, 2017 at 10:32:25AM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> >> > On Thu, Jun 08, 2017 at 12:06:08PM +0530, Nikunj A Dadhania wrote: >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. >> > >> > Ouch. When exactly did this happen? >> >> Broken since long >> >> > I know that smp boot used to work under TCG, albeit very slowly. >> >> SMP boot works, its the reboot issued from the guest doesn't boot and >> crashes in SLOF. > > Oh, sorry, I misunderstood. > >> >> >> When reset happens, all the CPUs are in halted state. First CPU is >> >> brought out >> >> of reset and secondary CPUs would be initialized by the guest kernel >> >> using a >> >> rtas call start-cpu. >> >> >> >> However, in case of TCG, decrementer interrupts keep on coming and waking >> >> the >> >> secondary CPUs up. >> > >> > Ok.. how is that happening given that the secondary CPUs should have >> > MSR[EE] == 0? >> >> Basically, the CPU is in halted condition and has_work() does not check >> for MSR_EE in that case. But I am not sure if checking MSR_EE is >> sufficient, as the CPU does go to halted state (idle) while running as >> well. > > Ok, but we definitely should be able to fix this without new > variables. If we can quiesce the secondary CPUs for the first boot, > we should be able to duplicate that for subsequent boots. 
How about the following, we do not report work until MSR_EE is disabled:

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 783bf98..2cac98a 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8527,6 +8527,9 @@ static bool cpu_has_work_POWER7(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (!msr_ee) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }
@@ -8684,6 +8687,9 @@ static bool cpu_has_work_POWER8(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (!msr_ee) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }
@@ -8865,6 +8871,9 @@ static bool cpu_has_work_POWER9(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (!msr_ee) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }

Regards
Nikunj
Re: [Qemu-devel] [PATCH v4 0/6] spapr/xics: fix migration of older machine types
Nikunj A Dadhania writes: > Greg Kurz writes: > >> On Sun, 11 Jun 2017 17:38:42 +0800 >> David Gibson wrote: >> >>> On Fri, Jun 09, 2017 at 05:09:13PM +0200, Greg Kurz wrote: >>> > On Fri, 9 Jun 2017 20:28:32 +1000 >>> > David Gibson wrote: >>> > >>> > > On Fri, Jun 09, 2017 at 11:36:31AM +0200, Greg Kurz wrote: >>> > > > On Fri, 9 Jun 2017 12:28:13 +1000 >>> > > > David Gibson wrote: >>> > > > >>> > 1) start guest >>> > >>> > qemu-system-ppc64 \ >>> > -nodefaults -nographic -snapshot -no-shutdown -serial mon:stdio \ >>> > -device virtio-net,netdev=netdev0,id=net0 \ >>> > -netdev >>> > bridge,id=netdev0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \ >>> > -device virtio-blk,drive=drive0,id=blk0 \ >>> > -drive file=/home/greg/images/sle12-sp1-ppc64le.qcow2,id=drive0,if=none \ >>> > -machine type=pseries,accel=tcg -cpu POWER8 > > Strangely, your command line does not have multiple threads. Need to see > what is the side effect of enabling MTTCG by default here. > >>> > >>> > 2) migrate >>> > >>> > 3) destination crashes (immediately or after very short delay) or >>> > hangs >>> >>> Ok. I'll bisect it when I can, but you might well get to it first. >>> >>> >> >> Heh, maybe you didn't see in my mail but I did bisect: >> >> f0b0685d6694a28c66018f438e822596243b1250 is the first bad commit >> commit f0b0685d6694a28c66018f438e822596243b1250 >> Author: Nikunj A Dadhania >> Date: Thu Apr 27 10:48:23 2017 +0530 >> >> tcg: enable MTTCG by default for PPC64 on x86 > > Let me have a look at it. 
Interesting problem here: I see that when the migration is completed on the
source, there is a crash on the destination:

[ 56.185314] Unable to handle kernel paging request for data at address 0x5deadbeef108
[ 56.185401] Faulting instruction address: 0xc0277bc8

0xc0277bb8 <+168>: ld r7,8(r4)
0xc0277bbc <+172>: ld r6,0(r4)   <====
0xc0277bc0 <+176>: ori r8,r8,56302
0xc0277bc4 <+180>: rldicr r8,r8,32,31
0xc0277bc8 <+184>: std r7,8(r6)

r4 = 0xf00107a0
r6 = 0x5deadbeef100

The load at 0xc0277bbc <+172> gave a junk value in r6, which leads to the
guest crash. When I inspect the memory on source and destination in the qemu
monitor, I get the following differences:

diff -u s.txt d.txt
--- s.txt 2017-06-16 10:34:39.657221125 +0530
+++ d.txt 2017-06-16 10:34:18.452238305 +0530
@@ -8,8 +8,8 @@
 f0010760: 0x20de0b00 0x00f0 0x60040100 0x00f0
 f0010770: 0x 0x 0x0004036d 0x00c0
 f0010780: 0x6c000100 0xf8ff3f00 0x7817f977 0x00c0
-f0010790: 0x1500 0x 0x 0x0100
-f00107a0: 0x3090a96d 0x00c0 0x3090a96d 0x00c0
+f0010790: 0x0100 0x 0x 0x0100
+f00107a0: 0x000100f0 0xeedbea5d 0x000200f0 0xeedbea5d
 f00107b0: 0x 0x 0x00d0a96d 0x00c0
 f00107c0: 0x2800 0xf8ff3f00 0x8852cc77 0x00c0
 f00107d0: 0x 0x 0x 0x0100

The source had a valid address at 0xf00107a0, while the destination had
garbage.

Some observations:

* The source updates the memory location (probably via atomic_cmpxchg), but
  the updated page didn't get transferred to the destination.
* Getting rid of the atomic_cmpxchg TCG ops in ldarx/stdcx makes migration
  work fine. MTTCG running with 1 cpu.

While I continue debugging, any hints would help.

Regards
Nikunj
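The suspicion in the first observation can be illustrated with a toy model — entirely hypothetical, none of these names are QEMU's: migration only re-sends pages whose dirty flag was set by the store path, so a store path that bypasses dirty tracking (as a TCG atomic fast path might) leaves the destination with stale contents:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { TOY_PAGE = 0 };

struct toy_ram {
    uint8_t mem[16];
    bool dirty[1];
};

/* Normal store path: write the byte and mark the page dirty. */
static void tracked_store(struct toy_ram *r, int off, uint8_t v)
{
    r->mem[off] = v;
    r->dirty[TOY_PAGE] = true;
}

/* A store path that forgets dirty tracking: the write lands in memory
 * but migration never learns the page changed. */
static void untracked_store(struct toy_ram *r, int off, uint8_t v)
{
    r->mem[off] = v;
}

/* Toy migration pass: copy only pages marked dirty, then clear the flag. */
static void migrate_pass(struct toy_ram *src, struct toy_ram *dst)
{
    if (src->dirty[TOY_PAGE]) {
        memcpy(dst->mem, src->mem, sizeof(src->mem));
        src->dirty[TOY_PAGE] = false;
    }
}
```

With the untracked path the destination keeps its old bytes, which is the same shape of failure as the stale 0xf00107a0 page above.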
Re: [Qemu-devel] [Qemu-ppc] [PATCH] target/ppc/excp_helper: Take BQL before calling cpu_interrupt()
Alex Bennée writes:

> Thomas Huth writes:
>
>> Since the introduction of MTTCG, using the msgsnd instruction
>> abort()s if being called without holding the BQL. So let's protect
>> that part of the code now with qemu_mutex_lock_iothread().
>>
>> Buglink: https://bugs.launchpad.net/qemu/+bug/1694998
>> Signed-off-by: Thomas Huth
>
> Reviewed-by: Alex Bennée
>
> p.s. I was checking the ppc code for other CPU_FOREACH patterns and I
> noticed the tlb_flush calls could probably use the tlb_flush_all_cpus
> API instead of manually looping themselves.

Will that be a synchronous call? On PPC we do lazy TLB flush: the TLB flushes
are batched until a synchronization point (for optimization). The batching is
achieved using a tlb_need_flush flag (global/local), and when there is an
isync/ptesync or an exception, the actual flush is done. At that point we
need to make sure that the flush is synchronous.

> You should also double check the semantics to make sure none of them
> need to use the _synced variant and a cpu_exit if the flush needs to
> complete w.r.t the originating CPU.

Regards,
Nikunj
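The batching scheme described above can be sketched roughly like this (toy names, not the actual QEMU helpers): a flush request only sets a flag, and the single real flush is deferred to the next synchronization point, such as isync/ptesync or exception entry:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_env {
    bool tlb_need_flush;   /* flush requested but not yet performed */
    int  flushes_done;     /* counts the real (expensive) flushes */
};

/* Lazy path: record that a flush is needed, do nothing expensive yet. */
static void request_tlb_flush(struct toy_env *env)
{
    env->tlb_need_flush = true;
}

/* Synchronization point (isync/ptesync/exception in the real code):
 * perform one real flush covering the whole batch of requests. */
static void sync_point(struct toy_env *env)
{
    if (env->tlb_need_flush) {
        env->flushes_done++;
        env->tlb_need_flush = false;
    }
}
```

Any number of flush requests between two synchronization points collapses into a single real flush, which is the optimization being protected here; the flush performed at the sync point must itself be synchronous.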
Re: [Qemu-devel] [PATCH v4 0/6] spapr/xics: fix migration of older machine types
David Gibson writes: > On Tue, Jun 13, 2017 at 01:59:29PM +0530, Nikunj A Dadhania wrote: >> Greg Kurz writes: >> >> > On Sun, 11 Jun 2017 17:38:42 +0800 >> > David Gibson wrote: >> > >> >> On Fri, Jun 09, 2017 at 05:09:13PM +0200, Greg Kurz wrote: >> >> > On Fri, 9 Jun 2017 20:28:32 +1000 >> >> > David Gibson wrote: >> >> > >> >> > > On Fri, Jun 09, 2017 at 11:36:31AM +0200, Greg Kurz wrote: >> >> > > > On Fri, 9 Jun 2017 12:28:13 +1000 >> >> > > > David Gibson wrote: >> >> > > > >> >> > > > > On Thu, Jun 08, 2017 at 03:42:32PM +0200, Greg Kurz wrote: >> >> > > > > > I've provided answers for all comments from the v3 review that >> >> > > > > > I deliberately >> >> > > > > > don't address in v4. >> >> > > > > >> >> > > > > I've merged patches 1-4. 5 & 6 I'm still reviewing. >> >> > > > > >> >> > > > >> >> > > > Cool. FYI, I forgot to mention that I only tested with KVM. >> >> > > > >> >> > > > I'm now trying with TCG and I hit various guest crash on >> >> > > > the destination (using your ppc-for-2.10 branch WITHOUT >> >> > > > my patches): >> >> > > >> >> > > Drat. What's your reproducer for this crash? >> >> > > >> >> > >> >> > 1) start guest >> >> > >> >> > qemu-system-ppc64 \ >> >> > -nodefaults -nographic -snapshot -no-shutdown -serial mon:stdio \ >> >> > -device virtio-net,netdev=netdev0,id=net0 \ >> >> > -netdev >> >> > bridge,id=netdev0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \ >> >> > -device virtio-blk,drive=drive0,id=blk0 \ >> >> > -drive >> >> > file=/home/greg/images/sle12-sp1-ppc64le.qcow2,id=drive0,if=none \ >> >> > -machine type=pseries,accel=tcg -cpu POWER8 >> >> Strangely, your command line does not have multiple threads. Need to see >> what is the side effect of enabling MTTCG by default here. >> >> >> > >> >> > 2) migrate >> >> > >> >> > 3) destination crashes (immediately or after very short delay) or >> >> > hangs >> >> >> >> Ok. I'll bisect it when I can, but you might well get to it first. 
>> >> >> >> >> > >> > Heh, maybe you didn't see in my mail but I did bisect: >> > >> > f0b0685d6694a28c66018f438e822596243b1250 is the first bad commit >> > commit f0b0685d6694a28c66018f438e822596243b1250 >> > Author: Nikunj A Dadhania >> > Date: Thu Apr 27 10:48:23 2017 +0530 >> > >> > tcg: enable MTTCG by default for PPC64 on x86 >> >> Let me have a look at it. > > Is this crash fixed by the patch from Thomas I merged yesterday? No, it doesn't. Moreover, in Greg's scenario there is only one cpu. Regards, Nikunj
Re: [Qemu-devel] [PATCH v4 0/6] spapr/xics: fix migration of older machine types
Greg Kurz writes: > On Sun, 11 Jun 2017 17:38:42 +0800 > David Gibson wrote: > >> On Fri, Jun 09, 2017 at 05:09:13PM +0200, Greg Kurz wrote: >> > On Fri, 9 Jun 2017 20:28:32 +1000 >> > David Gibson wrote: >> > >> > > On Fri, Jun 09, 2017 at 11:36:31AM +0200, Greg Kurz wrote: >> > > > On Fri, 9 Jun 2017 12:28:13 +1000 >> > > > David Gibson wrote: >> > > > >> > > > > On Thu, Jun 08, 2017 at 03:42:32PM +0200, Greg Kurz wrote: >> > > > > > I've provided answers for all comments from the v3 review that I >> > > > > > deliberately >> > > > > > don't address in v4. >> > > > > >> > > > > I've merged patches 1-4. 5 & 6 I'm still reviewing. >> > > > > >> > > > >> > > > Cool. FYI, I forgot to mention that I only tested with KVM. >> > > > >> > > > I'm now trying with TCG and I hit various guest crash on >> > > > the destination (using your ppc-for-2.10 branch WITHOUT >> > > > my patches): >> > > >> > > Drat. What's your reproducer for this crash? >> > > >> > >> > 1) start guest >> > >> > qemu-system-ppc64 \ >> > -nodefaults -nographic -snapshot -no-shutdown -serial mon:stdio \ >> > -device virtio-net,netdev=netdev0,id=net0 \ >> > -netdev >> > bridge,id=netdev0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \ >> > -device virtio-blk,drive=drive0,id=blk0 \ >> > -drive file=/home/greg/images/sle12-sp1-ppc64le.qcow2,id=drive0,if=none \ >> > -machine type=pseries,accel=tcg -cpu POWER8 Strangely, your command line does not have multiple threads. Need to see what is the side effect of enabling MTTCG by default here. >> > >> > 2) migrate >> > >> > 3) destination crashes (immediately or after very short delay) or >> > hangs >> >> Ok. I'll bisect it when I can, but you might well get to it first. 
>> >> > > Heh, maybe you didn't see in my mail but I did bisect: > > f0b0685d6694a28c66018f438e822596243b1250 is the first bad commit > commit f0b0685d6694a28c66018f438e822596243b1250 > Author: Nikunj A Dadhania > Date: Thu Apr 27 10:48:23 2017 +0530 > > tcg: enable MTTCG by default for PPC64 on x86 Let me have a look at it. Regards, Nikunj
Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state
David Gibson writes: > On Thu, Jun 08, 2017 at 12:06:08PM +0530, Nikunj A Dadhania wrote: >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. > > Ouch. When exactly did this happen? Broken since long > I know that smp boot used to work under TCG, albeit very slowly. SMP boot works, its the reboot issued from the guest doesn't boot and crashes in SLOF. >> When reset happens, all the CPUs are in halted state. First CPU is brought >> out >> of reset and secondary CPUs would be initialized by the guest kernel using a >> rtas call start-cpu. >> >> However, in case of TCG, decrementer interrupts keep on coming and waking the >> secondary CPUs up. > > Ok.. how is that happening given that the secondary CPUs should have > MSR[EE] == 0? Basically, the CPU is in halted condition and has_work() does not check for MSR_EE in that case. But I am not sure if checking MSR_EE is sufficient, as the CPU does go to halted state (idle) while running as well. static bool cpu_has_work_POWER8(CPUState *cs) { PowerPCCPU *cpu = POWERPC_CPU(cs); CPUPPCState *env = &cpu->env; if (cs->halted) { [ SNIP ] /* Does not check for msr_ee */ } else { return msr_ee && (cs->interrupt_request & CPU_INTERRUPT_HARD); } } > >> These secondary CPUs would see the decrementer interrupt pending, which makes >> cpu::has_work() to bring them out of wait loop and start executing >> tcg_exec_cpu(). >> >> The problem with this is all the CPUs wake up and start booting SLOF image, >> causing the following exception(4 CPUs TCG VM): > > [snip] >> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h >> index d10808d..eb88bcb 100644 >> --- a/target/ppc/cpu.h >> +++ b/target/ppc/cpu.h >> @@ -1013,6 +1013,13 @@ struct CPUPPCState { >> int access_type; /* when a memory exception occurs, the access >> type is stored here */ >> >> +/* CPU in reset, shouldn't process any interrupts. >> + * >> + * Decrementer interrupts in TCG can still wake the CPU up. 
Make sure >> that >> + * when this variable is set, cpu_has_work_* should return false. >> + */ >> +int in_reset; > > So I'd really rather not add another flag to the cpu structure, > especially since we'd then need to migrate it as well. I agree, Bharata and I did discuss about the migrate case. This patch was to highlight the exact issue. > I'm pretty sure there should be a way to inhibit the unwanted > interrupts using existing mechanisms. One of the thing that I had observed was msr had just MSR_SF bit set during the reset case, we can test for that maybe. The below works as well: +if ((env->msr & ~(1ull << MSR_SF)) == 0) { +return false; +} >> + >> CPU_COMMON >> >> /* MMU context - only relevant for full system emulation */ >> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c >> index 56a0ab2..64f4348 100644 >> --- a/target/ppc/translate_init.c >> +++ b/target/ppc/translate_init.c >> @@ -8561,6 +8561,9 @@ static bool cpu_has_work_POWER7(CPUState *cs) >> CPUPPCState *env = &cpu->env; >> >> if (cs->halted) { >> +if (env->in_reset) { >> +return false; >> +} >> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) { >> return false; >> } >> @@ -8718,6 +8721,9 @@ static bool cpu_has_work_POWER8(CPUState *cs) >> CPUPPCState *env = &cpu->env; >> >> if (cs->halted) { >> +if (env->in_reset) { >> +return false; >> +} >> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) { >> return false; >> } >> @@ -8899,6 +8905,9 @@ static bool cpu_has_work_POWER9(CPUState *cs) >> CPUPPCState *env = &cpu->env; >> >> if (cs->halted) { >> +if (env->in_reset) { >> +return false; >> +} >> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) { >> return false; >> } Regards Nikunj
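The alternative check suggested above — treating a CPU whose MSR has only the SF (64-bit mode) bit set as still sitting in its post-reset state — can be wrapped up as a small helper. This is a sketch; the bit positions follow QEMU's MSR numbering (SF = 63, EE = 15), but the names here are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_MSR_SF = 63, TOY_MSR_EE = 15 };

/* True when every MSR bit other than SF is clear, i.e. the value that
 * rtas_start_cpu has not yet replaced with MSR_SF | MSR_ME. */
static bool msr_looks_like_reset_state(uint64_t msr)
{
    return (msr & ~(1ull << TOY_MSR_SF)) == 0;
}
```

Once the guest kernel starts the CPU (setting ME, and later EE), the predicate goes false, so it only suppresses wakeups during the window being discussed.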
[Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state
Rebooting an SMP TCG guest is broken for both single/multi threaded TCG. When
reset happens, all the CPUs are in halted state. The first CPU is brought out
of reset, and the secondary CPUs would be initialized by the guest kernel
using the rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up. These secondary CPUs would see the decrementer interrupt
pending, which makes cpu::has_work() bring them out of the wait loop and
start executing tcg_exec_cpu().

The problem with this is that all the CPUs wake up and start booting the SLOF
image, causing the following exception (4-CPU TCG VM):

[ 81.440850] reboot: Restarting system
SLOF S SLOF SLOFLOF[0[0m
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *m**[?25l
** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
*** QEMU Starting
 Build Date = Mar 3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!
exception 300
SRR0 = 60e4 SRR1 = 80008000
SPRG2 = 0040 SPRG3 = 4bd8
ERROR: Flatten device tree not available!
exception 300
SRR0 = 60e4 SRR1 = 80008000
SPRG2 = 0040 SPRG3 = 4bd8

Reported-by: Bharata B Rao
Signed-off-by: Nikunj A Dadhania
---
Note: Similar changes would be required for powernv as well. Haven't got
time to test it there.
---
 hw/ppc/spapr.c              | 1 +
 hw/ppc/spapr_cpu_core.c     | 1 +
 hw/ppc/spapr_rtas.c         | 1 +
 target/ppc/cpu.h            | 7 +++++++
 target/ppc/translate_init.c | 9 +++++++++
 5 files changed, 19 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 01dda9e..fba2ef5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1370,6 +1370,7 @@ static void ppc_spapr_reset(void)
     first_ppc_cpu->env.gpr[3] = fdt_addr;
     first_ppc_cpu->env.gpr[5] = 0;
     first_cpu->halted = 0;
+    first_ppc_cpu->env.in_reset = 0;
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
 
     spapr->cas_reboot = false;
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 029a141..c100213 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -33,6 +33,7 @@ static void spapr_cpu_reset(void *opaque)
      * reset code and the rest are explicitly started up by the guest
      * using an RTAS call */
     cs->halted = 1;
+    env->in_reset = 1;
 
     env->spr[SPR_HIOR] = 0;
 
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 94a2799..eaf0afb 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -177,6 +177,7 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
+        env->in_reset = 0;
         spapr_cpu_set_endianness(cpu);
         spapr_cpu_update_tb_offset(cpu);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index d10808d..eb88bcb 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1013,6 +1013,13 @@ struct CPUPPCState {
     int access_type; /* when a memory exception occurs, the access
                         type is stored here */
 
+    /* CPU in reset, shouldn't process any interrupts.
+     *
+     * Decrementer interrupts in TCG can still wake the CPU up. Make sure that
+     * when this variable is set, cpu_has_work_* should return false.
+     */
+    int in_reset;
+
     CPU_COMMON
 
     /* MMU context - only relevant for full system emulation */
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 56a0ab2..64f4348 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8561,6 +8561,9 @@ static bool cpu_has_work_POWER7(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (env->in_reset) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }
@@ -8718,6 +8721,9 @@ static bool cpu_has_work_POWER8(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (env->in_reset) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }
@@ -8899,6 +8905,9 @@ static bool cpu_has_work_POWER9(CPUState *cs)
     CPUPPCState *env = &cpu->env;
 
     if (cs->halted) {
+        if (env->in_reset) {
+            return false;
+        }
         if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
             return false;
         }
-- 
2.9.3
Re: [Qemu-devel] [PATCH risu] ppc64.risu: Fix broken constraints
Peter Maydell writes: > Commit c10b97092 changed some field names in rldicr and rldimi patterns > but forgot to update the constraints to match the change. Since the > field (previously 'rb' and now 'sh') is an immediate rather than a > register number, the correct fix is to just delete the constraint > since we don't need to avoid particular values. > > Signed-off-by: Peter Maydell Thanks for fixing it. > --- > ppc64.risu | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/ppc64.risu b/ppc64.risu > index dd304e2..e2fd4f6 100644 > --- a/ppc64.risu > +++ b/ppc64.risu > @@ -1473,17 +1473,17 @@ RLDICLd PPC64LE 00 rs:5 ra:5 sh:5 mb:6 000 sha:1 > 1 \ > > # format:MD book:I page:105 PPC SR rldicr[.] Rotate Left Dword Immediate > then Clear Right > RLDICR PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 0 \ > -!constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > +!constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > # format:MD book:I page:105 PPC SR rldicr[.] Rotate Left Dword Immediate > then Clear Right > RLDICRd PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 1 \ > -!constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > +!constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > > # format:MD book:I page:105 PPC SR rldimi[.] Rotate Left Dword Immediate > then Mask Insert > RLDIMI PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 0 \ > -!constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > +!constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > # format:MD book:I page:105 PPC SR rldimi[.] Rotate Left Dword Immediate > then Mask Insert > RLDIMId PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 1 \ > -!constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > +!constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > > # format:M book:I page:102 v:P1 SR rlwimi[.] 
Rotate Left Word Immediate then > Mask Insert > RLWIMI PPC64LE 010100 rs:5 ra:5 sh:5 mb:5 me:5 0 \ > -- > 2.7.4
Re: [Qemu-devel] [PATCH risu v2] ppc64: Fix patterns for rotate doubleword instructions
Peter Maydell writes: > On 30 May 2017 at 16:39, Peter Maydell wrote: >> On 30 May 2017 at 16:26, Nikunj A Dadhania wrote: >>> Sandipan Das writes: >>> >>>> The patterns for the following instructions are fixed: >>>> * Rotate Left Doubleword then Clear Right (rldcr[.]) >>>> * Rotate Left Doubleword Immediate then Clear Right (rldicr[.]) >>>> * Rotate Left Doubleword Immediate then Mask Insert (rldimi[.]) >>>> >>>> The first instruction has a typo. For the other two instructions, >>>> the extended opcodes are incorrect and the shift field 'sha' is >>>> absent. Also, the shift field 'sh' should be used in place of the >>>> register field 'rb'. >>>> >>>> Signed-off-by: Sandipan Das >>> >>> Reviewed-by: Nikunj A Dadhania >> >> Thanks; applied to risu master. > > ...but I foolishly didn't run build-all-archs first, which > points out that there's a bug: Ouch :( > Syntax error detected evaluating RLDIMId PPC64LE constraints string: ] > { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && $rb != 13; } > Global symbol "$rb" requires explicit package name (did you forget to > declare "my $rb"?) at (eval 429) line 1. > Global symbol "$rb" requires explicit package name (did you forget to > declare "my $rb"?) at (eval 429) line 1. > > You forgot to update the constraints when you changed the > field names... I think that the constraints on $rb should > be removed rather than just changed to use $sh because this > field is an immediate, not a register number, so we don't need > to make it avoid 1 and 13. This would bring them into line with > the other rotate-immediates. I'll post a patch in a second. Regards Nikunj
Re: [Qemu-devel] [PATCH risu v2] ppc64: Fix patterns for rotate doubleword instructions
Sandipan Das writes: > The patterns for the following instructions are fixed: > * Rotate Left Doubleword then Clear Right (rldcr[.]) > * Rotate Left Doubleword Immediate then Clear Right (rldicr[.]) > * Rotate Left Doubleword Immediate then Mask Insert (rldimi[.]) > > The first instruction has a typo. For the other two instructions, > the extended opcodes are incorrect and the shift field 'sha' is > absent. Also, the shift field 'sh' should be used in place of the > register field 'rb'. > > Signed-off-by: Sandipan Das Reviewed-by: Nikunj A Dadhania > --- > ppc64.risu | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/ppc64.risu b/ppc64.risu > index 28df9da..dd304e2 100644 > --- a/ppc64.risu > +++ b/ppc64.risu > @@ -1451,7 +1451,7 @@ RLDCLd PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10001 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:MDS book:I page:103 PPC SR rldcr Rotate Left Dword then Clear Right > -RLCDR PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10010 \ > +RLDCR PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10010 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MDS book:I page:103 PPC SR rldcr Rotate Left Dword then Clear Right > RLDCRd PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10011 \ > @@ -1472,17 +1472,17 @@ RLDICLd PPC64LE 00 rs:5 ra:5 sh:5 mb:6 000 sha:1 > 1 \ > !constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > > # format:MD book:I page:105 PPC SR rldicr[.] Rotate Left Dword Immediate > then Clear Right > -RLDICR PPC64LE 00 rs:5 ra:5 rb:5 me:6 00010 \ > +RLDICR PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 0 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MD book:I page:105 PPC SR rldicr[.] 
Rotate Left Dword Immediate > then Clear Right > -RLDICRd PPC64LE 00 rs:5 ra:5 rb:5 me:6 00011 \ > +RLDICRd PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 1 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:MD book:I page:105 PPC SR rldimi[.] Rotate Left Dword Immediate > then Mask Insert > -RLDIMI PPC64LE 00 rs:5 ra:5 rb:5 me:6 00110 \ > +RLDIMI PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 0 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MD book:I page:105 PPC SR rldimi[.] Rotate Left Dword Immediate > then Mask Insert > -RLDIMId PPC64LE 00 rs:5 ra:5 rb:5 me:6 00111 \ > +RLDIMId PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 1 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:M book:I page:102 v:P1 SR rlwimi[.] Rotate Left Word Immediate then > Mask Insert > -- > 2.7.4
Re: [Qemu-devel] [PATCH risu] ppc64: Fix patterns for rotate doubleword instructions
G 3 writes: > On May 22, 2017, at 4:32 AM, qemu-devel-requ...@nongnu.org wrote: > > Hello I have also done some work risu. My patches add ppc32 support. > Well my patches were made to work with Mac OS X but they are required > to work with Linux. Do you think you could help port these patches to > Linux? Ziviani did the ppc64 work, let's see if he can spare some time. The patches haven't come through cleanly on the mailing list; it's tedious to pull them. Please resend them with git send-email. > > ppc.risu: > https://patchwork.kernel.org/patch/9697489/ > > risu_ppc.c: > https://patchwork.kernel.org/patch/9697491/ > > risu_reginfo_ppc.c: > https://patchwork.kernel.org/patch/9697493/ > > risu_reginfo_ppc.h: > https://patchwork.kernel.org/patch/9697495/ > > risugen_ppc.pm: > https://patchwork.kernel.org/patch/9697497/ > > Add ppc support to configure: > https://patchwork.kernel.org/patch/9697499/ > > Add verbose option: > https://patchwork.kernel.org/patch/9697501/ > > Add end of test message: > https://patchwork.kernel.org/patch/9697503/ > > Add more descriptive comment for mismatch or end of test condition: > https://patchwork.kernel.org/patch/9697505/ Regards Nikunj
Re: [Qemu-devel] [PATCH risu] ppc64: Fix patterns for rotate doubleword instructions
Sandipan Das writes: > The patterns for the following instructions are fixed: > * Rotate Left Doubleword then Clear Right (rldcr[.]) > * Rotate Left Doubleword Immediate then Clear Right (rldicr[.]) > * Rotate Left Doubleword Immediate then Mask Insert (rldimi[.]) > > Signed-off-by: Sandipan Das > --- > ppc64.risu | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/ppc64.risu b/ppc64.risu > index 28df9da..0f29248 100644 > --- a/ppc64.risu > +++ b/ppc64.risu > @@ -1451,7 +1451,7 @@ RLDCLd PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10001 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:MDS book:I page:103 PPC SR rldcr Rotate Left Dword then Clear Right > -RLCDR PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10010 \ > +RLDCR PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10010 \ > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MDS book:I page:103 PPC SR rldcr Rotate Left Dword then Clear Right > RLDCRd PPC64LE 00 rs:5 ra:5 rb:5 mb:6 10011 \ > @@ -1472,17 +1472,17 @@ RLDICLd PPC64LE 00 rs:5 ra:5 sh:5 mb:6 000 sha:1 > 1 \ > !constraints { $rs != 1 && $ra != 1 && $rs != 13 && $ra != 13; } > > # format:MD book:I page:105 PPC SR rldicr[.] Rotate Left Dword Immediate > then Clear Right > -RLDICR PPC64LE 00 rs:5 ra:5 rb:5 me:6 00010 \ > +RLDICR PPC64LE 00 rs:5 ra:5 rb:5 me:6 001 sha:1 0 \ RLDICR PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 0 Also, should "rb:5" be changed to "sh:5"? > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MD book:I page:105 PPC SR rldicr[.] Rotate Left Dword Immediate > then Clear Right > -RLDICRd PPC64LE 00 rs:5 ra:5 rb:5 me:6 00011 \ > +RLDICRd PPC64LE 00 rs:5 ra:5 rb:5 me:6 001 sha:1 1 \ RLDICRd PPC64LE 00 rs:5 ra:5 sh:5 me:6 001 sha:1 1 > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:MD book:I page:105 PPC SR rldimi[.] 
Rotate Left Dword Immediate > then Mask Insert > -RLDIMI PPC64LE 00 rs:5 ra:5 rb:5 me:6 00110 \ > +RLDIMI PPC64LE 00 rs:5 ra:5 rb:5 me:6 011 sha:1 0 \ RLDIMI PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 0 > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > # format:MD book:I page:105 PPC SR rldimi[.] Rotate Left Dword Immediate > then Mask Insert > -RLDIMId PPC64LE 00 rs:5 ra:5 rb:5 me:6 00111 \ > +RLDIMId PPC64LE 00 rs:5 ra:5 rb:5 me:6 011 sha:1 1 \ RLDIMId PPC64LE 00 rs:5 ra:5 sh:5 me:6 011 sha:1 1 > !constraints { $rs != 1 && $ra != 1 && $rb != 1 && $rs != 13 && $ra != 13 && > $rb != 13; } > > # format:M book:I page:102 v:P1 SR rlwimi[.] Rotate Left Word Immediate then > Mask Insert > -- > 2.7.4 Regards, Nikunj
[Qemu-devel] [PATCH] target/ppc: reset reservation in do_rfi()
For transitioning back to userspace after the interrupt. Suggested-by: Richard Henderson Signed-off-by: Nikunj A Dadhania --- target/ppc/excp_helper.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c index a6bcb47..9cb2123 100644 --- a/target/ppc/excp_helper.c +++ b/target/ppc/excp_helper.c @@ -995,6 +995,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr) */ cs->interrupt_request |= CPU_INTERRUPT_EXITTB; +/* Reset the reservation */ +env->reserve_addr = -1; + /* Context synchronizing: check if TCG TLB needs flush */ check_tlb_flush(env, false); } -- 2.9.3
Re: [Qemu-devel] [PATCH 7/8] target/ppc: optimize various functions using extract op
Philippe Mathieu-Daudé writes: > Applied using Coccinelle script. > > Signed-off-by: Philippe Mathieu-Daudé > --- > target/ppc/translate.c | 9 +++-- > target/ppc/translate/vsx-impl.inc.c | 21 +++-- > 2 files changed, 10 insertions(+), 20 deletions(-) > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c > index f40b5a1abf..64ab412bf3 100644 > --- a/target/ppc/translate.c > +++ b/target/ppc/translate.c > @@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, > TCGv ret, TCGv arg1, > } > tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry > */ > tcg_temp_free(t1); > -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32); /* extract bit 32 */ > -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); > +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1); > if (is_isa300(ctx)) { > tcg_gen_mov_tl(cpu_ca32, cpu_ca); > } > @@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, > TCGv ret, TCGv arg1, > tcg_temp_free(inv1); > tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ carry > */ > tcg_temp_free(t1); > -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */ > -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); > +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1); > if (is_isa300(ctx)) { > tcg_gen_mov_tl(cpu_ca32, cpu_ca); > } Above changes are correct. Rest of them are wrong as discussed above in the thread with Richard. 
> @@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx) > CHK_SV; > t0 = tcg_temp_new(); > gen_addr_reg_index(ctx, t0); > -tcg_gen_shri_tl(t0, t0, 28); > -tcg_gen_andi_tl(t0, t0, 0xF); > +tcg_gen_extract_tl(t0, t0, 28, 0xF); > gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0); > tcg_temp_free(t0); > if (ra != 0 && ra != rd) > diff --git a/target/ppc/translate/vsx-impl.inc.c > b/target/ppc/translate/vsx-impl.inc.c > index 7f12908029..354a6b113a 100644 > --- a/target/ppc/translate/vsx-impl.inc.c > +++ b/target/ppc/translate/vsx-impl.inc.c > @@ -1262,8 +1262,7 @@ static void gen_xsxexpqp(DisasContext *ctx) > gen_exception(ctx, POWERPC_EXCP_VSXU); > return; > } > -tcg_gen_shri_i64(xth, xbh, 48); > -tcg_gen_andi_i64(xth, xth, 0x7FFF); > +tcg_gen_extract_i64(xth, xbh, 48, 0x7FFF); > tcg_gen_movi_i64(xtl, 0); > } > > @@ -1431,10 +1430,8 @@ static void gen_xvxexpsp(DisasContext *ctx) > gen_exception(ctx, POWERPC_EXCP_VSXU); > return; > } > -tcg_gen_shri_i64(xth, xbh, 23); > -tcg_gen_andi_i64(xth, xth, 0xFF00FF); > -tcg_gen_shri_i64(xtl, xbl, 23); > -tcg_gen_andi_i64(xtl, xtl, 0xFF00FF); > +tcg_gen_extract_i64(xth, xbh, 23, 0xFF00FF); > +tcg_gen_extract_i64(xtl, xbl, 23, 0xFF00FF); > } > > static void gen_xvxexpdp(DisasContext *ctx) > @@ -1448,10 +1445,8 @@ static void gen_xvxexpdp(DisasContext *ctx) > gen_exception(ctx, POWERPC_EXCP_VSXU); > return; > } > -tcg_gen_shri_i64(xth, xbh, 52); > -tcg_gen_andi_i64(xth, xth, 0x7FF); > -tcg_gen_shri_i64(xtl, xbl, 52); > -tcg_gen_andi_i64(xtl, xtl, 0x7FF); > +tcg_gen_extract_i64(xth, xbh, 52, 0x7FF); > +tcg_gen_extract_i64(xtl, xbl, 52, 0x7FF); > } > > GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300) > @@ -1474,16 +1469,14 @@ static void gen_xvxsigdp(DisasContext *ctx) > zr = tcg_const_i64(0); > nan = tcg_const_i64(2047); > > -tcg_gen_shri_i64(exp, xbh, 52); > -tcg_gen_andi_i64(exp, exp, 0x7FF); > +tcg_gen_extract_i64(exp, xbh, 52, 0x7FF); > tcg_gen_movi_i64(t0, 0x0010); > tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, 
t0); > tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0); > tcg_gen_andi_i64(xth, xbh, 0x000F); > tcg_gen_or_i64(xth, xth, t0); > > -tcg_gen_shri_i64(exp, xbl, 52); > -tcg_gen_andi_i64(exp, exp, 0x7FF); > +tcg_gen_extract_i64(exp, xbl, 52, 0x7FF); > tcg_gen_movi_i64(t0, 0x0010); > tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0); > tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0); Regards Nikunj
[Qemu-devel] [PATCH v3 6/6] target/ppc: do not reset reserve_addr in exec_enter
In the case when an atomic operation is not supported, exit_atomic is called and we stop the world and execute the atomic operation. This results in the following call chain: tcg_gen_atomic_cmpxchg_tl() -> gen_helper_exit_atomic() -> HELPER(exit_atomic) -> cpu_loop_exit_atomic() -> EXCP_ATOMIC -> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC -> cpu_exec_step_atomic() -> cpu_step_atomic() -> cc->cpu_exec_enter() = ppc_cpu_exec_enter() Sets env->reserve_addr = -1; But by the time it returns, the reservation is erased and the code fails; this continues forever and the lock is never taken. Instead, set this in powerpc_excp(). Now that ppc_cpu_exec_enter() doesn't have anything meaningful to do, let us get rid of the function. Signed-off-by: Nikunj A Dadhania --- target/ppc/excp_helper.c| 3 +++ target/ppc/translate_init.c | 9 - 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c index f4ee7aa..a6bcb47 100644 --- a/target/ppc/excp_helper.c +++ b/target/ppc/excp_helper.c @@ -728,6 +728,9 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp) cs->exception_index = POWERPC_EXCP_NONE; env->error_code = 0; +/* Reset the reservation */ +env->reserve_addr = -1; + /* Any interrupt is context synchronizing, check if TCG TLB * needs a delayed flush on ppc64 */ diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c index e82e3e6..9b048cd 100644 --- a/target/ppc/translate_init.c +++ b/target/ppc/translate_init.c @@ -10436,14 +10436,6 @@ static bool ppc_cpu_has_work(CPUState *cs) return msr_ee && (cs->interrupt_request & CPU_INTERRUPT_HARD); } -static void ppc_cpu_exec_enter(CPUState *cs) -{ -PowerPCCPU *cpu = POWERPC_CPU(cs); -CPUPPCState *env = &cpu->env; - -env->reserve_addr = -1; -} - /* CPUClass::reset() */ static void ppc_cpu_reset(CPUState *s) { @@ -10660,7 +10652,6 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data) cc->get_phys_page_debug = ppc_cpu_get_phys_page_debug; 
cc->vmsd = &vmstate_ppc_cpu; #endif -cc->cpu_exec_enter = ppc_cpu_exec_enter; #if defined(CONFIG_SOFTMMU) cc->write_elf64_note = ppc64_cpu_write_elf64_note; cc->write_elf32_note = ppc32_cpu_write_elf32_note; -- 2.9.3
[Qemu-devel] [PATCH v3 3/6] target/ppc: Generate fence operations
Signed-off-by: Nikunj A Dadhania Reviewed-by: Richard Henderson --- target/ppc/translate.c | 8 1 file changed, 8 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 50b6d4d..4a1f24a 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -2971,6 +2971,7 @@ static void gen_stswx(DisasContext *ctx) /* eieio */ static void gen_eieio(DisasContext *ctx) { +tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC); } #if !defined(CONFIG_USER_ONLY) @@ -3008,6 +3009,7 @@ static void gen_isync(DisasContext *ctx) if (!ctx->pr) { gen_check_tlb_flush(ctx, false); } +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); gen_stop_exception(ctx); } @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ tcg_gen_mov_tl(cpu_reserve, t0); \ tcg_gen_mov_tl(cpu_reserve_val, gpr);\ +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ tcg_temp_free(t0); \ } @@ -3177,6 +3180,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, tcg_gen_br(l2); gen_set_label(l1); + +/* Address mismatch implies failure. But we still need to provide the + memory barrier semantics of the instruction. */ +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); gen_set_label(l2); @@ -3308,6 +3315,7 @@ static void gen_sync(DisasContext *ctx) if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) { gen_check_tlb_flush(ctx, true); } +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); } /* wait */ -- 2.9.3
[Qemu-devel] [PATCH v3 4/6] cpus: Fix CPU unplug for MTTCG
From: Bharata B Rao Ensure that the unplugged CPU thread is destroyed and the waiting thread is notified about it. This is needed for CPU unplug to work correctly in MTTCG mode. Signed-off-by: Bharata B Rao Signed-off-by: Nikunj A Dadhania --- cpus.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/cpus.c b/cpus.c index 740b8dc..79f780b 100644 --- a/cpus.c +++ b/cpus.c @@ -1483,6 +1483,12 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) /* Ignore everything else? */ break; } +} else if (cpu->unplug) { +qemu_tcg_destroy_vcpu(cpu); +cpu->created = false; +qemu_cond_signal(&qemu_cpu_cond); +qemu_mutex_unlock_iothread(); +return NULL; } atomic_mb_set(&cpu->exit_request, 0); -- 2.9.3
[Qemu-devel] [PATCH v3 1/6] target/ppc: Emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. However, portable parallel code is written assuming only cmpxchg which means that in practice this is a viable alternative. Signed-off-by: Nikunj A Dadhania Reviewed-by: Richard Henderson --- target/ppc/translate.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index f40b5a1..50b6d4d 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -73,6 +73,7 @@ static TCGv cpu_cfar; #endif static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; static TCGv cpu_reserve; +static TCGv cpu_reserve_val; static TCGv cpu_fpscr; static TCGv_i32 cpu_access_type; @@ -181,6 +182,9 @@ void ppc_translate_init(void) cpu_reserve = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, reserve_addr), "reserve_addr"); +cpu_reserve_val = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, reserve_val), + "reserve_val"); cpu_fpscr = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, fpscr), "fpscr"); @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \ }\ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ tcg_gen_mov_tl(cpu_reserve, t0); \ -tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \ +tcg_gen_mov_tl(cpu_reserve_val, gpr);\ tcg_temp_free(t0); \ } @@ -3155,14 +3159,27 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, static void gen_conditional_store(DisasContext *ctx, TCGv EA, int reg, int memop) { -TCGLabel *l1; +TCGLabel *l1 = gen_new_label(); +TCGLabel *l2 = gen_new_label(); +TCGv t0; -tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); -l1 = gen_new_label(); tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); + +t0 = tcg_temp_new(); +tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, + cpu_gpr[reg], ctx->mem_idx, + DEF_MEMOP(memop) | MO_ALIGN); 
+tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); +tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); +tcg_gen_or_tl(t0, t0, cpu_so); +tcg_gen_trunc_tl_i32(cpu_crf[0], t0); +tcg_temp_free(t0); +tcg_gen_br(l2); + gen_set_label(l1); +tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); + +gen_set_label(l2); tcg_gen_movi_tl(cpu_reserve, -1); } #endif -- 2.9.3
[Qemu-devel] [PATCH v3 2/6] cputlb: handle first atomic write to the page
In case where the conditional write is the first write to the page, TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as a special case and set the dirty bit. After that fall through to the actual atomic instruction below. Signed-off-by: Nikunj A Dadhania Reviewed-by: Richard Henderson --- cputlb.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index f5d056c..743776a 100644 --- a/cputlb.c +++ b/cputlb.c @@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, tlb_addr = tlbe->addr_write; } -/* Notice an IO access, or a notdirty page. */ +/* Check notdirty */ +if (unlikely(tlb_addr & TLB_NOTDIRTY)) { +tlb_set_dirty(ENV_GET_CPU(env), addr); +tlb_addr = tlb_addr & ~TLB_NOTDIRTY; +} + +/* Notice an IO access */ if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) { /* There's really nothing that can be done to support this apart from stop-the-world. */ -- 2.9.3
[Qemu-devel] [PATCH v3 5/6] tcg: enable MTTCG by default for PPC64 on x86
This enables the multi-threaded system emulation by default for PPC64 guests using the x86_64 TCG back-end. Signed-off-by: Nikunj A Dadhania Reviewed-by: Alex Bennée --- configure| 2 ++ target/ppc/cpu.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/configure b/configure index c35acf1..3814617 100755 --- a/configure +++ b/configure @@ -6090,12 +6090,14 @@ case "$target_name" in ppc64) TARGET_BASE_ARCH=ppc TARGET_ABI_DIR=ppc +mttcg=yes gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml" ;; ppc64le) TARGET_ARCH=ppc64 TARGET_BASE_ARCH=ppc TARGET_ABI_DIR=ppc +mttcg=yes gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml" ;; ppc64abi32) diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h index e0ff041..ece535d 100644 --- a/target/ppc/cpu.h +++ b/target/ppc/cpu.h @@ -30,6 +30,8 @@ #define TARGET_LONG_BITS 64 #define TARGET_PAGE_BITS 12 +#define TCG_GUEST_DEFAULT_MO 0 + /* Note that the official physical address space bits is 62-M where M is implementation dependent. I've not looked up M for the set of cpus we emulate at the system level. */ -- 2.9.3
[Qemu-devel] [PATCH v3 0/6] The series enables Multi-Threaded TCG on PPC64
Patch 01: Use atomic_cmpxchg in store conditional 02: Handle first write to page during atomic operation 03: Generate memory barriers for sync/isync and load/store conditional 04: Fix CPU unplug in MTTCG 05: Enable MTTCG by default on PPC64 06: Fixes a bug in PPC where the reservation is reset causing atomic operations to never succeed Patches are based on ppc-for-2.10 Changelog: v2: * David found problem related to clang and "make check" that was root caused to a tcg bug and the patch is in mainline: 79b1af9 tcg: Initialize return value after exit_atomic * Fixed a bug in ppc_cpu_exec_enter(), which was resetting the reserve_addr, this should be done in powerpc_excp() v1: * Rewrote store_conditional as suggested by Richard Bharata B Rao (1): cpus: Fix CPU unplug for MTTCG Nikunj A Dadhania (5): target/ppc: Emulate LL/SC using cmpxchg helpers cputlb: handle first atomic write to the page target/ppc: Generate fence operations tcg: enable MTTCG by default for PPC64 on x86 target/ppc: do not reset reserve_addr in exec_enter configure | 2 ++ cpus.c | 6 ++ cputlb.c| 8 +++- target/ppc/cpu.h| 2 ++ target/ppc/excp_helper.c| 3 +++ target/ppc/translate.c | 37 +++-- target/ppc/translate_init.c | 9 - 7 files changed, 51 insertions(+), 16 deletions(-) -- 2.9.3
[Qemu-devel] [PATCH] target/ppc: do not reset reserve_addr in exec_enter
In the case when an atomic operation is not supported, exit_atomic is called and we stop the world and execute the atomic operation. This results in the following call chain: tcg_gen_atomic_cmpxchg_tl() -> gen_helper_exit_atomic() -> HELPER(exit_atomic) -> cpu_loop_exit_atomic() -> EXCP_ATOMIC -> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC -> cpu_exec_step_atomic() -> cpu_step_atomic() -> cc->cpu_exec_enter() = ppc_cpu_exec_enter() Sets env->reserve_addr = -1; But by the time it returns, the reservation is erased and the code fails; this continues forever and the lock is never taken. Instead, set this in powerpc_excp(). Now that ppc_cpu_exec_enter() doesn't have anything meaningful to do, let us get rid of the function. Signed-off-by: Nikunj A Dadhania --- Depends on the following fix by Richard Henderson: https://patchwork.ozlabs.org/patch/755582/ --- target/ppc/excp_helper.c| 3 +++ target/ppc/translate_init.c | 9 - 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c index f4ee7aa..a6bcb47 100644 --- a/target/ppc/excp_helper.c +++ b/target/ppc/excp_helper.c @@ -728,6 +728,9 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp) cs->exception_index = POWERPC_EXCP_NONE; env->error_code = 0; +/* Reset the reservation */ +env->reserve_addr = -1; + /* Any interrupt is context synchronizing, check if TCG TLB * needs a delayed flush on ppc64 */ diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c index 77e5463..dc4239d 100644 --- a/target/ppc/translate_init.c +++ b/target/ppc/translate_init.c @@ -10426,14 +10426,6 @@ static bool ppc_cpu_has_work(CPUState *cs) return msr_ee && (cs->interrupt_request & CPU_INTERRUPT_HARD); } -static void ppc_cpu_exec_enter(CPUState *cs) -{ -PowerPCCPU *cpu = POWERPC_CPU(cs); -CPUPPCState *env = &cpu->env; - -env->reserve_addr = -1; -} - /* CPUClass::reset() */ static void ppc_cpu_reset(CPUState *s) { @@ -10650,7 +10642,6 @@ static void 
ppc_cpu_class_init(ObjectClass *oc, void *data) cc->get_phys_page_debug = ppc_cpu_get_phys_page_debug; cc->vmsd = &vmstate_ppc_cpu; #endif -cc->cpu_exec_enter = ppc_cpu_exec_enter; #if defined(CONFIG_SOFTMMU) cc->write_elf64_note = ppc64_cpu_write_elf64_note; cc->write_elf32_note = ppc32_cpu_write_elf32_note; -- 2.9.3
Re: [Qemu-devel] [PATCH] tcg: Initialize return value after exit_atomic
Richard Henderson writes: > Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize > the output. Even though this code is dead, it gets translated, and > without the initialization we encounter a tcg_error. > > Reported-by: Nikunj A Dadhania > Signed-off-by: Richard Henderson Tested-by: Nikunj A Dadhania > --- > tcg/tcg-op.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c > index 95a39b7..6b1f415 100644 > --- a/tcg/tcg-op.c > +++ b/tcg/tcg-op.c > @@ -2861,6 +2861,9 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv > addr, TCGv_i64 cmpv, > #endif > #else > gen_helper_exit_atomic(tcg_ctx.tcg_env); > +/* Produce a result, so that we have a well-formed opcode stream > + with respect to uses of the result in the (dead) code following. > */ > +tcg_gen_movi_i64(retv, 0); > #endif /* CONFIG_ATOMIC64 */ > } else { > TCGv_i32 c32 = tcg_temp_new_i32(); > @@ -2966,6 +2969,9 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, > TCGv_i64 val, > #endif > #else > gen_helper_exit_atomic(tcg_ctx.tcg_env); > +/* Produce a result, so that we have a well-formed opcode stream > + with respect to uses of the result in the (dead) code following. > */ > +tcg_gen_movi_i64(ret, 0); > #endif /* CONFIG_ATOMIC64 */ > } else { > TCGv_i32 v32 = tcg_temp_new_i32(); > -- > 2.9.3
Re: [Qemu-devel] [PATCH] tcg: Initialize return value after exit_atomic
Nikunj A Dadhania writes: > aNikunj A Dadhania writes: > >> Richard Henderson writes: >> >>> On 04/25/2017 01:21 PM, Nikunj A Dadhania wrote: >>>> Richard Henderson writes: >>>> >>>>> Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize >>>>> the output. Even though this code is dead, it gets translated, and >>>>> without the initialization we encounter a tcg_error. >>>>> >>>>> Reported-by: Nikunj A Dadhania >>>>> Signed-off-by: Richard Henderson >>>> >>>> With this the tcg_error goes away. >>>> >>>> But then powernv skiboot code [1] enters into infinite loop. Basically, >>>> in target/ppc/translate.c:gen_conditional_store(), setcond_tl will >>>> always fail, and CRF_EQ_BIT will never be set, the lock will never be >>>> taken. >>> >>> The setcond_tl *shouldn't* always fail. >> >> Correct, in fact we never get here it. >> >>> If that's the case, then we have another bug in the !parallel_cpus >>> code path for gen_conditional_store. >> >> Something interesting is happening, I have instrumented the code such >> that I get some prints for load with reservation and store conditional: >> >> First case is the success case for 32bit atomic_cmpxchg. 
>> >> $ ./configure --target-list=ppc64-softmmu --cc=clang --host-cc=clang >> $ ./ppc64-softmmu/qemu-system-ppc64 -machine powernv,usb=off -vga none >> -nographic >> [lwarx] >> helper_myprint: t0 cafe t1 cafe >> helper_myprint: t0 cafe0001 t1 cafe0001 >> helper_myprint: t0 cafe0002 t1 cafe0002 >> helper_myprint: t0 f0 t1 0 >> [stwcx] >> helper_myprint: t0 dead t1 dead >> helper_myprint: t0 f0 t1 0 >> helper_myprint: t0 dead0001 t1 dead0001 >> helper_myprint: t0 dead0011 t1 dead0011 >> helper_myprint: t0 0 t1 0 >> [success as t0 and cpu_reserve_val is same] >> >> [ldarx] >> helper_myprint: t0 cafe t1 cafe >> helper_myprint: t0 cafe0001 t1 cafe0001 >> helper_myprint: t0 cafe0002 t1 cafe0002 >> helper_myprint: t0 30200018 t1 0 >> >> [ cpu_reserve = 30200018, cpu_reserve_val = 0 after load ] >> >> [stdcx] >> helper_myprint: t0 dead t1 dead >> helper_myprint: t0 30200018 t1 0 >> helper_myprint: t0 dead0001 t1 dead0001 >> >> [ That is before atomic_cmpxchg_tl, and suddenly we exit out, we did >> not reach setcond_tl ] >> >> helper_myprint: t0 dead t1 dead >> >> [ re-entering gen_store_conditional() ] >> >> helper_myprint: t0 t1 0 >> >> [ cpu_reserve is corrupted ] >> > > That is because of the following: > > tcg_gen_atomic_cmpxchg_tl() > -> gen_helper_exit_atomic() > -> HELPER(exit_atomic) > -> cpu_loop_exit_atomic() -> EXCP_ATOMIC > -> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC > -> cpu_exec_step_atomic() > -> cpu_step_atomic() > -> cc->cpu_exec_enter() = ppc_cpu_exec_enter() > Sets env->reserve_addr = -1; > > So when we re-enter back, reserve_addr is rubbed out. After getting rid > of this reset of reserve_addr, I do get ahead a bit and stdcx is > successful. > > > [ldarx] > helper_myprint: t0 cafe t1 cafe > helper_myprint: t0 cafe0001 t1 cafe0001 > helper_myprint: t0 cafe0002 t1 cafe0002 > helper_myprint: t0 30200018 t1 0 > > [stdcx.] 
> > helper_myprint: t0 dead t1 dead > helper_myprint: t0 30200018 t1 0 > helper_myprint: t0 dead0001 t1 dead0001 > > [ re-enters ] > > helper_myprint: t0 dead t1 dead > > [ cpu_reserve valud is intact now ] > > helper_myprint: t0 30200018 t1 0 > helper_myprint: t0 dead0001 t1 dead0001 > helper_myprint: t0 dead0011 t1 dead0011 > helper_myprint: t0 0 t1 0 > > [ And there is a match ] > > But then the code is getting stuck here. Ok, that was due to some debug code that I had added in skiboot. I confirm that with this patch and the below change in target/ppc/translate_init.c, I am able pass powernv boot serial test. diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c index 77e5463..4eb0219 100644 --- a/target/ppc/translate_init.c +++ b/target/ppc/translate_init.c @@ -10428,10 +10428,6 @@ static bool ppc_cpu_has_work(CPUState *cs) static void ppc_cpu_exec_enter(CPUState *cs) { -PowerPCCPU *cpu = POWERPC_CPU(cs); -CPUPPCState *env = &cpu->env; - -env->reserve_addr = -1; } /* CPUClass::reset() */ I will need to see the implication of this in other context. Regards, Nikunj
Re: [Qemu-devel] [PATCH] tcg: Initialize return value after exit_atomic
Nikunj A Dadhania writes: > Richard Henderson writes: > >> On 04/25/2017 01:21 PM, Nikunj A Dadhania wrote: >>> Richard Henderson writes: >>> >>>> Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize >>>> the output. Even though this code is dead, it gets translated, and >>>> without the initialization we encounter a tcg_error. >>>> >>>> Reported-by: Nikunj A Dadhania >>>> Signed-off-by: Richard Henderson >>> >>> With this the tcg_error goes away. >>> >>> But then powernv skiboot code [1] enters into infinite loop. Basically, >>> in target/ppc/translate.c:gen_conditional_store(), setcond_tl will >>> always fail, and CRF_EQ_BIT will never be set, the lock will never be >>> taken. >> >> The setcond_tl *shouldn't* always fail. > > Correct, in fact we never get here it. > >> If that's the case, then we have another bug in the !parallel_cpus >> code path for gen_conditional_store. > > Something interesting is happening, I have instrumented the code such > that I get some prints for load with reservation and store conditional: > > First case is the success case for 32bit atomic_cmpxchg. 
> > $ ./configure --target-list=ppc64-softmmu --cc=clang --host-cc=clang > $ ./ppc64-softmmu/qemu-system-ppc64 -machine powernv,usb=off -vga none > -nographic > [lwarx] > helper_myprint: t0 cafe t1 cafe > helper_myprint: t0 cafe0001 t1 cafe0001 > helper_myprint: t0 cafe0002 t1 cafe0002 > helper_myprint: t0 f0 t1 0 > [stwcx] > helper_myprint: t0 dead t1 dead > helper_myprint: t0 f0 t1 0 > helper_myprint: t0 dead0001 t1 dead0001 > helper_myprint: t0 dead0011 t1 dead0011 > helper_myprint: t0 0 t1 0 > [success as t0 and cpu_reserve_val is same] > > [ldarx] > helper_myprint: t0 cafe t1 cafe > helper_myprint: t0 cafe0001 t1 cafe0001 > helper_myprint: t0 cafe0002 t1 cafe0002 > helper_myprint: t0 30200018 t1 0 > > [ cpu_reserve = 30200018, cpu_reserve_val = 0 after load ] > > [stdcx] > helper_myprint: t0 dead t1 dead > helper_myprint: t0 30200018 t1 0 > helper_myprint: t0 dead0001 t1 dead0001 > > [ That is before atomic_cmpxchg_tl, and suddenly we exit out, we did > not reach setcond_tl ] > > helper_myprint: t0 dead t1 dead > > [ re-entering gen_store_conditional() ] > > helper_myprint: t0 t1 0 > > [ cpu_reserve is corrupted ] > That is because of the following: tcg_gen_atomic_cmpxchg_tl() -> gen_helper_exit_atomic() -> HELPER(exit_atomic) -> cpu_loop_exit_atomic() -> EXCP_ATOMIC -> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC -> cpu_exec_step_atomic() -> cpu_step_atomic() -> cc->cpu_exec_enter() = ppc_cpu_exec_enter() Sets env->reserve_addr = -1; So when we re-enter back, reserve_addr is rubbed out. After getting rid of this reset of reserve_addr, I do get ahead a bit and stdcx is successful. [ldarx] helper_myprint: t0 cafe t1 cafe helper_myprint: t0 cafe0001 t1 cafe0001 helper_myprint: t0 cafe0002 t1 cafe0002 helper_myprint: t0 30200018 t1 0 [stdcx.] 
helper_myprint: t0 dead t1 dead helper_myprint: t0 30200018 t1 0 helper_myprint: t0 dead0001 t1 dead0001 [ re-enters ] helper_myprint: t0 dead t1 dead [ cpu_reserve valud is intact now ] helper_myprint: t0 30200018 t1 0 helper_myprint: t0 dead0001 t1 dead0001 helper_myprint: t0 dead0011 t1 dead0011 helper_myprint: t0 0 t1 0 [ And there is a match ] But then the code is getting stuck here. Regards Nikunj
Re: [Qemu-devel] [PATCH] tcg: Initialize return value after exit_atomic
Richard Henderson writes: > Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize > the output. Even though this code is dead, it gets translated, and > without the initialization we encounter a tcg_error. > > Reported-by: Nikunj A Dadhania > Signed-off-by: Richard Henderson With this the tcg_error goes away. But then powernv skiboot code [1] enters into infinite loop. Basically, in target/ppc/translate.c:gen_conditional_store(), setcond_tl will always fail, and CRF_EQ_BIT will never be set, the lock will never be taken. So "make check" still fails at powernv serial test. ./configure --target-list=ppc64-softmmu --cc=clang --host-cc=clang && make && make check > --- > tcg/tcg-op.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c > index 95a39b7..6b1f415 100644 > --- a/tcg/tcg-op.c > +++ b/tcg/tcg-op.c > @@ -2861,6 +2861,9 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv > addr, TCGv_i64 cmpv, > #endif > #else > gen_helper_exit_atomic(tcg_ctx.tcg_env); > +/* Produce a result, so that we have a well-formed opcode stream > + with respect to uses of the result in the (dead) code following. > */ > +tcg_gen_movi_i64(retv, 0); > #endif /* CONFIG_ATOMIC64 */ > } else { > TCGv_i32 c32 = tcg_temp_new_i32(); > @@ -2966,6 +2969,9 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, > TCGv_i64 val, > #endif > #else > gen_helper_exit_atomic(tcg_ctx.tcg_env); > +/* Produce a result, so that we have a well-formed opcode stream > + with respect to uses of the result in the (dead) code following. > */ > +tcg_gen_movi_i64(ret, 0); > #endif /* CONFIG_ATOMIC64 */ > } else { > TCGv_i32 v32 = tcg_temp_new_i32(); > -- Regards, Nikunj 1. https://github.com/open-power/skiboot/blob/master/asm/lock.S#L36
Re: [Qemu-devel] [PATCH RFC] configure: fix clang failure for libatomic
Peter Maydell writes: > On 25 April 2017 at 09:58, Nikunj A Dadhania > wrote: >> I was trying out the program in the configure script with clang and I do >> get errors without libatomic: >> >> $ clang /tmp/atomic.c >> /tmp/atomic.c:6:7: warning: implicit declaration of function >> '__atomic_load_8' is invalid in C99 [-Wimplicit-function-declaration] >> y = __atomic_load_8(&x, 0); >> ^ >> /tmp/atomic.c:7:3: warning: implicit declaration of function >> '__atomic_store_8' is invalid in C99 [-Wimplicit-function-declaration] >> __atomic_store_8(&x, y, 0); >> ^ >> /tmp/atomic.c:8:3: warning: implicit declaration of function >> '__atomic_compare_exchange_8' is invalid in C99 >> [-Wimplicit-function-declaration] >> __atomic_compare_exchange_8(&x, &y, x, 0, 0, 0); >> ^ >> /tmp/atomic.c:9:3: warning: implicit declaration of function >> '__atomic_exchange_8' is invalid in C99 [-Wimplicit-function-declaration] >> __atomic_exchange_8(&x, y, 0); >> ^ >> /tmp/atomic.c:10:3: warning: implicit declaration of function >> '__atomic_fetch_add_8' is invalid in C99 [-Wimplicit-function-declaration] >> __atomic_fetch_add_8(&x, y, 0); >> ^ >> 5 warnings generated. >> /tmp/atomic-1660e0.o: In function `main': >> /tmp/atomic.c:(.text+0x28): undefined reference to `__atomic_load_8' >> /tmp/atomic.c:(.text+0x40): undefined reference to `__atomic_store_8' >> /tmp/atomic.c:(.text+0x69): undefined reference to >> `__atomic_compare_exchange_8' >> /tmp/atomic.c:(.text+0x7d): undefined reference to `__atomic_exchange_8' >> /tmp/atomic.c:(.text+0x91): undefined reference to `__atomic_fetch_add_8' >> clang-3.9: error: linker command failed with exit code 1 (use -v to see >> invocation) >> >> With -latomic, the linker succeeds in getting the binary. > > What host is this on ? $ uname -a Linux abhimanyu 4.9.13-200.fc25.x86_64 #1 SMP Mon Feb 27 16:48:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux $ Regards Nikunj
Re: [Qemu-devel] [PATCH RFC] configure: fix clang failure for libatomic
Peter Maydell writes: > On 25 April 2017 at 09:35, Nikunj A Dadhania > wrote: >> Travis builds failure was reported for powernv boot-serial test with >> qemu built with clang. >> >> Debugging revealed that CONFIG_ATOMIC64 wasnt getting set for the clang >> build because of that atomic operations weren't being used and was >> resulting in MTTCG failure in the powernv boot-serial test. >> >> libatomic is required to successfully test atomic64 and atomic128 for >> clang. Introduced newer checks for the same. And on failure default to >> single threaded tcg support in PPC64. >> >> Signed-off-by: Nikunj A Dadhania > > Do we really need libatomic? I was trying out the program in the configure script with clang and I do get errors without libatomic: $ clang /tmp/atomic.c /tmp/atomic.c:6:7: warning: implicit declaration of function '__atomic_load_8' is invalid in C99 [-Wimplicit-function-declaration] y = __atomic_load_8(&x, 0); ^ /tmp/atomic.c:7:3: warning: implicit declaration of function '__atomic_store_8' is invalid in C99 [-Wimplicit-function-declaration] __atomic_store_8(&x, y, 0); ^ /tmp/atomic.c:8:3: warning: implicit declaration of function '__atomic_compare_exchange_8' is invalid in C99 [-Wimplicit-function-declaration] __atomic_compare_exchange_8(&x, &y, x, 0, 0, 0); ^ /tmp/atomic.c:9:3: warning: implicit declaration of function '__atomic_exchange_8' is invalid in C99 [-Wimplicit-function-declaration] __atomic_exchange_8(&x, y, 0); ^ /tmp/atomic.c:10:3: warning: implicit declaration of function '__atomic_fetch_add_8' is invalid in C99 [-Wimplicit-function-declaration] __atomic_fetch_add_8(&x, y, 0); ^ 5 warnings generated. 
/tmp/atomic-1660e0.o: In function `main': /tmp/atomic.c:(.text+0x28): undefined reference to `__atomic_load_8' /tmp/atomic.c:(.text+0x40): undefined reference to `__atomic_store_8' /tmp/atomic.c:(.text+0x69): undefined reference to `__atomic_compare_exchange_8' /tmp/atomic.c:(.text+0x7d): undefined reference to `__atomic_exchange_8' /tmp/atomic.c:(.text+0x91): undefined reference to `__atomic_fetch_add_8' clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation) With -latomic, the linker succeeds in getting the binary. > I thought the intention here was that atomic operations were only done > on types of width of the pointer or less, which should all be doable > by the compiler inline, and thus don't need the external library. You are right, even without -latomic in libs_softmmu, tests are passing. But then __atomic_load_8() isn't used in qemu, if it was using them, build would fail. > In the past "doesn't work without libatomic" usually meant >"accidentally tried to do an atomic operation on a type that is too >wide". Regards Nikunj
[Qemu-devel] [PATCH RFC] configure: fix clang failure for libatomic
Travis builds failure was reported for powernv boot-serial test with qemu built with clang. Debugging revealed that CONFIG_ATOMIC64 wasnt getting set for the clang build because of that atomic operations weren't being used and was resulting in MTTCG failure in the powernv boot-serial test. libatomic is required to successfully test atomic64 and atomic128 for clang. Introduced newer checks for the same. And on failure default to single threaded tcg support in PPC64. Signed-off-by: Nikunj A Dadhania --- Reference: https://lists.gnu.org/archive/html/qemu-ppc/2017-04/msg00277.html --- configure | 16 1 file changed, 16 insertions(+) diff --git a/configure b/configure index d31a3e8..1e5f7af 100755 --- a/configure +++ b/configure @@ -4598,6 +4598,9 @@ int main(void) EOF if compile_prog "" "" ; then atomic128=yes + elif compile_prog "" "-latomic" ; then + atomic128=yes + lib_atomic="-latomic" fi fi @@ -4628,6 +4631,9 @@ int main(void) EOF if compile_prog "" "" ; then atomic64=yes +elif compile_prog "" "-latomic" ; then + atomic64=yes + lib_atomic="-latomic" fi @@ -6065,6 +6071,16 @@ if [ "$TARGET_BASE_ARCH" = "" ]; then TARGET_BASE_ARCH=$TARGET_ARCH fi +if test $atomic64 == "yes" || test $atomic128 == "yes" ; then +libs_softmmu="$lib_atomic $libs_softmmu" +elif test $mttcg == "yes" && test $TARGET_BASE_ARCH == "ppc"; then +echo +echo "Note: Atomic library (-latomic) not available, falling" +echo " back to single threaded mode by default" +echo +mttcg=no +fi + symlink "$source_path/Makefile.target" "$target_dir/Makefile" upper() { -- 2.9.3
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
Cédric Le Goater writes: > On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote: >> Cédric Le Goater writes: >> >>> On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >>>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>>>> Cédric Le Goater writes: >>>> >>>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >>>> patches and tell you. >>> >>> The system seems to be spinning in skiboot in cpu_idle/relax when >>> starting the linux kernel. It finally boots, but it is rather long. >>> David has merged enough to test if you want to give it a try. >> >> I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing >> them. I too saw delay during boot, but wasn't aware that its caused by >> mttcg. I will have a look. > > You can use David's branch directly now, there is enough support. Sure > I am not sure where that is exactly, the kernel is somewhere in > early_setup(). It might be the secondary spinloop. Lot of prints missing, i think I need to add a console. [2.303286014,5] INIT: Starting kernel at 0x2001, fdt at 0x30354908 14865 bytes) [ 43.421998779,5] OPAL: Switch to little-endian OS -> smp_release_cpus() spinning_secondaries = 3 <- smp_release_cpus() [0.260526] nvram: Failed to find or create lnx,oops-log partition, err -28 [0.264448] nvram: Failed to initialize oops partition! Regards, Nikunj
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
Cédric Le Goater writes: > On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>> Cédric Le Goater writes: >>> >>>> Hello Nikunj, >>>> >>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>>> The series enables Multi-Threaded TCG on PPC64 >>>>> >>>>> Patch 01: Use atomic_cmpxchg in store conditional >>>>> 02: Handle first write to page during atomic operation >>>>> 03: Generate memory barriers for sync/isync and load/store >>>>> conditional >>>>> >>>>> Patches are based on ppc-for-2.10 >>>>> >>>>> Tested using following: >>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic >>>>> -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel >>>>> tcg,thread=multi f23.img >>>> >>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>>> good : the CPU usage of QEMU reached 760% on the host. >>> >>> Cool. >>> >>>>> Todo: >>>>> * Enable other machine types and PPC32. >>>> >>>> I am quite ignorant on the topic. >>>> Have you looked at what it would take to emulate support of the HW >>>> threads ? >>> >>> We would need to implement msgsndp (doorbell support for IPI between >>> threads of same core) >> >> ok. I get it. Thanks, >> >>>> and the PowerNV machine ? >>> >>> Haven't tried it, should work. Just give a shot, let me know if you see >>> problems. >> >> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >> patches and tell you. > > The system seems to be spinning in skiboot in cpu_idle/relax when > starting the linux kernel. It finally boots, but it is rather long. > David has merged enough to test if you want to give it a try. I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing them. I too saw delay during boot, but wasn't aware that its caused by mttcg. I will have a look. Regards Nikunj
[Qemu-devel] [PATCH for 2.10] tcg: enable MTTCG by default for PPC64 on x86
This enables the multi-threaded system emulation by default for PPC64 guests using the x86_64 TCG back-end. Signed-off-by: Nikunj A Dadhania --- Depends on following patch which fixes the define name: https://patchwork.ozlabs.org/patch/748840/ --- configure| 2 ++ target/ppc/cpu.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/configure b/configure index 4b3b5cd..2a87495 100755 --- a/configure +++ b/configure @@ -6008,12 +6008,14 @@ case "$target_name" in ppc64) TARGET_BASE_ARCH=ppc TARGET_ABI_DIR=ppc +mttcg=yes gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml" ;; ppc64le) TARGET_ARCH=ppc64 TARGET_BASE_ARCH=ppc TARGET_ABI_DIR=ppc +mttcg=yes gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml power-vsx.xml" ;; ppc64abi32) diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h index e0ff041..ece535d 100644 --- a/target/ppc/cpu.h +++ b/target/ppc/cpu.h @@ -30,6 +30,8 @@ #define TARGET_LONG_BITS 64 #define TARGET_PAGE_BITS 12 +#define TCG_GUEST_DEFAULT_MO 0 + /* Note that the official physical address space bits is 62-M where M is implementation dependent. I've not looked up M for the set of cpus we emulate at the system level. */ -- 2.9.3
Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 0/3] Enable MTTCG on PPC64
Alex Bennée writes: > Nikunj A Dadhania writes: > >> Alex Bennée writes: >> >>> luigi burdo writes: >>> >>>> Hi David and Nikuji, >>>> >>>> can i suggest to remove the message: >>>> >>>> >>>> Guest not yet converted to MTTCG - you may get unexpected results >>>> where the mttcg is enabled? >>> >>> Have you declared the memory ordering for the guest? >> >> No, I havent done that yet, will send a patch. > > You also need to update configure so mttcg="yes" (resulting in > TARGET_SUPPORTS_MTTCG being set for the build). Yes, I have that patch for ppc64, will send it to the list. Regards Nikunj
[Qemu-devel] [PATCH fix for-2.9] cpus: fix wrong define name
While the configure script generates the TARGET_SUPPORTS_MTTCG define, one of the checks in cpus.c uses the wrong name: TARGET_SUPPORT_MTTCG

Signed-off-by: Nikunj A Dadhania
---
 cpus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 68fdbc4..58d90aa 100644
--- a/cpus.c
+++ b/cpus.c
@@ -202,7 +202,7 @@ void qemu_tcg_configure(QemuOpts *opts, Error **errp)
 } else if (use_icount) {
 error_setg(errp, "No MTTCG when icount is enabled");
 } else {
-#ifndef TARGET_SUPPORT_MTTCG
+#ifndef TARGET_SUPPORTS_MTTCG
 error_report("Guest not yet converted to MTTCG - "
 "you may get unexpected results");
 #endif
--
2.9.3
Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 0/3] Enable MTTCG on PPC64
Alex Bennée writes:

> luigi burdo writes:
>
>> Hi David and Nikuji,
>>
>> can i suggest to remove the message:
>>
>> Guest not yet converted to MTTCG - you may get unexpected results
>>
>> where the mttcg is enabled?
>
> Have you declared the memory ordering for the guest?

No, I haven't done that yet, will send a patch.

>> another thing im finding is this message
>> Guest expects a stronger memory ordering than the host provides
>> This may cause strange/hard to debug errors
>
> See ca759f9e387db87e1719911f019bc60c74be9ed8 for an example.

Regards
Nikunj
[Qemu-devel] [PATCH v2 3/3] target/ppc: Generate fence operations
Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 8 1 file changed, 8 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 50b6d4d..4a1f24a 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -2971,6 +2971,7 @@ static void gen_stswx(DisasContext *ctx) /* eieio */ static void gen_eieio(DisasContext *ctx) { +tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC); } #if !defined(CONFIG_USER_ONLY) @@ -3008,6 +3009,7 @@ static void gen_isync(DisasContext *ctx) if (!ctx->pr) { gen_check_tlb_flush(ctx, false); } +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); gen_stop_exception(ctx); } @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ tcg_gen_mov_tl(cpu_reserve, t0); \ tcg_gen_mov_tl(cpu_reserve_val, gpr);\ +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ tcg_temp_free(t0); \ } @@ -3177,6 +3180,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, tcg_gen_br(l2); gen_set_label(l1); + +/* Address mismatch implies failure. But we still need to provide the + memory barrier semantics of the instruction. */ +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); gen_set_label(l2); @@ -3308,6 +3315,7 @@ static void gen_sync(DisasContext *ctx) if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) { gen_check_tlb_flush(ctx, true); } +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); } /* wait */ -- 2.9.3
[Qemu-devel] [PATCH v2 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. However, portable parallel code is written assuming only cmpxchg which means that in practice this is a viable alternative. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index f40b5a1..50b6d4d 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -73,6 +73,7 @@ static TCGv cpu_cfar; #endif static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; static TCGv cpu_reserve; +static TCGv cpu_reserve_val; static TCGv cpu_fpscr; static TCGv_i32 cpu_access_type; @@ -181,6 +182,9 @@ void ppc_translate_init(void) cpu_reserve = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, reserve_addr), "reserve_addr"); +cpu_reserve_val = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, reserve_val), + "reserve_val"); cpu_fpscr = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, fpscr), "fpscr"); @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \ }\ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ tcg_gen_mov_tl(cpu_reserve, t0); \ -tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \ +tcg_gen_mov_tl(cpu_reserve_val, gpr);\ tcg_temp_free(t0); \ } @@ -3155,14 +3159,27 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, static void gen_conditional_store(DisasContext *ctx, TCGv EA, int reg, int memop) { -TCGLabel *l1; +TCGLabel *l1 = gen_new_label(); +TCGLabel *l2 = gen_new_label(); +TCGv t0; -tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); -l1 = gen_new_label(); tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); + +t0 = tcg_temp_new(); +tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, + cpu_gpr[reg], ctx->mem_idx, + DEF_MEMOP(memop) | MO_ALIGN); +tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, 
cpu_reserve_val); +tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); +tcg_gen_or_tl(t0, t0, cpu_so); +tcg_gen_trunc_tl_i32(cpu_crf[0], t0); +tcg_temp_free(t0); +tcg_gen_br(l2); + gen_set_label(l1); +tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); + +gen_set_label(l2); tcg_gen_movi_tl(cpu_reserve, -1); } #endif -- 2.9.3
[Qemu-devel] [PATCH v2 0/3] Enable MTTCG on PPC64
The series enables Multi-Threaded TCG on PPC64 Patch 01: Use atomic_cmpxchg in store conditional 02: Handle first write to page during atomic operation 03: Generate memory barriers for sync/isync and load/store conditional Patches are based on ppc-for-2.10 Changelog: v1: * Rewrote store_conditional as suggested by Richard Tested using following: ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img Todo: * Implement lqarx and stqcx * Enable other machine types and PPC32. * More testing for corner cases. Nikunj A Dadhania (3): target/ppc: Emulate LL/SC using cmpxchg helpers cputlb: handle first atomic write to the page target/ppc: Generate fence operations cputlb.c | 8 +++- target/ppc/translate.c | 37 +++-- 2 files changed, 38 insertions(+), 7 deletions(-) -- 2.9.3
[Qemu-devel] [PATCH v2 2/3] cputlb: handle first atomic write to the page
In case where the conditional write is the first write to the page, TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as a special case and set the dirty bit. After that fall through to the actual atomic instruction below. Signed-off-by: Nikunj A Dadhania Reviewed-by: Richard Henderson --- cputlb.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index f5d056c..743776a 100644 --- a/cputlb.c +++ b/cputlb.c @@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, tlb_addr = tlbe->addr_write; } -/* Notice an IO access, or a notdirty page. */ +/* Check notdirty */ +if (unlikely(tlb_addr & TLB_NOTDIRTY)) { +tlb_set_dirty(ENV_GET_CPU(env), addr); +tlb_addr = tlb_addr & ~TLB_NOTDIRTY; +} + +/* Notice an IO access */ if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) { /* There's really nothing that can be done to support this apart from stop-the-world. */ -- 2.9.3
Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
David Gibson writes: > [ Unknown signature status ] > On Thu, Apr 06, 2017 at 03:52:47PM +0530, Nikunj A Dadhania wrote: >> Emulating LL/SC with cmpxchg is not correct, since it can suffer from >> the ABA problem. However, portable parallel code is written assuming >> only cmpxchg which means that in practice this is a viable alternative. >> >> Signed-off-by: Nikunj A Dadhania >> --- >> target/ppc/translate.c | 24 +--- >> 1 file changed, 21 insertions(+), 3 deletions(-) >> >> diff --git a/target/ppc/translate.c b/target/ppc/translate.c >> index b6abc60..a9c733d 100644 >> --- a/target/ppc/translate.c >> +++ b/target/ppc/translate.c >> @@ -73,6 +73,7 @@ static TCGv cpu_cfar; >> #endif >> static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; >> static TCGv cpu_reserve; >> +static TCGv cpu_reserve_val; >> static TCGv cpu_fpscr; >> static TCGv_i32 cpu_access_type; >> >> @@ -181,6 +182,9 @@ void ppc_translate_init(void) >> cpu_reserve = tcg_global_mem_new(cpu_env, >> offsetof(CPUPPCState, reserve_addr), >> "reserve_addr"); >> +cpu_reserve_val = tcg_global_mem_new(cpu_env, >> + offsetof(CPUPPCState, reserve_val), >> + "reserve_val"); > > I notice that lqarx is not updated. Does that matter? Thats correct, I haven't touched that yet. Most of the locks are implemented using lwarx/stwcx. Regards Nikunj
Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
Cédric Le Goater writes: > Hello Nikunj, > > On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >> The series enables Multi-Threaded TCG on PPC64 >> >> Patch 01: Use atomic_cmpxchg in store conditional >> 02: Handle first write to page during atomic operation >> 03: Generate memory barriers for sync/isync and load/store conditional >> >> Patches are based on ppc-for-2.10 >> >> Tested using following: >> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine >> pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi >> f23.img > > I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked > good : the CPU usage of QEMU reached 760% on the host. Cool. >> Todo: >> * Enable other machine types and PPC32. > > I am quite ignorant on the topic. > Have you looked at what it would take to emulate support of the HW > threads ? We would need to implement msgsndp (doorbell support for IPI between threads of same core) > and the PowerNV machine ? Haven't tried it, should work. Just give a shot, let me know if you see problems. Regards Nikunj
Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
Richard Henderson writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) >> \ >> tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ >> tcg_gen_mov_tl(cpu_reserve, t0); \ >> tcg_gen_mov_tl(cpu_reserve_val, gpr);\ >> +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ > > Please put the barrier next to the load. > I hope we can merge these soon. Sure > >> @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) >> \ >> if (len > 1) { \ >> gen_check_align(ctx, t0, (len) - 1);\ >> } \ >> +tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \ >> gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \ >> tcg_temp_free(t0); \ >> } > > This one is more complicated... > > The success path includes an atomic_cmpxchg, which itself is a SC barrier. > However, the fail path branches across the cmpxchg and so sees no barrier, > but > one is still required by the architecture. How about a gen_conditional_store > that looks like this: > > { >TCGLabel *l1 = gen_new_label(); >TCGLabel *l2 = gen_new_label(); >TCGv t0; > >tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); > >t0 = tcg_temp_new(); >tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, > cpu_gpr[reg], ctx->mem_idx, > DEF_MEMOP(memop) | MO_ALIGN); >tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); >tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); >tcg_gen_or_tl(t0, t0, cpu_so); >tcg_gen_trunc_tl(cpu_crf[0], t0); >tcg_temp_free(t0); >tcg_gen_br(l2); > >gen_set_label(l1); > >/* Address mismatch implies failure. But we still need to provide the > memory barrier semantics of the instruction. */ >tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); >tcg_gen_trunc_tl(cpu_crf[0], cpu_so); > >gen_set_label(l2); >tcg_gen_movi_tl(cpu_reserve, -1); > } > > Note that you don't need to reset cpu_reserve_val at all. Sure. 
> I just thought of something we might need to check for this and other ll/sc > implemetations -- do we need to check for address misalignment along the > address comparison failure path? We do that in the macro: if (len > 1) { \ gen_check_align(ctx, t0, (len) - 1);\ } \ Would we still need a barrier before the alignment check? > I suspect that we do. Regards Nikunj
Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
Richard Henderson writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); >> l1 = gen_new_label(); >> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); >> -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); >> -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); >> + >> +t0 = tcg_temp_new(); >> +tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], >> + ctx->mem_idx, DEF_MEMOP(memop)); > > Actually, I noticed another, existing, problem. > > This code changes CRF[0] before the user memory write, which might fault. > This > needs to delay any changes to the architecture visible state until after any > exception may be triggered. Sure, here you are mentioning cpu_so being moved to CRF. Regards Nikunj
Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
Richard Henderson writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> +TCGv_i32 tmp = tcg_temp_local_new_i32(); >> +TCGv t0; >> >> +tcg_gen_movi_i32(tmp, 0); >> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); >> l1 = gen_new_label(); >> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); >> -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); >> -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); >> + >> +t0 = tcg_temp_new(); >> +tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], >> + ctx->mem_idx, DEF_MEMOP(memop)); >> +tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); >> +tcg_gen_trunc_tl_i32(tmp, t0); >> + >> gen_set_label(l1); >> +tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); >> +tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); > > I encourage you to move these two lines up beside the setcond. > That way you don't need to use a local tmp, which implies a > spill/restore from the stack. Sure. Regards Nikunj
[Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. However, portable parallel code is written assuming only cmpxchg which means that in practice this is a viable alternative. Signed-off-by: Nikunj A Dadhania --- target/ppc/translate.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index b6abc60..a9c733d 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -73,6 +73,7 @@ static TCGv cpu_cfar; #endif static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; static TCGv cpu_reserve; +static TCGv cpu_reserve_val; static TCGv cpu_fpscr; static TCGv_i32 cpu_access_type; @@ -181,6 +182,9 @@ void ppc_translate_init(void) cpu_reserve = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, reserve_addr), "reserve_addr"); +cpu_reserve_val = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, reserve_val), + "reserve_val"); cpu_fpscr = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, fpscr), "fpscr"); @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \ }\ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\ tcg_gen_mov_tl(cpu_reserve, t0); \ -tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \ +tcg_gen_mov_tl(cpu_reserve_val, gpr);\ tcg_temp_free(t0); \ } @@ -3156,14 +3160,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, int reg, int memop) { TCGLabel *l1; +TCGv_i32 tmp = tcg_temp_local_new_i32(); +TCGv t0; +tcg_gen_movi_i32(tmp, 0); tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); l1 = gen_new_label(); tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); + +t0 = tcg_temp_new(); +tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], + ctx->mem_idx, DEF_MEMOP(memop)); +tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); +tcg_gen_trunc_tl_i32(tmp, t0); + gen_set_label(l1); 
+tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); +tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); tcg_gen_movi_tl(cpu_reserve, -1); +tcg_gen_movi_tl(cpu_reserve_val, 0); + +tcg_temp_free(t0); +tcg_temp_free_i32(tmp); } #endif -- 2.9.3
[Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page
In case where the conditional write is the first write to the page, TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as a special case and set the dirty bit. After that fall through to the actual atomic instruction below. Signed-off-by: Nikunj A Dadhania --- cputlb.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index f5d056c..743776a 100644 --- a/cputlb.c +++ b/cputlb.c @@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, tlb_addr = tlbe->addr_write; } -/* Notice an IO access, or a notdirty page. */ +/* Check notdirty */ +if (unlikely(tlb_addr & TLB_NOTDIRTY)) { +tlb_set_dirty(ENV_GET_CPU(env), addr); +tlb_addr = tlb_addr & ~TLB_NOTDIRTY; +} + +/* Notice an IO access */ if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) { /* There's really nothing that can be done to support this apart from stop-the-world. */ -- 2.9.3
[Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
The series enables Multi-Threaded TCG on PPC64 Patch 01: Use atomic_cmpxchg in store conditional 02: Handle first write to page during atomic operation 03: Generate memory barriers for sync/isync and load/store conditional Patches are based on ppc-for-2.10 Tested using following: ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img Todo: * Enable other machine types and PPC32. * More testing for corner cases. Nikunj A Dadhania (3): target/ppc: Emulate LL/SC using cmpxchg helpers cputlb: handle first atomic write to the page target/ppc: Generate fence operations cputlb.c | 8 +++- target/ppc/translate.c | 29 ++--- 2 files changed, 33 insertions(+), 4 deletions(-) -- 2.9.3
[Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
Signed-off-by: Nikunj A Dadhania
---
 target/ppc/translate.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index a9c733d..87b4fe4 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2971,6 +2971,7 @@ static void gen_stswx(DisasContext *ctx)
 /* eieio */
 static void gen_eieio(DisasContext *ctx)
 {
+    tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC);
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -3008,6 +3009,7 @@ static void gen_isync(DisasContext *ctx)
     if (!ctx->pr) {
         gen_check_tlb_flush(ctx, false);
     }
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
     gen_stop_exception(ctx);
 }
 
@@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx)                 \
     tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);         \
     tcg_gen_mov_tl(cpu_reserve, t0);                          \
     tcg_gen_mov_tl(cpu_reserve_val, gpr);                     \
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);                    \
     tcg_temp_free(t0);                                        \
 }
 
@@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx)                 \
     if (len > 1) {                                            \
         gen_check_align(ctx, t0, (len) - 1);                  \
     }                                                         \
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);                    \
     gen_conditional_store(ctx, t0, rS(ctx->opcode), memop);   \
     tcg_temp_free(t0);                                        \
 }
 
@@ -3309,6 +3313,7 @@ static void gen_sync(DisasContext *ctx)
     if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) {
         gen_check_tlb_flush(ctx, true);
     }
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
 }
 
 /* wait */
-- 
2.9.3
Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
Alex Bennée writes:
> Nikunj A Dadhania writes:
>
>> Richard Henderson writes:
>>
>>> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>>>> +    /* Notice an IO access, or a notdirty page. */
>>>>> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> +        /* There's really nothing that can be done to
>>>>> +           support this apart from stop-the-world. */
>>>>> +        goto stop_the_world;
>>>> We are also triggering on TLB_NOTDIRTY here in the case where a
>>>> conditional write is the first write to a page. I don't know if a
>>>> stop_the_world is required at this point but we will need to ensure we
>>>> clear bits as notdirty_mem_write() does.
>>>
>>> You're quite right that we could probably special-case TLB_NOTDIRTY here
>>> such that (1) we needn't leave the cpu loop, and (2) needn't utilize the
>>> actual "write" part of notdirty_mem_write; just set the bits then fall
>>> through to the actual atomic instruction below.
>>
>> I do hit this case with ppc64, where I see that it is the first write to
>> the page and it exits from this every time, causing the kernel to print
>> soft-lockups.
>>
>> Can we add the special case here for NOTDIRTY, set the page as dirty,
>> and return successfully?
>
> Does the atomic step fall-back not work for you?

I looked further, and I do see that EXCP_ATOMIC gets executed in
qemu_tcg_cpu_thread_fn(), but I am not sure what is going wrong there.
The following snippet fixes the issue for me:

diff --git a/cputlb.c b/cputlb.c
index f5d056c..743776a 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
         tlb_addr = tlbe->addr_write;
     }
 
-    /* Notice an IO access, or a notdirty page. */
+    /* Check notdirty */
+    if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
+        tlb_set_dirty(ENV_GET_CPU(env), addr);
+        tlb_addr = tlb_addr & ~TLB_NOTDIRTY;
+    }
+
+    /* Notice an IO access */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
         /* There's really nothing that can be done to
            support this apart from stop-the-world. */

Regards
Nikunj
Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
Alex Bennée writes:
> Nikunj A Dadhania writes:
>
>> Richard Henderson writes:
>>
>>> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>>>> +    /* Notice an IO access, or a notdirty page. */
>>>>> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> +        /* There's really nothing that can be done to
>>>>> +           support this apart from stop-the-world. */
>>>>> +        goto stop_the_world;
>>>> We are also triggering on TLB_NOTDIRTY here in the case where a
>>>> conditional write is the first write to a page. I don't know if a
>>>> stop_the_world is required at this point but we will need to ensure we
>>>> clear bits as notdirty_mem_write() does.
>>>
>>> You're quite right that we could probably special-case TLB_NOTDIRTY here
>>> such that (1) we needn't leave the cpu loop, and (2) needn't utilize the
>>> actual "write" part of notdirty_mem_write; just set the bits then fall
>>> through to the actual atomic instruction below.
>>
>> I do hit this case with ppc64, where I see that it is the first write to
>> the page and it exits from this every time, causing the kernel to print
>> soft-lockups.
>>
>> Can we add the special case here for NOTDIRTY, set the page as dirty,
>> and return successfully?
>
> Does the atomic step fall-back not work for you?

No, it doesn't seem to; otherwise I wouldn't have seen the soft-lockups.

Regards
Nikunj
Re: [Qemu-devel] [PATCH v3 13/34] tcg: Add atomic helpers
Richard Henderson writes:
> On 09/12/2016 06:47 AM, Alex Bennée wrote:
>>> +    /* Notice an IO access, or a notdirty page. */
>>> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>> +        /* There's really nothing that can be done to
>>> +           support this apart from stop-the-world. */
>>> +        goto stop_the_world;
>> We are also triggering on TLB_NOTDIRTY here in the case where a
>> conditional write is the first write to a page. I don't know if a
>> stop_the_world is required at this point but we will need to ensure we
>> clear bits as notdirty_mem_write() does.
>
> You're quite right that we could probably special-case TLB_NOTDIRTY here
> such that (1) we needn't leave the cpu loop, and (2) needn't utilize the
> actual "write" part of notdirty_mem_write; just set the bits then fall
> through to the actual atomic instruction below.

I do hit this case with ppc64, where I see that it is the first write to
the page and it exits from this every time, causing the kernel to print
soft-lockups.

Can we add the special case here for NOTDIRTY, set the page as dirty,
and return successfully?

Regards
Nikunj