Hello,

Thank you all for helping us understand the issues with the --no-huge option.
As Konstantin mentioned at the end, I tried using the VFIO module instead of the igb_uio module. I enabled all the necessary parameters (IOMMU, virtualization, vfio-pci, VFIO permissions) and ran my code with the --no-huge option. At first it seemed to receive packets fine, but after a while it stopped receiving packets. I could temporarily avoid the issue by not calling rte_eth_tx_burst(). Also, when I looked at the received packets, they all contained 0s instead of actual data. Was there anything I missed in running with VFIO? I'm curious whether the --no-huge option has been confirmed to work with VFIO.

Thank you,
Younghwan

On 2015-11-25 11:12, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez Monroy
>> Sent: Wednesday, November 25, 2015 1:44 PM
>> To: Thomas Monjalon
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] no hugepage with UIO poll-mode driver
>>
>> On 25/11/2015 13:22, Thomas Monjalon wrote:
>>> 2015-11-25 12:02, Bruce Richardson:
>>>> On Wed, Nov 25, 2015 at 12:03:05PM +0100, Thomas Monjalon wrote:
>>>>> 2015-11-25 11:00, Bruce Richardson:
>>>>>> On Wed, Nov 25, 2015 at 11:23:57AM +0100, Thomas Monjalon wrote:
>>>>>>> 2015-11-25 10:08, Bruce Richardson:
>>>>>>>> On Wed, Nov 25, 2015 at 03:39:17PM +0900, Younghwan Go wrote:
>>>>>>>>> Hi Jianfeng,
>>>>>>>>>
>>>>>>>>> Thanks for the email. The rte mempool was successfully created without
>>>>>>>>> any error. Now the next problem is that rte_eth_rx_burst() is always
>>>>>>>>> returning 0, as if there were no packets to receive... Do you have any
>>>>>>>>> suggestion on what might be causing this issue? In the meantime, I will
>>>>>>>>> be digging through the ixgbe driver code to see what's going on.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Younghwan
>>>>>>>>>
>>>>>>>> The problem is that with --no-huge we don't have the physical address
>>>>>>>> of the memory to write to the network card.
>>>>>>>> That's why it's marked as for testing only.
>>>>>>> Even with rte_mem_virt2phy() + rte_mem_lock_page()?
>>>>>>>
>>>>>> With no-huge, we just set up a single memory segment at startup and set
>>>>>> its "physaddr" to be the virtual address.
>>>>>>
>>>>>>     /* hugetlbfs can be disabled */
>>>>>>     if (internal_config.no_hugetlbfs) {
>>>>>>         addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
>>>>>>                 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
>>>>>>         if (addr == MAP_FAILED) {
>>>>>>             RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>>>>>>                 strerror(errno));
>>>>>>             return -1;
>>>>>>         }
>>>>>>         mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>>>>> rte_mem_virt2phy() does not use memseg.phys_addr but /proc/self/pagemap:
>>>>>
>>>>>     /*
>>>>>      * the pfn (page frame number) are bits 0-54 (see
>>>>>      * pagemap.txt in linux Documentation)
>>>>>      */
>>>>>     physaddr = ((page & 0x7fffffffffffffULL) * page_size)
>>>>>         + ((unsigned long)virtaddr % page_size);
>>>>>
>>>> Yes, you are right. I was not aware that that function was used as part of
>>>> the mempool init, but now I see that rte_mempool_virt2phy() does indeed
>>>> call that function if hugepages are disabled, so my bad.
>>> Do you think we could move --no-huge in the main section (not only for
>>> testing)?
>> Hi,
>>
>> I think the main issue is going to be the HW descriptor queues.
>> AFAIK drivers now call rte_eth_dma_zone_reserve, which is basically a
>> wrapper around rte_memzone_reserve, to get a chunk of phys memory, and in
>> the case of --no-huge it is not going to be really phys contiguous.
>>
>> Ideally we would move and expand the functionality of the dma_zone_reserve
>> API to the EAL, so we could detect what page size we have and set the
>> boundary for such page size. dma_zone_reserve does something similar to work
>> on the Xen target by reserving memzones on 2MB boundaries.
> With xen we have a special kernel driver that allocates physically
> contiguous chunks of memory for us.
> So we can guarantee that each such chunk would be at least 2MB long.
> That's enough to allocate HW rings (the max HW ring size for, let's say,
> ixgbe is ~64KB).
> Here there is absolutely no guarantee that memory allocated by the kernel
> will be physically contiguous.
> Of course you can search through all the pages that you allocated, and most
> likely you'll find a contiguous chunk big enough for that.
> Another problem: mbufs.
> You need to be sure that each mbuf doesn't cross a page boundary
> (in case the next page is not adjacent to the current one).
> So you'll probably need to use rte_mempool_xmem_create() to allocate mbufs
> with no hugepages.
> BTW, as I remember, with vfio in place you should be able to do IO with the
> no-hugepages option, no?
> As it relies on vfio's ability to set up IOMMU tables for you.
> Konstantin
>
>> Sergio