>>> +/*
>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMREG
>>> + * to register user memory which should be used to store the packet
>>> + * data.
>>> + *
>>> + * There are some constraints for the memory being registered:
>>> + * - The memory area has to be memory page size aligned.
>>> + * - The frame size has to be a power of 2.
>>> + * - The frame size cannot be smaller than 2048B.
>>> + * - The frame size cannot be larger than the memory page size.
>>> + *
>>> + * Corollary: The number of frames that can be stored is
>>> + * len / frame_size.
>>> + *
>>> + */
>>> +struct tpacket_memreg_req {
>>> +	unsigned long addr;		/* Start of packet data area */
>>> +	unsigned long len;		/* Length of packet data area */
>>> +	unsigned int frame_size;	/* Frame size */
>>> +	unsigned int data_headroom;	/* Frame head room */
>>> +};
>>
>> Existing packet sockets take a tpacket_req, allocate memory and let the
>> user process mmap this. I understand that TPACKET_V4 distinguishes
>> the descriptor rings from packet pools, but could both use the existing
>> structs and logic (packet_mmap)? That would avoid introducing a lot of
>> new code just for granting user pages to the kernel.
>>

> We could certainly pass the "tpacket_memreg_req" fields as part of
> descriptor ring setup ("tpacket_req4"), but we went with having the
> memory registration as a new, separate setsockopt. Having it separated
> makes it easier to compare regions on the kernel side of things: "Is
> this the same umem as another one?" If we go the path of passing the
> range at descriptor ring setup, we need to handle all kinds of
> overlapping ranges to determine when a copy is needed or not, in those
> cases where the packet buffer (i.e. umem) is shared between processes.
That's not what I meant. Both descriptor rings and packet pools are memory regions. Packet sockets already have logic to allocate regions and make them available to userspace with mmap(). Packet v4 reuses that logic for its descriptor rings. Can it use the same for its packet pool? Why does the kernel map user memory instead? That is a lot of non-trivial new logic.
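For reference, the constraints listed in the quoted comment can be sketched as a userspace validity check. This is a hypothetical helper (memreg_req_valid is not part of the patch set), with the struct mirrored locally since the proposed uapi header is not in mainline:

```c
#include <stdbool.h>
#include <unistd.h>

/* Local mirror of the proposed struct (not in mainline headers). */
struct tpacket_memreg_req {
	unsigned long addr;		/* Start of packet data area */
	unsigned long len;		/* Length of packet data area */
	unsigned int frame_size;	/* Frame size */
	unsigned int data_headroom;	/* Frame head room */
};

/* Check the constraints from the comment: page-aligned area, and a
 * frame size that is a power of 2, at least 2048B, and no larger than
 * the page size. The number of usable frames is len / frame_size. */
static bool memreg_req_valid(const struct tpacket_memreg_req *req)
{
	unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);

	if (req->addr & (page - 1))
		return false;		/* area not page aligned */
	if (req->frame_size < 2048 || req->frame_size > page)
		return false;		/* frame size out of range */
	if (req->frame_size & (req->frame_size - 1))
		return false;		/* frame size not a power of 2 */
	return true;
}
```

A request like { .addr = page_aligned_buf, .len = 1 << 20, .frame_size = 2048 } would pass this check and describe 512 frames.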