On 30/05/17 10:26, Dmitry Vyukov wrote:
> On Tue, May 30, 2017 at 11:08 AM, Vladimir Murzin
> <vladimir.mur...@arm.com> wrote:
>>> <vladimir.mur...@arm.com> wrote:
>>>> On 30/05/17 09:31, Vladimir Murzin wrote:
>>>>> On 30/05/17 09:15, Dmitry Vyukov wrote:
>>>>>> On Tue, May 30, 2017 at 9:58 AM, Vladimir Murzin
>>>>>> <vladimir.mur...@arm.com> wrote:
>>>>>>> On 29/05/17 16:29, Dmitry Vyukov wrote:
>>>>>>>> I have an alternative proposal. It should be conceptually simpler and
>>>>>>>> also less arch-dependent. But I don't know if I miss something
>>>>>>>> important that will render it non working.
>>>>>>>> Namely, we add a pointer to shadow to the page struct. Then, create a
>>>>>>>> slab allocator for 512B shadow blocks. Then, attach/detach these
>>>>>>>> shadow blocks to page structs as necessary. It should lead to even
>>>>>>>> smaller memory consumption because we won't need a whole shadow page
>>>>>>>> when only 1 out of 8 corresponding kernel pages are used (we will need
>>>>>>>> just a single 512B block). I guess with some fragmentation we need
>>>>>>>> lots of excessive shadow with the current proposed patch.
>>>>>>>> This does not depend on TLB in any way and does not require hooking
>>>>>>>> into buddy allocator.
>>>>>>>> The main downside is that we will need to be careful to not assume
>>>>>>>> that shadow is continuous. In particular this means that this mode
>>>>>>>> will work only with outline instrumentation and will need some ifdefs.
>>>>>>>> Also it will be slower due to the additional indirection when
>>>>>>>> accessing shadow, but that's meant as "small but slow" mode as far as
>>>>>>>> I understand.
>>>>>>>>
>>>>>>>> But the main win as I see it is that that's basically complete support
>>>>>>>> for 32-bit arches.
>>>>>>>> People do ask about arm32 support:
>>>>>>>> https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
>>>>>>>> https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
>>>>>>>> and probably mips32 is relevant as well.
>>>>>>>> Such mode does not require a huge continuous address space range, has
>>>>>>>> minimal memory consumption and requires minimal arch-dependent code.
>>>>>>>> Works only with outline instrumentation, but I think that's a
>>>>>>>> reasonable compromise.
>>>>>>>
>>>>>>> .. or you can just keep shadow in page extension. It was suggested back in
>>>>>>> 2015 [1], but seems that lack of stack instrumentation was "no-way"...
>>>>>>>
>>>>>>> [1] https://lkml.org/lkml/2015/8/24/573
>>>>>>
>>>>>> Right. It describes basically the same idea.
>>>>>>
>>>>>> How is page_ext better than adding data page struct?
>>>>>
>>>>> page_ext is already here along with some other debug options ;)
>>>
>>>
>>> But page struct is also here. What am I missing?
>>>
>>
>> Probably, free room in page struct? I guess most of the page_ext stuff would
>> love to live in page struct, but... for instance, look at page idle tracking
>> which has to live in page_ext only for 32-bit.
>
>
> Sorry for my ignorance. What's the fundamental problem with just
> pushing everything into page struct?

I think [1] has an answer for your question ;)

>
> I don't see anything relevant in page struct comment. Nor I see "idle"
> nor "tracking" page struct. I see only 2 mentions of CONFIG_64BIT, but
> both declare the same fields just with different types (int vs short).

Right, it is because implementation is based on page flags [2]:

Note, since there is no room for extra page flags on 32 bit, this feature
uses extended page flags when compiled on 32 bit.

[1] https://lwn.net/Articles/565097/
[2] 33c3fc7 ("mm: introduce idle page tracking")

Cheers
Vladimir

>
>
>
>>>>>> It seems that memory for all page_ext is preallocated along with page
>>>>>> structs; but just the lookup is slower.
>>>>>>
>>>>>
>>>>> Yup. Lookup would look like (based on v4.0):
>>>>>
>>>>> ...
>>>>> page_ext = lookup_page_ext_begin(virt_to_page(start));
>>>>>
>>>>> do {
>>>>>         page_ext->shadow[idx++] = value;
>>>>> } while (idx < bound);
>>>>>
>>>>> lookup_page_ext_end((void *)page_ext);
>>>>>
>>>>> ...
>>>>
>>>> Correction: please, ignore that *_{begin,end} stuff - mainline only
>>>> lookup_page_ext() is only used.
>>>
>>>
>>> Note that this added code will be executed during handling of each and
>>> every memory access in kernel. Every instruction matters on that path.
>>
>> I know, I know... still better than nothing.
>>
>>> The additional indirection via page struct will also slow down it, but
>>> that's the cost for lower memory consumption and potentially 32-bit
>>> support. For page_ext it looks like even more overhead for no gain.
>>>
>>
>> eefa864 (mm/page_ext: resurrect struct page extending code for debugging)
>> express some cases where keeping data in page_ext has benefit.
>>
>> Cheers
>> Vladimir
>
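
For reference, the per-page shadow idea discussed in this thread (a shadow
pointer per page, 512B shadow blocks attached on demand, no assumption that
shadow is contiguous) can be sketched in userspace C. This is only an
illustration under stated assumptions, not kernel code and not anyone's patch:
struct fake_page, shadow_attach(), shadow_detach(), shadow_byte() and
shadow_set() are hypothetical names, calloc() stands in for the slab cache,
and 4KB pages with a 1:8 shadow scale are assumed (which is where 512B per
page comes from).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT         12
#define PAGE_SIZE          (1UL << PAGE_SHIFT)
#define SHADOW_SCALE_SHIFT 3                          /* 1 shadow byte per 8 bytes */
#define SHADOW_GRANULE     (1UL << SHADOW_SCALE_SHIFT)
#define SHADOW_BLOCK_SIZE  (PAGE_SIZE >> SHADOW_SCALE_SHIFT) /* 512 bytes */
#define NR_PAGES           8UL                        /* toy "memory" size */

/* Hypothetical stand-in for struct page: only the shadow pointer. */
struct fake_page {
	uint8_t *shadow;        /* NULL while no shadow block is attached */
};

static struct fake_page pages[NR_PAGES];

/* Attach a zeroed shadow block; calloc() plays the role of the slab cache. */
static int shadow_attach(struct fake_page *p)
{
	if (!p->shadow)
		p->shadow = calloc(1, SHADOW_BLOCK_SIZE);
	return p->shadow ? 0 : -1;
}

/* Detach and release the shadow block of a page. */
static void shadow_detach(struct fake_page *p)
{
	free(p->shadow);
	p->shadow = NULL;
}

/* Map a byte offset in the toy memory to its shadow byte (extra indirection). */
static uint8_t *shadow_byte(uintptr_t off)
{
	struct fake_page *p;

	if (off >= NR_PAGES * PAGE_SIZE)
		return NULL;
	p = &pages[off >> PAGE_SHIFT];
	if (!p->shadow)
		return NULL;
	return &p->shadow[(off & (PAGE_SIZE - 1)) >> SHADOW_SCALE_SHIFT];
}

/*
 * Set shadow for [start, start + size). The lookup is redone for every
 * granule because blocks of neighbouring pages need not be adjacent.
 */
static int shadow_set(uintptr_t start, size_t size, uint8_t value)
{
	uintptr_t off;

	assert((start & (SHADOW_GRANULE - 1)) == 0);
	for (off = start; off < start + size; off += SHADOW_GRANULE) {
		uint8_t *s = shadow_byte(off);

		if (!s)
			return -1;      /* page has no shadow attached */
		*s = value;
	}
	return 0;
}
```

A range that crosses a page boundary, e.g. shadow_set(PAGE_SIZE - 16, 32, 0xFF)
with shadow attached to pages 0 and 1, ends up writing into two unrelated 512B
blocks. A memset() over a linear shadow region would not work here, which is
the "shadow is not continuous" caveat from the proposal.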