Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Nov 15, 2011, at 23:40, Paul Mackerras wrote: > On Tue, Nov 15, 2011 at 04:45:18PM -0600, Moffett, Kyle D wrote: >> >> I guess that's doable, although I have to admit that idea almost gives >> me more of a headache than trying to fix up the 32-bit ASM. >> >> One thing that bothers me in particular is that both 32/64 versions of >> __copy_tofrom_user() are dramatically overcomplicated for what they >> ought to be doing. >> >> It would seem that if we get a page fault during an unaligned copy, we >> ought to just give up and fall back to a simple byte-by-byte copy loop >> from wherever we left off. That would eliminate 90% of the ugly >> special cases without actually hurting performance, right? > > That's basically what we do, IIRC, and most of the complexity comes > from working out where we were up to. We could probably use a simpler > approximation that means we might copy some bytes twice. In fact the > greatest simplification would probably be to implement range entries > in the exception table so we can just have one entry for all the loads > and stores instead of an entry for each individual load and store. Well, I spent some time tinkering with the GCC inline-assembly option, which was probably a waste, but I figured I would post my code here for other people to chuckle at. :-D Here's a basic, relatively easily extended "copy u8" macro that sets up the exception table using "asm goto": #define try_copy_u8(DST, SRC, LOAD_FAULT, STORE_FAULT) do { \ unsigned long try_copy_tmp__ = (try_copy_tmp__);\ asm goto ( \ "1: lbz %[tmp], %[src]\n" \ "2: stb %[tmp], %[dst]\n" \ " .pushsection __ex_table, \"a\"\n" \ " .align 2\n" \ " .long 1b, %l["#LOAD_FAULT"]\n" \ " .long 2b, %l["#STORE_FAULT"]\n" \ " .popsection\n" \ : /* No outputs allowed for "asm goto" */ \ : [dst] "m"(*(__user u8 *)(DST)), \ [src] "m"(*(const __user u8 *)(SRC)), \ [tmp] "r"(try_copy_tmp__) \ : "memory" \ : LOAD_FAULT, STORE_FAULT \ ); \ } while(0) If I put that into a function and compile it, the assembly and the exception table look perfectly OK, even under register pressure. With a few macros like that it looks like it should be possible to write the copy function directly in C and get optimal results. The only other variants you need would be "try_copy_ulong" and "try_copy_4ulong"/"try_copy_8ulong" for 32/64-bit. Unfortunately, as I mentioned before, GCC 4.4 and older don't have "asm goto" support :-(. Perhaps I could put __copy_tofrom_user() into its own file and make the assembled 32/64 output files be ".shipped"? On the other hand, perhaps this is overly complicated :-D. I'll poke at it more tomorrow. >> For a page-fault during a cacheline-aligned copy, we should be able to >> handle the exception and retry from the last cacheline without much >> logic, again with good performance. >> >> With that said, I'm curious about the origin of the PPC32 ASM. In >> particular, it looks like it was generated by GCC at some point in the >> distant past, and I'm wondering if there's a good way to rewrite that >> file in C and trick GCC into generating the relevant exception tables >> for it? > > Why do you think it was generated by gcc? I wrote the original > version, but I think it got extended and macro-ized by others. Ah, sorry, when I first looked at it the large collection of numeric labels and the very sparing comments made it look autogenerated. Although, given how much of a pain in the neck it is maybe you would rather people not think you wrote it at all. ;-) Cheers, Kyle Moffett -- Curious about my work on the Debian powerpcspe port? I'm keeping a blog here: http://pureperl.blogspot.com/ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Tue, Nov 15, 2011 at 04:45:18PM -0600, Moffett, Kyle D wrote: > On Nov 15, 2011, at 17:29, Benjamin Herrenschmidt wrote: > > On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote: > >> Unfortunately, I've been staring at PPC asm for long enough that I > >> have a migraine headache and I'm going to have to stop here for now. > >> If somebody else wants to tackle fixing up the 32-bit copy_page() and > >> __copy_tofrom_user() routines it would be highly appreciated. > > > > Yeah that's the one everybody's avoiding :-) > > > > What about my idea of instead compiling it multiple times with a > > different size and fixing up the branch to call the right one ? > > I guess that's doable, although I have to admit that idea almost gives > me more of a headache than trying to fix up the 32-bit ASM. > > One thing that bothers me in particular is that both 32/64 versions of > __copy_tofrom_user() are dramatically overcomplicated for what they > ought to be doing. > > It would seem that if we get a page fault during an unaligned copy, we > ought to just give up and fall back to a simple byte-by-byte copy loop > from wherever we left off. That would eliminate 90% of the ugly > special cases without actually hurting performance, right? That's basically what we do, IIRC, and most of the complexity comes from working out where we were up to. We could probably use a simpler approximation that means we might copy some bytes twice. In fact the greatest simplification would probably be to implement range entries in the exception table so we can just have one entry for all the loads and stores instead of an entry for each individual load and store. > For a page-fault during a cacheline-aligned copy, we should be able to > handle the exception and retry from the last cacheline without much > logic, again with good performance. > > With that said, I'm curious about the origin of the PPC32 ASM. In > particular, it looks like it was generated by GCC at some point in the > distant past, and I'm wondering if there's a good way to rewrite that > file in C and trick GCC into generating the relevant exception tables > for it? Why do you think it was generated by gcc? I wrote the original version, but I think it got extended and macro-ized by others. Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Nov 15, 2011, at 18:46, Benjamin Herrenschmidt wrote: > On Tue, 2011-11-15 at 16:45 -0600, Moffett, Kyle D wrote: >> >> With that said, I'm curious about the origin of the PPC32 ASM. In >> particular, it looks like it was generated by GCC at some point in the >> distant past, and I'm wondering if there's a good way to rewrite that >> file in C and trick GCC into generating the relevant exception tables >> for it? > > There is some serious history in there :-) > > I would check with Anton, he's been doing some performance work on those > lately (the 64-bit ones). > > It's probably worth throwing a proof-of-concept simpler variant for > 32-bit at least on the table and have people compare the perfs > (typically network perfs). I can test on a range of ppc32 here (6xx, > 7xxx, 4xx). Ok, so there's not really a good way to make GCC generate the exception tables itself. I've come up with several overly-clever ways to do most of what we would want using "asm goto" except that (1) "asm goto" cannot have register outputs, and (2) "asm goto" is only available in GCC 4.5+ I could easily work around the former by putting the code into its own file and creating a "global" register variable just for that file, but the GCC 4.5+ dependency is a total nonstarter. I'm trying to see if I can make it look better than it does now with some judicious use of inline ASM. At the very least, it should be possible to have a wrapper function written in C which calls the ASM guts with the correct cache params. More importantly, the ASM code needs to use something other than totally arbitrary numbers for labels. :-D Cheers, Kyle Moffett -- Curious about my work on the Debian powerpcspe port? I'm keeping a blog here: http://pureperl.blogspot.com/ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Tue, 2011-11-15 at 16:45 -0600, Moffett, Kyle D wrote: > I guess that's doable, although I have to admit that idea almost gives > me more of a headache than trying to fix up the 32-bit ASM. > > One thing that bothers me in particular is that both 32/64 versions of > __copy_tofrom_user() are dramatically overcomplicated for what they > ought to be doing. > > It would seem that if we get a page fault during an unaligned copy, we > ought to just give up and fall back to a simple byte-by-byte copy loop > from wherever we left off. That would eliminate 90% of the ugly > special cases without actually hurting performance, right? > > For a page-fault during a cacheline-aligned copy, we should be able to > handle the exception and retry from the last cacheline without much > logic, again with good performance. > > With that said, I'm curious about the origin of the PPC32 ASM. In > particular, it looks like it was generated by GCC at some point in the > distant past, and I'm wondering if there's a good way to rewrite that > file in C and trick GCC into generating the relevant exception tables > for it? There is some serious history in there :-) I would check with Anton, he's been doing some performance work on those lately (the 64-bit ones). It's probably worth throwing a proof-of-concept simpler variant for 32-bit at least on the table and have people compare the perfs (typically network perfs). I can test on a range of ppc32 here (6xx, 7xxx, 4xx). Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Nov 15, 2011, at 17:29, Benjamin Herrenschmidt wrote: > On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote: >> Unfortunately, I've been staring at PPC asm for long enough that I >> have a migraine headache and I'm going to have to stop here for now. >> If somebody else wants to tackle fixing up the 32-bit copy_page() and >> __copy_tofrom_user() routines it would be highly appreciated. > > Yeah that's the one everybody's avoiding :-) > > What about my idea of instead compiling it multiple times with a > different size and fixing up the branch to call the right one ? I guess that's doable, although I have to admit that idea almost gives me more of a headache than trying to fix up the 32-bit ASM. One thing that bothers me in particular is that both 32/64 versions of __copy_tofrom_user() are dramatically overcomplicated for what they ought to be doing. It would seem that if we get a page fault during an unaligned copy, we ought to just give up and fall back to a simple byte-by-byte copy loop from wherever we left off. That would eliminate 90% of the ugly special cases without actually hurting performance, right? For a page-fault during a cacheline-aligned copy, we should be able to handle the exception and retry from the last cacheline without much logic, again with good performance. With that said, I'm curious about the origin of the PPC32 ASM. In particular, it looks like it was generated by GCC at some point in the distant past, and I'm wondering if there's a good way to rewrite that file in C and trick GCC into generating the relevant exception tables for it? Cheers, Kyle Moffett -- Curious about my work on the Debian powerpcspe port? I'm keeping a blog here: http://pureperl.blogspot.com/ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/2] powerpc: CPU cache op cleanup
On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote: > Unfortunately, I've been staring at PPC asm for long enough that I > have a migraine headache and I'm going to have to stop here for now. > If somebody else wants to tackle fixing up the 32-bit copy_page() and > __copy_tofrom_user() routines it would be highly appreciated. Yeah that's the one everybody's avoiding :-) What about my idea of instead compiling it multiple times with a different size and fixing up the branch to call the right one ? Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev