On Tue, Aug 7, 2018 at 12:48 AM, Sebastian Huber < sebastian.hu...@embedded-brains.de> wrote:
> On 06/08/18 21:14, Joel Sherrill wrote: > > >> >> On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chr...@rtems.org <mailto: >> chr...@rtems.org>> wrote: >> >> On 06/08/2018 16:12, Christian Mauderer wrote: >> > Am 06.08.2018 um 07:31 schrieb Chris Johns: >> >> On 06/08/2018 10:51, Chris Johns wrote: >> >>> On 05/08/2018 19:39, Christian Mauderer wrote: >> >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns: >> >>>>> Hi, >> >>>>> >> >>>>> I have been working on migrating covoar in the rtems-tools >> repo to DWARF. The >> >>>>> goal is remove objdump parsing and to get accurate details >> about the functions >> >>>>> being covered. This is an unfunded task. >> >>>>> >> >>>>> The work has resulted in a close examination of inlined code >> in RTEMS and what I >> >>>>> saw alarmed me so I have added a report to the rtems-exeinfo >> tool in rtems-tools >> >>>>> (the change is to be posted for review once I get the >> coverage tests running). >> >>>>> >> >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is: >> >>>>> >> >>>>> inlined funcs : 1412 >> >>>>> total funcs : 1956 >> >>>>> % inline funcs : 72% >> >>>>> total size : 174616 >> >>>>> inline size : 81668 >> >>>>> % inline size : 46% >> >>>>> >> >>>>> This is a small application so it could be argued that skews >> the figures. A >> >>>>> large C/C++ application built with -O2 running on RTEMS 4.11 >> ARM reports the >> >>>>> inline usage as: >> >>>>> >> >>>>> inlined funcs : 10370 >> >>>>> total funcs : 17700 >> >>>>> % inline funcs : 58% >> >>>>> total size : 3066240 >> >>>>> inline size : 1249514 >> >>>>> % inline size : 40% >> >>>>> >> >>>>> This does not seem right to me. >> >>>>> >> >>>>> The report is new and there could be issues in the DWARF >> handling that feeds >> >>>>> this report however I am posting this to start a discussion >> on the topic of >> >>>>> inlining. >> >>>>> >> >>>>> I attach the report for hello.exe. The `-i` option generates >> the inline report. >> >>>>> >> >>>>> The first section is a summary showing the total number of >> functions in the >> >>>>> executable that have machine code and are flagged as inline. >> The report lists >> >>>>> the percentage of functions that are inlined and the >> percentage of machine code >> >>>>> that is inlined. The values seem high to me. >> >>>>> >> >>>>> The second table lists inline functions that are repeated >> sorted from the >> >>>>> largest foot print to the smallest. The first column the >> total size of machine >> >>>>> code in the executable and the second column the number of >> instances. >> >>>>> >> >>>>> The third table is the list of inline functions sorted from >> largest machine code >> >>>>> footprint to smallest. The second column are flags of which >> there is one. A `E` >> >>>>> indicates the inline function is also external which means >> the compiler has >> >>>>> created an external reference to this function, ie an >> address-of is being taken. >> >>>>> The third column is the address in the executable so you can >> take a look with >> >>>>> objdump at the machine code. >> >>>>> >> >>>>> We need to ask some important question in relation to >> inlining. It is cheap to >> >>>>> add and we all feel the code we add needs to be fast and >> needs to be inlined but >> >>>>> does it really need to be inlined? >> >>>>> >> >>>>> Some pieces of code do need to be inlined and the overhead >> is just that an >> >>>>> overhead, for example in the large C/C++ application there >> is a low level >> >>>>> volatile hardware write routing with close to 300 instances >> and a code size of >> >>>>> 10K. This code needs to be inlined for performance reasons >> but should the size >> >>>>> on average be 40 bytes, I doubt it. >> >>>>> >> >>>>> Can we be more judicious with our use of the inline keyword? >> >>>>> >> >>>>> Is the performance gain we really expect or is the actual >> overhead of a call >> >>>>> frame not worth saving? >> >>>>> >> >>>>> What are the real costs of inlining a piece of code? It adds >> size to the >> >>>>> executable and depending on the code being inlined it >> complicates coverage >> >>>>> analysis be adding extra branch points. >> >>>>> >> >>>>> The metrics to determine what should be inlined is >> complicated and I do not >> >>>>> think we have a suitable policy in place. I believe it is >> time we to create one. >> >>>>> >> >>>>> The issue is not limited to our code, gcc, newlib and >> libstdc++ seem to have >> >>>>> some code that should be looked at more closely. For example >> __udivmoddi4, and >> >>>>> __sprint_r. >> >>>>> >> >>>>> Chris >> >>>>> >> >>>>> >> >>>> >> >>>> Hello Chris, >> >>>> >> >>>> I just took a look at one of the first function in your list: >> __sprint_r >> >>>> >> >>>> >> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a= >> blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462 >> ecccec97c8e901e4ffadd44;hb=HEAD#l403 >> <https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git; >> a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462 >> ecccec97c8e901e4ffadd44;hb=HEAD#l403> >> >>>> >> >>>> As far as I can see, there is no explicit inline key word for >> that >> >>>> function. So in that case, the compiler decided that it would >> be a good >> >>>> idea to inline that function. >> >>> >> >>> Thanks and yes. At this point in time I cannot tell what is >> happening and I am >> >>> not sure the tool is reporting accurate data, I need to >> investigate. >> >> >> >> I have updated the tool and report to show which inline >> functions are: >> >> >> >> - inlined by compiler >> >> - declared inline and not inlined >> >> - declared inline and inlined >> >> >> >> I have also fixed a quick hack I had where the size was the >> span from the low PC >> >> to the high PC, this was wrong. Inlined code can be split and >> moved when >> >> inlining creating a discontinuous address range. The size in >> the report is now >> >> the number of machine code bytes. >> >> >> >> The report will show any functions not inlined when asked to be >> inlined. We do >> >> not have any. >> >> >> >> The 'C' flag in the inlined table shows which functions the >> compiler has inlined. >> >> >> >> Chris >> >> >> > >> > With that list it is now much clearer which functions would be >> relevant >> > for a potential review. >> >> Yes. FYI the C/C++ RTEMS 4.11 app now gives: >> >> inlined funcs : 10370 >> total funcs : 17700 >> % inline funcs : 58% >> total size : 2296354 >> inline size : 479628 >> % inline size : 20% >> >> This level of reduction is more inline with what I expected. >> >> > >> >>> >> >>>> I'm not sure whether I might just haven't seen it but is there a >> >>>> possibility to distinguish between functions that have been >> inlined by >> >>>> the compiler and ones that have been inlined due to the >> "inline" keyword >> >>>> without looking at every definition? >> >>> >> >>> I am not sure. The DWARF data is complex and detailed and I >> view this initial >> >>> step into the area of using DWARF to perform static analysis >> of RTEMS >> >>> executables as green. >> >>> >> >>> DWARF does provide declaration attributes. I need to review >> the DWARF data and >> >>> standard to determine if we can tell what is declared inline >> and what has been >> >>> inlined. I think it would be good to know. >> >>> >> >>>> Did you try compiling with size optimization? I would expect >> that the >> >>>> compiler would inline far less functions and maybe even >> ignore some >> >>>> "inline" keywords. As far as I know it's more of a hint to >> the compiler. >> >>> >> >>> Not yet. A complete tool build with those options is a lot of >> effort and I am >> >>> still not comfortable the report is accurate. I think this is >> something that >> >>> should be done at some point. I think it would create an >> interesting data point. >> >>> >> >>>> I would only worry about functions that are still inlined if size >> >>>> optimization is selected. >> >>> >> >>> I think we need to review the functions we currently have >> tagged as inline. I >> >>> think the only way we can do this is with real data. >> >>> >> >>>> That's the case when I tell the compiler to >> >>>> make the program as small as possible. In all other cases I >> want some >> >>>> well balanced optimum between speed and size. Inlining small >> functions >> >>>> is OK in that case if you ask me. >> >>> >> >>> How do you define this, ie what is the inline policy we use? >> >>> How do you audit this? >> > >> > Both questions are not simple to answer. It is most likely a case by >> > case decision. I think there are roughly two reasons for inlining: >> > >> > - The code is short enough that it is smaller if it is inlined >> compared >> > to a function call. >> > >> > - There is some performance reason. >> > >> > Anything else? >> >> Not for inlining rather for not inlining? I had a discussion with >> Joel last week >> about coverage and he said he had previously reviewed the inlines >> to reduce the >> branch counts because it complicates coverage. I would prefer he >> talks about >> this, it is his area. >> >> >> Thanks for the lead-in Chris. >> >> First, there are multiple types of coverage. Statement, Decision coverage, >> MC/DC, and object coverage. Based on various limiting factors which I >> am happy to discuss in more detail if someone is interested, we went >> with object coverage. I will briefly remind everyone of the limitations >> and guidelines we went into this with: >> >> + We didn't want instrumented code. Test as you fly, fly as you test. >> Plus there would need to be a way to get information generated by >> the instrumentation off the target. >> >> + Without gcc support for identifying sub-expressions in complex >> logical expressions, there is no way to do DC or MCDC. Further, >> we would have had to have a tool to interpret the non-existent >> information and know which entries in the truth tables for every >> expression had to be exercised for DC and MCDC. >> >> + gcov's reports are primarily statement coverage. And you can't get >> gcov reports without gcov output from runs which we still are working >> on. Also in terms of DO-178, statement coverage is generally Level C >> type coverage. >> >> We wanted higher level coverage for RTEMS and thus object >> coverage with emphasis on every generated branch statement being >> needed and exercisable as taken and not taken. This approach >> ensured we didn't have dead code. >> >> Plus this type of information could easily be obtained from simulator >> traces from Qemu and tsim. covoar was written to combine and >> correlate the information back to assembly and source and generate >> the reports. >> >> That's what led to the current coverage approach. Chris has been >> working to eliminate the use of command line tools (e.g. nm, objdump, >> etc) in favor of using DWARF directly. >> >> That's the background of the approach and how we got reports. >> >> Also remember that RTEMS testing is primarily via the public APIs >> which combined with coverage ensures we actually need all the >> internal code. If you can't reach it from a public API, it is dead. >> >> But the insights into RTEMS coding practices began when we started >> trying to improve the coverage percentages. Key point: >> >> IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO >> BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION. >> >> At the source level, sometimes we factored out a common helper method >> to avoid testing the same logic via multiple APIs. This reduced the number >> of test cases needed. >> >> We looked at a lot of generated assembly. Sometimes we would see >> large methods being inlined multiple times. This would increase the >> overall >> size of an RTEMS application. But size is not the only impact of inlining. >> If an inlined method has one or more if's in it, then the branch paths >> it includes are introduced EVERYWHERE it is inlined. When we >> had _Thread_Dispatch_enable, I recall it was used > 100 times and >> includes a branch in it. There was a build option to not inline this >> routine to avoid needing to add over 100 test cases. >> >> There is a delicate code size and complexity dance when deciding >> whether to inline something or not. There isn't an easy or obvous >> answer. >> >> + How large is the method? >> >> + How many times is it used? >> >> + How many branches/loops does it have (e.g. cyclomatic complexity)? >> >> In general, very small methods with no branches can be inlined in less >> code than the subroutine call takes. So these tend to be a win. >> >> Over a certain size, the number of times used multiplied by the inlined >> size can have a negative impact on code size. We saw this a lot when >> doing the code coverage improvements. >> >> If the inlined method has branches, is it possible or cost effective >> to write test cases for the branches introduced at EVERY use? >> >> I came away believing a few things: >> >> + The MISRA rule for using simple, single condition branch logic >> is really good for code coverage analysis. It simplifies it. Statement >> coverage and branch coverage become closer to equivalent. The >> resulting assembly code is also much easier to match to source. >> >> + Inlining a method with conditional logic has to be looked at >> very carefully. You don't want to inline something and find yourself >> having exploded the required number of test cases. Again, look >> at the number of times a method is used but be cautious about >> conditional expressions in inline methods. >> >> + Inlining doesn't always improve performance measurably. It >> can increase code size, code to branch around, etc. >> >> For a current example of something that is very likely a problem >> from a object code test coverage and code size increase view, >> consider _Thread_Dispatch_enable. It is inlined 41 times. >> Chris and I looked at a place where it was inlined. It w >> <https://maps.google.com/?q=ed+at+a+place+where+it+was+inlined.+It+w&entry=gmail&source=g>as >> about >> 80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would >> save ~3K across the entire RTEMS API object code to not inline it >> at all. >> >> Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable(). >> From Wikipedia: >> >> Cyclomatic complexity is a software metric, used to indicate the >> complexity >> of a program. It is a quantitative measure of the number of linearly >> independent >> paths through a program's source code. >> >> I used http://www.lizard.ws/ [1] to analyse this method. If you take the >> conditional >> compilation out of the code, the cyclomatic complexity is 4. This means >> that >> every place this is inlined, it introduces 3 extra paths to test. 41*3 >> ==> 123. >> >> My recommendation would be to evaluate the impact of not inlining this >> method at all. The negative impact on testing and code size are pretty >> high. >> >> Unfortunately, what I wrote up is just the guidelines. Things change over >> time. An inlined method that is a nice encapsulation when written could >> turn out to be too large to be inlined if 10 other use cases get added. >> >> I lean to a method can be inlined if it is small and no more than one >> branch statement in assembly. You can be more lenient if used only >> once or twice. But lean to NO branches on an inlined method if the >> number of uses is high. >> >> Ideally the inlined code is no larger than the code for the call itself >> and there are no branches. But going above that is not a reason to >> not inline. >> >> I know this was long but there are a lot of trade-offs. We want RTEMS to >> be as small as possible and as testable as possible. We don't have an >> army of people to write test cases. And I don't want to need that. >> >> Also I have focused on the things that the RTEMS community has control >> over. Chris' tool gives us a broader view and we can likely help improve >> things more broadly. >> >> [1] You can cut and paste code into this via the web. So pretty easy to >> look at a single method. >> > > I think all this information should be moved to a proper RTEMS > contributors/engineering guide. +1 https://devel.rtems.org/wiki/TBR/UserManual/RTEMS_Coverage_Analysis probably needs scrubbing and merged into a Rest document. It has some good stuff but some of the procedural stuff is out of date. --joel > > > -- > Sebastian Huber, embedded brains GmbH > > Address : Dornierstr. 4, D-82178 Puchheim, Germany > Phone : +49 89 189 47 41-16 > Fax : +49 89 189 47 41-09 > E-Mail : sebastian.hu...@embedded-brains.de > PGP : Public key available on request. > > Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG. > >
_______________________________________________ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel