RE: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

Anders Rönnholm Mon, 18 Nov 2013 02:00:50 -0800

Hi,

I have run the checker on llvm codebase and chrome and did not find any false 
positives. Not any true positives either. I also ran it on some SFINAE examples 
with no false positives. But that was expected as I have not seen any SFINAE 
examples where they use a binary expression in the sizeof nor sizeof(sizeof()), 
is that really common?


I saw that there is an isSFINAEContext function in semaexpr. What do you think 
about moving the check to sema and use that function to avoid triggering on 
SFINAE?

/Anders


From: Anna Zaks [mailto:[email protected]]
Sent: den 23 oktober 2013 17:35
To: Daniel Marjamäki
Cc: Ted Kremenek; Anders Rönnholm; Richard Smith; [email protected]; 
Jordan Rose
Subject: Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

Daniel,

I think Richard was asking about false positives/true positives ratio and I do 
not see any false positives in the list below. It looks like all the reports 
you found are valid. Did you run this check on large C++ codebases?

As far as I can tell there are two outstanding issues:
 - Addressing these questions from Richard:
sizeof(expression) is a common idiom in SFINAE contexts. Is that covered here?
Richard, can you provide examples?

sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why should we 
warn on that?
(I personally think that if this check goes into the analyzer, it's fine to 
warn in the second case.)

 - Where this check should go (compiler or the analyzer)?
I think that if you can address the questions above, this check could go into 
the compiler.

Cheers,
Anna.

On Oct 14, 2013, at 12:43 PM, Daniel Marjamäki 
<[email protected]<mailto:[email protected]>> wrote:



Hello!

As I understood it.. you wanted to know what the hit ratio is. I have 
investigated the hit ratio on a number of packages in debian. Below is the 
negative and positives I've found. I provide the package URLs so you can look 
for yourself..

I will continue investigating and let you know if I see any false positives...


NEGATIVE (macro)
package: 
http://ftp.us.debian.org/debian/pool/main/a/agedu/agedu_9723.orig.tar.gz

#define sresize(array, number, type) \
    ( (void)sizeof((array)-(type *)0), \
      (type *) srealloc ((array), (number) * sizeof (type)) )

We don't warn about this since it's in a macro. However it is interesting:
    sizeof((array)-(type *)0)
It seems to be written this way by intention. To get sizeof(ptrdiff_t)? Maybe 
sizeof(ptr-ptr) should be ok.



POSITIVE
package: 
http://ftp.us.debian.org/debian/pool/main/a/argus-client_3.0.0.orig.tar.gz

file: argus-clients-3.0.0/radump/print-ldp.c
line: 607
            print_unknown_data(tptr+sizeof(sizeof(struct ldp_msg_header)),"\n\t 
 ",
                               msg_len);

file: argus-clients-3.0.0/radump/print-lmp.c
line: 878
            print_unknown_data(tptr+sizeof(sizeof(struct 
lmp_object_header)),"\n\t    ",
                               lmp_obj_len-sizeof(struct lmp_object_header));

Not sure if these are intended to be sizeof(size_t) or sizeof(struct 
lmp_object_header) but I guess the latter.



POSITIVE
package: 
http://ftp.us.debian.org/debian/pool/main/b/babel/babel_1.4.0.dfsg.orig.tar.gz
file: babel-1.4.0.dfsg/regression/output/libC/synch_RegOut_Impl.c

332:    if ((res < 0) || (res > sizeof(s_result_strings)/sizeof(sizeof(char 
*)))) {

This "sizeof(sizeof(char *))" should be "sizeof(char*)" since s_result_strings 
is a array of char * pointers



POSITIVE
package: http://ftp.us.debian.org/debian/pool/main/b/bird/bird_1.3.7.orig.tar.gz
file: bird-1.3.7/lib/mempool.c

253:
  return ALLOC_OVERHEAD + sizeof(struct linpool) +
    cnt * (ALLOC_OVERHEAD + sizeof(sizeof(struct lp_chunk))) +
    m->total + m->total_large;

Not sure if this is intended to be sizeof(size_t) or sizeof(struct lp_chunk) 
but I guess the latter.


POSITIVE:
package: 
http://ftp.us.debian.org/debian/pool/main/b/blender/blender_2.49.2~dfsg.orig.tar.gz
file: blender-2.49b.orig/source/blender/blenloader/intern/readfile.c
6987:    if(0==strncmp(ima->id.name+2, "Viewer Node", sizeof(ima->id.name+2))) {
6991:    if(0==strncmp(ima->id.name+2, "Render Result", 
sizeof(ima->id.name+2))) {

These are bad, ima->id.name is an array, it should probably be changed to:
    if(0==strncmp(ima->id.name+2, "Viewer Node", sizeof(ima->id.name) - 2)) {


Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB  Warfvinges väg 34  SE-112 51 Stockholm  Sweden

Mobile:                 +46 (0)709 12 42 62
E-mail:                 
Daniel.Marjamaki<mailto:[email protected]>@evidente.se<mailto:[email protected]>

www.evidente.se<http://www.evidente.se/>
________________________________
Från: [email protected]<mailto:[email protected]> 
[[email protected]<mailto:[email protected]>] för 
Anna Zaks [[email protected]<mailto:[email protected]>]
Skickat: den 8 oktober 2013 18:18
Till: Ted Kremenek
Cc: Anders Rönnholm; Richard Smith; 
[email protected]<mailto:[email protected]>
Ämne: Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

On Oct 7, 2013, at 10:05 PM, Ted Kremenek 
<[email protected]<mailto:[email protected]>> wrote:


On Oct 7, 2013, at 6:05 PM, Jordan Rose 
<[email protected]<mailto:[email protected]>> wrote:



On Oct 7, 2013, at 13:58, Richard Smith 
<[email protected]<mailto:[email protected]>> wrote:


On Mon, Oct 7, 2013 at 9:55 AM, Jordan Rose 
<[email protected]<mailto:[email protected]>> wrote:
I'm fine with this staying in the analyzer for now, unless David, Richard, or 
Eli feel it should be a warning right away.

Do we have evidence that we want this? Does it catch bugs? If so, what do they 
look like? It seems like this would trigger on legitimate code; how does a user 
suppress the warning in that case, and does that suppression make their code 
clearer?

What is the false/true positive ratio for bug finding here?

sizeof(expression) is a common idiom in SFINAE contexts. Is that covered here?

sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why should we 
warn on that?

And more as a general question than something specific to this patch: Is there 
a region in the space of false positive ratios where we think a syntactic 
warning should go into the static analyzer? If so, why? And what is that 
region? I would have thought that the static analyzer, like the clang warnings, 
would be aimed at (eventually) having a false positive ratio of near zero. If 
so, then should we ever put a warning in the static analyzer if it doesn't 
require the static analyzer's technology (or have a high runtime cost)?

On this last (and bringing in Ted and Anna):

I think the main difference between compiler warnings and syntactic analyzer 
checks is that we try very hard to turn new compiler warnings on by default. A 
second-order effect of this is that we generally avoid style warnings. The 
analyzer can be a bit looser about this, though: because people know the 
analyzer is stricter and more in-depth, I think they might also accept that a 
particular check doesn't fit their project.

On the other hand, we still haven't gotten around to designing a proper bug 
tracking and/or manual suppression system, so that's one advantage of compiler 
warnings. And as you say, checks without a high runtime cost don't really have 
a technical reason to be in the analyzer.

Richard's point is correct that we want the static analyzer to also have a high 
signal-to-noise ratio.  Otherwise it is a useless tool.  I'm also not a fan of 
having the analyzer having a bunch of "junk" checkers that aren't on by 
default, but if a checker, when it is enabled, is HIGHLY useful to some set of 
people (e.g., security experts who are more tolerant of false positive rates if 
they want to do an aggressive code audit)  I think the analyzer is a reasonable 
place to put them, given the current warning policy in Clang where we want the 
warnings there to be generally useful to everybody.

To put it in more context, in the beginning the guiding principal of what goes 
in the static analyzer was:

(a) The warning is very expensive to compute.

 OR

(b) The warning is very domain-specific.  For example, an API such as 
CFNumberCreate() on OS X has some interesting API invariants, but we generally 
should not be hacking those API-specific warnings into the compiler.  
Exceptions exist of course, e.g., printf checking, but often the are grounded 
when such APIs are fairly standard (e.g., in the C standard itself) or the 
checking is based on some annotation like __attribute__((format)) where the 
compiler doesn't know anything about a specific function itself, just the 
annotation.

Style warnings can sometimes fall into (a) (in which case not putting them in 
the compiler makes obvious sense), but one could argue that they are more the 
flavor of (b) then a traditional compiler warning.  As I mentioned earlier, the 
static analyzer can be home to some highly specialized checkers that may not be 
generally useful for everybody but when enabled are very useful to certain 
people.  Style warnings often fit in this category of warnings.

A related problem is that we don't have an ontology for style warnings in clang 
(the compiler).  Should they really be lumped into the same group of all other 
compiler warnings?  What about their behavior with -Werror?  -Weverything?  
Style warnings really are about personal style, and users can be highly 
polarized about them.  For example, it would be highly useful for Clang to have 
a -W80-columns or -Wspaces-instead-of-tabs (not a serious proposal).  Both 
would be cheap to compute, would obviously benefit LLVM developers, but they 
wouldn't work for everybody, or even the majority of software developers.  
Where should they go?  The compiler?  The static analyzer?  Both have 
discoverability and usability concerns in either location.  I'd argue that 
these two warnings would be better suited in the compiler because (when the 
user wants them) they'd get run all the time, but having these warnings 
counteracts the general guiding principal that compiler warnings should 
generally be useful to most people, or clear categories of people (e.g, library 
authors), but not specific groups of people (e.g., LLVM developers).

This is not a thought out proposal, but I would not be opposed to a new 
category of warnings, say -Wstyle, and have all style warnings under -Wstyle, 
perhaps named "-Wstyle-80-columns" or "-Wstyle-spaces-instead-of-tabs".  None 
of these warnings would be on by default.  If we want these kind of warnings in 
the compiler, I think these kind of warnings are worth calling out as being 
something "different" that people shouldn't get hung up about if they don't 
think the warning applies to them.  Also, having them in a special category of 
them own allows institutions to set broad warning polices such as "-Werror 
-Wno-error=style" where they want regular warnings to be treated as error, but 
not style warnings, etc.

I agree with Ted. Currently, there is no good way of adding style and highly 
specialized warnings to clang; given the behavior of Werror and Weverything. 
For example, we encourage users to learn about new generic warnings by turning 
on Weverything, so the specialized warnings should not be included there.

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

RE: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

Reply via email to