On 15/03/2023 17:26, David Malcolm wrote:
On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
Hi everyone,
Hi Pierrick
I have some question regarding the analyzer.
I'm currently working on an fully out-of-tree static analyzer plugin.
I started development on commit tagged as /basepoints/gcc-13/, but
recently moved my code to a more recent /trunk/ branch (build
/20230309/).
From my different experiments and analyzer's log readings, I have
the
intuition that '/ana::state_machine::state/' class is mostly (if not
only) linked with '/ana::svalue/',
Yes: a program_state (which describes the state at a particular node
along an execution path) contains (among other things) an sm_state_map
per state_machine, and the sm_state_map maps from svalues to
state_machine::states.
i.e. the lvalue of a memory region.
In my analysis case, I would like to also be able to track state for
some rvalue of a memory region, i.e. '/ana::region/' objects.
You're using the terms "lvalue" and "rvalue", but if I'm reading your
email right, you're using them in exactly the opposite way that I
understand them.
I apologize, you are completely right, I swapped their usage...
I think of an "lvalue" as something that can be on the left-hand side
of an assignment: something that can be written to (an ana::region),
whereas an "rvalue" is something that can be on the right-hand side of
an assignment: a pattern of bits (an ana:svalue) that be written into
an lvalue.
An ana::svalue is a pattern of bits (possibly symbolic, such as "the
constant 42" or "the initial value of param x")
An ana::region is a description of a memory location for reads/writes
(possibly symbolic, such as "local var x within this frame", or "the
block reached by dereferencing the initial value of ptr in the frame
above").
Sorry if I've misread things; I'll try to answer the rest of the email
as best as I can...
So, first question: is there any way to associate and track the state
of
a rvalue, independently of its lvalue?
To try to clarify the question, here's an example:
'''
int __attribute__("__some_attribute__") x = 42;
/* STEP-1
From now on, we consider the state of x as being marked by
some_attribute.
But in fact, in the log, we can observe that we'll have something
like
this in the new '/ana::program_state/':
{0x4b955b0: (int)42: marked_state (‘x’)} */
Yes: sm-state is associated within a program_state with an svalue, in
this case with the constant 42.
There isn't yet a way to instead give sm-state to "x" itself. I
suppose you could give sm-state to &x (the pointer that points to x is
an instance of region_svalue, which is an svalue), but I haven't tried
this approach myself.
int *y = &x;
/* STEP-2
For analysis purpose, you want to know that from now on, y is
pointing
to marked data.
So you set state of the LHS of the GIMPLE statement (i.e. some
ssa_name
instance of y) accordingly, with a state named 'points-
to_marked_data'
and setting 'x' as the origin of the state (in the sense of the
argument
/origin/ from '/ana::sm_context::on_transition/'.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
Yes: you've set the state on the svalue "&x", not on "y".
int z = *y;
/* STEP-3
Now you have implicit copy of marked data, and you want to report it.
So you state the LHS of the GIMPLE statement (i.e. some ssa_name
instance of z) as being marked, with 'y' as the origin.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
'''
Presumably the program_state also shows that you have a binding for the
region "z", containing the svalue 42 (this is represented within the
"store", within the region_model within the program_state).
Indeed, there is a binding for region "z" to the svalue 42 within the
new program state.
In STEP-2:
We lost the information saying that the rvalue of y (i.e. y), is
pointing to marked data.
Only remain the information that its lvalue (i.e. &x), is pointing to
marked data, which is of course true.
Note that the analyzer by default attempts to purge state that it
doesn't think will be needed anymore, so it may have purged the state
of "y" if y isn't going to be read from anymore. You could try -fno-
analyzer-state-purge to avoid this purging.
Nothing changes when I run it with the -fno-anlyzer-state-purge,
it is still the state of &x which is tracked.
In STEP-3:
No information is added regarding z in the new program_state.
In fact, if one have a closer look at the log, we see that the LHS of
the GIMPLE statement (the ssa_name instance of z), is already in the
state of 'marked_data'.
Through the log to the call to '/sm_context::on_transition/' this can
be
seen:
marked_sm: state transition of ‘z_5’: marked_data -> marked_data
All of this step is somehow leading to confusing diagnostic.
For example, you miss the fact that 'z' is being marked, because no
event happen as it is somehow aliasing 'x' svalue.
Though it might not be true in case of missed optimization.
Of course, if anyone wants to have a look at my code, I can provide
it
to you (it is not yet publicly available at the moment).
I think I'd have to look at the code to be of more help; I confess that
I stopped understanding somewhere around step 3, sorry.
No worries, the idea is that regarding the state_map, z_3 is already
considered as marked.
Are you able to post a simple example of the code you'd like to
analyze, with an idea of the behavior you'd like from the analyzer?
(e.g. the state transitions)
Sure here is an exemple:
'''
int __attribute__("__some_attribute__") x = 42;
// Transition state for x: start -> marked
int *y = &x;
// Transition state for y: start -> points-to_marked
int z = *y;
// Transition state for z: start -> marked
if (z);
// Diagnostic is issued because of the use of a marked data in a
condition, for analysis purpose.
'''
The issued diagnostic I do have in mind looks like:
'''
1 | int __attribute__("__some_attribute__") x = 42;
| ^
| |
| (1) ‘x’ gets marked here
2 | int *y = &x;
| ~
| |
| (2) ‘y’ points-to marked data here
3 | int z = *y;
| ~
| |
| (3) ‘z’ gets marked here
4 | if (z);
| ~
| |
| (4) used of a marked value 'z' in a condition
'''
What I get from now on is this:
'''
1 | int __attribute__((__taint__)) x = 42;
| ^
| |
| (1) ‘x’ gets
tainted here
|......
4 | if(z) return 666;
| ~
| |
| (2) use of tainted value ‘z’ in a condition
'''
And for some reason, if I change the analyzed code to this:
'''
int __attribute__("__some_attribute__") x = 42;
int *y = &x;
x = 6; // New line
int z = *y;
if (z);
'''
This is what I get:
'''
2 | int __attribute__((__taint__)) x = 42;
| ^
| |
| (1) ‘x’ gets marked
here
3 | int * y = &x;
| ~
| |
| (2) ‘&x’ points to tainted data here
4 | x = 6;
5 | int z = *y;
| ~
| |
| (3) ‘x’ gets tainted here
6 | if(z) return 666;
| ~
| |
| (4) use of tainted value ‘z’ in a condition
'''
Sorry for the long mail here,
Let me know if you want to have a look at the exploded graph or at the
analyzer's log.
Thank you for your time,
Pierrick