On 07/18/2011 05:03 PM, Sebastian Pop wrote:
On Sun, Jul 17, 2011 at 19:41, Tobias Grosser <tob...@grosser.es> wrote:
Accessing a[100] is undefined in C, so i < 100, and so n <= 100.

Sure. I am just surprised we already include this information in graphite.
Do we add this to the context?

Yes, that's added to the context as a bound on the loop iteration domain.
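
For illustration, a minimal example (my own, not the testcase under
discussion) of the kind of code such a bound comes from:

int a[100];

void
f (int n)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = i;  /* Accessing a[100] would be undefined, so i <= 99, i.e. n <= 100.  */
}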

I just looked again through the gcc_type_... functions and wondered why they
contain calls to both max_precision_type() and max_signed_precision_type().
Is there a reason for this?

Well, it's all the same: max_precision_type calls
max_signed_precision_type when one of the types is signed.

Also we are calling max_signed_precision_type to determine the type of
the induction variable, as we want to generate signed IV types as much
as possible.

OK.
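
So, if I read this correctly, the rule is roughly the following (a sketch in
terms of type precisions only, with made-up names, not the actual
implementation): an unsigned type of precision p needs p + 1 bits when widened
to a signed type, and we take the maximum of the resulting signed precisions.

#include <stdio.h>

struct toy_type { int is_signed; int precision; };

/* Sketch only: a signed type wide enough for the values of both A and B.
   An unsigned type of precision P needs P + 1 bits as a signed type.  */
static struct toy_type
toy_max_signed_precision_type (struct toy_type a, struct toy_type b)
{
  int pa = a.is_signed ? a.precision : a.precision + 1;
  int pb = b.is_signed ? b.precision : b.precision + 1;
  struct toy_type res = { 1, pa > pb ? pa : pb };
  return res;
}

int
main (void)
{
  struct toy_type u32 = { 0, 32 };   /* unsigned int  */
  struct toy_type s16 = { 1, 16 };   /* signed short  */
  struct toy_type r = toy_max_signed_precision_type (u32, s16);

  printf ("signed, %d bits\n", r.precision);  /* 33 bits, rounded up to the next available type */
  return 0;
}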

I also wondered if the solution of reusing the gcc_type_for_clast functions
is the right one. As far as I understood the reason we do not use the type
calculated by the lb_ub functions is that it may be too small and that it
may introduce too many casts. So where exactly do we introduce these casts?

We introduce casts when we generate code for an expression with a type
smaller than the types of its constituents.

So the next question is: what is the type of the expression? As far as I understand, we decide ourselves which type the expression should have. Our correctness constraint is that the type must be large enough to contain [lb, ub] of the result as well as all operands. We can choose any larger type to reduce the number of casts that will be generated.

OK.
Here is an example: suppose we have already code-generated an IV
graphite_IV with a signed short type, we are then asked to generate
code for the expression "42 * graphite_IV", and finally lb_ub returns
the interval [-100, 100] for this expression. Then we would have a
signed char type for the expression.
So the signed char type is the smallest type we can use for the expression.

And so we would introduce a cast of graphite_IV to signed char before
doing the multiplication; I guess the code generated for this expression
would be something like "42 * (char) graphite_IV", of type char.
Without taking the max of the types of the sub-expressions, you would
have to generate casts to a shorter type.

Your example is a little incomplete, as 42 is missing a type. At the moment gcc_type_for_value() returns the smallest type that contains the scalar value. So I assume the type is a signed char.

If we choose signed char as the type of the expression, we get this:
42 * (char) graphite_IV

On the other hand, if we choose signed short for the expression, we get:
((short) 42) * graphite_IV

Both expressions contain a cast. Which one is better for the optimizer is something I do not yet understand. However, I believe this choice can basically be made locally for each expression, using either max(validity_type, min(op1_type, op2_type, ...)) to favor small types or max(validity_type, max(op1_type, op2_type, ...)) to favor large types.
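
To make this concrete, here is a toy model (precisions in bits only, with
made-up helper names, not the gcc_type_for_clast_* code) applying both
policies to the example above:

#include <stdio.h>

static int max2 (int a, int b) { return a > b ? a : b; }
static int min2 (int a, int b) { return a < b ? a : b; }

/* Favor small types: never below the validity type, otherwise as small
   as the smallest operand type.  */
static int
expr_prec_small (int validity, int op1, int op2)
{
  return max2 (validity, min2 (op1, op2));
}

/* Favor large types: never below the largest operand type either.  */
static int
expr_prec_large (int validity, int op1, int op2)
{
  return max2 (validity, max2 (op1, op2));
}

int
main (void)
{
  /* 42 * graphite_IV: validity type is char (8 bits),
     operands are char (8 bits) and short (16 bits).  */
  printf ("favor small: %d bits\n", expr_prec_small (8, 8, 16)); /* 8  -> 42 * (char) graphite_IV    */
  printf ("favor large: %d bits\n", expr_prec_large (8, 8, 16)); /* 16 -> ((short) 42) * graphite_IV */
  return 0;
}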

Can we find a solution that solves this problem more locally? I mean, instead
of using an imprecise algorithm that leads to larger types which
(accidentally?) lead to fewer casts, we could understand the type calculated
by lb_ub_... as the minimal type needed for correctness. For each instruction
that we code-generate, we derive the type by selecting a type that is large
enough to be correct and at the same time not smaller than the smallest type
of each operand.


You surely mean "not smaller than the *largest* type of each operand".

No. As said above, both seem to be valid choices and this seems to be basically an issue of generating code that can be optimized well by later transformations.

Yes, that's the intention of the gcc_type_for_clast_* functions, get a
max over the types of sub-expressions.

I honestly do not like this function chain. The problem is that we would calculate huge types if we derived the type of an expression by taking only the types of its sub(sub, ...)-expressions into account. At the moment we do not get these huge types, because our calculations are not 100% correct.

Let's assume we have a clast consisting of a single addition of several integers:

127 + 127 + 127 + 127

Each integer would get the type signed char. Now the function gcc_type_for_clast_red() would calculate the necessary type as max(char, char, char, char) => char. However, the correct type is actually max(type([lb, ub]), min/max(char, char, char, char)), where type([lb, ub]) = type([508, 508]) = short, so the result is max(short, char) = short.
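
As a sanity check, the same thing as a small toy computation (again just
precisions, with made-up helpers):

#include <stdio.h>

/* Smallest standard signed precision able to hold VALUE.  */
static int
signed_prec_for (long value)
{
  if (value >= -128 && value <= 127)
    return 8;   /* signed char */
  if (value >= -32768 && value <= 32767)
    return 16;  /* short */
  return 32;    /* int (enough for this example) */
}

int
main (void)
{
  /* 127 + 127 + 127 + 127: every operand fits in signed char,
     but the result 508 needs type([508, 508]) = short.  */
  int op_prec = 8;                          /* max over the operand types */
  int result_prec = signed_prec_for (508);  /* 16 */
  int expr_prec = result_prec > op_prec ? result_prec : op_prec;

  printf ("expression needs %d bits (short)\n", expr_prec);
  return 0;
}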

gcc_type_for_clast_term() also seems to just use the type of the variable, instead of the type that would actually be needed to calculate <integer> * variable.

The types would become even larger, as they would keep increasing while walking up the clast. This is because we never take the actual lb/ub values into account, but always derive the minimal type needed for correctness from the types of the operands.

I believe that, to derive correct types that are not overly large, doing those calculations locally would be better: for each expression we use lb/ub to calculate the minimal type needed for correctness, and then we choose a type equal to or larger than it that minimizes the number of bad casts.

If CLoog one day provides the lb/ub information itself, the migration pass
will be straightforward.

What do you think?

I agree with you that CLooG should provide this information for both
induction variables, and for each expression of the CLAST.  We should
not duplicate code doing the same thing in different compilers that
use CLooG.
Nice. Now we need to implement this in CLooG. ;-)

+static struct clast_user_stmt *
+clast_get_body_of (struct clast_stmt *stmt)
+{
+  if (!stmt
+      || CLAST_STMT_IS_A (stmt, stmt_user))
+    return (struct clast_user_stmt *) stmt;
+
+  if (CLAST_STMT_IS_A (stmt, stmt_for))
+    return clast_get_body_of (((struct clast_for *) stmt)->body);
+
+  if (CLAST_STMT_IS_A (stmt, stmt_guard))
+    return clast_get_body_of (((struct clast_guard *) stmt)->then);
+
+  if (CLAST_STMT_IS_A (stmt, stmt_block))
+    return clast_get_body_of (((struct clast_block *) stmt)->body);
+
+  gcc_unreachable ();
+}

The use of this function seems incorrect to me. We seem to get only the
first statement in a loop and use its domain and scattering to derive
the size of the IV type. Let's assume we have this code:

for (s0 = 0; s0 <= n; s0++) {
        if (s0 == 0)
                S1(s0);
        S2(s0);
}

Here we would get a range [0,0] instead of [0,n]. This does not seem to
be correct. A solution would be to completely switch to cloog-isl and
just use the information that the clast provides about the enumerated
space in each loop.


You're right.  However this cleanup should be in a different patch,
probably after another patch set that removes support for CLooG-ppl.

Yes. It is not worth trying to get this correct based on CLooG-PPL. I just wanted to mention this problem. It will be easy to solve with cloog-isl.

Cheers
Tobi



