Hmm, how about this? The operand can be followed by an arbitrary number
of items from the following list, in any order.

1. Alphanumeric characters.
2. A dot.
3. A [] pair, where everything from the [ to the ] (excluding more [s or
]s) is included. C++ syntax makes this pretty safe, I think.

This picks up the array index case without too much fuss, although it's
still pretty limited. It can handle multidimensional arrays, nested
structure members, etc., but it can't handle members that are selected
with function-like syntax, which is not likely to be a problem, or arrays
whose index is selected from another array, i.e. arr[x[y]], which is also
not likely to be a problem. This seems to handle the important cases
while not handling the more esoteric ones, and it still stays pretty
simple, more or less. I like it. What do you think?
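
For illustration, the rule above might be sketched as the regular
expression below. This is only a sketch under stated assumptions, not the
parser's actual code: the names trailerAssignRE and is_dest are
hypothetical, underscore is treated as an alphanumeric character, and the
operand is located with a simple word-boundary search rather than the
parser's real operand matching.

```python
import re

# Hypothetical sketch of the proposed rule. After the operand, skip any
# number of: alphanumeric characters (underscore included, an assumption),
# dots, or [] pairs containing no further brackets. Then require optional
# whitespace and an '=' that is not the start of '=='.
trailerAssignRE = re.compile(
    r'(?:[A-Za-z0-9_]|\.|\[[^\[\]]*\])*\s*=(?!=)', re.MULTILINE)


def is_dest(code, operand):
    """Return True if the first use of `operand` in `code` looks like a
    destination (LHS) under the sketched rule, else False."""
    # Locating the operand by word boundary is a simplification made
    # for this example only.
    found = re.search(r'\b' + re.escape(operand) + r'\b', code)
    if found is None:
        return False
    # Try the trailer pattern starting right after the operand name.
    return trailerAssignRE.match(code, found.end()) is not None


simd = "Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];"
branch = "if (Foo)\n    Ra = Rb + Rc;\nelse\n    Ra = Rb - Rc;"

print(is_dest("Bar.foo = 42;", "Bar"))   # bitfield write
print(is_dest(simd, "Ra"))               # indexed write
print(is_dest(simd, "Rb"))               # indexed read
print(is_dest(branch, "Foo"))            # condition, stopped by ')'
print(is_dest("arr[x[y]] = 1;", "arr"))  # nested index, not handled
```

Under this sketch, Bar and Ra are classified as dests, while Rb and Foo
are sources. The arr[x[y]] case falls through to "source", which is the
limitation described above.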

Gabe

On 09/27/11 01:58, Gabe Black wrote:
> I played with this a bit, and it turns out this is pretty tricky. If you
> allow ".", whitespace and alphanumeric characters, things generally
> work. If you wanted to do something with SIMD, though, and for instance
> use an array index in a loop like so:
>
> for (int i = 0; i < 8; i++)
>     Ra.bytes[i] = Rb.bytes[i] + Rc.bytes[i];
>
> it would incorrectly decide that Ra was a source because it would see
> the [ and stop looking for a =.
>
> On the other hand, if you let it match anything except a comma,
> semicolon or = on the way to a =, then in a case like this:
>
> if (Foo)
>     Ra = Rb + Rc;
> else
>     Ra = Rb - Rc;
>
> It would incorrectly decide that Foo was a dest because there was no
> comma or semicolon between it and the equals on the next line.
>
> Given that the first approach gets more things right and enables a major
> use case (bitfields in control registers) I'm inclined to go with it,
> but not being able to use it with SIMD, a second major use case, is a
> serious drawback.
>
> Gabe
>
> On 09/24/11 18:07, Gabe Black wrote:
>> Once the "_" vs "." change is tested by Ali and checked in, the next
>> step to get generic operand types working is to address a deficiency in
>> how is_src and is_dest are determined by the parser. Right now it uses
>> this regular expression to determine if something is a dest, and if it's
>> not being used as a dest it's a src. This is for a particular instance,
>> so one operand can be both if it's used more than once.
>>
>> assignRE = re.compile(r'\s*=(?!=)', re.MULTILINE)
>>
>> Basically, it implements the rule described by this comment:
>> # if the token following the operand is an assignment, this is
>> # a destination (LHS), else it's a source (RHS)
>>
>> That's worked quite well, especially considering how simple it is, and
>> I'd like to preserve both its accuracy and its simplicity. The problem,
>> however, is that the operand name won't necessarily be the last thing
>> before the equals if an operand is being used as a dest. As a simple
>> example, if I wanted to set the foo field of the Bar operand, it might
>> look like this:
>>
>> Bar.foo = 42;
>>
>> Here, I believe the regular expression above would determine that Bar
>> was a source because .foo appeared immediately after it.
>>
>> There are two possible solutions I see for this so far. First, we could
>> make the regular expression ignore "."s and identifiers in addition to
>> whitespace on its way to the equals sign. Second, we could make it
>> ignore *everything* on its way to the equals sign, except a "," or a ";"
>> which would, roughly, denote the end of the expression.
>>
>> Neither of these approaches seems like it'll be foolproof, so I was
>> wondering what you guys think. Can you think of any naturally occurring
>> bit of code that would make one or the other get the wrong answer? The
>> original wasn't foolproof either, but in practice it worked really
>> well. I'd like to go for the same thing, so it's ok for it to be wrong
>> sometimes. I just don't want there to be a common case where it
>> messes up.
>>
>>
>>
>> Also, there's still the issue of how partial writes, like the one in
>> my example, are handled as far as being a dest or a src or both. In the
>> example above, Bar is actually both a source and a destination because
>> the non-foo bits are set to their old values. That changes, though, if
>> those other bits are necessarily set later, or if Bar is one of the new
>> types of MiscReg Ali added that accumulate partial writes over time,
>> useful for fault bits in floating point. If the Bar.foo = 42 case makes
>> Bar a dest, maybe all we need to do is add "Bar = Bar" where it needs to
>> be both and leave the parser alone. This puts the burden on the ISA
>> description, although it may need to be there anyway, and it feels a
>> little hacky. I want to deal with this after the issue above, but I
>> wanted to mention it since it'll probably be coming up next.
>>
>> Gabe
>> _______________________________________________
>> gem5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/gem5-dev