Hello,

Here are three little issues I faced while implementing a lexing toolkit (see other post).

1. Regex match

Let us say there are three "natures" or "modes" of lexeme:
* SKIP: not even kept, just matched and dropped (eg optional spacing)
* MARK: kept, but slice is irrelevant data (eg all kinds of punctuation)
* DATA: slice is necessary data (eg constant value or symbol)

For the first two cases, I still need the size of the matched slice, to advance in the source by the corresponding offset. Is there a way to get this information without fetching the slice by calling hit()?

Also, I would like to know when Regex.hit() copies or slices.
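
To make the first question concrete, here is a minimal sketch of what I am after. It assumes the current std.regex API (matchFirst and Captures.hit) rather than the Regex.hit() method mentioned above; the helper name matchSize is hypothetical, and the sketch still goes through hit, which is exactly the step I would like to avoid if it copies:

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper: tries `pattern` at the start of `source[pos .. $]`
/// and returns the number of code units matched (0 if there is no match).
size_t matchSize(string source, size_t pos, string pattern)
{
    // Anchor the pattern so it can only match at the current position.
    auto m = matchFirst(source[pos .. $], regex("^(?:" ~ pattern ~ ")"));
    return m.empty ? 0 : m.hit.length;
}

unittest
{
    auto source = "  , 42";
    size_t pos = 0;
    pos += matchSize(source, pos, `[ \t]+`);  // SKIP: only the size is used
    assert(pos == 2);
    pos += matchSize(source, pos, `,`);       // MARK: only the size is used
    assert(pos == 3);
}
```

Only DATA lexemes would need the slice itself; for SKIP and MARK the length is enough to advance.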


2. reference escape

This is a little enigma I face somewhere in this module. Say S is a struct:
    ...
    auto s = S(data);
    return &s;
This code is obviously wrong, and the compiler gently warns me about it. But the variant below is allowed and, what is more, seems to work fine:
    return &(S(data));
To me, both versions are synonymous. So why does the compiler accept the latter, and why does it work? Any later use of the returned struct (recorded in an array) should fail miserably with a segfault. (*) Or is it that the compiler recognises the idiom and implicitly allocates the struct outside the local stack?
Example:

struct S { int i; }

S* newS (int i) {
    if (i < 0)
        return null;
//  auto s = S(i);
//  return &s;  // Error: escaping reference to local s
    return &(S(i));
}

unittest {
    int[] ints = [2, -2, 1, -1, 0];
    S[] structs;
    foreach (i ; ints) {
        auto p = newS(i);
        if (p) {
            structs ~= *p;      // explicit deref!
        }
    }
    assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

How can this work?
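
For comparison, here is the variant I would have expected to be required. It is only a sketch, assuming heap allocation is acceptable here; the name newHeapS is hypothetical and not part of the module:

```d
struct S { int i; }

// Hypothetical variant of newS: the struct is allocated on the GC heap,
// so the returned pointer stays valid after the function returns.
S* newHeapS(int i)
{
    if (i < 0)
        return null;
    return new S(i);
}

unittest
{
    S[] structs;
    foreach (i; [2, -2, 1, -1, 0])
    {
        auto p = newHeapS(i);
        if (p)
            structs ~= *p;      // copies the pointed-to struct into the array
    }
    assert(structs == [S(2), S(1), S(0)]);
}
```

With this version, nothing ever points into the stack frame of the function that has already returned.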


3. implicit deref

But there is something even more mysterious to me: if I first access the struct before recording it, as in:

unittest {
    import std.stdio : write;
    int[] ints = [2, -2, 1, -1, 0];
    S[] structs;
    foreach (i ; ints) {
        auto p = newS(i);
        if (p) {
            write (p.i,' ');    // implicit deref!
            structs ~= *p;      // explicit deref!
        }
    }
    assert ( structs == [S(2), S(1), S(0)] );   // fail!
}

...then the final assert fails!? But the written i's are correct ("2 1 0").
Worse, if I exchange the two deref lines:

unittest {
    import std.stdio : write;
    int[] ints = [2, -2, 1, -1, 0];
    S[] structs;
    foreach (i ; ints) {
        auto p = newS(i);
        if (p) {
            structs ~= *p;      // explicit deref!
            write (p.i,' ');    // implicit deref!
        }
    }
    assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

...then the assertion passes, but the written integers are wrong (they look like either garbage or an address, repeated 3 times, e.g. "134518949 134518949 134518949"; successive runs consistently produce the same value).


Denis
--
_________________
vita es estrany
spir.wikidot.com
