Token re groups

BruceD Fri, 02 Jan 2009 12:39:56 -0800

Hi

I have modified the LexToken class (appended) to return the re groups
using index notation to avoid having to construct a duplicate re to
extract the parsed groups. This is illustrated in the following
snippet which returns a tuple containing the three parts of an HTML
tag:


def t_LEFT_TAG(t):
    ur"<([A-Za-z]+)(?:\ ([^>\/]*))?(\/)?>"
    t.value = (t[0], t[1], t[2])
    if t[3] == u'/':
        t.type = 'CLOSED_TAG'
    return t
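
As a standalone illustration of which groups that pattern produces, here is a minimal sketch using the plain re module, outside of PLY. The helper name split_tag and the sample tags are made up for this example, and the pattern is written as an ordinary raw string (rather than ur"...") so that it also runs on Python 3. Note that groups which do not take part in the match come back as None.

```python
import re

# Same pattern as in t_LEFT_TAG above, as a plain raw string.
LEFT_TAG = re.compile(r"<([A-Za-z]+)(?:\ ([^>\/]*))?(\/)?>")

def split_tag(text):
    """Hypothetical helper: return (whole tag, name, attributes, slash)."""
    m = LEFT_TAG.match(text)
    if m is None:
        return None
    return (m.group(0), m.group(1), m.group(2), m.group(3))

print(split_tag('<img src="a.png"/>'))  # ('<img src="a.png"/>', 'img', 'src="a.png"', '/')
print(split_tag('<b>'))                 # ('<b>', 'b', None, None)
```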

I had to resort to a rather clunky way of enumerating the groups and
would like some assistance with tidying up the code, both to avoid use
before initialization (when the __len__ function is called) and to
handle group items that are returned as None rather than a str or
unicode (in the __getitem__ function). The commented-out matchnames
(see source code) appeared to be an ideal way, but this fails when
self.lexer.lexre has more than one item (when using an inclusive
state).

Thanks,
Bruce

class LexToken(object):
    def __str__(self):
        return "LexToken(%s,%r,%d,%d)" % (
            self.type, self.value, self.lineno, self.lexpos)
    def __repr__(self):
        return str(self)
    def skip(self,n):
        self.lexer.skip(n)
        _SkipWarning("Calling t.skip() on a token is deprecated. "
                     "Please use t.lexer.skip()")
    # The following code was added by BruceD (Nov 2008)
    def __getMatches(self):
        """Get the submatches returned by the re"""
        try:
            return self.__matches
        except:
            try:
                m = self.lexer.lexmatch
                i = m.lastindex
                #matchnames = self.lexer.lexre[0][1]
                self.__matches = [m.group(i)]
                i += 1
                #while not matchnames[i]:
                while (i < len(m.groups()) and
                       isinstance(m.group(i), (unicode, str))):
                    self.__matches.append(m.group(i))
                    i += 1
            except:
                self.__matches = self.value
            return self.__matches
    def __len__(self):
        try:
            if self.lexer:
                return len(self.__getMatches())
            else:
                return len(self.value)
        except:
            return len(self.value)
    def __getitem__(self, i):
        if len(self.__getMatches()) > i:
            return self.__matches[i]
        else:
            return ''
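
One possible way to tidy up the group enumeration, offered as a sketch rather than a tested patch to PLY: take the group boundaries from the compiled pattern that produced the match (m.re) instead of from self.lexer.lexre, so it does not matter how many items lexre holds. This assumes that each rule is wrapped in its own named group in the master pattern, that rules contain no named groups of their own, and that lexmatch is an ordinary re match object. The function name token_groups and the two-rule master pattern are invented for the demonstration. Unlike the isinstance test in the while loop above, it does not stop at the first group that is None, so a tag such as <br/> still reports the trailing slash.

```python
import re

def token_groups(m):
    """Return [token_text, sub1, sub2, ...] for the rule that matched.

    Assumption: every rule is a named group in the master pattern and
    no rule defines named groups inside its own pattern.
    """
    first = m.lastindex  # index of the outermost (rule) group that matched
    # Indices at which each named rule group starts, in ascending order.
    starts = sorted(m.re.groupindex.values())
    later = [s for s in starts if s > first]
    # The subgroups end where the next rule begins, or at the last group.
    end = later[0] if later else m.re.groups + 1
    # Groups that did not take part in the match are None; map them to ''.
    return [m.group(i) or '' for i in range(first, end)]

# Stand-in for a master pattern with two rules (made up for this demo).
master = re.compile(
    r"(?P<t_LEFT_TAG><([A-Za-z]+)(?:\ ([^>\/]*))?(\/)?>)"
    r"|(?P<t_WORD>[A-Za-z]+)"
)

print(token_groups(master.match('<br/>')))  # ['<br/>', 'br', '', '/']
print(token_groups(master.match('hello')))  # ['hello']
```

With something like this, __getMatches could compute the list once from self.lexer.lexmatch, and __len__ and __getitem__ would not need to special-case None.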

