On Apr 24, 10:42 am, "Eric Wertman" <[EMAIL PROTECTED]> wrote:
> I'm sure there are cooler ways to do some of that. I spent most of my
> time expanding the characters that constitute content. I'm concerned
> that over time I'll have things break as other characters show up.
> Specifically a few of the nodes are of German locale.. so I could get
> some odd international characters.
>

If you want to add international characters without going to Unicode, a
first cut would be to add pyparsing's string constant "alphas8bit" to
the characters that make up your content word.

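Something along these lines might do as a starting point. It is only a
sketch: the base character set (alphas + nums) and the names
content_chars and content_word are placeholders for illustration, not
taken from your parser.

```python
from pyparsing import Word, alphas, alphas8bit, nums

# Characters allowed in a content word: ASCII letters and digits, plus
# pyparsing's 8-bit (Latin-1) letters such as the German umlauts.
# The base set (alphas + nums) is an assumption; substitute your own.
content_chars = alphas + nums + alphas8bit
content_word = Word(content_chars)

# A German word with non-ASCII letters is matched as a single token.
result = content_word.parseString("Größe")
print(result[0])
```

Note that alphas8bit only covers the accented letters in the Latin-1
range; anything beyond that would still need a Unicode-aware approach.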
> It looks like pyparser has a constant for printable characters. I'm
> not sure if I can just use that, without worrying about it?
>

I would discourage you from using printables, since it also includes
'[', ']', and '"', which are significant to other elements of the parser
(but you could create your own variable initialized with printables, and
then use replace("[","") etc. to strip out the offending characters).

I'm also a little concerned that you needed to add \t and \n to the
content word - was this really necessary? None of your examples showed
such words, and I would rather have you let pyparsing skip over the
whitespace as is its natural behavior.

-- Paul
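
For reference, here is a rough sketch of the printables approach
described above. The bracketed-list grammar and all of the names in it
(content_chars, content_word, bracketed) are hypothetical, made up only
to show the idea; which characters are really structural depends on
your own parser.

```python
from pyparsing import Suppress, Word, ZeroOrMore, printables

# Start from printables and strip the characters that other parts of
# the grammar rely on as delimiters ('[', ']' and '"' in this case).
content_chars = printables
for ch in '[]"':
    content_chars = content_chars.replace(ch, "")

# No \t or \n in the word's character set: pyparsing skips leading
# whitespace between tokens on its own.
content_word = Word(content_chars)

# Hypothetical bracketed list, only to exercise the delimiters.
LBRACK, RBRACK = map(Suppress, "[]")
bracketed = LBRACK + ZeroOrMore(content_word) + RBRACK

tokens = bracketed.parseString("[ alpha  beta-1\n\tgamma.2 ]").asList()
print(tokens)
```

The input deliberately mixes spaces, a newline and a tab between the
words, and the words still come back as separate tokens without any
whitespace characters being listed in content_chars.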