<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi Paul,
>
> I am trying to extract HTTP response codes from a HTTP page send from
> a web server. Below is my test program. The program just hangs.
>
> Thanks,
> Khoa
> ##################################################
> <snip sample program>

Khoa -

Thanks for supplying a little more information to go on. The problem you are struggling with has to do with pyparsing's handling (or non-handling) of whitespace, which I'll admit takes some getting used to.

In general, pyparsing works its way through the input string, matching input characters against the defined pattern. This gets a little tricky when dealing with whitespace (which includes '\n' characters). In particular, restOfLine will read up to the next '\n', but will not go past it - AND restOfLine will match an empty string. So if you have a grammar that includes repetition, such as OneOrMore(restOfLine), this will read up to the next '\n', and then just keep matching an empty string at that same position forever. This is just about the case you have in your code, ZeroOrMore(BodyLine), in which BodyLine is

    BodyLine = Group(nonHTTP + restOfLine)

You need to include something to consume the terminating '\n', which is the purpose of the LineEnd() class. Change BodyLine to

    BodyLine = Group(nonHTTP + restOfLine + LineEnd())

and this will break the infinite looping that occurs at the end of the first body line. (If you like, use LineEnd().suppress() to keep the '\n' tokens from getting included with your other parsed data.)

Now there is one more problem - another infinite loop at the end of the string. By similar reasoning, it is resolved by changing

    nonHTTP = ~Literal("HTTP/1.1")

to

    nonHTTP = ~Literal("HTTP/1.1") + ~StringEnd()

After making those two changes, your program runs to completion on my system.

Usually, when someone has some problems with this kind of "line-sensitive" parsing, I recommend that they consider using pyparsing in a different manner, or use some other technique.
For instance, you might use pyparsing's scanString generator to match on the HTTP lines, as in

    for toks, start, end in StatusLine.scanString(data):
        print(toks, toks[0].StatusCode, toks[0].ReasonPhrase)
        print(start, end)

which gives

    [['HTTP/1.1', '200', ' OK']] 200 OK
    0 15
    [['HTTP/1.1', '400', ' Bad request']] 400 Bad request
    66 90
    [['HTTP/1.1', '500', ' Bad request']] 500 Bad request
    142 166

If you need the intervening body text, you can use the start and end values to extract it in slices from the input data string.

Or, since your data is reasonably well-formed, you could just use readlines, or data.split('\n'), and find the HTTP lines using startswith(). While this is a brute force approach, it will certainly run many times faster than pyparsing.

In any event, best of luck using pyparsing, and write back if you have other questions.

-- Paul
-- http://mail.python.org/mailman/listinfo/python-list
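
P.S. The brute-force alternative mentioned above could look roughly like the sketch below. It is only an illustration: the helper name split_responses and the sample data are invented here, not taken from your program. It also uses splitlines() instead of data.split('\n'), so that a trailing newline does not leave an empty final line.

```python
# Rough sketch of the brute-force approach: no pyparsing, just string methods.
# split_responses and the sample data below are hypothetical names/values.

def split_responses(data, marker="HTTP/1.1"):
    """Return a list of (status_code, reason_phrase, body_lines) tuples."""
    responses = []
    # splitlines() is used in place of split('\n'); it also copes with '\r\n'.
    for line in data.splitlines():
        if line.startswith(marker):
            # Status line layout assumed: "<version> <code> <reason phrase>"
            parts = line.split(" ", 2)
            code = parts[1] if len(parts) > 1 else ""
            reason = parts[2] if len(parts) > 2 else ""
            responses.append((code, reason, []))
        elif responses:
            # Any other line belongs to the body of the latest status line.
            responses[-1][2].append(line)
        # Lines before the first status line are ignored.
    return responses


sample = (
    "HTTP/1.1 200 OK\n"
    "first body line\n"
    "second body line\n"
    "HTTP/1.1 400 Bad request\n"
    "another body line\n"
)

for code, reason, body in split_responses(sample):
    print(code, reason, body)
```

Each tuple keeps the body lines that follow a status line, so there is no need to track start and end offsets as with scanString.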