A S wrote:

> On Tuesday, 3 December 2019 01:01:25 UTC+8, Peter Otten wrote:
>> A S wrote:
>>
>> I think I've seen this question before ;)
>>
>> > I am trying to extract all strings in nested parentheses (along with
>> > the parentheses itself) in my .txt file. Please see the sample .txt
>> > file that I have used in this example here:
>> > (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
>> >
>> > I have tried and done up three different codes but none of them seems
>> > to be able to extract all the nested parentheses. They can only extract
>> > a portion of the nested parentheses. Any advice on what I've done wrong
>> > could really help!
>> >
>> > Here are the three codes I have done so far:
>> >
>> > 1st attempt:
>> >
>> > import re
>> > from os.path import join
>> >
>> > def balanced_braces(args):
>> >     parts = []
>> >     for arg in args:
>> >         if '(' not in arg:
>> >             continue
>>
>> There could still be a ")" that you miss
>>
>> >         chars = []
>> >         n = 0
>> >         for c in arg:
>> >             if c == '(':
>> >                 if n > 0:
>> >                     chars.append(c)
>> >                 n += 1
>> >             elif c == ')':
>> >                 n -= 1
>> >                 if n > 0:
>> >                     chars.append(c)
>> >                 elif n == 0:
>> >                     parts.append(''.join(chars).lstrip().rstrip())
>> >                     chars = []
>> >             elif n > 0:
>> >                 chars.append(c)
>> >     return parts
>>
>> It's probably easier to understand and implement when you process the
>> complete text at once. Then arbitrary splits don't get in the way of your
>> quest for ( and ). You just have to remember the position of the first
>> opening ( and number of opening parens that have to be closed before you
>> take the complete expression:
>>
>> level: 00011112222100
>> text:  abc(def(gh))ij
>> when we are here  ^
>> we need   ^
>>
>> A tentative implementation:
>>
>> $ cat parse.py
>> import re
>>
>> NOT_SET = object()
>>
>> def scan(text):
>>     level = 0
>>     start = NOT_SET
>>     for m in re.compile("[()]").finditer(text):
>>         if m.group() == ")":
>>             level -= 1
>>             if level < 0:
>>                 raise ValueError("underflow: more closing than opening parens")
>>             if level == 0:
>>                 # outermost closing parenthesis:
>>                 # deliver enclosed string including parens.
>>                 yield text[start:m.end()]
>>                 start = NOT_SET
>>         elif m.group() == "(":
>>             if level == 0:
>>                 # outermost opening parenthesis: remember position.
>>                 assert start is NOT_SET
>>                 start = m.start()
>>             level += 1
>>         else:
>>             assert False
>>     if level > 0:
>>         raise ValueError("unclosed parens remain")
>>
>>
>> if __name__ == "__main__":
>>     with open("lan sample text file.txt") as instream:
>>         text = instream.read()
>>     for chunk in scan(text):
>>         print(chunk)
>> $ python3 parse.py
>> ("xE'", PUT(xx.xxxx.),"'")
>> ("TRUuuuth")
>
> Hello Peter! I tried this on my actual working files and it returned this
> error: "unclosed parens remain". In this case, how can I continue to parse
> through my text files by only extracting those with balanced parentheses
> and ignore those that are incomplete?

import sys

filenames = ...
for filename in filenames:
    with open(filename) as instream:
        text = instream.read()
    try:
        # Consume the generator fully so that errors surface
        # before any chunk from this file is printed.
        chunks = list(scan(text))
    except ValueError as err:
        # Unbalanced parens: report the file and move on.
        print(f"{err} in file {filename!r}", file=sys.stderr)
    else:
        for chunk in chunks:
            print(chunk)
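
The loop above drops a file completely as soon as scan() raises. If that is too coarse and you would rather keep whatever balanced groups a file does contain, a more tolerant scanner might look like the sketch below. This is only one possible reading of "ignore those that are incomplete" and has not been tried on your files. The name scan_lenient is made up for illustration. It assumes that a stray ")" and a "(" that never closes should both be skipped silently, and that only the outermost balanced groups are wanted.

```python
import re


def scan_lenient(text):
    """Return balanced outermost "(...)" groups, skipping unmatched parens."""
    open_positions = []  # start offsets of "(" that are not closed yet
    spans = []           # (start, end) offsets of balanced groups so far
    for m in re.finditer(r"[()]", text):
        if m.group() == "(":
            open_positions.append(m.start())
        elif open_positions:
            start = open_positions.pop()
            # Groups nested inside the one just closed are superseded by it.
            while spans and spans[-1][0] > start:
                spans.pop()
            spans.append((start, m.end()))
        # Otherwise: a ")" with nothing open is a stray closer, ignore it.
    # Offsets still left in open_positions were never closed, ignore them.
    return [text[start:end] for start, end in spans]
```

With this, scan_lenient("(a (b) c") gives ["(b)"] instead of an error, because the first "(" is never closed but the inner group is complete. Unlike scan() it returns a list instead of yielding chunks, since a group can only be called outermost once the text after it has been seen.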