Re: Split text file into words
qwweeeit wrote:
> ll=re.split(r"[\s,{}[]()+=-/*]",i)

The stack overflow happens because the ()+ tries to match an empty string as many times as possible. This regular expression contains the character set '\s,{}[' followed by the expression '()+=-/*]'.

You can confirm that the parentheses aren't part of a character set by reversing their order. That gives you an error when the expression is compiled, instead of a failure when trying to match:

>>> ll=re.split(r"[\s,{}[])(+=-/*]",i)
Traceback (most recent call last):
  File "", line 1, in -toplevel-
    ll=re.split(r"[\s,{}[])(+=-/*]",i)
  File "C:\Python24\Lib\sre.py", line 157, in split
    return _compile(pattern, 0).split(string, maxsplit)
  File "C:\Python24\Lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
>>>

I suspect you actually meant the character set to include the other punctuation characters, in which case you need to escape the closing square bracket or make it the first character. The hyphen needs escaping too (or moving to the end of the set); otherwise '=-/' is read as a range running backwards from '=' to '/', which fails with "bad character range". Try:

ll=re.split(r"[\s,{}[\]()+=\-/*]",i)

or:

ll=re.split(r"[]\s,{}[()+=\-/*]",i)

instead.

--
http://mail.python.org/mailman/listinfo/python-list
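A minimal sketch of a working delimiter pattern in modern Python 3 (the thread itself uses Python 2.3/2.4). The name DELIMS and the sample line are hypothetical. The ']' is escaped, the '-' is placed last so no range is formed, and a trailing '+' makes a run of several delimiters count as one separator; '.' is left out of the set so dotted names survive. It also shows that leaving the hyphen between '=' and '/' fails at compile time.

```python
import re

# Hypothetical delimiter pattern:
#  - ']' is escaped so it does not close the set right after '['
#  - '-' is placed last so "=-/" is not parsed as a (reversed) range
#  - the trailing '+' treats a run of delimiters as a single separator
#  - '.' is not a delimiter, so dotted names such as bar.baz stay whole
DELIMS = re.compile(r"[\s,{}[\]()+=/*-]+")

# With the hyphen between '=' and '/', compilation fails instead.
try:
    re.compile(r"[\s,{}[\]()+=-/*]")
except re.error as exc:
    print("compile error:", exc)

line = "x = foo(a, b[1]) + bar.baz/2"   # hypothetical sample input
# Drop empty strings produced when a line starts or ends with a delimiter.
words = [w for w in DELIMS.split(line) if w]
print(words)   # ['x', 'foo', 'a', 'b', '1', 'bar.baz', '2']
```

The filter matters for input such as "f(x)", where re.split returns a trailing empty string after the final ')'.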
Re: Split text file into words
Thank you for your help. I have already used re.split successfully, but in this case...

I didn't explain in more detail because I don't want someone else to do my homework. I want to implement a cross-reference tool for variables and commands. To do this I must strip all comments and string literals from the Python source. In the cleaned source file I must then isolate all the words (keeping words joined by '.' together).

My wrong code (ignore the line numbers in the traceback... it's an extract!):

import re
# input text file w/o strings & comments
f=open('file.txt')
lInput=f.readlines()
f.close()
fOut=open('words.txt','w')
for i in lInput:
    ll=re.split(r"[\s,{}[]()+=-/*]",i)
    fOut.write(' '.join(ll)+'\n')
fOut.close()

Traceback (most recent call last):
  File "./GetWords.py", line 70, in ?
    ll=re.split(r"[\s,{}[]()+=-/*]",i)
  File "/usr/lib/python2.3/sre.py", line 156, in split
    return _compile(pattern, 0).split(string, maxsplit)
RuntimeError: maximum recursion limit exceeded

... and if I use:

ll=re.split(r"\s,{}[]()+=-/*",i)

Traceback (most recent call last):
  File "./GetWords.py", line 70, in ?
    ll=re.split(r"\s,{}[]()+=-/*",i)
  File "/usr/lib/python2.3/sre.py", line 156, in split
    return _compile(pattern, 0).split(string, maxsplit)
  File "/usr/lib/python2.3/sre.py", line 230, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

I thought it was my mistake in the use of re.split...

I am using:
Python 2.3.4 (#2, Aug 19 2004, 15:49:40)
[GCC 3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)] on linux2
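Since the goal is a cross-reference of names in Python source, one alternative not proposed in this thread is to let the standard library tokenize module do the cleaning: it reports comments and string literals as separate COMMENT and STRING tokens, so they never show up among the NAME tokens. This is a Python 3 sketch; the function name dotted_names and the rule for joining names separated by '.' are assumptions made for illustration.

```python
import io
import tokenize

def dotted_names(source):
    """Return the names in Python source, keeping dotted names like os.path whole.

    Comments and string literals are skipped automatically: tokenize reports
    them as COMMENT and STRING tokens, never as NAME tokens.
    """
    words = []
    current = None      # dotted name being assembled, e.g. "os.path"
    after_dot = False   # True when the previous token was "." following a name

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            if after_dot:
                current += "." + tok.string   # extend "os" to "os.path"
            else:
                if current is not None:
                    words.append(current)
                current = tok.string
            after_dot = False
        elif tok.type == tokenize.OP and tok.string == "." and current is not None:
            after_dot = True
        else:
            # Any other token (operator, string, comment, newline) ends the name.
            if current is not None:
                words.append(current)
            current = None
            after_dot = False

    if current is not None:
        words.append(current)
    return words

sample = 'import os.path  # a comment\nprint(os.path.join("a", "b"), x)\n'
print(dotted_names(sample))   # ['import', 'os.path', 'print', 'os.path.join', 'x']
```

Keywords such as import are NAME tokens too; keyword.iskeyword can filter them out if they are not wanted. tokenize expects reasonably well-formed source and may raise an exception on code it cannot tokenize, so a real tool would need to handle that.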
Re: Split text file into words
qwweeeit wrote:
> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blank, punctuation, math
> signs (+-*/), parentheses and so on.
>
> I didn't succeed in using re.split()...

Would you care to elaborate on how you tried to use re.split and failed? We aren't mind readers here. An example of your non-working code, along with the expected result and the actual result, would be useful.

This is the first example given in the documentation for re.split:

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

Does it do what you want? If not, what do you want?
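A short Python 3 sketch of what that documentation example returns, plus two common follow-ups: dropping the trailing empty string that appears when the text ends with a delimiter, and using re.findall to describe the words instead of the delimiters. The last pattern, which keeps dotted names such as os.path.join together, is an illustration added here, not part of the documentation.

```python
import re

text = "Words, words, words."

# re.split leaves an empty string after the final delimiter ('.').
parts = re.split(r"\W+", text)       # ['Words', 'words', 'words', '']
words = [p for p in parts if p]      # ['Words', 'words', 'words']

# Complementary approach: match the words rather than the delimiters.
found = re.findall(r"\w+", text)     # ['Words', 'words', 'words']

# \W+ would also split "os.path" at the dot; allowing ".name" suffixes
# keeps dotted names together.
dotted = re.findall(r"\w+(?:\.\w+)*", "print(os.path.join(a, b))")
print(dotted)                        # ['print', 'os.path.join', 'a', 'b']
```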
Re: Split text file into words
On Tuesday 08 March 2005 14:43, qwweeeit wrote:
> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blank, punctuation, math
> signs (+-*/), parentheses and so on.
>
> I didn't succeed in using re.split()...

Then try again... ;)

No, seriously, re.split() can do what you want. Just think about what the word delimiters are. Say you want to split on all whitespace and on ",", ".", and "?"; then you'd use something like:

$ python
Python 2.3.5 (#1, Feb 27 2005, 22:40:59)
[GCC 3.4.3 20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0, pie-8.7 on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
>>> re.split(r"[\s\.,\?]+",teststr)
['Hello', 'qwweeeit', 'how', 'are', 'you', 'I', 'am', 'fine', 'today', 'actually', '']

Extending this with other word separators shouldn't be hard... Just have a look at http://docs.python.org/lib/re-syntax.html

HTH!

--
Heiko.
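One way to extend the class with more separators without tripping over characters such as ']' or '-' is to build it with re.escape. This is a Python 3 sketch; make_splitter is a hypothetical helper name and the delimiter list is only an example.

```python
import re

def make_splitter(delimiters):
    """Compile a pattern that splits on runs of whitespace or any delimiter char.

    re.escape backslash-escapes characters such as ']', '-', '^' and '\\',
    so they cannot close the set early or be read as a range.
    """
    return re.compile(r"[\s" + re.escape(delimiters) + r"]+")

splitter = make_splitter(".,?!;:()[]{}+-*/=")
teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
# Filter out the empty string left after the final '.'.
words = [w for w in splitter.split(teststr) if w]
print(words)
# ['Hello', 'qwweeeit', 'how', 'are', 'you', 'I', 'am', 'fine', 'today', 'actually']
```

Keeping the delimiters in a plain string makes the list easy to read and extend, while re.escape handles the characters that are special inside a set.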
Split text file into words
The standard split() can use only one delimiter. To split a text file into words you need multiple delimiters like blank, punctuation, math signs (+-*/), parentheses and so on.

I didn't succeed in using re.split()...
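For a fixed set of single-character delimiters, a non-regex alternative (not suggested in the thread) is to translate every delimiter to a space and then call split() with no argument, which splits on runs of whitespace. This is a Python 3 sketch; DELIMITERS, TO_SPACE and split_words are hypothetical names, the delimiter set is an assumption, and str.maketrans as used here is the Python 3 form.

```python
# Characters treated as word delimiters (an assumed set; extend as needed).
# '.' is deliberately absent, so dotted names such as os.path stay whole.
DELIMITERS = ",;:()[]{}+-*/="

# Translation table mapping every delimiter character to a space.
TO_SPACE = str.maketrans(DELIMITERS, " " * len(DELIMITERS))

def split_words(text):
    """Split on whitespace and on every character in DELIMITERS."""
    # After translation, split() with no argument splits on runs of whitespace
    # and never returns empty strings.
    return text.translate(TO_SPACE).split()

print(split_words("a = b[1] + f(x, y)"))   # ['a', 'b', '1', 'f', 'x', 'y']
```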