Re: Split text file into words

2005-03-09 Thread Duncan Booth
qwweeeit wrote:

>ll=re.split(r"[\s,{}[]()+=-/*]",i)

The stack overflow comes because the ()+ tries to match an empty string as 
many times as possible.

This regular expression contains a character set '[\s,{}[]' followed by the 
expression '()+=-/*]'. You can see that the parentheses aren't part of a 
character set if you reverse their order, which gives you an error when the 
expression is compiled instead of a failure when trying to match:

>>> ll=re.split(r"[\s,{}[])(+=-/*]",i)

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    ll=re.split(r"[\s,{}[])(+=-/*]",i)
  File "C:\Python24\Lib\sre.py", line 157, in split
    return _compile(pattern, 0).split(string, maxsplit)
  File "C:\Python24\Lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
>>> 
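
As a rough illustration of the parsing described above, here is a minimal 
sketch written for a current Python 3 (an assumption; the session above is 
Python 2.4). It checks what the truncated character set actually contains 
and that the reversed pattern is rejected at compile time. The variable 
names are only illustrative.

```python
import re

# The set ends at the first unescaped "]", so it contains only:
# whitespace, ",", "{", "}" and "[".
truncated_set = re.compile(r"[\s,{}[]")
set_matches_bracket = truncated_set.fullmatch("[") is not None  # "[" is in the set
set_matches_paren = truncated_set.fullmatch("(") is not None    # "(" is not

# With the parentheses reversed, ")" sits outside any set and has no
# matching "(", so compilation fails before any matching is attempted.
try:
    re.compile(r"[\s,{}[])(+=-/*]")
    reversed_error = None
except re.error as exc:
    reversed_error = str(exc)

print(set_matches_bracket, set_matches_paren, reversed_error)
```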

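I suspect you actually meant the character set to include the other 
punctuation characters, in which case you need to escape the closing square 
bracket or make it the first character. The hyphen also needs escaping (or 
moving to the end of the set), because '=-/' is otherwise read as a 
reversed range and rejected with "bad character range".

Try:

ll=re.split(r"[\s,{}[\]()+=\-/*]",i)

or:

ll=re.split(r"[]\s,{}[()+=\-/*]",i)

instead.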
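
For what it's worth, here is a small sketch of how a fully bracketed set 
could be used to pull words out of a line, assuming a current Python 3. 
The trailing "+" (to collapse runs of delimiters), the filtering of empty 
strings and the names DELIMITERS and split_words are my own additions for 
illustration, not something from the original code.

```python
import re

# One character set holding every delimiter: whitespace , { } [ ] ( ) + = - / *
# "]" and "-" are escaped so they are taken literally inside the set.
# The trailing "+" treats a run of delimiters as a single separator.
DELIMITERS = re.compile(r"[\s,{}[\]()+=\-/*]+")


def split_words(line):
    """Split a line on the delimiter set and drop empty strings."""
    return [word for word in DELIMITERS.split(line) if word]


print(split_words("x = foo(a, b) + os.path.join(c[1], d)"))
```

Because "." is not in the set, dotted names such as os.path.join stay in 
one piece.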


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Split text file into words

2005-03-09 Thread qwweeeit
Thank you for your help.
I have already used re.split successfully, but in this case...
I didn't explain in more depth because I don't want someone else to do
my homework.

I want to implement a cross-reference tool for variables and commands.
To do that I must strip all comments and string literals from the
Python source.
In the cleaned source file I must then isolate all the words (keeping
together the words connected by '.').

My broken code (ignore the line number in the traceback; it's an
extract!):

import re

# input text file w/o strings & comments

f=open('file.txt')
lInput=f.readlines() 
f.close()

fOut=open('words.txt','w')

for i in lInput:
    ll=re.split(r"[\s,{}[]()+=-/*]",i)
    fOut.write(' '.join(ll)+'\n')

fOut.close()

Traceback (most recent call last):
  File "./GetWords.py", line 70, in ?
    ll=re.split(r"[\s,{}[]()+=-/*]",i)
  File "/usr/lib/python2.3/sre.py", line 156, in split
    return _compile(pattern, 0).split(string, maxsplit)
RuntimeError: maximum recursion limit exceeded


... and if I use:
ll=re.split(r"\s,{}[]()+=-/*",i)

Traceback (most recent call last):
  File "./GetWords.py", line 70, in ?
    ll=re.split(r"\s,{}[]()+=-/*",i)
  File "/usr/lib/python2.3/sre.py", line 156, in split
    return _compile(pattern, 0).split(string, maxsplit)
  File "/usr/lib/python2.3/sre.py", line 230, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

I thought it was my mistake in how I used re.split...

I am using:   
Python 2.3.4 (#2, Aug 19 2004, 15:49:40)
[GCC 3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)] on linux2


Re: Split text file into words

2005-03-08 Thread Duncan Booth
qwweeeit wrote:

> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blanks, punctuation, math
> signs (+-*/), parentheses and so on.
> 
> I didn't succeed in using re.split()...
> 

Would you care to elaborate on how you tried to use re.split and failed? We 
aren't mind readers here. An example of your non-working code along with 
the expected result and the actual result would be useful.

This is the first example given in the documentation for re.split:

   >>> re.split('\W+', 'Words, words, words.')
   ['Words', 'words', 'words', '']

Does it do what you want? If not, what do you want?
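
A minimal sketch of that documentation example, written for a current 
Python 3 (an assumption; the thread itself uses 2.3 and 2.4). The split 
leaves a trailing empty string because the text ends in a delimiter, and a 
list comprehension is one way to drop it. The names text, parts and words 
are only illustrative.

```python
import re

text = 'Words, words, words.'

# \W+ matches one or more non-word characters, so commas, spaces and the
# final full stop all act as separators.
parts = re.split(r'\W+', text)

# The text ends with a separator, so the last element of parts is ''.
words = [p for p in parts if p]

print(parts)
print(words)
```

Note that \W+ also splits on '.', so a dotted name would come back as 
separate pieces.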


Re: Split text file into words

2005-03-08 Thread Heiko Wundram
On Tuesday 08 March 2005 14:43, qwweeeit wrote:
> The standard split() can use only one delimiter. To split a text file
> into words you need multiple delimiters like blanks, punctuation, math
> signs (+-*/), parentheses and so on.
>
> I didn't succeed in using re.split()...

Then try again... ;) No, seriously, re.split() can do what you want. Just 
think about what the word delimiters are.

Say you want to split on all whitespace plus ",", "." and "?"; then you'd 
use something like:

~ $ python
Python 2.3.5 (#1, Feb 27 2005, 22:40:59)
[GCC 3.4.3 20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0, 
pie-8.7 on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> teststr = "Hello qwweeeit, how are you? I am fine, today, actually."
>>> re.split(r"[\s\.,\?]+",teststr)
['Hello', 'qwweeeit', 'how', 'are', 'you', 'I', 'am', 'fine', 'today', 
'actually', '']

Extending with other word separators shouldn't be hard... Just have a look at

http://docs.python.org/lib/re-syntax.html

HTH!

-- 
--- Heiko.



Split text file into words

2005-03-08 Thread qwweeeit
The standard split() can use only one delimiter. To split a text file
into words you need multiple delimiters like blanks, punctuation, math
signs (+-*/), parentheses and so on.

I didn't succeed in using re.split()...