On Fri, 06 Aug 2010 06:07:32 -0300, Muhammad Adeel <nawabad...@gmail.com>
wrote:
Does anyone know how to tokenize a string in Python so that it returns
both the tokens and their byte offsets? I also need a sentence splitter
that returns the sentences with their byte offsets, and finally n-grams
returned with byte offsets.
Input:
This is a string.
Output:
This 0
is 5
a 8
string. 10
Like this?
py> import re
py> s = "This is a string."
py> for g in re.finditer(r"\S+", s):
...     print(g.group(), g.start())
...
This 0
is 5
a 8
string. 10
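
The same idea might be extended to the n-gram and sentence parts of the question. What follows is only a rough sketch, assuming Python 3: the function names (tokens_with_offsets, ngrams_with_offsets, sentences_with_offsets, byte_offset) are made up for illustration, and the sentence rule (split after ".", "!" or "?" when followed by whitespace) is a naive assumption that will trip over abbreviations. Note also that re reports character offsets on a str; they only coincide with byte offsets for ASCII text, so the last helper converts one to the other, assuming UTF-8.

```python
import re


def tokens_with_offsets(text):
    """Yield (token, start) for each whitespace-delimited token."""
    for m in re.finditer(r"\S+", text):
        yield m.group(), m.start()


def ngrams_with_offsets(text, n):
    """Yield (ngram, start), where ngram is the slice of text covering n consecutive tokens."""
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    for i in range(len(spans) - n + 1):
        start = spans[i][0]
        end = spans[i + n - 1][1]
        yield text[start:end], start


def sentences_with_offsets(text):
    """Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace or end of text."""
    pattern = r"\S.*?(?:[.!?]+(?=\s|$)|$)"
    for m in re.finditer(pattern, text, re.DOTALL):
        yield m.group(), m.start()


def byte_offset(text, char_offset, encoding="utf-8"):
    """Convert a character offset into a byte offset for the given encoding."""
    return len(text[:char_offset].encode(encoding))


if __name__ == "__main__":
    s = "This is a string. Here is another one."
    for gram, start in ngrams_with_offsets(s, 2):
        print(gram, start)
    for sent, start in sentences_with_offsets(s):
        print(sent, start)
```

If real byte offsets are needed for non-ASCII input, pass each start value through byte_offset(s, start). For serious sentence splitting a dedicated tokenizer would likely do better than this regular expression.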
--
Gabriel Genellina