Re: replace only full words

2013-09-28 Thread Tim Chase
On 2013-09-28 09:11, cerr wrote:
> I have a list of sentences and a list of words. Every full word
> that appears within sentence shall be extended by <WORD> i.e. "I
> drink in the house." Would become "I <drink> in the <house>." (and
> not "I <d<rink>> in the <house>.")

This is a good place to reach for regular expressions.  They come with
an "ensure there is a word boundary here" token (\b), so you can do
something like the code at the (way) bottom of this email.  I've
pushed it off the bottom in case you want to try regexps on your own
first.  Or, if this is homework, to at least make you work a
*little* :-)

> Also, is there a way to make it faster?

The code below should do the processing in roughly O(n) time, as it
only makes one pass through the data and does O(1) lookups into your
set of nouns.  I included code in the regexp to roughly find
contractions and hyphenated words.  Your original code grows slower
as your list of nouns grows bigger, and it also suffers from
multiple-replacement issues: if you have the noun list ["drink",
"rink"], you'll get results that you likely don't want.
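
To make the multiple-replacement issue concrete, here is a minimal
sketch of a str.replace() loop run with that noun list.  It is a
simplified version of the original attempt (the "if" guard is left
out), it assumes matched words are wrapped in angle brackets, and the
name naive_mark and the sample sentence are made up for illustration.

```python
def naive_mark(sentence, nouns):
    # Replace each noun in turn, with no notion of word boundaries.
    # Later nouns can match inside text an earlier pass already marked.
    for noun in nouns:
        sentence = sentence.replace(noun, "<" + noun + ">")
    return sentence

result = naive_mark("I drink in the rink.", ["drink", "rink"])
print(result)  # I <d<rink>> in the <rink>.
```

The second pass finds "rink" inside the already-marked "<drink>",
which is where the nested brackets come from.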

My code hasn't considered case differences, but you should be able to
normalize both the list of nouns and the word you're testing in the
modify() function so that it would find "Drink" as well as "drink".

Also, note that some words serve both as nouns and other parts of
speech, e.g. "It's kind of you to house me for the weekend and drink
tea with me."

-tkc

































import re

r = re.compile(r"""
  \b      # assert a word boundary
  \w+     # 1+ word characters
  (?:     # a group
    [-']  # a dash or apostrophe
    \w+   # followed by 1+ word characters
  )?      # make the group optional (0 or 1 instances)
  \b      # assert a word boundary here
  """, re.VERBOSE)

nouns = set([
  "drink",
  "house",
  ])

def modify(matchobj):
  # called once per matched word; returns its replacement text
  word = matchobj.group(0)
  if word in nouns:
    return "<%s>" % word
  else:
    return word

print(r.sub(modify, "I drink in the house"))
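
The case normalization mentioned earlier could look roughly like the
sketch below.  It is one possible reading, not the code from this
message: the names WORD_RE, NOUNS and modify_ignoring_case are made up
for illustration, the pattern is a compact form of the verbose one,
and the angle-bracket marker is assumed.

```python
import re

# Compact word pattern: a word, optionally followed by -word or 'word.
WORD_RE = re.compile(r"\b\w+(?:[-']\w+)?\b")

# Store the nouns lower-cased so one lower() per candidate word is enough.
NOUNS = {noun.lower() for noun in ["Drink", "house"]}

def modify_ignoring_case(matchobj):
    word = matchobj.group(0)
    if word.lower() in NOUNS:
        # Keep the original capitalization in the output.
        return "<%s>" % word
    return word

result = WORD_RE.sub(modify_ignoring_case, "Drink up, then leave the House")
print(result)  # <Drink> up, then leave the <House>
```

Only the comparison is lower-cased, so the sentence itself keeps its
original spelling.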
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: replace only full words

2013-09-28 Thread MRAB

On 28/09/2013 17:11, cerr wrote:

> Hi,
>
> I have a list of sentences and a list of words. Every full word that
> appears within sentence shall be extended by <WORD> i.e. "I drink in
> the house." Would become "I <drink> in the <house>." (and not
> "I <d<rink>> in the <house>.") I have attempted it like this:
>
>   for sentence in sentences:
>     for noun in nouns:
>       if " "+noun+" " in sentence or " "+noun+"?" in sentence or \
>          " "+noun+"!" in sentence or " "+noun+"." in sentence:
>         sentence = sentence.replace(noun, '<' + noun + '>')
>
>     print(sentence)
>
> but what if The word is in the beginning of a sentence and I also don't
> like the approach using defined word terminations. Also, is there a way
> to make it faster?


It sounds like a regex problem to me:

import re

nouns = ["drink", "house"]

pattern = re.compile(r"\b(" + "|".join(nouns) + r")\b")

for sentence in sentences:
    sentence = pattern.sub(r"<\g<0>>", sentence)
    print(sentence)
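
A self-contained variant of the same idea, as a sketch only: it adds
re.escape() so that a noun containing regex metacharacters is matched
literally, and it assumes the angle-bracket marker.  The name
build_pattern and the sample sentence are made up for illustration.

```python
import re

def build_pattern(nouns):
    # Escape each noun so characters such as "." or "+" are matched literally.
    alternatives = "|".join(re.escape(noun) for noun in nouns)
    # \b on both sides restricts matches to whole words.
    return re.compile(r"\b(?:" + alternatives + r")\b")

pattern = build_pattern(["drink", "rink", "house"])
result = pattern.sub(r"<\g<0>>", "I drink in the rink, not the house.")
print(result)  # I <drink> in the <rink>, not the <house>.
```

Because of the word boundaries, "rink" does not match inside "drink",
so the nested-bracket problem does not arise.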



Re: replace only full words

2013-09-28 Thread Jussi Piitulainen
MRAB writes:

> On 28/09/2013 17:11, cerr wrote:
> > Hi,
> >
> > I have a list of sentences and a list of words. Every full word
> > that appears within sentence shall be extended by <WORD> i.e. "I
> > drink in the house." Would become "I <drink> in the <house>." (and
> > not "I <d<rink>> in the <house>.") I have attempted it like this:
> >
> >   for sentence in sentences:
> >     for noun in nouns:
> >       if " "+noun+" " in sentence or " "+noun+"?" in sentence or \
> >          " "+noun+"!" in sentence or " "+noun+"." in sentence:
> >         sentence = sentence.replace(noun, '<' + noun + '>')
> >
> >     print(sentence)
> >
> > but what if The word is in the beginning of a sentence and I also
> > don't like the approach using defined word terminations. Also, is
> > there a way to make it faster?
>
> It sounds like a regex problem to me:
>
> import re
>
> nouns = ["drink", "house"]
>
> pattern = re.compile(r"\b(" + "|".join(nouns) + r")\b")
>
> for sentence in sentences:
>     sentence = pattern.sub(r"<\g<0>>", sentence)
>     print(sentence)

Maybe tokenize by a regex and then join the replacements of all
tokens:

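import re

nouns = {'drink', 'house'}  # example word list, as used elsewhere in the thread

def isfullword(word):
    # helper not defined in the original message; assumed to be a set lookup
    return word in nouns

def substitute(token):
    # mark the token if its lower-cased form is one of the words
    if isfullword(token.lower()):
        return '<{}>'.format(token)
    else:
        return token

def tokenize(sentence):
    # the capturing group keeps the separators as tokens of their own
    return re.split(r'(\W)', sentence)

sentence = 'This is, like, a test.'

tokens = map(substitute, tokenize(sentence))
sentence = ''.join(tokens)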

For better results, both tokenization and substitution need to depend
on context. Doing some of that should be an interesting exercise.
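
As a small illustration of why the split-and-join approach is safe,
the sketch below shows what the tokenizer produces for the sample
sentence.  It uses only re.split() with the same pattern; nothing
else is assumed.

```python
import re

sentence = 'This is, like, a test.'

# The capturing group makes re.split() keep the separators in the result.
tokens = re.split(r'(\W)', sentence)
print(tokens)

# Nothing was dropped, so joining the tokens restores the input exactly.
assert ''.join(tokens) == sentence
```

Two adjacent separators, such as the comma and space after "is",
leave an empty string between them in the token list; joining is
unaffected by those.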


Re: replace only full words

2013-09-28 Thread cerr
On Saturday, September 28, 2013 4:54:35 PM UTC, Tim Chase wrote:
> [snip]

Great, only I don't have the re module on my system :(


Re: replace only full words

2013-09-28 Thread MRAB

On 28/09/2013 18:43, cerr wrote:
[snip]

> Great, only I don't have the re module on my system :(


Really? It's part of Python's standard distribution.



Re: replace only full words

2013-09-28 Thread Tim Chase
[mercy, you could have trimmed down that reply]

On 2013-09-28 10:43, cerr wrote:
> On Saturday, September 28, 2013 4:54:35 PM UTC, Tim Chase wrote:
> > import re
>
> Great, only I don't have the re module on my system :(

Um, it's a standard Python library.  You sure about that?

  http://docs.python.org/2/library/re.html

-tkc





Re: replace only full words

2013-09-28 Thread cerr
On Saturday, September 28, 2013 11:07:11 AM UTC-7, MRAB wrote:
> On 28/09/2013 18:43, cerr wrote:
> [snip]
> > Great, only I don't have the re module on my system :(
>
> Really? It's part of Python's standard distribution.

Oh no, sorry, misinformation, I DO have module re available!!! All good!


Re: replace only full words

2013-09-28 Thread cerr
On Saturday, September 28, 2013 11:17:19 AM UTC-7, Tim Chase wrote:
> [mercy, you could have trimmed down that reply]
>
> On 2013-09-28 10:43, cerr wrote:
> > On Saturday, September 28, 2013 4:54:35 PM UTC, Tim Chase wrote:
> > > import re
> >
> > Great, only I don't have the re module on my system :(
>
> Um, it's a standard Python library.  You sure about that?
>
>   http://docs.python.org/2/library/re.html

Oh no, sorry, misinformation, I DO have module re available!!! All good!