On Sunday 08 January 2017 15:33, CM wrote:

> On Saturday, January 7, 2017 at 7:59:01 PM UTC-5, Steve D'Aprano wrote:
[...]
>> Start by printing repr(candidate_text) and see what you really have.
>
> Yes, that did it. The repr of that one was, in fact:
>
> u'match /r'

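
To illustrate the repr() suggestion, here is a minimal sketch. The sample string is invented for the example, on the assumption that the scraped text is the word "match" followed by a space and a carriage return; it is not the actual data from the docx file.

```python
# Hypothetical sample, not the real scraped text: the word "match",
# a stray space, then a carriage return.
candidate_text = u"match \r"

# print() alone hides the problem, because the space and the
# carriage return are invisible on screen.
print(candidate_text)

# repr() writes the carriage return as the escape sequence \r
# (backslash, not forward slash), so the trailing characters show up.
# Python 2 also prefixes the result with u.
print(repr(candidate_text))

# Stripping trailing whitespace before comparing is one possible
# workaround; rstrip() with no arguments removes both the space and \r.
cleaned = candidate_text.rstrip()
print(cleaned == u"match")
```

Whether stripping is the right fix depends on where the extra characters come from, which only the scraping code can tell you.
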
Are you sure it is a forward-slash /r rather than backslash \r?

> Thanks, that did it. Do you happen to know why that /r was appended to the
> unicode object in this case?

*shrug*

You're the only one that can answer that, because only you know where the text came from. The code you showed:

    candidate_text = Paragraph.Range.Text.encode('utf-8')

is a mystery, because we don't know what Paragraph is or where it comes from, or what .Range.Text does. You mention "scraping a Word docx", but we have no idea how you're scraping it. If I had to guess, I'd guess:

- you actually mean \r rather than /r;

- paragraphs in Word docs always end with a carriage return \r;

- and whoever typed the paragraph accidentally hit the spacebar after typing the word "match".

But it's just a guess. For all I know, the software you are using to scrape the docx file inserts space/r after everything.


-- 
Steven
"Ever since I learned about confirmation bias, I've been seeing it everywhere." - Jon Ronson