This is probably very simple, but I get confused when it comes to encoding and 
am generally rusty. (What follows is in Python 2.7; I know.)

I'm scraping a Word .docx file using win32com and am trying to write some 
matching rules to find certain paragraphs whose text, for testing purposes, 
equals the word 'match', which I know exists as its own "paragraph" in the 
target document. First, this is at the top of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Then this is the relevant code:

candidate_text = Paragraph.Range.Text.encode('utf-8')
print 'This is candidate_text:', candidate_text
print type(candidate_text)   
print type('match')
print candidate_text == 'match'
if candidate_text == 'match':
    pass  # do something...

And that section produces this:

This is candidate_text: match
<type 'str'>
<type 'str'>
False

and, of course, doesn't enter that "do something" block, since apparently 
candidate_text != 'match'...even though the two look identical when printed.

So what's going on here? Why isn't a string with the content 'match' equal to 
another string with the content 'match'?

I've also tried removing the .encode call and the coding declaration at the 
very top, but then candidate_text is a unicode object that I also can't get 
to match anything.
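
One check that might narrow this down is printing repr() of the text rather 
than the text itself, since repr() shows characters that print hides. The 
sketch below is only an illustration under an assumption: that the paragraph 
text carries an invisible trailing character such as the '\r' paragraph mark 
that Word paragraph ranges normally end with. The name raw_text is a 
hypothetical stand-in for Paragraph.Range.Text, and normalize is a helper 
made up for this example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical stand-in for Paragraph.Range.Text. The trailing '\r' is an
# assumption about what Word returns, not something visible in my output.
raw_text = u'match\r'


def normalize(text):
    """Strip leading and trailing whitespace, which includes '\\r'."""
    return text.strip()


# repr() exposes characters that a plain print hides.
print(repr(raw_text))

# The raw text differs from u'match' because of the extra character.
print(raw_text == u'match')

# After stripping, the comparison succeeds.
print(normalize(raw_text) == u'match')
```

If repr() on the real text shows something other than a trailing '\r', the 
strip() call would need adjusting to whatever characters actually appear.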

What am I doing wrong? How should I approach this? Thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list
