On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:

def highlight( self, searchText, searchResultFilenames ):
    for filename in searchResultFilenames:
        # Find text directory from documents directory and convert
        # network fileshare to local mount
        textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
        textFile = textFile.replace("\\", "/")
        textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")
        print "<br>", searchText, "<br>", textFile
        if os.path.isfile( textFile ):
            filen = open( textFile, 'r' )
            textString = filen.read()
            filen.close()
            term = Term( "field", searchText )
            termQuery = TermQuery( term )
            scorer = QueryScorer( termQuery )
            highlighter = Highlighter( scorer )
            simpAn = SimpleAnalyzer()
            # PROBLEM IS HERE!!!!
            reader = PyLucene.StringReader( textString )
            tokenStream = simpAn.tokenStream("field", reader )
            print highlighter.getBestFragment( tokenStream, textString )

At first glance, it doesn't look like 'textString' is going to be of type
'unicode' in the above code sample. What comes out of a Python file's read
method is an object of type 'str'. I believe PyLucene will try to convert
the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If your 'str'
is not 'utf-8' encoded, that conversion is going to fail.
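
One way to rule this out is to decode the file contents yourself before handing them to PyLucene, so nothing has to guess the encoding. The following is only a minimal sketch: the helper name 'read_text' is made up for illustration, and the fallback to 'latin-1' is an assumption about your data, so substitute whatever encoding your text files actually use.

```python
import io


def read_text(path, encoding="utf-8", fallback="latin-1"):
    """Read a file and return decoded text instead of raw bytes.

    Hypothetical helper: the 'latin-1' fallback is an assumption,
    not something known about the original files.
    """
    # Read raw bytes so that decoding is an explicit, visible step.
    with io.open(path, "rb") as handle:
        raw = handle.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Not valid UTF-8: decode with the assumed fallback encoding.
        # 'latin-1' maps every byte to a character, so this cannot fail,
        # but it may produce wrong characters if the guess is wrong.
        return raw.decode(fallback)
```

In the sample above, this would stand in for the open/read/close lines, for example textString = read_text( textFile ), so that 'textString' is already decoded text when it reaches the StringReader.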
If you send in a runnable piece of code (with the required data) that
reproduces the problem you're experiencing, I might be able to help you
better.
Andi..