On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:

   def highlight( self, searchText, searchResultFilenames ):
       for filename in searchResultFilenames:
           # Find text directory from documents directory and convert network fileshare to local mount
           textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
           textFile = textFile.replace("\\", "/")
           textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")

           print "<br>", searchText, "<br>", textFile
           if os.path.isfile( textFile ):
               filen = open( textFile, 'r' )
               textString = filen.read()
               filen.close()
               term = Term( "field", searchText )
               termQuery = TermQuery( term )
               scorer = QueryScorer( termQuery )
               highlighter = Highlighter( scorer )
               simpAn = SimpleAnalyzer()
               # PROBLEM IS HERE!!!!
               reader = PyLucene.StringReader( textString )
               tokenStream = simpAn.tokenStream("field", reader )
               print highlighter.getBestFragment( tokenStream, textString )


At first glance, it doesn't look like 'textString' is going to be of type 'unicode' in the above code sample. What comes out of a Python file's read method is an object of type 'str'. I believe PyLucene will try to convert the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If your 'str' is not 'utf-8' encoded, that conversion is going to fail.
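
To illustrate the point, here is a minimal sketch of decoding the file contents explicitly before they reach PyLucene. It assumes, purely for the sake of example, that your text files are 'latin-1' encoded; I don't know what encoding they actually use. The helper names 'read_unicode' and 'decode_or_none' are made up for this example and are not part of PyLucene.

```python
import io

def read_unicode(path, encoding):
    # Hypothetical helper: open the file with an explicit encoding so that
    # read() returns unicode text rather than raw bytes.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()

def decode_or_none(raw, encoding):
    # Hypothetical helper: try to decode raw bytes; return None when the
    # bytes are not valid in the given encoding.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# The byte 0xe9 is 'e' with an acute accent in latin-1, but on its own it is
# not a valid utf-8 sequence.
raw = b"caf\xe9"
as_utf8 = decode_or_none(raw, "utf-8")      # decoding as utf-8 fails
as_latin1 = decode_or_none(raw, "latin-1")  # decoding as latin-1 succeeds
```

If something like this applies to your data, passing the decoded text to the StringReader and to getBestFragment, instead of the raw 'str', would be the thing to try.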

If you send in a piece of code that runs (with the required data) that reproduces the problem you're experiencing, I might be able to help you better.

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
