Re: [pylucene-dev] Problems with StringReader()

Andi Vajda Tue, 28 Nov 2006 09:05:47 -0800


On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:

   def highlight( self, searchText, searchResultFilenames ):
       for filename in searchResultFilenames:
           # Find text directory from documents directory and convert
           # network fileshare to local mount
           textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
           textFile = textFile.replace("\\", "/")
           textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")

           print "<br>", searchText, "<br>", textFile
           if os.path.isfile( textFile ):
               filen = open( textFile, 'r' )
               textString = filen.read()
               filen.close()
               term = Term( "field", searchText )
               termQuery = TermQuery( term )
               scorer = QueryScorer( termQuery )
               highlighter = Highlighter( scorer )
               simpAn = SimpleAnalyzer()
               # PROBLEM IS HERE!!!!
               reader = PyLucene.StringReader( textString )
               tokenStream = simpAn.tokenStream("field", reader )
               print highlighter.getBestFragment( tokenStream, textString )

At first quick glance, it doesn't look like 'textString' is going to be of type 'unicode' in the above code sample. What comes out of a Python file's read method is an object of type 'str'. I believe PyLucene will try to convert the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If your 'str' is not 'utf-8' encoded then that is going to fail.
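
One way to rule this out is to decode the file contents explicitly before handing them to StringReader. What follows is only a minimal sketch: the helper name read_text_as_unicode is hypothetical, and the 'latin-1' fallback is an assumption, since I don't know what encoding your text files actually use.

```python
import io


def read_text_as_unicode(path, encoding="utf-8", fallback="latin-1"):
    """Read a file as raw bytes and decode it explicitly.

    The helper name and the fallback encoding are illustrative
    assumptions, not part of PyLucene.
    """
    with io.open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot raise,
        # but the characters will be wrong if the real encoding differs.
        return raw.decode(fallback)


# Intended use in the highlight() method above (requires PyLucene):
#   textString = read_text_as_unicode(textFile)
#   reader = PyLucene.StringReader(textString)
```

If the encoding of the files is known, passing it as 'encoding' and dropping the fallback is safer, because a wrong guess then fails loudly instead of producing garbled text.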

If you send in a piece of code that runs (with the required data) and reproduces the problem you're experiencing, I might be able to help you better.


Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
