Re: highlight using a MemoryIndex

Wolfgang Hoschek Thu, 28 Sep 2006 10:05:18 -0700

Document.get(FIELD_NAME) will always be null because MemoryIndex does not store the original untokenized full text(s). Those full texts are thrown away immediately after tokenization (i.e. on addField()), keeping only the field names and associated (indexed) tokenized terms. The latter are what Lucene queries select on.

The underlying rationale is that a user can easily keep the full text(s) if desired, given that a MemoryIndex is intended for a single temporary document at a time.
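
A toy sketch of the point above, using only the JDK so it runs without Lucene on the classpath. TermOnlyIndex, KeepTextDemo and the "contents" field name are hypothetical names made up for illustration, not Lucene classes, and the regex wrapping merely stands in for the real Highlighter. It shows an index that keeps terms but no text, and a caller that therefore works from its own copy of the text.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical toy class, NOT Lucene's MemoryIndex: it only mimics the
// "keep the terms, drop the text" behaviour described above.
class TermOnlyIndex {
    private final Map<String, Set<String>> termsByField = new HashMap<>();

    // Tokenize on non-alphanumeric characters; the text itself is not retained.
    void addField(String field, String text) {
        Set<String> terms = termsByField.computeIfAbsent(field, f -> new HashSet<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    // True if the field contains the given single term (case-insensitive).
    boolean matches(String field, String term) {
        Set<String> terms = termsByField.get(field);
        return terms != null && terms.contains(term.toLowerCase(Locale.ROOT));
    }

    // Nothing was stored, so there is never any text to hand back.
    String get(String field) {
        return null;
    }
}

class KeepTextDemo {
    static final String FIELD_NAME = "contents"; // assumed field name

    // Wraps whole-word occurrences of term in <B> tags, reading the text from
    // the caller's own variable rather than from the index.
    static String findAndWrap(String term, String value) {
        TermOnlyIndex index = new TermOnlyIndex();
        index.addField(FIELD_NAME, value);
        if (!index.matches(FIELD_NAME, term)) {
            return value; // no match: return the text unchanged
        }
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return wholeWord.matcher(value).replaceAll("<B>$0</B>");
    }

    public static void main(String[] args) {
        // prints: dated this is also a <B>test</B> sentence
        System.out.println(findAndWrap("test", "dated this is also a test sentence"));
    }
}
```

In the quoted code the equivalent change would presumably be to hand the value argument, which the method already holds, to the analyzer and highlighter in place of the result of hits.doc(i).get(FIELD_NAME).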


Wolfgang.

I was able to get the following code to work using a RAMDirectory, but after reading the description of the MemoryIndex, I wanted to try to use it instead for speed reasons. I removed the RAMDir code and replaced the references with the MemoryIndex, and all seems to go well until I start stepping through the hits. 1 hit is found, but the String returned from the "hits.doc(i).get(FIELD_NAME);" call is always null. Any ideas where I went wrong?


Thank you,

Dan Williams


    public String findAndWrap( String queryString, String value)

    {

//        System.out.println( "Methods output = [" + Methods.containsBoolean(queryString, value) + "]");

        String returnString = null;



        try {

            // store the string to search

            MemoryIndex index = new MemoryIndex();

            index.addField(FIELD_NAME, value, new StandardAnalyzer());


            /* RAMDir method

            RAMDirectory ramDir = new RAMDirectory();

            IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);

            addDoc(writer, value);


            writer.optimize();

            writer.close();

            IndexReader reader = IndexReader.open(ramDir);

            */


            // setup the queryString

            QueryParser parser = new QueryParser(FIELD_NAME, new StandardAnalyzer());


            Query unReWrittenQuery = parser.parse(queryString);

            System.out.println("UnRewritten Query: [" + unReWrittenQuery.toString(FIELD_NAME) + "]");

            System.out.println("score = " + index.search(unReWrittenQuery));



            IndexSearcher searcher = index.createSearcher();

/*          IndexSearcher searcher = new IndexSearcher(ramDir);*/

            System.out.println("Searcher = " + searcher.toString());

            IndexReader reader = searcher.getIndexReader();



            Query query = unReWrittenQuery.rewrite(reader);

            System.out.println("Searching for: " + query.toString(FIELD_NAME));


            Hits hits = searcher.search(query);

            System.out.println("Number of Hits = [" + hits.length() + "]");



            if( hits.length() > 0)

            {

                Highlighter highlighter = new Highlighter(this, new QueryScorer(query));

                highlighter.setTextFragmenter(new NullFragmenter());

                for (int i = 0; i < hits.length(); i++)

                {

                    String text = hits.doc(i).get(FIELD_NAME);

                    int maxNumFragmentsRequired = 2;

                    String fragmentSeparator = "...";

                    StandardAnalyzer analyzer2 = new StandardAnalyzer();

                    TokenStream tokenStream = analyzer2.tokenStream(FIELD_NAME, new StringReader(text));


                    returnString =

                        highlighter.getBestFragments(

                                tokenStream,

                                text,

                                maxNumFragmentsRequired,

                                fragmentSeparator);

                }

            }

            else

            {

                System.out.println("No Matches found to highlight.");

                returnString = value;

            }

        } catch (ParseException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        } catch (IOException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        }


        return returnString;

    }


    public static void main(String[] args) {


        StringBuffer contents = new StringBuffer();


        contents.append("bob\n");

        contents.append("\n");

        contents.append("bobberson\n");

        contents.append("\n");

        contents.append("test\n");

        contents.append("\n");

        contents.append("\n");

        contents.append("\n");

        contents.append("\n");

        contents.append("dated this is also a test sentence\n");

        contents.append("\n");

        contents.append("Dated:7/24/2006\n");

        contents.append("\n");

        contents.append("#Dated: 7/24/2006\n");

        contents.append("\n");

        contents.append("asdf asdfas dfasdfDateda sdfasdfa sdfasdfasdf: 7/24/2006\n");



        String queryString1 = "(+test +Dated*)";


        String queryString2 = "sdfasd and !ted";

        String queryString3 = "test";


        modularHtmlHighlighter hl = new modularHtmlHighlighter();


        System.out.println("string 1");

        System.out.println( hl.findAndWrap(queryString1, contents.toString()) );


        System.out.println("string 2");

        System.out.println( hl.findAndWrap(queryString2, contents.toString()) );


        System.out.println("string 3");

        System.out.println( hl.findAndWrap(queryString3, contents.toString()) );

    }

