The TermFreqVector approach works perfectly for normal query strings. But if I
append a wildcard (*) to a word to match its different forms, I get an
ArrayIndexOutOfBoundsException because the index is -1. Why does this
happen? And is there any way to avoid it?
Thanks,
James
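
A likely explanation, assuming the frequency is read with something like
freqs[tfv.indexOf(term.text())]: the text "run*" is a pattern, not a term that
was ever indexed, and TermFreqVector.indexOf is documented to return -1 for a
term that is not in the vector. Indexing an array with -1 then throws. The
sketch below reproduces this in plain Java, without Lucene, and shows one way
around it: match the pattern against the stored terms and sum their
frequencies. The class and method names are hypothetical, and the sorted-array
lookup is only a stand-in for the real term vector.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Plain-Java stand-in for a term vector: sorted terms plus parallel frequencies.
// This is NOT the Lucene API; every name here is hypothetical.
class WildcardFrequencySketch {

    // Exact lookup: position of the term, or -1 when it is absent.
    static int indexOf(String[] sortedTerms, String term) {
        int pos = Arrays.binarySearch(sortedTerms, term);
        return pos >= 0 ? pos : -1;
    }

    // Frequency of one literal term; returns 0 rather than reading freqs[-1].
    static int frequency(String[] sortedTerms, int[] freqs, String term) {
        int pos = indexOf(sortedTerms, term);
        return pos < 0 ? 0 : freqs[pos];
    }

    // Sums the frequencies of every stored term matching the pattern,
    // where '*' matches any run of characters and '?' matches one character.
    static int wildcardFrequency(String[] sortedTerms, int[] freqs, String pattern) {
        Pattern regex = toRegex(pattern);
        int total = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            if (regex.matcher(sortedTerms[i]).matches()) {
                total += freqs[i];
            }
        }
        return total;
    }

    // Translates the wildcard pattern into a regex, quoting literal characters.
    static Pattern toRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        String[] terms = {"run", "runner", "running", "walk"};
        int[] freqs = {3, 1, 2, 5};
        System.out.println(indexOf(terms, "run*"));                  // -1
        System.out.println(frequency(terms, freqs, "run*"));         // 0
        System.out.println(wildcardFrequency(terms, freqs, "run*")); // 6
    }
}
```

In Lucene itself, two things may be worth trying: check for -1 before indexing
into the frequency array, and rewrite the parsed query against the reader with
Query.rewrite(IndexReader) before extracting its terms, which should expand the
wildcard into the concrete terms that exist in the index.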
jnance wrote:
>
> Yes, the term frequency vector is exactly what I needed. Thanks!
>
> -James
>
>
> Ajay Lakhani wrote:
>>
>> Hi James,
>>
>> Try this:
>>
>> Searcher searcher = new IndexSearcher(dir);
>> QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>> Query query = parser.parse(queryString);
>>
>> // collect the terms that make up the query
>> HashSet queryTerms = new HashSet();
>> query.extractTerms(queryTerms);
>>
>> Hits hits = searcher.search(query);
>>
>> IndexReader reader = IndexReader.open(dir);
>>
>> for (int i = 0; i < hits.length(); i++) {
>>     Document d = hits.doc(i);
>>     Field fid = d.getField("id");
>>     Field ftitle = d.getField("title");
>>     System.out.println("id is " + fid.stringValue());
>>     System.out.println("title is " + ftitle.stringValue());
>>
>>     TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), "content");
>>     String[] terms = tfv.getTerms();
>>     int[] freqs = tfv.getTermFrequencies(); // get the frequencies
>>
>>     // for each term in the query
>>     for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
>>         Term term = (Term) iter.next();
>>
>>         // for each term in the vector
>>         for (int j = 0; j < terms.length; j++) {
>>             if (terms[j].equals(term.text())) {
>>                 System.out.println("frequency of term [" + term.text() + "] is " + freqs[j]);
>>             }
>>         }
>>     }
>> }
>>
>> Let me know if this helps.
>> Cheers
>> AJ
>>
>> 2008/7/10 Karl Wettin <[EMAIL PROTECTED]>:
>>
>>> Maybe you are looking for the document TermFreqVector?
>>>
>>>
>>> karl
>>>
>>> On 9 Jul 2008, at 15:49, jnance wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I am indexing lots of text files and need to see how many times a certain
>>>> word comes up in each text file. Right now I have this method for
>>>> "search":
>>>>
>>>> static void search(Searcher searcher, String queryString)
>>>>         throws ParseException, IOException {
>>>>     QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>>>>     Query query = parser.parse(queryString);
>>>>     Hits hits = searcher.search(query);
>>>>
>>>>     int hitCount = hits.length();
>>>>     if (hitCount == 0) {
>>>>         System.out.println("0 documents contain the word \"" + queryString + ".\"");
>>>>     }
>>>>     else {
>>>>         System.out.println(hitCount + " documents contain the word \"" + queryString + ".\"");
>>>>     }
>>>> }
>>>>
>>>> This tells me how many documents contain the word I'm looking for, but how
>>>> do I get it to tell me how many times the word occurs within that
>>>> document?
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18362075.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>>
>>>
>>
>>
>
>
--
View this message in context:
http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18403878.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.