To follow up on my post from Thursday: I have written a very basic test for TermPositions. The test shows that only the first 10001 tokens are considered when determining term frequency (i.e., when the search term occurs at a position greater than 10001, my test fails).
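
To make the symptom concrete, here is a minimal plain-Java simulation of what a per-field token cap at indexing time would do to a term count. It uses no Lucene classes; `FieldLimitSketch`, `freqWithinLimit` and `buildDoc` are hypothetical names, and the cap of 10000 is an assumption chosen to sit near the threshold the test found. Lucene's IndexWriter does have a maxFieldLength setting, but whether that setting is the cause here is an assumption, not something the test establishes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation only; no Lucene classes are involved.
class FieldLimitSketch {

    // Counts occurrences of term among the first maxTokens tokens only,
    // mimicking an index that never sees tokens beyond the cap.
    static int freqWithinLimit(List<String> tokens, String term, int maxTokens) {
        int limit = Math.min(tokens.size(), maxTokens);
        int freq = 0;
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                freq++;
            }
        }
        return freq;
    }

    // Builds a document of `length` filler tokens with `term` placed at the
    // given zero-based positions.
    static List<String> buildDoc(int length, String term, int... positions) {
        List<String> tokens = new ArrayList<>(length);
        for (int i = 0; i < length; i++) {
            tokens.add("filler");
        }
        for (int p : positions) {
            tokens.set(p, term);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> doc = buildDoc(20000, "needle", 5, 9999, 10000, 15000);
        // Assumed cap of 10000 tokens: only positions 5 and 9999 are counted.
        System.out.println(freqWithinLimit(doc, "needle", 10000));      // prints 2
        // No cap: all four occurrences are counted.
        System.out.println(freqWithinLimit(doc, "needle", doc.size())); // prints 4
    }
}
```

A capped count that is lower than the true count, with no obvious pattern across documents, matches the (frequency, actual) pairs reported in the original post, since the shortfall depends only on how many occurrences happen to fall past the cap.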
Is this by design? Is there an obvious work-around so that the frequency that I receive is correct for my document?

Thank you for your consideration,
Tricia

On Thu, 29 Sep 2005, Tricia Williams wrote:

> I am finding that the TermDocs.freq() method is returning an incorrect value.
> I was wondering if anyone else had experienced this problem.
>
> I am using tp = IndexReader.termPositions( queryTerm ) to return an object
> which implements TermPositions. I then use tp.skipTo( docid ) to go
> directly to the document from which I wish to retrieve term positions. The
> following for loop adds the positions to my ArrayList which I use later:
>
> for( int pos = tp.nextPosition(), k = 0;
>      k < tp.freq();
>      pos = tp.nextPosition(), k++ )
> {
>     positionMatches.add( new Integer( pos ) );
> }
>
> In a document which I know has 48 references to the term, a frequency of
> 23 is returned. There doesn't seem to be a pattern to this as some other
> documents have (frequency, actual): (25, 48), (36, 43), (30, 149).
>
> These frequencies are from results within my code and confirmed in Luke,
> so I'm pretty certain that this isn't an error on my part.
>
> I've been trying to find out where the origin of this issue is without
> luck thus far. Any help or advice would be appreciated.
>
> Thanks,
> Tricia
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
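
A side note on the quoted loop: as written it calls nextPosition() once in the initializer and once more after every iteration, so it makes freq() + 1 calls in total, while the TermPositions documentation treats calling nextPosition() more than freq() times as an error. This probably does not explain the low frequencies, but a loop that reads exactly freq() positions is safer. The sketch below shows that shape against a hypothetical `PositionSource` interface that stands in for the two TermPositions methods used; it is not the Lucene interface, and `fake` is an illustrative test double.

```java
import java.util.ArrayList;
import java.util.List;

class PositionLoopSketch {

    // Hypothetical stand-in for the two TermPositions methods the loop uses.
    interface PositionSource {
        int freq();
        int nextPosition();
    }

    // Reads freq() once, then calls nextPosition() exactly that many times.
    static List<Integer> collectPositions(PositionSource tp) {
        int freq = tp.freq();
        List<Integer> positions = new ArrayList<>(freq);
        for (int k = 0; k < freq; k++) {
            positions.add(tp.nextPosition());
        }
        return positions;
    }

    // Array-backed fake that fails loudly if read past its last position,
    // which makes an extra nextPosition() call visible.
    static PositionSource fake(final int... stored) {
        return new PositionSource() {
            private int next = 0;

            public int freq() {
                return stored.length;
            }

            public int nextPosition() {
                if (next >= stored.length) {
                    throw new IllegalStateException("read past last position");
                }
                return stored[next++];
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(collectPositions(fake(3, 17, 42))); // prints [3, 17, 42]
    }
}
```

Run against the same fake, the quoted loop shape would throw on its final update step, because it asks for one position more than freq() reports.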