Hello Grant, I am currently storing the first term instance only because I just index each token for an article once. What I want to achieve is an index for versioned document collections like wikipedia (See this paper http://www.cis.poly.edu/suel/papers/archive.pdf).
In detail I create on the first level (Lucene) a document for one wikipedia article containing all distinct terms of its versions. On the second level (payloads) I store the frequency information corresponding to each article version and its terms. If I search now I can find an article by its term and through the term and its payload I receive informations about the other versions and how often a token occured (In my case with one term the payload pos is always 1!). So I look on the first level and pick only the information from the second level which I need. By this I can avoid storing informations several times because most wikipedia versions are very similar (in term context). This is working so far and I just want to reduce my index size but I don't know how much I can save by disabling term freqs/pos. I hope I could explain the problem a little bit. If not just tell me I try to explain it again. :) Best regards Alex PS: I am currently looking for a bedroom in New York, Brooklyn (Park Slope or near NYU Poly). Maybe somebody rents a room from 15 Feb until 15 April. :) Am Donnerstag, den 03.02.2011, 12:38 -0500 schrieb Grant Ingersoll: > Payloads only make sense in terms of specific positions in the index, so I > don't think there is a way to hack Lucene for it. You could, I suppose, just > store the payload for the first instance of the term. > > Also, what's the use case you are trying to solve here? Why store term > frequency as a payload when Lucene already does it (and it probably does it > more efficiently) > > -Grant > > On Feb 2, 2011, at 2:35 PM, Alex vB wrote: > > > > > Hello everybody, > > > > I am currently using Lucene 3.0.2 with payloads. I store extra information > > in the payloads about the term like frequencies and therefore I don't need > > frequencies and term positions stored normally by Lucene. I would like to > > set f.setOmitTermFreqAndPositions(true) but then I am not able to retrieve > > payloads. Would it be hard to "hack" Lucene for my requests? Anymore I only > > store one payload per term if that information makes it easier. > > > > Best regards > > Alex > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/Storing-payloads-without-term-position-and-frequency-tp2408094p2408094.html > > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org