I ran one of the :data fields through the StandardAnalyzer - the only one we have used - and it tokenized it with no complaints.
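For the curious, this is roughly the kind of check I did -- just a quick sketch, assuming an index at /path/to/index and doc 0 as one of the affected documents:

  require 'rubygems'
  require 'ferret'

  # Pull the stored :data value back out of the index and run it through
  # StandardAnalyzer by hand to see whether it produces any tokens.
  reader   = Ferret::Index::IndexReader.new('/path/to/index')
  doc      = reader[0]                      # any doc id showing the problem
  analyzer = Ferret::Analysis::StandardAnalyzer.new

  count  = 0
  stream = analyzer.token_stream(:data, doc[:data])
  while token = stream.next
    count += 1
  end
  puts "StandardAnalyzer produced #{count} tokens"
  reader.close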
Interestingly, the last batch of 1700 sites that we added incrementally to our index does not seem to suffer from this problem.

On 6/13/07, Jens Kraemer <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 13, 2007 at 08:58:36AM -0400, Richard Jones wrote:
> > According to my IndexReader's field_infos, all the fields are stored
> > and indexed, with :with_positions_offsets for the term_vectors.
> >
> > A look at a term vector for one of these :data fields gives:
> >
> > #<struct Ferret::Index::TermVector field=:data, terms=[], offsets=nil>
> >
> > Is this what they look like when you index with :index => :no?
>
> No, with :index => :no, no term vectors can be stored, and term_vector
> then returns nil, not an empty tv.
>
> The scenario you have could happen if your analyzer choked at indexing
> time and returned not a single term for your document (just as it would
> for a doc full of stop words).
>
> Since you have the stored contents, could you try to index that data
> again and see if the problem can be reproduced?
>
> Jens
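I'll try Jens' reproduction idea next. Roughly what I have in mind (paths and doc ids are placeholders, and it relies on Ferret's default field settings for term vectors):

  require 'rubygems'
  require 'ferret'

  # Copy the stored contents of a problem doc into a throw-away index and
  # see whether the :data term vector comes back empty again.
  src     = Ferret::Index::IndexReader.new('/path/to/index')
  scratch = Ferret::Index::Index.new(:path   => '/tmp/ferret_repro',
                                     :create => true)

  doc = src[0]                              # a doc whose tv looked empty
  scratch << { :data => doc[:data] }
  scratch.flush

  tv = scratch.reader.term_vector(0, :data)
  puts tv.nil? ? "no term vector stored" : "#{tv.terms.size} terms in vector"

  src.close
  scratch.close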

