If I'm not mistaken, that's TEI. I suggest you consult the TEI community for document-indexing strategies, since TEI has a lot of branching-style tags (such as the <choice> element in your example, which holds alternative readings of the same text). My guess is that you'll hear that it's best to perform some sort of term expansion on the document as a preprocessing step.
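A minimal sketch of what such a preprocessing step might look like, in Python and under several assumptions: the names `SAMPLE`, `local_name`, `flatten`, `plain_text` and `expanded_terms` are hypothetical, the `dbp:hand` attribute is left out of the sample because its namespace prefix is not declared in the fragment, and treating `<reg>` as the main reading with `<orig>` as the variant is only one possible choice. It is not an established TEI or Solr recipe. The idea is to resolve each `<choice>` to a single reading and to join text across inline tags without inserting whitespace, so that "c" and "astors" end up as one token.

```python
import re
import xml.etree.ElementTree as ET

# Simplified copy of the fragment from the original message (assumption:
# the undeclared dbp:hand attribute is dropped so the snippet parses alone).
SAMPLE = """<item type="fragment" n="3">
  <cit> si les <hi rend="underline">ruches d’<term>abeilles</term>
    </hi> prouvent la monarchie, les fourmillières, les troupes d’éléphants ou de <lb/>
    <choice>
      <orig>C</orig>
      <reg>c</reg>
    </choice>astors prouvent la république. <bibl xml:id="b-7468-3"/>
  </cit>
</item>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, reading):
    """Concatenate the text under elem without adding whitespace at tag edges.

    For each <choice>, only the child named `reading` ('reg' or 'orig') is
    kept; the whitespace between the alternatives is ignored.
    """
    parts = []
    if local_name(elem.tag) == "choice":
        for child in elem:
            if local_name(child.tag) == reading:
                parts.append(flatten(child, reading))
    else:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(flatten(child, reading))
            parts.append(child.tail or "")
    return "".join(parts)


def plain_text(xml_source, reading="reg"):
    """Return the whitespace-normalized text for one reading of the document."""
    root = ET.fromstring(xml_source)
    return re.sub(r"\s+", " ", flatten(root, reading)).strip()


def expanded_terms(xml_source):
    """Words that appear only in the <orig> reading, to index as extra terms."""
    reg_words = set(plain_text(xml_source, "reg").split())
    orig_words = set(plain_text(xml_source, "orig").split())
    return sorted(orig_words - reg_words)


if __name__ == "__main__":
    print(plain_text(SAMPLE))       # regularized reading, contains "castors"
    print(expanded_terms(SAMPLE))   # variant words from the <orig> reading
```

The regularized text could go into the main search field and the variant words into the same field or a second one, so that either spelling matches. Note that this only fixes tokenization: whether a query for "castor" matches the indexed "castors" still depends on the analyzer configured for the field.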
Michael Della Bitta

------------------------------------------------
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com

-----Original Message-----
From: Tigunn
Sent: Thursday, May 31, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Strip html

Hello,

I have a full-text index on XML files. Example:
---------------------------------------
<item type="fragment" n="3">
  <cit dbp:hand="GF-encre"> si les <hi rend="underline">ruches d’<term>abeilles</term>
    </hi> prouvent la monarchie, les fourmillières, les troupes d’éléphants ou de <lb/>
    <choice>
      <orig>C</orig>
      <reg>c</reg>
    </choice>astors prouvent la république. <bibl xml:id="b-7468-3"/>
  </cit>
</item>
---------------------------------------
I use Solr 1.4.1 for full-text search with PHP.
When I search for "castor", I can't find this one. But if I search for "c astor", it works: that is the problem!

I wrote an XSLT transformation which returns:
---------------------------------------
si les ruches d’abeilles prouvent la monarchie, les fourmillières, les troupes d’éléphants ou de castors prouvent la république.
---------------------------------------
I put this HTML in Solr:

$doc->addField('body_strip_html', $body_norm);

In schema.xml:

<fieldType name="text_strip_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

and

<field name="body_strip_html" type="text_strip_html" indexed="true" stored="true"/>

But this doesn't work! I want these XML files (see the example) to be returned when I search for "castor".
Can you help me, please?
Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html