Hi Marcelo,

Thanks for the reply. Yes, I want to ignore all the tags and store the text in one field. The tags are not known in advance, so the "XMLAnalyzer" seems to be the solution. As far as I can tell, though, Lucene itself does not provide an XMLAnalyzer. Do I have to write it myself?
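To make the manual route concrete, here is a minimal sketch of the "ignore everything inside '<>'" idea. It is not a Lucene API: the class name `TagStripper` and the method `stripTags` are hypothetical, and it assumes that dropping every character between '<' and '>' is acceptable for the input. It does not decode entities such as `&amp;`, and it will mishandle comments or CDATA sections that contain '>'.

```java
// Hypothetical helper, not part of Lucene: removes anything between '<' and '>'.
final class TagStripper {
    private TagStripper() {}

    static String stripTags(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                inTag = true;               // start skipping tag content
            } else if (c == '>' && inTag) {
                inTag = false;
                out.append(' ');            // keep words on either side of a tag apart
            } else if (!inTag) {
                out.append(c);              // ordinary text is copied through
            }
        }
        // Collapse the whitespace left behind by removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling `TagStripper.stripTags(...)` on a document before indexing would give a plain string that can be added to a single field and analyzed with whatever analyzer is already in use.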
Kalani

On Thu, Jul 24, 2008 at 6:10 PM, Marcelo Schneider <[EMAIL PROTECTED]> wrote:

> Do you just want to ignore them and store all in one field? If you know the
> used tags previously, I guess you could set up a stop words list with them.
> If not, you could do an "XMLAnalyzer" that simply ignores everything inside
> '<>'...
>
> If you want to split the xml content in separate fields, you have to parse
> it before indexing, take a look at this article:
> http://www.ibm.com/developerworks/library/j-lucene/
>
> I'm a little bit new to Lucene, so I might be missing something here, but I
> wouldn't expect it to have an API for this...
>
> Kalani Ruwanpathirana wrote:
>
>> Hi all,
>>
>> I am searching for a way to ignore XML tags in the input when indexing. Is
>> there a built in functionality in Lucene to get this done?
>> I am sorry if this was discussed before. I searched but couldn't find a
>> clear solution.
>>
>> Thanks in advance
>> Kalani
>
> --
> Marcelo Frantz Schneider
> SIC - TCO - Tecnologia em Engenharia do Conhecimento
> DÍGITRO TECNOLOGIA
> E-mail: [EMAIL PROTECTED]
> Site: www.digitro.com

--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa