Re: Ignoring XML tags when Indexing

Marcelo Schneider Thu, 24 Jul 2008 05:41:53 -0700

Do you just want to ignore them and store all in one field? If you knowthe used tags previously, I guess you could set up a stop words listwith them. If not, you could do an "XMLAnalyzer" that simply ignoreseverything inside '<>'...

If you want to split the xml content in separate fields, you have toparse it before indexing, take a look at this article:http://www.ibm.com/developerworks/library/j-lucene/

I'm a little bit new to Lucene, so I might be missing something here,but I wouldn't expect it to have an API for this...



Kalani Ruwanpathirana escreveu:

Hi all,

I am searching for a way to ignore XML tags in the input when indexing. Is
there a built in functionality in Lucene to get this done?
I am sorry if this was discussed before. I searched but couldn't find a
clear solution.

Thanks in advance
Kalani


--


*Marcelo Frantz Schneider*
/SIC - TCO - Tecnologia em Engenharia do Conhecimento/

*DÍGITRO TECNOLOGIA*

*E-mail:* [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>

***Site:* www.digitro.com <http://www.digitro.com>

--
Esta mensagem foi verificada pelo sistema de antivírus da Dígitro e
acredita-se estar livre de perigo.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Ignoring XML tags when Indexing

Reply via email to