Do you just want to ignore them and store all in one field? If you know
the used tags previously, I guess you could set up a stop words list
with them. If not, you could do an "XMLAnalyzer" that simply ignores
everything inside '<>'...
If you want to split the xml content in separate fields, you have to
parse it before indexing, take a look at this article:
http://www.ibm.com/developerworks/library/j-lucene/
I'm a little bit new to Lucene, so I might be missing something here,
but I wouldn't expect it to have an API for this...
Kalani Ruwanpathirana escreveu:
Hi all,
I am searching for a way to ignore XML tags in the input when indexing. Is
there a built in functionality in Lucene to get this done?
I am sorry if this was discussed before. I searched but couldn't find a
clear solution.
Thanks in advance
Kalani
--
*Marcelo Frantz Schneider*
/SIC - TCO - Tecnologia em Engenharia do Conhecimento/
*DÍGITRO TECNOLOGIA*
*E-mail:* [EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]>
***Site:* www.digitro.com <http://www.digitro.com>
--
Esta mensagem foi verificada pelo sistema de antivírus da Dígitro e
acredita-se estar livre de perigo.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]