Do you just want to ignore the tags and store everything in one field? If you know in advance which tags are used, I guess you could put them in a stop word list. If not, you could write an "XMLAnalyzer" that simply ignores everything between '<' and '>'...
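
To make the "ignore everything between the angle brackets" idea concrete, here is a minimal sketch in plain Java that strips the tags from a string before it is handed to whatever analyzer you already use. The class and method names (XmlTagStripper, stripTags) are made up for illustration and are not part of Lucene. It is a regex, not a real XML parser, so it assumes simple markup: entities such as &amp; are left undecoded, and comments or CDATA sections containing '>' would confuse it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: removes XML tags from raw text.
final class XmlTagStripper {
    // Matches '<', then anything up to the next '>'. Assumes simple markup only.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String stripTags(String xml) {
        // Replace each tag with a space so adjacent words do not get glued together.
        String noTags = TAG.matcher(xml).replaceAll(" ");
        // Collapse the runs of whitespace left behind and trim the ends.
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Lucene</title><body>in action</body></doc>";
        System.out.println(stripTags(xml)); // prints "Lucene in action"
    }
}
```

The cleaned string can then be indexed as a single field in the usual way.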

If you want to split the XML content into separate fields, you have to parse it before indexing. Take a look at this article: http://www.ibm.com/developerworks/library/j-lucene/
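
A rough sketch of that parsing step, using only the DOM parser that ships with the JDK: it turns each direct child element of the root into a (name, text) pair. The class and method names (XmlFieldExtractor, extractFields) and the flat one-level document layout are assumptions for illustration; this is not the approach from the article, and a deeper document would need a recursive walk or SAX.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Lucene class: maps child element names to their text.
final class XmlFieldExtractor {
    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();

            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments between elements.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                // getTextContent() concatenates all nested text, dropping inner tags.
                String text = child.getTextContent().trim();
                // Repeated element names are joined into one value.
                fields.merge(child.getNodeName(), text, (a, b) -> a + " " + b);
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
    }
}
```

Each entry of the returned map would then become one field of the Lucene document you add to the index, with the element name as the field name.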

I'm fairly new to Lucene, so I might be missing something here, but I wouldn't expect it to have an API for this...


Kalani Ruwanpathirana wrote:
Hi all,

I am searching for a way to ignore XML tags in the input when indexing. Is
there any built-in functionality in Lucene to get this done?
I am sorry if this was discussed before. I searched but couldn't find a
clear solution.

Thanks in advance
Kalani


--


*Marcelo Frantz Schneider*
/SIC - TCO - Tecnologia em Engenharia do Conhecimento/

*DÍGITRO TECNOLOGIA*
*Site:* www.digitro.com <http://www.digitro.com>
