Re: Ignoring XML tags when Indexing

Kalani Ruwanpathirana Thu, 24 Jul 2008 22:56:11 -0700

Hi Marcelo,

Thanks for the reply. Yes, I want to ignore all the tags and store the text
in one field. The tags are not known in advance, so it seems an "XMLAnalyzer"
is the solution. However, I don't think Lucene itself provides an XMLAnalyzer.
Do I have to write one myself?
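A minimal sketch of the manual "ignore everything inside '<>'" approach, assuming the markup is simple enough that dropping every `<...>` span is acceptable. The `XmlTagStripper.stripTags` helper is hypothetical (not a Lucene API); it removes tags before the text is handed to an analyzer and replaces each tag with a space so that words from adjacent elements do not run together. It does not decode entities such as `&amp;`, and it will mis-handle a `>` inside attribute values, comments or CDATA sections; a real XML parser avoids those problems.

```java
/**
 * Hypothetical helper (not part of Lucene): strips markup from a string
 * before it is indexed. Assumes no '>' appears inside attribute values,
 * comments or CDATA sections; entities such as &amp; are left as-is.
 */
final class XmlTagStripper {

    private XmlTagStripper() {}

    /** Replaces every <...> span with a single space so neighbouring words stay separate. */
    static String stripTags(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        boolean insideTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                insideTag = true;          // start skipping markup
            } else if (c == '>' && insideTag) {
                insideTag = false;         // end of markup: emit a separator
                out.append(' ');
            } else if (!insideTag) {
                out.append(c);             // ordinary character data is kept
            }
        }
        return out.toString();
    }
}
```

For example, `stripTags("<title>Hello</title><body>world</body>")` returns the words "Hello" and "world" separated by whitespace, and that string can then be indexed as the text of a single field.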


Kalani

On Thu, Jul 24, 2008 at 6:10 PM, Marcelo Schneider <[EMAIL PROTECTED]> wrote:

> Do you just want to ignore them and store everything in one field? If you
> know the tags in advance, I guess you could set up a stop-word list with them.
> If not, you could write an "XMLAnalyzer" that simply ignores everything
> inside '<>'...
>
> If you want to split the XML content into separate fields, you have to parse
> it before indexing; take a look at this article:
> http://www.ibm.com/developerworks/library/j-lucene/
>
> I'm a little bit new to Lucene, so I might be missing something here, but I
> wouldn't expect it to have an API for this...
>
>
> Kalani Ruwanpathirana wrote:
>
>> Hi all,
>>
>> I am searching for a way to ignore XML tags in the input when indexing. Is
>> there built-in functionality in Lucene to get this done?
>> I am sorry if this has been discussed before. I searched but couldn't find a
>> clear solution.
>>
>> Thanks in advance
>> Kalani
>>
>>
>>
>
> --
>
>
> *Marcelo Frantz Schneider*
> /SIC - TCO - Tecnologia em Engenharia do Conhecimento/
>
> *DÍGITRO TECNOLOGIA*
> *E-mail:* [EMAIL PROTECTED]
> *Site:* www.digitro.com
>
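For the case where the XML should be split into separate fields, here is a minimal sketch that parses the document with the JDK's built-in DOM parser before indexing. It assumes each direct child element of the root should become one field named after that element, with all of its nested text as the value; text sitting directly under the root is ignored, and repeated elements are joined into one value. `XmlFieldExtractor` and `extractFields` are hypothetical names, not a Lucene API, and this is not necessarily the approach taken in the article linked above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper (not a Lucene API): maps each direct child of the
 * root element to one "field", keyed by element name, whose value is all
 * text inside that element with the tags dropped.
 */
final class XmlFieldExtractor {

    private XmlFieldExtractor() {}

    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml)))
                                  .getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace-only text, comments, etc.
                }
                // getTextContent() concatenates all descendant text, so nested tags disappear
                String text = child.getTextContent().trim();
                // repeated elements (e.g. two <title>s) are joined into one field value
                fields.merge(child.getNodeName(), text, (a, b) -> a + " " + b);
            }
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
    }
}
```

Each entry of the returned map could then be added to the Lucene document as its own field; because the parser decodes entities and CDATA, this is also more robust than character-level tag stripping when everything should go into one field.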


-- 
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa
