Hi Mary,

While testing, I found more inconsistencies when only basic stemming is enabled.

For example, with basic stemming enabled, the term "mourir" returns meurt,
mourant, mourrait, and mourir.

let $text:= <text xml:lang="fr">marcher avec la bau rupture de baux
septembre 1997, bail marche cette disparues situation bau fait disparaître
la justification. Les services fournis disparu par la demanderesse l'ont
été dans l'attente d'une rémunération,</text>
return
cts:highlight($text,
  cts:query(
    <cts:word-query>
      <cts:text xml:lang="fr">mourir</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>),
  <b>{$cts:text}</b>)

Why doesn't the same happen for the terms disparu or marche?

Why is advanced stemming required for these terms? Is it something specific
to the French language?

Also, when I check the stems with cts:stem("baux","fr"), I get bau and bail,
where bau doesn't have any meaning in French.

Since only basic stemming is enabled on my database, I see documents that
contain baux or bau but not bail.
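
One way to compare these terms side by side is to list every stem that
cts:stem returns for each of them. This is only a sketch: it assumes, per
Mary's earlier note in this thread, that basic stemming uses only the first
stem returned. The $terms list and the <stems> wrapper element are made up
for illustration.

```xquery
xquery version "1.0-ml";

(: Terms taken from this thread; the list itself is illustrative. :)
let $terms := ("mourir", "disparu", "marche", "baux")
for $term in $terms
(: cts:stem returns every stem the stemmer knows for the word in French. :)
let $stems := cts:stem($term, "fr")
return
  (: "first" is the stem that basic stemming would use, under the
     assumption stated above; the element content lists all stems. :)
  <stems term="{$term}" first="{$stems[1]}">{
    fn:string-join($stems, ", ")
  }</stems>
```

If the first stem of a word in the text differs from the first stem of the
query term, I would not expect a match under basic stemming.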

Can you tell me why French stems behave differently in these cases?

Thanks,
Praveen.

On Tue, Apr 12, 2016 at 10:45 AM, Mary Holstege <[email protected]> wrote:

> On Tue, 12 Apr 2016 07:10:46 -0700, Gontla Praveen <[email protected]> wrote:
>
>> Hi Mary,
>>
>> Why does advanced stemming need to be enabled? Is there a specific reason
>> for it?
>>
>
> Not everyone needs or wants advanced stemming: it does more work (so,
> slightly slower) with larger indexes.
> For some languages, the slight increase in recall is not worth it for many
> use cases.
>
>
>> What is the difference between basic stemming and advanced stemming?
>>
>
> Basic stemming only indexes the preferred stem for each token (typically,
> the shortest one). Advanced stemming indexes all possible stems.
>
> Completing the picture:
> * decompounding is like advanced stemming, but with additional indexing
> for components of compounds. This principally applies to German and
> similar languages that create long noun clusters as single words.
> * you can also turn stemming off entirely; this is principally useful where
> you are searching over non-linguistic content
>
> //Mary
>
>
>> Thanks,
>> Praveen.
>>
>> On Thu, Mar 31, 2016 at 12:58 PM, Mary Holstege <[email protected]> wrote:
>>
>>> Do you have advanced stemming enabled? With basic stemming, only the first
>>> stem returned from cts:stem is indexed and used for matching in search.
>>>
>>> //Mary
>>>
>>>
>>> On 03/31/2016 03:00 AM, Debin, Infant Jerald (LNG-CON) wrote:
>>>
>>> Hi Team,
>>>
>>>
>>>
>>> For the French term *“disparu”*, the corresponding French stemmed word
>>> *“disparaître”* is not recognized when performing a search.
>>>
>>>
>>>
>>> *Example:*
>>>
>>>
>>>
>>> *Query:*
>>>
>>>
>>>
>>> let $text:= <text xml:lang="fr">avec la rupture de septembre 1997, cette
>>> disparues situation fait disparaître la justification. Les services
>>> fournis
>>> disparu par la demanderesse l'ont été dans l'attente d'une
>>> rémunération,</text>
>>>
>>> return
>>>
>>> cts:highlight($text,
>>>   cts:query(
>>>     <cts:word-query>
>>>       <cts:text xml:lang="fr">disparu</cts:text>
>>>       <cts:option>case-insensitive</cts:option>
>>>       <cts:option>diacritic-insensitive</cts:option>
>>>       <cts:option>punctuation-insensitive</cts:option>
>>>     </cts:word-query>),
>>>   <h1>{$cts:text}</h1>)
>>>
>>>
>>>
>>> *Result:*
>>>
>>>
>>>
>>> Disparaître is not recognized or highlighted, as shown below:
>>>
>>>
>>>
>>> <text xml:lang="fr">avec la rupture de septembre 1997, cette
>>> <h1>disparues</h1> situation fait disparaître la justification. Les
>>> services fournis <h1>disparu</h1> par la demanderesse l'ont été dans
>>> l'attente d'une rémunération,</text>
>>>
>>>
>>>
>>> Below is the result of cts:stem,
>>>
>>>
>>>
>>> cts:stem("disparu","fr")
>>>
>>>
>>>
>>> disparu
>>>
>>> disparaître
>>>
>>>
>>>
>>> Please let us know on this issue.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>>
>>>
>>> Debin
>>>
>>> Mob: +91-9789826001
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> Manage your subscription at:
>>> http://developer.marklogic.com/mailman/listinfo/general
>