Re: [GENERAL] Full text: Ispell dictionary

2014-05-09 Thread Tim van der Linden
Hi Oleg

> btw, take a look at contrib/dict_xsyn, it's more powerful than the
> synonym dictionary.

Sorry for the late reply...and thank you for the tip.

I will check out xsyn soon. I am about to finish the third and final chapter of 
my full text series, but I could maybe write an "appendix" chapter which 
mentions xsyn...or just update my posts.
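
For anyone following along, a minimal sketch of trying xsyn, based on the module's documented usage. The rules file name my_rules and the lexized word are placeholders (assumptions, not something from this thread); the rules file would have to exist as my_rules.rules in the installation's tsearch_data directory:

```sql
-- Install the contrib module; this creates a dictionary named xsyn.
CREATE EXTENSION dict_xsyn;

-- Point the dictionary at a rules file (hypothetical name: my_rules) and
-- drop the original word from the output, keeping only its synonyms.
ALTER TEXT SEARCH DICTIONARY xsyn (RULES = 'my_rules', KEEPORIG = false);

-- Inspect what the dictionary emits for a single token.
SELECT ts_lexize('xsyn', 'word');
```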

Cheers,
Tim



-- 
Tim van der Linden 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text: Ispell dictionary

2014-05-07 Thread Oleg Bartunov
btw, take a look at contrib/dict_xsyn, it's more powerful than the
synonym dictionary.

On Sat, May 3, 2014 at 2:26 AM, Tim van der Linden  wrote:
> Hi Oleg
>
> Haha, understood!
>
> Thanks for helping me on this one.
>
> Cheers
> Tim




Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
Hi Oleg

Haha, understood!

Thanks for helping me on this one.

Cheers
Tim

On May 3, 2014 7:24:08 AM GMT+09:00, Oleg Bartunov  wrote:
>Tim,
>
>you did answer yourself - don't use ispell :)


Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Tim,

you did answer yourself - don't use ispell :)

On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden  wrote:
> On Fri, 2 May 2014 21:12:56 +0400
> Oleg Bartunov  wrote:
>
> Hi Oleg
>
> Thanks for the response!
>
>> Yes, it's normal for ispell dictionary, think about morphological dictionary.
>
> Hmm, I see, that makes sense. I thought the morphological aspect of
> Ispell only dealt with splitting up compound words, but it also reduces
> a word to a more "stem"-like form, correct?
>
> As a last question on this, is there a way to stop this dictionary from
> emitting multiple lexemes?
>
> The reason I am asking is that, in my (fairly new) understanding of
> PostgreSQL's full text search, it is always best to have as few lexemes as
> possible saved in the vector. This gives smaller indexes and faster
> matching afterwards. Also, if you run a tsquery afterwards, you can still
> employ the power of these multiple lexemes to find a match.
>
> Or...probably answering my own question...if I do not desire this behavior I 
> should maybe not use Ispell and simply use another dictionary :)
>
> Thanks again.
>
> Cheers,
> Tim




Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
On Fri, 2 May 2014 21:12:56 +0400
Oleg Bartunov  wrote:

Hi Oleg

Thanks for the response!

> Yes, it's normal for ispell dictionary, think about morphological dictionary.

Hmm, I see, that makes sense. I thought the morphological aspect of Ispell
only dealt with splitting up compound words, but it also reduces a word to a
more "stem"-like form, correct?

As a last question on this, is there a way to stop this dictionary from
emitting multiple lexemes?

The reason I am asking is that, in my (fairly new) understanding of
PostgreSQL's full text search, it is always best to have as few lexemes as
possible saved in the vector. This gives smaller indexes and faster matching
afterwards. Also, if you run a tsquery afterwards, you can still employ the
power of these multiple lexemes to find a match.

Or...probably answering my own question...if I do not desire this behavior I 
should maybe not use Ispell and simply use another dictionary :)
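
As a sketch of that alternative, assuming the built-in english_stem Snowball dictionary and the built-in english configuration that uses it, the same word can be checked like this. A Snowball stemmer reduces each token to a single stem, so one lexeme per token is expected:

```sql
-- Ask the Snowball stemmer directly what it emits for one token.
SELECT ts_lexize('english_stem', 'smiling');

-- Build a tsvector with the built-in configuration and match it against
-- a query that uses a different form of the same word.
SELECT to_tsvector('english', 'smiling') @@ to_tsquery('english', 'smiles');
```

Because the query text is normalized by the same dictionary as the document text, storing only the stem should still let other forms of the word match.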

Thanks again.

Cheers,
Tim



-- 
Tim van der Linden 




Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Yes, it's normal for ispell dictionary, think about morphological dictionary.
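
A quick way to see this, assuming the timusan_ispell dictionary from the quoted message below has been created and its en_us dictionary and affix files are installed, is to call the dictionary directly with ts_lexize. An ispell dictionary can return more than one normal form for a token, so the result array may hold several entries:

```sql
-- Returns a text[] of lexemes for the token. More than one entry means
-- the dictionary found several normal forms for it.
SELECT ts_lexize('timusan_ispell', 'smiling');
```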

On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden  wrote:
> Good morning/afternoon all
>
> I am currently writing a few articles about PostgreSQL's full text 
> capabilities and have a question about the Ispell dictionary which I cannot 
> seem to find an answer to. It is probably a very simple issue, so forgive my 
> ignorance.
>
> In one article I am explaining dictionaries and I have set up a sample
> configuration which maps most token categories to use only an Ispell
> dictionary (timusan_ispell) which has a default configuration:
>
> CREATE TEXT SEARCH DICTIONARY timusan_ispell (
>     TEMPLATE = ispell,
>     DictFile = en_us,
>     AffFile = en_us,
>     StopWords = english
> );
>
> When I run a simple query like "SELECT 
> to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
>
> 'smile':1 'smiling':1
>
> As you can see I get two lexemes with the same pointer.
> The question here is: why does this happen?
>
> Is it normal behavior for the Ispell dictionary to emit multiple lexemes for 
> a single token? And if so, is this efficient? I mean, why could it not simply 
> save one lexeme 'smile' which (same as the snowball dictionary) would match 
> 'smiling' as well if later matched with the accompanying tsquery?
>
> Thanks!
>
> Cheers,
> Tim

