Re: [GENERAL] Text search with ispell

2009-01-27 Thread Andreas Wenk

Tommy Gildseth wrote:
> I'm trying to figure out how to use PostgreSQL's full-text search with an
> ispell dictionary. I'm having a bit of trouble figuring out where this
> norwegian.dict comes from, though.
> When I install the Norwegian ispell dictionary, I get 4 files: nb.aff,
> nb.hash, nn.aff and nn.hash. What I'm unable to figure out is the steps
> needed to use these with PostgreSQL.


Which version are you running? It's important to know, because tsearch2 has
been integrated into the core since version 8.3, so the setup procedure in
earlier versions is different.

Cheers

Andy
--
St.Pauli - Hamburg - Germany

Andreas Wenk




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search with ispell

2009-01-27 Thread Tommy Gildseth

Andreas Wenk wrote:
> Which version are you running? It's important to know, because tsearch2 is
> integrated since version 8.3. The behaviour for implementing in earlier
> versions is therefore different ...

It will be running on version 8.3.

--
Tommy Gildseth



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Oleg Bartunov

On Tue, 27 Jan 2009, Tommy Gildseth wrote:
> When I install the Norwegian ispell dictionary, I get 4 files: nb.aff,
> nb.hash, nn.aff and nn.hash. What I'm unable to figure out is the steps
> needed to use these with PostgreSQL.


You need to choose between the two written standards of Norwegian, nn
(Nynorsk) and nb (Bokmål);
see http://en.wikipedia.org/wiki/Norwegian_language
Then follow the standard procedure described in the documentation.
Where did you get the files?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Tommy Gildseth

Oleg Bartunov wrote:
> you need to make a choice between two kinds of norwegian language - nn, nb,
> see http://en.wikipedia.org/wiki/Norwegian_language
> Then follow standard procedure described in documentation.
> Where did you get them ?



Yes, I'm aware that I need to choose one of those. I guess what I'm
having problems with is figuring out where the language.dict file
comes from.
I didn't find any such file in the RPM downloaded from the links at
http://ficus-www.cs.ucla.edu/geoff/ispell.html#ftp-sites, nor in
the inorwegian package in the Ubuntu apt repository.
I have read through
http://www.postgresql.org/docs/current/static/textsearch.html, but it's
not quite clear to me from that what I need to do to use an ispell
dictionary with tsearch.



--
Tommy Gildseth



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Oleg Bartunov

Have you read
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY ?
We suggest using the dictionaries that come with OpenOffice; hunspell probably
has better support for compound words.

Regards,
Oleg


Re: [GENERAL] Text search with ispell

2009-01-27 Thread Tommy Gildseth

Oleg Bartunov wrote:
> Have you read
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
> We suggest to use dictionaries which come with openoffice, hunspell,
> probably has better support of composite words.



Thanks, that knocked me onto the right track. Too easy to miss the
blindingly obvious at times. :-)

Works beautifully now.

--
Tommy Gildseth



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Tommy Gildseth

Tommy Gildseth wrote:
> Thanks, that knocked me onto the right track. Too easy to miss the
> blindingly obvious at times. :-)
>
> Works beautifully now.



I may have been too quick to declare success.

The following works as expected, returning the individual words:
SELECT
  ts_debug('norwegian', 'overbuljongterningpakkmesterassistent'),
  ts_debug('norwegian', 'sjokoladefabrikk'),
  ts_debug('norwegian', 'epleskrott');
-[ RECORD 1 ]--
ts_debug | (asciiword,Word, all ASCII,overbuljongterningpakkmesterassistent,{no_ispell,norwegian_stem},no_ispell,{buljong,terning,pakk,mester,assistent})
ts_debug | (asciiword,Word, all ASCII,sjokoladefabrikk,{no_ispell,norwegian_stem},no_ispell,{sjokoladefabrikk,sjokolade,fabrikk})
ts_debug | (asciiword,Word, all ASCII,epleskrott,{no_ispell,norwegian_stem},no_ispell,{epleskrott,eple,skrott})



But the following does not:
SELECT
  ts_debug('norwegian', 'hemsedalsdans'),
  ts_debug('norwegian', 'lærdalsbrua'),
  ts_debug('norwegian', 'hengesmykke');
-[ RECORD 1 ]--
ts_debug | (asciiword,Word, all ASCII,hemsedalsdans,{no_ispell,norwegian_stem},norwegian_stem,{hemsedalsdan})
ts_debug | (word,Word, all letters,lærdalsbrua,{no_ispell,norwegian_stem},norwegian_stem,{lærdalsbru})
ts_debug | (asciiword,Word, all ASCII,hengesmykke,{no_ispell,norwegian_stem},norwegian_stem,{hengesmykk})



Would this be due to a limitation in the dictionary, or a 
misconfiguration on my side?


Commands used are as follows:

CREATE TEXT SEARCH DICTIONARY no_ispell (
    TEMPLATE = ispell,
    DictFile = nb_NO,
    AffFile = nb_NO,
    StopWords = norwegian
);

and

ALTER TEXT SEARCH CONFIGURATION norwegian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH no_ispell, norwegian_stem;
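
One way to separate a dictionary limitation from a configuration problem is to query the ispell dictionary on its own with ts_lexize, which bypasses the fallback to norwegian_stem. This is only a sketch; it assumes the no_ispell dictionary created above exists in the current database.

```sql
-- Ask the ispell dictionary directly, with no fallback dictionary involved.
-- A NULL result means no_ispell does not recognise the token at all.
SELECT ts_lexize('no_ispell', 'sjokoladefabrikk');
SELECT ts_lexize('no_ispell', 'hengesmykke');
```

Judging from the ts_debug output above, where norwegian_stem is reported as the dictionary that handled hengesmykke, the second call would be expected to return NULL. That would point at the dictionary files rather than at the mapping.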


--
Tommy Gildseth



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Oleg Bartunov

On Tue, 27 Jan 2009, Tommy Gildseth wrote:
> Would this be due to a limitation in the dictionary, or a misconfiguration on
> my side?


Sorry, I don't know Norwegian. What do you mean? Are you saying that
no_ispell doesn't recognize these words?








Regards,
Oleg


Re: [GENERAL] Text search with ispell

2009-01-27 Thread Tommy Gildseth

Oleg Bartunov wrote:
> sorry, I don't know norwegian, what do you mean ?  Did you complain that
> no_ispell doesn't recognize these words ?


Yes, I'm sorry, I should have explained better.
The words hemsedalsdans, hengesmykke and lærdalsbrua are
concatenations of the words Hemsedal and dans, henge and smykke, and
Lærdal and bru. Hemsedal and Lærdal are in fact geographic names, so I'm
not sure it would handle those at all anyway. Both parts of the word
hengesmykke are in the dictionary, though, i.e. both henge and smykke. It
seems that it is able to split some words properly, while others it
doesn't recognise.


The problem I'm trying to work around is that, as far as I can tell,
tsearch doesn't support truncation, i.e. searching for *smykke or
hemsedal* etc.


--
Tommy Gildseth



Re: [GENERAL] Text search with ispell

2009-01-27 Thread Oleg Bartunov

On Tue, 27 Jan 2009, Tommy Gildseth wrote:
> The words hemsedalsdans, hengesmykke and lærdalsbrua are concatenations of
> the words Hemsedal and dans, henge and smykke, and Lærdal and bru. Hemsedal
> and Lærdal are in fact geographic names, so I'm not sure it would handle
> those at all anyway. Both parts of the word hengesmykke are in the
> dictionary, though, i.e. both henge and smykke. It seems that it is able to
> split some words properly, while others it doesn't recognise.


You may be able to improve the dictionary. The affix file should contain
COMPOUNDFLAG z

and the dict file should list 'henge' and 'smykke' with that flag 'z'.
Where did you get the dictionary?
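
As a rough sketch of what that might look like in hunspell-format files (an assumption about the layout of nb_NO, not a copy of it): the affix file declares the compound flag once, and each word that may take part in a compound carries that flag after a slash. The affix file would hold the line:

```
COMPOUNDFLAG z
```

and the dict file would hold entries along these lines:

```
henge/z
smykke/z
```

If the real entries already carry other flags, 'z' would be added to the existing flag list rather than replacing it. Note also that the flag letter has to match whatever the affix file actually declares, which may not be 'z' in a given distribution of the dictionary.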



> The problem I'm trying to work around is that, as far as I can tell, tsearch
> doesn't support truncation, i.e. searching for *smykke or hemsedal* etc.


Version 8.4 will support prefix search, e.g. hemsedal*.
But you could always write your own dictionary, or just use the dict_xsyn
dictionary for such exceptions:
http://www.postgresql.org/docs/8.3/static/dict-xsyn.html
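
In 8.4 and later, prefix matching is written with a :* suffix on the query term rather than a bare asterisk. A minimal sketch, using the built-in simple configuration so that no ispell or stemmer dictionary is involved; it will not run on 8.3:

```sql
-- Prefix match: any lexeme that begins with 'hemsedal' satisfies the query.
SELECT to_tsvector('simple', 'hemsedalsdans')
       @@ to_tsquery('simple', 'hemsedal:*');
```

This covers only the hemsedal* case. A leading-wildcard search such as *smykke is not expressible this way, which is where compound splitting in the dictionary still matters.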




Regards,
Oleg