Re: [HACKERS] tsearch Parser Hacking

2011-02-17 Thread Oleg Bartunov

David,

as a cool Perl guy you could easily take OpenFTS (openfts.sourceforge.net),
which provides a Perl interface to the tsearch datatypes, and develop a
PL/Perl version. That would be interesting for the many people who like the
flexibility of Perl. We personally use OpenFTS in our web projects, i.e., we use
tsearch as storage and prepare the tsvector externally. The OpenFTS distribution
contains tests, example dictionaries, and a parser. The current configuration
interface is ugly, but it should not be difficult to write a table-driven
configuration.

What do you think?

Oleg

On Wed, 16 Feb 2011, David E. Wheeler wrote:

> On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:
>
>>> IMO, sooner or later we need to trash that code and replace it with
>>> something a bit more modification-friendly.
>>
>> We thought about configurable parser, but AFAIR, we didn't get any support
>> for this at that time.
>
> What would it take to change the requirement such that *any* SQL function could
> be a parser, not only C functions? Maybe require that they return a nested array
> of tokens? That way I could just write a function in PL/Perl quite easily.
>
> Best,
>
> David



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsearch Parser Hacking

2011-02-17 Thread Jesper Krogh
On 16 Feb 2011, at 23:22, "David E. Wheeler" wrote:

> On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:
> 
>>> IMO, sooner or later we need to trash that code and replace it with
>>> something a bit more modification-friendly.
>> 
>> We thought about configurable parser, but AFAIR, we didn't get any support 
>> for this at that time.
> 
> What would it take to change the requirement such that *any* SQL function 
> could be a parser, not only C functions? Maybe require that they return a 
> nested array of tokens? That way I could just write a function in PL/Perl 
> quite easily.

I had just the same thought in mind. So far I systematically substitute _
and a few other characters with ł, which doesn't get interpreted as a blank. But
more direct control would be appreciated.

Jesper


Re: [HACKERS] tsearch Parser Hacking

2011-02-16 Thread David E. Wheeler
On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:

>> IMO, sooner or later we need to trash that code and replace it with
>> something a bit more modification-friendly.
> 
> We thought about configurable parser, but AFAIR, we didn't get any support 
> for this at that time.

What would it take to change the requirement such that *any* SQL function could
be a parser, not only C functions? Maybe require that they return a nested array
of tokens? That way I could just write a function in PL/Perl quite easily.

Best,

David
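
To make the "nested array of tokens" idea concrete, here is a minimal sketch of
what such a parser function might return. It is written in Python rather than
PL/Perl, and everything in it is an assumption for illustration: the function
name parse_tokens, the two token type names, and the [type, token] pair layout
are hypothetical and not an existing tsearch interface.

```python
import re

# Hypothetical token types for illustration only: runs of word characters
# are "word", everything else is "blank".
TOKEN_PATTERN = re.compile(r"(?P<word>\w+)|(?P<blank>\W+)")


def parse_tokens(text: str) -> list[list[str]]:
    """Return a nested list of [token_type, token] pairs for the input text."""
    # Every character is either \w or \W, so the matches cover the whole input.
    return [[m.lastgroup, m.group()] for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token_type, token in parse_tokens("w/d"):
        print(token_type, repr(token))
```

Under this sketch, 'w/d' comes back as three pairs, with '/' reported as a
blank instead of being folded into a single "file" token.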




Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Oleg Bartunov

On Mon, 14 Feb 2011, David E. Wheeler wrote:

> On Feb 14, 2011, at 11:37 PM, Oleg Bartunov wrote:
>
>> it's not easy to hack tsearch parser, sorry. You can preparse your input
>> before to_tsquery,to_tsvector.
>
> Yeah, I was thinking about s{/}{-}g before passing the values in. Might be the
> only way to do it for now…


actually, it's not so difficult to *hack* the parser to treat '/' as '-'.
I thought about overriding some of the default parser behaviour, but didn't come
to any useful solution.
btw, some users have already written their own parsers, and I even have a little
tutorial:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html
I wonder if it's worth adding it to
http://www.postgresql.org/docs/8.4/static/test-parser.html

Probably a good paper/presentation, along with improved code docs, would be
enough for now, until someone gets a very bright idea about the parser and has
the time to implement it.


Regards,
Oleg


Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Oleg Bartunov

On Mon, 14 Feb 2011, Tom Lane wrote:

> "David E. Wheeler" writes:
>
>> Is it possible to modify the default tsearch parser so that / doesn't get
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.

what do you mean 'baroquely'? Do you know 'gothic' design :?

> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

We thought about configurable parser, but AFAIR, we didn't get any support
for this at that time.


Regards,
Oleg


Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread David E. Wheeler
On Feb 14, 2011, at 11:37 PM, Oleg Bartunov wrote:

> it's not easy to hack tsearch parser, sorry. You can preparse your input
> before to_tsquery,to_tsvector.

Yeah, I was thinking about s{/}{-}g before passing the values in. Might be the 
only way to do it for now…

Thanks,

David
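
The s{/}{-}g workaround above could look roughly like this on the client side.
This is a sketch in Python rather than Perl; the helper name preparse and the
REPLACEMENTS table are hypothetical, and only the '/' to '-' mapping comes from
this thread. The cleaned string would then be passed to to_tsvector or
to_tsquery as usual.

```python
# Hypothetical translation table: characters to rewrite before the text
# reaches to_tsvector()/to_tsquery(). Only '/' -> '-' is from the thread.
REPLACEMENTS = str.maketrans({"/": "-"})


def preparse(text: str) -> str:
    """Rewrite characters the default parser would otherwise lex as a path."""
    return text.translate(REPLACEMENTS)


if __name__ == "__main__":
    print(preparse("w/d"))
```

Keeping the mapping in one table makes it easy to add further characters
later without touching the call sites.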




Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Oleg Bartunov

David,

it's not easy to hack tsearch parser, sorry. You can preparse your input
before to_tsquery,to_tsvector.

Oleg

On Mon, 14 Feb 2011, David E. Wheeler wrote:

> Hackers,
>
> Is it possible to modify the default tsearch parser so that / doesn't get lexed
> as a "file" token? That is, instead of this:
>
> try=# select * from ts_debug('simple'::regconfig, 'w/d');
>  alias │    description    │ token │ dictionaries │ dictionary │ lexemes
> ───────┼───────────────────┼───────┼──────────────┼────────────┼─────────
>  file  │ File or path name │ w/d   │ {simple}     │ simple     │ {w/d}
>
> Ideally it'd think that / was the same as -:
>
> try=# select * from ts_debug('simple'::regconfig, 'w-d');
>       alias      │           description           │ token │ dictionaries │ dictionary │ lexemes
> ─────────────────┼─────────────────────────────────┼───────┼──────────────┼────────────┼─────────
>  asciihword      │ Hyphenated word, all ASCII      │ w-d   │ {simple}     │ simple     │ {w-d}
>  hword_asciipart │ Hyphenated word part, all ASCII │ w     │ {simple}     │ simple     │ {w}
>  blank           │ Space symbols                   │ -     │ {}           │ [null]     │ [null]
>  hword_asciipart │ Hyphenated word part, all ASCII │ d     │ {simple}     │ simple     │ {d}
> (4 rows)
>
> Possible? Or would I have to write a completely new parser just to change this
> bit?
>
> Thanks,
>
> David

Regards,
Oleg


Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Sushant Sinha
I agree that it would be a good idea to rewrite the entire thing. However, in
the meantime, I sent a proposal earlier:

http://archives.postgresql.org/pgsql-hackers/2010-08/msg00019.php

And a patch later:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg00476.php

Tom asked me to look into Compound Word support, but I found it not usable.
Here was my response:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00419.php

I have not received any response since then.

-Sushant.


On Tue, Feb 15, 2011 at 9:33 AM, David E. Wheeler wrote:

> On Feb 14, 2011, at 3:57 PM, Tom Lane wrote:
>
> > There is zero, none, nada, provision for modifying the behavior of the
> > default parser, other than by changing its compiled-in state transition
> > tables.
> >
> > It doesn't help any that said tables are baroquely designed and utterly
> > undocumented.
> >
> > IMO, sooner or later we need to trash that code and replace it with
> > something a bit more modification-friendly.
>
> I was afraid you'd say that. Thanks.
>
> David


Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread David E. Wheeler
On Feb 14, 2011, at 3:57 PM, Tom Lane wrote:

> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
> 
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.
> 
> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

I was afraid you'd say that. Thanks.

David



Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread David Blewett
On Mon, Feb 14, 2011 at 6:57 PM, Tom Lane wrote:
> "David E. Wheeler" writes:
>> Is it possible to modify the default tsearch parser so that / doesn't get 
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.
>
> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

I added this to the TODO as something that can be tackled in the
future. I've been wishing it would be possible to add other tokens as
well (Python dotted path 'foo.bar.baz', Perl namespace path
'Foo::Bar', more flexible version number parsing, etc).

David Blewett



Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Thom Brown
On 14 February 2011 23:57, Tom Lane wrote:
> "David E. Wheeler" writes:
>> Is it possible to modify the default tsearch parser so that / doesn't get 
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.

This is very true. I intended to look into adding new tokens, but gave
up when I couldn't see how those transition tables worked.

> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

+1 for annihilating the existing code at some point.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Tom Lane
"David E. Wheeler" writes:
> Is it possible to modify the default tsearch parser so that / doesn't get 
> lexed as a "file" token?

There is zero, none, nada, provision for modifying the behavior of the
default parser, other than by changing its compiled-in state transition
tables.

It doesn't help any that said tables are baroquely designed and utterly
undocumented.

IMO, sooner or later we need to trash that code and replace it with
something a bit more modification-friendly.

regards, tom lane



[HACKERS] tsearch Parser Hacking

2011-02-14 Thread David E. Wheeler
Hackers,

Is it possible to modify the default tsearch parser so that / doesn't get lexed 
as a "file" token? That is, instead of this:

try=# select * from ts_debug('simple'::regconfig, 'w/d');
 alias │    description    │ token │ dictionaries │ dictionary │ lexemes
───────┼───────────────────┼───────┼──────────────┼────────────┼─────────
 file  │ File or path name │ w/d   │ {simple}     │ simple     │ {w/d}

Ideally it'd think that / was the same as -:

try=# select * from ts_debug('simple'::regconfig, 'w-d');
      alias      │           description           │ token │ dictionaries │ dictionary │ lexemes
─────────────────┼─────────────────────────────────┼───────┼──────────────┼────────────┼─────────
 asciihword      │ Hyphenated word, all ASCII      │ w-d   │ {simple}     │ simple     │ {w-d}
 hword_asciipart │ Hyphenated word part, all ASCII │ w     │ {simple}     │ simple     │ {w}
 blank           │ Space symbols                   │ -     │ {}           │ [null]     │ [null]
 hword_asciipart │ Hyphenated word part, all ASCII │ d     │ {simple}     │ simple     │ {d}
(4 rows)

Possible? Or would I have to write a completely new parser just to change this 
bit?

Thanks,

David

