Re: [HACKERS] tsearch Parser Hacking
David,

as a cool perl guy you can easily take OpenFTS (openfts.sourceforge.net), which provides a perl interface to the tsearch datatypes, and develop a plperl version. That would be interesting for many people who like the flexibility of perl. We personally use OpenFTS in our web projects, i.e., we use tsearch for storage and prepare the tsvector externally. The OpenFTS distribution contains tests and examples of dictionaries and a parser. The current configuration interface is ugly, but it should not be difficult to write a table-driven configuration.

What do you think?

Oleg

On Wed, 16 Feb 2011, David E. Wheeler wrote:

> On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:
>
>>> IMO, sooner or later we need to trash that code and replace it with
>>> something a bit more modification-friendly.
>>
>> We thought about configurable parser, but AFAIR, we didn't get any support
>> for this at that time.
>
> What would it take to change the requirement such that *any* SQL function
> could be a parser, not only C functions? Maybe require that they return a
> nested array of tokens? That way I could just write a function in PL/Perl
> quite easily.
>
> Best,
>
> David

Regards,
Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsearch Parser Hacking
On 16 Feb 2011, at 23:22, "David E. Wheeler" wrote:

> On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:
>
>>> IMO, sooner or later we need to trash that code and replace it with
>>> something a bit more modification-friendly.
>>
>> We thought about configurable parser, but AFAIR, we didn't get any support
>> for this at that time.
>
> What would it take to change the requirement such that *any* SQL function
> could be a parser, not only C functions? Maybe require that they return a
> nested array of tokens? That way I could just write a function in PL/Perl
> quite easily.

I had just the same thought in mind. So far I systematically replace _ and a few other characters with ł, which doesn't get interpreted as a blank. But more direct control would be appreciated.

Jesper
Re: [HACKERS] tsearch Parser Hacking
On Feb 14, 2011, at 11:44 PM, Oleg Bartunov wrote:

>> IMO, sooner or later we need to trash that code and replace it with
>> something a bit more modification-friendly.
>
> We thought about configurable parser, but AFAIR, we didn't get any support
> for this at that time.

What would it take to change the requirement such that *any* SQL function could be a parser, not only C functions? Maybe require that they return a nested array of tokens? That way I could just write a function in PL/Perl quite easily.

Best,

David
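To make the "nested array of tokens" idea concrete, here is a rough sketch, written in Python rather than PL/Perl. Nothing in this thread says tsearch accepts such a function; the name `tokenize`, the `[type, token]` pair layout, and the rule that every run of non-letters is a blank are all assumptions made for illustration. The type names `asciiword` and `blank` are borrowed from the default parser's token types.

```python
import re

# Hypothetical token pattern: a run of ASCII letters is a word, and any
# other run of characters (including "/" and "-") is a separator.
TOKEN_RE = re.compile(r"(?P<asciiword>[A-Za-z]+)|(?P<blank>[^A-Za-z]+)")


def tokenize(text):
    """Return a nested list of [type, token] pairs covering the whole input."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        # lastgroup is the name of the alternative that matched.
        tokens.append([match.lastgroup, match.group(0)])
    return tokens


if __name__ == "__main__":
    print(tokenize("w/d"))
```

With this rule, "/" and "-" are classified the same way, which is the behaviour asked for at the start of the thread. How such a result would be handed back to the text search machinery is left open here, as it is in the discussion.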
Re: [HACKERS] tsearch Parser Hacking
On Mon, 14 Feb 2011, David E. Wheeler wrote:

> On Feb 14, 2011, at 11:37 PM, Oleg Bartunov wrote:
>
>> it's not easy to hack tsearch parser, sorry. You can preparse your input
>> before to_tsquery,to_tsvector.
>
> Yeah, I was thinking about s{/}{-}g before passing the values in. Might be
> the only way to do it for now…

actually, it's not so difficult to *hack* the parser to treat '/' as '-'. I thought about overriding some default parser behaviour, but didn't come to any useful solution.

btw, some users have already written their own parsers, and I even have a little tutorial:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html

I wonder if it's worth adding it to
http://www.postgresql.org/docs/8.4/static/test-parser.html

Probably a good paper/presentation, along with improved code docs, would be enough for now, until someone gets a very bright idea about the parser and the time to implement it.

Regards,
Oleg
Re: [HACKERS] tsearch Parser Hacking
On Mon, 14 Feb 2011, Tom Lane wrote:

> "David E. Wheeler" writes:
>> Is it possible to modify the default tsearch parser so that / doesn't get
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.

what do you mean 'baroquely' ? Do you know 'gothic' design :?

> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

We thought about configurable parser, but AFAIR, we didn't get any support for this at that time.

Regards,
Oleg
Re: [HACKERS] tsearch Parser Hacking
On Feb 14, 2011, at 11:37 PM, Oleg Bartunov wrote:

> it's not easy to hack tsearch parser, sorry. You can preparse your input
> before to_tsquery,to_tsvector.

Yeah, I was thinking about s{/}{-}g before passing the values in. Might be the only way to do it for now…

Thanks,

David
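A minimal sketch of the preparse approach discussed above, assuming the text is normalized in client code before it is handed to to_tsvector or to_tsquery. The helper name `preparse` and the mapping table are illustrative only and are not part of PostgreSQL or tsearch; the sketch does in Python what the Perl substitution s{/}{-}g does.

```python
# Hypothetical client-side helper: rewrite characters the default parser
# treats specially before the text reaches to_tsvector()/to_tsquery().
# Further single-character mappings could be added to the same table.
SEPARATOR_MAP = str.maketrans({"/": "-"})


def preparse(text: str) -> str:
    """Replace every "/" with "-", leaving all other characters untouched."""
    return text.translate(SEPARATOR_MAP)


if __name__ == "__main__":
    print(preparse("w/d"))  # prints "w-d"
```

The normalized string would then be passed as an ordinary query parameter, and the same normalization has to be applied to search terms as to indexed text so the two still match.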
Re: [HACKERS] tsearch Parser Hacking
David,

it's not easy to hack tsearch parser, sorry. You can preparse your input before to_tsquery,to_tsvector.

Oleg

On Mon, 14 Feb 2011, David E. Wheeler wrote:

> Hackers,
>
> Is it possible to modify the default tsearch parser so that / doesn't get
> lexed as a "file" token? That is, instead of this:
>
> try=# select * from ts_debug('simple'::regconfig, 'w/d');
>  alias │    description    │ token │ dictionaries │ dictionary │ lexemes
> ───────┼───────────────────┼───────┼──────────────┼────────────┼─────────
>  file  │ File or path name │ w/d   │ {simple}     │ simple     │ {w/d}
>
> Ideally it'd think that / was the same as -:
>
> try=# select * from ts_debug('simple'::regconfig, 'w-d');
>       alias      │           description           │ token │ dictionaries │ dictionary │ lexemes
> ─────────────────┼─────────────────────────────────┼───────┼──────────────┼────────────┼─────────
>  asciihword      │ Hyphenated word, all ASCII      │ w-d   │ {simple}     │ simple     │ {w-d}
>  hword_asciipart │ Hyphenated word part, all ASCII │ w     │ {simple}     │ simple     │ {w}
>  blank           │ Space symbols                   │ -     │ {}           │ [null]     │ [null]
>  hword_asciipart │ Hyphenated word part, all ASCII │ d     │ {simple}     │ simple     │ {d}
> (4 rows)
>
> Possible? Or would I have to write a completely new parser just to change
> this bit?
>
> Thanks,
>
> David

Regards,
Oleg
Re: [HACKERS] tsearch Parser Hacking
I agree that it would be a good idea to rewrite the entire thing. However, in the meantime, I sent a proposal earlier:
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00019.php

And a patch later:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00476.php

Tom asked me to look into Compound Word support, but I found it not usable. Here was my response:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00419.php

I have not received any response since then.

-Sushant.

On Tue, Feb 15, 2011 at 9:33 AM, David E. Wheeler wrote:

> On Feb 14, 2011, at 3:57 PM, Tom Lane wrote:
>
> > There is zero, none, nada, provision for modifying the behavior of the
> > default parser, other than by changing its compiled-in state transition
> > tables.
> >
> > It doesn't help any that said tables are baroquely designed and utterly
> > undocumented.
> >
> > IMO, sooner or later we need to trash that code and replace it with
> > something a bit more modification-friendly.
>
> I was afraid you'd say that. Thanks.
>
> David
Re: [HACKERS] tsearch Parser Hacking
On Feb 14, 2011, at 3:57 PM, Tom Lane wrote:

> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.
>
> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

I was afraid you'd say that. Thanks.

David
Re: [HACKERS] tsearch Parser Hacking
On Mon, Feb 14, 2011 at 6:57 PM, Tom Lane wrote:

> "David E. Wheeler" writes:
>> Is it possible to modify the default tsearch parser so that / doesn't get
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.
>
> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

I added this to the TODO as something that can be tackled in the future. I've been wishing it would be possible to add other tokens as well (Python dotted path 'foo.bar.baz', Perl namespace path 'Foo::Bar', more flexible version number parsing, etc).

David Blewett
Re: [HACKERS] tsearch Parser Hacking
On 14 February 2011 23:57, Tom Lane wrote:

> "David E. Wheeler" writes:
>> Is it possible to modify the default tsearch parser so that / doesn't get
>> lexed as a "file" token?
>
> There is zero, none, nada, provision for modifying the behavior of the
> default parser, other than by changing its compiled-in state transition
> tables.
>
> It doesn't help any that said tables are baroquely designed and utterly
> undocumented.

This is very true. I intended to look into adding new tokens, but gave up when I couldn't see how those transition tables worked.

> IMO, sooner or later we need to trash that code and replace it with
> something a bit more modification-friendly.

+1 for annihilating the existing code at some point.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
Re: [HACKERS] tsearch Parser Hacking
"David E. Wheeler" writes:

> Is it possible to modify the default tsearch parser so that / doesn't get
> lexed as a "file" token?

There is zero, none, nada, provision for modifying the behavior of the default parser, other than by changing its compiled-in state transition tables.

It doesn't help any that said tables are baroquely designed and utterly undocumented.

IMO, sooner or later we need to trash that code and replace it with something a bit more modification-friendly.

regards, tom lane
[HACKERS] tsearch Parser Hacking
Hackers,

Is it possible to modify the default tsearch parser so that / doesn't get lexed as a "file" token? That is, instead of this:

try=# select * from ts_debug('simple'::regconfig, 'w/d');
 alias │    description    │ token │ dictionaries │ dictionary │ lexemes
───────┼───────────────────┼───────┼──────────────┼────────────┼─────────
 file  │ File or path name │ w/d   │ {simple}     │ simple     │ {w/d}

Ideally it'd think that / was the same as -:

try=# select * from ts_debug('simple'::regconfig, 'w-d');
      alias      │           description           │ token │ dictionaries │ dictionary │ lexemes
─────────────────┼─────────────────────────────────┼───────┼──────────────┼────────────┼─────────
 asciihword      │ Hyphenated word, all ASCII      │ w-d   │ {simple}     │ simple     │ {w-d}
 hword_asciipart │ Hyphenated word part, all ASCII │ w     │ {simple}     │ simple     │ {w}
 blank           │ Space symbols                   │ -     │ {}           │ [null]     │ [null]
 hword_asciipart │ Hyphenated word part, all ASCII │ d     │ {simple}     │ simple     │ {d}
(4 rows)

Possible? Or would I have to write a completely new parser just to change this bit?

Thanks,

David