Re: [HACKERS] Flexible configuration for full-text search

2017-11-07 Thread Aleksandr Parfenov
On Tue, 31 Oct 2017 09:47:57 +0100
Emre Hasegeli <e...@hasegeli.com> wrote:

> > If we want to save this behavior, we should somehow pass a stopword
> > to tsvector composition function (parsetext in ts_parse.c) for
> > counter increment or increment it in another way. Currently, an
> > empty lexemes array is passed as a result of LexizeExec.
> >
> > One possible way to do so is something like:
> > CASE polish_stopword
> > WHEN MATCH THEN KEEP -- stopword counting
> > ELSE polish_isspell
> > END  
> 
> This would mean keeping the stopwords.  What we want is
> 
> CASE polish_stopword -- stopword counting
> WHEN NO MATCH THEN polish_isspell
> END
> 
> Do you think it is possible?

Hi Emre,

I have thought about how this could be implemented. The approach I see
is to increment the word counter whenever any checked dictionary
matches the word, even if it returns no lexeme. The main drawback is
that the counter increment becomes implicit.
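
A minimal Python sketch of that idea, as a toy model rather than the
actual parsetext code. The names toy_stopword, toy_ispell and
build_positions are hypothetical, and it assumes a dictionary returns
None for an unknown word and an empty list for a recognized word that
yields no lexeme:

```python
# Toy stand-ins for polish_stopword and polish_isspell (hypothetical data).
POLISH_STOPWORDS = {"i", "w"}
POLISH_KNOWN = {"kot": ["kot"], "pies": ["pies"]}


def toy_stopword(token):
    # Recognized stop word: matched, but no lexeme is returned.
    return [] if token in POLISH_STOPWORDS else None


def toy_ispell(token):
    # Unknown words yield None, known words yield their lexemes.
    return POLISH_KNOWN.get(token)


def build_positions(tokens, dictionaries):
    """Return {lexeme: [positions]}.

    The counter is incremented whenever any checked dictionary matches
    the token, even if that dictionary returns no lexeme.
    """
    positions = {}
    counter = 0
    for token in tokens:
        matched = False
        lexemes = []
        for lexize in dictionaries:
            result = lexize(token)
            if result is not None:
                matched = True
                lexemes = result
                break
        if matched:
            counter += 1  # the implicit increment
        for lexeme in lexemes:
            positions.setdefault(lexeme, []).append(counter)
    return positions


print(build_positions(["kot", "i", "pies"], [toy_stopword, toy_ispell]))
# {'kot': [1], 'pies': [3]}
```

In this model 'i' consumes position 2 although it never appears in the
result, and a token that no dictionary matches consumes no position.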

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Flexible configuration for full-text search

2017-11-06 Thread Aleksandr Parfenov
On Mon, 6 Nov 2017 18:05:23 +1300
Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

> On Sat, Oct 21, 2017 at 1:39 AM, Aleksandr Parfenov
> <a.parfe...@postgrespro.ru> wrote:
> > In attachment updated patch with fixes of empty XML tags in
> > documentation.  
> 
> Hi Aleksandr,
> 
> I'm not sure if this is expected at this stage, but just in case you
> aren't aware, with this version of the patch the binary upgrade test
> in
> src/bin/pg_dump/t/002_pg_dump.pl fails for me:
> 
> #   Failed test 'binary_upgrade: dumps ALTER TEXT SEARCH CONFIGURATION
> dump_test.alt_ts_conf1 ...'
> #   at t/002_pg_dump.pl line 6715.
> 

Hi Thomas,

Thank you for noticing it. I will investigate it while working on the
next version of the patch.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] [PATCH] A hook for session start

2017-11-03 Thread Aleksandr Parfenov
The README file in patch 0003 is an unchanged copy of the README from
the test_pg_dump module.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] pgbench - use enum for meta commands

2017-11-02 Thread Aleksandr Parfenov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi,

Looks good to me.

The only thing I'm not quite sure about is the comment "which meta command ...".
Maybe it would be better to write it without a question word, something like
"meta command identifier..."?

The new status of this patch is: Ready for Committer


Re: [HACKERS] [PATCH] A hook for session start

2017-11-02 Thread Aleksandr Parfenov
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hi,

Unfortunately, patches 0001 and 0002 don't apply to current master.

The new status of this patch is: Waiting on Author


Re: [HACKERS] Flexible configuration for full-text search

2017-10-30 Thread Aleksandr Parfenov
I'm mostly happy with the mentioned modifications, but I have a few
questions to clarify some points. I will send a new patch in a week or two.

On Thu, 26 Oct 2017 20:01:14 +0200
Emre Hasegeli <e...@hasegeli.com> wrote:
> To put it formally:
> 
> ALTER TEXT SEARCH CONFIGURATION name
> ADD MAPPING FOR token_type [, ... ] WITH config
> 
> where config is one of:
> 
> dictionary_name
> config { UNION | INTERSECT | EXCEPT } config
> CASE config WHEN [ NO ] MATCH THEN [ KEEP ELSE ] config END

According to the formal definition, the following configurations are valid:

CASE english_hunspell WHEN MATCH THEN KEEP ELSE simple END
CASE english_noun WHEN MATCH THEN english_hunspell END

But configuration:

CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END

is not (as I understand it, ELSE can be used only with KEEP).

I think we should decide whether to allow different dictionaries for
match checking (between CASE and WHEN) and for the result (after THEN).
If the answer is 'allow', maybe we should also allow the third example,
for consistency across configurations.
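
To make the question concrete, here is a small Python sketch of a
checker for the formal definition quoted above. It is only an
illustration: is_valid_config is a hypothetical helper, not part of the
patch, and it assumes keywords are written in upper case as in this
thread and that anything else is a dictionary name.

```python
import re

SET_OPS = {"UNION", "INTERSECT", "EXCEPT"}
KEYWORDS = SET_OPS | {"CASE", "WHEN", "NO", "MATCH", "THEN", "KEEP", "ELSE", "END"}


def is_valid_config(text):
    """Return True if text follows the grammar from the formal definition."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(word):
        nonlocal pos
        if peek() != word:
            raise SyntaxError(f"expected {word}, got {peek()}")
        pos += 1

    def parse_term():
        nonlocal pos
        token = peek()
        if token == "CASE":
            pos += 1
            parse_config()
            expect("WHEN")
            if peek() == "NO":
                pos += 1
            expect("MATCH")
            expect("THEN")
            if peek() == "KEEP":
                pos += 1
                expect("ELSE")  # ELSE is only reachable after KEEP
            parse_config()
            expect("END")
        elif token is not None and token not in KEYWORDS:
            pos += 1  # dictionary_name
        else:
            raise SyntaxError(f"unexpected {token}")

    def parse_config():
        nonlocal pos
        parse_term()
        while peek() in SET_OPS:
            pos += 1
            parse_term()

    try:
        parse_config()
        if pos != len(tokens):
            raise SyntaxError("trailing tokens")
    except SyntaxError:
        return False
    return True


print(is_valid_config("CASE english_hunspell WHEN MATCH THEN KEEP ELSE simple END"))
print(is_valid_config("CASE english_noun WHEN MATCH THEN english_hunspell END"))
print(is_valid_config("CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END"))
# True, True, False
```

The third call is rejected only because the grammar ties ELSE to KEEP.
Allowing it would mean making the ELSE branch independent of KEEP in
parse_term.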

> > 3) Using different dictionaries for recognizing and output
> > generation. As I mentioned before, in new syntax condition and
> > command are separate and we can use it for some more complex text
> > processing. Here an example for processing only nouns:
> >
> > ALTER TEXT SEARCH CONFIGURATION nouns_only
> >   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> > word, hword, hword_part WITH CASE
> >   WHEN english_noun THEN english_hunspell
> > END  
> 
> This would also still work with the simpler syntax because
> "english_noun", still being a dictionary, would pass the tokens to the
> next one.

Based on the formal definition, this example can be written as follows:
CASE english_noun WHEN MATCH THEN english_hunspell END

The question is the same as in the previous example.

> Instead of supporting old way of putting stopwords on dictionaries, we
> can make them dictionaries on their own.  This would then become
> something like:
> 
> CASE polish_stopword
> WHEN NO MATCH THEN polish_isspell
> END

Currently, stopwords increment the position counter. For example:
SELECT to_tsvector('english','a test message');
     to_tsvector
---------------------
 'messag':3 'test':2

The stopword 'a' takes position 1, but it is not in the vector.

If we want to preserve this behavior, we should somehow pass a stopword
to the tsvector composition function (parsetext in ts_parse.c) so that
it increments the counter, or increment the counter in another way.
Currently, an empty lexemes array is passed as the result of LexizeExec.

One possible way to do so is something like:
CASE polish_stopword
WHEN MATCH THEN KEEP -- stopword counting
ELSE polish_isspell
END

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] I've just started working on Full Text Search with version 10 on Ubuntu 16

2017-10-19 Thread Aleksandr Parfenov
On Wed, 18 Oct 2017 22:53:16 -0400
Ronald Jewell <rljao...@gmail.com> wrote:

> and I'm getting error ...
> 
> ERROR:  could not open extension control file
> "/usr/share/postgresql/10/extension/tsearch2.control": No such file or
> directory
> 
> when I try to create the tsearch2 extension.

Hi,

tsearch2 is an extension that provides an FTS interface for
applications that were developed for old versions of PostgreSQL.

Since version 8.3, full-text search has been part of the PostgreSQL
core. tsearch2 solved some API incompatibility problems and was kept
for backward-compatibility reasons. It was removed in version 10.

For more information about in-core full-text search API check
documentation at
https://www.postgresql.org/docs/10/static/textsearch-intro.html

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] code cleanup empty string initializations

2017-09-08 Thread Aleksandr Parfenov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi Peter,

I looked through your patches and they look good to me.
The patches make the code more readable and clear, especially in the case of
encodingid.

The new status of this patch is: Ready for Committer


[HACKERS] [PROPOSAL] Text search configuration extension

2017-08-18 Thread Aleksandr Parfenov
Hello hackers!

I'm working on a new approach to text search configuration and want to
share my thoughts with the community to get some feedback and maybe
some new ideas.

Currently we can't configure the text search engine in Postgres for
some useful scenarios, such as multi-language search, or exact and
morphological search in one configuration. Additionally, we can't use a
dictionary as a filter dictionary unless that use was taken into
consideration when the dictionary was developed. I would also like to
separate the configuration of result-set building from the
configuration of command selection. Last but not least, the goal is to
keep backward compatibility in terms of syntax and behavior in
currently available scenarios.

In order to meet these goals I propose the following syntax for text
search configurations (the current syntax can still be used as well):

ALTER TEXT SEARCH CONFIGURATION <name> ADD/ALTER MAPPING FOR
<token_type> [, ... ] WITH
CASE
   WHEN <condition> THEN <command>
   <...>
   [ELSE <command>]
END;

A <condition> is an expression with dictionary names as operands and
the boolean operators AND, OR and NOT. Additionally, a dictionary name
may be followed by a result check: IS [NOT] NULL or IS [NOT] STOP. If
no check-option is given for a dictionary, it is evaluated as:
   dict IS NOT NULL and dict IS NOT STOP

A <command> is an expression on sets of lexemes that supports the
operators UNION, EXCEPT, INTERSECT and MAP BY. The MAP BY operator is a
way to configure filter dictionaries: the output of the righthand
subexpression is used as the input of the lefthand subexpression. In
other words, the MAP BY operator is used instead of TSL_FILTER-flagged
output.

An example of configuration for both English and German search:

ALTER TEXT SEARCH CONFIGURATION en_de_search ADD MAPPING FOR asciiword,
word WITH
CASE
   WHEN english_hunspell IS NOT NULL THEN english_hunspell
   WHEN german_hunspell IS NOT NULL THEN german_hunspell
   ELSE
 -- stem dictionaries can't be used for language detection
 english_stem UNION german_stem
END;
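
The intended evaluation order of this configuration can be modelled with
a short Python sketch. Everything in it is a toy assumption: the four
dictionaries are hypothetical stand-ins with hard-coded data, None plays
the role of NULL, and the helper names case_when, is_not_null and union
are invented for illustration.

```python
def case_when(branches, default):
    """branches: (condition, command) pairs tried in order; default is ELSE."""
    def configured(token):
        for condition, command in branches:
            if condition(token):
                return command(token)
        return default(token)
    return configured


def is_not_null(dictionary):
    # Models "dict IS NOT NULL": the dictionary recognized the token.
    return lambda token: dictionary(token) is not None


def union(*dictionaries):
    # Models UNION: merge lexemes from all dictionaries, without duplicates.
    def combined(token):
        lexemes = []
        for dictionary in dictionaries:
            for lexeme in dictionary(token) or []:
                if lexeme not in lexemes:
                    lexemes.append(lexeme)
        return lexemes
    return combined


# Toy dictionaries. The hunspell stand-ins return None for unknown words;
# the stemmer stand-ins accept every word, so they cannot detect a language.
def english_hunspell(token):
    return {"books": ["book"]}.get(token)


def german_hunspell(token):
    return {"bücher": ["buch"]}.get(token)


def english_stem(token):
    return [token[:-1] if token.endswith("s") else token]


def german_stem(token):
    return [token[:-2] if token.endswith("en") else token]


en_de_search = case_when(
    [
        (is_not_null(english_hunspell), english_hunspell),
        (is_not_null(german_hunspell), german_hunspell),
    ],
    union(english_stem, german_stem),
)

print(en_de_search("books"))   # ['book']
print(en_de_search("bücher"))  # ['buch']
print(en_de_search("tables"))  # ['table', 'tables']
```

The last call shows the ELSE branch: neither hunspell stand-in knows
"tables", so both stemmers run and their outputs are combined.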

And an example with unaccent:

ALTER TEXT SEARCH CONFIGURATION german_unaccent ADD MAPPING FOR
asciiword, word WITH
CASE
   WHEN german_hunspell IS NOT NULL THEN german_hunspell MAP BY unaccent
   ELSE
 german_stem MAP BY unaccent
END;

In the last example, the input for german_hunspell is replaced by the
output of unaccent if that output is not NULL. If a dictionary returns
more than one lexeme, each lexeme is processed independently.
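
A minimal Python sketch of the MAP BY semantics just described, under
stated assumptions: map_by, toy_unaccent and toy_german are hypothetical
names, None stands for NULL, and the case where the filter returns an
empty set is left unspecified here (the sketch simply yields None).

```python
def map_by(left, right):
    """Model `left MAP BY right`: feed the output of right into left."""
    def combined(token):
        filtered = right(token)
        # If the filter returns NULL, the original token is used unchanged.
        inputs = [token] if filtered is None else filtered
        result = None
        for item in inputs:
            # Each lexeme from the filter is processed independently.
            lexemes = left(item)
            if lexemes is not None:
                result = (result or []) + lexemes
        return result
    return combined


def toy_unaccent(token):
    # Returns None when there is nothing to change.
    stripped = token.translate(str.maketrans("äöüé", "aoue"))
    return [stripped] if stripped != token else None


def toy_german(token):
    # Toy stand-in for german_hunspell with a one-word vocabulary.
    return {"cafe": ["cafe"]}.get(token)


german_unaccent = map_by(toy_german, toy_unaccent)

print(german_unaccent("café"))  # ['cafe']
print(german_unaccent("cafe"))  # ['cafe']
print(german_unaccent("xyz"))   # None
```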

I'm not sure whether we should allow the MAP BY operator in a
condition, since MAP BY operates on sets and a condition is a boolean
expression. I think we could allow it with the restriction that it must
be placed inside parentheses followed by check-options. Something like:

(german_hunspell MAP BY unaccent) IS NOT NULL

This type of check can be useful in some situations, but we should
isolate the set-related subexpression.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

