Re: [HACKERS] unexpected result from to_tsvector

2016-03-30 Thread Shulgin, Oleksandr
On Wed, Mar 30, 2016 at 10:17 AM, Artur Zakirov 
wrote:

> On 29.03.2016 19:17, Shulgin, Oleksandr wrote:
>
>>
>> Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC,
>> there was quite a number of correcting and extending RFCs issued over
>> the last (almost) 30 years, which is not that surprising...
>>
>> Are we going to do something about it?  Is it likely that
>> relaxing/changing the rules on our side will break any possible
>> workarounds that people might have employed to make the search work like
>> they want it to work?
>>
>
> Do you mean here workarounds to recognize such values as 't...@123-reg.ro'
> as an email address? Actually I do not see any workarounds except a patch
> to PostgreSQL.
>

No, more like disallowing '_' in the host/domain names.  Anyway, that is
pure speculation on my part.

> By the way, Teodor committed the patch yesterday.


I've seen that after posting my reply to the list ;-)

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-30 Thread Artur Zakirov

On 29.03.2016 19:17, Shulgin, Oleksandr wrote:

> Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC,
> there was quite a number of correcting and extending RFCs issued over
> the last (almost) 30 years, which is not that surprising...
>
> Are we going to do something about it?  Is it likely that
> relaxing/changing the rules on our side will break any possible
> workarounds that people might have employed to make the search work like
> they want it to work?

Do you mean here workarounds to recognize such values as
't...@123-reg.ro' as an email address? Actually I do not see any
workarounds except a patch to PostgreSQL.

By the way, Teodor committed the patch yesterday.

> --
> Alex




--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected result from to_tsvector

2016-03-29 Thread Shulgin, Oleksandr
On Sun, Mar 20, 2016 at 3:42 PM, Tom Lane  wrote:

> "Shulgin, Oleksandr"  writes:
> > On Mar 20, 2016 01:09, "Dmitrii Golub"  wrote:
> >> Alex, actually subdomain can start with digit,
>
> > Not according to the RFC you have linked to.
>
> The powers-that-be relaxed that some time ago; I assume there's a newer
> RFC.  For instance, "163.com" is a real domain:
>
> $ dig 163.com
> ...
> ;; QUESTION SECTION:
> ;163.com.   IN  A
>

Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC,
there was quite a number of correcting and extending RFCs issued over the
last (almost) 30 years, which is not that surprising...

Are we going to do something about it?  Is it likely that relaxing/changing
the rules on our side will break any possible workarounds that people might
have employed to make the search work like they want it to work?

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-25 Thread Artur Zakirov

On 25.03.2016 19:15, David Steele wrote:
> On 3/25/16 12:14 PM, Artur Zakirov wrote:
>> On 25.03.2016 18:19, David Steele wrote:
>>> Hi Artur,
>>>
>>> On 3/20/16 10:42 AM, Tom Lane wrote:
>>>> "Shulgin, Oleksandr" writes:
>>>>> On Mar 20, 2016 01:09, "Dmitrii Golub" wrote:
>>>>>> Alex, actually subdomain can start with digit,
>>>>>
>>>>> Not according to the RFC you have linked to.
>>>>
>>>> The powers-that-be relaxed that some time ago; I assume there's a newer
>>>> RFC.  For instance, "163.com" is a real domain:
>>>
>>> You marked this patch "needs review" and then a few minutes later
>>> changed it to "waiting on author".
>>>
>>> If this was a mistake please change it back to "needs review".  If you
>>> really are working on a new patch when can we expect that?
>>>
>>> Thanks,
>>
>> Hi,
>>
>> The previous patch is the current one and can be committed.
>>
>> I marked this patch as "needs review" because I noticed that the patch
>> was marked as "waiting on author", and I thought that I had forgotten to
>> mark it as "needs review".
>>
>> But then I noticed that Robert Haas had marked the patch as "waiting on
>> author" after my answer, so I set it back to "waiting on author". But I
>> can't find any questions or comments addressed to me after my last answer.
>>
>> Actually I think that this patch should be marked as "needs review".
>
> Done.

Thank you!

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] unexpected result from to_tsvector

2016-03-25 Thread David Steele

On 3/25/16 12:14 PM, Artur Zakirov wrote:
> On 25.03.2016 18:19, David Steele wrote:
>> Hi Artur,
>>
>> On 3/20/16 10:42 AM, Tom Lane wrote:
>>> "Shulgin, Oleksandr" writes:
>>>> On Mar 20, 2016 01:09, "Dmitrii Golub" wrote:
>>>>> Alex, actually subdomain can start with digit,
>>>>
>>>> Not according to the RFC you have linked to.
>>>
>>> The powers-that-be relaxed that some time ago; I assume there's a newer
>>> RFC.  For instance, "163.com" is a real domain:
>>
>> You marked this patch "needs review" and then a few minutes later
>> changed it to "waiting on author".
>>
>> If this was a mistake please change it back to "needs review".  If you
>> really are working on a new patch when can we expect that?
>>
>> Thanks,
>
> Hi,
>
> The previous patch is the current one and can be committed.
>
> I marked this patch as "needs review" because I noticed that the patch
> was marked as "waiting on author", and I thought that I had forgotten to
> mark it as "needs review".
>
> But then I noticed that Robert Haas had marked the patch as "waiting on
> author" after my answer, so I set it back to "waiting on author". But I
> can't find any questions or comments addressed to me after my last answer.
>
> Actually I think that this patch should be marked as "needs review".

Done.

--
-David
da...@pgmasters.net




Re: [HACKERS] unexpected result from to_tsvector

2016-03-25 Thread Artur Zakirov

On 25.03.2016 18:19, David Steele wrote:
> Hi Artur,
>
> On 3/20/16 10:42 AM, Tom Lane wrote:
>> "Shulgin, Oleksandr" writes:
>>> On Mar 20, 2016 01:09, "Dmitrii Golub" wrote:
>>>> Alex, actually subdomain can start with digit,
>>>
>>> Not according to the RFC you have linked to.
>>
>> The powers-that-be relaxed that some time ago; I assume there's a newer
>> RFC.  For instance, "163.com" is a real domain:
>
> You marked this patch "needs review" and then a few minutes later
> changed it to "waiting on author".
>
> If this was a mistake please change it back to "needs review".  If you
> really are working on a new patch when can we expect that?
>
> Thanks,

Hi,

The previous patch is the current one and can be committed.

I marked this patch as "needs review" because I noticed that the patch
was marked as "waiting on author", and I thought that I had forgotten to
mark it as "needs review".

But then I noticed that Robert Haas had marked the patch as "waiting on
author" after my answer, so I set it back to "waiting on author". But I
can't find any questions or comments addressed to me after my last answer.

Actually I think that this patch should be marked as "needs review".

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] unexpected result from to_tsvector

2016-03-25 Thread David Steele

Hi Artur,

On 3/20/16 10:42 AM, Tom Lane wrote:
> "Shulgin, Oleksandr" writes:
>> On Mar 20, 2016 01:09, "Dmitrii Golub" wrote:
>>> Alex, actually subdomain can start with digit,
>>
>> Not according to the RFC you have linked to.
>
> The powers-that-be relaxed that some time ago; I assume there's a newer
> RFC.  For instance, "163.com" is a real domain:

You marked this patch "needs review" and then a few minutes later
changed it to "waiting on author".

If this was a mistake please change it back to "needs review".  If you
really are working on a new patch when can we expect that?

Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] unexpected result from to_tsvector

2016-03-20 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> On Mar 20, 2016 01:09, "Dmitrii Golub"  wrote:
>> Alex, actually subdomain can start with digit,

> Not according to the RFC you have linked to.

The powers-that-be relaxed that some time ago; I assume there's a newer
RFC.  For instance, "163.com" is a real domain:

$ dig 163.com
...
;; QUESTION SECTION:
;163.com.   IN  A

;; ANSWER SECTION:
163.com.        600     IN      A       123.58.180.8
163.com.        600     IN      A       123.58.180.7

;; AUTHORITY SECTION:
163.com.        4516    IN      NS      ns3.nease.net.
163.com.        4516    IN      NS      ns2.nease.net.
...

$ whois 163.com
...
Registry Registrant ID: 
Registrant Name: Domain Admin
Registrant Organization: Guangzhou NetEase Computer System Co., Ltd
Registrant Street: No. 16, Keyun Road, Tianhe District, 
Registrant City: Guangzhou
Registrant State/Province: Guangdong
Registrant Postal Code: 510665
Registrant Country: CN
Registrant Phone: +86.2085106370
Registrant Phone Ext: 
Registrant Fax: +86.2085106370
Registrant Fax Ext: 
Registrant Email: nsad...@corp.netease.com
...

regards, tom lane




Re: [HACKERS] unexpected result from to_tsvector

2016-03-20 Thread Shulgin, Oleksandr
On Mar 20, 2016 01:09, "Dmitrii Golub"  wrote:
>
> 2016-03-14 16:22 GMT+03:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>:
>>
>> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
>> (subdomain can't start with a digit), but since we already allow it, should
>> we not allow 123_yyy.zzz to be recognized as a Host?  Then why not
>> recognize aaa@123_yyy.zzz as an email address?
>>
>> Another option is to prohibit underscore in recognized host names, but
>> this has more breakage potential IMO.
>>
>
> Alex, actually subdomain can start with digit,

Not according to the RFC you have linked to.

> try it.

What do you mean? Try it with ts_debug()? I already did, you could see me
referring to this example above: 123-yyy.zzz

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-19 Thread Artur Zakirov
I found the discussion about allowing an underscore in emails 
http://www.postgresql.org/message-id/200908281359.n7sdxfaf044...@wwwmaster.postgresql.org


That bug report is about recognizing an underscore in the local part of
an email address, not in a domain name. But that patch also allows an
underscore in recognized host names.


I am not an expert on the RFCs, so here is an excerpt from Wikipedia
https://en.wikipedia.org/wiki/Email_address:



The local-part of the email address may use any of these ASCII characters:

- Uppercase and lowercase Latin letters (A–Z, a–z) (ASCII: 65–90, 97–122)
- Digits 0 to 9 (ASCII: 48–57)
- These special characters: !#$%&'*+-/=?^_`{|}~ (ASCII: 33, 35–39, 42, 43,
  45, 47, 61, 63, 94–96, 123–126)
- Character . (dot, period, full stop), ASCII 46, provided that it is not
  the first or last character, and provided also that it does not appear
  consecutively (e.g. john.@example.com is not allowed).
- Other special characters are allowed with restrictions (they are only
  allowed inside a quoted string, as described in the paragraph below, and
  in addition, a backslash or double-quote must be preceded by a
  backslash). These characters are:
  Space and "(),:;<>@[\] (ASCII: 32, 34, 40, 41, 44, 58, 59, 60, 62, 64,
  91–93)
- Comments are allowed with parentheses at either end of the local part;
  e.g. john.smith(comment)@example.com and
  (comment)john.sm...@example.com are both equivalent to
  john.sm...@example.com.
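
For illustration, the unquoted local-part rules in the excerpt above could
be checked roughly like this in C. This is only a sketch: the function
names are made up for the example, quoted strings and comments are not
handled, and it is not how the PostgreSQL text search parser works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c may appear in an unquoted local part
 * (letters, digits and the special characters listed in the excerpt;
 * the dot is handled separately by the caller). */
bool
is_local_part_char(char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return true;
    if (c >= '0' && c <= '9')
        return true;
    /* strchr() would also match the terminating NUL, so exclude it */
    return c != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL;
}

/* Hypothetical helper: checks an unquoted local part.  A dot must not be
 * the first or last character and must not appear twice in a row. */
bool
is_valid_unquoted_local_part(const char *s)
{
    size_t len;
    size_t i;

    assert(s != NULL);
    len = strlen(s);
    if (len == 0 || s[0] == '.' || s[len - 1] == '.')
        return false;

    for (i = 0; i < len; i++)
    {
        if (s[i] == '.')
        {
            /* s[i + 1] is at worst the terminating NUL */
            if (s[i + 1] == '.')
                return false;
        }
        else if (!is_local_part_char(s[i]))
            return false;
    }
    return true;
}
```

With these rules "john.smith" and "a_b+c" pass, while "john." and
"john..smith" are rejected.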


and https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names


The Internet standards (Requests for Comments) for protocols mandate that 
component hostname labels may contain only the ASCII letters 'a' through 'z' 
(in a case-insensitive manner), the digits '0' through '9', and the hyphen 
('-'). The original specification of hostnames in RFC 952, mandated that labels 
could not start with a digit or with a hyphen, and must not end with a hyphen. 
However, a subsequent specification (RFC 1123) permitted hostname labels to 
start with digits. No other symbols, punctuation characters, or white space are 
permitted.
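
To make the difference between the two specifications concrete, here is a
minimal C sketch of a label check based only on the rules quoted above.
The function names and the allow_leading_digit flag are assumptions made
for this example; length limits are not checked, and this is not the logic
used in wparser_def.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII-only letter or digit test, independent of
 * the current locale. */
bool
is_ascii_alnum(char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: checks one hostname label (the text between dots).
 * allow_leading_digit = false follows the original RFC 952 rule,
 * allow_leading_digit = true follows the RFC 1123 relaxation. */
bool
is_valid_host_label(const char *label, bool allow_leading_digit)
{
    size_t len;
    size_t i;

    assert(label != NULL);
    len = strlen(label);
    if (len == 0)
        return false;

    /* a label must not start or end with a hyphen */
    if (label[0] == '-' || label[len - 1] == '-')
        return false;

    /* RFC 952 additionally forbade a leading digit */
    if (!allow_leading_digit && label[0] >= '0' && label[0] <= '9')
        return false;

    /* only letters, digits and the hyphen are permitted */
    for (i = 0; i < len; i++)
    {
        if (!is_ascii_alnum(label[i]) && label[i] != '-')
            return false;
    }
    return true;
}
```

Under these rules the label "123-reg" is accepted only with the RFC 1123
relaxation, and "123_reg" is rejected either way because of the underscore.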


Hence the following emails are valid (I might be wrong):

12...@sample.com
12...@sample.com
1...@123-sample.com
1...@123sample.com

The attached patch allows them to be recognized as emails. But this
patch does not prohibit underscores in recognized host names.


As a result this patch gives the following results with underscores:

=# select * from ts_debug('simple', 'aaa@123_yyy.zzz');
 alias |  description  |      token      | dictionaries | dictionary |      lexemes
-------+---------------+-----------------+--------------+------------+-------------------
 email | Email address | aaa@123_yyy.zzz | {simple}     | simple     | {aaa@123_yyy.zzz}
(1 row)

=# select * from ts_debug('simple', '123_yyy.zzz');
 alias | description |    token    | dictionaries | dictionary |    lexemes
-------+-------------+-------------+--------------+------------+---------------
 host  | Host        | 123_yyy.zzz | {simple}     | simple     | {123_yyy.zzz}
(1 row)

On 14.03.2016 17:45, Artur Zakirov wrote:
> On 14.03.2016 16:22, Shulgin, Oleksandr wrote:
>>
>> Hm...  now that doesn't look all that consistent to me (after applying
>> the patch):
>>
>> =# select ts_debug('simple', 'a...@123-yyy.zzz');
>>   ts_debug
>> ---
>>   (email,"Email address",a...@123-yyy.zzz,{simple},simple,{a...@123-yyy.zzz})
>> (1 row)
>>
>> But:
>>
>> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
>>  ts_debug
>> -
>>   (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
>>   (blank,"Space symbols",@,{},,)
>>   (uint,"Unsigned integer",123,{simple},simple,{123})
>>   (blank,"Space symbols",_,{},,)
>>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
>> (5 rows)
>>
>> One can also see that if we only keep the domain name, the result is
>> similar:
>>
>> =# select ts_debug('simple', '123-yyy.zzz');
>> ts_debug
>> ---
>>   (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
>> (1 row)
>>
>> =# select ts_debug('simple', '123_yyy.zzz');
>>    ts_debug
>> -
>>   (uint,"Unsigned integer",123,{simple},simple,{123})
>>   (blank,"Space symbols",_,{},,)
>>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
>> (3 rows)
>>
>> But, this only has to do with 123 being recognized as a number, not with
>> the underscore:
>>
>> =# select ts_debug('simple', 'abc_yyy.zzz');
>> ts_debug
>> ---
>>   (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
>> (1 row)
>>
>> =# select ts_debug('simple', '1abc_yyy.zzz');
>> ts_debug
>> ---
>>   (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
>> (1 row)
>>
>> In fact, the 123-yyy.zzz domain is not valid either according to the RFC

Re: [HACKERS] unexpected result from to_tsvector

2016-03-19 Thread Dmitrii Golub
2016-03-14 16:22 GMT+03:00 Shulgin, Oleksandr:

> On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov 
> wrote:
>
>> Hello,
>>
>> On 07.03.2016 23:55, Dmitrii Golub wrote:
>>
>>>
>>>
>>> Hello,
>>>
>>> Should we add tests for this case?
>>>
>>
>> I think we should. I have added tests for teo...@123-stack.net and
>> 1...@stack.net emails.
>>
>>
>>> 123_reg.ro is not a valid domain name, because of the
>>> symbol "_"
>>>
>>> https://tools.ietf.org/html/rfc1035 page 8.
>>>
>>> Dmitrii Golub
>>>
>>
>> Thank you for the information. Fixed.
>
>
> Hm...  now that doesn't look all that consistent to me (after applying the
> patch):
>
> =# select ts_debug('simple', 'a...@123-yyy.zzz');
>  ts_debug
> ---
>  (email,"Email address",a...@123-yyy.zzz,{simple},simple,{a...@123-yyy.zzz})
> (1 row)
>
> But:
>
> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
> ts_debug
> -
>  (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
>  (blank,"Space symbols",@,{},,)
>  (uint,"Unsigned integer",123,{simple},simple,{123})
>  (blank,"Space symbols",_,{},,)
>  (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (5 rows)
>
> One can also see that if we only keep the domain name, the result is
> similar:
>
> =# select ts_debug('simple', '123-yyy.zzz');
>ts_debug
> ---
>  (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '123_yyy.zzz');
>   ts_debug
> -
>  (uint,"Unsigned integer",123,{simple},simple,{123})
>  (blank,"Space symbols",_,{},,)
>  (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (3 rows)
>
> But, this only has to do with 123 being recognized as a number, not with
> the underscore:
>
> =# select ts_debug('simple', 'abc_yyy.zzz');
>ts_debug
> ---
>  (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '1abc_yyy.zzz');
>ts_debug
> ---
>  (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
> (1 row)
>
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
> (subdomain can't start with a digit), but since we already allow it, should
> we not allow 123_yyy.zzz to be recognized as a Host?  Then why not
> recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but
> this has more breakage potential IMO.
>
> --
> Alex
>
>
Alex, actually subdomain can start with digit, try it.


Re: [HACKERS] unexpected result from to_tsvector

2016-03-14 Thread Artur Zakirov

On 14.03.2016 16:22, Shulgin, Oleksandr wrote:
>
> Hm...  now that doesn't look all that consistent to me (after applying
> the patch):
>
> =# select ts_debug('simple', 'a...@123-yyy.zzz');
>   ts_debug
> ---
>   (email,"Email address",a...@123-yyy.zzz,{simple},simple,{a...@123-yyy.zzz})
> (1 row)
>
> But:
>
> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
>  ts_debug
> -
>   (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
>   (blank,"Space symbols",@,{},,)
>   (uint,"Unsigned integer",123,{simple},simple,{123})
>   (blank,"Space symbols",_,{},,)
>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (5 rows)
>
> One can also see that if we only keep the domain name, the result is
> similar:
>
> =# select ts_debug('simple', '123-yyy.zzz');
> ts_debug
> ---
>   (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '123_yyy.zzz');
>    ts_debug
> -
>   (uint,"Unsigned integer",123,{simple},simple,{123})
>   (blank,"Space symbols",_,{},,)
>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (3 rows)
>
> But, this only has to do with 123 being recognized as a number, not with
> the underscore:
>
> =# select ts_debug('simple', 'abc_yyy.zzz');
> ts_debug
> ---
>   (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '1abc_yyy.zzz');
> ts_debug
> ---
>   (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
> (1 row)
>
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
> (subdomain can't start with a digit), but since we already allow it,
> should we not allow 123_yyy.zzz to be recognized as a Host?  Then why
> not recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but
> this has more breakage potential IMO.
>
> --
> Alex



It seems reasonable to me. I prefer the first option. But I am not
confident that we should allow 123_yyy.zzz to be recognized as a Host.


By the way, in this question http://webmasters.stackexchange.com/a/775
you can see examples of domain names with numbers (but not subdomains).


If there are no objections from others, I will send a new patch later
today or tomorrow that recognizes 123_yyy.zzz.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] unexpected result from to_tsvector

2016-03-14 Thread Shulgin, Oleksandr
On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov 
wrote:

> Hello,
>
> On 07.03.2016 23:55, Dmitrii Golub wrote:
>
>>
>>
>> Hello,
>>
>> Should we add tests for this case?
>>
>
> I think we should. I have added tests for teo...@123-stack.net and
> 1...@stack.net emails.
>
>
>> 123_reg.ro is not a valid domain name, because of the
>> symbol "_"
>>
>> https://tools.ietf.org/html/rfc1035 page 8.
>>
>> Dmitrii Golub
>>
>
> Thank you for the information. Fixed.


Hm...  now that doesn't look all that consistent to me (after applying the
patch):

=# select ts_debug('simple', 'a...@123-yyy.zzz');
 ts_debug
---
 (email,"Email address",a...@123-yyy.zzz,{simple},simple,{a...@123-yyy.zzz})
(1 row)

But:

=# select ts_debug('simple', 'aaa@123_yyy.zzz');
ts_debug
-
 (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
 (blank,"Space symbols",@,{},,)
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(5 rows)

One can also see that if we only keep the domain name, the result is
similar:

=# select ts_debug('simple', '123-yyy.zzz');
   ts_debug
---
 (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
(1 row)

=# select ts_debug('simple', '123_yyy.zzz');
  ts_debug
-
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(3 rows)

But, this only has to do with 123 being recognized as a number, not with
the underscore:

=# select ts_debug('simple', 'abc_yyy.zzz');
   ts_debug
---
 (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
(1 row)

=# select ts_debug('simple', '1abc_yyy.zzz');
   ts_debug
---
 (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
(1 row)

In fact, the 123-yyy.zzz domain is not valid either according to the RFC
(subdomain can't start with a digit), but since we already allow it, should
we not allow 123_yyy.zzz to be recognized as a Host?  Then why not
recognize aaa@123_yyy.zzz as an email address?

Another option is to prohibit underscore in recognized host names, but this
has more breakage potential IMO.

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-10 Thread Dmitrii Golub
2016-03-08 0:46 GMT+03:00 Artur Zakirov :

> Hello,
>
> On 07.03.2016 23:55, Dmitrii Golub wrote:
>
>>
>>
>> Hello,
>>
>> Should we add tests for this case?
>>
>
> I think we should. I have added tests for teo...@123-stack.net and
> 1...@stack.net emails.
>
>
>> 123_reg.ro is not a valid domain name, because of the
>> symbol "_"
>>
>> https://tools.ietf.org/html/rfc1035 page 8.
>>
>> Dmitrii Golub
>>
>
> Thank you for the information. Fixed.
>
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>

Looks good to me

Dmitrii Golub


Re: [HACKERS] unexpected result from to_tsvector

2016-03-07 Thread Artur Zakirov

Hello,

On 07.03.2016 23:55, Dmitrii Golub wrote:
>
> Hello,
>
> Should we add tests for this case?

I think we should. I have added tests for teo...@123-stack.net and
1...@stack.net emails.

> 123_reg.ro is not a valid domain name, because of the
> symbol "_"
>
> https://tools.ietf.org/html/rfc1035 page 8.
>
> Dmitrii Golub

Thank you for the information. Fixed.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/src/backend/tsearch/wparser_def.c
--- b/src/backend/tsearch/wparser_def.c
***
*** 1121,1126  static const TParserStateActionItem actionTPS_InUnsignedInt[] = {
--- 1121,1128 
  	{p_iseqC, '.', A_PUSH, TPS_InUDecimalFirst, 0, NULL},
  	{p_iseqC, 'e', A_PUSH, TPS_InMantissaFirst, 0, NULL},
  	{p_iseqC, 'E', A_PUSH, TPS_InMantissaFirst, 0, NULL},
+ 	{p_iseqC, '-', A_PUSH, TPS_InHostFirstAN, 0, NULL},
+ 	{p_iseqC, '@', A_PUSH, TPS_InEmail, 0, NULL},
  	{p_isasclet, 0, A_PUSH, TPS_InHost, 0, NULL},
  	{p_isalpha, 0, A_NEXT, TPS_InNumWord, 0, NULL},
  	{p_isspecial, 0, A_NEXT, TPS_InNumWord, 0, NULL},
*** a/src/test/regress/expected/tsearch.out
--- b/src/test/regress/expected/tsearch.out
***
*** 264,270  SELECT * FROM ts_token_type('default');
  23 | entity  | XML entity
  (23 rows)
  
! SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
   tokid |token 
--- 264,270 
  23 | entity  | XML entity
  (23 rows)
  
! SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net teo...@123-stack.net 1...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
   tokid |token 
***
*** 332,337  SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.w
--- 332,341 
  12 |  
   4 | teo...@stack.net
  12 |  
+  4 | teo...@123-stack.net
+ 12 |  
+  4 | 1...@stack.net
+ 12 |  
  16 | qwe-wer
  11 | qwe
  12 | -
***
*** 404,425  SELECT * FROM ts_parse('default', '345 qwe@efd.r '' http://www.com/ http://aew.w
  12 |  
  12 | <> 
   1 | qwerty
! (133 rows)
  
! SELECT to_tsvector('english', '345 qwe@efd.r '' http://www.com/ http://aew.werc.ewr/?ad=qwe 1aew.werc.ewr/?ad=qwe 2aew.werc.ewr http://3aew.werc.ewr/?ad=qwe http://4aew.werc.ewr http://5aew.werc.ewr:8100/?  ad=qwe 6aew.werc.ewr:8100/?ad=qwe 7aew.werc.ewr:8100/?ad=qwe=%20%32 +4.0e-10 qwe qwe qwqwe 234.435 455 5.005 teo...@stack.net qwe-wer asdf qwer jf sdjk ewr1> ewri2 
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
   wow  < jqw <> qwerty');
!to_tsvector
! 

Re: [HACKERS] unexpected result from to_tsvector

2016-03-07 Thread Dmitrii Golub
2016-02-23 20:53 GMT+03:00 Artur Zakirov :

> Hello,
>
> Here is a little patch. It fixes this issue
> http://www.postgresql.org/message-id/20160217080048.26357.49...@wrigleys.postgresql.org
>
> Without the patch we get a wrong result for the second email
> 't...@123-reg.ro':
>
> => SELECT * FROM ts_debug('simple', 't...@vauban-reg.ro');
>  alias |  description  |       token        | dictionaries | dictionary |       lexemes
> -------+---------------+--------------------+--------------+------------+----------------------
>  email | Email address | t...@vauban-reg.ro | {simple}     | simple     | {t...@vauban-reg.ro}
> (1 row)
>
> => SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
>    alias   |   description    | token  | dictionaries | dictionary | lexemes
> -----------+------------------+--------+--------------+------------+----------
>  asciiword | Word, all ASCII  | test   | {simple}     | simple     | {test}
>  blank     | Space symbols    | @      | {}           |            |
>  uint      | Unsigned integer | 123    | {simple}     | simple     | {123}
>  blank     | Space symbols    | -      | {}           |            |
>  host      | Host             | reg.ro | {simple}     | simple     | {reg.ro}
> (5 rows)
>
> After the patch we get the correct result for the second email:
>
> => SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
>  alias |  description  |      token      | dictionaries | dictionary |      lexemes
> -------+---------------+-----------------+--------------+------------+-------------------
>  email | Email address | t...@123-reg.ro | {simple}     | simple     | {t...@123-reg.ro}
> (1 row)
>
> This patch allows the parser to handle the emails 't...@123-reg.ro',
> '1...@123-reg.ro' and 'test@123_reg.ro' correctly.
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>
>
>
>
Hello,

Should we add tests for this case?

123_reg.ro is not a valid domain name, because of the symbol "_"

https://tools.ietf.org/html/rfc1035 page 8.

Dmitrii Golub