from:"Teodor Sigaev"

Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-25 Thread Teodor Sigaev



It's not clear from the web site in question that the relevant code is
released under the PostgreSQL license.



As author I allow to use this code in PostgreSQL and under its license.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Double shared memory allocation for SLRU LWLocks

2017-07-12 Thread Teodor Sigaev


It seems to me that we're allocating shared memory for SLRU lwlocks twice,
unless I'm missing something.

I think you are right.

Did you check previous versions? i.e. should it be applyed to previous 
branches?? I suppose yes, to minimize code difference.


Also I'd like an idea to add Assert(offset <= SimpleLruShmemSize(nslots, nlsns)) 
at the end of SimpleLruInit()




SimpleLruShmemSize() calculates total SLRU shared memory size including lwlocks
size.

SimpleLruInit() starts with line

shared = (SlruShared) ShmemInitStruct(name,
  SimpleLruShmemSize(nslots, nlsns),
  );

which allocates SLRU shared memory (LWLocks size is included because
SimpleLruShmemSize() is used for size computation).

Following line allocates shared memory for LWLocks again:
shared->buffer_locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * 
nslots);

Attached patch fixes that by removing extra ShmemAlloc for SLRU.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Suspicious place in heap_prepare_freeze_tuple()

2017-07-05 Thread Teodor Sigaev


Hi!

Playing around freezing tuple I found suspicious piece of code:

heap_prepare_freeze_tuple():
...
frz->t_infomask = tuple->t_infomask;
...
frz->t_infomask &= ~HEAP_XMAX_BITS;
frz->xmax = newxmax;
if (flags & FRM_MARK_COMMITTED)
frz->t_infomask &= HEAP_XMAX_COMMITTED;

Seems, in last line it should be a bitwise OR instead of AND. Now this line 
cleans all bits in t_infomask which later will be copied directly in tuple.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8deb344d09..ec227bac80 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6639,7 +6639,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			frz->t_infomask &= ~HEAP_XMAX_BITS;
 			frz->xmax = newxmax;
 			if (flags & FRM_MARK_COMMITTED)
-frz->t_infomask &= HEAP_XMAX_COMMITTED;
+frz->t_infomask |= HEAP_XMAX_COMMITTED;
 			changed = true;
 			totally_frozen = false;
 		}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pluggable storage

2017-06-23 Thread Teodor Sigaev


1. Table AM with a 6-byte TID.
2. Table AM with a custom locator format, which could be TID-like.
3. Table AM with no locators.


Currently TID has its own type in system catalog. Seems, it's possible that 
storage claims type of TID which it uses. Then, AM could claim it too, so the 
core based on that information could solve the question about AM-storage 
compatibility. Storage could also claim that it hasn't TID type at all so it 
couldn't be used with any access method, use case: compressed column oriented 
storage.


As I remeber, only GIN depends on TID format, other indexes use it as opaque 
type. Except, at least, btree and GiST - they believ that internal pointers are 
the same as outer (to heap)


Another dubious part - BitmapScan.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance bug in v10

2017-06-02 Thread Teodor Sigaev


BTW, was the larger query plan that you showed (with a Materialize node)
generated by 9.6, or v10 HEAD?  Because I would be surprised if 9.6 did

v10,
commit acbd8375e954774181b673a31b814e9d46f436a5
Author: Magnus Hagander <mag...@hagander.net>
Date:   Fri Jun 2 11:18:24 2017 +0200

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance bug in v10

2017-06-02 Thread Teodor Sigaev


There were old threads about considering a risk factor when estimating
plans, and I'm thinking this issue is the planner failing to do
exactly that.


I'm afraid it's tool late for v10

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance bug in v10

2017-06-02 Thread Teodor Sigaev



Teodor, could you check if this patch fixes your real-world problem?


It works fine with original query, thank you. But some other query slowdowns for 
~10% (9 secs vs 10 secs). Look at following part of plans of huge query:


without patch:
->  Nested Loop  (cost=34.82..50.91 rows=1 width=20)
 (actual time=0.017..0.061 rows=5 loops=24121)
  ->  ...
  ->  Materialize  (cost=0.56..15.69 rows=1 width=5)
   (actual time=0.003..0.004 rows=2 loops=109061)
->  Index Scan using ... (cost=0.56..15.68 rows=1 width=5)
 (actual time=0.013..0.014 rows=2 loops=24121)

with patch:
->  Nested Loop  (cost=34.82..50.91 rows=1 width=20)
 (actual time=0.018..0.063 rows=5 loops=24121)
  -> ...
  ->  Index Scan using ...  (cost=0.56..15.68 rows=1 width=5)
(actual time=0.012..0.013 rows=2 loops=109061)

(dots hidden the same parts)

As you said, it removes Materialize node, although it's useful here.

If you wish, I can do a test suite, its size will be around 10MB and send it by 
private  email.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Perfomance bug in v10

2017-06-01 Thread Teodor Sigaev


Thank you for the answer!



This is all caused by get_variable_numdistinct() deciding that all
values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that
if the example is increased to use 300 tuples instead of 32, then
that's enough for the planner to estimate 2 rows instead of clamping
to 1, and the bad plan does not look so good anymore since the planner
predicts that those nested loops need to be executed more than once.
I miss here why could the presence of index influence on that? removing 
index causes a good plan although it isn't used in both plans .




I really think the planner is too inclined to take risks by nesting
Nested loops like this, but I'm not all that sure the best solution to
fix this, and certainly not for beta1.

So, I'm a bit unsure exactly how best to deal with this.  It seems
like we'd better make some effort, as perhaps this could be a case
that might occur when temp tables are used and not ANALYZED, but the
only way I can think to deal with it is not to favour unique inner
nested loops in the costing model.  The unfortunate thing about not
doing this is that the planner will no longer swap the join order of a
2-way join to put the unique rel on the inner side. This is evident by
the regression test failures caused by patching with the attached,
which changes the cost model for nested loops back to what it was
before unique joins.
The patch, seems, works for this particular case, but loosing swap isn't 
good thing, I suppose.




My other line of thought is just not to bother doing anything about
this. There's plenty more queries you could handcraft to trick the
planner into generating a plan that'll blow up like this. Is this a
realistic enough one to bother accounting for? Did it come from a real
world case? else, how did you stumble upon it?


Unfortunately, it's taken from real application.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Perfomance bug in v10

2017-05-31 Thread Teodor Sigaev


Hi!

I found an example where v10 chooses extremely non-optimal plan:
select
i::int as a,
i::int + 1 as b,
0 as c
into t
from
generate_series(1,32) as i;

create unique index i on t (c, a);

explain analyze
SELECT
t1.a, t1.b,
t2.a, t2.b,
t3.a, t3.b,
t4.a, t4.b,
t5.a, t5.b,
t6.a, t6.b
/*
,
t7.a, t7.b,
t8.a, t8.b,
t9.a, t9.b,
t10.a, t10.b
*/
FROM t T1
LEFT OUTER JOIN t T2
ON T1.b = T2.a AND T2.c = 0
LEFT OUTER JOIN t T3
ON T2.b = T3.a AND T3.c = 0
LEFT OUTER JOIN t T4
ON T3.b = T4.a AND T4.c = 0
LEFT OUTER JOIN t T5
ON T4.b = T5.a AND T5.c = 0
LEFT OUTER JOIN t T6
ON T5.b = T6.a AND T6.c = 0
LEFT OUTER JOIN t T7
ON T6.b = T7.a AND T7.c = 0
LEFT OUTER JOIN t T8
ON T7.b = T8.a AND T8.c = 0
LEFT OUTER JOIN t T9
ON T8.b = T9.a AND T9.c = 0
LEFT OUTER JOIN t T10
ON T9.b = T10.a AND T10.c = 0
WHERE T1.c = 0 AND T1.a = 5
;

It takes 4 seconds on my laptop, uncommenting commented lines causes run 
forever. analyzing table or removing index reduces execution time to 
milliseconds regardless on commented or uncommented lines.


The commit
commit 9c7f5229ad68d7e0e4dd149e3f80257893e404d4
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   Fri Apr 7 22:20:03 2017 -0400

Optimize joins when the inner relation can be proven unique.

seems a root this problem - before it the query takes milliseconds. In 
attachment there is a output of explain analyze with commented lines, my 
attention was attracted by a huge number of loops:


 ->  Materialize  (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 
rows=17 loops=1048576)




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
Timing is on.
DROP TABLE
Time: 5,268 ms
SELECT 32
Time: 5,515 ms
CREATE INDEX
Time: 14,971 ms
QUERY PLAN  
   
---
 Nested Loop Left Join  (cost=0.00..8.55 rows=1 width=48) (actual 
time=818.219..4159.317 rows=1 loops=1)
   Join Filter: (t1.b = t2.a)
   Rows Removed by Join Filter: 31
   ->  Seq Scan on t t1  (cost=0.00..1.48 rows=1 width=8) (actual 
time=0.026..0.032 rows=1 loops=1)
 Filter: ((c = 0) AND (a = 5))
 Rows Removed by Filter: 31
   ->  Nested Loop Left Join  (cost=0.00..7.06 rows=1 width=40) (actual 
time=10.588..4159.270 rows=32 loops=1)
 Join Filter: (t2.b = t3.a)
 Rows Removed by Join Filter: 993
 ->  Seq Scan on t t2  (cost=0.00..1.40 rows=1 width=8) (actual 
time=0.008..0.020 rows=32 loops=1)
   Filter: (c = 0)
 ->  Nested Loop Left Join  (cost=0.00..5.65 rows=1 width=32) (actual 
time=0.142..129.970 rows=32 loops=32)
   Join Filter: (t3.b = t4.a)
   Rows Removed by Join Filter: 993
   ->  Seq Scan on t t3  (cost=0.00..1.40 rows=1 width=8) (actual 
time=0.002..0.010 rows=32 loops=32)
 Filter: (c = 0)
   ->  Nested Loop Left Join  (cost=0.00..4.23 rows=1 width=24) 
(actual time=0.007..4.055 rows=32 loops=1024)
 Join Filter: (t4.b = t5.a)
 Rows Removed by Join Filter: 993
 ->  Seq Scan on t t4  (cost=0.00..1.40 rows=1 width=8) 
(actual time=0.002..0.010 rows=32 loops=1024)
   Filter: (c = 0)
 ->  Nested Loop Left Join  (cost=0.00..2.82 rows=1 
width=16) (actual time=0.003..0.121 rows=32 loops=32768)
   Join Filter: (t5.b = t6.a)
   Rows Removed by Join Filter: 528
   ->  Seq Scan on t t5  (cost=0.00..1.40 rows=1 
width=8) (actual time=0.002..0.009 rows=32 loops=32768)
 Filter: (c = 0)
   ->  Materialize  (cost=0.00..1.40 rows=1 width=8) 
(actual time=0.000..0.001 rows=17 loops=1048576)
 ->  Seq Scan on t t6  (cost=0.00..1.40 rows=1 
width=8) (actual time=0.008..0.031 rows=32 loops=1)
   Filter: (c = 0)
 Planning time: 3.316 ms
 Execution time: 4159.596 ms
(31 rows)

Time: 4165,372 ms (00:04,165)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-29 Thread Teodor Sigaev


Really, the way to fix Teodor's complaint is to recognize that the
semijoin inner rel is effectively unique against the whole outer rel,
and then strength-reduce the semijoin to a plain join.  The infrastructure
we built for unique joins is capable of proving that, we just weren't
applying it in the right way.
Perfect, it works. Thank you! Second patch reduces time of full query (my 
example was just a small part) from 20 minutes to 20 seconds.



I'm kind of strongly tempted to apply the second patch; but it would
be fair to complain that reduce_unique_semijoins() is new development
and should wait for v11.  Opinions?


Obviously, I'm voting for second patch applied to version 10.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev




Ah, thanks for the clue about enable_hashjoin, because it wasn't
reproducing for me as stated.

I missed tweaked config, sorry

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev


Both 9.6 and 10devel are affected to addiction  of query result on seqscan
variable.
Oops, I was too nervious, 9.6 is not affected to enable_seqscan setting. But it 
doesn't push down condition too.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev


Hi!

Seems, there two issues:

1) Sometime conditions which for a first glance could be pushed down to scan are 
leaved as join quals. And it could be a ~30 times performance loss.


2) Number of query result depend on enabe_seqscan variable.

The query
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

produces following plan:
 Nested Loop  (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 
rows=162 loops=1)

   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 width=66) 
(actual time=0.100..0.103 rows=1 loops=1)

 Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
 Heap Fetches: 1
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual 
time=0.370..120.971 rows=70530 loops=1)

(1)  Hash Cond: (t3.id2 = t2.id)
(2)  ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual 
time=0.017..27.424 rows=70530 loops=1)
 ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual 
time=0.273..0.273 rows=184 loops=1)

   Buckets: 1024  Batches: 1  Memory Usage: 20kB
   ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual 
time=0.017..0.105 rows=184 loops=1)

 Planning time: 7.326 ms
 Execution time: 149.115 ms


Condition (1) is not pushed to scan (2) which seemsly could be safely moved. 
With seqscan = off condition is not pushed too but query returns only one row 
instead of 162. Scan on t3 returns ~7 rows but only ~150 rows are really 
needed. I didn't found a combination of GUCs enable_* to push down that and it 
seems to me there is reason for that which I don't see or support is somehow missed.


If pair of (t3.id1, t3.id2) is unique (see dump, there is a unique index on 
them) the query could be directly rewrited to inner join and its plan is:
 Nested Loop  (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 
rows=162 loops=1)
   ->  Nested Loop  (cost=9.43..292.77 rows=25 width=99) (actual 
time=0.316..0.645 rows=162 loops=1)
 ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 
width=66) (actual time=0.047..0.050 rows=1 loops=1)

   Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
   Heap Fetches: 1
 ->  Bitmap Heap Scan on t3  (cost=9.15..283.53 rows=94 width=66) 
(actual time=0.257..0.426 rows=162 loops=1)

   Recheck Cond: (id1 = t1.id)
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on t3i1  (cost=0.00..9.12 rows=94 width=0) 
(actual time=0.186..0.186 rows=162 loops=1)

 Index Cond: (id1 = t1.id)
   ->  Index Only Scan using t2i1 on t2  (cost=0.27..0.29 rows=1 width=33) 
(actual time=0.024..0.024 rows=1 loops=162)

 Index Cond: (id = t3.id2)
 Heap Fetches: 162
 Planning time: 5.532 ms
 Execution time: 5.457 ms

Second plan is ~30 times faster. But with turned  off sequentual scan the first 
query is not work correctly, which points to some bug in planner, I suppose. 
Both 9.6 and 10devel are affected to addiction  of query result on seqscan variable.



Dump to reproduce (subset of real data but obfucated), queries are in attachment
http://sigaev.ru/misc/exists_to_nested.sql.gz
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
--query returns 162 rows
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

set enable_seqscan=off;

--the same query returns only one row!
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

explain analyze
SELECT
t1.*
FROM
t1, t2, t3
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd' AND
t3.id1 = t1.

Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Teodor Sigaev

I had a look on patch and played with it, seems, it looks fine. I splitted it to 
two patches: core changes (+bloom index fix) and btree itself. All docs are left 
in first patch - I'm too lazy to rewrite documentation which is changed in 
second patch.

Any objection from reviewers to push both patches?


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


0001-covering-core.patch
Description: binary/octet-stream


0002-covering-btree.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Teodor Sigaev


-   IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
+   IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P 
INCLUDE

I think your syntax would read no worse, possibly even better, if you
just used the existing INCLUDING keyword.
It was a discussion in this thread about naming and both databases, which 
support covering indexes, use INCLUDE keyword.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-29 Thread Teodor Sigaev


Thank you, pushed

Excellent! I've attached a new (and hopefully final)
version of the patch.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS

2017-03-28 Thread Teodor Sigaev


Thank you, pushed

Matheus de Oliveira wrote:


On Thu, Mar 2, 2017 at 10:27 AM, David Steele <da...@pgmasters.net
<mailto:da...@pgmasters.net>> wrote:

It looks like this patch is still waiting on an update for tab
completion in psql.


Hi All,

Sorry about the long delay... It was so simple to add it to tab-complete.c that
is a shame I didn't do it before, very sorry about that.

Attached the new version of the patch that is basically the same as previously
with the addition to tab completion for psql and rebased with master.

Hope it is enough. Thank you all.

--
Matheus de Oliveira







--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev


Thank you, pushed

Michael Paquier wrote:

On Fri, Mar 24, 2017 at 11:36 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.



Thank you. One more question: what about symlinks? If DBA moves, for
example, pg_xact to another dist and leaves the symlink in data directoty.
Suppose, fsync on symlink will do nothing actually.


I did not think of this case, but is that really common? There is even
no option to configure that at command level. And surely we cannot
support any fancy scenario that people figure out using PGDATA.
Existing callers of fsync_fname don't bother about this case as well
by the way, take the calls related to pg_logical and pg_repslot.

If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev


Thank you. One more question: what about symlinks? If DBA moves, for
example, pg_xact to another dist and leaves the symlink in data directoty.
Suppose, fsync on symlink will do nothing actually.


I did not think of this case, but is that really common? There is even

I saw a lot such cases.


If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.

Agree

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-24 Thread Teodor Sigaev


No, it is really needed so that the lag measure is correct.

Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Teodor Sigaev


Sorry, 1) and 4) is my fault, comment in hsearch.h:
* ... The hash key
* is expected to be at the start of the caller's hash entry data structure.

Ops, forgot that.

Teodor Sigaev wrote:

things in order I'm attaching the previous patch as well.


Patches look good, but I have some notices:

1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never
used for read, so entry for hash could be just a pointer to PgStat_TableStatus.

2 step1 In pgstat_report_stat() you remove one by one entries from hash and
remove them all. Isn't it better to hash_destroy/hash_create or even let hash
lives in separate memory context and just resets it?

3 step1 Again, pgstat_report_stat(), all-zero entries aren't deleted from hash
although they will be free from point of view of pgStatTabList.


4 step 2. The same as 1) about SeenRelsEntry->rel_id but it even isn't
initialized anywhere.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Teodor Sigaev


things in order I'm attaching the previous patch as well.


Patches look good, but I have some notices:

1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never 
used for read, so entry for hash could be just a pointer to PgStat_TableStatus.


2 step1 In pgstat_report_stat() you remove one by one entries from hash and 
remove them all. Isn't it better to hash_destroy/hash_create or even let hash 
lives in separate memory context and just resets it?


3 step1 Again, pgstat_report_stat(), all-zero entries aren't deleted from hash 
although they will be free from point of view of pgStatTabList.



4 step 2. The same as 1) about SeenRelsEntry->rel_id but it even isn't 
initialized anywhere.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential data loss of 2PC files

2017-03-24 Thread Teodor Sigaev


And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.


Thank you. One more question: what about symlinks? If DBA moves, for example, 
pg_xact to another dist and leaves the symlink in data directoty. Suppose, fsync 
on symlink will do nothing actually.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-24 Thread Teodor Sigaev


Thank you all, pushed

Michael Paquier wrote:

On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:

I believe patch looks good and it's ready to commit.


Thanks for the review!


As I understand, it fixes bug introduced by
commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200

Implement backup API functions for non-exclusive backups


Indeed.


And, suppose, it should be backpatched to 9.6?


Yes, that's where non-exclusive backups have been introduced.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-23 Thread Teodor Sigaev


Hi, the patch looks good except why do you remove initialization of 
is_throttled?
Suppose, just a typo?

 --- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1967,7 +1967,6 @@ top:
st->listen = false;
st->sleeping = false;
st->throttling = false;
-   st->is_throttled = false;
memset(st->prepared, 0, sizeof(st->prepared));
}


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Teodor Sigaev


Hmm, it doesn't work (but appplies) on current HEAD:

% uname -a
FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 
02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR  amd64

% pg_config --configure
'--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' 
'--with-perl' 'CC=clang' 'CFLAGS=-O0'

% rm -rf /spool/pg_data/*
% initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C 
--lc-numeric C --lc-time C -D /spool/pg_data

Running in no-clean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "teodor".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  ru_RU.UTF-8
  CTYPE:ru_RU.UTF-8
  MESSAGES: C
  MONETARY: C
  NUMERIC:  C
  TIME: C
The default text search configuration will be set to "russian".

Data page checksums are disabled.

fixing permissions on existing directory /spool/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:  could 
not open file "pg_clog": No such file or directory

child process exited with exit code 1
initdb: data directory "/spool/pg_data" not removed at user's request


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-23 Thread Teodor Sigaev


Hi!

I believe patch looks good and it's ready to commit. As I understand, it fixes 
bug introduced by

commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200

Implement backup API functions for non-exclusive backups

And, suppose, it should be backpatched to 9.6?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-23 Thread Teodor Sigaev


Thank you, pushed

Andrew Borodin wrote:

2017-03-22 22:48 GMT+05:00 Teodor Sigaev <teo...@sigaev.ru>:

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')


Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.


And if the whole posting tree is empty,then we could mark root page as leaf
and remove all other pages in tree without any locking. Although, it could
be a task for separate patch.


From the performance point of view, this is a very good idea. Both,
performance of VACUUM and performance of Scans. But doing so we risk
to leave some garbage pages in case of a crash. And I do not see how
to avoid these without unlinking pages one by one. I agree, that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-22 Thread Teodor Sigaev


No, second conditional code will be called for any subtree, which
contains totally empty subtree. That check !isRoot covers case when
the entire posting tree should be erased: we cannot just quit out of
recursive cleanup, we have to make a scan here, starting from root.

Oh, I see


Probably, variable isChildHasVoid has a bit confusing name. This flag
indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is
a part of a bigger empty subtree.
May be it'd be better to call the variable "someChildIsVoidSubtree".


hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

And if the whole posting tree is empty,then we could mark root page as leaf and 
remove all other pages in tree without any locking. Although, it could be a task 
for separate patch.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Potential data loss of 2PC files

2017-03-21 Thread Teodor Sigaev


If that can happen, don't we have the same problem in many other places?
Like, all the SLRUs? They don't fsync the directory either.

Right, pg_commit_ts and pg_clog enter in this category.


Implemented as attached.


Is unlink() guaranteed to be durable, without fsyncing the directory? If
not, then we need to fsync() the directory even if there are no files in it
at the moment, because some might've been removed earlier in the checkpoint
cycle.


What is protection if pg crashes after unlimk() but before fsync()? Right, it's 
rather small window for such scenario, but isn't better to  have another 
protection? Like WAL-logging of WAL segment removing...


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-21 Thread Teodor Sigaev


Thank you for your suggestions, do not hesitate to ask any questions,
concurrency and GIN both are very interesting topics.


I had a look on patch and found some issue.
Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves():


/*
 * All subtree is empty - just return TRUE to indicate that parent must
 * do a cleanup. Unless we are ROOT an there is way to go upper.
 */

if(isChildHasVoid && !isAnyNonempy && !isRoot)
return TRUE;

if(isChildHasVoid)
{
...
ginScanToDelete(gvs, blkno, TRUE, , InvalidOffsetNumber);
}

In first 'if' clause I see !isRoot, so second if and corresponding 
ginScanToDelete() could be called only for root in posting tree. If so, it seems 
to me, it wasn't a good idea to move ginScanToDelete() from
ginVacuumPostingTree() and patch should just remove lock for cleanup over 
ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page 
then lock the whole posting tree to do ginScanToDelete() work.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-21 Thread Teodor Sigaev


Thank you, pushed. I just make test table permanent.

Anastasia Lubennikova wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

As I can see, this bugfix was already discussed and reviewed.
All mentioned issues are fixed in the latest version of the patch
So I mark it "Ready for committer".
Thank you for fixing it!

The new status of this patch is: Ready for Committer



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PinBuffer() no longer makes use of strategy

2017-03-20 Thread Teodor Sigaev


  if (buf->usage_count < BM_MAX_USAGE_COUNT)
  if (BUF_STATE_GET_USAGECOUNT(buf_state) != BM_MAX_USAGE_COUNT)

being prone to paranoia, I prefer the first, but I've seen both styles in
the code so I don't know if it's worth futzing with.


Ok, let's be paranoic and do this same way as before.  Revised patch is 
attached.


I see the change was done in 9.6 release cycle in commit 
48354581a49c30f5757c203415aa8412d85b0f70 at April, 10. Does it mean the fix 
should be backpatched too?


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Phrase search distance syntax

2016-09-23 Thread Teodor Sigaev


Sorry to be asking another phrase search syntax question, and so close
to final release, but ...

Really close...


Why does the phrase distance operator assume <1> means adjacent words,
and not <0>.  (FYI, <-> is the same as <1>.)

Because
1 it is a result of subtruction of word's positions
2 <0> could be used as special case like a word with two infinitives:
# create text search dictionary xx (template = 'ispell', 
DictFile='ispell_sample', AffFile='ispell_sample');
# alter text search configuration english ALTER MAPPING FOR asciiword WITH xx, 
english_stem;


# select to_tsvector('english', 'bookings');
 to_tsvector
--
 'book':1 'booking':1

# select to_tsvector('english', 'bookings') @@ 'book <0> booking';
 ?column?
--
 t
(1 row)



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-07-21 Thread Teodor Sigaev


Please find attached a patch which makes it possible to disallow
UPDATEs and DELETEs which lack a WHERE clause.  As this changes query
behavior, I've made the new GUCs PGC_SUSET.

What say?


DELETE FROM tbl WHERE true; ?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] One process per session lack of sharing

2016-07-20 Thread Teodor Sigaev


On 12 July 2016 at 09:57, <amatv...@bitec.ru <mailto:amatv...@bitec.ru>> wrote:

We have faced with some lack of sharing resources.
So in our test memory usage per session:
Oracle: about 5M
MSSqlServer: about 4M
postgreSql: about 160М


Using shared resources also has significant problems, so care must be taken.


To see memory allocation by opinion of pgsql it's possible to use
https://github.com/postgrespro/memstat

This module (9.6+) could collect information about MemoryContexts allocation.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BUG #14245: Segfault on weird to_tsquery

2016-07-15 Thread Teodor Sigaev


The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,

I'm working on it now and believe that fix will be published  today.


since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-30 Thread Teodor Sigaev


That didn't cover all the places that needed to be fixed, but I have
re-read the docs and believe I've made things good now.

I have reviewed this thread and verified that all the cases raised in it
now work as desired, so I have marked the open item closed.


Thank you very much!
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Precedence of new phrase search tsquery operator

2016-06-22 Thread Teodor Sigaev


Attached patch changes a precedences of operations to |, &, <->, | in ascending
order. BTW, it simplifies a bit a code around printing and parsing of tsquery.


|, &, <->, ! of course
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Precedence of new phrase search tsquery operator

2016-06-21 Thread Teodor Sigaev


On Wed, Jun 8, 2016 at 7:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

It appears that the new <-> operator has been made to have exactly the
same grammatical precedence as the existing & (AND) operator.  Thus,
for example, 'a & b <-> c'::tsquery means something different from
'b <-> c & a'::tsquery:
I find this surprising.  My intuitive feeling is that <-> ought to
bind tighter than & (and therefore also tighter than |).  What's
the reasoning for making it act like this?


ah, now we remember :)   The idea about equivalence of  & and <->
operators appeared in situation when <-> degenerates to & in case of
absence of positional information. Looks like we mixed different
things, will fix.


Attached patch changes a precedences of operations to |, &, <->, | in ascending 
order. BTW, it simplifies a bit a code around printing and parsing of tsquery.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_predecence-2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-20 Thread Teodor Sigaev


We're really quickly running out of time to get this done before
beta2.  Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.


Isn't late now? Or wait to beta2 is out?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev


Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';


Oh ... the indexes in the lists don't have much to do with the distances,
do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
okay with the patch as presented.


Huh, I found that my isn't correct for example which I show :(. Reworked patch 
is in attach.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance-2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev




Tom Lane wrote:

Teodor Sigaev <teo...@sigaev.ru> writes:

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.



Agree, seems that's easy to change.
...
Patch is attached


Hmm, couldn't the loop logic be simplified a great deal if this is the
definition?  Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?


Do you suggest something like merge join of two sorted lists? ie:

while(Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
if (*Lpos > *Rpos)
Rpos++;
else if (*Lpos < *Rpos)
{
if (*Rpos - *Lpos == distance)
match!
Lpos++;
}
else
{
if (distance == 0)
match!
Lpos++; Rpos++;
}
}

Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be 
not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


We need to reach a consensus here, since there is no way to say "I don't know".
I inclined to agree with you, that returning false is better in such a
case.That will
indicate user to the source of problem.


Here is a patch, now phrase operation returns false if there is not postion 
information. If this behavior looks more reasonable, I'll commit that.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_no_fallback.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


what's about word with several infinitives



select to_tsvector('en', 'leavings');
   to_tsvector

  'leave':1 'leavings':1
(1 row)



select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
  ?column?
--
  t
(1 row)


Second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct and we don't need special threating of <0>.


This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.


Agree, seems that's easy to change. I thought that I saw an issue with 
hyphenated word but, fortunately, I forget that hyphenated words don't share a 
position:

# select to_tsvector('foo-bar');
 to_tsvector
-
 'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
 phraseto_tsquery
---
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
 ?column?
--
 t


Patch is attached

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


I'm working on it right now.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] COMMENT ON, psql and access methods

2016-06-01 Thread Teodor Sigaev


As far as I can see, COMMENT ON has no support for access methods.
Wouldn't we want to add it as it is created by a command? On top of
that, perhaps we could have a backslash command in psql to list the
supported access methods, like \dam[S]? The system methods would be in
this case all the in-core ones.


+1.


Are there other opinions? That's not a 9.6 blocker IMO, so I could get
patches out for 10.0 if needed.


I missed that on review process, but I'm agree it should be implemented.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "Allow usage of huge maintenance_work_mem for GIN build" patch

2016-05-30 Thread Teodor Sigaev


--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -903,7 +903,7 @@ typedef struct GinEntryAccumulator
  typedef struct
  {
 GinState   *ginstate;
-   longallocatedMemory;
+   SizeallocatedMemory;
 GinEntryAccumulator *entryallocator;
 uint32  eas_used;
 RBTree *tree;

Are you sure this is safe, Teodor? I don't have time to study the
patch in detail, but offhand I think that it might have been better to
make allocatedMemory of type int64, just like the tuplesort.c memory
accounting variables are post-MaxAllocHuge. It's not obvious to me
that this variable isn't allowed to occasionally become negative, just
like in tuplesort.c. It looks like that *might* be true -- ginbulk.c
may let allocatedMemory go negative for a period, which would now be
broken.
It could not be negative - subtruction is doing only around repalloc call, in 
all other places it only grows.




If you did make this exact error, you would not be the first. If it
isn't actually broken, perhaps you should still make this change,
simply on general principle. I'd like to hear other opinions on that,
though.
It could be an int64 without any regressions or problems. I choose Size type 
because variable allocatedMemory actually stores a size of piece of memory and 
it's the same type as returned by GetMemoryChunkSpace()


AFAIK, size_t type (type Size is a just typedef alias) is a part of C99 and it 
should be unsigned and large enough to store size of any chunk of avaliable RAM. 
And it at least twice larger than allowed all memory-related GUC variables.


I don't see a reason to use int64 except, may be, general practice in pgsql. But 
then why we keep Size type - let us just use int64.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind

2016-05-25 Thread Teodor Sigaev


This all should me moved behind "access method" abstraction...


+1 relopt_kind should be moved in am, at least. Or removed.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] add missing "USING bloom" into bloom extension documentation

2016-05-24 Thread Teodor Sigaev


you should add "USING bloom" to the insert statement, in order make this
example work.

Patch in the attachment fixes an example. Please commit it ;-)

Thank you, applied
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding an alternate syntax for Phrase Search

2016-05-22 Thread Teodor Sigaev




to_tsquery(' Berkus & "PostgreSQL Version 10.0" ')

... would be equivalent to:

to_tsquery(' Berkus & ( PostgreSQL <-> version <-> 10.0 )')


select to_tsquery('Berkus') && phraseto_tsquery('PostgreSQL Version 10.0');
does it as you wish



I realize we're already in beta, but pgCon was actually the first time I
saw the new syntax.  I think if we don't do this now, we'll be doing it
for 10.0.

Havn't an objections for 10.0
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Allocate all page images at once in generic wal interface

2016-05-17 Thread Teodor Sigaev


Allocate all page images at once in generic wal interface

That reduces number of allocation.

Per gripe from Michael Paquier and Tom Lane suggestion.

Branch
--
master

Details
---
http://git.postgresql.org/pg/commitdiff/7c979c95a3700d0bd34c2831f49a9260d505b0f9

Modified Files
--
src/backend/access/transam/generic_xlog.c | 19 +--
1 file changed, 9 insertions(+), 10 deletions(-)


Seems, this patch isn't liked by curculio [1] buildfarm member, but I'm confused 
with diagnostics:
2016-05-17 21:43:19.489 CEST [573b7457.547c:3] LOG:  statement: CREATE EXTENSION 
bloom;
2016-05-17 21:43:19.501 CEST [573b7457.547c:4] ERROR:  syntax error in file 
"/home/pgbf/buildroot/HEAD/inst/share/postgresql/extension/bloom.control" line 
1, near token ""


Could somebody explain me what's going on?

Thank you

[1] 
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=curculio=2016-05-17+19%3A30%3A09

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Correctly align page's images in generic wal API

2016-05-16 Thread Teodor Sigaev


Instead of allocating this memory unconditionally for each buffer,
wouldn't it be better to set all the page pointers to NULL in
GenericXLogStart and allocate memory only once a buffer is registered
in GenericXLogRegisterBuffer when finding a free slot? This patch is
wasting many cycles.



GenericXLogRegisterBuffer() could be called in another MemoryContext what
can be a reason for strange bugs. Right now only a few pages could be
involved in one round of GenericWal. I don't believe that such allocation
could be a reason of noticable performance degradation. Although I didn't
check that.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

2016-05-11 Thread Teodor Sigaev


 Allow Pin/UnpinBuffer to operate in a lockfree manner.
I get the errors:

ERROR:  attempted to delete invisible tuple
ERROR:  unexpected chunk number 1 (expected 2) for toast value


Just reminder, you investigate "unexpected chunk number" problem, but, seems, we 
have another bug (first ERROR: attempted to delete invisible tuple). IMHO, it's 
a separate bug, not related to oid. Unfortunately, I've never seen such error on 
my notebook.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] atomic pin/unpin causing errors

2016-05-05 Thread Teodor Sigaev


Hm. And you're not seeing the asserts I reported in
http://archives.postgresql.org/message-id/20160505185246.2i7qftadwhzewykj%40alap3.anarazel.de
?

I see it a lot, but I think that is a result of ereport(FATAL) after 
FileWrite(BLCKSZ/3) added by Jeff.


Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] atomic pin/unpin causing errors

2016-05-05 Thread Teodor Sigaev

quot;teodor") at postgres.c:4122

#36 0x00839744 in BackendRun (port=0x801d571c0) at postmaster.c:4258
#37 0x00838d54 in BackendStartup (port=0x801d571c0) at postmaster.c:3932
#38 0x00835617 in ServerLoop () at postmaster.c:1690
#39 0x00832c69 in PostmasterMain (argc=4, argv=0x7fffe420) at 
postmaster.c:1298

#40 0x0075f228 in main (argc=4, argv=0x7fffe420) at main.c:228


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] atomic pin/unpin causing errors

2016-05-05 Thread Teodor Sigaev


Any chance you could package up that data directory for me to download?


Sent by personal email to Alexander, Andres and Jeff

In /var/log/message I found

May  4 22:04:07 xor kernel: pid 14010 (postgres), uid 1001: exited on signal 6 
(core dumped)
May  4 22:04:25 xor kernel: pid 14032 (postgres), uid 1001: exited on signal 11 
(core dumped)
May  4 22:04:52 xor kernel: pid 14037 (postgres), uid 1001: exited on signal 6 
(core dumped)



Sometimes postgres is crashed with SIGSEGV signal instead of SIGABRT (which 
comes form abort() in assert)


I'll try to get a coredump after SIGSEGV, but it could take a time.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Teodor Sigaev


I get the errors:

ERROR:  attempted to delete invisible tuple
STATEMENT:  update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR:  unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT:  update foo set count=count+1 where text_array @> $1


Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507.  Is there any chance you could package up a
data directory after the issue hit?


I've got
ERROR:  unexpected chunk number 0 (expected 1) for toast value 10192986 in 
pg_toast_16424


The test required 10 hours to run on my notebook. postgresql was compiled with 
-O0 --enable-debug --enable-cassert.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tsvector filter problem

2016-05-04 Thread Teodor Sigaev


Thank you, pushed.

Stas Kelvich wrote:

Hi.

As discovered by Oleg Bartunov, current filter() function for tsvector can 
crash backend.
Bug was caused erroneous usage of char type in memmove argument.









--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] amroutine->amsupport from numeric to defined constants

2016-04-28 Thread Teodor Sigaev


I think this number should be specified only once, in .h file.

Yep, that sounds true.
It may be a good idea to make something similar for contrib/bloom if >
0 values are defined for amstrategies or amsupport for consistency.


Thank you, pushed.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-28 Thread Teodor Sigaev


The more I think about it, the more I think gin is just an innocent
bystander, for which I just happen to have a particularly demanding
test.  I think something about snapshots and wrap-around may be
broken.


After 10 hours of running I've got
1587  XX000 2016-04-28 05:57:09.964 MSK:ERROR:  unexpected chunk number 0 
(expected 1) for toast value 10192986 in pg_toast_16424
1587  XX000 2016-04-28 05:57:09.964 MSK:CONTEXT:  automatic analyze of table 
"teodor.public.foo"
1587  0 2016-04-28 05:57:09.974 MSK:LOG:  JJ transaction ID wrap limit is 
2147484514, limited by database with OID 16384



I've pushed v5 of gin-alone-cleanup patch, many thanks to all participants.
Will work on backpatch

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-26 Thread Teodor Sigaev


There are some previously unnoticed errors in docs (master branch), this patch
fixes them.

Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-26 Thread Teodor Sigaev


Check my reasoning: In version 4 I added a remebering of tail of pending
list into blknoFinish variable. And when we read page which was a tail on
cleanup start then we sets cleanupFinish variable and after cleaning that
page we will stop further cleanup. Any insert caused during cleanup will be
placed after blknoFinish (corner case: in that page), so, vacuum should not
miss tuples marked as deleted.


Yes, I agree with the correctness of v4.  But I do wonder if we should
use that early stopping for vacuum and gin_clean_pending_list, rather

Interesting, I've missed this possible option


than just using it for user backends.  While I think correctness
allows it to stop early, since these routines are explicitly about
cleaning things up it seems like they should volunteer to clean the
whole thing.


I believe that autovacuum should not require guaranteed full clean up, only 
vacuum and gin_clean_pending_list() should do that. In all other cases it should 
stop early to prevent possible infinite cleanup. Patch attached.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-18 Thread Teodor Sigaev


Added, see attached patch (based on v3.1)


With this applied, I am getting a couple errors I have not seen before
after extensive crash recovery testing:
ERROR:  attempted to delete invisible tuple
ERROR:  unexpected chunk number 1 (expected 2) for toast value
100338365 in pg_toast_16425
Huh, seems, it's not related to GIN at all... Indexes don't play with toast 
machinery. The single place where this error can occur is a heap_delete() - 
deleting already deleted tuple.




I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so don't have much
insight into what might be going on or how to debug it.
Check my reasoning: In version 4 I added a remebering of tail of pending list 
into blknoFinish variable. And when we read page which was a tail on cleanup 
start then we sets cleanupFinish variable and after cleaning that page we will 
stop further cleanup. Any insert caused during cleanup will be placed after 
blknoFinish (corner case: in that page), so, vacuum should not miss tuples 
marked as deleted.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-15 Thread Teodor Sigaev


Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would

Added, see attached patch (based on v3.1)
If there is no objections I will aplly it at mondeay and backpatch all supported 
versions.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev


There are only 3 fundamental options I see, the cleaner can wait,
"help", or move on.

"Helping" is what it does now and is dangerous.

Moving on gives the above-discussed unthrottling problem.

Waiting has two problems.  The act of waiting will cause autovacuums
to be canceled, unless ugly hacks are deployed to prevent that.   If
we deploy those ugly hacks, then we have the problem that a user
backend will end up waiting on an autovacuum to finish the cleaning,
and the autovacuum is taking its sweet time due to
autovacuum_vacuum_cost_delay.  (The "helping" model avoids this
problem because the user backend can just catch up with and pass the
io-throttled autovac process)
With pending cleanup patch backend will try to get lock on metapage with 
ConditionalLockPage. Will it interrupt autovacum worker?





For completeness sake, a fourth option would to move on, but only
after inserting the tuple directly into the main index structure
(rather then the pending list) like would be done with fastupdate off,
once the pending list is already oversized.  This is my favorite, but
there is no chance of it going into 9.6, much less being backpatched.

Agree, will reimplement for 9.7



Alvaro's recommendation, to let the cleaner off the hook once it
passes the page which was the tail page at the time it started, would
prevent any process from getting pinned down indefinitely, but would
not prevent the size of the list from increasing without bound.  I
think that would probably be good enough, because the current
throttling behavior is purely accidentally and doesn't *guarantee* a
limit on the size of the pending list.

Added, see attached patch (based on v3.1)
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-12 Thread Teodor Sigaev


This restricts the memory used by ordinary backends when doing the
cleanup to be work_mem. Shouldn't we let them use
maintenance_work_mem? Only one backend can be doing this clean up of a
given index at any given time, so we don't need to worry about many
concurrent allocations of maintenance_work_mem.  This seems very
similar in spirit to index creation, where a backend is allowed to use
maintenance_work_mem.
Because it could be a several indexes in one pg instance. And each cleaner could 
eat maintenance_work_mem.





Also, do we plan on backpatching this?  While there are no known

Yes


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Teodor Sigaev


I agree with both of these ideas. Patch is attached.


Hmm, you accidentally forget to change flag type of GenericXLogRegister in 
contrib/bloom signature.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-11 Thread Teodor Sigaev


mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?

Yes, it should. Alexander now works on it.



2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump.  That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much.  You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.


Now contrib/bloom is a canonical example how to implement yourown index and how 
to use generic WAL interface. And it is an useful toy: in some cases it could 
help in production system, patch attached. I'm a little dissatisfied with patch 
because I had to add unused field to BloomPageOpaqueData, in opposite case size 
of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so, 
last two bytes will be unused. If unused field is not a problem, I will push 
this patch.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-04-08 Thread Teodor Sigaev


Thank you very much


thank you, pushed. Pls, pay attention to buildfarm.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-08 Thread Teodor Sigaev


On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.


You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.


Some notices:
- index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts,
   int  indnkeyatts)
  Why we need indnatts/indnkeyatts? They are presented in idxrel struct
  already
- follow code where index_truncate_tuple() is called, it should never called in
  case where indnatts == indnkeyatts. So, indnkeyatts should be strictly less
  than indnatts, pls, change assertion. If they are equal the this function
  becomes complicated variant of CopyIndexTuple()
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-06 Thread Teodor Sigaev


I've tested the v2, v3 and v3.1 of the patch, to see if there are any
differences. The v2 no longer applies, so I tested it on ee943004. The following
table shows the total duration of the data load, and also sizes of the two GIN
indexes.

duration (sec)  subject  body
   ---
   v2  1290 23 MB   684 MB
   v3  1360 24 MB   487 MB
   v3.11360 24 MB   488 MB

Thank you very much.

Hmm. v3 is a just rebased version of v2, v3.1 hasn't unlock/lock cycle during 
cleanup, just to become similar to Jeff's pending lock patch. In theory, v2 and 
v3 should be very close, v3.1 should be close to pending_lock.


I'm inclining to push v3.1 as one of two winners by size/performance and, unlike 
to pending lock patch, it doesn't change an internal logic of lock machinery.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-06 Thread Teodor Sigaev


Thank you, pushed with Petr's notice

Dmitry Dolgov wrote:


On 6 April 2016 at 03:29, Andrew Dunstan <and...@dunslane.net
<mailto:and...@dunslane.net>> wrote:


Yeah, keeping it but rejecting update of an existing key is my preference 
too.

cheers

andrew


Yes, it sounds quite reasonable. Here is a new version of patch (it will throw
an error for an existing key). Is it better now?





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Teodor Sigaev


I've been asked to look at and comment on the SQL API of the feature. I think
it's basically sound, although there is one thing that's not clear from the
regression tests: what happens if we're inserting into an object and the key
already exists? e.g.:

select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');

I think this should be forbidden, i.e. the function shouldn't ever overwrite an
existing value. If that's not handled it should be, and either way there should
be a regression test for it.


I'm agree about covering this case by tests, but I think it should be allowed.
In this case it will work exactly as jsbonb_set
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-04 Thread Teodor Sigaev


It seems to me that the patch is completed.
Except, maybe, grammar check of comments and documentation.

Looking forward to your review.
Are there any objectins on it? I'm planning to look closely today or tommorrow 
and commit it.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Teodor Sigaev


The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.


I'm waiting of Tomas testing. Suppose, it isn't possible now? so, will do 
myself, after that I will publish results.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-01 Thread Teodor Sigaev


there was a character that was very similar to dots I would suggest
that.  The closest is * I think, so what do you think of "***"?


And join opertator for tsqueries is the same :
select 'fat'::tsquery *** 'cat'; ?

Single '*' ?  That's close to regex, any number of tokens. And it saves rules 
about duplicating character.


select 'fat'::tsquery ** 'cat';
select 'fat * cat'::tsquery;
select 'fat * [3] cat'::tsqyery; -- for non-default distance.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-01 Thread Teodor Sigaev


Well, I noticed that the docs talk about an operator that can be used in
SQL (outside the tsquery parser), as well as an operator that can be

Just to join 2 tsquery with operator FOLLOWED BY


used inside tsquery.  Inside tsquery anything would be usable, but
outside that it would be good to keep in mind the rest of SQL operators;
and <> means "different from", so using it for FOLLOWED BY seems odd to
me.

previous rule was: duplicate operation character to join tsqueries.
Ellipsis is forbidden to use as operator name:
#  create operator ... (procedure = 'int2and');
ERROR:  syntax error at or near ".."
LINE 1: create operator ... (procedure = 'int2and');
What is your suggestion for joining operator name?


My suggestion of ... (ellipsis) is because that's already known as
related to text, used for omissions of words, where the surrounding
words form a phrase.  It seems much more natural to me.
Yes, agree for omission. But for reason above its'n a good name although I don't 
have a strong objections.


may be <=>? it isn't used anywhere yet.

select 'fat'::tsquery <=> 'cat';
select 'fat <=> cat'::tsquery;
select 'fat <3> cat'::tsqyery; -- for non-default distance.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-31 Thread Teodor Sigaev

Looking at patch, I'm inlined to commit it in current commitfest, it looks 
workable and too many people desire it. I've did some changes, mostly in 
formatting and comments.


Patch makes some user-visible changes:
1 change order for tsquery. Assume, it's acceptable, only a few users store 
tsquery in table and have a Btree index. They will need to reindex.

2 less number of parenthesis in tsquery output, and tsquery becomes more 
readable.

Pls, remove tsquery_setweight from patch because it's not a part of phrase 
search and it isn't mentioned in docs. Please, make a separate patch for it.



Dmitry Ivanov wrote:

Sorry for the delay, I desperately needed some time to finish a bunch of
dangling tasks.

I've added some new comments and clarified the ones that were obscure.
Moreover, I felt an urge to recheck most parts of the code since apparently
nobody (besides myself) has gone so far yet.

On 25.03.16 18:42 MSK, David Steele wrote:

Time is short and it's not encouraging that you say there is "still much

work to be done".

Indeed, I was inaccurate. I am more than interested in the positive outcome.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase-0.4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-30 Thread Teodor Sigaev


Thank you, pushed

Emre Hasegeli wrote:

I'll try to explain with two-dimensional example over points. ASCII-art:


Thank you for the explanation.  Should we incorporate this with the patch.


added


I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.


+ cmp_double(const double a, const double b)


Does this function necessary?  We can just compare the doubles.


Yes, it compares with limited precision as it does by geometry operations.
Renamed to point actual arguments.


I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.


+ boxPointerToRangeBox(BOX *box, RangeBox * rectangle)


The "rectangle" variable in here should be renamed.


fixed


I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.


evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into
RangeBox struct and free. Let both fucntion use the same interface.


I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.


it works until allthesame branch contains only one inner node. Otherwise
traversalValue will be freeed several times, see spgWalk().


It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Teodor Sigaev


as we discussed recently [1] you should avoid leaving "holes" with
uninitialized data in structures. Please fix this or provide a comment
that describes why things are done here the way they are done.
[1] http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net
That discussion is about SQL-level types which could be stored on disk, not 
about in-memory structs



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Teodor Sigaev


GenericXLogStart(Relation relation)
{
...
if (genericXlogStatus != GXLOG_NOT_STARTED)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("GenericXLogStart: generic xlog is already started")));


Hmm, seems, generic wal whiil be in incorrect state if exception occurs between 
GenericXLogStart() and GenericXLogFinish() calls because static variable 
genericXlogStatus will contain GXLOG_LOGGED/GXLOG_UNLOGGED status.


Suppose, it could be solved by different ways
- remove all static variable, so, GenericXLogStart() will return an struct
  (object) which incapsulated all data needed to generic wal work. As I can
  see, in case of exception there isn't ane needing to extra cleanup. Also,
  it would allow to use generic wal for two or more relations at the same time,
  although I don't know any useful example for such feature.
- add callback via RegisterResourceReleaseCallback() which will cleanup state
  of genericXlogStatus variable

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Teodor Sigaev


I incorporated your changes and did some additional refinements on top of them
still.

Attached is delta against v12, that should cause less issues when merging for
Teodor.


But last version is 13th?

BTW, it would be cool to add odcs in VII. Internals chapter, description should 
be similar to index's interface description, i.e. how to use generic WAL interface.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Teodor Sigaev


And here it is. It's not perfect but it's better (I am not native speaker
either). It's same as v12, just changed comments somewhat.


Thank you, but I have a problems with applying:

% patch -p1 < ~/Downloads/0002-generic-xlog.13.patch
Hmm...  Looks like a new-style context diff to me...
...
|diff --git a/src/backend/access/transam/generic_xlog.c 
b/src/backend/access/transam/generic_xlog.c

|new file mode 100644
|index ...7ca03bf
|*** a/src/backend/access/transam/generic_xlog.c
|--- b/src/backend/access/transam/generic_xlog.c
--
(Creating file src/backend/access/transam/generic_xlog.c...)
Patching file src/backend/access/transam/generic_xlog.c using Plan A...
patch:  malformed patch at line 634: diff --git 
a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Teodor Sigaev


Does that mean this patch should be closed or is there more remaining to commit?


Petr promises to check english in comments/docs in generic-wal patch at this 
week, bloom patch depends on it. Bloom patch is ready from my point of view.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Draft release notes for next week's releases

2016-03-29 Thread Teodor Sigaev


Does include ICU mean that collation handling is identical across platforms?
E.g. a query on Linux involving string comparison would yield the same
result on MacOS and Windows?
Yes, it does and that's the most important issue for us.


Yes, exactly. Attached patch adds support for libicu with configure flag 
--with-icu. Patch rebased to current HEAD, hope, it works.


It's based on https://people.freebsd.org/~girgen/postgresql-icu/readme.html 
work, and it was migrated to 9.5 with abbrevation keys support.

Patch in current state is not ready to commit, of course.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


libicu-8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-24 Thread Teodor Sigaev


+  * boxtype_spgist.c

The names on the file header need to be changed, too.

Oops. fixed




I'll try to explain with two-dimensional example over points. ASCII-art:

Thank you for the explanation.  Should we incorporate this with the patch.

added



+ cmp_double(const double a, const double b)

Does this function necessary?  We can just compare the doubles.
Yes, it compares with limited precision as it does by geometry operations. 
Renamed to point actual arguments.





+ boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

fixed




+ Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

I didn't know:
# select '(1,inf)'::point;
  point
-
 (1,inf)

fixed




Wouldn't it be simpler to palloc and return the value on those functions?

evalRangeBox() initializes part of RectBox, so, it could not palloc its
result.
Other methods use the same signature just for consistency.


I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into 
RangeBox struct and free. Let both fucntion use the same interface.



I went through all other variables:

+ int r = is_infinite(a);
+ double  x = *(double *) a;
+ double  y = *(double *) b;
+ spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
+ spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
+ BOX*leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?
They could. But it doesn't required. To be in consistent state I've removed 
const modifier where possible






+/*
+ * Begin. This block evaluates the median of coordinates of boxes
+ */


I would rather explain what the function does on the function header.


fixed

The "end" part of it is still there.

Oops again, fixed




Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.


Seems, yes. spgist tree could contain a locng branches with allTheSame.


Does SP-GiST allows any node under allTheSame to not being allTheSame?

No, SP-GiST doesn't allow that

  Not setting traversalValues for allTheSame worked fine with my test.
it works until allthesame branch contains only one inner node. Otherwise 
traversalValue will be freeed several times, see spgWalk().

# select i as id, '1,2,3,4'::box as b into x from generate_series(1,100) i;
# create index ix on i using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump
gdb:
#0  0x00080143564a in thr_kill () from /lib/libc.so.7
#1  0x000801435636 in raise () from /lib/libc.so.7
#2  0x0008014355b9 in abort () from /lib/libc.so.7
#3  0x00a80739 in in_error_recursion_trouble () at elog.c:199
#4  0x00abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5  0x0053330c in freeScanStackEntry (so=0x801e8d358, 
stackEntry=0x801e935d8) at spgscan.c:47
#6  0x00532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358, 
scanWholeIndex=1 '\001', storeRes=0x532d10 
...




+ if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

yep, fixed




+ out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.

yep, fixed

I've attached all patches again. Thank you very much!

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


q4d-4.patch.gz
Description: GNU Zip compressed data


traversalValue-2.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Teodor Sigaev




Isn't this basically the same thing that the cube contrib module does? (Which
has the added benefit of kNN-capable operators).
No, cube module introduces new type - N-dimensional box. And adds an index 
support for it.


Current patch suggests non-traditional indexing technique for 2D boxes by 
treating them as point in 4D space. With such representation it's became 
possible to use quad-tree technique which:

1 supports only points to index
2 already supported by SP-GiST

Such technique provides some benefits in comparance with traditional R-Tree 
which implemented in pg with a help GiST. See Oleg's message.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-22 Thread Teodor Sigaev


tend to think of a *point* as having *zero* dimensions.  Would it
perhaps be more accurate to say we are treating a 2-dimensional box
as a point in 4-dimensional space?

Exactly, sorry for ambiguity.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Teodor Sigaev

nterToRectangle(
+DatumGetBoxP(in->scankeys[i].sk_argument),
+p_query_rect
+);

I don't think this matches the project coding style.

fixed




+ int flag = 1,

Wouldn't it be better as "bool"?

fixed.


The regression tests for the remaining operators are still missing.

Ooops, will be soon.

Attached all patches  to simplify review. Thank you very much!

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
I'll try to explain with two-dimensional example over points. ASCII-art:
   |
   |
  1|2
   |
---+-
   |P
  3|4
   |
   |

'+' with 'A' represents a centroid or, other words, point which splits plane 
for 4 quadrants and it strorend in parent node. 1,2,3,4 are labels of 
quadrants, each labeling will be the same for all pictures and all centriods, 
and i will not show them in pictures later to prevent too complicated images. 
Let we add C - child node (and again, it will split plane for 4 quads):
   | |
   +-+---
   X   |B|C
   | |
---+-+---
   |P|A
   | |
   |
   |

A and B are points of intersection of lines. So, box PBCAis a bounding box for 
points contained in 3-rd (see labeling above). For example X labeled point is 
not a descendace of child node with centroid C because it must be in branch of 
1-st quad of parent node. So, each node (except root) will have a limitation in 
its quadrant. To transfer that limitation the traversalValue is used.


traversalValue-2.patch.gz
Description: GNU Zip compressed data


q4d-3.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-20 Thread Teodor Sigaev




I don't see any objection, is consensus reached? I'm close to comiit
that...



I did only cursory review on the bloom contrib module and don't really have
complaints there, but I know you can review that one. I'd like the English of
the generic_xlog.c description improved but I won't get to it before weekend.


So, per patch status:
create-am
ready
generic-xlog
need to fix comments/docs
bloom-contrib
need review. I'm an original author of this module and of course
I can do some review of it but, seems, it's needed more eyes.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-19 Thread Teodor Sigaev


I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*

Not yet, but will


The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

I splitted to two:
1 0001-idx_or_core - only planner and executor changes
2 0002-idx_or_indexes - BRIN/GIN/GiST changes with tests

I don't think that splitting of second patch adds readability but increase 
management diffculties, but if your insist I will split.



4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

added


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


0001-idx_or_core-v4.patch.gz
Description: GNU Zip compressed data


0002-idx_or_indexes-v4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-18 Thread Teodor Sigaev





I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*


Oops, will add shortly.


The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

Ok, will do it.


1) fields in BrinOpaque are not following the naming convention (all the
existing fields start with bo_)

fixed



2) there's plenty of places violating the usual code style (e.g. for
single-command if branches) - not a big deal for WIP patch, but needs to
get fixed eventually

hope, fixed



3) I wonder whether we really need both SK_OR and SK_AND, considering
they are mutually exclusive. Why not to assume SK_AND by default, and
only use SK_OR? If we really need them, perhaps an assert making sure
they are not set at the same time would be appropriate.

In short: possible ambiguity and increasing stack machine complexity.
Let we have follow expression in reversed polish notation (letters represent a 
condtion, | - OR, & - AND logical operation, ANDs are omitted):

a b c |

Is it ((a & b)| c) or (a & (b | c)) ?

Also, using both SK_ makes code more readable.



4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

Will add soon



5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
initialized when (!leftFinished && !rightFinished), but then gets used
when either part of the condition evaluates to true. Probably should be

 if (!leftFinished || !rightFinished)
 cmp = ...

fixed



6) the code in nodeIndexscan.c should not include call to abort()

 {
 abort();
 elog(ERROR, "unsupported indexqual type: %d",
 (int) nodeTag(clause));
 }

fixed, just forgot to remove



7) I find it rather ugly that the paths are built by converting BitmapOr
paths. Firstly, it means indexes without amgetbitmap can't benefit from
this change. Maybe that's reasonable limitation, though?

I based on following thoughts:
1 code which tries to find OR-index path will be very similar to existing
  generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existsing indexes have amgetbitmap method, only a few don't. amgetbitmap
  interface is simpler. Anyway, I can add an option for generate_or_bitmap
  to use any index, but, in current state it will just repeat all work.



But more importantly, this design already has a bunch of unintended
consequences. For example, the current code completely ignores
enable_indexscan setting, because it merely copies the costs from the
bitmap path.

I'd like to add separate enable_indexorscan


That's pretty dubious, I guess. So this code probably needs to be made
aware of enable_indexscan - right now it entirely ignores startup_cost
in convert_bitmap_path_to_index_clause(). But of course if there are
multiple IndexPaths, the  enable_indexscan=off will be included multiple
times.

9) This already breaks estimation for some reason. Consider this

...

So the OR-clause is estimated to match 0 rows, less than each of the
clauses independently. Needless to say that without the patch this works
just fine.

fixed



10) Also, this already breaks some regression tests, apparently because
it changes how 'width' is computed.

fixed too



So I think this way of building the index path from a BitmapOr path is
pretty much a dead-end.
I don't think so because separate code path to support OR-clause in index will 
significanlty duplicate BitmapOr generator.


Will send next version as soon as possible. Thank you for your attention!

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


index_or-3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Access method extendability

2016-03-18 Thread Teodor Sigaev


Good catch, thanks! Tests were added.


I don't see any objection, is consensus reached? I'm close to comiit that...

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-18 Thread Teodor Sigaev


Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?
reconstructedValues is needed to reconctruct full indexed value and should match 
to type info indexed column




We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?
because pg knows type of reconstructedValues (see above why) but traversalValue 
just a void*, SP-GiST core doesn't knnow how to free memory of allocated for it.






+   Memory for traversalValues should be allocated in
+   the default context, but make sure to switch to
+   traversalMemoryContext before allocating memory
+   for pointers themselves.


This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

Clarified, I hope




The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We

hope, fixed. At least, seems, syncronized with seqscan


really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

agree, but it's not a deal for this patch



It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

Again, agree. But I suggest to do it by separate patch.



The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

fixed



+ #define LT  -1
+ #define GT   1
+ #define EQ   0


Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

fixed




+ /* SP-GiST API functions */
+ Datum   spg_box_quad_config(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_choose(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_picksplit(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
+ Datum   spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);


I guess they should go to the header file.
Remove them because they are presented in auto-generated file 
./src/include/utils/builtins.h


range patch is unchanged, but still attached to keep them altogether.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


q4d-2.patch.gz
Description: GNU Zip compressed data


traversalValue-2.patch.gz
Description: GNU Zip compressed data


range-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: function parse_ident

2016-03-18 Thread Teodor Sigaev


I hope so the messages are ok now. Few more regress tests added.


Thank you, committed.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: function parse_ident

2016-03-14 Thread Teodor Sigaev


I afraid so I cannot to fix this inconsistency (if this is inconsistency - the
binary values are same) - the parameter of function is raw string with processed
escape codes, and I have not any information about original escape sequences.
When you enter octet value, and I show it as hex value, then there should be
difference. Buy I have not information about your input (octet or hex). I have
the original string of SQL identifier inside parser, executor, but I have not
original string of function parameter inside function (not without pretty
complex and long code).

Ok, agree



I am trying describe it in doc (I am sorry for my less level English) in new
patch. Fixed duplicated oid too.

Edited a bit + fix some typos and remove unneeded headers, patch attached

Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XX');
ERROR:  identifier contains disallowed characters: "\"c"

Error message wrongly points to the reason of error.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


parse_ident-13.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Tsvector editing functions

2016-03-11 Thread Teodor Sigaev


I saw errors on windows, here is the fix:

Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Tsvector editing functions

2016-03-10 Thread Teodor Sigaev


Thanks! Fixed and added tests.

Thank you!

I did some patch cleanup/fix, but I have some doubt with function's names:

1 to_tsvector:
# \df to_tsvector
 List of functions
   Schema   |Name | Result data type | Argument data types |  Type
+-+--+-+
 pg_catalog | to_tsvector | tsvector | regconfig, text | normal
 pg_catalog | to_tsvector | tsvector | text| normal
 pg_catalog | to_tsvector | tsvector | text[]  | normal

First two variants of to_tsvector make a morphological processing, last one 
doesn't.

2 to_array
# \df *to_array
  List of functions
   Schema   | Name  | Result data type | Argument data types | 
 Type

+---+--+-+
 pg_catalog | regexp_split_to_array | text[]   | text, text  | 
normal
 pg_catalog | regexp_split_to_array | text[]   | text, text, text| 
normal
 pg_catalog | string_to_array   | text[]   | text, text  | 
normal
 pg_catalog | string_to_array   | text[]   | text, text, text| 
normal
 pg_catalog | to_array  | text[]   | tsvector| 
normal


Seems, to_array is not a right name compared to other *to_array.

I would like to suggest rename both functions to array_to_tsvector and 
tsvector_to_array to have consistent name. Later we could add 
to_tsvector([regconfig, ], text[]) with morphological processing.


Thoughts?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..ed0b6be 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9211,16 +9211,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
  
   setweight
  
- setweight(tsvector, "char")
+ setweight(vector tsvector, weight "char")
 
 tsvector
-assign weight to each element of tsvector
+assign weight to each element of vector
 setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A')
 'cat':3A 'fat':2A,4A 'rat':5A


 
  
+  setweight
+  setweight by filter
+ 
+ setweight(vector tsvector, weight "char", lexemes "text"[])
+
+tsvector
+assign weight to elements of vector that are listed in lexemes array
+setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}')
+'cat':3A 'fat':2,4 'rat':5A
+   
+   
+
+ 
   strip
  
  strip(tsvector)
@@ -9233,6 +9246,81 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  delete
+  delete lemexeme
+ 
+ delete(vector tsvector, lexeme text)
+
+tsvector
+remove given lexeme from vector
+delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat')
+'cat':3 'rat':5A
+   
+   
+
+ 
+  delete
+  delete lemexemes array
+ 
+ delete(vector tsvector, lexemes text[])
+
+tsvector
+remove any occurrence of lexemes in lexemes array from vector
+delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])
+'cat':3
+   
+   
+
+ 
+  unnest
+ 
+ unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text)
+
+setof record
+expand a tsvector to a set of rows
+unnest('fat:2,4 cat:3 rat:5A'::tsvector)
+(cat,{3},{D}) ...
+   
+   
+
+ 
+  to_array
+ 
+ to_array(tsvector)
+
+text[]
+convert tsvector to array of lexemes
+to_array('fat:2,4 cat:3 rat:5A'::tsvector)
+{cat,fat,rat}
+   
+   
+
+ 
+  to_tsvector
+  array to tsvector
+ 
+ to_tsvector(text[])
+
+tsvector
+convert array of lexemes to tsvector
+to_tsvector('{fat,cat,rat}'::text[])
+'fat' 'cat' 'rat'
+   
+   
+
+ 
+  filter
+ 
+ filter(vector tsvector, weights "char"[])
+
+tsvector
+Select only elements with given weights from vector
+filter('fat:2,4 cat:3b rat:5A'::tsvector, '{a,b}')
+'cat':3B 'rat':5A
+   
+   
+
+ 
   to_tsquery
  
  to_tsquery( config regconfig ,  query text)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgm

Re: [HACKERS] proposal: function parse_ident

2016-03-10 Thread Teodor Sigaev


select
parse_ident(E'X\rXX');


I am sending updated patch - I used json function for correct escaping - the
escaping behave is same.


Hmm, it doesn't look so:
% select parse_ident(E'_\005');
ERROR:  identifier contains disallowed characters: "_\u0005"
% select parse_ident(E'\005');
ERROR:  missing identifier: "\u0005"

but

# select parse_ident(E'"\005"');
 parse_ident
-
 {\x05}

Error messages above point wrong character wrongly.

One more inconsistence:
# select parse_ident(E'"\005"') as "\005";
  \005

 {\x05}

Display outputs of actual identifier and parse_indent are differ.

Actually, I can live with both but are any other opinions? Seems, at least 
difference of actual identifier and output of parse_indent should be pointed in 
docs.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 3 4 5 6 7 8 9 >

1 - 100 of 812 matches

Mail list logo