Re: [HACKERS] tsearch in core patch
>> That may have been true until we started supporting Windows...
>> Swedish_Sweden.1252 is what I get on my machine, for example. Principle
>> is the same, but values certainly aren't.
>
> Well, at least the name is not itself translated, so a mapping table is
> not right out of the question. If they had put a name like
> "Español_Chile" instead of "Spanish_Chile" we would be in serious
> trouble.

I don't think so; in the opposite case you couldn't type it or show it in order to change the locale :).

So, the final proposal: rename cfglocale to cfglanguages and store in it an array of language names produced from the first part of the locale names:

russian   '{ru_RU, Russian_Russia}'
spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments? Are there any obstacles to using GIN indexes in pg_catalog?
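For illustration only, a lookup against such an array column could look roughly like this; the catalog and column names here are assumptions, not the final design:

SELECT cfgname
FROM   pg_ts_config
WHERE  'Russian_Russia' = ANY (cfglanguages);

The idea being that initdb (or a user) strips the encoding suffix from the server locale and matches the remainder against the array.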
Re: [HACKERS] tsearch in core patch
> Why not do it the other way around?
>   es_ES          spanish
>   Spanish_Spain  spanish
>   ru_RU          russian
>   pt_BR          portuguese_brazil
>
> That way you don't need any funny index. Or do you need the list of
> locales for each language? (but even if you do, you can easily obtain it
> by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:

russian_win    Russian_Russia
russian_unix   ru_RU

They don't differ except in the locale name.
[Fwd: Re: [HACKERS] tsearch in core patch]
>> How would this work for initdb with locale C?
>
> I'm worrying about that too.

english '{en_GB, en_US, C}'

I suppose that a locale name always has a dot separator, except the C locale, which is a well-known exception.
Re: [HACKERS] Re: Full text searching, anyone interested?
> > > I would love to find a way to get a bitmap like index native to Postgres. I
[skip]
> We could implement bitmap handling functions based on one-dimensional arrays of
> integers. That's how my stuff deals with them, and postgres already manages
> them.

Look at contrib/intarray. gist__intbig_ops is a variant of a signature tree (from each array it builds a bitmap signature).

Regards,
Teodor
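A minimal usage sketch of that opclass, written with modern extension syntax (table, column, and data are invented):

CREATE EXTENSION intarray;

CREATE TABLE docs (id serial PRIMARY KEY, tags int[]);

-- gist__intbig_ops condenses each array into a fixed-size bitmap signature
CREATE INDEX docs_tags_idx ON docs USING gist (tags gist__intbig_ops);

-- overlap and containment searches can then use the signature index
SELECT id FROM docs WHERE tags && '{1,2,3}'::int[];
SELECT id FROM docs WHERE tags @> '{42}'::int[];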
[HACKERS] Re: GiST patches for 7.2 (please apply)
Tom Lane wrote:
>
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
> > Teodor finally made patches to current CVS, please review and apply them asap
> > to be in sync with development (last time it was kind of problem)
>
> Checked and committed. Note I did not commit your change to the cube
> regression test:
>
> *** ./contrib/cube/expected/cube.out.orig   Wed Aug 22 16:04:42 2001
> --- ./contrib/cube/expected/cube.out        Wed Aug 22 16:26:25 2001
> ***
> *** 130,142
>   SELECT '1e700'::cube AS cube;
>     cube
>   ---
> ! (inf)
>   (1 row)
>
>   SELECT '-1e700'::cube AS cube;
>   cube
>
> ! (-inf)
>   (1 row)
>
>   SELECT '1e-700'::cube AS cube;
> --- 130,142
>   SELECT '1e700'::cube AS cube;
>     cube
>   ---
> ! (Inf)
>   (1 row)
>
>   SELECT '-1e700'::cube AS cube;
>   cube
>
> ! (-Inf)
>   (1 row)
>
>   SELECT '1e-700'::cube AS cube;
>
> since on my machine "inf" appears to be the correct result. Is this a
> platform dependency, or just a lack of synchronization somewhere else?
>
> regards, tom lane

On my FreeBSD 4.3 box it looks like 'Inf'. Most likely this is a platform dependency.
Re: [HACKERS] POC, WIP: OR-clause support for indexes
Thank you for the review! I'd like to see comments too!

> but more so in the code. :)
> I've had a look over this, and it seems like a great area in which we
> could improve on, and your reported performance improvements are certainly
> very interesting too. However I'm finding the code rather hard to follow,
> which might be a combination of my lack of familiarity with the index
> code, but more likely it's the lack of comments to explain what's going on.

I've added comments and fixed the bugs I found.

> Let's just take 1 function as an example:
> Here there's not a single comment, so I'm just going to try to work out
> what's going on based on the code.
>
> +static void
> +compileScanKeys(IndexScanDesc scan)
> +{
> +    GISTScanOpaque  so = (GISTScanOpaque) scan->opaque;
> +    int        *stack,
> +                stackPos = -1,
> +                i;
> +
> +    if (scan->numberOfKeys <= 1 || so->useExec == false)
> +        return;
> +
> +    Assert(scan->numberOfKeys >= 3);
>
> Why can numberOfKeys never be 2? I looked at what calls this and I can't
> really work it out.

Because here the keys are actually an expression, and an expression can contain one, or three or more nodes, but never two (an AND/OR operation plus its two arguments).

> I'm really also not sure what useExec means as there's no comment in that
> struct member,

Fixed. If useExec == false then the ScanKeys are implicitly ANDed and stored in a plain array.

> and what if numberOfKeys == 1 and useExec == false, won't this Assert()
> fail? If that's not a possible situation then why not?

Fixed.

> +        ScanKey key = scan->keyData + i;
>
> Is there a reason not to use keyData[i]; ?

That's the same as ScanKey key = &scan->keyData[i]; I prefer the first form as clearer, but I could be wrong - there are other places in the code where pointer arithmetic is used.

> +        if (stackPos >= 0 && (key->sk_flags & (SK_OR | SK_AND)))
> +        {
> +            Assert(stackPos >= 1 && stackPos < scan->numberOfKeys);
>
> stackPos >= 1? This seems unnecessary and confusing as the if test surely
> makes that impossible.
>
> +            so->leftArgs[i] = stack[stackPos - 1];
>
> Something is broken here as stackPos can be 0 (going by the if() not the
> Assert()), therefore that's stack[-1].

Fixed.

> stackPos is initialised to -1, so this appears to always skip the first
> element of the keyData array. If that's really the intention, then
> wouldn't it be better to just make the initial condition of the for()
> look i = 1 ?

Done.

> I'd like to review more, but it feels like a job that's more difficult
> than it needs to be due to lack of comments. Would it be possible to
> update the patch to try and explain things a little better?

I hope I have made it cleaner.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: index_or-2.patch.gz (GNU Zip compressed data)
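To make the stack-machine idea concrete, here is an illustrative evaluator for scan keys stored in reverse Polish notation. This is not the patch's code: the SK_OR/SK_AND flags are the ones the patch introduces, INDEX_MAX_KEYS is the stock limit, and the function name and leafResult array are made up.

/*
 * Evaluate scan keys laid out in reverse Polish notation.  Leaf keys push
 * the result of their ordinary comparison; SK_OR/SK_AND nodes pop two
 * operands and push the logical combination.
 */
static bool
evalScanKeysRPN(ScanKey keys, const bool *leafResult, int nkeys)
{
    bool    stack[INDEX_MAX_KEYS];
    int     sp = 0;

    for (int i = 0; i < nkeys; i++)
    {
        if (keys[i].sk_flags & (SK_OR | SK_AND))
        {
            /* logical node: pop two operands, push the combination */
            bool    b = stack[--sp];
            bool    a = stack[--sp];

            stack[sp++] = (keys[i].sk_flags & SK_OR) ? (a || b) : (a && b);
        }
        else
        {
            /* leaf node: push the precomputed key result */
            stack[sp++] = leafResult[i];
        }
    }

    Assert(sp == 1);
    return stack[0];
}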
Re: [HACKERS] WIP: Upper planner pathification
> The basic point of this patch is to apply the generate-and-compare-Paths
> paradigm to the planning steps after query_planner(), which only covers ...
>
> The present patch addresses this problem by inventing Path nodes to
> represent every post-scan/join step

I'm really glad to see that. Separating path nodes for the later steps opens new ways to optimize queries. At first glance, consider

select * from a left outer join b on a.i = b.i limit 1;

The Limit node could be pushed down to the scan over table 'a' if b.i is unique.

I tried to look into the patch and I have a question (one for now): why doesn't LimitPath contain the actual limit/offset values? I have seen a lot of subqueries with LIMIT 1 which could be transformed into an EXISTS subquery (see the sketch after this message).

> So I'd really like to get this into 9.6.

Me too. I applied the patch and can confirm that 'make test' doesn't fail on FreeBSD 10.2. Now I will try to run a kind of TPC-H with and without the patch.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
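For illustration, the kind of rewrite meant by that last remark, using the same invented table names as above:

-- application-style query using a LIMIT 1 subquery:
SELECT *
FROM   a
WHERE  (SELECT 1 FROM b WHERE b.i = a.i LIMIT 1) IS NOT NULL;

-- equivalent EXISTS form, which the planner can already turn into a semi-join:
SELECT *
FROM   a
WHERE  EXISTS (SELECT 1 FROM b WHERE b.i = a.i);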
Re: [HACKERS] WIP: Upper planner pathification
> I do not think the patch will make a lot of performance difference as-is;
> its value is more in what it will let us do later. There are a couple of

Yep, for now on my notebook (best of 5 tries):

% pgbench -i -s 3000
% pgbench -s 3000 -c 4 -j 4 -P 1 -T 60
HEAD      569 tps
patched   542 tps

% pgbench -s 3000 -c 4 -j 4 -P 1 -T 60 -S
HEAD      9500 tps
patched   9458 tps

Looks close to measurement error, but it may be explained by the increased amount of work for planning, including, maybe, a more complicated path tree.

> this kind of optimization to chance. But the patch is big enough already,
> so that (and a lot of other things) are getting left for later.

Agree

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Upper planner pathification
> So I see no evidence for a slowdown on pgbench's SELECT queries.
> Anybody else want to check performance on simple scan/join queries?

I did your tests with configure --enable-depend, pgbench -i -s 100 and a slightly tweaked postgresql.conf, on a notebook with an i7-3520M CPU (2 cores + 2 HT), FreeBSD 10.2.

pgbench -c 4 -j 4 -P 1 -T 60 -S

HEAD          35834 tps avg (35825/35828/35808/35844/35865)
Patched HEAD  35529 tps avg (35516/35561/35527/35534/35510)

~1% slowdown. I can live with that.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
>> select parse_ident(E'X\rXX');
>
> I am sending updated patch - I used json function for correct escaping -
> the escaping behave is same.

Hmm, it doesn't look like it:

% select parse_ident(E'_\005');
ERROR:  identifier contains disallowed characters: "_\u0005"

% select parse_ident(E'\005');
ERROR:  missing identifier: "\u0005"

but

# select parse_ident(E'"\005"');
 parse_ident
-------------
 {\x05}

The error messages above display the disallowed character in the wrong form. One more inconsistency:

# select parse_ident(E'"\005"') as "\005";
 \005
------
 {\x05}

The display output of the actual identifier and of parse_ident differ.

Actually, I can live with both, but are there any other opinions? It seems that at least the difference between an actual identifier and the output of parse_ident should be pointed out in the docs.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Tsvector editing functions
Thanks! Fixed and added tests. Thank you! I did some patch cleanup/fix, but I have some doubt with function's names: 1 to_tsvector: # \df to_tsvector List of functions Schema |Name | Result data type | Argument data types | Type +-+--+-+ pg_catalog | to_tsvector | tsvector | regconfig, text | normal pg_catalog | to_tsvector | tsvector | text| normal pg_catalog | to_tsvector | tsvector | text[] | normal First two variants of to_tsvector make a morphological processing, last one doesn't. 2 to_array # \df *to_array List of functions Schema | Name | Result data type | Argument data types | Type +---+--+-+ pg_catalog | regexp_split_to_array | text[] | text, text | normal pg_catalog | regexp_split_to_array | text[] | text, text, text| normal pg_catalog | string_to_array | text[] | text, text | normal pg_catalog | string_to_array | text[] | text, text, text| normal pg_catalog | to_array | text[] | tsvector| normal Seems, to_array is not a right name compared to other *to_array. I would like to suggest rename both functions to array_to_tsvector and tsvector_to_array to have consistent name. Later we could add to_tsvector([regconfig, ], text[]) with morphological processing. Thoughts? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 4b5ee81..ed0b6be 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -9211,16 +9211,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple setweight - setweight(tsvector, "char") + setweight(vector tsvector, weight "char") tsvector -assign weight to each element of tsvector +assign weight to each element of vector setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') 'cat':3A 'fat':2A,4A 'rat':5A + setweight + setweight by filter + + setweight(vector tsvector, weight "char", lexemes "text"[]) + +tsvector +assign weight to elements of vector that are listed in lexemes array +setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}') +'cat':3A 'fat':2,4 'rat':5A + + + + strip strip(tsvector) @@ -9233,6 +9246,81 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + delete + delete lemexeme + + delete(vector tsvector, lexeme text) + +tsvector +remove given lexeme from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') +'cat':3 'rat':5A + + + + + delete + delete lemexemes array + + delete(vector tsvector, lexemes text[]) + +tsvector +remove any occurrence of lexemes in lexemes array from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) +'cat':3 + + + + + unnest + + unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text) + +setof record +expand a tsvector to a set of rows +unnest('fat:2,4 cat:3 rat:5A'::tsvector) +(cat,{3},{D}) ... + + + + + to_array + + to_array(tsvector) + +text[] +convert tsvector to array of lexemes +to_array('fat:2,4 cat:3 rat:5A'::tsvector) +{cat,fat,rat} + + + + + to_tsvector + array to tsvector + + to_tsvector(text[]) + +tsvector +convert array of lexemes to tsvector +to_tsvector('{fat,cat,rat}'::text[]) +'fat' 'cat' 'rat' + + + + + filter + + filter(vector tsvector, weights "char"[]) +
Re: [HACKERS] Tsvector editing functions
> I saw errors on windows, here is the fix:

Thank you, pushed

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
> I afraid so I cannot to fix this inconsistency (if this is inconsistency -
> the binary values are same) - the parameter of function is raw string with
> processed escape codes, and I have not any information about original
> escape sequences. When you enter octet value, and I show it as hex value,
> then there should be difference. But I have not information about your
> input (octet or hex).
>
> I have the original string of SQL identifier inside parser, executor, but
> I have not original string of function parameter inside function (not
> without pretty complex and long code).

Ok, agreed.

> I am trying describe it in doc (I am sorry for my less level English) in
> new patch. Fixed duplicated oid too.

Edited a bit + fixed some typos and removed unneeded headers; patch attached.

Sorry, I can't find all corner cases at once, but:

SELECT parse_ident(E'"c".X XX');
ERROR:  identifier contains disallowed characters: "\"c"

The error message points to the wrong reason for the error.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: parse_ident-13.patch (binary/octet-stream)
Re: [HACKERS] proposal: function parse_ident
> I hope so the messages are ok now. Few more regress tests added.

Thank you, committed.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> Do reconstructedValues is required now? Wouldn't it be cleaner to use the
> new field on the prefix tree implementation, too?

reconstructedValue is needed to reconstruct the full indexed value, and it should match the type of the indexed column.

> We haven't had specific memory context for reconstructedValues. Why is it
> required for the new field?

Because pg knows the type of reconstructedValue (see above why), but traversalValue is just a void*; the SP-GiST core doesn't know how to free the memory allocated for it.

> + Memory for traversalValues should be allocated in
> + the default context, but make sure to switch to
> + traversalMemoryContext before allocating memory
> + for pointers themselves.
>
> This sentence is not understandable. Shouldn't it say "the memory should
> *not* be allocated in the default context"? What are the "pointers
> themselves"?

Clarified, I hope.

> The box opclass is broken because of the floating point arithmetics of
> the box type. You can reproduce it with the attached script.

We hope it is fixed. At least it seems to be synchronized with seqscan now.

> really need to fix the geometric types, before building more on them.
> They fail to serve the only purpose they are useful for, demonstrating
> features.

Agree, but that's not a matter for this patch.

> It think the opclass should support the cross data type operators like
> box @> point. Ideally we should do it by using multiple opclasses in an
> opfamily. The floating problem will bite us again in here, because some
> of the operators are not using FP macros.

Again, agree. But I suggest doing that in a separate patch.

> The tests check not supported operator "@". It should be "<@". It is nice
> that the opclass doesn't support long deprecated operators.

Fixed.

> + #define LT -1
> + #define GT 1
> + #define EQ 0
>
> Using these numbers is a very well established pattern. I don't think
> macros make the code any more readable.

Fixed.

> + /* SP-GiST API functions */
> + Datum spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);
>
> I guess they should go to the header file.

Removed them, because they are present in the auto-generated file ./src/include/utils/builtins.h

The range patch is unchanged, but still attached to keep them all together.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: q4d-2.patch.gz, traversalValue-2.patch.gz, range-1.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
> Good catch, thanks! Tests were added.

I don't see any objection; is consensus reached? I'm close to committing that...

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] POC, WIP: OR-clause support for indexes
> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Oops, will add shortly.

> The patch would probably benefit from transforming it into a patch series
> - one patch for the infrastructure shared by all the indexes, then one
> patch per index type. That should make it easier to review, and I
> seriously doubt we'd want to commit this in one huge chunk anyway.

Ok, will do it.

> 1) fields in BrinOpaque are not following the naming convention (all the
> existing fields start with bo_)

Fixed.

> 2) there's plenty of places violating the usual code style (e.g. for
> single-command if branches) - not a big deal for WIP patch, but needs to
> get fixed eventually

Hopefully fixed.

> 3) I wonder whether we really need both SK_OR and SK_AND, considering
> they are mutually exclusive. Why not to assume SK_AND by default, and
> only use SK_OR? If we really need them, perhaps an assert making sure
> they are not set at the same time would be appropriate.

In short: possible ambiguity and increased stack machine complexity. Suppose we have the following expression in reverse Polish notation (letters represent a condition, | an OR and & an AND logical operation, and ANDs are omitted):

a b c |

Is it ((a & b) | c), i.e. "a b & c |", or (a & (b | c)), i.e. "a b c | &"? With implicit ANDs both readings collapse to the same sequence. Also, using both SK_ flags makes the code more readable.

> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.

Will add soon.

> 5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
> initialized when (!leftFinished && !rightFinished), but then gets used
> when either part of the condition evaluates to true. Probably should be
>     if (!leftFinished || !rightFinished)
>         cmp = ...

Fixed.

> 6) the code in nodeIndexscan.c should not include call to abort()
>     {
>         abort();
>         elog(ERROR, "unsupported indexqual type: %d",
>              (int) nodeTag(clause));
>     }

Fixed; I just forgot to remove it.

> 7) I find it rather ugly that the paths are built by converting BitmapOr
> paths. Firstly, it means indexes without amgetbitmap can't benefit from
> this change. Maybe that's reasonable limitation, though?

I based that on the following thoughts:
1 code which tries to find an OR-index path would be very similar to the existing generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existing indexes have an amgetbitmap method; only a few don't. The amgetbitmap interface is simpler.
Anyway, I can add an option for generate_or_bitmap to use any index, but in the current state it would just repeat all the work.

> But more importantly, this design already has a bunch of unintended
> consequences. For example, the current code completely ignores
> enable_indexscan setting, because it merely copies the costs from the
> bitmap path.

I'd like to add a separate enable_indexorscan.

> That's pretty dubious, I guess. So this code probably needs to be made
> aware of enable_indexscan - right now it entirely ignores startup_cost in
> convert_bitmap_path_to_index_clause(). But of course if there are
> multiple IndexPaths, the enable_indexscan=off will be included multiple
> times.
>
> 9) This already breaks estimation for some reason. Consider this ... So
> the OR-clause is estimated to match 0 rows, less than each of the clauses
> independently. Needless to say that without the patch this works just
> fine.

Fixed.

> 10) Also, this already breaks some regression tests, apparently because
> it changes how 'width' is computed.

Fixed too.

> So I think this way of building the index path from a BitmapOr path is
> pretty much a dead-end.

I don't think so, because a separate code path to support OR-clauses in indexes would significantly duplicate the BitmapOr generator.

Will send the next version as soon as possible. Thank you for your attention!

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: index_or-3.patch.gz (GNU Zip compressed data)
Re: [HACKERS] POC, WIP: OR-clause support for indexes
> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Not yet, but I will.

> The patch would probably benefit from transforming it into a patch series
> - one patch for the infrastructure shared by all the indexes, then one
> patch per index type. That should make it easier to review, and I
> seriously doubt we'd want to commit this in one huge chunk anyway.

I split it into two:
1 0001-idx_or_core - only planner and executor changes
2 0002-idx_or_indexes - BRIN/GIN/GiST changes with tests

I don't think that splitting the second patch further adds readability, it just increases management difficulties, but if you insist I will split it.

> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.

Added.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: 0001-idx_or_core-v4.patch.gz, 0002-idx_or_indexes-v4.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
>> I don't see any objection; is consensus reached? I'm close to committing that...
>
> I did only cursory review on the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved but I won't get to it
> before weekend.

So, per-patch status:

create-am       ready
generic-xlog    needs comments/docs fixed
bloom-contrib   needs review

I'm the original author of the bloom module and of course I can do some review of it, but it seems more eyes are needed.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> ->nkeys && flag; i++)
>
> It checks flag two times.

Fixed.

> +    boxPointerToRectangle(
> +        DatumGetBoxP(in->scankeys[i].sk_argument),
> +        p_query_rect
> +    );
>
> I don't think this matches the project coding style.

Fixed.

> +    int flag = 1,
>
> Wouldn't it be better as "bool"?

Fixed.

> The regression tests for the remaining operators are still missing.

Oops, they will come soon. I attached all patches to simplify review. Thank you very much!

I'll try to explain with a two-dimensional example over points. ASCII-art:

      |
      |
     1|2
      |
   ---+---
      |P
     3|4
      |
      |

A '+' with a letter next to it represents a centroid - in other words, the point which splits the plane into 4 quadrants - and it is stored in the parent node. 1, 2, 3, 4 are the labels of the quadrants; the labeling is the same for all pictures and all centroids, and I will not show the labels in later pictures to keep the images simple.

Let's add C, a child node (and again, it will split the plane into 4 quadrants):

      |
      |
      +-+---
  X   |B|C
      | |
   ---+-+---
      |P|A
      | |
      | |

A and B are the points of intersection of the lines. So the box PBCA is a bounding box for the points contained in the 3rd quadrant of C (see the labeling above). For example, the point labeled X is not a descendant of the child node with centroid C, because it must be in the branch of the 1st quadrant of the parent node. So each node (except the root) has a limitation within its quadrant. The traversalValue is used to transfer that limitation down the tree.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: traversalValue-2.patch.gz, q4d-3.patch.gz, range-1.patch.gz (GNU Zip compressed data)
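A minimal sketch of the kind of per-branch state meant here, with made-up names (this is not the patch's actual structure):

/*
 * Illustrative only: for the 2-D example above, the traversalValue of a
 * branch can simply carry the bounding box of the quadrant being descended
 * into, e.g. box PBCA when following the 3rd quadrant of C.
 */
typedef struct QuadrantBound
{
    double  xmin, ymin;     /* lower-left corner of the quadrant's bound  */
    double  xmax, ymax;     /* upper-right corner                         */
} QuadrantBound;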
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> tend to think of a *point* as having *zero* dimensions. Would it perhaps
> be more accurate to say we are treating a 2-dimensional box as a point in
> 4-dimensional space?

Exactly, sorry for the ambiguity.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> Isn't this basically the same thing that the cube contrib module does?
> (Which has the added benefit of kNN-capable operators).

No, the cube module introduces a new type - an N-dimensional box - and adds index support for it.

The current patch suggests a non-traditional indexing technique for 2D boxes by treating them as points in 4D space. With such a representation it becomes possible to use the quad-tree technique, which:
1) supports only points as index entries;
2) is already supported by SP-GiST.

Such a technique provides some benefits in comparison with the traditional R-tree, which is implemented in pg with the help of GiST. See Oleg's message.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
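A hedged usage sketch of what this would enable, assuming the opclass ends up as the default SP-GiST opclass for box (table and data are invented):

CREATE TABLE regions (id serial, area box);

CREATE INDEX regions_area_spgist ON regions USING spgist (area);

-- the overlap searches such an index is meant to speed up
SELECT id FROM regions WHERE area && box '(0,0),(10,10)';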
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> + * boxtype_spgist.c
>
> The names on the file header need to be changed, too.

Oops. Fixed.

>> I'll try to explain with two-dimensional example over points. ASCII-art:
>
> Thank you for the explanation. Should we incorporate this with the patch.

Added.

> + cmp_double(const double a, const double b)
>
> Does this function necessary? We can just compare the doubles.

Yes, it compares with limited precision, as the geometry operations do. Renamed to point to the actual arguments.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>
> The "rectangle" variable in here should be renamed.

Fixed.

> + Assert(is_infinite(b) == 0);
>
> This is failing on my test. You can reproduce with the script I have sent.

I didn't know:

# select '(1,inf)'::point;
  point
---------
 (1,inf)

Fixed.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>>
>> evalRangeBox() initializes part of RectBox, so, it could not palloc its
>> result. Other methods use the same signature just for consistency.
>
> I couldn't understand it. evalRangeBox() can palloc and return the result.
> evalRectBox() can return the result palloc'ed by evalRangeBox(). The
> caller wouldn't need to palloc anything.

evalRangeBox() is used to initialize fields of the RangeBox in evalRectBox(). If evalRangeBox() pallocs its result, then we need to copy that result into the RangeBox struct and free it. Let both functions use the same interface.

> I went through all other variables:
> + int r = is_infinite(a);
> + double x = *(double *) a;
> + double y = *(double *) b;
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
> + BOX *leafBox = DatumGetBoxP(in->leafDatum);
>
> Shouldn't they be "const", too?

They could be, but it isn't required. To be in a consistent state I've removed the const modifier where possible.

>> +/*
>> + * Begin. This block evaluates the median of coordinates of boxes
>> + */
>>
>> I would rather explain what the function does on the function header.
>>
>> fixed
>
> The "end" part of it is still there.

Oops again, fixed.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them. The
>> search shouldn't continue after allTheSame case.
>>
>> Seems, yes. An spgist tree could contain long branches with allTheSame.
>
> Does SP-GiST allows any node under allTheSame to not being allTheSame?

No, SP-GiST doesn't allow that.

> Not setting traversalValues for allTheSame worked fine with my test.

It works until an allTheSame branch contains only one inner node. Otherwise the traversalValue will be freed several times, see spgWalk():

# select i as id, '1,2,3,4'::box as b into x from generate_series(1,100) i;
# create index ix on x using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump

gdb:
#0  0x00080143564a in thr_kill () from /lib/libc.so.7
#1  0x000801435636 in raise () from /lib/libc.so.7
#2  0x0008014355b9 in abort () from /lib/libc.so.7
#3  0x00a80739 in in_error_recursion_trouble () at elog.c:199
#4  0x00abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5  0x0053330c in freeScanStackEntry (so=0x801e8d358, stackEntry=0x801e935d8) at spgscan.c:47
#6  0x00532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358, scanWholeIndex=1 '\001', storeRes=0x532d10 ...

> + if (in->allTheSame)
>
> Most of the things happening before this check is not used in there.
> Shouldn't we move this to the top of the function?

Yep, fixed.

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);
>
> This could go before allTheSame block.

Yep, fixed.

I've attached all patches again. Thank you very much!

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: q4d-4.patch.gz, traversalValue-2.patch.gz, range-1.patch.gz (GNU Zip compressed data)
Re: [HACKERS] Draft release notes for next week's releases
>> Does include ICU mean that collation handling is identical across
>> platforms? E.g. a query on Linux involving string comparison would yield
>> the same result on MacOS and Windows?
>
> Yes, it does and that's the most important issue for us.

Yes, exactly.

The attached patch adds support for libicu with the configure flag --with-icu. The patch is rebased to current HEAD; I hope it works. It's based on the work at https://people.freebsd.org/~girgen/postgresql-icu/readme.html, and it was migrated to 9.5 with abbreviated keys support.

The patch in its current state is not ready to commit, of course.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: libicu-8.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
> Does that mean this patch should be closed or is there more remaining to
> commit?

Petr promised to check the English in the comments/docs of the generic-wal patch this week; the bloom patch depends on it. The bloom patch is ready from my point of view.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> And here it is. It's not perfect but it's better (I am not native speaker
> either). It's same as v12, just changed comments somewhat.

Thank you, but I have a problem with applying it:

% patch -p1 < ~/Downloads/0002-generic-xlog.13.patch
Hmm...  Looks like a new-style context diff to me...
...
|diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
|new file mode 100644
|index ...7ca03bf
|*** a/src/backend/access/transam/generic_xlog.c
|--- b/src/backend/access/transam/generic_xlog.c
--
(Creating file src/backend/access/transam/generic_xlog.c...)
Patching file src/backend/access/transam/generic_xlog.c using Plan A...
patch: malformed patch at line 634: diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> I incorporated your changes and did some additional refinements on top of
> them still. Attached is delta against v12, that should cause less issues
> when merging for Teodor.

But the last version is the 13th?

BTW, it would be cool to add docs to the VII. Internals chapter; the description should be similar to the index interface description, i.e. how to use the generic WAL interface.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> GenericXLogStart(Relation relation)
> {
> ...
>     if (genericXlogStatus != GXLOG_NOT_STARTED)
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("GenericXLogStart: generic xlog is already started")));

Hmm, it seems generic WAL will be left in an incorrect state if an exception occurs between the GenericXLogStart() and GenericXLogFinish() calls, because the static variable genericXlogStatus will still contain the GXLOG_LOGGED/GXLOG_UNLOGGED status.

I suppose it could be solved in different ways:

- Remove all the static variables, so that GenericXLogStart() returns a struct (object) which encapsulates all data needed for generic WAL to work. As far as I can see, in case of an exception there isn't any need for extra cleanup. It would also allow using generic WAL for two or more relations at the same time, although I don't know any useful example of such a feature.

- Add a callback via RegisterResourceReleaseCallback() which will clean up the state of the genericXlogStatus variable.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
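A sketch of the object-style interface suggested in the first option: GenericXLogStart() returns a state object instead of flipping a static flag, so an error thrown between start and finish leaves no global state behind. Function names follow the discussion; the committed API may differ in detail.

GenericXLogState *state;
Page        page;

state = GenericXLogStart(rel);
page = GenericXLogRegisterBuffer(state, buffer, 0);

/* ... modify the returned page image ... */

GenericXLogFinish(state);       /* or GenericXLogAbort(state) on failure */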
Re: [HACKERS] WIP: Access method extendability
> as we discussed recently [1] you should avoid leaving "holes" with
> uninitialized data in structures. Please fix this or provide a comment
> that describes why things are done here the way they are done.
>
> [1] http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net

That discussion is about SQL-level types which could be stored on disk, not about in-memory structs.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
Thank you, pushed

Emre Hasegeli wrote:
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation. Should we incorporate this with the patch.
>>
>> added
>
> I have worked on the comments of the patch. It is attached. I hope it
> looks more clear than it was before.
>
>>> + cmp_double(const double a, const double b)
>>>
>>> Does this function necessary? We can just compare the doubles.
>>
>> Yes, it compares with limited precision as it does by geometry
>> operations. Renamed to point actual arguments.
>
> I meant that we could use FP macros directly instead of this function.
> Doing so is also more correct. I haven't tried to produce the problem,
> but this function is not same as using the macros directly. For
> differences smaller than the epsilon, it can return different results.
> I have removed it on the attached version.
>
>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>>
>>> The "rectangle" variable in here should be renamed.
>>
>> fixed
>
> I found a bunch of those too and renamed for clarity. I have also
> reordered the arguments of the helper functions to keep them consistent.
>
>> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox().
>> If evalRangeBox() will palloc its result then we need to copy its result
>> into RangeBox struct and free. Let both functions use the same interface.
>
> I found it nicer to copy and edit the existing value than creating it in
> two steps like this. It is also attached.
>
>> it works until allthesame branch contains only one inner node. Otherwise
>> traversalValue will be freed several times, see spgWalk().
>
> It just works, when traversalValues is not set. It is also attached.
>
> I have also added the missing regression tests. While doing that I
> noticed that some operators are missing and also added support for them.
> It is also attached with the updated version of my test script.
>
> I hope I haven't changed the patch too much. I have also pushed the
> changes here:
>
> https://github.com/hasegeli/postgres/commits/box-spgist

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> Sure. I attached two patches. But notice that pg_trgm.limit should be
> used with this command:
>     SHOW "pg_trgm.limit";
> If you will use this command:
>     SHOW pg_trgm.limit;
> you will get the error:
>     ERROR:  syntax error at or near "limit"
>     LINE 1: SHOW pg_trgm.limit;
>                          ^
> This is because "limit" is keyword I think.

It's easy to fix in gram.y:

@@ -1499,7 +1499,7 @@ set_rest_more:  /* Generic SET syntaxes: */
                        ;
 var_name:      ColId                           { $$ = $1; }
-                       | var_name '.' ColId
+                       | var_name '.' ColLabel
                                { $$ = psprintf("%s.%s", $1, $3); }
                ;

ColId doesn't contain reserved_keyword. It's impossible to change the initial part of var_name to ColLabel because of a lot of conflicts in the grammar, but it can easily be changed for the second part of var_name.

It seems like an improvement in any case, but sml_limit, or similarity_limit, or even similarity_threshold is a more preferable name than plain "limit". In the future we could introduce more thresholds/limits. Also, should get/set_limit emit a warning about deprecation?

Some notes about the substring patch itself:

1 trgm2.data contains too many duplicates (like Barkala or Bakalan). Are they really needed for testing?

2 I agree with Jeff Janes about the <<-> and <->> operators. They are needed.
(http://www.postgresql.org/message-id/CAMkU=1zynkqfkr-j2_uq8lzp0uho8i+ledfwgt77czk_tnt...@mail.gmail.com)

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> The behavior of this function is surprising to me.
>
> select substring_similarity('dog' , 'hotdogpound') ;
>  substring_similarity
> ----------------------
>                  0.25

Substring search was designed to search for a similar word in a string:

contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
 substring_similarity
----------------------
                 0.75

contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
 substring_similarity
----------------------
                    1

It seems to me that users search for words in a long string. But I agree that a more detailed explanation is needed, and maybe we need to change the feature name to fuzzywordsearch or something else; I can't imagine what.

> Also, should we have a function which indicates the position in the 2nd
> string at which the most similar match to the 1st argument occurs?
>
> select substring_similarity_pos('dog' , 'hotdogpound') ;
>
> answering: 4

Interesting. I think it will be useful in some cases.

> We could call them <<-> and <->> , where the first corresponds to <% and
> the second to %>

Agree

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Tsvector editing functions
Some notes:

1 tsin in the documentation doesn't look like a good name. Changed to vector, similar to other places.

2 I did some editing around freeing memory / forgotten names etc.

3 It seems to me that tsvector_unnest() could be seriously optimized for large tsvectors: with the current coding it detoasts/decompresses the tsvector value on each call. It is much better to do it once, in the multi_call_memory_ctx context, at first-call initialization (see the sketch after this message).

4 It seems debatable to return an empty array for positions/weights if they are absent:

=# select * from unnest('a:1 b'::tsvector);
 lexeme | positions | weights
--------+-----------+---------
 a      | {1}       | {D}
 b      | {}        | {}

I think it's better to return NULL in this case.

5 array_to_tsvector/tsvector_setweight_by_filter/tsvector_delete_arr/tsvector_filter don't check or pay attention to NULL elements in the input arrays.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
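A sketch of what point 3 suggests, using the standard set-returning-function machinery from funcapi.h (this is not the actual patch code):

if (SRF_IS_FIRSTCALL())
{
    FuncCallContext *funcctx = SRF_FIRSTCALL_INIT();
    MemoryContext    oldcontext;

    oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
    /* detoast once; the copy survives across calls because of the context */
    funcctx->user_fctx = PG_GETARG_TSVECTOR_COPY(0);
    MemoryContextSwitchTo(oldcontext);
}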
Re: [HACKERS] Tsvector editing functions
Some notices: 1 tsin in documentation doesn't look like a good name. Changed to vector similar to other places. 2 I did some editorization about freeing memory/forgotten names etc Ooops, forgot to attach -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 139aa2b..9c294e3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -9168,16 +9168,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple setweight - setweight(tsvector, "char") + setweight(vector tsvector, weight "char") tsvector -assign weight to each element of tsvector +assign weight to each element of vector setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') 'cat':3A 'fat':2A,4A 'rat':5A + setweight + setweight by filter + + setweight(vector tsvector, weight "char", lexemes "text"[]) + +tsvector +assign weight to elements of vector that are listed in lexemes array +setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}') +'cat':3A 'fat':2,4 'rat':5A + + + + strip strip(tsvector) @@ -9190,6 +9203,84 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + delete + delete lemexeme + + delete(vector tsvector, lexeme text) + +tsvector +remove given lexeme from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') +'cat':3 'rat':5A + + + + + delete + delete lemexemes array + + delete(vector tsvector, lexemes text[]) + +tsvector +remove any occurrence of lexemes in lexemes array from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) +'cat':3 + + + + + unnest + + unnest(tsvector) + +setof anyelement +expand a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights. +unnest('fat:2,4 cat:3 rat:5A'::tsvector) +cat{3}{A} +fat{2,4} {D,D} +rat{5}{A} +(3 rows) + + + + + to_array + + to_array(tsvector) + +text[] +convert tsvector to array of lexemes +to_array('fat:2,4 cat:3 rat:5A'::tsvector) +{cat,fat,rat} + + + + + to_tsvector + array to tsvector + + to_tsvector(text[]) + +tsvector +convert array of lexemes to tsvector +to_tsvector('{fat,cat,rat}'::text[]) +'fat' 'cat' 'rat' + + + + + filter + + filter(vector tsvector, weights "char"[]) + +tsvector +Select only elements with given weights from vector +filter('fat:2,4 cat:3b rat:5A'::tsvector, '{a,b}') +'cat':3B 'rat':5A + + + + to_tsquery to_tsquery( config regconfig , query text) diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index d66b4d5..32033fa 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -1326,6 +1326,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank + +Full list of tsvector-related functions available in . + + diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index a3f1c361..e7ea270 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -14,6 +14,7 @@ #include "postgres.h" +#include "access/htup_details.h" #include "catalog/namespace.h" #include "catalog/pg_type.h" #include "commands/trigger.h" @@ -65,6 +66,7 @@ typedef struct #define STATHDRSIZE (offsetof(TSVectorStat, data)) static Datum tsvector_update_trigger(PG_FUNCTION_ARGS, bool config_column); +static int tsvector_bsearch(TSVector tsin, char *lexin, int lexin_len); /* * Order: haspos, len, word, for al
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
Hi!

Interesting feature, but it's not very obvious how to use it. I'd like to see some example(s) in the documentation.

And I see an implementation of an AVL tree in the psql source code (src/bin/psql/crosstabview.c). Postgres already has a red-black tree implementation in src/include/lib/rbtree.h and src/backend/lib/rbtree.c. Is there any advantage to using an AVL tree here? I have some doubts about that, and this implementation, obviously, will not be well tested. But I see in the comments that the implementation is reduced to insert only and it doesn't use the fact that the tree is ordered, so even a hash could be used.

Daniel Verite wrote:
> Pavel Stehule wrote:
>
>> 1. maybe we can decrease name to shorter "crossview" ??
>> I am happy with crosstabview too, just crossview is correct too, and shorter
>
> I'm in favor of keeping crosstabview. It's more explicit, only 3
> characters longer and we have tab completion anyway.
>
>> 2. Columns used for ordering should not be displayed by default.
>> I can live with current behave, but hiding ordering columns is much more
>> practical for me
>
> I can see why, but I'm concerned about a consequence: say we have 4
> columns A,B,C,D and user does
>   \crosstabview +A:B +C:D
> If B and D are excluded by default, then there's nothing left to display
> inside the grid. It doesn't feel quite right. There's something
> counter-intuitive in the fact that values in the grid would disappear
> depending on whether and how headers are sorted. With the 3rd argument,
> we let the user decide what they want to see.
>
>> 3. This code is longer, so some regress tests are recommended - attached
>> simple test case
>
> I've added a few regression tests to the psql testsuite based on your
> sample data. New patch with these tests included is attached, make check
> passes.
>
> Best regards,

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
> rebased, messages changes per Tom's proposal

Cool feature; I have needed it a lot at times. But it seems there are some bugs in error processing.

1 The following query is okay:

# select * from parse_ident(E'"Some \r Schema".someTable');
         parse_ident
------------------------------
 {"Some \r Schema",sometable}

but:

% select * from parse_ident(E'"Some \r Schema".9someTable');
 Schema".9someTable"tifier after "." symbol: ""Some

The carriage return is not escaped in the error message. I suppose any other special characters will not be escaped either.

2
# select * from parse_ident('.someTable');
ERROR:  missing identifier after "." symbol: ".someTable"

Why AFTER instead of BEFORE?

3 The function successfully truncates a long identifier, but not in the case of a quoted identifier.

select length(a[1]), length(a[2]) from parse_ident('"xx".yyyyy') as a ;
 length | length
--------+--------
    414 |     63

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"
> It seems that *scanf() with %s format occures only here:
> - check.c - get_bin_version()
> - server.c - get_major_server_version()
> - filemap.c - isRelDataFile()
> - pg_backup_directory.c - _LoadBlobs()
> - xlog.c - do_pg_stop_backup()
> - mac.c - macaddr_in()
> I think here sscanf() do not works with the UTF-8 characters. And probably
> this is only spell.c issue.

Hmm. Here, src/backend/access/transam/xlog.c read_tablespace_map() using %s in scanf looks suspicious. I don't fully understand it, but it looks like it tries to read an oid as a string, so it should be safe in the usual case.

Next, _LoadBlobs() reads a file name (fname) with the help of sscanf. Could the file name be in UTF-8 encoding here?

> I agree that previous patch is wrong. Instead of using new
> parse_ooaffentry() function maybe better to use sscanf() with %ls format.
> The %ls format is used to read a wide character string.

Does the %ls modifier exist everywhere? Apple docs say
(https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sscanf.3.html):

    s ... If an l qualifier is present, the next pointer must be a pointer to
    wchar_t, into which the input will be placed after conversion by mbrtowc

Actually, that means a wchar2char() call should be used, but it relies on wcstombs[_l], which might not be present on some platforms. Does it mean that the l modifier for strings is present too, or not? What do we need to do if %ls exists but wcstombs[_l] doesn't?

I'm going a bit crazy with locale problems, and it seems to me that Artur's patch is a good idea. Actually, I don't remember exactly, but it seems commit 7ac8a4be8946c11d5a6bf91bb971b9750c1c60e5 introduced parse_affentry() instead of the corresponding sscanf precisely to avoid problems with encoding and scanf.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support
> I duplicate the patch here.

It's a very good thing to update the dictionary code to support modern versions. And thank you for improving the documentation. I'm also impressed by the long description in the spell.c header.

Some notes about the code:

1 struct SPELL. Why do you remove the union p? You left a comment about using the d struct instead of the flag field, and as far as I can see it's a correct comment. It increases the size of the SPELL structure.

2 struct AFFIX. I agree with Alvaro that the sum of the sizes of the bit fields should be less than or equal to the size of an integer. Otherwise, I suppose, we can get undefined behavior. Please split the bit fields across two integers.

3 unsigned char flagval[65000]; Is it forbidden to use 65536? In any case, decodeFlag() doesn't restrict its return value. I suggest enlarging the array to 1<<16 and adding a limit on the return value of decodeFlag().

4 I'd like to see a short comment describing at least the new functions.

5 Please add tests for the new code.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>  substring_similarity
>>> ----------------------
>>>                  0.25
>>
>> Substring search was designed to search for a similar word in a string:
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>>  substring_similarity
>> ----------------------
>>                  0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>>  substring_similarity
>> ----------------------
>>                     1
>
> Hmm, this behavior looks too much like magic to me. I mean, a substring is
> a substring -- why are we treating the space as a special character here?

Because it isn't a regex-style substring search. Since its beginning, pg_trgm has worked over words in a string:

contrib_regression=# select similarity('block hole', 'hole black');
 similarity
------------
   0.571429

contrib_regression=# select similarity('block hole', 'black hole');
 similarity
------------
   0.571429

It ignores spaces between words and the word order.

I agree that substring_similarity is a confusing name, but actually it searches for the word in the second argument that is most similar to the first argument and returns their similarity.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> I have attached a new version of the patch. It fixes error of operators
> <->> and %>:
> - operator <->> did not pass the regression test in CentOS 32 bit (gcc
>   4.4.7 20120313).
> - operator %> did not pass the regression test in FreeBSD 32 bit (gcc
>   4.2.1 20070831).
> It was because of variable optimization by gcc.

Fixed with a volatile modifier, right?

I'm close to pushing these patches, but I still have doubts about the names, and I'd like to see comments from English speakers:

1 the sml_limit GUC variable (options: similarity_limit, sml_threshold)
2 subword_similarity(). Actually, it finds the most similar word (not substring!) in the whole string. word_similarity? word_in_string_similarity?

substring_similarity_pos() could be a separate patch.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> 1 - sml_limit to similarity_limit. sml_threshold is difficult to write I
> think, similarity_limit is more simple.

It seems to me that threshold is the right word by meaning. sml_threshold is my choice.

> 2 - subword_similarity() to word_similarity().

Agree, per Mike Rylander's opinion in this thread.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
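A hedged usage sketch under the agreed function name (assuming the rename lands as discussed; the threshold GUC name was still being debated at this point):

SELECT word_similarity('dog', 'hot dog pound');   -- 1: 'dog' matches a whole word
SELECT word_similarity('dog', 'hotdogpound');     -- lower: no word boundary around 'dog'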
Re: [HACKERS] Improved error reporting in format()
Thank you, committed

Jim Nasby wrote:
> On 1/2/16 5:57 PM, Jim Nasby wrote:
>> Attached patch clarifies that %-related error messages with hints as
>> well as (IMHO) improving the clarity of the message:
>
> Sorry, forgot to update regression tests. New patch attached.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] GinPageIs* don't actually return a boolean
>> One more option for the patch:
>>
>> #define GinPageIsLeaf(page)  ((bool)(GinPageGetOpaque(page)->flags & GIN_LEAF))
>>
>> Seems it will work on any platform with a built-in bool. But I don't know
>> whether it will work with 'typedef char bool' if a high bit is set.
>
> That's true, but it doesn't really seem like a reason not to commit this
> patch. I mean, the coding here is (a) dangerous by your own admission and
> (b) actually breaks on platforms for which we allege support. If we find
> out that somebody has implemented an int-width bool we'll have some
> bigger decisions to make, but I don't see any particular reason why we've
> got to make those decisions now.

+1

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
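For comparison, a form that yields 0/1 regardless of how bool is typedef'ed, avoiding the char-truncation concern (illustrative; not necessarily the committed wording):

#define GinPageIsLeaf(page)   ((GinPageGetOpaque(page)->flags & GIN_LEAF) != 0)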
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
On Thu, Feb 11, 2016 at 9:56 AM, Teodor Sigaev wrote: 1 - sml_limit to similarity_limit. sml_threshold is difficult to write I think, similarity_limit is more simple. It seems to me that threshold is right word by meaning. sml_threshold is my choice. Why abbreviate it like that? Nobody's going to know that "sml" stands for "similarity" without consulting the documentation, and that sucks. Ok, I don't have any objections. I have worked a lot on various similarity modules, so sml has become natural to me. That's why I was asking. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
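[Editorial illustration] A rough sketch of how the renamed function and an unabbreviated GUC would look in use, assuming the word_similarity name and a spelled-out pg_trgm threshold setting are what finally get adopted; the exact names below are an assumption, not the committed result:

    -- hypothetical final names, for illustration only
    SET pg_trgm.word_similarity_threshold = 0.5;
    SELECT word_similarity('dog', 'hot dogpound');   -- expected ~0.75, as in the examples earlier in the thread
    SELECT 'dog' <% 'hot dogpound';                  -- operator form: true if above the threshold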
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On 2/12/16 8:20 AM, Eugene Kazakov wrote: TAP-tests need two Perl modules: Test::More and IPC::Run. I think those modules are part of a standard Perl installation. IPC::Run is not, at least not for perl from the ports tree in FreeBSD. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
It's a great pity, but the author is not able to continue work on this patch, and I would like to raise this flag. I'd like to add some comments about the patches: the traversalValue patch adds an arbitrary value associated with a branch in the SP-GiST tree walk. Unlike reconstructedValue it can be just a pointer; it doesn't have to be a regular pgsql type. Also, it can be used independently from reconstructedValue. This patch is used by both of the following attached patches. The range patch just switches the range opclasses from using reconstructedValue to traversalValue. reconstructedValue was used just because it was an acceptable workaround in the case of the range type: the range opclass stores a full value in leaves and doesn't need reconstructedValue to return tuples in an index-only scan. See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us The q4d patch implements an index over boxes using SP-GiST. The basic idea was the observation that the range opclass treats one-dimensional ranges as 2D points. Following this idea, we can think of a 2D box (which is 2 points, or 4 numbers) as representing a 4D point. We hope that this approach will be much more effective than a traditional R-tree in the case of many overlaps in the box collection. Performance results are coming shortly. In the near future (say, tomorrow) I hope to propose an opclass over polygons based on this approach. Alexander Lebedev wrote: Hello, Hacker. * [PATCH] add a box index to sp-gist We have extended sp-gist with an index that keeps track of boxes. We have used ideas underlying the sp-gist range implementation to represent 2D boxes as points in 4D space. We use a quad tree analogue, but in 4-dimensional space. We call this tree q4d. Each node of this tree is a box (a point in 4D space) which splits space into 16 hyperrectangles. Rationale: r-tree assumes that the boxes we're trying to index don't overlap much. When this assumption fails, r-tree performs badly, while our proposal to represent a rectangle as a point in 4D space solves this problem. NB: the index uses traversalValue introduced in a separate patch. * [PATCH] add traversalValue in sp-gist During implementation of the box index for sp-gist we saw that we only keep rectangles, but to determine traversal direction we may need to know the boundaries of a hyperrectangle. So we calculate them on the fly and store them in traversalValue, available when traversing child nodes, because otherwise we wouldn't be able to calculate them from inside the inner_consistent function call. This patch was written by Teodor Sigaev. * [PATCH] change reconstructValue -> traversalValue in range_spgist.c During traversal, local context is insufficient to pick the traversal direction. As of now, this is worked around with the help of reconstructValue. reconstructValue only stores data of the same type as a tree node, that is, a range. We have written a patch that calculates auxiliary values and makes them accessible during traversal. We then use traversalValue in a new box index and change range_spgist.c to use it in place of reconstructValue. NB: apply this patch after the traversalValue patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ traversalValue-1.patch.gz Description: GNU Zip compressed data range-1.patch.gz Description: GNU Zip compressed data q4d-1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
TBH, this sounds like a completely terrible idea. There are far too many sorts of dependencies across which one would not expect ownership to propagate; for example, use of a function in a view, or even just a foreign key dependency between two tables. I'm not even clear that there are *any* cases where this behavior is wanted, other than perhaps ALTER OWNER on an extension --- and even there, what you would want is altering the ownership of the member objects, but not everything that depends on the member objects. So basically, a generic CASCADE facility sounds like a lot of work to produce something that would seldom be anything but a foot-gun. DELETE FROM or TRUNCATE could be a foot-gun too, but that's not a reason to remove them. I faced this problem when I tried to change the owner of a database with all the objects inside it. I think this feature could be useful, although it should be restricted to superusers only. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
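[Editorial illustration] A sketch of the use case being described. The CASCADE form below is only the hypothetical syntax proposed in this thread, not an existing command; the REASSIGN OWNED form is the existing command that already covers the "change the owner of a database and everything inside it" case, assuming all the objects belong to a single old role:

    -- hypothetical syntax discussed in this thread
    ALTER DATABASE mydb OWNER TO new_owner CASCADE;

    -- existing workaround: connect to mydb, then transfer everything owned by old_owner
    REASSIGN OWNED BY old_owner TO new_owner;
    ALTER DATABASE mydb OWNER TO new_owner;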
Re: [HACKERS] proposal: function parse_ident
I missed some of my edits. Updated patch with those in place attached. I'm sorry, but special characters still aren't escaped correctly in error messages: % select parse_ident(E'X\rXX'); XX""X Time: 0,510 ms -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Next version of patch is attached: * Documentation for CREATE ACCESS METHOD command is added. * Refactoring and comments for bloom contrib. Cool work, nice to see. Some comments: 1 create-am.7.patch: function lookup_index_am_handler_func(): why doesn't it emit an error in case of empty input? If it checks the signature of the function and an empty handler is not allowed, then it seems all checks of the handler should be moved into lookup_index_am_handler_func(). 2 create-am.7.patch: The comment near get_am_name() incorrectly mentions the get_am_oid function. 3 create-am.7.patch: get_am_name(Oid amOid, char amtype). The amtype argument seems like overengineering. get_am_name() is used only in error messages, and an additional check in it will do nothing or even confuse the user. 4 create-am.7.patch: Inconsistency with psql help. As far as I can see from the code, it's forbidden to create an access method without a handler: postgres=# \h create access method Command: CREATE ACCESS METHOD Description: define a new access method Syntax: CREATE ACCESS METHOD name TYPE INDEX [ HANDLER handler_function | NO HANDLER ] postgres=# create access method yyy type index no handler; ERROR: syntax error at or near "no" LINE 1: create access method yyy type index no handler; 5 create-am.7.patch: file src/bin/pg_dump/pg_dump.h. Extra comma near DO_POLICY: *** 77,83 DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY } DumpableObjectType; typedef struct _dumpableObject --- 78,84 DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY, } DumpableObjectType; 6 generic-xlog.7.patch: writeDelta(). It seems that even under the USE_ASSERT_CHECKING define, checking the diff by applying it is too expensive. Maybe we should use a GENERIC_WAL_DEBUG macro here? 7 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c It's unclear to me what MATCH_THRESHOLD does or is intended for. Please add comments here. 8 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c. Again, it's unclear to me what the motivation is for the size of PageData.data. 9 generic-xlog.7.patch: GenericXLogRegister(). Emitting an error if the caller tries to register a buffer which is already registered doesn't seem like a good idea. In practice, say in SP-GiST, a buffer variable is used and the page can easily be obtained from the buffer. I suppose GenericXLogRegister() should not emit an error and should ignore the isnew flag in case of a second registration of the same buffer. 10 bloom-contrib.7.patch: blutils.c: In function 'initBloomState': blutils.c:128:20: warning: variable 'opaque' set but not used [-Wunused-but-set-variable] BloomPageOpaque opaque; 11 I'd really like to see regression tests (TAP tests?) for replication with generic xlog. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
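[Editorial illustration] A minimal sketch of the command being reviewed here, assuming the handler function exported by the bloom contrib module is called blhandler (the names follow the patch series and could differ in the committed version):

    -- register the index access method provided by the bloom extension's handler
    CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler;

    -- afterwards it can be used like any other index AM
    CREATE TABLE t (a int, b int);
    CREATE INDEX t_bloom_idx ON t USING bloom (a, b);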
Re: [HACKERS] GIN data corruption bug(s) in 9.6devel
Thank you for remembering this problem, at least for me. Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs): 9.5 : 2428 MB 9.6 + alone cleanup : 730 MB 9.6 + pending lock : 488 MB Interesting, I don't see why alone_cleanup and pending_lock differ so much. I'd like to understand that; does somebody have a good theory? The one suspicious point in the pending_lock patch is the exception in ProcSleep; this exception may cause problems in the future. So that's quite a significant difference, I guess. The load duration for each version look like this: 9.5 : 1415 seconds 9.6 + alone cleanup : 1310 seconds 9.6 + pending lock : 1380 seconds I'd say I'm happy with sacrificing ~5% of time in exchange for ~35% reduction of index size. I think the alone_cleanup patch is faster because a regular insert can break off its cleanup work if autovacuum is waiting to begin the cleanup, so the inserting process can return earlier from the pending-list cleanup. Attached is just a rebased v2 of the alone_cleanup patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ gin_alone_cleanup-3.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
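[Editorial illustration] For readers following the pending-list discussion: in the 9.6 cycle the GIN pending list can also be flushed by hand, which is convenient when comparing index bloat between patches like the ones above. A minimal sketch, assuming a GIN index named gin_idx and the 9.6 gin_clean_pending_list() function:

    -- flush a GIN index's pending list into the main index structure manually
    SELECT gin_clean_pending_list('gin_idx'::regclass);

    -- per-index knob for how large the pending list may grow (value in kB)
    ALTER INDEX gin_idx SET (gin_pending_list_limit = 4096);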
Re: [HACKERS] GIN data corruption bug(s) in 9.6devel
Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs): 9.5 : 2428 MB 9.6 + alone cleanup : 730 MB 9.6 + pending lock : 488 MB Attached is a modified alone_cleanup patch which, like the pending_lock patch, doesn't break off the cleanup process, but the attached patch doesn't touch lock management. Tomas, if you can, please repeat the test with this patch. If not, I will try to do it, but later. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ gin_alone_cleanup-3.1.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Phrase search distance syntax
Sorry to be asking another phrase search syntax question, and so close to final release, but ... Really close... Why does the phrase distance operator assume <1> means adjacent words, and not <0>. (FYI, <-> is the same as <1>.) Because: 1) it is the result of subtracting the words' positions; 2) <0> can be used as a special case, such as a word with two infinitives: # create text search dictionary xx (template = 'ispell', DictFile='ispell_sample', AffFile='ispell_sample'); # alter text search configuration english ALTER MAPPING FOR asciiword WITH xx, english_stem; # select to_tsvector('english', 'bookings'); to_tsvector -- 'book':1 'booking':1 # select to_tsvector('english', 'bookings') @@ 'book <0> booking'; ?column? -- t (1 row) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
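[Editorial illustration] To make the distance semantics above concrete, a small sketch using to_tsquery's phrase operators; these queries only restate the rule that <-> is <1>, i.e. a position difference of exactly one, and are not taken from the original thread:

    SELECT to_tsvector('english', 'hot dog') @@ to_tsquery('english', 'hot <-> dog');  -- true: positions 1 and 2, distance 1
    SELECT to_tsvector('english', 'hot dog') @@ to_tsquery('english', 'hot <2> dog');  -- false: the words are not exactly 2 positions apart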
Re: [HACKERS] PinBuffer() no longer makes use of strategy
if (buf->usage_count < BM_MAX_USAGE_COUNT) if (BUF_STATE_GET_USAGECOUNT(buf_state) != BM_MAX_USAGE_COUNT) being prone to paranoia, I prefer the first, but I've seen both styles in the code so I don't know if it's worth futzing with. Ok, let's be paranoid and do it the same way as before. A revised patch is attached. I see the change was made in the 9.6 release cycle in commit 48354581a49c30f5757c203415aa8412d85b0f70 on April 10. Does that mean the fix should be backpatched too? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Wraparound limits
Hi! I have a question about setting the transaction wraparound limits. Function SetTransactionIdLimit() in access/transam/varsup.c: 1) xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit < FirstNormalTransactionId) xidWrapLimit += FirstNormalTransactionId; Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2? 2) xidStopLimit = xidWrapLimit - 100; if (xidStopLimit < FirstNormalTransactionId) xidStopLimit -= FirstNormalTransactionId; xidWarnLimit = xidStopLimit - 1000; if (xidWarnLimit < FirstNormalTransactionId) xidWarnLimit -= FirstNormalTransactionId; Why does it use '-' instead of '+' if the variable < FirstNormalTransactionId? In this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > xidStopLimit... Thank you. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
value-to-be-compressed. (first_success_by is 1024 in the default set of compression parameters.) Curious idea: we could swap the JEntry array and the values: values at the beginning of the type would be caught by pg_lzcompress. But we would need to know the offset of the JEntry array, so the header would grow to 8 bytes (actually, it would be a varlena header!) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
Curious idea: we could swap JEntry array and values: values in the begining of type will be catched by pg_lzcompress. But we will need to know offset of JEntry array, so header will grow up till 8 bytes (actually, it will be a varlena header!) Maybe I wasn't clear: the jsonb type would start with the string collection instead of the JEntry array; the JEntry array would be placed at the end of the object/array. So pg_lzcompress would find repeated 4-byte pieces in the first 1024 bytes of the jsonb. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
Thank you, pushed. I just made the test table permanent. Anastasia Lubennikova wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested As I can see, this bugfix was already discussed and reviewed. All mentioned issues are fixed in the latest version of the patch So I mark it "Ready for committer". Thank you for fixing it! The new status of this patch is: Ready for Committer -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you for your suggestions, do not hesitate to ask any questions, concurrency and GIN both are very interesting topics. I had a look at the patch and found an issue. Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves(): /* * All subtree is empty - just return TRUE to indicate that parent must * do a cleanup. Unless we are ROOT an there is way to go upper. */ if(isChildHasVoid && !isAnyNonempy && !isRoot) return TRUE; if(isChildHasVoid) { ... ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber); } In the first 'if' clause I see !isRoot, so the second if and the corresponding ginScanToDelete() can be called only for the root of the posting tree. If so, it seems to me it wasn't a good idea to move ginScanToDelete() out of ginVacuumPostingTree(); the patch should just remove the cleanup lock around ginVacuumPostingTreeLeaves(), and if it reports that the tree contains an empty page, then lock the whole posting tree to do the ginScanToDelete() work. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
If that can happen, don't we have the same problem in many other places? Like, all the SLRUs? They don't fsync the directory either. Right, pg_commit_ts and pg_clog fall into this category. Implemented as attached. Is unlink() guaranteed to be durable, without fsyncing the directory? If not, then we need to fsync() the directory even if there are no files in it at the moment, because some might've been removed earlier in the checkpoint cycle. What is the protection if pg crashes after unlink() but before fsync()? True, it's a rather small window for such a scenario, but isn't it better to have another protection? Like WAL-logging of WAL segment removal... -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
No, second conditional code will be called for any subtree, which contains totally empty subtree. That check !isRoot covers case when the entire posting tree should be erased: we cannot just quit out of recursive cleanup, we have to make a scan here, starting from root. Oh, I see. Probably, variable isChildHasVoid has a bit confusing name. This flag indicates that some subtree: 1. Had empty pages 2. Did not bother deleting them, because there is a chance that it is a part of a bigger empty subtree. May be it'd be better to call the variable "someChildIsVoidSubtree". hasEmptyChild? and hasNonEmptyChild? (BTW, isAnyNonempy is missing a 't'.) And if the whole posting tree is empty, then we could mark the root page as a leaf and remove all the other pages in the tree without any locking. Although that could be a task for a separate patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you, pushed Andrew Borodin wrote: 2017-03-22 22:48 GMT+05:00 Teodor Sigaev : hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't') Yes, I think this naming is good. It's clear what's in common in these flags and what's different. And if the whole posting tree is empty,then we could mark root page as leaf and remove all other pages in tree without any locking. Although, it could be a task for separate patch. From the performance point of view, this is a very good idea. Both, performance of VACUUM and performance of Scans. But doing so we risk to leave some garbage pages in case of a crash. And I do not see how to avoid these without unlinking pages one by one. I agree, that leaving this trick for a separate patch is quite reasonable. Best regards, Andrey Borodin. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backend crash on non-exclusive backup cancel
Hi! I believe the patch looks good and it's ready to commit. As I understand it, it fixes a bug introduced by commit 7117685461af50f50c03f43e6a622284c8d54694 Date: Tue Apr 5 20:03:49 2016 +0200 Implement backup API functions for non-exclusive backups And I suppose it should be backpatched to 9.6? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Hmm, it doesn't work (but applies) on current HEAD: % uname -a FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR amd64 % pg_config --configure '--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' '--with-perl' 'CC=clang' 'CFLAGS=-O0' % rm -rf /spool/pg_data/* % initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C --lc-numeric C --lc-time C -D /spool/pg_data Running in no-clean mode. Mistakes will not be cleaned up. The files belonging to this database system will be owned by user "teodor". This user must also own the server process. The database cluster will be initialized with locales COLLATE: ru_RU.UTF-8 CTYPE:ru_RU.UTF-8 MESSAGES: C MONETARY: C NUMERIC: C TIME: C The default text search configuration will be set to "russian". Data page checksums are disabled. fixing permissions on existing directory /spool/pg_data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... posix creating configuration files ... ok running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL: could not open file "pg_clog": No such file or directory child process exited with exit code 1 initdb: data directory "/spool/pg_data" not removed at user's request -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
Hi, the patch looks good, except: why do you remove the initialization of is_throttled? I suppose it's just a typo? --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -1967,7 +1967,6 @@ top: st->listen = false; st->sleeping = false; st->throttling = false; - st->is_throttled = false; memset(st->prepared, 0, sizeof(st->prepared)); } -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backend crash on non-exclusive backup cancel
Thank you all, pushed Michael Paquier wrote: On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev wrote: I believe patch looks good and it's ready to commit. Thanks for the review! As I understand, it fixes bug introduced by commit 7117685461af50f50c03f43e6a622284c8d54694 Date: Tue Apr 5 20:03:49 2016 +0200 Implement backup API functions for non-exclusive backups Indeed. And, suppose, it should be backpatched to 9.6? Yes, that's where non-exclusive backups have been introduced. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
And the renaming of pg_clog to pg_xact is also my fault. Attached is an updated patch. Thank you. One more question: what about symlinks? If the DBA moves, for example, pg_xact to another disk and leaves a symlink in the data directory? I suppose fsync on a symlink will actually do nothing. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions
things in order I'm attaching the previous patch as well. Patches look good, but I have some notes: 1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never read, so the hash entry could be just a pointer to PgStat_TableStatus. 2 step1 In pgstat_report_stat() you remove entries from the hash one by one until they are all removed. Isn't it better to hash_destroy/hash_create, or even to let the hash live in a separate memory context and just reset it? 3 step1 Again in pgstat_report_stat(), all-zero entries aren't deleted from the hash although they will be freed from the point of view of pgStatTabList. 4 step 2. The same as 1) for SeenRelsEntry->rel_id, but it isn't even initialized anywhere. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions
Sorry, 1) and 4) are my fault; the comment in hsearch.h says: * ... The hash key * is expected to be at the start of the caller's hash entry data structure. Oops, forgot that. Teodor Sigaev wrote: things in order I'm attaching the previous patch as well. Patches look good, but I have some notes: 1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never read, so the hash entry could be just a pointer to PgStat_TableStatus. 2 step1 In pgstat_report_stat() you remove entries from the hash one by one until they are all removed. Isn't it better to hash_destroy/hash_create, or even to let the hash live in a separate memory context and just reset it? 3 step1 Again in pgstat_report_stat(), all-zero entries aren't deleted from the hash although they will be freed from the point of view of pgStatTabList. 4 step 2. The same as 1) for SeenRelsEntry->rel_id, but it isn't even initialized anywhere. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
No, it is really needed so that the lag measure is correct. Thank you, pushed -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Thank you. One more question: what about symlinks? If the DBA moves, for example, pg_xact to another disk and leaves a symlink in the data directory? I suppose fsync on a symlink will actually do nothing. I did not think of this case, but is that really common? There is even I have seen a lot of such cases. If something should be done in this area, that would be surely in fsync_fname directly to centralize all the calls, and I would think of that as a separate patch, and a separate discussion. Agreed -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Thank you, pushed Michael Paquier wrote: On Fri, Mar 24, 2017 at 11:36 PM, Teodor Sigaev wrote: And the renaming of pg_clog to pg_xact is also my fault. Attached is an updated patch. Thank you. One more question: what about symlinks? If DBA moves, for example, pg_xact to another dist and leaves the symlink in data directoty. Suppose, fsync on symlink will do nothing actually. I did not think of this case, but is that really common? There is even no option to configure that at command level. And surely we cannot support any fancy scenario that people figure out using PGDATA. Existing callers of fsync_fname don't bother about this case as well by the way, take the calls related to pg_logical and pg_repslot. If something should be done in this area, that would be surely in fsync_fname directly to centralize all the calls, and I would think of that as a separate patch, and a separate discussion. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS
Thank you, pushed Matheus de Oliveira wrote: On Thu, Mar 2, 2017 at 10:27 AM, David Steele <da...@pgmasters.net> wrote: It looks like this patch is still waiting on an update for tab completion in psql. Hi All, Sorry about the long delay... It was so simple to add it to tab-complete.c that is a shame I didn't do it before, very sorry about that. Attached the new version of the patch that is basically the same as previously with the addition to tab completion for psql and rebased with master. Hope it is enough. Thank you all. -- Matheus de Oliveira -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Thank you, pushed Excellent! I've attached a new (and hopefully final) version of the patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P + IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE I think your syntax would read no worse, possibly even better, if you just used the existing INCLUDING keyword. There was a discussion in this thread about naming, and both databases that support covering indexes use the INCLUDE keyword. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
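[Editorial illustration] A minimal sketch of what the discussed syntax looks like in use, assuming the INCLUDE spelling is what ends up being committed; the table and column names are made up:

    CREATE TABLE t (a int, b text);
    -- uniqueness is enforced only on a, but b is carried in the index leaf tuples
    CREATE UNIQUE INDEX t_a_incl_b ON t (a) INCLUDE (b);
    -- allows an index-only scan that returns b without visiting the heap
    SELECT a, b FROM t WHERE a = 42;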
Re: [HACKERS] WIP: Covering + unique indexes.
I had a look at the patch and played with it; it looks fine to me. I split it into two patches: core changes (+ bloom index fix) and btree itself. All docs are left in the first patch - I'm too lazy to rewrite documentation that is changed in the second patch. Any objections from reviewers to pushing both patches? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ 0001-covering-core.patch Description: binary/octet-stream 0002-covering-btree.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
1. Table AM with a 6-byte TID. 2. Table AM with a custom locator format, which could be TID-like. 3. Table AM with no locators. Currently TID has its own type in the system catalog. It seems possible for a storage to declare the TID type which it uses. Then an AM could declare it too, so the core, based on that information, could answer the question of AM-storage compatibility. A storage could also declare that it has no TID type at all, so that it couldn't be used with any access method; use case: a compressed column-oriented storage. As I remember, only GIN depends on the TID format; other indexes use it as an opaque type. Except, at least, btree and GiST - they believe that internal pointers have the same format as the outer ones (to heap). Another dubious part is bitmap scan. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Suspicious place in heap_prepare_freeze_tuple()
Hi! Playing around with tuple freezing, I found a suspicious piece of code: heap_prepare_freeze_tuple(): ... frz->t_infomask = tuple->t_infomask; ... frz->t_infomask &= ~HEAP_XMAX_BITS; frz->xmax = newxmax; if (flags & FRM_MARK_COMMITTED) frz->t_infomask &= HEAP_XMAX_COMMITTED; It seems that in the last line it should be a bitwise OR instead of AND. As it stands, this line clears all the bits in t_infomask, which will later be copied directly into the tuple. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 8deb344d09..ec227bac80 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6639,7 +6639,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid, frz->t_infomask &= ~HEAP_XMAX_BITS; frz->xmax = newxmax; if (flags & FRM_MARK_COMMITTED) -frz->t_infomask &= HEAP_XMAX_COMMITTED; +frz->t_infomask |= HEAP_XMAX_COMMITTED; changed = true; totally_frozen = false; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Double shared memory allocation for SLRU LWLocks
It seems to me that we're allocating shared memory for SLRU lwlocks twice, unless I'm missing something. I think you are right. Did you check previous versions? i.e. should it be applied to previous branches?? I suppose yes, to minimize code difference. I also like the idea of adding Assert(offset <= SimpleLruShmemSize(nslots, nlsns)) at the end of SimpleLruInit() SimpleLruShmemSize() calculates total SLRU shared memory size including lwlocks size. SimpleLruInit() starts with line shared = (SlruShared) ShmemInitStruct(name, SimpleLruShmemSize(nslots, nlsns), &found); which allocates SLRU shared memory (LWLocks size is included because SimpleLruShmemSize() is used for size computation). Following line allocates shared memory for LWLocks again: shared->buffer_locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * nslots); Attached patch fixes that by removing extra ShmemAlloc for SLRU. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
It's not clear from the web site in question that the relevant code is released under the PostgreSQL license. As the author, I permit this code to be used in PostgreSQL under its license. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] convert EXISTS to inner join gotcha and bug
Hi! It seems there are two issues: 1) Sometimes conditions which at first glance could be pushed down to a scan are left as join quals, and that can be a ~30x performance loss. 2) The query result depends on the enable_seqscan variable. The query explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; produces the following plan: Nested Loop (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 rows=162 loops=1) Join Filter: (t3.id1 = t1.id) Rows Removed by Join Filter: 70368 -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.100..0.103 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Hash Join (cost=6.14..1918.37 rows=163 width=66) (actual time=0.370..120.971 rows=70530 loops=1) (1) Hash Cond: (t3.id2 = t2.id) (2) -> Seq Scan on t3 (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.017..27.424 rows=70530 loops=1) -> Hash (cost=3.84..3.84 rows=184 width=33) (actual time=0.273..0.273 rows=184 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on t2 (cost=0.00..3.84 rows=184 width=33) (actual time=0.017..0.105 rows=184 loops=1) Planning time: 7.326 ms Execution time: 149.115 ms Condition (1) is not pushed down to scan (2), although it seemingly could be safely moved. With enable_seqscan = off the condition is not pushed down either, but the query returns only one row instead of 162. The scan on t3 returns ~70000 rows but only ~150 rows are really needed. I couldn't find a combination of enable_* GUCs to push that down, and it seems to me that either there is a reason for this which I don't see, or support is somehow missing. If the pair (t3.id1, t3.id2) is unique (see the dump, there is a unique index on them) the query can be directly rewritten as an inner join, and its plan is: Nested Loop (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 rows=162 loops=1) -> Nested Loop (cost=9.43..292.77 rows=25 width=99) (actual time=0.316..0.645 rows=162 loops=1) -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.047..0.050 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Bitmap Heap Scan on t3 (cost=9.15..283.53 rows=94 width=66) (actual time=0.257..0.426 rows=162 loops=1) Recheck Cond: (id1 = t1.id) Heap Blocks: exact=3 -> Bitmap Index Scan on t3i1 (cost=0.00..9.12 rows=94 width=0) (actual time=0.186..0.186 rows=162 loops=1) Index Cond: (id1 = t1.id) -> Index Only Scan using t2i1 on t2 (cost=0.27..0.29 rows=1 width=33) (actual time=0.024..0.024 rows=1 loops=162) Index Cond: (id = t3.id2) Heap Fetches: 162 Planning time: 5.532 ms Execution time: 5.457 ms The second plan is ~30 times faster. But with sequential scans turned off the first query does not work correctly, which points to some bug in the planner, I suppose. Both 9.6 and 10devel are affected by the dependence of the query result on the seqscan variable. A dump to reproduce (a subset of real data, but obfuscated) and the queries are in the attachment: http://sigaev.ru/misc/exists_to_nested.sql.gz -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ --query returns 162 rows explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; set enable_seqscan=off; --the same query returns only one row! 
explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; explain analyze SELECT t1.* FROM t1, t2, t3 WHERE t1.name =
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Both 9.6 and 10devel are affected by the dependence of the query result on the seqscan variable. Oops, I was too hasty: 9.6 is not affected by the enable_seqscan setting. But it doesn't push down the condition either. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Ah, thanks for the clue about enable_hashjoin, because it wasn't reproducing for me as stated. I forgot to mention the tweaked config, sorry -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Really, the way to fix Teodor's complaint is to recognize that the semijoin inner rel is effectively unique against the whole outer rel, and then strength-reduce the semijoin to a plain join. The infrastructure we built for unique joins is capable of proving that, we just weren't applying it in the right way. Perfect, it works. Thank you! The second patch reduces the time of the full query (my example was just a small part of it) from 20 minutes to 20 seconds. I'm kind of strongly tempted to apply the second patch; but it would be fair to complain that reduce_unique_semijoins() is new development and should wait for v11. Opinions? Obviously, I'm voting for the second patch to be applied to version 10. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Performance bug in v10
Hi! I found an example where v10 chooses an extremely non-optimal plan: select i::int as a, i::int + 1 as b, 0 as c into t from generate_series(1,32) as i; create unique index i on t (c, a); explain analyze SELECT t1.a, t1.b, t2.a, t2.b, t3.a, t3.b, t4.a, t4.b, t5.a, t5.b, t6.a, t6.b /* , t7.a, t7.b, t8.a, t8.b, t9.a, t9.b, t10.a, t10.b */ FROM t T1 LEFT OUTER JOIN t T2 ON T1.b = T2.a AND T2.c = 0 LEFT OUTER JOIN t T3 ON T2.b = T3.a AND T3.c = 0 LEFT OUTER JOIN t T4 ON T3.b = T4.a AND T4.c = 0 LEFT OUTER JOIN t T5 ON T4.b = T5.a AND T5.c = 0 LEFT OUTER JOIN t T6 ON T5.b = T6.a AND T6.c = 0 LEFT OUTER JOIN t T7 ON T6.b = T7.a AND T7.c = 0 LEFT OUTER JOIN t T8 ON T7.b = T8.a AND T8.c = 0 LEFT OUTER JOIN t T9 ON T8.b = T9.a AND T9.c = 0 LEFT OUTER JOIN t T10 ON T9.b = T10.a AND T10.c = 0 WHERE T1.c = 0 AND T1.a = 5 ; It takes 4 seconds on my laptop; uncommenting the commented lines makes it run forever. Analyzing the table or removing the index reduces execution time to milliseconds, regardless of whether those lines are commented or uncommented. The commit commit 9c7f5229ad68d7e0e4dd149e3f80257893e404d4 Author: Tom Lane Date: Fri Apr 7 22:20:03 2017 -0400 Optimize joins when the inner relation can be proven unique. seems to be the root of this problem - before it, the query took milliseconds. Attached is the output of explain analyze with the lines commented; my attention was drawn to the huge number of loops: -> Materialize (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ Timing is on. DROP TABLE Time: 5,268 ms SELECT 32 Time: 5,515 ms CREATE INDEX Time: 14,971 ms QUERY PLAN --- Nested Loop Left Join (cost=0.00..8.55 rows=1 width=48) (actual time=818.219..4159.317 rows=1 loops=1) Join Filter: (t1.b = t2.a) Rows Removed by Join Filter: 31 -> Seq Scan on t t1 (cost=0.00..1.48 rows=1 width=8) (actual time=0.026..0.032 rows=1 loops=1) Filter: ((c = 0) AND (a = 5)) Rows Removed by Filter: 31 -> Nested Loop Left Join (cost=0.00..7.06 rows=1 width=40) (actual time=10.588..4159.270 rows=32 loops=1) Join Filter: (t2.b = t3.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t2 (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.020 rows=32 loops=1) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..5.65 rows=1 width=32) (actual time=0.142..129.970 rows=32 loops=32) Join Filter: (t3.b = t4.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t3 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=32) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..4.23 rows=1 width=24) (actual time=0.007..4.055 rows=32 loops=1024) Join Filter: (t4.b = t5.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t4 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=1024) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..2.82 rows=1 width=16) (actual time=0.003..0.121 rows=32 loops=32768) Join Filter: (t5.b = t6.a) Rows Removed by Join Filter: 528 -> Seq Scan on t t5 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.009 rows=32 loops=32768) Filter: (c = 0) -> Materialize (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576) -> Seq Scan on t t6 (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.031 rows=32 loops=1) Filter: (c = 0) Planning time: 3.316 ms Execution time: 4159.596 ms (31 rows) Time: 4165,372 ms (00:04,165) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
Thank you for the answer! This is all caused by get_variable_numdistinct() deciding that all values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that if the example is increased to use 300 tuples instead of 32, then that's enough for the planner to estimate 2 rows instead of clamping to 1, and the bad plan does not look so good anymore since the planner predicts that those nested loops need to be executed more than once. What I don't see here is why the presence of the index could influence that: removing the index leads to a good plan, although the index isn't used in either plan. I really think the planner is too inclined to take risks by nesting Nested loops like this, but I'm not all that sure the best solution to fix this, and certainly not for beta1. So, I'm a bit unsure exactly how best to deal with this. It seems like we'd better make some effort, as perhaps this could be a case that might occur when temp tables are used and not ANALYZED, but the only way I can think to deal with it is not to favour unique inner nested loops in the costing model. The unfortunate thing about not doing this is that the planner will no longer swap the join order of a 2-way join to put the unique rel on the inner side. This is evident by the regression test failures caused by patching with the attached, which changes the cost model for nested loops back to what it was before unique joins. The patch seems to work for this particular case, but losing the swap isn't a good thing, I suppose. My other line of thought is just not to bother doing anything about this. There's plenty more queries you could handcraft to trick the planner into generating a plan that'll blow up like this. Is this a realistic enough one to bother accounting for? Did it come from a real world case? else, how did you stumble upon it? Unfortunately, it's taken from a real application. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
Teodor, could you check if this patch fixes your real-world problem? It works fine with the original query, thank you. But some other query slows down by ~10% (9 secs vs 10 secs). Look at the following part of the plans of a huge query: without patch: -> Nested Loop (cost=34.82..50.91 rows=1 width=20) (actual time=0.017..0.061 rows=5 loops=24121) -> ... -> Materialize (cost=0.56..15.69 rows=1 width=5) (actual time=0.003..0.004 rows=2 loops=109061) -> Index Scan using ... (cost=0.56..15.68 rows=1 width=5) (actual time=0.013..0.014 rows=2 loops=24121) with patch: -> Nested Loop (cost=34.82..50.91 rows=1 width=20) (actual time=0.018..0.063 rows=5 loops=24121) -> ... -> Index Scan using ... (cost=0.56..15.68 rows=1 width=5) (actual time=0.012..0.013 rows=2 loops=109061) (dots hide identical parts) As you said, it removes the Materialize node, although it's useful here. If you wish, I can prepare a test suite (its size will be around 10MB) and send it by private email. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
There were old threads about considering a risk factor when estimating plans, and I'm thinking this issue is the planner failing to do exactly that. I'm afraid it's too late for v10 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
BTW, was the larger query plan that you showed (with a Materialize node) generated by 9.6, or v10 HEAD? Because I would be surprised if 9.6 did. v10, commit acbd8375e954774181b673a31b814e9d46f436a5 Author: Magnus Hagander Date: Fri Jun 2 11:18:24 2017 +0200 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Selectivity estimation for intarray with @@
Some notes about the code: 1 Near the function transformOperator() there is a wrong operator name "@<". 2 int_query (and intquery) should be replaced with query_int to be consistent with the actual type name, at least where it's used as a separate lexeme. 3 For historical reasons @@ doesn't commutate with itself; it has the commutator ~~. The patch assumes that @@ is its own commutator, but ~~ will still use 'contsel' as its restriction estimator. I suppose we need to declare ~~ deprecated and introduce a self-commutating @@ operator. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
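[Editorial illustration] A rough sketch of what point 3 could look like at the SQL level. The PROCEDURE, RESTRICT, and JOIN names below are placeholders/assumptions, not the actual intarray or patch function names, so treat this purely as an illustration of declaring @@ as its own commutator:

    -- hypothetical re-declaration: query_int @@ int[] commutes with int[] @@ query_int
    CREATE OPERATOR @@ (
        LEFTARG    = query_int,
        RIGHTARG   = _int4,
        PROCEDURE  = rboolop,          -- assumed name of the reverse-match function
        COMMUTATOR = @@,
        RESTRICT   = matchingsel,      -- assumed name of the new selectivity estimator
        JOIN       = matchingjoinsel   -- assumed join estimator
    );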
[HACKERS] stringify MAKE_SQLSTATE()
Hi! Following the discussion of the https://commitfest.postgresql.org/5/190/ patch, I found (it seems to me) a way to stringify MAKE_SQLSTATE(); the idea is to use a char array as a string: #include <stdio.h> #define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \ ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'}) #define ERRCODE_WARNING_DEPRECATED_FEATURE MAKE_SQLSTATE('0','1','P','0','1') int main(int argn, char* argv[]) { char*x = ERRCODE_WARNING_DEPRECATED_FEATURE; printf("%s\n", x); return 0; } That works for clang and gcc48 with the flags -Wall -Wextra -ansi (or -std=c90) without warnings, but doesn't work with -pedantic. Is that enough for it to work with other compilers? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stringify MAKE_SQLSTATE()
#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \ ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'}) I'm pretty sure that's a gcc-ism, not standard C. Hmm, after some digging: the feature is called compound literals and it was introduced in C99, although gcc supports it in C90 mode. To disable this support you need the -pedantic flag. Sorry. The real question is do we want to. What's your use-case for this? Just to avoid hardcoded values. It is too easy to make a mistake in a hardcoded value. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compress method for spgist - 2
Poorly, by hanging boxes that straddled dividing lines off the parent node in a big linear list. The hope would be that the case was Ok, I see, but that's not really what I was wondering. My question is this: SP-GiST partitions the space into non-overlapping sections. How can you store polygons - which can overlap - in an SP-GiST index? And how does the compress method help with that? I believe that if we find a way to index boxes, then we will need a compress method to build an index over polygons. BTW, we are investigating an index structure for boxes where a 2D box is treated as a 4D point. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
I need advice on what would be better: - to add a new flag like F_HAS_GARBAGE, - or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum. According to the commit message: commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079 Author: Heikki Linnakangas Date: Fri Nov 7 15:03:46 2014 +0200 .. The code that generated a record to clear the F_TUPLES_DELETED flag hasn't existed since we got rid of old-style VACUUM FULL. I kept the code that sets the flag, although it's not used for anything anymore, because it might still be interesting information for debugging purposes that some tuples have been deleted from a page. .. If Heikki doesn't change his opinion, then introduce a new flag, although I don't think that we need to keep F_TUPLES_DELETED. Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Implement failover on libpq connect level.
+1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces can be optionally specified this way. This doesn't preclude the use of ipfailover, in fact it might work well together. If you don't like it, don't use it. +1 Another thought: multiple hosts in a URI could be used in a simple configuration for read-only clients. I have seen customers who manage two connections per process - one to the master and one to one of several slaves. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
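[Editorial illustration] Roughly what a multi-host connection string of the kind being discussed could look like. The exact syntax here is an assumption based on the JDBC-style proposal and was not a committed libpq feature at the time of this thread:

    postgresql://db1.example.com:5432,db2.example.com:5432/mydb?connect_timeout=2

    # equivalent keyword/value form
    host=db1.example.com,db2.example.com port=5432,5432 dbname=mydb connect_timeout=2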
Re: [HACKERS] WIP: Access method extendability
In general the pattern of generic WAL usage is the following. 1) Start using generic WAL: specify the relation Hmm, what about extensions which want to use WAL but whose WAL records aren't connected to any relation? For example, a transaction manager or some kind of FDW. GenericXLogStart(index); 2) Register buffers GenericXLogRegister(0, buffer1, false); GenericXLogRegister(1, buffer2, true); the first argument is a slot number, the second is the buffer, the third is a flag indicating a new buffer Why do we need a slot number? To replace an already registered buffer? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
Some notes: 1 gistvacuumpage(): OffsetNumber deletable[MaxOffsetNumber]; It seems MaxOffsetNumber is too much; MaxIndexTuplesPerPage is enough. 2 The loop in gistkillitems() searching for the heap pointer. It seems to me that it could be a performance problem. To fix it, we need to remember the index tuple's offset number somewhere near GISTScanOpaqueData->killedItems. Then gistkillitems() would loop over the offsets and compare the heap pointers from killedItems and the index tuple, and if they don't match, just skip that index tuple. 3 Connected with the previous point, could you show some performance tests? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Some notes: 1) create-am.3.patch.gz As I understand it, you didn't add a schema name to access methods. Why? I suppose that if we implement an SQL-like interface for AM creation/dropping then we should provide schema qualification for them. 2) create-am.3.patch.gz get_object_address_am() + switch (list_length(objname)) ... + case 2: + catalogname = strVal(linitial(objname)); + amname = strVal(lthird(objname)); ^^ it seems it should be lsecond(). 3) create-am.3.patch.gz I suppose RM_GENERIC_ID should be part of generic-xlog.3.patch.gz. 4) Makefile(s) As far as I can see, object files are kept lexicographically ordered. 5) gindesc.c -> genericdesc.c in the file header. 6) generic-xlog.3.patch.gz I don't like the idea of splitting START_CRIT_SECTION and END_CRIT_SECTION between GenericXLogStart and GenericXLogFinish. User code could throw an error or allocate memory between them, and the error would become a panic. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Petr Jelinek wrote: On 2015-09-07 17:41, Teodor Sigaev wrote: Some notices: 1) create-am.3.patch.gz As I understand, you didn't add schema name to access method. Why? Suppose, if we implement SQL-like interface for AM screation/dropping then we should provide a schema qualification for them Why would access methods need to be schema qualified? Opfamily and opclass could be schema-qualified. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
Something has gone wrong... gist.c:1465:5: warning: unused variable 'minoff' [-Wunused-variable] minoff, gistget.c:37:1: warning: unused function 'gistkillitems' [-Wunused-function] gistkillitems(IndexScanDesc scan) Without microvacuum. All 'select' queries are executed in about the same time Time: 360,468 ms Time: 243,197 ms Time: 238,310 ms Time: 238,018 ms With microvacuum. First 'select' invokes gistkillitems(). It's executed a bit slower than before. But following queries are executed significantly faster than without microvacuum. Time: 368,780 ms Time: 69,769 ms Time: 9,545 ms Time: 12,427 ms That's perfect, but I can't reproduce it - I suppose because of the "unused function 'gistkillitems'" warning. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
BTW, I slightly modified your test to provide more stable results. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ drop table test; CREATE TABLE test (box box); alter table test set (autovacuum_enabled=off); insert into test (select box(point(x, x),point(x, x)) from generate_series(1,500) as x); vacuum test; CREATE INDEX test_idx ON test USING gist (box); SET enable_seqscan TO false; SET enable_bitmapscan TO false; SET enable_indexscan TO true; SET enable_indexonlyscan TO true; select count(box) from test where box && box(point(0,0), point(100,100)); delete from test where box && box(point(0,0), point(100,100)); -- This query invokes gistkillitems() select count(box) from test where box && box(point(0,0), point(100,100)); -- These queries should work faster with microvacuum select count(box) from test where box && box(point(0,0), point(100,100)); select count(box) from test where box && box(point(0,0), point(100,100)); select count(box) from test where box && box(point(0,0), point(100,100)); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers