Re: [HACKERS] tsearch in core patch
>> That may have been true until we started supporting Windows...
>> Swedish_Sweden.1252 is what I get on my machine, for example. Principle
>> is the same, but values certainly aren't.
>
> Well, at least the name is not itself translated, so a mapping table is
> not right out of the question. If they had put a name like
> "Español_Chile" instead of "Spanish_Chile" we would be in serious
> trouble.

I don't think so; in the opposite case you couldn't type it or show it in order to change the locale :).

So, the final proposal: rename cfglocale to cfglanguages and store in it an array of language names produced from the first part of the locale names:

russian   '{ru_RU, Russian_Russia}'
spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments? Are there any obstacles to using GIN indexes in pg_catalog?
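For illustration only, a lookup against such an array column could look roughly like this; the catalog and column names here are assumptions, not the final design:

SELECT cfgname
FROM   pg_ts_config
WHERE  'Russian_Russia' = ANY (cfglanguages);

The idea being that initdb (or a user) strips the encoding suffix from the server locale and matches the remainder against the array.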
Re: [HACKERS] tsearch in core patch
> Why not do it the other way around?
>   es_ES          spanish
>   Spanish_Spain  spanish
>   ru_RU          russian
>   pt_BR          portuguese_brazil
>
> That way you don't need any funny index. Or do you need the list of
> locales for each language? (but even if you do, you can easily obtain it
> by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:

russian_win    Russian_Russia
russian_unix   ru_RU

They don't differ except in the locale name.
[Fwd: Re: [HACKERS] tsearch in core patch]
>> How would this work for initdb with locale C?
>
> I'm worrying about that too.

english '{en_GB, en_US, C}'

I suppose that a locale name always has a dot separator, except the C locale, which is a well-known exception.
Re: [HACKERS] Re: Full text searching, anyone interested?
> > > I would love to find a way to get a bitmap like index native to Postgres. I
[skip]
> We could implement bitmap handling functions based on one-dimensional arrays of
> integers. That's how my stuff deals with them, and postgres already manages
> them.

Look at contrib/intarray. gist__intbig_ops is a variant of a signature tree (from each array it builds a bitmap signature).

Regards,
Teodor
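A minimal usage sketch of that opclass, written with modern extension syntax (table, column, and data are invented):

CREATE EXTENSION intarray;

CREATE TABLE docs (id serial PRIMARY KEY, tags int[]);

-- gist__intbig_ops condenses each array into a fixed-size bitmap signature
CREATE INDEX docs_tags_idx ON docs USING gist (tags gist__intbig_ops);

-- overlap and containment searches can then use the signature index
SELECT id FROM docs WHERE tags && '{1,2,3}'::int[];
SELECT id FROM docs WHERE tags @> '{42}'::int[];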
[HACKERS] Re: GiST patches for 7.2 (please apply)
Tom Lane wrote:
>
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
> > Teodor finally made patches to current CVS, please review and apply them asap
> > to be in sync with development (last time it was kind of problem)
>
> Checked and committed. Note I did not commit your change to the cube
> regression test:
>
> *** ./contrib/cube/expected/cube.out.orig   Wed Aug 22 16:04:42 2001
> --- ./contrib/cube/expected/cube.out        Wed Aug 22 16:26:25 2001
> ***
> *** 130,142
>   SELECT '1e700'::cube AS cube;
>     cube
>   ---
> ! (inf)
>   (1 row)
>
>   SELECT '-1e700'::cube AS cube;
>   cube
>
> ! (-inf)
>   (1 row)
>
>   SELECT '1e-700'::cube AS cube;
> --- 130,142
>   SELECT '1e700'::cube AS cube;
>     cube
>   ---
> ! (Inf)
>   (1 row)
>
>   SELECT '-1e700'::cube AS cube;
>   cube
>
> ! (-Inf)
>   (1 row)
>
>   SELECT '1e-700'::cube AS cube;
>
> since on my machine "inf" appears to be the correct result. Is this a
> platform dependency, or just a lack of synchronization somewhere else?
>
> regards, tom lane

On my FreeBSD 4.3 box it looks like 'Inf'. Most likely this is a platform dependency.
Re: [HACKERS] POC, WIP: OR-clause support for indexes
Thank you for the review! I'd like to see comments too!

> but more so in the code. :)
> I've had a look over this, and it seems like a great area in which we
> could improve on, and your reported performance improvements are certainly
> very interesting too. However I'm finding the code rather hard to follow,
> which might be a combination of my lack of familiarity with the index
> code, but more likely it's the lack of comments to explain what's going on.

I've added comments and fixed the bugs I found.

> Let's just take 1 function as an example:
> Here there's not a single comment, so I'm just going to try to work out
> what's going on based on the code.
>
> +static void
> +compileScanKeys(IndexScanDesc scan)
> +{
> +    GISTScanOpaque  so = (GISTScanOpaque) scan->opaque;
> +    int        *stack,
> +                stackPos = -1,
> +                i;
> +
> +    if (scan->numberOfKeys <= 1 || so->useExec == false)
> +        return;
> +
> +    Assert(scan->numberOfKeys >= 3);
>
> Why can numberOfKeys never be 2? I looked at what calls this and I can't
> really work it out.

Because here the keys are actually an expression, and an expression can contain one, or three or more nodes, but never two (an AND/OR operation plus its two arguments).

> I'm really also not sure what useExec means as there's no comment in that
> struct member,

Fixed. If useExec == false then the ScanKeys are implicitly ANDed and stored in a plain array.

> and what if numberOfKeys == 1 and useExec == false, won't this Assert()
> fail? If that's not a possible situation then why not?

Fixed.

> +        ScanKey key = scan->keyData + i;
>
> Is there a reason not to use keyData[i]; ?

That's the same as ScanKey key = &scan->keyData[i]; I prefer the first form as clearer, but I could be wrong - there are other places in the code where pointer arithmetic is used.

> +        if (stackPos >= 0 && (key->sk_flags & (SK_OR | SK_AND)))
> +        {
> +            Assert(stackPos >= 1 && stackPos < scan->numberOfKeys);
>
> stackPos >= 1? This seems unnecessary and confusing as the if test surely
> makes that impossible.
>
> +            so->leftArgs[i] = stack[stackPos - 1];
>
> Something is broken here as stackPos can be 0 (going by the if() not the
> Assert()), therefore that's stack[-1].

Fixed.

> stackPos is initialised to -1, so this appears to always skip the first
> element of the keyData array. If that's really the intention, then
> wouldn't it be better to just make the initial condition of the for()
> look i = 1 ?

Done.

> I'd like to review more, but it feels like a job that's more difficult
> than it needs to be due to lack of comments. Would it be possible to
> update the patch to try and explain things a little better?

I hope I have made it cleaner.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: index_or-2.patch.gz (GNU Zip compressed data)
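To make the stack-machine idea concrete, here is an illustrative evaluator for scan keys stored in reverse Polish notation. This is not the patch's code: the SK_OR/SK_AND flags are the ones the patch introduces, INDEX_MAX_KEYS is the stock limit, and the function name and leafResult array are made up.

/*
 * Evaluate scan keys laid out in reverse Polish notation.  Leaf keys push
 * the result of their ordinary comparison; SK_OR/SK_AND nodes pop two
 * operands and push the logical combination.
 */
static bool
evalScanKeysRPN(ScanKey keys, const bool *leafResult, int nkeys)
{
    bool    stack[INDEX_MAX_KEYS];
    int     sp = 0;

    for (int i = 0; i < nkeys; i++)
    {
        if (keys[i].sk_flags & (SK_OR | SK_AND))
        {
            /* logical node: pop two operands, push the combination */
            bool    b = stack[--sp];
            bool    a = stack[--sp];

            stack[sp++] = (keys[i].sk_flags & SK_OR) ? (a || b) : (a && b);
        }
        else
        {
            /* leaf node: push the precomputed key result */
            stack[sp++] = leafResult[i];
        }
    }

    Assert(sp == 1);
    return stack[0];
}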
Re: [HACKERS] WIP: Upper planner pathification
> The basic point of this patch is to apply the generate-and-compare-Paths
> paradigm to the planning steps after query_planner(), which only covers ...
>
> The present patch addresses this problem by inventing Path nodes to
> represent every post-scan/join step

I'm really glad to see that. Separating path nodes for the later steps opens new ways to optimize queries. At first glance, consider

select * from a left outer join b on a.i = b.i limit 1;

The Limit node could be pushed down to the scan over table 'a' if b.i is unique.

I tried to look into the patch and I have a question (one for now): why doesn't LimitPath contain the actual limit/offset values? I have seen a lot of subqueries with LIMIT 1 which could be transformed into an EXISTS subquery (see the sketch after this message).

> So I'd really like to get this into 9.6.

Me too. I applied the patch and can confirm that 'make test' doesn't fail on FreeBSD 10.2. Now I will try to run a kind of TPC-H with and without the patch.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
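For illustration, the kind of rewrite meant by that last remark, using the same invented table names as above:

-- application-style query using a LIMIT 1 subquery:
SELECT *
FROM   a
WHERE  (SELECT 1 FROM b WHERE b.i = a.i LIMIT 1) IS NOT NULL;

-- equivalent EXISTS form, which the planner can already turn into a semi-join:
SELECT *
FROM   a
WHERE  EXISTS (SELECT 1 FROM b WHERE b.i = a.i);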
Re: [HACKERS] WIP: Upper planner pathification
> I do not think the patch will make a lot of performance difference as-is;
> its value is more in what it will let us do later. There are a couple of

Yep, for now on my notebook (best of 5 tries):

% pgbench -i -s 3000
% pgbench -s 3000 -c 4 -j 4 -P 1 -T 60
HEAD      569 tps
patched   542 tps

% pgbench -s 3000 -c 4 -j 4 -P 1 -T 60 -S
HEAD      9500 tps
patched   9458 tps

Looks close to measurement error, but it may be explained by the increased amount of work for planning, including, maybe, a more complicated path tree.

> this kind of optimization to chance. But the patch is big enough already,
> so that (and a lot of other things) are getting left for later.

Agree

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Upper planner pathification
> So I see no evidence for a slowdown on pgbench's SELECT queries.
> Anybody else want to check performance on simple scan/join queries?

I did your tests with configure --enable-depend, pgbench -i -s 100 and a slightly tweaked postgresql.conf, on a notebook with an i7-3520M CPU (2 cores + 2 HT), FreeBSD 10.2.

pgbench -c 4 -j 4 -P 1 -T 60 -S

HEAD          35834 tps avg (35825/35828/35808/35844/35865)
Patched HEAD  35529 tps avg (35516/35561/35527/35534/35510)

~1% slowdown. I can live with that.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
>> select parse_ident(E'X\rXX');
>
> I am sending updated patch - I used json function for correct escaping -
> the escaping behave is same.

Hmm, it doesn't look like it:

% select parse_ident(E'_\005');
ERROR:  identifier contains disallowed characters: "_\u0005"

% select parse_ident(E'\005');
ERROR:  missing identifier: "\u0005"

but

# select parse_ident(E'"\005"');
 parse_ident
-------------
 {\x05}

The error messages above display the disallowed character in the wrong form. One more inconsistency:

# select parse_ident(E'"\005"') as "\005";
 \005
------
 {\x05}

The display output of the actual identifier and of parse_ident differ.

Actually, I can live with both, but are there any other opinions? It seems that at least the difference between an actual identifier and the output of parse_ident should be pointed out in the docs.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Tsvector editing functions
Thanks! Fixed and added tests. Thank you! I did some patch cleanup/fix, but I have some doubt with function's names: 1 to_tsvector: # \df to_tsvector List of functions Schema |Name | Result data type | Argument data types | Type +-+--+-+ pg_catalog | to_tsvector | tsvector | regconfig, text | normal pg_catalog | to_tsvector | tsvector | text| normal pg_catalog | to_tsvector | tsvector | text[] | normal First two variants of to_tsvector make a morphological processing, last one doesn't. 2 to_array # \df *to_array List of functions Schema | Name | Result data type | Argument data types | Type +---+--+-+ pg_catalog | regexp_split_to_array | text[] | text, text | normal pg_catalog | regexp_split_to_array | text[] | text, text, text| normal pg_catalog | string_to_array | text[] | text, text | normal pg_catalog | string_to_array | text[] | text, text, text| normal pg_catalog | to_array | text[] | tsvector| normal Seems, to_array is not a right name compared to other *to_array. I would like to suggest rename both functions to array_to_tsvector and tsvector_to_array to have consistent name. Later we could add to_tsvector([regconfig, ], text[]) with morphological processing. Thoughts? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 4b5ee81..ed0b6be 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -9211,16 +9211,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple setweight - setweight(tsvector, "char") + setweight(vector tsvector, weight "char") tsvector -assign weight to each element of tsvector +assign weight to each element of vector setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') 'cat':3A 'fat':2A,4A 'rat':5A + setweight + setweight by filter + + setweight(vector tsvector, weight "char", lexemes "text"[]) + +tsvector +assign weight to elements of vector that are listed in lexemes array +setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}') +'cat':3A 'fat':2,4 'rat':5A + + + + strip strip(tsvector) @@ -9233,6 +9246,81 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + delete + delete lemexeme + + delete(vector tsvector, lexeme text) + +tsvector +remove given lexeme from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') +'cat':3 'rat':5A + + + + + delete + delete lemexemes array + + delete(vector tsvector, lexemes text[]) + +tsvector +remove any occurrence of lexemes in lexemes array from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) +'cat':3 + + + + + unnest + + unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text) + +setof record +expand a tsvector to a set of rows +unnest('fat:2,4 cat:3 rat:5A'::tsvector) +(cat,{3},{D}) ... + + + + + to_array + + to_array(tsvector) + +text[] +convert tsvector to array of lexemes +to_array('fat:2,4 cat:3 rat:5A'::tsvector) +{cat,fat,rat} + + + + + to_tsvector + array to tsvector + + to_tsvector(text[]) + +tsvector +convert array of lexemes to tsvector +to_tsvector('{fat,cat,rat}'::text[]) +'fat' 'cat' 'rat' + + + + + filter + + filter(vector tsvector, weights "char"[]) +
Re: [HACKERS] Tsvector editing functions
> I saw errors on windows, here is the fix:

Thank you, pushed

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
> I afraid so I cannot to fix this inconsistency (if this is inconsistency -
> the binary values are same) - the parameter of function is raw string with
> processed escape codes, and I have not any information about original
> escape sequences. When you enter octet value, and I show it as hex value,
> then there should be difference. But I have not information about your
> input (octet or hex).
>
> I have the original string of SQL identifier inside parser, executor, but
> I have not original string of function parameter inside function (not
> without pretty complex and long code).

Ok, agreed.

> I am trying describe it in doc (I am sorry for my less level English) in
> new patch. Fixed duplicated oid too.

Edited a bit + fixed some typos and removed unneeded headers; patch attached.

Sorry, I can't find all corner cases at once, but:

SELECT parse_ident(E'"c".X XX');
ERROR:  identifier contains disallowed characters: "\"c"

The error message points to the wrong reason for the error.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: parse_ident-13.patch (binary/octet-stream)
Re: [HACKERS] proposal: function parse_ident
> I hope so the messages are ok now. Few more regress tests added.

Thank you, committed.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> Do reconstructedValues is required now? Wouldn't it be cleaner to use the
> new field on the prefix tree implementation, too?

reconstructedValue is needed to reconstruct the full indexed value, and it should match the type of the indexed column.

> We haven't had specific memory context for reconstructedValues. Why is it
> required for the new field?

Because pg knows the type of reconstructedValue (see above why), but traversalValue is just a void*; the SP-GiST core doesn't know how to free the memory allocated for it.

> + Memory for traversalValues should be allocated in
> + the default context, but make sure to switch to
> + traversalMemoryContext before allocating memory
> + for pointers themselves.
>
> This sentence is not understandable. Shouldn't it say "the memory should
> *not* be allocated in the default context"? What are the "pointers
> themselves"?

Clarified, I hope.

> The box opclass is broken because of the floating point arithmetics of
> the box type. You can reproduce it with the attached script.

We hope it is fixed. At least it seems to be synchronized with seqscan now.

> really need to fix the geometric types, before building more on them.
> They fail to serve the only purpose they are useful for, demonstrating
> features.

Agree, but that's not a matter for this patch.

> It think the opclass should support the cross data type operators like
> box @> point. Ideally we should do it by using multiple opclasses in an
> opfamily. The floating problem will bite us again in here, because some
> of the operators are not using FP macros.

Again, agree. But I suggest doing that in a separate patch.

> The tests check not supported operator "@". It should be "<@". It is nice
> that the opclass doesn't support long deprecated operators.

Fixed.

> + #define LT -1
> + #define GT 1
> + #define EQ 0
>
> Using these numbers is a very well established pattern. I don't think
> macros make the code any more readable.

Fixed.

> + /* SP-GiST API functions */
> + Datum spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);
>
> I guess they should go to the header file.

Removed them, because they are present in the auto-generated file ./src/include/utils/builtins.h

The range patch is unchanged, but still attached to keep them all together.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: q4d-2.patch.gz, traversalValue-2.patch.gz, range-1.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
> Good catch, thanks! Tests were added.

I don't see any objection; is consensus reached? I'm close to committing that...

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] POC, WIP: OR-clause support for indexes
> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Oops, will add shortly.

> The patch would probably benefit from transforming it into a patch series
> - one patch for the infrastructure shared by all the indexes, then one
> patch per index type. That should make it easier to review, and I
> seriously doubt we'd want to commit this in one huge chunk anyway.

Ok, will do it.

> 1) fields in BrinOpaque are not following the naming convention (all the
> existing fields start with bo_)

Fixed.

> 2) there's plenty of places violating the usual code style (e.g. for
> single-command if branches) - not a big deal for WIP patch, but needs to
> get fixed eventually

Hopefully fixed.

> 3) I wonder whether we really need both SK_OR and SK_AND, considering
> they are mutually exclusive. Why not to assume SK_AND by default, and
> only use SK_OR? If we really need them, perhaps an assert making sure
> they are not set at the same time would be appropriate.

In short: possible ambiguity and increased stack machine complexity. Suppose we have the following expression in reverse Polish notation (letters represent a condition, | an OR and & an AND logical operation, and ANDs are omitted):

a b c |

Is it ((a & b) | c), i.e. "a b & c |", or (a & (b | c)), i.e. "a b c | &"? With implicit ANDs both readings collapse to the same sequence. Also, using both SK_ flags makes the code more readable.

> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.

Will add soon.

> 5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
> initialized when (!leftFinished && !rightFinished), but then gets used
> when either part of the condition evaluates to true. Probably should be
>     if (!leftFinished || !rightFinished)
>         cmp = ...

Fixed.

> 6) the code in nodeIndexscan.c should not include call to abort()
>     {
>         abort();
>         elog(ERROR, "unsupported indexqual type: %d",
>              (int) nodeTag(clause));
>     }

Fixed; I just forgot to remove it.

> 7) I find it rather ugly that the paths are built by converting BitmapOr
> paths. Firstly, it means indexes without amgetbitmap can't benefit from
> this change. Maybe that's reasonable limitation, though?

I based that on the following thoughts:
1 code which tries to find an OR-index path would be very similar to the existing generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existing indexes have an amgetbitmap method; only a few don't. The amgetbitmap interface is simpler.
Anyway, I can add an option for generate_or_bitmap to use any index, but in the current state it would just repeat all the work.

> But more importantly, this design already has a bunch of unintended
> consequences. For example, the current code completely ignores
> enable_indexscan setting, because it merely copies the costs from the
> bitmap path.

I'd like to add a separate enable_indexorscan.

> That's pretty dubious, I guess. So this code probably needs to be made
> aware of enable_indexscan - right now it entirely ignores startup_cost in
> convert_bitmap_path_to_index_clause(). But of course if there are
> multiple IndexPaths, the enable_indexscan=off will be included multiple
> times.
>
> 9) This already breaks estimation for some reason. Consider this ... So
> the OR-clause is estimated to match 0 rows, less than each of the clauses
> independently. Needless to say that without the patch this works just
> fine.

Fixed.

> 10) Also, this already breaks some regression tests, apparently because
> it changes how 'width' is computed.

Fixed too.

> So I think this way of building the index path from a BitmapOr path is
> pretty much a dead-end.

I don't think so, because a separate code path to support OR-clauses in indexes would significantly duplicate the BitmapOr generator.

Will send the next version as soon as possible. Thank you for your attention!

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: index_or-3.patch.gz (GNU Zip compressed data)
Re: [HACKERS] POC, WIP: OR-clause support for indexes
> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Not yet, but I will.

> The patch would probably benefit from transforming it into a patch series
> - one patch for the infrastructure shared by all the indexes, then one
> patch per index type. That should make it easier to review, and I
> seriously doubt we'd want to commit this in one huge chunk anyway.

I split it into two:
1 0001-idx_or_core - only planner and executor changes
2 0002-idx_or_indexes - BRIN/GIN/GiST changes with tests

I don't think that splitting the second patch further adds readability, it just increases management difficulties, but if you insist I will split it.

> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.

Added.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: 0001-idx_or_core-v4.patch.gz, 0002-idx_or_indexes-v4.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
>> I don't see any objection; is consensus reached? I'm close to committing that...
>
> I did only cursory review on the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved but I won't get to it
> before weekend.

So, per-patch status:

create-am       ready
generic-xlog    needs comments/docs fixed
bloom-contrib   needs review

I'm the original author of the bloom module and of course I can do some review of it, but it seems more eyes are needed.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> ->nkeys && flag; i++)
>
> It checks flag two times.

Fixed.

> +    boxPointerToRectangle(
> +        DatumGetBoxP(in->scankeys[i].sk_argument),
> +        p_query_rect
> +    );
>
> I don't think this matches the project coding style.

Fixed.

> +    int flag = 1,
>
> Wouldn't it be better as "bool"?

Fixed.

> The regression tests for the remaining operators are still missing.

Oops, they will come soon. I attached all patches to simplify review. Thank you very much!

I'll try to explain with a two-dimensional example over points. ASCII-art:

      |
      |
     1|2
      |
   ---+---
      |P
     3|4
      |
      |

A '+' with a letter next to it represents a centroid - in other words, the point which splits the plane into 4 quadrants - and it is stored in the parent node. 1, 2, 3, 4 are the labels of the quadrants; the labeling is the same for all pictures and all centroids, and I will not show the labels in later pictures to keep the images simple.

Let's add C, a child node (and again, it will split the plane into 4 quadrants):

      |
      |
      +-+---
  X   |B|C
      | |
   ---+-+---
      |P|A
      | |
      | |

A and B are the points of intersection of the lines. So the box PBCA is a bounding box for the points contained in the 3rd quadrant of C (see the labeling above). For example, the point labeled X is not a descendant of the child node with centroid C, because it must be in the branch of the 1st quadrant of the parent node. So each node (except the root) has a limitation within its quadrant. The traversalValue is used to transfer that limitation down the tree.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: traversalValue-2.patch.gz, q4d-3.patch.gz, range-1.patch.gz (GNU Zip compressed data)
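A minimal sketch of the kind of per-branch state meant here, with made-up names (this is not the patch's actual structure):

/*
 * Illustrative only: for the 2-D example above, the traversalValue of a
 * branch can simply carry the bounding box of the quadrant being descended
 * into, e.g. box PBCA when following the 3rd quadrant of C.
 */
typedef struct QuadrantBound
{
    double  xmin, ymin;     /* lower-left corner of the quadrant's bound  */
    double  xmax, ymax;     /* upper-right corner                         */
} QuadrantBound;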
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> tend to think of a *point* as having *zero* dimensions. Would it perhaps
> be more accurate to say we are treating a 2-dimensional box as a point in
> 4-dimensional space?

Exactly, sorry for the ambiguity.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> Isn't this basically the same thing that the cube contrib module does?
> (Which has the added benefit of kNN-capable operators).

No, the cube module introduces a new type - an N-dimensional box - and adds index support for it.

The current patch suggests a non-traditional indexing technique for 2D boxes by treating them as points in 4D space. With such a representation it becomes possible to use the quad-tree technique, which:
1) supports only points as index entries;
2) is already supported by SP-GiST.

Such a technique provides some benefits in comparison with the traditional R-tree, which is implemented in pg with the help of GiST. See Oleg's message.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
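A hedged usage sketch of what this would enable, assuming the opclass ends up as the default SP-GiST opclass for box (table and data are invented):

CREATE TABLE regions (id serial, area box);

CREATE INDEX regions_area_spgist ON regions USING spgist (area);

-- the overlap searches such an index is meant to speed up
SELECT id FROM regions WHERE area && box '(0,0),(10,10)';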
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
> + * boxtype_spgist.c
>
> The names on the file header need to be changed, too.

Oops. Fixed.

>> I'll try to explain with two-dimensional example over points. ASCII-art:
>
> Thank you for the explanation. Should we incorporate this with the patch.

Added.

> + cmp_double(const double a, const double b)
>
> Does this function necessary? We can just compare the doubles.

Yes, it compares with limited precision, as the geometry operations do. Renamed to point to the actual arguments.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>
> The "rectangle" variable in here should be renamed.

Fixed.

> + Assert(is_infinite(b) == 0);
>
> This is failing on my test. You can reproduce with the script I have sent.

I didn't know:

# select '(1,inf)'::point;
  point
---------
 (1,inf)

Fixed.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>>
>> evalRangeBox() initializes part of RectBox, so, it could not palloc its
>> result. Other methods use the same signature just for consistency.
>
> I couldn't understand it. evalRangeBox() can palloc and return the result.
> evalRectBox() can return the result palloc'ed by evalRangeBox(). The
> caller wouldn't need to palloc anything.

evalRangeBox() is used to initialize fields of the RangeBox in evalRectBox(). If evalRangeBox() pallocs its result, then we need to copy that result into the RangeBox struct and free it. Let both functions use the same interface.

> I went through all other variables:
> + int r = is_infinite(a);
> + double x = *(double *) a;
> + double y = *(double *) b;
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
> + BOX *leafBox = DatumGetBoxP(in->leafDatum);
>
> Shouldn't they be "const", too?

They could be, but it isn't required. To be in a consistent state I've removed the const modifier where possible.

>> +/*
>> + * Begin. This block evaluates the median of coordinates of boxes
>> + */
>>
>> I would rather explain what the function does on the function header.
>>
>> fixed
>
> The "end" part of it is still there.

Oops again, fixed.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them. The
>> search shouldn't continue after allTheSame case.
>>
>> Seems, yes. An spgist tree could contain long branches with allTheSame.
>
> Does SP-GiST allows any node under allTheSame to not being allTheSame?

No, SP-GiST doesn't allow that.

> Not setting traversalValues for allTheSame worked fine with my test.

It works until an allTheSame branch contains only one inner node. Otherwise the traversalValue will be freed several times, see spgWalk():

# select i as id, '1,2,3,4'::box as b into x from generate_series(1,100) i;
# create index ix on x using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump

gdb:
#0  0x00080143564a in thr_kill () from /lib/libc.so.7
#1  0x000801435636 in raise () from /lib/libc.so.7
#2  0x0008014355b9 in abort () from /lib/libc.so.7
#3  0x00a80739 in in_error_recursion_trouble () at elog.c:199
#4  0x00abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5  0x0053330c in freeScanStackEntry (so=0x801e8d358, stackEntry=0x801e935d8) at spgscan.c:47
#6  0x00532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358, scanWholeIndex=1 '\001', storeRes=0x532d10 ...

> + if (in->allTheSame)
>
> Most of the things happening before this check is not used in there.
> Shouldn't we move this to the top of the function?

Yep, fixed.

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);
>
> This could go before allTheSame block.

Yep, fixed.

I've attached all patches again. Thank you very much!

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachments: q4d-4.patch.gz, traversalValue-2.patch.gz, range-1.patch.gz (GNU Zip compressed data)
Re: [HACKERS] Draft release notes for next week's releases
>> Does include ICU mean that collation handling is identical across
>> platforms? E.g. a query on Linux involving string comparison would yield
>> the same result on MacOS and Windows?
>
> Yes, it does and that's the most important issue for us.

Yes, exactly.

The attached patch adds support for libicu with the configure flag --with-icu. The patch is rebased to current HEAD; I hope it works. It's based on the work at https://people.freebsd.org/~girgen/postgresql-icu/readme.html, and it was migrated to 9.5 with abbreviated keys support.

The patch in its current state is not ready to commit, of course.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/

Attachment: libicu-8.patch.gz (GNU Zip compressed data)
Re: [HACKERS] WIP: Access method extendability
> Does that mean this patch should be closed or is there more remaining to
> commit?

Petr promised to check the English in the comments/docs of the generic-wal patch this week; the bloom patch depends on it. The bloom patch is ready from my point of view.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> And here it is. It's not perfect but it's better (I am not native speaker
> either). It's same as v12, just changed comments somewhat.

Thank you, but I have a problem with applying it:

% patch -p1 < ~/Downloads/0002-generic-xlog.13.patch
Hmm...  Looks like a new-style context diff to me...
...
|diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
|new file mode 100644
|index ...7ca03bf
|*** a/src/backend/access/transam/generic_xlog.c
|--- b/src/backend/access/transam/generic_xlog.c
--
(Creating file src/backend/access/transam/generic_xlog.c...)
Patching file src/backend/access/transam/generic_xlog.c using Plan A...
patch: malformed patch at line 634: diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> I incorporated your changes and did some additional refinements on top of
> them still. Attached is delta against v12, that should cause less issues
> when merging for Teodor.

But the last version is the 13th?

BTW, it would be cool to add docs to the VII. Internals chapter; the description should be similar to the index interface description, i.e. how to use the generic WAL interface.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] WIP: Access method extendability
> GenericXLogStart(Relation relation)
> {
> ...
>     if (genericXlogStatus != GXLOG_NOT_STARTED)
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("GenericXLogStart: generic xlog is already started")));

Hmm, it seems generic WAL will be left in an incorrect state if an exception occurs between the GenericXLogStart() and GenericXLogFinish() calls, because the static variable genericXlogStatus will still contain the GXLOG_LOGGED/GXLOG_UNLOGGED status.

I suppose it could be solved in different ways:

- Remove all the static variables, so that GenericXLogStart() returns a struct (object) which encapsulates all data needed for generic WAL to work. As far as I can see, in case of an exception there isn't any need for extra cleanup. It would also allow using generic WAL for two or more relations at the same time, although I don't know any useful example of such a feature.

- Add a callback via RegisterResourceReleaseCallback() which will clean up the state of the genericXlogStatus variable.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
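A sketch of the object-style interface suggested in the first option: GenericXLogStart() returns a state object instead of flipping a static flag, so an error thrown between start and finish leaves no global state behind. Function names follow the discussion; the committed API may differ in detail.

GenericXLogState *state;
Page        page;

state = GenericXLogStart(rel);
page = GenericXLogRegisterBuffer(state, buffer, 0);

/* ... modify the returned page image ... */

GenericXLogFinish(state);       /* or GenericXLogAbort(state) on failure */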
Re: [HACKERS] WIP: Access method extendability
> as we discussed recently [1] you should avoid leaving "holes" with
> uninitialized data in structures. Please fix this or provide a comment
> that describes why things are done here the way they are done.
>
> [1] http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net

That discussion is about SQL-level types which could be stored on disk, not about in-memory structs.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
Thank you, pushed

Emre Hasegeli wrote:
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation. Should we incorporate this with the patch.
>>
>> added
>
> I have worked on the comments of the patch. It is attached. I hope it
> looks more clear than it was before.
>
>>> + cmp_double(const double a, const double b)
>>>
>>> Does this function necessary? We can just compare the doubles.
>>
>> Yes, it compares with limited precision as it does by geometry
>> operations. Renamed to point actual arguments.
>
> I meant that we could use FP macros directly instead of this function.
> Doing so is also more correct. I haven't tried to produce the problem,
> but this function is not same as using the macros directly. For
> differences smaller than the epsilon, it can return different results.
> I have removed it on the attached version.
>
>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>>
>>> The "rectangle" variable in here should be renamed.
>>
>> fixed
>
> I found a bunch of those too and renamed for clarity. I have also
> reordered the arguments of the helper functions to keep them consistent.
>
>> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox().
>> If evalRangeBox() will palloc its result then we need to copy its result
>> into RangeBox struct and free. Let both functions use the same interface.
>
> I found it nicer to copy and edit the existing value than creating it in
> two steps like this. It is also attached.
>
>> it works until allthesame branch contains only one inner node. Otherwise
>> traversalValue will be freed several times, see spgWalk().
>
> It just works, when traversalValues is not set. It is also attached.
>
> I have also added the missing regression tests. While doing that I
> noticed that some operators are missing and also added support for them.
> It is also attached with the updated version of my test script.
>
> I hope I haven't changed the patch too much. I have also pushed the
> changes here:
>
> https://github.com/hasegeli/postgres/commits/box-spgist

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> Sure. I attached two patches. But notice that pg_trgm.limit should be
> used with this command:
>     SHOW "pg_trgm.limit";
> If you will use this command:
>     SHOW pg_trgm.limit;
> you will get the error:
>     ERROR:  syntax error at or near "limit"
>     LINE 1: SHOW pg_trgm.limit;
>                          ^
> This is because "limit" is keyword I think.

It's easy to fix in gram.y:

@@ -1499,7 +1499,7 @@ set_rest_more:  /* Generic SET syntaxes: */
                        ;
 var_name:      ColId                           { $$ = $1; }
-                       | var_name '.' ColId
+                       | var_name '.' ColLabel
                                { $$ = psprintf("%s.%s", $1, $3); }
                ;

ColId doesn't contain reserved_keyword. It's impossible to change the initial part of var_name to ColLabel because of a lot of conflicts in the grammar, but it can easily be changed for the second part of var_name.

It seems like an improvement in any case, but sml_limit, or similarity_limit, or even similarity_threshold is a more preferable name than plain "limit". In the future we could introduce more thresholds/limits. Also, should get/set_limit emit a warning about deprecation?

Some notes about the substring patch itself:

1 trgm2.data contains too many duplicates (like Barkala or Bakalan). Are they really needed for testing?

2 I agree with Jeff Janes about the <<-> and <->> operators. They are needed.
(http://www.postgresql.org/message-id/CAMkU=1zynkqfkr-j2_uq8lzp0uho8i+ledfwgt77czk_tnt...@mail.gmail.com)

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> The behavior of this function is surprising to me.
>
> select substring_similarity('dog' , 'hotdogpound') ;
>  substring_similarity
> ----------------------
>                  0.25

Substring search was designed to search for a similar word in a string:

contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
 substring_similarity
----------------------
                 0.75

contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
 substring_similarity
----------------------
                    1

It seems to me that users search for words in a long string. But I agree that a more detailed explanation is needed, and maybe we need to change the feature name to fuzzywordsearch or something else; I can't imagine what.

> Also, should we have a function which indicates the position in the 2nd
> string at which the most similar match to the 1st argument occurs?
>
> select substring_similarity_pos('dog' , 'hotdogpound') ;
>
> answering: 4

Interesting. I think it will be useful in some cases.

> We could call them <<-> and <->> , where the first corresponds to <% and
> the second to %>

Agree

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Tsvector editing functions
Some notes:

1 tsin in the documentation doesn't look like a good name. Changed to vector, similar to other places.

2 I did some editing around freeing memory / forgotten names etc.

3 It seems to me that tsvector_unnest() could be seriously optimized for large tsvectors: with the current coding it detoasts/decompresses the tsvector value on each call. It is much better to do it once, in the multi_call_memory_ctx context, at first-call initialization (see the sketch after this message).

4 It seems debatable to return an empty array for positions/weights if they are absent:

=# select * from unnest('a:1 b'::tsvector);
 lexeme | positions | weights
--------+-----------+---------
 a      | {1}       | {D}
 b      | {}        | {}

I think it's better to return NULL in this case.

5 array_to_tsvector/tsvector_setweight_by_filter/tsvector_delete_arr/tsvector_filter don't check or pay attention to NULL elements in the input arrays.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
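A sketch of what point 3 suggests, using the standard set-returning-function machinery from funcapi.h (this is not the actual patch code):

if (SRF_IS_FIRSTCALL())
{
    FuncCallContext *funcctx = SRF_FIRSTCALL_INIT();
    MemoryContext    oldcontext;

    oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
    /* detoast once; the copy survives across calls because of the context */
    funcctx->user_fctx = PG_GETARG_TSVECTOR_COPY(0);
    MemoryContextSwitchTo(oldcontext);
}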
Re: [HACKERS] Tsvector editing functions
Some notices: 1 tsin in documentation doesn't look like a good name. Changed to vector similar to other places. 2 I did some editorization about freeing memory/forgotten names etc Ooops, forgot to attach -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 139aa2b..9c294e3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -9168,16 +9168,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple setweight - setweight(tsvector, "char") + setweight(vector tsvector, weight "char") tsvector -assign weight to each element of tsvector +assign weight to each element of vector setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') 'cat':3A 'fat':2A,4A 'rat':5A + setweight + setweight by filter + + setweight(vector tsvector, weight "char", lexemes "text"[]) + +tsvector +assign weight to elements of vector that are listed in lexemes array +setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}') +'cat':3A 'fat':2,4 'rat':5A + + + + strip strip(tsvector) @@ -9190,6 +9203,84 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + delete + delete lemexeme + + delete(vector tsvector, lexeme text) + +tsvector +remove given lexeme from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') +'cat':3 'rat':5A + + + + + delete + delete lemexemes array + + delete(vector tsvector, lexemes text[]) + +tsvector +remove any occurrence of lexemes in lexemes array from vector +delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) +'cat':3 + + + + + unnest + + unnest(tsvector) + +setof anyelement +expand a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights. +unnest('fat:2,4 cat:3 rat:5A'::tsvector) +cat{3}{A} +fat{2,4} {D,D} +rat{5}{A} +(3 rows) + + + + + to_array + + to_array(tsvector) + +text[] +convert tsvector to array of lexemes +to_array('fat:2,4 cat:3 rat:5A'::tsvector) +{cat,fat,rat} + + + + + to_tsvector + array to tsvector + + to_tsvector(text[]) + +tsvector +convert array of lexemes to tsvector +to_tsvector('{fat,cat,rat}'::text[]) +'fat' 'cat' 'rat' + + + + + filter + + filter(vector tsvector, weights "char"[]) + +tsvector +Select only elements with given weights from vector +filter('fat:2,4 cat:3b rat:5A'::tsvector, '{a,b}') +'cat':3B 'rat':5A + + + + to_tsquery to_tsquery( config regconfig , query text) diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index d66b4d5..32033fa 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -1326,6 +1326,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank + +Full list of tsvector-related functions available in . + + diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index a3f1c361..e7ea270 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -14,6 +14,7 @@ #include "postgres.h" +#include "access/htup_details.h" #include "catalog/namespace.h" #include "catalog/pg_type.h" #include "commands/trigger.h" @@ -65,6 +66,7 @@ typedef struct #define STATHDRSIZE (offsetof(TSVectorStat, data)) static Datum tsvector_update_trigger(PG_FUNCTION_ARGS, bool config_column); +static int tsvector_bsearch(TSVector tsin, char *lexin, int lexin_len); /* * Order: haspos, len, word, for al
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
Hi!

Interesting feature, but it's not very obvious how to use it. I'd like to see some example(s) in the documentation.

And I see an implementation of an AVL tree in the psql source code (src/bin/psql/crosstabview.c). Postgres already has a red-black tree implementation in src/include/lib/rbtree.h and src/backend/lib/rbtree.c. Is there any advantage to using an AVL tree here? I have some doubts about that, and this implementation, obviously, will not be well tested. But I see in the comments that the implementation is reduced to insert only and it doesn't use the fact that the tree is ordered, so even a hash could be used.

Daniel Verite wrote:
> Pavel Stehule wrote:
>
>> 1. maybe we can decrease name to shorter "crossview" ??
>> I am happy with crosstabview too, just crossview is correct too, and shorter
>
> I'm in favor of keeping crosstabview. It's more explicit, only 3
> characters longer and we have tab completion anyway.
>
>> 2. Columns used for ordering should not be displayed by default.
>> I can live with current behave, but hiding ordering columns is much more
>> practical for me
>
> I can see why, but I'm concerned about a consequence: say we have 4
> columns A,B,C,D and user does
>   \crosstabview +A:B +C:D
> If B and D are excluded by default, then there's nothing left to display
> inside the grid. It doesn't feel quite right. There's something
> counter-intuitive in the fact that values in the grid would disappear
> depending on whether and how headers are sorted. With the 3rd argument,
> we let the user decide what they want to see.
>
>> 3. This code is longer, so some regress tests are recommended - attached
>> simple test case
>
> I've added a few regression tests to the psql testsuite based on your
> sample data. New patch with these tests included is attached, make check
> passes.
>
> Best regards,

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] proposal: function parse_ident
> rebased, messages changes per Tom's proposal

Cool feature; I have needed it a lot at times. But it seems there are some bugs in error processing.

1 The following query is okay:

# select * from parse_ident(E'"Some \r Schema".someTable');
         parse_ident
------------------------------
 {"Some \r Schema",sometable}

but:

% select * from parse_ident(E'"Some \r Schema".9someTable');
 Schema".9someTable"tifier after "." symbol: ""Some

The carriage return is not escaped in the error message. I suppose any other special characters will not be escaped either.

2
# select * from parse_ident('.someTable');
ERROR:  missing identifier after "." symbol: ".someTable"

Why AFTER instead of BEFORE?

3 The function successfully truncates a long identifier, but not in the case of a quoted identifier.

select length(a[1]), length(a[2]) from parse_ident('"xx".yyyyy') as a ;
 length | length
--------+--------
    414 |     63

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"
> It seems that *scanf() with %s format occures only here:
> - check.c - get_bin_version()
> - server.c - get_major_server_version()
> - filemap.c - isRelDataFile()
> - pg_backup_directory.c - _LoadBlobs()
> - xlog.c - do_pg_stop_backup()
> - mac.c - macaddr_in()
> I think here sscanf() do not works with the UTF-8 characters. And probably
> this is only spell.c issue.

Hmm. Here, src/backend/access/transam/xlog.c read_tablespace_map() using %s in scanf looks suspicious. I don't fully understand it, but it looks like it tries to read an oid as a string, so it should be safe in the usual case.

Next, _LoadBlobs() reads a file name (fname) with the help of sscanf. Could the file name be in UTF-8 encoding here?

> I agree that previous patch is wrong. Instead of using new
> parse_ooaffentry() function maybe better to use sscanf() with %ls format.
> The %ls format is used to read a wide character string.

Does the %ls modifier exist everywhere? Apple docs say
(https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sscanf.3.html):

    s ... If an l qualifier is present, the next pointer must be a pointer to
    wchar_t, into which the input will be placed after conversion by mbrtowc

Actually, that means a wchar2char() call should be used, but it relies on wcstombs[_l], which might not be present on some platforms. Does it mean that the l modifier for strings is present too, or not? What do we need to do if %ls exists but wcstombs[_l] doesn't?

I'm going a bit crazy with locale problems, and it seems to me that Artur's patch is a good idea. Actually, I don't remember exactly, but it seems commit 7ac8a4be8946c11d5a6bf91bb971b9750c1c60e5 introduced parse_affentry() instead of the corresponding sscanf precisely to avoid problems with encoding and scanf.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support
> I duplicate the patch here.

It's a very good thing to update the dictionary code to support modern versions. And thank you for improving the documentation. I'm also impressed by the long description in the spell.c header.

Some notes about the code:

1 struct SPELL. Why do you remove the union p? You left a comment about using the d struct instead of the flag field, and as far as I can see it's a correct comment. It increases the size of the SPELL structure.

2 struct AFFIX. I agree with Alvaro that the sum of the sizes of the bit fields should be less than or equal to the size of an integer. Otherwise, I suppose, we can get undefined behavior. Please split the bit fields across two integers.

3 unsigned char flagval[65000]; Is it forbidden to use 65536? In any case, decodeFlag() doesn't restrict its return value. I suggest enlarging the array to 1<<16 and adding a limit on the return value of decodeFlag().

4 I'd like to see a short comment describing at least the new functions.

5 Please add tests for the new code.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>  substring_similarity
>>> ----------------------
>>>                  0.25
>>
>> Substring search was designed to search for a similar word in a string:
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>>  substring_similarity
>> ----------------------
>>                  0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>>  substring_similarity
>> ----------------------
>>                     1
>
> Hmm, this behavior looks too much like magic to me. I mean, a substring is
> a substring -- why are we treating the space as a special character here?

Because it isn't a regex-style substring search. Since its beginning, pg_trgm has worked over words in a string:

contrib_regression=# select similarity('block hole', 'hole black');
 similarity
------------
   0.571429

contrib_regression=# select similarity('block hole', 'black hole');
 similarity
------------
   0.571429

It ignores spaces between words and the word order.

I agree that substring_similarity is a confusing name, but actually it searches for the word in the second argument that is most similar to the first argument and returns their similarity.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> I have attached a new version of the patch. It fixes error of operators
> <->> and %>:
> - operator <->> did not pass the regression test in CentOS 32 bit (gcc
>   4.4.7 20120313).
> - operator %> did not pass the regression test in FreeBSD 32 bit (gcc
>   4.2.1 20070831).
> It was because of variable optimization by gcc.

Fixed with a volatile modifier, right?

I'm close to pushing these patches, but I still have doubts about the names, and I'd like to see comments from English speakers:

1 the sml_limit GUC variable (options: similarity_limit, sml_threshold)
2 subword_similarity(). Actually, it finds the most similar word (not substring!) in the whole string. word_similarity? word_in_string_similarity?

substring_similarity_pos() could be a separate patch.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
> 1 - sml_limit to similarity_limit. sml_threshold is difficult to write I
> think, similarity_limit is more simple.

It seems to me that threshold is the right word by meaning. sml_threshold is my choice.

> 2 - subword_similarity() to word_similarity().

Agree, per Mike Rylander's opinion in this thread.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
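A hedged usage sketch under the agreed function name (assuming the rename lands as discussed; the threshold GUC name was still being debated at this point):

SELECT word_similarity('dog', 'hot dog pound');   -- 1: 'dog' matches a whole word
SELECT word_similarity('dog', 'hotdogpound');     -- lower: no word boundary around 'dog'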
Re: [HACKERS] Improved error reporting in format()
Thank you, committed

Jim Nasby wrote:
> On 1/2/16 5:57 PM, Jim Nasby wrote:
>> Attached patch clarifies that %-related error messages with hints as
>> well as (IMHO) improving the clarity of the message:
>
> Sorry, forgot to update regression tests. New patch attached.

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
Re: [HACKERS] GinPageIs* don't actually return a boolean
>> One more option for the patch:
>>
>> #define GinPageIsLeaf(page)  ((bool)(GinPageGetOpaque(page)->flags & GIN_LEAF))
>>
>> Seems it will work on any platform with a built-in bool. But I don't know
>> whether it will work with 'typedef char bool' if a high bit is set.
>
> That's true, but it doesn't really seem like a reason not to commit this
> patch. I mean, the coding here is (a) dangerous by your own admission and
> (b) actually breaks on platforms for which we allege support. If we find
> out that somebody has implemented an int-width bool we'll have some
> bigger decisions to make, but I don't see any particular reason why we've
> got to make those decisions now.

+1

-- 
Teodor Sigaev   E-mail: teo...@sigaev.ru   WWW: http://www.sigaev.ru/
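For comparison, a form that yields 0/1 regardless of how bool is typedef'ed, avoiding the char-truncation concern (illustrative; not necessarily the committed wording):

#define GinPageIsLeaf(page)   ((GinPageGetOpaque(page)->flags & GIN_LEAF) != 0)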
Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension
On Thu, Feb 11, 2016 at 9:56 AM, Teodor Sigaev wrote: 1 - sml_limit to similarity_limit. sml_threshold is difficult to write I think, similarity_limit is more simple. It seems to me that threshold is right word by meaning. sml_threshold is my choice. Why abbreviate it like that? Nobody's going to know that "sml" stands for "similarity" without consulting the documentation, and that sucks. Ok, I don't have any objections. I have worked a lot on various similarity modules, so sml has become natural to me. That's why I was asking. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
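[Editorial illustration] A rough sketch of how the renamed function and an unabbreviated GUC would look in use, assuming the word_similarity name and a spelled-out pg_trgm threshold setting are what finally get adopted; the exact names below are an assumption, not the committed result:

    -- hypothetical final names, for illustration only
    SET pg_trgm.word_similarity_threshold = 0.5;
    SELECT word_similarity('dog', 'hot dogpound');   -- expected ~0.75, as in the examples earlier in the thread
    SELECT 'dog' <% 'hot dogpound';                  -- operator form: true if above the threshold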
Re: [HACKERS] Small PATCH: check of 2 Perl modules
On 2/12/16 8:20 AM, Eugene Kazakov wrote: TAP-tests need two Perl modules: Test::More and IPC::Run. I think those modules are part of a standard Perl installation. IPC::Run is not, at least not for perl from the ports tree in FreeBSD. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
It's a great pity, but the author is not able to continue work on this patch, and I would like to raise this flag. I'd like to add some comments about the patches: the traversalValue patch adds an arbitrary value associated with a branch in the SP-GiST tree walk. Unlike reconstructedValue it can be just a pointer; it doesn't have to be a regular pgsql type. Also, it can be used independently from reconstructedValue. This patch is used by both of the following attached patches. The range patch just switches the range opclasses from using reconstructedValue to traversalValue. reconstructedValue was used just because it was an acceptable workaround in the case of the range type: the range opclass stores a full value in leaves and doesn't need reconstructedValue to return tuples in an index-only scan. See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us The q4d patch implements an index over boxes using SP-GiST. The basic idea was the observation that the range opclass treats one-dimensional ranges as 2D points. Following this idea, we can think of a 2D box (which is 2 points, or 4 numbers) as representing a 4D point. We hope that this approach will be much more effective than a traditional R-tree in the case of many overlaps in the box collection. Performance results are coming shortly. In the near future (say, tomorrow) I hope to propose an opclass over polygons based on this approach. Alexander Lebedev wrote: Hello, Hacker. * [PATCH] add a box index to sp-gist We have extended sp-gist with an index that keeps track of boxes. We have used ideas underlying the sp-gist range implementation to represent 2D boxes as points in 4D space. We use a quad tree analogue, but in 4-dimensional space. We call this tree q4d. Each node of this tree is a box (a point in 4D space) which splits space into 16 hyperrectangles. Rationale: r-tree assumes that the boxes we're trying to index don't overlap much. When this assumption fails, r-tree performs badly, while our proposal to represent a rectangle as a point in 4D space solves this problem. NB: the index uses traversalValue introduced in a separate patch. * [PATCH] add traversalValue in sp-gist During implementation of the box index for sp-gist we saw that we only keep rectangles, but to determine traversal direction we may need to know the boundaries of a hyperrectangle. So we calculate them on the fly and store them in traversalValue, available when traversing child nodes, because otherwise we wouldn't be able to calculate them from inside the inner_consistent function call. This patch was written by Teodor Sigaev. * [PATCH] change reconstructValue -> traversalValue in range_spgist.c During traversal, local context is insufficient to pick the traversal direction. As of now, this is worked around with the help of reconstructValue. reconstructValue only stores data of the same type as a tree node, that is, a range. We have written a patch that calculates auxiliary values and makes them accessible during traversal. We then use traversalValue in a new box index and change range_spgist.c to use it in place of reconstructValue. NB: apply this patch after the traversalValue patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ traversalValue-1.patch.gz Description: GNU Zip compressed data range-1.patch.gz Description: GNU Zip compressed data q4d-1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE
TBH, this sounds like a completely terrible idea. There are far too many sorts of dependencies across which one would not expect ownership to propagate; for example, use of a function in a view, or even just a foreign key dependency between two tables. I'm not even clear that there are *any* cases where this behavior is wanted, other than perhaps ALTER OWNER on an extension --- and even there, what you would want is altering the ownership of the member objects, but not everything that depends on the member objects. So basically, a generic CASCADE facility sounds like a lot of work to produce something that would seldom be anything but a foot-gun. DELETE FROM or TRUNCATE could be a foot-gun too, but that's not a reason to remove them. I faced this problem when I tried to change the owner of a database with all the objects inside it. I think this feature could be useful, although it should be restricted to superusers only. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
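[Editorial illustration] A sketch of the use case being described. The CASCADE form below is only the hypothetical syntax proposed in this thread, not an existing command; the REASSIGN OWNED form is the existing command that already covers the "change the owner of a database and everything inside it" case, assuming all the objects belong to a single old role:

    -- hypothetical syntax discussed in this thread
    ALTER DATABASE mydb OWNER TO new_owner CASCADE;

    -- existing workaround: connect to mydb, then transfer everything owned by old_owner
    REASSIGN OWNED BY old_owner TO new_owner;
    ALTER DATABASE mydb OWNER TO new_owner;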
Re: [HACKERS] proposal: function parse_ident
I missed some of my edits. Updated patch with those in place attached. I'm sorry, but special characters still aren't escaped correctly in error messages: % select parse_ident(E'X\rXX'); XX""X Time: 0,510 ms -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Next version of patch is attached: * Documentation for CREATE ACCESS METHOD command is added. * Refactoring and comments for bloom contrib. Cool work, nice to see. Some comments: 1 create-am.7.patch: function lookup_index_am_handler_func(): why doesn't it emit an error in case of empty input? If it checks the signature of the function and an empty handler is not allowed, then it seems all checks of the handler should be moved into lookup_index_am_handler_func(). 2 create-am.7.patch: The comment near get_am_name() incorrectly mentions the get_am_oid function. 3 create-am.7.patch: get_am_name(Oid amOid, char amtype). The amtype argument seems like overengineering. get_am_name() is used only in error messages, and an additional check in it will do nothing or even confuse the user. 4 create-am.7.patch: Inconsistency with psql help. As far as I can see from the code, it's forbidden to create an access method without a handler: postgres=# \h create access method Command: CREATE ACCESS METHOD Description: define a new access method Syntax: CREATE ACCESS METHOD name TYPE INDEX [ HANDLER handler_function | NO HANDLER ] postgres=# create access method yyy type index no handler; ERROR: syntax error at or near "no" LINE 1: create access method yyy type index no handler; 5 create-am.7.patch: file src/bin/pg_dump/pg_dump.h. Extra comma near DO_POLICY: *** 77,83 DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY } DumpableObjectType; typedef struct _dumpableObject --- 78,84 DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY, } DumpableObjectType; 6 generic-xlog.7.patch: writeDelta(). It seems that even under the USE_ASSERT_CHECKING define, checking the diff by applying it is too expensive. Maybe we should use a GENERIC_WAL_DEBUG macro here? 7 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c It's unclear to me what MATCH_THRESHOLD does or is intended for. Please add comments here. 8 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c. Again, it's unclear to me what the motivation is for the size of PageData.data. 9 generic-xlog.7.patch: GenericXLogRegister(). Emitting an error if the caller tries to register a buffer which is already registered doesn't seem like a good idea. In practice, say in SP-GiST, a buffer variable is used and the page can easily be obtained from the buffer. I suppose GenericXLogRegister() should not emit an error and should ignore the isnew flag in case of a second registration of the same buffer. 10 bloom-contrib.7.patch: blutils.c: In function 'initBloomState': blutils.c:128:20: warning: variable 'opaque' set but not used [-Wunused-but-set-variable] BloomPageOpaque opaque; 11 I'd really like to see regression tests (TAP tests?) for replication with generic xlog. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
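[Editorial illustration] A minimal sketch of the command being reviewed here, assuming the handler function exported by the bloom contrib module is called blhandler (the names follow the patch series and could differ in the committed version):

    -- register the index access method provided by the bloom extension's handler
    CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler;

    -- afterwards it can be used like any other index AM
    CREATE TABLE t (a int, b int);
    CREATE INDEX t_bloom_idx ON t USING bloom (a, b);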
Re: [HACKERS] GIN data corruption bug(s) in 9.6devel
Thank you for remembering this problem, at least for me. Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs): 9.5 : 2428 MB 9.6 + alone cleanup : 730 MB 9.6 + pending lock : 488 MB Interesting, I don't see why alone_cleanup and pending_lock differ so much. I'd like to understand that; does somebody have a good theory? The one suspicious point in the pending_lock patch is the exception in ProcSleep; this exception may cause problems in the future. So that's quite a significant difference, I guess. The load duration for each version look like this: 9.5 : 1415 seconds 9.6 + alone cleanup : 1310 seconds 9.6 + pending lock : 1380 seconds I'd say I'm happy with sacrificing ~5% of time in exchange for ~35% reduction of index size. I think the alone_cleanup patch is faster because a regular insert can break off its cleanup work if autovacuum is waiting to begin the cleanup, so the inserting process can return earlier from the pending-list cleanup. Attached is just a rebased v2 of the alone_cleanup patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ gin_alone_cleanup-3.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
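[Editorial illustration] For readers following the pending-list discussion: in the 9.6 cycle the GIN pending list can also be flushed by hand, which is convenient when comparing index bloat between patches like the ones above. A minimal sketch, assuming a GIN index named gin_idx and the 9.6 gin_clean_pending_list() function:

    -- flush a GIN index's pending list into the main index structure manually
    SELECT gin_clean_pending_list('gin_idx'::regclass);

    -- per-index knob for how large the pending list may grow (value in kB)
    ALTER INDEX gin_idx SET (gin_pending_list_limit = 4096);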
Re: [HACKERS] GIN data corruption bug(s) in 9.6devel
Well, turns out there's a quite significant difference, actually. The index sizes I get (quite stable after multiple runs): 9.5 : 2428 MB 9.6 + alone cleanup : 730 MB 9.6 + pending lock : 488 MB Attached is a modified alone_cleanup patch which, like the pending_lock patch, doesn't break off the cleanup process, but the attached patch doesn't touch lock management. Tomas, if you can, please repeat the test with this patch. If not, I will try to do it, but later. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ gin_alone_cleanup-3.1.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Phrase search distance syntax
Sorry to be asking another phrase search syntax question, and so close to final release, but ... Really close... Why does the phrase distance operator assume <1> means adjacent words, and not <0>. (FYI, <-> is the same as <1>.) Because: 1) it is the result of subtracting the words' positions; 2) <0> can be used as a special case, such as a word with two infinitives: # create text search dictionary xx (template = 'ispell', DictFile='ispell_sample', AffFile='ispell_sample'); # alter text search configuration english ALTER MAPPING FOR asciiword WITH xx, english_stem; # select to_tsvector('english', 'bookings'); to_tsvector -- 'book':1 'booking':1 # select to_tsvector('english', 'bookings') @@ 'book <0> booking'; ?column? -- t (1 row) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
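[Editorial illustration] To make the distance semantics above concrete, a small sketch using to_tsquery's phrase operators; these queries only restate the rule that <-> is <1>, i.e. a position difference of exactly one, and are not taken from the original thread:

    SELECT to_tsvector('english', 'hot dog') @@ to_tsquery('english', 'hot <-> dog');  -- true: positions 1 and 2, distance 1
    SELECT to_tsvector('english', 'hot dog') @@ to_tsquery('english', 'hot <2> dog');  -- false: the words are not exactly 2 positions apart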
Re: [HACKERS] PinBuffer() no longer makes use of strategy
if (buf->usage_count < BM_MAX_USAGE_COUNT) if (BUF_STATE_GET_USAGECOUNT(buf_state) != BM_MAX_USAGE_COUNT) being prone to paranoia, I prefer the first, but I've seen both styles in the code so I don't know if it's worth futzing with. Ok, let's be paranoid and do it the same way as before. A revised patch is attached. I see the change was made in the 9.6 release cycle in commit 48354581a49c30f5757c203415aa8412d85b0f70 on April 10. Does that mean the fix should be backpatched too? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Wraparound limits
Hi! I have a question about setting the transaction wraparound limits. Function SetTransactionIdLimit() in access/transam/varsup.c: 1) xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit < FirstNormalTransactionId) xidWrapLimit += FirstNormalTransactionId; Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2? 2) xidStopLimit = xidWrapLimit - 100; if (xidStopLimit < FirstNormalTransactionId) xidStopLimit -= FirstNormalTransactionId; xidWarnLimit = xidStopLimit - 1000; if (xidWarnLimit < FirstNormalTransactionId) xidWarnLimit -= FirstNormalTransactionId; Why does it use '-' instead of '+' if the variable < FirstNormalTransactionId? In this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > xidStopLimit... Thank you. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
value-to-be-compressed. (first_success_by is 1024 in the default set of compression parameters.) Curious idea: we could swap the JEntry array and the values: values at the beginning of the type would be caught by pg_lzcompress. But we would need to know the offset of the JEntry array, so the header would grow to 8 bytes (actually, it would be a varlena header!) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
Curious idea: we could swap JEntry array and values: values in the begining of type will be catched by pg_lzcompress. But we will need to know offset of JEntry array, so header will grow up till 8 bytes (actually, it will be a varlena header!) Maybe I wasn't clear: the jsonb type would start with the string collection instead of the JEntry array; the JEntry array would be placed at the end of the object/array. So pg_lzcompress would find repeated 4-byte pieces in the first 1024 bytes of the jsonb. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
Thank you, pushed. I just made the test table permanent. Anastasia Lubennikova wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested As I can see, this bugfix was already discussed and reviewed. All mentioned issues are fixed in the latest version of the patch So I mark it "Ready for committer". Thank you for fixing it! The new status of this patch is: Ready for Committer -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you for your suggestions, do not hesitate to ask any questions, concurrency and GIN both are very interesting topics. I had a look at the patch and found an issue. Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves(): /* * All subtree is empty - just return TRUE to indicate that parent must * do a cleanup. Unless we are ROOT an there is way to go upper. */ if(isChildHasVoid && !isAnyNonempy && !isRoot) return TRUE; if(isChildHasVoid) { ... ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber); } In the first 'if' clause I see !isRoot, so the second if and the corresponding ginScanToDelete() can be called only for the root of the posting tree. If so, it seems to me it wasn't a good idea to move ginScanToDelete() out of ginVacuumPostingTree(); the patch should just remove the cleanup lock around ginVacuumPostingTreeLeaves(), and if it reports that the tree contains an empty page, then lock the whole posting tree to do the ginScanToDelete() work. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
If that can happen, don't we have the same problem in many other places? Like, all the SLRUs? They don't fsync the directory either. Right, pg_commit_ts and pg_clog fall into this category. Implemented as attached. Is unlink() guaranteed to be durable, without fsyncing the directory? If not, then we need to fsync() the directory even if there are no files in it at the moment, because some might've been removed earlier in the checkpoint cycle. What is the protection if pg crashes after unlink() but before fsync()? True, it's a rather small window for such a scenario, but isn't it better to have another protection? Like WAL-logging of WAL segment removal... -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
No, second conditional code will be called for any subtree, which contains totally empty subtree. That check !isRoot covers case when the entire posting tree should be erased: we cannot just quit out of recursive cleanup, we have to make a scan here, starting from root. Oh, I see. Probably, variable isChildHasVoid has a bit confusing name. This flag indicates that some subtree: 1. Had empty pages 2. Did not bother deleting them, because there is a chance that it is a part of a bigger empty subtree. May be it'd be better to call the variable "someChildIsVoidSubtree". hasEmptyChild? and hasNonEmptyChild? (BTW, isAnyNonempy is missing a 't'.) And if the whole posting tree is empty, then we could mark the root page as a leaf and remove all the other pages in the tree without any locking. Although that could be a task for a separate patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you, pushed Andrew Borodin wrote: 2017-03-22 22:48 GMT+05:00 Teodor Sigaev : hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't') Yes, I think this naming is good. It's clear what's in common in these flags and what's different. And if the whole posting tree is empty,then we could mark root page as leaf and remove all other pages in tree without any locking. Although, it could be a task for separate patch. From the performance point of view, this is a very good idea. Both, performance of VACUUM and performance of Scans. But doing so we risk to leave some garbage pages in case of a crash. And I do not see how to avoid these without unlinking pages one by one. I agree, that leaving this trick for a separate patch is quite reasonable. Best regards, Andrey Borodin. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backend crash on non-exclusive backup cancel
Hi! I believe the patch looks good and it's ready to commit. As I understand it, it fixes a bug introduced by commit 7117685461af50f50c03f43e6a622284c8d54694 Date: Tue Apr 5 20:03:49 2016 +0200 Implement backup API functions for non-exclusive backups And I suppose it should be backpatched to 9.6? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Hmm, it doesn't work (but applies) on current HEAD: % uname -a FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR amd64 % pg_config --configure '--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' '--with-perl' 'CC=clang' 'CFLAGS=-O0' % rm -rf /spool/pg_data/* % initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C --lc-numeric C --lc-time C -D /spool/pg_data Running in no-clean mode. Mistakes will not be cleaned up. The files belonging to this database system will be owned by user "teodor". This user must also own the server process. The database cluster will be initialized with locales COLLATE: ru_RU.UTF-8 CTYPE:ru_RU.UTF-8 MESSAGES: C MONETARY: C NUMERIC: C TIME: C The default text search configuration will be set to "russian". Data page checksums are disabled. fixing permissions on existing directory /spool/pg_data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... posix creating configuration files ... ok running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL: could not open file "pg_clog": No such file or directory child process exited with exit code 1 initdb: data directory "/spool/pg_data" not removed at user's request -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
Hi, the patch looks good, except: why do you remove the initialization of is_throttled? I suppose it's just a typo? --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -1967,7 +1967,6 @@ top: st->listen = false; st->sleeping = false; st->throttling = false; - st->is_throttled = false; memset(st->prepared, 0, sizeof(st->prepared)); } -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backend crash on non-exclusive backup cancel
Thank you all, pushed Michael Paquier wrote: On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev wrote: I believe patch looks good and it's ready to commit. Thanks for the review! As I understand, it fixes bug introduced by commit 7117685461af50f50c03f43e6a622284c8d54694 Date: Tue Apr 5 20:03:49 2016 +0200 Implement backup API functions for non-exclusive backups Indeed. And, suppose, it should be backpatched to 9.6? Yes, that's where non-exclusive backups have been introduced. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
And the renaming of pg_clog to pg_xact is also my fault. Attached is an updated patch. Thank you. One more question: what about symlinks? If the DBA moves, for example, pg_xact to another disk and leaves a symlink in the data directory? I suppose fsync on a symlink will actually do nothing. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions
things in order I'm attaching the previous patch as well. Patches look good, but I have some notes: 1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never read, so the hash entry could be just a pointer to PgStat_TableStatus. 2 step1 In pgstat_report_stat() you remove entries from the hash one by one until they are all removed. Isn't it better to hash_destroy/hash_create, or even to let the hash live in a separate memory context and just reset it? 3 step1 Again in pgstat_report_stat(), all-zero entries aren't deleted from the hash although they will be freed from the point of view of pgStatTabList. 4 step 2. The same as 1) for SeenRelsEntry->rel_id, but it isn't even initialized anywhere. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions
Sorry, 1) and 4) are my fault; the comment in hsearch.h says: * ... The hash key * is expected to be at the start of the caller's hash entry data structure. Oops, forgot that. Teodor Sigaev wrote: things in order I'm attaching the previous patch as well. Patches look good, but I have some notes: 1 step1 Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is never read, so the hash entry could be just a pointer to PgStat_TableStatus. 2 step1 In pgstat_report_stat() you remove entries from the hash one by one until they are all removed. Isn't it better to hash_destroy/hash_create, or even to let the hash live in a separate memory context and just reset it? 3 step1 Again in pgstat_report_stat(), all-zero entries aren't deleted from the hash although they will be freed from the point of view of pgStatTabList. 4 step 2. The same as 1) for SeenRelsEntry->rel_id, but it isn't even initialized anywhere. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
No, it is really needed so that the lag measure is correct. Thank you, pushed -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Thank you. One more question: what about symlinks? If the DBA moves, for example, pg_xact to another disk and leaves a symlink in the data directory? I suppose fsync on a symlink will actually do nothing. I did not think of this case, but is that really common? There is even I have seen a lot of such cases. If something should be done in this area, that would be surely in fsync_fname directly to centralize all the calls, and I would think of that as a separate patch, and a separate discussion. Agreed -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
Thank you, pushed Michael Paquier wrote: On Fri, Mar 24, 2017 at 11:36 PM, Teodor Sigaev wrote: And the renaming of pg_clog to pg_xact is also my fault. Attached is an updated patch. Thank you. One more question: what about symlinks? If DBA moves, for example, pg_xact to another dist and leaves the symlink in data directoty. Suppose, fsync on symlink will do nothing actually. I did not think of this case, but is that really common? There is even no option to configure that at command level. And surely we cannot support any fancy scenario that people figure out using PGDATA. Existing callers of fsync_fname don't bother about this case as well by the way, take the calls related to pg_logical and pg_repslot. If something should be done in this area, that would be surely in fsync_fname directly to centralize all the calls, and I would think of that as a separate patch, and a separate discussion. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS
Thank you, pushed Matheus de Oliveira wrote: On Thu, Mar 2, 2017 at 10:27 AM, David Steele <da...@pgmasters.net> wrote: It looks like this patch is still waiting on an update for tab completion in psql. Hi All, Sorry about the long delay... It was so simple to add it to tab-complete.c that is a shame I didn't do it before, very sorry about that. Attached the new version of the patch that is basically the same as previously with the addition to tab completion for psql and rebased with master. Hope it is enough. Thank you all. -- Matheus de Oliveira -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Thank you, pushed Excellent! I've attached a new (and hopefully final) version of the patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P + IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE I think your syntax would read no worse, possibly even better, if you just used the existing INCLUDING keyword. There was a discussion in this thread about naming, and both databases that support covering indexes use the INCLUDE keyword. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
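[Editorial illustration] A minimal sketch of what the discussed syntax looks like in use, assuming the INCLUDE spelling is what ends up being committed; the table and column names are made up:

    CREATE TABLE t (a int, b text);
    -- uniqueness is enforced only on a, but b is carried in the index leaf tuples
    CREATE UNIQUE INDEX t_a_incl_b ON t (a) INCLUDE (b);
    -- allows an index-only scan that returns b without visiting the heap
    SELECT a, b FROM t WHERE a = 42;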
Re: [HACKERS] WIP: Covering + unique indexes.
I had a look at the patch and played with it; it looks fine to me. I split it into two patches: core changes (+ bloom index fix) and btree itself. All docs are left in the first patch - I'm too lazy to rewrite documentation that is changed in the second patch. Any objections from reviewers to pushing both patches? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ 0001-covering-core.patch Description: binary/octet-stream 0002-covering-btree.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
1. Table AM with a 6-byte TID. 2. Table AM with a custom locator format, which could be TID-like. 3. Table AM with no locators. Currently TID has its own type in the system catalog. It seems possible for a storage to declare the TID type which it uses. Then an AM could declare it too, so the core, based on that information, could answer the question of AM-storage compatibility. A storage could also declare that it has no TID type at all, so that it couldn't be used with any access method; use case: a compressed column-oriented storage. As I remember, only GIN depends on the TID format; other indexes use it as an opaque type. Except, at least, btree and GiST - they believe that internal pointers have the same format as the outer ones (to heap). Another dubious part is bitmap scan. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Suspicious place in heap_prepare_freeze_tuple()
Hi! Playing around with tuple freezing, I found a suspicious piece of code: heap_prepare_freeze_tuple(): ... frz->t_infomask = tuple->t_infomask; ... frz->t_infomask &= ~HEAP_XMAX_BITS; frz->xmax = newxmax; if (flags & FRM_MARK_COMMITTED) frz->t_infomask &= HEAP_XMAX_COMMITTED; It seems that in the last line it should be a bitwise OR instead of AND. As it stands, this line clears all the bits in t_infomask, which will later be copied directly into the tuple. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 8deb344d09..ec227bac80 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6639,7 +6639,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid, frz->t_infomask &= ~HEAP_XMAX_BITS; frz->xmax = newxmax; if (flags & FRM_MARK_COMMITTED) -frz->t_infomask &= HEAP_XMAX_COMMITTED; +frz->t_infomask |= HEAP_XMAX_COMMITTED; changed = true; totally_frozen = false; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Double shared memory allocation for SLRU LWLocks
It seems to me that we're allocating shared memory for SLRU lwlocks twice, unless I'm missing something. I think you are right. Did you check previous versions? i.e. should it be applied to previous branches?? I suppose yes, to minimize code difference. I also like the idea of adding Assert(offset <= SimpleLruShmemSize(nslots, nlsns)) at the end of SimpleLruInit() SimpleLruShmemSize() calculates total SLRU shared memory size including lwlocks size. SimpleLruInit() starts with line shared = (SlruShared) ShmemInitStruct(name, SimpleLruShmemSize(nslots, nlsns), &found); which allocates SLRU shared memory (LWLocks size is included because SimpleLruShmemSize() is used for size computation). Following line allocates shared memory for LWLocks again: shared->buffer_locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * nslots); Attached patch fixes that by removing extra ShmemAlloc for SLRU. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
It's not clear from the web site in question that the relevant code is released under the PostgreSQL license. As the author, I permit this code to be used in PostgreSQL under its license. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] convert EXISTS to inner join gotcha and bug
Hi! It seems there are two issues: 1) Sometimes conditions which at first glance could be pushed down to a scan are left as join quals, and that can be a ~30x performance loss. 2) The query result depends on the enable_seqscan variable. The query explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; produces the following plan: Nested Loop (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 rows=162 loops=1) Join Filter: (t3.id1 = t1.id) Rows Removed by Join Filter: 70368 -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.100..0.103 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Hash Join (cost=6.14..1918.37 rows=163 width=66) (actual time=0.370..120.971 rows=70530 loops=1) (1) Hash Cond: (t3.id2 = t2.id) (2) -> Seq Scan on t3 (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.017..27.424 rows=70530 loops=1) -> Hash (cost=3.84..3.84 rows=184 width=33) (actual time=0.273..0.273 rows=184 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on t2 (cost=0.00..3.84 rows=184 width=33) (actual time=0.017..0.105 rows=184 loops=1) Planning time: 7.326 ms Execution time: 149.115 ms Condition (1) is not pushed down to scan (2), although it seemingly could be safely moved. With enable_seqscan = off the condition is not pushed down either, but the query returns only one row instead of 162. The scan on t3 returns ~70000 rows but only ~150 rows are really needed. I couldn't find a combination of enable_* GUCs to push that down, and it seems to me that either there is a reason for this which I don't see, or support is somehow missing. If the pair (t3.id1, t3.id2) is unique (see the dump, there is a unique index on them) the query can be directly rewritten as an inner join, and its plan is: Nested Loop (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 rows=162 loops=1) -> Nested Loop (cost=9.43..292.77 rows=25 width=99) (actual time=0.316..0.645 rows=162 loops=1) -> Index Only Scan using t1i2 on t1 (cost=0.28..8.30 rows=1 width=66) (actual time=0.047..0.050 rows=1 loops=1) Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text) Heap Fetches: 1 -> Bitmap Heap Scan on t3 (cost=9.15..283.53 rows=94 width=66) (actual time=0.257..0.426 rows=162 loops=1) Recheck Cond: (id1 = t1.id) Heap Blocks: exact=3 -> Bitmap Index Scan on t3i1 (cost=0.00..9.12 rows=94 width=0) (actual time=0.186..0.186 rows=162 loops=1) Index Cond: (id1 = t1.id) -> Index Only Scan using t2i1 on t2 (cost=0.27..0.29 rows=1 width=33) (actual time=0.024..0.024 rows=1 loops=162) Index Cond: (id = t3.id2) Heap Fetches: 162 Planning time: 5.532 ms Execution time: 5.457 ms The second plan is ~30 times faster. But with sequential scans turned off the first query does not work correctly, which points to some bug in the planner, I suppose. Both 9.6 and 10devel are affected by the dependence of the query result on the seqscan variable. A dump to reproduce (a subset of real data, but obfuscated) and the queries are in the attachment: http://sigaev.ru/misc/exists_to_nested.sql.gz -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ --query returns 162 rows explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; set enable_seqscan=off; --the same query returns only one row! 
explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd' ; explain analyze SELECT t1.* FROM t1, t2, t3 WHERE t1.name =
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Both 9.6 and 10devel are affected by the dependence of the query result on the seqscan variable. Oops, I was too hasty: 9.6 is not affected by the enable_seqscan setting. But it doesn't push down the condition either. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Ah, thanks for the clue about enable_hashjoin, because it wasn't reproducing for me as stated. I forgot to mention the tweaked config, sorry -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXISTS to inner join gotcha and bug
Really, the way to fix Teodor's complaint is to recognize that the semijoin inner rel is effectively unique against the whole outer rel, and then strength-reduce the semijoin to a plain join. The infrastructure we built for unique joins is capable of proving that, we just weren't applying it in the right way. Perfect, it works. Thank you! The second patch reduces the time of the full query (my example was just a small part of it) from 20 minutes to 20 seconds. I'm kind of strongly tempted to apply the second patch; but it would be fair to complain that reduce_unique_semijoins() is new development and should wait for v11. Opinions? Obviously, I'm voting for the second patch to be applied to version 10. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Performance bug in v10
Hi! I found an example where v10 chooses an extremely non-optimal plan: select i::int as a, i::int + 1 as b, 0 as c into t from generate_series(1,32) as i; create unique index i on t (c, a); explain analyze SELECT t1.a, t1.b, t2.a, t2.b, t3.a, t3.b, t4.a, t4.b, t5.a, t5.b, t6.a, t6.b /* , t7.a, t7.b, t8.a, t8.b, t9.a, t9.b, t10.a, t10.b */ FROM t T1 LEFT OUTER JOIN t T2 ON T1.b = T2.a AND T2.c = 0 LEFT OUTER JOIN t T3 ON T2.b = T3.a AND T3.c = 0 LEFT OUTER JOIN t T4 ON T3.b = T4.a AND T4.c = 0 LEFT OUTER JOIN t T5 ON T4.b = T5.a AND T5.c = 0 LEFT OUTER JOIN t T6 ON T5.b = T6.a AND T6.c = 0 LEFT OUTER JOIN t T7 ON T6.b = T7.a AND T7.c = 0 LEFT OUTER JOIN t T8 ON T7.b = T8.a AND T8.c = 0 LEFT OUTER JOIN t T9 ON T8.b = T9.a AND T9.c = 0 LEFT OUTER JOIN t T10 ON T9.b = T10.a AND T10.c = 0 WHERE T1.c = 0 AND T1.a = 5 ; It takes 4 seconds on my laptop; uncommenting the commented lines makes it run forever. Analyzing the table or removing the index reduces execution time to milliseconds, regardless of whether those lines are commented or uncommented. The commit commit 9c7f5229ad68d7e0e4dd149e3f80257893e404d4 Author: Tom Lane Date: Fri Apr 7 22:20:03 2017 -0400 Optimize joins when the inner relation can be proven unique. seems to be the root of this problem - before it, the query took milliseconds. Attached is the output of explain analyze with the lines commented; my attention was drawn to the huge number of loops: -> Materialize (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576) -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ Timing is on. DROP TABLE Time: 5,268 ms SELECT 32 Time: 5,515 ms CREATE INDEX Time: 14,971 ms QUERY PLAN --- Nested Loop Left Join (cost=0.00..8.55 rows=1 width=48) (actual time=818.219..4159.317 rows=1 loops=1) Join Filter: (t1.b = t2.a) Rows Removed by Join Filter: 31 -> Seq Scan on t t1 (cost=0.00..1.48 rows=1 width=8) (actual time=0.026..0.032 rows=1 loops=1) Filter: ((c = 0) AND (a = 5)) Rows Removed by Filter: 31 -> Nested Loop Left Join (cost=0.00..7.06 rows=1 width=40) (actual time=10.588..4159.270 rows=32 loops=1) Join Filter: (t2.b = t3.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t2 (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.020 rows=32 loops=1) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..5.65 rows=1 width=32) (actual time=0.142..129.970 rows=32 loops=32) Join Filter: (t3.b = t4.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t3 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=32) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..4.23 rows=1 width=24) (actual time=0.007..4.055 rows=32 loops=1024) Join Filter: (t4.b = t5.a) Rows Removed by Join Filter: 993 -> Seq Scan on t t4 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.010 rows=32 loops=1024) Filter: (c = 0) -> Nested Loop Left Join (cost=0.00..2.82 rows=1 width=16) (actual time=0.003..0.121 rows=32 loops=32768) Join Filter: (t5.b = t6.a) Rows Removed by Join Filter: 528 -> Seq Scan on t t5 (cost=0.00..1.40 rows=1 width=8) (actual time=0.002..0.009 rows=32 loops=32768) Filter: (c = 0) -> Materialize (cost=0.00..1.40 rows=1 width=8) (actual time=0.000..0.001 rows=17 loops=1048576) -> Seq Scan on t t6 (cost=0.00..1.40 rows=1 width=8) (actual time=0.008..0.031 rows=32 loops=1) Filter: (c = 0) Planning time: 3.316 ms Execution time: 4159.596 ms (31 rows) Time: 4165,372 ms (00:04,165) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
Thank you for the answer! This is all caused by get_variable_numdistinct() deciding that all values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that if the example is increased to use 300 tuples instead of 32, then that's enough for the planner to estimate 2 rows instead of clamping to 1, and the bad plan does not look so good anymore since the planner predicts that those nested loops need to be executed more than once. What I don't see here is why the presence of the index could influence that: removing the index leads to a good plan, although the index isn't used in either plan. I really think the planner is too inclined to take risks by nesting Nested loops like this, but I'm not all that sure the best solution to fix this, and certainly not for beta1. So, I'm a bit unsure exactly how best to deal with this. It seems like we'd better make some effort, as perhaps this could be a case that might occur when temp tables are used and not ANALYZED, but the only way I can think to deal with it is not to favour unique inner nested loops in the costing model. The unfortunate thing about not doing this is that the planner will no longer swap the join order of a 2-way join to put the unique rel on the inner side. This is evident by the regression test failures caused by patching with the attached, which changes the cost model for nested loops back to what it was before unique joins. The patch seems to work for this particular case, but losing the swap isn't a good thing, I suppose. My other line of thought is just not to bother doing anything about this. There's plenty more queries you could handcraft to trick the planner into generating a plan that'll blow up like this. Is this a realistic enough one to bother accounting for? Did it come from a real world case? else, how did you stumble upon it? Unfortunately, it's taken from a real application. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
Teodor, could you check if this patch fixes your real-world problem? It works fine with the original query, thank you. But some other query slows down by ~10% (9 secs vs 10 secs). Look at the following part of the plans of a huge query: without patch: -> Nested Loop (cost=34.82..50.91 rows=1 width=20) (actual time=0.017..0.061 rows=5 loops=24121) -> ... -> Materialize (cost=0.56..15.69 rows=1 width=5) (actual time=0.003..0.004 rows=2 loops=109061) -> Index Scan using ... (cost=0.56..15.68 rows=1 width=5) (actual time=0.013..0.014 rows=2 loops=24121) with patch: -> Nested Loop (cost=34.82..50.91 rows=1 width=20) (actual time=0.018..0.063 rows=5 loops=24121) -> ... -> Index Scan using ... (cost=0.56..15.68 rows=1 width=5) (actual time=0.012..0.013 rows=2 loops=109061) (dots hide identical parts) As you said, it removes the Materialize node, although it's useful here. If you wish, I can prepare a test suite (its size will be around 10MB) and send it by private email. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
There were old threads about considering a risk factor when estimating plans, and I'm thinking this issue is the planner failing to do exactly that. I'm afraid it's too late for v10 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance bug in v10
BTW, was the larger query plan that you showed (with a Materialize node) generated by 9.6, or v10 HEAD? Because I would be surprised if 9.6 did. v10, commit acbd8375e954774181b673a31b814e9d46f436a5 Author: Magnus Hagander Date: Fri Jun 2 11:18:24 2017 +0200 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Selectivity estimation for intarray with @@
Some notes about the code: 1 Near the function transformOperator() there is a wrong operator name "@<". 2 int_query (and intquery) should be replaced with query_int to be consistent with the actual type name, at least where it's used as a separate lexeme. 3 For historical reasons @@ doesn't commutate with itself; it has the commutator ~~. The patch assumes that @@ is its own commutator, but ~~ will still use 'contsel' as its restriction estimator. I suppose we need to declare ~~ deprecated and introduce a self-commutating @@ operator. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
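[Editorial illustration] A rough sketch of what point 3 could look like at the SQL level. The PROCEDURE, RESTRICT, and JOIN names below are placeholders/assumptions, not the actual intarray or patch function names, so treat this purely as an illustration of declaring @@ as its own commutator:

    -- hypothetical re-declaration: query_int @@ int[] commutes with int[] @@ query_int
    CREATE OPERATOR @@ (
        LEFTARG    = query_int,
        RIGHTARG   = _int4,
        PROCEDURE  = rboolop,          -- assumed name of the reverse-match function
        COMMUTATOR = @@,
        RESTRICT   = matchingsel,      -- assumed name of the new selectivity estimator
        JOIN       = matchingjoinsel   -- assumed join estimator
    );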
[HACKERS] stringify MAKE_SQLSTATE()
Hi! Following the discussion of the https://commitfest.postgresql.org/5/190/ patch, I found (it seems to me) a way to stringify MAKE_SQLSTATE(); the idea is to use a char array as a string: #include <stdio.h> #define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \ ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'}) #define ERRCODE_WARNING_DEPRECATED_FEATURE MAKE_SQLSTATE('0','1','P','0','1') int main(int argn, char* argv[]) { char*x = ERRCODE_WARNING_DEPRECATED_FEATURE; printf("%s\n", x); return 0; } That works for clang and gcc48 with the flags -Wall -Wextra -ansi (or -std=c90) without warnings, but doesn't work with -pedantic. Is that enough for it to work with other compilers? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stringify MAKE_SQLSTATE()
#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \ ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'}) I'm pretty sure that's a gcc-ism, not standard C. Hmm, after some digging: the feature is called compound literals and it was introduced in C99, although gcc supports it in C90 mode. To disable this support you need the -pedantic flag. Sorry. The real question is do we want to. What's your use-case for this? Just to avoid hardcoded values. It is too easy to make a mistake in a hardcoded value. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compress method for spgist - 2
Poorly, by hanging boxes that straddled dividing lines off the parent node in a big linear list. The hope would be that the case was Ok, I see, but that's not really what I was wondering. My question is this: SP-GiST partitions the space into non-overlapping sections. How can you store polygons - which can overlap - in an SP-GiST index? And how does the compress method help with that? I believe that if we find a way to index boxes, then we will need a compress method to build an index over polygons. BTW, we are investigating an index structure for boxes where a 2D box is treated as a 4D point. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
I need advice on what would be better: - to add a new flag like F_HAS_GARBAGE, - or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum. According to the commit message: commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079 Author: Heikki Linnakangas Date: Fri Nov 7 15:03:46 2014 +0200 .. The code that generated a record to clear the F_TUPLES_DELETED flag hasn't existed since we got rid of old-style VACUUM FULL. I kept the code that sets the flag, although it's not used for anything anymore, because it might still be interesting information for debugging purposes that some tuples have been deleted from a page. .. If Heikki doesn't change his opinion, then introduce a new flag, although I don't think that we need to keep F_TUPLES_DELETED. Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Implement failover on libpq connect level.
+1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces can be optionally specified this way. This doesn't preclude the use of ipfailover, in fact it might work well together. If you don't like it, don't use it. +1 Another thought: multiple hosts in a URI could be used in a simple configuration for read-only clients. I have seen customers who manage two connections per process - one to the master and one to one of several slaves. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
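[Editorial illustration] Roughly what a multi-host connection string of the kind being discussed could look like. The exact syntax here is an assumption based on the JDBC-style proposal and was not a committed libpq feature at the time of this thread:

    postgresql://db1.example.com:5432,db2.example.com:5432/mydb?connect_timeout=2

    # equivalent keyword/value form
    host=db1.example.com,db2.example.com port=5432,5432 dbname=mydb connect_timeout=2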
Re: [HACKERS] WIP: Access method extendability
In general the pattern of generic WAL usage is the following. 1) Start using generic WAL: specify the relation Hmm, what about extensions which want to use WAL but whose WAL records aren't connected to any relation? For example, a transaction manager or some kind of FDW. GenericXLogStart(index); 2) Register buffers GenericXLogRegister(0, buffer1, false); GenericXLogRegister(1, buffer2, true); the first argument is a slot number, the second is the buffer, the third is a flag indicating a new buffer Why do we need a slot number? To replace an already registered buffer? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
Some notes: 1 gistvacuumpage(): OffsetNumber deletable[MaxOffsetNumber]; It seems MaxOffsetNumber is too much; MaxIndexTuplesPerPage is enough. 2 The loop in gistkillitems() searching for the heap pointer. It seems to me that it could be a performance problem. To fix it, we need to remember the index tuple's offset number somewhere near GISTScanOpaqueData->killedItems. Then gistkillitems() would loop over the offsets and compare the heap pointers from killedItems and the index tuple, and if they don't match, just skip that index tuple. 3 Connected with the previous point, could you show some performance tests? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Some notes: 1) create-am.3.patch.gz As I understand it, you didn't add a schema name to access methods. Why? I suppose that if we implement an SQL-like interface for AM creation/dropping then we should provide schema qualification for them. 2) create-am.3.patch.gz get_object_address_am() + switch (list_length(objname)) ... + case 2: + catalogname = strVal(linitial(objname)); + amname = strVal(lthird(objname)); ^^ it seems it should be lsecond(). 3) create-am.3.patch.gz I suppose RM_GENERIC_ID should be part of generic-xlog.3.patch.gz. 4) Makefile(s) As far as I can see, object files are kept lexicographically ordered. 5) gindesc.c -> genericdesc.c in the file header. 6) generic-xlog.3.patch.gz I don't like the idea of splitting START_CRIT_SECTION and END_CRIT_SECTION between GenericXLogStart and GenericXLogFinish. User code could throw an error or allocate memory between them, and the error would become a panic. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
Petr Jelinek wrote: On 2015-09-07 17:41, Teodor Sigaev wrote: Some notices: 1) create-am.3.patch.gz As I understand, you didn't add schema name to access method. Why? Suppose, if we implement SQL-like interface for AM screation/dropping then we should provide a schema qualification for them Why would access methods need to be schema qualified? Opfamily and opclass could be schema-qualified. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
Something has gone wrong... gist.c:1465:5: warning: unused variable 'minoff' [-Wunused-variable] minoff, gistget.c:37:1: warning: unused function 'gistkillitems' [-Wunused-function] gistkillitems(IndexScanDesc scan) Without microvacuum. All 'select' queries are executed in about the same time Time: 360,468 ms Time: 243,197 ms Time: 238,310 ms Time: 238,018 ms With microvacuum. First 'select' invokes gistkillitems(). It's executed a bit slower than before. But following queries are executed significantly faster than without microvacuum. Time: 368,780 ms Time: 69,769 ms Time: 9,545 ms Time: 12,427 ms That's perfect, but I can't reproduce it - I suppose because of the "unused function 'gistkillitems'" warning. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Microvacuum for gist.
BTW, I slightly modified your test to provide more stable results. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ drop table test; CREATE TABLE test (box box); alter table test set (autovacuum_enabled=off); insert into test (select box(point(x, x),point(x, x)) from generate_series(1,500) as x); vacuum test; CREATE INDEX test_idx ON test USING gist (box); SET enable_seqscan TO false; SET enable_bitmapscan TO false; SET enable_indexscan TO true; SET enable_indexonlyscan TO true; select count(box) from test where box && box(point(0,0), point(100,100)); delete from test where box && box(point(0,0), point(100,100)); -- This query invokes gistkillitems() select count(box) from test where box && box(point(0,0), point(100,100)); -- These queries should work faster with microvacuum select count(box) from test where box && box(point(0,0), point(100,100)); select count(box) from test where box && box(point(0,0), point(100,100)); select count(box) from test where box && box(point(0,0), point(100,100)); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers