Re: [HACKERS] postgresql latency bgwriter not doing its job
Hello Heikki,

> For the kicks, I wrote a quick & dirty patch for interleaving the fsyncs,
> see attached. It works by repeatedly scanning the buffer pool, writing
> buffers belonging to a single relation segment at a time.

I tried this patch on the same host I used, with the same -R 25 -L 200 -T 5000, alas without significant positive effect: about 6% of transactions were lost, including stretches of several seconds of unresponsiveness. Maybe this is because pgbench -N basically touches only 2 tables (accounts and history), so there are few file syncs involved, thus it does not help much with spreading a lot of writes.

  2014-08-30 23:52:24.167 CEST: LOG: checkpoint starting: xlog
  2014-08-30 23:54:09.239 CEST: LOG: checkpoint sync: number=1 file=base/16384/11902 time=24.529 msec
  2014-08-30 23:54:23.812 CEST: LOG: checkpoint sync: number=1 file=base/16384/16397_vm time=32.547 msec
  2014-08-30 23:54:24.771 CEST: LOG: checkpoint sync: number=1 file=base/16384/11873 time=557.470 msec
  2014-08-30 23:54:37.419 CEST: LOG: checkpoint complete: wrote 4931 buffers (30.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=122.946 s, sync=10.129 s, total=133.252 s; sync files=6, longest=9.854 s, average=1.790 s

Note that given the small load and table size, pgbench implies random I/Os and basically nothing else.

--
Fabien.
Re: [HACKERS] improving speed of make check-world
Hello Peter,

Here is a review. Version 2 of the patch applies cleanly on current head.

The ability to generate and reuse a temporary installation for different tests looks quite useful, thus moving the install out of pg_regress and into make seems reasonable. However I'm wondering whether there could be some use case of pg_regress where doing the install might still be useful, say for manually doing things on the command line, in which case keeping the feature, even if not used by default, could be nice?

About the changes: I think that more comments would be welcome, e.g. for with_temp_install. There is not a single documentation change. Should there be some? Hmmm... I have not found much documentation about pg_regress...

I have tested make check, check-world, as well as make check in contrib sub-directories. It seems to work fine in sequential mode. Running make -j2 check-world does not work because initdb is not found by pg_regress, but make -j1 check-world does work fine. It seems that some dependencies might be missing and there is a race condition between the temporary install and running some checks?? Maybe it is not expected to work anyway? See below for suggestions to make it work. However in this case the error message produced by pg_regress contains an error:

  pg_regress: initdb failed
  Examine /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/log/initdb.log for the reason.
  Command was: initdb -D /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/./tmp_check/data --noclean --nosync > /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/./tmp_check/log/initdb.log 2>&1
  make[2]: *** [check] Error 2

The error message points to a non-existent log file (./tmp_check is missing); the temp_instance variable should be used in the error message as well as in the commands. Maybe other error messages have the same issue.

I do not much like the MAKELEVEL hack in a phony target. When running in parallel, several makes may have the same level anyway, not sure what would happen... Maybe it is the origin of the race condition which makes initdb not found above. I would suggest to rely on dependencies instead; maybe something like the following could work to ensure that an installation is done once and only once per make invocation, whatever the underlying parallelism level, as well as offering the possibility to reuse a previous installation.

  # must be defined somewhere common to all makefiles
  ifndef MAKE_NONCE
  # define a nanosecond timestamp
  export MAKE_NONCE := $(shell date +%s.%N)
  endif

  # actual new tmp installation
  .tmp_install:
  	$(RM) ./.tmp_install.*
  	$(RM) -r ./tmp_install
  	# create tmp installation...
  	touch $@

  # tmp installation for the nonce
  .tmp_install.$(MAKE_NONCE): .tmp_install
  	touch $@

  # generate a new tmp installation by default
  # make USE_TMP_INSTALL=1 ... reuses a previous installation if available
  ifdef USE_TMP_INSTALL
  TMP_INSTALL = .tmp_install
  else
  TMP_INSTALL = .tmp_install.$(MAKE_NONCE)
  endif # USE_TMP_INSTALL

  .PHONY: main-temp-install
  main-temp-install: $(TMP_INSTALL)

  .PHONY: extra-temp-install
  extra-temp-install: main-temp-install
  	# do EXTRA_INSTALL

> MSVC build is not yet adjusted for this. Looking at vcregress.pl, this
> should be fairly straightforward.

No idea about that point.

--
Fabien.
Re: [HACKERS] pg_filedump for 9.4?
Re: Fabrízio de Royes Mello 2014-06-25 <CAFcNs+oAb8h-0w2vLEWj6R-Gv=xizgdBya3K=SCd_9Tjyo=z...@mail.gmail.com>

> On Wed, Jun 25, 2014 at 3:52 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Would like that, but I'm not sure what pgindent will do with the //
> > comments. It's been on my to-do list to switch all the comments to C89
> > style and then pgindent it, but I don't see myself getting to that in
> > this decade :-(
>
> I changed all // comments to /* */ and ran pgindent.

I've pushed these patches to the git repository, thanks.

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
> The background workers can apparently only connect to a single database
> at a time, but I want to expose all the databases via the API.

I think the term "background worker" should be taken as a hint that "foreground protocol endpoint" was not one of the envisioned use cases. I think this sort of thing should be done as a separate process in front of the postmaster.
Re: [HACKERS] COPY and heap_sync
On 8/30/14 2:26 AM, Jeff Janes wrote:
> But there are cases where people use COPY in a loop with a small amount
> of data in each statement.

What would be the reason for doing that?
Re: [HACKERS] pg_receivexlog and replication slots
On Tue, Aug 26, 2014 at 9:43 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Tue, Aug 19, 2014 at 2:49 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > On Mon, Aug 18, 2014 at 4:01 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > > On Mon, Aug 18, 2014 at 3:48 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> > > > On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > > And now looking at your patch there is additional refactoring possible
> > > with IDENTIFY_SYSTEM and pg_basebackup as well...
> > And attached is a rebased patch doing so.
> I reworked this patch a bit to take into account the fact that the number
> of columns to check after running IDENTIFY_SYSTEM is not always 4, as
> pointed out by Andres:
> - pg_basebackup = 3
> - pg_receivexlog = 3
> - pg_recvlogical = 4

As this is a number of patches rolled into one - do you happen to keep them separate in your local repo? If so, can you send them as separate ones (refactoring identify_system should be quite unrelated to supporting replication slots, right?), for easier review? (If not, I'll just split them apart mentally, but it's easier to review separately.)

On the identify_system part - my understanding of the code is that what you pass in as num_cols is the number of columns required for it to work, right? We probably need to adjust the error message as well in that case, because it's no longer what's expected, it's what's required? And we might want to include a hint about the reason (wrong version)?

There's also a note "get LSN start position if necessary", but it tries to do it unconditionally. What is the "if necessary" supposed to refer to?

Actually - why do we even care about the 3 vs 4 in RunIdentifySystem, as it never actually looks at the 4th column anyway? If we do specifically want it to fail in the case of pg_recvlogical, we really need to think up a better error message for it, and perhaps a different way of specifying it?

Do we really want those Asserts? There is not a single Assert in bin/pg_basebackup today - as is the case for most things in bin/. We typically use regular if statements for things that can happen, and just ignore the others I think - since the callers are fairly simple to trace.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_filedump for 9.4?
On Sunday, 31 August 2014, Christoph Berg <c...@df7cb.de> wrote:
> Re: Fabrízio de Royes Mello 2014-06-25 <CAFcNs+oAb8h-0w2vLEWj6R-Gv=xizgdBya3K=SCd_9Tjyo=z...@mail.gmail.com>
> > On Wed, Jun 25, 2014 at 3:52 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > Would like that, but I'm not sure what pgindent will do with the //
> > > comments. It's been on my to-do list to switch all the comments to
> > > C89 style and then pgindent it, but I don't see myself getting to
> > > that in this decade :-(
> > I changed all // comments to /* */ and ran pgindent.
> I've pushed these patches to the git repository, thanks.

Thanks!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On Sunday, 31 August 2014, Peter Eisentraut <pete...@gmx.net> wrote:
> On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
> > The background workers can apparently only connect to a single database
> > at a time, but I want to expose all the databases via the API.
>
> I think the term "background worker" should be taken as a hint that
> "foreground protocol endpoint" was not one of the envisioned use cases.
> I think this sort of thing should be done as a separate process in front
> of the postmaster.

+1

Do you know the PGRest project? Look at http://pgre.st

Regards,

Fabrízio
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
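[To make the "separate process in front of the postmaster" suggestion concrete, here is a minimal illustrative sketch, not code from this thread: an external HTTP front-end opening a libpq connection to whichever database the request names. Host, port, and the function name are assumptions for illustration.]

  #include <stdio.h>
  #include <libpq-fe.h>

  /*
   * Sketch: connect to the database named in an incoming HTTP request.
   * A real front-end would escape dbname and pool these connections
   * rather than opening one per request.
   */
  static PGconn *
  connect_for_request(const char *dbname)
  {
      char    conninfo[256];
      PGconn *conn;

      snprintf(conninfo, sizeof(conninfo),
               "host=localhost port=5432 dbname=%s", dbname);

      conn = PQconnectdb(conninfo);
      if (PQstatus(conn) != CONNECTION_OK)
      {
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
          PQfinish(conn);
          return NULL;
      }
      return conn;    /* caller runs the query, then PQfinish()es */
  }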
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
> >>>>> "Erik" == Erik Rijkers <e...@xs4all.nl> writes:
>
> They apply cleanly for me at 2bde297 whether with git apply or patch,
> except for the contrib one (which you don't need unless you want to run
> the contrib regression tests without applying the gsp-u patch).
>
> Erik> Ah, I had not realised that. Excluding that contrib-patch and
> Erik> only applying these three:
> Erik>   gsp1.patch
> Erik>   gsp2.patch
> Erik>   gsp-doc.patch
> Erik> does indeed work (applies, compiles).
>
> I put up a rebased contrib patch anyway (linked off the CF).
>
> Did the "unrecognized node type" error go away, or do we still need to
> look into that?

I have found that the "unrecognized node type" error is caused by:

  shared_preload_libraries = pg_stat_statements

in postgresql.conf (as my default compile script was doing). If I disable that line, the error goes away.

I don't know exactly what that means for the grouping sets patches, but I thought I'd mention it here. Otherwise I've not run into any problems with GROUPING SETS.

Erik Rijkers
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <e...@xs4all.nl> wrote:
> I have found that the "unrecognized node type" error is caused by:
>
>   shared_preload_libraries = pg_stat_statements
>
> in postgresql.conf (as my default compile script was doing). If I disable
> that line, the error goes away.

I think that's more of a library linking problem rather than a problem with the patch. I couldn't reproduce it, though.

Regards,

Atri
--
Regards,
Atri
l'apprenant
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <e...@xs4all.nl> wrote:
> > I have found that the "unrecognized node type" error is caused by:

It's a warning, not an error, right?

> > shared_preload_libraries = pg_stat_statements
> > in postgresql.conf (as my default compile script was doing). If I
> > disable that line, the error goes away.
>
> I think that's more of a library linking problem rather than a problem
> with the patch. I couldn't reproduce it, though.

I think it's vastly more likely that the patch simply didn't add the new expression types to pg_stat_statements.c:JumbleExpr().

Greetings,

Andres Freund
--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
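[For readers unfamiliar with pg_stat_statements: a hypothetical sketch of the kind of change Andres means, not the actual patch. Each new expression node needs a case inside JumbleExpr()'s switch over nodeTag(); the node name T_GroupingFunc and its field layout here are assumptions taken from the pending grouping sets work.]

  /* inside JumbleExpr()'s switch (nodeTag(node)) in pg_stat_statements.c */
  case T_GroupingFunc:
      {
          GroupingFunc *gf = (GroupingFunc *) node;

          /* mix the node's scalar fields and children into the query
           * fingerprint, as the other cases do */
          APP_JUMB(gf->agglevelsup);
          JumbleExpr(jstate, (Node *) gf->args);
      }
      break;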
Re: [HACKERS] improving speed of make check-world
> # actual new tmp installation
> .tmp_install:
> 	$(RM) ./.tmp_install.*
> 	$(RM) -r ./tmp_install
> 	# create tmp installation...
> 	touch $@
>
> # tmp installation for the nonce
> .tmp_install.$(MAKE_NONCE): .tmp_install
> 	touch $@

Oops, I got it wrong: the install would not be re-executed the second time. Maybe something more like:

  ifdef USE_ONE_INSTALL
  TMP_INSTALL = .tmp_install.once
  else
  TMP_INSTALL = .tmp_install.$(MAKE_NONCE)
  endif

  $(TMP_INSTALL):
  	$(RM) -r ./tmp_install .tmp_install.*
  	# do installation...
  	touch $@

So that the file target is different each time it is run. Hopefully.

--
Fabien.
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Sunday, August 31, 2014, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> > I think that's more of a library linking problem rather than a problem
> > with the patch. I couldn't reproduce it, though.
>
> I think it's vastly more likely that the patch simply didn't add the new
> expression types to pg_stat_statements.c:JumbleExpr().

Must have run the above diagnosis in a wrong manner then, I will check. Thanks for the heads up!

Regards,

Atri
--
Regards,
Atri
l'apprenant
Re: [HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
Bruce Momjian <br...@momjian.us> writes:
> I have developed the attached patch to warn about column reordering in
> this odd case. The patch mentions the reordering of "c":
>
>   NOTICE: merging column "a" with inherited definition
>   NOTICE: merging column "c" with inherited definition; column moved earlier to match inherited column location

This does not comport with our error message style guidelines. You could put the additional information as an errdetail sentence, perhaps.

			regards, tom lane
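[For reference, a minimal sketch of the errdetail style Tom is pointing to — this is not Bruce's patch, and the message wording is an assumption; ereport/errmsg/errdetail are the real API.]

  ereport(NOTICE,
          (errmsg("merging column \"%s\" with inherited definition",
                  attributeName),
           errdetail("The column was moved earlier to match the inherited column's position.")));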
Re: [HACKERS] Built-in binning functions
On 30 August 2014 18:24, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > 3. I am thinking about name - I don't think so varwidth_bucket is
> > correct -- in relation to name width_bucket ... what about range_bucket
> > or scope_bucket ?
>
> Neither of those seem like improvements from here. I agree with the
> objections to bin() as well. bucket() might be all right but it still
> seems a bit too generic. width_bucket(), which at least shows there's a
> relationship to the standard functions, still seems like the best of the
> suggestions so far.

I'd been mulling that over and agree with the objections to bin().

Suggest discretize() as a much more informative name. The other names will be overlooked by anybody that doesn't already know what to look for.

Thanks for the polymorphic patch, that sounds good. Sorry, not reviewed more deeply (still travelling).
[HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
Tom Lane-2 wrote:
> Bruce Momjian <bruce@...> writes:
> > I have developed the attached patch to warn about column reordering in
> > this odd case. The patch mentions the reordering of "c":
> >   NOTICE: merging column "a" with inherited definition
> >   NOTICE: merging column "c" with inherited definition; column moved earlier to match inherited column location
>
> This does not comport with our error message style guidelines. You could
> put the additional information as an errdetail sentence, perhaps.

Would it be proper to issue an additional top-level warning with the "column moved" notification? Thus there would be NOTICE, NOTICE, WARNING in the above example? Or, more generically, "columns reordered to match inherited column order" to avoid multiple warnings if more than one column is moved.

David J.
Re: [HACKERS] Selectivity estimation for inet operators
> * Isn't X >> Y equivalent to network_scan_first(X) <= Y AND
>   network_scan_last(X) >= Y? Or at least close enough for selectivity
>   estimation purposes? Pardon my ignorance - I'm not too familiar with
>   the inet datatype - but how about just calling scalarineqsel for both
>   bounds?

Actually, X >> Y is equivalent to

  network_scan_first(X) <= network_host(Y) AND
  network_scan_last(X) >= network_host(Y) AND
  network_masklen(X) < network_masklen(Y)

but we have statistics for neither network_scan_last(X) nor network_masklen(X). I tried to find a solution based on the implementation of the operators.

> * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
>   the MCV lists. With the max statistics target of 10000, a worst case
>   query on my laptop took about 15 seconds to plan. Maybe that's
>   acceptable, but you went through some trouble to make planning of MCV
>   vs histogram faster, by the log2 method to compare only some values,
>   so I wonder why you didn't do the same for the MCV vs MCV case?

It was like that in the previous versions. It was causing worse estimation, but I was trying to reduce both sides of the lists. It works slightly better when only the left hand side of the list is reduced. The attached version works like that.

> * A few typos: lenght - length.

Fixed. Thank you for looking at it.

[Attachment: patch to src/backend/utils/adt/network_selfuncs.c, replacing the stub selectivity functions with estimates based on null fraction, distinct value count, most common values, and histogram of the inet/cidr datatypes; the diff is truncated in the archive.]
Re: [HACKERS] Selectivity estimation for inet operators
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
> >   the MCV lists. With the max statistics target of 10000, a worst case
> >   query on my laptop took about 15 seconds to plan. Maybe that's
> >   acceptable, but you went through some trouble to make planning of MCV
> >   vs histogram faster, by the log2 method to compare only some values,
> >   so I wonder why you didn't do the same for the MCV vs MCV case?
>
> Actually, what I think needs to be asked is the opposite question: why is
> the other code ignoring some of the statistical data? If the user asked
> us to collect a lot of stats detail it seems reasonable that he's
> expecting us to use it to get more accurate estimates. It's for sure not
> obvious why these estimators should take shortcuts that are not being
> taken in the much-longer-established code for scalar comparison
> estimates.

It will still use more statistical data when statistics_target is higher. I was not sure that the user wants to spend O(n^2) planning time based on statistics_target. The attached version is without this optimization. Estimates are better without it, but planning takes more time.

> I'm not exactly convinced that the math adds up in this logic, either.
> The way in which it combines results from looking at the MCV lists and
> at the histograms seems pretty arbitrary.

I thought the product of the join would be

  (left_mcv + left_histogram) * (right_mcv + right_histogram) * selectivity

and tried to calculate it as

  (left_mcv * right_mcv * selectivity) +
  (right_mcv * left_histogram * selectivity) +
  (left_mcv * right_histogram * selectivity) +
  (left_histogram * right_histogram * selectivity)

where left_histogram is

  1.0 - left_nullfrac - left_mcv

I fixed the calculation for the MCV vs histogram part. The estimates of the inner join are very close to the actual rows with statistics_target = 1000. I think the calculation should be right.

[Attachment: updated patch to src/backend/utils/adt/network_selfuncs.c; the diff is truncated in the archive.]
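[To make the arithmetic above concrete, an illustrative C fragment — not from the patch; every variable name here is invented for the sketch. Each side splits into an MCV fraction and a histogram fraction (whatever is neither NULL nor in the MCV list), and the join selectivity sums over the four combinations:]

  static double
  join_selec_sketch(double nullfrac1, double mcvfrac1,
                    double nullfrac2, double mcvfrac2,
                    double s_mcv_mcv, double s_mcv_hist,
                    double s_hist_mcv, double s_hist_hist)
  {
      double hist1 = 1.0 - nullfrac1 - mcvfrac1;  /* left histogram fraction */
      double hist2 = 1.0 - nullfrac2 - mcvfrac2;  /* right histogram fraction */

      return mcvfrac1 * mcvfrac2 * s_mcv_mcv      /* MCV vs MCV */
           + mcvfrac2 * hist1 * s_hist_mcv        /* right MCV vs left histogram */
           + mcvfrac1 * hist2 * s_mcv_hist        /* left MCV vs right histogram */
           + hist1 * hist2 * s_hist_hist;         /* histogram vs histogram */
  }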
Re: [HACKERS] Built-in binning functions
Simon Riggs <si...@2ndquadrant.com> writes:
> Suggest discretize() as a much more informative name. The other names
> will be overlooked by anybody that doesn't already know what to look for.

I did not like that idea to begin with, but it's growing more attractive. In particular, I think it would be unique enough that we could safely mark the polymorphic function as variadic (a move that would cause ambiguity issues for pretty nearly any user-defined function of the same name).

OTOH, I'm not as convinced as you that this name would mean anything to anybody that doesn't already know what to look for. It still seems like rather arcane terminology.

			regards, tom lane
Re: [HACKERS] Selectivity estimation for inet operators
> What you did there is utterly unacceptable from a modularity standpoint;
> and considering that the values will be nowhere near right, the argument
> that it's better than returning a constant seems pretty weak. I think
> you should just take that out again.

I will try to come up with a better, data type specific implementation in a week.
Re: [HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
David G Johnston <david.g.johns...@gmail.com> writes:
> Would it be proper to issue an additional top-level warning with the
> "column moved" notification? Thus there would be NOTICE, NOTICE, WARNING
> in the above example? Or, more generically, "columns reordered to match
> inherited column order" to avoid multiple warnings if more than one
> column is moved.

That's a good point: if this message fires at all, it will probably fire more than once; do we want that? If we do it as you suggest here, we'll lose the information as to exactly which columns got relocated, which perhaps is bad, or maybe it doesn't matter. Also, I don't remember the exact code structure in that area, but it might be a bit painful to arrange that we get only one such warning even when inheriting from multiple parents.

If we do want the specific moved columns to be identified, I'd still go with errdetail on the NOTICE rather than two separate messages. I think calling it a WARNING is a bit extreme anyway.

			regards, tom lane
Re: [HACKERS] pg_filedump for 9.4?
Hi community,

I am currently investigating why a certain table with highly redundant and utterly verbose XML becomes worse storage-wise when the XML is made more compact. Since I am quite new to this, I believe it's the LZ compression of the text; but that's irrelevant now, I mention it only because tools like pg_filedump allow people like me to help themselves and gain a basic understanding of things.

While checking, I noticed that pg_filedump (current from git.postgresql.org, incl. the below mentioned commit) does not compile on Mac OS X. Afaik it will not compile as soon as port.h comes into play and USE_REPL_SNPRINTF is defined. Then printf and sprintf in the source of pg_filedump become pg_printf and so on. These replacements are part of postgres and can't be linked into the standalone pg_filedump; at least that is certainly not the intention.

Putting

  #undef sprintf
  #undef printf

after the includes in pg_filedump fixes the Mac compile and imho all builds where USE_REPL_SNPRINTF gets defined as a side effect of including postgres.h, effectively taking printf away. Not sure how to deal with this issue correctly, so this is just for your consideration, since the issue is a bit broader imho.

Regards,
Stepan

On 31.08.2014 at 17:25, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:
> [quoted earlier exchange about the // comment conversion, pgindent run,
> and push to the git repository, trimmed]
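[A sketch of the workaround Stepan describes — which replacement macros a given standalone tool actually needs to undo is an assumption here; port.h replaces several printf-family functions under USE_REPL_SNPRINTF.]

  #include "postgres.h"

  /*
   * Workaround sketch: on platforms where port.h defines
   * USE_REPL_SNPRINTF, printf/sprintf and friends are #defined to
   * pg_printf etc., which are not available when linking a standalone
   * tool.  Undo the macros after the includes.
   */
  #ifdef USE_REPL_SNPRINTF
  #undef printf
  #undef fprintf
  #undef sprintf
  #undef snprintf
  #undef vsnprintf
  #endif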
Re: [HACKERS] Built-in binning functions
On 01/09/14 06:00, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
> > Suggest discretize() as a much more informative name. The other names
> > will be overlooked by anybody that doesn't already know what to look for.
>
> I did not like that idea to begin with, but it's growing more attractive.
> [...]
> OTOH, I'm not as convinced as you that this name would mean anything to
> anybody that doesn't already know what to look for. It still seems like
> rather arcane terminology.

If I was new to PostgreSQL, I'd probably read this as disc*(), a function to do something with discs. I have done finite maths and statistics at university, plus I have been vaguely following this thread - but the meaning of discretize() is not immediately apparent to me (if I reread a previous email or two in this thread, then I'm sure it would be obvious!). I might guess that it is to make something discrete, but why would I want to do that? So if I came across this function in a year or two, I'd probably go right past it.

I have been programming for over 40 years, and I still sometimes find choosing appropriate names very difficult, so I know it is not easy to come up with meaningful names - especially ones that mean something relevant to other people!

Cheers,
Gavin
Re: [HACKERS] Built-in binning functions
On 30/08/14 19:24, Tom Lane wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
> > 1. I am thinking so reduction to only numeric types is not necessary -
> > although we can live without it - but there are lot of non numeric
> > categories: chars, date, ...
>
> I wasn't terribly happy about that either. I still think we should
> reduce this to a single polymorphic function, as in the attached.

I did try to write a generic function very similar to what you wrote, but discarded it for performance reasons. If we really want to support any data type, I am all for having the generic function, but I would still like to have one optimized for float8, because that seems to be the most used use-case (at least that one is the reason why I even wrote this). Here are some numbers.

First float8:

  CREATE TABLE wbtest AS
    SELECT (random() * 100)::float8 a FROM generate_series(1,100) ORDER BY 1;

  SELECT
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 90]::float8[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 91]::float8[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 92]::float8[])
  FROM wbtest;

Optimized float8: ~250ms
Tom's generic: ~400ms

Numeric:

  CREATE TABLE wbtestn AS
    SELECT (random() * 100)::numeric a FROM generate_series(1,100) ORDER BY 1;

  SELECT
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 90]::numeric[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 91]::numeric[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 92]::numeric[])
  FROM wbtestn;

Optimized numeric: ~600ms
My generic: ~780ms
Tom's generic: ~900ms

The difference between my generic and Tom's generic is that Tom's is slowed down by the deconstruct_array.

> > 2. Still I strongly afraid about used searching method - there is not
> > possible to check a validity of input. Did you check how much linear
> > searching is slower - you spoke, so you have a real data and real use
> > case?
>
> Well, if we check the input then we'll be doing O(N) comparisons instead
> of O(log N). That could be a serious cost penalty for large arrays and
> expensive comparison functions (such as strcoll()). I think it's
> probably sufficient to have a clear warning in the docs.

With a re-sort, the performance is worse than a bunch of CASE WHENs :(

> Also, a documentation quibble: in Petr's patch the documentation and
> comments refer to the thresholds as "right bounds", which I didn't care
> for and replaced with "upper bounds". However, looking at it again, I
> wonder if we would not be better off describing them as lower bounds.
> They are exclusive bounds if seen as upper bounds, and inclusive if seen
> as lower bounds, and on the whole I think the latter viewpoint might be
> less confusing. Thoughts?

"Upper bounds" is probably a better name than "right bounds", I agree with that. In any case it's the upper bound for a bucket (that's why there is one more bucket, into which everything bigger than the max threshold goes).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] On partitioning
Another thought about this general topic:

Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> ... Allowed actions on a RELKIND_PARTITION:
> * CREATE INDEX .. ON PARTITION n ON TABLE xyz
> ...
> Still To Be Designed
> * Are indexes/constraints inherited from the parent rel?

I think one of the key design decisions we have to make is whether partitions are all constrained to have exactly the same set of indexes. If we don't insist on that, it will greatly complicate planning compared to what we'll get if we do insist on it, because then the planner will need to generate a separate customized plan subtree for each partition. Aside from costing planning time, most likely that would forever prevent us from pushing some types of intelligence about partitioning into the executor.

Now, in the current model, it's up to the user what indexes to create on each partition, and sometimes one might feel that maintaining a particular index is unnecessary in some partitions. But the flip side of that is it's awfully easy to screw yourself by forgetting to add some index when you add a new partition. So I'm not real sure which approach is superior from a purely user-oriented perspective.

I'm not trying to push one or the other answer right now, just noting that this is a critical decision.

			regards, tom lane
Re: [HACKERS] Built-in binning functions
Petr Jelinek <p...@2ndquadrant.com> writes:
> I did try to write generic function very similar to what you wrote but
> discarded it because of the performance reasons. If we really want to
> support any data type I am all for having generic function but I still
> would like to have one optimized for float8 because that seems to be the
> most used use-case (at least that one is the reason why I even wrote
> this) for performance reasons.

Well, part of the reason why your v3 float8 function looks so fast is that it's cheating: it will not produce the right answers for comparisons involving NaNs. I'm not sure how good it would look once you'd added some isnan() tests to make the behavior actually equivalent to float8_cmp_internal().

However, assuming it still seems worthwhile once that's accounted for, I don't have a strong objection to having an additional code path for float8. There are two ways we could do that:

1. Add a runtime test in the polymorphic function, e.g.

	if (element_type == FLOAT8OID)
		result = width_bucket_float8(...);
	else if (typentry->typlen > 0)
		result = width_bucket_fixed(...);
	else
		result = width_bucket_variable(...);

2. Define a SQL function width_bucket(float8, float8[]) alongside the polymorphic one.

While #2 might be marginally faster at runtime, the main difference between these is what the parser would choose to do with ambiguous cases. I experimented a bit and it seemed that the parser would prefer the float8 function in any situation where the inputs were of numeric types, probably because float8 is the preferred type in the numeric category. I'm not real sure that would be a good thing: in particular it would cast integer and numeric inputs to float8, which risks precision loss, and there doesn't seem to be any way to get the parser to make the other choice if the inputs are of numeric-category types. As against that objection, it would make cross-type numeric cases easier to use; for example width_bucket(1, array[2.4]) would be accepted. If we just offer the anyelement/anyarray version then the parser would make you cast 1 to numeric before it would consider the function to match.

On balance the runtime-test approach looks like a better idea, in that it doesn't risk any unexpected semantic behaviors.

> The difference between my generic and Tom's generic is because Tom's is
> slowed down by the deconstruct_array.

Meh. It looked to me like your version would have O(N^2) performance problems from computing array offsets repeatedly, depending on exactly which array element it ended up on. It would avoid repeat calculations only if it always moved right.

			regards, tom lane
Re: [HACKERS] On partitioning
On 08/31/2014 10:03 PM, Tom Lane wrote:
> I think one of the key design decisions we have to make is whether
> partitions are all constrained to have exactly the same set of indexes.
> [...]
> But the flip side of that is it's awfully easy to screw yourself by
> forgetting to add some index when you add a new partition.

The forgetting part is easy to solve by inheriting all indexes from the parent (or template) partition unless explicitly told not to.

One other thing that has been bothering me about this proposal is the ability to take partitions offline for maintenance, or to load them offline and then switch them in. In the current scheme we do this using ALTER TABLE ... [NO] INHERIT ... If we also want to have this with the not-directly-accessible partitions, then perhaps it could be done by having a possibility to move a partition between two tables with exactly the same structure?

> So I'm not real sure which approach is superior from a purely
> user-oriented perspective.

What we currently have is a very flexible scheme which has a few drawbacks:

1) unnecessarily complex for simple cases
2) easy to shoot yourself in the foot by forgetting something
3) can be hard on the planner, especially with a huge number of partitions

An alternative way of solving these problems is adding some (meta-)constraints to the current way of doing things, and some more automation:

  CREATE TABLE FOR PARTITIONMASTER
  WITH (ALL_INDEXES_SAME=ON,
        SAME_STRUCTURE_ALWAYS=ON,
        SINGLE_INHERITANCE_ONLY=ON,
        NESTED_INHERITS=OFF,
        PARTITION_FUNCTION=default_range_partitioning(int)
  );

and then force these when adding inherited tables (in this case partition tables) either via CREATE TABLE or ALTER TABLE.

Best Regards

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] On partitioning
On Fri, Aug 29, 2014 at 12:35:50PM -0400, Tom Lane wrote:
> > Each partition is assigned an Expression that receives a tuple and
> > returns boolean. This expression returns true if a given tuple belongs
> > into it, false otherwise.
>
> -1, in fact minus a lot. One of the core problems of the current
> approach is that the system, particularly the planner, hasn't got a lot
> of insight into exactly what the partitioning scheme is in a partitioned
> table built on inheritance. If you allow the partitioning rule to be a
> black box then that doesn't get any better. I want to see a design
> wherein the system understands *exactly* what the partitioning behavior
> is. I'd start with supporting range-based partitioning explicitly, and
> maybe we could add other behaviors such as hashing later. In particular,
> there should never be any question at all that there is exactly one
> partition that a given row belongs to, not more, not less. You can't
> achieve that with a set of independent filter expressions; a meta-rule
> that says "exactly one of them should return true" is an untrustworthy
> band-aid.
>
> (This does not preclude us from mapping the tuple through the
> partitioning rule and finding that the corresponding partition doesn't
> currently exist. I think we could view the partitioning rule as a
> function from tuples to partition numbers, and then we look in pg_class
> to see if such a partition exists.)

There is one situation where you need to be more flexible, and that is if you ever want to support online repartitioning. To do that you have to distinguish between "I want to insert tuple X, which partition should it go into" and "I want to know which partitions I need to look in for partition_key=Y". For the latter you really need an expression per partition, or something equivalent. If performance is an issue, I suppose you could live with having just an old and a new partition scheme, so you couldn't have two live repartitionings happening simultaneously.

Now, if you want to close the door on online repartitioning forever, then that's fine. But being in the position of having to say "yes, our partitioning scheme sucks, but we would have to take the database down for a week to fix it" is no fun. Unless logical replication provides a way out, maybe??

Have a nice day,
--
Martijn van Oosterhout <klep...@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] Built-in binning functions
On 31 August 2014 20:44, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
> If I was new to PostgreSQL, I'd probably read this as disc*(), a function
> to do something with discs. [...] I might guess that it is to make
> something discrete, but why would I want to do that? So if I came across
> this function in a year or two, I'd probably go right past it.

It's important that we *don't* come up with a new name. This function does something useful, and the words people already use to describe that are "binning" and "discretization". By this I mean names already used by very widely used software. Search them and see, or suggest more widely used alternatives, please.
Re: [HACKERS] Built-in binning functions
On 31/08/14 22:33, Tom Lane wrote:
> Well, part of the reason why your v3 float8 function looks so fast is
> that it's cheating: it will not produce the right answers for
> comparisons involving NaNs. I'm not sure how good it would look once
> you'd added some isnan() tests to make the behavior actually equivalent
> to float8_cmp_internal().

True, it increases the time of the test to around 285ms; still almost a 30% speed difference.

> However, assuming it still seems worthwhile once that's accounted for,
> I don't have a strong objection to having an additional code path for
> float8. There are two ways we could do that:
>
> 1. Add a runtime test in the polymorphic function, e.g.
>
> 	if (element_type == FLOAT8OID)
> 		result = width_bucket_float8(...);
> 	else if (typentry->typlen > 0)
> 		result = width_bucket_fixed(...);
> 	else
> 		result = width_bucket_variable(...);

Yeah, I think I prefer this one too; will see how much performance it eats.

> > The difference between my generic and Tom's generic is because Tom's
> > is slowed down by the deconstruct_array.
>
> Meh. It looked to me like your version would have O(N^2) performance
> problems from computing array offsets repeatedly, depending on exactly
> which array element it ended up on. It would avoid repeat calculations
> only if it always moved right.

I actually think the worst case (when you always go left) for my version is O(N), since you only need to seek over half of the previous interval (it's doing binary search after all), and you do O(N) in deconstruct_array anyway.

It would be very different if we could cache the deconstructed array somehow (i.e. if this was an aggregate); then it would obviously make a lot of sense to use deconstruct_array, and in that case it would even make sense to re-sort, probably. But sadly we can't cache, afaik.

Also, I made more tests with various array sizes (3-1) and distributions, and my version was never slower.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Built-in binning functions
Simon Riggs wrote:
> width_bucket() seems to refer to an equal-width binning process. The
> function being discussed here is a generic mechanism, the boundaries of
> which could have been decided using equal-frequency or other mechanisms.
> Using the word "width" in those contexts could be confusing.

Given width_bucket() is already in use for the SQL Standard function, I agree it would be confusing to have the same name for different parameter profiles.

  literal_bucket(float8, float8[])

Since "bucket" is the 'verb' here (in this specific case meaning "look up the supplied value in the supplied bucket definition") and "width" is a modifier (the bucket specification describes an equal-width structure), I suggest literal_bucket(val, array[]) such that the bucket is still the verb but now the modifier describes a structure that is literally provided.

David J.
Re: [HACKERS] Built-in binning functions
Petr Jelinek <p...@2ndquadrant.com> writes:
> On 31/08/14 22:33, Tom Lane wrote:
> > Meh. It looked to me like your version would have O(N^2) performance
> > problems from computing array offsets repeatedly, depending on exactly
> > which array element it ended up on. It would avoid repeat calculations
> > only if it always moved right.
>
> I actually think that worst case (when you go always left) for my
> version is O(N) since you only need to seek for the half of previous
> interval (it's doing binary search after all) and you do O(N) in the
> deconstruct_array.

[ thinks about that for awhile... ] Hm, I think you are right. If there are N elements in the array then we will scan over N/2 of them to locate the midpoint element for the first comparison. After that, we will go either left or right, and in either case we will need to scan over N/4 elements to find the second-level midpoint. The third-level midpoint will be found after scanning N/8 elements, and so on. So the number of elements scanned over to compute their lengths is N/2 + N/4 + N/8 + ..., or asymptotically N, the same as for the deconstruct_array approach. deconstruct_array might have some small advantage from tighter inner loops, but this is probably more than blown by the need for a palloc and pfree. So I was wrong, and your way is better for navigating the array in the variable-width case, assuming that we're not concerned about spending some more code space to make this function a shade faster.

BTW, was there a reason for not noticing the case of exact match in the search loop, and falling out early? As it stands, the code will reliably choose the leftmost match if there are multiple equal items in the search array, but do we care about such cases?

			regards, tom lane
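[For readers following along, an illustrative sketch of the search shape under discussion — not the patch's actual code, and plain doubles stand in for the datatype-specific comparator. Thresholds are sorted ascending and treated as inclusive lower bounds; the early return is the shortcut Tom asks about.]

  static int
  bucket_index(double value, const double *thresholds, int nthresholds)
  {
      int lo = 0;
      int hi = nthresholds;   /* bucket 0 holds values below thresholds[0] */

      while (lo < hi)
      {
          int mid = (lo + hi) / 2;

          if (value == thresholds[mid])
              return mid + 1; /* exact match: fall out early */
          if (value > thresholds[mid])
              lo = mid + 1;
          else
              hi = mid;
      }
      return lo;              /* value falls into bucket "lo" */
  }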
Re: [HACKERS] Built-in binning functions
David G Johnston <david.g.johns...@gmail.com> writes:
> Since "bucket" is the 'verb' here (in this specific case meaning "look
> up the supplied value in the supplied bucket definition") and "width" is
> a modifier (the bucket specification describes an equal-width
> structure), I suggest literal_bucket(val, array[]) such that the bucket
> is still the verb but now the modifier describes a structure that is
> literally provided.

It's a very considerable stretch to see "bucket" as a verb here :-). Maybe that's why the SQL committee's choice of function name seems so unnatural (to me anyway).

I was wondering about bucket_index(), i.e. "get the index of the bucket this value falls into". Or get_bucket(), or get_bucket_index() if you like verbosity.

			regards, tom lane
Re: [HACKERS] Built-in binning functions
On Sun, Aug 31, 2014 at 7:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It's a very considerable stretch to see "bucket" as a verb here :-).
> Maybe that's why the SQL committee's choice of function name seems so
> unnatural (to me anyway).
>
> I was wondering about bucket_index(), i.e. "get the index of the bucket
> this value falls into". Or get_bucket(), or get_bucket_index() if you
> like verbosity.

I got stuck on the thought that a function name should ideally be/include a verb... Even if you read it as a noun (and thus the verb is an implied "get"), the naming logic still holds. I pondered a get_ version, though the argument for avoiding conflicts with user code decreases its appeal. The good part about SQL standard naming is that the typical programmer isn't likely to pick a conflicting name.

bucket_index is appealing by itself, though a conflict with user code is probable... As bad as I think width_bucket is for a name, the fact is that it currently exists, and consistency, even forced, has merit.

David J.
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On 08/31/2014 12:40 PM, Dobes Vandermeer wrote: 1. Connecting to multiple databases The background workers can apparently only connect to a single database at a time, but I want to expose all the databases via the API. bgworkers are assigned a database at launch time (if SPI is enabled), and this database may not change during the worker's lifetime, same as a normal backend. Sometimes frustrating, but that's how it is. (A minimal sketch of this one-shot binding follows this message.) I think I could use libpq to connect to PostgreSQL on localhost but this might have weird side-effects in terms of authentication, pid use, stuff like that. If you're going to do that, why use a bgworker at all? In general, what do you gain from trying to do this within the database server itself, rather than as an app in front of the DB? I could probably manage a pool of dynamic workers (as of 9.4), one per user/database combination or something along those lines. Even one per request? Is there some kind of IPC system in place to help shuttle the requests and responses between dynamic workers? Or do I need to come up with my own? The dynamic shmem code apparently has some queuing functionality. I haven't used it yet. It seems like PostgreSQL itself has a way to shuttle requests out to workers, is it possible to tap into that system instead? Basically some way to send the requests to a PostgreSQL backend from the background worker? It does? It's not the SPI; that executes work directly within the bgworker, making it behave like a normal backend for the purpose of query execution. Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL itself and do it in a more integrated/destructive manner? Or just write a front-end. The problem you'd have attempting to modify PostgreSQL itself for this is that connection dispatch occurs via the postmaster, which is a single-threaded process that already needs to do a bit of work to keep an eye on how things are running. You don't want it constantly busy processing and dispatching millions of tiny HTTP requests. It can't just hand a connection off to a back-end immediately after accepting it, either; it'd have to read the HTTP headers to determine which database to connect to, then launch a new backend for the connection, which is horribly inefficient for tiny short-lived connections. The postmaster has no concept of a pool of backends (unfortunately, IMO) to re-use. I imagine (it's not something I've investigated, really) that you'd want a connection acceptor process that watched the listening HTTP request socket. It'd hand connections off to dispatcher processes that read the message content to get the target DB and dispatch the request to a worker backend for the appropriate user/db combo, then collect the results and return them on the connection. Hopefully at this point you're thinking that sounds a lot like a connection pool... because it is. An awfully complicated one, probably, as you'd have to manage everything using shared memory segments and latches. In my view it's unwise to try to do this in the DB with PostgreSQL's architecture. Hack PgBouncer or PgPool to do what you want. Or write a server with Tomcat/Jetty using JAX-RS and PgJDBC and the built-in connection pool facilities - you won't *believe* how easy it is. 3. Parallelism The regular PostgreSQL server can run many queries in parallel Well, one PostgreSQL instance (postmaster) may have many backends, each of which may run queries in series but not in parallel. Any given process may only run one query at once. 
but it seems like if I am using SPI I could only run one query at a time - it's not an asynchronous API. Correct. Any help, sage advice, tips, and suggestions how to move forward in these areas would be muchly appreciated! Don't do it with bgworkers. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
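As a concrete illustration of the one-shot database binding described above, here is a hedged C sketch of a background worker's main function. Registration boilerplate (PG_MODULE_MAGIC, RegisterBackgroundWorker, latch handling) is omitted, and the function and database names are placeholders, not anything proposed in this thread:

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    void
    rest_worker_main(Datum main_arg)
    {
        BackgroundWorkerUnblockSignals();

        /*
         * The database is chosen here, exactly once, for the worker's
         * whole lifetime; there is no API to switch databases afterwards,
         * just like a regular backend.
         */
        BackgroundWorkerInitializeConnection("postgres", NULL);

        for (;;)
        {
            /* ... serve requests, only ever against the database above ... */
        }
    }

This constraint is why exposing every database from one API process requires either a pool of workers (at least one per database) or ordinary client connections, as discussed above.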
Re: [HACKERS] COPY and heap_sync
On Sun, Aug 31, 2014 at 10:10 AM, Peter Eisentraut pete...@gmx.net wrote: On 8/30/14 2:26 AM, Jeff Janes wrote: But there are cases where people use COPY in a loop with a small amount of data in each statement. What would be the reason for doing that? I have done that same thing many times. At a company where I was employed we developed scripts to migrate data from one database to another. In the first version we used INSERT statements, and it was very, very slow. Then we wrote a second version replacing the INSERTs with COPY statements. The performance was much better, but we believed it could be better still, so in the third version we created a kind of cache (using arrays) to accumulate records in memory; after every N rows we built a COPY statement from the cache contents and ran it. This was a really good performance improvement. My use case is exactly why a feature to postpone the heap_sync in COPY statements would help. I don't know if it's a feature that a lot of people want, but IMHO it would be nice for improving bulk-load operations. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
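For readers who want to try the accumulate-then-COPY approach described above, here is a hedged libpq sketch using the COPY streaming calls. The table name, buffer handling, and error reporting are illustrative assumptions, not code from this thread:

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Send one accumulated batch of rows (already formatted in COPY text
     * form, newline-terminated) as a single COPY ... FROM STDIN.
     * Returns 0 on success, -1 on failure.
     */
    static int
    flush_batch(PGconn *conn, const char *buf, int buflen)
    {
        PGresult *res = PQexec(conn, "COPY target_table FROM STDIN");

        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        if (PQputCopyData(conn, buf, buflen) != 1 ||
            PQputCopyEnd(conn, NULL) != 1)
            return -1;

        res = PQgetResult(conn);        /* the COPY's final status */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);               /* drain remaining results */
        return 0;
    }

The caller appends rows to buf and calls flush_batch() every N rows, which is essentially the "third version" described above; when COPY takes the heap_sync path, each small COPY pays that sync separately, which is the cost the postponement feature under discussion would address.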
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On 08/29/2014 04:59 AM, Kevin Grittner wrote: I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons); it requires the time zone, in the form of an hour:minute offset, to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. FWIW, MS SQL's DateTimeOffset data type: http://msdn.microsoft.com/en-AU/library/bb630289.aspx is much more like what I, when I was getting started, expected TIMESTAMP WITH TIME ZONE to be. We don't really have anything equivalent in PostgreSQL. That's also what I expected: a timestamptz = timestamp + offset, just like the current implementation of TIME WITH TIME ZONE:

typedef struct
{
    TimeADT time;  /* all time units other than months and years */
    int32   zone;  /* numeric time zone, in seconds */
} TimeTzADT;

Also, it's inconvenient for clients (JDBC, Npgsql, ...) to understand a full 'timezone' name (such as 'America/New_York') coming from PostgreSQL and transform it into their own data type (such as DateTimeOffset in .NET). But an *offset* is easy to parse and process. Best Regards, rohto -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
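By analogy with the TimeTzADT struct quoted above, a spec-conforming TIMESTAMP WITH TIME ZONE could presumably be represented along these lines. This struct is purely hypothetical (the name is invented; no such type exists in PostgreSQL):

    #include "postgres.h"
    #include "datatype/timestamp.h"     /* TimestampTz */

    /* Hypothetical: an instant in UTC plus the offset it was entered with. */
    typedef struct
    {
        TimestampTz time;   /* the instant, normalized to UTC */
        int32       zone;   /* originating offset from UTC, in seconds */
    } TimestampTzOffsetADT;

Comparisons could operate on the UTC field alone, while output and client-side conversion (the DateTimeOffset case mentioned above) would use the stored offset.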
Re: [HACKERS] RLS Design
Adam, all, * Brightwell, Adam (adam.brightw...@crunchydatasolutions.com) wrote: Attached is a patch for RLS that was created against master at 01363beae52700c7425cb2d2452177133dad3e93 and is ready for review. Many thanks for posting this. As others may realize already, I've reviewed and modified this patch already, working with Adam to get it ready. I'm continuing to review and test it, but in general I'm quite happy with how it's shaping up - additional review, testing, and comments are always appreciated though. Alter an RLS policy named name on table. Specifying a command is optional; if provided, the policy's command is changed, otherwise it is left as-is. Specifying a role is optional; if provided, the policy's role is changed, otherwise it is left as-is. The condition must always be provided and is therefore always replaced. I'm pretty sure the condition is also optional in this patch (that was a late change that I made), but the documentation needs to be updated. * Plancache Invalidation: If a relation has a row-security policy and row-security is enabled, then invalidation will occur when either the row_security GUC is changed OR when the current user changes. This invalidation ONLY takes place for cached plans where the target relation has a row security policy. I know there was a lot of discussion about this previously, but I'm fine with the initial version simply invalidating plans which involve RLS-enabled relations and role changes. This patch doesn't create any regressions for individuals who are not using RLS. We can certainly look into improving this in the future to have per-role plan caches, but that's a fair bit of additional non-trivial code that can be added independently. * Security Qual Expression: All row-security policies are OR'ed together. This was also a point of much discussion, but I continue to feel this is the right approach for the initial version. We can add flexibility here later, if necessary, but OR'ing these together is in line with how role membership works today (you have the rights of all roles you are a member of, pursuant to inherit/noinherit status, of course). * row_security GUC - enable/disable row level security. Note that, as discussed, pg_dump will set row_security off, unless specifically asked to enable it. row_security will also be set to off when the user logging in is a superuser or does a 'set role' to a superuser. Currently, if a user logging in is *not* a superuser, or a 'set role' is done to a non-superuser, row_security gets re-set to enabled. This is one aspect of the patch that I think we should change (which is a matter of removing just a few lines of code and then updating the regression tests to do 'set row_security = on;' before running), because if you log in as a superuser and then 'set role' to a non-superuser, it strikes me now (it didn't really when I wrote this originally) as a bit surprising that row_security gets set to 'on' when doing a 'set role'. One thing that I really like about this approach is that a superuser can explicitly set 'row_security' to on and be able to see what happens. Clearly, in an environment of untrusted users, that could be dangerous, but it can also be an extremely useful way of testing things, particularly in development environments where everyone is a superuser. This deserves a bit more documentation also. * BYPASSRLS and NOBYPASSRLS role attribute - allows a user to bypass RLS if the row_security GUC is set to OFF. 
If a user sets row_security to OFF and does not have this attribute, then an error is raised when attempting to query a relation with an RLS policy. (Note that the superuser is always considered to have the bypassrls attribute.) * psql \d table support: psql describe support for listing policy information per table. This works pretty well for me, but we may want to add some indication that RLS is on a table in the \dp listing. Thanks! Stephen