Re: [HACKERS] postgresql latency bgwriter not doing its job
Hello Heikki,

> For the kicks, I wrote a quick & dirty patch for interleaving the fsyncs,
> see attached. It works by repeatedly scanning the buffer pool, writing
> buffers belonging to a single relation segment at a time.

I tried this patch on the same host I used, with the same -R 25 -L 200 -T 5000, alas without significant positive effect: about 6% of transactions were lost, including stretches of several seconds of unresponsiveness. Maybe this is because pgbench -N basically touches only 2 tables (accounts and history), so there are few file syncs involved, thus it does not help much with spreading a lot of writes.

  2014-08-30 23:52:24.167 CEST: LOG: checkpoint starting: xlog
  2014-08-30 23:54:09.239 CEST: LOG: checkpoint sync: number=1 file=base/16384/11902 time=24.529 msec
  2014-08-30 23:54:23.812 CEST: LOG: checkpoint sync: number=1 file=base/16384/16397_vm time=32.547 msec
  2014-08-30 23:54:24.771 CEST: LOG: checkpoint sync: number=1 file=base/16384/11873 time=557.470 msec
  2014-08-30 23:54:37.419 CEST: LOG: checkpoint complete: wrote 4931 buffers (30.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=122.946 s, sync=10.129 s, total=133.252 s; sync files=6, longest=9.854 s, average=1.790 s

Note that given the small load and table size, pgbench implies random I/Os and basically nothing else.

--
Fabien.
Re: [HACKERS] improving speed of make check-world
Hello Peter,

Here is a review. Version 2 of the patch applies cleanly on current head.

The ability to generate and reuse a temporary installation for different tests looks quite useful, thus moving the install out of pg_regress and into make seems reasonable. However I'm wondering whether there could be some use case of pg_regress where doing the install might still be useful, say for manually doing things on the command line, in which case keeping the feature, even if not used by default, could be nice?

About the changes: I think that more comments would be welcome, e.g. for with_temp_install. There is not a single documentation change. Should there be some? Hmmm... I have not found much documentation about pg_regress...

I have tested make check, check-world, as well as make check in contrib sub-directories. It seems to work fine in sequential mode. Running make -j2 check-world does not work because initdb is not found by pg_regress, but make -j1 check-world does work fine. It seems that some dependencies might be missing and there is a race condition between the temporary install and running some checks?? Maybe it is not expected to work anyway? See below for suggestions to make it work. However in this case the error message produced by pg_regress contains an error:

  pg_regress: initdb failed
  Examine /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/log/initdb.log for the reason.
  Command was: initdb -D /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/./tmp_check/data --noclean --nosync > /home/fabien/DEV/GIT/postgresql/contrib/btree_gin/./tmp_check/log/initdb.log 2>&1
  make[2]: *** [check] Error 2

The error message points to a non-existent log file (./tmp_check is missing); the temp_instance variable should be used in the error message as well as in the commands. Maybe other error messages have the same issue.

I do not much like the MAKELEVEL hack in a phony target. When running in parallel, several makes may have the same level anyway, not sure what would happen... Maybe it is the origin of the race condition which makes initdb not found above. I would suggest to rely on dependencies instead; maybe something like the following could work to ensure that an installation is done once and only once per make invocation, whatever the underlying parallelism level, as well as offering the possibility to reuse a previous installation.

  # must be defined somewhere common to all makefiles
  ifndef MAKE_NONCE
  # define a nanosecond timestamp
  export MAKE_NONCE := $(shell date +%s.%N)
  endif

  # actual new tmp installation
  .tmp_install:
  	$(RM) ./.tmp_install.*
  	$(RM) -r ./tmp_install
  	# create tmp installation...
  	touch $@

  # tmp installation for the nonce
  .tmp_install.$(MAKE_NONCE): .tmp_install
  	touch $@

  # generate a new tmp installation by default
  # make USE_TMP_INSTALL=1 ... reuses a previous installation if available
  ifdef USE_TMP_INSTALL
  TMP_INSTALL = .tmp_install
  else
  TMP_INSTALL = .tmp_install.$(MAKE_NONCE)
  endif # USE_TMP_INSTALL

  .PHONY: main-temp-install
  main-temp-install: $(TMP_INSTALL)

  .PHONY: extra-temp-install
  extra-temp-install: main-temp-install
  	# do EXTRA_INSTALL

> MSVC build is not yet adjusted for this. Looking at vcregress.pl, this
> should be fairly straightforward.

No idea about that point.

--
Fabien.
Re: [HACKERS] pg_filedump for 9.4?
Re: Fabrízio de Royes Mello 2014-06-25 <CAFcNs+oAb8h-0w2vLEWj6R-Gv=xizgdBya3K=SCd_9Tjyo=z...@mail.gmail.com>

> On Wed, Jun 25, 2014 at 3:52 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Would like that, but I'm not sure what pgindent will do with the //
> > comments. It's been on my to-do list to switch all the comments to C89
> > style and then pgindent it, but I don't see myself getting to that in
> > this decade :-(
>
> I changed all // comments to /* */ and ran pgindent.

I've pushed these patches to the git repository, thanks.

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
> The background workers can apparently only connect to a single database
> at a time, but I want to expose all the databases via the API.

I think the term "background worker" should be taken as a hint that "foreground protocol endpoint" was not one of the envisioned use cases. I think this sort of thing should be done as a separate process in front of the postmaster.
Re: [HACKERS] COPY and heap_sync
On 8/30/14 2:26 AM, Jeff Janes wrote:
> But there are cases where people use COPY in a loop with a small amount
> of data in each statement.

What would be the reason for doing that?
Re: [HACKERS] pg_receivexlog and replication slots
On Tue, Aug 26, 2014 at 9:43 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Tue, Aug 19, 2014 at 2:49 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > On Mon, Aug 18, 2014 at 4:01 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > > On Mon, Aug 18, 2014 at 3:48 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> > > > On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > > And now looking at your patch there is additional refactoring possible
> > > with IDENTIFY_SYSTEM and pg_basebackup as well...
> > And attached is a rebased patch doing so.
> I reworked this patch a bit to take into account the fact that the number
> of columns to check after running IDENTIFY_SYSTEM is not always 4, as
> pointed out by Andres:
> - pg_basebackup = 3
> - pg_receivexlog = 3
> - pg_recvlogical = 4

As this is a number of patches rolled into one - do you happen to keep them separate in your local repo? If so, can you send them as separate ones (refactoring identify_system should be quite unrelated to supporting replication slots, right?), for easier review? (If not, I'll just split them apart mentally, but it's easier to review separately.)

On the identify_system part - my understanding of the code is that what you pass in as num_cols is the number of columns required for it to work, right? We probably need to adjust the error message as well in that case, because it's no longer what's expected, it's what's required? And we might want to include a hint about the reason (wrong version)?

There's also a note "get LSN start position if necessary", but it tries to do it unconditionally. What is the "if necessary" supposed to refer to?

Actually - why do we even care about the 3 vs 4 in RunIdentifySystem, as it never actually looks at the 4th column anyway? If we do specifically want it to fail in the case of pg_recvlogical, we really need to think up a better error message for it, and perhaps a different way of specifying it?

Do we really want those Asserts? There is not a single Assert in bin/pg_basebackup today - as is the case for most things in bin/. We typically use regular if statements for things that can happen, and just ignore the others I think - since the callers are fairly simple to trace.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_filedump for 9.4?
On Sunday, 31 August 2014, Christoph Berg <c...@df7cb.de> wrote:
> Re: Fabrízio de Royes Mello 2014-06-25 <CAFcNs+oAb8h-0w2vLEWj6R-Gv=xizgdBya3K=SCd_9Tjyo=z...@mail.gmail.com>
> > On Wed, Jun 25, 2014 at 3:52 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > Would like that, but I'm not sure what pgindent will do with the //
> > > comments. It's been on my to-do list to switch all the comments to
> > > C89 style and then pgindent it, but I don't see myself getting to
> > > that in this decade :-(
> > I changed all // comments to /* */ and ran pgindent.
> I've pushed these patches to the git repository, thanks.

Thanks!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On Sunday, 31 August 2014, Peter Eisentraut <pete...@gmx.net> wrote:
> On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
> > The background workers can apparently only connect to a single database
> > at a time, but I want to expose all the databases via the API.
>
> I think the term "background worker" should be taken as a hint that
> "foreground protocol endpoint" was not one of the envisioned use cases.
> I think this sort of thing should be done as a separate process in front
> of the postmaster.

+1

Do you know the PGRest project? Look at http://pgre.st

Regards,

Fabrízio
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
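[To make the "separate process in front of the postmaster" suggestion concrete, here is a minimal illustrative sketch, not code from this thread: an external HTTP front-end opening a libpq connection to whichever database the request names. Host, port, and the function name are assumptions for illustration.]

  #include <stdio.h>
  #include <libpq-fe.h>

  /*
   * Sketch: connect to the database named in an incoming HTTP request.
   * A real front-end would escape dbname and pool these connections
   * rather than opening one per request.
   */
  static PGconn *
  connect_for_request(const char *dbname)
  {
      char    conninfo[256];
      PGconn *conn;

      snprintf(conninfo, sizeof(conninfo),
               "host=localhost port=5432 dbname=%s", dbname);

      conn = PQconnectdb(conninfo);
      if (PQstatus(conn) != CONNECTION_OK)
      {
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
          PQfinish(conn);
          return NULL;
      }
      return conn;    /* caller runs the query, then PQfinish()es */
  }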
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
> >>>>> "Erik" == Erik Rijkers <e...@xs4all.nl> writes:
>
> They apply cleanly for me at 2bde297 whether with git apply or patch,
> except for the contrib one (which you don't need unless you want to run
> the contrib regression tests without applying the gsp-u patch).
>
> Erik> Ah, I had not realised that. Excluding that contrib-patch and
> Erik> only applying these three:
> Erik>   gsp1.patch
> Erik>   gsp2.patch
> Erik>   gsp-doc.patch
> Erik> does indeed work (applies, compiles).
>
> I put up a rebased contrib patch anyway (linked off the CF).
>
> Did the "unrecognized node type" error go away, or do we still need to
> look into that?

I have found that the "unrecognized node type" error is caused by:

  shared_preload_libraries = pg_stat_statements

in postgresql.conf (as my default compile script was doing). If I disable that line, the error goes away.

I don't know exactly what that means for the grouping sets patches, but I thought I'd mention it here. Otherwise I've not run into any problems with GROUPING SETS.

Erik Rijkers
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <e...@xs4all.nl> wrote:
> I have found that the "unrecognized node type" error is caused by:
>
>   shared_preload_libraries = pg_stat_statements
>
> in postgresql.conf (as my default compile script was doing). If I disable
> that line, the error goes away.

I think that's more of a library linking problem rather than a problem with the patch. I couldn't reproduce it, though.

Regards,

Atri
--
Regards,
Atri
l'apprenant
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <e...@xs4all.nl> wrote:
> > I have found that the "unrecognized node type" error is caused by:

It's a warning, not an error, right?

> > shared_preload_libraries = pg_stat_statements
> > in postgresql.conf (as my default compile script was doing). If I
> > disable that line, the error goes away.
>
> I think that's more of a library linking problem rather than a problem
> with the patch. I couldn't reproduce it, though.

I think it's vastly more likely that the patch simply didn't add the new expression types to pg_stat_statements.c:JumbleExpr().

Greetings,

Andres Freund
--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
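[For readers unfamiliar with pg_stat_statements: a hypothetical sketch of the kind of change Andres means, not the actual patch. Each new expression node needs a case inside JumbleExpr()'s switch over nodeTag(); the node name T_GroupingFunc and its field layout here are assumptions taken from the pending grouping sets work.]

  /* inside JumbleExpr()'s switch (nodeTag(node)) in pg_stat_statements.c */
  case T_GroupingFunc:
      {
          GroupingFunc *gf = (GroupingFunc *) node;

          /* mix the node's scalar fields and children into the query
           * fingerprint, as the other cases do */
          APP_JUMB(gf->agglevelsup);
          JumbleExpr(jstate, (Node *) gf->args);
      }
      break;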
Re: [HACKERS] improving speed of make check-world
> # actual new tmp installation
> .tmp_install:
> 	$(RM) ./.tmp_install.*
> 	$(RM) -r ./tmp_install
> 	# create tmp installation...
> 	touch $@
>
> # tmp installation for the nonce
> .tmp_install.$(MAKE_NONCE): .tmp_install
> 	touch $@

Oops, I got it wrong: the install would not be re-executed the second time. Maybe something more like:

  ifdef USE_ONE_INSTALL
  TMP_INSTALL = .tmp_install.once
  else
  TMP_INSTALL = .tmp_install.$(MAKE_NONCE)
  endif

  $(TMP_INSTALL):
  	$(RM) -r ./tmp_install .tmp_install.*
  	# do installation...
  	touch $@

So that the file target is different each time it is run. Hopefully.

--
Fabien.
Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347
On Sunday, August 31, 2014, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> > I think that's more of a library linking problem rather than a problem
> > with the patch. I couldn't reproduce it, though.
>
> I think it's vastly more likely that the patch simply didn't add the new
> expression types to pg_stat_statements.c:JumbleExpr().

Must have run the above diagnosis in a wrong manner then, I will check. Thanks for the heads up!

Regards,

Atri
--
Regards,
Atri
l'apprenant
Re: [HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
Bruce Momjian <br...@momjian.us> writes:
> I have developed the attached patch to warn about column reordering in
> this odd case. The patch mentions the reordering of "c":
>
>   NOTICE: merging column "a" with inherited definition
>   NOTICE: merging column "c" with inherited definition; column moved earlier to match inherited column location

This does not comport with our error message style guidelines. You could put the additional information as an errdetail sentence, perhaps.

			regards, tom lane
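[For reference, a minimal sketch of the errdetail style Tom is pointing to — this is not Bruce's patch, and the message wording is an assumption; ereport/errmsg/errdetail are the real API.]

  ereport(NOTICE,
          (errmsg("merging column \"%s\" with inherited definition",
                  attributeName),
           errdetail("The column was moved earlier to match the inherited column's position.")));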
Re: [HACKERS] Built-in binning functions
On 30 August 2014 18:24, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > 3. I am thinking about name - I don't think so varwidth_bucket is
> > correct -- in relation to name width_bucket ... what about range_bucket
> > or scope_bucket ?
>
> Neither of those seem like improvements from here. I agree with the
> objections to bin() as well. bucket() might be all right but it still
> seems a bit too generic. width_bucket(), which at least shows there's a
> relationship to the standard functions, still seems like the best of the
> suggestions so far.

I'd been mulling that over and agree with the objections to bin().

Suggest discretize() as a much more informative name. The other names will be overlooked by anybody that doesn't already know what to look for.

Thanks for the polymorphic patch, that sounds good. Sorry, not reviewed more deeply (still travelling).
[HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
Tom Lane-2 wrote:
> Bruce Momjian <bruce@...> writes:
> > I have developed the attached patch to warn about column reordering in
> > this odd case. The patch mentions the reordering of "c":
> >   NOTICE: merging column "a" with inherited definition
> >   NOTICE: merging column "c" with inherited definition; column moved earlier to match inherited column location
>
> This does not comport with our error message style guidelines. You could
> put the additional information as an errdetail sentence, perhaps.

Would it be proper to issue an additional top-level warning with the "column moved" notification? Thus there would be NOTICE, NOTICE, WARNING in the above example? Or, more generically, "columns reordered to match inherited column order" to avoid multiple warnings if more than one column is moved.

David J.
Re: [HACKERS] Selectivity estimation for inet operators
> * Isn't X >> Y equivalent to network_scan_first(X) <= Y AND
>   network_scan_last(X) >= Y? Or at least close enough for selectivity
>   estimation purposes? Pardon my ignorance - I'm not too familiar with
>   the inet datatype - but how about just calling scalarineqsel for both
>   bounds?

Actually, X >> Y is equivalent to

  network_scan_first(X) <= network_host(Y) AND
  network_scan_last(X) >= network_host(Y) AND
  network_masklen(X) < network_masklen(Y)

but we have statistics for neither network_scan_last(X) nor network_masklen(X). I tried to find a solution based on the implementation of the operators.

> * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
>   the MCV lists. With the max statistics target of 10000, a worst case
>   query on my laptop took about 15 seconds to plan. Maybe that's
>   acceptable, but you went through some trouble to make planning of MCV
>   vs histogram faster, by the log2 method to compare only some values,
>   so I wonder why you didn't do the same for the MCV vs MCV case?

It was like that in the previous versions. It was causing worse estimation, but I was trying to reduce both sides of the lists. It works slightly better when only the left hand side of the list is reduced. The attached version works like that.

> * A few typos: lenght - length.

Fixed. Thank you for looking at it.

[Attachment: patch to src/backend/utils/adt/network_selfuncs.c, replacing the stub selectivity functions with estimates based on null fraction, distinct value count, most common values, and histogram of the inet/cidr datatypes; the diff is truncated in the archive.]
Re: [HACKERS] Selectivity estimation for inet operators
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
> >   the MCV lists. With the max statistics target of 10000, a worst case
> >   query on my laptop took about 15 seconds to plan. Maybe that's
> >   acceptable, but you went through some trouble to make planning of MCV
> >   vs histogram faster, by the log2 method to compare only some values,
> >   so I wonder why you didn't do the same for the MCV vs MCV case?
>
> Actually, what I think needs to be asked is the opposite question: why is
> the other code ignoring some of the statistical data? If the user asked
> us to collect a lot of stats detail it seems reasonable that he's
> expecting us to use it to get more accurate estimates. It's for sure not
> obvious why these estimators should take shortcuts that are not being
> taken in the much-longer-established code for scalar comparison
> estimates.

It will still use more statistical data when statistics_target is higher. I was not sure that the user wants to spend O(n^2) planning time based on statistics_target. The attached version is without this optimization. Estimates are better without it, but planning takes more time.

> I'm not exactly convinced that the math adds up in this logic, either.
> The way in which it combines results from looking at the MCV lists and
> at the histograms seems pretty arbitrary.

I thought the product of the join would be

  (left_mcv + left_histogram) * (right_mcv + right_histogram) * selectivity

and tried to calculate it as

  (left_mcv * right_mcv * selectivity) +
  (right_mcv * left_histogram * selectivity) +
  (left_mcv * right_histogram * selectivity) +
  (left_histogram * right_histogram * selectivity)

where left_histogram is

  1.0 - left_nullfrac - left_mcv

I fixed the calculation for the MCV vs histogram part. The estimates of the inner join are very close to the actual rows with statistics_target = 1000. I think the calculation should be right.

[Attachment: updated patch to src/backend/utils/adt/network_selfuncs.c; the diff is truncated in the archive.]
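[To make the arithmetic above concrete, an illustrative C fragment — not from the patch; every variable name here is invented for the sketch. Each side splits into an MCV fraction and a histogram fraction (whatever is neither NULL nor in the MCV list), and the join selectivity sums over the four combinations:]

  static double
  join_selec_sketch(double nullfrac1, double mcvfrac1,
                    double nullfrac2, double mcvfrac2,
                    double s_mcv_mcv, double s_mcv_hist,
                    double s_hist_mcv, double s_hist_hist)
  {
      double hist1 = 1.0 - nullfrac1 - mcvfrac1;  /* left histogram fraction */
      double hist2 = 1.0 - nullfrac2 - mcvfrac2;  /* right histogram fraction */

      return mcvfrac1 * mcvfrac2 * s_mcv_mcv      /* MCV vs MCV */
           + mcvfrac2 * hist1 * s_hist_mcv        /* right MCV vs left histogram */
           + mcvfrac1 * hist2 * s_mcv_hist        /* left MCV vs right histogram */
           + hist1 * hist2 * s_hist_hist;         /* histogram vs histogram */
  }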
Re: [HACKERS] Built-in binning functions
Simon Riggs <si...@2ndquadrant.com> writes:
> Suggest discretize() as a much more informative name. The other names
> will be overlooked by anybody that doesn't already know what to look for.

I did not like that idea to begin with, but it's growing more attractive. In particular, I think it would be unique enough that we could safely mark the polymorphic function as variadic (a move that would cause ambiguity issues for pretty nearly any user-defined function of the same name).

OTOH, I'm not as convinced as you that this name would mean anything to anybody that doesn't already know what to look for. It still seems like rather arcane terminology.

			regards, tom lane
Re: [HACKERS] Selectivity estimation for inet operators
> What you did there is utterly unacceptable from a modularity standpoint;
> and considering that the values will be nowhere near right, the argument
> that it's better than returning a constant seems pretty weak. I think
> you should just take that out again.

I will try to come up with a better, data type specific implementation in a week.
Re: [HACKERS] Re: [BUGS] Re: BUG #9555: pg_dump for tables with inheritance recreates the table with the wrong order of columns
David G Johnston <david.g.johns...@gmail.com> writes:
> Would it be proper to issue an additional top-level warning with the
> "column moved" notification? Thus there would be NOTICE, NOTICE, WARNING
> in the above example? Or, more generically, "columns reordered to match
> inherited column order" to avoid multiple warnings if more than one
> column is moved.

That's a good point: if this message fires at all, it will probably fire more than once; do we want that? If we do it as you suggest here, we'll lose the information as to exactly which columns got relocated, which perhaps is bad, or maybe it doesn't matter. Also, I don't remember the exact code structure in that area, but it might be a bit painful to arrange that we get only one such warning even when inheriting from multiple parents.

If we do want the specific moved columns to be identified, I'd still go with errdetail on the NOTICE rather than two separate messages. I think calling it a WARNING is a bit extreme anyway.

			regards, tom lane
Re: [HACKERS] pg_filedump for 9.4?
Hi community,

I am currently investigating why a certain table with highly redundant and utterly verbose XML becomes worse storage-wise when the XML is made more compact. Since I am quite new to this, I believe it's the LZ compression of the text; but that's irrelevant now, I mention it only because tools like pg_filedump allow people like me to help themselves and gain a basic understanding of things.

While checking, I noticed that pg_filedump (current from git.postgresql.org, incl. the below mentioned commit) does not compile on Mac OS X. Afaik it will not compile as soon as port.h comes into play and USE_REPL_SNPRINTF is defined. Then printf and sprintf in the source of pg_filedump become pg_printf and so on. These replacements are part of postgres and can't be linked into the standalone pg_filedump; at least that is certainly not the intention.

Putting

  #undef sprintf
  #undef printf

after the includes in pg_filedump fixes the Mac compile and imho all builds where USE_REPL_SNPRINTF gets defined as a side effect of including postgres.h, effectively taking printf away. Not sure how to deal with this issue correctly, so this is just for your consideration, since the issue is a bit broader imho.

Regards,
Stepan

On 31.08.2014 at 17:25, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:
> [quoted earlier exchange about the // comment conversion, pgindent run,
> and push to the git repository, trimmed]
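[A sketch of the workaround Stepan describes — which replacement macros a given standalone tool actually needs to undo is an assumption here; port.h replaces several printf-family functions under USE_REPL_SNPRINTF.]

  #include "postgres.h"

  /*
   * Workaround sketch: on platforms where port.h defines
   * USE_REPL_SNPRINTF, printf/sprintf and friends are #defined to
   * pg_printf etc., which are not available when linking a standalone
   * tool.  Undo the macros after the includes.
   */
  #ifdef USE_REPL_SNPRINTF
  #undef printf
  #undef fprintf
  #undef sprintf
  #undef snprintf
  #undef vsnprintf
  #endif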
Re: [HACKERS] Built-in binning functions
On 01/09/14 06:00, Tom Lane wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
> > Suggest discretize() as a much more informative name. The other names
> > will be overlooked by anybody that doesn't already know what to look for.
>
> I did not like that idea to begin with, but it's growing more attractive.
> [...]
> OTOH, I'm not as convinced as you that this name would mean anything to
> anybody that doesn't already know what to look for. It still seems like
> rather arcane terminology.

If I was new to PostgreSQL, I'd probably read this as disc*(), a function to do something with discs. I have done finite maths and statistics at university, plus I have been vaguely following this thread - but the meaning of discretize() is not immediately apparent to me (if I reread a previous email or two in this thread, then I'm sure it would be obvious!). I might guess that it is to make something discrete, but why would I want to do that? So if I came across this function in a year or two, I'd probably go right past it.

I have been programming for over 40 years, and I still sometimes find choosing appropriate names very difficult, so I know it is not easy to come up with meaningful names - especially ones that mean something relevant to other people!

Cheers,
Gavin
Re: [HACKERS] Built-in binning functions
On 30/08/14 19:24, Tom Lane wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
> > 1. I am thinking so reduction to only numeric types is not necessary -
> > although we can live without it - but there are lot of non numeric
> > categories: chars, date, ...
>
> I wasn't terribly happy about that either. I still think we should
> reduce this to a single polymorphic function, as in the attached.

I did try to write a generic function very similar to what you wrote, but discarded it for performance reasons. If we really want to support any data type, I am all for having the generic function, but I would still like to have one optimized for float8, because that seems to be the most used use-case (at least that one is the reason why I even wrote this). Here are some numbers.

First float8:

  CREATE TABLE wbtest AS
    SELECT (random() * 100)::float8 a FROM generate_series(1,100) ORDER BY 1;

  SELECT
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 90]::float8[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 91]::float8[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 92]::float8[])
  FROM wbtest;

Optimized float8: ~250ms
Tom's generic: ~400ms

Numeric:

  CREATE TABLE wbtestn AS
    SELECT (random() * 100)::numeric a FROM generate_series(1,100) ORDER BY 1;

  SELECT
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 90]::numeric[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 91]::numeric[]),
    width_bucket(a, ARRAY[10, 20, 30, 40, 50, 60, 70, 80, 92]::numeric[])
  FROM wbtestn;

Optimized numeric: ~600ms
My generic: ~780ms
Tom's generic: ~900ms

The difference between my generic and Tom's generic is that Tom's is slowed down by the deconstruct_array.

> > 2. Still I strongly afraid about used searching method - there is not
> > possible to check a validity of input. Did you check how much linear
> > searching is slower - you spoke, so you have a real data and real use
> > case?
>
> Well, if we check the input then we'll be doing O(N) comparisons instead
> of O(log N). That could be a serious cost penalty for large arrays and
> expensive comparison functions (such as strcoll()). I think it's
> probably sufficient to have a clear warning in the docs.

With a re-sort, the performance is worse than a bunch of CASE WHENs :(

> Also, a documentation quibble: in Petr's patch the documentation and
> comments refer to the thresholds as "right bounds", which I didn't care
> for and replaced with "upper bounds". However, looking at it again, I
> wonder if we would not be better off describing them as lower bounds.
> They are exclusive bounds if seen as upper bounds, and inclusive if seen
> as lower bounds, and on the whole I think the latter viewpoint might be
> less confusing. Thoughts?

"Upper bounds" is probably a better name than "right bounds", I agree with that. In any case it's the upper bound for a bucket (that's why there is one more bucket, into which everything bigger than the max threshold goes).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] On partitioning
Another thought about this general topic:

Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> ... Allowed actions on a RELKIND_PARTITION:
> * CREATE INDEX .. ON PARTITION n ON TABLE xyz
> ...
> Still To Be Designed
> * Are indexes/constraints inherited from the parent rel?

I think one of the key design decisions we have to make is whether partitions are all constrained to have exactly the same set of indexes. If we don't insist on that, it will greatly complicate planning compared to what we'll get if we do insist on it, because then the planner will need to generate a separate customized plan subtree for each partition. Aside from costing planning time, most likely that would forever prevent us from pushing some types of intelligence about partitioning into the executor.

Now, in the current model, it's up to the user what indexes to create on each partition, and sometimes one might feel that maintaining a particular index is unnecessary in some partitions. But the flip side of that is it's awfully easy to screw yourself by forgetting to add some index when you add a new partition. So I'm not real sure which approach is superior from a purely user-oriented perspective.

I'm not trying to push one or the other answer right now, just noting that this is a critical decision.

			regards, tom lane
Re: [HACKERS] Built-in binning functions
Petr Jelinek <p...@2ndquadrant.com> writes:
> I did try to write generic function very similar to what you wrote but
> discarded it because of the performance reasons. If we really want to
> support any data type I am all for having generic function but I still
> would like to have one optimized for float8 because that seems to be the
> most used use-case (at least that one is the reason why I even wrote
> this) for performance reasons.

Well, part of the reason why your v3 float8 function looks so fast is that it's cheating: it will not produce the right answers for comparisons involving NaNs. I'm not sure how good it would look once you'd added some isnan() tests to make the behavior actually equivalent to float8_cmp_internal().

However, assuming it still seems worthwhile once that's accounted for, I don't have a strong objection to having an additional code path for float8. There are two ways we could do that:

1. Add a runtime test in the polymorphic function, e.g.

	if (element_type == FLOAT8OID)
		result = width_bucket_float8(...);
	else if (typentry->typlen > 0)
		result = width_bucket_fixed(...);
	else
		result = width_bucket_variable(...);

2. Define a SQL function width_bucket(float8, float8[]) alongside the polymorphic one.

While #2 might be marginally faster at runtime, the main difference between these is what the parser would choose to do with ambiguous cases. I experimented a bit and it seemed that the parser would prefer the float8 function in any situation where the inputs were of numeric types, probably because float8 is the preferred type in the numeric category. I'm not real sure that would be a good thing: in particular it would cast integer and numeric inputs to float8, which risks precision loss, and there doesn't seem to be any way to get the parser to make the other choice if the inputs are of numeric-category types. As against that objection, it would make cross-type numeric cases easier to use; for example width_bucket(1, array[2.4]) would be accepted. If we just offer the anyelement/anyarray version then the parser would make you cast 1 to numeric before it would consider the function to match.

On balance the runtime-test approach looks like a better idea, in that it doesn't risk any unexpected semantic behaviors.

> The difference between my generic and Tom's generic is because Tom's is
> slowed down by the deconstruct_array.

Meh. It looked to me like your version would have O(N^2) performance problems from computing array offsets repeatedly, depending on exactly which array element it ended up on. It would avoid repeat calculations only if it always moved right.

			regards, tom lane
Re: [HACKERS] On partitioning
On 08/31/2014 10:03 PM, Tom Lane wrote:
> I think one of the key design decisions we have to make is whether
> partitions are all constrained to have exactly the same set of indexes.
> [...]
> But the flip side of that is it's awfully easy to screw yourself by
> forgetting to add some index when you add a new partition.

The forgetting part is easy to solve by inheriting all indexes from the parent (or template) partition unless explicitly told not to.

One other thing that has been bothering me about this proposal is the ability to take partitions offline for maintenance, or to load them offline and then switch them in. In the current scheme we do this using ALTER TABLE ... [NO] INHERIT ... If we also want to have this with the not-directly-accessible partitions, then perhaps it could be done by having a possibility to move a partition between two tables with exactly the same structure?

> So I'm not real sure which approach is superior from a purely
> user-oriented perspective.

What we currently have is a very flexible scheme which has a few drawbacks:

1) unnecessarily complex for simple cases
2) easy to shoot yourself in the foot by forgetting something
3) can be hard on the planner, especially with a huge number of partitions

An alternative way of solving these problems is adding some (meta-)constraints to the current way of doing things, and some more automation:

  CREATE TABLE FOR PARTITIONMASTER
  WITH (ALL_INDEXES_SAME=ON,
        SAME_STRUCTURE_ALWAYS=ON,
        SINGLE_INHERITANCE_ONLY=ON,
        NESTED_INHERITS=OFF,
        PARTITION_FUNCTION=default_range_partitioning(int)
  );

and then force these when adding inherited tables (in this case partition tables) either via CREATE TABLE or ALTER TABLE.

Best Regards

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] On partitioning
On Fri, Aug 29, 2014 at 12:35:50PM -0400, Tom Lane wrote:
> > Each partition is assigned an Expression that receives a tuple and
> > returns boolean. This expression returns true if a given tuple belongs
> > into it, false otherwise.
>
> -1, in fact minus a lot. One of the core problems of the current
> approach is that the system, particularly the planner, hasn't got a lot
> of insight into exactly what the partitioning scheme is in a partitioned
> table built on inheritance. If you allow the partitioning rule to be a
> black box then that doesn't get any better. I want to see a design
> wherein the system understands *exactly* what the partitioning behavior
> is. I'd start with supporting range-based partitioning explicitly, and
> maybe we could add other behaviors such as hashing later. In particular,
> there should never be any question at all that there is exactly one
> partition that a given row belongs to, not more, not less. You can't
> achieve that with a set of independent filter expressions; a meta-rule
> that says "exactly one of them should return true" is an untrustworthy
> band-aid.
>
> (This does not preclude us from mapping the tuple through the
> partitioning rule and finding that the corresponding partition doesn't
> currently exist. I think we could view the partitioning rule as a
> function from tuples to partition numbers, and then we look in pg_class
> to see if such a partition exists.)

There is one situation where you need to be more flexible, and that is if you ever want to support online repartitioning. To do that you have to distinguish between "I want to insert tuple X, which partition should it go into" and "I want to know which partitions I need to look in for partition_key=Y". For the latter you really need an expression per partition, or something equivalent. If performance is an issue, I suppose you could live with having just an old and a new partition scheme, so you couldn't have two live repartitionings happening simultaneously.

Now, if you want to close the door on online repartitioning forever, then that's fine. But being in the position of having to say "yes, our partitioning scheme sucks, but we would have to take the database down for a week to fix it" is no fun. Unless logical replication provides a way out, maybe??

Have a nice day,
--
Martijn van Oosterhout <klep...@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] Built-in binning functions
On 31 August 2014 20:44, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
> If I was new to PostgreSQL, I'd probably read this as disc*(), a function
> to do something with discs. [...] I might guess that it is to make
> something discrete, but why would I want to do that? So if I came across
> this function in a year or two, I'd probably go right past it.

It's important that we *don't* come up with a new name. This function does something useful, and the words people already use to describe that are "binning" and "discretization". By this I mean names already used by very widely used software. Search them and see, or suggest more widely used alternatives, please.
Re: [HACKERS] Built-in binning functions
On 31/08/14 22:33, Tom Lane wrote:
> Well, part of the reason why your v3 float8 function looks so fast is
> that it's cheating: it will not produce the right answers for
> comparisons involving NaNs. I'm not sure how good it would look once
> you'd added some isnan() tests to make the behavior actually equivalent
> to float8_cmp_internal().

True, it increases the time of the test to around 285ms; still almost a 30% speed difference.

> However, assuming it still seems worthwhile once that's accounted for,
> I don't have a strong objection to having an additional code path for
> float8. There are two ways we could do that:
>
> 1. Add a runtime test in the polymorphic function, e.g.
>
> 	if (element_type == FLOAT8OID)
> 		result = width_bucket_float8(...);
> 	else if (typentry->typlen > 0)
> 		result = width_bucket_fixed(...);
> 	else
> 		result = width_bucket_variable(...);

Yeah, I think I prefer this one too; will see how much performance it eats.

> > The difference between my generic and Tom's generic is because Tom's
> > is slowed down by the deconstruct_array.
>
> Meh. It looked to me like your version would have O(N^2) performance
> problems from computing array offsets repeatedly, depending on exactly
> which array element it ended up on. It would avoid repeat calculations
> only if it always moved right.

I actually think the worst case (when you always go left) for my version is O(N), since you only need to seek over half of the previous interval (it's doing binary search after all), and you do O(N) in deconstruct_array anyway.

It would be very different if we could cache the deconstructed array somehow (i.e. if this was an aggregate); then it would obviously make a lot of sense to use deconstruct_array, and in that case it would even make sense to re-sort, probably. But sadly we can't cache, afaik.

Also, I made more tests with various array sizes (3-1) and distributions, and my version was never slower.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Built-in binning functions
Simon Riggs wrote:
> width_bucket() seems to refer to an equal-width binning process. The
> function being discussed here is a generic mechanism, the boundaries of
> which could have been decided using equal-frequency or other mechanisms.
> Using the word "width" in those contexts could be confusing.

Given width_bucket() is already in use for the SQL Standard function, I agree it would be confusing to have the same name for different parameter profiles.

  literal_bucket(float8, float8[])

Since "bucket" is the 'verb' here (in this specific case meaning "look up the supplied value in the supplied bucket definition") and "width" is a modifier (the bucket specification describes an equal-width structure), I suggest literal_bucket(val, array[]) such that the bucket is still the verb but now the modifier describes a structure that is literally provided.

David J.
Re: [HACKERS] Built-in binning functions
Petr Jelinek <p...@2ndquadrant.com> writes:
> On 31/08/14 22:33, Tom Lane wrote:
> > Meh. It looked to me like your version would have O(N^2) performance
> > problems from computing array offsets repeatedly, depending on exactly
> > which array element it ended up on. It would avoid repeat calculations
> > only if it always moved right.
>
> I actually think that worst case (when you go always left) for my
> version is O(N) since you only need to seek for the half of previous
> interval (it's doing binary search after all) and you do O(N) in the
> deconstruct_array.

[ thinks about that for awhile... ] Hm, I think you are right. If there are N elements in the array then we will scan over N/2 of them to locate the midpoint element for the first comparison. After that, we will go either left or right, and in either case we will need to scan over N/4 elements to find the second-level midpoint. The third-level midpoint will be found after scanning N/8 elements, and so on. So the number of elements scanned over to compute their lengths is N/2 + N/4 + N/8 + ..., or asymptotically N, the same as for the deconstruct_array approach. deconstruct_array might have some small advantage from tighter inner loops, but this is probably more than blown by the need for a palloc and pfree. So I was wrong, and your way is better for navigating the array in the variable-width case, assuming that we're not concerned about spending some more code space to make this function a shade faster.

BTW, was there a reason for not noticing the case of exact match in the search loop, and falling out early? As it stands, the code will reliably choose the leftmost match if there are multiple equal items in the search array, but do we care about such cases?

			regards, tom lane
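[For readers following along, an illustrative sketch of the search shape under discussion — not the patch's actual code, and plain doubles stand in for the datatype-specific comparator. Thresholds are sorted ascending and treated as inclusive lower bounds; the early return is the shortcut Tom asks about.]

  static int
  bucket_index(double value, const double *thresholds, int nthresholds)
  {
      int lo = 0;
      int hi = nthresholds;   /* bucket 0 holds values below thresholds[0] */

      while (lo < hi)
      {
          int mid = (lo + hi) / 2;

          if (value == thresholds[mid])
              return mid + 1; /* exact match: fall out early */
          if (value > thresholds[mid])
              lo = mid + 1;
          else
              hi = mid;
      }
      return lo;              /* value falls into bucket "lo" */
  }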
Re: [HACKERS] Built-in binning functions
David G Johnston <david.g.johns...@gmail.com> writes:
> Since "bucket" is the 'verb' here (in this specific case meaning "look
> up the supplied value in the supplied bucket definition") and "width" is
> a modifier (the bucket specification describes an equal-width
> structure), I suggest literal_bucket(val, array[]) such that the bucket
> is still the verb but now the modifier describes a structure that is
> literally provided.

It's a very considerable stretch to see "bucket" as a verb here :-). Maybe that's why the SQL committee's choice of function name seems so unnatural (to me anyway).

I was wondering about bucket_index(), i.e. "get the index of the bucket this value falls into". Or get_bucket(), or get_bucket_index() if you like verbosity.

			regards, tom lane
Re: [HACKERS] Built-in binning functions
On Sun, Aug 31, 2014 at 7:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It's a very considerable stretch to see "bucket" as a verb here :-).
> Maybe that's why the SQL committee's choice of function name seems so
> unnatural (to me anyway).
>
> I was wondering about bucket_index(), i.e. "get the index of the bucket
> this value falls into". Or get_bucket(), or get_bucket_index() if you
> like verbosity.

I got stuck on the thought that a function name should ideally be/include a verb... Even if you read it as a noun (and thus the verb is an implied "get"), the naming logic still holds. I pondered a get_ version, though the argument for avoiding conflicts with user code decreases its appeal. The good part about SQL standard naming is that the typical programmer isn't likely to pick a conflicting name.

bucket_index is appealing by itself, though a conflict with user code is probable... As bad as I think width_bucket is for a name, the fact is that it currently exists, and consistency, even forced, has merit.

David J.
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On 08/31/2014 12:40 PM, Dobes Vandermeer wrote: 1. Connecting to multiple databases The background workers can apparently only connect to a single database at a time, but I want to expose all the databases via the API. bgworkers are assigned a database at launch time (if SPI is enabled), and this database may not change during the worker's lifetime, same as a normal backend. Sometimes frustrating, but that's how it is. (A minimal sketch of this one-shot binding follows this message.) I think I could use libpq to connect to PostgreSQL on localhost but this might have weird side-effects in terms of authentication, pid use, stuff like that. If you're going to do that, why use a bgworker at all? In general, what do you gain from trying to do this within the database server itself, rather than as an app in front of the DB? I could probably manage a pool of dynamic workers (as of 9.4), one per user/database combination or something along those lines. Even one per request? Is there some kind of IPC system in place to help shuttle the requests and responses between dynamic workers? Or do I need to come up with my own? The dynamic shmem code apparently has some queuing functionality. I haven't used it yet. It seems like PostgreSQL itself has a way to shuttle requests out to workers, is it possible to tap into that system instead? Basically some way to send the requests to a PostgreSQL backend from the background worker? It does? It's not the SPI; that executes work directly within the bgworker, making it behave like a normal backend for the purpose of query execution. Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL itself and do it in a more integrated/destructive manner? Or just write a front-end. The problem you'd have attempting to modify PostgreSQL itself for this is that connection dispatch occurs via the postmaster, which is a single-threaded process that already needs to do a bit of work to keep an eye on how things are running. You don't want it constantly busy processing and dispatching millions of tiny HTTP requests. It can't just hand a connection off to a back-end immediately after accepting it, either; it'd have to read the HTTP headers to determine which database to connect to, then launch a new backend for the connection, which is horribly inefficient for tiny short-lived connections. The postmaster has no concept of a pool of backends (unfortunately, IMO) to re-use. I imagine (it's not something I've investigated, really) that you'd want a connection acceptor process that watched the listening HTTP request socket. It'd hand connections off to dispatcher processes that read the message content to get the target DB and dispatch the request to a worker backend for the appropriate user/db combo, then collect the results and return them on the connection. Hopefully at this point you're thinking that sounds a lot like a connection pool... because it is. An awfully complicated one, probably, as you'd have to manage everything using shared memory segments and latches. In my view it's unwise to try to do this in the DB with PostgreSQL's architecture. Hack PgBouncer or PgPool to do what you want. Or write a server with Tomcat/Jetty using JAX-RS and PgJDBC and the built-in connection pool facilities - you won't *believe* how easy it is. 3. Parallelism The regular PostgreSQL server can run many queries in parallel Well, one PostgreSQL instance (postmaster) may have many backends, each of which may run queries in series but not in parallel. Any given process may only run one query at once. 
but it seems like if I am using SPI I could only run one query at a time - it's not an asynchronous API. Correct. Any help, sage advice, tips, and suggestions how to move forward in these areas would be muchly appreciated! Don't do it with bgworkers. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
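As a concrete illustration of the one-shot database binding described above, here is a hedged C sketch of a background worker's main function. Registration boilerplate (PG_MODULE_MAGIC, RegisterBackgroundWorker, latch handling) is omitted, and the function and database names are placeholders, not anything proposed in this thread:

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    void
    rest_worker_main(Datum main_arg)
    {
        BackgroundWorkerUnblockSignals();

        /*
         * The database is chosen here, exactly once, for the worker's
         * whole lifetime; there is no API to switch databases afterwards,
         * just like a regular backend.
         */
        BackgroundWorkerInitializeConnection("postgres", NULL);

        for (;;)
        {
            /* ... serve requests, only ever against the database above ... */
        }
    }

This constraint is why exposing every database from one API process requires either a pool of workers (at least one per database) or ordinary client connections, as discussed above.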
Re: [HACKERS] COPY and heap_sync
On Sun, Aug 31, 2014 at 10:10 AM, Peter Eisentraut pete...@gmx.net wrote: On 8/30/14 2:26 AM, Jeff Janes wrote: But there are cases where people use COPY in a loop with a small amount of data in each statement. What would be the reason for doing that? I have done that same thing many times. At a company where I was employed we developed scripts to migrate data from one database to another. In the first version we used INSERT statements, and it was very, very slow. Then we wrote a second version replacing the INSERTs with COPY statements. The performance was much better, but we believed it could be better still, so in the third version we created a kind of cache (using arrays) to accumulate records in memory; after every N rows we built a COPY statement from the cache contents and ran it. This was a really good performance improvement. My use case is exactly why a feature to postpone the heap_sync in COPY statements would help. I don't know if it's a feature that a lot of people want, but IMHO it would be nice for improving bulk-load operations. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
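For readers who want to try the accumulate-then-COPY approach described above, here is a hedged libpq sketch using the COPY streaming calls. The table name, buffer handling, and error reporting are illustrative assumptions, not code from this thread:

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Send one accumulated batch of rows (already formatted in COPY text
     * form, newline-terminated) as a single COPY ... FROM STDIN.
     * Returns 0 on success, -1 on failure.
     */
    static int
    flush_batch(PGconn *conn, const char *buf, int buflen)
    {
        PGresult *res = PQexec(conn, "COPY target_table FROM STDIN");

        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        if (PQputCopyData(conn, buf, buflen) != 1 ||
            PQputCopyEnd(conn, NULL) != 1)
            return -1;

        res = PQgetResult(conn);        /* the COPY's final status */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);               /* drain remaining results */
        return 0;
    }

The caller appends rows to buf and calls flush_batch() every N rows, which is essentially the "third version" described above; when COPY takes the heap_sync path, each small COPY pays that sync separately, which is the cost the postponement feature under discussion would address.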
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On 08/29/2014 04:59 AM, Kevin Grittner wrote: I just took a quick look at the spec to refresh my memory, and it seems to require that the WITH TIME ZONE types store UTC (I suppose for fast comparisons); it requires the time zone, in the form of an hour:minute offset, to be stored with it, so you can determine the local time from which it was derived. I concede that this is not usually useful, and am glad we have a type that behaves as timestamptz does; but occasionally a type that behaves in conformance with the spec would be useful, and it would certainly be less confusing for people who are used to the standard behavior. FWIW, MS SQL's DateTimeOffset data type: http://msdn.microsoft.com/en-AU/library/bb630289.aspx is much more like what I, when I was getting started, expected TIMESTAMP WITH TIME ZONE to be. We don't really have anything equivalent in PostgreSQL. That's also what I expected: a timestamptz = timestamp + offset, just like the current implementation of TIME WITH TIME ZONE:

typedef struct
{
    TimeADT time;  /* all time units other than months and years */
    int32   zone;  /* numeric time zone, in seconds */
} TimeTzADT;

Also, it's inconvenient for clients (JDBC, Npgsql, ...) to understand a full 'timezone' name (such as 'America/New_York') coming from PostgreSQL and transform it into their own data type (such as DateTimeOffset in .NET). But an *offset* is easy to parse and process. Best Regards, rohto -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
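By analogy with the TimeTzADT struct quoted above, a spec-conforming TIMESTAMP WITH TIME ZONE could presumably be represented along these lines. This struct is purely hypothetical (the name is invented; no such type exists in PostgreSQL):

    #include "postgres.h"
    #include "datatype/timestamp.h"     /* TimestampTz */

    /* Hypothetical: an instant in UTC plus the offset it was entered with. */
    typedef struct
    {
        TimestampTz time;   /* the instant, normalized to UTC */
        int32       zone;   /* originating offset from UTC, in seconds */
    } TimestampTzOffsetADT;

Comparisons could operate on the UTC field alone, while output and client-side conversion (the DateTimeOffset case mentioned above) would use the stored offset.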
Re: [HACKERS] RLS Design
Adam, all, * Brightwell, Adam (adam.brightw...@crunchydatasolutions.com) wrote: Attached is a patch for RLS that was created against master at 01363beae52700c7425cb2d2452177133dad3e93 and is ready for review. Many thanks for posting this. As others may realize already, I've reviewed and modified this patch already, working with Adam to get it ready. I'm continuing to review and test it, but in general I'm quite happy with how it's shaping up - additional review, testing, and comments are always appreciated though. Alter an RLS policy named name on table. Specifying a command is optional; if provided, the policy's command is changed, otherwise it is left as-is. Specifying a role is optional; if provided, the policy's role is changed, otherwise it is left as-is. The condition must always be provided and is therefore always replaced. I'm pretty sure the condition is also optional in this patch (that was a late change that I made), but the documentation needs to be updated. * Plancache Invalidation: If a relation has a row-security policy and row-security is enabled, then invalidation will occur when either the row_security GUC is changed OR when the current user changes. This invalidation ONLY takes place for cached plans where the target relation has a row security policy. I know there was a lot of discussion about this previously, but I'm fine with the initial version simply invalidating plans which involve RLS-enabled relations and role changes. This patch doesn't create any regressions for individuals who are not using RLS. We can certainly look into improving this in the future to have per-role plan caches, but that's a fair bit of additional non-trivial code that can be added independently. * Security Qual Expression: All row-security policies are OR'ed together. This was also a point of much discussion, but I continue to feel this is the right approach for the initial version. We can add flexibility here later, if necessary, but OR'ing these together is in line with how role membership works today (you have the rights of all roles you are a member of, pursuant to inherit/noinherit status, of course). * row_security GUC - enable/disable row level security. Note that, as discussed, pg_dump will set row_security off, unless specifically asked to enable it. row_security will also be set to off when the user logging in is a superuser or does a 'set role' to a superuser. Currently, if a user logging in is *not* a superuser, or a 'set role' is done to a non-superuser, row_security gets re-set to enabled. This is one aspect of the patch that I think we should change (which is a matter of removing just a few lines of code and then updating the regression tests to do 'set row_security = on;' before running), because if you log in as a superuser and then 'set role' to a non-superuser, it strikes me now (it didn't really when I wrote this originally) as a bit surprising that row_security gets set to 'on' when doing a 'set role'. One thing that I really like about this approach is that a superuser can explicitly set 'row_security' to on and be able to see what happens. Clearly, in an environment of untrusted users, that could be dangerous, but it can also be an extremely useful way of testing things, particularly in development environments where everyone is a superuser. This deserves a bit more documentation also. * BYPASSRLS and NOBYPASSRLS role attribute - allows a user to bypass RLS if the row_security GUC is set to OFF. 
If a user sets row_security to OFF and does not have this attribute, then an error is raised when attempting to query a relation with an RLS policy. (Note that the superuser is always considered to have the bypassrls attribute.) * psql \d table support: psql describe support for listing policy information per table. This works pretty well for me, but we may want to add some indication that RLS is on a table in the \dp listing. Thanks! Stephen