Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Peter Eisentraut

KaiGai Kohei wrote:

We are going to need to come up with specific answers to these issues
soon.


The origin of the issue is simple: whether we should support activating (not
only compiling) two or more security mechanisms at the same time, or not.
In my opinion, it is not a frequent situation; it gives us several big pains,
but the benefit is smaller than the pains.


With all respect, you were asked to divide up the issues so we don't 
have to deal with them all at once.


For instance, a separate patch that implements SQL-level row level 
security would be fairly uncontroversial and issue-free at this point, 
but it would be completely useful on its own and it would build 
confidence in the developer community about your other plans.


Most committers have expressed the viewpoint in one way or another that 
having this available is a prerequisite for accepting further work. 
Yet, I am not aware of even an interface proposal for this.


Meanwhile, we are busy worrying about what system columns the follow-up 
features will have.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: for 8.5 named and mixed notation support

2008-12-12 Thread Pavel Stehule
Hello

this patch carries support for named and mixed notation - described in more
detail at http://archive.adaic.com/standards/83rat/html/ratl-08-03.html.

It uses Ada syntax - name => value - which is controversial and
problematic, so I expect the syntax to change.

Regards
Pavel Stehule
*** ./src/backend/catalog/namespace.c.orig	2008-12-10 12:36:42.0 +0100
--- ./src/backend/catalog/namespace.c	2008-12-11 23:33:04.0 +0100
***
*** 38,43 
--- 38,44 
  #include "commands/dbcommands.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
+ #include "nodes/parsenodes.h"
  #include "parser/parse_func.h"
  #include "storage/backendid.h"
  #include "storage/ipc.h"
***
*** 49,54 
--- 50,56 
  #include "utils/memutils.h"
  #include "utils/rel.h"
  #include "utils/syscache.h"
+ #include "funcapi.h"
  
  
  /*
***
*** 584,590 
   * functions to mask variadic ones if the expanded argument list is the same.
   */
  FuncCandidateList
! FuncnameGetCandidates(List *names, int nargs, bool expand_variadic)
  {
  	FuncCandidateList resultList = NULL;
  	bool		any_variadic = false;
--- 586,593 
   * functions to mask variadic ones if the expanded argument list is the same.
   */
  FuncCandidateList
! FuncnameGetCandidates(List *names, int nargs, bool expand_variadic,
! Notation notation, List *argnames)
  {
  	FuncCandidateList resultList = NULL;
  	bool		any_variadic = false;
***
*** 628,643 
  		Oid			va_elem_type;
  		List	   *defaults = NIL;
  		FuncCandidateList newResult;
  
  		/*
  		 * Check if function has some parameter defaults if some
  		 * parameters are missing.
  		 */
! 		if (pronargs > nargs && expand_variadic)
  		{
  			bool		isnull;
  			Datum		proargdefaults;
  			char	   *str;
  
  			/* skip when not enough default expressions */
  			if (nargs + procform->pronargdefaults < pronargs)
--- 631,795 
  		Oid			va_elem_type;
  		List	   *defaults = NIL;
  		FuncCandidateList newResult;
+ 		char	*rparam_type = NULL;
+ 		short int	*map_args = NULL;
+ 		short int	*map_defaults = NULL;
+ 
+ 		/*
+ 		 * Check whether named params are used.
+ 		 */
+ 		if (notation != POSITIONAL_NOTATION)
+ 		{
+ 			ListCell		*lc;
+ 			int			i;
+ 			Oid		*p_argtypes;
+ 			char		**p_argnames;
+ 			char		*p_argmodes;
+ 			int	pronallargs;
+ 			bool			found_all;
+ 			bool			any_default;
+ 			Notation	actual_notation;
+ 			int 		j;
+ 
+ 			Assert(argnames != NIL);
+ 
+ 			pronallargs = get_func_arg_info(proctup, &p_argtypes, &p_argnames, &p_argmodes);
+ 			if (!p_argnames)
+ continue;
+ 		
+ 			rparam_type = palloc(pronargs * sizeof(char));
+ 			map_args = palloc(pronargs * sizeof(short int));
+ 			map_defaults = palloc(pronargs * sizeof(short int));
+ 
+ 			/* now we know nothing about params */
+ 			for (j = 0; j < pronargs; j++)
+ 			{
+ rparam_type[j] = 'u';
+ map_args[j] = map_defaults[j] = -1;
+ 			}
+ 
+ 			defaults = NIL;
+ 			any_default = false;
+ 			
+ 			/* if MIXED notation is used, then I have to copy positional arguments */
+ 			i = 0;
+ 			actual_notation = POSITIONAL_NOTATION;
+ 			found_all = true;
+ 
+ 			foreach(lc, argnames)
+ 			{
+ Node *strnode = lfirst(lc);
+ 
+ if (strnode == NULL && actual_notation == POSITIONAL_NOTATION)
+ {
+ 	rparam_type[i] = 'p';
+ 	map_args[i] = i;
+ }
+ /* found end of positional notation */
+ if (strnode != NULL && actual_notation == POSITIONAL_NOTATION)
+ 	actual_notation = NAMED_NOTATION;
+ 
+ Assert(actual_notation == POSITIONAL_NOTATION || strnode != NULL);
+ 
+ if (actual_notation == NAMED_NOTATION)
+ {
+ 	char 	*cargname = strVal(strnode);
+ 	int 		j;
+ 	bool			found = false;
+ 	int			k = 0;
+ 	
+ 	for (j = 0; j < pronallargs; j++)
+ 	{
+ 		/* skip all OUT arguments */
+ 		if (p_argmodes && (p_argmodes[j] != FUNC_PARAM_IN && p_argmodes[j] != FUNC_PARAM_INOUT))
+ 			continue;
+ 		
+ 		if (p_argnames[j] && strcmp(cargname, p_argnames[j]) == 0)
+ 		{
+ 			found = true;
+ 			break;
+ 		}
+ 
+ 		k += 1;
+ 	}
+ 
+ 	if (!found)
+ 	{
+ 		found_all = false;
+ 		break;
+ 	}
+ 	else
+ 	{
+ 		rparam_type[k] = 'n';
+ 		map_args[k] = i;
+ 	}
+ }
+ 
+ i += 1;
+ 			}
+ 
+ 			if (!found_all)
+ continue;
+ 
+ 			/* try to fill unknown params with defaults */
+ 			if (procform->pronargdefaults > 0)
+ 			{
+ bool		isnull;
+ Datum		proargdefaults;
+ char	   *str;
+ int 			pronargdefaults = procform->pronargdefaults;
+ 
+ proargdefaults = SysCacheGetAttr(PROCOID, proctup,
+ 			 Anum_pg_proc_proargdefaults, &isnull);
+ Assert(!isnull);
+ str = TextDatumGetCString(proargdefaults);
+ defaults = (List *) stringToNode(str);
+ 
+ for (i = 0; i < pronargdefaults; i++)
+ {
+ 
+ 
+ 	if (rparam_type[pronargs - pronargdefaults + i] == 'u')
+ 	{
+ 		rparam_type[pronargs - pronargdefaults + i] = 'd';
+ 		

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread KaiGai Kohei

Peter Eisentraut wrote:

KaiGai Kohei wrote:

Peter Eisentraut wrote:

On Thursday 11 December 2008 18:32:50 Tom Lane wrote:

How can we stick all of these in the same column at the same time?

Why would we want to?


Because we want to use SQL-based row access control and SELinux-based 
row access control at the same time.  Isn't this exactly one of the 
objections upthread?  Both must be available at the same time.


Please make clear the meaning of "use".
As you said, if your concern is based on a packaging/distribution issue,
I suggested an alternative proposal which allows compiling multiple
security mechanisms and choosing one of them at runtime.


I would like to be able to assign SQL-level ACLs and SELinux labels to 
the same row at the same time in the same build, and have the system 
enforce both on top of each other.


In my opinion, it brings more pain (user interface, performance, complexity
of implementation, and so on) than the benefit of supporting MAC
and DAC concurrently.


We can debate the merits of having, say, SELinux plus Solaris TX at 
the same time, but if we can have two as per previous paragraph, we 
should design for several.


What platform supports both SELinux and Solaris TX?


Well, Solaris, if you believe various rumours.  I agree the case for 
this might be weak, though.


Are you referring to the Solaris FMAC project?
It is a different platform from Trusted Solaris.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 02:23 +, Greg Stark wrote:

 The existing sampling mechanism is tied to solid statistics. It
 provides the correct sample size to get a consistent confidence range
 for range queries. This is the same mathematics which governs election
 polling and other surveys. The sample size you need to get +/- 5% 19
 times out of 20 increases as the population increases, but not by very
 much.

Sounds great, but it's not true. The sample size is not linked to data
volume, so how can it possibly give a consistent confidence range?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Thu, 2008-12-11 at 18:52 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Thu, 2008-12-11 at 22:29 +, Gregory Stark wrote:
  And I would like it even more if the sample size increased according
  to table size, since that makes ndistinct values fairly random for
  large tables.
  
  Unfortunately _any_ ndistinct estimate based on a sample of the table
  is going to be pretty random.
 
  We know that constructed data distributions can destroy the
  effectiveness of the ndistinct estimate and make sample size irrelevant.
  But typical real world data distributions do improve their estimations
  with increased sample size and so it is worthwhile.
 
 This is handwaving unsupported by evidence.  

Not at all.

In the paper cited within the ANALYZE code, shown here:
ftp://ftp.research.microsoft.com/users/autoadmin/histogram_conf.pdf
we implement the sample size needed for a reliable histogram, but ignore
most of the rest of the paper.

Section 4, on block-level sampling, is ignored, yet the conclusions
(7.2) are that it provides a larger and more effective sample size
without significantly increasing the number of accessed disk blocks.

Haas & Stokes [1998] also indicate that the accuracy of n-distinct
estimation is linked to sample size.

In a previous post to hackers I looked at the case where values were
physically clustered together in the table, either naturally or via the
CLUSTER command. Section: ESTIMATES OF D FOR DEPENDENT TABLES
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00153.php

In that case the current ANALYZE algorithm fails badly because of the
fixed sample size. ANALYZE samples rows at random, so the average gap
between selected rows increases as the table size increases. Since the
clustered rows are typically close together, the apparent number of
repeated instances of the same data value decreases as the sample
fraction decreases. Since the sample size is currently fixed, this means
that the D estimate decreases as the table size increases. (This is
proven in a test case in the post linked above.)

 If you've got a specific
 proposal what to change the sample size to and some numbers about what
 it might gain us or cost us, I'm all ears.

So my specific proposal is: implement block level sampling.

It allows us to
* increase sample size without increasing number of I/Os
* allows us to account correctly for clustered data

I'm not trying to force this to happen now; I'm just bringing it into
the discussion because it's relevant and has not been mentioned.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




[HACKERS] contrib/pg_stat_statements 1212

2008-12-12 Thread ITAGAKI Takahiro
Here is an updated version of pg_stat_statements.

[Changes]
- A new GUC variable 'explain_analyze_format' is added.
- Statistics counters are converted into a variable 'BufferStats'.

Vladimir Sitnikov sitnikov.vladi...@gmail.com wrote:

 Can I ask my question once again?
 Why don't you want to make print all the info the default output format?

I added a new GUC variable 'explain_analyze_format' to print
statistics counters in EXPLAIN ANALYZE. We can put special variables
into it to get non-default counters. I don't think everyone always
requires all the info. Of course TABLE-explain and XML-explain could
handle the counters better, but this patch should not include them.

The 'BufferStats' variable cleans up the calculations over many separate
counters. It is exported (marked as PGDLLIMPORT) so that users can add
a user-defined view like pg_stat_session_buffer_usage if needed.

Comments welcome.


[Sample output of explain_analyze_format]
=# SET explain_analyze_format = ' gets=%g reads=%r temp=%t cpu=%p';
=# EXPLAIN ANALYZE SELECT * FROM accounts;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on accounts  (cost=0.00..2640.00 rows=10 width=97) (actual time=0.054..117.046 rows=10 loops=1 gets=1640 reads=1608 temp=0 cpu=125.00)
 Total runtime: 208.167 ms
(2 rows)

The following special variables are available:
%g : # of shared buffer gets (= %h + %r)
%h : # of shared buffer hits
%r : # of shared buffer reads
%w : # of shared buffer writes
%G : # of local buffer gets (= %H + %R)
%H : # of local buffer hits
%R : # of local buffer reads
%W : # of local buffer writes
%t : # of buffile reads
%T : # of buffile writes
%u : user cpu time
%s : sys cpu time
%p : total cpu times (= %u + %s)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pg_stat_statements-1212.tar.gz
Description: Binary data



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-12 Thread Fujii Masao
Hi,

On Fri, Dec 12, 2008 at 1:34 PM, Aidan Van Dyk ai...@highrise.ca wrote:
 * Fujii Masao masao.fu...@gmail.com [081211 23:00]:
 Hi,

   Or, should I
 create the feature for the user to confirm whether it's in synch rep via 
 SQL?

I don't need a way to check via SQL, but I'd love a postgresql.conf
option that, when set, would make sure that all connections pretty much
just hang until a slave has connected and everything is set up for sync
rep.  I think I saw that you're using normal connection setup to start
the WAL streaming to the slave, so you have to allow connections, but
I'd really not want any of my pg-clients able to do anything if
sync-rep isn't happening...

How about stopping the request / connection from a client in front of
postgres (e.g. in connection pooling software)? Or we could first develop
a feature like Oracle's OFFLINE, separate from Synch Rep.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] [Patch] Space reservation (pgupgrade)

2008-12-12 Thread Zdenek Kotala
I have attached a patch which adds the capability to reserve space on a page
for a future upgrade. It is mandatory for the future in-place upgrade
implementation. This patch contains the basic infrastructure, not the
preupgrade script itself. I'm going to send a WIP preupgrade script today in
a separate mail.


This patch contains following modifications:

1) I added datpreupgstatus and relpreupgstatus attributes to pg_database and
pg_class. The original idea was to use only a flag, but I need more info for
tracking several process statuses (like 0 - not set, 1 - reserved space set,
2 - reservation finished, and so on).


I'm not sure if datpreupgstatus will be useful, but I think it is better to
have it here.


2) I added two reloptions, rs_perpage and rs_pertuple, to set the amount of
reserved space. I think these two attributes are enough to configure all
cases. Keep in mind that these parameters can differ for each relation.


3) I adapted the source code to respect the new reloptions. The basic idea is
that before anyone calls PageAddItem, free space on the page is checked
(PageGetFreeSpace...). I modified the PageGetFreeSpace function to count
reserved space. Unfortunately, it requires additional parameters.


It works, but I'm not sure whether an external enhancement could shortcut
this and call PageAddItem without a PageGetFreeSpace call.


I'm now thinking about refactoring it: replacing the PageGetFreeSpace(Heap)
functions with RelPageGetFreeSpace and adding a new function RelPageAddItem.
RelPageAddItem would replace all direct calls of PageAddItem.


Comments, ideas?

thanks Zdenek

diff -Nrc pgsql_spacereserve.84e2e9c42ef7/src/backend/access/common/reloptions.c pgsql_spacereserve/src/backend/access/common/reloptions.c
*** pgsql_spacereserve.84e2e9c42ef7/src/backend/access/common/reloptions.c	2008-12-12 11:36:42.140563612 +0100
--- pgsql_spacereserve/src/backend/access/common/reloptions.c	2008-12-12 11:36:42.233907101 +0100
***
*** 286,330 
  default_reloptions(Datum reloptions, bool validate,
     int minFillfactor, int defaultFillfactor)
  {
! 	static const char *const default_keywords[1] = {"fillfactor"};
! 	char	   *values[1];
! 	int			fillfactor;
  	StdRdOptions *result;
  
! 	parseRelOptions(reloptions, 1, default_keywords, values, validate);
  
  	/*
  	 * If no options, we can just return NULL rather than doing anything.
  	 * (defaultFillfactor is thus not used, but we require callers to pass it
  	 * anyway since we would need it if more options were added.)
  	 */
! 	if (values[0] == NULL)
  		return NULL;
  
! 	if (!parse_int(values[0], &fillfactor, 0, NULL))
  	{
! 		if (validate)
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 	 errmsg("fillfactor must be an integer: \"%s\"",
! 			values[0])));
! 		return NULL;
  	}
  
! 	if (fillfactor < minFillfactor || fillfactor > 100)
  	{
! 		if (validate)
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 	 errmsg("fillfactor=%d is out of range (should be between %d and 100)",
! 			fillfactor, minFillfactor)));
! 		return NULL;
  	}
  
  	result = (StdRdOptions *) palloc(sizeof(StdRdOptions));
  	SET_VARSIZE(result, sizeof(StdRdOptions));
  
  	result->fillfactor = fillfactor;
  
  	return (bytea *) result;
  }
--- 286,386 
  default_reloptions(Datum reloptions, bool validate,
     int minFillfactor, int defaultFillfactor)
  {
! 	static const char *const default_keywords[3] = {"fillfactor", "rs_perpage", "rs_pertuple"};
! 	char	   *values[3];
! 	int			fillfactor = defaultFillfactor;
! 	int			rs_perpage = 0;
! 	int			rs_pertuple = 0;
  	StdRdOptions *result;
  
! 	parseRelOptions(reloptions, 3, default_keywords, values, validate);
  
  	/*
  	 * If no options, we can just return NULL rather than doing anything.
  	 * (defaultFillfactor is thus not used, but we require callers to pass it
  	 * anyway since we would need it if more options were added.)
  	 */
! 	if ((values[0] == NULL) && (values[1] == NULL) && (values[2] == NULL))
  		return NULL;
  
! 	/* fill factor */
! 	if (values[0] != NULL)
  	{
! 		if (!parse_int(values[0], &fillfactor, 0, NULL))
! 		{
! 			if (validate)
! ereport(ERROR,
! 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 		 errmsg("fillfactor must be an integer: \"%s\"",
! values[0])));
! 			return NULL;
! 		}
! 
! 		if (fillfactor < minFillfactor || fillfactor > 100)
! 		{
! 			if (validate)
! ereport(ERROR,
! 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 		 errmsg("fillfactor=%d is out of range (should be between %d and 100)",
! fillfactor, minFillfactor)));
! 			return NULL;
! 		}
  	}
  
! 	/* reserved space per page */
! 	if (values[1] != NULL)
  	{
! 		if (!parse_int(values[1], &rs_perpage, 0, NULL))
! 		{
! 			if (validate)
! ereport(ERROR,
! 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 		 errmsg("rs_perpage must be an integer: \"%s\"",
! values[1])));
! 			return NULL;
! 		}
! 
! 		if (rs_perpage < 0 || rs_perpage > BLCKSZ/4)
! 		{
! 			if 

Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Robert Haas
On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The existing sampling mechanism is tied to solid statistics. It
 provides the correct sample size to get a consistent confidence range
 for range queries. This is the same mathematics which governs election
 polling and other surveys. The sample size you need to get +/- 5% 19
 times out of 20 increases as the population increases, but not by very
 much.

 Sounds great, but its not true. The sample size is not linked to data
 volume, so how can it possibly give a consistent confidence range?

I'm not 100% sure how relevant it is to this case, but I think what
Greg is referring to is:

http://en.wikipedia.org/wiki/Margin_of_error#Effect_of_population_size

It is a pretty well-known mathematical fact that for something like an
opinion poll your margin of error does not depend on the size of the
population but only on the size of your sample.

...Robert



Re: [HACKERS] visibility maps

2008-12-12 Thread Pavan Deolasee
On Thu, Dec 11, 2008 at 8:09 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Pavan Deolasee wrote:


 I can do some if we don't have already.

 Oh, yes please!


I did some tests today with pgbench on a decent SMP machine. The goal
was to see whether, when multiple clients (20 in the test) try to update
tuples in different data blocks, the EX lock on the VM page causes any
contention.

I can confirm that I haven't seen any drop in tps with the VM. I guess
that since the bit reset is a very small amount of code compared to the
entire UPDATE code path, it may be less likely than I previously thought
that multiple clients attempt to reset the bit at the same time. I'll do
some more tests to see whether setting the bit in the HOT-prune path
leads to any contention.

I can send details of the test I did, if anyone is interested.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Robert Haas
On Thu, Dec 11, 2008 at 10:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I had this idle thought too, but I didn't write it down because...

 ought to be, but it seems like it ought to be possible to determine
 that given a desired maximum error in the overall estimate.  I'm also
 not very clear on what the total frequency computations (matchfreq2
 and unmatchfreq2 in the current code) ought to look like if we are using
 a variable subset of the inner list.

 ...of this exact concern, which I think is an insurmountable problem.

 Maybe so.  If we stick to the other design (end both lists at a preset
 frequency threshold) then the math clearly goes through the same as
 before, just with num_mcvs that are determined differently.  But can
 we prove anything about the maximum error added from that?

I don't think so, because in that design, it's entirely possible that
you'll throw away the entire MCV list if all of the entries are below
the threshold (as in the example we were just benchmarking, supposing
a threshold of 0.001).

An alternative is to pick a threshold T for the maximum number of
equality probes that you're willing to suffer through.  Then given two
MCV lists of lengths M1 and M2 such that M1 * M2 > T, pick the largest
N such that MIN(M1, N) * MIN(M2, N) <= T.  This guarantees that you
always use at least T^(1/2) MCVs.

If you compare this approach with T = 10^6 vs. simply chopping off the
MCV list at p = 0.001, this approach will be more accurate at the cost
of more comparisons.  For example in our test case where all the
comparisons fail, chopping off the MCV list at p = 0.001 results in
ignoring it completely, whereas with this approach we use all 1000
entries just as before.  So it might be appropriate to choose a lower
threshold like, say, T = 10^5, since otherwise we're not preventing
any computation.  (I suppose you could even make this a GUC since any
choice of value is going to be pretty arbitrary...)

I'm not sure to what extent we can bound the amount of error with this
approach, but it's definitely better.  As we've seen, a frequency
cutoff can throw away the entire MCV list; this approach always
retains at least T^(1/2) entries, and more if the other list happens
to be shorter than T^(1/2).  So we can say that the result will never
be worse than it would have been had you set the statistics target to
T^(1/2).

...Robert



Re: [HACKERS] psql commands for SQL/MED

2008-12-12 Thread Dimitri Fontaine


Hi,

Le 12 déc. 08 à 12:58, Peter Eisentraut a écrit :

The current proposed patch allocates the following psql \d commands:

[...]
In an idle second I thought, how about F for foreign, but of  
course \dF* is already used for full-text search.  We could overload  
the F, but it might be weird.


Other ideas?


What about \dM prefix, M standing for MED? It seems free in my 8.3  
psql here.


Regards,
- --
dim






Re: [HACKERS] visibility maps

2008-12-12 Thread Dimitri Fontaine


Hi,

Le 12 déc. 08 à 13:11, Pavan Deolasee a écrit :

I did some tests today with pgbench on a decent SMP machine. The goal
was to see if multiple clients (20 in the test) tries to update tuples
in different data blocks, if the EX lock on the VM page causes any
contention.


If you want to test for a really high number of clients, you could  
have a try at tsung, which is designed for doing just this.

  http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
  http://tsung.erlang-projects.org/

HTH, regards,
- --
dim






Re: [HACKERS] psql commands for SQL/MED

2008-12-12 Thread Alvaro Herrera
Dimitri Fontaine wrote:

 Le 12 déc. 08 à 12:58, Peter Eisentraut a écrit :

 In an idle second I thought, how about F for foreign, but of course 
 \dF* is already used for full-text search.  We could overload the F, 
 but it might be weird.

 Other ideas?

 What about \dM prefix, M standing for MED? It seems free in my 8.3 psql 
 here.

So \dMf for foreign servers, \dMu for user mappings, etc?  That seems good.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [Patch] Space reservation (pgupgrade)

2008-12-12 Thread Heikki Linnakangas

Zdenek Kotala wrote:

I attached patch which add capability to reserve space on page for
future upgrade. It is mandatory for future in-place upgrade 
implementation. This patch contains basic infrastructure not preupgrade 
script itself. I'm going to send WIP preupgrade script today in separate 
mail.


Is that a preupgrade script for upgrading from 8.3 to 8.4? As such, it 
can't take advantage of any of the changes in this patch.



This patch contains following modifications:

1) I added datpreupgstatus and relpreupgstatus attribute into 
pg_database and pg_class. Original idea was to use only flag, but I need 
more info for tracking several process status (like 0 - not set, 1 - 
reserved space set, 2 - reservation is finished and so on).


I'm not sure if datpreupgstatus will be useful, but I think better is to 
have it here.


I think this is too flexible and not flexible enough at the same time. 
It's too flexible, because we don't know what information we'd need to 
store about relations in a future script. Those fields might go unused 
for many releases, just confusing people. It's not flexible enough, if 
it turns out that we need to store more information about relations.


Predicting features that have not yet been written is hard, isn't it?
Trying to do it too precisely will just lead to failure. The generic
approach of using a pre-upgrade script is good, but I don't think it's
wise to prepare anything more specific than that.


It seems that those flags were meant to keep track of what databases and 
relations have already been processed by the pre-upgrade script. I don't 
think the script needs to be restartable. If we accept that the whole 
cluster must be processed in one go, we don't need so much bookkeeping. 
Remember that this is a tool that we'll need to backport to people's 
production systems, so better keep it as simple as possible.


2) I added two reloption rs_perpage and rs_pertuple for setup amount of 
reserved space. I think these two attributes are enough for configure 
all case. Keep in mind that for each relation could be these parameters 
different.


I'm afraid these too are too flexible and not flexible enough.

For example, if we change the representation of a data type so that some 
values become longer, some shorter, how much space would you reserve per 
page and per tuple?


In the future release, when we know exactly what the new on-disk format 
looks like, we can backpatch a patch that reserves the right amount of 
space on pages.


Note that from a testing point of view, those reloptions would go unused 
until it's time to upgrade to the next release, so it wouldn't be 
significantly less risky to just backpatch code to do the calculation at 
that point, vs. implementing some generic formula based on per-page and 
per-tuple reservation earlier.


3) I adapted source code to respect new reloptions. Basic idea of it is 
that before someone call PageAddItem it checks free space on a page 
(PageGetFreeSpace...). I modify PageGetFreeSpace function to count 
reserved space. Unfortunately, it requires additional parameters.


Yeah, I think that's the right place to do it.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 11, 2008, at 3:42 PM, Bruce Momjian wrote:


what do you thing about?

select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)

example
select dosome(10,20,30, SET flaga = true, flagb = false)


I think AS reads more naturally because you expect the parameter to come
first, not the SET keyword.


Coming to this a bit late, but it seems to me that, while it makes  
sense to assign a label to a value using AS, it's kind of weird to  
use it to assign a value to a label.


SELECT foo( bar => 'ick', baz => 'ack' );
SELECT foo( bar AS 'ick', baz AS 'ack' );

As a Perl hacker, I'm strongly biased toward =>, but I guess AS isn't  
*too* bad. At least it's the same number of characters. Is -> right out?


Best,

David





Re: [HACKERS] SQL/MED compatible connection manager

2008-12-12 Thread Peter Eisentraut
Now I have a question about the FDW C interface.  The way I understand 
it, an SQL/MED-enabled server and a FDW each have a specific API by 
which they communicate.  Supposedly, each database vendor should be able 
to ship a binary library for its FDW and each SQL/MED-enabled server 
should be able to load and use it.  (If you don't believe in binary 
compatibility, then I think there should at least be source-level 
interface compatibility.)


Now the way I read the FDWs you provide (default and pgsql), you are 
creating your own API for initialization and options validation that is 
not in the standard.  That would appear to contradict the idea of a 
standard interface.  I understand that option validation is useful, and 
I don't see anything about it in the standard, but should we break the 
API like that?  What are your designs about this?




[HACKERS] psql commands for SQL/MED

2008-12-12 Thread Peter Eisentraut

The current proposed patch allocates the following psql \d commands:

\dw show foreign-data wrappers
\dr show foreign servers
\dm show user mappings

One might object that this allocates valuable letters for infrequently 
used functionality.


In an idle second I thought, how about F for foreign, but of course 
\dF* is already used for full-text search.  We could overload the F, but 
it might be weird.


Other ideas?



Re: [HACKERS] Mostly Harmless: Welcoming our C++ friends

2008-12-12 Thread Tom Lane
Kurt Harriman harri...@acm.org writes:
 However, probably an easier alternative would be to have
 just one buildfarm machine do a nightly build configured
 with the --enable-cplusplus option.

There is no such option, and won't be.

 This would build one file - main.c - as C++ (necessary
 because on some platforms the main() function needs to be
 C++ to ensure the C++ runtime library is initialized).

Useless, since main.c doesn't include any large number of headers,
and in particular there is no reason for it to include the headers
that are critical to function libraries.

regards, tom lane



Re: [HACKERS] psql commands for SQL/MED

2008-12-12 Thread Martin Pihlak
Alvaro Herrera wrote:
 What about \dM prefix, M standing for MED? It seems free in my 8.3 psql 
 here.
 
 So \dMf for foreign servers, \dMu for user mappings, etc?  That seems good.
 

I find the mixed-case commands somewhat inconvenient -- too easy to make
typing mistakes, and also slow to enter. If we can afford it, I would suggest
\dmw, \dms and \dmu (wrappers, servers and user mappings).

regards,
Martin




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Peter Eisentraut

KaiGai Kohei wrote:
I would like to be able to assign SQL-level ACLs and SELinux labels to 
the same row at the same time in the same build, and have the system 
enforce both on top of each other.


In my opinion, it creates more pain (user interface, performance, implementation
complexity and so on) than the benefit of supporting MAC and DAC concurrently
is worth.


I am a bit surprised.  I'd consider the ability to do DAC and MAC 
concurrently to be absolutely essential, for several reasons:


1. DAC is managed by users, MAC by administrators.

2. They address different but concurrent use cases.

3. Transitioning to MAC will be a lot easier if it doesn't require you 
to drop the DAC configuration all at once.


4. You don't propose to drop table ACLs if you enable SELinux, do you? 
Same issue.


5. It's possible!

We can debate the merits of having, say, SELinux plus Solaris TX at 
the same time, but if we can have two as per previous paragraph, we 
should design for several.


What platform is available for both SELinux and Solaris TX?


Well, Solaris, if you believe various rumours.  I agree the case for 
this might be weak, though.


Are you saying about Solaris FMAC project?
It is a different platform from Trusted Solaris.


Trusted Solaris was a separate fork of Solaris, which is now legacy. 
The current thing is Solaris Trusted Extensions (TX), which is 
integrated in normal Solaris.  So when the FMAC project produces 
something, it should conceivably be available in parallel to the current 
TX stuff.




Re: [HACKERS] lifetime of TupleTableSlot* returned by ExecProcNode

2008-12-12 Thread Tom Lane
Bramandia Ramadhana braman...@gmail.com writes:
 As per title, what is the lifetime of the virtual tuple TupleTableSlot*
 returned by ExecProcNode?

Until you next call that same plan node.
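The rule stated here can be demonstrated with a standalone analogy (plain C, not PostgreSQL internals): like the slot returned by ExecProcNode, `next_row` below reuses one internal buffer, so the previous result is only valid until the next call, and a caller who needs it longer must copy it first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns a pointer into a single static buffer that is overwritten
 * on every call -- analogous to a slot reused by a plan node. */
static const char *
next_row(int i)
{
    static char buf[16];
    snprintf(buf, sizeof(buf), "row-%d", i);
    return buf;
}
```

The same discipline applies to real executor code: materialize or copy the tuple out of the slot before driving the plan node again.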

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Ian Caulfield
2008/12/12 David E. Wheeler da...@kineticode.com:
 On Dec 11, 2008, at 3:42 PM, Bruce Momjian wrote:

 what do you thing about?

 select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)

 example
 select dosome(10,20,30, SET flaga = true, flagb = false)

 I think AS read more naturally because you expect the parameter to come
 first, not the SET keyword.

 Coming to this a bit late, but it seems to me that, while it makes sense to
 assign a label to a value using AS, it's kind of weird to use it to assign
 a value to a label.

 SELECT foo( bar => 'ick', baz => 'ack' );
 SELECT foo( bar AS 'ick', baz AS 'ack' );

 As a Perl hacker, I'm strongly biased toward =>, but I guess AS isn't *too*
 bad. At least it's the same number of characters. Is -> right out?


Personally I'm not keen on named parameter assignment, but if 'AS' is
unpopular, and '=>' et al conflict with operators, would Verilog-style
syntax - eg function( .param(value) ) - be an idea?

Ian



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 12:53 +0900, Fujii Masao wrote:
 
  Quite possibly a terminology problem... In my case I said sync rep
  meaning the mode such that the transaction doesn't commit successfully
  for my PG client until the xlog record has been streamed to the
  client... and I understand that at his presentation at PGcon, Fujii-san
  said there could be possible variants on when the streaming is considered
  done based on network, slave ram, disk, application, etc.
 
 I'd like to define the meaning of synch rep again. synch rep means:
 
 (1) Transaction commit waits for WAL records to be replicated to the standby
   before the command returns a success indication to the client.

 (2) The standby has (can read) all WAL files indispensable for recovery.

I would change "can read" in (2) to "has access to". "Can read" implies
we have read all files and checked CRCs of individual records.


The crux of this is what we mean by synchronous_replication = on.
There are two possible meanings:

1. Commit will wait only if streaming is available and has waited for
all necessary startup conditions.
This provides Highest Availability

2. Commit will wait *until* full sync rep is available. So we don't
allow commits until the standby is up, and also don't allow them if the
standby goes down.
This provides Highest Transaction Durability, though is fairly
fragile. Other systems recommend use of multiple standby nodes if this
option is selected.

Perhaps we should add this as a third option to synchronous_replication,
so we have either off, on, only

So far I realise I've been talking exclusively about (1). In that mode
synchronous_replication = on would wait for streaming to complete even
if last WAL file not fully transferred. 

For (2) we need a full interlock. Given that we don't currently support
multiple streamed standby servers, it seems not much point in
implementing the interlock (2) would require. Should we leave that part
for 8.5, or do it now?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Zeugswetter Andreas OSB sIT

 If we use some type of integer, I suggest using this structure for
 pg_security:
 
   CREATE TABLE pg_security(
   relid oid, 
   secid int2, 
   secacl aclitem[], 
   secext TEXT
   );
 
 This allows the per-row value to be a simple int2.  It also improves
 maintenance because rows are associated only with a specific table;
 unused values can then be removed more easily.  And it allows both
 secacl and secext security to be specified.

I do not expect that the number of unique combinations of rights
strongly varies between the tables. Thus I think creating pg_security rows
per table would vastly increase the size of pg_security.
The expected size of pg_security is small in the current implementation.

Example: security_context = top_secret_t
With above schema you need one row in pg_security for each table that has 
top_secret_t rows.
The current implementation only needs one row for this, which is imho better.

CREATE TABLE pg_security(
secid serial, 
secacl aclitem[], 
secext TEXT
);

This may be OK, but I am with KaiGai in that it is not obvious how to update the
security context syntactically when using two subsystems simultaneously.
But using, restricting and selecting is easy.

Andreas


Re: [HACKERS] psql commands for SQL/MED

2008-12-12 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Dimitri Fontaine wrote:
 What about \dM prefix, M standing for MED? It seems free in my 8.3 psql 
 here.

 So \dMf for foreign servers, \dMu for user mappings, etc?  That seems good.

I'd suggest "e" for "external".  "M" for "management" is a pretty useless
mnemonic --- what's being managed?

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Dimitri Fontaine


Hi,

Le 12 déc. 08 à 14:14, Ian Caulfield a écrit :

unpopular, and '=' et al conflict with operators, would verilog-style
syntax - eg function( .param(value) ) - be an idea?


Ok, time to revisit the classics then ;)
  http://www.gigamonkeys.com/book/functions.html#keyword-parameters

That would give us things like this:
  SELECT foo(1, :name 'bar', :quantity 10);

As the colon character does not appear in the list of allowed characters
for CREATE OPERATOR, it seems it could be valid:
  http://www.postgresql.org/docs/8.3/interactive/sql-createoperator.html


Regards,
--
dim





Re: [HACKERS] [Patch] Space reservation (pgupgrade)

2008-12-12 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:

I attached patch which add capability to reserve space on page for
future upgrade. It is mandatory for future in-place upgrade 
implementation. This patch contains basic infrastructure not 
preupgrade script itself. I'm going to send WIP preupgrade script 
today in separate mail.


Is that a preupgrade script for upgrading from 8.3 to 8.4? As such, it 
can't take advantage of any of the changes in this patch.


No, it is a preupgrade script for 8.4->8.5; better said, it is code which tries to 
make space on a page - part of a future preupgrade script. It will be useful when 
8.5 is out. But probably we will need to make some improvements in the 8.4 code.





This patch contains following modifications:

1) I added datpreupgstatus and relpreupgstatus attribute into 
pg_database and pg_class. Original idea was to use only flag, but I 
need more info for tracking several process status (like 0 - not set, 
1 - reserved space set, 2 - reservation is finished and so on).


I'm not sure if datpreupgstatus will be useful, but I think it is better 
to have it here.


I think this is too flexible and not flexible enough at the same time. 
It's too flexible, because we don't know what information we'd need to 
store about relations in a future script. Those fields might go unused 
for many releases, just confusing people. It's not flexible enough, if 
it turns out that we need to store more information about relations.


Yes, that is the reason why I use an int instead of a bool: it can be used differently 
in each version. Of course nobody knows what will happen in future 
development, but I currently implement what I know we need. If we will need 
more than ...


Predicting features that have not yet been written is hard, isn't it. 
Trying to do it too precisely will just lead to failure. The generic 
approach of using a pre-upgrade script is good, but I don't think it's 
wise to prepare anything more specific than that.


It seems that those flags were meant to keep track of what databases and 
relations have already been processed by the pre-upgrade script. I don't 
think the script needs to be restartable. If we accept that the whole 
cluster must be processed in one go, we don't need so much bookkeeping. 
Remember that this is a tool that we'll need to backport to people's 
production systems, so better keep it as simple as possible.


The problem is with the CREATE TABLE/INDEX commands. If somebody creates a table, then 
the table will not have correctly set attributes, and the new table is not prepared for 
upgrade. There are of course more solutions, like forbidding create table/index 
commands during preupgrade, or setting reserved space for a relation based on already 
known constants. For now I prefer the easiest way, which is to have three statuses.


The idea of the preupgrade script is the following:

1) for each relation, calculate the reserved space per page and per tuple. Per page 
it is just the difference between page header sizes and maybe the special size. But per 
tuple it requires calculating a per-attribute difference - potential null values 
will be ignored. The calculation will only be known with the next release, but we need 
basic support for it now.
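The per-tuple part of that calculation can be sketched as a sum of worst-case growth over the affected on-disk structures (tuple header, varlena attributes, toast pointers). The delta values below are invented placeholders; the real numbers are only known once the next release's on-disk format is fixed:

```c
#include <assert.h>
#include <stddef.h>

/* Invented per-structure size deltas between two on-disk formats. */
typedef struct
{
    size_t header_delta;      /* HeapTupleHeader growth */
    size_t varlena_delta;     /* growth per varlena attribute */
    size_t toastptr_delta;    /* growth per toastable attribute */
} FormatDeltas;

/* Worst-case extra bytes one tuple may need after conversion;
 * attributes that may be NULL are ignored, as they take no space. */
static size_t
reserve_per_tuple(const FormatDeltas *d, int nvarlena, int ntoastable)
{
    return d->header_delta
         + (size_t) nvarlena * d->varlena_delta
         + (size_t) ntoastable * d->toastptr_delta;
}
```

A future release would then only need to backport the actual delta constants, not the calculation code.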


2) when the reserved space is set, the preupgrade script starts to process all 
relations and performs the appropriate operation on each block.


4) upgrade checks in single mode whether all relations have the correct status.

2) I added two reloptions, rs_perpage and rs_pertuple, to set up the amount 
of reserved space. I think these two attributes are enough to configure 
all cases. Keep in mind that these parameters could be different for each 
relation.


I'm afraid these too are too flexible and not flexible enough.

For example, if we change the representation of a data type so that some 
values become longer, some shorter, how much space would you reserve per 
page and per tuple?


As I mentioned several times, a data type change should be handled differently. It 
should involve creating a new datatype and keeping the old datatype implementation (for 
example in a separate legacy library).


The tuple size difference is calculated on the following structures:

HeapTupleHeader (including DatumTupleHeader for composite datatypes)
OID size (+Security tag)
Array structures
Varlena implementation
Toast pointer

The preupgrade script should know the difference in size of these structures; it takes 
all attributes and counts the maximal additional size.


In the future release, when we know exactly what the new on-disk format 
looks like, we can backpatch a patch that reserves the right amount of 
space on pages.


Yes, but you need to have the infrastructure. It is better and safer to backport just 
constants than a big amount of code. Probably we will sometimes be in a situation 
when we will need to backport more, but every time we can postpone the problematic 
feature and update the in-place upgrade infrastructure in the current development 
release. See the CRC example.


Note that from a testing point of view, those reloptions would go unused 
until it's 

Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Pavel Stehule
2008/12/12 Dimitri Fontaine dfonta...@hi-media.com:

 Hi,

 Le 12 déc. 08 à 14:14, Ian Caulfield a écrit :

 unpopular, and '=>' et al conflict with operators, would verilog-style
 syntax - eg function( .param(value) ) - be an idea?

 Ok, time to revisit the classics then ;)
  http://www.gigamonkeys.com/book/functions.html#keyword-parameters

 That would give us things like this:
  SELECT foo(1, :name 'bar', :quantity 10);

 As colon character does not appear in the list of allowed characters for the
 CREATE OPERATOR, it seems it could be valid.
  http://www.postgresql.org/docs/8.3/interactive/sql-createoperator.html


I dislike making LISP from a nice PL :)

I think $name => is safe, but I didn't test it.

regards
Pavel

 Regards,
 --
 dim







Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 As a Perl hacker, I'm strongly biased toward =>, but I guess AS isn't  
 *too* bad. At least it's the same number of characters. Is -> right out?

It's just as bad as => from the perspective of usurping a probable
user-defined operator name.

I think the fundamental problem with *any* notation like that is that
we don't have a concept of reserved words in the operator name space;
and without a precedent for it it's tough to justify suddenly breaking
people's code.  As was already noted, you could damp down the objections
by choosing some long and ugly operator name, but that's hardly going
to be pleasant to use.

So I think that really this is never going to fly unless it uses a
keyword-looking reserved word.  And we're not going to take some short
word that's not reserved now and suddenly make it so.  So, despite
Pavel's objection that the AS syntax proposal might be confused with
other uses of AS, I seriously doubt that any proposal is going to get
accepted that doesn't recycle AS or some other existing reserved word.

regards, tom lane



Re: [HACKERS] PostgreSQL 8.3.4 reproducible crash

2008-12-12 Thread Alvaro Herrera
Tom Lane wrote:

 1. Ensure that a snapshot is set before doing parse analysis of any
 non-utility command.  (We *must* not set a snap before LOCK or a
 transaction control command, and it seems best to not do it for any
 utility command.)  One issue here is that this would change the behavior
 for mixed utility and non-utility commands in a single query string;
 though I'm not sure how much that matters.

I think this is the easiest way out, and the most robust -- we won't be
bitten by some other operation that the parser may think of doing.
(Note that utility commands have their snapshot set in
PortalRunUtility).  Also, perhaps this would let us clean the mess in
pg_plan_queries.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 2:33 PM, Dimitri Fontaine wrote:



Ok, time to revisit the classics then ;)
 http://www.gigamonkeys.com/book/functions.html#keyword-parameters

That would give us things like this:
 SELECT foo(1, :name 'bar', :quantity 10);

As colon character does not appear in the list of allowed characters  
for the CREATE OPERATOR, it seems it could be valid.
 http://www.postgresql.org/docs/8.3/interactive/sql- 
createoperator.html


Oh, I like the colon, but better at the end of the label:

 SELECT foo(1, name: 'bar', quantity: 10);

Best,

David



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 2:39 PM, Tom Lane wrote:


So I think that really this is never going to fly unless it uses a
keyword-looking reserved word.  And we're not going to take some short
word that's not reserved now and suddenly make it so.  So, despite
Pavel's objection that the AS syntax proposal might be confused with
other uses of AS, I seriously doubt that any proposal is going to get
accepted that doesn't recycle AS or some other existing reserved word.


I'm okay with AS if that's the way it has to be, but what about a  
colon right at the end of the label? A cast is two colons, so no  
conflict there:


  SELECT foo(1, name: 'bar', quantity: 10);

No doubt I'm missing something…

Best

David


Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 I'm okay with AS if that's the way it has to be, but what about a  
 colon right at the end of the label?

Hmm ... a colon isn't considered to be an operator name, so this
wouldn't break any existing usage.  I'm a little bit worried about
what we might be cutting ourselves off from in the future, but
maybe it's a good solution.

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Pavel Stehule
2008/12/12 David E. Wheeler da...@kineticode.com:
 On Dec 12, 2008, at 2:39 PM, Tom Lane wrote:

 So I think that really this is never going to fly unless it uses a
 keyword-looking reserved word.  And we're not going to take some short
 word that's not reserved now and suddenly make it so.  So, despite
 Pavel's objection that the AS syntax proposal might be confused with
 other uses of AS, I seriously doubt that any proposal is going to get
 accepted that doesn't recycle AS or some other existing reserved word.

When I should exactly identify a param name, then we should use some symbols.


 I'm okay with AS if that's the way it has to be, but what about a colon
 right at the end of the label? A cast is two colons, so no conflict there:

  SELECT foo(1, name: 'bar', quantity: 10);

it looks well, but I still prefer some combination with =>

name: = ''
name: = '''
:name = ''
$name = ..
$name = ..

Maybe I am too conservative
Pavel


 No doubt I'm missing something…

 Best

 David



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-12 Thread Aidan Van Dyk
* Simon Riggs si...@2ndquadrant.com [081212 08:20]:
 
 2. Commit will wait *until* full sync rep is available. So we don't
 allow it until standby fails and also don't allow it if standby goes
 down.
 This provides Highest Transaction Durability, though is fairly
 fragile. Other systems recommend use of multiple standby nodes if this
 option is selected.

yes please!

 Perhaps we should add this as a third option to synchronous_replication,
 so we have either off, on, only
 
 So far I realise I've been talking exclusively about (1). In that mode
 synchronous_replication = on would wait for streaming to complete even
 if last WAL file not fully transferred. 

Seems reasonable...

 For (2) we need a full interlock. Given that we don't currently support
 multiple streamed standby servers, it seems not much point in
 implementing the interlock (2) would require. Should we leave that part
 for 8.5, or do it now?

Ugh... If all sync-rep is going to give us is if it's working, the commit
made it to the slaves, but it might not be working [anymore|yet], but you
(the app using pg) have no way of knowing..., that sort of defeats the
point ;-)

I'd love multiple slaves, but I understand that's not in the current
work, and I understand that it might be hard with the accept & become
walsender approach.  It should be very easy to make a walsender handle
multiple slaves, and do voting of quorum/etc as successful on a slave,
except that we need to get the multiple connections to the walsender
backend...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] SQL/MED compatible connection manager

2008-12-12 Thread Martin Pihlak
Peter Eisentraut wrote:
 Now the way I read the FDWs you provide (default and pgsql), you are
 creating your own API for initialization and options validation that is
 not in the standard.  That would appear to contradict the idea of a
 standard interface.  I understand that option validation is useful, and
 I don't see anything about it in the standard, but should we break the
 API like that?  What are your designs about this?
 

Hmm, in that perspective it would make sense to make the InitializeFdw
function optional (it was, before I got worried about library reloads).
If no InitializeFdw is present, connection lookup and option validation
are disabled. All of the standard defined FDW functions are fetched by
load_external_function. This way we could have the additional features
and still be able to load standard conforming FDW's.

Actually it would make sense to use _PG_init instead of InitializeFdw.
This way it'd be called automatically on library load, the parameter(s)
would be passed in globals though.

regards,
Martin




Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Rod Taylor
How about IS or INTO?

param_name IS 3
param_name IS 'some string value'

3 INTO param_name
'some string value' INTO param_name




On Fri, Dec 12, 2008 at 8:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2008/12/12 David E. Wheeler da...@kineticode.com:
 On Dec 12, 2008, at 2:39 PM, Tom Lane wrote:

 So I think that really this is never going to fly unless it uses a
 keyword-looking reserved word.  And we're not going to take some short
 word that's not reserved now and suddenly make it so.  So, despite
 Pavel's objection that the AS syntax proposal might be confused with
 other uses of AS, I seriously doubt that any proposal is going to get
 accepted that doesn't recycle AS or some other existing reserved word.

 when I should exactly identify param name, the we should to use any symbols.


 I'm okay with AS if that's the way it has to be, but what about a colon
 right at the end of the label? A cast is two colons, so no conflict there:

  SELECT foo(1, name: 'bar', quantity: 10);

 it looks well, but I still prefer some combination with =>

 name: = ''
 name: = '''
 :name = ''
 $name = ..
 $name = ..

 Maybe I am too conservative
 Pavel


 No doubt I'm missing something…

 Best

 David





Re: [HACKERS] PostgreSQL 8.3.4 reproducible crash

2008-12-12 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 1. Ensure that a snapshot is set before doing parse analysis of any
 non-utility command.

 I think this is the easiest way out, and the most robust -- we won't be
 bitten by some other operation that the parser may think of doing.

Yeah.  I think we probably have to do that in any case because we have
an assumption that datatype input routines are allowed to make use of
a snapshot (see comments in fastpath.c for instance).  The fact that
no one's noticed this crash before suggests that none of the common ones
actually do, but I don't think we want to back off that assumption.

There's still a question of whether we want to alter the treatment of
record-type input to make the handling of embedded domains more uniform,
but that's something for the future.

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
Rod Taylor rod.tay...@gmail.com writes:
 How about IS or INTO?

IS isn't a fully reserved word, and INTO seems pretty weird for this.

(IS is a type_func_name_keyword, so maybe we could make it work anyway,
but it sounds a bit fragile.)

regards, tom lane



[HACKERS] table types/check constraints

2008-12-12 Thread Merlin Moncure
In my gripe/suggestion here:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00642.php

I noted that it might be nice to handle check constraints over
composite types.  It turns out that table check constraints are not
enforced during casts:

postgres=# create table foo(f1 int, f2 int, check(f1 != f2));
postgres=# select (1,1)::foo;
  row
---
 (1,1)
(1 row)

Should they be?  I've been thinking about how to marry tables and
composite types.
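For contrast, the same constraint does fire when a row is actually stored in the table; a small illustration of the 8.3 behavior described above (expected outcomes paraphrased in comments, not verbatim server messages):

```sql
-- constructing the composite value bypasses the table check:
SELECT (1,1)::foo;              -- succeeds
-- storing a row in the table enforces it:
INSERT INTO foo VALUES (1, 1);  -- fails with a check-constraint violation
```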

merlin



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 2:47 PM, Pavel Stehule wrote:


it looks well, but I still prefer some combination with =>

name: = ''
name: = '''
:name = ''
$name = ..
$name = ..

Maybe I am too conservative


Given that the colon already indicates "this label corresponds to that  
value", the other operator characters are redundant. In English, I  
write things like this:

 first: go to store
 second: get eggs

See what I mean? I quite like the colon solution.

Best,

David



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Sam Mason
On Fri, Dec 12, 2008 at 09:00:52AM -0500, Rod Taylor wrote:
 How about IS or INTO?
 
 param_name IS 3
 param_name IS 'some string value'

that wouldn't work with NULL would it?  for example is:

  a IS NULL

checking if identifier 'a' IS NULL, or if you're giving NULL to
parameter 'a'.

 3 INTO param_name
 'some string value' INTO param_name

looks good.  Just to throw another item in, you could keep with SQL's
general verboseness and use:

  WITH ident = expr

that may be too much though.  Names that were mentioned in the keyword
file are:

  AS IS WITH ON HAVING INTO
and the following un-reserved entries
  MATCH NAME NAMES


  Sam



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Pavel Stehule
 right at the end of the label? A cast is two colons, so no conflict there:

  SELECT foo(1, name: 'bar', quantity: 10);

 it's look well, but I still prefer some combination with =

 name: = ''
 name: = '''
 :name = ''
 $name = ..
 $name = ..

hmm :( $name isn't possible
:name is in conflict with vars in psql :(




 Maybe I am too conservative
 Pavel


 No doubt I'm missing something…

 Best

 David




Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
I wrote:
 Rod Taylor rod.tay...@gmail.com writes:
 How about IS or INTO?

 IS isn't a fully reserved word, and INTO seems pretty weird for this.

 (IS is a type_func_name_keyword, so maybe we could make it work anyway,
 but it sounds a bit fragile.)

Actually, there's an obvious counterexample for IS:

select func(foo IS NULL)

Two possible meanings...

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Heikki Linnakangas

David E. Wheeler wrote:
Coming to this a bit late, but it seems to me that, while it makes sense 
to assign a label to a value using AS, it's kind of weird to use it to 
assign a value to a label.


SELECT foo( bar => 'ick', baz => 'ack' );
SELECT foo( bar AS 'ick', baz AS 'ack' );


We could do it the other way round:

SELECT foo( 'ick' AS bar, 'ack' AS baz);

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Robert Haas
 I'm okay with AS if that's the way it has to be, but what about a colon
 right at the end of the label? A cast is two colons, so no conflict there:

  SELECT foo(1, name: 'bar', quantity: 10);

 No doubt I'm missing something…

I'd just like to mention that there are two different cases to
consider here.  One is when you want to pass some optional parameters,
but there are enough of them that it's inconvenient to have them in
some particular order.  This is the case I think you're primarily
catering to here.

The other is when you want the function that gets called to magically
know what name the system would have assigned to the column had the
expression been used in a select list, so that you can write things
like xmlify(foo) and get <foo>...data from foo...</foo>.

I think the AS syntax makes a lot of sense for the second one, but not
so much for the first one.  Maybe we need both:

[keyword:] parameter_expression [AS label]

...Robert

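For readers trying to pin down the intended semantics rather than the surface
syntax: the positional, named, and mixed notation (plus defaults) being
discussed in this thread maps closely onto Python's calling convention. A
purely illustrative sketch of the target behavior — the function and values
here are made up, and this is of course not a proposal for SQL syntax:

```python
# Illustrative only: Python already implements positional, named, and
# mixed argument notation with defaults, which is the behavior the
# SQL-level feature is aiming for.
def foo(f1, bar="ick", baz="ack"):
    return (f1, bar, baz)

print(foo(1))                           # positional; defaults fill the rest
print(foo(1, baz="ACK"))                # mixed: positional first, then named
print(foo(baz="ACK", bar="ICK", f1=1))  # fully named, order-independent
```

The two open questions in the thread — which token marks the name, and whether
named arguments may precede positional ones — are exactly the choices Python
made with `=` and its "positionals first" rule.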


Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Pavel Stehule
2008/12/12 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 David E. Wheeler wrote:

 Coming to this a bit late, but it seems to me that, while it makes sense
 to assign a label to a value using AS, it's kind of weird to use it to
 assign a value to a label.

 SELECT foo( bar => 'ick', baz => 'ack' );
 SELECT foo( bar AS 'ick', baz AS 'ack' );

 We could do it the other way round:

 SELECT foo( 'ick' AS bar, 'ack' AS baz);


I discussed this form with Tom.

I think the following should be readable:

name: [ optional => ] value

  SELECT foo( bar: 'ick', baz: 'ack' );
  SELECT foo( bar: => 'ick', baz: => 'ack' );

or

  SELECT foo( bar: => 'ick', baz: => 'ack' );

The reason for making => optional is that ':' is a very thin
character, so => optically boosts the colon.

Pavel



this is 100% compatible because the name: syntax is a new token

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Greg Stark
On Fri, Dec 12, 2008 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS, marginal enlargements in the sample size aren't going to help
 much for ndistinct --- you really need to look at most or all of the
 table to be guaranteed anything about that.

Well you only need to maintain a fixed percentage of the table if by
guaranteed anything you mean guaranteed a consistent level of
confidence. But even a small percentage like 1% means a very different
behaviour than currently. For large tables it could mean sampling a
*lot* more.

However if by guaranteed anything you mean guaranteeing an actual
useful result then it's true. Even samples as large as 50% give a
pretty low confidence estimate.


 But having said that, I have wondered whether we should consider
 allowing the sample to grow to fill maintenance_work_mem

Hm, so I wonder what this does to the time analyze takes. I think it
would be the only thing where raising maintenance_work_mem would
actually increase the amount of time an operation takes. Generally
people raise it to speed up index builds and vacuums etc.

-- 
greg



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 3:38 PM, Pavel Stehule wrote:


I discussed this form with Tom.

I think the following should be readable:

name: [ optional => ] value

 SELECT foo( bar: 'ick', baz: 'ack' );
 SELECT foo( bar: => 'ick', baz: => 'ack' );

or

 SELECT foo( bar: => 'ick', baz: => 'ack' );

The reason for making => optional is that ':' is a very thin
character, so => optically boosts the colon.


Hrm. I can see that, I guess. In that case, though, I think I'd prefer  
the colon at the beginning of the parameter label:


  SELECT foo( :bar => 'ick', :baz => 'ack' );

In that case, though, I'd want the => to be required. Note that
there's a precedent here, too: this is Ruby's syntax for using
symbols for parameter names.



this is 100% compatible because the name: syntax is a new token


Interesting. I hadn't expected the use of the colon to make the use of
=> okay. Cool that it does, though.


Best,

David



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 11, 2008 at 10:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Maybe so.  If we stick to the other design (end both lists at a preset
 frequency threshold) then the math clearly goes through the same as
 before, just with num_mcvs that are determined differently.  But can
 we prove anything about the maximum error added from that?

 I don't think so, because in that design, it's entirely possible that
 you'll throw away the entire MCV list if all of the entries are below
 the threshold (as in the example we were just benchmarking, supposing
 a threshold of 0.001).

Right, but the question is how much that really hurts.  It's not like
we are going to pick a completely clueless number for the ignored MCVs;
rather, we are going to assume that they have the same stats as the
remainder of the population.  If the threshold frequency isn't very
large then the error involved should be bounded.  As an example, in the
perfectly flat distribution set up by the speed tests we were just
doing, there actually wouldn't be any error at all (assuming we got
ndistinct right, which of course is a pretty big assumption).  I haven't
consumed enough caffeine yet to try to do the math, but I think that if
you set the threshold as something a bit more than the assumed frequency
of a non-MCV value then it could work.

 An alternative is to pick a threshold T for the maximum number of
 equality probes that you're willing to suffer through.

I'd like to get there from the other direction, ie figure out what
T has to be to get known maximum error.

regards, tom lane

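Tom's argument — that values dropped from the MCV list get estimated at the
average non-MCV frequency, so the added error stays bounded — can be sketched
numerically. This is only an illustration of the idea, not the planner's
actual code; the distribution, ndistinct, and threshold values are invented:

```python
# Sketch of the "preset frequency threshold" design: values below the
# threshold are dropped from the MCV list and later estimated as if
# spread evenly over the remaining distinct values.
def estimate_freq(known_freqs, ndistinct, threshold, value):
    mcv = {v: f for v, f in known_freqs.items() if f >= threshold}
    if value in mcv:
        return mcv[value]
    remainder = 1.0 - sum(mcv.values())        # mass outside the MCV list
    return remainder / (ndistinct - len(mcv))  # assume a flat distribution

# 100 distinct values: three common ones, one rare, 96 in a flat tail.
known = {"a": 0.20, "b": 0.10, "c": 0.05, "d": 0.002}
est_a = estimate_freq(known, 100, 0.01, "a")  # survives as MCV: exact
est_d = estimate_freq(known, 100, 0.01, "d")  # folded into the remainder
print(est_a, est_d)  # error on "d" stays well under the 0.01 threshold
```

Here "d" (true frequency 0.002) is estimated at about 0.0067, so the absolute
error is below the threshold — consistent with the intuition that the
threshold itself caps the damage, as long as ndistinct is roughly right.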


Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Gregory Stark
Pavel Stehule pavel.steh...@gmail.com writes:

 2008/12/12 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:

 We could do it the other way round:

 SELECT foo( 'ick' AS bar, 'ack' AS baz);

I always assumed we were talking about it this way. Everywhere else in SQL AS
is followed by the labels, not the values.

 I discussed this form with Tom.

 I think the following should be readable:

 name: [ optional => ] value

   SELECT foo( bar: 'ick', baz: 'ack' );
   SELECT foo( bar: => 'ick', baz: => 'ack' );

 or

   SELECT foo( bar: => 'ick', baz: => 'ack' );

 The reason for making => optional is that ':' is a very thin
 character, so => optically boosts the colon.

These don't solve anything. There's nothing stopping you from defining a unary
prefix operator => or =.

In any case this is all weird. SQL isn't C and doesn't have random bits of
punctuation involved in syntax. It uses whole words for just about everything.
Anything you do using punctuation characters is going to look out of place.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread KaiGai Kohei

Peter Eisentraut wrote:

KaiGai Kohei wrote:
I would like to be able to assign SQL-level ACLs and SELinux labels 
to the same row at the same time in the same build, and have the 
system enforce both on top of each other.


In my opinion, it makes more pains (user-interface, performance, 
complexity

of implementation and so on) than its benefit which allows to support MAC
and DAC concurrently.


I am a bit surprised.  I'd consider the ability to do DAC and MAC 
concurrently to be absolutely essential, for several reasons:


1. DAC is managed by users, MAC by administrators.

2. They address different but concurrent use cases.


Yes.
Please note that I am not saying it is worthless.
However, when we implement it with a single security system column,
its demerits are unacceptable.

3. Transitioning to MAC will be a lot easier if it doesn't require you 
to drop the DAC configuration all at once.


4. You don't propose to drop table ACLs if you enable SELinux, do you? 
Same issue.


Yes; management of security attributes is a major part of any security
feature's work. Thus, when we switch the active security feature, those
attributes are no longer managed correctly. Imagine what happens when we
remount an ext3 filesystem without the acl flag, or boot a SELinux'ed
system with selinux=0.


5. It's possible!


Indeed, it is possible, but the fact is we also have some trade-offs.
The biggest problem is that the user interface for modifying security
attributes is unclear.

If we had two system columns, security_acl for DAC and security_label
for MAC, it would allow us to implement the feature without remarkable
pain: we could simply update security_acl to change per-tuple ACLs.
However, again, we return to the start of the discussion.

Tom said:
 Wait a minute.  The original argument for providing SQL-driven row level
 security was that it would help provide a framework for testing the code
 and doing something useful with it on non-selinux platforms.

I think this is not an issue with a perfect answer, so we need to decide
on a way to implement it. My preference is the 1-security-system-column,
1-security-feature design.

I tried to summarize the proposed options, as follows:

o : merit   x : demerit   X : unacceptable demerit

* 1 security system column, 1 security feature (DAC or MAC)
 o It is suitable for a single security system column implementation.
 x If a user wants to use both DAC and MAC concurrently, he has
   to choose one of them.
 o It allows all security features on the common framework,
   suitable for the original Row-level ACLs purpose.

* 2 security system columns, 2 security features (DAC and MAC)
 o It allows both DAC and MAC concurrently, without remarkable
   regressions.
 x It needs two new security system columns.
 x What, then, was the original purpose of Row-level security?

* 1 security system column, 2 security features
 X It gives us a catastrophic regression in user interface, performance
   and code complexity. Its merit is trivial compared to its demerits.

--
KaiGai Kohei



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Pavel Stehule
2008/12/12 David E. Wheeler da...@kineticode.com:
 On Dec 12, 2008, at 3:38 PM, Pavel Stehule wrote:

 I discussed this form with Tom.

 I think the following should be readable:

 name: [ optional => ] value

  SELECT foo( bar: 'ick', baz: 'ack' );
  SELECT foo( bar: => 'ick', baz: => 'ack' );

 or

  SELECT foo( bar: => 'ick', baz: => 'ack' );

 The reason for making => optional is that ':' is a very thin
 character, so => optically boosts the colon.

 Hrm. I can see that, I guess. In that case, though, I think I'd prefer the
 colon at the beginning of the parameter label:

  SELECT foo( :bar => 'ick', :baz => 'ack' );

this syntax is already used:
http://www.postgresql.org/docs/8.3/interactive/app-psql.html

testdb=> \set foo 'my_table'
testdb=> SELECT * FROM :foo;

would then query the table my_table. The value of the variable is
copied literally, so it can even contain unbalanced quotes or
backslash commands. You must make sure that it makes sense where you
put it. Variable interpolation will not be performed into quoted SQL
entities.

A popular application of this facility is to refer to the last
inserted OID in subsequent statements to build a foreign key scenario.
Another possible use of this mechanism is to copy the contents of a
file into a table column. First load the file into a variable and then
proceed as above:

testdb=> \set content `cat my_file.txt`
testdb=> INSERT INTO my_table VALUES (:content);

regards
Pavel Stehule





 In that case, though, I'd want the => to be required. Note that there's a
 precedent here, too: this is Ruby's syntax for using symbols for parameter
 names.

 this is 100% compatible because the name: syntax is a new token

 Interesting. I hadn't expected the use of the colon to make the use of
 => okay. Cool that it does, though.

 Best,

 David




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 06:44 -0500, Robert Haas wrote:
 On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
  The existing sampling mechanism is tied to solid statistics. It
  provides the correct sample size to get a consistent confidence range
  for range queries. This is the same mathematics which governs election
  polling and other surveys. The sample size you need to get +/- 5% 19
  times out of 20 increases as the population increases, but not by very
  much.
 
  Sounds great, but its not true. The sample size is not linked to data
  volume, so how can it possibly give a consistent confidence range?
 
 I'm not 100% sure how relevant it is to this case, but I think what
 Greg is referring to is:
 
 http://en.wikipedia.org/wiki/Margin_of_error#Effect_of_population_size
 
 It is a pretty well-known mathematical fact that for something like an
 opinion poll your margin of error does not depend on the size of the
 population but only on the size of your sample.

Yes, I agree with that *but* that refers to population statistics and
has nothing at all to do with the calculation of ndistinct, which is
what we were discussing. You can't just switch topics and have the
statement remain true.

ndistinct estimation is improved by larger sample sizes, that's what the
maths says and what experimentation shows also.

Note that the estimator we use was shown to be stable in the range of
sample size between 5-20%.
http://www.almaden.ibm.com/cs/people/peterh/jasa3rj.pdf
We currently use a sample size of 300*stats_target. With default=10 that
means our sample size is 0.3% on a 1 million row table, and 0.003% on a
100 million row table (they're common, I find). 

That size of sample is OK for some kinds of statistics, but not much
good for ndistinct estimation.

These issues only show up in the field, they never show up on optimizer
test platforms because they typically are several orders of magnitude
too small. (Conversely, the stats system works very well indeed for
smaller tables... so I'm not trying to debunk everything).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


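The opinion-poll mathematics Robert cites can be made concrete. The standard
margin-of-error formula with the finite-population correction shows why table
size barely matters for proportion-style statistics (histogram bounds, MCV
frequencies) while, as Simon says, it says nothing about ndistinct. The
numbers below are illustrative:

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for an estimated proportion from a sample of
    n rows out of N, including the finite-population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# A 3000-row sample (300 * stats_target of 10) against growing tables:
for N in (10**5, 10**6, 10**8):
    print(f"{N:>10} rows: +/- {margin_of_error(3000, N):.4f}")
# The margin barely moves as N grows -- which is exactly why this
# argument applies to proportions, but not to ndistinct estimation.
```

For a 3000-row sample the margin is about +/- 1.8 percentage points whether
the table has a hundred thousand rows or a hundred million.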


Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 3:57 PM, Gregory Stark wrote:

In any case this is all weird. SQL isn't C and doesn't have random  
bits of
punctuation involved in syntax. It uses whole words for just about  
everything.
Anything you do using punctuation characters is going to look out of  
place.


Well, what do databases other than Oracle (which uses =>) do? What's
likely to end up in the standard?


Best,

David



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 3:56 PM, Pavel Stehule wrote:

Hrm. I can see that, I guess. In that case, though, I think I'd  
prefer the

colon at the beginning of the parameter label:

SELECT foo( :bar => 'ick', :baz => 'ack' );


this syntax is used yet
http://www.postgresql.org/docs/8.3/interactive/app-psql.html

testdb= \set foo 'my_table'
testdb= SELECT * FROM :foo;


Oh, right. Damn.

In that case, I'm happy with your proposal of

  name: [ => ] value

where => is optional.

Or, if that just doesn't fly for reasons such as those cited by Greg  
Stark, AS would seem to be the only choice left. Though what's on the  
lhs vs the rhs is debatable:


  SELECT foo( label AS 'value' );
  SELECT foo( 'value' AS label );

Maybe they're reversible?

Best,

David



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 Hrm. I can see that, I guess. In that case, though, I think I'd prefer  
 the colon at the beginning of the parameter label:

SELECT foo( :bar => 'ick', :baz => 'ack' );

That's ugly, and incompatible with ecpg syntax, and what's the redeeming
value anyway?

In any case, whichever side you put the colon on, Pavel's proposal for
adding => to it is a nonstarter --- he's ignoring the possibility that
=> is defined as a prefix operator.

Hmm ... actually, ecpg might be a problem here anyway.  I know it has
special meaning for :name, but does it allow space between the colon
and the name?  If it does then the colon syntax loses.  If it doesn't
then you could do name: value as long as you were careful to leave
a space after the colon.

regards, tom lane



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 09:35 -0500, Tom Lane wrote:

 AFAICS, marginal enlargements in the sample size aren't going to help
 much for ndistinct --- you really need to look at most or all of the
 table to be guaranteed anything about that.
 
 But having said that, I have wondered whether we should consider
 allowing the sample to grow to fill maintenance_work_mem, rather than
 making it a predetermined number of rows.  One difficulty is that the
 random-sampling code assumes it has a predetermined rowcount target;
 I haven't looked at whether that'd be easy to change or whether we'd
 need a whole new sampling algorithm.

I think we need to do block sampling before we increase sample size. On
large tables we currently do one I/O per sampled row, so the I/O cost of
ANALYZE would just increase linearly.

We need the increased sample size for ndistinct, not for other stats. So
I would suggest we harvest a larger sample, use that for ndistinct
estimation, but then sample-the-sample to minimise processing time for
other stats that aren't as sensitive as ndistinct.

Currently we fail badly on columns that have been CLUSTERed and we can
improve that significantly by looking at adjacent groups of rows, i.e.
block sampling.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Greg Stark st...@enterprisedb.com writes:
 On Fri, Dec 12, 2008 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 But having said that, I have wondered whether we should consider
 allowing the sample to grow to fill maintenance_work_mem

 Hm, so I wonder what this does to the time analyze takes. I think it
 would be the only thing where raising maintenance_work_mem would
 actually increase the amount of time an operation takes. Generally
 people raise it to speed up index builds and vacuums etc.

Yeah --- we might need to make it a separate GUC knob instead of tying
it directly to maintenance_work_mem.  But still, is *any* fixed-size
sample really going to help much for large tables?

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Dimitri Fontaine

On Dec 12, 2008, at 15:57, Gregory Stark wrote:
These don't solve anything. There's nothing stopping you from
defining a unary prefix operator => or =.


That's why I prefer the Common Lisp syntax of :param value, or
its variant param: value.


In any case this is all weird. SQL isn't C and doesn't have random  
bits of
punctuation involved in syntax. It uses whole words for just about  
everything.
Anything you do using punctuation characters is going to look out of  
place.


Well, with the exception of the function argument list, beginning with
'(' and ending with ')', where parameters are separated by ','. Maybe
some ':' in there would shock users.


  SELECT foo(a, b, c);
  SELECT foo(a, :c 5);
  SELECT foo(a, c: 5);

Not so much new punctuation characters there, 1 out of 4.
--
dim







Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 We could do it the other way round:
 SELECT foo( 'ick' AS bar, 'ack' AS baz);

Yeah, that's the direction I had always assumed that we would use,
if AS is the chosen syntax for this.

regards, tom lane



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Greg Stark
On Fri, Dec 12, 2008 at 3:11 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 That's why I'm preferring the common-lisp syntax of :param value, or its
 variant param: value.

FWIW there is no such common-lisp syntax. Colon is just a regular
symbol character and :param is just a regular symbol in common-lisp.
There is a convention that functions parse their argument lists
looking for such tokens as indicators of what to do with the next
argument but it's purely a convention. There's no syntactic
significance to the colon.

A similar problem arises with using Perl as a precedent. => is just a
regular operator in Perl which quotes the lhs as a string if it's a
simple token and otherwise behaves just like a comma. That would be
very different from what we're talking about having it do here.

-- 
greg



Re: [HACKERS] Polymorphic types vs. domains

2008-12-12 Thread Grzegorz Jaskiewicz

Any news on that front?



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Gregory Stark
Simon Riggs si...@2ndquadrant.com writes:

 On Fri, 2008-12-12 at 06:44 -0500, Robert Haas wrote:
 It is a pretty well-known mathematical fact that for something like an
 opinion poll your margin of error does not depend on the size of the
 population but only on the size of your sample.

 Yes, I agree with that *but* that refers to population statistics and
 has nothing at all to do with the calculation of ndistinct, which is
 what we were discussing. You can't just switch topics and have the
 statement remain true.

If you go back to my email that was kind of my point. The existing sample size
is on a solid foundation for the histograms and most use cases for the
statistics. But entirely bogus for ndistinct.

The ndistinct estimate is just piggy-backing on that data. However to fix it
would require switching over to scanning a percentage of the whole table which
would be a massive increase in work for that one calculation. You can't fix it
by just adjusting the sample size slightly.

 Note that the estimator we use was shown to be stable in the range of
 sample size between 5-20%.
 http://www.almaden.ibm.com/cs/people/peterh/jasa3rj.pdf

Uhm, this is a survey of lots of different methods and does lots of analysis.
I don't see any simple conclusions about stability. Perhaps I'm just missing
it in the technical details. Could you point out exactly what part of the
paper you're basing this on and what stable means?

 We currently use a sample size of 300*stats_target. With default=10 that
 means our sample size is 0.3% on a 1 million row table, and 0.003% on a
 100 million row table (they're common, I find). 

 That size of sample is OK for some kinds of statistics, but not much
 good for ndistinct estimation.

Right, but increasing our sample size by a factor of 150 for a 100M row table
doesn't seem like a reasonable solution to one metric being bogus.

For that matter, if we do consider sampling 5% of the table we may as well
just go ahead and scan the whole table. It wouldn't take much longer and it
would actually produce good estimates.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

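To see why the piggy-backed estimate goes wrong on skewed data, here is a
small seeded simulation using a Duj1-style smoothed jackknife estimator, the
shape of estimator discussed in the Haas & Stokes line of work linked
upthread (d = distinct values in the sample, f1 = values seen exactly once).
The table shape and all numbers are invented for illustration:

```python
import random

random.seed(42)

N = 1_000_000
SAMPLE = 3_000  # 300 * default stats target of 10

# Skewed column: 90% of rows hold one of 100 common values, the other
# 10% are unique ids, so the true ndistinct is 100_100.
def row_value(i):
    return i % 100 if i < 900_000 else i

def duj1(n, d, f1, N):
    """Smoothed jackknife-style estimate: n*d / (n - f1 + f1*n/N)."""
    return n * d / (n - f1 + f1 * n / N)

sample = [row_value(i) for i in random.sample(range(N), SAMPLE)]
counts = {}
for v in sample:
    counts[v] = counts.get(v, 0) + 1
d = len(counts)
f1 = sum(1 for c in counts.values() if c == 1)

estimate = duj1(SAMPLE, d, f1, N)
print(d, f1, round(estimate))  # lands around a few hundred,
                               # far below the true 100_100
```

The 0.3% sample sees all 100 common values many times and only a few hundred
of the 100,000 singletons, so the estimator has no way to tell this column
apart from one with a few hundred distinct values.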


Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Gregory Stark st...@enterprisedb.com writes:
 For that matter, if we do consider sampling 5% of the table we may as well
 just go ahead and scan the whole table. It wouldn't take much longer and it
 would actually produce good estimates.

Yeah.  Anything over a small fraction of a percent is going to imply
fetching every page anyway, for typical row widths.  If you want ANALYZE
to be cheap then you simply don't get to have a trustworthy value of
ndistinct.

Perhaps a better plan is to try to de-emphasize use of ndistinct,
though I concede I have no idea how to do that.

regards, tom lane

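Tom's "fetching every page anyway" point is easy to quantify. Assuming rows
are sampled uniformly and independently (a with-replacement approximation),
the chance that a heap page escapes the sample shrinks geometrically with the
rows-per-page count. The 100 rows per 8K page below is an assumed, typical
figure, not anything measured:

```python
def pages_touched(sample_frac, rows_per_page):
    """Expected fraction of heap pages holding at least one sampled row,
    for a uniform, independent row sample."""
    return 1 - (1 - sample_frac) ** rows_per_page

for frac in (0.003, 0.01, 0.05):
    print(f"{frac:>6.1%} sample -> {pages_touched(frac, 100):.1%} of pages read")
# A 5% row sample already pulls in ~99% of the pages, so a full scan
# costs barely more I/O and would yield an exact ndistinct.
```

Even the current 0.3% sample touches roughly a quarter of the pages at this
row width, which is why block-sampling proposals keep coming up.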


Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Kevin Grittner
 Nathan Boley npbo...@gmail.com wrote: 
 Can anyone suggest a good data set to test this sort of question on?
 
Where we have the biggest problem with bad estimates is on complex
searches involving many joins where the main criterion limiting the
result set is a name.  The estimate based on the histogram is often
very low (e.g. 2) when the actual result set is several hundred. 
While several hundred is far short of 1% of the table, the best plan
for a result set of that size is very different than the best plan for
two rows.
 
Some numbers follow to give an idea of the shape of data where current
techniques sometimes do poorly.  We use a searchName column which
puts the name components from various columns into a canonical format;
this is what is indexed and searched.  The search is usually a LIKE
with the high order portion being six to ten characters followed by
the wild card.
 
Total rows in table: 32,384,830
 
There are 9,958,969 distinct values.
 
There is one value present in over 1% of the rows, with 433,578 rows.
 
There are ten values present in over 0.1% of the rows:
 433578
 140398
 135489
 112088
  64069
  63158
  44656
  36499
  35896
  35819
 
The 100th most common value is present in 4847 rows.
 
There are 186 values present in over 0.01% of the rows.
 
Based on my experience, we would need better estimates for ranges with
200 to 300 rows to improve our plans for the problem cases.  I'd be
happy to have it scan the whole table during our nightly VACUUM
ANALYZE if that would get me statistics which would improve the
estimates to that degree without a huge increase in plan time.
 
Which raises the issue, if we could get better statistics by passing
the whole table, why not do that when VACUUM ANALYZE is run?
 
-Kevin



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Douglas McNaught
On Fri, Dec 12, 2008 at 10:31 AM, Greg Stark st...@enterprisedb.com wrote:
 On Fri, Dec 12, 2008 at 3:11 PM, Dimitri Fontaine
 dfonta...@hi-media.com wrote:
 That's why I'm preferring the common-lisp syntax of :param value, or its
 variant param: value.

 FWIW there is no such common-lisp syntax. Colon is just a regular
 symbol character and :param is just a regular symbol in common-lisp.
 There is a convention that functions parse their argument lists
 looking for such tokens as indicators of what to do with the next
 argument but it's purely a convention. There's no syntactic
 significance to the colon.

Drifting off-topic and being really nit-picky, you're wrong.  :)  It's
more than just a convention.  Colons *are* special in symbol
names--the leading colon designates the symbol as being in the KEYWORD
package (i.e. symbol namespace; you can put symbols in other packages
by prepending a package name to the colon) and there is standard
syntax (&key) for specifying allowed keyword arguments to a function;
said keys must be symbols in the KEYWORD package.

So the proposed foo( :bar 12, :baz 'stuff' ) syntax is actually very
close to the Common Lisp syntax, though there may be very good reasons
not to use it in PG.

 A similar problem arises with using Perl as a precedent. => is just a
 regular operator in Perl which quotes the lhs as a string if it's a
 simple token and otherwise behaves just like a comma. That would be
 very different from what we're talking about having it do here.

Very true, and I think the "don't break people who are using => as a
prefix operator" argument is pretty conclusive.

-Doug



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread David E. Wheeler

On Dec 12, 2008, at 4:06 PM, Tom Lane wrote:

> David E. Wheeler da...@kineticode.com writes:
>> Hrm. I can see that, I guess. In that case, though, I think I'd prefer
>> the colon at the beginning of the parameter label:
>>
>>   SELECT foo( :bar => 'ick', :baz => 'ack' );
>
> That's ugly, and incompatible with ecpg syntax, and what's the redeeming
> value anyway?

Beauty is in the eye of the beholder, I guess. I got used to it hacking
Ruby last year.

> In any case, whichever side you put the colon on, Pavel's proposal for
> adding => to it is a nonstarter --- he's ignoring the possibility that
> => is defined as a prefix operator.

Ah.

> Hmm ... actually, ecpg might be a problem here anyway.  I know it has
> special meaning for :name, but does it allow space between the colon
> and the name?  If it does then the colon syntax loses.  If it doesn't
> then you could do "name: value" as long as you were careful to leave
> a space after the colon.

So would that eliminate

  SELECT foo( bar: 'ick', baz: 'ack' );

as a possibility?

Best,

David




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Robert Haas
 I tried to summarize the proposed options, as follows:

 o : merit, x : demerit, X : unacceptable demerit

 * 1 security system column, 1 security feature (DAC or MAC)
  o It is suitable for a single security system column implementation.
  x If a user wants to use both DAC and MAC concurrently, he has
   to choose one of them.
  o It allows all the security feature on the common framework,
   suitable for the original Row-level ACLs purpose.

 * 2 security system column, 2 security feature (DAC and MAC)
  o It allows both of DAC and MAC concurrently, without remarkable
   regressions.
  x It needs two new security system columns.
  x What is the purpose of the Row-level security in original?

 * 1 security system column, 2 security feature
  X It gives us catastrophic regression in user-interface, performance
   and code complexity. Its merit is trivial compared to its demerit.

Obviously sandwiching two values in one column is not going to work.
The only question here is whether it's important to simultaneously
support both DAC and MAC.  As far as I can see, KaiGai is the only one
arguing that we don't need to do that (except for Tom, who doesn't
like either feature).  If anyone else agrees with his position, now
would be a good time to speak up.

Peter made an excellent point a few emails upthread: there seemed to
be consensus in the September CommitFest that we needed SQL-level
support for row and column level security before we talked about
implementing those features as part of SELinux.  I don't see that
we're any closer to that goal than we were then.  There has been some
progress made on column-level permissions, but the patch is back in
"waiting for author" limbo, and the only alternative for SQL-level
row-level permissions is to have them INSTEAD OF SELinux-based
row-level permissions.  That's not the same thing at all, and I think
it's also the underlying reason behind Bruce's complaint here:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00863.php

...Robert



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Robert Haas
 Which raises the issue, if we could get better statistics by passing
 the whole table, why not do that when VACUUM ANALYZE is run?

I think the reason is because the next autovacuum would undo it.

Perhaps a table-level option to scan the whole table instead of
estimating would be appropriate?
...Robert



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 16:10 +, Gregory Stark wrote:
 Right, but increasing our sample size by a factor of 150 for a 100M
 row table doesn't seem like a reasonable solution to one metric being
 bogus.
 
 For that matter, if we do consider sampling 5% of the table we may as
 well just go ahead and scan the whole table. It wouldn't take much
 longer and it would actually produce good estimates.

As I said, we would only increase sample for ndistinct, not for others.

At the moment we completely and significantly fail to assess ndistinct
correctly on clustered data for large tables. Using block level sampling
would prevent that. Right now we may as well use a random number
generator.

The amount of I/O could stay the same, just sample all rows on block.
Lifting the sample size will help large tables. Will it be perfect? No.
But I'll take better over not working at all.

If we are going to quote literature we should believe all the
literature. We can't just listen to some people that did a few tests
with sample size, but then ignore the guy that designed the MS optimizer
and many others.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 11:16 -0500, Tom Lane wrote:

 If you want ANALYZE to be cheap then you simply don't get to have 
 a trustworthy value of ndistinct.

Understood, but a cheap ANALYZE isn't always a higher priority than all
other considerations.

Solutions can also include

* allowing user to note that we would actually like to scan the whole
table (stats_target = -2?)

* manual mechanism for setting ndistinct that doesn't keep getting
overwritten by subsequent ANALYZEs

* have the executor do non-transactional update of the value of
ndistinct if it ever builds a hash table that is larger than expected
(simple learning optimizer action)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Michael Meskes
On Fri, Dec 12, 2008 at 10:06:30AM -0500, Tom Lane wrote:
 Hmm ... actually, ecpg might be a problem here anyway.  I know it has
 special meaning for :name, but does it allow space between the colon
 and the name?  If it does then the colon syntax loses.  If it doesn't

No. Here's the lexer rule:

<SQL>:{identifier}(((->|\.){identifier})|(\[{array}\]))*

No space possible between :  and {identifier}.
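For readers who don't speak flex, the no-space property can be illustrated with a rough Python rendering of a simplified version of the rule. The real {identifier} and {array} definitions are richer than shown here; this pattern is only an approximation of the shape:

```python
import re

# Simplified stand-in for the ecpg rule: a colon immediately followed
# by an identifier, optionally extended with -> / . / [...] suffixes.
IDENT = r'[A-Za-z_][A-Za-z0-9_]*'
HOSTVAR = re.compile(rf':{IDENT}((->|\.){IDENT}|\[[^\]]*\])*')

print(bool(HOSTVAR.match(":name")))        # colon directly on identifier: matches
print(bool(HOSTVAR.match(": name")))       # space after the colon: no match
print(bool(HOSTVAR.match(":rec->field")))  # member reference: matches
```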

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: mes...@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Gregory Stark
Robert Haas robertmh...@gmail.com writes:

 Obviously sandwiching two values in one column is not going to work.
 The only question here is whether it's important to simultaneously
 support both DAC and MAC.  As far as I can see, KaiGai is the only one
 arguing that we don't need to do that (except for Tom, who doesn't
 like either feature).  If anyone else agrees with his position, now
 would be a good time to speak up.

Well maybe I'm not following. I think it's strange to need two different
row-based security methods. Can you give some examples of how you would use
these two mechanisms together?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 11:16 -0500, Tom Lane wrote:

 Perhaps a better plan is to try to de-emphasize use of ndistinct,
 though I concede I have no idea how to do that.

We don't actually care about the accuracy of the ndistinct much, just
the accuracy of our answer to the question given work_mem = X, is it
better to use a hash plan.

So we just need to scan the table until we can answer that question
accurately enough. i.e. a variable sized sample.

Perhaps we could store a probability distribution for various values of
work_mem, rather than a single ndistinct value.

Anyway, definitely handwaving now to stimulate ideas.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Alvaro Herrera
Robert Haas wrote:
  Which raises the issue, if we could get better statistics by passing
  the whole table, why not do that when VACUUM ANALYZE is run?
 
 I think the reason is because the next autovacuum would undo it.

Is there any way to merge the statistics?  i.e. if a full table scan
is done to compute precise statistics, and later a regular analyze scan
is done, then perhaps instead of clobbering the previous stats, you
merge them with the new ones, thus not completely losing those previous
ones.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Alvaro Herrera
Tom Lane wrote:

 If you want ANALYZE to be cheap then you simply don't get to have a
 trustworthy value of ndistinct.

But then, maybe it's not all that critical that ANALYZE is cheap.  For
example, if we were to rework VACUUM ANALYZE so that on the same pass
that VACUUM cleans each heap page, a callback is called on the page to
grab the needed stats.

Partial vacuum is a roadblock for this though :-(

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Gregory Stark
Simon Riggs si...@2ndquadrant.com writes:

 The amount of I/O could stay the same, just sample all rows on block.
 Lifting the sample size will help large tables. Will it be perfect? No.
 But I'll take better over not working at all.

That will just raise the table size at which the problems start. It'll still
be a constant-sized sample.

It will also introduce strange biases. For instance in a clustered table it'll
think there are a lot more duplicates than there really are because it'll see
lots of similar values.

Incidentally we *do* do block sampling. We pick random blocks and then pick
random records within those blocks. This was new in, uh, 7.4? 8.0? Sometime
around then. It dramatically reduced the i/o requirements but there were long
discussions of how to do it without introducing biases.

 If we are going to quote literature we should believe all the
 literature. We can't just listen to some people that did a few tests
 with sample size, but then ignore the guy that designed the MS optimizer
 and many others.

I'm not sure what you're talking about regarding some people that did a few
tests. I looked around for the paper I keep referencing and can't find it on
my laptop. I'll look for it online. But it included a section which was a
survey of past results from other papers and the best results required
stupidly large sample sizes to get anything worthwhile. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Alvaro Herrera
Robert Haas wrote:

 Peter made an excellent point a few emails upthread: there seemed to
 be consensus in the September CommitFest that we needed SQL-level
 support for row and column level security before we talked about
 implementing those features as part of SELinux.  I don't see that
 we're any closer to that goal than we were then.  There has been some
 progress made on column-level permissions, but the patch is back in
 "waiting for author" limbo, and the only alternative for SQL-level
 row-level permissions is to have them INSTEAD OF SELinux-based
 row-level permissions.

I don't understand -- why wouldn't we just have two columns, one for
plain row-level security and another for whatever security system the
platform happens to offer?  If we were to follow that route, we could
have row-level security first, extracting the feature from the current
patch; and the rest of PGACE could be a much smaller patch implementing
the rest of the stuff, with SELinux support for now with an eye to
implementing Solaris TX or whatever.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 16:10 +, Gregory Stark wrote:
 Uhm, this is a survey of lots of different methods and does lots of
 analysis. I don't see any simple conclusions about stability. Perhaps
 I'm just missing it in the technical details. Could you point out
 exactly what part of the paper you're basing this on and what "stable"
 means?

I was echoing the comments in the ANALYZE code, which explain that we
use the Duj1 estimator because it is more stable across sample size, as
shown in table 5 on p.21 of the Haas Stokes report.
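For reference, a sketch of that estimator in Python, in the form I believe the ANALYZE code uses (treat the exact expression as an assumption; the comments in analyze.c and the Haas–Stokes paper are authoritative):

```python
def duj1_ndistinct(n, N, d, f1):
    """Haas-Stokes "Duj1" estimator:
    n  = sample size, N = total row count,
    d  = distinct values seen in the sample,
    f1 = values seen exactly once in the sample."""
    if f1 == 0 or n == N:
        return d  # no once-seen values (or a full scan): the sample is exhaustive
    return (n * d) / (n - f1 + f1 * n / N)

# Invented clustered-table scenario: 30,000 sampled rows from a 32.4M-row
# table, 29,000 distinct values seen, 28,000 of them exactly once.
print(round(duj1_ndistinct(30_000, 32_384_830, 29_000, 28_000)))
```

The large f1 pushes the estimate far above the in-sample distinct count, which is exactly where the stability across sample sizes matters.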

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Robert Haas
 Peter made an excellent point a few emails upthread: there seemed to
 be consensus in the September CommitFest that we needed SQL-level
 support for row and column level security before we talked about
 implementing those features as part of SELinux.  I don't see that
 we're any closer to that goal than we were then.  There has been some
 progress made on column-level permissions, but the patch is back in
 "waiting for author" limbo, and the only alternative for SQL-level
 row-level permissions is to have them INSTEAD OF SELinux-based
 row-level permissions.

 I don't understand -- why wouldn't we just have two columns, one for
 plain row-level security and another for whatever security system the
 platforms happens to offer?  If we were to follow that route, we could
 have row-level security first, extracting the feature from the current
 patch; and the rest of PGACE could be a much smaller patch implementing
 the rest of the stuff, with SELinux support for now with an eye to
 implementing Solaris TX or whatever.

Well, I think we should do exactly what you're proposing, so don't ask me.

...Robert



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Kevin Grittner
 Robert Haas robertmh...@gmail.com wrote: 
  Which raises the issue, if we could get better statistics by passing
  the whole table, why not do that when VACUUM ANALYZE is run?
 
 I think the reason is because the next autovacuum would undo it.
 
The table has 32.4 million rows.
autovacuum_analyze_scale_factor is 0.1.
autovacuum_vacuum_scale_factor is 0.2.
We run a nightly VACUUM ANALYZE.
Deletes are rare.
Normal operations don't update more than a few thousand rows per day.
 
I know that normal operations never cause an autovacuum of this table.
Perhaps if there was a way to share this information with
PostgreSQL
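A quick back-of-envelope check of Kevin's numbers against the documented autovacuum analyze-threshold formula (the base threshold and daily churn values below are assumed, not quoted from his setup):

```python
# autoanalyze fires when changed rows exceed
#   autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
reltuples = 32_384_830                 # rows in the table
analyze_scale_factor = 0.1             # autovacuum_analyze_scale_factor
analyze_base_threshold = 50            # assumed default autovacuum_analyze_threshold

rows_changed_needed = analyze_base_threshold + analyze_scale_factor * reltuples
print(f"{rows_changed_needed:,.0f} changed rows needed")  # roughly 3.2 million

daily_churn = 5_000                    # "a few thousand rows per day"
print(f"about {rows_changed_needed / daily_churn:.0f} days between autoanalyzes")
```

With churn that low relative to table size, autoanalyze indeed never triggers between the nightly runs.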
 
-Kevin



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Robert Haas
On Fri, Dec 12, 2008 at 11:57 AM, Gregory Stark st...@enterprisedb.com wrote:
 Obviously sandwiching two values in one column is not going to work.
 The only question here is whether it's important to simultaneously
 support both DAC and MAC.  As far as I can see, KaiGai is the only one
 arguing that we don't need to do that (except for Tom, who doesn't
 like either feature).  If anyone else agrees with his position, now
 would be a good time to speak up.

 Well maybe I'm not following. I think it's strange to need two different
 row-based security methods. Can you give some examples of how you would use
 these two mechanisms together?

Not really.  I'm not an SELinux expert.  But typically the two do
exist alongside one another.  For example, installing SELinux (MAC)
on your system does not make "chmod g+w file" (DAC) stop working;
it merely performs an ADDITIONAL security check before allowing access
to the file.  You have to satisfy BOTH SELinux AND the ordinary
filesystem permissions system in order to perform an operation on a
file.

Similarly, we have SQL statements GRANT {SELECT|INSERT|DELETE|etc} ON
table... and hopefully soon a column-level variant of that same
syntax.  Those permissions aren't going to be ignored just because you
also enable SELinux to control access to those tables or columns.  Or
at least I certainly hope they aren't.

The contention of the author of this patch is that row-level access is
somehow different - that even though we have two sets of checks for
files, tables, and (assuming Stephen Frost's patch is accepted)
columns, we will only have one set of checks for rows, and you can
pick which one you want.
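A toy sketch of the layering being described: DAC (GRANT-style ACLs) and MAC (SELinux-style policy) are independent checks, and an operation succeeds only if every enabled layer allows it. All function names and label strings here are hypothetical illustrations, not any proposed API:

```python
def dac_allows(user, acl):
    return user in acl                              # stand-in for an aclitem[] check

def mac_allows(subject_label, object_label, policy):
    return (subject_label, object_label) in policy  # stand-in for a policy query

def row_visible(user, acl, subject_label, object_label, policy):
    # Both layers must agree, mirroring filesystem DAC + SELinux MAC.
    return dac_allows(user, acl) and mac_allows(subject_label, object_label, policy)

policy = {("staff_t", "unclassified_t")}
print(row_visible("alice", {"alice"}, "staff_t", "unclassified_t", policy))  # True
print(row_visible("alice", {"alice"}, "staff_t", "secret_t", policy))        # False
```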

...Robert



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 17:05 +, Gregory Stark wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 
  The amount of I/O could stay the same, just sample all rows on block.
  Lifting the sample size will help large tables. Will it be perfect? No.
  But I'll take better over not working at all.
 
 That will just raise the table size at which the problems start. It'll still
 be a constant-sized sample.

Work with me here. I want to make the situation better. It still won't
be perfect, but is that an argument against any action at all?

 It will also introduce strange biases. For instance in a clustered table it'll
 think there are a lot more duplicates than there really are because it'll see
 lots of similar values.
 
 Incidentally we *do* do block sampling. We pick random blocks and then pick
 random records within those blocks. This was new in, uh, 7.4? 8.0? Sometime
 around then. It dramatically reduced the i/o requirements but there were long
 discussions of how to do it without introducing biases.

No, we pick random rows. On bigger tables, they get further apart
typically and so we miss any clustering. I mean that we should pick a
random block and read all rows on it.

  If we are going to quote literature we should believe all the
  literature. We can't just listen to some people that did a few tests
  with sample size, but then ignore the guy that designed the MS optimizer
  and many others.
 
 I'm not sure what you're talking about regarding some people that did a few
 tests. I looked around for the paper I keep referencing and can't find it on
 my laptop. I'll look for it online. But it included a section which was a
 survey of past results from other papers and the best results required
 stupidly large sample sizes to get anything worthwhile. 

Even if you find it, we still need to know why we would listen to the
research in the absent paper yet ignore the conclusion in the paper by
the man in charge of the MS optimizer who said that block level sampling
is a good idea.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCHES] GIN improvements

2008-12-12 Thread Teodor Sigaev

Changes:
 - added vacuum_delay_point() in gininsertcleanup
 - add trigger to run vacuum by number of inserted tuples from
   last vacuum. Number of inserted tuples represents number
   of really inserted tuples in index and it is calculated as
   tuples_inserted + tuples_updated - tuples_hot_updated.
   Trigger fires on instuples > vac_base_thresh because search time is linear
   in the number of pending pages (tuples)
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.17.gz
Description: Unix tar archive



Re: [HACKERS] Multiplexing SUGUSR1

2008-12-12 Thread Markus Wanner
Hi,

Alvaro Herrera wrote:
 No, the signalling needed here is far simpler than Markus' IMessage
 stuff.

Yup, see also Tom's comment [1].

For Postgres-R I'm currently multiplexing by embedding a message type in
the imessage data itself. So this part is certainly overlapping, yes.

Some of the messages I'm using do have additional payload data, others
don't. Moving this message type out of the body part of the message
itself and instead use the upcoming signal multiplexing could save a few
imessage types in favor of using these multiplexed signals. Most message
types require some additional data to be transferred, though.

From my point of view it's hard to understand why one should want to
move out exactly 32 or 64 bits (sig_atomic_t) of a message. From the
point of view of Postgres, it's certainly better than being bound to the
existing Unix signals.

Regards

Markus Wanner

[1]:
http://archives.postgresql.org/message-id/28487.1221147...@sss.pgh.pa.us



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The existing sampling mechanism is tied to solid statistics.
 
  Sounds great, but it's not true. The sample size is not linked to data
  volume, so how can it possibly give a consistent confidence range?

 It is a pretty well-known mathematical fact that for something like an
 opinion poll your margin of error does not depend on the size of the
 population but only on the size of your sample.

Right.  The solid math that Greg referred to concerns how big a sample
we need in order to have good confidence in the histogram results.
It doesn't speak to whether we get good results for ndistinct (or for
most-common-values, though in practice that seems to work fairly well).

AFAICS, marginal enlargements in the sample size aren't going to help
much for ndistinct --- you really need to look at most or all of the
table to be guaranteed anything about that.

But having said that, I have wondered whether we should consider
allowing the sample to grow to fill maintenance_work_mem, rather than
making it a predetermined number of rows.  One difficulty is that the
random-sampling code assumes it has a predetermined rowcount target;
I haven't looked at whether that'd be easy to change or whether we'd
need a whole new sampling algorithm.
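The predetermined-rowcount assumption is easy to see in the classic one-pass reservoir algorithm, sketched below. This is a simplified stand-in, not the actual ANALYZE sampling code (which uses Vitter's refinements); the point is only that k is fixed before the scan starts:

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Algorithm R: one sequential pass, fixed target of k rows.
    Growing the sample to fill memory would mean changing k mid-scan,
    which this scheme does not support."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # keep row with probability k/(i+1)
            if j < k:
                sample[j] = row
    return sample

s = reservoir_sample(range(1_000_000), 300)
print(len(s))   # always exactly 300, however many rows were scanned
```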

regards, tom lane



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Bruce Momjian
KaiGai Kohei wrote:
  If we use some type of integer, I suggest using this structure for
  pg_security:
  
  CREATE TABLE pg_security(
  relid oid, 
  secid int2, 
  secacl aclitem[], 
  secext TEXT
  );
  
  This allows the per-row value to be a simple int2.  It also improves
  maintenance because rows are associated only with a specific table;
  unused values can then be removed more easily.  And it allows both
  secacl and secext security to be specified.
 
 How does the approach resolve the pain of user interface?
 I don't think packing two or more values into one field is a right way.

I see later emails that say we have to have both security methods
available at the same time, and the table above does that.  There would
be one security oid on every row and it would refer to this table.  

pg_security would contain every _unique_ combination of secacl and
secext. On INSERT the code looks to see if the secacl/secext exists in
pg_security, and if it does it reuses the same oid, if not it adds a new
row.  (There is no method for cleaning out unreferenced pg_security rows
(relid was supposed to help with that but it also bloats pg_security)).

Some people didn't like it was per-table so I removed the relid column:

CREATE TABLE pg_security(
secid oid, 
secacl aclitem[], 
secext TEXT
);

pg_dump and COPY would dump the per-row oid and pg_security so there
should be only one flag to dump security info, even though it supports
two security methods.
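A toy sketch of the interning behaviour Bruce describes: each unique (secacl, secext) combination gets one catalog row, and every heap row stores only the small id of its combination. The dict below stands in for an index lookup on pg_security, and the ACL/label strings are made up:

```python
class SecurityInterner:
    def __init__(self):
        self._ids = {}        # (secacl, secext) -> secid
        self._next_id = 1

    def secid_for(self, secacl, secext):
        key = (tuple(secacl), secext)
        if key not in self._ids:          # first time seen: add a pg_security row
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]             # otherwise reuse the existing id

pg_security = SecurityInterner()
a = pg_security.secid_for(["alice=rw"], "system_u:object_r:sepgsql_table_t")
b = pg_security.secid_for(["alice=rw"], "system_u:object_r:sepgsql_table_t")
c = pg_security.secid_for(["bob=r"], "system_u:object_r:sepgsql_table_t")
print(a == b, a == c)  # True False
```

Nothing here garbage-collects unreferenced entries, which matches the bloat caveat in the email.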

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-12 Thread Bruce Momjian
KaiGai Kohei wrote:
  Also, having the per-row value always be present in the row and
  controlled by the bitmask seems ideal;  it avoids having to add a CREATE
  TABLE option.
  Sorry, I don't understand why it related to a CREATE TABLE option.
  System columns are always allocated for any tables?
  
  Does a table use storage for the security column if no SQL-level
  security value is supplied for a given row?
 
 When Row-level ACL is enabled on the table and the user supplies a tuple
 without any specific ACLs, it requires security field, because the
 length of HeapTuple is decided at heap_form_tuple() which is invoked
 prior to fetching the user supplied ACLs.
 
 When Row-level ACL is disabled (by pg_class.reloptions) on the table,
 the storage for security field is not necessary.

It is possible to re-call heap_form_tuple() once we know we need a
security field;  I talked to Tom about that.  We can worry about it later.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane escribió:
 If you want ANALYZE to be cheap then you simply don't get to have a
 trustworthy value of ndistinct.

 But then, maybe it's not all that critical that ANALYZE is cheap.  For
 example, if we were to rework VACUUM ANALYZE so that on the same pass
 that VACUUM cleans each heap page, a callback is called on the page to
 grab the needed stats.

 Partial vacuum is a roadblock for this though :-(

Yeah --- now that partial vacuum is in, any argument that we can make
ANALYZE piggyback on VACUUM cheaply is dead anyway.

It would be interesting to consider partial analyze processing, but I
don't see how you would combine per-page partial results without a huge
increase in stats-related state data.

regards, tom lane



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Greg Stark
On Fri, Dec 12, 2008 at 5:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Incidentally we *do* do block sampling. We pick random blocks and then pick
 random records within those blocks. This was new in, uh, 7.4? 8.0? Sometime
 around then. It dramatically reduced the i/o requirements but there were long
 discussions of how to do it without introducing biases.

 No, we pick random rows. On bigger tables, they get further apart
 typically and so we miss any clustering. I mean that we should pick a
 random block and read all rows on it.

I think what's happening here is that our sampling method is based on
the assumption that a record's location is not related to its value.

Consider a table which is clustered and has two copies of each value.
When we look at a block and see n/2 distinct values among its n rows,
and we know there are 1000 blocks, a human would conjecture that there
are 1000*n/2 distinct values throughout the table, each represented
exactly twice. But if we're assuming the records are randomly
distributed, then looking at the whole block will actually throw us off
completely: from the fact that we saw every value twice we'll deduce
that there must be hundreds of copies spread throughout the database,
and hence far fewer than 1000*n/2 distinct values.

I think you need to find two different formulas, one which represents
a clustered table and one which represents randomly distributed data.
Then you need a way to measure just how clustered the data is so you
know how much weight to give each formula. Perhaps comparing the
number of duplicates in whole-block samples versus overall random
selections would give that measure.
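That thought experiment can be simulated directly. The sketch below (illustrative only, not PostgreSQL code; block size, table size, and the duplicate-rate measure are all invented for the demonstration) builds a perfectly clustered table with two copies of each value and compares the duplicate rate seen in one whole-block sample against a same-sized random-row sample:

```python
# Sketch of the clustered-table thought experiment: duplicates are common
# within a sampled block but rare in a random-row sample of the same size,
# so the duplicate rate itself can serve as a clusteredness signal.
import random

random.seed(0)
ROWS_PER_BLOCK = 100
N_BLOCKS = 1000

# Clustered table: values 0..(N/2 - 1), each stored twice, in order.
table = [i // 2 for i in range(ROWS_PER_BLOCK * N_BLOCKS)]

def duplicate_rate(sample):
    # Fraction of sampled rows whose value already appeared in the sample.
    return 1 - len(set(sample)) / len(sample)

# Whole-block sample: one random block, all rows on it.
start = random.randrange(N_BLOCKS) * ROWS_PER_BLOCK
block_sample = table[start:start + ROWS_PER_BLOCK]

# Random-row sample of the same size.
row_sample = [random.choice(table) for _ in range(ROWS_PER_BLOCK)]

print(duplicate_rate(block_sample))  # 0.5: every value on the block appears twice
print(duplicate_rate(row_sample))    # typically near 0.0
```

The gap between the two rates is exactly the kind of measure suggested above for weighting a "clustered" formula against a "random" one.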

-- 
greg



Re: [HACKERS] WIP: default values for function parameters

2008-12-12 Thread Tom Lane
Michael Meskes mes...@postgresql.org writes:
 On Fri, Dec 12, 2008 at 10:06:30AM -0500, Tom Lane wrote:
 Hmm ... actually, ecpg might be a problem here anyway.  I know it has
 special meaning for :name, but does it allow space between the colon
 and the name?  If it does then the colon syntax loses.  If it doesn't

 No. Here's the lexer rule:
 SQL:{identifier}(((-|\.){identifier})|(\[{array}\]))*  
 No space possible between :  and {identifier}.

Excellent.  I checked that psql's colon-variable feature behaves the
same.  So it looks like the proposed name: value syntax would indeed
not break any existing features.  Barring better ideas I think we should
go with that one.

regards, tom lane



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 14:03 -0300, Alvaro Herrera wrote:

 Partial vacuum is a roadblock for this though :-(

Actually, perhaps it's an enabler for looking at changed stats?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2008-12-12 at 11:16 -0500, Tom Lane wrote:
 Perhaps a better plan is to try to de-emphasize use of ndistinct,
 though I concede I have no idea how to do that.

 We don't actually care about the accuracy of the ndistinct much, just
 the accuracy of our answer to the question given work_mem = X, is it
 better to use a hash plan.

That's hardly the only thing we use ndistinct for.  Also, it's a bit too
simplistic to suppose that we only have to make the right binary choice
between hash and something else at a particular plan level.  If we don't
have at least ballpark-correct figures for cost and number of output
rows, we'll start making mistakes at higher plan levels.
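To make the "given work_mem = X" question concrete, here is a toy version of that binary choice. The constants and the decision rule are invented for illustration and bear no relation to the real planner's cost model:

```python
# Toy sketch: if the estimated number of groups times an assumed per-entry
# hash-table footprint fits in work_mem, prefer the hash plan.
ENTRY_BYTES = 64                 # assumed per-group hash entry size (invented)
work_mem = 4 * 1024 * 1024       # 4 MB

def prefer_hash(ndistinct_estimate):
    return ndistinct_estimate * ENTRY_BYTES <= work_mem

print(prefer_hash(50_000))    # True: ~3.2 MB fits in work_mem
print(prefer_hash(500_000))   # False: ~32 MB would spill
```

Even in this toy model, an ndistinct estimate that is off by an order of magnitude flips the decision, which is Tom's point: the error doesn't stay local to one plan node.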

regards, tom lane



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Is there any way to merge the statistics?  i.e. if a full table scan
 is done to compute precise statistics, and later a regular analyze scan
 is done, then perhaps instead of clobbering the previous stats, you
 merge them with the new ones, thus not completely losing those previous
 ones.

Seems like a pretty hard problem unless you store a whole lot more
statistics state than we do now (which of course would create its own
costs).  How would you know which portion of the old stats to not
believe anymore?

regards, tom lane



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Bruce Momjian
Alvaro Herrera wrote:
 Robert Haas escribió:
   Which raises the issue, if we could get better statistics by passing
   the whole table, why not do that when VACUUM ANALYZE is run?
  
  I think the reason is because the next autovacuum would undo it.
 
 Is there any way to merge the statistics?  i.e. if a full table scan
 is done to compute precise statistics, and later a regular analyze scan
 is done, then perhaps instead of clobbering the previous stats, you
 merge them with the new ones, thus not completely losing those previous
 ones.

Crazy idea, but if a partial analyze finds that 5% of the table has
changed since the last full analyze, but 10% of the statistics are
different, we know something is wrong.  ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Bruce Momjian
Alvaro Herrera wrote:
 Tom Lane escribió:
 
  If you want ANALYZE to be cheap then you simply don't get to have a
  trustworthy value of ndistinct.
 
 But then, maybe it's not all that critical that ANALYZE is cheap.  For
 example, if we were to rework VACUUM ANALYZE so that on the same pass
 that VACUUM cleans each heap page, a callback is called on the page to
 grab the needed stats.
 
 Partial vacuum is a roadblock for this though :-(

Perhaps it isn't because partial vacuum is going to highlight the
_changed_ blocks, which fits into your idea of merging stats, somehow. ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Simon Riggs

On Fri, 2008-12-12 at 18:01 +, Greg Stark wrote:

 I think you need to find two different formulas, one which represents
 a clustered table and one which represents randomly distributed data.
 Then you need a way to measure just how clustered the data is so you
 know how much weight to give each formula. Perhaps comparing the
 number of duplicates in whole-block samples versus overall random
 selections would give that measure.

Please read the Chaudhuri paper.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] benchmarking the query planner

2008-12-12 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 As I said, we would only increase sample for ndistinct, not for others.

How will you do that?  Keep in mind that one of the things we have to do
to compute ndistinct is to sort the sample.  ISTM that the majority of
the cost of a larger sample is going to get expended anyway ---
certainly we could form the histogram using the more accurate data at
precisely zero extra cost, and I think we have also pretty much done all
the work for MCV collection by the time we finish counting the number of
distinct values.

I seem to recall Greg suggesting that there were ways to estimate
ndistinct without sorting, but short of a fundamental algorithm change
there's not going to be a win here.
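One such fundamental algorithm change does exist in the literature: order-statistics sketches such as KMV (k minimum values) estimate ndistinct from hashes alone, never sorting the sample itself. A minimal Python sketch (illustrative only, unrelated to anything in PostgreSQL; the choice of SHA-1 and k=256 is arbitrary):

```python
import hashlib
import heapq

def kmv_ndistinct(values, k=256):
    # Map each distinct value to a pseudo-uniform point in [0, 1).
    points = {int(hashlib.sha1(repr(v).encode()).hexdigest(), 16) / 16**40
              for v in values}
    # Keep only the k smallest hashes; no full sort of the sample.
    smallest = heapq.nsmallest(k, points)
    if len(smallest) < k:
        return len(smallest)      # fewer than k distinct values: count is exact
    # If n points are uniform on [0, 1), the k-th smallest lands near k/n,
    # so n is estimated as (k - 1) divided by that k-th smallest hash.
    return round((k - 1) / smallest[-1])
```

With k = 256 the relative error is roughly 1/sqrt(k), about 6% here, and every sampled row must still be hashed, so the trade-off described above doesn't disappear; it just moves.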

 Right now we may as well use a random number generator.

Could we skip the hyperbole please?

regards, tom lane


