Re: [HACKERS] standard_conforming_strings

2010-07-14 Thread Richard Huxton

On 14/07/10 15:48, Robert Haas wrote:

> On Fri, Jan 29, 2010 at 10:02 PM, Josh Berkus  wrote:
>>> An actual plan here might look like "let's flip it before 9.1alpha1
>>> so we can get some alpha testing cycles on it" ...
>>
>> "Hey, let's flip it in 9.1 CF 1, so that we can have some alpha testing
>> cycles on it."
>
> Should we do this?  Patch attached.


Any reason not to add a line to the 9.0 docs/release notes saying 
"WARNING: The PGDG currently plan to change this setting's default in 9.1"?


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Pavel Stehule
2010/7/14 Peter Eisentraut :
> On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
>> I have only one question - If I understand well you can use collate
>> just for sort. What is your plan for range search operation?
>
> My patch does range searches.  Sorting uses the same operators, so both
> will be supported.  (Sorting is not yet implemented, as I had written.)
>
>> Sort is
>> interesting and I am sure important for multilingual applications; for
>> me, more important is case-sensitive, case-insensitive, accent-sensitive,
>> accent-insensitive filtering. Do you have a plan for it?
>
> You may be able to do some of these by using appropriate locale
> definitions.  I'd need some examples to be able to tell for sure.
>
>> what about DISTINCT clause, maybe GROUP BY clause ?
>
> DISTINCT and GROUP BY work with equality, which is not affected by
> locales (at least under the current rules).
>

:( Maybe we have to enhance the locales, or do some work in this direction.
In Czech information systems, an operation like the following is relatively
common:

name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case-insensitive,
accent-insensitive

PostgreSQL is the last database that doesn't have integrated support for it.

Regards

Pavel
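The case- and accent-insensitive comparison Pavel asks for can be sketched outside the database; a minimal illustration (the collation name cs_CZ_cs_ai above is hypothetical, and this approximates, rather than reproduces, locale-based behavior):

```python
import unicodedata

def ci_ai_equal(a, b):
    """Case-insensitive, accent-insensitive string equality: strip
    combining marks after NFD decomposition, then casefold."""
    def normalize(s):
        decomposed = unicodedata.normalize("NFD", s)
        stripped = "".join(ch for ch in decomposed
                           if not unicodedata.combining(ch))
        return stripped.casefold()
    return normalize(a) == normalize(b)

assert ci_ai_equal("Stěhule", "stehule")      # accent and case ignored
assert not ci_ai_equal("Stěhule", "Stehula")  # real differences still count
```

A real implementation would delegate to the collation library rather than hand-rolling normalization, but the sketch shows the semantics being requested.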




Re: [HACKERS] TRUNCATE+COPY optimization and --jobs=1 in pg_restore

2010-07-14 Thread Robert Haas
On Wed, Feb 10, 2010 at 12:19 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> Tom Lane wrote:
>>> The code is only trying to substitute for something you can't have
>>> in parallel restore, ie --single-transaction.
>
>> Exactly. IIRC that's why --single-transaction was introduced in the
>> first place.
>
> To me, --single-transaction is mostly there for people who want to be
> sure they have all-or-nothing restore behavior.  Optimizations are
> secondary.
>
>> Takahiro-san is suggesting there is a case for doing the optimisation in
>> non-parallel mode. But if we do that, is there still a case for
>> --single-transaction?
>
> Yeah, per above.  The main problem I have with doing it in non-parallel
> restore mode is that we couldn't safely do it when outputting to a
> script (since we don't know if the user will try to put begin/end
> around the script), and I really do not want to allow any differences
> between script output and direct-to-database output.  Once that camel's
> nose gets under the tent, debuggability will go down the tubes...

Is this a fatal defect or is there a way to salvage this idea?

Another possible issue is that this changes the behavior.  Suppose the
table wasn't empty before we truncated it...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Patch for 9.1: initdb -C option

2010-07-14 Thread KaiGai Kohei
David,

I'd like to volunteer to review your patch first in this commit fest.

We already had a few comments on the list. I'd like to see your opinion
on those suggestions before the code review.

Itagaki-san suggested:
| > Enclosed is a patch to add a -C option to initdb to allow you to easily
| > append configuration directives to the generated postgresql.conf file
| Why don't you use just "echo 'options' >> $PGDATA/postgresql.conf" ?
| Could you explain where the -C option is better than initdb + echo?

Greg suggested:
| We had a patch not quite make it for 9.0 that switched over the postgresql.conf
| file to make it easy to scan a whole directory looking for configuration files:
| http://archives.postgresql.org/message-id/9837222c0910240641p7d75e2a4u2cfa6c1b5e603...@mail.gmail.com
|
| The idea there was to eventually reduce the amount of postgresql.conf hacking
| that initdb and other tools have to do.  Your patch would add more code into
| a path that I'd like to see reduced significantly.
|
| That implementation would make something easy enough for your use case too
| (below untested but show the general idea):
|
| $ for cluster in 1 2 3 4 5 6;
|   do initdb -D data$cluster
|   cat <<EOF > data$cluster/conf.d/99clustersetup
|   ...
| EOF
| done
|
| This would actually work just fine for what you're doing right now if you used
| ">> data$cluster/postgresql.conf" for that next to last line there.
| There would be duplicates, which I'm guessing is what you wanted to avoid with
| this patch, but the later values set for the parameters added to the end would
| win and be the active ones.

Peter suggested:
| > Enclosed is a patch to add a -C option to initdb to allow you to easily
| > append configuration directives to the generated postgresql.conf file
| > for use in programmatic generation.
| I like this idea, but please use small -c for consistency with the
| postgres program.

It seems to me that what Greg suggested reflects a recent trend. From a
packager's perspective, keeping additional configuration in separate files
makes it easy to install and uninstall third-party plugins, compared with a
single consolidated configuration file.

However, $PGDATA/postgresql.conf is created by initdb, so it does not belong
to any particular package. I am not certain whether that trend is also
suitable for us.

Thanks,

(2010/03/29 14:04), David Christensen wrote:
> Hackers,
> 
> Enclosed is a patch to add a -C option to initdb to allow you to easily 
> append configuration directives to the generated postgresql.conf file for use 
> in programmatic generation.  In my case, I'd been creating multiple db 
> clusters with a script and would have specific overrides that I needed to 
> make.   This patch fell out of the desire to make this a little cleaner.  
> Please review and comment.
> 
>  From the commit message:
> 
>  This is a simple mechanism to allow you to provide explicit overrides
>  to any GUC at initdb time.  As a basic example, consider the case
>  where you are programmatically generating multiple db clusters in
>  order to test various configurations:
> 
>    $ for cluster in 1 2 3 4 5 6;
>    > do initdb -D data$cluster -C "port = 1234$cluster" -C 'max_connections = 10' -C shared_buffers=1M;
>    > done
> 
>  A possible future improvement would be to provide some basic
>  formatting corrections to allow specifications such as -C 'port
>  1234', -C port=1234, and -C 'port = 1234' to all be normalized to
>  'port = 1234' in the final output.  This would be consistent with
>  postmaster's parsing.
> 
>  The -C flag was chosen to be a mnemonic for "config".
> 
> Regards,
> 
> David
> --
> David Christensen
> End Point Corporation
> da...@endpoint.com
> 

-- 
KaiGai Kohei 



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Dimitri Fontaine
Joshua Tolley  writes:
>> >>   ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>> >
>> it says X and Y ...  the selectivity of joins are what i am most
>> interested in. cross correlation of columns within the same table are
>> just a byproduct.  the core thing is: how can i estimate the number
>> of rows returned from a join?
>
> All the discussion of this topic that I've seen has been limited to the single
> table case. The hard problem in that case is coming up with something you can
> precalculate that will actually be useful during query planning, without
> taking too much disk, memory, CPU, or something else. Expanding the discussion
> to include join relations certainly still has valid use cases, but is even
> harder, because you've also got to keep track of precisely how the underlying
> relations are joined, so you know in what context the statistics remain valid.

Well I've been proposing to handle the correlation problem in another
way in some past mails here, and I've been trying to write it down too:

  http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
  http://tapoueh.org/char10.html#sec13

What I propose is to extend ANALYZE to be able to work on a VIEW too,
rather than just a table. The hard parts seem to be:

a. what stats to record, exploiting the view definition the best we can
b. how to match a user query against the view definitions we have, in
order to actually use the stats

If you have answers or good ideas :)

Regards,
-- 
dim




Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread marcin mank
On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas  wrote:
> 2010/7/14 Tom Lane :
>> If the combination of columns is actually interesting, there might well
>> be an index in place, or the DBA might be willing to create it.
>
> Indexes aren't free, though, nor even close to it.
>
> Still, I think we should figure out the underlying mechanism first and
> then design the interface afterwards.  One idea I had was a way to say
> "compute the MCVs and histogram buckets for this table WHERE
> ".  If you can prove predicate for a particular query, you
> can use the more refined statistics in place of the full-table
> statistics.  This is fine for the breast cancer case, but not so
> useful for the zip code/street name case (which seems to be the really
> tough one).
>

One way of dealing with the zipcode problem is estimating
NDST = count(distinct row(zipcode, street)), i.e. multi-column ndistinct.

Then the planner doesn't have to assume that the selectivity of an
equality condition involving both zipcode and street is the product of
the respective selectivities. As a first cut it can assume that it
will get count(*) / NDST rows, but there are ways to improve it.

Greetings
Marcin Mańk
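Marcin's first-cut estimate can be sketched with toy data (the table below is invented for illustration):

```python
def estimate_equality_rows(rows, cols):
    """First-cut row estimate for an equality condition on all of `cols`:
    count(*) / count(distinct (col1, ..., colN))."""
    ndst = len({tuple(r[c] for c in cols) for r in rows})
    return len(rows) / ndst

table = [
    {"zipcode": "90210", "street": "Main St"},
    {"zipcode": "90210", "street": "Main St"},
    {"zipcode": "90210", "street": "Elm St"},
    {"zipcode": "10001", "street": "Main St"},
]

# 4 rows, 3 distinct (zipcode, street) pairs -> estimate 4/3 rows per
# equality lookup, instead of multiplying per-column selectivities
# (which would wrongly assume the columns are independent).
assert abs(estimate_equality_rows(table, ["zipcode", "street"]) - 4 / 3) < 1e-9
```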



Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Peter Eisentraut
On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
> I have only one question - If I understand well you can use collate
> just for sort. What is your plan for range search operation?

My patch does range searches.  Sorting uses the same operators, so both
will be supported.  (Sorting is not yet implemented, as I had written.)

> Sort is
> interesting and I am sure important for multilingual applications; for
> me, more important is case-sensitive, case-insensitive, accent-sensitive,
> accent-insensitive filtering. Do you have a plan for it?

You may be able to do some of these by using appropriate locale
definitions.  I'd need some examples to be able to tell for sure.

> what about DISTINCT clause, maybe GROUP BY clause ?

DISTINCT and GROUP BY work with equality, which is not affected by
locales (at least under the current rules).





[HACKERS] PgWest 2010 Call for Papers!

2010-07-14 Thread Joshua D. Drake
Following on the smashing success of PostgreSQL Conference East,
PostgreSQL Conference West, The PostgreSQL Conference for Decision
Makers, End Users and Developers, is being held at the St. Francis,
Westin Hotel in San Francisco from November 2nd through 4th 2010. Please
join us in making this the largest PostgreSQL Conference to date!

Main site:
http://www.postgresqlconference.org/

CFP:
http://www.postgresqlconference.org/2010/west/cfp

Thank you to our sponsors:
Command Prompt: http://www.commandprompt.com/
EnterpriseDB: http://www.enterprisedb.com/


Time line:
July 14th: Talk submission opens
Sept 5th: Talk submission closes
Sept 10th: Speaker notification

This year we will be continuing our trend of covering the entire
PostgreSQL ecosystem. We would like to see talks and tutorials on the
following topics:

  * General PostgreSQL: 
  * Administration 
  * Performance 
  * High Availability 
  * Migration 
  * GIS 
  * Integration 
  * Solutions and White Papers 
  * The Stack: 
  * Python/Django/Pylons/TurboGears/Custom 
  * Perl5/Catalyst/Bricolage 
  * Ruby/Rails 
  * Java (PLJava would be great)/Groovy/Grails 
  * Operating System optimization
(Linux/FBSD/Solaris/Windows) 
  * Solutions and White Papers 
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering





Re: [HACKERS] Branch created, let the experiment begin ...

2010-07-14 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Jul 14, 2010, at 12:25 PM, "Joshua D. Drake"  
> wrote:
> >> The branch is now created.
> > 
> > Is Git now the authoritative source?
> >> 
> 
> No. Marc created the branch in CVS.

Right, the decision at the dev meeting (which could be OBE, so don't
take this as gospel) was that we would switch to git after the first
CF (basically, late August time-frame).

Thanks,

Stephen




Re: [HACKERS] Branch created, let the experiment begin ...

2010-07-14 Thread Robert Haas
On Jul 14, 2010, at 12:25 PM, "Joshua D. Drake"  wrote:
> On Fri, 2010-07-09 at 00:58 -0300, Marc G. Fournier wrote:
>> As decided at this year's hackers conference, we are branching 
>> REL9_0_STABLE *before* the release, instead of after.
>> 
>> The hope is that we won't be taking away resources from finishing the 
>> release, but still allow ppl to continue to work on projects that are for 
>> 9.1.
>> 
>> The branch is now created.
> 
> Is Git now the authoritative source?

No. Marc created the branch in CVS.

...Robert

Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Pavel Stehule
Hello

I have only one question: if I understand well, you can use collate
just for sort. What is your plan for range search operations? Sort is
interesting and I am sure important for multilingual applications; for
me, more important is case-sensitive, case-insensitive, accent-sensitive,
accent-insensitive filtering. Do you have a plan for it?

Regards

Pavel Stehule

2010/7/13 Peter Eisentraut :
> Here is a proof of concept for per-column collation support.
>
> Here is how it works: When creating a table, an optional COLLATE clause
> can specify a collation name, which is stored (by OID) in pg_attribute.
> This becomes part of the type information and is propagated through the
> expression parse analysis, like typmod.  When an operator or function
> call is parsed (transformed), the collations of the arguments are
> unified, using some rules (like type analysis, but different in detail).
> The collations of the function/operator arguments come either from Var
> nodes which in turn got them from pg_attribute, or from other
> function and operator calls, or you can override them with explicit
> COLLATE clauses (not yet implemented, but will work a bit like
> RelabelType).  At the end, each function or operator call gets one
> collation to use.
>

what about DISTINCT clause, maybe GROUP BY clause ?

regards

Pavel

> The function call itself can then look up the collation using the
> fcinfo->flinfo->fn_expr field.  (Works for operator calls, but doesn't
> work for sort operations, needs more thought.)
>
> A collation is in this implementation defined as an lc_collate string
> and an lc_ctype string.  The implementation of functions interested in
> that information, such as comparison operators, or upper and lower
> functions, will take the collation OID that is passed in, look up the
> locale string, and use the xlocale.h interface (newlocale(),
> strcoll_l()) to compute the result.
>
> (Note that the xlocale stuff is only 10 or so lines in this patch.  It
> should be feasible to allow other appropriate locale libraries to be
> used.)
>
> Loose ends:
>
> - Support function calls (currently only operator calls) (easy)
>
> - Implementation of sort clauses
>
> - Indexing support/integration
>
> - Domain support (should be straightforward)
>
> - Make all expression node types deal with collation information
>  appropriately
>
> - Explicit COLLATE clause on expressions
>
> - Caching and not leaking memory of locale lookups
>
> - I have typcollatable to mark which types can accept collation
>  information, but perhaps there should also be proicareaboutcollation
>  to skip collation resolution when none of the functions in the
>  expression tree care.
>
> You can start by reading the collate.sql regression test file to see
> what it can do.  Btw., regression tests only work with "make check
> MULTIBYTE=UTF8".  And it (probably) only works with glibc for now.
>
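The unification of argument collations described above is not spelled out in detail in the mail, so the following is only a plausible sketch of such a rule (an explicitly stated collation must be unambiguous; otherwise fall back to the default):

```python
DEFAULT_COLLATION = None  # stand-in for "no collation derived"

def unify_collations(arg_collations):
    """Plausible sketch of collation unification for a function or
    operator call: collect the non-default collations of the arguments;
    more than one distinct collation is a conflict, exactly one wins,
    and none at all falls back to the default."""
    stated = {c for c in arg_collations if c is not DEFAULT_COLLATION}
    if len(stated) > 1:
        raise ValueError("conflicting collations: %r" % sorted(stated))
    return stated.pop() if stated else DEFAULT_COLLATION

assert unify_collations([None, "cs_CZ", None]) == "cs_CZ"
assert unify_collations([None, None]) is DEFAULT_COLLATION
```

The actual rules in the patch (and an explicit COLLATE override, once implemented) may well differ; this only illustrates the "unified, using some rules" step.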



Re: [HACKERS] Branch created, let the experiment begin ...

2010-07-14 Thread Joshua D. Drake
On Fri, 2010-07-09 at 00:58 -0300, Marc G. Fournier wrote:
> As decided at this year's hackers conference, we are branching 
> REL9_0_STABLE *before* the release, instead of after.
> 
> The hope is that we won't be taking away resources from finishing the 
> release, but still allow ppl to continue to work on projects that are for 
> 9.1.
> 
> The branch is now created.

Is Git now the authoritative source?

JD


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering




Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Kevin Grittner
Peter Eisentraut  wrote:
 
> Here is a proof of concept for per-column collation support.
 
Did you want a WIP review of that patch?  (CF closing to new
submissions soon)
 
-Kevin



Re: [HACKERS] standard_conforming_strings

2010-07-14 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 29, 2010 at 10:02 PM, Josh Berkus  wrote:
>>> An actual plan here might look like "let's flip it before 9.1alpha1
>>> so we can get some alpha testing cycles on it" ...
>>
>> "Hey, let's flip it in 9.1 CF 1, so that we can have some alpha testing
>> cycles on it."

> Should we do this?  Patch attached.

I'm still scared to death of the security implications, but if we don't
do it now, when will be a better time?  Might as well try it and see
what breaks.  Note there had better be a large compatibility warning in
the alpha release notes, and we had better pester driver authors to test
with 9.1alpha1 as soon as it's out.

regards, tom lane
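The effect of flipping the default can be sketched with a toy model (an illustration of the setting's documented behavior, not PostgreSQL's actual lexer; only the \n escape is modeled):

```python
def interpret_literal(body, standard_conforming_strings):
    """Interpret the body of an ordinary '...' string literal.

    With the flag on (the proposed 9.1 default), backslashes are
    ordinary characters, per the SQL standard; with it off, backslash
    escapes are processed (only \\n modeled here)."""
    if standard_conforming_strings:
        return body
    return body.replace("\\n", "\n")

# Under the historical default, 'a\nb' contains a real newline...
assert interpret_literal("a\\nb", False) == "a\nb"
# ...under the new default it is the four characters a, \, n, b.
assert interpret_literal("a\\nb", True) == "a\\nb"
```

Applications relying on escape processing can keep it by writing E'...' literals, which behave the same under either setting.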



Re: [HACKERS] Synchronous replication

2010-07-14 Thread Robert Haas
On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao  wrote:
> The patch has no features for performance improvement of synchronous
> replication. I admit that currently the performance overhead in the
> master is terrible. We need to address the following TODO items in the
> subsequent CF.
>
> * Change the poll loop in the walsender
> * Change the poll loop in the backend
> * Change the poll loop in the startup process
> * Change the poll loop in the walreceiver
> * Perform the WAL write and replication concurrently
> * Send WAL from not only disk but also WAL buffers

I have a feeling that if we don't have a design for these last two
before we start committing things, we're possibly going to regret it
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Robert Haas
2010/7/14 Tom Lane :
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it.

Indexes aren't free, though, nor even close to it.

Still, I think we should figure out the underlying mechanism first and
then design the interface afterwards.  One idea I had was a way to say
"compute the MCVs and histogram buckets for this table WHERE
".  If you can prove predicate for a particular query, you
can use the more refined statistics in place of the full-table
statistics.  This is fine for the breast cancer case, but not so
useful for the zip code/street name case (which seems to be the really
tough one).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
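The per-predicate statistics idea above can be sketched as follows (the data, predicate, and function name are invented for illustration):

```python
from collections import Counter

def partial_mcvs(rows, predicate, column, k=3):
    """Most-common values of `column`, computed only over rows that
    satisfy `predicate`.  A planner could use these refined statistics
    for any query whose WHERE clause provably implies the predicate."""
    values = [row[column] for row in rows if predicate(row)]
    return Counter(values).most_common(k)

rows = [
    {"state": "CA", "city": "LA"},
    {"state": "CA", "city": "LA"},
    {"state": "CA", "city": "SF"},
    {"state": "NY", "city": "NYC"},
]

# MCVs of `city` restricted to the rows where state = 'CA'.
assert partial_mcvs(rows, lambda r: r["state"] == "CA", "city") == \
    [("LA", 2), ("SF", 1)]
```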



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 04:41:01PM +0200, PostgreSQL - Hans-Jürgen Schönig wrote:
> hello ...
> 
> look at the syntax i posted in more detail:
> 
> >>ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
> >> y.id2)
> > 
> it says X and Y ...
> the selectivity of joins are what i am most interested in. cross correlation 
> of columns within the same table are just a byproduct.
> the core thing is: how can i estimate the number of rows returned from a join?

All the discussion of this topic that I've seen has been limited to the single
table case. The hard problem in that case is coming up with something you can
precalculate that will actually be useful during query planning, without
taking too much disk, memory, CPU, or something else. Expanding the discussion
to include join relations certainly still has valid use cases, but is even
harder, because you've also got to keep track of precisely how the underlying
relations are joined, so you know in what context the statistics remain valid.
So it makes sense to tackle the single table version first. Once it's
implemented somehow, and has been proven sufficiently effective to merit the
increased code size and complexity, we can consider expanding it to joined
relations.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] five-key syscaches

2010-07-14 Thread Robert Haas
On Wed, Jul 14, 2010 at 10:56 AM, Yeb Havinga  wrote:
> Robert Haas wrote:
>> On Wed, Jul 14, 2010 at 7:27 AM, Yeb Havinga  wrote:
>>> Attached is a new patch with all things described above addressed.
>> Thanks!
>>
>> I think we should probably hold off applying this until some of the
>> other KNNGIST work is ready, or we have some other concrete need for
>> 5-key syscaches.
>
> Any thoughts about the << 16 and >> 16 bit shifting on the 5th hash key
> computation? I blithely copied it from the 3rd key.

Hmm, I thought I had the bit in my version, but I see that I don't.
Must have gotten lost from an earlier incarnation.  It's probably bad
to duplicate the bit-shifting pattern of an existing key.  We might
want to shift by something that's not a multiple of 8, like 12/20.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
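The concern about reusing an existing key's shift can be seen in a toy model of the hash combination (a sketch of the rotate-and-XOR pattern under discussion, not the actual catcache code; shift amounts are illustrative):

```python
MASK = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def combine(h1, h2, h3, h4, h5, fifth_shift):
    # Each key's hash is rotated by a distinct amount before XOR so that
    # equal values in different key positions don't produce collisions.
    return (h1 ^ rotl32(h2, 8) ^ rotl32(h3, 16) ^ rotl32(h4, 24)
            ^ rotl32(h5, fifth_shift)) & MASK

h = 0x12345678
# If the 5th key reuses the 3rd key's rotation (16), moving a value from
# key position 3 to position 5 yields the *same* combined hash:
assert combine(0, 0, h, 0, 0, 16) == combine(0, 0, 0, 0, h, 16)
# A different amount, e.g. 12, keeps the two positions distinguishable:
assert combine(0, 0, h, 0, 0, 12) != combine(0, 0, 0, 0, h, 12)
```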



Re: [HACKERS] standard_conforming_strings

2010-07-14 Thread Bruce Momjian
Robert Haas wrote:
> On Fri, Jan 29, 2010 at 10:02 PM, Josh Berkus  wrote:
> >> An actual plan here might look like "let's flip it before 9.1alpha1
> >> so we can get some alpha testing cycles on it" ...
> >
> > "Hey, let's flip it in 9.1 CF 1, so that we can have some alpha testing
> > cycles on it."
> 
> Should we do this?  Patch attached.

+1

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [HACKERS] suppress automatic recovery after back crash

2010-07-14 Thread Robert Haas
On Wed, Jul 14, 2010 at 3:41 AM, Fujii Masao  wrote:
> I read the patch and found some small typos.

Thanks.  Corrected version attached.

> We should add something like?:
>
> -
> Even if this value is set to true, a backend crash during hot standby doesn't
> reinitialize the database.
> -

Is that actually true? AFAICS, RecoveryError only gets set if the
*startup* process crashes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1aff181..7eb6521 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,47 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 

 
+   
+Error Handling
+
+
+
+ 
+  exit_on_error (boolean)
+  
+   exit_on_error configuration parameter
+  
+  
+   
+If true, any error will terminate the current session.  By default,
+this is set to false, so that only FATAL errors will terminate the
+session.
+   
+  
+ 
+
+ 
+  automatic_restart (boolean)
+  
+   automatic_restart configuration parameter
+  
+  
+   
+When set to true, which is the default, PostgreSQL
+will automatically reinitialize after a backend crash.  Leaving this
+value set to true is normally the best way to maximize the availability
+of the database.  However, in some circumstances, such as when
+PostgreSQL is being invoked by clusterware, it may be
+useful to disable this behavior, so that the clusterware can gain
+control and take any actions it deems appropriate.
+   
+  
+ 
+
+
+
+   
+

 Preset Options
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 11f5022..7e911c1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -203,6 +203,7 @@ bool		Db_user_namespace = false;
 
 bool		enable_bonjour = false;
 char	   *bonjour_name;
+bool		automatic_restart = true;
 
 /* PIDs of special child processes; 0 when not running */
 static pid_t StartupPID = 0,
@@ -3048,12 +3049,12 @@ PostmasterStateMachine(void)
 	}
 
 	/*
-	 * If recovery failed, wait for all non-syslogger children to exit, and
-	 * then exit postmaster. We don't try to reinitialize when recovery fails,
-	 * because more than likely it will just fail again and we will keep
-	 * trying forever.
+	 * If recovery failed, or if automatic restart has been disabled, wait for
+	 * all non-syslogger children to exit, and then exit postmaster. We don't
+	 * try to reinitialize when recovery fails, because more than likely it
+	 * will just fail again and we will keep trying forever.
 	 */
-	if (RecoveryError && pmState == PM_NO_CHILDREN)
+	if (pmState == PM_NO_CHILDREN && (RecoveryError || !automatic_restart))
 		ExitPostmaster(1);
 
 	/*
diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index df597b4..5152b4e 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -16,7 +16,7 @@
 ## if an option is valid but shows up in only one file (guc.c but not
 ## postgresql.conf.sample), it should be listed here so that it 
 ## can be ignored
-INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks exit_on_error \
+INTENTIONALLY_NOT_INCLUDED="autocommit debug_deadlocks \
 is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
 pre_auth_delay role seed server_encoding server_version server_version_int \
 session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e839639..bae5ae9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -548,6 +548,8 @@ const char *const config_group_names[] =
 	gettext_noop("Version and Platform Compatibility / Previous PostgreSQL Versions"),
 	/* COMPAT_OPTIONS_CLIENT */
 	gettext_noop("Version and Platform Compatibility / Other Platforms and Clients"),
+	/* ERROR_HANDLING */
+	gettext_noop("Error Handling"),
 	/* PRESET_OPTIONS */
 	gettext_noop("Preset Options"),
 	/* CUSTOM_OPTIONS */
@@ -811,17 +813,25 @@ static struct config_bool ConfigureNamesBool[] =
 #endif
 		assign_debug_assertions, NULL
 	},
+
 	{
-		/* currently undocumented, so don't show in SHOW ALL */
-		{"exit_on_error", PGC_USERSET, UNGROUPED,
-			gettext_noop("No description available."),
-			NULL,
-			GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE
+		{"exit_on_error", PGC_USERSET, ERROR_HANDLING_OPTIONS,
+			gettext_noop("Terminate session on any error."),
+			NULL
 		},
 		&ExitOnAnyError,
 		false, NULL, NULL
 	},
 	{
+		{"automatic_restart", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
+			gettext_noop("Reinitialize after backend crash."),
+			NULL
+		},
+		&automatic_restart,
+		true, NULL, NULL
+	},
+
+	{
 		{"log_duration", PGC_SUSET, LOG

Re: [HACKERS] five-key syscaches

2010-07-14 Thread Yeb Havinga

Robert Haas wrote:
> On Wed, Jul 14, 2010 at 7:27 AM, Yeb Havinga  wrote:
>> Attached is a new patch with all things described above addressed.
>
> Thanks!
>
> I think we should probably hold off applying this until some of the
> other KNNGIST work is ready, or we have some other concrete need for
> 5-key syscaches.

Any thoughts about the << 16 and >> 16 bit shifting on the 5th hash key
computation? I blithely copied it from the 3rd key.


--
Yeb



[HACKERS] standard_conforming_strings

2010-07-14 Thread Robert Haas
On Fri, Jan 29, 2010 at 10:02 PM, Josh Berkus  wrote:
>> An actual plan here might look like "let's flip it before 9.1alpha1
>> so we can get some alpha testing cycles on it" ...
>
> "Hey, let's flip it in 9.1 CF 1, so that we can have some alpha testing
> cycles on it."

Should we do this?  Patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1aff181..13a39be 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5203,11 +5203,8 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 This controls whether ordinary string literals
 ('...') treat backslashes literally, as specified in
 the SQL standard.
-The default is currently off, causing
-PostgreSQL to have its historical
-behavior of treating backslashes as escape characters.
-The default will change to on in a future release
-to improve compatibility with the SQL standard.
+Beginning in PostgreSQL 9.1, the default is
+on (prior releases defaulted to off).
 Applications can check this
 parameter to determine how string literals will be processed.
 The presence of this parameter can also be taken as an indication
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 9f2c73b..45e0e9e 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -445,16 +445,15 @@ SELECT 'foo'  'bar';
  If the configuration parameter
   is off,
  then PostgreSQL recognizes backslash escapes
- in both regular and escape string constants.  This is for backward
- compatibility with the historical behavior, where backslash escapes
- were always recognized.
- Although standard_conforming_strings currently defaults to
- off, the default will change to on in a future
- release for improved standards compliance.  Applications are therefore
- encouraged to migrate away from using backslash escapes.  If you need
- to use a backslash escape to represent a special character, write the
- string constant with an E to be sure it will be handled the same
- way in future releases.
+ in both regular and escape string constants.  However, as of
+ PostgreSQL 9.1, the default is on, meaning
+ that backslash escapes are recognized only in escape string constants.
+ This behavior is more standards-compliant, but might break applications
+ which rely on the historical behavior, where backslash escapes
+ were always recognized.  As a workaround, you can set this parameter
+ to off, but it is better to migrate away from using backslash
+ escapes.  If you need to use a backslash escape to represent a special
+ character, write the string constant with an E.
 
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 903efee..aec94be 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -52,7 +52,7 @@
  */
 intbackslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
 bool			escape_string_warning = true;
-bool			standard_conforming_strings = false;
+bool			standard_conforming_strings = true;
 
 /*
  * Set the type of YYSTYPE.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e839639..83e8517 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1209,7 +1209,7 @@ static struct config_bool ConfigureNamesBool[] =
 			GUC_REPORT
 		},
 		&standard_conforming_strings,
-		false, NULL, NULL
+		true, NULL, NULL
 	},
 
 	{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a3b1457..9b06cd4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -509,7 +509,7 @@
 #escape_string_warning = on
 #lo_compat_privileges = off
 #sql_inheritance = on
-#standard_conforming_strings = off
+#standard_conforming_strings = on
 #synchronize_seqscans = on
 
 # - Other Platforms and Clients -
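
The behavior change the patch documents can be sketched with a toy model of the
scanner's two modes (an illustration only, with just a few of the historical
escapes shown; the real rules live in scan.l and also cover octal, hex, and
other escapes):

```python
# NOTE: simplified sketch - models how an ordinary '...' literal body is
# interpreted under the two settings. E'...' literals always process
# backslash escapes regardless of this setting.
def parse_literal(body, standard_conforming_strings):
    """Return the string value of an ordinary '...' literal body."""
    if standard_conforming_strings:
        return body  # backslash is just an ordinary character
    # historical mode: process a small subset of the backslash escapes
    escapes = {"n": "\n", "t": "\t", "\\": "\\", "'": "'"}
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(escapes.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

# 'a\nb' yields a literal backslash-n under the new default, an actual
# newline under the old one.
new_default = parse_literal("a\\nb", True)    # 'a\\nb'
old_default = parse_literal("a\\nb", False)   # 'a\nb'
```

This is also why the docs recommend writing the E form when an escape is
intended: E'...' behaves the same under either setting.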



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

look at the syntax i posted in more detail:

>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
>> y.id2)
> 



it says X and Y ...
the selectivity of joins is what i am most interested in. cross correlation of 
columns within the same table is just a byproduct.
the core thing is: how can i estimate the number of rows returned from a join?

an example would be: you have email accounts + messages. you know that each 
row will match in a join, as you can assume that every account will have a 
message.
what we need is a syntax which covers the join case and the case where columns 
inside the same table correlate.
and the fact that an index cannot cover two tables leads me to the conclusion 
that stats on an index are not the solution to the join problem.
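
As a hedged illustration of the estimation problem described above (not
PostgreSQL's actual code - its eqjoinsel logic is considerably more elaborate):
a textbook equijoin estimate multiplies the input sizes by a selectivity
derived from distinct-value counts, and the accounts/messages example works
out exactly when those counts are accurate:

```python
def equijoin_estimate(n1, n_distinct1, n2, n_distinct2):
    """Textbook equijoin cardinality estimate: n1 * n2 * selectivity,
    with selectivity = 1 / max(ndistinct of either side)."""
    return n1 * n2 * (1.0 / max(n_distinct1, n_distinct2))

# accounts joined to messages on account id (sizes assumed for
# illustration): if every message has a matching account, the true
# result size is simply the number of messages, and the formula agrees
# when each side's distinct-count statistic is accurate.
accounts, messages = 1_000, 50_000
estimate = equijoin_estimate(accounts, 1_000, messages, 1_000)  # 50000.0
```

The hard cases are exactly those where the per-column statistics are accurate
but the columns (or tables) are correlated, so no formula over single-column
counts can recover the true size.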

many thanks,

hans


On Jul 14, 2010, at 4:22 PM, Tom Lane wrote:

> PostgreSQL - Hans-Jürgen Schönig writes:
>> i think that having stats on an index is a problem by itself for 2 reasons - 
>> for cross column correlation at least:
> 
>>  a.) joins cannot be covered by an index on two tables - we would fix 
>> "inside a table correlation problems" but not joins.
> 
> Your proposed command didn't cover the two-table case either, and anyway
> what the heck do you mean by cross-correlation across tables?
> Cross-correlation is about the correlation between values in the same
> row.
> 
>>  b.) who says that there is actually an index in place?
> 
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it.  For
> that matter, have you considered the idea of examining the index
> contents to derive the statistics?  Might work better than trying to get
> numbers via ANALYZE.
> 
>   regards, tom lane
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] five-key syscaches

2010-07-14 Thread Robert Haas
On Wed, Jul 14, 2010 at 7:27 AM, Yeb Havinga  wrote:
> Attach is a new patch with all things described above addressed.

Thanks!

I think we should probably hold off applying this until some of the
other KNNGIST work is ready, or we have some other concrete need for
5-key syscaches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Andrew Dunstan



Tom Lane wrote:
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it.

I'm having a hard time imagining an interesting case where that wouldn't
be so.

> For that matter, have you considered the idea of examining the index
> contents to derive the statistics?  Might work better than trying to get
> numbers via ANALYZE.

Sounds like a good idea.

cheers

andrew



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
PostgreSQL - Hans-Jürgen Schönig writes:
> i think that having stats on an index is a problem by itself for 2 reasons - 
> for cross column correlation at least:

>   a.) joins cannot be covered by an index on two tables - we would fix 
> "inside a table correlation problems" but not joins.

Your proposed command didn't cover the two-table case either, and anyway
what the heck do you mean by cross-correlation across tables?
Cross-correlation is about the correlation between values in the same
row.

>   b.) who says that there is actually an index in place?

If the combination of columns is actually interesting, there might well
be an index in place, or the DBA might be willing to create it.  For
that matter, have you considered the idea of examining the index
contents to derive the statistics?  Might work better than trying to get
numbers via ANALYZE.

regards, tom lane



Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom,

i think that having stats on an index is a problem by itself for 2 reasons - 
for cross column correlation at least:

a.) joins cannot be covered by an index on two tables - we would fix 
"inside a table correlation problems" but not joins.
b.) who says that there is actually an index in place? assume you are 
doing some big seq scan to do analytics. you don't want it to be indexed for 10 
different types of queries.

i think it is pretty hard to determine automatically what to collect because we 
cannot know which permutations of cross-column magic people will use.
i was thinking along the lines of having it automatic as well, but i could not 
figure out how to do it.
i think we can suggest additional stats to the user and we can write tools to 
somehow figure out what would be useful, but personally i cannot see anything 
better than a command here.

many thanks,

hans



On Jul 14, 2010, at 4:01 PM, Tom Lane wrote:

> Heikki Linnakangas  writes:
>> On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
>>> maybe somehow like this ...
>>> ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
> 
>> +1 is my general feeling, it's good if you can tell the system to 
>> collect additional statistics where needed.
> 
> The previous discussions about this went in the direction of
> "automatically collect stats if there is an index on that combination of
> columns".  Do we really need a command?
> 
>> However, the problem is how to represent and store the 
>> cross-correlation.
> 
> Yes, whatever the triggering mechanism is for collecting cross-column
> stats, actually doing something useful is the hard part.
> 
>   regards, tom lane
> 


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 01:21:19PM +0200, Yeb Havinga wrote:
> Heikki Linnakangas wrote:
>> However, the problem is how to represent and store the  
>> cross-correlation. For fields with low cardinality, like "gender" and  
>> boolean "breast-cancer-or-not" you can count the prevalence of all the  
>> different combinations, but that doesn't scale. Another often cited  
>> example is zip code + street address. There's clearly a strong  
>> correlation between them, but how do you represent that?
>>
>> For scalar values we currently store a histogram. I suppose we could  
>> create a 2D histogram for two columns, but that doesn't actually help  
>> with the zip code + street address problem.
> In my head the neuron for 'principal component analysis' went on while  
> reading this. Back in college it was used to prepare input data before  
> feeding it into a neural network. Maybe ideas from PCA could be helpful?

I've been playing off and on with an idea along these lines, which builds an
empirical copula[1] to represent correlations between columns where there
exists a multi-column index containing those columns. This copula gets stored
in pg_statistic. There are plenty of unresolved questions (and a crash I
introduced and haven't had time to track down), but the code I've been working
on is here[2] in the multicolstat branch. Most of the changes are in
analyze.c; no user-visible changes have been introduced. For that matter,
there aren't any changes yet actually to use the values once calculated (more
unresolved questions get in the way there), but it's a start.

[1] http://en.wikipedia.org/wiki/Copula_(statistics)
[2] http://git.postgresql.org/gitweb?p=users/eggyknap/postgres.git
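
To give a feel for the idea (a hedged sketch of the general technique only,
not the representation the multicolstat branch actually stores in
pg_statistic): an empirical copula ranks each value within its column, scales
the ranks to (0, 1], and counts the pairs into a coarse grid, so dependence
between the columns shows up as mass concentrated away from uniform:

```python
def empirical_copula(xs, ys, bins=4):
    """Count pairs of scaled within-column ranks into a bins x bins grid.

    Independent columns give a roughly uniform grid; correlated columns
    concentrate counts (e.g. on the diagonal for monotone dependence).
    """
    n = len(xs)

    def scaled_ranks(vals):
        order = sorted(range(n), key=lambda i: vals[i])
        r = [0.0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank / n  # scaled rank in (0, 1]
        return r

    rx, ry = scaled_ranks(xs), scaled_ranks(ys)
    grid = [[0] * bins for _ in range(bins)]
    for u, v in zip(rx, ry):
        grid[min(int(u * bins), bins - 1)][min(int(v * bins), bins - 1)] += 1
    return grid

# perfectly correlated columns put every pair on the diagonal
grid = empirical_copula(list(range(8)), list(range(8)))
```

Because the copula works on ranks rather than raw values, it is insensitive to
each column's marginal distribution, which is one reason it is attractive for
cross-column statistics.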

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
Heikki Linnakangas  writes:
> On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
>> maybe somehow like this ...
>> ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

> +1 is my general feeling, it's good if you can tell the system to 
> collect additional statistics where needed.

The previous discussions about this went in the direction of
"automatically collect stats if there is an index on that combination of
columns".  Do we really need a command?

> However, the problem is how to represent and store the 
> cross-correlation.

Yes, whatever the triggering mechanism is for collecting cross-column
stats, actually doing something useful is the hard part.

regards, tom lane



Re: [HACKERS] patch: preload dictionary new version

2010-07-14 Thread Pavel Stehule
Hello

this patch is a significantly reduced version of the original patch. It
doesn't propose a simple allocator - it just eliminates the high memory
usage of the ispell dictionary.

without this patch the ispell dictionary takes 55MB in the tsearch2
context and 27MB in the temp context. With this patch it takes "only"
25MB in the tsearch2 context and 19MB in the temp context - this is for
the Czech dictionary with UTF8 encoding. The patch is a little bit ugly
- that was the reason why I proposed a simple allocator - but it cuts
memory usage to about 53% of the original, and startup improves from
620 to 560 ms, roughly 10% faster. A little bit strange is the repeated
load time - it goes down from 18ms to 5ms.

Regards

Pavel Stehule


lessmem.diff
Description: Binary data



Re: [HACKERS] five-key syscaches

2010-07-14 Thread Yeb Havinga

Hello Robert,

As part of the current reviewfest, I reviewed your patch, and made some 
changes on the way.


This was all ok:

*) while proofreading I did not find typos other than the one that 
Joachim had already pointed out.
*) the addition of 5-key lookups to the existing ones seems a natural 
extension, and the best way to solve finding the index that 
can-order-by-op needed for the knngist. Solutions were debated in a 
relatively long thread 'knngist patch support', where the first 
reference to four columns being too few was in 
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01071.php

*) regression test ok
*) performance: comparing make check speeds with and without patch did 
not reveal significant differences.


The changes:

*) since the API of the syscache functions is changed, one would expect 
a lot of compile errors, but none were found. The patch of 
http://archives.postgresql.org/pgsql-committers/2010-02/msg00174.php 
that introduced macros around the base functions made that possible. 
Only two calls in contrib/tsearch2 were overlooked.
*) after changing the calls in contrib/tsearch2, everything compiled and 
installchecked ok
*) I also removed a few unneeded includes of syscache.h from some 
contrib modules
*) In syscache.c the cachedesc structure has a key array that is 
increased from 4 to CATCACHE_MAXKEYS. However, each element of the 
cacheinfo[] array still has 4 attribute numbers listed, so the 5th 
element is undefined. To not rely on compiler or platform and for code 
uniformity I changed all syscaches to have 5 attribute numbers.
*) To test the new functions I added an extra syscache and performed a 5 
key lookup. This gave the following error FATAL:  wrong number of hash 
keys: 5 in CatalogCacheComputeHashValue. I changed that as well, but 
somebody with intimate knowledge of hash algorithms should probably 
decide which bit-shifting on the key values is appropriate. It currently 
does the same as key 3: hashValue ^= oneHash << 16; hashValue ^= oneHash >> 16;
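
For readers not near the source, here is a hedged sketch of the key-combining
step (an assumption based on the shift pattern in catcache.c of this era; the
real code first hashes each Datum with a type-specific hash function, which is
omitted here). XORing `oneHash << 16` and `oneHash >> 16` into the accumulator
amounts to a 16-bit rotation in 32-bit arithmetic, so copying key 3's shifts
gives keys 3 and 5 the same rotation:

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """32-bit rotate left; h ^= (x << n) and h ^= (x >> (32 - n))
    together perform exactly this when high bits are discarded."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def combine_hashes(key_hashes):
    """Fold up to five per-key hash values into one bucket hash.

    Keys 1..4 are rotated by 0, 8, 16 and 24 bits before being XORed
    in; a hypothetical 5th key that copies key 3's shifts is rotated
    by 16 again.
    """
    shifts = [0, 8, 16, 24, 16]
    h = 0
    for pos, one_hash in enumerate(key_hashes):
        h ^= rotl32(one_hash & MASK32, shifts[pos])
    return h

# The concern in a nutshell: equal hash values in key slots 3 and 5
# cancel under XOR, so those two keys stop influencing the bucket.
collision = combine_hashes([0, 0, 12345, 0, 12345])  # == 0
```

A distinct rotation for the 5th key (anything other than 0, 8, 16, 24) would
avoid this cancellation, which is presumably what a reviewer with hashing
expertise should decide.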


I tested a negative and a positive search with SearchSysCacheExists5, which 
were executed as expected. Regression test still ok.


Attached is a new patch with all the things described above addressed.

regards,
Yeb Havinga


Robert Haas wrote:
> On Mon, Mar 29, 2010 at 4:21 AM, Joachim Wieland  wrote:
>> On Mon, Mar 29, 2010 at 12:32 AM, Robert Haas  wrote:
>>> Per previous discussion, PFA a patch to change the maximum number of
>>> keys for a syscache from 4 to 5.
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01105.php
>>>
>>> This is intended for application to 9.1, and is supporting
>>> infrastructure for knngist.
>>
>> It looks like there should be a 5 rather than a 4 for nkeys of
>> SearchSysCacheList().
>>
>> +#define SearchSysCacheList5(cacheId, key1, key2, key3, key4, key5) \
>> +   SearchSysCacheList(cacheId, 4, key1, key2, key3, key4, key5)
>
> Good catch.  Will fix.
>
> ...Robert

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 82b0730..e803652 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,7 +63,6 @@
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
-#include "utils/syscache.h"
 #include "utils/tqual.h"
 
 #include "dblink.h"
diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index c76e6cc..906d38d 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -11,7 +11,6 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
-#include "utils/syscache.h"
 #include "ltree.h"
 
 PG_MODULE_MAGIC;
diff --git a/contrib/tsearch2/tsearch2.c b/contrib/tsearch2/tsearch2.c
index d421f77..224563a 100644
--- a/contrib/tsearch2/tsearch2.c
+++ b/contrib/tsearch2/tsearch2.c
@@ -172,9 +172,8 @@ tsa_set_curdict(PG_FUNCTION_ARGS)
 {
Oid dict_oid = PG_GETARG_OID(0);
 
-   if (!SearchSysCacheExists(TSDICTOID,
-                             ObjectIdGetDatum(dict_oid),
-                             0, 0, 0))
+   if (!SearchSysCacheExists1(TSDICTOID,
+                              ObjectIdGetDatum(dict_oid)))
elog(ERROR, "cache lookup failed for text search dictionary %u",
 dict_oid);
 
@@ -211,9 +210,8 @@ tsa_set_curprs(PG_FUNCTION_ARGS)
 {
Oid parser_oid = PG_GETARG_OID(0);
 
-   if (!SearchSysCacheExists(TSPARSEROID,
-                             ObjectIdGetDatum(parser_oid),
-                             0, 0, 0))
+   if (!SearchSysCacheExists1(TSPARSEROID,
+                              ObjectIdGetDatum(parser_oid)))
elog(ERROR, "cache lookup failed for text search parser %u",
 parser_oid);
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
i

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Yeb Havinga

Heikki Linnakangas wrote:
> However, the problem is how to represent and store the
> cross-correlation. For fields with low cardinality, like "gender" and
> boolean "breast-cancer-or-not" you can count the prevalence of all the
> different combinations, but that doesn't scale. Another often cited
> example is zip code + street address. There's clearly a strong
> correlation between them, but how do you represent that?
>
> For scalar values we currently store a histogram. I suppose we could
> create a 2D histogram for two columns, but that doesn't actually help
> with the zip code + street address problem.

In my head the neuron for 'principal component analysis' went on while 
reading this. Back in college it was used to prepare input data before 
feeding it into a neural network. Maybe ideas from PCA could be helpful?
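
As a hedged aside on what even a single stored number can capture (this is
plain Pearson correlation, the simplest linear-dependence summary and a
building block of PCA's covariance analysis - not anything any patch in this
thread implements):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5
```

The limitation is the same one raised for histograms: a scalar coefficient
only measures linear dependence, and the zip code + street address kind of
correlation is not linear, which is what motivates looking at richer
structures such as copulas or PCA-style decompositions.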


regards,
Yeb Havinga





Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig

On Jul 14, 2010, at 12:40 PM, Heikki Linnakangas wrote:

> On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
>> hello everybody,
>> 
>> we are currently facing some serious issues with cross correlation issue.
>> consider: 10% of all people have breast cancer. we have 2 genders (50:50).
>> if i select all the men with breast cancer, i will get basically nobody - 
>> the planner will overestimate the output.
>> this is the commonly known problem ...
>> 
>> this cross correlation problem can be quite nasty in many many cases.
>> underestimated nested loops can turn joins into a never ending nightmare and 
>> so on and so on.
>> 
>> my ideas is the following:
>> what if we allow users to specifiy cross-column combinations where we keep 
>> separate stats?
>> maybe somehow like this ...
>> 
>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
>> 
>> or ...
>> 
>>  ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
>> y.id2)
>> 
>> clearly we cannot store correlation for all combinations of all columns so 
>> we somehow have to limit it.
>> 
>> what is the general feeling about something like that?
> 
> +1 is my general feeling, it's good if you can tell the system to collect 
> additional statistics where needed. And once you have that, you can write an 
> agent or something to detect automatically which extra statistics might be 
> useful.
> 


it seems i can leave my bunker where i was hiding for cover when i was waiting 
for a reply ;).
yes, my idea was to have an agent as well - but this is just a follow-up 
problem.


> However, the problem is how to represent and store the cross-correlation. For 
> fields with low cardinality, like "gender" and boolean "breast-cancer-or-not" 
> you can count the prevalence of all the different combinations, but that 
> doesn't scale. Another often cited example is zip code + street address. 
> There's clearly a strong correlation between them, but how do you represent 
> that?


we could tell the same story with a table storing people including their home 
country and the color of their skin.
obviously we will have more black people in african countries ...


> 
> For scalar values we currently store a histogram. I suppose we could create a 
> 2D histogram for two columns, but that doesn't actually help with the zip 
> code + street address problem.
> 


i think we might go for a second relation here specifically for this issue and 
a boolean flag in the current stats table indicating that additional 
correlation stats exist (to avoid an additional lookup unless really necessary).
do you have a useful syntax in mind? the thing is: this issue can be isolated 
inside a table (e.g. WHERE a.id = a.id2 AND a.id3 = a.id4) or it might span two 
tables with an arbitrary number of fields.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Heikki Linnakangas

On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote:
> hello everybody,
>
> we are currently facing some serious issues with cross correlation issue.
> consider: 10% of all people have breast cancer. we have 2 genders (50:50).
> if i select all the men with breast cancer, i will get basically nobody - the
> planner will overestimate the output.
> this is the commonly known problem ...
>
> this cross correlation problem can be quite nasty in many many cases.
> underestimated nested loops can turn joins into a never ending nightmare and so
> on and so on.
>
> my ideas is the following:
> what if we allow users to specifiy cross-column combinations where we keep
> separate stats?
> maybe somehow like this ...
>
> ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)
>
> or ...
>
> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>
> clearly we cannot store correlation for all combinations of all columns so we
> somehow have to limit it.
>
> what is the general feeling about something like that?


+1 is my general feeling, it's good if you can tell the system to 
collect additional statistics where needed. And once you have that, you 
can write an agent or something to detect automatically which extra 
statistics might be useful.


However, the problem is how to represent and store the 
cross-correlation. For fields with low cardinality, like "gender" and 
boolean "breast-cancer-or-not" you can count the prevalence of all the 
different combinations, but that doesn't scale. Another often cited 
example is zip code + street address. There's clearly a strong 
correlation between them, but how do you represent that?


For scalar values we currently store a histogram. I suppose we could 
create a 2D histogram for two columns, but that doesn't actually help 
with the zip code + street address problem.
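
The breast-cancer example at the top of the thread makes the payoff of stored
joint counts concrete; a sketch (the per-combination counts are assumed for
illustration) comparing the independence-assumption estimate with one read
from the joint distribution:

```python
# hypothetical per-combination counts for a 100,000-row table with
# 50/50 gender and 10% cancer prevalence, almost entirely female
counts = {("F", True): 9_900, ("F", False): 40_100,
          ("M", True): 100,   ("M", False): 49_900}
rows = sum(counts.values())                                   # 100_000

n_male = sum(c for (g, _), c in counts.items() if g == "M")   # 50_000
n_cancer = sum(c for (_, s), c in counts.items() if s)        # 10_000

# planner-style estimate treating the two columns as independent:
independent_est = rows * (n_male / rows) * (n_cancer / rows)  # 5000.0
# estimate read straight from the stored joint ("2D") counts:
joint_est = counts[("M", True)]                               # 100
```

The independence estimate is off by a factor of 50 here, which is exactly the
kind of error that turns a nested loop into a disaster. And as noted, this
per-combination counting only works for low-cardinality columns; it does not
extend to the zip code + street address case.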


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we are currently facing some serious issues with cross correlation.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody - the 
planner will overestimate the output.
this is the commonly known problem ...

this cross correlation problem can be quite nasty in many many cases.
underestimated nested loops can turn joins into a never ending nightmare and so 
on and so on.

my idea is the following:
what if we allow users to specify cross-column combinations where we keep 
separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = 
y.id2)

clearly we cannot store correlation for all combinations of all columns so we 
somehow have to limit it.

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] bg worker: overview

2010-07-14 Thread Markus Wanner
Hi,

On 07/13/2010 08:45 PM, Kevin Grittner wrote:
> You could submit them as Work In Progress patches

Okay, I added them. I guess they get more attention that way.

Regards

Markus



Re: [HACKERS] suppress automatic recovery after back crash

2010-07-14 Thread Fujii Masao
On Mon, Jun 28, 2010 at 9:09 PM, Robert Haas  wrote:
> I mulled over which of those names was better; updated version,
> reflecting your proposed naming, attached.

I read the patch and found some small typos.

> +If true, any error will terminate the current session.  Normally,
> +this is set to false, so that only FATAL errors will terminate the

"s/Normally/By default" seems better.

> +When set to true, which is the default, PostgreSQL
> +will automatically reinitialize after a backend crash.  Leaving this
> +value set to true is normally the best way to maximize the 
> availability
> +of the database.  However, in some circumstances, such as when
> +PostgreSQL is being invoked by clusterware, it may be
> +useful to disable this behavior, so that the clusterware can gain
> +control and take any actions it deems appropriate.

We should add something like this:

-
Even if this value is set to true, a backend crash during hot standby doesn't
reinitialize the database.
-

> + /* ERROR_HANDING */
> + gettext_noop("Error Handling"),

You seem to have forgotten to reflect Tom's proposal here.

>  
> #--
> +# ERROR HANDING
> +#--

Typo: s/HANDING/HANDLING

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
