Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Cesar Suga

Hi,

I also wrote Bruce about that.

It happens that, if you 'freely advertise' commercial solutions (rather 
than letting them do so through other channels), you will forever be the 
'updater' of the docs whenever they change their product lines, their 
business model, and so on.


If you cite one commercial solution, to be fair you should cite *all* 
of them. If one company has the right to be listed in the 
documentation, all of them do, since you should never favour one over 
the others.


That's the main motivation for writing this. Moreover, if commercial 
solutions for high-end installs are cited as the providers of those 
capabilities, it (to a point) discourages people from organizing and 
writing open source extensions to PostgreSQL.


As Bruce asked: should the documentation then cover EnterpriseDB's 
Oracle functions? Should PostgreSQL also ship with them? Wouldn't it be 
painful to write, say, another description for an alternative product 
if one arises besides EnterpriseDB?


People who read the documentation and work with PostgreSQL 
professionally have probably already been briefed on those commercial 
offerings in some way.


I think only the source and its tightly coupled components (read: those 
that can compile along with it and are as free as PostgreSQL) should be 
packaged into the tarball.


However, I find Bruce's unofficial wiki idea a good one for comparisons.

Regards,
Cesar

Steve Atkins wrote:
> On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote:
>> Steve Atkins wrote:
>>>> If we are to add them, I need to hear that from people who haven't
>>>> worked in PostgreSQL commercial replication companies.
>>>
>>> I'm not coming to PostgreSQL for open source solutions. I'm coming
>>> to PostgreSQL for _good_ solutions.
>>>
>>> I want to see what solutions might be available for a problem I have.
>>> I certainly want to know whether they're freely available, commercial
>>> or some flavour of open source, but I'd like to know about all of them.
>>>
>>> A big part of the value of PostgreSQL is the applications and extensions
>>> that support it. Hiding the existence of some subset of those just
>>> because of the way they're licensed is both underselling PostgreSQL
>>> and doing something of a disservice to the user of the document.
>>
>> OK, does that mean we mention EnterpriseDB in the section about Oracle
>> functions?  Why not mention MS SQL if they have a better solution?  I
>> just don't see where that line can clearly be drawn on what to include.
>> Do we mention Netezza, which is loosely based on PostgreSQL?  It just
>> seems very arbitrary to include commercial software.  If someone wants
>> to put it on a wiki, I think that would be fine because that doesn't
>> seem as official.
>
> Good question. The line needs to be drawn somewhere. It's basically
> your judgement, tempered by other people's feedback, though. If it
> were me, I'd ask myself "Would I mention this product if it were open
> source? Would mentioning it help people using the document?".
>
> Cheers,
>   Steve


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org






Re: [HACKERS] Reg external sorting algorithm

2006-10-24 Thread Tom Lane
Praveen Kumar N <[EMAIL PROTECTED]> writes:
> Can anybody tell me the computational complexity of the external
> sorting algorithm used by Postgres, in terms of time and space?

See the comments at the head of tuplesort.c:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/sort/tuplesort.c

regards, tom lane



[HACKERS] Reg external sorting algorithm

2006-10-24 Thread Praveen Kumar N

Hi,
	Can anybody tell me the computational complexity of the external 
sorting algorithm used by Postgres, in terms of time and space?  One more 
question: how does the number of disk I/Os vary with jsf and with the size 
of the data when using external sorting algorithms to sort (I mean, is the 
relationship linear, or logarithmic)?


Thanks in anticipation.

Regards,
 N Praveen Kumar




Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Tom Lane
"Gurjeet Singh" <[EMAIL PROTECTED]> writes:
> On 10/23/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> I didn't particularly trust the timing calculations in your benchmark
>> program,

> Any particular reason? (why and what did you doubt in it?).

Well, the specific thing that set off my bogometer was

#define TV_DIFF_MILLI(tv1, tv2) \
	((tv2.tv_sec*1000+((tv2.tv_usec)/1000))-(tv1.tv_sec*1000+((tv1.tv_usec)/1000)))

which is going to have overflow problems on any platform where tv_sec
isn't a 64-bit type (which is still all of 'em AFAIK).  But more
generally, your test is timing a CRC across 100 4Kb segments, which
isn't representative of PG's usage of CRCs.  I don't think there are
any XLogWrite calls that have more than about 5 segments, and in most
cases those segments contain a few dozen bytes not a few K.  So you
need to be looking at much shorter loop runs.

The test case I proposed uses timing code that I trusted (borrowed from
long-established coding in postgres.c), and tests loop lengths that are
somewhat representative for PG, but it is still biased in favor of slice8
because it repeats the CRC calculations consecutively without any other
activity --- presumably this fact creates a bias for a method that needs
more L2 cache space over one that doesn't need so much.  I'd have tried
harder to make an unbiased test case if this version had showed slice8 as
competitive, but so far it seems that on a majority of current CPUs and
compilers it's not competitive.

regards, tom lane



Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Steve Atkins


On Oct 24, 2006, at 9:20 PM, Bruce Momjian wrote:
> Steve Atkins wrote:
>>> If we are to add them, I need to hear that from people who haven't
>>> worked in PostgreSQL commercial replication companies.
>>
>> I'm not coming to PostgreSQL for open source solutions. I'm coming
>> to PostgreSQL for _good_ solutions.
>>
>> I want to see what solutions might be available for a problem I have.
>> I certainly want to know whether they're freely available, commercial
>> or some flavour of open source, but I'd like to know about all of them.
>>
>> A big part of the value of PostgreSQL is the applications and extensions
>> that support it. Hiding the existence of some subset of those just
>> because of the way they're licensed is both underselling PostgreSQL
>> and doing something of a disservice to the user of the document.
>
> OK, does that mean we mention EnterpriseDB in the section about Oracle
> functions?  Why not mention MS SQL if they have a better solution?  I
> just don't see where that line can clearly be drawn on what to include.
> Do we mention Netezza, which is loosely based on PostgreSQL?  It just
> seems very arbitrary to include commercial software.  If someone wants
> to put it on a wiki, I think that would be fine because that doesn't
> seem as official.

Good question. The line needs to be drawn somewhere. It's basically
your judgement, tempered by other people's feedback, though. If it
were me, I'd ask myself "Would I mention this product if it were open
source? Would mentioning it help people using the document?".

Cheers,
  Steve




Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Steve Atkins wrote:
> > If we are to add them, I need to hear that from people who haven't
> > worked in PostgreSQL commercial replication companies.
> 
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
> 
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
> 
> A big part of the value of PostgreSQL is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling PostgreSQL
> and doing something of a disservice to the user of the document.

OK, does that mean we mention EnterpriseDB in the section about Oracle
functions?  Why not mention MS SQL if they have a better solution?  I
just don't see where that line can clearly be drawn on what to include.
Do we mention Netezza, which is loosely based on PostgreSQL?  It just
seems very arbitrary to include commercial software.  If someone wants
to put it on a wiki, I think that would be fine because that doesn't
seem as official.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Steve Atkins


On Oct 24, 2006, at 8:48 PM, Bruce Momjian wrote:
> Joshua D. Drake wrote:
>> Josh Berkus wrote:
>>> Bruce,
>>>
>>>> I have updated the text.  Please let me know what else I should change.
>>>> I am unsure if I should be mentioning commercial PostgreSQL products in
>>>> our documentation.
>>>
>>> I think you should mention the postgresql-only ones, but just briefly
>>> with a link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
>>
>> And to further this I would expect that it would be a subsection.. e.g;
>> a  or . I think the open source version should absolutely
>> get top billing though.
>
> I am not inclined to add commercial offerings.  If people wanted
> commercial database offerings, they can get them from companies that
> advertise.  People are coming to PostgreSQL for open source solutions,
> and I think mentioning commercial ones doesn't make sense.
>
> If we are to add them, I need to hear that from people who haven't
> worked in PostgreSQL commercial replication companies.

I'm not coming to PostgreSQL for open source solutions. I'm coming
to PostgreSQL for _good_ solutions.

I want to see what solutions might be available for a problem I have.
I certainly want to know whether they're freely available, commercial
or some flavour of open source, but I'd like to know about all of them.

A big part of the value of PostgreSQL is the applications and extensions
that support it. Hiding the existence of some subset of those just
because of the way they're licensed is both underselling PostgreSQL
and doing something of a disservice to the user of the document.

Cheers,
  Steve



Re: [HACKERS] materialised view

2006-10-24 Thread Jaime Casanova

On 10/24/06, rajesh boppana <[EMAIL PROTECTED]> wrote:
> I want to implement materialized views in PostgreSQL. To do so I want
> to modify the backend code, but I don't know which files I have to
> modify, so please help me by pointing me at the relevant backend code.



http://www.postgresql.org/developer/coding

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook



[HACKERS] materialised view

2006-10-24 Thread rajesh boppana
I want to implement materialized views in PostgreSQL. To do so I want to modify the backend code, but I don't know which files I have to modify, so please help me by pointing me at the relevant backend code.


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Joshua D. Drake wrote:
> Josh Berkus wrote:
> > Bruce,
> > 
> >> I have updated the text.  Please let me know what else I should change.
> >> I am unsure if I should be mentioning commercial PostgreSQL products in
> >> our documentation.
> > 
> > I think you should mention the postgresql-only ones, but just briefly
> > with a link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.
> 
> And to further this I would expect that it would be a subsection.. e.g;
> a  or . I think the open source version should absolutely
> get top billing though.

I am not inclined to add commercial offerings.  If people wanted
commercial database offerings, they can get them from companies that
advertise.  People are coming to PostgreSQL for open source solutions,
and I think mentioning commercial ones doesn't make sense.

If we are to add them, I need to hear that from people who haven't
worked in PostgreSQL commercial replication companies.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Is this possible?  It would be very fast.

It's possible but not exactly simple.  As an example, your proposed
plan:

> Limit (50)
>   Sort (key: pse_lastlogin)
>     Result
>       Append
>         Limit (50)
>           SeqScan tbl_profile_search
>         Limit (50)
>           IndexScan tbl_profile_search_interest_1
>         Limit (50)
>           IndexScan on the index mentioned above

is wrong because there's no guarantee that the first 50 elements of a
seqscan will be anything special.  You could imagine dealing with that
by sorting the seqscan results and limiting to 50, or by not
sorting/limiting that data at all but letting the upper sort see all the
seqscan entries.  Offhand I think either of those could win depending on
how many elements the seqscan will yield.  Also, it might be interesting
to consider inventing a "merge" plan node type that takes N
already-sorted inputs and produces a sorted output stream.  Then we'd
need to trade off this approach versus doing the top-level sort, which
could cope with some of its inputs not being pre-sorted.

This seems to have some aspects in common with the recent discussion
about how to optimize min/max aggregates across an appendrel set.

regards, tom lane



Re: [HACKERS] [JDBC] server process (PID 1188) exited with exit code

2006-10-24 Thread Oliver Jowett

Tom Lane wrote:
> NULL,	/* let the backend deduce param type */

I think the JDBC driver will be passing the int4 OID for the param type
in this case.

Best thing is probably for the OP to run with loglevel=2 and see exactly
what's being sent, though.


-O



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Josh Berkus wrote:
> Bruce,
> 
>> I have updated the text.  Please let me know what else I should change.
>> I am unsure if I should be mentioning commercial PostgreSQL products in
>> our documentation.
> 
> I think you should mention the postgresql-only ones, but just briefly with a 
> link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

And to further this I would expect that it would be a subsection.. e.g;
a  or . I think the open source version should absolutely
get top billing though.

Sincerely,

Joshua D. Drake




-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Josh Berkus
Bruce,

> I have updated the text.  Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I think you should mention the postgresql-only ones, but just briefly with a 
link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Joshua D. Drake wrote:
> 
> > Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> > admit that we simply can not 'list .. replication solutions ... and how
> > to get them' because all of the solutions mentioned need quite some
> > knowledge and require a more or less complex installation and
> > configuration.
> 
> There is also the question if we should have a sub section:
> 
> Closed Source replication solutions:
> 
> Mammoth Replicator
> Continuent P/Cluster
> ExtenDB
> Greenplum MPP (although this is kind of horizontal partitioning)

I vote no.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

I don't think the PostgreSQL documentation should be mentioning
commercial solutions.

---

Luke Lonergan wrote:
> Bruce, 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> > Sent: Tuesday, October 24, 2006 5:16 PM
> > To: Hannu Krosing
> > Cc: PostgreSQL-documentation; PostgreSQL-development
> > Subject: Re: [HACKERS] Replication documentation addition
> > 
> > 
> > OK, I have updated the URL.  Please let me know how you like it.
> 
> There's a typo on line 8, first paragraph:
> 
> "perhaps with only one server allowing write rwork together at the same
> time."
> 
> Also, consider this wording of the last description:
> 
> "Single-Query Clustering..."
> 
> Replaced by:
> 
> "Shared Nothing Clustering
> ---
> 
> This allows multiple servers with separate disks to work together on
> each query.  In shared nothing clusters, the work of answering each
> query is distributed among the servers to increase performance through
> parallelism.  These systems will typically feature high availability
> by using other forms of replication internally.
> 
> While there are no open source options for this type of clustering,
> there are several commercial products available that implement this
> approach, making PostgreSQL achieve very high performance for
> multi-terabyte business intelligence databases."
> 
> - Luke

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Markus Schiltknecht wrote:
> Looking at that, I'm a) missing PgCluster and b) arguing that we have to 
> admit that we simply can not 'list .. replication solutions ... and how 
> to get them' because all of the solutions mentioned need quite some 
> knowledge and require a more or less complex installation and configuration.

Where is pgcluster in terms of usability?  Should I mention it?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

I have updated the text.  Please let me know what else I should change. 
I am unsure if I should be mentioning commercial PostgreSQL products in
our documentation.

---

Hannu Krosing wrote:
> On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
> 
> -- 
> 
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 
> 
> 
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [DOCS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> > 
> > Comments welcomed.
> 
> It's a very good start to a complete minefield of competing solutions.
> 
> My first thought would be to differentiate between clustering and
> replication, which will bring out many differences.

I have gone with "fail-over" and "load balancing" in the updated text.

> My second thought would be to differentiate between load balancing,
> multi-threading, parallel query, high availability and recoverability,
> which would probably sort out the true differences in the above mix. But
> that wouldn't help most people and almost everybody would find fault.

Yep.

> IMHO most people I've spoken to take "replication" to mean an HA
> solution, so perhaps we should cover it in those terms.

Yes, I removed any reference to replication.  It seemed too general.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [DOCS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

I have changed the text to reference "fail over" and "load balancing". 
I think it makes it clearer.  Let me know what you think.  I am hesitant
to mention commercial PostgreSQL products in our documentation.

---

Markus Schiltknecht wrote:
> Hello Bruce,
> 
> Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> > 
> > Comments welcomed.
> 
> Thank you, that sounds good. It's targeted to production use and 
> currently available solutions, which makes sense in the official manual.
> 
> You are explaining the sync vs. async categorization, but I sort of 
> asked myself where the explanation of single- vs. multi-master has gone. 
> I then realized that you are talking about read-only and a "read/write 
> mix of servers". Then again, you mention 'Multi-Master Replication' as 
> one type of replication solution. I think we should be consistent in our 
> naming. As Single- and Multi-Master are the more common terms among 
> database replication experts, I'd recommend using them and explaining 
> what they mean instead of introducing new names.
> 
> Along with that, I'd argue that Single- vs. Multi-Master is a 
> categorization just like Sync vs. Async. In that sense, the last chapter 
> should probably be named 'Distributed-Shared-Memory Replication' or 
> something like that instead of 'Multi-Master Replication', because, as we 
> know, there are several ways of doing Multi-Master Replication (Slony-II 
> / Postgres-R, Distributed Shared Memory, 2PC in application code, or the 
> above-mentioned 'Query Broadcast Replication', which would fall into a 
> Multi-Master Replication model as well).
> 
> Also in the last chapter, instead of just saying that "PostgreSQL does 
> not offer this type of replication", we could probably say that 
> different projects are trying to come up with better replication 
> solutions. And there are several proprietary products based on 
> PostgreSQL which do solve some kinds of Multi-Master Replication. Not 
> that I want to advertise for any of them, but it just sounds better than 
> the current "no, we don't offer that".
> 
> As this documentation mainly covers production-quality solutions (which 
> is absolutely perfect), can we document the status of current projects 
> somewhere, probably in a wiki? Or at least mention them somewhere and 
> point to their websites? It would help to get rid of all those rumors 
> and uncertainties. Or are those intentional?
> 
> Just my two cents.
> 
> Regards
> 
> Markus
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [JDBC] server process (PID 1188) exited with exit code -1073741819, 8.2 beta1

2006-10-24 Thread Tom Lane
Dave Cramer <[EMAIL PROTECTED]> writes:
> This is a server bug, I will post to hackers for you,

Please provide a complete test case.  I tried to reproduce the failure
in libpq, with

/* Here is our out-of-line parameter value */
paramValues[0] = "joe's place";

res = PQexecParams(conn,
                   "SELECT $1 FROM (select * from pg_database) t",
                   1,           /* one param */
                   NULL,        /* let the backend deduce param type */
                   paramValues,
                   NULL,        /* don't need param lengths since text */
                   NULL,        /* default to all text params */
                   0);          /* ask for text results */

and got nothing worse than

SELECT failed: ERROR:  could not determine data type of parameter $1


regards, tom lane



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Luke Lonergan
Bruce, 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> Sent: Tuesday, October 24, 2006 5:16 PM
> To: Hannu Krosing
> Cc: PostgreSQL-documentation; PostgreSQL-development
> Subject: Re: [HACKERS] Replication documentation addition
> 
> 
> OK, I have updated the URL.  Please let me know how you like it.

There's a typo on line 8, first paragraph:

"perhaps with only one server allowing write rwork together at the same
time."

Also, consider this wording of the last description:

"Single-Query Clustering..."

Replaced by:

"Shared Nothing Clustering
---

This allows multiple servers with separate disks to work together on
each query.  In shared nothing clusters, the work of answering each
query is distributed among the servers to increase performance through
parallelism.  These systems will typically feature high availability
by using other forms of replication internally.

While there are no open source options for this type of clustering,
there are several commercial products available that implement this
approach, making PostgreSQL achieve very high performance for
multi-terabyte business intelligence databases."

- Luke




Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Jeff Frost

On Tue, 24 Oct 2006, Joshua D. Drake wrote:

>> AFAIK Continuent's product fails that test...
>
> To my knowledge, p/cluster only works with PostgreSQL but I could be wrong.

p/cluster was the old name for the PostgreSQL-specific version.  It's
been rebranded as uni/cluster, and they have versions for both
PostgreSQL and MySQL.  One of my customers is trying it out currently.


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

OK, I have updated the URL.  Please let me know how you like it.

---

Hannu Krosing wrote:
> On Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
> 
> -- 
> 
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Jim C. Nasby wrote:
> On Tue, Oct 24, 2006 at 03:33:03PM -0700, Joshua D. Drake wrote:
>> Simon Riggs wrote:
>>> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
>>>
 If it were me, I would say that the replication option has to be
 specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
 doesn't go in).
>>> ...and how do you define PostgreSQL exactly?
>> A replication product or piece of software designed to work only with PostgreSQL?
>  
> AFAIK Continuent's product fails that test...

To my knowledge, p/cluster only works with PostgreSQL but I could be wrong.

> 
> I don't see any reason to exclude things that work with databases other
> than PostgreSQL, though I agree that replication that's actually in the
> application space (ie: it ties you to TomCat or some other platform)
> probably doesn't belong.

I was just trying to have a defined criterion of some sort. We could fill
up pages and pages of possible replication solutions :)

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Jim C. Nasby
On Tue, Oct 24, 2006 at 03:33:03PM -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> > 
> >> If it were me, I would say that the replication option has to be
> >> specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
> >> doesn't go in).
> > 
> > ...and how do you define PostgreSQL exactly?
> 
> A replication product or piece of software designed to work only with PostgreSQL?
 
AFAIK Continuent's product fails that test...

I don't see any reason to exclude things that work with databases other
than PostgreSQL, though I agree that replication that's actually in the
application space (ie: it ties you to TomCat or some other platform)
probably doesn't belong.

My feeling is that people reading this chapter are looking for solutions
and probably don't care as much about how exactly the solution works so
long as it meets their needs.

> I know there are some other products out there that will work from one
> db to another, but I am not sure if those would be considered HA
> solutions or migration solutions (which we could certainly document).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Jim C. Nasby
On Mon, Oct 23, 2006 at 11:39:34PM -0400, Bruce Momjian wrote:
> Query Broadcast Replication
> ---------------------------
> 
> This involves sending write queries to multiple servers.  Read-only
> queries can be sent to a single server because there is no need for all
> servers to process them.  This can be complex to set up because functions
> like random() and CURRENT_TIMESTAMP will have different values on
> different servers, and sequences should be consistent across servers.
> Pgpool implements this type of replication.
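The non-determinism problem described above can be made concrete with a hypothetical pair of statements (table, sequence, and values are invented). Broadcast middleware has to evaluate such expressions once and forward literals, since each server would otherwise compute its own answer:

```sql
-- Broadcast verbatim, this can yield different rows on each server:
INSERT INTO orders (id, created_at)
    VALUES (nextval('orders_id_seq'), now());

-- What the middleware should actually send, after evaluating once:
INSERT INTO orders (id, created_at)
    VALUES (42, '2006-10-24 15:35:38');
```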

Isn't there another active project that does this besides pgpool?

It's probably also worth mentioning the commercial replication schemes
that are out there.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Simon Riggs wrote:
> On Tue, 2006-10-24 at 15:33 -0700, Joshua D. Drake wrote:
>> Simon Riggs wrote:
>>> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
>>>
 If it were me, I would say that the replication option has to be
 specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
 doesn't go in).
>>> ...and how do you define PostgreSQL exactly?
>> A replication product or piece of software designed to work only with PostgreSQL?
> 
> (again)... how do you define PostgreSQL exactly?

What about PostgreSQL is unclear? Is your question whether I consider
EnterpriseDB to be PostgreSQL? I have no comment on that matter.

Sincerely,

Joshua D. Drake




-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 15:33 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> > 
> >> If it were me, I would say that the replication option has to be
> >> specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
> >> doesn't go in).
> > 
> > ...and how do you define PostgreSQL exactly?
> 
> A replication product or piece of software designed to work only with PostgreSQL?

(again)... how do you define PostgreSQL exactly?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Simon Riggs wrote:
> On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:
> 
>> If it were me, I would say that the replication option has to be
>> specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
>> doesn't go in).
> 
> ...and how do you define PostgreSQL exactly?

A replication product or piece of software designed to work only with PostgreSQL?

I know there are some other products out there that will work from one
db to another, but I am not sure if those would be considered HA
solutions or migration solutions (which we could certainly document).

Sincerely,

Joshua D. Drake




-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 15:13 -0700, Joshua D. Drake wrote:

> If it were me, I would say that the replication option has to be
> specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
> doesn't go in).

...and how do you define PostgreSQL exactly?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [DOCS] Replication documentation addition

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 00:20 -0400, Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
>   ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> Comments welcomed.

It's a very good start to a complete minefield of competing solutions.

My first thought would be to differentiate between clustering and
replication, which will bring out many differences.

My second thought would be to differentiate between load balancing,
multi-threading, parallel query, high availability and recoverability,
which would probably sort out the true differences in the above mix. But
that wouldn't help most people and almost everybody would find fault.

IMHO most people I've spoken to take "replication" to mean an HA
solution, so perhaps we should cover it in those terms.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Simon Riggs wrote:
> On Tue, 2006-10-24 at 12:34 -0700, Joshua D. Drake wrote:
>>> Looking at that, I'm a) missing PgCluster and b) arguing that we have to
>>> admit that we simply can not 'list .. replication solutions ... and how
>>> to get them' because all of the solutions mentioned need quite some
>>> knowledge and require a more or less complex installation and
>>> configuration.
>> There is also the question if we should have a sub section:
>>
>> Closed Source replication solutions:
>>
>> Mammoth Replicator
>> Continuent P/Cluster
>> ExtenDB
>> Greenplum MPP (although this is kind of horizontal partitioning)
> 
> Where do you draw the line?

Well, that is certainly a good question, but we do include links to some
of the more prominent closed-source software on the website as well.

> You may be surprised about what other options
> that includes. I'm happy to include a whole range of things, but please
> be very careful and precise about what you wish for.

If it were me, I would say that the replication option has to be
specific to PostgreSQL (e.g., cjdbc or synchronous Jakarta pooling
doesn't go in).

Sincerely,

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [JDBC] server process (PID 1188) exited with exit code -1073741819, 8.2 beta1

2006-10-24 Thread Dave Cramer
This is a server bug; I will post to hackers for you. It has little
to do with JDBC. However, the ? can't be a column in a prepared statement.


DAVE
On 24-Oct-06, at 4:45 PM, JEAN-PIERRE PELLETIER wrote:


Hi,

I have a query that throws "org.postgresql.util.PSQLException: An I/O
error occurred while sending to the backend". Here is a simplified
version of it where I used pg_database, but any table or view does
the same.


PreparedStatement pstmt = connection.prepareStatement(
    "select ? from (select * from pg_database) t");
pstmt.setObject(1, 1);
ResultSet rs = pstmt.executeQuery();

It restarts PostgreSQL with this in the log:

2006-10-24 15:35:38 LOG:  server process (PID 1188) exited with exit code -1073741819
2006-10-24 15:35:38 LOG:  terminating any other active server processes
2006-10-24 15:35:38 LOG:  all server processes terminated; reinitializing
2006-10-24 15:35:39 LOG:  database system was interrupted at 2006-10-24 15:34:54 Eastern Standard Time
2006-10-24 15:35:39 LOG:  Windows fopen("recovery.conf","r") failed: code 2, errno 2
2006-10-24 15:35:39 LOG:  Windows fopen("pg_xlog/0001.history","r") failed: code 2, errno 2
2006-10-24 15:35:39 LOG:  Windows fopen("backup_label","r") failed: code 2, errno 2
2006-10-24 15:35:39 LOG:  checkpoint record is at 0/31FDF0A0
2006-10-24 15:35:39 LOG:  redo record is at 0/31FDF0A0; undo record is at 0/0; shutdown TRUE
2006-10-24 15:35:39 LOG:  next transaction ID: 0/22535; next OID: 101011
2006-10-24 15:35:39 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2006-10-24 15:35:39 LOG:  database system was not properly shut down; automatic recovery in progress
2006-10-24 15:35:39 LOG:  record with zero length at 0/31FDF0F0
2006-10-24 15:35:39 LOG:  redo is not required
2006-10-24 15:35:40 LOG:  database system is ready
2006-10-24 15:35:40 LOG:  Windows fopen("global/pg_fsm.cache","rb") failed: code 2, errno 2
2006-10-24 15:35:40 LOG:  transaction ID wrap limit is 2147484172, limited by database "postgres"
2006-10-24 15:36:40 LOG:  Windows fopen("global/pgstat.stat","rb") failed: code 2, errno 2
2006-10-24 15:36:40 LOG:  Windows fopen("global/pgstat.stat","rb") failed: code 2, errno 2


I am using PostgreSQL 8.2 beta1 on Windows XP, Service Pack 2
Both JDBC3 driver 8.1 build 407 and 8.2 build 503 do the same.

Thanks,
Jean-Pierre Pelletier
e-djuster



---(end of broadcast)---

TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 12:34 -0700, Joshua D. Drake wrote:
> > Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> > admit that we simply can not 'list .. replication solutions ... and how
> > to get them' because all of the solutions mentioned need quite some
> > knowledge and require a more or less complex installation and
> > configuration.
> 
> There is also the question if we should have a sub section:
> 
> Closed Source replication solutions:
> 
> Mammoth Replicator
> Continuent P/Cluster
> ExtenDB
> Greenplum MPP (although this is kind of horizontal partitioning)

Where do you draw the line? You may be surprised about what other options
that includes. I'm happy to include a whole range of things, but please
be very careful and precise about what you wish for.

There are enough good solutions for open source PostgreSQL that it is easy
and straightforward to limit it to just that. New contributions welcome,
of course.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Magnus Hagander
> >> Why not?  The shipped tarball would contain exactly the same
> >> pg_config.h.win32 it does today; the only difference is that the
> >> version info would've been inserted automatically instead of
> >> manually.
> 
> > Right. And then you can only build from tarball and not from CVS, right?
> 
> But isn't that true for borland already?  We've got the .def
> files for libpq and the help-file for psql that we don't
> expect a borland environment to be able to build.

Borland, yes. MSVC, no. So I guess we could have the MSVC build script
generate it for MSVC, and then stick it in the tarball for Borland. It's
two different ways to do the same thing, but I guess we can do that.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Dave Page

On 24/10/06 21:59, "Magnus Hagander" <[EMAIL PROTECTED]> wrote:


Right. And then you can only build from tarball and not from CVS, right?
Because the pg_config.h.win32 with version is actually in cvs. Or am I
missing something here?

I suggested checking the result into CVS, per the configure script.

Regards, Dave.





Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Tom Lane
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> Why not?  The shipped tarball would contain exactly the same
>> pg_config.h.win32 it does today; the only difference is that 
>> the version info would've been inserted automatically instead 
>> of manually.

> Right. And then you can only build from tarball and not from CVS, right?

But isn't that true for borland already?  We've got the .def files
for libpq and the help-file for psql that we don't expect a borland
environment to be able to build.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Magnus Hagander
> >> Sorry - we're just talking about getting the version number in there
> >> automatically to avoid it getting forgotten during release bundling.
> 
> > I can see that being a good idea. But I don't see Tom's ./configure
> > solution working.
> 
> Why not?  The shipped tarball would contain exactly the same
> pg_config.h.win32 it does today; the only difference is that
> the version info would've been inserted automatically instead
> of manually.
> (The start of this discussion was my observation that
> pg_config.h.win32 contains multiple copies of the version
> info, and sooner or later somebody would miss one while
> stamping a release.)

Right. And then you can only build from tarball and not from CVS, right?
Because the pg_config.h.win32 with version is actually in cvs. Or am I
missing something here?


> > What we could do is have the msvc build scripts edit the file and
> > replace the version with something it reads from configure.in when run.
> 
> That's great if you're using msvc, but what about borland?

Good point. But we could always make that part of the script a separate
one that can be run for Borland as well.
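The stamping step being discussed can be sketched in a few lines of shell. Everything here is a mock standing in for the real source tree (the file contents and the UNSTAMPED placeholder are invented), not the actual build script:

```shell
set -e
tmp=$(mktemp -d)

# Mock stand-ins for configure.in and pg_config.h.win32:
cat > "$tmp/configure.in" <<'EOF'
AC_INIT([PostgreSQL], [8.2beta1], [pgsql-bugs@postgresql.org])
EOF
cat > "$tmp/pg_config.h.win32" <<'EOF'
#define PG_VERSION "UNSTAMPED"
#define PG_VERSION_STR "PostgreSQL UNSTAMPED, compiled by Visual C++"
EOF

# Extract the version once from configure.in ...
version=$(sed -n 's/^AC_INIT(\[PostgreSQL\], \[\([^]]*\)\].*/\1/p' "$tmp/configure.in")

# ... and stamp every copy in the header, so no copy can be missed by hand.
sed -i "s/UNSTAMPED/$version/g" "$tmp/pg_config.h.win32"
grep PG_VERSION "$tmp/pg_config.h.win32"
```

With a generated header like this, the manual multi-copy edit disappears; the same script could be invoked from the MSVC build and again before bundling the Borland tarball.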

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [SPAM?] Re: Asynchronous I/O Support

2006-10-24 Thread Martijn van Oosterhout
On Tue, Oct 24, 2006 at 12:53:23PM -0700, Ron Mayer wrote:
> Anyway, for those who want to see what they do in Linux,
>   http://www.gelato.unsw.edu.au/lxr/source/mm/fadvise.c
> Pretty scary that Bruce said it could make older linuxes
> dump core - there isn't a lot of code there.

The bug was probably in the glibc interface to the kernel. Google found
this:

http://sourceware.org/ml/libc-hacker/2004-03/msg0.html

i.e. posix_fadvise appears to have been broken on all 64-bit
architectures prior to March 2004 due to a silly linking error.

And then things like this:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219

Which suggests that prior to glibc 2.3.5, posix_fadvise crashed on 2.4
kernels. That's a fairly recent version, so the bug would still be
fairly widespread.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
From each according to his ability. To each according to his ability to
litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 14:07 -0400, Tom Lane wrote:
> "Gregory Maxwell" <[EMAIL PROTECTED]> writes:
> > I'm not aware of any other system which can guarantee the atomicity
> > of 8k writes.
> 
> The reasoning for supporting full_page_writes = off is that if you have
> a stable kernel and suitable backup power, battery backed write cache,
> etc, your risk of a partially completed write() may be low enough to
> be acceptable.  Obviously there are no 100.00% guarantees, but that's
> what you keep backups for ...
> 
> Simon is essentially arguing that if we are willing to assume no
> incomplete write() we may as well assume it for WAL too.  This seems
> to me to be raising the risk significantly, but I admit that I can't
> put my finger on why exactly.

I agree about the significant additional risk, hence the additional
parameter.

I'll do some internal testing to see what the risk-reward is. If that
seems worthwhile, then I'll post the patch for general testing/comment. 

(Incidentally, having GUCs that depend on other GUCs is bad news since
they are set alphabetically. I'd want to only allow wal_checksum=off iff
full_page_writes=off, which will work, but only because W comes after F
and for no other reason. Generic solution for dependent GUCs would be
great...)

> One point not directly related to crash safety is whether CRC checking
> isn't still going to be a good idea when PITR is enabled.  Archived WAL
> files are going to have been through a great deal more transferring than
> ones merely being used to recover from a crash.

Agreed. Both disks and tapes/other mechanisms must be known CRC-safe
before this idea would be worth using in production. Many enterprises do
already think they have bomb-proof kit, so we may as well support them
in that belief. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [SPAM?] Re: Asynchronous I/O Support

2006-10-24 Thread Ron Mayer
Zeugswetter Andreas ADI SD wrote:
> POSIX_FADV_WILLNEED definitely sounds very interesting, but:
> 
> I think this interface was intended to hint larger areas (megabytes).
> But the "wishful" thinking was not to hint seq scans, but to advise
> single 8k pages.

Surely POSIX_FADV_SEQUENTIAL is the one intended to hint seq scans,
and POSIX_FADV_RANDOM to hint random access.  No?

ISTM, _WILLNEED seems just right for small random-access blocks.



Anyway, for those who want to see what they do in Linux,
  http://www.gelato.unsw.edu.au/lxr/source/mm/fadvise.c
Pretty scary that Bruce said it could make older linuxes
dump core - there isn't a lot of code there.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Simon is essentially arguing that if we are willing to assume no
> incomplete write() we may as well assume it for WAL too.  This seems
> to me to be raising the risk significantly, but I admit that I can't
> put my finger on why exactly.

Actually I think we can deal with torn pages in the WAL more easily than in
database files anyway. In database files we need to get the entire page
correct one way or the other, so we need full_page_writes in order to deal
with them properly.

In the WAL we just need to be able to detect torn pages and stop reading WAL
at that point. That's easier and doesn't really need a CRC. We could just
adopt the Sybase strategy of storing a unique id number every 512 bytes
throughout the WAL page. If those numbers don't match then we have a torn
page; the system crashed at that point and we should stop reading WAL pages.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [DOCS] [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake

> Looking at that, I'm a) missing PgCluster and b) arguing that we have to
> admit that we simply can not 'list .. replication solutions ... and how
> to get them' because all of the solutions mentioned need quite some
> knowledge and require a more or less complex installation and
> configuration.

There is also the question if we should have a sub section:

Closed Source replication solutions:

Mammoth Replicator
Continuent P/Cluster
ExtenDB
Greenplum MPP (although this is kind of horizontal partitioning)

Joshua D. Drake


> 
> Regards
> 
> Markus
> 
> 
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>   http://archives.postgresql.org
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Markus Schiltknecht

Hello Josh,

Josh Berkus wrote:
Hmmm ... while the primer on different types of replication is fine, I 
think what users were really looking for is a listing of the different 
replication solutions which are available for PostgreSQL and how to get 
them.


Well, let's see what we have:

* Shared Disk Fail Over
* Warm Standby Using Point-In-Time Recovery
* Point-In-Time Recovery

These first three require quite some configuration; AFAIK there is no
tool or single solution you can download, install, and be happy with. I
probably wouldn't even call them 'replication solutions'. For me those
are more like backups with fail-over capability.



* Continuously Running Fail-Over Server

(BTW, what is 'partial replication' supposed to mean here?)
Here we could link to Slony.


* Data Partitioning

Here we can't provide a link; it's just a way to handle the problem in
the application code.



* Query Broadcast Replication

Here we could link to PgPool.


* Multi-Master Replication
  (or better: Distributed Shared Memory Replication)

No existing solution for PostgreSQL.


Looking at that, I'm a) missing PgCluster and b) arguing that we have to 
admit that we simply can not 'list .. replication solutions ... and how 
to get them' because all of the solutions mentioned need quite some 
knowledge and require a more or less complex installation and configuration.


Regards

Markus



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [pgsql-advocacy] Conference materials (Was: [HACKERS] pdfs of

2006-10-24 Thread Andrew Sullivan
On Tue, Oct 24, 2006 at 10:13:21PM +0400, Teodor Sigaev wrote:
> Tsearch2 round table should have two presentations, one of them is already 
> on site, another: http://www.sigaev.ru/gin/Gin.pdf

Added.  Thanks.

By the way, I should have mentioned that there are apparently more
audio files on the way, but we don't have them yet.  Someone is
reportedly doing some sort of audio magic to improve the sound.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Alvaro Herrera

I followed up with Joshua on Jabber.  This is the query:

SELECT pse_userid FROM tbl_profile_search WHERE pse_normalized_text='1'
and pse_interest_type = 10 order by pse_lastlogin DESC limit 50 offset 0

I suggested adding an index on (pse_normalized_text, pse_lastlogin), on
the assumption that the planner would get the sorted output from there
and be able to push the LIMIT clause down just below the indexscan, thus
saving the big heap scan (and a sort across a large result set).  But it
turns out the index is already there.
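For reference, the composite index being discussed would look roughly like this (the index name is invented; the column list follows the suggestion above):

```sql
CREATE INDEX idx_pse_text_lastlogin
    ON tbl_profile_search (pse_normalized_text, pse_lastlogin);
```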

So it seems to me to be a planner shortcoming.  Is this correct?

My idea of the plan would be:

Limit (50)
  Sort (key: pse_lastlogin)
    Result
      Append
        SeqScan tbl_profile_search
        IndexScan tbl_profile_search_interest_1
        Limit (50)
          IndexScan on the index mentioned above

Is this possible?  It would be very fast.  Maybe it should be like this
instead:

Limit (50)
  Sort (key: pse_lastlogin)
    Result
      Append
        Limit (50)
          SeqScan tbl_profile_search
        Limit (50)
          IndexScan tbl_profile_search_interest_1
        Limit (50)
          IndexScan on the index mentioned above

This is the actual plan:

>  Limit  (cost=47110.19..47110.31 rows=50 width=8) (actual time=6088.013..6088.269 rows=50 loops=1)
>    ->  Sort  (cost=47110.19..47943.14 rows=333179 width=8) (actual time=6088.007..6088.104 rows=50 loops=1)
>          Sort Key: public.tbl_profile_search.pse_lastlogin
>          ->  Result  (cost=0.00..16547.78 rows=333179 width=8) (actual time=0.020..4339.472 rows=334319 loops=1)
>                ->  Append  (cost=0.00..16547.78 rows=333179 width=8) (actual time=0.016..3208.022 rows=334319 loops=1)
>                      ->  Seq Scan on tbl_profile_search  (cost=0.00..2.27 rows=1 width=8) (actual time=0.012..0.047 rows=2 loops=1)
>                            Filter: (((pse_normalized_text)::text = '1'::text) AND (pse_interest_type = 10))
>                      ->  Index Scan using index_pse_09_on_part_1 on tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73 rows=1 width=8) (actual time=0.202..0.202 rows=0 loops=1)
>                            Index Cond: ((pse_normalized_text)::text = '1'::text)
>                            Filter: (pse_interest_type = 10)
>                      ->  Bitmap Heap Scan on tbl_profile_search_interest_10 tbl_profile_search  (cost=3579.12..16540.78 rows=333177 width=8) (actual time=90.619..2116.224 rows=334317 loops=1)
>                            Recheck Cond: ((pse_normalized_text)::text = '1'::text)
>                            Filter: (pse_interest_type = 10)
>                            ->  Bitmap Index Scan on index_pse_09_on_part_10  (cost=0.00..3579.12 rows=333177 width=0) (actual time=89.052..89.052 rows=340964 loops=1)
>                                  Index Cond: ((pse_normalized_text)::text = '1'::text)
>  Total runtime: 6103.190 ms


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Josh Berkus
Bruce,

> Here is my first draft of a new replication section for our
> documentation.  I am looking for any comments.

Hmmm ... while the primer on different types of replication is fine, I 
think what users were really looking for is a listing of the different 
replication solutions which are available for PostgreSQL and how to get 
them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [pgsql-advocacy] Conference materials (Was: [HACKERS] pdfs of

2006-10-24 Thread Teodor Sigaev

this.  If your materials are not there, it's because we don't have
them.  If you send them to me, I'll put them in place as soon as I
receive them.  Really, I will.  My TODO list doesn't need to get


Tsearch2 round table should have two presentations, one of them is already on 
site, another: http://www.sigaev.ru/gin/Gin.pdf

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Gurjeet Singh
Sorry for getting into the conversation so late... It was a long weekend in
India.

On 10/23/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> I didn't particularly trust the timing calculations in your benchmark
> program,

Any particular reason? (Why, and what exactly did you doubt in it?) I
designed the program to be flexible: to test different-sized blocks (to
cost single or fewer INIT/COMP/FIN iterations) and different-sized lists of
data (to control the number of iterations). Please share your wisdom.

When I first saw your results, I had a strong feeling that function-call
overhead was going against SB8. Jeremy's trials, and his subsequent success
after disabling loop optimizations, also pointed to this possibility.

So I have taken your tests and converted the SB8 function calls into
macros. The results are as follows (crc = 0 is explained below):

std_8192_noprintcrc.out  crc = 0,        bufsize = 8192, loops = 100, elapsed = 8.471994
sb8_8192_noprintcrc.out  crc = 0,        bufsize = 8192, loops = 100, elapsed = 0.06
std_8192_printcrc.out    crc = 8228BB0E, bufsize = 8192, loops = 100, elapsed = 32.490704
sb8_8192_printcrc.out    crc = 7E67A22A, bufsize = 8192, loops = 100, elapsed = 22.349156
std_64_noprintcrc.out    crc = 0,        bufsize = 64,   loops = 100, elapsed = 0.151354
sb8_64_noprintcrc.out    crc = 0,        bufsize = 64,   loops = 100, elapsed = 0.05
std_64_printcrc.out      crc = 9C9FBE2E, bufsize = 64,   loops = 100, elapsed = 0.559315
sb8_64_printcrc.out      crc = F70BC6AE, bufsize = 64,   loops = 100, elapsed = 0.357382

crc = 0 in a result means that the mycrc variable was not referenced
anywhere after the for-loop. As can be seen, if mycrc is not referenced in
the printf -- that is, its usage is limited to inside the for-loop -- then
GCC (4.1) seems to optimize the loop heavily. In the SB8 case, if mycrc is
not referenced later, GCC seems to have removed the loop entirely! The only
difference between the _noprintcrc and _printcrc tests is that the first
parameter after the format string in the printf() call was either a zero or
the mycrc variable, respectively.

I am highly apprehensive that I might have made some mistake while
converting the function calls to macros, though I have not been able to
find one so far. Please check its validity against the function-call
version.

If there is no mistake, then I think SB8 is back in the performance game.
These results were obtained with gcc 4.1 on FC5, running on an Intel
Pentium M 1.86 GHz, with the OS started and running in runlevel 3.

Please dump the .c and .h files from the attachment on top of Tom's
package, and test as before.

Best regards,
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED]{ gmail | hotmail | yahoo }.com



my-crctest.tar.gz
Description: GNU Zip compressed data

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Tom Lane
"Gregory Maxwell" <[EMAIL PROTECTED]> writes:
> I'm not aware of any other system which can guarantee the atomicity
> of 8k writes.

The reasoning for supporting full_page_writes = off is that if you have
a stable kernel and suitable backup power, battery backed write cache,
etc, your risk of a partially completed write() may be low enough to
be acceptable.  Obviously there are no 100.00% guarantees, but that's
what you keep backups for ...

Simon is essentially arguing that if we are willing to assume no
incomplete write() we may as well assume it for WAL too.  This seems
to me to be raising the risk significantly, but I admit that I can't
put my finger on why exactly.

One point not directly related to crash safety is whether CRC checking
isn't still going to be a good idea when PITR is enabled.  Archived WAL
files are going to have been through a great deal more transferring than
ones merely being used to recover from a crash.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Tom Lane wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
 We have a problem with CE that I want to verify is either expected
 behavior, a bug or something else :).
>>> Uh, what's your problem exactly?  The example only seems to demonstrate
>>> that if you don't ask for a sort, you don't get one.
>> Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
>> versus only scanning the partition that has the data in it.
> 
> Huh, but that's not what the EXPLAIN ANALYZE you posted says ...
> 

Sorry, I realize the error of my ways. It isn't that it is scanning all
partitions; it is that it is scanning all of a single partition (subject
to the WHERE clause). That is correct behavior.

Sincerely,

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Gregory Maxwell

On 10/24/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

I wasn't aware that a system could protect against this. :-)

I write 8 Kbytes - how can I guarantee that the underlying disk writes
all 8 Kbytes before it loses power? And why isn't the CRC a valid means
of dealing with this? :-)


[snip]

A file system with an appropriate transaction method could do this.
In *theory* reiser4 write()s are atomic. No one has verified, however,
that there is no torn-page risk introduced in some other part of the
kernel.

I'm not aware of any other system which can guarantee the atomicity
of 8k writes.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:

> If I'm wrong on one of these assumptions, I'm open to being educated.
> My opinion as of a few seconds ago is that a write to a single disk
> sector is safe, but that a write that extends across several sectors
> is not.

Unless it's fsync'ed, which is what we do at CHECKPOINT.  Keep in mind
that we save full page images on WAL the first time we touch the page
after a checkpoint.  This means that if a partial write occurred, we will
restore it from WAL.

So it's not safe in general, but it is safe in Postgres.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 14:52 +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> > >> There are actually three checks used to detect end of WAL: zero record
> > >> length, invalid checksum, and incorrect back-pointer.  Zero length is
> > >> the first and cleanest-looking test, but AFAICS we have to have both of
> > >> the others to avoid obvious failure modes.
> > 
> > > The checksum protects from torn pages and disk errors. If you have
> > > full_page_writes set then you already believe yourself safe from torn
> > > pages and your disks could also already be CRC-checking the data.
> > 
> > No, because unlike tuples, WAL records can and do cross page boundaries.
> 
> But not that often, with full_page_writes = off. So we could get away
> with just CRC checking the page-spanning ones and mark the records to
> show whether they have been CRC checked or not and need to be rechecked
> at recovery time. That would reduce the CRC overhead to about 1-5% of
> what it is now (as an option).

Looking further, I see that the xlog page header already contains
xlp_pageaddr which is a XLogRecPtr. So an xlrec that tried to span
multiple pages yet failed in between would easily show up as a failure
in ValidXLOGHeader(), even before the CRC check. [The XLogRecPtr
contains both the offset within the file and a unique identification of
the WAL file, so any data left over from previous uses of that data file
will be easily recognised as such].

So we don't even need to CRC check the page-spanning ones either.

So it all comes down to: do you trust your hardware? I accept that some
hardware/OS combinations will give you high risk, others will reduce it
considerably. All I'm looking to do is to pass on the savings for people
that are confident they have invested wisely in hardware.

Does anybody have a reasonable objection to introducing an option for
people that are comfortable they are making the correct choice?
wal_checksum = off (or other suggested naming...)

I don't want to take blind risks, so shoot me down, please, if I err.
I'm happy either way: either we speed up, or we're safer not to.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Tom Lane wrote:
> > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >> We have a problem with CE that I want to verify is either expected
> >> behavior, a bug or something else :).
> > 
> > Uh, what's your problem exactly?  The example only seems to demonstrate
> > that if you don't ask for a sort, you don't get one.
> 
> Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
> versus only scanning the partition that has the data in it.

Huh, but that's not what the EXPLAIN ANALYZE you posted says ...

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> We have a problem with CE that I want to verify is either expected
>> behavior, a bug or something else :).
> 
> Uh, what's your problem exactly?  The example only seems to demonstrate
> that if you don't ask for a sort, you don't get one.

Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
versus only scanning the partition that has the data in it.

Sincerely,

Joshua D. Drake


> 
>   regards, tom lane
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 12:47 -0400, [EMAIL PROTECTED] wrote:
> On Tue, Oct 24, 2006 at 05:05:58PM +0100, Simon Riggs wrote:
> > On Tue, 2006-10-24 at 10:18 -0400, [EMAIL PROTECTED] wrote:
> > > On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> > > > On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > > > > No, because unlike tuples, WAL records can and do cross page 
> > > > > boundaries.
> > > > But not that often, with full_page_writes = off. So we could get away
> > > > with just CRC checking the page-spanning ones and mark the records to
> > > > show whether they have been CRC checked or not and need to be rechecked
> > > > at recovery time. That would reduce the CRC overhead to about 1-5% of
> > > > what it is now (as an option).
> > > 
> > > WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
> > > guarantee in here that the 8 Kbytes worth of data will be written as
> > > sequential writes, nor that the 8 Kbytes of data will necessarily
> > > finish.
> > > 
> > > If the operating system uses 8 Kbyte pages, or the RAID system uses 8
> > > Kbytes or larger chunks, and they guarantee sequential writes, perhaps
> > > it is ok. Still, if the power goes out after writing the first 512
> > > bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
> > > might get better or worse, depending on the RAID configuration.
> > 
> > That is the torn-page problem. If your system doesn't already protect
> > you against this you have no business turning off full_page_writes,
> > which was one of my starting assumptions.
> 
> I wasn't aware that a system could protect against this. :-)
> 
> I write 8 Kbytes - how can I guarantee that the underlying disk writes
> all 8 Kbytes before it loses power? And why isn't the CRC a valid means
> of dealing with this? :-)
> 
> If I'm wrong on one of these assumptions, I'm open to being educated.
> My opinion as of a few seconds ago, is that a write to a single disk
> sector is safe, but that a write that extends across several sectors
> is not.

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html

full_page_writes = off

I'm very happy to learn more myself...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> We have a problem with CE that I want to verify is either expected
> behavior, a bug or something else :).

Uh, what's your problem exactly?  The example only seems to demonstrate
that if you don't ask for a sort, you don't get one.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread mark
On Tue, Oct 24, 2006 at 05:05:58PM +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 10:18 -0400, [EMAIL PROTECTED] wrote:
> > On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> > > On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > > > No, because unlike tuples, WAL records can and do cross page boundaries.
> > > But not that often, with full_page_writes = off. So we could get away
> > > with just CRC checking the page-spanning ones and mark the records to
> > > show whether they have been CRC checked or not and need to be rechecked
> > > at recovery time. That would reduce the CRC overhead to about 1-5% of
> > > what it is now (as an option).
> > 
> > WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
> > guarantee in here that the 8 Kbytes worth of data will be written as
> > sequential writes, nor that the 8 Kbytes of data will necessarily
> > finish.
> > 
> > If the operating system uses 8 Kbyte pages, or the RAID system uses 8
> > Kbytes or larger chunks, and they guarantee sequential writes, perhaps
> > it is ok. Still, if the power goes out after writing the first 512
> > bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
> > might get better or worse, depending on the RAID configuration.
> 
> That is the torn-page problem. If your system doesn't already protect
> you against this you have no business turning off full_page_writes,
> which was one of my starting assumptions.

I wasn't aware that a system could protect against this. :-)

I write 8 Kbytes - how can I guarantee that the underlying disk writes
all 8 Kbytes before it loses power? And why isn't the CRC a valid means
of dealing with this? :-)

If I'm wrong on one of these assumptions, I'm open to being educated.
My opinion as of a few seconds ago, is that a write to a single disk
sector is safe, but that a write that extends across several sectors
is not.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Hello,

We have a problem with CE that I want to verify is either expected
behavior, a bug or something else :).

Yes constraint exclusion is on.

I have tried increasing the default_statistics_target (all the way 1000)
no change in behavior.

Query plan with ORDER BY:

 Limit  (cost=47110.19..47110.31 rows=50 width=8) (actual
time=6088.013..6088.269 rows=50 loops=1)
   ->  Sort  (cost=47110.19..47943.14 rows=333179 width=8) (actual
time=6088.007..6088.104 rows=50 loops=1)
 Sort Key: public.tbl_profile_search.pse_lastlogin
 ->  Result  (cost=0.00..16547.78 rows=333179 width=8) (actual
time=0.020..4339.472 rows=334319 loops=1)
   ->  Append  (cost=0.00..16547.78 rows=333179 width=8)
(actual time=0.016..3208.022 rows=334319 loops=1)
 ->  Seq Scan on tbl_profile_search
(cost=0.00..2.27 rows=1 width=8) (actual time=0.012..0.047 rows=2 loops=1)
   Filter: (((pse_normalized_text)::text =
'1'::text) AND (pse_interest_type = 10))
 ->  Index Scan using index_pse_09_on_part_1 on
tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73
rows=1 width=8) (actual time=0.202..0.202 rows=0 loops=1)
   Index Cond: ((pse_normalized_text)::text =
'1'::text)
   Filter: (pse_interest_type = 10)
 ->  Bitmap Heap Scan on
tbl_profile_search_interest_10 tbl_profile_search
(cost=3579.12..16540.78 rows=333177 width=8) (actual
time=90.619..2116.224 rows=334317 loops=1)
   Recheck Cond: ((pse_normalized_text)::text =
'1'::text)
   Filter: (pse_interest_type = 10)
   ->  Bitmap Index Scan on
index_pse_09_on_part_10  (cost=0.00..3579.12 rows=333177 width=0)
(actual time=89.052..89.052 rows=340964 loops=1)
 Index Cond:
((pse_normalized_text)::text = '1'::text)
 Total runtime: 6103.190 ms


Same query, just removed ORDER BY:


---
 Limit  (cost=0.00..2.48 rows=50 width=4) (actual time=0.025..57.146
rows=50 loops=1)
   ->  Result  (cost=0.00..16549.78 rows=333179 width=4) (actual
time=0.021..56.993 rows=50 loops=1)
 ->  Append  (cost=0.00..16549.78 rows=333179 width=4) (actual
time=0.017..56.835 rows=50 loops=1)
   ->  Seq Scan on tbl_profile_search  (cost=0.00..2.27
rows=1 width=4) (actual time=0.013..0.050 rows=2 loops=1)
 Filter: (((pse_normalized_text)::text = '1'::text)
AND (pse_interest_type = 10))
   ->  Index Scan using index_pse_09_on_part_1 on
tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73
rows=1 width=4) (actual time=0.051..0.051 rows=0 loops=1)
 Index Cond: ((pse_normalized_text)::text = '1'::text)
 Filter: (pse_interest_type = 10)
   ->  Bitmap Heap Scan on tbl_profile_search_interest_10
tbl_profile_search  (cost=3581.12..16542.78 rows=333177 width=4) (actual
time=56.481..56.573 rows=48 loops=1)
 Recheck Cond: ((pse_normalized_text)::text = '1'::text)
 Filter: (pse_interest_type = 10)
 ->  Bitmap Index Scan on index_pse_09_on_part_10
(cost=0.00..3581.12 rows=333177 width=0) (actual time=54.999..54.999
rows=341233 loops=1)
   Index Cond: ((pse_normalized_text)::text =
'1'::text)
 Total runtime: 57.396 ms


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 10:18 -0400, [EMAIL PROTECTED] wrote:
> On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> > On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > > No, because unlike tuples, WAL records can and do cross page boundaries.
> 
> > But not that often, with full_page_writes = off. So we could get away
> > with just CRC checking the page-spanning ones and mark the records to
> > show whether they have been CRC checked or not and need to be rechecked
> > at recovery time. That would reduce the CRC overhead to about 1-5% of
> > what it is now (as an option).
> 
> WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
> guarantee in here that the 8 Kbytes worth of data will be written as
> sequential writes, nor that the 8 Kbytes of data will necessarily
> finish.
> 
> If the operating system uses 8 Kbyte pages, or the RAID system uses 8
> Kbytes or larger chunks, and they guarantee sequential writes, perhaps
> it is ok. Still, if the power goes out after writing the first 512
> bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
> might get better or worse, depending on the RAID configuration.

That is the torn-page problem. If your system doesn't already protect
you against this you have no business turning off full_page_writes,
which was one of my starting assumptions.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Peter Eisentraut
Tom Lane wrote:
> Why not?  The shipped tarball would contain exactly the same
> pg_config.h.win32 it does today; the only difference is that the
> version info would've been inserted automatically instead of
> manually.

I suggest you do it in a makefile as part of the distprep target.

distprep: pg_config.h.win32

pg_config.h.win32: pg_config.h.win32.in
	sed 's/@VERSION@/$(VERSION)/g' $< >$@

maintainer-clean:
	rm -f pg_config.h.win32


That way we don't bother every configure runner with the issue.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Tom Lane
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> Sorry - we're just talking about getting the version number 
>> in there automatically to avoid it getting forgotten during 
>> release bundling.

> I can see that being a good idea. But I don't see Toms ./configure
> solution working.

Why not?  The shipped tarball would contain exactly the same
pg_config.h.win32 it does today; the only difference is that the
version info would've been inserted automatically instead of manually.
(The start of this discussion was my observation that pg_config.h.win32
contains multiple copies of the version info, and sooner or later
somebody would miss one while stamping a release.)

> What we could do is have the msvc build scripts edit the file and
> replace the version with something it reads from configure.in when run.

That's great if you're using msvc, but what about borland?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Magnus Hagander
> > Getting late into this discussion, so I may be completely off here :-)
> > How's that going to work? pg_config.h.win32 needs to know win32
> > platform specifics, right? So it has to be created, in that case, on
> > win32. But when you're building with MSVC, you don't run configure,
> > because windows can't run that (without the mingw layer).
> 
> Sorry - we're just talking about getting the version number in there
> automatically to avoid it getting forgotten during release bundling.

I can see that being a good idea. But I don't see Toms ./configure
solution working.

What we could do is have the msvc build scripts edit the file and
replace the version with something it reads from configure.in when run.
This would require that we zap the old "win32.mak" method of building
win32 stuff, which we can't do just yet but IMHO can eventually do. 

The other option is, I would think, to break out the version #defines
into a separate headerfile that's used on all platforms, and use that
one *instead* of configure to set it.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Dave Page
 

> -Original Message-
> From: Magnus Hagander [mailto:[EMAIL PROTECTED] 
> Sent: 24 October 2006 15:56
> To: Tom Lane; Dave Page
> Cc: pgsql-hackers@postgresql.org
> Subject: RE: [HACKERS] Release stamping (Was: [CORE] Schedule 
> for release?) 
> 
> Getting late into this discussion, so I may be completely off here :-)
> How's that going to work? pg_config.h.win32 needs to know win32
> platform specifics, right? So it has to be created, in that case, on
> win32. But when you're building with MSVC, you don't run configure,
> because windows can't run that (without the mingw layer).

Sorry - we're just talking about getting the version number in there
automatically to avoid it getting forgotten during release bundling.

Regards, Dave.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Conference materials (Was: [HACKERS] pdfs of the conference)

2006-10-24 Thread Andrew Sullivan
On Tue, Sep 19, 2006 at 12:29:18PM -0400, Andrew Sullivan wrote:
> The slides, all the photos, and even the audio are, I've been
> assured, going to get cleared up in the next few days. 

Well, those were some very long days, but it seems a good time to
note that the slides and audio (all that we have) are now on the
conference site, linked to the conference programme from this summer:

http://conference.postgresql.org/Program
 
Thanks very much to Gavin Sherry, who did a great deal of the work on
this.  If your materials are not there, it's because we don't have
them.  If you send them to me, I'll put them in place as soon as I
receive them.  Really, I will.  My TODO list doesn't need to get
longer.

I have the unhappy duty of announcing, too, that the audio recordings
from the second room at the conference were lost, along with the rest
of the things on the laptop where they were stored.  

Thanks very much to all the speakers and attendees who made the
conference a success.

Best,
Andrew


-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness.
--George Orwell

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Magnus Hagander
> >> The pg_config.h.win32 file is intended to support building in an
> >> environment where you can't run automake/autoconf, or indeed much
> >> of anything else.
> 
> > That doesn't matter does it? Marc runs the bootstrap, which inserts
> > the version numbers into the right place and runs autoconf, then he
> > commits the changed files (configure, pg_config.h.win32 etc) to CVS.
> > Only he (or you or Bruce) should ever need to run it.
> 
> Hmm, so manufacture pg_config.h.win32 during tarball build
> and insert the version numbers at that point?  Yeah, that
> would work.  Actually the easiest thing would likely be to
> have configure build it the same way it builds pg_config.h,
> and then not remove it in "make distclean".

Getting late into this discussion, so I may be completely off here :-)
How's that going to work? pg_config.h.win32 needs to know win32 platform
specifics, right? So it has to be created, in that case, on win32. But
when you're building with MSVC, you don't run configure, because windows
can't run that (without the mingw layer).

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Tom Lane
[EMAIL PROTECTED] writes:
> The 5 year claim seems
> decades too short unless they are talking about a newer technology.

I think what Simon is on about is CRCs being routinely used on the cable
between the disk drive and the CPU.  When I was involved in this stuff
you usually only got a parity bit on each byte transferred.  CRCs for
all disk command/result messages would definitely help close a risk area
--- but that's only one risk of many.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Peter Eisentraut
Tom Lane wrote:
> Hmm, so manufacture pg_config.h.win32 during tarball build and insert
> the version numbers at that point?  Yeah, that would work.  Actually
> the easiest thing would likely be to have configure build it the same
> way it builds pg_config.h, and then not remove it in "make
> distclean".

The libpq makefile contains ample precedent for preparing Windows 
specific files during tarball building.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread mark
On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> > >> There are actually three checks used to detect end of WAL: zero record
> > >> length, invalid checksum, and incorrect back-pointer.  Zero length is
> > >> the first and cleanest-looking test, but AFAICS we have to have both of
> > >> the others to avoid obvious failure modes.
> > > The checksum protects from torn pages and disk errors. If you have
> > > full_page_writes set then you already believe yourself safe from torn
> > > pages and your disks could also already be CRC-checking the data.
> > No, because unlike tuples, WAL records can and do cross page boundaries.

> But not that often, with full_page_writes = off. So we could get away
> with just CRC checking the page-spanning ones and mark the records to
> show whether they have been CRC checked or not and need to be rechecked
> at recovery time. That would reduce the CRC overhead to about 1-5% of
> what it is now (as an option).

WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
guarantee in here that the 8 Kbytes worth of data will be written as
sequential writes, nor that the 8 Kbytes of data will necessarily
finish.

If the operating system uses 8 Kbyte pages, or the RAID system uses 8
Kbytes or larger chunks, and they guarantee sequential writes, perhaps
it is ok. Still, if the power goes out after writing the first 512
bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
might get better or worse, depending on the RAID configuration.

I'm almost wondering whether the three numbers are enough. I'm too busy
to sketch it all down and predict failure points... :-)

> Just a thought: Would there be benefit in not allowing page-spanning WAL
> records, when they are small? That would tend to reduce the number of
> WAL writes, even if it did cause some space wastage on disk. That would
> reduce the number of same block re-writes and might improve the
> sequential behaviour of WAL access. We never needed to think about this
> while full_page_writes=on was the only option.

Probably. Might not be much though.

> > > CRC-checked disks are actually the industry norm and have been for
> > > around 5 years.
> > Huh?  Disks have ALWAYS had CRCs, and this is in any case utterly
> > irrelevant to the database-crash risk.
> According to the man from Seagate...

Once upon a time, bits were stored the size of Smarties (an
exaggeration). Back in those days (before my time), they liked to use
things like parity.

I didn't read the page, but perhaps there is some confusion between
CRC and error correction codes. Obviously, technologies were
introduced over time. I don't remember ever having a hard disk that
didn't have some sort of error detection. The 5 year claim seems
decades too short unless they are talking about a newer technology.
Even the old 5.25" DOS floppies seemed able to detect errors rather
than return corrupted bits.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
Neighbourhood Coder | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-24 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I would've liked to give freezing a new opcode, 
>   but we've run out of them (see htup.h).

Hardly ... we have plenty of unused rmgr id's still.

The real issue that still has to be resolved is the interaction of all
this stuff with PITR scenarios --- is it still safe to not log hint-bit
updates when PITR is on?  I think it's premature to start writing
patches until we've decided how this really needs to work.

regards, tom lane



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Markus Schiltknecht

Hannu Krosing wrote:

I think the "official" term for this kind of "replication" is
Shared-Nothing Clustering.


Well, that's just another distinction for clusters. Most of the time
the distinction is between Shared-Disk and Shared-Nothing. You could
also see the very big irons as a Shared-Everything cluster.


While it's certainly true that any kind of data partitioning for
databases only makes sense for Shared-Nothing clusters, I don't think
it's a 'kind of replication'. AFAIK most database replication solutions
are built for Shared-Nothing clusters (with the exception of
PgCluster-II, I think).


Regards

Markus






Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> >> There are actually three checks used to detect end of WAL: zero record
> >> length, invalid checksum, and incorrect back-pointer.  Zero length is
> >> the first and cleanest-looking test, but AFAICS we have to have both of
> >> the others to avoid obvious failure modes.
> 
> > The checksum protects from torn pages and disk errors. If you have
> > full_page_writes set then you already believe yourself safe from torn
> > pages and your disks could also already be CRC-checking the data.
> 
> No, because unlike tuples, WAL records can and do cross page boundaries.

But not that often, with full_page_writes = off. So we could get away
with just CRC checking the page-spanning ones and mark the records to
show whether they have been CRC checked or not and need to be rechecked
at recovery time. That would reduce the CRC overhead to about 1-5% of
what it is now (as an option).

Just a thought: Would there be benefit in not allowing page-spanning WAL
records, when they are small? That would tend to reduce the number of
WAL writes, even if it did cause some space wastage on disk. That would
reduce the number of same block re-writes and might improve the
sequential behaviour of WAL access. We never needed to think about this
while full_page_writes=on was the only option.
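Simon's no-spanning idea could be sketched as below. This is a rough illustration, not a patch proposal; the 512-byte "small" threshold is an invented placeholder, not a number from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define WAL_PAGE_SIZE 8192u
#define SMALL_RECORD  512u   /* hypothetical threshold, invented here */

/* Given the current insert position in the WAL stream, return where a
 * record of rec_len bytes should start under the "don't let small
 * records span pages" rule: a small record that would cross an 8 KB
 * page boundary is bumped to the start of the next page, wasting the
 * tail of the current one.  Large records still span pages as today. */
uint64_t choose_insert_offset(uint64_t cur, uint32_t rec_len)
{
    uint32_t room = WAL_PAGE_SIZE - (uint32_t)(cur % WAL_PAGE_SIZE);

    if (rec_len > room && rec_len <= SMALL_RECORD)
        return cur + room;   /* pad out to the next page boundary */
    return cur;
}
```

The trade-off is visible in the arithmetic: up to SMALL_RECORD-1 bytes of each page can be wasted, in exchange for records that can be CRC-checked (or skipped) page by page.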

> > CRC-checked disks are actually the industry norm and have been for
> > around 5 years.
> 
> Huh?  Disks have ALWAYS had CRCs, and this is in any case utterly
> irrelevant to the database-crash risk.

According to the man from Seagate...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Tom Lane
"Dave Page"  writes:
> From: Tom Lane [mailto:[EMAIL PROTECTED] 
>> The pg_config.h.win32 file is intended to support building in an
>> environment where you can't run automake/autoconf, or indeed much of
>> anything else. 

> That doesn't matter does it? Marc runs the bootstrap, which inserts the
> version numbers into the right place and runs autoconf, then he commits
> the changed files (configure, pg_config.h.win32 etc) to CVS. Only he (or
> you or Bruce) should ever need to run it.

Hmm, so manufacture pg_config.h.win32 during tarball build and insert
the version numbers at that point?  Yeah, that would work.  Actually the
easiest thing would likely be to have configure build it the same way it
builds pg_config.h, and then not remove it in "make distclean".

regards, tom lane



[HACKERS] Release stamping (Was: [CORE] Schedule for release?)

2006-10-24 Thread Dave Page
 

> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED] 
> Sent: 24 October 2006 14:30
> To: Dave Page
> Cc: [EMAIL PROTECTED]
> Subject: Re: [CORE] Schedule for release? 
> 
> > In pgAdmin we have a simple bootstrap script that writes all the
> > version numbers into a bunch of files, and then runs
> > automake/autoconf et al. It sucks them out of one header, which is
> > all we have to modify manually.
> > Any reason we can't do something similar for PostgreSQL?
> 
> The pg_config.h.win32 file is intended to support building in an
> environment where you can't run automake/autoconf, or indeed much of
> anything else. 

That doesn't matter does it? Marc runs the bootstrap, which inserts the
version numbers into the right place and runs autoconf, then he commits
the changed files (configure, pg_config.h.win32 etc) to CVS. Only he (or
you or Bruce) should ever need to run it.

> Perhaps we could require Perl or something to be
> available?  I know Magnus is requiring Perl for his MSVC 
> build support.

Yes.

> This is getting offtopic for -core though.

Moved to -hackers...

Regards, Dave.



Re: [HACKERS] COPY does not work with regproc and aclitem

2006-10-24 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> I prepared a patch which uses the oid output function instead of the
> regproc output. This change works only for the COPY TO command.

This is not a bug and we're not going to fix it, most especially not
like that.

regards, tom lane



Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
>> There are actually three checks used to detect end of WAL: zero record
>> length, invalid checksum, and incorrect back-pointer.  Zero length is
>> the first and cleanest-looking test, but AFAICS we have to have both of
>> the others to avoid obvious failure modes.

> The checksum protects from torn pages and disk errors. If you have
> full_page_writes set then you already believe yourself safe from torn
> pages and your disks could also already be CRC-checking the data.

No, because unlike tuples, WAL records can and do cross page boundaries.
Unless you're prepared to posit that your system never crashes at all,
you have to be able to detect the case where you've got a good front
half of a WAL record and non-matching data in the next page.  The two
halves could be written out in widely separated write()s, indeed might
never have been simultaneously resident in the WAL buffers at all.
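A whole-record CRC catches exactly this failure: if the front half of a record is good but the next page holds stale or unrelated data, the stored checksum no longer matches. A minimal sketch using the standard reflected CRC-32 polynomial (the slicing-by-8 algorithm under discussion computes this same function, just faster, by consuming eight bytes per iteration through precomputed tables):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Plain bit-at-a-time CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t crc32_basic(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

If the checksum is computed over the full record at insert time, then at replay time a record whose second half came from a recycled page hashes to a different value, and replay stops there.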

> CRC-checked disks are actually the industry norm and have been for
> around 5 years.

Huh?  Disks have ALWAYS had CRCs, and this is in any case utterly
irrelevant to the database-crash risk.

regards, tom lane



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Hannu Krosing
On a fine day, Tue, 2006-10-24 at 00:20, Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
>   ftp://momjian.us/pub/postgresql/mypatches/replication

This is how data partitioning is currently described there

> Data Partitioning
> -
> 
> Data partitioning splits the database into data sets.  To achieve
> replication, each data set can only be modified by one server.  For
> example, data can be partitioned by offices, e.g. London and Paris. 
> While London and Paris servers have all data records, only London can
> modify London records, and Paris can only modify Paris records.  Such
> partitioning is usually accomplished in application code, though rules
> and triggers can help enforce partitioning and keep the read-only data
> sets current.  Slony can also be used in such a setup.  While Slony
> replicates only entire tables, London and Paris can be placed in
> separate tables, and inheritance can be used to access from both tables
> using a single table name.

Maybe another use of partitioning should also be mentioned. That is,
when partitioning is used to overcome the limitations of a single
server (especially I/O and memory, but also CPU), and only a subset of
the data is stored and processed on each server.

As an example of this type of partitioning you could mention Bizgres
MPP (a PG-based commercial product, http://www.greenplum.com ), which
partitions data to use the I/O and CPU of several DB servers for
processing complex OLAP queries, and PL/Proxy
( http://pgfoundry.org/projects/plproxy/ ), which does the same for
OLTP loads.
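This style of partitioning routes each row to exactly one server, typically by hashing a key. A rough sketch of the routing step (the hash, partition count, and function names are invented for illustration, not PL/Proxy's actual API):

```c
#include <assert.h>
#include <stdint.h>

#define N_PARTITIONS 4   /* hypothetical cluster size */

/* Toy hash (FNV-1a) over the partitioning key -- illustrative only. */
uint32_t toy_hash(const char *key)
{
    uint32_t h = 2166136261u;

    for (; *key; key++)
        h = (h ^ (uint8_t)*key) * 16777619u;
    return h;
}

/* Pick the server that stores and processes this key.  Each server
 * holds only its own subset of the data, so I/O, memory, and CPU are
 * spread across the cluster. */
int partition_for(const char *key)
{
    return (int)(toy_hash(key) % N_PARTITIONS);
}
```

The essential property is that the mapping is deterministic: every client computes the same partition for the same key, so no shared state is needed to find a row.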

I think the "official" term for this kind of "replication" is
Shared-Nothing Clustering.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com






Re: [HACKERS] COPY does not work with regproc and aclitem

2006-10-24 Thread Zdenek Kotala

Tom Lane wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

Hmm, maybe it should be using regprocedure instead?


Not unless you want to break initdb.  The only reason regproc still
exists, really, is to accommodate loading of pg_type during initdb.
Guess what: we can't do type lookup at that point.



I prepared a patch which uses the oid output function instead of the
regproc output. This change affects only the COPY TO command; SELECT
behavior is untouched. I extended the copy regression test as well.


Please look at it and judge whether it is an acceptable fix.

With regards Zdenek
Index: src/backend/commands/copy.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.271
diff -c -r1.271 copy.c
*** src/backend/commands/copy.c	31 Aug 2006 03:17:50 -	1.271
--- src/backend/commands/copy.c	24 Oct 2006 12:35:45 -
***
*** 1309,1315 
  	&out_func_oid,
  	&isvarlena);
  		else
! 			getTypeOutputInfo(attr[attnum - 1]->atttypid,
  			  &out_func_oid,
  			  &isvarlena);
  		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
--- 1309,1317 
  	&out_func_oid,
  	&isvarlena);
  		else
! 			/* For regproc datatype do not lookup proc name, use OID out function instead.
! 			   It avoids problem with COPY FROM. */ 
! 			getTypeOutputInfo(attr[attnum - 1]->atttypid == REGPROCOID? OIDOID : attr[attnum - 1]->atttypid,
  			  &out_func_oid,
  			  &isvarlena);
  		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
Index: src/test/regress/input/copy.source
===
RCS file: /projects/cvsroot/pgsql/src/test/regress/input/copy.source,v
retrieving revision 1.14
diff -c -r1.14 copy.source
*** src/test/regress/input/copy.source	2 May 2006 11:28:56 -	1.14
--- src/test/regress/input/copy.source	24 Oct 2006 12:35:46 -
***
*** 105,107 
--- 105,113 
  
  copy copytest3 to stdout csv header;
  
+ --- test correct handling regproc data type
+ CREATE TEMP TABLE test_regproc (like pg_aggregate);
+ COPY pg_catalog.pg_aggregate TO '@abs_builddir@/results/test_regproc.data';
+ COPY test_regproc FROM '@abs_builddir@/results/test_regproc.data';
+ 
+ select aggfnoid, cast(aggfnoid as oid) from pg_aggregate where aggfnoid=2147;
Index: src/test/regress/output/copy.source
===
RCS file: /projects/cvsroot/pgsql/src/test/regress/output/copy.source,v
retrieving revision 1.12
diff -c -r1.12 copy.source
*** src/test/regress/output/copy.source	2 May 2006 11:28:56 -	1.12
--- src/test/regress/output/copy.source	24 Oct 2006 12:35:46 -
***
*** 70,72 
--- 70,82 
  c1,"col with , comma","col with "" quote"
  1,a,1
  2,b,2
+ --- test correct handling regproc data type
+ CREATE TEMP TABLE test_regproc (like pg_aggregate);
+ COPY pg_catalog.pg_aggregate TO '@abs_builddir@/results/test_regproc.data';
+ COPY test_regproc FROM '@abs_builddir@/results/test_regproc.data';
+ select aggfnoid, cast(aggfnoid as oid) from pg_aggregate where aggfnoid=2147;
+  aggfnoid | aggfnoid
+ --+--
+  pg_catalog.count | 2147
+ (1 row)
+ 



Re: [HACKERS] xlogdump fixups and WAL log question.

2006-10-24 Thread Markus Schaber
Hi, Simon,

Simon Riggs wrote:

> 1. Provide a filter that can be easily used by archive_command to remove
> full page writes from WAL files. This would require us to disable the
> file size test when we begin recovery on a new WAL files, plus would
> need to redesign initial location of the checkpoint record since we
> could no longer rely on the XLogRecPtr being a byte offset within the
> file.

pg_WAL_filter could "correct" the XLogRecPtr and file sizes during the
filter run.

HTH,
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org





Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum

2006-10-24 Thread Hitoshi Harada
> If the decision to vacuum based on autovacuum criteria is good enough
> for you then I think you should just focus on getting autovac to do what
> you want it to do. Perhaps you just need to decrease the sleep time to a
> few seconds, so that autovac will quickly detect when something needs to
> be vacuumed.

Thanks, I'll do it.
My database is updated frequently all day and also runs one big build
process a day. For most of the day autovac is fine, but during the big
build process autovac gets in the way, so I wished there were a way to
tell autovac when to do its work.


Hitoshi Harada


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Jim C. Nasby
> Sent: Tuesday, October 24, 2006 3:36 AM
> To: Hitoshi Harada
> Cc: 'Tom Lane'; 'Peter Eisentraut'; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
> 
> If the decision to vacuum based on autovacuum criteria is good enough
> for you then I think you should just focus on getting autovac to do what
> you want it to do. Perhaps you just need to decrease the sleep time to a
> few seconds, so that autovac will quickly detect when something needs to
> be vacuumed.
> 
> The only case I can think of where autovac might not work as well as
> smartvacuum would be if you had a lot of databases in the cluster, since
> autovacuum will only vacuum one database at a time.
> 
> On Mon, Oct 23, 2006 at 11:18:39AM +0900, Hitoshi Harada wrote:
> > Ok,
> >
> > But my point is, autovacuum may conflict with a VACUUM ANALYZE
> > command run in another session. My intention with smartvacuum() is
> > based on this.
> > Any solution for this??
> >
> > Regards,
> >
> >
> > Hitoshi Harada
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> > > Sent: Monday, October 23, 2006 11:10 AM
> > > To: Hitoshi Harada
> > > Cc: 'Peter Eisentraut'; pgsql-hackers@postgresql.org
> > > Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
> > >
> > > "Hitoshi Harada" <[EMAIL PROTECTED]> writes:
> > > >> How is this different from what autovacuum does?
> > >
> > > > My application needs to do vacuum by itself, while
> > > > autovacuum does it as daemon.
> > > > The database is updated so frequently that
> > > > normal vacuum costs too much and tables to be updated are
> > > > not so many as the whole database is vacuumed.
> > > > I want to use autovacuum except the feature of daemon,
> > > > but want to control when to vacuum and which table to vacuum.
> > > > So, nothing is different between autovacuum and smartvacuum(),
> > > > but former is daemon and later is user function.
> > >
> > > This seems completely unconvincing.  What are you going to do that
> > > couldn't be done by autovacuum?
> > >
> > >   regards, tom lane
> > >
> > > ---(end of
broadcast)---
> > > TIP 5: don't forget to increase your free space map settings
> >
> >
> > ---(end of broadcast)---
> > TIP 5: don't forget to increase your free space map settings
> >
> 
> --
> Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq




Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread Simon Riggs
On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-10-23 at 13:52 -0400, Tom Lane wrote:
> >> No can do --- we rely on the checksums to be able to tell when we've hit
> >> the end of WAL during replay.  
> 
> > No we don't: Zero length records are the trigger for EOF.
> 
> Only if the file happens to be all-zero already, which is not the normal
> operating state (see WAL-file recycling).  Otherwise you have to be able
> to detect an invalid record.

OK.

> There are actually three checks used to detect end of WAL: zero record
> length, invalid checksum, and incorrect back-pointer.  Zero length is
> the first and cleanest-looking test, but AFAICS we have to have both of
> the others to avoid obvious failure modes.

Well, I understand the need for the zero-length test and the
incorrect-back-pointer test (which also incidentally verifies that the
current record was not left over from a previous use of the xlog file).

The checksum protects from torn pages and disk errors. If you have
full_page_writes set then you already believe yourself safe from torn
pages and your disks could also already be CRC-checking the data. So you
don't *need* the checksum in those cases. If we really think we need it
we could put the xlprev pointer as the *last* field on the xlrec, just
to make doubly sure - having a trailer as well as a header.

CRC-checked disks are actually the industry norm and have been for
around 5 years. ANSI SCSI Parallel Interface 3 (SPI-3) (UltraSCSI 160)
defines the use of CRC and this is available as standard from all key
manufacturers. (CRC-32 is the required level). So if you are using
modern SCSI, SATA or SAS technologies you'll be just fine. (Those checks
are a *requirement* of the underlying technology because of the error
rates when the bus speeds are so high.)

I checked this out in a conversation with the fine people at Seagate,
who also publish a variety of technical manuals/details on this, e.g.
http://www.maxtor.com/_files/maxtor/en_us/documentation/white_papers_technical/wp_ultra320.pdf
...You'll see that CRC is a Mandatory Feature of ANSI SPI-4 now.
http://www.maxtor.com/_files/maxtor/en_us/documentation/white_papers_technical/sas_link_layer.pdf

So, I'd like the *option* to turn our CRC off, please, somehow.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [DOCS] Replication documentation addition

2006-10-24 Thread Markus Schiltknecht

Hello Bruce,

Bruce Momjian wrote:

Here is a new replication documentation section I want to add for 8.2:

ftp://momjian.us/pub/postgresql/mypatches/replication

Comments welcomed.


Thank you, that sounds good. It's targeted at production use and
currently available solutions, which makes sense in the official manual.


You are explaining the sync vs. async categorization, but I sort of
asked myself where the explanation of single- vs. multi-master had
gone. I then realized that you are talking about read-only and a
"read/write mix of servers". Then again, you mention 'Multi-Master
Replication' as one type of replication solution. I think we should be
consistent in our naming. As Single- and Multi-Master are the more
common terms among database replication experts, I'd recommend using
them and explaining what they mean instead of introducing new names.


Along with that, I'd argue that Single- vs. Multi-Master is a
categorization on the same level as Sync vs. Async. In that sense, the
last chapter should probably be named 'Distributed-Shared-Memory
Replication' or something like that instead of 'Multi-Master
Replication', because, as we know, there are several ways of doing
Multi-Master Replication (Slony-II / Postgres-R, Distributed Shared
Memory, 2PC in application code, or the above-mentioned 'Query
Broadcast Replication', which would fall into a Multi-Master
Replication model as well).


Also in the last chapter, instead of just saying that "PostgreSQL does
not offer this type of replication", we could probably say that
different projects are trying to come up with better replication
solutions. And there are several proprietary products based on
PostgreSQL which do provide some kinds of Multi-Master Replication. Not
that I want to advertise any of them, but it just sounds better than
the current "no, we don't offer that".


As this documentation mainly covers production-quality solutions (which 
is absolutely perfect), can we document the status of current projects 
somewhere, probably in a wiki? Or at least mention them somewhere and 
point to their websites? It would help to get rid of all those rumors 
and uncertainties. Or are those intentional?


Just my two cents.

Regards

Markus
