Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Simon Riggs
On Sun, 2008-01-13 at 17:44 -0800, Sean Utt wrote:

 It is not my contention that the core developers need to 
 be different in any way. It is also not my contention that the users need to 
 be different in any way.

First, this is an open forum, so thank you for expressing your views
openly in the manner you've felt them.

General comment:

I think one perspective I have on the above statement is the feeling
that there is a distinction between two groups of people and that one
group is put here to serve the other group better.

Many new users of Postgres are so used to the closed source situation of
Developers being the only people who can see the code that they often
perpetuate the concept of tiering or groups, when it doesn't exist.

Almost all of the people on the list are users of Postgres. There's just
a complete range of people from new users to experienced hackers.
Postgres is well documented, well commented and completely open source,
so there is no barrier to anyone who wishes to change, and if you choose
to define that change positively, improve.

So I support Mark Mielke's views on writing code. Anybody who wants to
code, can. There's probably a project of a size and complexity that's
right for your first project. Apparently the guy that invented the new
scheduling algorithms for Linux wasn't even a coder, but he sat down and
worked it out. 

This is Hackers: Write some code today, everybody. You *can*.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Transaction Snapshot Cloning

2008-01-14 Thread Simon Riggs
On Sat, 2008-01-12 at 16:22 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  ... So if we did this via an installable module approach then we
  would be able to use it much sooner for upgrading to 8.3, rather than
  waiting for 8.4
 
 I think it borders on ludicrous to imagine making this work with just an
 installable module and no core-code changes.  So no, I refuse to spend
 any time helping to design an implementation for 8.3.  

Refuse is a strong word, so apologies if I've offended.

I already know how to write it, the main question was how dangerous is
it and I think we answered that. Misuse seems to be the issue you seem
to be worried about. In the wrong hands it could prove to be a Foot Gun
with a larger than normal kill zone.

Dangerous enough that you actively want me to not write it? Or just
saying you're not sure it can be done?

 It'll be a hard enough problem to make this work for 8.4.

Agreed.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Declarative partitioning grammar

2008-01-14 Thread Markus Schiltknecht

Hi,

Jeff Cohen wrote:
We did look at allowing general functions for partitioning and this was 
one concern.  The other is that we want to enforce that a row only gets 
inserted into a single partition, so we wanted a declarative syntax 
where it was relatively easy to check that range and list specifications 
don't overlap.


Why do you need to define a split point so ambiguously at all? Why not 
just give the DBA exactly *one* place to define the split point?


I don't think the separation into list, hash and range partitioning is 
adequate. What is the system supposed to do, if you try to insert a row 
which doesn't fit any of the values in your list or doesn't fit any of 
the ranges you defined?


I prefer a partitioning grammar which doesn't interfere with 
constraints. We all know how to define constraints. Please don't 
introduce a new, ambiguous way. A partitioning definition should be able 
to tell the target partition for *every* row which satisfies the 
constraints (the real ones, not ambiguous ones).


IMO, a single DDL command should only touch a single split point, i.e. 
split a table into two partitions, move the split point or remove the 
split point (joining the partitions again). Those are the only basic 
commands you need to be able to handle partitioning.


Sorry, but for my taste, the proposed grammar is too long per command, 
not flexible enough and instead ambiguous for split points as well as 
for constraints. To me it looks like repeating the mistakes of others.


Regards

Markus


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Csaba Nagy
On Mon, 2008-01-14 at 09:22 +, Simon Riggs wrote:
 So I support Mark Mielke's views on writing code. Anybody who wants to
 code, can. There's probably a project of a size and complexity that's
 right for your first project. 

The main problem is that usually that initial thing is not what you
desperately need today... so the motivation will be pretty low unless
you just have loads of time to start off playing with the code.

 Apparently the guy that invented the new
 scheduling algorithms for Linux wasn't even a coder, but he sat down and
 worked it out.

 This is Hackers: Write some code today, everybody. You *can*.

Certainly everybody  can write code, but the barrier to accept it is
pretty high in the postgres community. So you better be a damn good
coder if you expect your code to be accepted... and even then with
considerable fight for justifying the use case for your feature ;-)

This is all good for a stable product, but it really makes the barrier
between simple users and hackers pretty high.

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Jean-Michel Pouré
Dear Friends,

I hope that this flame war can stop, as it is useless.

The logic of free software is that developers pick-up issues, based on
their skills and interest. The power of the cummunity is to gather very
talented developers from all over the planet. Freedom is the logic and
there is no need to drive the community. In the end, PostgreSQL relies
on the knownledge of talented developers.

When posting this thread, I hope that a talented developer would some
day pick-up the materialized view issue and work on it, during a process
of discussion. 

In a few days, I will post some precise statistics on how much
MATERIALIZED VIEW could boost a 400.000 posts PhpBB 3.0 forum. I will
write the required PL trigger, test them and install them on my server.

Then each individual is able to decide whether materiazed views are
important or not. This is my definition of freedom. Freedom of choice.

I don't ask for more.
Now, I hope that the list can return to a more peaceful state.

Kind regards,
Jean-Michel Pouré


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Using tapes on tuplesort.c

2008-01-14 Thread mac_man2005
Hi to all.

It seems that the current PostgreSQL implementation of the Replacement 
Selection (RS) algorithm [Knuth] changes a logical tape for each run built.
I'm trying to implement that refinement to RS using 2 heaps instead of just one 
(2Way RS). Recall each heap is aimed at building its corresponding physical 
run, both heap cooperate building its own physical run associated to the same 
logical run).

2Way RS stops building the current logical run just after stop building both 
physical runs associated to the current logical run.

My question: should I use/change tape for each physical run or for each logical 
run?

I know you'll be probably busy with issues on the new PostgreSQL release, so 
I'll thank you twice for your reply.

Regards, Manolo.

Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Hannes Dorbath

Tom Lane wrote:

I've committed a patch to do that.  Please test CVS HEAD and see if you
still see problems.


I'm happy to hear you found something and I will try CVS HEAD in a minute.

In the meantime let me report that the cluster issue happens with GIST 
as well. I have load 5 million rows in that table and did:


test=# CREATE INDEX CONCURRENTLY ts_test_tsv_gist ON public.test 
USING gist (tsv);

CREATE INDEX
test=# CLUSTER test USING ts_test_tsv_gist;
ERROR:  could not create unique index test_pkey
DETAIL:  Table contains duplicated values.
test=#

But as far as I understood this is already covered by your thesis.


--
Best regards,
Hannes Dorbath

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Oooh ... I can't be sure that this is what's biting you, but I
 definitely see a bug that seems to match the symptoms.  As the comments
 in index.c point out, CREATE INDEX CONCURRENTLY works like this:

  * validate_index() works by first gathering all the TIDs currently in the
  * index, using a bulkdelete callback that just stores the TIDs and doesn't
  * ever say delete it.  (This should be faster than a plain indexscan;
  * also, not all index AMs support full-index indexscan.)  Then we sort the
  * TIDs, and finally scan the table doing a merge join against the TID list
  * to see which tuples are missing from the index.

 The scan is done using the regular heapscan code, which in 8.3 has been
 modified to enable synchronized scanning, which means it might start
 from the middle of the table and wrap around.  

Wow, I'm glad we caught this in beta. Thanks a lot to Hannes Dorbath for
testing and reporting it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Mark Mielke

Jean-Michel Pouré wrote:

When posting this thread, I hope that a talented developer would some
day pick-up the materialized view issue and work on it, during a process
of discussion. 
  


In case it was lost in the noise - there are several capable people that 
have been personally on this issue for years. You are not the first to 
suggest it, and your opinion that the feature is valuable is shared by a 
larger group. There is still a difference between talking about it and 
doing it. A few of the doers told me off privately stating that it is 
not that difficult. My suspicion is that it *is* difficult and they are 
not doing themselves credit, or their solution is incomplete, but 
whatever - the result is the same. When one or more of these people are 
ready, you will likely see it released. It may even be complete before 
2008 is complete.



In a few days, I will post some precise statistics on how much
MATERIALIZED VIEW could boost a 400.000 posts PhpBB 3.0 forum. I will
write the required PL trigger, test them and install them on my server.
  


The two factors of note here: 1) If you can write a PL trigger for it, 
the value of the feature is limited. In my own case, I found it fairly 
easy to write triggers, or update the summary table from the application 
(poor man's view). 2) In my experience, a custom PL trigger can make 
assumptions about the application that allow greater optimization that a 
general 'syntactical sugar' solution could. In my own case, performance 
of queries leapt from 1500 ms to 1 ms. Even if materialized views were 
implemented to a level that most people would consider full, I do not 
expect to see the same speed improvement, because a generalized 
implementation would not be able to make the assumptions that I can. 
FYI, my triggers are perhaps 10 lines each, and I believe I have three 
triggers in the 1500 ms - 1 ms example. I have a view and a summary 
table. I update the summary table from the view. In my opinion, this 
solution is very manageable given the 1500:1 performance improvement it 
grants me.



Then each individual is able to decide whether materiazed views are
important or not. This is my definition of freedom. Freedom of choice.

I don't ask for more.
Now, I hope that the list can return to a more peaceful state


You are doing fine. I am sorry for assuming you intended more and giving 
you a cold-ish shoulder.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Alvaro Herrera
Mark Mielke wrote:

 FYI, my triggers are perhaps 10 lines each, and I believe I have three
 triggers in the 1500 ms - 1 ms example. I have a view and a summary
 table. I update the summary table from the view. In my opinion, this
 solution is very manageable given the 1500:1 performance improvement
 it grants me.

But you had to modify your queries.  I would think that a materialized
views implementation worth its salt would put the view to work on the
original, unmodified queries.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Roberts, Jon


 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Sunday, January 13, 2008 8:18 PM
 To: Sean Utt
 Cc: Andrew Dunstan; Joshua D. Drake; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Postgresql Materialized views
 
 Sean Utt [EMAIL PROTECTED] writes:
  My point is simply this: The lack of a clear formal process for feature
  requests leads to this degradation in the conversation.
 
 Two comments:
 
 1) The existing informal process has served us very well for more than
 ten years now.  I'm disinclined to consider replacing it, because that
 would risk altering the community's dynamics for the worse.
 
 2) In the end, this is an open source *community*; no amount of formal
 feature requesting will have any material impact on what actually gets
 implemented, because there isn't any central control.  

Wow.  Being new to Open Source, this amazes me.

 What gets
 implemented is whatever individual contributors choose to work on,
 either because they find it interesting or (in some cases) because
 someone pays them to do something specific.  Certainly, some
 contributors pay attention to what's being requested, but I see no
 reason to think that increasing the level of formality will help them.

What happens when a person adds a feature or changes the architecture of the
database that is perceived by some as incorrect or going in the wrong
direction?  



Jon

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Martijn van Oosterhout
On Mon, Jan 14, 2008 at 08:30:53AM -0600, Roberts, Jon wrote:
  My point is that you should be able to query _table and the system
  should automatically use the view, without you saying so (except by
  initially creating them).
  
 I agree!  From a BI perspective, a materialized view is worthless if you
 have to re-write your query.  There isn't a tool on the market that is smart
 enough to rewrite a query to a view because all other databases handle the
 rewriting internally.  

I don't know about worthless, given that people are doing materialised
views on postgres already. It is however a completely orthoginal
problem. Someone needs to write the code to maintain such a view before
you can even think about working on the planner.

However, step 1 would be to get them onto the TODO list.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy


signature.asc
Description: Digital signature


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Zeugswetter Andreas ADI SD

  But you had to modify your queries.  I would think that a materialized
  views implementation worth its salt would put the view to work on the
  original, unmodified queries.

 
 I might be slow today (everyday? :-) ) - but what do you mean by this? 
 The only difference between *_table and *_view is that *_table is the 
 summary table and *_view is the view. The triggers on the tables the 
 view is derived from select from *_view and update *_table. The queries 
 remain unchanged except for deciding whether to use *_table or *_view.

Traditionally materialized views exist, so that you do not need to code 
differently.
Your queries still run on the detail table, but are silently answered
by a suitable MV. The MV might have count + other aggregated columns
grouped by some columns, and thus be able e.g. shortcircuit a 
select count(*) from atab. The MV should be MVCC aware (have different
values for different snapshots) and not substantially reduce possible 
concurrency of updates to the base table.
 
 For some further background - the base tables are a mirror of accpac 
 tables (augh!) from mssql. The view and summary table gathers 
 information from 5 or so of these tables including aggregates, 
 conditionals, sub-selects (different queries to the same base tables) 
 and deep joins. Perhaps my imagination is too limited - but I 
 don't see 
 how it would be easy to make syntactical sugar for this and still 
 maintain the performance I describe above. For about 30 lines of 
 pl/pgsql and some application-side updates (again from the 
 view to the 
 summary table) in the synchronization script it seems acceptable.

As long as you can formulate a normal view on the above statement,
you should be able to tell the db to materialize that.

A good MV feature would be able to use that MV regardless of whether
you select from the view, or use a statement that the view is a generalization 
of.

I think MV's where originally invented to boost benchmark results
and thus had to operate on given sql to base tables.

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Andrew Dunstan



Roberts, Jon wrote:

What gets
implemented is whatever individual contributors choose to work on,
either because they find it interesting or (in some cases) because
someone pays them to do something specific.  Certainly, some
contributors pay attention to what's being requested, but I see no
reason to think that increasing the level of formality will help them.



What happens when a person adds a feature or changes the architecture of the
database that is perceived by some as incorrect or going in the wrong
direction?  



  


If that's the general perception it doesn't get added to our source 
tree. It's very rare that it gets to anything like as formal as a vote. 
To avoid the possibility of people spending lots of time doing work 
which is ultimately not adopted, we strongly discourage ivory tower 
development. For major features especially, developers are encouraged to 
discuss early and often.


But that's a different issue from which items people work on, which is 
very much a matter of individual choice, or at least something the 
community has little control over.


cheers

andrew



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Pl/Java broken since Postgresql 8.3-rc1

2008-01-14 Thread [EMAIL PROTECTED]
Hi there,
as you now is plJava broken with the actual security releases.
There is a pljava.dll at http://www.ejurka.com/pgsql/pljava/83rc1/ to
fix it for version 8.3RC1.

Is a pljava.dll for version 8.2.6 out?
It's very important for me, need it for my office.

Juergen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Tom Lane
Hannes Dorbath [EMAIL PROTECTED] writes:
 In the meantime let me report that the cluster issue happens with GIST 
 as well. ...
 But as far as I understood this is already covered by your thesis.

Right, the bug is independent of which index AM you use (though the
symptoms may vary).

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] could not open relation: Invalid argument

2008-01-14 Thread Alvaro Herrera
Roberts, Jon wrote:
 Version: PostgreSQL 8.2.5 on i686-pc-mingw32
 
 I recently started getting this error message randomly, could not open
 relation 42904/42906/42985: Invalid argument.  I also got it for a couple
 of other files.  All three files are related to tables that have just a
 single row each.

This has been reported before, and we've patched the system so that the
error reported is better.  In HEAD we also patched it so that the system
would automatically retry a number of times if the problem is
ERROR_SHARING_VIOLATION (or something like that).  That patch was not
backported to 8.2 due to lack of testing -- the original reported did
not come back to try the patched version.

If you want to try, the patch is here:
https://projects.commandprompt.com/public/pgsql/changeset/29853

Let us know how it goes.  If your problem is easily reproducible and the
patch makes it go away, we would consider back-patching the fix.

Thanks.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 My point is simply this: The lack of a clear formal process for feature 
 requests leads to this degradation in the conversation. Without a formalized 
 structure, the conversation devolves rapidly into an argument over semantics 
 and word choice.
...
 There needs to be a way to evaluate the demand for a specific feature as 
 well as the benefits and the effort it will require.

You could always start a page on the developer's wiki:

http://developer.postgresql.org/

That would seem to be a good place to at least describe the problem in detail, 
show how you would like a feature to behave, and have people add the pros and 
cons of certain approaches. Certainly would be better to have a page to point 
to rather than trying to trawl through mailing archives (heck, the page could 
even mostly be a collection of such links).

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200801141104
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-BEGIN PGP SIGNATURE-

iD8DBQFHi4gvvJuQZxSWSsgRA66dAKCGCPBPDfTFDoizE0WDwXBzDK/W3ACg8dwZ
99OvuSU9PPmG6XDPPK2iQzA=
=Xseg
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 Traditionally materialized views exist, so that you do not need to code 
 differently.
 Your queries still run on the detail table, but are silently answered
 by a suitable MV. The MV might have count + other aggregated columns
 grouped by some columns, and thus be able e.g. shortcircuit a 
 select count(*) from atab. The MV should be MVCC aware (have different
 values for different snapshots) and not substantially reduce possible 
 concurrency of updates to the base table.

Note that you just raised the minimum bar for implementation of the
feature by a couple orders of magnitude.  We cannot automatically
substitute an MV into queries unless this is guaranteed not to change
the results.  No lazy updates, MVCC transparency required, etc.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Joshua D. Drake

Roberts, Jon wrote:


2) In the end, this is an open source *community*; no amount of formal
feature requesting will have any material impact on what actually gets
implemented, because there isn't any central control.  


Wow.  Being new to Open Source, this amazes me.


Well it depends on what you mean by formal feature requesting. I will 
agree with Tom that we are a bunch of cats, you can not herd us. However 
feature requests do get done and do have material impact.


However it usually takes more work than it should to actually get the 
feature accepted and or committed.




someone pays them to do something specific.  Certainly, some
contributors pay attention to what's being requested, but I see no
reason to think that increasing the level of formality will help them.


What happens when a person adds a feature or changes the architecture of the
database that is perceived by some as incorrect or going in the wrong
direction?  


They can't add a feature. They can submit a feature for inclusion but if 
we don't like it we don't take it. It is the hope that people who are 
trying to submit follow this:


http://www.postgresql.org/docs/faqs.FAQ_DEV.html

Sincerely,

Joshua D. Drake












Jon

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Pl/Java broken since Postgresql 8.3-rc1

2008-01-14 Thread Josh Berkus

Juergen,


Is a pljava.dll for version 8.2.6 out?
It's very important for me, need it for my office.


Try e-mailing pgsql-jdbc mailing list and asking there.

--Josh


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Zeugswetter Andreas ADI SD

  Traditionally materialized views exist, so that you do not need to code 
  differently.
  Your queries still run on the detail table, but are silently answered
  by a suitable MV. The MV might have count + other aggregated columns
  grouped by some columns, and thus be able e.g. shortcircuit a 
  select count(*) from atab. The MV should be MVCC aware (have different
  values for different snapshots) and not substantially reduce possible 
  concurrency of updates to the base table.
 
 Note that you just raised the minimum bar for implementation of the
 feature by a couple orders of magnitude.  We cannot automatically
 substitute an MV into queries unless this is guaranteed not to change
 the results.  No lazy updates, MVCC transparency required, etc.

Yes, unfortunately. But don't you also think that this is what makes it 
a worthwhile feature ?

I mean, we do have the doityourself triggered summary table approach,
which is not overly difficult to set up. It needs some thought and possibly 
design
by the user to solve the most obvious concurrency issues, but it is doable.

Imho MV could be separated in 2 parts:
1: materialized and MVCC aware views (only used explicitly)
2: add the smarts to rewrite sql

Part 1 is already useful by itself since it provides a generic and easy
solution to concurrency for the user. (probably nice and mindboggling, how to 
best implement that, though :-)

The lazy update and non MVCC approach imho sounds too much like your
you can make it arbitrarily fast if it does not need to be correct :-) 

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Ron Mayer
Mark Mielke wrote:
 Mark Mielke wrote:
 Counts, because as we all know, PostgreSQL count(*) is slow, and in
 any case, my count(*) is not on the whole table, but on a subset.
 Doing this in a general way seems complex to me as it would need to be
 able to evaluate whether a given INSERT or UPDATE or one of the
 dependent tables would impact the WHERE clause for the materialized
 view, and it still wouldn't know which rows to add/update/remove
 without detailed analysis, so it would either be throwing out the
 entire materialized view and recreating it on INSERT or UPDATE (or
 deferring until the next query?) in which case it may be very slow, or
 it may be very complex.
 
 Bah. I forgot to add: The feature I've been wondering about (and not
 necessarily looking for somebody else to do, although I don't think I
 know the code well enough to do it at this point):
 
 Web applications often make the same queries over and over. While
 memcache can be used to cache results, the memcache interface is
 different from the web application interfere requiring complex code, and
 as well, one loses the transaction guarantees as the memcache results
 are not guaranteed to be up-to-date with the database. 

Regarding up-to-dateness note that there is a pgfoundry project that
helps there.   http://pgfoundry.org/projects/pgmemcache/   The other
advantages of doing the caching outside the database is that (a) the
memory for the cached results don't have to sit in the database machine,
and (b) you can cache post-processed (rendered into HTML or gifs)
fragments rather than raw data.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Hannes Dorbath

Tom Lane wrote:

I wrote:

I think it's okay for CREATE INDEX CONCURRENTLY to use bulk-read access
strategy (that is, seqscan using a limited number of buffers), but it
has to be able to force the scan to start at page zero.


I've committed a patch to do that.  Please test CVS HEAD and see if you
still see problems.


With some limited testing it seems both cases are indeed fixed. I was 
unable to reproduce either with current CVS HEAD. Though I'll run some 
further tests tomorrow to back that up.



Thanks for your time and prompt responses.


--
Best regards,
Hannes Dorbath

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] scan.l: check_escape_warning()

2008-01-14 Thread Tom Lane
Michael Meskes [EMAIL PROTECTED] writes:
 On Fri, Jan 11, 2008 at 10:41:17AM -0500, Tom Lane wrote:
 Perhaps there's some discrepancy between the ecpg and backend lexers
 as to where these are called?

 You're right. There is no way to (un)select standard conforming strings
 which makes up for the difference.

If that's how it is going to be, would you remove or #ifdef NOT_USED
the escape_string_warning function?  The 'function defined but not used'
warning that it's producing now is kind of annoying ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] timestamp refactor effort

2008-01-14 Thread Warren Turkal
On Jan 13, 2008 9:21 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Warren Turkal [EMAIL PROTECTED] writes:
  I have a question. Are the low level representations of Timestamp and
  TimestampTZ the same?

 They're the same but the interpretations are different, which is why
 I think it's useful to have two typedefs as a way of documenting what
 any given value is intended to be.  The argument for having a third
 typedef would be exactly the same: to help document what a value is
 intended to be.

Makes sense.

wt

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Re: [COMMITTERS] pgsql: Most recent Postgres version is 8.2.6, per report from Robert

2008-01-14 Thread Magnus Hagander

Alvaro Herrera wrote:

Bruce Momjian wrote:

Log Message:
---
Most recent Postgres version is 8.2.6, per report from Robert Treat.


Can't we make this automatically somehow?


Easiest way would be to have it refer to www.postgresql.org, where the 
frontpage already states this...



//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Re: [COMMITTERS] pgsql: Most recent Postgres version is 8.2.6, per report from Robert

2008-01-14 Thread Alvaro Herrera
Bruce Momjian wrote:
 Log Message:
 ---
 Most recent Postgres version is 8.2.6, per report from Robert Treat.

Can't we make this automatically somehow?

Are we going to move to having the FAQ in XML Docbook FAQ format?  That
would rock and it would be easier to keep translations.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Jeff Davis
On Sun, 2008-01-13 at 18:52 -0500, Tom Lane wrote:
 The scan is done using the regular heapscan code, which in 8.3 has been
 modified to enable synchronized scanning, which means it might start
 from the middle of the table and wrap around.  If that happens, the
 merge join will get totally confused because it is expecting the

Thank you and Hannes Dorbath for tracking this down.

 I wonder whether there are any other places that are silently assuming
 that heapscans start from page zero ...
 

I considered that question when implementing sync scans, but I could not
think of any specific areas of the code that would likely be affected.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] to_char incompatibility

2008-01-14 Thread Roberts, Jon
 
  You'll have to explain to Oracle and their customers that Oracle's
  security model is not a great idea then.
 
 I'd love to, and in fact *do* whenever I'm given the chance.
 
 In fact, Oracle's security model is pretty bad; the reason why Oracle
 advertises Unbreakable so hard is that they have a terrible record of
 security exploits, making them nearly as bad as MySQL. Heck, these days
 you're better off using MSSQL than Oracle to protect your data.

LOL!  I'm not going to trade jabs with you on which product has more
exploits because that is just stupid.

I'm stating that the *model* for Oracle security is very similar to the
non-default behavior of PostgreSQL of using security definer.  I prefer
this model.  I think it is a great idea and I mention Oracle because it is
highly reputable database company that uses this model.

For instance, if I want to allow a user to insert data, I most likely want
them to ONLY do it through my method.  That means creating a function with
security definer set and granting the user execute on the function.  I don't
want the user to select my sequence or inserting data directly to the table.


Also, there is no need to argue this because we can have it both ways.
Security definer is an option and I recommend to always use it over the
default.  If you don't want to use it, don't.



Jon

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 On Sun, 2008-01-13 at 18:52 -0500, Tom Lane wrote:
 I wonder whether there are any other places that are silently assuming
 that heapscans start from page zero ...

 I considered that question when implementing sync scans, but I could not
 think of any specific areas of the code that would likely be affected.

I went through all of the heap_beginscan calls in the code last night.
pgstattuple was broken but AFAICS none of the other callers care about
the visitation order.  I wonder though about third-party add-ons :-(

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] to_char incompatibility

2008-01-14 Thread Josh Berkus
Jon,

 You'll have to explain to Oracle and their customers that Oracle's
 security model is not a great idea then.

I'd love to, and in fact *do* whenever I'm given the chance.  

In fact, Oracle's security model is pretty bad; the reason why Oracle 
advertises Unbreakable so hard is that they have a terrible record of 
security exploits, making them nearly as bad as MySQL. Heck, these days 
you're better off using MSSQL than Oracle to protect your data.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SSL over Unix-domain sockets

2008-01-14 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 It has been reported that the data transmission overhead is much less
 than the connection establishing overhead, which is measured here.
 But this is certainly not an encouraging measurement, if we want to
 put this close to the default path of use.

I did some more experiments to confirm Peter's results.  My test case
for measuring connection overhead is
pgbench -c 1 -t 1000 -S -n -C bench
(ie, single client, SELECT-only transaction, connecting again for each
transaction).  This is marginally more realistic than Peter's test
since the client executes a SQL command per connection.  I get

$ PGSSLMODE=prefer time pgbench -c 1 -t 1000 -S -n -C bench
transaction type: SELECT only
scaling factor: 10
number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 33.078772 (including connections establishing)
tps = 33.078772 (excluding connections establishing)
10.45user 0.68system 0:30.26elapsed 36%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+48465minor)pagefaults 0swaps

$ PGSSLMODE=disable time pgbench -c 1 -t 1000 -S -n -C bench
transaction type: SELECT only
scaling factor: 10
number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 156.237184 (including connections establishing)
tps = 156.237208 (excluding connections establishing)
0.20user 0.18system 0:06.41elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2500minor)pagefaults 0swaps

$ PGSSLMODE=prefer time pgbench -c 1 -t 1000 -S -n -C -h localhost bench
transaction type: SELECT only
scaling factor: 10
number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 32.320773 (including connections establishing)
tps = 32.320774 (excluding connections establishing)
10.54user 1.01system 0:30.97elapsed 37%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+49807minor)pagefaults 0swaps

$ PGSSLMODE=disable time pgbench -c 1 -t 1000 -S -n -C -h localhost bench
transaction type: SELECT only
scaling factor: 10
number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 144.859620 (including connections establishing)
tps = 144.859641 (excluding connections establishing)
0.32user 0.62system 0:06.91elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4512minor)pagefaults 0swaps

I also did some tests to measure the encryption overhead for bulk data,
in the form of pg_dumping a medium-size table (which is in fact just
the data from the regression test's tenk1 table, repeated 128 times):

[EMAIL PROTECTED] ~]$ PGSSLMODE=prefer time pg_dump -t foo regression | wc
2.71user 0.36system 0:25.09elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1093minor)pagefaults 0swaps
1280054 20480136 85863449

[EMAIL PROTECTED] ~]$ PGSSLMODE=disable time pg_dump -t foo regression | wc
0.64user 0.30system 0:09.63elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+793minor)pagefaults 0swaps
1280054 20480136 85863449

[EMAIL PROTECTED] ~]$ PGSSLMODE=prefer time pg_dump -t foo -h localhost 
regression | wc
3.06user 0.45system 0:25.82elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1105minor)pagefaults 0swaps
1280054 20480136 85863449

[EMAIL PROTECTED] ~]$ PGSSLMODE=disable time pg_dump -t foo -h localhost 
regression | wc
0.66user 0.42system 0:09.91elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+806minor)pagefaults 0swaps
1280054 20480136 85863449

Note that these times are for SSL enabled, but without any root.crt
files, so no actual authentication is happening --- I'm not sure how
much more connection-time overhead that would incur.  Presumably the
bulk transfer rate wouldn't change though.  All these numbers are
stable to within a percent or three over repeated trials.

Conclusions:

* SSL, even without real authentication, is *way* too expensive to
enable by default.

* The extra cost of going across a local TCP connection is measurable,
but it's insignificant compared to the cost of turning on SSL.  (This
is on a Fedora 8 kernel BTW ... that result might vary on other
platforms.)

So you could make a pretty good case that the answer for DBAs who
want to prevent spoofing is to disable socket connections in pg_hba.conf
and force even local connections to come through hostssl connections.

If we do want to apply Peter's patch, I think it needs to be extended so
that the default behavior on sockets is the same as before, ie, no SSL.
This could be done by giving libpq an additional connection parameter,
say socketsslmode, having the same alternatives as sslmode but
defaulting to allow instead of prefer.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill 

Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Bruce Momjian

Added to TODO:

* Add the ability to automatically create materialized views

  Right now materialized views require the user to create triggers on 
the
  main table to keep the summary table current.  SQL syntax should be 
able
  to manager the triggers and summary table automatically.  A more
  sophisticated implementation would automatically retrieve from the
  summary table when the main table is referenced, if possible.

I also thought this was on the TODO list.

---

Zeugswetter Andreas ADI SD wrote:
 
   Traditionally materialized views exist, so that you do not need to code 
   differently.
   Your queries still run on the detail table, but are silently answered
   by a suitable MV. The MV might have count + other aggregated columns
   grouped by some columns, and thus be able e.g. shortcircuit a 
   select count(*) from atab. The MV should be MVCC aware (have different
   values for different snapshots) and not substantially reduce possible 
   concurrency of updates to the base table.
  
  Note that you just raised the minimum bar for implementation of the
  feature by a couple orders of magnitude.  We cannot automatically
  substitute an MV into queries unless this is guaranteed not to change
  the results.  No lazy updates, MVCC transparency required, etc.
 
 Yes, unfortunately. But don't you also think that this is what makes it 
 a worthwhile feature ?
 
 I mean, we do have the doityourself triggered summary table approach,
 which is not overly difficult to set up. It needs some thought and possibly 
 design
 by the user to solve the most obvious concurrency issues, but it is doable.
 
 Imho MV could be separated in 2 parts:
 1: materialized and MVCC aware views (only used explicitly)
 2: add the smarts to rewrite sql
 
 Part 1 is already useful by itself since it provides a generic and easy
 solution to concurrency for the user. (probably nice and mindboggling, how to 
 best implement that, though :-)
 
 The lazy update and non MVCC approach imho sounds too much like your
 you can make it arbitrarily fast if it does not need to be correct :-) 
 
 Andreas
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] to_char incompatibility

2008-01-14 Thread Josh Berkus
Jon,

 Also, there is no need to argue this because we can have it both ways.
 Security definer is an option and I recommend to always use it over the
 default.  If you don't want to use it, don't.

Security Definer has ramifications in PostgreSQL which I don't think it 
does in Oracle.  Particularly, see: 
http://www.postgresql.org/docs/techdocs.77

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] to_char incompatibility

2008-01-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Security Definer has ramifications in PostgreSQL which I don't think it 
 does in Oracle.  Particularly, see: 
 http://www.postgresql.org/docs/techdocs.77

BTW, that article needs to be updated to show the (much easier) way to
do it as of 8.3.

I concur that make all your functions security definer by default is
unlikely to make a system more secure overall --- it'll just move the
problems around.  Especially if it's applied blindly by someone who
stopped reading at that point.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Bug: Unreferenced temp tables disables vacuum to update xid

2008-01-14 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Hmm ... that would be strange.  Off-the-cuff idea: we introduced code to
 advance relfrozenxid in CLUSTER, TRUNCATE and table-rewriting forms of
 ALTER TABLE.  Perhaps the problem is that we're neglecting to update it
 for the toast table there. AFAIR I analyzed the cases and they were all
 handled, but perhaps I forgot something.

I found a smoking gun ...

regression=# create table foo (f1 serial primary key, f2 text);
NOTICE:  CREATE TABLE will create implicit sequence foo_f1_seq for serial 
column foo.f1
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index foo_pkey for 
table foo
CREATE TABLE
regression=# insert into foo values(1,repeat('xyzzy',10));
INSERT 0 1
regression=# insert into foo values(2,repeat('xqzzy',10));
INSERT 0 1
regression=# select relname, relkind, relfrozenxid from pg_class order by oid 
desc limit 6;
relname| relkind | relfrozenxid 
---+-+--
 foo_pkey  | i   |0
 pg_toast_707220_index | i   |0
 pg_toast_707220   | t   |   119421
 foo   | r   |   119421
 foo_f1_seq| S   |0
 xmlview5  | v   |0
(6 rows)

regression=# cluster foo_pkey on foo;
CLUSTER
regression=# select relname, relkind, relfrozenxid from pg_class order by oid 
desc limit 6;
relname| relkind | relfrozenxid 
---+-+--
 pg_toast_707231_index | i   |0
 pg_toast_707231   | t   |   119424
 foo_pkey  | i   |0
 foo   | r   |   4195086720
 foo_f1_seq| S   |0
 xmlview5  | v   |0
(6 rows)

So something is out of whack in CLUSTER.  However it only seems to be
broken in HEAD, so I'm not sure this helps to explain the original
report.  (Speculation: this is related to the rewrite to make CLUSTER
MVCC-safe?)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Declarative partitioning grammar

2008-01-14 Thread Gavin Sherry
On Sat, Jan 12, 2008 at 04:01:19PM +0530, NikhilS wrote:
 Hi,
 
  We did look at allowing general functions for partitioning and this
  was one concern.  The other is that we want to enforce that a row
  only gets inserted into a single partition, so we wanted a
  declarative syntax where it was relatively easy to check that range
  and list specifications don't overlap.
 
 
 Detection of mutually exclusive ranges might not turn out to be so easy
 afterall. I think there is some code in the constraint_exclusion area which
 might help out in this.

In some prototyping code it didn't seem too difficult but if we've made
a mistake we might have to look at the CE code.

Thanks,

Gavin


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bug: Unreferenced temp tables disables vacuum to update xid

2008-01-14 Thread Alvaro Herrera
Tom Lane wrote:

 regression=# cluster foo_pkey on foo;
 CLUSTER
 regression=# select relname, relkind, relfrozenxid from pg_class order by oid 
 desc limit 6;
 relname| relkind | relfrozenxid 
 ---+-+--
  pg_toast_707231_index | i   |0
  pg_toast_707231   | t   |   119424
  foo_pkey  | i   |0
  foo   | r   |   4195086720
  foo_f1_seq| S   |0
  xmlview5  | v   |0
 (6 rows)
 
 So something is out of whack in CLUSTER.  However it only seems to be
 broken in HEAD, so I'm not sure this helps to explain the original
 report.  (Speculation: this is related to the rewrite to make CLUSTER
 MVCC-safe?)

Right ... see copy_heap_data --- it sets FreezeXid as relfrozenxid.

If we were to scan each tuple as it is inserted, we could store a higher
relfrozenxid, but I doubt we want to do that.

Perhaps what we could do is take the relfrozenxid from the old relation
and copy it over, if it's later than FreezeXid?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Bug: Unreferenced temp tables disables vacuum to update xid

2008-01-14 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Perhaps what we could do is take the relfrozenxid from the old relation
 and copy it over, if it's later than FreezeXid?

It certainly doesn't seem to make any sense to allow the rel's
relfrozenxid to go backwards.  Indeed this coding lets it end up less
than the DB's datfrozenxid, which is certainly inappropriate.

What might be the best idea is to advance FreezeXid to the old
relfrozenxid between the vacuum_set_xid_limits and begin_heap_rewrite
calls.  Then we'd be quite certain we are not lying: anything older
than that did indeed get frozen.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 Note that you just raised the minimum bar for implementation of the
 feature by a couple orders of magnitude.

 Yes, unfortunately. But don't you also think that this is what makes it 
 a worthwhile feature ?

Well, my point is that taking automatic rewriting as a required feature
has at least two negative impacts:

* it rules out any form of lazy update, even though for many applications
an out-of-date summary view would be acceptable for some purposes;

* requiring MVCC consistency will probably hugely reduce the variety of
views that we can figure out how to materialize, and cost performance
even for the ones we can do at all.

It's not zero-cost, even if you consider implementation effort and
complexity as free (which I don't).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Jeff Davis [EMAIL PROTECTED] writes:
 On Sun, 2008-01-13 at 18:52 -0500, Tom Lane wrote:
 I wonder whether there are any other places that are silently assuming
 that heapscans start from page zero ...

 I considered that question when implementing sync scans, but I could not
 think of any specific areas of the code that would likely be affected.

 I went through all of the heap_beginscan calls in the code last night.
 pgstattuple was broken but AFAICS none of the other callers care about
 the visitation order.  I wonder though about third-party add-ons :-(

Perhaps we ought to have made heap_beginscan guarantee an ordered scan and
made synch scans be explicitly requested. That would have touched a lot of
lines but been more conservative. I'm not sure it's worth going back on it now
though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] 8.3RC1 on windows missing descriptive Event handle names

2008-01-14 Thread Stephen Denne
On Windows XP, using Process Explorer with the lower pane showing Handles, not 
all postgres.exe processes are including an Event type with a description of 
what the process is doing.

At the moment, I have ten postgres processes shown as being at the bottom of 
the tree, three of which do not have such descriptions.

The processes that do show an Event type handle show these names:

\BaseNamedObjects\pgident: postgres: wal writer process
\BaseNamedObjects\pgident: postgres: autovacuum launcher process

plus one per connection like:

\BaseNamedObjects\pgident: postgres: postgres mydatabase 127.0.0.1(1954) idle


Comparing the list of processes to those of 8.2, I suspect that the processes 
missing this detail are the log writer (one of the processes does have a File 
Handle on the log file), and the stats collector.

I have autovacuum enabled for 8.3rc1, and disabled for 8.2.

PostgreSQL is started as a service.

Stephen Denne

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer 
focus, and courage. This email with any attachments is confidential and may be 
subject to legal privilege.  If it is not intended for you please advise by 
reply immediately, destroy it and do not copy, disclose or use it in any way.

__
  This email has been scanned by the DMZGlobal Business Quality 
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Declarative partitioning grammar

2008-01-14 Thread Jeff Cohen


On Jan 14, 2008, at 1:49 AM, Markus Schiltknecht wrote:

I don't think the separation into list, hash and range partitioning  
is adequate. What is the system supposed to do, if you try to  
insert a row which doesn't fit any of the values in your list or  
doesn't fit any of the ranges you defined?


Hi Markus,

If you don't define a default partition to handle outliers,  the  
insert should fail with an error.


I prefer a partitioning grammar which doesn't interfere with  
constraints. We all know how to define constraints. Please don't  
introduce a new, ambiguous way. A partitioning definition should be  
able to tell the target partition for *every* row which satisfies  
the constraints (the real ones, not ambiguous ones).


IMO, a single DDL command should only touch a single split point,  
i.e. split a table into two partitions, move the split point or  
remove the split point (joining the partitions again). Those are  
the only basic commands you need to be able to handle partitioning.


I can certainly appreciate the simplicity of this approach.  It lets  
us use a generic check constraint to perform partitioning, so it is  
more general than partitioning using hash, list, and range.  However,  
it achieves this generality at the expense of usability for typical  
customer cases.  For example, let's look at the case of a table of 1  
year of sales data, where we want to create 12 partitions -- one for  
each month.


With the generic approach, you start with a single table, and start  
by splitting it into two six-month partitions:


ALTER TABLE sales
  SPLIT where sales_date  date '2007-06-01'
   INTO
(
 PARTITION first_half
 PARTITION second_half
 );

We could implement this approach using check constraints and table  
inheritance: the partition second_half is a child table where  
sales_date  date '2007-06-01', and the partition first_half has the  
complementary constraint NOT(sales_date  date '2007-06-01').


Next, you split each partition:

ALTER TABLE sales
  SPLIT PARTITION first_half where sales_date  date '2007-03-01'
   INTO
(
 PARTITION first_quarter
 PARTITION second_quarter
 );

So now the child table for first_half itself has two children.  As  
you continue this process you construct a binary tree of table  
inheritance using 12 ALTER statements.


In the long grammar you can create and partition the table in one  
statement:


CREATE TABLE sales
...
PARTITION BY sales_date
(
start (date '2007-01-01') end (date '2008-01-01')
every (interval '1 month')
);

Sorry, but for my taste, the proposed grammar is too long per  
command, not flexible enough and instead ambiguous for split points  
as well as for constraints. To me it looks like repeating the  
mistakes of others.


Thanks for your feedback.  Partitioning the table using series of  
splits is a clever solution for situations where the partitioning  
operation cannot be described using simple equality (like list,hash)  
or ordered comparison (range).  But for many common business cases,  
the long grammar is easier to specify.


kind regards,

Jeff


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] to_char incompatibility

2008-01-14 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Josh Berkus [EMAIL PROTECTED] writes:
 Security Definer has ramifications in PostgreSQL which I don't think it 
 does in Oracle.  Particularly, see: 
 http://www.postgresql.org/docs/techdocs.77

 BTW, that article needs to be updated to show the (much easier) way to
 do it as of 8.3.

 I concur that make all your functions security definer by default is
 unlikely to make a system more secure overall --- it'll just move the
 problems around.  Especially if it's applied blindly by someone who
 stopped reading at that point.

I think the reason Oracle DBAs are accustomed to using security definer for
everything is that it has some further effects aside from selecting the
privileges to use. 

Remember that in Oracle the current role also controls what we call the
search_path. So selecting security definer is effectively selecting lexical
scoping over dynamic scoping. It nails down all the references in the package
or function at compile time.

That does have more robust security implications. It's also supposed to
perform better. And experience shows lexical scoping makes it easier to build
large complex systems without getting bogged down in lots of
action-at-a-distance. 

In Postgres the performance consequence is reversed. We have a performance
*hit* for security definer. And the pl interpreters don't behave any
differently as far as when they do their lookups.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Index trouble with 8.3b4

2008-01-14 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I went through all of the heap_beginscan calls in the code last night.
 pgstattuple was broken but AFAICS none of the other callers care about
 the visitation order.  I wonder though about third-party add-ons :-(

 Perhaps we ought to have made heap_beginscan guarantee an ordered scan and
 made synch scans be explicitly requested. That would have touched a lot of
 lines but been more conservative. I'm not sure it's worth going back on it now
 though.

Hmm.  I'm too lazy to go back and look right now, but IIRC most of the
hardwired heapscans are on system catalogs that are unlikely to be large
enough to trigger a syncscan anyway.  If we were to flip the semantics,
and then change only the callers that clearly need to enable syncscans,
it would not be all that large a patch I think.

On the other hand it's far from clear that there's really a problem.
The model for doing a block-at-a-time scan is VACUUM, and that doesn't
use the heapscan infrastructure but just fetches blocks by number.
It would only be people who'd copied pgstattuple's methodology that
would be likely to be at risk.  I'm not sure we should protect those
hypothetical people at the cost of not doing syncscans for other
(also hypothetical) third-party add-ons that do heapscans on large
tables and wouldn't have a problem with wraparound.

It's a tossup from here.  Anybody have a strong opinion one way or the
other?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] could not open relation: Invalid argument

2008-01-14 Thread Jaime Casanova
On Jan 14, 2008 11:03 AM, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Roberts, Jon wrote:
  Version: PostgreSQL 8.2.5 on i686-pc-mingw32
 
  I recently started getting this error message randomly, could not open
  relation 42904/42906/42985: Invalid argument.  I also got it for a couple
  of other files.  All three files are related to tables that have just a
  single row each.

 This has been reported before, and we've patched the system so that the
 error reported is better.  In HEAD we also patched it so that the system
 would automatically retry a number of times if the problem is
 ERROR_SHARING_VIOLATION (or something like that).  That patch was not
 backported to 8.2 due to lack of testing -- the original reported did
 not come back to try the patched version.

 If you want to try, the patch is here:
 https://projects.commandprompt.com/public/pgsql/changeset/29853

 Let us know how it goes.  If your problem is easily reproducible and the
 patch makes it go away, we would consider back-patching the fix.


I have builded a patched version of 8.2, if you want i can send it to you...
if your mail server doesn't accept large files (the zip file is 5.6
mb) then maybe someone can share some space for temporarily store
it...

-- 
regards,
Jaime Casanova

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning.
   Richard Cook

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SSL over Unix-domain sockets

2008-01-14 Thread Bruce Momjian
Tom Lane wrote:
 Conclusions:
 
 * SSL, even without real authentication, is *way* too expensive to
 enable by default.
 
 * The extra cost of going across a local TCP connection is measurable,
 but it's insignificant compared to the cost of turning on SSL.  (This
 is on a Fedora 8 kernel BTW ... that result might vary on other
 platforms.)
 
 So you could make a pretty good case that the answer for DBAs who
 want to prevent spoofing is to disable socket connections in pg_hba.conf
 and force even local connections to come through hostssl connections.

Yea, I figured using protected directories for the socket was the
zero-cost solution, and if you have to do SSL, might as well just use
TCP too.  (If you moved the socket file to a protected directory I think
you could use external_pid_file='/tmp/.s.PGSQL.5432' to prevent a spoof
socket file in /tmp.  Should we document that idea?)

 If we do want to apply Peter's patch, I think it needs to be extended so
 that the default behavior on sockets is the same as before, ie, no SSL.
 This could be done by giving libpq an additional connection parameter,
 say socketsslmode, having the same alternatives as sslmode but
 defaulting to allow instead of prefer.

That seems like it is going to be added confusion; just using the
protected socket diretory or TCP  SSL seems less error-prone.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Declarative partitioning grammar

2008-01-14 Thread Tom Lane
Jeff Cohen [EMAIL PROTECTED] writes:
 In the proposed solution, hash and list partitions work for all types  
 that support an equality operator, and range partitions work for all  
 types that support fully-ordered comparison.

Surely a hashing method would require a *hashable* equality operator,
ie a hash opclass; likewise range partitions would demand a matching
btree opclass.  You could do list partitions with an equality operator
of either kind.

Essentially all of the system's current knowledge about the properties
of specific operators is encoded as operator classes for one of these
two built-in index types.  If you want to make assumptions about the
behavior of an operator, it really needs to be founded on these types of
opclasses --- or else you're buying into inventing a comparable amount
of infrastructure for some other organizational concept.

I think Peter's point was that you might want to think about
generalizing your concepts so that other kinds of operator classes could
someday serve as the foundations for other kinds of partitioning rules.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Declarative partitioning grammar

2008-01-14 Thread Jeff Cohen


On Jan 12, 2008, at 9:34 AM, Peter Eisentraut wrote:

Well, with an extensible system such as PostgreSQL you will need to  
have a
partitioning scheme that can deal with extensions.  Perhaps people  
want to
partition by XML, GIS, text-search data, or whatever someone might  
come up

with in the future.


Hi Peter,

In the proposed solution, hash and list partitions work for all types  
that support an equality operator, and range partitions work for all  
types that support fully-ordered comparison.


kind regards,

Jeff

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] SSL over Unix-domain sockets

2008-01-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Yea, I figured using protected directories for the socket was the
 zero-cost solution, and if you have to do SSL, might as well just use
 TCP too.  (If you moved the socket file to a protected directory I think
 you could use external_pid_file='/tmp/.s.PGSQL.5432' to prevent a spoof
 socket file in /tmp.  Should we document that idea?)

Umm ... two questions about that:

* will the postmaster fail if there's a socket where it tries to write
the external_pid_file?  (If it does fail, does that really fix
anything?  The spoofer already owns the socket.)

* if there's a plain file where a client expects to find the socket,
what happens?  (Probably nothing very good, since the first thing the
client will do is write on it.)

 If we do want to apply Peter's patch, I think it needs to be extended so
 that the default behavior on sockets is the same as before, ie, no SSL.

 That seems like it is going to be added confusion; just using the
 protected socket diretory or TCP  SSL seems less error-prone.

Yeah, all of this is about confusion and error-proneness.  I still think
that the real problem is that we don't have full control over
client-side code, and therefore can't just write off the problem of a
client deciding to connect to /tmp/.s.PGSQL.5432 even if the local DBA
thinks the socket would be safer elsewhere.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Re: [COMMITTERS] pgsql: Most recent Postgres version is 8.2.6, per report from Robert

2008-01-14 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Log Message:
  ---
  Most recent Postgres version is 8.2.6, per report from Robert Treat.
 
 Can't we make this automatically somehow?

Hmmm, well, that file is pulled to our web site on every change so I am
not sure how we could do this automatically;  updating this file is in
the release check list but somehow it got skipped.

 Are we going to move to having the FAQ in XML Docbook FAQ format?  That
 would rock and it would be easier to keep translations.

That would make sense.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Pl/Java broken since Postgresql 8.3-rc1

2008-01-14 Thread Kris Jurka



On Mon, 14 Jan 2008, Josh Berkus wrote:


Juergen,


Is a pljava.dll for version 8.2.6 out?
It's very important for me, need it for my office.


Try e-mailing pgsql-jdbc mailing list and asking there.



The correct list is actually [EMAIL PROTECTED]  The JDBC driver 
and the server side language have little in common.


I've put up a new pljava build against 8.2.6 for windows here:

http://www.ejurka.com/pgsql/pljava/826/

In this case you need the new pljava.jar as well as the .dll because 
it's had some updates since the previous release as well.


Kris Jurka

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] SSL over Unix-domain sockets

2008-01-14 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Yea, I figured using protected directories for the socket was the
  zero-cost solution, and if you have to do SSL, might as well just use
  TCP too.  (If you moved the socket file to a protected directory I think
  you could use external_pid_file='/tmp/.s.PGSQL.5432' to prevent a spoof
  socket file in /tmp.  Should we document that idea?)
 
 Umm ... two questions about that:
 
 * will the postmaster fail if there's a socket where it tries to write
 the external_pid_file?  (If it does fail, does that really fix
 anything?  The spoofer already owns the socket.)

I figured it would prevent someone from spoofing while the server was
up, which is a _new_ problem when moving the socket.  :-(

My feeling on the moving of sockets risk is that you are probably going
to have all your clients using the new socket directory before anyone
tries to put something in /tmp, especially if you have the lock file in
/tmp as outlined above.  To spoof in such a situation you would need to
do the attack while the server is down _and_ against a client that
doesn't know the right socket location.

 * if there's a plain file where a client expects to find the socket,
 what happens?  (Probably nothing very good, since the first thing the
 client will do is write on it.)

We would have to test that.

  If we do want to apply Peter's patch, I think it needs to be extended so
  that the default behavior on sockets is the same as before, ie, no SSL.
 
  That seems like it is going to be added confusion; just using the
  protected socket diretory or TCP  SSL seems less error-prone.
 
 Yeah, all of this is about confusion and error-proneness.  I still think
 that the real problem is that we don't have full control over
 client-side code, and therefore can't just write off the problem of a
 client deciding to connect to /tmp/.s.PGSQL.5432 even if the local DBA
 thinks the socket would be safer elsewhere.

Right.  I think the lock file in /tmp does help somewhat.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Array behavior oddities

2008-01-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
   If any dimension is written as a slice, i.e. contains a colon, then all
   dimensions are treated as slices.

 Is the the behavior of assuming an entry with no colon is a slice what
 we want, or are we just stuck with it?

Why do you find that surprising?  It's either a slice or it isn't,
there's no halfway point.  Are you proposing to throw an error if only
some of the subscripts have colons?  What would be the point?

 Is there a reason out-of-bounds array accesses behave differently for
 slices and non-slices?

History (although sloppy original implementation would do too).  I'm
not sure if we should try to clean it up --- there've not been that many
complaints, but I'm sure we'd get complaints from people whose code
stopped working, if we change it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Array behavior oddities

2008-01-14 Thread Bruce Momjian
In reading our array documentation I came across two unusual behaviors. 
The issue relates to slices:

  We can also access arbitrary rectangular slices of an array, or
  subarrays.  An array slice is denoted by writing
  
literalreplaceablelower-bound/replaceable:replaceableupper-bound/replaceable/literal
  for one or more array dimensions.  For example, this query retrieves
  the first item on Bill's schedule for the first two days of the week:

  SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

First issue:

  If any dimension is written as a slice, i.e. contains a colon, then all
  dimensions are treated as slices.  Any dimension that has only a single
  number (no colon) is treated as being from literal1/ to the number
  specified.  For example, literal[2]/ is treated as literal[1:2]/,
  as in this example:

Is the the behavior of assuming an entry with no colon is a slice what
we want, or are we just stuck with it?

Also:

  An array subscript expression will return null if either the array itself
  or any of the subscript expressions are null.  Also, null is returned
  if a subscript is outside the array bounds (this case does not raise an
  error).  For example, if literalschedule/ currently has the
  dimensions literal[1:3][1:2]/ then referencing
  literalschedule[3][3]/ yields NULL.  Similarly, an array reference
  with the wrong number of subscripts yields a null rather than an error.

  An array slice expression likewise yields null if the array itself or
  any of the subscript expressions are null.  However, in other corner
  cases such as selecting an array slice that is completely outside the
  current array bounds, a slice expression yields an empty
  (zero-dimensional) array instead of null.  If the requested slice
  partially overlaps the array bounds, then it is silently reduced to just
  the overlapping region.

Is there a reason out-of-bounds array accesses behave differently for
slices and non-slices?

Having slices and non-slices behave differently is very confusing to me.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SSL over Unix-domain sockets

2008-01-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Yeah, all of this is about confusion and error-proneness.  I still think
 that the real problem is that we don't have full control over
 client-side code, and therefore can't just write off the problem of a
 client deciding to connect to /tmp/.s.PGSQL.5432 even if the local DBA
 thinks the socket would be safer elsewhere.

 Right.  I think the lock file in /tmp does help somewhat.

Even if it happens to work (on some platforms) it seems like a kluge.

It strikes me that given the postmaster's infrastructure for listening
on multiple sockets, it would be a pretty small matter of programming
to teach it to listen on socket files in multiple directories not only
one.  If we had that, the postmaster could listen in both /tmp and
your-more-secure-directory-of-choice.  Surely an actual socket file
would be a more useful blocker in /tmp than a dead-weight PID file.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly