Re: [HACKERS] Streaming replication document

2010-12-03 Thread Heikki Linnakangas

On 04.12.2010 04:18, Tatsuo Ishii wrote:

There is a description about streaming replication in the doc:
--
25.2.5. Streaming Replication

:
:
If you set up a WAL archive that's accessible from the
standby, wal_keep_segments is not required as the standby can always
use the archive to catch up.
--

I think this description is somewhat inadequate. Since recovery using
the WAL archive is file-based, it may cause a long replication delay. I
think that even if a WAL archive is set up, we should set wal_keep_segments
to a proper value, not 0. Recovery from the WAL archive should be the last
resort, shouldn't it?


If your standby falls behind that much, catching up is going to take a 
while anyway. The master always keeps 2-3 * checkpoint_segments WAL 
segments around even if wal_keep_segments is 0.


Depending on the archive, it might well be faster to catch up using the 
archive instead of streaming from the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Crash safe visibility map vs hint bits

2010-12-03 Thread Heikki Linnakangas

On 04.12.2010 09:14, jes...@krogh.cc wrote:

There has been a lot of discussion about index-only scans and how to make the
visibility map crash safe, followed by a good discussion about hint bits.

What seems to be the main concern is the added WAL volume, and it makes me
wonder if there is a way in between that looks more like hint bits.

How about lazily WAL-logging the complete visibility map, say every X minutes
or every N tuple updates, and making it WAL recovery's job to recheck the
visibility of pages touched by the WAL stream?


If you WAL-log the visibility map changes after the fact, it doesn't 
solve the race condition we're struggling with: the visibility map 
change might hit the disk before the PD_ALL_VISIBLE change to the heap 
page. If you crash, you can end up with a situation where the 
PD_ALL_VISIBLE flag on the heap page is not set, but the bit in the 
visibility map is, which causes serious issues later on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Crash safe visibility map vs hint bits

2010-12-03 Thread jes...@krogh.cc
There has been a lot of discussion about index-only scans and how to make the
visibility map crash safe, followed by a good discussion about hint bits.

What seems to be the main concern is the added WAL volume, and it makes me
wonder if there is a way in between that looks more like hint bits.

How about lazily WAL-logging the complete visibility map, say every X minutes
or every N tuple updates, and making it WAL recovery's job to recheck the
visibility of pages touched by the WAL stream?

This seems a lot like the checkpoint mechanism, but I can't tell whether it
can just follow the same pattern directly.

This may also just demonstrate my total lack of understanding of PG's internals.

Jesper





Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Tom Lane
"Ross J. Reedstrom"  writes:
> If you consider that an index basically is, in some sense, a pre-canned
> column list, then:

> ALTER TABLE table_name ADD PRIMARY KEY (column_list);
> ALTER TABLE table_name ADD PRIMARY KEY USING index_name;

> are parallel constructions. And it avoids the error case of the user
> providing a column list that doesn't match the index.

+1 for that approach.  One other thought is that ordinarily, the
add-constraint syntax ensures that the constraint is named the same as
its underlying index; in fact we go so far as to keep them in sync if
you rename the index later.  But after

ALTER TABLE table_name ADD CONSTRAINT con_name PRIMARY KEY USING index_name;

they'd be named differently, unless we (a) throw an error or (b)
forcibly rename the index.  Neither of those ideas seems to satisfy the
principle of least surprise, but leaving it alone seems like it will
also lead to confusion later.

I wonder whether, in the same spirit as not letting the user write a
column name list that might not match, we should pick a syntax that
doesn't allow specifying a constraint name different from the index
name.  In the case where you say CONSTRAINT it'd be a bit plausible
to write something like

ALTER TABLE table_name ADD CONSTRAINT con_name PRIMARY KEY USING EXISTING INDEX;

(implying that the index to use is named con_name) but I don't know
what to do if you want to leave off the "CONSTRAINT name" clause.

Thoughts?

regards, tom lane
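
For concreteness, here is how the two syntaxes under discussion might look in
a session (a sketch of proposed, uncommitted syntax; table and index names
are illustrative):

CREATE TABLE t (id int NOT NULL, val text);
CREATE UNIQUE INDEX t_id_idx ON t (id);

-- Ross's form: the index supplies the column list
ALTER TABLE t ADD PRIMARY KEY USING t_id_idx;

-- Tom's variant: the constraint name doubles as the index name
ALTER TABLE t ADD CONSTRAINT t_id_idx PRIMARY KEY USING EXISTING INDEX;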



Re: [HACKERS] knngist - 0.8

2010-12-03 Thread Tom Lane
Robert Haas  writes:
> 2010/9/13 Teodor Sigaev :
>> [updated patch]

> I realize I'm repeating myself, but...  this patch needs
> documentation.  It's not optional.

I've applied all of this, and written documentation for all of it,
except for the contrib/btree_gist additions which still need to be
redone for the revised API (and then documented!).  My patience ran out
somewhere around there, so I'm marking that part as returned with
feedback.

What we have at this point (pending contrib/btree_gist fixes) is
nearest-neighbor searching capability for point columns.  And
trigram-based nearest-neighbor for text strings, if you install
contrib/pg_trgm.  That doesn't seem like a lot of return for the
amount of work that went into it.  Are there plans to add KNN support
for any other standard types?

regards, tom lane
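
For reference, the nearest-neighbor capability described above can be
exercised along these lines (a sketch; table and index names are illustrative):

CREATE TABLE pts (p point);
CREATE INDEX pts_gist_idx ON pts USING gist (p);
-- the GiST index can now return rows in distance order
SELECT * FROM pts ORDER BY p <-> '(0,0)'::point LIMIT 10;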



[HACKERS] Note about KNNGIST limitation: no mark/restore

2010-12-03 Thread Tom Lane
While working on the KNNGIST documentation I noticed that it's possible
to create queries that will try to use a KNN scan as the inside of a
mergejoin, leading to a failure because GiST hasn't got mark/restore
support.  For example, in the current HEAD regression database:

regression=# set enable_seqscan TO 0;
SET
regression=# explain select * from point_tbl a, point_tbl b where (a.f1 <-> '(0,0)') = (b.f1 <-> '(0,0)') order by (a.f1 <-> '(0,0)');
                                    QUERY PLAN
------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..16.94 rows=7 width=32)
   Merge Cond: ((a.f1 <-> '(0,0)'::point) = (b.f1 <-> '(0,0)'::point))
   ->  Index Scan using gpointind on point_tbl a  (cost=0.00..8.37 rows=7 width=16)
         Order By: (f1 <-> '(0,0)'::point)
   ->  Index Scan using gpointind on point_tbl b  (cost=0.00..8.37 rows=7 width=16)
         Order By: (f1 <-> '(0,0)'::point)
(6 rows)

regression=# select * from point_tbl a, point_tbl b where (a.f1 <-> '(0,0)') = (b.f1 <-> '(0,0)') order by (a.f1 <-> '(0,0)');
ERROR:  GiST does not support mark/restore

The current planner code will not consider a KNN scan unless it
matches the query's ORDER BY, so the ORDER BY in the above example is
required to provoke the failure.

This seems like a sufficiently far-fetched example that I'm not too
concerned about it.  Adding mark/restore to GiST KNN scans doesn't look
practical at all; so if we were to try to do something, it would need to
involve hacking the planner to know that this plan type doesn't work,
which seems possible but klugy.

regards, tom lane



Re: [HACKERS] GiST insert algorithm rewrite

2010-12-03 Thread Robert Haas
On Dec 3, 2010, at 4:54 PM, Heikki Linnakangas wrote:
> Here's an updated patch.

How carefully have you perf-tested this?

> On closer look, supporting the invalid tuples in scans was trivial, so I kept 
> that after all. So you can still query an index with invalid tuples. If an 
> insert encounters one, you get an error, and VACUUM emits a LOG message on 
> any such tuples.

Cool.

> There's one bug remaining that I found during testing. If you crash, leaving 
> an incomplete split behind, and then vacuum the table removing all the 
> aborted tuples from the pages, it's possible that you end up with a 
> completely empty page that has no downlink yet. The code to complete 
> incomplete splits doesn't cope with that at the moment - it doesn't know how 
> to construct a parent key for a child that has no tuples.

I think we can live with this.


...Robert


[HACKERS] Requirement for the Buffer manager to load multiple pages at once

2010-12-03 Thread Vaibhav Kaushal
Hi all,

I have gone through the code a bit more this time, and what I originally
thought of as a task of the executor (as I read the documentation) seems
to be split across multiple points. So I am asking a question based on
what I have understood so far. To keep it a bit organized, I have to say
this:


==> What am I planning to do?

I want to parallelize the search / scan (specifically hash) by using the
multiple cores in today's processors.


==> What do I know so far?

Everything I know so far seems to be a problem for this design. I know that:

* The PG backend is not thread-safe. We can't even create a thread in there.

* For anything to be integrated into PG and be crash-safe, it needs to
comply with the rules of the WAL and buffer manager subsystems.

* Almost everything in PG is divided into tables, and every table is
organized into pages, all of which have the same structure, described in
the "Database Page Layout" section of the documentation.

* Each page (in the case of a user-created table) will contain multiple
entries (at least one). In most cases it will contain the full data; if it
does not, the storage is done using TOAST. [I am considering the case that
does not involve TOAST.]

* The buffer manager loads pages from the disk, and any search is done
through the buffers.

* Any modification to a buffer is first written to the WAL, and only after
the WAL is synced to disk does the change make it to the disk page; the
buffer can then be discarded.


==> Is it compatible?

Well, I think there is a way at hand to create some sort of (say) 'magic'
processes which can run in parallel over the data to search it (they can't
be used for modification, only for searching).

Now these processes have some requirements:

* First, they cannot be treated as 'processes' in the everyday sense. We
cannot call them 'threads' either (hence I call them 'magic'). [Please do
not ask how it can be done; all I can say is that there is a way.] So the
OS is still running only one process.

* Second, say I have 4 processes; then none of them will start unless all
4 of them have something to do. This means that we need to give them a
batch of work together.

* Third, the sequence of actions in the PG backend has to be linear. These
magic processes (all 4 of them) can be launched as if it were a function
call. Something like:
launch_magic_search(4, buffers_array_of_4_buffers_in_memory,
additional_parameters);

So this method can be used in the backend. These processes remain part of
the one single backend that executed the function call, and the PG backend
will not be bothered with creating a child process or a separate worker
thread. It will always appear to be linear.


==> The point of concern:

Since the work has to be done by the processes in parallel and in sync, it
is best if we give them equal amounts of work that follow a similar pattern
(one magic process cannot be searching inside a hash buffer while another
is on a seqscan; they are part of the same function call).

Since, at a relatively fine level of organization, the buffer / page layout
is the same for each type of table, we can ask the processes to work on
similar page buffers.

==> The final question:

Since I want to parallelize at the buffer level, I basically want to load
multiple pages at once and then release them at once. So the final
question is:

Can I ask the buffer manager to load 4 (in this case) pages at one time
and use them for my algorithm to search inside them (no modification is
done here, so no problem for the WAL)?

==

Thanks for reading the email. I would have asked just the one final
question, but as it happened in the IRC channel, many other questions came
up, and I was advised to ask on the list. I know that the same questions
will come up here too, so rather than eating up the time of the PG
hackers, I thought I would write it all up comprehensively and detail what
is possible.

I would welcome suggestions.

Regards,

Vaibhav (*_*) 




Re: [HACKERS] We need index-only scans

2010-12-03 Thread MARK CALLAGHAN
On Wed, Dec 1, 2010 at 3:53 AM, Kristian Nielsen
 wrote:
> Greg Stark  writes:
>
>> Just so everyone is on the same page: Even once we have index-only
>> scans they won't be anywhere near as useful with Postgres as they are
>> with Oracle and other databases. At least not unless we find a
>> solution for a different problem -- our inability to scan btree
>> indexes sequentially.
>
> True, however I would not be too pessimistic about this.
>
> For OLTP like typical web applications, index-only scans are a killer feature
> for being able to read N rows with 1 I/O (for some small N), when the data no
> longer fits in the buffer pool, or after cold start.
>
> Eg. read all account settings for one user account, or subjects of all
> messages, etc. A composite index with user_id in the first column allows to
> fetch all N rows from one (or a few) disk pages with an index-only scan, as
> opposed to N disk pages.
>
> So for this, index-only scans can make a _big_ difference, even without
> support for Oracle-type index fast-full-scans.

I am not trying to start a MySQL vs PostgreSQL thread. I lurk on these
lists to learn more about PostgreSQL.

I know that PostgreSQL is good at OLTP and complex query processing
and that index fast-full scans can make a big difference for large
joins, but the workload that I care about is OLTP-only. PostgreSQL
will be more efficient on that workload with support for index-only
scans. The majority of the load is simple queries -- joins that touch
a few rows, short index range scans and index point lookups. With
covering indexes and InnoDB the queries do a few disk reads in the
worst case. Without covering indexes the queries do extra disk IO in
the worst case (buffer pool read miss) and this is much worse for the
range scans. I assume that the behavior with covering indexes but
without index-only scans is similar to not having index-only scans.

I collect 95th percentile response times for my popular queries and
they are much improved with the use of covering indexes.

-- 
Mark Callaghan
mdcal...@gmail.com
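
For concreteness, the covering-index pattern discussed above might look like
this (a sketch; table and column names are illustrative):

CREATE INDEX messages_user_subject_idx ON messages (user_id, subject);
-- with index-only scans, this could be answered from the index alone,
-- reading one or a few index pages instead of N heap pages
SELECT subject FROM messages WHERE user_id = 42;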



[HACKERS] Streaming replication document

2010-12-03 Thread Tatsuo Ishii
Hi,

There is a description about streaming replication in the doc:
--
25.2.5. Streaming Replication

:
:
If you set up a WAL archive that's accessible from the
standby, wal_keep_segments is not required as the standby can always
use the archive to catch up.
--

I think this description is somewhat inadequate. Since recovery using
the WAL archive is file-based, it may cause a long replication delay. I
think that even if a WAL archive is set up, we should set wal_keep_segments
to a proper value, not 0. Recovery from the WAL archive should be the last
resort, shouldn't it?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



[HACKERS] Review: Extensions Patch

2010-12-03 Thread David E. Wheeler
Extensions Patch v15 Review
===========================

Submission review
-----------------

* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

Yes.

* Does it include reasonable tests, necessary doc patches, etc?

`make check` passes.
`make installcheck` fails (regression.diffs attached).
`make -C contrib installcheck` passes

I see tests for pg_execute_sql_string() and replace(), but absolutely nothing 
else. Such an important core feature really should have a pretty comprehensive 
suite of tests exercising all the functionality. Relying on the contrib 
extension tests won't do it, as they exercise only a subset of the 
functionality (e.g., no custom_variable_classes), and then only as a 
side-effect of their actual purpose.

Details on docs below.

Usability review
----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that? 

Yes.

* Do we want that? 

OH YES!

* Do we already have it? 

Nope.

* Does it follow SQL spec, or the community-agreed behavior? 

It's an implementation of community-agreed behavior, as hashed out on the 
mailing list over the years.

* Does it include pg_dump support (if applicable)?

Yes, it does.

* Are there dangers? 

Yes. Much of the execution is superuser-only, so we need to make sure that such 
is actually the case (unit tests would help with that).

* Have all the bases been covered?

Mostly, yes. The lack of tests is the single biggest drawback to this patch.

Feature test
------------

Apply the patch, compile it and test:

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I had some trouble getting a third-party extension to work. See the end of this 
document for details.

* Are there any assertion failures or crashes?

No.

Performance review
------------------

* Does the patch slow down simple tests? 

Not that I've noticed.

* If it claims to improve performance, does it?

N/A.

* Does it slow down other things?

Not that I've noticed.

Coding review
-------------

Read the changes to the code in detail and consider:

* Does it follow the project 
[http://developer.postgresql.org/pgdocs/postgres/source.html coding 
guidelines]? 

I'm not a C expert, but it looks like it matches quite closely with all the 
other code. It's very neat and well-commented.

* Are there portability issues? 
* Will it work on Windows/BSD etc? 

I can't comment on these.

* Are the comments sufficient and accurate?

Yes.

* Does it do what it says, correctly?

I can't comment on this by looking at the code; see detailed review from a 
user's perspective below.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other 
features/modules? 
* Are there interdependencies that can cause problems?

Can't really comment on this.

Notes on the documentation
--------------------------

The pg_extension catalog docs are a little thin, but probably sufficient. They 
appear to be duplicated, though, with the entries for catalog-pg-extension and 
view-pg-extensions looking identical. One is apparently a list of installed 
extensions, and the other an SRF that lists installed and available extensions. 
But I find the duplication a bit confusing. (Might be easier if I weren't 
reading SGML, though.)

The extend-extension docs

I don't understand this paragraph at all:


 The way to put together a coherent set of custom SQL objects
 is to create an extension. There are two sides of the
 extension, the Operating System side contains several files and
 the SQL one uses them.


This paragraph isn't very clear, either:

   
The control file can also contain
a custom_variable_classes key followed by any
number of custom settings,
named class.varname.
The custom_variable_classes value takes effect
at CREATE EXTENSION time, and is persistent. The
custom settings are set automatically too, but are not persistent.

Examples might help.
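
For instance, an illustrative sketch (hypothetical extension and variable
names; assuming a control file that declares custom_variable_classes = 'myext'):

-- after CREATE EXTENSION, the custom class is registered, so custom
-- settings in that class can be set and read:
SET myext.debug_level = '2';
SHOW myext.debug_level;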


Control file keys:

* comment -- Should mention that it's used for COMMENT ON EXTENSION, no?
* version -- If it's free-form, how will you do dependency-checking? Or will 
you?
* script
* encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
* custom_variable_classes

I don't understand this paragraph:

 
  The admin
  function pg_extension_flag_dump
  can be used to revert the default pg_dump policy
  about objects that belong to an extension and force the flagged objects
  to be part of the backups.
 

What's a pg_dump policy?

* There appears to be an irrelevant change to the docs for replace().
* For the pg_read_binary_file() docs, should there not be some sort of note 
about permissions on the file to be read?

With pg_execute_sql_string() and friends, w

Re: [HACKERS] pg_execute_from_file review

2010-12-03 Thread Itagaki Takahiro
On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine  wrote:
>    Schema   |  Name   | Result data type | Argument data types |  Type
> ------------+---------+------------------+---------------------+--------
>  pg_catalog | replace | text             | text, VARIADIC text | normal
>  pg_catalog | replace | text             | text, text, text    | normal
>
> My understanding is that the variadic form shadows the other one in a
> way that it's now impossible to call it from SQL level. That's the
> reason why I did the (text, text, text, VARIADIC text) version before,
> but is it true?

The VARIADIC version doesn't hide the 3-argument version. I tested the
behavior with printf debugging. The planner seems to pick the non-VARIADIC
version as the best match when 3 arguments are passed.

> Also, is it worthwhile to keep the non VARIADIC
> version exposed at SQL level?

Yes, because the non-VARIADIC version is much faster than the
VARIADIC one. Of course we could optimize the performance of
replace_text_variadic(), but I think a VARIADIC argument is inherently
slow because it packs the arguments into an array.

-- 
Itagaki Takahiro
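
For concreteness, the overload resolution described above might look like this
(a sketch; the VARIADIC form is part of the patch under review, and its
pairwise replacement semantics are assumed here):

-- three arguments resolve to the built-in replace(text, text, text)
SELECT replace('abcabc', 'b', 'X');
-- five arguments can only match replace(text, VARIADIC text), which would
-- presumably apply the from/to pairs in order (assumed patch semantics)
SELECT replace('abcabc', 'b', 'X', 'c', 'Y');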



Re: [HACKERS] ERROR: could not identify an equality operator for type box

2010-12-03 Thread Tom Lane
Heikki Linnakangas  writes:
> On 03.12.2010 23:53, Oleg Bartunov wrote:
>> create table qq (b box);
>> CREATE TABLE
>> select count(*), b from qq group by b;
>> ERROR: could not identify an equality operator for type box
>> What does it mean?

> GROUP BY requires b-tree sort operators. Although there is a '=' 
> operator for box, there is no b-tree opclass.

A hash opclass would do too, as of recent releases ... but box hasn't
got one of those either.

There are some semantic issues involved here, notably that box_eq is
actually equality-of-areas, and that both it and box_same use fuzzy
equality which does not work well with either hash or sort-based
semantics.  So you'd need to look at redefining the operator behavior
before you could get far.

regards, tom lane



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Josh Berkus
On 12/3/10 2:16 PM, Robert Treat wrote:
> Uh, the syntax I posted was based on this currently valid syntax: 
> 
> ALTER TABLE table_name ADD PRIMARY KEY (column_list); 
> 
> The constraint bit is optional, which is why I left it out, but I
> presume it would be optional with the new syntax as well... Also, I'm
> not wedded to the idea of keeping the column list, but if you are
> arguing to make it super consistent, then I think you need to include it.   

No, I'm not, in that case.  I'm suggesting we omit the column list and
skip directly to USING.

Why no column list?
1. The extra typing will annoy our users.
2. The column list provides opportunities for users to fail to be
consistent with the index and get errors.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Ross J. Reedstrom
On Fri, Dec 03, 2010 at 05:16:04PM -0500, Robert Treat wrote:
> On Fri, Dec 3, 2010 at 4:41 PM, Josh Berkus  wrote:
> 
> > However, I don't see why we need (column_list). Surely the index has a
> > column list already?
> >
> > ALTER TABLE table_name ADD CONSTRAINT pk_name PRIMARY KEY USING index_name
> >
> > ... seems like the syntax most consistent with the existing commands.
> > Anything else would be confusingly inconsistent with the way you add a
> > brand-new PK.
> >
> >
> Uh, the syntax I posted was based on this currently valid syntax:
> 
> ALTER TABLE table_name ADD PRIMARY KEY (column_list);
> 
> The constraint bit is optional, which is why I left it out, but I presume it
> would be optional with the new syntax as well... Also, I'm not wedded to the
> idea of keeping the column list, but if you are arguing to make it super
> consistent, then I think you need to include it.

If you consider that an index basically is, in some sense, a pre-canned
column list, then:

ALTER TABLE table_name ADD PRIMARY KEY (column_list);
ALTER TABLE table_name ADD PRIMARY KEY USING index_name;

are parallel constructions. And it avoids the error case of the user
providing a column list that doesn't match the index.

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientist    phone: 713-348-6166
Connexions  http://cnx.org                      fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE






Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Robert Treat
On Fri, Dec 3, 2010 at 4:41 PM, Josh Berkus  wrote:

> On 12/3/10 12:27 PM, Robert Haas wrote:
> > On Fri, Dec 3, 2010 at 2:56 PM, r t  wrote:
> >> What exactly was the objection to the following -->
> >> ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name;
> >> Is the objection that you might have been trying to specify a constraint
> >> named "using" ? I'm willing to make that option more difficult. :-)
> >
> > I think it's that someone might expect the word after USING to be the
> > name of an index AM.
>
> Seems unlikely to cause confusion to me.
>
>
+1. And were we ever to support that, I think that would be the case for
WITH (storage_parameter)-style syntax, where you would specify
access_method=hash (or whatever). Although, let's not debate that syntax
right now :-)


> However, I don't see why we need (column_list). Surely the index has a
> column list already?
>
> ALTER TABLE table_name ADD CONSTRAINT pk_name PRIMARY KEY USING index_name
>
> ... seems like the syntax most consistent with the existing commands.
> Anything else would be confusingly inconsistent with the way you add a
> brand-new PK.
>
>
Uh, the syntax I posted was based on this currently valid syntax:

ALTER TABLE table_name ADD PRIMARY KEY (column_list);

The constraint bit is optional, which is why I left it out, but I presume it
would be optional with the new syntax as well... Also, I'm not wedded to the
idea of keeping the column list, but if you are arguing to make it super
consistent, then I think you need to include it.

Robert Treat
play: http://www.xzilla.net
work: http://www.omniti.com/is/hiring


Re: [HACKERS] ERROR: could not identify an equality operator for type box

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 23:53, Oleg Bartunov wrote:

create table qq (b box);
CREATE TABLE
select count(*), b from qq group by b;
ERROR: could not identify an equality operator for type box
LINE 1: select count(*), b from qq group by b;
                                            ^

but following select works fine

select ' (43.6038,48.8664536),(1.3620777,1.44327)'::box = ' (43.6038,48.8664536),(1.3620777,1.44327)'::box;
 ?column? 
----------
 t
(1 row)

What does it mean?


GROUP BY requires b-tree sort operators. Although there is a '=' 
operator for box, there is no b-tree opclass.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] ERROR: could not identify an equality operator for type box

2010-12-03 Thread Oleg Bartunov
Ok, casting to ::text solves the problem, but I still think we need to
fix it the right way.
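
A minimal sketch of that workaround (illustrative only; note that the text
form is compared exactly, whereas box's '=' operator compares areas with
fuzzy tolerance):

select count(*), b::text from qq group by b::text;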


Oleg
On Sat, 4 Dec 2010, Oleg Bartunov wrote:


Hi there,

create table qq (b box);
CREATE TABLE
select count(*), b from qq group by b;
ERROR:  could not identify an equality operator for type box
LINE 1: select count(*), b from qq group by b;
                                            ^

but following select works fine

select ' (43.6038,48.8664536),(1.3620777,1.44327)'::box = ' (43.6038,48.8664536),(1.3620777,1.44327)'::box;
 ?column? 
----------
 t
(1 row)

What does it mean?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[HACKERS] ERROR: could not identify an equality operator for type box

2010-12-03 Thread Oleg Bartunov

Hi there,

create table qq (b box);
CREATE TABLE
select count(*), b from qq group by b;
ERROR:  could not identify an equality operator for type box
LINE 1: select count(*), b from qq group by b;
                                            ^

but following select works fine

select ' (43.6038,48.8664536),(1.3620777,1.44327)'::box = ' (43.6038,48.8664536),(1.3620777,1.44327)'::box;
 ?column? 
----------
 t
(1 row)

What does it mean?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Josh Berkus
On 12/3/10 12:27 PM, Robert Haas wrote:
> On Fri, Dec 3, 2010 at 2:56 PM, r t  wrote:
>> What exactly was the objection to the following -->
>> ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name;
>> Is the objection that you might have been trying to specify a constraint
>> named "using" ? I'm willing to make that option more difficult. :-)
> 
> I think it's that someone might expect the word after USING to be the
> name of an index AM.

Seems unlikely to cause confusion to me.

However, I don't see why we need (column_list). Surely the index has a
column list already?

ALTER TABLE table_name ADD CONSTRAINT pk_name PRIMARY KEY USING index_name

... seems like the syntax most consistent with the existing commands.
Anything else would be confusingly inconsistent with the way you add a
brand-new PK.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Robert Haas
On Fri, Dec 3, 2010 at 2:56 PM, r t  wrote:
> What exactly was the objection to the following -->
> ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name;
> Is the objection that you might have been trying to specify a constraint
> named "using" ? I'm willing to make that option more difficult. :-)

I think it's that someone might expect the word after USING to be the
name of an index AM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

2010-12-03 Thread Josh Berkus
All,

So, this week I've had my hands on a medium-high-end test system where I
could test various wal_sync_methods.  This is a 24-core Intel Xeon
machine with 72GB of ram, and 8 internal 10K SAS disks attached to a
raid controller with 512MB BBU write cache.  2 of the disks are in a
RAID1, which supports both an Ext4 partition and an XFS partition.  The
remaining disks are in a RAID10 which only supports a single pgdata
partition.

This is running on RHEL6, Linux Kernel: 2.6.32-71.el6.x86_64

I think this kind of a system much better represents our users who are
performance-conscious than testing on people's laptops or on VMs does.

I modified test_fsync in two ways to run this; first, to make it support
O_DIRECT, and second to make it run in the *current* directory.  I think
the second change should be permanent; I imagine that a lot of people
who are running test_fsync are not aware that they're actually testing
the performance of /var/tmp, not whatever FS mount they wanted to test.

Here's the results.  I think you'll agree that, at least on Linux, the
benefits of o_sync and o_dsync as defaults would be highly questionable.
 Particularly, it seems that if O_DIRECT support is absent, fdatasync is
across-the-board faster:

=

test_fsync with directIO, on 2 drives, XFS tuned:

Loops = 1

Simple write:
8k write  198629.457/second

Compare file sync methods using one write:
open_datasync 8k write    14798.263/second
open_sync 8k write        14316.864/second
8k write, fdatasync   12198.871/second
8k write, fsync   12371.843/second

Compare file sync methods using two writes:
2 open_datasync 8k writes  7362.805/second
2 open_sync 8k writes  7156.685/second
8k write, 8k write, fdatasync 10613.525/second
8k write, 8k write, fsync 10597.396/second

Compare open_sync with different sizes:
open_sync 16k write   13631.816/second
2 open_sync 8k writes  7645.038/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close    11427.096/second
8k write, close, fsync    11321.220/second


test_fsync with directIO, on 6 drives RAID10, XFS tuned:

Loops = 1

Simple write:
8k write  196494.537/second

Compare file sync methods using one write:
open_datasync 8k write    14909.974/second
open_sync 8k write        14559.326/second
8k write, fdatasync   11046.025/second
8k write, fsync   11046.916/second

Compare file sync methods using two writes:
2 open_datasync 8k writes  7349.223/second
2 open_sync 8k writes  7667.395/second
8k write, 8k write, fdatasync  9560.495/second
8k write, 8k write, fsync  9557.287/second

Compare open_sync with different sizes:
open_sync 16k write   12060.049/second
2 open_sync 8k writes  7650.746/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 9377.107/second
8k write, close, fsync 9251.233/second



test_fsync without directIO on RAID1, Ext4, data=journal:

Loops = 1

Simple write:
8k write  150514.005/second

Compare file sync methods using one write:
open_datasync 8k write 4012.070/second
open_sync 8k write 5476.898/second
8k write, fdatasync    5512.649/second
8k write, fsync        5803.814/second

Compare file sync methods using two writes:
2 open_datasync 8k writes  2910.401/second
2 open_sync 8k writes  2817.377/second
8k write, 8k write, fdatasync  5041.608/second
8k write, 8k write, fsync  5155.248/second

Compare open_sync with different sizes:
open_sync 16k write    4895.956/second
2 open_sync 8k writes  2720.875/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 4724.052/second
8k write, close, fsync 4694.776/second


test_fsync without directIO on RAID1, XFS, tuned:

Loops = 1

Simple write:
8k write  199796.208/second

Compare file sync methods using one write:
open_datasync 8k write    12553.525/second
open_sync 8k write        12535.978/second
8k write, fdatasync   12268.298/second
8k write, fsync   12305.875/second

Compare file sync methods using two writes:
2 open_datasync 8k writes  6323.835/second
2 open_sy

Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 21:58, Alvaro Herrera wrote:

Excerpts from Heikki Linnakangas's message of Fri Dec 03 16:45:59 -0300 2010:


ALTER TABLE table_name SET PRIMARY KEY USING INDEX index_name. Quite
verbose, but imho USING makes it much more clear that it's an existing
index.


I was going to post the same thing (well except I was still thinking in
ADD PRIMARY KEY rather than SET PRIMARY KEY).  I think SET is better
than ADD in that it is a bit different from the syntax that makes it
create a new index.  On the other hand, it could also be pointlessly
annoying.


I think I'd prefer ADD too. I didn't pay attention to that when I posted.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread r t
On Fri, Dec 3, 2010 at 2:43 PM, Robert Haas  wrote:

> On Fri, Dec 3, 2010 at 2:23 PM, Peter Eisentraut  wrote:
> > On sön, 2010-11-28 at 20:40 -0500, Robert Haas wrote:
> >> On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro
> >>  wrote:
> >> > On Fri, Nov 26, 2010 at 05:58, Steve Singer 
> wrote:
> >> >> The attached version of the patch gets your regression tests to pass.
> >> >> I'm going to mark this as ready for a committer.
> >> >
> >> > I think we need more discussions about the syntax:
> >> >  ALTER TABLE table_name ADD PRIMARY KEY (...) WITH
> (INDEX='index_name')
> >>
> >> Why not:
> >>
> >> ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name;
> >
> > I would think that that determines that name of the index that the
> > command creates.  It does not convey that an existing index is to be
> > used.
>
>
+1 on this being confusing


> Well, that'll become clear pretty quickly if you try to use it that
> way, but I'm certainly open to other ideas.
>
> Random thoughts:
>
> ALTER TABLE table_name SET PRIMARY KEY INDEX index_name
> ALTER INDEX index_name PRIMARY KEY
>
> Other suggestions?


What exactly was the objection to the following -->

ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name;

Is the objection that you might have been trying to specify a constraint
named "using" ? I'm willing to make that option more difficult. :-)

Robert Treat
play: http://www.xzilla.net
work: http://www.omniti.com/is/hiring


Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 21:55, Josh Berkus wrote:

All,

So, I've been doing some reading about this issue, and I think
regardless of what other changes we make we should never enable O_DIRECT
automatically on Linux, and it was a mistake for us to do so in the
first place.

First, in the Linux docs for open():


The quote on that man page is hilarious:

"The thing that has always disturbed me about O_DIRECT  is  that
 the whole interface is just stupid, and was probably designed by
 a deranged monkey on some serious mind-controlling  substances."
  -- Linus

I agree we should not enable it by default. If it's faster in some 
circumstances, the admin is free to do the research and enable it, but 
defaults need to be safe above all.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of Fri Dec 03 16:45:59 -0300 2010:

> ALTER TABLE table_name SET PRIMARY KEY USING INDEX index_name. Quite 
> verbose, but imho USING makes it much more clear that it's an existing 
> index.

I was going to post the same thing (well except I was still thinking in
ADD PRIMARY KEY rather than SET PRIMARY KEY).  I think SET is better
than ADD in that it is a bit different from the syntax that makes it
create a new index.  On the other hand, it could also be pointlessly
annoying.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-03 Thread Josh Berkus
All,

So, I've been doing some reading about this issue, and I think
regardless of what other changes we make we should never enable O_DIRECT
automatically on Linux, and it was a mistake for us to do so in the
first place.

First, in the Linux docs for open():

=

In summary, O_DIRECT is a potentially powerful tool that should be used
with caution.  It is recommended that applications treat use of O_DIRECT
as a performance option which is disabled by default.

=

Second, Linus has a quote about O_DIRECT that I think should serve as an
indicator to us that directIO will never be beneficial-by-default on
Linux, and might even someday be desupported:



The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.

This is your brain: O
This is your brain on O_DIRECT: .

Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.

Linus
=



-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 21:43, Robert Haas wrote:

On Fri, Dec 3, 2010 at 2:23 PM, Peter Eisentraut  wrote:

On sön, 2010-11-28 at 20:40 -0500, Robert Haas wrote:

On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro
  wrote:

On Fri, Nov 26, 2010 at 05:58, Steve Singer  wrote:

The attached version of the patch gets your regression tests to pass.
I'm going to mark this as ready for a committer.


I think we need more discussions about the syntax:
  ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name')


Why not:

ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name;


I would think that that determines that name of the index that the
command creates.  It does not convey that an existing index is to be
used.


Well, that'll become clear pretty quickly if you try to use it that
way, but I'm certainly open to other ideas.

Random thoughts:

ALTER TABLE table_name SET PRIMARY KEY INDEX index_name
ALTER INDEX index_name PRIMARY KEY


ALTER TABLE table_name SET PRIMARY KEY USING INDEX index_name. Quite 
verbose, but imho USING makes it much more clear that it's an existing 
index.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] On-the-fly index tuple deletion vs. hot_standby

2010-12-03 Thread Heikki Linnakangas

On 29.11.2010 08:10, Noah Misch wrote:

I have a hot_standby system and use it to bear the load of various reporting
queries that take 15-60 minutes each.  In an effort to avoid long pauses in
recovery, I set a vacuum_defer_cleanup_age constituting roughly three hours of
the master's transactions.  Even so, I kept seeing recovery pause for the
duration of a long-running query.  In each case, the culprit record was an
XLOG_BTREE_DELETE arising from on-the-fly deletion of an index tuple.  The
attached test script demonstrates the behavior (on HEAD); the index tuple
reclamation conflicts with a concurrent "SELECT pg_sleep(600)" on the standby.

Since this inserting transaction aborts, HeapTupleSatisfiesVacuum reports
HEAPTUPLE_DEAD independent of vacuum_defer_cleanup_age.  We go ahead and remove
the index tuples.  On the standby, btree_xlog_delete_get_latestRemovedXid does
not regard the inserting-transaction outcome, so btree_redo proceeds to conflict
with snapshots having visibility over that transaction.  Could we correctly
improve this by teaching btree_xlog_delete_get_latestRemovedXid to ignore tuples
of aborted transactions and tuples inserted and deleted within one transaction?


Seems reasonable. HeapTupleHeaderAdvanceLatestRemovedXid() will need 
similar treatment. Actually, btree_xlog_delete_get_latestRemovedXid() 
could just call HeapTupleHeaderAdvanceLatestRemovedXid().


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Robert Haas
On Fri, Dec 3, 2010 at 2:23 PM, Peter Eisentraut  wrote:
> On sön, 2010-11-28 at 20:40 -0500, Robert Haas wrote:
>> On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro
>>  wrote:
>> > On Fri, Nov 26, 2010 at 05:58, Steve Singer  
>> > wrote:
>> >> The attached version of the patch gets your regression tests to pass.
>> >> I'm going to mark this as ready for a committer.
>> >
>> > I think we need more discussions about the syntax:
>> >  ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name')
>>
>> Why not:
>>
>> ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name;
>
> I would think that that determines that name of the index that the
> command creates.  It does not convey that an existing index is to be
> used.

Well, that'll become clear pretty quickly if you try to use it that
way, but I'm certainly open to other ideas.

Random thoughts:

ALTER TABLE table_name SET PRIMARY KEY INDEX index_name
ALTER INDEX index_name PRIMARY KEY

Other suggestions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch to add a primary key using an existing index

2010-12-03 Thread Peter Eisentraut
On sön, 2010-11-28 at 20:40 -0500, Robert Haas wrote:
> On Sun, Nov 28, 2010 at 8:06 PM, Itagaki Takahiro
>  wrote:
> > On Fri, Nov 26, 2010 at 05:58, Steve Singer  wrote:
> >> The attached version of the patch gets your regression tests to pass.
> >> I'm going to mark this as ready for a committer.
> >
> > I think we need more discussions about the syntax:
> >  ALTER TABLE table_name ADD PRIMARY KEY (...) WITH (INDEX='index_name')
> 
> Why not:
> 
> ALTER TABLE table_name ADD PRIMARY KEY (...) INDEX index_name;

I would think that that determines that name of the index that the
command creates.  It does not convey that an existing index is to be
used.




Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-03 Thread Jeff Davis
On Thu, 2010-12-02 at 19:00 -0500, Robert Haas wrote:
>  Untidy buffers would be treated as dirty by the background writer
> cleaning scan, but as clean by checkpoints and by backends doing
> emergency buffer cleaning to feed new allocations. 

Differentiating between a backend write and a bgwriter write sounds like
a good heuristic to me. Of course, only numbers can tell, but it sounds
promising.

> I then got to wondering whether we should even go a step further, and
> simply decree that a page with only hint bit updates is not dirty and
> won't be written, period.

Sounds reasonable.

Just to throw another idea out there, perhaps we could change the
behavior based on whether the page is already dirty or not. I haven't
thought this through, but it might be an interesting approach.

Regards,
Jeff Davis




Re: [HACKERS] Extensions

2010-12-03 Thread David E. Wheeler
On Dec 3, 2010, at 8:38 AM, Dimitri Fontaine wrote:

> David, and anyone feeling like reviewing or trying the patch, if you're
> not already crawling into the v14 patch, you could as well begin with
> this cleaner version — no behavior changes, some cleaner code, make
> check passes, no bitrot against master.

Yes, I'm going to start reviewing it now. Thanks.

David


Re: [HACKERS] disk caching for writing log

2010-12-03 Thread Stefan Kaltenbrunner

On 12/03/2010 06:43 PM, Heikki Linnakangas wrote:

On 03.12.2010 13:49, flyusa2010 fly wrote:

When writing the log, the DBMS should synchronously flush it to disk. I'm
wondering if it is possible that the log is still in the disk cache while
control is returned to the DBMS, so the DBMS thinks the log is persistent
on disk. In this case, if the disk fails, the DBMS's log write goes wrong,
because the log is not persistent but the DBMS considers it persistent!


I have no idea what you mean. The method we use to flush the WAL to disk
should not be vulnerable to such failures; we wait for fsync() or
fdatasync() to return before we assume the logs are safely on disk. If
you can elaborate on what you mean by "control is returned to the DBMS",
maybe someone can explain in more detail.


I think he is referring to the plain old "the disk/OS is lying about 
whether the data really made it to stable storage" issue (especially with 
the huge local caches on modern disks): if you have such a disk and/or 
an OS with broken barrier support, you are doomed.



Stefan



Re: [HACKERS] disk caching for writing log

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 13:49, flyusa2010 fly wrote:

When writing the log, the DBMS should synchronously flush it to disk. I'm
wondering if it is possible that the log is still in the disk cache while
control is returned to the DBMS, so the DBMS thinks the log is persistent
on disk. In this case, if the disk fails, the DBMS's log write goes wrong,
because the log is not persistent but the DBMS considers it persistent!


I have no idea what you mean. The method we use to flush the WAL to disk
should not be vulnerable to such failures; we wait for fsync() or
fdatasync() to return before we assume the logs are safely on disk. If
you can elaborate on what you mean by "control is returned to the DBMS",
maybe someone can explain in more detail.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Andrew Dunstan



On 12/03/2010 12:17 PM, Alvaro Herrera wrote:

Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:


I know the use cases are limited, but I think it's still useful on its own.

I don't understand what's so difficult about starting with the snapshot
cloning patch.  AFAIR it's already been written anyway, no?



Yeah. If we can do it then this whole argument becomes moot. Like you I 
don't see why we can't.


cheers

andrew



Re: [HACKERS] Idle git question: how come so many "objects"?

2010-12-03 Thread Florian Weimer
* Tom Lane:

> $ git push   
> Counting objects: 172, done.
> Compressing objects: 100% (89/89), done.
> Writing objects: 100% (89/89), 17.07 KiB, done.
> Total 89 (delta 80), reused 0 (delta 0)
> To ssh://g...@gitmaster.postgresql.org/postgresql.git
>    35a3def..8a6eb2e  REL8_1_STABLE -> REL8_1_STABLE
>    cfb6ac6..b0e2092  REL8_2_STABLE -> REL8_2_STABLE
>    301a822..0d45e8c  REL8_3_STABLE -> REL8_3_STABLE
>    61f8618..6bd3753  REL8_4_STABLE -> REL8_4_STABLE
>    09425f8..0a85bb2  REL9_0_STABLE -> REL9_0_STABLE
>    c0b5fac..225f0aa  master -> master

> How does it get to 172?

These are the number of objects "git push" (actually, git-send-pack, I
think) needs to look at more closely, AFAIUI.  It's a pretty arbitrary
number.  You see it sometimes during pull, too.

> And then where do the 89 and 80 numbers come from?

89 is the number of objects which need to be transmitted.  Of those,
80 were compressed by diffing them to some other object (which might,
in turn, be a diff).

-- 
Florian Weimer
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



[HACKERS] disk caching for writing log

2010-12-03 Thread flyusa2010 fly
When writing the log, the DBMS should synchronously flush it to disk. I'm
wondering if it is possible that the log is still in the disk cache while
control is returned to the DBMS, so the DBMS thinks the log is persistent
on disk. In this case, if the disk fails, the DBMS's log write goes wrong,
because the log is not persistent but the DBMS considers it persistent!

Am I correct?


Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Sergio Lifschitz
Indeed, hypothetical indexes are good for checking potentially good 
configurations without burdening the whole system with actual index 
creation. Please observe that we've added an "explain hypothetical" 
command, which will include plans that consider hypothetical indexes! 
We'll try to add a simple case study to help those wondering about 
this project.


Sergio

On 3/12/2010 08:06, Richard Huxton wrote:

On 03/12/10 08:14, Jeroen Vermeulen wrote:

On 2010-12-02 00:48, Ana Carolina Brito de Almeida wrote:


We would like to inform you all that our extension to PostgreSQL, that
includes hypothetical indexes (and soon index self-tuning), is
available through a sourgeforge project.



Looking at the sourceforge page, I'm left with one burning question:
what are they for?


I believe they're for performance testing. Add hypothetical index 
(takes very little time). Check estimated costs with EXPLAIN. If good, 
add real index (takes lots of time).


Of course, they're also good for indexing hypothetical data ;-)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:

> I know the use cases are limited, but I think it's still useful on its own.

I don't understand what's so difficult about starting with the snapshot
cloning patch.  AFAIR it's already been written anyway, no?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Robert Haas
On Fri, Dec 3, 2010 at 11:40 AM, Andrew Dunstan  wrote:
>
>
> On 12/03/2010 11:23 AM, Robert Haas wrote:
>>
>> On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan
>>  wrote:
>>>
>>> I think Josh Berkus' comments in the thread you mentioned are correct:
>>>
 Actually, I'd say that there's a broad set of cases of people who want
 to do a parallel pg_dump while their system is active.  Parallel pg_dump
 on a stopped system will help some people (for migration, particularly)
 but parallel pg_dump with snapshot cloning will help a lot more people.
>>
>> But you failed to quote the rest of what he said:
>>
>>> So: if parallel dump in single-user mode is what you can get done, then
>>> do it.  We can always improve it later, and we have to start somewhere.
>>> But we will eventually need parallel pg_dump on active systems, and
>>> that should remain on the TODO list.
>
> Right, and the reason I don't think that's right is that it seems to me like
> a serious potential footgun.
>
> But in any case, the reason I quoted Josh was in answer to a different
> point, namely Tom's statement about the limited potential uses.

I know the use cases are limited, but I think it's still useful on its own.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Andrew Dunstan



On 12/03/2010 11:23 AM, Robert Haas wrote:

On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan  wrote:

I think Josh Berkus' comments in the thread you mentioned are correct:


Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active.  Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:


So: if parallel dump in single-user mode is what you can get done, then
do it.  We can always improve it later, and we have to start somewhere.
But we will eventually need parallel pg_dump on active systems, and
that should remain on the TODO list.


Right, and the reason I don't think that's right is that it seems to me 
like a serious potential footgun.


But in any case, the reason I quoted Josh was in answer to a different 
point, namely Tom's statement about the limited potential uses.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improving foreign key locks

2010-12-03 Thread Alvaro Herrera
Of course, there is a much more serious problem with the whole idea.  I
have worked through most of the necessary changes and I'm now down to
changing heap_update to comply with the new locking protocol.

The problem I'm now facing is that we need to know the set of updated
columns pretty early -- just after calling HeapTupleSatisfiesUpdate, in
fact, so that we can change a BeingUpdated result into MayBeUpdated if
the set of columns is right.  However, we don't know the set of updated
columns until much later, when the function has already undergone a lot
of other work and checked other possible error conditions.

Short of restructuring the function so that we can obtain the set of
updated columns early (which I'm not fond of doing, and which may not
even be possible), I'm thinking that we should be "optimistic" about the
set of updated columns and recheck it later.  The problem with this
idea, of course, is that if the test turns out to fail, we may have
already done a lot of work (such as TOAST changes) that now needs to be
discarded.  In other words, the optimization is pessimizing some cases
:-(

I'm not seeing any other way around this ATM.
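
To illustrate the shape of the tradeoff, here is a toy sketch (none of
these names exist in the backend; it only mimics the control flow):

#include <stdbool.h>
#include <stdio.h>

static int  wasted_work = 0;

static bool
optimistic_update(bool key_cols_actually_updated)
{
    /*
     * Early decision point: we would like to downgrade BeingUpdated to
     * MayBeUpdated only when no key columns change, but we cannot know
     * that yet, so assume the best and proceed.
     */
    int     work_done = 1;      /* stand-in for TOASTing etc. */

    /* Late recheck, once the updated column set is finally known. */
    if (key_cols_actually_updated)
    {
        wasted_work += work_done;   /* all of it must be discarded */
        return false;               /* fall back to the blocking path */
    }
    return true;
}

int
main(void)
{
    printf("no key columns changed: %d\n", optimistic_update(false));
    printf("key columns changed:    %d\n", optimistic_update(true));
    printf("wasted work units:      %d\n", wasted_work);
    return 0;
}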

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Robert Haas
On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan  wrote:
> I think Josh Berkus' comments in the thread you mentioned are correct:
>
>> Actually, I'd say that there's a broad set of cases of people who want
>> to do a parallel pg_dump while their system is active.  Parallel pg_dump
>> on a stopped system will help some people (for migration, particularly)
>> but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:

> So: if parallel dump in single-user mode is what you can get done, then
> do it.  We can always improve it later, and we have to start somewhere.
> But we will eventually need parallel pg_dump on active systems, and
> that should remain on the TODO list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-03 Thread Robert Haas
On Thu, Dec 2, 2010 at 7:00 PM, Robert Haas  wrote:
> But
> maybe we could ameliorate that problem by freezing more aggressively.

I realized as I was falling asleep last night that any sort of more
aggressive freezing is going to be a huge bummer for Hot Standby
users, for which freezing generates a conflict.

Argh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] column-level update privs + lock table

2010-12-03 Thread Robert Haas
On Tue, Nov 30, 2010 at 1:48 PM, Tom Lane  wrote:
> On the whole I agree with Robert --- let's just adjust the
> documentation, and not enlarge privileges in a way we might regret
> later.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take three

2010-12-03 Thread Florian Weimer
* Robert Haas:

> Those hint bit tests are a single machine instruction.  It's tough
> to beat that.  It's tough to get within two orders of magnitude.
> I'd like to, but I don't see how.

For some scans, it might be possible to hoist the checks out of inner
loops.  (At least in principle, I'm not sure how much that would
interfere with the executor architecture.)
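
As a toy illustration of the kind of hoisting I mean (the structs are
invented; this has nothing to do with the real executor or page layout):
when a page-level flag already says every tuple on the page is visible,
the per-tuple test disappears from the inner loop entirely.

#include <stdbool.h>
#include <stdio.h>

#define TUPLES_PER_PAGE 4

struct tuple
{
    bool    xmin_committed;     /* stand-in for the hint bit */
    int     value;
};

struct page
{
    bool    all_visible;        /* stand-in for a page-level flag */
    struct tuple tuples[TUPLES_PER_PAGE];
};

static int
scan_page(const struct page *pg)
{
    int     sum = 0;

    if (pg->all_visible)
    {
        /* Hoisted check: no per-tuple visibility test in this loop. */
        for (int i = 0; i < TUPLES_PER_PAGE; i++)
            sum += pg->tuples[i].value;
    }
    else
    {
        for (int i = 0; i < TUPLES_PER_PAGE; i++)
            if (pg->tuples[i].xmin_committed)
                sum += pg->tuples[i].value;
    }
    return sum;
}

int
main(void)
{
    struct page pg = {true, {{true, 1}, {true, 2}, {true, 3}, {true, 4}}};

    printf("%d\n", scan_page(&pg));     /* prints 10 */
    return 0;
}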

-- 
Florian Weimer
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Ana Carolina Brito de Almeida
Jeroen,

We added a simple case study (SourceForge page):
http://sourceforge.net/projects/hypotheticalind/files/TUTORIAL_8_4.pdf/download

Although this tutorial is for version 8.4, it also applies to other
versions.

Regards,
Ana Carolina

2010/12/3 Jeroen Vermeulen 

> On 2010-12-03 19:44, Sergio Lifschitz wrote:
>
>> Indeed, hypothetical indexes are good to check potentially good
>> configurations without harming the whole system with actual index
>> creation. Please observe that we've added an "explain hypothetical"
>> command, that will include plans considering hypothetical indexes! We'll
>> try to add a simple case study that would help those wondering about
>> this project.
>>
>
> That sounds very useful indeed!
>
>
> Jeroen
>


Re: [HACKERS] Extensions, this time with a patch

2010-12-03 Thread Dimitri Fontaine
Robert Haas  writes:
> I have committed the cfparser patch to which the above comments refer.

Excellent, thank you! On to merging that into the extension's main
branch; I will still wait until pg_execute_sql_file() is in before
producing extension.v15.patch, unless advised otherwise.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-12-03 Thread Robert Haas
On Thu, Nov 25, 2010 at 4:06 PM, Dimitri Fontaine
 wrote:
> Itagaki Takahiro  writes:
>> Thanks. I'll move the patch to Ready for Committer.
>
> Thanks!

I have committed the cfparser patch to which the above comments refer.
 One patch per thread makes things easier!

I adopted most of Itagaki Takahiro's suggestions, which were very helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Jeroen Vermeulen

On 2010-12-03 19:44, Sergio Lifschitz wrote:

Indeed, hypothetical indexes are good to check potentially good
configurations without harming the whole system with actual index
creation. Please observe that we've added an "explain hypothetical"
command, that will include plans considering hypothetical indexes! We'll
try to add a simple case study that would help those wondering about
this project.


That sounds very useful indeed!


Jeroen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Andrew Dunstan



On 12/02/2010 11:44 PM, Joachim Wieland wrote:

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane  wrote:

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement.  I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

I am not trying to bypass anything here :)  Regarding the locking
issue, I probably haven't done sufficient research; at least I managed
to miss the emails that mentioned it. Anyway, that seems to be solved
now, fortunately; I'm going to implement your idea over the weekend.

Regarding snapshot cloning and dump consistency, I brought this up
already several months ago and asked if the feature is considered
useful even without snapshot cloning. And actually it was you who
motivated me to work on it even without having snapshot consistency...

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php

In my patch, pg_dump emits a warning when called with -j; if you feel
better with an extra option
--i-know-that-i-have-no-synchronized-snapshots, that's fine with me :-)

In the end we provide a tool with limitations; it might not serve all
use cases, but there are use cases that would benefit a lot. I
personally think this is better than providing no tool at all...





I think Tom's statement there:


I think migration to a new server version (that's too incompatible for
PITR or pg_migrate migration) is really the only likely use case.


is just wrong. Say you have a site that's open 24/7. But there is a 
window of, say, 6 hours, each day, when it's almost but not quite quiet. 
You want to be able to make your disaster recovery dump within that 
window, and the low level of traffic means you can afford the degraded 
performance that might result from a parallel dump. Or say you have a 
hot standby machine from which you want to make the dump but want to set 
the max_standby_*_delay as low as possible. These are both cases where 
you might want parallel dump and yet you want dump consistency. I have a 
client currently considering the latter setup, and the timing tolerances 
are a little tricky. The times at which the system is in a state that we 
want dumped are fixed, and we want to be sure that the dump is finished 
by the next time such a window rolls around. (This is a system that in 
effect makes one giant state change at a time.) If we can't complete the 
dump in that time then there will be a delay introduced to the system's 
critical path. Parallel dump will be very useful in helping us avoid 
such a situation, but only if it's properly consistent.


I think Josh Berkus' comments in the thread you mentioned are correct:


Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active.  Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.




cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons

2010-12-03 Thread Florian Weimer
* Tom Lane:

> Yeah.  You certainly don't want to do the division sequence twice,
> and a log() call wouldn't be cheap either, and there don't seem to
> be many other alternatives.

What about a sequence of comparisons, and unrolling the loop?  That
could avoid the final division, too.  It might also be helpful to
break down the dependency chain for large input values.

The int8 version should probably work in 1e9 chunks and use a
zero-padding variant of the 32-bit code.
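
To make that concrete, one possible shape of the 32-bit comparison
ladder (a sketch of the approach only, not the proposed patch):

#include <stdint.h>
#include <stdio.h>

static int
decimal_length32(uint32_t v)
{
    /* Unrolled comparison ladder: no division, no log() call. */
    if (v >= 100000000)
        return v >= 1000000000 ? 10 : 9;
    if (v >= 10000)
    {
        if (v >= 1000000)
            return v >= 10000000 ? 8 : 7;
        return v >= 100000 ? 6 : 5;
    }
    if (v >= 100)
        return v >= 1000 ? 4 : 3;
    return v >= 10 ? 2 : 1;
}

int
main(void)
{
    printf("%d %d %d\n",
           decimal_length32(7),
           decimal_length32(65535),
           decimal_length32(4000000000u));  /* prints: 1 5 10 */
    return 0;
}

With the length known up front, the digits can then be emitted
right-to-left into a buffer of exactly the right size, with no reversal
pass at the end.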

-- 
Florian Weimer
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another proposal for table synonyms

2010-12-03 Thread Alexey Klyukin

On Dec 3, 2010, at 2:17 AM, Alvaro Herrera wrote:

> Excerpts from Robert Haas's message of Thu Dec 02 21:10:48 -0300 2010:
>> On Thu, Dec 2, 2010 at 3:43 PM, Alvaro Herrera
>>  wrote:
>>> Excerpts from Robert Haas's message of Thu Dec 02 17:27:01 -0300 2010:
> 
>>> Yeah, the Oracle system is a lot more complex than SQL Server's, but I
>>> was only talking about the latter, for which see here:
>>> 
>>> http://www.databasejournal.com/features/mssql/article.php/3635426/SYNONYM-in-SQL-Server-2005.htm
>> 
>> Well, that seems primarily designed to cut down on three and four part
>> names.  We don't have that problem anyway.
> 
> Right.  (My point here is that SQL Server is not a good guidance on what
> the synonym system should do.)
> 
 The list of objects for which they support synonyms is also
 interesting.
>>> 
>>> The bit that allows a synonym to reference another synonym seems like
>>> worth considering further (either reject them altogether, or have some
>>> way to deal with possible cycles).
>> 
>> It's pretty trivial to do cycle-detection at runtime.
> 
> No disagreement on that, but something needs to be decided.

I don't think it makes sense to allow synonyms for synonyms. It would make
the resolution code slower, and I don't see any situation where they make sense.
The original proposal didn't mention them, but limited the list of initially
supported objects to tables, views and sequences, implicitly
excluding synonyms that refer to other synonyms.
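
That said, if nested synonyms were ever allowed, the runtime check
Robert called trivial could look something like this toy (none of these
types or names exist in the backend; a real implementation would more
likely remember visited OIDs than rely on a depth cap):

#include <stdbool.h>
#include <stdio.h>

#define MAX_SYNONYM_DEPTH 32    /* arbitrary cap for the toy */

struct synonym
{
    const char *name;
    struct synonym *target;     /* NULL: resolves to a real object */
};

static struct synonym *
resolve_synonym(struct synonym *syn, bool *cycle)
{
    int     depth = 0;

    *cycle = false;
    while (syn->target != NULL)
    {
        if (++depth > MAX_SYNONYM_DEPTH)
        {
            *cycle = true;      /* chased too long: assume a cycle */
            return NULL;
        }
        syn = syn->target;
    }
    return syn;
}

int
main(void)
{
    struct synonym a = {"a", NULL};
    struct synonym b = {"b", &a};
    bool    cycle;
    struct synonym *res;

    res = resolve_synonym(&b, &cycle);
    printf("b -> %s (cycle=%d)\n", res ? res->name : "?", cycle);

    a.target = &b;              /* introduce a loop: a -> b -> a */
    res = resolve_synonym(&b, &cycle);
    printf("b -> %s (cycle=%d)\n", res ? res->name : "?", cycle);
    return 0;
}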

--
Alexey Klyukin  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-12-03 Thread Robert Haas
On Thu, Dec 2, 2010 at 10:53 PM, Alvaro Herrera
 wrote:
> Excerpts from Andy Colson's message of Fri Dec 03 00:37:17 -0300 2010:
>
>> Ok, forget the time thing.  Has nothing to do with it.  (Which everyone 
>> already assumed I imagine).
>>
>> Its truncate.
>>
>> Create unlogged table, fill it, truncate it, fill it again, restart pg, and 
>> the data will still be there.
>
> Hmm, presumably the table rewrite thing in truncate is not preserving
> the unlogged state (perhaps it's the swap-relfilenode business).

Oh ho.  Right.  Yeah, that case is not handled.  Woopsie.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-03 Thread Robert Haas
On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> Umm, nobody has attributed ridiculousness to anyone. Please don't put
>> words in my mouth. But I think this is a perfectly reasonable discussion
>> to have. Nobody gets to come along and get the features they want
>> without some sort of consensus, not me, not you, not Joachim, not Tom.
>
> In particular, this issue *has* been discussed before, and there was a
> consensus that preserving dump consistency was a requirement.  I don't
> think that Joachim gets to bypass that decision just by submitting a
> patch that ignores it.

Well, the discussion that Joachim linked to certainly doesn't have
any sort of clear consensus that that's the only way to go.  In fact,
it seems to be much closer to the opposite consensus.  Perhaps there
is some OTHER time that this has been discussed where "synchronization
is a hard requirement" was the consensus.  There's an old saw that the
nice thing about standards is there are so many to choose from, and
the same thing can certainly be said about -hackers discussions on any
particular topic.

I actually think that the phrase "this has been discussed before and
rejected" should be permanently removed from our list of excuses for
rejecting a patch.  Or if we must use that excuse, then I think a link
to the relevant discussion is a must, and the relevant discussion had
better reflect the fact that $TOPIC was in fact rejected.  It seems to
me that in at least 50% of cases, someone comes back and says one of
the following things:

1. I searched the archives and could find no discussion along those lines.
2. I read that discussion and it doesn't appear to me that it reflects
a rejection of this idea.  Instead what people seemed to be saying was
X.
3. At the time that might have been true, but what has changed in the
meanwhile is X.

In short, the problem with referring to previous discussions is that
our memories grow fuzzy over time.  We remember that an idea was not
adopted, but not exactly why it wasn't adopted.  We reject a new patch
with a good implementation of $FEATURE because an old patch was badly
done, or fell down on some peripheral issue, or just never got done.
Veteran backend hackers understand the inevitable necessity of arguing
about what consensus is actually reflected in the archives and whether
it's still relevant, but new people can be (and frequently are) put
off by it; and even for experienced contributors, it does little to
advance the dialogue.  Hmm, according to so-and-so's memory, sometime
in the fourteen-year-history of the project someone didn't like this
idea, or maybe a similar one.  Whee, time to start Googling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Richard Huxton

On 03/12/10 08:14, Jeroen Vermeulen wrote:

On 2010-12-02 00:48, Ana Carolina Brito de Almeida wrote:


We would like to inform you all that our extension to PostgreSQL, which
includes hypothetical indexes (and soon index self-tuning), is
available through a SourceForge project.



Looking at the sourceforge page, I'm left with one burning question:
what are they for?


I believe they're for performance testing. Add hypothetical index (takes 
very little time). Check estimated costs with EXPLAIN. If good, add real 
index (takes lots of time).


Of course, they're also good for indexing hypothetical data ;-)

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] directory archive format for pg_dump

2010-12-03 Thread Heikki Linnakangas

On 02.12.2010 23:12, Alvaro Herrera wrote:

Excerpts from Heikki Linnakangas's message of Thu Dec 02 16:52:27 -0300 2010:

Ok, committed, with some small cleanup since the last patch I posted.


I think the comments on _ReadBuf and friends need to be updated, since
they are not just for headers and TOC stuff anymore.  I'm not sure if
they were already outdated before your patch ...


"These routines are only used to read & write headers & TOC"

Hmm, ReadInt calls _ReadByte, and PrintData used to call ReadInt, so it 
was indirectly called for things other than headers and TOC 
already. Unless you consider the "headers" to include the length integer 
in each data block. I'm inclined to just remove that sentence.


I also note that the _Clone and _DeClone functions are a bit misplaced. 
There's a big "END OF FORMAT CALLBACKS" earlier in the file, but _Clone 
and _DeClone are such callbacks. I'll move them to the right place.


PS. Thanks for the cleanup you did yesterday.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_execute_from_file review

2010-12-03 Thread Dimitri Fontaine
Itagaki Takahiro  writes:
> I fixed and cleaned up some of the code in it; v9 patch attached. Please check
> my modifications, and set the status to "Ready for Committer" if you find
> no problems. I think the documentation and code comments might need to be
> checked by native English speakers.

Many thanks, that version is so much cleaner than the previous
one. Comments below.

> You added replace(text, text, text, VARIADIC text), but I think
> replace(text, VARIADIC text) also works. If we have the short form,
> we can use it easily from execute functions with placeholders.

So we now have the following:

                              List of functions
   Schema   |  Name   | Result data type | Argument data types |  Type
------------+---------+------------------+---------------------+--------
 pg_catalog | replace | text             | text, VARIADIC text | normal
 pg_catalog | replace | text             | text, text, text    | normal

My understanding is that the variadic form shadows the other one in such a
way that it's now impossible to call the latter from the SQL level. That's the
reason why I did the (text, text, text, VARIADIC text) version before,
but is that true? Also, is it worthwhile to keep the non-VARIADIC
version exposed at the SQL level?

The only other nitpick I seem to be able to find is that you forgot
to remove the following from builtins.h:

+ extern Datum replace_placeholders(PG_FUNCTION_ARGS);

So I'll go update the commitfest to point to your version of the patch,
add an entry for this "comments" email, and mark it as ready for committer.

> Other changes:

All for the best, thank you! I can't help noticing that some of them
are fixing things that we might want to backpatch. Those:

> * Int64GetDatum((int64) fst.st_size) was broken.
> * An error check for "could not read file" didn't work.

That's straight from master's branch code, IIRC.

> * Added some regression tests.
> * Read file contents into bytea buffer directly to avoid memcpy.
> * Fixed bad usages of text and bytea values
>   because they are not null-terminated.
> * I don't want to expose ArrayType to builtins.h.
>   So I call replace_text_variadic() from execute functions.
> * pg_execute_sql_file(path, NULL) won't work because it's a STRICT function.
>   It returns NULL without doing any work when at least one of the arguments is NULL.

I'm not sure I understand this last point, because I already had 3 versions
of it, so surely you would call pg_execute_sql_file(path) in this case?

> BTW, we have many text from/to cstring conversions in the new code.
> It would not be an item for now, but we would need to improve them
> if those functions are heavily used, especially replace_text_variadic().

That could become a concern if it actually shadows the other version.

> Agreed. I also prefer pg_read_file_all rather than pg_read_whole_file :P

Following this line of thought, maybe we should provide a third variant
here, the "real" pg_read_whole_file(path); then we would have the existing
other variants, pg_read_file_to_end(path, offset) and the historic one,
pg_read_file(path, offset, bytes_to_read).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-03 Thread Dimitri Fontaine
Itagaki Takahiro  writes:
> Dimitri, your name cannot be added in source files,
> but will be recorded in commit logs ;-)

Oh, I've only been adding it to src/backend/utils/adt/genfile.c because
I saw another name in there. So of course stricter adherence to the
project's copyright and naming practice here is good news.

BTW, I see you already made that happen (and so much more) in your v9
patch, thank you very much for that! Follow-up due later this morning!

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hi- How frequently Postgres Poll for trigger file

2010-12-03 Thread Heikki Linnakangas

On 03.12.2010 03:55, Fujii Masao wrote:

On Thu, Dec 2, 2010 at 12:16 AM, Euler Taveira de Oliveira
  wrote:

As you said, there are platforms where a signal doesn't wake up a
process, so I suggest (ii), but I'm fine with including (i) in the docs too.


Can we use "pg_ctl kill" to send signal on such platforms?


It won't make a difference: "pg_ctl kill" sends the signal just like 
regular "kill", except that "pg_ctl kill" also works on Windows, which 
doesn't have the concept of Unix signals.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Jeroen Vermeulen

On 2010-12-02 00:48, Ana Carolina Brito de Almeida wrote:


We would like to inform you all that our extension to PostgreSQL, which includes 
hypothetical indexes (and soon index self-tuning), is available through a 
SourceForge project.
This was suggested at PgCon 2010 and we hope some of you may find it useful, 
contribute and give us your feedback.


Looking at the sourceforge page, I'm left with one burning question: 
what are they for?


I can see what a hypothetical index is, but neither the project pages 
nor the README in the tarball say why I might want one.  I'd be quite 
interested to know that.



Jeroen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers