date:20100721

Re: [HACKERS] patch (for 9.1) string functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Itagaki Takahiro itagaki.takah...@gmail.com:
 I reviewed the core changes of the patch. I don't think we need
 mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.

 I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
 What do you think of the changes? It requires re-counting lengths of multi-byte
 strings in some cases, but the code will be much simpler and can avoid
 allocating length buffers.


It is a good idea. I see a problem only for the right() function, where
in the most common use case mblen will be called twice. I can't say
right now whether this is a performance issue or not. Most probably
not - only for very large strings.

postgres=# create or replace function randomstr(int) returns text as
$$select string_agg(substring('abcdefghijklmnop' from
trunc(random()*13)::int+1 for 1),'') from generate_series(1,$1) $$
language sql;
CREATE FUNCTION
Time: 27,452 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,1))x;
 count
---
 1
(1 row)

Time: 5615,061 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,1))x;
 count
---
 1
(1 row)

Time: 5606,937 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,1))x;
 count
---
 1
(1 row)

Time: 5630,771 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,1))x;
 count
---
 1
(1 row)

Time: 5753,063 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,1))x;
 count
---
 1
(1 row)
Time: 5755,776 ms

That is about 2% slower with UTF8 encoding, so it isn't significant for me.

I agree with your changes. Thank you very much.
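
For readers who want to see where the second scan in right() comes from, here is a minimal sketch in Python. It is not the patch: mbstrlen_with_len and mbcharcliplen below are hypothetical stand-ins that only imitate what pg_mbstrlen_with_len() and pg_mbcharcliplen() are assumed to do for valid UTF8 input (count characters, and return the byte length of the first N characters).

```python
def utf8_char_len(first_byte: int) -> int:
    """Byte length of a UTF-8 character, judged from its first byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    return 4


def mbstrlen_with_len(buf: bytes) -> int:
    """Count characters (hypothetical stand-in for pg_mbstrlen_with_len)."""
    pos = chars = 0
    while pos < len(buf):
        pos += utf8_char_len(buf[pos])
        chars += 1
    return chars


def mbcharcliplen(buf: bytes, limit: int) -> int:
    """Byte length of the first `limit` characters
    (hypothetical stand-in for pg_mbcharcliplen)."""
    pos = chars = 0
    while pos < len(buf) and chars < limit:
        pos += utf8_char_len(buf[pos])
        chars += 1
    return pos


def right(buf: bytes, n: int) -> bytes:
    """Last n characters; a negative n drops the first |n| characters."""
    if n < 0:
        skip_chars = -n  # no character count needed: one scan only
    else:
        # Scan 1: count all characters to learn how many to skip.
        skip_chars = max(mbstrlen_with_len(buf) - n, 0)
    # Scan 2: walk the skipped characters again to find the byte offset.
    return buf[mbcharcliplen(buf, skip_chars):]
```

With a positive n the skipped prefix is walked twice, which is the extra cost measured above; with a negative n a single scan is enough.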

Regards

Pavel Stehule

 I'd like to apply contrib/stringinfo apart from the core changes,
 because there still seem to be some ideas for improving sprintf().

 --
 Itagaki Takahiro


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication

2010-07-21 Thread Fujii Masao

On Fri, Jul 16, 2010 at 7:43 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 16/07/10 10:40, Fujii Masao wrote:

 So we should always prevent the standby from applying any WAL in pg_xlog
 unless walreceiver is running. That is, if there is no WAL available
 in the archive, the standby ignores pg_xlog and starts the walreceiver
 process to request WAL streaming.

 That completely defeats the purpose of storing streamed WAL in pg_xlog in
 the first place. The reason it's written and fsync'd to pg_xlog is that if
 the standby subsequently crashes, you can use the WAL from pg_xlog to
 reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the
 standby anymore.

But, the standby can start up by reading the missing WAL files from the
master. No?

On second thought, minRecoveryPoint can be guaranteed to be older
than the fsync location on the master if we prevent the standby from
applying WAL beyond the fsync location. So we can safely
apply the WAL files in pg_xlog up to minRecoveryPoint.

Consequently, we should always prevent the standby from applying any
WAL in pg_xlog newer than minRecoveryPoint unless walreceiver is
running. Thoughts?
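
The rule proposed above can be written down as a small predicate. This is only a sketch in Python under stated assumptions: parse_lsn and may_apply_from_pg_xlog are hypothetical names, and WAL locations are modelled as 64-bit integers parsed from the usual XXX/XXX text form.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL location written as 'hi/lo' (hex) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def may_apply_from_pg_xlog(record_lsn: int,
                           min_recovery_point: int,
                           walreceiver_running: bool) -> bool:
    """Decide whether the standby may replay a WAL record found in pg_xlog."""
    if walreceiver_running:
        # Streaming is active, so WAL in pg_xlog is being received and
        # may be replayed.
        return True
    # Without walreceiver, only replay what is needed to reach
    # minRecoveryPoint; anything newer is left alone.
    return record_lsn <= min_recovery_point
```

For example, with minRecoveryPoint at 0/3000000 and no walreceiver, a record at 0/2000000 would be replayed and one at 0/4000000 would not.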

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [PATCH] Re: [HACKERS] Adding XMLEXISTS to the grammar

2010-07-21 Thread Mike Fowler


Hi Peter,

Thanks for your feedback.

On 20/07/10 19:54, Peter Eisentraut wrote:

Attached is a patch with the revised XMLEXISTS function, complete with
grammar support and regression tests. The implemented grammar is:

XMLEXISTS ( xpath_expression PASSING BY REF xml_value [BY REF] )

Though the full grammar makes everything after the xpath_expression
optional, I've left it as mandatory simply to avoid lots of rework of
the function (it would need new null checks, and the memory handling
would need reworking).

Some thoughts, mostly nitpicks:

The snippet of documentation could be clearer.  It says if the xml
satisfies the xpath.  Not sure what that means exactly.  An XPath
expression, by definition, returns a value.  How is that value used to
determine the result?
   


I'll rephrase it: The function xmlexists returns true if the xpath 
returns any nodes and false otherwise.
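
To make the rephrased semantics concrete, here is a minimal sketch in Python. It is an illustration only: the xmlexists function below is a hypothetical model built on the standard library's ElementTree, which supports just a subset of XPath, whereas the patch itself evaluates the expression inside the server.

```python
import xml.etree.ElementTree as ET


def xmlexists(xpath_expression: str, xml_value: str) -> bool:
    """True if the XPath expression selects at least one node."""
    root = ET.fromstring(xml_value)
    # findall() returns a (possibly empty) list of matching elements;
    # the result is true exactly when that list is non-empty.
    return len(root.findall(xpath_expression)) > 0


doc = "<towns><town>Toronto</town><town>Ottawa</town></towns>"
print(xmlexists("./town", doc))
print(xmlexists("./village", doc))
```

The first call selects two nodes and the second selects none, so only the first reports that a match exists.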



Naming of parser symbols: xmlexists_list isn't actually a list of
xmlexists's.  That particular rule can probably be done away with anyway
and the code be put directly into the XMLEXISTS rule.

Why is the first argument AexprConst instead of a_expr?  The SQL
standard says it's a character string literal, but I think we can very
well allow arbitrary expressions.
   


Yes, it was AexprConst because of the specification. I also found that 
using it solved my shift/reduce problems, but I can change it to a_expr and 
see if I can work them out in a different way.



xmlexists_query_argument_list should be optional.
   


OK, I'll change it.


The rules xml_default_passing_mechanism and xml_passing_mechanism are
pretty useless as separate rules.  Just mention the tokens where
they are used.
   


Again, I'll change that too.


Why c_expr?
   


As with the AexprConst, its choice was partially influenced by the fact 
that it solved the shift/reduce errors I was getting. I'm guessing then that 
I should really use a_expr and resolve the shift/reduce problem differently?



Call the C-level function xmlexists for consistency.
   


Sure. I'll look to get a patch addressing these concerns out in the next 
day or two, work/family/sleep permitting! :)


Regards,

--
Mike Fowler
Registered Linux user: 379787




Re: [HACKERS] Synchronous replication

2010-07-21 Thread Fujii Masao

On Sat, Jul 17, 2010 at 3:25 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 14/07/10 09:50, Fujii Masao wrote:

 TODO
 
 The patch has no features for performance improvement of synchronous
 replication. I admit that currently the performance overhead in the
 master is terrible. We need to address the following TODO items in the
 subsequent CF.

 * Change the poll loop in the walsender
 * Change the poll loop in the backend
 * Change the poll loop in the startup process
 * Change the poll loop in the walreceiver

 I was actually hoping to see a patch for these things first, before any of
 the synchronous replication stuff. Eliminating the polling loops is
 important, latency will be laughable otherwise, and it will help the
 synchronous case too.

First, note that the poll loops in the backend and walreceiver don't
exist without the synchronous replication stuff.

Yeah, I'll start with the change of the poll loop in the walsender. I'm
thinking that we should make the backend signal the walsender to send the
outstanding WAL immediately, as the previous synchronous replication patch
I submitted in the past year did. I use a signal here because the walsender
needs to wait for the request from the backend and the ack message from
the standby *concurrently* in synchronous replication. If we used a
semaphore instead of the signal, the walsender would not be able to
respond to the ack immediately, which would also degrade performance.

The problem with this idea is that a signal can be sent for every
transaction commit. I'm not sure whether such frequent signaling really
harms the performance of replication. BTW, when I benchmarked the previous
synchronous replication patch based on this idea, AFAIR the result showed
no impact from the signaling. But... Thoughts? Do you have a better idea?

 * Perform the WAL write and replication concurrently
 * Send WAL from not only disk but also WAL buffers

 IMHO these are premature optimizations that we should not spend any effort
 on now. Maybe later, if ever.

Yep!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Synchronous replication

2010-07-21 Thread Fujii Masao

On Sun, Jul 18, 2010 at 3:14 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 14/07/10 09:50, Fujii Masao wrote:

 Quorum commit
 -------------
 In previous discussion about synchronous replication, some people
 wanted the quorum commit feature. This feature is also included in
 Zoltan's synchronous replication patch, so I decided to create it.

 The patch provides quorum parameter in postgresql.conf, which
 specifies how many standby servers transaction commit will wait for
 WAL records to be replicated to, before the command returns a
 success indication to the client. The default value is zero, which
 means transaction commit never waits for replication, regardless of
 replication_mode. Also, transaction commit never waits for
 replication to an asynchronous standby (i.e., replication_mode
 is set to async), regardless of this parameter. If quorum is more
 than the number of synchronous standbys, transaction commit returns
 success when the ACK has arrived from all of the synchronous standbys.

 There should be a way to specify wait for *all* connected standby servers
 to acknowledge

Agreed. I'll allow -1 as the valid value of the quorum parameter, which
means that transaction commit waits for all connected standbys.
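
Putting the quorum rules from this exchange together, a rough sketch in Python might look like the following. The function names are hypothetical, and it assumes that only connected synchronous standbys count, since commit never waits for asynchronous ones.

```python
def acks_required(quorum: int, connected_sync_standbys: int) -> int:
    """Number of standby ACKs a commit has to wait for."""
    if quorum < -1:
        raise ValueError("quorum must be -1 or greater")
    if quorum == -1:
        # Proposed special value: wait for all connected standbys.
        return connected_sync_standbys
    # quorum = 0 never waits; a quorum larger than the number of
    # synchronous standbys degrades to "wait for all of them".
    return min(quorum, connected_sync_standbys)


def commit_may_return(acks_received: int, quorum: int,
                      connected_sync_standbys: int) -> bool:
    """True once enough standbys have acknowledged the commit."""
    return acks_received >= acks_required(quorum, connected_sync_standbys)
```

So with three synchronous standbys, quorum = 2 waits for two ACKs, while quorum = 5 and quorum = -1 both wait for all three.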

 Protocol
 
 I extended the handshake message START_REPLICATION so that it
 includes replication_mode read from recovery.conf. If 'async' is
 passed, the master thinks that it doesn't need to wait for the ACK
 from the standby.

 Please use self-explanatory names for the modes in START_REPLICATION
 command, instead of just an integer.

Agreed. What about changing the START_REPLICATION message to the following?

START_REPLICATION XXX/XXX SYNC_LEVEL { async | recv | fsync | replay }
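
As an illustration of how the receiving side could read such a message, here is a small sketch in Python. It only models the syntax proposed above; parse_start_replication and master_waits_for_ack are hypothetical names, not walsender code.

```python
import re

SYNC_LEVELS = ("async", "recv", "fsync", "replay")

_START_REPLICATION = re.compile(
    r"^START_REPLICATION\s+([0-9A-Fa-f]+)/([0-9A-Fa-f]+)\s+SYNC_LEVEL\s+(\w+)$"
)


def parse_start_replication(message: str):
    """Return (start location as an integer, sync level name)."""
    match = _START_REPLICATION.match(message.strip())
    if match is None:
        raise ValueError("malformed START_REPLICATION message")
    hi, lo, level = match.groups()
    if level not in SYNC_LEVELS:
        raise ValueError("unknown SYNC_LEVEL: " + level)
    return (int(hi, 16) << 32) | int(lo, 16), level


def master_waits_for_ack(level: str) -> bool:
    """With 'async' the master does not need to wait for the standby's ACK."""
    return level != "async"
```

Using names rather than integers means an unknown mode is rejected with a readable error instead of being silently misinterpreted.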

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Dave Page

On Tue, Jul 20, 2010 at 8:12 PM, Peter Eisentraut pete...@gmx.net wrote:
 My preference would be to stick to a style where we identify the
 committer using the author tag and note the patch author, reviewers,
 whether the committer made changes, etc. in the commit message.  A
 single author field doesn't feel like enough for our workflow, and
 having a mix of authors and committers in the author field seems like
 a mess.

 Well, I had looked forward to actually putting the real author into the
 author field.

I hadn't realised that was possible until Guillaume did so on his
first commit to the new pgAdmin GIT repo. It seems to work nicely:

http://git.postgresql.org/gitweb?p=pgadmin3.git;a=commit;h=08e2826d90129bd4e4b3b7462bab682dd6a703e4

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-21 Thread Markus Wanner


On 07/21/2010 01:52 AM, Robert Haas wrote:

On Tue, Jul 20, 2010 at 5:46 PM, Alvaro Herrera
alvhe...@commandprompt.com  wrote:

I guess what Robert is saying is that you don't need shmem to pass
messages around.  The new LISTEN implementation was just an example.
imessages aren't supposed to use it directly.  Rather, the idea is to
store the messages in a new SLRU area.  Thus you don't need to mess with
dynamically allocating shmem at all.


Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.

Note that I sort of /want/ to mess with shared memory. It's what I know 
how to deal with. It's how threaded programs work as well. Ya know, 
locks, condition variables, mutexes, all those nice things that allow 
you to shoot your foot so terribly nicely... Oh, well...



I think it should be rather straightforward.  There would be a unique
append-point;


Unique append-point? Sounds like what I had before. That'd be a step 
backwards, compared to the per-backend queue and an allocator that 
hopefully scales well with the number of CPU cores.



each process desiring to send a new message to another
backend would add a new message at that point.  There would be one read
pointer per backend, and it would be advanced as messages are consumed.
Old segments could be trimmed as backends advance their read pointer,
similar to how sinval queue is handled.


That leads to pretty nasty fragmentation. A dynamic allocator should do 
much better in that regard. (Wamalloc certainly does).



If the messages are mostly unicast, it might be nice to contrive a
method whereby backends didn't need to explicitly advance over
messages destined only for other backends.  Like maybe allocate a
small, fixed amount of shared memory sufficient for two pointers
into the SLRU area per backend, and then use the SLRU to store each
message with a header indicating where the next message is to be
found.


That's pretty much how imessages currently work. A single list of 
messages queued per backend.



For each backend, you store one pointer to the first queued
message and one pointer to the last queued message.  New messages can
be added by making the current last message point to a newly added
message and updating the last message pointer for that backend.  You'd
need to think about the locking and reference counting carefully to
make sure you eventually freed up unused pages, but it seems like it
might be doable.


I've just read through slru.c, but still don't have a clue how it could 
replace a dynamic allocator.


At the moment, the creator of an imessage allocs memory, copies the 
payload there and then activates the message by appending it to the 
recipient's queue. Upon getting signaled, the recipient consumes the 
message by removing it from the queue and is obliged to release the 
memory the message occupies after having processed it. Simple and 
straightforward, IMO.
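
The life cycle just described can be modelled in a few lines. The following is a toy sketch in Python, not the imessages code: IMessagePool and its methods are hypothetical names, a byte counter stands in for the dynamic shared memory allocator, and the signal to the recipient is left out.

```python
from collections import deque


class IMessagePool:
    """Size-constrained pool with one message queue per backend."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # upper limit of the pool, in bytes
        self.used = 0
        self.queues = {}          # backend id -> deque of messages

    def send(self, recipient: int, payload: bytes) -> None:
        # "Alloc": fail when the pool is exhausted; nothing spills to disk.
        if self.used + len(payload) > self.capacity:
            raise MemoryError("imessage pool exhausted")
        self.used += len(payload)
        message = bytes(payload)  # copy the payload into the message
        # Activate the message by appending it to the recipient's queue.
        self.queues.setdefault(recipient, deque()).append(message)

    def receive(self, recipient: int):
        """Remove and return the oldest queued message, or None."""
        queue = self.queues.get(recipient)
        if not queue:
            return None
        return queue.popleft()

    def release(self, message: bytes) -> None:
        """The recipient gives the memory back after processing."""
        self.used -= len(message)
```

The point of the model is that alloc and free happen at arbitrary sizes and at different times in different processes, which is the part that does not map obviously onto fixed-size SLRU pages.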


The queue addition and removal is clear. But how would I do the 
alloc/free part with SLRU? Its blocks are fixed size (BLCKSZ) and the 
API with ReadPage and WritePage is rather unlike a pair of alloc() and 
free().



One big advantage of attacking the problem with an SLRU is that
there's no fixed upper limit on the amount of data that can be
enqueued at any given time.  You can spill to disk or whatever as
needed (although hopefully you won't normally do so, for performance
reasons).


Yes, imessages shouldn't ever be spilled to disk. There naturally must 
be an upper limit for them. (Be it total available memory, as for 
threaded things or a given and size-constrained pool, as is the case for 
dynshmem).


To me it rather sounds like SLRU is a candidate for using dynamically 
allocated shared memory underneath, instead of allocating a fixed amount 
of slots in advance. That would allow more efficient use of shared 
memory. (Given SLRU's ability to spill to disk, it could even be used to 
'balance' out anomalies to some extent).


Regards

Markus Wanner


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 02:28, Andrew Dunstan and...@dunslane.net wrote:


 Robert Haas wrote:

 On Tue, Jul 20, 2010 at 3:12 PM, Peter Eisentraut pete...@gmx.net wrote:


 Well, I had looked forward to actually putting the real author into the
 author field.


 What if there's more than one?  What if you make changes yourself?
 How will you credit the reviewer?



 I think our current practice is fine. Put it in the commit log.

If nothing else, I think this definitely falls under the minimum
changes first policy. Let's start by doing things exactly as we're
doing now. We can then consider changing this in the future, but let's
not change everything at once.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Query results differ depending on operating system (using GIN)

2010-07-21 Thread Oleg Bartunov


On Tue, 20 Jul 2010, Robert Haas wrote:


On Tue, Jul 20, 2010 at 5:41 AM, Artur Dabrowski a...@astec.com.pl wrote:

I have been redirected here from pg-general.

I tested full text search using GIN index and it turned out that the results
depend on operating system. Not all the rows are found when executing some
of queries on pg server installed on Win XP SP3 and CentOS 5.4, while
everything seems to be fine on Ubuntu 4.4.1.

More details and tested queries are described here:
http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-ts29172750.html

I hope you can help with this weird problem.


This seems like it's definitely a bug, but I don't know much about the
GIN code.  Copying Oleg and Teodor...


I couldn't reproduce the problem on my machine with Artur's dump. I think the 
problem could be with the package, since I only use a self-compiled version.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: [HACKERS] leaky views, yet again

2010-07-21 Thread Robert Haas

2010/7/21 KaiGai Kohei kai...@ak.jp.nec.com:
 (2010/07/20 2:13), Heikki Linnakangas wrote:
 On 09/07/10 06:47, KaiGai Kohei wrote:
 When leaky and non-leaky functions are chained within a WHERE clause,
 they will be ordered by the cost of the functions. So it is possible
 that leaky functions are executed earlier than non-leaky functions.

 No, that needs to be forbidden as part of the fix. Leaky functions must
 not be executed before all the quals from the view are evaluated.


 IIUC, a view is expanded into a subquery in the rewriter phase, then it
 can be pulled up into a join clause at pull_up_subqueries(). In this case,
 the WHERE clause may have quals that come from different origins, right?

 E.g)
  SELECT * FROM v1 WHERE f_malicious(v1.a);

  At the rewriter:
  - SELECT v1.* FROM (SELECT * FROM t1 WHERE f_policy(t1.b)) v1 WHERE 
 f_malicious(v1.a);

  At the pull_up_subqueries()
  - SELECT * FROM t1 WHERE f_policy(t1.b) AND f_malicious(t1.a);
                           ^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^
                           cost = 100         cost = 0.0001

 Apart from the idea of marking functions as secure/leaky, isn't some
 mechanism needed to ensure that f_policy() is executed earlier than f_malicious()?

I think you guys are in fact agreeing with each other.
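
The ordering issue both of them describe can be shown with a toy model. This is a sketch in Python, not planner code: the Qual class and both ordering functions are hypothetical, and "from_view" simply marks quals that came from the view definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qual:
    expr: str
    cost: float
    from_view: bool  # True if the qual came from the view definition


def order_by_cost(quals):
    """Cheapest qual first, regardless of where it came from."""
    return sorted(quals, key=lambda q: q.cost)


def order_view_quals_first(quals):
    """All view quals first; cost only breaks ties within each group."""
    return sorted(quals, key=lambda q: (not q.from_view, q.cost))


quals = [
    Qual("f_policy(t1.b)", 100, from_view=True),
    Qual("f_malicious(t1.a)", 0.0001, from_view=False),
]
print([q.expr for q in order_by_cost(quals)])
print([q.expr for q in order_view_quals_first(quals)])
```

Ordering purely by cost runs f_malicious() before f_policy(), so it can see rows the policy would have filtered out; ordering view quals first prevents that at the price of a possibly slower plan.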

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] leaky views, yet again

2010-07-21 Thread Robert Haas

2010/7/21 KaiGai Kohei kai...@ak.jp.nec.com:
 On the other hand, if it's enough from a performance
 point of view to review and mark only a few built-in functions like
 index operators, maybe it's ok.

 I also think it is a worthwhile idea to try as a proof-of-concept.

Yeah.  So, should we mark this patch as Returned with Feedback, and
you can submit a proof-of-concept patch for the next CF?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] psql \conninfo command (was: Patch: psql \whoami option)

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 1:07 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jul 20, 2010 at 11:14 PM, Robert Haas robertmh...@gmail.com wrote:
 OK, committed.

 When I specify the path of the directory for the Unix-domain socket
 as the host, \conninfo doesn't mention that this connection is based
 on the Unix-domain socket. Is this intentional?

 $ psql -h/tmp -c\conninfo
 You are connected to database postgres on host /tmp at port 5432
 as user postgres.

 I expected something like

    You are connected to database postgres via local socket on
 /tmp at port 5432 as user postgres.

:-(

No, I didn't realize the host field could be used that way.  It's true
that you get a fairly similar message from \c, but that's not exactly
intuitive either.

rhaas=# \c - - /tmp -
You are now connected to database rhaas on host /tmp.
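
One way to get the message Fujii expected is to treat a host that starts with a slash as a socket directory, which is how libpq interprets the host parameter. The sketch below is in Python for illustration only; conninfo_message is a hypothetical helper, not the psql code, and the wording follows the messages quoted above.

```python
def conninfo_message(dbname: str, host: str, port: str, user: str) -> str:
    """Build a \\conninfo-style message for a TCP or Unix-socket connection."""
    if host.startswith("/"):
        # A leading slash means the "host" is really a socket directory.
        where = "via local socket on " + host
    else:
        where = "on host " + host
    return ("You are connected to database " + dbname + " " + where +
            " at port " + port + " as user " + user + ".")


print(conninfo_message("postgres", "/tmp", "5432", "postgres"))
print(conninfo_message("postgres", "localhost", "5432", "postgres"))
```

The same test would apply to the message printed by \c.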

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

 If your extra clones are for occasionally-touched back branches, then:

 (a) In my experience, it is almost always much easier to work with many
 branches and move patches between them rather than use multiple clones;
 but

 (b) You don't need to do the double-pull and push. Clone your local
 repository as many times as needed, but create new git-remote(1)s in
 each extra clone and pull/push only the branch you care about directly
 from or to the remote. That way, you'll start off with the bulk of the
 storage shared with your main local repository, and waste a few KB
 when you make (presumably infrequent) new changes.

Ah, that is clever.  Perhaps we need to write up directions on how to do that.

 But that brings me to another point:

 In my experience (doing exactly this kind of old-branch-maintenance with
 Archiveopteryx), git doesn't help you much if you want to backport (i.e.
 cherry-pick) changes from a development branch to old release branches.
 It is much more helpful when you make changes to the *oldest* applicable
 branch and bring it *forward* to your development branch (by merging the
 old branch into your master). Cherry-picking can be done, but it becomes
 painful after a while.

Well, per previous discussion, we're not going to change that at this
point, or maybe ever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 12:39, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

 If your extra clones are for occasionally-touched back branches, then:

 (a) In my experience, it is almost always much easier to work with many
 branches and move patches between them rather than use multiple clones;
 but

 (b) You don't need to do the double-pull and push. Clone your local
 repository as many times as needed, but create new git-remote(1)s in
 each extra clone and pull/push only the branch you care about directly
 from or to the remote. That way, you'll start off with the bulk of the
 storage shared with your main local repository, and waste a few KB
 when you make (presumably infrequent) new changes.

 Ah, that is clever.  Perhaps we need to write up directions on how to do that.

Yeah, that's the way I work with some projects at least.


 But that brings me to another point:

 In my experience (doing exactly this kind of old-branch-maintenance with
 Archiveopteryx), git doesn't help you much if you want to backport (i.e.
 cherry-pick) changes from a development branch to old release branches.
 It is much more helpful when you make changes to the *oldest* applicable
 branch and bring it *forward* to your development branch (by merging the
 old branch into your master). Cherry-picking can be done, but it becomes
 painful after a while.

 Well, per previous discussion, we're not going to change that at this
 point, or maybe ever.

Nope, the deal was definitely that we stick to the current workflow.

Yes, this means we can't use git cherry-pick or similar git-specific
tools to make life easier. But it shouldn't make life harder than it
is *now*, with cvs.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen

At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

If your extra clones are for occasionally-touched back branches, then:

(a) In my experience, it is almost always much easier to work with many
branches and move patches between them rather than use multiple clones;
but

(b) You don't need to do the double-pull and push. Clone your local
repository as many times as needed, but create new git-remote(1)s in
each extra clone and pull/push only the branch you care about directly
from or to the remote. That way, you'll start off with the bulk of the
storage shared with your main local repository, and waste a few KB
when you make (presumably infrequent) new changes.
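
The steps in (b) could be scripted roughly as follows. This is a sketch in Python that only builds the git command lines; the remote name "upstream", the helper name and all paths and URLs are hypothetical, and the commands would still have to be run, for example with subprocess.run().

```python
def back_branch_clone_commands(main_clone: str, upstream_url: str,
                               branch: str, workdir: str):
    """Commands to set up an extra clone that tracks one back branch."""
    return [
        # Clone the local repository, so the bulk of the storage is shared.
        ["git", "clone", main_clone, workdir],
        # Add the real remote next to the local "origin".
        ["git", "-C", workdir, "remote", "add", "upstream", upstream_url],
        # Fetch only the branch we care about, directly from the remote.
        ["git", "-C", workdir, "fetch", "upstream", branch],
        # Create (or reset) a local branch that tracks the remote branch.
        ["git", "-C", workdir, "checkout", "-B", branch,
         "--track", "upstream/" + branch],
    ]


for command in back_branch_clone_commands(
        "/src/pg-main", "git://example.org/postgresql.git",
        "REL8_4_STABLE", "/src/pg-8.4"):
    print(" ".join(command))
```

After that, pulls and pushes in the extra clone can name the "upstream" remote and the one branch, with no detour through the main local clone.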

But that brings me to another point:

In my experience (doing exactly this kind of old-branch-maintenance with
Archiveopteryx), git doesn't help you much if you want to backport (i.e.
cherry-pick) changes from a development branch to old release branches.
It is much more helpful when you make changes to the *oldest* applicable
branch and bring it *forward* to your development branch (by merging the
old branch into your master). Cherry-picking can be done, but it becomes
painful after a while.

See http://toroid.org/ams/etc/git-merge-vs-p4-integrate for more.

-- ams


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Abhijit Menon-Sen

At 2010-07-20 14:34:20 -0400, robertmh...@gmail.com wrote:

 I think there is also a committer field, but that doesn't always
 appear and I'm not clear on how it works.

There is always a committer field, and it is set sensibly as long as the
committer has user.name and user.email set correctly with git-config. It
is not displayed by git-log by default, unless it is different from the
author. (As PeterE showed, it's easy to get the list of committers.)

 My preference would be to stick to a style where we identify the
 committer using the author tag and note the patch author, reviewers,
 whether the committer made changes, etc. in the commit message.

An aside: as a patch author (and elsewhere, as a committer), it's nice
when the log shows the author rather than the committer. Will we really
have so many patches with multiple authors or other complications that
we can't set the author by default and fall back to explanations in the
commit message (e.g. applied with changes) for more complicated cases?

 I want to make sure that I don't accidentally push the last three of
 those to the authoritative server...

By default (at least with a recent git), git push will push branches
that are tracking remote branches, but new local branches have to be
pushed explicitly to create them on the remote.

So don't worry about that.

 3. Merge commits.  I believe that we have consensus that commits
 should always be done as a squash, so that the history of all of
 our branches is linear. 

I admit I haven't been paying as much attention as I should, but I did
not know there was such a consensus. If anyone could explain the
rationale, I would be grateful.

 But it seems to me that someone could
 accidentally push a merge commit […]
 Can we forbid this?

Yes, I suppose it's possible, but personally I think it would be a waste
of time to try to ban merge commits.

 4. History rewriting.  Under what circumstances, if any, are we OK
 with rebasing the master?

Please, let's never do that. The cure for pulling a rebased branch into
an existing clone may seem simple, but it's a huge pain in practice, and
it's never really worth it.

-- ams


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 6:46 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 My preference would be to stick to a style where we identify the
 committer using the author tag and note the patch author, reviewers,
 whether the committer made changes, etc. in the commit message.

 An aside: as a patch author (and elsewhere, as a committer), it's nice
 when the log shows the author rather than the committer. Will we really
 have so many patches with multiple authors or other complications that
 we can't set the author by default and fall back to explanations in the
 commit message (e.g. applied with changes) for more complicated cases?

Tom Lane rewrites part of nearly every commit, and even I change maybe
30% of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 12:46, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-20 14:34:20 -0400, robertmh...@gmail.com wrote:
 I want to make sure that I don't accidentally push the last three of
 those to the authoritative server...

 By default (at least with a recent git), git push will push branches
 that are tracking remote branches, but new local branches have to be
 pushed explicitly to create them on the remote.

Yeah, I agree this is probably not a big problem. Plus, if we
accidentally push a branch that shouldn't have been pushed, it can
easily be removed (as long as it's noticed before anybody relies on
it). To the suitable embarrassment of the committer who made the
incorrect push, which has a tendency to teach them not to do it next
time :-)


 3. Merge commits.  I believe that we have consensus that commits
 should always be done as a squash, so that the history of all of
 our branches is linear.

 I admit I haven't been paying as much attention as I should, but I did
 not know there was such a consensus. If anyone could explain the
 rationale, I would be grateful.

We are not changing the workflow, just the tool.

We may consider changing the workflow sometime in the future (but
don't bet on it), but we're definitely not changing both at the same
time.

This has been discussed many times before, both here on list and in
person on at least two instances of the pgcon developer meeting. This
is not the time to re-open that discussion.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen

At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:

 Perhaps we need to write up directions on how to do that.

I'll write them if you tell me where to put them. It's trivial.

 Well, per previous discussion, we're not going to change that at this
 point, or maybe ever.

Sure. I just wanted to mention it, because it's something I learned the
hard way. It's also true that back-porting changes is a bigger deal for
Postgres than it was for me (in the sense that it's an exception rather
than a routine activity), and individual changes are usually backported
as soon as, or very soon after, they are committed; so it should be less
painful on the whole.

Another point, in response to Magnus's followup:

At 2010-07-21 12:42:03 +0200, mag...@hagander.net wrote:

 Yes, this means we can't use git cherry-pick or similar git-specific
 tools to make life easier.

No, that's not right. You *can* use cherry-pick; in fact, it's the sane
way to backport the occasional change. What you can't do is efficiently
manage a queue of changes to be backported to multiple branches. But as
I said above, that's not exactly what we want to do for Postgres, so it
should not matter too much.

-- ams

Re: [HACKERS] pg_config problem on Solaris 10u7 X64

2010-07-21 Thread Robert Haas

On Tue, Jul 20, 2010 at 10:52 PM, Amber guxiaobo1...@gmail.com wrote:
  I am trying to build RPostgreSQL on Solaris 10u7 X64, but have problems
 with pg_config, the configure script of RPostgreSQL checks for pg_config and
 got “checking for pg_config... /usr/bin/pg_config”. In Solaris 10u7 X64,
 three versions of PostgreSQL are installed, there are in
 /usr/postgres/8.2(8.2.9) and /usr/postgres/8.3(8.3.3), the corresponding bin
 files are in /usr/postgres/version/bin and
 /usr/postgres/version/bin/amd64, and the libraries in /usr/bin is 8.1.11
 and it seems a 32bit, and I can’t find the 64bit version bins for 8.1.11.
 My question is how to let RPostgreSQL configure script find the 64bit
 pg_config.

My first guess would be to try changing your PATH before running configure.
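
To make the PATH suggestion concrete, here is a minimal Python sketch of how putting a directory first on the search path changes which pg_config is found. The directories are temporary stand-ins created only for the demonstration; on the real system the preferred directory would presumably be the 64-bit bin directory mentioned above, which is an assumption.

```python
import os
import shutil
import tempfile


def find_tool(name, preferred_dir, current_path):
    """Return the first executable called `name` when preferred_dir is
    searched before the directories already on current_path."""
    search_path = os.pathsep.join([preferred_dir, current_path])
    return shutil.which(name, path=search_path)


def demo():
    """Create two fake pg_config scripts and check which one wins."""
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "usr_bin")   # stand-in for /usr/bin
        new_dir = os.path.join(root, "amd64")     # stand-in for the 64-bit bin dir
        for directory in (old_dir, new_dir):
            os.makedirs(directory)
            tool = os.path.join(directory, "pg_config")
            with open(tool, "w") as handle:
                handle.write("#!/bin/sh\n")
            os.chmod(tool, 0o755)
        found = find_tool("pg_config", new_dir, old_dir)
        return found == os.path.join(new_dir, "pg_config")
```

A configure script that looks up pg_config on PATH should behave the same way: whichever directory comes first wins.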

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 6:56 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:

 Perhaps we need to write up directions on how to do that.

 I'll write them if you tell me where to put them. It's trivial.

Post 'em here or drop them on the wiki and post a link.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Abhijit Menon-Sen

At 2010-07-21 12:55:55 +0200, mag...@hagander.net wrote:

 We are not changing the workflow, just the tool.

OK, but I don't see why accidental merge commits need to be considered
antisocial, and banned or rebased away. Who cares if they exist? They
don't change anything you need to do to pull, create, view, or push
changes.

 This is not the time to re-open that discussion.

Sure. I apologise for bringing it up.

-- ams

Re: [HACKERS] leaky views, yet again

2010-07-21 Thread KaiGai Kohei


(2010/07/21 19:26), Robert Haas wrote:

2010/7/21 KaiGai Kohei kai...@ak.jp.nec.com:

On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.


I also think it is a worthwhile idea to try as a proof-of-concept.


Yeah.  So, should we mark this patch as Returned with Feedback, and
you can submit a proof-of-concept patch for the next CF?


Yes, it's fair enough.

--
KaiGai Kohei kai...@kaigai.gr.jp

Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 13:05, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-21 12:55:55 +0200, mag...@hagander.net wrote:

 We are not changing the workflow, just the tool.

 OK, but I don't see why accidental merge commits need to be considered
 antisocial, and banned or rebased away. Who cares if they exist? They
 don't change anything you need to do to pull, create, view, or push
 changes.

They make it harder to track how the project has moved along for
people who don't really know about the concept.

I'm not sure, but I bet they may cause issues for those tracking the
project through git-cvs, or any other tool that doesn't deal with
nonlinear history.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

The worst part is that the new names are not very mnemonic.

I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF().  It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry in the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array() function basically does the
same thing as string_to_array(), and then replaces empty strings with
NULL or some other value.

Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array.  That could be
useful for other things besides this particular case, too.
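
As a rough illustration of the array_replace() idea, here is a minimal Python sketch that models SQL arrays as lists and NULL as None. The helper names mirror the SQL functions but are only stand-ins, and the behaviour of the proposed three-argument functions is assumed from the description above rather than taken from the patch.

```python
from typing import List, Optional


def array_replace(arr: List[Optional[str]],
                  old: Optional[str],
                  new: Optional[str]) -> List[Optional[str]]:
    """Replace every element that IS NOT DISTINCT FROM old with new.

    None == None is True in Python, which matches the NULL-safe
    comparison that IS NOT DISTINCT FROM performs in SQL.
    """
    return [new if item == old else item for item in arr]


def array_to_string(arr: List[Optional[str]], delimiter: str) -> str:
    """Model of the two-argument array_to_string(): NULLs are skipped."""
    return delimiter.join(item for item in arr if item is not None)


def string_to_array(text: str, delimiter: str) -> List[Optional[str]]:
    """Model of the two-argument string_to_array(): a plain split."""
    return list(text.split(delimiter))


values = ["a", None, "c"]

# NULL entries disappear with the existing function ...
plain = array_to_string(values, ",")

# ... but survive if they are replaced first (the to_string() use case).
with_marker = array_to_string(array_replace(values, None, "N/A"), ",")

# The reverse direction: split first, then turn empty strings into NULL.
parsed = array_replace(string_to_array("a,,c", ","), "", None)
```

The sketch builds an intermediate list for each step; whether that extra array construction costs too much inside the server is a separate question that this model does not answer.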

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
 itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

 The worst part is that the new names are not very mnemonic.

 I think maybe what we really need here is array equivalents of
 COALESCE() and NULLIF().  It looks like the proposed to_string()
 function is basically equivalent to replacing each NULL entry with the
 array with a given value, and then doing array_to_string() as usual.
 And it looks like the proposed to_array function basically does the
 same thing as to_array(), and then replaces empty strings with NULL or
 some other value.

 Maybe we just need a function array_replace(anyarray, anyelement,
 anyelement) that replaces any element in the array that IS NOT
 DISTINCT FROM $2 with $3 and returns the new array.  That could be
 useful for other things besides this particular case, too.


I don't agree. Building or updating an array is a little bit expensive.
There can be the same performance issue as with the combination of
array_agg and array_to_string versus string_agg. I am not against
possible name changes. But I am strongly of the opinion that the current
string_to_array and array_to_string are buggy and have to be deprecated.

Regards

Pavel

p.s. can we use the names text_to_array and array_to_text?


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen

At 2010-07-21 06:57:53 -0400, robertmh...@gmail.com wrote:

 Post 'em here or drop them on the wiki and post a link.

1. Clone the remote repository as usual:

git clone git://git.postgresql.org/git/postgresql.git

2. Create as many local clones as you want:

git clone postgresql foobar

3. In each clone (supposing you care about branch xyzzy):

3.1. git remote set-url origin ssh://whatever/postgresql.git

3.2. git remote update && git remote prune origin

3.3. git checkout -t origin/xyzzy

3.4. git branch -d master

3.5. Edit .git/config and set origin.fetch thus:

 [remote "origin"]
 fetch = +refs/heads/xyzzy:refs/remotes/origin/xyzzy

 (You can git config remote.origin.fetch '+refs/...' if you're
 squeamish about editing the config file.)

3.6. That's it. git pull and git push will work correctly.

(This will replace the origin remote that pointed at your local
postgresql.git clone with one that points to the real remote; but you
could also add a remote definition named something other than origin,
in which case you'd need to git push thatname etc.)
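
The per-clone steps can be sketched as a small Python helper that only builds the git command lines and does not run anything. The function name is hypothetical; the commands use the `git remote set-url origin <url>` argument order and name the remote explicitly for `git remote prune`, and the `git config` form stands in for editing .git/config by hand.

```python
from typing import List


def branch_clone_commands(branch: str, remote_url: str) -> List[List[str]]:
    """Return the git invocations for step 3, in order, for one branch."""
    # Fetch only the one branch this clone cares about.
    refspec = f"+refs/heads/{branch}:refs/remotes/origin/{branch}"
    return [
        ["git", "remote", "set-url", "origin", remote_url],
        ["git", "remote", "update"],
        ["git", "remote", "prune", "origin"],
        ["git", "checkout", "-t", f"origin/{branch}"],
        ["git", "branch", "-d", "master"],
        ["git", "config", "remote.origin.fetch", refspec],
    ]


commands = branch_clone_commands("xyzzy", "ssh://whatever/postgresql.git")
```

Each entry could be handed to subprocess.run(cmd, cwd=clone_dir, check=True) inside the clone created in step 2.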

-- ams

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
 itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

 The worst part is that the new names are not very mnemonic.

 I think maybe what we really need here is array equivalents of
 COALESCE() and NULLIF().  It looks like the proposed to_string()
 function is basically equivalent to replacing each NULL entry with the
 array with a given value, and then doing array_to_string() as usual.
 And it looks like the proposed to_array function basically does the
 same thing as to_array(), and then replaces empty strings with NULL or
 some other value.

 Maybe we just need a function array_replace(anyarray, anyelement,
 anyelement) that replaces any element in the array that IS NOT
 DISTINCT FROM $2 with $3 and returns the new array.  That could be
 useful for other things besides this particular case, too.

 I don't agree. Building or updating any array is little bit expensive.
 There can be same performance issue like combination array_agg and
 array_to_string versus string_agg.

But is it really bad enough to introduce custom versions of every
function that might want to do this sort of thing?

 I am not against to possible name
 changes. But I am strong in opinion so current string_to_array and
 array_to_string are buggy and have to be deprecated.

But I don't think anyone else agrees with you.  The current behavior
isn't the only one anyone might want, but it's one reasonable
behavior.

 p.s. can we use a names - text_to_array, array_to_text ?

That's not going to reduce confusion one bit...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
 itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

 The worst part is that the new names are not very mnemonic.

 I think maybe what we really need here is array equivalents of
 COALESCE() and NULLIF().  It looks like the proposed to_string()
 function is basically equivalent to replacing each NULL entry with the
 array with a given value, and then doing array_to_string() as usual.
 And it looks like the proposed to_array function basically does the
 same thing as to_array(), and then replaces empty strings with NULL or
 some other value.

 Maybe we just need a function array_replace(anyarray, anyelement,
 anyelement) that replaces any element in the array that IS NOT
 DISTINCT FROM $2 with $3 and returns the new array.  That could be
 useful for other things besides this particular case, too.

 I don't agree. Building or updating any array is little bit expensive.
 There can be same performance issue like combination array_agg and
 array_to_string versus string_agg.

 But is it really bad enough to introduce custom versions of every
 function that might want to do this sort of thing?

 I am not against to possible name
 changes. But I am strong in opinion so current string_to_array and
 array_to_string are buggy and have to be deprecated.

 But I don't think anyone else agrees with you.  The current behavior
 isn't the only one anyone might want, but it's one reasonable
 behavior.

See the discussion of these functions; this is Merlin Moncure's proposal:

http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151503.html

These functions were designed in reaction to reported bugs and
problems with serialisation and deserialisation of arrays with null
fields.

You can't currently parse a string into an array with null values:

postgres=# select string_to_array('1,2,3,null,5',',')::int[];
ERROR:  invalid input syntax for integer: null
postgres=#
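
The failure above, and what a variant with an explicit NULL marker might do instead, can be modelled in a few lines of Python. This is only a sketch: split_with_nulls and to_int_array are hypothetical names, None stands in for SQL NULL, and the third parameter is an assumption about how such a function could treat the marker string.

```python
from typing import List, Optional


def split_with_nulls(text: str, delimiter: str,
                     null_string: Optional[str] = None) -> List[Optional[str]]:
    """Split text; fields equal to null_string become None (SQL NULL).

    With null_string left as None this behaves like a plain split,
    which is the model used here for the two-argument function.
    """
    parts = text.split(delimiter)
    if null_string is None:
        return list(parts)
    return [None if part == null_string else part for part in parts]


def to_int_array(parts: List[Optional[str]]) -> List[Optional[int]]:
    """Model of the ::int[] cast: NULL stays NULL, everything else is parsed."""
    return [None if part is None else int(part) for part in parts]


# The marker is recognised, so the cast succeeds.
ok = to_int_array(split_with_nulls("1,2,3,null,5", ",", "null"))

# Without the marker the literal text 'null' reaches the cast and fails,
# mirroring the "invalid input syntax for integer" error.
try:
    to_int_array(split_with_nulls("1,2,3,null,5", ","))
    failed = False
except ValueError:
    failed = True
```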

Regards

Pavel Stehule

 p.s. can we use a names - text_to_array, array_to_text ?

 That's not going to reduce confusion one bit...

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Pavel Stehule pavel.steh...@gmail.com:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
 itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

 The worst part is that the new names are not very mnemonic.

 I think maybe what we really need here is array equivalents of
 COALESCE() and NULLIF().  It looks like the proposed to_string()
 function is basically equivalent to replacing each NULL entry with the
 array with a given value, and then doing array_to_string() as usual.
 And it looks like the proposed to_array function basically does the
 same thing as to_array(), and then replaces empty strings with NULL or
 some other value.

 Maybe we just need a function array_replace(anyarray, anyelement,
 anyelement) that replaces any element in the array that IS NOT
 DISTINCT FROM $2 with $3 and returns the new array.  That could be
 useful for other things besides this particular case, too.

 I don't agree. Building or updating any array is little bit expensive.
 There can be same performance issue like combination array_agg and
 array_to_string versus string_agg.

 But is it really bad enough to introduce custom versions of every
 function that might want to do this sort of thing?

Please look at
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151475.html

I am not alone in the opinion that the current string-to-array functions
are not well designed.

Regards

Pavel



 I am not against to possible name
 changes. But I am strong in opinion so current string_to_array and
 array_to_string are buggy and have to be deprecated.

 But I don't think anyone else agrees with you.  The current behavior
 isn't the only one anyone might want, but it's one reasonable
 behavior.

 see on discus to these function - this is Marlin Moncure proposal

 http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151503.html

 these functions was designed in reaction to reporting bugs and
 problems with serialisation and deserialisation of arrays with null
 fields.

 you can't to parse string to array with null values now

 postgres=# select string_to_array('1,2,3,null,5',',')::int[];
 ERROR:  invalid input syntax for integer: null
 postgres=#

 Regards

 Pavel Stehule

 p.s. can we use a names - text_to_array, array_to_text ?

 That's not going to reduce confusion one bit...

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] review: psql: edit function, show function commands patch

2010-07-21 Thread Pavel Stehule

Hello

I am sending an updated patch.

I understand your criticism about line numbering, and I have to agree
that the patch is longer with it. I have one significant reason for
keeping it: the line numbers of the CREATE FUNCTION statement do not
correspond to the line numbers of the function's body. Raised
exceptions and syntax errors use the function body's line numbers, but
the user doesn't see the function's body alone; he sees a CREATE
FUNCTION statement. What's more, depending on the programmer's style,
it is sometimes necessary to correct the line number by -1. I now have
enough knowledge of plpgsql to find the problematic row, but it is a
somewhat hard task for beginners. You can see:

  CREATE OR REPLACE FUNCTION public.foo()
   RETURNS integer
   LANGUAGE plpgsql
  AS $function$
   1  begin
   2    return 10/0;
   3  end;
  $function$

postgres=# select foo();
ERROR:  division by zero
CONTEXT:  SQL statement SELECT 10/0
PL/pgSQL function foo line 2 at RETURN
postgres=#

  CREATE OR REPLACE FUNCTION public.foo()
   RETURNS integer
   LANGUAGE plpgsql
   1  AS $function$ begin
   2  return 10/0;
   3  end;
  $function$

postgres=# select foo();
ERROR:  division by zero
CONTEXT:  SQL statement SELECT 10/0
PL/pgSQL function foo line 2 at RETURN

This is a very trivial example; for more complex functions, correct
line numbering is more useful.
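
The mismatch between body line numbers and statement line numbers in the two listings can be sketched in Python. The function name is hypothetical, and the rule that a newline directly after the opening $function$ tag is not counted is an assumption read off the two examples, not something taken from the patch.

```python
def statement_line(create_stmt: str, body_line: int,
                   tag: str = "$function$") -> int:
    """Map a function-body line number to a line of the full statement."""
    # The body starts right after the opening dollar-quote tag.
    start = create_stmt.index(tag) + len(tag)
    # Assumption: a newline immediately after the tag does not count,
    # so body line 1 is the first line below "AS $function$".
    if create_stmt.startswith("\n", start):
        start += 1
    # 1-based line of the statement on which body line 1 sits.
    first_body_line = create_stmt.count("\n", 0, start) + 1
    return first_body_line + body_line - 1


STYLE_A = (
    "CREATE OR REPLACE FUNCTION public.foo()\n"
    " RETURNS integer\n"
    " LANGUAGE plpgsql\n"
    "AS $function$\n"
    "begin\n"
    "  return 10/0;\n"
    "end;\n"
    "$function$"
)

STYLE_B = (
    "CREATE OR REPLACE FUNCTION public.foo()\n"
    " RETURNS integer\n"
    " LANGUAGE plpgsql\n"
    "AS $function$ begin\n"
    "  return 10/0;\n"
    "end;\n"
    "$function$"
)

# "line 2 at RETURN" is statement line 6 in the first layout, 5 in the second.
line_a = statement_line(STYLE_A, 2)
line_b = statement_line(STYLE_B, 2)
```

The same reported body line lands on different statement lines depending on layout, which is the offset a beginner would otherwise have to work out by hand.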

2010/7/16 Jan Urbański wulc...@wulczer.org:
 Hi,

 here's a review of the \sf and \ef [num] patch from
 http://archives.postgresql.org/message-id/162867791003290927y3ca44051p80e697bc6b19d...@mail.gmail.com

 == Formatting ==

 The patch has some small tabs/spaces and whitespace  issues and it applies
 with some offsets, I ran pgindent and rebased against HEAD, attaching the
 resulting patch for your convenience.

 == Functionality ==

 The patch adds the following features:
  * \e file.txt num  -  starts a editor for the current query buffer and
 puts the cursor on the [num] line
  * \ef func num - starts a editor for a function and puts the cursor on the
 [num] line
  * \sf func - shows a full CREATE FUNCTION statement for the function
  * \sf+ func - the same, but with line numbers
  * \sf[+] func num - the same, but only from line num onward

 It only touches psql, so no performance or backend stability worries.

 In my humble opinion, only the \sf[+] is interesting, because it gives you a
 copy/pasteable version of the function definition without opening up an
 editor, and I can find that useful (OTOH: you can set PSQL_EDITOR to cat and
 get the same effect with \ef... ok, just joking). Line numbers are an extra
 touch, personally it does not thrill me too much, but I've nothing against
 it.

 The number variants of \e and \ef work by simply executing $EDITOR +num
 file. I tried with some editors that came to my mind, and not all of them
 support it (most do, though):

  * emacs and emacsclient work
  * vi works
  * nano works
  * pico works
  * mcedit works
  * kwrite does not work
  * kedit does not work

 not sure what other people (or for instance Windows people) use. Apart from
 no universal support from editors, it does not save that many keystrokes -
 at most a couple. In the end you can usually easily jump to the line you
 want once you are inside your dream editor.

I found that there are a few editors for MS Windows with support for
direct line navigation, but there isn't any standard. Next I tested
kwrite and KDE. There is usually a --line parameter. So you can use a
system variable, PSQL_NAVIGATION_COMMAND; for example, for KDE:

PSQL_NAVIGATION_COMMAND=--line 

default is +n
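
A minimal Python sketch of how such a navigation option could be turned into an editor command line follows. It is not how psql does it: the function name is hypothetical, and appending the line number directly to the option text (so that "+" gives "+12" and "--line " gives "--line 12") is an assumption about the intended behaviour.

```python
import shlex
from typing import List


def editor_argv(editor: str, filename: str, lineno: int,
                navigation_option: str = "+") -> List[str]:
    """Build the argument vector for opening filename at lineno."""
    # The line number is appended to the option text, then split the
    # way a shell would, so an option ending in a space yields two words.
    option_words = shlex.split(f"{navigation_option}{lineno}")
    return [editor, *option_words, filename]


vi_cmd = editor_argv("vi", "query.sql", 12)
kwrite_cmd = editor_argv("kwrite", "query.sql", 12, "--line ")
```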


 My recommendation would be to only integrate the \sf[+] part of the patch,
 which will have the additional benefit of making it much smaller and cleaner
 (will avoid the grotty splitting of the number from the function name, for
 instance). But I'm just another user out there, maybe others will find uses
 for the other cases.


I disagree. You cannot use a text editor command, because the line
numbers of the SQL statement are not equal to the body line numbers.

 I would personally not add the leading and trailing newlines to \sf output,
 but that's a question of taste.

 Docs could use some small grammar fixes, but other than that they're fine.

 == Code ==

 In \sf code there just a strncmp, so this works:
 \sfblablabla funcname


fixed

 The error for an empty \sf is not great, it should probably look more like
 \sf: missing required argument
 following the examples of \pset, \copy or \prompt.

 Why is lnptr always being passed as a pointer? Looks like a unnecessary
 complication and one more variable to care about. Can't we just pass lineno?

fixed

I removed redundant code and added more comments.

Regards

Pavel Stehule


 == End ==

 Cheers,
 Jan



editfce.diff
Description: Binary data

Re: [HACKERS] Synchronous replication

2010-07-21 Thread Aidan Van Dyk

* Fujii Masao masao.fu...@gmail.com [100721 03:49]:

  The patch provides quorum parameter in postgresql.conf, which
  specifies how many standby servers transaction commit will wait for
  WAL records to be replicated to, before the command returns a
  success indication to the client. The default value is zero, which
  always doesn't make transaction commit wait for replication without
  regard to replication_mode. Also transaction commit always doesn't
  wait for replication to asynchronous standby (i.e., replication_mode
  is set to async) without regard to this parameter. If quorum is more
  than the number of synchronous standbys, transaction commit returns
  a success when the ACK has arrived from all of synchronous standbys.
 
  There should be a way to specify wait for *all* connected standby servers
  to acknowledge
 
 Agreed. I'll allow -1 as the valid value of the quorum parameter, which
 means that transaction commit waits for all connected standbys.

Hm... so if my 1 synchronous standby is operating normally, and quorum
is set to 1, I'll get what I want (commit waits until it's safely on both
servers).  But what happens if my standby goes bad?  Suddenly the quorum
setting is ignored (because it's greater than the number of connected
standby servers?)  Is there a way for me to not allow any commits if the
quorum setting's number of standbys is *not* available?  Yes, I want my
db to halt in that situation, and yes, alarm bells will be ringing...

In reality, I'm likely to run 2 synchronous slaves, with a quorum of 1.
So 1 slave can fail and I can still have 2 going.  But if that 2nd slave
ever failed while the other was down, I definitely don't want the master
to forge on ahead!

Of course, this won't be for everyone, just as the current "just
connected standbys" isn't for everything either...
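
The difference between the described behaviour and the stricter one asked for here can be sketched in Python. This is a model of the rules as stated in this thread, not code from the patch; the function name and the strict flag are hypothetical.

```python
def acks_required(quorum: int, connected_sync_standbys: int,
                  strict: bool = False) -> int:
    """Return how many standby ACKs a commit waits for.

    quorum == -1 means "all connected synchronous standbys".
    With strict=False a quorum larger than the number of connected
    standbys is silently reduced, as described for the patch.
    With strict=True (hypothetical) the commit is refused instead.
    """
    if quorum == -1:
        return connected_sync_standbys
    if quorum > connected_sync_standbys:
        if strict:
            raise RuntimeError("not enough synchronous standbys connected")
        return connected_sync_standbys
    return quorum


# Two slaves, quorum 1: one slave may fail and commits still wait for 1 ACK.
normal = acks_required(1, 2)
degraded = acks_required(1, 1)

# Both slaves gone: the non-strict model no longer waits for anyone.
unprotected = acks_required(1, 0)

# The strict model halts instead.
try:
    acks_required(1, 0, strict=True)
    halted = False
except RuntimeError:
    halted = True
```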

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 8:14 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/7/21 Pavel Stehule pavel.steh...@gmail.com:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
 itagaki.takah...@gmail.com wrote:
 2010/7/20 Pavel Stehule pavel.steh...@gmail.com:
 here is a new version - new these functions are not a strict and
 function to_string is marked as stable.

 We have array_to_string(anyarray, text) and string_to_array(text, text),
 and you'll introduce to_string(anyarray, text, text) and
 to_array(text, text, text).
 Do we think it is good idea to have different names for them?  IMHO, we'd
 better  use 3 arguments version of array_to_string() instead of the
 new to_string() ?

 The worst part is that the new names are not very mnemonic.

 I think maybe what we really need here is array equivalents of
 COALESCE() and NULLIF().  It looks like the proposed to_string()
 function is basically equivalent to replacing each NULL entry with the
 array with a given value, and then doing array_to_string() as usual.
 And it looks like the proposed to_array function basically does the
 same thing as to_array(), and then replaces empty strings with NULL or
 some other value.

 Maybe we just need a function array_replace(anyarray, anyelement,
 anyelement) that replaces any element in the array that IS NOT
 DISTINCT FROM $2 with $3 and returns the new array.  That could be
 useful for other things besides this particular case, too.

 I don't agree. Building or updating any array is little bit expensive.
 There can be same performance issue like combination array_agg and
 array_to_string versus string_agg.

 But is it really bad enough to introduce custom versions of every
 function that might want to do this sort of thing?

 please look on 
 http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151475.html

 I am not alone  in opinion so current string to array functions has
 not good design

OK, I stand corrected, although I'm not totally convinced.  I still
think to_array() and to_string() are not a good choice of names.  I am
not sure if we should reuse the existing names (adding a third
parameter) or pick something else, like array_concat() and
split_to_array().

Also, should we consider putting these in contrib/stringfunc rather
than core?  Or is there enough support for core that we should stick
with doing it that way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

 OK, I stand corrected, although I'm not totally convinced.  I still
 think to_array() and to_string() are not a good choice of names.  I am
 not sure if we should reuse the existing names (adding a third
 parameter) or pick something else, like array_concat() and
 split_to_array().


It was discussed before. I would like to see some symmetry in the
names. The bad thing is that the great names string_to_array and
array_to_string are already used, and the second bad thing was done
three years ago, when nobody was thinking about NULL values. I don't
think we are able to repair the older functions; simply, the default
behaviour isn't optimal.

I think we have to decide about deprecating string_to_array and
array_to_string first. If these functions are deprecated, then we
can use similar names (and probably we should use similar names), so
text_to_array or array_to_string can be acceptable. If not, then this
discussion is needless: to_string and to_array then belong at most in
contrib (stringfunc is a good idea), and maybe we don't need to think
about new names.

 Also, should we consider putting these in contrib/stringfunc rather
 than core?  Or is there enough support for core that we should stick
 with doing it that way?


So that is one variant. I am not against moving these functions to
contrib/stringfunc.

I think we have to settle the question of marking string_to_array and
array_to_string as deprecated first. Then we can move forward. My
opinion is known - I am for removing these functions in the future and
replacing them with modernized functions.

Other opinions?

Can we move forward?

Regards

Pavel

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-21 Thread Sam Mason

On Tue, Jul 20, 2010 at 09:57:06AM +0400, Zotov wrote:
  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763)

You could try rewriting it to:

SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE d1.ID=234409763
  UNION
SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE d2.ID=234409763

This should have the same semantics as the original query.  I don't
believe PG knows how to do a rewrite like this at the moment.
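
A quick way to sanity-check the rewrite is to run both forms against a
toy table. The sketch below uses Python's sqlite3 module rather than
PostgreSQL, the rows are invented, and it assumes ID is unique (so the
duplicate elimination done by UNION cannot drop a row that the OR form
would return):

```python
import sqlite3

# Toy stand-in for DocPrimary; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocPrimary (ID INTEGER PRIMARY KEY, BasedOn INTEGER)")
conn.executemany(
    "INSERT INTO DocPrimary VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)],
)

OR_FORM = """
SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn = d1.ID
WHERE (d1.ID = :id) OR (d2.ID = :id)
"""

UNION_FORM = """
SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn = d1.ID
WHERE d1.ID = :id
  UNION
SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn = d1.ID
WHERE d2.ID = :id
"""


def rows(sql, doc_id):
    """Run one of the two query forms and return its rows in sorted order."""
    return sorted(conn.execute(sql, {"id": doc_id}).fetchall())
```

Comparing rows(OR_FORM, n) with rows(UNION_FORM, n) for a few values of
n shows whether the two forms agree on the toy data; it says nothing
about which plan PostgreSQL would pick for either.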

-- 
  Sam  http://samason.me.uk/


[HACKERS] Preliminary review of Synchronous Replication patches

2010-07-21 Thread Yeb Havinga


Hello Zoltán, Fujii and list,

Kevin asked me to do a preliminary review on both synchronous 
replication patches. Relevant posts on -hackers are:


(A) http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php
(B) 
http://archives.postgresql.org/message-id/aanlktilgyl3y1jkdvhx02433coq7jlmqicsqmosbu...@mail.gmail.com

(1) http://archives.postgresql.org/pgsql-hackers/2010-05/msg00746.php
(2) http://archives.postgresql.org/pgsql-hackers/2010-05/msg01047.php
(3) 
http://wiki.postgresql.org/wiki/Streaming_Replication#Synchronization_capability


The first patch (A) was posted by Zoltán Böszörményi three months ago, 
with comments on -hackers in thread (1). The second patch (B) was posted 
by Fujii Masao a few days ago.


Since both patches overlap in functionality, applying one in core means 
not applying the other. Initially I set out to do a complete review of 
both patches and let the difficult choice of preferring one over the 
other to fellow reviewers. However, for the following reasons I believe 
that patch (A) should probably be withdrawn and the review effort 
continued on (B).


* patch (A) was designed and programmed without prior community 
involvement. This in itself doesn't make it a bad patch nor a bad way of 
contributing source code; however, thread (1) shows that some issues were 
raised and that more ideas existed.
* one of the leaves of thread (A) was (4), where Zoltán Böszörményi hints 
that there might be a new version of the patch (replacing XIDs with LSNs). 
However, to date no new version has been posted. This in itself is also 
not grounds for rejection, but together with the existence of patch (B) it 
gives rise to the idea that work on (A) might have halted.
* the work on patch (B) actually started with the post (1), where Fujii 
Masao indicates he is going to write a patch too, and proposes to work 
together with Zoltán Böszörményi on the design.
* patch (B) encompasses the functionality of (A) and more; it also 
addresses some, if not all, of the design ideas that were raised in the 
comments on patch (A).


Adding this up I have the impression that patch (A) will not get a newer 
version, based on the fact that a newer patch (B) exists which has more 
functionality and is partly based on community feedback on patch (A), 
where patch (A) itself is not. Therefore I think that the focus and 
review time during this commitfest should be on patch (B), unless Zoltán 
Böszörményi disagrees and supplies a new version of this patch.


Depending on the reaction of Zoltán Böszörményi, I think patch (A) should 
be set to either Returned With Feedback, if a new version is in the 
making, or Rejected if not.


regards,
Yeb Havinga



Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-07-21 Thread Leonardo Francalanci

 I think writetup_rawheap() and readtup_rawheap() are a little  complex,
 but should work as long as there are no padding between t_len and  t_self
 in HeapTupleData struct.
 
 - It might be cleaner if you write the  total item length
   and tuple data separately.
 - (char *) tuple +  sizeof(tuplen) might be more robust
   than  tuple->t_self.


- I used your functions 
- changed the docs for CLUSTER (I don't know if they make sense/are enough)
- added a minor comment


2 questions:
 
1) about the copypaste from FormIndexDatum comment: how can I improve it?
The idea is that we could have a faster call, but it would mean copying and
pasting a lot of code from FormIndexDatum.

2) what other areas can I comment more?


  

sorted_cluster-20100721.patch
Description: Binary data


Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread Peter Eisentraut

On tis, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
 It's tempting to propose making .psqlrc apply only in interactive
 mode, period.  But that would be an incompatibility with previous
 releases, and I'm not sure it's the behavior we want, either.

What is a use case for having .psqlrc be read in noninteractive use?



Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 10:24 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
 It's tempting to propose making .psqlrc apply only in interactive
 mode, period.  But that would be an incompatibility with previous
 releases, and I'm not sure it's the behavior we want, either.

 What is a use case for having .psqlrc be read in noninteractive use?

Well, for example, if I hate the new ASCII format with a fiery passion
that can never be quenched (and, by the way, I do), then I'd like this
to apply:

\pset linestyle old-ascii

Even when I do this:

psql -c '...whatever...'

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] multibyte charater set in levenshtein function

2010-07-21 Thread Alexander Korotkov

On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas robertmh...@gmail.com wrote:

 This patch still needs some work.  It includes a bunch of stylistic
 changes that aren't relevant to the purpose of the patch.  There's no
 reason that I can see to change the existing levenshtein_internal
 function to take text arguments instead of char *, or to change !m to
 m == 0 in existing code, or to change the whitespace in the comments
 of that function.  All of those need to be reverted before we can
 consider committing this.

I changed the arguments of the function from char * to text * in order to
avoid the text_to_cstring call. The same benefit can be achieved by
replacing char * with char * plus a length.
I changed !m to m == 0 because Itagaki asked me to make it conform to the
coding style. Do you think there is no reason to fix coding style in
existing code?

 There is a huge amount of duplicated code here.  I think it makes
 sense to have the multibyte version of the function be separate, but
 perhaps we can merge the less-than-or-equal to bits  into the main
 code, so that we only have two copies instead of four.  Perhaps we
 can't just add a max_d argument max_distance to levenshtein_internal;
 and if this value is >=0 then it represents the max allowable
 distance, but if it is <0 then there is no limit.  Sure, that might
 slow down the existing code a bit, but it might not be significant.
 I'd at least like to see some numbers showing that it is significant
 before we go to this much trouble.

In this case we would have to add many checks of max_d to the
levenshtein_internal function, which makes the code more complex.
Actually, we could merge all four functions into one function. But such a
function would have many checks for multibyte encoding and max_d. So, I
see four cases here:
1) one function with checks for multibyte encoding and max_d
2) two functions with checks for multibyte encoding
3) two functions with checks for max_d
4) four separate functions
If you prefer case number 3, you should argue your position a little more.
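
To make the trade-off concrete, here is a rough sketch of what a single
routine with a max_d argument could look like. It is written in Python
for brevity and is not the fuzzystrmatch code; the convention that a
negative max_d means "no limit" follows the quoted suggestion, while the
early-exit details and the choice to return max_d + 1 when the limit is
exceeded are assumptions made for this illustration. Python strings are
sequences of code points, so the multibyte question does not arise here.

```python
def levenshtein(s, t, max_d=-1):
    """Edit distance between s and t.

    If max_d >= 0, the exact distance is returned when it is <= max_d,
    and max_d + 1 is returned otherwise. If max_d < 0 there is no limit.
    """
    m, n = len(s), len(t)
    limited = max_d >= 0

    # The distance can never be smaller than the length difference.
    if limited and abs(m - n) > max_d:
        return max_d + 1
    if m == 0:
        return n
    if n == 0:
        return m

    prev = list(range(m + 1))
    for j in range(1, n + 1):
        curr = [j] + [0] * m
        for i in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[i] = min(prev[i] + 1,        # deletion
                          curr[i - 1] + 1,    # insertion
                          prev[i - 1] + cost) # substitution
        # Row minima never decrease, so once a whole row exceeds
        # max_d the final distance must exceed it too.
        if limited and min(curr) > max_d:
            return max_d + 1
        prev = curr

    d = prev[m]
    return max_d + 1 if limited and d > max_d else d
```

The extra cost in the unlimited case is one comparison per row plus the
final check, which is the kind of overhead the quoted review asks to be
measured before the functions are duplicated.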

 The code doesn't follow the project coding style.  Braces must be
 uncuddled.  Comment paragraphs will be reflowed unless they begin and
 end with --.  Function definitions should have the type
 declaration on one line and the function name at the start of the
 next.

 Freeing memory with pfree is likely a waste of time; is there any
 reason not to just rely on the memory context reset, as the original
 coding did?

Ok, I'll fix these things.


 I think we might need to remove the acknowledgments section from this
 code.  If everyone who touches this code adds their name, we're
 quickly going to have a mess.  If we're not going to remove the
 acknowledgments section, then please add my name, too, because I've
 already patched this code once...

In that case I think we can leave the original acknowledgments section.


With best regards,
Alexander Korotkov.

Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Jonathan Corbet

On Tue, 20 Jul 2010 14:34:20 -0400
Robert Haas robertmh...@gmail.com wrote:

 I have some concerns related to the upcoming conversion to git and how
 we're going to avoid having things get messy as people start using the
 new repository.

Here's a few responses from the point of view of somebody who has been
working with git in the kernel community for some years now.  Hopefully
it's helpful...

 1. Inability to cleanly and easily (and programatically) identify who
 committed what.  

No, git tracks committer information separately, and it's easily
accessible.  Dig into the grungy details of git-log and you'll see that you
can get out just about anything you need, in any format.

IMHO, vandalizing the author field would be a mistake; it's your best way
of tracking where the patch came from and for ensuring credit in your
changelogs.  Why throw away information?
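
As an illustration of pulling author and committer out separately, the
sketch below wraps git log with a format string built from the standard
placeholders %h (abbreviated hash), %an (author name), %cn (committer
name) and %s (subject), separated by tabs (%x09). The helper names
parse_log and log_entries are made up for this example, and the sample
names in the usage note are invented.

```python
import subprocess

# Tab-separated: abbreviated hash, author name, committer name, subject.
LOG_FORMAT = "%h%x09%an%x09%cn%x09%s"


def parse_log(text):
    """Turn the tab-separated git log output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        commit, author, committer, subject = line.split("\t", 3)
        entries.append({
            "commit": commit,
            "author": author,
            "committer": committer,
            "subject": subject,
        })
    return entries


def log_entries(rev_range="HEAD", run=subprocess.check_output):
    """Run git log in the current repository and parse its output.

    The run argument exists only so the function can be exercised
    without a real repository.
    """
    out = run(["git", "log", f"--format={LOG_FORMAT}", rev_range])
    return parse_log(out.decode())
```

For a line such as "abc1234<TAB>Alice Author<TAB>Carol Committer<TAB>Fix
a thing", the author and committer come back as separate fields, so a
changelog script can credit one and audit the other.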

 2. Branch and tag management.  In CVS, there are branches and tags in
 only one place: on the server.  In git, you can have local branches
 and tags and remote branches and tags, and you can pull and push tags
 between servers.  If I'm working on a git repository that has branches
 master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
 numeric_2b, and temprelnames, I want to make sure that I don't
 accidentally push the last three of those to the authoritative
 server... but I do want to push all the others.  Similarly I want to
 push only the corrects subset of tags (though that should be less of
 an issue, at least for me, as I don't usually create local tags).  I'm
 not sure how to set this up, though.

Branch push policy can be tweaked in your local config.  I'm less sure
about tags.  It's worth noting that the kernel community does very little
with push in general - things are much more often pulled.  That may not be
a workflow that's suitable for postgresql, though.

 3. Merge commits.  I believe that we have consensus that commits
 should always be done as a squash, so that the history of all of our
 branches is linear.  But it seems to me that someone could
 accidentally push a merge commit, either because they forgot to squash
 locally, or because of a conflict between their local git repo's
 master branch and origin/master.  Can we forbid this?

That seems like a terrible idea to me - why would you destroy history?
Obviously I've missed a discussion here.  But, the first time somebody
wants to use bisect to pinpoint a regression-causing patch, you'll wish you
had that information there.

 4. History rewriting.  Under what circumstances, if any, are we OK
 with rebasing the master?  For example, if we decide not to have merge
 commits, and somebody does a merge commit anyway, are we going to
 rebase to get rid of it?

A good general rule of thumb is to treat publicly-exposed history as
immutable.  As soon as you start rebasing trees you create misery for
anybody working with those trees.

If you're really set on avoiding things like merges, there are ways to set
up scripts on the server to enforce policies.
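
A minimal sketch of such a server-side policy, written as a Python
"update" hook, which git invokes with the ref name, the old revision and
the new revision. It relies only on git rev-list --merges; the function
names are made up for this example, it is not the hook anyone in this
thread has deployed, and for a newly created ref it naively inspects the
whole history reachable from the new revision.

```python
import subprocess
import sys

ZERO = "0" * 40  # revision git passes for a ref that does not exist


def merge_commits(oldrev, newrev, run=subprocess.check_output):
    """Return the merge commits that the push would add to the ref."""
    # For a brand-new ref there is no old revision to exclude, so this
    # sketch simply looks at everything reachable from newrev.
    rev_range = newrev if oldrev == ZERO else f"{oldrev}..{newrev}"
    out = run(["git", "rev-list", "--merges", rev_range])
    return out.decode().split()


def main(argv):
    # An update hook is called as: update <refname> <oldrev> <newrev>
    refname, oldrev, newrev = argv[1:4]
    if newrev == ZERO:  # ref deletion: nothing to inspect
        return 0
    merges = merge_commits(oldrev, newrev)
    if merges:
        print(f"rejecting {refname}: contains merge commits {merges}",
              file=sys.stderr)
        return 1  # a non-zero exit status makes git refuse the update
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Installed as hooks/update in the server repository and made executable,
a script along these lines refuses the push instead of requiring a later
rebase of published history.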

jon


Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread David Christensen


On Jul 21, 2010, at 9:42 AM, Robert Haas wrote:

 On Wed, Jul 21, 2010 at 10:24 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
 It's tempting to propose making .psqlrc apply only in interactive
 mode, period.  But that would be an incompatibility with previous
 releases, and I'm not sure it's the behavior we want, either.
 
 What is a use case for having .psqlrc be read in noninteractive use?
 
 Well, for example, if I hate the new ASCII format with a fiery passion
 that can never be quenched (and, by the way, I do), then I'd like this
 to apply:
 
 \pset linestyle old-ascii
 
 Even when I do this:
 
 psql -c '...whatever...'


Well, tossing out two possible solutions:

1) .psqlrc + .psql_profile (kinda like how bash separates out the 
interactive/non-interactive parts).  Kinda yucky, but it's a working solution.

2) have a flag which explicitly includes the psqlrc file in non-interactive use 
(perhaps if -x is available, use it for the analogue to -X).

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com






Re: [HACKERS] Preliminary review of Synchronous Replication patches

2010-07-21 Thread Kevin Grittner

Yeb Havinga yebhavi...@gmail.com wrote:
 
 Kevin asked me to do a preliminary review on both synchronous 
 replication patches.
 
Thanks for doing so.
 
BTW, Yeb has emailed me off-list that he has more specific notes on
both patches, but has run into high priority items on his day job
which will prevent him from posting those for another day or two.
 
 Since both patches overlap in functionality, applying one in core
 means not applying the other. Initially I set out to do a complete
 review of both patches and let the difficult choice of preferring
 one over the other to fellow reviewers. However, for the following
 reasons I believe that patch (A) should probably be withdrawn and
 the review effort continued on (B).
 
Unless there are objections, I will mark the patch by Zoltán
Böszörményi as Returned with Feedback in a couple days, and ask that
everyone interested in this feature focus on advancing the patch by
Fujii Masao.  Given the scope and importance of this area, I think
we could benefit from another person or two signing on officially as
Reviewers.
 
-Kevin


Re: [HACKERS] sql/med review - problems with patching

2010-07-21 Thread Pavel Stehule

Hello

I am playing with foreign tables now.

I found a few small issues:

* foreign tables are not dumped by pg_dump
* autocomplete for CREATE FOREIGN DATA WRAPPER doesn't offer the HANDLER
keyword (probably it isn't your problem)
* ERROR:  unrecognized objkind: 18 issue

create table omega(a int, b int, c int);
insert into omega select i, i+1, i+2 from generate_series(1,1000,3) g(i);


postgres=# SELECT * from pg_foreign_server ;
 srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
---------+----------+--------+---------+------------+--------+------------
 fake    |    16384 |  16385 |         |            |        |
(1 row)

postgres=# SELECT * from pg_foreign_data_wrapper ;
 fdwname | fdwowner | fdwvalidator | fdwhandler | fdwacl | fdwoptions
---------+----------+--------------+------------+--------+------------
 xx      |    16384 |         3120 |       3121 |        |
(1 row)

COPY omega to '/tmp/omega';

CREATE FOREIGN TABLE omega3(a int, b int, c int) SERVER fake OPTIONS
(filename '/tmp/omega');

create role tom;
grant select on omega2 to tom;

There was some unstable behaviour - the first call of select * from
omega finished with ERROR:  unrecognized objkind: 18 (I couldn't
reproduce it later :( )

The second call finished with the correct exception:

ERROR:  must be superuser to COPY to or from a file
HINT:  Anyone can COPY to stdout or from stdin. psql's \copy command
also works for anyone.

Does this security limit still have to apply? I understand the limit
for the COPY statement, but I don't see the sense of it for a foreign
table. I agree - only a superuser can CREATE FOREIGN TABLE based on the
file fdw handler. But why does access via MED have to be limited?

I am very happy with the implementation of file_fdw_handler. It is proof
that LIMIT isn't a problem, and I don't understand why it has to be a
problem for the dblink handler.


postgres=# select count(*) from omega2;
  count
---------
 3335004
(1 row)

Time: 1915,281 ms
postgres=# select count(*) from omega2;
  count
---------
 3335004
(1 row)

Time: 1921,744 ms

postgres=# select count(*) from (select * from omega2 limit 1000) x;
 count
-------
  1000
(1 row)

Time: 1,597 ms

From a practical point of view, I would like to see the options used
for any table. I am missing more detailed info in the \d command.

Regards

Pavel




2010/7/20 Itagaki Takahiro itagaki.takah...@gmail.com:
 2010/7/14 Pavel Stehule pavel.steh...@gmail.com:
 please, can you refresh patch, please?

 Updated patch attached. The latest version is always in the git repo.
 http://repo.or.cz/w/pgsql-fdw.git   (branch: fdw)
 I'm developing the patch on postgres' git repo. So, regression test
 for dblink might fail because of out-of-sync issue between cvs and git.

 When I looked to documentation I miss a some tutorial for foreign
 tables. There are only reference. I miss some paragraph where is
 cleanly and simple specified what is possible now and whot isn't
 possible. Enhancing of dblink isn't documented

 Sure. I'll start to write documentation when we agree the design of FDW.

 In function  pgIterate(ForeignScanState *scanstate) you are iterare
 via pg result. I am thinking so using a cursor and fetching multiple
 rows should be preferable.

 Sure, but I'm thinking that it will be improved after libpq supports
 protocol-level cursor. The libpq improvement will be applied
 much more applications including postgresql_fdw.

 --
 Itagaki Takahiro



Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 11:31 AM, David Christensen da...@endpoint.com wrote:

 On Jul 21, 2010, at 9:42 AM, Robert Haas wrote:

 On Wed, Jul 21, 2010 at 10:24 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
 It's tempting to propose making .psqlrc apply only in interactive
 mode, period.  But that would be an incompatibility with previous
 releases, and I'm not sure it's the behavior we want, either.

 What is a use case for having .psqlrc be read in noninteractive use?

 Well, for example, if I hate the new ASCII format with a fiery passion
 that can never be quenched (and, by the way, I do), then I'd like this
 to apply:

 \pset linestyle old-ascii

 Even when I do this:

 psql -c '...whatever...'


 Well, tossing out two possible solutions:

 1) .psqlrc + .psql_profile (kinda like how bash separates out the 
 interactive/non-interactive parts).  Kinda yucky, but it's a working solution.

 2) have a flag which explicitly includes the psqlrc file in non-interactive 
 use (perhaps if -x is available, use it for the analogue to -X).

Hmm.  Well, that still doesn't solve the problem that -c and -f do
different things with respect to psqlrc, does it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 It was discussed before. I would to see some symmetry in names.

That's reasonable.

 The
 bad thing is so great names like string_to_array and array_to_string
 is used,

Yeah, those names are not too good.

 and second bad thing was done three years ago when nobody
 thinking about NULL values. I don't think, so we are able to repair
 older functions - simply the default behave isn't optimal.

This is a matter of opinion, but certainly it's not right for everyone.

 I am thinking so we have to do decision about string_to_array and
 array_to_string deprecation first. If these function will be
 deprecated, then we can use a similar names (and probably we should to
 use a similar names) - so text_to_array or array_to_string can be
 acceptable. If not, then this discus is needless - then to_string and
 to_array have to be maximally in contrib - stringfunc is good idea -
 and maybe we don't need thinking about new names.

Well, -1 from me for deprecating string_to_array and array_to_string.

I am not in favor of the names to_string and to_array even if we put
them in contrib, though.  The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Brendan Jurd

On 22 July 2010 01:55, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 I am thinking so we have to do decision about string_to_array and
 array_to_string deprecation first.

 Well, -1 from me for deprecating string_to_array and array_to_string.


For what it's worth, I agree with Pavel about the current behaviour in
core.  It's broken whenever NULLs come into play.  We need to improve
on this one way or another, and I think it would be a shame to deal
with a problem in core by adding something to contrib.
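
To show what goes wrong, here is a small Python model of the behaviour
under discussion, with None standing in for SQL NULL. It is only a model
and not the PostgreSQL implementation: the two-argument form skips NULL
elements when joining, so a round trip cannot restore them, while an
optional third argument (called null_str here, a name invented for this
sketch) that spells NULLs out makes the round trip lossless. A third
parameter is one of the options floated earlier in the thread.

```python
def array_to_string(arr, sep, null_str=None):
    """Join the elements of arr with sep.

    With null_str left as None, NULL (None) elements are skipped, which
    models the existing two-argument behaviour. Otherwise each NULL is
    written out as null_str.
    """
    parts = []
    for item in arr:
        if item is None:
            if null_str is None:
                continue
            parts.append(null_str)
        else:
            parts.append(item)
    return sep.join(parts)


def string_to_array(text, sep, null_str=None):
    """Split text on sep; fields equal to null_str come back as None."""
    return [None if null_str is not None and field == null_str else field
            for field in text.split(sep)]
```

With data = ['a', None, 'c'], the two-argument round trip yields
['a', 'c'], while passing the same null_str to both functions gives the
original list back.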

 I am not in favor of the names to_string and to_array even if we put
 them in contrib, though.  The problem with string_to_array and
 array_to_string is that they aren't descriptive enough, and
 to_string/to_array is even less so.

What about implode() and explode()?  It's got symmetry and it's
possibly more descriptive.

Cheers,
BJ


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 It was discussed before. I would to see some symmetry in names.

 That's reasonable.

 The
 bad thing is so great names like string_to_array and array_to_string
 is used,

 Yeah, those names are not too good.

 and second bad thing was done three years ago when nobody
 thinking about NULL values. I don't think, so we are able to repair
 older functions - simply the default behave isn't optimal.

 This is a matter of opinion, but certainly it's not right for everyone.

 I am thinking so we have to do decision about string_to_array and
 array_to_string deprecation first. If these function will be
 deprecated, then we can use a similar names (and probably we should to
 use a similar names) - so text_to_array or array_to_string can be
 acceptable. If not, then this discus is needless - then to_string and
 to_array have to be maximally in contrib - stringfunc is good idea -
 and maybe we don't need thinking about new names.

 Well, -1 from me for deprecating string_to_array and array_to_string.

 I am not in favor of the names to_string and to_array even if we put
 them in contrib, though.  The problem with string_to_array and
 array_to_string is that they aren't descriptive enough, and
 to_string/to_array is even less so.


I am not a native English speaker, so I have a different feeling.
These functions do array_serialisation and array_deserialisation, but
those names are too long. I have no idea for better names - text-array,
array-text is descriptive enough (for me) - and these names show the
symmetry between the functions very cleanly. I have to repeat - it is
very clear for a non-native speaker.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 10:49 AM, Jonathan Corbet cor...@lwn.net wrote:
 1. Inability to cleanly and easily (and programatically) identify who
 committed what.

 No, git tracks committer information separately, and it's easily
 accessible.  Dig into the grungy details of git-log and you'll see that you
 can get out just about anything you need, in any format.

 IMHO, vandalizing the author field would be a mistake; it's your best way
 of tracking where the patch came from and for ensuring credit in your
 changelogs.  Why throw away information?

If git had a place to store all the information we care about, that
would be fine, but it doesn't.  Here's a recent attribution line I
used:

David Christensen. Reviewed by Steve Singer.  Some further changes by me.

There's no reviewer header, and there's no concept that a patch
might have come from the author (or perhaps multiple authors), but
then have been adjusted by one or more reviewers and then frobnicated
some more by the committer.  I'm not sure it's possible to create a
system that can effectively store all the ways we give credit and
attribution, but a single author line is definitely not it.  How much
do I have to change a patch before I would use my own name on the
author line rather than the patch author's?  A single byte?  A
non-whitespace change?  More than 15% of the patch?  And, oh by the
way, it's the committer who writes the commit message, not the patch
author, so at an *absolute minimum* that part of the commit object
isn't coming from the original author.

 2. Branch and tag management.  In CVS, there are branches and tags in
 only one place: on the server.  In git, you can have local branches
 and tags and remote branches and tags, and you can pull and push tags
 between servers.  If I'm working on a git repository that has branches
 master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
 numeric_2b, and temprelnames, I want to make sure that I don't
 accidentally push the last three of those to the authoritative
 server... but I do want to push all the others.  Similarly I want to
 push only the corrects subset of tags (though that should be less of
 an issue, at least for me, as I don't usually create local tags).  I'm
 not sure how to set this up, though.

 Branch push policy can be tweaked in your local config.  I'm less sure
 about tags.  It's worth noting that the kernel community does very little
 with push in general - things are much more often pulled.  That may not be
 a workflow that's suitable for postgresql, though.

Seems like we've got this one worked out, per discussion upthread.

 3. Merge commits.  I believe that we have consensus that commits
 should always be done as a squash, so that the history of all of our
 branches is linear.  But it seems to me that someone could
 accidentally push a merge commit, either because they forgot to squash
 locally, or because of a conflict between their local git repo's
 master branch and origin/master.  Can we forbid this?

 That seems like a terrible idea to me - why would you destroy history?
 Obviously I've missed a discussion here.  But, the first time somebody
 wants to use bisect to pinpoint a regression-causing patch, you'll wish you
 had that information there.

In any commit pattern, if I use bisect to pinpoint a regression
causing patch, I will find the commit that broke it.  Whoever made
that particular commit is to blame.  Full stop.  I don't really care
where in the development of the patch that was eventually committed
the breakage happened, and I do not want to wade through 50 revs of
somebody's private development to find the particular place where they
made a thinko.  I only care that their patch *as committed* is broken.
 I don't think that non-linear history is an advantage in any
situation.  It may be an unavoidable necessity if you have lots of
cross-merging between different repositories, but we don't, so for us
it's just clutter.

 4. History rewriting.  Under what circumstances, if any, are we OK
 with rebasing the master?  For example, if we decide not to have merge
 commits, and somebody does a merge commit anyway, are we going to
 rebase to get rid of it?

 A good general rule of thumb is to treat publicly-exposed history as
 immutable.  As soon as you start rebasing trees you create misery for
 anybody working with those trees.

 If you're really set on avoiding things like merges, there are ways to set
 up scripts on the server to enforce policies.

Yep, Magnus coded it up today.  It works great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


[HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

At the developer meeting, I promised to do the work of documenting how
committers should use git.  So here's a first version.

http://wiki.postgresql.org/wiki/Committing_with_Git

Note that while anyone is welcome to comment, I mostly care about
whether the document is adequate for our existing committers, rather
than whether someone who is not a committer thinks we should manage
the project differently... that might be an interesting discussion,
but we're theoretically making this switch in about a month, and
getting agreement on changing our current workflow will take about a
decade, so there is not time now to do the latter before we do the
former.  So I would ask everyone to consider postponing those
discussions until after we've made the switch and ironed out the
kinks.  On the other hand, if you have technical corrections, or if
you have suggestions on how to do the same things better (rather than
suggestions on what to do differently), that would be greatly
appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 12:08 PM, Brendan Jurd dire...@gmail.com wrote:
 On 22 July 2010 01:55, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I think we have to decide about deprecating string_to_array and
 array_to_string first.

 Well, -1 from me for deprecating string_to_array and array_to_string.


 For what it's worth, I agree with Pavel about the current behaviour in
 core.  It's broken whenever NULLs come into play.  We need to improve
 on this one way or another, and I think it would be a shame to deal
 with a problem in core by adding something to contrib.

Fair enough.  I'm OK with putting it in core if we can come up with
suitable names.

 I am not in favor of the names to_string and to_array even if we put
 them in contrib, though.  The problem with string_to_array and
 array_to_string is that they aren't descriptive enough, and
 to_string/to_array is even less so.

 What about implode() and explode()?  It's got symmetry and it's
 possibly more descriptive.

Hmm, it's a thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I think we have to decide about deprecating string_to_array and
 array_to_string first. If these functions are deprecated, then we can
 use similar names (and probably we should) - so text_to_array or
 array_to_string would be acceptable. If not, then this discussion is
 needless - then to_string and to_array belong at most in contrib
 (stringfunc is a good idea), and maybe we don't need to think about
 new names.

 Well, -1 from me for deprecating string_to_array and array_to_string.

 I am not in favor of the names to_string and to_array even if we put
 them in contrib, though.  The problem with string_to_array and
 array_to_string is that they aren't descriptive enough, and
 to_string/to_array is even less so.

 I am not a native English speaker, so I have a different feeling.
 These functions do array_serialisation and array_deserialisation, but
 those names are too long. I have no idea for better names - to me,
 text-array and array-text are descriptive enough, and these names
 show the symmetry between the functions very clearly. I have to
 repeat - it is very clear to a non-native speaker.

Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how.  And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.

array_split() and array_join(), following Perl?  array_implode() and
array_explode(), along the lines suggested by Brendan?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Michael Glaesemann


On Jul 21, 2010, at 12:30 , Robert Haas wrote:

 array_split() and array_join(), following Perl?

+1. Seems common in other languages such as Ruby, Python, and Java as well.

Michael Glaesemann
grzm seespotcode net




Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread Simon Riggs

On Wed, 2010-07-21 at 17:24 +0300, Peter Eisentraut wrote:
 On Tue, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
  It's tempting to propose making .psqlrc apply only in interactive
  mode, period.  But that would be an incompatibility with previous
  releases, and I'm not sure it's the behavior we want, either.
 
 What is a use case for having .psqlrc be read in noninteractive use?

Changing the historical defaults, such as error/exit behaviour, ensuring
timing is on, etc.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 4:33 AM, Markus Wanner mar...@bluegap.ch wrote:
 Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.

 Note that I sort of /want/ to mess with shared memory. It's what I know how
 to deal with. It's how threaded programs work as well. Ya know, locks,
 condition variables, mutexes, all those nice things that allow you to shoot
 your foot so terribly nicely... Oh, well...

For what it's worth, I feel your pain.  I think the SLRU method is
*probably* better, but I feel your pain anyway.

 For each backend, you store one pointer to the first queued
 message and one pointer to the last queued message.  New messages can
 be added by making the current last message point to a newly added
 message and updating the last message pointer for that backend.  You'd
 need to think about the locking and reference counting carefully to
 make sure you eventually freed up unused pages, but it seems like it
 might be doable.

 I've just read through slru.c, but still don't have a clue how it could
 replace a dynamic allocator.

 At the moment, the creator of an imessage allocs memory, copies the payload
 there and then activates the message by appending it to the recipient's
 queue. Upon getting signaled, the recipient consumes the message by removing
 it from the queue and is obliged to release the memory the message occupies
 after having processed it. Simple and straightforward, IMO.

 The queue addition and removal is clear. But how would I do the alloc/free
 part with SLRU? Its blocks are fixed size (BLCKSZ) and the API with ReadPage
 and WritePage is rather unlike a pair of alloc() and free().

Given what you're trying to do, it does sound like you're going to
need some kind of an algorithm for space management; but you'll be
managing space within the SLRU rather than within shared_buffers.  For
example, you might end up putting a header on each SLRU page or
segment and using that to track the available freespace within that
segment for messages to be read and written.  It'll probably be a bit
more complex than the one for listen (see asyncQueueAddEntries).
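
A minimal sketch of the per-page header idea, written in Python for brevity
rather than in the C of slru.c. It is not the SLRU API: PAGE_SIZE stands in
for BLCKSZ, and HEADER_SIZE, Page and PagedQueue are hypothetical names
chosen only for illustration. Each page tracks how many bytes are in use, and
a message that does not fit on the current page starts a new one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PAGE_SIZE = 8192    # stands in for BLCKSZ; the value is an assumption
HEADER_SIZE = 8     # hypothetical size of the per-page header


@dataclass
class Page:
    # The "header": number of bytes in use on this page, header included.
    used: int = HEADER_SIZE
    messages: List[bytes] = field(default_factory=list)

    def free_space(self) -> int:
        return PAGE_SIZE - self.used


class PagedQueue:
    """Append-only message queue laid out on fixed-size pages."""

    def __init__(self) -> None:
        self.pages: List[Page] = [Page()]

    def append(self, payload: bytes) -> Tuple[int, int]:
        """Store payload and return (page number, slot within that page)."""
        if len(payload) > PAGE_SIZE - HEADER_SIZE:
            raise ValueError("payload larger than one page")
        page = self.pages[-1]
        if page.free_space() < len(payload):
            # Not enough room left: open a fresh page.
            page = Page()
            self.pages.append(page)
        page.messages.append(payload)
        page.used += len(payload)
        return len(self.pages) - 1, len(page.messages) - 1
```

Freeing pages once every message on them has been consumed, and the locking
around the header, are left out; those are the parts that would need the
careful thought mentioned above.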

 One big advantage of attacking the problem with an SLRU is that
 there's no fixed upper limit on the amount of data that can be
 enqueued at any given time.  You can spill to disk or whatever as
 needed (although hopefully you won't normally do so, for performance
 reasons).

 Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
 upper limit for them. (Be it total available memory, as for threaded things
 or a given and size-constrained pool, as is the case for dynshmem).

I guess experience has taught me to be wary of things that are wired
in memory.  Under extreme memory pressure, something's got to give, or
the whole system will croak.  Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes).  Then we'd really rather not still have
memory carved out for it.

 To me it rather sounds like SLRU is a candidate for using dynamically
 allocated shared memory underneath, instead of allocating a fixed amount of
 slots in advance. That would allow more efficient use of shared memory.
 (Given SLRU's ability to spill to disk, it could even be used to 'balance'
 out anomalies to some extent).

I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] Explicit psqlrc

2010-07-21 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of Wed Jul 21 10:24:26 -0400 2010:
 On Tue, 2010-07-20 at 11:48 -0400, Robert Haas wrote:
  It's tempting to propose making .psqlrc apply only in interactive
  mode, period.  But that would be an incompatibility with previous
  releases, and I'm not sure it's the behavior we want, either.
 
 What is a use case for having .psqlrc be read in noninteractive use?

Even if there weren't one, why does it get applied to -f but not -c?
They're both noninteractive.

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 Well, the problem is that array_to_string(), for example, tells you
 that an array is being converted to a string, but not how.  And
 to_string() tells you that you're getting a string, but it doesn't
 tell you either what you're getting it from or how you're getting it.
 We already have a function to_char() which can be used to format a
 whole bunch of different types as strings; I can't see adding a new
 function with almost the same name that does something completely
 different.

 array_split() and array_join(), following Perl?  array_implode() and
 array_explode(), along the lines suggested by Brendan?

I have a problem with array_split, because what is split there is a
string. I looked on the net, and languages usually use split and join;
split is a method of the String class in Java. So if I am following
Perl, I feel better with just split and join, but join is a keyword :( -
stepping back, maybe string_split and array_join?

select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');

so my preferences:

1. split, join - I checked - we are able to create a join function
2. split, array_join - if only join is a problem
3. string_split, array_join - the symmetry is not clean, but it
respects the widely used semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot like array_split - it is a contradiction for me.
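
The intended symmetry of option 3 can be sketched as follows, in Python
rather than SQL or C. This is only an illustration of the semantics under
discussion, not the patch: the function names are taken from option 3, while
the null_text parameter is a hypothetical way of addressing the NULL problem
raised earlier in the thread. None plays the role of SQL NULL.

```python
from typing import List, Optional, Sequence


def string_split(text: str, sep: str,
                 null_text: Optional[str] = None) -> List[Optional[str]]:
    """Split text on sep; fields equal to null_text become None (NULL)."""
    parts = text.split(sep)
    return [None if null_text is not None and p == null_text else p
            for p in parts]


def array_join(items: Sequence[object], sep: str,
               null_text: Optional[str] = None) -> str:
    """Join items with sep; None is skipped unless null_text is given."""
    out = []
    for item in items:
        if item is None:
            if null_text is None:
                continue            # NULL elements are silently dropped
            out.append(null_text)   # NULL elements are spelled out
        else:
            out.append(str(item))
    return sep.join(out)
```

With a null_text supplied on both sides, array_join followed by string_split
gives back the NULL positions; without it, NULL elements simply vanish from
the joined string and cannot be recovered.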

Pavel

p.s. It is a typical use case for packages - with them, we could have the
functions string.split() and array.join()


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I have a problem with array_split, because what is split there is a
 string. I looked on the net, and languages usually use split and join;
 split is a method of the String class in Java. So if I am following
 Perl, I feel better with just split and join, but join is a keyword :( -
 stepping back, maybe string_split and array_join?

 select string_split('1,2,3,4',',');
 select array_join(array[1,2,3,4],',');

 so my preferences:

 1. split, join - I checked - we are able to create a join function
 2. split, array_join - if only join is a problem
 3. string_split, array_join - the symmetry is not clean, but it
 respects the widely used semantics - string.split, array.join
 4. explode, implode
 5. array_explode, array_implode
 -- I cannot like array_split - it is a contradiction for me.

Well, I guess I prefer my suggestion to any of those (I know... what a
surprise), but I think I could live with #3, #4, or #5.  It's hard for
me to imagine that we really want to create a function called just
join(), given the other meanings that JOIN already has in SQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 Well, I guess I prefer my suggestion to any of those (I know... what a
 surprise), but I think I could live with #3, #4, or #5.  It's hard for
 me to imagine that we really want to create a function called just
 join(), given the other meanings that JOIN already has in SQL.

It has no relation to the SQL language, but I don't expect that something
like this would be accepted by Tom :). So for the moment we are in
agreement on #3, #4, #5. I think we can wait one or two days for the
opinions of others, and then I'll fix the patch. OK?

Regards
Pavel


Re: [HACKERS] multibyte charater set in levenshtein function

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 7:40 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas robertmh...@gmail.com wrote:
 This patch still needs some work.  It includes a bunch of stylistic
 changes that aren't relevant to the purpose of the patch.  There's no
 reason that I can see to change the existing levenshtein_internal
 function to take text arguments instead of char *, or to change !m to
 m == 0 in existing code, or to change the whitespace in the comments
 of that function.  All of those need to be reverted before we can
 consider committing this.

 I changed the arguments of the function from char * to text * in order to
 avoid the text_to_cstring call.

*scratches head*  Aren't you just moving the same call to a different place?

 The same benefit can be achieved by replacing char * with
 char * and length.
 I changed !m to m == 0 because Itagaki asked me to make it conform to the
 coding style. Do you think there is no reason to fix coding style in
 existing code?

Yeah, we usually try to avoid changing that sort of thing in existing
code, unless there's a very good reason.

 There is a huge amount of duplicated code here.  I think it makes
 sense to have the multibyte version of the function be separate, but
 perhaps we can merge the less-than-or-equal-to bits into the main
 code, so that we only have two copies instead of four.  Perhaps we
 can just add a max_d (max_distance) argument to levenshtein_internal;
 if this value is >= 0 then it represents the max allowable
 distance, but if it is < 0 then there is no limit.  Sure, that might
 slow down the existing code a bit, but it might not be significant.
 I'd at least like to see some numbers showing that it is significant
 before we go to this much trouble.

 In that case we would have to add many checks of max_d to the
 levenshtein_internal function, which would make the code more complex.

When you say many checks, how many?

 Actually, we can merge all four functions into one function. But such a
 function would have many checks for multibyte encoding and max_d. So, I see
 four cases here:
 1) one function with checks for multibyte encoding and max_d
 2) two functions with checks for multibyte encoding
 3) two functions with checks for max_d
 4) four separate functions
 If you prefer case number 3 you should argue your position a little more.

I'm somewhat convinced that separating the multibyte case out has a
performance benefit both by intuition and because you posted some
numbers, but I haven't seen any argument for separating out the other
case, so I'm asking if you've checked and whether there is an effect
and whether it's significant.  The default is always to try to avoid
maintaining multiple copies of substantially identical code, due to
the danger that a future patch might fail to update all of them and
thus introduce a bug.
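
As a rough illustration of the single-function idea (a max_d argument where a
negative value means "no limit"), here is a sketch in Python. It is not the
code from the patch and says nothing about the C-level cost being debated;
the name levenshtein and the convention of returning max_d + 1 once the limit
is exceeded are assumptions made for the example. Python strings are
sequences of code points, so the multibyte question does not arise here.

```python
def levenshtein(s: str, t: str, max_d: int = -1) -> int:
    """Edit distance between s and t.

    If max_d >= 0, the exact distance is returned when it is <= max_d,
    and max_d + 1 is returned otherwise. If max_d < 0 there is no limit.
    """
    m, n = len(s), len(t)
    bounded = max_d >= 0

    # The distance is at least the difference in lengths.
    if bounded and abs(m - n) > max_d:
        return max_d + 1

    prev = list(range(n + 1))            # distances for the empty prefix of s
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        # The final distance can never be smaller than the row minimum,
        # so the loop can stop as soon as the whole row exceeds max_d.
        if bounded and min(curr) > max_d:
            return max_d + 1
        prev = curr

    if bounded and prev[n] > max_d:
        return max_d + 1
    return prev[n]
```

In this form the bounded variant costs one extra comparison before the loop,
one row-minimum check per row, and one check at the end. Whether the
equivalent checks are measurable in the C implementation is exactly the
question that the numbers asked for above would answer.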

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 2:25 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 It has no relation to the SQL language, but I don't expect that something
 like this would be accepted by Tom :). So for the moment we are in
 agreement on #3, #4, #5. I think we can wait one or two days for the
 opinions of others, and then I'll fix the patch. OK?

Yeah, I'd like some more votes, too.  Aside from what I suggested
(array_join/array_split), I think my favorite is your #5.

We might also want to put some work into documenting the differences
between the old and new functions clearly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] Add column if not exists (CINE)

2010-07-21 Thread Bernd Helmle




--On 1 May 2010 23:09:23 -0400 Robert Haas robertmh...@gmail.com wrote:


On Wed, Apr 28, 2010 at 9:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

CREATE OR REPLACE is indeed much more complicated.  In fact, for
tables, I maintain that you'll need to link with -ldwim to make it
work properly.


This may in fact be an appropriate way to handle the case for tables,
given the complexity of their definitions.


Patch attached.




I had an initial look at Robert's patch. Patch applies cleanly, 
documentation and regression tests included, everything works as expected. 
When looking at the functionality there's one thing that strikes me a 
little:


be...@localhost:bernd #*= CREATE TABLE IF NOT EXISTS foo(id int);
ERROR:  duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(foo, 2200) already exists.

This is what you get from concurrent CINE commands. The typname thingie
might be confusing for inexperienced users, but I think it's hard to do
anything about it?





--
Thanks

Bernd

Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-21 Thread Markus Wanner


Hi,

first of all, thanks for your feedback, I enjoy the discussion.

On 07/21/2010 07:25 PM, Robert Haas wrote:

Given what you're trying to do, it does sound like you're going to
need some kind of an algorithm for space management; but you'll be
managing space within the SLRU rather than within shared_buffers.  For
example, you might end up putting a header on each SLRU page or
segment and using that to track the available freespace within that
segment for messages to be read and written.  It'll probably be a bit
more complex than the one for listen (see asyncQueueAddEntries).


But what would that buy us? Also consider that pretty much all available 
dynamic allocators use shared memory (either from the OS directly, or 
via mmap()'d area).



Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
upper limit for them. (Be it total available memory, as for threaded things
or a given and size-constrained pool, as is the case for dynshmem).


I guess experience has taught me to be wary of things that are wired
in memory.  Under extreme memory pressure, something's got to give, or
the whole system will croak.


I absolutely agree with that last sentence. However, experience has taught
/me/ to be wary of things that needlessly swap to disk for hours before
reporting any kind of error (AKA swap hell). I prefer systems that
adjust to the OOM condition, instead of just ignoring it and falling
back to disk (which doesn't provide infinite space, so that's just
pushing the limits).


The solution for imessages certainly isn't spilling to disk, which would 
consume even more resources. Instead the process(es) for which there are 
pending imessages should be allowed to consume them.


That's why upon OOM, IMessageCreate currently simply blocks the process
that wants to create an imessage. And yes, that's not quite perfect
(that process should still consume messages for itself), and it might 
not play well with other potential users of dynamically allocated 
memory. But it certainly works better than spilling to disk (and yes, I 
tested that behavior within Postgres-R).
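
The blocking behaviour described here can be sketched as a size-limited pool
whose allocate call waits until a consumer has released enough space. This is
a Python illustration only, using threads where Postgres-R uses processes and
shared memory; BoundedPool and its methods are hypothetical names, not the
IMessageCreate implementation.

```python
import threading
from typing import Optional


class BoundedPool:
    """Fixed-capacity pool: allocate() blocks while the pool is exhausted."""

    def __init__(self, capacity: int) -> None:
        self._free = capacity
        self._cond = threading.Condition()

    def allocate(self, size: int, timeout: Optional[float] = None) -> bool:
        """Reserve size bytes, waiting for space; False if the wait timed out."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self._free >= size, timeout)
            if ok:
                self._free -= size
            return ok

    def release(self, size: int) -> None:
        """Give size bytes back and wake up any waiting allocators."""
        with self._cond:
            self._free += size
            self._cond.notify_all()
```

A creator that hits the limit sleeps in allocate() and resumes as soon as a
recipient has consumed a message and called release(). The imperfection noted
above still applies: a blocked creator is not consuming its own messages
while it waits.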



Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes).  Then we'd really rather not still have
memory carved out for it.


Huh? That's exactly what dynamic allocation could give you: not having 
memory carved out for stuff you currently don't need, but instead being 
able to dynamically use memory where most needed. SLRU has memory (not 
disk space) carved out for pretty much every sub-system separately, if 
I'm reading that code correctly.



I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.


..well, just add the shared_buffer pool to the list of candidates that 
could use dynamically allocated shared memory. It would need some 
thinking about boundaries (i.e. when to spill to disk, for those modules 
that /want/ to spill to disk) and dealing with OOM situations, but 
that's about it.


Regards

Markus

Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Dimitri Fontaine

Aidan Van Dyk ai...@highrise.ca writes:
 * Robert Haas robertmh...@gmail.com [100720 13:04]:
  
 3. Clone the origin once.  Apply patches to multiple branches by
 switching branches.  Playing around with it, this is probably a
 tolerable way to work when you're only going back one or two branches
 but it's certainly a big nuisance when you're going back 5-7 branches.

 This is what I do when I'm working on a project that has completely
 proper dependencies, and you don't need to always re-run configure
 between different branches.  I use ccache heavily, so configure takes
 longer than a complete build with a couple-dozen
 actually-not-previously-seen changes...

 But *all* dependencies need to be proper in the build system, or you end
 up needing a git-clean-type-cleanup between branch switches, forcing a
 new configure run too, which takes too much time...

 Maybe this will cause make dependencies to be refined in PG ;-)

Well, there's also the VPATH possibility, where all your build objects
are stored out of the way of the repo. So you could check out the branch
you're interested in, change to the associated build directory and
build there. And automate that of course.
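
One way to automate that, as a rough sketch only. The directory layout (SRC, BUILD_ROOT) and the helper names are assumptions made for illustration, and the script just prints the commands it would run instead of running them. It relies on the usual autoconf out-of-tree convention of calling the source tree's configure from inside the build directory.

```shell
# Hypothetical layout: one clone in $SRC, one build directory per branch
# under $BUILD_ROOT. Both defaults are placeholders.
SRC=${SRC:-$HOME/src/postgresql}
BUILD_ROOT=${BUILD_ROOT:-$HOME/build/postgresql}

# Map a branch name to its private build directory.
build_dir_for_branch() {
    printf '%s/%s\n' "$BUILD_ROOT" "$1"
}

# Print (dry run) the steps for building one branch out of tree.
vpath_build_plan() {
    branch=$1
    dir=$(build_dir_for_branch "$branch")
    echo "git -C $SRC checkout $branch"
    echo "mkdir -p $dir"
    echo "cd $dir"
    # configure is only needed when the build directory is new
    echo "test -f config.status || $SRC/configure"
    echo "make"
}

vpath_build_plan REL9_0_STABLE
```

Keeping one build directory per branch means the object files of one branch are never overwritten by a build of another.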

Regards,
-- 
dim


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Peter Eisentraut

On Wed, 2010-07-21 at 12:22 -0400, Robert Haas wrote:
 At the developer meeting, I promised to do the work of documenting how
 committers should use git.  So here's a first version.
 
 http://wiki.postgresql.org/wiki/Committing_with_Git

Looks good.  Please consolidate this with the Committers page when the
day comes.

Comments:

3. ... your name and email address must match those configured on the
server

== How do we know what those are?  Who controls that?

6. Finally, you must push your changes back to the server.

git push

This will push changes in all branches you've updated, but only branches
that also exist on the remote side will be pushed; thus, you can have
local working branches that won't be pushed.

== This is true, but I have found it saner to configure push.default =
tracking, so that only the current branch is pushed.  Some people might
find that useful.
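
A minimal sketch of that setting, tried out in a throwaway repository so that no real checkout is touched (the scratch directory is only for illustration). Newer git releases call the same behavior "upstream" and keep "tracking" as a deprecated synonym.

```shell
set -e
# Throwaway repository so the experiment cannot affect a real checkout.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Push only the current branch, to the branch it tracks.
git config push.default tracking

# Read the value back to confirm what a bare "git push" will use.
push_default=$(git config --get push.default)
echo "push.default = $push_default"
```

Adding --global to the git config call would apply the setting to every repository of that user instead of just one.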




Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 21:07, Peter Eisentraut pete...@gmx.net wrote:
 On Wed, 2010-07-21 at 12:22 -0400, Robert Haas wrote:
 At the developer meeting, I promised to do the work of documenting how
 committers should use git.  So here's a first version.

 http://wiki.postgresql.org/wiki/Committing_with_Git

 Looks good.  Please consolidate this with the Committers page when the
 day comes.

 Comments:

 3. ... your name and email address must match those configured on the
 server

 == How do we know what those are?  Who controls that?

sysadmins team. It's set up when committers are added, just like
today's authormap on the git mirror. Before we set up the system,
we'll double check all of them with each committer, of course.


 6. Finally, you must push your changes back to the server.

 git push

 This will push changes in all branches you've updated, but only branches
 that also exist on the remote side will be pushed; thus, you can have
 local working branches that won't be pushed.

 == This is true, but I have found it saner to configure push.default =
 tracking, so that only the current branch is pushed.  Some people might
 find that useful.

Indeed. Why don't I do that more often...

+1 on making that a general recommendation, and have people only not
do that if they really know what they're doing :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Andrew Dunstan




Jonathan Corbet wrote:

3. Merge commits.  I believe that we have consensus that commits
should always be done as a squash, so that the history of all of our
branches is linear.  But it seems to me that someone could
accidentally push a merge commit, either because they forgot to squash
locally, or because of a conflict between their local git repo's
master branch and origin/master.  Can we forbid this?



That seems like a terrible idea to me - why would you destroy history?
Obviously I've missed a discussion here.  But, the first time somebody
wants to use bisect to pinpoint a regression-causing patch, you'll wish you
had that information there.

  


We have a clear idea of what should be part of the public history 
contained in the authoritative repo and what should be history that is 
private to the developer/tester/committer. We don't want to pollute the 
former with the latter. The level of granularity of our current CVS 
commits seems to us to be about right.


So when a committer pushes a patch it should add one fast-forward commit 
to the tree. We want to be able to bisect between these commit objects, 
but not between all the work product commits that led up to them. Of 
course, developers, committers and testers can keep what they like 
privately - we're only talking about what should go in the authoritative 
repo.


cheers

andrew




[HACKERS] need more ALTER TABLE guards for typed tables

2010-07-21 Thread Peter Eisentraut

After some investigation I figured that I need to add two more checks
into the ALTER TABLE code to prevent certain types of direct changes to
typed tables (see attached patch).

But it's not clear to me whether such checks should go into the Prep
or the Exec phases.  Prep seems more plausible to me, but some
commands such as DropColumn don't have a Prep handler.  A clarification
would be helpful.

Index: src/backend/commands/tablecmds.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v
retrieving revision 1.332
diff -u -3 -p -r1.332 tablecmds.c
--- src/backend/commands/tablecmds.c	6 Jul 2010 19:18:56 -	1.332
+++ src/backend/commands/tablecmds.c	21 Jul 2010 14:34:41 -
@@ -5788,6 +5788,11 @@ ATPrepAlterColumnType(List **wqueue,
 	NewColumnValue *newval;
 	ParseState *pstate = make_parsestate(NULL);
 
+	if (rel->rd_rel->reloftype)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot alter column type of typed table")));
+
 	/* lookup the attribute so we can check inheritance status */
 	tuple = SearchSysCacheAttName(RelationGetRelid(rel), colName);
 	if (!HeapTupleIsValid(tuple))
@@ -7126,6 +7131,11 @@ ATExecAddInherit(Relation child_rel, Ran
 	int32		inhseqno;
 	List	   *children;
 
+	if (child_rel->rd_rel->reloftype)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot change inheritance of typed table")));
+
 	/*
 	 * AccessShareLock on the parent is what's obtained during normal CREATE
 	 * TABLE ... INHERITS ..., so should be enough here.


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 3:11 PM, Magnus Hagander mag...@hagander.net wrote:
 6. Finally, you must push your changes back to the server.

 git push

 This will push changes in all branches you've updated, but only branches
 that also exist on the remote side will be pushed; thus, you can have
 local working branches that won't be pushed.

 == This is true, but I have found it saner to configure push.default =
 tracking, so that only the current branch is pushed.  Some people might
 find that useful.

 Indeed. Why don't I do that more often...

 +1 on making that a general recommendation, and have people only not
 do that if they really know what they're doing :-)

Hmm, I didn't know about that option.  What makes us think that's the
behavior people will most often want?  Because it doesn't seem like
what I want, just for one example...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 21:20, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 3:11 PM, Magnus Hagander mag...@hagander.net wrote:
 6. Finally, you must push your changes back to the server.

 git push

 This will push changes in all branches you've updated, but only branches
 that also exist on the remote side will be pushed; thus, you can have
 local working branches that won't be pushed.

 == This is true, but I have found it saner to configure push.default =
 tracking, so that only the current branch is pushed.  Some people might
 find that useful.

 Indeed. Why don't I do that more often...

 +1 on making that a general recommendation, and have people only not
 do that if they really know what they're doing :-)

 Hmm, I didn't know about that option.  What makes us think that's the
 behavior people will most often want?  Because it doesn't seem like
 what I want, just for one example...

It'd be what I want for everything *except* when doing backpatching.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread David Christensen


On Jul 21, 2010, at 2:20 PM, Robert Haas wrote:

 On Wed, Jul 21, 2010 at 3:11 PM, Magnus Hagander mag...@hagander.net wrote:
 6. Finally, you must push your changes back to the server.
 
 git push
 
 This will push changes in all branches you've updated, but only branches
 that also exist on the remote side will be pushed; thus, you can have
 local working branches that won't be pushed.
 
 == This is true, but I have found it saner to configure push.default =
 tracking, so that only the current branch is pushed.  Some people might
 find that useful.
 
 Indeed. Why don't I do that more often...
 
 +1 on making that a general recommendation, and have people only not
 do that if they really know what they're doing :-)
 
 Hmm, I didn't know about that option.  What makes us think that's the
 behavior people will most often want?  Because it doesn't seem like
 what I want, just for one example...


So you're working on some back branch, and make a WIP commit so you can switch 
to master to make a quick commit.  Create a push on master.  Bare git push.  
WIP commit gets pushed upstream.  Oops.
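
The scenario can be reproduced in a scratch setup. This is only a sketch: the bare origin.git, the branch name REL9_0_STABLE and the commit contents are made up for illustration, push.default = matching stands in for the bare "git push" behavior described in the wiki text, and "upstream" is the newer spelling of "tracking".

```shell
set -e
tmp=$(mktemp -d)

# A bare repository stands in for the central server.
git init -q --bare "$tmp/origin.git"

# The committer's working repository.
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"
git config user.email "committer@example.org"
git remote add origin "$tmp/origin.git"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL9_0_STABLE
git push -q -u origin master REL9_0_STABLE >/dev/null 2>&1

# WIP commit on the back branch, then a quick fix on master.
git checkout -q REL9_0_STABLE
echo "half-finished" >> file.txt
git commit -q -am "WIP: do not push"
git checkout -q master
echo fix > fix.txt
git add fix.txt
git commit -q -m "Quick fix"

# Helper: what does the server currently have for a branch?
remote_ref() { git --git-dir="$tmp/origin.git" rev-parse "refs/heads/$1"; }

# With upstream/tracking, a bare push sends only the current branch.
git -c push.default=upstream push -q >/dev/null 2>&1
if [ "$(remote_ref REL9_0_STABLE)" = "$(git rev-parse REL9_0_STABLE)" ]; then
    wip_after_upstream=yes
else
    wip_after_upstream=no
fi

# With matching, the same bare push also sends the WIP commit.
git -c push.default=matching push -q >/dev/null 2>&1
if [ "$(remote_ref REL9_0_STABLE)" = "$(git rev-parse REL9_0_STABLE)" ]; then
    wip_after_matching=yes
else
    wip_after_matching=no
fi

echo "WIP on server after upstream push: $wip_after_upstream"
echo "WIP on server after matching push: $wip_after_matching"
```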

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com






Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 3:23 PM, David Christensen da...@endpoint.com wrote:

 On Jul 21, 2010, at 2:20 PM, Robert Haas wrote:

 On Wed, Jul 21, 2010 at 3:11 PM, Magnus Hagander mag...@hagander.net wrote:
 6. Finally, you must push your changes back to the server.

 git push

 This will push changes in all branches you've updated, but only branches
 that also exist on the remote side will be pushed; thus, you can have
 local working branches that won't be pushed.

 == This is true, but I have found it saner to configure push.default =
 tracking, so that only the current branch is pushed.  Some people might
 find that useful.

 Indeed. Why don't I do that more often...

 +1 on making that a general recommendation, and have people only not
 do that if they really know what they're doing :-)

 Hmm, I didn't know about that option.  What makes us think that's the
 behavior people will most often want?  Because it doesn't seem like
 what I want, just for one example...


 So you're working on some back branch, and make a WIP commit so you can 
 switch to master to make a quick commit.  Create a push on master.  Bare git 
 push.  WIP commit gets pushed upstream.  Oops.

Sure, oops, but I would never do that.  I'd stash it or put it on a
topic branch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Alvaro Herrera

Excerpts from Dimitri Fontaine's message of Wed Jul 21 15:00:48 -0400 2010:

 Well, there's also the VPATH possibility, where all your build objects
 are stored out of the way of the repo. So you could check out the branch
 you're interested in, change to the associated build directory and
 build there. And automate that of course.

This does not work as cleanly as you suppose, because some build
objects are stored in the source tree.  configure being one of them.
So if you switch branches, configure is rerun even in a VPATH build,
which is undesirable.


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed Jul 21 15:26:47 -0400 2010:

  So you're working on some back branch, and make a WIP commit so you can 
  switch to master to make a quick commit.  Create a push on master.  Bare 
  git push.  WIP commit gets pushed upstream.  Oops.
 
 Sure, oops, but I would never do that.  I'd stash it or put it on a
 topic branch.

Somebody else will.  Please remember you're writing docs that are not
for yourself.


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of Wed Jul 21 15:11:41 -0400 2010:
 
 Jonathan Corbet wrote:

  That seems like a terrible idea to me - why would you destroy history?
  Obviously I've missed a discussion here.  But, the first time somebody
  wants to use bisect to pinpoint a regression-causing patch, you'll wish you
  had that information there.
 

 So when a committer pushes a patch it should add one fast-forward commit 
 to the tree. We want to be able to bisect between these commit objects, 
 but not between all the work product commits that led up to them. Of 
 course, developers, committers and testers can keep what they like 
 privately - we're only talking about what should go in the authoritative 
 repo.

I don't disagree that we're going to squash commits, but I don't believe
that developers will be able to keep what they like privately.  The
commit objects for the final patch are going to differ, if only because
they have different parents than the ones on the main branch.

Of course, they will be able to have a local branch with their local
patch, but to Git there will be no relationship between this branch and
the final, squashed patch in the authoritative repo.


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Andrew Dunstan




Robert Haas wrote:

At the developer meeting, I promised to do the work of documenting how
committers should use git.  So here's a first version.

http://wiki.postgresql.org/wiki/Committing_with_Git

Note that while anyone is welcome to comment, I mostly care about
whether the document is adequate for our existing committers, rather
than whether someone who is not a committer thinks we should manage
the project differently... that might be an interesting discussion,
but we're theoretically making this switch in about a month, and
getting agreement on changing our current workflow will take about a
decade, so there is not time now to do the latter before we do the
former.  So I would ask everyone to consider postponing those
discussions until after we've made the switch and ironed out the
kinks.  On the other hand, if you have technical corrections, or if
you have suggestions on how to do the same things better (rather than
suggestions on what to do differently), that would be greatly
appreciated.
  


Well, either we have a terminology problem or a statement of policy that 
I'm not sure I agree with, in point 2.  IMNSHO, what we need to forbid 
is commits that are not fast-forward commits, i.e. that do not have the 
current branch head as an ancestor, ideally as the immediate ancestor.


Personally, I have a strong opinion that for everything but totally 
trivial patches, the committer should create a short-lived work branch 
where all the work is done, and then do a squash merge back to the main 
branch, which is then pushed. This pattern is not mentioned at all. In 
my experience, it is essential, especially if you're working on more 
than one thing at a time, as many people often are.


cheers

andrew


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Magnus Hagander

On Wed, Jul 21, 2010 at 21:37, Andrew Dunstan and...@dunslane.net wrote:


 Robert Haas wrote:

 At the developer meeting, I promised to do the work of documenting how
 committers should use git.  So here's a first version.

 http://wiki.postgresql.org/wiki/Committing_with_Git

 Note that while anyone is welcome to comment, I mostly care about
 whether the document is adequate for our existing committers, rather
 than whether someone who is not a committer thinks we should manage
 the project differently... that might be an interesting discussion,
 but we're theoretically making this switch in about a month, and
 getting agreement on changing our current workflow will take about a
 decade, so there is not time now to do the latter before we do the
 former.  So I would ask everyone to consider postponing those
 discussions until after we've made the switch and ironed out the
 kinks.  On the other hand, if you have technical corrections, or if
 you have suggestions on how to do the same things better (rather than
 suggestions on what to do differently), that would be greatly
 appreciated.


 Well, either we have a terminology problem or a statement of policy that I'm
 not sure I agree with, in point 2.  IMNSHO, what we need to forbid is
 commits that are not fast-forward commits, i.e. that do not have the current
 branch head as an ancestor, ideally as the immediate ancestor.

 Personally, I have a strong opinion that for everything but totally trivial
 patches, the committer should create a short-lived work branch where all the
 work is done, and then do a squash merge back to the main branch, which is
 then pushed. This pattern is not mentioned at all. In my experience, it is
 essential, especially if you're working on more than one thing at a time, as
 many people often are.

Uh, that's going to create an actual merge commit, no? Or you mean
squash-merge-but-only-fast-forward?

I *think* the docs are based on the pattern of the committer having
two repositories - one for his own work, one for committing, much like
I assume all of us have today in cvs.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread David Christensen


On Jul 21, 2010, at 2:39 PM, Magnus Hagander wrote:

 On Wed, Jul 21, 2010 at 21:37, Andrew Dunstan and...@dunslane.net wrote:
 
 
 Robert Haas wrote:
 
 At the developer meeting, I promised to do the work of documenting how
 committers should use git.  So here's a first version.
 
 http://wiki.postgresql.org/wiki/Committing_with_Git
 
 Note that while anyone is welcome to comment, I mostly care about
 whether the document is adequate for our existing committers, rather
 than whether someone who is not a committer thinks we should manage
 the project differently... that might be an interesting discussion,
 but we're theoretically making this switch in about a month, and
 getting agreement on changing our current workflow will take about a
 decade, so there is not time now to do the latter before we do the
 former.  So I would ask everyone to consider postponing those
 discussions until after we've made the switch and ironed out the
 kinks.  On the other hand, if you have technical corrections, or if
 you have suggestions on how to do the same things better (rather than
 suggestions on what to do differently), that would be greatly
 appreciated.
 
 
 Well, either we have a terminology problem or a statement of policy that I'm
 not sure I agree with, in point 2.  IMNSHO, what we need to forbid is
 commits that are not fast-forward commits, i.e. that do not have the current
 branch head as an ancestor, ideally as the immediate ancestor.
 
 Personally, I have a strong opinion that for everything but totally trivial
 patches, the committer should create a short-lived work branch where all the
 work is done, and then do a squash merge back to the main branch, which is
 then pushed. This pattern is not mentioned at all. In my experience, it is
 essential, especially if you're working on more than one thing at a time, as
 many people often are.
 
 Uh, that's going to create an actual merge commit, no? Or you mean
 squash-merge-but-only-fast-forward?
 
 I *think* the docs are based on the pattern of the committer having
 two repositories - one for his own work, one for committing, much like
 I assume all of us have today in cvs.

You can also do a rebase after the merge to remove the local merge commit 
before pushing.  I tend to do this anytime I merge a local branch, just to 
rebase on top of the most recent origin/master.

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com






Re: [HACKERS] need more ALTER TABLE guards for typed tables

2010-07-21 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of Wed Jul 21 15:18:58 -0400 2010:
 After some investigation I figured that I need to add two more checks
 into the ALTER TABLE code to prevent certain types of direct changes to
 typed tables (see attached patch).
 
 But it's not clear to me whether such checks should go into the Prep
 or the Exec phases.  Prep seems more plausible to me, but some
 commands such as DropColumn don't have a Prep handler.  A clarification
 would be helpful.

I think if there's no Prep phase, you should add it.  I don't think it
makes sense to have this kind of check in Exec.


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Andrew Dunstan




Magnus Hagander wrote:

Personally, I have a strong opinion that for everything but totally trivial
patches, the committer should create a short-lived work branch where all the
work is done, and then do a squash merge back to the main branch, which is
then pushed. This pattern is not mentioned at all. In my experience, it is
essential, especially if you're working on more than one thing at a time, as
many people often are.



Uh, that's going to create an actual merge commit, no? Or you mean
squash-merge-but-only-fast-forward?
  


Yes, exactly that. Something like:

   git checkout -b myworkbranch
   ... work, test, commit, rinse, lather repeat ...
   git checkout RELn_m_STABLE
   git pull
   git merge --squash myworkbranch
   git push
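
For anyone who wants to try this pattern, here is a self-contained sketch in a scratch repository. The file contents and commit messages are invented, master stands in for RELn_m_STABLE, and the pull/push steps are left out because there is no remote. One detail worth noting: git merge --squash only stages the combined change, so a git commit is needed before the push.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"
git config user.email "committer@example.org"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Short-lived work branch with several intermediate commits.
git checkout -q -b myworkbranch
echo one >> file.txt
git commit -q -am "work in progress 1"
echo two >> file.txt
git commit -q -am "work in progress 2"

# Squash everything into one commit on the main branch.
git checkout -q master
git merge -q --squash myworkbranch >/dev/null
git commit -q -m "Add feature as a single commit"

# The result has exactly one parent, so the history stays linear.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
master_commits=$(git rev-list --count master)

# Git records no ancestry between the work branch and the squashed commit.
if git merge-base --is-ancestor myworkbranch master; then
    related=yes
else
    related=no
fi

echo "parents of squashed commit: $parent_count"
echo "commits on master: $master_commits"
echo "work branch is an ancestor of master: $related"
```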


I *think* the docs are based on the pattern of the committer having
two repositories - one for his own work, one for committing, much like
I assume all of us have today in cvs.

  


So then what? After you've done your work you'll still need to pull the 
stuff somehow into your commit tree. I don't think this will buy you a 
lot. I usually clone the whole CVS tree for non-trivial work, but I'm 
not sure that's an ideal work pattern.


cheers

andrew


Re: [HACKERS] antisocial things you can do in git (but not CVS)

2010-07-21 Thread Jonathan Corbet

On Wed, 21 Jul 2010 15:11:41 -0400
Andrew Dunstan and...@dunslane.net wrote:

 We have a clear idea of what should be part of the public history 
 contained in the authoritative repo and what should be history that is 
 private to the developer/tester/committer. We don't want to pollute the 
 former with the latter.

The thought makes me shudder...you lose the history, the reasons for
specific changes, the authorship of changes, and the ability of your
testers to pinpoint problematic changes.  But...your project, your
decision...we'll keep using PostgreSQL regardless...:)

Thanks,

jon


[HACKERS] Git conversion progress report and call for testing assistance

2010-07-21 Thread Magnus Hagander

Here's a status update on the git conversion, as well as a call for some help
mainly in testing.

After testing a bunch of tools, I've found that using cvs2git is by far the
best option when keeping keywords. It's the one that gives only the issues
that I posted about a couple of days ago.

So I've gone ahead with this tool and built what would be the procedure for
creating the real repository once we switch. This means I've scripted the
removal of the $PostgreSQL$ tags from the tip of the active branches as one
big commit after the migration.

I've also set up the git server and the scripts around it, that we can
eventually use. This includes commit email sending, commit policy enforcement
(no merge commits, correct author/committer tag etc) and proper access control
(a modified version of the one on git.postgresql.org - since we definitely
don't want any external dependencies for the main repository).
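
As an illustration of what such policy checks could look like, here is a rough sketch. It is not the code in the actual scripts: the function name check_commit and the exact rules (at most one parent, author identical to committer) are assumptions made for the example.

```shell
# Hypothetical policy check: returns non-zero if the commit is a merge
# or if its author differs from its committer.
check_commit() {
    rev=$1
    # rev-list --parents prints the commit id followed by its parents.
    nparents=$(( $(git rev-list --parents -n 1 "$rev" | wc -w) - 1 ))
    author=$(git log -1 --format='%an <%ae>' "$rev")
    committer=$(git log -1 --format='%cn <%ce>' "$rev")
    status=0
    if [ "$nparents" -gt 1 ]; then
        echo "$rev: merge commits are not allowed"
        status=1
    fi
    if [ "$author" != "$committer" ]; then
        echo "$rev: author ($author) differs from committer ($committer)"
        status=1
    fi
    return $status
}

# Demo in a scratch repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Committer"
git config user.email "committer@example.org"

echo a > a.txt
git add a.txt
git commit -q -m "Normal commit"
good=$(git rev-parse HEAD)

echo b > b.txt
git add b.txt
git commit -q --author="Someone Else <someone@example.org>" -m "Commit with a foreign author"
bad=$(git rev-parse HEAD)

if check_commit "$good"; then echo "first commit accepted"; fi
if ! check_commit "$bad"; then echo "second commit rejected"; fi
```

In a server-side update hook, a function like this would be applied to every commit in the pushed range, for example those listed by git rev-list oldrev..newrev.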

This is all available for testing now.

Marc has set up a mailing list at pgsql-committers-t...@postgresql.org where
commit messages from the new system are sent. If you care about what they look
like, subscribe there and wait for one to show up :-) Subscription is done
the usual way.

Anonymous users can view the repository at git.postgresql.org using gitweb
or the git:// protocol, under the name postgresql-migration.
DISCLAIMER: DO NOT BASE ANY WORK OFF THIS REPOSITORY. IT *WILL* BE RECREATED
SEVERAL TIMES AND MAY CHANGE COMPLETELY!

Existing committers have been set up to access the new repository at
ssh://g...@gitmaster.postgresql.org/postgresql.git.
Robert Haas has written some instructions for how to use this - please read
and review those.

And in general, a call to committers: please test this! Now is the time, not
after we've migrated ;) Just throw in some random commits, and some non-random
ones, both to get yourself familiar with the workflow and to iron out the bugs
in the scripts (I'm sure they're there).


For those interested in what's done, the scripts running this are all up on
github at http://github.com/mhagander/pg_githooks. The root contains the
scripts for commit messages, policy enforcement and access control.

There's also a temporary directory called migration that contains the
scripts and configuration files that are in use for the version of the
repository that is up there now. There's some minor plumbing around these
that isn't up there yet, but in general it is all that's used. And note that
if you want to play with it, the script uses around 8 GB of temp disk space
when running, so make sure you have enough space if you do it in a VM...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 3:31 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Wed Jul 21 15:26:47 -0400 2010:

  So you're working on some back branch, and make a WIP commit so you can 
  switch to master to make a quick commit.  Create a push on master.  Bare 
  git push.  WIP commit gets pushed upstream.  Oops.

 Sure, oops, but I would never do that.  I'd stash it or put it on a
 topic branch.

 Somebody else will.  Please remember you're writing docs that are not
 for yourself.

I don't have any problem suggesting it for those who may want it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


[HACKERS] git config user.email

2010-07-21 Thread Robert Haas

We need to decide what email addresses committers will use on the new
git repository when they commit.  Although I think we have more votes
(at least from committers) for always having author == committer,
rather than possibly setting the author tag to some other value, the
issue exists independently of that.  I believe we want to try to set
things up so that committers will not need to change the email address
they are using to commit even if their employment situation changes.
Because if that happens, then it becomes more difficult to keep track
of who is who.

My initial suggestion was to say that everyone should just be
usern...@postgresql.org; but I think that met with some resistance.
Magnus, for example, tells me that he is a committer for multiple
projects, and is mag...@hagander.net at all of them.  Since that's a
domain name he owns personally, it seems safe enough.  But I'm
inclined to think we should avoid things like rh...@commandprompt.com,
just on the off chance JD decides to fire me.

Of course, I expect there might be some dissenting voices on this
point, so... thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


[HACKERS] Disk caching

2010-07-21 Thread mac_man2...@yahoo.it


Hi to all.

I am trying to see how PostgreSQL performance changes as a function of 
work_mem. So, I am going to execute the 22 TPC-H queries 
(http://www.tpc.org/tpch/) again and again, each time with a different 
value of work_mem.
Since I am interested only in the effect of work_mem, I should prevent each 
query from benefiting from previous executions of the 22 queries 
themselves, for example through caching. So, taking into account 
that the 22 queries are those at http://pastebin.com/7Dg50YRZ and are 
executed on tables of hundreds of MB:


1) Is it sufficient to change the value of work_mem through psql 
and run the queries again without restarting postgres?


2) Or, should I restart postgres?

3) Or, should I restart the machine each time I execute the 22 queries?

Thanks for your time.
Regards.

Manolo.


Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Pavel Stehule

2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 2:25 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/7/21 Robert Haas robertmh...@gmail.com:
 On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 I think we have to decide about deprecating string_to_array and
 array_to_string first. If these functions will be deprecated, then we
 can use similar names (and probably we should use similar names) - so
 text_to_array or array_to_string would be acceptable. If not, then this
 discussion is needless - then to_string and to_array should at most go
 into contrib - stringfunc is a good idea - and maybe we don't need to
 think about new names.

 Well, -1 from me for deprecating string_to_array and array_to_string.

 I am not in favor of the names to_string and to_array even if we put
 them in contrib, though.  The problem with string_to_array and
 array_to_string is that they aren't descriptive enough, and
 to_string/to_array is even less so.

 I am not a native English speaker, so I have a different feeling.
 These functions do array_serialisation and array_deserialisation, but
 those names are too long. I have no idea for better names - text-array
 and array-text are descriptive enough (for me) - and these names show
 the symmetry between the functions very clearly. I have to repeat - it
 is very clear to a non-native speaker.

 Well, the problem is that array_to_string(), for example, tells you
 that an array is being converted to a string, but not how.  And
 to_string() tells you that you're getting a string, but it doesn't
 tell you either what you're getting it from or how you're getting it.
 We already have a function to_char() which can be used to format a
 whole bunch of different types as strings; I can't see adding a new
 function with almost the same name that does something completely
 different.

 array_split() and array_join(), following Perl?  array_implode() and
 array_explode(), along the lines suggested by Brendan?

 I have a problem with array_split - because there a string is split. I
 looked on the net - and languages usually use split or join. split
 is a method of the String class in Java. So when I am following Perl, I
 feel better with just split and join, but join is a keyword :( -
 step back, maybe string_split X array_join ?

 select string_split('1,2,3,4',',');
 select array_join(array[1,2,3,4],',');

 so my preferences:

 1. split, join - I checked - we are able to create a join function
 2. split, array_join - if only join is a problem
 3. string_split, array_join - there is no clean symmetry, but it
 respects the widely used semantics - string.split, array.join
 4. explode, implode
 5. array_explode, array_implode
 -- I cannot like array_split - it is a contradiction to me.
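
For comparison, the string.split / array.join convention that preference #3 refers to can be sketched in Python. This is only an illustration of the naming symmetry: the two function names mirror the proposal and are hypothetical here, and the sketch does not model how the eventual PostgreSQL functions would treat NULL elements.

```python
def string_split(text: str, delimiter: str) -> list:
    """Hypothetical counterpart of the proposed string_split(): text -> array."""
    return text.split(delimiter)


def array_join(items: list, delimiter: str) -> str:
    """Hypothetical counterpart of the proposed array_join(): array -> text."""
    # Elements are converted to text first, as a SQL array-to-text function would.
    return delimiter.join(str(item) for item in items)


parts = string_split("1,2,3,4", ",")
joined = array_join([1, 2, 3, 4], ",")
print(parts)
print(joined)
```

The operation is named after the type it is applied to (a string is split, an array is joined), which is the asymmetry discussed above.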

 Well, I guess I prefer my suggestion to any of those (I know... what a
 surprise), but I think I could live with #3, #4, or #5.  It's hard for
 me to imagine that we really want to create a function called just
 join(), given the other meanings that JOIN already has in SQL.

 it hasn't any relation to the SQL language - but I don't expect that
 something like this can be accepted by Tom :). So for the moment we are
 in agreement on #3, #4, #5. I think we can wait one or two days for
 opinions of others - and then I'll fix the patch. ok?

 Yeah, I'd like some more votes, too.  Aside from what I suggested
 (array_join/array_split), I think my favorite is your #5.


ok

#5 - it is absolutely not for me - explode and implode are used in Czech
only in relation to bombs. At this moment I have a problem deciding
which one corresponds to string_to_array and which to array_to_string -
it is nothing against your opinion, it just means that the names don't
have any meaning for me - and probably for a lot of foreign developers.
But I found on the net that people use these names.

 We might also want to put some work into documenting the differences
 between the old and new functions clearly.


sure

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] patch: to_string, to_array functions

2010-07-21 Thread Merlin Moncure

On Wed, Jul 21, 2010 at 2:28 PM, Robert Haas robertmh...@gmail.com wrote:
 Yeah, I'd like some more votes, too.  Aside from what I suggested
 (array_join/array_split), I think my favorite is your #5.

-1 for me for any name that is of the form of:
type_operation();

we don't have bytea_encode, array_unnest(), date_to_char(), etc.  the
non-internal ones that we do have (mostly array funcs), are improperly
named imo.  this is sql, not c.  suppose we want to extend string
serialization to row types?

why not serialize/unserialize?

merlin


Re: [HACKERS] multibyte charater set in levenshtein function

2010-07-21 Thread Alexander Korotkov

On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas robertmh...@gmail.com wrote:

 *scratches head*  Aren't you just moving the same call to a different
 place?

So, where can you find this different place? :) In this patch
null-terminated strings are not used at all.


 Yeah, we usually try to avoid changing that sort of thing in existing
  code, unless there's a very good reason.

Ok.

 In this case we would have to add many checks of max_d to the
 levenshtein_internal function, which makes the code more complex.

 When you say many checks, how many?

  Actually, we can merge all four functions into one function. But such a
  function will have many checks for multibyte encoding and max_d. So, I
  see four cases here:
  1) one function with checks for multibyte encoding and max_d
  2) two functions with checks for multibyte encoding
  3) two functions with checks for max_d
  4) four separate functions
  If you prefer case number 3, you should argue your position a little more.

 I'm somewhat convinced that separating the multibyte case out has a
 performance benefit both by intuition and because you posted some
 numbers, but I haven't seen any argument for separating out the other
 case, so I'm asking if you've checked and whether there is an effect
 and whether it's significant.  The default is always to try to avoid
 maintaining multiple copies of substantially identical code, due to
 the danger that a future patch might fail to update all of them and
 thus introduce a bug.


I've tested it with a big value of max_d, and I thought it was evident
that checking for a negative value of max_d will not produce a significant
benefit. Anyway, I tried adding a check for negative max_d to the
levenshtein_less_equal_mb function.

static int
levenshtein_less_equal_internal_mb(char *s, char *t, int s_len, int t_len,
                                   int ins_c, int del_c, int sub_c, int max_d)
{
    int         m,
                n;
    int        *prev;
    int        *curr;
    int         i,
                j;
    const char *x;
    const char *y;
    CharLengthAndOffset *lengths_and_offsets;
    int         y_char_len;
    int         curr_left, curr_right, prev_left, prev_right, d;
    int         delta, min_d;


    /*
     * We should calculate number of characters for multibyte encodings
     */
    m = pg_mbstrlen_with_len(s, s_len);
    n = pg_mbstrlen_with_len(t, t_len);

    /*
     * We can transform an empty s into t with n insertions, or a non-empty t
     * into an empty s with m deletions.
     */
    if (m == 0)
        return n * ins_c;
    if (n == 0)
        return m * del_c;

    /*
     * We can find the minimal distance by the difference of lengths
     */
    delta = m - n;
    if (delta > 0)
        min_d = delta * del_c;
    else if (delta < 0)
        min_d = - delta * ins_c;
    else
        min_d = 0;
    if (max_d >= 0 && min_d > max_d)
        return max_d + 1;

    /*
     * For security concerns, restrict excessive CPU+RAM usage. (This
     * implementation uses O(m) memory and has O(mn) complexity.)
     */
    if (m > MAX_LEVENSHTEIN_STRLEN ||
        n > MAX_LEVENSHTEIN_STRLEN)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("argument exceeds the maximum length of %d bytes",
                        MAX_LEVENSHTEIN_STRLEN)));

    /* One more cell for initialization column and row. */
    ++m;
    ++n;


    /*
     * Instead of building an (m+1)x(n+1) array, we'll use two different
     * arrays of size m+1 for storing accumulated values. At each step one
     * represents the previous row and one is the current row of the
     * notional large array.
     * For multibyte encoding we'll also store array of lengths of
     * characters and array with character offsets in first string
     * in order to avoid great number of
     * pg_mblen calls.
     */
    prev = (int *) palloc((2 * sizeof(int) + sizeof(CharLengthAndOffset)) * m);
    curr = prev + m;
    lengths_and_offsets = (CharLengthAndOffset *)(prev + 2 * m);
    lengths_and_offsets[0].offset = 0;
    for (i = 0, x = s; i < m - 1; i++)
    {
        lengths_and_offsets[i].length = pg_mblen(x);
        lengths_and_offsets[i + 1].offset = lengths_and_offsets[i].offset +
            lengths_and_offsets[i].length;
        x += lengths_and_offsets[i].length;
    }
    lengths_and_offsets[i].length = 0;


    /* Initialize the previous row to 0..cols */
    curr_left = 1;
    d = min_d;
    for (i = 0; i < delta; i++)
    {
        prev[i] = d;
    }
    curr_right = m;
    for (; i < m; i++)
    {
        prev[i] = d;
        d += (ins_c + del_c);
        if (max_d >= 0 && d > max_d)
        {
            curr_right = i;
            break;
        }
    }

    /*
     * There are following optimizations:
     * 1) Actually the minimal possible value of final distance (in the case of
     * all possible matches) is stored is the cells of the matrix. In the case
     * of movement towards diagonal, which contain last cell, value
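
The posted function is cut off above. As a rough illustration of the max_d idea only, here is a minimal Python sketch. It keeps the length-difference lower bound from the code above, but replaces the patch's band tracking (curr_left, curr_right and so on) with a simpler early exit: once every cell of a row exceeds max_d, the final distance must too, assuming non-negative costs. The function name is borrowed from the thread; the body is an assumption, not the patch's algorithm.

```python
def levenshtein_less_equal(s, t, ins_c=1, del_c=1, sub_c=1, max_d=-1):
    """Distance from s to t, or max_d + 1 once it is known to exceed max_d.

    A negative max_d means "no bound", as in the thread. Costs are assumed
    to be non-negative. As in the posted code, the empty-string shortcuts
    run before the bound check.
    """
    m, n = len(s), len(t)  # Python strings count characters, not bytes

    if m == 0:
        return n * ins_c
    if n == 0:
        return m * del_c

    # Lower bound on the distance from the difference of lengths.
    delta = m - n
    if delta > 0:
        min_d = delta * del_c
    elif delta < 0:
        min_d = -delta * ins_c
    else:
        min_d = 0
    if max_d >= 0 and min_d > max_d:
        return max_d + 1

    # Two-row dynamic programming: prev is row i-1, curr is row i.
    prev = [j * ins_c for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * del_c]
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else sub_c)
            curr.append(min(prev[j] + del_c, curr[j - 1] + ins_c, substitute))
        # Row minima never decrease, so the result cannot drop below this.
        if max_d >= 0 and min(curr) > max_d:
            return max_d + 1
        prev = curr

    distance = prev[n]
    if max_d >= 0 and distance > max_d:
        return max_d + 1
    return distance
```

Calling it with max_d=-1 gives the plain distance, which corresponds to the negative-max_d case benchmarked in this thread.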

Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 3:37 PM, Andrew Dunstan and...@dunslane.net wrote:
 Well, either we have a terminology problem or a statement of policy that I'm
 not sure I agree with, in point 2.  IMNSHO, what we need to forbid is
 commits that are not fast-forward commits, i.e. that do not have the current
 branch head as an ancestor, ideally as the immediate ancestor.

There are two separate questions here.  One is whether an update to a
ref is fast-forward or history rewriting, and the other is whether it
is a merge commit or not.  I don't believe that we want either
history-rewriting commits or merge commits to get pushed, but this
paragraph is about merge commits.
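
The two questions can be told apart on a toy commit graph. The sketch below is a plain Python model, not a git API: the commit names and the PARENTS table are made up for illustration. It shows that a push can be fast-forward and still introduce a merge commit.

```python
# Toy commit graph: commit id -> list of parent ids (hypothetical history).
PARENTS = {
    "A": [],
    "B": ["A"],        # current branch head
    "C": ["B"],        # ordinary commit on top of the head
    "T1": ["A"],       # topic-branch commit forked from A
    "M": ["B", "T1"],  # true merge commit: two parents
    "S": ["B"],        # squash-merge result: one parent only
    "R": ["A"],        # rewritten history: B is no longer an ancestor
}


def is_merge_commit(commit):
    """A merge commit is one with more than one parent."""
    return len(PARENTS[commit]) > 1


def is_ancestor(ancestor, commit):
    """True if 'ancestor' is reachable from 'commit' (or is the same commit)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PARENTS[current])
    return False


def is_fast_forward(old_head, new_head):
    """A ref update is fast-forward if the old head is an ancestor of the new one."""
    return is_ancestor(old_head, new_head)


for new_head in ("C", "M", "S", "R"):
    print(new_head, is_fast_forward("B", new_head), is_merge_commit(new_head))
```

In this model, M passes the fast-forward test but is a merge commit, while R is not a merge commit but rewrites history, so each check catches something the other misses.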

 Personally, I have a strong opinion that for everything but totally trivial
 patches, the committer should create a short-lived work branch where all the
 work is done, and then do a squash merge back to the main branch, which is
 then pushed. This pattern is not mentioned at all. In my experience, it is
 essential, especially if you're working on more than one thing at a time, as
 many people often are.

git merge --squash doesn't create a merge commit.  Indeed, the whole
point is to create a commit which essentially encapsulates the same
diff as a merge commit but actually isn't one.  From the man page:

Produce the working tree and index state as if a real merge
happened (except for the merge information), but do not actually
make a commit or move the HEAD, nor record $GIT_DIR/MERGE_HEAD to
cause the next git commit command to create a merge commit.

As for whether to discuss the use of git merge --squash, I could go
either way on that.  Personally, my preferred workflow is to do 'git
rebase -i master' on a topic branch, squash all the commits, and then
switch to the master branch and do 'git merge otherbranch', resulting
in a fast-forward merge with no merge commit.  But there are many
other ways to do it, including 'git merge --squash' and the
already-mentioned 'git commit -a'.  I think there's a risk of this
turning into a complete tutorial on git, which might detract from its
primary purpose of explaining to committers how to get a basic,
working setup in place.  But we can certainly add whatever you think
is important, or maybe some language indicating that 'git commit -a'
is just an EXAMPLE of how to create a commit...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] review: psql: edit function, show function commands patch

2010-07-21 Thread Dimitri Fontaine

Pavel Stehule pavel.steh...@gmail.com writes:
   CREATE OR REPLACE FUNCTION public.foo()
    RETURNS integer
    LANGUAGE plpgsql
1  AS $function$ begin
2  return 10/0;
3  end;
   $function$

 This is very trivial example - for more complex functions, the correct
 line numbering is more useful.

I completely agree with this, in-functions line numbering is a
must-have. I'd like psql to handle that better.

That said, I usually edit functions in Emacs on my workstation. I did
implement a linum-mode extension to show PL/pgSQL line numbers in
addition to the buffer line numbers in emacs, but it failed to work with
this AS $function$ begin on the same line example. It's fixed in the
attached, should there be any users of it.
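
The numbering convention in the quoted example can be sketched in a few lines of Python. This is not the attached Emacs code and not what psql does internally; it is a hypothetical helper that assumes the body is delimited by a known dollar-quote tag, that the opening and closing tags are on different lines, and that line 1 is the line where the tag opens.

```python
def number_body_lines(source, tag="$function$"):
    """Prefix body line numbers, starting at the line that opens the dollar quote.

    Lines outside the body, and the closing delimiter line, get a blank
    three-column gutter, matching the example quoted above.
    """
    numbered = []
    line_no = 0
    state = "before"  # one of: before, inside, after
    for line in source.splitlines():
        if state == "before" and tag in line:
            state = "inside"
            line_no = 1
            numbered.append(f"{line_no:<3}{line}")
        elif state == "inside" and tag in line:
            state = "after"
            numbered.append(f"   {line}")
        elif state == "inside":
            line_no += 1
            numbered.append(f"{line_no:<3}{line}")
        else:
            numbered.append(f"   {line}")
    return numbered


example = "\n".join([
    "CREATE OR REPLACE FUNCTION public.foo()",
    " RETURNS integer",
    " LANGUAGE plpgsql",
    "AS $function$ begin",
    "return 10/0;",
    "end;",
    "$function$",
])
print("\n".join(number_body_lines(example)))
```

Because the count starts on the AS $function$ line itself, a begin placed on that same line is reported as line 1, which is the case mentioned above.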

Regards,
-- 
dim



dim-pgsql.el
Description: pgsql setup for emacs


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Dimitri Fontaine

Alvaro Herrera alvhe...@commandprompt.com writes:
 This does not work as cleanly as you suppose, because some build
 objects are stored in the source tree.  configure being one of them.
 So if you switch branches, configure is rerun even in a VPATH build,
 which is undesirable.

Ouch. Reading -hackers led me to think this had received a cleanup
effort in the Makefiles, so that any generated file appears in the build
directory. Sorry to learn that's not (yet?) the case.

Regards,
-- 
dim


Re: [HACKERS] Add column if not exists (CINE)

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 2:53 PM, Bernd Helmle maili...@oopsware.de wrote:


 --On 1. Mai 2010 23:09:23 -0400 Robert Haas robertmh...@gmail.com wrote:

 On Wed, Apr 28, 2010 at 9:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 CREATE OR REPLACE is indeed much more complicated.  In fact, for
 tables, I maintain that you'll need to link with -ldwim to make it
 work properly.

 This may in fact be an appropriate way to handle the case for tables,
 given the complexity of their definitions.

 Patch attached.

 I had an initial look at Robert's patch. Patch applies cleanly,
 documentation and regression tests included, everything works as expected.
 When looking at the functionality there's one thing that strikes me a
 little:

 be...@localhost:bernd #*= CREATE TABLE IF NOT EXISTS foo(id int);
 ERROR:  duplicate key value violates unique constraint
 pg_type_typname_nsp_index
 DETAIL:  Key (typname, typnamespace)=(foo, 2200) already exists.

 This is what you get from concurrent CINE commands. The typname thingie
 might be confusing to inexperienced users, but I think it's hard to do
 anything about it?

I get the same error message from concurrent CREATE TABLE commands
even without CINE...

S1:
rhaas=# begin;
BEGIN
rhaas=# create table foo (id int);
CREATE TABLE

S2:
rhaas=# begin;
BEGIN
rhaas=# create table foo (id int);
(blocks)

S1:
rhaas=# commit;
COMMIT

S2:
ERROR:  duplicate key value violates unique constraint
pg_type_typname_nsp_index
DETAIL:  Key (typname, typnamespace)=(foo, 2200) already exists.

I agree it would be nice to fix this.  I'm not sure how hard it is.  I
don't think it's the job of this patch.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] git config user.email

2010-07-21 Thread Andrew Dunstan




Robert Haas wrote:

We need to decide what email addresses committers will use on the new
git repository when they commit.  Although I think we have more votes
(at least from committers) for always having author == committer,
rather than possibly setting the author tag to some other value, the
issue exists independently of that.  I believe we want to try to set
things up so that committers will not need to change the email address
they are using to commit even if their employment situation changes.
Because if that happens, then it becomes more difficult to keep track
of who is who.

My initial suggestion was to say that everyone should just be
usern...@postgresql.org; but I think that met with some resistance.
Magnus, for example, tells me that he is a committer for multiple
projects, and is mag...@hagander.net at all of them.  Since that's a
domain name he owns personally, it seems safe enough.  But I'm
inclined to think we should avoid things like rh...@commandprompt.com,
just on the off chance JD decides to fire me.

Of course, I expect there might be some dissenting voices on this
point, so... thoughts?

  


Do we care that much? I agree it should probably be something permanent, 
and so that could rule out employment based addresses, but it doesn't 
strike me as a big deal.


cheers

andrew




Re: [HACKERS] documentation for committing with git

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 5:03 PM, Robert Haas robertmh...@gmail.com wrote:
 working setup in place.  But we can certainly add whatever you think
 is important, or maybe some language indicating that 'git commit -a'
 is just an EXAMPLE of how to create a commit...

I took a crack at this, as well as incorporating some of the other
suggestions that have been made.  I'm sure it's not perfect, but maybe
it's an improvement...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] multibyte charater set in levenshtein function

2010-07-21 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié jul 21 14:25:47 -0400 2010:
 On Wed, Jul 21, 2010 at 7:40 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas robertmh...@gmail.com wrote:

  The same benefit can be achieved by replacing char * with
  char * and length.
  I changed !m to m == 0 because Itagaki asked me to make it conform to the
  coding style. Do you think there is no reason to fix coding style in
  existing code?
 
 Yeah, we usually try to avoid changing that sort of thing in existing
 code, unless there's a very good reason.

I think fixing a stylistic issue in code that's being edited for other
purposes is fine, and a good idea going forward.  We wouldn't commit a
patch that would *only* fix those, because that would cause a problem
for backpatches for no benefit, but if the patch touches something else,
then a backpatch of another patch is going to need manual intervention
anyway.


Re: [HACKERS] git config user.email

2010-07-21 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié jul 21 12:54:36 -0400 2010:

 My initial suggestion was to say that everyone should just be
 usern...@postgresql.org; but I think that met with some resistance.
 Magnus, for example, tells me that he is a committer for multiple
 projects, and is mag...@hagander.net at all of them.  Since that's a
 domain name he owns personally, it seems safe enough.  But I'm
 inclined to think we should avoid things like rh...@commandprompt.com,
 just on the off chance JD decides to fire me.

I have a mild preference for alvhe...@alvh.no-ip.org over
@postgresql.org.  If other committers are going to use personal
addresses, I'll use mine as well.


[HACKERS] accentuated letters in text-search

2010-07-21 Thread Andreas Joseph Krogh


Hi.

I was googling for how to create a text-search config with the following
properties:

- Map Unicode accented letters to an unaccented equivalent
- No stop-words
- Lowercase all words

And came across this from -general:
http://www.techienuggets.com/Comments?tx=106813

Then after some more googling I found this:
http://www.sai.msu.su/~megera/wiki/unaccent

Any reason the unaccent dictionary and function did not make it into 9.0?
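
To make the three requested properties concrete, here is a small Python sketch of the same normalization done outside the database. It is not the unaccent dictionary and does not use PostgreSQL text search; the function names are hypothetical. It relies on Unicode decomposition, so letters with no decomposition (for example "ø") pass through unchanged, which is one reason a rule-based dictionary is more complete.

```python
import unicodedata


def normalize_token(word):
    """Strip combining accents from a word and lowercase it."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def to_tokens(text):
    """Split on whitespace and normalize every word; no stop-words are removed."""
    return [normalize_token(word) for word in text.split()]


print(to_tokens("The Hôtel in Trollåsen"))
```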

--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
OfficeNet AS
Rosenholmveien 25
1414 Trollåsen
NORWAY
Tlf:    +47 24 15 38 90
Fax:    +47 24 15 38 91
Mobile: +47 909  56 963

The most difficult thing in the world is to know how to do a thing and
to watch somebody else doing it wrong, without comment.



Re: [HACKERS] multibyte charater set in levenshtein function

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 2:47 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas robertmh...@gmail.com wrote:

 *scratches head*  Aren't you just moving the same call to a different
 place?

 So, where can you find this different place? :) In this patch
 null-terminated strings are not used at all.

I can't.  You win.  :-)

Actually, I wonder if there's enough performance improvement there
that we might think about extracting that part of the patch and apply
it separately.  Then we could continue trying to figure out what to do
with the rest.  Sometimes it's simpler to deal with one change at a
time.

 I tested it with american-english dictionary with 98569 words.

 test=# select sum(levenshtein(word, 'qwerqwerqwer')) from words;
    sum
 -
  1074376
 (1 row)

 Time: 131,435 ms
 test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',100)) from
 words;
    sum
 -
  1074376
 (1 row)

 Time: 221,078 ms
 test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',-1)) from
 words;
    sum
 -
  1074376
 (1 row)

 Time: 254,819 ms

 The function with negative value of max_d didn't become faster than with
 just big value of max_d.

Ah, I see.  That's pretty compelling, I guess.  Although it still
seems like a lot of code...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-21 Thread Robert Haas

On Wed, Jul 21, 2010 at 2:53 PM, Markus Wanner mar...@bluegap.ch wrote:
 Consider also the contrary situation,
 where the imessages stuff is not in use (even for a short period of
 time, like a few minutes).  Then we'd really rather not still have
 memory carved out for it.

 Huh? That's exactly what dynamic allocation could give you: not having
 memory carved out for stuff you currently don't need, but instead being able
 to dynamically use memory where most needed. SLRU has memory (not disk
 space) carved out for pretty much every sub-system separately, if I'm
 reading that code correctly.

Yeah, I think you are right.  :-(

 I think what would be even better is to merge the SLRU pools with the
 shared_buffer pool, so that the two can duke it out for who is in most
 need of the limited amount of memory available.

 ..well, just add the shared_buffer pool to the list of candidates that could
 use dynamically allocated shared memory. It would need some thinking about
 boundaries (i.e. when to spill to disk, for those modules that /want/ to
 spill to disk) and dealing with OOM situations, but that's about it.

I'm not sure why merging the SLRU pools with shared_buffers would
benefit from dynamically allocated shared memory.

I might be at (or possibly beyond) the limit of my ability to comment
intelligently on this without looking more at what you want to use
these imessages for, but I'm still pretty skeptical about the idea of
storing them directly in shared memory.  It's possible, though, that I
am all wet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] git config user.email

2010-07-21 Thread Tatsuo Ishii

 My initial suggestion was to say that everyone should just be
 usern...@postgresql.org; but I think that met with some resistance.
 Magnus, for example, tells me that he is a committer for multiple
 projects, and is mag...@hagander.net at all of them.  Since that's a
 domain name he owns personally, it seems safe enough.  But I'm
 inclined to think we should avoid things like rh...@commandprompt.com,
 just on the off chance JD decides to fire me.
 
 Of course, I expect there might be some dissenting voices on this
 point, so... thoughts?

I'd prefer usern...@postgresql.org since:

- It's permanent as already pointed out

- It'd make clear that username is working as one of the PostgreSQL
  project members

Personal email addresses such as mag...@hagander.net would be ok as
long as he is sure that he will continue to pay charges for his
domain:-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

