[HACKERS] Toasted table not deleted when no out of line columns left

2008-09-21 Thread Zoltan Boszormenyi
Hi,

we came across a database where a table had a TOAST table keeping huge
amounts of disk space allocated. However, the table's current definition
didn't explain why the TOAST table was there. Then, after some
experimenting, it struck me: there _was_ a toasted field once, but as the
schema was modified, the field was dropped, leaving only inline-stored
fields. VACUUM [FULL] [ANALYZE] didn't clean up the space that was used
by the TOAST table. My tests were done on 8.3.3.
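A minimal reproduction looks roughly like this (table and column names
are made up for illustration):

create table toasttest (id integer, t text);  -- t is toastable, so a TOAST table is created
select reltoastrelid::regclass from pg_class where relname = 'toasttest';
alter table toasttest drop column t;          -- no potentially toasted columns remain
vacuum full toasttest;
select reltoastrelid::regclass from pg_class where relname = 'toasttest';
-- the second select still shows the TOAST table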

As every statement that references a table takes a lock on its pg_class
record, ALTER TABLE cannot proceed until all those locks are gone, i.e.
until the transactions referencing the table have finished. The same is
true vice versa: ALTER TABLE blocks every transaction that may reference
the table. Judging from that, the TOAST table cleanup could be made part
of ALTER TABLE DROP COLUMN.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Zoltan Boszormenyi
Jeffrey Baker wrote:
 The way I read it, the current btree index stores the index value and
 the TID of every tuple having that value.  When you have a table with
 three columns, you index one of them and you get an index which is
 practically as large as the table itself.

 Supposing the table is generally or strictly ordered by the column to
 be indexed, it would be more compact if the index stored ranges of
 tuples.  Instead of storing the TID of every tuple with that value,
 the index would store a first and last TID, between which all tuples
 have the value.

 Example: table with one million rows indexed on a column having one
 thousand distinct values.  Table is in-order by the indexed column. 
 The traditional index would contain a million TIDs, whereas a range
 index would contain only two thousand.  The range index would be 500
 times smaller, more likely to be cached, etc.

 Thoughts?

 -jwb

An example against your theory:
a not-yet-committed transaction modifies one tuple that was previously
in the middle of a range, changing the tuple's indexed column.
What would you do?
You need to keep track of multiple index versions:
1. the range has to be split for the not-yet-committed modifier transaction,
   which might need to re-read the same table;
2. the old range has to be kept for reader transactions that still see
   the old data.
Imagine you have thousands of UPDATEs in flight on different rows.

Or you introduce locking where readers have to wait for writers and
updaters have to wait for other updaters on the same range, which is
exactly what the MVCC implementation nicely avoids.

Look at MaxDB once and you'll appreciate PostgreSQL afterwards.
MaxDB stores table tuples in primary key order, using a balanced
btree for that. This means slower INSERTs and UPDATEs and decreased
concurrency compared to PostgreSQL.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Zoltan Boszormenyi
Jeffrey Baker wrote:
 On Tue, Jun 24, 2008 at 1:59 PM, Zoltan Boszormenyi [EMAIL PROTECTED] wrote:

 Jeffrey Baker wrote:
  The way I read it, the current btree index stores the index
 value and
  the TID of every tuple having that value.  When you have a table
 with
  three columns, you index one of them and you get an index which is
  practically as large as the table itself.
 
  Supposing the table is generally or strictly ordered by the
 column to
  be indexed, it would be more compact if the index stored ranges of
  tuples.  Instead of storing the TID of every tuple with that value,
  the index would store a first and last TID, between which all tuples
  have the value.
 
  Example: table with one million rows indexed on a column having one
  thousand distinct values.  Table is in-order by the indexed column.
  The traditional index would contain a million TIDs, whereas a range
  index would contain only two thousand.  The range index would be 500
  times smaller, more likely to be cached, etc.
 
  Thoughts?
 
  -jwb

 Example with your theory:
 One (not yet committed) transaction changes one tuple
 that was in the middle of a range before but the tuple's indexed
 column changed. What would you do?


 Insert the new tuple at the end of the table and add another range to
 the index.  Leave the old tuple in place and don't touch the original
 index range.

This is what I described below but I only mentioned the index part:

 You need to keep track of multiple index versions:
 1. the range has to be split for the not-yet-committed modifier
 transaction,
it might need to re-read the same table.
 2. the old range has to be kept for reader transactions that still see
 the old data


 This is only true if you update the tuple in-place.

Why? If you update in place then the above is not needed;
you just need to serialize transactions, but there goes concurrency.

 Imagine you have thousands of UPDATEs in flight on different rows.


 I'm quite aware of the problems of maintaining such a table and index,
 but the fact is that data warehouse type tables may never be updated
 after being created.  The particular application I'm struggling with
 does a SELECT ... INTO ... ORDER BY to make an ordered table for
 querying every night.  The problem is it takes longer, much longer, to
 create the index than to create the table, and in the end the index is
 as big as half the table anyway.

 So this type of index would only be useful for an essentially
 read-only table.  I agree.

 Quite another proposal would be to somehow instruct the database that
 the table is strictly in-order by a column and allow a binary search
 access method.  Then you don't need any index at all.

CLUSTER tablename USING indexname;
It's useful for very large tables that change little.

 -jwb



-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] rawhide report: 20080612 changes

2008-06-12 Thread Zoltan Boszormenyi
Hm. Someone had his second finger chainsawed? Where is 8.3.2?
The ftp browser also shows 8.3.1 and 8.3.3, and 8.3.1 is the latest on
the main page.

Rawhide wrote:
 postgresql-8.3.3-1.fc10
 ---
 * Wed Jun 11 18:00:00 2008 Tom Lane [EMAIL PROTECTED] 8.3.3-1
 - Update to PostgreSQL 8.3.3.
 - Remove postgresql-prefer-ncurses.patch, no longer needed in recent
   Fedora releases because libtermcap is gone.
   




-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



Re: [HACKERS] rawhide report: 20080612 changes

2008-06-12 Thread Zoltan Boszormenyi
Thanks for the info.

Magnus Hagander wrote:
 8.3.2 was pulled back because of an urgent bugfix, and re-released as
 8.3.3 since it had already hit the mirrors.

 8.3.3 has not been officially released yet, but it will be out soon.

 //Magnus

 Zoltan Boszormenyi wrote:
   
 Hm. Someone had his second finger chainsawed? Where is 8.3.2?
 The ftp browser also shows 8.3.1 and 8.3.3, and 8.3.1 is the latest on
 the main page.

 Rawhide wrote:
 
 postgresql-8.3.3-1.fc10
 ---
 * Wed Jun 11 18:00:00 2008 Tom Lane [EMAIL PROTECTED] 8.3.3-1
 - Update to PostgreSQL 8.3.3.
 - Remove postgresql-prefer-ncurses.patch, no longer needed in recent
   Fedora releases because libtermcap is gone.
   
   


 


   


-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Gregory Stark wrote:

This is indeed really cool. I'm sorry I haven't gotten to doing what I
promised in this area but I'm glad it's happening anyways.


Zoltan Boszormenyi [EMAIL PROTECTED] writes:

  
Can we get the rows in tree order, please? 
...

After all, I didn't specify any ORDER BY clauses in the base, recursive or the
final queries.



The standard has a clause to specify depth-first order. However doing a
depth-first traversal would necessitate quite a different looking plan and
it's far less obvious (to me anyways) how to do it.
  


That would be even cooler to have it implemented as well.


Also, it seems there is no infinite recursion detection:

# with recursive x(level, parent, child) as (
   select 1::integer, * from test_connect_by where parent is null
   union all
   select x.level + 1, base.* from test_connect_by as base, x where base.child
= x.child
) select * from x;
... it waits and waits and waits ...



Well, psql might wait and wait but it's actually receiving rows. A cleverer
client should be able to deal with infinite streams of records. 
  


I think it's the other way around. The server should not emit an infinite
number of records.
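In the meantime, a manual bound using the level column already present in
the query stops the runaway; a sketch (the depth limit of 100 is arbitrary):

# with recursive x(level, parent, child) as (
   select 1::integer, * from test_connect_by where parent is null
   union all
   select x.level + 1, base.* from test_connect_by as base, x
   where base.child = x.child and x.level < 100
) select * from x;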



I think DB2 does produce a warning if there is no clause it can determine will
bound the results. But that's not actually reliable. It's quite possible to
have clauses which will limit the output but not in a way the database can
determine. Consider for example a tree-traversal for a binary tree stored in a
recursive table reference. The DBA might know that the data contains no loops
but the database doesn't.
  


Well, a maintenance resjunk column could be used, like the branch column
in tablefunc's connectby().


--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Martijn van Oosterhout wrote:

On Mon, May 19, 2008 at 08:19:17AM +0200, Zoltan Boszormenyi wrote:
  

The standard has a clause to specify depth-first order. However doing a
depth-first traversal would necessitate quite a different looking plan and
it's far less obvious (to me anyways) how to do it.
  

That would be even cooler to have it implemented as well.



From an implementation point of view, the only difference between
breadth-first and depth-first is that your tuplestore needs to be LIFO
instead of FIFO. However, just looking at the plan I don't know whether
it could support that kind of usage. At the very least I don't think
the standard tuplestore code can handle it.

  

Well, psql might wait and wait but it's actually receiving rows. A cleverer
client should be able to deal with infinite streams of records. 
  
I think it's the other way around. The server should not emit an infinite
number of records.



The server won't, the universe will end first.


The universe is alive and well, thank you. :-)
But the server won't emit an infinite number of records, you are right.
Given that the implementation uses a tuplestore instead of producing the
tuple slots on the fly, it's the server that goes OOM first, not the psql
client; I watched them in 'top'. It just takes a bit of time.


 This is a nice example
of the halting problem:

http://en.wikipedia.org/wiki/Halting_problem

Which was proved unsolvable a long time ago.
  


Hmpf, yes, I've forgotten too much about Turing machines since university. :-(


Have a nice day,
  


--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] [PATCHES] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Yoshiyuki Asaba wrote:

Hi,

From: Zoltan Boszormenyi [EMAIL PROTECTED]
Subject: Re: [PATCHES] WITH RECURSIVE patch V0.1
Date: Mon, 19 May 2008 08:19:17 +0200

  

Also, it seems there is no infinite recursion detection:

# with recursive x(level, parent, child) as (
   select 1::integer, * from test_connect_by where parent is null
   union all
   select x.level + 1, base.* from test_connect_by as base, x where base.child
= x.child
) select * from x;
... it waits and waits and waits ...



Well, psql might wait and wait but it's actually receiving rows. A cleverer
client should be able to deal with infinite streams of records. 
  
  
I think it's the other way around. The server should not emit an infinite
number of records.



How about adding a new GUC parameter max_recursive_call?
  


Yes, why not?
MSSQL has a similar MAXRECURSION hint for WITH RECURSIVE queries,
according to their docs:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
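For reference, it is a query-level hint there; roughly (a sketch based on
the docs above, not tested):

select * from x
option (maxrecursion 100);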



Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

  



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Martijn van Oosterhout wrote:

On Mon, May 19, 2008 at 08:19:17AM +0200, Zoltan Boszormenyi wrote:
  

The standard has a clause to specify depth-first order. However doing a
depth-first traversal would necessitate quite a different looking plan and
it's far less obvious (to me anyways) how to do it.
  

That would be even cooler to have it implemented as well.



From an implementation point of view, the only difference between
breadth-first and depth-first is that your tuplestore needs to be LIFO
instead of FIFO.


Are you sure? I think a LIFO tuplestore would simply return reversed
breadth-first order. Depth-first means that for every new record you
descend into another recursion first, then continue with the next record
to the right.


However, just looking at the plan I don't know whether
it could support that kind of usage. At the very least I don't think
the standard tuplestore code can handle it.
  



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Martijn van Oosterhout wrote:

On Mon, May 19, 2008 at 11:56:17AM +0200, Zoltan Boszormenyi wrote:
  

From an implementation point of view, the only difference between


breadth-first and depth-first is that your tuplestore needs to be LIFO
instead of FIFO.
  

Are you sure? I think a LIFO tuplestore would simply return reversed
breadth-first order. Depth-first means that for every new record you
descend into another recursion first, then continue with the next record
to the right.



Say your tree looks like:
Root->A, D
A->B,C
D->E,F

LIFO pushes A and D. It then pops A and pushes B and C. B and C have no
children and are returned. Then D is popped and E and F pushed. So the
returned order is: A,B,C,D,E,F. You could also do B,C,A,E,F,D if you
wanted.

FIFO pushes A and D. It then pops A and puts B and C at *the end*. It
then pops D and pushes E and F at the end. So you get the order
A,D,B,C,E,F

Hope this helps,
  


Thanks, I didn't consider popping elements off while processing.
However, if the toplevel query returns tuples in A, D order, you need
a positioned insert into the tuplestore, because the LIFO would pop D first.

Say, a treestore would work this way:
1. setup: the treestore is empty, storage_position := 0
2. treestore_puttupleslot() adds the slot at the current position,
   storage_position++
3. treestore_gettupleslot() removes the slot from the beginning,
   storage_position := 0
This works easily with in-memory lists, but it's not obvious to me how it
would work with disk-backed temporary storage inside PG.
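As an aside, a depth-first output order can also be had without a special
tuplestore, by accumulating a path in the recursive query itself and
sorting on it; a sketch, assuming a tree table with integer (id, parent)
columns:

with recursive t(id, parent, path) as (
   select id, parent, array[id] from tree where parent is null
   union all
   select c.id, c.parent, t.path || c.id from tree as c, t
   where c.parent = t.id
) select * from t order by path;  -- sorting on path yields depth-first order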

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Zoltan Boszormenyi

Gregory Stark wrote:

Martijn van Oosterhout [EMAIL PROTECTED] writes:

  

From an implementation point of view, the only difference between
breadth-first and depth-first is that your tuplestore needs to be LIFO
instead of FIFO. 



I think it's not so simple. How do you reconcile that concept with the join
plans like merge join or hash join which expect you to be able to
process the records in a specific order?

It sounds like you might have to keep around a stack of started executor nodes
or something but hopefully we can avoid anything like that because, well, ick.
  


If I understand the code right, the recursion from level N to level N+1
goes like this: collect all records from level N and JOIN them with the
recursive query. This way we get all level 1 records from the base query,
then all records at the second level, etc.
This is how it gets breadth-first ordering.
Depth-first ordering could go like this: get only one record from the
current level, then go into recursion. Repeat until there are no records
left in the current level.
The only difference would be more recursion steps. Instead of one per
level, there would be N per level if there are N tuples in the current
level. Definitely slower than the current implementation, but comparable
with the tablefunc.c connectby() code.

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] Auto-updated fields

2008-05-08 Thread Zoltan Boszormenyi

Martijn van Oosterhout wrote:

On Wed, May 07, 2008 at 03:04:49PM -0700, David Fetter wrote:
  

1.  Create a generic (possibly overloaded) trigger function, bundled
with PostgreSQL, which sets a field to some value.  For example, a
timestamptz version might set the field to now().



Doesn't the SQL standard GENERATED BY functionality work for this? Or
won't that handle updates?
  


You mean GENERATED ALWAYS AS (expression)?
Yes, they should be updated on every UPDATE, as the expression
may include other fields in the same row.

A GENERATED column implemented as a stored column would
work for this, but a virtual column would not. A virtual column
would return a different value for now() in every SELECT.

However, we can argue for use cases of a virtual column and implement
it similarly to VIEWs, i.e. an ON SELECT rule can expand the original
expression of the column definition.

I suggest using these syntaxes if we decide to implement them:

GENERATED ALWAYS AS (expression) -- SQL standard stored GENERATED column
GENERATED VIRTUAL AS (expression) -- virtual column, obviously
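For comparison, the trigger-based workaround David described might look
like this today (a sketch; the table, column and function names are
hypothetical):

create function set_mtime() returns trigger as $$
begin
    new.mtime := now();  -- overwrite the column on every INSERT/UPDATE
    return new;
end;
$$ language plpgsql;

create trigger t_set_mtime before insert or update on t
    for each row execute procedure set_mtime();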


Have a nice day,
  


Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] Auto-updated fields

2008-05-08 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Martijn van Oosterhout wrote:

On Wed, May 07, 2008 at 03:04:49PM -0700, David Fetter wrote:
 

1.  Create a generic (possibly overloaded) trigger function, bundled
with PostgreSQL, which sets a field to some value.  For example, a
timestamptz version might set the field to now().



Doesn't the SQL standard GENERATED BY functionality work for this? Or
won't that handle updates?
  


You mean GENERATED ALWAYS AS (expression)?
Yes, they should be updated on every UPDATE as the expression
may include other fields in the same row.

A GENERATED column implemented as a stored column would
work for this but a virtual column would not. A virtual column
would return different values for now() in every SELECT.

However we can argue for use cases of a virtual column and implement
it similarly to VIEWs, i.e. an ON SELECT rule can expand the original
expression of the column definition.

I suggest using these syntaxes if we decide to implement them:

GENERATED ALWAYS AS (expression) -- SQL standard stored GENERATED column
GENERATED VIRTUAL AS (expression) -- virtual column, obviously


Or, as found in Oracle 11g:

GENERATED ALWAYS AS (expr) VIRTUAL




Have a nice day,
  


Best regards,
Zoltán Böszörményi




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





[HACKERS] [RFC] Localized literals

2008-04-23 Thread Zoltan Boszormenyi

Hi,

we have a customer who shot themselves in the foot by using
table names with German accented characters in them.
The client application on the popular OS is using a single-byte
encoding (LATIN9); their dump of the original database uses
the same, but there is no SET client_encoding = ... line anywhere.
We didn't know the previous conditions and had set up the
default database cluster that was created during installation
on Debian 4.0. Obviously we have to re-initialize the cluster
with the original locale so the table names come out right after
reloading the dump.

But the question popped up whether PostgreSQL could be extended
to allow localized literals and apply encoding conversion to them the
same way as to string data. NAMEDATA could be replaced with regular
TEXT and have the same conversion everywhere. This way the relation
and field name length limits would also be eliminated. The conversion
could be controlled by a compile-time option and/or a GUC variable.
Tell me if I am crazy.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] [RFC] Localized literals

2008-04-23 Thread Zoltan Boszormenyi

Martijn van Oosterhout wrote:

On Wed, Apr 23, 2008 at 10:02:37AM +0200, Zoltan Boszormenyi wrote:
  

But the question popped up whether PostgreSQL can be extended
to allow localized literals and apply encoding conversion the same
way as on string data. NAMEDATA can be replaced with regular TEXT
and have the same conversion everywhere. This way the relation and
field name limits are also eliminated. The conversion could be controlled
by a compile-time option and/or a GUC variable. Tell me if I am crazy.



It does convert the table names also, since the encoding translation is
applied to the whole query string, not just normal strings.


Thanks for the enlightenment, I didn't know that or check it.


 A simple
SET CLIENT_ENCODING='latin9' at the beginning of your dump should have
worked.
  


I thought so. The dump was made off a 7.4 server with its pg_dump,
and there was no SET client_encoding = ... in the script.
They insisted on continuing with LATIN9, but with the default
UTF-8 server encoding, createdb -E LATIN9 failed.


As for the other point, the reason NAMEDATA is fixed is because these
records are mapped onto in-memory structures in the backend. By changing
it to a variable-length type, all structure accesses would become much
more expensive.
  


Yes, I guessed it would be slower. However, NAMEDATALEN
doesn't translate to varchar(NAMEDATALEN) if I use accented
characters in identifiers with UTF-8 encoding. :-(


But none of this has anything to do with encodings.

Have a nice day,
  


Thanks, I have it. :-)

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-04-22 Thread Zoltan Boszormenyi

Hi,

Zoltan Boszormenyi wrote:

Updated patch implements TRUNCATE ... RESTART IDENTITY
which restarts all owned sequences for the truncated table(s).
Regression tests updated, documentation added. pg_dump was
also extended to output original[1] START value for creating SEQUENCEs.

[1] For 8.3 and below I could only guesstimate it as MINVALUE for ascending
and MAXVALUE for descending sequences.

Best regards,
Zoltán Böszörményi


I just saw this on the CommitFest:May page:

alvherre says: I'm not sure if this is the same patch in the previous 
entry, or a different feature


I wanted to clarify: the second patch contains two features.
1. stored start value for sequences, ALTER SEQUENCE ... RESTART;
2. (builds on 1.) TRUNCATE ... RESTART IDENTITY;
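With the patch applied, usage would look roughly like this (a sketch with
illustrative names):

create table t1 (id serial, v text);
insert into t1 (v) values ('a'), ('b');  -- t1_id_seq advances to 2
truncate t1 restart identity;            -- t1_id_seq restarts at its stored START value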

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-04-21 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Zoltan Boszormenyi wrote:

Decibel! wrote:

On Apr 3, 2008, at 12:52 AM, Zoltan Boszormenyi wrote:

Where is the info in the sequence to provide restarting with
the _original_ start value?


There isn't any. If you want the sequence to start at some magic 
value, adjust the minimum value.


There's the START WITH option for IDENTITY columns and this below
is paragraph 8 under General rules of 14.10 truncate table statement
in 6WD2_02_Foundation_2007-12.pdf (page 902):

8) If RESTART IDENTITY is specified and the table descriptor of T
includes a column descriptor IDCD of an identity column, then:
  a) Let CN be the column name included in IDCD and let SV be the
start value included in IDCD.
  b) The following alter table statement is effectively executed
without further Access Rule checking:

  ALTER TABLE TN ALTER COLUMN CN RESTART WITH SV

This says that the original start value is used, not the minimum value.
IDENTITY has the same options as CREATE SEQUENCE. In fact the
identity column specification links to 11.63 sequence generator definition
when it comes to IDENTITY sequence options. And surprise, surprise,
11.64 alter sequence generator statement now defines
ALTER SEQUENCE sn RESTART [WITH newvalue]
where omitting the WITH newvalue part also uses the original start value.


Best regards,
Zoltán Böszörményi


The attached patch implements the extension found in the current SQL200n
draft, implementing a stored start value and supporting
ALTER SEQUENCE seq RESTART;
Some error checks are also added to prohibit CREATE SEQUENCE ... RESTART ...
and ALTER SEQUENCE ... START ...

Best regards,
Zoltán Böszörményi


Updated patch implements TRUNCATE ... RESTART IDENTITY
which restarts all owned sequences for the truncated table(s).
Regression tests updated, documentation added. pg_dump was
also extended to output original[1] START value for creating SEQUENCEs.

[1] For 8.3 and below I could only guesstimate it as MINVALUE for ascending
 and MAXVALUE for descending sequences.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



sql2008-compliant-seq-v2.patch.gz
Description: Unix tar archive



Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-04-08 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Decibel! wrote:

On Apr 3, 2008, at 12:52 AM, Zoltan Boszormenyi wrote:

Where is the info in the sequence to provide restarting with
the _original_ start value?


There isn't any. If you want the sequence to start at some magic 
value, adjust the minimum value.


There's the START WITH option for IDENTITY columns and this below
is paragraph 8 under General rules of 14.10 truncate table statement
in 6WD2_02_Foundation_2007-12.pdf (page 902):

8) If RESTART IDENTITY is specified and the table descriptor of T
includes a column descriptor IDCD of an identity column, then:
  a) Let CN be the column name included in IDCD and let SV be the
start value included in IDCD.
  b) The following alter table statement is effectively executed
without further Access Rule checking:

  ALTER TABLE TN ALTER COLUMN CN RESTART WITH SV

This says that the original start value is used, not the minimum value.
IDENTITY has the same options as CREATE SEQUENCE. In fact the
identity column specification links to 11.63 sequence generator definition
when it comes to IDENTITY sequence options. And surprise, surprise,
11.64 alter sequence generator statement now defines
ALTER SEQUENCE sn RESTART [WITH newvalue]
where omitting the WITH newvalue part also uses the original start value.

Best regards,
Zoltán Böszörményi


Attached patch implements the extension found in the current SQL200n draft,
implementing stored start value and supporting ALTER SEQUENCE seq RESTART;
Some error checks are also added to prohibit CREATE SEQUENCE ... RESTART ...
and ALTER SEQUENCE ... START ...

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

diff -dcrpN pgsql.orig/src/backend/commands/sequence.c pgsql/src/backend/commands/sequence.c
*** pgsql.orig/src/backend/commands/sequence.c	2008-01-01 20:45:49.0 +0100
--- pgsql/src/backend/commands/sequence.c	2008-04-08 10:51:27.0 +0200
*** static Relation open_share_lock(SeqTable
*** 88,94 
  static void init_sequence(Oid relid, SeqTable *p_elm, Relation *p_rel);
  static Form_pg_sequence read_info(SeqTable elm, Relation rel, Buffer *buf);
  static void init_params(List *options, bool isInit,
! 			Form_pg_sequence new, List **owned_by);
  static void do_setval(Oid relid, int64 next, bool iscalled);
  static void process_owned_by(Relation seqrel, List *owned_by);
  
--- 88,94 
  static void init_sequence(Oid relid, SeqTable *p_elm, Relation *p_rel);
  static Form_pg_sequence read_info(SeqTable elm, Relation rel, Buffer *buf);
  static void init_params(List *options, bool isInit,
! 			Form_pg_sequence new, Form_pg_sequence old, List **owned_by);
  static void do_setval(Oid relid, int64 next, bool iscalled);
  static void process_owned_by(Relation seqrel, List *owned_by);
  
*** DefineSequence(CreateSeqStmt *seq)
*** 116,122 
  	NameData	name;
  
  	/* Check and set all option values */
! 	init_params(seq->options, true, &new, &owned_by);
  
  	/*
  	 * Create relation (and fill *null & *value)
--- 116,122 
  	NameData	name;
  
  	/* Check and set all option values */
! 	init_params(seq->options, true, &new, NULL, &owned_by);
  
  	/*
  	 * Create relation (and fill *null & *value)
*** DefineSequence(CreateSeqStmt *seq)
*** 143,148 
--- 143,153 
  namestrcpy(name, seq-sequence-relname);
  value[i - 1] = NameGetDatum(name);
  break;
+ 			case SEQ_COL_STARTVAL:
+ 				coldef->typename = makeTypeNameFromOid(INT8OID, -1);
+ 				coldef->colname = "start_value";
+ 				value[i - 1] = Int64GetDatumFast(new.start_value);
+ 				break;
  			case SEQ_COL_LASTVAL:
  				coldef->typename = makeTypeNameFromOid(INT8OID, -1);
  				coldef->colname = "last_value";
*** AlterSequence(AlterSeqStmt *stmt)
*** 336,342 
 	memcpy(&new, seq, sizeof(FormData_pg_sequence));
  
  	/* Check and set new values */
! 	init_params(stmt->options, false, &new, &owned_by);
  
  	/* Clear local cache so that we don't think we have cached numbers */
  	/* Note that we do not change the currval() state */
--- 341,347 
 	memcpy(&new, seq, sizeof(FormData_pg_sequence));
  
  	/* Check and set new values */
! 	init_params(stmt->options, false, &new, seq, &owned_by);
  
  	/* Clear local cache so that we don't think we have cached numbers */
  	/* Note that we do not change the currval() state */
*** read_info(SeqTable elm, Relation rel, Bu
*** 967,973 
   */
  static void
  init_params(List *options, bool isInit,
! 			Form_pg_sequence new, List **owned_by)
  {
  	DefElem*last_value = NULL;
  	DefElem*increment_by = NULL;
--- 972,978 
   */
  static void
  init_params(List *options, bool isInit,
! 			Form_pg_sequence new, Form_pg_sequence old, List **owned_by)
  {
  	DefElem*last_value = NULL;
  	DefElem*increment_by = NULL;
*** init_params(List *options, bool isInit,
*** 995,1003

Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-04-07 Thread Zoltan Boszormenyi

Decibel! wrote:

On Apr 3, 2008, at 12:52 AM, Zoltan Boszormenyi wrote:

Where is the info in the sequence to provide restarting with
the _original_ start value?


There isn't any. If you want the sequence to start at some magic 
value, adjust the minimum value.


There's the START WITH option for IDENTITY columns and this below
is paragraph 8 under General rules of 14.10 truncate table statement
in 6WD2_02_Foundation_2007-12.pdf (page 902):

8) If RESTART IDENTITY is specified and the table descriptor of T
includes a column descriptor IDCD of an identity column, then:
  a) Let CN be the column name included in IDCD and let SV be the
start value included in IDCD.
  b) The following alter table statement is effectively executed
without further Access Rule checking:

  ALTER TABLE TN ALTER COLUMN CN RESTART WITH SV

This says that the original start value is used, not the minimum value.
IDENTITY has the same options as CREATE SEQUENCE. In fact the
identity column specification links to 11.63 sequence generator definition
when it comes to IDENTITY sequence options. And surprise, surprise,
11.64 alter sequence generator statement now defines
ALTER SEQUENCE sn RESTART [WITH newvalue]
where omitting the WITH newvalue part also uses the original start value.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-04-02 Thread Zoltan Boszormenyi

Decibel! wrote:

On Mar 25, 2008, at 11:40 AM, Zoltan Boszormenyi wrote:

All of them? PostgreSQL allows multiple SERIALs to be present,
the standard allows only one IDENTITY column in a table.
And what about this case below?

CREATE TABLE t1 (id1 serial, ...);
ALTER SEQUENCE seq_t1_id1 RESTART WITH 5432 CYCLE;

or the equivalent

CREATE SEQUENCE seq_t1_id1 START WITH 5432 CYCLE;
CREATE TABLE t1 (id1 serial, ...);
ALTER SEQUENCE seq_t1_id1 OWNED BY t1.id1;

PostgreSQL doesn't keep the START WITH information.
But it would need to in order to perform a restart on the sequence;
using the minval in this case wouldn't be correct.



I think you misunderstand what ALTER SEQUENCE RESTART does; it only 
changes the current value of the sequence.


I didn't misunderstand, I know that. I quoted both
because (currently) CREATE SEQUENCE ... START WITH does the same.

zozo=> create sequence seq1 start with 327;
CREATE SEQUENCE
zozo=> select * from seq1;
 sequence_name | last_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
---------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 seq1          |        327 |            1 | 9223372036854775807 |         1 |           1 |       1 | f         | f
(1 row)

Note the difference between min_value and last_value.
Using the standard syntax of

CREATE TABLE (
  id integer GENERATED ALWAYS AS IDENTITY (START WITH 327),
  ...
);

and assuming you use the existing sequence infrastructure
there's a problem with TRUNCATE ... RESTART IDENTITY;
Where is the info in the sequence to provide restarting with
the _original_ start value?

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-03-25 Thread Zoltan Boszormenyi

Simon Riggs wrote:

On Tue, 2008-03-25 at 09:08 -0700, Steve Crawford wrote:
  

Simon Riggs wrote:


RESTART IDENTITY will reset the SERIAL sequences back to the original
start value.
  
  

Assuming this feature were to be added

In cases where the same sequence has been used across multiple tables, 
what will be the appropriate response when a user attempts to TRUNCATE 
one of those tables with RESTART IDENTITY?



Well, I'm suggesting it as a TODO item, based on the standard. It would
be for whoever took this up to unravel that.

Since that's a weak answer, I'd say it should only reset sequences that
have been placed there automatically through the use of SERIAL or
BIGSERIAL datatypes.
  


All of them? PostgreSQL allows multiple SERIALs to be present;
the standard allows only one IDENTITY column in a table.
And what about this case below?

CREATE TABLE t1 (id1 serial, ...);
ALTER SEQUENCE seq_t1_id1 RESTART WITH 5432 CYCLE;

or the equivalent

CREATE SEQUENCE seq_t1_id1 START WITH 5432 CYCLE;
CREATE TABLE t1 (id1 serial, ...);
ALTER SEQUENCE seq_t1_id1 OWNED BY t1.id1;

PostgreSQL doesn't keep the START WITH information.
But it would need to in order to perform a restart on the sequence;
using the minval in this case wouldn't be correct.

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] TRUNCATE TABLE with IDENTITY

2008-03-25 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Simon Riggs wrote:

On Tue, 2008-03-25 at 09:08 -0700, Steve Crawford wrote:
 

Simon Riggs wrote:
   

RESTART IDENTITY will reset the SERIAL sequences back to the original
start value.


Assuming this feature were to be added

In cases where the same sequence has been used across multiple 
tables, what will be the appropriate response when a user attempts 
to TRUNCATE one of those tables with RESTART IDENTITY?



Well, I'm suggesting it as a TODO item, based on the standard. It would
be for whoever took this up to unravel that.

Since that's a weak answer, I'd say it should only reset sequences that
have been placed there automatically through the use of SERIAL or
BIGSERIAL datatypes.
  


All of them? PostgreSQL allows multiple SERIALs to be present,
the standard allows only one IDENTITY column in a table.
And what about this case below?

CREATE TABLE t1 (id1 serial, ...);
ALTER SEQUENCE seq_t1_id1 RESTART WITH 5432 CYCLE;

or the equivalent

CREATE SEQUENCE seq_t1_id1 START WITH 5432 CYCLE;
CREATE TABLE t1 (id1 serial, ...);


of course
CREATE TABLE t1 (id1 integer, ...);


ALTER SEQUENCE seq_t1_id1 OWNED BY t1.id1;

PostgreSQL doesn't keep the START WITH information.
But it would need to in order to perform a restart on the sequence;
using the minval in this case wouldn't be correct.




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





[HACKERS] [Fwd: Re: [PATCHES] 64-bit CommandIds]

2008-03-10 Thread Zoltan Boszormenyi

Hi,

what's your opinion on this?
I saw a response only from Alvaro on the -patches list.

Thanks in advance,
Zoltán Böszörményi

 Original Message 
Subject: Re: [PATCHES] 64-bit CommandIds
Date:    Tue, 04 Mar 2008 21:52:25 +0100
From:    Zoltan Boszormenyi [EMAIL PROTECTED]
To:      pgsql-patches [EMAIL PROTECTED]
CC:      Alvaro Herrera [EMAIL PROTECTED], Hans-Juergen Schoenig
[EMAIL PROTECTED]
References: [EMAIL PROTECTED]
[EMAIL PROTECTED]




Alvaro Herrera wrote:

Zoltan Boszormenyi wrote:

  

attached is our patch against HEAD which enables extending CommandIds
to 64-bit. This is for enabling long transactions that really do that much
non-read-only work in one transaction.



I think you should add a pg_control field and corresponding check, to
avoid a 64bit-Cid postmaster to start on a 32bit-Cid data area and vice
versa.
  


I added the check, but I needed to add it BEFORE the check on
toast_max_chunk_size, otherwise it complained about that more
cryptic problem instead. I think it's cleaner to report this failure
than to make the user wonder why toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

diff -dcrpN pgsql.orig/configure pgsql-cid64/configure
*** pgsql.orig/configure	2008-03-02 13:44:42.0 +0100
--- pgsql-cid64/configure	2008-03-04 16:53:46.0 +0100
*** if test -n $ac_init_help; then
*** 1349,1354 
--- 1349,1355 
  Optional Features:
--disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
--enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
+   --enable-huge-commandid enable 64-bit CommandId support
--enable-integer-datetimes  enable 64-bit integer date/time support
--enable-nls[=LANGUAGES]  enable Native Language Support
--disable-shareddo not build shared libraries
*** fi
*** 2175,2180 
--- 2176,2219 
  
  
  #
+ # 64-bit CommandId
+ #
+ echo "$as_me:$LINENO: checking whether to build with 64-bit CommandId support" >&5
+ echo $ECHO_N "checking whether to build with 64-bit CommandId support... $ECHO_C" >&6
+ 
+ pgac_args="$pgac_args enable_huge_commandid"
+ 
+ # Check whether --enable-huge-commandid or --disable-huge-commandid was given.
+ if test "${enable_huge_commandid+set}" = set; then
+   enableval="$enable_huge_commandid"
+ 
+   case $enableval in
+ yes)
+ 
+ cat >>confdefs.h <<\_ACEOF
+ #define USE_64BIT_COMMANDID 1
+ _ACEOF
+ 
+   ;;
+ no)
+   :
+   ;;
+ *)
+   { { echo "$as_me:$LINENO: error: no argument expected for --enable-huge-commandid option" >&5
+ echo "$as_me: error: no argument expected for --enable-huge-commandid option" >&2;}
+    { (exit 1); exit 1; }; }
+   ;;
+   esac
+ 
+ else
+   enable_huge_commandid=no
+ 
+ fi;
+ 
+ echo "$as_me:$LINENO: result: $enable_huge_commandid" >&5
+ echo "${ECHO_T}$enable_huge_commandid" >&6
+ 
+ #
  # 64-bit integer date/time storage (--enable-integer-datetimes)
  #
  { echo "$as_me:$LINENO: checking whether to build with 64-bit integer date/time support" >&5
diff -dcrpN pgsql.orig/configure.in pgsql-cid64/configure.in
*** pgsql.orig/configure.in	2008-03-02 13:44:43.0 +0100
--- pgsql-cid64/configure.in	2008-03-04 16:53:46.0 +0100
*** PGAC_ARG_REQ(with, libs,  [  --with-
*** 128,133 
--- 128,142 
  
  
  #
+ # 64-bit CommandId
+ #
+ AC_MSG_CHECKING([whether to build with 64-bit CommandId support])
+ PGAC_ARG_BOOL(enable, huge-commandid, no, [  --enable-huge-commandid enable 64-bit CommandId support],
+ 		[AC_DEFINE([USE_64BIT_COMMANDID], 1,
+ 			[Define to 1 if you want 64-bit CommandId support. (--enable-huge-commandid)])])
+ AC_MSG_RESULT([$enable_huge_commandid])
+ 
+ #
  # 64-bit integer date/time storage (--enable-integer-datetimes)
  #
  AC_MSG_CHECKING([whether to build with 64-bit integer date/time support])
diff -dcrpN pgsql.orig/doc/src/sgml/installation.sgml pgsql-cid64/doc/src/sgml/installation.sgml
*** pgsql.orig/doc/src/sgml/installation.sgml	2008-02-18 13:49:58.0 +0100
--- pgsql-cid64/doc/src/sgml/installation.sgml	2008-03-04 17:16:14.0 +0100
*** su - postgres
*** 1011,1016 
--- 1011,1027 
   </varlistentry>
  
   <varlistentry>
+    <term><option>--enable-huge-commandid</option></term>
+    <listitem>
+     <para>
+      Use 64-bit CommandIds if you are planning to run transactions
+      consisting of more than 4 billion commands.  This is off by default
+      to save disk space.
+     </para>
+    </listitem>
+   </varlistentry>
+ 
+   <varlistentry>
     <term><option>--enable-integer-datetimes</option></term>
     <listitem>
      <para>
diff -dcrpN pgsql.orig/src/backend/access/transam/xact.c pgsql-cid64/src/backend/access/transam/xact.c
*** pgsql.orig/src

[HACKERS] IDENTITY/GENERATED patch

2008-02-18 Thread Zoltan Boszormenyi

Hi,

as the wishlist for PostgreSQL 8.4 still has my
IDENTITY/GENERATED patches, I thought I'd refresh them.
Before actually doing that, though, I wanted to ask for opinions
on implementation and ideas.

Here are the general ideas that were done by my patch:

1. IDENTITY columns.

The IDENTITY columns are similar to SERIALs as featured
by current PostgreSQL with the differences below.

They can be of almost any basic type, i.e. not only INTEGER or BIGINT.

There are two types of them:
GENERATED BY DEFAULT [ AS ( sequence_options ) ]
and
GENERATED ALWAYS [ AS ( sequence_options ) ]

The first behaves more or less like SERIAL in current PostgreSQL,
i.e. upon INSERT and UPDATE such a field can be assigned a value
explicitly, it can be updated, etc. So this can be viewed as
nothing more than a SERIAL pseudo-type column, followed by
the statements below:

ALTER TABLE ... ALTER COLUMN ... TYPE ...;
ALTER SEQUENCE seq_name sequence_options;

So the above form is mostly syntax sugar over already existing
features. In fact, this was the reason its acceptance was shot down.
The standard describes some behavioural differences that may make
it worth distinguishing from plain SERIALs.

However, IDENTITY GENERATED ALWAYS has some more
constraints. This is more like the autoincrementing fields in other
DBMSs, where upon INSERT, usually the value 0 indicates that the
next value should be generated. With the standard behaviour, any
explicit value is ignored and the sequence's next value is generated.
UPDATE can only use the DEFAULT keyword, not an explicit value.
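Per the standard's OVERRIDING clauses, the intended behaviour is roughly
the following (a sketch, not necessarily this patch's exact syntax):

CREATE TABLE t (id integer GENERATED ALWAYS AS IDENTITY, v text);
INSERT INTO t (id, v) VALUES (5, 'a');         -- error: id is GENERATED ALWAYS
INSERT INTO t (id, v) OVERRIDING SYSTEM VALUE
    VALUES (5, 'a');                           -- explicit value accepted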

It seems the last draft for SQL:2008 largely clarified details for
IDENTITY columns that were either conflicting or unclear in SQL:2003.

2. GENERATED columns

The GENERATED column is an automatic way of computing
expressions over columns in the same row. This is a feature of DB2.
Currently this can be implemented as a BEFORE TRIGGER
in PostgreSQL, but that requires being a little familiar with writing
PL/pgSQL functions. The syntax of GENERATED columns
allows using plain expressions, so it's an easy-to-use feature.

The GENERATED columns are real, stored columns, not
virtual columns computed on the fly. This means that the value
computation cost falls on the INSERT and UPDATE statements
instead of the SELECTs. This provides a nice speedup, which
I tested, but I cannot currently find the mail in the archive.
Despite this demonstrable speedup, exactly this non-virtual nature
was the reason Tom Lane didn't like it.
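For illustration, the trigger-based equivalent of a stored GENERATED
column looks like this today (a sketch; table and column names are
hypothetical):

create function compute_total() returns trigger as $$
begin
    new.total := new.price * new.qty;  -- expression over columns of the same row
    return new;
end;
$$ language plpgsql;

create trigger orders_total before insert or update on orders
    for each row execute procedure compute_total();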

There's a problem with GENERATED columns, however, that
was pointed out regarding BEFORE TRIGGERs. For speed,
the GENERATED columns' values can be computed after all
BEFORE TRIGGERs have been executed. However, this means
that functions running as BEFORE TRIGGERs see the old values
of those columns. This detail was mentioned in some of the
previous drafts but not in the latest SQL:2008 draft. This means
such columns MUST be recomputed before running every
BEFORE TRIGGER and after the last one, too.
The performance hit in this case is large. A GUC variable could be
introduced to make both the fast and the correct behaviour possible.

Based on my last patch:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php
what problems can you see in it that need fixing? I know it extends
the standard (e.g. DROP IDENTITY, etc.). However, would a strictly
standard-compliant IDENTITY/GENERATED get accepted into
PostgreSQL? Which extensions in the patch are worth keeping?

There are two possible problems that I know of.
I need to fix the OVERRIDING USER VALUE case, as it isn't a no-op.
This convinced me that the PostgreSQL SERIAL is not equivalent
to the IDENTITY GENERATED BY DEFAULT column, and
the requirement that only one IDENTITY column may exist in
a table is strongly assumed by the OVERRIDING clauses.

And I just discovered an extension of TRUNCATE TABLE
in SQL:2008 that makes it possible to restart the sequence behind
the identity column upon TRUNCATE.

What else?

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] Error building 32 bit on 64 bit linux system

2008-02-18 Thread Zoltan Boszormenyi

Doug Knight wrote:

All,
I am trying to build 8.2.5, forcing to a 32 bit build on a 64 bit 
system. I have set CFLAGS=-m32, and I run the configure and make/make 
install as follows:


setarch i386 ./configure
setarch i386 make
setarch i386 make install

However, I get the following error (using timezone for example):

$ make
gcc -m32 -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-I../../src/include -D_GNU_SOURCE   -c -o localtime.o localtime.c
gcc -m32 -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-I../../src/include -D_GNU_SOURCE   -c -o strftime.o strftime.c
gcc -m32 -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-I../../src/include -D_GNU_SOURCE   -c -o pgtz.o pgtz.c

/usr/bin/ld -r -o SUBSYS.o localtime.o strftime.o pgtz.o
/usr/bin/ld: Relocatable linking with relocations from format 
elf32-i386 (localtime.o) to format elf64-x86-64 (SUBSYS.o) is not 
supported

make: *** [SUBSYS.o] Error 1

Funny thing is, there is no SUBSYS.o in my current directory. If I
build from the top, I see this same error in each directory/makefile
where a SUBSYS.o is linked. If I search my build tree after a
top-down build, I do not see any SUBSYS.o files at all. Where is this
SUBSYS.o getting created, and why isn't it being created as a 32 bit
file instead of 64 bit?


This problem is solved by using

export CC="gcc -m32"
export LD="ld -melf_i386"

before doing

setarch i386 (./configure ... ; make)

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] IDENTITY/GENERATED patch

2008-02-18 Thread Zoltan Boszormenyi

Tom Lane wrote:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

as the wishlist for PostgreSQL 8.4 still has my
IDENTITY/GENERATED patches, I thought I refresh it.
Before actually doing it though, I wanted to ask for opinions
on implementation and ideas.



IIRC the end conclusion on that patch was that it was awfully messy,
didn't really buy much in new functionality, and was trying to track a
portion of the spec that the SQL committee had apparently not gotten
right yet.

  

It seems the last draft for SQL:2008 largely clarified details for
IDENTITY columns that were either conflicting or unclear in SQL:2003.



Implementing stuff that's still changing in a *draft* standard seems
to me like a good way to get burnt.
  


From http://www.wiscorp.com/SQLStandards.html :

SQL:2008 Draft International Standard Documents
http://www.wiscorp.com/sql200n.zip (Updated 12/14/2007) - *Updated!!!*
This points to the documents which will likely be the documents that
represent the SQL 2008 Standard.

These documents are out for International Standard ballot at this time.
The vote is an Up/Down vote. No changes allowed.

I think it's time to start looking at it.


I think this needs to stay on the back burner, at least until there's
a published final spec that doesn't seem as broken as 2003 was.  I
was sufficiently disillusioned with the whole thing at the end of
the previous review cycle
http://archives.postgresql.org/pgsql-hackers/2007-05/msg00441.php
that I'm not really eager to pour more time down the same hole.

regards, tom lane

  



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] convert int to bytea

2007-11-29 Thread Zoltan Boszormenyi

Hi,

please don't top-post to someone who didn't use this convention
in answering you. It's impolite. I edited the mail a bit to restore sanity.

On Nov 29, 2007 9:00 PM, Douglas McNaught [EMAIL PROTECTED] wrote:


On 11/29/07, [EMAIL PROTECTED] wrote:

 On Thu, 29 Nov 2007, Gregory Stark wrote:

  What do you want the resulting bytea to look like?
 
 example : id = 9 , bytea = '\000\000\011' IIRC

What do you expect to happen when server and client are
differently-endian?

-Doug



Usama Dar wrote:
 Does it matter if you have written an explicit cast for int to bytea?


You don't know what endianness is, do you?
Say you have a number: 0x12345678.
This is stored differently depending on the endianness.

Big-endian (like Sparc, Motorola, etc.):
0x12 0x34 0x56 0x78

Little-endian (Intel-compatibles, etc.):
0x78 0x56 0x34 0x12

So, how do you want your number to come out as a byte array?
Since a bytea is a sequence of bytes as stored in memory,
an int-to-bytea conversion can mean different things.

It's your homework to look up what network order is. :-)
But it would give you a consistent answer no matter
what CPU your server uses.
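As a sketch of what already exists at the SQL level: the binary send
functions emit the wire format, which is network order, e.g.:

select int4send(9);  -- \000\000\000\011, the big-endian bytes of 9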

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/





[HACKERS] Problem with pg_dump -n schemaname

2007-11-16 Thread Zoltan Boszormenyi

Hi,

we came across a problem when you want to dump only one schema.
The plain-text output, when loaded with psql into an empty database,
doesn't produce a schema identical to the original.
The problem comes from this statement ordering:

SET ... -- some initial DB parameters
...
SET search_path = schemaname, pg_catalog;
   -- the above fails because no schema with this name exists
   -- as a consequence, the original search_path (e.g. "$user", public)
   --   is not modified

DROP INDEX schemaname.index1;
...
DROP TABLE schemaname.table1;
DROP SCHEMA schemaname;

CREATE SCHEMA schemaname;
ALTER SCHEMA schemaname OWNER TO schemaowner;

CREATE TABLE table1; -- note that it was DROPped with its full name schemaname.table1

...

So, because search_path is '"$user", public' for e.g. postgres,
the tables are created in the public schema. Hence I propose
the attached patch, which issues the SET search_path = ... statements
before the first CREATE TABLE statement in their respective schema
instead of before the first DROP command.
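With the patch, the single-schema dump would come out roughly in this
order instead (a sketch):

DROP TABLE schemaname.table1;
DROP SCHEMA schemaname;

CREATE SCHEMA schemaname;
ALTER SCHEMA schemaname OWNER TO schemaowner;

SET search_path = schemaname, pg_catalog;

CREATE TABLE table1 (...);  -- now created in the intended schema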

The problem manifests only when you dump a single schema.
The same problem exists in at least 8.0.3, 8.2.5 and the latest 8.3 CVS.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

--- postgresql-8.2.5.orig/src/bin/pg_dump/pg_backup_archiver.c	2007-08-06 03:38:24.0 +0200
+++ postgresql-8.2.5/src/bin/pg_dump/pg_backup_archiver.c	2007-11-16 11:00:46.0 +0100
@@ -241,9 +241,6 @@
 			{
 /* We want the schema */
 				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
-				/* Select owner and schema as necessary */
-				_becomeOwner(AH, te);
-				_selectOutputSchema(AH, te->namespace);
 				/* Drop it */
 				ahprintf(AH, "%s", te->dropStmt);
 			}
@@ -275,6 +272,10 @@
 		{
 			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
 
+			/* Select owner and schema as necessary */
+			_becomeOwner(AH, te);
+			_selectOutputSchema(AH, te->namespace);
+
 			_printTocEntry(AH, te, ropt, false, false);
 			defnDumped = true;
 



Re: [HACKERS] support for hyper-long transactions ...

2007-10-16 Thread Zoltan Boszormenyi

Heikki Linnakangas wrote:

Hans-Juergen Schoenig wrote:
  

at the moment i am working on an application which is supposed to run
extremely large transactions (a lot of server side stored procedure
stuff which can hardly be split into small transactions for visibility
reasons).
so, from time to time it happens that i exceed my CommandCounter
(> 2.000.000.000 statements inside the same transaction).



Interesting. What kind of a stored procedure is that?

One optimization that might be possible is to not increment it for
statements that do no on-disk changes.

  

my idea is: how about adding a configure option to compile postgres with
a 64 bit command counter. this would allow larger transactions for
special purpose applications while it would not have an impact on normal
applications.



One objection to expanding TransactionId to 64-bits has been that we
depend on assignment of TransactionId to be atomic. That might not be an
issue with CommandIds; I don't think we store commandids in shared
memory structures. It still doesn't feel like a good idea to me, though.
  


Isn't 64-bit assignment atomic on 64-bit machines?
With a little autoconf magic, the conditional can be
disabled for 32-bit platforms. So that's not a real obstacle
for going to 64-bit TransactionIds.
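
A rough sketch of what the conditional could look like, reusing the
integer typedefs from c.h (USE_64BIT_COMMANDID is a hypothetical
configure-defined symbol here, not an existing PostgreSQL macro):

    #ifdef USE_64BIT_COMMANDID
    typedef uint64 CommandId;	/* special-purpose builds: huge transactions */
    #else
    typedef uint32 CommandId;	/* today's definition, no extra overhead */
    #endif

    #define FirstCommandId	((CommandId) 0)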

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] XID wraparound and busy databases

2007-08-15 Thread Zoltan Boszormenyi

Tom Lane írta:

It's hard to see how anyone could be doing 6K xacts/sec unless most are
read-only.

regards, tom lane
  


In a recent stress test with our PostgreSQL-based cluster between two
machines, 3 million transactions were performed with pgbench -c 150 -t 2
-s 200 in about _ten hours_. The primary machine (a desktop-level machine
for development) used a real disk, the secondary used tmpfs as PGDATA.
Say whatever you want about my disk lying about flush, its 75MB/sec
transfer rate is real.

So 5 million real transactions in 24 hours is not unrealistic.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] XID wraparound and busy databases

2007-08-15 Thread Zoltan Boszormenyi

Heikki Linnakangas írta:

Zoltan Boszormenyi wrote:
  

Tom Lane írta:


It's hard to see how anyone could be doing 6K xacts/sec unless most are
read-only.
  

In a recent stress test with our PostgreSQL-based cluster between two
machines, 3 million transactions were performed with pgbench -c 150 -t 2
-s 200 in about _ten hours_. The primary machine (a desktop-level machine
for development) used a real disk, the secondary used tmpfs as PGDATA.
Say whatever you want about my disk lying about flush, its 75MB/sec
transfer rate is real.
So 5 million real transactions in 24 hours is not unrealistic.



6k xacts / s is five *hundred* million transactions, not five million...
  


Blush. :-) You're right. However a single machine with ramdisk is able 
to do that.


--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/





Re: [HACKERS] New cast between inet/cidr and bytea

2007-05-31 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

Bruce Momjian írta:


What is the use case for such a cast?
  


  

The application doesn't want to parse the textual IP address
when all the parsing and checking intelligence is already there
in the inet/cidr type checks.



This presumes exactly the assumption we are questioning, namely that
there's a universal binary representation for these things.


But there is: network order.


  There might
be such for bare IP addresses (ignoring endianness) but the argument
doesn't scale to CIDR.


Would you enlighten me why not?


  You've also failed to make the case that this
application designer has made a sane judgment about whether avoiding
parsing is a good tradeoff here.
  


So, reinventing the wheel is always the way to go?
Even when the app is actually storing those IP addresses
with the type and features PostgreSQL provides?


Also: to the extent that the application is willing to deal with a
Postgres-specific inet/cidr representation (which, in the end, is
what this would be) it can do that *today* using binary output format.
So I'm still not seeing an argument for exposing a cast to bytea.

regards, tom lane
  


But the binary output of inet/cidr needs another round of parsing
which requires using internal server headers.
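
For reference, here is roughly what that parsing looks like on the
client side (a minimal sketch; it assumes inet_send()'s wire layout --
one byte each for address family, netmask bits and the is_cidr flag, one
byte of address length, then the address bytes -- and PGSQL_AF_INET is
exactly the kind of constant that only lives in those internal headers):

    #include <string.h>

    typedef struct
    {
        unsigned char family;	/* PGSQL_AF_INET or PGSQL_AF_INET6 */
        unsigned char bits;	/* netmask length in bits */
        unsigned char is_cidr;	/* cidr (1) vs. inet (0) */
        unsigned char nbytes;	/* 4 for IPv4, 16 for IPv6 */
        unsigned char addr[16];	/* address bytes, network order */
    } DecodedInet;

    /* Returns 0 on success, -1 on a malformed buffer. */
    int
    decode_inet(const unsigned char *buf, int len, DecodedInet *out)
    {
        if (len < 4 || buf[3] > 16 || len < 4 + buf[3])
            return -1;
        out->family = buf[0];
        out->bits = buf[1];
        out->is_cidr = buf[2];
        out->nbytes = buf[3];
        memcpy(out->addr, buf + 4, out->nbytes);
        return 0;
    }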

Would you like a 4/8/16/32 byte output using IP only
or IP + fully represented netmask better?

Best regards,

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] New cast between inet/cidr and bytea

2007-05-30 Thread Zoltan Boszormenyi

Hi,

we at Cybertec have developed cast functions in C between
inet/cidr <-> bytea for a client and we would like to submit it.

This is how it works:
- An IPv4 inet/cidr value will return 4 bytes if the netmask covers
  all 32 bits. It returns 5 bytes if the netmask is shorter than 32 bits.
- Similarly for IPv6, 16[+1] bytes are returned.
- 4, 5, 16 or 17 bytes long bytea values are converted
  to an inet/cidr value; the 5th or 17th byte is range-checked
  to be a valid netmask value.
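
For illustration, a minimal C sketch of the IPv4 half of that encoding
(pack_ipv4() is a hypothetical stand-in here, not the submitted code):

    #include <stdio.h>
    #include <string.h>

    /* 4 address bytes in network order, plus one netmask byte only when
     * the netmask is shorter than 32 bits.  Returns the number of bytes
     * written (4 or 5). */
    static int
    pack_ipv4(const unsigned char addr[4], int bits, unsigned char *out)
    {
        memcpy(out, addr, 4);
        if (bits >= 32)
            return 4;
        out[4] = (unsigned char) bits;	/* valid range is 0..31 here */
        return 5;
    }

    int
    main(void)
    {
        unsigned char ip[4] = {192, 168, 0, 0};
        unsigned char buf[5];
        int		len = pack_ipv4(ip, 24, buf);	/* 192.168.0.0/24 */
        int		i;

        for (i = 0; i < len; i++)
            printf("\\%03o", buf[i]);		/* prints \300\250\000\000\030 */
        putchar('\n');
        return 0;
    }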

What are the requirements for it to be accepted
as a core cast function or as a contrib module?

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] New cast between inet/cidr and bytea

2007-05-30 Thread Zoltan Boszormenyi

Bruce Momjian írta:

Zoltan Boszormenyi wrote:
  

Hi,

we at Cybertec have developed cast functions in C between
inet/cidr <-> bytea for a client and we would like to submit it.

This is how it works:
- An IPv4 inet/cidr value will return 4 bytes if the netmask covers
  all 32 bits. It returns 5 bytes if the netmask is shorter than 32 bits.
- Similarly for IPv6, 16[+1] bytes are returned.
- 4, 5, 16 or 17 bytes long bytea values are converted
  to an inet/cidr value; the 5th or 17th byte is range-checked
  to be a valid netmask value.

What are the requirements for it to be accepted
as a core cast function or as a contrib module?



You discuss it on the hackers list.


That's what I am trying to do.

  Have you read the developer's FAQ? 
  


Not yet in its entirety.


What is the use case for such a cast?
  

e.g.:
# select '192.168.0.1'::inet::bytea;
      bytea
------------------
 \300\250\000\001
(1 row)

# select '192.168.0.0/24'::inet::bytea;
        bytea
----------------------
 \300\250\000\000\030
(1 row)

# select decode('\\300\\250\\000\\001', 'escape')::inet;
   decode
-------------
 192.168.0.1
(1 row)

# select decode('\\300\\250\\000\\000\\030', 'escape')::inet;
     decode
----------------
 192.168.0.0/24
(1 row)

Similarly for IPv6 addresses.
The application doesn't want to parse the textual IP address
when all the parsing and checking intelligence is already there
in the inet/cidr type checks. The reverse direction, when you pass
in bytes, is only a logical extension.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] New cast between inet/cidr and bytea

2007-05-30 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

we at Cybertec have developed cast functions in C between
inet/cidr <-> bytea for a client and we would like to submit it.



Why is this a good idea?  Exposing the internal representation of a
datatype is usually bad.


I didn't expose the inet/cidr internals.

An IP address is an IP address whether you represent it
textually or as a series of bytes. Since the rise of CIDR-style
netmask usage, there is not much support on the 'net for netmasks
with holes in the middle of the bits, like 255.255.192.128.
And there is no support for this kind of netmask
in PostgreSQL either. So, to avoid losing data (the netmask _is_
important data) we decided to go this way.
A single IP address without the netmask means the netmask
covers the whole 32 or 128 bits in real life, too.


  What will you do when we add support for
scoped IPv6 addressing, to take one obvious example?
  


It can still be represented as a series of bytes, can't it?
Just as in an actual IP packet header.
When the support arrives, I can fix the cast, too, if needed.


regards, tom lane
  


Best regards

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Tom Lane írta:

After some more study of the SQL spec, the distinction between GENERATED
ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what
I thought it was.

* As far as I can find from the spec, there is *no* difference between
the two cases for INSERT commands.  The rule is that you ignore any
user-supplied data and use the default (ie, nextval()) unless OVERRIDING
SYSTEM VALUE is specified.  It is not an error to try to insert data
into an identity column, it's just ignored unless OVERRIDING SYSTEM
VALUE.

* The difference for UPDATE commands is that you can update a BY DEFAULT
identity column to anything you want, whereas for an ALWAYS identity
it's an error to update to anything but DEFAULT (which causes a fresh
nextval() to be assigned).  Both behaviors are different from a
generated column, which is updated whether you mention it or not.
  


The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT, but
this behaviour is not mentioned by the standard.
BTW, do you know what a self-referencing column is?
I haven't found a definition of it, and there are places where the standard
uses this term for behaviour that would be natural for generated columns.
E.g. page 860 in the latest drafts, section 10.14, or SQL:2003, section 14.8,
about the INSERT statement: the value the user specified should be stored if
some underlying column of Ci is a self-referencing column and
OVERRIDING SYSTEM VALUE is specified.


This means that GENERATED BY DEFAULT AS IDENTITY is not at all
equivalent to our historical behavior for SERIAL columns and hence we
cannot merge the two cases.
  


Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf
or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts.
(The latter seems to be misnamed considering that www.wiscorp.com
refreshed the sql200n.zip on 2007-03-11.) Page 860, section 14.10, INSERT.
The value the user provides should be accepted for storage if:
- the column is an identity column and you provide
  OVERRIDING SYSTEM VALUE, or
- the column is GENERATED BY DEFAULT AS IDENTITY
  and you provide neither OVERRIDING USER VALUE nor
  the DEFAULT specification for the column.

I think the babble about OVERRIDING USER VALUE
is somewhat controversial. Why would you want to do e.g.
INSERT INTO tabname (id, ...) OVERRIDING USER VALUE VALUES (N, ...);
where N is an explicit constant? And I haven't even implemented
handling it. Anyway, without specifying OVERRIDING USER VALUE,
GENERATED BY DEFAULT AS IDENTITY is
equivalent to the traditional SERIAL in PostgreSQL.
Implementing the OVERRIDING USER VALUE behaviour means
that GENERATED BY DEFAULT AS IDENTITY (or SERIAL) would be
marked as an identity as well, not as a column simply having a DEFAULT
clause.

Otherwise OVERRIDING USER VALUE would override every
user-specified value for regular columns having a DEFAULT expression.


The lack of any behavioral difference for INSERT seems surprising
and counterintuitive; have I just missed something in the spec?
  


No, I was just ahead of the times and read newer drafts than SQL:2003.


BTW, I found what they did about the problem that generated columns
are out of sync with their underlying columns during BEFORE-trigger
execution: in 11.39

	12)If BEFORE is specified, then: 
	   ...

   c) The triggered action shall not contain a field
  reference that references a field in the new transition
  variable corresponding to a generated column of T.
  


I vaguely remember reading it, although the idea seems to have remained
in my mind. :-)


IOW they just pretend you can't look.  So I think we need not worry
about leaving the values out-of-date until after the triggers fire.

regards, tom lane


  



--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT but
this behaviour is not mentioned by the standard.


I found it now; I hadn't read hard enough before.
SQL:2003, section 14.8, syntax rules:

10) If <contextually typed table value constructor> CTTVC is specified,
    then every <contextually typed row value constructor element> simply
    contained in CTTVC whose positionally corresponding <column name>
    in <insert column list> references a column of which some underlying
    column is a generated column shall be a <default specification>.

So, I can only omit the generated column or specify DEFAULT.
Anything else should raise an error. Should it be done in analyze.c
or in rewriteHandler.c?

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] parser dilemma

2007-04-20 Thread Zoltan Boszormenyi

Martijn van Oosterhout írta:

On Thu, Apr 19, 2007 at 11:19:40AM +0200, Zoltan Boszormenyi wrote:
  

The problem comes from cases like

colname coltype DEFAULT 5! GENERATED ...

Since b_expr allows postfix operators, it takes one more token of
lookahead than we have to tell if the default expression is "5!"
or "5 ! GENERATED ...".
  


ISTM that as long as:

 colname coltype DEFAULT (5!) GENERATED ...

works I don't see why it would be a problem to require the parentheses
in this case. Postfix operators are not going to be that common here I
think.

Have a nice day,
  


You mean like this one?

*** gram.y.old	2007-04-20 09:23:16.000000000 +0200
--- gram.y	2007-04-20 09:25:34.000000000 +0200
***************
*** 7550,7557 ****
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
  		| qual_Op b_expr			%prec Op
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
! 		| b_expr qual_Op			%prec POSTFIXOP
! 			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
  		| b_expr IS DISTINCT FROM b_expr	%prec IS
  			{
  				$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
--- 7550,7557 ----
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
  		| qual_Op b_expr			%prec Op
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
! 		| '(' b_expr qual_Op ')'		%prec POSTFIXOP
! 			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $3, $2, NULL, @3); }
  		| b_expr IS DISTINCT FROM b_expr	%prec IS
  			{
  				$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);



This change alone brings 13 reduce/reduce conflicts.

Best regards

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] parser dilemma

2007-04-20 Thread Zoltan Boszormenyi

Andrew Dunstan írta:

Zoltan Boszormenyi wrote:

Martijn van Oosterhout írta:

On Thu, Apr 19, 2007 at 11:19:40AM +0200, Zoltan Boszormenyi wrote:
 

The problem comes from cases like

colname coltype DEFAULT 5! GENERATED ...

Since b_expr allows postfix operators, it takes one more token of
lookahead than we have to tell if the default expression is "5!"
or "5 ! GENERATED ...".
  


ISTM that as long as:

 colname coltype DEFAULT (5!) GENERATED ...

works I don't see why it would be a problem to require the parentheses
in this case. Postfix operators are not going to be that common here I
think.

Have a nice day,
  


You mean like this one?

*** gram.y.old	2007-04-20 09:23:16.000000000 +0200
--- gram.y	2007-04-20 09:25:34.000000000 +0200
***************
*** 7550,7557 ****
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
  		| qual_Op b_expr			%prec Op
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
! 		| b_expr qual_Op			%prec POSTFIXOP
! 			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
  		| b_expr IS DISTINCT FROM b_expr	%prec IS
  			{
  				$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
--- 7550,7557 ----
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
  		| qual_Op b_expr			%prec Op
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
! 		| '(' b_expr qual_Op ')'		%prec POSTFIXOP
! 			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $3, $2, NULL, @3); }
  		| b_expr IS DISTINCT FROM b_expr	%prec IS
  			{
  				$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);



This change alone brings 13 reduce/reduce conflicts.




No - that's not what you do. All you need to do is remove those 2 
lines from the b_expr rules. The postfix rule will still be in a_expr 
and the parenthesized bit is taken care of in the ( a_expr ) rule for 
c_expr.


cheers

andrew


I just sent a new patch that marks GENERATED as %right,
which also solved the problem.

Best regards,
Zoltán

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] parser dilemma

2007-04-19 Thread Zoltan Boszormenyi

Tom Lane írta:

...
If anyone seriously wants to propose removing postfix ops from b_expr,
we'd better take it up on someplace more widely read than -patches.

regards, tom lane
  


OK, I take the bullet and send it to -hackers.

For everyone who don't read -patches, let me reiterate the problem

During developing my GENERATED/IDENTITY patches,
a parser problem turned up.

Currently, DEFAULT is handled as a CONSTRAINT by the parser
to be able to write DEFAULT clause and CONSTRAINT clauses
in any order. Handling GENERATED { ALWAYS | BY DEFAULT}
AS { IDENTITY | ( expression ) } syntax in the same way causes
a conflict between DEFAULT and b_expr as discovered by Tom Lane.
He proposed two solutions, quote:


The problem comes from cases like

colname coltype DEFAULT 5! GENERATED ...

Since b_expr allows postfix operators, it takes one more token of
lookahead than we have to tell if the default expression is "5!"
or "5 ! GENERATED ...".

There are basically two ways to fix this:

1. Collapse GENERATED ALWAYS and GENERATED BY into single tokens
using filtered_base_yylex.

2. Stop allowing postfix operators in b_expr.

I find #1 a bit icky --- not only does every case added to
filtered_base_yylex slow down parsing a little more, but combined
tokens create rough spots in the parser's behavior.  As an example,
both NULLS and FIRST are allegedly unreserved words, so this should
work:

regression=# create table nulls (x int);
CREATE TABLE
regression=# select first.* from nulls first;
ERROR:  syntax error at or near "first"
LINE 1: select first.* from nulls first;
                                  ^
regression=#

#2 actually seems like a viable alternative: postfix operators aren't
really in common use, and doing this would not only fix GENERATED but
let us de-reserve a few keywords that are currently reserved.  In a
non-exhaustive check I found that COLLATE, DEFERRABLE, and INITIALLY
could become unreserved_keyword if we take out this production:

*** 7429,7436 ****
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
  		| qual_Op b_expr			%prec Op
  			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
- 		| b_expr qual_Op			%prec POSTFIXOP
- 			{ $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
  		| b_expr IS DISTINCT FROM b_expr	%prec IS
  			{
  				$$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
--- 7550,7555 ----

(Hmm, actually I'm wondering why COLLATE is a keyword at all right
now... but the other two trace directly to the what-comes-after-DEFAULT
issue.)


I proposed a third solution, that is actually standard-conforming
and still leaves the possibility of having postfix operators.
The solution was to admit that DEFAULT is not a CONSTRAINT,
hence not mixable with them. The standard has this syntax:

<column definition> ::=
    <column name> [ <data type or domain name> ]
    [ <default clause> | <identity column specification> | <generation clause> ]
    [ <column constraint definition>... ]
    [ <collate clause> ]

This says that DEFAULT | GENERATED ... AS IDENTITY |
GENERATED ALWAYS AS ( expr ) must come after the data type
and before any CONSTRAINTs and the three forms are mutually exclusive.
This can be nicely handled by the parser and the analyzer phase
can save some cycles by not checking for conflicting DEFAULT clauses.

What do people think? Which would be the preferred solution?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] Re: IDENTITY/GENERATED v36 Re: [PATCHES] Final version of IDENTITY/GENERATED patch

2007-04-16 Thread Zoltan Boszormenyi

Hi,

Zoltan Boszormenyi írta:

Zoltan Boszormenyi írta:

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 

So, I should allow DROP DEFAULT, implement
SET DEFAULT GENERATED ALWAYS AS
and modify the catalog so the GENERATED property
is part of pg_attrdef.



Sounds good.
  


Finally here it is.


What about IDENTITY?
Should it also be part of pg_attrdef? There are two ways
to implement it: to have or not have a notion of it.
The latter would treat GENERATED BY DEFAULT AS IDENTITY
the same as SERIAL.



Is there any good reason to distinguish the two?
  


Actually, I needed to have a flag for IDENTITY
but not for the reason above. I need it to distinguish
between GENERATED ALWAYS AS IDENTITY
and GENERATED ALWAYS AS ( expr ).

Changes:
- Rewritten the GENERATED/IDENTITY flags to be part of the default in
  pg_attrdef. This made the patch MUCH smaller.
- SERIALs are now the same as INTEGER GENERATED BY DEFAULT AS IDENTITY
- Allow DROP DEFAULT on GENERATED/IDENTITY columns
- Implemented SET GENERATED ALWAYS AS
- Modified the syntax of SET GENERATED {ALWAYS | BY DEFAULT} AS IDENTITY
  so it reads as SET IDENTITY GENERATED {ALWAYS | BY DEFAULT},
  so compiling gram.y/gram.c doesn't give me errors.
  This DDL statement isn't part of SQL:2003 so it might be accepted
  as a PostgreSQL extension.
- Modified the behaviour of SET IDENTITY to also restore the DEFAULT
  expression. Someone might have done a DROP DEFAULT before
  but kept the OWNED sequence.
- Fixed the behaviour of GENERATED columns regarding
  INSERT ... OVERRIDING SYSTEM VALUE; also, only those GENERATED columns
  get UPDATEd that are either explicitly modified with SET column = DEFAULT
  or have one of their referenced columns modified.
- Testcase and documentation are modified to reflect the above.


- Also allowed UPDATE on IDENTITY columns.



Please, review.




I just realized that by treating SERIAL the same as
IDENTITY GENERATED BY DEFAULT, I incidentally
broke the possibility of multiple SERIALs in the same table.
I rewrote the patch so instead of two BOOL flags,
I now have only one CHAR flag:
- ' ' says it's a simple DEFAULT expression
- 'i' says it's GENERATED ALWAYS AS IDENTITY
- 'g' says it's GENERATED ALWAYS AS ( expr )

Apart from making the patch a bit smaller again, checking only
for 'i' still allows multiple SERIALs in the same table but allows
disallowing multiple GENERATED ALWAYS AS IDENTITY columns.
Thinking a bit about it, is it desired to disallow multiple GENERATED
ALWAYS AS IDENTITY fields? It's just a twisted SERIAL anyway.
And it was said many times that it's not an advantage to blindly
follow the standard.

Also, DROP IDENTITY is equivalent to SET DEFAULT
nextval('owned_sequence') iff the field has an OWNED
sequence and it was GENERATED ALWAYS AS IDENTITY before.
Considering that SERIAL is a macro, and SET/DROP DEFAULT is allowed
on IDENTITY/GENERATED columns per Tom's request,
should I keep this statement?

Also, the current grammar is made to give a syntax error
if you say colname type GENERATED BY DEFAULT AS ( expr ).
But it makes the grammar unbalanced, and gives me:

bison -y -d  gram.y
conflicts: 2 shift/reduce

Is there a good solution to this?

I'll post the new patch after someone answers those questions for me.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] Re: IDENTITY/GENERATED v36 Re: [PATCHES] Final version of IDENTITY/GENERATED patch

2007-04-16 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

Apart from making the patch a bit smaller again, checking only
for 'i' still allows multiple SERIALs in the same table but allows
disallowing multiple GENERATED ALWAYS AS IDENTITY columns.
Thinking a bit about it, is it desired to disallow multiple GENERATED
ALWAYS AS IDENTITY fields? It's just a twisted SERIAL anyway.



I don't see the value of disallowing it.
  


I thought so.


Also, DROP IDENTITY is equivalent to SET DEFAULT
nextval('owned_sequence') iff the field has an OWNED
sequence and it was GENERATED ALWAYS AS IDENTITY before.
Considering that SERIAL is a macro, and SET/DROP DEFAULT is allowed
on IDENTITY/GENERATED columns per Tom's request,
should I keep this statement?



If it's not in the spec I don't see any strong reason to have it...
  


It's not. Removed.


Also, the current grammar is made to give a syntax error
if you say colname type GENERATED BY DEFAULT AS ( expr ).
But it makes the grammar unbalanced, and gives me:
bison -y -d  gram.y
conflicts: 2 shift/reduce



You'll have to fix that.  Usually you can get around it by making the
grammar a bit more verbose --- if you were trying to avoid duplication
by means of optional productions, don't do that.
  


What do you mean by making it more verbose?

GENERATED BY DEFAULT AS ( expr ) is another
way of saying DEFAULT expr, but being similar
to GENERATED ALWAYS AS ( expr ) would make
users think that it permits smarter expressions
than a simple DEFAULT would allow. My thought was
to disallow this in the grammar.

BTW, thanks for the quick answer.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Re: IDENTITY/GENERATED v36 Re: [PATCHES] Final version of IDENTITY/GENERATED patch

2007-04-16 Thread Zoltan Boszormenyi

Andrew Dunstan írta:

Florian G. Pflug wrote:


bison -y -d  gram.y
conflicts: 2 shift/reduce


It's been quite a while since I last used bison, but as far as I
remember, you can tell it to write a rather detailed log about
its analysis of the grammar. That log should include more
detailed information about those conflicts - maybe that helps
to figure out their exact cause, and to find a workaround.



You can almost always get rid of shift/reduce conflicts by unwinding 
some of the productions - resist the temptation to factor the grammar. 
The effect of this is to eliminate places where the parser has to 
decide between shifting and reducing. (This is why, for example, 
almost all the drop foo if exists variants require separate 
productions rather than using opt_if_exists.)


cheers

andrew


Thanks. This idea solved one of the two shift/reduce conflicts.
But the other one can only be solved if I put GENERATED
into the reserved_keyword set. But the standard spec says
it's unreserved. Now what should I do with it?

Best regards,
Zoltán







--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-26 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

Zdenek Kotala írta:

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, 
LDFLAGS ...) and is possible get core file?


This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/


I added --enable-debug --enable-depend --enable-cassert
to get sensible gdb report after that.

The problem was that the server had problems
after psql connected with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461 PG_RETURN_BYTEA_P(pq_endtypsend(buf));
(gdb)

I described my experiments, compiling with --enable-integer-datetimes
fixed the issue.


We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-26 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

Zoltan Boszormenyi írta:

Zdenek Kotala írta:

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc 
SUNW,Sun-Fire-V440


It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, 
LDFLAGS ...) and is possible get core file?


This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/


I added --enable-debug --enable-depend --enable-cassert
to get sensible gdb report after that.

The problem was that the server had problems
after psql connected with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461 PG_RETURN_BYTEA_P(pq_endtypsend(buf));
(gdb)

I described my experiments, compiling with --enable-integer-datetimes
fixed the issue.


We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.



Oh, and the proof that I use the newly compiled version:

$ psql -h reddb-dev-pgr -p 5477 test
Welcome to psql 8.2.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
  \h for help with SQL commands
  \? for help with psql commands
  \g or terminate with semicolon to execute query
  \q to quit

test=# select version();
                                  version
-----------------------------------------------------------------------------
 PostgreSQL 8.2.3 on sparc-sun-solaris2.10, compiled by GCC gcc (GCC) 4.1.2
(1 row)

test=# show integer_datetimes;
 integer_datetimes
-------------------
 off
(1 row)

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
	TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
	StringInfoData buf;

	pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
	pq_sendint64(&buf, timestamp);
#else
	pq_sendfloat8(&buf, timestamp);
#endif
	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

GDB indicates the crash at the last line.
No matter how I unrolled the function calls,
the indicated crasher line was always the one
before:

    pq_sendfloat8(&buf, timestamp);

It must be a stack corruption somehow.
I also unrolled pq_sendfloat8() so the function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
	TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
	StringInfoData buf;
	bytea	   *byteap;
	union
	{
		float8		f;
		int64		i;
	}			swap;
	uint32		n32;

	pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
	pq_sendint64(&buf, timestamp);
	elog(NOTICE, "timestamptz_send() HAVE_INT64_TIMESTAMP after pq_sendint64");
#else
	swap.f = (float8) timestamp;
	elog(NOTICE, "timestamptz_send() int64: %lld", swap.i);
	/* High order half first, since we're doing MSB-first */
#ifdef INT64_IS_BUSTED
	/* don't try a right shift of 32 on a 32-bit word */
	n32 = (swap.i < 0) ? -1 : 0;
	elog(NOTICE, "timestamptz_send() INT64_IS_BUSTED high 32: %d", n32);
#else
	n32 = (uint32) (swap.i >> 32);
	elog(NOTICE, "timestamptz_send() high 32: %d", n32);
#endif
	n32 = htonl(n32);
	elog(NOTICE, "timestamptz_send() htonl high 32: %d", n32);
	appendBinaryStringInfo(&buf, (char *) &n32, 4);

	/* Now the low order half */
	n32 = (uint32) swap.i;
	elog(NOTICE, "timestamptz_send() low 32: %d", n32);
	n32 = htonl(n32);
	elog(NOTICE, "timestamptz_send() htonl low 32: %d", n32);
	appendBinaryStringInfo(&buf, (char *) &n32, 4);

	elog(NOTICE, "timestamptz_send() pq_sendfloat8");
#endif
	byteap = (bytea *) buf.data;
	elog(NOTICE, "timestamptz_send() buf->data = %p", byteap);
	Assert(buf.len >= VARHDRSZ);
	VARATT_SIZEP(byteap) = buf.len;
	PG_RETURN_BYTEA_P(byteap);
}

The crashing line according to GDB is now the elog() call after:

    swap.f = (float8) timestamp;

This is a simple explicit type cast which shouldn't cause problems;
however, it is the one that somehow corrupts something on the stack
and causes the segfault upon entering the function at the next
statement.
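
For reference, here is the same MSB-first logic as a standalone sketch,
independent of the server sources; it may help reproduce the
miscompilation in isolation:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    /* Emit a double MSB-first, the way pq_sendfloat8() does: reinterpret
     * the bytes through a union and send the two 32-bit halves. */
    static void
    send_float8(double f, unsigned char out[8])
    {
        union { double f; uint64_t i; } swap;
        uint32_t	n32;

        swap.f = f;
        n32 = htonl((uint32_t) (swap.i >> 32));	/* high order half first */
        memcpy(out, &n32, 4);
        n32 = htonl((uint32_t) swap.i);		/* then the low order half */
        memcpy(out + 4, &n32, 4);
    }

    int
    main(void)
    {
        unsigned char buf[8];
        int		i;

        send_float8(42.0, buf);
        for (i = 0; i < 8; i++)
            printf("%02x", buf[i]);
        putchar('\n');
        return 0;
    }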

As a workaround, we recompiled PostgreSQL 8.2.3 with
--enable-integer-datetimes
and the client can connect to the server now, after initdb.

I tried to exercise calling timestamptz_send() by creating a table
with a float8 field; INSERTing and SELECTing work, too.
Both textual and binary COPY FROM and COPY TO work, too.
Either these exercises didn't call pq_sendfloat8() or it
doesn't cause problems elsewhere, only in timestamptz_send().


--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/






Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Zdenek Kotala írta:

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, LDFLAGS 
...) and is possible get core file?


This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/


I added --enable-debug --enable-depend --enable-cassert
to get sensible gdb report after that.

The problem was that the server had problems
after psql connected with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461 PG_RETURN_BYTEA_P(pq_endtypsend(buf));
(gdb)

I described my experiments, compiling with --enable-integer-datetimes
fixed the issue.




Did you try compiling with different optimization flags, or did you try
the Sun Studio compiler?


No, and no. Sun Studio isn't installed, only gcc.



Zdenek



--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.
...
It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults.



I find it fairly hard to believe that timestamptz_send would be invoked
at all while using psql, much less during initial connection.  psql
doesn't do any binary-output requests.

regards, tom lane
  


Then please explain this miracle.
Anyway, your comment makes my suspicion about
the correctness of GCC-3.4.3 on Solaris 10/Sparc
better founded now. :-)

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/




[HACKERS] Test report on GENERATED/IDENTITY

2007-03-07 Thread Zoltan Boszormenyi

Hi,

I made some tests to prove that GENERATED can help
boost performance. I created a table like this:

create table t1 (
 id serial,
 i1 integer,
 i2 integer,
 g1 integer generated always as (
 case when i1 is null then i2
when i2 is null then i1
else i1 + i2 end));

I inserted 1 million records into the table:

    for (i = 1; i <= 1000; i++)
        for (j = 1; j <= 1000; j++)
            INSERT INTO t1 (i1, i2) VALUES (i, j);

After VACUUM FULL ANALYZE,
I timed SELECT id, i1, i2, g1 FROM t1
and SELECT id, i1, i2, <generation expression> FROM t1,
with the result redirected to /dev/null. Results of ten consecutive runs are:

SELECT id, i1, i2, g1 FROM t1
--

2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34066minor)pagefaults 0swaps

Average is 4.68 seconds.

SELECT id, i1, i2, <generation expression> FROM t1
--

2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32052minor)pagefaults 0swaps

Average is 5.16 seconds.

Using a single GENERATED column gains a good 9.4% in runtime.

With IDENTITY, the picture is not so bright.
I expected some cost, but not that much. Why is
[simple_]heap_update() so expensive? I created a table and
timed inserting 1 million rows into it:

create table t2 (id serial, i1 integer);
or
create table t2 (id serial generated always as identity, i1 integer);

Using a serial column gave me about 12 seconds
on the average of 5 runs. With an IDENTITY column,
I got 61 seconds once and 66 seconds twice.
So, the strictness of the identity column gave me a 500-550%
performance penalty.

With a single unique index on i1, I got 24.4 seconds
with the serial column and 67 seconds for the identity
column. I ran these only once so this last one isn't
representative.

I tried to use heap_inplace_update() to update the
newly updated or inserted tuple in place but it gave me

ERROR:  heap_inplace_update: wrong tuple length

even when I had already filled the IDENTITY column with
a constant Datum with an Int64 value 0 converted to
the type of the column.

If I read it correctly, the HOT patch would give me
a speedup for this case?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Florian G. Pflug írta:

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; the strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;


I can understand that. But your example is faulty:
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)

You can have some application tricks to
have a continuous sequence today with regular
serials, but only if you don't have a unique index
that doesn't use the serial column. Insert
a record into that table outside the transaction,
making note of the serial value.

If subsequent processing fails (because of unique or
check constraints, etc.) you have to go back to the main
table and modify the record, indicating that the record
isn't representing valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.


Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
inserts at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)


Agreed.


I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.


Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into indexes that don't
contain the identity column.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

Florian G. Pflug írta:

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; the strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;


I can understand that. But your example is faulty:
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)

You can have some application tricks to
have a continuous sequence today with regular
serials, but only if you don't have a unique index
that doesn't use the serial column. Insert
a record into that table outside the transaction,
making note of the serial value.

If subsequent processing fails (because of unique or
check constraints, etc.) you have to go back to the main
table and modify the record, indicating that the record
isn't representing valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.


Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
inserts at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)


Agreed.


I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.


Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into indexes that don't
contain the identity column.


And as far as I have tested the current state, there is no cost
if you don't use GENERATED or IDENTITY.
The extra heap_update() would be performed only
if you have an IDENTITY column.


Best regards,
Zoltán Böszörményi







Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Florian G. Pflug írta:

Yes, of course you can prevent gaps by just filling them
with garbage/invalid records or whatever. But I don't see
why this is useful - either you want, say, your invoice
numbers to be continuous because that's required by law - or
you don't. But if the law requires your invoice numbers to be
continuous, surely just filling the gaps with fake invoices
is just as illegal as having gaps in the first place.


Not a fake invoice - one cancelled (stornoed) for whatever reason.
But you have to keep the record to show you didn't delete anything.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into indexes that don't
contain the identity column.


And as far as I have tested the current state, there is no cost
if you don't use GENERATED or IDENTITY.
The extra heap_update() would be performed only
if you have an IDENTITY column.


The modification I imagined is actually working:
- skip indexes using the identity columns
- do a simple_heap_update() after all other columns are
 assigned and index tuples are inserted
- do ExecInsertIndexTuples() on indexes referencing
 the IDENTITY column

However, I get warning messages like:

WARNING:  detected write past chunk end in ExecutorState 0xaaff68

How can I prevent them?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

However, I get warning messages like:
WARNING:  detected write past chunk end in ExecutorState 0xaaff68
How can I prevent them?



Find the memory-clobbering bug in your patch.

regards, tom lane
  


Thanks, I found it.

Best regards,
Zoltán Böszörményi




[HACKERS] psql problem querying relations

2007-02-28 Thread Zoltan Boszormenyi

Hi,

this is with current CVS code:

# \dt
ERROR:  did not find '}' at end of input node

Server log:

ERROR:  did not find '}' at end of input node
STATEMENT:  SELECT n.nspname as "Schema",
      c.relname as "Name",
      CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as "Type",
      r.rolname as "Owner"
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_roles r ON r.oid = c.relowner
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r','')
          AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
          AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 1,2;



Best regards,
Zoltán Böszörményi




Re: [HACKERS] psql problem querying relations

2007-02-28 Thread Zoltan Boszormenyi

Andrew Dunstan írta:

Zoltan Boszormenyi wrote:

Hi,

this is with current CVS code:

# \dt
ERROR:  did not find '}' at end of input node

Server log:

ERROR:  did not find '}' at end of input node


It's working for me. Have you tried with a fresh checkout or after 
running make clean before you build?


cheers

andrew


I had to reinitdb my test database, then tried again.
It's working now, regardless of whether my IDENTITY patch is applied or not.
Thanks.



[HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-02-28 Thread Zoltan Boszormenyi

Hi,

I would like to be able to harden the conditions
of generating IDENTITY columns so the
events below run in this order:

- assign values for regular columns (with or without DEFAULT)
- NOT NULL checks on regular columns
- CHECK constraints whose expression contains only regular columns

- assign values for GENERATED columns
- NOT NULL checks on GENERATED columns
- CHECK constraints whose expression may contain regular
 or GENERATED columns

- UNIQUE index checks that has only regular columns
- UNIQUE index checks that may have regular or GENERATED columns

- assign values for IDENTITY column
- NOT NULL on IDENTITY
- CHECK constraints on IDENTITY
- UNIQUE index checks that can reference IDENTITY column

At this point the heap tuple and the index tuple can be inserted
without further checks.

Currently tuple->t_self is required by ExecInsertIndexTuples()
and I don't see any way to make IDENTITY work the way it's
intended other than to mix heap_insert()/heap_update() and
ExecInsertIndexTuples() together and use the result in
ExecInsert() and ExecUpdate().

Would it be acceptable?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-02-28 Thread Zoltan Boszormenyi

Hi,

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

Would it be acceptable?



No, because you can't create index entries when you haven't yet got the
TID for the heap tuple.  What do you propose doing, insert a dummy index
entry and then go back to fill it in later?  Aside from approximately
  


No, I was thinking about breaking up e.g. heap_insert()
to be able to mix it with ExecInsertIndexTuples() so I get a
pinned buffer and have the heap tuple's t_self set first,
then doing the uniqueness checks step by step.
BTW, I can use modify_tuple() after doing
RelationGetBufferForTuple() and RelationPutHeapTuple(),
right?


doubling the work involved, this is fundamentally broken because no
  


Well, the work wouldn't be doubled, as all the unique indexes
have to be checked anyway with the current approach, too, to have
the tuple accepted into the database.


other backend could know what to do upon encountering the dummy index
entry --- there's no way for it to check if the entry references a live
tuple or not.  Not to mention that a crash here will leave a permanently
dummy index entry that there's no way to vacuum.

The other rearrangements you suggest are not any more acceptable;
we are not going to restructure the entire handling of defaults and
check constraints around a single badly-designed SQL2003 feature.
  


My IDENTITY/GENERATED patch broke up the
checks currently this way (CHECK constraints are prohibited
for special case columns):

- normal columns are assigned values (maybe using DEFAULT)
- check NOT NULLs and CHECKs for normal columns

( Up to this point this works the same way as before if you
  use neither IDENTITY nor GENERATED. )

- assign GENERATED columns their values
- check NOT NULLs for GENERATED
- assign IDENTITY with value
- check NOT NULL for IDENTITY

and

- check UNIQUE for everything

IDENTITY would be special so that it doesn't inflate the sequence
when avoidable. Currently the sequence is inflated whenever UNIQUE
fails for any index, which still very much makes it unusable.

What I would like to achieve is for IDENTITY to skip
a sequence value and fail to be INSERTed only if the IDENTITY
column's unique check fails, which pretty much means
that there is already a record with that IDENTITY value,
regardless of whether the UNIQUE index is defined for only the
IDENTITY column or the IDENTITY column is part of a multi-column
UNIQUE index.

If I could break up the order of events the way I described
in my first mail, I could re-enable having CHECK constraints
for both IDENTITY and GENERATED columns.

The point with GENERATED is that you have to have
all other columns assigned values BEFORE
being able to compute a GENERATED column
that references other columns in its expression, so
you _have to_ break up the current order of computing
DEFAULTs. I know a computed column could be done
either in the application or with SELECTs, but compare
the amount of work: if you do it in the SELECT you have to
compute the expressions every time the SELECT is run,
making it slower. Doing it on UPDATE or INSERT
makes it LESS work in a workload with few INSERTs/UPDATEs
and heavy SELECTs. Of course, for a heavy UPDATE
workload it makes for more work, but only if you actually
use GENERATED columns. It means exactly the same
amount of work if you use IDENTITY as with SERIAL,
it's just done in a different order.

The GENERATED column is an ease-of-use feature
that can also mean less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing rules are stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity alone is
not enough.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] New version of IDENTITY/GENERATED

2007-02-20 Thread Zoltan Boszormenyi

Hi,

I started working on my previous patch, encouraged
by the fact that it became a wishlist item for 8.3. :-)

The changes in this version are:
- Refreshed to almost current (5 days old)
 CVS version of 8.3 devel
- The original SERIAL pseudo type is left alone,
 you _have to_ spell out GENERATED
 { ALWAYS | BY DEFAULT} AS IDENTITY
 to get an identity column.
- The action-at-a-distance behaviour is actually working
 for the IDENTITY/GENERATED columns on INSERT
 so the DEFAULT value is generated for them
 after all the regular columns were validated via
 ExecConstraints(). This way, if the validation fails,
 the sequence isn't inflated.
- Test case is updated to reflect the above.
- Documentation is updated, Identity columns have a new
 subsection now.
- Dropped my pg_dump changes, as the altered sequence is
 also dumped in 8.2, thanks to Tom Lane.

I am considering the following:
- Since the IDENTITY is a new feature (plain old SERIAL
 behaves the same as always) I will restore the SQL:2003
 conformant check that there can be only one identity column
 in a table at any time.
- I read somewhere (but couldn't find it now in SQL:2003)
 that CHECK constraints cannot be defined for GENERATED
 (and IDENTITY?) columns. Maybe it was in the latest draft,
 I have to look at it... Anyway, I have to implement checks
 to disallow CHECKs for such columns.
- Introduce an ALTER TABLE SET|DROP IDENTITY so
 a serial can be upgraded to an identity. This way, an identity
 column can be built by hand and pg_dump will need it, too.
 SET IDENTITY will either have to issue an error if CHECKs
 are defined for such columns or automatically drop every such
 constraint.

And I have a question, too. Is there a way to use ExecEvalExpr*()
so that values from a given tuple are used for the current row? E.g.
at present, UPDATE table SET f1 = f1 + 1, f2 = f1 + 1;
sets both fields' new value to (f1 value before UPDATE) + 1.
For a GENERATED column, value _after_ UPDATE
is needed, so
CREATE TABLE table (
  f1 INTEGER,
  f2 INTEGER GENERATED ALWAYS AS (f1 + 1));
and no matter which one of the following is used:
UPDATE table SET f1 = f1 + 1;
or
UPDATE table SET f1 = f1 + 1, f2 = default;
the f2 current value = f1 current value + 1 is always maintained.
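
That is, with a sketch of the same table (renamed to t here to avoid the
reserved word), the invariant would hold like this:

CREATE TABLE t (
    f1 integer,
    f2 integer GENERATED ALWAYS AS (f1 + 1)
);

INSERT INTO t (f1) VALUES (1); -- row is (1, 2)
UPDATE t SET f1 = f1 + 1;      -- row becomes (2, 3), not (2, 2)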

Best regards,
Zoltán Böszörményi



psql-serial-30.diff.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] How can I merge a TargetEntry with a tuple?

2007-02-19 Thread Zoltan Boszormenyi

Hi,

I started working again on my
IDENTITY/GENERATED patch.
My question is $SUBJECT.
This code is in rewriteTargetlist():

new_attr = build_column_default()
new_tle = makeTargetEntry((Expr *) new_expr, ...)

Now, in ExecInsert() I have to compute
the default for IDENTITY/GENERATED
between ExecConstraints() and heap_insert().

How can I create a Datum out of either an Expr
or a TargetEntry (that contains the computed
constant out of the default expression) so I can
use it after I did an heap_deform_tuple()?

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] 8.2.0 Tarball vs. REL8_2_0 vs. REL8_2_STABLE

2006-12-18 Thread Zoltan Boszormenyi

Matt Miller írta:

The [pgcluster-1.7.0rc1-patch] patch applies to the 8.2.0 tarball ...
However, the patch will not apply to cvs branch REL8_2_0.
  

I've been told that the pgcluster patch patches some generated files
(parse.h and others, apparently).



Yes, I could not at first apply to REL8_2_0 because the patch file
wanted to patch src/backend/parser/gram.c.  At that point I started over
with a fresh REL8_2_0, ran ./configure; make, and tried the patch
again.  That's when I got a bunch of failures and fuzz.  The problem
files are:

src/backend/parser/gram.c
src/backend/parser/parse.h
src/interfaces/libpq/libpq.rc

So, I suppose libpq.rc is a derived file, also?

Now I have two questions.  First, why does pgcluster patch derived
files?  Is this just sloppy/lazy technique, or could there be some
  


Exactly.

E.g. PGCluster patches configure, not configure.in,
among others. The sugar on top is that PGCluster ruins
the nice portability of PostgreSQL: plain usage of
fork() without considering EXEC_BACKEND
is not portable across architectures.


deeper reason?  I realize this is properly to be posed to the pgcluster
folks, but they don't seem to be too responsive, at least not to their
pgfoundry forums.

Second, does it make sense that the derived files that rejected the
patch would be so different between the 8.2.0 tarball and my
REL8_2_0 build?
  


If the autotools and bison versions are different,
they may certainly produce different files.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-30 Thread Zoltan Boszormenyi

Thanks!!!

Tom Lane írta:

Böszörményi Zoltán [EMAIL PROTECTED] writes:
  

as per your suggestion, the COPY view TO support was cut and
a hint was added. Please, review.



Committed after some refactoring to avoid code duplication.

Unfortunately, in a moment of pure brain fade, I looked at the wrong
item in my inbox and wrote Bernd Helmle's name instead of yours in the
commit message :-(.  My sincere apologies.  Bruce, would you make a note
to be sure the right person gets credit in the release notes?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

  



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-26 Thread Zoltan Boszormenyi

Bruce Momjian írta:

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
  


Thanks. Would you please add this instead?
psql built-in \copy (select ...) now also works.
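
For example (hypothetical table and column names), this now works from psql:

\copy (select id, name from customers where active) to 'active_customers.csv' csv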

Best regards,
Zoltán Böszörményi



pgsql-copyselect-8.patch.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Zoltan Boszormenyi

Hi,

Böszörményi Zoltán írta:

Böszörményi Zoltán [EMAIL PROTECTED] writes:


With PostgreSQL 8.1.4, I used this:
  
begin;

select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;
  

The performance of this would doubtless vary a lot with the temp_buffers
setting.  Did you try different values?



Yes, I did, but now checked back with 8.2CVS.
The previously quoted result was achieved with
temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
On 8.2CVS with temp_buffers = 4096, the 10 client case kills
the machine with swapping, but the 3 client runtime with
COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
case went down to 3:36.

  

It'd also be interesting to time the same way (with a temp table) in
devel.  I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1.  In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.



I will report back with that, say on Monday.
  


It seems my previous mail hasn't reached
the hackers list, so I answer here.

In the export, there is a largish table,
that has both many columns and rows.

With COPY(SELECT) patch applied:

time psql -c "copy (select * from table) to 'file'" dbx
COPY 886046

real0m13.253s
user0m0.000s
sys 0m0.000s

time psql -c "copy table to 'file'" dbx
COPY 886046

real0m13.234s
user0m0.000s
sys 0m0.000s

time psql -c "copy table to stdout" dbx > file

real0m15.155s
user0m0.540s
sys 0m0.450s

time psql -c "copy (select * from table) to stdout" dbx > file

real0m15.079s
user0m0.540s
sys 0m0.590s

Surprisingly, without the COPY(SELECT) patch it's slower;
this is the lowest of five runs, i.e. with warm caches:

time psql -c "copy table to 'file'" dbx

real0m20.464s
user0m0.000s
sys 0m0.010s

time psql -c "copy table to stdout" dbx > file

real0m25.753s
user0m0.570s
sys 0m0.460s

With the original settings, temp_buffers = 1000 on 8.2CVS,
the export runtime with one client looks like this:
first run 1:44, second run 1:12, third run 1:04.
It seems it's a bit faster both on startup and on
subsequent runs.
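
For reference, the temp-table workaround benchmarked above looks roughly
like this; note that temp_buffers only takes effect if it is set before
the first temp-table access in the session:

begin;
set temp_buffers = 4096;  -- must come before any temp table is touched in this session
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;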

Best regards,
Zoltán Böszörményi



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] COPY view

2006-08-24 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

How about the callback solution for the SELECT case
that was copied from the original? Should I consider
open-coding in copy.c what ExecutorRun() does
to avoid the callback?



Adding a DestReceiver type is a good solution ... although that static
variable is not.  Instead define a DestReceiver extension struct that
can carry the CopyState pointer for you.


Done.


  You could also consider
putting the copy-from-view-specific state fields into DestReceiver
instead of CopyState, though this is a bit asymmetric with the relation
case so maybe it's not really cleaner.
  


Left it alone for now.


BTW, lose the tuple_to_values function --- it's an extremely bad
reimplementation of heap_deform_tuple.


Done.


  copy_dest_printtup also seems
coded without regard for the TupleTableSlot access API (read printtup()
to see what to do instead).


I am still interpreting it. Can you give me some hints
besides using slot_getallattrs(slot)?


  And what's the point of factoring out the
heap_getnext loop as CopyRelationTo?  It's not like that lets you share
any more code.  The inside of the loop, ie what you've called
CopyValuesTo, is the sharable part.
  


Done.

The option parsing and error checking is now common.

I also changed it to use transformStmt() in analyze.c.
However, both the UNION and sunselect cases give me
something like this:

ERROR:  could not open relation 1663/16384/16723: No such file or directory

What else can I do with it?

Best regards,
Zoltán Böszörményi



pgsql-copyselect-4.patch.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] COPY view

2006-08-24 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

The option parsing and error checking is now common.

I also changed it to use transformStmt() in analyze.c.
However, both the UNION and sunselect cases give me
something like this:

ERROR:  could not open relation 1663/16384/16723: No such file or 
directory


What else can I do with it?


But a single SELECT with two tables joined
also works so it must be something trivial.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] COPY view

2006-08-24 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

Zoltan Boszormenyi írta:

The option parsing and error checking is now common.

I also changed it to use transformStmt() in analyze.c.
However, both the UNION and sunselect cases give me
something like this:

ERROR:  could not open relation 1663/16384/16723: No such file or 
directory


What else can I do with it?


But a single SELECT with two tables joined
also works so it must be something trivial.


Now UNIONs and subselects also work.

Your concern about copy_dest_printtup()
wasn't solved yet.

Best regards,
Zoltán Böszörményi




pgsql-copyselect-5.patch.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] COPY view

2006-08-23 Thread Zoltan Boszormenyi

Tom Lane írta:

[ cc list trimmed to something reasonable ]

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

OK, here's my current version. The reference leak is fixed.
But as my testcase shows, it only works for single selects
currently. The parser accepts it but COPY doesn't produce
the expected output. Please, suggest a solution.



This patch seems to be adding a tremendous amount of duplicated code
in copy.c.  Why?
  


1. to minimize the already huge code impact on the relation case.
2. the checks done for the SELECT case are not exactly the
same as for the relation case.
3. the relation case is managed by passing around
   a Relation pointer, e.g. CopyGetAttnums. This is simply
   not appropriate for the SELECT case.

I will try to clean it up as much as I can, though.
How about the callback solution for the SELECT case
that was copied from the original? Should I consider
open-coding in copy.c what ExecutorRun() does
to avoid the callback?


Also, moving checks for the relation case out of copy.c and into
analyze.c is inappropriate.  The checks you have moved there are
actually wrong because you have no lock on the relation at the time
you are checking.  You could perhaps take a lock at analyze time,
but frankly I see no reason for this patch to be messing with the
relation case at all.
  


OK, I will put the checks back where they were.


As for the UNION problem, try passing the query to transformStmt
rather than prejudging where transformStmt will send it.  Compare for
instance the analyze.c code for ExplainStmt.
  


Thanks.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] COPY view

2006-08-23 Thread Zoltan Boszormenyi

Hi,

Bruce Momjian írta:

Alvaro Herrera wrote:
  

Bruce Momjian wrote:


Andrew Dunstan wrote:
  

Bruce Momjian wrote:


I think Alvaro is saying we need it in a few days, not longer.
  

I thought he was saying today ;-)


He actually said now, but I don't think we need it immediately,
especially if he is still working on it.  We are at least 1-2 weeks away
from having all open patches applied.
  

Yes, I'm saying today so that we can all look at it and point obvious
mistakes now, not in 2 weeks from now.  Release early, release often.
If the patch contains a mistake and we find out in 2 weeks, are we going
to fix it?  No, we are going to reject it.



OK, I understand.   Böszörményi, post now so we can see where you are,
but keep working and send it to us again when you are done.  No sense in
not posting your working version.
  


OK, here's my current version. The reference leak is fixed.
But as my testcase shows, it only works for single selects
currently. The parser accepts it but COPY doesn't produce
the expected output. Please, suggest a solution.

BTW, my first name is Zoltán.

Best regards,
Zoltán Böszörményi



pgsql-copyselect.patch.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] COPY view

2006-08-23 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

Hi,

Bruce Momjian írta:

Alvaro Herrera wrote:
 

Bruce Momjian wrote:
   

Andrew Dunstan wrote:
 

Bruce Momjian wrote:
   

I think Alvaro is saying we need it in a few days, not longer.
  

I thought he was saying today ;-)


He actually said now, but I don't think we need it immediately,
especially if he is still working on it.  We are at least 1-2 weeks 
away

from having all open patches applied.
  

Yes, I'm saying today so that we can all look at it and point obvious
mistakes now, not in 2 weeks from now.  Release early, release often.
If the patch contains a mistake and we find out in 2 weeks, are we 
going

to fix it?  No, we are going to reject it.



OK, I understand.   Böszörményi, post now so we can see where you are,
but keep working and send it to us again when you are done.  No sense in
not posting your working version.
  


OK, here's my current version. The reference leak is fixed.
But as my testcase shows, it only works for single selects
currently. The parser accepts it but COPY doesn't produce
the expected output. Please, suggest a solution.


I meant that UNION selects and subselects don't work yet.




BTW, my first name is Zoltán.

Best regards,
Zoltán Böszörményi




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq
  



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] COPY view

2006-08-23 Thread Zoltan Boszormenyi

Alvaro Herrera írta:

Zoltan Boszormenyi wrote:

  

OK, here's my current version. The reference leak is fixed.
But as my testcase shows, it only works for single selects
currently. The parser accepts it but COPY doesn't produce
the expected output. Please, suggest a solution.



I'm not sure I agree with the approach of creating a fake SELECT * FROM
foo in analyze.c in the relation case and passing it back to the parser
to create a Query node.  That's not there in the original code and you
shouldn't need it.  Just let the case where COPY gets a relation
continue to handle it as it does today, and add a separate case for the
SELECT.
  


The exact same code was there,
i.e. parse and rewrite SELECT * FROM view,
just not in analyze.c. I will try without it, though.



That doesn't help you with the UNION stuff though.
  


:-(


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: GENERATED ... AS IDENTITY, Was: Re: [HACKERS] Feature Freeze

2006-08-03 Thread Zoltan Boszormenyi

Rod Taylor írta:

For db restoration (pg_dump), how do you restore to the same values as
previously if it is always regenerated? By making ALWAYS a suggestion
for some users instead of always enforced and providing an override
mechanism for it. I assume it only works for relation owners but I've
not figured out how the spec does permissions.

override clause ::=
OVERRIDING USER VALUE
  | OVERRIDING SYSTEM VALUE

In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
copy for relations with an GENERATED ALWAYS identity column and the
backend will need to respect that.
  
  

Aren't INSERT and COPY distinguished in code paths?



Yes, they are separate but they also use the same permission set.

Any user can copy into a structure at any time and virtually every
restriction will be applied normally (CHECK, DEFAULT, etc.). Copy
bypasses Rules, significant parsing overhead since there is no need to
look for subselects, and possibly some triggers are bypassed. I'm fairly
sure that foreign key triggers fire.

In short, COPY doesn't bypass enforcement. GENERATED ALWAYS is part of
that enforcement and should be included in that.

If it is not included, we cannot recommend GENERATED ALWAYS for uses
like recording CURRENT_USER in an audit log since the data could be
fudged.
  


OK. So COPY needs an OVERRIDING clause, too, not just INSERT.
In the meantime I implemented the discussed restrictions on
UPDATE for GENERATED ALWAYS columns, allowing
UPDATE tab SET col = default only for
GENERATED ALWAYS AS. I also implemented
INSERT ... OVERRIDING { SYSTEM | USER } VALUE.
If I got it correctly, OVERRIDING USER VALUE
seems to be the same as omitting the OVERRIDING clause...
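
A sketch of the semantics as I read the spec (hypothetical table):

CREATE TABLE t (id integer GENERATED ALWAYS AS IDENTITY, v text);

INSERT INTO t (id, v) VALUES (42, 'x');                         -- error: only DEFAULT is allowed
INSERT INTO t (id, v) OVERRIDING SYSTEM VALUE VALUES (42, 'x'); -- accepted, keeps 42
INSERT INTO t (id, v) OVERRIDING USER VALUE VALUES (42, 'x');   -- 42 is ignored, a generated value is used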


ALWAYS is really only enforced for anyone who doesn't have permission to
specify otherwise.


Another one that got me is what do you do if you do this:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;

What is the value for tab.col? It would seem that the table should
be rewritten with all values for col recalculated -- thus it would be
'1'. But wait! Can we add the override clause here too to keep the old
values and change the enforcement for new tuples only?
  
  

I don't think we should rewrite existing rows, because
when a row was inserted, the stored value was valid
according to the rules at that time. What if you
have more than one row in that table?



SERIAL has, until recently, been described as a macro. A tool for
setting things up quickly but many parts of which can be changed by hand
after-ward. It's not exactly a good source for information on how this
structure should work. For one, you can easily override the suggested
default a serial gives at any time as any user. The intention of ALWAYS
is to prevent exactly that behaviour.

I don't have an opinion on ALTER TABLE changes for this one way or the
other. It was my intention to advise that a group decision is required
and some research into what other databases do in this case. I believe
MSSQL and DB2 both implement this functionality.
  


I see.

I looked a bit into the TODO entry that's about
ALTER TABLE tab ALTER col RENAME newcol
should also rename the sequence. My question is:
is it legal to call pg_get_serial_sequence() from
src/backen/parser/analyze.c:transformAlterTableStmt()?
This would be the easiest way to issue an
ALTER TABLE oldseq RENAME newseq
command automatically.

And I think I found a bug in PostgreSQL.
If I do this:

create table tab1 (id serial, t text); -- creates tab1_id_seq
create table tab2 (id serial, t text) inherits (tab1); -- creates 
tab2_id_seq

drop table tab1 cascade;

then tab1_id_seq gets also dropped but tab2_id_seq doesn't.
Both 8.1.4 and current 8.2CVS do this.


Oh, and one more item. These expressions have the same abilities as a
CHECK constraint for referencing other columns.

This example comes from an IBM Guide:

CREATE TABLE T1(c1 INT, c2 DOUBLE,
                c3 DOUBLE GENERATED ALWAYS AS (c1 + c2),
                c4 SMALLINT GENERATED ALWAYS AS
                   (CASE
                      WHEN c1 > c2 THEN 1
                      ELSE NULL
                    END)
               );

  


For this to work, we need to lift the restriction
on DEFAULT so other columns can appear in the
expression. Dependencies must be tracked between
columns so GENERATED ALWAYS columns on UPDATE
and DEFAULT/ GENERATED ALWAYS columns on
INSERT get their computed values. Circular dependencies
must be avoided, etc. Hm.
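
For example, the dependency tracking would have to reject something like
this circular definition (sketch):

CREATE TABLE bad (
    a integer GENERATED ALWAYS AS (b + 1),
    b integer GENERATED ALWAYS AS (a + 1)  -- circular dependency, must raise an error
);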


Here is what IBM has to say about ALTER TABLE and GENERATED ALWAYS:

GENERATED 
Specifies that DB2 generates values for the column.

ALWAYS
Specifies that DB2 will always generate a value for the

Re: GENERATED ... AS IDENTITY, Was: Re: [HACKERS] Feature Freeze

2006-08-03 Thread Zoltan Boszormenyi

Hi,

next version follows. Changes:

- Supports OVERRIDING { USER | SYSTEM } VALUE syntax
 not yet documented; I have doubts about the USER variant
- UPDATES is forbidden entirely on GENERATED ALWAYS
 AS IDENTITY columns, UPDATE tab SET col = DEFAULT is
 allowed on GENERATED ALWAYS AS ( expr ) columns
- ALTER TABLE tab ALTER col RESTART [WITH] N  and
 ALTER TABLE tab ALTER col SET identity_options are supported
 but not yet documented (see the sketch below)
- extended the test case but the expected .out wasn't updated
 so 1 out of 101 tests fails.
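
The sketch promised above, with a hypothetical table (the exact option
spelling follows my patch, not any released syntax):

ALTER TABLE invoice ALTER COLUMN id RESTART WITH 1000;
ALTER TABLE invoice ALTER COLUMN id SET MINVALUE 1 MAXVALUE 99999999 CYCLE;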

After exercising with the last one, ALTER tab RENAME to newtab
and ALTER tab RENAME col TO newcol should be easy.
With the introduced infrastructure to correctly support
the first two changes (new column attribute: attidentity)
it will be easy to implement checks to disallow
ALTER TABLE tab DROP DEFAULT on IDENTITY columns.

Best regards,
Zoltán Böszörményi

Zoltan Boszormenyi írta:

Rod Taylor írta:

For db restoration (pg_dump), how do you restore to the same values as
previously if it is always regenerated? By making ALWAYS a suggestion
for some users instead of always enforced and providing an override
mechanism for it. I assume it only works for relation owners but I've
not figured out how the spec does permissions.

override clause ::=
OVERRIDING USER VALUE
  | OVERRIDING SYSTEM VALUE

In short, pg_dump should append OVERRIDING SYSTEM VALUE to any 
insert or

copy for relations with an GENERATED ALWAYS identity column and the
backend will need to respect that.


Aren't INSERT and COPY distinguished in code paths?



Yes, they are separate but they also use the same permission set.

Any user can copy into a structure at any time and virtually every
restriction will be applied normally (CHECK, DEFAULT, etc.). Copy
bypasses Rules, significant parsing overhead since there is no need to
look for subselects, and possibly some triggers are bypassed. I'm fairly
sure that foreign key triggers fire.

In short, COPY doesn't bypass enforcement. GENERATED ALWAYS is part of
that enforcement and should be included in that.

If it is not included, we cannot recommend GENERATED ALWAYS for uses
like recording CURRENT_USER in an audit log since the data could be
fudged.
  


OK. So COPY needs an OVERRIDING clause, too, not just INSERT.
In the meantime I implemented the discussed restrictions on
UPDATE for GENERATED ALWAYS columns, allowing
UPDATE tab SET col = default only for
GENERATED ALWAYS AS. I also implemented
INSERT ... OVERRIDING { SYSTEM | USER } VALUE.
If I got it correctly, OVERRIDING USER VALUE
seems to be the same as omitting the OVERRIDING clause...

ALWAYS is really only enforced for anyone who doesn't have 
permission to

specify otherwise.


Another one that got me is what do you do if you do this:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;

What is the value for tab.col? It would seem that the table should
be rewritten with all values for col recalculated -- thus it 
would be
'1'. But wait! Can we add the override clause here too to keep 
the old

values and change the enforcement for new tuples only?


I don't think we should rewrite existing rows, because
when a row was inserted, the stored value was valid
according to the rules at that time. What if you
have more than one row in that table?



SERIAL has, until recently, been described as a macro. A tool for
setting things up quickly but many parts of which can be changed by hand
after-ward. It's not exactly a good source for information on how this
structure should work. For one, you can easily override the suggested
default a serial gives at any time as any user. The intention of ALWAYS
is to prevent exactly that behaviour.

I don't have an opinion on ALTER TABLE changes for this one way or the
other. It was my intention to advise that a group decision is required
and some research into what other databases do in this case. I believe
MSSQL and DB2 both implement this functionality.
  


I see.

I looked a bit into the TODO entry that's about
ALTER TABLE tab ALTER col RENAME newcol
should also rename the sequence. My question is:
is it legal to call pg_get_serial_sequence() from
src/backen/parser/analyze.c:transformAlterTableStmt()?
This would be the easiest way to issue an
ALTER TABLE oldseq RENAME newseq
command automatically.

And I think I found a bug in PostgreSQL.
If I do this:

create table tab1 (id serial, t text); -- creates tab1_id_seq
create table tab2 (id serial, t text) inherits (tab1); -- creates 
tab2_id_seq

drop table tab1 cascade;

then tab1_id_seq gets also dropped but tab2_id_seq doesn't.
Both 8.1.4 and current 8.2CVS do this.


Oh, and one more item. These expressions have the same abilities as a
CHECK constraint for referencing other columns.

This example comes from an IBM Guide:

CREATE TABLE T1(c1 INT, c2 DOUBLE, c3 
DOUBLE

GENERATED ... AS IDENTITY, Was: Re: [HACKERS] Feature Freeze

2006-08-01 Thread Zoltan Boszormenyi

Hi,

I have progressed a bit with my pet project, a.k.a $SUBJECT.

Now GENERATED ALWAYS AS IDENTITY and
GENERATED ALWAYS AS ( expr ) work as
intended. Documentation was also extended.
Some test cases are also included, that shows
that ALTER TABLE ALTER TYPE keeps both
the sequence and the GENERATED ALWAYS
property. Gzipped patch is attached.

Next steps are:
- pg_dump support
- more ALTER TABLE support for adding and
 dropping IDENTITY and GENERATED ALWAYS
 features
- more testing

I still maintain that I don't see any standard-mandated
connection between GENERATED AS IDENTITY
and NEXT VALUE FOR, but obviously both
require SEQUENCE as a supported feature
in parallel. I can be proven wrong, though,
but please quote the section# and text where
it can be found in the standard.

As for why GENERATED ALWAYS AS IDENTITY
is useful? Consider someone who is coming from
another DBMS (Informix, Access, etc.) where
INSERT INTO table (id, ...) VALUES (0, ...);
inserts the next value for the autoincrementer field
instead of 0. Leaving out fields from INSERT is
not allowed in the source because of documentation
reasons and writing DEFAULT is not handy or not
found in that legacy DBMS' features.
Multiply that by N applications that were written
that way over the years of the lifespan of a large
project, count in the human resistance to learning
something new (say a 2.5x multiplier, but that may be
under-estimated :-) ) and a feature that makes porting
easier will be a cheered feature. IIRC Bruce Momjian
himself wrote on this list that ease-of-use features
can boost the PostgreSQL userbase pretty quickly.

So, please, review my patch in its current state
and decide whether it's an 8.2-worthy feature.

BTW, is there anyone working on COPY FROM ( select ) feature?

Thanks in advance and best regards,
Zoltán Böszörményi



psql-serial-12.diff.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: GENERATED ... AS IDENTITY, Was: Re: [HACKERS] Feature Freeze

2006-08-01 Thread Zoltan Boszormenyi

Rod Taylor írta:

On Tue, 2006-08-01 at 18:10 +0200, Zoltan Boszormenyi wrote:
  

Hi,

I have progressed a bit with my pet project, a.k.a $SUBJECT.

Now GENERATED ALWAYS AS IDENTITY and
GENERATED ALWAYS AS ( expr ) work as
intended. Documentation was also extended.



I'm only commenting because I debated trying to implement this feature a
couple of times.


Thanks for commenting it.


 The ugliness required for pg_dump put me off of doing
it.
  


I haven't looked into it yet.


I did not see a test for enforcement during COPY.


That was sort of intended, COPY is expected
to pull back the same record it wrote out.
But see below.


 UPDATE restrictions
appear to have been missed as well:

4) If set clause SC specifies an object column that
references a column of which some underlying column is either a
generated column or an identity column whose descriptor
indicates that values are always generated, then the update
source specified in SC shall consist of a default
specification.

object column is the update target, or the left hand side of the
equation. In short, if a column marked GENERATED ALWAYS is updated then
it must be to DEFAULT or not provided as an update target.

CREATE TABLE tab (col integer GENERATED ALWAYS AS IDENTITY);
UPDATE tab SET col = DEFAULT; -- ACCEPTED
UPDATE tab SET col = 1; -- ERROR
  


Yes, I have also read that detail but not yet implemented it.
I was too happy that I found a straightforward way to make
GENERATED ALWAYS work.


For db restoration (pg_dump), how do you restore to the same values as
previously if it is always regenerated? By making ALWAYS a suggestion
for some users instead of always enforced and providing an override
mechanism for it. I assume it only works for relation owners but I've
not figured out how the spec does permissions.

override clause ::=
OVERRIDING USER VALUE
  | OVERRIDING SYSTEM VALUE

In short, pg_dump should append OVERRIDING SYSTEM VALUE to any insert or
copy for relations with an GENERATED ALWAYS identity column and the
backend will need to respect that.
  


Aren't INSERT and COPY distinguished in code paths?
(I don't have too deep knowledge about PostgreSQL internals, yet.)
If they are, OVERRIDING SYSTEM VALUE will be
needed only when pg_dump produces INSERTs.


ALWAYS is really only enforced for anyone who doesn't have permission to
specify otherwise.


Another one that got me is what do you do if you do this:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY;

What is the value for tab.col? It would seem that the table should
be rewritten with all values for col recalculated -- thus it would be
'1'. But wait! Can we add the override clause here too to keep the old
values and change the enforcement for new tuples only?
  


I don't think we should rewrite existing rows, because
when a row was inserted, the stored value was valid
according to the rules at that time. What if you
have more than one row in that table?
Which record gets which value? You cannot know
which record was inserted first because subsequent
updates may ruin that order before the ALTER TABLE.
And recalculating the max value of col isn't too reliable
if another session is also inserting records.

And what about non-unique columns?
Plain SERIALs aren't declared unique automatically, either.
Consider the following:

CREATE TABLE tab (col integer);
INSERT INTO tab VALUES (10);
ALTER TABLE tab ALTER col GENERATED ALWAYS AS IDENTITY( MINVALUE 1 MAXVALUE 12 
CYCLE );


Here I expect equal values and I don't want
existing rows rewritten.

E.g. if you want a new start value, you will also need to issue
ALTER TABLE tab ALTER col RESTART WITH n;
which I started to implement.

Also, for a unique SERIAL column, you can still
insert a record with an out-of-order number, and
the INSERT that reaches that number will
fail with a unique violation, i.e. it's not a real
autoincrementer field. Or you can alter the
sequence that supports such a column.
PostgreSQL documents both behaviours and
I wanted to keep them.
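
A minimal sketch of that documented behaviour:

CREATE TABLE t (id serial UNIQUE, v text);

INSERT INTO t (id, v) VALUES (3, 'manual'); -- does not touch the sequence
INSERT INTO t (v) VALUES ('a');             -- id 1
INSERT INTO t (v) VALUES ('b');             -- id 2
INSERT INTO t (v) VALUES ('c');             -- fails: duplicate key value 3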

Thanks for the comments,
Zoltán Böszörményi


Some test cases are also included, that shows
that ALTER TABLE ALTER TYPE keeps both
the sequence and the GENERATED ALWAYS
property. Gzipped patch is attached.

Next steps are:
- pg_dump support
- more ALTER TABLE support for adding and
  dropping IDENTITY and GENERATED ALWAYS
  features
- more testing

I still maintain that I don't see any standard-mandated
connection between GENERATED AS IDENTITY
and NEXT VALUE FOR, but obviously both
require SEQUENCE as a supported feature
in parallel. I can be proven wrong, though,
but please quote the section# and text where
it can be found in the standard.

As for why GENERATED ALWAYS AS IDENTITY
is useful? Consider someone who is coming from
another DBMS (Informix, Access, etc.) where
INSERT INTO table (id

Re: [HACKERS] Three weeks left until feature freeze

2006-07-30 Thread Zoltan Boszormenyi

Hi,

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

I am working on adding a new column constraint,
namely the GENERATED [ALWAYS | BY DEFAULT ] AS
[ IDENTITY ( sequence_options ) | ( expression )]



Doesn't this still have the issue that we're taking over spec-defined
syntax to represent behavior that does not quite match the spec-defined
semantics?  It's not clear to me how closely tied this syntax is to
NEXT VALUE FOR, but if it's got any of the latter's action-at-a-distance
subtleties then I think we ought to leave well enough alone till we have
a solution for that.

regards, tom lane
  


Sorry for not answering earlier, I was on a holiday.
I read again sections 6.13 and 9.21 about NEXT
VALUE FOR and generation of the next value of
a sequence generator, respectively. If I see it right,
neither of them requires the other one.

The IDENTITY COLUMN syntax in section 11.4
only mentions section 9.21. Section 14.8 about the
INSERT statement does not even mention it,
only refers to the default clause in section 11.5.
And that also doesn't say anything about either
NEXT VALUE FOR or next value generation of
a sequence.

And I saw comments in the PostgreSQL
documentation that go like this:
The standard doesn't specify, so we are conforming.
Hint, hint. ;-)

I think the IDENTITY COLUMN (and GENERATED
ALWAYS AS) can stand on its own without
NEXT VALUE FOR.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Three weeks left until feature freeze

2006-07-14 Thread Zoltan Boszormenyi

Hi,

Bruce Momjian írta:

There are roughly three weeks left until the feature freeze on August 1.
If people are working on items, they should be announced before August
1, and the patches submitted by August 1.  If the patch is large, it
should be discussed now and an intermediate patch posted to the lists
soon.

FYI, we don't have many major features ready for 8.2.

--
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com
  


I am working on adding a new column constraint,
namely the GENERATED [ALWAYS | BY DEFAULT ] AS
[ IDENTITY ( sequence_options ) | ( expression )]
I surely won't be ready by August 1 as I am going
on a holiday this weekend for two weeks.

Anyway, I submit what I have in my hands now and
describe what it does and what it lacks. Maybe someone
can fill in the missing gaps; I didn't have too much time
to dig deeper into the PostgreSQL code.

Lacks:
- proper sgml documentation
- pg_dump support
- GENERATED ALWAYS doesn't work (yet[*])
- ALTER TABLE support

What works:
- GENERATED ... AS IDENTITY creates the sequence
 with the given parameters
- any data types that has an implicit cast from int8 can be
 defined as GENERATED ... AS IDENTITY

DEFAULT and GENERATED are mutually exclusive;
I tweaked gram.y to make it so, and GENERATED ...
is now parsed as a column constraint.
Unfortunately I introduced one shift/reduce conflict and
I don't see a way to solve it.

SQL2003 says that only one IDENTITY column may
exist in a table, I didn't make a check for that, it may be
useful to allow more than one.

Also, the parsing is very simplistic, it allows
GENERATED BY DEFAULT AS ( expr )
and I made it behave exactly like DEFAULT.
It can be eliminated in the parser, though.

Tom Lane was unhappy with my previous attempt
as it introduced SERIAL, SERIAL4, SERIAL8 and
BIGSERIAL as keywords. This time I kept it the way
PostgreSQL always had.

[*] I introduced a new column attribute (attforceddef)
and it could be used in rewriteHandler.c to override
given data with the default for a GENERATED ALWAYS
column. I must have missed some places where I have to
assign ColumnDef->force_default to
FormData_pg_attribute->attforceddef or vice versa,
but I am certain that not too many are left.

Needless to say it passes make check.

Please, review and point out my mistakes again. :-)

Best regards,
Zoltán Böszörményi



psql-serial-8.diff.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Extended SERIAL parsing

2006-06-12 Thread Zoltan Boszormenyi

Hi,

Jim C. Nasby írta:

On Mon, Jun 12, 2006 at 02:27:31PM +0200, Böszörményi Zoltán wrote:
  

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

after some experimentation, I came up with the attached patch,
which implements parsing the following SERIAL types:


As has been pointed out before, it would be a seriously bad idea to
implement the SQL syntax for identity columns without matching the
SQL semantics for them.  That would leave us behind the eight-ball
when we wanted to implement the SQL semantics.  Right now we have
a useful but non-standard semantics, and a useful but non-standard
syntax, and those two should stick together.
  

Well, I read all sections of 5WD-02-Foundation-2003-09.pdf
where identity appears, here are the list of changes that will
be needed for an identity column:



Have you read the archives on the recent discussions that have taken
place about whether SERIAL should be a black box or not? IIRC most of
this was all hashed out in that thread.  


I just read it thoroughly, and the issues I listed weren't mentioned
in the black box thread at all. I am trying to implement the
standard syntax ( and gradually the conformant behaviour )
along the lines of sections 4.14.7, 11.3, 11.4, 11.7, 11.11,
11.12, 11.17 and 14.8.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Extended SERIAL parsing

2006-06-11 Thread Zoltan Boszormenyi

Hi,

after some experimentation, I came up with the attached patch,
which implements parsing the following SERIAL types:

SERIAL
SERIAL GENERATED { ALWAYS | BY DEFAULT }
SERIAL GENERATED [ ALWAYS | BY DEFAULT ] AS IDENTITY( sequence options )

The underlying type is still int4 or int8,
so the problems you discussed aren't solved.
But at least the current semantics is kept.

It passes all regression tests, and it works, too:

# create table proba (i serial generated as identity(minvalue 5 maxvalue 
10) primary key, t text);
NOTICE:  CREATE TABLE will create implicit sequence "proba_i_seq" for serial column "proba.i"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "proba_pkey" for table "proba"

CREATE TABLE
# insert into proba (t) values ('a');
INSERT 0 1
# select * from proba;
i | t
---+---
5 | a
(1 row)

For now, GENERATED { ALWAYS | BY DEFAULT }
are just fillers.

The condition (column->is_serial && column->force_default)
can help enforce GENERATED ALWAYS at INSERT time
and can also help fix the two TODO entries about SERIAL.

Best regards,
Zoltán Böszörményi

diff -ur postgresql-8.2/src/backend/parser/analyze.c postgresql-8.2-serial/src/backend/parser/analyze.c
--- postgresql-8.2/src/backend/parser/analyze.c	2006-04-30 20:30:39.0 +0200
+++ postgresql-8.2-serial/src/backend/parser/analyze.c	2006-06-11 23:36:22.0 +0200
@@ -825,40 +825,17 @@
 transformColumnDefinition(ParseState *pstate, CreateStmtContext *cxt,
 		  ColumnDef *column)
 {
-	bool		is_serial;
 	bool		saw_nullable;
 	Constraint *constraint;
 	ListCell   *clist;
 
 	cxt->columns = lappend(cxt->columns, column);
 
-	/* Check for SERIAL pseudo-types */
-	is_serial = false;
-	if (list_length(column->typename->names) == 1)
-	{
-		char	   *typname = strVal(linitial(column->typename->names));
-
-		if (strcmp(typname, "serial") == 0 ||
-			strcmp(typname, "serial4") == 0)
-		{
-			is_serial = true;
-			column->typename->names = NIL;
-			column->typename->typeid = INT4OID;
-		}
-		else if (strcmp(typname, "bigserial") == 0 ||
-				 strcmp(typname, "serial8") == 0)
-		{
-			is_serial = true;
-			column->typename->names = NIL;
-			column->typename->typeid = INT8OID;
-		}
-	}
-
 	/* Do necessary work on the column type declaration */
 	transformColumnType(pstate, column);
 
 	/* Special actions for SERIAL pseudo-types */
-	if (is_serial)
+	if (column->is_serial)
 	{
 		Oid			snamespaceid;
 		char	   *snamespace;
@@ -898,7 +875,7 @@
 		 */
 		seqstmt = makeNode(CreateSeqStmt);
 		seqstmt->sequence = makeRangeVar(snamespace, sname);
-		seqstmt->options = NIL;
+		seqstmt->options = column->seq_opts;
 
 		cxt->blist = lappend(cxt->blist, seqstmt);
 
diff -ur postgresql-8.2/src/backend/parser/gram.y postgresql-8.2-serial/src/backend/parser/gram.y
--- postgresql-8.2/src/backend/parser/gram.y	2006-05-27 19:38:45.0 +0200
+++ postgresql-8.2-serial/src/backend/parser/gram.y	2006-06-11 23:42:02.0 +0200
@@ -275,6 +275,7 @@
 %type <boolean> opt_instead opt_analyze
 %type <boolean> index_opt_unique opt_verbose opt_full
 %type <boolean> opt_freeze opt_default opt_recheck
+%type <boolean>	ColIdGen ColOptIdGen
 %type <defelt>	opt_binary opt_oids copy_delimiter
 
 %type <boolean> copy_from opt_hold
@@ -284,8 +285,8 @@
 
 %type <node>	fetch_direction select_limit_value select_offset_value
 
-%type <list>	OptSeqList
-%type <defelt>	OptSeqElem
+%type <list>	OptSeqList OptSerialSeqList
+%type <defelt>	OptSeqElem OptSerialSeqElem
 
 %type <istmt>	insert_rest
 
@@ -313,7 +314,7 @@
 %type <range>	relation_expr_opt_alias
 %type <target>	target_el insert_target_el update_target_el insert_column_item
 
-%type <typnam>	Typename SimpleTypename ConstTypename
+%type <typnam>	Typename SimpleTypename SerialTypename ConstTypename
 GenericType Numeric opt_float
 Character ConstCharacter
 CharacterWithLength CharacterWithoutLength
@@ -357,10 +358,10 @@
 
 /* ordinary key words in alphabetical order */
 %token <keyword> ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER
-	AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
+	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
 	ASSERTION ASSIGNMENT ASYMMETRIC AT AUTHORIZATION
 
-	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
+	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BIGSERIAL BINARY BIT
 	BOOLEAN_P BOTH BY
 
 	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
@@ -380,11 +381,11 @@
 	FALSE_P FETCH FIRST_P FLOAT_P FOR FORCE FOREIGN FORWARD
 	FREEZE FROM FULL FUNCTION
 
-	GLOBAL GRANT GRANTED GREATEST GROUP_P
+	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P
 
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
-	IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
+	IDENTITY IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
 	INDEX INHERIT INHERITS INITIALLY INNER_P INOUT INPUT_P
 	INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT
 	INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -417,7 +418,7 @@
 	ROLE ROLLBACK ROW ROWS RULE
 
 	SAVEPOINT SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
-	

Re: [HACKERS] Extended SERIAL parsing

2006-06-11 Thread Zoltan Boszormenyi

Rod Taylor írta:

The condition (column->is_serial && column->force_default)
can help enforce GENERATED ALWAYS at INSERT time
and can also help fix the two TODO entries about SERIAL.



You will need to include the insert components of the spec which allow
for overriding GENERATED ALWAYS during an INSERT and extend that to COPY
and teach pg_dump how to use them.
  


OK, that's over my head at the moment. :-)
Maybe the wizards here will pick up my patch
and complete it. (I hope.)

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] SERIAL problems?

2006-06-06 Thread Zoltan Boszormenyi

Hi,

I just saw these in the TODO list:

o %Disallow changing DEFAULT expression of a SERIAL column?
   This should be done only if the existing SERIAL problems cannot be 
fixed.

o %Disallow ALTER SEQUENCE changes for SERIAL sequences
   because pg_dump does not dump the changes

What are the existing problems?

I am asking because I am experimenting with implementing
the SQL2003-compliant form of the serial type,
to be able to specify the underlying sequence
parameters:

SERIAL [ GENERATED  [ ALWAYS | BY DEFAULT ]
 AS IDENTITY (
   [ INCREMENT [ BY ] increment ]
   [ MINVALUE minvalue | NO MINVALUE ]
   [ MAXVALUE maxvalue | NO MAXVALUE ]
   [ START [ WITH ] start ]
   [ CACHE cache ]
   [ [ NO ] CYCLE ]
 ) ]

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much

2006-06-05 Thread Zoltan Boszormenyi

Hi!

Tom Lane írta:

Jim C. Nasby [EMAIL PROTECTED] writes:
  

I've been able to verify this on 8.1.4; psql -A -t -c 'SELECT * FROM
largetable' > /dev/null results in psql consuming vast quantities of
memory. Why is this?



Is it different without the -A?

I'm reading this as just another uninformed complaint about libpq's
habit of buffering the whole query result.  It's possible that there's
a memory leak in the -A path specifically, but nothing said so far
provided any evidence for that.

regards, tom lane
  


So, is libpq always buffering the result? Thanks.
I thought psql buffers only because in its formatted output mode
it has to know the widest value for all the columns.

Then the SELECT INTO TEMP ; COPY TO STDOUT solution
I found is _the_ solution.
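
That is, roughly:

begin;
select * into temp myquery1 from largetable;
copy myquery1 to stdout;
rollback;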

I guess then the libpq-based ODBC driver suffers
from the same problem? It certainly explains the
performance problems I observed: the server
finishes the query, the ODBC driver (or libpq underneath)
fetches all the records, and the application receives
the first record only after all that. Nice.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much

2006-06-05 Thread Zoltan Boszormenyi

Andrew Dunstan írta:

Mark Woodward wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:
   

On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote:
 

I'm reading this as just another uninformed complaint about libpq's
habit of buffering the whole query result.  It's possible that 
there's

a memory leak in the -A path specifically, but nothing said so far
provided any evidence for that.


Certainly seems like it. It seems like it would be good to allow for
libpq not to buffer, since there's cases where it's not needed...
  
See past discussions.  The problem is that libpq's API says that 
when it

hands you back the completed query result, the command is complete and
guaranteed not to fail later.  A streaming interface could not make 
that

guarantee, so it's not a transparent substitution.

I wouldn't have any strong objection to providing a separate API that
operates in a streaming fashion, but defining it is something no one's
bothered to do yet.  In practice, if you have to code to a variant API,
it's not that much more trouble to use a cursor...




Wouldn't the COPY (select ...) TO STDOUT format being discussed solve
this for free? 


Yes, it would for me.

It won't solve it in the general case for clients that expect a result 
set. ISTM that use a cursor is a perfectly reasonable answer, though.


The general case cannot be applied to all particular cases.
E.g. you cannot use cursors from shell scripts, and just for
producing an export file they are not too reasonable either.
Redirecting psql's output, or COPY, is enough.
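
For completeness, the cursor variant suggested above would look roughly
like this, and indeed it doesn't map well onto a shell pipeline:

begin;
declare c no scroll cursor for select * from largetable;
fetch forward 1000 from c;  -- repeat until no rows are returned
close c;
commit;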

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Interesting speed anomaly

2005-12-15 Thread Zoltan Boszormenyi

Jim C. Nasby írta:


Those queries aren't the same though. The view is equivalent to

SELECT *
FROM
   (select 'AAA' AS prefix,id from table 1
   union select 'AAA',id from table 2
   ) view
WHERE prefix||id = '...'

In this case the prefixes have already been unioned together, so there's
no chance for the planner to use the function index.

If break the WHERE clause into seperate clauses, such as

WHERE prefix='AAA' AND id = '200501'

then I think the planner will know what selects it can simply ignore. If
that doesn't work, then add 'AAA'||id AS fullid to each of the selects
in the view and that should allow the function indexes to be used.
 



Thanks, both methods work very fast now and use the expression indexes.
Thanks for the patience and the explanations.
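
For the archives, the second method applied to the example view from
upthread looks like this (sketch; whether the planner actually uses the
expression indexes should be verified with EXPLAIN):

CREATE VIEW v1 (code, num, fullid) AS
SELECT 'AAA', id, 'AAA'||id FROM table1
UNION
SELECT 'BBB', id, 'BBB'||id FROM table2;

SELECT * FROM v1 WHERE fullid = 'AAA200501';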

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Interesting speed anomaly

2005-12-14 Thread Zoltan Boszormenyi

Hi,

I am trying to prove whether PostgreSQL is faster than Informix
so I can feed the management with numbers.

In our system, there is an invoice browser view, a UNION of 12
different tables. (Yes, there are 12 different invoice types, like new or
second-hand cars, warranty, service, etc., with in/out directions;
all have to be counted from 1 starting each year, e.g. 20050001.)
The view contains a constant field that is the so-called invoice prefix,
e.g. CARO is CAR-OUT, the invoice type of sold new cars, and so on.

SELECT * or SELECT COUNT(*) from this view for listing all invoices
is overall faster in PostgreSQL.

The more interesting case, however, is when I search for only one invoice,
knowing the prefix and the invoice number.

Informix results:

$ time echo "select * from v_invoice_browse where code = 'CARO' and inv_no = 20020" | dbaccess db

Database selected.

...

1 row(s) retrieved.

Database closed.

real0m1.263s
user0m0.530s
sys 0m0.000s

$ time echo "select * from v_invoice_browse where code||inv_no = 'CARO20020'" | dbaccess db

Database selected.

...

1 row(s) retrieved.

Database closed.

real0m7.942s (varying between 7.5 and 14 seconds)
user0m0.510s
sys 0m0.000s



PostgreSQL results:

$ time echo "select * from v_invoice_browse where code = 'CARO' and inv_no = 20020" |psql db
...
(1 row)

real0m0.061s
user0m0.000s
sys 0m0.010s

$ time echo "select * from v_invoice_browse where code||inv_no = 'CARO20020'" |psql db
...
(1 row)

real0m18.158s (varying between about 18 and 24 seconds)
user0m0.000s
sys 0m0.020s


The timing of the first query varied very little between five runs.
The timing variation of the second query is indicated above;
it naturally depends on other system activity.

Is there a way to speed this operation up? Maybe it could be known whether
a field in a view is constant, or can only have limited values, like in
this situation where we have a union of tables, and every member of the
union has a constant in that field. Or there may be other ways to speed up
comparing concatenated values.

Best regards,
Zoltán Böszörményi




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Interesting speed anomaly

2005-12-14 Thread Zoltan Boszormenyi

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:

$ time echo "select * from v_invoice_browse where code||inv_no =
'CARO20020'" | dbaccess db

Is there a way to speed this operation up?

Make an expression index on code||inv_no, if you think this case is
important enough to be worth maintaining an extra index for.

(This is not on-topic for -hackers, IMHO.  Try pgsql-perform.)

regards, tom lane
Thanks for both the hint and the pointer to the mailing list.
My problem is, I can't see how I could create any index on a view.
PostgreSQL refuses it:

create index iinvbrowse1 on v_invoice_browse ((code||inv_no));
ERROR:  v_invoice_browse is not a table

Creating indexes on the 12 invoice tables, like this:

create index iinvoice1 on invoice1 (('PREFIX'||id));

can be done, but it doesn't seem to help; at least the query run time
doesn't decrease. Remember, the view is a union of the 12 tables; the
'code' (invoice prefix) field is a fake constant field to distinguish
between the different invoice types. And we have the 'inv_no' field in
the view, but the serial fields in the separate invoice tables are
called 'szam'. So there is no direct linkage between the view and table
field names, except through the view definition. That still leaves me
wondering. Both Informix and PostgreSQL seem to do the query using a
sequential scan, but the above WHERE condition is computed about two
times faster in Informix, while every other usual query is faster in
PostgreSQL by about (and I really mean at least) five times compared to
Informix. That's why I sent it to pgsql-hackers; maybe the hackers are
interested in further improving PostgreSQL. ;-)

I will ask on pgsql-performance, thanks.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Interesting speed anomaly

2005-12-14 Thread Zoltan Boszormenyi

Gavin Sherry írta:

On Thu, 15 Dec 2005, Gavin Sherry wrote:

On Wed, 14 Dec 2005, Zoltan Boszormenyi wrote:

Tom Lane írta:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:

$ time echo "select * from v_invoice_browse where code||inv_no =
'CARO20020'" | dbaccess db

Is there a way to speed this operation up?

Make an expression index on code||inv_no, if you think this case is
important enough to be worth maintaining an extra index for.

(This is not on-topic for -hackers, IMHO.  Try pgsql-perform.)

regards, tom lane

Thanks for both the hint and the pointer to the mailing list.
My problem is, I can't see how I could create any index on a view.
PostgreSQL refuses it:

create index iinvbrowse1 on v_invoice_browse ((code||inv_no));
ERROR:  v_invoice_browse is not a table

Creating indexes on the 12 invoice tables, like this:

create index iinvoice1 on invoice1 (('PREFIX'||id));

Are you creating the index on (code || id) or on the string 'PREFIX' or
some other literal?

Sorry, I sent this email instead of cancelling it. I take it 'code' is a
string generated by the query, for example: select 'CAR' as code, * from
cars? If so, it seems strange that we do not use the expressional
index. Could you send the output of explain analyze?

Thanks,

Gavin

The VIEW is created like this (shorter example):

create view v1 (code,num) as
select 'AAA',id from table1
union
select 'BBB',id from table2;

I created the indexes on the individual tables as

create index index1 on table1 (('AAA'||id));
create index index2 on table2 (('BBB'||id));

Every index uses the same literal that its table is associated with in
the VIEW.
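
(An aside with a possible explanation: with a plain UNION the planner
must deduplicate above the Append node, so the WHERE clause is applied
to the subquery result instead of being pushed down to the member
tables, and the expression indexes never come into play. If duplicates
cannot occur across the branches, a UNION ALL view should let the
condition reach the indexed expressions. A hedged sketch, reusing the
shorter example above under a hypothetical name:

create view v1_all (code, num) as
select 'AAA', id from table1
union all
select 'BBB', id from table2;

explain analyze select * from v1_all where code||num = 'AAA200501';
-- the pushed-down condition becomes 'AAA'||id = 'AAA200501' in the
-- table1 branch, which can match index1 above.)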

Here is the explain analyze output, on PostgreSQL 8.0.3.
I can test the same on 8.1.1 tomorrow.

# explain analyze select * from v_invoice_browse where code||inv_no = 'CARO20020';

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan v_invoice_browse  (cost=346661.81..356932.96 rows=403 width=680) (actual time=9184.529..9735.884 rows=1 loops=1)
   Filter: ((code || (inv_no)::text) = 'CARO20020'::text)
   ->  Unique  (cost=346661.81..355523.19 rows=80558 width=188) (actual time=9184.313..9602.540 rows=84693 loops=1)
         ->  Sort  (cost=346661.81..346863.21 rows=80558 width=188) (actual time=9184.310..9241.868 rows=84693 loops=1)
               Sort Key: ?column?, prefix, szam, divitem, ?column?, ?column?, partner, pmode, inv_date, ins_time, ship_date, pterm, netto, total, vat, decimal14_2_1, ?column?, ?column?, rectify, invoice_rect, status, acc_status, dpaym, dpaym_incl, netto_w_dpaym, vat_w_dpaym, acc_group, currency, car, ?column?, userid, bank_account, ?column?, ?column?, ?column?, ?column?, ?column?, ?column?, diff_tax, prcar, case, inv_no, ?column?
               ->  Append  (cost=0.00..321067.25 rows=80558 width=188) (actual time=0.149..4540.736 rows=84693 loops=1)
                     ->  Subquery Scan *SELECT* 1  (cost=0.00..148200.17 rows=23571 width=188) (actual time=0.148..735.239 rows=24508 loops=1)
                           ->  Nested Loop  (cost=0.00..147964.46 rows=23571 width=188) (actual time=0.120..408.176 rows=24508 loops=1)
                                 ->  Nested Loop  (cost=0.00..3.19 rows=1 width=15) (actual time=0.033..0.062 rows=1 loops=1)
                                       ->  Seq Scan on pssysinvoice p  (cost=0.00..2.17 rows=1 width=7) (actual time=0.022..0.034 rows=1 loops=1)
                                             Filter: ((code)::text = 'GENI'::text)
                                       ->  Seq Scan on psgenjointemp t  (cost=0.00..1.01 rows=1 width=8) (actual time=0.001..0.016 rows=1 loops=1)
                                 ->  Seq Scan on mminigeninvoice h  (cost=0.00..1637.49 rows=23571 width=173) (actual time=0.048..81.226 rows=24508 loops=1)
                                       Filter: ((status = 2) OR (status = 3) OR (status = 4) OR (status = 5) OR (status = 6))
                                 SubPlan
                                   ->  Index Scan using isercar1 on mmsercar sc  (cost=0.00..3.13 rows=1 width=21) (actual time=0.005..0.006 rows=1 loops=3405)
                                         Index Cond: (szam = $7)
                                   ->  Index Scan using icarprorder1 on mmcarprorder po  (cost=0.00..3.06 rows=1 width=20) (actual time=0.024..0.025 rows

Re: [HACKERS] SERIAL type feature request

2005-12-07 Thread Zoltan Boszormenyi

Hi,

Zoltan Boszormenyi írta:

Jan Wieck írta:

On 12/5/2005 1:03 PM, Zoltan Boszormenyi wrote:

Jan Wieck írta:

On 12/4/2005 5:10 PM, Zoltan Boszormenyi wrote:

I found this in the SQL2003 draft:

4.14.7 Identity columns

... An identity column has a start value, an increment, a maximum
value, a minimum value, and a cycle option. ...

The exact properties of a sequence. It would be a good idea to be
able to provide all these the same way PostgreSQL provides CREATE
SEQUENCE.

I think nobody would object to implementing support for the SQL2003
syntax. Most of that would be providing all the values that will
get forwarded into the internal sequence generation during CREATE
TABLE.

The other thing needed is an extension to the default value
mechanism that overrides any given value to implement GENERATED
ALWAYS. Not too hard either.

Where can I find this syntax? (PDF file name, page#) Thanks.
I think I'll modify my feature request to ask for the standard behaviour.

It's all in the Foundation paper inside this zip:

http://www.wiscorp.com/sql/sql_2003_standard.zip

Thanks, I found it. It's GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY,
isn't it? If I interpret it correctly, GENERATED ALWAYS AS IDENTITY
means that no matter what I give in INSERT INTO MYTABLE (serial_id,
...) VALUES (N, ...), the sequence's next value will be inserted into
the database. I am all for it; it's much stronger than just watching
for the 0 value and would fit my needs.

The other behaviour is GENERATED BY DEFAULT AS IDENTITY,
which is what PostgreSQL currently provides.

Best regards,
Zoltán Böszörményi



To reiterate it, I would like the following added to the PostgreSQL 8.2
TODO (I may have got the optional parameters wrong...):

- Extend the SERIAL type declaration and functionality with the SQL2003
  compliant sequence generation options:

    SERIAL [ GENERATED { ALWAYS | BY DEFAULT }
        [ AS IDENTITY ( [ START WITH startvalue ]
            [ INCREMENT BY incrementvalue ]
            [ MAXVALUE maxvalue ] [ MINVALUE minvalue ]
            [ CYCLE | NO CYCLE ] ) ] ]
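
For illustration, here is a sketch of what the SQL2003 identity-column
syntax itself looks like for such a table (standard syntax only, not
something PostgreSQL accepted at the time; the table and the start
value are made-up examples):

create table invoice (
    inv_no integer generated always as identity
        (start with 20050001 increment by 1 no cycle),
    partner text,
    total numeric(14,2)
);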





Re: [HACKERS] SERIAL type feature request

2005-12-05 Thread Zoltan Boszormenyi

Jan Wieck írta:

On 12/4/2005 5:10 PM, Zoltan Boszormenyi wrote:

I found this in the SQL2003 draft:

4.14.7 Identity columns

... An identity column has a start value, an increment, a maximum
value, a minimum value, and a cycle option. ...

The exact properties of a sequence. It would be a good idea to be
able to provide all these the same way PostgreSQL provides CREATE
SEQUENCE.

I think nobody would object to implementing support for the SQL2003
syntax. Most of that would be providing all the values that will get
forwarded into the internal sequence generation during CREATE TABLE.

The other thing needed is an extension to the default value mechanism
that overrides any given value to implement GENERATED ALWAYS. Not too
hard either.

Where can I find this syntax? (PDF file name, page#) Thanks.
I think I'll modify my feature request to ask for the standard behaviour.

Thanks and best regards,
Zoltán Böszörményi




Re: [HACKERS] SERIAL type feature request

2005-12-05 Thread Zoltan Boszormenyi

Jan Wieck írta:

On 12/5/2005 1:03 PM, Zoltan Boszormenyi wrote:

Jan Wieck írta:

On 12/4/2005 5:10 PM, Zoltan Boszormenyi wrote:

I found this in the SQL2003 draft:

4.14.7 Identity columns

... An identity column has a start value, an increment, a maximum
value, a minimum value, and a cycle option. ...

The exact properties of a sequence. It would be a good idea to be
able to provide all these the same way PostgreSQL provides CREATE
SEQUENCE.

I think nobody would object to implementing support for the SQL2003
syntax. Most of that would be providing all the values that will get
forwarded into the internal sequence generation during CREATE TABLE.

The other thing needed is an extension to the default value
mechanism that overrides any given value to implement GENERATED
ALWAYS. Not too hard either.

Where can I find this syntax? (PDF file name, page#) Thanks.
I think I'll modify my feature request to ask for the standard behaviour.

It's all in the Foundation paper inside this zip:

http://www.wiscorp.com/sql/sql_2003_standard.zip

Thanks, I found it. It's GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY,
isn't it? If I interpret it correctly, GENERATED ALWAYS AS IDENTITY
means that no matter what I give in INSERT INTO MYTABLE (serial_id,
...) VALUES (N, ...), the sequence's next value will be inserted into
the database. I am all for it; it's much stronger than just watching
for the 0 value and would fit my needs.

The other behaviour is GENERATED BY DEFAULT AS IDENTITY,
which is what PostgreSQL currently provides.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] SERIAL type feature request

2005-12-04 Thread Zoltan Boszormenyi

Andrew Dunstan írta:

Zoltan Boszormenyi wrote:

Hi!

I would like to add an entry to PostgreSQL 8.2 TODO:
- Extend SERIAL to a full-featured auto-incrementer type.

I can't see this item on the TODO list at all. Where exactly did you
find it?

That's why I wanted it ADDed... ;-)

Best regards




Re: [HACKERS] SERIAL type feature request

2005-12-04 Thread Zoltan Boszormenyi

Peter Eisentraut írta:

Josh Berkus wrote:

I believe that our SERIAL/SEQUENCE stuff is already in compliance
with the SQL standard for sequences (in SQL03).  Why would we change
it?

Because your belief is wrong, but Zoltan's proposal is not getting us
closer.

OK, what does the standard say on SERIAL for specifying the start value?
And about this:

last serial value was e.g. 307
insert into mytable (serial_id, ...) values (500, ...);
delete from mytable where serial_id = 500;

In Informix, this is a way to start the next serial value at 501.
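
(For comparison, a hedged sketch of achieving the same effect manually
in PostgreSQL with the serial column's backing sequence; the table is
hypothetical and mytable_serial_id_seq is the sequence name PostgreSQL
would generate for it:

select setval('mytable_serial_id_seq', 500);
-- the next nextval() now returns 501, whether or not row 500 still exists)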

Best regards,
Zoltán




Re: [HACKERS] SERIAL type feature request

2005-12-04 Thread Zoltan Boszormenyi

OK, I admit I haven't read the SQL standards on this matter.

Tino Wildenhain írta:

On Saturday, 2005-12-03 at 22:23 +0100, Zoltan Boszormenyi wrote:

Hi!

I would like to add an entry to PostgreSQL 8.2 TODO:
- Extend SERIAL to a full-featured auto-incrementer type.

To achieve this, the following three requirements should be fulfilled:

1. The statement parser should be able to handle this:

create table x (
    id serial(N),
    ...
);

and behind the scenes this would translate into the create sequence ...
start N before creating the table.

why isn't N max_id? Or increment?
Sounds inconsistent. -1 on this.

A SERIAL type carries the assumption that its value starts at a low
value (1) and increases. Or is there a type modifier keyword that makes
it work backwards? A start value would also work here, decreasing from
there.

2. Upon INSERTing to a serial column, an explicitly given 0 value, the
'default' keyword, or an omitted field (implicit default) should be
interchangeable.

default and omit are these. 0 would be an error. -1 on this too.

Why? A sequence in PostgreSQL won't give you 0 even in wraparound mode.
I just checked it:

db=> create sequence proba_seq maxvalue 5 cycle;
CREATE SEQUENCE
db=> select nextval('proba_seq');
 nextval
---------
       1
(1 sor)

db=> select nextval('proba_seq');
 nextval
---------
       2
(1 sor)

db=> select nextval('proba_seq');
 nextval
---------
       3
(1 sor)

db=> select nextval('proba_seq');
 nextval
---------
       4
(1 sor)

db=> select nextval('proba_seq');
 nextval
---------
       5
(1 sor)

db=> select nextval('proba_seq');
 nextval
---------
       1
(1 sor)



3. When a serial field value is given in an INSERT or UPDATE statement
and the value is larger than the current value of the sequence, then
the sequence should be modified accordingly.

sideeffects, raceconditions. -1 on this.

This event doesn't (shouldn't) occur often; e.g. you have an invoice
table where the invoice number contains the year, too. It's somewhat
natural to handle it with the serial field, so it gives out 20051 ...
values. At the beginning of the next year, you modify the sequence to
start at 20061. What I mean is that there may be two paths in the
serial field handling: one where 'default' is used, which hopefully
isn't racy, as this is the way it works now. The other is when the
value is explicitly given; a little critical section may not hurt:

Lock sequence
Check the current value of the sequence
If given value is higher Then Modify sequence
Unlock sequence
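
A rough sketch of that critical section as a trigger, assuming a table
mytable with a serial column serial_id backed by the default sequence
mytable_serial_id_seq (all names hypothetical; the check-then-set below
is still racy without an external lock, which is exactly the concern
raised above):

create or replace function bump_serial() returns trigger as $$
begin
    -- advance the sequence when an explicitly given value overtakes it;
    -- last_value can be read directly from the sequence relation
    if NEW.serial_id > (select last_value from mytable_serial_id_seq) then
        perform setval('mytable_serial_id_seq', NEW.serial_id);
    end if;
    return NEW;
end;
$$ language plpgsql;

create trigger mytable_bump_serial
    before insert or update on mytable
    for each row execute procedure bump_serial();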


This is the way Informix handles its serial type, although it doesn't
seem to have a visible sequence bound to the serial column.

Sounds like this Informix is seriously broken ;)

Yes, and slow, too. :-( That's why I would like to port the company's
software to PostgreSQL, but there are way too many places where
Informixisms were used.


Is it feasible in the 8.2 timeframe?

I hope not ;)






Re: [HACKERS] SERIAL type feature request

2005-12-04 Thread Zoltan Boszormenyi

Jan Wieck írta:

On 12/3/2005 4:23 PM, Zoltan Boszormenyi wrote:

Hi!

I would like to add an entry to PostgreSQL 8.2 TODO:
- Extend SERIAL to a full-featured auto-incrementer type.

To achieve this, the following three requirements should be fulfilled:

1. The statement parser should be able to handle this:

create table x (
    id serial(N),
    ...
);

and behind the scenes this would translate into the create sequence ...
start N before creating the table.

Syntactic sugar with zero real value. A setval() after create table
does exactly the same. Unless you extend your proposal to unambiguously
specify any or all of the serial's properties (min, max, start, cache,
cycle), this has to be rejected as incomplete.
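
(A sketch of that equivalence, with made-up names and a concrete value
standing in for N; the three-argument setval() makes the very next
nextval() return the given value itself:

create table x (id serial, t text);
select setval('x_id_seq', 1000, false);  -- nextval('x_id_seq') returns 1000)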



I found this in the SQL2003 draft:

4.14.7 Identity columns

... An identity column has a start value, an increment, a maximum value,
a minimum value, and a cycle option. ...

The exact properties of a sequence. It would be a good idea to be able
to provide all these the same way PostgreSQL provides CREATE SEQUENCE.

2. Upon INSERTing to a serial column, an explicitly given 0 value, the
'default' keyword, or an omitted field (implicit default) should be
interchangeable.

Why exactly would we treat an explicit zero differently from any other
explicit value? What you are asking for is to substitute an explicit,
literal value presented by the user with something different. Sorry,
but if Informix does THAT, then Informix is no better than MySQL.

Thinking about it more, 0 is a special value that a sequence created
with defaults (just like the ones created for SERIAL fields) will not
produce. If PostgreSQL provides a way to specify the sequence
parameters for a SERIAL, there may be other values, too, that a
sequence created with the given parameters will not produce. At the
extreme, they may be handled the same way. E.g.

CREATE SEQUENCE seq1 INCREMENT 2 MINVALUE 2 MAXVALUE 100;

won't produce 0, 1, any odd number between 3 and 99, or any number
from 101 up to 2^63 - 1.



3. When a serial field value is given in an INSERT or UPDATE statement
and the value is larger than the current value of the sequence, then
the sequence should be modified accordingly.

How about negative increment values, cycling sequences and max/minval?

For descending sequences, a lower value should update the sequence.

This is the way Informix handles its serial type, although it doesn't
seem to have a visible sequence bound to the serial column.

Have you considered asking Informix to do the reverse changes?

Hm. Good idea. I'll try. But I guess they won't backport it to 9.21. :-)

Best regards,
Zoltán Böszörményi




[HACKERS] SERIAL type feature request

2005-12-03 Thread Zoltan Boszormenyi

Hi!

I would like to add an entry to PostgreSQL 8.2 TODO:
- Extend SERIAL to a full-featured auto-incrementer type.

To achieve this, the following three requirements should be fulfilled:

1. The statement parser should be able to handle this:

create table x (
   id serial(N),
   ...
);

and behind the scenes this would translate into the create sequence ...
start N before creating the table.

2. Upon INSERTing to a serial column, an explicitly given 0 value, the
'default' keyword, or an omitted field (implicit default) should be
interchangeable.

3. When a serial field value is given in an INSERT or UPDATE statement
and the value is larger than the current value of the sequence, then
the sequence should be modified accordingly.

This is the way Informix handles its serial type, although it doesn't seem
to have a visible sequence bound to the serial column.

Is it feasible in the 8.2 timeframe?

Thanks in advance,
Zoltán Böszörményi

