Re: [GENERAL] I need to encrypt one column to an output file

2008-04-16 Thread Richard Huxton

Ralph Smith wrote:

I need to do a simple query and output to a file.
No problem.

But how do I encrypt one column's output?

There are lots of developer-related links here, and info pointing to the 
/contrib pgcrypto module, but I'm not a PG developer and I failed to find any 
info on HOW TO USE that library function...


Have you installed the pgcrypto module?
Have you read the guide to pgcrypto in the manual?
  http://www.postgresql.org/docs/8.3/static/pgcrypto.html
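
For instance, once pgcrypto is loaded into your database, encrypting one
column's output can be as simple as this minimal sketch (the table users and
its text column username are hypothetical):

  -- symmetric-key PGP encryption of one column (pgcrypto)
  SELECT username,
         pgp_sym_encrypt(username, 'secret passphrase') AS username_encrypted
    FROM users;
  -- for a one-way encoding, a hash may be enough: digest(username, 'sha256')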

--
  Richard Huxton
  Archonet Ltd



[GENERAL] Need help to clean up after failed CLUSTER

2008-04-16 Thread CG
I'm using PostgreSQL 8.1 ... 

I had to terminate some clustering before it had completed. I think I have 
quite a bit of wasted disk space in half-baked table files. I need some 
suggestions for an easy way to find and clean out the files left over from the 
failed cluster operations.

TIA!
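
For what it's worth, a sketch of one way to spot leftovers: compare the
relfilenode values PostgreSQL knows about against the numeric files actually
present in the database's directory (double-check anything before deleting):

   -- files PostgreSQL currently knows about, by relfilenode
   SELECT relfilenode, relname FROM pg_class ORDER BY relfilenode;

   -- the directory to compare against is base/<database oid>/ under $PGDATA
   SELECT oid FROM pg_database WHERE datname = current_database();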


  




Re: [GENERAL] How to recover data from the data folder of an installation?

2008-04-16 Thread Magnus Hagander
Craig Ringer wrote:
 Orlando Solarte wrote:
  Hi.
   
  I am having trouble with my PostgreSQL server on Linux. It crashed! I
  have the folder from the PostgreSQL installation that was my server. Is
  it possible to recover data from the PostgreSQL installation folder,
  for example the data folder?
 
 Unfortunately you have provided very little information.
 
 If you mean that the PostgreSQL database server crashed and will no
 longer start up, then you would need to provide the messages
 PostgreSQL prints to its log files when it tries and fails to start
 up.
 
 If you mean that the server computer itself crashed, and you now
 have a PostgreSQL data directory (from backups or recovered from the
 server's hard disk) then if the PostgreSQL data directory is
 undamaged it should be possible to use it. You will need to use it
 with a version of PostgreSQL with the same minor version as the one
 you were using on the server, e.g. if you were using 8.1.4 you should
 get the latest PostgreSQL in the 8.1 series (NOT 8.2 or 8.3) to try
 to read the data.

What you mean here is of course that you should be using the same
*major* version, but the latest available *minor* version.

Other than that, it's correct - and the version numbers in the example
are correct :-)

//Magnus



Re: [GENERAL] Not Geography

2008-04-16 Thread Shane Ambler

Bob Pawley wrote:

I have PostgreSQL, c/w the geometry attachment, installed on Win XP. I
need to get started using it. I am best able to do that by example,
however a simple (geometry for dummies) description or a tool would
also help.


Are you wanting to use the geometric data types that are included in 
postgresql or full GIS types that are added by postgis?


If you are working with basic geometry instead of geographic data then 
this may be somewhat simpler for you.


In case you are lost with that distinction, geometric data types are 
based on a flat 2D graph to locate a point, line, rectangle, polygon etc


GIS data gets a bit more complex as it accounts for distortions from the 
graph being stretched around the roughly spherical earth. There are also 
several different ways to calculate these distortions and postgis will 
translate between them. It also supports layering different data and helps 
decide what data is viewable at what zoom levels etc.


postgis is normally only used for geographic data but can be more widely 
applied. postgis can also handle 3D data if that is what you are 
interested in.




I am on the Postgis list but everyone there seems to be so
geographically oriented that I wanted to query a larger audience.

Bob

- Original Message -
From: Obe, Regina
To: Bob Pawley; PostgreSQL
Sent: Tuesday, April 15, 2008 10:50 AM
Subject: RE: [GENERAL] Not Geography



I have a PostgreSQL database that I would like to link to the
geometry side of PostGIS. (This has little to do with geography.)




Does anyone know of a tool or documentation that could get me
started??



Bob


Bob, I don't quite understand your question.  Are you not sure how to
install PostGIS, how to integrate it in your existing database, or,
once you install it, how to get started using it?  How to do it depends
on the operating system you are using.  Once you have it installed
you may want to join the Postgis newsgroup for further help:

http://postgis.refractions.net/mailman/listinfo/postgis-users


Hope that helps, Regina




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz



[GENERAL] Storing and accessing GeoData (Latitude and Longitude) in PostgreSQL 8.3

2008-04-16 Thread Stefan Sturm
Hello,

I have a lot of GeoData (Latitude and Longitude) in my PostgreSQL Database.
I store them as numeric at this time.
But now I want to access the data and want to have all Datasets around a
Geographic point, e.g. within a radius of 5 km...

Is there a simple way to do something like this?

Thanks for your Help,
Stefan Sturm
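
(A minimal sketch of the kind of query meant here, using the contrib cube and
earthdistance modules, with a hypothetical table places(name, lat, lon);
distances are in metres:)

   -- all rows within 5 km of a given point
   SELECT name
     FROM places
    WHERE earth_distance(ll_to_earth(52.5, 13.4),    -- centre: lat, lon
                         ll_to_earth(lat, lon)) < 5000;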


Re: [GENERAL] Storing and accessing GeoData (Latitude and Longitude) in PostgreSQL 8.3

2008-04-16 Thread ludwig
Yes, there is a great solution: take a look at PostGIS: http://www.postgis.org/
It surely will do all you want with geodata, but perhaps not as simple as you like...

bye...
Ludwig

 Hello,
 I have a lot of GeoData (Latitude and Longitude) in my PostgreSQL Database.
 I store them as numeric at this time.
 But now I want to access the data and want to have all Datasets around a
 Geographic point, e.g. within a radius of 5 km...
 Is there a simple way to do something like this?
 Thanks for your Help,
 Stefan Sturm


Re: [GENERAL] Not Geography

2008-04-16 Thread Bob Pawley
In case you are lost with that distinction, geometric data types are based 
on a flat 2D graph to locate a point, line, rectangle, polygon etc


Yes - this is what I am interested in with the eventual potential of 3D.

Bob


- Original Message - 
From: Shane Ambler [EMAIL PROTECTED]

To: Bob Pawley [EMAIL PROTECTED]
Cc: Obe, Regina [EMAIL PROTECTED]; PostgreSQL 
pgsql-general@postgresql.org

Sent: Wednesday, April 16, 2008 2:09 AM
Subject: Re: [GENERAL] Not Geography



Bob Pawley wrote:

I have PostgreSQL, c/w the geometry attachment, installed on Win XP. I
need to get started using it. I am best able to do that by example,
however a simple (geometry for dummies) description or a tool would
also help.


Are you wanting to use the geometric data types that are included in 
postgresql or full GIS types that are added by postgis?


If you are working with basic geometry instead of geographic data then 
this may be somewhat simpler for you.


In case you are lost with that distinction, geometric data types are based 
on a flat 2D graph to locate a point, line, rectangle, polygon etc


GIS data gets a bit more complex as it accounts for distortions from the 
graph being stretched around the roughly spherical earth. There are also 
several different ways to calculate these distortions and postgis will 
translate between them. It also supports layering different data and helps 
decide what data is viewable at what zoom levels etc.


postgis is normally only used for geographic data but can be more widely 
applied. postgis can also handle 3D data if that is what you are 
interested in.




I am on the Postgis list but everyone there seems to be so
geographically oriented that I wanted to query a larger audience.

Bob

- Original Message -
From: Obe, Regina
To: Bob Pawley; PostgreSQL
Sent: Tuesday, April 15, 2008 10:50 AM
Subject: RE: [GENERAL] Not Geography



I have a PostgreSQL database that I would like to link to the
geometry side of PostGIS. (This has little to do with geography.)




Does anyone know of a tool or documentation that could get me
started??



Bob


Bob, I don't quite understand your question.  Are you not sure how to
install PostGIS, how to integrate it in your existing database, or,
once you install it, how to get started using it?  How to do it depends
on the operating system you are using.  Once you have it installed
you may want to join the Postgis newsgroup for further help:

http://postgis.refractions.net/mailman/listinfo/postgis-users


Hope that helps, Regina




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz 





Re: [GENERAL] Storing and accessing GeoData (Latitude and Longitude) in PostgreSQL 8.3

2008-04-16 Thread Ivan Zolotukhin
Hello,

You may also want to have a look at PgSphere
(http://pgfoundry.org/projects/pgsphere) and Q3C (http://q3c.sf.net),
which is a spherical indexing solution built specially for
PostgreSQL, with the best performance among all other methods (even
within other databases).

Regards,
 Ivan

On Wed, Apr 16, 2008 at 1:23 PM, Stefan Sturm
[EMAIL PROTECTED] wrote:
 Hello,

 I have a lot of GeoData( Latitude and Longitude ) in my PostgreSQL Database.
 I store them as numeric at this time.
 But now I want to access the data and want to have all Datasets around a
 Geographic point. eg: within a radius of 5 km...

 Is there a simple way to do something like this?

 Thanks for your Help,
 Stefan Sturm




[GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rob Collins
Hello

We're looking for an open-source database solution that has a Python
interface and will do master-master replication. My client has a flawed MS
SQL Server system, with the replication hand-coded in Python. They don't
want to pay very high licence fees for the MS SQL Server Enterprise version
at 20 sites across the UK.

There is one central server with 19 branches. Some tables need to replicate
from the central server to the branches. Other tables are centralised from
the branches into one totalling table at the centre. A few tables need to
replicate in both directions.

From what I've read, Slony-I does only master-slave replication and Slony-II
is not being actively developed. Is this right? Are there any viable
master-master replication tools for PostgreSQL? (They could be
commercial/paid for if necessary.)

Rob


Re: [GENERAL] generate_series woes

2008-04-16 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Merlin Moncure [EMAIL PROTECTED] writes:

 On Mon, Apr 14, 2008 at 5:21 AM, Harald Fuchs [EMAIL PROTECTED] wrote:
 I think there's something sub-optimal with generate_series.
 In the following, documents is a table with more than 12 rows,
 vacuumed and analyzed before the queries.

 everything is working exactly as intended.  while it's obvious to you
 that the generate series function returns a particular number of rows
 based on your supplied inputs, it's not (yet) obvious to the planner.

Which was exactly my point.  Since generate_series is a builtin
function, the planner could theoretically know the number of rows
returned, thus choosing a better plan.

OTOH, the difference between theory and reality is in theory smaller
than in reality.

 your genser function supplies the hint the planner needs and it
 adjusts the plan.  most set returning functions (particularly
 non-immutable ones) are not so easy to determine the # of rows from
 the input parameters anyways.

Yes, of course.  I used genser just to show that there is a better plan.




Re: [GENERAL] generate_series woes

2008-04-16 Thread hubert depesz lubaczewski
On Mon, Apr 14, 2008 at 11:21:58AM +0200, Harald Fuchs wrote:
 I think there's something sub-optimal with generate_series.
 In the following, documents is a table with more than 12 rows,
 vacuumed and analyzed before the queries.
 Since generate_series is a builtin function, can't it tell how many
 rows it will return?

i think it would be better not to limit such functionality to
builtin functions. it would be much nicer to have the ability to hint the
planner about rowcount from a function *in* the sql.

something like:

select i from generate_series(1,10) {hint:10} as i;

i'm not proposing syntax. i'm suggesting the functionality.

depesz

-- 
quicksil1er: postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV! :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)



Re: [GENERAL] generate_series woes

2008-04-16 Thread Volkan YAZICI
hubert depesz lubaczewski [EMAIL PROTECTED] writes:
 i think it would be better not to limit such functionality to
 builtin functions. it would be much nicer to have the ability to hint the
 planner about rowcount from a function *in* the sql.

 something like:

 select i from generate_series(1,10) {hint:10} as i;

 i'm not proposing syntax. i'm suggesting the functionality.

I'm strongly against such non-SQL-compliant solutions. I'd appreciate
it if the hackers could solve the problem internally, without disturbing
the SQL syntax.


Regards.



Re: [GENERAL] generate_series woes

2008-04-16 Thread hubert depesz lubaczewski
On Wed, Apr 16, 2008 at 03:37:22PM +0300, Volkan YAZICI wrote:
 I'm strongly against such non-SQL-compliant solutions. I'd appreciate
 it if the hackers could solve the problem internally, without disturbing
 the SQL syntax.

for generate_series - sure. but i have functions which change (in a
known way) the number of returned rows based on their arguments.

depesz

-- 
quicksil1er: postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV! :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Julio Cesar Sánchez González

On Wed, 2008-04-16 at 12:13 +0100, Rob Collins wrote:
 Hello
 
 We're looking for an open-source database solution that has a Python
 interface and will do master-master replication. My client has a
 flawed MS SQL Server system, with the replication hand-coded in
 Python. They don't want to pay very high licence fees for the MS SQL
 Server Enterprise version at 20 sites across the UK.
 
 There is one central server with 19 branches. Some tables need to
 replicate from the central server to the branches. Other tables are
 centralised from the branches into one totalling table at the centre.
 A few tables need to replicate in both directions. 
 
 From what I've read, Slony-I does only master-slave replication and
 Slony-II is not being actively developed. Is this right? Are there any
 viable master-master replication tools for PostgreSQL? (They could be
 commercial/paid for if necessary.)
 
 Rob

Try Bucardo (http://bucardo.org/); it may help you :).

Regards,

Julio Cesar Sánchez González.
www.sistemasyconectividad.com.mx
[EMAIL PROTECTED]





Re: [GENERAL] generate_series woes

2008-04-16 Thread Merlin Moncure
On Wed, Apr 16, 2008 at 8:37 AM, Volkan YAZICI [EMAIL PROTECTED] wrote:
 hubert depesz lubaczewski [EMAIL PROTECTED] writes:
   i think it would be better not to limit such functionality to
   builtin functions. it would be much nicer to have the ability to hint the
   planner about rowcount from a function *in* the sql.
  
   something like:
  
   select i from generate_series(1,10) {hint:10} as i;
  
   i'm not proposing syntax. i'm suggesting the functionality.

  I'm strongly against such non-SQL-compliant solutions. I'd appreciate
  it if the hackers could solve the problem internally, without disturbing
  the SQL syntax.

maybe -- just an idle thought -- the 'ROWS' clause of create function
could be expanded to take a simple expression based on the input
parameters.

merlin
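
(As of 8.3, ROWS only accepts a constant, which already helps when a fixed
estimate fits - a minimal sketch, wrapping generate_series the way Harald's
genser presumably does:)

  CREATE FUNCTION genser(int, int) RETURNS SETOF int AS $$
      SELECT * FROM generate_series($1, $2)
  $$ LANGUAGE sql ROWS 10;  -- constant planner estimate; no expressions allowed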



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Dimitri Fontaine
Hi,

On Wednesday 16 April 2008, Rob Collins wrote:
 There is one central server with 19 branches. Some tables need to replicate
 from the central server to the branches. Other tables are centralised from
 the branches into one totalling table at the centre. A few tables need to
 replicate in both directions.

I'm working on some projects here with this very need (and the same scale), and I 
plan to use londiste (a master/slaves asynchronous replication solution), 
which I already use in production on another project.

The fact is that we want the branches to remain fully available in case of 
network or central server failure, so we don't buy into synchronous 
replication; which is not yet available in PostgreSQL as far as I know, 
even if one of the basic building-block alternatives is provided in -core, 
namely Two-Phase Commit.
  http://www.postgresql.org/docs/8.3/static/sql-prepare-transaction.html

You'll find londiste documentation at both of the following places; the latter is 
up to date with the last (2.1.6) release, the former I'm not sure about.
  http://skytools.projects.postgresql.org/doc/
  http://pgsql.tapoueh.org/skytools/

Hope this helps, regards,
-- 
dim




Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rodrigo Gonzalez

Rob Collins wrote:

Hello

We're looking for an open-source database solution that has a Python 
interface and will do master-master replication. My client has a 
flawed MS SQL Server system, with the replication hand-coded in 
Python. They don't want to pay very high licence fees for the MS SQL 
Server Enterprise version at 20 sites across the UK.


There is one central server with 19 branches. Some tables need to 
replicate from the central server to the branches. Other tables are 
centralised from the branches into one totalling table at the centre. 
A few tables need to replicate in both directions.


From what I've read, Slony-I does only master-slave replication and 
Slony-II is not being actively developed. Is this right? Are there any 
viable master-master replication tools for PostgreSQL? (They could be 
commercial/paid for if necessary.)


Rob
I am testing bucardo (http://www.bucardo.org) in star replication with 
one office in Canada and 3 branches in different countries, and it's 
working really nicely. It's not yet in production, but my tests show that 
it's really good for the star replication you need.


BTW, my replication is bi-directional and all the branches are synchronized 
with the others.


Rodrigo





Re: [GENERAL] util/int8.h: missing int8_text() function

2008-04-16 Thread Alvaro Herrera
Michael Enke wrote:
 Hi all,
 I am migrating from 8.1 to 8.3.
 In 8.2.7 and previous there are functions int4_text() and int8_text();
 in 8.3.0 they are missing (their removal is not documented in the 
 release notes).
 What am I supposed to use instead of int8_text() now?
 (I use my own datatype which is similar to varchar/text,
  and for a  CREATE CAST (int8 as myType) WITH FUNCTION myFunction(int8) AS 
 ASSIGNMENT;
  I call the int8_text() function.)

IIRC the casts now use the type's output function.  I'm guessing you
should be able to do the same with your type.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rob Collins
Hello Dimitri

To clarify the requirement: much like you, we're not looking for synchronous
replication, which would be too slow. The branches and central server can be
different for about 5 to 10 minutes. But the branches need to be able to
function independently if the network or central goes down. Londiste looks
interesting, though the documentation seems a bit sparse. Is it really that
simple to set up, or is there a fair amount of trial and error in the setup
process?

Best wishes

Rob

On 16/04/2008, Dimitri Fontaine [EMAIL PROTECTED] wrote:

 Hi,

 On Wednesday 16 April 2008, Rob Collins wrote:

  There is one central server with 19 branches. Some tables need to
  replicate from the central server to the branches. Other tables are
  centralised from the branches into one totalling table at the centre.
  A few tables need to replicate in both directions.


 I'm working on some projects here with this very need (and the same scale),
 and I plan to use londiste (a master/slaves asynchronous replication
 solution), which I already use in production on another project.

 The fact is that we want the branches to remain fully available in case of
 network or central server failure, so we don't buy into synchronous
 replication; which is not yet available in PostgreSQL as far as I know,
 even if one of the basic building-block alternatives is provided in -core,
 namely Two-Phase Commit.
   http://www.postgresql.org/docs/8.3/static/sql-prepare-transaction.html

 You'll find londiste documentation at both of the following places; the
 latter is up to date with the last (2.1.6) release, the former I'm not sure
 about.
   http://skytools.projects.postgresql.org/doc/
   http://pgsql.tapoueh.org/skytools/

 Hope this helps, regards,

 --
 dim




Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Vivek Khera


On Apr 16, 2008, at 8:47 AM, Julio Cesar Sánchez González wrote:

From what I've read, Slony-I does only master-slave replication and
Slony-II is not being actively developed. Is this right? Are there any
viable master-master replication tools for PostgreSQL? (They could be
commercial/paid for if necessary.)

Rob


Try Bucardo (http://bucardo.org/); it may help you :).


Doesn't bucardo handle conflicts with "whichever one I apply last
wins"?  That doesn't seem safe in all situations (or any, IMO).





Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Alvaro Herrera
Julio Cesar Sánchez González wrote:
 
 On Wed, 2008-04-16 at 12:13 +0100, Rob Collins wrote:

  We're looking for an open-source database solution that has a Python
  interface and will do master-master replication. My client has a
  flawed MS SQL Server system, with the replication hand-coded in
  Python. They don't want to pay very high licence fees for the MS SQL
  Server Enterprise version at 20 sites across the UK.
  
  There is one central server with 19 branches. Some tables need to
  replicate from the central server to the branches.
 
 Try Bucardo (http://bucardo.org/); it may help you :).

Unfortunately, Bucardo only seems to work with 2 masters -- you can't
have 20.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Erik Jones


On Apr 16, 2008, at 8:44 AM, Rob Collins wrote:

Hello Dimitri

To clarify the requirement: much like you, we're not looking for  
synchronous replication, which would be too slow. The branches and  
central server can be different for about 5 to 10 minutes. But the  
branches need to be able to function independently if the network or  
central goes down. Londiste looks interesting, though the  
documentation seems a bit sparse. Is it really that simple to set  
up, or is there a fair amount of trial and error in the setup process?


Yes, it really is pretty simple to set up.  If you give it a shot and  
run up against any issues or find something unclear, please, offer  
suggestions or patches on the skytools mailing list.


Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com






Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Dimitri Fontaine
Hi,

On Wednesday 16 April 2008, Rob Collins wrote:
 To clarify the requirement: much like you, we're not looking for
 synchronous replication, which would be too slow. The branches and central
 server can be different for about 5 to 10 minutes. But the branches need to
 be able to function independently if the network or central goes down.

Exactly my need, with the addition that centrally edited tables and 
branch-edited tables are disjoint sets, which makes the case for choosing 
master/slave replication.
I'm experiencing 1.5s and 4.8s average lag on my production setup, with 
20 tps and 200 tps respectively (insert/update/delete only).

 Londiste looks interesting, though the documentation seems a bit sparse. Is
 it really that simple to set up, or is there a fair amount of trial and
 error in the setup process?

I had the same question some time ago when I wanted to choose between londiste 
and other solutions, and came to write up a part of the current 
documentation.
In my experience, londiste really is that easy to set up; the only trial and 
error I've had was either bad reading of the docs (forgetting to launch the pgq 
daemon, for example) or bad psycopg version usage.

In my case (though several people on the skytools mailing list have had 
different experiences), I had to stick with psycopg 1 and avoid 2.0.5 (the 
Debian stable version). YMMV.

I want to add that the [EMAIL PROTECTED] mailing list is pretty responsive; 
don't hesitate to ask there :)

Regards,
-- 
dim




Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Dragan Zubac

Hello

http://www.postgresql.at/english/pr_cybercluster_e.html

didn't test it myself though

Sincerely

Dragan

Rob Collins wrote:

Hello Dimitri

To clarify the requirement: much like you, we're not looking for 
synchronous replication, which would be too slow. The branches and 
central server can be different for about 5 to 10 minutes. But the 
branches need to be able to function independently if the network or 
central goes down. Londiste looks interesting, though the 
documentation seems a bit sparse. Is it really that simple to set up, 
or is there a fair amount of trial and error in the setup process?


Best wishes

Rob

On 16/04/2008, Dimitri Fontaine [EMAIL PROTECTED] wrote:


Hi,

On Wednesday 16 April 2008, Rob Collins wrote:

 There is one central server with 19 branches. Some tables need to
 replicate from the central server to the branches. Other tables are
 centralised from the branches into one totalling table at the centre.
 A few tables need to replicate in both directions.


I'm working on some projects here with this very need (and the same
scale), and I plan to use londiste (a master/slaves asynchronous
replication solution), which I already use in production on another
project.

The fact is that we want the branches to remain fully available in
case of network or central server failure, so we don't buy into
synchronous replication; which is not yet available in PostgreSQL as
far as I know, even if one of the basic building-block alternatives is
provided in -core, namely Two-Phase Commit.
  http://www.postgresql.org/docs/8.3/static/sql-prepare-transaction.html

You'll find londiste documentation at both of the following places;
the latter is up to date with the last (2.1.6) release, the former I'm
not sure about.
  http://skytools.projects.postgresql.org/doc/
  http://pgsql.tapoueh.org/skytools/

Hope this helps, regards,

--
dim






[GENERAL] vacuum and cluster

2008-04-16 Thread Jimmy Choi
Hello,

Does running cluster remove the need to run vacuum?

I get a feeling that since cluster is already physically reordering
the rows, it may as well remove the dead rows... no?

My second question is, if vacuum is still needed, does it matter
whether I run vacuum first or cluster first?

Here's our current weekly db maintenance routine:

1. vacuum full
2. cluster
3. reindex
4. analyze

Thanks,
Jimmy



Re: [GENERAL] Storage sizes for dates/times (documentation bug?)

2008-04-16 Thread Karsten Hilbert
On Tue, Apr 15, 2008 at 03:31:46PM +0100, Sam Mason wrote:

 But I was under the impression that you didn't want any time zone
 information.
Wrong impression.

 You wanted to know that an appointment was at 3PM at
 the patient's local time,
... plus what does local time mean.

 attempting to correct this for the local time
 zone of any analyst is invalid.
Sure, there can be valid and invalid normalisations.

 I must be missing something then, can you explain why the original time
 zone matters?

a) I want to be able to display when a patient's appointment
   happened in local time.

b) I must be able to aggregate appointments from different
   time zones into a coherent EMR. For that I need to be able
   to map them onto, say, UTC.

Taken together this could be served nicely by a UTC-storing,
local-tz-remembering timestamp.

 If you actually hardcoded your timezone as
 GMT+6, or whatever, then yes it may be different.  But only if you went
 around at midnight March 31st, changing computers to be GMT+5
The machines do that by themselves.

 In some cases yes I'd agree, but I have a feeling the number of cases is
 surprisingly small in practice.
The sampling may not be that large but when the problem is
there it is painful.

Basically, akin to "there's no such thing as plain text",
there should be "there's no such thing as a timezone-less
timestamp".

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



Re: [GENERAL] Storage sizes for dates/times (documentation bug?)

2008-04-16 Thread Karsten Hilbert
On Tue, Apr 15, 2008 at 05:29:09PM +0200, Martijn van Oosterhout wrote:

  Perhaps I confuse this with some limitation of a previous
  implementation of the enum type. Also perhaps I was
  misguided into thinking tags cannot be modified by the
  don't delete from table of tags part.
 
 Oh, it means that if you DROP the type it will leave some stuff behind.
 You can of course handle *values* of that type just like any other value.
 The 'tag table' in this case would be the list of timezones. I'll see
 if I can clarify it.
That'd help, thanks.

  Agree. Another one is non-indexability which I'd truly need.
 
 Well, you can index them of course, but you need to indicate explicitly
 what you want to index: the timestamp or the timestamp shifted to the
 timezone.
Oh, OK.

 I felt the module couldn't make this decision on its own.
True enough.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346





Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Greg Sabino Mullane


 Try Bucardo (http://bucardo.org/); it may help you :).

 Doesn't bucardo handle conflicts with "whichever one I apply last
 wins"?  That doesn't seem safe in all situations (or any, IMO).

With Bucardo, conflict handling is chosen as a standard method
(of which one is "latest") or a custom handler. "Latest" has its
places, but the usual recommendation is to pick one of the servers
as the trusted source, or to write your own handler based on your
business logic.

http://www.bucardo.org/bucardo.html#BucardoConflictHandling

--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200804161143
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [GENERAL] vacuum and cluster

2008-04-16 Thread Craig Ringer

Jimmy Choi wrote:

Hello,

Does running cluster remove the need to run vacuum?


My understanding is that `CLUSTER' creates a new table file, then swaps 
it in to replace the old one.


http://www.postgresql.org/docs/8.3/static/sql-cluster.html

 During the cluster operation, a temporary copy of the table is created 
that contains the table data in the index order. Temporary copies of 
each index on the table are created as well. Therefore, you need free 
space on disk at least equal to the sum of the table size and the index 
sizes. 


It's not stated explicitly, but I'm pretty sure discussion here has 
mentioned that too. Given that, VACUUM FULL on a just-CLUSTERed table 
should be redundant.


The easy way to be sure is to use ANALYZE VERBOSE to examine the dead 
row counts etc before and after each operation.


--
Craig Ringer



[GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Vance Maverick
I want to create a table with the semantics of a log.  There may be
multiple concurrent writers, because log entries will be generated by
triggers.  And there will be multiple concurrent readers -- mostly
remote processes watching the log over time.  I'd like to guarantee that
each of those readers will see the same, complete sequence of changes.

For a first attempt, I created a table with a serial column (actually I
created the sequence ID separately, but that's a detail).  The readers
connect remotely, scan through the table in sequence order, and remember
the ID of the last row they read.  When they read again, they start from
after that ID -- the query is roughly

   SELECT * FROM logtable WHERE id > ? ORDER BY id

But there's a problem with this -- the rows may not be inserted into the
log in ID order.  For example, if two concurrent sessions get the
sequence IDs 3 and 4, the one with ID 4 might commit first.  If a reader
queries the table at that moment, it will see the sequence (1, 2, 4).
Later, if the other session commits, a log entry with ID 3 will be
added.  The IDs will now be (1, 2, 4, 3) -- but if the reader comes back
to poll the log again, it will miss entry 3.  In order not to miss it,
the reader would have to remember much more information than just a
high-water mark.  (I think it would have to remember the complete set of
IDs it had processed already, and then scan the whole table to find
entries not in the set.)

Is there an approach that will give the semantics I need?  In
particular, I'd like a reader that has already read up to a certain
point in the log to be able to restart at or near that same point, so it
wouldn't have to reread everything.

(I recognize that these log semantics are closer to those provided by
files, not database tables.  A file is an option I'll consider, but
obviously it lacks transactional semantics.)

Thanks,

Vance



Re: [GENERAL] vacuum and cluster

2008-04-16 Thread Alvaro Herrera
Craig Ringer wrote:

 It's not stated explicitly, but I'm pretty sure discussion here has  
 mentioned that too. Given that, VACUUM FULL on a just-CLUSTERed table  
 should be redundant.

It is, and a REINDEX is redundant too because CLUSTER does it
internally.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread brian

Vance Maverick wrote:

I want to create a table with the semantics of a log.  There may be
multiple concurrent writers, because log entries will be generated by
triggers.  And there will be multiple concurrent readers -- mostly
remote processes watching the log over time.  I'd like to guarantee that
each of those readers will see the same, complete sequence of changes.

For a first attempt, I created a table with a serial column (actually I
created the sequence ID separately, but that's a detail).  The readers
connect remotely, scan through the table in sequence order, and remember
the ID of the last row they read.  When they read again, they start from
after that ID -- the query is roughly

   SELECT * FROM logtable WHERE id > ? ORDER BY id

But there's a problem with this -- the rows may not be inserted into the
log in ID order.  For example, if two concurrent sessions get the
sequence IDs 3 and 4, the one with ID 4 might commit first.  If a reader
queries the table at that moment, it will see the sequence (1, 2, 4).
Later, if the other session commits, a log entry with ID 3 will be
added.  The IDs will now be (1, 2, 4, 3) -- but if the reader comes back
to poll the log again, it will miss entry 3.  In order not to miss it,
the reader would have to remember much more information than just a
high-water mark.  (I think it would have to remember the complete set of
IDs it had processed already, and then scan the whole table to find
entries not in the set.)

Is there an approach that will give the semantics I need?  In
particular, I'd like a reader that has already read up to a certain
point in the log to be able to restart at or near that same point, so it
wouldn't have to reread everything.

(I recognize that these log semantics are closer to those provided by
files, not database tables.  A file is an option I'll consider, but
obviously it lacks transactional semantics.)



Use a timestamp column also.



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

brian wrote:


Use a timestamp column also.


That's subject to the same issues, because a transaction's 
current_timestamp() is determined at transaction start. So, in a 
situation like this:



WRITER 1        WRITER 2        READER 1

BEGIN
                BEGIN
INSERT
                INSERT
                COMMIT
                                BEGIN
                                SELECT
                                COMMIT

then READER 1 will see the most recent timestamp as that inserted by 
WRITER 2, but it won't see the row inserted by WRITER 1 with an earlier 
timestamp.


I don't think it's even OK in the case of a single-statement INSERT 
(where the transaction is implicit) and/or with the use of 
clock_timestamp() ... though I'm less sure about that.


--
Craig Ringer



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rodrigo Gonzalez
Using star replication I have all the nodes as masters, so in 
master-master replication the common DB is master to all the nodes 
and all the nodes are masters to the common DB.


Alvaro Herrera wrote:

Julio Cesar Sánchez González wrote:

 On Wed, 2008-04-16 at 12:13 +0100, Rob Collins wrote:

  We're looking for an open-source database solution that has a Python
  interface and will do master-master replication. My client has a
  flawed MS SQL Server system, with the replication hand-coded in
  Python. They don't want to pay very high licence fees for the MS SQL
  Server Enterprise version at 20 sites across the UK.

  There is one central server with 19 branches. Some tables need to
  replicate from the central server to the branches.

 Try Bucardo (http://bucardo.org/); it may help you :).

Unfortunately, Bucardo only seems to work with 2 masters -- you can't
have 20.







Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Alvaro Herrera
Rodrigo Gonzalez wrote:
 Using star replication I have all the nodes as masters, so in  
 master-master replication the common DB is master to all the nodes  
 and all the nodes are masters to the common DB.

Oh, so the Bucardo docs are incorrect.

http://www.bucardo.org/bucardo.html#BucardoLimitations



-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rodrigo Gonzalez

Yes and no

bucardo cannot handle master-master-master

What I am doing is

masterA-masterB
masterA-masterC
masterA-masterD

That is something that bucardo can handle, and it allows me to replicate 
masterB-masterC, but everything goes through masterA always.


Alvaro Herrera wrote:

Rodrigo Gonzalez wrote:

Using star replication I have all the nodes as masters, so in  
master-master replication the common DB is master to all the nodes  
and all the nodes are masters to the common DB.



Oh, so the Bucardo docs are incorrect.

http://www.bucardo.org/bucardo.html#BucardoLimitations



  







Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Alvaro Herrera
Rodrigo Gonzalez wrote:
 Yes and no

 bucardo cannot handle master-master-master

 What I am doing is

 masterA-masterB
 masterA-masterC
 masterA-masterD

Oh, I see -- makes sense.  Quite misleading docs, then.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Rodrigo Gonzalez
Yes, basically there is something that is not there... makedelta is what 
allows me to do that. I took that information from the bucardo mailing list.


But the important thing is that it is possible, and maybe one day there will 
be a true grid multimaster replication system... right now I didn't find 
anything better that works with pgsql.


Alvaro Herrera wrote:

Rodrigo Gonzalez wrote:
  

Yes and no

bucardo cannot handle master-master-master

What I am doing is

masterA-masterB
masterA-masterC
masterA-masterD



Oh, I see -- makes sense.  Quite misleading docs, then.

  







Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Vance Maverick
Craig Ringer craig(at)postnewspapers(dot)com(dot)au wrote:
 brian wrote:

  Use a timestamp column also.

 That's subject to the same issues.
 [...]
 I don't think it's even OK in the case of a single-statement INSERT
 (where the transaction is implicit) and/or with the use of
 clock_timestamp() ... though I'm less sure about that.

No, you're right.  The problem is that the timestamp is chosen some time
before the commit succeeds.  So if there are concurrent writers
committing at the same time, the order of commit is determined by a
race, one that takes place after the timestamp values are set.

Another approach would be to queue the log entries in a staging table,
so that a single process could move them into the log.  This is fairly
heavyweight, but it would guarantee the consistent sequencing of the log
as seen by a reader (even if the order of entries in the log didn't
always reflect the true commit sequence in the staging table).  I'm
hoping someone knows a cleverer trick.

Vance
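
A minimal sketch of such a mover (table and column names hypothetical), run
periodically by the single process:

   BEGIN ISOLATION LEVEL SERIALIZABLE;
   -- log IDs are assigned at move time, in staging order,
   -- so readers never see an ID appear out of order
   INSERT INTO logtable (id, entry)
        SELECT nextval('logtable_id_seq'), entry
          FROM (SELECT entry FROM log_staging ORDER BY staged_at) AS s;
   -- both statements see the same snapshot under SERIALIZABLE,
   -- so this deletes exactly the rows copied above
   DELETE FROM log_staging;
   COMMIT;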



Re: [GENERAL] I need to encrypt one column to an output file

2008-04-16 Thread Ralph Smith

Ralph Smith wrote:


I need to do a simple query and output to a file.
No problem.
But how do I encrypt one column's output?
There are lots of developer-related links here, and info pointing to  
the /contrib pgcrypto module, but I'm not a PG developer and I failed to  
find any info on HOW TO USE that library function...

=
On Apr 15, 2008, at 11:31 PM, Richard Huxton wrote:



Have you installed the pgcrypto module?
Have you read the guide to pgcrypto in the manual?
 http://www.postgresql.org/docs/8.3/static/pgcrypto.html

--
 Richard Huxton
 Archonet Ltd

==
I'm still too dense to get it.  I'm sorry.

*  Where would I look to see if it is installed?
(I'm not the SA who did the install, nor do I have those privs.)

*  If not, must he go the source-install route (we're using Ubuntu),  
or use the Synaptic package installer (for PG v8.2)?


If it is there, I'm still too much of a newbie to find the text to tell  
me how to use it.
I don't want to be totally hand-held, but I need to find the HOW-TO  
guide(s).


*  What I want to do (right now) is simply encode usernames onto a  
hard copy.
*  What I will want to do in the future is to encode usernames INTO  
the DB.


Thanks again,

Ralph Smith
=



Re: [GENERAL] generate_series woes

2008-04-16 Thread Sam Mason
On Wed, Apr 16, 2008 at 09:01:10AM -0400, Merlin Moncure wrote:
 On Wed, Apr 16, 2008 at 8:37 AM, Volkan YAZICI [EMAIL PROTECTED] wrote:
  hubert depesz lubaczewski [EMAIL PROTECTED] writes:
   select i from generate_series(1,10) {hint:10} as i;

   i'm not proposing syntax. i'm suggesting the functionality.

  I'm strongly against such non-SQL-compliant solutions. I'd appreciate
  it if the hackers could solve the problem internally, without disturbing
  the SQL syntax.

 maybe -- just an idle thought -- the 'ROWS' clause of create function
 could be expanded to take a simple expression based on the input
 parameters.

In computer science terms, I think you mean that you want something
known as dependent types.  Full dependent types are probably overkill
for PG, but some very limited form would be a fun project.  The idea,
in general, is to move code into the type system that is run during
typechecking and used to prove that your code is doing the right thing.
Within PG, this seems to naturally (from my naive viewpoint) extend to
the case of providing hints (proofs would be the normal use) to the
planner about what's going to happen when the code is run.

This seems to imply that types couldn't be stored as OIDs any more
(you'd be creating and destroying lots while planning the query) so
would probably change the structure of the code rather significantly.


  Sam



Re: [GENERAL] Suggestion for psql command interpretation

2008-04-16 Thread Lincoln Yeoh

At 04:46 AM 4/16/2008, Colin Wetherbee wrote:

Tom Lane wrote:

Colin Wetherbee [EMAIL PROTECTED] writes:
I just thought I'd report it here in case it wasn't supposed to 
happen, but from what you say, it seems like it's a feature.
Well, it's more of a historical hangover.  Personally I'd not have 
much problem with breaking backward compatibility on this point.


I'm not generally in the habit of making typos, so this doesn't 
affect me too much.


However, IMHO and as I mentioned previously, I don't think 
\timimng should succeed.  I'll leave the rest of the discussion up 
to you and the other developers, though. :)


I too agree that \timimng should not work.

Maybe someone can think of an undesirable surprise that springs from 
this feature ;).


\i
\e

are possible candidates (probably others could clobber files given a typo).

I think requiring a space between command and arguments would be a 
good idea. And not doing stuff if there are syntax errors...


Link.





Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

Vance Maverick wrote:


Another approach would be to queue the log entries in a staging table,
so that a single process could move them into the log.  This is fairly
heavyweight, but it would guarantee the consistent sequencing of the log
as seen by a reader (even if the order of entries in the log didn't
always reflect the true commit sequence in the staging table).


The way I see it, one way or another you are going to have to serialize 
writers, otherwise you'll always be faced with commit-order races.


I do have one other idea, but it's not pretty. Write a C/Python/whatever 
procedural function, say send_log_record(), that uses an appropriate 
inter-process communication mechanism to send the log message to another 
process. Your writers use this function to generate log records. The 
outside process receiving log records has a single connection to the DB 
open and it is the only writer to the log table, thus avoiding the 
problems with commit races with multiple writers. Your C function is 
bypassing transactional isolation by communicating with another process 
that modifies the DB, and in the process eliminating the need to hold 
the whole transaction up to ensure predictable log write ordering. 
However, it *does* mean that you'll get a log entry even if the 
transaction then aborts. You might be able to get around that by doing 
your logging with a deferred trigger, but there's always a risk that a 
later deferred trigger will fail and abort the transaction.


Doing it via a staging table is a *lot* nicer, and a lot lighter weight, 
than having your logging code force serialization of all operations in 
transactions that could otherwise, other than the log ordering 
requirement, run concurrently. Say, using an id generation table that 
each transaction locks to ensure ordered ID generation and commits. It's 
also properly transactional, so you won't have any log records for 
aborted transactions.


It's a pity PostgreSQL's RETURNING extension appears not to support
INSERT INTO ... DELETE FROM ... RETURNING
because that'd make your log record mover a rather efficient one-liner.


The only other alternative I can think of is to have multiple writers 
inserting records into the same table the readers are reading from, but 
have the writers insert records with a timestamp field (say `visible') 
set to null.


A single helper (there must only ever be one) can then repeatedly run a 
command sequence like:


BEGIN;
UPDATE logtable
   SET visible = current_timestamp
 WHERE visible IS NULL;
COMMIT;

to ensure that records became visible in timestamp order. No race here; 
like in the staging table approach you're using a single write 
transaction on the table being used by the readers.


Readers would filter for records that have `visible > 
last_seen_visible', where last_seen_visible would be a literal, being 
the greatest value of `visible' seen in the last query for log records. 
They could trust that no records could ever be missed.
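
i.e. something like this sketch, with a literal standing in for the
remembered high-water mark:

   SELECT *
     FROM logtable
    WHERE visible > '2008-04-16 18:00:00+00'  -- greatest `visible' seen so far
    ORDER BY visible, id;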


Unfortunately, with this approach you're incurring the cost of a dead 
row for every UPDATE. You can avoid that with 8.3 (using HOT) only if 
you have no index on `visible' - but having no index means a sequential 
scan of the log table for every UPDATE making rows visible and for every 
SELECT looking for the latest rows. Ouch.


That's basically just another way to write your log staging approach, 
anyway. It stays within a single table but it's not otherwise much 
different. I haven't any idea whether it'd perform better or worse than 
using a separate log staging table.



If you really want to make somebody cry, I guess you could do it with 
dblink - connect back to your own database from dblink and use a short 
transaction to commit a log record, using table-based (rather than 
sequence) ID generation to ensure that records were inserted in ID 
order. That'd restrict the critical section in which your various 
transactions were unable to run concurrently to a much shorter period, 
but would result in a log message being saved even if the transaction 
later aborted. It'd also be eye-bleedingly horrible, to the point where 
even the "send a message from a C function" approach would be nicer.


--
Craig Ringer



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread brian

Craig Ringer wrote:

brian wrote:


Use a timestamp column also.


That's subject to the same issues, because a transaction's 
current_timestamp() is determined at transaction start. So, in a 
situation like this:



WRITER 1        WRITER 2        READER 1

BEGIN
                BEGIN
INSERT
                INSERT
                COMMIT
                                BEGIN
                                SELECT
                                COMMIT

then READER 1 will see the most recent timestamp as that inserted by 
WRITER 2, but it won't see the row inserted by WRITER 1 with an earlier 
timestamp.


I don't think it's even OK in the case of a single-statement INSERT 
(where the transaction is implicit) and/or with the use of 
clock_timestamp() ... though I'm less sure about that.




I don't mean to rely on *only* the timestamp, but for the reader to 
remember both the last ID and the timestamp for that particular 
transaction. When the next read occurs it should check to see if there's 
an earlier timestamp with a higher ID than that remembered. The database 
knows that WRITER 1 was there first. If it's important to the 
application then the reader will need to take some action to re-order 
things based on what it has already read, which it could do if it 
compared timestamps and ID order. It needn't keep the complete set of 
IDs in memory.


Wait--would WRITER 1 have the higher ID?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum and cluster

2008-04-16 Thread Jimmy Choi
Presumably, even if CLUSTER does reindexing internally, it only does
that for the index used for clustering. Since REINDEX includes all
indices, CLUSTER cannot truly replace REINDEX. Correct?

Jimmy

On Wed, Apr 16, 2008 at 12:06 PM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
 Craig Ringer escribió:


   It's not stated explicitly, but I'm pretty sure discussion here has
   mentioned that too. Given that, VACUUM FULL on a just-CLUSTERed table
   should be redundant.

  It is, and a REINDEX is redundant too because CLUSTER does it
  internally.

  --
  Alvaro Herrera                http://www.CommandPrompt.com/
  PostgreSQL Replication, Consulting, Custom Development, 24x7 support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Storage sizes for dates/times (documentation bug?)

2008-04-16 Thread Sam Mason
On Wed, Apr 16, 2008 at 05:09:56PM +0200, Karsten Hilbert wrote:
 On Tue, Apr 15, 2008 at 03:31:46PM +0100, Sam Mason wrote:
  But I was under the impression that you didn't want any time zone
  information.
 Wrong impression.

Doh, yes.

  I must be missing something then, can you explain why the original time
  zone matters?
 
 a) I want to be able to display when a patient's appointment
happened in local time.
 
 b) I must be able to aggregate appointments from different
time zones into a coherent EMR. For that I need to be able
to map them onto, say, UTC.

Hum, what's an EMR?

 Taken together this could be served nicely by a UTC-storing,
 local-tz-remembering timestamp.

Why not do:

  CREATE TYPE tstz AS ( ts TIMESTAMP WITH TIME ZONE, tz TEXT );

And use this instead?
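
For example (a sketch; the zone name is made up):

CREATE TABLE appointment (held_at tstz);

INSERT INTO appointment VALUES
  (ROW('2008-04-16 17:09:56+02', 'Europe/Berlin')::tstz);

-- render the appointment in its original local time:
SELECT (held_at).ts AT TIME ZONE (held_at).tz FROM appointment;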

  If you actually hardcoded your timezone as
  GMT+6, or whatever, then yes it may be different.  But only if you went
  around at midnight March 31st, changing computers to be GMT+5
 The machines do that by themselves.

What sort of machines do this?  With computers I've used, if its time
zone is set to the local time of some specific location then yes it
will.  If you set it to some specific offset then no it won't.  These
are independent cases, and not the one I was drawing your attention to
above.  These cases are also independent of the original problem as
well.

If it's adjusting for local time, then it'll know when to apply DST
offsets.  I don't think a timestamp should ever change just because
you're looking at it from different DST values.  I think this is why
TIMEs are somewhat awkward beasts; I've tried to stay away from them
because I can't build a consistent model of how they should function.
If TIME
values are only used to store values and not to perform any calculations
on then I see some utility.

  In some cases yes I'd agree, but I have a feeling the number of cases is
  surprisingly small in practise.
 The sampling may not be that large but when the problem is
 there it is painful.

Yes, I'm sure it is!

 Basically, akin to "there's no such thing as plain text"
 there should be "there's no such thing as a timezone-less
 timestamp".

Or maybe, a programming language should allow you to define your own
abstractions if the defaults don't fit.


  Sam

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum and cluster

2008-04-16 Thread Alvaro Herrera
Jimmy Choi escribió:
 Presumably, even if CLUSTER does reindexing internally, it only does
 that for the index used for clustering. Since REINDEX includes all
 indices, CLUSTER cannot truly replace REINDEX. Correct?

No.  Cluster rewrites all indexes (otherwise their entries would end up
pointing to incorrect places in the heap).

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread David Wilson
(I originally missed replying to all here; sorry about the duplicate,
Vance, but figured others might be interested.)

On Wed, Apr 16, 2008 at 1:55 PM, Vance Maverick [EMAIL PROTECTED] wrote:

  Another approach would be to queue the log entries in a staging table,
  so that a single process could move them into the log.  This is fairly
  heavyweight, but it would guarantee the consistent sequencing of the log
  as seen by a reader (even if the order of entries in the log didn't
  always reflect the true commit sequence in the staging table).  I'm
  hoping someone knows a cleverer trick.


Consider a loop like the following

advisory lock staging table
if (entries in table)
   copy entries to main log table as a single transaction
release advisory lock on staging table
read out and handle most recent log entries from main table

The advisory lock is automatically released on client disconnect, and
doing the whole thing within one transaction should prevent any
partial-copies on failures.

It doesn't matter that there are concurrent inserts to the staging
table because the staging table is always wiped all at once and
transferred in a synchronous fashion to the main table. You also can't
lose data, because it's always in one of the two tables.
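
A rough SQL version of that loop body (table names made up; the advisory
lock functions need 8.2 or later):

SELECT pg_advisory_lock(42);           -- any agreed-upon key

BEGIN ISOLATION LEVEL SERIALIZABLE;    -- both statements share a snapshot,
INSERT INTO logtable                   -- so the DELETE removes exactly the
     SELECT * FROM log_staging;        -- rows that were copied
DELETE FROM log_staging;
COMMIT;

SELECT pg_advisory_unlock(42);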

-- 
- David T. Wilson
[EMAIL PROTECTED]

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Storage sizes for dates/times (documentation bug?)

2008-04-16 Thread Karsten Hilbert
On Wed, Apr 16, 2008 at 08:21:15PM +0100, Sam Mason wrote:

 Hum, what's an EMR?

Sorry, Electronic Medical Record.

 Why not do:
 
   CREATE TYPE tstz AS ( ts TIMESTAMP WITH TIME ZONE, tz TEXT );
 
 And use this instead?
That should work. At the time (a couple of years ago) I
wasn't aware of all the implications. Indexability, operator
availability, computability ...  I'm still not sure I'd know
all the pitfalls.

 What sort of machines do this?  With computers I've used, if its time
 zone is set to the local time of some specific location then yes it
 will.  If you set it to some specific offset then no it won't.  These
 are independant cases, and not the one I was drawing your attention to
 above.  These cases are also independant of the original problem as
 well.
All true. I misunderstood what you said.

  Basically, akin to "there's no such thing as plain text"
  there should be "there's no such thing as a timezone-less
  timestamp".
 Or maybe, a programming language should allow you to define your own
 abstractions if the defaults don't fit.
Surely so, and both Python and PostgreSQL have been very
helpful in this regard.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] ALTER TABLE DDL Triggers?

2008-04-16 Thread Richard Broersma
I don't believe that DDL Triggers exist, correct?  Actually I am
really curious about what options exist to automatically keep table
definitions in sync (for two tables) over time: for example a table and
history table pair.

I realise that History-Entity-Attribute-Value tables don't need this
kind of maintenance, but I am depending on the ability to frequently
query the history tables for the previous values of a particular
record.  I don't believe that History-EAV tables can do this.

-- 
Regards,
Richard Broersma Jr.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Curiosity with catalog table array attribute index dimensions

2008-04-16 Thread Erik Jones
So, I noticed today that pg_index.indkey index values start at 0  
instead of 1 as I'd expected.  Are there other cases of this?   
Shouldn't that be documented somewhere?


Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to recovery data from folder data installation?

2008-04-16 Thread Ron Mayer

Magnus Hagander wrote:

Craig Ringer wrote:

with a version of PostgreSQL with the same minor version as the one
you were using on the server, eg if you were using 8.1.4 you should
get the latest PostgreSQL in the 8.1 series (NOT 8.2 or 8.3) to try
to read the data.


What you mean here is of course that you should be using the same
*major* version, but the latest available *minor* version.

Other than that, it's correct - and the version numbers in the example
is correct :-)


If we want to stop people from being confused about this terminology,
we would need to make our definitions of these terms more visible
on the main web site.

Many places talk about the "latest release" - but unfortunately
with differing definitions.  For example, "latest release" is
  defined as 8.3 in the english language FAQ;
  defined as 8.3.1 and 8.2.7, etc on the english language home page;
  defined as 8.2.3 on the traditional chinese FAQ, etc.

Some places (the english FAQ) talk about major releases happening every
year and minor releases happening every few months, but without defining
what that means to the numbering.

Perhaps there should be a FAQ that says that our numbering system
defines the digits as marketing.major.minor.

It's no surprise to me that people don't use those terms consistently.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to recovery data from folder data installation?

2008-04-16 Thread Bruce Momjian
Ron Mayer wrote:
 Magnus Hagander wrote:
  Craig Ringer wrote:
  with a version of PostgreSQL with the same minor version as the one
  you were using on the server, eg if you were using 8.1.4 you should
  get the latest PostgreSQL in the 8.1 series (NOT 8.2 or 8.3) to try
  to read the data.
  
  What you mean here is of course that you should be using the same
  *major* version, but the latest available *minor* version.
  
  Other than that, it's correct - and the version numbers in the example
  is correct :-)
 
 If we want to stop people from being confused about this terminology,
 we would need to make our definitions of these terms more visible
 on the main web site.
 
 Many places talk about the "latest release" - but unfortunately
 with differing definitions.  For example, "latest release" is
defined as 8.3 in the english language FAQ;

FAQ updated.

defined as 8.3.1 and 8.2.7, etc on the english language home page;
defined as 8.2.3 on the traditional chinese FAQ, etc.
 
 Some places (the english FAQ) talk about major releases happening every
 year and minor releases happening every few months, but without defining
 what that means to the numbering.
 
 Perhaps there should be a FAQ that says that our numbering system
 defines the digits as marketing.major.minor.

That is in our documentation.

-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Curiosity with catalog table array attribute index dimensions

2008-04-16 Thread Tom Lane
Erik Jones [EMAIL PROTECTED] writes:
 So, I noticed today that pg_index.indkey index values start at 0  
 instead of 1 as I'd expected.  Are there other cases of this?   
 Shouldn't that be documented somewhere?

Well, the CREATE TYPE manual page has

For historical reasons (i.e., this is clearly wrong but it's far too
late to change it), subscripting of fixed-length array types starts from
zero, rather than from one as for variable-length arrays.

oidvector and int2vector aren't fixed-length anymore, but they maintain
their old behavior in this respect for backward compatibility.
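
For instance:

SELECT indkey[0] FROM pg_index LIMIT 1;   -- int2vector: first element is [0]
SELECT (ARRAY[10,20,30])[1];              -- ordinary array: first element is [1]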

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Curiosity with catalog table array attribute index dimensions

2008-04-16 Thread Erik Jones

On Apr 16, 2008, at 4:26 PM, Tom Lane wrote:

Erik Jones [EMAIL PROTECTED] writes:

So, I noticed today that pg_index.indkey index values start at 0
instead of 1 as I'd expected.  Are there other cases of this?
Shouldn't that be documented somewhere?


Well, the CREATE TYPE manual page has

   For historical reasons (i.e., this is clearly wrong but it's far too
   late to change it), subscripting of fixed-length array types starts from
   zero, rather than from one as for variable-length arrays.

oidvector and int2vector aren't fixed-length anymore, but they maintain
their old behavior in this respect for backward compatibility.


Ah, thanks, at least now I know what to look for instead of using  
trial and error.


Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ALTER TABLE DDL Triggers?

2008-04-16 Thread Chris Browne
[EMAIL PROTECTED] (Richard Broersma) writes:
 I don't believe that DDL Triggers exist, correct?  

That is correct.[1]

The usual point is that you cannot attach triggers to pg_catalog
tables, which would be the obvious way of trying to notice DDL
changes.  (e.g. - by having triggers that would fire when pg_class or
pg_attribute are changed)
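
For example, something along these lines is rejected outright
(note_ddl_change() being a hypothetical trigger function):

CREATE TRIGGER catalog_watch
    AFTER INSERT OR UPDATE OR DELETE ON pg_class
    FOR EACH ROW EXECUTE PROCEDURE note_ddl_change();
-- fails: pg_class is a system catalog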

It would presumably not be _completely_ implausible to run a trigger
when a table was ALTERed; the trouble would be in evaluating the
semantics of what OLD.* and NEW.* ought to contain, _if anything_.

If you took the [1] TRUNCATE approach, there isn't _anything_ (in
terms of OLD.*/NEW.* values) passed to the function; in that case, it
is sufficient to have something (like a function parameter) to
identify the table, and thus pass that info downstream (e.g. - with
replication, passing the TRUNCATE on to downstream nodes).  TRUNCATE
is pretty simple; we know well that all it's doing is to get rid of
all the contents of the table at a point in time.

In contrast, the effect of ALTER TABLE is to make near-arbitrary
alterations to pg_class, pg_attribute, and such, and there is, on the
one hand, no obvious semantic of what data to even imagine passing on,
and, on the other, a grand problem of reconstructing the change if you
*did* have access to those underlying tables.  That's usually where
the discussion peters out when people propose DDL triggers.

[1] Or about 99% so.  There is a change committed for 8.4 where
TRUNCATE can fire a trigger.  But it's somewhat disputable whether
TRUNCATE should properly be considered DDL or not.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/advocacy.html
When aiming for the common denominator, be prepared for the occasional
division by zero.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ALTER TABLE DDL Triggers?

2008-04-16 Thread Richard Broersma
On Wed, Apr 16, 2008 at 2:49 PM, Chris Browne [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Richard Broersma) writes:
  I don't believe that DDL Triggers exist, correct?

 That is correct.[1]

 The usual point is that you cannot attach triggers to pg_catalog
 tables, which would be the obvious way of trying to notice DDL
 changes.  (e.g. - by having triggers that would fire when pg_class or
 pg_attribute are changed)

Thanks for taking the time to explain the situation.
-- 
Regards,
Richard Broersma Jr.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Master-master replication with PostgreSQL

2008-04-16 Thread Greg Smith

On Wed, 16 Apr 2008, Rob Collins wrote:


My client has a flawed MS SQL Server system


There's another kind?

From what I've read, Slony-I does only master-slave replication and 
Slony-II is not being actively developed. Is this right? Are there any 
viable master-master replication tools for PostgreSQL?


Your later comments suggest you specifically need an asynchronous 
multi-master system, which rules out several suggestions.


I've been collecting information on this topic at 
http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling 
which you might find useful.  Nothing jumps out at me as being more 
appropriate for the situation you describe than Bucardo.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

brian wrote:

I don't mean to rely on *only* the timestamp, but for the reader to 
remember both the last ID and the timestamp for that particular 
transaction. When the next read occurs it should check to see if there's 
an earlier timestamp with a higher ID than that remembered.


[snip]


Wait--would WRITER 1 have the higher ID?


No, it'll have a lower id in this case because it calls 
nextval('sequence_name') first. Writer 1 would have a lower id, and a 
lower timestamp (because its transaction began first) even if it 
committed later. Using clock_timestamp() in the insert will not help, 
because the first transaction to insert (as in this case) is not 
necessarily the first to commit.


If a reader sees a given id and timestamp, that doesn't meant there 
aren't transactions with lower ids, and lower timestamps, still 
uncomitted or in the process of committing.


What you want is a timestamp that's generated at commit time with a 
guarantee that no later commits will have equal or lower timestamps. As 
far as I know (I'm *VERY* far from the authority here) there's no way to 
achieve that, so you have to serialize your commits to the table somehow.


--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Binary bytea to literal strings

2008-04-16 Thread Dragan Zubac

Hello

Got one table:

 Column  |  Type   |   Modifiers
---------+---------+-------------
 message | bytea   |

insert data like for example:

insert into zub (message) values (E'\004\065 Ciao');

is it possible to somehow get the following:

0x040x35 Ciao

Sincerely

Dragan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
 That's subject to the same issues, because a transaction's 
 current_timestamp() is determined at transaction start. 

But clock_timestamp() (and its ancestors in Postgres) don't have that
restriction.  I dunno that it's enough for you, though, since you have
visibility issues as well.  You seem to want both the benefits of files and
relational database transactions, and I don't think you can really have both
at once without paying in reader complication.

One way I can think of doing it is to write a seen_log that notes what the
client has already seen with a timestamp of (say) 1 minute.  Then you can
say "go forward from this time excluding ids (<ids here>)".

A


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Binary bytea to literal strings

2008-04-16 Thread Craig Ringer
Dragan Zubac wrote:
 Hello
 
 Got one table:
 
  Column  |  Type   |   Modifiers
 ---------+---------+-------------
  message | bytea   |
 
 insert data like for example:
 
 insert into zub (message) values (E'\004\065 Ciao');
 
 is it possible to somehow get the following:
 
 0x040x35 Ciao

Not without using a function that knows how your data is structured.
\065 is, in the ASCII encoding, the digit '5'. The only function that
might've done what you wanted would be something that interpreted the
bytea as an extended ASCII string and printed hex escapes for characters
not in the ASCII printable range. That won't work if some of the values
 you wish to escape are printable.
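
A rough sketch of such a function anyway (printable ASCII passes
through, everything else becomes 0xNN; note that it renders the \065 in
the example above as '5', not 0x35 - precisely the caveat):

CREATE OR REPLACE FUNCTION bytea_hex_escape(b bytea) RETURNS text AS $$
DECLARE
    result text := '';
    c int;
BEGIN
    FOR i IN 0 .. length(b) - 1 LOOP
        c := get_byte(b, i);
        IF c BETWEEN 32 AND 126 THEN    -- printable ASCII range
            result := result || chr(c);
        ELSE
            result := result || '0x' || lpad(to_hex(c), 2, '0');
        END IF;
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- SELECT bytea_hex_escape(message) FROM zub;  yields '0x045 Ciao'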

Personally, as you appear to want to treat the two parts of the message
differently, I'd split the value up into a text and bytea part and store
it in those two separate fields. I'd then provide a formatting function
that took the two parts as arguments and returned a string where the
first part is represented as hex escapes and the second part as literal
text.

For how to get integer values for characters, hex-escape bytea values,
etc, see:

http://www.postgresql.org/docs/current/static//functions-binarystring.html

I'd probably wrap it all up in a composite type with CREATE TYPE, but
you may not want to bother with that - it's a bit of hassle creating the
operator class, ordering functions, etc.



On a side note, is there any way I can get Pg to automatically generate
an operator class and set of comparison operators for a composite type?

I've run into a few situations where it'd be nice to do things like
compare two rows element by element, first to last, or create an index
on a simple composite type that's naturally sorted by
(first, second, ...). Most cases are sorted out easily enough by
wrapping the records in a row constructor, eg

ROW(rec1.*) < ROW(rec2.*)

but it seems like it'd make sense to be able to CREATE TYPE to get a
named composite type with the same operator behavior as is implicit in
ROW(...) comparisons. What's trivial with a row constructor requires the
writing of an operator class and a bunch of comparison functions to do
with a named composite type.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer
Andrew Sullivan wrote:
 On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
 That's subject to the same issues, because a transaction's 
 current_timestamp() is determined at transaction start. 
 
 But clock_timestamp() (and its ancestors in Postgres) don't have that
 restriction.

True, as I noted later, but it doesn't help. AFAIK you can't guarantee
that multiple concurrent INSERTs will be committed in the same order
that their clock_timestamp() calls were evaluated.

Consequently, you can still have a situation where a record with a lower
timestamp becomes visible to readers after a record with a higher
timestamp has, and after the reader has already recorded the higher
timestamp as their cutoff.

 I dunno that it's enough for you, though, since you have
 visibility issues as well.  You seem to want both the benefits of files and
 relational database transactions, and I don't think you can really have both
 at once without paying in reader complication.

Or writer complication. In the end, the idea that using a file based log
wouldn't have this problem is based on the implicit assumption that the
file based logging mechanism would provide some sort of serialization of
writes.

As POSIX requires the write() call to be thread safe, write() would be
doing its own internal locking (or doing clever lock-free queueing etc)
to ensure writes are serialized.

However, at least in Linux fairly recently, writes aren't serialized, so
you have to do it yourself. See:

http://lwn.net/Articles/180387/

In any case, logging to a file with some sort of writer serialization
isn't significantly different to logging to a database table outside
your transaction using some sort of writer serialization. Both
mechanisms must serialize writers to work. Both mechanisms must operate
outside the transactional rules of the transaction invoking the logging
operation; otherwise every transaction that writes to the log would be
serialized on the log itself.

 One way I can think of doing it is to write a seen_log that notes what the
 client has already seen with a timestamp of (say) 1 minute.  Then you can
 say go forward from this time excluding ids (ids here).

It won't work with multiple concurrent writers. There is no guarantee
that an INSERT with a timestamp older than the one you just saw isn't
waiting to commit.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
 It won't work with multiple concurrent writers. There is no guarantee
 that an INSERT with a timestamp older than the one you just saw isn't
 waiting to commit.

This is pretty unlikely -- I won't say impossible, because I'm sure there's
some kernel-level race condition -- if you use the clock time approach and
SERIALIZABLE mode.  You could add a trigger that checks for other timestamps
> yours, I suppose.  Of course, that's pretty heavyweight, too.  How much is
the absolute serialization worth to you in performance?

The only other thing I can suggest is what someone else did: commit them
with wallclock timestamps, and then have a different thread wake up every
_n_ seconds and put the records into the proper table in timestamp order.

A

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
Oh, one other thing

On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
 
  One way I can think of doing it is to write a seen_log that notes what the
  client has already seen with a timestamp of (say) 1 minute.  Then you can
  say "go forward from this time excluding ids (<ids here>)".
 
 It won't work with multiple concurrent writers. There is no guarantee
 that an INSERT with a timestamp older than the one you just saw isn't
 waiting to commit.

Yeah, I spoke imprecisely.  The idea is, "Start at timestamp _t_, but don't
re-process these ones, which I've seen."  The trick is to set your start _t_
far enough back in time that it's incompatible with your business logic that
anything could still be pending from then.  This is nasty and prone to bugs,
but it can be coded up.
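
Something like this, say (with hypothetical log/seen_log tables):

SELECT l.*
  FROM log l
 WHERE l.ts > now() - interval '1 minute'
   AND NOT EXISTS (SELECT 1 FROM seen_log s WHERE s.id = l.id);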

A

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general