Re: [GENERAL] In the belly of the beast (MySQLCon)

2008-04-20 Thread Dawid Kuroczko
On Fri, Apr 18, 2008 at 9:04 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 On Fri, 18 Apr 2008 14:59:34 -0400
  Alvaro Herrera [EMAIL PROTECTED] wrote:

   I find it pretty unlikely that Slony would be unable to help you
   upgrade here.  Obviously you can upgrade one database at a time.
  
   Also, mostly it's not the database size what's a concern, but rather
   the size of the largest table.

  As I recall (I could be wrong) Slony syncs the whole set as a single
  transaction. So if he has his entire database as a set he may
  have a problem regardless of the largest or smallest table. I would also
  agree that 30 million rows is likely not a problem but he should still
  check his velocity.

Well, you can sync one table at a time (create a (temporary) set with
one table, subscribe it, SYNC, MERGE SET, rinse, lather, repeat). :)

   Regards,
   Dawid



[GENERAL] Which Python library - psycopg2 or pygresql?

2008-04-15 Thread Dawid Kuroczko
So I thought, let's learn a bit of Python, and I stumbled upon
a choice of these two libraries.  Which would you suggest?
How do they differ?

By the looks of descriptions I am slightly inclined towards
psycopg2, but I would feel better if I talked with people
who actually used these libraries.

   Regards,
 Dawid

PS: I don't want to start a flame war!  I just feel I need a bit
of knowledge-push to get me going. ;-)



Re: [GENERAL] Postgres on shared network drive

2008-04-12 Thread Dawid Kuroczko
On Fri, Apr 11, 2008 at 2:54 PM, Craig Ringer
[EMAIL PROTECTED] wrote:
 Pavan Deolasee wrote:
  I wonder if it would make sense to add support to mount database in
  *read-only* mode from multiple servers though. I am thinking about
  data warehouse kind of operations where multiple servers can be
  used answer read-only queries. Is there a use case for such applications
  in real world ?

Not quite workable.  Remember that table data is not always available on
the block device -- there are pages modified in the buffer cache (shared
memory), and other machines have no access to each other's shared memory
(and it would be a lot of work to share it efficiently).  Remember also about
MVCC -- if your read-only copy machine starts a complicated query on
some big_table, and in the meantime the read-write machine decides that
big_table's pages can be reused... well, your read-only machine doesn't
even have a way of knowing it's returning garbage data. ;-)

Now, if you really, really want a read-only copy of the read-write data
available over the network, many NAS/SAN devices will allow you to
make a snapshot of the database -- and you can use that snapshot as
a read-only copy of the database.  But then again, if you want a read-only
copy of a days/weeks old database, there are cheaper and better ways of
doing it.

  I'm not sure that makes sense myself. The reason you 'd want multiple read
 only instances is for performance and/or redundancy. Neither of those goals
 are well served by having a shared data store.

  A shared data store won't help performance much because both instances will
 compete for I/O bandwidth. It might be faster if most of the regularly used
 data and indexes fit in memory on the host, but even then I'd personally be
 surprised if the cost of the network/shared storage didn't counteract that
 at least in part.

That is assuming your bottleneck is the I/O subsystem.  If your data fits nicely
in RAM, but you are CPU bound, it is sometimes more sensible to have two
machines than one machine twice as powerful.  It is also easier to
add a third machine later than to buy a yet more powerful one.
But this is a field where YMMV.

A known implementation of such a set up would be Oracle RAC, where
you have a shared storage and N machines using it.

  For redundancy, you ideally want to avoid shared infrastructure that can
 fail - like shared storage. It's likely to be better to keep separate copies
 of the data store on each host.

  There are systems - like Slony-I and log shipping replication - that can
 keep servers in sync without shared storage, and are almost certainly more
 useful than shared-storage DB servers.

  What I do think would be very interesting would be the ability to have a DB
 serving read-only queries while still reading in shipped WAL archives as
 they arrive. That'd be a potential big win on performance because each DB
 server could have its own I/O system, CPUs and RAM . With some trickyness
 you could even forward queries that did require writes to the master server
 transparently, while servicing read only queries locally.

Something like pgpool (which can forward read-write queries to a master, and
handle selects on a pool of read-only machines).

While I think pgpool, pgbouncer and Slony-I are great pieces of software,
I would like to wake up one day and know that PostgreSQL can do it all
internally, under the hood, just like it does WAL-logging and startup
recovery automatically. ;-)

   Regards,
   Dawid



Re: [GENERAL] Postgres on shared network drive

2008-04-12 Thread Dawid Kuroczko
On Sat, Apr 12, 2008 at 8:11 PM, Pavan Deolasee
[EMAIL PROTECTED] wrote:
 On Sat, Apr 12, 2008 at 11:00 PM, Dawid Kuroczko [EMAIL PROTECTED] wrote:
Not quite workable.  Remember that table data is not always available on
the block device -- there are pages modified in the buffer cache (shared
memory), and other machines have no access to the other's shared memory
(and it would be a lot of work to do it efficiently).  Remember also 
 about the
MVCC -- if your read only copy machine starts a complicated query on
some big_table, and in the meanwhile read-write machine decides the
big_table's pages can be reused... well your read-only machine doesn't
even have a way of knowing its returning garbage data. ;-)
  I am not suggesting one read-write and many read-only architecture. I am
  rather suggesting all read-only systems. I would be interested in this
  setup if I run large read-only queries on historical data and need easy
  scalability. With read-only setup, you can easily add another machine to
  increase computing power. Also, we may come up with cache-sharing
  systems so that if a buffer is cached on some other node, that can
  be transfered on a high speed interconnect, rather than reading from a
  relatively slower disk.

For example, it could be done by adding the ability to start a cleanly
shut down database in read-only mode.  I would see it as a very helpful
companion for a PITR recovery.

So in recovery.conf you could say you want recovery done until '10:00',
then do some read-only queries, decide you want to recover until '10:15',
and so on, until you find the place where someone made a big mistake.
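
For the point-in-time part, recovery.conf can already express this; a
minimal sketch (the archive path is just an assumption):

restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2008-04-12 10:00:00'
# recovery_target_inclusive = false   # stop just before the target time

The read-only-while-recovered part is the missing piece.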

Hmm, it would also be helpful to ask the recovery process to create
'write before' logs, i.e. logs which would allow you to 'roll back' the whole
recovery of the database to '10:05', should you decide '10:15' is too late.

Possible TODO entry? ;-)

[...]
  Yes. I was mostly assuming read-only scalability. What are the other
  better ways to do so ?

I was thinking you were saying...  Let's drop the issue, I misunderstood. :-)

Regards,
Dawid



Re: [GENERAL] SQL injection, php and queueing multiple statement

2008-04-12 Thread Dawid Kuroczko
On Fri, Apr 11, 2008 at 9:21 PM, Ivan Sergio Borgonovo
[EMAIL PROTECTED] wrote:
 Is there a switch (php side or pg side) to avoid things like:

  pg_query(select id from table1 where a=$i);

  into becoming

  pg_query(select id from table1 where a=1 and 1=1; do something
  nasty; -- );

  So that every
  pg_query(...) can contain no more than one statement?

Well, use prepared statements.
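
At the SQL level the idea is sketched below (in PHP you would reach for
pg_query_params() or a similar parameterized API rather than building the
query string yourself; the prepared-statement name is made up):

PREPARE get_row (int) AS
    SELECT id FROM table1 WHERE a = $1;
EXECUTE get_row(1);   -- the value is passed as data, never parsed as SQL
DEALLOCATE get_row;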

Apart from that, make it impossible to do something nasty.  Your
php_db_user should be allowed as little as possible.  Specifically:
 * she should not be the owner of the tables/other objects -- this way you are
safe from nasty DROP TABLEs and the like.
 * you should decide where she is allowed to INSERT/UPDATE/DELETE; the latter
two are the most dangerous ones.
 * you should make use of referential integrity constraints -- so an evil
DELETE or UPDATE will probably fail on them. ;)
 * you should provide PL/pgSQL stored procedures to update your vital
data, so an evil bulk delete/update will be harder to accomplish (if your
evildoer can craft an exploit to do it, he probably already has a lot of
access to your system ;)).

...oh, and think about isolating read-only access (a read-only user) from
the read-write user -- if that sounds reasonable to do.
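
A minimal sketch of that kind of least-privilege setup (php_db_user and
the set_a() function are hypothetical names; table1 with columns id and a
comes from your example):

REVOKE ALL ON table1 FROM php_db_user;
GRANT SELECT, INSERT ON table1 TO php_db_user;   -- no UPDATE/DELETE
-- route the dangerous updates through a function owned by the table owner
CREATE OR REPLACE FUNCTION set_a(p_id int, p_a int) RETURNS void AS $$
BEGIN
    UPDATE table1 SET a = p_a WHERE id = p_id;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
GRANT EXECUTE ON FUNCTION set_a(int, int) TO php_db_user;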

   Regards,
  Dawid



Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-28 Thread Dawid Kuroczko
On Fri, Mar 28, 2008 at 3:41 PM, Tomasz Ostrowski
[EMAIL PROTECTED] wrote:
 On 2008-03-28 02:00, Andrej Ricnik-Bay wrote:
   On 28/03/2008, Dawid Kuroczko [EMAIL PROTECTED] wrote:
Agree, except I would prefer pg instead of pgc.
  

  And it's been taken for about 35 years by a Unix command called page.
   From its man-page.

 pg - browse pagewise through text files

  So maybe pctl, consistent with psql.

  It is short enough, does not need shift and does not confuse, if man
  knows that it has something to do with Postgres. It looks it is yet not
  taken.

I like it.  Personally pctl feels better than pgc :-)

   Regards,
   Dawid



Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-27 Thread Dawid Kuroczko
  1) What type of names do you prefer?
1 b
  2) How often do you use these tools?
2 c
  3) What name of initdb do you prefer?
3 e (pg_createcluster by Debian), then d or b
  4) How do you perform VACUUM?
4 c b (autovac & sql vacuum)

   Regards,
 Dawid



Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-27 Thread Dawid Kuroczko
On Wed, Mar 26, 2008 at 6:46 PM, Ron Mayer
[EMAIL PROTECTED] wrote:
 Zdeněk Kotala wrote:

   1) What type of names do you prefer?

  I'd prefer a pg program that took as arguments
  the command.  So you'd have pg createdb instead
  of pg_createdb.

  There are many precedents. cvs update, git pull
  apt-get install.

  Anyone else like this approach?

+10

I like the pg command approach the best!  Clean, simple, short and powerful.

Erase my previous vote. ;)  I prefer:
pg createdb

  Regards,
 Dawid



Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-27 Thread Dawid Kuroczko
On Thu, Mar 27, 2008 at 11:49 PM, Greg Smith [EMAIL PROTECTED] wrote:
 On Thu, 27 Mar 2008, Leif B. Kristensen wrote:

   I figure something like the more or less standard options for modern
   *nixes, with short and long options like eg.
   pgc -C, --createdb ...

  The idea thrown out was to use something like the CVS/svn model where a
  single command gets called followed by either the name of the subcommand
  or a very short abbreviation for it.  Here's a first cut of how I would
  translate the current names, with the things I use more given the shorter
  abbreviations in cases where there's some overlap in characters:

  pgc cluster
[...]

Agree, except I would prefer pg instead of pgc.

Why?
When I see pgc I am not sure what the command is for -- it looks like
a short form for pg create something, or maybe an alias for pg_ctl, which
I know is for starting and stopping the database (and not for creating users ;)).

With pg I am sure that the command is generic to the extreme, so I don't
have to guess what the c stands for.  Control?  Create?  Client?  Or Command?

Also, it's about 33% shorter. ;-)

   Regards,
 Dawid

PS: And it feels more natural to me to say pg createuser than pgc create user,
but that's solely my typing impression.



Re: [GENERAL] Feature request/suggestion - CREATE SCHEMA LIKE

2008-03-18 Thread Dawid Kuroczko
On Mon, Mar 17, 2008 at 9:01 PM, wstrzalka [EMAIL PROTECTED] wrote:
 Hi

Features like CREATE DATABASE WITH TEMPLATE or CREATE TABLE LIKE
  are very usefull but it would be great to have such a feature on the
  mid-level too. I mean something CREATE SCHEMA LIKE that would copy all
  the template schema relations, etc...
   What do you think about it ? Would it be hard to implement ? Is it
  worth the effort ?

I think it is a bit too complicated for the backend -- you
have to copy functions, views and types along with the tables.
And most importantly -- their dependencies (for the order in
which to create them).  Chances are that user-defined
functions won't work in the new schema.  Tricky, to say the least.
Perhaps pg_dump -s with an option to rename the schema
would be a better option to consider (sed(1) is a good friend,
but IMHO an explicit option would be much better).

If you insist on putting it in the database -- a PL/pgSQL
function would be the best approach IMHO, something along the lines of:

CREATE OR REPLACE FUNCTION create_schema_like(old_name name, new_name name)
RETURNS void AS $$
DECLARE
    rel_name name;
    old_schema text;
    new_schema text;
    ddl text;
    path text;
BEGIN
    path := current_setting('search_path');
    old_schema := quote_ident(old_name);
    new_schema := quote_ident(new_name);

    EXECUTE 'CREATE SCHEMA '||new_schema;
    FOR rel_name IN SELECT tablename FROM pg_tables
            WHERE schemaname = old_name LOOP
        ddl := 'CREATE TABLE '||new_schema||'.'||quote_ident(rel_name)
            ||' (LIKE '||old_schema||'.'||quote_ident(rel_name)
            ||' INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES)';
        EXECUTE ddl;
    END LOOP;
    -- If we set search_path to the old schema, view definitions will have
    -- schema names from other schemas prepended where necessary.
    EXECUTE 'SET LOCAL search_path TO '||old_schema;
    FOR rel_name, ddl IN SELECT viewname, definition FROM pg_views
            WHERE schemaname = old_name LOOP
        EXECUTE 'SET LOCAL search_path TO '||new_schema;
        ddl := 'CREATE VIEW '||quote_ident(rel_name)||' AS '||ddl;
        EXECUTE ddl;
    END LOOP;

    EXECUTE 'SET LOCAL search_path TO '||path;
    RETURN;
END;
$$ LANGUAGE plpgsql STRICT;
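
Usage would then be something like (schema names are just an example):

SELECT create_schema_like('public', 'public_copy');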

Of course you need also to:
 * copy functions, types, etc, etc.
 * pray that dependencies are met or get acquainted with pg_depend :)
 * take care of ownerships, ACLs and tablespaces

In my opinion this is way too complicated to put it inside the backend.
It is mostly already inside pg_dump, so either pg_dump|sed|psql or
TODO: pg_dump: optional parameter for renaming schemas (and
tablespaces, and owners).

   Regards,
  Dawid
-- 
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.
 - Fred B. Schneider, PhD



Re: [GENERAL] Trigger to run @ connection time?

2008-03-13 Thread Dawid Kuroczko
On Tue, Mar 11, 2008 at 10:28 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Kynn Jones [EMAIL PROTECTED] writes:
   If one can set up this insert operation so that it happens automatically
   whenever a new connection is made, I'd like to learn how it's done.

  For manual psql sessions, you can put some setup commands in ~/.psqlrc.
  In any other context I'm afraid you're stuck with modifying your client
  application code.

  An ON CONNECT trigger enforced by the database seems a bit scary to me.
  If it's broken, how you gonna get into the DB to fix it?

Well, I must admit I would benefit from an ON CONNECT trigger.

An application which uses tsearch2 ('SELECT set_curdict() / set_curcfg()' being
called upon session start).  That is fine and that works.

Now, a statement pooling solution like pgbouncer is a great benefit for this
specific application.  There is one little problem however -- one can never be
sure when a session is started.  As a countermeasure there is a need to call
set_curdict() in every transaction (which is very fast), but one needs to
remember to call that set_curdict() every time.

An ON CONNECT trigger would solve that neatly!

Wouldn't it be enough to disallow ON CONNECT triggers for SUPERUSERs?
And a BIG FAT WARNING in the documentation to wrap the trigger body with
BEGIN ... EXCEPTION WHEN OTHERS THEN RAISE NOTICE ... END, and to have
a second user with proper permissions handy?

Dawid
-- 
Solving [site load issues] with [more database replication] is a lot
like solving your own personal
problems with heroin - at first it sorta works, but after a while
things just get out of hand.
 - Fred B. Schneider, PhD



Re: [GENERAL] Problem with GRANT in 7.4.19

2008-03-13 Thread Dawid Kuroczko
On Thu, Mar 13, 2008 at 11:52 AM, Kakoli Sen [EMAIL PROTECTED] wrote:
 This time the command GRANT ALL PRIVILEGES ON DATABASE casDatabase to
 tester; did not give error.

 But the permission is still not there. I run the following commands :

 psql -d casDatabase -U tester -W. Then \z command shows empty Access
 Privileges on all tables.

 Also select * from table_name; gives error :

 ERROR:  permission denied for relation table_name

You have to GRANT permissions to each and every table separately.

This is how most (all big) databases work. :-)

   Regards,
 Dawid

PS:
SELECT 'GRANT ALL ON '||schemaname||'.'||tablename||' TO tester' FROM
pg_tables; -- to get you started. :)
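
If you want to run the generated GRANTs in one go, here is a hedged sketch
of a small PL/pgSQL helper (7.4-style quoting; the function name and the
'public' filter are just assumptions):

CREATE OR REPLACE FUNCTION grant_all_to_tester() RETURNS void AS '
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT schemaname, tablename FROM pg_tables
             WHERE schemaname = ''public'' LOOP
        EXECUTE ''GRANT ALL ON '' || quote_ident(r.schemaname) || ''.''
                || quote_ident(r.tablename) || '' TO tester'';
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

SELECT grant_all_to_tester();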



Re: [GENERAL] Trigger to run @ connection time?

2008-03-13 Thread Dawid Kuroczko
On Thu, Mar 13, 2008 at 2:18 PM, Marko Kreen [EMAIL PROTECTED] wrote:
 On 3/13/08, Dawid Kuroczko [EMAIL PROTECTED] wrote:
An application which uses tsearch2 ('SELECT set_curdict() / set_curcfg()' 
 being
called upon session start).  That is fine and that works.
  
Now, using statement pooling solution like pgbouncer is great benefit for 
 this
specific application.  There is one little problem however -- one can 
 never be
sure when session is started.  As a countermeasure there is a need to call
set_curdict() in every transaction (which is very fast), but one needs
to remember
to call that set_curdict() every time.
  
ON CONNECT trigger would solve that neatly!

  Hm.  It seems to make more sense to implement connect-time
  hook directly in pgbouncer.

Indeed that would solve the issue.  But then again it could be argued that
PL/pgSQL could be implemented outside the backend as well. ;-)

I see it as an addition which does have its applications.

   Regards,
   Dawid



Re: [GENERAL] Find Number Of Sundays Between Two Dates

2008-03-05 Thread Dawid Kuroczko
On 3/5/08, A. Kretschmer [EMAIL PROTECTED] wrote:
  With generate_series() i generate a list of dates, and later i check if
  the date are a saturday. Okay, you need to know sunday - change from 6
  to 0 and ou course, you can calculate the parameter for the
  generate_series like

I find it convenient to create a custom version of generate_series:

CREATE OR REPLACE FUNCTION generate_series(date,date)
  RETURNS SETOF date AS $$
SELECT $1 + n FROM generate_series(0,$2-$1) AS x(n)
  $$ LANGUAGE SQL IMMUTABLE;
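
With that in place, counting Sundays becomes a one-liner (the dates below
are just an example; dow = 0 is Sunday):

SELECT count(*)
  FROM generate_series('2008-03-01'::date, '2008-03-31'::date) AS d
 WHERE extract(dow FROM d) = 0;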

  Regards,
Dawid



Re: [GENERAL] How to copy tables between databases?

2008-02-26 Thread Dawid Kuroczko
On Tue, Feb 26, 2008 at 9:19 PM, Kynn Jones [EMAIL PROTECTED] wrote:

 Is there a simple way to copy a table from one database to another without
 generating an intermediate dump file?

Using UNIX pipes :-)

$ pg_dump ... | psql ...

:-)

   Regards,
   Dawid



Re: [GENERAL] Perceived weaknesses of postgres

2008-02-13 Thread Dawid Kuroczko
On Feb 13, 2008 10:49 AM, Csaba Nagy [EMAIL PROTECTED] wrote:
 http://www.theserverside.com/news/thread.tss?thread_id=48339

 The interesting part is where somebody asks why NOT use postgres, and
 it's answers could give some additional hints to those interested on
 what people find missing from postgres to adopt it.

 Just to summarize some of the answers:
 * major PITA to upgrade between major versions;

Would be nice, though I must say that while the data migration is a pain, the
SQL compatibility PostgreSQL provides across versions is a blessing.  With
other open source RDBMSes, changing queries in applications is a major PITA. ;-)))

 * executing a single query on multiple cpus/cores;

I wonder if our most popular open source rival can do it.  I have heard
people claiming so, but I would not consider them authoritative. :)

 * no direct table cache control;

Could you elaborate more on this one?

 * accent-insensitive text comparisons;

Yesss.  That would be quite useful.  A more generic problem would be multiple
collations in one cluster.  Hopefully at column level.

 * fast select count(*);
 Wrong assumptions (but people seem to be sure it's like this):
 * no hotbackup except pg_dump (wrong: there are in fact a few different
 ways to do that);

...and pg_dump is not exactly hot.  And PITR using log archiving is a
great thing.
I've set up backups using Tivoli Storage Manager, and it works quite fine.

BTW there is an urban legend that Oracle stops writing to its datafiles
when you ALTER TABLESPACE into backup mode.  This is not true;
it keeps modifying the files much the same as we do. :-)

 * pg_dump the only way to cleanly upgrade (wrong: slony is good for
 that);

Slony is good as long as there are no DDLs issued.  And it's easy to
shoot oneself in the foot if one is not careful (some time ago I lost
all the triggers while upgrading from 8.1 to 8.2; it was my fault,
since I did pg_dump -s on a slave database, not on the master...).

This could be solved with help of DDL/DCL triggers though, which
would be helpful for other things too, I guess.

 * missing 2 phase commit (wrong: it is now implemented);
 * inadequate performance with really large databases (wrong: there are
 known examples of really large postgres DBs);

 There are other claims like (quoting): RAC, enterprise backup
 capabilities, database on raw partition, compatibility with enterprise
 storage (SAN,...) which I don't know if there are adequate solutions
 for postgres or not.

  Regards,
 Dawid



Re: [GENERAL] pg_stat_activity xact_start and autovacuum

2008-02-11 Thread Dawid Kuroczko
On Feb 11, 2008 2:27 PM, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Dawid Kuroczko escribió:
  I'm using 8.3.0 and I see that autovacuum processes in
  pg_stat_activity have xact_start.
 
  As far as I know, since at least 8.2.x the VACUUM does not start a new
  transaction.
  If that statement is correct, the xact_start column in
  pg_stat_activity should be NULL...
  Why does it matter?  Monitoring.  It's good to know the age of oldest
  running transaction, and autovacuuming is well, adding noise.
 Autovacuum certainly uses transactions ...  ??

I am referring to the E.8.3.5 Release 8.2 Release Notes:

* Allow VACUUM to expire rows without being affected by other
concurrent VACUUM operations (Hannu Krossing, Alvaro, Tom)

I have probably oversimplified my statement above.  What I am monitoring
is the age of the oldest transaction, to be alerted before tables accumulate
too many dead rows.  From this point of view a long-running VACUUM is not
a problem (since release 8.2).

Right now I am using:
SELECT extract('epoch' from min(xact_start)) AS oldest_xact_age
  FROM pg_stat_activity
 WHERE current_query NOT LIKE 'autovacuum:%';

...which works fine, but somehow I feel that if xact_start were NULL there,
it would ring more true.  Since VACUUM does not prevent VACUUMing, it can take
days to complete and I still wouldn't need to worry. ;-)

Let me know if I mixed things up horribly. :-)

  Regards,
   Dawid



[GENERAL] pg_stat_activity xact_start and autovacuum

2008-02-11 Thread Dawid Kuroczko
Hello.

I'm using 8.3.0 and I see that autovacuum processes in
pg_stat_activity have xact_start.

As far as I know, since at least 8.2.x the VACUUM does not start a new
transaction.
If that statement is correct, the xact_start column in
pg_stat_activity should be NULL...

Why does it matter?  Monitoring.  It's good to know the age of the oldest
running transaction, and autovacuuming is, well, adding noise.

   Regards,
  Dawid



Re: [GENERAL] [OT] advanced database design (long)

2008-02-04 Thread Dawid Kuroczko
On Feb 4, 2008 5:14 AM, Alex Turner [EMAIL PROTECTED] wrote:
 Im not a database expert, but wouldn't

[...]

 give you a lot less  pages to load than building a table with say 90 columns
 in it that are all null, which would result in better rather than worse
 performance?

Well, but PostgreSQL's NULLs occupy almost no space, or rather a bit of
space -- one bit, exactly. ;-)  I am pretty much sure that, storage-wise,
NULLs are more efficient.

   Regards,
  Dawid



Re: [GENERAL] Online Oracle to Postgresql data migration

2008-01-16 Thread Dawid Kuroczko
On Jan 11, 2008 7:14 PM, Scott Marlowe [EMAIL PROTECTED] wrote:

 On Jan 11, 2008 12:02 PM, Josh Harrison [EMAIL PROTECTED] wrote:
  Hi
  We have an Oracle production database with some terbytes of data. We wanted
  to migrate that to Postgresql (rigt now...a test database and not
  production) database.
  What are the good options to do that?
  Please advise me on where to look for more information on this topic

 You're going to need to use your brain for a fair portion of this,
 because how you use oracle will be just different enough from everyone
 else that no boxed solution.

 You have two steps to work on.  The first is the DDL, to create
 equivalent tables in pgsql as in oracle, the second is to migrate over
 your data.

 I've generally done the ddl conversion by hand in an editor, and
 migrated data over with some scripting language like perl or php.

If you are migrating terabytes, don't use Perl.  I did an experimental,
for-fun migration some time ago and DBD::Oracle worked remarkably
slowly...  What you need is a program which will export data
from Oracle as CSV.  As far as I know Oracle does not provide such
a tool (though it will import CSV happily through sqlldr),
but you can Google up C code which does just that.  I don't remember
where I left it... :-(

From that, you just need to stream the CSV into PostgreSQL's COPY
command.  It worked FAST.  Really.

And be wary of data type conversions.
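
The receiving end is then just a COPY (the table name and the exporter
command are hypothetical):

-- e.g.  oracle_csv_export scott.emp | psql -c 'COPY emp FROM STDIN WITH CSV' targetdb
COPY emp FROM STDIN WITH CSV;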

  Regards,
 Dawid



Re: [GENERAL] Populating large DB from Perl script

2007-11-02 Thread Dawid Kuroczko
On 11/1/07, Kynn Jones [EMAIL PROTECTED] wrote:
 Hi.  This is a recurrent problem that I have not been able to find a
 good solution for.  I have  large database that needs to be built from
 scratch roughly once every month.  I use a Perl script to do this.

 The tables are very large, so I avoid as much as possible using
 in-memory data structures, and instead I rely heavily on temporary
 flat files.

 The problem is the population of tables that refer to internal IDs
 on other tables.  By internal I mean IDs that have no meaning
 external to the database; they exist only to enable relational
 referencing.  They are always defined as serial integers.  So the
 script either must create and keep track of them, or it must populate
 the database in stages, letting Pg assign the serial IDs, and query
 the database for these IDs during subsequent stages.

If it is possible, perhaps you could load the raw data into a temporary
table and then create the ids from it.
For instance:
CREATE TEMP TABLE foo_raw (host text, city text, who text, value int);
INSERT INTO hosts (host) SELECT DISTINCT host FROM foo_raw; -- group by perhaps?
INSERT INTO [...]
INSERT INTO foo SELECT host_id, city_id, who_id, value
   FROM foo_raw
   JOIN hosts USING (host)
   JOIN cities USING (city)
   JOIN who USING (who);
This may or may not work, depending on your setup.

But perhaps a better approach, while needing more work, would be:

Your script establishes two DB connections, one for processing data
and one for maintaining IDs.  Now, whenever you need to get an ID:
  1) query memcached; if found, return it
  2) query the database; if found, return it and insert it into memcached
  3) insert into the database, insert into memcached, and perhaps commit it.
Befriend thyself with the Cache::* Perl modules. :)

   Regards,
  Dawid



[GENERAL] Scrolling cursors in PL/PgSQL

2007-11-01 Thread Dawid Kuroczko
Hello.

Is there a way to scroll a cursor from within PL/PgSQL?

I tried EXECUTE, but:

ERROR:  cannot manipulate cursors directly in PL/pgSQL
HINT:  Use PL/pgSQL's cursor features instead.

The idea would be that a PL/pgSQL function would look
through (all) the query results, then rewind the cursor and
finally return it.

My fellow developers have created a function which runs
the (complex) query twice: first a count(*) on the result
set, then it returns the row count and a cursor to the results.
Then they created a version which uses a temporary
table as a placeholder for the query results, and returns the
result count plus a cursor over SELECT * FROM tmp_table.
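
For reference, a minimal sketch of that first count-then-cursor approach
(big_table, the function name and the cursor name are hypothetical):

CREATE OR REPLACE FUNCTION get_big_table(OUT result_count bigint,
                                         OUT results refcursor) AS $$
BEGIN
    SELECT count(*) INTO result_count FROM big_table;  -- run the query once for the count
    results := 'big_table_cur';   -- name the portal so the client can FETCH from it
    OPEN results FOR SELECT * FROM big_table;          -- and once more for the rows
END;
$$ LANGUAGE plpgsql;

-- usage (inside a transaction, or the cursor is gone immediately):
BEGIN;
SELECT * FROM get_big_table();
FETCH 10 FROM big_table_cur;
COMMIT;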

Now I imagined a function which would open the cursor,
MOVE FORWARD ALL, then MOVE ABSOLUTE 0,
and return the count and the cursor, yet it seems it won't work.

Any other idea how to efficiently solve such a problem?

   Regards,
 Dawid



Re: [GENERAL] multiple row insertion

2007-10-04 Thread Dawid Kuroczko
 On 10/4/07, test tester [EMAIL PROTECTED] wrote:
  i have version 8.1 and i want to know how to insert multiple rows in this
 version.

Please don't top post.

If you need this functionality, you should really upgrade.

In cases where you want to insert multiple rows in version 8.1, you
could use COPY command:
   http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

You use it like this:
COPY cars FROM STDIN DELIMITER AS ',' CSV;
5,toyota
6,ford
\.

Or if you really need to use INSERT you could use such construct:

INSERT INTO cars SELECT 5, 'toyota' UNION ALL SELECT 6, 'ford'
  UNION ALL SELECT 7, 'bmw';
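
For reference, on 8.2 and later the multi-row VALUES form the original poster
is after looks like this (same hypothetical cars table):

INSERT INTO cars VALUES (5, 'toyota'), (6, 'ford'), (7, 'bmw');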

In short: really get the 8.2 version.  8.2 is compatible with earlier
versions and will be coming soon.

  Regards,
 Dawid
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?



[GENERAL] DELETE FROM pg_class

2007-09-24 Thread Dawid Kuroczko
Hello, I see that I can modify system tables even though I have
not set allow_system_table_mods...  Is this a feature or a bug?

Self contained code

postgres=# SELECT version();
version
---
 PostgreSQL 8.2.4 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.3
20070718 (prerelease) (Debian 4.1.2-14)
(1 row)

postgres=# SHOW allow_system_table_mods;
 allow_system_table_mods
-
 off
(1 row)

postgres=# CREATE DATABASE foo;
CREATE DATABASE
postgres=# \c foo
You are now connected to database foo.
foo=# DELETE FROM pg_class;
DELETE 204
foo=# SELECT count(*) FROM pg_class;
ERROR:  could not find pg_class tuple for index 2662
foo=# \c postgres
You are now connected to database postgres.
postgres=# \c foo
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Previous connection kept
postgres=#


   Regards,
 Dawid



Re: [GENERAL] DELETE FROM pg_class

2007-09-24 Thread Dawid Kuroczko
On 9/24/07, Tom Lane [EMAIL PROTECTED] wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2007-09-24 at 14:44 +0200, Dawid Kuroczko wrote:
  Hello, I see that I can modify system tables even though I have
  not set allow_system_table_mods...  Is this a feature or a bug?

  allow_system_table_mods allows you to modify the structure, not just the
  data, i.e. add additional columns to system tables.

  Superusers have the capability to modify data in catalog tables and many
  other things besides, normal users don't.

 It is possible to disable this by turning off your
 pg_authid.rolcatupdate flag, but AFAIR there is no handy support for
 that (eg, no separate ALTER ROLE option).

 The better advice though is don't run as superuser except when you
 absolutely must.  You don't do random work as root, do you?

Nah, actually a friend (a user of the other open source RDBMS) asked
me whether you can overload PostgreSQL builtins (like new()).  And it was
quite simple.  I thought, though, that I needed allow_system_table_mods for
it, and it surprised me that I just needed to become a superuser...

Somehow, when I read the documentation, my internal parser omitted
the 'of the structure' part of the 'Allows modification of the structure of
system tables.' sentence.  I feel a bit foolish for asking this question,
but now I am a bit wiser.

   Regards,
   Dawid



Re: [GENERAL] LDAP service lookup

2007-08-27 Thread Dawid Kuroczko
On 8/27/07, Albe Laurenz [EMAIL PROTECTED] wrote:
  it could be used as an advocacy lever (you think LDAP directory with
  DB-services
  is neat?  PostgreSQL already has it).
 I'm glad that *somebody* else appreciates it :^)

Oh, I do, I do. :)

  Then again, apart from libpq I don't see it mentioned
  anywhere.  I would like
  to have [1] a Setting-up-LDAP-for-PgSQL-HOWTO.
 Being the author of the code, I'd be willing to compose one if
 there is a demand.
 I thought that the documentation you quoted above would be enough -
 it has samples and everything.
 What information is missing in your opinion?

Looking at the 8.3devel documentation...

I think it should be mentioned in '18. Server Configuration', probably
somewhere in '18.3 Connections and Authentication', that there is
a possibility of using service names instead of traditional connect
strings -- with a link pointing to the libpq LDAP documentation.
This would make people much less likely to miss this point, especially
if they don't plan to code against the libpq C library. :-)
(Personally I think it would fit in more places, like 'Managing Databases'
(though LDAP is more generic in scope) or even HA (it makes it much
easier to promote a slave to master, just one update in one place), though
I wouldn't want to overpromote it ;)).

[...]
 In 8.3 you will be able to say psql service=myname, in 8.2 you
 have to resort to PGSERVICE=myname psql because there is no
 support for the service parameter.

Somehow I've missed it while skimming through 8.2 docs.  I think
http://www.postgresql.org/docs/8.2/static/libpq-pgservice.html
could use this specific example to send a subliminal message:
  PGSERVICE=myname psql

  Could anyone of you tell me about your setups if you use LDAP for
  this?  How do you feel about it?  Which LDAP server do you use?
 We use Oracle's Internet Directory, but any LDAP server should do.
 It's just a matter of organizing your databases in LDAP entries and
 writing the corresponding LDAP searches into pg_service.conf.

And, while not belonging in the PostgreSQL documentation but definitely
belonging in techdocs, would be a step-by-step guide to setting up
pg_services in OID and other LDAP servers.  Funny thing, I think I'll
be looking at putting pg_services in OID as well.

   Regards,
  Dawid



Re: [GENERAL] LDAP service lookup

2007-08-27 Thread Dawid Kuroczko
On 8/27/07, Albe Laurenz [EMAIL PROTECTED] wrote:
 Dawid Kuroczko wrote:
  Then again, apart from libpq I don't see it mentioned anywhere.
 [...]
  Looking at the 8.3devel documentation...
 
  I think it should be mentioned in 18. Server Configuration. probably
  somewhere in 18.3 Connections and Authentication, that there is
  a possibility of using Service names instead of traditional connect
  strings -- and a link pointing to libpq-ldap documentation.
  This would make people much less likely to miss this point, especially
  if they don't plan to code in libpq C library. :-)
 The server config options are not a good place.

Which I do know, but it's just an 'if it were there, I would have spotted it
much earlier' sort of argument.

Incidentally, this is a question for people who have access to the www
access logs.  Which sections of the documentation have the highest hit rate?
My guess would be Server Configuration and the SQL Reference, but it
would be interesting to see.

 But it could be mentioned in the 'psql' man page, under
 'Connection to a database':

 $ psql service=myservice sslmode=require

I think it definitely should.

 If there are no objections, I'd create a documentation patch for this.

  And, while not belonging to PostgreSQL documentation, but defenately
  belonging iin techdocs, whould be a step-by-step guide of setting up
  pg_services in OID and other LDAP servers.  Funny thing, I think I'll
  be looking at putting pg_services in OID as well.

 Hmm, a tutorial for configuring LDAP servers would be quite off topic.
 I think that the examples in Section 30.15 are sufficient for somebody
 who is familiar with LDAP.

I have been playing with it for a few moments now.  I think it should be
mentioned in the documentation that pg_service.conf can also
contain static service definitions, and it would also be valuable to
add an example ldap:// stanza to pg_service.conf.sample, so that if a
person opens the file, she will be enlightened.

And a missing feature.  Or rather, treat it as a feature request. :-)
A wildcard entry.  I would like to set up my environment so that
on each client I would put a pg_service.conf having two and only
two LDAP servers in it (the second one for failover).  I think the entry
might look like:
### wildcard entry:
[%]  # or [*] ?
ldap://ldap1.mycompany.com/dc=mycompany,dc=com?uniqueMember?one?(cn=%s)
ldap://ldap2.mycompany.com/dc=mycompany,dc=com?uniqueMember?one?(cn=%s)

Which, when given:
  psql service = foobarbaz
..would query ...?(cn=foobarbaz)

Deploying a new database would be as simple as adding it into LDAP.

  Regards,
 Dawid



Re: [GENERAL] SQL Diff ?

2007-08-26 Thread Dawid Kuroczko
On 8/26/07, Kevin Kempter [EMAIL PROTECTED] wrote:
 On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
  On 08/25/07 21:51, Kevin Kempter wrote:
   Hi List;
  
   I have a very large table (52million rows) - I'm creating a copy of it to
   rid it of 35G worth of dead space, then I'll do a sync, drop the original
   table and rename table2.
 
  What is your definition of dead space?
 
  Bad rows, duplicate rows, old rows?  Something else?

 deleted rows that should have been cleaned up with vacuum, problem is the
 client let it go so long that now I cant get a vacuum to finish cause it
 impacts the day2day operations too much.  Long story, see my recent questions
 on the performance list for more info.

In your place I would do something like Slony-I does when
it replicates the tables.  Create ON INSERT/UPDATE/DELETE triggers
on table1 which will log operations on table1 to some table1_log
table.  Then copy table1 to table2.  Then replay table1_log on
table2, then BEGIN; LOCK table1; finish replaying the log; DROP table1;
ALTER TABLE ... RENAME ...; COMMIT;

Or perhaps actually use Slony-I for the above steps?  It should work
quite nicely... Or perhaps use SkyTools for it (I've never used it)?

   Regards,
   Dawid



[GENERAL] LDAP service lookup

2007-08-25 Thread Dawid Kuroczko
Hello!

I've just accidentally stumbled upon
http://www.postgresql.org/docs/8.2/static/libpq-ldap.html
and thought: hey, this is what my friend, a huge BigRDBMS fan, was
telling me about.

Now that I've read it, I think it could be very useful in an
enterprise-ish sort of way (addressing databases as services, not as
host+port+database name), and it could be used as an advocacy lever
(you think an LDAP directory with DB services is neat?  PostgreSQL
already has it).

Then again, apart from libpq I don't see it mentioned anywhere.  I would like
to have [1] a Setting-up-LDAP-for-PgSQL HOWTO.  I would like to use it
from DBD::Pg (I _guess_ it's a matter of DBI->connect('dbi:Pg',
'service=foo', ...), but it's a wild guess).  And I would like to use it
from psql (this is a tricky part, since \c expects a DBNAME, not a service
name, and using both in this context would introduce ambiguity).

Could anyone of you tell me about your setups if you use LDAP for
this?  How do you feel about it?  Which LDAP server do you use?

   Regards,
   Dawid

[1]: I may get to writing one, as it intrigues me.



Re: [GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-02 Thread Dawid Kuroczko
I usually monitor blks_read and blks_hit (of the block-level stats): when
the latter is high I know shared memory is doing a good job; when the former
is high, that also tells me something.

Also, the database-wide number of commits and rollbacks (BTW, Slony has a
habit of calling ROLLBACK when it has done nothing -- I wonder if calling
ROLLBACK instead of COMMIT on a SELECT-only transaction is such a win?  It
certainly blurs the picture for me. ;)

And a number of clients waiting on a lock.
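
For reference, minimal queries for those counters (column names as found
in pg_stat_database and pg_stat_activity):

-- cache hits/reads and commit/rollback counts, per database
SELECT datname, blks_hit, blks_read, xact_commit, xact_rollback
  FROM pg_stat_database;

-- number of backends currently waiting on a lock
SELECT count(*) FROM pg_stat_activity WHERE waiting;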

By the way, one nice thing to have would be counters which record how much
time it took to load a page into shared memory (less than 1ms, 2ms, 4ms,
8ms, 16ms and so on).  That could help with fine-tuning things like vacuum
cost/delay.  I've seen it somewhere in Oracle-ish stats tables.

Regards,
Dawid



[GENERAL] varchar does not work too well with IS NOT NULL partial indexes.

2007-07-24 Thread Dawid Kuroczko

Hello, I think I've stumbled upon a problem with (I guess) implicit
casting of varchar to text...

Basically it looks like the planner makes better use of
WHERE ... IS NOT NULL partial indexes if you either explicitly
use text as the column type or cast the column
to ::text when creating the index.

Here's a self-contained example (this is an 8.2.4 server):

CREATE TABLE foo (i int);

INSERT INTO foo
 SELECT CASE WHEN i%10=0 THEN NULL ELSE i END
FROM generate_series(1,10) AS n(i);

CREATE INDEX foo_i_index ON foo (i) WHERE i IS NOT NULL;
ANALYZE foo;

EXPLAIN SELECT * FROM foo WHERE i=17;
 QUERY PLAN
---
Index Scan using foo_i_index on foo  (cost=0.00..8.28 rows=1 width=4)
  Index Cond: (i = 17)

ALTER TABLE foo ALTER COLUMN i TYPE text;
EXPLAIN SELECT * FROM foo WHERE i=17;
QUERY PLAN
-
Bitmap Heap Scan on foo  (cost=12.14..554.42 rows=500 width=32)
  Recheck Cond: (i = '17'::text)
  ->  Bitmap Index Scan on foo_i_index  (cost=0.00..12.01 rows=498 width=0)
Index Cond: (i = '17'::text)

EXPLAIN SELECT * FROM foo WHERE i=17;
  QUERY PLAN
-
Seq Scan on foo  (cost=0.00..1772.00 rows=500 width=34)
  Filter: ((i)::text = '17'::text)

CREATE INDEX foo_i2_index ON foo ((i::text));
EXPLAIN SELECT * FROM foo WHERE i='17'::text;
 QUERY PLAN
--
Bitmap Heap Scan on foo  (cost=12.14..554.82 rows=500 width=34)
  Recheck Cond: ((i)::text = '17'::text)
  ->  Bitmap Index Scan on foo_i2_index  (cost=0.00..12.01 rows=500 width=0)
Index Cond: ((i)::text = '17'::text)

Regards,
   Dawid



Re: [GENERAL] varchar does not work too well with IS NOT NULL partial indexes.

2007-07-24 Thread Dawid Kuroczko

On 7/24/07, Gregory Stark [EMAIL PROTECTED] wrote:

Dawid Kuroczko [EMAIL PROTECTED] writes:

 ALTER TABLE foo ALTER COLUMN i TYPE text;
 EXPLAIN SELECT * FROM foo WHERE i=17;
 QUERY PLAN
 -
 Bitmap Heap Scan on foo  (cost=12.14..554.42 rows=500 width=32)
   Recheck Cond: (i = '17'::text)
   ->  Bitmap Index Scan on foo_i_index  (cost=0.00..12.01 rows=498 width=0)
 Index Cond: (i = '17'::text)

I think you've lost some single-quotes around 17 in this query. With the
single-quotes it works like this which seems like the correct result. You
don't need the casts in the index definition if you write the query with
single-quotes.


Well, maybe I used the wrong example...

CREATE TABLE foo (t varchar(100));
INSERT INTO foo
 SELECT CASE WHEN i % 10 = 0 THEN NULL ELSE 'X' || i END
FROM generate_series(1,100) AS n(i);

What we have here is a table with every 10th row NULL.

CREATE INDEX foo_t_index ON foo (t) WHERE t IS NOT NULL;

...and an index which will contain only NOT NULL values.

Now, if we:

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
   QUERY PLAN
---
Seq Scan on foo  (cost=0.00..18025.78 rows=1 width=8) (actual
time=0.079..565.661 rows=1 loops=1)
  Filter: ((t)::text = 'X17'::text)
Total runtime: 565.689 ms


# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
 QUERY PLAN
---
Seq Scan on foo  (cost=0.00..178.00 rows=50 width=68)
  Filter: ((t)::text = 'X17'::text)
(2 rows)

But if we:

# ALTER TABLE foo ALTER COLUMN t TYPE text;

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
   QUERY PLAN
--
Index Scan using foo_t_index on foo  (cost=0.00..8.39 rows=1
width=10) (actual time=0.051..0.052 rows=1 loops=1)
  Index Cond: (t = 'X17'::text)
Total runtime: 0.077 ms

...so it has nothing to do with single quotes.  Actually it works
fine, so long as you use text instead of varchar:

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t=17;
   QUERY PLAN
--
Index Scan using foo_t_index on foo  (cost=0.00..8.39 rows=1
width=10) (actual time=0.014..0.014 rows=0 loops=1)
  Index Cond: (t = '17'::text)
Total runtime: 0.034 ms


I hope I have stated the problem clearly now. :-)

  Regards,
 Dawid



Re: [GENERAL] varchar does not work too well with IS NOT NULL partial indexes.

2007-07-24 Thread Dawid Kuroczko

On 7/24/07, Gregory Stark [EMAIL PROTECTED] wrote:

Dawid Kuroczko [EMAIL PROTECTED] writes:

 Now, if we:

 # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
QUERY PLAN
 
---
 Seq Scan on foo  (cost=0.00..18025.78 rows=1 width=8) (actual
 time=0.079..565.661 rows=1 loops=1)
   Filter: ((t)::text = 'X17'::text)
 Total runtime: 565.689 ms


 # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
  QUERY PLAN
 ---
 Seq Scan on foo  (cost=0.00..178.00 rows=50 width=68)
   Filter: ((t)::text = 'X17'::text)
 (2 rows)

I still think you're playing games with the output. a) This is not an EXPLAIN
ANALYZE at all, there are no actual values. And b) there's no explanation
for why the estimates should be different for this query than the previous,
identical, query.


My mistake, copy-paste error.


Send along the actual psql session, not an edited version.


Actual session is attached.

If I may suggest it -- try to run the queries yourself.  You will find
that the problem does not lie in the statistics.

  Regards,
 Dawid


session.log
Description: Binary data



Re: [GENERAL] Difference between PRIMARY KEY index and UNIQUE-NOT NULL index

2007-07-23 Thread Dawid Kuroczko

On 7/22/07, Vincenzo Romano [EMAIL PROTECTED] wrote:

On Sunday 22 July 2007 19:20:08 Tom Lane wrote:
 Vincenzo Romano [EMAIL PROTECTED] writes:
  In the original setup, the UNIQUE constraint had been dropped
  *before* doing the tests. So the slow case is without the
  UNIQUE constraint but with an index. The NOT NULL was instead
  there.

 With what index, pray tell?

   regards, tom lane

Sorry for the incomplete sentence.
Read it as:

In the original setup, the UNIQUE constraint had been dropped
*before* doing the tests. So the slow case is without the
UNIQUE constraint but with an index on NOT NULL fields.


Control question: did you recreate the non-unique index after dropping
the UNIQUE constraint?

  Regards,
 Dawid



Re: [GENERAL] Quota

2007-06-27 Thread Dawid Kuroczko

On 6/27/07, Hannes Dorbath [EMAIL PROTECTED] wrote:

What would be a solution to prevent a single user/schema to fill the
disk using PostgreSQL? Is it a good idea to use separate table spaces?


I am afraid that currently you are stuck with tablespaces as a quota tool.

Of course, having a filesystem per user just for quotas is not feasible in
most circumstances.  I am contemplating using the XFS filesystem's
project quota to achieve a per-directory quota.  Basically what you need
is the xfs_quota command.  Here's a manual excerpt about enabling it:

  Enabling project quota on an XFS filesystem (restrict files in
log file directories to only using 1 gigabyte of space).

   # mount -o prjquota /dev/xvm/var /var
   # echo 42:/var/log >> /etc/projects
   # echo logfiles:42 >> /etc/projid
   # xfs_quota -x -c 'projects -c logfiles' /home
   # xfs_quota -x -c 'limit -p bhard=1g logfiles' /home

I haven't used it yet, but it does look promising (other than that,
there's ZFS if you are a Sun shop ;-))


Now, while we are at it, XFS has yet another nice thing: the xfs_fsr
command, which does online filesystem-level defragmentation (for
example as a nightly job).  It mixes nicely with PostgreSQL's
1-GB table files. :)

  Regards,
 Dawid



Re: [GENERAL] [PGSQL 8.2.x] INSERT+INSERT

2007-06-21 Thread Dawid Kuroczko

On 6/21/07, Vincenzo Romano [EMAIL PROTECTED] wrote:

Hi all.
I'd like to do the following:

insert into t1
  values (
'atextvalue',(
  insert into t2
values ( 'somethingelse' )
returning theserial
)
  )
;

that is, I first insert data into t2 getting back the newly created
serial values, then i insert this values in another table.
I get an error message:
ERROR:  syntax error at or near into
referring to thwe second inner into.
Is there a way to do this?
The inner insert...returning should be the expression to be used in
the outer insert.
My objective is to create an SQL script to load some 20+ million
records and avoiding function calls would save some time.


I'm afraid INSERT ... RETURNING cannot be used where a (sub)select
could be.  It returns data to the calling application only.

Given tables:

qnex=# CREATE TABLE t1 (t text, id int);
qnex=# CREATE TABLE t2 (id serial, sth text);
NOTICE:  CREATE TABLE will create implicit sequence t2_id_seq for
serial column t2.id

You want to:

qnex=# INSERT INTO t2 (sth) VALUES ('somethingelse');
INSERT 0 1
qnex=# INSERT INTO t1 VALUES ('atextvalue', currval('t2_id_seq'));
INSERT 0 1

Or wrap it around SQL function:

qnex=# CREATE OR REPLACE FUNCTION t_insert(sth text, t text) RETURNS VOID AS $$
 INSERT INTO t2 (sth) VALUES ($1);
 INSERT INTO t1 (t,id) VALUES ($2, currval('t2_id_seq'));
$$ LANGUAGE SQL;
CREATE FUNCTION
qnex=# SELECT t_insert('foo', 'bar');

...which should be inlined nicely, without PL/PgSQL overhead.

  Regards,
Dawid



Re: [GENERAL] Using the GPU

2007-06-08 Thread Dawid Kuroczko

On 6/8/07, Billings, John [EMAIL PROTECTED] wrote:




Does anyone think  that PostgreSQL could benefit from using the video card as a 
parallel computing  device?  I'm working on a project using Nvidia's CUDA with 
an 8800 series  video card to handle non-graphical algorithms.  I'm curious if 
anyone  thinks that this technology could be used to speed up a database?  If 
so  which part of the database, and what kind of parallel algorithms would be  
used?


You might want to look at:

http://www.andrew.cmu.edu/user/ngm/15-823/project/Final.pdf

...haven't used it though...

  Regards,
  Dawid



Re: [GENERAL] Transactional DDL

2007-06-02 Thread Dawid Kuroczko

On 6/2/07, Jasbinder Singh Bali [EMAIL PROTECTED] wrote:

But its said that transactions in any RDBMS follow ACID properties.
So if i put a create table and an Insert statement in the same begin end
block as one single transactioin, won't both create and insert follow acid
property, being in one single trasaction, and either both get committed or
none, talking about oracle lets say


Actually, Oracle inserts implicit COMMIT after each DDL.

So, if you have:

BEGIN;
INSERT INTO foo (bar) VALUES (1);
CREATE INDEX foo_bar ON foo (bar);
-- Here Oracle will insert an implicit COMMIT, thus your foo table will
-- have value 1 committed.
-- And here Oracle will BEGIN a new transaction.
INSERT INTO foo (bar) VALUES (2);
ROLLBACK;
-- And you will roll back the insert of value 2.  Value 1 remains in the table,
-- because it is already committed.
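
In PostgreSQL, by contrast, the same sequence rolls back completely, DDL
included; a quick sketch:

BEGIN;
INSERT INTO foo (bar) VALUES (1);
CREATE INDEX foo_bar ON foo (bar);
INSERT INTO foo (bar) VALUES (2);
ROLLBACK;
-- foo is left exactly as it was before the BEGIN: no rows added, no index created.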

  Regards,
  Dawid



[GENERAL] Performance issues of one vs. two split tables.

2007-05-14 Thread Dawid Kuroczko

On 5/14/07, Rich Shepard [EMAIL PROTECTED] wrote:

On Mon, 14 May 2007, PFC wrote:

   I did something like that on MySQL some time ago.
   In the Users table there was stuff that other users need to see (like
 his login name, etc), and stuff that only this user needs to see (like his
 preferences).
   So, when displaying posts in the forum, for instance, only a small
 part of the fields in the Users table was needed, the rest was just dead
 weight, that made the table unable to fit in RAM.
   So I split the table, and it was faster.

   However, you can also buy more RAM...

   Or, use Views without paying more.


Well, views are not going to help with memory consumption here.
It is the table contents that get cached in the buffer cache, not the
view's contents.  So if you have a view which returns only one
column from a 15-column table, you will be caching that 15-column
data nonetheless.  A view, as the name states, is converted into
a select on the real table.

As for the actual colors table -- you might consider keeping colors
as an int[] (integer array) column, though I'll never admit I suggested
that to you. :D

  Regards,
   Dawid



Re: [GENERAL] Performance issues of one vs. two split tables.

2007-05-14 Thread Dawid Kuroczko

On 5/15/07, Bill Moseley [EMAIL PROTECTED] wrote:

On Tue, May 15, 2007 at 06:33:26AM +0200, Dawid Kuroczko wrote:
 Well, views are not going to help with memory consumption here.
 It is the table contents that gets cached in buffer cache, not the
 views contents.  So if you have a view which returns only one
 column from 15-column table, you will be caching that 15-column
 data nonetheless.  View, as the name states, is converted into
 a select on a real table.

Are you saying that in Postgresql:

select first_name, last_name from user_table;

uses the same memory as this?

select first_name, last_name,
passowrd, email,
[10 other columns]
from user_table;


Yes.  You read the whole page (8 KB) into the buffer cache,
then extract those columns from that buffer.  From the
buffer cache's point of view, the whole tuple is contained in the
cache.

Say, you first run SELECT fname, lname FROM user_table;
and then you issue SELECT * FROM user_table; -- the
second select will be answered from the buffer cache, since
all the rows are already in the cache.

Having separate caches for every possible SELECT [column list]
would be, well, not quite efficient.

Now, SELECT fname, lname will take less private memory,
but that memory is freed as soon as the query finishes,
so it won't help our cache much.
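
To make that concrete with the table from your example (a small sketch;
user_table and its columns are just the names from your mail):

CREATE VIEW user_names AS
    SELECT first_name, last_name FROM user_table;

-- The view is expanded into a query on user_table, so the same 8 KB heap
-- pages of user_table land in shared_buffers whether you go through the
-- view or query the table directly; only what is returned to the client
-- differs.
SELECT first_name, last_name FROM user_names;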

  Regards,
   Dawid



Re: [GENERAL] Dangers of fsync = off

2007-05-10 Thread Dawid Kuroczko

On 5/8/07, Joel Dice [EMAIL PROTECTED] wrote:

On Tue, 8 May 2007, Andrew Sullivan wrote:
 My real question is why you want to turn it off.  If you're using a
 battery-backed cache on your disk controller, then fsync ought to be
 pretty close to free.  Are you sure that turning it off will deliver
 the benefit you think it will?

You may very well be right.  I tend to think in terms of software
solutions, but a hardware solution may be most appropriate here.  In any
case, I'm not at all sure this will bring a significant peformance
improvement.  I just want to understand the implications before I start
fiddling; if fsync=off is dangerous, it doesn't matter what the
performance benefits may be.


Well, fsync=off makes failures harder to cope with.

Normally, when your operating system crashes or power fails, your
master server should start up cleanly.  If it doesn't -- you've got the slave.

Now, with fsync=off you should promote the slave to master whenever
you experience a crash/power failure, just to be safe.  Having a battery-
backed unit may be cheaper than the cost of failovers (a DBA's time
costs money, and so does downtime ;)).  Do some testing, do some
calculations.


 on Y.  Thus, database corruption on X is irrelevant since our first step
 is to drop them.

 Not if the corruption introduces problems for replication, which is
 indeed possible.

That's exactly what I want to understand.  How, exactly, is this possible?
If the danger of fsync is that it may leave the on-disk state of the
database in an inconsistent state after a crash, it would not seem to have
any implications for activity occurring prior to the crash.  In
particular, a trigger-based replication system would seem to be immune.

In other words, while there may be ways the master could cause corruption
on the slave, I don't see how they could be related to the fsync setting.


OK, let's assume you have machine mdb as a master database,
and sdb as slave database.  mdb has fsync=off and Slony-I is used
as a replication system.

You have a power failure/system crash/whatever.  mdb goes down.
Your sdb is consistent, but it's missing, let's say, the last 15 seconds
of transactions which didn't manage to replicate.
You don't fail over yet.  Your mdb starts up, PostgreSQL replays
its Write Ahead Log.  Everything seems fine, mdb is up and running,
and these 15 seconds of transactions are replicated to sdb.

Oops.  PostgreSQL seemed to be fine, but since fsync was off,
the rows in Money_Transactions weren't flushed to disk, while
PostgreSQL thought they should already be on disk
(WAL was replayed since the last known CHECKPOINT), so you didn't
actually replicate those transactions.  If you are really unlucky
you've replicated some old contents of the database, and thus
now both your mdb and sdb contain erroneous data.
Of course sdb is consistent in terms of internal structure, but
try explaining that to the poor soul who happened to be doing
updates on the Money_Transactions table. ;-)

Of course the likelihood of this happening isn't very big -- PostgreSQL
really tries to safeguard your data (the elephant never forgets ;)),
but only as long as you give it a chance. ;)

  Regards,
 Dawid



Re: [GENERAL] varchar as primary key

2007-05-03 Thread Dawid Kuroczko

On 5/3/07, Jeff Davis [EMAIL PROTECTED] wrote:

On Thu, 2007-05-03 at 08:58 -0700, Matthew Hixson wrote:
 I'm investigating the usage of a UUID primary key generator using
 Hibernate and Postgres.  The reason for using a UUID is that we will
 have an application hosted at different sites in different
 databases.  We will need to aggregate the data back into a single
 database from time to time and we want to avoid PK collisions.
Is there a significant performance difference between using int
 primary keys and string primary keys in Postgres?

If the only thing you need to do is avoid primary key collisions, why
not just store an extra int that represents the site ID and have a
primary key on (the_key, site_id)?

That way you're *sure* no collisions happen. A UUID has a lot of bits,
but if the random generator is flawed a collision can still happen.


Also, why not simply use sequences?
You can declare a sequence to START WITH 1 and INCREMENT BY 10,
and on the other site to START WITH 2 INCREMENT BY 10.
There is no chance these will collide (unless a human intervenes ;)),
and you can safely add 8 more similar servers, each with a similar
sequence.
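
A minimal sketch of that setup (table and sequence names are illustrative):

-- Site A:
CREATE SEQUENCE user_id_seq START WITH 1 INCREMENT BY 10;
-- Site B:
CREATE SEQUENCE user_id_seq START WITH 2 INCREMENT BY 10;

CREATE TABLE users (
    user_id integer PRIMARY KEY DEFAULT nextval('user_id_seq'),
    name    text
);
-- Site A generates 1, 11, 21, ...; site B generates 2, 12, 22, ...
-- The ranges never overlap, and there is room for 8 more sites.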

 Regards,
  Dawid



Re: [GENERAL] PostgreSQL 9.0

2007-01-30 Thread Dawid Kuroczko

On 30 Jan 2007 12:15:17 -0800, Karen Hill [EMAIL PROTECTED] wrote:

On Jan 29, 11:06 pm, [EMAIL PROTECTED] (Dawid Kuroczko) wrote:
 * updatable views [ or am I missing something? ] -- it seems to me
 they were close to be completed, but I don't remember if they were
 completed and committed or not.

PostgreSQL has updatable views via the rules system.  I use updatable
views all the time in postgres.


That is not the point, really.  This TODO item is not about implementing the
rule system, which PostgreSQL already has.

It's about implementing the infrastructure to set up updatable views
automatically, as the standard dictates.  And this is a feature PostgreSQL
lacks.  If you want updatable views you have to issue a couple of CREATE RULEs
apart from the CREATE VIEW.  The point is that you could create updatable
views with the CREATE VIEW command alone.
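
For illustration (table and view names are made up), the kind of boilerplate
meant here looks roughly like:

CREATE TABLE items (item_id serial PRIMARY KEY, name text);
CREATE VIEW items_view AS SELECT item_id, name FROM items;

CREATE RULE items_view_insert AS ON INSERT TO items_view
    DO INSTEAD INSERT INTO items (name) VALUES (NEW.name);
CREATE RULE items_view_update AS ON UPDATE TO items_view
    DO INSTEAD UPDATE items SET name = NEW.name WHERE item_id = OLD.item_id;
CREATE RULE items_view_delete AS ON DELETE TO items_view
    DO INSTEAD DELETE FROM items WHERE item_id = OLD.item_id;
-- The TODO item is about CREATE VIEW setting all of this up by itself.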

Another example is table partitioning, which PostgreSQL both has and doesn't
have.  You can set up table partitioning with a clever set of triggers and
table inheritance, but it lacks explicit DDL to do so.
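
Again for illustration (names made up), the manual setup meant here is roughly:

CREATE TABLE log (logdate date, msg text);
CREATE TABLE log_2007_01
    (CHECK (logdate >= DATE '2007-01-01' AND logdate < DATE '2007-02-01'))
    INHERITS (log);
CREATE TABLE log_2007_02
    (CHECK (logdate >= DATE '2007-02-01' AND logdate < DATE '2007-03-01'))
    INHERITS (log);

CREATE OR REPLACE FUNCTION log_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate < DATE '2007-02-01' THEN
        INSERT INTO log_2007_01 VALUES (NEW.logdate, NEW.msg);
    ELSE
        INSERT INTO log_2007_02 VALUES (NEW.logdate, NEW.msg);
    END IF;
    RETURN NULL;  -- the row is already stored in a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_partition BEFORE INSERT ON log
    FOR EACH ROW EXECUTE PROCEDURE log_insert();
-- ...and there is no single DDL statement that does all of the above for you.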

  Regards,
   Dawid



Re: [GENERAL] PostgreSQL 9.0

2007-01-30 Thread Dawid Kuroczko

On 1/30/07, Ron Mayer [EMAIL PROTECTED] wrote:

Ron Johnson wrote:
 Who would they target anyways?
 There's no one company

 They could buy out CommandPrompt and EnterpriseDB...

 The buyouts wouldn't *kill* pg, but they would wound it mightily.

I don't think so.   High-profile and high priced buyouts
of CommandPrompt and EnterpriseDB would be great for
postgresql.

It would be a strong motivation for entrepreneurs to start
postgresql companies, developers to build postgresql expertise,
VCs to invest in postgresql companies.  And guys like Pervasive
would be kicking themselves for not keeping sticking with it.


One would think that with the acquisition of Berkeley DB and InnoDB
one should have seen a flourishing of database engine startups, but I seem
to have missed that.

My point is, it's not about throwing money at a problem.  PostgreSQL
seems to have the right people in the right place and benefits from
it.  They do the hard work, they do it well, hence the 8.0, 8.1, 8.2 and
upcoming 8.3 releases.  If you buy these people out, it will take time
to find and teach new ones.  Writing an RDBMS is not dusting crops,
ya know. ;)))

  Regards,
  Dawid

PS: I guess this thread belongs in advocacy, please update To: headers
accordingly.



Re: [GENERAL] PostgreSQL 9.0

2007-01-29 Thread Dawid Kuroczko

On 29 Jan 2007 13:25:31 -0800, Karen Hill [EMAIL PROTECTED] wrote:

I was just looking at all the upcoming features scheduled to make it
into 8.3, and with all those goodies, wouldn't it make sense for this
to be a 9.0 release instead of an 8.3?  It looks like postgresql is
rapidly catching up to oracle if 8.3 branch gets every feature
scheduled for it.


Well I see it in two ways.  For one, the features are certainly
great and a significant advance.  This alone could mandate version
bump to 9.0.

On the other hand, the 8.x line is so successful I would like it to
stay for a couple more revisions.  Well, it does have a nice feeling
about it: What? Yeah, it does support windowing functions, we've
introduced them around version 8.3.  Naah, no big deal, wait for
version 8.4, you'll be surprised.  Naah, we keep version 9.0
for truly significant changes.  And I must say, I do like it.


About the only big features pg 8.3 doesn't have is materialized views
and RMAN..


Personally I'm missing two things, which were discussed in the
past, but would be nice to have:
* more efficient storage of varlen data -- some time ago there were
ideas to get rid of the constant 4 bytes for the length and use a more
elastic approach.  Smaller tables, better performance.
* updatable views [ or am I missing something? ] -- it seems to me
they were close to being completed, but I don't remember if they were
completed and committed or not.

  Regards,
  Dawid



Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Dawid Kuroczko

On 1/5/07, Jorge Godoy [EMAIL PROTECTED] wrote:

Andrew Chernow [EMAIL PROTECTED] writes:

 And how do you guarantee that after a failure?  You're restoring two
 different sets of data here:

 How do you link them together on that specific operation?  Or even on a daily
 basis, if you get corrupted data...

 I answered that already.

I'm sorry.  It must be the flu, the pain or something else, but I really don't
remember reading your message about how you can be 100% sure that all
references to the filesystem have their corresponding files present and also
all present files have their respective database entry.


By designing the system so that you can be sure.  For instance, delegate
removing data from the filesystem to a dedicated queue table within the
database, and carefully check that code.  Let no other software delete data.

If you need PITR, you can stop the remover utility during backups (think of
it as a VACUUM for the filesystem ;)).
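
A rough sketch of such a queue (the table, column and path names are made up;
the remover utility would be a small external script):

CREATE TABLE file_removal_queue (
    file_path   text        NOT NULL,              -- file to delete later
    requested   timestamptz NOT NULL DEFAULT now()
);

-- Instead of unlinking the file directly, the application only records the
-- request, inside the same transaction that drops the DB reference:
INSERT INTO file_removal_queue (file_path) VALUES ('/data/images/12345.jpg');

-- A separate remover process later reads the queue, deletes the files from
-- the filesystem and removes the processed rows; stopping it during backups
-- gives you a consistent DB + filesystem snapshot.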


I've seen HA measures (I don't imagine anyone sacrificing their customers
copying 3 billion files and a few hundred terabytes while still maintaining an
adequate service rate with part of its infra-structure down, just to use your
example to that answer...), ideas about requiring an answer from the
filesystem before considering the transaction done DB-wise (who grants you
that the image really went to the disk and is not on cache when the machine
has a power failure and shuts down abruptly?)...


And who grants you that the WAL file really went to the disk and is not in
the cache when the machine has a power failure and shuts down abruptly?

Use trustworthy hardware.  You would have to do that anyway if you wanted
to go with the all-in-DB approach.


 Some people have seen this as a disadvantage on this thread, I personally
 don't see it that why.

I am questioning two points that show two situations where it is bad.
Specially if those images are important to the records (e.g. product failure
images, prize winning images, product specs, prototype images, blueprints --
after all, we don't need to restrict our files to images, right? --,
agreements, spreadsheets with the last years of company account movements,
documents received from lawyers, etc.).


I don't think anyone is saying that storing images in the DB isn't better from
a data integrity point of view.  But it has drawbacks, which in real life
sometimes make people store their images elsewhere.

Of course, if one had an infinite budget...  But other than that, if you are
storing lots of data, and you can afford a trade-off between data safety (that
you can lose some data or that your data can be bloated by dead data) and
costs, then you have to consider storing the data elsewhere.


 I guess it depends on access needs, many files and how much data you have.
 What if you had 3 billion files across a few hundred terabytes?  Can you say
 with experience how the database would hold up in this situation?

I'd have partitioning if I had a case like that.  Part of those would be
delegated to one machine, part to another and so on.  Even if that solution --
partitioning -- makes the overall MTBF lower...


And how do you handle data integrity between many machines?  The answer
is 2PC, I guess.  But still, managing integrity between many machines is
also hard -- remember, you don't have RI constraints for remote data.


And I still can't imagine how you guarantee that all 3 billion files have
their corresponding entries on the database.  Couting them is not enough since
I can have one file with the wrong name present on the filesystem or some
duplicate record on the DB...


It depends on what you mean by integrity.

For my needs it is sufficient that I can guarantee that every object mentioned
in the database is present on the filesystem.  Orphaned files are not a problem
and can be removed (if there really is a need to) by a monthly cron job (find
old files not mentioned in the DB, find old files that haven't been accessed
and check whether they are in the DB, etc., etc.).

More important still is that I can make a snapshot of the data.  With the
filesystem it would be just:
 stop the remover utility
 pg_dump + tar (or whatever) the files
 start the remover utility

With the DB, one would have to start pg_dump and wait until it finishes.
And observe how your tables start to bloat, as the data needs to be preserved
for data integrity reasons.  How your database gets more and more filled with
dead tuples.  And how, after it finishes, VACUUM struggles to clean up the
tables.  The more data you have, and the more frequently it changes, the
bigger the problem.  The filesystem is simply more efficient at storing data
(the non-transactionality and limited metadata being the trade-off).

I don't say all-in-DB is fundamentally wrong.  It's just that its niches are
low-bandwidth services, like intranets or rarely visited data, some forms of
very important data store services where one must be absolutely sure about
safety and integrity, and finally, let's call it academic/hobbyist research ;-)

files outside of DB 

Re: [GENERAL] [pgsql-advocacy] Merry Christmas!

2006-12-24 Thread Dawid Kuroczko

On 12/24/06, Joshua D. Drake [EMAIL PROTECTED] wrote:

Hello,

I have been threatened with severe penalties by the Chairwoman of the
Board if I so much as look at my email client on Christmas. So I send
this a day early!

Merry Christmas!


And same to You and all other PG-developers and users!

It's been a great year, and I'm constantly amazed at the consistency
and steady development of our favourite database -- many thanks
to all who take part in it -- it's great to have you!  Merry Christmas!!!

  Regards,
 Dawid



Re: [GENERAL] wildcard alias

2006-11-11 Thread Dawid Kuroczko

On 11/10/06, Matthew Terenzio [EMAIL PROTECTED] wrote:


I suppose I should have named things differently but is there a way to
join two tables with a bunch of identical column names and rather than
explicitly alias each column just use some sort of wildcard like:

SELECT tablename.* AS  alias.*


Well:
qnex=# CREATE TABLE tab1 (a int, b int);
qnex=# CREATE TABLE tab2 (a int, b int);
qnex=# INSERT INTO tab1 VALUES(1,2);
qnex=# INSERT INTO tab1 VALUES(3,4);
qnex=# INSERT INTO tab2 VALUES(1,7);
qnex=# INSERT INTO tab2 VALUES(3,12);

And you want to, instread of:

qnex=# SELECT * FROM tab1 JOIN tab2 USING(a);
 a | b | b
---+---+----
 1 | 2 |  7
 3 | 4 | 12

Do something like:

qnex=# SELECT tab1.*, alias.* FROM tab1 JOIN tab2 alias USING(a);
 a | b | a | b
---+---+---+----
 1 | 2 | 1 |  7
 3 | 4 | 3 | 12

...it is possible, certainly, but I guess you rather want to have
different column names.  Aliasing table names doesn't
change column names: tab2.b will be labeled b just as well as alias.b
will be labeled b.

If you want a quick & ugly trick, do something like:

qnex=# CREATE VIEW tab2_renamed AS SELECT a AS tab2_a, b AS tab2_b FROM tab2;
qnex=# SELECT * FROM tab1 JOIN tab2_renamed ON (a=tab2_a);
 a | b | tab2_a | tab2_b
---+---+--------+--------
 1 | 2 |      1 |      7
 3 | 4 |      3 |     12

Other than that, I don't see too many options.  Also, consider using * in
queries as a bad coding style, and try to avoid it.

 Regards,
 Dawid



Re: [GENERAL] Table and Field namestyle best practices?

2006-11-11 Thread Dawid Kuroczko

On 11/8/06, novnov [EMAIL PROTECTED] wrote:

I am very curious to hear the various conventions folks here have arrived
at. I don't expect there to be consensus, but the various rationales might
help me arrive at an approach that works well for me.


Personally I use all-lowercase names; a typical table might look like:

CREATE TABLE names (
  name_id serial PRIMARY KEY,
  name varchar(100) UNIQUE NOT NULL,
  birth date
);
CREATE INDEX names_birth_index ON names (birth);
CREATE INDEX names_name_lower_index ON names (lower(name));
CREATE TABLE winners (
 winner_id serial PRIMARY KEY,
 name_id integer REFERENCES names
);
CREATE VIEW winner_names_view AS
 SELECT * FROM winners JOIN names USING (name_id);

...generally I don't like naming columns just 'id' -- if I use
full names, like name_id, then JOIN ... USING (name_id) or
NATURAL JOINs are easy and straightforward.

Sometimes I put a trailing _view to mark that a given table
is really a view.  My index names are composed of
table_col1_col2_index or table_col1_function_index
(like the lower() case above).  If an index is unique,
I use _key as a suffix instead of _index.

I know a couple of people who name their tables like
T_Name, T_Winner etc. (and V_For_Views), but I consider
it a bit superfluous for my taste.  And if I have a whole lot of
tables, I like to keep them organized into schemas, which
are powerful beings in PostgreSQL.

  Regards,
  Dawid



[GENERAL] Remote tables infrastructure.

2006-11-09 Thread Dawid Kuroczko

Hello.

One nice feature of PostgreSQL's CREATE FUNCTION/RULE/VIEW system
is ability to create objects which may appear closer than they really are.
Hence, all the dbi-link-like modules.

The usual method is to create a view on a set returning function (if
reasonable),
create RULEs for managing INSERT/UPDATE/DELETE, and voila.

This approach has some limitations.  First of all, if you add WHERE restrictions
to such a view, PostgreSQL will first slurp all the data from the SRF, and then
do the hard work of limiting/ordering.  To circumvent that you may want to
pass the actual query string at the application level.

Methinks it would be a good idea to discuss a more general approach, i.e.
don't create links using the existing infrastructure but rather provide
something conceptually similar to CREATE AGGREGATE statements.  In other
words, make a remote link a first-class object inside PostgreSQL.

The first thing, a connection:
CREATE DBLINK name (
   connection = some connection string,
   start_transaction = foo_start_transaction_func,
   rollback = foo_rollback_func,
   before_commit = foo_prepare_transaction_func,
   after_commit = foo_commit_prepared_func,
   ...
);

A start_transaction_func should receive the ISOLATION LEVEL, of course.
A 2PC infrastructure could be used to ensure that both transactions
will either commit or roll back.  I think the remote transaction should
start (initialize the connection) on first access to the remote table (so as
not to cause connection storms upon backend process startup).  If it
fails to initialize, then our transaction should roll back.
I think it may be reasonable to provide a timeout on an IDLE (not IDLE in
transaction) remote link.  Of course non-transactional remote links
should be possible without inconveniences.  And we would need
to take (and keep) a connection string which will allow us to reuse
the library functions with different connections.

Normally one would prototype using the PL/perlU or PL/pythonU PLs, and
if one wants a more robust interface, one would make a .so library
providing it.

Then we should have the ability to access remote tables.  I think PostgreSQL
should not assume that it knows what's on the other side of the link, and
should just blindly follow the user's request for a remote table (and throw an
error if the table does not exist).  I'll elaborate further on access functions
later on.  For instance, accessing table bar over the foo DBLINK would be
possible through:

SELECT * FROM foo.public.bar WHERE baz = 123;
 foo -- being connection name,
 public -- being schema name,
 bar -- being table name.

You see now where I am cruising to: this way we're using the
DB.SCHEMA.TABLE naming scheme, and if we provide an
implicit (or explicit) dblink library for connecting to other DBs
in the current (or a remote) cluster -- we've got a nice new feature
for free.

Once CREATEd, a DBLINK should remain in the system tables until
somebody DROPs it.  It might be a good idea to also provide
CREATE TEMPORARY DBLINK, similar in concept to temporary
tables.

At minimum we should provide (in CREATE DBLINK) extra functions
for accessing and modifying data.  Something like
CREATE DBLINK foo (
 table_select = foo_tab_select_func
);

It should be possible to provide WHERE parameters, ORDER BY and LIMIT
somehow -- if the remote supports it.  And we should do it so that the
remote end could easily transform LIMIT into ROWNUM, for instance.
I think PostgreSQL should re-evaluate the results if possible -- after all it's
PostgreSQL, and we trust PostgreSQL.  Then again, there is a big
question of how to provide them, and whether simpler cases of GROUP BY
should also be possible (like count(*)?).

At its simplest it might be something like:
table_select(table_name text, ret_cols text[], where_cond text,
where_vals text[], order_cols text[], limit int, offset int);

SELECT a, t0, t1-t0
 FROM foo.public.bar
 WHERE (a < 4 AND b IN ('a','z')) OR t = now()
 ORDER BY a
 LIMIT 100 OFFSET 5;

foo_tab_select_func('bar',
 '{a,t0,t1}',
 '(a < ? AND b IN (?,?)) OR t = ?', '{4,a,z,2006-11-10T01:13}',
 'a',
 100, 5);

I know it looks a bit awkward, and I do hope someone comes up with
a better idea altogether.  My idea was to transform the original query
conditions into a Perl DBI prepare-like string -- with column names
properly quoted, and the arguments supplied as a separate argument.
I.e. provide strict rules and a limited number of operators to the query
(brackets, AND, OR, NOT, <, >, =, <=, >=, IN (...), perhaps
IS [NOT] NULL).  The remote end would try to implement as much
as possible but could just as well put it back on the main backend.

There is a special case with LIMIT/OFFSET -- we must know whether the remote end
will do the work for us or not.  Also, it would be nice to be able to
push more advanced parts like GROUP BY queries, or even JOINs
(if applicable), to the remote end.  Perhaps as a special mode for
PostgreSQL-compatible backends -- i.e. if some dblink driver
tries to understand PostgreSQL requests, it is free to do so;
if it quacks like a duck, it is a duck.  I haven't got idea what format
of such 'request 

Re: [GENERAL] Remote tables infrastructure.

2006-11-09 Thread Dawid Kuroczko
On 11/10/06, Tom Lane [EMAIL PROTECTED] wrote:

Dawid Kuroczko [EMAIL PROTECTED] writes:
 Methinks it would be a good idea to discuss a more general approach, i.e.
 don't create links using the existing infrastructure but rather provide
 something conceptually similar to CREATE AGGREGATE statements.  In other
 words, make a remote link a first-class object inside PostgreSQL.

Please see the archives, particularly past discussions of the SQL-MED
standard.  We do not need to go inventing our own syntax for this.


Great!  I was not aware of that part of SQL:2003.  Anyway I managed
to find the September 2003 draft of the standard (Wikipedia has a link)
and skimmed through it (I didn't find much discussion in the archives,
just one thread).

While skimming through the draft I made quick notes on the
commands they propose to provide the functionality.  Here is
the short list (I've omitted the ALTER and DROP statements,
and also DATALINK, which I don't quite grok at this late hour).

First there is a command:

CREATE FOREIGN DATA WRAPPER wrapper LIBRARY ...;

which establishes the machinery for accessing remote data.
I would think of it as the CREATE LANGUAGE of remote links.

Then there is a command to create a server definition,
the instance of a connection:

CREATE SERVER servername [ TYPE type ] [ VERSION ver ]
    FOREIGN DATA WRAPPER wrapper;

And finally, when we have a SERVER, we can get some foreign data:

CREATE FOREIGN TABLE tabname ( ... ) SERVER servername;

Or do it in bulk:

IMPORT FOREIGN SCHEMA foreignschema
    [ LIMIT TO (tab1, tab2, ...) | EXCEPT (tab3, tab4, ...) ]
    FROM SERVER server INTO localschema;

And these I am not sure how they should be used.  User mapping
seems straightforward, but routine mapping seems cryptic to me:

CREATE ROUTINE MAPPING rmapname FOR routine SERVER server;
CREATE USER MAPPING umapname FOR userident SERVER server;

   Regards,
      Dawid

PS: So, to create a connection to a foreign PostgreSQL, one would need to
CREATE FOREIGN DATA WRAPPER, which probably should be backed
by a bunch of our plain old FUNCTIONs or a specialized dynamically
linked library.  Then CREATE SERVER, giving it a URI to our database, say
psql://host:port/dbname (is there an official URI syntax for PostgreSQL?).
And then either CREATE FOREIGN TABLEs or simply IMPORT FOREIGN
SCHEMA somewhere.  This would be a minimum, I guess.


Re: [GENERAL] Modifying SQL parser with extensions?

2006-10-29 Thread Dawid Kuroczko

On 10/29/06, Matthias Luedtke [EMAIL PROTECTED] wrote:


Alvaro Herrera wrote:
 In fact, parsing this SQL dialect would just be the first step, as the
 annotations within the query induce an ordering of the result set.

 Huh, what is this supposed to be able to do that you can't do with the
 already existing ORDER BY clause?

Basically, conditional statements are annotated with integers that
represent weights, like

(...)WHERE (foo = 'a')[42] OR (bar = 'b')[20]

In the result set those entries that fulfill both conditions yield score
62, i.e. 42+20, and are ranked top, whereas entries that fulfill only
one of the conditions yield scores 42 and 20 respectively and are
therefore ranked lower.


So, basically you're giving sets of three parameters --
column value, your value, score for that column --
and your query should return the sum of the scores of all
the matching conditions.  I'll assume you only use '=' -- if you use
other conditions, feel free to modify!

First, your example data:

qnex=# CREATE TABLE blah (foo text, bar text);
qnex=# INSERT INTO blah VALUES ('a','a');
qnex=# INSERT INTO blah VALUES ('a','b');
qnex=# INSERT INTO blah VALUES ('b','b');
qnex=# INSERT INTO blah VALUES ('c','c');

Second, a user defined scorecounter:

CREATE OR REPLACE FUNCTION scorecounter(colval text[], yourval text[],
score int[]) RETURNS int AS $$
 DECLARE
   i int DEFAULT 1;
   retscore int DEFAULT 0;
 BEGIN
   WHILE score[i] IS NOT NULL
   LOOP
 IF colval[i] = yourval[i] THEN
   retscore := retscore + score[i];
 END IF;
 i := i+1;
   END LOOP;
   RETURN retscore;
 END $$ LANGUAGE PLpgSQL;

I used PL/pgSQL but you may prefer to use Perl instead -- the idea
stays the same.  And now for the grand finale:

SELECT *, scorecounter(ARRAY[foo,bar], ARRAY['a','b'], ARRAY[42,20]) from blah;
 foo | bar | scorecounter
-----+-----+--------------
 a   | a   |           42
 a   | b   |           62
 b   | b   |           20
 c   | c   |            0

SELECT * FROM blah ORDER BY scorecounter(ARRAY[foo,bar],
ARRAY['a','b'], ARRAY[42,20]) DESC;
 foo | bar
-----+-----
 a   | b
 a   | a
 b   | b
 c   | c


Note that you should add some error checking to the function,
and if you prefer, you may use other syntax for the arrays; I used
ARRAY[...] because it felt self-explanatory.

Regards,
Dawid



Re: [GENERAL] Wordpress PostgreSQL ...

2006-10-29 Thread Dawid Kuroczko

On 10/26/06, Robert Treat [EMAIL PROTECTED] wrote:

the wordpress guys have basically said they do not want to support postgres,
which is mainly why we swapped to s9y on planetpg.  you can read some more
info here:
http://people.planetpostgresql.org/xzilla/index.php?/archives/13-One-Good-Port.html


Suppose one would like to create a WordPress workalike, i.e. a blogging
engine sharing the look & feel of WordPress but written from scratch.  What
language/framework do you think would be the best?

I mean -- one could code that in PHP and simply use PostgreSQL, but
PHP is probably the most popular language, but I don't think it's the best.

Ruby on Rails perhaps?  I have not written a single line in Ruby, but
I've read on the list here that it has very good PostgreSQL interface...

Python?  Perl?  Something different?

The reason I'm asking this question is that I would like to hear personal
experiences with alternatives to PHP: how they feel and
how they scale, in order to boost my knowledge a bit.

  Regards,
Dawid



Re: [GENERAL] why not kill -9 postmaster

2006-10-20 Thread Dawid Kuroczko
On 10/20/06, Shane Ambler [EMAIL PROTECTED] wrote:
  After all, that's what a system crash is, right?

  A system crash is safer in that it won't leave orphaned child processes
  or IPC/synchronization resources around, making it more comparable to a
  SIGQUIT than a SIGKILL.

 The one thing worse than kill -9 the postmaster is pulling the power cord
 out of the server.  Which is what makes UPSs so good.


Well, I think that pulling the power cord is much safer than killing -9
the postmaster.  If you pull the plug, then during bootup PostgreSQL
will just replay every COMMITted transaction, so there won't be any
data loss or downtime.

If you kill -9 the postmaster... well, it's messy. ;-)))  I feel safer when
everything goes down at the same time. ;)

 If your server is changing the data file on disk and you pull the power
 cord, what chance do you expect of reading that data file again?


With PostgreSQL?  I expect to read all committed transactions.  And
those not committed... well, they weren't committed in the first place,
so I won't see them anyway.

This is all under the assumption that you are running your DB with fsync on,
on a reliable filesystem, and your hardware doesn't lie to you about
fsyncing data (and it's best if you have a battery for the controller's cache).

Regards,
 Dawid


Re: [GENERAL] more anti-postgresql FUD

2006-10-12 Thread Dawid Kuroczko

Not necessarily. Last I heard, MySQL ships with multiple config files,
ie: small, medium and large. So by choosing one of those you're
effectively tuning MySQL as well.


Hmm, wouldn't it be a good idea to provide something similar?

I think an initdb could have an additional flag, like --tune-for medium-oltp,
or something similar (MBs of RAM dedicated for PgSQL and estimated
number of concurrent users?).

This should probably be accompanied by verbosely reporting:
 applying medium-oltp profile:
 ** increasing work_mem from 123 to 456
and maybe some hints for further tuning, plus clear advice that
all these numbers can be changed in postgresql.conf.

This would probably hint to newbies that they should consider modifying
these files, and also help with strange setups where the sysadmin is an
enemy (and not a DBA, and wouldn't like to modify postgresql.conf
much, but an additional option to initdb should go down fine ;)).

  Regards,
 Dawid



Re: [GENERAL] IF EXISTS

2006-09-25 Thread Dawid Kuroczko

On 9/21/06, stevethames [EMAIL PROTECTED] wrote:


Ok, thanks, Jeff.
This is not a critical problem.  Just annoying.  I'll wait for 8.2.
BTW, while I can see the reason for adding the IF EXISTS clause to the
language for checking the existence of objects, wouldn't it be easier to
simply provide the PL/PgSQL language for script loading?  Then it would be
possible to create scripts for database maintainance that could be run
periodically without having to make them functions stored in the database.
Just a thought.


Or, more importantly, be able to commit within a procedure, i.e.
make a LOOP which will COMMIT every 1000th iteration.  Would
be great for scripting maintenance activities.

  Regards,
 Dawid



Re: [GENERAL] constraint -- one or the other column not null

2006-09-06 Thread Dawid Kuroczko

On 9/6/06, George Pavlov [EMAIL PROTECTED] wrote:

I have two columns, both individually nullable, but a row needs to have
a value in one or the other. What is the best way to implement the
constraints? I currently have:

create table f (
  a int,
  b int,
  check (a + b is null),
  check (coalesce(a,b) is not null)
);

Is there a better way to do it?


Personally I would simply put there
  CHECK (a IS NOT NULL OR b IS NOT NULL)
which is probably the simplest form of your constraint. :)
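
A quick sanity check of that constraint (minimal sketch, reusing your table):

CREATE TABLE f (
  a int,
  b int,
  CHECK (a IS NOT NULL OR b IS NOT NULL)
);
INSERT INTO f VALUES (1, NULL);     -- accepted
INSERT INTO f VALUES (NULL, 2);     -- accepted
INSERT INTO f VALUES (NULL, NULL);  -- rejected by the check constraint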



Re: [GENERAL] Best approach for a gap-less sequence

2006-08-17 Thread Dawid Kuroczko

On 8/17/06, Merlin Moncure [EMAIL PROTECTED] wrote:

On 8/16/06, Dawid Kuroczko [EMAIL PROTECTED] wrote:
 -- then create a function to retrieve the values:
 CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
 DECLARE
n integer;
 BEGIN
SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
 FOR UPDATE;
UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
RETURN n;
 END;
 $$ STABLE LANGUAGE PLpgsql;


the problem here is if you have two concurrent transactions which call
this funtion, it is possible for them both to return the same sequence
number in read comitted mode.  Using this funtion outside of
transactions is no different that using a sequence except that it is
slower.


Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
the first transaction to obtain the gapless sequence will establish
a lock on the tax_id row.  The other transaction will block until
the first transaction finishes (and the row is updated) and will then
establish the row lock itself.

FOR UPDATE effectively serializes access to this row for all
transactions wanting to update it, even in read committed
mode.  Your statement would be true if I didn't use SELECT
... FOR UPDATE; but just a plain SELECT there.
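
A small sketch of what happens with two concurrent callers, using the
gapless_seq table from the quoted function:

-- Session A:
BEGIN;
SELECT gseq_value FROM gapless_seq WHERE gseq_name = 'tax_id' FOR UPDATE;
-- the 'tax_id' row is now locked by session A

-- Session B, at the same time:
BEGIN;
SELECT gseq_value FROM gapless_seq WHERE gseq_name = 'tax_id' FOR UPDATE;
-- blocks here until A commits or rolls back, and then (in read committed
-- mode) re-reads the row, so it sees the value A just wrote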

  Regards,
  Dawid


PS: 
http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE



Re: [GENERAL] Best approach for a gap-less sequence

2006-08-17 Thread Dawid Kuroczko

On 8/17/06, Merlin Moncure [EMAIL PROTECTED] wrote:

On 8/17/06, Brad Nicholson [EMAIL PROTECTED] wrote:
   Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
   The first-to-obtain the gapless sequence transaction will establish
   a lock onthe tax_id row.  The other transaction will block until
   the first transaction finishes (and the row is updated) and will
   establish the row lock on it.
 
  yes, you are right...i didnt think the problem through properly.

 Lets just hope the performance on a concurrent system is not a
 requirement of such a system...

right, if the transations are long running, there is a big problem as
they are serialized around access to the sequence.  however this is
better than the control record approach because control record have
problems with mvcc bloat.  concurrent performance will of course be
awful.

a good compomise in some cases is to save off canceled transactions
ids' in a free list   you would still have to deal with transactions
that were not gracefully cancelled though.


I believe there is no ON ROLLBACK trigger...  One might periodically
check the table for gaps and put them into a "reuse me" table,
say with a left outer join between generate_series and the real table,
between the last known continuous id and max(id) from the table.
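
A rough sketch of such a gap check, assuming the real table is called foo and
its gapless column is id (illustrative names):

SELECT g.id AS missing_id
  FROM generate_series(1, (SELECT max(id) FROM foo)) AS g(id)
  LEFT OUTER JOIN foo ON foo.id = g.id
 WHERE foo.id IS NULL;
-- The result could be INSERTed into the "reuse me" table and handed out
-- before new ids are generated.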

However, if bloat of the gapless sequence table during long-running
transactions is a problem, another approach might be minimising the size
of this gapless sequence table, say:

CREATE TABLE gapless_seq (
 gseq_value int NOT NULL
);
INSERT INTO gapless_seq VALUES (1);

...and then SELECT gseq_value FROM gapless_seq FOR UPDATE and
UPDATE gapless_seq SET gseq_value = gseq_value+1;

...this saves a few bytes per update.  If we keep only a single
row, there is no need for an index (there is no visibility information
in the index, so an index would be useless in our case), and no need
for unique constraint checking (assuming we are sure no one will
ever ever insert additional rows -- it would be a disaster -- CREATE RULE
to ignore any INSERTs/DELETEs just to be sure).

Of course this does not solve the concurrency issue, but I'm assuming
that if someone wants a gapless sequence, she wants its generation
serialized (or would assign sequences in post-transaction batches).

  Regards,
  Dawid



Re: [GENERAL] Best approach for a gap-less sequence

2006-08-16 Thread Dawid Kuroczko

On 8/12/06, Jorge Godoy [EMAIL PROTECTED] wrote:

I was trying to solve a problem on an old system and realized that there might
be some better approach for doing what I need.

We have some documents that need to be ordered sequentially and without gaps.
I could use a sequence, but if the transaction fails then when I rollback the
sequence will already have been incremented.

So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
it, read the value, increase it, do what I need and then I COMMIT the
transaction, ensuring that the sequence has no gaps.

Is there a better way to guarantee that there will be no gaps in my sequence
if something goes wrong with my transaction?


Hmm, I would do it this way:

-- First prepare a table for keeping gapless sequence, say:
CREATE TABLE gapless_seq (
   gseq_name varchar(256) PRIMARY KEY,
   gseq_value integer NOT NULL
);
-- ...and populate it:
INSERT INTO gapless_seq VALUES('tax_id', '1');

-- then create a function to retrieve the values:
CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
   DECLARE
  n integer;
   BEGIN
  SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
FOR UPDATE;
   UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
  RETURN n;
   END;
$$ STABLE LANGUAGE PLpgsql;

-- ...and use it as default in table definiton
CREATE TABLE taxdata (
   tax_id integer PRIMARY KEY DEFAULT gseq_nextval('tax_id'),
   customer text,
   "when" timestamptz
);

...etc.  SELECT ... FOR UPDATE would ensure a row lock on the gapless sequence,
a PL/pgSQL function would make a nice wrapper for it (so it would be usable more
or less like real sequences), and it should work.

I did not test the code right now, but I've written something similar to
it some time ago, and it worked fine.  Remember to vacuum gapless_seq
table frequently and don't expect stellar performance from it.

  Regards,
 Dawid



Re: [GENERAL] Best approach for a gap-less sequence

2006-08-16 Thread Dawid Kuroczko

On 8/16/06, Dawid Kuroczko [EMAIL PROTECTED] wrote:

-- then create a function to retrieve the values:
CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
DECLARE
   n integer;
BEGIN
   SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
FOR UPDATE;
   UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
   RETURN n;
END;
$$ STABLE LANGUAGE PLpgsql;

  ^^^
VOLATILE of course!


Regards,
  Dawid



Re: [GENERAL] Secure connections with changing ip addresses?

2006-07-30 Thread Dawid Kuroczko

On 7/29/06, gene Campbell [EMAIL PROTECTED] wrote:

I have a pg_hba.conf file that is setup like this

hostdb1 user1 222.152.155.194/32md5

 From time to time, my ISP changes my ip address on me.
At that point, my connection to this DB is denied.

I have to log in (ssh) to my server, edit this file with the new ip
address, save and restart to get connectivity back.

Is there a way to have a secure connection that can withstand
changing ip addresses?



I think the best solution for you would be some form of VPN,
perhaps IPsec, or vtund (if you're using *nix).

Generally, when you change your IP, you will lose the connection.
And you also have to tell that machine that you changed the IP,
either manually (ssh to server, as you wrote) or automagically
(using the script which will ssh, change the file, signal the DB).

If you set up a VPN, you will always have the same IP address, say
192.168.111.111, no matter what your ISP has selected for you now,
and connections to that address can survive your IP changes
(depends on how the VPN is set up, but it's doable).

  Regards,
  Dawid



Re: [GENERAL] Making query without trigger

2006-05-27 Thread Dawid Kuroczko

On 5/26/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I have trigger on updating the table. Sometimes i need to make queries
without calling that trigger. How can I solve this?


You could try disabling the trigger (ALTER TABLE ...), doing your updates
and re-enabling the trigger (ALTER TABLE) -- all within a transaction.  If
you do it this way, your change to the triggers won't be visible to any other
transaction.
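
Something along these lines (table and trigger names are made up; ALTER TABLE
... DISABLE TRIGGER is available in 8.1 and up, and it takes an exclusive lock,
so other sessions touching the table will wait until you commit):

BEGIN;
ALTER TABLE mytable DISABLE TRIGGER my_update_trigger;
UPDATE mytable SET some_col = 0;  -- runs without firing the trigger
ALTER TABLE mytable ENABLE TRIGGER my_update_trigger;
COMMIT;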



Re: [GENERAL] background triggers?

2006-05-25 Thread Dawid Kuroczko

On 5/25/06, Rafal Pietrak [EMAIL PROTECTED] wrote:

I'd like to propose a 'syntax/semantics' of such trigger:

Triggers normally execute inside of a transaction.

A COMMIT within a trigger could mean: do a fork: fork-1) return to the
main and schedule COMMIT there, fork-2) continue in bacground.


I don't think fork(2)ing a running backend is a good idea; it would probably
end up with major data corruption.  You want to call fork(2) in your
application.  Something like: if (fork()==0) { re-establish connection, issue
trigger-code on the database } else { gracefully return };


From the perspective of my earlier applications, it would be desired to
have an argument for such COMMIT - a label, which is a *global* database
object (may be just a semaphore), and is:
1) Assuming semaphore implementation - semaphore should be tested and
fail if already asserted, NOT tesed an block. FORK should be initiated
only if semaphore test succeeds.
2) the execution of procedure within fork-2 (meaning, after semaphore
assertion succeeds) should be posponed until caller actually COMMITS.
3) On EXIT, fork-2 deasserts semaphore.
4) in the simplest case, the semaphore can be defined on the trigger
function name itself, and consequently, the 'label' for the COMMIT
wouldn't be necesary?


Hmm, I've got a feeling it's something like I don't feel like coding it in
the application, so it would be better if the community changed the backend
to do it. :)  However, what you propose in points 1,2,3,4 is somewhat
similar to the already existing 2PC (two-phase commit), which PostgreSQL
implements.  Probably not what you want, but it should be valuable to
know, I guess.

And as for COMMIT; within a function...  Not possible, not without
breaking an awful lot of things.  Think about a transaction as:
BEGIN;
 -- commands, like INSERTs, ended with the user-issued COMMIT;
 -- user-defined triggers fired after the user issued COMMIT;
 -- Referential Integrity triggers and whatnot
COMMIT; -- actual commit performed by the DB.

If your trigger called COMMIT, the referential integrity triggers would
have nothing left to do -- the commit would already be done.  No referential
integrity; you might as well use MyISAM then. ;)

So... let's assume the commit would not actually commit, but rather
start another backend and do the work there [1].  The problem is that the newly
started backend would not see the work until the old backend actually
COMMITs.

The idea of a commit within a procedure might be interesting, but rather from
the perspective of very-long-running queries which update a whole lot
of rows -- and that's another issue.

  Regards,
  Dawid

[1]: If you really insist on doing it this way, of course you may!  Here is
a fishing rod:
write a trigger in PL/perlU which will fork().  The newly started child will
use DBI to connect to the database, issue your query, and then call
exit(0) to be sure it doesn't return to the backend.  You might want to call
exec() with a pre-prepared script doing the above work.


From the perspective of the main backend, the trigger will call fork()
and finish.  And your application will commit.

That's everything you need to do it the way you want it.  Have fun!



Re: [GENERAL] background triggers?

2006-05-25 Thread Dawid Kuroczko

On 5/25/06, Rafal Pietrak [EMAIL PROTECTED] wrote:

 Well, exactly not being interested in the outcome is IMHO the reason
 why your demands clash with archtecture of a RDBMS. Most RDBMS go a long
 way to ensure consistency and safety of your data, once they promised
 (usually by not raising an error at least on commit) that they stored
 them. This doesn't match very well with asynchronous processes for which
 nobody cares (In the sense that there is nobody to reports errors to).

No, no no. This is completly different story.

That would really be very missfortunate if we couldn't relay on RDBMS
'confirmed storage'.


Oh, I think Florian meant that it is strange that your application is not
interested in the trigger's output.  Of course one might want to add
a notify-about-a-trigger-failure-by-email feature to circumvent that,
but I won't be going so far off.

The point here is that with your approach you fire a trigger and forget
about it.  It either commits some time later, or it does not, and you
don't know which.  You don't know, because your application went on,
did other things, and has no way of knowing what happened to the commit.

Well, you can speculate that you will notice that no work is being
done.  But why?  Maybe the trigger is inefficient and isss soo slooow,
iittt taaakes aaages tooo cooompleeete.  Or maybe
it ROLLBACKed, effectively removing all evidence of the work done.
With this approach, you don't know -- and this is probably what
struck Florian as strange for an RDBMS.


Here I'm just not interested in that procedure outcome: if it eventually
COMMITED or ROLLED-BACK. But obviously I am interested in consistancy of
database, when the detached procedure COMMITS.


Hmm.  How are you going to get it?  No, an on ROLLBACK trigger is not
a good idea! :-)


I mean. It looks like this is *really* a novelty for RDBMS design - I
feel, that real programmers here (you guys :) are so hard to persuade
its necesary, because it's so far from the 'synchronous nature' of
clasical RDBMS design and triggers in particular.


Don't get me wrong, but the word bizarre is more suitable than
novelty.  Background processing has been there for a very long
time -- why do you think LISTEN/NOTIFY was implemented? :)


But I'd like to express my believe, that having such tool within the
server can help build better database applications.


write faster <> write better.  As I wrote some time earlier, you can
code a trigger in PL/perlU doing exactly what you want.  The more
usual approach of using LISTEN/NOTIFY or a cron job is easier to
manage (you have much better control over how many times the
given function is called).  Imagine a query with thousands of INSERTs
grouped inside a transaction.  Your background trigger will mean
that PostgreSQL will be spawning an awful lot of new connections,
for nothing, as they won't see the new rows from a different transaction.

You said that your scheme would implement exclusive locking.
Well, if I were writing such an application, I would rather want such
code to be fired not more often than once a minute.
ON COMMIT DETACHED WITH MINIMUM DELAY '1 minute'::interval; ? :-)

For locking you can simply use existing infrastructure, like
CREATE TABLE my_locking_table (lasttime timestamptz);
INSERT INTO my_locking_table VALUES (now());
-- and inside a PL/pgSQL function:
BEGIN
 PERFORM lasttime FROM my_locking_table
   WHERE lasttime < now() - '1 minute'::interval FOR UPDATE NOWAIT;
 IF FOUND THEN
   -- do some work
   UPDATE my_locking_table SET lasttime = now();
   RETURN;
 END IF;
EXCEPTION
 WHEN lock_not_available THEN RETURN;
END;

And if you want to check for 'ps auxw|grep backup.sh', you may
do that as well, without any need to extend these things.


I would only call it a framework if I can say COMMIT within the trigger
body. Or alternatively, if I can define a trigger, so to say: FOR EACH
COMMIT (pls note, that it's a different point in time, then FOR EACH
STATEMENT) which I could also define as DETACHED - launched by the
forked backend.


Actually, I like the idea of ON COMMIT trigger (though without the
DETACHED part), but this is another story...

  Regards,
  Dawid



Re: [GENERAL] background triggers?

2006-05-23 Thread Dawid Kuroczko

On 5/23/06, Rafal Pietrak [EMAIL PROTECTED] wrote:

On Tue, 2006-05-23 at 15:56 +, Chris Browne wrote:
  The use that I have for this at the moment, and I can think of many
  other uses, is that I want to populate a statistics table each time
  that a table is updated. But the code to populate the table takes 10
  seconds to run. I don't want the user to have to wait 10 seconds to
  add a record.

 This seems a case for using NOTIFY/LISTEN.

 - You have a process connected to the database that runs LISTEN,
   causing it to listen for a particular message.

   LISTEN regen_statistics;

 - Your trigger submits a notification:

   NOTIFY regen_statistics;

Some time ago I fell into quite similair problem. But LISTEN/NOTIFY was
not a solution.

In that case I needed to *disconnect* and never bother about the outcome
of a long running background trigger.


The idea is that you *disconnect* and you have a daemon running on the
server side which will handle LISTEN efficiently.  The daemon can be quickly
written in Perl, and it can use the select(2) call to listen for incoming
notifications.  Fast, efficient and powerful.


So if I may re-phrase the question: is there a way to have a trigger,
that, when launched, can check if it's already running in backgroung for
some other INSERT, return imediately if so, but turn into background for
a long-lasting job if not.


It is also possible.  You probably want to use lock checking in your
trigger (I'm writing from memory, so forgive me any syntax errors).
A trigger might be, say:

CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS $$
   BEGIN
   -- this will either lock the row with that key, or raise an error
   -- immediately if it is already locked:
   PERFORM key FROM foo_stats WHERE key = NEW.key FOR UPDATE NOWAIT;
   UPDATE foo_stats SET count = (SELECT count(*) FROM foo WHERE
key = NEW.key) WHERE key = NEW.key;
   RETURN NEW;
   EXCEPTION
   WHEN lock_not_available THEN RETURN NEW;
   END;
$$ LANGUAGE plpgsql;

Most likely there are better ways to accomplish your goal.

  Regards,
 Dawid



Re: [GENERAL] Let's make CPgAN!

2006-05-22 Thread Dawid Kuroczko

On 5/20/06, Bruno Wolff III [EMAIL PROTECTED] wrote:

I am not sure that Postgres needs CPAN. CPAN is particularly useful for
handling dependencies. I doubt that there will be lots of dependencies in
Postgres add ons. So having something like the current system where you
download and build packages from source isn't going to be improved much
with a CPAN like system.


Hmm, I was rather thinking it would be more of a definitive place to look
for PgSQL extensions, and a way to encourage distributors to package
the extensions together with mainstream PostgreSQL. :)

  Regards,
  Dawid



Re: [GENERAL] project

2006-05-22 Thread Dawid Kuroczko

On 5/22/06, karthick muthu [EMAIL PROTECTED] wrote:

hello hai,
I am second year computerscience student(engineering),
I am very impressed by you all in asking questions and sending answers,
I am also very proud to be in this group,
now I am willing to do some mini projects(or)softwares from basics,so
I need help(ideas) from you all .please give me some sites (or)
anything that helpful for me.


For the ideas, see the TODO at:
http://www.postgresql.org/docs/faqs.TODO.html
...especially points marked with '%'. :)

 Regards,
 Dawid



Re: [GENERAL] Let's make CPgAN!

2006-05-22 Thread Dawid Kuroczko

On 5/22/06, Florian G. Pflug [EMAIL PROTECTED] wrote:

elein wrote:
 This issue is a very old issue and people have not come up with
 the definitive solution to distributing datablades as Stonebraker
 called them.
True, but OTOH there is no definitive solution for OS-level package
management too, but still apt-get or rpm do a pretty decent job.
So, 99% solutions are possible ;-)


Yet an RPM/DPKG/whatever will only make a collection-wide installation,
or rather a preparation of the package.  Think: PL/pgSQL.  It is installed by
default.  But to use it, you have to actually createlang it into your
specific database.

I think the CPgAN should aim at the latter (management of packages
throughout the whole PostgreSQL collection) and help with the former
(make it easy to rpmify/dbmify/whateverify the package; help with
raw source installation/update) at the same time.  It is what C*ANs
do for other projects. :)


I'd really like to see a solution that enables me to install
a package using something like pgpkg install dbname package.
I'd ask me to install prerequisites (like procedural languages
for example). pg_dump should have an option to exclude any installed
packages from the dump.

As long as packages only contain functions, views and types, things
are quite straight forward, I believe. The hard part would be to support
packages including table-definitions. What do you do when an upgrade
wants to modify a table, but it already contains user data?


Tricky.  Ideally it should either upgrade it, if possible, or fail
with some hints on how to update the structure manually.
And it can happen to functions and views (think views depending
on views) and types as well.

  Regards,
  Dawid



[GENERAL] Let's make CPgAN!

2006-05-20 Thread Dawid Kuroczko

This is an idea that came up in the -hackers and -advocacy lists, and I think
it should be brought up as a separate thread, in -general.

Background

Once in a while someone comes and suggests adding some package to
postgresql-contrib.  Some other person asks for some feature and is directed
at Pgfoundry. Yet again, someone submits a package and is usually directed
to Pgfoundry to put it there.

While it is a great place to keep development, it is not so obvious a place
for a random administrator/DBA/programmer...  So the idea is to make something
akin to:

Comprehensive PostgreSQL Archive Network, or CPgAN

I would like to provoke a discussion on what such a thing should look like,
and hopefully to make one.

First, I think it should be pretty well integrated into PostgreSQL, and should
be distribution agnostic.  From the user's perspective it should be similar to
the createuser, createdb, createlang commands.  I.e. the user would call a shell
command, say:
 pg_package dbname install name_of_package
and it would connect to CPgAN, get the list of packages, get the sources,
compile the sources (if C-backed), maintain dependencies, etc.  Or at
least it should.

It's easier to explain with an example.  Let's assume the issn/isbn datatype
would be handled by that infrastructure.

If a DBA wanted to install it from source, she would use this pg_package
(or whatever the name) utility to download it, unpack it, compile it and install
the generated lib...so file.  It would also install, in some common place, an SQL
script which would install and uninstall the package.  The DBA should be able
to specify whether the package is available to all users or not (grant its usage),
but that is not essential right now.  The command should be able to handle binary
packages well.  Think: Linux distributions etc. -- they would put all the files
in place, then call pg_package to notify PostgreSQL about the new addition.

The second part is enabling the usage within a given database.  A simple
running of a bunch of CREATE commands on that database.  Plus a matching
deinstall script.
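
For PL/pgSQL, for instance, that per-database step is (in recent releases)
just a single command:

CREATE LANGUAGE plpgsql;

...or createlang plpgsql dbname from the shell; the package-enabling step
described above would be the same idea generalised.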

This raises a couple of questions:

1. CREATE PACKAGE -- maybe it's time to rethink such a command. :)

2. Where to store state information.  A natural place would be the database
itself.  A dedicated cpgan db with all the dependencies?  A bit of a fragile solution,
and much of the management would become cross-database (to install PL/R
into the template1 db you would need to connect both to cpgan and template1),
but it may work.

3. There would be a need to automate the server side as much as possible.  The user
should be able to use it to search for the package she desires, and developers
should be able to efficiently upload packages.

4. A tree of packages; PgFoundry's is too general, I think.  For instance
I feel that PgAdmin and other DB administration projects would not belong
here, as they are interfaces to rather than extensions of the database. :)
Say, a tree looking like this:

pl/
pl/plr
pl/plperlng
replication/slony
datatype/uri
datatype/email
index/ltree

...of course it's not complete and not ideal.  I want to know how you would
organise such a tree.

5. A common documentation format.  It would be great to be able to automatically
merge the documentation of extensions into PostgreSQL's manual as a separate
appendix.  Right now it's somewhere between Use The Source, Luke, README
files and whatever documentation format the author provided.

What would this all give us?  It would encourage the development of simple
extensions, and would greatly increase the ease of extending PostgreSQL for
newbies.  In the long run it would bring more developers, as PostgreSQL would
be perceived not only as an SQL DB but also as a development platform,
extensible and powerful.

What do you think?

 Regards,
 Dawid

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] allow LIMIT in UPDATE and DELETE

2006-05-19 Thread Dawid Kuroczko

On 5/19/06, Csaba Nagy [EMAIL PROTECTED] wrote:

Hi all,

Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ? For our application it would help a lot when processing
things chunk-wise to avoid long running queries.

The fact that the actual rows processed would be unpredictable does not
make it less useful for us. We actually don't care which rows are
processed, we process them all anyway, we just want to make sure it is a
limited number at a time. A lot of our processes do take large amounts
of time (hours up to days), and we cannot allow that to be in one
transaction, the system does on-line processing too...

I guess the low-level infrastructure is already there (from what I
understood from earlier postings, but I may be wrong), and the question
is more if this feature is worth to be included or not... and the syntax
must be adjusted of course if yes.


-- sample data
CREATE TEMP TABLE tab (id serial primary key, n int, t text);
INSERT INTO tab(n) SELECT * FROM generate_series(1,1000);
-- say, you want such an update:
UPDATE tab SET t = 'aqq' WHERE n > 10;
-- but with limit:
UPDATE tab SET t = 'aqq' WHERE id IN (SELECT id FROM tab WHERE n > 10
LIMIT 100);
-- or this way (join):
UPDATE tab SET t = 'aqq' FROM (SELECT id FROM tab WHERE n > 10 LIMIT
100) AS tab_ids WHERE tab.id = tab_ids.id;

...this of course assumes that you have a primary key you can use
to target the update.

Then again, there are places where there is no primary key, like:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES('aaa');
INSERT INTO foo VALUES('aaa');
...and you want to update the first 'aaa' to 'bbb'.

But you can handle it this way:

CREATE TEMP SEQUENCE aaa_temp_seq;
UPDATE foo SET t='bbb' WHERE t='aaa' AND nextval('aaa_temp_seq') = 1;
-- LIMIT 1
...this of course will suck for big queries (all matching rows will be
searched, but not updated);
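
Another approach that might work here (I haven't benchmarked it) is to target
the rows by the system column ctid instead of inventing a key:

UPDATE foo SET t = 'bbb'
WHERE ctid IN (SELECT ctid FROM foo WHERE t = 'aaa' LIMIT 1);
-- again LIMIT 1, but without the sequence trick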

 Regards,
   Dawid

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Select first ten of each category?

2006-04-17 Thread Dawid Kuroczko
On 4/13/06, Benjamin Smith [EMAIL PROTECTED] wrote:
I'm stumped on this one... I have a table defined thusly:

create table items (
  id serial,
  category integer not null references category(id),
  name varchar not null,
  price real,
  unique(category, name)
);

It has a LARGE number of entries. I'd like to grab the 10 most expensive items
from each category in a single query. How can this be done? Something like

Select items.*
FROM items
where id IN (
  select firstTen(id) FROM items
  group by category
  ORDER BY price DESC
)
ORDER BY price desc;

But I've not found any incantation to make this idea work...

I came up with something like this:

SELECT i.* FROM items i JOIN (
  SELECT category, (
    SELECT price FROM items ii
    WHERE ii.category=io.category
    ORDER BY price DESC OFFSET 9 LIMIT 1
  ) AS price FROM items io GROUP BY category) AS sel
  ON (i.category=sel.category AND i.price >= sel.price);

I.e. first do a select which will return the tenth highest price of each
category (the two inner selects) and then JOIN it with the whole
table, where category and price match the way you want.

Keep in mind, if you happen to have the same prices at positions
near the 10th, you'll end up having more than 10 rows returned per
given category. Either filter it at the application level, or embed
some additional key inside the join condition (like
ON (i.category=sel.category AND (i.price > sel.price OR
(i.price = sel.price AND i.id <= sel.last_id))), with the subquery
also returning the id of that tenth row as last_id).

 Regards,
 Dawid



Re: [GENERAL] 21 bit number for sequence

2006-04-17 Thread Dawid Kuroczko
On 4/15/06, Shoaib Mir [EMAIL PROTECTED] wrote:
Actually that is the application requirement: to use 21 bit numbers, as it is
being ported from Oracle where it used to work. Yeah, now I have decided to use
a numeric data type in a table and use that to write my own nextval and currval
functions for that purpose.


Something like:
CREATE SEQUENCE some_id_seq;
CREATE TABLE foo (
 some_id bit(21) DEFAULT nextval('some_id_seq')::bit(21),
 t text
);
insert into foo(t) values ('a');
insert into foo(t) values ('bar');
insert into foo(t) values ('baz');
insert into foo(some_id,t) values ('100000000000000000010', 'uhh');
SELECT * FROM foo;
        some_id        |  t
-----------------------+-----
 000000000000000000001 | a
 000000000000000000010 | bar
 000000000000000000011 | baz
 100000000000000000010 | uhh
(4 rows)

HTH, HAND. :)
 Regards,
 Dawid



Re: [GENERAL] Upcalls (sort of) from the database

2006-04-06 Thread Dawid Kuroczko
On 4/6/06, Don Y [EMAIL PROTECTED] wrote:
For example, the title may match an existing entry -- but the author may be
different (e.g., misspelled, or some other author listed on a book having
multiple authors, etc.). Ideally, I would like the database to suspend the
INSERT, ask for confirmation (and why) and then either commit the INSERT or
abort it (based on the user's response). Nearest I can imagine, there's only
one way I can do this: issue a query that looks for these types of problems
and, based on the result, let the *application* prompt the user for
confirmation. Then, *if* confirmed, do the real INSERT. Is there a more
elegant way?
Well, your application could:
BEGIN TRANSACTION;
Then it would
INSERT INTO... or call a stored procedure. The trigger/stored procedure would
do all that's needed to perform such an action, but when it notices something
suspicious it would RAISE (see PL/pgSQL) a notice describing the problem(s).

If your user application notices such a message, it shows the user something
like WARNING: the message -- do you want to continue?, and if the user
answers yes, you do COMMIT. Otherwise you do ROLLBACK.

The thing is to use http://www.postgresql.org/docs/8.1/static/plpgsql-errors-and-messages.html
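
A minimal sketch of such a trigger (the table and column names here are made
up, not taken from your schema):

CREATE TABLE books (title text, author text);

CREATE OR REPLACE FUNCTION warn_suspicious_title() RETURNS trigger AS $$
BEGIN
    -- same title, different author: raise a notice and let the client decide
    IF EXISTS (SELECT 1 FROM books
               WHERE title = NEW.title AND author <> NEW.author) THEN
        RAISE NOTICE 'title "%" already exists with a different author', NEW.title;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER warn_suspicious_title BEFORE INSERT ON books
    FOR EACH ROW EXECUTE PROCEDURE warn_suspicious_title();

The application runs the INSERT inside an open transaction, watches for the
notice, and then issues COMMIT or ROLLBACK depending on the user's answer.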


 Regards,
 Dawid



Re: [GENERAL] Many-To-Many Bridge Table Index

2005-09-30 Thread Dawid Kuroczko
On 9/30/05, Sergei Dubov [EMAIL PROTECTED] wrote:
Hi guys,

I'd really appreciate if you could clarify this to me. Let's say I have
a table named TABLE_A that is a bridge to a many-to-many relationship
between TABLE_B and TABLE_C, as such here is my declaration:

CREATE TABLE table_a (
  table_b_id INT4 NOT NULL REFERENCES table_b ON DELETE CASCADE ON UPDATE CASCADE,
  table_c_id INT4 NOT NULL REFERENCES table_c ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT table_a_pkey PRIMARY KEY (table_b_id, table_c_id)
);

This will automatically create a unique index on the primary key.
Yes it will. Primary keys in PostgreSQL are implemented
using unique indexes, one may say.
Well, if I do a query later like:

select * from table_a where table_b_id=1;

will the query use an index? Will this be true when I do a join on this table?

The short answer is: yes.

The longer answer is: yes, when the planner sees that the index is the most
efficient way to approach the given query. In some cases using a sequential
scan may be more efficient.

Try EXPLAIN ANALYZE SELECT ...; -- it will tell you what the planner
chose.
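
For the query above, for example:

EXPLAIN ANALYZE SELECT * FROM table_a WHERE table_b_id = 1;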

 Regards,
 Dawid




Re: [GENERAL] Performance woes relating to DISTINCT (I think)

2005-09-27 Thread Dawid Kuroczko
On 9/26/05, boinger [EMAIL PROTECTED] wrote:
Hello.

I'm not sure if this is a question suited for here, the -sql list, or
the -performance list, so if I'm mis-posting, please direct me to the
right list. I was unable to come up with anything relevant from the archives
(though... I am not too sure where to start so I may just have been
looking for the wrong things).

I am trying to convert a bunch of code from MySQL to Postgresql. In
MySQL I was running this particular query with more complexity (with
joins and such) and it takes far less than a second. The Pg
stripped-down version takes over 45 seconds (it was taking over 80
seconds with the joins).


QUERY PLAN
GroupAggregate  (cost=0.00..85168.65 rows=11 width=22) (actual time=3149.916..45578.292 rows=515 loops=1)
Hmm, planner expected 11 rows, got 515

(cost=0.00..85167.23 rows=107 width=22) (actual time=3144.908..45366.147 rows=29893 loops=1)


planner expected 107 rows, got 29893...
I guess the problem here is that the planner has a wrong idea of how your
data looks. Try doing two things:

VACUUM ANALYZE;
(of tables in question or whole database)

If that doesn't help, do increase the statistics target. By default PostgreSQL
uses a statistics target of 10, but you might want to increase it to 50 or even
100. And then rerun VACUUM ANALYZE.
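
For example (the table and column names are just placeholders for the ones
used in your query):

ALTER TABLE your_table ALTER COLUMN your_column SET STATISTICS 100;
ANALYZE your_table;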

If it doesn't help -- please repost the new query plan once again.

 Regards,
 Dawid



Re: [GENERAL] COPY - permission denied

2005-09-23 Thread Dawid Kuroczko
On 9/23/05, John Seberg [EMAIL PROTECTED] wrote:
I have used the COPY command dozens of times! I think I *get* it. But, I'm
getting permission denied errors.

This is a fresh install of Fedora Core 4 (x86). I have Postgresql running
under the user postgres. I am logged in to psql as postgres. The files are
owned by postgres and are -rw---. They are being found - it isn't a file
not found error.
Turn off or configure SELinux. The default policy prohibits
PostgreSQL from reading files from weird places, like
/tmp/ or similar.

 Regards,
 Dawid



Re: [GENERAL] COPY - permission denied

2005-09-23 Thread Dawid Kuroczko
On 9/23/05, John Seberg [EMAIL PROTECTED] wrote:
--- Tom Lane [EMAIL PROTECTED] wrote:
 John Seberg [EMAIL PROTECTED] writes:
  I have used the COPY command dozens of times! I think I *get* it.
  But, I'm getting permission denied errors.
  This is a fresh install of Fedora Core 4 (x86). I have Postgresql
  running under the user postgres. I am logged in to psql as postgres.
  The files are owned by postgres and are -rw---. They are being found -
  it isn't a file not found error.

 Do you have SELinux running in enforcement mode?  If so, it's likely
 denying the server the privilege to read or write anywhere outside
 /var/lib/pgsql.  Check for messages about this in /var/log/messages.

I turned off SELinux, and that seemed to do it.
Thanks, I would have never thought of that.

The thing is - I had *tried* having the files in /var/lib/pgsql (also
/var/lib/pgsql/migrate). Maybe they have to be in /var/lib/pgsql/data?
I'm sure ownership/rights were correct.

No, no. It's not how SELinux policies work. They are
completely orthogonal to ownership/permissions.

There is a process called labelling, during which
each and every file is being marked with a label.
A label describes what kind of data such file is,
for instance config file for Apache or database
file for PostgreSQL (only the labels are symbolic,
not English text).

Then, each process runs in a context, say maybe
a PostgreSQL server context. The context defines
what can it do -- which ports can it bind to, which
files (which labels!) can it read and/or write etc.

If you simply moved files into /var/lib/pgsql/ it probably
wasn't enough -- the files are probably still labelled
as your home directory files or so. You need to
relabel them so PostgreSQL can read them
(I don't remember the command to do so).

And please note that when you try using PL/perl you may
run into problems -- the policy definition may not contain
permissions to read perl modules. And so on, and so on.

SELinux is a fine piece of software, but it's a sturdy beast
to configure and I tend to think that the best approach is
to write policy from scratch tailored to specific needs...

 Regards,
 Dawid



Re: [GENERAL] How many insert + update should one transaction handle?

2005-09-23 Thread Dawid Kuroczko
On 9/23/05, Yonatan Ben-Nes [EMAIL PROTECTED] wrote:
Hi all,

Every few days I need to DELETE all of the content of a few tables and
INSERT new data in them. The amount of new data is about 5 million rows and
each row gets about 3 queries (INSERT + UPDATE).

If I understand you well, in transaction you do:
DELETE FROM table; -- remove all rows, you have a table with 5m dead tuples
then you:
INSERT into the table, -- you have 5m dead tuples + 5m new tuples
then you:
UPDATE what you've inserted -- you have 10m dead tuples + 5m new tuples
and then you:
UPDATE once again what you've inserted -- you have 15m dead tuples + 5m new tuples
and then COMMIT;

Alternatively you may want to do this:
CREATE a TEMPORARY TABLE temp_table and UPDATE it as you please, then:
BEGIN;
DELETE FROM table;
INSERT INTO table SELECT * FROM temp_table;
COMMIT; -- which will leave you with 5m dead and 5m live tuples.

Or you could even try (haven't tested it):
BEGIN;
CREATE TABLE new_table AS SELECT * FROM temp_table;
DROP TABLE table;
ALTER TABLE new_table RENAME TO table;
COMMIT; -- leaving you with a fresh table of 5m new tuples
...with a risk of losing all the changes made to the old table after BEGIN;

 Regards,
 Dawid


Re: [GENERAL] Slow search.. quite clueless

2005-09-21 Thread Dawid Kuroczko
On 9/20/05, Yonatan Ben-Nes [EMAIL PROTECTED] wrote:
Hi all,

I'm building a site where the users can search for products with up to 4
different keywords which all MUST match each product which is found as a
result of the search.

I got 2 tables (which are relevant to the issue :)), one is the product
table (5 million rows) and the other is the keyword table which holds the
keywords of each product (60 million rows).

The scheme of the tables is as follows:

Table public.product
    Column    |     Type      | Modifiers
--------------+---------------+-----------
 product_id   | text          | not null
 product_name | text          | not null
 retail_price | numeric(10,2) | not null
 etc...
Indexes: product_product_id_key UNIQUE, btree (product_id)

Table public.keyword
   Column   | Type | Modifiers
------------+------+-----------
 product_id | text | not null
 keyword    | text | not null
Indexes: keyword_keyword btree (keyword)

The best query which I succeeded to do till now is adding the keyword
table for each keyword searched, for example if someone searches for
belt black pants it will create the following query:

poweraise.com=# EXPLAIN ANALYZE SELECT
product_id,product_name,product_image_url,short_product_description,
long_product_description,discount,discount_type,sale_price,retail_price
FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword
t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13;

QUERY PLAN
 Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual time=969.798..1520.354 rows=6 loops=1)
   ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual time=969.794..1520.337 rows=6 loops=1)
         Hash Cond: (outer.product_id = inner.product_id)
         ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612) (actual time=82.266..1474.018 rows=156 loops=1)
               ->  Hash Join  (cost=18867.07..2581181.09 rows=55309 width=34) (actual time=82.170..1462.104 rows=156 loops=1)
                     Hash Cond: (outer.product_id = inner.product_id)
                     ->  Index Scan using keyword_keyword on keyword t2  (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167 rows=109007 loops=1)
                           Index Cond: (keyword = 'black'::text)
                     ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual time=16.145..16.145 rows=0 loops=1)
                           ->  Index Scan using keyword_keyword on keyword t1  (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294 loops=1)
                                 Index Cond: (keyword = 'belt'::text)
               ->  Index Scan using product_product_id_key on product  (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1 loops=156)
                     Index Cond: (product.product_id = outer.product_id)
         ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual time=42.863..42.863 rows=0 loops=1)
               ->  Index Scan using keyword_keyword on keyword t3  (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932 loops=1)
                     Index Cond: (keyword = 'pants'::text)
 Total runtime: 1521.441 ms
(17 rows)

Sometimes the query works fast even for 3 keywords but that doesn't help
me if at other times it takes ages

Hmm, JOIN on a Huge table with LIMIT. You may be suffering from
the same problem I had:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php

Tom came up with a patch which worked marvellously in my case:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php

Try applying this patch, it may solve your problem!

 Regards,
 Dawid



Re: [GENERAL] cobol storedprocedures

2005-08-17 Thread Dawid Kuroczko
On 8/15/05, MICHAEL BATTANI [EMAIL PROTECTED] wrote:
 I've been scanning the postgres website and yours to find out any
 information on cobol stored procedures.  Is there any plans on incorporating
 this
 in future releases?

I don't think anyone is working on such a thing right now.

The development of procedural languages usually follows this route:
1. Someone skilled in C needs some procedural language, be it Perl,
Python, Ruby, etc.
2. This person hacks glue code in C for such a language -- this
step is actually relatively easy -- you just have to create code
similar to the already existing procedural languages.
3. This person releases the code, probably as a PgFoundry project,
announces it and so on.
4. If the language receives a significant response it may be moved into the core
system.  If it does not, or for some reason (it is not mature enough, the user
base is too small, nobody feels a need to drive the process), it is
still available as a PgFoundry or similar project -- you have to
download it separately -- as is the case with PL/Ruby, PL/Java, PL/J.

The problem is finding that 'someone'.  The law of large numbers states
that given a large enough population of developers of that language, you
will find this 'someone' in the group. ;)  Personally I do not know Cobol,
do not know any active Cobol coders and do not know any Cobol
implementation's internals (how difficult it would be to plug it in as
an embedded language).

One question you have to answer yourself is: what do you need Cobol for?
There is a high chance that PL/perl, PL/python, PL/ruby or PL/R will do the
thing you need, and they have the advantage of already being there.

   Regards,
  Dawid

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Converting MySQL tinyint to PostgreSQL

2005-07-21 Thread Dawid Kuroczko
On 7/19/05, Jim C. Nasby [EMAIL PROTECTED] wrote:
  CREATE TABLE sample1 (
  a boolean,
  b int,
  c boolean
  );
 
  ...it will take more storage than:
 
  CREATE TABLE sample2 (
  b int,
  a boolean,
  c boolean
  );
 
 Actually, I believe that's the case with just about every database,

I tried making alternating int and boolean fields (8 columns total), and
the loss due to padding was around 30%.

Out of curiosity I repeated the test using MySQL 4.1 MyISAM (alternating
int and tinyint fields versus ints first, then tinyints) -- the resulting files
had the same size.  So, for this case, MySQL MyISAM either reorders
data or stores data without padding.

   Regards,
  Dawid

 though of course each one has different alignment constraints. The point
 is that I don't know of any database that will silently re-order fields
 under the covers to optimize storage.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] PG based wiki engine

2005-07-18 Thread Dawid Kuroczko
On 7/18/05, Hannes Dorbath [EMAIL PROTECTED] wrote:
 Is there such a thing? PHP would be preferable, I don't want to maintain
 a MySQL-Server for such a small thing..
 
 Thanks in advance

I have used phpwiki (http://phpwiki.sourceforge.net/) with PostgreSQL.  It's not
really a full blown wiki system, but for an intranet wiki or a small-user-base
wiki it serves just right.

   Regards,
 Dawid

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to create unique constraint on NULL columns

2005-07-18 Thread Dawid Kuroczko
On 7/18/05, Andrus [EMAIL PROTECTED] wrote:
  That's a lot of overhead for doing something very simple, like defining a
  department key that means ALL and a row in the foreign table for it to
  point to.  Maintaining indices is a nontrivial performance trade-off.
 
 Yes, adding department ALL may be simpler solution.
 However, I reference department table from many other tables. In those other
 tables, department ALL is NOT ALLOWED.
 
 If I add ALL to department table, I must restrict all other tables of having
 ALL department. This is a big work and cannot be done nicely in Postgres.

Not true. :)  You simply need to add CHECK (department_id <> 0) (assuming
0 is the ID of the ALL department).  You can even CREATE DOMAIN with this
check built in to save you some typing. :)
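
A sketch of the domain approach (the names are made up, and it assumes 0 is
the ALL department; the foreign key to the department table is left out here):

CREATE DOMAIN real_department_id AS integer CHECK (VALUE <> 0);

-- a table where the ALL department must not appear:
CREATE TABLE timesheet (
    timesheet_id  serial PRIMARY KEY,
    department_id real_department_id NOT NULL
);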

If, for some reason, you want to be sure that the 'ALL departments' row is not
visible, you can create a view which will SELECT ... WHERE department_id <> 0;

Basically -- I think you should get some pre-declared values, like a
department_id of 0, and simply restrict it where it is not allowed.  It's
better than forcing NULL to become a value. :)

   Regards,
   Dawid

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] How to create unique constraint on NULL columns

2005-07-18 Thread Dawid Kuroczko
On 7/18/05, Andrus [EMAIL PROTECTED] wrote:
  If I add ALL to department table, I must restrict all other tables of
  having
  ALL department. This is a big work and cannot be done nicely in Postgres.
 
  Not true. :)  You simply need to add CHECK (departament_id  0) (assuming
  0 is the ID of ALL departaments.  You can even CREATE DOMAIN with this
  check built in to save you some typing. :)
 
  If, for some reason, you want to be sure that 'ALL deparaments' is not
  visible, you can create a view which will SELECT WHERE departament  0;
 
  Basically -- I think  you should get some pre-declared values, like
  departament_id
  of 0 and simply restrict it where it is not allowed.  It's better than
  forcing NULL
  to become a value. :)
 
 Dawid,
 
 I have meaningful primary key in department table (department code used
 inside enterptise), not a surrogate number (I use meaningful primary keys
 whenever possible).

OK, so then just define the UNIQUE INDEX to be exactly what you need,
for example:

CREATE UNIQUE INDEX my_special_unique ON permission (
  user_id,
  permission_id,
  (department_id IS NULL),
  (CASE when department_id IS NULL THEN 0 ELSE department_id END)
);

This should work for any department_id type (just replace the 0 in the CASE
with a dummy value of that type).  With one little drawback: a person can have
permission for ALL departments (NULL) _and_ also an explicit permission for
any of the already existing ones.

HTH, HAND

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to create unique constraint on NULL columns

2005-07-18 Thread Dawid Kuroczko
On 7/18/05, Dawid Kuroczko [EMAIL PROTECTED] wrote:
 On 7/18/05, Andrus [EMAIL PROTECTED] wrote:
  I have meaningful primary key in department table (department code used
  inside enterptise), not a surrogate number (I use meaningful primary keys
  whenever possible).
 
 OK, so then just define the UNIQUE INDEX to be exactly what you need,
 for example:

Or better:
CREATE UNIQUE INDEX permission_unique_key ON permission (user_id, permission_id, department_id);
CREATE UNIQUE INDEX permission_uninull_key ON permission (user_id, permission_id)
WHERE department_id IS NULL;

...you may want to add WHERE department_id IS NOT NULL to the first index
(or not).  Anyway -- this way these indexes can be used by searches more easily
than the previous index I sent.

Regards,
Dawid

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Converting MySQL tinyint to PostgreSQL

2005-07-18 Thread Dawid Kuroczko
On 7/18/05, Lincoln Yeoh lyeoh@pop.jaring.my wrote:
 However, maybe padding for alignment is a waste on the disk - disks being
 so much slower than CPUs (not sure about that once the data is in memory ).
 Maybe there should be an option to reorder columns so that less space is
 wasted.

Out of curiosity, do I understand right that if I create a table

CREATE TABLE sample1 (
a boolean,
b int,
c boolean
);

...it will take more storage than:

CREATE TABLE sample2 (
b int,
a boolean,
c boolean
);

...I don't think such ordering should matter, but I would like to know
how it really is. :)

   Regards,
   Dawid

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] How to create unique constraint on NULL columns

2005-07-15 Thread Dawid Kuroczko
On 7/15/05, Andrus [EMAIL PROTECTED] wrote:
 CREATE TABLE test( col1 CHAR NOT NULL, col2 CHAR,
   UNIQUE (col1, col2) );
 INSERT INTO test VALUES ( '1', NULL );
 INSERT INTO test VALUES ( '1', NULL );
 does NOT cause error!
 
 How to create constraint so that NULL values are treated equal and second
 insert is rejected ?

Please read:
http://www.postgresql.org/docs/8.0/interactive/indexes-unique.html
...or this list archives.

In short: NULL is not equal to NULL.  NULL is a state, not a value.
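
You can see it for yourself:

SELECT NULL = NULL;   -- yields NULL (unknown), not true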

   Regards,
  Dawid

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Converting MySQL tinyint to PostgreSQL

2005-07-13 Thread Dawid Kuroczko
On 7/13/05, Tom Lane [EMAIL PROTECTED] wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Personally I would settle for a fuller set of small fixed size datatypes. 
  The
  char datatype is pretty much exactly what's needed except that it provides
  such a quirky interface.
 
 I'm not actually against inventing an int1/tinyint type.  I used to be
 worried that it would screw up the numeric datatype promotion hierarchy
 even more than it already was screwed up :-( ... but I think we have
 dealt with most of those issues now.  It'd be worth trying anyway ---
 much more so than trying to optimize char(1), IMHO.

The problem with int1 type is that the smaller the value, the more
push for unsigned types...  I think it may be worth doing, but is not
exactly the problem -- smallint is fine for most of situations.  The
only place where I was unhappy with signed integers was... int4
(I wanted to put full 32bit unsigned values, so I had to use bigint;
with a couple of million rows it's a bit of a waste ;)).

As for the char/varchar type -- I was wondering.  Worst case
scenario for UTF-8 (correct me on this) is when 1 character
takes 4 bytes.  And the biggest problem with char/varchar is that
the length indicator takes 4 bytes...  How much overhead would
it be to make the length field variable, for example:

(var)char(1)-char(63) -- 1 byte length + string
char(64)-char(16383) -- 2 byte length + string
char(16384)-text -- 4 byte length + string, like now

This would reduce length of char(5) string from 9 bytes to
6 bytes, char(2) from 6 bytes to 3 bytes (for multibyte chars
it would be a win also).

I don't know the internals too well (read: at all), but I guess there
would be a problem of choosing which length of length to use --
would it be possible to make some sort of on-the-fly mapping
when creating tables -- varchar(224) is text_2bytelength,
text is text_4bytelength, char(1) is text_1bytelength...

   Regards,
Dawid

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Converting MySQL tinyint to PostgreSQL

2005-07-13 Thread Dawid Kuroczko
On 7/12/05, Joe [EMAIL PROTECTED] wrote:
 Dawid Kuroczko wrote:
  smallint takes two bytes.  Numeric(1) will take around 10 bytes and char(1) 
  will
  take 5 bytes (4 bytes for length of data).
 I never would've imagined *that* amount of overhead for CHAR(1)!  I would've
 imagined that it would take up one byte (or two with a NULL indicator).  After
 all, we're not talking about VARCHAR(1) [which is sort of useless].  Don't the
 catalogs know the declared length and if so, why the length overhead?  I'm 
 also
 surprised --albeit less-- about the NUMERIC(1) overhead.  Is any of this
 discussed in the Internals chapters?  I didn't see this discussed elsewhere.

It is all described in
http://www.postgresql.org/docs/8.0/interactive/datatype.html
with the space requirements given above.  No one hides it, it's all there in
black and white. :-)

 As a perhaps-related aside, I've noticed several examples which declare string
 types as 'text', rather than VARCHAR or CHAR, the former being non-SQL 
 standard.
 Is there some performance benefit to using 'text' (other than it being shorter
 to type :-) or is it just the usual Postgres way?

Some time ago people used text as a way of making an easily changeable varchar,
say:
col text CHECK (length(col) <= 100)
...when the person sees that 100 is too small it is just a matter of changing the
CHECK constraint.

...but that was a long time ago, and since then PostgreSQL has been able to
change the type of a column with no problems, and efficiently.
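
For example (the names are made up):

ALTER TABLE some_table ALTER COLUMN col TYPE varchar(200);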

Regards,
 Dawid

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Converting MySQL tinyint to PostgreSQL

2005-07-12 Thread Dawid Kuroczko
On 7/12/05, Joe [EMAIL PROTECTED] wrote:
 I have a MySQL database that I'm converting to PostgreSQL which has 10 columns
 with TINYINT type, i.e., a one-byte integer.  Only one of them qualifies as a
 true BOOLEAN.  Two are entity identifiers (for limited range classes or
 categories) and three others are type/code values.  The last four are 
 month
 numbers or day of month numbers (not implemented as date types because the 
 year
 is separate and is part of the primary key).
 
 I'm wondering what would be the best conversion choice for these columns:
 smallint, numeric(1), char(1), something else?  AFAICT, the application does 
 not

smallint takes two bytes.  Numeric(1) will take around 10 bytes and char(1) will
take 5 bytes (4 bytes for length of data).  The closest match is smallint which
is reasonably small and will do the trick.  Remember that PostgreSQL doesn't
have unsigned types.

This may seem a terrible waste of disk space at first glance, but keep
in mind that PostgreSQL's power lies in an intelligent planner.  You can
safely normalize data and create views which emulate the one-table layout,
which both simplifies development and can be more efficient in terms of
storage and raw speed.

 directly do any arithmetic on these columns, but the identifier and code types
 are used as indices into PHP arrays.  The month/day values are formatted into
 dates for display (and the month also servers as an index into a month-name 
 array).

You should consider using date / time / timestamp column for dates and use
functions like extract(day from mydate).  Remember you can use functional
indexes, so if you need data to be indexed by day, you can:
CREATE INDEX dayindex ON sometable ((extract(day from datecolumn)));
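...and a query like the following can then use that index:
SELECT * FROM sometable WHERE extract(day from datecolumn) = 15;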

   Regards,
 Dawid

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] About unsigned smallint?

2005-07-06 Thread Dawid Kuroczko
On 7/6/05, Ying Lu [EMAIL PROTECTED] wrote:
 Greetings,
 
 Can I know whether postgreSQL 8.0 supports unsigned smallint please? I
 looked at the doc, it seems that OID is unsigned interger. While I was
 trying to create a simple table as:
   create table test (id unsigned smallint);
  or
   create table test (id smallint unsigned);
 
 It seems that postgreSQL did not support unsigned integer?

Well, PostgreSQL doesn't have unsigned types, unless you create
your own.  If you want to have unsigned type, you can add a check
constraint or, even better, create a domain:

CREATE DOMAIN usmallint AS smallint CHECK (VALUE >= 0);

...while this gives you an unsigned smallint type, it's probably not
what you wanted.  If you wanted a type which takes two bytes of
storage and stores values from 0 to 65535 then, well... this is not it.

If you ask here, you'll probably get a good explanation of why there
aren't unsigned types.  My guess is that unsigned types add
complexity which is not really justified by their usefulness, but
that's only a guess.

If you need unsigned-like type for data consistency reasons, just
CREATE DOMAIN as shown above.
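
For example, using the domain above:

CREATE TABLE test (id usmallint);
INSERT INTO test VALUES (1);    -- fine
INSERT INTO test VALUES (-1);   -- rejected by the domain's CHECK constraint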

  Regards,
  Dawid

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Hot to restrict access to subset of data

2005-07-03 Thread Dawid Kuroczko
On 7/4/05, Gregory Youngblood [EMAIL PROTECTED] wrote:
 I would strongly suggest that you create a database specific user,
 one that has read/write access within this database, and that your
 application use that user instead of the pg super user.
 
 In general, the super user should never be used, except for
 specific administrative tasks. This holds true for Windows
 Administrator, Unix root, and postgresql's postgres users. If your
 application runs under a single user to the database, then that
 single user should be one that you create specifically for that
 purpose, and not the postgres user.

Exactly.  And the reasons are quite important too.  The PostgreSQL
superuser has the right to run untrusted code.  Let's assume the
unlikely situation that someone finds a hole in your page which
allows her to do some SQL injection etc.  A normal user
is limited to what that user can do -- in your case, probably wipe
out much of the data.  But a superuser has the right to create functions
in untrusted languages; in other words, the right to execute almost
any command in the name of the UNIX postgres user.  If the abuser is
skillful, she can run some local root exploit and gain root privileges,
assuming there is some local hole open.  When using a normal
user (who owns all the tables and so on, so is not limited from the
point of view of the application), it would be (much) harder for her
to gain such access.

   Regards,
  Dawid

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Which record causes referential integrity violation on delete

2005-07-02 Thread Dawid Kuroczko
On 7/1/05, Andrus [EMAIL PROTECTED] wrote:
 In Postgres 8 I tried commad
 
 DELETE FROM customer WHERE id=123
 
 but got an error
 
 ERROR:  update or delete on customer violates foreign key constraint
 invoice_customer_fkey on invoice'
 
 How to determine the primary key of invoice table which causes this error
 in generic way ?

Well, I am not sure, but the information you want may be contained in
information_schema.key_column_usage and 
information_schema.referential_constraints

From psql client, simply use \d command.

 Why Postgres does not report primary key value in error message ?

My guess is that it would add unnecessary complication.  And maybe
there would also be some performance hit, but I am not sure about the
latter.

 I it is not possible to add more information to error I need function which
 takes 3 arguments:
 
 table name ('customer')
 field name  ('id')
 field value (123)
 
 and returns the name of the table and primary key value which blocks
 record deletion.

Hmm, let's try to do it another way.  You know that the constraint causing
the problem was invoice_customer_fkey.

So you need to:
SELECT unique_constraint_schema,unique_constraint_name FROM
information_schema.referential_constraints WHERE constraint_schema =
'public' AND constraint_name = 'invoice_customer_fkey';

Supposedly, it will return 'public', 'invoice_pkey' values.

This gives you information about which constraint 'really holds' your delete.
Then do:
SELECT table_schema,table_name,column_name FROM
information_schema.key_column_usage WHERE constraint_schema = 'public'
AND constraint_name = 'invoice_pkey' ORDER BY ordinal_position;

This will give, for example:

'public' | 'invoice' | 'year'
'public' | 'invoice' | 'month'
'public' | 'invoice' | 'id'

(assuming invoices are identified by date and this month's order id).

So now you know that the way to get the primary key that blocks you from
removing the data is:

SELECT year, month, id FROM invoice WHERE the-foreign-key-columns = the-value-you-tried-to-delete;


   Regards,
  Dawid

PS: It is possible to script steps similar to these using PL/pgSQL, it's
not that difficult actually.  But I would tend to think that it would be better
if the client (the application) knew the data and was able to handle
such situations.  I.e. if there is a FK violation on customers, present
the user with a list of the customer's undeleted invoices and ask her whether
they should be removed.  Handling it all behind the scenes in the backend may
not be the best solution.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Which record causes referential integrity violation on delete

2005-07-02 Thread Dawid Kuroczko
On 7/2/05, Dawid Kuroczko [EMAIL PROTECTED] wrote:
 On 7/1/05, Andrus [EMAIL PROTECTED] wrote:
  In Postgres 8 I tried commad
 
  DELETE FROM customer WHERE id=123
 
  but got an error
 
  ERROR:  update or delete on customer violates foreign key constraint
  invoice_customer_fkey on invoice'
 
  How to determine the primary key of invoice table which causes this error
  in generic way ?
 
 Well, I am not sure, but information you want may be contained in
 information_schema.key_column_usage and
 information_schema.referential_constraints
 
[,,,]

I forgot to add, this is of course a simplistic approach which:
1. may be simply wrong
2. assumes the data is available to the user in information_schema (I guess the
information schema lists only data owned by the user; yet I am not sure
about that).
3. assumes the foreign keys have a really simple setup (no FKs depending on
FKs depending on FKs...)
4. and so on.

In other words, the best approach would be to know thy schema in the
application. ;)))

  Regards,
 Dawid

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] COnsidering a move away from Postgres

2005-06-30 Thread Dawid Kuroczko
On 6/30/05, Jason Tesser [EMAIL PROTECTED] wrote:
 I work for a college and we use PG currently as our main backend.  We are
 currently developing with Java.  We are considering moving away from postgres
 for the reasons I am going to list below.  I would appreciate some thoughts
 from the Postgres community on way we should or shouldn't leave postgres.

Out of curiosity, what other backends are you considering and what is their
syntax for such problems?  Don't get me wrong, I don't intend to prove
anything by asking.  I am just curious what syntax you would prefer,
or in other words, what syntax is most convenient for a person doing a
procedural-language-intensive project.  Hopefully it will help PL/pgSQL
develop in the best direction.

So, please post samples of syntax (and a DB-name, I'm curious about
other DBs syntaxes).

   Regards,
 Dawid

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[GENERAL] Inherits and get highest id.

2005-06-10 Thread Dawid Kuroczko
Hello.

I've tried data partitioning using INHERITS mechanism (pgsql 8.0.3).

The schema looks like this:

CREATE TABLE log (
logid integer NOT NULL PRIMARY KEY,
logdate timestamp(0) without time zone NOT NULL,
typeid integer NOT NULL,
ip  inet,
[.]
);

CREATE TABLE log_data.log_200501 () INHERITS (log);
CREATE TABLE log_data.log_200502 () INHERITS (log);
...
CREATE TABLE log_data.log_200512 () INHERITS (log);

Each month-table has over 2 million rows.
Each table has a primary key index on the logid column.

And now, if I do a simple query like this:

explain analyze SELECT logid FROM log_200501 ORDER BY logid LIMIT 1;
 
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.113..0.115 rows=1 loops=1)
   ->  Index Scan using log_200501_pkey on log_200501  (cost=0.00..168766.23 rows=4149475 width=4) (actual time=0.108..0.108 rows=1 loops=1)
 Total runtime: 0.196 ms

But when I'm trying to get the globally highest logid I get:

qnex=# explain SELECT logid FROM log ORDER BY logid LIMIT 1;
                              QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=4140892.91..4140892.91 rows=1 width=4)
   ->  Sort  (cost=4140892.91..4201607.87 rows=24285986 width=4)
         Sort Key: logs.log.logid
         ->  Result  (cost=0.00..887109.86 rows=24285986 width=4)
               ->  Append  (cost=0.00..887109.86 rows=24285986 width=4)
                     ->  Seq Scan on log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200501 log  (cost=0.00..155529.75 rows=4149475 width=4)
                     ->  Seq Scan on log_200502 log  (cost=0.00..145904.29 rows=3857729 width=4)
                     ->  Seq Scan on log_200503 log  (cost=0.00..165485.78 rows=4369278 width=4)
                     ->  Seq Scan on log_200504 log  (cost=0.00..420093.74 rows=11908874 width=4)
                     ->  Seq Scan on log_200505 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200506 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200507 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200508 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200509 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200510 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200511 log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200512 log  (cost=0.00..10.70 rows=70 width=4)
(18 rows)

Time: 3,142 ms



In other words -- doing such a one-row select means scanning the
whole partitioned data set.  The primary key index on logid is not used
anywhere.  Creating a view (SELECT * UNION ALL SELECT * UNION ALL ...)
does not help either.

I wonder -- is there any better solution?  Selecting the highest from each
table, unioning them and then selecting the highest of the highest works
well, but it strikes me as 'not exactly the nicest implementation'
(one of the reasons is that I have to remember about it each time a
new table is added to the schema...).

Any comments, hints?

Regards,
Dawid

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Just a crazy idea!

2005-05-26 Thread Dawid Kuroczko
On 5/26/05, Hrishikesh Deshmukh [EMAIL PROTECTED] wrote:
 Is it possible to connect a DB in Postgresql to a DB in MySQL!
 I know its a crazy idea!

Why, of course.  I've been doing that.

All you need is to write a set of functions, for example in PL/perlU,
some of them being set returning functions.

For my purposes I wrote a function which connects to MySQL to
call its encrypt function, and a function which returns the output of
show databases and similar stuff.

Basically -- yes you can do it, though things get tricky if you want
to have _read_ access to _big_ tables. :)
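
A minimal sketch of such a function (the connection parameters are made up,
and it assumes plperlu is created in the database and DBD::mysql is installed):

CREATE OR REPLACE FUNCTION mysql_encrypt(text, text) RETURNS text AS $$
    use DBI;
    # connect to the MySQL server; adjust dsn/user/password to your setup
    my $dbh = DBI->connect('dbi:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });
    my ($res) = $dbh->selectrow_array('SELECT ENCRYPT(?, ?)', undef, @_);
    $dbh->disconnect;
    return $res;
$$ LANGUAGE plperlu;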

   Regards,
  Dawid

PS: For the reference, why do you need to connect to mysql?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Update on tables when the row doesn't change

2005-05-25 Thread Dawid Kuroczko
On 5/24/05, Sebastian Böck [EMAIL PROTECTED] wrote:
 /* 3rd way of separating updates
con: unnecessary updates on tables
pro: view gets evaluated only 1 time
 
Not adressing the problem of unnecessary updates, but the view
gets only evaluated one time.
 
 */
 
 CREATE OR REPLACE FUNCTION upd (view_test) RETURNS VOID AS $$
 DECLARE
 NEW ALIAS FOR $1;
 BEGIN
 RAISE NOTICE 'UPDATE';
 UPDATE test SET test = NEW.test WHERE id = OLD.id;
 UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id;
 UPDATE join2 SET text2 = NEW.text2 WHERE id = OLD.id;
 UPDATE join3 SET text3 = NEW.text3 WHERE id = OLD.id;
 RETURN;
 END;
 $$ LANGUAGE plpgsql;

Control question, I didn't check it, but would it be enough to change from:
   UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id;
to:
   UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id AND text1 <> NEW.text1?

...  I may be wrong. :)

   Regards,
 Dawid

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Update on tables when the row doesn't change

2005-05-25 Thread Dawid Kuroczko
On 5/25/05, Sebastian Böck [EMAIL PROTECTED] wrote:
 CREATE OR REPLACE FUNCTION upd (view_test) RETURNS VOID AS $$
 DECLARE
 NEW ALIAS FOR $1;
 BEGIN
 RAISE NOTICE 'UPDATE';
 UPDATE test SET test = NEW.test WHERE id = OLD.id;
 UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id;
 UPDATE join2 SET text2 = NEW.text2 WHERE id = OLD.id;
 UPDATE join3 SET text3 = NEW.text3 WHERE id = OLD.id;
 RETURN;
 END;
 $$ LANGUAGE plpgsql;
 
 
  Control question, I didn't check it, but would it be enough to change from:
 UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id;
  to:
 UPDATE join1 SET text1 = NEW.text1 WHERE id = OLD.id AND text1 <>
  NEW.text1?
 
  ...  I may be wrong. :)
 
Yes, that's more elegant than my other (4th) solution.
It was late yesterday evening ;)

Be wary of the NULL values though. :)  Either don't use them, add
something like 'AND (text1 <> NEW.text1 OR text1 IS NULL OR NEW.text1
IS NULL)' or something more complicated. :)
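
Or, perhaps nicer, use IS DISTINCT FROM, which treats NULLs the way you
want here:

UPDATE join1 SET text1 = NEW.text1
WHERE id = OLD.id AND text1 IS DISTINCT FROM NEW.text1;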

   Regards,
 Dawid

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

