[GENERAL] Incorrect "ERROR: database "xxx" is being accessed by other users"

2007-03-12 Thread Dmitry Koterov

Hello.

I have a database which I cannot delete (the same for rename):

#  psql -d template1
=# drop database xxx;
ERROR:  database "xxx" is being accessed by other users

But it is NOT accessed by anybody, because:
1. ps ax|grep post shows that there are NO active connections to database
xxx;
2. I restarted the postmaster and immediately tried to drop the database - the
same error message;
3. I stopped the web server, so now it is GUARANTEED that there are no
connections :-)

What other diagnostics could I do?

P. S.
This database has other strange symptoms, like:
1. VACUUM FULL VERBOSE ANALYZE hangs in the middle of this database and
never finishes;
2. there are some problems when I create a foreign key to a particular table:
it also hangs (a postmaster restart does not correct this);
3. the database is 2-3 times slower than it could be (if I restore it from a
backup to a new place, it works 2-3 times faster).

In practice I have already dumped and restored this database to a new location
(xxx_new) and work with that copy only, but I want to remove the old (broken?)
database xxx and cannot...


Re: [GENERAL] Incorrect "ERROR: database "xxx" is being accessed by other users"

2007-03-13 Thread Dmitry Koterov

Yes, I have one!
How do I remove it now? I tried DEALLOCATE for the gid returned by

select * from pg_prepared_xacts;

but it says "prepared statement does not exist"...
A database restart does not remove the prepared transaction either...


On 3/13/07, Tom Lane <[EMAIL PROTECTED]> wrote:


"Dmitry Koterov" <[EMAIL PROTECTED]> writes:
> I have a database which I cannot delete (the same for rename):
> ERROR:  database "xxx" is being accessed by other users
> But it is NOT accessed by anybody, because:

Have you checked for prepared transactions in that DB?  See
pg_prepared_xacts view (I've been burnt by that myself...)

regards, tom lane



Re: [GENERAL] Incorrect "ERROR: database "xxx" is being accessed by other users"

2007-03-13 Thread Dmitry Koterov

Thanks a lot! Everything works now!

So I propose to change the error message from

ERROR:  database "xxx" is being accessed by other users

to

ERROR:  database "xxx" is being accessed by other users, or prepared
transactions exist in it (please use "SELECT * FROM pg_prepared_xacts"
and "ROLLBACK PREPARED ..." to fix this)

in a new PG version. Is this possible?


On 3/13/07, Michael Fuhr <[EMAIL PROTECTED]> wrote:


On Tue, Mar 13, 2007 at 11:41:46AM +0300, Dmitry Koterov wrote:
> Yes, I have one!
> How to remove it now? I tried DEALLOCATE for gid returned by
>
> select * from pg_prepared_xacts;
>
> but it says "prepared statement does not exist"...

DEALLOCATE is for prepared *statements*; you have a prepared
*transaction*.  Connect to the database you're trying to drop and
use ROLLBACK PREPARED or COMMIT PREPARED, then disconnect from that
database and try dropping it again.
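
A minimal psql sketch of that sequence (the gid literal below is a placeholder
for whatever pg_prepared_xacts actually returns):

#  psql -d xxx
=# SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
=# ROLLBACK PREPARED 'gid-from-previous-query';   -- or COMMIT PREPARED '...'
=# \q
#  psql -d template1
=# DROP DATABASE xxx;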

--
Michael Fuhr




[GENERAL] How to disable displaying of a NOTICE context?

2007-03-13 Thread Dmitry Koterov

Hello.

For debugging purposes I use RAISE NOTICE ... inside my stored
functions/triggers. But there is a little problem: if I run such a
function/trigger from the psql command-line client (for example), it shows not
only the notice text, but also the caller context (starting with the "CONTEXT"
keyword), which looks quite huge and ugly.

Is there any way to disable the CONTEXT display while still showing the NOTICE
text?


Re: [GENERAL] How to disable displaying of a NOTICE context?

2007-03-13 Thread Dmitry Koterov

Thanks!

With RAISE NOTICE the "terse" mode works great, CONTEXT is not shown.

But does "terse" also disable the CONTEXT display on fatal errors inside a
stored function? If it does, it is not as useful as it could be, because I
don't think that hiding the context of a fatal error is a good idea... There
is no documentation about this at all.


On 3/13/07, Tom Lane <[EMAIL PROTECTED]> wrote:


"Dmitry Koterov" <[EMAIL PROTECTED]> writes:
> Is there any way to disable CONTEXT displaying with still enabled
NOTICEs
> text?

You could do "\set VERBOSITY terse"

regards, tom lane
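
For the archive, a minimal psql sketch of the suggestion (my_debug_func is a
placeholder for any function or trigger that issues RAISE NOTICE):

=# \set VERBOSITY terse
=# SELECT my_debug_func();   -- the NOTICE text is printed without the CONTEXT block
=# \set VERBOSITY default    -- restores the full output, including CONTEXT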



[GENERAL] Creation of a read-only role.

2007-03-16 Thread Dmitry Koterov

Hello.

When we start using any replication system (e.g. Slony) we need to create
a "read-only" role for accessing the database. This role must be able to read
anything, but should NOT be able to INSERT, UPDATE or DELETE any
database objects.

Overall, we need 3 roles:

1. Administrator: can do anything with a database (this user already exists
by default - "postgres").
2. Read-only: can only read. Used on all slave nodes.
3. Read-write: can write, but cannot change the database schema. Used on the
master node only.

Is there any way to easily create and maintain these standard roles?

For now I have written a stored procedure which iterates over pg_catalog and
runs a lot of REVOKE & GRANT commands (see the sketch below), but it does not
seem to be a universal solution, because:

1. I have to re-run this procedure after every change to the database schema.
(Very bad! Can we avoid it?)
2. It looks like a "brute-force" method, and nothing is said about it in the
Slony documentation (strange).
3. In MySQL (for example) there is a one-command way to create these three
roles.

Again, these 3 roles seem to be a de-facto standard for replication
systems, but I found nothing about this question on Google.
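
A minimal sketch of such a procedure, assuming a role named "readonly" and
granting only SELECT on every user table (all names are illustrative, not from
the thread):

CREATE ROLE readonly LOGIN PASSWORD 'secret';

CREATE OR REPLACE FUNCTION grant_readonly() RETURNS void AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT schemaname, tablename
             FROM pg_tables
             WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE 'GRANT SELECT ON '
             || quote_ident(r.schemaname) || '.' || quote_ident(r.tablename)
             || ' TO readonly';
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT grant_readonly();   -- must be re-run after schema changes, as noted above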


Re: [GENERAL] Creation of a read-only role.

2007-03-18 Thread Dmitry Koterov

actually - you dont need the read-only role, if this is only for slave
nodes. slony takes care about the issue and doesn't allow any writes
on slaves.

Great!

But what about a role which can modify the data, but cannot modify the
database schema?

On 3/17/07, hubert depesz lubaczewski <[EMAIL PROTECTED]> wrote:


On 3/16/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> Overall, we need 3 roles:
> 1. Administrator: can do anything with a database (by default this user
is
> already exists - "postgres").
> 2. Read-only: can only read. Runs on all slave nodes.

actually - you dont need the read-only role, if this is only for slave
nodes. slony takes care about the issue and doesn't allow any writes
on slaves.

depesz




Re: [GENERAL] Creation of a read-only role.

2007-03-18 Thread Dmitry Koterov

Oh, sorry for the previous question - I can create a
schema-changes-disallowed role by revoking the "CREATE" permission on all
the database schemas.
The issue seems to be closed now, thanks.
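
A minimal sketch of that approach, assuming a "readwrite" role and the public
schema (both names are placeholders):

REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE CREATE ON SCHEMA public FROM readwrite;
GRANT USAGE ON SCHEMA public TO readwrite;
-- Existing INSERT/UPDATE/DELETE grants keep working, but the role can no
-- longer CREATE TABLE, CREATE INDEX etc. in the schema.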

On 3/18/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:


> actually - you dont need the read-only role, if this is only for slave
> nodes. slony takes care about the issue and doesn't allow any writes
> on slaves.
Great!

But what about a role which can modify the data, but cannot modify the
database schema?

On 3/17/07, hubert depesz lubaczewski <[EMAIL PROTECTED]> wrote:
>
> On 3/16/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> > Overall, we need 3 roles:
> > 1. Administrator: can do anything with a database (by default this
> user is
> > already exists - "postgres").
> > 2. Read-only: can only read. Runs on all slave nodes.
>
> actually - you dont need the read-only role, if this is only for slave
> nodes. slony takes care about the issue and doesn't allow any writes
> on slaves.
>
> depesz
>
>




Re: [GENERAL] Planner tuning

2007-03-20 Thread Dmitry Koterov

I don't know whether this is your case, but sometimes

SET enable_sort = off;

speeds up some queries by a factor of a hundred. But in some cases this
setting slows things down, so I only TEMPORARILY switch enable_sort off and
back on around particular queries. It affects the query plan greatly.
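
A minimal sketch of scoping the switch to a single query (the query itself is
a placeholder); SET LOCAL keeps the change inside one transaction:

BEGIN;
SET LOCAL enable_sort = off;   -- affects only this transaction
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
COMMIT;                        -- enable_sort reverts automatically afterwards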

On 3/20/07, Alban Hertroys <[EMAIL PROTECTED]> wrote:


Tom Lane wrote:
> Alban Hertroys <[EMAIL PROTECTED]> writes:
>> It seems pretty obvious that the planner underestimates the cost of
>> nestloops here, is there some way to tweak this?
>
> The real problem is the factor-of-a-thousand underestimate of the size
> of this join:

Good observation, I missed that one. Thanks.

>> ->  Nested Loop  (cost=0.00..281.74 rows=2 width=14) (actual time=0.068..14.000 rows=1683 loops=1)
>>   ->  Index Scan using fewo_location_ancestry_full_idx on fewo_location_ancestry ancestor  (cost=0.00..49.34 rows=9 width=4) (actual time=0.024..0.172 rows=41 loops=1)
>>         Index Cond: ((ancestor_id = 309) AND (ancestor_type_id = 12) AND (child_type_id = 10))
>>   ->  Index Scan using fewo_property_location_country_location_idx on fewo_property_location property_location  (cost=0.00..25.80 rows=2 width=18) (actual time=0.009..0.169 rows=41 loops=41)
>>         Index Cond: ((property_location.country_id = 300) AND ("outer".child_id = property_location.location_id))
>>         Filter: (property_state_id = 3)
>
> Have you got up-to-date ANALYZE stats for both of these tables?
> Maybe increasing the statistics targets for them would help.

Yes. This is as of this moment a mostly static development database that
has been vacuumed and analyzed quite recently.

> You may be kind of stuck because of the lack of cross-column statistics
> --- I suppose these columns are probably rather highly correlated ---
> but you should at least try pulling the levers you've got.
>
> One thought is that country_id is probably entirely determined by
> location_id, and possibly ancestor_type_id is determined by ancestor_id.

Actually property.location_id refers to cities, which is the deepest
level in the represented data. Country_id is the top level.

Ancestry id, type and child id, type are indeed closely related. I
changed their representation based on your suggestions.

> If so you should be leaving them out of the queries and indexes;
> they're not doing anything for you except fooling the planner about the
> net selectivity of the conditions.

I tried a few things, but it seems I am quite successful at fooling the
planner...

I changed the indices on our ancestry table to not combine id and type
on the same half of the join; which is something we're in fact never
interested in anyway. This seems to have helped some indeed.

I tried removing country_id from the equation, but I haven't had the
patience to wait for the explain analyzes to complete that way - they
take long.
I implemented it this way as an optimization; I decided to join
property_location with both period_type_property and
property_availability_month using (country_id, property_id) as FK.
That quickly narrows down the number of matching records in those
tables, which an index on property_id only somehow didn't accomplish.

The good news is that I get results under 1s without having to
explicitly sort my subquery results.
The bad news is that the estimated row counts are still quite a bit off.
I analyzed the DB just before generating the attached result.

--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


                                   QUERY PLAN
------------------------------------------------------------------------------
Aggregate  (cost=6780.04..6780.42 rows=1 width=182) (actual time=629.652..629.653 rows=1 loops=1)
  ->  Nested Loop  (cost=1053.78..6779.62 rows=1 width=182) (actual time=160.662..595.059 rows=969 loops=1)
        ->  Hash Join  (cost=1053.78..6688.46 rows=21 width=186) (actual time=160.493..545.222 rows=3522 loops=1)
              Hash Cond: ("outer".property_id = "inner".property_id)
              ->  GroupAggregate  (cost=0.00..5581.97 rows=3500 width=12) (actual time=0.159..363.108 rows=3522 loops=1)
                    ->  Index Scan using fewo_property_availability_month_country_property_idx on fewo_property_availability_month property_availability_month  (cost=0.00..3893.52 rows=34992 width=12) (actual time=0.023..147.269 rows=37316 loops=1)
                          Index Cond: (300 = country_id)
              ->  Hash  (cost=1053.16..1053.16 rows=250 width=86) (actual time=160.277..160.277 rows=3522 loops=1)
                    ->  Ha

Re: [GENERAL] Check the existance of temporary table

2007-03-25 Thread Dmitry Koterov

In stored procedures I used something like

BEGIN
   CREATE TEMPORARY TABLE tmp ...
EXCEPTION
   WHEN duplicate_table THEN ...
END;

See the PG error codes appendix for the exact name (I don't remember exactly,
but it should be the duplicate_table or duplicate_object condition).

On 3/25/07, Tom Lane <[EMAIL PROTECTED]> wrote:


"dfx" <[EMAIL PROTECTED]> writes:
> how I can check  the existance of temporary table?
> I.e. wich query I have to use to know if MY_TEMP_TABLE exists?

As of 8.2 you can do

SELECT ... FROM pg_class
  WHERE relname = 'whatever' AND relnamespace = pg_my_temp_schema();

In earlier releases pg_my_temp_schema() isn't built in, so you have
to do some pushups to determine which schema is your temp schema.

regards, tom lane
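
One such pushup, as a hedged sketch: create a throwaway temp table and look up
which schema it landed in (the probe table name is arbitrary):

CREATE TEMPORARY TABLE my_temp_schema_probe (dummy int);
SELECT n.nspname AS my_temp_schema
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'my_temp_schema_probe'
  AND pg_table_is_visible(c.oid);   -- other sessions' temp schemas are not visible
DROP TABLE my_temp_schema_probe;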




[GENERAL] Temporarily disable all table indices

2007-03-26 Thread Dmitry Koterov

Hello.

I need to perform a mass operation (UPDATE) on each table row, e.g. modify
one table column:

UPDATE tbl SET tbl_text = MD5(tbl_id);

The problem is that if this table has a number of indexes, such an UPDATE
is very, very slow on a large table.

I have to drop all indexes on the table, run the update (very quick),
and after that re-create all the indexes. That is much faster.
Unfortunately the table structure may change in the future (e.g. new
indexes are added), so at this abstraction layer I don't know exactly which
indexes to drop and which to re-create.

Is there any way (or a ready piece of code) to save all existing indexes, drop
them all, and re-create them after the mass UPDATE?


Re: [GENERAL] Temporarily disable all table indices

2007-03-27 Thread Dmitry Koterov

Thanks!

*pg_indexes.indexdef* is exactly what I was looking for!

On 3/27/07, Erik Jones <[EMAIL PROTECTED]> wrote:


On Mar 26, 2007, at 5:24 PM, Dmitry Koterov wrote:

Hello.

I need to perform a mass operation (UPDATE) on each table row. E.g. -
modify one table column:

UPDATE tbl SET tbl_text = MD5(tbl_id);

The problem is that if this table contains a number of indices, such
UPDATE is very very slow on large table.

I have to drop all indices on the table, then run the update (very quick)
and after that - re-create all indices back. It is much more speedy.
Unfortunately the table structure may change in the future (e.g. - new
indices are added), so I don't know exactly in this abstraction layer, what
indices to drop and what - to re-create.

Is any way (or ready piece of code) to save all existed indices, drop them
all and then - re-create after a mass UPDATE?


No, but you can use the pg_indexes view (
http://www.postgresql.org/docs/8.2/interactive/view-pg-indexes.html) to
dynamically determine what indexes a table has.
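
A minimal sketch of the save / drop / re-create cycle built on that view (the
table name is a placeholder; the generated commands still have to be executed
by the caller):

-- Save the definitions before dropping anything:
CREATE TABLE saved_indexes AS
    SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'tbl';

-- Generate the DROP INDEX commands (run their output):
SELECT 'DROP INDEX ' || quote_ident(indexname) || ';' FROM saved_indexes;

-- ... run the mass UPDATE here ...

-- Generate the re-create commands from the saved definitions (run their output):
SELECT indexdef || ';' FROM saved_indexes;

DROP TABLE saved_indexes;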

erik jones <[EMAIL PROTECTED]>
software developer
615-296-0838
emma(r)






[GENERAL] Strange behaviour under heavy load

2007-03-27 Thread Dmitry Koterov

Hello.

I have found that sometimes a heavily loaded PostgreSQL begins to run all the
queries slower than usual, sometimes 5 and more times slower. I cannot
reproduce it at will, but the symptoms are the following: queries work very
fast for 5-10 minutes, and after that there is a significant slowdown (every
query, even a simple one, runs 5-10 and even more times slower); disk write
activity grows too (but not every time - I don't know exactly if there is a
direct correlation). This lasts 2-3 seconds, then performance returns to
normal.

Autovacuum is turned off.

The machine has 2 processors, plenty of memory, and fast SCSI disks.

I understand that this is too little information. Please advise what to
monitor to narrow the problem down.


Re: [GENERAL] Strange behaviour under heavy load

2007-03-28 Thread Dmitry Koterov

New information.

When I run CHECKPOINT manually in the psql console, the effect is fully
reproduced. So it is a checkpoint slowdown.

The question is - how to make CHECKPOINT work faster?

On 3/27/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:


How to tune them?

Now I have:

checkpoint_segments = 5 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 900# range 30-3600, in seconds
checkpoint_warning = 30 # in seconds, 0 is off

No checkpoint warnings in pgsql logs.


On 3/27/07, Oleg Bartunov  wrote:
>
> hmm,
>
> looks like checkpoint ?
> Check checkpoint settings in postgresql.conf.
>
>
> Oleg
> On Tue, 27 Mar 2007, Dmitry Koterov wrote:
>
> > Hello.
> >
> > I have found that sometimes heavy loaded PostgreSQL begins to run all
> the
> > queries slower than usual, sometimes - 5 and more times slower. I
> cannot
> > reprocude that, but symptoms are the following: queries work very fast
> for
> > 5-10 minutes, and after that - significant slowdown (every query, even
> a
> > simple one, works 5-10 and ever more times slower), disk write
> activity
> > grows too (but not everytime - I don't know exactli if there is a
> direct
> > correlation). 2-3 seconds, then - performance restores back to normal.
>
> >
> > Autovacuum is turned off.
> >
> > Machine has 2 processors, huge memory, fast SCSI disks.
> >
> > I understand that there is too less information. Please advice what to
> > monitor for better problem discovering.
> >
>
> Regards,
> Oleg
> _
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru ),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, 
http://www.sai.msu.su/~megera/<http://www.sai.msu.su/%7Emegera/>
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>




Re: [GENERAL] Strange behaviour under heavy load

2007-03-28 Thread Dmitry Koterov

The disk write activity timeline looks like the following:

CHECKPOINT ->
large disk write (30-40 M/s), 1 second ->
4-5 seconds: no read or write activity at all, but the server works slowly ->
2-3 seconds: disk write activity of about 4-5 M/s ->
normal speed is restored


On 3/28/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:


New information.

When I run CHECKPOINT manually in psql console, effect is fully
reproduced.
So - it is a checkpoint slowdown.

The quesion is - how to make CHECKPOINT work faster?

On 3/27/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
>
> How to tune them?
>
> Now I have:
>
> checkpoint_segments = 5 # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 900# range 30-3600, in seconds
> checkpoint_warning = 30 # in seconds, 0 is off
>
> No checkpoint warnings in pgsql logs.
>
>
> On 3/27/07, Oleg Bartunov < oleg@sai.msu.su> wrote:
> >
> > hmm,
> >
> > looks like checkpoint ?
> > Check checkpoint settings in postgresql.conf.
> >
> >
> > Oleg
> > On Tue, 27 Mar 2007, Dmitry Koterov wrote:
> >
> > > Hello.
> > >
> > > I have found that sometimes heavy loaded PostgreSQL begins to run
> > all the
> > > queries slower than usual, sometimes - 5 and more times slower. I
> > cannot
> > > reprocude that, but symptoms are the following: queries work very
> > fast for
> > > 5-10 minutes, and after that - significant slowdown (every query,
> > even a
> > > simple one, works 5-10 and ever more times slower), disk write
> > activity
> > > grows too (but not everytime - I don't know exactli if there is a
> > direct
> > > correlation). 2-3 seconds, then - performance restores back to
> > normal.
> > >
> > > Autovacuum is turned off.
> > >
> > > Machine has 2 processors, huge memory, fast SCSI disks.
> > >
> > > I understand that there is too less information. Please advice what
> > to
> > > monitor for better problem discovering.
> > >
> >
> > Regards,
> > Oleg
> > _
> > Oleg Bartunov, Research Scientist, Head of AstroNet ( www.astronet.ru
> > ),
> > Sternberg Astronomical Institute, Moscow University, Russia
> > Internet: oleg@sai.msu.su, 
http://www.sai.msu.su/~megera/<http://www.sai.msu.su/%7Emegera/>
> > phone: +007(495)939-16-83, +007(495)939-23-83
> >
> >
>
>



[GENERAL] How to speedup CHECKPOINTs?

2007-03-28 Thread Dmitry Koterov

Hello.

The quintessence of the problem explained in previous letters: the CHECKPOINT
command works quite slowly, disk write activity is high just after it is
executed, and during the CHECKPOINT, SELECT queries run many times slower
than usual.

I experimented with bgwriter, but it affects the CHECKPOINT performance very
little (possibly 2-3 times faster, but no more).
What else could I tune?
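
For reference, a hedged sketch of the 8.1/8.2-era postgresql.conf knobs that
are usually adjusted for this; the values are illustrative, not recommendations
from the thread:

checkpoint_segments = 32      # spread checkpoints over more WAL (was 5 above)
checkpoint_timeout = 900
bgwriter_delay = 100          # wake the background writer more often...
bgwriter_lru_percent = 5.0    # ...and let it write more dirty buffers per round,
bgwriter_all_percent = 2.0    #    so each checkpoint has less left to fsync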

And the general question - why do SELECT queries slow down during the
CHECKPOINT? I thought that Postgres is an MVCC (versioning) database and read
queries are never blocked...


Re: [GENERAL] Strange behaviour under heavy load

2007-03-29 Thread Dmitry Koterov

How to tune them?

Now I have:

checkpoint_segments = 5 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 900# range 30-3600, in seconds
checkpoint_warning = 30 # in seconds, 0 is off

No checkpoint warnings in pgsql logs.


On 3/27/07, Oleg Bartunov  wrote:


hmm,

looks like checkpoint ?
Check checkpoint settings in postgresql.conf.


Oleg
On Tue, 27 Mar 2007, Dmitry Koterov wrote:

> Hello.
>
> I have found that sometimes heavy loaded PostgreSQL begins to run all
the
> queries slower than usual, sometimes - 5 and more times slower. I cannot
> reprocude that, but symptoms are the following: queries work very fast
for
> 5-10 minutes, and after that - significant slowdown (every query, even a
> simple one, works 5-10 and ever more times slower), disk write activity
> grows too (but not everytime - I don't know exactli if there is a direct
> correlation). 2-3 seconds, then - performance restores back to normal.
>
> Autovacuum is turned off.
>
> Machine has 2 processors, huge memory, fast SCSI disks.
>
> I understand that there is too less information. Please advice what to
> monitor for better problem discovering.
>

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83




Re: [GENERAL] How to speedup CHECKPOINTs?

2007-03-29 Thread Dmitry Koterov

No. Disk read activity is ALWAYS 0, the system has a lot of disk cache.

On 3/28/07, Joseph S  wrote:


Dmitry Koterov wrote:

> And the general question - why SELECT queries slowdown  during the
> CHECKPOINT? I thought that Postgres is a version-based database and read
> queries are never blocked...

Because the disk is busy.




Re: [GENERAL] How to speedup CHECKPOINTs?

2007-03-29 Thread Dmitry Koterov

No, but the disks are capable of about 120 MB/s, peak writes during CHECKPOINT
are only 20-30 MB/s, and there is no disk read activity at all, so it does not
matter where the xlog resides.

But now it seems bgwriter tuning has some effect, CHECKPOINT is running faster
(about 2-3 seconds instead of 10-15). It still hurts performance, but less.


On 3/29/07, CAJ CAJ <[EMAIL PROTECTED]> wrote:




On 3/28/07, Joseph S  wrote:
>
> Dmitry Koterov wrote:
>
> > And the general question - why SELECT queries slowdown  during the
> > CHECKPOINT? I thought that Postgres is a version-based database and
> read
> > queries are never blocked...
>
> Because the disk is busy.



Is your pg_xlog on a separate disk?







[GENERAL] What about SkyTools?

2007-04-11 Thread Dmitry Koterov

Hello.

Has anybody used SkyTools in a production environment?
What's the impression? In practice, is it now preferable to Slony
or not yet?


Re: [GENERAL] What about SkyTools?

2007-05-11 Thread Dmitry Koterov

I am still looking for a possible (simpler) replacement for Slony in the
background.

1. It seems SkyTools does not have a utility to spread DDL changes among all
the replicas (like slonik does). So, if I want to ALTER TABLE, do I have to
run this command manually on each node?

2. The architecture of the system is not clear enough. Which daemon should
run on which machine? It seems we must have one PgQ daemon on each machine
(master and all slaves), but should we also have a londiste daemon on each
machine or not? (If yes, we have to take care of keeping the londiste configs
synchronized on all replicas, which is not very handy.)

It seems SkyTools was developed for static schemas only, without an ability to
modify the schema.
Am I wrong?

On 4/12/07, Robert Treat <[EMAIL PROTECTED]> wrote:


On Wednesday 11 April 2007 12:08, Dmitry Koterov wrote:
> Hello.
>
> Have anybody used SkyTools in production environment?
> What's the impression? In practice - is it now more preferrable than
Slony
> or not yet?

Well, skype using them in production...   I think the general consensus of
the
postgresql community is that slony is still the preferred choice, but on
number of deployments and general community knowledge, assuming you need
master/slave style replication.  Everything else is still considered
fairly
green technology, though that's no reason not to test it in your
environment.
IMHO YMMV

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL




Re: [GENERAL] TWO SAME TABLES, ONE UPDATED. HOW TO SYNC THE OTHER?

2007-05-14 Thread Dmitry Koterov

This query will run quite slowly if the tables are large, so in addition you
may create a trigger-updated TIMESTAMP column and search for changed data
among the recently created/updated rows only.
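
A minimal sketch of that idea (table, column, and trigger names are
placeholders; per the original post both tables have the same structure and id
is the primary key):

ALTER TABLE a ADD COLUMN updated_at TIMESTAMP;
ALTER TABLE b ADD COLUMN updated_at TIMESTAMP;   -- keep both structures identical

CREATE OR REPLACE FUNCTION a_touch() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER a_touch_trg BEFORE INSERT OR UPDATE ON a
    FOR EACH ROW EXECUTE PROCEDURE a_touch();

-- The periodic sync then only has to scan recently changed rows
-- (a bit more than the 20-minute period; the NOT EXISTS guard makes the
-- overlap harmless):
INSERT INTO b
SELECT *
FROM a
WHERE a.updated_at > now() - interval '30 minutes'
  AND NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);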

On 13 May 2007 02:21:30 -0700, [EMAIL PROTECTED] <[EMAIL PROTECTED]>
wrote:


On May 11, 11:06 pm, "L. Berger" <[EMAIL PROTECTED]> wrote:
> Hello
>
> I have two tables -- A and B. The structure of both is the same. Only,
> B has many indexes and is used for heavy duty SELECTs. On the other
> hand, A only accepts heavy duty INSERTs, so has only one primary key
> index.
>
> So my DB design is such that A is only an INSERT table. Periodically,
> say every 20 minutes or so, I would like to take all the new INSERTs
> from table A and put them into B.
>
> Is there any clever command to accomplish this? I'd rather not write a
> PHP script with SQL to take every single new record, and update every
> column of a new row in table B. For instance, can I do a replication
> of only tables, not databases?
>
> Thanks for any pointers!!
>
> LB

Assuming ID is PK:

INSERT INTO b
SELECT *
FROM a
WHERE NOT EXISTS (
SELECT 1
FROM b
WHERE b.ID = a.ID
)





[GENERAL] Does slonik EXECUTE SCRIPT call waits for comands termination?

2007-06-01 Thread Dmitry Koterov

Hello.

It seems that when I use EXECUTE SCRIPT
(http://slony.info/documentation/stmtddlscript.html) and slonik reports
PGRES_TUPLES_OK, the updates may NOT be finished yet on all slaves.
I ran a long ALTER TABLE statement (about 3 minutes); the master was updated
immediately after I had seen PGRES_TUPLES_OK, but the slave only 10 or more
minutes later.

So, the questions are:

1. THE MAIN question: is it possible to ask slonik to wait until all schema
changes have been propagated to all slaves after a slonik call?

2. If slonik updates slaves not immediately, but via event creation, why
does it still need to know information about ALL database hosts, not only
about the master database? I have to enumerate all slave hosts in slonik
calls:

cluster name = my_cluster;
node 1 admin conninfo='host=host1 dbname=m user=slony port=5432
password=**';
node 2 admin conninfo='host=host2 dbname=m user=slony port=5432
password=**';
node 3 admin conninfo='host=host3 dbname=m user=slony port=5432
password=**';
...
 execute script (
   set id = 1,
   filename = '/tmp/e0H7Aa03Fh',
   event node = 1
 );

But if schema changes are propagated via events, theoretically we only have to
know the master's address...


Re: [GENERAL] Does slonik EXECUTE SCRIPT call waits for comands termination?

2007-06-01 Thread Dmitry Koterov

Sorry for the mistake, wrong mailing list.

On 6/1/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:


Hello.

Seems when I use EXECUTE 
SCRIPT<http://slony.info/documentation/stmtddlscript.html>and slonik reports 
PGRES_TUPLES_OK updates may NOT be finished yet on all
slaves.
I ran a long ALTER TABLE statement (about 3 minutes), master updated
immediately after I had seen PGRES_TUPLES_OK, but slave - 10 or more minutes
later.

So, the questions are:

1. THE MAIN question: is it possible to ask slonik to wait untill all
scheme changes were propogated to all slaves after a slonik call?

2. If slonik updates slaves not immediately, but via event creation, why
does it still need to know an information about ALL database hosts, not only
about the master database? I have to enumerate all slave hosts in slonik
calls:

cluster name = my_cluster;
 node 1 admin conninfo='host=host1 dbname=m user=slony port=5432
password=**';
 node 2 admin conninfo='host=host2 dbname=m user=slony port=5432
password=**';
 node 3 admin conninfo='host=host3 dbname=m user=slony port=5432
password=**';
 ...
  execute script (
set id = 1,
filename = '/tmp/e0H7Aa03Fh',
event node = 1
  );

But if a schema changes are propogated via events, theoretically we have
to know only master's address...



Re: [GENERAL] Does slonik EXECUTE SCRIPT call waits for comands termination?

2007-06-27 Thread Dmitry Koterov

The work-around for old Slony versions:

replication_wait() {
    echo "Waiting until all slaves are in sync with the master...";
    echo "
    `slonik_print_preamble`
    # Hack for old Slony: this is a dummy operation which generates a SYNC
    # event and saves its ID for later waiting only, nothing more.
    DROP PATH (SERVER = $MASTER_NODE_ID, CLIENT = $MASTER_NODE_ID);
    WAIT FOR EVENT (
        ORIGIN = ALL,
        CONFIRMED = ALL,
        WAIT ON = $MASTER_NODE_ID
    );
    " | slonik
    echo "All slaves are in sync.";
}

This script waits until all slaves are in sync with the master.


On 6/1/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:


Hello.

Seems when I use EXECUTE 
SCRIPT<http://slony.info/documentation/stmtddlscript.html>and slonik reports 
PGRES_TUPLES_OK updates may NOT be finished yet on all
slaves.
I ran a long ALTER TABLE statement (about 3 minutes), master updated
immediately after I had seen PGRES_TUPLES_OK, but slave - 10 or more minutes
later.

So, the questions are:

1. THE MAIN question: is it possible to ask slonik to wait untill all
scheme changes were propogated to all slaves after a slonik call?

2. If slonik updates slaves not immediately, but via event creation, why
does it still need to know an information about ALL database hosts, not only
about the master database? I have to enumerate all slave hosts in slonik
calls:

cluster name = my_cluster;
 node 1 admin conninfo='host=host1 dbname=m user=slony port=5432
password=**';
 node 2 admin conninfo='host=host2 dbname=m user=slony port=5432
password=**';
 node 3 admin conninfo='host=host3 dbname=m user=slony port=5432
password=**';
 ...
  execute script (
set id = 1,
filename = '/tmp/e0H7Aa03Fh',
event node = 1
  );

But if a schema changes are propogated via events, theoretically we have
to know only master's address...



[GENERAL] Update a single row without firing its triggers?

2007-07-06 Thread Dmitry Koterov

Hello.

Suppose I have a table tbl with columns (a, b, c, counter).
And I have 5 ON UPDATE triggers assigned to this table. They process the (a, b,
c) columns, but never depend on counter.

I need to update the counter field, but I know that it is totally independent,
so for performance reasons I want to temporarily disable all triggers
while updating tbl.counter.
How could I do it?

(Please do not suggest ALTER TABLE tbl DISABLE TRIGGER ALL. It is NOT usable in
production: ALTER TABLE locks the whole table while it runs, so it cannot be
used in heavily loaded systems.)
(Please also do not suggest moving the counter to another table, because it
is used in composite indexes, e.g. INDEX ON (counter, a, c), to speed up
fetching.)


Possible solution: add an additional BOOLEAN column named "disable_trg": (a,
b, c, disable_trg). Then I use the following UPDATE:

UPDATE tbl SET counter = counter + 1, disable_trg = true WHERE a = 10;

In each trigger I first run the instruction:

IF NEW.disable_trg THEN RETURN NEW; END IF;

And the last trigger resets the disable_trg field to NULL, so it is not
written to the table. This way, in some queries I may explicitly specify
whether I need to disable the triggers or not.
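
For reference, a minimal sketch of this work-around (all names are
placeholders; triggers of the same kind fire in alphabetical order, hence the
zzz prefix on the clean-up trigger):

ALTER TABLE tbl ADD COLUMN disable_trg BOOLEAN;

-- Guard added at the top of each existing trigger function:
CREATE OR REPLACE FUNCTION tbl_some_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.disable_trg THEN
        RETURN NEW;              -- counter-only update: skip the expensive work
    END IF;
    -- ... normal processing of (a, b, c) goes here ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- An extra trigger, last in firing order, clears the flag so it is never stored:
CREATE OR REPLACE FUNCTION tbl_clear_disable_trg() RETURNS trigger AS $$
BEGIN
    NEW.disable_trg := NULL;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER zzz_clear_disable_trg BEFORE UPDATE ON tbl
    FOR EACH ROW EXECUTE PROCEDURE tbl_clear_disable_trg();

-- Counter-only update that asks the other triggers to do nothing:
UPDATE tbl SET counter = counter + 1, disable_trg = true WHERE a = 10;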

But this solution (the only possible one?) looks like a brute-force method.
Perhaps Postgres has a better one?


[GENERAL] Deadlocks caused by a foreign key constraint

2007-08-10 Thread Dmitry Koterov
Hello.

I get a number of deadlocks because of a foreign key constraint:

Assume we have 2 tables: A and B. Table A has a field fk referencing
B.id via a foreign key constraint.


-- transaction #1
BEGIN;
...
INSERT INTO A(x, y, fk) VALUES (1, 2, 666);
...
END;


-- transaction #2
BEGIN;
UPDATE B SET z = z + 1 WHERE id = 666;
...
UPDATE B SET z = z + 1 WHERE id = 666;
...
UPDATE B SET z = z + 1 WHERE id = 666;
END;


You see, table A is only inserted into, and table B only has its field z
updated on a single row.
If we execute a lot of these transactions concurrently using multiple
parallel threads, sometimes we get a deadlock:

DETAIL:  Process 6867 waits for ShareLock on transaction 1259392; blocked by
process 30444.
Process 30444 waits for ShareLock on transaction 1259387; blocked by
process 6867.
CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE "id" = $1
FOR SHARE OF x"

If I delete the foreign key constraint, everything starts to work fine.
It seems Postgres thinks that the "UPDATE B SET z = z + 1 WHERE id = 666" query
may modify the B.id field and touch A.fk, so it holds a share lock on the row.

The question is: is it possible to KEEP this foreign key constraint but
avoid the deadlocks?
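
For reference, the row lock behind this is visible in the CONTEXT line above:
for every INSERT into A the foreign key check effectively runs

SELECT 1 FROM ONLY "public"."B" x WHERE "id" = 666 FOR SHARE OF x;

so each insert takes a shared lock on the single B row with id = 666, which
then interleaves with the exclusive row locks taken by the repeated UPDATEs in
transaction #2.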


Re: [GENERAL] Creating a row during a select

2007-08-14 Thread Dmitry Koterov
Try to read about CREATE RULE in the documentation.

On 8/14/07, Michal Paluchowski <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
>
> is there a way to have PostgreSQL insert a new row into a table during
> a SELECT query if no row is found by that query?
>
>
> --
> Best regards,
> Michal  mailto:[EMAIL PROTECTED]
>
>
>


Re: [GENERAL] Persistent connections in PHP

2007-08-14 Thread Dmitry Koterov
Pconnects are absolutely necessary if we use tsearch2, because it
initializes its dictionaries on the first query in a session. That is a very
heavy process (500 ms and more). So, if we do not use pconnect, we waste
about 500 ms on each DB connection. Too much pain.

Or, of course, pconnect may be replaced with pgbouncer. That's even better.

On 8/13/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
>
> On 8/13/07, Josh Trutwin <[EMAIL PROTECTED]> wrote:
> > On Mon, 13 Aug 2007 09:44:26 -0500
> > Erik Jones <[EMAIL PROTECTED]> wrote:
> >
> > > I'll agree with Scott on this one.  (Not that I can recall
> > > specifically ever disagreeing with him before...).  Unless you
> > > know all of the potential caveats associated with php's persisent
> > > postgres connections and have a use case that fits them, don't use
> > > them.  If you need something to pool connections, look at pgpool.
> >
> > Could elaborate a little on the problems with using php's persistent
> > connections?
> >
> > Personally I use ADODB php abstraction library (adodb.sf.net) for my
> > database stuff and I think there's a way to enable persistent
> > connections though I just use the default connection.
> >
> > I've heard before that php's persistent connections are to be
> > avoided, was just curious as to why though?
>
> OK, there are a few things that gather together to make php's
> persistant connections a problem.
>
> 1:  Each apache / php process maintains its own connections, not
> sharing with others.  So it's NOT connection pooling, but people tend
> to think it is.
> 2:  Each unique connection creates another persistent connection for
> an apache/php child process.  If you routinely connect to multiple
> servers / databases or as > 1 user, then each one of those
> combinations that is unique makes another persistent connection.
> 3:  There's no facility in PHP to clean an old connection out and make
> sure it's in some kind of consistent state when you get it.  It's in
> exactly the same state it was when the previous php script finished
> with it.  Half completed transactions, partial sql statements,
> sequence functions like currval() may have values that don't apply to
> you.
> 4:  pg_close can't close a persistent connection.  Once it's open, it
> stays open until the child process is harvested.
> 5:  Apache, by default, is configured for 150 child processes.
> Postgresql, and many other databases for that matter, are configured
> for 100 or less.  Even if apache only opens one connection to one
> database with one user account, it will eventually try to open the
> 101st connection to postgresql and fail.  So, the default
> configuration of apache / postgresql for number of connections is
> unsafe for pconnect.
> 6:  The reason for connection pooling is primarily to twofold.  One is
> to allow very fast connections to your database when doing lots of
> small things where connection time will cost too much.  The other is
> to prevent your database from having lots of stale / idle connections
> that cause it to waste memory and to be slower since each backend
> needs to communicate with every other backend some amount of data some
> times.  pconnect takes care of the first problem, but exacerbates the
> second.
>
> P.s. dont' think I'm dogging PHP, cause I'm not.  I use it all the
> time, and it's really great for simple small scripts that need to be
> done NOW and need to be lightweight.  I even use pconnect a bit.  But
> my machine is set for 50 or fewer apache children and 150 postgresql
> connects, and I only use pconnect on small, lightweight things that
> need to zoom.  Everything else gets regular old connect.
>
>


[GENERAL] How to trap exceptions inside PL/Perl functions?

2007-08-15 Thread Dmitry Koterov
Hello.

In PL/PGSQL I could write:

BEGIN
  SELECT * FROM non_existed_table;
EXCEPTION
  WHEN ... THEN ...
END;

How can I do this in PL/Perl? I tried the standard Perl trapping method:

eval {
  spi_exec_query("SELECT * FROM non_existed_table");
};
if ($@) { ... }

but it does not work - it says that "eval is not safe" or something like
that. But I use eval with {}, not with a quoted string, so it should be safe.
So, how do I trap errors in this case?


Re: [GENERAL] Deadlocks caused by a foreign key constraint

2007-08-15 Thread Dmitry Koterov
No.

I have tested all cases; the code I quoted is complete and minimal. All
operations are non-blocking (the counter increment is non-blocking, the
insertion with a foreign key is non-blocking too), but it still generates a
deadlock from time to time. Deleting the foreign key constraint completely
solves the problem.

I am using the latest version of Postgres.

You said "I'm pretty sure that recent versions check to see if the key
actually changed", but how can that be if Postgres uses row-level locking,
not field-level locking? It seems it cannot check which fields are changed;
it locks the whole row.


On 8/15/07, Decibel! <[EMAIL PROTECTED]> wrote:
>
> On Fri, Aug 10, 2007 at 09:38:36PM +0400, Dmitry Koterov wrote:
> > Hello.
> >
> > I have a number of deadlock because of the foreign key constraint:
> >
> > Assume we have 2 tables: A and B. Table A has a field fk referenced to
> > B.idas a foreign key constraint.
> >
> >
> > -- transaction #1
> > BEGIN;
> > ...
> > INSERT INTO A(x, y, fk) VALUES (1, 2, 666);
> > ...
> > END;
> >
> >
> > -- transaction #2
> > BEGIN;
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > ...
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > ...
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > END;
> >
> >
> > You see, table A is only inserted, and table B is only updated their
> field z
> > on its single row.
> > If we execute a lot of these transactions concurrently using multiple
> > parellel threads, sometimes we have a deadlock:
> >
> > DETAIL:  Process 6867 waits for ShareLock on transaction 1259392;
> blocked by
> > process 30444.
> > Process 30444 waits for ShareLock on transaction 1259387; blocked by
> > process 6867.
> > CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE "id" =
> $1
> > FOR SHARE OF x"
> >
> > If I delete the foreign key constraint, all begins to work fine.
> > Seems Postgres thinks that "UPDATE B SET z = z + 1 WHERE id = 666" query
> may
> > modify B.id field and touch A.fk, so it holds the shareable lock on it.
>
> What version are you running? I'm pretty sure that recent versions check
> to see if the key actually changed.
>
> > The question is: is it possible to KEEP this foreign key constraint, but
> > avoid deadlocks?
>
> I'm pretty sure that the deadlock is actually being caused by your
> application code, likely because you're doing multiple updates within
> one transaction, but not being careful about the id order you do them
> in.
> --
> Decibel!, aka Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
>
>


Re: [GENERAL] Need help doing a PostgreSQL vs Firebird feature comparison

2007-08-21 Thread Dmitry Koterov
One difference in SQL syntax is that Firebird can join stored procedures
like this:

SELECT b.*
FROM
  get_ids() a
  LEFT JOIN get_data(a.ID) b ON 1=1

(where the a.ID value is passed from the previous result set as a parameter
of the next procedure), but Postgres cannot.



On 8/21/07, Tony Caduto <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I was just wondering if anyone could help me out by taking a look to see
> if I missed any important features.
>
> http://www.amsoftwaredesign.com/pg_vs_fb
>
> This comparison is going to be for the benefit of Delphi users.  The
> Delphi community is heavily biased to Firebird.
>
> Please post any comments or suggestions here:
>
> http://www.amsoftwaredesign.com/smf/index.php?topic=138.0
>
> You don't need to register.
>
> Thanks,
>
> Tony
>
>


[GENERAL] Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery

2007-08-22 Thread Dmitry Koterov
Hello.

We are trying to use an HP CISS controller (Smart Array E200i) with internal
cache memory (100M for write caching, built-in backup battery) together with
Postgres. Typically under a heavy load Postgres runs the checkpoint fsync very
slowly:

checkpoint buffers dirty=16.8 MB (3.3%) write=24.3 ms sync=6243.3 ms

(If we turn fsync off, fsync=0, the speed increases greatly.) And
unfortunately it affects the whole database's throughput during the
checkpoint.
Here is the timing (in milliseconds) of a test transaction called multiple
times concurrently (6 threads) with fsync turned ON:

40.4
44.4
37.4
44.0
42.7
41.8
218.1
254.2
101.0
42.2
42.4
41.0
39.5

(you can see a significant slowdown during a checkpoint).
Here is the dstat disk write activity log for that test:

   0
   0
 284k
   0
   0
  84k
   0
   0
 276k
  37M
  208k
   0
   0
   0
   0
 156k
   0
   0
   0
   0

I have written a small perl script to check how slow fsync is for the Smart
Array E200i controller. Theoretically, because of the write cache, fsync MUST
cost nothing, but in practice this is not true:

# cd /mnt/c0d1p1/
# perl -e 'use Time::HiRes qw(gettimeofday tv_interval); system "sync"; open
F, ">bulk"; print F ("a" x (1024 * 1024 * 20)); close F; $t0=[gettimeofday];
system "sync"; print ">>> fsync took " . tv_interval ( $t0, [gettimeofday])
. " s\n"; unlink "bulk"'
>>> fsync took 0.247033 s

You see, a 20M block was synced in 0.25 s.

The question is: how to solve this problem and make fsync run with no delay?
It seems to me that the controller's internal write cache is not used (strange,
because all configuration options look fine), but how can I check that? Or
maybe there is some other side effect?


[GENERAL] Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery

2007-08-22 Thread Dmitry Koterov
And here are the results of the built-in Postgres test script:

Simple write timing:
        write                   0.006355

Compare fsync times on write() and non-write() descriptor:
(If the times are similar, fsync() can sync data written
 on a different descriptor.)
        write, fsync, close     0.233793
        write, close, fsync     0.227444

Compare one o_sync write to two:
        one 16k o_sync write    0.297093
        two 8k o_sync writes    0.402803

Compare file sync methods with one 8k write:
(o_dsync unavailable)
        write, fdatasync        0.228725
        write, fsync            0.223302

Compare file sync methods with 2 8k writes:
(o_dsync unavailable)
        open o_sync, write      0.414954
        write, fdatasync        0.335280
        write, fsync            0.327195

(Also, I tried to manually specify the open_sync method in postgresql.conf, but
after that the Postgres database crashed completely. :-)


On 8/22/07, Dmitry Koterov <[EMAIL PROTECTED] > wrote:
>
> All settings seems to be fine. Mode is writeback.
>
> We temporarily (for tests only on test machine!!!) put pg_xlog into RAM
> drive (to completely exclude xlog fsync from the statistics), but slowdown
> during the checkpoint and 5-10 second fsync during the checkpoint are alive
> yet.
>
> Here are some statistical data from the controller. Other report data is
> attached to the mail.
>
> ACCELERATOR STATUS:
>Logical Drive Disable Map: 0x
>Read Cache Size:   24 MBytes
>Posted Write Size: 72 MBytes
>Disable Flag:  0x00
>Status:0x0001
>Disable Code:  0x
>Total Memory Size: 128 MBytes
>Battery Count: 1
>Battery Status:0x0001
>Parity Read Errors:
>Parity Write Errors:   
>Error Log: N/A
>Failed Batteries:  0x
>Board Present: Yes
>Accelerator Failure Map:   0x
>Max Error Log Entries: 12
>NVRAM Load Status: 0x00
>Memory Size Shift Factor:  0x0a
>Non Battery Backed Memory: 0 MBytes
>Memory State:      0x00
>
>
> On 8/22/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> >
> > On 8/22/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> > > Hello.
> > > You see, 50M block was fsynced for 0.25 s.
> > >
> > > The question is: how to solve this problem and make fsync run with no
> > delay.
> > > Seems to me that controller's internal write cache is not used
> > (strange,
> > > because all configuration options are fine), but how to check it? Or,
> > maybe,
> > > there is another side-effect?
> >
> > I would suggest that either the controller is NOT configured fine, OR
> > there's some bug in how the OS is interacting with it.
> >
> > What options are there for this RAID controller, and what are they set
> > to?  Specifically, the writeback / writethru type options for the
> > cache, and it might be if it doesn't preoprly detect a battery backup
> > module it refuses to go into writeback mode.
> >
>
>
>


Re: [GENERAL] Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery

2007-08-22 Thread Dmitry Koterov
This script is here:
postgresql-8.2.3/src/tools/fsync/test_fsync.c


On 8/22/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>
> On 23/08/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> > And here are results of built-in Postgres test script:
> >
>
>
>
> Can you tell me how I can execute this script on my system? Where is
> this script?
>
> Thanks!
>


[GENERAL] How to switch off Snowball stemmer for tsearch2?

2007-08-22 Thread Dmitry Koterov
Hello.

We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
The Snowball stemmer is currently also configured.

How do I properly switch OFF the Snowball stemmer for Russian without turning
off the ispell stemmer? (It is really needed, because "Ivanov" is not the same
as "Ivan".)
Is it enough and correct to simply delete the row from pg_ts_dict or not?

Here is the dump of the pg_ts_dict table:

dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
en_ispell | spell_init(internal) | DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop | spell_lexize(internal,internal,integer) |
en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
ru_ispell_cp1251 | spell_init(internal) | DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251 | spell_lexize(internal,internal,integer) |
ru_stem_cp1251 | snb_ru_init_cp1251(internal) | contrib/russian.stop.cp1251 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding
ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding
simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary.
synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary
thesaurus_template | thesaurus_init(internal) | | thesaurus_lexize(internal,internal,integer,internal) | Thesaurus template, must be pointed Dictionary and DictFile


Re: [GENERAL] How to switch off Snowball stemmer for tsearch2?

2007-08-22 Thread Dmitry Koterov
Suppose I cannot add such synonyms, because:

1. There are a lot of surnames, cannot take care about all of them.
2. After adding a new surname I have to re-calculate all full-text indices,
it costs too much (about 10 days to complete the recalculation).

So, I neet exactly what I ast - switch OFF stem guessing if a word is not in
the dictionary.

On 8/22/07, Oleg Bartunov <[EMAIL PROTECTED]> wrote:
>
> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>
> > Hello.
> >
> > We use ispell dictionaries for tsearch2 (ru_ispell_cp1251)..
> > Now Snowball stemmer is also configured.
> >
> > How to properly switch OFF Snowball stemmer for Russian without turning
> off
> > ispell stemmer? (It is really needed, because "Ivanov" is not the same
> as
> > "Ivan".)
> > Is it enough and correct to simply delete the row from pg_ts_dict or
> not?
> >
> > Here is the dump of pg_ts_dict table:
>
> don't use dump, plain select would be  better. In your case, I'd
> suggest to follow standard way - create synonym file like
> ivanov ivanov
> and use it before other dictionaries. Synonym dictionary will recognize
> 'Ivanov' and return 'ivanov'.
>
> >
> > dict_namedict_initdict_initoptiondict_lexizedict_comment
> > en_ispellspell_init(internal)
> >
> DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop
> > spell_lexize(internal,internal,integer)
> > en_stemsnb_en_init(internal)contrib/english.stop
> > snb_lexize(internal,internal,integer)English Stemmer. Snowball.
> > ispell_templatespell_init(internal)
> > spell_lexize(internal,internal,integer)ISpell interface. Must have
> .dict
> > and .aff files
> > ru_ispell_cp1251spell_init(internal)
> >
> DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251
> > spell_lexize(internal,internal,integer)
> > ru_stem_cp1251snb_ru_init_cp1251(internal)
> > contrib/russian.stop.cp1251snb_lexize(internal,internal,integer)
> > Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
> > ru_stem_koi8snb_ru_init_koi8(internal)contrib/russian.stop
> > snb_lexize(internal,internal,integer)Russian Stemmer. Snowball. KOI8
> > Encoding
> > ru_stem_utf8snb_ru_init_utf8(internal)contrib/russian.stop.utf8
> > snb_lexize(internal,internal,integer)Russian Stemmer. Snowball. UTF8
> > Encoding
> >
> simpledex_init(internal)dex_lexize(internal,internal,integer)
> > Simple example of dictionary.
> > synonymsyn_init(internal)
> > syn_lexize(internal,internal,integer)Example of synonym dictionary
> > thesaurus_templatethesaurus_init(internal)
> > thesaurus_lexize(internal,internal,integer,internal)Thesaurus
> template,
> > must be pointed Dictionary and DictFile
> >
>
> Regards,
> Oleg
> _
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>


Re: [GENERAL] How to switch off Snowball stemmer for tsearch2?

2007-08-22 Thread Dmitry Koterov
Oh! Thanks!

delete from pg_ts_cfgmap where dict_name = ARRAY['ru_stem'];

solves the root of the problem. But unfortunately russian.med
(ru_ispell_cp1251) contains all Russian first names, so "Ivanov" is converted
to "Ivan" by ispell too. :-(

Now

select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
- which is completely wrong!

I have a database of all Russian names; is it possible to use it (and how?) to
make lexize() not convert "Ivanov" to "Ivan" even if the ispell
dictionary contains an entry for "Ivan"? So, this pseudo-code logic is
needed:

function new_lexize($string) {
  $stem = lexize('ru_ispell_cp1251', $string);
  if ($stem in names_database) return $string; else return $stem;
}

Maybe tsearch2 implements this logic already?

On 8/22/07, Oleg Bartunov <[EMAIL PROTECTED]> wrote:
>
> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>
> > Suppose I cannot add such synonyms, because:
> >
> > 1. There are a lot of surnames, cannot take care about all of them.
> > 2. After adding a new surname I have to re-calculate all full-text
> indices,
> > it costs too much (about 10 days to complete the recalculation).
> >
> > So, I neet exactly what I ast - switch OFF stem guessing if a word is
> not in
> > the dictionary.
>
> no problem, just modify pg_ts_cfgmap, which contains mapping
> token - dictionaries.
>
> if you change configuration you should rebuild tsvector and reindex.
> 10 days looks very suspicious.
>
>
> >
> > On 8/22/07, Oleg Bartunov <[EMAIL PROTECTED]> wrote:
> >>
> >> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
> >>
> >>> Hello.
> >>>
> >>> We use ispell dictionaries for tsearch2 (ru_ispell_cp1251)..
> >>> Now Snowball stemmer is also configured.
> >>>
> >>> How to properly switch OFF Snowball stemmer for Russian without
> turning
> >> off
> >>> ispell stemmer? (It is really needed, because "Ivanov" is not the same
> >> as
> >>> "Ivan".)
> >>> Is it enough and correct to simply delete the row from pg_ts_dict or
> >> not?
> >>>
> >>> Here is the dump of pg_ts_dict table:
> >>
> >> don't use dump, plain select would be  better. In your case, I'd
> >> suggest to follow standard way - create synonym file like
> >> ivanov ivanov
> >> and use it before other dictionaries. Synonym dictionary will recognize
> >> 'Ivanov' and return 'ivanov'.
> >>
> >>>
> >>>
> >>> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
> >>> en_ispell | spell_init(internal) | DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop | spell_lexize(internal,internal,integer) |
> >>> en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
> >>> ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
> >>> ru_ispell_cp1251 | spell_init(internal) | DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251 | spell_lexize(internal,internal,integer) |
> >>> ru_stem_cp1251 | snb_ru_init_cp1251(internal) | contrib/russian.stop.cp1251 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
> >>> ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding
> >>> ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding
> >>> simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary.
> >>> synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary
> >>> thesaurus_template | thesaurus_init(internal) | | thesaurus_lexize(internal,internal,integer,internal

Re: [GENERAL] Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery

2007-08-22 Thread Dmitry Koterov
>
> > I have written a small perl script to check how slow is fsync for Smart
> > Array E200i controller. Theoretically, because of write cache, fsync
> MUST
> > cost nothing, but in practice it is not true
>
> That theory is fundamentally flawed; you don't know what else is in the
> operating system write cache in front of what you're trying to fsync, and
> you also don't know exactly what's in the controller's cache when you
> start.  For all you know, the controller might be filled with cached reads
> and refuse to kick all of them out.  This is a complicated area where

tests are much more useful than trying to predict the behavior.


Nobody else writes, nobody reads. The machine is for tests, it is clean. I
monitor dstat - there is no disk activity for 5 minutes before the run. So I suppose
that the controller cache is already flushed before I run the test.


> tests are much more useful than trying to predict the behavior. You
> haven't mentioned any details yet about the operating system you're
>
running on; Solaris?  Guessing from the device name.  There have been some
> comments passing by lately about the write caching behavior not being
> turned on by default in that operating system.
>
Linux CentOS x86_64. A lot of memory, 8 processors.
The filesystem is ext2 (to reduce journalling side-effects).
OS write caching was tested turned on, turned off, and also set to flush once per
second (all these cases were tried; none of them has any effect).

The question is: MUST my test script report a near-zero fsync time or not,
if the controller has a large built-in write cache? If yes, something is wrong
with the controller or drivers (how do I diagnose that?). If no, why not?

There are a lot of discussions on this mailing list about fsync and battery-backed
controllers; people say that a controller with built-in cache memory reduces
the cost of fsync to almost zero. I just want to achieve this.


Re: [GENERAL] Postgres, fsync and RAID controller with 100M of internal cache & dedicated battery

2007-08-22 Thread Dmitry Koterov
Also, the controller is configured to use 75% of its memory for write
caching and 25% for read caching, so reads cannot crowd out writes.

On 8/23/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
>
> > I have written a small perl script to check how slow is fsync for Smart
> > > Array E200i controller. Theoretically, because of write cache, fsync
> > MUST
> > > cost nothing, but in practice it is not true
> >
> > That theory is fundamentally flawed; you don't know what else is in the
> > operating system write cache in front of what you're trying to fsync,
> > and
> > you also don't know exactly what's in the controller's cache when you
> > start.  For all you know, the controller might be filled with cached
> > reads
> > and refuse to kick all of them out.  This is a complicated area where
>
> tests are much more useful than trying to predict the behavior.
>
>
> Nobody else writes, nobody reads. The machine is for tests, it is clean. I
> monitor dstat - for 5 minutes before there is no disc activity. So I suppose
> that the conntroller cache is already flushed before I am running the test.
>
>
> > tests are much more useful than trying to predict the behavior. You
> > haven't mentioned any details yet about the operating system you're
> >
> running on; Solaris?  Guessing from the device name.  There have been some
> >
> > comments passing by lately about the write caching behavior not being
> > turned on by default in that operating system.
> >
> Linux CentOS x86_64. A lot of memory, 8 processors.
> Filesystem is ext2 (to reduce the journalling side-effects).
> OS write caching is turned on, turned off and also set to flush once per
> second (all these cases are tested, all these have no effect).
>
> The question is - MUST my test script report about a zero fsync time or
> not, if the controler has built-in and large write cache. If yes, something
> wrong with controller or drivers (how to diagnose?). If no, why?
>
> There are a lot of discussions in this maillist about fsync &
> battery-armed controller, people say that a controller with builtin cache
> memory reduces the price of fsync to zero. I just want to achieve this.
>
>
>


Re: [GENERAL] How to switch off Snowball stemmer for tsearch2?

2007-08-23 Thread Dmitry Koterov
>
> > Now
> >
> > select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> > select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> > - it is completely wrong!
> >
> > I have a database with all Russian name, is it possible to use it (how?)
> to
>
> if you have such database why just don't write special dictionary and
> put it in front ?


Of course - because this is a database of Russian NAMES, NOT a database of
surnames.


> make lexize() not to convert "Ivanov" to "Ivan" even if the ispell
> > dicrionary contains an element for "Ivan"? So, this pseudo-code logic is
> > needed:
> >
> > function new_lexize($string) {
> >  $stem = lexize('ru_ispell_cp1251', $string);
> >  if ($stem in names_database) return $string; else return $stem;
> > }
> >
> > Maybe tsearch2 implements this logic already?
>
> sure, it's how text search mapping works.


Could you please elaborate?

Of course I can create all word-forms of all Russian names using ispell and
then subtract this full list from the ispell dictionary (so I will remove
"Ivan", "Ivanami" etc. from it). But possibly tsearch2 has this subtraction
algorithm already.


> Dmitry, seems your company could be my client :)


Not now, thank you. Maybe later.


Re: [GENERAL] How to switch off Snowball stemmer for tsearch2?

2007-08-23 Thread Dmitry Koterov
>
> write your own dictionary, which implements any logic you need. In your
> case it's just a wrapper around ispell, which will returns original string
> not stem. See example
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html
> and russian article
> http://www.sai.msu.su/~megera/postgres/talks/fts_pgsql_intro.html#ftsdict

Ah, I understand you!
You suggest writing a small Postgres contrib module (a new dictionary) in C and
implementing all the logic in it.
It seems a rather complex solution for such a simple task (excluding surnames
from lexization), but it could be implemented, of course.


> > Of course I can create all word-forms of all Russian names using ispell
> and
> > then - subtract this full list from Ispell dictionary (so I will remove
> > "Ivan", "Ivanami" etc. from it). But possily tsearch2 has this
> subtraction
> > algorythm already.
> >
>
> don't do that ! Just go plain way.
>

Another method is to generate a synonym dictionary based on all
word-forms of Russian names using ispell (we will get all suspicious surnames in
this set) and add it before ispell. This solution does not require writing
anything in C.


[GENERAL] Sthange things happen: SkyTools pgbouncer is NOT a balancer

2007-09-11 Thread Dmitry Koterov
Hello.

We discovered some time ago that pgbouncer is NOT a balancer, because it
cannot spread connections/queries for the same database across multiple servers.
It's unbelievable, but it's a fact! So, the database name in the config MUST be
unique.

E.g. if we write

bardb = host=192.168.0.1 dbname=bardb
bardb = host=192.168.0.2 dbname=bardb
bardb = host=192.168.0.3 dbname=bardb

in the config, pgbouncer always uses the first entry, and the others are
ignored. Here is the relevant part of the source code:

== loader.c:
void parse_database(char *name, char *connstr) {
  ...
  db = add_database(name);
  ...
}

== objects.c:
PgDatabase *add_database(const char *name)
{
PgDatabase *db = find_database(name);
/* create new object if needed */
if (db == NULL) {
db = zmalloc(sizeof(*db));
...
}
return db;
}

In these functions "name" is a key from the config ("bardb" in our example).
We see that it's useless to create duplicate keys in the [databases] section
of the config, because only the first one is accepted.

So, it's a complete mystery to me why the "Session pooling", "Transaction
pooling" and "Statement pooling" options exist (see
https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer). If pgbouncer
is not a balancer, what is the purpose of "Statement pooling"? If we send
100 queries (e.g.) over the same connection, they will always be directed to
the SAME MACHINE via its different connections - no balancing optimization at
all.


[GENERAL] How to make LIKE to use index in "abc%" query?

2007-10-08 Thread Dmitry Koterov
Hello.

I run

explain analyze
SELECT id FROM "table" WHERE name LIKE 'dt%';

having a btree index on the "name" column. But unfortunately it uses a seqscan
instead of an index scan, and it's too slow.

I have read some mailing list archives about this problem, but have not found a
solution. How can I fix this LIKE behaviour without re-creating the whole
database? Is it possible?

I use windows-1251 locale, not C locale.
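
For reference, a sketch of the usual fix for this in non-C locales (assuming the
column is of type text; use varchar_pattern_ops for varchar columns): create an
additional index with the pattern operator class, which the planner can use for
left-anchored LIKE patterns:

CREATE INDEX table_name_pattern_idx ON "table" (name text_pattern_ops);

EXPLAIN ANALYZE
SELECT id FROM "table" WHERE name LIKE 'dt%';
-- should now show an index scan on table_name_pattern_idx for 'abc%'-style patterns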


[GENERAL] How to speedup intarray aggregate function?

2007-10-09 Thread Dmitry Koterov
Hello.

I created an aggregate:

CREATE AGGREGATE intarray_aggregate_push (_int4)
(
  STYPE = _int4,
  SFUNC = intarray_push_array,
  INITCOND = '{}'
);

(or - I may use _int_union instead of intarray_push_array, its speed is
practically the same in my case).
This aggregate merges together a list of integer[] arrays resulting one big
array with all elements.

Then I want to use this aggregate:

SELECT intarray_aggregate_push(arrayfield)
FROM arraytable

The table arraytable contains a lot of rows (about 5000), each row has an array
5-10 elements long, so the resulting array should contain about
5 elements.

The query is okay, but its speed is too bad: about 1 second.

The main problem is the speed of the intarray_aggregate_push function - it is
quite slow, because intarray_push_array reallocates the memory each time I
merge two arrays. I am pretty sure that the reallocation and copying is the
bottleneck, because if I use another, dummy aggregate:

CREATE AGGREGATE intarray_aggregate_dummy (_int4)
(
  STYPE = _int4,
  SFUNC = dummy,
  INITCOND = '{}'
);

CREATE OR REPLACE FUNCTION "public"."dummy" (a integer [], b integer [])
RETURNS integer [] AS
$body$ BEGIN RETURN a; END; $body$
LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

where dummy() is the function which returns its first argument without any
modification, the speed grows dramatically - about 25 ms (instead of 1000
ms!).

The question is: how could I optimize this, and is it possible at all in
Postgres? I just want to get one large array glued from a lot of smaller
arrays...


P.S.

I have tested that

SELECT array_to_string(ARRAY(SELECT text FROM tbl), ' ')

query is many times faster than concatenating all "text" fields inside one
pl/pgsql stored function (I assume that this is because Postgres does not
reallocate & copy memory each time it glues on a new text piece). But
unfortunately there is no way to convert integer[] to a string to use this
method: I could write

select '{1,2}'::integer[]

but I couldn't use

select ARRAY[1,2]::text


Re: [GENERAL] How to speedup intarray aggregate function?

2007-10-10 Thread Dmitry Koterov
Thanks for your comment.

I see two possible solution directions:


1. Is it possible to create a C function which could accept something like
ROWSET(ARRAY[]) in its input parameters?
E.g. to call it as

SELECT array_rowset_glue((SELECT arrayfield FROM arraytable));

or something like this?


2. Is it possible to implement in C something like this?

array_buffer_init();
SELECT array_buffer_push(arrayfield) FROM arraytable;
ids := array_buffer_get();
array_buffer_free();

where array_buffer_push() is an aggregate function which returns void but,
as its side effect, appends arrayfield to a global array buffer for later
access with array_buffer_get().


On 10/10/07, Filip Rembiałkowski <[EMAIL PROTECTED]> wrote:
>
> 2007/10/10, Dmitry Koterov <[EMAIL PROTECTED]>:
> > Hello.
> >
> > I created an aggregate:
> >
> > CREATE AGGREGATE intarray_aggregate_push (_int4)
> > (
> >   STYPE = _int4,
> >   SFUNC = intarray_push_array,
> >   INITCOND = '{}'
> > );
> >
> > (or - I may use _int_union instead of intarray_push_array, its speed is
> > practically the same in my case).
> > This aggregate merges together a list of integer[] arrays resulting one
> big
> > array with all elements.
> >
> > Then I want to use this aggregate:
> >
> > SELECT intarray_aggregate_push(arrayfield)
> > FROM arraytable
> >
> >  The table arraytable contains a lot of rows (about 5000), each row has
> > array with length of 5-10 elements, so - the resulting array should
> contain
> > about 5 elements.
> >
> > The query is okay, but its speed is too bad: about 1 second.
> >
> > The main problem is the speed of intarray_aggregate_push function - it
> is
> > quite slow, because intarray_push_array reallocates the memory each time
> I
> > merge two arrays. I am pretty sure that the reallocaton and copying is
> the
> > bottleneck, because if I use another dummy aggreate:
> >
> > CREATE AGGREGATE intarray_aggregate_dummy (_int4)
> > (
> >   STYPE = _int4,
> >   SFUNC = dummy,
> >   INITCOND = '{}'
> > );
> >
> > CREATE OR REPLACE FUNCTION "public"."dummy" (a integer [], b integer [])
> > RETURNS integer [] AS
> > $body$ BEGIN RETURN a; END; $body$
> > LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;
> >
> > where dummy() is the function which returns its first argument without
> any
> > modification, the speed grows dramatically - about 25 ms (instead of
> 1000
> > ms!).
> >
> > The question is: how could I optimize this, and is it possible at all in
> > Postgres? I just want to get one large array glued from a lot of smaller
> > arrays...
>
>
> 1. no wonder copying is the bottleneck - this is what the aggregate
> does, mostly.
>
> 2. you can use plain array_cat for this, in my test it is few percent
> faster
>
> 3. in this case I guess intarrray contrib is not an option, AFAIK it
> was created only for speeding up searches, that is int4[] lookups
>
> 4. to have this kind of optimization you talk about, we would need an
> aggregate operating (in this case appending) directly on
> internalstate. i'm not sure if this is possible in postgres
>
> 5. my results:
> your method (using intarray_push_array): 940 ms
> using array_cat: 860 ms
> same in PL/PgSQL: (LOOP, append) 800 ms
> same thing in Perl, no database (push array of arrays into one and
> print ): 18 ms
>
>
> cheers, Filip
>
>
> --
> Filip Rembiałkowski
>
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
>
>http://archives.postgresql.org/
>


Re: [GENERAL] How to speedup intarray aggregate function?

2007-10-10 Thread Dmitry Koterov
Wow, it seems I've found it!

 *  Beginning in PostgreSQL 8.1, the executor's AggState node is passed
as
 *  the fmgr "context" value in all transfunc and finalfunc calls.  It
is
 *  not really intended that the transition functions will look into the
 *  AggState node, but they can use code like
 *if (fcinfo->context && IsA(fcinfo->context, AggState))
 *  to verify that they are being called by nodeAgg.c and not as
ordinary
 *  SQL functions.  The main reason a transition function might want to
know
 *  that is that it can avoid palloc'ing a fixed-size pass-by-ref
transition
 *  value on every call: it can instead just scribble on and return its
left
 *  input.  Ordinarily it is completely forbidden for functions to
modify
 *  pass-by-ref inputs, but in the aggregate case we know the left input
is
 *  either the initial transition value or a previous function result,
and
 *  in either case its value need not be preserved.  See int8inc() for
an
 *  example.  Notice that advance_transition_function() is coded to
avoid a
 *  data copy step when the previous transition value pointer is
returned.

So theoretically I may create an intarray_aggregate_push() function which, when
called by the aggregate, does not reallocate & copy memory each time it is
called. Instead, it may allocate 1M of memory at once (with a gap), or enlarge
the memory segment by a factor of 2 when it needs to reallocate (so the number of
reallocations is O(log N) instead of O(N)).

And here is an example from the source code:

Datum
int8inc(PG_FUNCTION_ARGS)
{
if (fcinfo->context && IsA(fcinfo->context, AggState))
{
/*
 * Special case to avoid palloc overhead for COUNT(): when called
from
 * nodeAgg, we know that the argument is modifiable local storage,
so
 * just update it in-place.
 *
 * Note: this assumes int8 is a pass-by-ref type; if we ever support
 * pass-by-val int8, this should be ifdef'd out when int8 is
 * pass-by-val.
 */
int64   *arg = (int64 *) PG_GETARG_POINTER(0);
int64result;

result = *arg + 1;
/* Overflow check */
if (result < 0 && *arg > 0)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("bigint out of range")));

    *arg = result;
PG_RETURN_POINTER(arg);
}
...
}


On 10/10/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
>
> Thanks for your comment.
>
> I see two possible solution directions:
>
>
> 1. Is it possible to create C-function, which could accept something like
> ROWSET(ARRAY[]) in its input parameters?
> E.g. to call it as
>
> SELECT array_rowset_glue((SELECT arrayfield FROM arraytable));
>
> or something like this?
>
>
> 2. Is it possible to implement in C something like this?
>
> array_buffer_init();
> SELECT array_buffer_push(arrayfield) FROM arraytable;
> ids := array_buffer_get();
> array_buffer_free();
>
> where array_buffer_push() is an aggregate function which returns void,
> but, as its side-effect, appends arrayfield to the global array buffer for
> later acces with array_buffer_get().
>
>
> On 10/10/07, Filip Rembiałkowski <[EMAIL PROTECTED]> wrote:
> >
> > 2007/10/10, Dmitry Koterov <[EMAIL PROTECTED]>:
> > > Hello.
> > >
> > > I created an aggregate:
> > >
> > > CREATE AGGREGATE intarray_aggregate_push (_int4)
> > > (
> > >   STYPE = _int4,
> > >   SFUNC = intarray_push_array,
> > >   INITCOND = '{}'
> > > );
> > >
> > > (or - I may use _int_union instead of intarray_push_array, its speed
> > is
> > > practically the same in my case).
> > > This aggregate merges together a list of integer[] arrays resulting
> > one big
> > > array with all elements.
> > >
> > > Then I want to use this aggregate:
> > >
> > > SELECT intarray_aggregate_push(arrayfield)
> > > FROM arraytable
> > >
> > >  The table arraytable contains a lot of rows (about 5000), each row
> > has
> > > array with length of 5-10 elements, so - the resulting array should
> > contain
> > > about 5 elements.
> > >
> > > The query is okay, but its speed is too bad: about 1 second.
> > >
> > > The main problem is the speed of intarray_aggregate_push function - it
> > is
> > > quite slow, because intarray_push_array reallocates the memory each
> > time I
> > > merge two arrays. I am pretty sure that the reallocaton and copying is
> > the
> > > bottleneck, because if

Re: [GENERAL] How to speedup intarray aggregate function?

2007-10-10 Thread Dmitry Koterov
I have written in C all the needed contrib functions: intarray.bidx() (binary
search in a sorted list) and intagg.int_agg_append_state (buffered appending
of one array to another without linear memory reallocation). The speed now
is great: in one case with an intersection of 10 and 15000 arrays it became
30 ms instead of 1600 ms (50 times faster).

A few days later, after more thorough testing, I'll publish complete patches to the
pgsql-hackers mailing list.

On 10/10/07, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
>
> Wow, seems I've found that!
>
>  *  Beginning in PostgreSQL 8.1, the executor's AggState node is
> passed as
>  *  the fmgr "context" value in all transfunc and finalfunc calls.  It
> is
>  *  not really intended that the transition functions will look into
> the
>  *  AggState node, but they can use code like
>  *if (fcinfo->context && IsA(fcinfo->context, AggState))
>  *  to verify that they are being called by nodeAgg.c and not as
> ordinary
>  *  SQL functions.  The main reason a transition function might want
> to know
>  *  that is that it can avoid palloc'ing a fixed-size pass-by-ref
> transition
>  *  value on every call: it can instead just scribble on and return
> its left
>  *  input.  Ordinarily it is completely forbidden for functions to
> modify
>  *  pass-by-ref inputs, but in the aggregate case we know the left
> input is
>  *  either the initial transition value or a previous function result,
> and
>  *  in either case its value need not be preserved.  See int8inc() for
> an
>  *  example.Notice that advance_transition_function() is coded to
> avoid a
>  *  data copy step when the previous transition value pointer is
> returned.
>
> So theoretically I may create intarray_aggregate_push() function which,
> when called by aggregate, does not reallocate & copy memory each time it is
> called. Instead, it may allocate 1M memory at once (with gap), or enlarge
> the memory segment by factor of 2 when it need to reallocate it (it is
> O(log2) instead of O(N)).
>
> And here is an example from the source code:
>
> Datum
> int8inc(PG_FUNCTION_ARGS)
> {
> if (fcinfo->context && IsA(fcinfo->context, AggState))
> {
> /*
>  * Special case to avoid palloc overhead for COUNT(): when called
> from
>  * nodeAgg, we know that the argument is modifiable local storage,
> so
>  * just update it in-place.
>  *
>  * Note: this assumes int8 is a pass-by-ref type; if we ever
> support
>  * pass-by-val int8, this should be ifdef'd out when int8 is
>  * pass-by-val.
>  */
> int64   *arg = (int64 *) PG_GETARG_POINTER(0);
> int64result;
>
> result = *arg + 1;
> /* Overflow check */
> if (result < 0 && *arg > 0)
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>  errmsg("bigint out of range")));
>
> *arg = result;
> PG_RETURN_POINTER(arg);
> }
> ...
> }
>
>
> On 10/10/07, Dmitry Koterov < [EMAIL PROTECTED]> wrote:
> >
> > Thanks for your comment.
> >
> > I see two possible solution directions:
> >
> >
> > 1. Is it possible to create C-function, which could accept something
> > like ROWSET(ARRAY[]) in its input parameters?
> > E.g. to call it as
> >
> > SELECT array_rowset_glue((SELECT arrayfield FROM arraytable));
> >
> > or something like this?
> >
> >
> > 2. Is it possible to implement in C something like this?
> >
> > array_buffer_init();
> > SELECT array_buffer_push(arrayfield) FROM arraytable;
> > ids := array_buffer_get();
> > array_buffer_free();
> >
> > where array_buffer_push() is an aggregate function which returns void,
> > but, as its side-effect, appends arrayfield to the global array buffer for
> > later acces with array_buffer_get().
> >
> >
> > On 10/10/07, Filip Rembiałkowski < [EMAIL PROTECTED]> wrote:
> > >
> > > 2007/10/10, Dmitry Koterov <[EMAIL PROTECTED]>:
> > > > Hello.
> > > >
> > > > I created an aggregate:
> > > >
> > > > CREATE AGGREGATE intarray_aggregate_push (_int4)
> > > > (
> > > >   STYPE = _int4,
> > > >   SFUNC = intarray_push_array,
> > > >   INITCOND = '{}'
> > > > );
> > > >
> > > > (or - I may use _int_union instead o

[GENERAL] How to UPDATE in ROW-style?

2008-06-20 Thread Dmitry Koterov
Hello.

How could I write something like:

  DECLARE r table%ROWTYPE;
  ...
  UPDATE table SET (table.*) = (SELECT r.*) WHERE id = r.id;

This syntax is invalid, and I cannot find another proper way to do it
without explicitly enumerating the table's columns.

I don't want to explicitly specify the table's columns, to minimize later
refactoring.



P.S.

The corresponding INSERT operator works fine:

  DECLARE r table%ROWTYPE;
  ...
  INSERT INTO table (SELECT r.*);

Please say if a similar syntax for UPDATE exists.
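
P.P.S.
A workaround sketch that avoids listing the columns (my own assumption, not an
existing UPDATE syntax; note it is not fully equivalent - DELETE/INSERT triggers
fire, and foreign keys with ON DELETE actions may react):

  DECLARE r table%ROWTYPE;
  ...
  DELETE FROM table WHERE id = r.id;
  INSERT INTO table SELECT r.*;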


Re: [GENERAL] How to modify ENUM datatypes? (The solution)

2008-08-12 Thread Dmitry Koterov
Here is a solution for "on the fly" ALTER ENUM:
http://en.dklab.ru/lib/dklab_postgresql_enum/

Usage:

-- Add a new element to the ENUM "on the fly".
SELECT enum.enum_add('my_enum', 'third');

-- Remove an element from the ENUM "on the fly".
SELECT enum.enum_del('my_enum', 'first');

Possibly future versions of PostgreSQL will include a built-in ALTER TYPE for
ENUMs - as you can see above, implementing it is not impossible.
Hope this will be helpful.
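
For the curious, the core of such user-space solutions is direct manipulation of
the pg_enum catalog. A minimal sketch of the "add" part for 8.3/8.4-era catalogs
(a sketch of the general idea, not the exact code of the library above; it assumes
a type named my_enum, real code should also guard against OID overlap, and in these
versions the sort order of labels follows the row OIDs, so a freshly added label
effectively sorts last):

-- roughly what an enum_add('my_enum', 'third') boils down to:
INSERT INTO pg_enum (enumtypid, enumlabel)
VALUES ('my_enum'::regtype, 'third');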



On Wed, Apr 23, 2008 at 4:25 AM, Merlin Moncure <[EMAIL PROTECTED]> wrote:

> On Tue, Apr 22, 2008 at 6:11 PM, Jeff Davis <[EMAIL PROTECTED]> wrote:
> >  If you store an integer reference instead, joins are not necessarily
> >  expensive. If the number of distinct values is small (which is the
> >  normal use case for ENUM), I would expect the joins to be quite cheap.
> >  Beware of running into bad plans however, or making the optimizer work
> >  too hard (if you have a lot of other joins, too).
>
> Necessarily being the operative word here.  Think about an enum as
> part of a composite key for example.  It's a lot nicer to rely on enum
> for natural ordering than doing something like a functional index.
>
> Anyways, it's pretty easy to extend an enum...you can manually insert
> an entry into pg_enum (see the relevent docs).  Just watch out for oid
> overlap.  One thing currently that is very difficult currently to do
> is to alter the order of the enum elements.  The current state of
> things is pretty workable though.
>
> Scott's color/mystuff example is generally preferred for a lot of
> cases.  I _really_ prefer this to surrogate style enums where you have
> color_id...this approach makes your database unreadable IMO.  A decent
> hybrid approach which I have been using lately is "char" (not char)
> where the choices set is reasonably small, well represented by a
> single character, and the intrinsic ordering property is not too
> important (where an enum might be better).  In many cases though, the
> pure natural approach is simply the best.  The enum though with is
> intrinsic ordering and more efficient indexing has an important niche
> however.
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] How to modify ENUM datatypes? (The solution)

2008-08-13 Thread Dmitry Koterov
About LGPL compatibility - I don't know.
But the license is not a problem; this code is totally free (it is too simple
to be anything else).
LGPL has just been my favorite license for years. :-)

I'll change it if you prefer another license - just explain why. (Why BSD?
Because BSD is the PostgreSQL license?)


On Wed, Aug 13, 2008 at 4:25 AM, Merlin Moncure <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 12, 2008 at 5:40 PM, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> > Here is the solution about "on the fly" ALTER ENUM:
> > http://en.dklab.ru/lib/dklab_postgresql_enum/
> >
> > Usage:
> >
> > -- Add a new element to the ENUM "on the fly".
> >
> > SELECT enum.enum_add('my_enum', 'third');
> >
> > -- Remove an element from the ENUM "on the fly".
> > SELECT enum.enum_del('my_enum', 'first');
> >
> > Possibly future versions of PostgreSQL will include built-in ALTER TYPE
> for
> > ENUM, all the more its implementation is not impossible, as you see
> above.
> > Hope this will be helpful.
>
> Decent user space solution...it's easy enough.  IMO 'real' solution is
> through alter type as you suggest.  It's worth noting there there is
> no handling for the unlikely but still possible event of oid
> wraparound.  Also, there is no 'enum_insert', which is not so pleasant
> with how enums are implemented.
>
> Also, is lgpl compatible with bsd licnese? Not that it matters, but I'm
> curious.
>
> merlin
>


Re: [GENERAL] How to get many data at once?

2008-08-13 Thread Dmitry Koterov
Try to use

SELECT ARRAY(SELECT t_data FROM THETABLE WHERE t_ref_id = '1') AS v;

In PHP you may fetch all matched values as a single string and then use
explode() to split it into values (possibly with stripslashes afterwards).
It is much faster than fetching thousands of rows.
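
A variation that may be even simpler to split on the client side (a sketch; it
assumes the chosen delimiter never occurs inside t_data):

SELECT array_to_string(
         ARRAY(SELECT t_data FROM THETABLE WHERE t_ref_id = '1'),
         '|') AS v;

-- in PHP: $values = explode('|', $v);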


On Thu, Aug 7, 2008 at 3:03 PM, 窦德厚(ddh) <[EMAIL PROTECTED]> wrote:

> Hi, if I have such a table:
>
> t_ref_id t_data
> 
> 1 'abc'
> 2 '321'
> 1 'ddd'
> 2 'xyz'
> 9 '777'
> ...
>
>
> I want to get data with a special t_ref_id:
>
> SELECT t_data FROM THETABLE WHERE t_ref_id = '1';
>
> I must use a while loop to extract the data (I'm using PHP):
>
> $rows = array();
> while (($row = pgsql_fetch_assoc($result) !== false) {
> $rows[] = $row;
> }
>
> And if there are many matched rows, such as many hundreds or thousands of
> rows, I think such a loop maybe inefficient.
>
> How to do this in a more efficient way?
>
> Thank you!
>
>
>
> --
> ddh
>


Re: [GENERAL] How to modify ENUM datatypes? (The solution)

2008-08-13 Thread Dmitry Koterov
Done.
Now it's BSD licensed. :-)

On Wed, Aug 13, 2008 at 4:57 PM, Merlin Moncure <[EMAIL PROTECTED]> wrote:

> On Wed, Aug 13, 2008 at 5:20 AM, Dmitry Koterov <[EMAIL PROTECTED]> wrote:
> > About LGPL - I don't know.
> > But the license is not a problem, this code is totally freeware (because
> too
> > simple).
> > LGPL is just my favorite license type for years. :-)
> >
> > I'll change this if you prefer another license and explain, why (why BSD?
> > BSD is the PostgreSQL license?)
>
> yup.
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] selecting data from subquery in same order

2008-08-17 Thread Dmitry Koterov
You may use something like this in a stored function (declared as RETURNS SETOF
users, so that RETURN QUERY can be used):

DECLARE
    a INTEGER[];
BEGIN
    a := '{2341548, 2325251, 2333130, 2015421, 2073536, 2252374, 2273219,
           2350850, 2367318, 2032977, 2032849}';
    RETURN QUERY SELECT * FROM users WHERE id = ANY(a) ORDER BY idx(a, id);
END;

Or in plain SQL (with the array literal in both places):

select * from users
where id = any('{2341548, 2325251, 2333130, 2015421, 2073536, 2252374, 2273219,
                 2350850, 2367318, 2032977, 2032849}'::int[])
order by idx('{2341548, 2325251, 2333130, 2015421, 2073536, 2252374, 2273219,
               2350850, 2367318, 2032977, 2032849}'::int[], id);

Note that this is pretty fast only if the array contains not too many elements
(e.g. around 20); do not use it for large arrays! (The idx() function comes from
the intarray contrib module.)



On Sun, Aug 17, 2008 at 4:11 AM, mark <[EMAIL PROTECTED]> wrote:

> hi
> if i execute this statement:
>
> select * from users where id in (2341548, 2325251, 2333130, 2015421,
> 2073536, 2252374, 2273219, 2350850, 2367318, 2032977, 2032849, )
>
> the order of rows obtained is random.
>
> is there anyway i can get the rows in the same order as the ids in
> subquery? or is there a different statement i can use?
> thanks!
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] How to execute 'set session role' from plpgsql function?

2008-08-17 Thread Dmitry Koterov
Just a suggestion:

EXECUTE 'SET SESSION ROLE ' || wishedrole;

won't help?
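
I.e., something along these lines inside the function (a sketch; quote_ident()
protects role names that need quoting):

  IF resetrole = true THEN
    RESET ROLE;
    RETURN;
  END IF;
  EXECUTE 'SET SESSION ROLE ' || quote_ident(wishedrole);
  RETURN;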


2008/8/17 Oleg Vasylenko <[EMAIL PROTECTED]>

> Hi,everybody!
>
> I wish to have a function with code above, but compiller generate
> syntactic error at the line "SET SESSION ROLE wishedrole;".
>
> How to pass the wishedrole value to the structure?
>
> CREATE OR REPLACE FUNCTION f_switch_role(INOUT text,INOUT boolean) AS
>  $BODY$
>  DECLARE
>   wishedrole ALIAS FOR $1;
>   resetrole ALIAS FOR $2;
>  BEGIN
>   if resetrole=true then
>RESET ROLE;
>RETURN;
>   end if;
>
> ERROR OCURS AT THE NEXT LINE <<
>   SET SESSION ROLE wishedrole;
>   RETURN;
>
>  END;
>  $BODY$
>  LANGUAGE 'plpgsql' VOLATILE
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Open source PostgreSQL type parsers in PHP?

2008-08-17 Thread Dmitry Koterov
Hello.

Are there any open source and well-tested PHP libraries to parse and build
the following PostgreSQL data types?

- arrays (including N-dimensional): {a,b,c}
- hstore: "a=>b, c=>d"
- ROW: ("a","b","(""c"",""d"")")

E.g. I have some PHP variable and want to build corresponding PostgreSQL
value.
Or, I have fetched a PostgreSQL value as a string and want to parse it into
PHP variable.


Re: [GENERAL] Changing between ORDER BY DESC and ORDER BY ASC

2008-08-18 Thread Dmitry Koterov
The approach from
http://www.mail-archive.com/pgsql-general@postgresql.org/msg111788.html
probably won't match an index, because the ASC or DESC ordering depends NOT on
the table's data, but on the function parameter.

Unfortunately the planner does not recognize the following case:

CREATE TABLE "public"."prime" (
  "num" NUMERIC NOT NULL,
  CONSTRAINT "prime_pkey" PRIMARY KEY("num")
) WITH OIDS;

CREATE INDEX "prime_idx" ON "public"."prime"
  USING btree ((CASE WHEN true THEN num ELSE (- num) END));

CREATE OR REPLACE FUNCTION "public"."prime_test" (a boolean) RETURNS SETOF
integer AS
$body$
DECLARE
rec RECORD;
BEGIN
FOR rec IN
select *
from prime
order by case when a then num else -num end
limit 20
LOOP
RETURN NEXT rec.num;
END LOOP;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

EXPLAIN ANALYZE select * from prime_test(true);
-- hundreds of seconds - so the index is not used

It seems the planner does not understand that the "a" variable is the constant
"true" within the query, and so it does not use the prime_idx index (even though
prime_idx is deliberately defined as the dummy CASE WHEN true THEN ... ELSE ... END).

William, you may try to use the EXECUTE statement with a custom-built query
that has ASC or DESC inserted into it.
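
A minimal sketch of that EXECUTE approach, reusing the names from the example
above (prime_test2 is just a variant name; the ORDER BY direction is built as
text, so the plain btree index on num can be scanned forward or backward):

CREATE OR REPLACE FUNCTION prime_test2(a boolean) RETURNS SETOF numeric AS
$body$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN EXECUTE
        'SELECT num FROM prime ORDER BY num '
        || CASE WHEN a THEN 'ASC' ELSE 'DESC' END
        || ' LIMIT 20'
    LOOP
        RETURN NEXT rec.num;
    END LOOP;
END;
$body$
LANGUAGE plpgsql;

-- EXPLAIN ANALYZE SELECT * FROM prime_test2(true);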



On Mon, Aug 18, 2008 at 3:31 PM, Sergey Konoplev <[EMAIL PROTECTED]> wrote:

> On Fri, Aug 15, 2008 at 9:35 PM, William Garrison <[EMAIL PROTECTED]>wrote:
>
>> Is there an easy way to write one single query that can alternate between
>> ASC and DESC orders?  Ex:
>>
>>
> Take a look at this link
> http://www.mail-archive.com/pgsql-general@postgresql.org/msg111788.html
>
> --
> Regards,
> Sergey Konoplev
>


[GENERAL] TSearch2: find a QUERY that does match a single document

2008-09-12 Thread Dmitry Koterov
Hello.

TSearch2 allows searching a table of tsvectors with a single tsquery.
I need to solve the reverse problem.

I have a large table of tsqueries. I need to find all tsqueries in that table
that match a single document tsvector:
CREATE TABLE "test"."test_tsq" (
  "id" SERIAL,
  "q" TSQUERY NOT NULL,
  CONSTRAINT "test_tsq_pkey" PRIMARY KEY("id")
);

insert into test.test_tsq(q)
select to_tsquery(g || 'x' || g) from generate_series(10, 90) as g;

explain analyze
select * from test.test_tsq
where to_tsvector('40x40') @@ q

This gets a strange explain analyze:

QUERY PLAN
Seq Scan on test_tsq  (cost=0.00..17477.01 rows=800 width=36) (actual
time=68.698..181.458 rows=1 loops=1)
  Filter: ('''40x40'':1'::tsvector @@ q)
Total runtime: 181.484 ms

No matter whether I create a GiST index on test_tsq.q or not, the EXPLAIN ANALYZE
result is the same.
So why "rows=800"? The table contains many more rows...


Re: [GENERAL] TSearch2: find a QUERY that does match a single document

2008-09-13 Thread Dmitry Koterov
>
> explain analyze
>> select * from test.test_tsq
>> where to_tsvector('40x40') @@ q
>>
>
> why do you need tsvector @@ q ? Much better to use  tsquery = tsquery
>
> test=# explain analyze select * from test_tsq where q =
> '40x40'::tsque>
>QUERY PLAN
>
> ---
>  Seq Scan on test_tsq  (cost=0.00..16667.01 rows=1 width=38) (actual
> time=129.208..341.111 rows=1 loops=1)
>   Filter: (q = '''40x40'''::tsquery)
>  Total runtime: 341.134 ms
> (3 rows)
>
M-mmm... It seems you understood me incorrectly.

I have to find NOT the queries which are exactly equal to another query, BUT
the queries which MATCH the GIVEN document. '40x40' was only a sample;
in real cases it will be a 1-2K document.

Here is a more realistic sample:

explain analyze
select * from test.test_tsq
where to_tsvector('
  Here is a real document text. It may be long, 1-2K.
  In this sample it contains a lexem "40x40", so there is a tsquery
  in test_tsq.q which matches this document. I need to find all such queries
fast.
  Of course, in real cases the document text is unpredictable.
') @@ q



> QUERY PLAN
>> Seq Scan on test_tsq  (cost=0.00..17477.01 rows=800 width=36) (actual
>> time=68.698..181.458 rows=1 loops=1)
>>  Filter: ('''40x40'':1'::tsvector @@ q)
>> Total runtime: 181.484 ms
>>
>
> '800' is the number of estimated rows, which is not good, since you got
> only 1 row.
>
Why 800? The table contains 80 rows, and a seqscan is used. Does it scan
the whole table or not? If yes, is there possibly a bug in the EXPLAIN output?
(No matter whether I create a GiST index on test_tsq.q or not, the number of rows is
still 800, so it seems to me that the GiST index is not used at all.)


[GENERAL] Is there bigintarray?

2008-09-13 Thread Dmitry Koterov
Hello.

We have a good intarray contrib module which contains a lot of features:
additional functions, operators with GIN support etc.

Are there plans for bigintarray?


[GENERAL] Indirect access to NEW or OLD records

2008-09-25 Thread Dmitry Koterov
Hello.

I have a variable with a field name and want to extract this field
value from NEW record:

DECLARE
field VARCHAR = 'some_field';
BEGIN
...
value := NEW.{field};  -- ???
END;

Is it possible in pl/pgsql?

I have found one speed-inefficient solution: convert NEW to a string and
then use EXECUTE with a custom-built query to extract the value from
that constant string. But it is too slow. Is there a better solution?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Indirect access to NEW or OLD records

2008-09-25 Thread Dmitry Koterov
I have tried plperl, but the following sample does not work:

CREATE FUNCTION "extract_field_as_varchar" (rec record, field varchar)
RETURNS varchar AS
$body$
...
$body$
LANGUAGE 'plperl' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

It says "ERROR:  plperl functions cannot take type record".

So I cannot even create a utility function to be called as:

value := extract_field_as_varchar(NEW, 'field');

It seems I have to write the entire trigger in plperl to access the NEW record
indirectly?
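
If so, a minimal sketch of such a plperl trigger (in a PL/Perl trigger the new row
is exposed as the hash reference $_TD->{new}, so a field can be picked by a name
computed at runtime; the table and column names here are made up):

CREATE OR REPLACE FUNCTION log_some_field_trg() RETURNS trigger AS
$body$
    my $field = 'some_field';               # field name known only at runtime
    my $value = $_TD->{new}{$field};        # indirect access to NEW.some_field
    elog(NOTICE, "new value of $field is $value");
    return;                                 # leave the row unmodified
$body$
LANGUAGE plperl;

CREATE TRIGGER log_some_field
    BEFORE INSERT OR UPDATE ON tbl
    FOR EACH ROW EXECUTE PROCEDURE log_some_field_trg();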




On Thu, Sep 25, 2008 at 10:08 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Dmitry Koterov" <[EMAIL PROTECTED]> writes:
>> I have a variable with a field name and want to extract this field
>> value from NEW record:
>
>> DECLARE
>> field VARCHAR = 'some_field';
>> BEGIN
>> ...
>> value := NEW.{field};  -- ???
>> END;
>
>> Is it possible in pl/pgsql?
>
> No.  Quite aside from the lack of syntax for that, plpgsql would be
> unhappy if the data type wasn't the same on successive executions.
>
> Use one of the less strongly typed PLs instead.  I believe this is
> pretty easy in plperl or pltcl.  (Maybe plpython too, but I don't know
> that language.)
>
>regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Visual DATA editor for PostgreSQL?

2009-12-30 Thread Dmitry Koterov
Hello.

Is there a GUI utility to visually edit Postgres DATA (not a database
schema!), which allows at least:
- insert/update rows using screen windowed forms (possibly ugly
auto-generated forms, but - still forms)
- insert foreign key references by selecting them from a list (not by typing
the keys manually)
- work with multi-line text fields (textarea) for TEXT columns

There are a lot of DB development tools around (e.g. I like EMS PostgreSQL
Manager). But a developer tool is handy for editing the database STRUCTURE;
when we need to modify its DATA quickly, these tools are not very useful.


Re: [GENERAL] Visual DATA editor for PostgreSQL?

2009-12-31 Thread Dmitry Koterov
Thanks, but it seems Lightning Admin does not support choosing foreign keys
from a list (at least in its demo version).
Any other solutions?

On Thu, Dec 31, 2009 at 10:47 AM, Tony Caduto <
tony_cad...@amsoftwaredesign.com> wrote:

> Dmitry Koterov wrote:
>
>> Hello.
>>
>> Is there a GUI utility to visually edit Postgres DATA (not a database
>> schema!), which allows at least:
>> - insert/update rows using screen windowed forms (possibly ugly
>> auto-generated forms, but - still forms)
>> - insert foreign key references by selecting them from a list (not by
>> typing the keys manually)
>> - work with multi-line text fields (textarea) for TEXT columns
>>
>> There is a lot of DB development tools around (e.g. I like EMS PostgreSQL
>> Manager). But a developer tool is handy for a database STRUCTURE editing,
>> and when we need to modify its DATA quickly, these tools are not too useful.
>>
> Lightning Admin has a form view when editing data and shows text fields as
> multi line in the form view and in the data grid itself.
>
> Later,
>
>
> Tony
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Split pg_dump dump into files and then combine it back

2010-01-14 Thread Dmitry Koterov
Hello.

Is there a tool (or a way) to parse a pg_dump'ed (structure only) dump file
into smaller files (each function in its own file, each table in its own,
etc.) with the ability to combine these files later back into a proper dump file?

The main problem is dependencies. Sometimes functions are defined before
types, sometimes after; the same goes for cross-schema references. Dump parsing
is simple, but it drops the dependencies between objects, so the process cannot
be reversed easily.


[GENERAL] How to create "auto-increment" field WITHOUT a sequence object?

2011-06-30 Thread Dmitry Koterov
Hello.

I need to create an auto-increment field on a table WITHOUT using sequences:

CREATE TABLE tbl(
  name TEXT,
  uniq_id INTEGER
);

Each INSERT to this table must generate a new uniq_id which is distinct from
all others.

The problem is that these INSERTs are often rolled back (i.e. they are
executed within a transaction block which is rolled back from time to time); this
is an existing design of the current architecture and unfortunately we have
to live with it. And I need uniq_id generation to be as compact (with as few
"holes") as possible - this is a VERY important requirement (to export
these values into external systems which accept only IDs limited from 1 to
10).

So I cannot use sequences: a sequence value is obviously not rolled back, so
if I insert nextval(...) as uniq_id, I will have large holes (because of the
frequent transaction rollbacks) and exhaust the 10 uniq_ids very fast. How can I
deal with all this without sequences?

I tried

BEGIN;
LOCK TABLE tbl;
INSERT INTO tbl(uniq_id) VALUES((SELECT max(uniq_id) FROM tbl) + 1);
COMMIT;

but it seems this takes too heavy a lock - from time to time this query times
out (or sometimes deadlocks with other queries).

Is there any other, lighter locking which allows me to guarantee that no
INSERTs are performed into tbl between the max() calculation and the insert
itself, but does not lock the whole table?


Re: [GENERAL] How to create "auto-increment" field WITHOUT a sequence object?

2011-06-30 Thread Dmitry Koterov
Thank you.

It does not have to be "fully" gapless. The main goal is to keep uniq_id as low as
possible, so as not to exhaust the 10 values too fast.
I think solutions with additional tables look too complicated for this case;
is there a possibility of not using an additional table?

On Thu, Jun 30, 2011 at 10:55 PM, A.M.  wrote:

>
> On Jun 30, 2011, at 2:40 PM, Dmitry Koterov wrote:
>
> > Hello.
> >
> > I need to create an auto-increment field on a table WITHOUT using
> sequences:
>
> This problem pops up a lot for invoice sequence numbers for the tax office
> and related cases. (Search for "gapless sequence".)
>
> Since the numbers are really only needed for an external system (as you
> mention), then it may make sense to generate the gapless IDs when necessary
> and map the generated IDs to the rows later. The drawback is that some rows
> in the table will not have the gapless ID until the batch job is run, but
> all rows will still be addressable by the real sequence ID.
>
> Cheers,
> M
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] How to create "auto-increment" field WITHOUT a sequence object?

2011-06-30 Thread Dmitry Koterov
Thanks to all, I'll try to live with this solution.
(It is not ideal, but pretty good for my case.)

A better way may be to create an additional table with ANY structure and
no data and LOCK that table instead of "tbl" itself. That theoretically
decreases contention - the only thing I need is a mutex
around a single update statement.
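
For the record, another common sketch of this (an assumption on my side, not
something already in the schema) is a dedicated one-row counter table: the row
lock taken by UPDATE serializes concurrent inserts, and since the UPDATE is
rolled back together with a failed transaction, the numbering stays compact.
The price is that all inserts into tbl are serialized on that one row.

CREATE TABLE tbl_uniq_id_counter (last_id integer NOT NULL);
INSERT INTO tbl_uniq_id_counter VALUES (0);

-- inside a BEFORE INSERT trigger on tbl:
UPDATE tbl_uniq_id_counter
   SET last_id = last_id + 1
   RETURNING last_id INTO NEW.uniq_id;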


On Fri, Jul 1, 2011 at 12:01 AM, Dmitry Koterov  wrote:

> OK.
>
> Possible next solution is ON AFTER UPDATE trigger:
>
> BEGIN
>   LOCK TABLE tbl IN SHARE UPDATE EXCLUSIVE MODE;
>UPDATE tbl SET uniq_id=(SELECT MAX(uniq_id) + 1 FROM tbl) WHERE id =
> NEW.id;
> END;
>
> Seems SHARE UPDATE EXCLUSIVE is a minimum locking which conflicts with
> itself and which does not conflict with pg_dump.
> (Unfortunately it conflicts with VACUUM which is performed by autovacuum
> process.)
>
> SHARE UPDATE EXCLUSIVE is better than default LOCK (which also blocks
> reading).
>
>
> On Thu, Jun 30, 2011 at 11:38 PM, A.M. wrote:
>
>>
>> On Jun 30, 2011, at 3:36 PM, Dmitry Koterov wrote:
>>
>> > ...possibly within ON AFTER INSERT trigger:
>> >
>> > BEGIN
>> >  pg_advisory_lock(0xDEADBEEF);
>> >  UPDATE tbl SET uniq_id=(SELECT MAX(uniq_id) + 1 FROM tbl) WHERE id =
>> > NEW.id;
>> > END;
>> >
>> > Would it work without explicit pg_advisory_unlock() - would the locking
>> be
>> > released on COMMIT/ROLLBACK?
>>
>> No- advisory locks are managed by the application, so that is exactly what
>> you don't want. The exclusive table lock is still exactly what you need
>> unless you can postpone the generation of the secondary IDs.
>>
>> Cheers,
>> M
>>
>>
>


[GENERAL] Dump a database excluding one table DATA?

2011-08-18 Thread Dmitry Koterov
Hello.

Is there any way (or hack) to dump the whole database, but to exclude the
DATA of one table from the dump? (The DDL of the table should not be
excluded: after restoring, the excluded table should simply look "empty".)

I see the -T switch of pg_dump, but it seems -T excludes the data AND the DDL of a
table (and possibly all objects which depend on this table?), so after
restoration the database structure sometimes ends up broken.


Re: [GENERAL] Dump a database excluding one table DATA?

2011-08-18 Thread Dmitry Koterov
1. I need to shrink the pg_dump output (for backup purposes), so pg_restore is
too late for that...

2. If I use "pg_dump -s" separately, the data may not load (or may load too slowly)
after that, because all the indices/foreign keys are already there. Is there a
way to split "pg_dump -s" into 2 parts: the first part dumps everything
excluding indices, checks and foreign keys, and the second part only those?
I'm not sure it is possible at all, because I don't think pg_dump places the data
exactly between these two blocks of DDL...



On Fri, Aug 19, 2011 at 12:04 AM, Adrian Klaver wrote:

> On Thursday, August 18, 2011 12:49:45 pm Dmitry Koterov wrote:
> > Hello.
> >
> > Is there any way (or hack) to dump the whole database, but to exclude the
> > DATA from a table within this dump? (DDL of the table should not be
> > excluded: after restoring the data the excluded table should look
> "empty".)
> >
> > I see -T switch of pg_dump, but seems -T excludes the data AND the DDL of
> a
> > table (and possibly all objects which depend on this table?), so after
> > restoration the database structure becomes broken sometimes.
>
> One way I know you can do it, is exclude the data from restoring. This
> requires
> you use the pg_dump custom format. For full details see here:
>
> http://www.postgresql.org/docs/9.0/interactive/app-pgrestore.html
>
> Short version use pg_restore -l to generate a listing from the dump file.
> Comment out the line that copys the data into that table.
> Use pg_restore ... -L to have pg_restore those items not commented out.
>
> Another way is do it using the -T switch for the 'complete' db dump. Then
> do a
> separate dump using -s (schema only) and -t some_table and then restore it
> on
> its own.
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Dump a database excluding one table DATA?

2011-08-18 Thread Dmitry Koterov
Thanks, "pg_dump --data-only --disable-triggers" is the king.

(Unfortunately it is not supported by pg_dumpall, but it is entirely another
story. :-)


On Fri, Aug 19, 2011 at 12:36 AM, Adrian Klaver wrote:

> On Thursday, August 18, 2011 1:23:25 pm Dmitry Koterov wrote:
> > 1. I need to shorten pg_dump results (for backup purposes), so pg_restore
> > is too late for that..
>
> >
> > 2. If I use "pg_dump -s" separately, the data may not load (or load to
> > slow) after that, because all indices/foreign keys are already there. Is
> > there a way to split "pg_dump -s" into 2 parts: the first part dumps
> > everything excluding indices, checks and foreign keys, and the second
> part
> > - only them? Not sure it is possible at all, because I think pg_dump may
> > dump data not between these two blocks of DDLs...
> >
>
> I am not sure I follow. Are you saying you eventually restore the data for
> that
> table as a separate step? If so, from the previous link, this might help:
>
> "
> --disable-triggers
>
>This option is only relevant when performing a data-only restore. It
> instructs pg_restore to execute commands to temporarily disable triggers on
> the
> target tables while the data is reloaded. Use this if you have referential
> integrity checks or other triggers on the tables that you do not want to
> invoke
> during data reload.
>
>Presently, the commands emitted for --disable-triggers must be done as
> superuser. So, you should also specify a superuser name with -S, or
> preferably
> run pg_restore as a PostgreSQL superuser.
> "
>
>
>
> --
> Adrian Klaver
> adrian.kla...@gmail.com
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Dump a database excluding one table DATA?

2011-08-18 Thread Dmitry Koterov
Mmm, --disable-triggers is surely not enough - we also have RULEs and (much
worse) INDEXes.

If we create all the indices and then restore all the data, it is MUCH SLOWER than
restoring the data first and then creating all the indices.
So I think there is no real workaround...

I propose adding an option to pg_dump to skip the data of selected tables. :-)
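
Something like this (hypothetical switch and table names, just to illustrate the idea):

pg_dump --exclude-table-data=public.big_table mydb > dump.sql

(As it happens, pg_dump 9.2 and later do provide a switch with exactly this name.)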



On Fri, Aug 19, 2011 at 12:44 AM, Dmitry Koterov  wrote:

> Thanks, "pg_dump --data-only --disable-triggers" is the king.
>
> (Unfortunately it is not supported by pg_dumpall, but it is entirely
> another story. :-)
>
>
> On Fri, Aug 19, 2011 at 12:36 AM, Adrian Klaver 
> wrote:
>
>> On Thursday, August 18, 2011 1:23:25 pm Dmitry Koterov wrote:
>> > 1. I need to shorten pg_dump results (for backup purposes), so
>> pg_restore
>> > is too late for that..
>>
>> >
>> > 2. If I use "pg_dump -s" separately, the data may not load (or load to
>> > slow) after that, because all indices/foreign keys are already there. Is
>> > there a way to split "pg_dump -s" into 2 parts: the first part dumps
>> > everything excluding indices, checks and foreign keys, and the second
>> part
>> > - only them? Not sure it is possible at all, because I think pg_dump may
>> > dump data not between these two blocks of DDLs...
>> >
>>
>> I am not sure I follow. Are you saying you eventually restore the data for
>> that
>> table as a separate step? If so, from the previous link, this might help:
>>
>> "
>> --disable-triggers
>>
>>This option is only relevant when performing a data-only restore. It
>> instructs pg_restore to execute commands to temporarily disable triggers
>> on the
>> target tables while the data is reloaded. Use this if you have referential
>> integrity checks or other triggers on the tables that you do not want to
>> invoke
>> during data reload.
>>
>>Presently, the commands emitted for --disable-triggers must be done as
>> superuser. So, you should also specify a superuser name with -S, or
>> preferably
>> run pg_restore as a PostgreSQL superuser.
>> "
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@gmail.com
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


[GENERAL] Announce: PGUnit - xUnit test framework for pl/pgsql

2008-11-11 Thread Dmitry Koterov
Hello.

Hope this will be helpful for agile developers.

http://en.dklab.ru/lib/dklab_pgunit/

PGUnit is a xUnit-style framework for stored procedures in PostgreSQL 8.3+.
It allows database developers to write automated tests for existed stored
procedures or develop procedures using concepts of Test Driven Development
(TDD). All test cases are stored in the database, so you don't need any
external resources (like files, version control, command-line utilities
etc.) to save tests.

As in traditional xUnit, tests may be grouped into test cases; each test case
may have its own environment initialization code ("fixture preparation
code", or setUp block). The main benefit of PGUnit is that the setUp block
(usually quite CPU intensive) is executed only once, and its effect is
stored in a savepoint. Then, all tests are executed from that savepoint, so
the fixture initialization overhead is minimal. All tests are still executed
independently, because their effects are automatically rolled back after
execution.

Comments are welcome.


[GENERAL] Propose: enum with all registered table names?

2008-11-20 Thread Dmitry Koterov
Hello.

Sometimes I want to store, in a table, a reference to another table's name.
E.g. I am creating an accounting system which (via triggers) logs which
record has changed and in which table:

CREATE TABLE log(
  tbl XXX,
  id INTEGER,
  blahblah
);

Currently XXX may be:
1) The table name. But it is quite inefficient in disk usage. Another bad thing:
if I rename a table, I have to update all the rows in the log table.
2) The table OID. It is very bad for pg_dump: after restoration the log table
will be unusable.
3) An ENUM with all table names. But I have to add an element to the ENUM each time
I create a new table, and, if I rename a table, I have to rename the ENUM
element too.

So it would be very useful if Postgres had a special system ENUM (e.g.
pg_catalog.table_names_enum) which holds the names of all tables in the database
(in schema.table format), and whose elements are automatically renamed when a
table is renamed.


[GENERAL] Is index enough to do simple fetch, or table is always used too?

2008-12-14 Thread Dmitry Koterov
Hello.

Suppose I have the following index:

CREATE INDEX idx ON tbl  USING btree (abc, def, id)

and perform the query with index scan:

SELECT md5(id)
FROM tbl
WHERE abc=1 AND def=2
LIMIT 200

*The question:* is the table "tbl" scanned to fetch "id" and calculate
md5(id), or is the value of "id" taken directly from the "idx" index with no
table data access at all? The second behaviour seems logical: why should we
access the table if all the needed data is already in the index entry...

(Some time ago I read something about this behaviour somewhere, but now I
cannot find it in the PostgreSQL documentation. Possibly it was about
another database, not Postgres?)


Re: [GENERAL] Is index enough to do simple fetch, or table is always used too?

2008-12-14 Thread Dmitry Koterov
On Sun, Dec 14, 2008 at 3:36 PM, David Rowley  wrote:

> 2008/12/14 Dmitry Koterov :
> > The question: if the table "tbl" scanned to fetch "id" and calculate
> > md5(id), or the value of "id" is brought directly from "idx" index with
> no
> > table data access at all? The second behaviour is logical: why should we
> > access the table if all the needed data is already in the index entry...
>
> In fact not all the required information is in the index. PostgreSQL
> uses multi-version concurrency control (MVCC), which means there may be
> multiple versions of the same row. PostgreSQL must hit the heap (table) to
> get the visibility information.
>

But isn't index data also multi-versioned?
If not, how could a single-versioned index be used to fetch data from a
past snapshot?


[GENERAL] Planner hints in SELECT queries?

2008-12-17 Thread Dmitry Koterov
Hello.

Sometimes I have to write the following SQL code:

SET something=off;
SET other=off;
SELECT * FROM ... ORDER BY id LIMIT 10;
RESET something;
RESET other;

(e.g. "something=off" may actually be "seq_page_cost = 10").

I propose to add a new SELECT clause to do this natively, like this:

SELECT *
FROM tbl
ORDER BY id
LIMIT 10
SETTING something=off, other=off

This would reset the settings back automatically after the query is executed.

(Why do I need it? Because sometimes, even after ANALYZE, a query gets a
non-optimal plan, even when I know exactly which plan should be used.)
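
For what it's worth, a similar effect is already achievable by wrapping the
query in a transaction and using SET LOCAL, which reverts automatically at
COMMIT or ROLLBACK; a minimal sketch (the settings are only examples):

BEGIN;
SET LOCAL enable_seqscan = off;   -- any planner GUC can be set this way
SET LOCAL seq_page_cost = 10;
SELECT * FROM tbl ORDER BY id LIMIT 10;
COMMIT;                           -- both settings revert here

The proposed SETTING clause would still be more convenient for a single
statement outside an explicit transaction.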


[GENERAL] Bgwriter and pg_stat_bgwriter.buffers_clean aspects

2008-12-25 Thread Dmitry Koterov
Hello.

I am trying to tune PostgreSQL write parameters to make write operations
fast on a heavily loaded database (a lot of inserts/updates).
After resetting the pg_stat_bgwriter statistics (I do it by deleting the
global/pgstat.stat file and restarting PostgreSQL) I monitor the following:

# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
                 0 |               0 |                  0 |             0 |                0 |             164 |          6653

So, you see that just after resetting the statistics we already have a large
value in buffers_backend. Why?
The documentation at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
says that "buffers_backend ... [is a number of] times a database backend
(probably the client itself) had to write a page in order to make space for
the new allocation".


[GENERAL] Bgwriter and pg_stat_bgwriter.buffers_clean aspects

2008-12-25 Thread Dmitry Koterov
Hello.

I am trying to tune PostgreSQL write parameters to make write operations
fast on a heavily loaded database (a lot of inserts/updates).
After resetting the pg_stat_bgwriter statistics (I do it by deleting the
global/pgstat.stat file and restarting PostgreSQL) I monitor the following:

# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
                 8 |               0 |              19092 |             0 |                0 |            2285 |         30148

So, you see that some time after resetting the statistics we have:
- a large value in buffers_backend;
- a zero buffers_clean.

Why?

Documentation:
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
says that "buffers_backend ... [is a number of]  times a database backend
(probably the client itself) had to write a page in order to make space for
the new allocation", and "buffers_clean ... [means that] the background
writer cleaned ... buffers (cleaned=wrote out dirty ones) during that time".

What I am trying to achieve is that all write operations are performed
asynchronously and mostly flushed to disk before a CHECKPOINT occurs, so
the CHECKPOINT is cheap thanks to the bgwriter's work.

Could you please explain what is happening, and what a large buffers_backend
together with a zero buffers_clean means?


Related parameters:

shared_buffers = 512MB
fsync = on
synchronous_commit = off
wal_writer_delay = 2000ms
checkpoint_segments = 20
checkpoint_timeout = 1min
checkpoint_completion_target = 0.8
checkpoint_warning = 1min
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10
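
To see at a glance which path wrote most of the buffers, a rough monitoring
query over pg_stat_bgwriter might look like this (a sketch; all of these
columns exist in 8.3+):

SELECT buffers_checkpoint, buffers_clean, buffers_backend,
       round(100.0 * buffers_clean /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
           AS pct_written_by_bgwriter,
       round(100.0 * buffers_backend /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
           AS pct_written_by_backends
FROM pg_stat_bgwriter;

For the goal described above, pct_written_by_backends should ideally stay low,
with most writes being counted under buffers_checkpoint and buffers_clean.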


[GENERAL] Strange "duplicate key value violates unique constraint" after "delete" at ON UPDATE trigger

2012-01-26 Thread Dmitry Koterov
Hello.

PostgreSQL 8.4. I am trying to create a trigger which removes "old" rows
from the table just before inserting a new one. The table has a UNIQUE
INDEX on a column, which is why I need to remove any old item with the same
value of that column before inserting a new one.

If I work without transactions (in auto-commit mode), everything seems to be
fine. But something strange happens when I use transactions.

The following SQL demonstrates the problem. How can I avoid the strange
"duplicate key value violates unique constraint" error (with the minimum
locking level)? And why does this error happen at all?

-- Prepare the fixture.
create table a(i integer);
CREATE UNIQUE INDEX a_idx ON a USING btree (i);
CREATE FUNCTION a_tr() RETURNS trigger AS
$body$
BEGIN
DELETE FROM a WHERE i = NEW.i;
RETURN NEW;
END;
$body$
LANGUAGE 'plpgsql';
CREATE TRIGGER a_tr BEFORE INSERT ON a FOR EACH ROW EXECUTE PROCEDURE
a_tr();


-- Check if the trigger really works. No unique constraint errors are thrown.
insert into a values(1);
insert into a values(1); --> ok


-- NOW IN CONNECTION (A):
begin;
insert into a values(1); --> do not commit!

-- THEN IN CONNECTION (B):
insert into a values(1); --> it hangs, because connection (A) is not committed - ok

-- NOW IN CONNECTION (A) AGAIN:
commit; --> ok

-- WE SEE AT CONNECTION (B) THE FOLLOWING IMMEDIATELY:
ERROR:  duplicate key value violates unique constraint "a_idx"


Re: [GENERAL] Strange "duplicate key value violates unique constraint" after "delete" at ON UPDATE trigger

2012-04-26 Thread Dmitry Koterov
I'm not sure the cause is that DELETE does not see the row.

It seems the following method solves the problem when two concurrent
transactions are active:

CREATE FUNCTION a_tr() RETURNS trigger AS
$body$
DECLARE
tmp INTEGER;
BEGIN
-- Lock until the matched entry (possibly phantom, i.e. not yet committed
-- by another transaction) is released.
SELECT i INTO tmp FROM a WHERE i = NEW.i FOR UPDATE;

-- The lock is released here in one of two cases:
--   1. Matched row was phantom, so tmp IS NULL now.
--   2. Matched row was real and committed, so tmp holds its ID.
-- So we cannot use the ID in tmp - it is not always returned. That's why
-- we have to duplicate the selection predicate above...
DELETE FROM a WHERE i = NEW.i;

RETURN NEW;
END;
$body$
LANGUAGE 'plpgsql';

But this method still does not work if 3 or more transactions are active
(if I commit the first and then the second, the third fails with a
"duplicate key" error).

Is there any universal method which could be implemented purely in a
trigger?



On Fri, Jan 27, 2012 at 3:45 PM, Julian v. Bock  wrote:

> Hi
>
> >>>>> "DK" == Dmitry Koterov  writes:
>
> DK> create table a(i integer);
> DK> CREATE UNIQUE INDEX a_idx ON a USING btree (i);
> DK> CREATE FUNCTION a_tr() RETURNS trigger AS
> DK> $body$
> DK> BEGIN
> DK> DELETE FROM a WHERE i = NEW.i;
> DK> RETURN NEW;
> DK> END;
> DK> $body$
> DK> LANGUAGE 'plpgsql';
> DK> CREATE TRIGGER a_tr BEFORE INSERT ON a FOR EACH ROW EXECUTE PROCEDURE
> DK> a_tr();
>
> The DELETE doesn't see the row the other transaction inserted and
> doesn't delete anything (and doesn't block). This happens later when the
> row is inserted and the index is updated.
>
> You can try the insert and catch the unique violation in a loop (see
> http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html)
> although that won't work with a BEFORE trigger.
>
> Regards,
> Julian
>
> --
> Julian v. Bock   Projektleitung Software-Entwicklung
> OpenIT GmbH  Tel +49 211 239 577-0
> In der Steele 33a-41 Fax +49 211 239 577-10
> D-40599 Düsseldorf   http://www.openit.de
> 
> HRB 38815 Amtsgericht Düsseldorf USt-Id DE 812951861
> Geschäftsführer: Oliver Haakert, Maurice Kemmann
>
>


Re: [GENERAL] Strange "duplicate key value violates unique constraint" after "delete" at ON UPDATE trigger

2012-04-26 Thread Dmitry Koterov
After two hours of experiments, I finally suppose that there is no way to
satisfy this unique index from within a trigger without locking the whole
table. So Julian seems to be right (unfortunately). Only the

LOOP
BEGIN
INSERT ...;
EXIT;
EXCEPTION WHEN unique_violation THEN
DELETE FROM ... WHERE ;
END;
END LOOP;

construction helps. There seems to be no way to implement the same using
triggers only.
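
Wrapped into a complete function, this approach might look as follows (a
sketch reusing the table "a" from the original example; the function name is
arbitrary):

CREATE OR REPLACE FUNCTION insert_replacing(new_i integer) RETURNS void AS
$body$
BEGIN
    LOOP
        BEGIN
            INSERT INTO a(i) VALUES (new_i);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Another transaction committed a row with the same key:
            -- remove it and retry the insert.
            DELETE FROM a WHERE i = new_i;
        END;
    END LOOP;
END;
$body$
LANGUAGE plpgsql;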



On Thu, Apr 26, 2012 at 3:39 PM, Dmitry Koterov  wrote:

> I'm not sure the cause is that DELETE does not see the row.
>
> Seems the following method solves the problem when 2 same-time
> transactions are active:
>
> CREATE FUNCTION a_tr() RETURNS trigger AS
> $body$
> DECLARE
> tmp INTEGER;
> BEGIN
> -- Lock until the matched entry (possibly phantom, i.e. not yet committed
> -- by another transaction) is released.
> SELECT i INTO tmp FROM a WHERE i = NEW.i FOR UPDATE;
>
> -- The lock is released here in one of two cases:
> --   1. Matched row was phantom, so tmp IS NULL now.
> --   2. Matched row was real and committed, so tmp holds its ID.
> -- So we cannot use the ID in tmp - it is not always returned. That's why
> -- we have to duplicate the selection predicate above...
> DELETE FROM a WHERE i = NEW.i;
>
> RETURN NEW;
> END;
> $body$
> LANGUAGE 'plpgsql';
>
> But this method still does not work if 3 or more transactions are active
> (if I commit first and commit second, the third fails with "duplicate key"
> error).
>
> Are there any universal method which could be implemented purely in a
> trigger?..
>
>
>
> On Fri, Jan 27, 2012 at 3:45 PM, Julian v. Bock  wrote:
>
>> Hi
>>
>> >>>>> "DK" == Dmitry Koterov  writes:
>>
>> DK> create table a(i integer);
>> DK> CREATE UNIQUE INDEX a_idx ON a USING btree (i);
>> DK> CREATE FUNCTION a_tr() RETURNS trigger AS
>> DK> $body$
>> DK> BEGIN
>> DK> DELETE FROM a WHERE i = NEW.i;
>> DK> RETURN NEW;
>> DK> END;
>> DK> $body$
>> DK> LANGUAGE 'plpgsql';
>> DK> CREATE TRIGGER a_tr BEFORE INSERT ON a FOR EACH ROW EXECUTE PROCEDURE
>> DK> a_tr();
>>
>> The DELETE doesn't see the row the other transaction inserted and
>> doesn't delete anything (and doesn't block). This happens later when the
>> row is inserted and the index is updated.
>>
>> You can try the insert and catch the unique violation in a loop (see
>> http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html
>> )
>> although that won't work with a BEFORE trigger.
>>
>> Regards,
>> Julian
>>
>> --
>> Julian v. Bock   Projektleitung Software-Entwicklung
>> OpenIT GmbH  Tel +49 211 239 577-0
>> In der Steele 33a-41 Fax +49 211 239 577-10
>> D-40599 Düsseldorf   http://www.openit.de
>> 
>> HRB 38815 Amtsgericht Düsseldorf USt-Id DE 812951861
>> Geschäftsführer: Oliver Haakert, Maurice Kemmann
>>
>>
>
>


[GENERAL] Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

2012-04-26 Thread Dmitry Koterov
Hello.

For example, I have 2 functions like these:

CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
*STABLE*

and

CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
*STABLE*

And then I call

EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);

Should these two queries take the same time to execute (i.e. does
PostgreSQL generate the same plans for the inner queries)?

I always thought that the answer is YES: if a function is STABLE with
language=SQL, its SQL code is embedded into the outer context after all
arguments are expanded into their values (so the plan is built after
argument expansion). But some days ago I hit a case where second() works
about 100 times faster than first(), and the cause seems to be that the
planner does not see all of the expanded arguments in first() (if I replace
the arguments with constants in first(), especially in the LIMIT clause, it
starts to run at the same speed as second() does). Unfortunately, EXPLAIN
ANALYZE does not descend into functions and shows only the overall time, so
I have no real information about which plan is actually used in first().

Could you please comment on this case a bit?
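
For concreteness, a minimal hypothetical pair following the pattern above
could look like this (the table "tbl" with columns abc, def and id is purely
illustrative):

CREATE OR REPLACE FUNCTION first(a integer, b integer, lim integer)
RETURNS SETOF tbl AS
$body$
    -- The body is planned with $1..$3 as parameters.
    SELECT * FROM tbl WHERE abc = $1 AND def = $2 ORDER BY id LIMIT $3;
$body$
LANGUAGE sql STABLE;

CREATE OR REPLACE FUNCTION second(a integer, b integer, lim integer)
RETURNS SETOF tbl AS
$body$
BEGIN
    -- The statement is built as text, so the planner sees the actual values
    -- (including the LIMIT) when the statement is executed.
    RETURN QUERY EXECUTE
        'SELECT * FROM tbl WHERE abc = ' || quote_literal(a) ||
        ' AND def = ' || quote_literal(b) ||
        ' ORDER BY id LIMIT ' || quote_literal(lim);
END;
$body$
LANGUAGE plpgsql STABLE;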


Re: [GENERAL] Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

2012-05-02 Thread Dmitry Koterov
Albe, thanks for the detailed explanation.

But it's quite strange that a SQL+STABLE function does not recalculate the
plan each time it is called, because when I use a bunch of SQL+STABLE
functions in e.g. a sub-select of a complex query, I see in the plans of
such complex queries that the function calls are "expanded". It looks like
PostgreSQL takes the SQL code defined in the SQL+STABLE functions and merges
(injects) it into the main query in place of the function call.

E.g.:

==
CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;
INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002) s;
CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");
ANALYZE a;

CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE LANGUAGE sql
AS 'SELECT $1.t';

explain analyze
select * from a
where a_get_t(a) = 't';

QUERY PLAN
Index Scan using a_t_idx on a  (cost=0.00..8.29 rows=2 width=5) (actual
time=0.041..0.043 rows=2 loops=1)
  Index Cond: ((t)::text = 't'::text)
==

You can see that the SQL code of a_get_t() was merged into the main query
plan, so the result is found using the index instead of a seqscan with a
t='t' filter.

That was a very simple example; I use much more complex SQL+STABLE functions
in my code, and I am practically sure that this SQL extraction+injection is
applied by PostgreSQL in other cases too (e.g. sometimes the planner
initiates a hash join with tables which are referred to inside SQL+STABLE
functions in sub-queries). If I replace STABLE with VOLATILE in those
complex cases, the effect disappears: no more SQL extraction is performed.

And more: assume we have a function f(x, y, z) VOLATILE with very complex
SQL inside and we call it like:

select * from f(1, 2, 3);

The query may take a long time (not only during the first call, but during
all the others too). Then I just replace VOLATILE with STABLE for f(), and
the same query:

select * from f(1, 2, 3);

suddenly becomes very fast (e.g. 1000 times faster or even more). It's a
very common case: I've seen it many times. I thought that it was because
STABLE functions are re-planned on each call according to the actual passed
values...

If STABLE functions have frozen plans too (independent of the actual passed
argument values), how can we explain such a big difference in performance
when replacing VOLATILE with STABLE?
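
As Laurenz suggests below, the nested plans can be logged with the
auto_explain contrib module; a minimal sketch of session-level settings
(LOAD and these parameters require superuser privileges):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every statement's plan
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions
SET auto_explain.log_analyze = on;            -- optional: include actual timings

The plans then appear in the server log rather than in the psql output.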



On Fri, Apr 27, 2012 at 2:34 PM, Albe Laurenz wrote:

> Dmitry Koterov wrote:
> > For example, I have 2 functions like these:
> >
> > CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS
> ... AS
> > $body$
> > ...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
> > $body$
> > LANGUAGE 'sql'
> > STABLE
> >
> >
> > and
> >
> >
> > CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS
> ... AS
> > $body$
> > DECLARE
> > res ...;
> > BEGIN
> > EXECUTE '...the same SELECT, ' ||
> > 'but ' || quote_literal(a) || ' args are embedded, plus ' ||
> > 'LIMIT ' || quote_literal($3)
> > INTO res;
> > RETURN res;
> > END;
> > $body$
> > LANGUAGE 'plpgsql'
> > STABLE
> >
> > And then I call
> >
> > EXPLAIN ANALYZE SELECT * FROM first(...);
> > EXPLAIN ANALYZE SELECT * FROM second(...);
> >
> > Should these two queries be executed by the same time usage (i.e. does
> PostgreSQL generate same plans
> > for inner queries)?
> >
> > I always thought that the answer is YES: if a function is STABLE and
> with language=SQL, its SQL code
> > is embedded into outer context after all arguments are expanded into
> their values (so the plan is
> > built after argument expansion). But some days ago I detected a case
> when second() works about 100
> > times faster than first(), and the cause is seems that the planner
> does not see all of expanded
> > arguments in first() (if I replace arguments to constants in first(),
> especially in LIMIT clause, it
> > begins to work the same speed as second() does). Unfortunately EXPLAIN
> ANALYZE does not go into
> > functions and shows only overall time, so I have no real information
> about what plan is actually used
> > in first().
>
> You can get EXPLAIN plans if you use the auto_explain contrib module
> with auto_explain.log_nested_statements enabled.
>
> As you suspect, the tw

[GENERAL] Interval "1 month" is equals to interval "30 days" - WHY?

2012-08-07 Thread Dmitry Koterov
Hello.

I've just discovered a very strange thing:

SELECT '1 mon'::interval = '30 days'::interval   --> TRUE???

This returns TRUE (it also affects unique indexes created on an interval
column). Why?

I know that Postgres stores the months, days and seconds of interval values
separately. So how can I make "=" compare intervals "part-by-part" and not
treat "1 mon" as "30 days"?

P.S.
Reproduced at least in 8.4 and 9.1.


Re: [GENERAL] Interval "1 month" is equals to interval "30 days" - WHY?

2012-08-07 Thread Dmitry Koterov
...and even worse:

SELECT ('1 year'::interval) = ('360 days'::interval); --> TRUE :-)
SELECT ('1 year'::interval) = ('365 days'::interval); --> FALSE :-)


On Tue, Aug 7, 2012 at 4:42 PM, Dmitry Koterov  wrote:

> Hello.
>
> I've just discovered a very strange thing:
>
> SELECT '1 mon'::interval = '30 days'::interval   --> TRUE???
>
> This returns TRUE (also affected when I create an unique index using an
> interval column). Why?
>
> I know that Postgres stores monthes, days and seconds in interval values
> separately. So how to make "=" to compare intervals "part-by-part" and not
> treat "1 mon" as "30 days"?
>
> P.S.
> Reproduced at least in 8.4 and 9.1.
>


Re: [GENERAL] Interval "1 month" is equals to interval "30 days" - WHY?

2012-08-08 Thread Dmitry Koterov
Of course NOT.

'1 mon' and '30 days' have different meanings. So they should not be equal.

I understand that conversion to seconds is a more or less correct way to
compare intervals with ">" and "<". But equality is not the same as
ordering (e.g. equality is typically used in JOINs and unique indexes).

Now I have to use CREATE UNIQUE INDEX test ON tbl((interval_col::text)) and
use the same cast to TEXT in all JOINs and searches - this is very ugly.



On Wed, Aug 8, 2012 at 1:54 PM, Albe Laurenz wrote:

> Dmitry Koterov wrote:
> >> I've just discovered a very strange thing:
> >>
> >> SELECT '1 mon'::interval = '30 days'::interval   --> TRUE???
> >>
> >> This returns TRUE (also affected when I create an unique index using
> an
> >> interval column). Why?
> >>
> >> I know that Postgres stores monthes, days and seconds in interval
> values
> >> separately. So how to make "=" to compare intervals "part-by-part"
> and not
> >> treat "1 mon" as "30 days"?
> >>
> >> P.S.
> >> Reproduced at least in 8.4 and 9.1.
>
> > ...and even worse:
> >
> > SELECT ('1 year'::interval) = ('360 days'::interval); --> TRUE :-)
> > SELECT ('1 year'::interval) = ('365 days'::interval); --> FALSE :-)
>
> Intervals are internally stored in three fields: months, days
> and microseconds.  A year has 12 months.
>
> PostgreSQL converts intervals into microseconds before comparing them:
> a month is converted to 30 days, and a day is converted to 24 hours.
>
> Of course this is not always correct.
> But what should the result of
>   INTERVAL '1 month' = INTERVAL '30 days'
> be?  FALSE would be just as wrong.
>
> Yours,
> Laurenz Albe
>
>


Re: [GENERAL] Interval "1 month" is equals to interval "30 days" - WHY?

2012-08-24 Thread Dmitry Koterov
BTW, there is a much shorter version of this:

CREATE OR REPLACE FUNCTION int_equal(interval, interval) RETURNS boolean
IMMUTABLE STRICT LANGUAGE sql AS
'SELECT $1::text = $2::text';
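
Hypothetical usage, comparing part-by-part instead of via the built-in "=":

SELECT int_equal('1 mon'::interval, '30 days'::interval);  -- false
SELECT int_equal('1 mon'::interval, '1 mon'::interval);    -- true

(The text representation of an interval depends on the IntervalStyle setting;
that does not matter for a comparison within one session, but it is worth
keeping in mind if the IMMUTABLE function is used in an index.)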


On Wed, Aug 8, 2012 at 4:51 PM, Albe Laurenz wrote:

> Then maybe you should use something like this for equality:
>
> CREATE OR REPLACE FUNCTION int_equal(interval, interval) RETURNS boolean
>   IMMUTABLE STRICT LANGUAGE sql AS
>   'SELECT
> 12 * EXTRACT (YEAR FROM $1) + EXTRACT (MONTH FROM $1)
>   = 12 * EXTRACT (YEAR FROM $2) + EXTRACT (MONTH FROM $2)
> AND EXTRACT (DAY FROM $1) = EXTRACT (DAY FROM $2)
> AND 3600000000 * EXTRACT (HOUR FROM $1)
> + 60000000 * EXTRACT (MINUTE FROM $1)
> + EXTRACT (MICROSECONDS FROM $1)
>   = 3600000000 * EXTRACT (HOUR FROM $2)
> + 60000000 * EXTRACT (MINUTE FROM $2)
> + EXTRACT (MICROSECONDS FROM $2)';
>
>