Re: [GENERAL] Avoid sorting when doing an array_agg

2016-12-03 Thread Kiriakos Georgiou
The array_agg() has nothing to do with it.  It’s the group by.

Without knowing what you are conceptually trying to accomplish, I can’t say 
much. 

On my test 9.4.10 db, a similar example does a HashAggregate, so no sorting 
(google HashAggregate vs GroupAggregate).  But it's still an expensive query 
because of all the I/O.

If I wanted to instantly have the user ids for a specific (first_name, last_name, 
category_id) combination, I'd maintain a summary table via an insert trigger on the 
users table.
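
For illustration, a minimal sketch of that summary-table idea (names are made up, it 
only handles inserts, and a real version would also need to deal with updates/deletes 
and with the race of two sessions inserting the first row for the same name at once):

create table user_ids_by_name (
    category_id integer,
    first_name  varchar,
    last_name   varchar,
    ids         integer[],
    primary key (category_id, first_name, last_name)
);

create or replace function users_summary_ins() returns trigger
language plpgsql as $$
begin
    -- append the new id to the matching summary row, or create the row
    update user_ids_by_name
       set ids = ids || new.id
     where category_id = new.category_id
       and first_name  = new.first_name
       and last_name   = new.last_name;
    if not found then
        insert into user_ids_by_name
        values (new.category_id, new.first_name, new.last_name, array[new.id]);
    end if;
    return null;
end;
$$;

create trigger users_summary_trg
    after insert on users
    for each row execute procedure users_summary_ins();

Lookups then become a single index probe on (category_id, first_name, last_name) 
instead of a large aggregate.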

 

Kiriakos

 

 

From:  on behalf of Alexis Woo 

Date: Friday, December 2, 2016 at 12:25 PM
To: 
Subject: [GENERAL] Avoid sorting when doing an array_agg

 

I have a users table which contains ~70 million rows that looks like this:

 

   Column    |       Type        |
-------------+-------------------+
 id          | integer           |
 first_name  | character varying |
 last_name   | character varying |
 category_id | integer           |
Indexes:
    "users_id_idx" btree (id)
    "users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)

 

I'm trying to retrieve the ids for each (first_name, last_name) couple for one 
specific category_id.

The query that I'm currently doing is the following:

 

select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;

 

For which the explain analyze output is the following:

 

 GroupAggregate  (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)
   Group Key: first_name, last_name
   ->  Sort  (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)
         Sort Key: first_name, last_name
         Sort Method: external merge  Disk: 25304kB
         ->  Bitmap Heap Scan on users  (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
               Recheck Cond: (category_id = 5432)
               Heap Blocks: exact=7938
               ->  Bitmap Index Scan on users_category_id_first_name_last_name_idx  (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
                     Index Cond: (category_id = 5432)

 

What seems to greatly decrease the performance of the query is the "Sort 
Method: external merge Disk: 7526kB."

 

Is it possible to aggregate the ids without doing a sort?

If not, what other options, apart from increasing work_mem, do I have?

Thanks,

Alexis

 



[GENERAL] Announcement: application level undo/redo for PostgreSQL

2016-12-02 Thread Kiriakos Georgiou
Greetings,

 

I’d like to announce the availability of a PostgreSQL extension that I’ve sat 
on for a year – it’s time to share!  It provides undo/redo functionality useful 
to application developers who want to persist application objects in a 
PostgreSQL database.  It does some gymnastics using jsonb under the hood, so 
you’ll need to be on PostgreSQL 9.4 or later.

 

The bad news is that currently you have to model the application object with 
just one table (multi-table undo/redo is on the to-do list).  The good news is 
that it implements non-linear, multi-timeline undo/redo: it keeps a full history 
of table operations and lets you jump anywhere in that history and continue 
operating from there.

 

For an article detailing how to use it see: 
http://www.mockbites.com/articles/tech/istoria

You can get it from here: https://github.com/KiriakosGeorgiou/istoria

 

This was inspired by the airspace editor project I’m currently working on, where 
undo/redo is managed by the application.  I wanted to push all this to the 
database so that a) tables don’t get polluted with undo/redo metadata and b) the 
application does not need to implement undo/redo logic.

 

I hope you find it useful.

 

Regards,

Kiriakos Georgiou



[GENERAL] pg_multixact issues

2016-02-10 Thread Kiriakos Georgiou
Hello,

Our pg_multixact directory keeps growing.  I did a "vacuum freeze" which didn’t 
help.  I also did a "vacuum full" which didn’t help either.
We had this condition with 9.3.4 as well.  When I upgraded our cluster to 9.4.5 
(via plain sql dump and load) the issue was, as expected, resolved, but now it’s 
happening again.  Luckily it has no ill effect other than consuming 4G of space 
for an otherwise 1G database.

Can you offer any hints as to how I can cure this?

thanks,
Kiriakos Georgiou
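
For reference, a small diagnostic sketch for this kind of situation (assuming a 
9.3/9.4 cluster and superuser access); pg_multixact can only be truncated up to 
the oldest datminmxid across all databases, so that is the first thing to check:

-- compare datminmxid to NextMultiXactId from pg_controldata; template0 and
-- template1 count too, since the cluster-wide minimum gates truncation
select datname, datfrozenxid, datminmxid from pg_database;

-- within a given database, the tables holding that horizon back
select relname, relfrozenxid, relminmxid
from pg_class
where relkind = 'r';

-- a database-wide VACUUM FREEZE, run while connected to *each* database,
-- is what advances relminmxid/datminmxid; old pg_multixact segments are
-- then removed at a later checkpoint
vacuum freeze;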


pg_controldata output:

pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6211781659140720513
Database cluster state:               in production
pg_control last modified:             Wed Feb 10 13:45:02 2016
Latest checkpoint location:           D/FB5FE630
Prior checkpoint location:            D/FB5FE558
Latest checkpoint's REDO location:    D/FB5FE5F8
Latest checkpoint's REDO WAL file:    000000010000000D000000FB
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/3556219
Latest checkpoint's NextOID:          2227252
Latest checkpoint's NextMultiXactId:  2316566
Latest checkpoint's NextMultiOffset:  823062151
Latest checkpoint's oldestXID:        668
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  3556219
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Wed Feb 10 13:45:02 2016
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current wal_log_hints setting:        off
Current max_connections setting:      100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   1024
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

the offsets directory:

-rw-------   1 postgres dba   262144 Nov  3 15:22 0000
-rw-------   1 postgres dba   262144 Nov  5 12:45 0001
-rw-------   1 postgres dba   262144 Nov  9 14:25 0002
-rw-------   1 postgres dba   262144 Nov 13 10:10 0003
-rw-------   1 postgres dba   262144 Nov 16 15:40 0004
-rw-------   1 postgres dba   262144 Nov 20 09:55 0005
-rw-------   1 postgres dba   262144 Dec  1 08:00 0006
-rw-------   1 postgres dba   262144 Dec  9 11:50 0007
-rw-------   1 postgres dba   262144 Dec 16 08:14 0008
-rw-------   1 postgres dba   262144 Dec 21 09:40 0009
-rw-------   1 postgres dba   262144 Dec 31 09:55 000A
-rw-------   1 postgres dba   262144 Jan  4 21:17 000B
-rw-------   1 postgres dba   262144 Jan  6 10:50 000C
-rw-------   1 postgres dba   262144 Jan  7 18:20 000D
-rw-------   1 postgres dba   262144 Jan 13 13:55 000E
-rw-------   1 postgres dba   262144 Jan 15 11:55 000F
-rw-------   1 postgres dba   262144 Jan 22 07:50 0010
-rw-------   1 postgres dba   262144 Jan 26 16:35 0011
-rw-------   1 postgres dba   262144 Jan 29 10:16 0012
-rw-------   1 postgres dba   262144 Feb  3 13:17 0013
-rw-------   1 postgres dba   262144 Feb  3 16:13 0014
-rw-------   1 postgres dba   262144 Feb  4 08:24 0015
-rw-------   1 postgres dba   262144 Feb  5 13:20 0016
-rw-------   1 postgres dba   262144 Feb  8 11:26 0017
-rw-------   1 postgres dba   262144 Feb  8 11:46 0018
-rw-------   1 postgres dba   262144 Feb  8 12:25 0019
-rw-------   1 postgres dba   262144 Feb  8 13:19 001A
-rw-------   1 postgres dba   262144 Feb  8 14:23 001B
-rw-------   1 postgres dba   262144 Feb  8 15:32 001C
-rw-------   1 postgres dba   262144 Feb  8 17:01 001D
-rw-------   1 postgres dba   262144 Feb  8 19:19 001E
-rw-------   1 postgres dba   262144 Feb  8 22:11 001F
-rw-------   1 postgres dba   262144 Feb  9 01:44 0020
-rw-------   1 postgres dba   262144 Feb  9 05:57 0021
-rw-------   1 postgres dba   262144 Feb  9 10:45 0022
-rw-------   1 postgres dba    98304 Feb 10 13:35 0023

the members directory has 15723 files:
ls -l|wc -l
   15723

Re: [GENERAL] NOTIFY/LISTEN on server, asynchronous processing

2012-12-11 Thread Kiriakos Georgiou
If I'm understanding you correctly, you want a job queue.

This involves polling and retrieving jobs to work on them.  The polling can be 
assisted by listen/notify so workers don't poll unnecessarily.  The retrieving 
has to be done intelligently to avoid concurrency issues.

Kiriakos Georgiou
http://mockbites.com/about/email
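
For what it's worth, a sketch of the retrieval step on PostgreSQL 9.5 or later 
(hypothetical jobs table with job_id and status columns, arbitrary channel name); on 
older releases the advisory-lock trick discussed elsewhere in this archive plays the 
same role as SKIP LOCKED:

-- worker session: sleep on notifications instead of polling blindly
listen job_queue;

-- when woken (or on a periodic fallback), claim one job atomically;
-- SKIP LOCKED makes concurrent workers pass over rows already claimed
update jobs
   set status = 'running'
 where job_id = (
        select job_id
          from jobs
         where status = 'queued'
         order by job_id
         limit 1
           for update skip locked
       )
returning *;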

On Dec 11, 2012, at 2:29 PM, rektide rekt...@voodoowarez.com wrote:

 Hi all, I'm writing to seek help for making asynchronous, decoupled 
 processes run on a
 Postgres server.
 
 Here's my current harebrained workings:
 1. Create a table async_process and attach an AFTER trigger to it.
 2. Establish a dblink to localhost.
 3. dblink_send_query('update async_process set counter = counter + 1;') from 
 other sprocs
 4. Designated processing hanging off this async_process table now runs.
 
 All I'm doing is using a table to create a trigger that can be run 
 asynchronously.
 
 There are at least two things gross about this strategy:
 1. An async_process table exists only because I need a trigger that can be 
 updated at will.
 2. Having to dblink to oneself to run a query from inside the database 
 asynchronously.
 
 Postgres has a capability for doing async work: NOTIFY/LISTEN. I'd like to 
 verify first,
 LISTEN is only for clients, correct? There's no way I can define something 
 resident on
 Postgres itself that will LISTEN, that can be targeted by notifications?
 
 Does anyone have suggestions for decoupling work done on a server, for 
 breaking up a task
 into multiple asynchronous pieces? I believe I've described 1. a viable if 
 ugly means of
 doing so, and 2. limitations in the primary asynchronous toolsuite of 
 Postgres, and am
 looking for ways to make more progress.
 
 Regards,
 -rektide
 
 





Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more softly

2012-03-19 Thread Kiriakos Georgiou
Try this:   http://klicman.org/throttle/

Kiriakos


On Mar 19, 2012, at 12:06 AM, Aleksey Tsalolikhin wrote:

 Hi.  When pg_dump runs, our application becomes inoperative (too
 slow).  I was going to ask if nice'ing the postgres backend process
 that handles the COPY would help but I just realized probably the
 pg_dump takes out locks when it runs and nice'ing it would just make
 it run longer...
 
 However, the man page says pg_dump "does not block other users
 accessing the database (readers or writers)".  But if we run a
 pg_dump, the phone starts ringing: users are complaining that the web
 app is not working.
 
 Would appreciate some pointers to help me reconcile these two
 apparently contradictory facts.
 
 Best,
 -at
 




Re: [GENERAL] Anonymized database dumps

2012-03-19 Thread Kiriakos Georgiou
The data anonymizer process is flawed because you are one misstep away from 
data spillage.  Sensitive data should be stored encrypted to begin with.  For test 
databases you or your developers can invoke a process that replaces the real 
encrypted data with fake encrypted data (for which everybody has the 
key/password).  Or if the overhead is too much (i.e., billions of rows), you can 
have different decrypt() routines on your test databases that return fake data 
without touching the real encrypted columns.

Kiriakos
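
A rough sketch of that replacement step (table, column, and key names are all 
invented; it assumes the sensitive column holds pgcrypto ciphertext): on the test 
copy you overwrite the real ciphertext with values encrypted under a key the whole 
team knows:

-- on the test/dev copy only
create extension if not exists pgcrypto;

update users
   set email_enc = pgp_sym_encrypt(
           'user' || id || '@example.com',   -- deterministic fake value
           'shared-test-passphrase');        -- key everybody on the team has

-- developers then read it back with
select pgp_sym_decrypt(email_enc, 'shared-test-passphrase') from users;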

On Mar 19, 2012, at 8:22 AM, Bill Moran wrote:

 In response to Janning Vygen vy...@kicktipp.de:
 
 I am working on postgresql 9.1 and loving it!
 
 Sometimes we need a full database dump to test some performance issues 
 with real data.
 
 Of course we don't like to have sensitive data like bunches of e-mail 
 addresses on our development machines, as they are of no interest to 
 developers and should be kept secure.
 
 So we need an anonymized database dump. I thought about a few ways to 
 achieve this.
 
 1. The best solution would be a special db user and some rules which fire on 
 reading some tables and replace private data with some random data. Now 
 doing a dump as this special user doesn't even copy the sensitive data at 
 all. The user just has a different view on this database even when he 
 calls pg_dump.
 
 But as rules are not fired on COPY it can't work, right?
 
 2. The other solution I can think of is something like
 
 pg_dump | sed  pgdump_anon
 
 where 'sed' does a lot of magical replace operations on the content of 
 the dump. I don't think this is going to work reliably.
 
 3. More reliable would be to dump the database, restore it on a 
 different server, run some sql script which randomizes some data, and 
 dump it again. Hmm, this seems to be the only reliable way so far. But it is 
 no fun when dumping and restoring takes an hour.
 
 Does anybody have a better idea how to achieve an anonymized database dump?
 
 I highly recommend #3.  It's how we do it where I work.
 
 At first it seems like a big, slow, complicated monster, but once you've
 built the tools and have it running reliably it's very nice.  Our system
 does the dumps overnight via cron (we have over 100 production databases)
 then changes the sensitive data, as well as changing all the passwords to
 'password' so developers can easily log in as any account.  During the
 day, the developers have access to all the sanitized dump files and can
 use them to make as many testing databases as they need.  Yes, the data
 gets up to 24 hours out of date, but it's never been a problem for us.
 
 -- 
 Bill Moran
 http://www.potentialtech.com
 http://people.collaborativefusion.com/~wmoran/
 




Re: [GENERAL] Anonymized database dumps

2012-03-19 Thread Kiriakos Georgiou
On Mar 19, 2012, at 5:55 PM, Bill Moran wrote:

 
 Sensitive data should be stored encrypted to begin.  For test databases you 
 or your developers can invoke a process that replaces the real encrypted 
 data with fake encrypted data (for which everybody has the key/password.)  
 Or if the overhead is too much (ie billions of rows), you can have different 
 decrypt() routines on your test databases that return fake data without 
 touching the real encrypted columns.
 
 The thing is, this process has the same potential data spillage
 issues as sanitizing the data.  


Not really. In the modality I describe, the sensitive data is always encrypted 
in the database and useless, because nobody will have the private key or know 
the password that protects it other than the ops subsystems that require access.
So even if you take an ops dump, load it onto a test box, and walk away, you are 
good.  If your developers/testers want to play with the data they will be 
forced to overwrite and stage test encrypted data they can decrypt, or call 
a fake decrypt() that gives them test data (e.g., joins to a test data table).

Kiriakos


Re: [GENERAL] Anonymized database dumps

2012-03-18 Thread Kiriakos Georgiou
I would store sensitive data encrypted in the database.  Check the pgcrypto 
module.

Kiriakos
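
For example, a minimal pgcrypto sketch (table, column, and passphrase are invented; 
in real use the passphrase comes from the application, not a script):

create extension pgcrypto;

create table customer (
    id        serial primary key,
    name      text,
    email_enc bytea          -- ciphertext, not plain text
);

insert into customer (name, email_enc)
values ('Janning', pgp_sym_encrypt('someone@example.com', 'app-secret'));

select name, pgp_sym_decrypt(email_enc, 'app-secret') as email
from customer;

A pg_dump of such a table ships only ciphertext, so those columns need no further 
anonymization.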


On Mar 18, 2012, at 1:00 PM, Janning Vygen wrote:

 Hi,
 
 I am working on postgresql 9.1 and loving it!
 
 Sometimes we need a full database dump to test some performance issues with 
 real data.
 
 Of course we don't like to have sensitive data like bunches of e-mail 
 addresses on our development machines, as they are of no interest to 
 developers and should be kept secure.
 
 So we need an anonymized database dump. I thought about a few ways to achieve 
 this.
 
 1. The best solution would be a special db user and some rules which fire on 
 reading some tables and replace private data with some random data. Now doing 
 a dump as this special user doesn't even copy the sensitive data at all. The 
 user just has a different view on this database even when he calls pg_dump.
 
 But as rules are not fired on COPY it can't work, right?
 
 2. The other solution I can think of is something like
 
 pg_dump | sed  pgdump_anon
 
 where 'sed' does a lot of magical replace operations on the content of the 
 dump. I don't think this is going to work reliably.
 
 3. More reliable would be to dump the database, restore it on a different 
 server, run some sql script which randomizes some data, and dump it again. 
 Hmm, this seems to be the only reliable way so far. But it is no fun when 
 dumping and restoring takes an hour.
 
 Does anybody have a better idea how to achieve an anonymized database dump?
 
 regards
 Janning
 
 
 
 
 
 -- 
 Kicktipp GmbH
 
 Venloer Straße 8, 40477 Düsseldorf
 Sitz der Gesellschaft: Düsseldorf
 Geschäftsführung: Janning Vygen
 Handelsregister Düsseldorf: HRB 55639
 
 http://www.kicktipp.de/
 




Re: [GENERAL] How to isolate the result of SELECT's?

2012-03-18 Thread Kiriakos Georgiou
In all likelihood you do not want to do what you described (lock the tables).  
You need a basic understanding of MVCC and transaction isolation levels 
to determine whether the default behavior of MVCC + READ COMMITTED is sufficient.  
In a lot of cases it is.

Kiriakos 
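
If all that's really needed is a stable view of the data across several SELECTs - 
rather than actually blocking writers - a snapshot gives you that without any table 
locks.  A minimal sketch, using the table names from the message below:

begin isolation level repeatable read;

select field from table1;
-- ... other work ...
select other_field from table2;

commit;

-- Both SELECTs see the same snapshot, taken at the transaction's first query;
-- rows committed by other sessions after that point are simply not visible,
-- and concurrent INSERTs/UPDATEs are never blocked.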


On Mar 18, 2012, at 7:33 PM, Andre Lopes wrote:

 Hi,
 
 I need to do an operation in which I will use some SELECTs and get the
 results, but I want to be sure that those tables have not been
 changed by INSERTs or UPDATEs during the operation.
 
 Example:
 
 BEGIN OPERATION
 Select field from table1;
 ...
 Select other_field from table2;
 ...
 END OPERATION
 
 How can I lock these tables to ensure that the tables are not getting
 INSERTs or UPDATEs during the operation?
 
 Best Regards,
 




Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?

2012-03-13 Thread Kiriakos Georgiou
+1 to seamless partitioning. 

The idea of having a student work on this seems a bit scary, but what 
seems scary to me may be a piece of cake for a talented kid :-)

Kiriakos
http://www.mockbites.com



On Mar 13, 2012, at 3:07 PM, John R Pierce wrote:

 On 03/08/12 12:01 PM, Andy Colson wrote:
 
 2) better partitioning support.  Something much more automatic. 
 
  That would be really high on our list, and something that can handle 
  adding/dropping partitions while there are concurrent transactions involving 
  the partitioned table.
 
 also a planner that can cope with optimizing prepared statements where the 
 partitioning variable is a passed parameter.




Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Kiriakos Georgiou
Instead of the joins you can use a subquery to get the first address.

Or you can do the joins without the group by and use row_number() 
over (partition by companies.id) in the select list to label each company 
address with a number starting at 1.  You then just keep the rows that have 
row_number = 1.  See 
http://www.postgresql.org/docs/current/static/tutorial-window.html to get a 
feel for how window functions work.
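
A sketch of that second approach, built on the query quoted below (treating the 
lowest addresses.id as the "first" address):

select *
from (
    select companies.id, companies.name, companies.organizationkind,
           addresses.address1, addresses.address2, addresses.city, addresses.zip,
           row_number() over (partition by companies.id
                              order by addresses.id) as rn
    from companies
    join addresses_reference on companies.id = addresses_reference.refid_companies
    left join addresses on addresses_reference.refid_addresses = addresses.id
) t
where rn = 1;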

Kiriakos

On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:

 Hi,
 
 the following statement worked on mysql but gives me an error on postgres:
 
  column "addresses.address1" must appear in the GROUP BY clause or be used in 
  an aggregate function
 
 I guess I am doing something wrong. I read the web answers, but none of them 
 seem to meet my needs:
 
 SELECT 
 companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
  FROM companies JOIN addresses_reference ON 
 companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON 
 addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
 
 
 What I did now was create a view based on above statement but without 
 grouping. This returns a list with non-distinct values for all companies that 
 have more than one address, which is correct. But in some cases I only need 
 one address and the problem is that I cannot use distinct.
 
 I wanted to have some way to display a companies list that gives me only the 
 first stored address for each company, and disregards any further addresses.
 
 Is there any way to do this?
 
 Thanks
 Alex



Re: [GENERAL] Regarding NOTIFY

2012-03-09 Thread Kiriakos Georgiou
Yes, can do.  Just have an insert trigger on the jobs table that notifies the 
monitor, something like:

CREATE OR REPLACE FUNCTION notify_monitor()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    -- wake up any session that is LISTENing on the "monitor" channel
    NOTIFY monitor;
    RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$;

CREATE TRIGGER jobs_trigger
AFTER INSERT
ON jobs
FOR EACH STATEMENT
EXECUTE PROCEDURE notify_monitor();
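
A quick way to see it fire (a sketch; any insert will do):

-- session 1 (the monitor):
LISTEN monitor;

-- session 2 (a client scheduling work):
INSERT INTO jobs DEFAULT VALUES;   -- adjust to your jobs table's columns

-- session 1 then receives an asynchronous notification on channel "monitor";
-- psql prints it the next time it processes a command.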

Then all the monitor has to do is wait for notifications.  For C, see 
http://www.postgresql.org/docs/9.1/static/libpq-notify.html
ruby-pg (the official Ruby API to PostgreSQL) has wait_for_notify(), which 
pretty much implements what the PostgreSQL docs suggest.

Kiriakos
http://www.mockbites.com



On Mar 9, 2012, at 11:53 AM, Bret Stern wrote:

 We have a concrete batching application composed of two parts.
 1. The Monitor. 
 The Monitor cycles every 60 seconds, and looks into a Postgresql table
 for jobs to run. Primarily these jobs update Postgresql tables with
 data from external applications.
 
 2. The Client.
 The client schedules orders etc.
 
 When a new product or customer is added to the Accounting or Batching
 Controller (both external applications; and databases) the Client user
 clicks a button and adds a job to run on the Monitor.
 
 Is it possible to use the NOTIFY event to serve more like an interrupt,
 and trigger the Monitor to run immediately?
 
 Can it be used with VB?
 or
 Should I use LibPQ?
 
 Any suggestions welcome.
 
 
 If this is the wrong list for these questions, let me know?
 
 Bret Stern
 
 
 
 
 
 
 




Re: [GENERAL] Advisory transaction lock for 128-bit space

2012-03-08 Thread Kiriakos Georgiou
Indeed, if there is not some sort of implementation limitation, it would be 
cool to be able to lock two big integers like so:

pg_try_advisory_xact_lock(key1 bigint, key2 bigint)

That would solve your problem with locking UUIDs (although you still couldn't 
lock UUIDs simultaneously across different tables without risking lock 
interference.)  It would also enable the use of advisory locks on multiple 
tables that have bigserial (bigint) as the primary key, eg:

pg_try_advisory_xact_lock(t.id, t.tableoid::bigint)

Obviously you don't need a bigint for tableoid, but being able to lock two 
bigints allows you to go to 16 bytes if need be.
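
Until something like that exists, one workaround for the UUID case is to hash the 
whole value down to 64 bits and lock on that - a sketch (the helper name is made up, 
and a hash collision, however unlikely, would make two different UUIDs share a lock):

create or replace function uuid_lock_key(u uuid) returns bigint
language sql immutable as $$
    -- first 16 hex digits of md5(uuid) read as a 64-bit integer
    select ('x' || substr(md5(u::text), 1, 16))::bit(64)::bigint;
$$;

select pg_try_advisory_xact_lock(uuid_lock_key('d2b5e5a2-1c6a-4d58-9d8a-3f0c9a0e7b11'::uuid));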

This came up when I was thinking about how to implement processing queues.  
It's doable if you assign an int4 id for each queue row (each queue is limited 
to not grow beyond 2B rows, which seems reasonably generous), then you can do:

pg_try_advisory_xact_lock(t.qid, t.tableoid::int4)

This is supported by the current PostgreSQL version.

Kiriakos


On Mar 7, 2012, at 12:52 PM, Andrey Chursin wrote:

 Hello.
 My application need to set advisory lock on UUID key, almost like it
 does pg_advisory_xact_lock function. The problem is argument type of
 this function - it consumes 8-byte value, not 16-byte.
 
 I cannot lock on any (high, low or middle) 8-byte part of the UUID, as
 that can produce unexpected deadlock issues, because locking on some
 ID in this way will imply locking a wider set of IDs than I
 requested.
 
 Now I am doing the 'trick' using indexing insert/delete, e.g.:
 INSERT INTO table_with_uuid_pk(locking_value);
 DELETE FROM table_with_uuid_pk WHERE inserted_row_above;
 
 It works, but I did not find any description of such a 'feature' of
 indexes. Can you, please, help to solve this synchronization issue, and
 comment on the way I am dealing with it now (with index locking)?
 
 P.S. The most significant fear I now have is that the currently used
 method suffers from the same problem as locking on part of the UUID - does
 insert/delete really lock only on the value I passed to it?
 
 -- 
 Regards,
 Andrey
 



Re: [GENERAL] SELECT FOR UPDATE could see committed transaction partially.

2012-03-05 Thread Kiriakos Georgiou
This is correct behavior with MVCC.  Do a 'select * from x' in thread 2 to 
understand why.  The SELECT FOR UPDATE in thread 2 sees the data in table x as 
it was prior to thread 1 committing, thus it won't see the row with a=2.
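
If the goal is simply for thread 2 to end up with whatever row is TRUE after 
thread 1 commits, the usual READ COMMITTED workaround is to re-issue the locking 
query when it comes back empty - a sketch:

-- thread 2, default READ COMMITTED isolation
begin;

select * from x where b = true for update;   -- blocks, then returns 0 rows (as described)

-- re-run the same statement: it takes a fresh snapshot and now sees (2, TRUE)
select * from x where b = true for update;

commit;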

For further suggestions you'll have to explain what you are logically trying to 
accomplish.

Kiriakos


On Mar 5, 2012, at 1:41 AM, Sam Wong wrote:

 Hi,
  
 I hit an UPDATE/LOCK issue in my application and the result has surprised me 
 somewhat…
 And for the repro, it boils down into this:
 ---
 CREATE TABLE x (a int, b bool);
 INSERT INTO x VALUES (1, TRUE);
 COMMIT;
  
 _THREAD 1_:
 BEGIN;
 UPDATE x SET b=FALSE;
 INSERT INTO x VALUES (2, TRUE);
  
 _THREAD 2_:
 BEGIN;
 SELECT * FROM x WHERE b=TRUE FOR UPDATE; -- It will be blocked, as expected
  
 _THREAD 1_:
 COMMIT;
  
 _THREAD 2_ will be unblocked.  It will return no rows.
 I expect it to return (2, TRUE) instead, when I design the program.
  
 If I issue the same SELECT query in THREAD 2 right now, it does indeed return 
 (2, TRUE).
  
 For the same SQL statement, in MySQL-InnoDB, it does return (2, TRUE) in the 
 first SELECT.
  
 I understand why this happens in PgSQL (because it first limited the 
 selection and locked the rows; upon unlock it rechecks the condition)…
 
 I don’t like THREAD 2 only seeing half of the fact of the committed transaction 
 (it sees the effect of the update but not the insert); is there anything I 
 could do?
  
 I considered:
 * ISOLATION SERIALIZABLE – but then thread 2 would abort as a deadlock…
 * Both threads do: LOCK TABLE x IN SHARE ROW EXCLUSIVE MODE – this does resolve my 
 issue but it creates a big lock contention problem, and relies on the app to do 
 the right thing.
 * Advisory lock – pretty much the same, except that I could unlock earlier to 
 make the locking period shorter, but nevertheless it’s a whole-table lock…
  
 Thoughts?
  
 Thanks,
 Sam



Re: [GENERAL] Counting different strings (OK%, FB%) in same table, grouped by week number

2012-02-22 Thread Kiriakos Georgiou
I'd code it more generally to allow for any user type:

select
   yw, substr(id,1,2) as user_type, count(1)
from
   pref_money
group by
   yw, user_type

You can use some clever pivoting to get the user_types to be columns, but I see 
no need to waste db cycles.
You can get the report you want by one-pass processing of the above result set.
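
If the pivoted layout really is wanted straight out of SQL, conditional counts do it 
in one pass - a sketch (on 9.4+ you could write count(*) filter (where ...) instead 
of the CASE expressions):

select
    yw,
    sum(case when id like 'OK%' then 1 else 0 end) as ok_users,
    sum(case when id like 'VK%' then 1 else 0 end) as vk_users,
    sum(case when id like 'FB%' then 1 else 0 end) as fb_users,
    sum(case when id like 'GG%' then 1 else 0 end) as gg_users,
    sum(case when id like 'MR%' then 1 else 0 end) as mr_users,
    sum(case when id like 'DE%' then 1 else 0 end) as de_users
from pref_money
group by yw
order by yw desc;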

If you have mountains of data, I'd precompute the user_type (before insert, or 
during insert via a trigger) and store it separately.

Kiriakos
http://www.mockbites.com



On Feb 22, 2012, at 3:36 PM, Alexander Farber wrote:

 Hello,
 
 I have a table holding week numbers (as strings)
 and user ids starting with OK, VK, FB, GG, MR, DE
 (coming through diff. soc. networks to my site):
 
 afarber@www:~ psql
 psql (8.4.9)
 Type help for help.
 
 pref= select * from pref_money;
 
        id       | money  |   yw
 ----------------+--------+---------
  OK19644992852  |      8 | 2010-44
  OK21807961329  |    114 | 2010-44
  FB1845091917   |    774 | 2010-44
  OK172682607383 |    -34 | 2010-44
  VK14831014     |     14 | 2010-44
  VK91770810     |   2368 | 2010-44
  DE8341         |    795 | 2010-44
  VK99736508     |     97 | 2010-44
 
 I'm trying to count those different users.
 
 For one type of users (here Facebook) it's easy:
 
 
 pref= select yw, count(*) from pref_money
where id like 'FB%' group by yw order by yw desc;
 
     yw    | count
 ----------+-------
  2012-08  |    32
  2012-07  |    32
  2012-06  |    37
  2012-05  |    46
  2012-04  |    41
 
 But if I want to have a table displaying all users
 (a column for FB%, a column for OK%, etc.) -
 then I either have to perform a lot of copy-paste and
 vim-editing, or maybe someone can give me some advice?
 
 I've reread the having-doc at
 http://www.postgresql.org/docs/8.4/static/tutorial-agg.html
 and still can't figure it out...
 
 Thank you
 Alex
 




[GENERAL] order of evaluation of search arguments

2012-02-15 Thread Kiriakos Georgiou
As I understand it, the order of evaluation of search arguments is up to the 
optimizer.  I've tested the following query, which is supposed to take advantage 
of advisory locks to skip over rows that are locked by other consumers running 
the exact same query, and it seems to work fine.  It seems to me the optimizer 
should always prefer to scan by status.  What say you?

begin;

select *
from queue q
where
   q.status = 'unprocessed'
   and pg_try_advisory_xact_lock(q.qid, q.tableoid::int) = true
limit 2 -- specify batch size here
for update;

-- now process the queue items

Kiriakos





Re: [GENERAL] order of evaluation of search arguments

2012-02-15 Thread Kiriakos Georgiou
I tested it by visual inspection of advisory locks in pg_locks; once with a 
small test table, and once on a larger 'operations' table in our test 
environment.  It seemed to work, but I hear you - I don't like to depend on the 
mood of the optimizer.  The drawback of the subquery version is that if the 
queue table gets big (e.g., consumers are shut off) it will spend a lot of time 
building a worktable only to get a few items from it later in the outer query.  
Perhaps putting a limit in the inner query as well can alleviate that, like:  
LIMIT (expected number of consumers) * (batch size)
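
Putting Tom's suggestion (quoted below) together with that inner limit, the shape 
would be something like this (a sketch; lock_ns is just the tableoid carried out of 
the sub-query):

begin;

select *
from (
    select q.*, q.tableoid::int as lock_ns
    from queue q
    where q.status = 'unprocessed'
    limit 200        -- expected number of consumers * batch size
    offset 0         -- optimization fence: rows are chosen by status first
) sub
where pg_try_advisory_xact_lock(sub.qid, sub.lock_ns)
limit 2              -- batch size
for update;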

There is a theoretical race condition with this approach (fast vs slow 
consumers) but it's harmless - a consumer that executes the inner query slowly 
may get fewer or no items to process, although there might be some, but they 
will be picked up eventually, so it's no big deal.

thanks,
Kiriakos

On Feb 16, 2012, at 12:44 AM, Tom Lane wrote:

 Kiriakos Georgiou kg.postgre...@olympiakos.com writes:
 As I understand it the order the of evaluation of search arguments is up to 
 the optimizer.  I've tested the following query, that is supposed to take 
 advantage of advisory locks to skip over rows that are locked by other 
 consumers running the exact same query and it seems to work fine.  It seems 
 to me the optimizer should always prefer to scan by status.  What say you?
 
 When it breaks, you get to keep both pieces.  Was your testing even
 capable of noticing the problem if the query locked more rows than you
 wanted?
 
 Less dangerous coding practices might involve putting the lock function
 in an outer query, while using an OFFSET 0 in the sub-query as an
 optimization fence.
 
   regards, tom lane
 

