Re: [GENERAL] Avoid sorting when doing an array_agg
The array_agg() has nothing to do with it; it's the GROUP BY. Without knowing what you are conceptually trying to accomplish, I can't say much. On my test 9.4.10 db, a similar example does a HashAggregate, so no sorting (google HashAggregate vs GroupAggregate). But it's still an expensive query because of all the I/O. If I wanted to instantly have the user ids for a specific first name, last name, and category combo, I'd maintain a summary table via an insert trigger on the users table.

Kiriakos

From: on behalf of Alexis Woo
Date: Friday, December 2, 2016 at 12:25 PM
Subject: [GENERAL] Avoid sorting when doing an array_agg

I have a users table which contains ~70 million rows that looks like this:

        Column    |       Type
    --------------+--------------------
     id           | integer
     first_name   | character varying
     last_name    | character varying
     category_id  | integer
    Indexes:
        "users_id_idx" btree (id)
        "users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)

I'm trying to retrieve the ids for each (first_name, last_name) couple for one specific category_id.
The query that I'm currently doing is the following:

    select array_agg(id)
    from users
    where category_id = 5432
    group by first_name, last_name;

For which the explain analyze output is the following:

    GroupAggregate  (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)
      Group Key: first_name, last_name
      ->  Sort  (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)
            Sort Key: first_name, last_name
            Sort Method: external merge  Disk: 25304kB
            ->  Bitmap Heap Scan on users  (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
                  Recheck Cond: (category_id = 5432)
                  Heap Blocks: exact=7938
                  ->  Bitmap Index Scan on users_category_id_first_name_last_name_idx  (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
                        Index Cond: (category_id = 5432)

What seems to greatly decrease the performance of the query is the "Sort Method: external merge Disk: 7526kB". Is it possible to aggregate the ids without doing a sort? If not, what other options, apart from increasing work_mem, do I have?

Thanks,
Alexis
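The summary-table idea suggested in the reply could be sketched roughly as follows. All names here are invented for illustration, and a real implementation would also need UPDATE/DELETE handling and an upsert-safe insert path:

```sql
-- Hypothetical summary table keyed the way the query groups:
CREATE TABLE user_ids_by_name (
    category_id int,
    first_name  varchar,
    last_name   varchar,
    ids         int[],
    PRIMARY KEY (category_id, first_name, last_name)
);

CREATE OR REPLACE FUNCTION users_summary_ins() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE user_ids_by_name
       SET ids = ids || NEW.id
     WHERE category_id = NEW.category_id
       AND first_name  = NEW.first_name
       AND last_name   = NEW.last_name;
    IF NOT FOUND THEN
        -- note: two concurrent first inserts for the same key can race here;
        -- real code needs an upsert or retry loop
        INSERT INTO user_ids_by_name
        VALUES (NEW.category_id, NEW.first_name, NEW.last_name, ARRAY[NEW.id]);
    END IF;
    RETURN NULL;
END
$$;

CREATE TRIGGER users_summary_trigger
    AFTER INSERT ON users
    FOR EACH ROW EXECUTE PROCEDURE users_summary_ins();
```

With that in place, fetching the id arrays for one category becomes a simple indexed lookup instead of an 850k-row sort.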
[GENERAL] Announcement: application level undo/redo for PostgreSQL
Greetings,

I'd like to announce the availability of a PostgreSQL extension that I've sat on for a year – it's time to share! It provides undo/redo functionality useful to application developers who want to persist application objects in a PostgreSQL database. It does some gymnastics using jsonb under the hood, so you'll need to be on PostgreSQL 9.4 and up.

The bad news is that currently you have to model the application object with just one table (multi-table undo/redo is on the to-do list). The good news is that it implements non-linear, multi-timeline undo/redo. What this means is it keeps a full history of table operations and allows you to jump anywhere in the history and continue operations from there.

For an article detailing how to use it, see: http://www.mockbites.com/articles/tech/istoria

You can get it from here: https://github.com/KiriakosGeorgiou/istoria

This was inspired by the airspace editor project I'm currently working on, where undo/redo is managed by the application. I wanted to push all this to the database so that a) tables don't get polluted with undo/redo metadata, and b) the application does not need to implement undo/redo logic.

I hope you find it useful.

Regards,
Kiriakos Georgiou
[GENERAL] pg_multixact issues
Hello,

Our pg_multixact directory keeps growing. I did a "vacuum freeze" which didn't help. I also did a "vacuum full" which didn't help either. We had this condition with 9.3.4 as well. When I upgraded our cluster to 9.4.5 (via plain sql dump and load), the issue was resolved as expected, but now it's happening again. Luckily it has no ill effect other than consuming 4G of space for an otherwise 1G database. Can you offer any hints as to how I can cure this?

thanks,
Kiriakos Georgiou

pg_controldata output:

    pg_control version number:            942
    Catalog version number:               201409291
    Database system identifier:           6211781659140720513
    Database cluster state:               in production
    pg_control last modified:             Wed Feb 10 13:45:02 2016
    Latest checkpoint location:           D/FB5FE630
    Prior checkpoint location:            D/FB5FE558
    Latest checkpoint's REDO location:    D/FB5FE5F8
    Latest checkpoint's REDO WAL file:    0001000D00FB
    Latest checkpoint's TimeLineID:       1
    Latest checkpoint's PrevTimeLineID:   1
    Latest checkpoint's full_page_writes: on
    Latest checkpoint's NextXID:          0/3556219
    Latest checkpoint's NextOID:          2227252
    Latest checkpoint's NextMultiXactId:  2316566
    Latest checkpoint's NextMultiOffset:  823062151
    Latest checkpoint's oldestXID:        668
    Latest checkpoint's oldestXID's DB:   1
    Latest checkpoint's oldestActiveXID:  3556219
    Latest checkpoint's oldestMultiXid:   1
    Latest checkpoint's oldestMulti's DB: 1
    Time of latest checkpoint:            Wed Feb 10 13:45:02 2016
    Fake LSN counter for unlogged rels:   0/1
    Minimum recovery ending location:     0/0
    Min recovery ending loc's timeline:   0
    Backup start location:                0/0
    Backup end location:                  0/0
    End-of-backup record required:        no
    Current wal_level setting:            hot_standby
    Current wal_log_hints setting:        off
    Current max_connections setting:      100
    Current max_worker_processes setting: 8
    Current max_prepared_xacts setting:   0
    Current max_locks_per_xact setting:   1024
    Maximum data alignment:               8
    Database block size:                  8192
    Blocks per segment of large relation: 131072
    WAL block size:                       8192
    Bytes per WAL segment:                16777216
    Maximum length of identifiers:        64
    Maximum columns in an index:          32
    Maximum size of a TOAST chunk:        1996
    Size of a large-object chunk:         2048
    Date/time type storage:               64-bit integers
    Float4 argument passing:              by value
    Float8 argument passing:              by value
    Data page checksum version:           0

The offsets directory:

    -rw------- 1 postgres dba 262144 Nov  3 15:22
    -rw------- 1 postgres dba 262144 Nov  5 12:45 0001
    -rw------- 1 postgres dba 262144 Nov  9 14:25 0002
    -rw------- 1 postgres dba 262144 Nov 13 10:10 0003
    -rw------- 1 postgres dba 262144 Nov 16 15:40 0004
    -rw------- 1 postgres dba 262144 Nov 20 09:55 0005
    -rw------- 1 postgres dba 262144 Dec  1 08:00 0006
    -rw------- 1 postgres dba 262144 Dec  9 11:50 0007
    -rw------- 1 postgres dba 262144 Dec 16 08:14 0008
    -rw------- 1 postgres dba 262144 Dec 21 09:40 0009
    -rw------- 1 postgres dba 262144 Dec 31 09:55 000A
    -rw------- 1 postgres dba 262144 Jan  4 21:17 000B
    -rw------- 1 postgres dba 262144 Jan  6 10:50 000C
    -rw------- 1 postgres dba 262144 Jan  7 18:20 000D
    -rw------- 1 postgres dba 262144 Jan 13 13:55 000E
    -rw------- 1 postgres dba 262144 Jan 15 11:55 000F
    -rw------- 1 postgres dba 262144 Jan 22 07:50 0010
    -rw------- 1 postgres dba 262144 Jan 26 16:35 0011
    -rw------- 1 postgres dba 262144 Jan 29 10:16 0012
    -rw------- 1 postgres dba 262144 Feb  3 13:17 0013
    -rw------- 1 postgres dba 262144 Feb  3 16:13 0014
    -rw------- 1 postgres dba 262144 Feb  4 08:24 0015
    -rw------- 1 postgres dba 262144 Feb  5 13:20 0016
    -rw------- 1 postgres dba 262144 Feb  8 11:26 0017
    -rw------- 1 postgres dba 262144 Feb  8 11:46 0018
    -rw------- 1 postgres dba 262144 Feb  8 12:25 0019
    -rw------- 1 postgres dba 262144 Feb  8 13:19 001A
    -rw------- 1 postgres dba 262144 Feb  8 14:23 001B
    -rw------- 1 postgres dba 262144 Feb  8 15:32 001C
    -rw------- 1 postgres dba 262144 Feb  8 17:01 001D
    -rw------- 1 postgres dba 262144 Feb  8 19:19 001E
    -rw------- 1 postgres dba 262144 Feb  8 22:11 001F
    -rw------- 1 postgres dba 262144 Feb  9 01:44 0020
    -rw------- 1 postgres dba 262144 Feb  9 05:57 0021
    -rw------- 1 postgres dba 262144 Feb  9 10:45 0022
    -rw------- 1 postgres dba  98304 Feb 10 13:35 0023

The members directory has 15723 files:

    $ ls -l | wc -l
    15723
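Old pg_multixact segments are removed only once every database's minimum multixact ID has advanced past them, so one database (template0 and template1 included) with a stale minimum can pin the whole directory. A hedged way to look for the culprit; note mxid_age() only exists in 9.5+, so on 9.4 you have to compare these values against NextMultiXactId from pg_controldata by hand:

```sql
-- Which database is holding back multixact truncation?
SELECT datname, datminmxid
FROM pg_database
ORDER BY datminmxid;

-- Then, inside a suspect database, the oldest per-table minimums:
SELECT relname, relminmxid
FROM pg_class
WHERE relminmxid <> 0
ORDER BY relminmxid
LIMIT 10;
```

Tables that never get their relminmxid advanced (autovacuum never freezing them, or very high autovacuum_multixact_freeze_max_age) are the usual suspects.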
Re: [GENERAL] NOTIFY/LISTEN on server, asynchronous processing
If I'm understanding you correctly, you want a job queue. This involves polling and retrieving jobs to work on them. The polling can be assisted by listen/notify so workers don't poll unnecessarily. The retrieving has to be done intelligently to avoid concurrency issues.

Kiriakos Georgiou
http://mockbites.com/about/email

On Dec 11, 2012, at 2:29 PM, rektide rekt...@voodoowarez.com wrote:

Hi all, I'm writing seeking help for making asynchronous decoupled processes run on a Postgres server. Here's my current harebrained working model:

1. Create a table async_process and attach an after trigger.
2. Establish a dblink to localhost.
3. dblink_send_query('update async_process set counter = counter + 1;') from other sprocs.
4. Designated processing hanging off this async_process table now runs.

All I'm doing is using a table to create a trigger that can be run asynchronously. There are at least two things gross about this strategy:

1. An async_process table exists only because I need a trigger that can be updated at will.
2. Having to dblink to oneself to run a query from inside the database asynchronously.

Postgres has a capability for doing async work: NOTIFY/LISTEN. I'd like to verify first: LISTEN is only for clients, correct? There's no way I can define something resident on Postgres itself that will LISTEN, that can be targeted by notifications? Does anyone have suggestions for decoupling work done on a server, for breaking up a task into multiple asynchronous pieces? I believe I've described 1. a viable if ugly means of doing so, and 2. limitations in the primary asynchronous toolsuite of Postgres, and am looking for ways to make more progress.

Regards,
-rektide

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
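The "intelligent retrieval" mentioned above can be sketched with transaction-scoped advisory locks so concurrent workers skip each other's rows. The jobs table and its columns are hypothetical, and note the caveat that the planner chooses the evaluation order of the WHERE conditions (on 9.5+, SELECT ... FOR UPDATE SKIP LOCKED does this job natively):

```sql
-- Each worker runs this in its own transaction to claim one job:
BEGIN;

SELECT j.*
FROM jobs j
WHERE j.status = 'pending'
  AND pg_try_advisory_xact_lock(j.jobid)   -- skip jobs claimed by other workers
LIMIT 1
FOR UPDATE;

-- ... do the work for the claimed job, then mark it done ...
-- UPDATE jobs SET status = 'done' WHERE jobid = <claimed id>;

COMMIT;  -- the advisory lock is released automatically at commit
```

Workers LISTEN on a channel and block until a NOTIFY from the insert trigger wakes them, instead of polling every N seconds.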
Re: [GENERAL] nice'ing the postgres COPY backend process to make pg_dumps run more softly
Try this: http://klicman.org/throttle/

Kiriakos

On Mar 19, 2012, at 12:06 AM, Aleksey Tsalolikhin wrote:

Hi. When pg_dump runs, our application becomes inoperative (too slow). I was going to ask if nice'ing the postgres backend process that handles the COPY would help, but I just realized pg_dump probably takes out locks when it runs, and nice'ing it would just make it run longer... However, the man page says pg_dump does not block other users accessing the database (readers or writers). But if we run a pg_dump, the phone starts ringing: users are complaining that the web app is not working. Would appreciate some pointers to help me reconcile these two apparently contradictory facts.

Best,
-at
Re: [GENERAL] Anonymized database dumps
The data anonymizer process is flawed because you are one misstep away from data spillage. Sensitive data should be stored encrypted to begin with. For test databases, you or your developers can invoke a process that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password). Or, if the overhead is too much (i.e. billions of rows), you can have different decrypt() routines on your test databases that return fake data without touching the real encrypted columns.

Kiriakos

On Mar 19, 2012, at 8:22 AM, Bill Moran wrote:

In response to Janning Vygen vy...@kicktipp.de:

I am working on postgresql 9.1 and loving it! Sometimes we need a full database dump to test some performance issues with real data. Of course we don't like to have sensitive data like bunches of e-mail addresses on our development machines, as they are of no interest to developers and should be kept secure. So we need an anonymized database dump. I thought about a few ways to achieve this.

1. The best solution would be a special db user and some rules which fire on reading some tables and replace private data with some random data. Doing a dump as this special user wouldn't copy the sensitive data at all; the user just has a different view of the database, even when he calls pg_dump. But as rules are not fired on COPY, it can't work, right?

2. The other solution I can think of is something like "pg_dump | sed pgdump_anon", where 'sed' does a lot of magical replace operations on the content of the dump. I don't think this is going to work reliably.

3. More reliable would be to dump the database, restore it on a different server, run some sql script which randomizes some data, and dump it again. Hmm, seems to be the only reliable way so far, but it is no fun when dumping and restoring takes an hour.

Does anybody have a better idea how to achieve an anonymized database dump?

I highly recommend #3. It's how we do it where I work. At first it seems like a big, slow, complicated monster, but once you've built the tools and have it running reliably it's very nice. Our system does the dumps overnight via cron (we have over 100 production databases), then changes the sensitive data, as well as changing all the passwords to "password" so developers can easily log in as any account. During the day, the developers have access to all the sanitized dump files and can use them to make as many testing databases as they need. Yes, the data gets up to 24 hours out of date, but it's never been a problem for us.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Re: [GENERAL] Anonymized database dumps
On Mar 19, 2012, at 5:55 PM, Bill Moran wrote:

>> Sensitive data should be stored encrypted to begin with. For test databases you or your developers can invoke a process that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password). Or if the overhead is too much (i.e. billions of rows), you can have different decrypt() routines on your test databases that return fake data without touching the real encrypted columns.
>
> The thing is, this process has the same potential data spillage issues as sanitizing the data.

Not really; in the modality I describe, the sensitive data is always encrypted in the database and useless, because nobody will have the private key or know the password that protects it other than the ops subsystems that require access. So even if you take an ops dump, load it to a test box, and walk away, you are good. If your developers/testers want to play with the data, they will be forced to overwrite and stage test encrypted data they can decrypt, or call a fake decrypt() that gives them test data (e.g. joins to a test data table).

Kiriakos
Re: [GENERAL] Anonymized database dumps
I would store sensitive data encrypted in the database. Check the pgcrypto module.

Kiriakos

On Mar 18, 2012, at 1:00 PM, Janning Vygen wrote:

Hi, I am working on postgresql 9.1 and loving it! Sometimes we need a full database dump to test some performance issues with real data. Of course we don't like to have sensitive data like bunches of e-mail addresses on our development machines, as they are of no interest to developers and should be kept secure. So we need an anonymized database dump. I thought about a few ways to achieve this.

1. The best solution would be a special db user and some rules which fire on reading some tables and replace private data with some random data. Doing a dump as this special user wouldn't copy the sensitive data at all; the user just has a different view of the database, even when he calls pg_dump. But as rules are not fired on COPY, it can't work, right?

2. The other solution I can think of is something like "pg_dump | sed pgdump_anon", where 'sed' does a lot of magical replace operations on the content of the dump. I don't think this is going to work reliably.

3. More reliable would be to dump the database, restore it on a different server, run some sql script which randomizes some data, and dump it again. Hmm, seems to be the only reliable way so far, but it is no fun when dumping and restoring takes an hour.

Does anybody have a better idea how to achieve an anonymized database dump?

regards
Janning

--
Kicktipp GmbH
Venloer Straße 8, 40477 Düsseldorf
Sitz der Gesellschaft: Düsseldorf
Geschäftsführung: Janning Vygen
Handelsregister Düsseldorf: HRB 55639
http://www.kicktipp.de/
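A minimal sketch of the pgcrypto approach; the table, column, and key names are invented for illustration, and real deployments need key management outside the database:

```sql
CREATE EXTENSION pgcrypto;  -- available as an extension on 9.1+

CREATE TABLE members (
    id    serial PRIMARY KEY,
    email bytea   -- ciphertext only; the cleartext never lands in the table
);

-- Encrypt on the way in:
INSERT INTO members (email)
VALUES (pgp_sym_encrypt('someone@example.com', 'prod-secret'));

-- Decrypt only in code paths that know the key:
SELECT pgp_sym_decrypt(email, 'prod-secret') AS email FROM members;
```

On a test box restored from a dump, the 'prod-secret' key simply isn't present, so the dumped ciphertext is useless; developers overwrite it with test values encrypted under a key everybody knows.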
Re: [GENERAL] How to isolate the result of SELECT's?
In all likelihood you do not want to do what you described (lock the tables). You need a basic understanding of MVCC and transaction isolation levels to determine whether the default behavior of MVCC + 'read committed' is sufficient. In a lot of cases it is.

Kiriakos

On Mar 18, 2012, at 7:33 PM, Andre Lopes wrote:

Hi, I need to do an operation in which I will use some SELECT's and get the results, but I want to be sure that those tables have not been changed with INSERT's or UPDATE's during the operation. Example:

    BEGIN OPERATION
    Select field from table1;
    ...
    Select other_field from table2;
    ...
    END OPERATION

How can I lock these tables to assure that the tables are not getting INSERT's or UPDATE's during the operation?

Best Regards,
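If what is actually needed is for every SELECT in the block to see one consistent snapshot (rather than to block writers), a hedged sketch using a stronger isolation level, reusing the example's table names:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- The snapshot is taken at the first query of the transaction; rows
-- committed by concurrent transactions after that are not visible here.
SELECT field FROM table1;
SELECT other_field FROM table2;

COMMIT;
```

Concurrent INSERTs and UPDATEs still proceed normally; this transaction simply does not see them, which is usually what "isolate the result of SELECT's" really means.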
Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?
+1 to seamless partitioning. The idea of having a student work on this seems a bit scary, but what seems scary to me may be a piece of cake for a talented kid :-)

Kiriakos
http://www.mockbites.com

On Mar 13, 2012, at 3:07 PM, John R Pierce wrote:

On 03/08/12 12:01 PM, Andy Colson wrote:

> 2) better partitioning support. Something much more automatic.

That would be really high on our list, and something that can handle adding/dropping partitions while there are concurrent transactions involving the partitioned table. Also a planner that can cope with optimizing prepared statements where the partitioning variable is a passed parameter.
Re: [GENERAL] GROUP BY or alternative means to group
Instead of the joins, you can use a subquery to get the first address. Or you can do the joins without the group by and use row_number() over (partition by companies.id) in the select list to label each company address with a number starting at 1; you then keep just the rows that have row_number = 1. See http://www.postgresql.org/docs/current/static/tutorial-window.html to get a feeling for how window functions work.

Kiriakos

On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:

Hi, the following statement worked on mysql but gives me an error on postgres:

    column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

    SELECT companies.id, companies.name, companies.organizationkind,
           addresses.address1, addresses.address2, addresses.city, addresses.zip
    FROM companies
    JOIN addresses_reference ON companies.id = addresses_reference.refid_companies
    LEFT JOIN addresses ON addresses_reference.refid_addresses = addresses.id
    GROUP BY companies.id;

What I did now was create a view based on the above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address, and the problem is that I cannot use distinct. I want some way to display a companies list that only gives me the first stored address related, and disregards any further addresses. Is there any way to do this?

Thanks
Alex
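The row_number() suggestion, spelled out against the tables in the question. The ORDER BY inside the window is an assumption; pick whatever defines the "first" address:

```sql
SELECT id, name, organizationkind, address1, address2, city, zip
FROM (
    SELECT c.id, c.name, c.organizationkind,
           a.address1, a.address2, a.city, a.zip,
           row_number() OVER (PARTITION BY c.id ORDER BY a.id) AS rn
    FROM companies c
    JOIN addresses_reference ar ON c.id = ar.refid_companies
    LEFT JOIN addresses a ON ar.refid_addresses = a.id
) t
WHERE rn = 1;   -- keep one address per company, drop the rest
```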
Re: [GENERAL] Regarding NOTIFY
Yes, can do. Just have an insert trigger on the jobs table that notifies the monitor, something like:

    CREATE OR REPLACE FUNCTION notify_monitor() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        NOTIFY monitor;
        RETURN NULL;
    END
    $$;

    CREATE TRIGGER jobs_trigger
        AFTER INSERT ON jobs
        FOR EACH STATEMENT
        EXECUTE PROCEDURE notify_monitor();

Then all the monitor has to do is wait for notifications. For C, see http://www.postgresql.org/docs/9.1/static/libpq-notify.html. ruby-pg (the official ruby api to postgresql) has wait_for_notify(), which pretty much implements what the postgresql docs suggest.

Kiriakos
http://www.mockbites.com

On Mar 9, 2012, at 11:53 AM, Bret Stern wrote:

We have a concrete batching application composed of two parts.

1. The Monitor. The Monitor cycles every 60 seconds and looks into a Postgresql table for jobs to run. Primarily these jobs update Postgresql tables with data from external applications.

2. The Client. The client schedules orders etc. When a new product or customer is added to the Accounting or Batching Controller (both external applications and databases), the Client user clicks a button and adds a job to run on the Monitor.

Is it possible to use the NOTIFY event to serve more like an interrupt and trigger the Monitor to run immediately? Can it be used with VB, or should I use LibPQ? Any suggestions welcome. If this is the wrong list for these questions, let me know.

Bret Stern
Re: [GENERAL] Advisory transaction lock for 128-bit space
Indeed, barring some sort of implementation limitation, it would be cool to be able to lock two big integers like so:

    pg_try_advisory_xact_lock(key1 bigint, key2 bigint)

That would solve your problem with locking UUIDs (although you still couldn't lock UUIDs simultaneously across different tables without risking lock interference). It would also enable the use of advisory locks on multiple tables that have bigserial (bigint) as the primary key, e.g.:

    pg_try_advisory_xact_lock(t.id, t.tableoid::bigint)

Obviously you don't need a bigint for tableoid, but being able to lock two bigints allows you to go to 16 bytes if need be. This came up when I was thinking about how to implement processing queues. It's doable if you assign an int4 id for each queue row (each queue is limited to not grow beyond 2B rows, which seems reasonably generous); then you can do:

    pg_try_advisory_xact_lock(t.qid, t.tableoid::int4)

This is supported by the current postgresql version.

Kiriakos

On Mar 7, 2012, at 12:52 PM, Andrey Chursin wrote:

Hello. My application needs to set an advisory lock on a UUID key, almost like pg_advisory_xact_lock does. The problem is the argument type of this function: it consumes an 8-byte value, not a 16-byte one. I can not lock on any (high, low, or middle) 8-byte part of the UUID, as that can produce unexpected deadlock issues, because locking on some ID this way implies locking on a wider set of IDs than I requested.

Now I am doing the 'trick' using index insert/delete, e.g.:

    INSERT INTO table_with_uuid_pk(locking_value);
    DELETE FROM table_with_uuid_pk WHERE inserted_row_above;

It works, but I did not find any description of such a 'feature' of indexes. Can you, please, help solve this synchronization issue, and comment on the way I am dealing with it now (with index locking)?

P.S. The most significant fear I now have is that the currently used method suffers from the same problem as locking on part of the UUID - does insert/delete really lock only on the value I passed to it?

--
Regards,
Andrey
Re: [GENERAL] SELECT FOR UPDATE could see commited trasaction partially.
This is correct behavior with MVCC. Do a 'select * from x' in thread 2 to understand why: the select for update in thread 2 sees the data in table x as it was prior to thread 1 committing, thus it won't see the row with a=2. For further suggestions you'll have to explain what you are logically trying to accomplish.

Kiriakos

On Mar 5, 2012, at 1:41 AM, Sam Wong wrote:

Hi, I hit an UPDATE/LOCK issue in my application and the result has surprised me somewhat. For the repro, it boils down to this:

    CREATE TABLE x (a int, b bool);
    INSERT INTO x VALUES (1, TRUE);
    COMMIT;

    _THREAD 1_:
    BEGIN;
    UPDATE x SET b=FALSE;
    INSERT INTO x VALUES (2, TRUE);

    _THREAD 2_:
    BEGIN;
    SELECT * FROM x WHERE b=TRUE FOR UPDATE;
    -- It will be blocked, as expected

    _THREAD 1_:
    COMMIT;

_THREAD 2_ will be unblocked, and it will return no rows. I expected it to return (2, TRUE) instead when I designed the program. If I issue the same SELECT query in THREAD 2 right now, it does indeed return (2, TRUE). For the same SQL statements, MySQL-InnoDB does return (2, TRUE) in the first SELECT.

I understand why this happens in PgSQL (it first limited the selection and locked the row, and upon unlock it rechecked the condition). I don't like that THREAD 2 only sees half of the fact of the committed transaction (it sees the effect of the update but not the insert). Is there anything I could do? I considered:

* ISOLATION LEVEL serializable - but then thread 2 would abort with a serialization failure.
* Both threads do: LOCK TABLE x IN SHARE ROW EXCLUSIVE MODE - this does resolve my issue, but it creates a big lock contention problem and relies on the app to do the right thing.
* Advisory lock - pretty much the same, except that I could unlock earlier to make the locking period shorter, but nevertheless it's a whole-table lock.

Thoughts?

Thanks,
Sam
Re: [GENERAL] Counting different strings (OK%, FB%) in same table, grouped by week number
I'd code it more generally to allow for any user type:

    select yw, substr(id,1,2) as user_type, count(1)
    from pref_money
    group by yw, user_type;

You can use some clever pivoting to get the user_types to be columns, but I see no need to waste db cycles; you can get the report you want by one-pass processing of the above result set. If you have mountains of data, I'd precompute the user_type (before insert, or during insert by a trigger) and store it separately.

Kiriakos
http://www.mockbites.com

On Feb 22, 2012, at 3:36 PM, Alexander Farber wrote:

Hello, I have a table holding week numbers (as strings) and user ids starting with OK, VK, FB, GG, MR, DE (coming through diff. soc. networks to my site):

    afarber@www:~ psql
    psql (8.4.9)
    Type "help" for help.

    pref=> select * from pref_money;
          id        | money |   yw
    ----------------+-------+---------
     OK19644992852  |     8 | 2010-44
     OK21807961329  |   114 | 2010-44
     FB1845091917   |   774 | 2010-44
     OK172682607383 |   -34 | 2010-44
     VK14831014     |    14 | 2010-44
     VK91770810     |  2368 | 2010-44
     DE8341         |   795 | 2010-44
     VK99736508     |    97 | 2010-44

I'm trying to count those different users. For one type of users (here Facebook) it's easy:

    pref=> select yw, count(*) from pref_money
           where id like 'FB%' group by yw order by yw desc;
       yw    | count
    ---------+-------
     2012-08 |    32
     2012-07 |    32
     2012-06 |    37
     2012-05 |    46
     2012-04 |    41

But if I want to have a table displaying all users (a column for FB%, a column for OK%, etc.), then I either have to perform a lot of copy-paste and vim-editing, or maybe someone can give me advice? I've reread the aggregates doc at http://www.postgresql.org/docs/8.4/static/tutorial-agg.html and still can't figure it out...

Thank you
Alex
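If the one-column-per-network layout is still wanted, a hedged sketch of the pivot as it would look on 8.4 (no FILTER clause back then, so conditional sums; extend the CASE list for GG, MR, and DE the same way):

```sql
SELECT yw,
       sum(CASE WHEN id LIKE 'OK%' THEN 1 ELSE 0 END) AS ok,
       sum(CASE WHEN id LIKE 'VK%' THEN 1 ELSE 0 END) AS vk,
       sum(CASE WHEN id LIKE 'FB%' THEN 1 ELSE 0 END) AS fb
FROM pref_money
GROUP BY yw
ORDER BY yw DESC;
```

This scans the table once, unlike running one LIKE query per network.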
[GENERAL] order of evaluation of search arguments
As I understand it, the order of evaluation of search arguments is up to the optimizer. I've tested the following query, which is supposed to take advantage of advisory locks to skip over rows that are locked by other consumers running the exact same query, and it seems to work fine. It seems to me the optimizer should always prefer to scan by status. What say you?

    begin;

    select *
    from queue q
    where q.status = 'unprocessed'
    and pg_try_advisory_xact_lock(q.qid, q.tableoid::int) = true
    limit 2  -- specify batch size here
    for update;

    -- now process the queue items

Kiriakos
Re: [GENERAL] order of evaluation of search arguments
I tested it by visual inspection of advisory locks in pg_locks; once with a small test table, and once on a larger 'operations' table in our test environment. It seemed to work, but I hear you - I don't like to depend on the mood of the optimizer.

The drawback of the subquery version is that if the queue table gets big (e.g. consumers are shut off), it will spend a lot of time building a worktable only to get a few items from it later in the outer query. Perhaps putting a limit in the inner query as well can alleviate that, like:

    LIMIT (expected number of consumers) * (batch size)

There is a theoretical race condition with this approach (fast vs slow consumers), but it's harmless: a consumer that executes the inner query slowly may get fewer or no items to process although there might be some, but they will be picked up eventually, so it's no big deal.

thanks,
Kiriakos

On Feb 16, 2012, at 12:44 AM, Tom Lane wrote:

Kiriakos Georgiou kg.postgre...@olympiakos.com writes:

> As I understand it, the order of evaluation of search arguments is up to the optimizer. I've tested the following query, which is supposed to take advantage of advisory locks to skip over rows that are locked by other consumers running the exact same query, and it seems to work fine. It seems to me the optimizer should always prefer to scan by status. What say you?

When it breaks, you get to keep both pieces. Was your testing even capable of noticing the problem if the query locked more rows than you wanted? Less dangerous coding practices might involve putting the lock function in an outer query, while using an OFFSET 0 in the sub-query as an optimization fence.

regards, tom lane
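The safer shape Tom describes might look like the following sketch, using the same queue columns as the original query. OFFSET 0 keeps the planner from pushing the lock function down into the scan, so only status-qualified rows ever get advisory-locked; note that tableoid is a system column, so it has to be selected explicitly to survive the subquery:

```sql
begin;

select *
from (select *, tableoid::int as tid   -- system column, selected explicitly
      from queue
      where status = 'unprocessed'
      offset 0) q                       -- OFFSET 0 acts as an optimization fence
where pg_try_advisory_xact_lock(q.qid, q.tid)
limit 2
for update;

-- process the claimed items, then commit (which releases the advisory locks)
```

Per the SELECT documentation, a locking clause applied to a sub-query affects the tables used inside it, so FOR UPDATE here still row-locks the underlying queue rows.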