[PERFORM] index over timestamp not being used
Hi all,

I've got the following two tables running on postgresql 8.1.4:

transactions
     Column     |            Type             |   Modifiers
----------------+-----------------------------+---------------
 transaction_id | character varying(32)       | not null
 user_id        | bigint                      | not null
 timestamp_in   | timestamp without time zone | default now()
 type_id        | integer                     |
 technology_id  | integer                     |
Indexes:
    "pk_phusrtrans_transid" PRIMARY KEY, btree (transaction_id)
    "idx_phusrtrans_paytyptech" btree (type_id, technology_id)
    "idx_putrnsctns_tstampin" btree (timestamp_in)

statistics
     Column     |         Type          | Modifiers
----------------+-----------------------+-----------
 statistic_id   | bigint                | not null
 duration       | bigint                |
 transaction_id | character varying(32) |
Indexes:
    "pk_phstat_statid" PRIMARY KEY, btree (statistic_id)
    "idx_phstat_transid" btree (transaction_id)

The idea is to have a summary of how many transactions there are, their duration and their type for every date. To do so, I've written the following query:

  SELECT count(t.transaction_id) AS num_transactions,
         SUM(s.duration)         AS duration,
         date(t.timestamp_in)    AS date,
         t.type_id
    FROM transactions t
         LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
   WHERE t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')
   GROUP BY date, t.type_id;

I think this could be sped up if the index idx_putrnsctns_tstampin (the index over the timestamp) were used, but I haven't been able to get the planner to use it. Any suggestion?

Thanks all
--
Arnau

---(end of broadcast)---
TIP 6: explain analyze is your friend
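[A minimal sketch of the usual workaround, under the assumption that the planner skips the index because to_timestamp() returns timestamptz rather than the indexed "timestamp without time zone" type. Comparing against a plain timestamp literal keeps the types matched; verify with EXPLAIN ANALYZE.]

```sql
-- Sketch, not a verified fix: the cast literal matches the indexed
-- column type, so idx_putrnsctns_tstampin becomes usable for a range scan.
EXPLAIN ANALYZE
SELECT count(t.transaction_id) AS num_transactions,
       SUM(s.duration)         AS duration,
       date(t.timestamp_in)    AS date,
       t.type_id
  FROM transactions t
  LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
 WHERE t.timestamp_in >= '2007-01-01'::timestamp
 GROUP BY date, t.type_id;
```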
Re: [PERFORM] index over timestamp not being used
Hi Tom,

> Alternatively, do you really need to_timestamp at all? The standard
> timestamp input routine won't have any problem with that format:
>     t.timestamp_in >= '20070101'

This is what always worries me: what happens if one day the internal format in which the DB stores dates/timestamps changes? I mean, if instead of being stored as YYYYMMDD it were stored as DDMMYYYY, would we have to change all the queries? I thought the to_char/to_date/to_timestamp functions were intended for this purpose.

--
Arnau

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[PERFORM] Is it possible to know where is the deadlock
Hi all,

I have a serious problem with a server. This server holds several DBs, and the problem is that the CPUs spend most of the time waiting:

  Cpu0:  4.0% us, 2.3% sy, 0.0% ni, 61.5% id, 32.1% wa, 0.0% hi, 0.0% si
  Cpu1:  2.3% us, 0.3% sy, 0.0% ni, 84.1% id, 13.3% wa, 0.0% hi, 0.0% si
  Cpu2:  1.3% us, 0.3% sy, 0.0% ni, 68.6% id, 29.8% wa, 0.0% hi, 0.0% si
  Cpu3:  4.6% us, 3.3% sy, 0.0% ni,  2.6% id, 88.4% wa, 0.3% hi, 0.7% si

iostat -c says about 8% of the time is spent waiting for IO. I'm afraid this is due to locks between concurrent queries; is there any way to get more info about it?

Thanks all
--
Arnau

---(end of broadcast)---
TIP 6: explain analyze is your friend
[PERFORM] Advice about how to delete
Hi all,

I have the following scenario: I have users and groups, where a user can belong to n groups and a group can have n users. A user must belong to at least one group, so when I delete a group I must check that no user is left orphaned. To do this I have something like:

CREATE TABLE users (
  user_id    SERIAL8 PRIMARY KEY,
  user_name  VARCHAR(50)
);

CREATE TABLE groups (
  group_id   SERIAL8 PRIMARY KEY,
  group_name VARCHAR(50)
);

CREATE TABLE user_groups (
  user_id  INT8 REFERENCES users(user_id),
  group_id INT8 REFERENCES groups(group_id),
  CONSTRAINT pk PRIMARY KEY ( user_id, group_id )
);

CREATE INDEX idx_user_id  ON user_groups( user_id );
CREATE INDEX idx_group_id ON user_groups( group_id );

CREATE FUNCTION delete_group( INT8 ) RETURNS INTEGER AS $$
DECLARE
  p_groupid ALIAS FOR $1;
  v_deleted INTEGER;
  v_count   INTEGER;
  result    RECORD;
BEGIN
  v_deleted := 0;
  FOR result IN SELECT user_id FROM user_groups
                 WHERE group_id = p_groupid LOOP
    SELECT INTO v_count COUNT(user_id)
      FROM user_groups
     WHERE user_id = result.user_id;
    IF v_count = 1 THEN
      DELETE FROM users WHERE user_id = result.user_id;
      v_deleted := v_deleted + 1;
    END IF;
  END LOOP;
  DELETE FROM groups WHERE group_id = p_groupid;
  RETURN v_deleted;
END;
$$ LANGUAGE plpgsql;

This works quite fast with small groups, but when the group has a large number of users it takes too much time. The delete_group action is fired from the user interface of the application. Do you have any idea how I could improve its performance?

Thanks all
--
Arnau

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Advice about how to delete
Hi Michael,

Michael Glaesemann wrote:
> On Jul 6, 2007, at 9:42, Arnau wrote:
>> I have the following scenario: I have users and groups, where a user
>> can belong to n groups and a group can have n users. A user must
>> belong to at least one group, so when I delete a group I must check
>> that no user is left orphaned. To do this I have something like:
>>     IF v_count = 1 THEN
>>       DELETE FROM users WHERE user_id = result.user_id;
>>       v_deleted = v_deleted + 1;
>>     END IF;
>
> Am I right in reading that you're deleting any users that would be
> orphans? If so, you can just delete the orphans afterwards rather than
> beforehand (untested):
>
>   -- delete user_groups
>   DELETE FROM user_groups WHERE group_id = p_group_id;
>
>   -- delete users that don't belong to any group
>   DELETE FROM users
>    WHERE user_id IN (SELECT u.user_id
>                        FROM users u
>                        LEFT JOIN user_groups ug ON u.user_id = ug.user_id
>                       WHERE ug.group_id IS NULL);
>
> This should execute pretty quickly. You don't need to loop over any
> results. Remember, SQL is a set-based language, so if you can pose your
> question in a set-based way, you can probably find a pretty good,
> efficient solution.

I have tested your solution and it's much worse than mine. My test database has about 254,000 users and about 30 groups. The test I ran was removing a group with 258 users: my solution took about 3 seconds, while yours hadn't finished after 20 seconds. Of course the test machine is an old Celeron with little RAM, but as a test machine it does the job.

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
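[A possible middle ground between the two approaches, sketched and untested, assuming the schema from the earlier post: restrict the orphan check to the users who were actually in the deleted group, so the query doesn't have to scan all 254,000 users. The group id 42 is just an example placeholder.]

```sql
-- Remember which users were in the group before removing the memberships.
CREATE TEMP TABLE affected AS
  SELECT user_id FROM user_groups WHERE group_id = 42;  -- 42: example group

DELETE FROM user_groups WHERE group_id = 42;

-- Delete only those affected users who now have no membership at all.
DELETE FROM users
 WHERE user_id IN (SELECT user_id FROM affected)
   AND NOT EXISTS (SELECT 1 FROM user_groups ug
                    WHERE ug.user_id = users.user_id);

DELETE FROM groups WHERE group_id = 42;
DROP TABLE affected;
```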
[PERFORM] How much memory PostgreSQL is going to use?
Hi all,

I have a server with 4GB of memory and I'm tweaking the PostgreSQL configuration. This server will be dedicated to running PostgreSQL, so I'd like to dedicate as much RAM as possible to it. I have dedicated 1GB to shared_buffers (shared_buffers = 131072), but I'm not sure whether this will be the maximum memory used by PostgreSQL or whether it will take more memory in addition to this. Because if shared_buffers is the maximum, I could raise that value even more.

Cheers!
--
Arnau

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
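[For orientation: shared_buffers is only the shared cache, not a cap on total memory. Each backend additionally allocates up to work_mem per sort/hash step and maintenance_work_mem for maintenance tasks. A rough, illustrative postgresql.conf sketch — the values are assumptions for discussion, not recommendations:]

```ini
# Illustrative sketch only -- tune for your own workload.
shared_buffers = 131072          # 1GB shared cache (8kB pages); NOT a hard cap on total use
work_mem = 8192                  # 8MB per sort/hash, per backend; concurrent queries multiply this
maintenance_work_mem = 262144    # 256MB for VACUUM, CREATE INDEX
max_connections = 100            # worst case adds roughly max_connections * work_mem
                                 # (or more) on top of shared_buffers
```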
[PERFORM] How PostgreSQL handles multiple DDBB instances?
Hi all,

I have a doubt/problem about how PostgreSQL handles multiple DDBB instances running on the same server, and how I should design the architecture of an application. I have an application that works with multiple customers. Thinking about scalability, we are considering the following approach:

- Create a separate database instance for each customer.
- We think each customer's DB will be quite small, about 200MB on average.
- The number of clients, and hence DDBB, can be significant (thousands).
- Have as many customers as possible on the same server, so a single server could host more than 300 DDBB instances.

Do you think this makes sense? Or, taking the expected DDBB size into account, would it be better to join several customers' DDBB into just one instance? What I'm worried about is that with so many DDBB instances PostgreSQL's performance would be worse. I have been following the list, and one piece of advice that appears often is "keep your DB in memory"; so if I have just one instance instead of hundreds, will performance be better?

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?
Hi Tom,

> Arnau [EMAIL PROTECTED] writes:
>> I have an application that works with multiple customers. Thinking in
>> scalability we are thinking in applying the following approaches:
>> - Create a separate database instance for each customer.
>> - We think that customer's DB will be quite small, about 200MB as average.
>> - The number of clients, then DDBB, can be significant (thousands).
>> - Have as many customers as possible on the same server, so a single
>>   server could have more than 300 DDBB instances.
>
> This is probably a bad idea, unless each customer's performance demands
> are so low that you can afford to use very small shared-memory settings
> for each instance. But even small settings will probably eat ~10MB per
> instance --- can you afford to build these machines with multiple GB of
> RAM? Can you instead run things with one postmaster per machine and one
> database per customer within that instance? From a performance
> perspective this is likely to work much better.

What I meant is to have only one postmaster per server and a lot of databases running in it. Something like this:

template1=# \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 alertwdv2 | gguridi  | LATIN1
 postgres  | postgres | LATIN1
 template0 | postgres | LATIN1
 template1 | postgres | LATIN1
 voicexml  | root     | LATIN1
 wikidb    | root     | LATIN1
(6 rows)

Here I just have 6 databases, so my doubt is: instead of having 6 databases, how would having 300/600 databases running on the same postmaster impact performance? e.g.

template1=# \l
          List of databases
       Name        |  Owner   | Encoding
-------------------+----------+----------
 template0         | postgres | LATIN1
 template1         | postgres | LATIN1
 customers_group_1 | root     | LATIN1
(3 rows)

Instead of:

template1=# \l
       List of databases
     Name      |  Owner   | Encoding
---------------+----------+----------
 template0     | postgres | LATIN1
 template1     | postgres | LATIN1
 customers_1   | root     | LATIN1
 customers_2   | root     | LATIN1
 customers_3   | root     | LATIN1
 ...
 customers_500 | root     | LATIN1
(502 rows)

> If you desire to give the customers database-superuser capability then
> this probably won't do, but if they are restricted users it might be OK.

The users won't have superuser access, just execute plain queries.

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?
Tom Lane wrote:
> Arnau [EMAIL PROTECTED] writes:
>>> Can you instead run things with one postmaster per machine and one
>>> database per customer within that instance? From a performance
>>> perspective this is likely to work much better.
>> What I meant is to have only one postmaster per server and a lot of
>> databases running in it.
>
> OK, we are on the same page then. Should work fine. I think I've heard
> of people running installations with thousands of DBs in them. You'll
> want to test it a bit of course ...

I'm worried about performance. I have done some tests and I have more than 400 DBs on one server, so it's certainly possible to run that number of DBs in a single postmaster; the point I'm worried about is performance. Do you think performance would be better executing exactly the same queries after merely adding an extra column (e.g. customer_id) to all the tables of one shared customers DB, rather than opening a connection to each customer's own DB and executing the query there? I don't know whether PostgreSQL's caching mechanism works as well when queries may hit any of 400 DBs as when they all hit just one.

Thank you very much for your help :)
--
Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Performace comparison of indexes over timestamp fields
Hi all,

I have some tables where all the queries that will be executed are timestamp driven, so it'd be nice to have an index over those fields. On older versions of PostgreSQL, at least in my experience, queries on timestamp fields performed quite badly even with indexes: mainly sequential scans were used. Now I have a newer version of PostgreSQL and I've done some tests comparing the performance of an index over a timestamp field with indexes over numeric fields. To do so, I have the following table:

        Table "public.payment_transactions"
     Column     |            Type             |   Modifiers
----------------+-----------------------------+---------------
 transaction_id | character varying(32)       | not null
 timestamp_in   | timestamp without time zone | default now()
 credits        | integer                     |
 epoch_in       | bigint                      |
 epoch_in2      | double precision            |
Indexes:
    "pk_paytrans_transid" PRIMARY KEY, btree (transaction_id)
    "idx_paytrans_epochin" btree (epoch_in)
    "idx_paytrans_epochin2" btree (epoch_in2)
    "idx_paytrans_timestamp" btree (timestamp_in)

timestamp_in is the timestamp; epoch_in and epoch_in2 are the epoch equivalents of the timestamp, to test how the indexes perform. We have three different indexes (for testing purposes): one over a timestamp field, one over an int8 and one over a double precision field. While doing the tests this table had about 100,000 entries.
To test the different indexes I have executed the following:

Index over timestamp_in (timestamp):

# explain analyze select * from payment_transactions
  where timestamp_in between '2007-02-13'::timestamp and '2007-02-15'::timestamp;

 Index Scan using idx_paytrans_timestamp on payment_transactions
   (cost=0.00..1480.24 rows=1698 width=138) (actual time=11.693..310.402 rows=1587 loops=1)
   Index Cond: ((timestamp_in >= '2007-02-13 00:00:00'::timestamp without time zone)
            AND (timestamp_in <= '2007-02-15 00:00:00'::timestamp without time zone))
 Total runtime: 318.328 ms
(3 rows)

Index over epoch_in (int8):

# explain analyze select * from payment_transactions
  where epoch_in between extract( epoch from '2007-02-13'::date )::int8
                     and extract( epoch from '2007-02-15'::date )::int8;

 Index Scan using idx_paytrans_epochin on payment_transactions
   (cost=0.00..1483.24 rows=1698 width=138) (actual time=34.369..114.943 rows=1587 loops=1)
   Index Cond: ((epoch_in >= 1171321200::bigint) AND (epoch_in <= 1171494000::bigint))
 Total runtime: 120.804 ms
(3 rows)

Index over epoch_in2 (double precision):

# explain analyze select * from payment_transactions
  where epoch_in2 between extract( epoch from '2007-02-13'::date )
                      and extract( epoch from '2007-02-15'::date );

 Index Scan using idx_paytrans_epochin2 on payment_transactions
   (cost=0.00..1479.24 rows=1698 width=138) (actual time=26.115..51.357 rows=1587 loops=1)
   Index Cond: ((epoch_in2 >= 1171321200::double precision) AND (epoch_in2 <= 1171494000::double precision))
 Total runtime: 57.065 ms
(3 rows)

As you can see the time differences are very big:

  timestamp:        318.328 ms
  int8 index:       120.804 ms
  double precision:  57.065 ms

Is this normal? Am I doing anything wrong? As a rule of thumb, is it better to store epochs than timestamps?

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
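[One possible confound worth ruling out — an assumption about the test setup, not a verified finding: the three probes return the same 1587 rows, so whichever query runs first pays the cold-cache cost of reading the heap pages. A quick sketch to control for that:]

```sql
-- Run each probe two or three times in a row and compare only the
-- warm-cache timings; the first execution reads pages from disk.
EXPLAIN ANALYZE
SELECT * FROM payment_transactions
 WHERE timestamp_in BETWEEN '2007-02-13'::timestamp
                        AND '2007-02-15'::timestamp;
-- run again: the second/third "Total runtime" values are the comparable ones
```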
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
Hi Josh,

Josh Berkus wrote:
> Arnau,
>> Is there anything similar in PostgreSQL? The idea behind this is how I
>> can do in PostgreSQL to have tables where I can query on them very
>> often, something like every few seconds, and get results very fast
>> without overloading the postmaster.
>
> If you're only querying the tables every few seconds, then you don't
> really need to worry about performance.

Well, the idea behind this is to have events tables that a monitoring system polls every few seconds. I'd like to have a kind of FIFO stack. From the events producer's point of view, he'll be pushing rows into that table; when it's full, the oldest row will be removed to leave room for the newest one. From the consumer's point of view, he'll read the entire contents of that table. So I won't only be querying the table, I'll need to modify it too.

--
Arnau

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
Hi Ansgar,

> On 2007-04-04 Arnau wrote:
>> Josh Berkus wrote:
>>>> Is there anything similar in PostgreSQL? The idea behind this is how
>>>> I can do in PostgreSQL to have tables where I can query on them very
>>>> often, something like every few seconds, and get results very fast
>>>> without overloading the postmaster.
>>>
>>> If you're only querying the tables every few seconds, then you don't
>>> really need to worry about performance.
>>
>> Well, the idea behind this is to have events tables that a monitoring
>> system polls every few seconds. I'd like to have a kind of FIFO stack.
>> From the events producer's point of view, he'll be pushing rows into
>> that table; when it's full, the oldest row will be removed to leave
>> room for the newest one. From the consumer's point of view, he'll read
>> the entire contents of that table. So I won't only be querying the
>> table, I'll need to modify it too.
>
> Ummm... this may be a dumb question, but why are you trying to
> implement something like a FIFO with an RDBMS in the first place?
> Wouldn't it be much easier to implement something like that as a
> separate program or script?

Well, the idea is to have a table with a maximum number of rows. As the number of queries over this table will be very high, I'd like to keep it as small as possible and without indexes and the like that could make updates slower. Maybe it's time to change my question: is there any trick to get a table that can be modified/queried very fast and with minimum overhead? This table will get several queries every second and I'd like them handled as fast as possible.

Thanks
--
Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
Hi Thor,

Thor-Michael Støre wrote:
> On 2007-04-04 Arnau wrote:
>> Josh Berkus wrote:
>>> Arnau,
>>>> Is there anything similar in PostgreSQL? The idea behind this is how
>>>> I can do in PostgreSQL to have tables where I can query on them very
>>>> often, something like every few seconds, and get results very fast
>>>> without overloading the postmaster.
>>>
>>> If you're only querying the tables every few seconds, then you don't
>>> really need to worry about performance.
>>
>> Well, the idea behind this is to have events tables that a monitoring
>> system polls every few seconds. I'd like to have a kind of FIFO stack.
>> From the events producer's point of view, he'll be pushing rows into
>> that table; when it's full, the oldest row will be removed to leave
>> room for the newest one. From the consumer's point of view, he'll read
>> the entire contents of that table. So I won't only be querying the
>> table, I'll need to modify it too.
>
> Please try to refrain from doing this. This is the "Database as an IPC"
> antipattern. (Antipatterns are commonly-reinvented bad solutions to
> problems, i.e. you can be sure someone has tried this very thing before
> and found it to be a bad solution.)
> http://en.wikipedia.org/wiki/Database_as_an_IPC
>
> The best solution is (like Ansgar hinted at) to use a real IPC system.
> Of course, I've done it myself (not on PostgreSQL though) when working
> at a large corporation where corporate politics prevented me from
> introducing any new interdependency between systems (like having two
> start talking with each other when they previously didn't); the only
> common ground for systems that needed to communicate was a database,
> and one of the systems was only able to run simple SQL statements and
> not stored procedures.

First of all, thanks for your interest, but let me explain what I need to do.
We have a web application whose customers want to monitor how it's performing: not performance in terms of speed, but how many customers are currently browsing the application, how many have paid browsing sessions, how many payments have been made, and so on. More or less, a control panel. The catch is that the information displayed in a web browser must be near real-time, that is, a query every 1-10 seconds. I haven't read the article yet, but I will. How would you do what I need?

Thanks
--
Arnau

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
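[If the goal is a near-real-time control panel, one middle ground — a sketch, not a tested design; the table, columns and channel name here are made up — is to keep events in an ordinary table but replace tight polling with PostgreSQL's LISTEN/NOTIFY, so consumers only query when something actually changed:]

```sql
-- Consumer session: subscribe once, then query only on notification.
LISTEN events_changed;

-- Producer side: record the event and signal all listeners.
INSERT INTO events (event_type, created_at) VALUES ('payment', now());
NOTIFY events_changed;

-- Consumer, upon receiving the notification, refreshes the panel:
SELECT event_type, created_at
  FROM events
 ORDER BY created_at DESC
 LIMIT 100;
```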
[PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
Hi all,

In MySQL, when you create a table you can define something like:

CREATE TABLE `sneakers` (
  `sneaker_id`   char(24) NOT NULL,
  `sneaker_time` int(10) unsigned NOT NULL default '0',
  `sneaker_user` int(10) unsigned NOT NULL default '0',
  UNIQUE KEY `sneaker_id` (`sneaker_id`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 MAX_ROWS=1000;

The MySQL manual says: "The MEMORY storage engine creates tables with contents that are stored in memory. As indicated by the name, MEMORY tables are stored in memory. They use hash indexes by default, which makes them very fast, and very useful for creating temporary tables. However, when the server shuts down, all rows stored in MEMORY tables are lost. The tables themselves continue to exist because their definitions are stored in .frm files on disk, but they are empty when the server restarts. MAX_ROWS can be used to determine the maximum and minimum numbers of rows."

Is there anything similar in PostgreSQL? The idea behind this is: what can I do in PostgreSQL to have tables that I can query very often (something like every few seconds), get results very fast, and not overload the postmaster?

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 6: explain analyze is your friend
[PERFORM] [OT] Very strange postgresql behaviour
Hi all,

I have postgresql 7.4.2 running on debian and I have the oddest postgresql behaviour I've ever seen. I execute the following queries:

espsm_asme=# select customer_app_config_id, customer_app_config_name
             from customer_app_config
             where customer_app_config_id = 5929
                or customer_app_config_id = 11527
             order by customer_app_config_id;

 customer_app_config_id | customer_app_config_name
------------------------+--------------------------
                   5929 | INFO
(1 row)

I do the same query but changing the order of the or conditions:

espsm_asme=# select customer_app_config_id, customer_app_config_name
             from customer_app_config
             where customer_app_config_id = 11527
                or customer_app_config_id = 5929
             order by customer_app_config_id;

 customer_app_config_id | customer_app_config_name
------------------------+--------------------------
                  11527 | MOVIDOSERENA TONI 5523
(1 row)

As you can see, configurations 5929 and 11527 both exist, but each query returns only one of them. Below you have the execution plans. Both queries use an index; I have done "reindex table customer_app_config" but nothing has changed.

espsm_asme=# explain analyze select customer_app_config_id, customer_app_config_name
             from customer_app_config
             where customer_app_config_id = 11527 or customer_app_config_id = 5929
             order by customer_app_config_id;

 Sort  (cost=10.28..10.29 rows=2 width=28) (actual time=0.252..0.253 rows=1 loops=1)
   Sort Key: customer_app_config_id
   ->  Index Scan using pk_cag_customer_application_id, pk_cag_customer_application_id on customer_app_config  (cost=0.00..10.27 rows=2 width=28) (actual time=0.168..0.232 rows=1 loops=1)
         Index Cond: ((customer_app_config_id = 11527::numeric) OR (customer_app_config_id = 5929::numeric))
 Total runtime: 0.305 ms
(5 rows)

espsm_asme=# explain analyze select customer_app_config_id, customer_app_config_name
             from customer_app_config
             where customer_app_config_id = 5929 or customer_app_config_id = 11527
             order by customer_app_config_id;

 Sort  (cost=10.28..10.29 rows=2 width=28) (actual time=0.063..0.064 rows=1 loops=1)
   Sort Key: customer_app_config_id
   ->  Index Scan using pk_cag_customer_application_id, pk_cag_customer_application_id on customer_app_config  (cost=0.00..10.27 rows=2 width=28) (actual time=0.034..0.053 rows=1 loops=1)
         Index Cond: ((customer_app_config_id = 5929::numeric) OR (customer_app_config_id = 11527::numeric))
 Total runtime: 0.114 ms
(5 rows)

The table definition is the following:

espsm_asme=# \d customer_app_config
               Table "public.customer_app_config"
          Column          |         Type          |     Modifiers
--------------------------+-----------------------+--------------------
 customer_app_config_id   | numeric(10,0)         | not null
 customer_app_config_name | character varying(32) | not null
 keyword                  | character varying(43) |
 application_id           | numeric(10,0)         | not null
 customer_id              | numeric(10,0)         | not null
 customer_app_contents_id | numeric(10,0)         |
 number_access_id         | numeric(10,0)         |
 prefix                   | character varying(10) |
 separator                | numeric(1,0)          | default 0
 on_hold                  | numeric(1,0)          | not null default 0
 with_toss                | numeric(1,0)          | not null default 0
 number_id                | numeric(10,0)         |
 param_separator_id       | numeric(4,0)          | default 1
 memory_timeout           | integer               |
 with_memory              | numeric(1,0)          | default 0
 session_enabled          | numeric(1,0)          | default 0
 session_timeout          | integer               |
 number                   | character varying(15) |
Indexes:
    "pk_cag_customer_application_id" primary key, btree (customer_app_config_id)
    "un_cag_kwordnumber" unique, btree (keyword, number_id)
    "idx_cappconfig_ccontentsid" btree (customer_app_contents_id)
    "idx_cappconfig_cusidappid" btree (customer_id, application_id)
    "idx_cappconfig_customerid" btree (customer_id)
    "idx_cappconfig_onhold" btree (on_hold)
    "idx_cappconfig_onholdkeyw" btree (on_hold, keyword)
Rules:
    A lot of rules that I don't paste as a matter of length.

Do you have any idea how I can fix this?

--
Arnau

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [OT] Very strange postgresql behaviour
Hi Bill,

> In response to Arnau [EMAIL PROTECTED]:
>> I have postgresql 7.4.2 running on debian and I have the oddest
>> postgresql behaviour I've ever seen. I do the following queries:
>>
>> espsm_asme=# select customer_app_config_id, customer_app_config_name
>>              from customer_app_config
>>              where customer_app_config_id = 5929
>>                 or customer_app_config_id = 11527
>>              order by customer_app_config_id;
>>
>>  customer_app_config_id | customer_app_config_name
>> ------------------------+--------------------------
>>                    5929 | INFO
>> (1 row)
>>
>> I do the same query but changing the order of the or conditions:
>>
>> espsm_asme=# select customer_app_config_id, customer_app_config_name
>>              from customer_app_config
>>              where customer_app_config_id = 11527
>>                 or customer_app_config_id = 5929
>>              order by customer_app_config_id;
>>
>>  customer_app_config_id | customer_app_config_name
>> ------------------------+--------------------------
>>                   11527 | MOVIDOSERENA TONI 5523
>> (1 row)
>>
>> As you can see, the configurations 5929 and 11527 both exist, but when
>> I do the queries they don't appear.
>
> [snip]
>
> Just a guess, but perhaps your index is damaged. Have you tried
> REINDEXing?

Yes, I have tried:

  reindex table customer_app_config;
  reindex index pk_cag_customer_application_id;

but nothing changed. I also tried to drop the index:

espsm_asme=# begin;
BEGIN
espsm_asme=# drop index pk_cag_customer_application_id;
ERROR:  cannot drop index pk_cag_customer_application_id because constraint pk_cag_customer_application_id on table customer_app_config requires it
HINT:  You may drop constraint pk_cag_customer_application_id on table customer_app_config instead.
espsm_asme=# rollback;
ROLLBACK

But I can't remove the constraint, as it's the primary key and there are foreign keys over it.

--
Arnau

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Does it matters the column order in indexes and constraints creation?
Hi all,

I've got a doubt about how to create an index and a primary key. Let's say I have the following table:

CREATE TABLE blacklist
(
  telephone   VARCHAR(15),
  customer_id INT4
    CONSTRAINT fk_blacklist_customerid REFERENCES customers( customer_id ),
  country_id  INT2
    CONSTRAINT fk_blacklist_countryid REFERENCES countries( country_id ),
  CONSTRAINT pk_blacklist_cidcustidtel
    PRIMARY KEY ( country_id, customer_id, telephone )
);

The country_id column can have maybe 100-250 different values. The customer_id column can have as many as several hundred values (less than 1000). The telephone column is where all the values will differ. So my doubt is: in terms of performance, does the order of the primary key fields make any difference? And the same for an index definition? I have checked the postgresql documentation but haven't been able to find anything about this.

Thanks
--
Arnau

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
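[For context, column order matters mostly for which queries can use the index at all: a btree on (country_id, customer_id, telephone) serves lookups that constrain a leading prefix of those columns, but not telephone alone. A sketch of how one might check this, assuming the blacklist table above (the literal values are made up):]

```sql
-- Can use pk_blacklist_cidcustidtel: the WHERE clause covers a
-- leading prefix (country_id, customer_id) of the index columns.
EXPLAIN SELECT * FROM blacklist
 WHERE country_id = 34 AND customer_id = 42;

-- Cannot use that index efficiently: telephone is the last index column,
-- so this lookup would need its own index on (telephone).
EXPLAIN SELECT * FROM blacklist
 WHERE telephone = '555123456';
```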
[PERFORM] Partitioning
Hi all,

I'm not sure if this question fits the topic of this list. I'm interested in partitioning, and it's the first time I'd use it. There is an issue I don't know how you handle. Let's say I'm interested in storing monthly-based statistical data, like the example in http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like about this approach is that the monthly tables, rules, etc. must be created manually, or at least I haven't found any other option. My question is, how do you manage this? Do you have a cron task that automatically creates these monthly elements (tables, rules, ...), or is there another approach that doesn't require anything external like cron, only PostgreSQL?

--
Arnau

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
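[A sketch of the cron-driven approach, untested; the parent table "stats" and its "ts" column are made-up names, following the pattern of the manual's partitioning example. A pl/pgsql function builds one month's child table, and a scheduled job calls it via psql:]

```sql
-- Assumes a parent table "stats(ts timestamp, ...)"; names are illustrative.
CREATE OR REPLACE FUNCTION create_month_partition(month_start date)
RETURNS void AS $$
BEGIN
    -- Child table named stats_YYYYMM with a CHECK constraint covering the month,
    -- so constraint exclusion can skip it for out-of-range queries.
    EXECUTE 'CREATE TABLE stats_' || to_char(month_start, 'YYYYMM')
         || ' (CHECK (ts >= DATE ''' || month_start
         || ''' AND ts < DATE ''' || (month_start + interval '1 month')::date
         || ''')) INHERITS (stats)';
END;
$$ LANGUAGE plpgsql;

-- crontab entry run on the first day of each month (sketch):
--   psql -d mydb -c "SELECT create_month_partition(date_trunc('month', now())::date)"
```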
[PERFORM] what work_mem does a query need?
Hi all,

In a previous post, Ron Peacetree suggested checking how much work_mem a query needs. How can that be done?

Thanks all
--
Arnau

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Hi all,

> A= go through each query and see what work_mem needs to be for that
> query to be as RAM resident as possible. If you have enough RAM, set
> work_mem for that query that large. Remember that work_mem is =per
> query=, so queries running in parallel eat the sum of each of their
> work_mem's.

How can I know how much work_mem a query needs?

Regards
--
Arnau

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
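[One way to probe this — a sketch, and the exact parameters available depend on the server version: run the query under EXPLAIN ANALYZE with trace_sort enabled, and raise work_mem for the session until the server log stops reporting external (on-disk) sorts:]

```sql
-- Session-level experiment; the values are illustrative.
SET trace_sort = on;      -- log lines will say "external sort ended"
                          -- (spilled to disk) vs "internal sort ended" (fit in RAM)
SET work_mem = 32768;     -- 32MB, in kB; increase until sorts stay internal
EXPLAIN ANALYZE
SELECT ...;               -- the query under study
```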
[PERFORM] Massive delete of rows, how to proceed?
Hi all,

I have a table of statistics with more than 15 million rows. I'd like to delete the oldest statistics, which can amount to about 7 million rows. Which method would you recommend to do this? I'd also be interested in calculating some statistics about the deleted rows, like how many rows were deleted per date. I was thinking of creating a function; any recommendations?

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
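[A sketch of a set-based alternative to a row-by-row function — the table's timestamp column "ts" and the cutoff date are assumptions for illustration: capture the per-date counts first, then delete in one statement, then reclaim space:]

```sql
-- 1. Per-date counts of what is about to be deleted:
CREATE TEMP TABLE deleted_stats AS
  SELECT date(ts) AS day, count(*) AS n
    FROM statistics
   WHERE ts < '2007-01-01'
   GROUP BY date(ts);

-- 2. The delete itself, one set-based statement instead of a loop:
DELETE FROM statistics WHERE ts < '2007-01-01';

-- 3. Reclaim dead space and refresh planner statistics afterwards:
VACUUM ANALYZE statistics;
```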
[PERFORM] Is it possible to speed this query up?
Hi all,

I execute the following query on postgresql 8.1.0:

SELECT u.telephone_number, u.telecom_operator_id, u.name
  FROM campanas_subcampaign AS sub,
       agenda_users AS u,
       agenda_users_groups ug
 WHERE sub.customer_app_config_id = 19362
   AND sub.subcampaign_id = 9723
   AND ug.agenda_user_group_id >= sub.ini_user_group_id
   AND ug.user_id = u.user_id
   AND ug.group_id IN ( SELECT group_id
                          FROM campanas_groups
                         WHERE customer_app_config_id = 19362 )
 ORDER BY ug.agenda_user_group_id ASC
 LIMIT 150;

explain analyze gives the following:

 Limit  (cost=1.20..4600.56 rows=150 width=74) (actual time=76516.312..76853.191 rows=150 loops=1)
   ->  Nested Loop  (cost=1.20..333424.31 rows=10874 width=74) (actual time=76516.307..76852.896 rows=150 loops=1)
         ->  Nested Loop  (cost=1.20..299653.89 rows=10874 width=20) (actual time=76506.926..76512.608 rows=150 loops=1)
               Join Filter: ("outer".agenda_user_group_id >= "inner".ini_user_group_id)
               ->  Nested Loop IN Join  (cost=1.20..189802.77 rows=32623 width=20) (actual time=75938.659..76353.748 rows=16200 loops=1)
                     Join Filter: ("outer".group_id = "inner".group_id)
                     ->  Index Scan using pk_agndusrgrp_usergroup on agenda_users_groups ug  (cost=0.00..123740.26 rows=2936058 width=30) (actual time=0.101..61921.260 rows=2836638 loops=1)
                     ->  Materialize  (cost=1.20..1.21 rows=1 width=10) (actual time=0.001..0.002 rows=1 loops=2836638)
                           ->  Seq Scan on campanas_groups  (cost=0.00..1.20 rows=1 width=10) (actual time=0.052..0.053 rows=1 loops=1)
                                 Filter: (customer_app_config_id = 19362::numeric)
               ->  Index Scan using pk_cmpnssubc_subcmpnid on campanas_subcampaign sub  (cost=0.00..3.35 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=16200)
                     Index Cond: (subcampaign_id = 9723)
                     Filter: (customer_app_config_id = 19362::numeric)
         ->  Index Scan using pk_agenda_uid on agenda_users u  (cost=0.00..3.09 rows=1 width=78) (actual time=2.262..2.264 rows=1 loops=150)
               Index Cond: ("outer".user_id = u.user_id)
 Total runtime: 76853.504 ms
(16 rows)

Do you think I could do anything to speed it up?

Cheers!!
-- Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
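For what it's worth, here is a sketch of one rewrite that might be worth trying; this is an untested assumption, not a confirmed fix. The plan spends nearly all its time index-scanning the whole of agenda_users_groups for the IN join, so turning the IN into a plain join and matching the literals' types to the numeric columns may let the planner start from campanas_groups instead. Note that if campanas_groups can contain duplicate group_id values, a DISTINCT subselect would be needed to keep the same semantics:

```sql
-- Hypothetical rewrite of the query above; untested sketch.
SELECT u.telephone_number, u.telecom_operator_id, u.name
FROM campanas_subcampaign AS sub
JOIN agenda_users_groups ug ON ug.agenda_user_group_id = sub.ini_user_group_id
JOIN campanas_groups cg     ON cg.group_id = ug.group_id
JOIN agenda_users u         ON u.user_id = ug.user_id
WHERE sub.customer_app_config_id = 19362::numeric
  AND sub.subcampaign_id = 9723
  AND cg.customer_app_config_id = 19362::numeric
ORDER BY ug.agenda_user_group_id ASC
LIMIT 150;
```

Comparing EXPLAIN ANALYZE of both forms would show whether the planner actually picks a cheaper join order.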
[PERFORM] Query on postgresql 7.4.2 not using index
Hi all, I have the following running on postgresql version 7.4.2:

CREATE SEQUENCE agenda_user_group_id_seq
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CYCLE
    INCREMENT 1
    START 1;

CREATE TABLE AGENDA_USERS_GROUPS (
    AGENDA_USER_GROUP_ID  INT8
        CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
        DEFAULT NEXTVAL('agenda_user_group_id_seq'),
    USER_ID               NUMERIC(10)
        CONSTRAINT fk_agenda_uid REFERENCES AGENDA_USERS (USER_ID)
        ON DELETE CASCADE NOT NULL,
    GROUP_ID              NUMERIC(10)
        CONSTRAINT fk_agenda_gid REFERENCES AGENDA_GROUPS (GROUP_ID)
        ON DELETE CASCADE NOT NULL,
    CREATION_DATE         DATE DEFAULT CURRENT_DATE,
    CONSTRAINT un_agndusrgrp_usergroup UNIQUE(USER_ID, GROUP_ID)
);

CREATE INDEX i_agnusrsgrs_userid  ON AGENDA_USERS_GROUPS ( USER_ID );
CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );

When I execute:

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9;

it does a sequential scan and doesn't use the index, and I don't understand why; any idea? I have the same on postgresql 8.1 and there it uses the index :-|

Thanks
-- Arnau
Re: [PERFORM] Query on postgresql 7.4.2 not using index
chris smith wrote:
> On 4/25/06, Arnau [EMAIL PROTECTED] wrote:
>> Hi all, I have the following running on postgresql version 7.4.2:
>> [schema and query snipped; see the original message above]
>
> Try
>
>   EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = 9;
>
> or
>
>   EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = '9';
>
> and let us know what happens.
The same; the table has 2547556 entries:

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
espsm_moviltelevision-# WHERE group_id::int8 = 9;
                              QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on agenda_users_groups  (cost=0.00..59477.34 rows=12738 width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
   Filter: ((group_id)::bigint = 9)
 Total runtime: 13452.114 ms
(3 filas)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
espsm_moviltelevision-# WHERE group_id = '9';
                              QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
   Filter: (group_id = 9::numeric)
 Total runtime: 7259.861 ms
(3 filas)

espsm_moviltelevision=# select count(*) from agenda_users_groups;
  count
---------
 2547556

Thanks
-- Arnau
Re: [PERFORM] Query on postgresql 7.4.2 not using index
Tom Lane wrote:
> Arnau [EMAIL PROTECTED] writes:
>>  Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>>    Filter: (group_id = 9::numeric)
>>  Total runtime: 7259.861 ms
>> (3 filas)
>>
>> espsm_moviltelevision=# select count(*) from agenda_users_groups;
>>   count
>> ---------
>>  2547556
>
> So the SELECT is fetching nearly 15% of the rows in the table. The
> planner is doing *the right thing* to use a seqscan, at least for this
> particular group_id value.

I have done the same tests on 8.1.0.

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9;
                              QUERY PLAN
--------------------------------------------------------------------
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78 rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
   Recheck Cond: (group_id = 9::numeric)
   ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26 rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
         Index Cond: (group_id = 9::numeric)
 Total runtime: 1004.966 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = 9;
                              QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
   Filter: ((group_id)::bigint = 9)
 Total runtime: 2571.496 ms
(3 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = '9';
                              QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 width=8) (actual time=407.193..2182.880 rows=367026 loops=1)
   Filter: ((group_id)::bigint = 9::bigint)
 Total runtime: 2506.998 ms
(3 rows)

espsm_moviltelevision=# select count(*) from agenda_users_groups;
  count
---------
 2555437
(1 row)

Postgresql then uses the index, and I don't understand why. On this server I tried to tune the configuration; is it because of the tuning, or because it's a newer version of postgresql?
Thanks for all the replies
-- Arnau
Re: [PERFORM] Query on postgresql 7.4.2 not using index
>> I have done the same tests on 8.1.0.
>>
>> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9;
>> [bitmap heap/index scan plan snipped: rows=367026, Total runtime: 1004.966 ms]
>
> How big are these individual records? I'm guessing a fairly good size,
> since an index scan is winning.

How could I know the size of an individual record?

>> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = 9;
>> [seq scan plan snipped: rows=367026, Total runtime: 2571.496 ms]
>
> OK. Stop and think about what you're telling postgresql to do here.
> You're telling it to cast the field group_id to int8, then compare it
> to 9. How can it cast the group_id to int8 without fetching it? That's
> right, you're ensuring a seq scan. You need to put the int8 cast on the
> other side of that equality comparison, like:
>
>     where group_id = 9::int8

I just did what Chris Smith asked me to do :). Here I paste the results I get when I change the cast.
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9::int8;
                              QUERY PLAN
--------------------------------------------------------------------
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06 rows=400379 width=8) (actual time=147.723..714.473 rows=367026 loops=1)
   Recheck Cond: (group_id = 9::numeric)
   ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33 rows=400379 width=0) (actual time=145.015..145.015 rows=367026 loops=1)
         Index Cond: (group_id = 9::numeric)
 Total runtime: 1038.537 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = '9'::int8;
                              QUERY PLAN
--------------------------------------------------------------------
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06 rows=400379 width=8) (actual time=153.858..1192.838 rows=367026 loops=1)
   Recheck Cond: (group_id = 9::numeric)
   ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33 rows=400379 width=0) (actual time=151.298..151.298 rows=367026 loops=1)
         Index Cond: (group_id = 9::numeric)
 Total runtime: 1527.039 ms
(5 rows)

Thanks
-- Arnau
[PERFORM] Where is my bottleneck?
Hi all, I have a performance problem and I don't know where my bottleneck is. I have postgresql 7.4.2 running on a debian server with kernel 2.4.26-1-686-smp, with two Xeon(TM) at 2.80GHz, 4GB of RAM, and a RAID 5 made with SCSI disks. Maybe it's not the latest hardware, but I think it's not that bad.

My problem is that the general performance is not good enough and I don't know where the bottleneck is. It could be that the queries are not optimized as they should be, but it could also be a postgresql configuration problem or a hardware problem (HDs not being fast enough, not enough RAM, ...).

The configuration of postgresql is the default; I tried to tune postgresql.conf and the results were disappointing, so I went back to the default values. When I run top I get:

top - 19:10:24 up 452 days, 15:48, 4 users, load average: 6.31, 6.27, 6.52
Tasks:  91 total,  8 running, 83 sleeping,  0 stopped,  0 zombie
Cpu(s): 24.8% user, 15.4% system, 0.0% nice, 59.9% idle
Mem:  3748956k total, 3629252k used,  119704k free,   57604k buffers
Swap: 2097136k total,   14188k used, 2082948k free, 3303620k cached

Most of the time the idle value is even higher than 60%. I know it's a problem with a very big scope, but could you give me a hint about where I should look?

Thank you very much
-- Arnau
[PERFORM] How import big amounts of data?
Hi all, which is the best way to import data into tables? I have to import 9 rows into a column, and doing it as inserts takes ages. Would it be faster with COPY? Is there any other alternative to INSERT/COPY?

Cheers!
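As a sketch of the COPY approach (the table and file names here are made up for illustration), COPY is typically much faster than row-by-row INSERTs because the whole load is parsed and written in a single command and a single transaction:

```sql
-- Hypothetical target table and data file; adjust names and columns.
CREATE TABLE import_target (
    id      bigint,
    payload text
);

-- Server-side file, which must be readable by the backend process:
COPY import_target (id, payload) FROM '/tmp/data.txt';

-- Or stream the file from the client with psql instead:
-- \copy import_target (id, payload) FROM 'data.txt'
```

Dropping indexes before the bulk load and recreating them afterwards, then running ANALYZE on the table, usually speeds things up further.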
Re: [PERFORM] How import big amounts of data?
> I am doing twice as big imports daily, and found the following method
> most efficient (other than using copy):
>
> - Use a plpgsql function to do the actual insert (or update/insert if
>   needed).
> - Inside a transaction, execute SELECT statements with the maximum
>   possible number of insert function calls in one go. This minimizes
>   the number of round trips between the client and the server.

Thanks Teemu! Could you paste an example of one of those functions? ;-) An example of those SELECTs would also be great; I'm not sure I have completely understood what you mean.

-- Arnau
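Not Teemu's actual code, but a minimal sketch of the pattern as described (all names invented; note that the $$ dollar quoting needs PostgreSQL 8.0 or later, so on 7.4 the function body must be written as a single-quoted string):

```sql
-- Insert-or-update helper; one SELECT round trip can carry many calls.
CREATE OR REPLACE FUNCTION insert_item(p_id bigint, p_payload text)
RETURNS void AS $$
BEGIN
    INSERT INTO import_target (id, payload) VALUES (p_id, p_payload);
EXCEPTION WHEN unique_violation THEN
    -- Row already exists: fall back to an update.
    UPDATE import_target SET payload = p_payload WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;

-- Inside one transaction, batch as many calls as practical per statement:
BEGIN;
SELECT insert_item(1, 'a'), insert_item(2, 'b'), insert_item(3, 'c');
-- ... repeat with the next batch of values ...
COMMIT;
```

The point of the batching is that each SELECT is one client-server round trip regardless of how many function calls it carries.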
Re: [PERFORM] Advise about how to delete entries
Hi all,

> COPY FROM a file with all the ID's to delete, into a temporary table,
> and do a joined delete to your main table (thus, only one query).

I already did this, but I have no idea how to do this join. Could you give me a hint ;-) ?

Thank you very much
-- Arnau
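A sketch of what the joined delete could look like against the statistics_sasme table from the original post (the temp table and file names are invented; note that DELETE ... USING needs 8.1, while the IN form works on older versions):

```sql
-- Load the ids to delete into a temporary table.
CREATE TEMP TABLE ids_to_delete (statistic_id numeric(10,0));
COPY ids_to_delete FROM '/tmp/ids_to_delete.txt';

-- 8.1+: delete by joining through the USING clause.
DELETE FROM statistics_sasme
USING ids_to_delete
WHERE statistics_sasme.statistic_id = ids_to_delete.statistic_id;

-- Pre-8.1 equivalent:
-- DELETE FROM statistics_sasme
-- WHERE statistic_id IN (SELECT statistic_id FROM ids_to_delete);
```

Either way the 33.000 deletes become a single statement instead of 33.000 separate queries.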
[PERFORM] Advise about how to delete entries
Hi all, I have the following table:

espsm_asme=# \d statistics_sasme
                 Table "public.statistics_sasme"
          Column          |           Type           | Modifiers
--------------------------+--------------------------+-----------
 statistic_id             | numeric(10,0)            | not null default nextval('STATISTICS_OPERATOR_ID_SEQ'::text)
 input_message_id         | character varying(50)    |
 timestamp_in             | timestamp with time zone |
 telecom_operator_id      | numeric(4,0)             |
 enduser_number           | character varying(15)    | not null
 telephone_number         | character varying(15)    | not null
 application_id           | numeric(10,0)            |
 customer_id              | numeric(10,0)            |
 customer_app_config_id   | numeric(10,0)            |
 customer_app_contents_id | numeric(10,0)            |
 message                  | character varying(160)   |
 message_type_id          | numeric(4,0)             |
Indexes:
    pk_stsasme_statistic_id primary key, btree (statistic_id)
Triggers:
    RI_ConstraintTrigger_17328735 AFTER INSERT OR UPDATE ON statistics_sasme FROM telecom_operators NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE RI_FKey_check_ins('fk_stsasme_telecom_operator_id', 'statistics_sasme', 'telecom_operators', 'UNSPECIFIED', 'telecom_operator_id', 'telecom_operator_id')
    RI_ConstraintTrigger_17328738 AFTER INSERT OR UPDATE ON statistics_sasme FROM applications NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE RI_FKey_check_ins('fk_stsasme_application_id', 'statistics_sasme', 'applications', 'UNSPECIFIED', 'application_id', 'application_id')
    RI_ConstraintTrigger_17328741 AFTER INSERT OR UPDATE ON statistics_sasme FROM customers NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE RI_FKey_check_ins('fk_stsasme_customer_id', 'statistics_sasme', 'customers', 'UNSPECIFIED', 'customer_id', 'customer_id')

The table contains about 7.000.000 entries, and I have to remove 33.000 of them. I have created an SQL file with all the delete statements, e.g.:

DELETE FROM statistics_sasme WHERE statistic_id = 9832;

and then I do \i delete_items.sql. Removing a single entry takes more than 10 seconds. What would you do to speed it up?
Thank you very much
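One guess worth checking (an assumption, not a confirmed diagnosis): on 7.x an unquoted integer literal compared against a numeric(10,0) column can keep the primary-key index from being used, and issuing 33.000 separate statements adds per-statement overhead on top of that. Quoting the literal and batching ids may both help:

```sql
-- Quoted literal, so the parser resolves it to the column's numeric type
-- and the pk_stsasme_statistic_id index can be used:
DELETE FROM statistics_sasme WHERE statistic_id = '9832';

-- Many ids in one statement instead of one DELETE per id
-- (example ids; in practice generate the list from the 33.000 targets):
DELETE FROM statistics_sasme
WHERE statistic_id IN ('9832', '9833', '9834');
```

Running EXPLAIN on one of the single-row deletes would confirm whether the index is being used at all.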
[PERFORM] Queries with timestamp II
Hi all, first of all thanks to Josh and Richard for their replies. What I have done to test their indications is the following. I have created a new table identical to STATISTICS, with an index over the TIMESTAMP_IN field:

CREATE TABLE STATISTICS2 (
    STATISTIC_ID  NUMERIC(10)
        NOT NULL DEFAULT NEXTVAL('STATISTIC_ID_SEQ')
        CONSTRAINT pk_st_statistic2_id PRIMARY KEY,
    TIMESTAMP_IN  TIMESTAMP,
    VALUE         NUMERIC(10)
);

CREATE INDEX i_stats2_tin ON STATISTICS2(TIMESTAMP_IN);

After that I inserted the data from STATISTICS and vacuumed the DB:

INSERT INTO STATISTICS2 ( SELECT * FROM STATISTICS );
vacuumdb -f -z -d test

Once the vacuum had finished, I ran the following query:

explain analyze select * from statistics2 where timestamp_in > to_timestamp( '20031201', 'YYYYMMDD' );
NOTICE: QUERY PLAN:

Seq Scan on statistics2  (cost=0.00..638.00 rows=9289 width=35) (actual time=0.41..688.34 rows=27867 loops=1)
Total runtime: 730.82 msec

That query is not using the index. Does anybody know what I'm doing wrong?

Thank you very much
-- Arnau
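One thing that may be worth trying (a sketch, not a verified fix): compare against a plain timestamp literal instead of to_timestamp(), so the planner sees a simple constant at plan time, and re-run ANALYZE first so the row estimate is fresh. Keep in mind that with roughly 28.000 matching rows in a fairly small table, a sequential scan may still legitimately be the cheaper plan:

```sql
-- Refresh planner statistics for the new table.
ANALYZE statistics2;

-- Plain literal instead of the to_timestamp() call:
EXPLAIN ANALYZE
SELECT * FROM statistics2
WHERE timestamp_in > '2003-12-01'::timestamp;
```

If the estimated row count is still a large fraction of the table, the seq scan is expected behavior rather than a missing-index problem.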
[PERFORM] Queries with timestamps
Hi all, I'm quite a newbie in SQL and I have a performance problem. I have the following table (with some extra fields) and without any index:

CREATE TABLE STATISTICS (
    STATISTIC_ID  NUMERIC(10)
        NOT NULL DEFAULT nextval('STATISTIC_ID_SEQ')
        CONSTRAINT pk_st_statistic_id PRIMARY KEY,
    TIMESTAMP_IN  TIMESTAMP,
    VALUE         NUMERIC(10)
);

The queries on this table are mainly related to the timestamp field, e.g.:

select * from statistics where timestamp_in::date > current_date - interval '1 month';

As the number of rows grows, this query takes longer and longer to execute. What should I do to improve the performance of this query?

Thank you very much
-- Arnau
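A minimal sketch of the usual fix (the index name is invented): index the raw timestamp column and phrase the filter as a range on that column, rather than casting the column to date, so the btree index is usable without needing a functional index on the cast:

```sql
-- Index the raw column once:
CREATE INDEX i_statistics_tin ON STATISTICS (TIMESTAMP_IN);

-- Range predicate on the raw column, which the index can serve:
SELECT *
FROM STATISTICS
WHERE TIMESTAMP_IN >= current_date - interval '1 month';
```

The cast form compares `timestamp_in::date`, which is a computed expression, so a plain index on TIMESTAMP_IN cannot be matched against it; the range form compares the indexed column directly.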