[PERFORM] Sql result by where condition
Hi all, I have a table employee. Using a WHERE condition, can I get the result prioritized? Take the example below:

select ENAME, ORIG_SALARY from employee
where (ename='Tom' and orig_salary=2413) or (orig_salary=1234);

If the first condition (ename='Tom' and orig_salary=2413) is satisfied, 10 rows will be returned; for the second condition (orig_salary=1234), 20 rows will be returned. The order of display should be the first 10 rows, then the next 20 rows.

Thanks & Regards,
Ram
Re: [PERFORM] Sql result by where condition
In response to ramasubramanian: Please create a new mail for a new topic and don't hijack other threads.

> Hi all, I have a table employee. Using a WHERE condition, can I get the result prioritized? If the first condition (ename='Tom' and orig_salary=2413) is satisfied, 10 rows will be returned; for the second condition (orig_salary=1234), 20 rows will be returned. The order of display should be the first 10 rows, then the next 20 rows.

For instance:

select ENAME, ORIG_SALARY, 1 as my_order from employee
where (ename='Tom' and orig_salary=2413)
union all
select ENAME, ORIG_SALARY, 2 as my_order from employee
where (orig_salary=1234)
order by my_order;

Other solution (untested):

select ENAME, ORIG_SALARY,
       case when (ename='Tom' and orig_salary=2413) then 1 else 2 end as my_order
from employee
where (ename='Tom' and orig_salary=2413) or (orig_salary=1234)
order by my_order;

Regards, Andreas
--
Andreas Kretschmer
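One caveat with both variants: ORDER BY my_order alone leaves the order of rows within each priority group unspecified. A minimal sketch adding a secondary sort key (ename is an arbitrary illustrative choice, not something the original poster asked for):

select ENAME, ORIG_SALARY,
       case when (ename='Tom' and orig_salary=2413) then 1 else 2 end as my_order
from employee
where (ename='Tom' and orig_salary=2413) or (orig_salary=1234)
order by my_order, ename;  -- the secondary key makes the order within each group deterministic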
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 22/01/10 19:06, Tory M Blue wrote:
> Here is the explain plan for the query. Actual rows that the query returns is 6369

Actually, it processes 19,799 rows (see the actual rows= below).

SLOW - Bitmap Heap Scan on userstats (cost=797.69..118850.46 rows=13399 width=8) (actual time=281.604..31190.290 rows=19799 loops=1)
Total runtime: 31219.536 ms

FAST - Bitmap Heap Scan on userstats a (cost=802.66..118855.43 rows=33276 width=23) (actual time=55.400..3807.908 rows=2606 loops=1)
Total runtime: 3813.626 ms

OK - so the first query processes 19,799 rows in 31,219 ms (about 1.6 ms per row); the second processes 2,606 rows in 3,813 ms (about 1.5 ms per row). You are asking for DISTINCT user-ids, so it seems reasonable that it will take slightly longer to check a larger set of user-ids. Otherwise, both queries are the same. I'm still a little puzzled by the bitmap scan, but the planner probably knows more about your data than I do.

The main time is spent in the bitmap heap scan, which is where it's grabbing actual row data (and presumably building a hash over the uid column). You can see how long in the 'actual time' figures: the first number (e.g. 281.604) is the time spent before the node starts returning rows, and the second is the total time at finish (31190.290). If loops were greater than 1, you would multiply the times by the number of loops to get a total.

So - there's nothing wrong, in the sense that the second query does the same as the first. Let's take a step back: what you really want is for your reports to be faster. You mentioned you were running this query thousands of times with a different makeid each time. Running it once for all possible values and stashing the results in a temp table will probably be *much* faster. The planner can just scan the whole table once and build up its results as it goes.

--
Richard Huxton
Archonet Ltd
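A minimal sketch of that temp-table approach, assuming the names visible in the plan above (userstats, makeid, uid); adjust to the real schema and add whatever date filter the report actually uses:

CREATE TEMP TABLE makeid_counts AS
SELECT makeid, count(DISTINCT uid) AS distinct_uids
FROM userstats
GROUP BY makeid;   -- one scan over the table instead of thousands

-- each per-makeid lookup then becomes a trivial fetch:
SELECT distinct_uids FROM makeid_counts WHERE makeid = 'example-make';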
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Mon, 25 Jan 2010, Richard Huxton wrote:
> OK - so the first query processes 19,799 rows in 31,219 ms (about 1.6 ms per row); the second processes 2,606 rows in 3,813 ms (about 1.5 ms per row).

Agreed. One query is faster than the other because it has to do about an eighth of the amount of work.

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] Sql result by where condition
On Mon, 25 Jan 2010, A. Kretschmer wrote:
> In response to ramasubramanian: Please create a new mail for a new topic and don't hijack other threads.

Even more so - this probably isn't the right mailing list for generic SQL help questions.

> select ENAME, ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and orig_salary=2413) union all select ENAME, ORIG_SALARY, 2 as my_order from employee where (orig_salary=1234) order by my_order;

Or just:

select ENAME, ORIG_SALARY from employee
where (ename='Tom' and orig_salary=2413) or (orig_salary=1234)
ORDER BY orig_salary DESC;

as there are only going to be two values for orig_salary, and the higher one (2413, the 'Tom' rows) sorts first.

Matthew

--
The early bird gets the worm. If you want something else for breakfast, get up later.
Re: [PERFORM] Sql result by where condition
In response to Matthew Wakeling:
> Even more so - this probably isn't the right mailing list for generic SQL help questions.

ACK.

> Or just: select ENAME, ORIG_SALARY from employee where (ename='Tom' and orig_salary=2413) or (orig_salary=1234) ORDER BY orig_salary DESC; as there are only going to be two values for orig_salary.

Hehe, yes, I overlooked that fact ;-)

Andreas
--
Andreas Kretschmer
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
Scott Carey:
>> (2) The tests: Note: The standard speed was about 800 MB / 40 s, so 20 MB/s.
>> a) What I changed: fsync=off. Result: 35 s, so 5 s faster.
>> b) Like a), but: checkpoint_segments=128 (was 3), autovacuum=off. Result: 35 s (no change...?!)
> Yes, more checkpoint_segments will help if your shared_buffers is larger; it won't do a whole lot otherwise. Generally, I like to keep these roughly equal sized as a starting point for any small to medium sized configuration. So if shared_buffers is 1GB, that takes 64 checkpoint segments to hold for heavy write scenarios.

(1) OK, that's what I tested: 1024 MB shared_buffers, 64 checkpoint segments. Unfortunately, I could not run it on the same hardware anymore: the data is written to a single disk now, not RAID anymore. So with the default shared_buffers of 8 MB (?) we should expect 45 s for writing the 800 MB. With the large shared_buffers and checkpoint_segments (mentioned above) I got this:

1. run (right after postgres server (re-)start): 28 s (!)
2. run: 44 s
3. run: 42 s

So, roughly the same as with small buffers.

(2) Then I switched again from 8.2.4 to 8.4.2:

1. run (after server start): 25 s
2. run: 38 s
3. run: 38 s

So, 8.4 helped a bit over 8.2.

(3) All in all, by (1) + (2) the performance bottleneck has, however, changed a lot (as shown here by the performance monitor): now the test system is definitely disk bound. Roughly speaking, at the middle of the whole test, for about 40-50% of the time, the 'data' disk was at 100% (and the 'WAL' disk at 20%), while before and after that the 'WAL' disk had a lot of peaks at 100% (and the 'data' disk at 30%). The average rate on the 'data' disk was 40 MB/s (WAL: 20 MB/s) -- while the net payload rate is 800 MB / 40 s = 20 MB/s, so still *half* of what the disk actually writes. So, this remains the last open question for me: it seems the data is written twice to the 'data' disk, although WAL is written to the separate 'WAL' disk.

>> Ok, I've managed to use 8.4 here. Unfortunately, there was nearly no improvement in speed. For example, test 2d) performed in 35 s.
> With a very small shared_buffers, the improvements to Postgres' shared_buffers / checkpoint interaction can not be utilized.

See above.

Thank you,
Felix
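For reference, the settings compared above would look roughly like this in postgresql.conf (values taken from the test description, not a recommendation; on the 8.x versions discussed here, changing shared_buffers requires a server restart):

shared_buffers = 1024MB     # the tested value; the poster's guessed default was 8 MB
checkpoint_segments = 64    # the tested value; the 8.x default was 3
fsync = on                  # test a) used fsync = off, which is unsafe for production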
[PERFORM] splitting data into multiple tables
Hello,

I am working on a project that will take out structured content from wikipedia and put it in our database. Before putting the data into the database, I wrote a script to find out the number of rows every table would be having after the data is in, and I found there is a table which will approximately have 5 crore entries after data harvesting. Is it advisable to keep so much data in one table?

I have read about 'partitioning' a table. Another idea I have is to break the table into different tables after the number of rows in a table has reached a certain limit, say 10 lacs. For example, dividing a table 'datatable' into 'datatable_a', 'datatable_b', each having 10 lac entries. I needed advice on whether I should go for partitioning or the approach I have thought of.

We have an HP server with 32 GB RAM and 16 processors. The storage has 24 TB diskspace (1 TB/HD). We have put them on RAID-5. It will be great if we could know the parameters that can be changed in the postgres configuration file so that the database makes maximum utilization of the server we have, e.g. parameters that would increase the speed of inserts and selects.

Thank you in advance
Rajiv Nair
Re: [PERFORM] splitting data into multiple tables
nair rajiv nair...@gmail.com wrote:
> I found there is a table which will approximately have 5 crore entries after data harvesting. Is it advisable to keep so much data in one table?

That's 50,000,000 rows, right? At this site, you're looking at a non-partitioned table with more than seven times that if you go to a case and click the Court Record Events button: http://wcca.wicourts.gov/

> I have read about 'partitioning' a table. Another idea I have is to break the table into different tables after the number of rows in a table has reached a certain limit, say 10 lacs. I needed advice on whether I should go for partitioning or the approach I have thought of.

It can help, and it can hurt. It depends on the nature of the data and how it is used. To get a meaningful answer, I think we'd need to know a bit more about it.

> We have an HP server with 32 GB RAM and 16 processors. The storage has 24 TB diskspace (1 TB/HD). We have put them on RAID-5. It will be great if we could know the parameters that can be changed in the postgres configuration file so that the database makes maximum utilization of the server we have.

Again, it depends a bit on the nature of the queries. For ideas on where to start, you might want to look here: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

If you get any particular queries which aren't performing as well as you think they should, you can post here with details. See this for information to include: http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
Re: [PERFORM] splitting data into multiple tables
Kevin Grittner wrote:
> nair rajiv nair...@gmail.com wrote:
>> I found there is a table which will approximately have 5 crore entries after data harvesting. Is it advisable to keep so much data in one table?
> That's 50,000,000 rows, right?

You should remember that words like lac and crore are not English words, and most English speakers around the world don't know what they mean. Thousand, million, billion and so forth are the English words that everyone knows.

Craig
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Mon, Jan 25, 2010 at 3:59 AM, Matthew Wakeling matt...@flymine.org wrote:
> On Mon, 25 Jan 2010, Richard Huxton wrote: OK - so the first query processes 19,799 rows in 31,219 ms (about 1.6 ms per row); the second processes 2,606 rows in 3,813 ms (about 1.5 ms per row).
> Agreed. One query is faster than the other because it has to do about an eighth of the amount of work.

Thanks guys, ya this has dropped the time by half. The process is manageable now. Thanks again. For some reason we thought this method would take more time, not less. So again, appreciate the help :)

Tory
Re: [PERFORM] splitting data into multiple tables
On Tue, Jan 26, 2010 at 1:01 AM, Craig James craig_ja...@emolecules.com wrote:
> You should remember that words like lac and crore are not English words, and most English speakers around the world don't know what they mean. Thousand, million, billion and so forth are the English words that everyone knows.

Oh, I am sorry, I wasn't aware of that. I repost my query with the suggested changes.

Hello,

I am working on a project that will take out structured content from wikipedia and put it in our database. Before putting the data into the database, I wrote a script to find out the number of rows every table would be having after the data is in, and I found there is a table which will approximately have 50,000,000 rows after data harvesting. Is it advisable to keep so much data in one table?

I have read about 'partitioning' a table. Another idea I have is to break the table into different tables after the number of rows in a table has reached a certain limit, say 1,000,000. For example, dividing a table 'datatable' into 'datatable_a', 'datatable_b', each having 1,000,000 rows. I needed advice on whether I should go for partitioning or the approach I have thought of.

We have an HP server with 32 GB RAM and 16 processors. The storage has 24 TB diskspace (1 TB/HD). We have put them on RAID-5. It will be great if we could know the parameters that can be changed in the postgres configuration file so that the database makes maximum utilization of the server we have, e.g. parameters that would increase the speed of inserts and selects.

Thank you in advance
Rajiv Nair
Re: [PERFORM] splitting data into multiple tables
On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
> I am working on a project that will take out structured content from wikipedia and put it in our database. I found there is a table which will approximately have 50,000,000 rows after data harvesting. Is it advisable to keep so much data in one table?

Depends on your access patterns, i.e. how many rows you are accessing at the same time - whether those have some common locality and such.

> I have read about 'partitioning' a table. Another idea I have is to break the table into different tables after the number of rows in a table has reached a certain limit, say 1,000,000. I needed advice on whether I should go for partitioning or the approach I have thought of.

Your approach is pretty close to partitioning - except that partitioning makes that mostly invisible to the outside, so it is IMHO preferable.

> We have an HP server with 32 GB RAM and 16 processors. The storage has 24 TB diskspace (1 TB/HD). We have put them on RAID-5. It will be great if we could know the parameters that can be changed in the postgres configuration file so that the database makes maximum utilization of the server we have.

Not using RAID-5 would possibly be a good start - many people (me included) have experienced bad write performance on it. It depends a great deal on the controller/implementation, though. RAID-10 is normally considered more advantageous despite its lower usable-space ratio. Did you create one big RAID-5 out of all the disks? That's not a good idea, because it's pretty likely that another disk will fail while you restore a previously failed disk. Unfortunately, in that configuration that means you have lost your complete data (in the most common implementations, at least).

Andres

PS: Your lines are strangely wrapped...
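For reference, a minimal sketch of what built-in partitioning looked like on the PostgreSQL 8.x releases discussed in this thread (inheritance plus CHECK constraints); the table and column names below are made up for illustration, not taken from the poster's schema:

CREATE TABLE datatable (
    id      bigint,
    payload text
);

-- Child tables carry non-overlapping CHECK constraints:
CREATE TABLE datatable_a (CHECK (id >= 0       AND id < 1000000)) INHERITS (datatable);
CREATE TABLE datatable_b (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (datatable);

-- With constraint exclusion enabled, the planner skips partitions
-- whose CHECK constraint contradicts the WHERE clause:
SET constraint_exclusion = on;
SELECT * FROM datatable WHERE id = 42;   -- only datatable_a is scanned

Queries against the parent table see all partitions, which is what makes partitioning "mostly invisible to the outside"; inserts are typically routed to the right child with a trigger or rule on the parent.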
Re: [PERFORM] splitting data into multiple tables
On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund and...@anarazel.de wrote:
> Depends on your access patterns, i.e. how many rows you are accessing at the same time - whether those have some common locality and such.

I'll give a brief idea of how this table is. The important columns are subject, predicate and object. So, given a predicate and object, one should be able to get all the subjects; given a subject and a predicate, one should be able to retrieve all the objects. I have created indexes on these three columns.

> Did you create one big RAID-5 out of all the disks? That's not a good idea, because it's pretty likely that another disk will fail while you restore a previously failed disk. Unfortunately, in that configuration that means you have lost your complete data (in the most common implementations, at least).

No, I am using only 12 TB, i.e. 12 HDs, of the 24 TB I have.
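A sketch of the kind of schema being described (the poster showed no DDL, so the table name and index layout below are assumptions; the poster describes single-column indexes, while the two-column indexes shown here match the two stated lookup patterns and are offered as a possible alternative):

CREATE TABLE datatable (
    subject   text NOT NULL,
    predicate text NOT NULL,
    object    text NOT NULL
);

-- given predicate + object, fetch all subjects:
CREATE INDEX datatable_pred_obj_idx  ON datatable (predicate, object);
-- given subject + predicate, fetch all objects:
CREATE INDEX datatable_subj_pred_idx ON datatable (subject, predicate);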
Re: [PERFORM] splitting data into multiple tables
On Tue, Jan 26, 2010 at 9:18 AM, nair rajiv nair...@gmail.com wrote:
> No, I am using only 12 TB, i.e. 12 HDs, of the 24 TB I have.

A 15k rpm SAS drive will give you a throughput of 12 MB/s and 120 IOPS. Now you can calculate the number of disks, specifically spindles, for getting your desired throughput and IOPS.
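As a worked example of that calculation (the workload figures here are invented purely for illustration): a report workload needing 600 random IOPS would need at least 600 / 120 = 5 spindles, and sustaining 60 MB/s would likewise take 60 / 12 = 5 drives. That is before accounting for the RAID-5 small-write penalty, under which each small random write typically costs four I/Os (read old data, read old parity, write data, write parity), so the real spindle count for a write-heavy load is correspondingly higher.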