[PERFORM] Sql result b where condition

2010-01-25 Thread ramasubramanian

Hi all,
   I have a table emp. Using a where condition, can I get the result
prioritized?

Take the example below.

select ENAME,ORIG_SALARY from employee where (ename='Tom' and 
orig_salary=2413)or(orig_salary=1234 )


If the first condition (ename='Tom' and orig_salary=2413) is satisfied, 10
rows will be returned; for the second condition (orig_salary=1234), 20 rows
will be returned.

The order of display should be:

the first 10 rows, then
the next 20 rows.

Thanks & Regards,
Ram



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sql result b where condition

2010-01-25 Thread A. Kretschmer
In response to ramasubramanian :

Please, create a new mail for a new topic and don't hijack other
threads.


 Hi all,
 I have a table emp. Using a where condition, can I get the result
 prioritized?
 Take the example below.
 
 select ENAME,ORIG_SALARY from employee where (ename='Tom' and 
 orig_salary=2413)or(orig_salary=1234 )
 
 If the first condition (ename='Tom' and orig_salary=2413) is satisfied, 10
 rows will be returned; for the second condition (orig_salary=1234), 20 rows
 will be returned.
 The order of display should be
 
 The first 10 rows then
 next 20 rows.
 Thanks & Regards,
 Ram

For instance:

select ENAME,ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and
orig_salary=2413) union all select ENAME,ORIG_SALARY, 2 as my_order from
employee where (orig_salary=1234) order by my_order;

other solution (untested):

select ENAME,ORIG_SALARY, case when (ename='Tom' and orig_salary=2413)
then 1 else 2 end as my_order from employee where (ename='Tom' and
orig_salary=2413)or(orig_salary=1234 ) order by my_order;
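
If the helper column should not show up in the output, the same idea can be
wrapped in a subquery (untested sketch, same table and column names assumed
as above):

select ENAME, ORIG_SALARY
from (
    select ENAME, ORIG_SALARY,
           case when (ename='Tom' and orig_salary=2413) then 1 else 2 end as my_order
    from employee
    where (ename='Tom' and orig_salary=2413) or (orig_salary=1234)
) s
order by my_order;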


Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Richard Huxton

On 22/01/10 19:06, Tory M Blue wrote:

 Here is the explain plan for the query. Actual rows that the query
 returns is 6369

Actually, it processes 19,799 rows (see the actual rows= figures below).


SLOW



  ->  Bitmap Heap Scan on userstats  (cost=797.69..118850.46
rows=13399 width=8) (actual time=281.604..31190.290 rows=19799
loops=1)



Total runtime: 31219.536 ms



FAST



  ->  Bitmap Heap Scan on userstats a  (cost=802.66..118855.43
rows=33276 width=23) (actual time=55.400..3807.908 rows=2606 loops=1)



Total runtime: 3813.626 ms


OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms 
per row)


The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

You are asking for DISTINCT user-ids, so it seems reasonable that it 
will take slightly longer to check a larger set of user-ids.


Otherwise, both queries are the same. I'm still a little puzzled by the 
bitmap scan, but the planner probably knows more about your data than I do.


The main time is spent in the bitmap heap scan, which is where it's 
grabbing actual row data (and presumably building a hash over the uid 
column). You can see how long in the actual time figures: the first number 
(e.g. 281.604) is the time spent before the node starts returning rows, and 
the second is the total time at finish (31190.290). If loops were greater 
than 1 you would multiply the times by the number of loops to get a total.
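
To make the loops rule concrete (illustrative numbers only, not taken from 
this plan): a node reporting (actual time=0.050..12.000 rows=100 loops=10) 
spends about 12 ms per loop, so roughly 120 ms in total across its ten 
executions.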


So - there's nothing wrong in the sense that the second query does the 
same as the first. Let's take a step back. What you really want is your 
reports to be faster.


You mentioned you were running this query thousands of times with a 
different makeid each time. Running it once for all possible values 
and stashing the results in a temp table will probably be *much* faster. 
The planner can just scan the whole table once and build up its results 
as it goes.
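
As a rough sketch of that idea (untested; the names userstats, makeid and 
uid come from this thread, but the real filters and whether you need a 
count or the uids themselves are assumptions on my part):

-- one pass over the table, aggregated for every makeid at once
CREATE TEMP TABLE results_by_makeid AS
SELECT makeid, count(DISTINCT uid) AS distinct_uids
FROM userstats
-- ... plus whatever other conditions the original query applies
GROUP BY makeid;

-- each report then just looks up its own row
SELECT distinct_uids FROM results_by_makeid WHERE makeid = ...;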


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Matthew Wakeling

On Mon, 25 Jan 2010, Richard Huxton wrote:
OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms per 
row)


The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).


Agreed. One query is faster than the other because it has to do an eighth 
the amount of work.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sql result b where condition

2010-01-25 Thread Matthew Wakeling

On Mon, 25 Jan 2010, A. Kretschmer wrote:

In response to ramasubramanian :

Please, create a new mail for a new topic and don't hijack other
threads.


Even more so - this probably isn't the right mailing list for generic SQL 
help questions.



select ENAME,ORIG_SALARY from employee where (ename='Tom' and
orig_salary=2413)or(orig_salary=1234 )

If the first condition (ename='Tom' and orig_salary=2413) is satisfied, 10
rows will be returned; for the second condition (orig_salary=1234), 20 rows
will be returned.
The order of display should be

The first 10 rows then
next 20 rows.



select ENAME,ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and
orig_salary=2413) union all select ENAME,ORIG_SALARY, 2 as my_order from
employee where (orig_salary=1234) order by my_order;


Or just:

select ENAME,ORIG_SALARY from employee where (ename='Tom' and
orig_salary=2413)or(orig_salary=1234 ) ORDER BY orig_salary DESC

as there are going to be only two values for orig_salary.

Matthew

--
The early bird gets the worm. If you want something else for breakfast, get
up later.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sql result b where condition

2010-01-25 Thread A. Kretschmer
In response to Matthew Wakeling :
 On Mon, 25 Jan 2010, A. Kretschmer wrote:
 In response to ramasubramanian :
 
 Please, create a new mail for a new topic and don't hijack other
 threads.
 
 Even more so - this isn't probably the right mailing list for generic sql 
 help questions.

ACK.

 select ENAME,ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and
 orig_salary=2413) union all select ENAME,ORIG_SALARY, 2 as my_order from
 employee where (orig_salary=1234) order by my_order;
 
 Or just:
 
 select ENAME,ORIG_SALARY from employee where (ename='Tom' and
 orig_salary=2413)or(orig_salary=1234 ) ORDER BY orig_salary DESC
 
 as there is going to be only two values for orig_salary.

hehe, yes, I overlooked that fact ;-)


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-25 Thread fka...@googlemail.com
Scott Carey:

  (2) The tests:
  
  Note: The standard speed was about 800MB/40s, so 20MB/s.
  
  
  a)
  What I changed: fsync=off
  Result: 35s, so 5s faster.
  
  
  b) like a) but:
  checkpoint_segments=128 (was 3)
  autovacuum=off
  
  Result: 35s (no change...?!)
  
 
 yes, more checkpoint_segments will help if your
 shared_buffers is larger, it won't do a whole lot
 otherwise.  Generally, I like to keep these roughly equal
 sized as a starting point for any small to medium sized
 configuration.  So if shared_buffers is 1GB, that takes 64
 checkpoint segments to hold for heavy write scenarios.

(1)

Ok, that's what I tested: 1024 MB shared_buffers, 64
checkpoint segments.
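
A minimal sketch of the corresponding postgresql.conf lines (everything else
assumed left at its defaults):

shared_buffers = 1024MB        # up from the small default
checkpoint_segments = 64       # up from the default of 3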

Unfortunately I could not run it on the same hardware
anymore: the data is written to a single disk now, not RAID
anymore. So with the default shared_buffers of 8 MB (?) we
should expect 45s for writing the 800 MB. With the large
shared_buffers and checkpoints (mentioned above) I got this:

1. run (right after postgres server (re-)start): 28s (!)
2. run: 44s
3. run: 42s

So, roughly the same as with small buffers.


(2)
Then I switched again from 8.2.4 to 8.4.2:

1. run (after server start): 25s.
2. run: 38s
3. run: 38s

So, 8.4 helped a bit over 8.2.


(3) All in all

By (1) + (2) the performance bottleneck has, however,
changed a lot (as shown here by the performance monitor):

Now, the test system is definitely disk bound. Roughly
speaking, at the middle of the whole test, for about 40-50%
of the time, the 'data' disk was at 100% (and the 'WAL' at
20%), while before and after that the 'WAL' disk had a lot
of peaks at 100% (and the 'data' disk at 30%).

The average throughput of the 'data' disk was 40 MB/s (WAL:
20MB/s) -- while the net data rate is 800MB/40s = 20MB/s,
so still only *half* of what the disk is actually doing.

So, this remains the last open question for me: it seems
the data is written twice to the 'data' disk, although WAL
is written to the separate 'WAL' disk.



  Ok, I've managed to use 8.4 here. Unfortunately, there was
  nearly no improvement in speed. For example test 2d)
  performed in 35s.
  
 
 With a very small shared_buffers the improvements to
 Postgres' shared_buffer / checkpoint interaction can not
 be utilized.

See above.


Thank You
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] splitting data into multiple tables

2010-01-25 Thread nair rajiv
Hello,

  I am working on a project that will take out structured content
from wikipedia
and put it in our database. Before putting the data into the database I
wrote a script to
find out the number of rows every table would be having after the data is in
and I found
there is a table which will approximately have 5 crore entries after data
harvesting.
Is it advisable to keep so much data in one table?
  I have read about 'partitioning' a table. Another idea I have is
to break the table into
different tables after the number of rows in a table has reached a certain
limit, say 10 lacs.
For example, dividing a table 'datatable' into 'datatable_a', 'datatable_b',
each having 10 lac entries.
I need advice on whether I should go for partitioning or the approach I
have thought of.
  We have an HP server with 32GB RAM and 16 processors. The storage has
24TB diskspace (1TB/HD).
We have put them on RAID-5. It will be great if we could know the parameters
that can be changed in the
postgres configuration file so that the database makes maximum utilization
of the server we have.
For example, parameters that would increase the speed of inserts and selects.


Thank you in advance
Rajiv Nair


Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread Kevin Grittner
nair rajiv nair...@gmail.com wrote:
 
 I found there is a table which will approximately have 5 crore
 entries after data harvesting.
 Is it advisable to keep so much data in one table ?
 
That's 50,000,000 rows, right?  At this site, you're looking at a
non-partitioned table with more than seven times that if you go to a
case and click the Court Record Events button:
 
http://wcca.wicourts.gov/
 
 I have read about 'partitioning' a table. An other idea I have is
 to break the table into different tables after the no of rows  in
 a table has reached a certain limit say 10 lacs.
 For example, dividing a table 'datatable' to 'datatable_a',
 'datatable_b' each having 10 lac entries.
 I needed advice on whether I should go for partitioning or the
 approach I have thought of.
 
It can help, and it can hurt.  It depends on the nature of the data
and how it is used.  To get a meaningful answer, I think we'd need
to know a bit more about it.
 
 We have a HP server with 32GB ram,16 processors. The storage has
 24TB diskspace (1TB/HD).
 We have put them on RAID-5. It will be great if we could know the
 parameters that can be changed in the postgres configuration file
 so that the database makes maximum utilization of the server we
 have.
 
Again, it depends a bit on the nature of the queries.  For ideas on
where to start, you might want to look here:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
If you get any particular queries which aren't performing as well as
you think they should, you can post here with details.  See this for
information to include:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread Craig James

Kevin Grittner wrote:

nair rajiv nair...@gmail.com wrote:
 

I found there is a table which will approximately have 5 crore
entries after data harvesting.
Is it advisable to keep so much data in one table ?
 
That's 50,000,000 rows, right?


You should remember that words like lac and crore are not English words, and 
most English speakers around the world don't know what they mean.  Thousand, 
million, billion and so forth are the English words that everyone knows.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Tory M Blue
On Mon, Jan 25, 2010 at 3:59 AM, Matthew Wakeling matt...@flymine.org wrote:
 On Mon, 25 Jan 2010, Richard Huxton wrote:

 OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
 per row)

 The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

 Agreed. One query is faster than the other because it has to do an eighth
 the amount of work.

 Matthew

Thanks guys, ya this has dropped the time by half. The process is
manageable now. Thanks again. For some reason we thought this method
would make it take more time, vs less. So again appreciate the help :)

Tory

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread nair rajiv
On Tue, Jan 26, 2010 at 1:01 AM, Craig James craig_ja...@emolecules.com wrote:

 Kevin Grittner wrote:

 nair rajiv nair...@gmail.com wrote:


 I found there is a table which will approximately have 5 crore
 entries after data harvesting.
 Is it advisable to keep so much data in one table ?

  That's 50,000,000 rows, right?


 You should remember that words like lac and crore are not English words,
 and most English speakers around the world don't know what they mean.
  Thousand, million, billion and so forth are the English words that everyone
 knows.




Oh, I am sorry. I wasn't aware of that.
I repost my query with the suggested changes.



Hello,

  I am working on a project that will take out structured content
from wikipedia
and put it in our database. Before putting the data into the database I
wrote a script to
find out the number of rows every table would be having after the data is in
and I found
there is a table which will approximately have 50,000,000 rows after data
harvesting.
Is it advisable to keep so much data in one table?
  I have read about 'partitioning' a table. Another idea I have is
to break the table into
different tables after the number of rows in a table has reached a certain
limit, say 10,00,000.
For example, dividing a table 'datatable' into 'datatable_a', 'datatable_b',
each having 10,00,000 rows.
I need advice on whether I should go for partitioning or the approach I
have thought of.
  We have an HP server with 32GB RAM and 16 processors. The storage has
24TB diskspace (1TB/HD).
We have put them on RAID-5. It will be great if we could know the parameters
that can be changed in the
postgres configuration file so that the database makes maximum utilization
of the server we have.
For example, parameters that would increase the speed of inserts and selects.


Thank you in advance
Rajiv Nair


 Craig



Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread Andres Freund
On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
On Tue, Jan 26, 2010 at 1:01 AM, Craig James
craig_ja...@emolecules.com wrote:
   I am working on a project that will take out structured content
 from wikipedia
 and put it in our database. Before putting the data into the database I
 wrote a script to
 find out the number of rows every table would be having after the data is
 in and I found
 there is a table which will approximately have 50,000,000 rows after data
 harvesting.
 Is it advisable to keep so much data in one table ?
Depends on your access patterns, i.e. how many rows you are accessing at the 
same time and whether those have some common locality.


   I have read about 'partitioning' a table. An other idea I have is
 to break the table into
 different tables after the no of rows  in a table has reached a certain
 limit say 10,00,000.
 For example, dividing a table 'datatable' to 'datatable_a', 'datatable_b'
 each having 10,00,000 rows.
 I needed advice on whether I should go for partitioning or the approach I
 have thought of.
Your approach is pretty close to partitioning - except that partitioning makes 
that mostly invisible to the outside, so it is imho preferable.
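
For illustration, a minimal sketch of 8.4-style partitioning via inheritance
(the table layout, the id column and the range boundaries are hypothetical,
and the insert-routing trigger is omitted):

create table datatable (
    id        bigint,
    subject   text,
    predicate text,
    object    text
);

-- each child carries a CHECK constraint describing its slice
create table datatable_a (
    check (id >= 0       and id < 1000000)
) inherits (datatable);

create table datatable_b (
    check (id >= 1000000 and id < 2000000)
) inherits (datatable);

-- with constraint exclusion enabled, a query like
--   select * from datatable where id = 1500000;
-- only scans datatable_b; inserts are normally routed to the right child
-- by a trigger, or by inserting into the child table directly.
set constraint_exclusion = on;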

   We have a HP server with 32GB ram,16 processors. The storage has
 24TB diskspace (1TB/HD).
 We have put them on RAID-5. It will be great if we could know the
 parameters that can be changed in the
 postgres configuration file so that the database makes maximum utilization
 of the server we have.
 For eg parameters that would increase the speed of inserts and selects.
Not using RAID-5 would possibly be a good start - many people (me included) 
have experienced bad write performance on it. It depends a great deal on the 
controller/implementation though.
RAID-10 is normally considered more advantageous despite its lower 
usable space ratio.
Did you create one big RAID-5 out of all disks? That's not a good idea, because 
it's pretty likely that another disk fails while you restore a previously 
failed disk. Unfortunately, in that configuration that means you have lost your 
complete data (in the most common implementations at least).

Andres

PS: Your lines are strangely wrapped...

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread nair rajiv
On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund and...@anarazel.de wrote:

 On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
  On Tue, Jan 26, 2010 at 1:01 AM, Craig James
 craig_ja...@emolecules.comwrote:
I am working on a project that will take out structured content
  from wikipedia
  and put it in our database. Before putting the data into the database I
  wrote a script to
  find out the number of rows every table would be having after the data is
  in and I found
  there is a table which will approximately have 50,000,000 rows after data
  harvesting.
  Is it advisable to keep so much data in one table ?
 Depends on your access patterns. I.e. how many rows are you accessing at
 the
 same time - do those have some common locality and such.


 I'll give a brief idea of how this table looks. The important columns are
subject, predicate and object. So given a predicate and an object one should
be able to get all the subjects, and given a subject and a predicate one
should be able to retrieve all the objects. I have created indexes on these
three columns.
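
A minimal sketch of composite indexes matching those two access patterns
(untested; the table name is taken from earlier in the thread, the index
names are made up):

create index datatable_pred_obj_idx  on datatable (predicate, object);
create index datatable_subj_pred_idx on datatable (subject, predicate);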



I have read about 'partitioning' a table. An other idea I have
 is
  to break the table into
  different tables after the no of rows  in a table has reached a certain
  limit say 10,00,000.
  For example, dividing a table 'datatable' to 'datatable_a', 'datatable_b'
  each having 10,00,000 rows.
  I needed advice on whether I should go for partitioning or the approach I
  have thought of.
 Your approach is pretty close to partitioning - except that partitioning
 makes
 that mostly invisible to the outside so it is imho preferrable.

We have a HP server with 32GB ram,16 processors. The storage
 has
  24TB diskspace (1TB/HD).
  We have put them on RAID-5. It will be great if we could know the
  parameters that can be changed in the
  postgres configuration file so that the database makes maximum
 utilization
  of the server we have.
  For eg parameters that would increase the speed of inserts and selects.
 Not using RAID-5 possibly would be a good start - many people (me included)
 experienced bad write performance on it. It depends a great deal on the
 controller/implementation though.
 RAID-10 is normally to be considered more advantageous despite its lower
 usable space ratio.
 Did you create one big RAID-5 out of all disks? Thats not a good idea,
 because
 its pretty likely that another disk fails while you restore a previously
 failed disk. Unfortunately in that configuration that means you have lost
 your
 complete data (in the most common implementations at least).


No, I am using only 12TB, i.e. 12 HDs of the 24TB I have.


 Andres

 PS: Your lines are strangely wrapped...



Re: [PERFORM] splitting data into multiple tables

2010-01-25 Thread Viji V Nair
On Tue, Jan 26, 2010 at 9:18 AM, nair rajiv nair...@gmail.com wrote:



 On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund and...@anarazel.de wrote:

 On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
  On Tue, Jan 26, 2010 at 1:01 AM, Craig James
 craig_ja...@emolecules.comwrote:
I am working on a project that will take out structured
 content
  from wikipedia
  and put it in our database. Before putting the data into the database I
  wrote a script to
  find out the number of rows every table would be having after the data
 is
  in and I found
  there is a table which will approximately have 50,000,000 rows after
 data
  harvesting.
  Is it advisable to keep so much data in one table ?
 Depends on your access patterns. I.e. how many rows are you accessing at
 the
 same time - do those have some common locality and such.


  I'll give a brief idea of how this table is. The important columns
 are
 subject, predicate and object. So given a predicate and object one should
 be able to get all the subjects, given subject and a predicate one should
 be able to retrieve all the objects. I have created an indexes on these
 three
 columns.



I have read about 'partitioning' a table. An other idea I have
 is
  to break the table into
  different tables after the no of rows  in a table has reached a certain
  limit say 10,00,000.
  For example, dividing a table 'datatable' to 'datatable_a',
 'datatable_b'
  each having 10,00,000 rows.
  I needed advice on whether I should go for partitioning or the approach
 I
  have thought of.
 Your approach is pretty close to partitioning - except that partitioning
 makes
 that mostly invisible to the outside so it is imho preferrable.

We have a HP server with 32GB ram,16 processors. The storage
 has
  24TB diskspace (1TB/HD).
  We have put them on RAID-5. It will be great if we could know the
  parameters that can be changed in the
  postgres configuration file so that the database makes maximum
 utilization
  of the server we have.
  For eg parameters that would increase the speed of inserts and selects.
 Not using RAID-5 possibly would be a good start - many people (me
 included)
 experienced bad write performance on it. It depends a great deal on the
 controller/implementation though.
 RAID-10 is normally to be considered more advantageous despite its lower
 usable space ratio.
 Did you create one big RAID-5 out of all disks? Thats not a good idea,
 because
 its pretty likely that another disk fails while you restore a previously
 failed disk. Unfortunately in that configuration that means you have lost
 your
 complete data (in the most common implementations at least).


 No, I am using only 12TB i.e 12 HDs of the 24TB I have


A 15k rpm SAS drive will give you a throughput of around 12MB/s and 120 IOPS.
From that you can calculate the number of disks, specifically spindles, needed
to get your desired throughput and IOPS.
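
For example (the target figures below are made up purely for illustration):
to sustain 120MB/s of sequential throughput and 1200 random IOPS with those
per-drive numbers, you would need roughly 120/12 = 10 spindles for bandwidth
and 1200/120 = 10 spindles for IOPS, before accounting for RAID overhead.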



 Andres

 PS: Your lines are strangely wrapped...