[PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Markus Schaber
Hello,

We have a database containing PostGIS MAP data, it is accessed mainly
via JDBC. There are multiple simultaneous read-only connections taken
from the JBoss connection pooling, and there usually are no active
writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it does make any
difference at all:

Having autocommit on or off? (I presume off)

Using commit or rollback?

Committing / rolling back occasionally (e.g. when returning the
connection to the pool) or not at all (until the pool closes the
connection)?

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Nörder-Tuitje , Marcus
AFAIK, this should be completely negligible.

Starting a transaction implies write access. If there is none, you do not need
to think about transactions, because there are none.

Postgres needs to schedule the writing transactions with the reading ones,
anyway.

But I am not that much of a performance professional anyway ;-)


regards,
Marcus

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Markus
Schaber
Sent: Tuesday, 20 December 2005 11:41
To: PostgreSQL Performance List
Subject: [PERFORM] Read only transactions - Commit or Rollback


Hello,

We have a database containing PostGIS MAP data, it is accessed mainly
via JDBC. There are multiple simultaneous read-only connections taken
from the JBoss connection pooling, and there usually are no active
writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it does make any
difference at all:

Having autocommit on or off? (I presume off)

Using commit or rollback?

Committing / rolling back occasionally (e. G. when returning the
connection to the pool) or not at all (until the pool closes the
connection)?

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] unsubscribe

2005-12-20 Thread William Lai
unsubscribe

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Grega Bremec

Nörder-Tuitje wrote:
| We have a database containing PostGIS MAP data, it is accessed
| mainly via JDBC. There are multiple simultaneous read-only
| connections taken from the JBoss connection pooling, and there
| usually are no active writers. We use connection.setReadOnly(true).
|
| Now my question is what is best performance-wise, if it does make
| any difference at all:
|
| Having autocommit on or off? (I presume off)
|
| Using commit or rollback?
|
| Committing / rolling back occasionally (e. G. when returning the
| connection to the pool) or not at all (until the pool closes the
| connection)?
|
| afaik, this should be completely neglectable.
|
| starting a transaction implies write access. if there is none, You do
| not need to think about transactions, because there are none.
|
| postgres needs to schedule the writing transactions with the reading
| ones, anyway.
|
| But I am not that performance profession anyway ;-)

Hello, Marcus, Nörder, list.

What about isolation? For several dependent calculations, you don't get a
consistent MVCC snapshot across them with autocommit turned on, right?

Cheers,
--
~Grega Bremec
~gregab at p0f dot net

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Nörder-Tuitje , Marcus
Mmmm, good question.

MVCC blocks reading processes when data is modified. Using autocommit implies
that each modification statement is an atomic operation.

On a massive read-only table, where no data is altered, MVCC shouldn't have any
effect (but this is only an assumption), based on

http://en.wikipedia.org/wiki/Mvcc

Using row-level locks for write access should leave most of the data available
to read-only sessions, but this is only an assumption, too.

maybe the community knows a little more ;-)

regards,
marcus


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Grega Bremec
Sent: Tuesday, 20 December 2005 12:41
To: PostgreSQL Performance List
Subject: Re: [PERFORM] Read only transactions - Commit or Rollback


Nörder-Tuitje wrote:
| We have a database containing PostGIS MAP data, it is accessed
| mainly via JDBC. There are multiple simultaneous read-only
| connections taken from the JBoss connection pooling, and there
| usually are no active writers. We use connection.setReadOnly(true).
|
| Now my question is what is best performance-wise, if it does make
| any difference at all:
|
| Having autocommit on or off? (I presume off)
|
| Using commit or rollback?
|
| Committing / rolling back occasionally (e. G. when returning the
| connection to the pool) or not at all (until the pool closes the
| connection)?
|
| afaik, this should be completely neglectable.
|
| starting a transaction implies write access. if there is none, You do
| not need to think about transactions, because there are none.
|
| postgres needs to schedule the writing transactions with the reading
| ones, anyway.
|
| But I am not that performance profession anyway ;-)

Hello, Marcus, Nörder, list.

What about isolation? For several dependent calculations, MVCC doesn't
happen a bit with autocommit turned on, right?

Cheers,
--
~Grega Bremec
~gregab at p0f dot net

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Markus Schaber
Hi, Marcus,

Nörder-Tuitje wrote:
 afaik, this should be completely neglectable.
 
 starting a transaction implies write access. if there is none, You do
 not need to think about transactions, because there are none.

Hmm, I always thought that the transaction will be opened at the first
statement, because there _could_ be a parallel writing transaction
started later.

 postgres needs to schedule the writing transactions with the reading
 ones, anyway.

As I said, there usually are no writing transactions on the same database.

Btw, there's another setting that might make a difference:

Having isolation level SERIALIZABLE or READ COMMITTED?
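
From JDBC this is a per-connection setting; a minimal sketch using the
standard java.sql constants (whether it makes any measurable difference
here is exactly the question):

  connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
  // or Connection.TRANSACTION_READ_COMMITTED, which is PostgreSQL's default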

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Michael Riess

Markus Schaber schrieb:

Hello,

We have a database containing PostGIS MAP data, it is accessed mainly
via JDBC. There are multiple simultaneous read-only connections taken
from the JBoss connection pooling, and there usually are no active
writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it does make any
difference at all:

Having autocommit on or off? (I presume off)



If you are using large ResultSets, it is interesting to know that 
Statement.setFetchSize() does not do anything as long as you have 
autocommit on. So you might want to always disable autocommit and set a 
reasonable fetch size with large results, or otherwise have serious 
memory problems in Java/JDBC.
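
With the PostgreSQL JDBC driver the cursor-based fetching only kicks in
once autocommit is off and a fetch size is set; a minimal sketch (the
connection URL and query are placeholders):

  Connection con = DriverManager.getConnection(url, user, password);
  con.setAutoCommit(false);           // required, otherwise setFetchSize() is ignored
  Statement st = con.createStatement();
  st.setFetchSize(1000);              // stream the result in chunks of 1000 rows
  ResultSet rs = st.executeQuery("SELECT id, the_geom FROM map_data");
  while (rs.next()) {
      // process one row at a time instead of materializing the whole result
  }
  rs.close();
  st.close();
  con.commit();                       // ends the transaction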


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Andreas Seltenreich
Markus Schaber writes:

 As I said, there usually are no writing transactions on the same database.

 Btw, there's another setting that might make a difference:

 Having ACID-Level SERIALIZABLE or READ COMMITED?

Well, if nonrepeatable or phantom reads would pose a problem because
of those occasional writes, you wouldn't be considering autocommit for
performance reasons either, would you?

regards,
Andreas
-- 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Nicolas Barbier
On 12/20/05, Nörder-Tuitje, Marcus [EMAIL PROTECTED] wrote:

 MVCC blocks reading processes when data is modified.

That is incorrect. The main difference between 2PL and MVCC is that
readers are never blocked under MVCC.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] filesystem performance with lots of files

2005-12-20 Thread David Roussel




David Lang wrote:

 ext3 has an option to make searching directories faster (htree), but
enabling it kills performance when you create files. And this doesn't
help with large files.
  
  

The ReiserFS white paper talks about the data structure he uses to
store directories (some kind of tree), and he says it's quick to both
read and write. Don't forget if you find ls slow, that could just be
ls, since it's ls, not the fs, that sorts the files into alphabetical
order.

 how long would it take to do a tar-ftp-untar cycle with no smarts

Note that you can do the tarring, zipping, copying and untarring
concurrently. I can't remember the exact netcat command line options,
but it goes something like this

Box1:
tar czvf - myfiles/* | nc myserver 12345

Box2:
nc -l -p 12345 | tar xzvf -

Not only do you gain from doing it all concurrently, but not writing a
temp file means that disk seeks are reduced too if you have a
single-spindle machine.

Also consider just copying files onto a network mount. It may not be as
fast as the above, but it will be faster than rsync, which has high CPU
usage and is thus not a good choice on a LAN.

Hmm, sorry this is not directly postgres anymore...

David




Re: [PERFORM] High context switches occurring

2005-12-20 Thread Tom Lane
Oleg Bartunov oleg@sai.msu.su writes:
 I see a very low performance and high context switches on our
 dual itanium2 slackware box (Linux ptah 2.6.14 #1 SMP)
 with 8Gb of RAM, running 8.1_STABLE. Any tips here ?

 [EMAIL PROTECTED]:~/cvs/8.1/pgsql/contrib/pgbench$ time pgbench -s 10 -c 10 
 -t 3000 pgbench
 starting vacuum...end.
 transaction type: TPC-B (sort of)
 scaling factor: 1
 number of clients: 10

You can't expect any different with more clients than scaling factor :-(.

Note that -s is only effective when supplied with -i; it's basically
ignored during an actual test run.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Alan Stange

Jignesh K. Shah wrote:

I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on 
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously, 
UltraSPARC T1 can do comparatively well, provided the
application can also scale along with it.


I just want to clarify one issue here.  It's my understanding that the
8-core system with 4 hardware threads per core (known as strands) is seen
as a 32 CPU system by Solaris.

So, one could have up to 32 postgresql processes running in parallel on 
the current systems (assuming the application can scale).


-- Alan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread David Lang

On Tue, 20 Dec 2005, Alan Stange wrote:


Jignesh K. Shah wrote:

I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on 
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously, 
UltraSPARC T1 can do comparatively well, provided the application
can also scale along with it.


I just want to clarify one issue here.   It's my understanding that the 
8-core, 4 hardware thread (known as strands) system is seen as a 32 cpu 
system by Solaris. 
So, one could have up to 32 postgresql processes running in parallel on the 
current systems (assuming the application can scale).


Note that, like hyperthreading, the strands aren't full processors; their
efficiency depends on how much the other threads sharing the core stall
waiting for external things.


David Lang

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Alan Stange

David Lang wrote:

On Tue, 20 Dec 2005, Alan Stange wrote:


Jignesh K. Shah wrote:

I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on 
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously, 
UltraSPARC T1 can do comparatively well, provided the
application can also scale along with it.


I just want to clarify one issue here.   It's my understanding that 
the 8-core, 4 hardware thread (known as strands) system is seen as a 
32 cpu system by Solaris. So, one could have up to 32 postgresql 
processes running in parallel on the current systems (assuming the 
application can scale).


note that like hyperthreading, the strands aren't full processors, 
their efficiancy depends on how much other threads shareing the core 
stall waiting for external things. 
Exactly.  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

Some time ago, I had some tests with large bulk insertions, and it
turned out that SERIALIZABLE seemed to be 30% faster, which surprised us.
 
 That surprises me too --- can you provide details on the test case so
 other people can reproduce it?  AFAIR the only performance difference
 between SERIALIZABLE and READ COMMITTED is the frequency with which
 transaction status snapshots are taken; your report suggests you were
 spending 30% of the time in GetSnapshotData, which is a lot higher than
 I've ever seen in a profile.

It was in my previous job two years ago, so I don't have access to the
exact code, and my memory is foggy. It was PostGIS 0.8 and PostgreSQL 7.4.

AFAIR, it was inserting into a table with about 6 columns and some
indices, some columns having database-provided values (now() and a
SERIAL column), where the other columns (a PostGIS Point, a long, a
foreign key into another table) were set via the application. We tried
different insertion methods (INSERT, prepared statements, a pgjdbc patch
to allow COPY support), different batch sizes and different numbers of
parallel connections to get the highest overall insert speed. However,
the project never went into production the way it was designed initially.

As you write about transaction snapshots: It may be that the PostgreSQL
config was not optimized well enough, and the hard disk was rather slow.

 As to the original question, a transaction that hasn't modified the
 database does not bother to write either a commit or abort record to
 pg_xlog.  I think you'd be very hard pressed to measure any speed
 difference between saying COMMIT and saying ROLLBACK after a read-only
 transaction.  It'd be worth your while to let transactions run longer
 to minimize their startup/shutdown overhead, but there's a point of
 diminishing returns --- you don't want client code leaving transactions
 open for hours, because of the negative side-effects of holding locks
 that long (eg, VACUUM can't reclaim dead rows).

Okay, so I'll stick with my current behaviour (autocommit off and
ROLLBACK after each batch of work).
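
In JDBC terms that pattern looks roughly like this (a minimal sketch; the
pool calls and the query are placeholders, not code from this thread):

  Connection con = pool.getConnection();   // whatever pooling API is in use
  con.setReadOnly(true);
  con.setAutoCommit(false);
  try {
      PreparedStatement ps = con.prepareStatement(
          "SELECT the_geom FROM map_data WHERE id = ?");
      ps.setLong(1, 42);
      ResultSet rs = ps.executeQuery();
      while (rs.next()) {
          // read-only work
      }
      rs.close();
      ps.close();
  } finally {
      con.rollback();              // for a read-only transaction this costs the same as COMMIT
      pool.returnConnection(con);  // return promptly so the transaction isn't held open for hours
  }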

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Richard_D_Levine
Jignesh,

Juan says the following below:

I figured the number of cores on the T1000/2000 processors would be
utilized by the forked copies of the postgresql server.  From the comments
I have seen so far it does not look like this is the case.

I think this needs to be refuted.  Doesn't Solaris switch processes as well
as threads (LWPs, whatever) equally well amongst cores?  I realize the
process context switch is more expensive than the thread switch, but
Solaris will utilize all cores as processes or threads become ready to run,
correct?

BTW, it's great to see folks with your email address on the list.  I feel
it points to a brighter future for all involved.

Thanks,

Rick


   
From: Jignesh K. Shah [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 12/19/2005 11:19 PM
To: Juan Casero [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL and Ultrasparc T1




I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do comparatively well, provided the
application can also scale along with it.

The best way to approach is to figure out your peak workload, find an
accurate way to measure the true metric and then design a  benchmark
for it and run it on both servers.

Regards,
Jignesh


Juan Casero wrote:

Ok.  That is what I wanted to know.  Right now this database is a
PostgreSQL 7.4.8 system.  I am using it in a sort of DSS role.  I have
weekly summaries of the sales for our division going back three years.
I have a PHP based webapp that I wrote to give the managers access to
this data.  The webapp lets them make selections for reports and then it
submits a parameterized query to the database for execution.  The
returned data rows are displayed and formatted in their web browser.  My
largest sales table is about 13 million rows along with all the indexes
it takes up about 20 gigabytes.  I need to scale this application up to
nearly 100 gigabytes to handle daily sales summaries.  Once we start
looking at daily sales figures our database size could grow ten to
twenty times.  I use postgresql because it gives me the kind of
enterprise database features I need to program the complex logic for the
queries.  I also need the transaction isolation facilities it provides
so I can optimize the queries in plpgsql without worrying about multiple
users temp tables colliding with each other.  Additionally, I hope to
rewrite the front end application in JSP so maybe I could use the
multithreaded features of the Java to exploit a multicore multi-cpu
system.  There are almost no writes to the database tables.  The bulk of
the application is just executing parameterized queries and returning
huge amounts of data.  I know bizgres is supposed to be better at this
but I want to stay away from anything that is beta.  I cannot afford for
this thing to go wrong.  My reasoning for looking at the T1000/2000 was
simply the large number of cores.  I know postgresql uses a super server
that forks copies of itself to handle incoming requests on port 5432.
But I figured the number of cores on the T1000/2000 processors would be
utilized by the forked copies of the postgresql server.  From the
comments I have seen so far it does not look like this is the case.  We
had originally sized up a dual processor dual core AMD opteron system
from HP for this but I thought I could get more bang for the buck on a
T1000/2000.  It now seems I may have been wrong.  I am stronger in Linux
than Solaris so I am not upset I am just trying to find the best
hardware for the anticipated needs of this application.

Thanks,
Juan

On Monday 19 December 2005 01:25, Scott Marlowe wrote:


From: [EMAIL PROTECTED] on behalf of Juan Casero

QUOTE:

Hi -


Can anyone tell me how well PostgreSQL 8.x performs on the new Sun
Ultrasparc T1 processor and architecture on 

Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Jignesh K. Shah
But yes, all LWPs (processes and threads) are switched across virtual
CPUs. There is intelligence built into Solaris to understand which
strands are executing on which cores, and it will balance out the cores
too, so if there are only 8 threads running they will essentially run on
separate cores rather than on 2 cores with 8 threads.


The biggest limitation is application scaling. pgbench shows that having
more processes bottleneck on the same files will probably not
perform better unless you tune your storage/file system. Those are the
issues which we typically try to solve with community partners (vendors,
open source) since that gives the biggest benefits.


The best way to verify this in such a multi-process environment: do you see
greater than 60% average CPU utilization in your dual/quad config
Xeons/Itaniums? Then a Sun Fire T2000 will help you a lot. However, if you
are stuck below 50% (for dual) or 25% (for quad), which means you are
pretty much stuck at single-CPU performance and/or probably have more
IO-related contention, then it won't help you with these systems.


I hope you get the idea on when a workload will perform better on Sun 
Fire T2000 without burning hands.


I will try to test some more with PostgreSQL on these systems to kind of 
highlight what can work or what will not work.


Is pgbench the workload that you prefer? (It already has issues with 
pg_xlog so my guess is it probably won't scale much)

If you have other workload informations let me know.

Thanks.
Regards,
Jignesh



[EMAIL PROTECTED] wrote:


Jignesh,

Juan says the following below:

I figured the number of cores on the T1000/2000 processors would be
utilized by the forked copies of the postgresql server.  From the comments
I have seen so far it does not look like this is the case.

I think this needs to be refuted.  Doesn't Solaris switch processes as well
as threads (LWPs, whatever) equally well amongst cores?  I realize the
process context switch is more expensive than the thread switch, but
Solaris will utilize all cores as processes or threads become ready to run,
correct?

BTW, it's great to see folks with your email address on the list.  I feel
it points to a brighter future for all involved.

Thanks,

Rick


  
From: Jignesh K. Shah [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 12/19/2005 11:19 PM
To: Juan Casero [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL and Ultrasparc T1





I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do comparatively well, provided the
application can also scale along with it.

The best way to approach is to figure out your peak workload, find an
accurate way to measure the true metric and then design a  benchmark
for it and run it on both servers.

Regards,
Jignesh


Juan Casero wrote:

 


Ok.  That is what I wanted to know.  Right now this database is a
PostgreSQL 7.4.8 system.  I am using it in a sort of DSS role.  I have
weekly summaries of the sales for our division going back three years.
I have a PHP based webapp that I wrote to give the managers access to
this data.  The webapp lets them make selections for reports and then it
submits a parameterized query to the database for execution.  The
returned data rows are displayed and formatted in their web browser.  My
largest sales table is about 13 million rows along with all the indexes
it takes up about 20 gigabytes.  I need to scale this application up to
nearly 100 gigabytes to handle daily sales summaries.  Once we start
looking at daily sales figures our database size could grow ten to
twenty times.  I use postgresql because it gives me the kind of
enterprise database features I need to program the complex logic
for the queries.  I also need the

[PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread Antal Attila

Hi!

What do you suggest for the next problem?
We have complex databases with some 100 million rows (2-3 million new
records per month). Our current servers are responding slowly these
days, so we have to buy new hardware for the database server. Some
weeks ago we started to work with PostgreSQL 8.1, which solved the
problem for some months.
There are some massive, heavy queries which are too slow (5-10
or more minutes). Parallel processing is infrequent (rarely more than 4-5
parallel queries).
So we need high performance in query execution with moderate parallel
processing.
What products do you think could help us? What is the best, or at least a
better, choice?

The budget line is about 30 000$ - 40 000$.

Regards, Atesz

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Overriding the optimizer

2005-12-20 Thread Jim C. Nasby
On Sat, Dec 17, 2005 at 07:31:40AM -0500, Jaime Casanova wrote:
   Yeah it would - an implementation I have seen that I like is where the
   developer can supply the *entire* execution plan with a query. This is
   complex enough to make casual use unlikely :-), but provides the ability
   to try out other plans, and also fix that vital query that must run
   today.
 
  Being able to specify an exact plan would also provide for query plan
  stability; something that is critically important in certain
  applications. If you have to meet a specific response time requirement
  for a query, you can't afford to have the optimizer suddenly decide that
  some other plan might be faster when in fact it's much slower.
 
 Plan stability doesn't mean time response stability...
 The plan that today is almost instantaneous tomorrow can take hours...

Sure, if your underlying data changes that much, but that's often not
going to happen in production systems (especially OLTP where this is
most important).

Of course if you have a proposal for ensuring that a query always
finishes in X amount of time, rather than always using the same plan,
I'd love to hear it. ;)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] make bulk deletes faster?

2005-12-20 Thread Jim C. Nasby
On Mon, Dec 19, 2005 at 11:10:50AM -0800, James Klo wrote:
 Yes, I've considered partitioning as a long term change. I was thinking 
 about this for other reasons - mainly performance.  If I go the 
 partitioning route, would I need to even perform archival?

No. The idea is that you have your table split up into date ranges
(perhaps each week gets its own table), i.e. table_2005w01,
table_2005w02, etc. You can do this with either inheritance or
individual tables and a UNION ALL view. In your case, inheritance is
probably the better way to go.

Now, if you have everything broken down by weeks and you typically only
need to access 7 days' worth of data, then generally you will only be
reading from two tables, so those two tables should stay in memory, and
indexes on them will be smaller. If desired, you can also play tricks on
the older tables such as vacuum full or cluster to further reduce space
usage and improve performance.
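
A sketch of that weekly layout using inheritance (the column definitions
are made up for illustration; constraint_exclusion is available from 8.1):

  Statement st = con.createStatement();
  st.execute("CREATE TABLE timeblocks (id bigint, block_start timestamptz, note text)");
  st.execute("CREATE TABLE timeblocks_2005w51 (" +
             " CHECK (block_start >= '2005-12-19' AND block_start < '2005-12-26')" +
             ") INHERITS (timeblocks)");
  st.execute("CREATE TABLE timeblocks_2005w52 (" +
             " CHECK (block_start >= '2005-12-26' AND block_start < '2006-01-02')" +
             ") INHERITS (timeblocks)");
  st.close();
  // Queries against timeblocks see all weeks; with constraint_exclusion on,
  // a WHERE clause on block_start only has to touch the matching child tables.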

 The larger problem that I need to solve is really twofold:
 
 1. Need to keep reads on timeblocks that are from the current day 
 through the following seven days very fast, especially current day reads.
 
 2. Need to be able to maintain the timeblocks for reporting purposes, 
 for at least a year (potentially more).  This could probably better 
 handled performing aggregate analysis, but this isn't on my current radar.

I've written an RRD-like implementation in SQL that might interest you;
it's at http://rrs.decibel.org (though the svn web access appears to be
down right now...)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Jim C. Nasby
On Tue, Dec 20, 2005 at 12:20:55PM -0500, Jignesh K. Shah wrote:
 Is pgbench the workload that you prefer? (It already has issues with 
 pg_xlog so my guess is it probably won't scale much)
 If you have other workload informations let me know.

From what the user described, dbt3 would probably be the best benchmark
to use. Note that they're basically read-only, which is absolutely not
what pgbench does.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Jim C. Nasby
On Sun, Dec 18, 2005 at 11:35:15AM -0500, Juan Casero wrote:
 Can anyone tell me how well PostgreSQL 8.x performs on the new Sun Ultrasparc
 T1 processor and architecture on Solaris 10?   I have a custom built retail
 sales reporting that I developed using PostgreSQL 7.48 and PHP on a Fedora

People have seen some pretty big gains going from 7.4 to 8.1. I recently
migrated http://stats.distributed.net and the daily processing
(basically OLAP) times were cut in half.

As someone else mentioned, IO is probably going to be your biggest
consideration, unless you have a lot of queries running at once.
Probably your best bang for the buck will be from an Opteron-based
server with a good number of internal drives.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Any way to optimize GROUP BY queries?

2005-12-20 Thread Jim C. Nasby
On Mon, Dec 19, 2005 at 03:47:35PM -0500, Greg Stark wrote:
 Increase your work_mem (or sort_mem in older postgres versions), you can do
 this for the server as a whole or just for this one session and set it back
 after this one query. You can increase it up until it starts causing swapping
 at which point it would be counter productive.

Just remember that work_mem is per-operation, so it's easy to push
the box into swapping if the workload increases. You didn't say how much
memory you have, but I'd be careful if work_mem * max_connections
gets very much larger than your total memory.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] separate drives for WAL or pgdata files

2005-12-20 Thread Jim C. Nasby
On Mon, Dec 19, 2005 at 07:20:56PM -0800, David Lang wrote:
 for persistant storage you can replicate from your ram-based system to a 
 disk-based system, and as long as your replication messages hit disk 
 quickly you can allow the disk-based version to lag behind in it's updates 
 during your peak periods (as long as it is able to catch up with the 
 writes overnight), and as the disk-based version won't have to do the 
 seeks for the reads it will be considerably faster then if it was doing 
 all the work (especially if you have good, large  battery-backed disk 
 caches to go with those drives to consolodate the writes)

Huh? Unless you're doing a hell of a lot of writing just run a normal
instance and make sure you have enough bandwidth to the drives with
pg_xlog on it. Make sure those drives are using a battery-backed raid
controller too. You'll also need to tune things to make sure that
checkpoints never have much (if any) work to do when they occur, but you
should be able to set that up with proper bg_writer tuning.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] filesystem performance with lots of files

2005-12-20 Thread Jim C. Nasby
On Tue, Dec 20, 2005 at 01:26:00PM +, David Roussel wrote:
 Note that you can do the taring, zipping, copying and untaring 
 concurrentlt.  I can't remember the exactl netcat command line options, 
 but it goes something like this
 
 Box1:
 tar czvf - myfiles/* | netcat myserver:12345
 
 Box2:
 netcat -listen 12345 | tar xzvf -

You can also use ssh... something like

tar -cf - blah/* | ssh machine tar -xf -
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread Jim C. Nasby
On Tue, Dec 20, 2005 at 07:27:15PM +0100, Antal Attila wrote:
 We have complex databases with some 100million rows (2-3million new 

How much space does that equate to?

 records per month). Our current servers are working on low resposibility 
 in these days, so we have to buy new hardver for database server. Some 
 weeks ago we started to work with PostgreSQL8.1, which solved the 
 problem for some months.
 There are some massive, hard query execution, which are too slow (5-10 
 or more minutes). The parallel processing is infrequent (rarely max. 4-5 
 parallel query execution).
 So we need high performance in query execution with medium parallel 
 processability.
 What's your opinion what productions could help us? What is the best or 
 only better choice?
 The budget line is about 30 000$ - 40 000$.

Have you optimized the queries?

Items that generally have the biggest impact on performance in
decreasing order:
1. System architecture
2. Database design
3. (for long-running/problem queries) Query plans
4. Disk I/O
5. Memory
6. CPU

So, I'd make sure that the queries have been optimized (and that
includes tuning postgresql.conf) before assuming you need more hardware.

Based on what you've told us (very little parallelization), then your
biggest priority is probably either disk IO or memory (or both). Without
knowing the size of your database/working set it's difficult to provide
more specific advice.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread Vivek Khera


On Dec 20, 2005, at 1:27 PM, Antal Attila wrote:


The budget line is about 30 000$ - 40 000$.


Like Jim said, without more specifics it is hard to give more  
specific recommendations, but I'm architecting something like this  
for my current app which needs ~100GB disk space.  I made room to  
grow in my configuration:


dual opteron 2.2GHz
4GB RAM
LSI MegaRAID 320-2X
14-disk SCSI U320 enclosure with 15k RPM drives, 7 connected to each  
channel on the RAID.

  1 pair in RAID1 mirror for OS + pg_xlog
  rest in RAID10 with each mirrored pair coming from opposite SCSI  
channels for data


I run FreeBSD but whatever you prefer should be sufficient if it is  
not windows.


I don't know how prices are in Hungary, but around here something  
like this with 36GB drives comes to around $11,000 or $12,000.


The place I concentrate on is the disk I/O bandwidth which is why I  
prefer Opteron over Intel XEON.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Any way to optimize GROUP BY queries?

2005-12-20 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Mon, Dec 19, 2005 at 03:47:35PM -0500, Greg Stark wrote:
 Increase your work_mem (or sort_mem in older postgres versions), you can do
 this for the server as a whole or just for this one session and set it back
 after this one query. You can increase it up until it starts causing swapping
 at which point it would be counter productive.

 Just remember that work_memory is per-operation, so it's easy to push
 the box into swapping if the workload increases. You didn't say how much
 memory you have, but I'd be careful if work_memory * max_connections
 gets very much larger than your total memory.

It's considered good practice to have a relatively small default
work_mem setting (in postgresql.conf), and then let individual sessions
push up the value locally with SET work_mem if they are going to
execute queries that need it.  This works well as long as you only have
one or a few such heavy sessions at a time.
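
For example, from JDBC that per-session override is just another statement
(a sketch; the value and query are placeholders, and on 8.1 the setting is
given in kilobytes):

  Statement st = con.createStatement();
  st.execute("SET work_mem = 65536");   // 64MB for this session only
  ResultSet rs = st.executeQuery(
      "SELECT customer_id, sum(amount) FROM sales GROUP BY customer_id");
  // ... process the result ...
  rs.close();
  st.execute("RESET work_mem");         // back to the postgresql.conf default
  st.close();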

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread Juan Casero
Can you elaborate on the reasons the opteron is better than the Xeon when it 
comes to disk io?   I have a PostgreSQL 7.4.8 box running a DSS.   One of our 
tables is about 13 million rows.   I had a number of queries against this 
table that used inner joins on 5 or 6 tables including the 13 million row 
one.  The performance was atrocious.  The database itself is about 20 gigs 
but I want it to scale to 100 gigs.  I tuned postgresql as best I could and 
gave the server huge amounts of memory for caching as well.  I also tweaked 
the cost parameters for a sequential scan vs an index scan of the query 
optimizer and used the query explain mechanism to get some idea of what the 
optimizer was doing and where I should index the tables.  When I added the 
sixth table to the inner join the query performance took a nose dive.  
Admittedly this system is a single PIII 1000Mhz with 1.2 gigs of ram and no 
raid.  I do have two Ultra 160 scsi drives with the database tables mount 
point on a partition on one physical drive and pg_xlog mount point on another 
partition of the second drive.I have been trying to get my employer to 
spring for new hardware ($8k to $10k) which I had planned to be a dual - dual 
core opteron system from HP.  Until they agree to spend the money I resorted 
to writing plpgsql functions to handle the queries.  Inside plpgsql I can 
break the query apart into seperate stages each of which runs much faster.  I 
can use temporary tables to store intermediate results without worrying about 
temp table collisions with different users thanks to transaction isolation.
I am convinced we need new hardware to scale this application *but* I agree 
with the consensus voiced here that it is more important to optimize the 
query first before going out to buy new hardware.   I was able to do things 
with PostgreSQL on this cheap server that I could never imagine doing with 
SQL server or even oracle on such a low end box.  My OS is Fedora Core 3 but 
I wonder if anyone has tested and benchmarked PostgreSQL on the new Sun x64 
servers running Solaris 10 x86.

Thanks,
Juan

On Tuesday 20 December 2005 16:08, Vivek Khera wrote:
 On Dec 20, 2005, at 1:27 PM, Antal Attila wrote:
  The budget line is about 30 000$ - 40 000$.

 Like Jim said, without more specifics it is hard to give more
 specific recommendations, but I'm architecting something like this
 for my current app which needs ~100GB disk space.  I made room to
 grow in my configuration:

 dual opteron 2.2GHz
 4GB RAM
 LSI MegaRAID 320-2X
 14-disk SCSI U320 enclosure with 15k RPM drives, 7 connected to each
 channel on the RAID.
1 pair in RAID1 mirror for OS + pg_xlog
rest in RAID10 with each mirrored pair coming from opposite SCSI
 channels for data

 I run FreeBSD but whatever you prefer should be sufficient if it is
 not windows.

 I don't know how prices are in Hungary, but around here something
 like this with 36GB drives comes to around $11,000 or $12,000.

 The place I concentrate on is the disk I/O bandwidth which is why I
 prefer Opteron over Intel XEON.


 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread David Lang

On Tue, 20 Dec 2005, Juan Casero wrote:


Date: Tue, 20 Dec 2005 19:50:47 -0500
From: Juan Casero [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

Can you elaborate on the reasons the opteron is better than the Xeon when it
comes to disk io?


the opteron is cheaper so you have more money to spend on disks :-)

Also, when you go into multi-CPU systems the front-side-bus design of the
Xeons can easily become your system bottleneck, so that you can't take
advantage of all the CPUs because they stall waiting for memory accesses.
Opteron systems have a memory bus per socket, so the more CPUs you have
the more memory bandwidth you have.




The database itself is about 20 gigs
but I want it to scale to 100 gigs.


How large is the working set? In your tests you ran into swapping on your
1.2G system; buying a dual Opteron with 16 gigs of RAM will allow you to
work with much larger sets of data, and you can go beyond that if needed.


David Lang

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] effizient query with jdbc

2005-12-20 Thread Johannes Bühler
Hi,
I have a java.util.List of values (1) which I wanted to use for a
query in the where clause of a simple select statement. Iterating
over the list and using a PreparedStatement is quite slow. Is
there a more efficient way to execute such a query?  Thanks for any
help.  Johannes
 . 
List ids = new ArrayList();

// List is filled with 1 values ...

List uuids = new ArrayList();
PreparedStatement pstat = db.prepareStatement(
    "SELECT UUID FROM MDM.KEYWORDS_INFO WHERE KEYWORDS_ID = ?");
for (Iterator iter = ids.iterator(); iter.hasNext();) {
    String id = (String) iter.next();
    pstat.setString(1, id);
    ResultSet rs = pstat.executeQuery();
    if (rs.next()) {
        uuids.add(rs.getString(1));
    }
    rs.close();
}
...
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Speed of different procedural language

2005-12-20 Thread Ben Trewern
I have a few small functions which I need to write.  They will be hopefully 
quick running but will happen on almost every delete, insert and update on 
my database (for audit purposes).

I know I should be writing these in C but that's a bit beyond me.  I was 
going to try PL/Python or PL/Perl or even PL/Ruby.  Has anyone any idea 
which language is fastest, or is the data access going to swamp the overhead 
of small functions?

Thanks,

Ben 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Simple Join

2005-12-20 Thread Mitchell Skinner
On Wed, 2005-12-14 at 17:47 -0500, Tom Lane wrote:
 That plan looks perfectly fine to me.  You could try forcing some other
 choices by fooling with the planner enable switches (eg set
 enable_seqscan = off) but I doubt you'll find much improvement.  There
 are too many rows being pulled from ordered_products to make an index
 nestloop a good idea.

Well, I'm no expert either, but if there was an index on
ordered_products (paid, suspended_sub, id) it should be mergejoinable
with the index on to_ship.ordered_product_id, right?  Given the
conditions on paid and suspended_sub.

If you (Kevin) try adding such an index, ideally it would get used given
that you're only pulling out a small fraction of the rows in to_ship.
If it doesn't get used, then I had a similar issue with 8.0.3 where an
index that was mergejoinable (only because of the restrictions in the
where clause) wasn't getting picked up.

Mitch

Kevin Brown wrote:
 CREATE TABLE to_ship
 (
   id int8 NOT NULL DEFAULT nextval(('to_ship_seq'::text)::regclass),
   ordered_product_id int8 NOT NULL,
   bounced int4 NOT NULL DEFAULT 0,
   operator_id varchar(20) NOT NULL,
   timestamp timestamptz NOT NULL DEFAULT ('now'::text)::timestamp(6)
 with 
 time zone,
   CONSTRAINT to_ship_pkey PRIMARY KEY (id),
   CONSTRAINT to_ship_ordered_product_id_fkey FOREIGN KEY
 (ordered_product_id) 
 REFERENCES ordered_products (id) ON UPDATE RESTRICT ON DELETE RESTRICT
 ) 
 WITHOUT OIDS;
 
 CREATE TABLE ordered_products
 (
   id int8 NOT NULL DEFAULT
 nextval(('ordered_products_seq'::text)::regclass),
   order_id int8 NOT NULL,
   product_id int8 NOT NULL,
   recipient_address_id int8 NOT NULL,
   hide bool NOT NULL DEFAULT false,
   renewal bool NOT NULL DEFAULT false,
   timestamp timestamptz NOT NULL DEFAULT ('now'::text)::timestamp(6)
 with 
 time zone,
   operator_id varchar(20) NOT NULL,
   suspended_sub bool NOT NULL DEFAULT false,
   quantity int4 NOT NULL DEFAULT 1,
   price_paid numeric NOT NULL,
   tax_paid numeric NOT NULL DEFAULT 0,
   shipping_paid numeric NOT NULL DEFAULT 0,
   remaining_issue_obligation int4 NOT NULL DEFAULT 0,
   parent_product_id int8,
   delivery_method_id int8 NOT NULL,
   paid bool NOT NULL DEFAULT false,
   CONSTRAINT ordered_products_pkey PRIMARY KEY (id),
   CONSTRAINT ordered_products_order_id_fkey FOREIGN KEY (order_id)
 REFERENCES 
 orders (id) ON UPDATE RESTRICT ON DELETE RESTRICT,
   CONSTRAINT ordered_products_parent_product_id_fkey FOREIGN KEY 
 (parent_product_id) REFERENCES ordered_products (id) ON UPDATE
 RESTRICT ON 
 DELETE RESTRICT,
   CONSTRAINT ordered_products_recipient_address_id_fkey FOREIGN KEY 
 (recipient_address_id) REFERENCES addresses (id) ON UPDATE RESTRICT ON
 DELETE 
 RESTRICT
 ) 
 WITHOUT OIDS;
 
 === The two indexes that should matter ===
 CREATE INDEX ordered_product_id_index
   ON to_ship
   USING btree
   (ordered_product_id);
 
 CREATE INDEX paid_index
   ON ordered_products
   USING btree
   (paid);
 
 ordered_products.id is a primary key, so it should have an implicit
 index.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] SAN/NAS options

2005-12-20 Thread Charles Sprickman

Hello all,

It seems that I'm starting to outgrow our current Postgres setup.  We've 
been running a handful of machines as standalone db servers.  This is all 
in a colocation environment, so everything is stuffed into 1U Supermicro 
boxes.  Our standard build looks like this:


Supermicro 1U w/SCA backplane and 4 bays
2x2.8 GHz Xeons
Adaptec 2015S zero channel RAID card
2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
2GB RAM
FreeBSD 4.11
PGSQL data from 5-10GB per box

Recently I started studying what we were running up against in our nightly 
runs that do a ton of updates/inserts to prep things for the tasks the db 
does during the business day (light mix of selects/inserts/updates). 
While we have plenty of disk bandwidth (according to bonnie), we are 
really dying on IOPS.  I'm guessing this is a mix of a rather anemic RAID 
controller (ever notice how adaptec doesn't publish any real 
performance specs on raid cards?) and having only two or four spindles 
(effectively 1 or 2 on writes).


So that's where we are...

I'm new to the whole SAN thing, but did recently pick up a few used NetApp 
shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) 
to toy with.  I started wondering if I could put something together to 
both get our storage on one set of boxes and allow me to get data striped 
across more drives.  Our budget is not huge and we are not averse to 
getting used gear where appropriate.


What do you folks recommend?  I'm just starting to look at what's out 
there for SANs and NAS, and from what I've seen, our options are:


NetApp Filers - the pluses with these are that if we use NFS, we don't 
have to worry about either large filesystem support in FreeBSD (2TB 
practical limit), or limitation on growing partitions as the NetApp just 
deals with that.  I also understand these make backups a bit simpler.  I 
have a great, trusted, spare-stocking source for these.


Apple X-Serve RAID - well, it's pretty cheap.  Honestly, that's all I know 
about it - they don't talk about IOPS numbers, and I have no idea what 
lurks in that box as a RAID controller.


SAN box w/integrated RAID - it seems like this might not be a good choice 
since the RAID hardware in the box may be where I hit any limits.  I also 
imagine I'm probably overpaying for some OEM RAID controller integrated 
into the box.  No idea where to look for used gear.


SAN box, JBOD - this seems like it might be affordable as well.  A few big 
shelves full of drives a SAN switch to plug all the shelves and hosts 
into and a FC RAID card in each host.  No idea where to look for used gear 
here either.


You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. 
Unlike Solaris or other commercial offerings, there is no nice volume 
management available.  While I'd love to keep managing a dozen or so 
FreeBSD boxes, I could be persuaded to go to Solaris x86 if the volume 
management really shines and Postgres performs well on it.


Lastly, one thing that I'm not yet finding in trying to educate myself on 
SANs is a good overview of what's come out in the past few years that's 
more affordable than the old big-iron stuff.  For example I saw some brief 
info on this list's archives about the Dell/EMC offerings.  Anything else 
in that vein to look at?


I hope this isn't too far off topic for this list.  Postgres is the 
main application that I'm looking to accommodate.  Anything else I can do 
with whatever solution we find is just gravy...


Thanks!

Charles


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [postgis-users] Is my query planner failing me,or vice versa?

2005-12-20 Thread Mark Cave-Ayland
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Wed 12/14/2005 9:36 PM
 To:   Gregory S. Williamson
 Cc:   pgsql-performance@postgresql.org; PostGIS Users Discussion
 Subject:  Re: [PERFORM] [postgis-users] Is my query planner failing
me,
 or vice versa?
 Gregory S. Williamson [EMAIL PROTECTED] writes:
  Forgive the cross-posting, but I found myself wondering if might not
  be some way future way of telling the planner that a given table
  (column ?) has a high likelyhood of being TOASTed.
 
 What would you expect the planner to do with the information, exactly?
 
 We could certainly cause ANALYZE to record some estimate of this, but
 I'm not too clear on what happens after that...
 
   regards, tom lane


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:postgis-users-
 [EMAIL PROTECTED] On Behalf Of Gregory S. Williamson
 Sent: 15 December 2005 12:03
 To: Tom Lane
 Cc: pgsql-performance@postgresql.org; PostGIS Users Discussion
 Subject: RE: [PERFORM] [postgis-users] Is my query planner failing me,or
 vice versa?
 
 Well, what does the random_page_cost do internally ?
 
 I don't think I'd expect postgres to be able to *do* anything in
 particular, any more than I would expect it to do something about slow
 disk I/O or having limited cache. But it might be useful to the EXPLAIN
 ANALYZE in estimating costs of retrieving such data.
 
 Admittedly, this is not as clear as wanting a sequential scan in
 preference to indexed reads when there are either very few rows or a huge
 number, but it strikes me as useful to me the DBA to have this factoid
 thrust in front of me when considering why a given query is slower than I
 might like. Perhaps an added time based on this factor and the
 random_page_cost value, since lots of TOAST data and a high access time
 would indicate to my (ignorant!) mind that retrieval would be slower,
 especially over large data sets.
 
 Forgive my ignorance ... obviously I am but a humble user. grin.
 
 G


As I understood from the original discussions with Markus/Tom, the problem
was that the optimizer didn't consider the value of the VacAttrStats
stawidth value when calculating the cost of a sequential scan. I don't know
if this is still the case though - Tom will probably have a rough idea
already whereas I would need to spend some time sifting through the source.

However, I do know that the PostGIS statistics collector does store the
average detoasted geometry size in stawidth during ANALYZE so the value is
there if it can be used.


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] SAN/NAS options

2005-12-20 Thread Charles Sprickman

On Wed, 14 Dec 2005, Charles Sprickman wrote:

[big snip]

The list server seems to be regurgitating old stuff, and in doing so it 
reminded me to thank everyone for their input.  I was kind of waiting to 
see if anyone who was very pro-NAS/SAN was going to pipe up, but it looks 
like most people are content with per-host storage.


You've given me a lot to go on...  Now I'm going to have to do some 
research as to real-world RAID controller performance.  It's vexing (to 
say the least) that most vendors don't supply any raw throughput or TPS 
stats on this stuff...


Anyhow, thanks again.  You'll probably see me back here in the coming 
months as I try to shake some mysql info out of my brain as our pgsql DBA 
gets me up to speed on pgsql and what specifically he's doing to stress 
things.


Charles

I hope this isn't too far off topic for this list.  Postgres is the main 
application that I'm looking to accomodate.  Anything else I can do with 
whatever solution we find is just gravy...


Thanks!

Charles


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Simple Join

2005-12-20 Thread Mark Kirkwood

Mark Kirkwood wrote:

Kevin Brown wrote:


I'll just start by warning that I'm new-ish to postgresql.

I'm running 8.1 installed from source on a Debian Sarge server.  I 
have a simple query that I believe I've placed the indexes correctly 
for, and I still end up with a seq scan.  It makes sense, kinda, but 
it should be able to use the index to gather the right values.  I do 
have a production set of data inserted into the tables, so this is 
running realistically:


dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship, ordered_products
dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND
dli-# ordered_products.paid = TRUE AND
dli-# ordered_products.suspended_sub = FALSE;



You scan 60 rows from to_ship to get about 25000 - so some way to 
cut this down would help.


Try out an explicit INNER JOIN which includes the filter info for paid 
and suspended_sub in the join condition (you may need indexes on each of 
id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap 
scan):



SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
ON (to_ship.ordered_product_id = ordered_products.id
AND ordered_products.paid = TRUE  AND 
ordered_products.suspended_sub = FALSE);



It has been a quiet day today, so I took another look at this. If the 
selectivity of clauses :


paid = TRUE
suspended_sub = FALSE

is fairly high, then rewriting as a subquery might help:

SELECT o.product_id
FROM ordered_products o
WHERE o.paid = TRUE
AND o.suspended_sub = FALSE
AND  EXISTS (
  SELECT 1
  FROM to_ship s
  WHERE s.ordered_product_id = o.id
);


However it depends on you not needing anything from to_ship in the 
SELECT list...


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match