Re: [HACKERS] sun blade 1000 donation

2009-05-28 Thread Jignesh K. Shah



On 05/27/09 22:00, Josh Berkus wrote:

Andy,


I have a Sun blade 1000 that's just collecting dust now days.  I was
wondering if there were any pg-hackers that could find use for it.

It's a dual UltraSPARC III 750 (I think) and has two 36? GB Fibre Channel
SCSI disks.

It weighs a ton.

I'd be happy to donate it to a good cause.


Feh, as much as we need more servers, we're really limited in our 
ability to accept stuff that is large and has high power consumption.


Now, if we had a DSL line we could hook it to, I could see using it 
for the buildfarm; it would be interesting old HW / old Solaris for us.




Actually, I think you can use the cutting-edge OpenSolaris 2009.06 release 
(which will happen in less than a week) for SPARC on that hardware. I 
haven't tried it out on a Sun Blade 1000/2000 yet, but in theory you can. 
Refer to the following thread:


http://mail.opensolaris.org/pipermail/indiana-discuss/2009-February/014134.html

Though you will need an Automated Installer setup to install OpenSolaris 
on SPARC:

http://dlc.sun.com/osol/docs/content/dev/AIinstall/index.html


Regards,
Jignesh


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Jignesh K. Shah



Greg Sabino Mullane wrote:



  

While most cases were dead even or a modest improvement, his dbt-2 results
suggest a 15-20% regression in 8.4.  Changing the default_statistics_taget
to 100 was responsible for about 80% of that regression.


...
  

The situation where the stats target being so low hurts things the most
is the data warehouse use case.



Er...so why should we change our defaults to support data warehousing
users? Certainly the rest of the postgresql.conf settings don't attempt
to do that.

  


The test that was greatly impacted is DBT-2 (an OLTP benchmark) with the 
8.4 defaults; it seems to work fine/better when reverted to the 8.3 
defaults. The 8.4 defaults seemed to improve DBT-3 (data warehousing), 
though I haven't retested it with the 8.3 defaults on 8.4. Of course I 
am not a big fan of DBT-2 myself; I am just providing data points from 
what I observed during my testing with various workloads. I certainly 
don't claim to understand what is happening (yet).
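
For anyone reproducing the comparison between the 8.3 and 8.4 defaults, they can be overridden without recompiling; a minimal sketch (the table and column names here are hypothetical):

-- revert to the 8.3-era default for the current session (8.4 raised it to 100)
SET default_statistics_target = 10;

-- or raise the target only for a column that needs finer histograms, then re-analyze
ALTER TABLE orders ALTER COLUMN order_date SET STATISTICS 100;
ANALYZE orders;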


-Jignesh


The bump from 10 to 100 was supported by microbenchmarks that suggested it
would be tolerable.



No, the 10 to 100 was supported by years of people working in the field who
routinely did that adjustment (and 100) and saw great gains. Also, as the one
who originally started the push to 100, my original goal was to get it over the
magic 99 bump, at which the planner started acting very differently. This
caused a huge performance regression in one of the Postgres releases (don't
remember which one exactly), which severely impacted one of our large clients.

  

That doesn't seem to be reality here though, and it's questionable whether
this change really helps the people who need to fool with the value the most.



The goal of defaults is not to help people who fool with the value - it's to
get a good default out of the box for people who *don't* fool with all the
values. :)

  

But unless someone has some compelling evidence to the contrary, it looks like
the stats target needs to go back to a lower value.



Please don't. This is a very good change, and I don't see why changing it back
because it might hurt people doing DW is a good thing, when most of users are
not doing DW.

  

As for the change to constraint_exclusion, the regression impact there is
much less severe and the downside of getting it wrong is pretty bad.



Similarly, the people who are affected by something like this presumably are not
running a default postgresql.conf anyway, so they can toggle it back to squeeze
a little more performance out of their system.

- --
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200905221239
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


  


--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing Read-Only Scalability

2009-05-16 Thread Jignesh K. Shah



Simon Riggs wrote:

On Thu, 2009-05-14 at 16:21 -0700, Josh Berkus wrote:

  

So we can optimize away the scan through the procarray by doing two if
tests, one outside of the lock, one inside. In normal running, both will
be optimized away, though in read-only periods we would avoid much work.
  

How much work would it be to work up a test patch?



Not much. The most important thing is a place to test it and access to
detailed feedback. Let's see if Dimitri does this.

There are some other tuning aspects to be got right first also, but
those are already known.

  
I would be interested in testing it out. I have been collecting some 
sysbench read-scalability numbers, plus some other numbers that I can cook 
up with dbt3 and igen, so I have a frame of reference for those numbers. 
I am sure we can always use some extra performance.


Regards,
Jignesh

--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Closing some 8.4 open items

2009-04-09 Thread Jignesh K. Shah



On 04/08/09 13:10, Josh Berkus wrote:

On 4/8/09 9:44 AM, Tom Lane wrote:

Josh Berkusj...@agliodbs.com  writes:

What about seq scans?


If the kernel can't read-ahead a seqscan by itself, it's unlikely to
be smart enough to be helped by posix_fadvise ... or at least so I
would think.  Do you have reason to think differently?


Well, Solaris 10 + UFS should be helped by fadvise -- in theory at 
least, it would eliminate the need to modify your mount points for 
better readahead when setting up a PG-on-Solaris server.  Solaris UFS 
is quite lazy about readahead.  Zdenek, Jignesh?


Definitely, this helps, especially since forcedirectio hurts CLOG but 
helps WAL; something that can be done without impacting the whole file 
system always helps.

Solaris by default only does readahead up to 56K, and the maximum tunable 
value is 1MB. If you use forcedirectio there is no readahead by the 
filesystem itself.

ZFS is different: it has no forcedirectio, and hence the fadvise flag is 
ignored for now.


Regards,
Jignesh


You're probably correct about Linux and FreeBSD.  I don't know if OSX 
+ HFS supports fadvise.  If so, it could only help; readahead on HFS 
right now is nonexistent.


Presumably fadvise is useless on Windows.  Anyone know?




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-11 Thread Jignesh K. Shah



Josh Berkus wrote:

Heikki,

Ideally, of course, there would be no wal_buffers setting, and WAL 
buffers would be allocated from shared_buffers pool on demand...




+1

--Josh



+1

-Jignesh


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: New LWLockmode LW_OWNER

2008-06-06 Thread Jignesh K. Shah


Currently there are two modes of LWLock: SHARED and EXCLUSIVE.
Mostly you need EXCLUSIVE mode to make any changes (add, delete) and 
SHARED if you are just reading. Multiple backends can hold SHARED mode 
simultaneously, while only one backend can hold EXCLUSIVE at a time. 
There are situations, though, where certain values are typically changed 
only by the backend that owns them; when multiple backends each want to 
change the values that they themselves own, there is no lock mode that 
allows those updates to proceed simultaneously, and the result is that 
changes which could safely be done in parallel are serialized.


New lock mode proposed: LW_EX_OWNER (input on a better name will be 
appreciated).


So now there will be three modes: SHARED, EXCLUSIVE, and EX_OWNER.
They are still all mutually exclusive with each other, in the sense that 
at any given time only one mode can be active. However, there is a marked 
difference: while the number of SHARED holders can be 0..N and EXCLUSIVE 
holders 0..1, the new EX_OWNER mode can again be held by 0..N backends.


This is primarily important so that we can carry out updates that will 
not be modified by any other backend. Protection is still guaranteed, 
since existing code will keep taking the EXCLUSIVE lock and so will work 
as before. The advantage is for certain operations where we can allow 
other backends to keep writing into their own areas: the EX_OWNER mode 
admits multiple backends into the shared area, each touching only what it 
owns. The area itself is not protected by the lock mode but by the 
procedures themselves; the lock framework need not worry about that.


I wrote a prototype, which needs changes in lwlock.h and lwlock.c. The 
modified lwlock.c still wakes all SHARED waiters together; if the first 
waiter wants EX_OWNER it wakes all EX_OWNER waiters together, and if it 
wants EXCLUSIVE it wakes just that one EXCLUSIVE waiter.



The potential for this new lock mode can be seen in various scenarios, 
though each will need a separate proposal on how to handle it, since all 
the bases are not covered yet. Just as examples:
1. Proc array structure: many times a specific proc array entry is 
modified only by the backend that owns it.
2. WAL buffers: one way to make wal_buffers scalable is to define areas 
and have certain backends go against those areas rather than take the 
whole buffer lock. EXCLUSIVE is still used for most operations except 
certain identified parts.

3. Many others not identified yet.


Right now this proposal is only for the new lock mode, since that is a 
lock framework enhancement which can give rise to multiple uses later.


Regards,
Jignesh


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: New LWLockmode LW_OWNER

2008-06-06 Thread Jignesh K. Shah



Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
New Lock Mode Proposed: LW_EX_OWNER  (input on better name will be 
appreciated).



This seems rather crazy, and you haven't actually given a single
convincing use-case.  Shouldn't you be trying to break down a lock
into multiple locks instead of inventing new lock semantics that
nobody really understands?




One area where I find it useful is ProcArrayEndTransaction, which right 
now takes the lock in EXCLUSIVE mode to update the proc array. Since most 
committing transactions are updating their own proc array entry, this 
lock semantic could be useful there. However, I haven't figured out the 
last part, where it updates ShmemVariableCache->latestCompletedXid, which 
might still require an EXCLUSIVE lock, and hence I did not propose the 
use case.


http://doxygen.postgresql.org/procarray_8c-source.html#l00231

The whole concept of a single exclusive lock is, to me, more flawed than 
the proposed idea. Single exclusive locks are artificial bottlenecks in 
PostgreSQL, and that's why a new lock semantic is helpful: it opens 
people's minds beyond the exclusive lock, so more work gets done in 
parallel where possible, and LW_EX_OWNER allows that to work. That in 
fact may allow somebody else to innovate on ProcArrayEndTransaction and 
solve the problem I could not figure out regarding latestCompletedXid.



In a sample test where I put LW_EX_OWNER (LW_OWNER, as I have it in my 
code) into ProcArrayEndTransaction, the throughput of CVS HEAD went from 
200,000 transactions per minute to 300,000 transactions per minute, 
though in my case the handling of latestCompletedXid is unsafe. If that 
is solved, there is a potential upswing in scalability in the PostgreSQL 
core.


Once that function is implemented correctly, it will highlight other 
places where such lock semantics could prove useful (my money is on 
WALInsertLock).



-Jignesh




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: New LWLockmode LW_OWNER

2008-06-06 Thread Jignesh K. Shah



Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

Tom Lane wrote:


This seems rather crazy, and you haven't actually given a single
convincing use-case.
  


  
One area that I find it useful is where it will be useful is in 
ProcArrayEndTransaction where it uses exclusive to update  proc array 
structure where right now it uses EXCLUSIVE and most commit transactions 
are updating their own proc array structure this lock semantic can be 
useful..



That is exactly a place where you CAN'T use this, because it will break
transactional semantics, specifically serialization of commits relative
to snapshots.  See all the discussions around the last refactoring of
ProcArray locking, and particularly the summary in
src/backend/access/transam/README.


  


Quoting from the README that you mentioned:

Formally, the correctness requirement is if a snapshot A considers 
transaction X as committed, and any of transaction X's snapshots 
considered transaction Y as committed, then snapshot A must consider 
transaction Y as committed.


What we actually enforce is strict serialization of commits and 
rollbacks with snapshot-taking: we do not allow any transaction to exit 
the set of running transactions while a snapshot is being taken.  (This 
rule is stronger than necessary for consistency, but is relatively 
simple to enforce, and it assists with some other issues as explained 
below.)  The implementation of this is that GetSnapshotData takes the 
ProcArrayLock in shared mode (so that multiple backends can take 
snapshots in parallel), but ProcArrayEndTransaction must take the 
ProcArrayLock in exclusive mode while clearing MyProc->xid at 
transaction end (either commit or abort).


ProcArrayEndTransaction also holds the lock while advancing the shared 
latestCompletedXid variable.  This allows GetSnapshotData to use 
latestCompletedXid + 1 as xmax for its snapshot: there can be no 
transaction >= this xid value that the snapshot needs to consider as 
completed.


Quote End

What I understand is that the serialization rule comes into play when a 
snapshot is being taken, and a snapshot is taken only after the SHARED 
lock has been acquired. If EX_OWNER is used in this scenario, it still 
works as designed: if EX_OWNER has been acquired, NOBODY can get the 
SHARED lock, which means the rule is still satisfied. Of course I am 
still worried about latestCompletedXid being written at the same time 
(maybe use some atomic compare-and-swap function to solve that problem), 
but I don't understand why you think this is a place where it cannot be 
used?


Simon, scalability increases 50% from what I see (throughput goes from 
200k tpm to 300k tpm, and system utilization goes from 50% to 80% or so), 
which is a step forward in overall performance and system utilization. 
Response time under high load also falls drastically, to something close 
to the low-load response times.



Regards,
Jignesh









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Jignesh K. Shah



Simon Riggs wrote:


Some other problems I see with GUCs

* It's not possible to set one parameter depending upon the setting of
another.
  


To me this is more critical. Most people I have seen will increase one 
or a few, but not all, of the memory-related parameters, which can result 
in lost performance and lost productivity in figuring out why.


What happened to the idea of an AvailRAM setting, with all memory GUCs 
based on that? Ideally PostgreSQL should create one big memory pool and 
allow all the other variables to be changed at runtime by the DBA, a 
tuner process, or a customized application, as long as the total stays 
below the allocated shared-memory and local-memory settings. (This would 
also reduce the need to restart Postgres when a value needs to be changed.)
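
As a rough illustration of the "one or a few but not all" problem, the memory-related knobs that usually need to move together can at least be inspected in one place (just a sketch; the exact list will vary):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'wal_buffers', 'effective_cache_size');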




-Jignesh


* It's always unclear which GUCs can be changed, and when. That is much
more infrequently understood than the meaning of them.

* We should rename effective_cache_size to something that doesn't sound
like it does what shared_buffers does

* There is no config verification utility, so if you make a change and
then try to restart and it won't, you are in trouble.

  


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work

2008-05-28 Thread Jignesh K. Shah

Are there any fixes for HEAD proposed for this?

I am seeing some scaling problems with EAStress, which uses JDBC, on 
8.3.0, and this could be the reason for some of them. I will be happy to 
try it out and report on it. The setup is ready right now if someone can 
point me to a patch that I can try, to see whether it fixes my problem.


-Jignesh


Dave Cramer wrote:

It's pretty easy to test.

prepare the query
and
run explain analyze on the prepared statement.

Dave
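
A minimal sketch of that test in psql (the prepared statement itself is hypothetical):

PREPARE q(integer) AS SELECT * FROM accounts WHERE aid = $1;
EXPLAIN ANALYZE EXECUTE q(1);
DEALLOCATE q;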
On 10-Apr-08, at 5:47 AM, Thomas Burdairon wrote:


Is there any patch available for this one?

I'm encountering troubles with some JDBC queries and I'd like to test 
it before asking some help on the JDBC list.


Thanks.
Tom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work

2008-05-28 Thread Jignesh K. Shah



Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

Are there any head fixes proposed for it?



It's been fixed in CVS for a month.  We just haven't pushed a release yet.
  


Let me try it out and see what I find out in my EAStress workload.

Regards,
Jignesh


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah

I don't think my earlier message got through.

We use separate lookup tables for 8.2.5 and 8.3, based on the respective 
lwlock.h for each version.



-Jignesh


Simon Riggs wrote:

On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
  

On Wed, 2008-02-06 at 19:55, Tom Lane wrote:


I am wondering if the waits are being
attributed to the right locks --- I remember such an error in a previous
set of dtrace results, and some of the other details such as claiming
shared lock delays but no exclusive lock delays for FirstLockMgrLock
seem less than credible as well.
  

Good catch. We've checked the DTrace scripts against the respective
versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
the results for FirstBufMappingLock). 



I just realised you are using a lookup to get the text for the name of
the lock. You used the same lookup table for both releases?

  


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah
Last try for the script/results (truncating less significant portions of 
output which are too big)




Staale Smedseng wrote:


her locks should have been output correctly, however.

But as Tom pointed out, the dynamic locks were not in the equation. So
now we're measuring all lock waits instead of assuming. :-)

Staale


  



Here are some recent runs on 4 quad-core Xeons using PostgreSQL 8.3.0, 
with 10-second runs.


Hope it gets through this time

-Jignesh

For about 500 users:
  Lock Id        Mode           Count
       64        Shared           754
       53        Exclusive        846
       43        Exclusive       1379
       11        Shared          2038
        3        Exclusive       2376
       47        Exclusive       2900
        4        Shared          3194
       42        Exclusive       3389
       50        Exclusive       3689
       41        Exclusive       5280
       11        Exclusive       6560
       48        Exclusive      10837
        7        Exclusive      20599
       44        Exclusive      29193
       46        Exclusive      30299
        4        Exclusive      66303

  Lock Id        Mode           Combined Time (ns)
   43   Exclusive   3440644758
   47   Exclusive   6392480847
   50   Exclusive   7199215298
   42   Exclusive   7865126413
   11   Exclusive   8503940437
4  Shared   9142071412
   61  Shared  10613733376
   41   Exclusive  14380504651
   48   Exclusive  16631996177
7   Exclusive  17485716438
   46   Exclusive  61584135053
   44   Exclusive  68889040292
4   Exclusive 177458486936



For about 700 Users:


  Lock Id        Mode           Count
        3        Exclusive       2873
        4        Shared          3436
       47        Exclusive       3555
       11        Shared          4098
       50        Exclusive       4392
       42        Exclusive       4573
       11        Exclusive       6209
       41        Exclusive       7552
       48        Exclusive      12335
        7        Exclusive      22469
       44        Exclusive      36987
       46        Exclusive      38467
        4        Exclusive      81453

  Lock Id        Mode           Combined Time (ns)
7   Exclusive  12033761450
   43   Exclusive  12217953092
   50   Exclusive  15454797539
   47   Exclusive  15684068953
   42   Exclusive  19365161311
4  Shared  27038955376
   48   Exclusive  32990318986
   41   Exclusive  48350689916
   46   Exclusive 193727946721
   44   Exclusive 212025745038
4   Exclusive 713263386624

At 1000 users


  Lock Id        Mode           Count
   59  Shared   10475
   58  Shared   11062
   57  Shared   13726
7   Exclusive   18548
   44   Exclusive   29714
   46   Exclusive   33886
4   Exclusive   74773
   11   Exclusive   79596

  Lock Id        Mode           Combined Time (ns)
   43   Exclusive   9067359926
   47   Exclusive  12259067669
   50   Exclusive  13239372833
   42   Exclusive  16534292830
7   Exclusive  23505490039
   48   Exclusive  24991948402
   41   Exclusive  33874749560
   11  Shared  43963854482
4  Shared  64098606143
   63  Shared 130988696365
   64  Shared 137865936811
   61  Shared 140978086498
   59  Shared 161661023016
   62  Shared 163857754020
   58  Shared 16794620
   44   Exclusive 220719773416
   57  Shared 245170386594
   46   Exclusive 

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah



Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:
  

This is a tangent but are these actual Postgres processes? What's the logic
behind trying to run a 1,000 processes on a box with 16 cpus?



We should certainly be careful about trying to eliminate contention in
this scenario at the cost of making things slower in more normal cases,
but it seems interesting to stress the system just to see what happens.

  

Was this with your patch to raise the size of the clog lru?



That's an important question.

  

What is MaxBackends actually set to for the runs.



That I think is not.  I'm fairly sure there are no performance-relevant
paths in which cost is driven by MaxBackends rather than the actual
current number of live backends.  Certainly nothing in or around the
ProcArray would act that way.

  
			regards, tom lane
  



I guess I was not clear: it was PostgreSQL 8.3.0 (with no source code 
changes).

I had compiled it 64-bit with DTrace enabled.
MaxBackends was set to 1500, but I don't think that causes anything to 
work slowly. The connections are pre-opened, though, in the sense that 
when 500 users are actively doing work there are about 1006 postgres 
processes running.

Yes, I think I am taking the database to the extreme. But generally there 
is some think time of 50ms involved, so there are time slices available 
for other users. Commercial databases can do pretty well on such systems, 
so it is not unrealistic to expect PostgreSQL to perform here as well.

The whole idea of stress testing is to prove that it can go beyond 
these 16 cores; in fact our target is about 64 cores soon.


Regards,
Jignesh


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-15 Thread Jignesh K. Shah

Yes, I separate things out as follows:
PGDATA + 1 table which needs to be cached (also a workaround for the CLOG 
read problem)
LOGS
DATABASE TABLES
DATABASE INDEXES
to get a good view of the IOs.

I have full_page_writes=off in my settings.
I don't see spikes on WAL during checkpoints (maybe due to that setting), 
just a constant load in the range of about 2-2.5MB/sec, which is not low; 
but then my load is high.

In my current run I do have async WAL (asynchronous commit) on, with 
wal_writer_delay=100ms and commit_delay off.



-Jignesh

Heikki Linnakangas wrote:

Jignesh K. Shah wrote:
Since its really writes that I am having trouble.. the auto vacuum 
message tells me 11 pages were removed and so many tuples were 
removed..  I am guessing its writes.


Do you keep track of I/O to WAL and data separately? WAL bandwidth 
will spike up when a checkpoint starts, because of full page writes.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread Jignesh K. Shah


I was waiting to digest what I saw before sending it to the group.

I am running the EAStress workload.

I am using odata_sync, which should sync as soon as the data is written.

With checkpoint_completion_target=0.9 and checkpoint_timeout=5m it seems 
to be doing the right thing, judging from the logfile output:



2007-11-13 09:20:49.070 PST 9180  LOG:  checkpoint starting: time
2007-11-13 09:21:13.808 PST 9458  LOG:  automatic analyze of table 
specdb.public.o_orderline system usage: CPU 0.03s/0.50u sec elapsed 
7.79 sec
2007-11-13 09:21:19.830 PST 9458  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

   pages: 11 removed, 105 remain
   tuples: 3147 removed, 40 remain
   system usage: CPU 0.11s/0.09u sec elapsed 6.02 sec
2007-11-13 09:22:12.112 PST 9462  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

   pages: 28 removed, 77 remain
   tuples: 1990 removed, 95 remain
   system usage: CPU 0.11s/0.09u sec elapsed 5.98 sec
2007-11-13 09:23:12.121 PST 9466  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3178 removed, 128 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.87 sec
2007-11-13 09:24:12.220 PST 9470  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3394 removed, 57 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.85 sec
2007-11-13 09:25:12.400 PST 9474  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3137 removed, 1 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.93 sec
2007-11-13 09:25:18.723 PST 9180  LOG:  checkpoint complete: wrote 33362 
buffers (2.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=269.642 s, sync=0.003 s, total=269.653 s

2007-11-13 09:25:49.000 PST 9180  LOG:  checkpoint starting: time


However, actual iostat output still shows a non-uniform distribution, but 
I haven't put exact timestamps on the iostat outputs to correlate them 
with the logfile entries. Maybe I should do that.

So from the PostgreSQL view things are doing fine based on these outputs; 
I need to figure out the Solaris view of it now.


Could it be related to autovacuum happening also?


Regards,
Jignesh



Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

I will turn on checkpoint_logging to get more idea as Heikki suggested



Did you find out anything?

Did this happen on every checkpoint, or only some of them?  The bug
Itagaki-san pointed out today in IsCheckpointOnSchedule might account
for some checkpoints being done at full speed, but not all ...

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread Jignesh K. Shah

I don't understand vacuum a lot; I admit I am stupid. :-)

When you say scanned, do you mean reads or do you mean writes?

Since it's really writes that I am having trouble with: the autovacuum 
message tells me 11 pages were removed and so many tuples were removed, 
so I am guessing it's writes.

I can try vacuuming that table before the run starts to see if that 
avoids it.
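
A sketch of that manual pre-run vacuum, using the table name from the log excerpts in this thread:

VACUUM VERBOSE ANALYZE txn_log_table;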


-Jignesh




Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
So from the PostgreSQL view things are doing fine based on outputs: I 
need to figure out the Solaris view on it now.



  

Could it be related to autovacuum happening also?



Maybe ... have you tried fiddling with the vacuum_cost_delay options?

Looking at the autovacuum log output, 

  
2007-11-13 09:21:19.830 PST 9458  LOG:  automatic vacuum of table 
specdb.public.txn_log_table: index scans: 1

pages: 11 removed, 105 remain
tuples: 3147 removed, 40 remain
system usage: CPU 0.11s/0.09u sec elapsed 6.02 sec



it seems like a serious omission that this gives you no hint how many
pages were scanned.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-13 Thread Jignesh K. Shah

Hello,

I am running tests with PG 8.3b2 on Solaris 10 8/07 and I still see an IO 
flood when a checkpoint happens.
I have tried increasing bgwriter_lru_multiplier from the default of 2 to 
5, but I don't see any more writes by the bgwriter happening than in my 
previous test, which used the default.


Then I tried increasing checkpoint_completion_target to 0.9, but still no 
spreading of the IO (checkpoint_timeout is set to the default 5m).

What am I missing?

How does PostgreSQL determine the load distribution?
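
As a quick sanity check that those settings actually took effect after the reload, something like this can be run (just a sketch):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name LIKE 'checkpoint%' OR name LIKE 'bgwriter%';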

Regards,
Jignesh



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-13 Thread Jignesh K. Shah
I am running the EAStress workload, which doesn't do manual checkpoints 
as far as I know.

I will turn on log_checkpoints to get more information, as Heikki suggested.

thanks.

-Jignesh


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  
I am running tests with PG8.3b2 on Solaris 10 8/07 and I still see IO 
flood when checkpoint happens.



I am thinking that you are probably trying to test that by issuing
manual CHECKPOINT commands.  A manual checkpoint is still done at full
speed, as are shutdown checkpoints.  You need to study the behavior of
automatic (background) checkpoints, instead.

regards, tom lane
  


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
I agree with Tom: somehow I think increasing NUM_CLOG_BUFFERS just pushes 
the symptom out to a later value. I promise to look more into it before 
making any recommendation to increase NUM_CLOG_BUFFERS.

Though iGen showed improvements in that area from increasing 
NUM_CLOG_BUFFERS, EAStress had shown no improvement. Plus, the reason I 
think this is not the problem in 8.3beta1 is that the lock output clearly 
does not show CLogControlFileLock to be the issue, which is what I had 
seen in the earlier case. So I don't think that increasing 
NUM_CLOG_BUFFERS will change things here.

Now, I don't understand the code very well yet, but I see three hotspots, 
and I am not sure whether they are related to each other:
* ProcArrayLock waits - causing waits, as reported by the 
83_lockwait.d script
* SimpleLruReadPage - causing read IOs, as reported by 
iostat/rsnoop.d

* GetSnapshotData - causing CPU utilization, as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
  
Actually, 32 made a significant difference as I recall ... do you still have 
the figures for that, Jignesh?



I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
The problem I saw was first highlighted by EAStress runs with PostgreSQL 
on Solaris with 120-150 users. I just replicated it via the smaller 
internal benchmark that we use here to recreate that problem.

EAStress should be just fine to highlight it. Just put pg_clog on 
O_DIRECT or something so that all IOs go to disk, making it easier to 
observe.

In the meanwhile I will try to get more information.


Regards,
Jignesh


Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:
  

Didn't we already go through this? He and Simon were pushing to bump up
NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and
some other clog.c would have to be reengineered to scale well to larger
values. 



AFAIR we never did get any clear explanation of what the test case is.
I guess it must be write-mostly, else lazy XID assignment would have
helped this by reducing the rate of XID consumption.

It's still true that I'm leery of a large increase in the number of
buffers without reengineering slru.c.  That code was written on the
assumption that there were few enough buffers that a linear search
would be fine.  I'd hold still for 16, or maybe even 32, but I dunno
how much impact that will have for such a test case.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah


Hi Greg,

I actually first saw the 4MB/sec problem during an EAStress-type run 
with only 150 connections.

I will try to do more of the testing that Tom has requested today.

Regards,
Jignesh


Gregory Stark wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:

  

CLOG data is not cached in any PostgreSQL shared memory segments and hence
becomes the bottleneck as it has to constantly go to the filesystem to get
the read data.



This is the same bottleneck you discussed earlier. CLOG reads are cached in
the Postgres shared memory segment, but only NUM_CLOG_BUFFERS worth, which
defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate
you're running you needed a larger number of buffers.

Using the filesystem buffer cache is also an entirely reasonable solution
though. That's surely part of the logic behind not trying to keep more of the
clog in shared memory. Do you have any measurements of how much time is being
spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s
seems like it's not insignificant but your machine is big enough that perhaps
I'm thinking at the wrong scale.

I'm really curious whether you see any benefit from the vxid read-only
transactions. I'm not sure how to get an apples to apples comparison though.
Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch
going in. Perhaps calling some function which forces an xid to be allocated
and seeing how much it slows down the benchmark would be a good substitute.
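
One function available in 8.3 that forces an XID to be assigned is txid_current(); a minimal sketch of such a comparison (the query itself is hypothetical):

-- read-only transaction: only a virtual xid is used
BEGIN;
SELECT count(*) FROM accounts WHERE aid = 1;
COMMIT;

-- same work, but force a real XID to be allocated first
BEGIN;
SELECT txid_current();
SELECT count(*) FROM accounts WHERE aid = 1;
COMMIT;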

  


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah

Tom,

Here is what I did:

I started aggregating all read information.

First, I had also grouped by pid (arg0, arg1, pid), and the counts were 
all coming back as 1.

Then I just grouped by filename and location (arg0, arg1 of the reads), 
and the counts came back as:


# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

 /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
 /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
 /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
 /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
 /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
 /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
 /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
 /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37



FYI, I pressed Ctrl-C within less than a second.

So to me this suggests that multiple processes (different pids) are 
reading the same page. (This was with about 600 users active.)

Apparently we do have a problem in that we are reading the same buffer 
address again (the same as it not being cached anywhere, or not being 
found in any cache).


I reran the lock wait script on a couple of processes and did not see 
CLogControlFileLock as a problem:


# ./83_lwlock_wait.d 14341

Lock Id            Mode        Count
  WALInsertLock   Exclusive   1
  ProcArrayLock   Exclusive  16

Lock Id   Combined Time (ns)
  WALInsertLock   383109
  ProcArrayLock         198866236

# ./83_lwlock_wait.d 14607

Lock Id            Mode        Count
  WALInsertLock   Exclusive   2
  ProcArrayLock   Exclusive  15

Lock Id   Combined Time (ns)
  WALInsertLock55243
  ProcArrayLock 69700140

#

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
I agree with Tom..  somehow I think  increasing NUM_CLOG_BUFFERS is 
just avoiding the symptom to a later value.. I promise to look more 
into it before making any recommendations to increase NUM_CLOG_BUFFERs.



Because though iGen  showed improvements in that area by increasing 
num_clog_buffers , EAStress had shown no improvements.. Plus the 
reason I think this is not the problem in 8.3beta1 since the Lock 
Output clearly does not show CLOGControlFile as to be the issue which 
I had seen in earlier case.  So I dont think that increasing 
NUM_CLOG_BUFFERS will change thing here.


Now I dont understand the code pretty well yet I see three hotspots 
and not sure if they are related to each other
* ProcArrayLock waits  - causing Waits  as reported by 
83_lockwait.d script
* SimpleLRUReadPage - causing read IOs as reported by 
iostat/rsnoop.d

* GetSnapshotData - causing CPU utilization  as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
 
Actually, 32 made a significant difference as I recall ... do you 
still have the figures for that, Jignesh?



I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah

Also, to give perspective on the equivalent writes to CLOG:

I used the following script, which runs for 10 seconds, to track all 
writes to the pg_clog directory, and here is what it came up with (this 
is with 500 users running):


# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" &&
dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
   @write[fds[arg0].fi_pathname,arg1] = count();
}
tick-10sec
{
exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU IDFUNCTION:NAME
 3   1026  :tick-10sec

 /export/home0/igen/pgdata/pg_clog/001E   -2753028277088    1

#
I modified read.d to do a 5sec read
# ./read.d
dtrace: script './read.d' matched 3 probes
CPU IDFUNCTION:NAME
 0  1   :BEGIN
 0   1027   :tick-5sec

 /export/home0/igen/pgdata/pg_clog/001F   -2753028268896    1
 /export/home0/igen/pgdata/pg_clog/001F   -2753028252512    1
 /export/home0/igen/pgdata/pg_clog/001F   -2753028285280    2
 /export/home0/igen/pgdata/pg_clog/001F   -2753028277088    3
 /export/home0/igen/pgdata/pg_clog/001F   -2753028236128    3
 /export/home0/igen/pgdata/pg_clog/001E   -2753028285280    5
 /export/home0/igen/pgdata/pg_clog/001E   -2753028236128    9
 /export/home0/igen/pgdata/pg_clog/001E   -2753028277088   13
 /export/home0/igen/pgdata/pg_clog/001E   -2753028268896   15
 /export/home0/igen/pgdata/pg_clog/001E   -2753028252512   27

#

So the ratio of reads vs. writes to the clog files is pretty huge.


-Jignesh



Jignesh K. Shah wrote:

Tom,

Here is what I did:

I started aggregating all read information:

First I also had added group by pid(arg0,arg1, pid) and the counts 
were all coming as 1


Then I just grouped by filename and location (arg0,arg1 of reads) and 
the counts came back as


# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

 /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
 /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
 /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
 /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
 /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
 /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
 /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
 /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37



FYI  I pressed ctrl-c within like less than a second

So to me this seems that multiple processes are reading the same page 
from different pids. (This was with about 600 suers active.


Aparently we do have a problem that we are reading the same buffer 
address again.  (Same as not being cached anywhere or not finding it 
in cache anywhere).


I reran lock wait script on couple of processes and did not see 
CLogControlFileLock  as a problem..


# ./83_lwlock_wait.d 14341

Lock Id            Mode        Count
  WALInsertLock   Exclusive   1
  ProcArrayLock   Exclusive  16

Lock Id   Combined Time (ns)
  WALInsertLock   383109
  ProcArrayLock         198866236

# ./83_lwlock_wait.d 14607

Lock Id            Mode        Count
  WALInsertLock   Exclusive   2
  ProcArrayLock   Exclusive  15

Lock Id   Combined Time (ns)
  WALInsertLock55243
  ProcArrayLock 69700140

#

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
I agree with Tom..  somehow I think  increasing NUM_CLOG_BUFFERS is 
just avoiding the symptom to a later value.. I promise to look more 
into it before making any recommendations to increase NUM_CLOG_BUFFERs.



Because though iGen  showed improvements in that area by increasing 
num_clog_buffers , EAStress had shown no improvements.. Plus the 
reason I think this is not the problem in 8.3beta1 since the Lock 
Output clearly does not show CLOGControlFile as to be the issue which 
I had seen in earlier case.  So I dont think that increasing 
NUM_CLOG_BUFFERS will change thing here.


Now I dont understand the code pretty well yet I see three hotspots 
and not sure if they are related to each other
* ProcArrayLock waits  - causing Waits  as reported by 
83_lockwait.d script

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah


I changed NUM_CLOG_BUFFERS to 16.

Running the test again:
# ./read.d
dtrace: script './read.d' matched 2 probes
CPU IDFUNCTION:NAME
 0   1027   :tick-5sec

 /export/home0/igen/pgdata/pg_clog/0024   -2753028219296    1
 /export/home0/igen/pgdata/pg_clog/0025   -2753028211104    1

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU IDFUNCTION:NAME
 1   1027   :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU IDFUNCTION:NAME
 1   1027   :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU IDFUNCTION:NAME
 0   1027   :tick-5sec

 /export/home0/igen/pgdata/pg_clog/0025   -2753028194720    1



So Tom seems to be correct that this is a case of CLOG buffer thrashing. 
And since I saw the same problem with two different workloads, I think 
the chance of people hitting this problem is pretty high.


Also, I am a bit surprised that CLogControlFileLock did not show up as 
being hot. Maybe that is because not many writes are going on, or maybe 
because I did not trace all 500 users to see their hot lock status.



Dmitri has another workload to test; I might try that out later on to 
see whether it causes a similar impact or not.


Of course I haven't seen my throughput go up yet, since I am already CPU 
bound. But this is still good, since the number of IOPS to the disk (and 
hence system calls) is reduced.



If I take this as my baseline number, I can then proceed to hunt down 
other bottlenecks.

What is the view of the community?

Hunt down CPU utilization or lock waits next?

Your votes are crucial on where I put my focus.

Another thing Josh B told me to check out was the wal_writer_delay setting.

I have done two runs with almost equal performance (with the 
NUM_CLOG_BUFFERS=16 setting): one with 100ms and the other with the 
default 200ms. Based on the runs, 100ms seemed slightly better than the 
default (plus the window for losing data is reduced from 600ms to 300ms).


Thanks.

Regards,
Jignesh




Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
  

So the ratio of reads vs writes to clog files is pretty huge..



It looks to me that the issue is simply one of not having quite enough
CLOG buffers.  Your first run shows 8 different pages being fetched and
the second shows 10.  Bearing in mind that we pin the latest CLOG page
into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for
older pages, so what we've got here is thrashing for the available
slots.

Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq
  


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Jignesh K. Shah

Update on my testing 8.3beta1 on Solaris.

* CLOG reads
* Asynchronous Commit benefit
* Hot CPU Utilization

Regards,
Jignesh

__Background_:_
We were testing PostgreSQL 8.3beta1 on our latest Sun SPARC Enterprise 
T5220 server running Solaris 10 8/07 and on a Sun Fire X4200 running 
Solaris 10 8/07. Generally, for performance benefits on Solaris, we put 
the file systems on forcedirectio: we bypass the filesystem cache and go 
directly to the disks.


__Problem_:_
What we were observing that there were lots of reads happening  about 
4MB/sec on the file system holding $PGDATA and the database tables 
during an OLTP Benchmark run. Initially we thought that our bufferpools 
were not big enough. But thanks to 64-bit builds  we could use bigger 
bufferpools. However even with extraordinary bufferpool sizes we still 
saw lots of reads going to the disks.


__DTrace to the Rescue_:_

I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as 
follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
  printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on it I found that most postgres processes were doing lots of 
reads from the pg_clog directory.
CLOG, or the commit log, keeps track of transactions in flight. Writes to 
CLOG come from recording transaction commits (or aborts) or when an XLOG 
record is generated. I am not clear on the reads yet, but it seems every 
process constantly reads it to get some status. CLOG data is not cached 
in any PostgreSQL shared memory segment and hence becomes the bottleneck, 
as it has to constantly go to the filesystem to get the data.

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU IDFUNCTION:NAME
 0  49222   read:entry pid 8739 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9607 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9423 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 8731 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 8719 reading  
/export/home0/igen/pgdata/pg_clog/000C


 0  49222   read:entry pid 9019 reading  
/export/home0/igen/pgdata/pg_clog/000C


 1  49222   read:entry pid 9255 reading  
/export/home0/igen/pgdata/pg_clog/000C


 1  49222   read:entry pid 8867 reading  
/export/home0/igen/pgdata/pg_clog/000C



Later on  during another run I added  ustack() after the printf in the 
above script to get the function name also:


# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU IDFUNCTION:NAME
 0  49222   read:entry pid 10956 reading  
/export/home0/igen/pgdata/pg_clog/0011

 libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesVacuum+0x21a
 postgres`heap_prune_chain+0x14b
 postgres`heap_page_prune_opt+0x1e6
 postgres`index_getnext+0x144
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
 postgres`ExecIndexScan+0x43
 postgres`ExecProcNode+0x183
 postgres`ExecutePlan+0x9e
 postgres`ExecutorRun+0xab
 postgres`PortalRunSelect+0x47a
 postgres`PortalRun+0x262
 postgres`exec_execute_message+0x565
 postgres`PostgresMain+0xf45
 postgres`BackendRun+0x3f9

 0  49222   read:entry pid 10414 reading  
/export/home0/igen/pgdata/pg_clog/0011

 libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesVacuum+0x21a
 postgres`heap_prune_chain+0x14b
 postgres`heap_page_prune_opt+0x1e6
 postgres`index_getnext+0x144
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
^C  libc.so.1`_read+0xa
 postgres`SimpleLruReadPage+0x3e6
 postgres`SimpleLruReadPage_ReadOnly+0x9b
 postgres`TransactionIdGetStatus+0x1f
 postgres`TransactionIdDidCommit+0x42
 postgres`HeapTupleSatisfiesMVCC+0x34f
 postgres`index_getnext+0x29e
 postgres`IndexNext+0xe1
 postgres`ExecScan+0x189
 postgres`ExecIndexScan+0x43
 postgres`ExecProcNode+0x183
 postgres`ExecutePlan+0x9e
 postgres`ExecutorRun+0xab
 postgres`PortalRunSelect+0x47a
 

[HACKERS] Function quote_literal broken in CATALOG_VERSION_NO 200707251

2007-08-20 Thread Jignesh K. Shah


Just FYI:

I took the latest snapshot (CATALOG_VERSION_NO 200707251) and the 
quote_literal function throws an error:


# select quote_literal(1);
2007-08-20 18:50:17 PDT ERROR:  function quote_literal(integer) does not 
exist at character 8
2007-08-20 18:50:17 PDT HINT:  No function matches the given name and 
argument types. You might need to add explicit type casts.

2007-08-20 18:50:17 PDT STATEMENT:  select quote_literal(1);
ERROR:  function quote_literal(integer) does not exist
LINE 1: select quote_literal(1);


I went back to an older snapshot CATALOG_VERSION_NO 200705211 and it 
works with the same SunStudio compiler and options:


postgres=# select quote_literal(1);
quote_literal
---
'1'
(1 row)
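
If the removal of the implicit integer-to-text cast is intentional, the workaround the HINT suggests is an explicit cast, e.g.:

select quote_literal(1::text);
select quote_literal(cast(1 as text));

both of which should return '1'.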


-Jignesh


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] fixing Makefile.shlib for solaris/gcc with -m64 flag

2007-02-01 Thread Jignesh K. Shah
I don't think we solved this. But I think the way to pass -m64 should be the same on Linux and Solaris, 
not different.


Thanks.
Regards,
Jignesh


Tom Lane wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:

Am Mittwoch, 17. Januar 2007 17:12 schrieb Tom Lane:

Jignesh K. Shah [EMAIL PROTECTED] writes:

simple if I use -m64 for 64 bit then all end binaries are generated
64-bit and the shared libraries are generated 32-bit and the compilation
fails (ONLY ON SOLARIS) since that particular line is only for the
condition Solaris AND gcc.

If I use the COMPILER which is CC + CFLAGS it passes -m64 properly to it
and generates shared libraries 64-bit and the compile continues..

Hmm ... I see we're doing it that way already for some other platforms,
but I can't help thinking it's a kluge.  Wouldn't the correct answer be
that -m64 needs to be in LDFLAGS?



The correct answer may be to put -m64 into CC.


Did we conclude that that was a satisfactory solution, or is this still
a live patch proposal?

If -m64 in CC is the right solution, it should probably be mentioned in
FAQ_Solaris.

regards, tom lane


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] fixing Makefile.shlib for solaris/gcc with -m64 flag

2007-01-17 Thread Jignesh K. Shah

Hello All,

When I compile with gcc on Solaris with the -m64 flag, all shared libraries fail to build.

Can someone fix the following in Makefile.shlib?


ifeq ($(PORTNAME), solaris)
  ifeq ($(GCC), yes)
LINK.shared = $(CC) -shared
  else
LINK.shared = $(CC) -G $(CFLAGS)# CFLAGS added for X86_64
  endif


It should be

ifeq ($(PORTNAME), solaris)
  ifeq ($(GCC), yes)
LINK.shared = $(COMPILER) -shared
  else
LINK.shared = $(CC) -G $(CFLAGS)# CFLAGS added for X86_64
  endif


Thanks.
Regards,
Jignesh

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] fixing Makefile.shlib for solaris/gcc with -m64 flag

2007-01-17 Thread Jignesh K. Shah
Simple: if I use -m64 for 64-bit, then all the end binaries are generated 64-bit but the shared libraries 
are generated 32-bit, and the compilation fails (ONLY ON SOLARIS), since that particular line applies only 
to the condition Solaris AND gcc.

If I use COMPILER, which is CC + CFLAGS, it passes -m64 properly and generates the shared 
libraries as 64-bit, and the compile continues.


I just tested it out.

-Jignesh


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:

It should be
 LINK.shared = $(COMPILER) -shared


Why?  What's the difference, and why is it appropriate to fix it that
way instead of by changing CFLAGS?

regards, tom lane


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] fixing Makefile.shlib for solaris/gcc with -m64 flag

2007-01-17 Thread Jignesh K. Shah

I tried that but it didn't work.

Also, on Solaris it typically uses the ld in /usr/ccs/bin/ld, which uses -64 as its flag for 64-bit, 
so putting the flag in LDFLAGS fails as unrecognized unless gcc translates it to -64 for ld.

Putting -m64 in CC would work around it, but then I guess that's what CFLAGS 
is for.

Regards,
Jignesh


Tom Lane wrote:

Jignesh K. Shah [EMAIL PROTECTED] writes:
simple if I use -m64 for 64 bit then all end binaries are generated 64-bit and the shared libraries 
are generated 32-bit and the compilation fails (ONLY ON SOLARIS) since that particular line is only 
for the condition Solaris AND gcc.


If I use the COMPILER which is CC + CFLAGS it passes -m64 properly to it and generates shared 
libraries 64-bit and the compile continues..


Hmm ... I see we're doing it that way already for some other platforms,
but I can't help thinking it's a kluge.  Wouldn't the correct answer be
that -m64 needs to be in LDFLAGS?

regards, tom lane


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-30 Thread Jignesh K. Shah

I have seen performance similar to Josh's, and my reasoning is as follows:

* WAL is the biggest bottleneck, with its default segment size of 16MB. 
Many people hate to recompile the code to change its default, and 
increasing checkpoint segments helps, but there is still a lot of 
overhead in the rotation of WAL files (even putting WAL on tmpfs shows 
that it is still slow). Having an option for a bigger size helps to a 
small extent percentage-wise (and frees up a bit of CPU doing file rotation).

* Growing files: even though this is OS dependent, a lot of time is spent 
growing files in small 8K block increments. If we could signal bigger 
chunks to grow, or pre-grow data files to their expected size, that would 
help a lot in such cases.

* The COPY command had restrictions, but those have been fixed to a large 
extent. (Great job.)

But of course I have lost touch with programming and can't begin to 
understand the PostgreSQL code well enough to change it myself.

Regards,
Jignesh




Ron Peacetree wrote:


That 11MBps was your =bulk load= speed.  If just loading a table
is this slow, then there are issues with basic physical IO, not just
IO during sort operations.

As I said, the obvious candidates are inefficient physical layout
and/or flawed IO code.

Until the basic IO issues are addressed, we could replace the
present sorting code with infinitely fast sorting code and we'd
still be scrod performance wise.

So why does basic IO suck so badly?

Ron  



-Original Message-
From: Josh Berkus josh@agliodbs.com
Sent: Sep 30, 2005 1:23 PM
To: Ron Peacetree [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

Ron,

 


Hmmm.
60GB/5400secs= 11MBps.  That's ssllooww.  So the first
problem is evidently our physical layout and/or HD IO layer
sucks.
   



Actually, it's much worse than that, because the sort is only dealing 
with one column.  As I said, monitoring the iostat our top speed was 
2.2mb/s.


--Josh


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly
 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match