Re: [HACKERS] pgbench vs. wait events

2016-10-07 Thread Alfred Perlstein



On 10/7/16 10:42 AM, Andres Freund wrote:

Hi,

On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:

This contention on WAL reminds me of another scenario I've heard about that
was similar.

To fix things, what happened was that the first person to block would be 
responsible for writing out all buffers for anyone blocked behind "him".

We pretty much do that already. But while that's happening, the other
would-be writers show up as blocking on the lock.  We do use kind of
an odd locking model for the waiters (LWLockAcquireOrWait()), which
waits for the lock to be released but doesn't try to acquire it
afterwards. Instead the WAL position is rechecked, and in many cases
we'll be done afterwards, because enough has been written out.

Greetings,

Andres Freund



Are the batched writes all done before fsync is called?

Are you sure that A only calls fsync after flushing all the buffers from 
B, C, and D?  Or will it fsync twice?  Is there instrumentation to show 
that?


I know there's a tremendous level of skill involved in this code, but 
I'm simply asking in case there are some tricks being missed.


Another strategy that may work is to intentionally wait/buffer a few 
milliseconds between flushes/fsyncs: for example, make sure that the 
number of flushes per second doesn't exceed some configurable amount, 
because each flush likely consumes at least one IOP from the disk, and 
there is a maximum number of IOPS per disk, so you might as well buffer 
more if you're exceeding that IOPS count.  You'll trade some latency, 
but gain throughput for doing that.
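A minimal sketch of that rate-limiting idea (illustrative Python; the class and all names are hypothetical, not anything in PostgreSQL):

```python
import time

class CoalescingFlusher:
    """Batch writes and cap the flush (fsync) rate.

    Writes accumulate in a buffer between flushes; a flush is issued
    only when the minimum interval has elapsed, trading a little
    latency for fewer IOPS and better throughput.
    """
    def __init__(self, max_flushes_per_sec, flush_fn, clock=time.monotonic):
        self.min_interval = 1.0 / max_flushes_per_sec
        self.flush_fn = flush_fn          # e.g. wraps write() + fsync()
        self.clock = clock
        self.pending = []
        self.last_flush = float("-inf")

    def write(self, buf):
        self.pending.append(buf)
        if self.clock() - self.last_flush >= self.min_interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(b"".join(self.pending))  # one write, one fsync
            self.pending.clear()
        self.last_flush = self.clock()
```

With max_flushes_per_sec = 10, a burst of tiny writes arriving every millisecond produces at most ten flushes per second instead of a thousand; anything still pending is picked up by the next flush.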


Does this make sense?  Again, apologies if this has been covered.  Is 
there a whitepaper, blog post, or other clear way I can examine the 
algorithm wrt locks/buffering for flushing WAL?


-Alfred






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench vs. wait events

2016-10-07 Thread Alfred Perlstein

Robert,

This contention on WAL reminds me of another scenario I've heard about 
that was similar.


To fix things, what happened was that the first person to block would 
be responsible for writing out all buffers for anyone blocked behind 
"him".


So, for example, say you have many threads: A, B, C, and D.

While A is writing to WAL and holding the lock, B arrives and of 
course blocks; then C comes along and blocks as well, then D.


Finally, A finishes its write.

Now you have two options for resolving this, either

1) A drops its lock, B picks up the lock... B writes its buffer and then 
drops the lock.  Then C gets the lock, writes its buffer, drops the 
lock, then finally D gets the lock, writes its buffer and then drops the 
lock.


2) A then writes out B's, C's, and D's buffers, then A drops the lock; 
B, C, and D wake up, note that their respective buffers are written, 
and just return.  This greatly speeds up the system.  (Just be careful 
to make sure A doesn't do "too much work", otherwise you can get a sort 
of livelock if too many threads are blocked behind it; generally, only 
issue one additional flush on behalf of other threads, and do not "loop 
until the queue is empty".)
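Option 2 can be sketched with a condition variable (illustrative Python, not PostgreSQL's actual LWLock code; all names are made up). The leader flushes exactly one batch on behalf of everyone queued so far and then stops, which is the "only issue one additional flush" rule:

```python
import threading

class GroupFlusher:
    """Leader-based group flush: the thread that gets the lock writes out
    everyone's pending buffers in one pass; waiters wake, see their data
    is already durable, and return without flushing themselves."""
    def __init__(self, flush_fn):
        self.flush_fn = flush_fn     # writes + syncs a list of buffers
        self.cond = threading.Condition()
        self.pending = []            # (seq, buf) entries awaiting flush
        self.next_seq = 0
        self.flushed_seq = 0         # highest seq durably written
        self.busy = False            # a leader is currently flushing

    def commit(self, buf):
        with self.cond:
            self.next_seq += 1
            my_seq = self.next_seq
            self.pending.append((my_seq, buf))
            while self.busy:
                self.cond.wait()
                if self.flushed_seq >= my_seq:
                    return           # the leader flushed our buffer for us
            # become the leader: grab everything queued so far
            batch, self.pending = self.pending, []
            self.busy = True
        self.flush_fn([b for _, b in batch])  # one write+fsync for the batch
        with self.cond:
            self.flushed_seq = batch[-1][0]
            self.busy = False
            self.cond.notify_all()
```

If B, C, and D all queue up while A is flushing, the first of them to wake becomes the next leader and flushes all three buffers in a single call; the others return without touching the disk.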


I'm not sure if this is actually possible with the way WAL is 
implemented (or perhaps this strategy is already implemented), but 
it's definitely worth investigating if not done already, as it can 
speed things up enormously.


On 10/6/16 11:38 AM, Robert Haas wrote:

Hi,

I decided to do some testing on hydra (IBM-provided community
resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
the newly-enhanced wait event stuff to try to get an idea of what
we're waiting for during pgbench.  I did 30-minute pgbench runs with
various configurations, but all had max_connections = 200,
shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
on.  During each run, I ran this psql script in another window and
captured the output:

\t
select wait_event_type, wait_event from pg_stat_activity where pid !=
pg_backend_pid()
\watch 0.5
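The tallying of that captured output can be sketched in Python (a stand-in for the shell one-liners actually used; illustrative only):

```python
from collections import Counter

def count_wait_events(captured_lines):
    """Tally (wait_event_type, wait_event) pairs from captured psql output.

    Each sample line looks like ' LWLockNamed | WALWriteLock'; blank
    event columns (backends not waiting on anything) count under ('', '').
    """
    counts = Counter()
    for line in captured_lines:
        if "|" not in line:
            continue                     # skip blank/summary lines
        etype, _, event = line.partition("|")
        counts[(etype.strip(), event.strip())] += 1
    return counts
```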

Then, I used a little shell-scripting to count up the number of times
each wait event occurred in the output.  First, I tried scale factor
3000 with 32 clients and got these results:

   1  LWLockTranche   | buffer_mapping
   9  LWLockNamed | CLogControlLock
  14  LWLockNamed | ProcArrayLock
  16  Lock| tuple
  25  LWLockNamed | CheckpointerCommLock
  49  LWLockNamed | WALBufMappingLock
 122  LWLockTranche   | clog
 182  Lock| transactionid
 287  LWLockNamed | XidGenLock
1300  Client  | ClientRead
1375  LWLockTranche   | buffer_content
3990  Lock| extend
   21014  LWLockNamed | WALWriteLock
   28497  |
   58279  LWLockTranche   | wal_insert

tps = 1150.803133 (including connections establishing)

What jumps out here, at least to me, is that there is furious
contention on both the wal_insert locks and on WALWriteLock.
Apparently, the system simply can't get WAL on disk fast enough to
keep up with this workload.  Relation extension locks and
buffer_content locks are also pretty common, both ahead of
ClientRead, a relatively uncommon wait event on this test.  The load
average on the system was only about 3 during this test, indicating
that most processes are in fact spending most of their time off-CPU.
The first thing I tried was switching to unlogged tables, which
produces these results:

   1  BufferPin   | BufferPin
   1  LWLockTranche   | lock_manager
   2  LWLockTranche   | buffer_mapping
   8  LWLockNamed | ProcArrayLock
   9  LWLockNamed | CheckpointerCommLock
   9  LWLockNamed | CLogControlLock
  11  LWLockTranche   | buffer_content
  37  LWLockTranche   | clog
 153  Lock| tuple
 388  LWLockNamed | XidGenLock
 827  Lock| transactionid
1267  Client  | ClientRead
   20631  Lock| extend
   91767  |

tps = 1223.239416 (including connections establishing)

If you don't look at the TPS number, these results look like a vast
improvement.  The overall amount of time spent not waiting for
anything is now much higher, and the problematic locks have largely
disappeared from the picture.  However, the load average now shoots up
to about 30, because most of the time that the backends are "not
waiting for anything" they are in fact in kernel wait state D; that
is, they're stuck doing I/O.  This suggests that we might want to
consider advertising a wait state when a backend is doing I/O, so we
can measure this sort of thing.

Next, I tried lowering the scale factor to something that fits in
shared buffers.  Here are the results at scale factor 300:

  14  Lock| tup

Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/3/16 3:29 AM, Greg Stark wrote:


Honestly the take-away I see in the Uber story is that they apparently
had nobody on staff that was on -hackers or apparently even -general
and tried to go it alone rather than involve experts from outside
their company. As a result they misdiagnosed their problems based on
prejudices seeing what they expected to see rather than what the real
problem was.


Agree strongly, but there are still lessons to be learned on the 
PostgreSQL side.

-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/2/16 10:02 PM, Mark Kirkwood wrote:

On 03/08/16 02:27, Robert Haas wrote:


Personally, I think that incremental surgery on our current heap
format to try to fix this is not going to get very far.  If you look
at the history of this, 8.3 was a huge release for timely cleanup of
dead tuples.  There was also significant progress in 8.4 as a result of
5da9da71c44f27ba48fdad08ef263bf70e43e689.  As far as I can recall, we
then made no progress at all in 9.0 - 9.4.  We made a very small
improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but
that's pretty niche.  In 9.6, we have "snapshot too old", which I'd
argue is potentially a large improvement, but it was big and invasive
and will no doubt pose code maintenance hazards in the years to come;
also, many people won't be able to use it or won't realize that they
should use it.  I think it is likely that further incremental
improvements here will be quite hard to find, and the amount of effort
will be large relative to the amount of benefit.  I think we need a
new storage format where the bloat is cleanly separated from the data
rather than intermingled with it; every other major RDBMS works that
way.  Perhaps this is a case of "the grass is greener on the other
side of the fence", but I don't think so.


Yeah, I think this is a good summary of the state of play.

The only other new db development to use a non-overwriting design like 
ours that I know of was Jim Starkey's Falcon engine for (ironically) 
MySQL 6.0. Not sure if anyone is still progressing that at all now.


I do wonder if Uber could have successfully tamed dead tuple bloat 
with aggressive per-table autovacuum settings (and if in fact they 
tried), but as I think Robert said earlier, it is pretty easy to come 
up with a highly update (or insert + delete) workload that makes for a 
pretty ugly bloat component even with real aggressive autovacuuming.
I also wonder if they had used a "star schema", which to my 
understanding would mean multiple tables replacing the single table 
with multiple indices, to work around the write amplification problem 
in PostgreSQL.




Cheers

Mark









Re: [HACKERS] Why we lost Uber as a user

2016-08-04 Thread Alfred Perlstein



On 8/4/16 2:00 AM, Torsten Zuehlsdorff wrote:



On 03.08.2016 21:05, Robert Haas wrote:

On Wed, Aug 3, 2016 at 2:23 PM, Tom Lane  wrote:

Robert Haas  writes:

I don't think they are saying that logical replication is more
reliable than physical replication, nor do I believe that to be true.
I think they are saying that if logical corruption happens, you can
fix it by typing SQL statements to UPDATE, INSERT, or DELETE the
affected rows, whereas if physical corruption happens, there's no
equally clear path to recovery.


Well, that's not an entirely unreasonable point, but I dispute the
implication that it makes recovery from corruption an easy thing to do.
How are you going to know what SQL statements to issue?  If the master
database is changing 24x7, how are you going to keep up with that?


I think in many cases people fix their data using business logic.  For
example, suppose your database goes down and you have to run
pg_resetxlog to get it back up.  You dump-and-restore, as one does,
and find that you can't rebuild one of your unique indexes because
there are now two records with that same PK.  Well, what you do is you
look at them and judge which one has the correct data, often the one
that looks more complete or the one with the newer timestamp. Or,
maybe you need to merge them somehow.  In my experience helping users
through problems of this type, once you explain the problem to the
user and tell them they have to square it on their end, the support
call ends.  The user may not always be entirely thrilled about having
to, say, validate a problematic record against external sources of
truth, but they usually know how to do it.  Database bugs aren't the
only way that databases become inaccurate.  If the database that they
use to keep track of land ownership in the jurisdiction where I live
says that two different people own the same piece of property,
somewhere there is a paper deed in a filing cabinet.  Fishing that out
to understand what happened may not be fun, but a DBA can explain that
problem to other people in the organization and those people can get
it fixed.  It's a problem, but it's fixable.

On the other hand, if a heap tuple contains invalid infomask bits that
cause an error every time you read the page (this actually happened to
an EnterpriseDB customer!), the DBA can't tell other people how to fix
it and can't fix it personally either.  Instead, the DBA calls me.


After reading this statement, the ZFS filesystem pops into my mind. It 
has protection built in against various problems (data degradation, 
current spikes, phantom writes, etc.).


For me this raises two questions:

1) would the usage of ZFS prevent such errors?

My feeling would say yes, but I have no idea how an invalid 
infomask bit could occur.


2) would it be possible to add such prevention to PostgreSQL?

I know this could add a massive overhead, but if it's optional this 
could be a fine thing?

PostgreSQL is very "ZFS-like" in its internals.  The problem was a bug 
in PostgreSQL that caused it to just write data to the wrong place.


Some vendors use ZFS under databases to provide very cool services such 
as backup snapshots, test snapshots and other such uses.  I think Joyent 
is one such vendor but I'm not 100% sure.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-03 Thread Alfred Perlstein


> On Aug 3, 2016, at 3:29 AM, Greg Stark  wrote:
> 
>> 
> 
> Honestly the take-away I see in the Uber story is that they apparently
> had nobody on staff that was on -hackers or apparently even -general
> and tried to go it alone rather than involve experts from outside
> their company. As a result they misdiagnosed their problems based on
> prejudices seeing what they expected to see rather than what the real
> problem was.
> 

+1 very true. 

At the same time there are some lessons to be learned. At the very least 
putting in big bold letters where to come for help is one. 







Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 8/2/16 2:14 PM, Tom Lane wrote:

Stephen Frost  writes:

With physical replication, there is the concern that a bug in *just* the
physical (WAL) side of things could cause corruption.

Right.  But with logical replication, there's the same risk that the
master's state could be fine but a replication bug creates corruption on
the slave.

Assuming that the logical replication works by issuing valid SQL commands
to the slave, one could hope that this sort of "corruption" only extends
to having valid data on the slave that fails to match the master.
But that's still not a good state to be in.  And to the extent that
performance concerns lead the implementation to bypass some levels of the
SQL engine, you can easily lose that guarantee too.

In short, I think Uber's position that logical replication is somehow more
reliable than physical is just wishful thinking.  If anything, my money
would be on the other way around: there's a lot less mechanism that can go
wrong in physical replication.  Which is not to say there aren't good
reasons to use logical replication; I just do not believe that one.

regards, tom lane


The reason it can be less catastrophic is that with logical replication 
you may futz up your data, but you are safe from corrupting your entire 
db.  Meaning if an update is missed or doubled, that may be addressed 
by a fixup SQL statement; however, if the replication causes a write to 
an entirely wrong place in the db file, then you need to "fsck" your db 
and hope that nothing super critical was blown away.


The impact across a cluster is potentially magnified by physical 
replication.


So for instance, let's say there is a bug in the master's write to 
disk.  Logical replication acts as a barrier that keeps that bad write 
from reaching the slaves.  With physical replication, bad writes go to 
the slaves too, so any corruption experienced on the master will 
quickly reach the slaves and they will be corrupted as well.


With logical replication a bug may be stopped at the replication layer.  
At that point you can resync the slave from the master.


Now in the case of physical replication all your base are belong to zuul 
and you are in a very bad state.


That said, with logical replication, who's to say that if the statement 
is replicated to a slave, the slave won't experience the same bug and 
also corrupt itself.


We may be saying the same thing, but still there is something to be 
said for logical replication... Also, didn't they show that logical 
replication was faster for some use cases at Uber?


-Alfred









Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein


> On Aug 2, 2016, at 2:33 AM, Geoff Winkless  wrote:
> 
>> On 2 August 2016 at 08:11, Alfred Perlstein  wrote:
>>> On 7/2/16 4:39 AM, Geoff Winkless wrote:
>>> I maintain that this is a nonsense argument. Especially since (as you 
>>> pointed out and as I missed first time around) the bug actually occurred at 
>>> different records on different slaves, so he invalidates his own point.
> 
>> Seriously?
> 
> No, I make a habit of spouting off random arguments to a list full of
> people whose opinions I massively respect purely for kicks. What do
> you think?
> 
>> There's a valid point here, you're sending over commands at the block level, 
>> effectively "write to disk at this location" versus "update this record 
>> based on PK", obviously this has some drawbacks that are reason for concern.
> 
> Writing values directly into file offsets is only problematic if
> something else has failed that has caused the file to be an inexact
> copy. If a different bug occurred that caused the primary key to be
> corrupted on the slave (or indeed the master), PK-based updates would
> exhibit similar propagation errors.
> 
> To reiterate my point, uber's described problem came about because of
> a bug. Every software has bugs at some point in its life, to pretend
> otherwise is simply naive. I'm not trying to excuse the bug, or to
> belittle the impact that such a bug has on data integrity or on uber
> or indeed on the reputation of PostgreSQL. While I'm prepared to
> accept (because I have a job that requires I spend time on things
> other than digging through obscure reddits and mailing lists to
> understand more fully the exact cause) that in _this particular
> instance_ the bug was propagated because of the replication mechanism
> (although I'm still dubious about that, as per my comment above), that
> does _not_ preclude other bugs propagating in a statement-based
> replication. That's what I said is a nonsense argument, and no-one has
> yet explained in what way that's incorrect.
> 
> Geoff


Geoff,

You are quite technical; my feeling is that you will understand it, however it 
will need to be a self-learned lesson.

-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/26/16 9:54 AM, Joshua D. Drake wrote:

Hello,

The following article is a very good look at some of our limitations 
and highlights some of the pains many of us have been working "around" 
since we started using the software.


https://eng.uber.com/mysql-migration/

Specifically:

* Inefficient architecture for writes
* Inefficient data replication
* Issues with table corruption
* Poor replica MVCC support
* Difficulty upgrading to newer releases

It is a very good read and I encourage our hackers to do so with an 
open mind.


Sincerely,

JD


It was a good read.

Having based a high performance web tracking service as well as a high 
performance security appliance on Postgresql I too have been bitten by 
these issues.


I had a few questions that maybe the folks with core knowledge can answer:

1) Would it be possible to create a "star-like" schema to fix this 
problem?  For example, let's say you have a table that is similar to Uber's:

col0pk, col1, col2, col3, col4, col5

All cols are indexed.
Assuming that updates happen to only 1 column at a time.
Why not figure out some way to encourage or automate the splitting of 
this table into multiple tables that present themselves as a single table?


What I mean is that you would then wind up with the following tables:
table1: col0pk, col1
table2: col0pk, col2
table3: col0pk, col3
table4: col0pk, col4
table5: col0pk, col5

Now when you update "col5" on a row, you only have to update the index 
on table5:col5 and table5:col0pk, as opposed to beforehand, where you 
would have to update more indices.  In addition, I believe that vacuum 
would be somewhat mitigated in this case as well.
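A toy model of the arithmetic behind this (illustrative Python; it assumes every update is non-HOT, so each of a table's indexes gets a new entry for the new row version):

```python
# Toy model of index write amplification for the split-table idea.
# Names and numbers are illustrative, not PostgreSQL internals.

def index_writes_wide(num_indexed_cols):
    """Update one column in a single wide table: the new row version
    needs an entry in EVERY index (PK index + each indexed column)."""
    return 1 + num_indexed_cols

def index_writes_split():
    """Same update after splitting into per-column tables sharing a PK:
    only the one narrow table gets a new row version, touching just its
    PK index and its single column index."""
    return 2

# Uber-style table: col0pk plus col1..col5, all indexed.
print(index_writes_wide(5), "index writes wide,",
      index_writes_split(), "index writes split")
```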


2) Why not have a look at how innodb does its storage, would it be 
possible to do this?


3) For the small-ish table that Uber mentioned, is there a way to "have 
it in memory" however provide some level of sync to disk so that it is 
consistent?


thanks!
-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 7:08 AM, Merlin Moncure wrote:


*) postgres may not be the ideal choice for those who want a thin and
simple database

This is a huge market; addressing it will bring mindshare and more 
jobs, code, and braintrust to PostgreSQL.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 4:39 AM, Geoff Winkless wrote:


On 28 Jul 2016 12:19, "Vitaly Burovoy" > wrote:

>
> On 7/28/16, Geoff Winkless > wrote:
> > On 27 July 2016 at 17:04, Bruce Momjian > wrote:

> >
> >> Well, their big complaint about binary replication is that a bug can
> >> spread from a master to all slaves, which doesn't happen with
> >> statement-level replication.
> >
> > I'm not sure that that makes sense to me. If there's a database bug that
> > occurs when you run a statement on the master, it seems there's a decent
> > chance that that same bug is going to occur when you run the same statement
> > on the slave.
> >
> > Obviously it depends on the type of bug and how identical the slave is, but
> > statement-level replication certainly doesn't preclude such a bug from
> > propagating.
> >
> > Geoff
>
> Please, read the article first! The bug is about wrong visibility of
> tuples after applying WAL at slaves.
> For example, you can see two different records selecting from a table
> by a primary key (moreover, their PKs are the same, but other columns
> differ).

I read the article. It affected slaves as well as the master.

I quote:
"because of the way replication works, this issue has the potential to 
spread into all of the databases in a replication hierarchy"


I maintain that this is a nonsense argument. Especially since (as you 
pointed out and as I missed first time around) the bug actually 
occurred at different records on different slaves, so he invalidates 
his own point.


Geoff


Seriously?

There's a valid point here: you're sending over commands at the block 
level, effectively "write to disk at this location" versus "update this 
record based on PK"; obviously this has some drawbacks that are reason 
for concern.  Does it validate the move on its own?  No.  Does it add 
to the reasons to move away?  Yes, that much is obvious.


Please read this thread:
https://www.reddit.com/r/programming/comments/4vms8x/why_we_lost_uber_as_a_user_postgresql_mailing_list/d5zx82n

Do I love postgresql?  Yes.
Have I been bitten by things such as this?  Yes.
Should the community learn from these things and think of ways to avoid 
it?  Absolutely!


-Alfred


[HACKERS] Question about durability and postgresql.

2015-02-20 Thread Alfred Perlstein
Hello,

We have a combination of 9.3 and 9.4 databases used for logging of data.

We do not need a strong durability guarantee, meaning it is ok if on crash a 
minute or two of data is lost from our logs.  (This is just stats for our 
internal tool).

I am looking at this page:
http://www.postgresql.org/docs/9.4/static/non-durability.html

And it's not clear which setting I should turn on.

What we do NOT want is to lose the entire table or corrupt the database.  We do 
want to gain speed though by not making DATA writes durable.

Which setting is appropriate for this use case?

At a glance it looks like a combination of
1) "Turn off synchronous_commit"
and possibly:
2)  Increase checkpoint_segments and checkpoint_timeout ; this reduces the 
frequency of checkpoints, but increases the storage requirements of /pg_xlog.
3) Turn off full_page_writes; there is no need to guard against partial page 
writes.
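For what it's worth, a sketch of how those candidates might look in postgresql.conf (illustrative values, not recommendations; note that unlike synchronous_commit = off, disabling full_page_writes can corrupt the database after a crash unless the storage guarantees atomic page writes, which conflicts with the stated no-corruption goal):

```ini
# postgresql.conf sketch: reduced durability, 9.3/9.4-era settings
synchronous_commit = off        # commits return before WAL is flushed;
                                # a crash loses recent transactions, but
                                # the database stays consistent
checkpoint_timeout = 15min      # checkpoint less often ...
checkpoint_segments = 64        # ... at the cost of more WAL in pg_xlog
#full_page_writes = off         # left commented out: only safe on
                                # storage with atomic page writes
```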

The point here is to never get a corrupt database, but in case of crash we 
might lose a few minutes of last transactions.

Any suggestions please?

thank you,
-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-26 Thread Alfred Perlstein
JFYI we have 3 or 4 machines racked for the pgsql project in our DC. 

Tom informed me he would be lighting them up this week time permitting.  

Sent from my iPhone

> On Apr 26, 2014, at 6:15 PM, Stephen Frost  wrote:
> 
> Jim,
> 
> * Jim Nasby (j...@nasby.net) wrote:
>>> On 4/22/14, 5:01 PM, Alfred Perlstein wrote:
>>> We also have colo space and power, etc.  So this would be the whole deal.  
>>> The cluster would be up for as long as needed.
>>> 
>>> Are the machine specs sufficient?  Any other things we should look for?
>>> 
>>> CC'd Tom on this email.
>> 
>> Did anyone respond to this off-list?
> 
> Yes, I did follow-up with Tom.  I'll do so again, as the discussion had
> died down.
> 
>> Would these machines be more useful as dedicated performance test servers 
>> for the community or generic BenchFarm members?
> 
> I don't believe they would be terribly useful as buildfarm systems; we
> could set up similar systems with VMs to just run the regression tests.
> Where I see these systems being particularly valuable would be as the
> start of our performance farm, and perhaps one of the systems as a PG
> infrastructure server.
> 
>Thanks!
> 
>Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Alfred Perlstein


On 4/22/14, 8:26 AM, Andrew Dunstan wrote:


On 04/22/2014 01:36 AM, Joshua D. Drake wrote:


On 04/21/2014 06:19 PM, Andrew Dunstan wrote:



If we never start we'll never get there.

I can think of several organizations that might be approached to donate
hardware.


Like .Org?

We have a hardware farm, a rack full of hardware and spindles. It 
isn't the most current but it is there.






I'm going away tomorrow for a few days R&R. when I'm back next week I 
will set up a demo client running this module. If you can have a 
machine prepped for this purpose by then so much the better, otherwise 
I will have to drag out a box I recently rescued and have been waiting 
for something to use it with. It's more important that it's stable 
(i.e. nothing else running on it) than that it's very powerful. It 
could be running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


cheers

andrew




Hey folks, I just spoke with our director of netops, Tom Sparks, here at 
Norse, and we have a vested interest in Postgresql.  We can throw 
together a cluster of 4 machines with specs approximately in the range 
of dual quad-core Westmere with ~64GB of RAM running FreeBSD 10 or 11.  
We can also do an Ubuntu install, or another Linux distro.  Please 
let me know if this would be something that the project could 
make use of.


We also have colo space and power, etc.  So this would be the whole 
deal.  The cluster would be up for as long as needed.


Are the machine specs sufficient?  Any other things we should look for?

CC'd Tom on this email.

-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 2:23 PM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 12:47 PM, Stephen Frost wrote:

  Asking for help to address the FreeBSD performance would have
been much better received. Thanks, Stephen

That is exactly what I did, I asked for a version of postgresql that
was easy to switch at runtime between two behaviors.

That would make it a LOT easier to run a few scripts and make sure I
got the correct binary without having to munge PREFIX and a bunch of
PATH and other tools to get my test harness to DTRT.

I'm sure one of the hackers would be happy to provide you with a patch
to help you with your testing.

That would be fine.

That's quite a different thing from asking for a GUC to be provided and
then supported over the next 5 years as part of the core release, which
is what I believe we all thought you were asking for.
I did not know that GUCs were not classified into 
"experimental/non-experimental".  The fact that a single GUC would need 
to be supported for 5 years is definitely something to consider.  Now I 
understand the push back a little more.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 12:47 PM, Stephen Frost wrote:
  Asking for help to address the FreeBSD performance would have been 
much better received. Thanks, Stephen 


That is exactly what I did, I asked for a version of postgresql that was 
easy to switch at runtime between two behaviors.


That would make it a LOT easier to run a few scripts and make sure I got 
the correct binary without having to munge PREFIX and a bunch of PATH 
and other tools to get my test harness to DTRT.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 11:14 AM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is "mysql".

Personally, arguments in that vein remove just about any incentive
I have to work on the problem.

I was just explaining that we have a timeline over here, and while
that may disincentivize you from providing what we need, it would be
very unfair.

I'm pretty sure Andres was referring to the part where there's a
'threat' to move to some other platform due to a modest performance
degradation, as if it's the only factor involved in making a decision
among the various RDBMS options.  If that's really your deciding
criteria instead of the myriad of other factors, I daresay you have your
priorities mixed up.


There are other Linux centric dbs to pick from.  If pgsql is just
another Linux centric DB then that is unfortunate but something I
can deal with.

These attacks really aren't going to get you anywhere.  We're talking
about a specific performance issue that FreeBSD has and how much PG
(surely not the only application impacted by this issue) should bend
to address it, even though the FreeBSD folks were made aware of the
issue over year ago and have done nothing to address it.

Moreover, you'd like to also define the way we deal with the issue as
being to make it runtime configurable rather than as a compile-time
option, even though 90% of the users out there won't understand the
difference nor would know how to correctly set it (and, in many cases,
may end up making the wrong decision because it's the default for
other platforms, unless we add more code to address this at initdb
time).

Basically, it doesn't sound like you're terribly concerned with the
majority of our user base, even on FreeBSD, and would prefer to try
and browbeat us into doing what you've decided is the correct solution
because it'd work better for you.

I've been guilty of the same in the past, and it's not fun having to
back off from a proposal when it's pointed out that there's a better
option, particularly when it doesn't seem like the alternative is
better for me, but that's just part of working in any large project.

Stephen, please calm down on the hyperbole; seriously, picking another 
db is not an attack.


I was simply asking for a feature that would make my life easier as both 
an admin deploying postgresql and a kernel dev attempting to fix a 
problem.  I'm one guy, probably the only guy right now asking.  Honestly 
the thought of needing to compile two versions of postgresql to compare sysv 
vs. mmap performance would take more time than I would like to devote 
to the issue when my time is already limited.


Again, it was an ask, you are free to do what you like, the same way you 
were free to ignore my advice at pgcon about mmap being less efficient.  
It does not make what I'm saying an "attack".  Just like when 
interviewing people choosing a different candidate for a job is not an 
attack on the other candidates!


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:52 AM, Alvaro Herrera wrote:

Alfred Perlstein wrote:


I am unsure of the true overhead of making this a runtime tunable so
pardon if I'm asking for "a lot".  From the perspective of both an
OS developer and postgresql user (I am both) it really makes more
sense to have it a runtime tunable for the following reasons:

 From an OS developer's perspective, making this a runtime tunable allows
us to do the testing much more easily (instead of needing two compiled versions).
 From a sysadmin's perspective, it makes switching to/from it a LOT easier
in case the new mmap code exposes a stability or performance bug.

In this case, AFAICS the only overhead of a runtime option (what we call
a GUC) is the added potential for user confusion, and the necessary
documentation.  If we instead go for a compile-time option, both items
become smaller.

In any case, I don't see that there's much need for a runtime option,
really; you already know that the mmap code path is slower in FreeBSD.
You only need to benchmark both options once the FreeBSD vm code itself
is fixed, right?

In fact, it might not even need to be a configure option; I would
suggest a pg_config_manual.h setting instead, and perhaps tweaks to the
src/template/freebsd file to enable it automatically on the "broken"
FreeBSD releases.  We could then, in the future, have the template
itself turn the option off for the future FreeBSD release that fixes the
problem.

That is correct, until you're in prod and suddenly one option becomes 
unstable, or you want to try a quick kernel patch without rebooting.


Look, this is an argument I've lost time and time again in open source 
software communities; the idea of a runtime option as opposed to a 
compile-time one really seems to hit people the wrong way.


I think from now on it just makes sense to sit back and let whatever 
happens happen.


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is "mysql".

Personally, arguments in that vein remove just about any incentive
I have to work on the problem.


I was just explaining that we have a timeline over here, and while that 
may disincentivize you from providing what we need, it would be very unfair.


In that I mean sometimes the reality of a situation can be inconvenient 
and for that I do apologize.


What I am seeing here is unfortunately a very strong departure from 
FreeBSD support by the community from several of the developers.  In 
fact over drinks at pgcon last year there were a TON of jokes making fun 
of FreeBSD users and developers which I took in stride as professional 
joking with alcohol involved.  I thought it was pretty funny.  However, a 
year later, I realize that there appears to be a real problem with 
FreeBSD support in the pgsql community.


There are other Linux centric dbs to pick from.  If pgsql is just 
another Linux centric DB then that is unfortunate but something I can 
deal with.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:51 AM, Andrew Dunstan wrote:


On 04/21/2014 12:44 PM, Alfred Perlstein wrote:

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse 
users. If a packager can pick a default surely they can pick build 
options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who, in a db stack, sits both below your 
software (as a pgsql user of 15 years) and above it (as a FreeBSD kernel 
dev of 15 years) what they "really need".






I told you what *we* (i.e. the PostgreSQL community) need, IMNSHO (and 
speaking as a Postgres developer and consultant of 10 or so years 
standing).


How high on the hierarchy of PostgreSQL's "needs" is making a single 
option a runtime tunable versus a compile-time thing?  I mean, seriously, 
you mean to stick on this one point when one of your users is asking you 
about this?  That is pretty concerning to me.


-Alfred







Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 4:10 AM, Andres Freund wrote:

Hi,

On 2014-04-20 11:24:38 +0200, Palle Girgensohn wrote:

I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
wondering who to poke to mitigate the problem. In reference to this thread [1], 
who were the FreeBSD people that Francois mentioned? If mmap needs to perform 
well in the kernel, I'd like to know of someone with FreeBSD kernel knowledge 
who is interested in working on mmap performance. As for whether mmap is indeed 
the culprit: I've just tested 9.2.8 vs 9.3.4, but I never isolated the mmap patch, 
although I believe Francois did just that with similar results.

If there are indeed such large regressions on FreeBSD we need to treat
them as postgres regressions. It's nicer not to add config options for
things that don't need it, but apparently that's not the case here.

Imo this means we need to add a GUC to control whether anon mmap() or sysv
shmem is to be used. In 9.3.

Greetings,

Andres Freund

Andres, thank you.  Speaking as a FreeBSD developer that would be a good 
idea.


--
Alfred Perlstein





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse users. 
If a packager can pick a default surely they can pick build options 
too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who, in a db stack, sits both below your 
software (as a pgsql user of 15 years) and above it (as a FreeBSD kernel 
dev of 15 years) what they "really need".


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:34 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

There is definitely hope; however, changes to the FreeBSD vm are
taken as seriously as changes to the core of Postgresql's store.
In addition, changes to the vm are somewhere in the realm of complexity
of Postgresql's store as well, so a fix may not be coming in the next few
days/weeks, but rather a month or two.  I am not sure if an easy fix
is available in FreeBSD, but we will see in short order.

This has been known for over a year.. :(

I know!  I remember warning y'all about it back at pgcon last year. :)



I need to do some research.  I work with Adrian (FreeBSD kernel dev
mentioned earlier in the thread); I'll grab him today and discuss
what the issue may be.

Hopefully that'll get things moving in the right direction, finally..
Sure, to be fair, we are under the gun here for a product; it may just 
mean that the end result of that conversation is "mysql".


I'm hoping we can use Postgresql, as I've been a huge fan since 1999.  I 
based my first successful project on it and had a LOT of help from the 
pgsql community, including Tom and Bruce, and we even contracted Vadim 
for some work on incremental vacuums!


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:24 AM, Andrew Dunstan wrote:


On 04/21/2014 11:59 AM, Alfred Perlstein wrote:

On 4/21/14 8:45 AM, Andrew Dunstan wrote:


On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund <and...@2ndquadrant.com> wrote:


On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > If there are indeed such large regressions on FreeBSD we need to treat
> > them as postgres regressions. It's nicer not to add config options for
> > things that don't need it, but apparently that's not the case here.
>
> > Imo this means we need to add a GUC to control whether anon mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch to mmap
> was so we would no longer have to explain about SysV shm configuration.

It's still explained in the docs and one of the dynshm implementations
is based on sysv shmem. So I don't see this as a convincing reason.


Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.


There's also the fact that even if it's changed in FreeBSD, that 
might be something that takes years to trickle out to whatever 
stable release people are actually using.


But do we really want a *guc* for it though? Isn't it enough (and 
in fact better) with a configure switch to pick the implementation 
when multiple are available, that could then be set by default for 
example by the freebsd ports build? That's a lot less "overhead" to 
keep dragging around...





That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.


I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for "a lot".  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense 
to have it a runtime tunable for the following reasons:


From an OS developer's perspective, making this a runtime tunable allows 
us to do the testing much more easily (instead of needing two compiled versions).
From a sysadmin's perspective, it makes switching to/from it a LOT easier 
in case the new mmap code exposes a stability or performance bug.





1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only add 
them when we must. The more there are the more we confuse users. If a 
packager can pick a default surely they can pick build options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a user 
and a fan of the system. :)




-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:13 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

Can the package builder not set the default for the runtime tunable?

Yeah, I was thinking about that also, but at least in this case it seems
pretty clear that the 'right' answer is known at build time.


Honestly, we're about to select a db platform for another FreeBSD-based
system we are building; I'm strongly hoping that we can get back
to sysvshm easily, otherwise we may have to select another store.

Is there no hope of this getting fixed in FreeBSD..?  PG wouldn't be the
only application impacted by this, I'm sure...
There is definitely hope; however, changes to the FreeBSD vm are taken as 
seriously as changes to the core of Postgresql's store. In addition, 
changes to the vm are somewhere in the realm of complexity of Postgresql's 
store as well, so a fix may not be coming in the next few days/weeks, but 
rather a month or two.  I am not sure if an easy fix is available in 
FreeBSD, but we will see in short order.


I need to do some research.  I work with Adrian (FreeBSD kernel dev 
mentioned earlier in the thread); I'll grab him today and discuss what 
the issue may be.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 8:58 AM, Tom Lane wrote:

Andres Freund  writes:

On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:

That seems to make more sense. I can't imagine why this would be a runtime
parameter as opposed to build time.

Because that implies that packagers and porters need to make that
decision. If it's a GUC people can benchmark it and decide.

As against that, the packager would be more likely to get it right
(or even to know that there's an issue).


Can the package builder not set the default for the runtime tunable?

Honestly, we're about to select a db platform for another FreeBSD-based 
system we are building; I'm strongly hoping that we can get back to 
sysvshm easily, otherwise we may have to select another store.


-Alfred (who still remembers back when Tom had a login on our primary db 
to help us. :) )






Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 8:45 AM, Andrew Dunstan wrote:


On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund <and...@2ndquadrant.com> wrote:


On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > If there are indeed such large regressions on FreeBSD we need to treat
> > them as postgres regressions. It's nicer not to add config options for
> > things that don't need it, but apparently that's not the case here.
>
> > Imo this means we need to add a GUC to control whether anon mmap() or sysv
> > shmem is to be used. In 9.3.
>
> I will resist this mightily.  One of the main reasons to switch to mmap
> was so we would no longer have to explain about SysV shm configuration.

It's still explained in the docs and one of the dynshm implementations
is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.


There's also the fact that even if it's changed in FreeBSD, that 
might be something that takes years to trickle out to whatever stable 
release people are actually using.


But do we really want a *guc* for it though? Isn't it enough (and in 
fact better) with a configure switch to pick the implementation when 
multiple are available, that could then be set by default for example 
by the freebsd ports build? That's a lot less "overhead" to keep 
dragging around...





That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.


I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for "a lot".  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense to 
have it a runtime tunable for the following reasons:


From an OS developer's perspective, making this a runtime tunable allows 
us to do the testing much more easily (instead of needing two compiled versions).
From a sysadmin's perspective, it makes switching to/from it a LOT easier in 
case the new mmap code exposes a stability or performance bug.


-Alfred










[HACKERS] PGCON meetup FreeNAS/FreeBSD: In Ottawa Tue & Wed.

2013-05-20 Thread Alfred Perlstein

Hello PostgreSQL Hackers,

I am now in Ottawa; last week we wrapped up the BSDCon, and I was hoping 
to chat with a few Postgresql developers in person about using 
Postgresql in FreeNAS and offering it as an extension to the platform as 
a plug-in technology.  Unfortunately, due to time constraints, I cannot 
attend the entire conference and am only in town until Wednesday at noon.


I'm hoping there's a good time to talk to a few developers about 
Postgresql + FreeNAS before I have to depart back to the bay area.


Some info on me:  My name is Alfred Perlstein, I am a FreeBSD developer 
and FreeNAS project lead.  I am the VP of Software Engineering at 
iXsystems.  I have been a fan of Postgresql for many years.  In the 
early 2000s we built a high-speed web tracking application on top of 
Postgresql and worked closely with the community to shake out 
performance problems and bugs, so closely that Tom Lane and Vadim Mikheev 
had logins on our box.  Since that time I have tried to get Postgresql into 
as many places as possible.


Some info on the topics I wanted to briefly discuss:

1) Using Postgresql as the config store for FreeNAS.
We currently use SQLite; it fits our needs until we get to the point 
of replication between HA (high availability) units, where we are forced 
to manually sync data between configurations.  A discussion on how we 
might do this better using Postgresql, while still maintaining our ease 
of config export (single file) and small footprint would be interesting.


2) Postgresql plugin for FreeNAS.
Flip a switch and suddenly your file server is also serving enterprise 
data.  We currently have a plug-in architecture, but would like to 
discuss the possibility of a tighter integration so that Postgresql 
looks like a more cohesive addition to FreeNAS.


3) Statistic monitoring / EagleEye
In FreeBSD/FreeNAS I have developed a system called EagleEye. EagleEye 
is a system where all mibs are easily exportable, with timestamps, in a 
common format (for now YAML and a modified CSV), which is then consumed by a 
utility that produces graphs.  The entire point of EagleEye is 
to eventually upstream the modifications to future proof statistics 
tracking into the FreeBSD and FreeNAS systems.  I have spoken with some 
Illuminos/ZFS developers and they are interested as well.


I think that is all I have, please drop me a note if you'll have some 
time in Ottawa today, tomorrow or early Wednesday.  I'd love to discuss 
and buy some beers for the group.


thank you,
-Alfred Perlstein
VP Software Engineering, iXsystems.





Utilizing "direct writes" Re: [HACKERS] File system performance and pg_xlog

2001-05-05 Thread Alfred Perlstein

* Marko Kreen <[EMAIL PROTECTED]> [010505 17:39] wrote:
> 
> There already exist bazillion filesystems, _some_ of them should
> be usable for PostgreSQL too :)
> 
> Besides resource waste there are others problems with app-level
> fs:
> 
> * double-buffering and incompatibilities of avoiding that

Depends on the OS; most operating systems, like FreeBSD and Solaris,
offer character device access, which means that the OS will DMA
directly from the process's address space.  Avoiding the double
copy is trivial, except that one must align and size writes correctly,
generally on 512-byte boundaries and in 512-byte increments.

> * a lot of code should be reimplemented that already exists
>   in today's OS'es

That's true.

> * you lose all of UNIX user-space tools

Even worse. :)

> * the speed difference will not be very big.  Remeber: it _was_
>   big on OS'es and fs' in year 1990.  Today's fs are lot of
>   better and there should be a os/fs combo that is 95% perfect.

Well, here's an idea: has anyone tried using the "direct write"
interface that some OSes offer?  I doubt FreeBSD does, but I'm
positive that Solaris offers it, as possibly does IRIX.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] elog(LOG), elog(DEBUG)

2001-05-05 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [010505 02:06] wrote:
> There's a TODO item to make elog(LOG) a separate level.  I propose the
> name INFO.  It would be identical to DEBUG in effect, only with a
> different label.  Additionally, all DEBUG logging should either be
> disabled unless the debug_level is greater than zero, or alternatively
> some elog(DEBUG) calls should be converted to INFO conditional on a
> configuration setting (like log_pid, for example).
> 
> The stricter distinction between DEBUG and INFO would also yield the
> possibility of optionally sending DEBUG output to the frontend, as has
> been requested a few times.

INFO makes sense, as AFAIK it maps to a syslog level.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 15:20] wrote:
> > The "problem" with log based filesystems is that they most likely
> > do not know the consequences of a write so an fsync on a file may
> > require double writing to both the log and the "real" portion of
> > the disk.  They can also exhibit the problem that an fsync may
> > cause all pending writes to require scheduling unless the log is
> > constructed on the fly rather than incrementally.
> 
> Yes, this double-writing is a problem.  Suppose you have your WAL on a
> separate drive.  You can fsync() WAL with zero head movement.  With a
> log based file system, you need two head movements, so you have gone
> from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling.  I wonder if an fsync() may cause ALL pending
meta-data to be updated (even metadata not related to the 
postgresql files).

Do you know if reiser or xfs have this problem?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 14:01] wrote:
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
> 
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?
> 
> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

The "problem" with log based filesystems is that they most likely
do not know the consequences of a write so an fsync on a file may
require double writing to both the log and the "real" portion of
the disk.  They can also exhibit the problem that an fsync may
cause all pending writes to require scheduling unless the log is
constructed on the fly rather than incrementally.

There was also the problem, brought up recently, that certain
versions (maybe all?) of Linux perform fsync() in a very
non-optimal manner; if the user is able to use the O_FSYNC open
flag rather than calling fsync(), he may see a performance increase.

But his guess is probably nearly as good as mine. :)
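
A minimal sketch (with an invented payload and temp file) of the two
approaches contrasted above: an explicit fsync() after each write,
versus opening the file with the synchronous-write flag (spelled
O_FSYNC on the BSDs, O_SYNC in POSIX) so each write() itself blocks
until the data is stable and no second system call is needed:

```python
import os
import tempfile

payload = b"wal record\n"
fd0, path = tempfile.mkstemp()
os.close(fd0)

# Approach 1: plain write, then an explicit fsync() -- two syscalls
# per flushed record.
fd = os.open(path, os.O_WRONLY)
os.write(fd, payload)
os.fsync(fd)
os.close(fd)

# Approach 2: open with O_SYNC so every write() returns only once the
# data is on stable storage -- one syscall per record.
fd = os.open(path, os.O_WRONLY | os.O_SYNC | os.O_APPEND)
os.write(fd, payload)
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
os.unlink(path)
```

Whether O_SYNC actually beats write()+fsync() depends on the kernel's
implementation of each path, which is exactly the point of the thread.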


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] 7.1 startup recovery failure

2001-05-01 Thread Alfred Perlstein

* Rod Taylor <[EMAIL PROTECTED]> [010430 22:10] wrote:
> Corrupted or not, after a crash take a snapshot of the data tree
> before firing it back up again.  Doesn't take that much time
> (especially with a netapp filer) and it allows for a virtually
> unlimited number of attempts to solve the trouble or debug.
> 

You run your database over NFS?  They must be made of steel. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010430 08:37] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > It would be very helpful if the COPY command could be expanded
> > in order to provide positional parameters.
> 
> I think it's a bad idea to try to expand COPY into a full-tilt data
> import/conversion utility, which is the direction that this sort of
> suggestion is headed in.  COPY is designed as a simple, fast, reliable,
> low-overhead data transfer mechanism for backup and restore.  The more
> warts we add to it, the less well it will serve that purpose.

Honestly, it would be hard for COPY to serve people's needs any
less well; it really makes sense for it to be able to parse
positional parameters, for both speed and correctness.

> Example: if we allow selective column import, what do we do with missing
> columns?

What is already done: if you initiate a copy into a 5-column table
using only 4 columns of copy data, the fifth is left empty.

> Must COPY now be able to handle insertion of default-value
> expressions?

No; COPY should stay what it is, simple, but at the same time useful
enough for bulk transfer without painful contortions and fear
of modifying tables.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



[HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

It would be very helpful if the COPY command could be expanded
in order to provide positional parameters.

I noticed a while back that it didn't, and it can really hurt
someone who tries to use pg_dump to move data from one database
to another when the fields in the tables happen to have been
created in different orders.

Basically:
COPY "webmaster" FROM stdin;

could become:
COPY "webmaster" FIELDS "id", "name", "ssn" FROM stdin;

this way, when sourcing, it would know where to place the
fields.
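
The reordering this proposal asks for can be sketched outside the
server as well; here a hypothetical loader maps the dump's column
order onto the target table's column order by name, filling in
missing columns with empty values (names and data are made up):

```python
def reorder_row(row: str, dump_cols: list, table_cols: list) -> str:
    """Rearrange one tab-separated COPY data row whose values are in
    dump_cols order so that they match table_cols order; columns the
    dump lacks become empty strings."""
    values = dict(zip(dump_cols, row.split("\t")))
    return "\t".join(values.get(col, "") for col in table_cols)

# The dump created the fields as name, ssn, id; the target table was
# created as id, name, ssn.
row = reorder_row("alice\t123-45-6789\t7",
                  ["name", "ssn", "id"],
                  ["id", "name", "ssn"])
# row == "7\talice\t123-45-6789"
```

Doing this inside COPY itself would avoid the extra parsing pass,
which is the speed argument made above.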

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010429 23:12] wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > big problem is that there is no good way to make the symlinks reliable
> > because in a crash, the symlink could point to a table creation that got
> > rolled back or the renaming of a table that got rolled back.
> 
> Yes.  Have you already forgotten the very long discussion we had about
> this some months back?  There is no way to provide a reliable symlink
> mapping without re-introducing all the same problems that we went to
> numeric filenames to avoid.  Now if you want an *UNRELIABLE* symlink
> mapping, maybe we could talk about it ... but IMHO such a feature would
> be worse than useless.  Murphy's law says that the symlinks would be
> right often enough to mislead dbadmins into trusting them, and wrong
> exactly when it would do the most damage to trust them.  The same goes
> for other methods of unreliably exporting the name-to-number mapping,
> such as dumping it into a flat file.
> 
> We do need to document how to get the mapping (ie, select relfilenode,
> relname from pg_class).  But I really doubt that an automated method
> for exporting the mapping would be worth the cycles it would cost,
> even if it could be made reliable which it can't.

Perhaps an external tool could rebuild the symlink state when run
on an offline database.  But I'm sure you have more important
things to do. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010429 20:14] wrote:

> Yes, I like that idea, but the problem is that it is hard to update just
> one table in the file.  You sort of have to update the entire file each
> time a table changes.  That is why I liked symlinks because they are
> per-table, but you are right that the symlink creation could fail
> because the new table file was never created or something, leaving the
> symlink pointing to nothing.  Not sure how to address this.  Is there a
> way to update a flat file when a single table changes?

Sort of, if that flat file is in the form of:
123456;"tablename   "
33;"another_table   "

i.e., each line has a fixed length.
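
With fixed-length records, a single entry can be rewritten in place
with one positional write, without rewriting the whole file; a sketch
(the 32-byte record layout is invented for illustration):

```python
import os
import tempfile

RECLEN = 32  # fixed bytes per record, including the trailing newline

def format_record(oid: int, name: str) -> bytes:
    body = '%d;"%s"' % (oid, name)
    rec = body.ljust(RECLEN - 1) + "\n"
    assert len(rec) == RECLEN, "record overflows the fixed length"
    return rec.encode()

def update_record(fd: int, index: int, oid: int, name: str) -> None:
    """Overwrite record `index` in place, leaving the rest untouched."""
    os.pwrite(fd, format_record(oid, name), index * RECLEN)

# Demo: write two records, then rename the second one in place.
fd, path = tempfile.mkstemp()
os.write(fd, format_record(123456, "tablename") +
             format_record(33, "another_table"))
update_record(fd, 1, 33, "renamed_table")
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 4 * RECLEN)
os.close(fd)
os.unlink(path)
```

The catch, of course, is crash safety: a torn in-place write can leave
a half-updated record, which is the reliability worry in this thread.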


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: SAP-DB

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010429 10:44] wrote:
> > > I swore I'd never post to the hackers list again, but this is an amazing
> > > statement by Bruce.
> > > 
> > > Boy, the robustness of the software is determined by the number of characters
> > > in the directory name?
> > > 
> > > By the languages used?
> > 
> > [Snip]
> > 
> > My guess is that Bruce was implying that the code was obfuscated. It is a
> > common trick for closed source to be "open" but not really.
> > 
> > I don't think it was any sort of technology snobbery. Far be it for me to
> > suggest an explanation to the words of others, that is just how I read it.
> 
> I don't think they intentionally confused the code.
> 
> The real problem I see is that it was very hard for me to find anything
> in the code.  I would be interested to see if others can find stuff.

I think this is a general problem in a lot of projects: you open up
foo.c and say "what the heck is this...", and after a few hours of
studying the source you finally figure out it is something that does
a minuscule part X of massive part Y.  By then you're too engrossed
to write a little banner for the file or directory explaining what
it's for, and you incorrectly assume that even if you did, it
wouldn't help another user unless he went through the same painful
steps that you did.

Been there, done that.. er, actually, still there, mostly still
doing that.  :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Re: 7.1 vacuum

2001-04-27 Thread Alfred Perlstein

* mlw <[EMAIL PROTECTED]> [010427 05:50] wrote:
> Alfred Perlstein wrote:
> > 
> > * Magnus Naeslund(f) <[EMAIL PROTECTED]> [010426 21:17] wrote:
> > > How does 7.1 work now with the vacuum and all?
> > >
> > > Does it go for indexes by default, even when i haven't run a vacuum at all?
> > > Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
> > > that for all of the vacuum?
> > >
> > > An 7.0.3 db we have here we are forced to run vacuum every hour to get an
> > > acceptable speed, and while doing that vacuum (5-10 minutes) it totaly
> > > blocks our application that's mucking with the db.
> > 
> > http://people.freebsd.org/~alfred/vacfix/
> 
> What's the deal with vacuum lazy in 7.1? I was looking forward to it. It was
> never clear whether or not you guys decided to put it in.
> 
> If it is in as a feature, how does one use it?
> If it is a patch, how does one get it?

If you actually download and read the enclosed READMEs, it's pretty
clear.

> If it is neither a patch nor an existing feature, has development stopped?

I have no idea, I haven't been tracking postgresql all that much 
since leaving the place where we contracted that work.


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] 7.1 vacuum

2001-04-26 Thread Alfred Perlstein

* Magnus Naeslund(f) <[EMAIL PROTECTED]> [010426 21:17] wrote:
> How does 7.1 work now with the vacuum and all?
> 
> Does it go for indexes by default, even when i haven't run a vacuum at all?
> Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
> that for all of the vacuum?
> 
> An 7.0.3 db we have here we are forced to run vacuum every hour to get an
> acceptable speed, and while doing that vacuum (5-10 minutes) it totaly
> blocks our application that's mucking with the db.

http://people.freebsd.org/~alfred/vacfix/

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] CVS commits

2001-04-02 Thread Alfred Perlstein

* Michael Meskes <[EMAIL PROTECTED]> [010402 04:41] wrote:
> Will current CVS commits make it into 7.1? Or do I have to use a different
> branch. I just committed a minor patch to keep the parsers in sync and also
> committed a bug fix last week. Both should be in 7.1.

You should be able to check that via the cvsweb interface off the
developer's corner on the postgresql website.  Just find your files
and see if there's a new tag for 7.1 and whether or not your code
is against HEAD or against the tag (if it exists) for 7.1.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [ADMIN] Re: [HACKERS] Re: [PORTS] pgmonitor and Solaris

2001-03-29 Thread Alfred Perlstein

* Karel Zak <[EMAIL PROTECTED]> [010329 03:10] wrote:
> On Wed, Mar 28, 2001 at 04:10:52PM -0800, Alfred Perlstein wrote:
> > * Tom Lane <[EMAIL PROTECTED]> [010328 16:07] wrote:
> > > Larry Rosenman <[EMAIL PROTECTED]> writes:
> > > > FYI, the WU-FTPD code (2.6.0 or better) has a couple of more platforms 
> > > > including UnixWare.  The UnixWare code will need /dev/kmem permission to 
> > > > change it's stuff, so I don't know whether we want to do this or not, but 
> > > > if people are looking at the ps stuff, please look at this as well.
> > > 
> > > Well, *I* sure wouldn't run Postgres with write permission on /dev/kmem.
> > > Especially not just to make ps_status work...
> > 
> > Wow... is this all for "pgmonitor"?
> > 
> > sorry, just my opinion...
> > 
> > If it for pgmonitor then you guys ought to just mark it broken on
> > these platforms, the non-"ps based" solution could have been
> > implemented with all the time wasted trying to get the "ps based"
> > hack working. :(
> 
>  If is still a mind for (IMHO) terrible work with 'ps'... not sure if 
> it's interesting, but exist cross-platforms library for processes 
> information loading - the libgtop. Supports:
> 
>   * All versions of Linux
>   * FreeBSD, NetBSD and OpenBSD
>   * BSD/OS
>   * Digital Unix
>   * Solaris
>   * ..and may be others
> 
>  Sure, it's in C :-)

And infected with the GPV, not even the LGPV... not worth it. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [ADMIN] Re: [HACKERS] Re: [PORTS] pgmonitor and Solaris

2001-03-28 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010328 16:07] wrote:
> Larry Rosenman <[EMAIL PROTECTED]> writes:
> > FYI, the WU-FTPD code (2.6.0 or better) has a couple of more platforms 
> > including UnixWare.  The UnixWare code will need /dev/kmem permission to 
> > change it's stuff, so I don't know whether we want to do this or not, but 
> > if people are looking at the ps stuff, please look at this as well.
> 
> Well, *I* sure wouldn't run Postgres with write permission on /dev/kmem.
> Especially not just to make ps_status work...

Wow... is this all for "pgmonitor"?

sorry, just my opinion...

If it is for pgmonitor, then you guys ought to just mark it broken
on these platforms; the non-"ps based" solution could have been
implemented in all the time wasted trying to get the "ps based"
hack working. :(

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] CVS tags for betas and release candidate

2001-03-27 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [010327 05:31] wrote:
> On Tue, 27 Mar 2001, Alfred Perlstein wrote:
> 
> > * The Hermit Hacker <[EMAIL PROTECTED]> [010327 04:53] wrote:
> > > On Mon, 26 Mar 2001, Matthias Juchem wrote:
> > >
> > > > Hi there.
> > > >
> > > > I was just looking for the CVS tags for downloading the beta6 and the
> > > > RC1 of 7.1 but there are only the following tags:
> > > >
> > > > REL_7_1_BETA2
> > > > REL_7_1_BETA3
> > > > REL_7_1
> > > >
> > > > Aren't there tags for the versions I am looking for?
> > >
> > > Nope ... doing the tags didn't work as well as was hoped, so we've just
> > > been using date ranges instead ... release itself will be tag'd ...
> >
> > You know you can nuke tags right?
> 
> no, this one I didn't know ... how? :)


  If you use `cvs tag -d symbolic_tag...', the symbolic tag you
  specify is deleted instead of being added.  Warning: Be very
  certain of your ground before you delete a tag; doing this
  effectively discards some historical information, which may
  later turn out to have been valuable.



-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] CVS tags for betas and release candidate

2001-03-27 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [010327 04:53] wrote:
> On Mon, 26 Mar 2001, Matthias Juchem wrote:
> 
> > Hi there.
> >
> > I was just looking for the CVS tags for downloading the beta6 and the
> > RC1 of 7.1 but there are only the following tags:
> >
> > REL_7_1_BETA2
> > REL_7_1_BETA3
> > REL_7_1
> >
> > Aren't there tags for the versions I am looking for?
> 
> Nope ... doing the tags didn't work as well as was hoped, so we've just
> been using date ranges instead ... release itself will be tag'd ...

You know you can nuke tags right?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] pgindent run?

2001-03-22 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010322 07:12] wrote:
> > It seems that you guys are dead set on using this pgindent tool,
> > this is cool, we'd probably use some indentation tool on the FreeBSD
> > sources if there was one that met our code style(9) guidelines.
> 
> I would liken running pgindent to having a nice looking store or
> website.  No one is going to go to a website or a store only because it
> looks nice, but having it look nice does keep people coming back.

I'm not saying I don't like pgindent, I'm saying I don't like
pgindent's effect on the CVS history.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] pgindent run?

2001-03-22 Thread Alfred Perlstein
> > [...] making pgindent as patches are applied should work.
> 
> I imagine we can get CVS to do that automatically.  The number of patch
> on top of another patch is pretty rare and it would solve the other
> stated problems.
> 
> 
> > There's also the argument that a developer's pgindent may force a
> > contributor to resolve conflicts, while this is true, it's also
> > true that you guys expect diffs to be in context format, comments
> > to be in english, function prototypes to be new style, etc, etc..
> > 
> > I think contributors can deal with this.
> 
> If someone submits a massive patch, and we apply it, all patches after
> that that they give us will not apply cleanly because they still have
> the old format.  The other argument for not doing pgindent on cvs commit
> is that if someone is working in an area of the code, they should be
> able to format that code as they like to see it.  They may be working in
> there for months.  Only during release is their _style_ removed.

And how exactly does that make sense?  You guys have a long beta
period; this leaves people who sit in their own little corner of the
code (ODBC, for instance) stuck with a major conflict resolution
after release, when they go to add patches they held off on during
beta.  You also suddenly make the code look completely foreign to
the contributor... what if he has a major issue with the style of
pgindent?  It would make a lot more sense to explain this up front:
  "Say, Alfred, I noticed you use two space indents, that's gross,
   have you run your code through pgindent as explained in the
   contributor's guide at http://www.postgresql.org/faq/cont?"

> On a side note, the idea of having people submit patches only against
> current CVS seems bad to me.  If people are running production machines
> and they develop a patch and test it there, I want the patch that works
> on their machine and can make sure it applies here.  Having them
> download CVS and do the merge themselves seems really risky, especially
> because they probably can't test the CVS in production.  The CVS may not
> even run properly.

Well, that's true, but how confident are you that a patch applied to
the CVS version has the same effect as it does on the -release version?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] pgindent run?

2001-03-21 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010321 21:14] wrote:
> > The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > > and most times, those have to be merged into the source tree due to
> > > extensive changes anyway ... maybe we should just get rid of the use of
> > > pgindent altogether?
> > 
> > I think pgindent is a good thing; the style of different parts of the
> > code would vary too much without it.  I'm only unhappy about the risk
> > issues of running it at this late stage of the release cycle.
> 
> This is the usual discussion.  Some like it, some don't like the risk,
> some don't like the timing.  I don't think we ever came up with a better
> time than before RC, though I think we could do it a little earlier in
> beta if people were not holding patches during that period.  It is the
> beta patching folks that we have the most control over.

It seems that you guys are dead set on using this pgindent tool,
and that's cool; we'd probably use some indentation tool on the
FreeBSD sources if there were one that met our code style(9)
guidelines.

With that said, it really scares the crud out of me to see those
massive pgindent runs right before you guys do a release.

It would make a lot more sense to force a pgindent run after applying
each patch.  This way you don't lose the history.

You want to be upset with yourself, Bruce?  Go into a directory and type:

cvs annotate 

cvs annotate is a really, really handy tool; unfortunately these
indent runs defeat it, as well as doing a major job of obfuscating
the code changes.

It's not like you guys have a massive devel team with new people each
week facing a steep committer learning curve; running pgindent as
patches are applied should work.

There's also the argument that a developer's pgindent run may force a
contributor to resolve conflicts; while this is true, it's also
true that you guys expect diffs to be in context format, comments
to be in English, function prototypes to be new style, etc., etc.

I think contributors can deal with this.

just my usual 20 cents. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()

2001-03-20 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010320 14:10] wrote:
> > > > The patch below adds:
> > > >
> > > > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
> > > pages
> > > >   through mmap() works.  This is taken from Joerg Schilling's star.
> > > > - configure.in:  A_FUNC_SMMAP
> > > > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> > > >   a region of shared memory instead of using System V IPC.
> > > >
> > > > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > > > and Linux.
> > 
> > This is a really bad idea performance wise.  Solaris has a special
> > code path for SYSV shared memory that doesn't require tons of swap
> > tracking structures per-page/per-process.  FreeBSD also has this
> > optimization (it's off by default, but should work since FreeBSD
> > 4.2 via the sysctl kern.ipc.shm_use_phys=1)
> 
> > 
> > Both OSes use a trick of making the pages non-pageable, this allows
> > significant savings in kernel space required for each attached
> > process, as well as the use of large pages which reduce the amount
> > of TLB faults your processes will incur.
> 
> That is interesting.  BSDi has SysV shared memory as non-pagable, and I
> always thought of that as a bug.  Seems you are saying that having it
> pagable has a significant performance penalty.  Interesting.

Yes, having it pageable is actually sort of bad.

It doesn't allow you to do several important optimizations.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Final Call: RC1 about to go out the door ...

2001-03-20 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010320 10:21] wrote:
> The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > Speak now, or forever hold your piece (where forever is the time
> > between now and RC1 is packaged) ...
> 
> I rather hope it's *NOT* 

And still no LAZY vacuum.  *sigh*

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] ODBC/FreeBSD/LinuxEmulation/RPM?

2001-03-19 Thread Alfred Perlstein

* Alfred Perlstein <[EMAIL PROTECTED]> [010319 11:27] wrote:
> * Larry Rosenman <[EMAIL PROTECTED]> [010319 10:35] wrote:
> > 
> > Is there any way to get just the ODBC RPM to install with OUT
> > installing the whole DB? 
> > 
> > I have a strange situation:
> > 
> > StarOffice 5.2 (Linux) Running under FreeBSD Linux Emulation
> > PG running NATIVE.
> > 
> > I want the two to talk, using ODBC.
> > 
> > How do I make this happen?
> 
> rpm2cpio  > pg_rpmfile.cpio
> cpio -i < pg_rpmfile.cpio
> tar xzvf pg_rpmfile.tgz

Sorry, I was just waking up when I wrote this... the idea is to
extract the RPM and then just grab the required ODBC files.

best of luck,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] ODBC/FreeBSD/LinuxEmulation/RPM?

2001-03-19 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [010319 10:35] wrote:
> 
> Is there any way to get just the ODBC RPM to install with OUT
> installing the whole DB? 
> 
> I have a strange situation:
> 
> StarOffice 5.2 (Linux) Running under FreeBSD Linux Emulation
> PG running NATIVE.
> 
> I want the two to talk, using ODBC.
> 
> How do I make this happen?

rpm2cpio  > pg_rpmfile.cpio
cpio -i < pg_rpmfile.cpio
tar xzvf pg_rpmfile.tgz

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()

2001-03-19 Thread Alfred Perlstein

WOOT WOOT! DANGER WILL ROBINSON!

> - Original Message -
> From: "Christian Weisgerber" <[EMAIL PROTECTED]>
> Newsgroups: list.vorbis.dev
> To: <[EMAIL PROTECTED]>
> Sent: Saturday, March 17, 2001 12:01 PM
> Subject: [vorbis-dev] ogg123: shared memory by mmap()
> 
> 
> > The patch below adds:
> >
> > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
> pages
> >   through mmap() works.  This is taken from Joerg Schilling's star.
> > - configure.in:  A_FUNC_SMMAP
> > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> >   a region of shared memory instead of using System V IPC.
> >
> > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > and Linux.

This is a really bad idea performance-wise.  Solaris has a special
code path for SYSV shared memory that doesn't require tons of swap
tracking structures per-page/per-process.  FreeBSD also has this
optimization (it's off by default, but should work since FreeBSD
4.2 via the sysctl kern.ipc.shm_use_phys=1)

Both OSes use a trick of making the pages non-pageable; this allows
significant savings in the kernel space required for each attached
process, as well as the use of large pages, which reduce the number
of TLB faults your processes will incur.

Anyhow, if you could make this a runtime option it wouldn't be so
evil, but as a compile-time option it's a really bad idea for
Solaris and FreeBSD.
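
For contrast, the mmap() side of the trade-off looks roughly like
this: an anonymous shared mapping is inherited across fork(), which
is how a buffer like the one in the patch gets shared between
processes (a portable sketch, not the vorbis code; the message is
invented):

```python
import mmap
import os

# Anonymous mapping; on Unix the default is MAP_SHARED, so writes by
# either process after fork() are visible to the other.
buf = mmap.mmap(-1, 4096)

pid = os.fork()
if pid == 0:
    # Child: write into the shared region and exit.
    buf.seek(0)
    buf.write(b"hello from child")
    os._exit(0)

os.waitpid(pid, 0)
buf.seek(0)
msg = buf.read(16)
buf.close()
```

The kernel-side cost described above (per-page, per-process swap
tracking) is exactly what the SysV path with non-pageable segments
avoids, which is why the runtime choice matters.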

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [010318 14:17] wrote:
> * Tom Lane <[EMAIL PROTECTED]> [010318 14:55]:
> > Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > >> Just by making a thread call libc changes personality to use thread
> > >> safe routines (I.E. add mutex locking).  Use one thread feature, get
> > >> the whole set...which may not be that bad.
> > 
> > > Actually it can be pretty bad.  Locked bus cycles needed for mutex
> > > operations are very, very expensive, not something you want to do
> > > unless you really really need to do it.
> > 
> > It'd be interesting to try to get some numbers about the actual cost
> > of using a thread-aware libc, on platforms where there's a difference.
> > Shouldn't be that hard to build a postgres executable with the proper
> > library and run some benchmarks ... anyone care to try?
> I can get the code compiled, but don't have the skills to generate
> a test case worthy of anything

There's a 'make test' or something ('regression', maybe?) target that
runs a suite of tests on the database; you could use that as a
bench/timer.  You could also try MySQL's "crashme" script.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* William K. Volkman <[EMAIL PROTECTED]> [010318 11:56] wrote:
> The Hermit Hacker wrote:
> >> 
> > But, with shared libraries, are you really pulling in a "whole
> > thread-support library"?  My understanding of shared libraries (altho it
> > may be totally off) was that instead of pulling in a whole library, you
> > pulled in the bits that you needed, pretty much as you needed them ...
> 
> Just by making a thread call libc changes personality to use thread
> safe routines (I.E. add mutex locking).  Use one thread feature, get
> the whole set...which may not be that bad.

Actually it can be pretty bad.  Locked bus cycles needed for mutex
operations are very, very expensive, not something you want to do
unless you really really need to do it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010316 10:06] wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > Uh - not much time to spend if the statistics should at least
> > be  half  accurate. And it would become worse in SMP systems.
> > So that was a nifty idea, but I think it'd  cause  much  more
> > statistic losses than I assumed at first.
> 
> > Back to drawing board. Maybe a SYS-V message queue can serve?
> 
> That would be the same as a pipe: backends would block if the collector
> stopped accepting data.  I do like the "auto discard" aspect of this
> UDP-socket approach.
> 
> I think Philip had the right idea: each backend should send totals,
> not deltas, in its messages.  Then, it doesn't matter (much) if the
> collector loses some messages --- that just means that sometimes it
> has a slightly out-of-date idea about how much work some backends have
> done.  It should be easy to design the software so that that just makes
> a small, transient error in the currently displayed statistics.

MSGSND(3)  FreeBSD Library Functions Manual  MSGSND(3)


ERRORS
     msgsnd() will fail if:

     [EAGAIN]   There was no space for this message either on the
                queue, or in the whole system, and IPC_NOWAIT was set
                in msgflg.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

> On 3/16/01, 11:10:34 AM, The Hermit Hacker <[EMAIL PROTECTED]> wrote 
> regarding Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC :
> 
> > But, with shared libraries, are you really pulling in a "whole
> > thread-support library"?  My understanding of shared libraries (altho it
> > may be totally off) was that instead of pulling in a whole library, you
> > pulled in the bits that you needed, pretty much as you needed them ...


* Larry Rosenman <[EMAIL PROTECTED]> [010316 10:02] wrote:
> Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT 
> of primitives to go through threads wrappers and scheduling.
> 
> See the doc on the http://UW7DOC.SCO.COM or http://www.lerctr.org:457/ 
> web pages.
> 
> Also, some functions are NOT available without the -Kthread or -Kpthread 
> directives. 

This is true on FreeBSD as well.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Performance monitor signal handler

2001-03-16 Thread Alfred Perlstein

* Jan Wieck <[EMAIL PROTECTED]> [010316 08:08] wrote:
> Philip Warner wrote:
> >
> > But I prefer the UDP/Collector model anyway; it gives use greater
> > flexibility + the ability to keep stats past backend termination, and,as
> > you say, removes any possible locking requirements from the backends.
> 
> OK, did some tests...
> 
> The postmaster can create a SOCK_DGRAM socket at startup and
> bind(2) it to "127.0.0.1:0", which causes the kernel to assign
> a non-privileged port number that then can be read with
> getsockname(2). No other process can have a socket with the
> same port number for the lifetime of the postmaster.
> 
> If the socket gets ready, it'll read one backend message
> from it with recvfrom(2). The fromaddr must be
> "127.0.0.1:xxx" where xxx is the port number the kernel
> assigned to the above socket.  Yes, this is his own one,
> shared with postmaster and all backends.  So both the
> postmaster and the backends can use this one UDP socket,
> which the backends inherit on fork(2), to send messages to
> the collector. If such a UDP packet really came from a
> process other than the postmaster or a backend, well then the
> sysadmin has a more severe problem than manipulated DB
> runtime statistics :-)

Doing this is a bad idea:

a) it allows any program to start spamming localhost:randport with
messages and screw with the postmaster.

b) it may even allow remote people to mess with it (see recent
bugtraq articles about this).

You should use a unix domain socket (at least when possible).

> Running  a 500MHz P-III, 192MB, RedHat 6.1 Linux 2.2.17 here,
> I've been able to loose no single message during the parallel
> regression  test,  if each backend sends one 1K sized message
> per query executed, and the collector simply sucks  them  out
> of  the  socket. Message losses start if the collector does a
> per message idle loop like this:
> 
> for (i=0,sum=0;i<25;i++,sum+=1);
> 
> Uh - not much time to spend if the statistics should at least
> be  half  accurate. And it would become worse in SMP systems.
> So that was a nifty idea, but I think it'd  cause  much  more
> statistic losses than I assumed at first.
> 
> Back to drawing board. Maybe a SYS-V message queue can serve?

I wouldn't say back to the drawing board, I would say two steps back.

What about sending totals instead of deltas?  This would
allow you to lose messages and still maintain accurate stats.

You can also enable SIGIO on the socket, then have a signal handler
buffer packets that arrive when not actively select()ing on the
UDP socket.  You can then use sigsetmask(2) to provide mutual
exclusion with your SIGIO handler and general select()ing on the
socket.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010316 08:16] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> couldn't the syncer process cache opened files? is there any problem I
> >> didn't consider ?
> 
> > 1) IPC latency, the amount of time it takes to call fsync will
> >increase by at least two context switches.
> 
> > 2) a working set (number of files needed to be fsync'd) that
> >is larger than the amount of files you wish to keep open.
> 
> These days we're really only interested in fsync'ing the current WAL
> log file, so working set doesn't seem like a problem anymore.  However
> context-switch latency is likely to be a big problem.  One thing we'd
> definitely need before considering this is to replace the existing
> spinlock mechanism with something more efficient.

What sort of problems are you seeing with the spinlock code?

> Vadim has designed the WAL stuff in such a way that a separate
> writer/syncer process would be easy to add; in fact it's almost that way
> already, in that any backend can write or sync data that's been added
> to the queue by any other backend.  The question is whether it'd
> actually buy anything to have another process.  Good stuff to experiment
> with for 7.2.

The delayed/coalesced fsync looked interesting.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010316 07:11] wrote:
> > > Could anyone consider fork a syncer process to sync data to disk ?
> > > build a shared sync queue, when a daemon process want to do sync after
> > > write() is called, just put a sync request to the queue. this can release
> > > process from blocked on writing as soon as possible. multipile sync
> > > request for one file can be merged when the request is been inserting to
> > > the queue.
> > 
> > I suggested this about a year ago. :)
> > 
> > The problem is that you need that process to potentially open and close
> > many files over and over.
> > 
> > I still think it's somewhat of a good idea.
> 
> I like the idea too, but people want the transaction to return COMMIT
> only after data has been fsync'ed so I don't see a big win.

This isn't simply handing off the sync to the other process; it requires
an ack from the syncer before returning 'COMMIT'.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Xu Yifeng <[EMAIL PROTECTED]> [010316 01:15] wrote:
> Hello Alfred,
> 
> Friday, March 16, 2001, 3:21:09 PM, you wrote:
> 
> AP> * Xu Yifeng <[EMAIL PROTECTED]> [010315 22:25] wrote:
> >>
> >> Could anyone consider fork a syncer process to sync data to disk ?
> >> build a shared sync queue, when a daemon process want to do sync after
> >> write() is called, just put a sync request to the queue. this can release
> >> process from blocked on writing as soon as possible. multipile sync
> >> request for one file can be merged when the request is been inserting to
> >> the queue.
> 
> AP> I suggested this about a year ago. :)
> 
> AP> The problem is that you need that process to potentially open and close
> AP> many files over and over.
> 
> AP> I still think it's somewhat of a good idea.
> 
> I am not a DBMS guru.

Hah, same here. :)

> couldn't the syncer process cache opened files? is there any problem I
> didn't consider ?

1) IPC latency: the amount of time it takes to call fsync will
   increase by at least two context switches.

2) a working set (the number of files needing to be fsync'd) that
   is larger than the number of files you wish to keep open.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Xu Yifeng <[EMAIL PROTECTED]> [010315 22:25] wrote:
> Hello Tom,
> 
> Friday, March 16, 2001, 6:54:22 AM, you wrote:
> 
> TL> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> How many files need to be fsync'd?
> 
> TL> Only one.
> 
> >> If it's more than one, what might work is using mmap() to map the
> >> files in adjacent areas, then calling msync() on the entire range,
> >> this would allow you to batch fsync the data.
> 
> TL> Interesting thought, but mmap to a prespecified address is most
> TL> definitely not portable, whether or not you want to assume that
> TL> plain mmap is ...
> 
> TL> regards, tom lane
> 
> Could anyone consider fork a syncer process to sync data to disk ?
> build a shared sync queue, when a daemon process want to do sync after
> write() is called, just put a sync request to the queue. this can release
> process from blocked on writing as soon as possible. multipile sync
> request for one file can be merged when the request is been inserting to
> the queue.

I suggested this about a year ago. :)

The problem is that you need that process to potentially open and close
many files over and over.

I still think it's somewhat of a good idea.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 17:08] wrote:
> At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
> >* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> >> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >> >
> >> >Lost data is probably better than incorrect data.  Either use locks
> >> >or a copying mechanism.  People will depend on the data returned
> >> >making sense.
> >> >
> >> 
> >> But with per-backend data, there is only ever *one* writer to a given set
> >> of counters. Everyone else is a reader.
> >
> >This doesn't prevent a reader from getting an inconsistant view.
> >
> >Think about a 64bit counter on a 32bit machine.  If you charged per
> >megabyte, wouldn't it upset you to have a small chance of loosing
> >4 billion units of sale?
> >
> >(ie, doing a read after an addition that wraps the low 32 bits
> >but before the carry is done to the top most signifigant 32bits?)
> 
> I assume this means we can not rely on the existence of any kind of
> interlocked add on 64 bit machines?
> 
> 
> >Ok, what what if everything can be read atomically by itself?
> >
> >You're still busted the minute you need to export any sort of
> >compound stat.
> 
> Which is why the backends should not do anything other than maintain the
> raw data. If there is atomic data than can cause inconsistency, then a
> dropped UDP packet will do the same.

The UDP packet (a COPY) can contain a consistent snapshot of the data.
If you have dependencies, you fit a consistent snapshot into a single
packet.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 16:46] wrote:
> At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
> >
> >Lost data is probably better than incorrect data.  Either use locks
> >or a copying mechanism.  People will depend on the data returned
> >making sense.
> >
> 
> But with per-backend data, there is only ever *one* writer to a given set
> of counters. Everyone else is a reader.

This doesn't prevent a reader from getting an inconsistent view.

Think about a 64-bit counter on a 32-bit machine.  If you charged per
megabyte, wouldn't it upset you to have a small chance of losing
4 billion units of sale?

(i.e., doing a read after an addition that wraps the low 32 bits
but before the carry is done to the most significant 32 bits?)

OK, what if everything can be read atomically by itself?

You're still busted the minute you need to export any sort of
compound stat.

If A, B and C need to add up to 100, you have a read race.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010315 16:14] wrote:
> At 06:57 15/03/01 -0500, Jan Wieck wrote:
> >
> >And  shared  memory has all the interlocking problems we want
> >to avoid.
> 
> I suspect that if we keep per-backend data in a separate area, then we
> don;t need locking since there is only one writer. It does not matter if a
> reader gets an inconsistent view, the same as if you drop a few UDP packets.

No, this is completely different.

Lost data is probably better than incorrect data.  Either use locks
or a copying mechanism.  People will depend on the data returned
making sense.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010315 14:54] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > How many files need to be fsync'd?
> 
> Only one.
> 
> > If it's more than one, what might work is using mmap() to map the
> > files in adjacent areas, then calling msync() on the entire range,
> > this would allow you to batch fsync the data.
> 
> Interesting thought, but mmap to a prespecified address is most
> definitely not portable, whether or not you want to assume that
> plain mmap is ...

Yeah... :(

Evil thought though (for reference):

mmap(anon memory) returns addr1
addr2 = addr1 + maplen
split addr1<->addr2 on points A B and C
mmap(file1 over addr1 to A)
mmap(file2 over A to B)
mmap(file3 over B to C)
mmap(file4 over C to addr2)

It _should_ work, but there are probably some corner cases where it
doesn't.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Mikheev, Vadim <[EMAIL PROTECTED]> [010315 13:52] wrote:
> > I believe that we don't know enough yet to nail down a hard-wired
> > decision.  Vadim's idea of preferring O_DSYNC if it appears to be
> > different from O_SYNC is a good first cut, but I think we'd 
> > better make it possible to override that, at least for testing purposes.
> 
> So let's leave fsync as default and add option to open log files
> with O_DSYNC/O_SYNC.

I have a weird and untested suggestion:

How many files need to be fsync'd?

If it's more than one, what might work is using mmap() to map the
files in adjacent areas, then calling msync() on the entire range,
this would allow you to batch fsync the data.

The only problems are that I'm not sure:

1) how portable msync() is.
2) whether msync() guarantees metadata consistency.

Another benefit of mmap() is its zero-copy nature.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010315 11:45] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > And since we're sorta on the topic of IO, I noticed that it looks
> > like (at least in 7.0.3) that vacuum and certain other routines
> > read files in reverse order.
> 
> Vacuum does that because it's trying to push tuples down from the end
> into free space in earlier blocks.  I don't see much way around that
> (nor any good reason to think that it's a critical part of vacuum's
> performance anyway).  Where else have you seen such behavior?

Just vacuum, but the source is large, and I'm sort of lacking
on database-foo so I guessed that it may be done elsewhere.

You can optimize this out by implementing the read-behind yourselves,
roughly like this:

struct sglist *
read(fd, len)
{
        /* nearing the front of a backwards scan: jump back THRESHOLD
         * bytes and pull in one large chunk instead of many small reads */
        if (fd.lastpos - fd.curpos <= THRESHOLD) {
                fd.curpos = fd.lastpos - THRESHOLD;
                len = THRESHOLD;
        }

        return (do_read(fd, len));
}

Of course this is entirely wrong as written, but it illustrates what
would help.

I would fix FreeBSD, but it's sort of a mess and beyond what
I've got time to do ATM.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [010315 11:33] wrote:
> Alfred Perlstein writes:
> 
> > Sorry, what's a GUC? :)
> 
> Grand Unified Configuration system
> 
> It's basically a cute name for the achievement that there's now a single
> name space and interface for (almost) all postmaster run time
> configuration variables,

Oh, thanks.

Well, considering that, a runtime check for doing_sync_wal_writes
== 1 shouldn't be that expensive.  It's sort of the inverse of -F:
since we're using O_SYNC for WAL writes, we don't need to
fsync them.

Btw, if you guys want to get some speed with WAL, I'd implement a
write-behind process if it was possible to do the O_SYNC writes.

...

And since we're sorta on the topic of IO, I noticed that it looks
like (at least in 7.0.3) that vacuum and certain other routines
read files in reverse order.

The problem (at least in FreeBSD) is that we haven't tuned
the system to detect reverse reading and hence don't do
much readahead.  There may be some going on as a function
of the read clustering, but I'm not entirely sure.

I'd suspect that other OSs might have neglected to check
for reverse reading of files as well, but I'm not sure.

Basically, if there were a way to do this differently, or to
anticipate the backwards motion and issue large reads, it
might add latency, but it should improve throughput.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010315 11:07] wrote:
> "Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> > ... I would either
> > use fsync as default or don't deal with O_SYNC at all.
> > But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
> > use O_DSYNC by default.
> 
> Hm.  We could do that reasonably painlessly as a compile-time test in
> xlog.c, but I'm not clear on how it would play out as a GUC option.
> Peter, what do you think about configuration-dependent defaults for
> GUC variables?

Sorry, what's a GUC? :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010315 09:35] wrote:
> 
> BTW, are there any platforms where O_DSYNC exists but has a different
> spelling?

Yes, FreeBSD only has O_FSYNC;
it has neither O_SYNC nor O_DSYNC.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[HACKERS] Re: Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Thomas Swan <[EMAIL PROTECTED]> [010313 13:37] wrote:
> 
> >On reciept of the info signal, the backends collaborate to piece
> >together a status file.  The status file is given a temporay name.
> >When complete the status file is rename(2)'d over a well known
> >file.
> 
> Reporting to files, particularly well known ones, could lead to race 
> conditions.
> 
> All in all, I think your better off passing messages through pipes or a 
> similar communication method.
> 
> I really liked the idea of a "server" that could parse/analyze data from 
> multiple backends.
> 
> My 2/100 worth...

Take a few moments to think about the semantics of rename(2).

Yes, you would still need synchronization between the backend
processes to do this correctly, but not in any external app.

The external app can just open the file; assuming it exists, it
will always have a complete and consistent snapshot of whatever
the backends agreed on.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] WAL & SHM principles

2001-03-13 Thread Alfred Perlstein

* Matthew Kirkwood <[EMAIL PROTECTED]> [010313 13:12] wrote:
> On Tue, 13 Mar 2001, Ken Hirsch wrote:
> 
> > > mlock() guarantees that the locked address space is in memory.  This
> > > doesn't imply that updates are not written to the backing file.
> >
> > I've wondered about this myself.  It _is_ true on Linux that mlock
> > prevents writes to the backing store,
> 
> I don't believe that this is true.  The manpage offers no
> such promises, and the semantics are not useful.

Afaik FreeBSD's Linux emulator:

revision 1.13
date: 2001/02/28 04:30:27;  author: dillon;  state: Exp;  lines: +3 -1
Linux does not filesystem-sync file-backed writable mmap pages on
a regular basis.  Adjust our linux emulation to conform.  This will
cause more dirty pages to be left for the pagedaemon to deal with,
but our new low-memory handling code can deal with it.   The linux
way appears to be a trend, and we may very well make MAP_NOSYNC the
default for FreeBSD as well (once we have reasonable sequential
write-behind heuristics for random faults).
(will be MFC'd prior to 4.3 freeze)

Suggested by: Andrew Gallatin

Basically any mmap'd data doesn't seem to get sync()'d out on
a regular basis.

> > and this is used as a security feature for cryptography software.
> 
> mlock() is used to prevent pages being swapped out.  Its
> use for crypto software is essentially restricted to anon
> memory (allocated via brk() or mmap() of /dev/zero).

What about userland device drivers that want to send parts
of a disk-backed file to a driver's DMA routine?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010313 06:42] wrote:
> >
> >This ought to always give a consistant snapshot of the file to
> >whomever opens it.
> >
> 
> I think Tom has previously stated that there are technical reasons not to
> do IO in signal handlers, and I have philosophical problems with
> performance monitors that ask 50 backends to do file IO. I really do think
> shared memory is TWTG.

I wasn't really suggesting any of those courses of action; all I
suggested was using rename(2) to give a separate application a
consistent snapshot of the stats.

Actually, what makes the most sense (although it may be a performance
killer) is to have the backends update a system table that the external
app can query.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner <[EMAIL PROTECTED]> [010312 18:56] wrote:
> At 13:34 12/03/01 -0800, Alfred Perlstein wrote:
> >Is it possible
> >to have a spinlock over it so that an external utility can take a snapshot
> >of it with the spinlock held?
> 
> I'd suggest that locking the stats area might be a bad idea; there is only
> one writer for each backend-specific chunk, and it won't matter a hell of a
> lot if a reader gets inconsistent views (since I assume they will be
> re-reading every second or so). All the stats area should contain would be
> a bunch of counters with timestamps, I think, and the cost up writing to it
> should be kept to an absolute minimum.
> 
> 
> >
> >just some ideas..
> >
> 
> Unfortunatley, based on prior discussions, Bruce seems quite opposed to a
> shared memory solution.

Ok, here's another nifty idea.

On receipt of the info signal, the backends collaborate to piece
together a status file.  The status file is given a temporary name.
When complete, the status file is rename(2)'d over a well-known
file.

This ought to always give a consistent snapshot of the file to
whomever opens it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Performance monitor signal handler

2001-03-12 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010312 12:12] wrote:
> I was going to implement the signal handler like we do with Cancel,
> where the signal sets a flag and we check the status of the flag in
> various _safe_ places.
> 
> Can anyone think of a better way to get information out of a backend?

Why not use a static area of the shared memory segment?  Is it possible
to have a spinlock over it so that an external utility can take a snapshot
of it with the spinlock held?

Also, this could work for other stuff as well, instead of overloading
a lot of signal handlers one could just periodically poll a region of
the shared segment.

just some ideas..

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

> >Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >>> Are there any portability problems with relying on shm_nattch to be
> >>> available?  If not, I like this a lot...
> >
> >> Well it's available on FreeBSD and Solaris, I'm sure Redhat has
> >> some deamon that resets the value to 0 periodically just for kicks
> >> so it might not be viable... :)
> >
> >I notice that our BeOS and QNX emulations of shmctl() don't support
> >IPC_STAT, but that could be dealt with, at least to the extent of
> >stubbing it out.

* Cyril VELTER <[EMAIL PROTECTED]> [010306 16:15] wrote:
> 
>BeOS haven't this stat (I have a bunch of others but not this one).
> 
>If I unsterstand correctly, you want to check if there is some backend 
> still attached to shared mem segment of a given key ? In this case, I have an 
> easy solution to fake the stat, because all segment have an encoded name 
> containing this key, so I can count them.

We need to be able to take a single shared memory segment and
determine if any other process is using it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Lamar Owen <[EMAIL PROTECTED]> [010306 13:27] wrote:
> Nathan Myers wrote:
> > That is why there is no problem with version skew in the syscall
> > argument structures on a correctly-configured Linux system.  (On a
> > Red Hat system it is very easy to get them out of sync, but RH fans
> > are used to problems.)
> 
> Is RedHat bashing really necessary here?  At least they are payrolling
> Second Chair on the Linux kernel hierarchy.  And they are very
> supportive of PostgreSQL (by shipping us with their distribution).

Just because they do some really nice things and have some really
nice stuff doesn't mean they should be cut any slack for things like
shipping out-of-sync kernel/system headers, kill -9'ing databases,
and running programs like 'tmpwatch' on the boxes.  It really shows
a lack of understanding of how Unix is supposed to run.

What they really need to do is hire some greybeards (old-school
Unix folks) to QA the releases and keep stuff like this from
happening/shipping.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010306 11:49] wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > What I don't like is that my /usr/include/sys/shm.h (through other
> > headers) has [foo]
> > whereas /usr/src/linux/include/shm.h has [bar]
> 
> Are those declarations perhaps bit-compatible?  Looks a tad endian-
> dependent, though ...

Of course not: the size of the struct changed (short -> unsigned
long, basically int16_t -> uint32_t).  Because the kernel and userland
headers in Linux are hardly ever in sync, you get the fun of guessing
which combination you have:

old struct -> old syscall (ok)
new struct -> old syscall (boom)
old struct -> new syscall (boom)
new struct -> new syscall (ok)

Honestly, I think this problem should be left to the vendor to fix
properly (if it needs fixing); the sysV API was published at least
6 years ago, so they ought to have it mostly correct by now.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Lamar Owen <[EMAIL PROTECTED]> [010306 11:39] wrote:
> Peter Eisentraut wrote:
> > Not only note the shm_nattch type, but also shm_segsz, and the "unused"
> > fields in between.  I don't know a thing about the Linux kernel sources,
> > but this doesn't seem right.
> 
> Red Hat 7, right?  My RedHat 7 system isn't running RH 7 right now (it's
> this notebook that I'm running Win95 on right now), but see which RPM's
> own the two headers.  You may be in for a shock.  IIRC, the first system
> include is from the 2.4 kernel, and the second in the kernel source tree
> is from the 2.2 kernel.
> 
> Odd, but not really broken.  Should be fixed in the latest public beta
> of RedHat, that actually has the 2.4 kernel.  I can't really say any
> more about that, however.

Y'know, I was only kidding about Linux going out of its way to
defeat the 'shm_nattch' trick... *sigh*

As a FreeBSD developer I'm wondering if Linux keeps compatibility
calls around for old binaries or not.  Any idea?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010306 11:30] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > * Tom Lane <[EMAIL PROTECTED]> [010306 11:03] wrote:
> >> I notice that our BeOS and QNX emulations of shmctl() don't support
> >> IPC_STAT, but that could be dealt with, at least to the extent of
> >> stubbing it out.
> 
> > Well since we already have spinlocks, I can't see why we can't
> > keep the refcount and spinlock in a special place in the shm
> > for all cases?
> 
> No, we mustn't go there.  If the kernel isn't keeping the refcount
> then it's worse than useless: as soon as some process crashes without
> decrementing its refcount, you have a condition that you can't recover
> from without reboot.

Not if the postmaster outputs the following:

> What I'm currently imagining is that the stub implementations will just
> return a failure code for IPC_STAT, and the outer code will in turn fail
> with a message along the lines of "It looks like there's a pre-existing
> shmem block (id XXX) still in use.  If you're sure there are no old
> backends still running, remove the shmem block with ipcrm(1), or just
> delete $PGDATA/postmaster.pid."  I dunno what shmem management tools
> exist on BeOS/QNX, but deleting the lockfile will definitely suppress
> the startup interlock ;-).
> 
> > Yes, if possible a more meaningfull error message and pointer to
> > some docco would be nice
> 
> Is the above good enough?

Sure. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010306 11:03] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> Are there any portability problems with relying on shm_nattch to be
> >> available?  If not, I like this a lot...
> 
> > Well it's available on FreeBSD and Solaris; I'm sure Redhat has
> > some daemon that resets the value to 0 periodically just for kicks,
> > so it might not be viable... :)
> 
> I notice that our BeOS and QNX emulations of shmctl() don't support
> IPC_STAT, but that could be dealt with, at least to the extent of
> stubbing it out.

Well since we already have spinlocks, I can't see why we can't
keep the refcount and spinlock in a special place in the shm
for all cases?

> This does raise the question of what to do if shmctl(IPC_STAT) fails
> for a reason other than EINVAL.  I think the conservative thing to do
> is refuse to start up.  On EPERM, for example, it's possible that there
> is a postmaster running in your PGDATA but with a different userid.

Yes, if possible a more meaningful error message and a pointer to
some docco would be nice, or even a nice "I don't care, I killed
all the backends, just start darnit" flag.  It's really no fun at
all to have to attempt to decipher some cryptic error message at
3am when the database/system is acting up. :)

> > Seriously, there's some dispute about the type of 'shm_nattch':
> > under Solaris it's "shmatt_t" (unsigned long, afaik); under FreeBSD
> > it's 'short' (I should fix that :)).
> 
> > But since you're really only testing for zero, it shouldn't
> > really be a problem.
> 
> We need not copy the value anywhere, so as long as the struct is
> correctly declared in the system header files I don't think it matters
> what the field type is ...

Yup, my point exactly.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010306 10:35] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> 
> > What about encoding the shm id in the pidfile?  Then one can just ask
> > how many processes are attached to that segment?  (if it doesn't
> > exist, one can assume all backends have exited)
> 
> Hmm ... that might actually be a pretty good idea.  A small problem is
> that the shm key isn't yet selected at the time we initially create the
> lockfile, but I can't think of any reason that we could not go back and
> append the key to the lockfile afterwards.
> 
> > you want the field 'shm_nattch'
> 
> Are there any portability problems with relying on shm_nattch to be
> available?  If not, I like this a lot...

Well it's available on FreeBSD and Solaris; I'm sure Redhat has
some daemon that resets the value to 0 periodically just for kicks,
so it might not be viable... :)

Seriously, there's some dispute about the type of 'shm_nattch':
under Solaris it's "shmatt_t" (unsigned long, afaik); under FreeBSD
it's 'short' (I should fix that :)).

But since you're really only testing for zero, it shouldn't
really be a problem.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010306 10:10] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > I'm sure some sort of encoding of the PGDATA directory along with
> > the pids stored in the shm segment...
> 
> I thought about this too, but it strikes me as not very trustworthy.
> The problem is that there's no guarantee that the new postmaster will
> even notice the old shmem segment: it might select a different shmem
> key.  (The 7.1 coding of shmem key selection makes this more likely
> than it used to be, but even under 7.0, it will certainly fail to work
> if I choose to start the new postmaster using a different port number
> than the old one had.  The shmem key is driven primarily by port number
> not data directory ...)

This seems like a mistake.  

I'm surprised you guys aren't just using some form of the FreeBSD
ftok() algorithm for this:

FTOK(3)FreeBSD Library Functions ManualFTOK(3)

...

 The ftok() function attempts to create a unique key suitable for use with
 the msgget(3), semget(2) and shmget(2) functions given the path of an
 existing file and a user-selectable id.

 The specified path must specify an existing file that is accessible to
 the calling process or the call will fail.  Also, note that links to
 files will return the same key, given the same id.

BUGS
 The returned key is computed based on the device minor number and inode
 of the specified path in combination with the lower 8 bits of the given
 id.  Thus it is quite possible for the routine to return duplicate keys.

The "BUGS" seems to be exactly what you guys are looking for, a somewhat
reliable method of obtaining a system id.  If that sounds evil, read 
below for an alternate suggestion.

> The interlock has to be tightly tied to the PGDATA directory, because
> what we're trying to protect is the files in and under that directory.
> It seems that something based on file(s) in that directory is the way
> to go.
> 
> The best idea I've seen so far is Hiroshi's idea of having all the
> backends hold fcntl locks on the same file (probably postmaster.pid
> would do fine).  Then the new postmaster can test whether any backends
> are still alive by trying to lock the old postmaster.pid file.
> Unfortunately, I read in the fcntl man page:
> 
> Locks are not inherited by a child process in a fork(2) system call.
> 
> This makes the idea much less attractive than I originally thought:
> a new backend would not automatically inherit a lock on the
> postmaster.pid file from the postmaster, but would have to open/lock it
> for itself.  That means there's a window where the new backend exists
> but would be invisible to a hypothetical new postmaster.
> 
> We could work around this with the following, very ugly protocol:
> 
> 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
> file.  Each spawned backend immediately opens and read-locks
> postmaster.pid, too, and holds that file open until it dies.  (Thus
> wasting a kernel FD per backend, which is one of the less attractive
> things about this.)  If the backend is unable to obtain read lock on
> postmaster.pid, then it complains and dies.  We must use read locks
> here so that all these processes can hold them separately.
> 
> 2. If a newly started postmaster sees a pre-existing postmaster.pid
> file, it tries to obtain a *write* lock on that file.  If it fails,
> conclude that an old postmaster or backend is still alive; complain
> and quit.  If it succeeds, sit for say 1 second before deleting the file
> and creating a new one.  (The delay here is to allow any just-started
> old backends to fail to acquire read lock and quit.  A possible
> objection is that we have no way to guarantee 1 second is enough, though
> it ought to be plenty if the lock acquisition is just after the fork.)
> 
> One thing that worries me a little bit is that this means an fcntl
> read-lock request will exist inside the kernel for each active backend.
> Does anyone know of any performance problems or hard kernel limits we
> might run into with large numbers of backends (lots and lots of fcntl
> locks)?  At least the locks are on a file that we don't actually touch
> in the normal course of business.
> 
> A small savings is that the backends don't actually need to open new FDs
> for the postmaster.pid file; they can use the one they inherit from the
> postmaster, even though they do need to lock it again.  I'm not sure how
> much that saves inside the kernel, but at least something.
> 
> There are also the usual set of concerns about portability of flock,
> though this time we're locking a plain file and not a socket, so it
> sh

Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010305 19:13] wrote:
> Lamar Owen <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> Postmaster down, backends alive is not a scenario we're currently
> >> prepared for.  We need a way to plug that gap.
> 
> > Postmaster can easily enough find out if zombie backends are 'out there'
> > during startup, right?
> 
> If you think it's easy enough, enlighten the rest of us ;-).  Be sure
> your solution only finds leftover backends from the previous instance of
> the same postmaster, else it will prevent running multiple postmasters
> on one system.

I'm sure some sort of encoding of the PGDATA directory along with
the pids stored in the shm segment...

> > What can postmaster _do_ about it, though?  It
> > won't necessarily be able to kill them -- but it also can't control
> > them.  If it _can_ kill them, should it try?
> 
> I think refusal to start is sufficient.  They should go away by
> themselves as their clients disconnect, and forcing the issue doesn't
> seem like it will improve matters.  The admin can kill them (hopefully
> with just a SIGTERM ;-)) if he wants to move things along ... but I'd
> not like to see a newly-starting postmaster do that automatically.

I agree; shooting down processes incorrectly should be left up to
vendors' braindead scripts. :)

> > Should the backend look for the presence of its parent postmaster
> > periodically and gracefully come down if postmaster goes away without
> > the proper handshake?
> 
> Unless we checked just before every disk write, this wouldn't represent
> a safe failure mode.  The onus has to be on the newly-starting
> postmaster, I think, not on the old backends.
> 
> > Should a set of backends detect a new postmaster coming up and try to
> > 'sync up' with that postmaster,
> 
> Nice try ;-).  How will you persuade the kernel that these processes are
> now children of the new postmaster?

Oh, easy, use ptrace. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010305 14:51] wrote:
> 
> I think we need a stronger interlock to prevent this scenario, but I'm
> unsure what it should be.  Ideas?

Re having multiple postmasters active by accident.

The sysV IPC stuff has some hooks in it that may help you.

One idea is to check the 'struct shmid_ds' field 'shm_nattch':
at startup, if it's not 1 (or 0), then more than one postgresql
instance is messing with the segment and startup should not proceed.

I'd also suggest looking into sysV semaphores and the semundo
stuff; afaik it can be used to track the number of consumers of
a resource.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] preproc.y error

2001-02-07 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010207 17:24] wrote:
> Vince Vielhaber <[EMAIL PROTECTED]> writes:
> > Now I get:
> > byacc -d  preproc.y
> > byacc: f - maximum table size exceeded
> > gmake[4]: *** [preproc.c] Error 2
> 
> Better install bison if you want to work with CVS sources ...
> the lack of bison probably explains why it's failing for you on
> this system when it's OK on other FreeBSD boxes.
> 
> I wonder if we ought not accept byacc as a suitable yacc in configure?
> Peter, what do you think?

I think I reported this as broken a couple of months ago, but it was
too late to add the check to configure for 7.0.  byacc doesn't work;
you need bison (or maybe some special flags to byacc).

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Auto-indexing

2001-02-06 Thread Alfred Perlstein

* Christopher Kings-Lynne <[EMAIL PROTECTED]> [010206 18:29] wrote:
> Is it a feasible idea that PostgreSQL could detect when an index would be
> handy, and create it itself, or at least log that a table is being queried
> but the indices are not appropriate?
> 
> I suggest this as it's a feature of most windows databases, and MySQL does
> it.  I think it would be a great timesaver as we have hundreds of different
> queries, and it's a real pain to have to EXPLAIN them all, etc.   Is that
> possible?  Feasible?

Probably both, but if it's done there should be options to:

.) disable it completely, or by table/database, or even by threshold or
   disk-free parameters (indices can be large)
.) log any auto-created indices to inform the DBA.
.) if disabled, optionally log when it would have created an index on
   the fly (suggest an index).
.) expire old and unused auto-created indices.

Generally PostgreSQL assumes the user knows what he's doing, but
it couldn't hurt too much to provide an option to have it assist
the user.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: Index grows huge, possible leakage?

2001-02-05 Thread 'Alfred Perlstein'

* Mikheev, Vadim <[EMAIL PROTECTED]> [010202 10:39] wrote:
> > After several weeks our indices grow very large (in one case to
> > 4-5 gigabytes).  After dropping and recreating the indices they
> > shrink back to something more reasonable (500 megs in the same case).
> > 
> > We are currently using Vadim's vacuum patches for VLAZY and MMNB, 
> > against 7.0.3.  We are using a LAZY vacuum on these tables
> > 
> > However a normal (non-lazy) vacuum doesn't shrink the index, the
> > only thing that helps reduce the size is dropping and recreating.
> > 
> > Is this a bug in 7.0.3?  A possible bug in Vadim's patches? Or is
> > this somewhat expected behavior that we have to cope with?
> 
> When index is created its pages are filled in full => any insert
> into such pages results in page split - ie in additional page.
> So, it's very easy to get 4Gb from 500Mb.

Well that certainly stinks. :(

> Vacuum was never able to shrink indices - it just removes dead index
> tuples and so allows to re-use space ... if you'll insert the same
> keys.

This doesn't make sense to me.  Seriously, if the table is locked
during a normal vacuum (not VLAZY), why not have vacuum make a
new index by copying valid index entries into a new index instead
of just vacating slots that aren't used?

> To know does VLAZY work properly or not I would need in vacuum debug
> messages. Did you run vacuum with verbose option or do you have
> postmaster' logs? With LAZY vacuum writes messages like
> 
> Index _name_: deleted XXX unfound YYY
> 
> YYY supposed to be 0...

With what you explained (indices normally growing) I don't think
VLAZY is the problem here.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: 1024 limits??

2001-02-05 Thread Alfred Perlstein

* Mathieu Dube <[EMAIL PROTECTED]> [010205 09:32] wrote:
> cc -o therver therver.c -g -lflipr -lpq -L. -lpthread -D_REENTRANT
> ./libpq.a(fe-auth.o): In function `pg_password_sendauth':
> /usr/local/postgresql-7.0.3/src/interfaces/libpq/fe-auth.c:465: undefined reference to `crypt'
> collect2: ld returned 1 exit status
> make: *** [therver] Error 1
> 
> this is what I get 
> should I copy other files?

no, just add -lcrypt to your LDFLAGS




[HACKERS] Index grows huge, possible leakage?

2001-02-01 Thread Alfred Perlstein

After several weeks our indices grow very large (in one case to
4-5 gigabytes).  After dropping and recreating the indices they
shrink back to something more reasonable (500 megs in the same case).

We are currently using Vadim's vacuum patches for VLAZY and MMNB,
against 7.0.3.  We are using a LAZY vacuum on these tables.

However a normal (non-lazy) vacuum doesn't shrink the index, the
only thing that helps reduce the size is dropping and recreating.

Is this a bug in 7.0.3?  A possible bug in Vadim's patches? Or is
this somewhat expected behavior that we have to cope with?

As a side note, the space requirement is actually 'ok'; it's just
that performance gets terrible once the indices reach such huge
sizes.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Sure enough, the lock file is gone

2001-01-26 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [010126 12:11] wrote:
> The 'tmpwatch' program on Red Hat will remove the /tmp/.s.PGSQL.5432.lock
> file after the server has run 6 days.  This will be a problem.
> 
> We could touch (open) the file once every time the ServerLoop() runs
> around.  It's not perfect but it should work in practice.

Why not have the RPM/configure scripts stick it wherever Red Hat
says it's safe to?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Libpq async issues

2001-01-24 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010124 10:27] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > * Bruce Momjian <[EMAIL PROTECTED]> [010124 07:58] wrote:
> >> I have added this email to TODO.detail and a mention in the TODO list.
> 
> > The bug mentioned here is long gone,
> 
> Au contraire, the misdesign is still there.  The nonblock-mode code
> will *never* be reliable under stress until something is done about
> that, and that means fairly extensive code and API changes.

The "bug" is the one mentioned in the first paragraph of the email
where I broke _blocking_ connections for a short period.

I still need to fix async connections for myself (and of course
contribute it back), but I just haven't had the time.  If anyone
else wants it fixed earlier they can wait for me to do it, do it
themself, contract me to do it or hope someone else comes along
to fix it.

I'm thinking that I'll do what you said and have separate paths
for reading from and writing to the socket, and APIs that give
the user the option of a boundary, basically:

 buffer this, but don't allow me to write until it's flushed

which would allow for larger-than-8k COPY rows to go into the
backend.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Libpq async issues

2001-01-24 Thread Alfred Perlstein
he state is
> > when the can't-do-it-yet return is made; note that I specified "no data
> > was queued" above.  If pqPutBytes might queue some of the data before
> > returning 1, the application is in trouble again.  While you apparently
> > foresaw that in recoding pqPutBytes, your code doesn't actually work.
> > There is the minor code bug that you fail to update "avail" after the
> > first pqFlush call, and the much more fundamental problem that you
> > cannot guarantee to have queued all or none of the data.  Think about
> > what happens if the passed nbytes is larger than the output buffer size.
> > You may pass the first pqFlush successfully, then get into the loop and
> > get a won't-block return from pqFlush in the loop.  What then?
> > You can't simply refuse to support the case nbytes > bufsize at all,
> > because that will cause application failures as well (too long query
> > sends it into an infinite loop trying to queue data, most likely).
> > 
> > A possible answer is to specify that a return of +N means "N bytes
> > remain unqueued due to risk of blocking" (after having queued as much
> > as you could).  This would put the onus on the caller to update his
> > pointers/counts properly; propagating that into all the internal uses
> > of pqPutBytes would be no fun.  (Of course, so far you haven't updated
> > *any* of the internal callers to behave reasonably in case of a
> > won't-block return; PQfn is just one example.)
> > 
> > Another possible answer is to preserve pqPutBytes' old API, "queue or
> > bust", by the expedient of enlarging the output buffer to hold whatever
> > we can't send immediately.  This is probably more attractive, even
> > though a long query might suck up a lot of space that won't get
> > reclaimed as long as the connection lives.  If you don't do this then
> > you are going to have to make a lot of ugly changes in the internal
> > callers to deal with won't-block returns.  Actually, a bulk COPY IN
> > would probably be the worst case --- the app could easily load data into
> > the buffer far faster than it could be sent.  It might be best to extend
> > PQputline to have a three-way return and add code there to limit the
> > growth of the output buffer, while allowing all internal callers to
> > assume that the buffer is expanded when they need it.
> > 
> > pqFlush has the same kind of interface design problem: the same EOF code
> > is returned for either a hard error or can't-flush-yet, but it would be
> > disastrous to treat those cases alike.  You must provide a 3-way return
> > code.
> > 
> > Furthermore, the same sort of 3-way return code convention will have to
> > propagate out through anything that calls pqFlush (with corresponding
> > documentation updates).  pqPutBytes can be made to hide a pqFlush won't-
> > block return by trying to enlarge the output buffer, but in most other
> > places you won't have a choice except to punt it back to the caller.
> > 
> > PQendcopy has the same interface design problem.  It used to be that
> > (unless you passed a null pointer) PQendcopy would *guarantee* that
> > the connection was no longer in COPY state on return --- by resetting
> > it, if necessary.  So the return code was mainly informative; the
> > application didn't have to do anything different if PQendcopy reported
> > failure.  But now, a nonblocking application does need to pay attention
> > to whether PQendcopy completed or not --- and you haven't provided a way
> > for it to tell.  If 1 is returned, the connection might still be in
> > COPY state, or it might not (PQendcopy might have reset it).  If the
> > application doesn't distinguish these cases then it will fail.
> > 
> > I also think that you want to take a hard look at the automatic "reset"
> > behavior upon COPY failure, since a PQreset call will block the
> > application until it finishes.  Really, what is needed to close down a
> > COPY safely in nonblock mode is a pair of entry points along the line of
> > "PQendcopyStart" and "PQendcopyPoll", with API conventions similar to
> > PQresetStart/PQresetPoll.  This gives you the ability to do the reset
> > (if one is necessary) without blocking the application.  PQendcopy
> > itself will only be useful to blocking applications.
> > 
> > > I'm sorry if they don't work for some situations other than COPY IN,
> > > but it's functionality that I needed and I expect to be expanded on
> > > by myself and others that take interest in nonblocking operation.
> > 
> > I don't think that the nonblock code is anywhere near production quality
> > at this point.  It may work for you, if you don't stress it too hard and
> > never have a communications failure; but I don't want to see us ship it
> > as part of Postgres unless these issues get addressed.
> > 
> > regards, tom lane
> > 
> > 
> > 
> 
> 
> -- 
>   Bruce Momjian|  http://candle.pha.pa.us
>   [EMAIL PROTECTED]   |  (610) 853-3000
>   +  If your life is a hard drive, |  830 Blythe Avenue
>   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Does Oracle store values in indices?

2001-01-23 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010123 11:17] wrote:
> [ Charset KOI8-R unsupported, converting... ]
> > > > > The reason you have to visit the main table is that tuple validity
> > > > > status is only stored in the main table, not in each index.
> > > > > See prior discussions in the archives.
> > > >
> > > > But how Oracle handles this?
> > >
> > > Oracle doesn't have non-overwriting storage manager but uses
> > > rollback segments to maintain MVCC. Rollback segments are used
> > > to restore valid version of entire index/table page.
> > 
> > Are there any plans to have something like this? I mean overwriting storage 
> > manager.
> 
> We hope to have it some day, hopefully soon.

Vadim says that he hopes it to be done by 7.2, so if things go
well it shouldn't be that far off...

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Patches with vacuum fixes available for 7.0.x

2001-01-23 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010122 19:55] wrote:
> 
> Vadim, did these patches ever make it into 7.1?

According to:
http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/parser/gram.y?rev=2.217&content-type=text/x-cvsweb-markup

nope. :(

> 
> > We recently had a very satisfactory contract completed by
> > Vadim.
> > 
> > Basically Vadim has been able to reduce the amount of time
> > taken by a vacuum from 10-15 minutes down to under 10 seconds.
> > 
> > We've been running with these patches under heavy load for
> > about a week now without any problems except one:
> >   don't 'lazy' (new option for vacuum) a table which has just
> >   had an index created on it, or at least don't expect it to
> >   take any less time than a normal vacuum would.
> > 
> > There's three patchsets and they are available at:
> > 
> > http://people.freebsd.org/~alfred/vacfix/
> > 
> > complete diff:
> > http://people.freebsd.org/~alfred/vacfix/v.diff
> > 
> > only lazy vacuum option to speed up index vacuums:
> > http://people.freebsd.org/~alfred/vacfix/vlazy.tgz
> > 
> > only lazy vacuum option to only scan from start of modified
> > data:
> > http://people.freebsd.org/~alfred/vacfix/mnmb.tgz
> > 
> > Although the patches are for 7.0.x I'm hoping that they
> > can be forward ported (if Vadim hasn't done it already)
> > to 7.1.
> > 
> > enjoy!
> > 
> > -- 
> > -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
> > "I have the heart of a child; I keep it in a jar on my desk."
> > 
> 
> 
> -- 
>   Bruce Momjian|  http://candle.pha.pa.us
>   [EMAIL PROTECTED]   |  (610) 853-3000
>   +  If your life is a hard drive, |  830 Blythe Avenue
>   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: [GENERAL] postgres memory management

2001-01-22 Thread Alfred Perlstein

* Peter Mount <[EMAIL PROTECTED]> [010122 13:21] wrote:
> At 13:18 21/01/01 +0100, Alexander Jerusalem wrote:
> >Hi all,
> >
> >I'm experiencing some strange behaviour with postgresql 7.0.3 on Red Hat 
> >Linux 7. I'm sending lots of insert statements to the postgresql server 
> >from another machine via JDBC. During that process postgresql continues to 
> >take up more and more memory and seemingly never returns it to the system. 
> >Oddly if I watch the postmaster and it's sub processes in ktop, I can't 
> >see which process takes up this memory. ktop shows that the postgresql 
> >related processes have a constant memory usage but the overall memory 
> >usage always increases as long as I continue to send insert statements.
> >
> >When the database connection is closed, no memory is reclaimed, the 
> >overall memory usage stays the same. And when I close down all postgresql 
> >processes including postmaster, it's the same.
> >I'm rather new to Linux and postgresql so I'm not sure if I should call 
> >this a memory leak :-)
> >Has anybody experienced a similar thing?
> 
> I'm not sure myself. You can rule out JDBC (or Java) here as you say you 
> are connecting from another machine.
> 
> When your JDBC app closes, does it call the connection's close() method? 
> Does any messages like "Unexpected EOF from client" appear on the server side?
> 
> The only other thing that comes to mine is possibly something weird is 
> happening with IPC. After you closed down postgres, does ipcclean free up 
> any memory?

I don't know if this is valid for Linux, but it is how FreeBSD
works: for the most part, used memory is never freed, it is only
marked as reclaimable.  This is so the system can cache more data.
On a freshly booted FreeBSD box you'll have a lot of 'free' memory;
after the box has been running for a long time the 'free' memory
will probably never go higher than 10 megs, the rest is being used
as cache.

The main things you have to worry about are:
a) really running out of memory (are you using a lot of swap?)
b) not cleaning up IPC, as Peter suggested.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."


