Re: [HACKERS] pgbench vs. wait events

2016-10-06 Thread Alfred Perlstein

Robert,

This contention on WAL reminds me of another scenario I've heard about 
that was similar.


To fix things, the first thread to block would become responsible for 
writing out the buffers of everyone queued up behind it.


So, for example, say you have several threads: A, B, C, and D.

While A is writing to the WAL and holding the lock, B arrives and of 
course blocks; then C comes along and blocks as well, and then D.


Finally, A finishes its write.

Now you have two options for resolving this, either

1) A drops its lock, B picks up the lock... B writes its buffer and then 
drops the lock.  Then C gets the lock, writes its buffer, drops the 
lock, then finally D gets the lock, writes its buffer and then drops the 
lock.


2) A writes out B's, C's, and D's buffers as well, then A drops the lock; 
B, C, and D wake up, note that their respective buffers are written, and 
just return.  This greatly speeds up the system.  (Just be careful to 
make sure A doesn't do "too much work", otherwise you can get a sort of 
livelock if too many threads are blocked behind it; generally, only issue 
one additional flush on behalf of other threads, and do not "loop until 
the queue is empty".)


I'm not sure whether this is actually possible with the way WAL is 
implemented (or perhaps this strategy is already implemented), but if 
it isn't done already it's definitely worth investigating, as it can 
speed things up enormously.
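The pattern described above can be sketched, outside PostgreSQL, roughly like this (hypothetical names; each thread publishes its record, and whichever thread holds the flush lock writes out everything published so far in a single batch):

```python
import threading

# Hypothetical sketch of the "first blocker flushes for the group"
# pattern; names (enqueued, flushed_upto) are illustrative only.

state_lock = threading.Lock()   # protects the two counters below
flush_lock = threading.Lock()   # only one thread writes at a time
enqueued = 0                    # records published so far
flushed_upto = 0                # records already written out
flush_calls = []                # one entry per physical write

def commit_record():
    global enqueued, flushed_upto
    # 1. Publish our record and remember its position.
    with state_lock:
        enqueued += 1
        my_seq = enqueued
    # 2. Take the flush lock; by the time we get it, an earlier
    #    holder may already have written our record for us.
    with flush_lock:
        with state_lock:
            if flushed_upto >= my_seq:
                return              # someone else flushed us: done
            batch_end = enqueued    # flush everything published so far
        # ... actual write of records (flushed_upto, batch_end] ...
        flush_calls.append(batch_end)
        with state_lock:
            flushed_upto = batch_end

threads = [threading.Thread(target=commit_record) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()

assert flushed_upto == 8            # every record reached "disk"
assert 1 <= len(flush_calls) <= 8   # but possibly with far fewer writes
```

Note that each pass flushes only the records already published when the lock was taken, which is exactly the guard against the "loop until the queue is empty" livelock mentioned above.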



On 10/6/16 11:38 AM, Robert Haas wrote:

Hi,

I decided to do some testing on hydra (IBM-provided community
resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
the newly-enhanced wait event stuff to try to get an idea of what
we're waiting for during pgbench.  I did 30-minute pgbench runs with
various configurations, but all had max_connections = 200,
shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
on.  During each run, I ran this psql script in another window and
captured the output:

\t
select wait_event_type, wait_event from pg_stat_activity where pid !=
pg_backend_pid()
\watch 0.5
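Tallying the occurrences of each wait event in the captured output can be done with a short pipeline along these lines (a sketch; `waits.log` stands in for the captured \watch output):

```shell
# Sample of captured "wait_event_type | wait_event" rows, one per
# sampled backend (the file name waits.log is just an example).
cat > waits.log <<'EOF'
LWLockTranche   | wal_insert
LWLockNamed     | WALWriteLock
LWLockTranche   | wal_insert
EOF

# Tally: occurrences of each pair, least common first.
sort waits.log | uniq -c | sort -n
```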

Then, I used a little shell-scripting to count up the number of times
each wait event occurred in the output.  First, I tried scale factor
3000 with 32 clients and got these results:

   1  LWLockTranche   | buffer_mapping
   9  LWLockNamed | CLogControlLock
  14  LWLockNamed | ProcArrayLock
  16  Lock| tuple
  25  LWLockNamed | CheckpointerCommLock
  49  LWLockNamed | WALBufMappingLock
 122  LWLockTranche   | clog
 182  Lock| transactionid
 287  LWLockNamed | XidGenLock
1300  Client  | ClientRead
1375  LWLockTranche   | buffer_content
3990  Lock| extend
   21014  LWLockNamed | WALWriteLock
   28497  |
   58279  LWLockTranche   | wal_insert

tps = 1150.803133 (including connections establishing)

What jumps out here, at least to me, is that there is furious
contention on both the wal_insert locks and on WALWriteLock.
Apparently, the system simply can't get WAL on disk fast enough to
keep up with this workload.  Relation extension locks and
buffer_content locks are also pretty common, both ahead of
ClientRead, a relatively uncommon wait event on this test.  The load
average on the system was only about 3 during this test, indicating
that most processes are in fact spending most of their time off-CPU.
The first thing I tried was switching to unlogged tables, which
produces these results:

   1  BufferPin   | BufferPin
   1  LWLockTranche   | lock_manager
   2  LWLockTranche   | buffer_mapping
   8  LWLockNamed | ProcArrayLock
   9  LWLockNamed | CheckpointerCommLock
   9  LWLockNamed | CLogControlLock
  11  LWLockTranche   | buffer_content
  37  LWLockTranche   | clog
 153  Lock| tuple
 388  LWLockNamed | XidGenLock
 827  Lock| transactionid
1267  Client  | ClientRead
   20631  Lock| extend
   91767  |

tps = 1223.239416 (including connections establishing)

If you don't look at the TPS number, these results look like a vast
improvement.  The overall amount of time spent not waiting for
anything is now much higher, and the problematic locks have largely
disappeared from the picture.  However, the load average now shoots up
to about 30, because most of the time that the backends are "not
waiting for anything" they are in fact in kernel wait state D; that
is, they're stuck doing I/O.  This suggests that we might want to
consider advertising a wait state when a backend is doing I/O, so we
can measure this sort of thing.

Next, I tried lowering the scale factor to something that fits in
shared buffers.  Here are the results at scale factor 300:

  14  Lock| tu

Re: [HACKERS] pgbench vs. wait events

2016-10-07 Thread Alfred Perlstein



On 10/7/16 10:42 AM, Andres Freund wrote:

Hi,

On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:

This contention on WAL reminds me of another scenario I've heard about that
was similar.

To fix things what happened was that anyone that the first person to block
would be responsible for writing out all buffers for anyone blocked behind
"him".

We pretty much do that already.  But while that's happening, the other
would-be writers show up as blocking on the lock.  We use kind of
an odd locking model for the waiters (LWLockAcquireOrWait()), which
waits for the lock to be released, but doesn't try to acquire it
afterwards. Instead the WAL position is rechecked, and in many cases
we'll be done afterwards, because enough has been written out.

Greetings,

Andres Freund
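The LWLockAcquireOrWait() behavior described above, sleeping until the lock is released and then rechecking the flush position rather than acquiring the lock, can be sketched with a condition variable (a hypothetical Python analogue, not the actual C implementation):

```python
import threading

# Hypothetical analogue of LWLockAcquireOrWait(): a waiter does not
# queue up to acquire the WAL write lock; it sleeps until the current
# holder finishes and then rechecks the flush position, because the
# holder usually flushed past our record already.

cond = threading.Condition()
flushed_upto = 0          # WAL position written out so far
write_count = 0           # how many threads actually performed a write

def writer(flush_to):
    """The lock holder: flushes everything up to flush_to."""
    global flushed_upto, write_count
    with cond:
        flushed_upto = max(flushed_upto, flush_to)
        write_count += 1
        cond.notify_all()     # wake the wait-then-recheck waiters

def waiter(my_pos):
    """A would-be writer: wait for release, recheck, usually done."""
    with cond:
        while flushed_upto < my_pos:
            cond.wait()
        # No acquisition and no write: our record is already on disk.

ts = [threading.Thread(target=waiter, args=(i,)) for i in (1, 2, 3)]
for t in ts: t.start()
writer(3)                     # one flush covers all three waiters
for t in ts: t.join()

assert flushed_upto == 3 and write_count == 1
```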



Are the batched writes all done before fsync is called?

Are you sure that A only calls fsync after flushing all the buffers from 
B, C, and D?  Or will it fsync twice?  Is there instrumentation to show 
that?


I know there's a tremendous level of skill involved in this code; I'm 
simply asking in case there are some tricks I'm missing.


Another strategy that may work is to intentionally wait/buffer for a 
few ms between flushes/fsyncs: for example, make sure that the number 
of flushes per second doesn't exceed some configurable amount.  Each 
flush likely eats at least one IOP from the disk, and there is a 
maximum IOPS per disk, so you might as well buffer more if you're 
exceeding that count.  You trade some latency, but gain throughput 
for doing that.


Does this make sense?  Again, apologies if this has been covered.  Is 
there a whitepaper, blog post, or other clear way I can examine the 
algorithm wrt locks/buffering for flushing the WAL?


-Alfred






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PGCON meetup FreeNAS/FreeBSD: In Ottawa Tue & Wed.

2013-05-20 Thread Alfred Perlstein

Hello PostgreSQL Hackers,

I am now in Ottawa, last week we wrapped up the BSDCon and I was hoping 
to chat with a few Postgresql developers in person about using 
Postgresql in FreeNAS and offering it as an extension to the platform as 
a plug-in technology.  Unfortunately due to time constraints I can not 
attend the entire conference and I am only in town until Wednesday at noon.


I'm hoping there's a good time to talk to a few developers about 
Postgresql + FreeNAS before I have to depart back to the bay area.


Some info on me:  My name is Alfred Perlstein, I am a FreeBSD developer 
and FreeNAS project lead.  I am the VP of Software Engineering at 
iXsystems.  I have been a fan of Postgresql for many years.  In the 
early 2000s we built a high-speed web-tracking application on top of 
Postgresql and worked closely with the community to shake out 
performance issues and bugs, so closely that Tom Lane and Vadim Mikheev 
had logins on our box.  Since that time I have tried to get Postgresql into 
as many places as possible.


Some info on the topics I wanted to briefly discuss:

1) Using Postgresql as the config store for FreeNAS.
We currently use SQLite; it fits our needs until we get to the point 
of replication between HA (high availability) units, where we are forced 
to manually sync data between configurations.  A discussion on how we 
might do this better using Postgresql, while still maintaining our ease 
of config export (single file) and small footprint, would be interesting.


2) Postgresql plugin for FreeNAS.
Flip a switch and suddenly your file server is also serving enterprise 
data.  We currently have a plug-in architecture, but would like to 
discuss the possibility of a tighter integration so that Postgresql 
looks like a more cohesive addition to FreeNAS.


3) Statistic monitoring / EagleEye
In FreeBSD/FreeNAS I have developed a system called EagleEye. EagleEye 
is a system where all MIBs are easily exportable with timestamps in a 
common format (for now YAML & modified CSV), which is then consumed by a 
utility that provides graphs. The entire point of EagleEye is 
to eventually upstream the modifications to future-proof statistics 
tracking in the FreeBSD and FreeNAS systems.  I have spoken with some 
Illumos/ZFS developers and they are interested as well.


I think that is all I have, please drop me a note if you'll have some 
time in Ottawa today, tomorrow or early Wednesday.  I'd love to discuss 
and buy some beers for the group.


thank you,
-Alfred Perlstein
VP Software Engineering, iXsystems.





[HACKERS] Question about durability and postgresql.

2015-02-19 Thread Alfred Perlstein

Hello,

We have a combination of 9.3 and 9.4 databases used for logging of data.

We do not need a strong durability guarantee, meaning it is ok if on crash a 
minute or two of data is lost from our logs.  (This is just stats for our 
internal tool).

I am looking at this page:
http://www.postgresql.org/docs/9.4/static/non-durability.html

And it's not clear which setting I should turn on.

What we do NOT want is to lose the entire table or corrupt the database.  We do 
want to gain speed though by not making DATA writes durable.

Which setting is appropriate for this use case?

At a glance it looks like a combination of
1) "Turn off synchronous_commit"
and possibly:
2)  Increase checkpoint_segments and checkpoint_timeout ; this reduces the 
frequency of checkpoints, but increases the storage requirements of /pg_xlog.
3) Turn off full_page_writes; there is no need to guard against partial page 
writes.
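For reference, that combination would look something like this in postgresql.conf (a sketch for 9.3/9.4; checkpoint_segments was replaced by max_wal_size in 9.5, and the values are illustrative):

```
# Commits return before WAL reaches disk; a crash loses the last few
# moments of transactions but does not corrupt the database.
synchronous_commit = off

# Checkpoint less often (9.3/9.4 syntax).
checkpoint_segments = 64
checkpoint_timeout = 15min

# Caution: only safe if the storage stack guarantees atomic 8kB page
# writes; otherwise a crash during a partial page write CAN corrupt
# the database, which conflicts with the no-corruption requirement.
#full_page_writes = off
```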

The point here is to never get a corrupt database, but in case of crash we 
might lose a few minutes of last transactions.

Any suggestions please?

thank you,
-Alfred



Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 4:39 AM, Geoff Winkless wrote:


On 28 Jul 2016 12:19, "Vitaly Burovoy" wrote:

>
> On 7/28/16, Geoff Winkless wrote:
> > On 27 July 2016 at 17:04, Bruce Momjian wrote:
> >
> >> Well, their big complaint about binary replication is that a bug can
> >> spread from a master to all slaves, which doesn't happen with
> >> statement-level replication.
> >
> > I'm not sure that that makes sense to me. If there's a database bug that
> > occurs when you run a statement on the master, it seems there's a decent
> > chance that that same bug is going to occur when you run the same
> > statement on the slave.
> >
> > Obviously it depends on the type of bug and how identical the slave is,
> > but statement-level replication certainly doesn't preclude such a bug
> > from propagating.
> >
> > Geoff
>
> Please, read the article first! The bug is about wrong visibility of
> tuples after applying WAL at slaves.
> For example, you can see two different records selecting from a table
> by a primary key (moreover, their PKs are the same, but other columns
> differ).

I read the article. It affected slaves as well as the master.

I quote:
"because of the way replication works, this issue has the potential to 
spread into all of the databases in a replication hierarchy"


I maintain that this is a nonsense argument. Especially since (as you 
pointed out and as I missed first time around) the bug actually 
occurred at different records on different slaves, so he invalidates 
his own point.


Geoff


Seriously?

There's a valid point here: you're sending over commands at the block 
level, effectively "write to disk at this location" versus "update this 
record based on PK", and obviously this has some drawbacks that are 
reason for concern.  Does it validate the move on its own?  No.  Does it 
add to the reasons to move away?  Yes, that much is obvious.


Please read this thread:
https://www.reddit.com/r/programming/comments/4vms8x/why_we_lost_uber_as_a_user_postgresql_mailing_list/d5zx82n

Do I love postgresql?  Yes.
Have I been bitten by things such as this?  Yes.
Should the community learn from these things and think of ways to avoid 
it?  Absolutely!


-Alfred


Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 7:08 AM, Merlin Moncure wrote:


*) postgres may not be the ideal choice for those who want a thin and
simple database

This is a huge market; addressing it will bring mindshare and more jobs, 
code, and braintrust to PostgreSQL.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/26/16 9:54 AM, Joshua D. Drake wrote:

Hello,

The following article is a very good look at some of our limitations 
and highlights some of the pains many of us have been working "around" 
since we started using the software.


https://eng.uber.com/mysql-migration/

Specifically:

* Inefficient architecture for writes
* Inefficient data replication
* Issues with table corruption
* Poor replica MVCC support
* Difficulty upgrading to newer releases

It is a very good read and I encourage our hackers to do so with an 
open mind.


Sincerely,

JD


It was a good read.

Having based a high performance web tracking service as well as a high 
performance security appliance on Postgresql I too have been bitten by 
these issues.


I had a few questions that maybe the folks with core knowledge can answer:

1) Would it be possible to create a "star-like" schema to fix this 
problem?  For example, let's say you have a table that is similar to Uber's:

col0pk, col1, col2, col3, col4, col5

All cols are indexed.
Assuming that updates happen to only 1 column at a time.
Why not figure out some way to encourage or automate the splitting of 
this table into multiple tables that present themselves as a single table?


What I mean is that you would then wind up with the following tables:
table1: col0pk, col1
table2: col0pk, col2
table3: col0pk, col3
table4: col0pk, col4
table5: col0pk, col5

Now when you update "col5" on a row, you only have to update the index 
on table5:col5 and table5:col0pk, as opposed to beforehand, where you 
would have to update many more indices.  In addition, I believe that 
vacuum pressure would be somewhat mitigated as well in this case.
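The split described above could be sketched in SQL roughly like this (hypothetical table and column names, with a view presenting the pieces as one table; two of the five column tables shown):

```sql
-- Vertical partitions: each column lives with the PK in its own table.
CREATE TABLE t_col1 (col0pk bigint PRIMARY KEY, col1 text);
CREATE TABLE t_col5 (col0pk bigint PRIMARY KEY, col5 text);
CREATE INDEX ON t_col1 (col1);
CREATE INDEX ON t_col5 (col5);

-- Present the pieces as a single table.
CREATE VIEW t AS
SELECT a.col0pk, a.col1, b.col5
FROM t_col1 a JOIN t_col5 b USING (col0pk);

-- Updating col5 now touches only t_col5's heap and its two indexes.
UPDATE t_col5 SET col5 = 'new' WHERE col0pk = 42;
```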


2) Why not have a look at how innodb does its storage, would it be 
possible to do this?


3) For the small-ish table that Uber mentioned, is there a way to "have 
it in memory" however provide some level of sync to disk so that it is 
consistent?


thanks!
-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein


> On Aug 2, 2016, at 2:33 AM, Geoff Winkless  wrote:
> 
>> On 2 August 2016 at 08:11, Alfred Perlstein  wrote:
>>> On 7/2/16 4:39 AM, Geoff Winkless wrote:
>>> I maintain that this is a nonsense argument. Especially since (as you 
>>> pointed out and as I missed first time around) the bug actually occurred at 
>>> different records on different slaves, so he invalidates his own point.
> 
>> Seriously?
> 
> No, I make a habit of spouting off random arguments to a list full of
> people whose opinions I massively respect purely for kicks. What do
> you think?
> 
>> There's a valid point here, you're sending over commands at the block level, 
>> effectively "write to disk at this location" versus "update this record 
>> based on PK", obviously this has some drawbacks that are reason for concern.
> 
> Writing values directly into file offsets is only problematic if
> something else has failed that has caused the file to be an inexact
> copy. If a different bug occurred that caused the primary key to be
> corrupted on the slave (or indeed the master), PK-based updates would
> exhibit similar propagation errors.
> 
> To reiterate my point, uber's described problem came about because of
> a bug. Every software has bugs at some point in its life, to pretend
> otherwise is simply naive. I'm not trying to excuse the bug, or to
> belittle the impact that such a bug has on data integrity or on uber
> or indeed on the reputation of PostgreSQL. While I'm prepared to
> accept (because I have a job that requires I spend time on things
> other than digging through obscure reddits and mailing lists to
> understand more fully the exact cause) that in _this particular
> instance_ the bug was propagated because of the replication mechanism
> (although I'm still dubious about that, as per my comment above), that
> does _not_ preclude other bugs propagating in a statement-based
> replication. That's what I said is a nonsense argument, and no-one has
> yet explained in what way that's incorrect.
> 
> Geoff


Geoff,

You are quite technical; my feeling is that you will understand it, 
however it will need to be a self-learned lesson.

-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 8/2/16 2:14 PM, Tom Lane wrote:

Stephen Frost  writes:

With physical replication, there is the concern that a bug in *just* the
physical (WAL) side of things could cause corruption.

Right.  But with logical replication, there's the same risk that the
master's state could be fine but a replication bug creates corruption on
the slave.

Assuming that the logical replication works by issuing valid SQL commands
to the slave, one could hope that this sort of "corruption" only extends
to having valid data on the slave that fails to match the master.
But that's still not a good state to be in.  And to the extent that
performance concerns lead the implementation to bypass some levels of the
SQL engine, you can easily lose that guarantee too.

In short, I think Uber's position that logical replication is somehow more
reliable than physical is just wishful thinking.  If anything, my money
would be on the other way around: there's a lot less mechanism that can go
wrong in physical replication.  Which is not to say there aren't good
reasons to use logical replication; I just do not believe that one.

regards, tom lane


The reason it can be less catastrophic is that with logical replication 
you may futz up your data, but you are safe from corrupting your entire 
db.  Meaning, if an update is missed or doubled, that may be addressed by 
a fixup SQL statement; however, if the replication causes a write to the 
entirely wrong place in the db file, then you need to "fsck" your db and 
hope that nothing super-critical was blown away.


The impact across a cluster is potentially magnified by physical 
replication.


So for instance, let's say there is a bug in the master's write to 
disk.  Logical replication acts as a barrier that keeps that bad write 
from reaching the slaves.  With physical replication, the bad writes go 
to the slaves too, so any corruption experienced on the master will 
quickly reach the slaves and they too will be corrupted.


With logical replication a bug may be stopped at the replication layer.  
At that point you can resync the slave from the master.


Now in the case of physical replication all your base are belong to zuul 
and you are in a very bad state.


That said, with logical replication, who's to say that if the statement 
is replicated to a slave, the slave won't experience the same bug 
and also corrupt itself?


We may be saying the same thing, but still, there is something to be said 
for logical replication... also, didn't they show that logical 
replication was faster for some use cases at Uber?


-Alfred









Re: [HACKERS] Why we lost Uber as a user

2016-08-03 Thread Alfred Perlstein


> On Aug 3, 2016, at 3:29 AM, Greg Stark  wrote:
> 
>> 
> 
> Honestly the take-away I see in the Uber story is that they apparently
> had nobody on staff that was on -hackers or apparently even -general
> and tried to go it alone rather than involve experts from outside
> their company. As a result they misdiagnosed their problems based on
> prejudices seeing what they expected to see rather than what the real
> problem was.
> 

+1 very true. 

At the same time, there are some lessons to be learned.  At the very 
least, putting up in big bold letters where to come for help is one of 
them.







Re: [HACKERS] Why we lost Uber as a user

2016-08-04 Thread Alfred Perlstein



On 8/4/16 2:00 AM, Torsten Zuehlsdorff wrote:



On 03.08.2016 21:05, Robert Haas wrote:

On Wed, Aug 3, 2016 at 2:23 PM, Tom Lane  wrote:

Robert Haas  writes:

I don't think they are saying that logical replication is more
reliable than physical replication, nor do I believe that to be true.
I think they are saying that if logical corruption happens, you can
fix it by typing SQL statements to UPDATE, INSERT, or DELETE the
affected rows, whereas if physical corruption happens, there's no
equally clear path to recovery.


Well, that's not an entirely unreasonable point, but I dispute the
implication that it makes recovery from corruption an easy thing to do.
How are you going to know what SQL statements to issue?  If the master
database is changing 24x7, how are you going to keep up with that?


I think in many cases people fix their data using business logic.  For
example, suppose your database goes down and you have to run
pg_resetxlog to get it back up.  You dump-and-restore, as one does,
and find that you can't rebuild one of your unique indexes because
there are now two records with that same PK.  Well, what you do is you
look at them and judge which one has the correct data, often the one
that looks more complete or the one with the newer timestamp. Or,
maybe you need to merge them somehow.  In my experience helping users
through problems of this type, once you explain the problem to the
user and tell them they have to square it on their end, the support
call ends.  The user may not always be entirely thrilled about having
to, say, validate a problematic record against external sources of
truth, but they usually know how to do it.  Database bugs aren't the
only way that databases become inaccurate.  If the database that they
use to keep track of land ownership in the jurisdiction where I live
says that two different people own the same piece of property,
somewhere there is a paper deed in a filing cabinet.  Fishing that out
to understand what happened may not be fun, but a DBA can explain that
problem to other people in the organization and those people can get
it fixed.  It's a problem, but it's fixable.

On the other hand, if a heap tuple contains invalid infomask bits that
cause an error every time you read the page (this actually happened to
an EnterpriseDB customer!), the DBA can't tell other people how to fix
it and can't fix it personally either.  Instead, the DBA calls me.


After reading this statement the ZFS filesystem pops into my mind. It 
has protection build in against various problems (data degradation, 
current spikes, phantom writes, etc).


For me this raises two questions:

1) would the usage of ZFS prevent such errors?

My feeling would say yes, but I have no idea how an invalid 
infomask bit could occur.


2) would it be possible to add such prevention to PostgreSQL

I know this could add a massive overhead, but if it is optional this 
could be a fine thing?

Postgresql is very "zfs-like" in its internals.  The problem was a bug 
in postgresql that caused it to just write data to the wrong place.


Some vendors use ZFS under databases to provide very cool services such 
as backup snapshots, test snapshots and other such uses.  I think Joyent 
is one such vendor but I'm not 100% sure.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/2/16 10:02 PM, Mark Kirkwood wrote:

On 03/08/16 02:27, Robert Haas wrote:


Personally, I think that incremental surgery on our current heap
format to try to fix this is not going to get very far.  If you look
at the history of this, 8.3 was a huge release for timely cleanup of
dead tuple.  There was also significant progress in 8.4 as a result of
5da9da71c44f27ba48fdad08ef263bf70e43e689.   As far as I can recall, we
then made no progress at all in 9.0 - 9.4.  We made a very small
improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but
that's pretty niche.  In 9.6, we have "snapshot too old", which I'd
argue is potentially a large improvement, but it was big and invasive
and will no doubt pose code maintenance hazards in the years to come;
also, many people won't be able to use it or won't realize that they
should use it.  I think it is likely that further incremental
improvements here will be quite hard to find, and the amount of effort
will be large relative to the amount of benefit.  I think we need a
new storage format where the bloat is cleanly separated from the data
rather than intermingled with it; every other major RDBMS works that
way.  Perhaps this is a case of "the grass is greener on the other
side of the fence", but I don't think so.


Yeah, I think this is a good summary of the state of play.

The only other new db development to use a non-overwriting design like 
ours that I know of was Jim Starkey's Falcon engine for (ironically) 
MySQL 6.0. Not sure if anyone is still progressing that at all now.


I do wonder if Uber could have successfully tamed dead tuple bloat 
with aggressive per-table autovacuum settings (and if in fact they 
tried), but as I think Robert said earlier, it is pretty easy to come 
up with a heavily update- (or insert + delete-) based workload that makes 
for a pretty ugly bloat component even with really aggressive autovacuuming.
I also wonder if they had used a "star schema", which to my understanding 
would mean multiple tables replacing the single table that has multiple 
indices, to work around the write amplification problem in PostgreSQL.




Cheers

Mark









Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/3/16 3:29 AM, Greg Stark wrote:


Honestly the take-away I see in the Uber story is that they apparently
had nobody on staff that was on -hackers or apparently even -general
and tried to go it alone rather than involve experts from outside
their company. As a result they misdiagnosed their problems based on
prejudices seeing what they expected to see rather than what the real
problem was.


Agree strongly, but there are still lessons to be learned on the PostgreSQL side.

-Alfred




Re: [HACKERS] 7.1 vacuum

2001-04-26 Thread Alfred Perlstein

* Magnus Naeslund(f) <[EMAIL PROTECTED]> [010426 21:17] wrote:
> How does 7.1 work now with the vacuum and all?
> 
> Does it go for indexes by default, even when i haven't run a vacuum at all?
> Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
> that for all of the vacuum?
> 
> An 7.0.3 db we have here we are forced to run vacuum every hour to get an
> acceptable speed, and while doing that vacuum (5-10 minutes) it totaly
> blocks our application that's mucking with the db.

http://people.freebsd.org/~alfred/vacfix/

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



[HACKERS] Re: 7.1 vacuum

2001-04-27 Thread Alfred Perlstein

* mlw <[EMAIL PROTECTED]> [010427 05:50] wrote:
> Alfred Perlstein wrote:
> > 
> > * Magnus Naeslund(f) <[EMAIL PROTECTED]> [010426 21:17] wrote:
> > > How does 7.1 work now with the vacuum and all?
> > >
> > > Does it go for indexes by default, even when i haven't run a vacuum at all?
> > > Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
> > > that for all of the vacuum?
> > >
> > > An 7.0.3 db we have here we are forced to run vacuum every hour to get an
> > > acceptable speed, and while doing that vacuum (5-10 minutes) it totaly
> > > blocks our application that's mucking with the db.
> > 
> > http://people.freebsd.org/~alfred/vacfix/
> 
> What's the deal with vacuum lazy in 7.1? I was looking forward to it. It was
> never clear whether or not you guys decided to put it in.
> 
> If it is in as a feature, how does one use it?
> If it is a patch, how does one get it?

If you actually download and read the enclosed READMEs it's pretty
clear.

> If it is neither a patch nor an existing feature, has development stopped?

I have no idea; I haven't been tracking PostgreSQL all that much 
since leaving the place where we contracted that work.


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/




Re: [HACKERS] Re: SAP-DB

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010429 10:44] wrote:
> > > I swore I'd never post to the hackers list again, but this is an amazing
> > > statement by Bruce.
> > > 
> > > Boy, the robustness of the software is determined by the number of characters
> > > in the directory name?
> > > 
> > > By the languages used?
> > 
> > [Snip]
> > 
> > My guess is that Bruce was implying that the code was obfuscated. It is a
> > common trick for closed source to be "open" but not really.
> > 
> > I don't think it was any sort of technology snobbery. Far be it for me to
> > suggest an explanation to the words of others, that is just how I read it.
> 
> I don't think they intentionally confused the code.
> 
> The real problem I see in that it was very hard for me to find anything
> in the code.  I would be interested to see if others can find stuff.

I think this is a general problem in a lot of projects: you open up
foo.c and say "what the heck is this..."; after a few hours of
studying the source you finally figure out it's something that does a
minuscule part X of massive part Y, and by then you're too engrossed
to write a little banner for the file or dir explaining what it's
for, and you incorrectly assume that even if you did, it wouldn't help
the next user unless he went through the same painful steps that you
did.

Been there, done that.. er, actually, still there, mostly still
doing that.  :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html




Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010429 20:14] wrote:

> Yes, I like that idea, but the problem is that it is hard to update just
> one table in the file.  You sort of have to update the entire file each
> time a table changes.  That is why I liked symlinks because they are
> per-table, but you are right that the symlink creation could fail
> because the new table file was never created or something, leaving the
> symlink pointing to nothing.  Not sure how to address this.  Is there a
> way to update a flat file when a single table changes?

Sort of, if that flat file is in the form of:
123456;"tablename   "
33;"another_table   "

i.e., each line is a fixed length.


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010429 23:12] wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > big problem is that there is no good way to make the symlinks reliable
> > because in a crash, the symlink could point to a table creation that got
> > rolled back or the renaming of a table that got rolled back.
> 
> Yes.  Have you already forgotten the very long discussion we had about
> this some months back?  There is no way to provide a reliable symlink
> mapping without re-introducing all the same problems that we went to
> numeric filenames to avoid.  Now if you want an *UNRELIABLE* symlink
> mapping, maybe we could talk about it ... but IMHO such a feature would
> be worse than useless.  Murphy's law says that the symlinks would be
> right often enough to mislead dbadmins into trusting them, and wrong
> exactly when it would do the most damage to trust them.  The same goes
> for other methods of unreliably exporting the name-to-number mapping,
> such as dumping it into a flat file.
> 
> We do need to document how to get the mapping (ie, select relfilenode,
> relname from pg_class).  But I really doubt that an automated method
> for exporting the mapping would be worth the cycles it would cost,
> even if it could be made reliable which it can't.

Perhaps an external tool to rebuild the symlink state that could be
run on an offline database.  But I'm sure you have more important
things to do. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




[HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

It would be very helpful if the COPY command could be expanded
in order to provide positional parameters.

I noticed a while back that it didn't, and it can really hurt
someone when they happen to use pg_dump to move data
from one database to another and the fields in the tables
were created in different orders.

Basically:
COPY "webmaster" FROM stdin;

could become:
COPY "webmaster" FIELDS "id", "name", "ssn" FROM stdin;

this way when sourcing it would know where to place the
fields.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [010430 08:37] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > It would be very helpful if the COPY command could be expanded
> > in order to provide positional parameters.
> 
> I think it's a bad idea to try to expand COPY into a full-tilt data
> import/conversion utility, which is the direction that this sort of
> suggestion is headed in.  COPY is designed as a simple, fast, reliable,
> low-overhead data transfer mechanism for backup and restore.  The more
> warts we add to it, the less well it will serve that purpose.

Honestly, it would be hard for COPY to serve people's needs any
less than it does now; it really makes sense for it to be able to parse
positional parameters, for both speed and correctness.

> Example: if we allow selective column import, what do we do with missing
> columns?

What is already done: if you initiate a copy into a 5-column table
using only 4 columns of copy data, the fifth is left empty.

> Must COPY now be able to handle insertion of default-value
> expressions?

No; COPY should stay what it is, simple, but at the same time useful
enough for bulk transfer without painful contortions and fear
of modifying tables.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/




Re: [HACKERS] 7.1 startup recovery failure

2001-05-01 Thread Alfred Perlstein

* Rod Taylor <[EMAIL PROTECTED]> [010430 22:10] wrote:
> Corrupted or not, after a crash take a snapshot of the data tree
> before firing it back up again.  Doesn't take that much time
> (especially with a netapp filer) and it allows for a virtually
> unlimited number of attempts to solve the trouble or debug.
> 

You run your database over NFS?  They must be made of steel. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 14:01] wrote:
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
> 
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?
> 
> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

The "problem" with log-based filesystems is that they most likely
do not know the consequences of a write, so an fsync on a file may
require double writing, to both the log and the "real" portion of
the disk.  They can also exhibit the problem that an fsync may
cause all pending writes to be scheduled, unless the log is
constructed on the fly rather than incrementally.

There was also the problem brought up recently that certain
versions (maybe all?) of Linux perform fsync() in a very
non-optimal manner; if the user is able to use the O_FSYNC option
rather than fsync, he may see a performance increase.

But his guess is probably nearly as good as mine. :)


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 15:20] wrote:
> > The "problem" with log based filesystems is that they most likely
> > do not know the consequences of a write so an fsync on a file may
> > require double writing to both the log and the "real" portion of
> > the disk.  They can also exhibit the problem that an fsync may
> > cause all pending writes to require scheduling unless the log is
> > constructed on the fly rather than incrementally.
> 
> Yes, this double-writing is a problem.  Suppose you have your WAL on a
> separate drive.  You can fsync() WAL with zero head movement.  With a
> log based file system, you need two head movements, so you have gone
> from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling.  I wonder if an fsync() may cause ALL pending
metadata to be updated (even metadata not related to the 
PostgreSQL files).

Do you know if reiser or xfs have this problem?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] elog(LOG), elog(DEBUG)

2001-05-05 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [010505 02:06] wrote:
> There's a TODO item to make elog(LOG) a separate level.  I propose the
> name INFO.  It would be identical to DEBUG in effect, only with a
> different label.  Additionally, all DEBUG logging should either be
> disabled unless the debug_level is greater than zero, or alternatively
> some elog(DEBUG) calls should be converted to INFO conditional on a
> configuration setting (like log_pid, for example).
> 
> The stricter distinction between DEBUG and INFO would also yield the
> possibility of optionally sending DEBUG output to the frontend, as has
> been requested a few times.

INFO makes sense, as AFAIK it maps to a syslog priority level.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Utilizing "direct writes" Re: [HACKERS] File system performance and pg_xlog

2001-05-05 Thread Alfred Perlstein

* Marko Kreen <[EMAIL PROTECTED]> [010505 17:39] wrote:
> 
> There already exist bazillion filesystems, _some_ of them should
> be usable for PostgreSQL too :)
> 
> Besides resource waste there are others problems with app-level
> fs:
> 
> * double-buffering and incompatibilities of avoiding that

Depends on the OS; most operating systems, like FreeBSD and Solaris,
offer character device access, which means that the OS will DMA
directly from the process's address space.  Avoiding the double
copy is trivial, except that one must align and size writes correctly,
generally on 512-byte boundaries and in 512-byte increments.

> * a lot of code should be reimplemented that already exists
>   in today's OS'es

That's true.

> * you lose all of UNIX user-space tools

Even worse. :)

> * the speed difference will not be very big.  Remeber: it _was_
>   big on OS'es and fs' in year 1990.  Today's fs are lot of
>   better and there should be a os/fs combo that is 95% perfect.

Well, here's an idea: has anyone tried using the "direct write"
interface that some OSes offer?  I doubt FreeBSD has one, but I'm
positive that Solaris offers it, as well as possibly IRIX.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] 7.0.2 crash (maybe linux kernel bug??)

2000-10-31 Thread Alfred Perlstein

* Michael J Schout <[EMAIL PROTECTED]> [001031 11:22] wrote:
> Hi.
> 
> Ive had a crash in postgresql 7.0.2.  Looking at what happened, I actually
> suspect that this is a filesystem bug, and not a postgresql bug necessarily,
> but I wanted to report it here and see if anyone else had any opinions.
> 
> The platform this happened on was linux (redhat 6.2), kernel 2.2.16 (SMP) dual
> pentium III 500MHz cpus, Mylex DAC960 raid controller running in raid5 mode.
> 
> During regular activity, I got a kernel oops.  Looking at the call trace from
> the kernel, as well as the EIP, I think maybe there is a bug here int the fs
> buffer code, and that htis is a linux kernel problem (not a postgresql
> problem).
> 
> Bug I'm no expert here.. Does this sould correct looking at the kernel erros
> below?
> 
> Sorry if this is off topic.  I just want to make sure this is a kernel bug and
> not a postgresql bug.
> 
> Mike
> 
> The oopses:
> 
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 0134 
> kernel: current->tss.cr3 = 1a325000, %%cr3 = 1a325000 
> kernel: *pde =  
> kernel: Oops: 0002 
> kernel: CPU:0 
> kernel: EIP:0010:[remove_from_queues+169/328] 
> kernel: EFLAGS: 00010206 
> kernel: eax: 0100   ebx: 0002   ecx: df022e40   edx: efba76b8 
> kernel: esi: df022e40   edi:    ebp:    esp: da327ea4 
> kernel: ds: 0018   es: 0018   ss: 0018 
> kernel: Process postmaster (pid: 11527, process nr: 51, stackpage=da327000) 
> kernel: Stack: df022e40 c012be79 df022e40 df022e40 1000 c0142cb8 c0142cc7 
>df022e40  
> kernel:ec247140 ffea ec0b026c da326000 df022e40 df022e40 df022e40 
>000a4000  
> kernel: da327f08   eff29200 1000 00a5 
>000a5000  
> kernel: Call Trace: [refile_buffer+77/184] [ext2_file_write+996/1584] 
>[ext2_file_write+1011/1584] [kfree_skbmem+51/64] [__kfree_skb+162/168] 
>[lockd:__insmod_lockd_O/lib/modules/2.2.16-3smp/fs/lockd.o_M394EA7+-76392/76] 
>[handle_IRQ_event+90/140]  
> kernel:[sys_write+240/292] [ext2_file_write+0/1584] [system_call+52/56] 
>[startup_32+43/164]  
> kernel: Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d  
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 0100 

Yes, your kernel basically segfaulted.  I would get a traceback from your
crash dump and discuss it with the kernel developers.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."


> kernel: current->tss.cr3 = 1ba46000, %%cr3 = 1ba46000 
> kernel: *pde =  
> kernel: Oops:  
> kernel: CPU:1 
> kernel: EIP:0010:[find_buffer+104/144] 
> kernel: EFLAGS: 00010206 
> kernel: eax: 0100   ebx: 0007   ecx: 00069dae   edx: 0100 
> kernel: esi: 000d   edi: 3006   ebp: 0005ce4b   esp: e53a19f4 
> kernel: ds: 0018   es: 0018   ss: 0018 
> kernel: Process postmaster (pid: 5545, process nr: 37, stackpage=e53a1000) 
> kernel: Stack: 0005ce4b 3006 00069dae c012b953 3006 0005ce4b 1000 
>c012bcc6  
> kernel:3006 0005ce4b 1000 3006 eff29200 3006 4e4b 
>ef18c960  
> kernel:c0141ee7 3006 0005ce4b 1000 0005ce4b e53a1bb0 edc3c660 
>edc3c660  
> kernel: Call Trace: [get_hash_table+23/36] [getblk+30/324] 
>[ext2_new_block+2291/2756] [getblk+271/324] [ext2_alloc_block+344/356] 
>[block_getblk+305/624] [ext2_getblk+256/524]  
> kernel:[ext2_file_write+1308/1584] [__brelse+19/84] [permission+36/248] 
>[dump_seek+53/104] [dump_seek+53/104] [dump_write+48/84] [elf_core_dump+3104/3216] 
>[do_IRQ+82/92]  
> kernel:[tcp_write_xmit+407/472] [__release_sock+36/124] 
>[tcp_do_sendmsg+2125/2144] [inet_sendmsg+0/144] [cprt+1553/20096] [cprt+1553/20096] 
>[cprt+1553/20096] [do_signal+458/724]  
> kernel:[force_sig_info+168/180] [force_sig+17/24] 
>[do_general_protection+54/160] [error_code+45/52] [signal_return+20/24]  
> kernel: Code: 8b 00 39 6a 04 75 15 8b 4c 24 20 39 4a 08 75 0c 66 39 7a 0c  



[HACKERS] Query cache import?

2000-10-31 Thread Alfred Perlstein

I never saw much traffic regarding Karel's work on making stored
procedures:

http://people.freebsd.org/~alfred/karel-pgsql.txt

What happened with this?  It looked pretty interesting. :(

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Query cache import?

2000-10-31 Thread Alfred Perlstein

* Karel Zak <[EMAIL PROTECTED]> [001031 16:18] wrote:
> 
> On Tue, 31 Oct 2000, Alfred Perlstein wrote:
> 
> > I never saw much traffic regarding Karel's work on making stored
> > proceedures:
> >
> > http://people.freebsd.org/~alfred/karel-pgsql.txt
> > 
> > What happened with this?  It looked pretty interesting. :(
> 
>  It's probably a little about me :-) ... well,
> 
>   My query cache is in usable state and it's efficient for all 
> things those motivate me to work on this.
> 
>  some basic features:
> 
>   - share parsed plans between backends in shared memory
>   - store plans to private backend hash table
>   - use parameters for stored queries
>   - better design for SPI 
>   - memory usage for saved plans
>   - save plans "by key"
> 
>  
>  The current query cache code depend on 7.1 memory management. After
> official 7.1 release I prepare patch with query cache+SPI (if not
> hit me over head, please ..)
> 
>  All what will doing next time not depend on me, *it's on code developers*.
> 
>  For example Jan has interesting idea about caching all plans which
> processing backend. But it's far future and IMHO we must go by small
> steps to Oracle's funeral :-) 

Well, I'm just hoping that Perl's $dbh->prepare() actually creates a
temporary stored procedure so that I can shave cycles off of 
my thousands upon thousands of repeated queries. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] VACUUM causes violent postmaster death

2000-11-03 Thread Alfred Perlstein

* Dan Moschuk <[EMAIL PROTECTED]> [001103 14:55] wrote:
> 
> Server process (pid 13361) exited with status 26 at Fri Nov  3 17:49:44 2000
> Terminating any active server processes...
> NOTICE:  Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died abnormally and 
>possibly corrupted shared memory.
> I have rolled back the current transaction and am going to terminate your 
>database system connection and exit.
> Please reconnect to the database system and repeat your query.
> 
> This happens fairly regularly.  I assume exit code 26 is used to dictate
> that a specific error has occurred.
> 
> The database is a decent size (~3M records) with about 4 indexes.

What version of PostgreSQL?  Tom Lane recently fixed some severe problems
with vacuum on heavily used databases; the fix should be in the latest
7.0.2-patches/7.0.3 release.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] VACUUM causes violent postmaster death

2000-11-03 Thread Alfred Perlstein

* Dan Moschuk <[EMAIL PROTECTED]> [001103 15:32] wrote:
> 
> | > This happens fairly regularly.  I assume exit code 26 is used to dictate
> | > that a specific error has occured.
> | > 
> | > The database is a decent size (~3M records) with about 4 indexes.
> | 
> | What version of postgresql?  Tom Lane recently fixed some severe problems
> | with vacuum and heavily used databases, the fix should be in the latest
> | 7.0.2-patches/7.0.3 release.
> 
> It's 7.0.2-patches from about two or three weeks ago.

Make sure pgsql/src/backend/commands/vacuum.c is at:

revision 1.148.2.1
date: 2000/09/19 21:01:04;  author: tgl;  state: Exp;  lines: +37 -19
Back-patch fix to ensure that VACUUM always calls FlushRelationBuffers.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Alpha FreeBSD port of PostgreSQL !!!

2000-11-03 Thread Alfred Perlstein

* Nathan Boeger <[EMAIL PROTECTED]> [001103 15:43] wrote:
> is anyone working on the port of PostgreSQL for Alpha  FreeBSD ?? I have
> been waiting for over a year very very patiently !!!
> 
> I really love my Alpha FreeBSD box and I want to use PostgreSQL on it...
> but postgresql does not build.
> 
> If they need a box I am more than willing to give them complete access
> to my Alpha !
> 
> please let me know

Part of the problem is that PostgreSQL assumes FreeBSD == -m486; since
I have absolutely no 'configure/automake' clue, that's where I faltered
when initially trying to compile on FreeBSD.

I have access to a FreeBSD box through the FreeBSD project and would
like to have another shot at it, but I was hoping one of the guys
more intimate with autoconf could lend me a hand.

thanks,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] Re: [GENERAL] Query caching

2000-10-31 Thread Alfred Perlstein

* Steve Wolfe <[EMAIL PROTECTED]> [001031 13:47] wrote:
> 
> > >(Incidentally,  we've toyed  around with  developping a
> query-caching
> > > system that would sit betwen PostgreSQL and our DB libraries.
> >
> >  Sounds  amazing, but  requires some  research, I  guess. However,  in
> many
> > cases one  would be  more than  happy with  cahced connections.  Of
> course,
> > cahced query results  can be naturally added to that,  but just
> connections
> > are OK to start with. Security
> 
> To me, it doesn't sound like it would be that difficult of a project, at
> least not for the likes of the PostgreSQL developpers.  It also doesn't seem
> like it would really introduce any security problems, not if it were done
> inside of PostgreSQL.  Long ago, I got sidetracked from my endeavors in C,
> and so I don't feel that I'm qualified to do it.  (otherwise, I would have
> done it already. : ) )   If you wanted it done in Perl or Object Pascal, I
> could help. : )
> 
> Here's a simple design that I was tossing back and forth.  Please
> understand that I'm not saying this is the best way to do it, or even a good
> way to do it.  Just a possible way to do it.  I haven't been able to give it
> as much thought as I would like to.  Here goes.
> 
> 
> Implementation
> 

[snip]

Karel Zak <[EMAIL PROTECTED]> implemented stored procedures for
PostgreSQL but still hasn't been approached to integrate them.

You can find his second attempt to get a response from the developers
here:

http://people.freebsd.org/~alfred/karel-pgsql.txt

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Restricting permissions on Unix socket

2000-10-31 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [001031 12:57] wrote:
> I'd like to add an option or two to restrict the set of users that can
> connect to the Unix domain socket of the postmaster, as an extra security
> option.
> 
> I imagine something like this:
> 
> unix_socket_perm = 0660
> unix_socket_group = pgusers
> 
> Obviously, permissions that don't have 6's in there don't make much sense,
> but I feel this notation is the most intuitive way for admins.
> 
> I'm not sure how to do the group thing, though.  If I use chown(2) then
> there's a race condition, but doing savegid; create socket; restoregid
> might be too awkward?  Any hints?

Set your umask to 777 then go to town.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-09 Thread Alfred Perlstein

I have a program that does a:
DECLARE getsitescursor CURSOR FOR select...

I ^C'd it, it didn't properly shut down the channel to
PostgreSQL, and I got this crash:

#0  0x4828ffc8 in kill () from /usr/lib/libc.so.4
#1  0x482cbbf2 in abort () from /usr/lib/libc.so.4
#2  0x814442f in ExcAbort () at excabort.c:27
#3  0x81443ae in ExcUnCaught (excP=0x81a6070, detail=0, data=0x0, 
message=0x819a860 "!(AllocSetContains(set, pointer))") at exc.c:170
#4  0x81443f5 in ExcRaise (excP=0x81a6070, detail=0, data=0x0, 
message=0x819a860 "!(AllocSetContains(set, pointer))") at exc.c:187
#5  0x8143ae4 in ExceptionalCondition (
conditionName=0x819a860 "!(AllocSetContains(set, pointer))", 
exceptionP=0x81a6070, detail=0x0, fileName=0x819a720 "aset.c", 
lineNumber=392) at assert.c:73
#6  0x8147897 in AllocSetFree (set=0x8465134, 
pointer=0x84e0018 "") at aset.c:392
#7  0x8148394 in PortalVariableMemoryFree (this=0x846512c, 
pointer=0x84e0018 "") at portalmem.c:204
#8  0x8147e99 in MemoryContextFree (context=0x846512c, 
pointer=0x84e0018 "") at mcxt.c:245
#9  0x81490e5 in PortalDrop (portalP=0x8467944) at portalmem.c:802
#10 0x8148715 in CollectNamedPortals (portalP=0x0, destroy=1)
at portalmem.c:442
#11 0x814880f in AtEOXact_portals () at portalmem.c:472
#12 0x80870ad in AbortTransaction () at xact.c:1053
#13 0x80872ec in AbortOutOfAnyTransaction () at xact.c:1552
#14 0x810b3d0 in PostgresMain (argc=9, argv=0xbfbff0e0, real_argc=10, 
real_argv=0xbfbffb40) at postgres.c:1643
#15 0x80f0736 in DoBackend (port=0x8464000) at postmaster.c:2009
#16 0x80f02c9 in BackendStartup (port=0x8464000) at postmaster.c:1776
#17 0x80ef4ed in ServerLoop () at postmaster.c:1037
#18 0x80eeed2 in PostmasterMain (argc=10, argv=0xbfbffb40) at postmaster.c:725
#19 0x80bf3df in main (argc=10, argv=0xbfbffb40) at main.c:93
#20 0x8063495 in _start ()

things go to pot here:
387 {
388 AllocChunk  chunk;
389 
390 /* AssertArg(AllocSetIsValid(set)); */
391 /* AssertArg(AllocPointerIsValid(pointer)); */
392 AssertArg(AllocSetContains(set, pointer));
393 
394 chunk = AllocPointerGetChunk(pointer);
395 
396 #ifdef CLOBBER_FREED_MEMORY
(gdb) print *set
$2 = {blocks = 0x0, freelist = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}
(gdb) print pointer
$3 = 0x84e0018 ""

These sources are the current CVS sources with the exception of
some removed files by Marc.

Is there any more information I can provide?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-09 Thread Alfred Perlstein

* Alfred Perlstein <[EMAIL PROTECTED]> [001109 17:07] wrote:
> I have a program that does a:
> DECLARE getsitescursor CURSOR FOR select...
> 
> I ^C'd it and it didn't properly shut down the channel to
> postgresql and I got this crash:

[snip]

> These sources are the current CVS sources with the exception of
> some removed files by Marc.
> 
> Is there any more information I can provide?

I forgot to mention, this is the latest REL7_0_PATCHES.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-09 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001109 18:30] wrote:
> I said:
> > OK, after digging some more, it seems that the critical requirement
> > is that the cursor's query contain a hash join.
> 
> Here's the deal:
> 
> test7=# set enable_mergejoin to off;
> SET VARIABLE
> test7=# begin;
> BEGIN
> -- I've previously checked that this produces a hash join plan:
> test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
> SELECT
> test7=# fetch 1 from c;
>  f1 | f1
> +
>   1 |  1
> (1 row)
> 
> test7=# abort;
> NOTICE:  trying to delete portal name that does not exist.
> pqReadData() -- backend closed the channel unexpectedly.
> This probably means the backend terminated abnormally
> before or while processing the request.
> 
> This happens with either 7.0.2 or 7.0.3 (probably with anything back to
> 6.5, if not before).  It does *not* happen with current development tip.
> 
> The problem is that two "portal" structures are used.  One holds the
> overall query plan and execution state for the cursor, and the other
> holds the hash table for the hash join.  During abort, the portal
> manager tries to delete both of them.  BUT: deleting the query plan
> causes query cleanup to be executed, which among other things deletes
> the hash join's table.  Then the portal manager tries to delete the
> already-deleted second portal, which leads first to the above notice
> and then to Assert failure (and probably would lead to coredump if
> you didn't have Asserts on).  Alternatively, it might try to delete
> the hash join portal first, which would leave the query cleanup code
> deleting an already-deleted portal, and doubtless still crashing.
> 
> Current sources don't show the problem because hashtables aren't kept
> in portals anymore.
> 
> I've thought for some time that CollectNamedPortals is a horrid kluge,
> and really ought to be rewritten.  Hadn't seen it actually do the wrong
> thing before, but now...
> 
> I guess the immediate question is do we want to hold up 7.0.3 release
> for a fix?  This bug is clearly ancient, so I'm not sure it's
> appropriate to go through a fire drill to fix it for 7.0.3.
> Comments?

I dunno, having the database crash because an errant client disconnected
without shutting down, or needed to abort a transaction, looks like
a show stopper.

We do track CVS and wouldn't have a problem shifting to 7_0_3_PATCHES,
but I'm not sure if the rest of the userbase is going to have much
fun.

It seems to be a serious problem; I think people wouldn't mind
waiting for you to squash this one.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-09 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001109 18:55] wrote:
> > I guess the immediate question is do we want to hold up 7.0.3 release
> > for a fix?  This bug is clearly ancient, so I'm not sure it's
> > appropriate to go through a fire drill to fix it for 7.0.3.
> > Comments?
> 
> We have delayed 7.0.3 already.  Tom is fixing so many bugs, we may find
> at some point that Tom never stops fixing bugs long enough for us to do
> a release.  I say let's push 7.0.3 out.  We can always do 7.0.4 later if
> we wish.

I think being able to crash the backend just by dropping a connection
during a pretty trivial query is a bad thing, and it'd be more
prudent to wait.  I have no problem syncing with your CVS,
but people using Red Hat RPMs and FreeBSD packages are going to wind
up with this bug if you cut the release before squashing it. :(

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]



Re: [HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-09 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [001109 20:19] wrote:
> On Thu, 9 Nov 2000, Tom Lane wrote:
> 
> > The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > > Tom, if you can plug this one in the next, say, 48hrs (Saturday night),
> > 
> > Done.  Want to generate some new 7.0.3 release-candidate tarballs?
> 
> Done, and just forced a sync to ftp.postgresql.org of the new tarballs
> ... if nobody reports any probs with this by ~midnight tomorrow night,
> I'll finish up the 'release links' and get vince to add release info to
> the WWW site, followed by putting out an official announcement ...
> 
> Great work, as always :)

Tom rules.

*thinking freebsd port should add user tgl rather than pgsql*

:)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-10 Thread Alfred Perlstein

* Tatsuo Ishii <[EMAIL PROTECTED]> [001110 18:42] wrote:
> > 
> > Yes, though we can change this. We also can implement now
> > feature that Bruce wanted so long and so much -:) -
> > fsync log not on each commit but each ~ 5sec, if
> > losing some recent commits is acceptable.
> 
> Sounds great.

Not really.  I thought an ack on a commit would mean that the data
is actually in stable storage; breaking that would be pretty bad,
no?  Or are you only talking about when someone is running
PostgreSQL in async mode?

Although this doesn't affect my current application: when running
PostgreSQL with sync commits and WAL, can one expect the old
behavior, i.e. success only after data and metadata (log)
are written?

Another question I had: what effect would a mid-fsync crash have
on a system using WAL?  Let's say someone yanks the power while
the OS is in the midst of an fsync; will all be OK?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 7.0.2 dies when connection dropped mid-transaction

2000-11-11 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [001109 20:19] wrote:
> On Thu, 9 Nov 2000, Tom Lane wrote:
> 
> > The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > > Tom, if you can plug this one in the next, say, 48hrs (Saturday night),
> > 
> > Done.  Want to generate some new 7.0.3 release-candidate tarballs?
> 
> Done, and just forced a sync to ftp.postgresql.org of the new tarballs
> ... if nobody reports any probs with this by ~midnight tomorrow night,
> I'll finish up the 'release links' and get vince to add release info to
> the WWW site, followed by putting out an official announcement ...
> 
> Great work, as always :)

Just wanted to confirm that we haven't experienced the bug since we
applied Tom's patch several days ago.

thanks for the excellent work!

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] cygwin gcc problem.

2000-11-11 Thread Alfred Perlstein

* Gary MacDougall <[EMAIL PROTECTED]> [00 11:28] wrote:
> I'm trying to compile postgresql on Windows 2000.  I've followed the directions accordingly.
> 
> When I run the "configure" script, and I get the following error message:
> 
> 
> configure: error: installation or configuration problem: C compiler cannot create executables.
> 
> If anyone has any clues, I'd greatly appreciate the assistance.

I think you need to ask on the cygwin lists.  If you're compiling
this on Windows 2000 you already need a compiler to compile it.

I would just find the binary distribution and install that.

-Alfred



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-11 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [00 12:06] wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack.  You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
> 
> > It does not give up consistency.  The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
> 
> No, it isn't consistent.  Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log.  (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is.  I haven't looked at the log structure in
> any detail...)

This could be fixed by using O_FSYNC on the open call for the WAL
data files on *BSD; I'm not sure of the SysV equivalent, but I know
it exists.
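For what it's worth, the idea can be sketched like this (a minimal sketch under stated assumptions: `open_wal_sync` is an illustrative name, not PostgreSQL's actual API, and POSIX spells the BSD `O_FSYNC` flag `O_SYNC`):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef O_FSYNC
#define O_FSYNC O_SYNC          /* POSIX/SysV spelling of the BSD flag */
#endif

/* Open a WAL segment so that every write() returns only once the
 * data has reached stable storage, making a separate fsync() per
 * commit unnecessary. */
int open_wal_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_FSYNC, 0600);
}
```

The trade-off is that every write becomes synchronous, so this forgoes any chance of batching several commit records into one flush.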

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 486 Optimizations...

2000-11-14 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001114 13:42] wrote:
> Anyone care if I build a patch to kill the -m486 type options in the
> following files:
> 
> $ grep -i -- 486 *
> bsdi:  i?86)  CFLAGS="$CFLAGS -m486";;
> freebsd:CFLAGS='-O2 -m486 -pipe'
> univel:CFLAGS='-v -O -K i486,host,inline,loop_unroll -Dsvr4'
> $ pwd
> /home/ler/pg-dev/pgsql/src/template
> $

I have a patch pending for FreeBSD to support alpha builds that
also disables -m486 so if you left the freebsd template alone it
would be ok.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 486 Optimizations...

2000-11-14 Thread Alfred Perlstein

* Trond Eivind Glomsrød <[EMAIL PROTECTED]> [001114 13:45] wrote:
> Larry Rosenman <[EMAIL PROTECTED]> writes:
> 
> > Anyone care if I build a patch to kill the -m486 type options in the
> > following files:
> > 
> > $ grep -i -- 486 *
> > bsdi:  i?86)  CFLAGS="$CFLAGS -m486";;
> > freebsd:CFLAGS='-O2 -m486 -pipe'
> > univel:CFLAGS='-v -O -K i486,host,inline,loop_unroll -Dsvr4'
> 
> Why would you want to? Not all gccs support -mpentium/mpentiumpro etc.

The idea is to remove it entirely (I hope), not to add even more
arch-specific compile flags.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] 486 Optimizations...

2000-11-14 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001114 13:47] wrote:
> * Alfred Perlstein <[EMAIL PROTECTED]> [001114 15:46]:
> > * Larry Rosenman <[EMAIL PROTECTED]> [001114 13:42] wrote:
> > > Anyone care if I build a patch to kill the -m486 type options in the
> > > following files:
> > > 
> > > $ grep -i -- 486 *
> > > bsdi:  i?86)  CFLAGS="$CFLAGS -m486";;
> > > freebsd:CFLAGS='-O2 -m486 -pipe'
> > > univel:CFLAGS='-v -O -K i486,host,inline,loop_unroll -Dsvr4'
> > > $ pwd
> > > /home/ler/pg-dev/pgsql/src/template
> > > $
> > 
> > I have a patch pending for FreeBSD to support alpha builds that
> > also disables -m486 so if you left the freebsd template alone it
> > would be ok.
> I have a P-III, I don't want the template to specify it *AT ALL*. 
> (this is on FreeBSD 4.2-BETA). 

My patches set i386-FreeBSD to -O2 and alpha-FreeBSD to -O, no
worries.

> It seems like GCC does the right (or mostly right) thing without 
> the -m option

heh. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] IRC?

2000-11-14 Thread Alfred Perlstein

I remember a few developers used to gather on EFnet IRC; there was
a lot of instability recently, but that seems to have cleared up.

Are you guys planning on coming back?  Or have you all
moved to a different network?


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: One more [HACKERS] 486 Optimizations...

2000-11-14 Thread Alfred Perlstein

* igor <[EMAIL PROTECTED]> [001114 20:46] wrote:
> Hi ,
> 
> I would like to increase perfomance of PG 7.02 on i486,
> where can I read about this ? May be there is any flags for
> postgres ?

Check your C compiler's manpage for the relevant optimization
flags, and be aware that some compilers can emit broken code when
pushed to their highest optimization levels.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-16 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001116 12:09] wrote:
> * Bruce Momjian <[EMAIL PROTECTED]> [001116 14:02]:
> > > > This sounds like an interesting approach, yes.
> > > Question: Is sleep(0) guaranteed to at least give up control? 
> > > 
> > > The way I read my UnixWare 7's man page, it might not, since alarm(0)
> > > just cancels the alarm...
> > 
> > Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
> > call return.
> BUT, do we know for sure that sleep(0) is not optimized in the library
> to just return? 

sleep(3) should conform to the POSIX specification; if anyone has
the reference, they can check what the effect of sleep(0)
should be.
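As it turns out, the concern above is well founded: POSIX only requires sleep(0) to return 0 without suspending the caller, so a libc is free to make it a near no-op.  The call POSIX actually specifies for relinquishing the processor is sched_yield(), as this sketch illustrates (`yield_cpu` is a hypothetical helper name):

```c
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Give another runnable process (e.g. one about to fsync the log)
 * a chance to run before we re-check whether we still need to. */
int yield_cpu(void)
{
    sleep(0);               /* legal for libc to return immediately */
    return sched_yield();   /* specified to yield; returns 0 on success */
}
```

So if the goal is "give up the CPU slice so someone else fsyncs first", sched_yield() is the portable spelling, not sleep(0).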

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] location of Unix socket

2000-11-17 Thread Alfred Perlstein

* Oliver Elphick <[EMAIL PROTECTED]> [001117 16:41] wrote:
> At present the Unix socket's location is hard-coded as /tmp.
> 
> As a result of a bug report, I have moved it in the Debian package to 
> /var/run/postgresql/.  (The bug was that tmpreaper was deleting it and
> thus blocking new connections.)
> 
> I suppose that we cannot assume that /var/run exists across all target
> systems, so could the socket location be made a configurable parameter
> in 7.1?

What about X sockets and ssh-agent sockets, and so on?

Where's the source to this thing? :)

It would make more sense to fix tmpreaper to ignore non-regular
files.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-11 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [00 00:16] wrote:
> > * Tatsuo Ishii <[EMAIL PROTECTED]> [001110 18:42] wrote:
> > > > 
> > > > Yes, though we can change this. We also can implement now
> > > > feature that Bruce wanted so long and so much -:) -
> > > > fsync log not on each commit but each ~ 5sec, if
> > > > losing some recent commits is acceptable.
> > > 
> > > Sounds great.
> > 
> > Not really, I thought an ack on a commit would mean that the data
> > is actually in stable storage, breaking that would be pretty bad
> > no?  Or are you only talking about when someone is running with
> > async Postgresql?
> 
> The default is to sync on commit, but we need to give people options of
> several seconds delay for performance reasons.  Informix calls it
> buffered logging, and it is used by most of the sites I know because it
> has much better performance than sync on commit.
> 
> If the machine crashes five seconds after commit, many people don't have
> a problem with just re-entering the data.

We have several critical tables and running certain updates/deletes/inserts
on them in async mode worries me.  Would it be possible to add a
'set' command to force a backend into fsync mode and perhaps back
into non-fsync mode as well?

What about setting an attribute on a table that could mean
a) anyone updating me better fsync me.
b) anyone updating me better fsync me as well as fsyncing
   anything else they touch.

I swear one of these days I'm going to get more familiar with the
codebase and actually submit some useful patches for the backend.
:(

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]



Re: [HACKERS] 486 Optimizations...

2000-11-15 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [001115 08:15] wrote:
> 
> I couldn't say I like these options, because they seem arbitrary, but
> given that it only affects the 0 univel users and the 3 bsdi users left
> (freebsd will be fixed), I wouldn't make a fuzz.

BSDi still has a market niche, and they are actively porting to
more platforms.

> 
> I do feel more strongly about removing '-pipe', but it's not something I'm
> going to pursue.

Why?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Alpha FreeBSD port of PostgreSQL !!!

2000-11-03 Thread Alfred Perlstein

* Peter Eisentraut <[EMAIL PROTECTED]> [001103 16:16] wrote:
> Alfred Perlstein writes:
> 
> > Part of the problem is that Postgresql assumes FreeBSD == -m486,
> 
> If that's all then go into src/template/freebsd and remove it.

ok, thanks for the pointer, I'll try to have some patches in the
near future.

> The interesting question is whether the spinlock code, which was written
> for Alpha/Linux, works (src/include/storage/s_lock.h).  All the rest
> should work out of the box.

I'll see. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-16 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001116 08:59] wrote:
> [ Charset ISO-8859-1 unsupported, converting... ]
> > > > > Earlier, Vadim was talking about arranging to share fsyncs of the WAL
> > > > > log file across transactions (after writing your commit record to the
> > > > > log, sleep a few milliseconds to see if anyone else fsyncs before you
> > > > > do; if not, issue the fsync yourself).  That would offer less-than-
> > > > > one-fsync-per-transaction performance without giving up any 
> > > > > guarantees.
> > > > > If people feel a compulsion to have a tunable parameter, let 'em tune
> > > > > the length of the pre-fsync sleep ...
> > > > 
> > > > Already implemented (without ability to tune this parameter - 
> > > > xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
> > > > backend sleeps 1/200 sec before checking/forcing log fsync.
> > > 
> > > But it returns _completed_ to the client before sleeping, right?
> > 
> > No.
> 
> Ewe, so we have this 1/200 second delay for every transaction.  Seems
> bad to me.

I think as long as it becomes a tunable this isn't a bad idea at
all.  Fixing it at 1/200 isn't so great because people not wrapping
large amounts of inserts/updates with transaction blocks will
suffer.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-16 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001116 11:59] wrote:
> > At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:
> > 
> > >> I think the default should probably be no delay, and the documentation
> > >> on enabling this needs to be clear and obvious (i.e. hard to miss).
> > >
> > >I just talked to Tom Lane about this.  I think a sleep(0) just before
> > >the flush would be the best.  It would reliquish the cpu slice if
> > >another process is ready to run.  If no other backend is running, it
> > >probably just returns.  If there is another one, it gives it a chance to
> > >complete.  On return from sleep(0), it can check if it still needs to
> > >flush.  This would tend to bunch up flushers so they flush only once,
> > >while not delaying cases where only one backend is running.
> > 
> > This sounds like an interesting approach, yes.
> 
> In OS kernel design, you try to avoid process herding bottlenecks. 
> Here, we want them herded, and giving up the CPU may be the best way to
> do it.

Yes, but if everyone yields you're back where you started, and with
128 or more backends do you really want to cause possibly that many
context switches per fsync?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-16 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001116 13:31] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > It might make more sense to keep a private copy of the last time
> > the file was modified per-backend by that particular backend and
> > a timestamp of the last fsync shared globally so one can forgo the
> > fsync if "it hasn't been dirtied by me since the last fsync"
> > This would provide a rendezvous point for the fsync call although
> > cost more as one would need to periodically call gettimeofday to
> > set the modified by me timestamp as well as the post-fsync shared
> > timestamp.
> 
> That's the hard way to do it.  We just need to keep track of the
> endpoint of the log as of the last fsync.  You need to fsync (after
> returning from sleep()) iff your commit record position > fsync
> endpoint.  No need to ask the kernel for time-of-day.

Well, that breaks when you move to an overwriting storage manager;
however, if you used an OID instead, that optimization would survive
the change to an overwriting storage manager, no?

-Alfred



Re: [HACKERS] RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

2000-11-16 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001116 12:31] wrote:
> > > In OS kernel design, you try to avoid process herding bottlenecks. 
> > > Here, we want them herded, and giving up the CPU may be the best way to
> > > do it.
> > 
> > Yes, but if everyone yeilds you're back where you started, and with
> > 128 or more backends do you really want to cause possibly that many
> > context switches per fsync?
> 
> You are going to kernel call/yield anyway to fsync, so why not try and
> if someone does the fsync, we don't need to do it.  I am suggesting
> re-checking the need for fsync after the return from sleep(0).

It might make more sense for each backend to keep a private copy of
the last time it modified the file, plus a globally shared timestamp
of the last fsync, so one can forgo the fsync if "it hasn't been
dirtied by me since the last fsync".

This would provide a rendezvous point for the fsync call, although
it costs more, as one would need to periodically call gettimeofday
to set the modified-by-me timestamp as well as the post-fsync shared
timestamp.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: LOCK Fixes/Break on FreeBSD 4.2-STABLE

2000-11-28 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001128 20:44] wrote:
> * Tom Lane <[EMAIL PROTECTED]> [001128 22:31]:
> > Larry Rosenman <[EMAIL PROTECTED]> writes:
> > > The last batch of commits break on FreeBSD 4.2-STABLE. 
> > > /usr/include/machine/lock.h:148: conflicting types for `s_lock'
> > > ../../../src/include/storage/s_lock.h:402: previous declaration of `s_lock'
> > 
> > That's odd.  s_lock has been declared the same way right along in our
> > code; I didn't change it.  Can you see what's changed to cause a
> > conflict where there was none before?
> > 
> > regards, tom lane
> Other things that may be an issue:
> 
> 1) BINUTILS 2.10.1
> 2) OPENSSL 0.9.6 
> 
> both just MFC'd into FreeBSD recently, but I believe we built until
> tonite. 
> 
> I can make you an account on the box if you'd like

My significant other just installed a fresh copy of 4.2 last night;
unfortunately the poor box is only a 233MHz machine, so it'll be a
while before we build -stable on it.

However, I'm confident I can have a fix within a couple of days.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Initdb not running on beos

2000-11-28 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001128 20:48] wrote:
> Adam Haberlach <[EMAIL PROTECTED]> writes:
> > On Mon, Nov 27, 2000 at 04:09:46PM -0500, Tom Lane wrote:
> >> Somewhere right around here is where I am going to ask why we are
> >> entertaining the idea of a BeOS port in the first place... it's
> >> evidently not Unix or even trying hard to be close to Unix.
> 
> > You've asked this before.
> 
> > How does Windows manage to work?
> 
> Objection!  Point not in evidence!
> 
> ;-)
> 
> Seriously, we do not pretend to run on Windows.  It does seem to be
> possible to run Postgres atop Cygwin's Unix emulation atop Windows.
> However, that's only because of some superhuman efforts from the
> Cygwin team, not because Windows is a Postgres-compatible platform.
> 
> As far as the original question goes, I suspect that a rename() would
> work just as well as the link()/unlink() combo that's in that code now.
> I would have no objection to a submitted patch along that line.  But the
> target audience for Postgres is POSIX-compatible platforms, and I do not
> think that the core group of developers should be spending much time on
> hacking the code to work on platforms that can't meet the POSIX spec.
> If anyone else wants to make that happen, we'll accept patches ... but
> don't expect us to supply solutions, OK?

AFAIK the atomicity of rename() (the same as a link()/unlink() pair)
is specified by POSIX.

Sorry for jumping in late in the thread, but rename() sure sounds a
lot better than a link()/unlink() pair, though I'm probably taking it
out of context.
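The atomic-replace idiom under discussion looks roughly like this (a sketch, not initdb's actual code; `atomic_replace` is an illustrative name):

```c
#include <assert.h>
#include <stdio.h>

/* Build the new file under a temporary name, then rename() it over
 * the target.  POSIX specifies that rename() atomically replaces an
 * existing target, so a reader always sees either the complete old
 * file or the complete new one, which is the property the
 * link()/unlink() pair was approximating. */
int atomic_replace(const char *tmp, const char *target, const char *data)
{
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    fputs(data, f);
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, target);     /* 0 on success */
}
```

Unlike link(), rename() does not fail if the target already exists, which is exactly why it is the simpler primitive here.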

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: LOCK Fixes/Break on FreeBSD 4.2-STABLE

2000-11-28 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001128 20:52] wrote:
> My offer stands for you as well, if you'd like an account
> on this P-III 600E, you are welcome to one...

I just remembered my laptop in the other room; it's a pretty recent 4.2.

I'll give it shot.

Yes, it's possible to forget about a computer...
   http://people.freebsd.org/~alfred/images/lab.jpg

:)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001201 14:42] wrote:
> I've just noticed that COPY BINARY is pretty thoroughly broken by TOAST,
> because what it does is to dump out verbatim the bytes making up each
> tuple of the relation.  In the case of a moved-off value, you'll get
> the toast reference, which is not going to be too helpful for reloading
> the table data.  In the case of a compressed-in-line datum, you'll at
> least have all the data there, but the COPY BINARY reader will crash
> and burn when it sees it.
> 
> Fixing this while retaining backwards compatibility with the existing
> COPY BINARY file format is possible, but it seems rather a headache:
> we'd need to detoast all the toasted columns, then heap_formtuple a
> new tuple containing the expanded data, and finally write that out.
> (Can't do it on a field-by-field basis because the file format requires
> the total tuple size to precede the tuple data.)  Kind of ugly.
> 
> The existing COPY BINARY file format is entirely brain-dead anyway; for
> example, it wants the number of tuples to be stored at the front, which
> means we have to scan the whole relation an extra time to get that info.
> Its handling of nulls is bizarre, too.  I'm thinking this might be a
> good time to abandon backwards compatibility and switch to a format
> that's a little easier to read and write.  Does anyone have an opinion
> pro or con about that?

COPY BINARY scared the bejeezus out of me; anyone using the interface
is asking for trouble, and supporting it seems like a nightmare.  I
would rip it out.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001201 14:57] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > I would rip it out.
> 
> I thought about that too, but was afraid to suggest it ;-)

I think you'd agree that you have more fun and important things to
do than to deal with this yucky interface. :)

> How many people are actually using COPY BINARY?

I'm not using it. :)

How about adding COPY XML?











(kidding of course about the XML, but it would make postgresql more
buzzword compliant :) )

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: LOCK Fixes/Break on FreeBSD 4.2-STABLE

2000-12-05 Thread Alfred Perlstein

* Larry Rosenman <[EMAIL PROTECTED]> [001128 20:44] wrote:
> * Tom Lane <[EMAIL PROTECTED]> [001128 22:31]:
> > Larry Rosenman <[EMAIL PROTECTED]> writes:
> > > The last batch of commits break on FreeBSD 4.2-STABLE. 
> > > /usr/include/machine/lock.h:148: conflicting types for `s_lock'
> > > ../../../src/include/storage/s_lock.h:402: previous declaration of `s_lock'
> > 
> > That's odd.  s_lock has been declared the same way right along in our
> > code; I didn't change it.  Can you see what's changed to cause a
> > conflict where there was none before?
> > 
> > regards, tom lane
> Other things that may be an issue:
> 
> 1) BINUTILS 2.10.1
> 2) OPENSSL 0.9.6 
> 
> both just MFC'd into FreeBSD recently, but I believe we built until
> tonite. 
> 
> I can make you an account on the box if you'd like

Grr, couldn't find the original message.  I think you thought
you solved your problem with building on FreeBSD; however, I
think you just forgot to compile with perl support enabled.

When I compiled with perl support it broke.  This isn't a
PostgreSQL bug, nor really a FreeBSD bug, although the fault lies
mostly with FreeBSD for polluting the C namespace a _lot_
when sys/mount.h is included.

What happens is that the perl code brings in perl.h, which brings
in sys/mount.h.  sys/mount.h includes sys/lock.h because our
kernel structure "mount" has a VFS lock in it; VFS locks happen
to contain spinlocks (simplelocks), and one of our functions
to manipulate the simplelocks is called s_lock().  This causes
a namespace conflict, which causes the compile error.

Anyhow, to address the problem I've removed struct mount from
userland visibility in both FreeBSD 5.x (current) and FreeBSD 4.x
(stable).

Things should work now but let me know if you have any other
problems.

And thanks for pointing it out and offering to help track it
down.

here's the patch if you don't want to cvsup your machine all the
way.

Index: sys/sys/mount.h
===
RCS file: /home/ncvs/src/sys/sys/mount.h,v
retrieving revision 1.89
diff -u -r1.89 mount.h
--- sys/sys/mount.h 2000/01/19 06:07:34 1.89
+++ sys/sys/mount.h 2000/12/04 20:00:54
@@ -46,7 +46,9 @@
 #endif /* !_KERNEL */
 
 #include 
+#ifdef _KERNEL
 #include 
+#endif
 
 typedef struct fsid { int32_t val[2]; } fsid_t;/* file system id type */
 
@@ -99,6 +101,7 @@
long    f_spare[2]; /* unused spare */
 };
 
+#ifdef _KERNEL
 /*
  * Structure per mounted file system.  Each mounted file system has an
  * array of operations and an instance record.  The file systems are
@@ -122,6 +125,7 @@
time_t  mnt_time;   /* last time written*/
u_int   mnt_iosize_max; /* max IO request size */
 };
+#endif /* _KERNEL */
 
 /*
  * User specifiable flags.
Index: usr.bin/fstat/cd9660.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/fstat/cd9660.c,v
retrieving revision 1.1.2.1
diff -u -r1.1.2.1 cd9660.c
--- usr.bin/fstat/cd9660.c  2000/07/02 10:20:24 1.1.2.1
+++ usr.bin/fstat/cd9660.c  2000/12/04 23:35:21
@@ -46,7 +46,9 @@
 #include 
 #include 
 #include 
+#define _KERNEL
 #include 
+#undef _KERNEL
 
 #include 
 
Index: usr.bin/fstat/fstat.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/fstat/fstat.c,v
retrieving revision 1.21.2.2
diff -u -r1.21.2.2 fstat.c
--- usr.bin/fstat/fstat.c   2000/07/02 10:28:38 1.21.2.2
+++ usr.bin/fstat/fstat.c   2000/12/04 20:01:08
@@ -66,8 +66,8 @@
 #include 
 #include 
 #include 
-#undef _KERNEL
 #include 
+#undef _KERNEL
 #include 
 #include 
 #include 

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] Spinlocks may be broken.

2000-12-05 Thread Alfred Perlstein

I'm debugging some code here where I get problems related to
spinlocks.  Anyhow, while reading through the files I noticed
that the UNLOCK code seems sort of broken.

What I mean is that on machines that have loosely ordered
memory models you can have problems because of data that's
supposed to be protected by the lock not getting flushed
out to main memory until possibly after the unlock happens.

I'm pretty sure you guys need memory barrier ops.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).

2000-12-05 Thread Alfred Perlstein

On FreeBSD 4.1.1 and above there's a sysctl tunable called
kern.ipc.shm_use_phys; when set to 1 it's supposed to
make the kernel's handling of shared memory much more
efficient at the expense of making the shm segment unpageable.

I tried to use this option with 7.0.3 and FreeBSD 4.2 but
for some reason spinlocks keep getting mucked up (there's
a log at the tail end of this message).

Anyone using Postgresql on FreeBSD probably wants this to work;
otherwise, using extremely large chunks of shm with many active
backends can exhaust kernel memory.

I was wondering if any of the more experienced developers could
take a look at what's happening here.

Here's the log.  The number in parens is the address of the lock;
on tas() the value printed to the right is the value in _ret, and
for the others it's the value before the lock count is set.

S_INIT_LOCK: (0x30048008) -> 0
S_UNLOCK: (0x30048008) -> 0
S_INIT_LOCK: (0x3004800c) -> 0
S_UNLOCK: (0x3004800c) -> 0
S_INIT_LOCK: (0x30048010) -> 0
S_UNLOCK: (0x30048010) -> 0
S_INIT_LOCK: (0x30048011) -> 0
S_UNLOCK: (0x30048011) -> 0
S_INIT_LOCK: (0x30048012) -> 0
S_UNLOCK: (0x30048012) -> 0
S_INIT_LOCK: (0x30048018) -> 0
S_UNLOCK: (0x30048018) -> 0
S_INIT_LOCK: (0x3004801c) -> 0
S_UNLOCK: (0x3004801c) -> 0
S_INIT_LOCK: (0x3004801d) -> 1
S_UNLOCK: (0x3004801d) -> 1
S_INIT_LOCK: (0x3004801e) -> 0
S_UNLOCK: (0x3004801e) -> 0
S_INIT_LOCK: (0x30048024) -> 127
S_UNLOCK: (0x30048024) -> 127
S_INIT_LOCK: (0x30048028) -> 255
S_UNLOCK: (0x30048028) -> 255
S_INIT_LOCK: (0x30048029) -> 0
S_UNLOCK: (0x30048029) -> 0
S_INIT_LOCK: (0x3004802a) -> 0
S_UNLOCK: (0x3004802a) -> 0
S_INIT_LOCK: (0x30048030) -> 1
S_UNLOCK: (0x30048030) -> 1
S_INIT_LOCK: (0x30048034) -> 0
S_UNLOCK: (0x30048034) -> 0
S_INIT_LOCK: (0x30048035) -> 0
S_UNLOCK: (0x30048035) -> 0
S_INIT_LOCK: (0x30048036) -> 0
S_UNLOCK: (0x30048036) -> 0
S_INIT_LOCK: (0x3004803c) -> 50
S_UNLOCK: (0x3004803c) -> 50
S_INIT_LOCK: (0x30048040) -> 10
S_UNLOCK: (0x30048040) -> 10
S_INIT_LOCK: (0x30048041) -> 0
S_UNLOCK: (0x30048041) -> 0
S_INIT_LOCK: (0x30048042) -> 0
S_UNLOCK: (0x30048042) -> 0
S_INIT_LOCK: (0x30048048) -> 1
S_UNLOCK: (0x30048048) -> 1
S_INIT_LOCK: (0x3004804c) -> 80
S_UNLOCK: (0x3004804c) -> 80
S_INIT_LOCK: (0x3004804d) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_INIT_LOCK: (0x3004804e) -> 0
S_UNLOCK: (0x3004804e) -> 0
S_INIT_LOCK: (0x30048054) -> 0
S_UNLOCK: (0x30048054) -> 0
S_INIT_LOCK: (0x30048058) -> 1
S_UNLOCK: (0x30048058) -> 1
S_INIT_LOCK: (0x30048059) -> 1
S_UNLOCK: (0x30048059) -> 1
S_INIT_LOCK: (0x3004805a) -> 0
S_UNLOCK: (0x3004805a) -> 0
S_INIT_LOCK: (0x30048060) -> 0
S_UNLOCK: (0x30048060) -> 0
S_INIT_LOCK: (0x30048064) -> 0
S_UNLOCK: (0x30048064) -> 0
S_INIT_LOCK: (0x30048065) -> 0
S_UNLOCK: (0x30048065) -> 0
S_INIT_LOCK: (0x30048066) -> 0
S_UNLOCK: (0x30048066) -> 0
S_INIT_LOCK: (0x3004806c) -> 0
S_UNLOCK: (0x3004806c) -> 0
S_INIT_LOCK: (0x30048070) -> 0
S_UNLOCK: (0x30048070) -> 0
S_INIT_LOCK: (0x30048071) -> 0
S_UNLOCK: (0x30048071) -> 0
S_INIT_LOCK: (0x30048072) -> 0
S_UNLOCK: (0x30048072) -> 0
S_INIT_LOCK: (0x30048078) -> 0
S_UNLOCK: (0x30048078) -> 0
S_INIT_LOCK: (0x3004807c) -> 0
S_UNLOCK: (0x3004807c) -> 0
S_INIT_LOCK: (0x3004807d) -> 0
S_UNLOCK: (0x3004807d) -> 0
S_INIT_LOCK: (0x3004807e) -> 0
S_UNLOCK: (0x3004807e) -> 0
tas (0x30048054) -> 0
tas (0x30048059) -> 0
tas (0x30048058) -> 0
S_UNLOCK: (0x30048054) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 0
tas (0x3004804c) -> 0
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
S_UNLOCK: (0x3004804c) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 0
tas (0x3004804c) -> 0
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
S_UNLOCK: (0x3004804c) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 4
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1

repeats (it's stuck)


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: LOCK Fixes/Break on FreeBSD 4.2-STABLE

2000-12-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001205 07:14] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > Anyhow, to address the problem I've removed struct mount from
> > userland visibility in both FreeBSD 5.x (current) and FreeBSD 4.x
> > (stable).
> 
> That might fix things on your box, but we can hardly rely on it as an
> answer for everyone running FreeBSD :-(.
> 
> Anyway, I've already worked around the problem by rearranging the PG
> headers so that plperl doesn't need to import s_lock.h ...

Well, I didn't say it was completely our fault; it's just that we
try pretty hard not to let those types of structs leak into userland,
and for us to "steal" something called s_lock from userland, well,
that's no good. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Spinlocks may be broken.

2000-12-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001205 07:24] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > I'm pretty sure you guys need memory barrier ops.
> 
> On a machine that requires such a thing, the assembly code for UNLOCK
> should include it.  Want to provide a patch?

My assembler is extremely rusty; you can probably find such code
in the NetBSD or Linux kernels for all the archs you want to support.
I wouldn't feel confident providing a patch, as all I have is x86
hardware.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).

2000-12-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001205 07:43] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > Here's the log, the number in parens is the address of the lock,
> > on tas() the value printed to the right is the value in _ret,
> > for the others, it's the value before the lock count is set.
> 
> This looks to be the trace of a SpinAcquire()
> (see src/backend/storage/ipc/spin.c):

Yes, those are my debug printfs :).

> > tas (0x30048048) -> 0
> > tas (0x3004804d) -> 0
> > tas (0x3004804c) -> 0
> > S_UNLOCK: (0x30048048) -> 1
> 
> followed by SpinRelease():
> 
> > tas (0x30048048) -> 0
> > S_UNLOCK: (0x3004804c) -> 1
> > S_UNLOCK: (0x3004804d) -> 1
> > S_UNLOCK: (0x30048048) -> 1
> 
> followed by a failed attempt to reacquire the same SLock:
> 
> > tas (0x30048048) -> 0
> > tas (0x3004804d) -> 4
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> 
> And that looks completely broken :-( ... something's clobbered the
> exlock field of the SLock struct, apparently.  Are you sure this
> kernel feature you're trying to use actually works?

No, I'm not sure, actually. :)  I'll look into it further, but I
was wondering if there was something I could do to debug the
locks better.  I think I'll add an S_MAGIC field or something to
the struct to see if the whole thing is getting clobbered or
what...  If you have any suggestions, let me know.

> BTW, if you're wondering why an SLock needs to contain *three*
> hardware spinlocks, the answer is that it doesn't.  This code has
> been greatly simplified in current sources...

It did look a bit strange...

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).

2000-12-05 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001205 08:37] wrote:
> BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> SpinAcquire() all live in their own little shm segment.  On a machine
> where slock_t is char, it'd likely only amount to 128 bytes or so.
> Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> segments?

Good call, I think I found it! :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] beta testing version

2000-12-05 Thread Alfred Perlstein

> > > I totaly missed your point here. How closing source of 
> > > ERserver is related to closing code of PostgreSQL DB server?
> > > Let me clear things:
> > >
> > > 1. ERserver isn't based on WAL. It will work with any version >= 6.5
> > >
> > > 2. WAL was partially sponsored by my employer, Sectorbase.com,
> > > not by PG, Inc.
> > 
> > Has somebody thought about putting PG in the GPL licence 
> > instead of the BSD? 
> > PG inc would still be able to do there money giving support 
> > (just like IBM, HP and Compaq are doing there share with Linux),
> > without been able to close the code.

This gets brought up every couple of months.  I don't see the point
in denying any of the current Postgresql developers the chance
to make some money selling a non-freeware version of Postgresql.

We can also look at it another way: let's say ERserver was meant
to be closed source.  If the code it was derived from had been GPL'd,
then that chance was gone before it even happened, hence no
reason to develop it.

*poof* no ERserver.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).

2000-12-05 Thread Alfred Perlstein

* Alfred Perlstein <[EMAIL PROTECTED]> [001205 12:30] wrote:
> * Tom Lane <[EMAIL PROTECTED]> [001205 08:37] wrote:
> > BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> > SpinAcquire() all live in their own little shm segment.  On a machine
> > where slock_t is char, it'd likely only amount to 128 bytes or so.
> > Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> > segments?
> 
> Good call, i think I found it! :)

Here's the patch I'm using on FreeBSD, it seems to work, if any
other FreeBSD'ers want to try it out, just apply the patch:
cd /usr/src/sys/vm ; patch < patchfile

and recompile and boot with a new kernel, then do this:

sysctl -w kern.ipc.shm_use_phys=1

or add:
kern.ipc.shm_use_phys=1 
to /etc/sysctl.conf

Let me know if it works.

thanks,
-Alfred

Index: phys_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/phys_pager.c,v
retrieving revision 1.3.2.1
diff -u -u -r1.3.2.1 phys_pager.c
--- phys_pager.c2000/08/04 22:31:11 1.3.2.1
+++ phys_pager.c2000/12/05 20:13:25
@@ -83,7 +83,7 @@
 * Allocate object and associate it with the pager.
 */
object = vm_object_allocate(OBJT_PHYS,
-   OFF_TO_IDX(foff + size));
+   OFF_TO_IDX(foff + PAGE_MASK + size));
object->handle = handle;
TAILQ_INSERT_TAIL(&phys_pager_object_list, object,
pager_object_list);



[HACKERS] Re: Sorry

2000-12-05 Thread Alfred Perlstein

* Randy Jonasz <[EMAIL PROTECTED]> [001205 14:31] wrote:
> 
> Sorry about that email.  I was trying to forward your comments to a friend
> and due to a lack of sleep I just typed "R" in pine. Doh!

That's ok.  You work with Dan Moschuk, right?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).

2000-12-05 Thread Alfred Perlstein

* Oleg Bartunov <[EMAIL PROTECTED]> [001205 13:33] wrote:
> Alfred,
> 
> do you have any numbers with and without your patch ?
> I mean performance. You may use pg_check utility.

Er, I just made the patch a couple of hours ago, and I'm also
dealing with some other FreeBSD issues right now.  I will report
on it as soon as I can.

Theoretically you'll only see performance gains when doing fork();
the real intent here is to allow for giant segments.  Without
kern.ipc.shm_use_phys=1, running, say, 768-meg (out of 1-gig)
shared memory segments will probably cause performance problems
because of the number of swap structures needed per-process to
manage swappable segments.

I'm going to be enabling this on one of our boxes and see if it
makes a noticeable difference.  I'll let you guys know.

> > Date: Tue, 5 Dec 2000 13:04:45 -0800
> > From: Alfred Perlstein <[EMAIL PROTECTED]>
> > To: Tom Lane <[EMAIL PROTECTED]>
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [HACKERS] Need help with phys backed shm segments 
>(Postgresql+FreeBSD).
> > 
> > Here's the patch I'm using on FreeBSD, it seems to work, if any
> > other FreeBSD'ers want to try it out, just apply the patch:
> > cd /usr/src/sys/vm ; patch < patchfile
> > 
> > and recompile and boot with a new kernel, then do this:
> > 
> > sysctl -w kern.ipc.shm_use_phys=1
> > 
> > or add:
> > kern.ipc.shm_use_phys=1 
> > to /etc/sysctl.conf
> > 
> > Let me know if it works.
> > 
> > thanks,
> > -Alfred



[HACKERS] Patches with vacuum fixes available for 7.0.x

2000-12-07 Thread Alfred Perlstein

We recently had a very satisfactory contract completed by
Vadim.

Basically Vadim has been able to reduce the amount of time
taken by a vacuum from 10-15 minutes down to under 10 seconds.

We've been running with these patches under heavy load for
about a week now without any problems except one:
  don't 'lazy' (new option for vacuum) a table which has just
  had an index created on it, or at least don't expect it to
  take any less time than a normal vacuum would.

There's three patchsets and they are available at:

http://people.freebsd.org/~alfred/vacfix/

complete diff:
http://people.freebsd.org/~alfred/vacfix/v.diff

only lazy vacuum option to speed up index vacuums:
http://people.freebsd.org/~alfred/vacfix/vlazy.tgz

only lazy vacuum option to only scan from start of modified
data:
http://people.freebsd.org/~alfred/vacfix/mnmb.tgz

Although the patches are for 7.0.x I'm hoping that they
can be forward ported (if Vadim hasn't done it already)
to 7.1.

enjoy!

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

2000-12-07 Thread Alfred Perlstein

I have an abstract solution for a problem in postgresql's
handling of what should be constant data.

We had a problem with a query taking way too long; basically
we had this:

select
  date_part('hour',t_date) as hour,
  transval as val
from st
where
  id = 500 
  AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
  AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
;

turning it into:

select
  date_part('hour',t_date) as hour,
  transval as val
from st
where
  id = 500 
  AND hit_date >= '2000-12-07 14:27:24-08'::timestamp
  AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
;

(doing the -24 hours separately)

The values of cost went from:
(cost=0.00..127.24 rows=11 width=12)
to:
(cost=0.00..4.94 rows=1 width=12)

By simply assigning each sql "function" a taint value for constness
one could easily reduce:
  '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
to:
  '2000-12-07 14:27:24-08'::timestamp
by applying the expression and rewriting the query.

Each function should have a marker indicating whether its output
might vary even when given const input; that way subexpressions can
be collapsed until an input becomes non-const.

Here, let's break up:
  '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan

What we have is:
   timestamp(const) - timespan(const)

we have timestamp defined like so:
const timestamp(const string)
non-const timestamp(non-const)

and timespan like so:
const timespan(const string)
non-const timespan(non-const)

So now we have:
   const timestamp((const string)'2000-12-07 14:27:24-08')
 - const timespan((const string)'24 hours')
---
   const
 - const

   const

then eval the query.

You may want to allow a function to have a hook where it can
eval a const, because depending on the const it may or may not
be able to return a const; for instance, some string
passed to timestamp() might cause it to return non-const data.

Or maybe this is fixed in 7.1?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

2000-12-07 Thread Alfred Perlstein

* Joel Burton <[EMAIL PROTECTED]> [001207 15:52] wrote:
> > We had problem with a query taking way too long, basically
> > we had this:
> > 
> > select
> >   date_part('hour',t_date) as hour,
> >   transval as val
> > from st
> > where
> >   id = 500 
> >   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24
> >   hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> > ;
> > 
> > turning it into:
> > 
> > select
> >   date_part('hour',t_date) as hour,
> >   transval as val
> > from st
> > where
> >   id = 500 
> >   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp
> >   AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> > ;
> 
> Perhaps I'm being daft, but why should hit_date be both >= and <= 
> the exact same time and date? (or did you mean to subtract 24 
> hours from your example and forgot?)

Yes, typo.

> > (doing the -24 hours seperately)
> > 
> > The values of cost went from:
> > (cost=0.00..127.24 rows=11 width=12)
> > to:
> > (cost=0.00..4.94 rows=1 width=12)
> > 
> > By simply assigning each sql "function" a taint value for constness
> > one could easily reduce:
> >   '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
> > to:
> >   '2000-12-07 14:27:24-08'::timestamp
> 
> You mean '2000-12-06', don't you?

Yes, typo. :)

> 
> > Each function should have a marker that explains whether when given a
> > const input if the output might vary, that way subexpressions can be
> > collapsed until an input becomes non-const.
> 
> There is "with (iscachable)".
> 
> Does
> 
> CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS
> 'SELECT $1-''24 hours''::interval' WITH (iscachable)
> 
> work faster?

It could be, but it could be done in the sql compiler/planner
explicitly to save me from myself, no?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

2000-12-07 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001207 16:45] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > Each function should have a marker that explains whether when given
> > a const input if the output might vary, that way subexpressions can
> > be collapsed until an input becomes non-const.
> 
> We already have that and do that.
> 
> The reason the datetime-related routines are generally not marked
> 'proiscachable' is that there's this weird notion of a CURRENT time
> value, which means that the result of a datetime calculation may
> vary depending on when you do it, even though the inputs don't.
> 
> Note that CURRENT here does not mean translating 'now' to current
> time during input conversion, it's a special-case data value inside
> the system.
> 
> I proposed awhile back (see pghackers thread "Constant propagation and
> similar issues" from mid-September) that we should eliminate the CURRENT
> concept, so that datetime calculations can be constant-folded safely.
> That, um, didn't meet with universal approval... but I still think it
> would be a good idea.

I agree with you that doing anything to be able to fold these would
be nice.  However, as mentioned in my abstract, when a constant
makes it into a function you can provide a hook so that the function
can report whether or not its result for that particular constant
is cacheable.

If the date functions used that hook to get a glimpse of the constant
data passed in, they could return 'cachable' if it doesn't contain
the 'CURRENT' stuff you're talking about.

something like this could be called on input to "maybe-cachable"
functions:

#include <strings.h>    /* strcasecmp */

/* Returns whether a result computed from this constant input may be
 * cached; CACHEABLE/UNCACHEABLE would be defined by the planner. */
int
date_cachable_hook(const char *datestr)
{

    if (strcasecmp("current", datestr) == 0)
        return (UNCACHEABLE);
    return (CACHEABLE);
}

Or maybe I'm misunderstanding what CURRENT implies?

I do see that on:
  http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html

both you and Thomas Lockhart agree that CURRENT is a broken concept
because it can cause btree inconsistencies and should probably be
removed anyway.

No one seems to dispute that, and then the thread leads off into
discussions about optimizer hints.

> In the meantime you can cheat by defining functions that you choose
> to mark ISCACHABLE, as has been discussed several times in the archives.

Yes, but it doesn't help the naive user (me :) ) much. :(

Somehow I doubt that if 'CURRENT' was ifdef'd people would complain.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Patches with vacuum fixes available for 7.0.x

2000-12-07 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001207 17:10] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > Basically Vadim has been able to reduce the amount of time
> > taken by a vacuum from 10-15 minutes down to under 10 seconds.
> 
> Cool.  What's it do, exactly?



The first is a bonus that Vadim gave us to speed up index
vacuums.  I'm not sure I understand it completely, but it
works really well. :)

here's the README he gave us:

   Vacuum LAZY index cleanup option

The LAZY vacuum option introduces a new way of cleaning up indices.
Instead of reading the entire index file to remove index tuples
pointing to deleted table records, with the LAZY option vacuum
performs index scans using keys fetched from the table records
to be deleted.  Vacuum checks each result returned by an index
scan, and if it points to the target heap record, removes the
corresponding index tuple.
This can greatly speed up index cleaning if not too many
table records were deleted/modified between vacuum runs.
Vacuum uses the new option on the user's demand.

New vacuum syntax is:

vacuum [verbose] [analyze] [lazy] [table [(columns)]]



The second is one of the suggestions I gave on the lists a while
back: keeping track of the "last dirtied" block in the data files
so that only the tail end of the file is scanned for deleted rows.
I think what he instead did was keep, per table, the minimal
modified block number, so that vacuum only scans from that block on:

  Minimal Number Modified Block (MNMB)

This feature tracks the MNMB of the required tables with triggers,
to keep vacuum from reading unmodified table pages.  The triggers
store the MNMB in per-table files in a specified directory
($LIBDIR/contrib/mnmb by default), creating these files if they
do not exist.

Vacuum first looks up functions

mnmb_getblock(Oid databaseId, Oid tableId)
mnmb_setblock(Oid databaseId, Oid tableId, Oid block)

in the catalog.  If *both* functions were found *and* no ANALYZE
option was specified, then vacuum calls mnmb_getblock to obtain
the MNMB for the table being vacuumed and starts reading the table
from the block number returned.  After the table is processed,
vacuum calls mnmb_setblock to update the data in the file to the
last table block number.
Neither mnmb_getblock nor mnmb_setblock tries to create the file.
If there is no file for the table being vacuumed, then mnmb_getblock
returns 0 and mnmb_setblock does nothing.
mnmb_setblock() may be used to reset the MNMB in the file to 0 and
force vacuum to read the entire table if required.

To compile MNMB you have to add -DMNMB to CUSTOM_COPT
in src/Makefile.custom.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Patches with vacuum fixes available for 7.0.x

2000-12-07 Thread Alfred Perlstein

* Tom Samplonius <[EMAIL PROTECTED]> [001207 18:55] wrote:
> 
> On Thu, 7 Dec 2000, Alfred Perlstein wrote:
> 
> > We recently had a very satisfactory contract completed by
> > Vadim.
> > 
> > Basically Vadim has been able to reduce the amount of time
> > taken by a vacuum from 10-15 minutes down to under 10 seconds.
> ...
> 
>   What size database was that on?

Tables were around 300 megabytes.

>   I looking at moving a 2GB database from MySQL to Postgres.  Most of that
> data is one table with 12 million records, to which we post about 1.5
> million records a month.  MySQL's table locking sucks, but as long as are
> careful about what reports we run and when, we can avoid the problem.  
> However, Postgres' vacuum also sucks.  I have no idea how long our
> particular database would take to vacuum, but I don't think it would be
> very nice.

We only do about 54,000,000 updates to a single table per month.

>   That also leads to the erserver thing.  erserver sounds nice, but I sure
> wish it was possible to get more details on it.  It seems rather
> intangible right now.  If erserver is payware, where do I buy it?

Contact Pgsql Inc. I think it's free, but you have to discuss terms
with them.

>   This is getting a bit off-topic now...

Scalability is hardly ever off-topic. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: CRC

2000-12-10 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001210 12:00] wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> A good theory, but unfortunately not a correct theory.  PA-RISC can do a
> >> circular shift in one cycle using the "shift right double" instruction,
> 
> > Interesting.  I was under the impression that virtually no RISC CPU had
> > a rotate instruction.  Do any others?
> 
> Darn if I know.  A RISC purist would probably say that PA-RISC isn't all
> that reduced ... for example, the reason it needs six cycles not seven
> for the CRC inner loop is that the LOAD instruction has an option to
> postincrement the pointer register (like a C "*ptr++").
> 
> > Same with the x86 core:
> > movb %dl,%al
> > xorb (%ecx),%al
> > andl $255,%eax
> > shrl $8,%edx
> > incl %ecx
> > xorl (%esi,%eax,4),%edx
> 
> > On my Celeron, the timing for those six opcodes is almost whopping 13
> > cycles per byte.  Obviously there's some major performance hit to do the
> > memory instructions, because there's no more than 4 cycles worth of
> > dependant instructions in that snippet.
> 
> Yes.  It looks like we're looking at pipeline stalls for the memory
> reads.  I expect PA-RISC would have the same problem if it were not that
> the CRC table and data buffer are almost certainly loaded into level-2
> cache memory.  Curious that you don't get the same result --- what is
> the memory cache architecture on your box?
> 
> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

I would try unrolling the loop some (if possible) and retesting.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] (one more time) Patches with vacuum fixes available.

2000-12-11 Thread Alfred Perlstein

I know you guys are pretty busy with the upcoming release but I
was hoping for more interest in this work.

With this (which needs forward porting) we're able to cut
vacuum time down from ~10 minutes to under 30 seconds.

The code is a no-op unless you compile with a special option (MNMB)
or specify the special vacuum flag (VLAZY), and it doesn't look like
it messes with anything otherwise.

I was hoping to see it go into 7.0.x because of the non-intrusiveness
of it and also because Vadim did it so he should understand it so
that it won't cause any problems (and on the slight chance that it
does, he should be able to fix it).

Basically Vadim left it up to me to campaign for acceptance of this
work and he said he wouldn't have a problem bringing it in as long
as it was ok with the rest of the development team.

So can we get a go-ahead on this? :)

thanks,
-Alfred

- Forwarded message from Alfred Perlstein <[EMAIL PROTECTED]> -----

From: Alfred Perlstein <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [HACKERS] Patches with vacuum fixes available for 7.0.x
Date: Thu, 7 Dec 2000 14:57:32 -0800
Message-ID: <[EMAIL PROTECTED]>
User-Agent: Mutt/1.2.5i
Sender: [EMAIL PROTECTED]

We recently had a very satisfactory contract completed by
Vadim.

Basically Vadim has been able to reduce the amount of time
taken by a vacuum from 10-15 minutes down to under 10 seconds.

We've been running with these patches under heavy load for
about a week now without any problems except one:
  don't 'lazy' (new option for vacuum) a table which has just
  had an index created on it, or at least don't expect it to
  take any less time than a normal vacuum would.

There's three patchsets and they are available at:

http://people.freebsd.org/~alfred/vacfix/

complete diff:
http://people.freebsd.org/~alfred/vacfix/v.diff

only lazy vacuum option to speed up index vacuums:
http://people.freebsd.org/~alfred/vacfix/vlazy.tgz

only lazy vacuum option to only scan from start of modified
data:
http://people.freebsd.org/~alfred/vacfix/mnmb.tgz

Although the patches are for 7.0.x I'm hoping that they
can be forward ported (if Vadim hasn't done it already)
to 7.1.

enjoy!

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."

- End forwarded message -

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] (one more time) Patches with vacuum fixes available.

2000-12-11 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [001211 14:27] wrote:
> On Mon, 11 Dec 2000, Bruce Momjian wrote:
> 
> > > Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > > > Basically Vadim left it up to me to campaign for acceptance of this
> > > > work and he said he wouldn't have a problem bringing it in as long
> > > > as it was ok with the rest of the development team.
> > > > So can we get a go-ahead on this? :)
> > > 
> > > If Vadim isn't sufficiently confident of it to commit it on his own
> > > authority, I'm inclined to leave it out of 7.1.  My concern is mostly
> > > schedule.  We are well into beta cycle now and this seems like way too
> > > critical (not to say high-risk) a feature to be adding after start of
> > > beta.
> > 
> > I was wondering if Vadim was hesitant because he had done this under
> > contract.  Vadim, are you concerned about reliability or are there other
> > issues?
> 
> Irrelevant .. we are post-beta release, and this doesn't fix a bug, so it
> doesn't go in ...

I'm hoping this just means it won't be investigated until the release
is made?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] (one more time) Patches with vacuum fixes available .

2000-12-11 Thread Alfred Perlstein

* Andrew Snow <[EMAIL PROTECTED]> [001211 20:21] wrote:
> 
> On Mon, 11 Dec 2000, Tom Lane wrote:
> 
> > "Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> > > If there are no objections then I'm ready to add changes to 7.1.
> > > Else, I'll produce patches for 7.1 just after release and incorporate
> > > changes into 7.2.
> > 
> > I'd vote for the second choice.  I do not think we should be adding new
> > features now.  Also, I don't know about you, but I have enough bug fix,
> > testing, and documentation work to keep me busy till January even
> > without any new features...
> 
> It'd be really naughty to add it to the beta at this stage.  Would it be
> possible to add it to the 7.1 package with some kind of compile-time option?
> So that those of us who do want to use it, can.

One is a compile-time option (CFLAGS+=-DMMNB); the other doesn't
happen unless you ask for it:

vacuum lazy ;

I don't understand what the deal is here; as I said, it's optional
code that you won't see unless you ask for it.

[children: 0 12/11/2000 21:57:20 x]
Vacuuming link.
[children: 0 12/11/2000 21:57:54 x]

-rw---  1 pgsql  pgsql  134627328 Dec 11 21:57 link
-rw---  1 pgsql  pgsql  261201920 Dec 11 21:57 link_triple_idx

Yup, 30 seconds, the table is 134 megabytes and the index is 261 megs.

I think normally this takes about 10 or so _minutes_.

On our faster server:

[children: 0 12/11/2000 22:17:50 x]
Vacuuming referer_link.
[children: 0 12/11/2000 22:18:09 x]

-rw---  1 pgsql  wheel  273670144 Dec 11 22:15 link
-rw---  1 pgsql  wheel  641048576 Dec 11 22:15 link_triple_idx

time is ~19 seconds, table is 273 megs, and index 641 megs.

dual 800 MHz, RAID 5 disks.

I think the users deserve this patch. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Why vacuum?

2000-12-13 Thread Alfred Perlstein

* Martin A. Marques <[EMAIL PROTECTED]> [001213 15:15] wrote:
> On Wed, 13 Dec 2000 16:41, bpalmer wrote:
> > I noticed the other day that one of my pg databases was slow,  so I ran
> > vacuum on it,  which brought a question to mind:  why the need?  I looked
> > at my oracle server and we aren't doing anything of the sort (that I can
> > find),  so why does pg need it?  Any info?
> 
> I know nothing about Oracle, but I can tell you that Informix has an
> 'update statistics' command; I don't know if it's similar to vacuum.
> What vacuum does is clean the database of rows that were left behind by
> updates and deletes; nonetheless, the tables get shrunk, so searches get
> faster.

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete.  If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

http://people.freebsd.org/~alfred/vacfix/

It has some tarballs with patches that speed up vacuum; depending
on how you access your tables, you can see up to a 20x reduction in
vacuum time.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Idea for reducing planning time

2000-12-13 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001213 15:18] wrote:
> 
> I'm trying to resist the temptation to make this change right now :-).
> It's not quite a bug fix --- well, maybe you could call it a performance
> bug fix --- so I'm kind of thinking it shouldn't be done during beta.
> OTOH I seem to have lost the argument that Vadim shouldn't commit VACUUM
> performance improvements during beta, so maybe this should go in too.
> What do you think?

If you're saying that you're OK with the work Vadim has done, please
let him know; I'm assuming he hasn't committed out of respect for your
still-standing objection.

If you're terribly against it, then say so again; I would just rather
it not happen because you objected than because of missed communication.

As far as the work you're proposing, how much of a gain is it over
the current code?  2x? 3x? 20x? :)  There's a difference between a
slight performance increase and something too good to pass up.

thanks,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Why vacuum?

2000-12-13 Thread Alfred Perlstein

* xuyifeng <[EMAIL PROTECTED]> [001213 18:54] wrote:
> I have this nasty problem too,  in early time, I don't know the problem, but we used 
>it for a while,
> than we found our table growing too fast without insert any record( we use update), 
>this behaviour 
> most like M$ MSACCESS database I had used a long time ago which don't reuse deleted 
>record 
> space and full fill your hard disk after several hours,  the nasty vaccum block any 
>other users to operate
> on table,  this is a big problem for a large table, because it will block tooo long 
>to let other user to run
> query. we have a project affected by this problem, and sadly we decide to use 
>closure source database
>  - SYBASE on linux, we havn't any other selections. :(
> 
> note that SYBASE and Informix both have 'update statistics' command, but they run it 
>fast in seconds,
> not block any other user, this is pretty. ya, what's good technology!

http://people.freebsd.org/~alfred/vacfix/

-Alfred



Re: [HACKERS] Why vacuum?

2000-12-14 Thread Alfred Perlstein

* Ross J. Reedstrom <[EMAIL PROTECTED]> [001214 07:57] wrote:
> On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:
> > 
> > They all have an overwriting storage manager. The current storage manager
> > of PostgreSQL is non overwriting, which has other advantages.
> > 
> > There seem to be 2 answers to the problem:
> > 1. change to an overwrite storage manager
> > 2. make vacuum concurrent capable
> > 
> > The tendency here seems to be towards an improved smgr.
> > But, it is currently extremely cheap to calculate where a new row
> > needs to be located physically. This task is *a lot* more expensive
> > in an overwrite smgr. It needs to maintain a list of pages with free slots,
> > which has all sorts of concurrency and persistence problems.
> > 
> 
> Not to mention the recent thread here about people recovering data that
> was accidently deleted, or from damaged db files: the old tuples serve
> as redundant backup, in a way. Not a real compelling reason to keep a
> non-overwriting smgr, but still a surprise bonus for those who need it.

One could make vacuum optional such that the system either:

1) always overwrites, or
2) will not overwrite data until a vacuum is called (perhaps with
   a date option to specify how much deleted data you wish to
   reclaim); data can be marked free but not free for re-use
   until vacuum is run.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Why vacuum?

2000-12-14 Thread Alfred Perlstein

* Daniele Orlandi <[EMAIL PROTECTED]> [001214 09:10] wrote:
> Zeugswetter Andreas SB wrote:
> > 
> > If the priority is too low you will end up with the same behavior as current,
> 
> Yes, and it is the intended behaviour. I'd use idle priority for it.

If you're talking about vacuum, you really don't want to do this:
since you hold an exclusive lock on the file during the vacuum, and
there is no way to do priority lending, you can deadlock.

> > because the cache will be emptied by high priority multiple new rows,
> > thus writing to the end anyways.
> 
> Yes, but this only happens when you don't have enought spare idle CPU
> time. If you are in such situation for long periods, there's nothing you
> can do, you already have problems.
> 
> My approach in winning here because it allows you to have bursts of CPU
> utilization without being affected by the overhead of a overwriting smgr
> that (without hacks) will always try to find available slots, even in
> high load situations.
> 
> > Conclusio: In those cases where overwrite would be most advantageous (high
> > volume modified table) your system won't work
> 
> Why ? I have plenty of CPU time available on my server, even if one of
> my table is highly volatile, fast-changing.

When your table grows to be very large you'll see what we're talking 
about.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Why vacuum?

2000-12-14 Thread Alfred Perlstein

* mlw <[EMAIL PROTECTED]> [001214 09:30] wrote:
> "Martin A. Marques" wrote:
> > 
> > On Wed, 13 Dec 2000 16:41, bpalmer wrote:
> > > I noticed the other day that one of my pg databases was slow,  so I ran
> > > vacuum on it,  which brought a question to mind:  why the need?  I looked
> > > at my oracle server and we aren't doing anything of the sort (that I can
> > > find),  so why does pg need it?  Any info?
> > 
> > I know nothing about Oracle, but I can tell you that Informix has an
> > 'update statistics' command; I don't know if it's similar to vacuum.
> > What vacuum does is clean the database of rows that were left behind by
> > updates and deletes; nonetheless, the tables get shrunk, so searches
> > get faster.
> > 
> 
> While I would like Postgres to update statistics once in a while on
> its own, I like vacuum in general.
> 
> I would rather trade unused disk space for performance.  The last thing
> you need during high load is the database deciding that it is time to
> clean up.

Even worse is having to scan a file that has grown to 20x the size
because you haven't vacuum'd in a while.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Idea for reducing planning time

2000-12-15 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001215 10:34] wrote:
> > 
> > sorry, meant to respond to the original and deleted it too fast ... 
> > 
> > Tom, if the difference between 7.0 and 7.1 is such that there is a
> > performance decrease, *please* apply the fix ... with the boon that OUTER
> > JOINs will provide, would hate to see us with a performance hit reducing
> > that impact ...
> > 
> > One thing I would like to suggest for this stage of the beta, though, is
> > that a little 'peer review' before committing the code might be something
> > that would help 'ease' implementing stuff like this and Vadim's VACUUM
> > code ... read through Vadim's code and see if it looks okay to you ... get
> > Vadim to read through your code/patch and see if it looks okay to him
> > ... it adds a day or two to the commit cycle, but at least you can say it
> > was reviewed before committed ...
> > 
> 
> Totally agree.  In the old days, we posted all our patches to the list
> so people could see.  We used to make cvs commits only on the main
> server, so we had the patch handy, and it made sense to post it.  Now
> that we have remote cvs, we don't do it as much, but in this case, cvs
> diff -c is a big help.

It seems that Tom has committed his fixups but we're still waiting
on Vadim?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]



[HACKERS] externalizing PGresult?

2000-12-21 Thread Alfred Perlstein

Is there anything for encoding a PGresult struct into something I
can pass between processes?  Like turning it into a platform-
independent stream that I can pass between machines?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Unable to check out REL7_1 via cvs

2000-12-22 Thread Alfred Perlstein

* Yusuf Goolamabbas <[EMAIL PROTECTED]> [001222 15:34] wrote:
> Hi, I am using the following command to check out the 7.1 branch of
> PostgreSQL.
> cvs -d :pserver:[EMAIL PROTECTED]:/home/projects/pgsql/cvsroot co -r REL7_1 
>pgsql
> 
> This is the error I am getting.
> cvs [server aborted]: cannot write /home/projects/pgsql/cvsroot/CVSROOT/val-tags: 
>Permission denied
> 
> I can check out HEAD perfectly alright
> 
> Anybody else seeing similar results ?

Try using "cvs -Rq ...", or just use CVSup; it's a lot quicker.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Unable to check out REL7_1 via cvs

2000-12-22 Thread Alfred Perlstein

* Yusuf Goolamabbas <[EMAIL PROTECTED]> [001222 15:47] wrote:
> Nope, no luck with cvs -Rq also. Me thinks its some repository
> permission issue. Don't know if CVSup would help either. I don't have
> cvsup installed on this machine. 

CVSup would work, that's what I use.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-23 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [001223 06:59] wrote:
> Rasmus Lerdorf, the big PHP developer, told me that the existance of GNU
> readline hooks in our source tree could cause RMS/GNU to force us to a
> GNU license.
> 
> Obviously, we could remove readline hooks and ship a BSD line editing
> library, but does this make any sense to you?  It doesn't make sense to
> me, but he was quite certain.
> 
> Our ODBC library is also GNU licensed, but I am told this is not a
> problem because it doesn't link into the backend.  However, neither does
> readline.  However, readline does link into psql.

FreeBSD has a freely available library called 'libedit' that could
be shipped with postgresql, it's under the BSD license.

If you have access to a FreeBSD box see the editline(3) manpage,
or go to: 
http://www.freebsd.org/cgi/man.cgi?query=editline&apropos=0&sektion=0&manpath=FreeBSD+4.2-RELEASE&format=html

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: Too many open files (was Re: spinlock problems reported earlier)

2000-12-23 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001223 14:16] wrote:
> Department of Things that Fell Through the Cracks:
> 
> Back in August we had concluded that it is a bad idea to trust
> "sysconf(_SC_OPEN_MAX)" as an indicator of how many files each backend
> can safely open.  FreeBSD was reported to return 4136, and I have
> since noticed that LinuxPPC returns 1024.  Both of those are
> unreasonably large fractions of the actual kernel file table size.
> A few dozen backends opening hundreds of files apiece will fill the
> kernel file table on most Unix platforms.

getdtablesize(2) on BSD should tell you the per-process limit;
sysconf on FreeBSD shouldn't lie to you.

getdtablesize should take any resource limits in place into account.

later versions of FreeBSD have a sysctl 'kern.openfiles' which
can be checked to see if the system is approaching the systemwide
limit.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Upper limit on number of buffers?

2000-12-24 Thread Alfred Perlstein

* mlw <[EMAIL PROTECTED]> [001224 18:06] wrote:
> This line works:
> /usr/local/pgsql/bin/postmaster -N 32 -B 928 -i -S
> -D/home/postgres/pgdev -o "-F -fs -S 4096"
> 
> Where as this line:
> 
> /usr/local/pgsql/bin/postmaster -N 32 -B 1024 -i -S
> -D/home/postgres/pgdev -o "-F -fs -S 4096"
> 
> does not.
> 
> Any ideas? 
> I have 256M of memory, RedHat Linux 7.0, CVS version of Postgres as of a
> couple days ago.

Giving us the exact reason it "doesn't work" would be helpful, perhaps
the error message?

I'm just going to guess that you need to consult your OS's
documentation and figure out how to raise the amount of System V
shared memory available.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2000-12-28 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001228 14:25] wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > I wonder about the advisability of using spinlocks in user-level code 
> > which might be swapped out any time.
> 
> The reason we use spinlocks is that we expect the lock to succeed (not
> block) the majority of the time, and we want the code to fall through
> as quickly as possible in that case.  In particular we do *not* want to
> expend a kernel call when we are able to acquire the lock immediately.
> It's not a true "spin" lock because we don't sit in a tight loop when
> we do have to wait for the lock --- we use select() to delay for a small
> interval before trying again.  See src/backend/storage/buffer/s_lock.c.
> 
> The design is reasonable, even if a little bit offbeat.

It sounds pretty bad: if you have a contested lock you'll trap into
the kernel each time you miss, crossing the protection boundary and
then waiting.  It's a tough call to make, because on UP systems
you lose big-time by spinning for your quantum, while on SMP
systems there's a chance that the lock is owned by a process on
another CPU and spinning might be beneficial.

One trick that may help is calling sched_yield(2) on a lock miss;
it's a POSIX call and quite new, so you'd need a 'configure' test
for it.

http://www.freebsd.org/cgi/man.cgi?query=sched_yield&apropos=0&sektion=0&manpath=FreeBSD+4.2-RELEASE&format=html

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* The Hermit Hacker <[EMAIL PROTECTED]> [001229 14:11] wrote:
> On Sat, 23 Dec 2000, Bruce Momjian wrote:
> 
> > > FreeBSD has a freely available library called 'libedit' that could
> > > be shipped with postgresql, it's under the BSD license.
> > 
> > Yes, that is our solution if we have a real problem here.
> 
> Is there a reason *not* to move towards that for v7.2 so that the
> functions we are making optional with readline are automatic?  Since we
> could then ship the code, we could make it a standard vs optional
> "feature" ...
> 
> My thought would be to put 'make history feaure standard using libedit'
> onto the TODO list and take it from there ...

I doubt I'd have the time to do it, but if you guys want to use
libedit, it'd probably be a good idea, at least to reduce the amount
of potential GPL tainting in the source code.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* Tom Lane <[EMAIL PROTECTED]> [001229 15:43] wrote:
> Lamar Owen <[EMAIL PROTECTED]> writes:
> > How different is the feature set?
> 
> I was going to ask the same thing.  If it's an exact replacement then
> OK, but I do not want to put up with non-Emacs-compatible keybindings,
> to mention just one likely issue.
> 
> The whole thing really strikes me as make-work anyway.  Linux is GPL'd;
> does anyone want to argue that we shouldn't run on Linux?  Since we
> are not including libreadline in our distribution, there is NO reason
> to worry about using it when it's available.  Wanting to find a
> replacement purely because of the license amounts to license bigotry,
> IMHO.

Rasmus Lerdorf warned one of you guys that simply linking to GNU
readline can contaminate code with the GPL.

Readline isn't LGPL, which would permit linking without license
issues; it is GPL, which means that if you link to it, your code must
be GPL as well.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."
