Re: [HACKERS] integrated tsearch doesn't work with non utf8 database

2007-09-08 Thread Oleg Bartunov

On Fri, 7 Sep 2007, Heikki Linnakangas wrote:


Pavel Stehule wrote:

postgres=# select ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');
ERROR:  character 0xc3a5 of encoding UTF8 has no equivalent in LATIN2
CONTEXT:  SQL function ts_debug statement 1


I can reproduce that. In fact, you don't need the custom config or
dictionary at all:

postgres=# CREATE DATABASE latin2 encoding='latin2';
CREATE DATABASE
postgres=# \c latin2
You are now connected to database latin2.
latin2=#  select ts_debug('simple','foo');
ERROR:  character 0xc3a5 of encoding UTF8 has no equivalent in LATIN2
CONTEXT:  SQL function ts_debug statement 1

It fails trying to lexize the string using the Danish snowball stemmer,
because the Danish stopword file contains the character 'å', which doesn't
have an equivalent in LATIN2.
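
(A quick standalone check, added here for illustration and not part of the
original mail, that 0xc3a5 really is the UTF-8 sequence for U+00E5, 'å':)

#include <stdio.h>

int
main(void)
{
    unsigned char seq[] = {0xC3, 0xA5};
    /* a 2-byte UTF-8 sequence has the form 110xxxxx 10xxxxxx */
    unsigned cp = ((seq[0] & 0x1Fu) << 6) | (seq[1] & 0x3Fu);

    printf("U+%04X\n", cp);     /* prints U+00E5, i.e. 'å' */
    return 0;
}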

Now what the heck is it doing with the Danish stemmer, you might ask.
ts_debug is implemented as a SQL function; EXPLAINing the complex SELECT
behind it, I get this plan:

latin2=# \i foo.sql
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=2.80..1134.45 rows=80 width=100)
   Hash Cond: (parse.tokid = tt.tokid)
   InitPlan
     ->  Seq Scan on pg_ts_config  (cost=0.00..1.20 rows=1 width=4)
           Filter: (oid = 3748::oid)
     ->  Seq Scan on pg_ts_config  (cost=0.00..1.20 rows=1 width=4)
           Filter: (oid = 3748::oid)
   ->  Function Scan on ts_parse parse  (cost=0.00..12.50 rows=1000 width=36)
   ->  Hash  (cost=0.20..0.20 rows=16 width=68)
         ->  Function Scan on ts_token_type tt  (cost=0.00..0.20 rows=16 width=68)
   SubPlan
     ->  Limit  (cost=7.33..7.36 rows=1 width=36)
           ->  Subquery Scan dl  (cost=7.33..7.36 rows=1 width=36)
                 ->  Sort  (cost=7.33..7.34 rows=1 width=8)
                       Sort Key: m.mapseqno
                       ->  Seq Scan on pg_ts_config_map m  (cost=0.00..7.32 rows=1 width=8)
                             Filter: ((ts_lexize(mapdict, $1) IS NOT NULL) AND (mapcfg = 3765::oid) AND (maptokentype = $0))
     ->  Sort  (cost=6.57..6.57 rows=1 width=8)
           Sort Key: m.mapseqno
           ->  Seq Scan on pg_ts_config_map m  (cost=0.00..6.56 rows=1 width=8)
                 Filter: ((mapcfg = 3765::oid) AND (maptokentype = $0))
(21 rows)

Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
$1). That means that it will call ts_lexize on every dictionary, which
will try to load every dictionary. And loading the danish_stem dictionary
fails in latin2 encoding, because of the problem with the stopword file.

We could rewrite ts_debug as a C-function, so that it doesn't try to
access any unnecessary dictionaries. It seems wrong to install
dictionaries in databases where they won't work in the first place, but
I don't see an easy fix for that. Any comments or better ideas?


ts_debug currently doesn't work well with the thesaurus dictionary, so it
certainly needs to be rewritten in C. We left rewriting it for the future.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-08 Thread apoc9009

Joshua D. Drake schrieb:


This is not acceptable on our lists. Do not post in such a way again.

Sincerely,

Joshua D. Drake

Hi Josh,
You're right, but this special Guy has just been boring me a lot with
Replication Things, but my [Featurerequest] on the Topic was dedicated
to Streaming Onlinebackup, and that is not the same as simple
Replication.

Sometimes I think it's a Scandinavian Tradition, boring and talking
without link to the Topic and giving primitive, useless advice like
"look in google.com" or "use the www.postgresql.org search Button" that
doesn't match the Subject of Discussion.

Apoc

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Alvaro Herrera

As a fallout of this work that I haven't seen made explicit, a session
opening a transaction and then sitting around doing nothing will not
cause as many problems as it used to -- for example it won't cause
VACUUM to be unable to clean up dead rows.  Is this correct?

Nowadays this is no longer much of a problem, but it used to be, back
when drivers were more broken than they are now (when they had the
commit method send COMMIT; BEGIN); however, it still seems to me like
a big step forwards.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
The gods must always be fed, even when the land is dry (Orual)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 As a fallout of this work that I haven't seen made explicit, a session
 opening a transaction and then sitting around doing nothing will not
 cause as many problems as it used to -- for example it won't cause
 VACUUM to be unable to clean up dead rows.  Is this correct?

Yeah, if you just issue BEGIN and then sit, you won't have acquired
either an xid or an xmin, so you don't create a VACUUM problem anymore.

If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
but not an xid, so at that point you become a problem for VACUUM.
However, internally you don't have any live snapshots (if you're in READ
COMMITTED mode), so eventually we could have you stop publishing an xmin
too.  That's something for 8.4 though.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
 but not an xid, so at that point you become a problem for VACUUM.
 However, internally you don't have any live snapshots (if you're in READ
 COMMITTED mode), so eventually we could have you stop publishing an xmin
 too.  That's something for 8.4 though.

Aren't there some things that depend on the idea that even READ COMMITTED
transactions still have a serializable snapshot lying around for them to use?

This is actually one of the rough spots in HOT that I'm afraid you'll have
problems with when you review it. If there were any HOT chains which are
broken according to a new index definition, then any transaction considering
using that index needs to know whether there's any possibility the plan will
be used with a snapshot which can see those old tuples. It currently does this
by checking if the transaction which created the index is in its serializable
snapshot.

It seems weird to have the planner using snapshots in any way, but I don't see
any direct problems which arise. Essentially it's using the serializable
snapshot as a proxy for better live snapshot bookkeeping. If there's no
serializable snapshot then any future snapshot taken will surely not be able
to see the old tuples, and if there is then it should be fine (though not as
good as possible if we did more bookkeeping) to use it to determine whether
the old tuples could possibly be visible when the plan is executed.

I'm a bit afraid the plan will stay cached longer than would be ideal. If the
plan doesn't use the new index then ideally we would want to invalidate it
from the cache at the end of this transaction I suppose.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 If you issue BEGIN, then SELECT, then sit, you'll be publishing an xmin
 but not an xid, so at that point you become a problem for VACUUM.
 However, internally you don't have any live snapshots (if you're in READ
 COMMITTED mode), so eventually we could have you stop publishing an xmin
 too.  That's something for 8.4 though.

 Aren't there some things that depend on the idea that even READ COMMITTED
 transactions still have a serializable snapshot lying around for them to use?

I don't see why.  A READ COMMITTED transaction that is between
statements does still have a serializable snapshot sitting around, but
it won't ever reference it again.  (AFAIR you can't switch an active
transaction into serializable mode.)  So with a bit of work on snapshot
management we could recognize that we have no live snapshots and clear
the published xmin.  This has been discussed before, eg this thread:
http://archives.postgresql.org/pgsql-patches/2007-03/msg00381.php

 This is actually one of the rough spots in HOT that I'm afraid you'll have
 problems with when you review it. If there were any HOT chains which are
 broken according to a new index definition, then any transaction considering
 using that index needs to know whether there's any possibility the plan will
 be used with a snapshot which can see those old tuples. It currently does this
 by checking if the transaction which created the index is in its serializable
 snapshot.

This seems to be a proxy for what's the oldest live snapshot in this
backend, which is exactly what we'd have to track to adjust xmin
anyway.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-08 Thread Hannu Krosing
On a fine day, Sat, 2007-09-08 at 10:39, apoc9009 wrote:
 Joshua D. Drake schrieb:
 
  This is not acceptable on our lists. Do not post in such a way again.
 
  Sincerely,
 
  Joshua D. Drake
 Hi Josh,
 You're right, but this special Guy has just been boring me a lot with
 Replication Things, but my [Featurerequest] on the Topic was dedicated
 to Streaming Onlinebackup, and that is not the same as simple
 Replication.
 
 Sometimes I think it's a Scandinavian Tradition, boring and talking
 without link to the Topic and giving primitive, useless advice like
 "look in google.com" or "use the www.postgresql.org search Button" that
 doesn't match the Subject of Discussion.

While his second advice:

2) read what other people write to you and try to understand that.

may look like something primitive in general, I think it was to the
point in this case.

Try thinking of it some more, and you may eventually zen it, recognizing
that 


A. you can easily roll your own Streaming Onlinebackup in any
scripting language and with exactly the ftp directory structure using
the info provided, and without needing any more skills than making
queries to the database and reading a portion of a file from position N
to position M.

B. it probably won't take more than an hour to set up including testing.

C. 'backup _is_ replication' is also true 

--
Hannu



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Greg Smith

On Fri, 7 Sep 2007, Simon Riggs wrote:


For me, the bgwriter should sleep for at most 10ms at a time.


Here are the results I got when I pushed the time down significantly from 
the defaults, with some of the earlier results for comparison:


                      info                      | set | tps  | cleaner_pct
------------------------------------------------+-----+------+-------------
 jit multiplier=2.0 scan_whole=120s delay=200ms |  17 |  981 |       99.98
 jit multiplier=1.0 scan_whole=120s delay=200ms |  18 |  970 |       99.99

 jit multiplier=1.0 scan_whole=120s delay=20ms  |  20 |  956 |       92.34
 jit multiplier=2.0 scan_whole=120s delay=20ms  |  21 |  967 |       99.94

 jit multiplier=1.5 scan_whole=120s delay=10ms  |  22 |  944 |       97.91
 jit multiplier=2.0 scan_whole=120s delay=10ms  |  23 |  981 |        99.7

It seems I have to push the multiplier higher to get good results when 
using a much lower interval, which was expected, but the fundamentals all 
scale down to running much faster the way I'd hoped.


I'm tempted to make the default 10ms, adjust some of the other constants 
just a bit to optimize better for that time scale:  make the default 
multiplier 2.0, increase the weighted average sample period, and perhaps 
reduce scan_whole a bit because that's barely doing anything at 10ms.  If 
no one discovers any problems with working that way during beta, then 
consider locking them in for the RC.  That would leave just the multiplier 
and maxpages as the exposed tunables, and it's very easy to tune maxpages 
just by watching pg_stat_bgwriter.  This would obviously be a very 
aggressive plan--it would be eliminating GUCs and reducing flexibility for 
people in the field, aiming instead at making this more automatic for the 
average case.


If anyone has a reason why they feel the bgwriter_delay needs to be a 
tunable or why the rate might need to run even faster than 10ms, now would 
be a good time to say why.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 If anyone has a reason why they feel the bgwriter_delay needs to be a 
 tunable or why the rate might need to run even faster than 10ms, now would 
 be a good time to say why.

You'd be hard-wiring the thing to wake up 100 times per second?  Doesn't
sound like a good plan from here.  Keep in mind that not everyone wants
their machine to be dedicated to Postgres, and some people even would
like their CPU to go to sleep now and again.

I've already gotten flak about the current default of 200ms:
https://bugzilla.redhat.com/show_bug.cgi?id=252129
I can't imagine that folk with those types of goals will tolerate
an un-tunable 10ms cycle.

In fact, given the numbers you show here, I'd say you should leave the
default cycle time at 200ms.  The 10ms value is eating way more CPU and
producing absolutely no measured benefit relative to 200ms...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Greg Smith

On Sat, 8 Sep 2007, Tom Lane wrote:

I've already gotten flak about the current default of 200ms: 
https://bugzilla.redhat.com/show_bug.cgi?id=252129
I can't imagine that folk with those types of goals will tolerate an 
un-tunable 10ms cycle.


That's the counter-example I was looking for as to why lowering the 
default is unacceptable.  Scratch bgwriter_delay off the list of things 
that might be fixed to a specific value.


Will return to the drawing board to figure out a way to incorporate what 
I've learned about running at 10ms into a tuning plan that still works 
fine at 200ms or higher.  The good news as far as I'm concerned is that I 
haven't had to adjust the code so far, just tweak the existing knobs.


In fact, given the numbers you show here, I'd say you should leave the 
default cycle time at 200ms.  The 10ms value is eating way more CPU and 
producing absolutely no measured benefit relative to 200ms...


My server is a bit underpowered to run at 10ms and gain anything when 
doing a stress test like this; I was content that it didn't degrade 
performance significantly, that was the best I could hope for.  I would 
expect the class of systems that Simon and Heikki are working with could 
show significant benefit from running the BGW that often.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-08 Thread apoc9009


C. 'backup _is_ replication' is also true 


--
Hannu


It is useless to speak with a person like you about the difference between
Backup and Replication. Both Things have different Concepts and Approaches,
but for you it is all the same.

What should I say? That's the typical Scandinavian Fishheadnature. A
Fish is a Fish (anyway, even if one is a Whale and the next is a Shark).

I guess the next Thing you will say is: Nobody was on the Moon, the
9/11 Incident was the Work of George W. Bush Junior himself, and Santa
Claus will be the next President of the United States.

Apoc

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Sat, 8 Sep 2007, Tom Lane wrote:
 In fact, given the numbers you show here, I'd say you should leave the 
 default cycle time at 200ms.  The 10ms value is eating way more CPU and 
 producing absolutely no measured benefit relative to 200ms...

 My server is a bit underpowered to run at 10ms and gain anything when 
 doing a stress test like this; I was content that it didn't degrade 
 performance significantly, that was the best I could hope for.  I would 
 expect the class of systems that Simon and Heikki are working with could 
 show significant benefit from running the BGW that often.

Quite possibly.  So it sounds like we still need to expose
bgwriter_delay as a tunable.

It might be interesting to consider making the delay auto-tune: if you
wake up and find nothing (much) to do, sleep longer the next time,
conversely shorten the delay when work picks up.  Something for 8.4,
though, at this point.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] TODO item: add \# which lists line numbers, and allows command execution

2007-09-08 Thread Sibte Abbas
Hi all,

Realizing that the mentioned TODO item discussed at
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00255.php can
be useful for myself and others as well, I would like to go ahead and
implement it.

Synopsis:
==

Sticking to the syntax and semantics which Joshua already mentioned in the
thread:

\#: displays the command history. Like \s but prefixes the lines with line
numbers

\# line_no: executes the command (if any) at the line specified by
line_no


Display the history contents:
=

Well the answer to this pretty much lies in the \s (without any filename)
implementation. However \s simply writes the contents of the history to the
TTY while \# would prefix all the history lines with their respective line
numbers.

Because of this difference, we'll have to employ a more flexible approach
rather than simply using the write_history() function.

A solution to this is to use the history_list() function to get a list of
HIST_ENTRY structures and simply display them as per our needs, i.e. to
prefix each line with incrementing numbers in our case.

Execute a line based on line number:
==

For this we can simply use the history_get(int offset) function of the GNU
History API to return a HIST_ENTRY structure corresponding to the given
offset.

The returned HIST_ENTRY can then be used to access the command that was
executed at that specific line number.
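
As a minimal sketch (added for illustration; this is not the actual patch),
the two GNU History calls above could back the proposed \# command roughly
like this, assuming readline history is already initialized by psql:

#include <stdio.h>
#include <readline/history.h>

/* \#  -- print the history, prefixing each line with its number */
static void
slash_hash_list(void)
{
    HIST_ENTRY **list = history_list();

    if (list)
        for (int i = 0; list[i] != NULL; i++)
            printf("%d\t%s\n", i + history_base, list[i]->line);
}

/* \# line_no  -- fetch the command stored at line_no, or NULL */
static const char *
slash_hash_get(int line_no)
{
    HIST_ENTRY *entry = history_get(line_no);

    return entry ? entry->line : NULL;
}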

Comments are welcome.

regards,
-- 
Sibte Abbas


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Sat, 8 Sep 2007, Tom Lane wrote:

 I've already gotten flak about the current default of 200ms:
 https://bugzilla.redhat.com/show_bug.cgi?id=252129
 I can't imagine that folk with those types of goals will tolerate an
 un-tunable 10ms cycle.

 That's the counter-example I was looking for as to why lowering the
 default is unacceptable.  Scratch bgwriter_delay off the list of things
 that might be fixed to a specific value.

Ok, time for the obligatory contrarian voice here. It's all well and good to
aim to eliminate GUC variables but I don't think it's productive to do so by
simply hard-wiring them. 

Firstly that doesn't really make life any easier than simply finding good
defaults and documenting that DBAs probably shouldn't be bothering to tweak
them.

Secondly it's unlikely to work. The variables under consideration may have
reasonable defaults but they're not likely to have defaults that will work in
every case. This example is pretty typical. There aren't many variables that will
have a reasonable default which will work for both an interactive desktop
where Postgres is running in the background and Sun's 1000+ process
benchmarks.

What I think is more likely to work is looking for ways to make these
variables auto-tuning. That eliminates the knob not by just hiding it away and
declaring it doesn't exist but by architecting the system so that there really
is no knob that might need tweaking.

Perhaps what would work better here is having a semaphore which bgwriter
sleeps on and which backends wake up whenever the clock sweep hand completes a
cycle. Or gets within a certain fraction of a cycle of catching up.

Or perhaps bgwriter shouldn't be adjusting the number of pages it processes at
all and instead it should only be adjusting the sleep time. So it would always
process a full cycle for example but adjust the sleep time based on what
percentage of the cycle the backends used up in the last sleep time.
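
(A minimal sketch of that last idea, added for illustration; the function
name, thresholds, and clamp bounds below are all invented, not actual
PostgreSQL code:)

/* Keep the work per cycle fixed and adjust only the sleep time: sleep
 * less when backends consumed much of the last cycle, more when they
 * consumed little. */
static int
next_bgwriter_delay_ms(int cur_delay_ms, double cycle_used_fraction)
{
    int delay = cur_delay_ms;

    if (cycle_used_fraction > 0.5)
        delay = cur_delay_ms / 2;       /* work picked up: wake sooner */
    else if (cycle_used_fraction < 0.1)
        delay = cur_delay_ms * 2;       /* mostly idle: sleep longer */

    if (delay < 10)                     /* illustrative clamps only */
        delay = 10;
    if (delay > 200)
        delay = 200;
    return delay;
}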

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (MaybeOFFTOPIC)

2007-09-08 Thread Gregory Stark

Hannu Krosing [EMAIL PROTECTED] writes:

 Is this apoc9009 guy real ?

Please, just don't respond.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Josh Berkus
Greg,

 Aren't there some things that depend on the idea that even READ COMMITTED
 transactions still have a serializable snapshot lying around for them to
 use?

No, that would be REPEATABLE READ.  That's one of the areas where we need to 
test HOT; do RR and SERIALIZABLE still work correctly with HOT?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Greg Smith

On Sat, 8 Sep 2007, Tom Lane wrote:


It might be interesting to consider making the delay auto-tune: if you
wake up and find nothing (much) to do, sleep longer the next time,
conversely shorten the delay when work picks up.  Something for 8.4,
though, at this point.


I have a couple of pages of notes on how to tune the delay automatically. 
The tricky part is applications that go from 0 to full speed with little 
warning; the first few seconds of the stock market open come to mind. 
What I was working toward was considering what you set the delay to as a 
steady-state value, and then the delay cranks downward as activity levels 
go up.  As activity dies off, it slowly returns to the default again.


But I realized that I needed to get all this other stuff working, all the 
statistics counters exposed usefully, and then collect a lot more data 
before I could implement that plan.  Definitely something that might fit 
into 8.4, completely impossible for 8.3.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-08 Thread Hannu Krosing
On a fine day, Sat, 2007-09-08 at 21:15, Apoc Sagdiyev wrote:
  C. 'backup _is_ replication' is also true 
 
  --
  Hannu
 
 It is useless to speak with a person like you 

Oh, you think that _I_ am Scandinavian?? Never thought about that
possibility ;P

If speaking with me is useless to you, then don't. 

 about the difference between Backup and Replication.
 Both Things have different Concepts and Approaches, but for you it is all 
 the same.

By Capitalising these Nouns you really make them look like some Big
Concepts that are Beyond Understanding Of Mere Mortals.

 What should I say? That's the typical Scandinavian Fishheadnature. A 
 Fish is a Fish (anyway, even if one is a Whale and the next is a Shark).

The reason I have been so patient with you is that your question and
writing style suggest that you are still quite young, probably no more
than 17 to 19, and also I thought that you were looking for a solution
to your problem, not just trying to look smart or pushing your latest
great idea.

<meat>

What I was trying to tell you is that in order to have a backup that
can actually be used quickly in case of failure on the master db, you need
your backup to be in the form of a replica (that is one meaning of "backup
_is_ replication").

If you had your backup as a week-old base backup + a set of WAL files,
it can take days to bring the replacement machine up, as both unpacking
the base backup and especially replaying the WAL files take time.

</meat>

 I guess the next Thing you will say is: Nobody was on the Moon, the 
 9/11 Incident was the Work of George W. Bush Junior himself, and Santa 
 Claus will be the next President of the United States.

No! Actually I'm wearing my tin hat right now and I Never say Anything
about My Suspicions about 9/11 on Internet in fear of Echelon catching
and filing me.

---
Hannu



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
 Kenneth Marshall wrote:

 I understand that a hash value is a many-to-one mapping. That is the
 point of the flag in the index. The flag means that there is only one
 item in the heap corresponding to that hash value. In this case we
 know that the value in the heap is the correct one and a possibly
 very expensive string comparison can be skipped. Given that the hash
 function is doing its job, almost every string comparison can be skipped.
 How long would it take to compare 1-32K of data? How much CPU usage?
 With this field in place, you only need to check tuple visibility.
  

 How likely is it that you will get a hash collision, two strings that are 
 different that will hash to the same value?  To avoid this requires a very 
 large hash key (128 bits, minimum)- otherwise you get into birthday attack 
 problems.  With a 32-bit hash, the likelihood is greater than 50% that two 
 strings in a collection of 100,000 will hash to the same value.  With a 
 64-bit hash, the likelihood is greater than 50% that two strings in a 
 collection of 10 billion will hash to the same value.  10 billion is a large 
 number, but not an unreasonable number, of strings to want to put into a 
 hash table- and it's exactly this case where the O(1) cost of hashtables 
 starts being a real win.
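
(For reference, the standard birthday-bound approximation behind these
figures, added here for illustration:)

P(\text{collision}) \approx 1 - e^{-n(n-1)/(2N)}, \qquad
P = \tfrac{1}{2} \;\Rightarrow\; n \approx 1.177\sqrt{N}

N = 2^{32}:\; n \approx 1.177 \cdot 2^{16} \approx 77{,}000 \;(< 100{,}000)
N = 2^{64}:\; n \approx 1.177 \cdot 2^{32} \approx 5.1 \times 10^{9} \;(< 10^{10})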

 Brian

Continuing this train of thought... While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use of the hash index
in non-lossy mode, albeit with a slight increase in complexity.
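
(A hypothetical layout sketch of the single-entry flag idea from upthread,
added for illustration; this is not the actual PostgreSQL hash index item
format:)

#include <stdint.h>
#include <stdbool.h>

typedef struct HashEntrySketch
{
    uint32_t hashval;   /* full hash of the indexed key */
    uint64_t heap_tid;  /* pointer to the heap tuple (simplified) */
    bool     unique;    /* true => the only heap row with this hash
                         * value; the expensive key comparison can be
                         * skipped, and only tuple visibility still has
                         * to be checked against the heap */
} HashEntrySketch;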

Ken

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Tom Lane
I wrote:
 This patch implements Florian's idea about how to manage snapshot xmax
 without the ugly and performance-losing tactic of taking XidGenLock and
 ProcArrayLock at the same time.  I had to do a couple of slightly klugy
 things to get bootstrap and prepared transactions to work, but on the
 whole it seems at least as clean as the code we have now.  Comments?

I spent a fair amount of time this afternoon trying to measure the
performance impact of this patch using pgbench, without a lot of success
--- as far as average transaction rates go, it's a wash compared to CVS
HEAD.  However, it eventually struck me to look at the distribution of
transaction times, using pgbench's -l logging output, and on that basis
it is clear that getting rid of the XidGenLock interaction has a good
deal of use in eliminating outlier times.  My desktop machine has a
single consumer-grade IDE drive, and even with fsync off and
synchronous_commit off, it can barely make 190 tps sustained pgbench
throughput --- it's just disk write bound all the time.  On a run with 8
clients, 10000 transactions per client, DB scale factor 25, I get this
distribution of transaction times from CVS HEAD:

postgres=# select usec/100, count(*) from plhead group by 1 order by 1;
 ?column? | count 
----------+-------
        0 | 79306
        1 |   290
        2 |   116
        3 |    65
        4 |    82
        5 |    30
        6 |    31
        7 |    32
       10 |     8
       11 |     8
       13 |    16
       14 |     5
       15 |     3
       20 |     8
(14 rows)

and this from HEAD plus the patch:

postgres=# select usec/100, count(*) from plpatch group by 1 order by 1;
 ?column? | count 
----------+-------
        0 | 79305
        1 |   325
        2 |    85
        3 |    49
        4 |    68
        5 |    50
        6 |    45
        7 |    35
        8 |    14
        9 |     8
       10 |     6
       11 |    10
(12 rows)

The worst-case transaction time has dropped by nearly a factor of 2:

postgres=# select * from plhead order by usec desc limit 20;
 client | trans |   usec   | f |   epoch    |  lsb   
--------+-------+----------+---+------------+--------
  2 |  6379 | 20621910 | 0 | 1189280557 | 664207
  6 |  5992 | 20621175 | 0 | 1189280557 | 665970
  7 |  5795 | 20621024 | 0 | 1189280557 | 666353
  1 |  6327 | 20620833 | 0 | 1189280557 | 663606
  3 |  6463 | 20620277 | 0 | 1189280557 | 663895
  4 |  6383 | 20620260 | 0 | 1189280557 | 664000
  5 |  6209 | 20620077 | 0 | 1189280557 | 665060
  0 |  6269 | 20619875 | 0 | 1189280557 | 664935
  6 |  8182 | 15191784 | 0 | 1189280655 |  87859
  3 |  8810 | 15191637 | 0 | 1189280655 |  86802
  2 |  8700 | 15185120 | 0 | 1189280655 |  86742
  5 |  8479 | 14078513 | 0 | 1189280653 | 978339
  1 |  8618 | 14077106 | 0 | 1189280653 | 978216
  7 |  7930 | 14076905 | 0 | 1189280653 | 978832
  4 |  8704 | 14076429 | 0 | 1189280653 | 977877
  0 |  8557 | 14076249 | 0 | 1189280653 | 977477
  0 |  6717 | 13932179 | 0 | 1189280576 |  65288
  1 |  6775 | 13931973 | 0 | 1189280576 |  65387
  6 |  6410 | 13931493 | 0 | 1189280576 |  67192
  7 |  6201 | 13931140 | 0 | 1189280576 |  69247
(20 rows)

postgres=# select * from plpatch order by usec desc limit 20;
 client | trans |   usec   | f |   epoch    |  lsb   
--------+-------+----------+---+------------+--------
  6 |  6008 | 11833702 | 0 | 1189281093 | 646851
  0 |  6140 | 11833041 | 0 | 1189281093 | 645738
  2 |  6289 | 11809343 | 0 | 1189281093 | 616734
  4 |  6315 | 11808044 | 0 | 1189281093 | 617505
  3 |  6344 | 11807762 | 0 | 1189281093 | 616970
  7 |  5802 | 11807641 | 0 | 1189281093 | 617932
  5 |  6183 | 11806964 | 0 | 1189281093 | 618060
  1 |  6163 | 11805494 | 0 | 1189281093 | 616679
  7 |  8175 | 11027973 | 0 | 1189281239 | 675499
  2 |  8725 | 11019066 | 0 | 1189281239 | 674305
  5 |  8828 | 10997331 | 0 | 1189281239 | 674953
  0 |  8541 | 10987629 | 0 | 1189281239 | 673773
  1 |  8799 | 10713734 | 0 | 1189281239 | 673217
  4 |  8897 | 10705975 | 0 | 1189281239 | 672743
  6 |  8364 | 10702875 | 0 | 1189281239 | 677163
  3 |  8814 | 10701467 | 0 | 1189281239 | 674369
  3 |  9223 |  9158554 | 0 | 1189281258 | 202934
  5 |  9234 |  9143744 | 0 | 1189281258 | 203073
  7 |  8493 |  9099174 | 0 | 1189281258 | 204092
  4 |  9306 |  9097074 | 0 | 1189281258 | 202402
(20 rows)


So on the strength of that, I'm going to go ahead and commit the patch,
but I'd be interested to see benchmarks from people with access to
better hardware.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Greg Smith

On Thu, 6 Sep 2007, Decibel! wrote:


I don't know that there should be a direct correlation, but ISTM that
scan_whole_pool_seconds should take checkpoint intervals into account
somehow.


Any direct correlation is weak at this point.  The LRU cleaner has a small 
impact on checkpoints, in that it's writing out buffers that may make the 
checkpoint quicker.  But this particular write trickling mechanism is not 
aimed directly at flushing the whole pool; it's more about smoothing out 
idle periods a bit.


Also, computing the checkpoint interval is itself tricky.  Heikki had to 
put some work into getting something that took into account both the 
timeout and segments mechanisms to gauge progress, and I'm not sure I can 
directly re-use that because it's really only doing that while the 
checkpoint is active.  I'm not saying it's a bad idea to have the expected 
interval as an input to the model, just that it's not obvious to me how to 
do it and whether it would really help.



I like the idea of not having that as a GUC, but I'm doubtful that it
can be hard-coded like that. What if checkpoint_timeout is set to 120?
Or 60? Or 2000?


Someone using 60 or 120 has checkpoint problems way bigger than the LRU 
cleaner can be expected to help with.  How fast the reusable buffers it 
can write are pushed out is the least of their problems.  Also, I'd expect 
that the only cases using such a low value for a good reason are doing so 
because they have enormous amounts of activity on their system, and in 
that case the primary JIT mechanism should dominate how the LRU cleaner 
treats them.  scan_whole_pool_seconds doesn't do anything if the primary 
mechanism was already planning to scan more buffers than it aims for.


Someone who has very infrequent checkpoints and therefore low activity, 
like your 2000 case, can expect that the LRU cleaner will lap and catch up 
to the strategy point about 2 minutes after any activity and then follow 
directly behind it with the way I've set this up.  If that's cleaning the 
buffer cache too aggressively, I think those in that situation would be 
better served by constraining the maxpages parameter; that's directly 
adjusting what I'd expect their real issue is, how fast pages can flush to 
disk, rather than the secondary one of how fast the pool is being scanned.


I picked 2 minutes for that value because it's as slow as I can make it 
and still serve its purpose, while not feeling to me like it's too fast 
for a relatively idle system even if someone set maxpages=1000.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Josh Berkus
Tom,

 So on the strength of that, I'm going to go ahead and commit the patch,
 but I'd be interested to see benchmarks from people with access to
 better hardware.

Is the latest version of the patch from -patches the one I should test?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Markus Schiltknecht

Hello Tom,

Tom Lane wrote:

So on the strength of that, I'm going to go ahead and commit the patch,
but I'd be interested to see benchmarks from people with access to
better hardware.


I've just completed two dbt2 test runs on a mid-level system, with 4GB 
RAM and a 7 disk SATA RAID 1+0 w/ BBU. Once with code as of 2007/09/05 
18:00 (which is *before* the first lazy xid commit) and once with CVS 
HEAD (2007/09/08) plus latestCompletedXid.patch.


Here are the results from the first test run (test run 33, without lazy 
xid):


$ cat 33/driver/results.out 
                                Response Time (s)
 Transaction      %    Average :    90th %     Total  Rollbacks      %
------------  -----  ---------------------  --------  ---------  -----
    Delivery   3.97      3.745 :     7.844     11844          0   0.00
   New Order  45.35      3.844 :     7.692    135192       1352   1.01
Order Status   3.95      2.728 :     6.371     11764          0   0.00
     Payment  42.74      2.649 :     6.349    127415          0   0.00
 Stock Level   4.00      2.172 :     5.634     11915          0   0.00
------------  -----  ---------------------  --------  ---------  -----

1103.45 new-order transactions per minute (NOTPM)
120.1 minute duration
0 total unknown errors
1003 second(s) ramping up


And that's with HEAD +latestCompletedXid.patch (test run 34):

$ cat 34/driver/results.out 
                                Response Time (s)
 Transaction      %    Average :    90th %     Total  Rollbacks      %
------------  -----  ---------------------  --------  ---------  -----
    Delivery   3.96      3.843 :     8.223     11760          0   0.00
   New Order  45.28      4.049 :     8.451    134398       1300   0.98
Order Status   3.97      2.877 :     6.815     11777          0   0.00
     Payment  42.80      2.745 :     6.718    127027          0   0.00
 Stock Level   4.00      2.280 :     6.129     11859          0   0.00
------------  -----  ---------------------  --------  ---------  -----

1097.71 new-order transactions per minute (NOTPM)
120.1 minute duration
0 total unknown errors
1003 second(s) ramping up


Both tests ran for two hours, had 100 warehouses and 50 connections. 
shared_buffers was set to 1024MB, effective_cache_size = 3800MB, all 
other settings were standard.


Regards

Markus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Josh Berkus
Markus,

 I've just completed two dbt2 test runs on a mid-level system, with 4GB
 RAM and a 7 disk SATA RAID 1+0 w/ BBU. Once with code as of 2007/09/05
 18:00 (which is *before* the first lazy xid commit) and once with cvs
 HEAD (2007/09/08  +latestCompletedXid.patch.

Hmmm.  Your results are within the margin of error for DBT2, so they show no 
real difference.  What we need for this is a heavy-read workload, though; on 
a workload like DBT2 (which is mostly writes) I wouldn't expect lazy-XID to 
help much.

I'll see if I can find something appropriate.  Maybe Jan's TPC-W 
implementation?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote:

Continuing this train of thought... While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use of the hash index
in non-lossy mode albeit with a slight increase in complexity.
  
I suspect there is no value in designing a hash implementation to work 
well for a context where a btree index would already perform equally well.


If there are too few hash buckets, performance is not O(1). For a hash 
index to function better than btree, I believe focus should be spent on 
the O(1) case, which means ensuring that enough hash buckets are used to 
provide O(1).


All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.

In the optimum O(1) scenario, each existing key will map to a hash 
bucket that contains ~1 entry. For this case, there is no value to 
having the key stored in the index row, as 3) Tuple visibility will 
still require access to the table row. In this optimum scenario, I do 
not believe anything of value is saved by storing the key in the index 
row. The loss, however, is that the hash index data structures become 
more complex, and would likely require support for variable length data. 
The resulting increase in hash index size and code complexity would 
reduce performance.


Just an opinion.

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Hmmm.  Your results are within the margin of error for DBT2, so they
 show no real difference.  What we need for this is a heavy-read
 workload, though; on a workload like DBT2 (which is mostly writes) I
 wouldn't expect lazy-XID to help much.

Lazy-XID doesn't help on a write-mostly workload, but I would have
expected to see some benefit from the latestCompletedXid patch.

The rough tests I just finished suggest that the win to look for is
improvement of the very tail of the distribution --- well beyond the
90th percentile point which is the most we can see in Markus'
printout.  Can we get 95% and 99% response time percentiles?

For comparison, in my little test I got 0.828546 vs 0.873957 as the
99th percentile pgbench transaction times.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Is the latest version of the patch from -patches the one I should test?

Yeah, the posted patch is the same as what I committed.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 My desktop machine has a single consumer-grade IDE drive, and even with
 fsync off and synchronous_commit off, it can barely make 190 tps sustained
 pgbench throughput --- it's just disk write bound all the time. On a run
 with 8 clients, 1 transactions per client, DB scale factor 25, I get
 this distribution of transaction times from CVS HEAD:

Wouldn't you expect to see more of an effect on cpu-bound environments? Is
there any i/o going on while these locks are being held? I suppose there's a
WAL commit record that has to be synced?

Have you tried with a smaller scale factor?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote:
 Kenneth Marshall wrote:
 Continuing this train of thought... While it would make sense for larger
 keys to store the hash in the index, if the key is smaller, particularly
 if it is of fixed size, it would make sense to store the key in the index
 instead. This would have the benefit of allowing use of the hash index
 in non-lossy mode albeit with a slight increase in complexity.
   
 I suspect there is no value in designing a hash implementation to work well 
 for a context where a btree index would already perform equally well.

 If there are too few hash buckets, performance is not O(1). For a hash 
 index to function better than btree, I believe focus should be spent on the 
 O(1) case, which means ensuring that enough hash buckets are used to 
 provide O(1).

 All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.

 In the optimum O(1) scenario, each existing key will map to a hash bucket 
 that contains ~1 entry. For this case, there is no value to having the key 
 stored in the index row, as 3) Tuple visibility, will still require access 
 to the table row. In this optimum scenario, I do not believe anything of 
 value is saved by storing the key in the index row. The loss, however, is 
 that the hash index data structures become more complex, and would likely 
 require support for variable length data. The resulting increase in hash 
 index size and code complexity would reduce performance.

 Just an opinion.

 Cheers,
 mark


I agree that we should focus on the O(1) area. The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value,
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing. It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency. Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in today's system memories.

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.

Regards,
Ken

 -- 
 Mark Mielke [EMAIL PROTECTED]


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] WIP patch for latestCompletedXid method of computing snapshot xmax

2007-09-08 Thread Gregory Stark

Josh Berkus [EMAIL PROTECTED] writes:

 Hmmm.  Your results are within the margin of error for DBT2, so they show no 
 real difference.  What we need for this is a heavy-read workload, though; on 
 a workload like DBT2 (which is mostly writes) I wouldn't expect lazy-XID to 
 help much.

The TPM is definitely within the margin of error but the 90th percentile
response times seem worrisome. They're already over 5s, which counts as a
TPC-C failure. From what we've seen, when you push the scale factor high
enough that those numbers get even close to 5s, you get very unstable
results.

The published benchmarks seem to keep the max around 5s which puts the 90th
percentile around half that. That seems overly conservative but I wonder what
reason they have for doing it that way.

 I'll see if I can find something appropriate.  Maybe Jan's TPC-W 
 implementation?

We could just rejigger the percentages on the transactions. I think Stock
Level and Order Status are entirely read-only but I would have to check. Stock
Level is a very intensive query though so I wouldn't suggest raising the
percentage on that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-08 Thread apoc9009



No! Actually I'm wearing my tin hat right now and I Never say Anything
about My Suspicions about 9/11 on Internet in fear of Echelon catching
and filing me.

---
Hannu

hmm, a little bit Paranoid?

http://www.myvideo.de/watch/1776449

Ok, now your point of View is more clear...

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote:

The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value,
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing.
I think that if the case of 1 entry per hash becomes common enough to 
be significant, and the key is stored in the hash, then a btree will 
perform equally well or better, and there is no point in pursuing such 
a hash index model. This is where we are today.



It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency.
If the key is stored, all of these factors likely favor the btree format 
over the hash format. Again, this is where we are today.



Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.
  
Space efficiency is provided by storing neither the key nor the required 
header data (a length prefix?).

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.
  

The subject interests me. :-)

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-08 Thread Gregory Stark

Josh Berkus [EMAIL PROTECTED] writes:

 No, that would be REPEATABLE READ.  That's one of the areas where we need to 
 test HOT; does RR and SERIALIZABLE still work correctly with HOT?

I'm a little confused by your question. REPEATABLE READ is an isolation level
we don't directly support in Postgres. If you set the isolation level to
REPEATABLE READ you get SERIALIZABLE, which is a higher level.

HOT works fine with SERIALIZABLE; it uses the same criteria as vacuum for
determining when a tuple is dead, namely it compares against RecentGlobalXmin.

The check for whether a transaction can see any old tuples in broken chains
uses the serializable snapshot as a conservative proxy for the oldest snapshot
which might be in use. That will work for both serializable and read
committed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-08 Thread Alvaro Herrera
Greg Smith wrote:
 On Sat, 8 Sep 2007, Tom Lane wrote:

 It might be interesting to consider making the delay auto-tune: if you
 wake up and find nothing (much) to do, sleep longer the next time,
 conversely shorten the delay when work picks up.  Something for 8.4,
 though, at this point.

 I have a couple of pages of notes on how to tune the delay automatically. 
 The tricky part are applications that go from 0 to full speed with little 
 warning; the first few seconds of the stock market open come to mind.

Maybe have the backends send a signal to bgwriter when they see it
sleeping and are overwhelmed by work.  That way, bgwriter can sleep for
a few seconds, safe in the knowledge that somebody else will wake it up
if needed sooner.  The way backends would detect that bgwriter is
sleeping is that bgwriter would keep an atomic flag in shared memory,
and it gets set only if it's going to sleep for long (so if it's going
to sleep for (say) 100ms or less, it doesn't set the flag, so the
backends won't signal it).  In order to avoid a huge amount of signals
when all backends suddenly start working at the same instant, have the
signal itself be sent only by the first backend that manages to
LWLockConditionalAcquire a lwlock that's only used for that purpose.
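
(A rough sketch of that protocol, added for illustration; apart from the
LWLockConditionalAcquire/LWLockRelease calls Alvaro mentions, every name
below is invented:)

#include <signal.h>
#include <sys/types.h>

/* hypothetical shared-memory state, not actual PostgreSQL symbols */
typedef struct BgWriterShmemSketch
{
    volatile sig_atomic_t long_sleep_flag;  /* set by bgwriter before a
                                             * long nap, cleared on wakeup */
    pid_t                 bgwriter_pid;
} BgWriterShmemSketch;

extern BgWriterShmemSketch *BgWriterShmem;

/* Backend side, on noticing it is overwhelmed by work: */
void
maybe_wake_bgwriter(void)
{
    if (BgWriterShmem->long_sleep_flag &&
        LWLockConditionalAcquire(BgWriterWakeupLock, LW_EXCLUSIVE))
    {
        kill(BgWriterShmem->bgwriter_pid, SIGUSR1);  /* wake it up */
        LWLockRelease(BgWriterWakeupLock);
    }
    /* if the lock was busy, another backend is already signalling */
}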

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] invalidly encoded strings

2007-09-08 Thread Andrew Dunstan


I have been looking at fixing the issue of accepting strings that are 
not valid in the database encoding. It appears from previous discussion 
that we need to add a call to pg_verifymbstr() to the relevant input 
routines and ensure that the chr() function returns a valid string. That 
leaves several issues:


. which are the relevant input routines? I have identified the following 
as needing remediation: textin(), bpcharin(), varcharin(), anyenum_in(), 
namein().  Do we also need one for cstring_in()? Does the xml code 
handle this as part of xml validation?


. what do we need to do to make the verification code more efficient? I 
think we need to address the correctness issue first, but doing so 
should certainly make us want to improve the verification code. For 
example, I'm wondering if it might benefit from having a tiny cache.


. for chr() under UTF8, it seems to be generally agreed that the 
argument should represent the codepoint and the function should return 
the correspondingly encoded character. If so, possibly the argument 
should be a bigint to accommodate the full range of possible code 
points. It is not clear what the argument should represent for other 
multi-byte encodings for any argument higher than 127. Similarly, it is 
not clear what ascii() should return in such cases. I would be inclined 
just to error out.
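
(To make the codepoint interpretation concrete, a standalone sketch of
encoding a Unicode code point as UTF-8, added for illustration; this is
not the proposed patch:)

#include <stdint.h>

/* Encode code point cp as UTF-8 into buf; returns the byte count. */
static int
unicode_to_utf8_sketch(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80)                      /* 1 byte: plain ASCII */
    {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)                     /* 2 bytes */
    {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)                   /* 3 bytes */
    {
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes, code points up to U+10FFFF */
    buf[0] = (unsigned char) (0xF0 | (cp >> 18));
    buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}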


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] tsearch filenames unlikes special symbols and numbers

2007-09-08 Thread Oleg Bartunov

On Sun, 2 Sep 2007, Tom Lane wrote:


Gregory Stark [EMAIL PROTECTED] writes:

Tom Lane [EMAIL PROTECTED] writes:

I made it reject all but latin letters, which is the same restriction
that's in place for timezone set filenames.  That might be overly
strong, but we definitely have to forbid . and / (and \ on
Windows).  Do we want to restrict it to letters, digits, underscore?
Or does it need to be weaker than that?



What's the problem with '.'?


../../../../etc/passwd

Possibly we could allow '.' as long as we forbade /, but the other
trouble with allowing . is that it encourages people to try to specify
the filetype suffix (as indeed Oleg was doing).  I'd prefer to keep the
suffixes out of the SQL object definitions, with an eye to possibly
someday migrating all the configuration data inside the database.
There's a reasonable argument for restricting the names used for these
things in the SQL definitions to be valid SQL identifiers, so that that
will work nicely...


So, what's the current policy? Still a-z, A-Z? I think we should allow
'.' and prevent '/'. Look how ugly our current ispell setup is, which
depends on 3 files: the stop word list, .dict and .aff.

Right now, I can use something like

CREATE TEXT SEARCH DICTIONARY en_ispell (
TEMPLATE = ispell,
DictFile = englishDict,
AffFile =  englishAff,
StopWords = english
);

I'd better use english.dict, english.aff, english.stop, which is usual for
any user, without dictating to the user here. We already impose a lot of 
restrictions.


I hope we won't require special extensions like .dict, .aff, since it's
unknown in advance what files other dictionaries will use.
If we allow '.' without '/', then we'd be happy.
I'd remove the requirement for an extension on the stop word list, which
looks rather artificial to me.

Oh, my god, I see we dictate extensions !

STATEMENT:  CREATE TEXT SEARCH DICTIONARY en_ispell (
TEMPLATE = ispell,
DictFile = englishDict,
AffFile =  englishAff,
StopWords = englishStop
);
ERROR:  could not open dictionary file 
/usr/local/pgsql-dev/share/tsearch_data/englishdict.dict: No such file or 
directory

Folks, this is too much! Now we dictate the extensions '.dict, .affix,
.stop'; what else?

Is this defined by the ispell template only, or is it a global requirement?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] ispell dictionary broken in CVS HEAD ?

2007-09-08 Thread Oleg Bartunov

Hi there,

seems something is broken in ispell dictionary (CVS HEAD).

event=# CREATE TEXT SEARCH DICTIONARY en_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile =  english,
StopWords = english
);
CREATE TEXT SEARCH DICTIONARY
event=# select ts_lexize('en_ispell','stars');
 ts_lexize 
-----------


But ispell does know 'stars'

zen:~/app/pgsql/pgweb> ispell
@(#) International Ispell Version 3.2.06 08/01/01
word: stars
ok (derives from root STAR)


Checked in tsearch2 (8.2.4):

apod=# insert into pg_ts_dict
(SELECT 'en_ispell', dict_init,
'DictFile=/usr/local/share/dicts/ispell/utf8/english-utf8.dict,'
 'AffFile=/usr/local/share/dicts/ispell/utf8/english-utf8.aff,'
 'StopFile=/usr/local/share/dicts/ispell/utf8/english-utf8.stop',
 dict_lexize
 FROM pg_ts_dict
 WHERE dict_name = 'ispell_template'
 );

apod=# select lexize('en_ispell','stars');
 lexize 
--------
 {star}


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] tsearch filenames unlikes special symbols and numbers

2007-09-08 Thread Oleg Bartunov

On Sun, 9 Sep 2007, Oleg Bartunov wrote:


Oh, my god, I see we dictate extensions !

STATEMENT:  CREATE TEXT SEARCH DICTIONARY en_ispell (
   TEMPLATE = ispell,
   DictFile = englishDict,
   AffFile =  englishAff,
   StopWords = englishStop
   );
ERROR:  could not open dictionary file 
/usr/local/pgsql-dev/share/tsearch_data/englishdict.dict: No such file or 
directory


Folk, this is too much ! Now, we dictate extensions '.dict, .affix, .stop',
what else ?


I notice that the documentation doesn't mention this:
http://momjian.us/main/writings/pgsql/sgml/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate