Re: [HACKERS] Block-level CRC checks

2008-10-03 Thread Brian Hurt
OK, I have a stupid question- torn pages are a problem, but only during 
recovery.  Recovery is (I assume) a fairly rare condition- if data 
corruption is going to happen, it's most likely to happen during normal 
operation.  So why not just turn off CRC checksumming during recovery, 
or at least treat it as a much less critical error?  During recovery, if 
the CRC checksum matches, we can assume the page is good- not only not 
corrupt, but not torn either.  If the CRC checksum doesn't match, we 
don't panic, but maybe we do more careful analysis of the page to make 
sure that only the hint bits are wrong.  Or maybe not.  It's only during 
normal operation that a CRC checksum failure would be considered critical.



Feel free to explain to me why I'm an idiot.

Brian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block-level CRC checks

2008-10-02 Thread Brian Hurt

Jonah H. Harris wrote:

On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt <[EMAIL PROTECTED]> wrote:
  

I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to very
easily calculate what the change in the CRC checksum should be.



Doesn't the problem still remain?  The problem being that the buffer
can be changed as it's written, yes?

  

Another possibility is to just not checksum the hint bits...

Brian



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block-level CRC checks

2008-10-02 Thread Brian Hurt
I have a stupid question wrt hint bits and CRC checksums- it seems to me 
that it should be possible, if you change the hint bits, to be able to 
very easily calculate what the change in the CRC checksum should be.


The basic idea of the CRC checksum is that, given a message x, the 
checksum is x mod p where p is some constant polynomial (all operations 
are in GF(2^n)).  Now, the interesting thing about modulo is that it's 
distributable- that is to say, (x ^ y) mod p = (x mod p) ^ (y mod p), 
and that
(x * y) mod p = ((x mod p) * (y mod p)) mod p (I'm using ^ instead of 
the more traditional + here to emphasize that it's xor, not addition, 
I'm doing).  So let's assume we're updating a word a known n bytes from 
the end of the message- we calculate y = old_value ^ new_value, so our 
change is the equivalent of changing the original block m to (m ^ (y * 
x^{8n})).  The new checksum is then (m ^ (y * x^{8n})) mod p =
(m mod p) ^ (((y mod p) * (x^{8n} mod p)) mod p).  Now, m mod p is the 
original checksum, and (x^{8n} mod p) is a constant for a given n, and 
the multiplication modulo p can be implemented as a set of table 
lookups, one per byte.


The take away here is that, if we know ahead of time where the 
modifications are going to be, we can make updating the CRC checksum 
(relatively) cheap.  So, for example, a change of the hint bits would 
only need 4 tables lookups and a couple of xors to update the block's 
CRC checksum.  We could extended this idea- break the 8K page up into, 
say, 32 256-byte "subblocks".  Changing any given subblock would require 
only re-checksumming that subblock and then updating the CRC checksum.  
The reason for the subblocks would be to limit the number of tables 
necessary- each subblock requires it's own set of 4 256-word tables, so 
having 32 subblocks means that the tables involved would be 32*4*256*4 = 
128K in size.  Going to, say, 64 byte subblocks means needing 128 tables 
or 512K of tables.


If people are interested, I could bang out the code tonight, and post it 
to the list tomorrow.


Brian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block-level CRC checks

2008-10-01 Thread Brian Hurt

Brian Hurt wrote:

Paul Schlie wrote:


... if that doesn't fix
the problem, assume a single bit error, and iteratively flip
single bits until the check sum matches ...
This can actually be done much faster, if you're doing a CRC checksum 
(aka modulo over GF(2^n)). Basically, an error flipping bit n will 
always create the same xor between the computed CRC and the stored 
CRC. So you can just store a table- for all n, an error in bit n will 
create an xor of this value, sort the table in order of xor values, 
and then you can do a binary search on the table, and get exactly what 
bit was wrong.


This is actually probably fairly safe- for an 8K page, there are only 
65536 possible bit positions. Assuming a 32-bit CRC, that means that 
larger corrupts are much more likely to hit one of the other 
4,294,901,760 (2^32 - 2^16) CRC values- 99.998% likely, in fact.




Actually, I think I'm going to take this back. Thinking about it, the 
table is going to be large-ish (~512K) and it assumes a fixed 8K page 
size. I think a better solution would be a tight loop, something like:

r = 1u;
for (i = 0; i < max_bits_per_page; ++i) {
if (r == xor_difference) {
return i;
} else if ((r & 1u) == 1u) {
r = (r >> 1) ^ CRC_POLY;
} else {
r >>= 1;
}
}

Brian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block-level CRC checks

2008-10-01 Thread Brian Hurt

Paul Schlie wrote:


... if that doesn't fix
the problem, assume a single bit error, and iteratively flip
single bits until the check sum matches ...
This can actually be done much faster, if you're doing a CRC checksum 
(aka modulo over GF(2^n)). Basically, an error flipping bit n will 
always create the same xor between the computed CRC and the stored CRC. 
So you can just store a table- for all n, an error in bit n will create 
an xor of this value, sort the table in order of xor values, and then 
you can do a binary search on the table, and get exactly what bit was wrong.


This is actually probably fairly safe- for an 8K page, there are only 
65536 possible bit positions. Assuming a 32-bit CRC, that means that 
larger corrupts are much more likely to hit one of the other 
4,294,901,760 (2^32 - 2^16) CRC values- 99.998% likely, in fact.


Brian




(hopefully not making the
problem worse as may be the case if many bits were actually already
in error) and write the data back, and proceed as normal, possibly
logging the action; otherwise presume the data is unrecoverable and
in error, somehow mark it as being so such that subsequent queries
which may utilize any portion of it knows it may be corrupt (which
I suspect may be best done not on file-system blocks, but actually
on a logical rows or even individual entries if very large, as my
best initial guess, and likely to measurably affect performance
when enabled, and haven't a clue how resulting query should/could
be identified as being potentially corrupt without confusing the
client which requested it).



  



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Brian Hurt

Andrew Sullivan wrote:


On Thu, May 29, 2008 at 12:11:21PM -0400, Brian Hurt wrote:
 

Being able to do read-only queries makes this feature more valuable in more 
situations, but I disagree that it's a deal-breaker.
   



Your managers are apparently more enlightened than some. ;-)

A

 

No doubt.  But defining the minimum acceptable feature set by the 
demands of the dumbest manager is a no-win proposition.


Brian



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Brian Hurt

David Fetter wrote:



This part is a deal-killer.  It's a giant up-hill slog to sell warm
standby to those in charge of making resources available because the
warm standby machine consumes SA time, bandwidth, power, rack space,
etc., but provides no tangible benefit, and this feature would have
exactly the same problem.

IMHO, without the ability to do read-only queries on slaves, it's not
worth doing this feature at all.

 



I don't think I agree with this.  There are a large number of situations 
where it's positive expectancy to do precisely this- it's not unlike 
buying a $1 lottery ticket with a 1 chance in 100 of winning $1000- the 
vast majority of the time (99 times out of 100), you're going to lose 
$1.  But when you win, you win big, and make up for all the small losses 
you incurred getting there and then some.  Failover machines are like 
that- most of the time they're negative value, as you said- taking up SA 
time, bandwidth, power, rack space, money, etc.  But every once in a 
(great) while, they save you.  If the cost of having the database down 
for hours or days (as you madly try to next-day replacement hardware) 
isn't that great, then no, this isn't  worthwhile- but in cases where 
the database being down chalks up the lost money quickly, this is easy 
to cost-justify.


Being able to do read-only queries makes this feature more valuable in 
more situations, but I disagree that it's a deal-breaker.


Brian



Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Brian Hurt

Andrew Dunstan wrote:




Florian G. Pflug wrote:



Would it be possible to determine when the copy is starting that 
this case holds, and not use the parallel parsing idea in those cases?



In theory, yes. In pratice, I don't want to be the one who has to 
answer to an angry user who just suffered a major drop in COPY 
performance after adding an ENUM column to his table.





I am yet to be convinced that this is even theoretically a good path 
to follow. Any sufficiently large table could probably be partitioned 
and then we could use the parallelism that is being discussed for 
pg_restore without any modification to the backend at all. Similar 
tricks could be played by an external bulk loader for third party data 
sources.




I was just floating this as an idea- I don't know enough about the 
backend to know if it was a good idea or not, it sounds like "not".


Brian


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] An idea for parallelizing COPY within one backend

2008-02-27 Thread Brian Hurt

Tom Lane wrote:


"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
 


...
Neither the "dealer", nor the "workers" would need access to the either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
...
   



Unfortunately, this idea has far too narrow a view of what a datatype
input function might do.  Just for starters, consider "enum" input,
which certainly requires catalog access.  We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.

regards, tom lane

 

Would it be possible to determine when the copy is starting that this 
case holds, and not use the parallel parsing idea in those cases?


I'm a big user of copy, generally into very simple tables- few indexes, 
simple column types (numeric, varchar, and int almost exclusively), no 
fancy features.  A parallel copy input in the "simple" cases would be of 
great advantage to me, even if it doesn't parallelize "complicated" cases.


Brian



Re: [HACKERS] Slow count(*)

2008-01-03 Thread Brian Hurt

Kevin Grittner wrote:



If you really are doing proper maintenance, and you don't need exact
counts, you might be able to use the approximation stored in the
system tables:
 



Also, if you're using count(*) as an existance test (common in Mysql 
code), it's better to use exists instead.  Using a table in my system, I 
see:


proddb=> explain analyze select count(*) from instrument_listings 
where update_date is null and delist_date is null;

QUERY PLAN
---
 Aggregate  (cost=18385.45..18385.46 rows=1 width=0) (actual 
time=897.799..897.801 rows=1 loops=1)
   ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 
rows=164807 width=0) (actual time=0.018..634.197 rows=146122 loops=1)

 Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 897.846 ms
(4 rows)

Time: 898.478 ms
proddb=> explain analyze select true where exists(select 1 from 
instrument_listings where update_date is null and delist_date is null 
limit 1);

 QUERY PLAN

 Result  (cost=0.11..0.12 rows=1 width=0) (actual time=0.026..0.028 
rows=1 loops=1)

   One-Time Filter: $0
   InitPlan
 ->  Limit  (cost=0.00..0.11 rows=1 width=0) (actual 
time=0.022..0.022 rows=1 loops=1)
   ->  Seq Scan on instrument_listings  (cost=0.00..17973.43 
rows=164807 width=0) (actual time=0.017..0.017 rows=1 loops=1)

 Filter: ((update_date IS NULL) AND (delist_date IS NULL))
 Total runtime: 0.063 ms
(7 rows)

Time: 0.768 ms
proddb=>



The exists version is over 1000x faster (and the only reason it's not 
more studly is that I'm working on the table as we speak, so it's all in 
memory).


As a general rule in postgres, don't do count(*) unless you really mean it.

Brian


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Sorting Improvements for 8.4

2007-12-21 Thread Brian Hurt

Brian Hurt wrote:

While we're blue skying things, I've had an idea for a sorting 
algorithm kicking around for a couple of years that might be 
interesting.  It's a variation on heapsort to make it significantly 
more block-friendly.  I have no idea if the idea would work, or how 
well it'd work, but it might be worthwhile kicking around.


Now, the core idea of heapsort is that the array is put into heap 
order- basically, that a[i] >= a[2i+1] and a[i] >= a[2i+2] (doing the 
0-based array version here).  The problem is that, assuming that the 
length of a is larger than memory, then a[2i+1] is likely going to be 
on a different page or block than a[i].  That means every time you 
have to bubble down a new element, you end up reading O(log N) blocks- 
this is *per element*.


The variation is to instead work with blocks, so you have a block of 
entries b[i], and you change the definition of heap order, so that 
min(b[i]) >= max(b[2i+1]) and min(b[i]) >= max(b[2i+2]).  Also, during 
bubble down, you need to be carefull to only change the minimum value 
of one of the two child blocks b[2i+1] and b[2i+2].  Other than that, 
the algorithm works as normal.  The advantage of doing it this way is 
that while each bubble down still takes O(log N) blocks being touched, 
you get a entire block worth of results for your effort.  Make your 
blocks large enough (say, 1/4 the size of workmem) and you greatly 
reduce N, the number of blocks you have to deal with, and get much 
better I/O (when you're reading, you're reading megabytes at a shot).


Now, there are boatloads of complexities I'm glossing over here.  This 
is more of a sketch of the idea.  But it's something to consider.


Following up to myself (my apologies), but it's occurred to me that 
there are three advantages to this proposal that I've since thought of:


1) The two child blocks b[2i+1] and b[2i+2]- the one with the larger 
minimum element is the one we might replace.  In other words, if 
min(b[2i+1]) > min(b[2i+2]) and min(b[i]) < min(b[2i+1]), then we know 
we're going to want the blocks b[4i+3] and b[4i+4]- before we're done 
with blocks b[2i+1] and b[2i+2].  The point here is that this would work 
wonders with the posix_fadvise/asyncio ideas kicking around.  It'd be 
easy for the code to keep 2 large writes and 2 large reads going pretty 
constantly.


2) There is some easy parallelization available.  I'm not sure how much 
worth this is, but the bubble down code is fairly easy to parallelize.  
If we have two bubble-downs going on in parallel, once they go down 
different branches (one thread goes to block b[2i+1] while the other 
goes to b[2i+2]) they no longer interact.  Blocks near the root of the 
heap would be contended over, and multiple threads means smaller blocks 
to keep the total memory foot print the same.  Personally, I think the 
asyncio idea above is more likely to be worthwhile.


3) It's possible to perform the sort lazily.  You have the initial O(N) 
pass over the list, but then each block is only O(log N) cost.  If it's 
likely that only the first part of the result is needed, then much of 
the work can be avoided.


Brian


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Sorting Improvements for 8.4

2007-12-20 Thread Brian Hurt
While we're blue skying things, I've had an idea for a sorting algorithm 
kicking around for a couple of years that might be interesting.  It's a 
variation on heapsort to make it significantly more block-friendly.  I 
have no idea if the idea would work, or how well it'd work, but it might 
be worthwhile kicking around.


Now, the core idea of heapsort is that the array is put into heap order- 
basically, that a[i] >= a[2i+1] and a[i] >= a[2i+2] (doing the 0-based 
array version here).  The problem is that, assuming that the length of a 
is larger than memory, then a[2i+1] is likely going to be on a different 
page or block than a[i].  That means every time you have to bubble down 
a new element, you end up reading O(log N) blocks- this is *per element*.


The variation is to instead work with blocks, so you have a block of 
entries b[i], and you change the definition of heap order, so that 
min(b[i]) >= max(b[2i+1]) and min(b[i]) >= max(b[2i+2]).  Also, during 
bubble down, you need to be carefull to only change the minimum value of 
one of the two child blocks b[2i+1] and b[2i+2].  Other than that, the 
algorithm works as normal.  The advantage of doing it this way is that 
while each bubble down still takes O(log N) blocks being touched, you 
get a entire block worth of results for your effort.  Make your blocks 
large enough (say, 1/4 the size of workmem) and you greatly reduce N, 
the number of blocks you have to deal with, and get much better I/O 
(when you're reading, you're reading megabytes at a shot).


Now, there are boatloads of complexities I'm glossing over here.  This 
is more of a sketch of the idea.  But it's something to consider.


Brian


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Feature Freeze date for 8.4

2007-10-24 Thread Brian Hurt

Josh Berkus wrote:


Folks,

 


You are way ahead of us here. And my vote *still* goes to Mercurial, if
we're picking SCMs.
   



Will a new SCM actually make this easier, or are people just using it as an 
excuse?


 

We use mercurial here at work, having switched to it recently, and while 
I don't claim to be an expert, it does seem nice.  For example, you can 
have a local repository you're checking code into, and can pull from and 
merge up with some shared repository.  Also, you can pull from one 
repository and check into another- so, for example, we have a staging 
repository and a compiles repository (unless you welcome the pain)- you 
pull from the compiles repository, but push changes back to the staging 
repository.  Then we have a script that pulls recent changes from the 
staging repository, make sure they compile and the unit tests run, 
before moving them over to the compiles repository.  This way, the 
version you're pulling at least compiles and passes some minimal unit tests.


A similiar process could work for postgres- except instead of "staging" 
and "compiles" you'd have a "sumbitted" and "accepted" repositories.  
And instead of a compile daemon, it'd be reviewers who would move code 
from one to the other. 

Note that everything I'm talking about here is not unique to Mercurial- 
you can do this just about as easily in darcs or git (I'd advise against 
Bazaar/bzr)- so don't take this as being pro-Mercurial, just pro-SCM.


Brian



Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote:



 

How likely is it that you will get a hash collision, two strings that are 
different that will hash to the same value?  To avoid this requires a very 
large hash key (128 bits, minimum)- otherwise you get into birthday attack 
problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
strings in a collection of 100,000 will hash to the same value.  With a 
64-bit hash, the likelyhood is greater than 50% that two strings in a 
collection of 10 billion will has to same value.  10 billion is a large 
number, but not an unreasonable number, of strings to want to put into a 
hash table- and it's exactly this case where the O(1) cost of hashtables 
starts being a real win.


Brian

   


Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.


 



Ah, OK- I misunderstood you.  I thought you were saying that the hash 
values would need to be unique, and you wouldn't check the original 
values at all.  My bad.


Brian



Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote:


I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
very expensive string comparison can be skipped. Given that the hash
function is doing its job, almost every string comparison can be skipped.
How long would it take to compare 1-32K of data? How much CPU usage?
With this field in place, you only need to check tuple visibility.
 



How likely is it that you will get a hash collision, two strings that 
are different that will hash to the same value?  To avoid this requires 
a very large hash key (128 bits, minimum)- otherwise you get into 
birthday attack problems.  With a 32-bit hash, the likelyhood is greater 
than 50% that two strings in a collection of 100,000 will hash to the 
same value.  With a 64-bit hash, the likelyhood is greater than 50% that 
two strings in a collection of 10 billion will has to same value.  10 
billion is a large number, but not an unreasonable number, of strings to 
want to put into a hash table- and it's exactly this case where the O(1) 
cost of hashtables starts being a real win.


Brian


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Should pointers to PGPROC be volatile-qualified?

2007-09-05 Thread Brian Hurt

Tom Lane wrote:


Comments?  Does anyone think the C standard forbids what I'm worried
about?
 



My understanding of the C spec is that it explicitly *allows* for 
exactly what you're afraid of.  It's even possible if the uses include 
function calls, as the compiler might inline the function calls.


The downside of litering the code with volatile qualifications is that 
it's an optimization stopper.  For example, if proc is declared 
volatile, the compiler couldn't merge multiple different proc->foo 
references into a single load into a register.


Note that all sorts of weirdnesses are possible when you have shared 
mutable state between multiple different threads.  For example, assume 
you have two threads, and two global ints x and y, initially both 0.  
Thread 1 do:

   y = 1;
   r1 = x;
(where r1 is a local variable in thread 1), while thread 2 does:
   x = 1;
   r2 = y;
(with r2 being a local variable in thread 2).

Here's the thing: both r1 and r2 can end up 0!  I've seen this in real 
code.  What happens is that the compiler notices that in both cases, the 
load and stores are independent, so it can reorder them.  And as loads 
tend to be expensive, and nothing can progress until the load completes, 
it moves the loads up before the stores, assuming the program won't 
notice.  Unfortunately, it does, as "the impossible" can then happen.


Brian


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Brian Hurt

Chuck McDevitt wrote:


Tom, it isn't just a case of "convenience".  When we are trying to
convert users from another database (say Oracle for example) to
PostgeSQL, one of the big stumbling blocks that slows down the work is
all the little query changes that people have to make (people who might
not have written the query as well), and it can require the review of
hundreds or thousands of SQL scripts and applications.  The harder it
is, the more reluctant they are to convert.

Sometimes supporting "de-facto" standards as well as official standards
makes sense.
 



One of the ways "de-facto" standards are even better than standards is 
that there are even more of them.  Which de-facto standard should we 
follow: Oracle, MySQL, or MS SQL Server?  Note that following all of 
them is a bad idea, as one persons convience is another persons 
stumbling block.


Brian



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Possible feature request

2007-07-25 Thread Brian Hurt


I'm not sure if this is the right venue for this- if it isn't, I apologize.

But a feature that I'd like to see is the ability to put an 'as' alias 
on the target of an update statement.


I find myself often writing statements like:
   UPDATE
  some_really_long_table_name
   SET
  col1 = some_value,
  col2 = some_other_value
   FROM
  some_other_really_long_table_name AS table2
   WHERE
  some_really_long_table_name.col3 = table2.col3
  AND some_really_long_table_name.col4 = table2.col4
   ;

What I'd like to write is:
   UPDATE
  some_really_long_table_name AS table1
   SET
  col1 = some_value,
  col2 = some_other_value
   FROM
  some_other_really_long_table_name AS table2
   WHERE
  table1.col3 = table2.col3
  AND table1.col4 = table2.col4
   ;

Is this possible?  Not for 8.3, obviously, but maybe for 8.4?

Brian


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-21 Thread Brian Hurt

Peter Eisentraut wrote:


Am Donnerstag, 21. Juni 2007 00:38 schrieb Gregory Stark:
 


I think people are worried that an 'm' in one column might mean something
different than an 'm' in another column, and perhaps that is confusing.
 


To whom? the person writing it?
   



If everyone around here had gotten their way we'd already be in a situation 
were you could write


log_rotation_age = 5m
log_rotation_size = 5m
 



Would it be a bad idea to always support the multi-character 
abbreviations?  So the min, sec, hr, day, wk, mth, yr ar all, to my 
knowledge, unambiguous.  As are kb, mb, gb, tb.  So the above would be:

log_rotation_age = 5min
log_rotation_size = 5mb
edinburgh_distance=5mi

Brian


Re: [HACKERS] Selecting a constant question

2007-06-12 Thread Brian Hurt

Larry McGhaw wrote:


I'm really frustrated by this process I'm not trying to attack anyone
here.  I'm just surprised that no one will even entertain the idea that
this is an issue that needs to be addressed.

Instead nearly all of the responses have been attacking the applications
that rely on the metadata.
 

Having been following this debate, I think what people have really been 
attacking is the idea that the metadata for:


SELECT '1' AS varchar_column;

should be different from the metadata for:

SELECT varchar_column FROM really_big_table;

or for:

SELECT varchar_column FROM really_small_table;

Or at least that's what I've taken away from the dicussion- it's not so 
much that the metadata shouldn't be relied on, it's that the metadata 
may be more generic than theoretically necessary.  And that the metadata 
may not contain the length of a variable length field even when that 
length is known.


Brian


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Ye olde drop-the-database-you-just-left problem

2007-05-30 Thread Brian Hurt

Tom Lane wrote:


I just finished giving someone the standard advice to wait a bit before
trying to drop a database that'd just been accessed:
http://archives.postgresql.org/pgsql-general/2007-05/msg01505.php

AFAICT a "real" fix for this would involve making PQfinish() synchronous
(don't return till backend is dead), which doesn't seem like a great
idea.  However, it suddenly struck me that we could probably make most
of the problem go away if we put that same wait into DROP DATABASE
itself --- that is, if we see other backends in the target DB, sleep
for a second or two and then recheck before erroring out.

This isn't bulletproof since under high load the other backend might
not get to quit, but it'd surely reduce the frequency of complaints
a great deal.  And we could take out the ad-hoc sleeps that are done
in (eg) the contrib regression tests.

Thoughts?
 



Is this a synchronization issue?  I'm wondering if there isn't a better 
solution.  The problem with waiting is that a) you're going to be 
waiting a lot when it's not necessary, and b) the likelyhood you won't 
wait long enough (especially under load, as you mentioned). 

I'm wondering if something like this would work.  When a backend 
connects to the database, it increments a semaphore associated with that 
database.  The last thing it does when exiting is release the semaphore- 
which is the backend's way of saying "OK, all done here".  The drop 
database command checks the semaphore- if it still has a non-zero count, 
it fails rather than dropping the database.  A possibly optional 
argument would have it wait until the semaphore is 0, and then drop the 
database.  This has the advantage of only waiting long enough.


No idea how practical this would be, tho...

Brian

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()

2007-04-05 Thread Brian Hurt

My apologies for the late reply...

Tom Lane wrote:


<[EMAIL PROTECTED]> writes:
 


I'd like to see a new variant on PQexec():
   PGresult * PQexecf(PGconn *conn, const char *fmt, ...);
   



Way too late for 8.3 --- if we were going to do something like this,
we should think first and program later.  In particular, blindly
adopting the sprintf format string definition doesn't seem very helpful.
The sorts of escapes I'd want to have are "properly quoted SQL
identifier", "properly quoted SQL literal", etc.  A large fraction of
what sprintf knows about is more or less irrelevant to the task of
creating SQL commands.

 

The advantage of using stock sprintf commands is that most compilers 
understand them these days, and can check that the arguments given match 
the format string.  If you go with your own format specifiers, this is 
no longer true.


Brian



Re: [HACKERS] Money type todos?

2007-03-21 Thread Brian Hurt

Shane Ambler wrote:


August Zajonc wrote:


Agreed with Tom on this one. Full usage of money is beyond tagged types
etc. For example, when you earn money in another currency, it is the
time at which you earn it that describes its value. So for P&L accounts
there is generally no change in exchange rates over time and you need to
track what the rate was at time of earning. Solution is to date earnings
and have a table of exchange rates by day.



Personally I think a true money type should hold the numeric value and 
optionally the currency (similar to the timestamp with timezone) and 
have support functions that handle the i/0 conversion (text - 
$US1,000.00 - to money) as happens now. As opposed to the db designer 
storing it in different columns.



It'd be nice if there were an easy, standard solution to this problem- 
but I don't think there is.  For example, our application (which would 
be greatly simplified if there was a standard solution to this) knows of 
~200 different currencies, including such standards as the Uganda 
Shilling, the Zambia Kwacha, and Ethiopian Birr.  Not to mention you get 
situations where goverments (for various reasons) issue "new" currency, 
like the "new" Zimbabwe Dollar, vr.s the "old" Zimbabwe Dollar.  Confuse 
these two and you'll lose your shirt.


Personally, I don't think it's that big of a deal to have to do in my 
queries:

   SELECT
  table.amount || ccy.code
   FROM
  table
  LEFT JOIN
  lu_currency AS ccy
  WHERE table.ccy_id =  ccy.id

to make the report come out as "100USD".

Brian


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Brian Hurt

Markus Schiltknecht wrote:


Hi,

I recall having read something about rewriting the parser. Together 
with Tom being worried about parser performance and knowing GCC has 
switched to a hand written parser some time ago, I suspected bison to 
be slow. That's why I've asked.


This has little to do with performance and everything to do with the 
insanity which is C++:

http://gnu.teleglobe.net/software/gcc/gcc-3.4/changes.html


* A hand-written recursive-descent C++ parser has replaced the
  YACC-derived C++ parser from previous GCC releases. The new
  parser contains much improved infrastructure needed for better
  parsing of C++ source codes, handling of extensions, and clean
  separation (where possible) between proper semantics analysis
  and parsing. The new parser fixes many bugs that were found in
  the old parser.



Short form: C++ is basically not LALR(1) parseable.

Brian



Re: [HACKERS] tripping an assert in 8.1.6 (more info)

2007-01-23 Thread Brian Hurt

Tom Lane wrote:


Brian Hurt <[EMAIL PROTECTED]> writes:
 


Execute, on a fresh database, the following sql, to recreate the bug:
   



Hmm, I thought that stack trace looked a bit familiar --- we seem to
have fixed the problem as of 8.2.  Unfortunately I can't recall what
the relevant change was exactly; time for some digging in the CVS logs.





Any hope of getting the fix back-ported into the 8.1 tree?

Brian



Re: [HACKERS] tripping an assert in 8.1.6 (more info)

2007-01-23 Thread Brian Hurt
I forgot to mention: core dumps available upon request (obviously I 
don't want to post them to the list).


Brian


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] tripping an assert in 8.1.6 (more info)

2007-01-23 Thread Brian Hurt

More info on that assert I've hit.

Compile 8.1.6 with configuration options:

./configure --with-perl --enable-debug --enable-cassert

(not sure if --perl is relevent or not, I think not).

This is on Fedora Core 5 on x86-32.

Execute, on a fresh database, the following sql, to recreate the bug:

CREATE TABLE foo
(
some_data VARCHAR(32)
,row_date DATE
);

CREATE TABLE bar
(
some_data VARCHAR(32)
,row_date DATE
);

CREATE TABLE quux
(
some_data VARCHAR(32)
,more_data VARCHAR(32)
);


CREATE OR REPLACE VIEW bazz AS
SELECT
   ('bar: ' || bar.row_date) :: TEXT AS action,
   quux.more_data AS more_data,
   bar.row_date AS row_date
FROM
   bar
   JOIN quux
   ON bar.some_data = quux.some_data
UNION ALL
SELECT
   ('foo: ' || foo.row_date) :: TEXT AS action,
   quux.more_data AS more_data,
   foo.row_date AS row_date
FROM
   foo
   JOIN quux
   ON foo.some_data = quux.some_data
;

SELECT action, more_data FROM bazz;



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] tripping an assert in 8.1.6

2007-01-23 Thread Brian Hurt
Hello all.  It seems I'm tripping an assert in 8.1.6- the assert on line 
219 of  src/backend/executor/execScan.c (found by running gdb on a core 
dump).  This is on x86 and Redhat Linux (forget which version).  Note 
that if I recompile 8.1.6 with asserts turned off the query completes 
just fine.  I'm trying to put together an example which reproduces the 
problem without requiring half our company's data- that should follow soon.


The gdb backtrace is:


#0  0xe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xe410 in __kernel_vsyscall ()
#1  0xb7d2dee9 in raise () from /lib/libc.so.6
#2  0xb7d2f4f1 in abort () from /lib/libc.so.6
#3  0x0824f931 in ExceptionalCondition (conditionName=Variable 
"conditionName" is not available.

) at assert.c:51
#4  0x081537ac in ExecAssignScanProjectionInfo (node=0x8426bec)
at execScan.c:219
#5  0x08161339 in ExecInitSubqueryScan (node=0x8412de4, estate=0x8426ad4)
at nodeSubqueryscan.c:212
#6  0x0814e0e4 in ExecInitNode (node=0x8412de4, estate=0x8426ad4)
at execProcnode.c:179
#7  0x0814c554 in ExecutorStart (queryDesc=0x842554c, explainOnly=1 
'\001')

at execMain.c:618
#8  0x081193f5 in ExplainOnePlan (queryDesc=0x842554c, stmt=0x839afe4,
tstate=0x83cbdac) at explain.c:243
#9  0x081198ac in ExplainOneQuery (query=0x83b88e4, stmt=0x839afe4,
tstate=0x83cbdac) at explain.c:214
#10 0x08119a92 in ExplainQuery (stmt=0x839afe4, dest=0x83b8a54)
at explain.c:121
#11 0x081da391 in PortalRunUtility (portal=0x83b67b4, query=0x839b07c,
dest=0x83b8a54, completionTag=0x0) at pquery.c:987
#12 0x081db6dc in PortalRun (portal=0x83b67b4, count=2147483647,
dest=0x839b030, altdest=0x839b030, completionTag=0xbf9efee8 "")
at pquery.c:637
#13 0x081d713c in exec_simple_query (
query_string=0x839a26c "explain SELECT action, bloomberg_code, 
composite_bloomberg_code, reuters_code, cusip_code, sedol_code, 
isin_code FROM vw_ca_generic_actions WHERE (action_date >= 
'20070122'::date) AND (action_date <= "...)

at postgres.c:1004
#14 0x081d8bd3 in PostgresMain (argc=4, argv=0x83593f0,
username=0x83593b8 "postgres") at postgres.c:3232
#15 0x081aca37 in ServerLoop () at postmaster.c:2865
#16 0x081ad936 in PostmasterMain (argc=3, argv=0x8358560) at 
postmaster.c:941
#17 0x0816c1c9 in main (argc=3, argv=Cannot access memory at address 
0x1515

) at main.c:265



This is mainly a "heads up- bug incomming" message.  Thanks.

Brian


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Function execution costs 'n all that

2007-01-15 Thread Brian Hurt

Neil Conway wrote:


On Mon, 2007-01-15 at 10:51 -0800, Richard Troy wrote:
 


I therefore propose that the engine evaluate -
benchmark, if you will - all functions as they are ingested, or
vacuum-like at some later date (when valid data for testing may exist),
and assign a cost relative to what it already knows - the built-ins, for
example.
   



That seems pretty unworkable. It is unsafe, for one: evaluating a
function may have side effects (inside or outside the database), so the
DBMS cannot just invoke user-defined functions at whim. Also, the
relationship between a function's arguments and its performance will
often be highly complex -- it would be very difficult, not too mention
computationally infeasible, to reconstruct that relationship
automatically, especially without any real knowledge about the
function's behavior.
 

Non-developer here, but we use a lot of plpgsql functions at work.  And 
the functions we use fall into two broad, ill-defined catagories- 
"expensive" functions and "cheap" functions.  What I'd like as a user is 
some way to tell the planner "this function is expensive- prefer plans 
which call this function less even if they're otherwise more expensive" 
or "this function is cheap, prefer plans that are otherwise less 
expensive even if they call this function more often".  Precise cost 
estimates aren't that important, IMHO.


Brian



Re: [HACKERS] [PATCHES] [BUGS] BUG #2846: inconsistent and

2006-12-29 Thread Brian Hurt

Bruce Momjian wrote:


Tom Lane wrote:
 


Bruce Momjian <[EMAIL PROTECTED]> writes:
   


Tom Lane wrote:
 


This is *not* going in the right direction :-(
   


Well, then show me what direction you think is better.
 


Fewer restrictions, not more.  The thrust of what I've been saying
(and I think Roman too) is to trust in the hardware float-arithmetic
implementation to be right.  Every time you add an additional "error
check" you are going in the wrong direction.
   



OK, are you saying that there is a signal we are ignoring for
overflow/underflow, or that we should just silently overflow/underflow
and not throw an error?

 

My understanding is that you have to actually set flags in the floating 
point environment to make overflows, underflows, infinities, NaNs, etc. 
raise signals.  You might take a look at fenv.h (defined in the C99 
spec) for the functions to do this.  My apologies if I'm recovering well 
trod ground here.


Note that taking a signal on an FP exception is a horribly expensive 
proposition- we're talking about hundreds or thousands of clock cycles 
here.  But it's probably worthwhile vr.s the cost of testing every 
floating point result, as generally FP exceptions will be rare (probably 
even more rare in database work than in general).  So it's probably 
worthwhile.


Brian



Re: [HACKERS] invalid input syntax for type timestamp.

2006-12-16 Thread Brian Hurt

Joachim Wieland wrote:


Yes, I completely agree that JDT should not be included. I just wanted to
understand how those lines show that JST is still in active use. As far as I
understand it, it says that JST was observed from 1948 to 1951 (the second
rule) and now there is a time zone "J%sT" (because there is no "until"-date
in the last line) but there is no rule that tells us what to replace "%s"
with...
 



We're using JST as a time zone here, so I'd like to politely request 
that JST stay in the list of time zones, wether it is an "official" time 
zone or not.  Thanks.


Brian


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Compressing table images

2006-05-11 Thread Brian Hurt
My apologies if this subject has already been hashed to death, or if 
this is the wrong list, but I was wondering if people had seen this paper:

http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06

Basically it describes a compression algorithm for tables of a 
database.  The huge advantage of  doing this is that it reduced the disk 
traffic by (approximately) a factor of four- at the cost of more CPU 
utilization. 


Any thoughts or comments?

Brian


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq